What is a common problem when looking for sources using a search engine like Google?

Summary:

This section covers finding information online. It includes information about search engines, Boolean operators, Web directories, and the invisible Web. It also includes an extensive, annotated links section.

A search engine is a program that sends out inquiries to sites on the Web and catalogs every Web site it encounters without evaluating it. Methods of inquiry differ from one search engine to another, so each one will report different results. Search engines hold an enormous number of sites in their indexes, so you must limit your search terms to avoid being overwhelmed by an unmanageable number of results.

Search engines are good for finding sources for well-defined topics. Typing in a general term such as "education" or "Shakespeare" will bring back far too many results, but by narrowing your topic, you can get the kind (and amount) of information that you need.

Example:

  • Go to Google (a search engine)
  • Type in a general term ("education")
  • Add modifiers to further define and narrow your topic ("rural education Indiana")
  • Be as specific as you can ("rural education Indiana elementary school")
  • Submit your search.

Adjust your search based upon the number of responses you receive (if you get too few responses, submit a more general search; if you get too many, add more modifiers).
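To see why adding modifiers shrinks the result list, here is a minimal Python sketch. It assumes that every word in a query is required (an implicit AND; engines differ on this) and counts matches in a tiny made-up set of page texts. The `DOCS` data and the `matches` function are hypothetical.

```python
# Hypothetical page texts standing in for a search engine's index.
DOCS = {
    1: "education policy news",
    2: "rural education programs in Indiana",
    3: "rural education Indiana elementary school funding",
    4: "Indiana university education department",
    5: "elementary school lunch menus",
}


def matches(query: str) -> list[int]:
    """Return ids of pages that contain every word of the query (implicit AND)."""
    terms = query.lower().split()
    return [
        doc_id
        for doc_id, text in DOCS.items()
        if all(term in text.lower().split() for term in terms)
    ]


for q in ["education", "rural education Indiana", "rural education Indiana elementary school"]:
    print(f"{q!r}: {len(matches(q))} result(s)")
# 'education': 4 result(s)
# 'rural education Indiana': 2 result(s)
# 'rural education Indiana elementary school': 1 result(s)
```

Under this assumption each added modifier can only remove pages from the list, never add them, which is why a search that returns too few results is fixed by dropping a modifier.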

Learn how the search engine works

Read the instructions and FAQs located on the search engine to learn how that particular site works. Each search engine is slightly different, and a few minutes learning how to use the site properly will save you large amounts of time and prevent useless searching.

Each search engine has different advantages. Google is one of the largest search engines, followed closely by MSN (now Bing) and Yahoo. Because of their size, these three search engines cover a larger portion of the Internet than other search engines. Lycos allows you to search by region, language, and date. Ask allows you to phrase your search terms in the form of a question. It is wise to use several search engines to find as much information as possible.

Select your terms carefully

Using inexact or overly general terms will cause you problems: if your terms are too broad, the search engine may not process them at all. Search engines are programmed with lists of words that the designers decided are so general that a search would turn up hundreds of thousands of references. Check the search engine to see whether it has a list of these stopwords. One stopword, for example, is "computers." Some search engines let you search for a stopword anyway by marking it with a special code (Google once required a "+" before the word; it now asks you to put the word in quotation marks instead).
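A minimal Python sketch of stopword filtering: the query parser drops listed words unless the user marks one as required. Both the `STOPWORDS` set and the quotation-mark convention for forcing a word back in are assumptions for illustration, not any particular engine's list or syntax.

```python
# Hypothetical stopword list; real engines use their own lists.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}


def query_terms(query: str) -> list[str]:
    """Drop stopwords from a query unless the user quoted the word to force it in."""
    terms = []
    for raw in query.split():
        if len(raw) > 2 and raw.startswith('"') and raw.endswith('"'):
            terms.append(raw[1:-1].lower())  # quoted word: always kept
        elif raw.lower() not in STOPWORDS:
            terms.append(raw.lower())  # ordinary word: kept unless it is a stopword
    return terms


print(query_terms('the history of "the" Beatles'))  # ['history', 'the', 'beatles']
```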

If your early searches turn up too many references, read through some of the relevant ones to find more specific or exact terms. When you notice which terms keep appearing in references that are not relevant to your topic, you can exclude them with NOT (see the section on Boolean operators below). In other words, keep refining your search as you learn more about the terms.

You can also try to make your terms more precise by checking the online catalog of a library. For example, check THOR+, the Purdue University Library online catalog, and try its subject word search. Or try searching the term in the library's online databases.

Most search engines now have "Advanced Search" features. These features allow you to use Boolean operators (below) as well as specify other details like date, language, or file type.

Know Boolean operators

Most search engines allow you to combine terms with words (referred to as Boolean operators) such as "and," "or," or "not." Knowing how to use these terms is very important for a successful search. Most search engines will allow you to apply the Boolean operators in an "advanced search" option.

AND

AND is the most useful and most important term. It tells the search engine to find pages that contain both your first term AND your second term. AND can, however, cause problems, especially when you use it with phrases or with two terms that are each broad in themselves or likely to appear together in other contexts.

For example, if you want information about the Chicago Bulls basketball team and type in "Chicago AND Bulls," you will get references to every page that mentions both Chicago and bulls. Because Chicago was long the center of a large meat-packing industry, many of those references will be about that industry, where "Chicago" and "bull" are likely to appear together.
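A minimal Python sketch of how AND is commonly evaluated: an inverted index maps each word to the set of pages containing it, and AND intersects those sets. The page texts and function names are hypothetical, chosen to mirror the Chicago Bulls example.

```python
from collections import defaultdict

# Hypothetical page texts illustrating the "Chicago AND Bulls" problem.
PAGES = {
    "bulls-roster": "the chicago bulls basketball team roster",
    "stockyards": "history of the chicago stockyards where bulls and steers were sold",
    "rodeo": "bulls at the texas rodeo",
}


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: each word maps to the set of page ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.split():
            index[word].add(page_id)
    return index


INDEX = build_index(PAGES)


def search_and(*terms: str) -> set[str]:
    """AND: keep only pages that appear in the posting set of every term."""
    postings = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


print(sorted(search_and("chicago", "bulls")))  # ['bulls-roster', 'stockyards']
```

The stockyards page matches even though it has nothing to do with basketball: AND only checks that both words occur somewhere on the page, not that they occur together.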

OR

Use OR when a key term may appear in two different ways.

For example, if you want information on sudden infant death syndrome, try "sudden infant death syndrome OR SIDS."

OR is not always helpful, because it broadens your search: a page needs to contain only one of the terms to count as a match. For example, if you want information on the American economy and you type in "American OR economy," you will get thousands of references to documents containing the word "American" and thousands of unrelated ones with the word "economy."
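A minimal Python sketch of OR as a set union over a tiny inverted index. It assumes that in "sudden infant death syndrome OR SIDS" the OR joins the whole phrase (whose words are ANDed) with the abbreviation; real engines have their own precedence rules. All page data and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical page texts.
PAGES = {
    "p1": "sudden infant death syndrome risk factors",
    "p2": "sids prevention guidelines for parents",
    "p3": "the american economy grew last year",
    "p4": "american football scores",
    "p5": "economy class airline seats",
}

INDEX: dict[str, set[str]] = defaultdict(set)
for page_id, text in PAGES.items():
    for word in text.split():
        INDEX[word].add(page_id)


def search_all(alternative: str) -> set[str]:
    """Pages containing every word of one alternative (the words are ANDed)."""
    postings = [INDEX.get(word, set()) for word in alternative.split()]
    return set.intersection(*postings) if postings else set()


def search_or(*alternatives: str) -> set[str]:
    """OR: the union of the pages matched by each alternative."""
    result: set[str] = set()
    for alternative in alternatives:
        result |= search_all(alternative)
    return result


print(sorted(search_or("sudden infant death syndrome", "sids")))  # ['p1', 'p2']
print(sorted(search_or("american", "economy")))  # ['p3', 'p4', 'p5']
```

The second query shows the problem described above: football scores and airline seats come back because each contains just one of the two words.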

NEAR

NEAR is available only on some search engines, but it can be very useful. It tells the search engine to find documents that contain both words, but only when they appear near each other, usually within a few words.

For example, suppose you are looking for information on mobile homes. Almost every site has a notice to "click here to return to the home page," so the word "home" appears on an enormous number of pages. A search for "mobile AND home" will therefore report any page that mentions "mobile" anywhere and also carries that notice, since both terms appear on the page. Using "mobile NEAR home" eliminates that problem, because the two words must appear close together.
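A minimal Python sketch of a NEAR check based on word positions. The three-word window (`max_gap=3`) is an assumption, since "a few words" is defined differently by each engine that supports NEAR, and the page texts are made up.

```python
import re


def near(text: str, a: str, b: str, max_gap: int = 3) -> bool:
    """True if words a and b occur within max_gap word positions of each other."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == a]
    positions_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_gap for i in positions_a for j in positions_b)


on_topic = "Mobile home parks in Ohio. Click here to return to the home page."
off_topic = "Get our mobile app today. Click here to return to the home page."

# Both pages contain "mobile" and "home", so a plain AND would return both.
print(near(on_topic, "mobile", "home"))   # True
print(near(off_topic, "mobile", "home"))  # False
```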

NOT

NOT tells the search engine to find a reference that contains one term but not the other. This is useful when a term refers to multiple concepts.

For example, if you are working on an informative paper on eagles, you may instead find many Web sites about the Philadelphia Eagles football team. To omit the football team from your search results, you could search for "eagles NOT Philadelphia."
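A minimal Python sketch of NOT as a set difference over a tiny inverted index. The page texts and the `search_not` name are hypothetical.

```python
from collections import defaultdict

# Hypothetical page texts.
PAGES = {
    "bald-eagles": "bald eagles are birds of prey found in north america",
    "nests": "eagles build large nests called eyries",
    "football": "the philadelphia eagles won the football game",
    "local-birds": "bald eagles nesting near philadelphia",
}

INDEX: dict[str, set[str]] = defaultdict(set)
for page_id, text in PAGES.items():
    for word in text.split():
        INDEX[word].add(page_id)


def search_not(include: str, exclude: str) -> set[str]:
    """NOT: pages containing `include`, minus every page containing `exclude`."""
    return INDEX.get(include, set()) - INDEX.get(exclude, set())


print(sorted(search_not("eagles", "philadelphia")))  # ['bald-eagles', 'nests']
```

The football page is gone, but so is the page about eagles nesting near Philadelphia: NOT drops every page containing the excluded word, relevant or not, so choose the excluded term carefully.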

Google Search is a fully automated search engine that uses software known as web crawlers, which explore the web regularly to find pages to add to our index. In fact, the vast majority of pages listed in our results aren't manually submitted for inclusion, but are found and added automatically when our web crawlers explore the web. This document explains the stages of how Search works in the context of your website. Having this base knowledge can help you fix crawling issues, get your pages indexed, and learn how to optimize how your site appears in Google Search.

Looking for something less technical? Check out our How Search Works site, which explains how Search works from a searcher's perspective.

A few notes before we get started

Before we get into the details of how Search works, it's important to note that Google doesn't accept payment to crawl a site more frequently, or rank it higher. If anyone tells you otherwise, they're wrong.

Google doesn't guarantee that it will crawl, index, or serve your page, even if your page follows Google's guidelines and policies for site owners.

Google Search works in three stages, and not all pages make it through each stage:

  1. Crawling: Google downloads text, images, and videos from pages it found on the internet with automated programs called crawlers.
  2. Indexing: Google analyzes the text, images, and video files on the page, and stores the information in the Google index, which is a large database.
  3. Serving search results: When a user searches on Google, Google returns information that's relevant to the user's query.

Crawling

The first stage is finding out what pages exist on the web. There isn't a central registry of all web pages, so Google must constantly look for new and updated pages and add them to its list of known pages. This process is called "URL discovery". Some pages are known because Google has already visited them. Other pages are discovered when Google follows a link from a known page to a new page: for example, a hub page, such as a category page, links to a new blog post. Still other pages are discovered when you submit a list of pages (a sitemap) for Google to crawl.
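URL discovery can be pictured as a breadth-first walk over links. The minimal Python sketch below assumes the link graph is already known (a real crawler would fetch each page and extract its links) and shows a hub page leading to a new blog post, plus an orphan page that is found only through a sitemap. The URLs and the `discover` function are hypothetical.

```python
from collections import deque
from collections.abc import Iterable

# Hypothetical link graph: each URL maps to the URLs its page links to.
LINKS = {
    "https://example.com/": ["https://example.com/blog/"],
    "https://example.com/blog/": ["https://example.com/blog/new-post"],
    "https://example.com/blog/new-post": ["https://example.com/"],
    "https://example.com/orphan": [],
}


def discover(seeds: Iterable[str], sitemap: Iterable[str] = ()) -> list[str]:
    """Breadth-first URL discovery starting from known pages and sitemap entries."""
    known = list(dict.fromkeys([*seeds, *sitemap]))  # dedupe, keep order
    seen = set(known)
    frontier = deque(known)
    while frontier:
        url = frontier.popleft()
        for link in LINKS.get(url, []):
            if link not in seen:  # a new page found by following a link
                seen.add(link)
                known.append(link)
                frontier.append(link)
    return known


print(discover(["https://example.com/"]))
print(discover(["https://example.com/"], sitemap=["https://example.com/orphan"]))
```

No page links to `/orphan`, so it appears only in the second run, when the sitemap lists it.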

Once Google discovers a page's URL, it may visit (or "crawl") the page to find out what's on it. We use a huge set of computers to crawl billions of pages on the web. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process to determine which sites to crawl, how often, and how many pages to fetch from each site. Google's crawlers are also programmed to avoid crawling a site too fast so that they don't overload it. This mechanism is based on the responses of the site (for example, HTTP 500 errors mean "slow down") and settings in Search Console.
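Google does not publish its exact crawl-rate algorithm. As an illustration only, the minimal Python sketch below uses a common exponential-backoff rule: double the delay between requests after a 5xx server error, and ease back toward a base delay after successful responses. The function name, parameters, and values are hypothetical.

```python
def next_delay(current: float, status: int, base: float = 1.0, max_delay: float = 60.0) -> float:
    """Hypothetical politeness rule: back off on 5xx errors, recover slowly otherwise."""
    if 500 <= status < 600:
        return min(current * 2, max_delay)  # server is struggling: slow down
    return max(base, current / 2)  # healthy response: speed back up, but not below base


delay = 1.0
history = []
for status in [200, 500, 500, 200, 200]:
    delay = next_delay(delay, status)
    history.append(delay)
print(history)  # [1.0, 2.0, 4.0, 2.0, 1.0]
```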

However, Googlebot doesn't crawl all the pages it discovers. Some pages may be disallowed for crawling by the site owner, other pages may not be accessible without logging in to the site, and other pages may be duplicates of previously crawled pages. For example, many sites are accessible through both the www (www.example.com) and non-www (example.com) versions of the domain name, even though the content is identical under both versions.
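One simple way a crawler could avoid fetching the www and non-www copies twice is to normalize URLs before queuing them. The minimal Python sketch below uses the standard `urllib.parse` module; the normalization rules and function names are hypothetical. Stripping `www.` is only a heuristic, because some domains serve different content on the two hostnames, so a real crawler would confirm the content matches before treating them as duplicates.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Collapse trivially different spellings of a URL into one key (hypothetical rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www and non-www as the same site
    path = parts.path or "/"
    # The fragment (#...) is dropped: it never changes what the server returns.
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each normalized key."""
    first_seen: dict[str, str] = {}
    for url in urls:
        first_seen.setdefault(normalize(url), url)
    return list(first_seen.values())


print(dedupe([
    "https://www.example.com/",
    "https://example.com/",
    "https://example.com/about#team",
]))  # ['https://www.example.com/', 'https://example.com/about#team']
```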

During the crawl, Google renders the page and runs any JavaScript it finds using a recent version of Chrome, similar to how your browser renders pages you visit. Rendering is important because websites often rely on JavaScript to bring content to the page, and without rendering Google might not see that content.

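Crawling depends on whether Google's crawlers can access the site. Some common issues with Googlebot accessing sites include:

  • Problems with the server handling the site
  • Network issues
  • robots.txt rules preventing Googlebot's access to the page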

Indexing

After a page is crawled, Google tries to understand what the page is about. This stage is called indexing and it includes processing and analyzing the textual content and key content tags and attributes, such as <title> elements and alt attributes, images, videos, and more.
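Google's indexing pipeline is not public. To illustrate only the first step described above, pulling the `<title>` text, visible text, and image `alt` attributes out of a page, here is a minimal sketch using Python's standard `html.parser` module. The `ContentExtractor` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class ContentExtractor(HTMLParser):
    """Collect the <title> text, visible text, and <img> alt attributes of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.text: list[str] = []
        self.alts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.alts.append(alt)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth and data.strip():
            self.text.append(data.strip())


page = (
    "<html><head><title>Mobile Homes</title><script>var x = 1;</script></head>"
    '<body><h1>Buying a mobile home</h1><img src="a.jpg" alt="Double-wide mobile home"></body></html>'
)
extractor = ContentExtractor()
extractor.feed(page)
extractor.close()
print(extractor.title)  # Mobile Homes
print(extractor.text)   # ['Buying a mobile home']
print(extractor.alts)   # ['Double-wide mobile home']
```

Script contents are skipped because they are code, not page text; that is one reason rendering, which runs the JavaScript, matters for pages that build their content with scripts.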

During the indexing process, Google determines whether a page is a duplicate of another page on the internet or is canonical. The canonical is the page that may be shown in search results. To select the canonical, we first cluster the pages that we found on the internet that have similar content, and then we select the one that's most representative of the group. The other pages in the group are alternate versions that may be served in different contexts, like if the user is searching from a mobile device or they're looking for a very specific page from that cluster.
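A minimal Python sketch of the cluster-then-choose idea. It makes two simplifying assumptions that differ from what the text describes: pages are clustered only when their normalized text is identical (a hash match, whereas the text speaks of "similar" content), and the representative is chosen by a hypothetical URL preference rule (https first, then the shortest URL). All names and data are made up.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash of the case- and whitespace-normalized text; same content, same hash."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pick_canonicals(pages: dict[str, str]) -> dict[str, list[str]]:
    """Cluster URLs with identical content and choose one representative per cluster.

    Hypothetical preference order: https first, then the shortest URL, then alphabetical.
    """
    clusters: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        clusters[fingerprint(text)].append(url)
    result: dict[str, list[str]] = {}
    for urls in clusters.values():
        canonical = min(urls, key=lambda u: (not u.startswith("https://"), len(u), u))
        result[canonical] = sorted(u for u in urls if u != canonical)  # alternates
    return result


pages = {
    "http://example.com/shoes": "Red shoes, size 42",
    "https://example.com/shoes": "Red shoes,  size 42",
    "https://example.com/shoes?ref=ad": "red shoes, size 42",
    "https://example.com/hats": "Blue hats",
}
print(pick_canonicals(pages))
```

The three shoe URLs form one cluster with `https://example.com/shoes` as its canonical and the other two as alternates; the hats page is its own cluster.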

Google also collects signals about the canonical page and its contents, which may be used in the next stage, where we serve the page in search results. Some signals include the language of the page, the country the content is local to, the usability of the page, and so on.

The collected information about the canonical page and its cluster may be stored in the Google index, a large database hosted on thousands of computers. Indexing isn't guaranteed; not every page that Google processes will be indexed.

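Indexing also depends on the content of the page and its metadata. Some common indexing issues can include:

  • The quality of the content on the page is low
  • Robots meta rules disallow indexing
  • The design of the website might make indexing difficult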

Serving search results

Google doesn't accept payment to rank pages higher, and ranking is done programmatically.

When a user enters a query, our machines search the index for matching pages and return the results we believe are the highest quality and most relevant to the user. Relevancy is determined by hundreds of factors, which could include information such as the user's location, language, and device (desktop or phone). For example, searching for "bicycle repair shops" would show different results to a user in Paris than it would to a user in Hong Kong.
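Google's hundreds of ranking factors are not public. The minimal Python sketch below only illustrates how a single location signal could reorder results for the "bicycle repair shops" example: score each page by how many query words it contains, then add a boost when the page is local to the user's city. The `Page` fields, `local_boost` value, and URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    local_to: str  # hypothetical signal: the city the content is local to


def rank(query: str, pages: list[Page], user_city: str, local_boost: float = 2.0) -> list[str]:
    """Score = number of query words on the page, plus a boost if the page is local to the user."""
    terms = set(query.lower().split())
    scored = []
    for page in pages:
        overlap = len(terms & set(page.text.lower().split()))
        if overlap == 0:
            continue  # no query words at all: not a candidate
        score = overlap + (local_boost if page.local_to == user_city else 0.0)
        scored.append((score, page.url))
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by URL
    return [url for _, url in scored]


PAGES = [
    Page("https://paris-velo.example/", "bicycle repair shop in Paris", "Paris"),
    Page("https://hk-bikes.example/", "bicycle repair shops Hong Kong", "Hong Kong"),
    Page("https://cakes.example/", "birthday cakes", "Paris"),
]

print(rank("bicycle repair shops", PAGES, user_city="Paris"))
print(rank("bicycle repair shops", PAGES, user_city="Hong Kong"))
```

The same query returns the Paris shop first for a user in Paris and the Hong Kong shop first for a user in Hong Kong, while the unrelated cake page is never returned.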

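Search Console might tell you that a page is indexed, but you don't see it in search results. This might be because:

  • The content on the page is irrelevant to users' queries
  • The quality of the content is low
  • Robots meta rules prevent serving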

While this guide explains how Search works, we are always working on improving our algorithms. You can keep track of these changes by following the Google Search Central blog.