Crawling is the process by which Googlebot discovers new and updated pages to be added to the Google index. Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site. Google’s crawl process begins with a list of web page URLs, generated from previous crawl processes, and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these websites it detects links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index.
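The discovery loop described above (start from a list of known URLs, fetch each page, and queue any newly found links) can be sketched as a breadth-first traversal. This is only an illustrative sketch, not Googlebot's actual implementation: the `crawl` function, the `TOY_WEB` link graph, and the `toy_fetch_links` helper are hypothetical names, and the "web" is an in-memory dictionary rather than real HTTP fetches.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Breadth-first crawl; returns URLs in the order they were visited."""
    seeds = list(dict.fromkeys(seed_urls))  # drop duplicate seeds, keep order
    frontier = deque(seeds)                 # URLs waiting to be crawled
    seen = set(seeds)                       # URLs already queued at some point
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):       # links detected on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical in-memory link graph standing in for real pages.
TOY_WEB = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/"],
    "https://example.com/c": [],
}

def toy_fetch_links(url):
    """Return the outgoing links of a page (empty for unknown or dead URLs)."""
    return TOY_WEB.get(url, [])

order = crawl(["https://example.com/"], toy_fetch_links)
print(order)
```

A real crawler would also decide how often to revisit each site and how many pages to fetch from it; the `max_pages` cap here is only a stand-in for that kind of budget.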
Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as Title tags and ALT attributes. Googlebot can process many, but not all, content types. For example, we cannot process the content of some rich media files or dynamic pages.
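An index of "all the words it sees and their location on each page" is commonly modeled as a positional inverted index. The following is a minimal sketch under that assumption, not a description of Google's real index; `build_index` and the sample `pages` are hypothetical, and the tokenizer simply lowercases text and splits on non-alphanumeric characters.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each word to {url: [positions of the word in that page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][url].append(position)
    return index

# Hypothetical page contents.
pages = {
    "/espresso": "Espresso gadgets for espresso lovers",
    "/coffee": "Coffee widgets and gadgets",
}

index = build_index(pages)
print(dict(index["gadgets"]))
```

Looking up a word returns every page that contains it together with the word positions, which is the information a search needs to match queries against pages.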
When a user enters a query, Google machines search the index for matching pages and return the results we believe are the most relevant to the user. Relevancy is determined by over 200 factors, one of which is the PageRank for a given page. PageRank is the measure of the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on your site from another site adds to your site’s PageRank. Not all links are equal: Google works hard to improve the user experience by identifying spam links and other practices that negatively impact search results. The best types of links are those that are given based on the quality of your content. In order for your site to rank well in search results pages, it’s important to make sure that Google can crawl and index your site correctly. Google Webmaster Guidelines outline some best practices that can help you avoid common pitfalls and improve your site’s ranking.
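To make the idea of link-based importance concrete, here is a small sketch of the textbook PageRank iteration. It is an assumption-laden simplification: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page `links` graph are illustrative choices, and the ranking Google actually uses combines many other factors.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps every page to the list of pages it links to;
    every link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: pages a, b, and d all link to c.
links = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy graph, page `c` ends up with the highest score because it receives the most incoming links, and page `a` comes second because its single incoming link is from the highly ranked `c`.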
Google’s Did you mean and Google Autocomplete features are designed to help users save time by displaying related terms, common misspellings, and popular queries. Like our google.com search results, the keywords used by these features are automatically generated by our web crawlers and search algorithms. We display these predictions only when we think they might save the user time. If a site ranks well for a keyword, it’s because we’ve algorithmically determined that its content is more relevant to the user’s query.
Check your site is in the Google index
- Do a site: search. When a webmaster tells us that his or her site has fallen out of our search results, we often find that it’s still included. To quickly determine whether your site is still in our index, perform a Google site: search for its entire URL (for example, site:www.example.com).
- Verify that your site ranks for your domain name. Do a Google search for www.[yourdomain].com. If your site doesn’t appear in the results, or if it ranks poorly, this is a sign that your site may have a manual spam action for violations of the Webmaster Guidelines. Sitemaps are a great way to tell Google about the pages you consider most important.
See if your site has been impacted by a manual spam action
- Check the Manual Actions page. If your site’s ranking is impacted by a manual spam action, we’ll tell you about it on the Manual Actions page of Search Console.
Make sure Google can find and crawl your site
- Check for crawl errors. The Crawl errors page in Search Console provides details about the URLs in your site that we tried to crawl and couldn’t access. Review these errors, and fix any you can. The next time Googlebot crawls your site, it will note the changes and use them to update the Google index.
- Review your robots.txt file. The Test robots.txt tool lets you analyze your robots.txt file to see if you’re blocking Googlebot from any URLs or directories on your site.
- Make sure that the URLs haven’t been blocked with meta tags.
- Use 301 redirects (“RedirectPermanent”) to smartly redirect users, Googlebot, and other spiders. (In Apache, you can do this with an .htaccess file; in IIS, you can do this through the administrative console.) For more information about 301 HTTP redirects, please see http://www.ietf.org/rfc/rfc2616.txt.
- Consider creating and submitting a Sitemap. Even if your site is already indexed, Sitemaps are a way to give Google information about your site and the URLs you consider most important. Sitemaps are particularly helpful if your site has dynamic content or other content not easily discoverable by Googlebot, or if your site is new or does not have many links to it.
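As a quick local complement to reviewing robots.txt in Search Console, the check can be approximated with Python's standard `urllib.robotparser` module. This is a rough sketch: the `ROBOTS_TXT` contents, the `is_allowed` helper, and the example.com URLs are hypothetical, and the standard-library parser may not match Googlebot's rule handling in every edge case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents used for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

for url in ("https://example.com/drafts/post", "https://example.com/about"):
    print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because the file contains a group addressed specifically to Googlebot, only that group's rules are applied to it here; the rules in the `*` group apply to crawlers that have no group of their own.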
Make sure Google can index your site
Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as title tags and alt attributes. Google can process many types of content. However, while we can process HTML, PDF, and Flash files, we have a more difficult time understanding (that is, crawling and indexing) other rich media formats, such as Silverlight.
- Check your site’s index stats.
Make sure your content is relevant and useful
- Understand how users are reaching your site by reviewing the Search queries page. The first column shows the Google searches in which your site most often appears. The page also lists the number of impressions, the number of clicks, and the CTR (click-through rate) for each query. This information is particularly useful because it gives you an insight into what users are searching for (the query), and the queries for which users often click on your site. For example, your site may often appear in Google searches for espresso gadgets and coffee widgets, but if your site has a low CTR for these queries, it could be because it’s not clear to users that your site contains information about those topics. In this case, consider revising your content to make it more compelling and relevant. Avoid keyword stuffing.
- Check the HTML Improvements page in Search Console. Descriptive information in title tags and meta descriptions will give us good information about the content of your site. In addition, this text can appear in search results pages, and useful, descriptive text is more likely to be clicked on by users.
- Natural links to your site develop as part of the dynamic nature of the web when other sites find your content valuable and think it would be helpful for their visitors.
- Check to see if any of your content has been flagged as adult content by turning off SafeSearch. Google’s SafeSearch filter eliminates sites that contain pornography and explicit sexual content from search results. While no filter is 100% accurate, SafeSearch uses advanced proprietary technology that checks keywords and phrases, URLs, and Open Directory categories.
- Great image content can be an excellent way to generate traffic. Create the best user experience you can, and follow our image guidelines.
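The click-through rate mentioned in the list above is the number of clicks divided by the number of impressions. The sketch below computes it for a few queries and flags the ones that fall under a threshold. All figures are made up for illustration, and the 2% cutoff and the `low_ctr_queries` helper are arbitrary assumptions rather than values or tools from Search Console.

```python
def click_through_rate(clicks, impressions):
    """CTR as a fraction between 0 and 1; 0.0 when there are no impressions."""
    return clicks / impressions if impressions else 0.0

def low_ctr_queries(rows, threshold=0.02):
    """Return queries whose CTR is below `threshold` (an assumed cutoff)."""
    return [
        query
        for query, impressions, clicks in rows
        if click_through_rate(clicks, impressions) < threshold
    ]

# Hypothetical (query, impressions, clicks) rows.
rows = [
    ("coffee widgets", 1200, 12),
    ("espresso gadgets", 800, 64),
]

for query, impressions, clicks in rows:
    print(query, f"{click_through_rate(clicks, impressions):.1%}")

flagged = low_ctr_queries(rows)
print(flagged)
```

Queries that appear often but are rarely clicked are the ones where revising titles, descriptions, or page content is most likely to help.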
There’s almost nothing a competitor can do to harm your ranking or have your site removed from our index. Occasionally, fluctuation in search results is the result of differences in our data centers. When you perform a Google search, your query is sent to a Google data center in order to retrieve search results. There are numerous data centers, and many factors (such as geographic location and search traffic) determine where a query is sent. Because not all of our data centers are updated simultaneously, it’s possible to see slightly different search results depending on which data center handles your query.
Create custom 404 pages
A 404 page is what a user sees when they try to reach a non-existent page on your site (because they’ve clicked on a broken link, the page has been deleted, or they’ve mistyped a URL).