
The Importance of the Robots.txt file

Aug 5, 2008
Despite the importance of the robots.txt file in getting your website indexed by the major search engines, many webmasters don't offer one on their site. What is the robots.txt file, you ask? If you don't know, you are far from alone. The robots.txt file is a simple text file (no HTML) that is placed in your website's root directory in order to tell the search engines which pages to index and which to skip.

When a search engine sends its webcrawler to your site, one of the first things the webcrawler will do is look in the root directory for the robots.txt file. A correctly formatted robots.txt file consists of one or more records, each providing instructions for a particular search-bot. A record generally consists of two components. The first is the "User-agent" line, where the name of the search-bot is listed. The second consists of one or more "Disallow" lines. These lines tell the webcrawler which files or folders should not be indexed (e.g. a cgi-bin folder).
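To make the record structure concrete, here is a minimal Python sketch that splits the text of a robots.txt file into records. The function name `parse_records`, the sample text, and the bot name `ExampleBot` are made up for illustration, and a real webcrawler may handle edge cases differently.

```python
def parse_records(text):
    """Split robots.txt text into records of user-agents and disallowed paths."""
    records = []
    current = None
    for raw_line in text.splitlines():
        if not raw_line.strip():
            current = None  # a blank line ends the current record
            continue
        # Defensive: ignore anything after a "#" character.
        line = raw_line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            # Start a new record unless we are still listing user-agents.
            if current is None or current["disallow"]:
                current = {"user_agents": [], "disallow": []}
                records.append(current)
            current["user_agents"].append(value)
        elif field == "disallow" and current is not None:
            current["disallow"].append(value)
    return records


# Hypothetical sample file with two records.
SAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/

User-agent: ExampleBot
Disallow: /
"""

records = parse_records(SAMPLE)
for record in records:
    print(record["user_agents"], record["disallow"])
```

Each record comes back as one user-agent list paired with its disallowed paths, which mirrors the two components described above.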

If you currently have a website and do not have a robots.txt file, you can create one easily. As mentioned earlier, the files are plain text, so just open up Notepad and save the file as robots.txt. Most webmasters can use one record that will apply to all of the search engine crawlers. Once you have opened Notepad, enter the following:

User-agent: *
Disallow:

The "*" applies this rule to all bots. In this example, there is nothing listed in the Disallow line. This tells the robot to index the entire site. You can also enter a folder path here, such as "/private", if there is a folder that shouldn't be indexed. This can be very useful if you are still testing a portion of your website or if a section is still under construction.
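If you want to check how a record like this is read, Python's standard library includes a robots.txt parser. The sketch below compares an empty Disallow line with one that lists "/private". The helper name `build_parser` and the www.example.com addresses are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Return a parser loaded with the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# An empty Disallow line leaves the whole site open to every bot.
open_site = build_parser("User-agent: *\nDisallow:\n")

# Listing a folder path blocks that folder and everything under it.
private_site = build_parser("User-agent: *\nDisallow: /private\n")

page = "http://www.example.com/private/page.html"
print(open_site.can_fetch("*", page))     # True
print(private_site.can_fetch("*", page))  # False
print(private_site.can_fetch("*", "http://www.example.com/index.html"))  # True
```

This only shows how Python's own parser interprets the file. Each search engine uses its own parser, so treat the result as a sanity check and not a guarantee.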

Now that you know what should go into your robots.txt file, here are several common mistakes people make when creating these files. Do not enter free-form notes into the file, as they can confuse the webcrawler; a comment is only safe when it begins with a "#" character. Also, the format should always be the User-agent line first, followed by the Disallow line(s). Do not reverse the order. Another common mistake involves using the incorrect case. If the disallowed folder is /private, make sure your robots.txt file does not list the folder as /Private. It seems like a very minor issue, but paths are case-sensitive, so the rule will not match the folder. Finally, the original robots.txt standard has no Allow command: you cannot tell the webcrawler what to look at, only what not to look at. Some major crawlers, such as Googlebot, do recognize an Allow line, but you should not assume every bot does.
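The case mistake is easy to demonstrate with the parser in Python's standard library. In this sketch the real folder is assumed to be /private, but the file lists it as /Private. The www.example.com addresses are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The rule is typed with a capital "P", but the real folder is /private.
rules = [
    "User-agent: *",
    "Disallow: /Private",
]

parser = RobotFileParser()
parser.parse(rules)

# The real folder is NOT blocked, because the case does not match.
lowercase_allowed = parser.can_fetch(
    "*", "http://www.example.com/private/page.html"
)
# Only the capitalized spelling is blocked.
capitalized_allowed = parser.can_fetch(
    "*", "http://www.example.com/Private/page.html"
)

print(lowercase_allowed)    # True
print(capitalized_allowed)  # False
```

With this parser, the folder you meant to hide stays open to crawlers, which is why the spelling in the file has to match the folder name exactly.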

If you are still curious about the robots.txt file, you can find many more complex examples online. Just pick one of your favorite websites and look for its robots.txt file. For example, you can go to http://www.cnn.com/robots.txt. If you need help creating a robots.txt file for your site, there are plenty of places online that will create the file for you for free. One example is http://www.seochat.com/seo-tools/robots-generator/. Despite its apparent simplicity, this file can make or break your site's chances with the search engines. Make sure you have your robots.txt file in place and correctly formatted today.
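Because the file always sits in the root directory, you can work out where to look for any site's robots.txt from any page address on that site. Here is a small Python sketch of that idea. The function name `robots_url` and the page path are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the address of robots.txt for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.cnn.com/world/index.html"))
# http://www.cnn.com/robots.txt
```

You can paste the resulting address into your browser to read the file.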
About the Author
Justin Scarborough founded Profit Program Reviews in order
to help others interested in affiliate marketing sort out the valuable
information from the many scams out there. He also runs a webmasters website
directory at www.thetopweblist.com.