How to Create a Robots.txt file for your website including WordPress sites

Have you ever wondered why certain pages or content from your website are appearing in search results? Or perhaps you are having problems being seen by Google, Yahoo and friends? Well, it could be down to the way your website is configured. One of the first things you should check is the robots.txt file; not only is it the easiest place to look, it is also one of the quickest.

What is a robots.txt file?

In simple terms, a robots.txt file is a plain text file (created using Notepad, WordPad or similar) that usually contains a few lines of directives instructing spiders or crawlers (Google, Yahoo etc.) what they may add to their index when they come to visit your site. It is normally one of the first things a search engine requests before it starts crawling your pages. Please note that robots.txt is only a guide for search engine providers; not all crawlers will follow your rules, but the most common providers will.
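Crawlers will only look for this file in the root of your domain, so it needs to sit alongside your homepage. As a minimal sketch, assuming your site lives at www.example.co.uk (a placeholder domain), the file would be reachable at http://www.example.co.uk/robots.txt and could contain as little as:

User-agent: *
Disallow:

An empty Disallow value like this tells every crawler that nothing on the site is off limits.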

Why would I want to use a robots.txt?

In today’s age it’s almost criminal not to have a robots.txt file. It is a simple but powerful addition: it can point spiders towards the areas of your website that matter the most to improve rankings, it can tell search engines to stay away from sensitive areas that you don’t want released publicly, and it is an easy way of telling search engines where your sitemaps are located, which is a big factor in search engine optimisation. After all, SEO is primarily about helping search engines index your website correctly and efficiently, and helping your visitors get the information they need.

How to create a robots.txt file – for WordPress sites

Below is a quick guide and a template on how to create a robots.txt file for WordPress-based websites. Please use this as a guide only, as some WordPress site structures may differ depending on how you or your web development personnel set up your website. If you have any questions or require help, please don’t hesitate to ask me in the comments below.

User-agent: *
Allow: /
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Disallow: /trackback/
Disallow: /xmlrpc.php
Allow: /wp-content/uploads/

#Sitemap
Sitemap: http://www.mywebsiteurl.co.uk/path-to-sitemap.xml

Replace the sitemap URL with your own.

A brief explanation of the above robots.txt code

User-Agent:

This needs to be declared so you can tell all, or only specific, search engines which areas of your website to crawl. By using the ‘*’ character you’re telling all search engines (bots, crawlers etc.) to follow the commands that you list underneath. You can also target specific crawlers such as Google Images, Yahoo or Bing; the list is endless depending on your needs. I may write a more in-depth article on why you would want to do this at a later stage.
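As a quick illustration of targeting one crawler by name, here is a sketch that keeps Google’s image crawler (its user-agent token is Googlebot-Image) out of a hypothetical folder while leaving every other bot free to crawl everything:

User-agent: Googlebot-Image
Disallow: /private-photos/

User-agent: *
Allow: /

The folder name is just a placeholder; swap it for whatever you actually want to keep out of image search.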

Allow:

As it says on the ‘tin’: you are telling the bots named in the ‘User-agent’ line which areas of your website they are allowed to access. The simplest approach is to allow access to all of your website’s files and folders, as in the example above, with ‘/’ (a forward slash). You can then hide certain directories with ‘Disallow’ rules, which you declare separately. To allow multiple directories or files, declare each one on a new line, as shown above. In most cases the single statement “Allow: /” is enough, and you then move on to the next stage, “Disallow”.
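‘Allow’ really earns its keep when you want to open up a subfolder inside a directory you have otherwise blocked, which is exactly what the WordPress template above does for uploads. A short sketch of that pattern:

Disallow: /wp-content/
Allow: /wp-content/uploads/

Major crawlers such as Google apply the most specific matching rule, so your uploaded images remain crawlable even though the rest of /wp-content/ is blocked.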

Disallow:

Disallow blocks access to certain directories or files; again, declare each one on a new line. This is probably the most important area for SEO: by default, if a robots.txt file doesn’t exist, crawlers will scan the whole site regardless. With the Disallow command you can keep files and folders such as login areas, admin areas and sensitive pages like checkouts out of the SERPs (Search Engine Results Pages).
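As a sketch of blocking the sort of areas mentioned above (the paths are placeholders, so swap them for the ones your site actually uses):

User-agent: *
Disallow: /login/
Disallow: /admin/
Disallow: /checkout/

Bear in mind that Disallow is only a request to well-behaved crawlers, not a security measure, so sensitive areas still need proper protection of their own.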

Create a generic robots.txt file for standard websites

What I mean by standard websites is websites that aren’t built on a CMS or other custom structure. As a web designer or developer, you should know which files or folders to disallow access to within your robots.txt file.

User-agent: *
Allow: /

#Sitemap
Sitemap: http://www.mywebsiteurl.co.uk/path-to-sitemap.xml

Don’t forget to swap out ‘www.mywebsiteurl.co.uk/path-to-sitemap.xml’ for the path to your own sitemap.
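If your standard site does have areas you would rather keep out of the SERPs, simply add Disallow lines to the same file. A sketch with hypothetical folder names and a placeholder domain:

User-agent: *
Allow: /
Disallow: /private/
Disallow: /checkout/

#Sitemap
Sitemap: http://www.example.co.uk/sitemap.xml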

If you have any suggestions, questions or need help with how to create a robots.txt file I’ll be happy to help, just reply in the comments below.