Some things are better left unsaid: robots.txt

Robots Exclusion Protocol

Search engines send out robots (crawlers) to hunt for your website's pages and add them to their database for later indexing. Those robots also check the contents of your robots.txt file.

This file is placed in the root directory of the web site and is a plain text file, not an HTML file.

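The robots.txt file exists because of the Robots Exclusion Protocol. It is not a W3C standard: it began in 1994 as an informal convention among robot authors and was published as RFC 9309 in 2022. Its purpose is to let the webmaster keep robots away from content that should not be publicized, and to apply such rules to all robots or only to a few in particular. Compliance is voluntary, so the file asks robots to stay out rather than locking them out.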

The robot first requests http://www.BostonMediaDomain.com/robots.txt when it lands on your website, so that is where your robots.txt file must live: in the root.
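As a rough sketch of that lookup, assuming Python's standard urllib.parse module, the robots.txt address can be derived from any page URL by keeping the scheme and host and replacing everything after them. The function name robots_url and the sample page path are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host; drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page on the example site.
example = robots_url("http://www.BostonMediaDomain.com/comp/trouble.html")
print(example)  # http://www.BostonMediaDomain.com/robots.txt
```

Whatever page the robot starts from, the file it asks for is always the one in the root.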

The requests for this file that show up in the website logs and analytics statistics tell us how many times robots have visited the site and which robots they were.
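One way to get that count is to scan the access log for requests to /robots.txt and group them by user agent. The sketch below assumes the common Apache/Nginx "combined" log layout; the function name, the regular expression and the sample lines are made up for illustration and would need adjusting to your server's actual format.

```python
import re
from collections import Counter

# Assumed layout: "combined" log format, where the request line,
# referrer and user agent are each wrapped in double quotes.
ROBOTS_HIT = re.compile(
    r'"(?:GET|HEAD) /robots\.txt(?:\?\S*)? [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def count_robots_txt_hits(log_lines):
    """Count requests for /robots.txt, grouped by user-agent string."""
    hits = Counter()
    for line in log_lines:
        match = ROBOTS_HIT.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


# Made-up sample lines, for illustration only.
sample_log = [
    '192.0.2.1 - - [01/Jan/2024:00:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
    '192.0.2.2 - - [01/Jan/2024:00:00:05 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/Jan/2024:06:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
    '192.0.2.3 - - [01/Jan/2024:07:30:00 +0000] "GET /robots.txt HTTP/1.1" 200 68 "-" "bingbot/2.0"',
]
hits = count_robots_txt_hits(sample_log)
for agent, count in hits.most_common():
    print(agent, count)
```

Keep in mind that the user-agent string is supplied by the visitor, so the names in the log are only as trustworthy as the robots sending them.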

If you do not have a robots.txt file, the robot assumes that nothing is excluded and may crawl any page of the web site without exception.

Examples:

User-agent: *
Disallow:

The asterisk * means all robots. The empty Disallow value sets no rule, so this file does not restrict access to any page for any robot. It implies full access.

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /comp/

All robots are denied access to the /cgi-bin/, /tmp/ and /comp/ directories.

Note that you need a separate Disallow line for each directory.
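To check rules like these before publishing them, one option is Python's standard urllib.robotparser module, which reads robots.txt lines and answers whether a given robot may fetch a given URL. This is a minimal sketch: "AnyBot" and the two page paths are hypothetical names chosen for the test.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /comp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

SITE = "http://www.BostonMediaDomain.com"
# "AnyBot" is a hypothetical robot name; it falls under the * group.
tmp_ok = parser.can_fetch("AnyBot", SITE + "/tmp/cache.html")
index_ok = parser.can_fetch("AnyBot", SITE + "/index.html")
print(tmp_ok, index_ok)  # False True
```

Each Disallow line is matched as a prefix of the requested path, which is why one line per directory is enough to cover everything beneath it.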

User-agent: *
Disallow: /

No robot is permitted to access any directory on the web site.
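The difference between an empty Disallow and "Disallow: /" is a single character, so it is worth testing. The sketch below again assumes Python's urllib.robotparser; the helper can_crawl, the robot name "AnyBot" and the page path are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, path: str, agent: str = "AnyBot") -> bool:
    """Parse robots_txt and report whether agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


FULL_ACCESS = "User-agent: *\nDisallow:\n"
NO_ACCESS = "User-agent: *\nDisallow: /\n"

# Empty value: nothing is excluded.
open_site = can_crawl(FULL_ACCESS, "/any/page.html")
# "/" is a prefix of every path, so everything is excluded.
closed_site = can_crawl(NO_ACCESS, "/any/page.html")
print(open_site, closed_site)  # True False
```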

User-agent: Googlebot
Disallow: /

This excludes one particular robot: in this case Google’s robot has no access to any directory.

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /comp/trouble.html

Now Google has no access to any directory, while all other robots have unrestricted access to everything except the page /comp/trouble.html, which is restricted.

The important thing is to write each restriction with the full path to the file or directory, starting from the root.
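The last example, with one group for Googlebot and one for everyone else, can be checked the same way. This sketch assumes Python's standard urllib.robotparser; "OtherBot" is a hypothetical name standing in for any robot other than Google's, and /comp/other.html is an invented neighbouring page.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /comp/trouble.html
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot matches its own group and is shut out everywhere.
google_home = parser.can_fetch("Googlebot", "/index.html")
# Any other robot falls back to the * group, which blocks one page only.
other_home = parser.can_fetch("OtherBot", "/index.html")
other_trouble = parser.can_fetch("OtherBot", "/comp/trouble.html")
other_sibling = parser.can_fetch("OtherBot", "/comp/other.html")
print(google_home, other_home, other_trouble, other_sibling)
# False True False True
```

Because the rule names the full path of the page, the rest of the /comp/ directory stays open to the other robots.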

You can also restrict access to a specific page from within the page itself, using the robots META tag and its CONTENT attribute.

Do not abuse the restrictions. Remember that more indexed pages usually means better promotion for your site and better positions in the SERPs.
