Stop or Prevent Googlebot from Crawling Attachment Image Files in WordPress Using Robots.txt, a Plugin or Code
There are several ways to keep search engines away from WordPress image attachment pages. For all its great features, WordPress has some flaws in how it generates the different pages of your site, and a webmaster should be very careful in dealing with the unnecessary machine-generated pages on his or her blog.
WordPress Image Attachment Pages
Whenever you write a new blog post and attach an image within the WordPress post editor, the image is uploaded to the default image directory of your WordPress installation. Let’s say you have installed WordPress in a custom directory of your website, e.g. www.domain.com/files/, but serve the site from the web address www.domain.com.
When you upload a new image to one of your blog posts, the image will be stored under www.domain.com/files/wp-content/uploads by default. You can, however, define a custom directory or subdomain for images, upload the images over FTP, and insert them using “Add Image from URL”. But if you are using the first option (i.e. uploading images from the WordPress post editor), it might actually hurt your site’s visibility in the eyes of the search engines.
This is because whenever you upload an image from the WYSIWYG post editor, WordPress creates a separate image attachment page. The image attachment page will hold only the image of the post with practically no content in it.
Let’s say one of your blog posts has an image with the file name welcome.png. When you publish that article, the URL of your blog post will be http://domain.com/hello-world/ (the exact form depends upon the permalink structure of your site).
Additionally, a new page will be created which contains only the image, e.g. http://domain.com/hello-world/welcome/. The following demonstration will make the idea clear.
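As a rough illustration of how the two URLs relate, here is a minimal Python sketch. It assumes pretty permalinks are enabled and that the attachment slug is simply the lowercased file name without its extension; the function name attachment_page_url is made up for this example and is not part of WordPress.

```python
from pathlib import PurePosixPath


def attachment_page_url(post_url: str, image_filename: str) -> str:
    """Guess the attachment page URL for an image uploaded to a post.

    Assumption: the slug is the lowercased file name without extension,
    appended to the parent post's permalink.
    """
    slug = PurePosixPath(image_filename).stem.lower()
    return post_url.rstrip("/") + "/" + slug + "/"


post_url = "http://domain.com/hello-world/"
print("Post page:      ", post_url)
print("Attachment page:", attachment_page_url(post_url, "welcome.png"))
```

With the example post above, the second line printed is http://domain.com/hello-world/welcome/, the page that holds nothing but the image.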
Now here are some problems with WordPress image attachment pages.
1. If you add 5 images to one blog post, WordPress will create 6 URLs after you hit the “Publish” button. One URL will be that of the original article, while the 5 other URLs will be generated for the images contained in the blog post.
2. If you check the source code of WordPress image attachment pages, you will find that none of them contains the “noindex” meta tag. This means search bots can discover and index those useless image attachment pages.
3. SEO plugins like All In One SEO Pack can write the canonical URL of a page into its source, but surprisingly, the canonical URL generated for each image attachment page is not the URL of your blog post. It is the URL of the image attachment page itself, as shown in the following screenshot.
Disallow Google from Crawling Using Robots.txt
The WordPress robots.txt file gives search engine “robots” instructions to follow when they crawl your WordPress blog. These instructions tell search engines not to crawl non-relevant files, folders, images and duplicate content. Excluding non-relevant paths such as “/wp-admin/”, “/wp-content/” and “/wp-includes/” will save bandwidth and speed up crawling when search engines access your site.
Just use the rules below:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /cgi-bin/
Add or change the directories as you need.
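Before uploading the file, it can help to test which paths the rules actually block. The sketch below uses Python's standard urllib.robotparser for that. It assumes the rules sit under a "User-agent: *" line, and the sample paths are taken from the example URLs used earlier in this post.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the three Disallow rules are placed under "User-agent: *".
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /cgi-bin/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above let user_agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


for path in (
    "/wp-content/uploads/welcome.png",  # the uploaded image file
    "/hello-world/",                    # the blog post
    "/hello-world/welcome/",            # the image attachment page
):
    print(path, "->", "allowed" if is_crawlable(path) else "blocked")
```

Note what this shows: the rules block the image file under /wp-content/, but the attachment page at /hello-world/welcome/ stays crawlable, so robots.txt alone does not take care of the attachment pages.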
How The Image Attachment Page Of WordPress Might Hurt Your Site
Google has recently tightened its grip on spam sites that plagiarize content from genuine sources or do not produce original content of their own. The current buzzword is “content farm”, and if your site has a lot of unnecessary pages with practically no content in them, your site might be accidentally sending “content farm signals” to Googlebot.
Furthermore, linking to images from the actual post allows Googlebot to crawl those attachment pages and dilutes the Google juice flowing through the actual content pages. If you populate Google’s web index with “content-less” pages, your site may get flagged as a content mill.
Recently, while checking through the error reports of Google Webmaster Tools, I found that a lot of these image attachment pages were indexed. Checking through the source code of these image attachment pages, I found that they are not noindexed.
This is a serious SEO blunder!
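If you would rather not read page source by hand, the same check can be scripted. The following is only a sketch using Python's built-in html.parser: it looks for a robots meta tag and a canonical link in a page's HTML. The sample markup is invented to resemble an attachment page and is not copied from a real WordPress theme.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Collect the robots meta tag and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            self.robots = attributes.get("content", "").lower()
        elif tag == "link" and attributes.get("rel", "").lower() == "canonical":
            self.canonical = attributes.get("href")


def audit(html: str) -> dict:
    """Report whether the page is noindexed and what its canonical URL is."""
    parser = HeadTagAudit()
    parser.feed(html)
    directives = [d.strip() for d in (parser.robots or "").split(",")]
    return {"noindex": "noindex" in directives, "canonical": parser.canonical}


# Hypothetical attachment page markup, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>welcome</title>
<link rel="canonical" href="http://domain.com/hello-world/welcome/" />
</head><body>
<img src="http://domain.com/files/wp-content/uploads/welcome.png">
</body></html>"""

print(audit(SAMPLE_PAGE))
```

For the sample page the report says noindex is False and the canonical URL points at the attachment page itself, which is exactly the combination described above.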
How To Prevent Indexing of WordPress Image Attachment Pages
There are two ways to prevent the indexing of useless image attachment pages generated by WordPress.
The first is to add a “noindex,follow” meta tag to all image attachment pages. You will need to edit the attachment.php or single.php file of your theme and add the meta tag manually.
A much simpler option is to use the Robots Meta WordPress plugin. This is by far one of the best WordPress plugins to noindex the unnecessary pages of your site, e.g. tag, category, date and author pages. Once you have installed the plugin, open the plugin options and you will see an entry named “Redirect attachment URLs to parent Post URL”.
Once you have enabled that option, all the image attachment pages of WordPress will redirect to their original post pages. The redirect in place will be a 301 permanent redirect, which tells Googlebot “This page has permanently moved to this new location”.
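A quick way to confirm the redirect is to request an attachment URL without following redirects and look at the status code and Location header. Here is a minimal sketch with Python's standard http.client. The helper names are made up for this example, and the URLs you pass in are assumed to be your own attachment and post URLs.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def check_redirect(url: str, timeout: float = 10.0):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def is_permanent_redirect_to(status, location, source_url, expected_target):
    """True if the response is a 301 whose Location resolves to the post URL."""
    if status != 301 or not location:
        return False
    # Resolve relative Location values against the requested URL.
    return urljoin(source_url, location) == expected_target


def redirects_to_parent(attachment_url: str, post_url: str) -> bool:
    """Check that an attachment page 301-redirects to its parent post."""
    status, location = check_redirect(attachment_url)
    return is_permanent_redirect_to(status, location, attachment_url, post_url)
```

Calling redirects_to_parent("http://domain.com/hello-world/welcome/", "http://domain.com/hello-world/") should return True once the plugin option is enabled. A 302 or a 200 response returns False.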
Once you are sure that all the attachment pages are properly redirecting to their individual post URLs, log in to Google Webmaster Tools and use the URL removal tool to remove all the previously crawled image attachment URLs of your site.
Redirecting useless pages to their actual content with a 301 is a good practice: it helps the search engines discover the actual content on your site without letting them wander across an enormous cloud of useless, junk pages.
Hope you are happy after reading this.