Today’s post could really be called Duplicate Content in Joomla Part 2. Yesterday, I blogged about duplicate content in Joomla and the damage it can do to your SEO efforts. Soon afterwards, I read a post on Aaron Wall’s blog about how he helped another blogger increase his Google traffic by 1400% in a month by reducing the number of pages he had indexed, and I realised a follow-up was needed.
What’s the Problem?
As this post on the Joomla forums shows, some components can produce thousands of pages that are useless but still get indexed by Google. This particular user installed the Events Calendar component and ended up with empty pages indexed all the way to the year 3200. He also ended up with a heavy penalty from Google.
Google is in the process of prioritizing the way it crawls pages. If you have a lot of junk, Googlebot will crawl your site less frequently. The days when bigger sites were always better sites are gone. Each site only has a certain amount of PageRank and link authority to spend, and every indexed page takes a share of it. You need to spend it wisely on pages that matter. If your excess content gets out of hand, you’ll be hit by a rankings penalty.
The following is an incomplete list of components that can create empty and/or duplicate content:
These are not bad components. Use them, but handle with care.
What’s the Solution?
Four things you can do to avoid SEO problems with components on your site:
- Carefully and regularly monitor the pages you have indexed in Google. Check for any components that have too many pages indexed.
- Use your robots.txt to stop Google indexing components that might cause trouble. For example, we use the Amazon Products Feed Bridge on many sites. It’s useful for visitors, but because Amazon products appear on so many other sites, it’s useless for search engines. So, I simply open up the robots.txt file and stop Google from indexing the component by adding: Disallow: /option,com_apf_bridge/
- Check your components carefully when you set them up. If any component produces many pages without any effort on your part, change the settings to minimize those pages, or use robots.txt.
- Turn off your RSS feeds unless you really believe they’re going to be useful. Don’t have RSS published just because it’s “Web 2.0” and it’s cool. If you decide you do need RSS in place, use robots.txt to stop the search engines from indexing the feeds.
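If you want to double-check a robots.txt rule before relying on it, here is a minimal sketch in Python using the standard library’s urllib.robotparser. It assumes the Disallow line from the Amazon example sits under a "User-agent: *" group (that line is my assumption, since a Disallow rule needs a user-agent group to apply to), and the helper name is_blocked and the sample paths are made up for illustration. Python’s parser uses simple prefix matching, so treat it as a rough check rather than an exact copy of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# The Disallow rule comes from the Amazon Products Feed Bridge example;
# the "User-agent: *" line is an assumption added so the rule has a group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /option,com_apf_bridge/
"""


def is_blocked(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `path` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Hypothetical URLs, only here to exercise the rule.
    for path in ("/option,com_apf_bridge/item,1/", "/index.php"):
        state = "blocked" if is_blocked(ROBOTS_TXT, path) else "allowed"
        print(path, state)
```

Paste in your own robots.txt text and a handful of URLs from the component you are worried about, and you can see at a glance whether the rule covers them.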