Back in January I mentioned that I’d met a puzzle I couldn’t solve … Google was indexing a lot of search results on Joomla sites.
The problem appeared on all kinds of Joomla sites and with all kinds of URL extensions. I just couldn’t work out where the bug was in Joomla.
It turns out the bug was in Google.
The Google Bug
Google is trying a new crawling method … automatically filling in forms such as search boxes to try to find new URLs. Matt Cutts discusses it here. Unfortunately, this creates new pages as well as finding existing ones, and in Joomla the main outcome is that random search result pages get indexed.
Example of the Problem URLs
- Default Joomla URLs: /index.php?option=com_search&searchword=stuff
- Default SEF URLs: /component/option,com_search/Itemid,38/index.php?searchword=stuff
- sh404SEF: /search/newest-first.html?searchphrase=any&searchword=stuff
Solution
Add the search component to your robots.txt file. With the examples above, you would use this code:
- Default Joomla URLs: Disallow: /*com_search (no trailing slash, because nothing in this URL follows com_search with a slash)
- Default SEF URLs: Disallow: /*com_search*/
- sh404SEF: Disallow: /search/
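
If you want to sanity-check rules like these before uploading your robots.txt, here is a rough Python sketch. It assumes the simplified wildcard matching that Google documents for robots.txt (a rule is matched from the start of the path, `*` stands for any run of characters, and a trailing `$` anchors the end) and ignores everything else, such as Allow rules and user-agent groups. The function names and the last sample URL are made up for illustration. The sketch writes the com_search rule as `/*com_search`, since a pattern ending in `*/` only matches when a slash appears somewhere after `com_search`.

```python
import re


def rule_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    Simplified assumption: '*' matches any run of characters and a
    trailing '$' anchors the end of the URL; all else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(url_path, disallow_rules):
    """Return True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules)


# Disallow patterns for the three URL styles discussed above.
RULES = ["/*com_search", "/search/"]

# The three example search URLs from this post.
SEARCH_URLS = [
    "/index.php?option=com_search&searchword=stuff",
    "/component/option,com_search/Itemid,38/index.php?searchword=stuff",
    "/search/newest-first.html?searchphrase=any&searchword=stuff",
]

# Hypothetical ordinary article URL that should stay crawlable.
ARTICLE_URL = "/index.php?option=com_content&view=article&id=1"

if __name__ == "__main__":
    for url in SEARCH_URLS + [ARTICLE_URL]:
        print("blocked" if is_blocked(url, RULES) else "allowed", url)
```

This only checks the pattern logic. To confirm how Google itself treats your live file, a robots.txt testing tool such as the one in Google's webmaster tools is the safer check.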