Google-cache-only spam?

A helpful soul contacted me on IRC today: it turns out that the Google cache for my homepage shows linkspam.

There is no linkspam when simply viewing the page in a browser. There is also no linkspam if I wget the page using the Googlebot user agent.
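A minimal Python sketch of that user-agent comparison (the original check used wget). The helper names `build_request`, `fetch`, and `extra_links` are made up for illustration, and both user-agent strings are assumptions: the first is the one Googlebot commonly advertises, the second is an arbitrary browser-like string. This only tests user-agent-based cloaking; it cannot detect spam that is served based on the requester's IP address.

```python
import re
import urllib.request

# Assumed user-agent strings, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/115.0"

# Crude href extractor; good enough for spotting injected links.
HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def build_request(url, user_agent):
    """Build a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent, timeout=30):
    """Download the page body as text using the given user agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extra_links(normal_html, crawler_html):
    """Return links present in the crawler's copy but not in the normal copy."""
    normal = set(HREF_RE.findall(normal_html))
    crawler = set(HREF_RE.findall(crawler_html))
    return sorted(crawler - normal)
```

Calling `extra_links(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))` returns an empty list when the server serves the same links to both user agents.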

But I’m not the only one affected by this linkspam. Very similar linkspam shows up on simplepie.org and svg-whiz.com.

The common denominator seems to be that we all use DreamHost for web hosting. I submitted a ticket with DreamHost yesterday: their support staff responded that this must not be a DreamHost issue, and it must be a problem with Google. I am extremely disappointed with DreamHost right now.

Has anyone heard of similar linkspam that only shows up for the Google webcrawler, but not for normal visitors? What other ways could it detect the Googlebot, other than via the user agent?

12 Responses to “Google-cache-only spam?”

  1. Robert Accettura Says:

    My guess would be they aren’t using useragent but the IP block. There are several known IP blocks for Google. That way you can’t check as easily.

    I recall a similar hack being done before (either 2.3.x era or 2.5.0)… but a quick Google search turned up nothing. Looks like you’re running the latest version. Hardening your install is never a bad idea.

  2. Dan Says:

    Check the source of the Google cached page and see if you can apply CSS rules to hide the links. That won’t stop Google from following them, but it might annoy DreamHost, and they can’t do anything about it since they claimed they weren’t doing it.

  3. Dan Says:

    You could also inform Google about what’s happening and ask them to check it out (since THEY have access to that IP block mentioned above) and ask them to contact DreamHost or even to remove DreamHost hosted sites from their index until DreamHost fixes the “problem”. That last one probably won’t happen right off the bat since it’s a bit extreme but if it does I bet the linkspam will disappear fairly quickly.

  4. shadytrees Says:

    Is this related to the FTP passwords leak a while back where a bunch of index.php files got linkspam inserted? If nothing else, you can do a diff between an official WordPress release and your current installation of it.

  5. Philip Taylor Says:

    Yahoo and Live Search have the same problem, so it is not limited to Google.

    Following some of the links shows that very nearly all are on Dreamhost – http://philip.html5.org/misc/spammy-sites.txt

  6. Benjamin Smedberg Says:

    Thanks guys: it seems right now that the linkspam is being triggered by specific IP blocks belonging to Y! and Google.

    It does not appear as if the FTP password leak is directly related to the spam (i.e. I’m pretty sure my user files have not been modified).

  7. damjan Says:

    I don’t see any linkspam on those Google cache pages. Do you have a screenshot?

  8. Jesse Ruderman Says:

    The same thing happened to me during the second round of attacks associated with a DreamHost Panel security hole in May 2007. One of my WordPress files had been modified to include link spam, but only when visited from a Google IP address, and most of the spammy links went to other compromised sites that were also hosted by DreamHost. I noticed because Google delisted my site almost immediately. I found the changed files by searching my entire home directory for recently modified files.

    Be sure to change your site-related passwords (panel, ftp, blog, database).

    It’s pretty lame of support to blame Google when it’s much more likely that the DH-hosted sites were compromised. You might have better luck getting help in #dreamhost (irc.freenode.org).

  9. Ehsan Akhgari Says:

    And it seems like the spam content is unique!

    Check out http://www.google.com/search?q=%22That%27s+woundily+cross+training%22 or http://www.google.com/search?&q=%22was+as+numeral+as+only+a+embow+targum+be%22

  10. Robert Accettura Says:

    Just wanted to mention:
    http://www.google.com/webmasters/

    You should sign up if you haven’t, and if you have… login and see if there are any messages. See:
    http://www.mattcutts.com/blog/helping-hacked-sites/

    for details.

  11. Andy Says:

    Hi, I am having the same issue as the original post. I see that the cache of this site is now back to normal. Could you post what happened?

  12. Benjamin Smedberg Says:

    It was likely either a WordPress admin panel exploit or a DreamHost exploit. In either case, Jesse had it right: one of the WordPress files was altered so that link spam would only show up for Google’s webcrawler IP block.
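
For anyone in the same situation, the search for recently modified files that Jesse describes can be sketched in Python roughly like this. It is an illustration under assumptions, not the exact procedure anyone in this thread used: the function name `recently_modified` and the seven-day default window are hypothetical.

```python
import os
import time


def recently_modified(root, days=7):
    """List files under root modified within the last `days` days, newest first."""
    cutoff = time.time() - days * 86400
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so that a dangling symlink does not raise.
                mtime = os.lstat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if mtime >= cutoff:
                hits.append((mtime, path))
    return [path for _mtime, path in sorted(hits, reverse=True)]
```

For example, `recently_modified(os.path.expanduser("~"), days=14)` lists everything in the home directory touched in the last two weeks. Modification times can be forged by an attacker, so a clean result is not proof of a clean install; a diff against an official WordPress release, as shadytrees suggests, is the more thorough check.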
