Spamdexing

Adapted from an article on Wikipedia (http://www.wikipedia.org)

Spamdexing is computer jargon, a portmanteau of “spamming” and “indexing”. Also called search engine spamming, it refers to the practice on the World Wide Web of deliberately modifying HTML pages to increase their chance of being placed high on search engine relevancy lists. People who do this are called search engine spammers.

Search engines use a variety of algorithms to determine relevancy ranking. Some check whether the search term appears in the meta keywords; others, whether it appears in the body text of a web page. A variety of techniques are used to spamdex, including listing chosen keywords on a page in a tiny font in the same colour as the page background (rendering them invisible to humans but not to search engine web crawlers).
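As an illustration of the invisible-text trick just described, the sketch below scans raw HTML for `<font>` spans whose colour matches the page’s declared background colour. It is a deliberately simplified, hypothetical check: real crawlers also resolve external CSS, computed styles, off-screen positioning, and much more.

```python
import re

def find_hidden_text(html: str) -> list[str]:
    """Flag text styled in the same colour as the page background.

    A simplified, illustrative check only: it looks at the body's inline
    bgcolor attribute and old-style <font color=...> tags, not full CSS.
    """
    # Grab the body background colour, if declared inline; assume white otherwise.
    m = re.search(r'<body[^>]*bgcolor=["\']?(#?\w+)', html, re.I)
    bg = m.group(1).lower() if m else "#ffffff"
    hidden = []
    # Collect <font color=...> spans whose colour matches the background.
    for colour, text in re.findall(
            r'<font[^>]*color=["\']?(#?\w+)["\']?[^>]*>(.*?)</font>',
            html, re.I | re.S):
        if colour.lower() == bg:
            hidden.append(text.strip())
    return hidden
```

Running it on a page with white-on-white keywords would return just those invisible spans, leaving normally styled content alone.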

Search engine spammers generally know that the content they promote is of little use or relevance to the ordinary internet surfer, so they resort to deceptive methods to make their websites appear above more relevant sites in the search engine listings.

Here are some common spamdexing techniques:

  • Hidden or invisible text
    • Disguising keywords and phrases by making them the same colour as the background, using a tiny font size, or hiding them within the HTML code, such as in noframes sections, alt attributes and noscript sections.
  • Keyword stuffing (also known as keyword spamming)
    • Repeated use of a word to increase its frequency on a page. Most search engines have the ability to analyze a page and determine whether the frequency is above a “normal” level.
  • Meta tag stuffing
    • Repeating keywords in the meta tags more than once, and using keywords that are unrelated to the site’s content.
  • Hidden links
    • Putting links where visitors will not see them in order to increase link popularity.
  • Mirror websites
    • Hosting multiple websites, all with the same content but using different URLs.
  • Gateway or doorway pages
    • Creating low-quality web pages that contain very little content but are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results. A doorway page will generally have “click here to enter” in the middle of it.
  • Page redirects
    • Taking the user to another page without his or her intervention, e.g. using META refresh tags, CGI scripts, Java, JavaScript, or server-side techniques.
  • Cloaking
    • Sending to a search engine a version of a web page different from what web surfers see.
  • Code swapping
    • Optimizing a page for top ranking, then swapping another page in its place once a top ranking is achieved.
  • Link spamming
    • Link spam takes advantage of Google’s PageRank algorithm, which gives a higher ranking to a website the more other websites link to it. A spammer may create multiple web sites at different domain names that all link to each other. Another technique is to take advantage of web applications such as weblogs and wikis that display hyperlinks submitted by anonymous or pseudonymous users.
  • Referrer log spamming
    • When someone reaches a web page (the referee) by following a link from another page (the referrer), the visitor’s browser passes the referrer’s address to the referee’s server. Some websites keep a referrer log showing which pages link to them. By having a robot access many sites repeatedly, with a chosen message or address supplied as the referrer, a spammer can make that message or address appear in the referrer logs of every site that publishes such a log. Since some search engines judge a site’s importance by the number of different sites linking to it, referrer log spam can be used to inflate the search engine rankings of the spammer’s sites.
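The keyword-stuffing check mentioned above, where a search engine decides whether a word’s frequency is above a “normal” level, can be sketched as follows. The 10% threshold is an illustrative assumption, not any real engine’s value, and actual detectors also weight stopwords, phrases and document length.

```python
from collections import Counter

def is_keyword_stuffed(text: str, threshold: float = 0.10) -> bool:
    """Flag a page whose most frequent word exceeds a share of all words.

    The 10% default threshold is an arbitrary, illustrative value.
    """
    words = [w.lower() for w in text.split() if w.isalpha()]
    if not words:
        return False
    # Share of the page taken up by the single most frequent word.
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold
```

A stuffed page where one keyword makes up a third of the text trips the check; ordinary prose with a flat word distribution does not.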
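Cloaking, listed above, amounts to branching on the requesting client’s identity. The sketch below is a minimal, hypothetical server-side handler keyed off the User-Agent header; the crawler signature list and page bodies are invented for illustration (and, as the article notes, engines penalise this when they detect it).

```python
# Illustrative crawler substrings only; real spammers also match by IP range.
CRAWLER_SIGNATURES = ("googlebot", "bingbot", "slurp")

def serve_page(user_agent: str) -> str:
    """Cloaking sketch: send a keyword-stuffed page to crawlers and the
    ordinary page to everyone else."""
    if any(sig in user_agent.lower() for sig in CRAWLER_SIGNATURES):
        # The version only search engine crawlers ever see.
        return "<html>keyword keyword keyword</html>"
    # The version human visitors actually see.
    return "<html>Welcome to our online shop.</html>"
```

The search engine indexes one document while every surfer who clicks through receives a different one, which is exactly the deceit the technique relies on.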
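The link-spamming entry above can be made concrete with a toy PageRank computation. The power iteration below is a standard simplified form of the algorithm, and the five-page link farm is entirely hypothetical; it shows how a cluster of mutually linking spam domains can lift a target page above one with a single honest inbound link.

```python
def pagerank(links, damping=0.85, iters=50):
    """Minimal PageRank power iteration over a dict {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if not outs:  # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

# A hypothetical link farm: three spam domains link to each other and all
# point at the page the spammer wants to promote; one honest site also
# links to the target for comparison.
farm = {
    "spam1": ["spam2", "target"],
    "spam2": ["spam3", "target"],
    "spam3": ["spam1", "target"],
    "target": [],
    "honest": ["target"],
}
ranks = pagerank(farm)
```

Because each fabricated domain funnels rank into the target, the target ends up scoring far above the honest page, which receives no inbound links at all, which is precisely the effect link spam exploits.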

Spamdexing often gets confused with legitimate search engine optimization (SEO) techniques, which do not involve deceit.

Spamming involves getting websites more exposure than they deserve for their keywords, leading to unsatisfactory search results; optimization involves getting websites the rank they deserve for their most targeted keywords, leading to satisfactory search experiences. To be sure, there is much gray area between the two extremes. The root problem is that search engine administrators and website builders have different agendas: the search engine wants to present valuable search results, while the webmaster just wants to come up first, particularly if he or she runs a commercial website and needs visitor traffic from search engines and directories. For that reason, many search engine administrators say that any form of search engine optimization used to improve a website’s page rank is nothing other than spamdexing.

Many search engines check for instances of spamdexing and will remove suspect pages from their indexes.

In 2002, search engine manipulator SearchKing filed suit in the US District Court for the Western District of Oklahoma against the search engine Google (case No. 02-CV-1457). SearchKing claimed that Google’s tactics to prevent spamdexing constituted an unfair business practice. This may be compared to lawsuits that email spammers have filed against spam-fighters, as in various cases against MAPS and other DNSBLs. In January 2003, the court granted summary judgment in Google’s favor.