Sunday, May 02, 2010

Google search tip: eliminating the Wikipedia-based splogs

Great tip in a rich but rambling article ...
Nerd Skill Number One (Dan's Data)

... suppose you're looking up some fairly obscure subject, and the "best" page Google finds for you is a small, badly-written Wikipedia article with no references. The PageRank-zero personal site with the answer to your question is out there somewhere, but it'll be pushed well off the first results page by umpteen copies of that Wikipedia article on podunk ad-farm "encyclopedia" sites that take advantage of - or completely ignore - Wikipedia's generous licensing terms.

To avoid seeing all those, you need only add -"[some string from the Wikipedia article]" to your Google search. Usually, it only takes one such minused phrase to clear sufficient of the copies that the page you really want will bubble up onto the first page of results.

This is connected to an interesting, and immensely useful, property of human language, which is that the combinatorial explosion of possible grammatical sentences (as opposed to random strings of words, or of letters) means that most sentences of only six words are likely to be unique...
There are more tips in the essay, but this one's a gem.
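
For anyone who wants to script the trick, here is a minimal Python sketch of it. The function names (`pick_phrase`, `build_query`, `search_url`) are made up for illustration, and the use of the `q` parameter on `https://www.google.com/search` is an assumption about how Google's search URL is commonly formed, not something the essay specifies. It simply takes a short run of words from the copied article and appends it as a minused, quoted phrase.

```python
from urllib.parse import urlencode


def pick_phrase(article_text, words=6):
    """Return the first `words` consecutive words of the text.

    Six words follows the essay's point that most six-word sentences
    are likely to be unique. (Hypothetical helper.)
    """
    tokens = article_text.split()
    return " ".join(tokens[:words])


def build_query(terms, excluded_phrases):
    """Append each phrase to the search terms as -"phrase"."""
    parts = [terms]
    for phrase in excluded_phrases:
        # Drop stray double quotes so the quoted phrase stays well-formed.
        cleaned = phrase.replace('"', "").strip()
        if cleaned:
            parts.append(f'-"{cleaned}"')
    return " ".join(parts)


def search_url(query):
    """Build a search URL for the query (assumed URL shape)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Made-up article text, purely for illustration.
    snippet = "The widget was first made in 1903 by an unknown tinkerer."
    query = build_query("obscure widget history", [pick_phrase(snippet)])
    print(query)
    print(search_url(query))
```

In practice you would paste a distinctive sentence from the Wikipedia article rather than blindly taking its first six words, since the opening words of an article are often the least unusual.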

It's a bit surprising that Google can't dump the Wikipedia splogs, though.
