This is a republication of an article I wrote in early 2008, touching lightly upon the topic of Google queries and the mighty search giant's history. I first published the text on my then pet project site, pworks (which, by the way, now redirects to tenzui), and from there it got mirrored on a couple of technology-focused sites and forums. Since writing this, both I and the world of search have progressed greatly, and I hope to be able to write a follow-up, more in-depth post on the topic of searching in the future. So if you're interested in that, stay connected. ;)

Short History
Right, this will be a really, really short history lesson. If you're interested, check out for yourself what the people over there have written. (Link at page bottom)
So, Google was created by the duo Larry Page and Sergey Brin, two Stanford grad students who, although they didn't see eye-to-eye on many topics, were determined to crack the quite boring nut of organizing all that information that was spread out on the web. By 1997, their BackRub search engine had started gaining a sparkling reputation for its unique way of analyzing and ranking webpages through "back links", links pointing to a page from other pages. The system also gained attention for its interesting server environment: in contrast to the "normal" high-end servers, BackRub ran on a collection of simpler PCs, gathered from the campus's nooks and crannies.
From there, the story is one of unfathomable success ("Instead of discussing all the details, why don't I just write you a check?"), leading to the status of The One Search Engine we all know, love and envy.
"Back links?" you think. Yeah, Google's way of deciding which pages are worth your reading cycles differed from that of every other search engine at the time. The PageRank algorithm gives every site a rank between 0 and 10, based on how many other pages link to the site and how highly ranked those linking pages are themselves. If you are interested in the mathematics behind the PageRank algorithm, I suggest you read about it on Wikipedia. The logic behind PR is beyond the scope of this article.
From this information, you can probably figure out the basics of SEO, Search Engine Optimization: get your page linked to by the big boys. Of course, some people just can't be content with playing by the rules, and the PR algorithm isn't perfect, so from time to time someone manages to fool it. One example is the 302 Google Jack, which redirects a new, zero-ranked page to a rank-ten page, like Google itself. When Google updates the PageRanks, the new page gets the same rank as the page it redirects to. Other people buy and sell high-valued links, really a kind of advertising, but with a big debate buzzing in the background. Google has requested that such links carry the HTML attribute rel="nofollow", so that the link is ignored when the linked page is re-ranked.
The above-mentioned kinds of tricks, as well as many others, can of course get your page devalued, meaning that it will not be ranked at all. Play it safe!
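For the curious, the back-link idea can be made concrete with a few lines of Python. This is only a toy sketch of the textbook power-iteration form of PageRank, not Google's actual implementation: the function name, the four-page example web and the damping factor of 0.85 are assumptions made for illustration, and the scores it produces are raw probabilities that sum to 1 rather than the 0 to 10 scale mentioned above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; the scores sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with the same score for every page.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "c", and "c" links back to "a".
toy_web = {
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy web, "c" ends up on top because three pages link to it, and "a" comes second even though it has only a single back link, because that one link comes from the highly ranked "c". That is the "value of the linking page" part of the story.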
Every Joe Schmoe knows that search engines like Google don't take kindly to long sentences and free text, but he probably never bothered reading up on how the magical search box actually works, something he should be severely punished for. Let's leave Joe to his fate, and rise far above him, to the lands without stupid questions. Even in the "basic" syntax collection I'm sure you are able to find a few sparkling gems you didn't know about, so skim through it even if you feel confident in your Google-Fu.
So, top down: a standalone word yields pages containing that word, and a phrase enclosed in quotation marks (" ") similarly yields pages that contain that exact phrase. If you have ever written an SQL query for some database, I'm sure you will find a lot of similarities as we go on. Google is actually "just a database", remember?
| Command | Example | Result |
|---|---|---|
| AND [&] (ampersand) | Slackware AND Linux | Shows pages containing both arguments. Note: this is the default operator, no need to include it |
| OR [\|] (pipe) | laptop OR Desktop | Shows pages containing either argument |
| - (minus) | Hamburger -McDonalds | Shows pages containing the word "Hamburger", but only if they don't mention "McDonalds" |
| + (plus) | +coke | Contrary to the "includes" belief, this limits the results to the given form only, no plurals or other tenses |
| ~ (tilde) | ~Hacker | Results include everything deemed similar to "Hacker" |
| * (asterisk) | Fish * Chips | The wildcard (*) is replaced by one or more words/characters (and, n, 'n, &) |
| define: | define:Nocturnal | A personal favorite, looks up the meaning of the word |
| site: | Phreaking site:phrack.org | Limits the search to a specific site |
| #..# | zeroday 2007..2008 | Search results include a value within the given range |
| info: | info:www.hacktivismo.com | Shows information about the site |
| related: | related:www.google.com | Shows pages similar/related to argument |
| link: | link:www.darkmindz.com | Shows sites linking to the argument |
| filetype: | phrack filetype:pdf | Results are limited to given filetype |
| ([?]) | Cyber (China & America) | Nesting combines several terms in the same query |
| [?A] in [?B] | 1 dollar in yen | Converts argument A to argument B |
| daterange: | daterange:2452122-2452234 | Results are within the specified date range. Dates are given as Julian day numbers |
| movie: | movie:Hackers | Movie reviews, can also find movie theaters running the movie in U.S. cities |
| music: | music:"Weird Al" | Hits relate to music |
| stock: | stock: goog | Returns stock information (NYSE, NASDAQ, AMEX) |
| time: | time: Stockholm | Shows the current time in requested city |
| safesearch: | safesearch: teen | Excludes pornography |
| allinanchor: | allinanchor: "Best webcomic ever" | Other pages use the argument as link text when linking to the result |
| inanchor: | foo bar inanchor:jargon | As above, but only the term directly after the operator has to appear in the link text |

The corresponding pairs below follow the same pattern:

| Command | Example | Result |
|---|---|---|
| allintext: / intext: | allintext:8-bit music | Argument exists in text |
| allintitle: / intitle: | allintitle: Portfolio | Argument exists in title |
| allinurl: / inurl: | allinurl:albino sheep | Argument exists in URL |
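The daterange: operator is the fiddly one in the table, since nobody knows Julian day numbers by heart. Here is a small sketch in Python for converting calendar dates into that form and back. The function names are made up for this example; the constant is the fixed offset between Python's own day count (date.toordinal()) and the Julian day number of the same calendar date.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (date.toordinal())
# and the Julian day number of the same calendar date.
JDN_OFFSET = 1721425


def julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def from_julian_day(jdn):
    """Return the calendar date for a Julian day number."""
    return date.fromordinal(jdn - JDN_OFFSET)


def daterange(start, end):
    """Build a daterange: term covering start..end (both are date objects)."""
    return "daterange:{}-{}".format(julian_day(start), julian_day(end))


# Decode the example range from the table, then build the same term again.
first = from_julian_day(2452122)
last = from_julian_day(2452234)
print(first, last)
print(daterange(first, last))
```

Run against the table's example, the range 2452122-2452234 decodes to 31 July 2001 through 20 November 2001.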
Advanced GET-variable breakdown

http://www.google.com/search?
    as_q=test            (query string)
    &hl=en               (language)
    &num=10              (number of results [ 10,20,30,50,100 ])
    &btnG=Google+Search
    &as_epq=             (complete phrase)
    &as_oq=              (at least one)
    &as_eq=              (excluding)
    &lr=                 (language results [ lang_countrycode ])
    &as_ft=i             (filetype include or exclude [ i,e ])
    &as_filetype=        (filetype extension)
    &as_qdr=all          (date [ all,m3,m6,y ])
    &as_nlo=             (number range, low)
    &as_nhi=             (number range, high)
    &as_occt=any         (terms occur [ any,title,body,url,links ])
    &as_dt=i             (restrict by domain [ i,e ])
    &as_sitesearch=      (restrict by [ site ])
    &as_rights=          (usage rights [ cc_publicdomain, cc_attribute, cc_sharealike, cc_noncommercial, cc_nonderived ])
    &safe=images         (safesearch [ safe=on,images=off ])
    &as_rq=              (similar pages)
    &as_lq=              (pages that link)
    &as_qdr=             (get only recently updated pages d[ i ] | w[ i ] | y[ i ])
    &gl=us               (country)
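If you would rather not glue these variables together by hand, the standard library can do the escaping for you. The following is a minimal sketch in Python, assuming the parameter names from the breakdown above; the helper name advanced_search_url is made up for this example, and there is no guarantee that Google honours every one of these variables.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.google.com/search"


def advanced_search_url(**params):
    """Build a search URL from GET variables such as as_q, hl or as_qdr.

    Variables set to None are left out; everything else is URL-encoded.
    """
    query = {name: value for name, value in params.items() if value is not None}
    return BASE_URL + "?" + urlencode(query)


# English results, 20 per page, exact phrase "index of", PDF files only,
# updated within the last day.
url = advanced_search_url(
    as_q="test",
    hl="en",
    num=20,
    as_epq="index of",
    as_filetype="pdf",
    as_qdr="d1",
    as_sitesearch=None,  # not set, so it is dropped from the URL
)
print(url)
```

Passing the variables as keyword arguments keeps the call readable, and urlencode takes care of spaces and other characters that are not allowed in a URL as-is.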
So, Google gives us all those handy tools for filtering away what we don't want to see. How can we use them to help secure our own systems?
Well, for example, we could use the neat Google Hacking Database, a project where people have submitted a huge collection of queries yielding results that the unskilled webmaster (the Googledork) wishes weren't there. Everything from vulnerable login forms to passwords surfaces with some cleverly engineered queries.
Goolag is a vulnerability scanner (and a politically involved protest) made by the famous Cult of the Dead Cow. It builds on the above-mentioned GHDB, scanning for the vulnerabilities listed in the database. At the moment there is only a Windows version of the program. The Goolag project is also a campaign against Google's (and a few other big players') choice to comply with the Chinese censorship policy.
-inurl:htm -inurl:html intitle:"index of" "Last modified" mp3
    mp3-file indexes, add desired artist

site:rapidshare.de -filetype:zip OR rar daterange:2453402-2453412
    zip files on rapidshare uploaded on specified date

http://www.google.com/search?q=your+query+here&as_qdr=d1
    Query results updated within one day