The post, Google To Launch New Doorway Page Penalty Algorithm by Barry Schwartz, reminded me that Internet search is truly a crapshoot in a black box.
Google has over two hundred (200) factors that are known (or suspected) to play a role in its search algorithms and their ranking of results.
Even if you memorized all 200, as a searcher you don’t know how those factors will affect the pages with the information you want to see. (Unless you want to drive web traffic, the 200 factors are a curiosity and not much more.)
When you throw your search terms, like dice, into the Google black box, you don’t know how they will interact with the unknown results of the ranking algorithms.
To make matters worse, yes, worse, the Google algorithms change over time. Some changes are major, some not quite so major. But every change stands a chance of disrupting any ad hoc process you have adopted for finding information.
A good number of you won’t remember print indexes, but one of their attractive features (in hindsight) was that the indexing was uniform, at least within reasonable limits, for decades. If you learned how to use a printed index effectively, you could always find information using that technique, without fear that the familiar results would simply disappear.
Perhaps that is a commercial use case for the Common Crawl data. Imagine a disclosed ranking algorithm that you could use to build a custom ranking for a subset of the data and then search against it. The ranking you are searching against would then be known and could be explored.
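To make the idea of a disclosed ranking a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Page records stand in for documents pulled from a subset of a crawl, and the two factors and their weights (WEIGHTS) are hypothetical, not anything Google or Common Crawl actually uses. The point is only that every factor is visible and each result carries its own score breakdown.

```python
from dataclasses import dataclass

# Hypothetical, fully disclosed weights: every factor and its weight is visible.
WEIGHTS = {"title_match": 3.0, "body_frequency": 1.0}


@dataclass
class Page:
    url: str
    title: str
    body: str


# Hypothetical stand-ins for records taken from a subset of crawl data.
PAGES = [
    Page("http://example.org/a", "Print indexes explained",
         "print indexes were uniform for decades"),
    Page("http://example.org/b", "Search engines",
         "search ranking changes often and print indexes did not"),
    Page("http://example.org/c", "Cooking", "recipes and more"),
]


def score(page, terms):
    """Return (total score, per-factor counts) for one page."""
    title_words = page.title.lower().split()
    body_words = page.body.lower().split()
    factors = {
        # Number of query terms that appear in the title.
        "title_match": sum(1 for t in terms if t in title_words),
        # Total occurrences of the query terms in the body.
        "body_frequency": sum(body_words.count(t) for t in terms),
    }
    total = sum(WEIGHTS[name] * value for name, value in factors.items())
    return total, factors


def search(pages, query):
    """Rank pages with a nonzero score; each result keeps its factor breakdown."""
    terms = query.lower().split()
    results = []
    for page in pages:
        total, factors = score(page, terms)
        if total > 0:
            results.append((total, page.url, factors))
    # Sort by descending score, then by URL so ties are predictable.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


if __name__ == "__main__":
    for total, url, factors in search(PAGES, "print indexes"):
        print(total, url, factors)
```

Because the weights are plain data, a searcher could change them and rerun the same query to see exactly why the order of results moved.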
It would not have the very latest data, but recency is difficult to get out of Google anyway, since it apparently tosses the information about when it first encountered a page. Or at the very least doesn’t make it available to users. At least as an option, being able to pick the most recent resources matching a search would be vastly superior to the PageRank orthodoxy at Google.
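A "most recent first" option is easy to sketch when each record carries a capture timestamp. The example below assumes hypothetical records with a timestamp in the ISO 8601 UTC style used by the WARC-Date header. Note the caveat: a capture time says when a crawler fetched the page, which is not the same as when the page first appeared. The RECORDS data and the function names are made up for illustration.

```python
from datetime import datetime, timezone

# Hypothetical records: (url, capture timestamp, text).
RECORDS = [
    ("http://example.org/old", "2014-11-02T08:15:00Z",
     "doorway page penalty"),
    ("http://example.org/new", "2015-03-17T12:30:00Z",
     "new doorway page penalty algorithm"),
    ("http://example.org/other", "2015-02-01T00:00:00Z",
     "unrelated text"),
]


def parse_capture_time(stamp):
    """Parse a 'YYYY-MM-DDThh:mm:ssZ' timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def most_recent(records, query):
    """Return records containing every query term, newest capture first."""
    terms = query.lower().split()
    matches = [
        record for record in records
        if all(term in record[2].lower().split() for term in terms)
    ]
    return sorted(matches, key=lambda r: parse_capture_time(r[1]), reverse=True)


if __name__ == "__main__":
    for url, stamp, _text in most_recent(RECORDS, "doorway page"):
        print(stamp, url)
```

Here recency is the only ordering criterion, which is the whole point: the searcher knows exactly why one result sits above another.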
Not to single Google out too much, because I haven’t encountered other search engines that are more transparent. They may exist, but I am unaware of them.