Sublinear Time Algorithm for PageRank Computations and Related Applications by Christian Borgs, Michael Brautbar, Jennifer Chayes, Shang-Hua Teng
In a network, identifying all vertices whose PageRank is more than a given threshold value $\Delta$ is a basic problem that has arisen in Web and social network analyses. In this paper, we develop a nearly optimal, sublinear time, randomized algorithm for a close variant of this problem. When given a network $G=(V,E)$, a threshold value $\Delta$, and a positive constant $c>1$, with probability $1-o(1)$, our algorithm will return a subset $S\subseteq V$ with the property that $S$ contains all vertices of PageRank at least $\Delta$ and no vertex with PageRank less than $\Delta/c$. The running time of our algorithm is always $\tilde{O}(\frac{n}{\Delta})$. In addition, our algorithm can be efficiently implemented in various network access models including the Jump and Crawl query model recently studied by \cite{brautbar_kearns10}, making it suitable for dealing with large social and information networks.
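The abstract compresses a lot, so a small illustration may help. The sketch below is not the paper's multi-scale algorithm, only a naive Monte Carlo thresholding scheme built on the standard fact that a teleporting random walk started at a uniformly random vertex ends at $v$ with probability equal to $v$'s PageRank. Everything in it (the graph representation, the normalization of PageRank to sum to 1, the teleportation constant, the walk count, and the acceptance cutoff) is my assumption, not the authors'.

```python
import math
import random
from collections import Counter

def approx_significant_pagerank(graph, delta, alpha=0.15, c=2.0, seed=None):
    """Monte Carlo sketch: return vertices whose PageRank is likely >= delta.

    Assumptions (not from the paper): `graph` maps each vertex to a list of
    out-neighbours, PageRank is normalized to sum to 1, and `alpha` is the
    teleportation probability.  A walk started at a uniformly random vertex
    and stopped with probability `alpha` at each step ends at v with
    probability PageRank(v), so endpoint frequencies estimate PageRank.
    """
    rng = random.Random(seed)
    vertices = list(graph)
    n = len(vertices)
    # Enough walks that vertices with PageRank >= delta are hit often enough
    # for a Chernoff-style argument; the constant 16 is illustrative only.
    num_walks = int(16 * math.log(n + 1) / delta) + 1
    hits = Counter()
    for _ in range(num_walks):
        v = rng.choice(vertices)
        while rng.random() > alpha:
            out = graph.get(v, [])
            # Dangling vertex: teleport uniformly (a common convention).
            v = rng.choice(out) if out else rng.choice(vertices)
        hits[v] += 1
    # Accept at a cutoff between delta/c and delta, so vertices with true
    # PageRank >= delta survive sampling noise and vertices below delta/c
    # are rejected with high probability.
    cutoff = 0.5 * (delta + delta / c)
    return {v for v, h in hits.items() if h / num_walks >= cutoff}
```

On the running time: if the abstract's $\Delta$ is read on a scale where the total PageRank mass is $n$ (so the per-vertex threshold above is $\Delta/n$), the walk count in this sketch becomes roughly $(n/\Delta)\log n$, which would line up with the stated $\tilde{O}(\frac{n}{\Delta})$ bound.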
As part of our analysis, we show that any algorithm for solving this problem must have expected time complexity of ${\Omega}(\frac{n}{\Delta})$. Thus, our algorithm is optimal up to a logarithmic factor. Our algorithm (for identifying vertices with significant PageRank) applies a multi-scale sampling scheme that uses a fast personalized PageRank estimator as its main subroutine. We develop a new local randomized algorithm for approximating personalized PageRank, which is more robust than the earlier ones developed by Jeh and Widom \cite{JehW03} and by Andersen, Chung, and Lang \cite{AndersenCL06}. Our multi-scale sampling scheme can also be adapted to handle a large class of matrix sampling problems that may have potential applications to online advertising on large social networks (See the appendix).
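To see what a "local" personalized PageRank estimator looks like, here is a generic Monte Carlo version: it only ever touches vertices reachable by short walks from the source. This is not the robust estimator developed in the paper, just the textbook random-walk approach; the graph representation, walk count, and dangling-node handling are my assumptions.

```python
import random
from collections import Counter

def approx_personalized_pagerank(graph, source, alpha=0.15,
                                 num_walks=10000, seed=None):
    """Monte Carlo sketch of a local personalized PageRank estimator.

    Runs `num_walks` teleporting random walks from `source`; the empirical
    endpoint distribution approximates the personalized PageRank vector of
    `source`.  Only vertices actually visited are ever touched, which is
    what makes the estimator "local".
    """
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(num_walks):
        v = source
        while rng.random() > alpha:
            out = graph.get(v, [])
            # Dangling vertex: restart the walk at the source (an assumption).
            v = rng.choice(out) if out else source
        hits[v] += 1
    return {v: h / num_walks for v, h in hits.items()}

# Example use on a tiny directed graph (hypothetical data):
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(approx_personalized_pagerank(toy, "a", seed=0))
```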
Pay close attention to the authors’ definition of “significant” vertices:
A basic problem in network analysis is to identify the set of its vertices that are “significant.” For example, the significant nodes in the web graph defined by a query could provide the authoritative contents in web search; they could be the critical proteins in a protein interaction network; and they could be the set of people (in a social network) most effective to seed the influence for online advertising. As the networks become larger, we need more efficient algorithms to identify these “significant” nodes.
As for online advertising, I await vendors’ discovery that “pull” models of advertising pre-qualify potential purchasers. “Push” models spam everyone within reach, with correspondingly low success rates.
For your convenience, here are the citations from the abstract that don’t work well as sources on their own:
brautbar_kearns10 – Mickey Brautbar and Michael Kearns, Local Algorithms for Finding Interesting Individuals in Large Networks.
JehW03 – Glen Jeh and Jennifer Widom, Scaling Personalized Web Search (ACM version; a free copy is available from Stanford).
AndersenCL06 – Reid Andersen, Fan Chung, and Kevin Lang, Local Graph Partitioning Using PageRank Vectors.