As a follow-up to my previous post on where theory and experimentation meet in the study of large-scale networks, I would like to take a closer look at one of the empirically best-performing approaches that also has a sound theoretical foundation: spectral partitioning. In this post I will review the history of the problem, outline some key results, and suggest some directions for future work.
Finding local partitions
The goal of a local partitioning algorithm is to identify a community in a massive network. In this context, a community can be loosely defined as a collection of well-connected vertices that are also reasonably well-separated from the rest of the network. The quality of a community given by a subset of vertices $latex S\subseteq V$ can be measured in a number of ways. One common measure is the conductance: the number of edges leaving $latex S$ divided by the volume of $latex S$. Specifically, let $latex \partial(S)$ be the number of edges with one endpoint in $latex S$ and the other not in $latex S$, and let the volume of $latex S$, $latex \mathrm{vol}(S) = \sum_{v\in S}\deg(v)$, be the sum of the degrees of vertices in $latex S$. Then the conductance of $latex S$ is $latex \phi(S) := \partial(S)/\mathrm{vol}(S)$. The goal of a local partitioning algorithm is, given a vertex $latex v\in V$, to identify a subset $latex S$ near $latex v$ with small conductance.
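To make the definition concrete, here is a minimal sketch of computing $latex \phi(S)$ for an undirected graph stored as an adjacency list. The representation and names are my own, not taken from any particular library.

```python
# Minimal sketch: conductance phi(S) = boundary(S) / vol(S) for an undirected
# graph given as a dict mapping each vertex to a list of its neighbors.

def conductance(adj, S):
    S = set(S)
    boundary = 0  # edges with exactly one endpoint in S
    volume = 0    # sum of degrees of vertices in S
    for v in S:
        volume += len(adj[v])
        boundary += sum(1 for u in adj[v] if u not in S)
    return boundary / volume if volume > 0 else 1.0

# Example: on a 4-cycle, S = {0, 1} has two boundary edges and volume 4,
# so phi(S) = 2/4 = 0.5.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(conductance(adj, {0, 1}))  # 0.5
```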
Local algorithms may be used iteratively to find global balanced cuts or a clustering of the entire network, but the problem is also of independent interest. The ability to identify local communities has important implications in social networks, for instance, when we only care about a set of entities with particular features rather than how those features vary over the entire network. A clear example is advertising: a given product will be of greatest interest to a particular segment of users, and advertisers are largely unconcerned with the rest of the market.
In any case, running time determines whether these algorithms can be applied to massive networks. In general, local algorithms have running times expressed in terms of the size of the output – the local cluster – rather than the size of the entire graph. The trend over the last decade or so has been toward local partitioning algorithms that run in time nearly linear in the size of the output.
Previous work on the problem
Results for partitioning algorithms can be traced back to the Cheeger inequalities given by Alon and Milman in ’85. Recall that the edge expansion of a set, a quantity related to the conductance, is defined by $latex h_G(S) = \partial(S)/|S|$, where now we are simply concerned with the number of vertices in $latex S$, $latex |S|$. The edge expansion of a graph, $latex h(G)$, is the minimum edge expansion over all subsets containing at most half the vertices. The Cheeger inequalities relate $latex h(G)$ to the second smallest eigenvalue of a normalized adjacency matrix. In her book on spectral graph theory, Chung gives an analog of the Cheeger inequalities for the conductance defined above, this time using the second smallest eigenvalue $latex \lambda$ of the normalized Laplacian: $latex \lambda/2 \leq \phi(G) \leq \sqrt{2\lambda}$, where $latex \phi(G)$ is the minimum conductance over all subsets with at most half the total volume. As a consequence, in nearly linear time one can compute a cut whose conductance is within a factor $latex O(1/\sqrt{\phi(G)})$ of optimal.
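The proof of the upper bound is constructive and is the heart of spectral partitioning: order the vertices by their entries in the second eigenvector (rescaled by degree) and take the best prefix by conductance, the so-called sweep cut. Below is a minimal sketch using dense numpy matrices; it assumes a connected graph with no isolated vertices, and the helper names are mine.

```python
# Minimal sketch of a sweep cut: compute the second eigenvector of the
# normalized Laplacian L = I - D^{-1/2} A D^{-1/2}, order vertices by their
# degree-normalized entries, and return the best prefix by conductance.
# Dense matrices and a connected graph are assumed for brevity.
import numpy as np

def sweep_cut(A):
    """A: symmetric 0/1 adjacency matrix (numpy array). Returns (set, conductance)."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    x = d_inv_sqrt * eigvecs[:, 1]                # second eigenvector, rescaled by D^{-1/2}
    order = np.argsort(x)
    total_vol = deg.sum()
    inside = np.zeros(len(A), dtype=bool)
    vol, cut = 0.0, 0.0
    best_phi, best_k = 2.0, 1
    for k, v in enumerate(order[:-1], start=1):   # never take the full vertex set
        vol += deg[v]
        cut += deg[v] - 2 * A[v, inside].sum()    # edges to outside added, edges to inside removed
        inside[v] = True
        # the graph conductance uses the smaller side's volume in the denominator
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best_phi, best_k = phi, k
    return set(order[:best_k].tolist()), best_phi
```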
Following the Cheeger inequalities, a local partitioning algorithm was studied by Spielman and Teng beginning with their STOC result of 2004. Their result was improved in 2006 by Andersen, Chung, and Lang, later in 2009 by Andersen and Peres, and in 2012 by Gharan and Trevisan.
The intuition behind these algorithms has to do with mixing rates. Let us revisit our notion of a community – a collection of well-connected vertices which are relatively separate from the rest of the graph. This can also be understood in terms of random walks: if we start a random walk within a high-quality community, we expect, with reasonably high probability, that the walk remains within the community after a certain number of steps.
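Here is a toy illustration of that intuition (my own example, not taken from any of the papers): two triangles joined by a single edge, with a lazy random walk started in one of them.

```python
# Toy illustration: how much probability mass of a lazy random walk started
# inside a community is still inside after t steps.
import numpy as np

def lazy_walk_mass_in_S(A, start, S, t):
    """Probability that a lazy random walk from `start` is inside S after t steps."""
    deg = A.sum(axis=1)
    W = 0.5 * (np.eye(len(A)) + A / deg[:, None])   # lazy walk transition matrix
    p = np.zeros(len(A))
    p[start] = 1.0
    for _ in range(t):
        p = p @ W
    return p[list(S)].sum()

# Two triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
# Most of the mass is still in the starting triangle; its stationary share is only 0.5.
print(lazy_walk_mass_in_S(A, start=0, S={0, 1, 2}, t=10))
```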
Theoretical results for local partitioning algorithms are typically formulated as follows: given a set $latex S\subseteq V$, if a starting vertex $latex v$ is sampled from $latex S$, then with a certain probability a set $latex T$ can be found with small conductance, bounded in terms of $latex \phi(S)$, in time proportional to the size of $latex T$. The work/volume ratio is the work required for a single run of the algorithm divided by the volume of the output set. Spielman and Teng find such a set by examining threshold sets of the probability distribution of a $latex t$-step random walk started from a vertex $latex v$ sampled within $latex S$ with probability proportional to degree. The set output by their algorithm achieves conductance $latex O(\phi(S)^{1/2}\log^{3/2}n)$ with work/volume ratio $latex O(\phi(S)^{-2}\,\mathrm{polylog}(n))$. This is improved by Andersen, Chung, and Lang using personalized PageRank vectors, which correspond to random walks that are “reset” to the starting vertex with some probability at each step. Their algorithm improves the conductance of the output set to $latex O(\phi(S)^{1/2}\log^{1/2}n)$ with work/volume ratio $latex O(\phi(S)^{-1}\,\mathrm{polylog}(n))$.
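To give a feel for why the PageRank approach is fast, here is a sketch of a push-style approximate personalized PageRank computation in the spirit of Andersen, Chung, and Lang. The variable names, parameter defaults, and queue bookkeeping are my own, so treat it as illustrative rather than a faithful reproduction of their algorithm; the point is that work is only spent on vertices whose residual mass is large relative to their degree.

```python
# Sketch of push-style approximate personalized PageRank: keep an approximation
# p and a residual r, and repeatedly "push" mass from any vertex whose residual
# is at least eps times its degree. Only vertices near the seed are ever touched.
from collections import deque

def approximate_pagerank(adj, seed, alpha=0.15, eps=1e-4):
    """adj: dict vertex -> list of neighbors; seed: starting vertex.
       Returns a sparse dict approximating the personalized PageRank vector."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        ru = r.get(u, 0.0)
        du = len(adj[u])
        if ru < eps * du:          # stale queue entry, nothing to push
            continue
        p[u] = p.get(u, 0.0) + alpha * ru     # keep an alpha fraction at u
        r[u] = (1 - alpha) * ru / 2           # lazy walk: half of the rest stays at u
        share = (1 - alpha) * ru / (2 * du)   # the other half spreads to the neighbors
        for w in adj[u]:
            r[w] = r.get(w, 0.0) + share
            if r[w] >= eps * len(adj[w]):
                queue.append(w)
        if r[u] >= eps * du:
            queue.append(u)
    return p
```

A low-conductance set is then read off by sorting vertices by $latex p(u)/\deg(u)$ and sweeping over prefixes, just as in the eigenvector sweep sketched earlier.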
Andersen and Peres manage to improve the work/volume ratio. Their method simulates a volume-biased evolving set process, a Markov chain whose states are subsets of vertices and whose transitions grow or shrink the current set. Specifically, starting from a vertex $latex v$, the process produces a sequence of sets $latex S_1, S_2, \ldots, S_{\tau}$ with the property that at least one set $latex S_t$ satisfies $latex \partial(S_t)/\mathrm{vol}(S_t) \leq O(\sqrt{\log(\mathrm{vol}(S_{\tau}))/\tau})$. Then, if the process constructs sets of volume at most $latex \gamma$ up to some time $latex T$, it yields a set of volume at most $latex \gamma$ and conductance $latex O(\sqrt{\log\gamma/T})$.
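For intuition, here is one transition of the plain evolving set process for a lazy walk: draw a uniform threshold and keep every vertex whose one-step probability of landing inside the current set is at least that threshold. Andersen and Peres actually simulate a volume-biased variant with additional machinery not shown here; this sketch and its names are my own.

```python
# One transition of the (plain) evolving set process under a lazy random walk.
import random

def evolving_set_step(adj, S):
    """adj: dict vertex -> list of neighbors; S: current set. Returns the next set."""
    U = random.random()
    candidates = set(S)
    for v in S:                  # the next set can only contain S and its neighbors
        candidates.update(adj[v])
    next_S = set()
    for u in candidates:
        inside = sum(1 for w in adj[u] if w in S)
        p_u_S = 0.5 * (u in S) + 0.5 * inside / len(adj[u])   # lazy step from u lands in S
        if p_u_S >= U:
            next_S.add(u)
    return next_S
```

Iterating this step starting from $latex \{v\}$ produces a sequence of sets like the $latex S_1, S_2, \ldots, S_{\tau}$ above.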
Current best results
As mentioned, the results of Andersen and Peres have recently been improved by Gharan and Trevisan in their 2012 FOCS paper. Their main tool is again threshold sets of random walks, with an improved lower bound on the probability that a lazy random walk remains entirely inside a subset $latex S$ for $latex t$ steps; this improves on the analysis of Spielman and Teng. Their algorithm also uses an evolving set process, and it beats the bounds of Andersen and Peres by running several copies of the evolving set process in parallel. They show that, given a starting vertex $latex v$, a target conductance $latex \phi\in(0,1)$, a target volume $latex \gamma$, and $latex 0 < \epsilon < 1$, their algorithm outputs a set with conductance $latex O(\sqrt{\phi/\epsilon})$ and work/volume ratio $latex O(\gamma^{\epsilon}\phi^{-1/2}\log^2 n)$. The running time is slightly superlinear in the size of the optimum.
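To illustrate the flavor of the parallel-copies idea (very loosely; the actual algorithm and its analysis are considerably more refined), one can run several independent evolving-set trajectories from the seed and keep the best low-conductance set whose volume stays under the target. This sketch builds on the evolving-set step and conductance function from the earlier snippets, and all names here are my own.

```python
# Very loose sketch of the "parallel copies" idea: run several independent
# evolving-set trajectories and keep the best set found, by conductance, whose
# volume stays below the target. Uses conductance() and evolving_set_step()
# from the sketches above; illustrative only.

def parallel_evolving_sets(adj, seed, steps, copies, max_volume):
    best, best_phi = {seed}, conductance(adj, {seed})
    for _ in range(copies):          # independent trajectories, conceptually in parallel
        S = {seed}
        for _ in range(steps):
            S = evolving_set_step(adj, S)
            if not S or sum(len(adj[v]) for v in S) > max_volume:
                break
            phi = conductance(adj, S)
            if phi < best_phi:
                best, best_phi = set(S), phi
    return best, best_phi
```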
To summarize the progression: Spielman and Teng achieve conductance $latex O(\phi(S)^{1/2}\log^{3/2}n)$ with work/volume ratio $latex O(\phi(S)^{-2}\,\mathrm{polylog}(n))$; Andersen, Chung, and Lang improve these to $latex O(\phi(S)^{1/2}\log^{1/2}n)$ and $latex O(\phi(S)^{-1}\,\mathrm{polylog}(n))$; Andersen and Peres further improve the work/volume ratio via the evolving set process; and Gharan and Trevisan give the current best trade-off, with conductance $latex O(\sqrt{\phi/\epsilon})$ and work/volume ratio $latex O(\gamma^{\epsilon}\phi^{-1/2}\log^2 n)$.
Where do we go from here?
This is a hard problem, but a rewarding one. Gharan and Trevisan design a local variant of the Cheeger inequalities that gives a sublinear-time algorithm with an approximation guarantee that does not depend on the size of the graph. This opens many doors for algorithms on massive networks. There is already a body of work applying local partitioning algorithms, for instance to identify local alignments in protein networks. However, as I also mentioned in my previous post, there is often some calibration involved in moving these algorithms to production. I think local partitioning algorithms are primed for application: the theoretical bounds are promising, and the implementations (essentially simulating random walks) are simple, so they may be close to mainstream use.