Active Learning for Node Classification in Assortative and Disassortative Networks by Cristopher Moore, Xiaoran Yan, Yaojia Zhu, Jean-Baptiste Rouquier, and Terran Lane.
Abstract:
In many real-world networks, nodes have class labels, attributes, or variables that affect the network’s topology. If the topology of the network is known but the labels of the nodes are hidden, we would like to select a small subset of nodes such that, if we knew their labels, we could accurately predict the labels of all the other nodes. We develop an active learning algorithm for this problem which uses information-theoretic techniques to choose which nodes to explore. We test our algorithm on networks from three different domains: a social network, a network of English words that appear adjacently in a novel, and a marine food web. Our algorithm makes no initial assumptions about how the groups connect, and performs well even when faced with quite general types of network structure. In particular, we do not assume that nodes of the same class are more likely to be connected to each other—only that they connect to the rest of the network in similar ways.
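To make the abstract's loop concrete (infer labels from the topology, then pick the next node to query), here is a minimal sketch. Several pieces are my own assumptions and not taken from the paper. I use a simple stochastic block model (SBM) as the generative model. Each block pair gets its own edge probability under a Beta(1, 1) prior, which is what lets the model learn disassortative as well as assortative structure. Labels are inferred by Gibbs sampling. The query rule is "ask about the node with the highest label entropy," which is a deliberately simpler information-theoretic criterion swapped in for whatever the authors actually use. All function names here (`log_likelihood`, `label_marginals`, `choose_query`) are hypothetical.

```python
import math
import random
from collections import Counter


def log_likelihood(edges, labels, k):
    """Collapsed log-likelihood of an undirected graph under a simple SBM
    (assumed model). Each block pair (a, b) has its own edge probability with
    a Beta(1, 1) prior, integrated out, so blocks may link mostly within
    themselves (assortative) or mostly across (disassortative)."""
    sizes = Counter(labels)
    observed = Counter()
    for u, v in edges:
        a, b = sorted((labels[u], labels[v]))
        observed[(a, b)] += 1
    total = 0.0
    for a in range(k):
        for b in range(a, k):
            if a == b:
                pairs = sizes[a] * (sizes[a] - 1) // 2
            else:
                pairs = sizes[a] * sizes[b]
            m = observed[(a, b)]
            # log of m! (pairs - m)! / (pairs + 1)!, the Beta-binomial marginal
            total += math.lgamma(m + 1) + math.lgamma(pairs - m + 1) - math.lgamma(pairs + 2)
    return total


def label_marginals(n, edges, k, known, sweeps=200, burn_in=50, seed=0):
    """Gibbs-sample labels of unqueried nodes (queried labels stay fixed)
    and return, for each node, the fraction of samples with each label."""
    rng = random.Random(seed)
    labels = [known[v] if v in known else rng.randrange(k) for v in range(n)]
    free = [v for v in range(n) if v not in known]
    tallies = [Counter() for _ in range(n)]
    for sweep in range(sweeps):
        for v in free:
            # Score every candidate label for v with all other labels held fixed.
            scores = []
            for r in range(k):
                labels[v] = r
                scores.append(log_likelihood(edges, labels, k))
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            labels[v] = rng.choices(range(k), weights=weights)[0]
        if sweep >= burn_in:
            for v in range(n):
                tallies[v][labels[v]] += 1
    kept = sweeps - burn_in
    return [[tallies[v][r] / kept for r in range(k)] for v in range(n)]


def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def choose_query(n, edges, k, known, **gibbs_kwargs):
    """Hypothetical query rule: the unqueried node whose label is most uncertain."""
    marginals = label_marginals(n, edges, k, known, **gibbs_kwargs)
    candidates = [v for v in range(n) if v not in known]
    return max(candidates, key=lambda v: entropy(marginals[v]))


if __name__ == "__main__":
    # Disassortative toy graph: groups 0-5 and 6-11, every edge runs between them.
    edges = [(i, j) for i in range(6) for j in range(6, 12)]
    truth = [0] * 6 + [1] * 6
    known = {}
    for _ in range(2):  # a budget of two queries
        v = choose_query(12, edges, 2, known, seed=1)
        known[v] = truth[v]  # "ask the oracle" for v's label
    print(known)
```

The toy graph has no edges inside either group, so a method that assumes "same class means more likely to be linked" would get it backwards. The block model scores the true bipartition well above lumping everyone together, which is the point the abstract makes about not assuming assortativity.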
If the abstract doesn’t recommend this paper as weekend reading, perhaps the following quote from the paper will:
our focus is on the discovery of functional communities in the network, and our underlying generative model is designed around the assumption of that these communities exist.
You will recall from Don’t Trust Your Instincts that we are likely to see what we expect to see in text, or in this case, in networks. This approach does not free us from introducing bias, but it does ensure that our bias as observers is applied uniformly across the data set. That may lead to results that startle us, interest us, or strike us as spurious. In any event, this is one more approach for testing, and possibly illuminating, our understanding of a network.
PS: Are communities the equivalent of clusters?