Community Detection Algorithms
One of the main areas of interest in the field of network analysis is the detection of the communities that exist within a given network. Over the years, a large number of techniques have been proposed by different authors. These different approaches have their own advantages and disadvantages: some methods scale better, some obtain better results, some automatically determine the number of communities instead of requiring it as a parameter, etc..
The most relevant approaches to community detection have been implemented in the NOESIS framework, including a wide range of techniques from very different families of community detection methods. This section describes the provided community detection algorithms and how to use them. Different families of community detection algorithms are included in NOESIS within the noesis.algorithms.communities package, including hierarchical, spectral, overlapping, partitioning, and modularity-based algorithms.
Detecting communities
Community detectors are implement according to the interface defined by the CommunityDetector abstract class. In order to use a community detection algorithm, we must instantiate the corresponding community detector first. For example, in order to use the Kernighan–Lin, we would instantiate the class KernighanLinCommunityDetector as follows:
Network network = ...
CommunityDetector communityDetector = new KernighanLinCommunityDetector(network);
Once the community detector has been created, we must call the compute method to actually run the community detection algorithm:
communityDetector.compute();
Finally, we have two alternatives to obtain the computed communities. We could use the getResults method to retrieve the original matrix of communities computed by the algorithm as follows:
Matrix communities = communityDetector.getResults();
However, most of the time we only want the community each node belongs to. This information can be obtained using the getBest method as shown below:
Vector communities = communityDetector.getBest();
The obtained vector contains the index of the main community for each node in the network.
Creating a custom community detector
A custom community detector can be implemented in NOESIS by extending the CommunityDetector class. Extending this class requires implementing the compute method, where the logic of the community detection algorithm is placed. The CommunityDetector class also provides two important protected members. First, the network member contains the network for which communities are going to be computed. Second, the results member contains the community matrix where the results of the algorithm should be stored. The structure of a custom community detector is shown below:
public class MyCommunityDetector extends CommunityDetector
{
/* NOTE: The network and the matrix of communities are stored in the
* protected members "network" and "results", respectively */
public MyCommunityDetector(AttributeNetwork network)
{
super(network);
}
@Override
public void compute()
{
... // Implement your community detector here
}
}
Hierarchical community detection
Hierarchical community detection algorithms build a hierarchy by computing the similarity (or distance) between pairs of node clusters. There are two main approaches: agglomerative and divisive. On the one hand, agglomerative approaches initially treat each node as a cluster and carry out a merging procedure until all nodes belong to the same cluster. On the other hand, divisive approaches generate the hierarchy top-down by starting from a cluster containing all nodes and splitting it until each node is placed in a own single cluster.
This section describes the hierarchical algorithms implemented in the NOESIS framework, including examples of how to use them.
Agglomerative clustering algorithm
This clustering algorithm detects network communities by building a hierarchy of clusters. Initially, each node is assigned its own cluster. Then, pairs of clusters are merged until only one final cluster, containing all nodes in the network, is left. In each step of this merging steps, the pair of cluster with highest similarity is combined. Different strategies, described below, can be followed to estimate the similarity between pairs of clusters.
Single-link agglomerative clustering
In this strategy, the similarity between a pair of clusters is equal to the similarity of the most similar pair of nodes between both communities. An example using this strategy is shown in the following code fragment:
AttributeNetwork network = ...
SingleLinkCommunityDetector detector = new SingleLinkCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
Complete-link agglomerative clustering
This strategy considers the similarity between a pair of clusters to be equal to the similarity of the less similar pair of nodes between both communities. The following code snippet shows an example using this strategy:
AttributeNetwork network = ...
CompleteLinkCommunityDetector detector = new CompleteLinkCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
Average-link agglomerative clustering
This strategy computes the similarity between a pair of clusters as the average similarity between the nodes in both communities. This strategy can be used as shown below:
AttributeNetwork network = ...
AverageLinkCommunityDetector detector = new AverageLinkCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Fortunato, S. (2010). Community detection in graphs. Physics reports, 486(3-5), 75-174. DOI: 10.1016/j.physrep.2009.11.002.
- Sibson, R. (1973). SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1):30–34. DOI: 10.1093/comjnl/16.1.30.
- Defays, D. (1977). An efficient algorithm for a complete link method. The Computer Journal, 20(4):364–366. DOI: 10.1093/comjnl/20.4.364.
Newman-Girvan community detection algorithm
The Newman-Girvan algorithm is a divisive hierarchical community detection technique based on the concept of link betweenness. The betweenness of a specific link is defined as the number of shortest paths between pairs of nodes containing that link. Since links between communities are expected to show a high betweenness score, this algorithm works by iteratively removing the link with the highest betweenness until no link is left in the network. The process of removing these links reveals the communities present in the network, which are composed of those nodes connected by the links that are deleted later in the link removal process.
The following code snippet shows an example of how to use the Newman-Girvan algorithm for community detection:
AttributeNetwork network = ...
NewmanGirvanCommunityDetector detector = new NewmanGirvanCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826. DOI: 10.1073/pnas.122653799.
Radicchi's algorithm
Recomputing link betweenness in the Newman-Girvan algorithm can be computationally expensive in large networks. The Radicchi's algorithm follows the same procedure than the Newman-Girvan algorithm, but removes links more efficiently in terms of the link-clustering coefficient. This coefficient is defined as the number of triangles the link belongs to, divided by the number of potential triangles the link could belong to.
The following code fragment shows an example of how to apply the Radicchi's algorithm to a given network:
AttributeNetwork network = ...
RadicchiCommunityDetector detector = new RadicchiCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 2658-2663. DOI: 10.1073/pnas.0400054101.
Modularity-based community detection
Modularity measures the concentration of links between groups of nodes compared to what would be expected if they were connected at random. Modularity-based techniques find communities by maximizing their modularity. NOESIS implements different modularity-based greedy approaches, which are introduced below.
Fast greedy algorithm
The fast greedy algorithm is an efficient approach to detect communities based on modularity. This strategy starts with a subnetwork composed only of links between highly connected nodes. Then, the algorithm iteratively samples random links that improve the modularity of the subnetwork and adds them. This iterative process is repeated as long as the modularity keeps improving. Finally, the communities are obtained based on the connected components in the subnetwork.
An example of how to use this partitioning algorithm is shown below:
AttributeNetwork network = ...
FastGreedyCommunityDetector detector = new FastGreedyCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Newman, M. E. (2004). Fast algorithm for detecting community structure in networks. Physical review E, 69(6), 066133. DOI: 10.1103/PhysRevE.69.066133.
Multi-step greedy algorithm
The multi-step greedy algorithm is an even more efficient alternative to the fast greedy algorithm. Both algorithms are similar. However, the multi-step greedy algorithm is based on a priority queue where the increment in modularity associated to adding each link is stored. This algorithm takes links from the priority queue and adds them to the subnetwork while the modularity is improved, similar to how the fast greedy algorithm did it. When modularity stops improving, this algorithms recomputes the priority queue and repeats the link adding process. This step is repeated until the modularity of the subnetwork stops improving.
This algorithm can be used as shown in the following example:
AttributeNetwork network = ...
MultiStepGreedyCommunityDetector detector = new MultiStepGreedyCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Schuetz, P. and Caflisch, A. (2008). Efficient modularity optimization by multistep greedy algorithm and vertex mover refinement. Physical Review E, 77(4):046112. DOI: 10.1103/PhysRevE.77.046112
Partitioning-based community detection
Partitioning-based community detection algorithms search for communities by decomposing the network in different groups of nodes and performing an iterative relocation of nodes between these groups.
Kernighan-Lin algorithm
The Kernighan-Lin algorithm is an efficient graph bi-sectioning heuristic. This algorithm initially splits the network in two random sets of nodes with a similar size. Then, the algorithm iteratively swaps pairs of nodes in order to reduce the number of links between pairs of nodes that are not in the same set. Exchanged nodes are locked, not being allowed to change of set again. This process is repeated until no pair of nodes can be exchanged, finally obtaining two communities of nodes.
This community detection algorithm can be applied as shown below:
AttributeNetwork network = ...
KernighanLinCommunityDetector detector = new KernighanLinCommunityDetector(network);
detector.compute();
Vector communities = detector.getBest();
References:
- Kernighan, B. W. and Lin, S. (1970). An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291–307. DOI: 10.1002/j.1538-7305.1970.tb01770.x.
K-means algorithm
This community detection method is based on applying the k-means algorithm over a vectorial representation of nodes. This representation is computed using the force-directed Fruchterman-Reingold layout algorithm. This method can take the number of communities to be detected as a parameter.
An example showing how this technique can be applied to detect five communities in a network is shown below:
AttributeNetwork network = ...
KMeansCommunityDetector detector = new KMeansCommunityDetector(network, 5);
detector.compute();
Vector communities = detector.getBest();
References:
- MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1(14):281-297.
Spectral community detection
NOESIS also includes several spectral community detection algorithms. These techniques are based on the Laplacian matrix, which is another type of network representation. Communities are finally obtained by running the k-means algorithm over a truncated version of the Laplacian matrix. This section describes the different spectral techniques implemented in the NOESIS framework.
Spectral K-means
Given the number of communities k to be detected, this spectral approach truncates the Laplacian matrix to the first k eigenvectors. Then, communities are detected by running the k-means algorithm over this truncated matrix.
For example, the following code snippet detects four communities in a given network using the spectral k-means algorithm:
AttributeNetwork network = ...
UKMeansCommunityDetector detector = new UKMeansCommunityDetector(network, 4);
detector.compute();
Vector communities = detector.getBest();
References:
- Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905. DOI: 10.1109/34.868688.
KNSC1 algorithm
The KNSC1 algorithm is similar to the spectral k-means algorithm. The main difference is that this method applies a symetric normalization to the Laplacian matrix and also normalizes each row of the truncated matrix dividing each element by the vector length of the row.
An example of how to use this algorithm to detect six communities in a network is shown below:
AttributeNetwork network = ...
NJWCommunityDetector detector = new NJWCommunityDetector(network, 6);
detector.compute();
Vector communities = detector.getBest();
References:
- Ng, A. Y., Jordan, M. I., Weiss, Y., et al. (2002). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 2:849–856.
Bi-partitioning algorithm
This spectral algorithm is based on computing the Fiedler vector, which is the eigenvector associated to the second smallest eigenvalue of the Laplacian matrix. Both the Fiedler vector and the eigenvalue, called the algebraic connectivity, have interesting properties in the context of network analysis. In network partitioning, the Fiedler vector can be used to split the nodes in the network in two sets or communities. This procedure is carried out by computing a threshold so that nodes go to a set depending on whether the value of the element in the Fiedler vector associated to the node is below or above this threshold. This method supports different strategies for computing the threshold, described below:
- ZERO: The threshold is set to zero.
- AVG: The threshold is set to the average of the elements in the Fiedler vector.
- MEDIAN: The threshold is set to the median of the elements in the Fiedler vector.
- GAP: The threshold is set to the element in the Fiedler vector with the largest gap or difference to the previous element according to their numerical ordering.
An example of how to apply this community detection algorithm, using the median as threshold value, is shown below:
AttributeNetwork network = ...
EIG1CommunityDetector detector = new EIG1CommunityDetector(network, ThresholdType.MEDIAN);
detector.compute();
Vector communities = detector.getBest();
References:
- Hagen, L., & Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE transactions on computer-aided design of integrated circuits and systems, 11(9), 1074-1085. DOI: 10.1109/43.159993.
Overlapping community detection
The previously described algorithms assume that nodes can only belong to a community and that these communities never overlap. However, we can see that this assumption is a simplification of the real world. For example, a person in a social network can belong to different social groups, including different groups of friends and different groups of co-workers.
The algorithms described in this section are more flexible and allow nodes to belong to multiple communities at the same time. Therefore, these algorithms can discover the different groups each node belongs to by detecting overlapping communities in the network.
BIGCLAM algorithm
The Cluster Affiliation Model for Big Networks (BIGCLAM) is an overlapping community detection method with a high scalability. This algorithm defines a model of affiliation between nodes and communities, which also allows to estimate the probability of link between pairs of nodes. The parameters of this model are estimated by iteratively maximizing the likelihood of explaining the links present in the network.
The following example shows how to detect communities in a network using the BIGCLAM algorithm:
AttributeNetwork network = ...
BigClamCommunityDetector detector = new BigClamCommunityDetector(network);
detector.compute();
Matrix affiliationMatrix = detector.getResults();
The obtained affiliation matrix contains a row for each node in the network and a column for each detected community. The matrix represents the degree of affiliation for each node to each community.
References:
- Yang, J., & Leskovec, J. (2013). Overlapping community detection at scale: a nonnegative matrix factorization approach. In Proceedings of the sixth ACM international conference on Web search and data mining (pp. 587-596). DOI: 10.1145/2433396.2433471.