By Yunkyu Sohn
Since the publication of a landmark paper, which initially introduced an objective measure of good partition, called modularity, modularity maximization approach as a community partition method has been favored by scientists in various disciplines. However, recently, theoretical studies have discovered several serious fundamental problems inhere in the method.
I would like to highlight two major problems among them: i) Resolution limit in community detection: Fortunato and Barthelemy validate that modularity maximization algorithms “fail to identify modules smaller than a scale which depends on the total size of the network and on the degree of interconnectedness of the modules, even in cases where modules are unambiguously defined.” ii) Rugged modularity landscape and picking up a single local optimum: Since a modularity maximization task is known as an NP-hard problem, we use several types of heuristics (spectral, greedy, simulated annealing, random walk or genetic, and etc.) to obtain a single community assignment vector whose modularity value is believed to be located near the global maximum. This approach is based upon an assumption that our objective function (Q(x) where Q is modularity resulted by x community assignment vector) has a single peak so that several community assignment vectors near this peak exhibit very similar community assignments, and thus it is OK to select one of them to know about the best partition (Massen and Doye). However think of a case where the objective function has two peaks (global maxima) at very different locations. In this case, even though we find out a community assignment vector which results a modularity value near the global maximum, the vector doesn’t contain any information about a partition which also results maximal modularity at another location. Unfortunately, several studies have proven that many popular network structures possess very rugged landscapes around the top of their modularity functions (Good et al.; Reichardt and Bornholdt; Sales-Pardo et al.). In other words, the modularity landscapes of those networks have many near-global-optima at various places and we cannot know where our algorithm locates its solution.
A family of solutions (Sales-Pardo et al.; Sawardecker et al.) for this problem use stochastic searching (ex. simulated annealing or random walk) to produce various near-optima community assignment vectors and aggregate the information within these vectors. This approach has an advantage of keeping the information of relevant community assignment vectors having diverse characteristics, yet uses very complicated and lengthy computations to reach the final community assignment vector. Another group of solutions throw away modularity measures and build up noble approach to reveal the best partition (Rosvall and Bergstrom Rosvall and Bergstrom b). The most successful algorithm of this type uses random walks (Rosvall and Bergstrom b) and is shown to work well for several types of artificial networks (Lancichinetti and Fortunato). However since this algorithm dumps the authority of modularity measure, we cannot have objective criteria on a better community assignment vector without prior knowledge for node categories.
The statistical physics community has just started to think about above problems seriously, and no definite answer has been introduced yet. We may have to care much about the stability of our community assignment vectors and refine the available methods to obtain better criteria adjusted for the purpose of the study.