doi:10.1016/j.tcs.2006.06.015

Transcript

1 Theoretical Computer Science 363 (2006) 28 – 42 www.elsevier.com/locate/tcs The worst-case time complexity for generating all maximal cliques and computational experiments , a , b a a ∗ , Akira Tanaka , Haruhisa Takahashi Etsuji Tomita a The University of Electro-Communications, Department of Information and Communication Engineering, Chofugaoka 1-5-1, Chofu, Tokyo 182-8585, Japan b Toyota Techno Service Corporation, Imae 1-21, Hanamotocho, Toyota, Aichi 470–0334, Japan Abstract We present a depth-first search algorithm for generating all maximal cliques of an undirected graph, in which pruning methods are employed as in the Bron–Kerbosch algorithm. All the maximal cliques generated are output in a tree-like form. Subsequently, 3 n/ 3 we prove that its worst-case time complexity is O ( for an n -vertex graph. This is optimal as a function of n , since there exist ) n/ 3 up to 3 n -vertex graph. The algorithm is also demonstrated to run very fast in practice by computational maximal cliques in an experiments. © 2006 Elsevier B.V. All rights reserved. Maximal cliques; Enumeration; Worst-case time complexity; Computational experiments Keywords: 1. Introduction G In an undirected graph clique is a complete subgraph of G , i.e., a subgraph in which any two vertices are adjacent. ,a The set of vertices of a maximal clique of the complementary graph of G is a maximal independent set of G . Generating maximal cliques or maximal independent sets of a given graph is one of the fundamental problems in the theory of graphs, and such a generation has many diverse applications, e.g., in clustering and in bioinformatics [15,16,8]. A number of algorithms have been presented and evaluated experimentally or theoretically for this problem [5–7,9,26,13,19,12]. In particular, Pardalas and Xue [19], and Bomze et al. [5] conducted surveys on the maximum clique problem. Bron and all the maximal cliques of a graph. They showed Kerbosch [6] presented a depth-first search algorithm for generating experimentally that the computing time per clique is almost independent of the graph size for random graphs and that the n/ 3 . 14 total computing time is proportional to ( 3 ) for Moon – Moser graphs [17] of n vertices. Furthermore, Johnston [9] and Koch [12] presented some variations of the Bron–Kerbosch algorithm together with a considerable number of com- putational experiments. No results, however, were ever published on the theoretical time complexity for these algorithms. On the other hand, Tsukiyama et al. [26] devised an algorithm for generating all the maximal independent sets in a graph G in O (nm  ) -time, where n , m , and  are the number of vertices, edges, and maximal independent sets of G , respec- tively. Lawler et al. [13] generalized this result further. Chiba and Nishizeki [7] improved the algorithm of Tsukiyama ∗ Corresponding author. E-mail addresses: [email protected] (E. Tomita), [email protected] (A. Tanaka), [email protected] (H. Takahashi). 0304-3975/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2006.06.015

2 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 29 42 G in O  ) -time, where a(G) et al. and presented a more efficient algorithm for listing all the maximal cliques of (a(G)m 1 / 2 with G O (m a(G) is the arboricity of  G  is the number of maximal cliques in G . for a connected graph ) and Recently, Makino and Uno [14] presented new algorithms, which are based on the algorithm of Tsukiyama 4 ( in O G et al. [26]. One of their algorithms enumerates all the maximal cliques of  -time, where  is the maximal  ) (Theorem 2 of [14]). They showed by using computational experiments that the algorithms is considerably degree of G faster than that of Tsukiyama et al. [26] for sparse graphs. We present here a depth-first search algorithm for generating all the maximal cliques of an undirected graph, in which pruning methods are employed as in the Bron–Kerbosch algorithm [6]. All the maximal cliques generated are 3 n/ 3 output in a tree-like form. Subsequently, we prove that its worst-case running time complexity is O ( for a graph ) 3 n/ , since there exist up to 3 n vertices. This is the best one could hope for as a function of with n maximal cliques in a graph with n vertices as shown by Moon and Moser [17]. Our algorithm differs from that of Bron and Kerbosch in the format of the printed maximal cliques. The difference is important theoretically, since it is essential to the establishment of the optimal worst-case time complexity. It is also very important practically, since it saves space in the output file. The Bron–Kerbosch algorithm and other algorithms that yield the maximal cliques in a straightforward manner may require huge amounts of memory space compared to our algorithm. The time complexity of our algorithm is of special interest when it is compared with the results of Tarjan and 3 n/ 2 ( Trojanowski [21] and Robson [20]. The former presented an O ) -time algorithm for finding only one maximum n 0 . 276 ) -time 2 independent set—one independent set with the largest number of vertices—and the latter showed an O ( algorithm (using exponential space) for the same problem. n , the number of vertices, while the time The time complexity of our algorithm is expressed as a function of complexities of the algorithms developed by Tsukiyama et al. and their successors are expressed as a function of  , the number of maximal cliques (or maximal independent sets); therefore a direct theoretical comparison of these algorithms is difficult. We have performed extensive computational experiments and demonstrated that our algorithm is much faster in practice than those of Tsukiyama et al. and their successors. Earlier versions of this paper appeared in Tomita et al. [23–25]; in particular, the version of Tomita et al. [23] was reviewed by Pardalos and Xue [19] and Bomze et al. [5] and received considerable attention. 2. Preliminaries [1] graph G = (V , E) with a finite set V of vertices Throughout this paper, we are concerned with a simple undirected w E unordered pairs and a finite set of distinct vertices, called edges . A pair of vertices v and of are said to (v, w) be adjacent if (v, w) ∈ E . [2] For a vertex v ∈ V , let  (v) be the set of all vertices that are adjacent to v in G = (V , E) , i.e.,  (v) ={ w ∈ V | (v, w) ∈ } (   v) . E ∈ For the subset V of vertices, G(W ) = (W, E(W )) with E(W) ={ (v, w) ⊆ W × W | (v, w) ∈ E } is called [3] W subgraph of G = a induced (V , E) W .Foraset W of vertices, | W | denotes the number of elements in W . by complete [4] ⊆ V of vertices, the induced subgraph G(Q) is said to be Q if (v, w) ∈ E for all Given the subset v, w ∈ Q with v  = w . In this case, we may simply state that Q is a complete subgraph. A complete subgraph is also called a clique maximal clique. . If a clique is not a proper subgraph of another clique then it is called a 3. The algorithm We consider a depth-first search algorithm for generating all the maximal cliques of a given graph = (V , E) G  =∅ ) . (V Q of a set of vertices that constitutes a complete subgraph found up to this Here, we introduce a global variable Q be an empty set, and expands Q time. The algorithm begins by letting step by step by applying a recursive procedure EXPAND to V and its succeeding induced subgraphs to search for larger and larger complete subgraphs until they reach maximal ones. Let Q ={ p ,...,p } be a complete subgraph found at some stage, and consider the set of vertices ,p 1 2 d SUBG = V ∩  (p ) ), ∩  (p (p ) ∩···∩  2 d 1

3 30 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – SUBG = and Q =∅ at the initial stage. Apply the procedure EXPAND to SUBG to search for larger complete where V =∅ then is clearly a maximal SUBG subgraphs. If Q clique Q ∪{ q } maximal . Otherwise, complete subgraph ,ora SUBG is a larger complete subgraph for every G( SUBG q ∈ . Now, consider the smaller subgraphs that are induced ) q by new sets of vertices SUBG SUBG ∩  (q) = q q for all ; further apply recursively the same procedure EXPAND to SUBG SUBG ∈ to find larger complete subgraphs q containing ∪{ q } . Q Thus far, we have described only a well-known basic framework of the algorithm for generating all the maximal cliques (with possible duplication). This process can be represented by the following search forest, or the collection of of graph G = (V , E) . For each search trees: the set of roots of the search forest is exactly the same as ∈ SUBG , V q SUBG all the vertices in q . Thus, a set of vertices along a path from the root to any (defined above) are children of q vertex of the search forest constitutes a complete subgraph or a clique. We shall give an example of a search forest (with unnecessary subtrees deleted) later in Fig. 3(b). Now we describe two methods to prune unnecessary parts of the search forest, which are found to be the same as in SUBG (  =∅ )asan ordered the Bron–Kerbosch algorithm [6]. We regard the previously described set set of vertices, and we continue to generate maximal cliques from the vertices in stepwise in this order. SUBG FINI First, let that have already been processed by the algorithm ( FINI is short for be a subset of vertices of SUBG FINI ”). We then denote the set of remaining candidates for expansion by CAND = SUBG − : . Hence, CAND finished “ we have FINI ∪ SUBG ( FINI ∩ CAND =∅ ). = CAND =∅ at the beginning. Consider the subgraph G( SUBG FINI ) with SUBG as defined above, and let q q SUBG FINI ∪ CAND = FINI ∩ CAND =∅ ), ( q q q q q where FINI = FINI ∩  (q) and CAND (q). = CAND ∩  q q CAND Following this, only the vertices in can be candidates for expanding the complete subgraph ∪{ q } to find new Q q larger cliques, since all the cliques containing (Q ∪{ q } ) ∪{ r } with r ∈ FINI ⊆ have already been generated FINI q r by application of the procedure EXPAND to as stated above (see Fig. 1). for any FINI ∈ have been , consider that all the maximal cliques containing Q ∪{ u } u Secondly, given a certain vertex SUBG } new , but not Q ∪{ u Q , must contain at least one vertex q ∈ generated. Then, every maximal clique containing −  (u) . This is because if Q is expanded to a complete subgraph R = (Q ∪ S) SUBG ( SUBG −{ u } ) with S ⊆ ∩ SUBG  (u) , then R ∪{ u } is a larger complete subgraph, and hence R is not maximal . Thus, any new maximal clique ∩ q Q Q ∪{ q } such that to ∈ SUBG −  (u) , and by subsequently generating all the cliques can be found by expanding containing Q ∪{ q } . Taking the previously described pruning method also into consideration, the only search subtrees to be expanded are from the vertices in SUBG − SUBG ∩  (u)) ( FINI − CAND −  (u) . Here, in order to minimize | CAND −  (u) | ,we = ∩ u SUBG as the one that maximizes | CAND ∈  (u) | . This is essential for the proof of Lemma 2(ii), choose a vertex and consequently the main theorem, i.e., Theorem 3. In this manner, the problem of generating all maximal cliques of G( CAND ) can be decomposed into k =| CAND −  (u) | such subproblems; see Lemma 2(i) in Section 4. We present an algorithm CLIQUES for generating all the maximal cliques without duplication in Fig. 2. Note here SUBG is not empty but CAND −  (u) is that some branches may not generate a maximal clique, as in the case where empty. If Q is a maximal clique that is found at statement 2, then the algorithm only prints out a string of characters “ clique ,” n/ 3 instead of itself at statement 3. Otherwise, it is impossible to achieve the worst-case running time of O ( 3 Q ) for an n -vertex graph, since the printing of Q itself requires time proportional to the size of Q . Instead, in addition to printing

4 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 42 31 Fig. 1. An illustration of the procedure EXPAND. Fig. 2. Algorithm CLIQUES.

5 32 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – G ; (b) A search forest for ; and (c) A resulting printed sequence. Fig. 3. An example (a) An input graph G followed by a at statement 7 every time q is selected as a new element of a q comma “clique" at statement 3, we print ,” at statement 12 after q is moved from CAND to FINI larger clique; further we print out a string of characters “ back at statement 11. We can easily obtain a tree representation of all the maximal cliques from the sequence printed by ′ ′ ′ statements 3, 7, and 12. In Fig. 2, the primed statements (0 , and 12 ,7 ) are solely for the sake of explanation. Example. Let us apply the algorithm CLIQUES to the graph in Fig. 3(a). The whole process is represented by the with appropriate indentations .In search forest in Fig. 3(b), and we show the resulting printed sequence in Fig. 3(c) Fig. 3(b), each set of vertices surrounded by a flat circle represents SUBG mark is in at that stage. A vertex with a ⊆ u chosen at statement 4 is marked by  FINI at the beginning. The vertex SUBG  depending on whether it is ◦ or CAND or in , respectively. Other vertices in CAND −  (u) are marked by ◦ , while vertices in CAND ∩  (u) are FINI marked by • . As a result, all the maximal cliques of G are { 4 , 6 , 7 , 8 } , { 4 , 6 , 5 } , { 4 , 3 , 8 } , { 1 , 2 , 9 } , and { 2 , 3 , 9 } . Given only the resulting printed sequence in Fig. 3(c) without indentations, we can easily obtain essentially the same result as above by reconstructing from it a tree that represents a principal component of the previous search forest in Fig. 3(b). Here, a dot · (/ ∈ V) is introduced as a virtual root of the tree. Subsequently, every time “ q ,” is encountered in the sequence, we expand a downward edge whose end-point is labeled by q . If and only if “ q ,” is followed by “ clique ,”, the set of all the vertices along the path from the root to the vertex q excluding the root ( · ) represents a maximal clique. Every time “ back ,” is encountered in the sequence, we go up the tree backward by one edge to find other maximal

6 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 42 33 2 4 9 1 32 3 6 75899 cl. cl. cl. cl. cl. " is an abbreviation for clique " 8 cl. Fig. 4. A tree representation of the result of the example in Fig. 3. cliques. See Fig. 4 for a tree representation of the result of the Example. It is clear that this transformation can be done in a time that is proportional to the length of the resulting sequence. We conclude this section by the following theorem. Correctness of the algorithm CLIQUES ). Given a graph G = (V , E) (V Theorem 1 =∅ ) , the algorithm CLIQUES (  . generates all and only maximal cliques without duplication Proof. See Appendix A.1. 4. The worst-case time complexity = (V , E) with V  =∅ Given CLIQUES (G) . This is G , we evaluate the worst-case running time of the algorithm (V , V ) . equivalent to evaluating the worst-case running time of EXPAND We begin with the following definitions. T (n) be the worst-case running time of EXPAND( SUBG , CAND ) when | SUBG |= n(n  1 ) . [1] Let T [2] Let − be the worst-case running time of EXPAND( , CAND ) when | SUBG |= n , and | EXT SUBG |=| (n) CAND u k |= at the first entrance to statement 5. (u)  k [3] Let us consider a nonrecursive procedure EXPAND ( , CAND ) that is obtained from EXPAND( SUBG , SUBG 0 ′ ) by replacing a recursive call 10: EXPAND( CAND SUBG CAND . The running time of )by10 :EXPAND , ( ∅ , ∅ ) q q 2 EXPAND ( SUBG CAND ) when | SUBG |= n can be made to be O (n , ) (in which, selection of u at statement 4 0 2 CAND ) ), and so we assume that the running time of EXPAND ( SUBG , ) is bounded above by can be done in O (n 0 the following quadratic formula 2 P (n) = p 0 where p . > n 1 1 From the above definitions, we have T (n) max = { T , (n) } (1) k k where 1  k | CAND | .  The following lemma is a key for evaluating T (n) . EXT Lemma 2. ( SUBG , CAND ) when | SUBG |= n , | EXPAND Consider |= CAND −  (u) |=| k  = 0, and u | CAND ∩  (u) |=| CAND |− k at the first entrance to statement 5. In what follows , CAND stands exclusively for this (u) initial value although it is repeatedly decreased at statement 11 in the while loop. Let CAND −  , ={ v ,v ,...,v } 1 2 k

7 34 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – Fig. 5. An illustration for Lemma 2. and let the vertex at statement 6 be chosen in this order. Let ), SUBG  (v = ∩ and SUBG i v i = CAND ∩  (v ) CAND i v i See Fig . 5.) ( . Then the following hold ∑ k | SUBG | ) . P (n) ( + ) | T( | SUBG  (i) T v k i i 1 = (ii) | SUBG |  n − k  n − 1. v i (i) This is obvious from the procedure EXPAND Proof. SUBG , CAND ) and the definition of P (n) . ( (ii) Since the vertex in SUBG is chosen so that it maximizes | CAND ∩  (u) | we have | CAND ∩  (v u )  | CAND ∩ | i (u) |=| CAND |− k . Hence, | SUBG  |+| ) SUBG ∩  (v ∩ CAND |=| FINI ∩  (v |+| ) |=| CAND ∩  (v FINI ) |  | i i v i i  1.  ) |  (n −| CAND | ) + ( | CAND |− k) = n − k  n − 1, since k (v  i For all n 1 Theorem 3.  3 n/ − Q(n) ≡ R(n), (2) 3 T (n)  C where 2 n + q n + q , = q Q(n) 3 2 1 with p 2 , / 2 > 0 ,q 0 = 9 p > / = > 0 ,q 2 = 27 p / q 1 1 3 1 2 1 and , } ,C ,C C max = C { 2 1 3 2 − 1 − 3 / 3 1 / = q 3 · / 2ln3, C 3 = 39 p ) /( 2 · 3 / ln 3 = ) , and C 27 being the maximum over n of p ( 1 − 2 · 3 with C 2 1 1 3 2 1 3 n/ n/ 3 0 Q(n is .( Note that C − 3 )/ 3 , as n tends to infinity. Hence is bounded above , since it approaches 3 )/ 3 − Q(n 3 well-defined .)

8 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – 35 n/ 3 R(n) , − Q(n) is monotone increasing with R(n)  0 for all integers n  0. 3 ≡ C Here  R(x) x being a real variable and the fact that C with C function continuous By considering a Proof. can be , R(x) 1  R(n) is monotone increasing for easily proved to be monotone increasing for x 0 (Lemma A.2 in Appendix). Hence, / 1 1 / 3 3 3 1 / C = ) 1 R( 0. Furthermore,  n all integers 3 − )  − 37 p . / 2 = p ) , since C  C C = 39 p Q( /( 2 · 3 3 1 1 1 1 2 2  Therefore, R(n) p 0 for all integers n  > 1. 1 Now, we prove that inequality ( 2) holds by induction on n . 1, since = n . Therefore, inequality (2) holds for P (n) by the definition of ) Basis 1 ) = p = 1 T( 1. We have = n : P( 1 1 p ) R(  . 1   n n N , and prove that it also holds for : We assume that inequality (2) holds for all integers Induction step ,1 N + 1. n = EXT SUBG CAND ( when | SUBG |= n = N + 1, | , Consider EXPAND ) |=| −  (u) |= CAND  = 0 with k u −  (u) ={ v CAND ,v } at the first entrance to statement 5. Then, just as in Lemma 2(i), we have ,...,v k 2 1 k ∑ T (n) ( | SUBG | ) P (n), T  + = ) T( | SUBG | v k k i 1 i = | SUBG where by Lemma 2(ii). Then, by the induction hypothesis we have n − 1 = N  | v i k k ∑ ∑ ). | | )  T( | SUBG R( | SUBG v v i i i i = = 1 1 Since is monotone increasing and SUBG R(n) | | n − k ,wehave  v i k ∑ k). R( | SUBG − kR(n | )  v i 1 = i Combining these inequalities yields T  − k) + P (n) (n) kR(n k 3 (n − k)/ 3 ≡ kC − kQ(n − k) + P (n) 3 n/ − k/ 3 3 k = . −{ kQ(n − k) − P (n) } 3 (3) · C k = 3, we have In the case n/ 3 T C 3 3 (n) −{  Q(n − 3 ) − P (n) } . 3 Now, consider the case where  = 3 (with k  1). We shall show that k 3 − k/ 3 n/ 3 n/ k 3 · −{ kQ(n − k) − P (n) C  C 3 3 } −{ 3 Q(n − 3 ) − P (n) } (4) for all integers n 1, provided that C  C  . Modifying inequality ( 4), the problem is equivalent to proving the following 3 inequality. kQ(n ) 3 − − k) Q(n 3 − C 3 = .  k where  (5) 3 n/ 3 − k/ 3 k 1 ( − 3 · ) Here, kQ(n − k)  0 for 1  k  n . Therefore, for the left-hand side of inequality ( 5), we have 3 − 3 ) − kQ(n − k) Q(n ) 3 − Q(n 3 (6) . C   3 3 3 − / 2 − k/ 3 n/ 3 n/ 1 · 3 − k 3 ( ) · 3 ( 1 − 3 · ) 2

9 36 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – k/ 3 − 2 / 3 − k Here, the first inequality holds since  2 · (Lemma A.3 in Appendix) and the second inequality comes 3 3 C from the definition of . 3  C Now from the fact that C  4) holds for all integers , inequality ( 1 and 1  k n n .  3 Combining inequalities (3) and (4) yields 3 n/ 3 n/ 3 −  −{ 3 Q(n − 3 ) − P (n) }= C 3 (n) C Q(n)). Q(n) ( from the definition of T k Substituting this inequality into Eq. ( 1), we have 3 n/ C 3  T (n) − Q(n). 2) also holds for n = N + 1. Therefore, inequality (2) holds for all integers n  1. Hence the Thus, inequality (  result. n/ 3 ) for an n -vertex ( 3 G )isO Therefore, we conclude that the worst-case running time of the algorithm CLIQUES( G = (V , E) . graph 3 n/ (n 3 Here, we note that if we output a list of all the individual maximal cliques, it takes O ) -time in the worst case. 5. Computational experiments In addition to the preceding theoretical analysis of the algorithm CLIQUES, we have implemented CLIQUES in the programming language C and have performed computational experiments to evaluate it in practice [18]. The computer used had a Pentium4 2.20 GHz CPU with 2 GB main memory and a Linux operating system. The compiler and flags used are gcc-O2. These are the same as those in [22]. (So, Clique Benchmark Results in Appendix of [22] and dfmax CPU time for DIMACS benchmark graphs in Table 3 of [22] can be effectively used for calibrating CPU times in different computer environments.) For comparison, we have also implemented and evaluated the algorithm CLIQUE of Chiba and Nishizeki [7], the algorithms A AX C LIQUES (AMC for short) and A LL M AX C LIQUES LL M * (AMC* for short) of Makino and Uno [14] in the same manner as above. In these experiments, the algorithms other than CLIQUES are modified to yield only the number of maximal cliques in order to exclude the requirement of a huge amount of memory space for listing all the maximal cliques. ∗  It is to be noted that the working of AMC* of [14] depends on the value of G such that only for the input graph ∗  ) of vertices in G have a degree larger than  some number ( . To evaluate the overall performance of AMC* that ∗ is independent of the input graphs, we set  = n/ 100 for any input graph of n vertices. This is because it is not ∗ for an unknown graph. In addition, we have confirmed in the  necessarily easy to choose an appropriate value of ∗ following experiments that AMC* with  n/ 100 is much faster than the basic algorithm of Makino and Uno that = is based on Theorem 2 of [14]; that was used in their computational experiments in Section 8 of [14]. Moreover, this ∗  AMC* is faster than AMC*s with other settings of for a wide variety of graphs. First, a random graph is generated for each pair of p (edge probability) in Table 1 so n (the number of vertices) and p  cliques” in Table 1 shows the that every pair of vertices has an edge with a probability . The third column titled “ number of maximal cliques of each random graph generated as discussed above. The CPU time in seconds needed to yield the number of maximal cliques for each random graph using CLIQUE [7], AMC, and AMC* [14] are listed in ( 24 h = 86 , 400 sec ) . The last but one column presents the CPU time the fourth, fifth, and sixth columns, respectively using CLIQUES for each random graph. The bold-faced entries are the fastest in a given row. The final column titled 6 6 “/cliques” shows the average CPU time in seconds for generating 10  cliques) × 10 maximal cliques, i.e., (CPU time/ , for each graph. Table 2 shows the corresponding CPU time using CLIQUE, AMC, AMC*, and CLIQUES for Moon–Moser graphs [17] and DIMACS benchmark graphs [10], where M–M– n stands for Moon–Moser graphs of n vertices, and den- sity stands for edge density of the graph, i.e., (the number of edges in the graph)/ { n(n − 1 )/ 2 } for an n -vertex graph. To compare our experimental results with those of [14], we have also examined sparse locally random graphs proposed in [14]. Their locally random graphs are generated such that edges exist randomly only in certain limited

10 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 37 42 Table 1 CPU time (sec) for random graphs AMC AMC* CLIQUES /cliques Graphs CLIQUE cliques np [14] [7]  [14] 2.81 0.088 1.30 10.38 1.60 0.6 65,216 107.13 16.64 9.52 0.54 100 0.7 415,412 1.21 2,035.18 179.51 112.06 6.63 1.05 5,467,664 0.8 > 24 h 7,434.26 6,533.98 0.9 4.41 313,069,088 330.19 3,882 0.26 0.067 0.0041 1.06 0.1 0.62 5.33 1.04 0.022 1.17 18,737 3.26 0.2 41.28 26.77 9.55 0.14 0.3 96,298 1.45 0.4 559,838 364.89 197.20 78.26 0.88 1.57 300 4,874,385 5,645.05 0.5 789.23 8.54 1.75 1,759.61 0.6 > 24 h 24,104.49 12,937.56 140.75 1.06 132,240,024 3,356,452,714 > > 24 h > 24 h 6,279.51 1.87 0.7 24 h 6.04 0.69 15,252 0.018 1.18 0.1 3.13 0.2 72.84 48.60 14.17 0.13 1.31 500 99,259 728,567 812.10 814.98 196.63 1.19 1.63 0.3 97,419,729 0.5 24 h > 24 h 42,612.27 208.16 2.14 > 37,563 21.86 3.15 0.051 1.36 0.1 29.91 0.2 325,479 485.38 367.96 700 0.51 1.57 98.64 0.3 3,094,828 9,197.77 5,201.25 1,806.24 5.42 1.75 0.5 917,376,496 > 24 h > 24 h > 24 h 2,144.31 2.34 0.1 99,062 143.03 19.39 0.21 2.12 179.80 0.2 1,183,584 4,486.30 829.52 2.25 1.90 1,000 3,750.59 > 20,615.89 > 24 h 15,362,096 33.18 2.16 0.3 24 h 747,300 6,384.56 10,149.20 665.59 2.32 3.10 2,000 0.1 0.1 2,945,211 > 24 h 5,905.18 11.26 3.82 3,000 > 24 h 86.60 4.67 5,000 > 24 h 0.1 18,483,855 49,738 13.30 10.86 218.34 0.001 282.40 141,651 2,335.88 111.78 11.18 78.93 0.003 0.005 215,129 364.29 11.74 54.57 6,138.69 10,000 348,552 22,426.63 645.75 14.78 42.40 0.01 0.03 3,738,814 > 24 h 11,315.03 41.29 11.04 0.05 12,139,182 24 h > 24 h 109.78 9.04 > 42,572,404 24 h > 24 h 338.23 7.94 0.07 > 229,786,397 > 24 h > 0.1 1,825.45 7.94 24 h local regions. In other words, given n and r , a locally random graph with n vertices is generated such that different vertices v . and v r are adjacent with a probability 1 / 2if i + n − j( mod n)  r or j + n − i( mod n)  j i Table 3 shows the results for (a) r = 10 and (b) r = 30. Table 4 shows the CPU time for sparse locally random graphs with 10,000 vertices for several values of r . From the above experimental results, it is observed that CLIQUES is considerably faster than CLIQUE. Further, it is also confirmed to be considerably faster than the algorithm of Tsukiyama et al. [26] via the computational results in Section 8 of [14]. CLIQUES is faster than AMC and AMC* for most of the graphs tested here, while AMC and AMC* can be faster than CLIQUES when the number of maximal cliques is extremely small (e.g., c-fat200-5, c-fat500-10). ∗ AMC* can be faster than CLIQUES when the input graph is very sparse with localized edges, especially if  is chosen appropriately according to the individual graph. Makino and Uno [14] reported interesting results for very large sparse

11 38 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – Table 2 CPU time (sec) for Moon–Moser (M–M) graphs and DIMACS benchmark graphs AMC AMC* CLIQUES /cliques Graphs CLIQUE cliques [14] [14]  Name [7] ( n , density) 1.89 0.12 0.15 0.020 0.34 M-M-30 59,049 ( 30, 0.931) M-M-45 46.44 67.47 4.80 0.33 14,348,907 1,309.23 ( 45, 0.954) M-M-48 5,057.65 153.21 224.41 12.41 0.29 43,046,721 ( 48, 0.957) 129,140,163 M-M-51 496.76 740.57 32.95 0.26 16,532.50 ( 51, 0.960) 3,486,784,401 24 h 16,585.75 26,152.31 894.90 0.26 M-M-60 > ( 60, 0.966) 10,460,353,203 > 24 h 47,817.37 > 24 h 2,666.90 0.25 M-M-63 ( 63, 0.968) MANN_a9 52.64 2.24 2.93 0.23 0.39 590,887 ( 45, 0.927) 431,586 181.42 75.16 35.91 0.65 1.51 brock200_2 (200, 0.496) c-fat200-5 7 0.30 0.0029 0.0031 0.0054 771.4 (200, 0.426) c-fat500-10 6.62 0.025 0.028 0.058 7250.0 8 (500, 0.374) hamming6-2 301.00 11.97 16.98 1.21 0.94 1,281,402 ( 64, 0.905) 464 0.018 0.0086 0.0056 0.00043 0.93 hamming6-4 ( 64, 0.349) johnson8-4-4 13.85 1.82 1.95 0.11 0.96 114,690 ( 70, 0.768) 2,027,025 907.51 150.68 153.42 4.38 2.16 johnson16-2-4 (120, 0.765) keller4 10,284,321 3,446.61 1,145.66 490.76 4.98 0.49 (171, 0.649) p_hat300-1 19.23 13.52 3.52 0.07 1.20 58,176 (300, 0.244) 99.72 p_hat300-2 24 h 16,035.71 4,130.20 79,917,408 1.25 > (300, 0.489) bipartite graphs obtained from real world data. Note in CLIQUES, that the ratio of CPU time to the number of cliques varies depending on the graphs; the variation is generally not large, except for a few cases. 6. Concluding remarks We have presented an algorithm for generating all the maximal cliques in a graph of n vertices and have proved n/ 3 that its worst-case time complexity is O 3 ( ) that is optimal as a function of n . It is expected to be very complicated to analyze the time complexity of our algorithm theoretically as a function of the number of maximal cliques in the graph. The algorithm CLIQUES is very simple and consequently easy to implement. Our computational experiments demonstrate that CLIQUES runs very fast in practice. In addition, our depth-first search algorithm can be coupled with some heuristics to yield fast algorithms MCQ and others for finding a maximum clique [22]. They have been successfully applied to interesting problems in bioinformatics [1–4], the design of DNA and RNA sequences for biomolecular computation [11], and other problems.

12 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 39 42 Table 3 CPU time (sec) for sparse locally random graphs AMC AMC* CLIQUES /cliques Graphs CLIQUE  n [14] [14] cliques Density [7] 10 = r (a) 2.38 0.45 1,000 0.016 3.57 0.010 4,487 0.04 9,369 18.76 4.70 0.38 0.075 2,000 0.0051 8.01 0.0025 31.48 1.55 0.86 44.87 4,000 19,166 29,192 3.30 3.56 0.0017 113.04 6,000 88.91 0.0013 160.16 6.25 6.25 159.52 8,000 39,179 0.0010 48,975 261.27 10,000 10.49 214.19 9.51 (b) = 30 r 0.030 13,043 3.38 0.38 0.025 1.92 1,000 8.93 25,803 1.49 44.79 0.016 0.10 3.88 2,000 57.13 53,059 243.66 9.31 0.89 16.77 4,000 0.0075 0.0050 81,145 6,000 25.35 3.11 38.33 693.60 8,000 110,354 1,271.13 46.92 5.98 54.19 0.0038 0.0030 73.55 2,075.70 10,000 10.02 71.93 139,304 Table 4 n = 10 , 000 CPU time (sec) for sparse locally random graphs with Graphs AMC AMC* CLIQUES /cliques r Density cliques [14] [14]  0.001 48,975 9.51 10.49 214.19 10 261.27 95,120 10.20 49.45 0.002 107.23 20 952.25 0.004 3,601.09 130.76 9.90 54.57 40 181,424 0.008 386,360 14,448.21 431.20 80 28.34 10.95 120 746,852 35,866.69 530.53 12.97 17.37 0.012 0.016 1,066.62 > 24 h 160 16.85 11.96 1,408,360 240 4,306,123 > 24 h 4,350.94 33.75 7.84 0.024 320 0.032 11,488,405 > 24 h 15,655.05 65.06 5.66 480 0.048 > 24 h > 24 h 293.97 4.42 66,513,724 Acknowledgments The authors are grateful to T. Nakagawa for his contributions to the computational experiments [18]. They would also like to acknowledge useful comments by S. Tsukiyama, T. Nishizeki, T. Uno, K. Makino, E. Harley, D. Hochbaum, E.L. Lawler, R. Karp, J. Tarui, T. Akutsu, and the referees. This research has been supported in part by Grants- in-Aid for Scientific Research nos. 13680435 and 16300001 from the Ministry of Education, Culture, Sports, Science and Technology, Japan, and the Research Fund of the University of Electro-Communications. The research was also provided with a grant by Funai Foundation for Information Technology. Appendix A.1. Proof of Theorem 1 (Correctness of the algorithm CLIQUES) Q ={ p As in Fig. 1, let ,...,p } be a complete subgraph that has been found at some stage, and consider a set ,p 1 2 d of vertices SUBG V ∩  (p = ), ) ∩  (p (p ) ∩···∩  2 1 d where SUBG = V 0. Q =∅ when d = and

13 40 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – Assume that SUBG ∪ CAND (  =∅ ), = FINI ′ ′ } have been generated without duplication for every q ∈ FINI . ∪{ and that all and only maximal cliques containing Q q CAND u in SUBG be chosen at statement 4 in Fig. 2. In what , Then we consider EXPAND( SUBG ), and let a vertex initial value at the first entrance to statement 5. Further, let follows, CAND stands exclusively for this ,v ,...,v }  =∅ (u) v ={ CAND  − 2 1 k . Let ,...,v ,v and vertex v at statement 6 be chosen in the order q 1 k 2 ∩  (v SUBG ), = SUBG i v i CAND = CAND ∩  (v and ), i v i (i) CAND −{ ,...,v = ,v . CAND } v 1 i 2 1 − (See Fig. 5.) Then, we have the next claim:  : CAND −  (u) ( 1 ∈ i  k) chosen as q at statement 6, the following holds Claim A.1. For every (i) v i (i) The expansion from vertex v CAND ) generates all and only maximal cliques by an application of EXPAND(SUBG , i without duplication that contain Q ∪{ v and does not contain any vertex in FINI ∪{ } . ,v } ,...,v v i − 1 2 1 i When all the maximal cliques in G(Q ∪{ q } ), q ∈ FINI ∪ ( CAND −  (u)) (ii) , we have finished being generated have no new maximal cliques that contain Q . We prove the claim by induction on n SUBG | . Proof of Claim A.1. =|  n  2, the properties are easily confirmed to hold by the analysis of each case. Basis : In the case 1 : We assume that the properties hold for all positive integers n  N Induction step , and prove that they also hold for n N + 1. = 1 ( ) CAND i = (i) First, we are concerned with the case where 1 and = CAND . maximal is clearly a } v =∅ :“ clique ,” is printed after “ v ∪{ ,”. The generated set of vertices Q SUBG (a) Case 1 v 1 1 FINI . clique and does not contain any vertex in ′ SUBG (b) Case SUBG u ). ∈  =∅ be chosen at statement 4, and consider EXPAND( SUBG : let a vertex , CAND v v v v 1 1 1 1 ′ CAND (b-1) If , then application of EXPAND(  )  =∅ − SUBG ) guarantees to generate without , CAND (u v v v 1 1 1 } and does not contain any vertex in FINI = duplication all and only maximal cliques that contain v ∪{ Q 1 SUBG CAND , since | SUBG − . |  N and the induction hypothesis then holds true for SUBG v v 1 1 ′ (b-2) If CAND −  (u ,” ) =∅ , application of EXPAND( SUBG back ) does nothing but prints “ , CAND v v v 1 1 1 since maximal clique regarded as a ,”. Then, the set of vertices Q ∪{ v not } is immediately after “ v 1 1 SUBG ∪{ . This is because there exist no new maximal cliques containing =∅  v } than the previously Q 1 v 1 generated ones. After the application of EXPAND( SUBG v CAND at statement 11. ), the vertex , is moved from CAND to FINI v 1 v 1 1 , and the above ,...,k 3 , 2 = i at the following stages step by step for }  =∅ , then we choose v −{ (u)  − CAND If v i 1 argument follows. (ii) All maximal cliques that contain are in Q ∪ SUBG . We are under the assumption that all maximal cliques in Q ∈ ∪{ G(Q ) , q q SUBG −  (u) have finished being generated, then any remaining possibility of a new maximal clique } R is such that R = Q ∪ S for some S ⊆ SUBG ∩  (u) . However, R ∪{ u } is a larger clique, and this implies that R is not . Hence, the result follows. maximal Summarizing the above, the claim has been shown to hold for any positive integer n =| SUBG | as a consequence of induction. 

14 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 – 42 41 (u) =∅ ) and SUBG = CAND = V( FINI =∅ ) , then consideration of CAND −  0 = (d Nowifwelet = Q { v ,v } ,...,v  gives the proof of Theorem 1. 1 2 k ) . Consider the following continuous function R(x) Lemma A.2 ( For the first part of Proof of Theorem 3 in Section 4 with x being a real variable : x/ 3 = C 3 R(x) Q(x), − where 2 and + x . x + q 3 , with q as in Theorem ,q q , q = Q(x) q 2 1 3 2 1 3 x then 0. = 3 q  / ln 3, is monotone increasing for R(x) C  If C 1 2 Proof. Differentiating the function R(x) ,wehave x/ 3 − 1 ln 3 · C 3 d R(x)/ d x = 2 x − q . − q 2 1 In addition, 2 2 3 3 − 2 2 x/ 3 − 2 2 x/ x/ x / = − 2 q ,  ( ln 3 ) ( · C 0 3 ln 3 ) R(x)/ · − 2 q  ={ ( 3ln3 C 2 ) 3 3 d − 1 }· p x > 0 for d 1 1 1 1 > x R(x)/ d x is monotone increasing for 0. Hence, d  0. Thus, for x  0, p since 1 . 3 = = ln 3 · C/ 3 − q q  ln 3 · C − / 0 R(x)/ d x ] d R(x)/ d x  [ d 1 2 = x 2 0 is monotone increasing for  0.  R(x) x Therefore, For the denominator of inequality ( 6) in Proof of Theorem 3 in Section 4 Lemma A.3 . For all positive integers k  = 3, ( ) it holds that 3 − k/ 3 / 2 − k 3  3 . · 2 f(x) Consider the following continuous function x being a real variable: Proof. with − x/ 3 f(x) = x 3 . − x/ 3 x> 0 for x< ln 3, / 3 ( 1 − x · ln 3 / 3 ) . Then d f(x)/ d x> 0 for x< 3 / ln 3, d f(x)/ d 3 = d f(x)/ Here, we have d x [ f(x)/ d x ] and d ..., f(x) 7307 . f(x) = 0. Subsequently, is is monotone increasing for x< 3 / ln 3 = 2 / 3 = x ln 3 monotone decreasing for x> 3 / ln 3, and f(x)  f( 3 / ln 3 ) for all x . In addition, f( 2 ) = 0 . 9614 ...>f( 4 ) = 0 . ... . Thus, the result is obtained.  9244 References [1] T. Akutsu, M. Hayashida, E. Tomita, J. Suzuki, K. Horimoto, Protein threading with profiles and constraints, Proc. IEEE Symp. Bioinform. Bioeng. (2004) 537–544; T. Akutsu, M. Hayashida, D.K.C. Bahadur, E. Tomita, J. Suzuki, K. Horimoto, Dynamic programming and clique based approaches for protein threading with profiles and constraints, IEICE Trans. Fund. Electron. Commun. Comput. Sci. E89-A (2006) 1215–1222. [2] D.K.C. Bahadur, T. Akutsu, E. Tomita, T. Seki, A. Fujiyama, Point matching under non-uniform distortions and protein side chain packing based on efficient maximum clique algorithms, Genome Inform. 13 (2002) 143–152. [3] D.K.C. Bahadur, E. Tomita, J. Suzuki, T. Akutsu, Protein side-chain packing problem: a maximum edge-weight clique algorithmic approach, J. Bioinform. Comput. Biol. 3 (2005) 103–126. D.K.C. Bahadur, E. Tomita, J. Suzuki, K. Horimoto, T. Akutsu, Protein threading with profiles and distance constraints using clique based [4] algorithms, J. Bioinform. Comput. Biol. 4 (2006) 19–42. [5] I.M. Bomze, M. Budinich, P.M. Pardalos, M. Pelillo, The maximum clique problem, in: D.-Z. Du, P.M. Pardalos (Eds.), Handbook of Combinatorial Optimization, Supplement Vol. A, Kluwer Academic Publishers, Dordrecht, 1999, pp. 1–74. [6] C. Bron, J. Kerbosch, Algorithm 457, finding all cliques of an undirected graph, Comm. ACM 16 (1973) 575–577. [7] N. Chiba, T. Nishizeki, Arboricity and subgraph listing algorithms, SIAM J. Comput. 14 (1985) 210–223. [8] M. Hattori, Y. Okuno, S. Goto, M. Kanehisa, Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways, J. Amer. Chem. Soc. 125 (2003) 11853–11865.

15 42 E. Tomita et al. / Theoretical Computer Science 363 (2006) 28 42 – H.C. Johnston, Cliques of a graph—variations on the Bron–Kerbosch algorithm, Int. J. Comput. Inform. Sci. 5 (1976) 209–238. [9] D.S. Johnson, M.A. Trick (Eds.), Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer [10] Science, Vol. 26, American Mathematical Society, Providence, RI, 1996. [11] S. Kobayashi, T. Kondo, K. Okuda, E. Tomita, Extracting globally structure free sequences by local structure freeness, Proc. DNA 9 (2003) 206. [12] I. Koch, Enumerating all connected maximal common subgraphs in two graphs, Theoret. Comput. Sci. 250 (2001) 1–30. E.L. Lawler, J.K. Lenstra, A.H.G. Rinnooy Kan, Generating all maximal independent sets: NP-hardness and polynomial-time algorithms, SIAM [13] J. Comput. 9 (1980) 558–565. [14] K. Makino, T. Uno, New algorithms for enumerating all maximal cliques, SWAT 2004, Lecture Notes in Computer Science, Vol. 3111, 2004, pp. 260–272. [15] S. Mohseni-Zadeh, P. Brézellec, J.-L. Risler, Cluster-C, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques, Comput. Biol. Chem. 28 (2004) 211–218. [16] S. Mohseni-Zadeh, A. Louis, P. Brézellec, J.-L. Risler, PHYTOPROT: a database of clusters of plant proteins, Nucleic Acids Res. 32 (2004) D351–D353. [17] J.W. Moon, L. Moser, On cliques in graphs, Israel J. Math. 3 (1965) 23–28. [18] T. Nakagawa, E. Tomita, Experimental comparisons of algorithms for generating all maximal cliques, Technical Report of IPSJ, 2004-MPS-52, 2004, pp. 41–44. [19] P.M. Pardalos, J. Xue, The maximum clique problem, J. Global Optim. 4 (1994) 301–328. [20] J.M. Robson, Algorithms for maximum independent sets, J. Algorithms 7 (1986) 425–440. [21] R.E. Tarjan, A.E. Trojanowski, Finding a maximum independent set, SIAM J. Comput. 6 (1977) 537–546. [22] E. Tomita, T. Seki, An efficient branch-and-bound algorithm for finding a maximum clique, DMTCS 2003, Lecture Notes in Computer Science, Vol. 2731, 2003, pp. 278–289; E. Tomita, T. Kameda, An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments, J. Global Optim. (2006), in press; Y. Sutani, E. Tomita, Computational experiments and analyses of a more efficient algorithm for finding a maximum clique, Technical Report of IPSJ, 2005-MPS-57, 2005, pp. 45–48, and others. [23] E. Tomita, A. Tanaka, H. Takahashi, The worst-case time complexity for finding all the cliques, Technical Report of the University of Electro- Communications, UEC-TR-C5(2), 1988. [24] E. Tomita, A. Tanaka, H. Takahashi, An optimal algorithm for finding all the cliques, Technical Report of IPSJ, 1989-AL-12, 1989, pp. 91–98. [25] E. Tomita, A. Tanaka, H. Takahashi, The worst-case time complexity for generating all maximal cliques, COCOON 2004, Lecture Notes in Computer Science, Vol. 3106, 2004, pp. 161–170. [26] S. Tsukiyama, M. Ide, H. Ariyoshi, I. Shirakawa, A new algorithm for generating all the maximal independent sets, SIAM J. Comput. 6 (1977) 505–517.

Related documents