TUTCRIS - Tampereen teknillinen yliopisto

TUTCRIS

Limited random walk algorithm for big graph data clustering

Tutkimustuotosvertaisarvioitu

Yksityiskohdat

AlkuperäiskieliEnglanti
Sivut26
Sivumäärä1
JulkaisuJournal of Big Data
Vuosikerta3
Numero1
DOI - pysyväislinkit
TilaJulkaistu - 2016
OKM-julkaisutyyppiA1 Alkuperäisartikkeli

Tiivistelmä

Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attractor vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment such as a MapReduce framework. Given enough computing resources, we are capable of clustering graphs with millions of vertices and hundreds millions of edges in a reasonable time.

Julkaisufoorumi-taso

Latausten tilastot

Ei tietoja saatavilla