Tampere University of Technology

TUTCRIS Research Portal

Limited random walk algorithm for big graph data clustering

Research output: Contribution to journalArticleScientificpeer-review


Original languageEnglish
Pages (from-to)26
Number of pages1
JournalJournal of Big Data
Issue number1
Publication statusPublished - 2016
Publication typeA1 Journal article-refereed


Graph clustering is an important technique to understand the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attractor vertices in a graph and use them as features to cluster the vertices. According to the experimental results on the simulated graph data and the real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method uses the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment such as a MapReduce framework. Given enough computing resources, we are capable of clustering graphs with millions of vertices and hundreds millions of edges in a reasonable time.

Publication forum classification

Field of science, Statistics Finland

Downloads statistics

No data available