TUTCRIS - Tampere University of Technology

Graph Analysis and Applications in Clustering and Content-based Image Retrieval

Research output

Standard

Graph Analysis and Applications in Clustering and Content-based Image Retrieval. / Zhang, Honglei.

Tampere University, 2019. 122 p. (Tampere University Dissertations; Vol. 101).

Harvard

Zhang, H 2019, Graph Analysis and Applications in Clustering and Content-based Image Retrieval. Tampere University Dissertations, vol. 101, Tampere University.

APA

Zhang, H. (2019). Graph Analysis and Applications in Clustering and Content-based Image Retrieval. (Tampere University Dissertations; Vol. 101). Tampere University.

Vancouver

Zhang H. Graph Analysis and Applications in Clustering and Content-based Image Retrieval. Tampere University, 2019. 122 p. (Tampere University Dissertations).

Author

Zhang, Honglei. / Graph Analysis and Applications in Clustering and Content-based Image Retrieval. Tampere University, 2019. 122 p. (Tampere University Dissertations).

BibTeX - Download

@book{d6a455de1a5c4d6aa05c586517960518,
title = "Graph Analysis and Applications in Clustering and Content-based Image Retrieval",
abstract = "About 300 years ago, when studying the Seven Bridges of K{\"o}nigsberg problem (a famous problem concerning paths on graphs), the great mathematician Leonhard Euler said, “This question is very banal, but seems to me worthy of attention”. Since then, graph theory and graph analysis have not only become one of the most important branches of mathematics, but have also found an enormous range of important applications in many other areas. A graph is a mathematical model that abstracts entities and the relationships between them as nodes and edges. Many types of interactions between entities can be modeled by graphs, for example, social interactions between people, communications between the entities in computer networks, and relations between biological species. Many other types of data that do not appear to be graphs can be converted into graphs by certain operations, for example, the k-nearest neighborhood graph built from pixels in an image. Cluster structure is a common phenomenon in many real-world graphs, for example, social networks. Finding the clusters in a large graph is important for understanding the underlying relationships between the nodes. Graph clustering is a technique that partitions nodes into clusters such that connections among nodes in a cluster are dense and connections between nodes in different clusters are sparse. Various approaches have been proposed to solve graph clustering problems. A common approach is to optimize a predefined clustering metric using different optimization methods. However, most of these optimization problems are NP-hard due to the discrete setup of hard clustering. These optimization problems can be relaxed, and a sub-optimal solution can be found. A different approach is to apply data clustering algorithms to solve graph clustering problems. With this approach, one must first find appropriate features for each node that represent the local structure of the graph.
The Limited Random Walk algorithm uses the random walk procedure to explore the graph and extracts efficient features for the nodes. It follows the embarrassingly parallel paradigm, so it can process large graph data efficiently using modern high-performance computing facilities. This thesis gives the details of this algorithm and analyzes its stability issues. Based on the study of the cluster structures in a graph, we define the authenticity score of an edge as the difference between the actual and the expected number of edges that connect the two groups of the neighboring nodes of the two end nodes. The authenticity score can be used in many important applications, such as graph clustering, outlier detection, and graph data preprocessing. In particular, a data clustering algorithm that uses the authenticity scores on a mutual k-nearest neighborhood graph achieves more reliable and superior performance compared with other popular algorithms. This thesis also theoretically proves that this algorithm can asymptotically achieve complete recovery of the ground truth of graphs generated by a stochastic r-block model. Content-based image retrieval (CBIR) is an important application in computer vision, media information retrieval, and data mining. Given a query image, a CBIR system ranks the images in a large image database by their “similarities” to the query image. However, because the definition of “similarity” is ambiguous, it is very difficult for a CBIR system to select the optimal feature set and ranking algorithm to satisfy the purpose of the query. Graph technologies have been used to improve the performance of CBIR systems in various ways. In this thesis, a novel method is proposed to construct a visual-semantic graph, a graph in which nodes represent semantic concepts and edges represent visual associations between concepts.
The constructed visual-semantic graph not only helps the user to locate the target images quickly but also helps answer questions related to the query image. Experiments show that the effort of locating the target image is reduced by 25{\%} with the help of visual-semantic graphs. Graph analysis will continue to play an important role in future data analysis. In particular, the visual-semantic graph, which captures important and interesting visual associations between the concepts, is worthy of further attention.",
author = "Honglei Zhang",
year = "2019",
month = "8",
day = "9",
language = "English",
isbn = "978-952-03-1183-4",
volume = "101",
series = "Tampere University Dissertations",
publisher = "Tampere University",

}

RIS (suitable for import to EndNote) - Download

TY - BOOK

T1 - Graph Analysis and Applications in Clustering and Content-based Image Retrieval

AU - Zhang, Honglei

PY - 2019/8/9

Y1 - 2019/8/9

N2 - About 300 years ago, when studying the Seven Bridges of Königsberg problem (a famous problem concerning paths on graphs), the great mathematician Leonhard Euler said, “This question is very banal, but seems to me worthy of attention”. Since then, graph theory and graph analysis have not only become one of the most important branches of mathematics, but have also found an enormous range of important applications in many other areas. A graph is a mathematical model that abstracts entities and the relationships between them as nodes and edges. Many types of interactions between entities can be modeled by graphs, for example, social interactions between people, communications between the entities in computer networks, and relations between biological species. Many other types of data that do not appear to be graphs can be converted into graphs by certain operations, for example, the k-nearest neighborhood graph built from pixels in an image. Cluster structure is a common phenomenon in many real-world graphs, for example, social networks. Finding the clusters in a large graph is important for understanding the underlying relationships between the nodes. Graph clustering is a technique that partitions nodes into clusters such that connections among nodes in a cluster are dense and connections between nodes in different clusters are sparse. Various approaches have been proposed to solve graph clustering problems. A common approach is to optimize a predefined clustering metric using different optimization methods. However, most of these optimization problems are NP-hard due to the discrete setup of hard clustering. These optimization problems can be relaxed, and a sub-optimal solution can be found. A different approach is to apply data clustering algorithms to solve graph clustering problems. With this approach, one must first find appropriate features for each node that represent the local structure of the graph.
The Limited Random Walk algorithm uses the random walk procedure to explore the graph and extracts efficient features for the nodes. It follows the embarrassingly parallel paradigm, so it can process large graph data efficiently using modern high-performance computing facilities. This thesis gives the details of this algorithm and analyzes its stability issues. Based on the study of the cluster structures in a graph, we define the authenticity score of an edge as the difference between the actual and the expected number of edges that connect the two groups of the neighboring nodes of the two end nodes. The authenticity score can be used in many important applications, such as graph clustering, outlier detection, and graph data preprocessing. In particular, a data clustering algorithm that uses the authenticity scores on a mutual k-nearest neighborhood graph achieves more reliable and superior performance compared with other popular algorithms. This thesis also theoretically proves that this algorithm can asymptotically achieve complete recovery of the ground truth of graphs generated by a stochastic r-block model. Content-based image retrieval (CBIR) is an important application in computer vision, media information retrieval, and data mining. Given a query image, a CBIR system ranks the images in a large image database by their “similarities” to the query image. However, because the definition of “similarity” is ambiguous, it is very difficult for a CBIR system to select the optimal feature set and ranking algorithm to satisfy the purpose of the query. Graph technologies have been used to improve the performance of CBIR systems in various ways. In this thesis, a novel method is proposed to construct a visual-semantic graph, a graph in which nodes represent semantic concepts and edges represent visual associations between concepts.
The constructed visual-semantic graph not only helps the user to locate the target images quickly but also helps answer questions related to the query image. Experiments show that the effort of locating the target image is reduced by 25% with the help of visual-semantic graphs. Graph analysis will continue to play an important role in future data analysis. In particular, the visual-semantic graph, which captures important and interesting visual associations between the concepts, is worthy of further attention.

M3 - Doctoral thesis

SN - 978-952-03-1183-4

VL - 101

T3 - Tampere University Dissertations

BT - Graph Analysis and Applications in Clustering and Content-based Image Retrieval

PB - Tampere University

ER -