Tampere University of Technology

TUTCRIS Research Portal

Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets. / Glazko, Galina V.; Emmert-Streib, Frank.

In: Bioinformatics, Vol. 25, No. 18, 09.2009, p. 2348-2354.

BibTeX

@article{8aae7b6a9abf4a0aa7433722c63e3711,
title = "Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets",
abstract = "Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, still there is a need to quantify their relative performance when they are testing different null hypotheses. Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both, univariate as well as multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set, for which expression is changing between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's $T^2$, N-statistic), testing different null hypotheses, find some common but also some complementing differentially expressed gene sets under specific settings. This demonstrates that due to complementing null hypotheses each test projects on different aspects of the data and for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.",
author = "Glazko, Galina V. and Emmert-Streib, Frank",
year = "2009",
month = "9",
doi = "10.1093/bioinformatics/btp406",
language = "English",
volume = "25",
pages = "2348--2354",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "18",
}

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets

AU - Glazko, Galina V.

AU - Emmert-Streib, Frank

PY - 2009/9

Y1 - 2009/9

AB - Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, still there is a need to quantify their relative performance when they are testing different null hypotheses. Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both, univariate as well as multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set, for which expression is changing between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), testing different null hypotheses, find some common but also some complementing differentially expressed gene sets under specific settings. This demonstrates that due to complementing null hypotheses each test projects on different aspects of the data and for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one.

U2 - 10.1093/bioinformatics/btp406

DO - 10.1093/bioinformatics/btp406

M3 - Article

VL - 25

SP - 2348

EP - 2354

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 18

ER -