Portfolio EDEND: 2014-09-15

Publisher name: Oxford University Press PY - 2015/1/1 Y1 - 2015/1/1 N2 - Motivation: MS2-GFP-tagging of RNA is currently the only method to measure intervals between consecutive transcription events in live cells. For this, new transcripts must be accurately detected from intensity time traces. Results: We present a novel method for automatically estimating RNA numbers and production intervals from temporal data of cell fluorescence intensities that reduces uncertainty by exploiting temporal information. We also derive a robust variant, more resistant to outliers caused e.g. by RNAs moving out of focus. Using Monte Carlo simulations, we show that the quantification of RNA numbers and production intervals is generally improved compared with previous methods. Finally, we analyze data from live Escherichia coli and show statistically significant differences to previous methods. The new methods can be used to quantify numbers and production intervals of any fluorescent probes, which are present in low copy numbers, are brighter than the cell background and degrade slowly. Availability: Source code is available under Mozilla Public License at http://www.cs.tut.fi/%7ehakkin22/jumpdet/. Contact: AB - Motivation: MS2-GFP-tagging of RNA is currently the only method to measure intervals between consecutive transcription events in live cells. For this, new transcripts must be accurately detected from intensity time traces. Results: We present a novel method for automatically estimating RNA numbers and production intervals from temporal data of cell fluorescence intensities that reduces uncertainty by exploiting temporal information. We also derive a robust variant, more resistant to outliers caused e.g. by RNAs moving out of focus. Using Monte Carlo simulations, we show that the quantification of RNA numbers and production intervals is generally improved compared with previous methods. 
Finally, we analyze data from live Escherichia coli and show statistically significant differences to previous methods. The new methods can be used to quantify numbers and production intervals of any fluorescent probes, which are present in low copy numbers, are brighter than the cell background and degrade slowly. Availability: Source code is available under Mozilla Public License at http://www.cs.tut.fi/%7ehakkin22/jumpdet/. Contact: UR - http://www.scopus.com/inward/record.url?scp=84922352843&partnerID=8YFLogxK U2 - 10.1093/bioinformatics/btu592 DO - 10.1093/bioinformatics/btu592 M3 - Article VL - 31 SP - 69 EP - 75 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 1 ER - TY - JOUR T1 - Exploratory analysis of spatiotemporal patterns of cellular automata by clustering compressibility AU - Emmert-Streib, Frank PY - 2010/2/8 Y1 - 2010/2/8 N2 - In this paper we study the classification of spatiotemporal pattern of one-dimensional cellular automata (CA) whereas the classification comprises CA rules including their initial conditions. We propose an exploratory analysis method based on the normalized compression distance (NCD) of spatiotemporal patterns which is used as dissimilarity measure for a hierarchical clustering. Our approach is different with respect to the following points. First, the classification of spatiotemporal pattern is comparative because the NCD evaluates explicitly the difference of compressibility among two objects, e.g., strings corresponding to spatiotemporal patterns. This is in contrast to all other measures applied so far in a similar context because they are essentially univariate. Second, Kolmogorov complexity, which underlies the NCD, was used in the classification of CA with respect to their spatiotemporal pattern. Third, our method is semiautomatic allowing us to investigate hundreds or thousands of CA rules or initial conditions simultaneously to gain insights into their organizational structure. 
Our numerical results are not only plausible confirming previous classification attempts but also shed light on the intricate influence of random initial conditions on the classification results. AB - In this paper we study the classification of spatiotemporal pattern of one-dimensional cellular automata (CA) whereas the classification comprises CA rules including their initial conditions. We propose an exploratory analysis method based on the normalized compression distance (NCD) of spatiotemporal patterns which is used as dissimilarity measure for a hierarchical clustering. Our approach is different with respect to the following points. First, the classification of spatiotemporal pattern is comparative because the NCD evaluates explicitly the difference of compressibility among two objects, e.g., strings corresponding to spatiotemporal patterns. This is in contrast to all other measures applied so far in a similar context because they are essentially univariate. Second, Kolmogorov complexity, which underlies the NCD, was used in the classification of CA with respect to their spatiotemporal pattern. Third, our method is semiautomatic allowing us to investigate hundreds or thousands of CA rules or initial conditions simultaneously to gain insights into their organizational structure. Our numerical results are not only plausible confirming previous classification attempts but also shed light on the intricate influence of random initial conditions on the classification results. UR - http://www.scopus.com/inward/record.url?scp=76749153776&partnerID=8YFLogxK U2 - 10.1103/PhysRevE.81.026103 DO - 10.1103/PhysRevE.81.026103 M3 - Article VL - 81 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 2 M1 - 026103 ER - TY - JOUR T1 - Fault tolerance of information processing in gene networks AU - Emmert-Streib, Frank AU - Dehmer, Matthias PY - 2009/2/15 Y1 - 2009/2/15 N2 - The major objective of this paper is to study the fault tolerance of gene networks. 
For single gene knockouts we investigate the disturbance of the communication abilities of gene networks globally. For our study we use directed scale-free networks resembling important properties of gene networks, e.g., signaling, or transcriptional regulatory networks, as well as metabolic networks and define a Markov chain on the network to model the communication dynamics. This allows us to evaluate the spread of information in the network and, hence, detect differences due to single gene knockouts in the gene-to-gene communication asymptotically regarding the limiting stationary distributions governed by the Markov chain. Further, we study the connection of the global effect of the perturbations with local properties of the network topology by means of statistical hypothesis tests. AB - The major objective of this paper is to study the fault tolerance of gene networks. For single gene knockouts we investigate the disturbance of the communication abilities of gene networks globally. For our study we use directed scale-free networks resembling important properties of gene networks, e.g., signaling, or transcriptional regulatory networks, as well as metabolic networks and define a Markov chain on the network to model the communication dynamics. This allows us to evaluate the spread of information in the network and, hence, detect differences due to single gene knockouts in the gene-to-gene communication asymptotically regarding the limiting stationary distributions governed by the Markov chain. Further, we study the connection of the global effect of the perturbations with local properties of the network topology by means of statistical hypothesis tests. 
KW - Information processing KW - Markov chain KW - Robustness KW - Scale-free network UR - http://www.scopus.com/inward/record.url?scp=57349185507&partnerID=8YFLogxK U2 - 10.1016/j.physa.2008.10.032 DO - 10.1016/j.physa.2008.10.032 M3 - Article VL - 388 SP - 541 EP - 548 JO - Physica A: Statistical Mechanics and Its Applications JF - Physica A: Statistical Mechanics and Its Applications SN - 0378-4371 IS - 4 ER - TY - JOUR T1 - First-principles data set of 45,892 isolated and cation-coordinated conformers of 20 proteinogenic amino acids AU - Ropo, Matti AU - Schneider, Markus AU - Baldauf, Carsten AU - Blum, Volker PY - 2016/2/16 Y1 - 2016/2/16 N2 - We present a structural data set of the 20 proteinogenic amino acids and their amino-methylated and acetylated (capped) dipeptides. Different protonation states of the backbone (uncharged and zwitterionic) were considered for the amino acids as well as varied side chain protonation states. Furthermore, we studied amino acids and dipeptides in complex with divalent cations (Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+). The database covers the conformational hierarchies of 280 systems in a wide relative energy range of up to 4 eV (390 kJ/mol), summing up to a total of 45,892 stationary points on the respective potential-energy surfaces. All systems were calculated on equal first-principles footing, applying density-functional theory in the generalized gradient approximation corrected for long-range van der Waals interactions. We show good agreement to available experimental data for gas-phase ion affinities. Our curated data can be utilized, for example, for a wide comparison across chemical space of the building blocks of life, for the parametrization of protein force fields, and for the calculation of reference spectra for biophysical applications. AB - We present a structural data set of the 20 proteinogenic amino acids and their amino-methylated and acetylated (capped) dipeptides. 
Different protonation states of the backbone (uncharged and zwitterionic) were considered for the amino acids as well as varied side chain protonation states. Furthermore, we studied amino acids and dipeptides in complex with divalent cations (Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+). The database covers the conformational hierarchies of 280 systems in a wide relative energy range of up to 4 eV (390 kJ/mol), summing up to a total of 45,892 stationary points on the respective potential-energy surfaces. All systems were calculated on equal first-principles footing, applying density-functional theory in the generalized gradient approximation corrected for long-range van der Waals interactions. We show good agreement to available experimental data for gas-phase ion affinities. Our curated data can be utilized, for example, for a wide comparison across chemical space of the building blocks of life, for the parametrization of protein force fields, and for the calculation of reference spectra for biophysical applications. U2 - 10.1038/sdata.2016.9 DO - 10.1038/sdata.2016.9 M3 - Article VL - 3 JO - Scientific Data JF - Scientific Data SN - 2052-4463 M1 - 160009 ER - TY - JOUR T1 - Forecasting mortality rate by singular spectrum analysis AU - Mahmoudvand, Rahim AU - Alehosseini, Fatemeh AU - Rodrigues, Paulo Canas PY - 2015/11/1 Y1 - 2015/11/1 N2 - Singular spectrum analysis (SSA) is a relatively new and powerful non-parametric time series analysis technique that has demonstrated its capability in forecasting different time series in various disciplines. In this paper, we study the feasibility of using the SSA to perform mortality forecasts. Comparisons are made with the Hyndman–Ullah model, which is a new powerful tool in the field of mortality forecasting, and will be considered as a benchmark to evaluate the performance of the SSA for mortality forecasting. 
We use both SSA and Hyndman–Ullah models to obtain 10 forecasts for the period 2000–2009 in nine European countries including Belgium, Denmark, Finland, France, Italy, The Netherlands, Norway, Sweden and Switzerland. Computational results show a superior accuracy of the SSA forecasting algorithms, when compared with the Hyndman–Ullah approach. AB - Singular spectrum analysis (SSA) is a relatively new and powerful non-parametric time series analysis technique that has demonstrated its capability in forecasting different time series in various disciplines. In this paper, we study the feasibility of using the SSA to perform mortality forecasts. Comparisons are made with the Hyndman–Ullah model, which is a new powerful tool in the field of mortality forecasting, and will be considered as a benchmark to evaluate the performance of the SSA for mortality forecasting. We use both SSA and Hyndman–Ullah models to obtain 10 forecasts for the period 2000–2009 in nine European countries including Belgium, Denmark, Finland, France, Italy, The Netherlands, Norway, Sweden and Switzerland. Computational results show a superior accuracy of the SSA forecasting algorithms, when compared with the Hyndman–Ullah approach. KW - Hyndman–Ullah model KW - Mortality rate KW - Singular spectrum analysis UR - http://www.scopus.com/inward/record.url?scp=84947785922&partnerID=8YFLogxK M3 - Article VL - 13 SP - 193 EP - 206 JO - REVSTAT STATISTICAL JOURNAL JF - REVSTAT STATISTICAL JOURNAL SN - 1645-6726 IS - 3 ER - TY - JOUR T1 - Generative modeling for maximizing precision and recall in information visualization AU - Peltonen, Jaakko AU - Kaski, Samuel PY - 2011 Y1 - 2011 N2 - Information visualization has recently been formulated as an information retrieval problem, where the goal is to find similar data points based on the visualized nonlinear projection, and the visualization is optimized to maximize a compromise between (smoothed) precision and recall. 
We turn the visualization into a generative modeling task where a simple user model parameterized by the data coordinates is optimized, neighborhood relations are the observed data, and straightforward maximum likelihood estimation corresponds to Stochastic Neighbor Embedding (SNE). While SNE maximizes pure recall, adding a mixture component that "explains away" misses allows our generative model to focus on maximizing precision as well. The resulting model is a generative solution to maximizing tradeoffs between precision and recall. The model outperforms earlier models in terms of precision and recall and in external validation by unsupervised classification. AB - Information visualization has recently been formulated as an information retrieval problem, where the goal is to find similar data points based on the visualized nonlinear projection, and the visualization is optimized to maximize a compromise between (smoothed) precision and recall. We turn the visualization into a generative modeling task where a simple user model parameterized by the data coordinates is optimized, neighborhood relations are the observed data, and straightforward maximum likelihood estimation corresponds to Stochastic Neighbor Embedding (SNE). While SNE maximizes pure recall, adding a mixture component that "explains away" misses allows our generative model to focus on maximizing precision as well. The resulting model is a generative solution to maximizing tradeoffs between precision and recall. The model outperforms earlier models in terms of precision and recall and in external validation by unsupervised classification. UR - http://www.scopus.com/inward/record.url?scp=84862299625&partnerID=8YFLogxK M3 - Article VL - 15 SP - 579 EP - 587 JO - Journal of Machine Learning Research JF - Journal of Machine Learning Research SN - 1532-4435 ER - TY - JOUR T1 - Gene set analysis for self-contained tests T2 - Complex null and specific alternative hypotheses AU - Rahmatallah, Y. 
AU - Emmert-Streib, F. AU - Glazko, G. PY - 2012/12 Y1 - 2012/12 N2 - Motivation: The analysis of differentially expressed gene sets became a routine in the analyses of gene expression data. There is a multitude of tests available, ranging from aggregation tests that summarize gene-level statistics for a gene set to true multivariate tests, accounting for intergene correlations. Most of them detect complex departures from the null hypothesis but when the null hypothesis is rejected the specific alternative leading to the rejection is not easily identifiable. Results: In this article we compare the power and Type I error rates of minimum-spanning tree (MST)-based non-parametric multivariate tests with several multivariate and aggregation tests, which are frequently used for pathway analyses. In our simulation study, we demonstrate that MST-based tests have power that is for many settings comparable with the power of conventional approaches, but outperform them in specific regions of the parameter space corresponding to biologically relevant configurations. Further, we find for simulated and for gene expression data that MST-based tests discriminate well against shift and scale alternatives. As a general result, we suggest a two-step practical analysis strategy that may increase the interpretability of experimental data: first, apply the most powerful multivariate test to find the subset of pathways for which the null hypothesis is rejected and second, apply MST-based tests to these pathways to select those that support specific alternative hypotheses. AB - Motivation: The analysis of differentially expressed gene sets became a routine in the analyses of gene expression data. There is a multitude of tests available, ranging from aggregation tests that summarize gene-level statistics for a gene set to true multivariate tests, accounting for intergene correlations. 
Most of them detect complex departures from the null hypothesis but when the null hypothesis is rejected the specific alternative leading to the rejection is not easily identifiable. Results: In this article we compare the power and Type I error rates of minimum-spanning tree (MST)-based non-parametric multivariate tests with several multivariate and aggregation tests, which are frequently used for pathway analyses. In our simulation study, we demonstrate that MST-based tests have power that is for many settings comparable with the power of conventional approaches, but outperform them in specific regions of the parameter space corresponding to biologically relevant configurations. Further, we find for simulated and for gene expression data that MST-based tests discriminate well against shift and scale alternatives. As a general result, we suggest a two-step practical analysis strategy that may increase the interpretability of experimental data: first, apply the most powerful multivariate test to find the subset of pathways for which the null hypothesis is rejected and second, apply MST-based tests to these pathways to select those that support specific alternative hypotheses. UR - http://www.scopus.com/inward/record.url?scp=84870441671&partnerID=8YFLogxK U2 - 10.1093/bioinformatics/bts579 DO - 10.1093/bioinformatics/bts579 M3 - Article VL - 28 SP - 3073 EP - 3080 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 23 ER - TY - JOUR T1 - Gene Sets Net Correlations Analysis (GSNCA) T2 - A multivariate differential coexpression test for gene sets AU - Rahmatallah, Yasir AU - Emmert-Streib, Frank AU - Glazko, Galina PY - 2014/2/1 Y1 - 2014/2/1 N2 - Motivation: To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. 
Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes. Results: In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that for a gene set there is no difference in the weight vectors of the genes between two conditions. In simulation studies and the analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks, however, bypassing method-dependent steps of network inference. As an additional result from GSNCA, we define hub genes as genes with the largest weights and show that these genes correspond frequently to major and specific pathway regulators, as well as to genes that are most affected by the biological difference between two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of the genes in the pathways, thus providing unique information that may result in the generation of novel biological hypotheses. AB - Motivation: To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. 
Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes. Results: In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that for a gene set there is no difference in the weight vectors of the genes between two conditions. In simulation studies and the analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks, however, bypassing method-dependent steps of network inference. As an additional result from GSNCA, we define hub genes as genes with the largest weights and show that these genes correspond frequently to major and specific pathway regulators, as well as to genes that are most affected by the biological difference between two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of the genes in the pathways, thus providing unique information that may result in the generation of novel biological hypotheses. UR - http://www.scopus.com/inward/record.url?scp=84893275855&partnerID=8YFLogxK U2 - 10.1093/bioinformatics/btt687 DO - 10.1093/bioinformatics/btt687 M3 - Article VL - 30 SP - 360 EP - 368 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 3 ER - TY - JOUR T1 - Hermitian one-particle density matrix through a semiclassical gradient expansion AU - Bencheikh, K. AU - Räsänen, E. PY - 2015/12/9 Y1 - 2015/12/9 N2 - We carry out the semiclassical expansion of the one-particle density matrix up to the second order in ħ. 
We use the method of Grammaticos and Voros based on the Wigner transform of operators. We show that the resulting density matrix is Hermitian and idempotent in contrast with the well-known result of the semiclassical Kirzhnits expansion. Our density matrix leads to the same particle density and kinetic energy density as in the literature, and it satisfies the consistency criterion of the Euler equation. The derived Hermitian density matrix clarifies the ambiguity in the usefulness of gradient expansion approximations and might reignite the development of density functionals with semiclassical methods. AB - We carry out the semiclassical expansion of the one-particle density matrix up to the second order in ħ. We use the method of Grammaticos and Voros based on the Wigner transform of operators. We show that the resulting density matrix is Hermitian and idempotent in contrast with the well-known result of the semiclassical Kirzhnits expansion. Our density matrix leads to the same particle density and kinetic energy density as in the literature, and it satisfies the consistency criterion of the Euler equation. The derived Hermitian density matrix clarifies the ambiguity in the usefulness of gradient expansion approximations and might reignite the development of density functionals with semiclassical methods. KW - density matrix KW - density-functional theory KW - Wigner transform U2 - 10.1088/1751-8113/49/1/015205 DO - 10.1088/1751-8113/49/1/015205 M3 - Article VL - 49 JO - Journal of Physics A: Mathematical and Theoretical JF - Journal of Physics A: Mathematical and Theoretical SN - 1751-8113 IS - 1 M1 - 015205 ER - TY - JOUR T1 - High-Reynolds-number turbulent cavity flow using the lattice Boltzmann method AU - Hegele, L. A. AU - Scagliarini, A. AU - Sbragaglia, M. AU - Mattila, K. K. AU - Philippi, P. C. AU - Puleri, D. F. AU - Gounley, J. AU - Randles, A. 
PY - 2018/10/4 Y1 - 2018/10/4 N2 - We present a boundary condition scheme for the lattice Boltzmann method that has significantly improved stability for modeling turbulent flows while maintaining excellent parallel scalability. Simulations of a three-dimensional lid-driven cavity flow are found to be stable up to the unprecedented Reynolds number Re=5×10^4 for this setup. Excellent agreement with energy balance equations, computational and experimental results are shown. We quantify rises in the production of turbulence and turbulent drag, and determine peak locations of turbulent production. AB - We present a boundary condition scheme for the lattice Boltzmann method that has significantly improved stability for modeling turbulent flows while maintaining excellent parallel scalability. Simulations of a three-dimensional lid-driven cavity flow are found to be stable up to the unprecedented Reynolds number Re=5×10^4 for this setup. Excellent agreement with energy balance equations, computational and experimental results are shown. We quantify rises in the production of turbulence and turbulent drag, and determine peak locations of turbulent production. U2 - 10.1103/PhysRevE.98.043302 DO - 10.1103/PhysRevE.98.043302 M3 - Article VL - 98 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 4 M1 - 043302 ER - TY - JOUR T1 - Information retrieval perspective to meta-visualization AU - Peltonen, Jaakko AU - Lin, Ziyuan PY - 2013 Y1 - 2013 N2 - In visual data exploration with scatter plots, no single plot is sufficient to analyze complicated high-dimensional data sets. Given numerous visualizations created with different features or methods, meta-visualization is needed to analyze the visualizations together. We solve how to arrange numerous visualizations onto a meta-visualization display, so that their similarities and differences can be analyzed. 
We introduce a machine learning approach to optimize the meta-visualization, based on an information retrieval perspective: Two visualizations are similar if the analyst would retrieve similar neighborhoods between data samples from either visualization. Based on the approach, we introduce a nonlinear embedding method for meta-visualization: it optimizes locations of visualizations on a display, so that visualizations giving similar information about data are close to each other. AB - In visual data exploration with scatter plots, no single plot is sufficient to analyze complicated high-dimensional data sets. Given numerous visualizations created with different features or methods, meta-visualization is needed to analyze the visualizations together. We solve how to arrange numerous visualizations onto a meta-visualization display, so that their similarities and differences can be analyzed. We introduce a machine learning approach to optimize the meta-visualization, based on an information retrieval perspective: Two visualizations are similar if the analyst would retrieve similar neighborhoods between data samples from either visualization. Based on the approach, we introduce a nonlinear embedding method for meta-visualization: it optimizes locations of visualizations on a display, so that visualizations giving similar information about data are close to each other. KW - Meta-visualization KW - Neighbor embedding KW - Nonlinear dimensionality reduction UR - http://www.scopus.com/inward/record.url?scp=84908485499&partnerID=8YFLogxK M3 - Article VL - 29 SP - 165 EP - 180 JO - Journal of Machine Learning Research JF - Journal of Machine Learning Research SN - 1532-4435 ER - TY - JOUR T1 - Introducing libeemd T2 - a program package for performing the ensemble empirical mode decomposition AU - Luukko, P. J. J. AU - Helske, J. AU - Räsänen, E. N1 - EXT="Luukko, P. J. J." 
PY - 2016/6/1 Y1 - 2016/6/1 N2 - The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite amount of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series data, and provides a way to separate short time-scale events from a general trend. We present a free software implementation of EMD, EEMD and CEEMDAN and give an overview of the EMD methodology and the algorithms used in the decomposition. We release our implementation, libeemd, with the aim of providing a user-friendly, fast, stable, well-documented and easily extensible EEMD library for anyone interested in using (E)EMD in the analysis of time series data. While written in C for numerical efficiency, our implementation includes interfaces to the Python and R languages, and interfaces to other languages are straightforward. AB - The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite amount of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series data, and provides a way to separate short time-scale events from a general trend. We present a free software implementation of EMD, EEMD and CEEMDAN and give an overview of the EMD methodology and the algorithms used in the decomposition. 
We release our implementation, libeemd, with the aim of providing a user-friendly, fast, stable, well-documented and easily extensible EEMD library for anyone interested in using (E)EMD in the analysis of time series data. While written in C for numerical efficiency, our implementation includes interfaces to the Python and R languages, and interfaces to other languages are straightforward. KW - Adaptive data analysis KW - Detrending KW - Hilbert–Huang transform KW - Intrinsic mode function KW - Noise-assisted data analysis KW - Time series analysis U2 - 10.1007/s00180-015-0603-9 DO - 10.1007/s00180-015-0603-9 M3 - Article VL - 31 SP - 545 EP - 557 JO - Computational Statistics JF - Computational Statistics SN - 0943-4062 IS - 2 ER - TY - JOUR T1 - Investigation of an entropic stabilizer for the lattice-Boltzmann method AU - Mattila, Keijo K. AU - Hegele, Luiz A. AU - Philippi, Paulo C. N1 - INT=fys,"Mattila, Keijo K." PY - 2015/6/19 Y1 - 2015/6/19 N2 - The lattice-Boltzmann (LB) method is commonly used for the simulation of fluid flows at the hydrodynamic level of description. Due to its kinetic theory origins, the standard LB schemes carry more degrees of freedom than strictly needed, e.g., for the approximation of solutions to the Navier-Stokes equation. In particular, there is freedom in the details of the so-called collision operator. This aspect was recently utilized when an entropic stabilizer, based on the principle of maximizing local entropy, was proposed for the LB method [I. V. Karlin, F. Bösch, and S. S. Chikatamarla, Phys. Rev. E 90, 031302(R) (2014)]. The proposed stabilizer can be considered as an add-on or extension to basic LB schemes. Here the entropic stabilizer is investigated numerically using the perturbed double periodic shear layer flow as a benchmark case. The investigation is carried out by comparing numerical results obtained with six distinct LB schemes. 
The main observation is that the unbounded, and not explicitly controllable, relaxation time for the higher-order moments will directly influence the leading-order error terms. As a consequence, the order of accuracy and, in general, the numerical behavior of LB schemes are substantially altered. Hence, in addition to systematic numerical validation, more detailed theoretical analysis of the entropic stabilizer is still required in order to properly understand its properties. AB - The lattice-Boltzmann (LB) method is commonly used for the simulation of fluid flows at the hydrodynamic level of description. Due to its kinetic theory origins, the standard LB schemes carry more degrees of freedom than strictly needed, e.g., for the approximation of solutions to the Navier-Stokes equation. In particular, there is freedom in the details of the so-called collision operator. This aspect was recently utilized when an entropic stabilizer, based on the principle of maximizing local entropy, was proposed for the LB method [I. V. Karlin, F. Bösch, and S. S. Chikatamarla, Phys. Rev. E 90, 031302(R) (2014)]. The proposed stabilizer can be considered as an add-on or extension to basic LB schemes. Here the entropic stabilizer is investigated numerically using the perturbed double periodic shear layer flow as a benchmark case. The investigation is carried out by comparing numerical results obtained with six distinct LB schemes. The main observation is that the unbounded, and not explicitly controllable, relaxation time for the higher-order moments will directly influence the leading-order error terms. As a consequence, the order of accuracy and, in general, the numerical behavior of LB schemes are substantially altered. Hence, in addition to systematic numerical validation, more detailed theoretical analysis of the entropic stabilizer is still required in order to properly understand its properties.
U2 - 10.1103/PhysRevE.91.063010 DO - 10.1103/PhysRevE.91.063010 M3 - Article VL - 91 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 6 M1 - 063010 ER - TY - JOUR T1 - Majorization-minimization for manifold embedding AU - Yang, Zhirong AU - Peltonen, Jaakko AU - Kaski, Samuel PY - 2015 Y1 - 2015 N2 - Nonlinear dimensionality reduction by manifold embedding has become a popular and powerful approach both for visualization and as preprocessing for predictive tasks, but more efficient optimization algorithms are still crucially needed. Majorization-Minimization (MM) is a promising approach that monotonically decreases the cost function, but it remains unknown how to tightly majorize the manifold embedding objective functions such that the resulting MM algorithms are efficient and robust. We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. In our majorization step, two parts of the cost function are respectively upper bounded by quadratic and Lipschitz surrogates, and the resulting upper bound can be minimized in closed form. For cost functions amenable to such QL-majorization, the MM yields monotonic improvement and is efficient: In experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks. AB - Nonlinear dimensionality reduction by manifold embedding has become a popular and powerful approach both for visualization and as preprocessing for predictive tasks, but more efficient optimization algorithms are still crucially needed. Majorization-Minimization (MM) is a promising approach that monotonically decreases the cost function, but it remains unknown how to tightly majorize the manifold embedding objective functions such that the resulting MM algorithms are efficient and robust. We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems.
In our majorization step, two parts of the cost function are respectively upper bounded by quadratic and Lipschitz surrogates, and the resulting upper bound can be minimized in closed form. For cost functions amenable to such QL-majorization, the MM yields monotonic improvement and is efficient: In experiments, the newly developed MM algorithms outperformed five state-of-the-art optimization approaches in manifold embedding tasks. UR - http://www.scopus.com/inward/record.url?scp=84954311496&partnerID=8YFLogxK M3 - Article VL - 38 SP - 1088 EP - 1097 JO - Journal of Machine Learning Research JF - Journal of Machine Learning Research SN - 1532-4435 ER - TY - JOUR T1 - NetBioV T2 - An R package for visualizing large network data in biology and medicine AU - Tripathi, Shailesh AU - Dehmer, Matthias AU - Emmert-Streib, Frank PY - 2014/4/2 Y1 - 2014/4/2 N2 - NetBioV (Network Biology Visualization) is an R package that allows the visualization of large network data in biology and medicine. The purpose of NetBioV is to enable an organized and reproducible visualization of networks by emphasizing or highlighting specific structural properties that are of biological relevance. AB - NetBioV (Network Biology Visualization) is an R package that allows the visualization of large network data in biology and medicine. The purpose of NetBioV is to enable an organized and reproducible visualization of networks by emphasizing or highlighting specific structural properties that are of biological relevance. UR - http://www.scopus.com/inward/record.url?scp=84911403383&partnerID=8YFLogxK U2 - 10.1093/bioinformatics/btu384 DO - 10.1093/bioinformatics/btu384 M3 - Article VL - 30 SP - 2834 EP - 2836 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 19 ER - TY - JOUR T1 - Nonlinear continuous-wave optical propagation in nematic liquid crystals T2 - Interplay between reorientational and thermal effects AU - Alberucci, Alessandro AU - Laudyn, Urszula A.
AU - Piccardi, Armando AU - Kwasny, Michał AU - Klus, Bartlomiej AU - Karpierz, Mirosław A. AU - Assanto, Gaetano PY - 2017/7/11 Y1 - 2017/7/11 N2 - We investigate nonlinear optical propagation of continuous-wave (CW) beams in bulk nematic liquid crystals. We thoroughly analyze the competing roles of reorientational and thermal nonlinearity with reference to self-focusing/defocusing and, eventually, the formation of nonlinear diffraction-free wavepackets, the so-called spatial optical solitons. To this extent we refer to dye-doped nematic liquid crystals in planar cells excited by a single CW beam in the highly nonlocal limit. To adjust the relative weight between the two nonlinear responses, we employ two distinct wavelengths, inside and outside the absorption band of the dye, respectively. Different concentrations of the dye are considered in order to enhance the thermal effect. The theoretical analysis is complemented by numerical simulations in the highly nonlocal approximation based on a semi-analytic approach. Theoretical results are finally compared to experimental results in the Nematic Liquid Crystals (NLC) 4-trans-4'-n-hexylcyclohexylisothiocyanatobenzene (6CHBT) doped with Sudan Blue dye. AB - We investigate nonlinear optical propagation of continuous-wave (CW) beams in bulk nematic liquid crystals. We thoroughly analyze the competing roles of reorientational and thermal nonlinearity with reference to self-focusing/defocusing and, eventually, the formation of nonlinear diffraction-free wavepackets, the so-called spatial optical solitons. To this extent we refer to dye-doped nematic liquid crystals in planar cells excited by a single CW beam in the highly nonlocal limit. To adjust the relative weight between the two nonlinear responses, we employ two distinct wavelengths, inside and outside the absorption band of the dye, respectively. Different concentrations of the dye are considered in order to enhance the thermal effect. 
The theoretical analysis is complemented by numerical simulations in the highly nonlocal approximation based on a semi-analytic approach. Theoretical results are finally compared to experimental results in the Nematic Liquid Crystals (NLC) 4-trans-4'-n-hexylcyclohexylisothiocyanatobenzene (6CHBT) doped with Sudan Blue dye. U2 - 10.1103/PhysRevE.96.012703 DO - 10.1103/PhysRevE.96.012703 M3 - Article VL - 96 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 1 M1 - 012703 ER - TY - JOUR T1 - Optimization and universality of Brownian search in a basic model of quenched heterogeneous media AU - Godec, Aljaž AU - Metzler, Ralf PY - 2015/5/21 Y1 - 2015/5/21 N2 - The kinetics of a variety of transport-controlled processes can be reduced to the problem of determining the mean time needed to arrive at a given location for the first time, the so-called mean first-passage time (MFPT) problem. The occurrence of occasional large jumps or intermittent patterns combining various types of motion are known to outperform the standard random walk with respect to the MFPT, by reducing oversampling of space. Here we show that a regular but spatially heterogeneous random walk can significantly and universally enhance the search in any spatial dimension. In a generic minimal model we consider a spherically symmetric system comprising two concentric regions with piecewise constant diffusivity. The MFPT is analyzed under the constraint of conserved average dynamics, that is, the spatially averaged diffusivity is kept constant. Our analytical calculations and extensive numerical simulations demonstrate the existence of an optimal heterogeneity minimizing the MFPT to the target. We prove that the MFPT for a random walk is completely dominated by what we term direct trajectories towards the target and reveal a remarkable universality of the spatially heterogeneous search with respect to target size and system dimensionality. 
In contrast to intermittent strategies, which are most profitable in low spatial dimensions, the spatially inhomogeneous search performs best in higher dimensions. Discussing our results alongside recent experiments on single-particle tracking in living cells, we argue that the observed spatial heterogeneity may be beneficial for cellular signaling processes. AB - The kinetics of a variety of transport-controlled processes can be reduced to the problem of determining the mean time needed to arrive at a given location for the first time, the so-called mean first-passage time (MFPT) problem. The occurrence of occasional large jumps or intermittent patterns combining various types of motion are known to outperform the standard random walk with respect to the MFPT, by reducing oversampling of space. Here we show that a regular but spatially heterogeneous random walk can significantly and universally enhance the search in any spatial dimension. In a generic minimal model we consider a spherically symmetric system comprising two concentric regions with piecewise constant diffusivity. The MFPT is analyzed under the constraint of conserved average dynamics, that is, the spatially averaged diffusivity is kept constant. Our analytical calculations and extensive numerical simulations demonstrate the existence of an optimal heterogeneity minimizing the MFPT to the target. We prove that the MFPT for a random walk is completely dominated by what we term direct trajectories towards the target and reveal a remarkable universality of the spatially heterogeneous search with respect to target size and system dimensionality. In contrast to intermittent strategies, which are most profitable in low spatial dimensions, the spatially inhomogeneous search performs best in higher dimensions. Discussing our results alongside recent experiments on single-particle tracking in living cells, we argue that the observed spatial heterogeneity may be beneficial for cellular signaling processes. 
UR - http://www.scopus.com/inward/record.url?scp=84930652975&partnerID=8YFLogxK U2 - 10.1103/PhysRevE.91.052134 DO - 10.1103/PhysRevE.91.052134 M3 - Article VL - 91 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 5 M1 - 052134 ER - TY - JOUR T1 - Parity-time-symmetric solitons in trapped Bose-Einstein condensates and the influence of varying complex potentials T2 - A variational approach AU - Devassy, Lini AU - Jisha, Chandroth P. AU - Alberucci, Alessandro AU - Kuriakose, V. C. PY - 2015/8/19 Y1 - 2015/8/19 N2 - Dynamics and properties of nonlinear matter waves in a trapped BEC subject to a PT-symmetric linear potential, with the trap in the form of a super-Gaussian potential, are investigated via a variational approach accounting for the complex nature of the soliton. In the process, we address how the shape of the imaginary part of the potential, that is, a gain-loss mechanism, affects the self-localization and the stability of the condensate. Variational results are found to be in good agreement with full numerical simulations for predicting the shape, width, and chemical potential of the condensate until the PT breaking point. Variational computation also predicts the existence of solitary solution only above a threshold in the particle number as the gain-loss is increased, in agreement with numerical simulations. AB - Dynamics and properties of nonlinear matter waves in a trapped BEC subject to a PT-symmetric linear potential, with the trap in the form of a super-Gaussian potential, are investigated via a variational approach accounting for the complex nature of the soliton. In the process, we address how the shape of the imaginary part of the potential, that is, a gain-loss mechanism, affects the self-localization and the stability of the condensate. Variational results are found to be in good agreement with full numerical simulations for predicting the shape, width, and chemical potential of the condensate until the PT breaking point. 
Variational computation also predicts the existence of solitary solution only above a threshold in the particle number as the gain-loss is increased, in agreement with numerical simulations. UR - http://www.scopus.com/inward/record.url?scp=84939612865&partnerID=8YFLogxK U2 - 10.1103/PhysRevE.92.022914 DO - 10.1103/PhysRevE.92.022914 M3 - Article VL - 92 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 2 M1 - 022914 ER - TY - GEN T1 - Performance of Variable Partial Factor approach in a slope design AU - Knuuti, Mika AU - Länsivaara, Tim PY - 2019 Y1 - 2019 N2 - Most of the design codes have moved from the traditional total factor of safety method to the partial factor approach, aiming to cover the uncertainties better. The target has been to reach more consistent safety levels, but this has not always been achieved. This has raised more interest towards reliability based design and its applications. In this paper, the performance of two partial factor approaches was compared from the reliability point of view: Eurocode 7 design approach 3 and the proposed Variable Partial Factor approach. The results show that the partial factor method with fixed partial factors cannot fully cover the uncertainties related to the design. The partial factors should be dependent on the level of uncertainty of the parameters. The results also show that RBD can be applied in a designer-friendly way. In addition, some challenges in the determination of the characteristic values were pointed out. AB - Most of the design codes have moved from the traditional total factor of safety method to the partial factor approach, aiming to cover the uncertainties better. The target has been to reach more consistent safety levels, but this has not always been achieved. This has raised more interest towards reliability based design and its applications.
In this paper, the performance of two partial factor approaches was compared from the reliability point of view: Eurocode 7 design approach 3 and the proposed Variable Partial Factor approach. The results show that the partial factor method with fixed partial factors cannot fully cover the uncertainties related to the design. The partial factors should be dependent on the level of uncertainty of the parameters. The results also show that RBD can be applied in a designer-friendly way. In addition, some challenges in the determination of the characteristic values were pointed out. U2 - 10.22725/ICASP13.475 DO - 10.22725/ICASP13.475 M3 - Conference contribution BT - 13th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP13), Seoul, South Korea, May 26-30, 2019 ER - TY - JOUR T1 - Quantifying the non-ergodicity of scaled Brownian motion AU - Safdari, Hadiseh AU - Cherstvy, Andrey G. AU - Chechkin, Aleksei V. AU - Thiel, Felix AU - Sokolov, Igor M. AU - Metzler, Ralf PY - 2015/9/18 Y1 - 2015/9/18 N2 - We examine the non-ergodic properties of scaled Brownian motion (SBM), a non-stationary stochastic process with a time dependent diffusivity of the form $D(t)\simeq {t}^{\alpha -1}$. We compute the ergodicity breaking parameter EB in the entire range of scaling exponents α, both analytically and via extensive computer simulations of the stochastic Langevin equation. We demonstrate that in the limit of long trajectory lengths T and short lag times Δ the EB parameter as function of the scaling exponent α has no divergence at α = 1/2 and present the asymptotes for EB in different limits. We generalize the analytical and simulations results for the time averaged and ergodic properties of SBM in the presence of ageing, that is, when the observation of the system starts only a finite time span after its initiation.
The approach developed here for the calculation of the higher time averaged moments of the particle displacement can be applied to derive the ergodic properties of other stochastic processes such as fractional Brownian motion. AB - We examine the non-ergodic properties of scaled Brownian motion (SBM), a non-stationary stochastic process with a time dependent diffusivity of the form $D(t)\simeq {t}^{\alpha -1}$. We compute the ergodicity breaking parameter EB in the entire range of scaling exponents α, both analytically and via extensive computer simulations of the stochastic Langevin equation. We demonstrate that in the limit of long trajectory lengths T and short lag times Δ the EB parameter as function of the scaling exponent α has no divergence at α = 1/2 and present the asymptotes for EB in different limits. We generalize the analytical and simulations results for the time averaged and ergodic properties of SBM in the presence of ageing, that is, when the observation of the system starts only a finite time span after its initiation. The approach developed here for the calculation of the higher time averaged moments of the particle displacement can be applied to derive the ergodic properties of other stochastic processes such as fractional Brownian motion. KW - ageing KW - anomalous diffusion KW - scaled Brownian motion UR - http://www.scopus.com/inward/record.url?scp=84940069543&partnerID=8YFLogxK U2 - 10.1088/1751-8113/48/37/375002 DO - 10.1088/1751-8113/48/37/375002 M3 - Article VL - 48 JO - Journal of Physics A: Mathematical and Theoretical JF - Journal of Physics A: Mathematical and Theoretical SN - 1751-8113 IS - 37 M1 - 375002 ER - TY - JOUR T1 - Reorientational versus Kerr dark and gray solitary waves using modulation theory AU - Assanto, Gaetano AU - Marchant, T. R. AU - Minzoni, Antonmaria A. AU - Smyth, Noel F. 
PY - 2011/12/9 Y1 - 2011/12/9 N2 - We develop a modulation theory model based on a Lagrangian formulation to investigate the evolution of dark and gray optical spatial solitary waves for both the defocusing nonlinear Schrödinger (NLS) equation and the nematicon equations describing nonlinear beams, nematicons, in self-defocusing nematic liquid crystals. Since it has an exact soliton solution, the defocusing NLS equation is used as a test bed for the modulation theory applied to the nematicon equations, which have no exact solitary wave solution. We find that the evolution of dark and gray NLS solitons, as well as nematicons, is entirely driven by the emission of diffractive radiation, in contrast to the evolution of bright NLS solitons and bright nematicons. Moreover, the steady nematicon profile is nonmonotonic due to the long-range nonlocality associated with the perturbation of the optic axis. Excellent agreement is obtained with numerical solutions of both the defocusing NLS and nematicon equations. The comparisons for the nematicon solutions raise a number of subtle issues relating to the definition and measurement of the width of a dark or gray nematicon. AB - We develop a modulation theory model based on a Lagrangian formulation to investigate the evolution of dark and gray optical spatial solitary waves for both the defocusing nonlinear Schrödinger (NLS) equation and the nematicon equations describing nonlinear beams, nematicons, in self-defocusing nematic liquid crystals. Since it has an exact soliton solution, the defocusing NLS equation is used as a test bed for the modulation theory applied to the nematicon equations, which have no exact solitary wave solution. We find that the evolution of dark and gray NLS solitons, as well as nematicons, is entirely driven by the emission of diffractive radiation, in contrast to the evolution of bright NLS solitons and bright nematicons. 
Moreover, the steady nematicon profile is nonmonotonic due to the long-range nonlocality associated with the perturbation of the optic axis. Excellent agreement is obtained with numerical solutions of both the defocusing NLS and nematicon equations. The comparisons for the nematicon solutions raise a number of subtle issues relating to the definition and measurement of the width of a dark or gray nematicon. UR - http://www.scopus.com/inward/record.url?scp=84555189254&partnerID=8YFLogxK U2 - 10.1103/PhysRevE.84.066602 DO - 10.1103/PhysRevE.84.066602 M3 - Article VL - 84 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 6 M1 - 066602 ER - TY - JOUR T1 - Revealing differences in gene network inference algorithms on the network level by ensemble methods AU - Altay, Gökmen AU - Emmert-Streib, Frank PY - 2010/5/25 Y1 - 2010/5/25 N2 - Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, due to the difficulty to establish reliable methods based on observational data there is so far only incomplete knowledge about possibilities and limitations of such inference methods in this context. Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods allowing to assess the inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of inferred regulatory networks from expression data. Further, as application we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets regarding molecular information processing. 
AB - Motivation: The inference of regulatory networks from large-scale expression data holds great promise because of the potentially causal interpretation of these networks. However, due to the difficulty to establish reliable methods based on observational data there is so far only incomplete knowledge about possibilities and limitations of such inference methods in this context. Results: In this article, we conduct a statistical analysis investigating differences and similarities of four network inference algorithms, ARACNE, CLR, MRNET and RN, with respect to local network-based measures. We employ ensemble methods allowing to assess the inferability down to the level of individual edges. Our analysis reveals the bias of these inference methods with respect to the inference of various network components and, hence, provides guidance in the interpretation of inferred regulatory networks from expression data. Further, as application we predict the total number of regulatory interactions in human B cells and hypothesize about the role of Myc and its targets regarding molecular information processing. UR - http://www.scopus.com/inward/record.url?scp=77954484005&partnerID=8YFLogxK U2 - 10.1093/bioinformatics/btq259 DO - 10.1093/bioinformatics/btq259 M3 - Article VL - 26 SP - 1738 EP - 1744 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 14 M1 - btq259 ER - TY - JOUR T1 - SamExploreR T2 - Exploring reproducibility and robustness of RNA-seq results based on SAM files AU - Stupnikov, Alexey AU - Tripathi, Shailesh AU - De Matos Simoes, Ricardo AU - McArt, Darragh AU - Salto-Tellez, Manuel AU - Glazko, Galina AU - Dehmer, Matthias AU - Emmert-Streib, Frank PY - 2016/11/1 Y1 - 2016/11/1 N2 - Motivation: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results is often unclear. 
This is in part attributed to the two counteracting goals of (i) a cost efficient and (ii) an optimal experimental design leading to a compromise, e.g. in the sequencing depth of experiments. Results: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short-reads based on SAM files facilitating the investigation of sequencing depth related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes. AB - Motivation: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results is often unclear. This is in part attributed to the two counteracting goals of (i) a cost efficient and (ii) an optimal experimental design leading to a compromise, e.g. in the sequencing depth of experiments. Results: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short-reads based on SAM files facilitating the investigation of sequencing depth related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes.
U2 - 10.1093/bioinformatics/btw475 DO - 10.1093/bioinformatics/btw475 M3 - Article VL - 32 SP - 3345 EP - 3347 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 21 ER - TY - JOUR T1 - SCIP T2 - a single-cell image processor toolbox AU - Martins, Leonardo AU - Neeli-Venkata, Ramakanth AU - Oliveira, Samuel M.D. AU - Häkkinen, Antti AU - Ribeiro, Andre S. AU - Fonseca, José M. PY - 2018/12/15 Y1 - 2018/12/15 N2 - Summary: Each cell is a phenotypically unique individual that is influenced by internal and external processes, operating in parallel. To characterize the dynamics of cellular processes one needs to observe many individual cells from multiple points of view and over time, so as to identify commonalities and variability. With this aim, we engineered a software, 'SCIP', to analyze multi-modal, multi-process, time-lapse microscopy morphological and functional images. SCIP is capable of automatic and/or manually corrected segmentation of cells and lineages, automatic alignment of different microscopy channels, as well as detect, count and characterize fluorescent spots (such as RNA tagged by MS2-GFP), nucleoids, Z rings, Min system, inclusion bodies, undefined structures, etc. The results can be exported into *mat files and all results can be jointly analyzed, to allow studying not only each feature and process individually, but also find potential relationships. While we exemplify its use on Escherichia coli, many of its functionalities are expected to be of use in analyzing other prokaryotes and eukaryotic cells as well. We expect SCIP to facilitate the finding of relationships between cellular processes, from small-scale (e.g. gene expression) to large-scale (e.g. cell division), in single cells and cell lineages. Availability and implementation: http://www.ca3-uninova.org/project_scip. Supplementary information: Supplementary data are available at Bioinformatics online. 
AB - Summary: Each cell is a phenotypically unique individual that is influenced by internal and external processes, operating in parallel. To characterize the dynamics of cellular processes one needs to observe many individual cells from multiple points of view and over time, so as to identify commonalities and variability. With this aim, we engineered a software, 'SCIP', to analyze multi-modal, multi-process, time-lapse microscopy morphological and functional images. SCIP is capable of automatic and/or manually corrected segmentation of cells and lineages, automatic alignment of different microscopy channels, as well as detect, count and characterize fluorescent spots (such as RNA tagged by MS2-GFP), nucleoids, Z rings, Min system, inclusion bodies, undefined structures, etc. The results can be exported into *mat files and all results can be jointly analyzed, to allow studying not only each feature and process individually, but also find potential relationships. While we exemplify its use on Escherichia coli, many of its functionalities are expected to be of use in analyzing other prokaryotes and eukaryotic cells as well. We expect SCIP to facilitate the finding of relationships between cellular processes, from small-scale (e.g. gene expression) to large-scale (e.g. cell division), in single cells and cell lineages. Availability and implementation: http://www.ca3-uninova.org/project_scip. Supplementary information: Supplementary data are available at Bioinformatics online. U2 - 10.1093/bioinformatics/bty505 DO - 10.1093/bioinformatics/bty505 M3 - Article VL - 34 SP - 4318 EP - 4320 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 24 ER - TY - JOUR T1 - Search reliability and search efficiency of combined Lévy-Brownian motion T2 - Long relocations mingled with thorough local exploration AU - Palyulin, Vladimir V. AU - Chechkin, Aleksei V. 
AU - Klages, Rainer AU - Metzler, Ralf PY - 2016/9/8 Y1 - 2016/9/8 N2 - A combined dynamics consisting of Brownian motion and Lévy flights is exhibited by a variety of biological systems performing search processes. Assessing the search reliability of ever locating the target and the search efficiency of doing so economically of such dynamics thus poses an important problem. Here we model this dynamics by a one-dimensional fractional Fokker-Planck equation combining unbiased Brownian motion and Lévy flights. By solving this equation both analytically and numerically we show that the superposition of recurrent Brownian motion and Lévy flights with stable exponent α <1, by itself implying zero probability of hitting a point on a line, leads to transient motion with finite probability of hitting any point on the line. We present results for the exact dependence of the values of both the search reliability and the search efficiency on the distance between the starting and target positions as well as the choice of the scaling exponent α of the Lévy flight component. AB - A combined dynamics consisting of Brownian motion and Lévy flights is exhibited by a variety of biological systems performing search processes. Assessing the search reliability of ever locating the target and the search efficiency of doing so economically of such dynamics thus poses an important problem. Here we model this dynamics by a one-dimensional fractional Fokker-Planck equation combining unbiased Brownian motion and Lévy flights. By solving this equation both analytically and numerically we show that the superposition of recurrent Brownian motion and Lévy flights with stable exponent α <1, by itself implying zero probability of hitting a point on a line, leads to transient motion with finite probability of hitting any point on the line. 
We present results for the exact dependence of the values of both the search reliability and the search efficiency on the distance between the starting and target positions as well as the choice of the scaling exponent α of the Lévy flight component. KW - Brownian motion KW - first arrival KW - first passage KW - Lévy flights KW - random search process UR - http://www.scopus.com/inward/record.url?scp=84989172145&partnerID=8YFLogxK U2 - 10.1088/1751-8113/49/39/394002 DO - 10.1088/1751-8113/49/39/394002 M3 - Article VL - 49 JO - Journal of Physics A: Mathematical and Theoretical JF - Journal of Physics A: Mathematical and Theoretical SN - 1751-8113 IS - 39 M1 - 394002 ER - TY - JOUR T1 - Signal focusing through active transport AU - Godec, Aljaž AU - Metzler, Ralf PY - 2015/7/2 Y1 - 2015/7/2 N2 - The accuracy of molecular signaling in biological cells and novel diagnostic devices is ultimately limited by the counting noise floor imposed by the thermal diffusion. Motivated by the fact that messenger RNA and vesicle-engulfed signaling molecules transiently bind to molecular motors and are actively transported in biological cells, we show here that the random active delivery of signaling particles to within a typical diffusion distance to the receptor generically reduces the correlation time of the counting noise. Considering a variety of signaling particle sizes from mRNA to vesicles and cell sizes from prokaryotic to eukaryotic cells, we show that the conditions for active focusing - faster and more precise signaling - are indeed compatible with observations in living cells. Our results improve the understanding of molecular cellular signaling and novel diagnostic devices. AB - The accuracy of molecular signaling in biological cells and novel diagnostic devices is ultimately limited by the counting noise floor imposed by the thermal diffusion. 
Motivated by the fact that messenger RNA and vesicle-engulfed signaling molecules transiently bind to molecular motors and are actively transported in biological cells, we show here that the random active delivery of signaling particles to within a typical diffusion distance to the receptor generically reduces the correlation time of the counting noise. Considering a variety of signaling particle sizes from mRNA to vesicles and cell sizes from prokaryotic to eukaryotic cells, we show that the conditions for active focusing - faster and more precise signaling - are indeed compatible with observations in living cells. Our results improve the understanding of molecular cellular signaling and novel diagnostic devices. UR - http://www.scopus.com/inward/record.url?scp=84937010360&partnerID=8YFLogxK U2 - 10.1103/PhysRevE.92.010701 DO - 10.1103/PhysRevE.92.010701 M3 - Article VL - 92 JO - Physical Review E JF - Physical Review E SN - 1539-3755 IS - 1 M1 - 010701 ER - TY - JOUR T1 - Structured orthogonal families of one and two strata prime basis factorial models AU - Rodrigues, Paulo C. AU - Moreira, Elsa E. AU - Jesus, Vera M. AU - Mexia, João T. PY - 2014 Y1 - 2014 N2 - The models in structured families correspond to the treatments of a fixed effects base design π. The action of the factors in π on the fixed effects parameters of the models is studied. Analyzing such families enables the study of the action of nesting factors on the effects and interactions of nested factors. When π has an orthogonal structure, the family of models is said to be orthogonal. The models in the family can have one, two or more strata. Models with more than one stratum are obtained through nesting of one-stratum models. A general treatment of the case in which the base design has orthogonal structure is presented, and special emphasis is given to the families of prime basis factorial models. These last models are, as is well known, widely used in fertilization trials. 
AB - The models in structured families correspond to the treatments of a fixed effects base design π. The action of the factors in π on the fixed effects parameters of the models is studied. Analyzing such families enables the study of the action of nesting factors on the effects and interactions of nested factors. When π has an orthogonal structure, the family of models is said to be orthogonal. The models in the family can have one, two or more strata. Models with more than one stratum are obtained through nesting of one-stratum models. A general treatment of the case in which the base design has orthogonal structure is presented, and special emphasis is given to the families of prime basis factorial models. These last models are, as is well known, widely used in fertilization trials. KW - Factorial designs KW - Families of models KW - Nested models KW - Orthogonal models KW - Two strata models UR - http://www.scopus.com/inward/record.url?scp=84903887341&partnerID=8YFLogxK U2 - 10.1007/s00362-013-0507-0 DO - 10.1007/s00362-013-0507-0 M3 - Article VL - 55 SP - 603 EP - 614 JO - Statistical Papers JF - Statistical Papers SN - 0932-5026 IS - 3 ER - TY - JOUR T1 - Unite and conquer T2 - Univariate and multivariate approaches for finding differentially expressed gene sets AU - Glazko, Galina V. AU - Emmert-Streib, Frank PY - 2009/9 Y1 - 2009/9 N2 - Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they are testing different null hypotheses. Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. 
In the simulation study we demonstrate that high correlations equally affect the power of both univariate and multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set for which expression is changing between two phenotypes. The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), testing different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because of the complementary null hypotheses, each test projects onto different aspects of the data, and that for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one. AB - Motivation: Recently, many univariate and several multivariate approaches have been suggested for testing differential expression of gene sets between different phenotypes. However, despite a wealth of literature studying their performance on simulated and real biological data, there is still a need to quantify their relative performance when they are testing different null hypotheses. Results: In this article, we compare the performance of univariate and multivariate tests on both simulated and biological data. In the simulation study we demonstrate that high correlations equally affect the power of both univariate and multivariate tests. In addition, for most of them the power is similarly affected by the dimensionality of the gene set and by the percentage of genes in the set for which expression is changing between two phenotypes. 
The application of different test statistics to biological data reveals that three statistics (sum of squared t-tests, Hotelling's T², N-statistic), testing different null hypotheses, find some common but also some complementary differentially expressed gene sets under specific settings. This demonstrates that, because of the complementary null hypotheses, each test projects onto different aspects of the data, and that for the analysis of biological data it is beneficial to use all three tests simultaneously instead of focusing exclusively on just one. U2 - 10.1093/bioinformatics/btp406 DO - 10.1093/bioinformatics/btp406 M3 - Article VL - 25 SP - 2348 EP - 2354 JO - Bioinformatics JF - Bioinformatics SN - 1367-4803 IS - 18 ER - TY - JOUR T1 - Using multi-step proposal distribution for improved MCMC convergence in Bayesian network structure learning AU - Larjo, Antti AU - Lähdesmäki, Harri N1 - EXT="Lähdesmäki, Harri" PY - 2015/12/27 Y1 - 2015/12/27 N2 - Bayesian networks have become popular for modeling probabilistic relationships between entities. As their structure can also be given a causal interpretation about the studied system, they can be used to learn, for example, regulatory relationships of genes or proteins in biological networks and pathways. Inference of the Bayesian network structure is complicated by the size of the model structure space, necessitating the use of optimization methods or sampling techniques, such as Markov Chain Monte Carlo (MCMC) methods. However, convergence of MCMC chains is in many cases slow and can become an even harder issue as the dataset size grows. We show here how to improve convergence in the Bayesian network structure space by using an adjustable proposal distribution with the possibility to propose a wide range of steps in the structure space, and demonstrate improved network structure inference by analyzing phosphoprotein data from the human primary T cell signaling network. 
AB - Bayesian networks have become popular for modeling probabilistic relationships between entities. As their structure can also be given a causal interpretation about the studied system, they can be used to learn, for example, regulatory relationships of genes or proteins in biological networks and pathways. Inference of the Bayesian network structure is complicated by the size of the model structure space, necessitating the use of optimization methods or sampling techniques, such as Markov Chain Monte Carlo (MCMC) methods. However, convergence of MCMC chains is in many cases slow and can become an even harder issue as the dataset size grows. We show here how to improve convergence in the Bayesian network structure space by using an adjustable proposal distribution with the possibility to propose a wide range of steps in the structure space, and demonstrate improved network structure inference by analyzing phosphoprotein data from the human primary T cell signaling network. KW - Bayesian network KW - MCMC KW - Proposal distribution KW - Structure learning U2 - 10.1186/s13637-015-0024-7 DO - 10.1186/s13637-015-0024-7 M3 - Article VL - 2015 JO - Eurasip Journal on Bioinformatics and Systems Biology JF - Eurasip Journal on Bioinformatics and Systems Biology SN - 1687-4145 IS - 1 M1 - 6 ER -