TUTCRIS - Tampere University of Technology


Goodness-of-fit tests and heavy-tailed distributions in network traffic data analysis



Publisher: Tampere University of Technology
ISBN (electronic): 978-952-15-2233-8
ISBN (print): 978-952-15-2191-1
Status: Published - 21 August 2009
OKM publication type: G4 Doctoral dissertation (monograph)


Name: Tampere University of Technology. Publication
Publisher: Tampere University of Technology
ISSN (print): 1459-2045


A network management system is a vital part of a modern telecommunication network. The duties of the system include, among other things, fault management, configuration management, and performance management. For these purposes the network management system collects vast amounts of data, the processing and analysis of which has developed into a whole discipline. Network traffic data analysis involves, for example, change detection, prediction, and modelling. This thesis concentrates on network traffic data analysis with statistical tools, goodness-of-fit tests in particular. Instead of artificially generated data, data sets collected from real networks serve as case examples. Since real network data fit analytical distributions and textbook examples poorly, Monte Carlo simulation is used for modelling the properties of the data. The various quantities measured from telecommunication networks reportedly exhibit heavy-tailed distributions. Heavy-tailed distributions possess special features (such as infinite variance) that make them problematic for statistical analysis as well as for network management. This is why heavy-tailed distributions are one of the premises of this work. The network management system usually does not allow the measurements to be tailored for a specific purpose, so the analysis has to adapt to the data available. Histograms are one of the most popular means of compressing data, and the data from the system therefore often arrive as histograms. This work develops a method for change detection in histogram data. Furthermore, classical goodness-of-fit tests are largely inadequate for network traffic data. In addition to heavy-tailed distributions, the huge amount of data causes problems. This thesis collects several test statistics proposed in the literature for testing heavy-tailed distributions. Their usefulness is assessed through a power study, where a scenario of true traffic change detection is created.
According to the results, the plain median outperforms all the more complicated test statistics in change detection. A suitable sample size is sought with a similar power study, because the large amount of data may easily ruin the feasibility of the test. Some sources cite predictability as an advantage of heavy-tailed distributions, but this feature has never been exploited. This thesis first generalizes the predictability to the continuous-time domain and then develops it further into a model that tries to predict traffic volume. However, the usefulness of the predictability remains limited, because several assumptions have to be made that do not necessarily hold in real network applications.
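
To make the idea of a Monte Carlo power study concrete, the following is a minimal sketch and not the procedure used in the thesis. It assumes Pareto-distributed data with tail index 1.5 (infinite variance), a hypothetical 20% increase in scale as the "traffic change", and a one-sided test whose critical value is itself estimated by simulation. The sample mean is included only as an illustrative baseline; it is not one of the statistics named in the abstract. All function names and parameter values are chosen for illustration.

```python
import random
import statistics


def pareto_sample(rng, n, alpha, scale=1.0):
    """Draw n Pareto(alpha, scale) values by inverse-transform sampling."""
    # 1 - rng.random() lies in (0, 1], so the power below is always defined.
    return [scale * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def critical_value(stat, rng, n, alpha, level=0.05, reps=1000):
    """Estimate the upper critical value of `stat` under the null (scale = 1)."""
    null_stats = sorted(stat(pareto_sample(rng, n, alpha)) for _ in range(reps))
    index = min(int((1.0 - level) * reps), reps - 1)
    return null_stats[index]


def power(stat, rng, n, alpha, shift, level=0.05, reps=1000):
    """Estimate how often `stat` detects a scale increase by factor `shift`."""
    crit = critical_value(stat, rng, n, alpha, level, reps)
    rejections = sum(
        stat(pareto_sample(rng, n, alpha, scale=shift)) > crit
        for _ in range(reps)
    )
    return rejections / reps


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    # Assumed scenario: 200 observations, tail index 1.5, 20% scale increase.
    for name, stat in [("median", statistics.median), ("mean", statistics.fmean)]:
        estimate = power(stat, rng, n=200, alpha=1.5, shift=1.2)
        print(f"{name}: estimated power = {estimate:.3f}")
```

Repeating the call to `power` over a range of `n` values gives a rough picture of how sample size affects detection, which is the kind of question the sample-size study in the abstract addresses.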

