Michael Obach

Application and Comparison of Statistical Methods and Artificial Neural Networks

Studies on Limnology and Aquatic Engineering

 

 

Diploma thesis at the Faculty of Mathematics and Computer Science

University Gesamthochschule Kassel


This diploma thesis presents some of the many possibilities for analysing data with statistical methods and artificial neural networks (ANNs), focusing on the recognition, modelling, and analysis of correlations between variables. Important ecological, and in particular limnological, concepts and terms were applied.
The studies were based on two data sets containing observations and measurements of the abundances of various macrozoobenthic taxa (mostly small animals that live, during at least one stage of their life cycle, on the bed of a water body, in this case running water) and of EPT species, i.e. mayflies (Ephemeroptera), stoneflies (Plecoptera), and caddis flies (Trichoptera), as well as of abiotic water parameters. Before pre-processing, the data sets were discussed in detail.
The correlations between the annual abundances of stoneflies, mayflies, and caddis flies and their environment, as well as the dependence of macrozoobenthic taxa on abiotic factors and resources, roughly described projections of the species' ecological niches. Conversely, for bioindication, the presence or absence of important indicator species was used to predict the value of a chemical water property, e.g. total phosphorus.
For both statistical and ANN-based descriptions of correlations, it is useful to distinguish between linear and non-linear correlations. Linear and approximately linear correlations can be described by a linear regression or by a linear neural network.
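To illustrate that a linear neural network describes the same kind of correlation as a linear regression, the following sketch fits a straight line twice: once in closed form by least squares, and once by training a single linear neuron (identity activation plus a bias weight) with batch gradient descent on the squared error. It assumes a single predictor and made-up data; the function names and the learning rate are hypothetical choices, not taken from the thesis.

```python
def least_squares_fit(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def train_linear_neuron(xs, ys, rate=0.05, epochs=2000):
    """Train y_hat = w0 + w1 * x (identity activation) by batch gradient descent.

    Minimises half the mean squared error; w0 plays the role of the bias weight.
    """
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        grad_w0 = sum(errors) / n                              # d/dw0 of MSE/2
        grad_w1 = sum(e * x for e, x in zip(errors, xs)) / n   # d/dw1 of MSE/2
        w0 -= rate * grad_w0
        w1 -= rate * grad_w1
    return w0, w1


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up values, not data from the thesis
    print("least squares:", least_squares_fit(xs, ys))
    print("linear neuron:", train_linear_neuron(xs, ys))
```

Both calls should print practically the same intercept and slope. The neuron only converges if the learning rate is small relative to the scale of the predictor, which is one reason predictors are usually standardised before training.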
The statistical approach assesses how well a model fitted to a sample generalises to the population by testing hypotheses that assume certain distributions of the variables. With ANNs, by contrast, the sample is normally split into a training data set and an independent, equally representative test data set. Quality measures calculated on the test data set then estimate the quality of the model for the population.
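A minimal sketch of the train/test procedure described above, assuming a simple straight-line model and using the mean squared error and the coefficient of determination R² as example quality measures (the thesis may use other measures). The function names, the 30 % test fraction, and the seed are hypothetical.

```python
import random


def train_test_split(pairs, test_fraction=0.3, seed=0):
    """Shuffle (x, y) pairs reproducibly and split them into training and test lists."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Least-squares straight line through (x, y) pairs; returns a predict function."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, pairs):
    """Mean squared prediction error of `model` on the given pairs."""
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)


def r_squared(model, pairs):
    """Coefficient of determination on `pairs`; it can be negative on test data."""
    mean_y = sum(y for _, y in pairs) / len(pairs)
    ss_res = sum((y - model(x)) ** 2 for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up data: a straight line with alternating deviations
    data = [(x, 2.0 * x + 1.0 + (0.3 if x % 2 else -0.3)) for x in range(20)]
    train, test = train_test_split(data)
    model = fit_line(train)  # fitted on the training data only
    print("test MSE:", mse(model, test), "test R^2:", r_squared(model, test))
```

The essential point is that `model` never sees the test pairs during fitting, so the quality measures computed on them estimate performance on new observations from the population.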
When modelling non-linear correlations, especially with multiple predictors, the functional form of the correlation often cannot be assumed a priori. If the functional form is insufficiently known, the correlation can often be approximated by a non-linear neural network or by non-parametric kernel regression estimators. Linear networks, multilayer perceptrons (MLPs), radial basis function (RBF) networks, and kernel regression estimators (Nadaraya-Watson; general regression networks) were applied.
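The Nadaraya-Watson estimator predicts a weighted mean of the observed responses, where each weight is a kernel function of the distance between the query point and a training point; a general regression network with Gaussian units computes essentially the same quantity. The sketch below assumes a Gaussian kernel with one hand-chosen `bandwidth` shared by all predictors; in practice the bandwidth has to be tuned, for example by cross-validation. The function name is hypothetical.

```python
import math


def nadaraya_watson(train_x, train_y, query, bandwidth=1.0):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x = query].

    train_x: list of predictor vectors (lists of floats)
    train_y: list of observed responses
    Returns sum_i K_i * y_i / sum_i K_i with
    K_i = exp(-||x_i - query||^2 / (2 * bandwidth^2)).
    """
    weights = []
    for xi in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, query))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger bandwidth")
    return sum(w * y for w, y in zip(weights, train_y)) / total


if __name__ == "__main__":
    # Made-up one-predictor example: samples of a smooth curve
    train_x = [[0.5 * i] for i in range(13)]
    train_y = [math.sin(x[0]) for x in train_x]
    for q in (0.25, 1.6, 3.1):
        print(q, nadaraya_watson(train_x, train_y, [q], bandwidth=0.4))
```

Because the prediction is a convex combination of the observed responses, it always stays between their minimum and maximum; no functional form has to be assumed, but the result depends strongly on the bandwidth and on which predictors enter the distance.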
Finding a model and fitting its parameters often has to be combined with selecting the important and relevant predictor variables. For multiple linear regression, a stepwise method was applied. RBF networks are capable of detecting redundant and irrelevant predictors. This capability was compared with that of a modified MLP network, which represents the importance of each predictor by a sensitivity vector adapted during training, and with the result of a stepwise linear regression. In one example, a genetic algorithm outperformed a simple forward-selection method in selecting predictors for Nadaraya-Watson kernel regression estimators.
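Greedy forward selection, the simple method mentioned above, can be sketched as follows. This is only the forward part of a stepwise method (no predictor is ever removed), and no genetic algorithm is shown. The `score` function is a hypothetical stand-in for, e.g., the cross-validated error of a Nadaraya-Watson estimator or of a linear regression fitted on the chosen predictor indices.

```python
def forward_selection(n_predictors, score, max_size=None):
    """Greedy forward selection of predictor indices.

    score(subset) must return an error to be minimised for a sorted tuple of
    predictor indices (including the empty tuple). Starting from no
    predictors, the predictor whose addition lowers the score most is added,
    until no single addition improves the score or `max_size` is reached.
    """
    limit = n_predictors if max_size is None else min(max_size, n_predictors)
    selected = []
    best = score(())
    while len(selected) < limit:
        candidates = [j for j in range(n_predictors) if j not in selected]
        trial = {j: score(tuple(sorted(selected + [j]))) for j in candidates}
        j_best = min(trial, key=trial.get)
        if trial[j_best] >= best:
            break  # no single addition helps, so the search stops here
        selected.append(j_best)
        best = trial[j_best]
    return sorted(selected), best


if __name__ == "__main__":
    # Hypothetical toy score standing in for a cross-validated model error
    def toy_score(subset):
        s = set(subset)
        return 10.0 - 4.0 * (0 in s) - 2.0 * (2 in s) + 0.5 * (1 in s) + 0.5 * (3 in s)

    print(forward_selection(4, toy_score))
```

A known weakness of this greedy search is that it cannot find predictors that are useful only in combination: if neither predictor 0 nor predictor 1 improves the score on its own, the search stops before ever trying the pair. Searching over whole subsets, as a genetic algorithm does, avoids this particular trap at a higher computational cost.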
For small sample sizes, it is advantageous to estimate the generalisation quality by cross-validation.
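With k-fold cross-validation, every observation is used for testing exactly once instead of permanently setting aside a fixed test set, which matters when only a few observations are available. The sketch below assumes a generic `fit` function returning a predictor and pools the squared errors of all folds; the function names, the default k = 5, and the seed are hypothetical. With k equal to the sample size it becomes leave-one-out cross-validation.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 reproducibly and deal them into k disjoint folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_mse(xs, ys, fit, k=5, seed=0):
    """Pooled k-fold cross-validation estimate of the mean squared error.

    fit(train_xs, train_ys) must return a predict function. Each observation
    is predicted exactly once, by a model that never saw it during fitting.
    """
    folds = k_fold_indices(len(xs), k, seed)
    total_sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total_sq_err += sum((model(xs[i]) - ys[i]) ** 2 for i in fold)
    return total_sq_err / len(xs)


if __name__ == "__main__":
    # Hypothetical baseline model: always predict the training mean
    def fit_mean(train_xs, train_ys):
        m = sum(train_ys) / len(train_ys)
        return lambda x: m

    xs = list(range(12))
    ys = [0.4 * x + (1.0 if x % 3 == 0 else 0.0) for x in xs]  # made-up data
    print("5-fold CV MSE:", cross_validated_mse(xs, ys, fit_mean, k=5))
    print("LOO CV MSE:", cross_validated_mse(xs, ys, fit_mean, k=len(xs)))
```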
This diploma thesis illustrated many problems of data analysis with statistical methods and artificial neural networks and offered solutions for some of them. New tools for the field of limnology were presented and tested. Moreover, the work suggests numerous ideas for further projects.


Download Michael Obach's diploma thesis as a PDF file (3.5 MB, in German!).

Last change: 10/12/2001
© Michael Obach