Studies on Limnology and Aquatic Engineering

The submitted diploma thesis presents some of the numerous possibilities of data analysis with
statistical methods and artificial neural networks (ANNs), focussing on
the recognition, modelling, and analysis of correlations between variables.
Important ecological and, in particular, limnological concepts and terms were applied.
The studies were based on two data sets containing observations and measurements of the abundances of various macrozoobenthic taxa, i.e.
mostly small animals that live on the bottom of a water body (here, running water) during at least one stage of their life cycle,
and of EPT species, i.e. mayflies (Ephemeroptera), stoneflies (Plecoptera), and caddis flies
(Trichoptera), together with abiotic water parameters.
Prior to pre-processing, the data sets were thoroughly discussed.
Correlations between the annual abundances of stoneflies, mayflies, and caddis flies and their environment, as well as
the dependencies of macrozoobenthic taxa on abiotic factors and resources,
roughly described projections of the species' ecological niches.
Conversely, for bioindication, knowing the presence or absence of important
indicator species led to predictions of the values of, e.g., a chemical water property (total phosphorus).
For statistical or ANN-based descriptions of correlations, it is convenient to distinguish between
linear and non-linear correlations.
Linear and approximately linear correlations can be described using linear regression or
a linear neural network.
The statistical approach tests hypotheses, assuming certain
distributions of the variables, in order to assess how well models
fitted to samples generalise to the population.
In the case of ANNs, however,
the sample is normally split into a training data set and an independent, equally representative test data set.
Quality measures calculated on the test data set estimate the quality of the model for the population.
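
The split into training and test data can be illustrated with a minimal sketch. It is not taken from the thesis: the synthetic data, the 25 % test fraction, the single-predictor linear model, and all function names are assumptions chosen only to show how a quality measure on held-out data estimates the quality of a model for the population.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(model, xs, ys):
    """Mean squared prediction error of a fitted line on the given data."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def holdout_split(n, test_fraction, rng):
    """Shuffle the indices 0..n-1 and return (train_indices, test_indices)."""
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = max(1, round(n * test_fraction))
    return indices[n_test:], indices[:n_test]


# Synthetic, purely illustrative data: y = 2 + 0.5 * x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 0.3) for x in xs]

train_idx, test_idx = holdout_split(len(xs), 0.25, rng)
model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])

# The test error is computed only on observations the model has not seen.
test_mse = mean_squared_error(
    model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]
)
print(f"intercept={model[0]:.2f} slope={model[1]:.2f} test MSE={test_mse:.3f}")
```

The random shuffle before splitting is one simple way to keep both subsets representative of the sample; with strongly structured data (for example, time series or several sampling sites) a stratified split may be more appropriate.
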
When modelling non-linear correlations, especially with multiple predictors, the
functional type of the correlation cannot be assumed a priori.
If the functional type of the correlation is insufficiently known, it can often be described approximately
using a non-linear neural network or non-parametric kernel regression estimators.
Linear networks, multilayer perceptrons, radial basis function networks, and
kernel regression estimators (Nadaraya-Watson; general regression networks) were applied.
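
As an illustration of the non-parametric approach, the following sketch implements a Nadaraya-Watson estimator for a single predictor. The Gaussian kernel, the bandwidth of 0.2, and the sine-shaped example data are assumptions made for this sketch and are not taken from the thesis.

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth):
    """Nadaraya-Watson estimate at x_query.

    The estimate is a weighted average of the observed ys, where the weight
    of observation i is a Gaussian kernel of (x_query - xs[i]) / bandwidth.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


# Illustrative data sampled from a non-linear curve (not from the thesis).
xs = [i / 10 for i in range(63)]
ys = [math.sin(x) for x in xs]

estimate = nadaraya_watson(math.pi / 2, xs, ys, bandwidth=0.2)
print(f"estimate at pi/2: {estimate:.3f} (true value: 1)")
```

No functional form is assumed here; the bandwidth alone controls how smooth the fitted curve is. A larger bandwidth averages over more neighbours and flattens peaks, a smaller one follows the data more closely but is more sensitive to noise.
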
However, finding a model and adapting its parameters often has to be combined with the selection of
important and relevant predictor variables. For multiple linear regression, a stepwise method was applied.
RBF networks are capable of detecting redundant and irrelevant predictors.
These capabilities were compared with those of a modified MLP network, which represents the importance of
predictors by a sensitivity vector adapted during training, and with the result of a stepwise linear regression.
One example showed the superiority of a genetic algorithm over a simple forward-selection method
for selecting predictors for Nadaraya-Watson kernel regression estimators.
For small sample sizes, it is advantageous to estimate the generalisation quality
by cross-validation.
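
Forward selection and cross-validation can be combined in a short sketch. It covers only the simple forward-selection baseline, not the genetic algorithm, and it is not the procedure used in the thesis: the leave-one-out scheme, the Gaussian product kernel, the fixed bandwidth of 0.1, the synthetic data, and all function names are assumptions made for illustration.

```python
import math
import random


def nw_predict(query, rows, targets, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel on Euclidean distance."""
    weights = [
        math.exp(-0.5 * sum((q - v) ** 2 for q, v in zip(query, row)) / bandwidth ** 2)
        for row in rows
    ]
    total = sum(weights)
    if total == 0.0:  # every weight underflowed: fall back to the plain mean
        return sum(targets) / len(targets)
    return sum(w * t for w, t in zip(weights, targets)) / total


def loo_error(data, targets, columns, bandwidth):
    """Leave-one-out mean squared error using only the given predictor columns."""
    rows = [[row[c] for c in columns] for row in data]
    error = 0.0
    for i in range(len(rows)):
        rest_rows = rows[:i] + rows[i + 1:]
        rest_targets = targets[:i] + targets[i + 1:]
        prediction = nw_predict(rows[i], rest_rows, rest_targets, bandwidth)
        error += (prediction - targets[i]) ** 2
    return error / len(rows)


def forward_selection(data, targets, bandwidth):
    """Greedily add the predictor that lowers the leave-one-out error most."""
    selected = []
    remaining = list(range(len(data[0])))
    # With no predictors, every weight is 1, so each target is predicted
    # by the mean of all other targets. This serves as the baseline.
    best = loo_error(data, targets, selected, bandwidth)
    while remaining:
        scores = {
            c: loo_error(data, targets, selected + [c], bandwidth) for c in remaining
        }
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining predictor improves the estimate
        best = scores[candidate]
        selected.append(candidate)
        remaining.remove(candidate)
    return selected, best


# Synthetic data: only column 0 influences the target; columns 1 and 2 are noise.
rng = random.Random(1)
data = [[rng.random(), rng.random(), rng.random()] for _ in range(40)]
targets = [math.sin(2 * math.pi * row[0]) + rng.gauss(0, 0.1) for row in data]

baseline = loo_error(data, targets, [], bandwidth=0.1)
selected, best = forward_selection(data, targets, bandwidth=0.1)
print(f"selected columns: {selected}, LOO error: {best:.3f} (baseline {baseline:.3f})")
```

Leave-one-out cross-validation uses every observation for both fitting and testing, which is why it suits small samples. The greedy search evaluates predictors one at a time and can therefore miss predictors that are only useful in combination, which is one reason a global search such as a genetic algorithm may find better subsets.
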
This diploma thesis illustrated many problems of data analysis with statistical methods
and artificial neural networks and offered solutions for some of them. Tools
new to the field of limnology were presented and tested. Moreover, this work contains numerous suggestions for
further projects.
Last change: 10/12/2001
© Michael Obach