Created by W.Langdon from gp-bibliography.bib Revision:1.8010
In the past few years, many data mining techniques have emerged that are capable of analyzing massive amounts of data. Available processing power has allowed the development of efficient data-driven modeling techniques especially suited to situations in which the speed of data acquisition surpasses the time available for data analysis. However, although these methods are promising ways to provide valuable information to the operator and engineer, there is currently no fully developed interest in the application of these techniques to support WWTP operation.
In this thesis, the applicability of data mining and data-driven modeling techniques in the context of WWTP operation is investigated. This context, however, implies specific characteristics that the adapted and developed techniques must satisfy to be practicable: On the one hand, the deployment of a given technique on a plant must be fast, simple and cost-effective. As a consequence, it must consider data that are already available or that can be gathered easily. On the other hand, the application must be safe, i.e., the extracted information must be reliable and communicated clearly. This thesis presents the results of four knowledge discovery projects that adapted data mining and data-driven modeling techniques to tackle problems relevant to either the operator or the process engineer.
First, the extent to which data-driven modeling techniques are suitable for the automatic generation of software sensors exclusively based on measured data available in the SCADA system of the plant is investigated. These software sensors are meant to be substitutes for failure-prone and maintenance-intensive sensors and to diagnose hardware sensors. In two full-scale experiments, four modeling techniques for software-sensor development are compared and the role of expert knowledge is investigated. The investigations show that the non-linear modeling techniques outperform the linear technique and that a higher degree of expert knowledge is beneficial for long-term accuracy, but can lead to reduced performance in the short term. Consequently, if frequent model re-calibration is possible, as is the case for sensor diagnosis applications, automatic development given limited expert knowledge is feasible. In contrast, optimum use of expert knowledge requires model transparency, which is only given for two of the investigated techniques: generalized least squares regression and self-organizing maps (SOMs).
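The software-sensor idea described above can be illustrated with a minimal sketch. It uses ordinary least squares with a single predictor as a stand-in; it is not one of the four techniques compared in the thesis, and the signal values, the tolerance, and the names `fit_line`, `soft_sensor`, and `is_suspicious` are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y ~ a + b * x for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b


def soft_sensor(a, b, x_new):
    """Estimate the target signal from a new predictor value."""
    return a + b * x_new


def is_suspicious(measured, predicted, tolerance):
    """Flag a hardware reading that deviates too far from the estimate."""
    return abs(measured - predicted) > tolerance


# Hypothetical calibration data: a signal already logged in the SCADA
# system (predictor) and the sensor to be substituted or diagnosed (target).
predictor = [10.0, 12.0, 14.0, 16.0, 18.0]
target = [21.0, 25.0, 29.0, 33.0, 37.0]

a, b = fit_line(predictor, target)
estimate = soft_sensor(a, b, 20.0)
```

In a diagnosis setting, `is_suspicious` would compare each new hardware reading with the model estimate; frequent re-calibration then amounts to calling `fit_line` again on recent data.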
In the second project, WWTP operators are provided with additional information on characteristic sewage compositions arriving at their plant from clustered UV/Vis spectra measured at the influent. A two-staged clustering approach is considered that copes well with high-dimensional and noisy data. If it is possible to assign a characteristic cluster to a sewage producer in the catchment, detailed analysis of the temporal discharging pattern is possible without the need for additional measurements at the production site. In a full-scale experiment, one of five detected clusters could be assigned to an industrial laundry by analyzing the cluster centroids. In a validation experiment, 93 out of 95 discharging events were classified correctly. Successful detection depends on the uniqueness of the producer UV/Vis pattern, the dilution at the influent and the size and complexity of the catchment.
In WWTPs, asymmetric feeding of reactors operating in parallel lanes can lead to operational issues and significant performance losses. A new method based on dynamic time warping is presented that makes the quantification of the discharge distribution at hydraulic flow dividers practicable. The method estimates the discharge distribution as a function of total discharge at the divider given influent and effluent measurements of a signal in the downstream reactors. The function can not only serve as the basis for structural modification, but it can also be used to calculate the flow to the individual lanes given the total influent, and thus avoid the assumption of equal distribution (this assumption must often be made by process engineers and scientists). Theoretical analysis reveals that the accuracy of the function depends on the hydraulic residence time, the dispersion and the reactions in the reactors downstream of the divider, in addition to the variability of the signal. A systematic application to a wide range of synthetic systems that may be found on WWTPs shows that, when the function is used to obtain a better estimate of the flow to a reactor, the error is reduced by at least half compared with assuming an equal distribution. In a full-scale validation experiment, the discharge distribution could be accurately estimated.
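For readers unfamiliar with dynamic time warping, the following is a minimal sketch of the classic DTW distance between two signals. It illustrates only the underlying alignment idea, not the thesis's method for estimating the discharge distribution; the signal values and the function name `dtw_distance` are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical example: the effluent signal is a delayed copy of the influent.
influent = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
effluent = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]

warped = dtw_distance(influent, effluent)
pointwise = sum(abs(x - y) for x, y in zip(influent, effluent))
```

Because DTW is allowed to stretch the time axis, the delayed copy aligns with no residual cost, whereas a point-by-point comparison penalizes the time shift. This tolerance to delays is what makes the technique attractive for comparing signals measured upstream and downstream of a reactor.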
The fourth application presented shows that optimal hydraulic reactor models can be searched for automatically using grammar-based genetic programming. This method is especially relevant for engineers who want to model the hydraulic processes of the plant and, because of the limited applicability of existing approaches, must rely solely on their experience and intuition for further insights into the reactor hydraulics. With a tree encoding that can decode program trees into hydraulic reactor models compatible with common software and with influent and effluent measurements, a palette of equally performing models can be generated. Of these, the modeler then picks the most suitable one as a starting point. The methodology is applied to reverse engineer synthetic systems, and because of theoretical and practical identifiability issues, several searches yield different models, which emphasizes the need for an expert to choose the most appropriate model. The method is applied to generate reactor models of a primary clarifier whose exact volume is unknown. The volume of the resulting models corresponds to the expectation, and a virtual tracer experiment performed on the synthetic models generally agrees with an experiment performed on-site.
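The idea of decoding a program tree into a reactor model can be sketched as follows. The encoding used here (nested tuples with `"cstr"`, `"series"`, and `"parallel"` nodes) is a hypothetical illustration, not the grammar or encoding used in the thesis, and the volumes are made up.

```python
def total_volume(tree):
    """Sum the volumes of all stirred-tank leaves in a model tree."""
    kind = tree[0]
    if kind == "cstr":
        return float(tree[1])
    if kind in ("series", "parallel"):
        return sum(total_volume(child) for child in tree[1:])
    raise ValueError(f"unknown node type: {kind!r}")


def count_tanks(tree):
    """Count the stirred-tank leaves in a model tree."""
    if tree[0] == "cstr":
        return 1
    return sum(count_tanks(child) for child in tree[1:])


# Hypothetical candidate model: one tank followed by two tanks in parallel.
model = (
    "series",
    ("cstr", 100.0),
    ("parallel", ("cstr", 50.0), ("cstr", 25.0)),
)

volume = total_volume(model)
tanks = count_tanks(model)
```

A check of this kind mirrors the plausibility test mentioned above, where the volume of each generated model is compared with the expected volume of the real reactor.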
The knowledge discovery projects show that optimal ... [BibTeX entry too long; truncated]
Genetic Programming entries for David J Duerrenmatt