Elsevier

Digital Signal Processing

Volume 40, May 2015, Pages 53-62

A robust baseline elimination method based on community information

https://doi.org/10.1016/j.dsp.2015.02.015

Highlights

  • The excellent community provides threshold, global slope, and local slope information.

  • Global and local slope information further confirms the peak distribution.

  • An adaptive iteratively reweighted genetic programming method models the baseline.

Abstract

Baseline correction is an important pre-processing technique used to separate true spectra from interference effects or to remove baseline effects. In this paper, an adaptive iteratively reweighted genetic programming method based on excellent community information (GPEXI) is proposed to model baselines from spectra. Excellent community information, which is abstracted from the current excellent community, includes an automatic common threshold and normal global and local slope information. Significant peaks are first detected by the automatic common threshold. Then, because a baseline varies slowly with respect to wavelength, the normal global and local slope information is used to further confirm whether a point lies in a peak region. The slope information is also used to determine the range of baseline curve fluctuation in peak regions. The proposed algorithm is robust across different kinds of baselines, and its curvature and slope are adjusted automatically without prior knowledge. Experimental results on both simulated and real data demonstrate the effectiveness of the algorithm.

Introduction

Fourier transform infrared spectroscopy can be a valuable tool for measuring many chemical and physical properties of materials. However, spectra generally consist of peaks and noise superimposed on a baseline, which is a severe problem. These baselines can be flat, linear, curved, or a combination of all three. Their main characteristic is that they vary much more slowly than the peaks do. Worse, baselines vary greatly from spectrum to spectrum, even for similar samples. They are therefore hard to eliminate, and because they hamper the interpretation of spectra, the removal of baseline drift is necessary.

Baseline elimination for spectral data has been studied intensively, and several methods have already been presented. These methods can be divided into two categories: manual and automatic techniques. In the manual method [1], the baseline is constructed by fitting linear, polynomial, or spline functions to signal-free (baseline) points selected by the user. If the points are correctly selected, the construction produces satisfactory results. However, this technique is subjective, time-consuming, and poorly reproducible [2].

By contrast, automatic baseline correction is more widely employed. Among these methods, the wavelet transform has become a useful tool for background removal [3], [4]. However, an inappropriate choice of wavelet or resolution level is detrimental to baseline estimation. The Fourier transform method decomposes the original spectrum into frequency components in order to discriminate among baseline (low frequency), signal (mid frequency), and noise (high frequency) components. These components can then be filtered by a band-pass or high-pass filter to eliminate the unwanted ones. However, it is difficult to set the filter parameters so that the baseline is separated from the signal effectively [5]. In general, these approaches rest on the hypothesis that the background can be well separated from the real signal in the transformed domain. The derivative method [6] uses first or second derivatives to remove constant offsets or linear baselines from the spectra. However, the threshold, which determines how many peaks are selected from the smoothed differentiated spectrum, is difficult to set.
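As a rough illustration of the Fourier approach described above, the sketch below treats the lowest few frequency bins as the baseline and everything else as signal. It is a minimal sketch under assumptions: the function name `fft_lowpass_baseline`, the `cutoff_bins` parameter, and the synthetic spectrum are hypothetical and are not taken from the paper.

```python
import numpy as np

def fft_lowpass_baseline(y, cutoff_bins):
    """Split y into a low-frequency part (treated as baseline) and the rest.

    cutoff_bins is a hypothetical tuning parameter: the number of lowest
    real-FFT bins that are assigned to the baseline.
    """
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y)
    low = np.zeros_like(spectrum)
    low[:cutoff_bins] = spectrum[:cutoff_bins]  # keep only the slow components
    baseline = np.fft.irfft(low, n=y.size)
    return baseline, y - baseline

# Synthetic spectrum (assumed data): a slow periodic baseline plus one peak.
x = np.linspace(0.0, 1.0, 512, endpoint=False)
true_baseline = 2.0 + np.cos(2.0 * np.pi * x)
peak = 5.0 * np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
y = true_baseline + peak

baseline_est, corrected = fft_lowpass_baseline(y, cutoff_bins=3)
```

Even in this idealized case the peak contributes to the low-frequency bins, so the estimated baseline is lifted under the peak. Changing `cutoff_bins` trades that leakage against the ability to follow a curved baseline, which is the parameter-setting difficulty noted above.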

Recently, baseline correction algorithms based on asymmetric least squares smoothing have been proposed [7], [8], [9], [10]. They use the Whittaker smoother described by Eilers, and only two parameters need to be tuned: one related to the rigidity of the fitted curve and one related to the noise level. However, setting these parameters is not always easy.
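For readers unfamiliar with asymmetric least squares smoothing, the following is a minimal sketch of the commonly cited scheme: a Whittaker smoother with a second-difference penalty whose weights are reset asymmetrically on each pass. The parameter names `lam` (stiffness) and `p` (asymmetry), their default values, and the synthetic data are assumptions for illustration; dense matrices are used only to keep the example short.

```python
import numpy as np

def als_baseline(y, lam=1e5, p=0.01, n_iter=10):
    """Asymmetric least squares baseline (illustrative sketch).

    lam controls the stiffness of the fitted curve; p is the weight given
    to points that lie above the current estimate (likely peak points).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference operator
    penalty = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        # Weighted Whittaker smoother: solve (W + lam * D'D) z = W y
        z = np.linalg.solve(np.diag(w) + penalty, w * y)
        # Points above the curve get weight p, points on or below it get 1 - p
        w = np.where(y > z, p, 1.0 - p)
    return z

# Synthetic spectrum (assumed data): a linear baseline plus one positive peak.
idx = np.arange(200)
true_baseline = 1.0 + 0.01 * idx
peak = 5.0 * np.exp(-0.5 * ((idx - 100) / 4.0) ** 2)
y = true_baseline + peak

baseline_est = als_baseline(y)
```

Both `lam` and `p` have to be chosen by the user in this sketch, which is the tuning burden the paragraph above refers to.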

Iterative methods based on polynomial curve fitting have been proposed for automated baseline estimation [11], [12], [13], [14], [15], [16]. These algorithms generate an automatic threshold from a fitted curve to distinguish the baseline from peaks. Linear programming has also been used for baseline correction [17]; there, the polynomial order is selected by a criterion instead of the user's experience, but the criterion can be applied only after the baseline correction has been run with each candidate polynomial order, and only to compare the results of those runs. These methods offer a promising approach to removing baseline effects in a simple, straightforward fashion. However, their performance depends on two parameters predefined by the user: the polynomial order and the threshold, which is related to the noise level and other characteristics of the spectrum. Therefore, the accuracy of the estimation still depends on the user's prior knowledge.
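One simple variant of iterative polynomial fitting can be sketched as follows: fit a polynomial, clip every point that lies above the fit down to the fit, and repeat until nothing changes. This is only an illustration of the general family, not the specific algorithms of [11] to [16]; the function name, the `order`, `n_iter`, and `tol` parameters, and the synthetic data are assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial

def iterative_polyfit_baseline(x, y, order=2, n_iter=100, tol=1e-6):
    """Iterative polynomial baseline fit with peak clipping (illustrative)."""
    x = np.asarray(x, dtype=float)
    work = np.asarray(y, dtype=float).copy()
    fit = work
    for _ in range(n_iter):
        # Least-squares polynomial fit to the current working spectrum
        fit = Polynomial.fit(x, work, order)(x)
        # Points above the fit are treated as peak points and clipped
        clipped = np.minimum(work, fit)
        if np.max(work - clipped) < tol:
            break
        work = clipped
    return fit

# Synthetic spectrum (assumed data): a quadratic baseline plus one narrow peak.
x = np.linspace(0.0, 1.0, 200)
true_baseline = 1.0 + 0.5 * x + 0.8 * x ** 2
peak = 5.0 * np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
y = true_baseline + peak

baseline_est = iterative_polyfit_baseline(x, y, order=2)
```

Here the polynomial `order` is fixed in advance, and the clipping rule acts as an implicit threshold, so the result still depends on choices made by the user.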

If some slope or curvature information about the baseline were available, the parameter related to the rigidity of the fitted curve and the threshold would be easier to set, and these baseline correction methods would be more likely to give satisfactory results.

Usually no information about the baseline is available before baseline correction begins, but more and more knowledge about the baseline can be obtained as the process advances. In this paper, this knowledge is used in an adaptive iterative baseline correction process to help define the rigidity of the fitted curve automatically. Reweighted genetic programming based on excellent community information (GPEXI) is proposed to recognize and model the baseline automatically. Here, excellent community information includes a common automatic threshold and global slope, local slope, and curvature information, all obtained from the current common baseline areas determined by the excellent community selected from the current GP population. The proposed method uses an automatic threshold defined by excellent community information, instead of a single curve, to discriminate baseline areas from peaks. The order of the polynomial is determined automatically during the learning process without prior knowledge of the spectra. In this way, an iterative procedure is executed to gradually approximate a complex baseline.

In Section 2, some useful preliminary ideas are discussed. The proposed reweighted genetic programming based on excellent community information (GPEXI) is given in Section 3, together with the methods for extracting excellent community information from each generation and for using this information. Section 4 presents simulated data that illustrate the performance of the proposed method; its effectiveness is also demonstrated on experimental spectra. Finally, conclusions are given in Section 5.

Problem modeling

Assume that the $I$-point spectrum is $\{(x_1, y(x_1)), \ldots, (x_i, y(x_i)), \ldots, (x_I, y(x_I))\}$. It can be modeled as $y(x_i) = b(x_i) + e(x_i)$, $1 \le i \le I$, where $x_i$ is a wavelength value; $y = (y(x_1), y(x_2), \ldots, y(x_I))$ is an $I$-point positive-peak spectrum; $b = (b(x_1), b(x_2), \ldots, b(x_I))$ denotes the baseline itself; and $e = (e(x_1), e(x_2), \ldots, e(x_I))$ denotes the residual, peaks, and physical noise. The baseline can be modeled as $b(x_i) = f(x_i, a)$. Here, $f(\cdot)$ and $a$ are the function and its parameters. The baseline should have the following properties: 1) being smooth,

The baseline correction algorithm based on community information

Here, genetic programming (GP) provides multiple estimated baseline curves with different smoothness and recognizes baseline areas using community information abstracted from all of these estimated curves. Because GP is a population-based optimization technique, this characteristic is used to improve the accuracy of baseline region recognition. GP then models the baseline without pre-specifying its structure.
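The snippet above does not give the details of the GP individuals or of how the community threshold is computed, so the following is only a loose illustration of the idea of pooling several candidate baselines to recognize baseline regions. It is not the GPEXI algorithm: polynomial fits of different orders stand in for GP individuals, and the shared threshold, the majority vote, and all names (`vote_baseline_mask`, `orders`, `k`) are assumptions made for this sketch.

```python
import numpy as np
from numpy.polynomial import Polynomial

def vote_baseline_mask(x, y, orders=(1, 2, 3, 4), k=3.0):
    """Flag likely baseline points by voting across candidate curves.

    Stand-in for a population of estimated baselines: one polynomial fit
    per entry in orders. A point counts as baseline when its residual is
    below a shared threshold for more than half of the candidates.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = np.array([y - Polynomial.fit(x, y, d)(x) for d in orders])
    # Robust spread of each candidate's residuals (median absolute deviation)
    centers = np.median(residuals, axis=1, keepdims=True)
    mad = np.median(np.abs(residuals - centers), axis=1)
    threshold = k * np.median(mad)        # one threshold shared by all candidates
    votes = residuals < threshold         # positive peaks sit far above the fits
    return votes.mean(axis=0) > 0.5, threshold

# Synthetic spectrum (assumed data): linear baseline, one peak, small noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
peak = 5.0 * np.exp(-0.5 * ((x - 0.5) / 0.01) ** 2)
y = 1.0 + 0.5 * x + peak + rng.normal(0.0, 0.02, x.size)

mask, threshold = vote_baseline_mask(x, y)
peak_idx = int(np.argmax(y))
```

In this sketch the threshold comes from the pooled candidates rather than from a single fitted curve, which mirrors the motivation stated in the text; the actual method additionally uses global and local slope information that is not modeled here.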

Considering that a baseline generally varies much slower than a signal, so

Experimental results

Three simulated spectral datasets and experimental spectra are used to validate the performance of the proposed method.

Conclusion

The interpretation of spectroscopic data is largely hampered by the baseline or trend problem. Many proposed automatic curve fitting methods require the curve order and thresholds to be predefined, and setting these parameters relies on the user's experience. In general, such automatic correction algorithms must be run several times, and the final result is then chosen from these runs based on the user's experience.

In this paper, a novel spectra baseline correction algorithm

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 51177002, 61032007), the Doctoral Fund of the Ministry of Education of China (No. 20113401120007), and the Anhui Provincial Natural Science Project (KJ2012A012).

References (24)

  • P.A. Mosier-Boss et al., Fluorescence rejection in Raman spectroscopy by shifted-spectra, edge detection, and FFT filtering techniques, Appl. Spectrosc. (1995)

  • M.N. Leger et al., Comparison of derivative preprocessing and automated polynomial baseline correction method for classification and quantification of narcotics in solid mixtures, Appl. Spectrosc. (2006)

Yanling Wu was born in Anhui, China, in 1977. She received her Ph.D. degree in control science and engineering from Zhejiang University in 2009. She is currently an associate professor with the School of Electrical Engineering and Automation, Anhui University. Her research interests are evolutionary computation, robust estimation, and spectral analysis.

Qingwei Gao was born in Anhui, China, in 1965. He received his Ph.D. degree in information and communication engineering from the University of Science and Technology of China in 2002. He is currently a professor with the School of Electrical Engineering and Automation, Anhui University. His research interests include wavelet analysis, image processing, and fractal signal processing.

Yuanyuan Zhang was born in Anhui, China, in 1977. She received her Ph.D. degree in test and measurement technology and instruments from Hefei University of Technology in 2010. She is currently an associate professor with the School of Electrical Engineering and Automation, Anhui University. Her research interests are nonlinear modeling and control, and optimization algorithms.
