Abstract
Preprocessing tandem mass spectra to separate signal peaks from noise peaks plays a crucial role in improving the accuracy of most peptide identification algorithms. Because a CID tandem mass spectra dataset is highly imbalanced, with a high proportion of noise peaks and few signal peaks (a low signal-to-noise ratio), a classification strategy is required that maintains the performance trade-off between the minority (signal) and majority (noise) class accuracies prior to peptide identification. This paper therefore proposes a Multi-Objective Genetic Programming (MOGP) approach based on the idea of MOEA/D, named MOGP/D, to evolve a Pareto front of classifiers along the optimal trade-off surface that offers the best compromises between the objectives. Compared with an NSGA-II-based MOGP method, called NSGP, MOGP/D produces better solutions in the region of interest (the centre of the Pareto front) as the signal-to-noise ratio decreases, according to the hypervolume indicator on the training sets. Moreover, the best compromise solution achieved by the proposed method is compared with the best single-objective GP solution and the best NSGP solution; the results show that MOGP/D retains a reasonable number of signal peaks and filters out more noise peaks than the other two methods. To further evaluate the effectiveness of MOGP/D, the preprocessed MS/MS data are submitted to PEAKS, the most commonly used de novo sequencing software, to identify the peptides. The results show that the proposed multi-objective GP method improves the reliability of peptide identification compared with single-objective GP.
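To make the two evaluation ideas in the abstract concrete, the following minimal Python sketch computes a two-objective hypervolume and a Tchebycheff scalarisation for classifiers scored by (signal accuracy, noise accuracy). Everything here is an illustrative assumption rather than the authors' implementation: the abstract does not state which decomposition function, reference point, or ideal point MOGP/D uses, and the function names and sample scores are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (signal accuracy, noise accuracy), both maximised.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (maximisation)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point = (0.0, 0.0)) -> float:
    """Area dominated by the front and bounded below by `ref`.

    Assumes every point is at least as good as `ref` in both objectives.
    """
    area = 0.0
    prev_y = ref[1]
    # Walk the front from the best first objective to the worst; the second
    # objective then increases, so each step adds one new horizontal slab.
    for x, y in sorted(pareto_front(points), reverse=True):
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


def tchebycheff(point: Point, weights: Point, ideal: Point = (1.0, 1.0)) -> float:
    """Weighted Tchebycheff distance to the ideal point (smaller is better).

    One common scalarisation in MOEA/D-style methods; assumed here.
    """
    return max(w * abs(z - f) for f, w, z in zip(point, weights, ideal))


# Hypothetical classifier scores; (0.6, 0.6) is dominated by (0.7, 0.8).
scores = [(0.9, 0.5), (0.7, 0.8), (0.4, 0.95), (0.6, 0.6)]
hv = hypervolume_2d(scores)
centre_cost = tchebycheff((0.7, 0.8), weights=(0.5, 0.5))
```

With equal weights, the scalarisation favours solutions near the centre of the front, which is the region of interest named in the abstract; a larger hypervolume indicates a front that covers more of the objective space.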
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Azari, S., Xue, B., Zhang, M., Peng, L. (2020). A Decomposition Based Multi-objective Genetic Programming Algorithm for Classification of Highly Imbalanced Tandem Mass Spectrometry. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_35
DOI: https://doi.org/10.1007/978-3-030-41299-9_35
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41298-2
Online ISBN: 978-3-030-41299-9
eBook Packages: Computer Science, Computer Science (R0)