The Development of Symbolic Expressions for the Detection of Hepatitis C Patients and the Disease Progression from Blood Parameters Using Genetic Programming-Symbolic Classification Algorithm
Created by W.Langdon from
gp-bibliography.bib Revision:1.7954
@Article{Andelic:2022:applsci,
author = "Nikola Andelic and Ivan Lorencin and
Sandi {Baressi Segota} and Zlatan Car",
title = "The Development of Symbolic Expressions for the
Detection of Hepatitis C Patients and the Disease
Progression from Blood Parameters Using Genetic
Programming-Symbolic Classification Algorithm",
journal = "Applied Sciences",
year = "2022",
volume = "13",
number = "1",
pages = "Article no 574",
month = dec,
email = "nandelic@riteh.hr",
keywords = "genetic algorithms, genetic programming, ADASYN,
borderline SMOTE, genetic programming-symbolic
classifier, Hepatitis C, fibrosis, cirrhosis, SMOTE",
publisher = "MDPI",
ISSN = "2076-3417",
URL = "https://www.mdpi.com/2076-3417/13/1/574",
DOI = "doi:10.3390/app13010574",
size = "33 pages",
abstract = "Hepatitis C is an infectious disease which is caused
by the Hepatitis C virus (HCV) and the virus primarily
affects the liver. Based on the publicly available
dataset used in this paper the idea is to develop a
mathematical equation that could be used to detect HCV
patients with high accuracy based on the enzymes,
proteins, and biomarker values contained in a patient
blood sample using genetic programming symbolic
classification (GPSC) algorithm. Not only that, but the
idea was also to obtain a mathematical equation that
could detect the progress of the disease i.e.,
Hepatitis C, Fibrosis, and Cirrhosis using the GPSC
algorithm. Since the original dataset was imbalanced (a
large number of healthy patients versus a small number
of Hepatitis C/Fibrosis/Cirrhosis patients) the dataset
was balanced using random oversampling, SMOTE, ADASYN,
and Borderline SMOTE methods. The symbolic expressions
(mathematical equations) were obtained using the GPSC
algorithm using a rigorous process of 5-fold
cross-validation with a random hyperparameter search
method which had to be developed for this problem. To
evaluate each symbolic expression generated with GPSC
the mean and standard deviation values of accuracy
(ACC), the area under the receiver operating
characteristic curve (AUC), precision, recall, and
F1-score were obtained. In a simple binary case
(healthy vs. Hepatitis C patients) the best case was
achieved with a dataset balanced with the Borderline
SMOTE method. The results are ACC [...]. For the best binary
and multi-class cases, the symbolic expressions are
shown and evaluated on the original dataset.",
}