abstract = "Algorithms for constructing classification models in
streaming data scenarios are attracting more attention
in the era of artificial intelligence and machine
learning for data analysis. The huge volumes of
streaming data necessitate a learning framework with
timely and accurate processing. For a streaming
classifier to be deployed in the real world, multiple
challenges exist, such as 1) concept drift, 2)
imbalanced data, and 3) costly labeling processes.
These challenges become more crucial when they occur in
sensitive fields of operation such as network security.
The objective of this thesis is to provide a team-based
genetic programming (GP) framework to explore and
address these challenges with regard to network-based
services. The GP classifier incrementally introduces
changes to the model throughout the course of the
stream to adapt to the content of the stream. The
framework is based on an active learning approach in
which the model is built through interaction with a
subset of the data. Thus, the design of the system
is founded on the introduction of sampling and
archiving policies to decouple the stream distribution
from the training data subset. These policies work with
no prior information on the distribution of classes and
true labels. Benchmarking is conducted with real-world
network security datasets with label budgets on the
order of 5 to 0.5 percent and significant class
imbalance. Detection of the minority classes, which
represents classifier behaviour in the case of attacks,
is also evaluated. Comparisons with current streaming
algorithms and, specifically, state-of-the-art network
frameworks for stream processing under label budgets
demonstrate the
effectiveness of the proposed GP framework to address
the challenges related to streaming data. Furthermore,
the applicability of the proposed framework in network
and security analytics is demonstrated.",
notes = "Nee Sara Rahimi?
Argus CTU-13
Supervisors: Nur Zincir-Heywood and Malcolm Heywood",