Elsevier

Expert Systems with Applications

Volume 36, Issue 10, December 2009, Pages 11994-12000

Review
Intrusion detection by machine learning: A review

https://doi.org/10.1016/j.eswa.2009.05.029

Abstract

The growing popularity of the Internet brings with it the risk of network attacks. Intrusion detection is a major research problem in network security; its aim is to identify unusual access or attacks in order to secure internal networks. In the literature, intrusion detection systems have been approached with various machine learning techniques. However, there is no review paper that examines the current status of using machine learning techniques to solve intrusion detection problems. This paper reviews 55 related studies published between 2000 and 2007, focusing on the development of single, hybrid, and ensemble classifiers. Related studies are compared by their classifier design, datasets used, and other experimental setups. Current achievements and limitations in developing intrusion detection systems by machine learning are presented and discussed. A number of future research directions are also provided.

Introduction

The Internet has become a part of daily life and an essential tool. It aids people in many areas, such as business, entertainment, and education. In particular, the Internet has become an important component of business models (Shon & Moon, 2007). In business operations, both businesses and customers use Internet applications such as websites and e-mail. Therefore, the security of information carried over the Internet requires careful attention. Intrusion detection is one major research problem for business and personal networks.

Because the Internet environment carries many risks of network attack, various systems have been designed to block Internet-based attacks. In particular, intrusion detection systems (IDSs) help the network resist external attacks. That is, the goal of an IDS is to provide a wall of defense against attacks on computer systems connected to the Internet. IDSs can be used to detect different types of malicious network communication and computer system usage, a task that the conventional firewall cannot perform. Intrusion detection is based on the assumption that the behavior of an intruder differs from that of a legitimate user (Stallings, 2006).

In general, IDSs can be divided into two categories based on their detection approach: anomaly detection and misuse (signature) detection (Anderson, 1995; Rhodes et al., 2000). Anomaly detection tries to determine whether deviations from established normal usage patterns can be flagged as intrusions. Misuse detection, on the other hand, uses patterns of well-known attacks or weak spots of the system to identify intrusions.
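The contrast between the two approaches can be illustrated with a minimal sketch. It is not taken from any of the reviewed systems: the signature strings, the requests-per-minute feature, and the z-score threshold of 3.0 are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks (illustrative strings only).
KNOWN_ATTACK_SIGNATURES = ("/etc/passwd", "' OR 1=1")


def misuse_detect(payload: str) -> bool:
    """Misuse detection: flag a request that contains a known attack pattern."""
    return any(sig in payload for sig in KNOWN_ATTACK_SIGNATURES)


class AnomalyDetector:
    """Anomaly detection: flag values far from a learned normal profile."""

    def __init__(self, normal_values, threshold=3.0):
        # The "normal usage pattern" is summarised by a mean and a spread.
        self.mu = mean(normal_values)
        self.sigma = stdev(normal_values)
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def is_anomalous(self, value: float) -> bool:
        # z-score: distance from the normal mean in standard deviations.
        z = abs(value - self.mu) / self.sigma
        return z > self.threshold


# Toy normal profile: requests per minute observed during ordinary usage.
normal_rates = [10, 12, 11, 9, 10, 13, 11, 10]
detector = AnomalyDetector(normal_rates)

known_attack = misuse_detect("GET /index.php?file=../../etc/passwd")
benign = misuse_detect("GET /index.html")
burst = detector.is_anomalous(250)
typical = detector.is_anomalous(11)
```

In this sketch the misuse detector can only recognise patterns it has been given in advance, whereas the anomaly detector needs no attack examples but depends entirely on how well the normal profile was estimated.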

In the literature, a number of anomaly detection systems have been developed based on many different machine learning techniques (cf. Section 3). For example, some studies apply single learning techniques, such as neural networks, genetic algorithms, and support vector machines. Other systems combine different learning techniques, as in hybrid or ensemble approaches. In particular, these techniques are developed as classifiers, which are used to recognize whether incoming Internet access is normal or an attack. However, there is no review of these different machine learning techniques in the intrusion detection domain.

Therefore, the goal of this paper is to review 55 related studies/systems published from 2000 to 2007 by examining what techniques have been used, what experiments have been conducted, and what should be considered for future work from a machine learning perspective.

This paper is organized as follows. Section 2 provides an overview of machine learning techniques and briefly describes a number of related techniques for intrusion detection. Section 3 compares related work based on the types of classifier design, the chosen baselines, datasets used for experiments, etc. The conclusion and a discussion of future research are given in Section 4.

Pattern classification

Pattern recognition is the act of taking in raw data and taking an action based on the category of the data (Michalski, Bratko, & Kubat, 1998). Supervised and unsupervised learning methods can be used to solve different pattern recognition problems (Theodoridis and Koutroumbas, 2006). Supervised learning uses the training data to create a function, where each training example consists of a pair: an input vector and its output (i.e. the class label).
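As a minimal sketch of this idea, the following example builds a function from labelled pairs using a 1-nearest-neighbour rule. The choice of nearest neighbour, the two features, and the toy values are assumptions made for illustration only; they are not drawn from the reviewed studies.

```python
import math

# Each training example pairs an input vector with a class label.
# Hypothetical features: (connection duration, amount of data sent).
training_data = [
    ((1.0, 2.0), "normal"),
    ((1.5, 1.8), "normal"),
    ((8.0, 9.0), "attack"),
    ((9.0, 8.5), "attack"),
]


def train_nearest_neighbor(examples):
    """Return a function that maps an input vector to a predicted label."""

    def predict(x):
        # Take the label of the closest training vector (Euclidean distance).
        _, label = min(examples, key=lambda pair: math.dist(pair[0], x))
        return label

    return predict


classify = train_nearest_neighbor(training_data)
prediction_low = classify((1.2, 2.1))
prediction_high = classify((8.5, 9.2))
```

Here `classify` is the learned function: it accepts an unseen input vector and returns one of the class labels that appeared in the training pairs.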

The

Types of classifier design

Methods for intrusion detection can generally be divided into three categories: single, hybrid, and ensemble. To illustrate these types of classifier design, Table 1 shows the number of the 55 articles that use single, ensemble, and hybrid classifiers, respectively. Fig. 1 presents the year-wise distribution of these articles in terms of their classifier design.
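The three designs can be contrasted with a small sketch. The text here does not spell out how the categories are defined, so the example assumes common readings: an ensemble combines the outputs of several classifiers by majority vote, and a hybrid chains components so that one stage feeds the next. The feature names, thresholds, and rule-based classifiers are hypothetical.

```python
from collections import Counter


# Three hypothetical single classifiers, each a simple threshold rule.
def by_rate(rec):
    return "attack" if rec["requests_per_min"] > 100 else "normal"


def by_failed_logins(rec):
    return "attack" if rec["failed_logins"] >= 3 else "normal"


def by_bytes(rec):
    return "attack" if rec["bytes_sent"] > 1_000_000 else "normal"


def ensemble_vote(classifiers, rec):
    """Ensemble (assumed form): majority vote over independent classifiers."""
    votes = Counter(clf(rec) for clf in classifiers)
    return votes.most_common(1)[0][0]


def hybrid_cascade(rec):
    """Hybrid (assumed form): stage 1 filters, stage 2 decides the remainder."""
    if by_rate(rec) == "normal":
        return "normal"  # stage 1 passes ordinary-looking traffic
    return by_failed_logins(rec)  # stage 2 examines only suspicious records


record = {"requests_per_min": 250, "failed_logins": 0, "bytes_sent": 500}

single_result = by_rate(record)
ensemble_result = ensemble_vote([by_rate, by_failed_logins, by_bytes], record)
hybrid_result = hybrid_cascade(record)
```

For this record the single rate-based classifier raises an alarm, while the ensemble and the hybrid both overrule it, which shows how the choice of design can change the final decision on the same input.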

According to Table 1, single classifiers account for the largest number of studies between 2000 and 2007. On the other hand, very few

Discussion and conclusion

We have reviewed current studies of intrusion detection using machine learning techniques. In particular, this paper reviews papers published between 2000 and 2007. In addition, the review covers a large number of machine learning techniques used in the intrusion detection domain, including single, hybrid, and ensemble classifiers.

The comparative results of related work indicate that developing intrusion detection systems using machine learning techniques still requires further research.

References (77)

  • S. Manocha et al.

    An empirical analysis of the probabilistic K-nearest neighbour classifier

    Pattern Recognition Letters

    (2007)
  • S. Mukkamala et al.

    Intrusion detection using an ensemble of intelligent paradigms

    Journal of Network and Computer Applications

    (2005)
  • T. Ozyer et al.

    Intrusion detection by integrating boosting genetic fuzzy classifier and data mining criteria for rule pre-screening

    Journal of Network and Computer Applications

    (2007)
  • S. Peddabachigari et al.

    Modeling intrusion detection system using hybrid intelligent systems

    Journal of Network and Computer Applications

    (2007)
  • S.L. Scott

    A Bayesian paradigm for designing intrusion detection systems

    Computational Statistics and Data Analysis

    (2004)
  • T. Shon et al.

    Applying genetic algorithm for classifying anomalous TCP/IP packets

    Neurocomputing

    (2006)
  • T. Shon et al.

    A hybrid machine learning approach to network anomaly detection

    Information Sciences

    (2007)
  • A.N. Toosi et al.

    A new approach to intrusion detection based on an evolutionary soft computing model using neuro-fuzzy classifiers

    Computer Communications

    (2007)
  • C.-H. Tsang et al.

    Genetic-fuzzy rule mining approach and evaluation of feature selection techniques for anomaly intrusion detection

    Pattern Recognition

    (2007)
  • Y. Wang et al.

    A latent class modeling approach to detect network intrusion

    Computer Communications

    (2006)
  • C. Zhang et al.

    Intrusion detection using hierarchical neural network

    Pattern Recognition Letters

    (2005)
  • Z. Zhang et al.

    Application of online-training SVMs for real-time intrusion detection with different considerations

    Computer Communications

    (2005)
  • M.S. Abadeh et al.

    A parallel genetic local search algorithm for intrusion detection in computer networks

    Engineering Applications of Artificial Intelligence

    (2007)
  • R. Agarwal et al.

    A new framework for learning classifier models in data mining

    (2000)
  • J. Anderson

    An introduction to neural networks

    (1995)
  • B. Balajinath et al.

    Intrusion detection through behavior model

    Computer Communications

    (2000)
  • C.M. Bishop

    Neural networks for pattern recognition

    (1995)
  • Bouzida, Y., Cuppens, F., Cuppens-Boulahia, N., & Gombault, S. (2004). Efficient intrusion detection using principal...
  • L. Breiman et al.

    Classification and regression trees

    (1984)
  • Bridges, S. M., & Vaughn, R. B. (2000). Intrusion detection via fuzzy data mining. In Paper presented at the twelfth...
  • Chavan, S., Shah, K. D. N., & Mukherjee, S. (2004). Adaptive neuro-fuzzy intrusion detection systems. In Paper...
  • Y. Chen et al.

    Hybrid flexible neural-tree-based intrusion detection systems

    International Journal of Intelligent Systems

    (2007)
  • Chimphlee, W., Abdullah, A. H., Sap, M. N. M., Srinoy, S., & Chimphlee, S. (2006). Anomaly-based intrusion detection...
  • L. Ertoz et al.

    Detection and Summarization of Novel Network Attacks Using Data Mining

    (2003)
  • E. Eskin et al.

    A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data

    (2002)
  • W. Fan et al.

    Using artificial anomalies to detect unknown and known network intrusions

    Knowledge and Information Systems

    (2001)
  • W. Fan et al.

    Using artificial anomalies to detect unknown and known network intrusions

    Knowledge and Information Systems

    (2004)
  • Florez, G., Bridges, S. M., & Vaughn, R. B. (2002). An improved algorithm for fuzzy data mining for intrusion...