This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.


KDD Cup 1999 Data
External Dataset
External Data Source
University of California, Irvine
51 (lowest rank is 51)

Category & Restrictions

cyber defense competitions, intrusion detection, simulated attacks, local networks


This data set consists of a wide variety of intrusions simulated in a military network environment.

This is the data set used for the intrusion detector learning task in the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining. The intrusion detector learning task is to build a predictive model (i.e. a classifier) capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.
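The good/bad distinction described above can be illustrated with a minimal sketch. It assumes each connection record is a comma-separated line whose last field is a label such as `normal.` or an attack name; the sample rows, the reduced set of fields, and the rule inside `toy_classifier` are all made up for illustration and are not part of the data set or the competition.

```python
import csv
import io

# Hypothetical sample records: the last field is the label.
# Field values and the reduced column set are illustrative assumptions only.
SAMPLE = """\
0,tcp,http,SF,181,5450,normal.
0,icmp,ecr_i,SF,1032,0,smurf.
0,tcp,private,S0,0,0,neptune.
0,tcp,smtp,SF,1022,387,normal.
"""


def to_binary(label):
    """Collapse a record label into 'good' (normal) or 'bad' (any attack)."""
    return "good" if label.rstrip(".") == "normal" else "bad"


def load(text):
    """Return a list of (features, binary_label) pairs from CSV text."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:  # skip blank lines
            rows.append((row[:-1], to_binary(row[-1])))
    return rows


def toy_classifier(features):
    """Assumed hand-written rule, standing in for a learned model."""
    protocol, flag = features[1], features[3]
    return "good" if protocol == "tcp" and flag == "SF" else "bad"


def accuracy(classifier, rows):
    """Fraction of records whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if classifier(features) == label)
    return correct / len(rows)


rows = load(SAMPLE)
print(accuracy(toy_classifier, rows))
```

In practice the hand-written rule would be replaced by a model trained on the training records and scored on the separate test records.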

The 1998 DARPA Intrusion Detection Evaluation Program was prepared and managed by MIT Lincoln Labs. The objective was to survey and evaluate research in intrusion detection. A standard set of data to be audited, which includes a wide variety of intrusions simulated in a military network environment, was provided. The 1999 KDD intrusion detection contest uses a version of this dataset.

Lincoln Labs set up an environment to acquire nine weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. They operated the LAN as if it were a true Air Force environment, but peppered it with multiple attacks.

The raw training data was about four gigabytes of compressed binary TCP dump data from seven weeks of network traffic. This was processed into about five million connection records. Similarly, the two weeks of test data yielded around two million connection records.

Additional Details

classification algorithms, predictive modelling, artificial neural network, federally funded research and development centers, data mining, statistical classification, inferlink corporation, darpa, market research, machine learning, mit lincoln laboratory, sigkdd, external data source, intrusion detection system, training test and validation sets, kdd cup 1999 data, 937, massachusetts institute of technology, research institute, laboratories in the united states