This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

DS-0940
HTTP DATASET CSIC 2010
External Dataset
External Data Source
Information Security Institute
Unknown
11/05/2007
50 (lowest rank is 50)

Category & Restrictions

Other
simulated attacks
Unrestricted
Unknown

Description


The HTTP dataset CSIC 2010 contains thousands of automatically generated web requests. It can be used to test web attack protection systems. It was developed at the "Information Security Institute" of CSIC (Spanish National Research Council).

The HTTP dataset CSIC 2010 contains generated traffic targeting an e-commerce web application developed at our department. In this web application, users can buy items using a shopping cart and register by providing some personal information. Because the web application is in Spanish, the dataset contains some Latin characters.
The dataset is generated automatically and contains 36,000 normal requests and more than 25,000 anomalous requests. The HTTP requests are labeled as normal or anomalous, and the dataset includes attacks such as SQL injection, buffer overflow, information gathering, file disclosure, CRLF injection, XSS, server-side include, and parameter tampering. This dataset has been successfully used for web attack detection in previous works [4, 5, 6, 7, 8, 9].

The traffic is generated in the following steps:

First, real data are collected for all the parameters of the web application. All the data (names, surnames, addresses, etc.) are extracted from real databases. These values are stored in two databases: one for the normal values and the other for the anomalous ones. Additionally, all the publicly available pages of the web application are listed.
Next, normal and anomalous requests are generated for every web page. When a normal request has parameters, the parameter values are filled in with data drawn at random from the normal database. The process is analogous for anomalous requests, where the parameter values are taken from the anomalous database.
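
The two generation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the generator used to build the dataset: the page paths, parameter names, and sample values are all hypothetical placeholders standing in for the real page listing and the two value databases.

```python
import random
from urllib.parse import urlencode

# Hypothetical stand-ins for the "normal" and "anomalous" value databases.
NORMAL_VALUES = {
    "name": ["Maria", "Jose", "Lucia"],
    "phone": ["912345678", "934567890"],
}
ANOMALOUS_VALUES = {
    "name": ["' OR '1'='1", "<script>alert(1)</script>"],
    "phone": ["abcdefghi", "A" * 300],
}

# Hypothetical listing of public pages and the parameters each one accepts.
PAGES = {
    "/shop/register": ["name", "phone"],
    "/shop/index": [],
}


def generate_request(page, params, values, rng):
    """Build one request URL, filling every parameter with a random value."""
    if not params:
        return page
    query = urlencode({p: rng.choice(values[p]) for p in params})
    return f"{page}?{query}"


def generate_traffic(values, label, n_per_page, rng):
    """Generate n_per_page labeled requests for every listed page."""
    return [
        (label, generate_request(page, params, values, rng))
        for page, params in PAGES.items()
        for _ in range(n_per_page)
    ]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
traffic = (
    generate_traffic(NORMAL_VALUES, "normal", 2, rng)
    + generate_traffic(ANOMALOUS_VALUES, "anomalous", 2, rng)
)
for label, request in traffic:
    print(label, request)
```

The same function produces both kinds of traffic; only the value database passed in changes, which mirrors the "analogous" process described above.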

Three types of anomalous requests were considered:

1) Static attacks try to request hidden (or non-existent) resources. These requests include obsolete files, session ID in URL rewrite, configuration files, default files, etc.

2) Dynamic attacks modify valid request arguments: SQL injection, CRLF injection, cross-site scripting, buffer overflows, etc.

3) Unintentional illegal requests. These requests have no malicious intent, but they do not follow the normal behavior of the web application and do not have the same structure as normal parameter values (for example, a telephone number composed of letters).

The attacks were generated with the help of tools such as Paros [10] and W3AF [11].

The WAFs with which this dataset was used [4, 5, 6, 7] follow the anomaly approach, i.e., the normal behavior of the web application is defined and any behavior that deviates from it is considered anomalous. Therefore, in this approach only normal traffic is needed for the training phase.
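
As a rough illustration of the anomaly approach, the sketch below learns a profile from normal requests only and flags anything outside it. It is not the method of the cited WAFs; the `NormalProfile` class, its character-class and maximum-length rules, and the example URLs are assumptions chosen to keep the example small.

```python
from urllib.parse import urlsplit, parse_qsl


def char_class(ch):
    """Map a character to a coarse class; punctuation is kept literally."""
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return ch


def extract(request_url):
    """Split a request URL into its path and a list of (name, value) pairs."""
    parts = urlsplit(request_url)
    return parts.path, parse_qsl(parts.query, keep_blank_values=True)


class NormalProfile:
    """Hypothetical profile of normal behavior, learned from normal traffic."""

    def __init__(self):
        self.paths = set()
        self.classes = {}  # (path, param) -> set of observed character classes
        self.max_len = {}  # (path, param) -> longest observed value

    def train(self, normal_requests):
        for url in normal_requests:
            path, params = extract(url)
            self.paths.add(path)
            for name, value in params:
                key = (path, name)
                self.classes.setdefault(key, set()).update(
                    char_class(c) for c in value
                )
                self.max_len[key] = max(self.max_len.get(key, 0), len(value))

    def is_anomalous(self, url):
        path, params = extract(url)
        if path not in self.paths:  # unknown resource
            return True
        for name, value in params:
            key = (path, name)
            if key not in self.classes:  # unknown parameter
                return True
            if len(value) > self.max_len[key]:  # longer than anything seen
                return True
            if any(char_class(c) not in self.classes[key] for c in value):
                return True
        return False


profile = NormalProfile()
profile.train([
    "/shop/register?name=Maria&phone=912345678",
    "/shop/register?name=Jose+Luis&phone=934567890",
])
print(profile.is_anomalous("/shop/register?name=Ana&phone=600000000"))
print(profile.is_anomalous("/shop/register?name=Lucia&phone=abcdefghi"))
print(profile.is_anomalous("/shop/old.bak"))
```

With this toy training set, a telephone number composed of letters is flagged because only digits were ever observed for that parameter, and a request for an unlisted file is flagged because its path was never seen. A strict maximum-length rule like this one would likely produce false positives on real traffic; it is used here only for brevity.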

The dataset is divided into three subsets: one for the training phase, which contains only normal traffic, and two for the test phase, one with normal traffic and the other with malicious traffic.
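
The two test subsets map naturally onto two standard measurements: the share of malicious requests that are flagged (detection rate) and the share of normal requests that are flagged by mistake (false positive rate). The sketch below shows one way to compute both. The `evaluate` function, the toy detector, and the sample requests are hypothetical and are not taken from the dataset.

```python
def evaluate(detector, normal_test, anomalous_test):
    """Score a detector (a callable returning True for 'anomalous')."""
    if not normal_test or not anomalous_test:
        raise ValueError("both test subsets must be non-empty")
    false_positives = sum(1 for request in normal_test if detector(request))
    true_positives = sum(1 for request in anomalous_test if detector(request))
    return {
        "detection_rate": true_positives / len(anomalous_test),
        "false_positive_rate": false_positives / len(normal_test),
    }


def toy_detector(request):
    """Hypothetical detector: flag requests containing a quote or '<'."""
    return "'" in request or "<" in request


# Hypothetical stand-ins for the normal and malicious test subsets.
normal_test = [
    "/shop/cart?id=3",
    "/shop/register?name=O'Brien",  # legitimate value, wrongly flagged
]
anomalous_test = [
    "/shop/cart?id=1' OR '1'='1",
    "/shop/register?name=<script>",
    "/shop/config.bak",  # static attack, missed by the toy detector
]

result = evaluate(toy_detector, normal_test, anomalous_test)
print(result)
```

Keeping normal and malicious test traffic in separate subsets makes both rates straightforward to compute, since the label of every request is implied by the subset it comes from.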

Additional Details

N/A
false
Unknown
buffer overflow, computer memory, centennial conference schools, intelligence assessment, johns hopkins university, web design, data serialization formats, transport layer security, evaluation, external data source, hacking, applications of cryptography, internet protocol, sql injection, http dataset csic 2010, uniform resource identifier, hypertext transfer protocol, telephone number, e commerce, internet security, cyberwarfare, information privacy, history of computing, supply chain management, online bookmarking services, cyberattack, universities and colleges in baltimore, cybercrime, web security exploits, web browser, network analyzers, server side, information economy, cryptographic protocol, communication protocol, application layer protocols, default, servers, cross site scripting, intelligence operations, disclosure, data security, injection exploits, web page, inferlink corporation, session id, personally identifiable information, hotlist, secure communication, 940, geosocial networking, configuration file, w3af, exploit