This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

DS-1270
Auto-labeled Corpus
External Dataset
External Data Source
GitHub
Unknown
Unknown
56 (lowest rank is 56)

Category & Restrictions

Other
cyber crime
Unrestricted
true

Description


This is a corpus of auto-labeled text from the cyber security domain, which was used to automatically extract security-related entities with machine learning. It was generated for use in the Stucco project and includes all descriptions from CVE/NVD entries starting in 2010.

This corpus was generated and first used in the following paper, which provides many additional details.
Bridges, Robert A., et al. "Automatic Labeling for Entity Extraction in Cyber Security." Accepted to the Third ASE International Conference on Cyber Security, 2014. Preprint: arXiv:1308.4941 (2013).
The src/python/tagging directory contains scripts that generate and tag the initial corpus using various heuristics. The src/python/learning directory contains scripts that generate a model from the tagged corpus and then evaluate that model. Training and testing are done first for IOB tagging and then for domain labeling; the process is the same for both.
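
To illustrate how the two labeling layers mentioned above relate, the following is a minimal sketch, not code from the repository. It assumes each token carries a plain IOB tag (B, I, or O) plus a separate domain label. The function name group_entities, the example sentence, and the label names (vuln_term, product, version) are hypothetical and are not taken from the corpus.

```python
from typing import List, Tuple


def group_entities(
    tokens: List[str], iob_tags: List[str], domain_labels: List[str]
) -> List[Tuple[str, str]]:
    """Merge IOB-tagged tokens into (entity_text, domain_label) pairs.

    The domain label of an entity is taken from its first token.
    A stray "I" with no open entity is treated as the start of one.
    """
    entities: List[Tuple[str, str]] = []
    current = None  # (list of tokens, domain label) for the open entity

    for token, iob, label in zip(tokens, iob_tags, domain_labels):
        if iob == "B" or (iob == "I" and current is None):
            # Close any open entity, then start a new one.
            if current is not None:
                entities.append((" ".join(current[0]), current[1]))
            current = ([token], label)
        elif iob == "I":
            # Continue the open entity.
            current[0].append(token)
        else:
            # "O": the token is outside any entity.
            if current is not None:
                entities.append((" ".join(current[0]), current[1]))
            current = None

    if current is not None:
        entities.append((" ".join(current[0]), current[1]))
    return entities


# Hypothetical sentence in the style of a CVE description.
tokens = ["Buffer", "overflow", "in", "Example", "Reader", "9.3",
          "allows", "remote", "attackers"]
iob_tags = ["B", "I", "O", "B", "I", "B", "O", "O", "O"]
domain_labels = ["vuln_term", "vuln_term", "O", "product", "product",
                 "version", "O", "O", "O"]

entities = group_entities(tokens, iob_tags, domain_labels)
print(entities)
```

Keeping the IOB tags and the domain labels in separate sequences mirrors the two-step setup described above, where entity boundaries and entity types are learned separately; the actual file format used by the repository scripts may differ.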

Additional Details

9.0MB
false
Unknown
corpus, auto, labeled, auto-labeled corpus, 1270, external data source, corporation, inferlink corporation, external, source, inferlink, security, cyber, generated, learning, domain, machine, entities, includes, automatically, cve, entries, nvd, starting, 2010, stucco, descriptions, project, text, extracting, preprint, model, src, directory, scripts, labeling, python, tagging, generate, arxiv, ase, details, 4941, entity, process, iob, 2013, 2014, initial, automatic, heuristics, bridges, al, extraction, international, tag, 1308, evaluate, other, conference, additional, training, robert, accepted, tagged, paper, testing