
This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

Dataset ID

DS-1146

DOI

10.23721/100/1503333

Name

Ember: Endgame Malware BEnchmark for Research

Record Type

External Dataset

Provider

External Data Source

Host

InferLink Corporation

Collection Starting

01/01/2017

Collection Ending

12/31/2017

Popularity Rank

56 (lowest rank is 56)

Category & Restrictions

Description

Short Description
A labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files

Long Description
The Ember dataset is a collection of 1.1 million SHA-256 hashes of PE files that were scanned at some point in 2017. The accompanying repository makes it easy to reproducibly train the benchmark model, extend the provided feature set, or classify new PE files with the benchmark model.
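The record identifies samples by the SHA-256 hashes of PE files. To show what that identifier is, here is a minimal standard-library sketch that computes a file's SHA-256 digest and runs a very shallow check that the bytes look like a Windows PE file. It checks for the "MZ" DOS magic and for the "PE\0\0" signature at the offset stored in the e_lfanew field at 0x3C. This is not code from the ember repository, and the function names are hypothetical. The repository's own feature extraction parses far more of the file.

```python
import hashlib
import struct


def sha256_of_bytes(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def looks_like_pe(data: bytes) -> bool:
    """Shallow structural check for a Windows PE file (hypothetical helper).

    A PE file starts with the DOS magic b"MZ"; the little-endian 32-bit
    value at offset 0x3C (e_lfanew) points to the b"PE\\0\\0" signature.
    """
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # Slicing past the end yields fewer than 4 bytes, so the comparison fails safely.
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"


def describe(path: str) -> tuple[str, bool]:
    """Read a file and return (sha256 hex digest, looks like a PE file)."""
    with open(path, "rb") as f:
        data = f.read()
    return sha256_of_bytes(data), looks_like_pe(data)
```

Hashing the raw bytes, rather than a parsed representation, means two copies of the same executable always map to the same identifier.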
The dataset includes features extracted from 1.1M binary files: 900K training samples (300K malicious, 300K benign, 300K unlabeled) and 200K test samples (100K malicious, 100K benign). It is accompanied by open-source code for extracting features from additional binaries, so that their features can be appended to the dataset. This dataset fills a void in the information security machine learning community: a benign/malicious dataset that is large, open, and general enough to cover several interesting use cases.
Hyrum Anderson
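The split sizes in the description add up as stated: 900K training samples plus 200K test samples gives 1.1M. Only 600K of the training samples carry a benign/malicious label. The sketch below encodes those counts and shows one simple way to drop unlabeled rows before fitting a purely supervised model. It uses hypothetical helpers, not code from the ember repository. The label encoding (1 = malicious, 0 = benign, -1 = unlabeled) is an assumption for illustration; confirm it against the repository before relying on it.

```python
# Split sizes stated in the record, in thousands of samples.
SPLITS = {
    "train": {"malicious": 300, "benign": 300, "unlabeled": 300},
    "test": {"malicious": 100, "benign": 100},
}

# Assumed label encoding (hypothetical here; confirm against the repository).
LABELS = {"malicious": 1, "benign": 0, "unlabeled": -1}


def split_total(name: str) -> int:
    """Total samples (thousands) in a split, labeled or not."""
    return sum(SPLITS[name].values())


def labeled_total(name: str) -> int:
    """Samples (thousands) in a split that carry a benign/malicious label."""
    return sum(n for kind, n in SPLITS[name].items() if kind != "unlabeled")


def supervised_rows(labels: list[int]) -> list[int]:
    """Indices of rows usable for supervised training: unlabeled rows are dropped."""
    return [i for i, y in enumerate(labels) if y != LABELS["unlabeled"]]


if __name__ == "__main__":
    total = split_total("train") + split_total("test")
    print(f"total: {total}K")                           # 900K + 200K = 1100K (1.1M)
    print(f"labeled train: {labeled_total('train')}K")  # 600K of the 900K training samples
    print(supervised_rows([1, 0, -1, 1, -1]))           # [0, 1, 3]
```

Filtering is only the simplest option for a purely supervised classifier. The 300K unlabeled training samples remain part of the dataset and can serve other use cases.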

External URL
https://github.com/endgameinc/ember

Additional Details

Size

N/A

Anonymized

false

Ongoing Measurement

false

Generated Keywords

benchmark, ember, malware, ember: endgame malware benchmark for research, 1146, endgame, 2017, source, corporation, external, external data source, inferlink, inferlink corporation, dataset, files, malicious, machine, training, learning, windows, executable, detect, portable, statically, models, labeled, 300k, features, benign, 100k, additional, samples, pe, model, appended, classify, security, 1m, anderson, 900k, extracting, repository, 200k, extracted, sample, accompanied, cover, feature, scanned, hyrum, community, easy, test, reproducibly, extend, void, sha256, train, binary, hashes, fills, binaries, other, code, unlabeled, includes

Additional Keywords
