This is a non-IMPACT record, meaning that access to the data is not
controlled by IMPACT. For access, see the directions below.
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.
Real Data Corpus - Naval Postgraduate School
External Dataset
Naval Postgraduate School
Naval Postgraduate School
57 (lowest rank is 57)
Real Data Corpus
The Real Data Corpus (RDC) is a collection of disk images extracted from secondary storage devices that were acquired from second-hand markets around the world. In total, the RDC currently consists of 58 TiB of data contained in 3,127 disk images from 29 countries. A variety of devices are represented, including magnetic media and solid-state storage from laptops, desktops, mobile phones, USB memory sticks, and other media. The dataset is hosted on the HPC infrastructure at the Naval Postgraduate School, as well as in AWS GovCloud.
Potential Uses
The Real Data Corpus is a one-of-a-kind scientific resource for:
- Developing and validating forensic and data recovery tools.
- Training students in forensics and data recovery.
- Developing and validating document translation software.
- Exploring and characterizing real-world computing practices, configuration choices, and option settings.
- Studying the storage allocation strategies of file systems under real-world conditions.
The RDC has been cited in over 60 articles. See our current list here.
Access and Availability
Please contact us if you would like access to the Real Data Corpus. In general, due to privacy concerns, we do not release copies of the data to private individuals. However, depending on the requirements of the project, we may be able to offer access through one of two methods:
1. Mediated Access. Researchers submit source code, build instructions, and detailed instructions for running their experiment. We return sanitized results. This is the most expedient option when the desired experiment does not involve human subjects research.
2. Direct Access. Researchers create virtual machines in AWS GovCloud, and these machines are granted access to the dataset. Because this method may involve direct contact with sensitive data, it requires additional review.
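Under mediated access, the submitted experiment runs against the data and only sanitized results come back, so experiment code is best written to emit aggregate, non-identifying output. The Python sketch below illustrates that shape. It rests on assumptions that this record does not state: that each disk image is readable as a raw (dd-style) file inside one directory passed on the command line, and that a tally of MBR partition type codes counts as an acceptable sanitized result. The function names are hypothetical. The actual image formats, directory layout, and submission requirements should be confirmed with NPS before any code is submitted.

```python
import os
import sys
from collections import Counter

SECTOR_SIZE = 512
MBR_SIGNATURE = b"\x55\xaa"      # boot signature at offsets 510-511
PARTITION_TABLE_OFFSET = 446     # four 16-byte entries follow
PARTITION_ENTRY_SIZE = 16


def mbr_partition_types(sector0: bytes) -> list[int]:
    """Return the non-empty partition type codes from a classic MBR, or [] if absent.

    Caveat: an unpartitioned volume boot record also ends in 0x55AA, so for
    such images the "types" read here are boot-code bytes and are approximate.
    """
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != MBR_SIGNATURE:
        return []
    types = []
    for i in range(4):
        entry_start = PARTITION_TABLE_OFFSET + i * PARTITION_ENTRY_SIZE
        ptype = sector0[entry_start + 4]  # type byte is at offset 4 of each entry
        if ptype != 0x00:                 # 0x00 marks an unused entry
            types.append(ptype)
    return types


def tally_partition_types(image_dir: str) -> Counter:
    """Count partition type codes across all raw images (hypothetical layout).

    Only aggregate counts are returned; no file names, paths, or content
    from individual images are kept.
    """
    counts = Counter()
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            sector0 = f.read(SECTOR_SIZE)
        types = mbr_partition_types(sector0)
        if types:
            counts.update(f"0x{t:02x}" for t in types)
        else:
            counts["no-mbr"] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one "type<TAB>count" line per partition type code.
    for ptype, n in sorted(tally_partition_types(sys.argv[1]).items()):
        print(f"{ptype}\t{n}")
```

Returning only counts keyed by partition type code, rather than per-image listings, keeps the output closer to the "sanitized results" that mediated access describes, though what qualifies as sanitized is ultimately decided by the RDC maintainers.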
Please be aware that, due to limited staff, we cannot always accommodate all requests. Efforts are underway to develop infrastructure that will allow us to meet a wider range of research requirements without unduly increasing privacy risks. For more information, or if you're interested in access to the Real Data Corpus, please contact:
Brittany Ramsey - Research Associate (831) 656-2014
Additional Details
postgraduate, naval, school, real, corpus, 790, real data corpus - naval postgraduate school, naval postgraduate school, 2006, rdc, disk, images, storage, devices, markets, hand, total, 127, acquired, tib, secondary, extracted, countries, consists, contained, access, contact, dataset, recovery, media, instructions, option, privacy, developing, involve, direct, validating, infrastructure, requirements, experiment, govcloud, researchers, machines, detailed, build, amazon, desktops, 2014, list, translation, involves, potential, articles, aws, settings, including, develop, studying, resource, sanitized, efforts, risks, nps, additional, phones, training, limited, mediated, network, running, aware, laptops, scientific, configuration, generic, private, concerns, virtual, mobile, memory, human, code, submit, computing, ramsey, software, blramsey, choices, conditions, wider, methods, document, students, cited, range, availability, requests, tools, sticks, sensitive, contract, release, expedient, granted, offer, unduly, depending, review, magnetic, represented, solid, method, generic network/behavior data, characterizing, current, file, return, usb, project, source, subjects, forensic, individuals, brittany, hosted, associate, copies, 831, increasing, meet, exploring, desired, variety, systems, accommodate, underway, 656, hpc, behavior, strategies, staff, practices, forensics, create