This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

DS-1093
HTTPS Ecosystem Scans
External Dataset
External Data Source
Internet-Wide Scan Data Repository
Unknown
Unknown
55 (lowest rank is 55)

Category & Restrictions

Other
address space status data
Unrestricted
Unknown

Description


Regular and continuing scans of the HTTPS Ecosystem from 2012 and 2013 including parsed and raw X.509 certificates, temporal state of scanned hosts, and the raw ZMap output of scans on port 443. The dataset contains approximately 43 million unique certificates from 108 million hosts collected via 100+ scans.

This dataset is composed of four parts: parsed certificates, raw certificates, individual scans (status of each responsive host in a single complete scan of the IPv4 address space), and raw ZMap output of TCP SYN scans on port 443. While we have split these into individual parts, the data is optimized for use in a relational database such as PostgreSQL or MySQL.
The files certificates.csv.gz, public_keys.csv.gz, and extraneous_extensions.csv.gz contain parsed data from all certificates we have encountered over the course of our scanning. The certificates relation contains all common data found in a certificate (e.g. subject, issuer). The relation is keyed on "id" and is also unique on the SHA-1 fingerprint. The issuer_id attribute is a self-referential attribute pointing back to the parent certificate's id.
Certificates are validated using OpenSSL and recently downloaded root stores. We attempt to validate each certificate against the browser store along with any previously seen intermediate certificates in order to account for missing certificate chains. The result of this validation is represented in the is-*-trusted attributes. We further check each certificate for other issues (e.g. expiration, invalid signature), not including the trust chain; the results are stored in the is-valid and validation-error attributes.
The public_keys relation contains unique parsed RSA and DSA keys and is linked by certificates.public_key_id == public_keys.id. Other types of keys are noted in the certificates relation but are not parsed further. All other non-binary X.509 extensions are stored in the extraneous_extensions relation.
The scan files we provide contain data about every host that completed a successful TLS handshake on port 443 during a single comprehensive scan of the IPv4 address space. For each host we include: host IP address, certificate ID, the SHA-1 fingerprint of the certificate, and the timestamp at which the TLS handshake was completed.
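The two links described above (issuer_id pointing back into certificates, and public_key_id pointing into public_keys) can be followed with ordinary dictionary lookups once the rows are loaded. The following is a minimal sketch, assuming each relation has been read into a dict keyed on its "id" column; the function names and the max_depth guard are hypothetical, and how a root certificate's issuer_id is encoded (empty, or pointing at itself) is not stated in the record, so both cases are handled.

```python
def issuer_chain(certificates, cert_id, max_depth=16):
    """Return certificate ids from cert_id upward through its issuers.

    `certificates` maps id -> row dict containing an "issuer_id" entry.
    The walk stops at a missing parent, a repeated id (for example a
    self-signed root whose issuer_id is its own id), or max_depth.
    """
    chain = []
    seen = set()
    current = cert_id
    while current is not None and current not in seen and len(chain) < max_depth:
        if current not in certificates:
            break  # parent certificate was never observed
        chain.append(current)
        seen.add(current)
        current = certificates[current].get("issuer_id")
    return chain


def key_for(certificates, public_keys, cert_id):
    """Follow certificates.public_key_id == public_keys.id for one certificate.

    Returns None when the certificate has no parsed key, which the
    description suggests is the case for key types other than RSA and DSA.
    """
    key_id = certificates[cert_id].get("public_key_id")
    return public_keys.get(key_id)
```

In a relational database the same lookups would be a self-join on certificates and a join to public_keys; the in-memory version is only meant to make the shape of the references concrete.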
The data originates from a PostgreSQL 9.2 database, whose schema is available in schema.txt and which we recommend for hosting this dataset. Strings are delimited with double quotes, and newlines are replaced with \n. Information about specific fields can be found in schema.txt.
Contact: https-team@umich.edu
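For readers who want to inspect the files before loading them into a database, the format described above (gzip-compressed CSV, double-quoted strings, newlines written as a literal \n) can be read with the Python standard library. This is a sketch under stated assumptions: the helper name read_rows is hypothetical, UTF-8 encoding is assumed, and the record does not say whether the files carry a header row or whether literal backslashes are themselves escaped, so schema.txt should be checked before relying on it.

```python
import csv
import gzip


def read_rows(path):
    """Yield rows from a gzip-compressed CSV file as lists of strings.

    Fields are double-quoted where needed; a literal backslash followed
    by "n" inside a field is turned back into a real newline.
    Assumption: UTF-8 text and no additional escaping of backslashes.
    """
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle, quotechar='"'):
            yield [field.replace("\\n", "\n") for field in row]
```

A call such as `for row in read_rows("certificates.csv.gz"): ...` streams the file row by row, which matters for relations of this size since nothing is held in memory beyond the current row.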

Additional Details

N/A
false
false
transport layer security implementation, universities and colleges in michigan, web browser, chain of trust, transport layer protocols, public university systems in the united states, cryptographic protocol, communication protocol, application layer protocols, key management, database application, transport layer security, external data source, address space, cryptographic software, https ecosystem scans, public key cryptography, ipv4, data modeling, computer architecture, free security software, browser, internet protocol, cryptosystem, flagship universities in the united states, university of michigan, uniform resource identifier, hypertext transfer protocol clients, hypertext transfer protocol, public key certificate, inferlink corporation, e commerce, internet security, transmission control protocol, relational database, database schema, big ten conference schools, 1093, public universities in michigan, secure communication, database theory, history of computing, communication software, schools of public health in the united states, ip address, openssl, relational model, trusted computing