Spotlight Article:
Internet infrastructure in coastal areas at serious risk due to climate-change related sea water inundation
Graphics
Abstract Internet infrastructure in coastal areas is at serious risk from climate-change-related sea water inundation. See "Lights Out: Climate Change Risk to Internet Infrastructure", to appear in Proceedings of the ACM/IRTF/ISOC Applied Networking Research Workshop, July 2018.
Links http://www.cs.wisc.edu/~pb/anrw18_final.pdf
Date Posted August 20, 2018
Spotlight Article:
Deployment Characteristics of "The Edge" in Mobile Edge Computing
Graphics
Abstract A University of Wisconsin study analyzes cell tower deployments in the US and identifies locations where new towers could be deployed and how micro-data center deployments could enhance performance in mobile edge computing. See "Deployment Characteristics of 'The Edge' in Mobile Edge Computing", to appear in Proceedings of the ACM SIGCOMM Workshop on Edge Mobile Computing, August 2018.
Links http://www.cs.wisc.edu/~pb/mecom18_final.pdf
Date Posted August 20, 2018
Spotlight Article:
Wrinkles in Time: Detecting Internet-wide Events via NTP
Graphics
Abstract Network Time Protocol (NTP) traffic can be analyzed to identify details of network-wide events such as attacks, outages, and configuration changes. See "Wrinkles in Time: Detecting Internet-wide Events via NTP", in Proceedings of the IFIP Networking Conference, May, 2018.
Links http://www.cs.wisc.edu/~pb/ifip18_final.pdf
Date Posted August 20, 2018
Dataset Spotlight:
AlphaBay 2014-2017 (non-anonymized)
Graphics
Description CMU has made non-anonymized data for the AlphaBay online anonymous marketplace available
Links https://impactcybertrust.org/dataset_view?idDataset=897
Date Posted August 20, 2018
Dataset Spotlight:
AlphaBay 2014-2017 (anonymized)
Graphics
Description CMU has made anonymized data for the AlphaBay online anonymous marketplace available
Links https://impactcybertrust.org/dataset_view?idDataset=896
Date Posted August 20, 2018
Dataset Spotlight:
Measuring the Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem: Anonymized Dataset
Graphics
Description CMU has made anonymized data from the paper "Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem" available.
Links https://www.impactcybertrust.org/dataset_view?idDataset=844
Date Posted August 20, 2018
Dataset Spotlight:
Measuring the Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem: Non-Anonymized Dataset
Graphics
Description CMU has made non-anonymized data from the paper "Longitudinal Evolution of the Online Anonymous Marketplace Ecosystem" available. This is restricted data.
Links https://www.impactcybertrust.org/dataset_view?idDataset=895
Date Posted August 20, 2018
Spotlight Article:
CAIDA's "Inferring Persistent Interdomain Congestion" awarded Best Paper at SIGCOMM 2018
Graphics
Abstract There is significant interest in the technical and policy communities regarding the extent, scope, and consumer harm of persistent interdomain congestion. We provide empirical grounding for discussions of interdomain congestion by developing a system and method to measure congestion on thousands of interdomain links without direct access to them. The full paper can be found at the link below.
Links http://www.caida.org/publications/papers/2018/inferring_persistent_interdomain_congestion
Date Posted August 20, 2018
Spotlight:
ISI's Networking Group Creates Interactive Internet Outages Heatmap
Graphics
Abstract A geographic Internet outages map developed by ISI researchers currently maps nine months of global Internet conditions around the clock.
Links https://www.isi.edu/news/story/333
https://ant.isi.edu/blog/?p=1141
https://ant.isi.edu/outage/world/
Date Posted December 7, 2017
Spotlight:
Should we trust the geolocation databases to geolocate routers?
Graphics
Abstract Geolocation databases are used by researchers and network operators to learn the real-world location of a given IP address.
Links https://blog.apnic.net/2017/11/03/trust-geolocation-databases-geolocate-routers/
Date Posted November 3, 2017
Dataset Spotlight:
Operational Research Data from Internet Namespace Logs (ORDINAL) Available!
Graphics
Description High volume traffic and log data on public-facing Internet hosts resulting from a significant and common DNS misconfiguration is available!
Links https://impactcybertrust.org/dataset_view?idDataset=794
Date Posted May 2017
Spotlight Article:
A Third of the Internet is Under Attack
Graphics
Abstract Study by CAIDA finds millions of network addresses subjected to denial-of-service attacks between 2015 and 2017.
Links https://www.theregister.co.uk/2017/11/05/caida_study_finds_one_third_of_the_internet_suffered_denial_of_service_attacks_between_2015_and_2017/
Related Publications http://ucsdnews.ucsd.edu/pressrelease/a_third_of_the_internet_is_under_attack
Date Posted November 5, 2017
Spotlight Article:
Internet Atlas Named One of the Greatest 100 Innovations of 2017 by Popular Science
Graphics
Links https://www.popsci.com/best-of-whats-new-list-2017
Date Posted October 17, 2017
Spotlight Article:
Evaluation of Hurricane Harvey's Effects on the Internet's Edge
Graphics
Abstract USC-ISI evaluates what outages in the Internet's edge networks can say about damage on the ground.
Links https://ant.isi.edu/outage/ani/harvey/
Date Posted September 2017
New Funding Spotlight:
DHS S&T Awards $206K to Carnegie Mellon University for Development of Data Platforms for Analyzing Cyberattacks
Graphics
Abstract The Department of Homeland Security (DHS) Science and Technology Directorate (S&T) has awarded Carnegie Mellon University (CMU) in Pittsburgh $206,062 to develop data platforms that cybersecurity researchers can use to analyze cyberattacks.

The award was made through the S&T Cyber Security Division's Information Marketplace for Policy and Analysis of Cyber-risk & Trust (IMPACT) project. IMPACT supports the global cyber-risk research community by coordinating and developing real-world data and information-sharing capabilities, including tools, models and methodologies. To accelerate solutions for cyber-risk issues and infrastructure security, IMPACT enables empirical data and information-sharing between and among the global academia, industry and government cybersecurity research and development (R&D) community.

The university will conduct its work under a project titled "A Query-able Platform for Online Crime Repositories." Its objective is to build and deploy query-able online platforms for online crime repositories that CMU possesses: anonymous online marketplace data and corpora of search-redirection attacks, which are primarily used to attract customers to illicit or fraudulent websites. The researchers also will integrate the anonymous online marketplace data with IMPACT as well as build and deploy web-based graphical interfaces that will be accessible to all cybersecurity researchers at no cost.

The project will benefit researchers interested in obtaining large amounts of data for research on certain types of online criminal activity. Devising sound measurement data building blocks is a challenging activity for any researcher.

This project will allow them to focus on data analysis, bypassing the time-consuming data building phase. It also will enable separate researchers to conduct different analyses with identical datasets, ultimately making it possible to have reproducibility and comparability of their respective research results.
Links https://www.dhs.gov/science-and-technology/news/2017/07/03/news-release-dhs-st-awards-206k-carnegie-mellon-university
Date Posted June 2017
New Funding Spotlight:
DHS S&T Awards $220K to the University of Tulsa for a Study to Improve Sharing of Cybersecurity Data
Graphics
Abstract The Department of Homeland Security (DHS) Science and Technology Directorate (S&T) has awarded the University of Tulsa $220,209 to study data production and usage by cybersecurity researchers, information that will be used to help improve cybersecurity data-sharing.

The award was made through the S&T Cyber Security Division's Cyber Risk Economics (CyRiE) program. CyRiE supports research - measurement and modeling - into the business, legal, technical and behavioral aspects of the economics of cyber-threats, vulnerabilities and controls. A primary focal point of the program is data transparency, collection and disclosure. This focus will provide relevant, timely, accurate and comprehensive data to help shape effective policy, optimize cybersecurity risk management and advance understanding of the cyber-risk landscape.

Through a project titled "The Economics of Cybersecurity Research Data-Sharing", the university will examine published research to identify what data is being produced, to understand what data can be shared and how the research field is falling short in data-sharing, and ultimately to recommend how sharing can be improved. Additionally, the project will analyze usage of the research data collected for CSD's Information Marketplace for Policy and Analysis of Cyber-risk & Trust project to understand how existing, shared datasets are being leveraged by others. Last, the project will estimate the costs associated with data-sharing using information gathered by DHS.

The CyRiE project is working to improve value-based decision-making by those who own, operate, protect, and regulate the nation's vital data assets and critical infrastructure. The project goes beyond the traditional economic-based view of incentives for cybersecurity to approach cybersecurity risk as a multidimensional problem that requires multidisciplinary perspectives. In this way CyRiE R&D can more effectively address strategy and tactics for optimal cyber-risk avoidance, acceptance, mitigation and transfer.
Date Posted June 2017
Dataset Spotlight:
New Internet Census Data Available!
Graphics
Description USC-ISI has released new datasets. Click on the link below for more information!
Links https://www.impactcybertrust.org/forum/viewtopic.php?pid=97#p97
Date Posted June 2017
Spotlight Article:
Changing the Culture of Empirical Internet Assessment
Graphics
Description A paper on the culture of empirical Internet assessment.
Links https://pdfs.semanticscholar.org/4a46/17bf55c3b54268d3ccd9cd43f1755d32ec95.pdf
Date Posted June 2017
New Feature:
Digital Object Identifiers (DOIs) Coming in April!
Graphics
Description The IMPACT program is implementing digital object identifiers (DOIs) to enhance the accessibility and citation of research datasets and to better track how datasets are used. DOIs are unique identifiers designed to make digital objects--research papers and datasets--permanently referenceable. A DOI is a short identifier, a string of digits and in some cases text, that can be used to reference the document or dataset, and which is guaranteed never to "break" if a document is moved, or a server or site retired: the document's owner can update where the DOI points to ensure that it can be found wherever it's migrated to.
Links DOI_article.doc
Date Posted April 2017
Dataset Spotlight:
New Insider Threat Data!
Graphics
Description Insider Threat Data Corpus.
High-quality evaluation of insider threat research can be hampered by the lack of suitable test data. Such test data should capture the behavioral activity of both malicious insiders and benign enterprise users, and should also be in the format of common data sensors expected in the target operational environment (e.g., corporate enterprise). Massachusetts Institute of Technology - Lincoln Labs (MIT-LL) is providing IMPACT with a dataset of a dozen simulations of malicious-insider-related scenarios (six scenarios, plus matching false-positive analogs where benign behavior might be mistaken for malicious intent). Each scenario's instrumentation consists of over 1TB of detailed cyber data (e.g., PCAP network packet traces, MS Outlook Mail Data File, MS Domain Controller and File Server event logs). The Lab's LARIAT system was enhanced to emulate realistic insider threat activities, such as malware installation or mail-based coordination, in addition to users' nominal 'daily' activities. To create the data corpus, the LARIAT cyber range was instrumented for 1-2 weeks while the 50+ modeled users logged in and out, browsed an emulated Internet, sent and replied to mail, used Sharepoint and social networking services, and performed insider threat actions.
Date Posted April 2017
Interested in Using Data for Your Courses?
Graphics
Description The IMPACT Portal provides timely and relevant datasets to the research community. Professors are encouraged to urge students to sign up for an account to access the IMPACT Portal! Visit https://www.impactcybertrust.org/joinus to register for an account today!
Date Posted April 2017
Towards Characterizing International Routing Detours
Graphics
Author Anant Shah, Romain Fontugne, Christos Papadopoulos
Datasets Used PCH - IXP Member Lists
Abstract There are currently no requirements (technical or otherwise) that BGP paths must be contained within national boundaries. Indeed, some paths experience international detours, i.e., originate in one country, cross international boundaries and return to the same country. In most cases these are sensible traffic engineering or peering decisions at ISPs that serve multiple countries. In some cases such detours may be suspicious. Characterizing international detours is useful to a number of players: (a) network engineers trying to diagnose persistent problems, (b) policy makers aiming at adhering to certain national communication policies, (c) entrepreneurs looking for opportunities to deploy new networks, or (d) privacy-conscious states trying to minimize the amount of internal communication traversing different jurisdictions. In this paper we characterize international detours in the Internet during the month of January 2016. To detect detours we sample BGP RIBs every 8 hours from 461 RouteViews and RIPE RIS peers spanning 30 countries. We then geolocate visible ASes by geolocating each BGP prefix announced by each AS, mapping its presence at IXPs, and geolocating infrastructure IPs. Finally, we analyze each global BGP RIB entry looking for detours. Our analysis shows more than 5K unique BGP prefixes experienced a detour. A few ASes cause most detours and a small fraction of prefixes were affected the most. We observe about 544K detours. Detours either last for a few days or persist the entire month. Out of all the detours, more than 90% were transient detours that lasted for 72 hours or less. We also show different countries experience different characteristics of detours.
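The detour definition in the abstract (a path that originates in one country, crosses a border, and returns to the same country) can be illustrated with a minimal sketch. It assumes each AS on a path has already been mapped to a single country code, which simplifies the geolocation step the paper describes; the function name `find_detour` and the input format are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def find_detour(countries: List[str]) -> Optional[Tuple[str, List[str]]]:
    """Check a per-hop country sequence for an international detour.

    `countries` lists the (assumed) country of each AS on the path, starting
    at the origin. Returns (origin_country, foreign_countries_visited) if the
    path leaves the origin country and later re-enters it, otherwise None.
    """
    if len(countries) < 3:
        return None  # a detour needs at least origin -> foreign -> origin
    origin = countries[0]
    left_origin = False
    foreign: List[str] = []
    for country in countries[1:]:
        if country != origin:
            left_origin = True
            if country not in foreign:
                foreign.append(country)
        elif left_origin:
            # Back in the origin country after having left it: a detour.
            return origin, foreign
    return None
```

For example, a path mapped to `["BR", "US", "BR"]` is reported as a detour through the US, while `["BR", "BR", "US"]` simply leaves the country and is not.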
Links https://arxiv.org/pdf/1606.05047.pdf
Related Publications N/A
Date Posted April 2017
Dataset Spotlight:
Wisconsin's Web Cookies Data
Graphics
Description Wisconsin's Web Cookies Data.
The cookies in this data set were gathered from crawls of the top 100K Alexa web sites conducted in November, 2013 and April, 2015. Due to page request timeouts, our crawler successfully visited 95,220 (95,311) web sites. Note that the set of web sites that caused a timeout is likely an artifact of our crawler; however, even among the top 100K Alexa web sites, downtime is not uncommon. The data set is described in detail in the paper "An Empirical Study of Web Cookies", by Cahn et al., which appeared in WWW '16.
Date Posted 1/23/2017
Dataset Spotlight:
Merit's Network Scanners from Darknet
Graphics
Description Merit's Network Scanners.
Summary of scanning activities observed at Merit's darknet monitor.
The dataset includes:
1) the IP addresses that have been observed in the darknet monitor to perform scanning,
2) the amount of unsolicited packets that these IPs transmitted.

This dataset is updated daily. A scanner is defined as an IP that has transmitted more than X amount of packets into Merit's NET35 darknet space (X is a configurable threshold currently set to 1000 packets).

REMARKS:
1) this dataset may also include backscatter activity or other types of traffic associated with network misconfiguration or other user errors. In other words, the fact that an origin IP has sent more than 1000 packets does not necessarily mean it is performing scanning,
2) this dataset does not differentiate between malicious and benign scanning (e.g., for research purposes).

Researchers interested in compiling their own list of scanners, using their own definition of scanning, may request access to Merit's darknet dataset ("Longitudinal Darknet 35/8").
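The threshold rule described above (an IP counts as a scanner once it has sent more than X packets, with X currently 1000) can be sketched as follows. This is an illustration only: it assumes the input is simply one source IP per observed packet, and the function name `find_scanners` is hypothetical rather than part of Merit's tooling.

```python
from collections import Counter
from typing import Dict, Iterable


def find_scanners(source_ips: Iterable[str], threshold: int = 1000) -> Dict[str, int]:
    """Return {ip: packet_count} for IPs that sent MORE than `threshold` packets.

    `source_ips` is assumed to yield one source address per darknet packet.
    As the remarks above note, exceeding the threshold does not prove the
    host is scanning; backscatter and misconfiguration also appear here.
    """
    counts = Counter(source_ips)
    return {ip: n for ip, n in counts.items() if n > threshold}
```

Researchers applying their own definition of scanning could change `threshold` or add further filters, for instance on destination ports.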
Date Posted 1/23/2017
Dataset Spotlight:
Merit's Network Telescope Data from 35/8 Block
Graphics
Description This dataset captures unsolicited traffic, as recorded by Merit's 35/8 darknet. Darknet data (also known as Internet background radiation) consist of Internet packets that arrive at unused Internet address space, and are therefore usually associated with malicious activities such as horizontal Internet scanning, malware propagation, and "backscatter" from distributed denial of service (DDoS) attacks. This dataset is provided in the CAIDA Corsaro "flowtuple" format (see https://www.caida.org/tools/measurement/corsaro/). Researchers can use this dataset to identify scanners, check for evidence of DDoS activity (e.g., in the case of a TCP SYN flood attack with a spoofed source IP, the victim will reply with a TCP SYN-ACK to the spoofed IP; if the spoofed IP happens to be within the 35/8 address space, our darknet will capture the SYN-ACK replies) or extract other nefarious activities and network misconfigurations.
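The SYN-ACK backscatter check mentioned in the description can be sketched as below. The `Flow` record here is a simplified, hypothetical stand-in for a parsed flowtuple entry and is not the actual Corsaro data structure; only the TCP protocol number (6) and the SYN/ACK flag bits are standard values.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple

TCP = 6                  # IANA protocol number for TCP
SYN, ACK = 0x02, 0x10    # TCP header flag bits


class Flow(NamedTuple):
    """Hypothetical, simplified view of one darknet flow record."""
    src_ip: str
    dst_ip: str
    protocol: int
    tcp_flags: int
    packets: int


def synack_backscatter(flows: Iterable[Flow]) -> Dict[str, int]:
    """Count SYN-ACK packets per source IP.

    A host sending SYN-ACKs into unused address space is likely replying to
    spoofed SYNs, i.e. it may be the victim of a SYN flood.
    """
    victims: Dict[str, int] = defaultdict(int)
    for flow in flows:
        if flow.protocol == TCP and (flow.tcp_flags & 0x3F) == (SYN | ACK):
            victims[flow.src_ip] += flow.packets
    return dict(victims)
```

Sources with high counts are candidate DDoS victims; a plain SYN (flags 0x02) is excluded because it more likely indicates scanning.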
Date Posted 1/23/2017
New Feature:
The IMPACT Forum is LIVE!!!
Graphics
Description The IMPACT Forum is LIVE!!! This feature allows you to engage with the collective IMPACT community to ask questions about what data will assist your R&D, suggest new data to suit your needs, and in general take part in the collective analysis and knowledge building.
Date Posted 1/23/2017
Partner Spotlight:
Our Dutch partners at dcypher
Graphics
Description Interested in learning more about what other countries are doing in the ICT space? Check out what our Dutch partners are doing over at dcypher.

The Dutch cybersecurity platform for higher education and research was founded by the Dutch Ministries of Security & Justice, Economic Affairs, Education Culture & Science and The Netherlands Organisation for Scientific Research (NWO), department Physical Sciences (EW). dcypher is the answer to the objective of the National Cyber Security Strategy (NCSS2):

"The Netherlands has sufficient cyber security knowledge and skills and invests in ICT innovation to attain cyber security objectives". dcypher unites researchers, teachers, manufacturers, users and policy makers to increase knowledge and expertise in the Netherlands in the area of cybersecurity.

Partner dcypher
Date Posted 1/23/2017
A Techno-Economic Framework for Broadband Deployment in Underserved Areas
Graphics
Author Paul Barford, Ramakrishnan Durairajan
Datasets Used Internet Atlas
Abstract A large body of economic research has shown the strong correlation between broadband connectivity and economic productivity (e.g., [1-3]). These findings motivate government agencies such as the FCC in the US to provide incentives to services providers to deploy broadband infrastructure in unserved or underserved areas. In this paper, we describe a framework for identifying target areas for network infrastructure deployment. Our approach considers (i) infrastructure availability, (ii) user demographics, and (iii) deployment costs. We use multi-objective optimization to identify geographic areas that have the highest concentrations of un/underserved users and that can be upgraded at the lowest cost. To demonstrate the efficacy of our framework, we consider physical infrastructure and demographic data from the US and two different deployment cost models. Our results identify a list of counties that would be attractive targets for broadband deployment from both cost and impact perspectives. We conclude with discussion on the implications and broader applications of our framework.
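To illustrate the trade-off the abstract describes (many un/underserved users versus low deployment cost), the sketch below applies a generic Pareto filter to candidate areas. This is not the paper's optimization method or cost model; the tuple layout, the function name `pareto_front`, and the sample figures are assumptions made purely for illustration.

```python
from typing import List, Tuple

# (area name, underserved users, deployment cost); all values hypothetical
Area = Tuple[str, int, float]


def pareto_front(areas: List[Area]) -> List[str]:
    """Return names of areas not dominated by any other area.

    Area X dominates area Y if X has at least as many underserved users and
    at most the same cost, and is strictly better on at least one of the two.
    """
    front = []
    for name, users, cost in areas:
        dominated = any(
            (u2 >= users and c2 <= cost) and (u2 > users or c2 < cost)
            for _, u2, c2 in areas
        )
        if not dominated:
            front.append(name)
    return front


candidates = [("A", 5000, 2.0), ("B", 3000, 1.0), ("C", 2500, 1.5), ("D", 5000, 3.0)]
attractive = pareto_front(candidates)
```

Here area C is dominated by B (more users, lower cost) and D by A (same users, lower cost), so only A and B remain as attractive targets.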
Links http://pages.cs.wisc.edu/~pb/gaia16_final.pdf
Related Publications R. W. Crandall, W. Lehr, and R. E. Litan, The Effects of Broadband Deployment on Output and Employment: A Cross-sectional Analysis of US Data. Brookings Institution, 2007.

R. Katz, "The Impact of Broadband on the Economy: Research to Date and Policy Issues," Broadband Series, 2012.

N. Czernich, O. Falck, T. Kretschmer, and L. Woessmann, "Broadband Infrastructure and Economic Growth," The Economic Journal, 2011.

"Mapping the Digital Divide." https://www.whitehouse.gov/sites/default/files/wh_digital_divide_issue_brief.pdf.

D. J. Aron and D. E. Burnstein, "Broadband Adoption in the United States: An empirical analysis," Down to the Wire: Studies in the Diffusion and Regulation of Telecommunications Technologies, Allan L. Shampine, ed, 2003.

A. Rickert, A. Sacharow et al., "It's a Woman's World Wide Web," Media Metrix and Jupiter Communications, 2000.

L. Rainie, S. Fox, J. Horrigan, A. Lenhart, and T. Spooner, "Tracking Online Life: How Women Use the Internet to Cultivate Relationships with Family and Friends," Washington, DC: The Pew Internet and American Life Project, 2000.

I. Austen, "Studies Reveal a Rush of Older Women to the Web," The New York Times, 2000.

G. L. Rohde, R. Shapiro et al., "Falling Through the Net: Toward Digital
Date Posted August 2016
Do You See Me Now? Sparsity in Passive Observations of Address Liveness (extended)
Graphics
Author Jelena Mirkovic, Genevieve Bartlett, John Heidemann, Hao Shi, Xiyue Deng
Datasets Used usc_lander_ongoing_tracing_scrambled-20120501
Abstract Full allocation of IPv4 addresses has prompted interest in measuring address liveness, first with active probing, and recently with the addition of passive observation. While prior work has investigated how to increase coverage by combining multiple sources, this paper explores what factors affect a passive observer's view. All passive monitors are sparse, seeing only a part of the Internet. We seek to understand how different types of sparsity impact observation quality: the interests of external hosts and the hosts within the observed network, the temporal limitations on the observation duration, and coverage challenges to observe all traffic for a given target or a given vantage point. We study sparsity through inverted analysis, a new approach where we use passive observations at three end networks to infer what of these networks would be seen by virtual monitors, located at all traffic destinations. We show that visibility provided by monitors is heavy-tailed: interest sparsity means popular monitors see a great deal, while 99% see very little. We find that traffic is mostly bipartite, with greater visibility between client-networks and server-networks, than within each group. Finally, we find that popular monitors are robust to temporal and coverage sparsity, but these sparsities greatly reduce the power of monitors with initially low visibility.
Links http://www.isi.edu/publications/trpublic/files/tr-710.pdf
Related Publications

Bartlett, G., Heidemann, J., and Papadopoulos, C. Understanding passive and active service discovery. In Proc. of ACM IMC (San Diego, California, USA, Oct. 2007), ACM, pp. 57-70.

Dainotti, A., Benson, K., King, A., kc claffy, Glatz, E., Dimitropoulos, X., Richter, P., Finamore, A., and Snoeren, A. C. Lost in space: Improving inference of ipv4 address space utilization. CoRR abs/1410.6858 (2014).

Dainotti, A., Benson, K., King, A., kc claffy, Kallitsis, M., Glatz, E., and Dimitropoulos, X. Estimating Internet address space usage through passive measurements. ACM Computer Communication Review 44, 1 (Jan. 2014), 42-49.

Zander, S., Andrew, L. L., and Armitage, G. Capturing ghosts: predicting the used ipv4 space by inferring unobserved addresses. In Proceedings of the 2014 Conference on Internet Measurement Conference (2014), ACM, pp. 319-332.
Date Posted July 2016
An Empirical Study of Web Cookies
Graphics
Author A Cahn, S. Alfeld, P. Barford, S. Muthukrishnan
Datasets Used
Datasets Produced Web cookie data (forthcoming in IMPACT 11/2016)
Abstract Web cookies are used widely by publishers and 3rd parties to track users and their behaviors. Despite the ubiquitous use of cookies, there is little prior work on their characteristics such as standard attributes, placement policies, and the knowledge that can be amassed via 3rd party cookies. In this paper, we present an empirical study of web cookie characteristics, placement practices and information transmission. To conduct this study, we implemented a lightweight web crawler that tracks and stores the cookies as it navigates to websites. We use this crawler to collect over 3.2M cookies from two crawls, separated by 18 months, of the top 100K Alexa web sites. We report on the general cookie characteristics and add context via a cookie category index and website genre labels. We consider privacy implications by examining specific cookie attributes and placement behavior of 3rd party cookies. We find that 3rd party cookies outnumber 1st party cookies by a factor of two, and we illuminate the connection between domain genres and cookie attributes. We find that less than 1% of the entities that place cookies can aggregate information across 75% of web sites. Finally, we consider the issue of information transmission and aggregation by domains via 3rd party cookies. We develop a mathematical framework to quantify user information leakage for a broad class of users, and present findings using real world domains. In particular, we demonstrate the interplay between a domain's footprint across the Internet and the browsing behavior of users, which has significant impact on information transmission.
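The 1st party versus 3rd party distinction that the abstract relies on can be illustrated with a deliberately naive sketch. It treats a cookie as 1st party when the visited host equals the cookie's Domain attribute or is a subdomain of it; the paper's crawler may classify cookies differently, and a careful implementation would compare registrable domains using the Public Suffix List. The names `is_third_party` and `third_party_share` are hypothetical.

```python
from typing import Iterable, Tuple


def is_third_party(page_host: str, cookie_domain: str) -> bool:
    """Naive check: is the cookie's domain unrelated to the visited host?"""
    page = page_host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")  # Domain attributes may start with "."
    same_site = page == domain or page.endswith("." + domain)
    return not same_site


def third_party_share(observations: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (page_host, cookie_domain) pairs that are 3rd party."""
    pairs = list(observations)
    if not pairs:
        return 0.0
    return sum(is_third_party(p, d) for p, d in pairs) / len(pairs)
```

With this rule, a cookie scoped to `.example.com` seen on `www.example.com` is 1st party, while one scoped to `tracker.net` on the same page is 3rd party.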
Links http://pages.cs.wisc.edu/~pb/www16_final.pdf
Related Publications A. Barth. RFC 6265: HTTP State Management Mechanism, April 2011.

Andrew F. Tappenden and James Miller. Cookies: A Deployment Study and the Testing Implications. ACM Transactions on the Web, 3(3):9:1-9:49, July 2009.

Ashkan Soltani, Shannon Canty, Quentin Mayo, Lauren Thomas, and Chris Jay Hoofnagle. Flash Cookies and Privacy. 2009.

Balachander Krishnamurthy and Craig Wills. Privacy Diffusion on the Web: A Longitudinal Perspective. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 541-550, New York, NY, USA, 2009. ACM.

B. Krishnamurthy and C. Wills. Generating a Privacy Footprint on the Internet. In Proceedings of the ACM Internet Measurement Conference, Rio de Janeiro, Brazil, October 2006.

B. Krishnamurthy and C. Wills. Characterizing Privacy in Online Social Networks. In Proceedings of the ACM SIGCOMM Workshop on Online Social Networks, Seattle, WA, August 2008.

B. Krishnamurthy and C. Wills. Privacy Leakage in Mobile Online Social Networks. In Proceedings of the USENIX Workshop on Online Social Networks, Boston, MA, June 2010.

B. Krishnamurthy, D. Malandrino, and C. Wills. Measuring Privacy Loss and the Impact of Privacy Protection in Web Browsing. In Proceedings of the Symposium on Usable Privacy and Security, Pittsburgh, PA, July 2007.

D. Kristol and L. Montulli. RFC 2109: HTTP State Management Mechanism, February 1997.

D. Kristol and L. Montulli. RFC 2965: HTTP State Management Mechanism, October 2000.

D. Malandrino, L. Serra, A. Petta, V. Scarano, R. Spinelli, and B. Krishnamurthy. Privacy Awareness about Information Leakage: Who knows what about me? In Proceedings of the Workshop on Privacy in the Electronic Society, Berlin, Germany, November 2013.

Franziska Roesner, Tadayoshi Kohno, and David Wetherall. Detecting and Defending Against Third-party Tracking on the Web. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, NSDI'12, pages 12-12, Berkeley, CA, USA, 2012. USENIX Association.

J. Mayer and J. Mitchell. Third-Party Web Tracking: Policy and Technology. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, May 2012.

K. Borders and A. Prakash. Towards Quantification of Network-based Information Leaks via HTTP. In Proceedings of the Third USENIX Workshop on Hot Topics in Security (HotSEC), San Jose, CA, May 2008.

Rodica Tirtea. Bittersweet cookies some security and privacy considerations. Heraklion, 2011.

Steven Englehardt, Dillon Reisman, Christian Eubank, Peter Zimmerman, Jonathan Mayer, Arvind Narayanan, and Edward W Felten. Cookies that give you away: The surveillance implications of web tracking. In Proceedings of the 24th International Conference on World Wide Web, pages 289-299. International World Wide Web Conferences Steering Committee, 2015.
Date Posted 4/1/2016
Anycast Latency: How Many Sites Are Enough?
Graphics
Author R. de O. Schmidt, J. Heidemann, and J. Harm Kuipers
Datasets Used Skitter Internet Topology Data Kits (ITDKs)
Abstract Anycast is widely used today to provide important services including naming and content, with DNS and Content Delivery Networks (CDNs). An anycast service uses multiple sites to provide high availability, capacity and redundancy, with BGP routing associating users to nearby anycast sites. Routing defines the catchment of the users that each site serves. Although prior work has studied how users associate with anycast services informally, in this paper we examine the key question of how many anycast sites are needed to provide good latency, and the worst case latencies that specific deployments see. To answer this question, we must first define the optimal performance that is possible, then explore how routing, specific anycast policies, and site location affect performance. We develop a new method capable of determining optimal performance and use it to study four real-world anycast services operated by different organizations: C-, F-, K-, and L-Root, each part of the Root DNS service. We measure their performance from more than 7,900 worldwide vantage points (VPs) in RIPE Atlas. (Given the VPs' uneven geographic distribution, we evaluate and control for potential bias.) Key results of our study are to show that a few sites can provide performance nearly as good as many, and that geographic location and good connectivity have a far stronger effect on latency than having many nodes. We show how often users see the closest anycast site, and how strongly routing policy affects site selection.
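One simple way to read "optimal performance" in the abstract is to compare the latency of the site that routing actually selects for a vantage point with the lowest latency any site could offer it. The sketch below follows that reading; it is an assumption for illustration and not the paper's measurement method, and the function names and per-VP data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def latency_penalty(rtts_ms: Dict[str, float], selected_site: str) -> float:
    """Extra RTT (ms) a vantage point pays versus its lowest-latency site.

    `rtts_ms` maps each anycast site to the RTT measured from this VP;
    `selected_site` is the site that BGP routing actually delivers it to.
    """
    optimal = min(rtts_ms.values())
    return rtts_ms[selected_site] - optimal


def share_at_closest(vps: List[Tuple[Dict[str, float], str]]) -> float:
    """Fraction of vantage points whose selected site is also their closest."""
    if not vps:
        return 0.0
    hits = sum(1 for rtts, site in vps if latency_penalty(rtts, site) == 0)
    return hits / len(vps)
```

A VP with RTTs `{"AMS": 12.0, "LAX": 150.0}` that is routed to LAX pays a 138 ms penalty, which is the kind of routing-induced gap the study quantifies.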
Links http://www.isi.edu/~johnh/PAPERS/Schmidt16a.pdf
Related Publications Adhikari, V. K., Jain, S., Chen, Y., and Zhang, Z.-L. Vivisecting YouTube: An Active Measurement Study. In Proceedings of the IEEE INFOCOM (2012), pp. 2521-2525.

Akhtar, Z., Hussain, A., Katz-Bassett, E., and Govindan, R. DBit: Assessing Statistically Significant Differences in CDN Performance. In Proceedings of the IFIP Traffic Monitoring and Analysis (TMA) (2016).

Ballani, H., and Francis, P. Towards a Global IP Anycast Service. In Proceedings of the ACM SIGCOMM (2007), pp. 301-312.

Ballani, H., Francis, P., and Ratnasamy, S. A Measurement-based Deployment Proposal for IP Anycast. In Proceedings of the ACM Internet Measurement Conference (2006), IMC, pp. 231-244.

Bellis, R. Researching F-root Anycast Placement Using RIPE Atlas. https://labs.ripe.net/, 2015.

Boothe, P., and Bush, R. Anycast Measurements Used to Highlight Routing Instabilities. NANOG 34, 2005.

Brownlee, N., kc claffy, and Nemeth, E. DNS Root/gTLD Performance Measurement. In Proceedings of the USENIX LISA conference (2001), pp. 241-255.

Brownlee, N., and Ziedins, I. Response Time Distributions for Global Name Servers. In Proceedings of the International conference on Passive and Active Measurements (2002), PAM.

Bush, R. DNS Anycast Stability: Some Initial Results. CAIDA/WIDE Workshop, 2005.

CAIDA. Skitter. http://www.caida.org/tools/measurement/skitter/.

Calder, M., Fan, X., Hu, Z., Katz-Bassett, E., Heidemann, J., and Govindan, R. Mapping the Expansion of Google's Serving Infrastructure. In Proceedings of the ACM Internet Measurement Conference (2013), IMC, pp. 313-326.

Calder, M., Flavel, A., Katz-Bassett, E., Mahajan, R., and Padhye, J. Analyzing the Performance of an Anycast CDN. In Proceedings of the ACM Internet Measurement Conference (2015), IMC, pp. 531-537.

Castro, S., Wessels, D., Fomenkov, M., and Claffy, K. A Day at the Root of the Internet. ACM Computer Communication Review 38, 5 (2008), 41-46.

Cicalese, D., Auge, J., Joumblatt, D., Friedman, T., and Rossi, D. Characterizing IPv4 Anycast Adoption and Deployment. In Proceedings of the ACM CoNEXT (2015).

Cicalese, D., Joumblatt, D., Rossi, D., Buob, M.-O., Auge, J., and Friedman, T. A Fistful of Pings: Accurate and Lightweight Anycast Enummeration and Geolocation. In Proceedings of the IEEE INFOCOM (2015), pp. 2776{2784.

Colitti, L. Eect of anycast on K-root. 1st DNS-OARC Workshop, 2005.

Fan, X., Katz-Bassett, E., and Heidemann, J. Assessing Affinity Between Users and CDN Sites. In Proceedings of the 7th IEEE International Workshop on Traffic Monitoring and Analysis (2015), TMA, pp. 95-110.

Giordano, D., Cicalese, D., Finamore, A., Mellia, M., Munafo, M., Joumblatt, D. Z., and Rossi, D. A First Characterization of Anycast Traffic from Passive Traces. In Proceedings of the IFIP Traffic Monitoring and Analysis Workshop (TMA) (2016).

Lee, B.-S., Tan, Y. S., Sekiya, Y., Narishige, A., and Date, S. Availability and Effectiveness of Root DNS servers: A long term study. In Proceedings of the IEEE Network Operations and Management Symposium (2010), NOMS, pp. 862-865.

Lee, T., Huffaker, B., Fomenkov, M., and kc claffy. On the problem of optimization of DNS root servers' placement. In Proceedings of the International conference on Passive and Active Measurements (2003), PAM.

Liang, J., Jiang, J., Duan, H., Li, K., and Wu, J. Measuring Query Latency of Top Level DNS Servers. In Proceedings of the 14th International conference on Passive and Active Measurements (2013), PAM, pp. 145-154.

Liu, Z., Huffaker, B., Fomenkov, M., Brownlee, N., and kc claffy. Two Days in the Life of the DNS Anycast Root Servers. In Proceedings of the 8th International conference on Passive and Active Measurements (2007), PAM, pp. 125-134.

Pang, J., Hendricks, J., Akella, A., Prisco, R. D., Maggs, B., and Seshan, S. Availability, Usage, and Deployment Characteristics of the Domain Name Server. In Proceedings of the ACM Internet Measurement Conference (2004), IMC, pp. 1-14.

Sarat, S., Pappas, V., and Terzis, A. On the use of Anycast in DNS. In Proceedings of the 15th International Conference on Computer Communications and Networks (2006), ICCCN, pp. 71-78.

Streibelt, F., Bottger, J., Chatzis, N., Smaragdakis, G., and Feldman, A. Exploring EDNS-Client-Subnet Adopters in your Free Time. In Proceedings of the ACM Internet Measurement Conference (2013), IMC, pp. 305-312.

Torres, R., Finamore, A., Kim, J. R., Mellia, M., Munafo, M. M., and Rao, S. Dissecting Video Server Selection Strategies in the YouTube CDN. In Proceedings of the 31st International Conference on Distributed Computing Systems (2011), ICDCS, pp. 248-257.
Date Posted 5/1/2016
Lost in Space: Improving Inference of IPv4 Address Space Utilization
Graphics
Author A. Dainotti, K. Benson, A. King, B. Huffaker, E. Glatz, X. Dimitropoulos, P. Richter, A. Finamore, and A. Snoeren
Datasets Used IPv6 Topology; IPv4 Routed /24 DNS Names
IPv4 Routed /24 Topology
internet_address_census_it49c-20120731
Abstract One challenge in understanding the evolution of Internet infrastructure is the lack of systematic mechanisms for monitoring the extent to which allocated IP addresses are actually used. In this paper we try to advance the science of inferring IPv4 address space utilization by analyzing and correlating results obtained through different types of measurements. We have previously studied an approach based on passive measurements that can reveal used portions of the address space unseen by active approaches. In this paper, we study such passive approaches in detail, extending our methodology to four different types of vantage points, identifying traffic components that most significantly contribute to discovering used IPv4 network blocks. We then combine the results we obtained through passive measurements together with data from active measurement studies, as well as measurements from BGP and additional datasets available to researchers. Through the analysis of this large collection of heterogeneous datasets, we substantially improve the state of the art in terms of: (i) understanding the challenges and opportunities in using passive and active techniques to study address utilization; and (ii) knowledge of the utilization of the IPv4 space.
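To make the idea of combining active and passive observations concrete, here is a minimal Python sketch that maps observed addresses to /24 blocks and reports which blocks each technique saw. The function names and the sample addresses (taken from documentation ranges) are assumptions for illustration; the paper's actual inference pipeline involves filtering and validation steps that are not shown.

```python
import ipaddress


def to_slash24(addr):
    """Map an IPv4 address string to its enclosing /24 network."""
    return ipaddress.ip_network(f"{addr}/24", strict=False)


def used_blocks(active_addrs, passive_addrs):
    """Return the /24 blocks seen by each technique and by both combined."""
    active = {to_slash24(a) for a in active_addrs}
    passive = {to_slash24(a) for a in passive_addrs}
    return {
        "active_only": active - passive,
        "passive_only": passive - active,
        "both": active & passive,
        "combined": active | passive,
    }


# Hypothetical observations: addresses that replied to probes (active)
# and addresses seen as traffic sources at a vantage point (passive).
active_seen = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
passive_seen = ["198.51.100.200", "203.0.113.9"]
result = used_blocks(active_seen, passive_seen)
```

The "passive_only" set corresponds to the blocks the abstract describes as unseen by active approaches.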
Links https://www.caida.org/publications/papers/2014/lost_in_space/lost_in_space.pdf
Related Publications A. Dainotti and A. King. CAIDA Blog: Carna botnet scans confirmed. http://blog.caida.org/best_available_data/2013/05/13/carna-botnet-scans/.

X. Cai and J. Heidemann. Understanding block-level address usage in the visible internet. In Proceedings of the ACM SIGCOMM 2010 Conference, SIGCOMM '10, pages 99-110, New York, NY, USA, 2010. ACM.

Geoff Huston. IPv4 Address Report. http://www.potaroo.net/tools/ipv4/.

Z. Durumeric, E. Wustrow, and J. A. Halderman. ZMap: Fast Internet-wide scanning and its security applications. In Proceedings of the 22nd USENIX Security Symposium, 2013.

X. Meng, Z. Xu, B. Zhang, G. Huston, S. Lu, and L. Zhang. IPv4 address allocation and the BGP routing table evolution. SIGCOMM Comput. Commun. Rev., 35(1):71-80, Jan. 2005.

S. Zander, L. L. H. Andrew, G. Armitage, and G. Huston. Estimating IPv4 Address Space Usage with Capture-recapture. In IEEE Workshop on Network Measurements (WNM 2013).

Geoff Huston. Delegated address space: extended report, 2013. http://bgp.potaroo.net/stats/nro/archive/delegated-nro-extended-20131001.

H.D. Moore. Project Sonar, 2008. https://community.rapid7.com/community/infosec/sonar/blog.

J. Heidemann, Y. Pradkin, R. Govindan, C. Papadopoulos, G. Bartlett, and J. Bannister. Census and survey of the visible Internet. In 8th ACM SIGCOMM conference on Internet measurement, IMC '08, 2008.

J. Horchert and C. Stocker. Mapping the internet: A hacker's secret internet census. Spiegel Online, March 2013.

G. Huston. IPv4: How long do we have? The Internet Protocol Journal, 6(4):2008-2010, 2003.

G. Huston. IPv4 address depletion and transition to IPv6. Internet Protocol Journal, 9(10):18-28, 2007.

G. Huston. The changing foundation of the internet: confronting IPv4 address exhaustion. The Internet Protocol Journal, 11(3):19-36, 2008.
Date Posted 6/1/2016
IXP Directory Summary
Graphics
Author Packet Clearing House
Datasets Used PCH - IXP Member Lists
Abstract PCH supports research and Internet operations through the development and hosting of analytical tools that utilize IMPACT-funded datasets. These tools analyze infrastructure and operational data and provide ongoing value to the global Internet operations community and the public. They include summary reports on IXP growth by country and region, IPv6 adoption at IXPs, and global root server locations.
Links https://prefix.pch.net/applications/ixpdir/summary/
Related Publications
Date Posted 10/19/2016
Dataset Spotlight:
Netalyzr
Graphics
Short Description Partially redacted Netalyzr data
Long Description Data collected by the Netalyzr tool of end user measurements. Public IP Subnets partially redacted for both client and other systems in the same subnet, GeoLite lookup performed before redaction. Session IDs replaced. Hostname redacted on reverse DNS but domain maintained. MAC addresses and similar data redacted. Some fields removed. UPnP detailed statistics removed, transcripts removed. Small-sample networks and .gov/.mil networks removed. Deanonymization and contacting participants prohibited.
Provider International Computer Science Institute (ICSI)
Status Approved
Category Performance and Quality Measurements
Sub Category Netalyzr Data
Commercial Allowed True
Restriction Type Restricted
Restriction Terms ********** The data use agreement for the selected dataset has not been finalized yet. You will be contacted with an updated MOA when it is ready. **********
Size in Bytes
Start Date 2011-09-01
End Date 2014-07-01
Dataset Spotlight:
Certificates Data
Graphics
Short Description Website X.509 certificate fields from the certificates
Long Description Certificate fields from the certificates of the Top 1 Million Websites (as listed by Alexa), Phishing Websites (as listed by PhishTank), and Bank websites (as listed by FDIC). In addition to certificates collected locally, we also have a PlanetLab observatory that is actively collecting the Top 1 Million websites' certificates from different continents. The dataset also includes information about the connections made to obtain the certificates, so it is possible to trace the date and the source of the connection used to collect a given certificate.
Provider Indiana University
Status Approved
Category Application Layer Security Data
Sub Category X509 Certificates
Commercial Allowed False
Restriction Type Unrestricted
Restriction Terms
Size in Bytes 450
Start Date 2012-12-01
End Date 2020-01-01
BotDigger: Detecting DGA Bots in a Single Network
Graphics
Author Han Zhang, Colorado State University
Manaf Gharaibeh, Colorado State University
Spiros Thanasoulas, Colorado State University
Christos Papadopoulos, Colorado State University
Datasets Used Botnet Dataset
Background Dataset
Abstract To improve the resiliency of communication between bots and C&C servers, bot masters began utilizing Domain Generation Algorithms (DGA) in recent years. Many systems have been introduced to detect DGA-based botnets. However, they suffer from several limitations, such as requiring DNS traffic collected across many networks, the presence of multiple bots from the same botnet, and so forth. These limitations make it very hard to detect individual bots when using traffic collected from a single network. In this paper, we introduce BotDigger, a system that detects DGA-based bots using DNS traffic without a priori knowledge of the domain generation algorithm. BotDigger utilizes a chain of evidence, including quantity, temporal and linguistic evidence, to detect an individual bot by only monitoring traffic at the DNS servers of a single network. We evaluate BotDigger's performance using traces from two DGA-based botnets: Kraken and Conficker. Our results show that BotDigger detects all the Kraken bots and 99.8% of Conficker bots. A one-week DNS trace captured from our university and three traces collected from our research lab are used to evaluate false positives. The results show that the false positive rates are 0.05% and 0.39% for these two groups of background traces, respectively.
Links http://www.cs.colostate.edu/~hanzhang/papers/BotDigger-techReport.pdf
Related Publications Antonakakis, M., Perdisci, R., Nadji, Y., Vasiloglou II, N., Abu-Nimeh, S., Lee, W., Dagon, D.: From throw-away traffic to bots: Detecting the rise of dga-based malware. In: USENIX security symposium (2012)

Antonakakis, M., Perdisci, R., Lee, W., Vasiloglou, II, N., Dagon, D.: Detecting malware domains at the upper dns hierarchy. In: Proceedings of the 20th USENIX Conference on Security (2011), http://dl.acm.org/citation.cfm?id=2028067.2028094

Bilge, L., Kirda, E., Kruegel, C., Balduzzi, M.: Exposure: Finding malicious domains using passive dns analysis. In: NDSS (2011)

Bär, A., Paciello, A., Romirer-Maierhofer, P.: Trapping botnets by dns failure graphs: Validation, extension and application to a 3g network. In: INFOCOM. pp. 3159-3164. IEEE (2013), http://dblp.uni-trier.de/db/conf/infocom/infocom2013.html#BarPR13

Manning, C.D., Raghavan, P., Schütze, H., et al.: Introduction to information retrieval, vol. 1. Cambridge University Press, Cambridge (2008)

Prieto, I., Magana, E., Morató, D., Izal, M.: Botnet detection based on dns records and active probing. In: Proceedings of the International Conference on Security and Cryptography (SECRYPT) (2011)

Schiavoni, S., Maggi, F., Cavallaro, L., Zanero, S.: Phoenix: Dga-based botnet tracking and intelligence. In: Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 192-211. Springer (2014)

Yadav, S., Reddy, A.K.K., Reddy, A., Ranjan, S.: Detecting algorithmically generated malicious domain names. In: Proceedings of the 10th ACM SIGCOMM conference on Internet measurement. pp. 48-61. ACM (2010)
Date Posted 1/21/2016
US Long-Haul Fiber Map
Graphics
Author Paul Barford, UW-Madison
Datasets Used US Long-haul Infrastructure Topology
Abstract Despite some 20 years of research efforts that have focused on understanding aspects of the Internet's infrastructure such as its router-level topology or AS-level connectivity, very little is known about today's physical Internet where individual components such as cell towers, routers or switches, and fiber-optic cables are concrete entities with well-defined geographic locations. This dataset provides a first-of-its-kind map of the US long-haul fiber infrastructure. The map is a composite of openly available maps of 20 US fiber providers and comprises 273 nodes and 542 conduits. Importantly, the details of the connectivity and shared use of conduits have been verified using public records on rights-of-way.

Details of the map and an evaluation of shared risk and opportunities to improve robustness and performance are described in our ACM SIGCOMM '15 paper entitled "InterTubes: A Study of the US Long-haul Fiber-optic Infrastructure".

The dataset that is made available through IMPACT includes the image of the map as well as a table that provides all details on connectivity represented in the map.

If you have an account and would like to request this dataset, follow these directions:
1) Login and click Browse Catalog
2) Click the Unrestricted radio button and wait for the page to refresh
3) Enter "US long haul map" and click Submit Search
4) Click on Request Selected Dataset button at the bottom of the page
Links Link to IA in data catalog
Related Publications
Date Posted
GT Malware Passive DNS Data Daily Feed
Graphics
Author Adam Allred
Datasets Used GT Malware Passive DNS Data Daily Feed
Abstract Each day, the Georgia Tech Information Security Center (GTISC) processes over 100,000 previously unseen, suspect Windows executable files. To derive network-level information that can help make the potential maliciousness of these files self-identifying, each executable is run in a sterile, isolated environment for a short period of time, with limited access to the Internet.

During processing, each executable's use of the Domain Name System (DNS) is recorded in both raw (packet capture) and simplified plaintext formats. The plaintext format, which contains a subset of the information present in the PCAP files, comprises a series of 4-tuples that provide the executable's MD5 hash, the date on which the executable was processed, the qname (domain name) of the DNS query, and (if the query was of type A) a resolution IP address for the domain name.
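For readers planning to consume the plaintext format, the following Python sketch shows one way a 4-tuple line might be parsed. The field delimiter (whitespace), the date format, the sample line, and the names DnsRecord and parse_record are all assumptions made for illustration; the feed's actual layout should be checked against its documentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DnsRecord:
    md5: str
    date: str           # kept as a string: the feed's date format is not specified here
    qname: str
    ip: Optional[str]   # present only when the query was of type A


def parse_record(line, sep=None):
    """Parse one plaintext line into a DnsRecord.

    Assumes fields are delimited by `sep` (default: any whitespace);
    the real feed may use a different delimiter.
    """
    fields = line.strip().split(sep)
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    md5, date, qname = fields[:3]
    ip = fields[3] if len(fields) == 4 else None
    # Normalize case and drop a trailing root dot so names compare equal.
    return DnsRecord(md5.lower(), date, qname.rstrip(".").lower(), ip)


# Hypothetical sample line (the hash is the MD5 of empty input).
sample = "D41D8CD98F00B204E9800998ECF8427E 2015-07-01 Example.COM. 192.0.2.1"
record = parse_record(sample)
```

Lines without a fourth field are treated as non-A queries and yield a record whose ip is None.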

As of July 2015, a daily feed comprising both raw and plaintext versions of the data is available through DHS IMPACT. In aggregate, this information represents a special kind of passive DNS database for suspect and known malicious software, which GTISC believes will be useful for a variety of research and operational purposes.

Once logged in, the GT Malware Passive DNS Data Daily Feed can be requested via the data catalog.
Links
Related Publications
Date Posted
Distributed Denial of Service (DDoS) attacks
Graphics
Author Michalis Kallitsis and Manish Karir, Merit Network, Inc.
Datasets Used DDoS NTP Dataset
Abstract Distributed Denial of Service (DDoS) attacks that rely on the Network Time Protocol (NTP) became prominent towards the end of December 2013. High attack volumes persisted for a few months in early 2014, until network operators started mitigating the problem and securing their networks. The NTP dataset provided by Merit Network captures the attacks at their peak level (reaching rates that exceed 300 MBps or, equivalently, 2.4 Gbps), as recorded by Merit's Netflow telescope. The dataset spans a period of 15 days, starting February 14, 2014.
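The equivalence between the two peak rates quoted above follows from multiplying bytes by eight and applying decimal (SI) prefixes. A short Python check, with a hypothetical helper name:

```python
BITS_PER_BYTE = 8


def megabytes_per_sec_to_gigabits_per_sec(mbytes_per_sec):
    """Convert megabytes per second to gigabits per second (SI prefixes)."""
    return mbytes_per_sec * BITS_PER_BYTE / 1000


# 300 MB/s * 8 bits/byte = 2400 Mb/s = 2.4 Gb/s
peak_gbps = megabytes_per_sec_to_gigabits_per_sec(300)
```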
Links
Related Publications Jakub Czyz, Michael Kallitsis, Manaf Gharaibeh, Christos Papadopoulos, Michael Bailey, and Manish Karir. "Taming the 800 Pound Gorilla: The Rise and Decline of NTP DDoS Attacks", to appear in Proceedings of the 14th ACM SIGCOMM Conference on Internet Measurement (IMC '14), Vancouver, BC, Canada, November 2014.
Date Posted
GT Malware Passive DNS Data
Graphics
Author Paul Royal, Georgia Tech
Datasets Used GT Malware Passive DNS Data
Abstract Each day, the Georgia Tech Information Security Center (GTISC) processes many thousands of previously unseen, suspect Windows executable files. To derive network-level information that can help make the possible maliciousness of these files self-identifying, each executable is run in a sterile, isolated environment for a short period of time, with limited access to the Internet.

Each executable's use of the Domain Name System (DNS) is recorded and used to create a 4-tuple comprising the executable's MD5 hash, the date on which the executable was processed, the qname (domain name) of the DNS query, and (if the query was of type A) a resolved IP address for the domain name.

Between 2011 and 2014, more than 8.5 million unique executables processed by GTISC made at least one use of the DNS. In aggregate, this information represents a special kind of passive DNS database for suspect and known malicious software, which GTISC believes will be useful for a variety of research and operational purposes.

The database is now available through IMPACT as an unrestricted dataset, and is labeled GT Malware Passive DNS Data.
Links
Related Publications
Date Posted
Internet Atlas
Graphics
Author Paul Barford, University of Wisconsin
Datasets Used
Abstract Internet Atlas is a visualization and analysis portal for diverse Internet measurement data. The starting point for Atlas is a geographically anchored representation of the physical Internet including (i) nodes (e.g., hosting facilities and data centers), (ii) conduits/links that connect these nodes, and (iii) relevant metadata (e.g., source provenance). This physical representation is built by using search to identify primary source data such as maps and other repositories of service provider network information. This data is then carefully entered into the database using a combination of manual and automated processes including consistency checks and methods for geocoding both node and link data. Data is added to the repository on an ongoing basis. The repository currently contains over 10K PoP locations and over 13K links for over 390 networks around the world. Customized interfaces enable a variety of dynamic (e.g., BGP updates, Twitter feeds and weather updates) and static (e.g., highway, rail and census) data to be imported into Atlas and layered on top of the physical representation. The openly available web portal is based on the widely used ArcGIS geographic information system, which enables visualization and diverse spatial analyses of the data.
Links http://atlas.wail.wisc.edu/
Related Publications R. Durairajan, S. Ghosh, X. Tang, P. Barford and B. Eriksson. "Internet Atlas: A Geographic Database of the Internet", In Proceedings of the 5th ACM HotPlanet Workshop, August, 2013. (paper)

B. Eriksson, R. Durairajan and P. Barford. "RiskRoute: A Framework for Mitigating Network Outage Threats", In Proceedings of ACM CoNEXT, December, 2013. (paper)
Date Posted
A Preliminary Analysis of Network Outages During Hurricane Sandy
Graphics
Author John Heidemann, Lin Quan and Yuri Pradkin, USC/Information Sciences Institute
Datasets Used internet_address_survey_reprobing_it50j
internet_outage_survey_it50j
Abstract Whether due to natural phenomena or man-made actions, it is important to detect and analyze Internet outages. We analyzed Internet outages during Hurricane Sandy in October 2012. We assessed network reliability by pinging a sample of networks and observing those that responded and then stopped responding. While there are always occasional network outages, we saw that the outage rate in U.S. networks doubled when the hurricane made landfall and then took about four days to recover. We confirmed that this increase was due to outages in New York and New Jersey.
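The approach of watching for networks that "responded and then stopped responding" can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' detection algorithm: the per-round boolean history, the min_silent threshold, the function names, and the sample blocks are all hypothetical.

```python
def outage_rounds(history, min_silent=2):
    """Return indices of probing rounds at which a block is judged down.

    history: list of booleans, one per round (True = some address replied).
    A block counts as down in a round only if it replied in some earlier
    round and has been silent for at least `min_silent` consecutive rounds.
    """
    down = []
    seen_up = False
    silent = 0
    for i, replied in enumerate(history):
        if replied:
            seen_up = True
            silent = 0
        else:
            silent += 1
            if seen_up and silent >= min_silent:
                down.append(i)
    return down


def outage_rate(histories, round_index, min_silent=2):
    """Fraction of blocks judged down at a given round."""
    down = sum(
        round_index in outage_rounds(h, min_silent) for h in histories.values()
    )
    return down / len(histories)


# Made-up probe histories for three blocks (documentation address ranges).
probes = {
    "192.0.2.0/24": [True, True, False, False, False, True],
    "198.51.100.0/24": [True, True, True, True, True, True],
    "203.0.113.0/24": [False, False, False, False, False, False],
}
rate_at_round_4 = outage_rate(probes, 4)
```

Blocks that never respond are excluded from the outage count, since silence alone cannot distinguish an outage from an unused or firewalled network.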
Links Tech report: http://www.isi.edu/~johnh/PAPERS/Heidemann12d.pdf

Video from FCC presentation: http://www.isi.edu/ant/outage/fcc2013.html

Presentation at NANOG 57: http://www.nanog.org/meetings/abstract?id=2051
Related Publications R. Durairajan, S. Ghosh, X. Tang, P. Barford and B. Eriksson. "Internet Atlas: A Geographic Database of the Internet", In Proceedings of the 5th ACM HotPlanet Workshop, August, 2013. (paper)

B. Eriksson, R. Durairajan and P. Barford. "RiskRoute: A Framework for Mitigating Network Outage Threats", In Proceedings of ACM CoNEXT, December, 2013. (paper)
Date Posted