This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

DS-0921
Tor-nonTor dataset
External Dataset
External Data Source
University of New Brunswick
Unknown
Unknown
53 (lowest rank is 53)

Category & Restrictions

Other
simulated attacks, malicious traffic, malware, human behavior
Unrestricted
Unknown

Description


Real-world Internet traffic data collected from browsing, email, chat, audio streaming, video streaming, VoIP, and P2P.

To ensure the quantity and diversity of this dataset at CIC, we defined a set of tasks to generate a representative dataset of real-world traffic. We created three users for the browser traffic collection and two users for the communication parts such as chat, mail, FTP, P2P, etc. For the non-Tor traffic we used previous benign traffic from the VPN project, and for the Tor traffic we used the following 8 traffic categories:

Browsing: Under this label we have HTTP and HTTPS traffic generated by users while browsing (Firefox and Chrome).

Email: Traffic samples generated using a Thunderbird client, and Alice and Bob Gmail accounts. The clients were configured to deliver mail through SMTP/S, and receive it using POP3/SSL in one client and IMAP/SSL in the other.

Chat: The chat label identifies instant-messaging applications. Under this label we have Facebook and Hangouts via web browser, Skype, and AIM and ICQ using an application called Pidgin.

Audio-Streaming: The audio-streaming label identifies audio applications that require a continuous and steady stream of data. We captured traffic from Spotify.

Video-Streaming: The video-streaming label identifies video applications that require a continuous and steady stream of data. We captured traffic from YouTube (HTML5 and Flash versions) and Vimeo services using Chrome and Firefox.

FTP: This label identifies traffic applications whose main purpose is to send or receive files and documents. For our dataset we captured Skype file transfers, FTP over SSH (SFTP) and FTP over SSL (FTPS) traffic sessions.

VoIP: The Voice over IP label groups all traffic generated by voice applications. Within this label we captured voice-calls using Facebook, Hangouts and Skype.

P2P: This label is used to identify file-sharing protocols like BitTorrent. To generate this traffic we downloaded different .torrent files from the Kali Linux distribution and captured traffic sessions using the Vuze application. We also used different combinations of upload and download speeds.

The traffic was captured using Wireshark and tcpdump, generating a total of 22 GB of data. To facilitate the labeling process, as explained in the related published paper, we captured the outgoing traffic at the workstation and the gateway simultaneously, collecting a set of pairs of .pcap files: one regular traffic .pcap file (workstation) and one Tor traffic .pcap file (gateway).

Later, we labelled the captured traffic in two steps. First, we processed the .pcap files captured at the workstation: we extracted the flows and confirmed that the majority of traffic flows were generated by application X (Skype, FTPS, etc.), the object of the traffic capture. Then, we labelled all flows from the Tor .pcap file as X.

Contact: cic@unb.ca; a.habibi.l@unb.ca
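
The two-step labelling described above can be sketched roughly as follows. This is a minimal illustration and not the authors' tooling: it assumes packets have already been parsed into plain dictionaries with the hypothetical keys src, sport, dst, dport and proto, that a flow is a direction-independent 5-tuple, and that the application behind a workstation flow can be guessed from a toy port table (PORT_APPS). The names extract_flows, classify_by_port and label_tor_flows are invented for this sketch.

```python
from collections import Counter, defaultdict

# Toy port-to-application table (an assumption for illustration only).
PORT_APPS = {990: "FTPS", 22: "SFTP"}


def flow_key(pkt):
    """Return a direction-independent 5-tuple for a parsed packet."""
    a = (pkt["src"], pkt["sport"])
    b = (pkt["dst"], pkt["dport"])
    lo, hi = sorted([a, b])  # same key for both directions of a flow
    return (lo, hi, pkt["proto"])


def extract_flows(packets):
    """Group packets into flows keyed by the 5-tuple."""
    flows = defaultdict(list)
    for pkt in packets:
        flows[flow_key(pkt)].append(pkt)
    return dict(flows)


def classify_by_port(key):
    """Guess the application of a flow from its ports (toy heuristic)."""
    for port in (key[0][1], key[1][1]):
        if port in PORT_APPS:
            return PORT_APPS[port]
    return "other"


def label_tor_flows(workstation_packets, tor_packets, expected_app,
                    classify=classify_by_port):
    """Label every Tor flow as expected_app if the workstation capture
    confirms that more than half of its flows belong to that application."""
    ws_flows = extract_flows(workstation_packets)
    if not ws_flows:
        raise ValueError("workstation capture contains no flows")

    # Step 1: confirm the target application dominates the workstation flows.
    counts = Counter(classify(key) for key in ws_flows)
    share = counts[expected_app] / len(ws_flows)
    if share <= 0.5:
        raise ValueError(
            f"{expected_app} accounts for only {share:.0%} of workstation flows"
        )

    # Step 2: label all flows seen at the gateway (Tor side) with that application.
    return {key: expected_app for key in extract_flows(tor_packets)}
```

The simultaneous capture is what makes the second step possible: Tor encrypts the traffic seen at the gateway, so the label has to be inherited from the paired workstation capture rather than read from the Tor flows themselves.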

Additional Details

N/A
false
Unknown
web scraping, internet mail protocols, microcomputers, macos file sharing software, dark web, upload, html5 audio, proxy server, firefox, classes of computers, clear text protocols, download websites, microsoft, bittorrent, streaming media, teleconferencing, transport layer security, wireshark, external data source, history of the internet, social networking websites, web standards, icq, internet protocol, communication, vuze, adware, windows file sharing software, file sharing, online services, uniform resource identifier, mozilla thunderbird, hypertext transfer protocol clients, computer network analysis, hypertext transfer protocol, internet privacy, internet security, workstation, voip software, internet traffic, instant messaging protocols, tor-nontor dataset, online chat, history of computing, communication software, secure shell, 921, ebay, voice over ip, gmail, youtube, file transfer protocol, web browser, voip services, network analyzers, internet message access protocol, communication protocol, cryptographic protocol, tcpdump, application layer protocols, skype, cryptographic software, streaming, tor, free network related software, pcap, browser, cloud infrastructure attacks failures, instant messaging, email, website, videotelephony, malware, simple mail transfer protocol, facebook, inferlink corporation, download com, computer file formats, distributed data storage, centralized computing, network architecture, secure communication, ssh communications security, torrent file, vimeo, virtual private network, computer mediated communication, unix internet software, media sharing, internet broadcasting, internet relay chat, aol