This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

DS-0938
Detecting Malicious URLs
External Dataset
External Data Source
University of California, San Diego
Unknown
Data collection is ongoing
56 (lowest rank is 56)

Category & Restrictions

Other
cyber crime
Unrestricted
Unknown

Description


Data used in machine learning experiments to detect malicious URLs.

The long-term goal of this research is to construct a real-time system that uses machine learning techniques to detect malicious URLs (spam, phishing, exploits, and so on). This dataset records attempts to use machine learning to detect malicious URLs. UCSD explored techniques that classify URLs based on their lexical and host-based features, and used online learning to process large numbers of examples and adapt quickly as URLs evolve over time. The dataset consists of about 2.4 million URLs (examples) and 3.2 million features. Contact: csestudent@eng.ucsd.edu
This dataset is the subject of ongoing measurement and data collection, so it is continuously growing. Researchers who are granted access will be able to download updates for a period of one year after their request.
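
The approach described above, classifying URLs from lexical features with an online learner, can be illustrated with a minimal sketch. This is not the UCSD implementation: the tokenization, the perceptron update rule, and every name in the code (lexical_features, OnlinePerceptron, train_stream, EXAMPLES) are assumptions chosen for illustration, and the example URLs are made up using reserved domains. Host-based features are omitted because they would require external lookups.

```python
import re


def lexical_features(url):
    """Return the set of lowercase alphanumeric tokens in a URL.

    This bag-of-tokens representation is an assumed, simplified stand-in
    for the lexical features mentioned in the record.
    """
    return {tok for tok in re.split(r"[^a-z0-9]+", url.lower()) if tok}


class OnlinePerceptron:
    """Mistake-driven linear classifier with sparse weights.

    Labels are +1 (malicious) and -1 (benign). Weights are stored only
    for tokens that have been seen, so memory grows with the data.
    """

    def __init__(self):
        self.weights = {}
        self.bias = 0.0

    def score(self, features):
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        return 1 if self.score(features) > 0 else -1

    def update(self, features, label):
        """Predict first, then adjust weights only if the prediction was wrong."""
        prediction = self.predict(features)
        if prediction != label:
            for f in features:
                self.weights[f] = self.weights.get(f, 0.0) + label
            self.bias += label
        return prediction


def train_stream(model, stream):
    """Process (url, label) pairs one at a time and count mistakes."""
    mistakes = 0
    for url, label in stream:
        if model.update(lexical_features(url), label) != label:
            mistakes += 1
    return mistakes


# Made-up examples for illustration only; not drawn from the dataset.
EXAMPLES = [
    ("http://login.example-bank.test/verify/account", 1),
    ("http://www.example.org/docs/index.html", -1),
    ("http://free-prizes.example.test/verify/login", 1),
    ("http://news.example.org/sports/index.html", -1),
]
```

Because the learner sees each example once and updates in place, it can keep running as new URLs arrive, which is the property the description attributes to online learning. Calling train_stream(OnlinePerceptron(), EXAMPLES) processes the toy stream in a single pass.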

Additional Details

N/A
Size is growing as more data is collected
false
true
urls, malicious, detecting, 938, detecting malicious urls, inferlink, inferlink corporation, corporation, external data source, source, external, learning, machine, detect, experiments, based, techniques, ucsd, time, features, examples, system, dataset, explored, evolving, online, consists, spam, classifying, involve, real, exploits, other, host, term, attempts, csestudent, phishing, lexical, process, goal, adapt, eng, construct