This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.


Detecting Malicious URLs
External Dataset
External Data Source
University of California, San Diego
Data collection is ongoing
56 (lowest rank is 56)

Category & Restrictions

cyber crime


Data used in machine learning experiments to detect malicious URLs.

The long-term goal of this research is to construct a real-time system that uses machine learning techniques to detect malicious URLs (spam, phishing, exploits, and so on). This dataset comes from experiments that apply machine learning to that detection task. UCSD explored techniques that classify URLs based on their lexical and host-based features, and used online learning to process large numbers of examples and adapt quickly as URLs evolve over time. The dataset consists of about 2.4 million URLs (examples) and 3.2 million features.
This dataset is the subject of ongoing measurement and data collection, so it is continuously growing. Researchers who are granted access will be able to download updates for a period of one year after their request.
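
To illustrate the general idea of classifying URLs from lexical features with an online learner, here is a minimal Python sketch. It is not the UCSD implementation and does not read this dataset's file format. The function `lexical_features`, the class `OnlinePerceptron`, and the four toy URLs are hypothetical names and data chosen for illustration. Host-based features (for example, registration or DNS properties) are omitted, and a plain perceptron stands in for whichever online algorithms the original experiments used.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Map a URL to a sparse dict of binary bag-of-words lexical features."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    feats = {"bias": 1.0}
    # Tokens of the hostname, split on dots and hyphens.
    for tok in re.split(r"[.\-]", parsed.hostname or ""):
        if tok:
            feats["host=" + tok] = 1.0
    # Tokens of the path and query, split on common URL delimiters.
    for tok in re.split(r"[/?.=&\-_]", parsed.path + "?" + parsed.query):
        if tok:
            feats["path=" + tok] = 1.0
    return feats


class OnlinePerceptron:
    """Sparse perceptron that is updated one example at a time."""

    def __init__(self):
        self.w = {}

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def predict(self, feats):
        # +1 means malicious, -1 means benign.
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Adjust weights only when the current prediction is wrong."""
        if self.predict(feats) == label:
            return False
        for k, v in feats.items():
            self.w[k] = self.w.get(k, 0.0) + label * v
        return True


# Hypothetical toy stream; labels are +1 (malicious) and -1 (benign).
stream = [
    ("http://example.com/index.html", -1),
    ("http://paypal.example-login.test/verify/account", 1),
    ("http://example.org/docs/readme", -1),
    ("http://secure-update.test/login/verify", 1),
]

model = OnlinePerceptron()
for _ in range(5):  # a few passes over the toy stream
    for url, label in stream:
        model.update(lexical_features(url), label)
```

Because the weights live in a dictionary keyed by feature name, new tokens can appear at any time without refitting from scratch, which is the property that makes online learning a natural fit for a feature space that grows as new URLs arrive.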

Additional Details

Size is growing as more data is collected
urls, malicious, detecting, 938, detecting malicious urls, inferlink, inferlink corporation, corporation, external data source, source, external, learning, machine, detect, experiments, based, techniques, ucsd, time, features, examples, system, dataset, explored, evolving, online, consists, spam, classifying, involve, real, exploits, other, host, term, attempts, csestudent, phishing, lexical, process, goal, adapt, eng, construct