This is a non-IMPACT record, meaning that access to the data is not controlled by IMPACT. For access, see the directions below.

Disclaimer:
This Resource is offered and provided outside of the IMPACT mediation framework. IMPACT and the IMPACT Coordination Council/Blackfire Technology, Inc. expressly disclaim all conditions, representations and warranties including but not limited to Resource availability, quality, accuracy, non-infringement, and non-interference. All Resource information and access is controlled by entities and under terms that are external to the IMPACT legal framework.

Summary

Dataset ID

DS-0938

DOI

10.23721/100/1478802

Name

Detecting Malicious URLs

Record Type

External Dataset

Provider

External Data Source

Host

University of California, San Diego

Collection Starting

Unknown

Collection Ending

Data collection is ongoing

Popularity Rank

56 (lowest rank is 56)

Category & Restrictions

Description

Short Description
Data used in machine learning experiments to detect malicious URLs.

Long Description
The long-term goal of this research is to construct a real-time system that uses machine learning techniques to detect malicious URLs (spam, phishing, exploits, and so on). This dataset comes from UCSD's machine learning experiments on detecting malicious URLs. UCSD explored techniques that classify URLs based on their lexical and host-based features, and used online learning to process large numbers of examples and adapt quickly as URLs evolve over time. The data set consists of about 2.4 million URLs (examples) and 3.2 million features. Contact: csestudent@eng.ucsd.edu
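
To make the approach in the description concrete, the following is a minimal sketch of lexical feature extraction combined with an online learner. It is not UCSD's implementation: the tokenization rules, the feature names, the `lexical_features` helper, and the use of a plain perceptron as the online learner are all assumptions made for illustration. Host-based features (which would need external lookups) are left out.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return a sparse bag-of-words feature dict for a URL (hypothetical scheme)."""
    # urlsplit only finds the hostname when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    feats = {"bias": 1.0}
    # One binary feature per hostname token.
    for tok in filter(None, host.split(".")):
        feats["host=" + tok] = 1.0
    # One binary feature per path/query token, split on common delimiters.
    rest = parts.path + "?" + parts.query
    for tok in filter(None, re.split(r"[/?.=&_\-]+", rest)):
        feats["path=" + tok.lower()] = 1.0
    return feats


class OnlinePerceptron:
    """Illustrative online learner: weights change one example at a time."""

    def __init__(self):
        self.w = {}

    def score(self, feats):
        return sum(self.w.get(k, 0.0) * v for k, v in feats.items())

    def predict(self, feats):
        # +1 means malicious, -1 means benign.
        return 1 if self.score(feats) > 0 else -1

    def update(self, feats, label):
        """Adjust weights only on a mistake; return True if a mistake occurred."""
        if self.predict(feats) == label:
            return False
        for k, v in feats.items():
            self.w[k] = self.w.get(k, 0.0) + label * v
        return True
```

In use, each newly labeled URL would be passed through `lexical_features` and then `update`, so the model adapts as new URL patterns appear without retraining on the full history. Because the feature dict is sparse, the cost per example depends on the number of tokens in the URL, not on the total number of features seen so far.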

External URL
http://www.sysnet.ucsd.edu/projects/url/

Ongoing Measurement

This dataset is the subject of ongoing measurement and data collection. As such the data is continuously growing. Researchers who are granted access will be able to download updates for a period of one year after their request.

Additional Details

Size

N/A

Size is growing as more data is collected

Anonymized

false

Ongoing Measurement

true

Generated Keywords

urls, malicious, detecting, 938, detecting malicious urls, inferlink, inferlink corporation, corporation, external data source, source, external, learning, machine, detect, experiments, based, techniques, ucsd, time, features, examples, system, dataset, explored, evolving, online, consists, spam, classifying, involve, real, exploits, other, host, term, attempts, csestudent, phishing, lexical, process, goal, adapt, eng, construct

Additional Keywords
