16 Jun
12:00 - 13:00

UM Data Science Research Seminar

The UM Data Science Research Seminar Series consists of monthly sessions organized by the Institute of Data Science, on behalf of the UM Data Science Community, in collaboration with different departments across UM. The aim is to bring together data scientists from Maastricht University to discuss breakthroughs and research topics related to Data Science.

This session is organized in collaboration with Law & Tech Lab.

Schedule

 

Time: 12:00 - 12:45

Speaker: Vageesh Saxena

Title: Utilizing NLP for Linking and Identifying Vendors on Darknet Markets.

Abstract: The Dark Web, also known as the Darknet, is a collection of countless hidden websites that facilitate various criminal activities, including financial fraud, hacking services, child sexual exploitation, and trafficking of drugs, organs, weapons, and even humans. While law enforcement actively pursues such criminal activities, the anonymity and vast scope of the Darknet help these criminals stay undetected. Gauging the scope and size of a Darknet market, by tracing and linking vendors across various markets and existing criminal databases, is another challenging task that can help law enforcement prioritize its resources. Therefore, to aid law enforcement agencies, we propose a framework that examines the writing style of different vendors in Darknet advertisements and links them across and within seven distinct Darknet markets.

This research focuses on providing solutions to the following concerns: (1) Vendors on the Darknet often distribute their business across multiple markets to stay under the radar of law enforcement. Therefore, we establish a BERT-based supervised baseline with an accuracy of 0.91 to classify 3,896 vendors across three Darknet markets in an open-set multi-class classification setting. (2) Vendors on the Darknet often change their vendor handles within and across multiple markets to stay undetected. Therefore, using our established baseline, we extract sentence embeddings and compute the representational similarity to identify vendors with identical advertisements. (3) Countless new markets emerge every day on the Darknet. Unfortunately, not all law enforcement agencies have the resources to train resource-intensive SOTA classifiers. Therefore, we finally perform knowledge distillation from our established baseline to a smaller network for an emerging low-resource market and claim comparable performance to SOTA architectures.
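As a rough illustration of the alias-linking idea in point (2), the sketch below compares per-vendor embedding vectors and flags pairs that are nearly identical. It is not the speaker's implementation: cosine similarity is used here as an assumed stand-in for the similarity measure, and the function names, the 0.95 threshold, and the toy three-dimensional vectors are all hypothetical. In practice the vectors would come from the trained baseline model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_aliases(embeddings, threshold=0.95):
    """Return (handle_1, handle_2, score) for every pair at or above the threshold.

    `embeddings` maps a vendor handle to its embedding vector. The threshold
    is an illustrative assumption, not a value taken from the talk.
    """
    handles = sorted(embeddings)
    pairs = []
    for i, first in enumerate(handles):
        for second in handles[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second, score))
    # Most similar pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy, made-up vectors standing in for sentence embeddings.
toy_embeddings = {
    "vendor_a": [0.90, 0.10, 0.00],
    "vendor_b": [0.88, 0.12, 0.01],
    "vendor_c": [0.00, 0.20, 0.95],
}

for first, second, score in candidate_aliases(toy_embeddings):
    print(f"{first} ~ {second}: {score:.3f}")
```

With these toy vectors only vendor_a and vendor_b are flagged as a candidate pair. A flagged pair is a lead for further investigation, not proof that two handles belong to the same vendor.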

Finally, we claim to identify 201 migrants and 57 aliases across Alphabay, Dreams, Silk Road-1, Traderoute, Agora, Valhalla, and Berlusconi Darknet markets through our research. We believe that law enforcement can benefit from our framework by following our approach and training the established baseline on an extensive criminal database.

Organisers

 

Law & Tech Lab

Gijs van Dijck

IDS

Vikas Jaiman

Maryam Mohammadi

Bernice Breuer