Publication Details
Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence
Horák Adam, Ing. (DIFS FIT BUT)
Polišenský Jan, Bc. (FIT BUT)
Jeřábek Kamil, Ing., Ph.D. (DIFS FIT BUT)
Ryšavý Ondřej, doc. Ing., Ph.D. (DIFS FIT BUT)
Phishing, Domain, Detection, Machine learning, XGBoost, Features, DNS, RDAP, TLS, GeoIP
In the digital landscape, phishing attacks have rapidly evolved into a major cybersecurity challenge, posing significant risks to individuals and organizations. This short paper presents our preliminary research on detecting phishing domains. Our approach combines intelligence from multiple sources: DNS servers, WHOIS/RDAP, TLS certificates, and GeoIP data. We created a rich 15.8 GB dataset of information about benign and phishing domains, from which we derived a comprehensive 80-feature vector for training and testing machine learning classifiers. We present preliminary results with a fine-tuned XGBoost model, which achieves a precision of 0.9716, an F1 score of 0.9540, and a false positive rate of 0.23%.
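The three metrics quoted in the abstract can all be derived from the counts of a binary confusion matrix. The following is a minimal sketch of those definitions, assuming phishing is treated as the positive class. The function name `classification_metrics` and the example counts are hypothetical illustrations and are not taken from the paper's dataset or code.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute precision, recall, F1, and false positive rate.

    tp/fp/tn/fn are confusion-matrix counts with phishing as the
    positive class (an assumption made for this illustration).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # False positive rate: share of benign domains flagged as phishing.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "fpr": fpr}


# Made-up counts for illustration only (not the paper's results).
metrics = classification_metrics(tp=90, fp=10, tn=990, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because benign domains typically far outnumber phishing ones, the false positive rate is worth reporting next to precision: it is normalized by the number of benign samples only, so a small rate can still correspond to many false alarms in absolute terms.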
@INPROCEEDINGS{FITPUB13073,
  author = "Radek Hranick\'{y} and Adam Hor\'{a}k and Jan Poli\v{s}ensk\'{y} and Kamil Je\v{r}\'{a}bek and Ond\v{r}ej Ry\v{s}av\'{y}",
  title = "Unmasking the Phishermen: Phishing Domain Detection with Machine Learning and Multi-Source Intelligence",
  pages = "1--5",
  booktitle = "Proceedings of IEEE/IFIP Network Operations and Management Symposium 2024",
  year = 2024,
  location = "Seoul, KR",
  publisher = "Institute of Electrical and Electronics Engineers",
  ISBN = "979-8-3503-2794-6",
  language = "english",
  url = "https://www.fit.vut.cz/research/publication/13073"
}