Publication Details
Is Spam Visible in Flow-Level Statistics?
network measurement, spam, identification, characteristics
This paper investigates the feasibility of detecting spam connections using only flow statistics collected from SMTP connections. To this end, it analyzes several days of SMTP communication collected at a medium-sized email server. To show that spam connections can be identified automatically at the TCP/IP layer, we use a supervised learning algorithm, in our case a decision tree, to construct a classifier. The quality of the classifier is evaluated, and the results show that flow-based statistics contain a detectable fingerprint specific to spam connections. This finding may support a broader study of spam behavior, since flow statistics can be collected online on backbone links, where SMTP traffic for more than one email server is visible.
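The abstract describes the approach only at a high level: per-connection flow statistics fed to a supervised decision-tree classifier. The sketch below illustrates that general pipeline under stated assumptions. Packet records are hypothetical tuples, the four features (packet count, total bytes, duration, mean packet size) are an illustrative choice and not necessarily the paper's feature set, and the tree is a minimal Gini-based CART written from scratch, not the specific learner or implementation the authors used.

```python
from collections import defaultdict

SMTP_PORT = 25  # assumption: only client-to-server traffic on port 25 is considered


def flow_features(packets):
    """Aggregate packets into per-flow feature vectors (illustrative features only).

    Each packet is a hypothetical tuple:
    (src_ip, src_port, dst_ip, dst_port, timestamp_s, size_bytes).
    """
    flows = defaultdict(list)
    for src, sport, dst, dport, ts, size in packets:
        if dport != SMTP_PORT:
            continue  # ignore non-SMTP packets
        flows[(src, sport, dst, dport)].append((ts, size))
    features = {}
    for key, pkts in flows.items():
        times = [t for t, _ in pkts]
        sizes = [s for _, s in pkts]
        # [packet count, total bytes, duration in seconds, mean packet size]
        features[key] = [len(pkts), sum(sizes), max(times) - min(times), sum(sizes) / len(pkts)]
    return features


def gini(labels):
    """Gini impurity for binary labels (1 = spam, 0 = legitimate)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= thr]
            right = [y[i] for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue  # a split must send samples both ways
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, depth=0, max_depth=3):
    """Grow a tree: leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = int(sum(y) * 2 > len(y))  # ties are labeled legitimate (0)
    if depth == max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return majority  # no split reduces impurity
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], depth + 1, max_depth),
            build_tree([X[i] for i in right], [y[i] for i in right], depth + 1, max_depth))


def predict(tree, x):
    """Walk the tree for one feature vector and return 1 (spam) or 0 (legitimate)."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree
```

In use, one would call `flow_features` on captured SMTP traffic, attach labels (for example from the mail server's spam filter verdicts, which is an assumption here), train with `build_tree`, and classify new flows with `predict`. A from-scratch tree keeps the sketch dependency-free; in practice a library implementation with proper train/test evaluation would be the more reasonable choice.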
@TECHREPORT{FITPUB9277, author = "Martin \v{Z}\'{a}dn\'{i}k and Zbyn\v{e}k Michlovsk\'{y}", title = "Is Spam Visible in Flow-Level Statistics?", pages = "67--78", year = 2009, location = "Prague, CZ", publisher = "CESNET National Research and Education Network", ISBN = "978-80-904173-4-2", language = "english", url = "https://www.fit.vut.cz/research/publication/9277" }