Publication Details
Towards Evaluating Quality of Datasets for Network Traffic Domain
Hynek Karel, Ing. (FIT CTU)
Soukup Dominik, Ing. (FIT CTU)
Tisovčík Peter, Ing. (DCSY FIT BUT)
Dataset; Data Quality; Network traffic analysis
This paper deals with the quality of network traffic datasets created to train and validate machine learning classification and detection methods. There is a long history of research on data quality; however, it focuses mainly on data consistency, validity, precision, and similar metrics, which are insufficient for network traffic use cases. The rise of machine learning in network monitoring applications requires a new methodology for evaluating datasets. There is a need to evaluate and compare traffic samples captured under different conditions and to decide on the usability of already captured and annotated data. This paper aims to explain a use case of dataset creation, propose definitions regarding the quality of network traffic datasets, and finally describe a framework for dataset analysis.
@INPROCEEDINGS{FITPUB12640,
   author = "Tom\'{a}\v{s} \v{C}ejka and Karel Hynek and Dominik Soukup and Peter Tisov\v{c}\'{i}k",
   title = "Towards Evaluating Quality of Datasets for Network Traffic Domain",
   pages = "264--268",
   booktitle = "Proceedings of the 17th International Conference on Network Service Management (CNSM 2021)",
   year = 2021,
   location = "Izmir, TR",
   publisher = "Institute of Electrical and Electronics Engineers",
   ISBN = "978-3-903176-36-2",
   doi = "10.23919/CNSM52442.2021.9615601",
   language = "english",
   url = "https://www.fit.vut.cz/research/publication/12640"
}