Abstract
Malware infections are a major threat to system security, causing financial and reputational losses to corporate and end users and thus motivating the development of malware detectors. In this scenario, Machine Learning (ML) has proven to be a powerful technique for building classifiers able to distinguish malware from goodware samples. However, many ML research works on malware detection focus only on the final detection accuracy and overlook other important aspects of classifier implementation and evaluation, such as feature extraction and parameter selection. In this project, we shed light on these aspects to highlight the challenges and drawbacks of developing ML-based malware classifiers. We found that (i) dynamic features outperform static features; (ii) discrete-bounded features present smaller accuracy variance; (iii) datasets with distinct characteristics impose generalization challenges on ML models; and (iv) feature analysis can be used as feedback information for malware detection and infection prevention.
All works are open access; the rights granted to the works are held by the Revista dos Trabalhos de Iniciação Científica da UNICAMP.