Enhancing Cybersecurity with MEME: Reinforcement Learning for Adversarial Malware Evasion

18 Apr 2024


(1) Maria Rigaki, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic (maria.rigaki@fel.cvut.cz);

(2) Sebastian Garcia, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic (sebastian.garcia@agents.fel.cvut.cz).

Abstract & Introduction

Threat Model

Background and Related Work

Experiments Setup

Conclusion, Acknowledgments, and References


8 Conclusions

By employing model-based reinforcement learning, MEME generates adversarial malware samples that successfully evade antivirus systems while simultaneously training a surrogate model that accurately mimics the target classifier. Our experiments show that MEME surpasses existing methods in evasion rate, suggesting its potential for applications such as testing model robustness and strengthening defenses against advanced persistent threats. Future work may explore ensemble surrogates and other optimizations to further improve MEME's performance.
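To make the surrogate idea above concrete, here is a minimal, self-contained toy sketch of the pattern the conclusion describes: label queries against a black-box target, fit a local surrogate on those (query, label) pairs, and then use the surrogate in place of the target. Everything here is an illustrative assumption, not the paper's code: `target_detect` is a stand-in linear-threshold "detector" (a real AV or the EMBER classifier in MEME), and the surrogate is a tiny hand-rolled logistic regression rather than the LightGBM models used in the paper.

```python
import math
import random

# Hypothetical stand-in for a black-box detector: flags a feature vector
# whose feature sum exceeds a threshold. In MEME the target would be a
# real AV engine or malware classifier queried over an API.
def target_detect(features):
    return 1 if sum(features) > 2.0 else 0

# Toy surrogate: logistic regression trained with plain SGD on the
# (query, label) pairs collected while interacting with the target.
class Surrogate:
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def fit(self, data, lr=0.5, epochs=200):
        for _ in range(epochs):
            for x, y in data:
                err = self.prob(x) - y
                for i, xi in enumerate(x):
                    self.w[i] -= lr * err * xi
                self.b -= lr * err

random.seed(0)
dim = 4

# Collect labeled queries: random feature vectors labeled by the target,
# standing in for the samples an RL agent produces during its episodes.
queries = [
    ([random.uniform(0.0, 1.0) for _ in range(dim)],
     target_detect([0.0] * dim))  # placeholder, replaced below
    for _ in range(200)
]
queries = [(x, target_detect(x)) for x, _ in queries]

surrogate = Surrogate(dim)
surrogate.fit(queries)

# Once trained, the surrogate labels candidate modifications without
# touching the target, which is how a MEME-style loop limits queries
# to the real (expensive, rate-limited) model.
agreement = sum(
    int((surrogate.prob(x) > 0.5) == bool(y)) for x, y in queries
) / len(queries)
print(f"surrogate agreement with target: {agreement:.2f}")
```

On this linearly separable toy problem the surrogate reaches near-perfect agreement with the target; the design point is that the surrogate, not the target, supplies the reward signal for most of the agent's training.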


Acknowledgments

The authors acknowledge support from the Strategic Support for the Development of Security Research in the Czech Republic 2019–2025 (IMPAKT 1) program of the Ministry of the Interior of the Czech Republic under project No. VJ02010020 – AI-Dojo: Multi-agent testbed for the research and testing of AI-driven cyber security technologies. The authors also acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.


References

1. Akiba, T., Sano, S., Yanase, T., Ohta, T., Koyama, M.: Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. pp. 2623–2631. KDD ’19, Association for Computing Machinery, New York, NY, USA (Jul 2019). https://doi.org/10.1145/3292500.3330701

2. Anderson, H.S., Kharkar, A., Filar, B., Evans, D., Roth, P.: Learning to Evade Static PE Machine Learning Malware Models via Reinforcement Learning (Jan 2018). https://doi.org/10.48550/arXiv.1801.08917, arXiv:1801.08917 [cs]

3. Anderson, H.S., Roth, P.: EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models (Apr 2018). https://doi.org/10.48550/arXiv.1804.04637, arXiv:1804.04637 [cs]

4. Bergstra, J., Bardenet, R., Bengio, Y., Kégl, B.: Algorithms for Hyper-Parameter Optimization. In: Advances in Neural Information Processing Systems. vol. 24. Curran Associates, Inc. (2011)

5. Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., Zaremba, W.: OpenAI Gym (Jun 2016). https://doi.org/10.48550/arXiv.1606.01540, arXiv:1606.01540 [cs]

6. Ceschin, F., Botacin, M., Gomes, H.M., Oliveira, L.S., Grégio, A.: Shallow Security: on the Creation of Adversarial Variants to Evade Machine Learning-Based Malware Detectors. In: Proceedings of the 3rd Reversing and Offensive-oriented Trends Symposium. pp. 1–9. ROOTS’19, Association for Computing Machinery, New York, NY, USA (Feb 2020). https://doi.org/10.1145/3375894.3375898

7. Chandrasekaran, V., Chaudhuri, K., Giacomelli, I., Jha, S., Yan, S.: Exploring connections between active learning and model extraction. In: Proceedings of the 29th USENIX Conference on Security Symposium. pp. 1309–1326. SEC’20, USENIX Association, USA (Aug 2020)

8. Correia-Silva, J.R., Berriel, R.F., Badue, C., de Souza, A.F., Oliveira-Santos, T.: Copycat CNN: Stealing Knowledge by Persuading Confession with Random Non-Labeled Data. In: 2018 International Joint Conference on Neural Networks (IJCNN). pp. 1–8 (Jul 2018). https://doi.org/10.1109/IJCNN.2018.8489592, ISSN: 2161-4407

9. Demetrio, L., Biggio, B., Lagorio, G., Roli, F., Armando, A.: Functionality-Preserving Black-Box Optimization of Adversarial Windows Malware. IEEE Transactions on Information Forensics and Security 16, 3469–3478 (2021). https://doi.org/10.1109/TIFS.2021.3082330

10. Demetrio, L., Coull, S.E., Biggio, B., Lagorio, G., Armando, A., Roli, F.: Adversarial EXEmples: A survey and experimental evaluation of practical attacks on machine learning for Windows malware detection. ACM Transactions on Privacy and Security (TOPS) 24(4), 1–31 (2021), publisher: ACM New York, NY, USA

11. Dowling, S., Schukat, M., Barrett, E.: Using Reinforcement Learning to Conceal Honeypot Functionality. In: Brefeld, U., Curry, E., Daly, E., MacNamee, B., Marascu, A., Pinelli, F., Berlingerio, M., Hurley, N. (eds.) Machine Learning and Knowledge Discovery in Databases. pp. 341–355. Lecture Notes in Computer Science, Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-10997-4_21

12. Fang, Y., Zeng, Y., Li, B., Liu, L., Zhang, L.: DeepDetectNet vs RLAttackNet: An adversarial method to improve deep learning-based static malware detection model. PLOS ONE 15(4), e0231626 (Apr 2020). https://doi.org/10.1371/journal.pone.0231626, publisher: Public Library of Science

13. Fang, Z., Wang, J., Geng, J., Kan, X.: Feature Selection for Malware Detection Based on Reinforcement Learning. IEEE Access 7, 176177–176187 (2019). https://doi.org/10.1109/ACCESS.2019.2957429

14. Fang, Z., Wang, J., Li, B., Wu, S., Zhou, Y., Huang, H.: Evading Anti-Malware Engines With Deep Reinforcement Learning. IEEE Access 7, 48867–48879 (2019). https://doi.org/10.1109/ACCESS.2019.2908033

15. Harang, R., Rudd, E.M.: SOREL-20M: A Large Scale Benchmark Dataset for Malicious PE Detection. arXiv:2012.07634 [cs] (Dec 2020), arXiv: 2012.07634

16. Hu, W., Tan, Y.: Generating Adversarial Malware Examples for Black-Box Attacks Based on GAN. In: Tan, Y., Shi, Y. (eds.) Data Mining and Big Data. pp. 409–423. Communications in Computer and Information Science, Springer Nature, Singapore (2022). https://doi.org/10.1007/978-981-19-8991-9_29

17. Huang, L., Zhu, Q.: Adaptive Honeypot Engagement Through Reinforcement Learning of Semi Markov Decision Processes. In: Alpcan, T., Vorobeychik, Y., Baras, J.S., Dán, G. (eds.) Decision and Game Theory for Security. pp. 196–216. Lecture Notes in Computer Science, Springer International Publishing, Cham (2019). https://doi.org/10.1007/978-3-030-32430-8_13

18. AV-TEST Institute: AV-ATLAS – Malware & PUA (2023), https://portal.av-atlas.org/malware

19. Jagielski, M., Carlini, N., Berthelot, D., Kurakin, A., Papernot, N.: High Accuracy and High Fidelity Extraction of Neural Networks. pp. 1345–1362 (2020)

20. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.Y.: LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In: Advances in Neural Information Processing Systems. vol. 30. Curran Associates, Inc. (2017)

21. Kurutach, T., Clavera, I., Duan, Y., Tamar, A., Abbeel, P.: Model-Ensemble Trust-Region Policy Optimization. In: International Conference on Learning Representations (2018)

22. Labaca-Castro, R., Franz, S., Rodosek, G.D.: AIMED-RL: Exploring Adversarial Malware Examples with Reinforcement Learning. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds.) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. pp. 37–52. Lecture Notes in Computer Science, Springer International Publishing, Cham (2021). https://doi.org/10.1007/978-3-030-86514-6_3

23. Li, X., Li, Q.: An IRL-based malware adversarial generation method to evade anti-malware engines. Computers & Security 104, 102118 (May 2021). https://doi.org/10.1016/j.cose.2020.102118

24. Ling, X., Wu, L., Zhang, J., Qu, Z., Deng, W., Chen, X., Qian, Y., Wu, C., Ji, S., Luo, T., Wu, J., Wu, Y.: Adversarial attacks against Windows PE malware detection: A survey of the state-of-the-art. Computers & Security 128, 103134 (May 2023). https://doi.org/10.1016/j.cose.2023.103134

25. Lundberg, S.M., Lee, S.I.: A Unified Approach to Interpreting Model Predictions. In: Advances in Neural Information Processing Systems. vol. 30. Curran Associates, Inc. (2017)

26. Nguyen, T.T., Reddi, V.J.: Deep Reinforcement Learning for Cyber Security. IEEE Transactions on Neural Networks and Learning Systems pp. 1–17 (2021). https://doi.org/10.1109/TNNLS.2021.3121870

27. Orekondy, T., Schiele, B., Fritz, M.: Knockoff Nets: Stealing Functionality of Black-Box Models. pp. 4954–4963 (2019)

28. Pal, S., Gupta, Y., Shukla, A., Kanade, A., Shevade, S., Ganapathy, V.: ActiveThief: Model Extraction Using Active Learning and Unannotated Public Data. Proceedings of the AAAI Conference on Artificial Intelligence 34(01), 865–872 (Apr 2020). https://doi.org/10.1609/aaai.v34i01.5432

29. Phan, T.D., Duc Luong, T., Hoang Quoc An, N., Nguyen Huu, Q., Nghi, H.K., Pham, V.H.: Leveraging Reinforcement Learning and Generative Adversarial Networks to Craft Mutants of Windows Malware against Black-box Malware Detectors. In: Proceedings of the 11th International Symposium on Information and Communication Technology. pp. 31–38. SoICT ’22, Association for Computing Machinery, New York, NY, USA (Dec 2022)

30. Quertier, T., Marais, B., Morucci, S., Fournel, B.: MERLIN – Malware Evasion with Reinforcement LearnINg (Mar 2022), arXiv:2203.12980 [cs]

31. Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., Dormann, N.: Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research 22(268), 1–8 (2021)

32. Rigaki, M., Garcia, S.: Stealing and evading malware classifiers and antivirus at low false positive conditions. Computers & Security 129, 103192 (Jun 2023). https://doi.org/10.1016/j.cose.2023.103192

33. Rosenberg, I., Meir, S., Berrebi, J., Gordon, I., Sicard, G., Omid David, E.: Generating End-to-End Adversarial Examples for Malware Classifiers Using Explainability. In: 2020 International Joint Conference on Neural Networks (IJCNN). pp. 1–10 (Jul 2020). https://doi.org/10.1109/IJCNN48605.2020.9207168, ISSN: 2161-4407

34. Sanyal, S., Addepalli, S., Babu, R.V.: Towards Data-Free Model Stealing in a Hard Label Setting. pp. 15284–15293 (2022)

35. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

36. Security.org: 2023 Antivirus Market Annual Report (Feb 2023), https://www.security.org/antivirus/antivirus-consumer-report-annual/

37. Severi, G., Meyer, J., Coull, S., Oprea, A.: Explanation-Guided Backdoor Poisoning Attacks Against Malware Classifiers. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 1487–1504. USENIX Association (2021)

38. Song, W., Li, X., Afroz, S., Garg, D., Kuznetsov, D., Yin, H.: MAB-Malware: A Reinforcement Learning Framework for Blackbox Generation of Adversarial Malware. In: Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security. pp. 990–1003. ASIA CCS ’22, Association for Computing Machinery, New York, NY, USA (May 2022). https://doi.org/10.1145/3488932.3497768

39. Sussman, B.: New Malware Is Born Every Minute (May 2023), https://blogs.blackberry.com/en/2023/05/new-malware-born-every-minute

40. Sutton, R.S., Barto, A.G.: Reinforcement Learning, second edition: An Introduction. MIT Press (Nov 2018)

41. VirusTotal: VirusTotal – Stats, https://www.virustotal.com/gui/stats

42. Uprety, A., Rawat, D.B.: Reinforcement Learning for IoT Security: A Comprehensive Survey. IEEE Internet of Things Journal 8(11), 8693–8706 (Jun 2021). https://doi.org/10.1109/JIOT.2020.3040957

43. Wu, C., Shi, J., Yang, Y., Li, W.: Enhancing Machine Learning Based Malware Detection Model by Reinforcement Learning. In: Proceedings of the 8th International Conference on Communication and Network Security. pp. 74–78. ICCNS 2018, Association for Computing Machinery, New York, NY, USA (Nov 2018). https://doi.org/10.1145/3290480.3290494

44. Zolotukhin, M., Kumar, S., Hämäläinen, T.: Reinforcement Learning for Attack Mitigation in SDN-enabled Networks. In: 2020 6th IEEE Conference on Network Softwarization (NetSoft). pp. 282–286 (Jun 2020). https://doi.org/10.1109/NetSoft48620.2020.9165383

45. Šembera, V., Paquet-Clouston, M., Garcia, S., Erquiaga, M.J.: Cybercrime Specialization: An Exposé of a Malicious Android Obfuscation-as-a-Service. In: 2021 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). pp. 213–236 (2021). https://doi.org/10.1109/EuroSPW54576.2021.00029

This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.