Adversarial Malware Creation with Model-Based Reinforcement Learning: Appendix

18 Apr 2024

Authors:

(1) Maria Rigaki, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic and maria.rigaki@fel.cvut.cz;

(2) Sebastian Garcia, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic and sebastian.garcia@agents.fel.cvut.cz.

Table of Links:

– Abstract & Introduction

– Threat Model

– Background and Related Work

– Methodology

– Experiments Setup

– Results

– Discussion

– Conclusion, Acknowledgments, and References

– Appendix

Appendix

A. Hyper-parameter Tuning

The search space for the PPO hyper-parameters:

– gamma: 0.01 - 0.75

– max grad norm: 0.3 - 5.0

– learning rate: 0.001 - 0.1

– activation function: ReLU or Tanh

– neural network size: small or medium

Selected parameters: gamma=0.854, learning rate=0.00138, max grad norm=0.4284, activation function=Tanh, small network size (2 layers with 64 units each).
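As an illustration of how such a search space can be sampled, the sketch below draws one random PPO configuration from the ranges listed above. This is a minimal, library-agnostic sketch; the paper does not specify its tuning framework, and the dictionary keys and function names here are assumptions, not the authors' code.

```python
import random

# Search space for the PPO hyper-parameters as listed in the appendix.
# Tuples are continuous ranges; lists are categorical choices.
# (Key names are illustrative assumptions.)
PPO_SEARCH_SPACE = {
    "gamma": (0.01, 0.75),
    "max_grad_norm": (0.3, 5.0),
    "learning_rate": (0.001, 0.1),
    "activation_fn": ["ReLU", "Tanh"],
    "net_size": ["small", "medium"],
}

def sample_config(space, rng=random):
    """Draw one configuration: uniform draw for ranges, choice for categories."""
    config = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            config[name] = rng.choice(spec)
    return config

config = sample_config(PPO_SEARCH_SPACE)
print(config)
```

A tuning loop would repeat this sampling (or use a smarter strategy such as Bayesian optimization), train a PPO agent with each configuration, and keep the best-performing one.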

The search space for the LGB surrogate training hyper-parameters:

– alpha: 1 - 1,000

– num boosting rounds: 100 - 2,000

– learning rate: 0.001 - 0.1

– num leaves: 128 - 2,048

– max depth: 5 - 16

– min child samples: 5 - 100

– feature fraction: 0.4 - 1.0
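The LGB search space above can be sketched the same way. The snippet below samples one parameter set, treating all-integer ranges as integer draws and the rest as uniform floats. The key names loosely follow LightGBM's parameter naming, but this mapping is an assumption; the paper does not list the exact keys it used.

```python
import random

# Search space for the LGB surrogate hyper-parameters as listed above.
# (low, high) tuples; integer bounds imply an integer-valued parameter.
LGB_SEARCH_SPACE = {
    "alpha": (1.0, 1000.0),
    "num_boost_round": (100, 2000),
    "learning_rate": (0.001, 0.1),
    "num_leaves": (128, 2048),
    "max_depth": (5, 16),
    "min_child_samples": (5, 100),
    "feature_fraction": (0.4, 1.0),
}

def sample_lgb_params(space, rng=random):
    """Draw one parameter set: randint for integer ranges, uniform otherwise."""
    params = {}
    for name, (low, high) in space.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params

params = sample_lgb_params(LGB_SEARCH_SPACE)
print(params)
```

Each sampled set would then be passed to the LightGBM training call when fitting a surrogate, with the best set retained per surrogate (as reported in Table 4).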

Table 4. Hyper-parameter settings for the training of each LGB surrogate

This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.