Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

06/27/2019
by   Linan Huang, et al.
0

The honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoC) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply the infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transition and sojourn time of attackers in the honeynet and quantify the reward-risk trade-off. In particular, we produce adaptive long-term engagement policies shown to be risk-averse, cost-effective, and time-efficient. Numerical results have demonstrated that our adaptive interaction policies can quickly attract attackers to the target honeypot and engage them for a sufficiently long period to obtain worthy threat information. Meanwhile, the penetration probability is kept at a low level. The results show that the expected utility is robust against attackers of a large range of persistence and intelligence. Finally, we apply reinforcement learning to SMDP to solve the curse of modeling. Under a prudent choice of the learning rate and exploration policy, we achieve a quick and robust convergence of the optimal policy and value.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro