TY - GEN
T1 - Modelling direct messaging networks with multiple recipients for cyber deception
AU - Moore, Kristen
AU - Christopher, Cody James
AU - Liebowitz, David
AU - Nepal, Surya
AU - Selvey, Renee
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Cyber deception is the practice of deliberately introducing fake or misleading artefacts into cyber systems. It is emerging as a promising approach to defending networks and systems against attackers and data thieves. However, although cyber deception is relatively cheap to deploy [1], hand-crafting realistic content at scale is very costly. With recent improvements in machine learning, we now have the opportunity to bring scale and automation to the creation of realistic and enticing simulated content. In this work, we propose a framework to automate the generation of email and instant messaging-style group communications at scale. Such messaging platforms within organisations contain a great deal of valuable information in private communications and document attachments, making them an enticing target for an adversary. The presence of an active messaging platform also enhances the realism of a deceptive network simulation, contributing both traffic and message artefacts. We address two key aspects of simulating this type of system: modelling when and with whom participants communicate, and generating topical, multi-party text to populate simulated conversation threads. We present the LogNormMix-Net Temporal Point Process (TPP) as an approach to the first of these, building upon the intensity-free modelling approach of Shchur et al. [2] to create a generative model for unicast and multicast communications. We demonstrate the use of fine-tuned, pretrained language models to generate convincing multi-party conversation threads. A live email server is simulated by uniting our LogNormMix-Net TPP, which generates the communication timestamp, sender and recipients, with the language model, which generates the contents of the multi-party email threads. We evaluate the generated content with respect to a number of realism-based properties that encourage a model to learn to generate content that will engage the attention of an adversary and so achieve a deception outcome. Our simulations run in real time, making them suitable for deployment in cyber deception as a honeypot in its own right, or as part of a larger deception environment.
KW - cyber deception
KW - generative modelling
KW - simulation
UR - https://www.scopus.com/pages/publications/85134012266
U2 - 10.1109/EuroSP53844.2022.00009
DO - 10.1109/EuroSP53844.2022.00009
M3 - Conference Paper
AN - SCOPUS:85134012266
T3 - IEEE European Symposium on Security and Privacy (EuroS&P)
BT - Proceedings - 7th IEEE European Symposium on Security and Privacy
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th IEEE European Symposium on Security and Privacy, EuroS&P 2022
Y2 - 6 June 2022 through 10 June 2022
ER -