TY - GEN
T1 - Personal information leakage detection in conversations
AU - Xu, Qiongkai
AU - Qu, Lizhen
AU - Gao, Zeyu
AU - Haffari, Gholamreza
N1 - Publisher Copyright:
© 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
N2 - The global market size of conversational assistants (chatbots) is expected to grow to USD 9.4 billion by 2024, according to MarketsandMarkets. Despite the wide use of chatbots, leakage of personal information through chatbots poses serious privacy concerns for their users. In this work, we propose to protect personal information by warning users of detected suspicious sentences generated by conversational assistants. The detection task is formulated as an alignment optimization problem and a new dataset PERSONA-LEAKAGE is collected for evaluation. In this paper, we propose two novel constrained alignment models, which consistently outperform baseline methods on PERSONA-LEAKAGE. Moreover, we conduct analysis on the behavior of recently proposed personalized chit-chat dialogue systems. The empirical results show that those systems suffer more from personal information disclosure than the widely used Seq2Seq model and the language model. In those cases, a significant number of information leaking utterances can be detected by our models with high precision.
UR - http://www.scopus.com/inward/record.url?scp=85103013159&partnerID=8YFLogxK
M3 - Conference contribution
T3 - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 6567
EP - 6580
BT - EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
Y2 - 16 November 2020 through 20 November 2020
ER -