Personal information leakage detection in conversations

Qiongkai Xu, Lizhen Qu*, Zeyu Gao, Gholamreza Haffari

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    9 Citations (Scopus)

    Abstract

    The global market size of conversational assistants (chatbots) is expected to grow to USD 9.4 billion by 2024, according to MarketsandMarkets. Despite the wide use of chatbots, leakage of personal information through chatbots poses serious privacy concerns for their users. In this work, we propose to protect personal information by warning users of detected suspicious sentences generated by conversational assistants. The detection task is formulated as an alignment optimization problem and a new dataset PERSONA-LEAKAGE is collected for evaluation. In this paper, we propose two novel constrained alignment models, which consistently outperform baseline methods on PERSONA-LEAKAGE. Moreover, we conduct analysis on the behavior of recently proposed personalized chit-chat dialogue systems. The empirical results show that those systems suffer more from personal information disclosure than the widely used Seq2Seq model and the language model. In those cases, a significant number of information leaking utterances can be detected by our models with high precision.

    Original language: English
    Title of host publication: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
    Publisher: Association for Computational Linguistics (ACL)
    Pages: 6567-6580
    Number of pages: 14
    ISBN (Electronic): 9781952148606
    Publication status: Published - 2020
    Event: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020 - Virtual, Online
    Duration: 16 Nov 2020 to 20 Nov 2020

    Publication series

    Name: EMNLP 2020 - 2020 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

    Conference

    Conference: 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020
    City: Virtual, Online
    Period: 16/11/20 to 20/11/20
