TY - GEN
T1 - Privacy-aware text rewriting
AU - Xu, Qiongkai
AU - Qu, Lizhen
AU - Xu, Chenchen
AU - Cui, Ran
N1 - Publisher Copyright:
© 2019 Association for Computational Linguistics
PY - 2019
Y1 - 2019
N2 - Biased decisions made by automatic systems have led to growing concerns in research communities. Recent work from the NLP community focuses on building systems that make fair decisions based on text. Instead of relying on unknown decision systems or human decision-makers, we argue that a better way to protect data providers is to remove the trails of sensitive information before publishing the data. In light of this, we propose a new privacy-aware text rewriting task and explore two privacy-aware back-translation methods for the task, based on adversarial training and approximate fairness risk. Our extensive experiments on three real-world datasets with varying demographic attributes show that our methods are effective in obfuscating sensitive attributes. We have also observed that the fairness risk method retains better semantics and fluency, while the adversarial training method tends to leak less sensitive information.
AB - Biased decisions made by automatic systems have led to growing concerns in research communities. Recent work from the NLP community focuses on building systems that make fair decisions based on text. Instead of relying on unknown decision systems or human decision-makers, we argue that a better way to protect data providers is to remove the trails of sensitive information before publishing the data. In light of this, we propose a new privacy-aware text rewriting task and explore two privacy-aware back-translation methods for the task, based on adversarial training and approximate fairness risk. Our extensive experiments on three real-world datasets with varying demographic attributes show that our methods are effective in obfuscating sensitive attributes. We have also observed that the fairness risk method retains better semantics and fluency, while the adversarial training method tends to leak less sensitive information.
UR - http://www.scopus.com/inward/record.url?scp=85087158745&partnerID=8YFLogxK
M3 - Conference contribution
T3 - INLG 2019 - 12th International Conference on Natural Language Generation, Proceedings of the Conference
SP - 247
EP - 257
BT - INLG 2019 - 12th International Conference on Natural Language Generation, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - 12th International Conference on Natural Language Generation, INLG 2019
Y2 - 29 October 2019 through 1 November 2019
ER -