TY - GEN
T1 - A Submodularity-based Clustering Algorithm for the Information Bottleneck and Privacy Funnel
AU - Ding, Ni
AU - Sadeghi, Parastoo
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
N2 - For the relevant data S that nests in the observation X, the information bottleneck (IB) aims to encode X into X in order to maximize the extracted useful information I(S;;X) with the minimum coding rate I(X;;X). For the dual privacy tunnel (PF) problem where S denotes the sensitive/private data, the goal is to minimize the privacy leakage I(S;X) while maintain a certain level of utility I(X;;X). For both problems, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). It starts with the original alphabet ;X:= X and iteratively merges the elements in the current alphabet\;X that optimizes the Lagrangian function I(S;;X-λ I(X;X). We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of ;X by the existing MDSF algorithms. By varying the value of the Lagrangian multiplier λ, we obtain the experimental results on a heart disease data set in terms of the Pareto frontier: I(S;;X) vs.-I(X;;X). We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
AB - For the relevant data S that nests in the observation X, the information bottleneck (IB) aims to encode X into X in order to maximize the extracted useful information I(S;;X) with the minimum coding rate I(X;;X). For the dual privacy tunnel (PF) problem where S denotes the sensitive/private data, the goal is to minimize the privacy leakage I(S;X) while maintain a certain level of utility I(X;;X). For both problems, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). It starts with the original alphabet ;X:= X and iteratively merges the elements in the current alphabet\;X that optimizes the Lagrangian function I(S;;X-λ I(X;X). We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of ;X by the existing MDSF algorithms. By varying the value of the Lagrangian multiplier λ, we obtain the experimental results on a heart disease data set in terms of the Pareto frontier: I(S;;X) vs.-I(X;;X). We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.
UR - http://www.scopus.com/inward/record.url?scp=85081108070&partnerID=8YFLogxK
U2 - 10.1109/ITW44776.2019.8989355
DO - 10.1109/ITW44776.2019.8989355
M3 - Conference contribution
AN - SCOPUS:85081108070
T3 - 2019 IEEE Information Theory Workshop, ITW 2019
BT - 2019 IEEE Information Theory Workshop, ITW 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE Information Theory Workshop, ITW 2019
Y2 - 25 August 2019 through 28 August 2019
ER -