Privacy-Aware Compression for Federated Data Analysis

Pensacola·2023년 1월 6일

1. Problem & Motivation

1.1. Merging quantization and differential privacy

Federated learning is a framewokr for distributed data anlysis that is free from a necessity of centralized datacenter. Namely a central server owning a machine learning model coordinates distributed learning with clients. The training server distributes the model, and client sends updates to the server for aggregation.

Federated learning faces two problems: Privacy and Compression.

Usually privacy is achieved by Local Differential Privacy (LDP). Clients control and perturb their data to conceal the private information.

Compression is achieved by quantization. It is mapping a value in a given range to another value which quantizes the same range. For example, if given range is [0.0,1.0][0.0, 1.0] and to quantize the value within 5 different values, it can be achieved by mapping the value to [0.0,0.25,0.5,0.75,1.0][0.0, 0.25, 0.5, 0.75, 1.0]. If a value is 0.30.3, it can be mapped to 0.250.25. To send a continuous value through a communication channel, it will requre several bytes (float, double). On the other hand, the above example only requires five bits. There are various quantization techniques that reduces the size and preserves the value as much as possible.

Differential privacy and compression were achieved independently. Common approach addressing both issues was to apply differential privacy to a data or a weight first, and compres it before communication. This sequential approach offers an advese effect on utility as each independently degrades utility without considering existence of the other. In this work, the author proposes a privacy mechanism in conjunction with the compression.

Key ideas (somewhat subjective) are as follows

  1. The author proposes privacy mechanism combined to the quntization
  2. The author defines asymptotic consistency, which is a provable analysis on correlation between utility and the number of clients participating in the training.
  3. The author proposes Minimum Variance Unbiased (MVU) algorithm.

1.2. Asymptotic Consistency

The goal of the paper is to achieve differentially private and compressed algorithm. Formally speaking, there are three requirements.

  1. MM is ε\varepsilon-local DP
  2. The output of MM can be encoded in bb bits
  3. A(M(x1),M(xn))\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) is a good approximation to Tn(x1,xn)\mathcal{T}_n\left(x_1, \cdots x_n\right).

Where x1,xnx_1, \cdots x_n are sensitive data in respective clients, A\mathcal{A} is an aggregation mechanism, and M\mathcal{M} is a local-differential privacy mechanism. The T\mathcal{T} is an aggregate statistics.

There is no further explanation regarding T\mathcal{T}. This is because the T\mathcal{T} is not pertinent to particular task. It can be a statistic from x1,xnx_1, \cdots x_n, or a likelyhood of the data. The point is, whatever the T\mathcal{T} is, the A(M(x1),M(xn))\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) is a good approximation for T\mathcal{T}. For example, let A(M(x1),M(xn))\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) is a federated learning scenario where ii-th client is training a classifier using data xix_i and privacy mechanism M\mathcal{M} and a server is aggregating all of them using A\mathcal{A}. The T\mathcal{T}, which A\mathcal{A} will approximate, is an ideal classifier trained from all data x1,xnx_1, \cdots x_n combined.

This is somewhat magical. How can a quantized & differentially private federeated learning be a good approximator of a counterpart imaginary monolithic classifier which is trained from all data combined? Of course, it is not a free lunch. authors defined a property called asymptotic consistency.

Definition 4
We say that a private federated analytic protocol is asymptotically consistent if the output of the server's aggregation algorithm A(M(x1),M(xn))\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) approaches the target statistics Tn(x1,xn)\mathcal{T}_n\left(x_1, \cdots x_n\right) as nn \rightarrow \infty. In other words, for any α,δ>0\alpha, \delta > 0, there exists an n0n_0 such that for all nn0n \geq n_0, we have :

Pr(A(M(x1),M(xn))Tn(x1,xn)α)δPr \left( \lvert \mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) - \mathcal{T}_n\left(x_1, \cdots x_n\right)\rvert \geq \alpha \right) \leq \delta

There is an additional requirement specifically for federated learning. Usually in federated learning, aggregation function is averaging updates from clients. This poses a requirement on M\mathcal{M}.

Lemma 5
If M\mathcal{M} is unbiased for all xx and has bounded variance, and if An\mathcal{A}_n computes the average of the client responses, then the federated analytics solution is asymptotically consistent

As long as our privacy mechanims M\mathcal{M} is unbiased, then we are good. Typical algorithms such as gaussian mechanisms are biased. The goal now is to find a differentially private mechanism which can preserve unbiasedness.

A stochastic algorithm M(x)\mathcal{M}\left(x\right) is unbiased if

E[M(x)]=x\mathbb{E}\left[\mathcal{M}\left(x\right)\right] = x

1.3. Dithering quantization

An unbiased quantization algorithm is dithed quantization.

If x[iB1,i+1B1)x \in [\frac{i}{B-1}, \frac{i+1}{B-1} ) where 0iB10\leq i \leq B-1, then

Dither(x)=iB1with probability (B1)(i+1B1x)i+1B1with probability (B1)(xiB1)\mathrm{Dither}\left(x\right) = \\ \frac{i}{B-1} \quad \quad \textrm{with probability } \left(B-1\right)\left(\frac{i+1}{B-1}-x\right) \\ \frac{i+1}{B-1} \quad \quad \textrm{with probability } \left(B-1\right)\left(x- \frac{i}{B-1}\right)

With the probabilisitc mapping, it is possible to keep the algorithm unbiased. E[Dither(x)]=x\mathbb{E}\left[\mathrm{Dither}\left(x\right)\right] = x.

primitive attempt

0개의 댓글