Federated learning is a framework for distributed data analysis that removes the need for a centralized datacenter. A central server owning a machine learning model coordinates training across clients: the server distributes the model, and each client sends updates back to the server for aggregation.

Federated learning faces two problems: privacy and compression.

Privacy is usually achieved by

Local Differential Privacy (LDP). Each client perturbs its own data locally to conceal the private information.
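As a concrete, classical example of an LDP mechanism (not the one proposed in the paper), randomized response reports a client's bit truthfully with probability $e^\varepsilon/(e^\varepsilon+1)$ and flipped otherwise. A minimal sketch, with a function name of our own choosing:

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """epsilon-LDP report of a single bit: answer truthfully with
    probability e^eps / (e^eps + 1), otherwise flip the bit."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit
```

The ratio of the two response probabilities is exactly $e^\varepsilon$, which is precisely the quantity that the $\varepsilon$-LDP definition bounds.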

Compression is achieved by

quantization, which maps a value in a given range to one of a fixed set of values covering the same range. For example, to quantize a value in $[0.0, 1.0]$ to 5 levels, map it to the nearest element of $\{0.0, 0.25, 0.5, 0.75, 1.0\}$; the value $0.3$ maps to $0.25$. Sending a continuous value through a communication channel requires several bytes (float, double), whereas the example above requires only $\lceil \log_2 5 \rceil = 3$ bits. Various quantization techniques reduce the size while preserving the value as much as possible.
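The 5-level example above can be sketched as follows (function name is our own):

```python
import math

def quantize_nearest(x: float, B: int = 5) -> float:
    """Deterministically map x in [0, 1] to the nearest of B evenly
    spaced levels {0, 1/(B-1), ..., 1}."""
    i = round(x * (B - 1))
    return i / (B - 1)

# Encoding one of B levels takes ceil(log2(B)) bits, e.g. 3 bits for B = 5.
bits_needed = math.ceil(math.log2(5))
```

For instance, `quantize_nearest(0.3, 5)` returns `0.25`, as in the example above.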

Differential privacy and compression used to be achieved independently. The common approach addressing both issues was to apply differential privacy to the data or weights first, and then compress before communication. This sequential approach has an adverse effect on utility, as each step degrades utility without accounting for the other. In this work, the authors propose a privacy mechanism designed in conjunction with the compression.

- The authors propose a privacy mechanism combined with quantization.
- The authors define **asymptotic consistency**, a provable relation between utility and the number of clients participating in the training.
- The authors propose the Minimum Variance Unbiased (MVU) algorithm.

The goal of the paper is a differentially private and compressed algorithm. Formally speaking, there are three requirements.

- $\mathcal{M}$ is $\varepsilon$-local DP
- The output of $\mathcal{M}$ can be encoded in $b$ bits
- $\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right)$ is a good approximation to $\mathcal{T}_n\left(x_1, \cdots x_n\right)$.

Here $x_1, \cdots x_n$ are the sensitive data held by the respective clients, $\mathcal{A}$ is an aggregation mechanism, and $\mathcal{M}$ is a local differential privacy mechanism. The $\mathcal{T}_n$ is a target aggregate statistic.

There is no further explanation of $\mathcal{T}$, because $\mathcal{T}$ is not tied to a particular task. It can be a statistic of $x_1, \cdots x_n$, or a likelihood of the data. The point is that, whatever $\mathcal{T}$ is, $\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right)$ is a good approximation of it. For example, consider a federated learning scenario where the $i$-th client trains a classifier on data $x_i$ through the privacy mechanism $\mathcal{M}$, and a server aggregates all the updates using $\mathcal{A}$. The $\mathcal{T}$ that $\mathcal{A}$ approximates is then an ideal classifier trained on all data $x_1, \cdots x_n$ combined.

This sounds somewhat magical. How can quantized and differentially private federated learning be a good approximator of an imaginary monolithic classifier trained on all data combined? Of course, it is not a free lunch. The authors define a property called **asymptotic consistency**.

Definition 4

We say that a private federated analytics protocol is asymptotically consistent if the output of the server's aggregation algorithm $\mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right)$ approaches the target statistic $\mathcal{T}_n\left(x_1, \cdots x_n\right)$ as $n \rightarrow \infty$. In other words, for any $\alpha, \delta > 0$, there exists an $n_0$ such that for all $n \geq n_0$, we have:

$\Pr \left( \lvert \mathcal{A}\left(\mathcal{M}\left(x_1\right), \cdots \mathcal{M}\left(x_n\right)\right) - \mathcal{T}_n\left(x_1, \cdots x_n\right)\rvert \geq \alpha \right) \leq \delta$

There is an additional requirement specific to federated learning. In federated learning, the aggregation function usually averages the updates from clients, which poses a requirement on $\mathcal{M}$.

Lemma 5

If $\mathcal{M}$ is unbiased for all $x$ and has bounded variance, and if $\mathcal{A}_n$ computes the average of the client responses, then the federated analytics solution is asymptotically consistent.

As long as our privacy mechanism $\mathcal{M}$ is unbiased (and has bounded variance), we are good. Typical mechanisms such as the Gaussian mechanism add zero-mean noise and are themselves unbiased, but their continuous outputs must still be compressed, and naive quantization after noising introduces bias. The goal now is to find a differentially private mechanism that preserves unbiasedness.
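To see the lemma in action, here is an illustrative simulation, using the unbiased Laplace mechanism as a stand-in (not the paper's MVU mechanism, and all names are our own): averaging more clients' unbiased private reports drives the aggregate error toward zero.

```python
import random

def laplace_noise(scale: float) -> float:
    """Zero-mean Laplace noise: a random sign times an exponential
    variate with mean `scale`."""
    return random.choice((-1.0, 1.0)) * random.expovariate(1.0 / scale)

def private_report(x: float, epsilon: float) -> float:
    """Unbiased epsilon-LDP report of x in [0, 1] via the Laplace
    mechanism (sensitivity 1), so E[private_report(x)] = x."""
    return x + laplace_noise(1.0 / epsilon)

def aggregate_error(n: int, truth: float = 0.3, epsilon: float = 1.0) -> float:
    """Server-side average of n private reports; returns |avg - truth|."""
    avg = sum(private_report(truth, epsilon) for _ in range(n)) / n
    return abs(avg - truth)
```

Because each report is unbiased with bounded variance, the aggregate error shrinks on the order of $1/\sqrt{n}$, matching the lemma.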

A stochastic algorithm $\mathcal{M}\left(x\right)$ is unbiased if

$\mathbb{E}\left[\mathcal{M}\left(x\right)\right] = x$

An unbiased quantization algorithm is dithered quantization.

If $x \in [\frac{i}{B-1}, \frac{i+1}{B-1} )$ where $0\leq i \leq B-2$, then

$$\mathrm{Dither}\left(x\right) = \begin{cases} \frac{i}{B-1} & \textrm{with probability } \left(B-1\right)\left(\frac{i+1}{B-1}-x\right) \\ \frac{i+1}{B-1} & \textrm{with probability } \left(B-1\right)\left(x- \frac{i}{B-1}\right) \end{cases}$$

With this probabilistic mapping, the algorithm stays unbiased: $\mathbb{E}\left[\mathrm{Dither}\left(x\right)\right] = x$.
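A minimal sketch of dithered quantization (naming is our own), with the unbiasedness verifiable by averaging many dithered samples:

```python
import random

def dither(x: float, B: int = 5) -> float:
    """Randomized (dithered) quantization of x in [0, 1] onto the grid
    {0, 1/(B-1), ..., 1}, rounding up with probability proportional to
    x's distance from the lower level, so that E[Dither(x)] = x."""
    i = int(x * (B - 1))            # index of the grid level just below x
    if i >= B - 1:                  # x == 1.0 sits exactly on the top level
        return 1.0
    low, high = i / (B - 1), (i + 1) / (B - 1)
    p_up = (B - 1) * (x - low)      # probability of rounding up
    return high if random.random() < p_up else low
```

For $x = 0.3$ and $B = 5$, this returns $0.5$ with probability $0.2$ and $0.25$ with probability $0.8$, so the expectation is $0.5 \cdot 0.2 + 0.25 \cdot 0.8 = 0.3 = x$.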