EQ-Net: Elastic Quantization Neural Networks
2022.12.17
Keywords: #Quantization
0. Abstract
- GitHub code: https://github.com/xuke225/EQ-Net
- Proposal: EQ-Net, which aims to train a robust weight-sharing quantization supernet
- Elastic Quantization Space: Elastic bit-width, granularity, and symmetry → to adapt to various mainstream quantization forms
- Weight Distribution Regularization Loss (WDR-Loss) and Group Progressive Guidance Loss (GPG-Loss) → to bridge the distribution gap of weights and output logits across the elastic quantization space
- Conditional Quantization-Aware Accuracy Predictor (CQAP) as an estimator to quickly search for mixed-precision quantized networks within the supernet.
1. Introduction
- Quantization incurs added noise due to reduced precision
- PTQ: Only requires access to a small calibration dataset → effectiveness declines when applied to low-bit quantization (≤ 4 bits)
- QAT: By simulating the quantization operation during training or fine-tuning, the network can adapt to quantization noise → better than PTQ
- Problem: Different hardware platforms support different quantization forms, e.g., NVIDIA GPUs (channel-wise symmetric quantization in the TensorRT inference engine) vs. Qualcomm DSPs (per-tensor asymmetric quantization in the SNPE inference engine) → repeatedly re-optimizing for each platform makes quantized model deployment inefficient
- Proposal
- Elastic Quantization Space: A unified quantization formula that integrates various model quantization forms, implementing elastic switching of 1) quantization bit-width, 2) quantization granularity, and 3) quantization symmetry through parameter splitting
- WDR-Loss, GPG-Loss: Unlike NAS, EQ-Net is fully parameter-shared, with no additional weight-parameter optimization space arising from network structure differences → this leads to negative gradient suppression (samples whose predictions are inconsistent across quantization forms produce conflicting gradients) → WDR-Loss and GPG-Loss form an efficient training strategy for EQ-Net
- CQAP combined with a genetic algorithm: Specify any form in the elastic quantization space and quickly obtain a quantized model with the corresponding accuracy (handles both uniform and mixed-precision quantization); see the sketch below.
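A rough, hypothetical sketch of what CQAP-guided search could look like. None of the names or hyperparameters here come from the paper; `predictor` stands in for a trained CQAP that maps a bit-width configuration to predicted accuracy:

```python
import random

BITS = [2, 3, 4, 5, 6, 7, 8]  # hypothetical per-layer bit-width choices

def random_config(n_layers: int) -> list:
    return [random.choice(BITS) for _ in range(n_layers)]

def search(predictor, n_layers=20, pop=50, generations=100, mutate_p=0.1):
    """Evolve bit-width configurations ranked by the accuracy predictor."""
    population = [random_config(n_layers) for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=predictor, reverse=True)  # rank by predicted accuracy
        parents = population[: pop // 2]
        children = []
        for _ in range(pop - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randrange(n_layers)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [random.choice(BITS) if random.random() < mutate_p else g
                     for g in child]          # per-gene mutation
            children.append(child)
        population = parents + children
    return max(population, key=predictor)
```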
2. Related Work
- One-Shot Network Architecture Search
- Multi-Bit Quantization of Neural Networks
3. Approach
3.1 Quantization Preliminaries
- Quantization / Dequantization operation
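The standard uniform quantization/dequantization pair this section refers to, with step size $s$, zero-point $z$, and bit-width $b$:

$$\bar{x} = \mathrm{clip}\!\left(\left\lfloor \frac{x}{s} \right\rceil + z,\; q_{\min},\; q_{\max}\right), \qquad \hat{x} = s \cdot (\bar{x} - z)$$

where $\lfloor\cdot\rceil$ denotes rounding and $[q_{\min}, q_{\max}]$ is the integer range determined by $b$ (e.g., $q_{\min} = -2^{b-1}$, $q_{\max} = 2^{b-1}-1$ for signed quantization).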
3.2 Elastic Quantization Space Design
3.2.1 Elastic Quantization Bit-Width
- Separately store the quantization step size and zero-point required for each quantization bit-width.
- Higher bit-widths: small quantization step size → large saturation (truncation) range
- Lower bit-widths: larger quantization step size → small saturation range
- Splitting these hyperparameters per bit-width alleviates their training pressure, but sharing one set of weights across all bit-widths still poses challenges to robustness.
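A minimal PyTorch sketch of the parameter-splitting idea. The class name, init values, and structure are my own illustration, not the paper's code; a real implementation would also need a straight-through estimator for the rounding:

```python
import torch
import torch.nn as nn

class ElasticQuantizer(nn.Module):
    """Sketch: one (step size, zero-point) pair stored per candidate
    bit-width, selected when the supernet switches configurations."""

    def __init__(self, bit_widths=(2, 4, 8)):
        super().__init__()
        # Parameter splitting: separate learnable hyperparameters per bit-width.
        self.steps = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(1.0)) for b in bit_widths})
        self.zero_points = nn.ParameterDict(
            {str(b): nn.Parameter(torch.tensor(0.0)) for b in bit_widths})
        self.bits = max(bit_widths)  # currently active bit-width

    def forward(self, x):
        s = self.steps[str(self.bits)]
        z = self.zero_points[str(self.bits)]
        qmin, qmax = -(2 ** (self.bits - 1)), 2 ** (self.bits - 1) - 1
        # Quantize (round, shift by zero-point, clip), then dequantize.
        q = torch.clamp(torch.round(x / s) + z, qmin, qmax)
        return s * (q - z)
```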
3.2.2 Elastic Quantization Symmetry
- Symmetric quantization: Zero-point is fixed to 0 (z = 0)
- Asymmetric quantization: Zero-point is adjustable to different ranges (z ∈ Z).
- The switching between the two is achieved by dynamically modifying the value of the zero point.
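Continuing the sketch above, a hypothetical helper that toggles symmetry by fixing or freeing the zero-points:

```python
def set_symmetry(quantizer: ElasticQuantizer, symmetric: bool):
    """Toggle elastic symmetry by dynamically modifying the zero-points."""
    for z in quantizer.zero_points.values():
        if symmetric:
            z.data.zero_()           # symmetric: z = 0
            z.requires_grad_(False)  # and kept fixed
        else:
            z.requires_grad_(True)   # asymmetric: z is learned
```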
3.2.3 Elastic Quantization Granularity
- Supports both per-tensor (one set of step size and zero-point for a tensor in one layer) and per-channel (quantizes each weight kernel independently) quantization.
- Per-tensor is coarser-grained than per-channel: a single parameter set must cover every channel in the tensor.
- Step size and zero point for per-tensor can be obtained heuristically from per-channel, or can be learned as independent parameters
- Elastic quantization granularity applies to weights only; activations are always quantized per-tensor.
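A sketch of one way both granularities could be initialized, deriving the per-tensor parameters from the per-channel ones as described above (the abs-max heuristic and the `max` reduction are my assumptions):

```python
import torch

def init_step_sizes(weight: torch.Tensor, bits: int = 8):
    """Abs-max step-size initialization for both granularities (sketch).

    weight: (out_channels, ...) weight tensor of a conv/linear layer.
    """
    qmax = 2 ** (bits - 1) - 1
    # Per-channel: one step size per output channel (independent kernels).
    per_channel = weight.abs().flatten(1).max(dim=1).values / qmax
    # Per-tensor: a single step size derived heuristically from per-channel.
    per_tensor = per_channel.max()
    return per_channel, per_tensor
```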
3.3 Elastic Quantization Network Modeling
- Elastic quantization space of a model: $\varepsilon = \{\varepsilon_{b}, \varepsilon_{g}, \varepsilon_{s}\}$ (bit-width, granularity, and symmetry)
- Training objective: Minimize the task loss over the entire elastic quantization space by optimizing the weights, step sizes, and zero-points
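In symbols, the objective can be written roughly as (my notation, not verbatim from the paper):

$$\min_{w,\,s,\,z}\; \mathbb{E}_{c \sim \varepsilon}\Big[\mathcal{L}_{\text{task}}\big(Q_{c}(w;\, s,\, z)\big)\Big]$$

where $c$ is a quantization configuration sampled from $\varepsilon$ and $Q_c$ denotes the supernet quantized under that configuration.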
3.4 Elastic Quantization Training
Weight Distribution Regularization
- DNN weights often conform to Gaussian or Laplace distributions
- Skewness regularization: Reducing skewness → enhances robustness of weights under elastic symmetry
- Kurtosis regularization: Reducing sharpness (kurtosis) → enhances robustness of weights under elastic bit-width
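A sketch of what such a regularizer could look like. The squared-penalty form and the kurtosis target of 1.8 (the kurtosis of a uniform distribution) are my assumptions, not necessarily the paper's exact formulation:

```python
import torch

def wdr_loss(w: torch.Tensor, kurt_target: float = 1.8) -> torch.Tensor:
    """Penalize skewness (3rd moment) and kurtosis (4th moment) of weights.

    kurt_target = 1.8 is the kurtosis of a uniform distribution (assumption).
    """
    zc = (w - w.mean()) / (w.std() + 1e-8)  # standardized weights
    skew = (zc ** 3).mean()                 # skewness: asymmetry
    kurt = (zc ** 4).mean()                 # kurtosis: sharpness/peakedness
    return skew ** 2 + (kurt - kurt_target) ** 2
```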
Group Progressive Guidance
- There are many subnets with varying quantization configurations, resulting in many soft labels that the network can learn from.
- $\mathcal{L}_{\text{KL}}$ indicates the KL divergence loss and $\mathcal{L}_{\text{CE}}$ indicates the cross-entropy loss.
- The loss for a randomly sampled subnet, $\mathcal{L}_R = \mathcal{L}_{\text{KL}}(y_R, y_H) + \mathcal{L}_{\text{CE}}(y_R, y)$, regulates two parts:
    - $\mathcal{L}_{\text{KL}}(y_R, y_H)$: matches the random subnet's output distribution to that of the highest-bit-width subnet.
    - $\mathcal{L}_{\text{CE}}(y_R, y)$: cross-entropy between the random subnet's output and the ground-truth label $y$.
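A sketch of $\mathcal{L}_R$ as described above (the temperature `T` and detaching the teacher logits are my assumptions):

```python
import torch
import torch.nn.functional as F

def gpg_loss(logits_r: torch.Tensor, logits_h: torch.Tensor,
             labels: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """L_R = KL(y_R, y_H) + CE(y_R, y): a random subnet learns from the
    highest-bit-width subnet's soft labels and from the hard labels."""
    # KL term: align the random subnet's distribution with the teacher subnet.
    kl = F.kl_div(F.log_softmax(logits_r / T, dim=-1),
                  F.softmax(logits_h.detach() / T, dim=-1),
                  reduction="batchmean") * (T * T)
    # CE term: standard supervision from the ground-truth labels.
    ce = F.cross_entropy(logits_r, labels)
    return kl + ce
```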
4.3 Ablation Studies
Weight Distribution Regularization
- The impact of skewness and sharpness of weight distribution on fixed bit-width quantization is relatively insignificant.
- However, for elastic quantization with high robustness demands, such phenomena can significantly affect the overall performance, particularly for low bit widths.
- Applying kurtosis and skewness regularization simultaneously pushes the weight distribution closer to a uniform distribution, eliminating skewness and sharpness.
Effectiveness of Group Progressive Guidance (GPG)
- GPG utilizes soft labels from the high bit-width subnet to progressively guide the low bit-width subnet, creating more coherence between the output of the high and low bit-width networks.
- To demonstrate the training efficiency of the whole quantization supernet, the average precision over 2/4/8 bit-widths is reported, and the average precision of our method is always the best.
- Convergence curves of ResNet-20 trained using three different methods (hard label, label smoothing, and our GPG method).
5. Conclusion
- A one-shot weight-sharing quantization supernet.
- EQ-Net can support subnets with both uniform and mixed-precision quantization without retraining.
- Weight Distribution Regularization (WDR) and Group Progressive Guidance (GPG) techniques to optimize EQ-Net.