EQ-Net: Elastic Quantization Neural Networks


Keywords: #Quantization

0. Abstract

  • Proposal: EQ-Net, which aims to train a robust weight-sharing quantization supernet
    1. Elastic Quantization Space: Elastic bit-width, granularity, and symmetry → adapts to the various mainstream quantization forms
    2. Weight Distribution Regularization Loss (WDR-Loss) and Group Progressive Guidance Loss (GPG-Loss) → bridge the inconsistency in the distributions of weights and output logits across the elastic quantization space
    3. Conditional Quantization-Aware Accuracy Predictor (CQAP) as an estimator to quickly search for mixed-precision quantized NNs in the supernet

1. Introduction

  • Quantization incurs added noise due to reduced precision
    • PTQ: Only requires access to a small calibration dataset → effectiveness declines at low bit-widths (≤ 4 bits)
    • QAT: Simulates the quantization operation during training or fine-tuning so the network can adapt to quantization noise → better than PTQ
  • Problem: Different hardware platforms support different forms of quantization, e.g. NVIDIA’s GPU (channel-wise symmetric quantization in the TensorRT inference engine) and Qualcomm’s DSP (per-tensor asymmetric quantization in the SNPE inference engine) → the model must be re-optimized for each target, which makes quantized model deployment inefficient
  • Proposal
    1. Elastic Quantization Space: A unified quantization formula that integrates the various quantization forms and implements elastic switching of 1) quantization bit-width, 2) quantization granularity, and 3) quantization symmetry through parameter splitting
    2. WDR-Loss, GPG-Loss: Unlike NAS, EQ-Net is fully parameter-shared, so there is no additional weight-parameter optimization space arising from network structure differences. → Problem of negative gradient suppression (due to different quantization forms; samples with inconsistent predictions between quantization configurations) → WDR-Loss and GPG-Loss form an efficient training strategy for EQ-Net
    3. CQAP combined with a genetic algorithm: Specify any form in the elastic quantization space and quickly obtain a quantized model with the corresponding accuracy. (Supports both uniform quantization and MPQ)
  • One-Shot Network Architecture Search
  • Multi-Bit Quantization of Neural Networks

3. Approach

3.1 Quantization Preliminaries

  • Quantization / Dequantization operation
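
The notes do not reproduce the paper's formula here, so the following is only a minimal sketch of standard uniform affine quantization, which is assumed (not confirmed) to match the paper's formulation. The function names `quantize` and `dequantize` and the `signed` flag are illustrative.

```python
def quantize(x, step, zero_point, bits, signed=True):
    """Map a real value to an integer code: round(x / step) + zero_point, then clamp."""
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    # Note: Python's round() rounds halves to the nearest even integer.
    q = round(x / step) + zero_point
    return max(qmin, min(qmax, q))  # saturate to the representable range


def dequantize(q, step, zero_point):
    """Map an integer code back to the real axis."""
    return (q - zero_point) * step


# A value inside the range survives the round trip; a value outside is saturated.
inside = dequantize(quantize(1.0, 0.25, 0, bits=4), 0.25, 0)
saturated = dequantize(quantize(100.0, 0.25, 0, bits=4), 0.25, 0)
```

The gap between `x` and its dequantized value is the quantization noise that QAT trains the network to tolerate.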

3.2 Elastic Quantization Space Design

3.2.1 Elastic Quantization Bit-Width

  • Separate and store the quantization step size and zero-point required for different quantization bit-widths.
  • Higher bit-widths: Small quantization step size → Large saturation truncation range
  • Lower bit-widths: Larger quantization step size → Small saturation truncation range → Alleviates training pressure for hyperparameters → Poses challenges to the robustness of shared weights.
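
Storing separate quantization parameters per bit-width can be sketched as a small lookup keyed by bit-width. The class name `ElasticBitWidthStore` and the max-abs initialization are assumptions made for illustration; in the paper these parameters are optimized during training rather than fixed by this rule.

```python
from dataclasses import dataclass


@dataclass
class QParams:
    step: float
    zero_point: int


class ElasticBitWidthStore:
    """Keeps one (step, zero_point) pair per bit-width for a single shared weight tensor."""

    def __init__(self, weights, bit_widths=(2, 4, 8)):
        max_abs = max(abs(w) for w in weights)
        self.params = {}
        for bits in bit_widths:
            qmax = 2 ** (bits - 1) - 1
            # Assumed initialization: spread the observed range over the signed grid.
            self.params[bits] = QParams(step=max_abs / qmax, zero_point=0)

    def get(self, bits):
        """Return the parameters used when the supernet is switched to `bits`."""
        return self.params[bits]


store = ElasticBitWidthStore([-1.0, 0.5, 0.25])
steps = [store.get(b).step for b in (2, 4, 8)]  # step shrinks as bit-width grows
```

The weights themselves stay shared; only the small per-bit-width parameter set is switched.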

3.2.2 Elastic Quantization Symmetry

  • Symmetric quantization: Zero-point is fixed to 0 (z = 0)
  • Asymmetric quantization: Zero-point is adjustable to different ranges (z ∈ Z).
  • The switching between the two is achieved by dynamically modifying the value of the zero point.
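
A minimal sketch of that switch, assuming the common convention that symmetric quantization uses a signed integer grid and asymmetric quantization an unsigned one. The function `fake_quantize` and its parameter names are hypothetical.

```python
def fake_quantize(x, step, learned_zero_point, bits, symmetric):
    """Quantize then dequantize x; the symmetry mode only changes the zero point and grid."""
    if symmetric:
        z = 0  # zero point pinned to 0
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        z = learned_zero_point  # zero point free to shift the grid
        qmin, qmax = 0, 2 ** bits - 1
    q = max(qmin, min(qmax, round(x / step) + z))
    return (q - z) * step


# Same value, step and bit-width; only the symmetry flag differs.
sym = fake_quantize(3.0, 0.25, learned_zero_point=2, bits=4, symmetric=True)
asym = fake_quantize(3.0, 0.25, learned_zero_point=2, bits=4, symmetric=False)
```

With these numbers the symmetric grid saturates while the shifted asymmetric grid still covers the value, which is why the two modes stress the shared weights differently.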

3.2.3 Elastic Quantization Granularity

  • Supports both per-tensor (one set of step size and zero-point for a tensor in one layer) and per-channel (quantizes each weight kernel independently) quantization.
  • per-tensor > per-channel
  • Step size and zero point for per-tensor can be obtained heuristically from per-channel, or can be learned as independent parameters
  • EQG (elastic quantization granularity) applies to weights only; activations are always quantized per-tensor
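
The notes do not say which heuristic derives the per-tensor step size from the per-channel ones. The sketch below assumes the simplest choice, taking the largest per-channel step so that every channel's range is covered; both function names are hypothetical.

```python
def per_channel_steps(weight, bits):
    """One step size per output channel (each row of `weight` is one kernel)."""
    qmax = 2 ** (bits - 1) - 1
    return [max(abs(w) for w in channel) / qmax for channel in weight]


def per_tensor_step_from_channels(channel_steps):
    """Assumed heuristic: the largest channel step covers the whole tensor."""
    return max(channel_steps)


weight = [[0.5, -1.0], [0.25, 0.125]]  # two channels with different ranges
channel_steps = per_channel_steps(weight, bits=2)
tensor_step = per_tensor_step_from_channels(channel_steps)
```

Here the second channel gets a finer step under per-channel quantization than it would under the shared per-tensor step.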

3.3 Elastic Quantization Network Modeling

  • Elastic quantization space of a model: $\varepsilon = \{\varepsilon_{b}, \varepsilon_{g}, \varepsilon_{s}\}$
  • Training objective: Minimize the task loss over all elastic quantization spaces by optimizing the weights, step sizes, and zero points
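
One way to write this objective, as a sketch only: the symbols $W$ (shared weights), $S$ and $Z$ (the stored step sizes and zero points), $e$ (one configuration), $Q$ (the quantizer) and $\mathcal{L}_{\text{task}}$ are notation assumed here and may differ from the paper's.

$$
\min_{W, S, Z} \sum_{e \in \varepsilon_{b} \times \varepsilon_{g} \times \varepsilon_{s}} \mathcal{L}_{\text{task}}\big(Q(W; S_e, Z_e)\big)
$$

Here $e$ picks one bit-width, one granularity, and one symmetry mode, and $S_e$, $Z_e$ are the step sizes and zero points stored for that choice.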

3.4 Elastic Quantization Training

Weight Distribution Regularization

  • DNN weights often conform to Gaussian or Laplace distributions
  • Skewness regularization: Reducing skewness → enhance robustness of weights in elastic symmetry
  • Kurtosis regularization: Reducing sharpness → enhance robustness of weights in elastic bit-width
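
Skewness and kurtosis are the third and fourth standardized moments, so a regularizer of this kind can be sketched as below. The squared-error form, the name `wdr_penalty`, and the targets (skewness 0 and kurtosis 1.8, the values of a uniform distribution) are assumptions for illustration and not the paper's exact WDR-Loss.

```python
def standardized_moment(values, order):
    """k-th standardized moment: mean(((v - mean) / std) ** k)."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / std) ** order for v in values) / n


def skewness(values):
    return standardized_moment(values, 3)


def kurtosis(values):
    return standardized_moment(values, 4)


def wdr_penalty(weights, target_skew=0.0, target_kurt=1.8):
    """Assumed form: squared distance of both moments from their targets."""
    return (skewness(weights) - target_skew) ** 2 + (kurtosis(weights) - target_kurt) ** 2


symmetric_weights = [-1.0, 0.0, 1.0]
skewed_weights = [0.0, 0.0, 0.0, 1.0]
penalties = (wdr_penalty(symmetric_weights), wdr_penalty(skewed_weights))
```

A long-tailed or lopsided weight tensor yields a larger penalty, so adding this term to the task loss would push the weights toward a flatter, more symmetric shape.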

Group Progressive Guidance

  • There are many subnets with varying quantization configurations, resulting in many soft labels that the network can learn from.
  • KL indicates the KL divergence loss and CE indicates the cross-entropy loss.
  • In the case of $\mathcal{L}_R$, the loss function acts as a regularizer with two parts:
    • $\mathcal{L}_{\text{KL}}(y_R, y_H)$ - Matches the random subnet’s output distribution to the output distribution of the highest-bit-width subnet.
    • $\mathcal{L}_{\text{CE}}(y_R, y)$ - The loss between the random subnet’s output and the original model’s output.
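
The two terms for the random subnet can be sketched as follows. This assumes an unweighted sum, the usual distillation direction for the KL term (guide distribution first), and a target passed as a probability vector so that it can be either a one-hot label or another model's output. All function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(guide_probs, student_probs):
    """KL(guide || student); terms with zero guide probability contribute nothing."""
    return sum(g * math.log(g / s) for g, s in zip(guide_probs, student_probs) if g > 0)


def cross_entropy(pred_probs, target_probs):
    return -sum(t * math.log(p) for t, p in zip(target_probs, pred_probs) if t > 0)


def random_subnet_loss(logits_r, logits_h, target_probs):
    """KL term against the highest bit-width subnet plus CE term against the target."""
    p_r, p_h = softmax(logits_r), softmax(logits_h)
    return kl_divergence(p_h, p_r) + cross_entropy(p_r, target_probs)


agree = random_subnet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
disagree = random_subnet_loss([0.0, 0.0], [2.0, 0.0], [1.0, 0.0])
```

When the two subnets agree, the KL term vanishes and only the cross-entropy remains; any disagreement with the high bit-width subnet adds a positive penalty.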

4.3 Ablation Studies

Weight Distribution Regularization

  • The impact of skewness and sharpness of weight distribution on fixed bit-width quantization is relatively insignificant.
  • However, for elastic quantization with high robustness demands, such phenomena can significantly affect the overall performance, particularly for low bit widths.
  • Simultaneously applying kurtosis and skewness regularization yields a weight distribution that is closer to uniform, eliminating skewness and sharpness.

Effectiveness of Group Progressive Guidance (GPG)

  • GPG utilizes soft labels from the high bit-width subnet to progressively guide the low bit-width subnet, creating more coherence between the output of the high and low bit-width networks.
  • To demonstrate the training efficiency of the whole quantization supernet, we use the average precision over the 2-, 4-, and 8-bit widths; the average precision of our method is always the best.
  • Figure: convergence curves of ResNet20 trained with three different methods (hard label, label smoothing, and our GPG method).

5. Conclusion

  • A one-shot weight-sharing quantization supernet.
  • EQ-Net can support subnets with both uniform and mixed-precision quantization without retraining.
  • Weight Distribution Regularization (WDR) and Group Progressive Guidance (GPG) techniques to optimize EQ-Net.