Paper Review
Table of contents
- Overcoming Oscillations in Quantization-Aware Training
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
- Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing
- Improving Low-Precision Network Quantization via Bin Regularization
- Low-bit Quantization Needs Good Distribution
- Post training 4-bit quantization of convolutional networks for rapid-deployment
- Data-Free Quantization Through Weight Equalization and Bias Correction
- Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- OnlineHD: Robust, Efficient, and Single-Pass Online Learning Using Hyperdimensional System
- Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review
- Classification using Hyperdimensional Computing: A Review
- LoRA: Low-Rank Adaptation of Large Language Models
- On-Device Training Under 256KB Memory
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Robust Quantization: One Model to Rule Them All
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Learned Step Size Quantization
- EQ-Net: Elastic Quantization Neural Networks
- PointPillars: Fast Encoders for Object Detection from Point Clouds
- A Survey of Quantization Methods for Efficient Neural Network Inference
- Any-Precision Deep Neural Networks
- Neural Architecture Search: A Survey
- Learning Efficient Convolutional Networks through Network Slimming
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding