Paper Review
Now updated on A2J's Notion!
Table of contents
- BSQ: Exploring Bit-Level Sparsity for Mixed Precision Neural Network Quantization
- R2 Loss: Range Restriction Loss for Model Compression and Quantization
- BigNAS: Neural Architecture Search with Big Single-Stage Models
- PTMQ: Post-training Multi-Bit Quantization of Neural Networks
- MultiQuant: Training Once for Multi-Bit Quantization of Neural Networks
- CSQ: Growing Mixed-Precision Quantization Scheme with Bi-level Continuous Sparsification
- Resource-Efficient Transformer Pruning for Finetuning of Large Models
- EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning
- NIPQ: Noise proxy-based Integrated Pseudo-Quantization
- EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
- NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
- Overcoming Oscillations in Quantization-Aware Training
- Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
- Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing
- Improving Low-Precision Network Quantization via Bin Regularization
- Low-bit Quantization Needs Good Distribution
- Post training 4-bit quantization of convolutional networks for rapid-deployment
- Data-Free Quantization Through Weight Equalization and Bias Correction
- Data-Free Quantization with Accurate Activation Clipping and Adaptive Batch Normalization
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- OnlineHD: Robust, Efficient, and Single-Pass Online Learning Using Hyperdimensional System
- Efficient Acceleration of Deep Learning Inference on Resource-Constrained Edge Devices: A Review
- Classification using Hyperdimensional Computing: A Review
- LoRA: Low-Rank Adaptation of Large Language Models
- On-Device Training Under 256KB Memory
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Robust Quantization: One Model to Rule Them All
- DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
- Learned Step Size Quantization
- EQ-Net: Elastic Quantization Neural Networks
- PointPillars: Fast Encoders for Object Detection from Point Clouds
- A Survey of Quantization Methods for Efficient Neural Network Inference
- Any-Precision Deep Neural Networks
- Neural Architecture Search: A Survey
- Learning Efficient Convolutional Networks through Network Slimming
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Attention Is All You Need