## Notes on Quantization Options
### Quantization Type (`bnb_4bit_quant_type`)
- `fp4`: standard floating-point 4-bit quantization.
- `nf4`: NormalFloat 4-bit quantization, introduced by QLoRA and designed for normally distributed weights; it is usually the recommended choice.
### Double Quantization (`bnb_4bit_use_double_quant`)
- `True`: applies a second round of quantization to the quantization constants themselves, saving roughly 0.4 additional bits per parameter.
- `False`: uses standard single-pass quantization only.
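In the Transformers library, both options above are set on a `BitsAndBytesConfig`. A minimal sketch (the compute dtype here is an illustrative assumption, not something this app exposes):

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit quantization config; the dtype choice is an assumption.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # or "fp4"
    bnb_4bit_use_double_quant=True,   # second pass over the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

This config fragment is then passed to `from_pretrained(..., quantization_config=quant_config)` when loading a model.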
### Model Saving Options
- Model Name: Custom name for your quantized model on the Hub. If left empty, a default name will be generated.
- Make model public: If checked, anyone can access your quantized model. If unchecked, only you can access it.
## How It Works
This app uses the bitsandbytes library to perform 4-bit quantization on Transformers models. The process:
- Downloads the original model
- Applies the selected quantization settings
- Uploads the quantized model to your Hugging Face account
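The steps above map onto the Transformers API roughly as follows. This is a sketch, not the app's exact code: the model ID and `your-username/...` repo name are placeholders, and saving 4-bit weights requires a recent transformers/bitsandbytes version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Steps 1-2: download the original model; the selected quantization
# settings are applied while the weights are loaded.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=quant_config
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Step 3: upload the quantized model to your Hugging Face account.
model.push_to_hub("your-username/opt-350m-4bit", private=True)
tokenizer.push_to_hub("your-username/opt-350m-4bit", private=True)
```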
## Memory Usage
4-bit quantization reduces the memory needed for model weights by roughly 75% compared to FP16, since each weight is stored in 4 bits instead of 16 (plus a small overhead for quantization constants).
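A back-of-envelope check of that figure, using a hypothetical 7B-parameter model as an example:

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights in GiB."""
    return num_params * bits_per_param / 8 / (1024 ** 3)

# Hypothetical 7B-parameter model, ignoring quantization-constant overhead.
params = 7_000_000_000
fp16 = weight_memory_gb(params, 16)  # about 13.0 GiB
nf4 = weight_memory_gb(params, 4)    # about 3.3 GiB
print(f"FP16: {fp16:.1f} GiB, 4-bit: {nf4:.1f} GiB, saving {1 - nf4 / fp16:.0%}")
```

The 75% saving follows directly from the bit widths: 4 / 16 = 25% of the original size.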