## Notes on Quantization Options
### Quantization Type (`bnb_4bit_quant_type`)
- `fp4`: standard floating-point 4-bit quantization.
- `nf4`: NormalFloat 4-bit quantization, introduced by QLoRA and designed for normally distributed weights; it is usually the recommended choice.
### Double Quantization (`bnb_4bit_use_double_quant`)
- `True`: applies a second round of quantization to the quantization constants themselves, saving roughly 0.4 additional bits per parameter.
- `False`: uses standard single-pass quantization only.
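In the Transformers library, both options above are set on a `BitsAndBytesConfig`. A minimal sketch (the compute dtype here is an illustrative assumption, not something this app exposes):

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative 4-bit quantization config; the dtype choice is an assumption.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # or "fp4"
    bnb_4bit_use_double_quant=True,   # second pass over the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
```

This config fragment is then passed to `from_pretrained(..., quantization_config=quant_config)` when loading a model.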
### Model Saving Options
- Model Name: Custom name for your quantized model on the Hub. If left empty, a default name will be generated.
- Make model public: If checked, anyone can access your quantized model. If unchecked, only you can access it.
## How It Works
This app uses the bitsandbytes library to perform 4-bit quantization on Transformers models. The process:
- Downloads the original model
- Applies the selected quantization settings
- Uploads the quantized model to your Hugging Face account
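The steps above map onto the Transformers API roughly as follows. This is a sketch, not the app's exact code: the model ID and `your-username/...` repo name are placeholders, and saving 4-bit weights requires a recent transformers/bitsandbytes version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Steps 1-2: download the original model; the selected quantization
# settings are applied while the weights are loaded.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=quant_config
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# Step 3: upload the quantized model to your Hugging Face account.
model.push_to_hub("your-username/opt-350m-4bit", private=True)
tokenizer.push_to_hub("your-username/opt-350m-4bit", private=True)
```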
## Memory Usage
4-bit quantization reduces the memory needed for model weights by roughly 75% compared to FP16, since each weight is stored in 4 bits instead of 16 (plus a small overhead for quantization constants).
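A back-of-envelope check of that figure, using a hypothetical 7B-parameter model as an example:

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights in GiB."""
    return num_params * bits_per_param / 8 / (1024 ** 3)

# Hypothetical 7B-parameter model, ignoring quantization-constant overhead.
params = 7_000_000_000
fp16 = weight_memory_gb(params, 16)  # about 13.0 GiB
nf4 = weight_memory_gb(params, 4)    # about 3.3 GiB
print(f"FP16: {fp16:.1f} GiB, 4-bit: {nf4:.1f} GiB, saving {1 - nf4 / fp16:.0%}")
```

The 75% saving follows directly from the bit widths: 4 / 16 = 25% of the original size.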