🤗 BitsAndBytes Quantizer: Create your own BNB Quants! ✨




โš™๏ธ Model Quantization Type Settings

Dropdown

The quantization data type in the bnb.nn.Linear4Bit layers

Dropdown

The compute type for the model

Dropdown

The storage type for the model
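These three settings map onto fields of `transformers.BitsAndBytesConfig`. A minimal sketch of how such a config might be built (the field values here are illustrative defaults, not necessarily what the Space uses):

```python
import torch
from transformers import BitsAndBytesConfig

# Hypothetical mapping of the three dropdowns onto BitsAndBytesConfig
# fields; the actual Space code may wire them up differently.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # quantization data type: "fp4" or "nf4"
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute dtype for forward passes
    bnb_4bit_quant_storage=torch.uint8,      # storage dtype for the packed weights
)
```

The config is then passed to `from_pretrained(..., quantization_config=bnb_config)` when loading the model.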

🔄 Double Quantization Settings

  • Use Double Quant: whether to apply a second quantization pass to the quantization constants themselves.

💾 Saving Settings

  • If checked, the model will be publicly accessible.
  • If checked, the model will be uploaded to the bnb-community organization.
    (Grant this Space access to the bnb-community organization; if it does not have access yet, revoke the token and log in again.)

🔗 Quantized Model Info

๐Ÿ“ Notes on Quantization Options

Quantization Type (bnb_4bit_quant_type)

  • fp4: standard floating-point 4-bit quantization.
  • nf4: NormalFloat 4-bit quantization, designed for normally distributed weights; generally the recommended choice.

Double Quantization

  • True: Applies a second round of quantization to the quantization constants, further reducing memory usage.
  • False: Uses standard quantization only.
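The memory saved by double quantization can be estimated with a back-of-the-envelope calculation, assuming the QLoRA-style block sizes (64 weights per first-level quantization block, 256 first-level constants per second-level block):

```python
# Estimate the per-weight overhead of the quantization constants,
# with and without double quantization. All figures in bits per weight.
BLOCK1 = 64    # weights per first-level quantization block (assumed)
BLOCK2 = 256   # first-level constants per second-level block (assumed)

# Without double quant: one fp32 absmax constant per 64-weight block.
overhead_single = 32 / BLOCK1                          # 0.5 bits/weight

# With double quant: constants stored in 8 bits, plus one fp32
# second-level constant per 256 first-level constants.
overhead_double = 8 / BLOCK1 + 32 / (BLOCK1 * BLOCK2)  # ≈ 0.127 bits/weight

saving = overhead_single - overhead_double
print(f"{saving:.3f} bits saved per weight")  # → 0.373 bits saved per weight
```

Roughly 0.37 bits per parameter; small per weight, but it adds up across billions of parameters.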

Model Saving Options

  • Model Name: Custom name for your quantized model on the Hub. If left empty, a default name will be generated.
  • Make model public: If checked, anyone can access your quantized model. If unchecked, only you can access it.

๐Ÿ” How It Works

This app uses the BitsAndBytes library to perform 4-bit quantization on Transformer models. The process:

  1. Downloads the original model
  2. Applies the selected quantization settings
  3. Uploads the quantized model to your Hugging Face account
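The three steps above can be sketched with the standard Transformers API. The model and repo names below are placeholders, and pushing to the Hub requires being logged in (e.g. via `huggingface-cli login`):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # placeholder; any causal-LM checkpoint works

# Steps 1 + 2: download the model and apply the quantization
# settings on load.
quantized = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
)

# Step 3: upload the quantized weights to the Hub.
quantized.push_to_hub("your-username/opt-125m-bnb-4bit")
```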

📊 Memory Usage

4-bit quantization can reduce model size by up to ≈75% compared to FP16 for large models.
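The ≈75% figure follows directly from the bit widths (4 bits vs. 16 bits per weight); a quick estimate for a hypothetical 7B-parameter model, counting weights only and ignoring the layers that typically stay in higher precision:

```python
# Rough weights-only memory estimate for a 7B-parameter model.
params = 7e9
fp16_gib = params * 2 / 1024**3    # FP16: 2 bytes per weight
int4_gib = params * 0.5 / 1024**3  # 4-bit: 0.5 bytes per weight

reduction = 1 - int4_gib / fp16_gib
print(f"FP16: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB, saved {reduction:.0%}")
# → FP16: 13.0 GiB, 4-bit: 3.3 GiB, saved 75%
```

In practice the saving is slightly below 75%, since embeddings and other non-quantized layers remain in their original precision and the quantization constants add a small overhead.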