💎 Gemma3 — Desi Chatbot (GGUF / HF fallback)

Gemma3 (quantized GGUF): runs local inference if available, otherwise falls back to the Hugging Face Inference API.
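
The local-first, remote-fallback choice described above could be sketched roughly as follows. This is only an illustration, not the app's actual code: the function name `choose_runtime`, the returned labels, and the assumption that local inference goes through the `llama_cpp` package are all hypothetical.

```python
import importlib.util
import os
from typing import Optional


def choose_runtime(gguf_path: Optional[str], hf_model: Optional[str]) -> str:
    """Return "local" or "hf-inference" (both labels are hypothetical)."""
    # Assumption: local inference needs the GGUF file on disk AND an
    # importable llama_cpp package; neither detail is confirmed by the page.
    has_file = bool(gguf_path) and os.path.isfile(gguf_path)
    has_llama = importlib.util.find_spec("llama_cpp") is not None
    if has_file and has_llama:
        return "local"
    # Otherwise fall back to a remote model, if one is configured.
    if hf_model:
        return "hf-inference"
    raise RuntimeError("No local GGUF found and no HF inference model set")
```

With a missing GGUF file and a configured remote model, this sketch selects the remote path; with neither available it raises instead of failing silently.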

Runtime: Hugging Face Inference API

  • MODEL_REPO: google/gemma-3-4b-it-qat-q4_0-gguf
  • HF model (inference): <not set>

Tips: Reduce max tokens if you hit out-of-memory (OOM) errors. For Spaces, upload a smaller Q4-quantized GGUF.
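
One way to automate the max-tokens tip is to retry with a smaller budget after a memory failure. The sketch below is a minimal illustration under stated assumptions: `generate_with_backoff` and its parameters are hypothetical names, and it assumes the backend surfaces OOM as a Python `MemoryError`, which a native runtime may not do (it may simply crash the process).

```python
from typing import Callable


def generate_with_backoff(
    generate: Callable[[str, int], str],
    prompt: str,
    max_tokens: int = 512,
    min_tokens: int = 16,
) -> str:
    """Call generate(prompt, max_tokens), halving the budget after each MemoryError."""
    budget = max_tokens
    while True:
        try:
            return generate(prompt, budget)
        except MemoryError:
            # Assumption: OOM shows up as MemoryError; re-raise once at the floor.
            if budget <= min_tokens:
                raise
            budget = max(min_tokens, budget // 2)
```

Here `generate` stands in for whatever function actually produces text; the wrapper only controls the token budget passed to it.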