What You Get
Run GGUF models locally with llama.cpp. Automatic model discovery from HuggingFace Hub, quantization selection, and inference optimization.
Features
- HuggingFace Hub model search and download
- Automatic quantization level selection (Q4_K_M, Q5_K_M, Q8_0)
- GPU layer offloading optimization
- OpenAI-compatible local server
- Context length extension (RoPE scaling)
- Batch processing for embedding generation
Requirements
- Hermes Agent or OpenClaw v2.0+
- 8GB+ RAM (CPU) or 6GB+ VRAM (GPU)
Installation
hermes skills install llama-cpp-inference





Reviews
There are no reviews yet.