llama.cpp Local Inference

$9.99

Run quantized models locally with HuggingFace Hub discovery and GPU optimization.

SKU: mlops-inference-llama-cpp Category:

What You Get

Run GGUF models locally with llama.cpp. Automatic model discovery from HuggingFace Hub, quantization selection, and inference optimization.

Features

  • HuggingFace Hub model search and download
  • Automatic quantization level selection (Q4_K_M, Q5_K_M, Q8_0)
  • GPU layer offloading optimization
  • OpenAI-compatible local server
  • Context length extension (RoPE scaling)
  • Batch processing for embedding generation

Requirements

  • Hermes Agent or OpenClaw v2.0+
  • 8GB+ RAM (CPU) or 6GB+ VRAM (GPU)

Installation

hermes skills install llama-cpp-inference
Version3.1.0
CategoryMLOps
Installs2.1k+
UpdatedMay 2026
LicenseSingle Site
Compatibility

Hermes Agentllama.cppHuggingFace

Reviews

There are no reviews yet.

Be the first to review “llama.cpp Local Inference”

Your email address will not be published. Required fields are marked *