psmarter/mini-infer

By psmarter

LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving

GitHub Python MIT License Updated 04 Jun 2026

Live Snapshot

Stars: 259

Forks: (not shown)

License: MIT License

Type: Python

📘

About this open-source project

Live information fetched from GitHub.

Default Branch: main

Open Issues: (not shown)

Watchers: 259

Project Details

Source: GitHub

Owner: psmarter

License: MIT License

Updated: 04 Jun 2026
