🐙 GitHub Detail

psmarter/mini-infer

By psmarter

LLM inference engine from scratch: paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, and OpenAI-compatible serving

GitHub · Python · MIT License · Updated 04 Jun 2026

Live Snapshot

Stars

259

🍴

Forks

17

📄

License

MIT License

🧩

Type

Python

📘

About this open-source project

Live information fetched from GitHub.

🌿

Default Branch

main

🐞

Open Issues

1

👀

Watchers

259