psmarter/mini-infer
By psmarter
LLM inference engine from scratch — paged KV cache, continuous batching, chunked prefill, prefix caching, speculative decoding, CUDA graph, tensor parallelism, OpenAI-compatible serving
Live Snapshot
Stars: 259
Forks: 17
License: MIT License
Type: Python
About this open-source project
Default Branch: main
Open Issues: 1
Watchers: 259