🐙 GitHub Detail

KastanDay/video-pretrained-transformer

By KastanDay

Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).

GitHub | Jupyter Notebook | MIT License | Updated 13 Feb 2026

Live Snapshot

Stars: 54
Forks: 11
License: MIT License
Type: Jupyter Notebook

📘

About this open-source project

Live information fetched from GitHub.

Default Branch: main
Open Issues: 0
Watchers: 54