🐙 GitHub Detail

KastanDay/video-pretrained-transformer

By KastanDay

Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scratch on YouTube (YT-1B dataset).

GitHub · Jupyter Notebook · MIT License · Updated 13 Feb 2026
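The description mentions combining embeddings from several models before they reach a backbone model. One common way to do that, shown here only as a minimal sketch and not as this repository's actual code, is to project each source's embedding to a shared width and stack the results as a short input sequence. The embedding widths, the `SHARED_DIM` value, the helper names, and the random projection matrices (stand-ins for learned linear layers) are all assumptions made for illustration.

```python
import random

# Hypothetical embedding widths, chosen only for illustration.
# They are NOT the real output sizes of these models.
MODALITY_DIMS = {"clip": 8, "flan_t5": 7, "scene_graph": 4, "whisper": 6}
SHARED_DIM = 5  # hypothetical width expected by the backbone model


def make_projection(in_dim, out_dim, seed):
    """Build a fixed random in_dim x out_dim matrix (stand-in for a learned layer)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(out_dim)] for _ in range(in_dim)]


def project(vec, matrix):
    """Multiply a vector of length in_dim by an in_dim x out_dim matrix."""
    out_dim = len(matrix[0])
    return [sum(vec[i] * matrix[i][j] for i in range(len(vec))) for j in range(out_dim)]


def combine_embeddings(embeddings, projections):
    """Project every source to SHARED_DIM and stack them in a fixed (sorted) order."""
    return [project(embeddings[name], projections[name]) for name in sorted(embeddings)]


# One projection per source, seeded so the example is reproducible.
projections = {
    name: make_projection(dim, SHARED_DIM, seed=i)
    for i, (name, dim) in enumerate(sorted(MODALITY_DIMS.items()))
}

# Placeholder embeddings; a real pipeline would take these from the encoders.
embeddings = {name: [1.0] * dim for name, dim in MODALITY_DIMS.items()}

sequence = combine_embeddings(embeddings, projections)
print(len(sequence), len(sequence[0]))  # prints "4 5"
```

Projecting to a shared width lets sources with different embedding sizes sit in one sequence. Whether the project stacks, concatenates, or fuses them some other way is not stated in this listing.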

About this open-source project

Default Branch main

Project Details

Source GitHub

Owner KastanDay

License MIT License

Updated 13 Feb 2026
