🐙 GitHub Detail
KastanDay/video-pretrained-transformer
By KastanDay
Multi-model video-to-text that combines embeddings from Flan-T5, CLIP, Whisper, and SceneGraph. The "backbone LLM" is pre-trained from scratch on YouTube data (the YT-1B dataset).
Live Snapshot
Stars: 54
Forks: 11
License: MIT License
Type: Jupyter Notebook
About this open-source project
Live information fetched from GitHub.
Default Branch: main
Open Issues: 0
Watchers: 54