⏱️ Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding
Upload a video and ask any question! Tempo dynamically compresses visual tokens based on your query to achieve SOTA performance. Project Page | GitHub | Paper | @Junjie Fei
⏳ Slow preprocessing? Try Examples 4 & 5 below, decrease Max Sampled Frames in Advanced Settings, or check our GitHub for full-speed local deployment.
💡 Try an Example
Examples