Apple researchers have developed an adapted version of the SlowFast-LLaVA model that beats larger models at long-form video analysis and understanding. Here’s what that means.

The nerdy bits

Very basically, when an LLM is trained to also understand video, it learns to split videos into frames, apply computer vision to extract visual features, analyze how those features change over time, …
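
To make the steps named above a little more concrete (split a video into frames, extract visual features, look at how they change over time), here is a toy sketch in Python. It is not Apple's method or the SlowFast-LLaVA code: the function names, the evenly spaced sampling, and the use of mean brightness as a stand-in for a real vision encoder are all assumptions made purely for illustration.

```python
from typing import List, Sequence

Frame = List[List[float]]  # toy grayscale frame: rows of pixel intensities


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick evenly spaced frame indices (hypothetical sampling strategy)."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    # Take the frame in the middle of each equal-length segment.
    return [int(step * i + step / 2) for i in range(num_samples)]


def mean_intensity(frame: Frame) -> float:
    """Stand-in for a vision encoder: reduces a frame to a single number."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def temporal_deltas(features: Sequence[float]) -> List[float]:
    """Change in the per-frame feature between consecutive sampled frames."""
    return [b - a for a, b in zip(features, features[1:])]


# A made-up 8-frame "video" of 2x2 frames that gets steadily brighter.
video = [[[float(t)] * 2 for _ in range(2)] for t in range(8)]

indices = sample_frame_indices(len(video), 4)           # which frames to keep
features = [mean_intensity(video[i]) for i in indices]  # one feature per frame
deltas = temporal_deltas(features)                      # how features change
```

A real video model would replace `mean_intensity` with a learned encoder that produces many tokens per frame, which is why the number of sampled frames matters so much for long videos.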