TwelveLabs is a San Francisco-based AI company that builds multimodal video understanding technology for enterprises and developers. Its platform enables organisations to search, analyse, and extract structured insights from video content at scale, processing petabytes of data across cloud, private cloud, and on-premise environments. The company maintains a significant presence in Seoul alongside its global operations.
At the core of TwelveLabs' offering are two foundation models: Marengo, a multimodal encoder, and Pegasus, a video-language foundation model. Together they power a platform that processes speech, text, audio, and visual information within video, enabling capabilities such as semantic search and automated analysis across large video libraries. The platform is SOC 2 Type 2 certified and serves over 30,000 developers and companies worldwide.
The company has raised $107 million in funding from investors including NVIDIA, NEA, Radical Ventures, Index Ventures, Snowflake, and Databricks. Customers span the media and entertainment, sports, and enterprise verticals, among them the NFL, which uses the platform to mine a century of game footage.
TwelveLabs occupies a technically specialised niche within AI infrastructure, focusing specifically on video as a modality rather than text or images alone. Its models are designed to integrate with existing enterprise data stacks and are accessible to developers through an API, positioning the company at the intersection of applied AI research and large-scale data engineering.