
Zhiyuan launches BGE-VL model: new breakthrough in multimodal retrieval

Author: LoRA | Time: 07 Mar 2025

In the field of multimodal artificial intelligence, the Zhiyuan Research Institute (BAAI), in cooperation with several universities, has launched BGE-VL, a new multimodal embedding model that marks a major step forward in multimodal retrieval technology. The BGE series has been widely acclaimed since its debut, and BGE-VL further enriches that ecosystem. The model delivers strong results on key tasks such as image-text retrieval and composed image retrieval.

BGE-VL's success rests on the MegaPairs data synthesis technique behind it. This method markedly improves data scalability and quality by mining existing large-scale image-text corpora: MegaPairs can generate diverse training data at very low cost, yielding a dataset of more than 26 million samples that provides a rich foundation for training multimodal retrieval models. This technique has enabled BGE-VL to achieve leading results on several mainstream multimodal retrieval benchmarks.
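The article does not spell out the exact format of a synthesized sample, but to make the idea concrete, here is a rough, hypothetical sketch of what one composed-retrieval training sample might look like: a query image plus a relation or editing instruction, paired with a mined target image. All field names and values below are illustrative assumptions, not taken from MegaPairs itself.

```python
from dataclasses import dataclass

@dataclass
class ComposedRetrievalSample:
    """Hypothetical shape of one synthesized training sample:
    a (query image, instruction) pair whose relevant answer is a
    target image mined from a large image-text corpus."""
    query_image: str   # path or URL of the query image
    instruction: str   # text describing the desired change or relation
    target_image: str  # path or URL of the positive (relevant) image

# Illustrative sample (values are made up, not from the actual dataset)
sample = ComposedRetrievalSample(
    query_image="query/red_car.jpg",
    instruction="the same car, but photographed at night",
    target_image="positive/red_car_night.jpg",
)
```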


Today, as multimodal retrieval draws growing attention, users' information needs are becoming increasingly diverse. Earlier retrieval models were mostly trained on single image-text pairs and could not handle complex composed inputs effectively. BGE-VL overcomes this limitation: trained on MegaPairs data, the model can understand and process multimodal queries more comprehensively.
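As a minimal sketch of what a "composed input" means in practice: the query combines an image with a text instruction, both are fused into a single embedding, and candidate images are ranked by cosine similarity. The `embed_*` functions below are random-vector placeholders standing in for the model's real encoders; they are not BGE-VL's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 512  # assumed embedding dimension, for illustration only

# Placeholder encoders: stand-ins for the model's real fused and image encoders.
def embed_composed_query(image_path: str, instruction: str) -> np.ndarray:
    return rng.standard_normal(DIM)

def embed_image(path: str) -> np.ndarray:
    return rng.standard_normal(DIM)

# A composed query: an image plus a textual modification instruction.
query_vec = embed_composed_query("query/red_car.jpg",
                                 "the same car, but at night")

candidates = ["a.jpg", "b.jpg", "c.jpg"]
cand_mat = np.stack([embed_image(p) for p in candidates])

# Rank candidates by cosine similarity to the fused query embedding.
q = query_vec / np.linalg.norm(query_vec)
c = cand_mat / np.linalg.norm(cand_mat, axis=1, keepdims=True)
scores = c @ q
ranking = [candidates[i] for i in np.argsort(-scores)]
print(ranking)
```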

In evaluations across multiple tasks, the Zhiyuan team found that BGE-VL performs excellently on the Massive Multimodal Embedding Benchmark (MMEB). Although MegaPairs does not cover most of MMEB's tasks, the model's task generalization is still striking. In composed image retrieval, BGE-VL likewise stands out, significantly surpassing well-known models such as Google's MagicLens and NVIDIA's MM-Embed.


Looking ahead, the Zhiyuan Research Institute plans to keep developing MegaPairs, combining it with richer multimodal retrieval scenarios to build a more comprehensive and efficient multimodal retriever that delivers more precise information services. As multimodal technology matures, the launch of BGE-VL will no doubt spur further exploration and innovation in the field.

Paper address: https://arxiv.org/abs/2412.14475

Project homepage: https://github.com/VectorSpaceLab/MegaPairs

Model address: https://huggingface.co/BAAI/BGE-VL-MLLM-S1
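For readers who want to try the released checkpoint, here is a minimal loading sketch. It assumes only the standard `transformers` loading interface; `trust_remote_code=True` is an assumption based on BGE-style repositories typically shipping custom modeling code, and the concrete encoding calls are documented on the model card rather than guessed here.

```python
import torch
from transformers import AutoModel

MODEL_ID = "BAAI/BGE-VL-MLLM-S1"

# trust_remote_code=True is assumed to be needed for the custom model code;
# verify against the model card before running.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
model.eval()  # inference mode; retrieval needs no gradients

# The exact methods for encoding a composed (image + instruction) query and
# candidate images are defined on the model card and are not reproduced here.
print(type(model).__name__)
```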