CosyVoice 2.0

CosyVoice 2.0 is a multilingual speech generation model that uses streaming modeling to reach first-packet latency as low as 150 ms while keeping the output natural and stable.
Author: LoRA
Inclusion Time: 11 Mar 2025
Downloads: 931
Pricing Model: Free
Introduction

CosyVoice 2.0 is a large multilingual speech generation model with full-stack capabilities covering inference, training, and deployment. It supports speech generation in multiple languages and produces natural, fluent voices that are close to human speech.

The project was developed by the FunAudioLLM team and is open sourced under the Apache-2.0 license.

Main features

Multilingual support: CosyVoice supports speech synthesis in Chinese, English, Japanese, Korean, and a variety of Chinese dialects (such as Cantonese, Sichuanese, Shanghainese, and the Tianjin and Wuhan dialects).

Ultra-low latency: CosyVoice 2.0 unifies offline and streaming modeling and supports bidirectional streaming speech synthesis, with first-packet synthesis latency as low as 150 milliseconds while maintaining high-quality audio output.

High accuracy: CosyVoice 2.0 reduces pronunciation errors in synthetic audio by 30% to 50% compared with version 1.0, and achieves the lowest character error rate on the hard test set of the Seed-TTS evaluation set.

Strong stability: CosyVoice 2.0 keeps timbre highly consistent in zero-shot and cross-lingual speech synthesis.

Natural experience: The prosody, sound quality, and emotional alignment of the synthetic audio are significantly improved, with the MOS evaluation score rising from 5.4 to 5.53.
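
The character error rate (CER) cited under accuracy is the character-level edit distance between a reference text and a transcript, divided by the reference length. For speech synthesis, the transcript typically comes from running a speech recognizer over the synthetic audio. The sketch below shows only the metric itself; the function names are illustrative, and text normalization (punctuation, casing) is left out, so it is not the Seed-TTS evaluation script.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference text must not be empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of five reference characters.
    print(character_error_rate("hello", "hallo"))
```

A 30% to 50% relative reduction means, for example, that a CER of 0.10 would drop to somewhere between 0.07 and 0.05.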

CosyVoice 2.0 local deployment tutorial

This tutorial walks Windows users through deploying CosyVoice 2.0 locally, from environment configuration to running the model.

1. Download and install Miniconda

Miniconda is a minimal installer for the Conda package and environment manager, and it is easy to install on Windows. After downloading the installer, click Next through the wizard, as with any other software, until the installation completes.

2. Download the CosyVoice source code

Download the CosyVoice source code from the official repository or another trusted channel, then unzip it.

3. Create a virtual environment and activate it

Open Anaconda Prompt or CMD and enter the following commands to create and activate the environment:

 conda create -n cosyvoice python=3.8 -y
 conda activate cosyvoice
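
Before moving on, it can help to confirm that the new environment is really the active one. The following is a minimal sketch that assumes Conda exposes the active environment name through the `CONDA_DEFAULT_ENV` variable, which `conda activate` sets. The function name `check_environment` is made up for this illustration.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ,
                      expected_env="cosyvoice", expected_python=(3, 8)):
    """Return a list of problems found; an empty list means everything matches."""
    problems = []
    # `conda activate` stores the active environment name in this variable.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != expected_env:
        problems.append(
            f"active Conda environment is {active_env!r}, expected {expected_env!r}")
    # Compare only major.minor, since the patch version does not matter here.
    found = tuple(version_info[:2])
    if found != expected_python:
        problems.append(f"Python is {found}, expected {expected_python}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks correct.")
```

Save it under any file name and run it with `python` from the same prompt in which the environment was activated.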

4. Install the pynini module

On Windows, the pynini module can only be installed through Conda, so run the following command in the activated environment:

 conda install -y -c conda-forge pynini==2.1.5 WeTextProcessing==1.0.3
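
A quick way to see whether the Conda install succeeded is to ask Python whether the modules can be located. This sketch rests on one assumption: WeTextProcessing is imported under the top-level name `tn`. If your installed version uses a different import name, adjust the list. The helper `missing_modules` is illustrative only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "tn" is the assumed import name for WeTextProcessing.
    missing = missing_modules(["pynini", "tn"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("pynini and WeTextProcessing are importable.")
```

`find_spec` only locates a module without executing it, so the check is fast and does not trigger any model or grammar loading.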

5. Install other dependencies (using the Alibaba Cloud mirror)

  • Edit requirements.txt:

    • Delete the last line, WeTextProcessing==1.0.3. It was already installed through Conda in step 4, and leaving it in causes the pip installation to fail.

    • Add the Matcha-TTS dependencies.

  • Install the dependencies (using the Alibaba Cloud mirror for faster downloads):

 pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
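
The requirements.txt edit described in this step can also be scripted, which is handy when repeating the setup. The sketch below removes only the WeTextProcessing line and should be run before the pip command. It does not add the Matcha-TTS dependencies, because the tutorial does not list them. The helper names are hypothetical.

```python
import re
from pathlib import Path


def requirement_name(line: str) -> str:
    """Extract the lower-cased package name from one requirements.txt line."""
    return re.split(r"[=<>!~;\[\s]", line.strip(), maxsplit=1)[0].lower()


def drop_requirement(text: str, package: str) -> str:
    """Remove every line that pins `package`, leaving all other lines untouched."""
    kept = [line for line in text.splitlines()
            if requirement_name(line) != package.lower()]
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    path = Path("requirements.txt")
    if path.exists():
        original = path.read_text(encoding="utf-8")
        # Keep a backup so the original file can be restored.
        Path("requirements.txt.bak").write_text(original, encoding="utf-8")
        path.write_text(drop_requirement(original, "WeTextProcessing"),
                        encoding="utf-8")
```

Matching on the package name rather than the exact string means the line is removed even if the pinned version differs from 1.0.3.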

6. Complete deployment

At this point, CosyVoice and all of its dependencies are installed, and the model is ready to run.
