# LiveTalking
**Repository Path**: vebin/LiveTalking
## Basic Information
- **Project Name**: LiveTalking
- **Description**: 实时交互数字人
- **Primary Language**: Python
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://www.livetalking.ai
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 85
- **Created**: 2026-06-02
- **Last Updated**: 2026-06-02
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
English | [中文版](./README.md)
A real-time interactive streaming digital human engine enabling synchronized audio-video conversation, widely adopted in commercial applications.
**Demos**: [wav2lip](https://youtu.be/-ss0H8qLr7E) | [ernerf](https://www.bilibili.com/video/BV1G1421z73r/) | [musetalk](https://youtu.be/vzUMruoZlxc/)
Domestic Mirror:
---
## Features
1. Supports multiple digital human models: ernerf, musetalk, wav2lip, Ultralight-Digital-Human
2. Supports voice cloning
3. Supports interrupting the digital human while speaking
4. Supports full-body video stitching
5. Supports WebRTC, RTMP, and virtual camera output
6. Supports action choreography: plays custom videos when not speaking
7. Supports multi-concurrency
8. Supports custom digital human avatars
9. Provides frontend API integration
---
## Usage Scenarios
LiveTalking leverages real-time streaming digital human technology to drive virtual avatars via text or voice, combined with LLM for intelligent conversation. Suitable for the following scenarios:
| Scenario | Description |
|----------|-------------|
| **Virtual Streamer / Live Commerce** | 24/7 unmanned live streaming with LLM-generated sales scripts and action choreography for natural performance |
| **AI Digital Human Customer Service** | Integrate enterprise knowledge bases for real-time voice Q&A with interruption support |
| **Online Education / Training** | Digital teacher分身 for course recording, or API-driven digital instructor for real-time lectures |
| **Intelligent Voice Assistant** | Pair with smart speakers or apps, calling the `/human` API to drive digital human voice interactions |
| **Large Screen Presentation** | Digital human presenter for exhibition halls, event venues, and other content narration scenarios |
| **Batch Short Video Creation** | Submit scripts in batch via API to generate digital human videos without real-person filming, using `/human` + `/record` APIs |
**Core Flow**: User input (text/audio) → LLM response (optional) → TTS speech synthesis → Real-time lip-sync → Audio/video streaming output
---
## 1. Installation
Tested on Ubuntu 24.04, Python 3.12, PyTorch 2.9.1, CUDA 13.0.
### 1.1 Install Dependencies
```bash
git clone https://github.com/lipku/LiveTalking.git
conda create -n livetalking python=3.12
conda activate livetalking
# If CUDA version is not 13.0 (check via nvidia-smi), install the corresponding PyTorch version(https://pytorch.org/get-started/previous-versions)
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
cd LiveTalking
pip install -r requirements.txt
```
Installation FAQ:
Linux CUDA environment setup:
---
## 2. Quick Start
### 2.1 Download Models
| Source | Link |
|--------|------|
| Quark Cloud | |
| Google Drive | |
1. Copy `wav2lip256.pth` to the project's `models/` directory and rename it to `wav2lip.pth`
2. Extract `wav2lip256_avatar1.tar.gz` and copy the entire extracted folder to `data/avatars/`
### 2.2 Start the Server
```bash
python app.py --transport webrtc --model wav2lip --avatar_id wav2lip256_avatar1
```
> **Note**: The server must open ports TCP:8010, UDP:1-65536
### 2.3 Client Access
| Method | Description |
|--------|-------------|
| Browser | Open `http://serverip:8010/index.html`, click "Start Connection" to play the digital human video, then enter text and submit |
| API | See [API Docs](docs/api.md) for HTTP-based integration |
| Desktop App | Download: |
### 2.4 Web Pages
| Page | URL | Description |
|------|-----|-------------|
| Home | `/index.html` | WebRTC connection + text/audio driver + recording control |
| Avatar Creator | `/avatar.html` | Upload video to auto-generate digital human avatars |
| Admin Console | `/admin.html` | Real-time session monitoring & global configuration |
### 2.5 Quick Experience
Create an instance with a cloud image to run instantly:
- [UCloud Image](https://www.compshare.cn/images/4458094e-a43d-45fe-9b57-de79253befe4?referral_code=3XW3852OBmnD089hMMrtuU&ytag=GPU_GitHub_livetalking)
### 2.6 Documentation
---
## 3. Architecture
### Dataflow Diagram
### Layer Overview
**API Layer**
- `/human`: Accepts text, supporting echo (direct playback) and chat (LLM conversation) modes
- `/humanaudio`: Accepts audio files for direct playback
- Each connection is assigned a unique `sessionid`, supporting multi-user concurrency
**Logic Layer**
- **LLM Engine**: Integrates with models like Qwen to generate conversational responses
- **TTS Engine**: Modular design supporting EdgeTTS, GPT-SoVITS, CosyVoice, Tencent Cloud, and more
- **Feature Extraction**: Synchronously extracts acoustic features (e.g., Mel spectrograms) for lip-sync inference
**Rendering Layer**
- **Model Inference**: Uses deep learning models (Wav2Lip, MuseTalk, etc.) to generate lip-sync frames from audio features
- **Post-Processing**: Smoothly overlays the generated mouth region back onto the original high-definition video
**Streaming Layer**
- **WebRTC**: Low-latency browser-based streaming
- **RTMP**: Standard live streaming protocol, supports pushing to platforms like Bilibili/YouTube
- **Virtual Camera**: Outputs as a system camera device
**Plugin System**
- Decentralized registration mechanism based on [registry.py](registry.py), allowing developers to extend TTS, Avatar, and Output modules
---
## 4. API Documentation
| Document | Description |
|----------|-------------|
| [docs/api.md](docs/api.md) | General API — WebRTC, text/audio driver, recording, action choreography |
| [docs/avatar_api.md](docs/avatar_api.md) | Avatar Generation API — create tasks, query progress, delete tasks |
| [docs/admin_api.md](docs/admin_api.md) | Admin API — global config, session monitoring, force stop |
---
## 5. Docker
Available images:
- **AutoDL**: — [Tutorial](https://doc.livetalking.ai/en/docs/autodl/)
- **UCloud**: — Supports opening any port, no additional SRS deployment required — [Tutorial](https://doc.livetalking.ai/en/docs/ucloud/)
> AutoDL cannot open UDP ports, so you need to deploy SRS or TURN relay service separately.
---
## 6. Performance
- Each video stream compression consumes CPU; higher resolution means greater CPU usage. Each lip-sync inference consumes GPU
- Concurrent sessions when not speaking depend on CPU; concurrent speaking sessions depend on GPU
- In backend logs: `inferfps` = GPU inference frame rate, `finalfps` = final streaming frame rate. Both must be >= 25 for real-time performance
### Real-Time Inference Performance
| Model | GPU | FPS |
|:------|:----|:----|
| wav2lip256 | RTX 3060 | 60 |
| wav2lip256 | RTX 3080Ti | 120 |
| musetalk | RTX 3080Ti | 42 |
| musetalk | RTX 3090 | 45 |
| musetalk | RTX 4090 | 72 |
- wav2lip256: RTX 3060 or higher recommended
- musetalk: RTX 3080Ti or higher recommended
---
## 7. Statement
Videos developed based on this project and published on platforms such as Bilibili, WeChat Channels, and Douyin must include the LiveTalking watermark and logo.
---
If this project is helpful to you, please give it a Star. Contributors interested in improving this project are also welcome.
| Community | Link |
|-----------|------|
| Knowledge Planet | |
| WeChat | wxwubug (mention for group invite) |
| Telegram | |
| Discord | |
| Email | lipku@foxmail.com |
| WeChat Official | 数字人技术 |
