Commit 580ed90: modify readme for using efficient speech
Authored and committed by Xu Liu on Apr 6, 2024 (parent: bc3f7c6)
Showing 5 changed files with 13 additions and 18 deletions.
README.md (4 additions, 1 deletion)
@@ -14,14 +14,17 @@
 - Real-time and offline functionality
 - Chat interaction with the [ChatGLM3](https://github.com/THUDM/ChatGLM3) 6B 4-bit quantized model
 - Accelerated automatic speech recognition (ASR) with [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
-- Text-to-speech conversion with [TTS](https://github.com/coqui-ai/TTS)
+- Text-to-speech conversion with [EfficientSpeech](https://github.com/roatienza/efficientspeech)
 
 ## Usage
 - Install Chatglm.cpp by following [Chatglm.cpp](chatglm.cpp.md)
 - Download the ChatGLM3 6B-4bit [model](https://huggingface.co/Xorbits/chatglm3-6B-GGML)
 - Install [whisper.cpp](https://github.com/ggerganov/whisper.cpp); prefer a BLAS build to speed up inference
+- Install [EfficientSpeech](./examples/efficientspeech/README.md)
 - Install the dependencies
 `pip install -r requirements.txt`
+- Start the local text-to-speech (TTS) service
+`cd examples/efficientspeech/ && sh es_tts_service.sh`
 - Modify the model path and run the script
 ```
 vim examples/demo.sh
README_en.md (6 additions, 3 deletions)
@@ -15,15 +15,18 @@ Tested on M2 16G MacBook Air / (4070Ti) + 13600KF Ubuntu / (4070Ti) + 13600KF Wi
 - Real-time and offline functionality
 - Utilizes the [ChatGLM3](https://github.com/THUDM/ChatGLM3) 6B 4-bit quantized model for chat interactions
 - Accelerated automatic speech recognition (ASR) with [whisper.cpp](https://github.com/ggerganov/whisper.cpp)
-- Text-to-speech (TTS) conversion using [TTS](https://github.com/coqui-ai/TTS)
+- Text-to-speech (TTS) conversion using [EfficientSpeech](https://github.com/roatienza/efficientspeech)
 
 ## Usage
 1. Follow the README of [Chatglm.cpp](chatglm.cpp.md) to install chatglm.cpp
 2. Download the ChatGLM3 6B-4bit [model](https://huggingface.co/Xorbits/chatglm3-6B-GGML)
 3. Install [whisper.cpp](https://github.com/ggerganov/whisper.cpp); compiling with BLAS / CUBLAS speeds up inference
-4. Install requirements
+4. Install [EfficientSpeech](./examples/efficientspeech/README.md) for real-time TTS
+5. Install the requirements
 `pip install -r requirements.txt`
-5. Modify the model path of the script and run it:
+6. Start the local TTS service
+`cd examples/efficientspeech/ && sh es_tts_service.sh`
+7. Modify the model path of the script and run it:
 ```
 vim examples/demo.sh
 cd examples && sh demo.sh
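The usage steps above leave a local TTS service listening on localhost. For a quick end-to-end check of that service, a short script along the following lines should work. This is a minimal sketch that assumes the request and response shape used by examples/cli_demo.py in this commit (POST JSON with a 'text' field, a JSON reply whose 'audio' field holds nested sample lists, 22050 Hz audio); the output file name tts_check.wav is purely illustrative.

```python
# Minimal smoke test for the local EfficientSpeech TTS service started by
# es_tts_service.sh (assumed endpoint and payload shape, mirroring cli_demo.py).
import numpy as np
import requests
from scipy.io import wavfile

tts_url = 'http://localhost:5000/convert_text_to_speech'
sampling_rate = 22050  # playback rate used in cli_demo.py

response = requests.post(tts_url, json={'text': 'Hello, this is a quick TTS check.'})
response.raise_for_status()

# The service returns audio as nested lists ([[sample], ...]); flatten to float32 mono.
wav = np.array([x[0] for x in response.json()['audio']], dtype=np.float32)
wavfile.write('tts_check.wav', sampling_rate, wav)  # hypothetical output path
print(f'Wrote {wav.shape[0] / sampling_rate:.2f} s of audio to tts_check.wav')
```

If the request succeeds and the WAV file plays, the demo script in the last step should be able to reach the service as well.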
examples/cli_demo.py (3 additions, 10 deletions)
@@ -2,8 +2,7 @@
 import argparse
 from pathlib import Path
 from typing import List
-from TTS.api import TTS
-import playsound
+import requests
 import numpy as np
 from scipy.io import wavfile
 import warnings
@@ -23,9 +22,7 @@
 /_/ /_/
 """.strip("\n")
 WELCOME_MESSAGE = "Welcome to ChatGLM.CPP-based oral English bot! Ask whatever you want. Say 'clear' to clear context. Say 'stop' to exit."
-
-import requests
-url = 'http://localhost:5000/convert_text_to_speech'
+tts_url = 'http://localhost:5000/convert_text_to_speech'
 
 def main() -> None:
     parser = argparse.ArgumentParser()
@@ -47,7 +44,6 @@ def main() -> None:
     parser.add_argument("--runtime_dir", default=None, type=str, help="path to save chat audio and text file")
     parser.add_argument("--asr_main", default=None, type=str, help="path to run asr main program")
     parser.add_argument("--asr_model", default=None, type=str, help="path to load asr model")
-    parser.add_argument("--tts_model", default=None, type=str, help="path to load tts model")
     parser.add_argument("--input_device", default=0, type=int, help="the index of the input device")
     args = parser.parse_args()
 
@@ -58,7 +54,6 @@ def main() -> None:
     if args.sp:
         system = args.sp.read_text()
     os.makedirs(args.runtime_dir, exist_ok=True)
-    # tts = TTS(model_name=args.tts_model, progress_bar=False)
     sampling_rate = 22050
     sd.default.samplerate = sampling_rate
     sd.default.latency = 'low'
@@ -149,7 +144,7 @@ def main() -> None:
         ai_audio_path = f'{args.runtime_dir}/chat_{talk_round}_ai.wav'
         response_txt = remove_non_english_chars(msg_out.content.replace('\n', ' ').strip())
         if response_txt:
-            response = requests.post(url, json={'text': response_txt})
+            response = requests.post(tts_url, json={'text': response_txt})
             if response.status_code == 200:
                 wav = response.json()['audio']
                 wav = np.array([x[0] for x in wav], dtype=np.float32).reshape((-1, 1))
@@ -158,8 +153,6 @@ def main() -> None:
                 sd.wait()
             else:
                 print('Error:', response.json())
-            # tts.tts_to_file(response_txt, file_path=ai_audio_path)
-            # playsound.playsound(ai_audio_path)
 
     print("Bye")
 
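The cli_demo.py changes above depend only on the HTTP contract of the local service, not on how EfficientSpeech is invoked behind it. As an illustration of that contract (not the actual es_tts_service.sh implementation), a compatible server could look roughly like the sketch below; Flask and the synthesize helper are assumptions, and a real service would run EfficientSpeech inference inside synthesize.

```python
# Hypothetical sketch of a TTS service compatible with the client code in
# examples/cli_demo.py. Flask and synthesize() are assumptions, not the
# project's actual es_tts_service.sh implementation.
import numpy as np
from flask import Flask, jsonify, request

app = Flask(__name__)

def synthesize(text: str) -> np.ndarray:
    # Placeholder: a real service would run EfficientSpeech here and return
    # 22050 Hz float samples; a short silent clip keeps the sketch self-contained.
    return np.zeros(22050, dtype=np.float32)

@app.route('/convert_text_to_speech', methods=['POST'])
def convert_text_to_speech():
    text = request.get_json().get('text', '')
    wav = synthesize(text)
    # Wrap each sample in a list to match the [[sample], ...] shape cli_demo.py expects.
    return jsonify({'audio': [[float(x)] for x in wav]})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

Keeping the contract this small is what lets the commit drop the in-process TTS dependency from the client and swap backends without touching cli_demo.py.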
examples/demo.sh (0 additions, 3 deletions)
@@ -2,13 +2,10 @@
 CHAT_MODEL=/Users/xuliu/Documents/vscode/llm/model_zoo/chatglm3-ggml-q4_0.bin
 ASR_MAIN=/Users/xuliu/Documents/vscode/speech/whisper.cpp/main
 ASR_MODEL=/Users/xuliu/Documents/vscode/speech/whisper.cpp/models/ggml-base.en.bin
-TTS_MODEL=tts_models/en/ljspeech/vits--neon
 python3 cli_demo.py \
 --model ${CHAT_MODEL} \
 --interactive \
 --runtime_dir './runtime' \
 --asr_main ${ASR_MAIN} \
 --asr_model ${ASR_MODEL} \
---tts_model ${TTS_MODEL} \
 --input_device 1

requirements.txt (0 additions, 1 deletion)
@@ -2,4 +2,3 @@ pynput
 scipy
 sounddevice
 subprocess
-TTS
