
WeNet: an end-to-end (E2E) speech recognition toolkit for industrial deployment

2022-07-05 04:29:00 【Wang Xiaoxi WW】


I. Building a speech recognition service with WeNet

1. References

2. Quickly setting up the WeNet service

See the WeNet Chinese documentation.

Download the official pretrained model, start the Docker service, and load the model to provide speech recognition over the WebSocket protocol.


wget https://wenet-1256283475.cos.ap-shanghai.myqcloud.com/models/aishell2/20210618_u2pp_conformer_libtorch.tar.gz
tar -xf 20210618_u2pp_conformer_libtorch.tar.gz
model_dir=$PWD/20210618_u2pp_conformer_libtorch
docker run --rm -it -p 10086:10086 -v $model_dir:/home/wenet/model wenetorg/wenet-mini:latest bash /home/run.sh

Note:

In this example $PWD = "/home/wenet/model".
Make sure the pretrained model files are stored in the right place: extract the archive under $PWD, then run
model_dir=$PWD/20210618_u2pp_conformer_libtorch to set the variable. Otherwise the container cannot find the model and an error is reported.
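The path check described in the note can be sketched in Python. This is a minimal sketch, not part of WeNet: the helper name docker_run_command is hypothetical, and the image name, port, and mount point are copied from the command above.

```python
from pathlib import Path


def docker_run_command(model_dir, port=10086, image="wenetorg/wenet-mini:latest"):
    """Build the docker command from this post after checking the model folder.

    Hypothetical helper: it only assembles the argument list and does not
    start Docker itself.
    """
    model_dir = Path(model_dir).resolve()
    if not model_dir.is_dir():
        # Fail early instead of letting the container start without a model.
        raise FileNotFoundError(f"model directory not found: {model_dir}")
    return [
        "docker", "run", "--rm", "-it",
        "-p", f"{port}:{port}",
        "-v", f"{model_dir}:/home/wenet/model",
        image, "bash", "/home/run.sh",
    ]
```

The returned list can be passed to subprocess.run once the folder check has passed.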

Real-time recognition

Open index.html in a browser and enter ws://127.0.0.1:10086 as the WebSocket URL (if Docker runs under WSL2 on Windows, use ws://localhost:10086). Allow the browser's request to use the microphone, and speech from the microphone is recognized in real time.

The demonstration here uses Docker under WSL2. When the speaker is close to the microphone, the misrecognition rate is relatively low.

[Image: real-time recognition demo in the browser]
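For readers who want to talk to the service without the browser page, the sketch below prepares the messages a client would send. The field names ("signal", "nbest", "continuous_decoding") and the 16 kHz, 16-bit PCM format are assumptions based on the WeNet runtime's WebSocket client and should be checked against the server version in use. The helper names are hypothetical, and the actual sending (for example with a WebSocket library) is left out.

```python
import json


def start_message(nbest=1, continuous_decoding=True):
    """JSON text frame that opens a recognition session (assumed protocol)."""
    return json.dumps({
        "signal": "start",
        "nbest": nbest,
        "continuous_decoding": continuous_decoding,
    })


def end_message():
    """JSON text frame that tells the server no more audio follows (assumed)."""
    return json.dumps({"signal": "end"})


def pcm_chunks(pcm_bytes, sample_rate=16000, chunk_seconds=0.5, bytes_per_sample=2):
    """Split raw PCM audio into fixed-duration binary frames."""
    chunk_size = int(sample_rate * chunk_seconds) * bytes_per_sample
    for start in range(0, len(pcm_bytes), chunk_size):
        yield pcm_bytes[start:start + chunk_size]
```

A client would send start_message(), then each chunk as a binary frame, then end_message(), reading partial and final results from the server in between.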

II. Inference with WeNet (inference with the ONNX CPU version does not work for now)

Note:

If you only use wenet/bin/recognize.py for inference with the libTorch model, the environment can be set up on Windows; see the WeNet official website for the setup steps.
If you want to run inference with wenet/bin/recognize_onnx.py, you first need ctc_decoder. Note that the ctc_decoder on PyPI is only the 2020 version (WeNet 1.0), which does not match the current WeNet 3.0, so it has to be downloaded from https://github.com/Slyne/ctc_decoder and compiled. Compiling the swig decoders requires bash commands, so it is best done on Linux; WSL + Ubuntu is used here as the solution.
In fact, installing Git on Windows also lets you run bash commands. However, after installing wget.exe and swig.exe and cloning the required packages (kenlm, ThreadPool), the downloaded openfst-1.6.3 still could not be compiled, even with the complete .h files added in VC.

1. Setting up the WeNet environment

Because the goal here is to try inference with the ONNX model, WSL + Ubuntu is used as the solution.

For a WSL + Docker Desktop tutorial, see "WSL Ubuntu + Docker Desktop: building a Python environment".

After installing WSL and Docker Desktop, configure the WeNet environment as follows:

Create the anaconda container

docker run -it --name="anaconda" -p 8888:8888 continuumio/anaconda3 /bin/bash

If you exit, the anaconda container can be restarted:

# restart 
docker start anaconda
docker exec -it anaconda /bin/bash

Configure the wenet environment in the base environment (do not create a virtual environment, so that the container can later be packaged into an image for PyCharm to use).
Copy the wenet project from WSL into the Docker container (assuming the wenet project is under /home/usr in WSL):
```
docker cp /home/usr/wenet/requirements.txt 9cf7b3c196f3:/home/  # 9cf7b3c196f3 is the anaconda container ID
```
Enter the anaconda container and install all packages with pip under /home/ (to change the conda source, see "Replacing the conda source on Ubuntu"):
```
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```
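After the installation it can help to confirm which versions actually ended up in the base environment. The following is a minimal sketch using only the standard library; the helper names are hypothetical and the expected versions are the ones pinned in the conda command above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(expected, found):
    """Return the packages whose installed version does not start with the pin."""
    return {
        name: found.get(name)
        for name, wanted in expected.items()
        if found.get(name) is None or not found[name].startswith(wanted)
    }


# Pins taken from the conda command in this post.
EXPECTED = {"torch": "1.10.0", "torchaudio": "0.10.0"}
```

Calling mismatches(EXPECTED, installed_versions(EXPECTED)) inside the container returns an empty dict when the pinned versions are present.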

Download the ctc_decoder project (so that the beam search method can be used with the conformer in speech recognition)

The ctc_decoder repository is at https://github.com/Slyne/ctc_decoder.

Because git clone from GitHub may not work well in Ubuntu, the downloads in swig/setup.sh are done on Windows. The script is:

#!/usr/bin/env bash

if [ ! -d kenlm ]; then
    git clone https://github.com/kpu/kenlm.git
    echo -e "\n"
fi

if [ ! -d openfst-1.6.3 ]; then
    echo "Download and extract openfst ..."
    wget http://www.openfst.org/twiki/pub/FST/FstDownload/openfst-1.6.3.tar.gz --no-check-certificate
    tar -xzvf openfst-1.6.3.tar.gz
    echo -e "\n"
fi

if [ ! -d ThreadPool ]; then
    git clone https://github.com/progschj/ThreadPool.git
    echo -e "\n"
fi

echo "Install decoders ..."
# python3 setup.py install --num_processes 10
python3 setup.py install --user --num_processes 10

After the necessary packages are downloaded (run setup.sh in git bash; wget and swig can be installed directly from their .exe files), the overall directory structure is as follows (there are four additional entries):

[Image: ctc_decoder directory structure after running setup.sh]

Then copy the complete ctc_decoder project into the anaconda container and compile it there.

Compile ctc_decoder:
Suppose that, inside the anaconda container, the ctc_decoder project is under /home. Enter the swig folder and run bash setup.sh to compile it (first run apt install gcc and apt install g++).

Configure the onnx and onnxruntime environment

pip install onnx==1.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install onnxruntime==1.9.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
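Before moving on, a quick import check can confirm that the compiled decoder and the ONNX packages are visible to Python. This is a minimal sketch; the module name swig_decoders is an assumption based on the ctc_decoder swig build and may differ in other versions.

```python
from importlib import util


def is_importable(module_name):
    """True if a top-level module can be found without importing it."""
    return util.find_spec(module_name) is not None


def missing_modules(names=("swig_decoders", "onnx", "onnxruntime")):
    """Names from the (assumed) required list that Python cannot find."""
    return [name for name in names if not is_importable(name)]
```

Inside the container, missing_modules() should return an empty list once the steps above have succeeded.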

Package the running Docker container as an image

Package the runtime environment of the anaconda container into an image for PyCharm Professional to use; see "Developing in PyCharm with the environment inside a Docker container".

# OPTIONS:
# -a : author of the committed image;
# -c : apply Dockerfile instructions to the created image;
# -m : commit message;
# -p : pause the container during the commit.

# 2b1ad7022d19 is the ID of the running anaconda container
docker commit -a "wangxiaoxi" -m "wenet_env" 2b1ad7022d19  wenet_env:v1

2. Model training

See Tutorial on AIShell.

3. Inference with the libTorch model

Download the aishell2 sample dataset for WeNet model inference; it is available from the official AISHELL website.

Download the WeNet pretrained model (download "Checkpoint model - conformer").

Put the test dataset and the pretrained model under the project path, for example:

[Image: project directory with the test data and pretrained model]

Modify the location of cmvn_file in train.yaml (relative paths are recommended if you use the Python environment inside a Docker container):

cmvn_file: ../../test/aishell2/global_cmvn     # Use relative path here
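The relative path above can be derived rather than typed by hand. The sketch below assumes the scripts are run from wenet/bin and that the project is mounted at /opt/project inside the container (the location that appears in the traceback later in this post); both are assumptions about the setup, and the helper name is hypothetical.

```python
import posixpath


def relative_to_workdir(target, workdir):
    """Express an absolute container path relative to the working directory."""
    return posixpath.relpath(target, start=workdir)


# Assumed container layout: project mounted at /opt/project, scripts run from wenet/bin.
cmvn_relative = relative_to_workdir(
    "/opt/project/test/aishell2/global_cmvn",
    "/opt/project/wenet/bin",
)
```

With this layout, cmvn_relative is the same ../../test/aishell2/global_cmvn value used in train.yaml.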

Convert the aishell2 dataset to the WeNet data format (one JSON object per line):

{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_750.wav", "txt": ""}
{"key": "D4_754", "wav": "../../test/aishell2/test_data/D4_751.wav", "txt": ""}
{"key": "D4_755", "wav": "../../test/aishell2/test_data/D4_752.wav", "txt": ""}
{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_753.wav", "txt": ""}
{"key": "D4_754", "wav": "../../test/aishell2/test_data/D4_754.wav", "txt": ""}
{"key": "D4_755", "wav": "../../test/aishell2/test_data/D4_755.wav", "txt": ""}
{"key": "D4_756", "wav": "../../test/aishell2/test_data/D4_756.wav", "txt": ""}
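Writing such a list by hand is error-prone, so it can also be generated from the wav folder. The following is a minimal sketch: the function name and the choice of the file stem as the key are assumptions, and "txt" is left empty as in the list above.

```python
import json
from pathlib import Path


def build_data_list(wav_dir, list_path, path_prefix):
    """Write one JSON object per wav file, using the file stem as the key."""
    entries = []
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        entries.append({
            "key": wav.stem,                        # e.g. D4_750 for D4_750.wav
            "wav": f"{path_prefix}/{wav.name}",     # path as seen from the working directory
            "txt": "",                              # no reference transcript
        })
    with open(list_path, "w", encoding="utf-8") as fout:
        for entry in entries:
            fout.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return len(entries)
```

For example, build_data_list("test_data", "test_data/data.list", "../../test/aishell2/test_data") produces keys that always match the file names, so each key is unique.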

Run wenet/bin/recognize.py with the following command:

python recognize.py \
--config=../../test/aishell2/train.yaml \
--dict=../../test/aishell2/units.txt \
--checkpoint=../../test/aishell2/final.pt \
--result_file=../../test/aishell2/att_res_result.txt \
--test_data=../../test/aishell2/test_data/data.list

The output is as follows:

Namespace(batch_size=16, beam_size=10, bpe_model=None, checkpoint='../../test/aishell2/final.pt', config='../../test/aishell2/train.yaml', connect_symbol='', ctc_weight=0.0, data_type='raw', decoding_chunk_size=-1, dict='../../test/aishell2/units.txt', gpu=-1, mode='attention', non_lang_syms=None, num_decoding_left_chunks=-1, override_config=[], penalty=0.0, result_file='../../test/aishell2/att_res_result.txt', reverse_weight=0.0, simulate_streaming=False, test_data='../../test/aishell2/test_data/data1.list')
2022-07-04 15:54:22,441 INFO Checkpoint: loading from checkpoint ../../test/aishell2/final.pt for CPU
F:\ASR\wenet\wenet\transformer\asr_model.py:266: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  best_hyps_index = best_k_index // beam_size
2022-07-04 15:54:27,189 INFO D4_753  Minning Marketing Service Department of people's insurance group of China 
2022-07-04 15:54:27,189 INFO D4_755  Chinatelecom minning town cooperation business office 
2022-07-04 15:54:27,189 INFO D4_754  Minning Town Health Center 
2022-07-04 15:54:27,189 INFO D4_756  Minning town passenger station 
2022-07-04 15:54:27,189 INFO D4_753  Episode 61 
2022-07-04 15:54:27,189 INFO D4_755  Episode 63 
2022-07-04 15:54:27,189 INFO D4_754  Episode 62
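Several keys (for example D4_753) appear twice in the log above because the data list reuses them for different wav files, which makes the results hard to tell apart. A small check for repeated keys can be sketched as follows; the function name is hypothetical.

```python
import json
from collections import Counter


def duplicate_keys(lines):
    """Return the sorted keys that occur more than once in a WeNet data list."""
    keys = [json.loads(line)["key"] for line in lines if line.strip()]
    return sorted(key for key, count in Counter(keys).items() if count > 1)
```

It can be called as duplicate_keys(open("data.list", encoding="utf-8")) before running recognize.py.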

4. Exporting an ONNX model from WeNet

See ONNX backend on WeNet.

First download the WeNet pretrained model (download "Checkpoint model - conformer"), then run wenet/bin/export_onnx_cpu.py with the following parameters to convert the .pt checkpoint file into ONNX files:

python export_onnx_cpu.py \
 --config F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/train.yaml \
 --checkpoint F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/final.pt \
 --chunk_size 16 \
 --output_dir F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/onnx_dir \
 --num_decoding_left_chunks -1

If the ONNX export succeeds, three files are generated in the output folder: encoder.onnx, ctc.onnx, and decoder.onnx.
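A short check that the export really produced all three files can be sketched as follows. The file names come from the sentence above; the function name is hypothetical.

```python
from pathlib import Path

EXPECTED_ONNX_FILES = ("encoder.onnx", "ctc.onnx", "decoder.onnx")


def missing_onnx_files(output_dir, expected=EXPECTED_ONNX_FILES):
    """List the expected ONNX files that are not present in output_dir."""
    output_dir = Path(output_dir)
    return [name for name in expected if not (output_dir / name).is_file()]
```

An empty list means the output folder contains every file named above.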

5. Inference with `recognize_onnx` (unresolved)

See https://github.com/wenet-e2e/wenet/pull/761.

Download the weight file of the conformer model (checkpoint model) from https://wenet.org.cn/wenet/pretrained_models.html

After extracting the weight file, the folder contents are as follows:

[Image: contents of the extracted checkpoint folder]

Modify the location of cmvn_file in train.yaml:

#cmvn_file: F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/global_cmvn
cmvn_file: ../../test/aishell2/global_cmvn     # Use relative path here

Convert the data to WeNet's JSON format. Suppose there is an audio file D4_750.wav; its entry is written in the following JSON format (see https://wenet.org.cn/wenet/tutorial_librispeech.html?highlight=test_data#stage-0-prepare-training-data):

{"key": "D4_753", "wav": "../../test/aishell2/test_data/D4_750.wav", "txt": " The purchase restriction that has the greatest inhibitory effect on the transaction of the real estate market "}

Then run:

python3 wenet/bin/recognize_onnx.py --config=20210618_u2pp_conformer_exp/train.yaml --test_data=raw_wav/test/data.list --gpu=0 --dict=20210618_u2pp_conformer_exp/words.txt --mode=attention_rescoring --reverse_weight=0.4 --ctc_weight=0.1 --result_file=./att_res_result.txt --encoder_onnx=onnx_model/encoder.onnx --decoder_onnx=onnx_model/decoder.onnx

Note: it is best to use relative paths here. Because the Python environment inside Docker is used, an absolute Windows path causes the error below. For ideas on a solution, see https://github.com/microsoft/onnxruntime/issues/8735 (in any case, I could not solve it).

{FileNotFoundError}[Errno 2] No such file or directory: 'F:/ASR/model/20210618_u2pp_conformer_libtorch_aishell2/train.yaml'
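The error arises because a Windows path means nothing inside the container. As an illustration, the sketch below translates a host path into the path the container sees. The mapping from F:/ASR/wenet to /opt/project is an assumption inferred from the paths in this post's logs, and the function name is hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_container_path(host_path, host_root, container_root):
    """Translate a Windows host path into the matching path inside the container.

    Raises ValueError when host_path is not under host_root, which is the case
    for files that were never mounted into the container.
    """
    relative = PureWindowsPath(host_path).relative_to(PureWindowsPath(host_root))
    return str(PurePosixPath(container_root).joinpath(*relative.parts))
```

With that assumed mapping, a file under F:/ASR/wenet resolves to a path under /opt/project, while F:/ASR/model/... raises ValueError because that folder lies outside the mounted project. This is consistent with moving the model under the project path and using relative paths.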

Here the ONNX model exported by export_onnx_cpu is used for inference with recognize_onnx:

import numpy as np
import onnxruntime

# encoder_outpath: path to the exported encoder.onnx
# feats, feats_lengths: NumPy arrays with the features and their lengths from the test data
encoder_ort_session = onnxruntime.InferenceSession(encoder_outpath, providers=['CPUExecutionProvider'])
ort_inputs = {
    encoder_ort_session.get_inputs()[0].name: feats.astype('float32'),
    encoder_ort_session.get_inputs()[1].name: feats_lengths.astype('int64'),
    # zero-filled arrays for the third and fourth inputs
    encoder_ort_session.get_inputs()[2].name: np.zeros((12,4,0,128)).astype('float32'),
    encoder_ort_session.get_inputs()[3].name: np.zeros((12,1,256,7)).astype('float32')
}
encoder_ort_session.run(None, ort_inputs)

This throws an error:

{Fail}[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node. Name:'Slice_49' Status Message: slice.cc:153 FillVectorsFromInput Starts must be a 1-D array
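One way to investigate an error like this is to list the inputs the exported model declares and compare them with the arrays being fed. The following is a minimal sketch with hypothetical helper names. It relies only on session.get_inputs() and the name, shape, and type attributes of each entry, and it is a diagnostic aid rather than a fix.

```python
def describe_inputs(session):
    """Return (name, declared_shape, type) for every input of an ONNX session."""
    return [(arg.name, arg.shape, arg.type) for arg in session.get_inputs()]


def shape_mismatch(declared, actual):
    """True if the array shape cannot satisfy the declared input shape.

    Dynamic dimensions (None or a symbolic name) are treated as wildcards.
    """
    if len(declared) != len(actual):
        return True
    return any(isinstance(d, int) and d != a for d, a in zip(declared, actual))
```

Printing describe_inputs(encoder_ort_session) shows the input names and ranks the model expects, which can then be checked against feats, feats_lengths, and the two zero arrays.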

At first this looked like a mismatch between the CUDA and onnxruntime versions; see "OnnxRunTime encounters FAIL : Non-zero status code returned while running BatchNormalization node."

It turned out that recognize_onnx runs inference on the model exported by export_onnx_gpu.py, not by export_onnx_cpu.py. To use export_onnx_gpu.py you still have to install nvidia-docker and onnxruntime-gpu; otherwise an error is reported:

/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:53: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'
  warnings.warn("Specified provider '{}' is not in available provider names."
Traceback (most recent call last):
  File "/opt/project/wenet/bin/export_onnx_gpu.py", line 574, in <module>
    onnx_config = export_enc_func(model, configs, args, logger, encoder_onnx_path)
  File "/opt/project/wenet/bin/export_onnx_gpu.py", line 334, in export_offline_encoder
    test(to_numpy([o0, o1, o2, o3, o4]), ort_outs)
NameError: name 'test' is not defined
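The warning at the top of this log appears because CUDAExecutionProvider is requested while only CPUExecutionProvider is installed. A script can query the available providers first and request only those. The following is a minimal sketch with hypothetical helper names; it avoids the provider warning but does not address the NameError raised inside export_onnx_gpu.py.

```python
def available_providers():
    """Providers reported by onnxruntime, or an empty list if it is not installed."""
    try:
        import onnxruntime
    except ImportError:
        return []
    return onnxruntime.get_available_providers()


def choose_providers(available,
                     preferred=("CUDAExecutionProvider", "CPUExecutionProvider")):
    """Keep the preferred providers that are actually available, in order."""
    return [name for name in preferred if name in available]
```

The result of choose_providers(available_providers()) can be passed as the providers argument of onnxruntime.InferenceSession.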