Implementing voiceprint recognition with TensorFlow
2022-07-02 23:55:00 【TomCruisePro】
Preface
This article shows how to implement a simple voiceprint recognition model with TensorFlow. You should first be familiar with audio classification; if you are not, see my previous article on sound classification with TensorFlow. Building on that, we train a voiceprint recognition model that identifies who the speaker is, which can be applied in voice verification projects. The difference from the previous article is that this project uses ArcFace loss (Additive Angular Margin Loss). It normalizes both the feature vectors and the weights and adds an angular margin m to the angle θ. An angular margin affects the angle more directly than a cosine margin does.
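To make the angular margin concrete, here is a minimal pure-Python sketch of the ArcFace computation for a single sample. It is not the project's training code: the function names, the scale `s`, and the margin `m` are illustrative assumptions, and the case where θ + m exceeds π (which practical implementations handle separately) is ignored for brevity.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def arcface_loss(feature, class_weights, label, s=30.0, m=0.5):
    """Cross-entropy over ArcFace logits for one sample.

    feature:       embedding of the utterance (list of floats)
    class_weights: one weight vector per speaker
    label:         index of the true speaker
    s, m:          scale and angular margin (illustrative values)
    """
    f = l2_normalize(feature)
    logits = []
    for j, w in enumerate(class_weights):
        # With both vectors normalized, the dot product is cos(theta_j).
        cos_theta = sum(a * b for a, b in zip(f, l2_normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        theta = math.acos(cos_theta)
        if j == label:
            theta += m  # the margin is added to the true-class angle only
        logits.append(s * math.cos(theta))
    # Numerically stable log-sum-exp, then softmax cross-entropy.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]
```

With the margin set to 0 this reduces to a softmax over scaled cosine similarities. A positive margin makes the true class harder to satisfy, so the same sample yields a larger loss and the embeddings are pushed to be more tightly clustered per speaker.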
Usage environment
Python 3.8
TensorFlow 2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
Installation
1. pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2. pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Create data
1. This tutorial uses a Chinese speech corpus that contains voice data from 3,342 speakers, with more than 1,130,000 utterances in total. If you have other, better datasets, you can mix them in. Use the Python tool module aukit to process the audio, reduce noise, and remove silence.
2. First, create a data list. Each line has the format <voice file path\tvoice label>. The list mainly makes the data convenient to read later and makes it easy to bring in other voice datasets. The voice label is the unique ID of each speaker. For a different voice dataset, you can write a corresponding function that generates its entries and writes them into the same data list.
3. Write the code below in create_data.py. The Chinese speech corpus is in MP3 format, and I found that reading this format is very slow, so all MP3 audio is converted to WAV. After the data list is created, some entries may still be faulty, so we check them and delete the bad ones. Run the following program to complete data preparation.
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate data list
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w')
    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w')

    sound_sum = 0
    speakers = []
    speakers_dict = {}
    for line in tqdm(lines):
        line = json.loads(line.replace('\n', ''))
        duration_ms = line['duration_ms']
        if duration_ms < 1300:
            continue
        speaker = line['speaker']
        if speaker not in speakers:
            speakers_dict[speaker] = len(speakers)
            speakers.append(speaker)
        label = speakers_dict[speaker]
        sound_path = os.path.join(zhvoice_path, line['index'])
        save_path = "%s.wav" % sound_path[:-4]
        if not os.path.exists(save_path):
            try:
                wav = AudioSegment.from_mp3(sound_path)
                wav.export(save_path, format="wav")
                os.remove(sound_path)
            except Exception as e:
                print('Data error: %s, information: %s' % (sound_path, e))
                continue
        if sound_sum % 200 == 0:
            f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        else:
            f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        sound_sum += 1

    f_test.close()
    f_train.close()


# Delete error audio
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    lines1 = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            spec_mag = load_audio(audio_path)
            lines1.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in lines1:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
After running the code above, data lists in the following format are generated. To use custom data, follow the same format: the relative path of the audio file comes first, followed by the speaker label, separated by a tab. For example:
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
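For a custom dataset, a list in this format could be produced along the following lines. This is a sketch under stated assumptions, not part of the project: it assumes a hypothetical layout with one sub-folder of WAV files per speaker, and the names `build_data_list`, `read_data_list`, and `test_every` are invented for illustration. It mirrors the split rule of the code above, where every 200th sample goes to the test list.

```python
import os


def build_data_list(dataset_dir, list_path, test_every=200):
    """Write train_list.txt and test_list.txt for a folder-per-speaker dataset.

    Assumed (hypothetical) layout: dataset_dir/<speaker_name>/<audio>.wav
    Each output line is '<audio path>\t<speaker label>'.
    """
    speakers = sorted(
        d for d in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, d))
    )
    sound_sum = 0
    train_path = os.path.join(list_path, 'train_list.txt')
    test_path = os.path.join(list_path, 'test_list.txt')
    with open(train_path, 'w', encoding='utf-8') as f_train, \
            open(test_path, 'w', encoding='utf-8') as f_test:
        # The label is the speaker's position in the sorted folder list.
        for label, speaker in enumerate(speakers):
            speaker_dir = os.path.join(dataset_dir, speaker)
            for name in sorted(os.listdir(speaker_dir)):
                if not name.lower().endswith('.wav'):
                    continue
                path = os.path.join(speaker_dir, name).replace('\\', '/')
                # Every test_every-th sample goes to the test list.
                target = f_test if sound_sum % test_every == 0 else f_train
                target.write('%s\t%d\n' % (path, label))
                sound_sum += 1
    return len(speakers), sound_sum


def read_data_list(data_list_path):
    """Parse a data list back into (audio_path, label) tuples."""
    items = []
    with open(data_list_path, 'r', encoding='utf-8') as f:
        for line in f:
            if not line.strip():
                continue
            audio_path, label = line.rstrip('\n').split('\t')
            items.append((audio_path, int(label)))
    return items
```

If several datasets are mixed, the labels from each one would need an offset so that speaker IDs stay unique across the combined list.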
Data reading