Using tensorflow to realize voiceprint recognition
2022-07-02 23:55:00 【TomCruisePro】
Preface
This article shows how to implement a simple voiceprint recognition model with TensorFlow. You should first be familiar with audio classification; if you are not, see my previous article, "Sound classification based on TensorFlow". Building on that, we train a voiceprint recognition model that identifies who the speaker is, which can be applied to voice verification projects. The difference from the previous article is that this project uses ArcFace loss, the Additive Angular Margin Loss. ArcFace normalizes both the feature vectors and the weights and adds an angular margin to the angle θ; an angular margin affects the angle more directly than a cosine margin does.
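To make the ArcFace idea concrete, here is a minimal pure-Python sketch for a single sample. It is not the code used in this project (which is built on TensorFlow); the function names and the `margin` and `scale` values are illustrative assumptions, and the handling of the case θ + margin > π that full implementations include is left out.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def arcface_logits(feature, class_weights, label, margin=0.5, scale=30.0):
    """Return scaled cosine logits, with the angular margin added to the true class.

    feature: embedding of one utterance (list of floats)
    class_weights: one weight vector per speaker
    label: index of the true speaker
    margin, scale: illustrative values, not taken from the article
    """
    x = l2_normalize(feature)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        # With both vectors normalized, the dot product equals cos(theta).
        cos_theta = sum(a * b for a, b in zip(x, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        if idx == label:
            # Additive angular margin: cos(theta + m) for the true class only.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


if __name__ == "__main__":
    feature = [1.0, 1.0]                # halfway between the two class directions
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical speakers
    plain = cross_entropy(arcface_logits(feature, weights, 0, margin=0.0, scale=10.0), 0)
    with_margin = cross_entropy(arcface_logits(feature, weights, 0, margin=0.5, scale=10.0), 0)
    print("loss without margin:", plain)
    print("loss with margin:", with_margin)
```

The margin makes the true class harder to satisfy, so the loss for the same embedding is larger. During training this pushes embeddings of the same speaker closer together in angle.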
Environment
Python 3.8
TensorFlow 2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
Installation
1. pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2. pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Create data
1. This tutorial uses a Chinese speech corpus. The dataset contains voice data from 3342 speakers, with more than 1,130,000 utterances in total. If you have other, better datasets, you can mix them in. The Python tool module aukit can be used to process the audio, reduce noise, and remove silence.
2. First create a data list. Each line has the format <audio file path, speaker label>, separated by a tab. The list makes the data convenient to read later, and it also makes it easy to bring in other speech datasets. The label is the unique ID of a speaker. For a different dataset, write a corresponding function that generates the list, and write all datasets into the same data list.
3. Put the code below in create_data.py. The Chinese speech corpus is in MP3 format, and I found that reading this format is very slow, so all MP3 audio is converted to WAV. After the data list is created, some entries may be corrupt, so the list is checked and the bad entries are deleted. Run the following program to complete data preparation.
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate the data lists
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w')
    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w')

    sound_sum = 0
    speakers = []
    speakers_dict = {}
    for line in tqdm(lines):
        line = json.loads(line.replace('\n', ''))
        # Skip audio shorter than 1.3 seconds
        duration_ms = line['duration_ms']
        if duration_ms < 1300:
            continue
        # Assign each new speaker the next integer label
        speaker = line['speaker']
        if speaker not in speakers:
            speakers_dict[speaker] = len(speakers)
            speakers.append(speaker)
        label = speakers_dict[speaker]
        sound_path = os.path.join(zhvoice_path, line['index'])
        save_path = "%s.wav" % sound_path[:-4]
        # Convert MP3 to WAV once, then delete the original MP3
        if not os.path.exists(save_path):
            try:
                wav = AudioSegment.from_mp3(sound_path)
                wav.export(save_path, format="wav")
                os.remove(sound_path)
            except Exception as e:
                print('Data error: %s, message: %s' % (sound_path, e))
                continue
        # Every 200th sample goes to the test list, the rest to the training list
        if sound_sum % 200 == 0:
            f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        else:
            f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        sound_sum += 1

    f_test.close()
    f_train.close()


# Remove audio files that cannot be loaded
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    lines1 = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            # Keep the entry only if the audio loads without error
            load_audio(audio_path)
            lines1.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in lines1:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
After running the code above, data lists in the following format are generated. To use custom data, follow the same format: the relative path of the audio file first, followed by the speaker label. For example:
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
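When adding custom data, it can help to check a list before training. The sketch below parses tab-separated lines as written by create_data.py and counts utterances per speaker. The function names `parse_data_list` and `utterances_per_speaker` are hypothetical helpers, not part of the project, and every example path except the first is made up for illustration.

```python
from collections import Counter


def parse_data_list(lines):
    """Parse '<audio path><TAB><speaker label>' lines into (path, label) tuples."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so a path containing tabs would still parse.
        path, label = line.rsplit('\t', 1)
        samples.append((path, int(label)))
    return samples


def utterances_per_speaker(samples):
    """Count how many utterances each speaker label has."""
    return Counter(label for _, label in samples)


if __name__ == "__main__":
    example_lines = [
        "dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav\t3238\n",
        "dataset/custom/speaker_a/utt_001.wav\t0\n",  # hypothetical custom data
        "dataset/custom/speaker_a/utt_002.wav\t0\n",  # hypothetical custom data
    ]
    samples = parse_data_list(example_lines)
    counts = utterances_per_speaker(samples)
    print("samples:", len(samples), "speakers:", len(counts))
```

To check a real list, pass the open file instead, for example `parse_data_list(open('dataset/train_list.txt', encoding='utf-8'))`. Speakers with very few utterances are easy to spot in the resulting counter.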
Data reading