Using TensorFlow to implement voiceprint recognition
2022-07-02 23:55:00 【TomCruisePro】
Preface
This article shows how to implement a simple voiceprint recognition model with TensorFlow. You should first be familiar with audio classification; if you are not, see my previous article on sound classification with TensorFlow. Building on that, we train a voiceprint recognition model that can identify who the speaker is, which is useful in projects that need audio-based verification. The difference from the previous article is that this project uses ArcFace loss (Additive Angular Margin Loss). It normalizes both the feature vectors and the weights and adds an angular margin to θ, the angle between a feature and the weight of its target class. An angular margin acts on the angle more directly than a cosine margin does.
Environment
Python 3.8
TensorFlow 2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
Installation
1. pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2. pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Create data
1. This tutorial uses a Chinese speech corpus that contains voice data from 3,342 speakers, more than 1,130,000 utterances in total. If you have other, better datasets, you can mix them in. The Python tool module aukit can be used to process the audio, including noise reduction and silence removal.
2. First create a data list. Each line has the format <voice file path, tab, voice label>. The list makes later reading convenient and also makes it easy to bring in other voice datasets. The voice label is the unique ID of each speaker. For a different voice dataset, write a corresponding function that generates the list entries and append them to the same data list.
3. Write the following code in create_data.py. The Chinese speech corpus ships in MP3 format, and I found that reading this format is very slow, so the script converts all MP3 audio to WAV. After the data list is created, some entries may be corrupt, so the script also checks every file and deletes the bad entries. Run the following program to complete the data preparation.
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate data list
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Write the lists as UTF-8 so remove_error_audio() can read them back
    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w', encoding='utf-8')
    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w', encoding='utf-8')

    sound_sum = 0
    speakers = []
    speakers_dict = {}
    for line in tqdm(lines):
        line = json.loads(line.replace('\n', ''))
        # Skip clips shorter than 1.3 seconds
        duration_ms = line['duration_ms']
        if duration_ms < 1300:
            continue
        # Assign each new speaker the next integer label
        speaker = line['speaker']
        if speaker not in speakers:
            speakers_dict[speaker] = len(speakers)
            speakers.append(speaker)
        label = speakers_dict[speaker]
        sound_path = os.path.join(zhvoice_path, line['index'])
        save_path = "%s.wav" % sound_path[:-4]
        # Convert MP3 to WAV once, then delete the original MP3
        if not os.path.exists(save_path):
            try:
                wav = AudioSegment.from_mp3(sound_path)
                wav.export(save_path, format="wav")
                os.remove(sound_path)
            except Exception as e:
                print('Data error: %s, info: %s' % (sound_path, e))
                continue
        # Every 200th sample goes to the test list, the rest to the training list
        if sound_sum % 200 == 0:
            f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        else:
            f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        sound_sum += 1

    f_test.close()
    f_train.close()


# Delete error audio
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    lines1 = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            # Keep the entry only if the audio loads without raising
            load_audio(audio_path)
            lines1.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    # Rewrite the list with the readable entries only
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in lines1:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
After running the code above, data lists in the following format are generated. If you want to use custom data, follow the same format: the relative path of the audio file comes first, then a tab, then the speaker label. For example:
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
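When preparing custom data, it can help to read a list back and check how many utterances each speaker has. The following is a small sketch under the assumption that each line is a path and an integer label separated by a tab. The function name parse_data_list and the dataset/custom paths are hypothetical and not part of the project.

```python
from collections import Counter


def parse_data_list(lines):
    """Parse '<audio path>\\t<speaker label>' lines into (path, label) pairs."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last tab so the label is always the final field.
        path, label = line.rsplit('\t', 1)
        samples.append((path, int(label)))
    return samples


example_lines = [
    "dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav\t3238\n",
    "dataset/custom/speaker_a/0001.wav\t0\n",  # hypothetical custom entries
    "dataset/custom/speaker_a/0002.wav\t0\n",
]
samples = parse_data_list(example_lines)
utterances_per_speaker = Counter(label for _, label in samples)
```

For a real list, pass an open file object instead of example_lines, for example the result of open('dataset/train_list.txt', encoding='utf-8').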
Data reading