当前位置:网站首页>Using tensorflow to realize voiceprint recognition
Using tensorflow to realize voiceprint recognition
2022-07-02 23:55:00 【TomCruisePro】
Preface
This paper introduces the use of tensorflow Implement a simple voiceprint recognition model , First, you need to be familiar with audio classification , If you don't know, you can check my last article - be based on tensorflow Realize sound classification , On this basis , We train a voiceprint recognition model , Through this model, we can identify who the speaker is , It can be applied to some audio verification projects . The difference is that this project uses ArcFace loss,ArcFace loss:Additive Angular Margin Loss( Additive angular interval loss function ), Normalize the feature vector and weight , Yes θ Plus angular spacing , The influence of angle interval on angle is more direct than cosine interval .
Usage environment
python3.8
tensorflow2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
install
1.pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2.pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Create data
1. This niuma uses a Chinese speech corpus , This dataset has 3342 Personal voice data , Yes 1130000+ Voice data . If the reader has other better data sets , Can be mixed together , Use python Tool module of aukit Processing audio 、 Noise reduction and mute removal
2. First create a data list ,** The format of the data list is < Voice file path / Voice tag >, Creating this list is mainly It is convenient to read later , It is also convenient to read other voice data sets , Voice classification tags are the unique identification of different people id, Different voice data sets , You can write the corresponding function to generate the data list , Write these data sets in the data list
3. stay create_data.py Write down the code in , Because Chinese speech corpus This data set is mp3 Format , Ben niuma found that the reading speed of this format is very slow , So I put all mp3 Format of audio conversion to wav Format , After creating the data list , Some data may be wrong , So we need to check , Delete the wrong data . Execute the following procedure to complete data preparation
import json
import os
from pydub import AudioSegment
from tqdm import tqdm
from utils.reader import load_audio
# Generate data list
def get_data_list(infodata_path, list_path, zhvoice_path):
with open(infodata_path, 'r', encoding='utf-8') as f:
lines = f.readlines()
f_train = open(os.path.join(list_path, 'train_list.txt'), 'w')
f_test = open(os.path.join(list_path, 'test_list.txt'), 'w')
sound_sum = 0
speakers = []
speakers_dict = {
}
for line in tqdm(lines):
line = json.loads(line.replace('\n', ''))
duration_ms = line['duration_ms']
if duration_ms < 1300:
continue
speaker = line['speaker']
if speaker not in speakers:
speakers_dict[speaker] = len(speakers)
speakers.append(speaker)
label = speakers_dict[speaker]
sound_path = os.path.join(zhvoice_path, line['index'])
save_path = "%s.wav" % sound_path[:-4]
if not os.path.exists(save_path):
try:
wav = AudioSegment.from_mp3(sound_path)
wav.export(save_path, format="wav")
os.remove(sound_path)
except Exception as e:
print(' Data error :%s, Information :%s' % (sound_path, e))
continue
if sound_sum % 200 == 0:
f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
else:
f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
sound_sum += 1
f_test.close()
f_train.close()
# Delete error audio
def remove_error_audio(data_list_path):
with open(data_list_path, 'r', encoding='utf-8') as f:
lines = f.readlines()
lines1 = []
for line in tqdm(lines):
audio_path, _ = line.split('\t')
try:
spec_mag = load_audio(audio_path)
lines1.append(line)
except Exception as e:
print(audio_path)
print(e)
with open(data_list_path, 'w', encoding='utf-8') as f:
for line in lines1:
f.write(line)
if __name__ == '__main__':
get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
remove_error_audio('dataset/train_list.txt')
remove_error_audio('dataset/test_list.txt')
After executing the above code , The following data formats will be generated ,, If you want to customize data , Refer to the following data list , The front is the relative path of audio , Behind it is the label of the speaker , for example
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
data fetch
边栏推荐
- How much do you know about synchronized?
- How difficult is it to be high? AI rolls into the mathematics circle, and the accuracy rate of advanced mathematics examination is 81%!
- sourcetree 详细
- VIM interval deletion note
- 公司里只有一个测试是什么体验?听听他们怎么说吧
- 【STL源码剖析】仿函数(待补充)
- Fusion de la conversion et de la normalisation des lots
- MFC 获取当前时间
- A single element in an ordered array -- Valentine's Day mental problems
- Top Devops tool chain inventory
猜你喜欢

MySQL Foundation

Explain in detail the process of realizing Chinese text classification by CNN

基于OpenCV实现口罩识别

非路由组件之头部组件和底部组件书写
![MATLAB signal processing [Q & a notes-1]](/img/53/ae081820fe81ce28e1f04914678a6f.png)
MATLAB signal processing [Q & a notes-1]

Practical series - free commercial video material library
![[shutter] shutter photo wall (center component | wrap component | clickrrect component | stack component | positioned component | button combination component)](/img/c5/2f65d37682607aab98443d7f1ba775.jpg)
[shutter] shutter photo wall (center component | wrap component | clickrrect component | stack component | positioned component | button combination component)

Improvement of RTP receiving and sending PS stream tool (II)

流媒体技术优化

JDBC练习案例
随机推荐
MATLAB signal processing [Q & a notes-1]
Returns the root node of the largest binary search subtree in a binary tree
MySQL Foundation
Request and response
实用系列丨免费可商用视频素材库
Sourcetree details
leetcode 650. 2 Keys Keyboard 只有两个键的键盘(中等)
Writing of head and bottom components of non routing components
JSON数据传递参数
接口差异测试——Diffy工具
Happy Lantern Festival, how many of these technical lantern riddles can you guess correctly?
[live broadcast appointment] database obcp certification comprehensive upgrade open class
【OJ】两个数组的交集(set、哈希映射 ...)
PHP get real IP
Detailed explanation of 'viewpager' in compose | developer said · dtalk
leetcode 650. 2 keys keyboard with only two keys (medium)
【STL源码剖析】仿函数(待补充)
Leetcode DP three step problem
Interface difference test - diffy tool
一文掌握基于深度学习的人脸表情识别开发(基于PaddlePaddle)