Using TensorFlow to implement voiceprint recognition
2022-07-02 23:55:00 【TomCruisePro】
Preface
This article shows how to implement a simple voiceprint recognition model with TensorFlow. You should first be familiar with audio classification; if you are not, see my previous article, "Sound classification based on TensorFlow". Building on that, we train a voiceprint recognition model that can identify who the speaker is, which is useful in projects that need voice-based verification. The difference from the previous article is that this project uses ArcFace loss, that is, Additive Angular Margin Loss. It normalizes both the feature vectors and the weights and adds an angular margin to the angle θ; an angular margin affects the angle more directly than a cosine margin does.
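To make the angular margin concrete, here is a minimal pure-Python sketch of how ArcFace-style logits can be computed for a single sample. It is not the project's TensorFlow code: the function names `l2_normalize` and `arcface_logits` are hypothetical, and the scale `s=64.0` and margin `m=0.5` are only illustrative values. It also ignores the case where θ plus the margin exceeds π, which real implementations usually handle separately.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def arcface_logits(embedding, class_weights, label, s=64.0, m=0.5):
    """Return one logit per class, with the margin m added to the target angle.

    embedding:     feature vector of one utterance
    class_weights: one weight vector per speaker
    label:         index of the true speaker
    s, m:          scale and angular margin (illustrative values, an assumption)
    """
    e = l2_normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        w = l2_normalize(w)
        # After normalization the dot product is cos(theta).
        cos_theta = sum(a * b for a, b in zip(e, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
        theta = math.acos(cos_theta)
        if j == label:
            theta += m  # the additive angular margin applies to the true class only
        logits.append(s * math.cos(theta))
    return logits


if __name__ == "__main__":
    print(arcface_logits([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], label=0))
```

The logits would then go through an ordinary softmax cross-entropy. Because the margin lowers the logit of the true speaker, the model has to pull embeddings of the same speaker closer together to keep the loss low.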
Environment
Python 3.8
TensorFlow 2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
Installation
1. pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2. pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
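After installing, it can help to confirm that the expected versions are actually present. The following is a small sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper names `installed_version` and `check_environment` are my own, not part of this project.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected):
    """Compare installed versions against a {package name: version} mapping.

    Returns {package name: (installed version or None, matches expected)}.
    """
    report = {}
    for name, wanted in expected.items():
        found = installed_version(name)
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    # The version listed in the Environment section above.
    for name, (found, ok) in check_environment({"tensorflow": "2.3.0"}).items():
        print(name, found, "OK" if ok else "mismatch")
```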
Create data
1. This tutorial uses a Chinese speech corpus. The dataset contains voice data from 3342 speakers, with more than 1,130,000 utterances in total. If you have other, better datasets, you can mix them in. The Python tool module aukit can be used to process the audio, including noise reduction and silence removal.
2. First create a data list. Each line of the list has the format <audio file path\tspeaker label>. The list makes the data easy to read later and also makes it easy to add other voice datasets. The label is a unique id for each speaker. For a different dataset, write a corresponding function that generates its entries and add them to the same data list.
3. Put the following code in create_data.py. The Chinese speech corpus is distributed in MP3 format, and I found that reading this format is very slow, so the script converts every MP3 file to WAV. After the data list is created, some entries may still be broken, so the script checks every file and deletes the bad entries. Run the following program to complete data preparation:
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate the data lists
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    # Write as UTF-8, because remove_error_audio() reads the lists back as UTF-8
    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w', encoding='utf-8')
    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w', encoding='utf-8')

    sound_sum = 0
    speakers = []
    speakers_dict = {}
    for line in tqdm(lines):
        line = json.loads(line.replace('\n', ''))
        # Skip audio shorter than 1.3 seconds
        duration_ms = line['duration_ms']
        if duration_ms < 1300:
            continue
        # Assign each new speaker the next integer label
        speaker = line['speaker']
        if speaker not in speakers:
            speakers_dict[speaker] = len(speakers)
            speakers.append(speaker)
        label = speakers_dict[speaker]
        sound_path = os.path.join(zhvoice_path, line['index'])
        save_path = "%s.wav" % sound_path[:-4]
        # Convert MP3 to WAV once, then delete the MP3
        if not os.path.exists(save_path):
            try:
                wav = AudioSegment.from_mp3(sound_path)
                wav.export(save_path, format="wav")
                os.remove(sound_path)
            except Exception as e:
                print('Data error: %s, info: %s' % (sound_path, e))
                continue
        # Every 200th sample goes to the test list, the rest to the training list
        if sound_sum % 200 == 0:
            f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        else:
            f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        sound_sum += 1

    f_test.close()
    f_train.close()


# Remove entries whose audio cannot be loaded
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    lines1 = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            # Loading is only a validity check; the result is not used
            spec_mag = load_audio(audio_path)
            lines1.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    # Rewrite the list with only the entries that loaded successfully
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in lines1:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
After running the code above, data lists in the following format are generated. If you want to use custom data, follow the same format: the relative path of the audio file comes first, followed by the speaker label (in the file the two fields are separated by a tab character). For example:
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
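As a quick sanity check on a generated or hand-written list, the entries can be parsed and counted per speaker. This is a minimal sketch that assumes the tab-separated format written by create_data.py; the helper names `parse_data_list` and `summarize` are hypothetical and not part of the project.

```python
from collections import Counter


def parse_data_list(lines):
    """Parse tab-separated 'path, label' lines into (path, int label) tuples."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the last tab so that unusual paths cannot break the label
        path, label = line.rsplit("\t", 1)
        samples.append((path, int(label)))
    return samples


def summarize(samples):
    """Return (number of samples, number of speakers, samples per speaker)."""
    per_speaker = Counter(label for _, label in samples)
    return len(samples), len(per_speaker), per_speaker


if __name__ == "__main__":
    demo = ["dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav\t3238\n"]
    total, speakers, _ = summarize(parse_data_list(demo))
    print("samples:", total, "speakers:", speakers)
```

To check a real list, pass an open file instead of the demo lines, for example the dataset/train_list.txt file produced above. A malformed line raises ValueError, which points directly at the entry that needs fixing.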
Data reading