Implementing voiceprint recognition with TensorFlow
2022-07-02 23:55:00 【TomCruisePro】
Preface
This article shows how to implement a simple voiceprint recognition model with TensorFlow. It assumes you are familiar with audio classification; if not, see my previous article on sound classification with TensorFlow. Building on that, we train a voiceprint recognition model that identifies who the speaker is, which can be used in projects such as voice-based verification. The difference from the previous article is that this project uses ArcFace loss, that is, Additive Angular Margin Loss. It normalizes both the feature vectors and the weights and adds an angular margin to the angle θ between them. An angular margin affects the angle more directly than a cosine margin does.
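The ArcFace idea described above can be sketched in a few lines of NumPy. This is only an illustration, not the project's training code: the function names, the margin of 0.5 and the scale of 30 are assumptions chosen for the example, and the sketch skips the special handling that full implementations apply when θ plus the margin exceeds π.

```python
import numpy as np


def arcface_logits(embeddings, weights, labels, margin=0.5, scale=30.0):
    """Return scaled logits with an additive angular margin on the target class.

    embeddings: (batch, dim) feature vectors
    weights:    (dim, num_classes) class weight matrix
    labels:     (batch,) integer speaker ids
    margin and scale are illustrative defaults, not values from the article.
    """
    # L2-normalize features and class weights so their dot product is cos(theta)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_theta = np.clip(x @ w, -1.0, 1.0)

    # Add the margin to the angle of the ground-truth class only
    theta = np.arccos(cos_theta)
    one_hot = np.eye(weights.shape[1])[labels]
    return scale * np.cos(theta + margin * one_hot)


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels against raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Toy data: 4 utterances, 8-dimensional embeddings, 3 speakers
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 8))
weights = rng.normal(size=(8, 3))
labels = np.array([0, 2, 1, 0])
loss = softmax_cross_entropy(arcface_logits(embeddings, weights, labels), labels)
print("ArcFace loss:", loss)
```

Because the margin makes the target class harder to match, the model has to pull embeddings of the same speaker closer together in angle, which is what makes the features useful for verification.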
Environment
Python 3.8
TensorFlow 2.3.0
Model download
https://download.csdn.net/download/qq_33200967/20368421
Installation
1. Install TensorFlow: pip install tensorflow==2.3.0 -i https://mirrors.aliyun.com/pypi/simple/
2. Install the remaining dependencies: pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/
Create data
1. This tutorial uses a Chinese speech corpus (zhvoice). The dataset contains speech from 3,342 speakers, more than 1,130,000 utterances in total. If you have other, better datasets, you can mix them in. Use the Python tool module aukit to process the audio, including noise reduction and silence removal.
2. First, create a data list. Each line has the format <audio file path\tspeaker label>. The list makes the data easier to read later and also makes it easy to bring in other speech datasets. The label is the unique ID of each speaker. For a different speech dataset, write a corresponding function that generates its entries and writes them into the same data list.
3. Put the code below in create_data.py. The Chinese speech corpus is in MP3 format, which I found very slow to read, so all MP3 audio is converted to WAV. After the data list is created, some of the data may be corrupt, so it has to be checked and the bad entries deleted. Run the following program to complete data preparation:
import json
import os

from pydub import AudioSegment
from tqdm import tqdm

from utils.reader import load_audio


# Generate the training and test data lists
def get_data_list(infodata_path, list_path, zhvoice_path):
    with open(infodata_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()

    f_train = open(os.path.join(list_path, 'train_list.txt'), 'w', encoding='utf-8')
    f_test = open(os.path.join(list_path, 'test_list.txt'), 'w', encoding='utf-8')

    sound_sum = 0
    speakers_dict = {}  # speaker name -> integer label
    for line in tqdm(lines):
        line = json.loads(line.replace('\n', ''))
        # Skip clips shorter than 1.3 seconds
        duration_ms = line['duration_ms']
        if duration_ms < 1300:
            continue
        # Assign the next free integer label to each new speaker
        speaker = line['speaker']
        if speaker not in speakers_dict:
            speakers_dict[speaker] = len(speakers_dict)
        label = speakers_dict[speaker]
        sound_path = os.path.join(zhvoice_path, line['index'])
        save_path = "%s.wav" % sound_path[:-4]
        # Convert MP3 to WAV once, then delete the original MP3
        if not os.path.exists(save_path):
            try:
                wav = AudioSegment.from_mp3(sound_path)
                wav.export(save_path, format="wav")
                os.remove(sound_path)
            except Exception as e:
                print('Data error: %s, message: %s' % (sound_path, e))
                continue
        # Every 200th sample goes to the test list, the rest to the training list
        if sound_sum % 200 == 0:
            f_test.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        else:
            f_train.write('%s\t%d\n' % (save_path.replace('\\', '/'), label))
        sound_sum += 1

    f_test.close()
    f_train.close()


# Remove entries whose audio cannot be loaded
def remove_error_audio(data_list_path):
    with open(data_list_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    lines1 = []
    for line in tqdm(lines):
        audio_path, _ = line.split('\t')
        try:
            # Loading is only a validity check; the result is not used
            load_audio(audio_path)
            lines1.append(line)
        except Exception as e:
            print(audio_path)
            print(e)
    # Rewrite the list with only the entries that loaded successfully
    with open(data_list_path, 'w', encoding='utf-8') as f:
        for line in lines1:
            f.write(line)


if __name__ == '__main__':
    get_data_list('dataset/zhvoice/text/infodata.json', 'dataset', 'dataset/zhvoice')
    remove_error_audio('dataset/train_list.txt')
    remove_error_audio('dataset/test_list.txt')
After running the code above, data lists in the following format are generated. To use custom data, follow the same format: the relative path of the audio file comes first, followed by a tab and the speaker's label, for example:
dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav 3238
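To check a custom data list against this format, something like the following sketch can help. It is not part of the project: `parse_data_list` is a hypothetical helper, and the second and third example entries are made up for illustration (only the first comes from the list shown above).

```python
from collections import Counter


def parse_data_list(lines):
    """Split each '<path>\\t<label>' line into a (path, int label) pair."""
    samples = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # tolerate blank lines
        path, label = line.split('\t')
        samples.append((path, int(label)))
    return samples


# Example entries in the same format as train_list.txt.
# Only the first one comes from the article; the others are invented.
example_lines = [
    "dataset/zhvoice/zhmagicdata/5_895/5_895_20170614203758.wav\t3238\n",
    "dataset/zhvoice/zhmagicdata/5_895/example_utterance_a.wav\t3238\n",
    "dataset/zhvoice/zhmagicdata/5_896/example_utterance_b.wav\t3239\n",
]

samples = parse_data_list(example_lines)
utterances_per_speaker = Counter(label for _, label in samples)
print(len(samples), "utterances from", len(utterances_per_speaker), "speakers")
```

For a real list, the same function can be applied to the lines read from train_list.txt; a line that does not contain exactly one tab or has a non-integer label will raise an error, which points at the entry that needs fixing.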
Data reading