Industry Insight | Is Speech Recognition Really Better Than the Human Ear?
2022-07-28 03:09:00 【Magic Data】

In recent years, advances in artificial intelligence have significantly improved the performance of speech recognition. Many companies claim that their speech recognition accuracy exceeds 98%. But does speech recognition really outperform the human ear?
That conclusion does not follow. The human brain is, after all, the most precise instrument in the world. As a popular saying on the Internet puts it, "quoting an accuracy figure without naming the test set is just bluffing." In a quiet environment, recognition accuracy is about 98%, but in a noisy environment it drops rapidly.
At a party, a speech recognizer struggles to pick the target speaker out of overlapping speech, let alone transcribe it accurately. This is a classic problem in the field of speech recognition: the cocktail party problem (Cocktail Party Problem). Picking out the voice you want to attend to from a mix of sounds is instinctive for humans. For machines it is a serious difficulty: the target speech must first be isolated by speech separation before it can be recognized.
Speech Separation Algorithms Based on Neural Networks
Speech separation is the first step in solving the cocktail party problem in speech recognition. Adding speech separation to the front end of a speech recognition system, so that the target speaker's voice is separated from other interference, improves the robustness of the system. The cocktail party problem refers to the situation in which the captured audio contains, besides the main speaker, interfering voices from other people and noise. The goal of speech separation is to separate the main speaker's speech from this interference.
At present, the mainstream speech separation algorithms are based on neural networks. The network's main task is to learn an ideal binary mask (Ideal Binary Mask, IBM) that determines in which time-frequency units (Time-frequency Units) of the spectrum the target signal dominates. If an audio signal is represented along two dimensions, time and frequency, it can be written as a two-dimensional matrix, and each element of that matrix is called a time-frequency unit. If the target signal does not need to be resolved any more finely than an either-or decision, where each unit belongs either to the target source or to the background noise, then each time-frequency unit can be quantized to one of two values, such as 0 and 1. This is what "binary" means. Seen through the ideal binary mask, speech separation becomes a supervised learning (Supervised Learning) classification problem.
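To make the ideal binary mask concrete, here is a minimal sketch in Python with NumPy. It is not taken from any particular system: the function name `ideal_binary_mask`, the toy spectrogram values, and the 0 dB threshold `lc_db` (often called the local criterion) are assumptions chosen for illustration. The sketch also assumes the clean target and the noise are available separately, which is only true when building training labels.

```python
import numpy as np

def ideal_binary_mask(target_mag, noise_mag, lc_db=0.0):
    """Return a 0/1 mask that is 1 where the target dominates a time-frequency unit.

    lc_db is an assumed threshold in dB; 0 dB means "target louder than noise".
    """
    eps = 1e-12  # avoids division by zero and log of zero
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(np.float32)

# Toy magnitude spectrograms: rows are frequency bins, columns are time frames.
target = np.array([[4.0, 0.1, 3.0],
                   [0.2, 5.0, 0.1]])
noise = np.array([[1.0, 2.0, 1.0],
                  [2.0, 1.0, 2.0]])

mask = ideal_binary_mask(target, noise)

# Simplification for illustration: treat mixture magnitudes as additive.
mixture = target + noise
separated = mask * mixture  # keep only the target-dominated units

print(mask)
print(separated)
```

In a supervised setup, a mask like this would serve as the per-unit label, and the network would be trained to predict it from features of the mixture alone.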
Speech Separation Algorithms Based on Multimodal Fusion
Besides separating speech from audio alone, many recent papers tackle the cocktail party problem with multimodal methods. Google collected 100,000 high-quality lecture and talk videos from YouTube to generate training samples, and used about 2,000 hours of video clips to train a model based on a multi-stream convolutional neural network (CNN) that splits synthetic cocktail-party mixtures into a separate audio stream for each speaker in the video. In the experiments, the input is a video with one or more speakers whose speech is disturbed by other speakers or by a noisy background. The output decomposes the input video's audio track into clean tracks, each matched to its speaker.
Whether multimodal or unimodal, speech separation algorithms depend on speech data, and multi-speaker speech data is costly to collect and difficult to annotate. Magic Data, as a world-leading AI data service provider, can supply algorithm engineers with a large amount of high-quality data as a basis for experiments on the cocktail party problem.
In his book On Human Communication, published in 1957, Edward Colin Cherry wrote: "Up to now, no machine algorithm can solve the 'cocktail party' problem." Remarkably, that assertion has still not been completely overturned.