SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition
2022-07-27 06:52:00 【Haulyn5】
Preface
Citation: H. Abdullah, K. Warren, V. Bindschaedler, N. Papernot, and P. Traynor, “SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems,” in 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24-27 May 2021, pp. 730–747. doi: 10.1109/SP40001.2021.00014.
Paper title: SoK: The Faults in our ASRs: An Overview of Attacks against Automatic Speech Recognition and Speaker Identification Systems
The full title is too long for CSDN's title length limit, so the post title is cut short. The paper was published at S&P 2021 and is a survey of attacks and defenses for ASR.
Main text
Summary
This paper surveys attacks and defenses for two tasks, ASR and SI (speech-to-text and speaker identification, respectively), focusing on approaches based on neural networks. The authors argue that attack and defense in the audio domain still differ substantially from the image domain (the notes below explain this in a little more detail), and then classify prior work along several dimensions. Finally, the appendix describes a small experiment: the authors trained several models themselves, with identical hyperparameters and only the seed differing, and the adversarial examples generated against one model did not transfer to the others. Not one succeeded.
Close-reading notes
Introduction
The authors summarize their contributions as follows:
- They classify the threat models relevant to VPSes (Voice Processing Systems), and claim to be the first to do so.
- They classify existing work using the framework from the previous contribution. According to the authors, the classification shows that, although there are quite a few papers, a lot of work remains to be done.
- They test transferability. Even with essentially the same train/test split, the same hyperparameters, and even the same seed and model architecture, no adversarial example transferred successfully.
Background
The section opens with a brief look at psychoacoustics. ASR systems imitate how the human brain processes speech, so the paper covers the nonlinearity of human auditory perception and the frequency limits of human hearing.
Next comes the VPS (voice processing system). There are two kinds of VPS, ASR and SI (Speaker Identification). Because the two are very similar, the paper uses ASR as the example to walk through the processing pipeline.
(The flowchart of the pipeline was drawn with Mermaid. I could not find how to do this in the rich text editor, so I switched to the Markdown editor.)
This pipeline is mostly standard. The first stage is preprocessing. Interestingly, the paper mentions that detection algorithms such as the one in G.729 can be used to find the segments that contain human speech; I checked the reference, and it is a technique from '99. The other preprocessing step is filtering, usually a low-pass filter, so that frequency components above a cutoff are not used.
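As a rough illustration of the filtering step, here is a minimal sketch of a low-pass filter that simply zeroes every frequency bin above a cutoff. The function name `low_pass`, the 4000 Hz cutoff, and the test tones are all assumptions made for this example; the paper does not specify a particular filter design, and real systems typically use a proper FIR or IIR filter rather than this brick-wall version.

```python
import numpy as np

def low_pass(signal, sample_rate, cutoff_hz):
    """Zero out every frequency component above cutoff_hz (ideal brick-wall filter)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# Hypothetical input: a 300 Hz tone (kept) plus a 7000 Hz tone (removed).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate           # one second of audio
speech_band = np.sin(2 * np.pi * 300 * t)
high_tone = 0.5 * np.sin(2 * np.pi * 7000 * t)
filtered = low_pass(speech_band + high_tone, sample_rate, cutoff_hz=4000)
```

Anything an attacker hides above the cutoff never reaches the model, which is one reason preprocessing matters for audio attacks.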
Feature extraction is usually MFCC, and the paper also uses MFCC as its example. I should study the design of MFCC properly when I get the chance, since it is worth understanding the formulas and the logic behind them. The MFCC procedure is: apply the DFT (discrete Fourier transform) to move to the frequency domain (frame by frame, of course); apply mel filtering, which boosts or attenuates different frequencies according to the sensitivity of human hearing; take the logarithm, because human perception of loudness is logarithmic; and finally apply the DCT (discrete cosine transform). This last step is the more interesting one: some explanations I read describe it as extracting the envelope. The DCT is also something I learned long ago in information hiding, and this part deserves a longer write-up if I get the chance.
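The four MFCC steps described above (DFT per frame, mel filtering, logarithm, DCT) can be sketched with NumPy alone. This is a simplified illustration, not the paper's or any library's implementation: the frame length, hop size, FFT size, number of filters, and number of coefficients are common defaults that I am assuming here, and the helper names are made up for the example.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def dct_ii(x, n_out):
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, sample_rate, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    # Split the signal into overlapping, windowed frames.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Step 1: DFT of each frame, turned into a power spectrum.
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Step 2: mel filtering, weighting frequencies roughly as the ear does.
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    # Step 3: logarithm, since loudness perception is roughly logarithmic.
    log_energy = np.log(mel_energy + 1e-10)
    # Step 4: DCT, keeping the low-order coefficients (the spectral envelope).
    return dct_ii(log_energy, n_coeffs)

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
features = mfcc(np.sin(2 * np.pi * 440 * t), sample_rate)
print(features.shape)  # (number of frames, coefficients per frame)
```

The result is a matrix with one row per frame, which is the input the network sees in the next stage.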
In short, feature extraction produces a matrix with one feature vector per frame, which is then fed to the network for inference. For SI, the process ends here with a classification. For ASR, the model outputs probabilities, so beam search is used to find the text with the highest overall probability across all the frames. Even then, in the authors' words, the result is hard for humans to read, because several frames may represent a single letter: the output might be LLOOOCK when the word is actually LOCK. Some further processing, which the paper calls decoding, produces the final result. For the details of the ASR pipeline, see Professor Hung-yi Lee's (Li Hongyi) DLHLP 2020 course.
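To make the last two steps concrete, here is a toy sketch of beam search over per-frame probabilities followed by the collapse from LLOOOCK to LOCK. Several things are assumptions for illustration: the five-symbol alphabet, the made-up probabilities, and the blank symbol `_`, which is a CTC-style convention that these notes do not confirm the paper uses. A production decoder would also score collapsed sequences and usually consult a language model.

```python
import math
from itertools import groupby

def beam_search(frame_probs, alphabet, beam_width=3):
    """Keep the beam_width most probable label sequences, frame by frame."""
    beams = [("", 0.0)]  # (sequence so far, accumulated log-probability)
    for probs in frame_probs:
        candidates = [
            (seq + symbol, score + math.log(p))
            for seq, score in beams
            for symbol, p in zip(alphabet, probs)
            if p > 0
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

def collapse(frame_labels, blank="_"):
    """Merge runs of the same label, then drop the blank symbol."""
    merged = (label for label, _ in groupby(frame_labels))
    return "".join(label for label in merged if label != blank)

# Hypothetical model output: seven frames, each strongly favoring one symbol.
alphabet = "LOCK_"

def one_frame(symbol):
    return [0.9 if a == symbol else 0.025 for a in alphabet]

frame_probs = [one_frame(s) for s in "LLOOOCK"]
raw = beam_search(frame_probs, alphabet)   # one label per frame
text = collapse(raw)                       # repeated labels merged
print(raw, "->", text)
```

The blank symbol is what lets a decoder keep genuine double letters: `collapse("LL_LOCK")` keeps two Ls because the blank separates the runs.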
After all that, the authors note that each of these modules can be replaced, and that the models are mainly based on neural networks.
The paper then covers attacks on ML models from two angles, integrity and confidentiality. The first covers training-time poisoning and evasion attacks (adversarial examples); the second covers data privacy and model extraction. The paper mainly surveys adversarial examples, and goes on to introduce them briefly.
Attacks against VPSes
The authors then describe how the audio domain differs from the image domain. First, there is preprocessing. Next, speech is usually modeled as a sequence (with RNNs and the like). Then there are non-differentiable operations such as beam search. Finally, some statistical models (not neural networks) are used. All of this makes attack and defense for speech different from the image domain.
Attack threat model taxonomy
This part presents the authors' taxonomy of attacks, which classifies them from many angles. I don't think it is necessary to write all of it down; it can be looked up when needed. I only note some interesting points. For example, for the attack medium, instead of the traditional split into physical and logical (the classification used by ASVspoof), the authors list four classes: Over-Line, Over-Air, Over-Telephony-Network, and Over-Others. The first two are the commonly used categories. The third is the telephone-network scenario, which has to account for codec loss, packet loss, and various kinds of noise. The authors' example for the last class is MP3 compression as opposed to WAV.
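To get a feel for why the Over-Telephony-Network medium is harder than Over-Line, the following is a crude sketch of a lossy channel: random packets are dropped (replaced by silence) and Gaussian noise is added. Everything here is an assumption for illustration, including the function name `simulate_telephony`, the 20 ms packet size, the 5% loss rate, and the 8 kHz sample rate; it does not model a real codec and is not the evaluation setup of the paper.

```python
import numpy as np

def simulate_telephony(signal, sample_rate, loss_rate=0.05, packet_ms=20,
                       noise_std=0.01, seed=0):
    """Drop random packets (zero-filled) and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    out = np.asarray(signal, dtype=float).copy()
    packet_len = int(sample_rate * packet_ms / 1000)
    n_packets = int(np.ceil(len(out) / packet_len))
    lost = rng.random(n_packets) < loss_rate      # which packets never arrive
    for i in np.flatnonzero(lost):
        out[i * packet_len:(i + 1) * packet_len] = 0.0
    out += rng.normal(0.0, noise_std, size=len(out))
    return out, lost

# Hypothetical input: one second of a 440 Hz tone at a telephone-style 8 kHz rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t)
degraded, lost = simulate_telephony(clean, sample_rate)
print(f"{lost.sum()} of {len(lost)} packets lost")
```

An adversarial perturbation that is tuned sample by sample against the clean signal has to survive this kind of damage before it reaches the model.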
Existing attack classification
This chapter classifies existing work using the classification perspectives from the previous chapter. There is not much to say in detail; the authors published the classification results on the following website.
https://sites.google.com/view/adv-asr-sok/
Only a few interesting points are recorded here. First, the authors say that many works do not give a reasonable and effective metric for evaluating their advantages over prior work. Second, most attacks do not work over-air.
Defense and detection taxonomy
Similar to the attack taxonomy chapter, this chapter gives the classification framework for defense schemes and introduces the angles of classification.
Defense and detection classification
This chapter classifies existing defense schemes, mainly adversarial training (which reduces model accuracy and may also fail because of preprocessing) and liveness detection. Finally, looking ahead, the authors point out that one direction is to redesign preprocessing and feature extraction. (This is hard, of course: these techniques were selected over decades of development in speech processing.)
Discussion
The most valuable part comes last.
- Optimization-based attacks lack transferability, as reflected in the authors' appendix experiment, where no transfer succeeded.
- On defense: existing black-box attacks are hard to transfer, and white-box attacks almost never occur in practice. Even so, researchers designing defenses should still pay attention to protecting against black-box attacks.
- Images and audio are different, and much of the audio pipeline remains to be explored. Designing attacks and defenses against the individual components, with the whole pipeline in view, could be a future research direction.
- The authors did not find any poisoning or privacy attacks.
- So far, no defense work targets the over-telephony setting.
- Audio intelligibility lacks evaluation criteria. L2 regularization, which may be useful for images, does not match human psychoacoustic perception of sound.
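The last point, that an L2 measure does not track what people hear, can be shown with a small numerical sketch. Two perturbations of equal amplitude have the same L2 norm, yet one sits near 3 kHz, where human hearing is very sensitive, and the other at 19 kHz, close to the upper limit of hearing and inaudible to many adults. The signal, frequencies, and amplitudes below are my own choices for illustration, not values from the paper.

```python
import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate            # one second of audio
carrier = np.sin(2 * np.pi * 220 * t)               # stand-in for the original audio

# Two perturbations with the same amplitude, and therefore the same L2 norm.
mid_tone = 0.05 * np.sin(2 * np.pi * 3000 * t)      # in the ear's most sensitive range
high_tone = 0.05 * np.sin(2 * np.pi * 19000 * t)    # near the upper limit of hearing

l2_mid = np.linalg.norm((carrier + mid_tone) - carrier)
l2_high = np.linalg.norm((carrier + high_tone) - carrier)
print(l2_mid, l2_high)
```

An L2 budget treats the two perturbations as equally large, although a listener would judge them very differently, which is why an audio-specific metric is needed.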