Research on a MATLAB-based processing scheme for natural-transition dubbing
2022-06-26 16:28:00 【Zhuoqing】

Abstract: Starting from the need to modify dubbing in video post-production, this article sets the goal of blending modified audio naturally into the original footage. By analyzing the factors that make up environmental sound, it settles on a processing scheme that records an impulse response and convolves it with the audio. The feasibility of the convolution scheme is then explored through practical tests in MATLAB, and after improvement the result is acceptable. Finally, a feasible workflow is proposed and the problems encountered in the experiment are discussed.
Keywords: convolution, audio processing, impulse response
One. Background and objective
1. Background
Alongside my studies, I run a personal account on Bilibili and publish unboxing and review videos of digital products. During video production I occasionally need to modify the audio of the original material, for example to fix a slip of the tongue, correct a wrong word or add some content. For material recorded on location outside the dormitory, dubbing directly in the dormitory produces a strong mismatch between picture and sound because the recording environments differ, and the transition is not natural.
In December 2021, the author took part in the post-production editing of a class drama. In one scene, the female lead's voice was noticeably quieter than the male lead's, so post-dubbing was required. Because outdoor recording was not possible at the time, the post-dubbing was done in the dormitory. The recording itself was clean, but it had no outdoor background sound, so it blended poorly into the video.

▲ Figure 1.1 Class drama clip that needs post-processing
2. Objective
Use MATLAB-based audio processing so that audio recorded in the quiet, echo-free dormitory can be blended naturally into other environments. This reduces the workload of later modifications and also improves the overall look and feel of the final video.
Two. Theoretical analysis of the scheme
1. Factors that contribute to environmental sound
When shooting outdoors, the environment has a large effect on the recorded audio. Besides the direct sound from the people in the video, the microphone also picks up sound reflected from walls and the ground, noise from passing motor vehicles, and background noise such as air flow and the microphone's own noise.

▲ Figure 2.1 Schematic of the sounds received by the microphone during location shooting
These sounds have a considerable effect on the final recording and cannot be ignored. The sound received by the microphone can therefore be regarded as the target sound after it has passed through a specific environmental system. To simulate a location recording from audio recorded in a quiet environment, we need a way to describe that system.
2. Impulse response
The impulse response is defined as the time-domain response obtained when an impulse excitation is applied to the system under test. In acoustic analysis, the impulse response is regarded as the acoustic signature of a system: it contains a wealth of information about the system, including arrival times, frequency content, reverberation decay characteristics and the overall frequency response. Measuring the impulse response therefore gives a description of the system.

▲ Figure 2.2 Composition of an impulse response (source: Internet)
3. Using convolution to process audio
As mentioned above, the sound received by the microphone is equivalent to the original sound after processing by a specific environmental system. Let Y be the sound received by the microphone, X the original sound, and H the transfer function of the system. In the s domain:
$$Y(s) = H(s) \cdot X(s)$$
Since the system transfer function is the Laplace transform of the system's impulse response, H(s) can be obtained by actually measuring the impulse response h(t). Multiplication in the s domain corresponds to convolution in the time domain, so:
$$y(t) = h(t) * x(t)$$
Therefore, convolving audio recorded in a quiet environment with the impulse response recorded in a specific environment should, in theory, yield audio similar to a recording made in that environment, which can then be blended smoothly into the material that needs to be changed.
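To make the time-domain relationship concrete, here is a minimal MATLAB sketch with toy vectors. The values of x and h are invented for illustration and are not data from the experiment; h models a direct sound followed by a single echo. Under these assumptions, it shows that convolving with such an h is the same as adding a delayed, attenuated copy of the dry signal to the dry signal itself.

```matlab
% Toy illustration of y(t) = h(t) * x(t) with short column vectors.
% All values are made up for illustration only.
x = [1; 0.5; -0.25; 0];      % stand-in for a dry recording
h = [1; 0; 0; 0.6];          % direct sound plus one echo, 3 samples later at 60% amplitude

y = conv(x, h);              % output length is numel(x) + numel(h) - 1

% The same result written as "dry signal + delayed, attenuated copy".
yManual = [x; 0; 0; 0] + 0.6 * [0; 0; 0; x];

maxDiff = max(abs(y - yManual));   % difference between the two forms
disp(maxDiff);
```

A measured room impulse response plays the role of h here, except that it contains many reflections instead of a single echo.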
Three. Test practice
1. Test plan
(1) Selection of the test scene
Taking practical conditions and ease of operation into account, the dormitory was chosen as the quiet recording environment and the dormitory bathroom as the specific environment to simulate. Because the bathroom is small and well enclosed, recordings made in it have strong reverberation, which makes it convenient for the experiment.

▲ Figure 3.1 The bathroom chosen as the specific recording environment
(2) Selection of the recording device
The recording device is a USB condenser microphone. Compared with a mobile phone, the microphone has some noise reduction, and the recorded audio is mono, which is convenient for the subsequent experiment.

▲ Figure 3.2 The USB microphone used in the experiment
(3) Selection of the test audio
The audio text recorded in the bathroom is “Tsinghua University since 02 class”, and the audio text recorded in the dormitory is “Department of automation”. The aim is to merge them into “Department of automation, Tsinghua University 02 class”. For the impulse signal, several ways of producing it were tried, and the one that sounded best was chosen to record the impulse response of the bathroom.
2. Actual test process
(1) Recording audio in the bathroom
Record the “Tsinghua University since 02 class” audio in the bathroom and name it bathvoice.
(2) Measuring the impulse response of the bathroom
Snap your fingers in the bathroom to capture the impulse response of the bathroom environment, and name it pulse.
(3) Recording audio in the dormitory
Record the “Department of automation” audio in the dormitory and name it roomvoice.
(4) Audio waveform check
Use the GoldWave audio software to check whether each audio file is mono. The results were normal, as shown below:

▲ Figure 3.3 Waveform of each audio file; all are mono
The narrow strip below each waveform is the overall progress bar, not a second channel, so all three audio files are mono.
(5) MATLAB convolution
Import each audio file into MATLAB, as shown below:

▲ Figure 3.4 Audio files after import into MATLAB; the sampling rate fs is 48000
Convolve roomvoice with pulse and name the result ans1:
>> ans1 = conv(roomvoice, pulse);
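The import and the mono check can also be done in script form rather than through the import dialog and GoldWave. The sketch below is an assumption about how this might look, not the author's steps. It writes a short stand-in tone to a temporary wav file (demoFile is a hypothetical name) so that it runs on its own, then reads it back with audioread and folds it to mono if needed.

```matlab
fs = 48000;                              % sampling rate reported in the article
t  = (0:fs-1).' / fs;                    % one second of time stamps

% Hypothetical temporary file standing in for a real recording.
demoFile = [tempname '.wav'];
audiowrite(demoFile, 0.5 * sin(2*pi*440*t), fs);

[y, fsRead] = audioread(demoFile);       % y is samples-by-channels
numChannels = size(y, 2);                % 1 means mono
if numChannels > 1
    y = mean(y, 2);                      % one simple way to fold to mono
end

assert(fsRead == fs, 'Unexpected sampling rate');
delete(demoFile);                        % remove the temporary file
```

With the real recordings, the audiowrite line would be dropped and demoFile replaced by the path of the recorded file.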
Four. Analysis of test results
1. Analysis of the ans1 result
On listening to ans1, the convolution result is noisy and the reverberation lasts very long, so the effect is not satisfactory.
>> sound(ans1, fs)
>> audiowrite('ans1.wav', ans1, 48000)
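One caveat when exporting: with floating-point data and the default 16-bit wav output, audiowrite expects samples in the range [-1, 1] and clips anything outside it, and a convolution result can easily exceed that range. A minimal sketch of peak normalization before export follows. The names dry, ir and wet and the 0.99 headroom factor are illustrative assumptions, not values from the experiment.

```matlab
fs  = 48000;
dry = 0.8 * ones(100, 1);          % stand-in for roomvoice
ir  = [1; 0.7; 0.5; 0.3];          % stand-in for pulse
wet = conv(dry, ir);               % peaks at 0.8 * 2.5 = 2.0, outside [-1, 1]

peak = max(abs(wet));
if peak > 1
    wet = wet / peak * 0.99;       % scale so the largest sample sits just below full scale
end

outFile = [tempname '.wav'];       % hypothetical output path
audiowrite(outFile, wet, fs);      % no clipping now that all samples are within range
delete(outFile);                   % clean up the demo file
```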
To analyze the cause, use the plot command to draw roomvoice, pulse and ans1:

▲ Figure 4.1 Waveforms of roomvoice, pulse and ans1
From the plots, roomvoice decays to nearly 0 at about 3.8 seconds, whereas the ans1 waveform only begins to decay at close to 4 seconds. In addition, during the first 4 seconds the ans1 waveform is much denser than roomvoice and does not fall close to 0 where the voice is weak, which is heard as heavy noise and long reverberation. Looking at the impulse response pulse for the cause, its impulse does not start at time zero, which introduces a delay, and its reverberation decays slowly and its background noise is large, which makes the convolution result unsatisfactory. The next step is therefore to improve the impulse response pulse.
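The comparison plots could be produced along the following lines. This is a sketch with synthetic stand-in signals (roomDemo, pulseDemo and outDemo are hypothetical names and values) so that it runs without the recordings. It plots each signal against time in seconds and also measures the delay caused by the silent lead-in of the impulse response, which is the start-time problem noted above.

```matlab
fs = 48000;
nRoom = round(0.2 * fs);

% Synthetic stand-ins, not the recorded data.
roomDemo  = randn(nRoom, 1) .* exp(-(0:nRoom-1).' / (0.05 * fs));
pulseDemo = [zeros(480, 1); exp(-(0:4799).' / 480)];   % impulse starts 480 samples late
outDemo   = conv(roomDemo, pulseDemo);

signals = {roomDemo, pulseDemo, outDemo};
names   = {'roomvoice', 'pulse', 'ans1'};
for k = 1:3
    tk = (0:numel(signals{k})-1) / fs;    % time axis in seconds
    subplot(3, 1, k);
    plot(tk, signals{k});
    title(names{k});
    xlabel('Time (s)');
    ylabel('Amplitude');
end

% Delay introduced by the silent lead-in of the impulse response.
onsetIdx = find(abs(pulseDemo) > 0, 1, 'first');
delaySec = (onsetIdx - 1) / fs;
```

With a real recording the lead-in is low-level noise rather than exact zeros, so the `> 0` test would need a small threshold instead.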
2. Trimming the impulse response
Zoom in on the waveform of the impulse response pulse, as shown:

▲ Figure 4.2 Enlarged view of the impulse response
Compared with the impulse response composition diagram (Figure 2.2), the span from the direct sound to the end of the early decay takes up only a small part of the recorded impulse response; the rest is reverberation build-up and reverberation decay. Considering that the bathroom's effect on the sound lies mainly in the first reflections, wall absorption and reverberation, and given the limitations of the recording equipment, the part of the impulse response from direct-sound arrival through reverberation build-up is kept and the rest is discarded. This ensures that the bathroom environment itself accounts for most of the processing applied to the sound and minimizes the influence of other factors.
The procedure is as follows. First re-import pulse.m4a and name it pulse2. Double-click pulse2 to open it and delete all the 0 values at the start, as shown:

▲ Figure 4.3 Deleting all leading 0 values of pulse2
Then find the point from which the impulse response consists largely of small values and delete everything after it, for example samples 22290 to 74304. Several adjustments may be needed to get the best convolution result:

▲ Figure 4.4 Deleting the small values in the second half of pulse2
The processed impulse response, drawn with the plot command, is shown below:

▲ Figure 4.5 Enlarged waveform of the first half of the processed pulse2
3. Analysis of the ans2 result
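As a side note to the manual trimming described above, the same two cuts could also be made in a script. The following is a minimal sketch, not the author's procedure: irRaw is a synthetic stand-in for the recorded impulse response, and the 5% and 1% thresholds are assumptions that would need tuning by ear, just as the manual cut points did.

```matlab
% Stand-in impulse response: 2400 samples of silence, then a decaying burst.
n     = (0:23999).';
irRaw = [zeros(2400, 1); exp(-n / 2000) .* cos(2*pi*0.05*n)];

peakVal  = max(abs(irRaw));
thrStart = 0.05 * peakVal;     % assumed onset threshold: 5% of the peak
thrEnd   = 0.01 * peakVal;     % assumed tail threshold: 1% of the peak

% First sample that reaches the onset threshold (arrival of the direct sound).
startIdx = find(abs(irRaw) >= thrStart, 1, 'first');
% Last sample that is still above the tail threshold.
endIdx   = find(abs(irRaw) >= thrEnd, 1, 'last');

irTrim = irRaw(startIdx:endIdx);   % leading silence and low-level tail removed
```

A threshold on the raw samples is crude; a recording with a noisy tail may need a smoothed envelope or a manually chosen end point instead.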
Convolve roomvoice with pulse2, name the result ans2, export it to a wav file and listen to it.
>> ans2 = conv(roomvoice, pulse2);
>> audiowrite('ans2.wav', ans2, 48000)
>> sound(ans2, fs)
The convolution result is very good, close to bathvoice, which was recorded directly in the bathroom. The waveform drawn with the plot command is shown below. The reverberation length is greatly improved, and the noise problem has also been alleviated to some extent:

▲ Figure 4.6 Waveform of ans2
Next, Adobe Premiere Pro is used to splice bathvoice and ans2, simulating the modification and replacement of audio in a practical application, and the result is exported as final.mp3. For comparison, bathvoice and the unprocessed roomvoice are spliced in the same way and exported as normal.mp3. final.mp3, normal.mp3 and the Adobe Premiere Pro project file audio editing.prproj are all attached to the paper package.

▲ Figure 4.7 Simulated splicing and replacement in a practical application
4. Comparison of processed and unprocessed audio
On listening to final.mp3 and normal.mp3, it is still easy to hear that the processed audio has been spliced in. Compared with the unprocessed audio, however, the processed segment has added reverberation that simulates the bathroom environment and gives the impression that the inserted part was also recorded in the bathroom. The unprocessed audio gives no such impression.
The audio comparison shows that the MATLAB-based audio processing in this article is effective: it reduces the sense of incongruity in post-dubbing and achieves a more natural transition.
Five. Practical application and supplementary notes
1. A feasible workflow
When shooting video on location, record one or more impulse responses in each scene. If the dubbing needs to be modified later, simply re-record the correct audio, convolve it with the processed impulse responses of the corresponding scene, and select the result closest to the original audio for splicing. This achieves a natural transition.
2. Supplementary notes on problems in the experiment
The parts of this article that lacked proper experimental conditions and the points that need attention are summarized as follows:
(1) Use high-quality recording equipment where possible
The key step in this article is signal convolution. If the impulse response is not captured correctly, or the post-dubbing is too noisy, the accuracy of the convolved audio may suffer badly. This experiment used a condenser microphone priced at around 100 yuan. With noise minimized (door and windows closed), recordings were made in the bathroom and the dormitory. There is some noise floor, but the result is still satisfactory. A better microphone and a professional studio should give better results.
(2) Pay attention to the microphone distance during post-dubbing
In the author's everyday experience of dubbing videos, the distance between the mouth and the microphone greatly affects the recording. During post-dubbing, keep the distance to the microphone the same as in the original recording and, if conditions permit, keep the angle and relative position consistent as well, so that the original sound is reproduced as closely as possible. When roomvoice was recorded in this experiment, dormitory conditions forced the microphone to be too close, so a strong sense of incongruity remains after processing.
(3) Read complete sentences to reproduce the tone
Impulse response convolution can process the audio, but it cannot change the tone of voice. In post-dubbing, therefore, avoid reading isolated words; read the whole sentence instead, so that the tone and intonation match the original material and the result blends in better. In this experiment roomvoice contains only the short phrase “Department of automation”, whose tone differs considerably from the original sentence “Tsinghua University since 02 class”, which makes it hard to blend in. This should be avoided in practice.
(4) Adjust loudness and other parameters with other software
Convolution changes the loudness of the audio, usually increasing it slightly. When splicing, MATLAB can be used to adjust the amplitude of the convolved audio to match the original audio as closely as possible. Loudness can also be adjusted in Adobe Premiere Pro or other audio and video software. In final.mp3 from this experiment, the gains of the three audio segments are +3.8 dB, -3.3 dB and +5.9 dB.
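The amplitude adjustment in MATLAB mentioned above can be sketched as RMS matching. This is one simple choice assumed here for illustration; the article does not say which measure was used, and RMS level is only a rough proxy for perceived loudness. The signals reference and processed are synthetic stand-ins.

```matlab
% Synthetic stand-ins, not the recorded data.
reference = 0.2 * sin(2*pi*(0:999).' / 50);    % plays the role of the original clip
processed = 0.7 * sin(2*pi*(0:999).' / 40);    % plays the role of the convolved clip

rmsRef  = sqrt(mean(reference.^2));
rmsProc = sqrt(mean(processed.^2));

gain    = rmsRef / rmsProc;         % linear gain that equalizes the RMS levels
matched = gain * processed;
gainDb  = 20 * log10(gain);         % the same gain in dB; negative means attenuation

% Applying a gain quoted in dB, such as the +3.8 dB used for one segment.
boosted = reference * 10^(3.8 / 20);
```

The last line shows the conversion between the dB figures quoted for final.mp3 and a linear amplitude factor.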

▲ Figure 5.1 Adjusting loudness for better integration