Research on a MATLAB-based processing scheme for natural-transition dubbing

2022-06-26 16:28:00 【Zhuoqing】

Abstract: Starting from the need to modify video dubbing after shooting, this article sets the goal of blending modified audio naturally into the original material. An analysis of the factors behind environmental sound leads to the processing scheme: record an impulse response and convolve it with the audio. The feasibility of the convolution scheme is then explored through practical tests in MATLAB, and after improvement the result is acceptable. Finally, a feasible workflow is proposed and the problems encountered in the experiment are discussed.

Keywords: convolution, audio processing, impulse response

Automation Class 02, 2020011071, Sun Bowen

One 、 Background and objectives

1. Background of the problem

Alongside my studies, I run a personal account on Bilibili (B站) where I publish unboxing and review videos of digital products. During video production I occasionally need to modify the audio of the original footage, for example to fix a slip of the tongue, correct a wrong word or add some content. For footage recorded outside the dormitory, dubbing directly in the dormitory produces a strong mismatch between picture and sound because the recording environments differ, and the transition is not natural.

In December 2021, the author took part in the post-production editing of the class drama. In one scene, the female lead's voice was noticeably quieter than the male lead's, so post-dubbing was required. Because outdoor shooting was not possible at the time, the post-dubbing was done in the dormitory. Although the recording quality was very good, it had no outdoor ambient sound, and the result was poor when mixed into the video.

▲ Figure 1.1 Class play clip that needs post-processing

2. Goal

The goal is to use MATLAB-based audio processing so that audio recorded in the quiet, echo-free dormitory can blend naturally into other environments. This reduces the workload of later modifications and also improves the overall look and feel of the final video.

Two 、 Theoretical scheme analysis

1. Factors that produce environmental sound

When shooting outdoors, the environment strongly affects the recorded audio. Besides the direct sound from the people in the video, the microphone also picks up sound reflected from walls and the ground, noise from passing motor vehicles, and background noise such as air flow and the microphone's own noise floor.

▲ Figure 2.1 Schematic of the sounds a microphone receives during location shooting

These sounds have a considerable effect on the final recording and cannot be ignored. The sound received by the microphone is therefore equivalent to the target sound after it has passed through a specific environmental system. To simulate a location recording from a recording made in a quiet environment, we need a way to describe that system.

2. Impulse response

An impulse response is defined as the time-domain response of a system under test when an impulse excitation signal is applied to its input. In acoustic analysis, the impulse response is regarded as the acoustic signature of a system. It contains a wealth of information about the system, including arrival times, frequency content, reverberation decay characteristics and the overall frequency response. Measuring the impulse response therefore provides a description of the system.
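
To make the definition concrete, here is a minimal sketch rather than the author's measurement procedure. A made-up three-tap system is excited with a discrete unit impulse, and the output is that system's impulse response. The coefficients in `b` are illustrative assumptions, not measured values.

```matlab
% Hypothetical FIR "room": direct sound plus two weaker reflections.
b = [1 0.6 0.3];            % assumed coefficients, for illustration only
delta = [1 0 0 0 0];        % discrete unit impulse
h = filter(b, 1, delta);    % response to the impulse, i.e. the impulse response
disp(h)                     % the taps of b followed by zeros
```

In a real room the impulse cannot be ideal, which is why the article later approximates it with a short, sharp sound.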

▲ Figure 2.2 Components of an impulse response (source: Internet)

3. Using convolution to realize audio processing

As noted above, the sound received by the microphone is equivalent to the original sound after it has passed through a specific environmental system. Let Y be the sound received by the microphone, X the original sound and H the transfer function of the system. In the s-domain:

$Y\left( s \right) = H\left( s \right) \cdot X\left( s \right)$

Since the system transfer function H(s) is the Laplace transform of the system's impulse response h(t), it can be obtained by actually measuring the impulse response. Because multiplication in the s-domain corresponds to convolution in the time domain:

$y\left( t \right) = h\left( t \right) * x\left( t \right)$

Therefore, convolving the audio recorded in a quiet environment with the impulse response recorded in a specific environment should, in theory, yield audio similar to a recording made in that environment, which can then blend smoothly into the material that needs to be changed.
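
The following toy example is a sketch of the same relation in discrete form, with invented numbers. A short "dry" signal `x` is convolved with a synthetic impulse response `h` consisting of a direct path and one half-amplitude echo three samples later. Both vectors are assumptions chosen so the result can be checked by hand.

```matlab
% Toy discrete version of y(t) = h(t) * x(t).
x = [1 0.5 0.25];        % dry signal samples (illustrative)
h = [1 0 0 0.5];         % direct sound at n = 0, one echo 3 samples later
y = conv(x, h);          % copy of x plus a delayed, half-amplitude copy of x
disp(y)                  % 1 0.5 0.25 0.5 0.25 0.125
```

The output has `numel(x) + numel(h) - 1` samples, so the convolved audio is always longer than the dry recording.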

Three 、 Test practice

1. Test plan development

(1) Selection of test scenarios

Balancing practical conditions against ease of operation, the dormitory interior was chosen as the quiet recording environment and the bathroom as the specific target environment. Because the bathroom is small and well enclosed, recordings made there have strong reverberation, which makes the experiment easy to evaluate.

▲ Figure 3.1 The bathroom chosen as the specific recording environment

(2) Selection of recording device

The recording device is a USB condenser microphone. Compared with a mobile phone, the microphone provides some noise reduction, and it records mono audio, which simplifies the subsequent experimental steps.

▲ Figure 3.2 The USB microphone used in the experiment

(3) Test audio selection

The phrase recorded in the bathroom is "Tsinghua University, Automation Class 02", and the phrase recorded in the dormitory is "Department of Automation". The aim is to merge them into "Tsinghua University, Department of Automation, Automation Class 02". For the impulse signal, a series of triggering methods was tested, and the best-sounding one was selected to record the impulse response of the bathroom.

2. Actual test process

(1) Recording audio in the bathroom

Record the phrase "Tsinghua University, Automation Class 02" in the bathroom and name it "bathvoice".

(2) Measuring the impulse response of the bathroom

Snap your fingers in the bathroom to capture the impulse response of the bathroom environment, and name it "pulse".

(3) Recording audio in the dormitory

Record the phrase "Department of Automation" in the dormitory and name it "roomvoice".

(4) Audio waveform check

Use the GoldWave audio software to check whether each audio file is mono. The results were normal, as shown in the figure below:

▲ Figure 3.3 Waveforms of each audio file; all are mono

Note that the narrow strip below each waveform is the overall progress bar, not a second channel; all three audio files are mono.

(5) Convolution in MATLAB

Import each audio file into MATLAB, as shown in the figure below:

▲ Figure 3.4 Audio files after import into MATLAB; the sampling rate fs is 48000
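
The import can also be done from the command line. The sketch below is self-contained and therefore uses a synthetic clip written to a temporary file. The file name and the test tone are placeholders, not the article's recordings. The article's own files (for example pulse.m4a) would be read with the same `audioread` call where the platform supports that format.

```matlab
% Write a short synthetic mono clip to a temporary WAV file, then load it
% the way a real recording would be loaded.
fs = 48000;                                  % sampling rate reported in the article
t = (0:fs/10-1)' / fs;                       % 0.1 s time axis as a column vector
demo = 0.5 * sin(2*pi*440*t);                % placeholder for a real recording
demoFile = [tempname '.wav'];                % hypothetical file name
audiowrite(demoFile, demo, fs);

[roomvoice, fsRead] = audioread(demoFile);   % one column per channel
if size(roomvoice, 2) > 1                    % fold stereo down to mono if needed
    roomvoice = mean(roomvoice, 2);
end
delete(demoFile);                            % remove the temporary file
```

Checking `size(roomvoice, 2)` gives the same mono confirmation as the GoldWave inspection, because `conv` expects vectors rather than two-column matrices.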

Convolve roomvoice with pulse and name the result ans1:

>> ans1 = conv(roomvoice, pulse);

Four 、 Analysis of test results

1. Analysis of ans1

Listening to ans1 reveals that the convolution result is noisy and its response lasts very long; the effect is not ideal.

>> sound(ans1, fs)
>> audiowrite('ans1.wav', ans1, 48000)

To analyze the cause, roomvoice, pulse and ans1 were drawn with the plot command, as shown below:

▲ Figure 4.1 Waveforms of roomvoice, pulse and ans1

From the plots, roomvoice decays to nearly 0 at about 3.8 seconds, whereas the waveform of ans1 only begins to decay close to 4 seconds. In addition, over the first 4 seconds the waveform of ans1 is much denser than that of roomvoice and does not approach 0 where the voice is weak, which explains the heavy noise and long reverberation. Examining the impulse response pulse shows that the impulse does not start at time zero, which introduces a transmission delay, and that the shallow reverberation decay slope and the high background noise lead to the unsatisfactory convolution result. The next step is therefore to improve the impulse response pulse.
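
A comparison plot of this kind can be sketched as follows. The signals here are synthetic placeholders (a decaying tone and an impulse response with a deliberate leading delay), not the article's recordings. Each signal gets its own time axis in seconds, because the convolution result has `numel(roomvoice) + numel(pulse) - 1` samples and is therefore longer than either input.

```matlab
% Placeholder signals; the real roomvoice and pulse come from the recordings.
fs = 48000;
n = (0:fs-1)';
roomvoice = sin(2*pi*200*n/fs) .* exp(-n/(0.2*fs));   % decaying test tone
m = (0:3999)';
pulse = [zeros(1000,1); exp(-m/800)];                 % response with a leading delay
ans1 = conv(roomvoice, pulse);

% Time axes in seconds, one per signal, since the three lengths differ.
tRoom  = (0:numel(roomvoice)-1)' / fs;
tPulse = (0:numel(pulse)-1)' / fs;
tAns   = (0:numel(ans1)-1)' / fs;

subplot(3,1,1); plot(tRoom, roomvoice); title('roomvoice');
subplot(3,1,2); plot(tPulse, pulse);    title('pulse');
subplot(3,1,3); plot(tAns, ans1);       title('ans1'); xlabel('Time (s)');
```

With a shared seconds axis, a leading delay in the impulse response shows up directly as a shift of the third panel relative to the first.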

2. Truncating the impulse response

Zoom in on the waveform of the impulse response pulse, as shown:

▲ Figure 4.2 Zoomed-in view of the impulse response

Comparing with Figure 2.2 (components of an impulse response), the period from the direct sound to the end of the early decay accounts for only a small part of the impulse response; the rest is reverberation build-up and reverberation decay. Considering that the bathroom system affects the sound mainly through first reflections, wall absorption and reverberation, and given the limitations of the recording equipment, the impulse response should keep the portion from direct-sound arrival through reverberation build-up and discard the rest. This ensures that the bathroom environment itself dominates the processing of the sound and minimizes the influence of other factors.

The procedure is as follows. First re-import pulse.m4a and name it pulse2, double-click to open pulse2, and delete all the leading 0 values, as shown:

▲ Figure 4.3 Deleting all leading 0 values of pulse2

Then find the region where the impulse response starts to show small values over a wide range and delete everything after it, for example samples 22290 to 74304. Several adjustments may be needed to get the best convolution result:

▲ Figure 4.4 Deleting the small values in the second half of pulse2
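
The article performs this trimming by hand in the variable editor. As an alternative sketch, the same two cuts can be approximated with an amplitude threshold. The threshold approach, the 1% value and the synthetic impulse response below are all assumptions for illustration, and the cut points would still need adjusting by ear.

```matlab
% Synthetic impulse response: leading silence, then a decaying burst.
fs = 48000;
n = (0:19999)';
pulse = [zeros(500,1); exp(-n/2000) .* cos(2*pi*1000*n/fs)];

thr = 0.01 * max(abs(pulse));                 % assumed threshold: 1% of the peak
first = find(abs(pulse) > thr, 1, 'first');   % onset of the direct sound
last  = find(abs(pulse) > thr, 1, 'last');    % last sample still above threshold
pulse2 = pulse(first:last);                   % drop leading silence and quiet tail
```

Removing the leading samples eliminates the delay before the direct sound, and cutting the quiet tail shortens the reverberation that the convolution adds.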

The processed impulse response, drawn with the plot command, is shown in the figure below:

▲ Figure 4.5 Zoomed-in waveform of the first half of the processed pulse2

3. Analysis of ans2

Convolve roomvoice with pulse2, name the result ans2, export it to a WAV file and listen to it.

>> ans2 = conv(roomvoice, pulse2);
>> audiowrite('ans2.wav', ans2, 48000)
>> sound(ans2, fs)

The convolution result is very good, close to bathvoice recorded directly in the bathroom. The waveform drawn with the plot command is shown below. The reverberation length is greatly improved, and the noise problem is also alleviated to some extent:

▲ Figure 4.6 Waveform of ans2
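
The effect of truncation on reverberation length can be illustrated with a rough numeric check. This is a sketch on synthetic data, and the "time until the last sample above 1% of the peak" metric is a simple stand-in chosen here, not a measure used in the article.

```matlab
% Compare the tail length of a convolution with a long and a truncated response.
fs = 48000;
dry = [ones(100,1); zeros(100,1)];       % short placeholder burst
n = (0:23999)';
irLong = exp(-n/3000);                   % slowly decaying placeholder response
irTrim = irLong(1:6000);                 % same response, truncated

% Seconds until the last sample that exceeds 1% of the peak (assumed metric).
tailLen = @(y) find(abs(y) > 0.01*max(abs(y)), 1, 'last') / fs;
tLong = tailLen(conv(dry, irLong));
tTrim = tailLen(conv(dry, irTrim));
fprintf('untrimmed: %.3f s, trimmed: %.3f s\n', tLong, tTrim);
```

The truncated response yields the shorter tail, which matches the qualitative improvement seen between ans1 and ans2.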

Next, Adobe Premiere Pro was used to splice bathvoice and ans2, simulating the modification and replacement of audio in a practical application, and the result was exported as final.mp3. As a comparison, bathvoice and the unprocessed roomvoice were spliced in the same way and exported as normal.mp3. final.mp3, normal.mp3 and the Adobe Premiere Pro project file audio editing.prproj are all attached to the thesis package.

▲ Figure 4.7 Simulated splicing and replacement in a practical application

4. Comparison of processed and unprocessed audio

Listening to final.mp3 and normal.mp3, it is still easy to hear that the processed audio has been spliced. Compared with the unprocessed audio, however, the processed insert carries added reverberation that simulates the bathroom environment, giving the impression that "the added part was also recorded in the bathroom". The unprocessed audio has no such effect.

The audio comparison shows that the MATLAB-based audio processing in this article is effective: it reduces the sense of incongruity in post-dubbing and achieves a natural transition.

Five 、 Practical application and improvement supplement

1. A feasible process plan

During location shooting, one or more impulse responses can be recorded in each scene. If the dubbing needs to be modified later, simply re-record the correct audio, convolve it with the processed impulse responses of the corresponding scene, and select the result closest to the original audio for splicing. This achieves a natural transition.
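
This workflow could be sketched as a loop over the impulse responses recorded for a scene. Everything below is placeholder data: the dry signal, the two decay curves standing in for trimmed impulse responses, and the 0.9 peak target are assumptions. The final choice is still made by listening, as the article describes.

```matlab
% One candidate per impulse response, ready to be auditioned.
fs = 48000;
dry = sin(2*pi*300*(0:fs/4-1)'/fs);            % placeholder for the re-recorded line
n = (0:4799)';
irs = {exp(-n/400), exp(-n/1200)};             % two hypothetical trimmed responses
candidates = cell(size(irs));
for k = 1:numel(irs)
    wet = conv(dry, irs{k});                   % apply the acoustics of response k
    candidates{k} = 0.9 * wet / max(abs(wet)); % peak-normalize before export
end
% Each candidates{k} can then be played with sound(candidates{k}, fs)
% and the one closest to the original footage kept.
```

Normalizing each candidate to the same peak keeps loudness differences from biasing the comparison between impulse responses.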

2. Supplementary explanation of the problems in the experiment

The parts of this work that lacked the conditions for implementation and the points that need attention are summarized below:

(1) Try to choose high-quality recording equipment

The key part of this article is the convolution of signals. If the impulse response is not captured correctly, or the noise in the post-dubbing is too loud, the accuracy of the convolved audio may be seriously affected. This experiment used a condenser microphone priced at about 100 yuan and recorded in the bathroom and dormitory with noise minimized as far as possible (doors and windows closed). Although there is some noise floor, the result is still satisfactory. A higher-grade microphone and a professional recording studio should give an even better result.

(2) Pay attention to the microphone distance during the later dubbing

In the author's everyday experience of dubbing videos, the distance between the mouth and the microphone greatly affects the recording. During post-dubbing, try to keep the microphone distance the same as in the original recording, and if conditions permit, keep the angle and relative position consistent as well, so that the original sound is restored as faithfully as possible. In this experiment, roomvoice was recorded with the microphone too close because of the limited dormitory conditions, so a strong sense of incongruity remains after processing.

(3) Read complete sentences to preserve the tone

Impulse response convolution can process the audio, but it cannot change the tone of voice. In post-dubbing, therefore, try not to read isolated words. Read the whole sentence instead, so that the mood and intonation stay consistent with the original material and the result blends in better. In this experiment, roomvoice contains only the four characters of "Department of Automation" (in the original Chinese), whose tone differs considerably from the original sentence "Tsinghua University, Automation Class 02". This makes it hard to blend in and should be avoided in practice.

(4) Adjust loudness and other parameters with other software

Convolution changes the loudness of the audio, usually as a slight increase. When splicing, MATLAB can be used to adjust the amplitude of the convolved audio so that it matches the original audio as closely as possible. Audio and video software such as Adobe Premiere Pro can also be used to adjust the loudness. In final.mp3 from this experiment, the gains of the three audio segments are +3.8 dB, -3.3 dB and +5.9 dB.
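
A MATLAB-side amplitude adjustment could look like the sketch below. The two sine segments are placeholders, and RMS matching is one possible matching rule assumed here, not the procedure used for final.mp3. The last lines convert the reported dB gains into linear amplitude factors using the standard relation factor = 10^(dB/20).

```matlab
% Placeholder segments: "original" footage audio and a louder convolved insert.
fs = 48000;
t = (0:fs/10-1)' / fs;
original = 0.2 * sin(2*pi*300*t);
wet = 1.5 * sin(2*pi*300*t);                 % peak above 1, would clip in a 16-bit WAV

% Match the RMS level of the insert to the original segment.
rmsOf = @(y) sqrt(mean(y.^2));
wetMatched = wet * rmsOf(original) / rmsOf(wet);

% Convert the dB gains reported above into linear amplitude factors.
gainsDb = [3.8 -3.3 5.9];
factors = 10 .^ (gainsDb / 20);              % roughly 1.55, 0.68 and 1.97
```

Scaling before export also keeps the samples inside the [-1, 1] range, which matters because `audiowrite` clips out-of-range values when writing integer WAV files.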

▲ Figure 5.1 Adjusting loudness for better integration

