Technology Practice | Scenario-Oriented Audio and Video Call Experience Optimization
2022-06-22 13:49:00 【51CTO】

In modern life, audio and video calls are among the most common ways to communicate: one-to-one calls in social apps, remote consultations in healthcare, online house viewings in real estate, and the one-to-one and group calls that can happen anytime and anywhere in remote work. (This article is reposted from the official account "Rongyun Global Internet Communication Cloud".)
We have all also run into product logic that gets "stuck" in use: audio and video cannot be switched freely, a 1V1 call cannot be upgraded to a group call, and a group video call cannot be joined once it is under way.

On June 16, the Rongyun RTC Advanced Practice Master Class focused on audio and video calls. It covered how audio and video calls are implemented, the experience pain points in multiple scenarios, and best practices with Rongyun's new CallLib, breaking down the experience challenges that calls face across usage scenarios and sharing optimization solutions.
How audio and video calls are implemented
What we usually call an audio and video call refers to application scenarios, such as those in WeChat, that include a call-setup process. There is one caller and one or more callees. The caller initiates the call, and each callee can choose to answer or hang up.

Audio and video calls appear in many scenarios, especially now that a large-scale digital transformation is under way.
For example: in remote consultation, doctors talk with patients over audio and video calls to support diagnosis; in VR house viewing, real estate agents talk with tenants over audio and video calls and combine them with VR for remote viewing; in online dating, a matchmaker can talk with both parties in a group chat, which is a group audio and video call scenario.



(Figure: audio and video call scenarios)
Audio and video calls appear in every aspect of our online lives and are a necessary capability for all kinds of applications. So how is an audio and video call implemented?
The basic knowledge required falls roughly into three parts: audio and video, network transmission, and the server.
This is a very complex system, so only a general description is given here.


(Figure: RTC basics)
Audio and video
Audio and video basics: capture of audio and video, which works differently on each platform. Developers need to understand raw data, that is, the format of the data obtained directly from capture.
Audio and video data processing: audio noise reduction, audio echo cancellation, image cropping, and so on.
Audio and video codecs: developers should master at least one audio codec and one video codec.
Every encoding format has a corresponding decoder, and common formats can also be decoded in hardware, so software or hardware decoding can be chosen depending on the solution. Software decoding is highly compatible but consumes CPU. Hardware decoding performs well but may run into compatibility problems.
Audio and video playback and rendering must also be mastered. Playback and rendering generally operate on raw data: for audio this is simply PCM playback; for video, native clients use OpenGL and browsers use WebGL, among others.
Network transmission
The client can choose TCP or UDP. Audio and video are usually sent over UDP, because such data needs timeliness rather than completeness: it can still be played normally even when some of it is missing.
TCP and QUIC are both reliable transport protocols. Signaling and similar business data, which require completeness, more often use TCP or QUIC.
QUIC is a protocol built on top of UDP that guarantees completeness. UDP itself is unreliable, so guaranteeing completeness over it means implementing your own packet-loss retransmission strategy. QUIC is a data protocol with packet-loss retransmission built in, and it can achieve the same effect as TCP.
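The idea of adding reliability on top of an unreliable channel can be sketched in a few lines. The following is a minimal simulation, not QUIC itself and not Rongyun's implementation: it uses the simplest possible scheme (stop-and-wait, resending one packet until it gets through) over a simulated lossy channel. The function name, the parameters, and the assumption that acknowledgements are never lost are all illustrative.

```python
import random


def send_reliably(packets, loss_rate, max_retries=10, seed=0):
    """Deliver packets in order over a simulated lossy channel.

    Stop-and-wait retransmission: each packet is resent until one copy
    gets through. Assumes acknowledgements are never lost (a simplification).
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    delivered = []
    transmissions = 0
    for seq, payload in enumerate(packets):
        for _attempt in range(max_retries):
            transmissions += 1
            # The channel drops this transmission with probability loss_rate.
            if rng.random() >= loss_rate:
                delivered.append((seq, payload))
                break
        else:
            # Every attempt was lost: give up instead of retrying forever.
            raise TimeoutError(f"packet {seq} lost {max_retries} times")
    return delivered, transmissions


if __name__ == "__main__":
    received, sent = send_reliably(["offer", "answer", "bye"], loss_rate=0.3)
    print(received, "delivered using", sent, "transmissions")
```

Every packet arrives, in order, at the cost of extra transmissions and waiting. That trade-off suits signaling, where nothing may be lost, and is the reason media packets are usually sent without such retries.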
Server
Servers are usually divided into signaling servers and audio and video servers. As the names suggest, the signaling server transmits business-related data and the audio and video server transmits audio and video data. Different technical solutions use different technologies to implement the servers.
With the basics above, we can build the core logic of an audio and video call scenario. Before dealing with the business logic that is specific to calls, two problems need to be solved first.
The first is instant delivery of business data. Initiating a call involves two clients, the calling end and the called end.
How does the called end receive the call-initiation signaling sent by the calling end? This can be done with push notifications plus a long-lived connection, or through a third-party IM SDK.
The second, and more critical, is basic audio and video capability together with real-time transmission of audio and video data.
Developers who build this in-house can choose WebRTC, a complete open source solution from Google that provides basic audio and video capabilities and transmission capabilities, or they can develop a full RTC system from scratch.
Either way, in-house development involves very complex underlying knowledge and requires considerable R&D capability and staffing.
Rongyun offers developers a more convenient option: Rongyun CallLib.
For audio and video call scenarios, Rongyun combines and encapsulates its IM and RTC capabilities and provides an SDK with a complete call flow. Developers only need to work with the small number of interfaces that CallLib exposes to meet their call requirements. It also has the following advantages.
Completeness: the call flow is complete and supports both one-to-one and multi-party calls, so developers do not need to care about the underlying implementation.
Ease of use: works out of the box, is quick to implement, and can be customized flexibly.
Stability: IM messages arrive 100% reliably; RTC audio withstands 80% packet loss and video withstands 60% packet loss.
Scenario diversity: highly extensible, meeting audio and video call needs in many fields.
Based on Rongyun CallLib, developers can quickly build apps with call functions. As for the usage "jams" mentioned at the beginning, we have also upgraded and optimized the product to address them.
Multi-scenario call experience optimization requirements
Call upgrades
Audio and video upgrade and downgrade: users can switch directly to a video call during an audio call.
1V1 upgrade to group chat: a 1V1 call turns directly into a multi-party group call.
Join an ongoing call at any time: in a group call, a user who does not answer right away can still choose to join as long as the call has not ended.
Flow control upgrades
A flexible audio and video stream API; preview of multiple calls; selective answer and call waiting.
The main goal is to optimize the flow control API so that it stays easy to use while letting developers build advanced features for their own needs.
Data consistency upgrades
These cover consistency of call duration, consistency of participant status synchronization, and consistency of business logic in extreme scenarios.
The data consistency upgrade gives all participants in a call, especially those who join from different devices, a more consistent experience, and it also makes later business expansion easier.
So, how are these optimizations and upgrades implemented?

The original Rongyun CallLib is built on IMLib and RTCLib and is a heavy-client design.
All state and business logic has to be stored on the client. This increases client implementation complexity, forces the business logic to be implemented again on every platform, and can lead to inconsistent state in extreme cases.
For example, suppose the caller and the callee tap hang up at the same time. A hang-up by the caller is recorded as Cancel and a hang-up by the callee is recorded as Refuse, so the two appear differently in the call record. Because both operations are sent and processed by the clients, it is impossible to tell who acted first, and the call record becomes inaccurate.
This problem can be handled, but the handling is complicated, because a heavy-client design makes functional expansion difficult.
Rongyun's new CallLib in practice
Rongyun's new CallLib solves the problems above cleanly.
Upgraded business processing: a heavy-server, light-client design.
The state that the original CallLib maintained is now maintained and stored on the server;
CallLib becomes a server-driven state machine model;
This greatly reduces the implementation complexity of the client and avoids inconsistent logic across platforms.
Because state changes are issued by the server, state stays consistent, and extreme scenarios and online/offline transitions can be handled as well.
Take the simultaneous hang-up described above: one of the two operations must reach the server first, and the server then issues the resulting state, so events are ordered. Call records are also generated by the server, so the two ends can never disagree.
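A minimal sketch of this server-driven model for the simultaneous hang-up case follows. It is an illustration under assumptions, not Rongyun's actual Call Server: the class names, the state names "ringing" and "ended", and the reason strings "cancel" and "refuse" are hypothetical, chosen to mirror the Cancel and Refuse records mentioned earlier.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CallSession:
    caller: str
    callee: str
    state: str = "ringing"            # hypothetical states: "ringing" -> "ended"
    end_reason: Optional[str] = None  # "cancel" or "refuse", written exactly once


class CallServer:
    """Toy server that owns call state; clients only send requests."""

    def __init__(self) -> None:
        self.calls: Dict[str, CallSession] = {}

    def start_call(self, call_id: str, caller: str, callee: str) -> None:
        self.calls[call_id] = CallSession(caller, callee)

    def hang_up(self, call_id: str, user: str) -> Optional[str]:
        call = self.calls[call_id]
        if call.state == "ended":
            # A later hang-up cannot rewrite the record; the first one stands.
            return call.end_reason
        call.end_reason = "cancel" if user == call.caller else "refuse"
        call.state = "ended"
        return call.end_reason


if __name__ == "__main__":
    server = CallServer()
    server.start_call("call-1", caller="alice", callee="bob")
    print(server.hang_up("call-1", "bob"))    # arrives first and decides the record
    print(server.hang_up("call-1", "alice"))  # arrives later, gets the same answer
```

Both clients receive the same end reason from the server, so the call record is identical on every device no matter how close together the two taps were.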
Let's look at specific scenarios to see how the new design improves the experience.
1V1 audio and video call flow

This is the most basic call function. The complete sequence diagram for this capability involves four roles working together: Client A and Client B represent the calling end and the called end respectively, Call Server handles the call-related business logic, and RTC Server handles the audio and video call itself.
As the diagram shows, implementing even this basic capability is quite complicated.
Audio and video upgrade and downgrade

Many people have had this experience: during a voice call you need to turn on the camera for some reason, and you have to hang up the audio call and start a new video call.
Rongyun has optimized this scenario. When a user needs to temporarily upgrade an audio call to a video call, they can send an upgrade request from within the call.
The receiving end can accept or reject. If it accepts, the call is upgraded to video; if it rejects, the audio call continues. The initiator can also cancel the upgrade request.
Some extreme operations can occur here too. For example, the initiating end taps cancel at the same moment the receiving end taps accept. In this case Call Server acts as the arbiter. If, for example, the rule is that cancellation prevails, then once a cancel has been issued both ends fall back to the audio call even if the receiving end tapped accept.
In the previous heavy-client design, this decision could become very complicated.
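The arbitration rule described above can be written as one small server-side function. This is a hedged sketch rather than the actual Call Server logic: the function name and the operation strings "cancel", "accept", and "reject" are assumptions, and "cancellation prevails" is the example rule taken from the text.

```python
def resolve_upgrade(operations):
    """Return the media type of the call after an upgrade request is resolved.

    `operations` holds every operation the server received for one pending
    upgrade request. A cancel always wins, even when an accept arrived too.
    """
    if "cancel" in operations:
        return "audio"   # cancellation prevails: both ends stay on audio
    if "accept" in operations:
        return "video"   # accepted and never cancelled: upgrade to video
    return "audio"       # rejected, or nobody has responded yet


if __name__ == "__main__":
    print(resolve_upgrade(["accept", "cancel"]))
    print(resolve_upgrade(["accept"]))
```

Because the decision is made in one place, every client simply applies whatever media type the server reports, with no per-platform race handling.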
Incoming call during a call

In common audio and video call scenarios, if the called user is already in a voice or video call, the caller is usually told that the other party is busy and has to wait until they hang up.
Compare this with ordinary telephony: during a call we may receive a call from a third person, and carrier phone service offers selective answer and call waiting, which means we can handle two calls at once.
For this scenario, Rongyun's Call Server stores the state of every call, so a user who is already in a call can still handle other incoming calls.
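One way to picture per-call state on the server is a table of calls for each user. The sketch below is an assumption-laden illustration, not Rongyun's data model: the class name, the state names ("ringing", "active", "held", "ended"), and the rule that answering a new call parks the current one are all hypothetical.

```python
class UserCalls:
    """Per-user call table, as a server tracking every call's state might keep it."""

    def __init__(self):
        self.states = {}  # call_id -> "ringing" | "active" | "held" | "ended"

    def incoming(self, call_id):
        # Recorded even when another call is active, instead of returning "busy".
        self.states[call_id] = "ringing"

    def answer(self, call_id):
        if self.states.get(call_id) != "ringing":
            raise ValueError(f"call {call_id} is not ringing")
        for other, state in self.states.items():
            if state == "active":
                self.states[other] = "held"  # call waiting: park the current call
        self.states[call_id] = "active"

    def hang_up(self, call_id):
        self.states[call_id] = "ended"


if __name__ == "__main__":
    user = UserCalls()
    user.incoming("from-alice")
    user.answer("from-alice")
    user.incoming("from-bob")   # arrives while the first call is active
    user.answer("from-bob")     # selective answer: take the second call
    print(user.states)
```

Since each call has its own entry, a second incoming call is just another row in the table rather than a conflict with the first.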
Group call flow

A group call differs from a one-to-one call in two ways. First, there may be two or more participants. Second, the call is not over as long as at least one person is still in it.
So a group call starts with initiation only: the initiator enters the RTC room first and waits for others to join.
Another special point is that others can be invited into the room at any time, through a process very similar to initiation.
In common communication software, a group call can only take place within one group, whereas Rongyun's service supports initiating calls and inviting people from different groups or from a private address book, anytime and anywhere.
Upgrading 1V1 to a group call

This scenario arises when two people are talking and discover that the conversation needs other people to take part. Rongyun supports upgrading a 1V1 call directly to a group call.
The upgrade process is similar to inviting others during a group call. The difference is that when the call is upgraded, Call Server explicitly notifies both 1V1 participants that the current call has become a group call.
After a successful upgrade, the rest of the flow is the same as for a group call.
Joining an unfinished call at any time

This is also a common situation: it is not convenient to answer a group call right away, or you need to step out for a while after answering.
A group call does not end as long as one person remains in it, so users who are not currently in the call can choose to join again at any time before it ends.
This is a particularly useful optimization, especially as more and more office work and communication move online: someone has to step away, then needs to rejoin later to continue the earlier discussion. It relies on Call Server saving the call state, so that the client can find out at any time which calls it has not yet joined.
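The room lifecycle described in the group call sections (the initiator enters first, others may be invited or join later, and the call ends only when the last person leaves) can be sketched as follows. The class and method names are hypothetical and the model is deliberately minimal; it is not the CallLib API.

```python
class GroupCall:
    """Toy model of a group call whose state is kept on the server."""

    def __init__(self, initiator, invitees):
        self.invited = {initiator, *invitees}
        self.joined = {initiator}   # the initiator enters the room first
        self.ended = False

    def invite(self, user):
        if self.ended:
            raise RuntimeError("call has ended")
        self.invited.add(user)      # invitations are allowed at any time

    def join(self, user):
        if self.ended:
            raise RuntimeError("call has ended")
        if user not in self.invited:
            raise PermissionError(f"{user} was not invited")
        self.joined.add(user)       # late joining works while the call lives

    def leave(self, user):
        self.joined.discard(user)
        if not self.joined:
            self.ended = True       # the call ends only when nobody is left


if __name__ == "__main__":
    call = GroupCall("alice", ["bob", "carol"])
    call.join("bob")
    call.leave("alice")   # bob is still in the room, so the call continues
    call.join("carol")    # carol did not answer at first but joins later
    print(sorted(call.joined), call.ended)
```

Leaving and rejoining are just membership changes on the server-held room, which is why a participant who steps away can come back as long as someone else has kept the call alive.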
Call record implementation

Previously, call records were stored locally, which made multi-device synchronization very hard to achieve.
Now call records can be updated and deleted, and even the unread count of call records can be synchronized across devices, giving users a more sensible and unified experience.
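A server-side record store makes this synchronization straightforward, because every device reads the same data. The sketch below is a hypothetical illustration: the class, the field names, and the result strings are assumptions, not Rongyun's actual storage schema.

```python
class CallRecordStore:
    """Toy server-side store; all of a user's devices read the same records."""

    def __init__(self):
        self.records = {}  # record_id -> {"peer": str, "result": str, "read": bool}

    def add(self, record_id, peer, result):
        self.records[record_id] = {"peer": peer, "result": result, "read": False}

    def mark_read(self, record_id):
        self.records[record_id]["read"] = True

    def delete(self, record_id):
        self.records.pop(record_id, None)  # deleting twice is harmless

    def unread_count(self):
        return sum(1 for record in self.records.values() if not record["read"])


if __name__ == "__main__":
    store = CallRecordStore()
    store.add("r1", peer="bob", result="missed")
    store.add("r2", peer="carol", result="missed")
    store.mark_read("r1")   # marked as read on the phone
    print(store.unread_count())  # the count any other device would fetch
```

With the records held in one place, marking an entry as read or deleting it on one device changes what every other device sees on its next fetch.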
Experience optimization is a long-term effort, and audio and video calling is a very common yet complex function. Building on years of technical experience in the communication industry, Rongyun continues to refine the details of the experience, eliminate the "jams" users run into, and provide developers with solutions that are more convenient and better aligned with user habits.