Technology Practice | Scenario-Oriented Audio and Video Call Experience Optimization

2022-06-22 13:49:00 【51CTO】

In modern life, audio and video calls are among the most commonly used ways to communicate: one-to-one audio and video chat in social networking, remote consultation in healthcare, online house viewing in real estate transactions, and the one-to-one and group calls that can happen anytime and anywhere in remote work. (This article is reposted from the official account 【Rongyun global Internet communication cloud】.)

We have also all run into logical "stuck points" while using such products: you cannot switch freely between audio and video, a 1V1 call cannot be upgraded to a group call, and you cannot join a group video call that is already in progress.

On June 16, the Rongyun RTC Advanced Practical Master Class focused on audio and video calls. Covering how audio and video calls are implemented, the experience pain points across multiple scenarios, and best practices with Rongyun's new CallLib, the session broke down the experience challenges that calls face in a variety of usage scenarios and shared optimization solutions.

How audio and video calls are implemented

What we usually call an audio and video call refers to scenarios, such as WeChat calls, that must include a calling process. There is one caller and one or more callees; the caller initiates the call, and each callee can choose to answer or hang up.

There are many scenarios for audio and video calls, especially now that we are going through a large-scale digital transformation.

For example: in remote consultation, doctors can talk with patients over audio and video calls to help with diagnosis; in VR house viewing, real estate agents talk with tenants over audio and video calls and, combined with VR, enable remote viewing; in online dating, a matchmaker and the two parties can communicate in a group chat, which is a group audio and video call scenario.

(Audio and video call scenarios)

Audio and video calls appear in every corner of our online lives and are a must-have capability for many kinds of applications. So how do you implement an audio and video call?

In terms of the fundamentals you need to master, they fall roughly into three parts: audio and video, network transmission, and the server.

This is a very complex system, so only a general description is given here.

(RTC fundamentals)

Audio and video

Audio and video fundamentals: capturing audio and video works differently on each platform. Developers need to understand the raw data, that is, the data format obtained directly from capture.

Audio and video processing: you need to know audio noise reduction, acoustic echo cancellation, image cropping, and so on.

Audio and video codecs: you should master at least one audio codec and one video codec.

Every encoding format has a corresponding decoder, and common formats can also be decoded in hardware, so you can choose software or hardware decoding depending on the solution. Software decoding is highly compatible but consumes CPU; hardware decoding performs well but may run into compatibility problems.

Audio and video playback and rendering also need to be mastered. Playback and rendering generally means playing and rendering raw data: for audio that is PCM playback; for video, native clients typically use OpenGL, and browsers use WebGL and the like.

Network transmission

The client can choose TCP or UDP. Audio and video are usually sent over UDP, because audio and video data needs timeliness more than completeness: it can still be played normally even when some of it is missing.

TCP and QUIC are both reliable transport protocols. Data that requires completeness, such as business signaling, more often uses TCP or QUIC.

QUIC is a protocol built on top of UDP that guarantees completeness. UDP itself is unreliable, so if you want completeness over UDP you need to build your own packet loss retransmission strategy. QUIC is a protocol that comes with such a retransmission policy and can achieve the same effect as TCP.
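
To make "build your own packet loss retransmission strategy" concrete, here is a minimal sketch in Python. It is not how QUIC works internally; it only illustrates the basic idea of acknowledging packets and resending the ones that were lost. The names `send_reliably` and `LossyChannel` are hypothetical, and the channel is a deterministic stand-in for a real UDP socket.

```python
from typing import Callable, Dict, List


def send_reliably(
    packets: List[bytes],
    deliver: Callable[[int, bytes], bool],
    max_rounds: int = 10,
) -> Dict[int, int]:
    """Send every packet, resending any that were not acknowledged.

    `deliver(seq, payload)` models one unreliable send plus waiting for
    an ACK; it returns True only if the ACK came back.
    Returns the number of send attempts used for each sequence number.
    """
    attempts = {seq: 0 for seq in range(len(packets))}
    pending = set(attempts)
    for _ in range(max_rounds):
        if not pending:
            break
        for seq in sorted(pending):
            attempts[seq] += 1
            if deliver(seq, packets[seq]):
                pending.discard(seq)  # acknowledged, stop resending
    if pending:
        raise TimeoutError(f"packets never acknowledged: {sorted(pending)}")
    return attempts


class LossyChannel:
    """Hypothetical channel that drops the first N sends of chosen packets."""

    def __init__(self, drops: Dict[int, int]) -> None:
        self.drops = dict(drops)
        self.received: Dict[int, bytes] = {}

    def deliver(self, seq: int, payload: bytes) -> bool:
        if self.drops.get(seq, 0) > 0:
            self.drops[seq] -= 1
            return False  # packet (or its ACK) was lost
        self.received[seq] = payload
        return True
```

Signaling data would go through something like `send_reliably`, while audio and video frames would skip the retry loop and accept the loss, which is the trade-off described above.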

The server

Servers are usually divided into a signaling server and an audio and video server. As the names suggest, the signaling server transmits business-related data and the audio and video server transmits audio and video data. Different technical solutions call for different server implementations.

With the fundamentals above, we can complete the core logic of an audio and video call. Before turning to the business processing that is specific to calls, two problems have to be solved first.

The first is instant delivery of business data. A call involves two clients: the calling end and the called end.

How does the called end receive the call-initiation signaling sent by the calling end? You can use push plus a long-lived connection, or a third-party IM SDK.

The second, and more critical, is basic audio and video capability together with real-time transmission of audio and video data.

If you build it in-house, you can choose WebRTC, a complete open source solution from Google that provides basic audio and video capabilities and transmission; or you can develop a complete RTC system from scratch.

Of course, whichever in-house route you choose, there is a great deal of complex low-level knowledge to learn, and it takes considerable R&D capability and staffing.

Rongyun offers developers a more convenient alternative: Rongyun CallLib.

For audio and video call scenarios, Rongyun combines and encapsulates its IM and RTC capabilities into an SDK with a complete call flow. Developers only need to work with the small number of interfaces that CallLib provides to implement calling. It also has the following advantages.

Completeness: the call flow is complete, single and multi-party calls are supported, and developers do not need to care about the underlying implementation.

Ease of use: works out of the box, fast to implement, flexible to customize.

Stability: IM messages arrive with 100% reliability; RTC audio withstands 80% packet loss and video withstands 60% packet loss.

Scenario diversity: highly extensible, meeting audio and video call needs in many fields.

Based on Rongyun CallLib, developers can quickly build an app with calling. As for the stuck points in usage scenarios mentioned at the beginning, we have also upgraded and optimized the product to address them.

Optimization needs for the multi-scenario call experience

Call upgrades

Audio and video upgrade and downgrade: users can switch directly to a video call while on an audio call.

1V1 upgrade to group call: a 1V1 call turns directly into a multi-party group call.

Join an unfinished call at any time: in a group call, a user who did not answer right away can still choose to join as long as the call has not ended.

Call flow control upgrades

APIs for freely controlling audio and video streams; preview of multiple calls; selective answer and call waiting.

The main goal is to optimize the flow control APIs so that they stay easy to use while letting developers build more advanced features for their own needs.

Data consistency upgrades

This includes consistency of call time, consistent synchronization of participant status, and consistent business logic for operations in extreme scenarios.

The data consistency upgrade is mainly meant to give all participants in a call, especially those joining from different devices, a more consistent experience; it also makes later business expansion easier.

So how are these optimizations and upgrades implemented?

Rongyun CallLib is built on IMLib and RTCLib and is a heavy-client design.

All state and business logic has to live on the client. This increases client implementation complexity, the business logic has to be reimplemented on every platform, and in extreme cases it leads to inconsistent state.

Take the caller and the callee tapping hang up at the same moment. When the caller hangs up, it counts as a cancel; when the callee hangs up, it counts as a reject; the two are recorded differently in the call log. If both hang up simultaneously, the operations are sent and processed by the clients, so it is impossible to tell which came first, and the call record ends up inaccurate.

This problem can be handled, but the handling gets complicated, because a heavy-client design makes it hard to extend functionality.

Rongyun new CallLib in practice

Rongyun's new CallLib solves the problems above gracefully.

Upgraded business processing: a heavy-server, light-client design.

The state that the original CallLib was responsible for maintaining is now maintained and stored on the server;

CallLib becomes a state machine model driven by the server;

This greatly reduces client implementation complexity and avoids inconsistent logic across platforms.

Because state changes are issued by the server, state stays consistent, and extreme scenarios and online/offline transitions can be handled.

Take the simultaneous hang-up above: one of the operations must reach the server first, and the server then issues the resulting state, so things are well ordered. Call records are also generated by the server, so the two ends can no longer disagree.
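
The simultaneous hang-up case can be sketched as a tiny server-side state machine. This is an illustration only, not Rongyun's implementation: `CallSession`, `CallState`, and the record format are hypothetical names, and the sketch covers just the unanswered-call phase.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CallState(Enum):
    RINGING = "ringing"
    CANCELLED = "cancelled"  # the caller hung up before an answer
    REJECTED = "rejected"    # the callee hung up before an answer


@dataclass
class CallSession:
    """Hypothetical server-side state for one unanswered 1V1 call."""

    caller: str
    callee: str
    state: CallState = CallState.RINGING
    record: Optional[str] = None  # the single call record, written by the server

    def hang_up(self, user: str) -> CallState:
        """Apply a hang-up; only the first one to reach the server counts."""
        if user not in (self.caller, self.callee):
            raise ValueError(f"{user} is not part of this call")
        if self.state is CallState.RINGING:
            if user == self.caller:
                self.state = CallState.CANCELLED
            else:
                self.state = CallState.REJECTED
            self.record = f"{self.state.value} by {user}"
        # A later hang-up changes nothing; both clients get the same state.
        return self.state
```

Because the server processes the two requests in arrival order, whichever arrives second simply receives the state decided by the first, and there is exactly one call record.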

Let's look at how the new design improves the experience in specific scenarios.

1V1 audio and video call flow

(1V1 audio and video call sequence diagram)

This is our most basic calling capability. The complete sequence diagram for it involves four roles working together: Client A and Client B are the calling end and the called end respectively, Call Server handles call-related business logic, and RTC Server handles the audio and video traffic.

As the diagram shows, implementing even this basic capability is quite complicated.

Audio and video upgrade and downgrade

Many people have probably had this experience: when you need to turn on the camera partway through a voice call, you have to hang up the audio call first and start a new video call.

Rongyun has optimized this scenario to make it more convenient: when an audio call needs to be temporarily upgraded to video, an upgrade request can be initiated within the call.

The receiving end can accept or reject: accepting upgrades to a video call, rejecting continues the audio call. The initiator can also cancel the upgrade request.

Some extreme operations are possible here too, for example the initiator tapping cancel at the same moment the receiving end taps accept. In that case the Call Server acts as the arbiter. For example, it can let the cancellation take precedence: as long as a cancel was issued, both ends fall back to the audio call even if the receiving end tapped accept.

In the earlier heavy-client design, this kind of judgment would become very complicated.
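
A rough sketch of the "cancellation takes precedence" rule follows. The class `UpgradeArbiter` and its methods are hypothetical and stand in for whatever the Call Server actually does; a real server would also tie each answer to a specific request ID and limit how long a cancel stays valid.

```python
from enum import Enum


class Media(Enum):
    AUDIO = "audio"
    VIDEO = "video"


class UpgradeArbiter:
    """Hypothetical server-side arbiter for one audio-to-video upgrade."""

    def __init__(self) -> None:
        self.media = Media.AUDIO
        self.pending = False    # an upgrade request is waiting for an answer
        self.cancelled = False  # the initiator cancelled the latest request

    def request_upgrade(self) -> Media:
        self.pending, self.cancelled = True, False
        return self.media

    def accept(self) -> Media:
        # An accept only counts if the request is still open.
        if self.pending and not self.cancelled:
            self.media = Media.VIDEO
        self.pending = False
        return self.media

    def reject(self) -> Media:
        self.pending = False
        return self.media

    def cancel(self) -> Media:
        # Cancellation prevails: even if an accept was processed a moment
        # earlier, both ends fall back to the audio call.
        self.cancelled = True
        self.pending = False
        self.media = Media.AUDIO
        return self.media
```

Whichever order the racing accept and cancel arrive in, every client ends up with the single state the server returns, so the rule lives in one place instead of being reimplemented on each platform.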

Incoming call during a call

In common audio and video call scenarios, if the called user is already in a voice or video call, the caller is usually told the other party is busy and has to wait for them to hang up.

Compare this with the telephone: during a call we may receive a call from a third person, and carrier phone service offers selective answer and call waiting, which means we can handle two calls at the same time.

For this scenario, Rongyun's Call Server stores the state of every call, so even while you are in a call you can still handle other incoming calls.
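
As an illustration of keeping state per call rather than per user, the sketch below tracks several calls for one user. `UserCalls` and the state names are hypothetical, and putting the current call on hold when another is answered is an assumption modeled on telephone call waiting, not a description of Rongyun's behavior.

```python
from typing import Dict


class UserCalls:
    """Hypothetical server-side view of all calls involving one user."""

    def __init__(self) -> None:
        # call_id -> "ringing" | "active" | "held"
        self.states: Dict[str, str] = {}

    def incoming(self, call_id: str) -> None:
        # A new call rings even if another call is already active.
        self.states[call_id] = "ringing"

    def answer(self, call_id: str) -> None:
        if self.states.get(call_id) != "ringing":
            raise ValueError(f"{call_id} is not ringing")
        for other in list(self.states):
            if self.states[other] == "active":
                self.states[other] = "held"  # assumed call-waiting behavior
        self.states[call_id] = "active"

    def hang_up(self, call_id: str) -> None:
        self.states.pop(call_id, None)
```

With a single "busy" flag per user the second call could only be refused; with one state per call the client can show it, answer it, or ignore it.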

Group call flow

A group call differs from a single call in two ways: first, there may be more than two participants; second, as long as at least one person is still on the call, the call has not ended.

So starting a group call is just an initiation: the initiator enters the RTC room first and waits for others to join.

Another special point is that you can invite others into the room at any time, and the process is very similar to initiating the call.

In common communication software, a group call can only take place within one group, whereas Rongyun's service supports initiating calls and inviting people from different groups or from a private address book, anytime and anywhere.

Upgrading 1V1 to a group call

This scenario arises when two people are talking and realize partway through that others need to take part. Rongyun supports upgrading a 1V1 call directly to a group call.

The upgrade process is similar to inviting others during a group call. The difference is that when the call is upgraded, Call Server explicitly notifies both parties of the 1V1 call that the current call has become a group call.

After a successful upgrade, the rest of the flow is the same as a group call.

Joining a call that has not ended, at any time

This is also a common situation: it is not convenient to answer a group call right now, or you need to step away for a while after answering.

A group call does not end as long as one person remains in it, so users who are not currently in the call can choose to join it again at any time before it ends.

This is a particularly useful optimization, especially as more and more work and communication moves online: you step away for something and later need to rejoin to continue the earlier discussion. It also relies on Call Server keeping the call state, so the client can find out at any time about calls it has not joined.
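
The group call rules described above (the initiator enters first, others can be invited at any time, the call ends only when the last person leaves, and anyone invited can join while it is still running) can be sketched as follows. `GroupCall` and its methods are hypothetical names used for illustration only.

```python
from typing import Iterable, Set


class GroupCall:
    """Hypothetical server-side state for one group call."""

    def __init__(self, call_id: str, initiator: str, invited: Iterable[str]) -> None:
        self.call_id = call_id
        self.invited: Set[str] = set(invited) | {initiator}
        self.in_room: Set[str] = {initiator}  # the initiator enters the room first
        self.ended = False

    def invite(self, user: str) -> None:
        if self.ended:
            raise RuntimeError("call has ended")
        self.invited.add(user)

    def join(self, user: str) -> None:
        if self.ended:
            raise RuntimeError("call has ended")
        if user not in self.invited:
            raise PermissionError(f"{user} was not invited")
        self.in_room.add(user)

    def leave(self, user: str) -> None:
        self.in_room.discard(user)
        if not self.in_room:
            self.ended = True  # the call ends only when the room is empty

    def joinable_by(self, user: str) -> bool:
        """True if the user could join (or rejoin) right now."""
        return not self.ended and user in self.invited and user not in self.in_room
```

A client that missed the original ring, or left partway through, can ask the server whether `joinable_by` holds for it and offer a "join" button for as long as the call is alive.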

Call record implementation

Previously, call logs were stored locally, which made multi-device synchronization very difficult.

Now call records can be updated and deleted, and even the unread count of call records can be synchronized across devices, giving users a more sensible and unified experience.
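
A minimal sketch of why server-side storage makes this easy: if every device reads the same log, updates, deletions, and the unread count agree automatically. `CallLog` and `CallRecord` are hypothetical names, not part of any Rongyun API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    record_id: str
    peer: str
    outcome: str  # for example "answered", "cancelled", "rejected"
    read: bool = False


class CallLog:
    """Hypothetical server-side call log for one user, shared by all devices."""

    def __init__(self) -> None:
        self._records: Dict[str, CallRecord] = {}

    def add(self, record: CallRecord) -> None:
        self._records[record.record_id] = record

    def mark_read(self, record_id: str) -> None:
        self._records[record_id].read = True

    def delete(self, record_id: str) -> None:
        self._records.pop(record_id, None)

    def unread_count(self) -> int:
        return sum(1 for r in self._records.values() if not r.read)

    def snapshot(self) -> List[CallRecord]:
        """What any device would fetch when it syncs."""
        return list(self._records.values())
```

Marking a record as read on a phone and then opening the desktop client yields the same `unread_count`, because both read the one copy held by the server.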

Experience optimization is a long-term topic, and audio and video calling is a very common yet complex feature. Drawing on years of deep technical experience in the communications industry, Rongyun keeps refining the details of the experience, removing the stuck points in use, and giving developers solutions that are more convenient and better match users' habits.

Copyright notice
This article was created by [51CTO]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/173/202206221232167418.html