Voice assistant - Measurement Indicators
2022-06-12 07:31:00 【Turned_ MZ】
A voice assistant is made up of many modules and stages, such as ASR, NLU, TTS, and the client. How can we evaluate a voice assistant as a whole, and each of its modules? Are there quantifiable metrics?



1. Product metrics:
- Number of user sessions: the number of user conversations per day.
- User count: the number of users per day.
- Next-day retention: the proportion of one day's users who still use the assistant on the following day.
- Day-7 retention: taking day n as the baseline, the proportion of day-n users who still use the assistant on day n+7.
- Next-week retention: taking day n as the baseline and not counting day n+7 itself, the proportion of day-n users who still use the voice assistant between day n+7 and day n+14.
The metrics above evaluate the product as a whole from different angles and reflect the overall state of a voice assistant. Put simply, a good product is one that many people use (user count), that people use often (number of user sessions / user count), and that people come back to after trying it (retention).
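The retention definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `active_by_day` mapping (day to set of user ids), the `retention` helper, and the user ids are all hypothetical, and the next-week window is taken here as days n+8 through n+14, which is one reading of "not counting day n+7".

```python
from datetime import date, timedelta


def retention(active_by_day, base_day, start_offset, end_offset=None):
    """Share of base_day users who are active again on any day in
    [base_day + start_offset, base_day + end_offset] (inclusive)."""
    if end_offset is None:
        end_offset = start_offset
    base_users = active_by_day.get(base_day, set())
    if not base_users:
        return 0.0
    returned = set()
    for offset in range(start_offset, end_offset + 1):
        returned |= active_by_day.get(base_day + timedelta(days=offset), set())
    return len(base_users & returned) / len(base_users)


# Hypothetical activity log: day -> set of user ids seen that day.
d0 = date(2022, 6, 1)
active_by_day = {
    d0: {"u1", "u2", "u3", "u4"},
    d0 + timedelta(days=1): {"u1", "u2"},
    d0 + timedelta(days=7): {"u1"},
    d0 + timedelta(days=10): {"u3"},
}

next_day = retention(active_by_day, d0, 1)       # 2 of 4 users -> 0.5
day_7 = retention(active_by_day, d0, 7)          # 1 of 4 users -> 0.25
next_week = retention(active_by_day, d0, 8, 14)  # only u3 returns -> 0.25
print(next_day, day_7, next_week)
```

Keeping the window offsets as parameters makes it easy to switch to whatever retention window a team actually uses.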
2. Technical metrics:
Client:
- Client execution success rate: the proportion of operations that the client executes successfully. The higher, the better.
ASR:
- ASR word error rate (WER): to make the recognized word sequence match the reference word sequence, some words have to be substituted, deleted, or inserted. WER is the total number of substituted, deleted, and inserted words divided by the total number of words in the reference sequence, expressed as a percentage: WER = (substitutions + deletions + insertions) / total words in the reference. The lower, the better.
- ASR sentence error rate (SER): if any word in a sentence is misrecognized, the whole sentence counts as a recognition error. SER = number of misrecognized sentences / total number of sentences. The lower, the better.
- Wake-up rate: the probability that a user's wake-up attempt succeeds. The higher, the better.
- False wake-up rate: the probability that the assistant is woken by mistake, either when the user did not intend to wake it or when there is noise such as background sound. The lower, the better.
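As a rough illustration of the WER and SER definitions above, the sketch below counts word-level edits with a standard Levenshtein (edit-distance) dynamic program. The helper names and the sample sentence pairs are assumptions made for this example; it also assumes whitespace-separated words, which does not hold for languages such as Chinese, where a separate tokenization step would be needed.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions
    needed to turn the hypothesis word list into the reference."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # reference word missing (deletion)
                            curr[j - 1] + 1,       # extra hypothesis word (insertion)
                            prev[j - 1] + cost))   # substitution or exact match
        prev = curr
    return prev[-1]


def wer_and_ser(pairs):
    """pairs: list of (reference sentence, recognized sentence)."""
    total_errors = total_ref_words = wrong_sentences = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        errors = edit_distance(ref, hyp)
        total_errors += errors
        total_ref_words += len(ref)
        wrong_sentences += errors > 0
    return total_errors / total_ref_words, wrong_sentences / len(pairs)


# Hypothetical test set: (reference transcript, ASR output).
pairs = [
    ("turn on the light", "turn on light"),    # 1 deletion
    ("play some music", "play some music"),    # exact match
    ("call mom now", "call tom now please"),   # 1 substitution + 1 insertion
]
wer_value, ser_value = wer_and_ser(pairs)
print(f"WER = {wer_value:.2%}, SER = {ser_value:.2%}")  # WER = 30.00%, SER = 66.67%
```

Here WER is pooled over the whole test set (total errors over total reference words) rather than averaged per sentence, so long sentences are not under-weighted.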
NLU:
- Semantic understanding accuracy: a query counts as correctly understood only when the recognized intent, the slots, and the final result are all correct. Accuracy = number of correctly understood queries / total number of queries.
- Recall: how well user utterances are correctly assigned to a given intent, i.e. the recall of the corresponding scenario. Recall = number of utterances correctly recognized as the scenario / number of user utterances that actually carry this intent.
TTS:
- TTS naturalness: the fluency of the TTS playback, meaning how closely it resembles a person speaking rather than a machine voice. It is a subjective metric.
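The two NLU formulas can be sketched as follows. The dictionary layout (`intent`, `slots`), the function names, and the sample data are hypothetical. For simplicity the sketch treats a query as correct when intent and slots both match the annotation, which stands in for the "intent, slots, and result all correct" condition above.

```python
def nlu_accuracy(gold, pred):
    """Share of queries whose predicted intent and slots both match the annotation."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


def intent_recall(gold, pred, intent):
    """Of the queries that truly carry `intent`, the share recognized as that intent."""
    relevant = [(g, p) for g, p in zip(gold, pred) if g["intent"] == intent]
    if not relevant:
        return 0.0
    hits = sum(1 for g, p in relevant if p["intent"] == intent)
    return hits / len(relevant)


# Hypothetical annotated queries and the NLU output for each of them.
gold = [
    {"intent": "weather", "slots": {"city": "Beijing"}},
    {"intent": "weather", "slots": {"city": "Shanghai"}},
    {"intent": "music", "slots": {"song": "Hello"}},
    {"intent": "weather", "slots": {}},
]
pred = [
    {"intent": "weather", "slots": {"city": "Beijing"}},  # fully correct
    {"intent": "weather", "slots": {"city": "Beijing"}},  # intent right, slot wrong
    {"intent": "music", "slots": {"song": "Hello"}},      # fully correct
    {"intent": "chat", "slots": {}},                      # intent wrong
]

accuracy = nlu_accuracy(gold, pred)                    # 2 of 4 -> 0.5
weather_recall = intent_recall(gold, pred, "weather")  # 2 of 3 weather queries
print(accuracy, weather_recall)
```

Note that the second query hurts accuracy (the slot is wrong) but not recall, because recall as defined here only checks whether the utterance landed on the right intent.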
3. Metrics from the user's perspective:
Chit-chat scenario:
- Compassion: the ability to identify the user's current emotion, also called emotion recognition.
- Empathy: looking at things from the user's perspective rather than one's own. After identifying the user's emotion, the assistant should show empathy in its reply and resonate with the user.
- Relevance: the answer must relate to the user's question rather than respond to something that was not asked. For example, Q: "What's the weather like today?" A1: "It's raining today." A2: "Today is the 8th." A1 is relevant to Q; A2 is not.
- Interestingness: whether the reply is engaging rather than a bare answer to the question. Continuing the example above, A3: "It's raining today. Remember to take an umbrella when you go out, and stay in a good mood even though it's raining." A3 is more interesting than A1. This is also a subjective metric.
- Diversity: whether replies are rich and varied. When the user asks the same question several times, the reply should not always be identical.
- Average conversation rounds: the average number of exchanges per conversation session. A session is generally considered to have ended once there has been no exchange with the user for 10 minutes. Average conversation rounds = total number of exchanges / total number of sessions. For example, if on one day the user and the assistant have 100 exchanges in total, spread over 8:00-8:30, 9:00-10:00, and 10:15-11:00, that is 3 sessions, so the average is 100 / 3, or about 33 exchanges per session.
- Persona consistency: each assistant has its own persona, for example: female, 18 years old, likes to eat cabbage, likes green, and so on. Persona consistency requires the assistant to keep this persona unchanged while talking with the user, rather than chopping and changing.
- Ability to start a conversation: the ability to open a new topic when the current one has run dry, so that the conversation does not turn awkward.
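The average-conversation-rounds calculation in the list above depends on first splitting a user's exchanges into sessions. Below is a minimal sketch of that step, assuming each exchange is logged with a timestamp; the `split_sessions` helper and the sample timestamps are illustrative only, and the 10-minute gap is exposed as a parameter.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, gap=timedelta(minutes=10)):
    """Group exchange timestamps into sessions; a silence longer than
    `gap` closes the current session and starts a new one."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Hypothetical exchange times for one user on one morning.
day = datetime(2022, 6, 12)
timestamps = [
    day.replace(hour=8, minute=0),
    day.replace(hour=8, minute=5),
    day.replace(hour=8, minute=12),
    day.replace(hour=9, minute=0),
    day.replace(hour=9, minute=3),
    day.replace(hour=10, minute=15),
]

sessions = split_sessions(timestamps)
average_rounds = len(timestamps) / len(sessions)
print(len(sessions), average_rounds)  # 3 sessions, 2.0 exchanges per session
```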
Q&A scenario:
- Reply relevance: whether the reply relates to the user's question.
- Timeliness: whether the reply is up to date. Stock information, license-plate driving restrictions, and similar content all have timeliness requirements.
- Problem resolution rate: the proportion of replies that solve the user's problem.
- No-answer rate: the proportion of queries for which no answer is found. The lower, the better.
- Repeated-query rate: when users do not get the answer they want, they sometimes repeat the question, or rephrase it and ask again. The lower the proportion of repeated queries, the better; the goal is to meet the user's need in a single attempt.
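To show how the three rate metrics above might be computed from a query log, here is a small sketch. The log schema (`user`, `query`, `answer`, `resolved`) is hypothetical, all three rates use the total number of queries as the denominator (an assumption), and repeats are detected only by exact match after normalizing case and whitespace, so rephrased questions would need some form of semantic matching that is not attempted here.

```python
def qa_metrics(log):
    """Return resolution, no-answer, and repeated-query rates for a query log."""
    total = len(log)
    no_answer = sum(1 for r in log if r["answer"] is None)
    resolved = sum(1 for r in log if r["resolved"])

    seen = set()
    repeats = 0
    for r in log:
        # Normalize case and whitespace so trivially identical queries match.
        key = (r["user"], " ".join(r["query"].lower().split()))
        if key in seen:
            repeats += 1
        seen.add(key)

    return {
        "resolution_rate": resolved / total,
        "no_answer_rate": no_answer / total,
        "repeat_rate": repeats / total,
    }


# Hypothetical log; `resolved` would come from manual labels or user feedback.
log = [
    {"user": "u1", "query": "What is the stock price of X?", "answer": None, "resolved": False},
    {"user": "u1", "query": "what is the stock price of X?", "answer": "12.3", "resolved": True},
    {"user": "u2", "query": "Is there a driving restriction today?", "answer": "Yes", "resolved": True},
    {"user": "u2", "query": "Who won the game?", "answer": None, "resolved": False},
    {"user": "u3", "query": "What time is it?", "answer": "10:00", "resolved": True},
]

metrics = qa_metrics(log)
print(metrics)  # resolution 0.6, no-answer 0.4, repeat 0.2
```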
Command scenario:
- Skill coverage: the extent to which the available skills cover users' needs. User needs are diverse, and skill coverage indicates whether a corresponding skill exists to carry each of them out.
- Completion degree: the degree to which the executed action meets the user's request. For example, if the user says "Open WeChat and post a Moment saying the weather is nice today" and the assistant only opens WeChat without posting the Moment, the skill has not been completed.
- Execution success rate: the proportion of actions that execute successfully.
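One possible way to quantify completion degree is as the share of required steps that were actually executed. That reading is an assumption made for this sketch, as are the step names, the `completion_degree` helper, and the choice to count a command as successful only when every required step was carried out.

```python
def completion_degree(required_steps, executed_steps):
    """Share of the required steps that were actually executed."""
    if not required_steps:
        return 1.0
    done = set(executed_steps)
    return sum(1 for step in required_steps if step in done) / len(required_steps)


# Hypothetical command log: steps each request needs vs. steps the assistant ran.
commands = [
    {"required": ["open_wechat", "post_moment"], "executed": ["open_wechat"]},
    {"required": ["set_alarm"], "executed": ["set_alarm"]},
]

degrees = [completion_degree(c["required"], c["executed"]) for c in commands]
# A command counts as successful here only if every required step ran.
success_rate = sum(1 for d in degrees if d == 1.0) / len(commands)
print(degrees, success_rate)  # [0.5, 1.0] 0.5
```

In the WeChat example the assistant completes one of two required steps, so the completion degree is 0.5 and the command does not count toward the success rate.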
4. Higher requirements:
The three sections above looked at voice assistant metrics from the product, technical, and user perspectives. A truly intelligent voice assistant should also be capable of multi-turn dialogue, which acts as a lubricant between chit-chat, Q&A, and tasks and can greatly enhance the sense of intelligence. This will be discussed in detail later. The corresponding metrics for multi-turn dialogue are:
- Average conversation rounds: calculated as above, but different scenarios have different targets. For chit-chat, the higher the average, the better, because it means the conversation is going well. For Q&A and task scenarios, the fewer the rounds, the better, because it means the user's need was met in the shortest time.
- Jump-out rate: the proportion of users who exit a multi-turn dialogue.
- Multi-turn relevance: distinct from reply relevance. If one Q and one A are defined as a turn, multi-turn relevance is the relevance across multiple turns, i.e. the ability to stay on a topic.