New feature | OpenSearch launches custom term weight models
2022-07-26 12:39:00 【Alibaba Cloud Big Data AI Technology】
Contents
Pain point 1: Strong industry specificity
Pain point 2: Building a model in-house is difficult, costly, and slow
OpenSearch lightweight custom term weight solution
E-commerce scenario comparison
Content scenario comparison
Business pain points
During the query analysis stage of search, term weighting analyzes the importance of each term in the query text and quantifies it as a weight; low-weight terms may be left out of recall. Without a term weight model, a query that contains low-weight terms stays constrained by the user's exact input and returns too few hits. Term weighting therefore effectively improves search recall and is an indispensable part of query analysis.
Pain point 1: Strong industry specificity
Customers' search applications usually belong to a specific industry. A general-purpose term weight model lacks industry knowledge, so when it is applied to a specific industry it produces more bad cases, which hurt recall and ranking quality.
An industry model alleviates this adaptation problem. However, if the customer's data belongs to a specific vertical within the industry, or no corresponding industry model is available, term weights need to be customized on the customer's own data to achieve the best results.
Pain point 2: Building a model in-house is difficult, costly, and slow
Building a term weight model in-house mainly involves the following steps:

Difficulty 1: Labeling term weights requires deep domain knowledge, because it is hard to judge how important different terms are to a search engine. At least 10,000 labeled samples are also needed, which can take months.
Difficulty 2: The barrier to model training is high. Professional algorithm engineers are needed for tuning, and both model quality and iteration speed depend heavily on their time and skill.
Difficulty 3: Model deployment and operations are complex. They require engineering, algorithm, and operations teams to work together, and putting a deep model into production also involves many performance and efficiency optimizations.
OpenSearch lightweight custom term weight solution
Solution overview
Before text recall, OpenSearch performs query semantic analysis on the keywords a user enters. Because business scenarios are diverse and each industry and business has its own particularities, only a term weight model tailored at the application level can deliver the best search results.
Unlike word segmentation, term weighting affects both recall and relevance ranking. In the recall stage, the term weight model assigns each query term a weight. On the first query, low-weight terms are not required to match but still take part in the computation. If the first query returns zero or few results, a second query also makes medium-weight terms optional in order to expand recall. In the relevance ranking stage, the weight of each query term feeds into the relevance features: among matching documents, those that hit high-weight terms score higher and therefore rank higher. Because the custom term weight model is trained on the customer's own business data, it can substantially improve search results.
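The two-pass recall and weight-aware ranking described above can be sketched as follows. This is a minimal illustration, not OpenSearch's actual implementation: the numeric levels `LOW`/`MEDIUM`/`HIGH`, the `min_hits` threshold, the summed-weight score, and the toy documents are all assumptions made for the example.

```python
from typing import Dict, List, Set

# Hypothetical numeric levels; the article only names low / medium / high.
LOW, MEDIUM, HIGH = 1, 2, 3


def recall(docs: Dict[str, Set[str]], weights: Dict[str, int], min_required: int) -> List[str]:
    """Return ids of documents that contain every query term weighted >= min_required."""
    required = {term for term, w in weights.items() if w >= min_required}
    return [doc_id for doc_id, terms in docs.items() if required <= terms]


def score(doc_terms: Set[str], weights: Dict[str, int]) -> int:
    """Toy relevance feature: sum of the weights of the query terms the document hits."""
    return sum(w for term, w in weights.items() if term in doc_terms)


def search(docs: Dict[str, Set[str]], weights: Dict[str, int], min_hits: int = 1) -> List[str]:
    # First pass: only low-weight terms are optional.
    hits = recall(docs, weights, min_required=MEDIUM)
    if len(hits) < min_hits:
        # Second pass: medium-weight terms become optional too, expanding recall.
        hits = recall(docs, weights, min_required=HIGH)
    # Optional terms still count toward the score, so documents hitting them rank higher.
    return sorted(hits, key=lambda d: score(docs[d], weights), reverse=True)


# Illustrative query weights for a hand-warmer query (assumed values).
weights = {"bear": MEDIUM, "digital display": LOW, "hand warmer": HIGH}

docs = {
    "d1": {"bear", "hand warmer"},
    "d3": {"hand warmer"},
    "d2": {"hand warmer", "digital display"},
}
print(search(docs, weights))         # first pass already finds a match

sparse_docs = {"d3": docs["d3"], "d2": docs["d2"]}
print(search(sparse_docs, weights))  # first pass is empty, second pass relaxes "bear"
```

In the second call no document contains both "bear" and "hand warmer", so the medium-weight term is relaxed and the document that also hits the low-weight term "digital display" is ranked first.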
OpenSearch provides a rich set of domain-specific term weight models. Starting from the analyzer for their industry, users can obtain a custom term weight model after simple configuration and training. Once training finishes, users can review the model's effect in the console, including the difference rate and side-by-side comparisons of typical term weight cases. When the effect meets expectations, the custom term weight model can be used in OpenSearch, and term weights can also be adjusted through manual intervention.
The whole customization process requires no additional data integration: term weight model training automatically extracts the existing data for adaptation.

Target customers
- Customers for whom search is a key scenario of the core business and who have high requirements for search quality
- Customers in specialized industries, verticals, or businesses that have many domain-specific terms
- Customers with limited search staffing and few algorithm engineers
How to use
- Import business data
- Create a term weight model: select the training fields and create the model
- After the model is trained, reference the term weight model in query analysis
For more instructions, see: Custom term weights for recall - OpenSearch - Alibaba Cloud
Effect comparison
E-commerce scenario comparison
Query | General e-commerce term weights | Custom term weights |
Winter four-piece gift set | winter: medium, four-piece set: high, gift: high | winter: medium, four-piece set: high, gift: low |
Bear digital display hand warmer | Bear: low, digital display: medium, hand warmer: high | Bear: medium, digital display: low, hand warmer: high |
Defibrillator compatible package | defibrillator: high, compatible: medium, package: low | defibrillator: medium, compatible: low, package: high |
22 year wall calendar | 22: low, year: low, wall calendar: high | 22: medium, year: low, wall calendar: high |
Content scenario comparison
Query | General term weights | Custom term weights |
Potential function | potential: low, function: high | potential: high, function: low |
get post difference | get: medium, post: high, difference: low | get: high, post: high, difference: low |
ktv song ordering system | ktv: medium, song ordering: high, system: medium | ktv: high, song ordering: medium, system: low |
Summary:
- If your business already uses, or is preparing to use, an industry edition of OpenSearch, you can train the custom term weight model on top of the industry model;
- If OpenSearch does not yet provide an industry edition close to your business, train the custom term weight model on top of the general model. In this case, provide as much data as possible, with a distribution that is as comprehensive and balanced as possible, to improve the model's effect;
- OpenSearch currently supports custom tokenizers and custom term weight models. More customizable recall models will follow, so stay tuned.