当前位置:网站首页>Why can't Google search page infinite?
Why can't Google search page infinite?
2022-06-11 01:06:00 【Gegwu MMQ!!】

This is an interesting but little noticed problem .
When I use Google Search for MySQL When it comes to this keyword ,Google Provided only 13 Page search results , I changed it url The paging parameter of attempts to search for page 14 Page data , The following error message appears :

Baidu search also does not provide unlimited paging , about MySQL key word , Baidu search provides 76 Page search results .

Why not support infinite paging
Strong as Google Search for , Why not support infinite paging ? There are only two possibilities :
Can't do
It is not necessary to
「 Can't do 」 It's impossible , The only reason is 「 It is not necessary to 」.
First , When the first 1 Page search results do not have the content we need , We usually change keywords immediately , Instead of turning the page 2 page , Not to mention turning to 10 The page goes back . This is the first reason why it is unnecessary —— User demand is not strong .
secondly , The function of infinite paging is very performance consuming for search engines . You may feel strange , Turn to No 2 Page and turn to page 1000 The pages are all search , What's the difference ?
actually , A side effect of the design of high availability and high scalability of search engine is that it can not efficiently realize the unlimited paging function , Not being efficient means being able to , But the price is higher , This is a problem that all search engines will face , Professionally called 「 Deep paging 」. This is also the second reason why it is unnecessary —— High implementation cost .
Of course I don't know Google How to do the search of , So next I use ES(Elasticsearch) As an example to explain why deep paging is a headache for search engines .
Why take ES For example
Elasticsearch( Hereinafter referred to as" ES) The functions and Google And the functions provided by Baidu search are the same , And it is similar in the way of high availability and high scalability , Deep paging problems are caused by these similar optimization methods .
What is? ES
ES It's a full text search engine .
What the hell is a full-text search engine ?
Imagine a scene , You happen to hear a song with a particularly beautiful melody , I still feel the lingering sound after I go home , But you can only remember a few words in the lyrics :「 The edge of the umbrella 」. At this time, the search engine will work .
Using the search engine, you can get 「 The edge of the umbrella 」 All results of keywords , There is a term for these results , It's called documentation . And the search results are returned after sorting according to the relevance of documents and keywords . We get the definition of full-text search engine :
Full text search engine is to find relevant documents according to the contents of documents , A tool that returns search results in order of relevance

Surfing the Internet for too long , We will gradually mistake the ability of the computer for our own ability , For example, we may mistakenly think that our brain is good at this kind of search . On the contrary , We are not good at full-text search .
for instance , If I say to you : In the Quiet Night . You may blurt out : abed, I see a silver light , The frost on the ground . look at the bright moon , Bow your head and think of your hometown . But if I ask you to say something with 「 month 」 The ancient poetry of , You must have worked hard on the membership fee .
Including the books we usually read , The directory itself is a search structure that conforms to the retrieval characteristics of our human brain , Let's go through the documentation ID Or the document title is a general sign to find a document , This structure is called forward index .
A directory is a forward index
Full text search engines are just the opposite , It is to find documents through the contents of documents , The flying flower order in the poetry conference is the full-text search engine of the human brain version .
Flying order is full-text search
The data structure that full-text search engines rely on is the famous inverted index (「 inverted 」 This word means that this data structure is just the opposite of our normal way of thinking ), It is a concrete implementation of the containment relationship between words and documents .
Word document matrix
Stop ! Can't go on , Finish the introduction in one sentence ES Well !
ES Is a use of inverted index data structure 、 Be able to find relevant documents according to their contents , A full-text search engine that returns search results in order of relevance
The secret of high availability —— copy (Replication)
High availability is an indicator that enterprise level services must consider , High availability inevitably involves clustering and distribution , Fortunately ES Natural support cluster mode , It is very simple to build a distributed system .
ES High service availability requires that if one of the nodes is down , Can not affect the normal search service . This means that the data stored on the suspended node , You must have a full backup on the other nodes . This is the concept of copy .


As shown in the figure above ,Node1 Master node ,Node2 and Node3 As a replica node, it saves the same data as the master node , In this way, the failure of any node will not affect the business search . Meet the high availability requirements of services .
But there's a fatal problem , Unable to realize system expansion ! Even if you add another node , It can not help the capacity expansion of the whole system . Because each node completely saves all the document data .
therefore ,ES Introduced slicing (Shard) The concept of .
PB The foundation stone of level quantity —— Fragmentation (Shard)
ES Put each index (ES A collection of documents in , amount to MySQL In the table ) Divide into several pieces , Sharding will be distributed to different nodes as evenly as possible . For example, a cluster now has 3 A node , The index is divided into 5 A shard , The distribution method is roughly ( Because how to distribute it equally depends on ES) As shown in the figure below .
Fragmentation
thus , The horizontal expansion of the cluster is very simple , Now let's add... To the cluster 2 Nodes , be ES The partition will be automatically balanced to each node :

High availability + The elastic expansion
The copy and sharding functions work together to create ES Today's high availability and support PB Two advantages of level data volume .
Now let's use 3 Nodes as an example , Show that the number of slices is 5, The number of copies is 1 Under the circumstances ,ES The partition arrangement on different nodes :
The distribution of the main partition and the sub partition
There is one caveat , In the example above, the primary partition and the corresponding replica partition do not appear on the same node , As for why , You can think about it for yourself .
Distributed storage of documents
ES How to determine which partition a document should be stored on ?
Through the above mapping algorithm ,ES Distribute the document data evenly in each partition , among routing The default is document id.
Besides , The contents of replica shards depend on the master shard for synchronization , The meaning of replica sharding is load balancing 、 The main partition position on the top that may hang up at any time , Become the new main partition .
Now the basics are over , Finally we can search .
ES The search mechanism of
A picture is worth a thousand words :

When the client performs keyword search ,ES A load balancing policy will be used to select a node as the coordination node (Coordinating Node) Accept the request , Let's assume that the choice is Node3 node ;
Node3 The node will be in 10 The primary and secondary slices are randomly selected 5 A shard ( All partitions must be able to contain all contents , And can't repeat ), send out search request;
Selected 5 After the query is executed and sorted, the results are returned to Node3 node ;
Node3 Node consolidation 5 Results returned by sharding , After sorting again, get the result set of the corresponding page and return it to the client .
notes : actually ES Our search is divided into Query Phase and Fetch There are two steps in the phase , stay Query The documents are returned in each stage Id And sort values ,Fetch Phase by document Id Go to the corresponding partition to obtain the document details , The picture and text above simplify this , Please know .
Now consider client-side fetch 990~1000 When it comes to documentation ,ES How to give correct search results in the case of fragment storage .
obtain 990~1000 When it comes to documentation ,ES Under each partition, you need to obtain 1000 A document , Then from Coordinating Node Aggregate the results of all slices , Then sort the correlation , Finally, the correlation order is selected in 990~1000 Of 10 Documents .
Deep paging
The deeper the pages , Each node processes more documents , The more memory it takes up , The longer it takes , This is why search engine vendors usually do not provide deep paging , They don't have to waste performance on features that don't have strong customer demand .
边栏推荐
- 扎实的基础知识+正确的方法是快速阅读源码的关键
- Wechat applet to realize OCR scanning recognition
- [论文阅读] TGANet: Text-guided attention for improved polyp segmentation
- About log traffic monitoring and early warning small project | database management tool: migrate
- 适配器模式
- 团队管理|如何提高技术Leader的思考技巧?
- [ROS tutorial] - 02 ROS installation
- Synchronized keyword for concurrent programming
- Fastdfs quick start
- 【NVIDIA驱动的顽固问题】---- /dev/sdax:clean,xxx/xxx files,xxx/xxx blocks ---- 最全解决方法
猜你喜欢

About log traffic monitoring and early warning small project | Kafka vs redis

【ROS入门教程】---- 03 单片机、PC主机、ROS通信机制

What is thread in concurrent programming

Teach you the front and back separation architecture (V) system authentication implementation

Complete uninstallation of MySQL under Linux

Small project on log traffic monitoring and early warning | environment and foundation 1

lucene思维导图,让搜索引擎不再难懂

What exactly does Devops mean?

Logback log framework

数组的字典排序
随机推荐
The principle and source code interpretation of executor thread pool in concurrent programming
Serrures dans SQLSERVER
SqlServer中的锁
Google搜索为什么不能无限分页?
Kubernetes入门介绍与基础搭建
Detailed explanation of five types of load balancing principle scenarios
compiler explorer
Can I buy annuity insurance? Is annuity insurance safe?
Complete collection of MySQL exercises (with answers) - you will know everything after practice
五大类型负载均衡的原理场景详解
UUID quick explanation
Optimization of startup under SYSTEMd, deleting useless SYSTEMd services
Lucene mind map makes search engines no longer difficult to understand
Adapter mode
What is thread in concurrent programming
MySQL
最好的创意鼓工具:Groove Agent 5
The best creative drum tool: groove agent 5
MySQL
Complete uninstallation of MySQL under Linux