Implement secondary index with Gaussian redis

Abstract ： gaussian Redis Build secondary index of business , Low cost , High performance , Achieve win-win performance and cost .

This article is shared from Huawei cloud community 《 Hua Wei Yun GaussDB(for Redis) Uncover secrets 21 period ： Use Gauss Redis Implement secondary index 》, author ： gaussian Redis The official blog .

One 、 background

Bring up the index , The first impression is the noun of database , however , gaussian Redis You can also implement secondary indexing ！！！ gaussian Redis The secondary index in generally uses zset To achieve . gaussian Redis Compared to open source Redis It has higher stability 、 And cost advantage , Use Gauss Redis zset Realize the secondary index of business , You can get a win-win situation of performance and cost .

The essence of index is to use ordered structure to speed up query , So by Zset Structural Gauss Redis It can easily realize the index of numeric type and character type .

• Numeric type index （zset Sort by score ）：

• Character type index （ When the scores are the same zset Sort by dictionary order ）：

Let's cut into two classic business scenarios , See how to use Gauss Redis To build a stable and reliable secondary index system .

Two 、 Scene one ： Complete the dictionary

When typing a query in the browser , Browsers usually recommend searches with the same prefix according to the possibility , This kind of scene can use Gauss Redis The secondary index function is realized .

2.1 Basic plan

The easiest way is to add each query of the user to the index . When user input completion recommendation is required , Use ZRANGEBYLEX Just execute the range query . If you don't want to return too many entries , gaussian Redis Also supports the use of LIMIT Options to reduce the number of results .

• Search the user banana Add to index ：

ZADD myindex 0 banana:1

• Suppose the user enters “bit”, And we want to offer the possibility to “bit” Search keywords at the beginning .

ZRANGEBYLEX myindex "[bit" "[bit\xff"

That is to use ZRANGEBYLEX Make a range query , The query range is the string that the user is now entering , And the same string plus a trailing byte 255（\xff）. In this way , We can get all strings prefixed with the string typed by the user .

2.2 Frequency related dictionary completion

In practical applications, we usually want to automatically sort and complete the entries according to the frequency of occurrence , At the same time, it can eliminate the entries that are no longer popular , And automatically adapt to future input . We can still use Gauss Redis Of ZSet Structure to achieve this goal , Just in the index structure , Not only do you need to store search terms , You also need to store the frequency associated with it .

• Search the user banana Add to index

• Judge banana Whether there is

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1

• hypothesis banana non-existent , add to banana:1, among 1 It's the frequency

ZADD myindex 0 banana:1

• hypothesis banana There is , You need to increase the frequency

if ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1 The frequency returned in is 1

1） Delete old entries ：

ZREM myindex 0 banana:1

2） Add one frequency to rejoin ：

ZADD myindex 0 banana:2

Please note that , Because there may be concurrent updates , Therefore, it should pass Lua The script sends the above three commands , use Lua script Automatically get the old count and add the entry again after increasing the score .

• Suppose the user enters “banana”, And we want to provide similar search keywords . adopt ZRANGEBYLEX The results are sorted by frequency .

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 10

1) "banana:123"

2) "banaooo:1"

3) "banned user:49"

4) "banning:89"

• Use streaming algorithm to clear infrequent input . Select an entry randomly from the returned entries , Subtract its score 1, Then add it again with the new score . however , If the new score is 0, We need to delete this entry from the list .

• If the frequency of randomly selected items is 1, Such as banaooo:1

ZREM myindex 0 banaooo:1

• If the frequency of randomly selected items is greater than 1, Such as banana:123

ZREM myindex 0 banana:123

ZADD myindex 0 banana:122

In the long run , The index will contain popular searches , If popular searches change over time , It will also automatically adapt .

3、 ... and 、 Scene two ： Multidimensional index

Except for queries on a single dimension , gaussian Redis It also supports retrieval in multidimensional data . for example , Retrieve all ages at 50 to 55 Between the ages of , At the same time, the salary is 70000 to 85000 Between people . The key to realize multi-dimensional secondary index is to convert two-dimensional data into one-dimensional data through coding , Then based on Gauss Redis zset Storage .

Represent two-dimensional indexes from a visual perspective . There are some points in the space below , They represent our data samples , among x and y It's two variables , The maximum values are 400. The blue box in the picture represents our query . We hope to inquire x Be situated between 50 and 100 Between ,y Be situated between 100 and 300 All points between .

3.1 Data encoding

If the inserted data point is x = 75 and y = 200

1） fill 0（ Data maximum is 400, So fill 3 position ）

x = 075

y = 200

2） Interleave numbers , With x Represents the leftmost number , With y Represents the leftmost number , And so on , To create a code

027050

If you use 00 and 99 Replace the last two , namely 027000 to 027099,map return x and y, namely ：

x = 70-79

y = 200-209

therefore , in the light of x=70-79 and y = 200-209 Two dimensional query , It can be encoded map become 027000 to 027099 One dimensional query , This can be achieved by Gauss Redis Of Zset The structure is easy to realize .

Empathy , We can aim at the last four / 6、 ... and /etc Perform the same operation with digits , So as to obtain a larger range .

3） Use binary

To obtain finer granularity , Data can be represented in binary , So when replacing numbers , Each time, you will get twice the original search scope . Suppose we only need 9 position （ To indicate at most 400 A number of values ）, Our number in binary form will be ：

x = 75 -> 001001011

y = 200 -> 011001000

After interweaving ,000111000011001010

Let's look at the use of 0s ad 1s Replace the last 2、4、6、8,... What is our range of bits ：

3.2 Add a new element

If the inserted data point is x = 75 and y = 200

x = 75 and y = 200 Binary interleaved code is 000111000011001010,

ZADD myindex 0 000111000011001010

3.3 Inquire about

Inquire about ：x Be situated between 50 and 100 Between ,y Be situated between 100 and 300 All points between

Replace from index N Bit will give us a side length of 2^(N/2) Search box . therefore , What we need to do is to check the smaller size of the search box , And check the closest 2 The power of , And constantly divide the remaining space , Then use ZRANGEBYLEX To search .

Here is the sample code ：

def spacequery(x0,y0,x1,y1,exp)

    bits=exp*2

    x_start = x0/(2**exp)

    x_end = x1/(2**exp)

    y_start = y0/(2**exp)

    y_end = y1/(2**exp)

    (x_start..x_end).each{|x|

        (y_start..y_end).each{|y|

            x_range_start = x*(2**exp)

            x_range_end = x_range_start | ((2**exp)-1)

            y_range_start = y*(2**exp)

            y_range_end = y_range_start | ((2**exp)-1)

            puts "#{x},#{y} x from #{x_range_start} to #{x_range_end}, y from #{y_range_start} to #{y_range_end}"

            # Turn it into interleaved form for ZRANGEBYLEX query.

            # We assume we need 9 bits for each integer, so the final

            # interleaved representation will be 18 bits.

            xbin = x_range_start.to_s(2).rjust(9,'0')

            ybin = y_range_start.to_s(2).rjust(9,'0')

            s = xbin.split("").zip(ybin.split("")).flatten.compact.join("")

            # Now that we have the start of the range, calculate the end

            # by replacing the specified number of bits from 0 to 1.

            e = s[0..-(bits+1)]+("1"*bits)

            puts "ZRANGEBYLEX myindex [#{s} [#{e}"

        }

    }

end

spacequery(50,100,100,300,6)

Four 、 summary

This paper introduces how to pass Gauss Redis Build a secondary index , The secondary index is in e-commerce 、 chart （hexastore）、 Games and other fields have a wide range of application scenarios , gaussian redis There are many similar applications in Xianwang . gaussian Redis Based on the architecture of separation of storage and calculation , Rely on distributed storage pool to ensure strong data consistency , It can easily support the secondary index function , Provide stable and reliable services for enterprise customers 、 Super high concurrency , Core data storage services that can be rapidly and elastically expanded .

appendix

The author of this article ： Huawei cloud database GaussDB(for Redis) The team
Hangzhou / Xi'an / Shenzhen resume delivery ：[email protected]
More product information , Welcome to the official blog ：bbs.huaweicloud.com/blogs/248875

Click to follow , The first time to learn about Huawei's new cloud technology ~

Use Gauss Redis More related articles on implementing secondary indexing

HBase Summary of secondary index scheme
from :http://blog.sina.com.cn/s/blog_4a1f59bf01018apd.html attach hbase How to create a secondary index and create a secondary index instance :http://www.aboutyun ...
Use ElasticSearch Empower HBase Secondary indexes | Summary after one year of practice
Preface : Do you remember it was 2018 One summer in , It's very hot , While wiping my sweat, I listened to the leaders' bold talk about their future reform blueprint . The meeting is over , The core idea is : We need to build a big data pool , We should pour all the data that the company can pour into this big pool , Then let others ...
hbase Build a secondary index solution
Official account : Big data technology school , reply " Information ", receive 1024G Information . 1 Why a secondary index is needed HBase The primary index of is rowkey, We can only pass rowkey Search . Suppose we are relatively Hbas ...
MySQL Optimize it MRR (Multi-Range Read: Secondary indexes merge back into tables )
MySQL5.6 Introduced in MRR, To optimize : Secondary index range scan and need to return to the table . Its principle is , Sort multiple secondary indexes that need to be returned to the table according to the primary key , And then go back to the chart together , The original table back to the random IO, Turn it into order IO. file ...
be based on Solr Realization HBase Secondary index of
Source of the article :http://www.open-open.com/lib/view/open1421501717312.html Achieve the goal : because hbase Orderly storage based on exercise keys , It is very efficient to use Xingjian when querying , And then think ...
HBase Secondary index of , as well as phoenix Installation （ Need to do it again ）
One :HBase Secondary index of 1. Explain uid+ts 11111_20161126111111: Inquire about a uid Data in a certain period of time Query the data of all users in a certain period of time : According to the time Index table rowkey:ts+ ...
mysql Secondary index of
mysql Each table in has a clustered index (clustered index ), In addition, every nonclustered index on a table is a secondary index , It's also called auxiliary index (secondary indexes). With InnoDB Come on , Every Inn ...
Use redis The cache can handle millions of database concurrency
Use redis The cache can handle millions of database concurrency Preface : Instructions in advance : In practical application, this kind of design needs to be designed by readers themselves , This article only provides one idea . preparation : Local number after installation redis The server , Use mysql database , Insert in advance 1 ...
HBase The design of secondary index ( Case on )
Abstract A recent project involves the combination of multiple conditions , Data storage uses HBase, just HBase The query for such a scenario is particularly suck , commonly HBase All queries are made through RowKey( The fields of multi condition combination query should be spliced in RowK ...
Phoenix Secondary indexes (Secondary Indexing) Use
Abstract HBase Only a dictionary based primary key index is provided , In the query, you can only query by row key or scan the whole table to obtain data , Use Phoenix Secondary index provided , It can avoid full table scanning when querying data , Improve checked performance , Improve query efficiency test ...

Random recommendation

from js towards Action The solution to the problem of garbled Chinese parameters
Action obtain jsp Chinese parameters in the form , As long as the whole project adopts UTF-8 Coding format will not appear garbled problem : but JSP It is used in JS, And from JS towards Action Transfer Chinese parameters , There will be chaos in Chinese On a project , Find out A ...
Caused by: java.lang.UnsatisfiedLinkError... Solving experiences
Caused by: java.lang.UnsatisfiedLinkError: Couldn't load BaiduMapVOS_v2_1_3: findLibrary returned nu ...
Use C Language describes static linked list and dynamic linked list
Static linked list and dynamic linked list are two different representations of linear list linked storage structure . The initial length of a static list is usually fixed , You don't need to move elements when doing insert and delete operations , Just change the pointer , So it still has the main advantage of chain storage structure . Dynamic linked list is relative to static chain ...
Bird brother linux File permissions and directory configuration of private cooking learning records
stay linux Through ls To view the files Such as ls -al, You can see something similar to the following Give an example to understand If there is only r No permission x Permission cannot enter the directory
Perl New notes
I met a in the project two days ago Perl Script program , You need to understand the program , Because I haven't used it before Perl Language , So it can't be done . Today, I took the time to read the foundation of the language , Basically, I can read it inside Perl Script program . It's really like many experiences shared online , ...
js Abstract classes and abstract methods
js Simulate abstract classes in : Invoke an undefined method in the parent class , This method must be implemented in subclasses . 1, Factory pattern of simulation class // Base class var Class = { // Static method of base class creat:function(){ // ...
How to use MFC Connect Access database
(1) Create a new one Access Database files . I'm going to call it data.mdb, And create a table . Field . (2) Add data sources to the system . open “ Control panel ”—>“ Management tools ”—>“ data source ”, choice “ System DSN”, Click... On the right ...
Dynamics 365 for CRM： modify ADFS The expiration time of ,TokenLifetime
adopt Microsoft PowerShell modify ADFS The expiration time of is extended CRM The expiration time of To change the timeout value, you will need to update t ...
Student information management system （ With XML For storage mode ）
In order to better apply XML, I wrote this little project . Here is the directory structure of my project Project ideas dao yes Date Access Object Data access layer , It's mainly responsible for operating data domain It's the physical layer , Be similar to bean layer , place ...
【 Fault announcement 】10:30-10:45 about docker swarm Cluster node problems cause failures
So sorry , today 10:30-10:45 Because of docker swarm There is a problem with the cluster node , Cause access exception to the site except blog , This brings you a lot of trouble , Please understand . The fault appears at the beginning, sometimes the access is normal, sometimes the access is out ...