当前位置：网站首页>Huawei cloud gaussdb (for redis) unveils issue 21: using Gauss redis to achieve secondary indexing

Huawei cloud gaussdb (for redis) unveils issue 21: using Gauss redis to achieve secondary indexing

2022-07-06 23:02:00 【Hua Weiyun】

One 、 background

Bring up the index , The first impression is the noun of database , however , gaussian Redis You can also implement secondary indexing ！！！ gaussian Redis The secondary index in generally uses zset To achieve . gaussian Redis Compared to open source Redis It has higher stability 、 And cost advantage , Use Gauss Redis zset Realize the secondary index of business , You can get a win-win situation of performance and cost .

The essence of index is to use ordered structure to speed up query , So by Zset Structural Gauss Redis It can easily realize the index of numeric type and character type .

• Numeric type index （zset Sort by score ）：

• Character type index （ When the scores are the same zset Sort by dictionary order ）：

Let's cut into two classic business scenarios , See how to use Gauss Redis To build a stable and reliable secondary index system .

Two 、 Scene one ： Complete the dictionary

When Press... In the browser When entering the query , Browsers usually recommend searches with the same prefix according to the possibility , This scenario It can be used gaussian Redis The secondary index function is realized .

2.1 Basic plan

The easiest way is to add each query of the user to the index . When need Conduct User input completion recommend when , Use ZRANGEBYLEX Execute range query that will do . If you don't want to return too many entries , gaussian Redis And support Use LIMIT Option to Reduce the number of results .

• Search the user banana Add to index ：

ZADD myindex 0 banana:1

• Suppose the user enters “bit”, And we want to offer the possibility to “bit” Search keywords at the beginning .

ZRANGEBYLEX myindex "[bit" "[bit\xff"

That is to use ZRANGEBYLEX Make a range query , The query range is the string that the user is now entering , And the same string plus a trailing byte 255（\xff）. In this way , We can get all strings prefixed with the string typed by the user .

2.2 And frequency Relevant dictionary completion

In practical applications, we usually want to follow the frequency Automatically Sort and complete the entries , Also can Clear out entries that are no longer popular , And automatically adapt to future input . We can still use Gauss Redis Of ZSet Structure to achieve this goal , Just in the index structure , Not only need Store search terms , also need Store the frequency associated with it .

• Search the user banana Add to index

• Judge banana Whether there is

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1

• hypothesis banana non-existent , add to banana:1, among 1 It's the frequency

ZADD myindex 0 banana:1

• hypothesis banana There is , You need to increase the frequency

if ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1 The frequency returned in is 1

1） Delete old entries ：

ZREM myindex 0 banana:1

2） Add one frequency to rejoin ：

ZADD myindex 0 banana:2

Please note that , Because there may be concurrent updates , Therefore, it should pass Lua The script sends the above three commands , use Lua script Automatically get the old count and add the entry again after increasing the score .

• Suppose the user enters “banana”, And we want to provide similar Search keywords for . adopt ZRANGEBYLEX The results are sorted by frequency .

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 101) "banana:123"2) "banaooo:1"3) "banned user:49"4) "banning:89"

• Use flow algorithm Clear uncommon input . Select an entry randomly from the returned entries , Score it reduce 1, Then add it again with the new score . however , If the new score is 0, We You need to delete this item from the list Objective .

• If the frequency of randomly selected items is 1, Such as banaooo:1

ZREM myindex 0 banaooo:1

• If the frequency of randomly selected items is greater than 1, Such as banana:123

ZREM myindex 0 banana:123ZADD myindex 0 banana:122

In the long run , The index will contain popular searches , If popular searches change over time , It will also automatically adapt .

3、 ... and 、 Scene two ： Multidimensional index

Except for queries on a single dimension , gaussian Redis It also supports retrieval in multidimensional data . for example , Retrieve all ages at 50 to 55 Between the ages of , meanwhile Salary in 70000 to 85000 Between people . The key to realize multi-dimensional secondary index is to convert two-dimensional data into One-dimensional data , Then based on Gauss Redis zset Storage .

from The visual perspective represents a two-dimensional index . The figure below There are some points in space , They represent our data samples , among x and y yes Two variables , Its The maximum values are 400. The blue box in the picture represents our query . We hope Inquire about x Be situated between 50 and 100 Between ,y Be situated between 100 and 300 All points between .

3.1 Data encoding

If the inserted data point is x = 75 and y = 200

1） fill 0（ Data maximum is 400, So fill 3 position ）

x = 075

y = 200

2） Interleave numbers , With x Represents the leftmost number , With y Represents the leftmost number , And so on , To create a code

027050

If you use 00 and 99 Replace the last two , namely 027000 to 027099,map return x and y, namely ：

x = 70-79

y = 200-209

therefore , in the light of x=70-79 and y = 200-209 Two dimensional query , It can be encoded map become 027000 to 027099 One dimensional query , This can be achieved by Gauss Redis Of Zset The structure is easy to realize .

Empathy , We can aim at the last four / 6、 ... and /etc Digit number The same operation , So as to obtain a larger range .

3） Use binary

To obtain finer granularity , Data can be represented in binary , So when replacing numbers , Each time, you will get twice the original search scope . Suppose we only need 9 position （ To indicate at most 400 A number of values ）, Our number in binary form will be ：

x = 75 -> 001001011

y = 200 -> 011001000

After interweaving ,000111000011001010

Let's look at the use of 0s ad 1s Replace the last 2、4、6、8,... What is our range of bits ：

3.2 Add a new element

If the inserted data point is x = 75 and y = 200

x = 75 and y = 200 Binary interleaved code is 000111000011001010,

ZADD myindex 0 000111000011001010

3.3 Inquire about

Inquire about ：x Be situated between 50 and 100 Between ,y Be situated between 100 and 300 All points between

Replace from index N Bit will give us a side length of 2^(N/2) Search box . therefore , What we need to do is to check the smaller size of the search box , And check the closest 2 The power of , And constantly divide the remaining space , Then use ZRANGEBYLEX To search .

Here is Sample code ：

def spacequery(x0,y0,x1,y1,exp)    bits=exp*2    x_start = x0/(2**exp)    x_end = x1/(2**exp)    y_start = y0/(2**exp)    y_end = y1/(2**exp)    (x_start..x_end).each{|x|        (y_start..y_end).each{|y|            x_range_start = x*(2**exp)            x_range_end = x_range_start | ((2**exp)-1)            y_range_start = y*(2**exp)            y_range_end = y_range_start | ((2**exp)-1)            puts "#{x},#{y} x from #{x_range_start} to #{x_range_end}, y from #{y_range_start} to #{y_range_end}"            # Turn it into interleaved form for ZRANGEBYLEX query.            # We assume we need 9 bits for each integer, so the final            # interleaved representation will be 18 bits.            xbin = x_range_start.to_s(2).rjust(9,'0')            ybin = y_range_start.to_s(2).rjust(9,'0')            s = xbin.split("").zip(ybin.split("")).flatten.compact.join("")            # Now that we have the start of the range, calculate the end            # by replacing the specified number of bits from 0 to 1.            e = s[0..-(bits+1)]+("1"*bits)            puts "ZRANGEBYLEX myindex [#{s} [#{e}"        }    }endspacequery(50,100,100,300,6)

Four 、 summary

This paper introduces how to pass Gauss Redis Build a secondary index , The secondary index is in e-commerce 、 chart （hexastore）、 Games and other fields have a wide range of application scenarios , gaussian redis There are many similar applications in Xianwang . gaussian Redis Based on the architecture of separation of storage and calculation , Rely on distributed storage pool to ensure strong data consistency , It can easily support the secondary index function , Provide stable and reliable services for enterprise customers 、 Super high concurrency , Core data storage services that can be rapidly and elastically expanded .