It's here! New GaussDB(for Cassandra) features unveiled

2022-07-07 18:43:00 Huawei Cloud

Today, Huawei Cloud GaussDB(for Cassandra) arrives with a new solution built on the Lucene engine.

The Internet and big data are developing rapidly, and data volumes are growing explosively. Driven by business demands for high concurrency, high availability, and high scalability, NoSQL databases have become a hard requirement in more and more business scenarios. When it comes to querying, however, conventional NoSQL has real limitations: strictly speaking, open-source systems such as MongoDB, Cassandra, and HBase do not offer multi-dimensional queries over massive data, text retrieval, statistical analysis, and similar capabilities. Most enterprises are still looking for a more complete NoSQL solution.

GaussDB NoSQL, Huawei's cloud-native multi-model database, has a strong ecosystem and supports four engine interfaces: key-value, wide-column, document, and time-series. Among them, the wide-column engine interface, GaussDB(for Cassandra), has released a Lucene secondary index feature. It keeps the existing advantages of NoSQL while also supporting a variety of complex query scenarios, comprehensively improving the query experience on massive data. It is a real treat for users! You probably have plenty of questions. What is GaussDB(for Cassandra)? How is a secondary index used? What is different about the Lucene secondary index? Don't worry, let's go through them one by one.

What is GaussDB(for Cassandra)?

GaussDB(for Cassandra) is a distributed cloud database developed in-house by Huawei, with an architecture that separates compute from storage. On top of high performance, high availability, high reliability, high security, and elastic scaling, it provides service capabilities such as one-click deployment, backup and recovery, and monitoring and alerting. It is also highly compatible with the open-source Cassandra interface and delivers high read/write performance. It is already widely used in IoT, meteorology, the Internet, gaming, and many other fields.

What is a secondary index?

Let's first review the concept of an index. An index is a storage structure created to speed up data retrieval; it embodies the design idea of trading space for time. It works like a book's table of contents, which lets you quickly locate the content you need.

In Cassandra, the Primary Key is itself an index (also called the primary index): at query time, the corresponding record can be retrieved directly by Primary Key. A secondary index, also called an auxiliary index, helps locate the primary index entry, which is then used to find the corresponding record. A secondary index is usually created with the CREATE INDEX statement.

What are the pain points of Cassandra's current secondary index?

Native Cassandra implements a secondary index by creating an implicit table whose Primary Key is the indexed column and whose value is the corresponding Primary Key of the original table. The implementation is relatively simple, so it inevitably brings some constraints:

1. The first primary key can only be queried with "=";

2. The second primary key can use "=", ">", "<", ">=", and "<=";

3. Index columns only support "=" queries;

4. Columns that are deleted or updated too frequently are not suitable for indexing;

5. High-cardinality columns are not suitable for indexing.

Given these constraints, the query capability that Cassandra's secondary index can provide is very limited.
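The implicit-table idea described above can be pictured with a small Python sketch. It is a toy model only, not Cassandra's actual storage code: the table contents, the column names, and the helpers `build_hidden_index` and `query_equals` are all hypothetical. It shows why such an index answers "=" lookups naturally: the indexed value is the lookup key, and each hit still needs a second lookup by primary key.

```python
# Toy model of a native secondary index (illustrative assumption, not
# Cassandra internals): a hidden table keyed by the indexed column value,
# whose entries are the primary keys of the base table.

base_table = {
    # primary key -> row
    1: {"name": "alice", "city": "Hangzhou"},
    2: {"name": "bob", "city": "Shenzhen"},
    3: {"name": "carol", "city": "Hangzhou"},
}


def build_hidden_index(table, column):
    """Build the hidden table: indexed value -> list of primary keys."""
    index = {}
    for pk, row in table.items():
        index.setdefault(row[column], []).append(pk)
    return index


def query_equals(table, index, value):
    """Equality lookup: find primary keys via the index, then fetch rows."""
    return [table[pk] for pk in index.get(value, [])]


city_index = build_hidden_index(base_table, "city")
hangzhou_rows = query_equals(base_table, city_index, "Hangzhou")
```

Because the hidden table is keyed by exact value, a range or text condition on the indexed column has no direct entry point in this model, which matches the "=" only constraint listed above.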

Why Lucene

Lucene is currently one of the most popular open-source full-text search engine libraries. It has the following characteristics:

1. Stable, with high indexing performance;

2. Efficient, accurate, high-performance search algorithms;

3. Rich query types: phrase queries, wildcard queries, approximate queries, range queries, and more;

4. Strong open-source community support and good maintainability.

Integrating the Lucene engine is therefore the best way to make up for Cassandra's weak query capability. After all, who would turn down a search engine that is stable, keeps growing, and is still being actively updated?

The Lucene engine has powerful inverted-index and columnar-storage capabilities, which give GaussDB(for Cassandra) efficient multi-dimensional query, text retrieval, statistical analysis, and other capabilities. Using it feels similar to using a native secondary index, but with richer syntax support.
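For readers unfamiliar with the term, an inverted index maps each term to the set of documents that contain it. The following is a minimal Python sketch of that general technique under simplifying assumptions (whitespace tokenization, in-memory sets); it is not how Lucene or GaussDB(for Cassandra) is implemented, and the sample documents and function names are made up for illustration.

```python
from collections import defaultdict

# Sample documents (illustrative only): document id -> text.
documents = {
    1: "lucene builds an inverted index",
    2: "cassandra is a wide column store",
    3: "an index speeds up search in lucene",
}


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map every term to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return postings


def search_all(postings, *terms):
    """Return sorted ids of documents containing every given term."""
    sets = [postings.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


postings = build_inverted_index(documents)
result = search_all(postings, "lucene", "index")
```

A multi-term search becomes a set intersection over the posting lists, so the cost depends on the number of matching documents rather than on scanning every row.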

What changes for my queries after using the Lucene secondary index?

More flexible querying and filtering:

Queries can be issued without the PK or with only part of the PK, and index columns support operators such as ">", "<", and "in", so users are no longer limited to "=".

Strong text retrieval:

Text retrieval is what Lucene does best, and it is very convenient to use: just use the LIKE keyword.

You can do this:

SELECT * FROM example WHERE field LIKE 'test%';   // Prefix query

Or this:

SELECT * FROM example WHERE field LIKE 'start*end';   // Regular matching

Or this:

SELECT * FROM example WHERE field LIKE '%+lucene +index%';   // Full-text search: high performance, stable

Statistics over very large data volumes, beyond the trillion scale:

select count(*) from example where pk > 1 and expr(lucene_index, 'count');  

Multiple deletion methods:

Single-row deletion, partition deletion, and range deletion are all supported, covering all kinds of deletion scenarios.

DELETE FROM example WHERE pk1='a' AND field=1;   // single-row deletion
DELETE FROM example WHERE pk1='a' AND pk2=5000;   // partition deletion
DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c';   // range deletion

An extended JSON query interface that easily handles all kinds of complex query scenarios:

The extended JSON query interface provides rich query syntax and more varied usage. The keywords are listed below:

filter: the keyword for a JSON query in the query statement.

term: checks whether a document contains a specific value; the queried value is not tokenized.

match: tokenizes the queried value and performs a full-text search.

range: matches a field whose value lies in a specific range (range sub-keys: "eq", "gte", "gt", "lte", "lt").

bool: must be combined with "must", "should", or "must not" to build complex queries.

must: a sub-query type of bool; its value is a list that wraps "term", "match", and "range" queries.

should: a sub-query type of bool; its value is a list that wraps "term", "match", and "range" queries.

must not: a sub-query type of bool; its value is a list that wraps "term", "match", and "range" queries.

For example:

SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')

By combining and nesting conditions, you can build SQL statements tailored to your own business. Up to 200 levels of JSON nesting are supported, so even complex scenarios can be handled!
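Deeply nested JSON is easy to get wrong when typed by hand, so it may help to assemble it programmatically. The Python sketch below is one possible way to do that, using only the keywords listed earlier ("filter", "term", "range", "bool", "must", "should"). The helper names `term`, `range_`, `bool_`, and `expr_query` are hypothetical and are not part of any GaussDB client library; the sketch only builds a statement string and does not talk to a database.

```python
import json

# Range sub-keys as listed in the keyword table of this article.
RANGE_KEYS = {"eq", "gte", "gt", "lte", "lt"}


def term(field, value):
    """Exact-value condition; the value is not tokenized."""
    return {"term": {field: value}}


def range_(field, **bounds):
    """Range condition, e.g. range_("field2", gte=5, lte=15)."""
    unknown = set(bounds) - RANGE_KEYS
    if unknown:
        raise ValueError(f"unsupported range sub-keys: {sorted(unknown)}")
    return {"range": {field: bounds}}


def bool_(kind, *clauses):
    """Combine clauses under "must" or "should" (hypothetical helper)."""
    if kind not in ("must", "should"):
        raise ValueError(f"unsupported bool sub-query: {kind}")
    return {"bool": {kind: list(clauses)}}


def expr_query(table, index_column, condition):
    """Wrap a condition in "filter" and embed it in an EXPR(...) statement."""
    payload = json.dumps({"filter": condition})
    # Double any single quotes so the JSON stays inside one CQL string literal.
    payload = payload.replace("'", "''")
    return f"SELECT * FROM {table} WHERE EXPR({index_column}, '{payload}')"


condition = bool_("must", term("pk1", "a"), range_("field2", gte=5, lte=15))
statement = expr_query("example", "index_field", condition)
```

Building the condition from small functions keeps each level of nesting explicit and lets invalid sub-keys fail early, before the statement is ever sent.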

Huawei Cloud GaussDB(for Cassandra) comes with the Lucene engine. The Lucene secondary index pushes search capability down to the lower layer, fundamentally freeing the application layer from query work. Multi-dimensional query, text retrieval, statistical analysis, and other capabilities make up for NoSQL's weak query functionality and let enterprises calmly handle complex query scenarios over massive data. What are you waiting for? Come and try it!

Appendix

Author: the Huawei Cloud Cassandra team

Résumé submissions for Hangzhou, Xi'an, and Shenzhen: [email protected]

For more technical articles, follow the official Gauss Cassandra blog: https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986

Gauss Cassandra official home page: https://www.huaweicloud.com/product/gaussdbforcassandra.html
