
Here they come! New GaussDB(for Cassandra) features unveiled

2022-07-07 18:43:00 【Huawei Cloud】

Today, Huawei Cloud GaussDB(for Cassandra) arrives with a new solution built on the Lucene engine!

Today, the Internet and big data are developing rapidly, and data volumes are growing explosively. Driven by business demands for high concurrency, high availability, and high scalability, NoSQL databases have become a hard requirement in more and more business scenarios. When it comes to queries, however, traditional NoSQL has clear limitations: strictly speaking, open-source systems such as MongoDB, Cassandra, and HBase do not offer multi-dimensional queries, text retrieval, or statistical analysis over massive data. Most enterprises are still looking for a more complete NoSQL solution.

GaussDB NoSQL, Huawei's cloud-native multi-model database, has a strong ecosystem and supports four engine interfaces: key-value, wide-column, document, and time series. Among them, the wide-column engine interface, GaussDB(for Cassandra), has released a Lucene secondary index feature. It keeps the existing advantages of NoSQL while also supporting a variety of complex query scenarios, comprehensively improving the query experience on massive data. A genuine treat for our users! You probably have plenty of questions: What is GaussDB(for Cassandra)? How are secondary indexes used? What is different about Lucene secondary indexes? Don't worry, we will go through them one by one.

What is GaussDB(for Cassandra)?

GaussDB(for Cassandra) is a distributed cloud database developed in-house by Huawei, built on an architecture that separates compute from storage. On top of high performance, high availability, high reliability, high security, and elastic scaling, it provides service capabilities such as one-click deployment, backup and restore, and monitoring and alerting. It is highly compatible with the open-source Cassandra interface and delivers high read/write performance. It is already widely used in IoT, meteorology, the Internet, gaming, and many other fields.

What is a secondary index?

Let's start with the concept of an index. An index is a storage structure created to speed up data retrieval; it is a design that trades space for time. It works like the table of contents of a book: with the table of contents you can quickly locate the content you need.

In Cassandra, the Primary Key is itself an index (also called the primary index): at query time, the corresponding record can be retrieved directly by Primary Key. A secondary index, also called an auxiliary index, helps locate the primary index entry, and the record is then found through the primary index. A secondary index is usually created with the CREATE INDEX statement.

What are the pain points of Cassandra's current secondary indexes?

In native Cassandra, a secondary index is actually implemented by creating an implicit table. The Primary Key of this table is the indexed column, and its value is the corresponding Primary Key of the original table. The implementation is relatively simple, so it inevitably brings some constraints:

1. The first primary key column can only be queried with "=";

2. The second primary key column can use "=, >, <, >=, <=";

3. Indexed columns only support "=" queries;

4. Columns that are deleted or updated too frequently are not suitable for indexing;

5. High-cardinality columns are not suitable for indexing;

Given these constraints, the query capability that Cassandra secondary indexes can provide is very limited.
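The implicit-table idea described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Cassandra's actual implementation: the class name Table, its methods, and the sample rows are all hypothetical. It shows why equality lookups are cheap and why frequent updates are costly: every update has to move an entry in the hidden index.

```python
from collections import defaultdict


class Table:
    """Toy table: rows keyed by primary key, plus one hidden index 'table'."""

    def __init__(self, indexed_column):
        self.rows = {}                          # primary index: pk -> row
        self.indexed_column = indexed_column
        self.hidden_index = defaultdict(set)    # indexed value -> set of pks

    def insert(self, pk, row):
        old = self.rows.get(pk)
        if old is not None:
            # An update must first remove the stale entry from the hidden index.
            self.hidden_index[old[self.indexed_column]].discard(pk)
        self.rows[pk] = row
        self.hidden_index[row[self.indexed_column]].add(pk)

    def find_by_index(self, value):
        # Only "=" is cheap: the hidden index is keyed by the exact value.
        pks = self.hidden_index.get(value, set())
        return [self.rows[pk] for pk in sorted(pks)]


t = Table(indexed_column="city")
t.insert(1, {"name": "ann", "city": "Hangzhou"})
t.insert(2, {"name": "bob", "city": "Shenzhen"})
t.insert(3, {"name": "cat", "city": "Hangzhou"})
t.insert(2, {"name": "bob", "city": "Hangzhou"})   # update moves the index entry
hangzhou = [r["name"] for r in t.find_by_index("Hangzhou")]
shenzhen = t.find_by_index("Shenzhen")
```

A range predicate such as city > 'H' would have to scan every key of hidden_index in this model, which mirrors the "=" restriction in the list above.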

Why Lucene?

Lucene is currently the most popular open-source full-text search engine toolkit. It has the following characteristics:

1. Stable, with high indexing performance;

2. Efficient, accurate, high-performance search algorithms;

3. Rich query types: phrase queries, wildcard queries, approximate queries, range queries, and more;

4. Strong open-source community support and good maintainability;

Integrating the Lucene engine is therefore the best choice for making up for Cassandra's weak query capability. After all, who would turn down a search engine that is stable, keeps growing, and is continuously updated?

The Lucene engine has powerful inverted-index and columnar storage capabilities, which give GaussDB(for Cassandra) efficient multi-dimensional queries, text retrieval, statistical analysis, and more. It feels similar to the native secondary index in use, but offers richer syntax support.
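To make the term "inverted index" concrete, here is a minimal sketch in Python. It is not Lucene's implementation, and the function names (tokenize, build_inverted_index, search_all) and sample documents are hypothetical. It only shows the core idea: map each term to the set of documents that contain it, then intersect those sets to answer a multi-term query.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Lucene builds an inverted index",
    2: "Cassandra secondary index",
    3: "Lucene full-text search",
}
index = build_inverted_index(docs)
both = search_all(index, "lucene index")   # only document 1 contains both terms
```

Because a lookup touches only the posting sets of the queried terms, the cost depends on how many documents match rather than on the total number of documents.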

After switching to Lucene secondary indexes, what changes in my queries?

More flexible querying and filtering:

All queries can be issued without the PK or with only part of the PK, and indexed columns support operators such as ">, <, in", so users are no longer limited to "=".

Powerful text retrieval:

Text retrieval is what Lucene is good at, and it is very convenient to use: the LIKE keyword is all you need.

You can do this ：

SELECT * FROM example WHERE field LIKE 'test%';   //  Prefix query

You can do that ：

SELECT * FROM example WHERE field LIKE 'start*end';   //  Regular matching

Or like this:

SELECT * FROM example WHERE field LIKE '%+lucene +index%';   // Full-text search: high performance and stable

Supports statistics over large data volumes beyond the trillion scale:

select count(*) from example where pk > 1 and expr(lucene_index, 'count');

Multiple deletion methods:

Single-row deletion, partition deletion, and range deletion are all supported, covering all kinds of deletion scenarios.

DELETE FROM example WHERE pk1='a' AND field=1;   // single-row deletion

DELETE FROM example WHERE pk1='a' AND pk2=5000;   // partition deletion

DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c';   // range deletion

Supports an extended JSON query interface that easily handles all kinds of complex query scenarios:

The extended JSON query interface provides rich query syntax and more varied usage. The keywords are listed below:

filter	Keyword that introduces a JSON search in the query statement
term	Checks whether a document contains a specific value; the queried value is not tokenized
match	Tokenizes the queried value and performs a full-text search
range	Matches a field whose value lies in a given range (range sub-keys: "eq"/"gte"/"gt"/"lte"/"lt")
bool	Must be combined with "must", "should", or "must not" to build complex queries
must	Subquery of bool; its type is list, wrapping "term", "match", and "range" queries
should	Subquery of bool; its type is list, wrapping "term", "match", and "range" queries
must not	Subquery of bool; its type is list, wrapping "term", "match", and "range" queries
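One way to read how these keywords combine is the toy evaluator below, written in Python. The semantics here are assumptions, not taken from the GaussDB documentation: "must" is treated as all-of, "should" as any-of, "must not" as none-of, and "match" as a simple whitespace tokenization. The helper names matches and in_range and the sample row are hypothetical.

```python
import operator

# Range sub-keys from the keyword list, mapped to Python comparisons.
OPS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """True if value satisfies every bound, e.g. {"gte": 5, "lte": 15}."""
    if value is None:
        return False
    return all(OPS[name](value, bound) for name, bound in bounds.items())


def matches(query, row):
    """Evaluate a single-keyword query dict against one row (assumed semantics)."""
    (kind, body), = query.items()
    if kind == "term":
        # Exact comparison, no tokenization.
        return all(row.get(f) == v for f, v in body.items())
    if kind == "match":
        # Tokenize both sides and require every queried token to be present.
        return all(
            set(str(v).lower().split()) <= set(str(row.get(f, "")).lower().split())
            for f, v in body.items()
        )
    if kind == "range":
        return all(in_range(row.get(f), bounds) for f, bounds in body.items())
    if kind == "bool":
        must = body.get("must", [])
        should = body.get("should", [])
        must_not = body.get("must not", [])
        return (
            all(matches(q, row) for q in must)
            and (not should or any(matches(q, row) for q in should))
            and not any(matches(q, row) for q in must_not)
        )
    raise ValueError(f"unknown keyword: {kind}")


row = {"pk1": "a", "field2": 10, "text": "lucene secondary index"}
query = {
    "bool": {
        "must": [
            {"term": {"pk1": "a"}},
            {"range": {"field2": {"gte": 5, "lte": 15}}},
        ],
        "must not": [{"match": {"text": "cassandra"}}],
    }
}
hit = matches(query, row)
```

In a real statement the whole structure sits under the top-level "filter" keyword; this sketch evaluates only the part inside it.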

For example:

SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')

By combining conditions and nesting them, you can build SQL statements tailored to your own business. Up to 200 levels of JSON nesting are supported, so even complex scenarios can be handled!

Huawei Cloud GaussDB(for Cassandra) comes with the Lucene engine. Through Lucene secondary indexes it pushes search capability down to the underlying layer, fundamentally freeing the application layer from query work. Capabilities such as multi-dimensional queries, text retrieval, and statistical analysis make up for the weak query functionality of NoSQL and let enterprises handle complex queries over massive data with ease. What are you waiting for? Come and try it!
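Writing deeply nested JSON by hand is error-prone, so a filter like the one above is easier to assemble from ordinary data structures. The Python sketch below is one possible approach, with assumptions stated plainly: build_expr_query and depth are hypothetical helpers, the table and column names are copied from the article's example, and the way depth is counted (every object and every list adds one level) is a guess at how the 200-level limit is measured.

```python
import json

MAX_DEPTH = 200  # nesting limit quoted in the article; the counting rule is assumed


def depth(node):
    """Count nesting levels, treating each dict and each list as one level."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


def build_expr_query(table, index_column, filter_body):
    """Serialize the filter and embed it in an EXPR(...) predicate.

    table and index_column are inserted as-is, so they must come from
    trusted code, never from user input.
    """
    payload = {"filter": filter_body}
    if depth(payload) > MAX_DEPTH:
        raise ValueError("filter is nested too deeply")
    # Double any single quote so the JSON fits inside a CQL string literal.
    text = json.dumps(payload).replace("'", "''")
    return f"SELECT * FROM {table} WHERE EXPR({index_column}, '{text}')"


filter_body = {
    "bool": {
        "must": [
            {"term": {"pk1": "a"}},
            {"range": {"field2": {"gte": 5, "lte": 15}}},
        ]
    }
}
cql = build_expr_query("example", "index_field", filter_body)
```

Building the filter as a dict also means a Python dict cannot hold the same key twice, so conditions on the same field have to be written as separate entries in a must or should list.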

Appendix

Author: Huawei Cloud Cassandra team

Send resumes for positions in Hangzhou, Xi'an, and Shenzhen to: [email protected]

For more technical articles, follow the official Gauss Cassandra blog: https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986

Gauss Cassandra official home page: https://www.huaweicloud.com/product/gaussdbforcassandra.html
