当前位置:网站首页>coming! Gaussdb (for Cassandra) new features appear
coming! Gaussdb (for Cassandra) new features appear
2022-07-07 18:43:00 【Hua Weiyun】
today , Hua Wei Yun GaussDB(for Cassandra) carry Lucene Engine new solution Come on. !
At present , Internet 、 Big data is developing rapidly , The amount of data is growing explosively , In high concurrency 、 High availability 、 Driven by the high expansion of business demand ,NoSQL Database has become the rigid demand of more and more business scenarios . But in terms of query , Conventional NoSQL But it has certain limitations , Strictly speaking , Like open source MongoDB、Cassandra、Hbase Etc. do not have multi-dimensional query of massive data 、 Text retrieval 、 Statistical analysis, etc . Most enterprises are still looking for a more perfect NoSQL Solution .
Huawei cloud native multimode database GaussDB NoSQL Have a strong ecosystem , Support key value 、 A wide watch 、 file 、 Timing four engine interfaces . among , Wide table engine interface GaussDB(for Cassandra) Has been released Lucene Secondary index function , Existing NoSQL The advantages of , It can also support a variety of complex query scenarios , Comprehensively improve users' query experience in massive data scenarios , Spoil powder with strength ! I believe you must have many questions ,GaussDB(for Cassandra) What is it? ? How to use secondary index ?Lucene What are the differences between secondary indexes ? take it easy , Next, let's interpret them one by one .

What is? GaussDB(for Cassandra)?
GaussDB(for Cassandra) It is a Huawei self-developed 、 Distributed cloud database with computing storage separation architecture , In high performance 、 High availability 、 Highly reliable 、 High security 、 On the basis of elastic expansion and contraction , Provides one click deployment 、 Backup recovery 、 Monitoring alarm and other service capabilities ; And highly compatible with open source Cassandra Interface , Provide high read / write performance . At present, it has been widely used in IoT、 meteorological 、 Internet 、 Games and many other fields .
What is a secondary index ?
Let's first understand the concept of index . Index is a storage structure created to speed up data retrieval , It is a design idea of exchanging space for time . The function can be understood as the catalogue of books , Through the directory, you can quickly locate the required content .
stay Cassandra in ,Primary Key It's index. ( Also known as primary index ), At query time , according to Primary Key You can directly retrieve the corresponding records . And secondary index is also called auxiliary index , To help locate the primary index , Then find the corresponding record according to the primary index . We usually use CREATE INDEX The statement establishes a secondary index .
At present Cassandra What are the pain points of the secondary index ?
Native Cassandra The implementation of secondary index in actually creates an implicit table , Of this table Primary Key Is the column that creates the index , The value is the corresponding Primary Key, Implementation is relatively simple , Therefore, it is inevitable to bring some constraints :
1. The first primary key can only be used “=” Inquire about ;
2. The second primary key can use “=、>、<、>=、<=”;
3. Index columns only support “=” Inquire about ;
4. Delete 、 Columns that are updated too frequently are not suitable for indexing ;
5.High-cardinality Columns are not suitable for indexing ;
Based on the above constraints ,Cassandra The query function that secondary index can provide is very limited .
Why Lucene?
Lucene It is currently the most popular open source full-text search engine tool , It has the following characteristics :
1. Stable 、 High indexing performance ;
2. It's efficient 、 accuracy 、 High performance search algorithm ;
3. Rich query types : Support phrase query 、 Wildcard query 、 Approximate query 、 Range query, etc ;
4. There is strong open source community support , Good maintainability ;
therefore , Use integration Lucene Engine to supplement Cassandra The weakness of query ability is the best choice , After all, who would refuse a stable performance 、 Continued growth 、 And update the iterative search engine ?
Lucene The engine has powerful inverted index and columnar storage capacity , Given GaussDB(for Cassandra) Efficient multidimensional query 、 Text retrieval 、 Statistical analysis, etc , It is similar to the native secondary index in use experience , But at the same time, it has richer syntax support .
Use Lucene After secondary index , What changes have taken place in my query ?
More flexible query 、 Filtering method :
All queries can be made without PK Or take part PK, And the index column supports “>、<、in” Wait for the operator , Users no longer need to be limited to using “=”.
Strong text retrieval ability :
Text retrieval ability is Lucene What I'm good at , It's very convenient to use , Just pass the keyword like That is to say .
You can do this :
SELECT * FROM example WHERE field LIKE 'test%'; // Prefix query You can do that :
SELECT * FROM example WHERE field LIKE 'start*end'; // Regular matching It can be like this :
SELECT * FROM example WHERE field LIKE '%+lucene +index%'; // Full text search , High performance , Stable Support the statistics of large amount of data exceeding trillion specifications :
select count(*) from example where pk > 1 and expr(lucene_index, 'count'); Multiple deletion methods :
Support single Single row deletion 、partition Partition deletion 、range Scope delete , Cover all kinds of deletion scenes .
DELETE FROM example WHERE pk1='a' AND field=1; // single Single row deletion DELETE FROM example WHERE pk1='a' AND pk2=5000; // partition Partition deletion DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c'; // range Scope delete Support extended json Query interface , Easily deal with various complex query scenarios :
Extended json Query interface provides rich query syntax , More diverse usage . The following is a list of keywords :
filter | In the query statement json Search keywords |
term | When querying, judge a document Whether to include a specific value , Word segmentation query will not be performed on the queried value |
match | Segment the queried value , Full text search |
range | Query specifies that a field is in a specific range ( Range query subkey :"eq"/"gte"/"gt"/"lte"/"lt") |
bool | It has to be with "must"、"should"、"must not" Combine complex queries together |
must | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
should | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
must not | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
Take a chestnut :
SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')Add nesting through condition combination , You can DIY In line with their own business sql sentence , And the highest support 200 layer json nesting , Even complex scenes can be handled !
Hua Wei Yun GaussDB(for Cassandra) carrying Lucene engine , adopt Lucene The secondary index sinks the search ability to the bottom , Fundamentally liberated the application layer query , Multi dimensional query 、 Text retrieval 、 Statistical analysis and other abilities , It can perfectly make up for NoSQL Weak query function short board , Let enterprises calmly deal with the complex query scenario of massive data . What are we waiting for? , Come and experience it !
appendix
The author of this article : Huawei cloud Cassandra The team
Hangzhou, Xi'an, Shenzhen resume delivery :[email protected]
More technical articles , Please pay attention to Gauss Cassandra The official blog :https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986
gaussian Cassandra Official home page :https://www.huaweicloud.com/product/gaussdbforcassandra.html
边栏推荐
- gsap动画库
- PIP related commands
- 2022年理财有哪些产品?哪些适合新手?
- Do you really understand sticky bag and half bag? 3 minutes to understand it
- 通过 Play Integrity API 的 nonce 字段提高应用安全性
- [C language] string function
- 保证接口数据安全的10种方案
- Wireshark analyzes packet capture data * cap
- 静态路由配置
- Ten thousand words nanny level long article -- offline installation guide for datahub of LinkedIn metadata management platform
猜你喜欢

【Unity Shader】插入Pass实现模型遮挡X光透视效果

Learn to make dynamic line chart in 3 minutes!

gsap动画库

More than 10000 units were offline within ten days of listing, and the strength of Auchan Z6 products was highly praised

嵌入式C语言程序调试和宏使用的技巧

Five network IO models

Performance test process and plan
![[PaddleSeg源码阅读] PaddleSeg Validation 中添加 Boundary IoU的计算(1)——val.py文件细节提示](/img/f2/b6a0e5512b35cf1b695a8feecd0895.png)
[PaddleSeg源码阅读] PaddleSeg Validation 中添加 Boundary IoU的计算(1)——val.py文件细节提示

Download, installation and development environment construction of "harmonyos" deveco

Will low code help enterprises' digital transformation make programmers unemployed?
随机推荐
Kirk Borne的本周学习资源精选【点击标题直接下载】
A few simple steps to teach you how to see the K-line diagram
What is the general yield of financial products in 2022?
[trusted computing] Lesson 12: TPM authorization and conversation
Tips for this week 131: special member functions and ` = Default`
Is it safe to open an online futures account now? How many regular futures companies are there in China?
PHP面试题 foreach($arr as &$value)与foreach($arr as $value)的用法
Nunjuks template engine
Discuss | what preparations should be made before ar application is launched?
[principle and technology of network attack and Defense] Chapter 1: Introduction
元宇宙带来的创意性改变
直播预约通道开启!解锁音视频应用快速上线的秘诀
Cloud security daily 220707: Cisco Expressway series and telepresence video communication server have found remote attack vulnerabilities and need to be upgraded as soon as possible
[principles and technologies of network attack and Defense] Chapter 3: network reconnaissance technology
golang 客户端服务端登录
[principle and technology of network attack and Defense] Chapter 7: password attack technology Chapter 8: network monitoring technology
Personal best practice demo sharing of enum + validation
Chapter 2 build CRM project development environment (database design)
Performance test process and plan
Thread pool and singleton mode and file operation