It's here! New GaussDB(for Cassandra) features make their debut
2022-07-07 18:43:00 【Huawei Cloud】
Today, Huawei Cloud GaussDB(for Cassandra) arrives with a new solution powered by the Lucene engine!
Today the internet and big data are developing rapidly, and data volumes are growing explosively. Driven by business demands for high concurrency, high availability, and high scalability, NoSQL databases have become a hard requirement in more and more business scenarios. When it comes to querying, however, conventional NoSQL has clear limitations: strictly speaking, open-source systems such as MongoDB, Cassandra, and HBase do not offer multi-dimensional queries, text retrieval, or statistical analysis over massive data. Most enterprises are still looking for a more complete NoSQL solution.
GaussDB NoSQL, Huawei's cloud-native multi-model database, has a strong ecosystem and supports four engine interfaces: key-value, wide table, document, and time series. Among them, the wide-table engine interface, GaussDB(for Cassandra), has released a Lucene secondary index feature. It keeps the existing advantages of NoSQL while also supporting a variety of complex query scenarios, comprehensively improving the query experience in massive-data scenarios. It is a real treat for our users! You probably have plenty of questions: What is GaussDB(for Cassandra)? How are secondary indexes used? What makes Lucene secondary indexes different? Don't worry, we will go through them one by one.
What is GaussDB(for Cassandra)?
GaussDB(for Cassandra) is a distributed cloud database developed in-house by Huawei, with an architecture that separates compute from storage. On top of high performance, high availability, high reliability, high security, and elastic scaling, it provides service capabilities such as one-click deployment, backup and recovery, and monitoring and alerting. It is also highly compatible with the open-source Cassandra interface and delivers high read/write performance. It is now widely used in IoT, meteorology, the internet industry, gaming, and many other fields.
What is a secondary index ?
Let's start with the concept of an index. An index is a storage structure created to speed up data retrieval; the design idea is to trade space for time. It works like the table of contents of a book: through the table of contents you can quickly locate the content you need.
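The trade of space for time can be sketched in a few lines of Python. This is an illustrative toy, not how any database actually stores its indexes; the `rows` data and the `scan` and `build_index` helpers are made up for the example.

```python
# Toy illustration: an index spends extra memory to avoid scanning every row.
rows = [
    {"id": 1, "city": "Hangzhou"},
    {"id": 2, "city": "Xi'an"},
    {"id": 3, "city": "Shenzhen"},
    {"id": 4, "city": "Hangzhou"},
]


def scan(rows, column, value):
    """Without an index: examine every row and keep the matching ids."""
    return [row["id"] for row in rows if row[column] == value]


def build_index(rows, column):
    """With an index: an extra dict mapping each column value to row ids."""
    index = {}
    for row in rows:
        index.setdefault(row[column], []).append(row["id"])
    return index


city_index = build_index(rows, "city")

print(scan(rows, "city", "Hangzhou"))    # full scan over all rows
print(city_index.get("Hangzhou", []))    # direct lookup, same result
```

Both calls return the same ids; the difference is that the lookup touches one dictionary entry, while the scan cost grows with the number of rows.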
In Cassandra, the Primary Key is itself an index (also called the primary index): a query by Primary Key retrieves the corresponding record directly. A secondary index, also called an auxiliary index, helps locate the primary index entry, which is then used to find the corresponding record. A secondary index is usually created with the CREATE INDEX statement.
What are the pain points of Cassandra's current secondary indexes?
In native Cassandra, a secondary index is actually implemented by creating an implicit table. The Primary Key of this table is the indexed column, and its value is the Primary Key of the original table. The implementation is relatively simple, so it inevitably brings some constraints:
1. The first primary key column (the partition key) supports only "=" queries;
2. The second primary key column can use "=, >, <, >=, <=";
3. Indexed columns support only "=" queries;
4. Columns that are deleted or updated too frequently are not suitable for indexing;
5. High-cardinality columns are not suitable for indexing.
Given these constraints, the query capability that a Cassandra secondary index can provide is very limited.
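Several of these constraints follow from the hidden-table design, which can be modeled roughly as below. This is a simplified sketch under stated assumptions, not Cassandra's actual storage code; the sample data, the `email` column, and the helper names are hypothetical.

```python
# Simplified model of a native secondary index as a hidden lookup table.
# Base table: primary key -> row.
base_table = {
    "u1": {"name": "Ann", "email": "ann@example.com"},
    "u2": {"name": "Bob", "email": "bob@example.com"},
    "u3": {"name": "Cat", "email": "ann@example.com"},
}

# Hidden index table: indexed column value -> primary keys of matching rows.
email_index = {}
for pk, row in base_table.items():
    email_index.setdefault(row["email"], set()).add(pk)


def query_by_email(value):
    """Equality query: one index lookup, then one base lookup per primary key."""
    return sorted(base_table[pk]["name"] for pk in email_index.get(value, ()))


def update_email(pk, new_value):
    """Each update rewrites the index too, so hot columns are costly to index."""
    old_value = base_table[pk]["email"]
    email_index[old_value].discard(pk)
    if not email_index[old_value]:
        del email_index[old_value]
    base_table[pk]["email"] = new_value
    email_index.setdefault(new_value, set()).add(pk)


print(query_by_email("ann@example.com"))
update_email("u3", "cat@example.com")
print(query_by_email("ann@example.com"))
```

Because the hidden table is keyed by the exact column value, only "=" maps to a direct lookup; in this model a range or prefix condition would have to visit every key in `email_index`.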
Why Lucene?
Lucene is one of the most popular open-source full-text search engine toolkits today. It has the following characteristics:
1. Stable, with high indexing performance;
2. Efficient, accurate, high-performance search algorithms;
3. Rich query types: phrase queries, wildcard queries, approximate queries, range queries, and more;
4. Strong open-source community support and good maintainability.
Integrating the Lucene engine is therefore a strong choice for shoring up Cassandra's weak query capability. After all, who would turn down a search engine with stable performance, continued growth, and steady iterative updates?
The Lucene engine has powerful inverted-index and columnar-storage capabilities. It gives GaussDB(for Cassandra) efficient multi-dimensional queries, text retrieval, statistical analysis, and more. The usage experience is similar to that of the native secondary index, but with much richer syntax support.
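To make the term "inverted index" concrete, here is a minimal Python sketch of the idea: each term maps to the set of documents that contain it. It only illustrates the data structure and is not Lucene's implementation; the sample documents and the `tokenize` and `search_all` helpers are assumptions made for the example.

```python
import re
from collections import defaultdict

documents = {
    1: "Lucene builds an inverted index",
    2: "Cassandra stores wide table data",
    3: "An inverted index speeds up text search",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: term -> ids of the documents containing that term.
inverted = defaultdict(set)
for doc_id, text in documents.items():
    for term in tokenize(text):
        inverted[term].add(doc_id)


def search_all(*terms):
    """Return ids of documents that contain every given term (an AND query)."""
    postings = [inverted.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


print(search_all("inverted", "index"))
print(search_all("lucene", "index"))
```

A text query then becomes a set operation over a few posting sets instead of a scan over every stored value, which is what makes keyword search on large tables practical.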
After adopting the Lucene secondary index, what changes for my queries?
More flexible querying and filtering:
Any query can omit the PK or supply only part of it, and indexed columns support operators such as ">", "<", and "in", so users are no longer limited to "=".
Strong text retrieval:
Text retrieval is what Lucene does best, and it is very convenient to use: just use the LIKE keyword.
You can do this:
SELECT * FROM example WHERE field LIKE 'test%'; // Prefix query
Or this:
SELECT * FROM example WHERE field LIKE 'start*end'; // Pattern matching
Or this:
SELECT * FROM example WHERE field LIKE '%+lucene +index%'; // Full-text search: high performance, stable
Count statistics over very large data volumes, beyond the trillion scale:
select count(*) from example where pk > 1 and expr(lucene_index, 'count');
Multiple deletion methods:
Single-row deletion, partition deletion, and range deletion are supported, covering all kinds of deletion scenarios.
DELETE FROM example WHERE pk1='a' AND field=1; // Single-row deletion
DELETE FROM example WHERE pk1='a' AND pk2=5000; // Partition deletion
DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c'; // Range deletion
Extended JSON query interface for handling complex query scenarios with ease:
The extended JSON query interface provides rich query syntax and more varied usage. The keywords are listed below:
Keyword | Description |
filter | Keyword that introduces the JSON query in a query statement |
term | Checks whether a document contains a specific value; the queried value is not tokenized |
match | Tokenizes the queried value and performs a full-text search |
range | Queries for a field within a specific range (range sub-keys: "eq"/"gte"/"gt"/"lte"/"lt") |
bool | Must be combined with "must", "should", or "must not" to build complex queries |
must | Subquery of bool; its type is list; wraps "term", "match", and "range" queries |
should | Subquery of bool; its type is list; wraps "term", "match", and "range" queries |
must not | Subquery of bool; its type is list; wraps "term", "match", and "range" queries |
For example:
SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')
By combining conditions and adding nesting, you can build SQL statements tailored to your own business. Up to 200 levels of JSON nesting are supported, so even complex scenarios can be handled!
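Hand-writing deeply nested JSON is error-prone, so one option is to assemble the filter programmatically and serialize it. The Python sketch below is only an illustration: the helper names (`term`, `range_`, `bool_`, `build_expr`) are hypothetical, and the way nesting depth is counted here (every dict and list adds one level) is an assumption, since the article does not say how the 200-level limit is measured.

```python
import json

MAX_DEPTH = 200  # nesting limit stated in the article


def term(field, value):
    """Exact-value condition on one field."""
    return {"term": {field: value}}


def range_(field, **bounds):
    """Range condition; bounds use the sub-keys gte/gt/lte/lt/eq."""
    return {"range": {field: bounds}}


def bool_(**clauses):
    """Combine sub-queries, for example bool_(must=[...], should=[...])."""
    return {"bool": clauses}


def depth(node):
    """Nesting depth, counting each dict and list as one level (assumption)."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


def build_expr(index_column, query):
    """Wrap the query in "filter" and render it as an EXPR(...) condition."""
    body = {"filter": query}
    if depth(body) > MAX_DEPTH:
        raise ValueError("query exceeds the supported nesting depth")
    return "EXPR({}, '{}')".format(index_column, json.dumps(body))


query = bool_(must=[term("pk1", "a"), range_("field2", gte=5, lte=15)])
cql = "SELECT * FROM example WHERE " + build_expr("index_field", query) + ";"
print(cql)
```

The sketch does not escape single quotes inside string values; real code would need to handle that, or pass the JSON as a bound parameter if the driver in use allows it.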
Huawei Cloud GaussDB(for Cassandra), equipped with the Lucene engine, pushes search capability down to the underlying layer through the Lucene secondary index, fundamentally freeing the application layer from implementing queries itself. Its multi-dimensional query, text retrieval, and statistical analysis capabilities make up for NoSQL's weak query functionality and let enterprises handle complex queries over massive data with confidence. What are you waiting for? Come and try it!
Appendix
Author: the Huawei Cloud Cassandra team
Resume submissions for Hangzhou, Xi'an, and Shenzhen: [email protected]
For more technical articles, follow the official Gauss Cassandra blog: https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986
Official Gauss Cassandra home page: https://www.huaweicloud.com/product/gaussdbforcassandra.html