It's Here! New Features in GaussDB(for Cassandra)
2022-07-07 18:43:00 [Huawei Cloud]
Today, Huawei Cloud GaussDB(for Cassandra) arrives with a new solution built on the Lucene engine!
The Internet and big data are developing rapidly, and data volumes are growing explosively. Driven by business demands for high concurrency, high availability, and high scalability, NoSQL databases have become a hard requirement in more and more scenarios. When it comes to querying, however, conventional NoSQL databases have real limitations. Strictly speaking, open-source databases such as MongoDB, Cassandra, and HBase do not fully support multi-dimensional queries over massive data, text retrieval, or statistical analysis, and most enterprises are still looking for a more complete NoSQL solution.
Huawei Cloud's cloud-native multi-model database, GaussDB NoSQL, has a strong ecosystem and supports four engine interfaces: key-value, wide-column, document, and time series. Its wide-column engine interface, GaussDB(for Cassandra), has now released a Lucene secondary index feature. It keeps the existing advantages of NoSQL while also supporting a variety of complex query scenarios, comprehensively improving the query experience over massive data. You probably have many questions. What is GaussDB(for Cassandra)? How do you use a secondary index? How does a Lucene secondary index differ? Let's go through them one by one.
What is GaussDB(for Cassandra)?
GaussDB(for Cassandra) is a distributed cloud database developed in-house by Huawei, with an architecture that separates compute from storage. On top of high performance, high availability, high reliability, high security, and elastic scaling, it provides service capabilities such as one-click deployment, backup and restore, and monitoring and alerting. It is highly compatible with the open-source Cassandra interface and delivers high read/write performance. It is already widely used in IoT, meteorology, the Internet, gaming, and many other fields.
What is a secondary index?
Let's start with the concept of an index. An index is a storage structure built to speed up data retrieval, and it trades space for time. It works like a book's table of contents, which lets you quickly locate the content you need.
In Cassandra, the primary key is itself an index (also called the primary index): at query time, the matching record can be retrieved directly by its primary key. A secondary index, also called an auxiliary index, helps locate the primary key, and the record is then found through the primary index. Secondary indexes are usually created with the CREATE INDEX statement.
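The following minimal Python sketch makes the two-step lookup concrete. Python is used only for illustration. The table, column names, and helper functions are hypothetical and are not Cassandra or GaussDB APIs. One dictionary plays the role of the primary index, and a second dictionary maps a column value to the primary keys that hold it.

```python
from collections import defaultdict

# Hypothetical base table: primary key -> row. Looking up a row by its
# primary key is a direct dict access (the "primary index").
table = {
    1: {"id": 1, "city": "Xi'an", "name": "a"},
    2: {"id": 2, "city": "Shenzhen", "name": "b"},
    3: {"id": 3, "city": "Xi'an", "name": "c"},
}

def build_secondary_index(rows, column):
    """Map each value of `column` to the set of primary keys holding it."""
    index = defaultdict(set)
    for pk, row in rows.items():
        index[row[column]].add(pk)
    return index

def query_by_column(rows, index, value):
    """Step 1: value -> primary keys via the secondary index.
    Step 2: primary key -> row via the primary index."""
    return [rows[pk] for pk in sorted(index.get(value, ()))]

city_index = build_secondary_index(table, "city")
print([r["name"] for r in query_by_column(table, city_index, "Xi'an")])  # ['a', 'c']
```

The secondary index stores only keys, not whole rows, which is the "space for time" trade described above.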
What are the pain points of Cassandra's current secondary indexes?
Native Cassandra implements a secondary index by creating a hidden table. The primary key of this table is the indexed column, and the value is the corresponding primary key of the base table. The implementation is relatively simple, so it inevitably comes with some constraints:
1. The first part of the primary key (the partition key) can only be queried with "=";
2. The second part of the primary key (the clustering key) supports "=", ">", "<", ">=", and "<=";
3. Indexed columns support only "=" queries;
4. Columns that are frequently deleted or updated are not suitable for indexing;
5. High-cardinality columns are not suitable for indexing.
Given these constraints, the queries that Cassandra's native secondary indexes can serve are very limited.
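The following Python sketch is a simplified model, not Cassandra's actual code. It shows why an index stored as a hidden table keyed by the indexed value favors "=". Partitions are located by hashing the key, so there is no ordering to exploit for ">", and a range query has to visit every partition. It also shows that every distinct value becomes its own partition, so a high-cardinality column yields as many tiny partitions as it has values. The function names and data are hypothetical.

```python
from collections import defaultdict

def build_hidden_index(rows, column):
    """Hidden table: indexed value (partition key) -> list of base-table PKs.
    A dict stands in for hash-placed partitions: no order across keys."""
    hidden = defaultdict(list)
    for pk, row in rows.items():
        hidden[row[column]].append(pk)
    return hidden

def lookup_eq(hidden, value):
    """'=' reads exactly one partition; returns (pks, partitions_visited)."""
    return sorted(hidden.get(value, [])), 1

def lookup_gt(hidden, value):
    """'>' has no ordered access path, so every partition must be visited."""
    pks, visited = [], 0
    for indexed_value, partition in hidden.items():
        visited += 1
        if indexed_value > value:
            pks.extend(partition)
    return sorted(pks), visited

rows = {pk: {"age": age} for pk, age in enumerate([30, 25, 30, 41, 19])}
hidden = build_hidden_index(rows, "age")
print(lookup_eq(hidden, 30))   # ([0, 2], 1)
print(lookup_gt(hidden, 26))   # ([0, 2, 3], 4)
```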
Why Lucene?
Lucene is currently the most popular open-source full-text search engine library. Its main characteristics are:
1. Stable, high-performance indexing;
2. Efficient, accurate, high-performance search algorithms;
3. Rich query types, including phrase, wildcard, approximate (fuzzy), and range queries;
4. Strong open-source community support and good maintainability.
Integrating the Lucene engine is therefore the best way to make up for Cassandra's weak query capabilities. After all, who would turn down a search engine with stable performance that keeps growing and iterating?
The Lucene engine's powerful inverted index and columnar storage give GaussDB(for Cassandra) efficient multi-dimensional queries, text retrieval, and statistical analysis. In use it feels similar to a native secondary index, but it offers much richer syntax.
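The following Python sketch gives a rough illustration of an inverted index. It maps each term to the set of documents containing it, and it answers an AND query by intersecting those sets. This is a textbook model with a simplifying assumption: the tokenizer just lower-cases text and splits on non-word characters. Lucene's real on-disk structures and its columnar storage are far more elaborate and are not shown. All names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Assumed tokenizer: lower-case, split on non-word characters."""
    return re.findall(r"\w+", text.lower())

def build_inverted_index(docs):
    """term -> set of ids of documents containing that term."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    """Ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

docs = {
    1: "Lucene is a full-text search library",
    2: "Cassandra secondary index",
    3: "Lucene index inside Cassandra",
}
idx = build_inverted_index(docs)
print(search_all(idx, "lucene index"))  # [3]
```

The query never scans document text. It only touches the posting sets of the query terms, which is why text search stays fast as the number of documents grows.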
How do my queries change with a Lucene secondary index?
More flexible querying and filtering:
Queries can omit the primary key (PK) entirely or specify only part of it, and indexed columns support operators such as ">", "<", and "IN", so users are no longer limited to "=".
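Range and IN filters on a non-key column need an index that keeps values in order. The following Python sketch is a hypothetical, simplified stand-in: a sorted list searched with the standard `bisect` module. It is not how Lucene or GaussDB store values, and the class and method names are illustrative only.

```python
import bisect

class SortedColumnIndex:
    """Hypothetical ordered index on one column: (value, pk) pairs kept sorted."""

    def __init__(self, rows, column):
        self.entries = sorted((row[column], pk) for pk, row in rows.items())
        self.values = [v for v, _ in self.entries]

    def between(self, low=None, high=None):
        """PKs with low < value < high (exclusive bounds; None = unbounded)."""
        start = 0 if low is None else bisect.bisect_right(self.values, low)
        end = len(self.values) if high is None else bisect.bisect_left(self.values, high)
        return sorted(pk for _, pk in self.entries[start:end])

    def isin(self, candidates):
        """PKs whose value is in `candidates`: one binary search per candidate."""
        pks = []
        for c in set(candidates):
            i = bisect.bisect_left(self.values, c)
            while i < len(self.values) and self.values[i] == c:
                pks.append(self.entries[i][1])
                i += 1
        return sorted(pks)

rows = {"u1": {"score": 5}, "u2": {"score": 12}, "u3": {"score": 8}, "u4": {"score": 12}}
idx = SortedColumnIndex(rows, "score")
print(idx.between(low=5, high=12))  # ['u3']
print(idx.isin([12, 5, 99]))        # ['u1', 'u2', 'u4']
```

No primary key is needed in either query. The ordered index alone narrows the search to a contiguous slice.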
Powerful text retrieval:
Text retrieval is Lucene's specialty, and it is easy to use: just write a LIKE condition.
You can do this:
SELECT * FROM example WHERE field LIKE 'test%'; // Prefix query
Or this:
SELECT * FROM example WHERE field LIKE 'start*end'; // Regular-expression matching
Or even this:
SELECT * FROM example WHERE field LIKE '%+lucene +index%'; // Full-text search: high performance, stable
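The three LIKE patterns can be modeled as operations over a sorted term dictionary. The following Python sketch rests on stated assumptions. It treats 'test%' as a prefix match. It treats '*' in 'start*end' as a wildcard matching any run of characters. It treats '+' in '+lucene +index' as marking a required term, as in Lucene's classic query syntax. The article does not describe how GaussDB actually translates LIKE into Lucene queries, so this is an illustration only, and all function names are hypothetical.

```python
import bisect
import fnmatch

def prefix_terms(sorted_terms, prefix):
    """Terms starting with `prefix`: binary-search to the first candidate,
    then walk forward while the prefix still matches."""
    start = bisect.bisect_left(sorted_terms, prefix)
    out = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        out.append(term)
    return out

def wildcard_terms(sorted_terms, pattern):
    """Terms matching a '*' wildcard pattern (note: fnmatchcase also gives
    '?' and '[...]' special meaning)."""
    return [t for t in sorted_terms if fnmatch.fnmatchcase(t, pattern)]

def required_terms_match(doc_terms, query):
    """Assumed '+a +b' semantics: every '+'-prefixed term must be present."""
    required = [w[1:] for w in query.split() if w.startswith("+")]
    return all(term in doc_terms for term in required)

terms = sorted(["startend", "start_to_end", "starter", "test", "testing", "tested", "zebra"])
print(prefix_terms(terms, "test"))          # ['test', 'tested', 'testing']
print(wildcard_terms(terms, "start*end"))   # ['start_to_end', 'startend']
print(required_terms_match({"lucene", "index", "engine"}, "+lucene +index"))  # True
```

The prefix query benefits most from the sorted dictionary, because it jumps straight to the first matching term. The wildcard sketch simply scans all terms.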
Counting over data volumes at the trillion-plus scale:
select count(*) from example where pk > 1 and expr(lucene_index, 'count');
Multiple deletion methods:
Single-row deletion, partition deletion, and range deletion are all supported, covering a wide range of deletion scenarios.
DELETE FROM example WHERE pk1='a' AND field=1; // Single-row deletion
DELETE FROM example WHERE pk1='a' AND pk2=5000; // Partition deletion
DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c'; // Range deletion
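The three deletion granularities can be pictured with a toy table keyed by (partition key, clustering key), mirroring the pk1/pk2 and ck1/ck2 columns in the statements. The following Python sketch is hypothetical. It models only which rows each kind of delete removes, not how Cassandra or GaussDB record deletions internally. For simplicity, its single-row delete uses the full primary key rather than a non-key column like the article's first statement.

```python
# Hypothetical table: ((pk1, pk2), (ck1, ck2)) -> row payload.

def delete_row(table, partition_key, clustering_key):
    """Single-row delete: remove exactly one primary key, if present."""
    table.pop((partition_key, clustering_key), None)

def delete_partition(table, partition_key):
    """Partition delete: remove every row sharing the partition key."""
    for key in [k for k in table if k[0] == partition_key]:
        del table[key]

def delete_range(table, partition_key, ck_prefix, low, high):
    """Range delete within one partition: clustering key starts with
    ck_prefix and the next clustering column lies strictly in (low, high)."""
    n = len(ck_prefix)
    doomed = [k for k in table
              if k[0] == partition_key and k[1][:n] == ck_prefix and low < k[1][n] < high]
    for key in doomed:
        del table[key]

table = {
    (("a", 3000), (2, "a")): "r1",
    (("a", 3000), (2, "b")): "r2",
    (("a", 3000), (2, "c")): "r3",
    (("a", 3000), (3, "b")): "r4",
    (("a", 5000), (1, "x")): "r5",
}
delete_range(table, ("a", 3000), (2,), "a", "c")  # like ck1=2 AND ck2>'a' AND ck2<'c'
delete_partition(table, ("a", 5000))              # like pk1='a' AND pk2=5000
delete_row(table, ("a", 3000), (3, "b"))
print(sorted(table.values()))  # ['r1', 'r3']
```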
An extended JSON query interface for complex query scenarios:
The extended JSON query interface offers rich query syntax and more flexible usage. Its keywords are listed below:
Keyword | Description |
--- | --- |
filter | The JSON search keyword in the query statement |
term | Checks whether a document contains an exact value; the queried value is not tokenized |
match | Tokenizes the queried value and performs a full-text search |
range | Matches documents whose field falls within a given range (range sub-keys: "eq"/"gte"/"gt"/"lte"/"lt") |
bool | Combines "must", "should", and "must not" clauses into a compound query |
must | Sub-query of bool; a list wrapping "term", "match", or "range" queries |
should | Sub-query of bool; a list wrapping "term", "match", or "range" queries |
must not | Sub-query of bool; a list wrapping "term", "match", or "range" queries |
For example:
SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')
By combining and nesting conditions, you can build statements tailored to your own business. Up to 200 levels of JSON nesting are supported, so even complex scenarios can be handled.
Powered by the Lucene engine, Huawei Cloud GaussDB(for Cassandra) pushes search capability down into the database layer through Lucene secondary indexes, fundamentally freeing the application layer from query work. Its multi-dimensional query, text retrieval, and statistical analysis capabilities make up for the weak query functionality of NoSQL, letting enterprises handle complex queries over massive data with ease. What are you waiting for? Come and try it!
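Hand-writing deeply nested JSON like the example is error-prone, so it may help to generate the filter in client code. The following Python sketch builds the condition from small hypothetical helpers (`term`, `range_`, `bool_`) using the keywords from the table above. It checks nesting depth against the 200-level limit mentioned in the article, and it escapes single quotes the way CQL string literals require (by doubling them). It assumes the JSON shape shown in the article. Check two details against the official documentation: how GaussDB counts nesting levels, and whether the last keyword is spelled "must not" or "must_not". A Python dict cannot hold duplicate keys, so the sketch puts each condition in its own clause rather than repeating a field name inside one object.

```python
import json

MAX_DEPTH = 200  # nesting limit stated in the article; counting method assumed

def term(field, value):
    return {"term": {field: value}}

def range_(field, **bounds):
    """bounds use the article's range sub-keys: eq, gt, gte, lt, lte."""
    unknown = set(bounds) - {"eq", "gt", "gte", "lt", "lte"}
    if unknown:
        raise ValueError(f"unsupported range keys: {sorted(unknown)}")
    return {"range": {field: bounds}}

def bool_(must=None, should=None, must_not=None):
    clauses = {}
    if must:
        clauses["must"] = list(must)
    if should:
        clauses["should"] = list(should)
    if must_not:
        clauses["must not"] = list(must_not)  # spelled as in the article's table
    return {"bool": clauses}

def depth(node):
    """Nesting depth of dicts/lists (one assumed way to count 'layers')."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0

def expr_query(table, index_column, condition):
    body = {"filter": condition}
    if depth(body) > MAX_DEPTH:
        raise ValueError("query nesting exceeds the documented limit")
    payload = json.dumps(body).replace("'", "''")  # CQL escapes ' as ''
    return f"SELECT * FROM {table} WHERE EXPR({index_column}, '{payload}');"

cond = bool_(should=[
    bool_(must=[term("pk1", "a"), range_("field2", gte=5, lte=15)]),
    range_("pk2", gt=2000),
])
print(expr_query("example", "index_field", cond))
```

Building the query from helpers keeps each clause small and validated, and it prevents mismatched brackets, which are easy to introduce when editing the nested string by hand.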
Appendix
Author: Huawei Cloud Cassandra team
Hiring in Hangzhou, Xi'an, and Shenzhen; send resumes to: [email protected]
For more technical articles, follow the official GaussDB(for Cassandra) blog: https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986
GaussDB(for Cassandra) official homepage: https://www.huaweicloud.com/product/gaussdbforcassandra.html