当前位置:网站首页>coming! Gaussdb (for Cassandra) new features appear
coming! Gaussdb (for Cassandra) new features appear
2022-07-07 18:43:00 【Hua Weiyun】
today , Hua Wei Yun GaussDB(for Cassandra) carry Lucene Engine new solution Come on. !
At present , Internet 、 Big data is developing rapidly , The amount of data is growing explosively , In high concurrency 、 High availability 、 Driven by the high expansion of business demand ,NoSQL Database has become the rigid demand of more and more business scenarios . But in terms of query , Conventional NoSQL But it has certain limitations , Strictly speaking , Like open source MongoDB、Cassandra、Hbase Etc. do not have multi-dimensional query of massive data 、 Text retrieval 、 Statistical analysis, etc . Most enterprises are still looking for a more perfect NoSQL Solution .
Huawei cloud native multimode database GaussDB NoSQL Have a strong ecosystem , Support key value 、 A wide watch 、 file 、 Timing four engine interfaces . among , Wide table engine interface GaussDB(for Cassandra) Has been released Lucene Secondary index function , Existing NoSQL The advantages of , It can also support a variety of complex query scenarios , Comprehensively improve users' query experience in massive data scenarios , Spoil powder with strength ! I believe you must have many questions ,GaussDB(for Cassandra) What is it? ? How to use secondary index ?Lucene What are the differences between secondary indexes ? take it easy , Next, let's interpret them one by one .

What is? GaussDB(for Cassandra)?
GaussDB(for Cassandra) It is a Huawei self-developed 、 Distributed cloud database with computing storage separation architecture , In high performance 、 High availability 、 Highly reliable 、 High security 、 On the basis of elastic expansion and contraction , Provides one click deployment 、 Backup recovery 、 Monitoring alarm and other service capabilities ; And highly compatible with open source Cassandra Interface , Provide high read / write performance . At present, it has been widely used in IoT、 meteorological 、 Internet 、 Games and many other fields .
What is a secondary index ?
Let's first understand the concept of index . Index is a storage structure created to speed up data retrieval , It is a design idea of exchanging space for time . The function can be understood as the catalogue of books , Through the directory, you can quickly locate the required content .
stay Cassandra in ,Primary Key It's index. ( Also known as primary index ), At query time , according to Primary Key You can directly retrieve the corresponding records . And secondary index is also called auxiliary index , To help locate the primary index , Then find the corresponding record according to the primary index . We usually use CREATE INDEX The statement establishes a secondary index .
At present Cassandra What are the pain points of the secondary index ?
Native Cassandra The implementation of secondary index in actually creates an implicit table , Of this table Primary Key Is the column that creates the index , The value is the corresponding Primary Key, Implementation is relatively simple , Therefore, it is inevitable to bring some constraints :
1. The first primary key can only be used “=” Inquire about ;
2. The second primary key can use “=、>、<、>=、<=”;
3. Index columns only support “=” Inquire about ;
4. Delete 、 Columns that are updated too frequently are not suitable for indexing ;
5.High-cardinality Columns are not suitable for indexing ;
Based on the above constraints ,Cassandra The query function that secondary index can provide is very limited .
Why Lucene?
Lucene It is currently the most popular open source full-text search engine tool , It has the following characteristics :
1. Stable 、 High indexing performance ;
2. It's efficient 、 accuracy 、 High performance search algorithm ;
3. Rich query types : Support phrase query 、 Wildcard query 、 Approximate query 、 Range query, etc ;
4. There is strong open source community support , Good maintainability ;
therefore , Use integration Lucene Engine to supplement Cassandra The weakness of query ability is the best choice , After all, who would refuse a stable performance 、 Continued growth 、 And update the iterative search engine ?
Lucene The engine has powerful inverted index and columnar storage capacity , Given GaussDB(for Cassandra) Efficient multidimensional query 、 Text retrieval 、 Statistical analysis, etc , It is similar to the native secondary index in use experience , But at the same time, it has richer syntax support .
Use Lucene After secondary index , What changes have taken place in my query ?
More flexible query 、 Filtering method :
All queries can be made without PK Or take part PK, And the index column supports “>、<、in” Wait for the operator , Users no longer need to be limited to using “=”.
Strong text retrieval ability :
Text retrieval ability is Lucene What I'm good at , It's very convenient to use , Just pass the keyword like That is to say .
You can do this :
SELECT * FROM example WHERE field LIKE 'test%'; // Prefix query You can do that :
SELECT * FROM example WHERE field LIKE 'start*end'; // Regular matching It can be like this :
SELECT * FROM example WHERE field LIKE '%+lucene +index%'; // Full text search , High performance , Stable Support the statistics of large amount of data exceeding trillion specifications :
select count(*) from example where pk > 1 and expr(lucene_index, 'count'); Multiple deletion methods :
Support single Single row deletion 、partition Partition deletion 、range Scope delete , Cover all kinds of deletion scenes .
DELETE FROM example WHERE pk1='a' AND field=1; // single Single row deletion DELETE FROM example WHERE pk1='a' AND pk2=5000; // partition Partition deletion DELETE FROM example WHERE pk1='a' AND pk2=3000 AND ck1=2 AND ck2>'a' AND ck2<'c'; // range Scope delete Support extended json Query interface , Easily deal with various complex query scenarios :
Extended json Query interface provides rich query syntax , More diverse usage . The following is a list of keywords :
filter | In the query statement json Search keywords |
term | When querying, judge a document Whether to include a specific value , Word segmentation query will not be performed on the queried value |
match | Segment the queried value , Full text search |
range | Query specifies that a field is in a specific range ( Range query subkey :"eq"/"gte"/"gt"/"lte"/"lt") |
bool | It has to be with "must"、"should"、"must not" Combine complex queries together |
must | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
should | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
must not | bool Type of subquery , The type is list, encapsulation "term"、"match"、"range" Inquire about |
Take a chestnut :
SELECT * FROM example WHERE EXPR(index_field, '{"filter": {"bool": {"should": [{"bool": {"should": [{"bool": {"must": [{"bool": {"should": [{"range": {"ck1": {"lt": 2}, "ck1": {"gte": 4}}}]}}, {"bool": {"should": [{"range": {"field1": {"lt": 2}, "field1": {"gt": 3}}}]}}]}}, {"bool": {"should": [{"term": {"pk1": "a", "pk1": "b", "pk1": "c"}}]}}]}}, {"bool": {"must": [{"range": {"field2": {"gte":5, "lte": 15}, "pk2": {"gt": 2000}}}]}}]}}}')Add nesting through condition combination , You can DIY In line with their own business sql sentence , And the highest support 200 layer json nesting , Even complex scenes can be handled !
Hua Wei Yun GaussDB(for Cassandra) carrying Lucene engine , adopt Lucene The secondary index sinks the search ability to the bottom , Fundamentally liberated the application layer query , Multi dimensional query 、 Text retrieval 、 Statistical analysis and other abilities , It can perfectly make up for NoSQL Weak query function short board , Let enterprises calmly deal with the complex query scenario of massive data . What are we waiting for? , Come and experience it !
appendix
The author of this article : Huawei cloud Cassandra The team
Hangzhou, Xi'an, Shenzhen resume delivery :[email protected]
More technical articles , Please pay attention to Gauss Cassandra The official blog :https://bbs.huaweicloud.com/community/usersnew/id_1563519101830986
gaussian Cassandra Official home page :https://www.huaweicloud.com/product/gaussdbforcassandra.html
边栏推荐
- What are the financial products in 2022? What are suitable for beginners?
- 清华、剑桥、UIC联合推出首个中文事实核查数据集:基于证据、涵盖医疗社会等多个领域
- Will low code help enterprises' digital transformation make programmers unemployed?
- 2022年理财有哪些产品?哪些适合新手?
- 小试牛刀之NunJucks模板引擎
- The report of the state of world food security and nutrition was released: the number of hungry people in the world increased to 828million in 2021
- Wireshark分析抓包数据*.cap
- 线程池和单例模式以及文件操作
- [C language] string function
- 云安全日报220707:思科Expressway系列和网真视频通信服务器发现远程攻击漏洞,需要尽快升级
猜你喜欢

Backup Alibaba cloud instance OSS browser

DataSimba推出微信小程序,DataNuza接受全场景考验? | StartDT Hackathon

『HarmonyOS』DevEco的下载安装与开发环境搭建

标准ACL与扩展ACL

GSAP animation library

【蓝桥杯集训100题】scratch从小到大排序 蓝桥杯scratch比赛专项预测编程题 集训模拟练习题第17题

元宇宙带来的创意性改变

【C语言】字符串函数

Comparison and selection of kubernetes Devops CD Tools

4种常见的缓存模式,你都知道吗?
随机推荐
Discuss | frankly, why is it difficult to implement industrial AR applications?
Download, installation and development environment construction of "harmonyos" deveco
[trusted computing] Lesson 12: TPM authorization and conversation
线程池中的线程工厂
RIP和OSPF的区别和配置命令
低代码助力企业数字化转型会让程序员失业?
上市十天就下线过万台,欧尚Z6产品实力备受点赞
小试牛刀之NunJucks模板引擎
将模型的记忆保存下来!Meta&UC Berkeley提出MeMViT,建模时间支持比现有模型长30倍,计算量仅增加4.5%...
PHP面试题 foreach($arr as &$value)与foreach($arr as $value)的用法
来了!GaussDB(for Cassandra)新特性亮相
Kirk Borne的本周学习资源精选【点击标题直接下载】
DataSimba推出微信小程序,DataNuza接受全场景考验? | StartDT Hackathon
Win11C盘满了怎么清理?Win11清理C盘的方法
[network attack and defense principle and technology] Chapter 4: network scanning technology
nest.js入门之 database
回归问题的评价指标和重要知识点总结
【demo】循环队列及条件锁实现goroutine间的通信
Differences between rip and OSPF and configuration commands
Nat address translation