Alibaba architects spent a year compiling the "Advanced Lucene Document"; with it, you too can work at a big tech company!
2022-07-29 10:53:00 【InfoQ】


1. Theoretical basis of search technology
- Why learn Lucene
- Data query methods
- Application scenarios for full-text retrieval

2. Introduction to Lucene
- What is full-text retrieval
- What is Lucene
- The official Lucene website

3. The Lucene full-text retrieval process
- Index and search flow chart
- Indexing process
- Search process
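
The indexing and search steps listed above can be pictured with a toy inverted index, the data structure Lucene is built around. This is only a minimal sketch using the Java standard library, not Lucene code: the class name ToyInvertedIndex and its methods are hypothetical, and the tokenizer (lowercase, split on non-word characters) is a simplifying assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical toy class for illustration; it does not use or mimic the Lucene API.
final class ToyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Indexing process: split the text into terms and record the document id under each term.
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search process: normalize the query term the same way, then look it up.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT),
                Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add(1, "Lucene is a search library");
        index.add(2, "Full-text search with Lucene");
        System.out.println(index.search("Lucene"));  // prints [1, 2]
        System.out.println(index.search("library")); // prints [1]
    }
}
```

The point of the sketch is that a search looks up a term directly instead of scanning every document, which is why the text has to be analyzed into terms at indexing time.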

4. Getting started with Lucene
- Preparing Lucene
- Development environment
- Create a Java project
- Indexing process
- View the index with Luke
- Search process

5. Field types
- Field attributes
- Common Field types
- Modifying Fields

6. Index maintenance
- Requirements
- Adding to the index
- Modifying the index
- Deleting from the index

7. Analyzers (word segmentation)
- Understanding word segmentation
- When an Analyzer is used
- Lucene's built-in analyzers
- Third-party Chinese analyzers

8. Lucene advanced search
- Text search
- Numeric range search
- Combined search

9. Search case study
- Add dependencies
- Add pages and resources to the project
- Create packages and the startup class
- Configuration file
- Business code

10. Lucene's underlying storage structure (advanced)
- Lucene's storage structure in detail
- Physical files of the index library
- Index file extension reference table
- Dictionary construction

11. Lucene optimization (advanced)
- config.setMaxBufferedDocs(100000); controls how many documents are held in memory before a new segment is written. A larger value speeds up indexing but consumes more memory.
- indexWriter.forceMerge(N); merges the index until at most N segments remain. A larger N makes indexing faster but searching slower; a smaller N makes indexing slower but searching faster.
- Reduce heavy disk I/O
- Choose a suitable analyzer
- Choose a suitable location for the index library
- Choose the right search API
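
The effect of the first two settings in the list above can be pictured with a toy model. This is a sketch under stated assumptions, not Lucene's implementation: ToySegmentWriter and its methods are hypothetical names, a segment is modeled as a plain list of documents, and forceMerge is assumed to take the maximum number of segments to leave behind, as Lucene's IndexWriter.forceMerge(int maxNumSegments) does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; real Lucene segments are on-disk index files, not lists.
final class ToySegmentWriter {
    private final int maxBufferedDocs;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> segments = new ArrayList<>();

    ToySegmentWriter(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be >= 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    // Documents stay in memory until the buffer is full, then become one new segment.
    void addDocument(String doc) {
        buffer.add(doc);
        if (buffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    // Write any buffered documents out as a new segment.
    void flush() {
        if (!buffer.isEmpty()) {
            segments.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Merge neighbouring segments until at most maxSegments remain.
    void forceMerge(int maxSegments) {
        if (maxSegments < 1) {
            throw new IllegalArgumentException("maxSegments must be >= 1");
        }
        flush();
        while (segments.size() > maxSegments) {
            List<String> last = segments.remove(segments.size() - 1);
            segments.get(segments.size() - 1).addAll(last);
        }
    }

    int segmentCount() {
        return segments.size();
    }

    public static void main(String[] args) {
        ToySegmentWriter smallBuffer = new ToySegmentWriter(2);
        ToySegmentWriter largeBuffer = new ToySegmentWriter(5);
        for (int i = 0; i < 10; i++) {
            smallBuffer.addDocument("doc" + i);
            largeBuffer.addDocument("doc" + i);
        }
        System.out.println(smallBuffer.segmentCount()); // prints 5
        System.out.println(largeBuffer.segmentCount()); // prints 2
        smallBuffer.forceMerge(1);
        System.out.println(smallBuffer.segmentCount()); // prints 1
    }
}
```

In this model a larger buffer means fewer flushes (less write work, more memory held), and a smaller merge target means more merge work up front but fewer segments for a search to visit.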

12. Lucene relevance ranking (advanced)

13. Precautions when using Lucene (advanced)
- Keywords are case sensitive: the query keywords OR, AND, and TO are case sensitive. Lucene recognizes them only in uppercase; lowercase forms are treated as ordinary words.
- Read/write exclusivity: only one write to the index can happen at a time, but searching is still possible while a write is in progress.
- File lock: if the process is forced to exit while writing the index, a lock file is left in the tmp directory and blocks later write operations. You can delete it manually.
- Time format: Lucene supports only one time format, yyMMddHHmmss, so a time passed as yy-MM-dd HH:mm:ss will not be treated as a time.
- Setting boost: sometimes a field should carry more weight in a search. For example, if you consider articles with the keyword in the title more valuable than articles with the keyword only in the body, set a larger boost on the title, and the search results will rank articles with the keyword in the title first.
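
For the time format item above, one possible way to prepare a value is to convert it before handing it to Lucene. The following is a minimal sketch using only java.time; the class CompactTimestamp and the method toCompact are hypothetical helpers, and the two patterns are taken as stated in the text rather than verified against a particular Lucene version.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration; it is not part of Lucene.
final class CompactTimestamp {
    // Input pattern mentioned in the text.
    private static final DateTimeFormatter DASHED =
            DateTimeFormatter.ofPattern("yy-MM-dd HH:mm:ss");
    // Compact pattern that the text says Lucene supports.
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("yyMMddHHmmss");

    // Parse the dashed form and re-emit it in the compact form.
    static String toCompact(String dashed) {
        return LocalDateTime.parse(dashed, DASHED).format(COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(toCompact("22-07-29 10:53:00")); // prints 220729105300
    }
}
```

A side benefit of the compact form is that, within one century, comparing the strings alphabetically gives the same order as comparing the times.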
