Alibaba architects spent a year compiling an "Advanced Lucene Guide"; with it, you too can land a job at a big tech company!
2022-07-29 10:53:00 【InfoQ】


1. Theoretical foundations of search technology
- Why learn Lucene
- Data query methods
- Application scenarios for full-text retrieval

2. Introduction to Lucene
- What is full-text retrieval
- What is Lucene
- Lucene official website

3. The Lucene full-text retrieval process
- Indexing and search flow chart
- Indexing process
- Search process

4. Getting started with Lucene
- Preparing Lucene
- Development environment
- Creating a Java project
- Indexing process
- Viewing the index with Luke
- Search process

5. Field types
- Field attributes
- Common Field types
- Modifying Fields

6. Index maintenance
- Requirements
- Adding to the index
- Updating the index
- Deleting from the index

7. Analyzers
- Understanding tokenization
- When the Analyzer is used
- Lucene's built-in analyzers
- Third-party Chinese analyzers

8. Advanced search with Lucene
- Text search
- Numeric range search
- Combined search

9. Search case study
- Adding dependencies
- Adding pages and resources to the project
- Creating packages and the startup class
- Configuration file
- Business code

10. Lucene's underlying storage structure (advanced)
- Lucene's storage structure in detail
- Physical files of the index
- Index file extension reference table
- Dictionary construction

11. Lucene optimization (advanced)
- config.setMaxBufferedDocs(100000); controls how many documents are buffered in memory before a new segment is written. A larger value speeds up indexing but consumes more memory.
- indexWriter.forceMerge(N); merges the index down to at most N segments. A larger N leaves more segments, so indexing finishes sooner but searches are slower; a smaller N costs more merge time but makes searches faster.
- Reduce heavy disk I/O
- Choose a suitable analyzer
- Choose a suitable location to store the index
- Choose the right search API

12. Lucene relevance ranking (advanced)

13. Notes on using Lucene (advanced)
- Keywords are case sensitive: the query keywords OR, AND, and TO are recognized only in uppercase. Lucene treats their lowercase forms as ordinary words.
- Writes are exclusive: only one write operation can run against an index at a time, although searches can run while a write is in progress.
- File lock: a forced exit while the index is being written leaves a lock file in the tmp directory, which blocks later write operations. Delete the lock file manually to recover.
- Time format: Lucene supports only one time format, yyMMddHHmmss. A value passed as yy-MM-dd HH:mm:ss is not treated as a time.
- Setting boost: sometimes one field should carry more weight in a search. For example, if articles with the keyword in the title are more valuable than articles with the keyword only in the body, give the title field a larger boost so that those articles rank first in the results.
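
The time-format note above can be illustrated with a small conversion helper. This is only a sketch built on the standard java.time package, not a Lucene API: the class name TimeFormatSketch and the method toCompact are hypothetical, and the target pattern yyMMddHHmmss is taken from the note itself.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Lucene: it rewrites a readable
// timestamp into the compact form named in the note above.
class TimeFormatSketch {
    // Readable input pattern mentioned in the note.
    private static final DateTimeFormatter READABLE =
            DateTimeFormatter.ofPattern("yy-MM-dd HH:mm:ss");

    // Compact pattern that the note says Lucene expects.
    private static final DateTimeFormatter COMPACT =
            DateTimeFormatter.ofPattern("yyMMddHHmmss");

    // Parses the readable form and re-emits it without separators.
    public static String toCompact(String readable) {
        return LocalDateTime.parse(readable, READABLE).format(COMPACT);
    }

    public static void main(String[] args) {
        System.out.println(toCompact("22-07-29 10:53:00")); // prints 220729105300
    }
}
```

Normalizing the value before it is indexed means every document stores the timestamp in the same form.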
