The ftutilx plug-in for the Jincang database KingbaseES
2022-06-25 11:07:00 【Thousands of sails pass by the side of the sunken boat_】
1. Plug-in overview
ftutilx is a KingbaseES extension used mainly to extract text content from files stored as streams in blob-type fields. The blob field may contain files in pdf, doc, docx, wps, xls, xlsx, ppt, and pptx formats. The ftutilx plug-in does not support encrypted files.
2. Loading the plug-in
Before using ftutilx, add it to the shared_preload_libraries parameter in the kingbase.conf file and restart the KingbaseES database, then create the extension:
shared_preload_libraries = 'ftutilx' # (change requires restart)
CREATE EXTENSION ftutilx;
3. Parameter configuration
ftutilx.max_string_length
Maximum length of the extraction result. The default value is 128M. A new value takes effect immediately after it is set.
ftutilx.jvm_option_string
JVM initialization parameters. The default value is "-Xmx1024m,-Xms1024m,-Xmn256m,-XX:MetaspaceSize=64m,-XX:MaxMetaspaceSize=128m,-XX:CompressedClassSpaceSize=256m". This parameter takes effect only when the JVM is created, which happens the first time the extracttext function is called in a session process. Setting it again after that has no effect.
Under the database's default extension loading mechanism, once an extension has been created in one session, its dynamic library is not loaded when a new session starts. It is loaded only when an interface of the extension is called for the first time, so setting the extension's parameters in a new session before that point does not work. The solution is to modify either shared_preload_libraries or session_preload_libraries in the database configuration file so that its value includes ftutilx. The ftutilx dynamic library is then loaded as soon as a new session starts, and the extension parameters can be set.
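As a minimal sketch of how the preload setting could be checked from a new session: the statements below assume that KingbaseES accepts the PostgreSQL-style SHOW statement for configuration parameters, which the original text does not confirm. The configuration line in the comment uses only the parameter names given above.

```sql
-- Assumed kingbase.conf entry (either of the two parameters named above works):
--   session_preload_libraries = 'ftutilx'

-- Run in a NEW session, before any call to extracttext.
-- Assumption: SHOW behaves as in PostgreSQL.
SHOW session_preload_libraries;   -- expected to include ftutilx if the setting is active
SHOW ftutilx.max_string_length;   -- can only be resolved once the ftutilx library is loaded
```

If the second statement reports an unknown parameter, the library has most likely not been loaded for that session yet.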
4. Using ftutilx
The ftutilx plug-in provides the extracttext function, which extracts the contents of files stored in blob-type fields. extracttext accepts one blob-type parameter holding the file contents and returns the extracted content as text.
CREATE TABLE tab (title text, body blob);
INSERT INTO tab VALUES ('test.doc', blob_import('/home/test/data.doc'));
SELECT title, length(extracttext(body)) FROM tab;
4.1. Combining ftutilx with full-text search
Because extracting the content of electronic documents is slow, you can improve full-text search performance by adding a stored column to the table. The column holds either the extraction result (Option 1) or the word position list, that is, the tsvector (Option 2).
Option 1:
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION zhparsercfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION zhparsercfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
CREATE EXTENSION ftutilx;
CREATE TABLE tab (title text, body blob);
ALTER TABLE tab ADD COLUMN content text GENERATED ALWAYS AS (extracttext(body)) STORED;
CREATE INDEX tab_idx ON tab USING GIN (to_tsvector('zhparsercfg', content));
INSERT INTO tab VALUES ('test.doc', blob_import('/home/test/data.doc'));
SELECT title FROM tab WHERE to_tsvector('zhparsercfg', content) @@ to_tsquery(' journal ');
Option 2:
CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION zhparsercfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION zhparsercfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
CREATE EXTENSION ftutilx;
CREATE TABLE tab (title text, body blob);
ALTER TABLE tab ADD COLUMN tab_idx_col tsvector GENERATED ALWAYS AS (to_tsvector('zhparsercfg', extracttext(body))) STORED;
CREATE INDEX tab_idx ON tab USING GIN (tab_idx_col);
INSERT INTO tab VALUES ('test.doc', blob_import('/home/test/data.doc'));
SELECT title FROM tab WHERE tab_idx_col @@ to_tsquery(' journal ');
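To check whether a search like the ones above can use the GIN index, the query can be prefixed with EXPLAIN. This is only a sketch: it assumes KingbaseES accepts the PostgreSQL-style EXPLAIN statement, and it reuses the table and column from Option 2. The plan output is not shown because it depends on the installation and the data.

```sql
-- Assumption: EXPLAIN is available as in PostgreSQL.
-- Shows the chosen plan without executing the query.
EXPLAIN
SELECT title
FROM tab
WHERE tab_idx_col @@ to_tsquery('journal');
```

On a table with only a few rows the planner may choose a sequential scan even though the index exists, so the check is more meaningful once a realistic number of documents has been loaded.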
4.2. Notes
1) ftutilx depends on the jre-1.8.0 runtime environment. After deployment, the LD_LIBRARY_PATH system environment variable must include the path to libjvm.so from jre-1.8.0.
2) The ftutilx.max_string_length parameter sets the maximum length of the extraction result. However, tsvector currently supports at most (1M-1), so when extracttext is combined with to_tsvector, the size of the word segmentation result cannot exceed (1M-1).
3) ftutilx needs to create a JVM, which occupies a considerable amount of memory. Lowering -Xmx in ftutilx.jvm_option_string limits the JVM's memory footprint, but too small a value causes the JVM to throw an out-of-memory exception when parsing large files.
4) With the combined full-text search options above, on systems with little memory you need to limit the number of session processes that insert data in parallel, so that system memory is not exhausted.
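For note 2, one rough way to spot documents that may approach the tsvector limit is to rank rows by the length of their extracted text. The sketch below assumes the table from Option 1, where the extraction result is stored in the content column, and uses the same length function as the earlier example. The character count is only an indicator: the size of a tsvector depends on the number of distinct words and their positions, not directly on the text length.

```sql
-- Assumes the Option 1 table, with the stored generated column "content".
-- Lists documents by extracted text length, largest first.
SELECT title,
       length(content) AS extracted_chars
FROM tab
ORDER BY extracted_chars DESC;
```

Reading the stored column avoids calling extracttext again, which matters because extraction is the slow step.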
5. Uninstalling the plug-in
DROP EXTENSION ftutilx;