How to Get Tokens from a TokenStream in Lucene 3.5.0
2022-07-05 11:22:00 【全栈程序员站长】
While reading the Lucene 3.5.0 documentation, I compared the API changes listed for the different Lucene releases and eventually found the entry that matters here:

LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java’s StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)

In other words, the way tokens used to be extracted, through TermAttribute, is now deprecated and should no longer be used in new code:
StringReader reader = new StringReader(s);
TokenStream ts = analyzer.tokenStream(s, reader);
TermAttribute ta = ts.getAttribute(TermAttribute.class);
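The changelog entry quoted above says that CharTermAttribute implements CharSequence and Appendable. To show what those two JDK interfaces make possible, here is a minimal sketch that uses only the standard library. SimpleTerm and SimpleTermDemo are hypothetical stand-ins written for this illustration; they are not Lucene classes and only mimic the idea of a reusable term buffer.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a term attribute: NOT Lucene's CharTermAttribute,
// just a small class that implements the same two JDK interfaces.
final class SimpleTerm implements CharSequence, Appendable {
    private final StringBuilder buf = new StringBuilder();

    // Appendable: each method returns this, so calls can be chained
    // the same way as with StringBuilder.
    @Override public SimpleTerm append(CharSequence csq) { buf.append(csq); return this; }
    @Override public SimpleTerm append(CharSequence csq, int start, int end) {
        buf.append(csq, start, end);
        return this;
    }
    @Override public SimpleTerm append(char c) { buf.append(c); return this; }

    // CharSequence: read-only access to the buffered characters.
    @Override public int length() { return buf.length(); }
    @Override public char charAt(int index) { return buf.charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return buf.subSequence(start, end); }
    @Override public String toString() { return buf.toString(); }

    // Clears the buffer so the same object can hold the next term.
    SimpleTerm setEmpty() { buf.setLength(0); return this; }
}

final class SimpleTermDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Accepts any CharSequence, so a term can be matched without calling toString().
    static boolean isNumeric(CharSequence term) {
        return DIGITS.matcher(term).matches();
    }

    public static void main(String[] args) {
        SimpleTerm term = new SimpleTerm();
        term.append("35").append('0');
        System.out.println(term + " numeric? " + isNumeric(term)); // 350 numeric? true
        term.setEmpty().append("lucene");
        System.out.println(term + " numeric? " + isNumeric(term)); // lucene numeric? false
    }
}
```

Because the term object is itself a CharSequence, it can be handed to Pattern.matcher directly, which is the kind of usage the changelog entry refers to when it mentions regular expressions.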
The API documentation confirms that CharTermAttribute is the interface that replaces TermAttribute, so I wrote the following example to extract tokens from a TokenStream with it:
package com.segment;

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {

    // Tokenizes s with the given analyzer and returns the tokens separated by spaces.
    public static String show(Analyzer a, String s) throws Exception {
        // The first argument of tokenStream is a field name; "content" is an arbitrary placeholder.
        TokenStream ts = a.tokenStream("content", new StringReader(s));
        // Fetch the attribute once; every incrementToken() call updates it in place.
        CharTermAttribute ta = ts.addAttribute(CharTermAttribute.class);
        StringBuilder sb = new StringBuilder();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                sb.append(ta.toString()).append(' ');
            }
            ts.end();
        } finally {
            ts.close();
        }
        return sb.toString();
    }

    public String segment(String s) throws Exception {
        Analyzer a = new IKAnalyzer();
        return show(a, s);
    }

    public static void main(String[] args) {
        String name = "我是俊杰,我爱编程,我的測试用例";
        Segment s = new Segment();
        try {
            System.out.println(s.segment(name));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
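One detail of the TokenStream contract is easy to miss: the attribute obtained from a stream is the same object for the whole lifetime of that stream, and each incrementToken() call overwrites its contents. The sketch below models that contract with the standard library only, so it runs without Lucene on the classpath. MiniTokenStream and MiniTokenStreamDemo are hypothetical names invented for this illustration, and splitting on whitespace is an assumption that stands in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of the TokenStream contract; none of these names
// are Lucene classes. Tokens are produced by splitting on whitespace only.
final class MiniTokenStream {
    // The single reusable "attribute": overwritten on every incrementToken().
    private final StringBuilder term = new StringBuilder();
    private final String[] words;
    private int next = 0;

    MiniTokenStream(String text) {
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Always returns the same object, so callers can fetch it once.
    StringBuilder termAttribute() {
        return term;
    }

    // Advances to the next token; returns false when the input is exhausted.
    boolean incrementToken() {
        if (next >= words.length) {
            return false;
        }
        term.setLength(0);
        term.append(words[next++]);
        return true;
    }
}

final class MiniTokenStreamDemo {
    static List<String> collect(String text) {
        MiniTokenStream ts = new MiniTokenStream(text);
        StringBuilder term = ts.termAttribute(); // fetched once, before the loop
        List<String> out = new ArrayList<>();
        while (ts.incrementToken()) {
            // Copy the value: the buffer is overwritten on the next iteration.
            out.add(term.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("get tokens from a stream")); // [get, tokens, from, a, stream]
    }
}
```

The same reasoning explains why a term has to be copied (for example with toString()) before the next incrementToken() call if it is needed later.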
Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/109513.html Original link: https://javaforall.cn