
How to get a token from a TokenStream in Lucene 3.5.0

2022-07-05 11:27:00 Full stack programmer webmaster

While studying the documentation shipped with Lucene 3.5.0, I compared and analyzed the API changes between Lucene releases and eventually found one valuable entry in the change log:

LUCENE-2302: Deprecated TermAttribute and replaced by a new CharTermAttribute. The change is backwards compatible, so mixed new/old TokenStreams all work on the same char[] buffer independent of which interface they use. CharTermAttribute has shorter method names and implements CharSequence and Appendable. This allows usage like Java's StringBuilder in addition to direct char[] access. Also terms can directly be used in places where CharSequence is allowed (e.g. regular expressions). (Uwe Schindler, Robert Muir)

From this entry we can see that the older way of extracting a token, which relies on TermAttribute, is deprecated and should no longer be used:

StringReader reader = new StringReader(s);
TokenStream ts =analyzer.tokenStream(s, reader);
TermAttribute ta = ts.getAttribute(TermAttribute.class);
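
The change-log entry quoted above says that CharTermAttribute implements CharSequence and Appendable. To make that concrete without needing Lucene on the classpath, here is a minimal sketch built around a hypothetical stand-in class, TermBuffer. It is not Lucene's implementation, and the names TermBufferSketch, TermBuffer, setEmpty and isNumeric are assumptions made only for illustration. It shows the two things those interfaces buy you: StringBuilder-style chained appends, and passing the term straight to a regular expression.

```java
import java.util.regex.Pattern;

public class TermBufferSketch {

    /** Hypothetical stand-in for a term attribute; NOT Lucene's CharTermAttributeImpl. */
    public static final class TermBuffer implements CharSequence, Appendable {
        private final StringBuilder chars = new StringBuilder();

        /** Clears the buffer so it can be reused for the next term. */
        public TermBuffer setEmpty() {
            chars.setLength(0);
            return this;
        }

        // Appendable: returning TermBuffer allows chained calls, like StringBuilder.
        @Override
        public TermBuffer append(CharSequence csq) {
            chars.append(csq);
            return this;
        }

        @Override
        public TermBuffer append(CharSequence csq, int start, int end) {
            chars.append(csq, start, end);
            return this;
        }

        @Override
        public TermBuffer append(char c) {
            chars.append(c);
            return this;
        }

        // CharSequence: lets the term be used wherever a CharSequence is accepted.
        @Override
        public int length() {
            return chars.length();
        }

        @Override
        public char charAt(int index) {
            return chars.charAt(index);
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return chars.subSequence(start, end);
        }

        @Override
        public String toString() {
            return chars.toString();
        }
    }

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Matches the term against a regex directly, with no toString() copy first. */
    public static boolean isNumeric(CharSequence term) {
        return DIGITS.matcher(term).matches();
    }

    public static void main(String[] args) {
        TermBuffer term = new TermBuffer();
        term.append("token").append('s');
        System.out.println(term + " numeric? " + isNumeric(term));
        term.setEmpty().append("350");
        System.out.println(term + " numeric? " + isNumeric(term));
    }
}
```

In real Lucene code the attribute object returned by the TokenStream would play the role of TermBuffer here.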

The API documentation shows that CharTermAttribute is now the replacement for TermAttribute, so I wrote the following sample to better understand how to extract tokens from a TokenStream through the new interface:

package com.segment;

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class Segment {
	// Runs the analyzer over s and returns the tokens separated by spaces.
	public static String show(Analyzer a, String s) throws Exception {
		StringReader reader = new StringReader(s);
		// The first argument is the field name; this sample simply passes the text itself.
		TokenStream ts = a.tokenStream(s, reader);
		// CharTermAttribute replaces the deprecated TermAttribute (LUCENE-2302).
		// The stream reuses one attribute instance, so it is fetched once, before the loop.
		CharTermAttribute ta = ts.getAttribute(CharTermAttribute.class);
		StringBuilder sb = new StringBuilder();
		// incrementToken() is used here instead of the older Token t = ts.next().
		while (ts.incrementToken()) {
			sb.append(ta.toString()).append(' ');
		}
		ts.close();
		return sb.toString();
	}

	public String segment(String s) throws Exception {
		Analyzer a = new IKAnalyzer();
		return show(a, s);
	}

	public static void main(String args[]) {
		String name = " I'm Junjie , I love programming. , My test case ";
		Segment s = new Segment();
		String test = "";
		try {
			// Assumed body: the contents of this try block were lost from the source.
			test = s.segment(name);
			System.out.println(test);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
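
A TokenStream keeps a single attribute instance and overwrites its contents on every incrementToken() call, which is why the attribute only needs to be looked up once and why each term has to be copied out (for example with toString()) before advancing. The sketch below models that contract with hypothetical stand-in classes, ToyTermAttribute and ToyTokenStream. They are assumptions made for illustration, not Lucene classes, and the whitespace splitting is only a placeholder for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {

    /** Hypothetical stand-in for CharTermAttribute: one mutable buffer per stream. */
    public static final class ToyTermAttribute {
        private final StringBuilder buffer = new StringBuilder();

        void set(String term) {
            buffer.setLength(0);
            buffer.append(term);
        }

        @Override
        public String toString() {
            return buffer.toString();
        }
    }

    /** Hypothetical stand-in for TokenStream; splits on whitespace only. */
    public static final class ToyTokenStream {
        private final String[] words;
        private final ToyTermAttribute term = new ToyTermAttribute();
        private int next = 0;

        public ToyTokenStream(String text) {
            String trimmed = text.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        /** Always returns the same instance, mirroring the shared-attribute design. */
        public ToyTermAttribute getTermAttribute() {
            return term;
        }

        /** Advances to the next token by overwriting the shared attribute in place. */
        public boolean incrementToken() {
            if (next >= words.length) {
                return false;
            }
            term.set(words[next++]);
            return true;
        }
    }

    /** Consumes the stream: look the attribute up once, then copy each term out. */
    public static List<String> tokens(String text) {
        ToyTokenStream ts = new ToyTokenStream(text);
        ToyTermAttribute term = ts.getTermAttribute();
        List<String> out = new ArrayList<String>();
        while (ts.incrementToken()) {
            out.add(term.toString()); // copy, because the buffer is reused
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("how to get a token from a stream"));
    }
}
```

Holding on to the attribute object itself instead of copying its text would leave every saved reference pointing at the last token, since only one buffer exists.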


Publisher: Full stack programmer webmaster. When reprinting, please credit the source: https://javaforall.cn/109513.html Link to the original text: https://javaforall.cn

