当前位置:网站首页>Lucene introduces NFA
Lucene introduces NFA
2022-07-03 07:30:00 【chuanyangwang】
org.apache.lucene.util.automaton.MinimizationOperations#minimize
Determinate and minimize :
/**
* Minimizes (and determinizes if not already deterministic) the given
* automaton using Hopcroft's algorighm.
* @param maxDeterminizedStates maximum number of states determinizing the
* automaton can result in. Set higher to allow more complex queries and
* lower to prevent memory exhaustion.
*/
public static Automaton minimize(Automaton a, int maxDeterminizedStates) {
}org.apache.lucene.search.AutomatonQuery#getTermsEnum
take label by c At the end of state In a Dstate in
private int nextState(int charClass) {
initTransitions();
assert charClass < transitions.length;
if (transitions[charClass] == NOT_COMPUTED) {
assignTransition(charClass, findDState(step(points[charClass])));
// we could potentially update more than one char classes
if (minimalTransition != null) {
// to the left
int cls = charClass;
while (cls > 0 && points[--cls] >= minimalTransition.min) {
assert transitions[cls] == NOT_COMPUTED || transitions[cls] == transitions[charClass];
assignTransition(cls, transitions[charClass]);
}
// to the right
cls = charClass;
while (cls < points.length - 1 && points[++cls] <= minimalTransition.max) {
assert transitions[cls] == NOT_COMPUTED || transitions[cls] == transitions[charClass];
assignTransition(cls, transitions[charClass]);
}
minimalTransition = null;
}
}
return transitions[charClass];
}among
assignTransition(charClass, findDState(step(points[charClass])));
findDState Back to DFA Status number in
org.apache.lucene.util.automaton.NFARunAutomaton.DState#step
/**
* given a list of NFA states and a character c, compute the output list of NFA state which is
* wrapped as a DFA state
*/
private DState step(int c) {
statesSet.reset(); // TODO: fork IntHashSet from hppc instead?
int numTransitions;
int left = -1, right = alphabetSize;
for (int nfaState : nfaStates) {
numTransitions = automaton.initTransition(nfaState, stepTransition);
// TODO: binary search should be faster, since transitions are sorted
for (int i = 0; i < numTransitions; i++) {
automaton.getNextTransition(stepTransition);
if (stepTransition.min <= c && stepTransition.max >= c) {
statesSet.incr(stepTransition.dest);
left = Math.max(stepTransition.min, left);
right = Math.min(stepTransition.max, right);
}
if (stepTransition.max < c) {
left = Math.max(stepTransition.max + 1, left);
}
if (stepTransition.min > c) {
right = Math.min(stepTransition.min - 1, right);
// transitions in automaton are sorted
break;
}
}
}
if (statesSet.size() == 0) {
return null;
}
minimalTransition = new Transition();
minimalTransition.min = left;
minimalTransition.max = right;
return new DState(statesSet.getArray());
} /**
* return the ordinal of given DFA state, generate a new ordinal if the given DFA state is a new
* one
*/
private int findDState(DState dState) {
if (dState == null) {
return MISSING;
}
int ord = dStateToOrd.getOrDefault(dState, -1);
if (ord >= 0) {
return ord;
}
ord = dStateToOrd.size();
dStateToOrd.put(dState, ord);
assert ord >= dStates.length || dStates[ord] == null;
if (ord >= dStates.length) {
dStates = ArrayUtil.grow(dStates, ord + 1);
}
dStates[ord] = dState;
return ord;
}Reference documents :
Regular Expression Matching Can Be Simple And Fast
https://swtch.com/~rsc/regexp/regexp1.html
边栏推荐
- sharepoint 2007 versions
- 【CMake】CMake链接SQLite库
- Talk about floating
- 2. E-commerce tool cefsharp autojs MySQL Alibaba cloud react C RPA automated script, open source log
- Chrome 98 Private Network Access problem w/ disabled web security: Request had no target IP address
- 【已解决】win10找不到本地组策略编辑器解决方法
- [untitled]
- The embodiment of generics in inheritance and wildcards
- Introduction of buffer flow
- [plus de détails] dernière entrevue complète redis (50)
猜你喜欢
随机推荐
Longest common prefix and
圖像識別與檢測--筆記
SQL create temporary table
TreeMap
The babbage industrial policy forum
2021-07-18
gstreamer ffmpeg avdec解码数据流向分析
Raspberry pie update tool chain
HISAT2 - StringTie - DESeq2 pipeline 进行bulk RNA-seq
Homology policy / cross domain and cross domain solutions /web security attacks CSRF and XSS
Beginners use Minio
Le Seigneur des anneaux: l'anneau du pouvoir
Warehouse database fields_ Summary of SQL problems in kingbase8 migration of Jincang database
FileInputStream and fileoutputstream
TypeScript let與var的區別
Circuit, packet and message exchange
Some basic operations of reflection
La différence entre le let Typescript et le Var
【最詳細】最新最全Redis面試大全(50道)
Sent by mqtt client server of vertx
https://issues.apache.org/jira/browse/LUCENE-10010
https://www.omegaxyz.com/2019/01/25/rg2re/
https://www.bilibili.com/video/BV1zW411t7YE?from=search&seid=11305774085050474112&spm_id_from=333.337.0.0






