当前位置:网站首页>Lucene introduces NFA
Lucene introduces NFA
2022-07-03 07:30:00 【chuanyangwang】
org.apache.lucene.util.automaton.MinimizationOperations#minimize
Determinate and minimize :
/**
* Minimizes (and determinizes if not already deterministic) the given
* automaton using Hopcroft's algorighm.
* @param maxDeterminizedStates maximum number of states determinizing the
* automaton can result in. Set higher to allow more complex queries and
* lower to prevent memory exhaustion.
*/
public static Automaton minimize(Automaton a, int maxDeterminizedStates) {
}org.apache.lucene.search.AutomatonQuery#getTermsEnum
take label by c At the end of state In a Dstate in
private int nextState(int charClass) {
initTransitions();
assert charClass < transitions.length;
if (transitions[charClass] == NOT_COMPUTED) {
assignTransition(charClass, findDState(step(points[charClass])));
// we could potentially update more than one char classes
if (minimalTransition != null) {
// to the left
int cls = charClass;
while (cls > 0 && points[--cls] >= minimalTransition.min) {
assert transitions[cls] == NOT_COMPUTED || transitions[cls] == transitions[charClass];
assignTransition(cls, transitions[charClass]);
}
// to the right
cls = charClass;
while (cls < points.length - 1 && points[++cls] <= minimalTransition.max) {
assert transitions[cls] == NOT_COMPUTED || transitions[cls] == transitions[charClass];
assignTransition(cls, transitions[charClass]);
}
minimalTransition = null;
}
}
return transitions[charClass];
}among
assignTransition(charClass, findDState(step(points[charClass])));
findDState Back to DFA Status number in
org.apache.lucene.util.automaton.NFARunAutomaton.DState#step
/**
* given a list of NFA states and a character c, compute the output list of NFA state which is
* wrapped as a DFA state
*/
private DState step(int c) {
statesSet.reset(); // TODO: fork IntHashSet from hppc instead?
int numTransitions;
int left = -1, right = alphabetSize;
for (int nfaState : nfaStates) {
numTransitions = automaton.initTransition(nfaState, stepTransition);
// TODO: binary search should be faster, since transitions are sorted
for (int i = 0; i < numTransitions; i++) {
automaton.getNextTransition(stepTransition);
if (stepTransition.min <= c && stepTransition.max >= c) {
statesSet.incr(stepTransition.dest);
left = Math.max(stepTransition.min, left);
right = Math.min(stepTransition.max, right);
}
if (stepTransition.max < c) {
left = Math.max(stepTransition.max + 1, left);
}
if (stepTransition.min > c) {
right = Math.min(stepTransition.min - 1, right);
// transitions in automaton are sorted
break;
}
}
}
if (statesSet.size() == 0) {
return null;
}
minimalTransition = new Transition();
minimalTransition.min = left;
minimalTransition.max = right;
return new DState(statesSet.getArray());
} /**
* return the ordinal of given DFA state, generate a new ordinal if the given DFA state is a new
* one
*/
private int findDState(DState dState) {
if (dState == null) {
return MISSING;
}
int ord = dStateToOrd.getOrDefault(dState, -1);
if (ord >= 0) {
return ord;
}
ord = dStateToOrd.size();
dStateToOrd.put(dState, ord);
assert ord >= dStates.length || dStates[ord] == null;
if (ord >= dStates.length) {
dStates = ArrayUtil.grow(dStates, ord + 1);
}
dStates[ord] = dState;
return ord;
}Reference documents :
Regular Expression Matching Can Be Simple And Fast
https://swtch.com/~rsc/regexp/regexp1.html
边栏推荐
- VMware network mode - bridge, host only, NAT network
- Margin left: -100% understanding in the Grail layout
- 【开发笔记】基于机智云4G转接板GC211的设备上云APP控制
- [solved] sqlexception: invalid value for getint() - 'Tian Peng‘
- C代码生产YUV420 planar格式文件
- Segment read
- Longest common prefix and
- Introduction of buffer flow
- [set theory] order relation (partial order relation | partial order set | example of partial order set)
- [untitled]
猜你喜欢

Comparison of advantages and disadvantages between most complete SQL and NoSQL

Leetcode 213: 打家劫舍 II

Circuit, packet and message exchange

Leetcode 213: looting II

Take you through the whole process and comprehensively understand the software accidents that belong to testing

Common architectures of IO streams

Basic knowledge about SQL database

Inverted chain disk storage in Lucene (pfordelta)

《指環王:力量之戒》新劇照 力量之戒鑄造者亮相

Le Seigneur des anneaux: l'anneau du pouvoir
随机推荐
Raspberry pie update tool chain
IO stream system and FileReader, filewriter
Chapter VI - Containers
[coppeliasim4.3] C calls UR5 in the remoteapi control scenario
c语言指针的概念
HCIA notes
docket
The embodiment of generics in inheritance and wildcards
High concurrency memory pool
Win 2008 R2 crashed at the final installation stage
La différence entre le let Typescript et le Var
Docker builds MySQL: the specified path of version 5.7 cannot be mounted.
Hash table, generic
Homology policy / cross domain and cross domain solutions /web security attacks CSRF and XSS
VMWare网络模式-桥接,Host-Only,NAT网络
[most detailed] latest and complete redis interview book (50)
2021-07-18
PgSQL converts string to double type (to_number())
Web router of vertx
sharepoint 2007 versions
https://issues.apache.org/jira/browse/LUCENE-10010
https://www.omegaxyz.com/2019/01/25/rg2re/
https://www.bilibili.com/video/BV1zW411t7YE?from=search&seid=11305774085050474112&spm_id_from=333.337.0.0