当前位置:网站首页>AC automata
AC automata
2022-07-24 22:09:00 【Harrison who likes to knock code】
AC Problems solved by automata
Play on the prefix tree KMP,
Suppose a string array contains several sensitive words , Given a big article ,AC Automata is to find out which sensitive words are included in a big article
AC Automata is one of the most basic things in many similar matching algorithms
step
First, build some sensitive words into a prefix tree ,( Prefix tree ,,)
The organization between sensitive words is a prefix tree , But we need to upgrade the prefix tree

need Add a line on each node fail The pointer (fail The pointer is indicated by a dotted line ), The first node fail The pointer points to null , And the subordinate direct nodes of the head node are artificially specified fail The pointers all point to the header node ,

Then the bottom fail How to set the pointer ?
First, finish building the prefix tree , Set up fail The pointer , Set the... Of each node according to the order of width priority fail The pointer ,
The following figure ,X The parent node of a node has a path to its own b, however X Of the parent node of fail Pointer refers to the head node , The head node has a trend a node 、b node 、c Path of node , That is to say, the head node does not go b Direction of the road ; So the head node follows fail The pointer goes down , Came to null,null Of course, the node does not go b Direction of the road , therefore x Of fail The pointer points directly to the header node

Then set the following figure X Nodal fail The pointer , Same as above , The path from the parent node to its own direction is d, But from the parent node fail The pointer cannot find the direction d Direction of the road , therefore X Nodal fail The pointer points directly to the header node

Then set the following figure X Nodal fail The pointer ( Set in width first order ),X My father goes to his own way c, Paternal fail Pointer to head node , But the head node has a trend c Direction of the road , therefore X Of fail The pointer points to this point and goes to c Node of direction

fail Pointer setting rules
The first 3) This situation is more complicated ,



AC automata fail The essence of pointer
If the match fails , In the remaining string , Which string prefix matches the longest suffix that must contain the current character ,




package com.harrison.class22;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
/** * @author Harrison * @create 2022-04-02-10:21 * @motto looking for him for thousand times in the crowd , Suddenly look back , The man was in the dim light . */
public class Code03_ACAutomaton {
// The node of the prefix tree
public static class Node{
// If one Node,end It's empty , Description is not the end
// If end Not empty , Indicates that this point is the end of a string ,end The value of is this string
public String end;
// Only on the top end When the variable is not empty ,endUse It makes sense
// Express , This string has no answer before , Prevent repeated collection of answers
public boolean endUse;
public Node fail;
// The path below the prefix tree , Hypothetical article , All sensitive words are lowercase , You can fix it 26 length
public Node[] nexts;
public Node(){
endUse=false;
end=null;
fail=null;
nexts=new Node[26];
}
}
public static class ACAutomaton{
private Node root;
// about AC For automata, characters must be on the road , Not on the node , Just like prefix tree
public ACAutomaton(){
root=new Node();
}
// The process of building prefix tree , Call as many sensitive words as you have insert Method
// After the prefix tree is built , Call again build Method to set all fail The pointer
public void insert(String s){
char[] str=s.toCharArray();
Node cur=root;
int index=0;
for(int i=0; i<str.length; i++){
index=str[i]-'a';
if(cur.nexts[index]==null){
cur.nexts[index]=new Node();
}
cur=cur.nexts[index];
}
cur.end=s;
}
// Be careful : Not set the pop-up node itself fail The pointer
// Instead, pop up the node and set all its children fail The pointer
public void build(){
Queue<Node> queue=new LinkedList<>();
queue.add(root);
Node cur=null;
Node cfail=null;
while(!queue.isEmpty()){
// The current node pops up , All descendants of the current node join the queue , The current node gives its children settings fail The pointer
// Some father cur
cur=queue.poll();
for(int i=0; i<26; i++){
// All the way
// cur-> father i No. 1 son , Must take i Son No fail The pointer is set
if(cur.nexts[i]!=null){
// If there is i No. 1 son
cur.nexts[i].fail=root;// Set it to root, If found, set it to someone else
cfail=cur.fail;
while (cfail!=null){
if(cfail.nexts[i]!=null){
cur.nexts[i].fail=cfail.nexts[i];
break;
}
cfail=cfail.fail;
}
queue.add(cur.nexts[i]);
}
}
}
}
// Big article
public List<String> containWords(String content){
char[] str=content.toCharArray();
Node cur=root;
Node follow=null;
int index=0;
List<String> ans=new ArrayList<>();
for(int i=0; i<str.length; i++){
index=str[i]-'a';// road
// If the current character is not matched on this road , Just follow fail Direction to the next path
while(cur.nexts[index]==null && cur!=root){
cur=cur.fail;
}
// 1) Now come to the path , Can continue to match
// 2) Now come to the node , Is the root node of the prefix tree
cur=cur.nexts[index]!=null?cur.nexts[index]:root;
follow=cur;
while(follow!=root){
if(follow.endUse){
break;
}
// Different needs , Revise between this paragraph
if(follow.end!=null){
ans.add(follow.end);
follow.endUse=true;
}
// Different needs , Revise between this paragraph
follow=follow.fail;
}
}
return ans;
}
}
public static void main(String[] args) {
ACAutomaton ac = new ACAutomaton();
ac.insert("dhe");
ac.insert("he");
ac.insert("abcdheks");
// Set up fail The pointer
ac.build();
List<String> contains = ac.containWords("abcdhekskdjfafhasldkflskdjhwqaeruv");
for (String word : contains) {
System.out.println(word);
}
}
}
边栏推荐
- 《论文复现》BiDAF代码实现过程(4)模型训练+验证
- Wechat applet monitoring real-time geographical location change event interface application
- CAD text styles
- Homework of the 20th week
- PCL点云处理之ply文件读取与保存(五十四)
- Apipost signs up with Chinatelecom! Work together to accelerate the digital transformation of enterprises
- Circom 2.0: A Scalable Circuit Compiler
- Go+语言
- Function default parameter pit avoidance Guide
- "Iruntime": undeclared identifier
猜你喜欢

From A76 to A78 -- learning arm microarchitecture in change

Helm —— 强大的 Kubernetes 应用的包管理工具

Day10: declarative transaction control

通讯异常判断之通讯心跳信号应用编程

【南瓜书ML】(task4)神经网络中的数学推导

Gradle learning - gradle advanced instructions

Circom 2.0: A Scalable Circuit Compiler

Calling Laser Galvanometer control in the application and development tutorial of motion control card

Composability and Recursion in snarkyJS

PCL点云处理之平面规则化(五十五)
随机推荐
【考研词汇训练营】Day 12 —— native,separate,figure,contribute,species,assumption,suppose
Win10 解base64
PCL点云处理之平面规则化(五十五)
Makefile basics -- extensions
PCL点云处理之找到直线点集的两个端点(五十七)
Build Tencent cloud website server at low cost (build your own website server)
一种自动化九点标定工具原理(包涵部分源码)
PCL点云处理之随机采样抽稀RandomSample(六十)
Esp32485 air temperature and humidity test
Homework of the 20th week
RISC0:Towards a Unified Compilation Framework for Zero Knowledge
2022 Niuke multi school 7.23
基于深度学习的多任务人脸属性分析(基于飞桨PaddlePaddle)
[postgraduate entrance examination vocabulary training camp] day 12 - native, separate, figure, contribution, categories, assessment, propose
企业运营自媒体不能“自嗨”:内容要接地气不能接广告
Everything about database, database and table is here
QT学习之VS创建QT项目显示未将对象引用设置到对象的实例
微机原理:CPU架构详解
My love lesson 2 - remove webpage pop-up
Database - Metadata databasemetadata beginner