当前位置:网站首页>AC automata
AC automata
2022-06-29 07:59:00 【Today I will also write bugs】
List of articles
Why AC automata
For a long article or string str, There are several sensitive words .
How to find out str Sensitive words in ?
The feasible solution is to traverse str, Then it matches all sensitive words at each position , If both matches fail , Go back to the next position of the first matching position , In turn . Until when str End of traversal .
But this method wastes too much time , If the total length of sensitive words is M,str Is the length of the N, It is obvious that the time complexity has come O(MN).
AC Automata is used to solve this problem efficiently .
AC The concept of automata
AC The basis of automata is Trie Trees ( Prefix tree ). and Trie The difference between trees is , Each node in the tree has a pointer to the child , One more fail The pointer .
- Every node has fail The pointer , And the root node fail The pointer points to null
- The children of the first layer below the root node fail The pointers all point to the root node
- In other locations fail The pointer first looks at the parent node fail Does the pointer have a path to its own? If so, let the current position of fail The pointer points to the father fail The node to which the pointer points , If you don't have this road, go through fail The pointer continues to go up , If it's empty , Then let fail The pointer points to the root node
- fail The meaning of the pointer : If the path that must be formed at the end of the current character is str, Which string prefix is left and str The suffix , Have the maximum matching length .fail The pointer points to the node position corresponding to that string .

How do we collect the matched sensitive words ?
A flag bit must be placed at the end of each path to indicate that the node has come to the end , As long as you get to the end, it means that the match is successful . And there will be one bool A variable of type indicates that the sensitive word has been found , Don't look for this sensitive word next time .
Every time you come to a position, follow fail The pointer goes around , See if the position has come to an end , If not, continue to match .
Or the example above :
How to set up fail The pointer
Use width first traversal , Pop up a node cur when , look for cur Next level node of next, according to cur Of fail Whether the next position of the node pointed to is next, modify next Of fail Where the pointer points , If not, it means root.
Code
#include<vector>
#include<queue>
#include<string>
#include<iostream>
using namespace std;
struct Node {
// Has the answer been added to the string of table expression before
// If you have already joined , You don't need to add an answer next time you get a match
bool endUse;
Node* fail;
vector<Node*>nexts;
// If end If it is not empty, it means the end of a string
string end;
Node()
:endUse(false)
, fail(nullptr)
, nexts(26)
, end("")
{
}
};
class ACAutomation {
public:
ACAutomation()
{
// Initializing a root node
_root = new Node;
}
// Same as prefix tree operation , Build prefix tree only
void insert(const string& str)
{
Node* cur = _root;
int index = 0;
for (int i = 0; i < str.size(); i++) {
index = str[i] - 'a';
if (cur->nexts[index] == nullptr)
{
cur->nexts[index] = new Node;
}
cur = cur->nexts[index];
}
cur->end = str;
}
// Connect fail The pointer
void build()
{
// Breadth first traversal
queue<Node*>q;
q.push(_root);
Node* cur = nullptr;
Node* cfail = nullptr;
while (!q.empty())
{
// The father goes out in line to set up the children's fail The pointer
cur = q.front();
q.pop();
for (int i = 0; i < 26; i++)
{
// Set up cur->nexts[i] Of fail The pointer
if (cur->nexts[i]!=nullptr)
{
cur->nexts[i]->fail = _root;// Point to the root node by default , If you find it later, change it again
// look for cur Node fail Where the pointer points
cfail = cur->fail;
// seek cfail The position to which the pointer is pointing
while (cfail)
{
// eureka , And change the location
if (cfail->nexts[i])
{
cur->nexts[i]->fail = cfail->nexts[i];
break;
}
// without , Just go up one floor fail go , Walk until you are empty , It indicates that you have reached the root node
cfail = cfail->fail;
}
}
q.push(cur->nexts[i]);
}
}
}
// Give a long string content, Return all sensitive words to
vector<string>containWords(const string& content)
{
Node* cur = _root;
Node* follow = nullptr;
vector<string>ans;
int index = 0;
for (int i = 0; i < content.size(); i++)
{
index = content[i] - 'a';
// The current character is not matched on this road , Just follow fail The pointer finds another path to match
while (cur->nexts[index] == nullptr && cur != _root)
{
cur = cur->fail;
}
// Now the path can be matched
// Now, the node is the root node of the prefix tree
cur = cur->nexts[index] != nullptr ? cur->nexts[index] : _root;
// Go to any node and follow fail The pointer goes around
follow = cur;
while (follow != _root)
{
// It has been used to prevent repeated collection
if (follow->endUse)
{
break;
}
// eureka
if (follow->end != "")
{
ans.push_back(follow->end);
follow->endUse = true;
}
follow = follow->fail;
}
}
return ans;
}
private:
Node* _root;
};

边栏推荐
猜你喜欢

How to permanently set Mysql to utf8 encoding?
![[industrial control old horse] detailed design of PLC six way responder system](/img/9c/8bfe336bb95a028a4fb8130941ff81.png)
[industrial control old horse] detailed design of PLC six way responder system

Kyushu cloud helps Inner Mongolia's "counting from the east to the west" project to drive the smart new ecology of the surveying and mapping industry

关于SqlSugar的多对多的级联插入的问题(无法获取集合属性的id,导致无法维护中间表)

电检码配置

Appium 环境搭建

呕心沥血总结出来的MySQL常见错误以及解决方法(二)

【深度之眼吴恩达第四期作业班】多元线性回归Linear Regression with multiple variables总结

Protobuf binary file learning and parsing

SizeBalanceTree
随机推荐
【量化投资系统】因子处理安装talib
C actual combat - high configuration version of Snake game design
postman预处理/前置条件Pre-request
时间操作 - 时间格式转换
【深度之眼吴恩达机器学习作业班第四期】Regularization正则化总结
Detailed explanation of top and free commands
多态中的向上和向下转型
華為雲的AI深潜之旅
ROS2中的行为树 BehaviorTree
js异或混淆代码
MySQL系统关键字汇总(官网)
SAP ui5 Beginner (I) Introduction
Some examples.
Django - installing mysqlclient error: mysqlclient 1.4.0 or newer is required; you have 0.9.3
【深度之眼吴恩达机器学习作业班第四期】逻辑回归编程实现
Appium environment setup
低配MySQL数据库几十秒插入百万数据
Gateway controller communication protocol
Basic syntax - bit operation
Behaviortree in ros2