当前位置:网站首页>Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
2022-07-29 07:16:00 【Morgannr】
SAM
Suffix automata can store all substrings of a string .
One 、 Concept
Here is a character string "aababa"
Of Postfix automaton .
In the picture above Black edge call Move the side , Green edge call Link edge .
The path taken along the transfer edge from the root node corresponds to a substring .
The root node represents an empty string , Other nodes Express Similar substring Of aggregate .( Similar substring Refer to Substring with the same end position )
- explain : Above picture , Green edges and nodes Make a tree ( Then explain ), Black edges and nodes Form a directed acyclic graph , With
6
Number point For example , share3
Paths Sure arrive6
Number point , namely6
The number represents3
A different substring ( Express A collection of substrings ) - for example The nodes in the above figure represent collections :
② ={ a }
③ ={ aa }
④ ={ aab }
⑤ ={ aaba }
⑥ ={ aabab,abab,bab }
⑦ ={ ab,b }
⑧ ={ aababa,ababa,baba}
⑨ ={ aba,ba }
- Find out : Same set Medium All strings , The last letter is the same , And the end position is the same .(“ Set of similar substrings ” meaning ), As for the ⑦ aggregate Come on ,
endpos("ab") = endpos("b") = { 3,5 }
Two 、 Building suffix automata
Building suffix automata The process of is a Dynamic process , That is to say Insert nodes one by one , Two kinds of edges ( Move the side , Link edge ) Alternate construction . The process should maintain 3
An array :
ch[x][c]
: save nodex
Of Transfer the end of the edge . example :ch[1][b] = 7, ch[2][b] = 7
( From the above ①、② node Alongb
This side Can walk to ⑦ This End )fa[x]
: save nodex
Of End of link edge . example :fa[7] = 1,fa[6] = 7
( From the above ⑦ node Along the green edge Can walk to ① This End )len[x]
: save nodex
Of The length of the longest string . example :len[6] = 5,len[7] = 2
(⑥ In the string represented by the dot The length of the longest string is5
)
3、 ... and 、 Build suffix link tree
Purpose of constructing suffix automata It's not matching on the graph , It's to extract from the picture Green link edge , Build a tree . Take a closer look at the picture above , We found that , Each node has only one green link edge pointing to its parent node , So we can construct a tree as shown in the figure below , We call it “ Postfix link tree ”:
What does this tree have characteristic ?
- Tree node Express Similar substring And its End position Of aggregate .( We completely transferred the information on the map to this tree )
Its Legitimacy : Child node Of Shortest string Of Longest suffix = Parent node Of Longest string ,
- Like in a tree ⑥ Node is no. The shortest string is
“bab”
, it The longest suffix excluding itself is“ab”
, Exactly equal to Its parent node ⑦ Node node Longest string“ab”
. Again , about ⑦ node , Its The shortest string is“b”
, it The longest suffix excluding itself is empty , Exactly equal to The root node ① The longest string of empty .
just so so Find out , from The leaf node goes up to the root node In the process of , Length of string yes Decreasing , Besides empty outside , character string The last letter is the same ,
Since these substrings are stored in the tree Nature and Law , Then we can use them to Solve some problems :
- Length of substring of node : The longest
len[6] = 5
, The shortestlen[6] - len[7] = 3
( about The shortest string length , We can do it in During traversal , find Current node And its Parent node Longest string length , The difference between the two is the shortest string length of the current node ) - Number of substrings of node :
len[6] - len[7] = 3,len[7] - len[1] = 2
( It is consistent with the method of finding the length of the shortest string of the current node , Make a difference between the longest string length of the current node and its parent node ) - Number of substrings :
cnt[4] = 1,cnt[6] = 1,cnt[7] = 2
( In the tree painted above , Next to the node Blue numbers Represents the The node represents the position where the string appears , Such as ⑥ Node is no. , All strings it represents are Only in5
The position appears once ,④ So it is with , and ④、⑥ Parent of node ⑦ Corresponding“ab”
、“b”
Two strings meanwhile stay3、5
Position appears , That is, the corresponding appeared two , And we found that ⑦ The corresponding number of times Exactly His two sons ④、⑥ Sum of times )
Four 、 The specific steps of constructing suffix automata (extend
function )
All steps must Meet the above “ Legitimacy ”, This is convenient Extract the link tree and make statistics .
We use Variable tot
Number nodes , The root node by 1
Number point , The pointer np
Always Point to the last created node , It's also Start at the root node ( The root node Default , No need to create )
int tot = 1, np = 1;
The next step is 3
An array ,
//fa Link edge end ,ch Transfer edge end ,len The longest string length
int fa[N], ch[N][26], len[N];
Pass in extend
function Medium Parameters It's a Offset c
(0 ~ 25:a ~ z
, Think What is passed in is a character )
The key to building is 4
A pointer to the :
p
: Dynamic bounce pointer ,np
: Fixed pointing New dot pointer ,q
: Fixed pointing Link point ,nq
: Point to New link point .
First, structure “ Prelude ”: if ch[p][c]
non-existent (p
Point to the old point Along character c
There is no end you can reach ), Just from p
( Old point ) towards np
( New points ) build Move the side .
int p = np; np = ++tot; //p Point to the old point ,np It's a new point
len[np] = len[p] + 1; cnt[np] = 1; // Number of substrings
//p Jump back along the link edge , Transfer the edge from the old point to the new point
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
If sign out for
when p = 0
, explain c
It's a new character , from New points towards The root node Build link edge
if(!p) fa[np] = 1; //1 The number is the root node
If sign out for
when p > 0
, explain ch[p][c]
There is (p
Point to the old point Along character c
There is a destination that can be reached ), Make q = ch[p][c]
- (1) if
p
Andq
Distance of= 1
, legal , be fromnp
towardsq
Build link edge . - (2) if
p
Andq
Distance of> 1
, illegal , be Splitq
, fromnp
towardsnq
Build link edge .
else {
// If c It's an old character
int q = ch[p][c]; //q Is the link point
//2. If link point q legal , From a new point to q Build link edge
if (len[q] == len[p] + 1) fa[np] = q;
//3. If link point q illegal , Then crack q spot ﹐ Rebuild two kinds of edges
else {
int nq = ++tot; //nq It's a new link , It's a legal father found for Xindian
len[nq] = len[p] + 1;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
Look at the During construction Changes in variables :
For the top Insert 4
Characters in the process , Link point q
It's all legal ,
And for the bottom Insert first 5
Characters b
when ,p = 5
,np = 6
, There is an illegal situation ,
Let's first count ② ~ ⑥ The string set represented by node No :
We will focus on ④ and ⑥ node , For new points ⑥, We should look for The longest suffix of the shortest string in its set ("ab"
), obviously Only ④ The node contains , however ④ node Illegal , because It contains substrings "ab"
Not the longest , therefore Does not satisfy legitimacy ( Unable to construct suffix Link tree ),
What shall I do? ? At this time, we are going to take ④ The illegal link point on No. 2 is broken ,
That is, from this situation
Turn into :
It is not difficult to find that at this time Added a legal link point ⑦, Next, we need to do some Edge building operation ( Three steps , It can be understood in contrast to the code )
(1) change fa
Link edge , Three steps :
- take New link point
nq = 7
Point toq = 4
Parent node :fa[7] = 1;
- take
q = 4
node and Newly splitnq = 7
node link :fa[4] = 7;
- go back to The purpose at the beginning , Find a new one Legal link points :
fa[6] = 7;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
(2)for
Cycle change ch
Move the side : because ④ Node No. now only represents character string "aab"
了 , therefore Originally from ② Connect to ④ And from the ① Connect to ④ The side of must change its direction , The specific direction is New nodes ⑦. Corresponding to the above picture :ch[2][b] = 7, ch[1][b] = 7;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
(3) establish The transfer edge connected from the new link point ( In and out ), copying :cpy:ch[7][a] = 5;
memcpy(ch[nq], ch[q], sizeof ch[q]);
Follow the Model It is convenient to understand and memorize code :
below Illegal process It can be simulated on paper :
Be careful :
Can prove that , The length is
n
String , At most2n - 1
Points and3n - 4
side . for exampleabb…bb
.The complexity of time and space is
O(n)
When building a link tree , Both points and edges are open
2n
Spacenode
7
and9
In order to Meet the legitimacy of substring set links And created , It does not affect the number of occurrences of the substring , thereforecnt[7,9] = 0
SAM
a key :
Code board :(extend
function )
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
Example 【 Templates 】 Postfix automaton (SAM)
Title Description
Given a string containing only lowercase letters S S S.
Please find out S S S All occurrences of are not 1 1 1 Multiply the number of occurrences of the substring by the maximum length of the substring .
Input format
A string containing only lowercase letters on a line S S S.
Output format
An integer , For the answer .
Examples #1
The sample input #1
abab
Sample output #1
4
Tips
about 10 % 10 \% 10% The data of , ∣ S ∣ ≤ 1000 \lvert S \rvert \le 1000 ∣S∣≤1000.
about 100 % 100\% 100% The data of , 1 ≤ ∣ S ∣ ≤ 10 6 1 \le \lvert S \rvert \le {10}^6 1≤∣S∣≤106.
Ideas :
Building suffix automata , Then use depth first traversal to make statistics on the suffix link tree .
Code :( The board )
#include <bits/stdc++.h>
using namespace std;
//#define map unordered_map
//#define int long long
const int N = 1e6 + 10;
typedef long long ll;
char s[N];
int tot = 1, np = 1;
int fa[N << 1], ch[N << 1][26], len[N << 1];
ll cnt[N << 1], ans;
vector<int> g[N << 1];
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
void dfs(int u) {
for (auto v : g[u]) {
dfs(v);
cnt[u] += cnt[v];
}
if (cnt[u] > 1) ans = max(ans, cnt[u] * len[u]);
}
signed main()
{
scanf("%s", s);
for (int i = 0; s[i]; ++i) {
extend(s[i] - 'a');
}
for (int i = 2; i <= tot; ++i) {
g[fa[i]].push_back(i);
}
dfs(1);
cout << ans << '\n';
return 0;
}
边栏推荐
- Unity sends a post request to the golang server for parsing and returning
- Hj37 statistics of the total number of rabbits per month Fibonacci series
- VMware16创建虚拟机:无法创建新虚拟机,不具备执行此操作的权限
- [redis] redis development specifications and precautions
- 图像加噪声与矩阵求逆
- 【OpenGL】着色器(Shader)的使用
- Flink实时仓库-DWD层(kafka-关联mysql的lookup join)模板代码
- gin 中间件
- Excel file reading and writing (creation and parsing)
- Redis基础篇
猜你喜欢
Flink real time warehouse DWD layer (traffic domain) template code
Nodejs installation tutorial
After three years of outsourcing, the salary of automatic testing after job hopping is twice that of the original. The secret is
Ansible中的变量及加密
Summary of OCR optical character recognition methods
Error 1045 (28000) access denied for user 'root' @ 'localhost' solution
Some tips of vim text editor
[Charles' daily problems] when you open Charles, you can't use nails
Student status management system based on C language design
Cvpr2021 | multi view stereo matching based on self supervised learning (cvpr2021)
随机推荐
vue-router路由缓存
论文阅读 (62):Pointer Networks
NPM install reports an error NPM err could not resolve dependency NPM err peer
图像加噪声与矩阵求逆
VMware16安装虚拟机遇到的问题
MySQL 高级(进阶) SQL 语句 (一)
Vscode remote debugging PHP solution through remotessh and Xdebug
Kubernetes (五) ---------部署 Kubernetes Dashboard
Vmware16 create virtual machine: cannot create a new virtual machine, do not have permission to perform this operation
数组的子集不能累加出的最小正数
ERROR 1045 (28000) Access denied for user ‘root‘@‘localhost‘解决方法
Why does ETL often become ELT or even let?
MVFuseNet:Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of
记 - 踩坑-实时数仓开发 - doris/pg/flink
2022-07-28:以下go语言代码输出什么?A:AA;B:AB;C:BA;D:BB。 package main import ( “fmt“ ) func main() { f
gin 路由,参数,输出
It's enough for MySQL to have this article (disgusting and crazy typing 37k words, just for Bo Jun's praise!!!)
WPF简单登录页面的完成案例
同步/异步、阻塞/非阻塞 与 IO
Flink实时仓库-DWD层(流量域)模板代码