当前位置:网站首页>Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
2022-07-29 07:16:00 【Morgannr】
SAM
Suffix automata can store all substrings of a string .
One 、 Concept
Here is a character string "aababa" Of Postfix automaton .

In the picture above Black edge call Move the side , Green edge call Link edge .
The path taken along the transfer edge from the root node corresponds to a substring .
The root node represents an empty string , Other nodes Express Similar substring Of aggregate .( Similar substring Refer to Substring with the same end position )
- explain : Above picture , Green edges and nodes Make a tree ( Then explain ), Black edges and nodes Form a directed acyclic graph , With
6Number point For example , share3Paths Sure arrive6Number point , namely6The number represents3A different substring ( Express A collection of substrings ) - for example The nodes in the above figure represent collections :
② ={ a }③ ={ aa }④ ={ aab }⑤ ={ aaba }
⑥ ={ aabab,abab,bab }⑦ ={ ab,b }
⑧ ={ aababa,ababa,baba}
⑨ ={ aba,ba } - Find out : Same set Medium All strings , The last letter is the same , And the end position is the same .(“ Set of similar substrings ” meaning ), As for the ⑦ aggregate Come on ,
endpos("ab") = endpos("b") = { 3,5 }
Two 、 Building suffix automata
Building suffix automata The process of is a Dynamic process , That is to say Insert nodes one by one , Two kinds of edges ( Move the side , Link edge ) Alternate construction . The process should maintain 3 An array :
ch[x][c]: save nodexOf Transfer the end of the edge . example :ch[1][b] = 7, ch[2][b] = 7( From the above ①、② node AlongbThis side Can walk to ⑦ This End )fa[x]: save nodexOf End of link edge . example :fa[7] = 1,fa[6] = 7( From the above ⑦ node Along the green edge Can walk to ① This End )len[x]: save nodexOf The length of the longest string . example :len[6] = 5,len[7] = 2(⑥ In the string represented by the dot The length of the longest string is5)
3、 ... and 、 Build suffix link tree
Purpose of constructing suffix automata It's not matching on the graph , It's to extract from the picture Green link edge , Build a tree . Take a closer look at the picture above , We found that , Each node has only one green link edge pointing to its parent node , So we can construct a tree as shown in the figure below , We call it “ Postfix link tree ”:
What does this tree have characteristic ?
- Tree node Express Similar substring And its End position Of aggregate .( We completely transferred the information on the map to this tree )
Its Legitimacy : Child node Of Shortest string Of Longest suffix = Parent node Of Longest string ,
- Like in a tree ⑥ Node is no. The shortest string is
“bab”, it The longest suffix excluding itself is“ab”, Exactly equal to Its parent node ⑦ Node node Longest string“ab”. Again , about ⑦ node , Its The shortest string is“b”, it The longest suffix excluding itself is empty , Exactly equal to The root node ① The longest string of empty .
just so so Find out , from The leaf node goes up to the root node In the process of , Length of string yes Decreasing , Besides empty outside , character string The last letter is the same ,
Since these substrings are stored in the tree Nature and Law , Then we can use them to Solve some problems :
- Length of substring of node : The longest
len[6] = 5, The shortestlen[6] - len[7] = 3
( about The shortest string length , We can do it in During traversal , find Current node And its Parent node Longest string length , The difference between the two is the shortest string length of the current node ) - Number of substrings of node :
len[6] - len[7] = 3,len[7] - len[1] = 2
( It is consistent with the method of finding the length of the shortest string of the current node , Make a difference between the longest string length of the current node and its parent node ) - Number of substrings :
cnt[4] = 1,cnt[6] = 1,cnt[7] = 2
( In the tree painted above , Next to the node Blue numbers Represents the The node represents the position where the string appears , Such as ⑥ Node is no. , All strings it represents are Only in5The position appears once ,④ So it is with , and ④、⑥ Parent of node ⑦ Corresponding“ab”、“b”Two strings meanwhile stay3、5Position appears , That is, the corresponding appeared two , And we found that ⑦ The corresponding number of times Exactly His two sons ④、⑥ Sum of times )
Four 、 The specific steps of constructing suffix automata (extend function )
All steps must Meet the above “ Legitimacy ”, This is convenient Extract the link tree and make statistics .
We use Variable tot Number nodes , The root node by 1 Number point , The pointer np Always Point to the last created node , It's also Start at the root node ( The root node Default , No need to create )
int tot = 1, np = 1;
The next step is 3 An array ,
//fa Link edge end ,ch Transfer edge end ,len The longest string length
int fa[N], ch[N][26], len[N];
Pass in extend function Medium Parameters It's a Offset c(0 ~ 25:a ~ z, Think What is passed in is a character )
The key to building is 4 A pointer to the :
p: Dynamic bounce pointer ,np: Fixed pointing New dot pointer ,q: Fixed pointing Link point ,nq: Point to New link point .
First, structure “ Prelude ”: if ch[p][c] non-existent (p Point to the old point Along character c There is no end you can reach ), Just from p ( Old point ) towards np ( New points ) build Move the side .
int p = np; np = ++tot; //p Point to the old point ,np It's a new point
len[np] = len[p] + 1; cnt[np] = 1; // Number of substrings
//p Jump back along the link edge , Transfer the edge from the old point to the new point
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
If sign out for when p = 0, explain c It's a new character , from New points towards The root node Build link edge
if(!p) fa[np] = 1; //1 The number is the root node
If sign out for when p > 0, explain ch[p][c] There is (p Point to the old point Along character c There is a destination that can be reached ), Make q = ch[p][c]
- (1) if
pAndqDistance of= 1, legal , be fromnptowardsqBuild link edge . - (2) if
pAndqDistance of> 1, illegal , be Splitq, fromnptowardsnqBuild link edge .
else {
// If c It's an old character
int q = ch[p][c]; //q Is the link point
//2. If link point q legal , From a new point to q Build link edge
if (len[q] == len[p] + 1) fa[np] = q;
//3. If link point q illegal , Then crack q spot ﹐ Rebuild two kinds of edges
else {
int nq = ++tot; //nq It's a new link , It's a legal father found for Xindian
len[nq] = len[p] + 1;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
Look at the During construction Changes in variables :
For the top Insert 4 Characters in the process , Link point q It's all legal ,
And for the bottom Insert first 5 Characters b when ,p = 5,np = 6, There is an illegal situation ,
Let's first count ② ~ ⑥ The string set represented by node No :
We will focus on ④ and ⑥ node , For new points ⑥, We should look for The longest suffix of the shortest string in its set ("ab"), obviously Only ④ The node contains , however ④ node Illegal , because It contains substrings "ab" Not the longest , therefore Does not satisfy legitimacy ( Unable to construct suffix Link tree ),
What shall I do? ? At this time, we are going to take ④ The illegal link point on No. 2 is broken ,
That is, from this situation 
Turn into :
It is not difficult to find that at this time Added a legal link point ⑦, Next, we need to do some Edge building operation ( Three steps , It can be understood in contrast to the code )

(1) change fa Link edge , Three steps :
- take New link point
nq = 7Point toq = 4Parent node :fa[7] = 1; - take
q = 4node and Newly splitnq = 7node link :fa[4] = 7; - go back to The purpose at the beginning , Find a new one Legal link points :
fa[6] = 7;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
(2)for Cycle change ch Move the side : because ④ Node No. now only represents character string "aab" 了 , therefore Originally from ② Connect to ④ And from the ① Connect to ④ The side of must change its direction , The specific direction is New nodes ⑦. Corresponding to the above picture :ch[2][b] = 7, ch[1][b] = 7;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
(3) establish The transfer edge connected from the new link point ( In and out ), copying :cpy:ch[7][a] = 5;
memcpy(ch[nq], ch[q], sizeof ch[q]);
Follow the Model It is convenient to understand and memorize code :
below Illegal process It can be simulated on paper :
Be careful :
Can prove that , The length is
nString , At most2n - 1Points and3n - 4side . for exampleabb…bb.The complexity of time and space is
O(n)When building a link tree , Both points and edges are open
2nSpacenode
7and9In order to Meet the legitimacy of substring set links And created , It does not affect the number of occurrences of the substring , thereforecnt[7,9] = 0
SAM a key :
Code board :(extend function )
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
Example 【 Templates 】 Postfix automaton (SAM)
Title Description
Given a string containing only lowercase letters S S S.
Please find out S S S All occurrences of are not 1 1 1 Multiply the number of occurrences of the substring by the maximum length of the substring .
Input format
A string containing only lowercase letters on a line S S S.
Output format
An integer , For the answer .
Examples #1
The sample input #1
abab
Sample output #1
4
Tips
about 10 % 10 \% 10% The data of , ∣ S ∣ ≤ 1000 \lvert S \rvert \le 1000 ∣S∣≤1000.
about 100 % 100\% 100% The data of , 1 ≤ ∣ S ∣ ≤ 10 6 1 \le \lvert S \rvert \le {10}^6 1≤∣S∣≤106.
Ideas :
Building suffix automata , Then use depth first traversal to make statistics on the suffix link tree .
Code :( The board )
#include <bits/stdc++.h>
using namespace std;
//#define map unordered_map
//#define int long long
const int N = 1e6 + 10;
typedef long long ll;
char s[N];
int tot = 1, np = 1;
int fa[N << 1], ch[N << 1][26], len[N << 1];
ll cnt[N << 1], ans;
vector<int> g[N << 1];
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
void dfs(int u) {
for (auto v : g[u]) {
dfs(v);
cnt[u] += cnt[v];
}
if (cnt[u] > 1) ans = max(ans, cnt[u] * len[u]);
}
signed main()
{
scanf("%s", s);
for (int i = 0; s[i]; ++i) {
extend(s[i] - 'a');
}
for (int i = 2; i <= tot; ++i) {
g[fa[i]].push_back(i);
}
dfs(1);
cout << ans << '\n';
return 0;
}
边栏推荐
- Vagrant box cluster processing
- Flink real time warehouse DWD layer (traffic domain) template code
- 基于C语言设计的学生成绩排名系统
- VMware16创建虚拟机:Win11无法安装
- Connecting PHP 7.4 to Oracle configuration on Windows
- Win11vmware turns on the virtual machine and restarts on the blue screen and the solution that cannot be started
- spark学习笔记(七)——sparkcore核心编程-RDD序列化/依赖关系/持久化/分区器/累加器/广播变量
- WPF nested layout case
- VMware16安装虚拟机遇到的问题
- MVFuseNet:Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of
猜你喜欢

Decompilation of wechat applet

Flink实时仓库-DWD层(流量域)模板代码

CVPR2021| 基于自监督学习的多视图立体匹配 (CVPR2021)

Operator3-设计一个operator

WPF 界面布局必知基础

Improved pillar with fine grained feature for 3D object detection paper notes

Record - step on the pit - real-time data warehouse development - doris/pg/flink

ERROR 1045 (28000) Access denied for user ‘root‘@‘localhost‘解决方法

2D cartoon rendering - advanced skills

基于C语言设计的学生成绩排名系统
随机推荐
Decompilation of wechat applet
Pod基本介绍
Leetcode 879. profit plan
同步/异步、阻塞/非阻塞 与 IO
My personal website doesn't allow access to wechat, so I did this
太空射击第17课: Game Over (結束)
[C language brush leetcode] 2332. The latest time to get on the bus (m)
Interface test actual project 03: execute test cases
接口测试实战项目03:执行测试用例
Unity sends a post request to the golang server for parsing and returning
It's enough for MySQL to have this article (disgusting and crazy typing 37k words, just for Bo Jun's praise!!!)
如何使用gs_expansion扩展节点
对Vintage分析的一些学习理解
Summary of OCR optical character recognition methods
buck电路boot和ph引脚实测
微服务远程调用
fillder使用
Vmware16 create virtual machine: win11 cannot be installed
Win11 system error: code execution cannot continue because ierutil.dll cannot be found. Reinstalling the program may fix this problem
Fillder use