当前位置:网站首页>Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
Explanation of suffix automata (SAM) + Luogu p3804 [template] suffix automata (SAM)
2022-07-29 07:16:00 【Morgannr】
SAM
Suffix automata can store all substrings of a string .
One 、 Concept
Here is a character string "aababa" Of Postfix automaton .

In the picture above Black edge call Move the side , Green edge call Link edge .
The path taken along the transfer edge from the root node corresponds to a substring .
The root node represents an empty string , Other nodes Express Similar substring Of aggregate .( Similar substring Refer to Substring with the same end position )
- explain : Above picture , Green edges and nodes Make a tree ( Then explain ), Black edges and nodes Form a directed acyclic graph , With
6Number point For example , share3Paths Sure arrive6Number point , namely6The number represents3A different substring ( Express A collection of substrings ) - for example The nodes in the above figure represent collections :
② ={ a }③ ={ aa }④ ={ aab }⑤ ={ aaba }
⑥ ={ aabab,abab,bab }⑦ ={ ab,b }
⑧ ={ aababa,ababa,baba}
⑨ ={ aba,ba } - Find out : Same set Medium All strings , The last letter is the same , And the end position is the same .(“ Set of similar substrings ” meaning ), As for the ⑦ aggregate Come on ,
endpos("ab") = endpos("b") = { 3,5 }
Two 、 Building suffix automata
Building suffix automata The process of is a Dynamic process , That is to say Insert nodes one by one , Two kinds of edges ( Move the side , Link edge ) Alternate construction . The process should maintain 3 An array :
ch[x][c]: save nodexOf Transfer the end of the edge . example :ch[1][b] = 7, ch[2][b] = 7( From the above ①、② node AlongbThis side Can walk to ⑦ This End )fa[x]: save nodexOf End of link edge . example :fa[7] = 1,fa[6] = 7( From the above ⑦ node Along the green edge Can walk to ① This End )len[x]: save nodexOf The length of the longest string . example :len[6] = 5,len[7] = 2(⑥ In the string represented by the dot The length of the longest string is5)
3、 ... and 、 Build suffix link tree
Purpose of constructing suffix automata It's not matching on the graph , It's to extract from the picture Green link edge , Build a tree . Take a closer look at the picture above , We found that , Each node has only one green link edge pointing to its parent node , So we can construct a tree as shown in the figure below , We call it “ Postfix link tree ”:
What does this tree have characteristic ?
- Tree node Express Similar substring And its End position Of aggregate .( We completely transferred the information on the map to this tree )
Its Legitimacy : Child node Of Shortest string Of Longest suffix = Parent node Of Longest string ,
- Like in a tree ⑥ Node is no. The shortest string is
“bab”, it The longest suffix excluding itself is“ab”, Exactly equal to Its parent node ⑦ Node node Longest string“ab”. Again , about ⑦ node , Its The shortest string is“b”, it The longest suffix excluding itself is empty , Exactly equal to The root node ① The longest string of empty .
just so so Find out , from The leaf node goes up to the root node In the process of , Length of string yes Decreasing , Besides empty outside , character string The last letter is the same ,
Since these substrings are stored in the tree Nature and Law , Then we can use them to Solve some problems :
- Length of substring of node : The longest
len[6] = 5, The shortestlen[6] - len[7] = 3
( about The shortest string length , We can do it in During traversal , find Current node And its Parent node Longest string length , The difference between the two is the shortest string length of the current node ) - Number of substrings of node :
len[6] - len[7] = 3,len[7] - len[1] = 2
( It is consistent with the method of finding the length of the shortest string of the current node , Make a difference between the longest string length of the current node and its parent node ) - Number of substrings :
cnt[4] = 1,cnt[6] = 1,cnt[7] = 2
( In the tree painted above , Next to the node Blue numbers Represents the The node represents the position where the string appears , Such as ⑥ Node is no. , All strings it represents are Only in5The position appears once ,④ So it is with , and ④、⑥ Parent of node ⑦ Corresponding“ab”、“b”Two strings meanwhile stay3、5Position appears , That is, the corresponding appeared two , And we found that ⑦ The corresponding number of times Exactly His two sons ④、⑥ Sum of times )
Four 、 The specific steps of constructing suffix automata (extend function )
All steps must Meet the above “ Legitimacy ”, This is convenient Extract the link tree and make statistics .
We use Variable tot Number nodes , The root node by 1 Number point , The pointer np Always Point to the last created node , It's also Start at the root node ( The root node Default , No need to create )
int tot = 1, np = 1;
The next step is 3 An array ,
//fa Link edge end ,ch Transfer edge end ,len The longest string length
int fa[N], ch[N][26], len[N];
Pass in extend function Medium Parameters It's a Offset c(0 ~ 25:a ~ z, Think What is passed in is a character )
The key to building is 4 A pointer to the :
p: Dynamic bounce pointer ,np: Fixed pointing New dot pointer ,q: Fixed pointing Link point ,nq: Point to New link point .
First, structure “ Prelude ”: if ch[p][c] non-existent (p Point to the old point Along character c There is no end you can reach ), Just from p ( Old point ) towards np ( New points ) build Move the side .
int p = np; np = ++tot; //p Point to the old point ,np It's a new point
len[np] = len[p] + 1; cnt[np] = 1; // Number of substrings
//p Jump back along the link edge , Transfer the edge from the old point to the new point
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
If sign out for when p = 0, explain c It's a new character , from New points towards The root node Build link edge
if(!p) fa[np] = 1; //1 The number is the root node
If sign out for when p > 0, explain ch[p][c] There is (p Point to the old point Along character c There is a destination that can be reached ), Make q = ch[p][c]
- (1) if
pAndqDistance of= 1, legal , be fromnptowardsqBuild link edge . - (2) if
pAndqDistance of> 1, illegal , be Splitq, fromnptowardsnqBuild link edge .
else {
// If c It's an old character
int q = ch[p][c]; //q Is the link point
//2. If link point q legal , From a new point to q Build link edge
if (len[q] == len[p] + 1) fa[np] = q;
//3. If link point q illegal , Then crack q spot ﹐ Rebuild two kinds of edges
else {
int nq = ++tot; //nq It's a new link , It's a legal father found for Xindian
len[nq] = len[p] + 1;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
Look at the During construction Changes in variables :
For the top Insert 4 Characters in the process , Link point q It's all legal ,
And for the bottom Insert first 5 Characters b when ,p = 5,np = 6, There is an illegal situation ,
Let's first count ② ~ ⑥ The string set represented by node No :
We will focus on ④ and ⑥ node , For new points ⑥, We should look for The longest suffix of the shortest string in its set ("ab"), obviously Only ④ The node contains , however ④ node Illegal , because It contains substrings "ab" Not the longest , therefore Does not satisfy legitimacy ( Unable to construct suffix Link tree ),
What shall I do? ? At this time, we are going to take ④ The illegal link point on No. 2 is broken ,
That is, from this situation 
Turn into :
It is not difficult to find that at this time Added a legal link point ⑦, Next, we need to do some Edge building operation ( Three steps , It can be understood in contrast to the code )

(1) change fa Link edge , Three steps :
- take New link point
nq = 7Point toq = 4Parent node :fa[7] = 1; - take
q = 4node and Newly splitnq = 7node link :fa[4] = 7; - go back to The purpose at the beginning , Find a new one Legal link points :
fa[6] = 7;
// The reconstruction nq, q, np Link side of
fa[nq] = fa[q], fa[q] = fa[np] = nq; // Point to q The transfer edge of is changed to point to nq
(2)for Cycle change ch Move the side : because ④ Node No. now only represents character string "aab" 了 , therefore Originally from ② Connect to ④ And from the ① Connect to ④ The side of must change its direction , The specific direction is New nodes ⑦. Corresponding to the above picture :ch[2][b] = 7, ch[1][b] = 7;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}// from q The outgoing transfer edge is copied to nq
(3) establish The transfer edge connected from the new link point ( In and out ), copying :cpy:ch[7][a] = 5;
memcpy(ch[nq], ch[q], sizeof ch[q]);
Follow the Model It is convenient to understand and memorize code :
below Illegal process It can be simulated on paper :
Be careful :
Can prove that , The length is
nString , At most2n - 1Points and3n - 4side . for exampleabb…bb.The complexity of time and space is
O(n)When building a link tree , Both points and edges are open
2nSpacenode
7and9In order to Meet the legitimacy of substring set links And created , It does not affect the number of occurrences of the substring , thereforecnt[7,9] = 0
SAM a key :
Code board :(extend function )
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
Example 【 Templates 】 Postfix automaton (SAM)
Title Description
Given a string containing only lowercase letters S S S.
Please find out S S S All occurrences of are not 1 1 1 Multiply the number of occurrences of the substring by the maximum length of the substring .
Input format
A string containing only lowercase letters on a line S S S.
Output format
An integer , For the answer .
Examples #1
The sample input #1
abab
Sample output #1
4
Tips
about 10 % 10 \% 10% The data of , ∣ S ∣ ≤ 1000 \lvert S \rvert \le 1000 ∣S∣≤1000.
about 100 % 100\% 100% The data of , 1 ≤ ∣ S ∣ ≤ 10 6 1 \le \lvert S \rvert \le {10}^6 1≤∣S∣≤106.
Ideas :
Building suffix automata , Then use depth first traversal to make statistics on the suffix link tree .
Code :( The board )
#include <bits/stdc++.h>
using namespace std;
//#define map unordered_map
//#define int long long
const int N = 1e6 + 10;
typedef long long ll;
char s[N];
int tot = 1, np = 1;
int fa[N << 1], ch[N << 1][26], len[N << 1];
ll cnt[N << 1], ans;
vector<int> g[N << 1];
void extend(int c) {
int p = np;
np = ++tot;
len[np] = len[p] + 1, cnt[np] = 1;
while (p && !ch[p][c]) {
ch[p][c] = np;
p = fa[p];
}
if (!p) {
fa[np] = 1;
}
else {
int q = ch[p][c];
if (len[q] == len[p] + 1) fa[np] = q;
else {
int nq = ++tot;
len[nq] = len[p] + 1;
fa[nq] = fa[q], fa[q] = fa[np] = nq;
while (p && ch[p][c] == q) {
ch[p][c] = nq;
p = fa[p];
}
memcpy(ch[nq], ch[q], sizeof ch[q]);
}
}
}
void dfs(int u) {
for (auto v : g[u]) {
dfs(v);
cnt[u] += cnt[v];
}
if (cnt[u] > 1) ans = max(ans, cnt[u] * len[u]);
}
signed main()
{
scanf("%s", s);
for (int i = 0; s[i]; ++i) {
extend(s[i] - 'a');
}
for (int i = 2; i <= tot; ++i) {
g[fa[i]].push_back(i);
}
dfs(1);
cout << ans << '\n';
return 0;
}
边栏推荐
- buck电路boot电容短路和断路实测波形
- vue-router路由缓存
- Cvpr2021 | multi view stereo matching based on self supervised learning (cvpr2021)
- After three years of outsourcing, the salary of automatic testing after job hopping is twice that of the original. The secret is
- Use vscode to configure Mysql to realize connection, query, and other functions
- Why does ETL often become ELT or even let?
- [cf1054h] epic Revolution -- number theory, convolution, arbitrary modulus NTT
- MySQL queries are case sensitive
- When NPM is installed, it is stuck. There are five solutions
- npm install 时,卡住不动,五种解决方法
猜你喜欢

Ansible中的变量及加密

Kubernetes (五) ---------部署 Kubernetes Dashboard

JS chicken laying eggs and egg laying chickens. Who appeared earlier, object or function? Is function an instance of function?

CAN&CANFD综合测试分析软件LKMaster与PCAN-Explorer 6分析软件的优势对比

数组的子集能否累加出K

H3C_ Using setting default static routing priority to realize the active and standby function of export dual lines

Connecting PHP 7.4 to Oracle configuration on Windows

Vscode remote debugging PHP solution through remotessh and Xdebug

MVFuseNet:Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of

Flink实时仓库-DWD层(交易域-加购维度退化处理)模板代码
随机推荐
Google fragmented notes JWT (Draft)
route的meta配置项
When NPM is installed, it is stuck. There are five solutions
Improved pillar with fine grained feature for 3D object detection paper notes
Redis Basics
怎么会不喜欢呢,CICD中轻松发送邮件
MVFuseNet:Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of
ERROR 1045 (28000) Access denied for user ‘root‘@‘localhost‘解决方法
gin 路由,参数,输出
spark学习笔记(七)——sparkcore核心编程-RDD序列化/依赖关系/持久化/分区器/累加器/广播变量
js中break与continue和return关键字
MySQL queries are case sensitive
Homebrew brew update doesn't respond for a long time (or stuck in updating homebrew...)
MySQL advanced (Advanced) SQL statement (I)
[solution] error: lib/bridge_ generated. dart:837:9: Error: The parameter ‘ptr‘ of the method ‘FlutterRustB
win11VMware打开虚拟机就蓝屏重启以及启动不了的解决方案
[C language brush leetcode] 2332. The latest time to get on the bus (m)
[cf1054h] epic Revolution -- number theory, convolution, arbitrary modulus NTT
[redis] redis development specifications and precautions
JS chicken laying eggs and egg laying chickens. Who appeared earlier, object or function? Is function an instance of function?