# **【学习笔记】字符串—广义后缀自动机** [$\mathcal{My}\ \mathcal{Blog}$](https://www.cnblogs.com/Xing-Ling/p/12038349.html) $update\ 2021.11.27:$ 时隔多年回来补坑，有了额外的发现，并给出证明。【离线写法再探】阐述离线 $\text{dfs}$ 错误原因，证明离线 $\text{bfs}$ 正确性；阐述在线写法特判 $1$ 的本质以及离线在线写法的之间联系；修复手绘图的一个小失误。 $update\ 2020.8.13:$ 这个蒟蒻发现自己傻得不行，明明很简单的东西一直没扯清楚，所以立马来补锅了，顺便思考了如何 $\text{hack}$ 盗版。更新更细致的复杂度讲解；添加卡掉盗版在线构造的方法，并对空节点性质进行深入研究；调整板块布局。 $updata\ 2020.7.13:$ ~~迫于ezoixx130的淫威跑来修锅啦。~~ 更新更侑秀的在线构造正确写法；增修文章细节；$\text{Latex}$ 规范化；并添加两道例题。 $update\ 2020.3.3:$ 发现题库里出现了模板，决定添加两道例题，并对文章细节进行修改。 ## **一：【前言】** 最近一段时间都在研究 **惊(~~Ren~~)艳(~~Lei~~)无(~~Zhi~~)比(~~Hui~~)、美(~~Li~~)妙(~~Xing~~)绝(~~Yu~~)伦(~~Yue~~)** 的自动机，这里引用 [$\text{bztMinamoto}$](https://www.cnblogs.com/bztMinamoto/) 巨佬的一句话来表达此时的心情： > 我感觉我整个人都自动机了…… ——[$bztMinamoto$](https://www.cnblogs.com/bztMinamoto/)（[回文自动机学习笔记](https://www.cnblogs.com/bztMinamoto/p/9630617.html)）在此过程中发现网上讲广义 $\text{SAM}$ 的文章很少，而且很多都不正确，所以决定整理一下。 ## **二：【引理】** 众所周知，$\text{SAM}$ 的一个经典应用是求一个字符串中本质不同子串数量，那么如果改为求一个 $\text{Trie}$ 树呢？（$\text{Trie}$ 中从上到下若干前缀串的本质不同子串） > 大部分可以用后缀自动机处理的字符串的问题均可扩展到 $Trie$ 树上。 ——刘研绎 ($2015$ 国家队论文《后缀自动机在字典树上的拓展》) 我们将这种建立在 $\text{Trie}$ 树上的 $\text{SAM}$ 称为广义 $\text{SAM}$ 。在学习之前，首先要确保对[单串 $\text{SAM}$](https://www.cnblogs.com/zjp-shadow/p/9218214.html#autoid-4-1-0) 足够熟悉。其实我们通常需要解决的是**多模式串问题**，即给出多个串让你去统计各种各样的信息（将多个串插入到一棵 $\text{Trie}$ 中，然后依靠这棵 $\text{Trie}$ 构造广义 $\text{SAM}$）。可能少部分题目会有直接给出一棵 $\text{Trie}$ 树的情况，但不常见。 **本文主要解决前一类问题**，后者仅给出一种构造方法（即 $bfs$ 版离线写法），不详述其应用。注意这里两种类型题目中 $\text{Trie}$ 树有不同的性质：对于**多模式串问题**：$G(T)=O(\sum len)=O(|T|)$，对于直接给出的 $\text{Trie}$：$G(T)=O(|T|^2)$（如果不理解这个 $|T|^2$ 可以看下面那张嫖来的图片）。（其中 $G(T)$ 为 $\text{Trie}$ 树 $T$ 所有叶节点深度之和，$|T|$ 为 $\text{Trie}$ 树大小） $G(T)$ 这个东西看起来似乎没啥用处，但它会直接影响构造广义 $\text{SAM}$ 的算法复杂度。 ## **三：【算法实现】** ### **1.【离线构造】** 在用广义 $\text{SAM}$ 处理**多模式串问题**时，网上流传着的**主流**写法有 $3$ 种： $(1).$ 用特殊符号将所有模式串连成一个大串放到一个 $\text{SAM}$ 中，再加一些玄学判断来处理信息。 $(2).$ 每次插入一个模式串之前，都把 $last$ 设为 $1$，按照普通 $\text{SAM}$ 一样插入，即每个字符串都从起点 $1$ 开始重新构造。 $(3).$ 用所有模式串建出一颗 $\text{Trie}$ 树，对其进行 $\text{dfs/bfs}$ 遍历构建 $\text{SAM}$，$insert$ 时使 $last$ 为它在 $\text{Trie}$ 树上的父亲，其余和普通 $\text{SAM}$ 一样。第一种实用性不高且复杂度危险。第二种机房大佬说是盗版，但因为复杂度依旧为线性、代码简单且在大部分题中都能保证正确性，所以很多人都用的这种（[$\text{SAM Drawer}$](https://yutong.site/sam/) 似乎就是依据这个做法绘的图）。但根据广义 $\text{SAM}$ 的定义，只有第三种中才是正确写法。而且随便抛组数据就能立马发现构造出来的差异。 $\text{dfs}$ 代码如下： ```cpp //Trie.tr[x]: Trie树的状态转移数组 //Trie.c[x]: Trie树上节点x的字符 //pos[x]:Trie上x节点的前缀字符串（路径根->x 所表示的字符串）在SAM上的对应节点编号 inline void dfs(Re x){ for(Re i=0,to;i<26;++i)if(to=T1.tr[x][i]) pos[to]=insert(T1.c[to],pos[x]),dfs(to); } inline void build(){pos[1]=1,dfs(1);}//dfs遍历Trie树构造广义SAM(Tire树上的根1在SAM上的位置为根1) ``` $bfs$ 代码如下： ```cpp //Trie.tr[x]: Trie树的状态转移数组 //Trie.fa[x]: Trie树上节点x的父节点 //Trie.c[x]: Trie树上节点x的字符 //pos[x]:Trie上x节点的前缀字符串（路径根->x 所表示的字符串）在SAM上的对应节点编号 inline void build(){//bfs遍历Trie树构造广义SAM for(Re i=0;ifa[x]] for(Re i=0;i 注意：$\text{dfs}$ 遍历的复杂度为 $O(G(T))$，$\text{bfs}$ 为 $O(|T|)$ 。如果题目给的是若干个待插入串 $S$，那么 $\text{dfs/bfs}$ 可以任选一种，因为此时 $O(G(T))=O(\sum|S|)$。但要是直接给了一颗 $\text{Trie}$，$\text{dfs}$ 就会被卡。关于 $G(T)=O(|T|^2)$ 的证明，这里嫖一张图： [【图片来源】](https://www.luogu.com.cn/paste/3oq8xx3b) ![](https://cdn.luogu.com.cn/upload/image_hosting/1d6n2trm.png) ### **2.【在线构造】** 仅针对于**多模式串问题**，我们还有另一种构造广义 $\text{SAM}$ 的方法。 “离线”，顾名思义，对多个模式串离线构造出 $\text{Trie}$ 树，然后依据 $\text{Trie}$ 构造广义 $\text{SAM}$ 。而“在线”就是指不建立 $\text{Trie}$，直接把给出的多个模式串依次插入到广义 $\text{SAM}$ 中（用在线做法写正确的人少得可怜）。具体的说，每次插入一个模式串之前，都把 $last$ 设为 $1$，$insert$ 函数**在普通** $\text{SAM}$ **的基础上加入特判**（注意前面说的盗版写法用的是**不加特判的普通** $insert$）。更改后的 $insert$ 代码如下： ```cpp //link[i]: 后缀链接 //trans[i]: 状态转移数组 inline int insert(Re ch,Re last){//将ch[now]接到last后面 if(trans[last][ch]&&maxlen[last]+1==maxlen[trans[last][ch]])return trans[last][ch]; //已经存在需要的节点(特判1) Re x,y,z=++O,p=last,flag=0;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{//需要拆分x，将len<=maxlen[p]+1的部分复制一个y出来 if(maxlen[p]+1==maxlen[z]/*或者写：p==last*/)flag=1;(特判2) y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i （注：下面这段话的意义不大，而且可能会看得一脸懵逼，可以直接略过） **额外思考**：其实上述内容并不是产生空节点 $z$ 的唯一情况。如果 $\text{SAM}$ 已经被空节点污染，且对于前面 $trans[p][ch]=NULL$ 的段 $p$ 均为空节点，那么此时 $z$ 也一定为空。比如这个数据 `dcab ab`，在插入串 $ab$ 的第二个字符 $b$ 时，$last$ 为上一次 $insert(a)$ 时产生的空节点 $6$，而 $6$ 目前还不存在 $trans$ 边（即$trans[last=6][ch=b]=NULL$），但此时新建的 $z$（即 $8$ 号节点）仍为空，且之前的空节点 $6$ 有一条指向 $8$ 的 $trans$ 边。具体可自行画图加深理解。（请到下方例题处抱走std，然后使用代码输出自动机的边再画到纸上，最好把加/不加特判最终产生的各种形态都试一下看看。但不推荐自己模拟绘图，因为工作量大且极易出错）回到**空节点**的问题，一般来讲，这个点不会对答案造成影响，但也有办法能卡掉，具体见下方【关于如何卡掉盗版在线写法】。另外，我们也可以用 $minlen,maxlen$ 的大小来推导出 $z$ 为空： > $z$ 的 $link$ 会指向 $x$ 的**拆分节点** $y$，而 $maxlen[y]=maxlen[last]+1$，所以 $minlen[z]=maxlen[link[z]=y]+1=maxlen[last]+2$，又有 $maxlen[z]=maxlen[last]+1 疑问：在线写法和离线写法有什么不同呢？见后文4.【离线写法再探】。 ### **3.【关于如何卡掉盗版在线写法】** 这里讨论**不加特判**的在线写法。通常情况下，这种写法只是多了一些节点，多了一些 $trans$ 边和 $link$ 边，**它仍是一只正确的自动姬，复杂度也依旧为线性**（所以盗版写法才会横行啊......）。但这样显然就不符合 $\text{SAM}$ “**用最少的节点储存所有串信息**”这一性质了，具体地说，有以下两种情况： - **一个等价类被拆成若干个节点**，子串信息被分散。 - 出现**空节点**（即 $z$）。 **已知后者会在某些情况下产生影响，前者还有待探讨。** 截止 $2020.8.13$，我只找到了两种方案（没有写代码逐个测试，如果您认为分析有误，最好给一下代码和 $\text{hack}$数据说明）。先来罗列一下**空节点** $z$ 的性质： - $(1)$ 其 $trans$ 边指向的节点也一定是空节点（$z$ 本身就为空了，继续加字符是没有意义的）。 - $(2)$ 其 $link$ 指向 $y$，且没有节点的 $link$ 指向 $z$，故 $z$ 在 $parent$ 树上是叶子节点。 - $(3)$ $maxlen[z]=maxlen[last]+1=maxlen[y],$ $minlen[z]=maxlen[z]+1$（由 $link$ 边的指向推导得到）。 - $(4)$ 在新建节点时，$z$ 比 $y$ 先出现，所以节点编号 $z=1;--i)siz[link[Q[i]]]+=siz[Q[i]]; ``` 如果出现了空节点 $z$，由于 $maxlen[z]=maxlen[y]$ 且 $z **额外思考**： > 考虑这里`iod od`产生的“情况 $2$”的原因是：开始插入 `od` 的第一个字母'o'时，$last$ 为根节点 $1$，此时产生了情况 $2$。 > 那么，如果有公共前缀呢？ > 举个例子：`aiod aod`，你会发现此时 $\text{dfs}$ 是正确的（$\text{SAM}$ 结构和离线 $\text{bfs}$ 以及在线写法都一样），这是因为插入第二个串的字符'o'时，$trans[last=pos[a]][o]=NULL$（这里的 $a$ 指的是字符'a'所对应的 $\text{Trie}$ 点）。 > 由此自然提出一个猜想： > **猜想** $1$：存在公共前缀的两个串在建 $\text{SAM}$ 时不会在某个位置出现情况 $2$，此时 $\text{dfs}$ 写法正确。 > 不过，该猜想随便写一个暴力对拍就能轻松 $\text{hack}$。 > 反例：`ood od`（与前面提到的`aab ab`本质一样）。这个反例告诉我们一件事情：情况 $2$ 在离线 $\text{dfs}$ 写法中普遍存在。 **【如何改进？】** 显然我们可以把在线写法里的最终版 $insert$ 直接搬过来。这样做当然是正确的，但是，由这里 *“它看起来就像是个没有特判 $1$、特判 $2$ 的在线写法”* 得到启发，我们应该注意到一个很有意思的点： **猜想 $2$：在线写法中的特判 $1$，对应着离线写法中的 $\text{Trie}$ 结构建造。也就是说，只加特判 $1$ 的在线写法和不加任何特判的离线 $\text{dfs}$ 写法类似。**（事实上这一直觉是正确的，具体证明见后文） > 注：换句话说，离线 $\text{dfs}$ 其实是介于“离线 $\text{bfs}$/最终版在线”和“未加任何特判的在线”之间的一种折中写法，一种“特判不全的在线”，它解决了特判 $1$ 的情况（实现 $\text{Trie}$ 结构意义上的压缩节点），但没有解决特判 $2$ 的情况（多串 $\text{SAM}$ 结构中空节点的压缩）。这样我们就理解了离线 $\text{dfs}$ 的本质，只需要加特判 $2$ 就可以了，特判 $1$ 是没有意义的（$\text{Trie}$ 结构已经预处理好了）。（当然，这只是解决了正确性的问题，复杂度肯定还是不如 $\text{bfs}$ 的）离线 $\text{dfs}$ 代码改进版如下： ```cpp inline int insert(Re ch,Re last){//普通SAM添加特判2 if(trans[last][ch]){//不存在特判1的情况 Re p=last,x=trans[p][ch]; Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=y; return y;//即最初的特判2 } Re x,y,z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[z]=link[x]=y; } } return z; } ``` #### **(2).【bfs 为何正确】** $\text{bfs}$ 代码也是使用的普通 $insert$，没加任何特判，为什么它就没有像 $\text{dfs}$ 那样出问题呢？关于特判 $1$ 的情况，前面已经讨论过，由 $\text{Trie}$ 结构所解决。关于情况 $2$：先说一下两个记号表达： $deep(a)$：$\text{Trie}$ 点 $a$ 在 $\text{Trie}$ 树上的深度。 $S_{root_{Trie}->a}$：从 $\text{Trie}$ 树根 $root_{Trie}$ 到 $\text{Trie}$ 点 $a$ 这一条长为 $deep(a)$ 的路径所表示的字符串（即 $a$ 在 $\text{Trie}$ 上的前缀字符串）。 > **引理**：对于任何 $\text{Trie}$ 点 $a$，一定有 $maxlen[pos[a]]=deep(a)$ 。 > **证明**： > ①$maxlen[pos[a]]a}$，而 $|S_{root_{Trie}->a}|=deep(a)$ 。 > ②**假设** $maxlen[pos[a]]>deep(a)$，也就是说，有另一点 $b$，$deep(b)>deep(a)$，$S_{root_{Trie}->b}$ 也在该等价类中，即 $pos[b]=pos[a]$ 。 > 但是，我们使用的是普通 $insert$ 代码，每一个 $a$ 对应的 $pos[a]$ 都是新建的点，所以 $pos[a]$ 不可能等于 $pos[b]$ 。 > **故假设不成立**。 $\text{bfs}$ 的特点是一层一层遍历，存在这样一个性质：当前 $\text{Trie}$ 点 $a$ 的深度 $deep(a)=deep(fa_{Trie}(a))+1$ 为目前所遍历过的最深的深度。在准备将 $\text{Trie}$ 点 $a$ 插入 $\text{SAM}$ 的时，$last$ 为 $pos[fa_{Trie}(a)]$ 。此时一定有如下结论： > **结论**：$trans[last][ch]= NULL$。 > **证明**：**假设**存在 $x=trans[last][ch]\neq NULL$，则有 $maxlen[x]> maxlen[last]$（转移边指出去的点的长度不可能小于等于自身）。 > 又由引理得：$maxlen[x]> maxlen[last]=deep(fa_{Trie}(a))$ ——① > 在等价类 $x$ 中取最长的那个字符串 $s_{maxlen}(x)$ 在 $\text{Trie}$ 上对应的末节点 $a'$（显然 $s_{maxlen}(x)$ 就是 $S_{root_{Trie}->a'}$），有 $pos[a']=x$， > 由引理及①得：$deep(a')=maxlen[pos[a']]=maxlen[x]>deep(fa_{Trie}(a))$， > 则必满足 $maxlen[x]=deep(a')=deep(fa_{Trie}(a))+1$ ——②（目前最深的点深度只能是这么大）， > 所以 $deep(a')=deep(a)$ 。 > 注意我们是沿着 $S_{root_{Trie}->fa_{Trie}(a)}$ 的每一个字符在自动机上一路走到 $last$ 的，$s_{maxlen}(last)$ 即为 $S_{root_{Trie}->fa_{Trie}(a)}$，且字符串 $SS=s_{maxlen}(last)+ch$ 必定在等价类 $x$ 中。 > 这里的 $SS$ 实际上等于 $S_{root_{Trie}->a}$，所以 $|SS|=deep(a)=deep(a')=|S_{root_{Trie}->a'}|$，由于字符串 $S_{root_{Trie}->a'}$ 也在该等价类中，所以 $S_{root_{Trie}->a'}=SS=S_{root_{Trie}->a}$（同一等价类里一种长度只对应一种字符串）。 > 由于此时 $a$ 目前还在准备插入阶段，实际还没有对应的 $\text{SAM}$ 的点，所以 $a'$ 与 a 为不同的 $\text{Trie}$ 点。也就是说，$\text{Trie}$ 树上出现了两个不同的点对应着完全相同的前缀字符串，这是 $\text{Trie}$ 结构所不允许的。 **故假设不成立**。 > 另外，由引理及②可得 $maxlen[x]=deep(fa_{Trie}(a))+1=maxlen[last]+1$，发现我们的假设其实就是特判 $1$ 所判断的情况。 > 所以，**前文所说的“特判 $1$ 由 $\text{Trie}$ 结构所解决”也得到了证明**。由此证明离线 $\text{bfs}$ 不需要写上述两个特判。 ## **四：【广义SAM的复杂度】** 设 $|T|$ 为 $\text{Trie}$ 树大小，$|A|$ 为字符集大小（可视为常数），$G(T)$ 为 $\text{Trie}$ 树所有叶节点深度之和。 - 状态数（节点数）为线性 $O(2|T|)$ 。 - 转移函数（边数）上界为 $O(|T||A|)$ 。 - 离线时间复杂度为 $O(|T||A|+|T|)$ 。 - 在线时间复杂度为 $O(|T||A|+G(T))$ 。上述性质在刘研绎的论文都中有严谨证明，这里不赘述。有趣的是，实际运行效率在线构造（即使是不够侑秀的写法）要比离线快得多。 ## **五：【例题】** （由于代码较多，可能会显得较冗长，但广义 $\text{SAM}$ 的写法具有争议，在各种题目中都能见到一些奇怪的做法，所以我还是把代码放出来供大家参考一下） ### **1.【广义 SAM 模板】** 传送门：[【模板】广义后缀自动机（广义 $\text{SAM}$） $\text{[P6139]}$](https://www.luogu.com.cn/problem/P6139) #### **【题目描述】** 求多个字符串的本质不同子串个数。 #### **【分析】** 随便选一种方式建好自动机，答案为：$\sum maxlen[i]-maxlen[link[i]]$ 。 #### **【Code (离线)】** ```cpp #include #include #include #define Re register int #define LL long long using namespace std; const int N=2e6+5,M=1e6+3; int n,t;char ch[N]; inline void in(Re &x){ int fu=0;x=0;char c=getchar(); while(c<'0'||c>'9')fu|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=fu?-x:x; } struct Trie{ int O,c[M],fa[M],tr[M][26]; //fa[x]: Trie树上x的父节点 //c[x]: Trie树上x的颜色 Trie(){O=1;}//根初始化为1 inline void insert(char ch[]){ Re p=1; for(Re i=1;ch[i];++i){ Re a=ch[i]-'a'; if(!tr[p][a])tr[p][a]=++O,fa[O]=p,c[O]=a; p=tr[p][a]; } } }T1; struct Suffix_Automaton{ int O,pos[N],link[N],maxlen[N],trans[N][26];queueQ; //pos[x]:Trie上的x节点（路径1->x所表示的字符串）在SAM上的对应节点编号 //link[i]: 后缀链接 //trans[i]: 状态转移数组 Suffix_Automaton(){O=1;}//根初始化为1 inline int insert(Re ch,Re last){//和普通SAM一样 Re x,y,z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[z]=link[x]=y; } } return z; } inline void build(){//bfs遍历Trie树构造广义SAM for(Re i=0;i<26;++i)if(T1.tr[1][i])Q.push(T1.tr[1][i]);//插入第一层字符 pos[1]=1;//Tire树上的根1在SAM上的位置为根1 while(!Q.empty()){ Re x=Q.front();Q.pop(); pos[x]=insert(T1.c[x],pos[T1.fa[x]]);//注意是pos[Trie->fa[x]] for(Re i=0;i<26;++i)if(T1.tr[x][i])Q.push(T1.tr[x][i]); } } inline void sakura(){ LL ans=0; for(Re i=2;i<=O;++i)ans+=maxlen[i]-maxlen[link[i]]; printf("%lld\n",ans); } }SAM; int main(){ // freopen("123.txt","r",stdin); in(n); for(Re i=1;i<=n;++i)scanf("%s",ch+1),T1.insert(ch); SAM.build(),SAM.sakura(); } ``` #### **【Code (在线)】** ```cpp #include #include #include #define Re register int #define LL long long using namespace std; const int N=2e6+5; int n;char ch[N]; inline void in(Re &x){ int fu=0;x=0;char c=getchar(); while(c<'0'||c>'9')fu|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=fu?-x:x; } struct Suffix_Automaton{ int O,link[N],maxlen[N],trans[N][26]; //link[i]: 后缀链接 //trans[i]: 状态转移数组 Suffix_Automaton(){O=1;}//根初始化为1 inline int insert(Re ch,Re last){ if(trans[last][ch]){ Re p=last,x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])return x;//即最初的特判1 else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=y; return y;//即最初的特判2 } } Re z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ Re x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[z]=link[x]=y; } } return z; } inline void sakura(){ LL ans=0; for(Re i=2;i<=O;++i)ans+=maxlen[i]-maxlen[link[i]]; printf("%lld\n",ans); } }SAM; int main(){ // freopen("123.txt","r",stdin); in(n); for(Re i=1;i<=n;++i){ scanf("%s",ch+1);Re last=1; for(Re j=1;ch[j];++j)last=SAM.insert(ch[j]-'a',last); } SAM.sakura(); } ``` ### **2.【分别维护不同串的 siz】** 传送门：[找相同字符 $\text{[P3181]}$](https://www.luogu.com.cn/problem/P3181) #### **【题目描述】** 求两个字符串的相同子串数量。 #### **【分析】** 如上黑体字所说，两个串的 $|endpos|$ 要分开计算，可以开一个二维数组，用 $siz[x][id]$ 表示节点 $x$ 在串 $id$ 上的 $\{endpos\}$ 大小。则答案为：$\sum siz[i][0]\times siz[i][1]\times (maxlen[i]-maxlen[link[i]])$ 。 #### **【Code (离线)】** 求 $siz$ 用离线做法貌似会麻烦一点，要在 $\text{Trie}$ 树上记录不同字符串的信息，等啥时候~~心情好了~~有空了再回来填坑吧。 #### **【Code (在线)】** ```cpp #include #include #include #define Re register int #define LL long long using namespace std; const int N=8e5+5; char ch[200003];LL ans; inline void in(Re &x){ int fu=0;x=0;char c=getchar(); while(c<'0'||c>'9')fu|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=fu?-x:x; } struct Suffix_Automaton{ int O,ru[N],link[N],maxlen[N],siz[N][2],trans[N][26];queueQ; //siz[x]: |endpos[x]| 即节点x的endpos大小 Suffix_Automaton(){O=1;} inline int insert(Re ch,Re last,Re id){ if(trans[last][ch]){ Re p=last,x=trans[p][ch]; if(maxlen[p]+1==maxlen[x]){siz[x][id]=1;return x;} else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=y; siz[y][id]=1;return y; } } Re z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ Re x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[z]=link[x]=y; } } siz[z][id]=1; return z; } inline void sakura(){ for(Re i=2;i<=O;++i)++ru[link[i]]; for(Re i=1;i<=O;++i)if(!ru[i])Q.push(i); while(!Q.empty()){ Re x=Q.front();Q.pop(); siz[link[x]][0]+=siz[x][0];//分开更新 siz[link[x]][1]+=siz[x][1]; if(!(--ru[link[x]]))Q.push(link[x]); } for(Re i=2;i<=O;++i)//统计答案 ans+=(LL)siz[i][0]*siz[i][1]*(maxlen[i]-maxlen[link[i]]); printf("%lld\n",ans); } }SAM; int main(){ // freopen("123.txt","r",stdin); for(Re i=0;i<2;++i){ scanf("%s",ch+1);Re last=1; for(Re j=1;ch[j];++j)last=SAM.insert(ch[j]-'a',last,i); } SAM.sakura(); } ``` ### **3.【线段树合并维护 siz】** 传送门：[$\text{Forensic Examination}$](https://www.luogu.com.cn/problem/CF666E) [$\text{[CF666E]}$](codeforces.com/problemset/problem/666/E) #### **【题目描述】** 给出主串 $S$ 以及 $m$ 个字符串 $T[1..m]$ 。有若干次询问，每次查询 $S$ 的子串 $S[p_l..p_r]$ 在 $T[l..r]$ 中的哪个串 $T_{i}$ 里的出现次数最多，输出 $i$ 以及出现次数，有多解则取最靠前的那一个。 #### **【分析】** 先把所有字符串都插入到广义 $\text{SAM}$ 中，对于每个节点开一颗下标为 $[1,m]$ 的动态开点线段树维护 $siz$（注意插入 $S$ 时就不要在线段树上进行修改操作了）。由于 $siz$ 的维护是统计子树和，所以插入结束后要在 $parent$ 树上跑一下线段树合并。查询时先在 $parent$ 树上倍增找到包含子串 $S[p_l,p_r]$ 的等价类状态节点，然后在该点的线段树上查询区间 $[l,r]$ 中的最大值，顺便维护下最大值所处位置即可。 #### **【Code (离线)】** 同上，需要记录 $siz$ 的离线做法先咕着。 #### **【Code (在线)】** ```cpp #include #include #include #include #include #define LL long long #define Re register int using namespace std; const int N=5e5+3,M=5e4+3,logN=21; int n,m,x,y,l,r,T,pos[N];char s[N],ch[M]; inline void in(Re &x){ int f=0;x=0;char c=getchar(); while(c<'0'||c>'9')f|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=f?-x:x; } struct QWQ{ int x,id;QWQ(Re X=0,Re ID=0){x=X,id=ID;} inline bool operator>(const QWQ &O)const{return x!=O.x?x>O.x:idB?A:B;} int pt[N+M<<1]; struct Segment_Tree{ #define pl (tr[p].lp) #define pr (tr[p].rp) #define mid ((L+R)>>1) int O; struct QAQ{int lp,rp;QWQ ans;}tr[(M<<1)*30]; inline void pushup(Re p){ tr[p].ans=max(tr[pl].ans,tr[pr].ans); } inline void change(Re &p,Re L,Re R,Re x){ if(!p)p=++O; if(L==R){++tr[p].ans.x,tr[p].ans.id=L;return;} if(x<=mid)change(pl,L,mid,x); else change(pr,mid+1,R,x); pushup(p); } inline int merge(Re p,Re q,Re L,Re R){ if(!p||!q)return p+q; Re x=++O; if(L==R){tr[x]=tr[p],tr[x].ans.x+=tr[q].ans.x;return x;} tr[x].lp=merge(pl,tr[q].lp,L,mid); tr[x].rp=merge(pr,tr[q].rp,mid+1,R); pushup(x);return x; } inline QWQ ask(Re p,Re L,Re R,Re l,Re r){ if(!p)return QWQ(0,m+1); if(l<=L&&R<=r)return tr[p].ans; QWQ ans=QWQ(0,m+1); if(l<=mid)ans=max(ans,ask(pl,L,mid,l,r)); if(r>mid)ans=max(ans,ask(pr,mid+1,R,l,r)); return ans; } }TR; struct Suffix_Automaton{ int O,link[N+M<<1],maxlen[N+M<<1],trans[N+M<<1][26]; Suffix_Automaton(){O=1;} inline int insert(Re ch,Re last,Re id){ if(trans[last][ch]){ Re p=last,x=trans[p][ch]; if(maxlen[p]+1==maxlen[x]){if(id)TR.change(pt[x],1,m,id);return x;} else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=y; if(id)TR.change(pt[y],1,m,id); return y; } } Re z=++O,p=last;maxlen[z]=maxlen[p]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ Re x=trans[p][ch]; if(maxlen[x]==maxlen[p]+1)link[z]=x; else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<26;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=link[z]=y; } } if(id)TR.change(pt[z],1,m,id); return z; } int o,deep[N+M<<1],head[N+M<<1],ant[N+M<<1][23]; struct QAQ{int to,next;}a[N+M<<1]; inline void add(Re x,Re y){a[++o].to=y,a[o].next=head[x],head[x]=o;} inline void dfs(Re x,Re fa){ deep[x]=deep[ant[x][0]=fa]+1; for(Re i=1;(1<=0;--i)if(ant[p][i]&&maxlen[ant[p][i]]>=len)p=ant[p][i]; return p; } inline void sakura(Re l,Re r,Re x,Re y){ QWQ ans=TR.ask(pt[get(y,y-x+1)],1,m,l,r); if(ans.x==0)ans.id=l; printf("%d %d\n",ans.id,ans.x); } }SAM; int main(){ // freopen("123.txt","r",stdin); scanf("%s",s+1),n=strlen(s+1),in(m); for(Re i=1;i<=m;++i){ scanf("%s",ch+1);Re last=1; for(Re j=1;ch[j];++j)last=SAM.insert(ch[j]-'a',last,i); } for(Re i=1,last=1;i<=n;++i)pos[i]=last=SAM.insert(s[i]-'a',last,0); SAM.build(),in(T); while(T--)in(l),in(r),in(x),in(y),SAM.sakura(l,r,x,y); } ``` ### **4.【树上本质不同路径数】** 传送门：[诸神眷顾的幻想乡 $\text{[ZJOI2015] [P3346]}$](https://www.luogu.com.cn/problem/P3346) [$\text{[Bzoj3926]}$](https://www.lydsy.com/JudgeOnline/problem.php?id=3926) #### **【题目描述】** 给出一颗叶子结点不超过 $20$ 个的无根树，每个节点上都有一个不超过 $10$ 的数字，求树上本质不同的路径个数（两条路径相同定义为：其路径上所有节点上的数字依次相连组成的字符串相同）。 #### **【分析】** 首先有一个很麻烦的地方是路径可以拐弯（即两端点分别在其 $lca$ 两个不同儿子节点的子树中），而 $\text{Trie}$ 树和各种自动机在“接受”字符串时都是以根为起点从上往下径直走到底（什么？跳 $parent$ 树？你跳任你跳，跳完还是直的）所以要想办法把路径捋直，瞎 $yy$ 可能不太容易想出来，这里直接抛结论： > 一颗无根树上任意一条路径必定可以在以某个叶节点为根时，变成**一条从上到下的路径**（利于广义 $\text{SAM}$ 的使用）。注意到题目中说叶节点不超过 $20$ 个，这意味着什么？暴力枚举每一个叶节点作为根节点遍历整棵树啊！ **将一共 $cnt_{leaf}$ 颗树中的所有前缀串都抽出来建立广义 $\text{SAM}$，然后直接求本质不同的子串个数。** 其中前缀串定义为从根节点（无根树的某个叶子结点）到任意一个节点的路径所构成的字符串（实际上就是将 $cnt_{leaf}$ 颗 $\text{Trie}$ 树合在了一起跑广义 $\text{SAM}$）。 **注意数组大小和空间限制。** #### **【Code (离线)】** （本题 $\text{Trie}$ 树的构造方法与其他相比较为特别） ```cpp #include #include #include #include #define Re register int #define LL long long using namespace std; const int N=4e6+5,N20=2e6+3,Nn=1e5+3; int n,m,o,x,y,t,C,du[Nn],co[Nn],head[Nn];LL ans; struct QAQ{int to,next;}a[Nn<<1]; inline void add(Re x,Re y){a[++o].to=y,a[o].next=head[x],head[x]=o;} inline void in(Re &x){ int fu=0;x=0;char c=getchar(); while(c<'0'||c>'9')fu|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=fu?-x:x; } struct Trie{ int O,c[N20],fa[N20],tr[N20][10]; Trie(){O=1;} inline int insert(Re p,Re ch){//在p后面插入一个ch if(!tr[p][ch])tr[p][ch]=++O,c[O]=ch,fa[O]=p; return tr[p][ch]; } }T1; struct Suffix_Automaton{ int O,pos[N],link[N],trans[N][10],maxlen[N];queueQ; Suffix_Automaton(){O=1;} inline int insert(Re ch,Re last){ Re x,y,z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i #include #include #include #define Re register int #define LL long long using namespace std; const int N=4e6+5,N20=2e6+3,Nn=1e5+3; int n,m,o,x,y,t,C,du[Nn],co[Nn],head[Nn];LL ans; struct QAQ{int to,next;}a[Nn<<1]; inline void add(Re x,Re y){a[++o].to=y,a[o].next=head[x],head[x]=o;} inline void in(Re &x){ int fu=0;x=0;char c=getchar(); while(c<'0'||c>'9')fu|=c=='-',c=getchar(); while(c>='0'&&c<='9')x=(x<<1)+(x<<3)+(c^48),c=getchar(); x=fu?-x:x; } struct Suffix_Automaton{ int O,link[N],trans[N][10],maxlen[N]; Suffix_Automaton(){O=1;} inline int insert(Re ch,Re last){ if(trans[last][ch]){ Re p=last,x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])return x; else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<10;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[x]=y; return y; } } Re z=++O,p=last;maxlen[z]=maxlen[last]+1; while(p&&!trans[p][ch])trans[p][ch]=z,p=link[p]; if(!p)link[z]=1; else{ Re x=trans[p][ch]; if(maxlen[p]+1==maxlen[x])link[z]=x; else{ Re y=++O;maxlen[y]=maxlen[p]+1; for(Re i=0;i<10;++i)trans[y][i]=trans[x][i]; while(p&&trans[p][ch]==x)trans[p][ch]=y,p=link[p]; link[y]=link[x],link[z]=link[x]=y; } } return z; } inline void sakura(){ for(Re i=2;i<=O;++i)ans+=maxlen[i]-maxlen[link[i]]; printf("%lld\n",ans); } }SAM; inline void dfs(Re x,Re fa,Re fap){//遍历在线构造SAM Re xp=SAM.insert(co[x],fap);//记录x在SAM上的位置，方便下次直接使用 for(Re i=head[x],to;i;i=a[i].next) if((to=a[i].to)!=fa)dfs(to,x,xp); } int main(){ // freopen("123.txt","r",stdin); in(n),in(C),m=n-1; for(Re i=1;i<=n;++i)in(co[i]); while(m--)in(x),in(y),add(x,y),add(y,x),++du[x],++du[y]; for(Re i=1;i<=n;++i)if(du[i]==1)dfs(i,0,1);//依次把每个叶子节点作为根插入Trie树 SAM.sakura(); } ``` ### **5.【卡空间常数的例子（减少无用节点）】** 传送门：[$\text{Cyclical Quest}$](https://www.luogu.com.cn/problem/CF235C) [$\text{[CF235C]}$](codeforces.com/problemset/problem/235/C) 给出主串 $S$ 和 $n$ 个询问串。对于每个询问串，求出它的所有循环同构在主串中的出现次数总和。做法见 [题解 $\text{by asuldb}$](https://www.luogu.com.cn/blog/asuldb/solution-cf235c) 。由于是暴力非正解，需要疯狂卡空间，如果使用在线做法不加特判 $1,2$（即之前列举出来的盗版做法）会[喜获 $\text{MLE}$](https://codeforces.com/contest/235/submission/82583691) 。加了特判但不处理无用节点 $z$ 可以 [以 $476Mb$ 的好成绩 $\text{AC}$](https://codeforces.com/contest/235/submission/82583743)。使用最终版代码当然[也可以过](https://codeforces.com/contest/235/submission/86714342)，但多用了一丢丢空间，或许是评测姬波动？可 [$\text{CF666E}$](https://www.luogu.com.cn/problem/CF666E) 亦是如此。可能.....无用 $z$ 的个数比较少吧....... ## **六：【后记】** 初学时我在网上找了很久（当时傻乎乎的，看不懂论文），只发现了一篇细讲广义 $\text{SAM}$ 复杂度和正确性的博客（也就是[这个](https://www.luogu.com.cn/paste/3oq8xx3b)），所以无条件相信了里面写的所有东西，并凭借本篇博客又误导了许多其他初学者，深感惭愧。我们嘤该学会独立思考，不要盲目相信别人博客里写的东西啊......（咳咳，本篇也不一定完全正确，若发现有误希望及时指正） ## **七：【参考文献】** - [后缀自动机（$\text{SAM}$）学习笔记](https://www.cnblogs.com/zjp-shadow/p/9218214.html#autoid-4-1-0) - [广义后缀自动机——$\text{HocRiser}$](https://www.cnblogs.com/HocRiser/p/9580478.html) - [广义后缀自动机——饕餮传奇](https://www.luogu.com.cn/paste/3oq8xx3b) - [诸神眷顾的幻想乡题解 $\text{by zcysky}$](https://www.luogu.com.cn/blog/zcysky/solution-p3346) - [$\text{Cyclical Quest}$ 题解 $\text{by asuldb}$](https://www.luogu.com.cn/blog/asuldb/solution-cf235c) - [《后缀自动机在字典树上的拓展》——刘研绎](https://files-cdn.cnblogs.com/files/Xing-Ling/%E5%9B%BD%E5%AE%B6%E9%9B%86%E8%AE%AD%E9%98%9F2015%E8%AE%BA%E6%96%87%E9%9B%86.rar) - 特别感谢 [$\text{Prean}$](https://www.luogu.com.cn/user/160839) 不厌其烦地协助思考证明。