[hihocoder1877]Approximate Matching
/ / 阅读耗时 6 分钟题目描述
String matching, a common problem in DNA sequence analysis and text editing, is to find the occurrences of one certain string (called pattern) in a larger string (called text). In some cases, the pattern is not required to be exactly in the text, and minor differences are acceptable (due to possible typing mistakes). When given a pattern string and a text string, we say pattern P is approximately matched within text S, if there is a substring of S which is at most one letter different from P. Note that the length of this substring and the pattern must be identical. For example, pattern “abb” is approximately matched in text “babc” but not matched in “bbac”.
It is easy to check if a pattern is approximately matched in a text. So your task is to count the number of all text strings of length m in which the given pattern can be approximately matched, and both of the patterns and texts are binary strings in order not to handle big integers.
输入输出格式
输入格式
The first line of input is a single integer $T (1 ≤ T ≤ 666)$, the number of test cases. Each test case begins with a line of two integers $n,m (1 ≤ n,m ≤ 40)$, denoting the length of pattern string and text string. Then a single line of binary string P follows, which denotes the pattern. Note that there will be at most 15 test cases in which n ≥ 16.
输出格式
For each test case, output a single line with one integer, representing the answer.
输入输出样例
Sample input
5
3 4
110
4 7
1011
2 10
00
7 17
1001110
11 22
11101010001
Sample output
12
104
1023
72840
291544
题解
这题好啊。
首先这是一个DP都能看出来,但是想一想状转方程,必须记录后面子串的前$n$位信息才能转移,复杂度高达$O(m2^n)$,对这种数据量来说无论是空间还是时间都会爆。
于是考虑一下AC自动机(本题亮点),要求匹配的数量,只需要先求出不匹配的数量再用总数减去即可。
把这$n+1$个模式串丢进AC自动机,现在要求的就是在AC自动机上找到一种转移方案,使得经过的结点均不经过标记结点,于是就可以在AC自动机上DP了。这个题只需要裸的AC自动机,连fail树都不需要。
由于是二进制串,这里的AC自动机在结构上其实是一个01Trie树。复杂度应该是$O(mn^2)$。1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
using namespace std;
int trie[5000][2], ct, n, m, fail[5000];
bool is[5000];
char op[100];
queue<int> que;
long long dp[5000][50];
inline void insert(const char *s) {//建立01Trie树
int now = 1, i = 0;
while (s[i]) {
if (trie[now][s[i] - '0'])now = trie[now][s[i] - '0'];
else now = trie[now][s[i] - '0'] = ++ct;
++i;
}
is[now] = true;
}
long long DP(int x, int y) {//从x结点开始,不走标记结点,形成长度为y的串的方案数
if (dp[x][y] != -1)return dp[x][y];//记忆化
if (y == 1)return dp[x][y] = !is[x];//最后一步,取决于该结点是否被标记
if (is[x])return 0;
return dp[x][y] = DP(trie[x][0], y - 1) + DP(trie[x][1], y - 1);//向两个方向转移
}
int main() {
int T;
scanf("%d", &T);
while (T--) {
scanf("%d%d%s", &n, &m, op), ct = 1, memset(is, 0, sizeof(is)), memset(trie, 0, sizeof(trie));
memset(dp, -1, sizeof(dp)), insert(op);
for (int i = 0; op[i]; i++) {
op[i] = static_cast<char>(((op[i] - '0') ^ 1) + '0');
insert(op);
op[i] = static_cast<char>(((op[i] - '0') ^ 1) + '0');
}
if (trie[1][0])fail[trie[1][0]] = 1, que.push(trie[1][0]);
else trie[1][0] = 1;
if (trie[1][1])fail[trie[1][1]] = 1, que.push(trie[1][1]);
else trie[1][1] = 1;
while (!que.empty()) {//构造AC自动机,并产生失配指针
int f = que.front();
que.pop();
for (int i = 0; i < 2; i++) {
if (trie[f][i])fail[trie[f][i]] = trie[fail[f]][i], que.push(trie[f][i]);
else trie[f][i] = trie[fail[f]][i];
}
}
cout << (1ll << m) - DP(1, m + 1) << endl;
}
return 0;
}