Doing some work with timing different algorithms, however my brute force implementation which I have found numerous times on different sites is sometimes returning more results than, say, Notepad++ search or VSCode search. Not sure what I am doing wrong.
The program opens a txt file with a DNA strand string of length 10000000 and searches and counts the number of occurrences of the string passed in via command line.
Algorithm:
int main(int argc, char *argv[]) {
// read in dna strand
ifstream file("dna.txt");
string dna((istreambuf_iterator<char>(file)), istreambuf_iterator<char>());
dna.c_str();
int dnaLength = dna.length();
cout << "DNA Strand Length: " << dnaLength << endl;
string pat = argv[1];
cout << "Pattern: " << pat << endl;
// algorithm
int M = pat.length();
int N = dnaLength;
int localCount = 0;
for (int i = 0; i <= N - M; i++) {
int j;
for (j = 0; j < M; j++) {
if (dna.at(i + j) != pat.at(j)) {
break;
}
}
if (j == M) {
localCount++;
}
}
The difference might be because your algorithm also counts overlapping results, while a quick check with Notepad++ shows that it does not. Example:
Let dna be "FooFooFooFoo"
And your pattern "FooFoo"
What result do you expect? Notepad++ shows 2 (one starts at position 1, the second at position 7 (after the first).
Your algorithm will find 3 (position 1, 4 and 7)
Yes this is the reason. Thank you. I need to walk through and figure out how to fix that. Although, I may actually think that it's better to have the overlap.