12/18/2007

string searching algorithm

Now busy rebuilding C/C++ and algorithm skills। Three more years from this filed, so, in several weeks, just wish to recover 60+% level of what I feel on 2003.

Just now, met a string search, say, "The problem is to search for a pattern string, pat[1॥m], in a text string txt[1..n]। " In university textbooks, it was described as a Boyer-Moore algorithm, which may be the best method. basically, it has two ideas.

The first one, also the key one, is if the mth character is not the end character of our searching pattern string, then the 1 to m characters could just be omitted। then we check next m characters।

The second idea is a supplemental, depends on which the last character is, shift specilized positions*, then check again as idea 1, if not move m positions, if yes, check if a matched pattern found। There is a need for a logic to calculate the shift table.

I believe it is the best idea। though even from the time I learned it, I hate to count the characters one by one। I prefer some direct, maybe "rude" ones. I'm a lazy boy;)

One of this kind of algorithm is Rabin's. Very simple, for the pattern string, C1C2...Cm, I caculate their weighted sum, Sm=C1+C2*3+C3*3**2+...+Cm*3**(m-1). and then... you know that?:)
We're writing C/C++ code and we would always need to make some decisions for the algorithm choice। For this kind of problem, they are aleady in the library. but if we meet with a new type question?

I prefer to the simple one। If no special performance demands, I would always choose a algorithm like Rabin's. My excuse:

1। Easy to implement।
2।Easy to understand and maintain। Our code will be maintained by somebody else, too complex and too gorgeous solution will bring big trouble to the tester and programmer who has to review your code.