## Building a Finite State Machine Using DFA::Simple

http://www.perl.com/pub/2004/09/23/fsms.html

Building a Finite State Machine Using DFA::Simple By Bill Ruppert on September 23, 2004 12:00 AM I am converting some articles from MS Word to HTML by hand. I often use bulleted outlines so I face a lot of work creating lists with nested sub-lists. It didn’t take opening and closing many

and

• tags to realize I wanted to automate the task.

It’s very easy to use filters with my text editor and I’ve written several in Perl to speed up my HTML formatting. I decided to create a filter to handle this annoying chore. After a little thought I decided the logic involved was just tricky enough to use a finite state machine (FSM) to keep things straight.

Large programs always require some thought and planning while most small programs can be written off the top of the head. Once in a while I run into a small problem with just enough complexity that I know “off the top” will quickly become “not so quick, very dirty, and probably wrong.” Finite state machines can help to solve these knotty problems cleanly and correctly.

There have been several times I wanted to use an FSM but didn’t because I lacked a good starting skeleton. I would do it some other way or hand code an FSM using a loop, nested if statements, and a state variable. This ad hoc approach was tedious and unsatisfying.

I knew there had to be a Perl module or two implementing FSMs and decided it was time to add this tool to my toolbox.

What Are Finite State Machines? A finite state machine is a set of states, one of which is the starting state. Each state has a set of transitions to other states, based on the next input token. A transition has a condition and an action. If the condition is met, the FSM performs the action and then enters the new state. This cycle repeats until reaching the end state.

Finite state machines are important in the theory of computation. A Wikipedia article on finite state machines goes into the computer science aspect. FSMs are valuable because it is possible to identify precisely what problems they can and cannot solve. To solve a new class of problems, add features to create related machines. In turn, you can characterize problems by the kind of machine needed to solve it. Eventually you work up to a Turing machine, the most theoretically complete computing device. It is a finite state machine attached to a tape with potentially infinite capacity.

The formal and highly abstract computer science approach to FSMs does not hint at their usefulness for solving practical problems.

The Bubble Diagram Figure 1 shows the parts of a finite state machine. The circles represent states. A double circle indicates the start state. The arcs represent the transitions from one state to another. Each arc label is a condition. The slashes separate actions.

Finite state machine elements Figure 1. Finite state machine elements.

I like FSMs because they are graphical. I can grab a pen and sketch the solution to a problem. The bubble diagram helps me see the combinations of states and input conditions in a logical and organized way.

Hand drawn finite state bubble diagram Figure 2. Initial rough diagram.

Figure 2 is the initial diagram I drew for the HTML list problem. The diagram turned out to be have an omission — it does not handle blank lines. Having discovered this bug, it was easy to make sure to consider blank lines in every state.

Final finite state diagram Figure 3. The final diagram.

Figure 3 is a formal diagram of the final FSM, complete with blank-line handling, prepared for this article. Normally, I would never be so elaborate for such a small problem. You can see that it is an excellent way to communicate a solution to others.

Looking at DFA::Simple I have yet to look for a Perl module to do something and come up totally empty handed. That’s absolutely my favorite thing about Perl.

I looked for existing FSM modules and found DFA::Command and DFA::Simple. DFA::Command looks good but was more than I needed. (By the way, the simplest FSM is also a “deterministic finite automata,” thus the DFA in the module names.)

The “simple” in DFA::Simple sounded promising, but the module’s documentation is poor. The sample program is difficult to follow and does not process input from a file. Using the module in the real world is left as an exercise for the reader: the docs do not mention handling errors and the end-of-file condition.

While FSMs start as circles and arcs, they end up as tables of states and transitions. My interest was in exactly how the packages created these tables. Having clean, easy-to-look-at code is very important to me. I’m probably the one who’ll have to change it someday!

I was very unhappy that DFA::Simple defines its sample FSM in a position-dependent way. The state being defined depends upon its textual position in the source. It uses numeric constants for state IDs within the tables and you must synchronize them with any changes. Comments indicate which state number you think you are working on, but they could easily unsynchronize as you add, drop, and move states.

To clarify, I saw this:

my \$states = [ # State 0 [ …jump to state 2… ], # State 1 [ …jump to state 0… ], # State 2 [ …jump to state 1… ], # … ];

But I wanted to see something like this:

This was a big negative. I have run into enough bugs caused by position-dependency that I refused to consider using this code. I almost decided to write my own module, but first went back and took a last look at DFA::Simple to see if there was any way I could make it work.

I realized that an FSM use symbolic constants for state definitions. Array references using these symbolic constants would provide the self-documenting and maintainable approach I demanded. It was easy to develop a way to process input (handling errors and EOF) that works in most cases. I put it together, gave it a try, and it worked!

The Input and Output The input comes from copying an MS Word document and pasting it into my text editor. My bulleted outlines look like this:

o Major bullet points – each line is a point – a leading “o ” is removed o Minor bullet points – indicated by a leading “- ” – may not start a list – only one level of minor points is supported o Future enhancements – numbering of points – handling of paragraphs

o Start a new group of points – a blank line indicates end of old group – old group is closed, new one opened

This outline actually documents key features of the program.

A correctly nested HTML list must come entirely within the parent list item. This is what made the filter tricky enough that I wanted to use a finite state machine. When you open a top-level list element you have to wait for the next input to see if you should close it or start a sub-list. When you look at an input, you have to know if you have an open list element pending. The current state and the input determine what to do next, exactly fitting the finite state machine model.

A snippet of output:

• Major bullet points
• Each line is a point
• A leading “o ” is removed

Not being an HTML expert, I initially tried to insert my sub-list between two higher-level list elements. The page rendered correctly but would not validate, revealing my error and starting this project.

The Parts of the Machine The first step in building the machine was deciding how to represent state names symbolically rather than numerically. The natural choice is the constant pragma.

use constant { stStart => 0, stItem => 1, stSubItem => 2, stEnd => 3, };

I defined the machine as two arrays. The first is @actions defining what to do when entering and exiting a state. You might use it to initialize a data structure on entry and save it on exit, or allocate resources on entry and de-allocate them on exit.

Each element of @actions is a two-element array. The first is the enter action and the second the exit action. My program logs a message on entry and leaves the exit action undefined.

my @actions;

# log entry into each state for debugging \$actions[stStart] = [sub{print LOG “Enter Startn”;} ]; \$actions[stItem] = [sub{print LOG “Enter Itemn”;} ]; \$actions[stSubItem] = [sub{print LOG “Enter SubItemn”;}]; \$actions[stEnd] = [sub{print LOG “Enter Endn”;} ];

Each element of the second array, @states, is an array of transitions for a state. A transition has three elements: the new state; the test condition; and the action. The condition is a subroutine returning true or false, or undef, which always succeeds.

The machine tests the transitions of the current state in sequence until a condition returns true or is undefined. Then it takes the action and enters the new state. If the new state is different than the old one, the respective exit and entry actions execute.

As with the final else in a nested if statement, consider having an undef at the end of the transitions. If nothing else, log a “can’t happen” error message to catch holes in your logic as well as holes caused by future changes.

Here is the transition array for a state:

my @states;

# Next State, Test, Action

\$states[stSubItem] = [ [stEnd, sub{\$done}, sub{end_sublist; end_list} ], [stSubItem, sub{/^-s/}, sub{output_subitem} ], [stStart, sub{/^\$/}, sub{end_sublist; end_list} ], [stItem, undef, sub{end_sublist; start_item}], ];

This uses the symbolic constants for state IDs in both tables. The states’ numerical values are irrelevant. I followed convention and numbered the states consecutively from zero, but I tested the program using random state numbers. It worked fine.

This is much better than the definition of the sample FSM in DFA::Simple. Now adding, deleting, and shuffling states will not introduce position-dependent bugs.

Final Assembly The parts of the machine are built. Assembling them is easy:

use DFA::Simple

my \$fsm = new DFA::Simple @actions, @states; \$fsm->State(stStart);

This creates a new DFA::Simple object using references to the @actions and @states arrays. The State method retrieves or sets the machine’s state, invoking entry and exit routines as needed. Here, the call sets the initial state.

Putting the Machine in Motion The major goal of this project was to develop a skeleton to develop FSMs quickly, with an emphasis on filters. I wanted to develop a way of handling file input and processing errors that would work for future projects.

This loop accomplishes that:

while (<>) { chomp;

# trim input and get rid of leading “o ” s/^s+//; s/s+\$//; s/^os+//;

# process this line of input \$fsm->Check_For_NextState();

# get out if an error occurred last if \$error; }

The while (<>) loop is standard for filters. The application cleans up input as needed. In this case, it chomps each line and trims and strips it of some text bullet characters.

The log message is for initial debugging. This, along with the new state message, gives a clear picture of what is happening.

The most important statement is \$fsm->Check_For_NextState();, which does the major work — scanning the current state’s transition table to see what to do next.

The FSM must be able to exit early if it spots an error. This requires the use of a global \$error switch. It starts out as false. If \$error is true, the program ends and returns the value of \$error.

End-of-file handling also uses a global switch, \$done. It is initially false and changes to true when the program exhausts standard input. One last cycle runs to give the machine a chance to clean up. Transition conditions checking for EOF can simply return \$done.

Here is the end-of-file code:

unless (\$error) { # finish up \$done = 1; \$fsm->Check_For_NextState(); }

The program does not support early normal termination. You can enter a state with no transitions and the program will read (but ignore) any remaining input. However, there is no way to avoid reading all of the input without setting \$error. You can implement early exit by adding a \$exit switch and handling it similarly to \$error.

Appropriately for filters, this program is line oriented. Many FSMs read input a character at time: lexical scanners and regular expression parsers, for example. This program could easily change to handle character input.

The Action Routines Now that the machine is purring along, it needs to do something. This is the purpose of the actions. Those in my program are typical of a finite state machine. They invoke small subroutines that form a mini-language for building lists with sub-lists. The subroutines are very simple, but running them in the right order does the job.

Thus, this approach breaks a complex problem into smaller pieces, each of which is simply solvable. This is what programming is all about.

Conclusion The next time you have a knotty little problem like this one, try sketching a solution using a bubble diagram and implementing it with DFA::Simple. You’ll have added a great new tool to your toolbox.

Resources * Zipped source code of the final program * Finite State Machines at Wikipedia * Turing Machines at Wikipedia * DFA::Simple at CPAN * DFA::Command at CPAN * Finite State Technology at Xerox Research Centre Europe * Finite State Utilities in C++ by Jan Daciuk

## String and Array Creation

Image via Wikipedia

This is the fourth part of a nine-part article on famous Perl one-liners. In this part I will create various one-liners for string and array creation. See part one for introduction of the series.

Famous Perl one-liners is my attempt to create “perl1line.txt” that is similar to “awk1line.txt” and “sed1line.txt” that have been so popular among Awk and Sed programmers.

The article on famous Perl one-liners will consist of nine parts:

I decided that there will be two new parts in this series. The most powerful feature in Perl is its regular expressions, therefore I will write a part on “Handy Perl regular expressions.” I also decided to publish an e-book after I am done with the series, so that will be the last part of this series. Everyone who’s subscribed to my blog will get a free copy! Subscribe to my blog now!

I also updated the previous part on calculations with 14 new one-liners on finding values of constants pi and e, doing date calculations, finding factorial, greatest common divisor, least common multiple, generating random numbers, generating permutations, finding power sets and doing some IP address conversions.

Here are today’s one-liners:

## String Creation and Array Creation

49. Generate and print the alphabet.

` perl -le \'print a..z\'`

This one-liner prints all the letters from a to z as abcdefghijklmnopqrstuvwxyz. The letters are generated by the range operator ... The range operator, when used in the list context (which is forced here by print) on strings, uses the magical auto-increment algorithm that advances the string to the next character. So in this one-liner the auto-increment algorithm on the range a..z produces all the letters from a to z.

I really golfed this one-liner. If you used strict it would not work because of barewords a and z. Semantically more correct version is this:

` perl -le \'print (\"a\"..\"z\")\'`

Remember that the range operator .. produced a list of values. If you wish, you may print them comma separated by setting the \$, special variable:

` perl -le \'\$, = \",\"; print (\"a\"..\"z\")\'`

There are many more special variables. Take a look at my special variable cheat sheet for a complete listing.

50. Generate and print all the strings from “a” to “zz”.

` perl -le \'print (\"a\"..\"zz\")\'`

Here the range operator .. is used again. This time it does not stop at “z” as in the previous one-liner, but advances z by one-character producing “aa”, then it keeps going, producing “ab”, “ac”, …, until it hits “az”. At this point it advances the string to “ba”, continues with “bb”, “bc”, …, until it reaches “zz”.

Similarly you may generate all strings from “aa” to “zz” by:

` perl -le \'print \"aa\"..\"zz\"\'`

Here it goes like “aa”, “ab”, …, “az”, “ba”, “bb”, …, “bz”, “ca”, … “zz”.

51. Create a hex lookup table.

` @hex = (0..9, \"a\"..\"f\")`

Here the array @hex gets filled with values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and letters a, b, c, d, e, f.

You may use this array to convert a number (in variable \$num) from decimal to hex using base conversion formula:

` perl -le \'\$num = 255; @hex = (0..9, \"a\"..\"f\"); while (\$num) { \$s = \$hex[(\$num%16)&15].\$s; \$num = int \$num/16 } print \$s\'`

52. Generate a random 8 character password.

` perl -le \'print map { (\"a\"..\"z\")[rand 26] } 1..8\'`

Here the map function executes (\"a\"..\"z\")[rand 26] code 8 times (because it iterates over the dummy range 1..8). In each iteration the code chooses a random letter from the alphabet. When map is done iterating, it returns the generated list of characters and print function prints it out by concatenating all the characters together.

If you also wish to include numbers in the password, add 0..9 to the list of characters to choose from and change 26 to 36 as there are 36 different characters to choose from:

` perl -le \'print map { (\"a\"..\"z\", 0..9)[rand 36] } 1..8\'`

If you need a longer password, change 1..8 to 1..20 to generate a 20 character long password.

53. Create a string of specific length.

` perl -le \'print \"a\"x50\'`

Operator x is the repetition operator. This one-liner creates a string of 50 letters “a” and prints it.

If the repetition operator is used in list context, it creates a list (instead of scalar) with the given elements repeated:

` perl -le \'@list = (1,2)x20; print \"@list\"\'`

This one liner creates a list of twenty repetitions of (1, 2) (it looks like (1, 2, 1, 2, 1, 2, …)).

54. Create an array from a string.

` @months = split \' \', \"Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec\"`

Here the @months gets filled with values from the string containing month names. As each month name is separated by a space, the split function splits them and puts them in @months. This way \$months[0] contains “Jan”, \$months[1] contains “Feb”, …, and \$months[11] contains “Dec”.

55. Create a string from an array.

` @stuff = (\"hello\", 0..9, \"world\"); \$string = join \'-\', @stuff`

Here the values in array @stuff get turned in a string \$string that has them separated by a hyphen. Turning an array in a string was done by the join function that takes a separator and a list, and concatenates the items in the list in a single string, separated by the separator.

56. Find the numeric values for characters in the string.

` perl -le \'print join \", \", map { ord } split //, \"hello world\"\'`

This one-liner takes the string “hello world”, splits it into a list of characters by split //, \"hello world\", then it maps the ord function onto each of the characters, which returns the numeric, native 8-bit encoding (like ASCII or EBCDIC) of the character. Finally all the numeric values get joined together by a comma and get printed out.

Another way to do the same is use the unpack function and specify C* as the unpacking template (C means unsigned character and * means as many characters there are):

` perl -le \'print join \", \", unpack(\"C*\", \"hello world\")\'`

57. Convert a list of numeric ASCII values into a string.

` perl -le \'@ascii = (99, 111, 100, 105, 110, 103); print pack(\"C*\", @ascii)\'`

Just as we unpacked a string into a list of values with the C* template in the one-liner above, we can pack them back into a string.

Another way to do the same is use the chr function that takes the code point value and returns the corresponding character:

` perl -le \'@ascii = (99, 111, 100, 105, 110, 103); print map { chr } @ascii\'`

Similar to one-liner #55 above, function chr gets mapped onto each value in the @ascii producing the characters.

58. Generate an array with odd numbers from 1 to 100.

` perl -le \'@odd = grep {\$_ % 2 == 1} 1..100; print \"@odd\"\'`

This one-liner generates an array of odd numbers from 1 to 99 (as 1, 3, 5, 7, 9, 11, …, 99). It uses the grep function that evaluates the given code \$_ % 2 == 1 for each element in the given list 1..100 and returns only the elements that had the code evaluate to true. In this case the code tests if the reminder of the number is 1. If it is, the number is odd and it has to be put in the @odd array.

59. Generate an array with even numbers from 1 to 100.

` perl -le \'@even = grep {\$_ % 2 == 0} 1..100; print \"@even\"\'`

This is almost the same as the previous one-liner, except the condition grep tests for is “is the number even (reminder dividing by 2 is zero)?”

60. Find the length of the string.

` perl -le \'print length \"one-liners are great\"\'`

Just for completeness, the length subroutine finds the length of the string.

61. Find the number of elements in an array.

` perl -le \'@array = (\"a\"..\"z\"); print scalar @array\'`

Evaluating an array in a scalar context returns the number of elements in it.

## Have Fun!

Have fun with these one-liners for now. The next part is going to be about text conversion and substitution.

Can you think of other string creating and array creation one-liners that I didn’t include here?

## Script to get the number of events from the logs.

I was trying to do some log analysis and finding the events in the logs. For this the logs had the Events logged as \”|+Event name|\” or with sending and receiving. So I wrote this little script to take care of my requirements.

First you would need to change the pattern in the bold red and make sure that all special characters like | or + need to be escaped with \”\”. So here\’s the script:

#!/usr/bin/perl
#===============================================================================
#
#         FILE:  log_counters.pl
#
#        USAGE:  ./log_counters.pl log_filename
#
#  DESCRIPTION:  Get all the counters for the messages from the logs.
#
#      OPTIONS:  —
# REQUIREMENTS:  —
#         BUGS:  —
#        NOTES:  —
#       AUTHOR:  Amit Agarwal (amit.agarwal@amit-agarwal.co.in),
#      COMPANY:
#      VERSION:  1.0
#      CREATED:  01/15/2010 11:27:49 AM
#     REVISION:  —
#===============================================================================
open (FILE, “<\$ARGV[0]“);

@names = `grep|+’ \$ARGV[0]|sed -e ’s#.*|+##’ -e ’s#|.*##’|sort |uniq`;

%patterns=(’Received :’ => 0, ‘Sending :’ => 0);
for \$j (@names) {

chomp(\$j);
\$temp = “\\+\$j\\|”;

\$patterns{\$temp} = 0;
}
@keys = keys %patterns;

while (\$line = ) {
chomp(\$line);
for \$i  (keys %patterns)
{

if (\$line =~ m/\$i/){

\$patterns{\$i} += 1;

}

}
}
for \$j (sort keys %patterns){
printf “%40s ttt==> \$patterns{\$j}n”, \$j;
}
close(FILE);

Sample output: