This directory contains some examples illustrating techniques for extracting
high performance from flex scanners. Each program implements a simplified
version of the Unix "wc" tool: read text from stdin and print the number of
characters, words, and lines present in the text. All programs were compiled
using gcc (version unavailable, sorry) with the -O flag, and run on a
SPARCstation 1+. The input used was a PostScript file, mainly containing
figures, with the following "wc" counts:
         lines     words   characters
        214217    635954      2592172
The basic principles illustrated by these programs are:
- match as much text with each rule as possible
- adding rules does not slow you down!
- avoid backing up
and the big caveat that comes with them is:
- you buy performance with decreased maintainability; make
sure you really need it before applying the above techniques.
See the "Performance Considerations" section of flexdoc for more
details regarding these principles.
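
As a rough illustration of the first two principles (a sketch written for
this note, not the text of any of the programs below): matching a whole
word or a whole run of blanks per rule means one action per token instead
of one per character, and the extra rules cost nothing at scan time since
flex compiles them all into a single DFA.

    %{
    #include <stdio.h>
    int chars = 0, words = 0, lines = 0;
    %}
    %option noyywrap
    %%
    [^ \t\n]+  { words++; chars += yyleng; }  /* a whole word at a time */
    [ \t]+     { chars += yyleng; }           /* a whole run of blanks */
    \n         { lines++; chars++; }
    %%
    int main(void)
    {
        yylex();
        printf("%8d %8d %8d\n", lines, words, chars);
        return 0;
    }

The per-character alternative would be a lone "." rule firing once per
input byte; the rule set above enters the action code far less often.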
The different versions of "wc":
    mywc.c  a simple but fairly efficient C version
    wc1.l   a naive flex "wc" implementation
    wc2.l   somewhat faster; adds rules to match multiple tokens at once
    wc3.l   faster still; adds more rules to match longer runs of tokens
    wc4.l   fastest; still more rules added; hard to do much better
            using flex (or, I suspect, hand-coding)
    wc5.l   identical to wc3.l except one rule has been slightly
            shortened, introducing backing-up
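
To give the flavor of the multiple-token trick (a guess at the general
technique, not the actual contents of wc2.l), a single rule can count a
word together with its trailing blanks or newline in one action:

    %{
    #include <stdio.h>
    int chars = 0, words = 0, lines = 0;
    %}
    %option noyywrap
    word  [^ \t\n]
    ws    [ \t]
    %%
    {word}+{ws}+  { words++; chars += yyleng; }  /* word plus its blanks */
    {word}+\n     { words++; lines++; chars += yyleng; }
    {word}+       { words++; chars += yyleng; }  /* also avoids backing up */
    {ws}+         { chars += yyleng; }
    \n            { lines++; chars++; }
    %%
    int main(void)
    {
        yylex();
        printf("%8d %8d %8d\n", lines, words, chars);
        return 0;
    }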
Timing results (all times in user CPU seconds):
    program  time  notes
    -------  ----  -----
    wc1      16.4  default flex table compression (= -Cem)
    wc1       6.7  -Cf compression option
    /bin/wc   5.8  Sun's standard "wc" tool
    mywc      4.6  simple but better C implementation!
    wc2       4.6  as good as the C implementation; built using -Cf
    wc3       3.8  -Cf
    wc4       3.3  -Cf
    wc5       5.7  -Cf; ouch, backing up is expensive
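
The wc5 number shows what backing up costs. The effect is easy to
reproduce (a hedged sketch, not the actual difference between wc3.l and
wc5.l): if the only rule covering a word demands a trailing newline, then
on input like "word " the scanner runs past the word hoping to finish the
longer match, fails at the blank, and must back up and rescan.

    %{
    #include <stdio.h>
    %}
    %option noyywrap
    word  [^ \t\n]
    %%
    {word}+\n  ECHO;  /* accepting only once the newline is seen */
    {word}+    ECHO;  /* delete this rule and the scanner backs up
                         from the blank and rescans the word */
    .|\n       ECHO;
    %%
    int main(void)
    {
        yylex();
        return 0;
    }

Running flex with the -b option writes backing-up information to
lex.backup, which is the easy way to check that a scanner like the one
above is free of such states.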