This directory contains some examples illustrating techniques for extracting
high performance from flex scanners.  Each program implements a simplified
version of the Unix "wc" tool: read text from stdin and print the number of
characters, words, and lines present in the text.  All programs were compiled
using gcc (version unavailable, sorry) with the -O flag, and run on a
SPARCstation 1+.  The input used was a PostScript file, mainly containing
figures, with the following "wc" counts:

	 lines	 words	characters
	214217	635954	   2592172

The basic principles illustrated by these programs are:

	- match as much text with each rule as possible
	- adding rules does not slow you down!
	- avoid backing up

and the big caveat that comes with them is:

	- you buy performance with decreased maintainability; make
	  sure you really need it before applying the above techniques.

See the "Performance Considerations" section of flexdoc for more
details regarding these principles.
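As a rough sketch of the first principle (this is illustrative only, not the
actual source of any of the wc*.l files): a naive scanner enters its action
code once per character or token, while a combined rule lets a single match
account for several counts at once.

```lex
%{
    /* Counters; a real scanner also needs a main() that calls
     * yylex() and then prints chars/words/lines. */
    int chars = 0, words = 0, lines = 0;
%}
%option noyywrap
%%
[^ \t\n]+\n	{ ++words; ++lines; chars += yyleng; }  /* word + newline in one match */
[^ \t\n]+	{ ++words; chars += yyleng; }
\n		{ ++lines; ++chars; }
.		{ ++chars; }
%%
```

Note that a rule like the first one can itself introduce backing up (the
third principle); the faster versions below add still more rules so that
longer runs are matched without the scanner ever having to rescan.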

The different versions of "wc":

	mywc.c	a simple but fairly efficient C version

	wc1.l	a naive flex "wc" implementation

	wc2.l	somewhat faster; adds rules to match multiple tokens at once

	wc3.l	faster still; adds more rules to match longer runs of tokens

	wc4.l	fastest; still more rules added; hard to do much better
		using flex (or, I suspect, hand-coding)

	wc5.l	identical to wc3.l except one rule has been slightly
		shortened, introducing backing-up
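The backing-up problem that bites wc5.l can be seen with a toy pair of rules
(a hypothetical sketch, not taken from wc5.l itself):

```lex
%%
	/* On input "foobaz" the scanner reads ahead hoping to complete
	 * "foobar", then must back up and rescan to return the "foo"
	 * match.  "flex -b" reports the states where backing up can
	 * occur; adding rules that cover the intermediate prefixes
	 * ("foob", "fooba") removes the need to back up. */
foo	{ /* short match */ }
foobar	{ /* longer match sharing the "foo" prefix */ }
%%
```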

Timing results (all times in user CPU seconds):

	program	 time	notes
	-------	 ----	-----
	wc1	 16.4	default flex table compression (= -Cem)
	wc1	  6.7	-Cf compression option
	/bin/wc	  5.8	Sun's standard "wc" tool
	mywc	  4.6	simple but better C implementation!
	wc2	  4.6	as good as C implementation; built using -Cf
	wc3	  3.8	-Cf
	wc4	  3.3	-Cf
	wc5	  5.7	-Cf; ouch, backing up is expensive