44 lines
1.6 KiB
Text
44 lines
1.6 KiB
Text
|
This is a patched version of zlib modified to use
|
||
|
Pentium-optimized assembly code in the deflation algorithm. The files
|
||
|
changed/added by this patch are:
|
||
|
|
||
|
README.586
|
||
|
match.S
|
||
|
|
||
|
The effectiveness of these modifications is a bit marginal, as the the
|
||
|
program's bottleneck seems to be mostly L1-cache contention, for which
|
||
|
there is no real way to work around without rewriting the basic
|
||
|
algorithm. The speedup on average is around 5-10% (which is generally
|
||
|
less than the amount of variance between subsequent executions).
|
||
|
However, when used at level 9 compression, the cache contention can
|
||
|
drop enough for the assembly version to achieve 10-20% speedup (and
|
||
|
sometimes more, depending on the amount of overall redundancy in the
|
||
|
files). Even here, though, cache contention can still be the limiting
|
||
|
factor, depending on the nature of the program using the zlib library.
|
||
|
This may also mean that better improvements will be seen on a Pentium
|
||
|
with MMX, which suffers much less from L1-cache contention, but I have
|
||
|
not yet verified this.
|
||
|
|
||
|
Note that this code has been tailored for the Pentium in particular,
|
||
|
and will not perform well on the Pentium Pro (due to the use of a
|
||
|
partial register in the inner loop).
|
||
|
|
||
|
If you are using an assembler other than GNU as, you will have to
|
||
|
translate match.S to use your assembler's syntax. (Have fun.)
|
||
|
|
||
|
Brian Raiter
|
||
|
breadbox@muppetlabs.com
|
||
|
April, 1998
|
||
|
|
||
|
|
||
|
Added for zlib 1.1.3:
|
||
|
|
||
|
The patches come from
|
||
|
http://www.muppetlabs.com/~breadbox/software/assembly.html
|
||
|
|
||
|
To compile zlib with this asm file, copy match.S to the zlib directory
|
||
|
then do:
|
||
|
|
||
|
CFLAGS="-O3 -DASMV" ./configure
|
||
|
make OBJA=match.o
|