benchmarking body checksums

Leandro Santi
Mon Feb 17 07:50:55 UTC 2003


I've been running some tests with dccproc to measure the performance
of the body checksums. The perl script I used reads a list of files
from stdin and runs several dccproc flavors on each message. The
output consists of one line per input message. Each line has three
column groups: performance ratio, user cpu time, and checksumming
speed. For example, this line,

      ratio          cpu time (user)     checksumming speed
1.00x 0.66x 0.51x | 0.12s 0.08s 0.06s | 2.4MB/s 3.6MB/s 4.6MB/s

tells us that the first dccproc flavor processed the message
using 0.12s of cpu at ~2.4 mbyte/s; the second dccproc flavor
processed the same message using 66% of the time (0.08s, 3.6MB/s) and
the third one in 51% of the original time (0.06s, 4.6MB/s).
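
The arithmetic behind such a line can be sketched in a few lines of
Python. This is only an illustration, not the perl script itself: the
function name, the choice of 1 MB = 2**20 bytes, and the sample numbers
are all assumptions.

```python
def format_line(size_bytes, cpu_times):
    """Build one report line from a message size and per-flavor user CPU times.

    cpu_times[0] is the baseline flavor; every ratio is relative to it.
    """
    base = cpu_times[0]
    ratios = ["%.2fx" % (t / base) for t in cpu_times]
    times = ["%.2fs" % t for t in cpu_times]
    # Assumption: 1 MB = 2**20 bytes; the original script may use 10**6.
    speeds = ["%.1fMB/s" % (size_bytes / t / 2**20) for t in cpu_times]
    return " | ".join(" ".join(col) for col in (ratios, times, speeds))


# Hypothetical 1 MiB message checksummed in 0.50s, 0.25s and 0.20s.
print(format_line(2**20, [0.50, 0.25, 0.20]))
# prints: 1.00x 0.50x 0.40x | 0.50s 0.25s 0.20s | 2.0MB/s 4.0MB/s 5.0MB/s
```

Recomputing the ratios from the two-decimal times shown in a report
line will not always reproduce the printed ratios exactly (0.08/0.12 is
about 0.67, not 0.66), presumably because the ratios are derived from
the unrounded averages.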

Each one of these numbers is actually an averaged result of multiple
runs over the same message, discarding the first run in order to
warm up the fs cache.
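
A minimal sketch of that measurement loop, in Python rather than the
original perl, might look like the following. The function names and
the default of five runs are assumptions, and the command line to time
is left to the caller because the flags used for dccproc are not stated
here. It relies on os.times() reporting the user CPU time of finished
child processes, which works on Unix-like systems.

```python
import os
import subprocess


def user_cpu_of(argv, stdin_path):
    """Run argv once with stdin_path on stdin; return the child's user CPU seconds."""
    before = os.times().children_user
    with open(stdin_path, "rb") as msg:
        subprocess.run(argv, stdin=msg, stdout=subprocess.DEVNULL, check=True)
    return os.times().children_user - before


def mean_after_warmup(samples):
    """Average the samples, ignoring the first (cache warm-up) run."""
    kept = samples[1:]
    if not kept:
        raise ValueError("need at least two runs")
    return sum(kept) / len(kept)


def benchmark(argv, stdin_path, runs=5):
    """Time `runs` invocations and average all but the first."""
    return mean_after_warmup([user_cpu_of(argv, stdin_path) for _ in range(runs)])
```

Dropping the first sample keeps the cost of pulling the message from
disk out of the average, so the remaining runs mostly reflect CPU work.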

Test machine is an old PII/350 (2 way) with 128 MB of RAM, running
a locally hacked redhat distro.

I built several dccproc binaries,

0) /usr/local/bin/dccproc
1) /home/leandro/dcc-dccd-1.1.27.O1/dccproc/dccproc
2) /home/leandro/dcc-dccd-1.1.27.O2/dccproc/dccproc
3) /home/leandro/dcc-dccd-1.1.27.icc/dccproc/dccproc
4) /home/leandro/dcc-dccd-1.1.27.icc_pentiumii/dccproc/dccproc
5) /home/leandro/dcc-dccd-1.1.27.icc_prof/dccproc/dccproc

0) "standard" dccproc (ie built using CC=gcc)
1) CC=gcc, CFLAGS=-O
2) CC=gcc, CFLAGS=-O2
3) CC=icc
4) CC=icc CFLAGS=-march=pentiumii (ie using hardware vectorization)
5) CC=icc, built using ICC's profile-guided optimization capability

Note: gcc is an old 2.91.66, icc is at 7.0.something. The DCC package
is at 1.1.27.


In order to minimize startup time influence and to actually measure
body checksumming speed, every one of the 95 messages is 200k or bigger.

0) scored ~2.3 MB/s on average
1) ran in 65% of 0's time on average, at ~3.6 MB/s
2) 69%, at ~3.3 MB/s
3) 53%, at ~4.3 MB/s
4) 54%, at ~4.3 MB/s
5) 50%, at ~4.6 MB/s (twice as fast! :-)

Script output:

WRT the Intel compiler: compiling the DCC with icc is not supported,
I think. I only checked the MD5 outputs of each message and compared
them against the standard dccproc, and they matched. The icc build
still needs some work to fix many warnings and, of course, to check
that everything else is working as expected.
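
That kind of cross-check could be automated along these lines. This is
a hedged sketch, not what was actually run: the helper names are made
up, and it assumes each command is invoked so that it prints only the
checksums for the message it reads on stdin (the flags needed for that
are not shown and are left to the caller).

```python
import subprocess


def run_on(argv, message_path):
    """Feed one message to argv on stdin and return its raw stdout."""
    with open(message_path, "rb") as msg:
        return subprocess.run(
            argv, stdin=msg, stdout=subprocess.PIPE, check=True
        ).stdout


def mismatches(reference_argv, candidate_argv, message_paths):
    """Return the messages for which the two builds print different output."""
    return [
        path
        for path in message_paths
        if run_on(reference_argv, path) != run_on(candidate_argv, path)
    ]
```

An empty list from mismatches() for the whole message set would only
show that the checksums agree; it says nothing about the rest of the
package.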


ps: Separately, it's interesting to see that 1.1.27 is sometimes faster,
sometimes slower than 1.1.11:

1.00x 0.92x | 0.10s 0.09s | 2.2MB/s 2.4MB/s
1.00x 1.03x | 0.91s 0.93s | 2.3MB/s 2.2MB/s
1.00x 0.92x | 0.52s 0.48s | 2.8MB/s 3.0MB/s
1.00x 0.98x | 0.13s 0.13s | 2.3MB/s 2.3MB/s
1.00x 0.99x | 0.89s 0.88s | 2.4MB/s 2.4MB/s
1.00x 0.91x | 0.09s 0.09s | 2.1MB/s 2.3MB/s
1.00x 1.02x | 0.26s 0.27s | 2.4MB/s 2.3MB/s
1.00x 0.99x | 0.14s 0.14s | 2.3MB/s 2.3MB/s
1.00x 0.97x | 0.16s 0.15s | 2.3MB/s 2.4MB/s
1.00x 0.99x | 0.18s 0.18s | 2.3MB/s 2.3MB/s
1.00x 0.99x | 0.17s 0.17s | 2.3MB/s 2.4MB/s
1.00x 1.00x | 0.17s 0.17s | 2.3MB/s 2.3MB/s
1.00x 0.99x | 0.16s 0.16s | 2.3MB/s 2.4MB/s
1.00x 1.01x | 0.15s 0.15s | 2.3MB/s 2.3MB/s
