CMPH Benchmark Bar Chart

I added a little more code to the cmph-bench project in GitHub. It’s described in the previous post, cmph-bench – A Benchmark for Minimal Perfect Hashing in C. The bench mark program now randomizes the order of the keys before looking them up, and it has a -c option to output the results as a row of comma separated values suitable for storing in a spreadsheet.

The new program cmph-graph.py runs the benchmark on ten data sets ranging in size from 1,000,000 to 10,000,000. You can then take the CSV it outputs and graph it:

chd-graph

This graph was constructed using data generated from the CHD algorithm.

For each number of keys, the times to build, load, and lookup all the keys in random order in the resulting MPH, are in microseconds. Load times are unobservably short. Build times start out at less than a microsecond per key, but slowly grow past that ratio, though the degradation appears to be linear. Lookup times are similar, though slightly faster.

The number of bytes needed to build the structure and the size of the structure itself are virtually equal. They are roughly half the number of keys, meaning each key requires about a half a byte of overhead. Not bad.

 

Advertisements

About jimbelton

I'm a software developer, and a writer of both fiction and non-fiction, and I blog about movies, books, and philosophy. My interest in religious philosophy and the search for the truth inspires much of my writing.
This entry was posted in c programming, python programming and tagged , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s