I added a little more code to the cmph-bench project on GitHub. It’s described in the previous post, cmph-bench – A Benchmark for Minimal Perfect Hashing in C. The benchmark program now randomizes the order of the keys before looking them up, and it has a -c option to output the results as a row of comma-separated values suitable for storing in a spreadsheet.
The new program cmph-graph.py runs the benchmark on ten data sets ranging in size from 1,000,000 to 10,000,000 keys. You can then take the CSV it outputs and graph it:
This graph was constructed from data generated using the CHD algorithm.
For each number of keys, the graph shows the times, in microseconds, to build the MPH, load it, and look up all the keys in random order. Load times are unobservably short. Build times start out below a microsecond per key but slowly grow past that ratio, though the growth appears to be linear. Lookup times follow a similar curve, though slightly faster.
The number of bytes needed to build the structure and the size of the structure itself are virtually equal. Both are roughly half the number of keys, meaning each key requires about half a byte of overhead. Not bad.