Using a cmph as a Set

set-theoryGiven the extremely low overhead of the minimal perfect hashes generated by cmph (on the order of 1/2 a byte per key), I was fairly certain they would generate false positives. That is, if you searched for a key that isn’t in the hash, cmph would tell you it was. As I suspected, this is the case. This has big implications for using minimal perfect hashes as sets.

When you look up any key with cmph_search, it returns an index. To determine whether that index refers to the same key as the one you looked up, you need to keep an array of offsets. Assuming you have more than 65000 keys, that means you will have a 4 byte overhead per key. Once you have the offset, you then need to compare the key at that offset to the one you looked up.

Update (2017-08-30): I have verified that the indexes returned by cmph_search are random. This means you need to create the cmph, then walk back across it looking up each of the strings you loaded it with, and save the string’s offset at the returned index in your array of offsets.

The code to look up completely random keys in a cmph hash is now available in cmph-bench on GitHub. The next step in the benchmark will be to add the index and the keys to the algorithm, and to add the ability to benchmark other solutions, including binary searching and using a dynamic hash like the ones bench-marked in hash-table-shootout.

Advertisements

About jimbelton

I'm a software developer, and a writer of both fiction and non-fiction, and I blog about movies, books, and philosophy. My interest in religious philosophy and the search for the truth inspires much of my writing.
This entry was posted in c programming and tagged , , , , . Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s