Given the extremely low overhead of the minimal perfect hashes generated by cmph (on the order of 1/2 a byte per key), I was fairly certain they would generate false positives. That is, if you searched for a key that isn’t in the hash, cmph would tell you it was. As I suspected, this is the case. This has big implications for using minimal perfect hashes as sets.
When you look up a key with cmph_search, it returns an index. To determine whether that index refers to the same key as the one you looked up, you need to keep an array of offsets. Assuming you have more than 65000 keys, that means you will have a 4 byte overhead per key. Once you have the offset, you then need to compare the key at that offset to the one you looked up.
The code to look up completely random keys in a cmph hash is now available in cmph-bench on GitHub. The next step in the benchmark will be to add the index and the keys to the algorithm, and to add the ability to benchmark other solutions, including binary searching and using a dynamic hash like the ones bench-marked in hash-table-shootout.