Automated MAC to IP Address Mapping

ip-macI just hacked a little Linux specific python code that maps host names to MAC addresses and IP addresses. This is useful if you have machines on your local area network (e.g. wireless LAN) that are only identifiable by their MACs. For example, this would be the case if they’re getting their IP addresses via DHCP, which can allocate different IPs if the network is restarted.

I’m going to walk through the code so you can see how it works. First, the file starts with a she-bang that identifies the location of the python interpreter, followed by import statements for each of the libraries used:

#!/usr/bin/python

import argparse
import json
import os
import re
import subprocess
import sys

Next I parse the command line arguments. You can specify a host name with -H and, optionally, a MAC address with -m. The MAC must be 12 hexadecimal digits, with each pair separated by colons. For example, 00:90:a9:ec:69:af. You can typically find it on a label on the case if your connecting to a stand alone device. On a Linux machine, you can find the MAC using the ifconfig command. Here’s the code:

argumentParser = argparse.ArgumentParser(description="Show the differences between a set of files and a backup copy")
argumentParser.add_argument("-H", "--hostname", help="name of the host; if no MAC specified, look up in ~/.hosttomac")
argumentParser.add_argument("-m", "--mac", help="ethernet MAC address; if specified write to ~/.hosttomac")
arguments = argumentParser.parse_args()

Next, I create an empty map of host names to MAC addresses, then look for a JSON file containing an existing map and, if found, read it in:

hostToMac = {}

try:
    with open(os.getenv("HOME") + "/.hosttomac") as hostToMacFP:
        hostToMac = json.load(hostToMacFP)
except IOError as error:
    if os.path.exists(os.path.realpath(os.getenv("HOME") + "/.hosttomac")):
        raise error

Next, if a host name was specified and a MAC was too, if the host is already in the map, make sure the MACs match. If no MAC was specified, make sure there’s one in the map.

if arguments.hostname:
    if arguments.mac:
        if (hostToMac.get(arguments.hostname) and arguments.mac 
         != hostToMac[arguments.hostname]):
            sys.exit("Host " + arguments.hostname"
                        + " is already in ~/.hosttomac with MAC "
                        + hostToMac[arguments.hostname] + " but you specified "
                        + arguments.mac)

    elif hostToMac.get(arguments.hostname):
        arguments.mac = hostToMac[arguments.hostname]

else:
     sys.exit("Host " + arguments.hostname 
                 + " specified without a MAC and not found in ~/.hosttomac")

Next, I grab the output of arp -n, which gives the mappings from IP to MAC address of all the devices the Linux machine knows about. I then go through it looking for lines that map an IP to a MAC, and see if there’s one with the MAC that matches the host, and if there is, record its IP.

output = subprocess.check_output(["arp", "-n"])
 linePat = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+ether\s+'
                                       + r'([\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f])')
 hostIp = None

for line in output.split('\n'):
    match = linePat.match(line)

    if match and match.group(2) == arguments.mac:
        hostIp = match.group(1)

if not hostIp:
     sys.stderr.write(output)
     sys.exit("MAC address " + arguments.mac 
                 + " not found in arp -n output; try running: sudo arp-scan -l")

Finally, if the host wasn’t already in the map, add it along with the MAC and save the map file. This way, once the MAC has been confirmed as valid, it’s saved and you never have to specify it again.

if not hostToMac.get(arguments.hostname):
     hostToMac[arguments.hostname] = arguments.mac

    with open(os.getenv("HOME") + "/.hosttomac", "w") as hostToMacFP:
        json.dump(hostToMac, hostToMacFP)

You can then do what you want with your IP:

    print hostIp
Advertisements
Posted in Uncategorized | Leave a comment

Avro C Diagnostics

avro-rotaI was experimenting with Avro C, a library from the Apache project used to encode/decode information whose Java implementation that is commonly used to enode data being stored in Kafka, a log based data distribution system. Avro C is fairly difficult to use. In order to get it to encode my data (a simple log file), I had to climb into the code and enhance the diagnostics. I’ve just published the enhanced library on GitHub: https://github.com/jimbelton/avro-c-packaging

Here’s a summary of the changes:

  • In datafile.c, all errors from avro_[*_]write functions were being treated as buffer overflows, and the low level error messages (the real problems) overwritten. I changed the buffer overflow messages to prefix the lower level messages.
  • In datum.c, added detailed errors on failure of avro_record_get, including the set of valid field names.
  • In datum_validate.c, added detailed errors on failure of avro_schema_datum_validate.
  • In datum_write.c, changed avro_write_data to take advantage of the change to datum_valididate.c.
  • In st.c, added a new function st_keys_as_string that extracts a comma separated list of the keys in an st_table.

Feel free to use this version if you like.

Posted in Uncategorized | Leave a comment

Go is in the Top 10, According to Tiobe

tiobe-logoIts July 2017, and since last July, there has been a huge shift in the top programming languages, according to the Tiobe index. Though the top 6 languages remain unchanged, all have declined. Visual Basic .NET traded places with Javascript to move into 7th (down from 6th last October), and Delphi/Object Pascal leaped from 12th to 9th. But the most impressive gain came for the upstart Go programming language, which rose from 55th to 10th. This was due to a whopping 2.2% increase in mindshare. Java, the most popular language, has 13.75%.

Rounding out the top 20: perl fell back out of the top ten to 11th; swift (Apple’s replacement for Objective C) rose to 12th; Ruby fell to 13th; Assembly fell out of the top ten to 14th; R, the mathematics friendly language, rose to 15th; Visual Basic dropped to 16th; Matlab dropped a notch to 17th; Objective C plummeted to 18th; teaching language Scratch joined the top 20 at 19th; and PL/SQL, Oracle’s stored procedure language, dropped to 20th.

I noted Go’s impressive rise last October. I wonder if javascript’s drop may in some measure be attributable to Go taking mind share from Node, the server side javascript environment. Delphi’s comeback is a big surprise to me. Visual Basic .NET’s increase matches the declines in legacy Visual Basic and, more interestingly, C#, the other big .NET language, which managed to hold on to it’s 5th place ranking.

Here’s the last year of Tiobe’s graph, stretched out:

tiobe

As you can see, the year has not been kind to Java or C. Go, on the other hand, had a huge surge near the end of last year, followed by steady growth. It’s crazy that the most popular language is favored by little over 1 in 8 developers.

 

 

Posted in Uncategorized | Leave a comment

cmph Comparitive Graphs

I just enhanced cmph-bench/cmph-graph.py to allow graphing a single attribute across multiple minimal perfect hashing algorithms. The updated code is available in GitHub. For example, the following command generates a CSV of lookup times for four algorithms:

./cmph-graph.py -t chd chd-ph brz brz-ph

Here’s what the output looks like:

"algo", 0, 1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000,
"chd", 0, 771125, 1845753, 2877126, 3630860, 4724457, 6133570, 7256998, 9365072, 9743756, 10769506,
"chd-ph", 0, 750386, 1672261, 2516055, 3620473, 4914581, 6514058, 6985887, 8423580, 10685038, 10831800,
"brz", 0, 753254, 1756762, 2678971, 3530783, 5239339, 6505353, 7822440, 8512124, 10439278, 10784875,
"brz-ph", 0, 813923, 1816921, 2587742, 3700242, 4648037, 5710205, 7039408, 8217433, 9588468, 11932299,

Finally, here’s the graph:

cmph-lookup-times

There’s not a whole lot of difference between the algorithms, speed wise.

Posted in c programming, python programming | Tagged , , , | Leave a comment

Using a cmph as a Set

set-theoryGiven the extremely low overhead of the minimal perfect hashes generated by cmph (on the order of 1/2 a byte per key), I was fairly certain they would generate false positives. That is, if you searched for a key that isn’t in the hash, cmph would tell you it was. As I suspected, this is the case. This has big implications for using minimal perfect hashes as sets.

When you look up any key with cmph_search, it returns an index. To determine whether that index refers to the same key as the one you looked up, you need to keep an array of offsets. Assuming you have more than 65000 keys, that means you will have a 4 byte overhead per key. Once you have the offset, you then need to compare the key at that offset to the one you looked up.

Update (2017-08-30): I have verified that the indexes returned by cmph_search are random. This means you need to create the cmph, then walk back across it looking up each of the strings you loaded it with, and save the string’s offset at the returned index in your array of offsets.

The code to look up completely random keys in a cmph hash is now available in cmph-bench on GitHub. The next step in the benchmark will be to add the index and the keys to the algorithm, and to add the ability to benchmark other solutions, including binary searching and using a dynamic hash like the ones bench-marked in hash-table-shootout.

Posted in c programming | Tagged , , , , | Leave a comment

CMPH Benchmark Bar Chart

I added a little more code to the cmph-bench project in GitHub. It’s described in the previous post, cmph-bench – A Benchmark for Minimal Perfect Hashing in C. The bench mark program now randomizes the order of the keys before looking them up, and it has a -c option to output the results as a row of comma separated values suitable for storing in a spreadsheet.

The new program cmph-graph.py runs the benchmark on ten data sets ranging in size from 1,000,000 to 10,000,000. You can then take the CSV it outputs and graph it:

chd-graph

This graph was constructed using data generated from the CHD algorithm.

For each number of keys, the times to build, load, and lookup all the keys in random order in the resulting MPH, are in microseconds. Load times are unobservably short. Build times start out at less than a microsecond per key, but slowly grow past that ratio, though the degradation appears to be linear. Lookup times are similar, though slightly faster.

The number of bytes needed to build the structure and the size of the structure itself are virtually equal. They are roughly half the number of keys, meaning each key requires about a half a byte of overhead. Not bad.

 

Posted in c programming, python programming | Tagged , , | Leave a comment

cmph-bench – A Benchmark for Minimal Perfect Hashing in C

mph-diagThere is an open source software package on Sourceforge that includes implementations in C of several minimal perfect hashing algorithms. It has been uploaded to GitHub at https://github.com/zvelo/cmph. The code is licensed under your choice of LGPL (Lesser GNU Public License) 2 or MPL (Mozilla Public License) 1.1. This is important because MPL has fairly free terms of use, whereas GPL can give people the right to reverse engineer your code if you include libraries licensed (exclusively) with it.

You can download and build the package as follows:

  1. git clone https://github.com/zvelo/cmph
  2. cd cmph
  3. configure
  4. make
  5. make check

There is a sample program in the README. I copied it to a sibling directory (cmph-bench), calling the file cmph-bench.c. I then built it using my baker build tool. Note that the latest version of baker from 2017-06-25 is required, as cmph uses libm.a (the math library) and baker wasn’t resolving symbols in libraries under /usr/lib prior to this version.

After building, the progam cmph-bench is dropped in the target directory and can be run with: target/cmph-bench. I copied the file /usr/share/dict/words, a corpus of almost 100000 unique words used in the Easy Perfect Minimal Hashing benchmark. I then modified the program to create a MPH from the corpus, reload it from disk, and look up every word in the corpus in it. I added code to benchmark each of these phases. Running with the BRZ algorithm:

Constructed MPH in 0.381116 seconds
Loaded MPH in 0.000689 seconds
Looked up 99171 keys MPH in 0.058402 seconds

I added commands to allow any of the supported algorithms to be bench-marked. Running without an algorithm gives you the list of supported algorithms:

usage: cmph-bench (bdz|bdz-ph|bmz|bmz8|brz|chd|chd-ph|chm|fch)

Finally, I added code to track the amount of memory being allocated using the linux function mallinfo. This produced some interesting results. E.g., when I ran: target/cmph-bench chd

Constructed MPH in 0.379223 seconds, memory 20528
Loaded MPH in 0.000653 seconds, memory 182960
Looked up 99171 keys in MPH in 0.059219 seconds, corpus size 938848

Note that the corpus size is the actual size of the corpus including just the words separated by newlines. That means that the memory allocated for the MPH does not include the keys. So how does cmph know that when there’s a hit, its actually a match for the same key? I’m going to have to delve into the code of cmph to figure that out. For now, I’ve pushed the code of cmph-bench to github: https://github.com/jimbelton/cmph-bench. Feel free to download it (or fork it) and play.

My friend Simon did so and sent me some comments:

Problem with the benchmark is that 100k words is way too small. The entire output set probably sits in cache memory and so the timing might not reflect reality for bigger sets. Instead of words then just try using consecutive big numbers as keys, e.g. 1,000,000,000 onwards…
Also, when looking up the keys then look them up in a random order so CPU cache effects do not skew the results.

The randomization would be an interesting extension to the test. The larger data set can be achieved by changing the corpus-words, as Simon shows in his last comment.

Baker is not working for me:

simon@precision-5520:~/cmph/cmph-bench$ ../baker/baker

/usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp’ with pointer equality in `../../../../usr/lib/x86_64-linux-gnu/libc.a(strcmp.o)’ can not be used when making an executable; recompile with -fPIE and relink with -pie
collect2: error: ld returned 1 exit status
fatal: Failed to link C program with: cc target/cmph-bench.o -o target/cmph-bench ../cmph/src/.libs/libcmph.a ../../../../usr/lib/x86_64-linux-gnu/libm.a ../../../../usr/lib/x86_64-linux-gnu/libc.a

Interestingly, if I just do the last link step manually instead of baker with -lm -lc instead of those long paths then everything seems to work.

simon@precision-5520:~/cmph/cmph-bench$ cc target/cmph-bench.o -o target/cmph-bench ../cmph/src/.libs/libcmph.a -lm -lc

I’m not sure why baker is having trouble with libc.a. The -lc flag should not be required. If you run into this problem, try Simon’s work around.

Hmmm… if I try with a 1M or 10M word corpus then it always asserts 😦 — Simon

simon@precision-5520:~/cmph/cmph-bench$ target/cmph-bench brz
cmph-bench: brz.c:417: brz_gen_mphf: Assertion `nkeys_vd < brz->size[cur_bucket]’ failed.Aborted (core dumped)

Not sure why the BRZ algorithm has this limitation. It’s also fairly slow. I will likely do a future post with some graphs of the various algorithms with various sized key sets.

If I change the algo in the bench mark program then I can avoid the assert and even 10 M keys works 🙂 However, the construction time and lookup time do not seem to be linear going from 10M keys to 100M keys 😦 And I didn’t try to randomize the keys lookups yet…

simon@precision-5520:~/cmph/cmph-bench$ perl -e ‘foreach(0..10_000_000){printf qq[%u\n],$_;}’ > corpus-words
simon@precision-5520:~/cmph/cmph-bench$ target/cmph-bench bdz
debug: Open the dump file and get the time and memory stamps
debug: Create minimal perfect hash function using the chosen algorithm
Constructed MPH in 9.087200 seconds, memory 3459552
Loaded MPH in 0.000649 seconds, memory 3458976
Looked up 10000001 keys in MPH in 1.362376 seconds, corpus size 78888899

simon@precision-5520:~/cmph/cmph-bench$ perl -e ‘foreach(0..100_000_000){printf qq[%u\n],$_;}’ > corpus-words
simon@precision-5520:~/cmph/cmph-bench$ target/cmph-bench bdz
debug: Open the dump file and get the time and memory stamps
debug: Create minimal perfect hash function using the chosen algorithm
Constructed MPH in 259.054525 seconds, memory 3843904
Loaded MPH in 0.006719 seconds, memory 34593344
Looked up 100000001 keys in MPH in 30.801986 seconds, corpus size 888888900

Posted in c programming | Tagged , , , , , , , | 1 Comment