Git Bash vs. Power Shell

git-for-windowsFor programmers used to developing on Linux, Windows is an incredibly unfriendly programming environment. What is the best way to create a Linux programmer friendly developer environment on Windows?

In the past, the best way to create a Unix like development environment on Windows was by installing cygwin. Unfortunately, cygwin is not Windows, and software developed in it requires the cygwin DLLs to be installed on the target machine. This led to the development of MinGW (minimal GNU for Windows), a package of GNU utilities that can run natively on Windows, along with a port of the GNU bash shell.

Microsoft has release their own shell, the PowerShell. It has some attractive features, but is completely different from bash. While looking for a solution to provide git in PowerShell, I came across Git for Windows, which offers git bash, a Windows command tool that runs the MinGW bash shell. This seems to be a better solution than PowerShell if you are coming from a Linux development background.

The following document describes how to add new utilities to git bash: How to add more to Git Bash on Windows. I intend to try to use git bash as my development environment on Windows, and will try to update this post and make new posts describing what I’m doing and how it works out.

First thing I need is a C compiler. The MinGW 64 installer can be found (and downloaded from) here: It is straightforward to install. Before it will be usable from Git Bash, you need to add it to the path:

  1. Open the Control Panel. If your using Classic Shell, click Start and select Programs/Windows System/Control Panel
  2. Click System and Security
  3. Click on Environment Variables.
  4. Double Click on System variables > Path
  5. Add C:\Program Files (x86)\mingw-64\i686-7.3.0-posix-dwarf-rt_v5-rev0\mingw32\bin to the Variable value

Open  a new git bash shell. You should be able to run: gcc –version

Posted in tools | Tagged , , , , , | 1 Comment

Automated MAC to IP Address Mapping

ip-macI just hacked a little Linux specific python code that maps host names to MAC addresses and IP addresses. This is useful if you have machines on your local area network (e.g. wireless LAN) that are only identifiable by their MACs. For example, this would be the case if they’re getting their IP addresses via DHCP, which can allocate different IPs if the network is restarted.

I’m going to walk through the code so you can see how it works. First, the file starts with a she-bang that identifies the location of the python interpreter, followed by import statements for each of the libraries used:


import argparse
import json
import os
import re
import subprocess
import sys

Next I parse the command line arguments. You can specify a host name with -H and, optionally, a MAC address with -m. The MAC must be 12 hexadecimal digits, with each pair separated by colons. For example, 00:90:a9:ec:69:af. You can typically find it on a label on the case if your connecting to a stand alone device. On a Linux machine, you can find the MAC using the ifconfig command. Here’s the code:

argumentParser = argparse.ArgumentParser(description="Show the differences between a set of files and a backup copy")
argumentParser.add_argument("-H", "--hostname", help="name of the host; if no MAC specified, look up in ~/.hosttomac")
argumentParser.add_argument("-m", "--mac", help="ethernet MAC address; if specified write to ~/.hosttomac")
arguments = argumentParser.parse_args()

Next, I create an empty map of host names to MAC addresses, then look for a JSON file containing an existing map and, if found, read it in:

hostToMac = {}

    with open(os.getenv("HOME") + "/.hosttomac") as hostToMacFP:
        hostToMac = json.load(hostToMacFP)
except IOError as error:
    if os.path.exists(os.path.realpath(os.getenv("HOME") + "/.hosttomac")):
        raise error

Next, if a host name was specified and a MAC was too, if the host is already in the map, make sure the MACs match. If no MAC was specified, make sure there’s one in the map.

if arguments.hostname:
    if arguments.mac:
        if (hostToMac.get(arguments.hostname) and arguments.mac 
         != hostToMac[arguments.hostname]):
            sys.exit("Host " + arguments.hostname"
                        + " is already in ~/.hosttomac with MAC "
                        + hostToMac[arguments.hostname] + " but you specified "
                        + arguments.mac)

    elif hostToMac.get(arguments.hostname):
        arguments.mac = hostToMac[arguments.hostname]

     sys.exit("Host " + arguments.hostname 
                 + " specified without a MAC and not found in ~/.hosttomac")

Next, I grab the output of arp -n, which gives the mappings from IP to MAC address of all the devices the Linux machine knows about. I then go through it looking for lines that map an IP to a MAC, and see if there’s one with the MAC that matches the host, and if there is, record its IP.

output = subprocess.check_output(["arp", "-n"])
 linePat = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+ether\s+'
                                       + r'([\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f]:[\da-f][\da-f])')
 hostIp = None

for line in output.split('\n'):
    match = linePat.match(line)

    if match and == arguments.mac:
        hostIp =

if not hostIp:
     sys.exit("MAC address " + arguments.mac 
                 + " not found in arp -n output; try running: sudo arp-scan -l")

Finally, if the host wasn’t already in the map, add it along with the MAC and save the map file. This way, once the MAC has been confirmed as valid, it’s saved and you never have to specify it again.

if not hostToMac.get(arguments.hostname):
     hostToMac[arguments.hostname] = arguments.mac

    with open(os.getenv("HOME") + "/.hosttomac", "w") as hostToMacFP:
        json.dump(hostToMac, hostToMacFP)

You can then do what you want with your IP:

    print hostIp
Posted in Uncategorized | Leave a comment

Avro C Diagnostics

avro-rotaI was experimenting with Avro C, a library from the Apache project used to encode/decode information whose Java implementation that is commonly used to enode data being stored in Kafka, a log based data distribution system. Avro C is fairly difficult to use. In order to get it to encode my data (a simple log file), I had to climb into the code and enhance the diagnostics. I’ve just published the enhanced library on GitHub:

Here’s a summary of the changes:

  • In datafile.c, all errors from avro_[*_]write functions were being treated as buffer overflows, and the low level error messages (the real problems) overwritten. I changed the buffer overflow messages to prefix the lower level messages.
  • In datum.c, added detailed errors on failure of avro_record_get, including the set of valid field names.
  • In datum_validate.c, added detailed errors on failure of avro_schema_datum_validate.
  • In datum_write.c, changed avro_write_data to take advantage of the change to datum_valididate.c.
  • In st.c, added a new function st_keys_as_string that extracts a comma separated list of the keys in an st_table.

Feel free to use this version if you like.

Posted in Uncategorized | Leave a comment

Go is in the Top 10, According to Tiobe

tiobe-logoIts July 2017, and since last July, there has been a huge shift in the top programming languages, according to the Tiobe index. Though the top 6 languages remain unchanged, all have declined. Visual Basic .NET traded places with Javascript to move into 7th (down from 6th last October), and Delphi/Object Pascal leaped from 12th to 9th. But the most impressive gain came for the upstart Go programming language, which rose from 55th to 10th. This was due to a whopping 2.2% increase in mindshare. Java, the most popular language, has 13.75%.

Rounding out the top 20: perl fell back out of the top ten to 11th; swift (Apple’s replacement for Objective C) rose to 12th; Ruby fell to 13th; Assembly fell out of the top ten to 14th; R, the mathematics friendly language, rose to 15th; Visual Basic dropped to 16th; Matlab dropped a notch to 17th; Objective C plummeted to 18th; teaching language Scratch joined the top 20 at 19th; and PL/SQL, Oracle’s stored procedure language, dropped to 20th.

I noted Go’s impressive rise last October. I wonder if javascript’s drop may in some measure be attributable to Go taking mind share from Node, the server side javascript environment. Delphi’s comeback is a big surprise to me. Visual Basic .NET’s increase matches the declines in legacy Visual Basic and, more interestingly, C#, the other big .NET language, which managed to hold on to it’s 5th place ranking.

Here’s the last year of Tiobe’s graph, stretched out:


As you can see, the year has not been kind to Java or C. Go, on the other hand, had a huge surge near the end of last year, followed by steady growth. It’s crazy that the most popular language is favored by little over 1 in 8 developers.



Posted in Uncategorized | Leave a comment

cmph Comparitive Graphs

I just enhanced cmph-bench/ to allow graphing a single attribute across multiple minimal perfect hashing algorithms. The updated code is available in GitHub. For example, the following command generates a CSV of lookup times for four algorithms:

./ -t chd chd-ph brz brz-ph

Here’s what the output looks like:

"algo", 0, 1000000, 2000000, 3000000, 4000000, 5000000, 6000000, 7000000, 8000000, 9000000, 10000000,
"chd", 0, 771125, 1845753, 2877126, 3630860, 4724457, 6133570, 7256998, 9365072, 9743756, 10769506,
"chd-ph", 0, 750386, 1672261, 2516055, 3620473, 4914581, 6514058, 6985887, 8423580, 10685038, 10831800,
"brz", 0, 753254, 1756762, 2678971, 3530783, 5239339, 6505353, 7822440, 8512124, 10439278, 10784875,
"brz-ph", 0, 813923, 1816921, 2587742, 3700242, 4648037, 5710205, 7039408, 8217433, 9588468, 11932299,

Finally, here’s the graph:


There’s not a whole lot of difference between the algorithms, speed wise.

Posted in c programming, python programming | Tagged , , , | Leave a comment

Using a cmph as a Set

set-theoryGiven the extremely low overhead of the minimal perfect hashes generated by cmph (on the order of 1/2 a byte per key), I was fairly certain they would generate false positives. That is, if you searched for a key that isn’t in the hash, cmph would tell you it was. As I suspected, this is the case. This has big implications for using minimal perfect hashes as sets.

When you look up any key with cmph_search, it returns an index. To determine whether that index refers to the same key as the one you looked up, you need to keep an array of offsets. Assuming you have more than 65000 keys, that means you will have a 4 byte overhead per key. Once you have the offset, you then need to compare the key at that offset to the one you looked up.

Update (2017-08-30): I have verified that the indexes returned by cmph_search are random. This means you need to create the cmph, then walk back across it looking up each of the strings you loaded it with, and save the string’s offset at the returned index in your array of offsets.

The code to look up completely random keys in a cmph hash is now available in cmph-bench on GitHub. The next step in the benchmark will be to add the index and the keys to the algorithm, and to add the ability to benchmark other solutions, including binary searching and using a dynamic hash like the ones bench-marked in hash-table-shootout.

Posted in c programming | Tagged , , , , | Leave a comment

CMPH Benchmark Bar Chart

I added a little more code to the cmph-bench project in GitHub. It’s described in the previous post, cmph-bench – A Benchmark for Minimal Perfect Hashing in C. The bench mark program now randomizes the order of the keys before looking them up, and it has a -c option to output the results as a row of comma separated values suitable for storing in a spreadsheet.

The new program runs the benchmark on ten data sets ranging in size from 1,000,000 to 10,000,000. You can then take the CSV it outputs and graph it:


This graph was constructed using data generated from the CHD algorithm.

For each number of keys, the times to build, load, and lookup all the keys in random order in the resulting MPH, are in microseconds. Load times are unobservably short. Build times start out at less than a microsecond per key, but slowly grow past that ratio, though the degradation appears to be linear. Lookup times are similar, though slightly faster.

The number of bytes needed to build the structure and the size of the structure itself are virtually equal. They are roughly half the number of keys, meaning each key requires about a half a byte of overhead. Not bad.


Posted in c programming, python programming | Tagged , , | Leave a comment