Minimal Perfect Hashing

mphfA perfect hash is a one-to-one map. A key is taken and hash (into an integer) and looked up in a single location in the map. If it is in the map, it will be found in that location. This makes perfect hashes incredibly fast and very light on the CPU cache and the virtual memory system of the OS.

A minimal perfect hash is the smallest perfect hash that can be found for a set of keys. Determining the minimal perfect hash by brute force is time consuming. The paper http://cmph.sourceforge.net/papers/esa09.pdf describes an algorithm for generating a minimal perfect hash in linear time. That means that the time to determine the minimal perfect hash is directly proportional to the number of keys.

I found a sample implementation of the algorithm in python in the article Throw away the keys: Easy, Minimal Perfect Hashing. This program can be run as a benchmark, and takes about 1.7s to run on my laptop. The article points to another implementation in python, this time of the MOS Algorithm II CACM92, by the scientist who discovered the algorithm. When I ran this version, which is supposed to have a compacter representation, the program never came back.

Update: I looked into why the MOS II algorithm went ballistic. It is not a linear algorithm. The part of the code it gets stuck in is a nested loop of O(n2). I left it running, and six hours later, it was still in this loop (the inner loop of which is labelled with the comment “Step 3: rotate patterns and search for suitable displacement”). It’s author claims “This code runs fast linearly”. I beg to differ.

Advertisements
Posted in Uncategorized | Leave a comment

CLI, Directories, and Gob in Go

go_lang1Building on my last post, Playing with Go (AKA golang), I’m going to hack a command line interface (CLI), do some directory parsing, and add some code to read back the note I serialized to a file using gob. To read what was typed on the command line, import the “flag” package. In main(), you can then:

flag.Parse()
fmt.Println(“arguments: “, flag.Args())

I implemented a couple of useful functions that are missing from go: die, which prints an error message and exits with status 1, and assert, which tests a condition and, if false, dies with a message. They both use go’s bizarre “varargs” syntax, … interface{}, to allow me to pass a format string and an arbitrary list of arguments to a printf like formatter.

The code from the last post now implements the “new” command. I’ve implemented two additional commands, “list”, which lists all the notes dates in order, and “show”, which displays the body of the most recent note. Both the new commands use ioutil.ReadDir to get the list of the notes files in sorted order. The show command uses a gob decoder to read in and decode the note in the latest note file.

Here’s the complete program. Cut and paste it into your editor, save it, and go build it, then play with it. If you have any questions about it, feel free to post a comment.

package main

import "encoding/gob"
import "flag"
import "fmt"
import "io/ioutil"
import "os"
import "strings"
import "time"

type Note struct {
    Body []byte
    Created time.Time
}

func die(format string, args ...interface{}) {
    line := fmt.Sprintf(format, args...)
 
    if (!strings.HasSuffix(line, "\n")) {
        line += "\n"
    }
 
    fmt.Fprint(os.Stderr, line)
    os.Exit(1)
}

func assert(condition bool, format string, args ...interface{}) {
    if (!condition) {
        die(format, args...)
    }
}

func usageIf(condition bool) {
    if (condition) {
        die("usage: gnosis (new|list|show)")
    }
}
 
func main() {
    flag.Parse()
    args := flag.Args()
 
    usageIf(len(args) == 0)
 
    if (args[0] == "new") {
        var note Note
        note.Body, _ = ioutil.ReadAll(os.Stdin)
        note.Created = time.Now().UTC()

        os.Mkdir("notes", 0700)
        name := note.Created.Format("notes/2006-01-02T15:04:05.999999999.gob")
        file, err := os.OpenFile(name, os.O_CREATE | os.O_WRONLY, 0600)
        assert(err == nil, "Error '%v' creating file %v", err, name) 
        encoder := gob.NewEncoder(file)
        err = encoder.Encode(note)
        assert(err == nil, fmt.Sprintf("Error '%v' encoding note %v", err, note))
        file.Close()
        os.Exit(0)
    }
 
    if (args[0] == "list") {
        files, err := ioutil.ReadDir("notes")
        assert(err == nil, "Error '%v' reading directory 'notes'", err)

        for _, file := range files {
            stamp := file.Name()
            stamp = stamp[0:len(stamp) - 4] // Strip away .gob suffix
            fmt.Print(stamp + "\n")
        }
 
        os.Exit(0)
    }

    if (args[0] == "show") {
        files, err := ioutil.ReadDir("notes")
        assert(err == nil, "Error '%v' reading directory 'notes'", err)
        name := "notes/" + files[len(files) - 1].Name()
        file, err := os.Open(name)
        assert(err == nil, "Error '%v' opening file %v", err, name)
        decoder := gob.NewDecoder(file)
        var note Note
        err = decoder.Decode(&note)
        assert(err == nil, "Error '%v' decoding note", err)
        file.Close()
        fmt.Print(string(note.Body))
        os.Exit(0)
    }
 
    usageIf(true)
}

 

Posted in golang programming | Tagged , , , , , , , | Leave a comment

Playing with Go (AKA golang)

golangTo installing Go on debian Linux:

sudo apt-get install golang

Check your version like this:

go version
> go version go1.3.3 linux/amd64

Using you favorite text editor (IDEs are for the weak), create a file with the .go extension. I’ll call mine gnosis.go. Code a minimal go program like this (from go’s Getting Started page):

package main

import "fmt"

func main() {
 fmt.Printf("hello, world\n")
 }

Compile it like this (run from the directory where you save the program):

go build

Finally, you can run it, like this:

./gnosis
> hello, world

I want to make it do something interesting, so I will start with a simple note taking app. First, the gnosis program must read the note from standard input, which by default is the keyboard.

package main

import "io/ioutil"
 import "fmt"
 import "os"

func main() {
    note, _ := ioutil.ReadAll(os.Stdin)

    fmt.Print(string(note))
 }

After compile and running it, I typed:

This is a note
Line 2 of the note
[Ctrl-D]

In response, the program printed:

This is a note
Line 2 of the note

Some interesting things to note: ioutil.ReadAll reads the entire contents of a file (in this case, the standard input), and returns the content as a (byte-array, error) ordered pair. Specifying the error be stored in ‘_’ rather than a named variable tells Go to throw it away. The string function is used to convert the byte array into a string of characters. Finally, I typed [Ctrl-D] because in Linux, that is the way to simulate the end of file from the keyboard.

I would like to have a time-stamp associated with each note. To do this, I imported “time” and added this code right before the previous call to fmt.Print:

import "time"

timestamp := time.Now().UTC().Format("2006-01-02T15:04:05.999999999")
 fmt.Print(timestamp + "\n")

Compiling and running the program now causes the time-stamp for the current time to be output in the format shown on a line before the text of the note. But this leaves one more problem: I want to remember what I wrote in my note, so instead of just printing it out, I store it in a structure and write it to a file:

package main

import "encoding/gob"
import "fmt"
import "io/ioutil"
import "os"
import "time"

type Note struct {
    Body    []byte
    Created time.Time
}

func main() {
    var note Note
    note.Body, _ = ioutil.ReadAll(os.Stdin)
    note.Created = time.Now().UTC()

    os.Mkdir("notes", 0700)
    name := note.Created.Format("notes/2006-01-02T15:04:05.999999999.gob")
    file, _ := os.OpenFile(name, os.O_CREATE, 0400)
    encoder := gob.NewEncoder(file)
    encoder.Encode(note)
    file.Close()
}

There’s a lot more going on here. First, I’ve created a type called Note that contains a body (what you type in) and a creation time. Next, I read the body, as before, and capture the creation time (without formatting it as a string this time).

NOTE: This took me a while to figure out: If the structure member’s names don’t begin with an upper case letter, they are not serialized. This is a very annoying quirk of the gob package that is not documented.

Next, I write the note out as a GOB, which can later be read back in and used. If there isn’t already a notes subdirectory, I create one. Then, I create a file whose name is the string formatted time-stamp of the note, followed by the extension .gob. Finally, I encode the note into the file as a GOB.

The GOB encodes the note in binary form. If you cat the file, it’s contents are quite unreadable, as parts of it are encoded. Try dumping it a command like the following one, with the time-stamp of the filename replaced with the one just created:

 od -c notes/2017-06-15T02\:14\:53.864365634.gob

You should see output like this:

od-gob

The first part of the file is a description of the type (Note) and its members (Body and Created). This is followed by the value, which includes the string you entered as the body and a binary representation of the time-stamp. This value can be decoded into a variable of type Note. I’ll discuss doing so in a future post.

Posted in golang programming | Tagged , , , , | 1 Comment

Installing Linux Mint Debian Edition 2

My previous article on this subject was actually a guide to Configuring Linux Mint Debian Edition 2 (Betsy). In this article, I’ll talk about the actual installation process. First, download the installation ISO:

  1. Browse to https://www.linuxmint.com/download_lmde.php
  2. Select your version at the bottom. I recommend Cinnamon [64-bit]
  3. Pick a mirror and download the file lmde-2-201503-cinnamon-64bit.iso
  4. Run: cd ~/Downloads

To put it on a thumb drive (the first two steps, from Creating a bootable USB drive from an ISO image, may not be required):

  1. Run: sudo apt-get install syslinux-utils
  2. Run: isohybrid lmde-2-201503-cinnamon-64bit.iso
  3. Before inserting your thumb drive, run: lsblk
  4. Insert the thumb drive and wait for Linux to automount it, then rerun: lsblk
  5. The new device that didn’t show up the first time you ran lsblk is the thumb drive.
  6. Unmount it. E.g.: umount /media/jim/USB STICK
  7. Copy the ISO to the thumb drive. E.g.: sudo dd if=path/to/image.iso of=/dev/sdb

If your laptop won’t boot off a USB thumb drive (like my piece of crap Toshiba Satellite L850D), you can burn the ISO to a DVD-RW using brasero. Make sure to burn the image to disk, not the ISO file as data.

Caveat #1

The longstanding bug where, after you sudo apt-get update and sudo apt-get upgrade, cinnamon is broken when you reboot, is still in full force. If you run into this, boot into the backup GUI, access Menu/Administration/Synaptic Package Manager, and update cinnamon.

Caveat #2

I ran into this totally weird problem: on boot, the system would no longer accept my password. Entering it using the mouse keyboard widget worked. When I checked in Menu/Preferences/Keyboard, the language had been set to Cherokee.

Posted in systems administration | Tagged , , , , , , , , | Leave a comment

Why is Booting From USB on a Toshiba L850D Laptop so Difficult?

So far, trying to boot from a USB thumb drive on my Toshiba L850D has been an abject failure. The steps I followed here have leave the laptop in a state where it still won’t boot off the thumb drive, but also fails to boot Windows off the hard disk. It is possible to boot off the DVD drive.

The Toshiba L850D BIOS access is truly horrible. The process to have the laptop boot off USB that I’ve pieced together, which has left the laptop in a disfunctional state, is this:

  1. Power the laptop off.
  2. Hold down the Fn and F2 keys
  3. Power the laptop on and hold the keys for a second or so. The machine will then boot to an Automatic Repair menu.
  4. Select See Advanced Repair Options
  5. Select Trouble Shoot
  6. Select Advanced Options
  7. Select UEFI Firmware Settings
  8. Select Restart. The laptop should reboot into the BIOS settings screen
  9. Select Boot on the top menu (using the left arrow key)
  10. Select USB (using the down arrow)
  11. Move USB up in the boot order by pressing the F6 key
  12. Select Security on the top menu
  13. Select Secure Boot using the down arrow and press F5 to toggle its value to Disabled
  14. Select Advanced on the top Menu
  15. Select >System Configuration using the down arrow, the press Enter
  16. Select Boot Mode using the down arrow and press F5 to toggle its value to CSM Boot
  17. Press F10 to Save and Exit and Enter to confirm

Here is one source I followed: http://www.tomshardware.com/forum/64755-63-toshiba-satellite-boot

I’ve read that you can completely reset the BIOS by removing the CMOS backup battery. This did not work for me. After trying this, the laptop still refused to boot from the hard drive. I believe it is because the CSM Boot is still enabled.

Note: After setting it to boot in CSM mode, you can no longer enter the BIOS with Fn-F2. Instead, press F12 at boot and you’ll get a boot menu with an option to change the BIOS configuration.

I finally gave up and booted the laptop off DVD by selecting the correct option from the boot menu. Note that after booting the LMDE2 installer, the track-pad and keyboard did not work unless boot mode was set to CSM Boot.

Posted in systems administration | Tagged , , , , , | Leave a comment

Enabling Rsync to the WD My Cloud

wd-my-cloudLinux’s rsync utility is a simple way to do incremental backups.This saves a lot of time, because rsync will only copy files that have changed. If you have a large collection of MP3s, for example, rerunning an rsync backup only copies the new files (or modified files if you’ve renamed or retagged them) to the backup.

Here’s how to set up the WD My Cloud to be able to back up to it and restore from it using rsync from Linux:

  1. Run: arp -n
  2. If the MAC address of your WD My Cloud doesn’t show, try: sudo arp-scan -l
  3. Copy the IP address that maps to the MAC address of your WD My Cloud
  4. Paste it into the address bar of your browser
  5. After logging in to your WD My Cloud, you should probably change the root password
  6. You can create users and shares from the GUI
  7. Shares will be stored in the /shares directory and are owned by root
  8. From the GUI, click Settings (you may need to scroll the top menu to the left to see this button)
  9. Select Network from the menu
  10. Set SSH to ON

Some important points:

  1. The default ssh user sshd is an alias for root. To log in to the box, run (substituting the correct IP, determined with arp): ssh sshd@192.168.0.11
  2. Do not try to back up to the root account. It is on a tiny filesystem. Instead, specify the full path to one of the shares you’ve created
  3. You can “dry run” any rsync command to see what it would do by including the -n option, but it doesn’t work well, IMO.

Example backup command:

rsync -ahv Documents root@192.168.0.11:/shares/jim

Restore with the same command, switching the source and destination:

rsync -ahv root@192.168.0.11:/shares/jim/Documents ~

You can use the file browser to look at what’s stored in your share.

Note: I followed the first two parts of How to create your own private DropBox with WD MyCloud to set a shared directory and the SSH daemon on the WD My Cloud. Later steps that in that document that show how to set up an rsync daemon do not work for me. It looks as though WD has removed the ability to modify the software using apt.

Here’s a knowledge base article on using rsync that I found helpful: Using rsync to Transfer and Synchronize Local and Remote Systems

Posted in systems administration | Tagged , , , , | Leave a comment

Using a Laptop Drive as a Huge USB Thumb Drive

huge-thumbIf you have an old laptop with a decent size spinning disk, you can pick yourself up a USB hard drive enclosure for under $10, remove the laptop drive, stick it in the enclosure, and use it to transfer large amounts of data (e.g. your mp3 collection) between computers quickly and easily. Once you get the drive formatted properly, it will act like a really large thumb drive.

Using Linux, it’s fairly easy to prepare the drive:

  1. Plug it in to a USB port
  2. Run dmesg and look for the device name in the log. For example, on my machine, the first USB drive is mounted as /dev/sdb1
  3. Unmount the partition. E.g.: sudo umount /dev/sdb1
  4. Run fdisk on the master device: sudo fdisk /dev/sdb
    • Note that the 1 is intentionally omitted
  5. Issue the p command to see the existing partitions on the drive
  6. If you are sure there’s nothing you need on any of the partitions, you can delete them all by running the d command repeatedly
  7. Create a new partition with the a command, and give it the entire disk
  8. Write the new partition table and exit fdisk with the w command
  9. Create an NTFS file system on the drive: sudo mkfs.ntfs -f /dev/sdb1
    • Note that you can use ext3 if you don’t need to use the drive from Windows

You can verify the size of the drive by mounting it:

  1. Create a mount point directory. E.g.: sudo mkdir /media/disk
  2. Mount the drive: sudo mount /dev/sdb1 /media/disk
  3. Run: df -h
  4. Unmount the drive when done: sudo umount /dev/sdb1

The output of df -h will look something like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1       150G   70M  149G   1% /media/disk
Posted in systems administration | Tagged , , , , , , | Leave a comment