essexeld: Open Source Domain/URL Block List Daemon

essexeld is a simple, high-performance service for serving information about domains and URLs over HTTP on Linux. It is written in C using the sxe event-driven programming library, and was built and tested on Mint/Debian Linux. It doesn’t support multiple categories, but could easily be extended to do so.

The source code is under an MIT-like license, allowing you to do almost anything with it. It can be downloaded from my GitHub repository:

The remainder of this article borrows heavily from the README (see the repo for all the details).

Dependencies

essexeld requires the latest version of sxe, which is available from my GitHub account. essexeld can be built using the latest version of my experimental baker build tool, also available from my GitHub account.

Data

The unit tests now use tables data/domains and data/urls. These tables were generated from the MESD blacklist’s porn domains and urls lists (available here: http://www.squidguard.org/blacklists.html) using the utility program util/target/essexeld_blacklist. The generated tables are sorted lists of MD5 checksums. To regenerate the tables, run the following commands:

  1. util/target/essexeld_blacklist domains-file > data/domains
  2. util/target/essexeld_blacklist -f data/domains urls-file > data/urls
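Because the tables are sorted lists of MD5 checksums, membership can be tested with a binary search. The following is only a sketch of that idea in Python, not the daemon's C code: it assumes checksums are compared as hex strings and that entries are hashed exactly as given, and the function names are hypothetical. The real on-disk format of data/domains and data/urls may differ.

```python
import bisect
import hashlib


def md5_hex(text):
    """Return the MD5 checksum of text as a hex string (assumed representation)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_table(entries):
    """Build a sorted table of checksums, one per entry."""
    return sorted(md5_hex(entry) for entry in entries)


def contains(table, entry):
    """Binary search the sorted table for the checksum of entry."""
    digest = md5_hex(entry)
    index = bisect.bisect_left(table, digest)
    return index < len(table) and table[index] == digest


# Hypothetical entries, for illustration only
table = build_table(["bad.example", "worse.example"])
print(contains(table, "bad.example"))
print(contains(table, "good.example"))
```

A sorted table keeps each lookup to O(log n) comparisons with no per-entry overhead beyond the 16-byte checksum.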

Initial Benchmark

The script test/benchmark_essexeld.py throws the 165719 URLs in the MESD blacklist porn urls list at the service. Running the benchmark on a single system (via the loopback address, 127.0.0.1):

SXE Build    Total Time    URLs/s
debug        18420s        9
release      35s           4735
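The URLs/s column is simply the number of URLs divided by the total time, rounded to the nearest whole number. A quick check of that arithmetic (the function name is made up for this example):

```python
URL_COUNT = 165719  # URLs thrown at the service by the benchmark


def urls_per_second(total_seconds):
    """Throughput for one benchmark run, rounded to the nearest integer."""
    return round(URL_COUNT / total_seconds)


for build, seconds in (("debug", 18420), ("release", 35)):
    print(build, urls_per_second(seconds))
# prints "debug 9" then "release 4735"
```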

Deploying and Restarting the Server

The server can be deployed with ansible using the provided playbook.yml file. See the README for further details.

To restart the server, ssh in to each host, become the root user, and run:

nohup /opt/essexeld/bin/essexld -p 80 &

Using the Server

The protocol for looking up the block list is simple. Issue an HTTP GET request with the path string /urlinfo/1/host.name[:port][/query/string]. If the URL is not found, the server will respond with status code 404 (Not Found). If found, the server will respond with 200 (OK). Note: the body will contain the hard-coded string “porn”.
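As an illustration, a client for this protocol might look like the sketch below. It uses only Python's standard library. The helper names lookup_path and is_blocked are hypothetical, and the sketch assumes the host and query string are sent without any extra URL encoding, which may not match what the server expects.

```python
import urllib.error
import urllib.request


def lookup_path(host, port=None, query=""):
    """Build the /urlinfo/1/host.name[:port][/query/string] request path."""
    target = host if port is None else f"{host}:{port}"
    if query:
        target += "/" + query.lstrip("/")
    return "/urlinfo/1/" + target


def is_blocked(server, host, port=None, query=""):
    """Return True on 200 (found in the block list), False on 404 (not found)."""
    url = "http://" + server + lookup_path(host, port, query)
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # any other status is unexpected, so let the caller see it


print(lookup_path("host.name", 8080, "/query/string"))
```

With a server listening on port 80 of the local machine, is_blocked("127.0.0.1", "host.name") would perform the lookup.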


An Extension for Python re: regex

One thing I dislike about python is how painful it is to use regular expressions that capture matching groups of characters. For example, if I want to find words in quotes, I would write:

quotedWordPattern = re.compile(r'"([^"]+)"(.*)')
match = quotedWordPattern.search(text)

while match:
    print(match.group(1))
    text  = match.group(2)
    match = quotedWordPattern.search(text)

This is a lot clunkier than the equivalent Perl:

while ($text =~ /"([^"]+)"(.*)/) {
    print "$1\n";
    $text = $2;
}

To get around this, ideally I would like to subclass Python’s re, but it turns out that, even in Python version 3, compiled regular expressions aren’t really first class objects. So instead, I created a wrapper class, Regex, in a module called regex. Using it, I can write:

quotedWordPattern = regex.compile(r'"([^"]+)"(.*)')

while quotedWordPattern.search(text):
    print(quotedWordPattern.last_match.group(1))
    text = quotedWordPattern.last_match.group(2)

Here’s the code of regex.py:

import re

class Regex:
    """Wrap a compiled pattern and remember the most recent match object."""

    def __init__(self, pattern, flags=0):
        self.re         = re.compile(pattern, flags)
        self.last_match = None

    # In each method, endpos is only passed through when the caller gave one,
    # so that an explicit endpos of 0 is still honoured.

    def search(self, string, pos=0, endpos=None):
        self.last_match = (self.re.search(string, pos, endpos)
                           if endpos is not None else self.re.search(string, pos))
        return self.last_match

    def match(self, string, pos=0, endpos=None):
        self.last_match = (self.re.match(string, pos, endpos)
                           if endpos is not None else self.re.match(string, pos))
        return self.last_match

    def fullmatch(self, string, pos=0, endpos=None):
        self.last_match = (self.re.fullmatch(string, pos, endpos)
                           if endpos is not None else self.re.fullmatch(string, pos))
        return self.last_match

def compile(pattern, flags=0):
    """Mirror re.compile, returning the wrapper instead of a bare pattern."""
    return Regex(pattern, flags)

Note that all operations other than search, match, and fullmatch can be accomplished using the regex’s re member, and flags can be imported from re if needed. For example:

import re
import regex
pattern = regex.compile(r'\bJim\b', re.UNICODE)

if pattern.match(name):
    bio = pattern.re.sub("James", bio)
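
To try the idea without creating a separate module, the sketch below inlines a cut-down copy of the wrapper (search only, no pos or endpos) and runs the quoted word loop, calling search explicitly. The function name quoted_words and the sample text are made up for this example.

```python
import re


class Regex:
    """Cut-down wrapper: remembers the last match made by search."""

    def __init__(self, pattern, flags=0):
        self.re = re.compile(pattern, flags)
        self.last_match = None

    def search(self, string):
        self.last_match = self.re.search(string)
        return self.last_match


def quoted_words(text):
    """Collect each double-quoted run in text, consuming the text as it goes."""
    pattern = Regex(r'"([^"]+)"(.*)')
    words = []
    while pattern.search(text):
        words.append(pattern.last_match.group(1))
        text = pattern.last_match.group(2)  # continue with the remainder
    return words


print(quoted_words('say "hello" to the "world"'))
# prints ['hello', 'world']
```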

 


Python Unit Testing with pytest

Pytest is a unit testing framework for Python that offers significantly more than the standard unittest module and nose test runner. For example, it supports parameterizing tests, and coverage analysis (through the pytest-cov plugin).

 

To install it:

  1. If you haven’t already, install pip: sudo apt-get install python-pip
  2. Install pytest: sudo pip install pytest
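
As a taste of parameterized tests, here is a minimal sketch. The function under test, word_count, is made up for this example. The pytest.mark.parametrize decorator runs the same test body once per tuple of arguments.

```python
import pytest


def word_count(text):
    """Function under test: count whitespace-separated words."""
    return len(text.split())


@pytest.mark.parametrize("text, expected", [
    ("", 0),
    ("one", 1),
    ("two  words", 2),
])
def test_word_count(text, expected):
    assert word_count(text) == expected
```

Saved as test_word_count.py and run with the pytest command from the same directory, this is collected as three separate test cases, so a failure report names the exact input that failed.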

 


The Rise of Go

A first glance at the Tiobe index shows no change since the beginning of the year in the top five programming languages. Looking closer, four of the five (Java, C, C#, and Python) had lower ratings, with C dropping by a whopping 6 percentage points. JavaScript has moved up from 8th to 6th, reflecting the steady increase of client side (in browser) applications being developed. Objective-C, assembly and Swift all rose past former 10th place Ruby, not because of a loss in popularity by Ruby but due to gains that they made.

The big news, according to Tiobe, is Go, rising from 65th to 16th. (Groovy, the Java Virtual Machine (JVM) based scripting language, had a similar rise, from 32nd to 17th.) I would have to agree that Go looks like the new up and comer. It will be interesting to see if Go continues to rise. As a compiled language focused on systems programming, Go seems like a safe bet to be on the rise at the expense of the venerable C programming language.


LMDE 2 Cinnamon is Toast, Mate

After installing LMDE 2 on two different laptops, I had the same problem on both: The wireless, which had previously worked flawlessly, began failing, and the only solution I found that worked was to reboot the machine each time it happened. Unacceptable.

On one of the machines, I tried upgrading the OS. After rebooting, Cinnamon failed to start. I reinstalled again, and immediately upgraded and rebooted. Again, Cinnamon failed to start after the reboot. The fallback GUI that did come up is complete crap. Boo!

I decided to try the MATE (GNOME 2) desktop instead, following this how-to: How to install MATE in Linux Mint Cinnamon Edition. I followed the section Install MATE using Software Manager (the software manager is available in the fallback GUI). Note that the name of the package is mint-meta-debian-mate on LMDE 2.

After rebooting, you need to click on the Ci icon in the top right corner of the password dialog. Choose MATE from the pop-up menu, then log in. Fix the ridiculous default where minimised windows disappear by right clicking on the task bar, choosing + Add to Panel …, then selecting Window List and clicking Add.

Update

The upgrade seems to have fixed the problem with the wireless driver. I discovered from the Linux Mint forums that the problem with Cinnamon is known, and has been there since April 2016. The workaround is to apt-get install cinnamon. Why this seemingly simple to fix problem has not been addressed is a mystery. Sorry, Mate, but I like Cinnamon better.


Linux Mint Debian on Toshiba Satellite

Here’s how to install from a thumb drive.

  1. Plug it in
  2. Reboot the laptop
  3. Press F12 on the way up
  4. Select USB
  5. Select Install Linux Mint from the desktop
  6. When the time came to partition, I chose to delete all the existing partitions. In the past, I’ve installed in one big partition. This time around, I elected to create an 8 GB swap partition, then gave Linux the rest of the disk.
  7. Once installed, reboot and enter the wireless password
  8. Run: sudo apt-get update
  9. Run: sudo apt-get upgrade

After rebooting, I was unable to log in with Cinnamon and switched to MATE. See LMDE 2 Cinnamon is Toast, Mate for details.

I still couldn’t get the automated installation of Chrome to work. Here’s the manual process that actually works: Installing Chrome on LMDE 2

If you are able to get Cinnamon to work, here are a couple of configurations you might want:

  1. Open Menu/Preferences/Mouse and Touchpad, select the Touchpad tab, and uncheck Tap to click
  2. Open Menu/Preferences/Window Tiling and Edge Flip, and uncheck Enable Window Tiling and Snapping

 


OOPC (Object Oriented Programming in C) Conventions

Because C is not an object oriented programming language, if you want to use it for object oriented programming, you need to do it yourself. But how best to do this? Here are some of the conventions I prefer to follow:

  1. If the size of the class is fixed, it should be declared like this:

    typedef struct _package_Class {
    … members …
    } * package_Class;

  2. The size of a fixed sized class can be determined with a macro like:

    #define SEYMOUR_CLASS_SIZEOF(className) sizeof(struct _##className)

  3. Fixed size objects can be allocated on the stack as follows:

    package_Class object = alloca(SEYMOUR_CLASS_SIZEOF(package_Class));

  4. If the size of the class is variable, it should be declared like this:

    typedef struct _package_Class * package_Class;

  5. The size of a variably sized class should be determinable by a function, which you need to provide:

    size_t package_Class_SizeOf(…parameters…)

  6. Objects of variably sized classes can be allocated on the stack using alloca, and dynamically using malloc. For example, this object will be allocated on the stack and automatically deallocated on return from the function:

    package_Class object = alloca(package_Class_SizeOf(1024));

  7. Preallocated objects should be constructed using:

    void package_Class_construct(package_Class this, …parameters…)

  8. If there are special constraints on the allocation and deallocation of objects of a class, the class may omit the constructor and implement only an allocator:

    package_Class package_Class_New(…parameters…)

  9. In this case, the corresponding deallocator should be used to free the object:

    void package_Class_delete(package_Class this)

  10. Classes should implement a toString method:

    const char * package_Class_toString(
    package_Class this, char * bufferPointer, size_t bufferSize)

Following a consistent convention around the meaning of object variables (in this convention, objects are always passed by reference, and you never need to take their addresses) makes code easier to understand and verify.

Consistent naming conventions do the same. For example, _New and _SizeOf are upper/pascal case because they are functions (or “class methods”, if you prefer). But _construct, _delete and _toString are lower/camel case because they are object methods. The first parameter passed to a method is always an object (by convention, named this).
