essexeld: Open Source Domain/URL Block List Daemon

essexeld is a simple, high-performance service, written in C using the sxe event-driven programming library, for serving information about domains and URLs over HTTP from Linux. It was built and tested on Mint/Debian Linux. It doesn’t support multiple categories, but could easily be extended to do so.

The source code is under an MIT-like license, allowing you to do almost anything with it. It can be downloaded from my GitHub repository.

The remainder of this article borrows heavily from the README (see the repo for all the details).

Dependencies

essexeld requires the latest version of sxe, which is available from my GitHub account. essexeld can be built using the latest version of my experimental baker build tool, also available from my GitHub account.

Data

The unit tests now use the tables data/domains and data/urls. These tables were generated from the MESD blacklist’s porn domains and urls lists (available here: http://www.squidguard.org/blacklists.html) using the utility program util/target/essexeld_blacklist. The generated tables are sorted lists of MD5 checksums. To regenerate the tables, run the following commands:

  1. util/target/essexeld_blacklist domains-file > data/domains
  2. util/target/essexeld_blacklist -f data/domains urls-file > data/urls
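
Because the tables are sorted lists of MD5 checksums, a lookup can in principle be done with a binary search. The following Python sketch illustrates only that general idea. It does not reproduce the on-disk format of data/domains or the way essexeld_blacklist normalises its input: the function names, the in-memory list, and the lowercase UTF-8 hashing are all assumptions made for illustration.

```python
import bisect
import hashlib


def md5_hex(key: str) -> str:
    """Return the MD5 digest of key as hex (assumed normalisation: lowercase UTF-8)."""
    return hashlib.md5(key.lower().encode("utf-8")).hexdigest()


def build_table(keys):
    """Build a sorted, de-duplicated list of digests (hypothetical in-memory table)."""
    return sorted({md5_hex(k) for k in keys})


def contains(table, key: str) -> bool:
    """Binary-search the sorted digest table for the digest of key."""
    digest = md5_hex(key)
    i = bisect.bisect_left(table, digest)
    return i < len(table) and table[i] == digest


if __name__ == "__main__":
    table = build_table(["blocked.example", "another.example"])
    print(contains(table, "blocked.example"))
    print(contains(table, "allowed.example"))
```

Storing fixed-size digests rather than the original strings keeps every entry the same length, which is what makes a sorted table convenient to search.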

Initial Benchmark

The script test/benchmark_essexeld.py throws the 165719 URLs in the MESD blacklist porn urls list at the service. Running the benchmark on a single system (via the loopback address, 127.0.0.1) gave the following results:

SXE Build    Total Time    URLs/s
debug        18420s        9
release      35s           4735
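
The URLs/s figures follow directly from the URL count and the total times. As a quick check, this small Python sketch redoes the arithmetic; the function name is made up for illustration and is not part of the benchmark script.

```python
URL_COUNT = 165719  # number of URLs in the list, as stated above


def urls_per_second(total_seconds: float, url_count: int = URL_COUNT) -> int:
    """Average lookup rate, rounded to the nearest whole URL per second."""
    return round(url_count / total_seconds)


if __name__ == "__main__":
    for build, seconds in (("debug", 18420), ("release", 35)):
        print(build, urls_per_second(seconds))
```

With the times in the table, this gives 9 URLs/s for the debug build and 4735 URLs/s for the release build.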

Deploying and Restarting the Server

The server can be deployed with ansible using the provided playbook.yml file. See the README for further details.

To restart the server, ssh in to each host, become the root user, and run:

nohup /opt/essexeld/bin/essexeld -p 80 &

Using the Server

The protocol for looking up the block list is simple. Issue an HTTP GET request with the path string /urlinfo/1/host.name[:port][/query/string]. If the URL is not found, the server will respond with status code 404 (Not Found). If found, the server will respond with 200 (OK). Note: the body will contain the hard-coded string “porn”.
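
A client for this protocol can be quite small. The Python sketch below is one possible way to do it, using only the standard library. The function names, the default server port, the timeout, and the decision to raise on any status other than 200 or 404 are my assumptions here, not something the server requires.

```python
import http.client


def urlinfo_path(host: str, port=None, rest: str = "") -> str:
    """Build /urlinfo/1/host.name[:port][/query/string] from its parts."""
    path = "/urlinfo/1/" + host
    if port is not None:
        path += ":" + str(port)
    if rest:
        path += "/" + rest.lstrip("/")
    return path


def is_blocked(server: str, host: str, port=None, rest: str = "",
               server_port: int = 80, timeout: float = 5.0) -> bool:
    """Return True on 200 (found) and False on 404 (not found)."""
    conn = http.client.HTTPConnection(server, server_port, timeout=timeout)
    try:
        conn.request("GET", urlinfo_path(host, port, rest))
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        if response.status == 200:
            return True
        if response.status == 404:
            return False
        # Assumption: any other status is treated as an error by this sketch.
        raise RuntimeError("unexpected status %d" % response.status)
    finally:
        conn.close()
```

For example, with a server listening on the loopback address, is_blocked("127.0.0.1", "host.name") would send GET /urlinfo/1/host.name and report whether the answer was 200 or 404.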

About jimbelton

I'm a software developer, and a writer of both fiction and non-fiction, and I blog about movies, books, and philosophy. My interest in religious philosophy and the search for the truth inspires much of my writing.