Carlo Hamalainen


Subkeys in GPG for a YubiKey

2016-12-31

I recently got two YubiKeys to try out another form of 2FA and to see how they work with my PGP setup (Enigmail and Thunderbird). I followed ankitrasto's guide (part 1 and part 2) to move a key to the YubiKey.

I then exported my public key with gpg2 -a --export carlo@carlo-hamalainen.net and sent it to a friend. He replied with the reasonable question: why didn't the fingerprint E3E4A5B8 change? The exported data changed (was longer) yet the fingerprint, which looks like a hash, was the same.

What's going on here is that originally I had a main key E3E4A5B8 for signing (the "S" next to usage) and certification (the "C"). Meanwhile, encryption was done using a subkey 81E07A3C (the "E").

$ gpg2 --edit-key carlo@carlo-hamalainen.net
gpg (GnuPG) 2.1.11; Copyright (C) 2016 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

sec  rsa4096/E3E4A5B8
     created: 2013-07-10  expires: never       usage: SC 
     trust: ultimate      validity: ultimate
ssb  rsa4096/81E07A3C
     created: 2013-07-10  expires: never       usage: E  
[ultimate] (1). Carlo Hamalainen 

From my friend's perspective, I only had one "public key", the one with fingerprint E3E4A5B8.

When I added two subkeys (one for each YubiKey), I got BE8897FA and 766D56F8. These entries have a card-no which refers to the serial number of the YubiKey where the subkey lives.

$ gpg2 --edit-key carlo@carlo-hamalainen.net
gpg (GnuPG) 2.1.11; Copyright (C) 2016 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

sec  rsa4096/E3E4A5B8
     created: 2013-07-10  expires: never       usage: SC 
     trust: ultimate      validity: ultimate
ssb  rsa4096/81E07A3C
     created: 2013-07-10  expires: never       usage: E  
ssb  rsa2048/BE8897FA
     created: 2016-11-20  expires: never       usage: E  
     card-no: 0006 XXXXXXXX
ssb  rsa2048/766D56F8
     created: 2016-12-13  expires: never       usage: E  
     card-no: 0006 YYYYYYYY
[ultimate] (1). Carlo Hamalainen 

To see more detail about the keys, we have to inspect the packets in the export.

Packet dump

We can use gpg2 --list-packets --verbose to see what is in the output of gpg2 -a --export.

Here is the original packet dump when I had just the main key and the encryption subkey:

$ gpg2 -a --export | gpg2 --list-packets --verbose

# off=0 ctb=99 tag=6 hlen=3 plen=525
:public key packet:
    version 4, algo 1, created 1373439395, expires 0
    pkey[0]: C35E579D722847E70515382572B28200 (very long line, snipped)
    pkey[1]: 010001
    keyid: 269E0DC4E3E4A5B8
# off=528 ctb=b4 tag=13 hlen=2 plen=45
:user ID packet: "Carlo Hamalainen "
# off=575 ctb=89 tag=2 hlen=3 plen=568
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1373439395, md5len 0, sigclass 0x13
    digest algo 2, begin of digest 05 44
    hashed subpkt 2 len 4 (sig created 2013-07-10)
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 11 len 5 (pref-sym-algos: 9 8 7 3 2)
    hashed subpkt 21 len 5 (pref-hash-algos: 8 2 9 10 11)
    hashed subpkt 22 len 3 (pref-zip-algos: 2 3 1)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (key server preferences: 80)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED001
# off=1146 ctb=b9 tag=14 hlen=3 plen=525
:public sub key packet:
    version 4, algo 1, created 1373439395, expires 0
    pkey[0]: REDACTED002
    pkey[1]: 010001
    keyid: EF86D47281E07A3C
# off=1674 ctb=89 tag=2 hlen=3 plen=543
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1373439395, md5len 0, sigclass 0x18
    digest algo 2, begin of digest b9 63
    hashed subpkt 2 len 4 (sig created 2013-07-10)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED003

With the two subkeys added for the YubiKeys, the dump now looks like this:


# off=0 ctb=99 tag=6 hlen=3 plen=525
:public key packet:
    version 4, algo 1, created 1373439395, expires 0
    pkey[0]: C35E579D722847E70515382572B28200 (very long line, snipped)
    pkey[1]: 010001
    keyid: 269E0DC4E3E4A5B8
# off=528 ctb=b4 tag=13 hlen=2 plen=45
:user ID packet: "Carlo Hamalainen "
# off=575 ctb=89 tag=2 hlen=3 plen=568
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1373439395, md5len 0, sigclass 0x13
    digest algo 2, begin of digest 05 44
    hashed subpkt 2 len 4 (sig created 2013-07-10)
    hashed subpkt 27 len 1 (key flags: 03)
    hashed subpkt 11 len 5 (pref-sym-algos: 9 8 7 3 2)
    hashed subpkt 21 len 5 (pref-hash-algos: 8 2 9 10 11)
    hashed subpkt 22 len 3 (pref-zip-algos: 2 3 1)
    hashed subpkt 30 len 1 (features: 01)
    hashed subpkt 23 len 1 (key server preferences: 80)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED001
# off=1146 ctb=b9 tag=14 hlen=3 plen=525
:public sub key packet:
    version 4, algo 1, created 1373439395, expires 0
    pkey[0]: REDACTED002
    pkey[1]: 010001
    keyid: EF86D47281E07A3C
# off=1674 ctb=89 tag=2 hlen=3 plen=543
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1373439395, md5len 0, sigclass 0x18
    digest algo 2, begin of digest b9 63
    hashed subpkt 2 len 4 (sig created 2013-07-10)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED003
# off=2220 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1479618635, expires 0
    pkey[0]: REDACTED004
    pkey[1]: 010001
    keyid: 6D682AD2BE8897FA
# off=2492 ctb=89 tag=2 hlen=3 plen=543
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1479618635, md5len 0, sigclass 0x18
    digest algo 8, begin of digest bc 2c
    hashed subpkt 2 len 4 (sig created 2016-11-20)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED005
# off=3038 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1481611279, expires 0
    pkey[0]: REDACTED006
    pkey[1]: 010001
    keyid: 17118623766D56F8
# off=3310 ctb=89 tag=2 hlen=3 plen=543
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1481611279, md5len 0, sigclass 0x18
    digest algo 8, begin of digest a2 63
    hashed subpkt 2 len 4 (sig created 2016-12-13)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED007

1. In both dumps, my main key is the same (the green block). The fingerprint is 269E0DC4E3E4A5B8 in both dumps.

2. The next block is the same in both dumps - it is the encryption subkey. The signature packet is our way of proving that the subkey is attached to our main key. Note the line issuer key ID 269E0DC4E3E4A5B8.

3. This part is for my first YubiKey subkey, BE8897FA. The signature packet claims that the subkey is issued by 269E0DC4E3E4A5B8.

4. This part is for my second YubiKey subkey, 766D56F8. Its signature packet also claims that the subkey is issued by 269E0DC4E3E4A5B8.

pgpdump

An alternative to gpg2 --list-packets --verbose is pgpdump. This formats the packet dump in a nicer way, for example rendering the timestamp created 1373439395 as Public key creation time - Wed Jul 10 14:56:35 SGT 2013.

$ gpg2 --armor --export carlo@carlo-hamalainen.net | pgpdump
Old: Public Key Packet(tag 6)(525 bytes)
    Ver 4 - new
    Public key creation time - Wed Jul 10 14:56:35 SGT 2013
    Pub alg - RSA Encrypt or Sign(pub 1)
    RSA n(4096 bits) - ...
    RSA e(17 bits) - ...
Old: User ID Packet(tag 13)(45 bytes)
    User ID - Carlo Hamalainen 
Old: Signature Packet(tag 2)(568 bytes)
    Ver 4 - new
    Sig type - Positive certification of a User ID and Public Key packet(0x13).
    Pub alg - RSA Encrypt or Sign(pub 1)
    Hash alg - SHA1(hash 2)
    Hashed Sub: signature creation time(sub 2)(4 bytes)
        Time - Wed Jul 10 14:56:35 SGT 2013
    Hashed Sub: key flags(sub 27)(1 bytes)
        Flag - This key may be used to certify other keys
        Flag - This key may be used to sign data
    Hashed Sub: preferred symmetric algorithms(sub 11)(5 bytes)
        Sym alg - AES with 256-bit key(sym 9)
        Sym alg - AES with 192-bit key(sym 8)
        Sym alg - AES with 128-bit key(sym 7)
        Sym alg - CAST5(sym 3)
        Sym alg - Triple-DES(sym 2)
    Hashed Sub: preferred hash algorithms(sub 21)(5 bytes)
        Hash alg - SHA256(hash 8)
        Hash alg - SHA1(hash 2)
        Hash alg - SHA384(hash 9)
        Hash alg - SHA512(hash 10)
        Hash alg - SHA224(hash 11)
    Hashed Sub: preferred compression algorithms(sub 22)(3 bytes)
        Comp alg - ZLIB (comp 2)
        Comp alg - BZip2(comp 3)
        Comp alg - ZIP (comp 1)
    Hashed Sub: features(sub 30)(1 bytes)
        Flag - Modification detection (packets 18 and 19)
    Hashed Sub: key server preferences(sub 23)(1 bytes)
        Flag - No-modify
    Sub: issuer key ID(sub 16)(8 bytes)
        Key ID - 0x269E0DC4E3E4A5B8
    Hash left 2 bytes - 05 44
    RSA m^d mod n(4095 bits) - ...
        -> PKCS-1
Old: Public Subkey Packet(tag 14)(525 bytes)
    Ver 4 - new
    Public key creation time - Wed Jul 10 14:56:35 SGT 2013
    Pub alg - RSA Encrypt or Sign(pub 1)
    RSA n(4096 bits) - ...
    RSA e(17 bits) - ...
Old: Signature Packet(tag 2)(543 bytes)
    Ver 4 - new
    Sig type - Subkey Binding Signature(0x18).
    Pub alg - RSA Encrypt or Sign(pub 1)
    Hash alg - SHA1(hash 2)
    Hashed Sub: signature creation time(sub 2)(4 bytes)
        Time - Wed Jul 10 14:56:35 SGT 2013
    Hashed Sub: key flags(sub 27)(1 bytes)
        Flag - This key may be used to encrypt communications
        Flag - This key may be used to encrypt storage
    Sub: issuer key ID(sub 16)(8 bytes)
        Key ID - 0x269E0DC4E3E4A5B8
    Hash left 2 bytes - b9 63
    RSA m^d mod n(4095 bits) - ...
        -> PKCS-1
Old: Public Subkey Packet(tag 14)(269 bytes)
    Ver 4 - new
    Public key creation time - Sun Nov 20 13:10:35 SGT 2016
    Pub alg - RSA Encrypt or Sign(pub 1)
    RSA n(2048 bits) - ...
    RSA e(17 bits) - ...
Old: Signature Packet(tag 2)(543 bytes)
    Ver 4 - new
    Sig type - Subkey Binding Signature(0x18).
    Pub alg - RSA Encrypt or Sign(pub 1)
    Hash alg - SHA256(hash 8)
    Hashed Sub: signature creation time(sub 2)(4 bytes)
        Time - Sun Nov 20 13:10:35 SGT 2016
    Hashed Sub: key flags(sub 27)(1 bytes)
        Flag - This key may be used to encrypt communications
        Flag - This key may be used to encrypt storage
    Sub: issuer key ID(sub 16)(8 bytes)
        Key ID - 0x269E0DC4E3E4A5B8
    Hash left 2 bytes - bc 2c
    RSA m^d mod n(4096 bits) - ...
        -> PKCS-1
Old: Public Subkey Packet(tag 14)(269 bytes)
    Ver 4 - new
    Public key creation time - Tue Dec 13 14:41:19 SGT 2016
    Pub alg - RSA Encrypt or Sign(pub 1)
    RSA n(2048 bits) - ...
    RSA e(17 bits) - ...
Old: Signature Packet(tag 2)(543 bytes)
    Ver 4 - new
    Sig type - Subkey Binding Signature(0x18).
    Pub alg - RSA Encrypt or Sign(pub 1)
    Hash alg - SHA256(hash 8)
    Hashed Sub: signature creation time(sub 2)(4 bytes)
        Time - Tue Dec 13 14:41:19 SGT 2016
    Hashed Sub: key flags(sub 27)(1 bytes)
        Flag - This key may be used to encrypt communications
        Flag - This key may be used to encrypt storage
    Sub: issuer key ID(sub 16)(8 bytes)
        Key ID - 0x269E0DC4E3E4A5B8
    Hash left 2 bytes - a2 63
    RSA m^d mod n(4093 bits) - ...
        -> PKCS-1

Manually checking the packets

I noticed that the gpg2 dump for the packet at offset 3038 refers to the key id 17118623766D56F8 but the equivalent block from pgpdump has no key id:

# off=3038 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1481611279, expires 0
    pkey[0]: REDACTED006
    pkey[1]: 010001
    keyid: 17118623766D56F8
Old: Public Subkey Packet(tag 14)(269 bytes)
    Ver 4 - new
    Public key creation time - Tue Dec 13 14:41:19 SGT 2016
    Pub alg - RSA Encrypt or Sign(pub 1)
    RSA n(2048 bits) - ...
    RSA e(17 bits) - ...

The RFC tells us how to calculate the fingerprint of a V4 packet:

A V4 fingerprint is the 160-bit SHA-1 hash of the octet 0x99, followed by the two-octet packet length, followed by the entire Public-Key packet starting with the version field. The Key ID is the low-order 64 bits of the fingerprint.

My Python snippet to do this is here. A snippet is below:

xs = read_binary_public_key()

SUBKEY_OFFSET = 3038
SUBKEY_PLEN   = 269

subkey_packet = xs[SUBKEY_OFFSET:SUBKEY_OFFSET+SUBKEY_PLEN+2] # +2 because the PLEN is short by one?

assert subkey_packet[2] == '\x04'

for_fingerprint = ['\x99'] + subkey_packet

h = hashlib.sha1(to_str(for_fingerprint))

assert h.digest_size == 20
key_id = to_hexdump(h.digest()[12:], s='')

assert key_id == '17118623766D56F8'

Subkey binding signature

The subkey packet at off=3038 defines the subkey 17118623766D56F8. The next packet at off=3310 provides proof that the key 17118623766D56F8 is attached to our main public key 269E0DC4E3E4A5B8.

The signature packet doesn't refer to the offset 3038 or the key id 17118623766D56F8 of the subkey packet, so let's check the contents of the signature packet to see if it really does match the subkey data.

# off=3038 ctb=b9 tag=14 hlen=3 plen=269
:public sub key packet:
    version 4, algo 1, created 1481611279, expires 0
    pkey[0]: REDACTED006
    pkey[1]: 010001
    keyid: 17118623766D56F8
# off=3310 ctb=89 tag=2 hlen=3 plen=543
:signature packet: algo 1, keyid 269E0DC4E3E4A5B8
    version 4, created 1481611279, md5len 0, sigclass 0x18
    digest algo 8, begin of digest a2 63
    hashed subpkt 2 len 4 (sig created 2016-12-13)
    hashed subpkt 27 len 1 (key flags: 0C)
    subpkt 16 len 8 (issuer key ID 269E0DC4E3E4A5B8)
    data: REDACTED007

The first thing that we can check is that the left 16 bits of the hash matches A2 63 (red line above). Checking this hash wasn't completely straightforward, just reading from RFC. (Other people ran into similar issues.) The full block of code is here. Sample is below:

signature_block = xs[3310:3310+543+2]

# Starts off with two bytes for the length.
assert 543 == (ord(signature_block[0]) << 8) + ord(signature_block[1])

hash_subpacket_length = (ord(signature_block[6]) << 8) + ord(signature_block[7])
assert hash_subpacket_length == 9

start_of_hashed_part   = 8
start_of_unhashed_part = 7 + hash_subpacket_length + 1

unhash_subpacket_length = ( (ord(signature_block[start_of_unhashed_part]) << 8)
                          +  ord(signature_block[start_of_unhashed_part+1])
                          )

start_of_left_16 = start_of_unhashed_part + unhash_subpacket_length + 2

left_16 = signature_block[start_of_left_16:start_of_left_16+2]

assert left_16 == ['\xA2', '\x63']

hashed_part_of_sig = signature_block[start_of_hashed_part:start_of_hashed_part+hash_subpacket_length]
assert len(hashed_part_of_sig) == hash_subpacket_length

public_key_block = xs[0:0+525+2] # +2 for what? 
assert public_key_block[2] == '\x04'

header1     = ['\x99'] + public_key_block[:8]
pubkey_body = public_key_block[8:]

header2     = ['\x99'] + subkey_packet[:8]
subsig_body = subkey_packet[8:]

# Version, class, pub key algo, digest algo.
version_class_algos = [ '\x04', '\x18',
                        signature_block[4],
                        signature_block[5]
                      ]

m = hash_subpacket_length
assert m == 9
hash_chunk_length = [chr(m >> 8), chr(m)]

"""
According to https://tools.ietf.org/html/rfc4880#section-5.2.4
the final bit of data is a trailer of six octets:

   V4 signatures also hash in a final trailer of six octets: the
   version of the Signature packet, i.e., 0x04; 0xFF; and a four-octet,
   big-endian number that is the length of the hashed data from the
   Signature packet (note that this number does not include these final
   six octets).

But in gnupg-2.1.11, we see the following in g10/sig-check.c:

410     else {
411     byte buf[6];
412     size_t n;
413     gcry_md_putc( digest, sig->pubkey_algo );
414     gcry_md_putc( digest, sig->digest_algo );
415     if( sig->hashed ) {
416         n = sig->hashed->len;
417             gcry_md_putc (digest, (n >> 8) );
418             gcry_md_putc (digest,  n       );
419         gcry_md_write (digest, sig->hashed->data, n);
420         n += 6;
421     }
422     else {
423       /* Two octets for the (empty) length of the hashed
424              section. */
425           gcry_md_putc (digest, 0);
426       gcry_md_putc (digest, 0);
427       n = 6;
428     }
429     /* add some magic per Section 5.2.4 of RFC 4880.  */
430     buf[0] = sig->version;
431     buf[1] = 0xff;
432     buf[2] = n >> 24;
433     buf[3] = n >> 16;
434     buf[4] = n >>  8;
435     buf[5] = n;
436     gcry_md_write( digest, buf, 6 );
437     }
438     gcry_md_final( digest );

Line 420 adds 6, so we'll do the same even though it
seems to disagree with the RFC.

"""

n = m + 6
assert n == 15

magic = ['\x04', '\xff',
         chr(n >> 24),
         chr(n >> 16),
         chr(n >>  8),
         chr(n)]

for_digest = []

for_digest += header1
for_digest += pubkey_body

for_digest += header2
for_digest += subsig_body 

for_digest += version_class_algos 

for_digest += hash_chunk_length 
for_digest += hashed_part_of_sig

for_digest += magic

# According to https://tools.ietf.org/html/rfc4880#section-9.4
# the hash algo 8 is SHA256.
assert '\x08' == signature_block[5]
digest = hashlib.sha256(to_str(for_digest)).digest()

assert 'A2 63' == to_hexdump(digest[:2])

We find that the first 16 bits of the digest is A2 63, matching the data in the signature packet, so the binding signature passes the first test.

The second test is to convert the digest to an MPI and verify it using the specified public key algorithm (an exercise for the reader since pygcrypt fails to install on my laptop :)

The RFC alone wasn't enough for me to reconstruct the subkey hash check, for example the +6 quirk. I had to poke around in gpg 2.1.11 to see what was being used in the hash. For efficiency reasons, libgcrypt lets you push single characters or blocks of bytes to the hash buffer (gcry_md_putc and gcry_md_write; see the libgcrypt docs) so you can't dump a contiguous block of memory to compare against for_digest). My hacky debugging (print statements) is in this patch. For some reason gpg2 --check-sigs 766D56F8! wasn't exercising the signature checking code (cached somewhere!?) so on line 108 of my patch I had to force opt.no_sig_cache = 1;.

Enigmail and key fingerprints

So why doesn't Enigmail let you choose which subkey is being used for encryption? As far as I can tell this is by design:

https://sourceforge.net/p/enigmail/forum/support/thread/37b7a5c8

Enigmail does not try to expose all possible features to end users. The goal of Enigmail is to be usable for everyone, including beginners. I think that the concept of subkeys is way too complex even for many average users. Most of them are already confused by public and secret keys.

You can use gpg.conf to configure the way you want to use subkeys. I will not implement specific features for this in Enigmail.

And then:

Unfortunately, this doesn't work in this case. gpg is invoked by enigmail with the -u / --local-user argument, completely overriding my settings in gpg.conf. If you / enigmail invoked it with the --default-key argument, it would be a different story. But it does not.

...

If you would change the next enigmail update to use the --default-key argument instead of the -u argument, it would really help.

EDIT:

Ok, patched enigmail myself. It works as expected with --default-key instead of -u.

And then:

I have decided that I will not replace "-u" (or the equivalent "--local-user") by "--default-key" in Enigmail. Here is why:

If a user specified local-user in gpg.conf, then use of "-u" in Enigmail will lead to the key being signed by both keys. This is what some users (especially companies) want, expect from Enigmail, and know that it's been supported for the last 10 years. Using --default-key will break this; in other words, gpg.conf will "win" over Enigmail.

The requirement to a specific subkey for signing is by far less common, and average users don't need to do this.

We can see what's going on by using an old Unix trick: make a gpg2 "binary" that is a shell script and put it before the real gpg2 binary in the $PATH:

$ cat bin/gpg2
#!/bin/bash

echo $@ >> /tmp/gpg2.txt

/usr/bin/gpg2 `echo $@ | sed 's/-u 0x7679121C22964C12888893D1269E0DC4E3E4A5B8/-u 766D56F8!/g'`

The -u 766D56F8! parameter forcibly chooses that subkey (the exclamation mark is needed).

This trick is stupid, and potentially dangerous, since someone could convince you to sign a document with the encryption key instead of the signing key. So don't do it! By default gpg2 uses the last encryption subkey for encryption.

Links

Source code for the snippets in this blog post: https://github.com/carlohamalainen/playground/tree/master/pgp/key_id

Anatomy of a gpg key

RFC 4880

Part 1/2: Email Encryption with the Yubikey-NEO, GPG and Linux

Part 2/2: Email Encryption with the Yubikey-NEO, GPG and Linux

Libgcrypt Reference Manual

GNU Privacy Guard

gnupg-2.1.11.tar.bz2 and signature

Foscam IP camera with Pi server

2016-12-31

Over the xmas holidays I set up a security camera for my parents. I used a Foscam FI9805E IP camera with a Raspberry Pi (model 3) as the server. An old 1Tb USB hard drive provides ample storage on the Pi. Here is the layout:

[photo]


The camera supports power over ethernet but it comes with an AC adaptor so you can use a normal ethernet setup as I have done.

Shopping list

These are the bits that I bought:

  • Raspberry Pi model 3. Mine came with free heatsinks.
  • Micro-SD card for the Pi's root file system.
  • 2.4amp AC to USB adaptor; powers the Pi via a micro-USB cable.
  • Foscam FI9805E IP camera.
  • Standard ethernet cable.

External installation

[photo]


I used some Sugru to keep the main cable straight against the roller-door track:

[photo]


Due to the garage being hot I mounted an old peg basket using picture frame hooks. Free flow of air is good.

[photo]


Setup

Unfortunately the camera's web interface requires a browser plugin to be installed (an EXE). Microsoft provides free Windows virtual machine images which is handy.

For security reasons I have the camera on its own 192.168.1.x subnet, with the Pi running a DHCP server to give the camera a static IP. There is no route from 192.168.1.x to the outside world so I also run an ntp and ftp server on the Pi. Meanwhile, the Pi connects via wifi the normal 10.1.1.x network.

To view the camera's web interface via the Pi, I have socat passing 8088 (on 10.1.1.x) to the 192.168.1.x network. I keep these going with supervisor:

$ cat /etc/supervisor/conf.d/socatcamera.conf 
[program:socatcameratcp]
command=socat TCP-LISTEN:8088,fork TCP:192.168.1.33:88
directory=/tmp
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/socatcamera/socatcamera_tcp.err.log
stdout_logfile=/var/log/socatcamera/socatcamera_tcp.out.log
user=root

[program:socatcameraudp]
command=socat UDP-LISTEN:8088,fork UDP:192.168.1.33:88
directory=/tmp
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/socatcamera/socatcamera_udp.err.log
stdout_logfile=/var/log/socatcamera/socatcamera_udp.out.log
user=root

(TCP is probably enough, maybe the UDP isn't needed.)

Then I can view the live video stream using vlc (the Pi is 10.1.1.220):

vlc rtsp://user:pass@10.1.1.220:8088/videoMain

The camera has motion detection software and can store snapshots to an FTP server. For this I have vsftpd on the Pi. The snapshot files have names like MDAlarm_20161220-195958.jpg so it's easy to parse the year, month, day, time. A small Python script archives snapshots to an archive directory, and another script makes a html index (by day) which is served up via nginx.

For maintenance I have ssh access to the Pi from outside, with rate limited ssh.

Monitoring

The Pi logs its CPU temperature to my Postgresql structured logging server and a daily email report plots the temperatures for me:

For real-time monitoring (5 minute intervals) I use a free account on UptimeRobot.

With the most aggressive settings for motion detection (smallest detection window and largest save window) we are seeing between 50 and 200Mb of snapshot images per 24 hour period.

Bonus

Having a Pi on NBN fibre in Australia is convenient for running WebDL for grabbing ABC and SBS shows.

Note on operational

2016-10-30

I was looking at an old post of mine on free monads and wondered what it would look like using operational. Here we go!

(Literate Haskell source for this blog post is here.)

First, some imports. We use GADTs for our instruction type.

> {-# LANGUAGE GADTs #-}
> 
> module Note where
> 
> import Control.Monad
> import Control.Monad.Operational
> import Test.QuickCheck
> import qualified Data.Map as M

We want to encode a few instructions that operate on a data store. For the moment we don’t care about the implementation, just the underlying actions that we can perform: create a value, list all values, retrieve a particular value, and delete a value.

Suppose we have an index of type i and values of type v. Then DataStoreInstruction is:

> data DataStoreInstruction i v a where
>     Create   :: v -> DataStoreInstruction i v i
>     List     ::      DataStoreInstruction i v [v]
>     Retrieve :: i -> DataStoreInstruction i v (Maybe v)
>     Delete   :: i -> DataStoreInstruction i v ()

Intuitively, Create takes a value of type v and returns an instruction, which on interpretation gives a value of type i (the last type parameter).

List doesn’t take any argument and produces a list of type v, i.e. [v]. The only odd one is Delete which doesn’t return anything, so it has a () in the last type variable.

A sequence of DataStoreInstructions is a program, which we get using the Program constructor:

> type DataStoreProgram i v a = Program (DataStoreInstruction i v) a

where the index has type i, the values have type v, and the overall result of the program has type a.

To more easily construct programs, use singleton:

> create :: v -> DataStoreProgram i v i
> create = singleton . Create
> 
> list :: DataStoreProgram i v [v]
> list = singleton List
> 
> retrieve :: i -> DataStoreProgram i v (Maybe v)
> retrieve = singleton . Retrieve
> 
> delete :: i -> DataStoreProgram i v ()
> delete = singleton . Delete

Now we can write programs in this DSL! All the usual monad things are at our disposal:

> -- Insert a few values and return a list
> -- of all values:
> doSomeThings :: DataStoreProgram Int Int [Int]
> doSomeThings = do
>     ix3 <- create 3
>     ix4 <- create 4
>     delete ix3
>     ix5 <- create 5
>     list
> -- Insert all the supplied values and
> -- return a list of indexes as well as a
> -- list of final values (which should be empty).
> insertValues :: [Int] -> DataStoreProgram Int Int ([Int], [Int])
> insertValues xs = do
>     ixs <- forM xs create
>     forM_ ixs delete -- delete everything
>     final <- list
> 
>     return (ixs, final)

The last step is to write an interpreter. We do this using view and pattern matching on the constructors of DataStoreInstruction. We use :>>= to break apart the program. As the documentation says, (someInstruction :>>= k) means someInstruction and the remaining program is given by the function k.

Here we interpret the program using a Map as the underlying data store:

> interpretMap :: Ord a => DataStoreProgram a a b -> (M.Map a a -> b)
> interpretMap = eval . view
>   where
>     eval (Return x)          m = x
>     eval (Create   v :>>= k) m = interpretMap (k v)              (M.insert v v m)
>     eval (List       :>>= k) m = interpretMap (k (M.elems m))    m
>     eval (Retrieve i :>>= k) m = interpretMap (k $ M.lookup i m) m
>     eval (Delete   i :>>= k) m = interpretMap (k ())             (M.delete i m)

If we wanted to we could flatten a program out to a string:

> interpretString :: (Show a, Show b) => DataStoreProgram a a b -> String
> interpretString = eval . view
>   where
>     eval (Return x)          = "Return "   ++ show x
>     eval (Create   v :>>= k) = "Create("   ++ show v ++ "); "  ++ interpretString (k v)
>     eval (List       :>>= k) = "List; "                        ++ interpretString (k [])
>     eval (Retrieve i :>>= k) = "Retrieve(" ++ show i ++ "); "  ++ interpretString (k $ Just i)
>     eval (Delete   i :>>= k) = "Delete("   ++ show i ++ "); "  ++ interpretString (k ())

Maybe we have some super-important business need for storing Int/Int key value maps with a Postgresql backend. We could target this by writing an interpreter that uses postgresql-simple:

> data Connection = Connection -- cheating for this example
> 
> interprestPostgresql :: DataStoreProgram Int Int b -> (Connection -> IO b)
> interprestPostgresql = eval . view
>   where
>     eval (Return x)          _ = return x
>     eval (Create   v :>>= k) c = interprestPostgresql (k v) (insertPsql c v)
>     eval (List       :>>= k) c = do allValues <- getAllPsql c
>                                     interprestPostgresql (k allValues) c
>     eval (Retrieve i :>>= k) c = do v <- lookupPsql c i
>                                     interprestPostgresql (k v) c
>     eval (Delete   i :>>= k) c = deletePsql c i >> interprestPostgresql (k ()) c
> 
>     -- Exercises for the reader.
>     insertPsql c v = undefined
>     getAllPsql c   = undefined
>     lookupPsql c i = undefined
>     deletePsql c i = undefined

Running the programs:

*Note> interpretMap doSomeThings M.empty
[4,5]

*Note> interpretString doSomeThings
"Create(3); Create(4); Delete(3); Create(5); List; Return []"

*Note> interpretMap (insertValues [1..10]) M.empty
([1,2,3,4,5,6,7,8,9,10],[])

*Note> interpretString (insertValues [1..10])
"Create(1); Create(2); Create(3); Create(4); Create(5); Create(6);
 Create(7); Create(8); Create(9); Create(10); Delete(1); Delete(2);
 Delete(3); Delete(4); Delete(5); Delete(6); Delete(7); Delete(8);
 Delete(9); Delete(10); List; Return ([1,2,3,4,5,6,7,8,9,10],[])"

QuickCheck

It’s always good to write some tests:

> prop_insert_then_delete :: [Int] -> Bool
> prop_insert_then_delete xs = null $ interpretMap (f xs) M.empty
>   where
>     f :: [Int] -> DataStoreProgram Int Int [Int]
>     f is = do
>         ixs <- forM is create
>         forM_ ixs delete
>         list
> prop_create :: Positive Int -> Bool
> prop_create (Positive n) = let ns = [1..n] in ns == interpretMap (f ns) M.empty
>   where
>     f = mapM create
*Note> quickCheck prop_insert_then_delete
+++ OK, passed 100 tests.

*Note>
*Note>
*Note>
*Note> quickCheck prop_create
+++ OK, passed 100 tests.

Over time we find out that the interpreter that uses Map is too slow, so we write a new one using a fancy data structure:

> -- Uses fancy tuned data structure.
> interpretMapFast :: Ord a => DataStoreProgram a a b -> (M.Map a a -> b)
> interpretMapFast = undefined -- So fancy!

Now we can compare implementations using QuickCheck. Nice!

> prop_fast_inserts :: [Int] -> Bool
> prop_fast_inserts xs = (interpretMapFast xs' M.empty) == (interpretMap xs' M.empty)
>   where
>     xs' = f xs
> 
>     f :: [Int] -> DataStoreProgram Int Int [Int]
>     f is = do
>         ixs <- forM is create
>         list

Use of operational in larger projects

Here are a few samples of the operational package in action. For more, see the reverse dependencies on Hackage.

Hadron

I first heard about operational from this talk at the New York Haskell meetup: Conquering Hadoop with Haskell and Ozgun Ataman.

Here is the ConI type from Hadron.Controller:

> data ConI a where
>     Connect :: forall i o. MapReduce i o
>             -> [Tap i] -> Tap o
>             -> Maybe String
>             -> ConI ()
> 
>     MakeTap :: Protocol' a -> ConI (Tap a)
> 
>     BinaryDirTap
>         :: FilePath
>         -> (FilePath -> Bool)
>         -> ConI (Tap (FilePath, B.ByteString))
> 
>     ConIO :: IO a -> ConI a
> 
>     OrchIO :: IO a -> ConI ()
>     -- Only the orchestrator performs action
> 
>     NodeIO :: IO a -> ConI a
>     -- Only the nodes perform action
> 
>     SetVal :: String -> B.ByteString -> ConI ()
> 
>     GetVal :: String -> ConI (Maybe B.ByteString)
> 
>     RunOnce :: Serialize a => IO a -> ConI a
>     -- Only run on orchestrator, then make available to all the
>     -- nodes via HDFS.

There is the distinction between the single orchestrator node, which runs OrchIO and can’t run NodeIO, and worker nodes that can’t run OrchIO but can run NodeIO. In the orchestrate, trying to evaluate a NodeIO results in an error:

> orchestrate
>     :: (MonadMask m, MonadIO m, Applicative m)
>     => Controller a
>     -> RunContext
>     -> RerunStrategy
>     -> ContState
>     -> m (Either String a)
> orchestrate (Controller p) settings rr s = do
>     bracket
>       (liftIO $ openFile "hadron.log" AppendMode)
>       (liftIO . hClose)
>       (\_h -> do echoInfo ()  "Initiating orchestration..."
>                  evalStateT (runExceptT (go p)) s)
>     where
>       go = eval . O.view
> 
>       eval (Return a) = return a
>       eval (i :>>= f) = eval' i >>= go . f
> 
>       eval' :: (Functor m, MonadIO m) => ConI a -> ExceptT String (StateT ContState m) a
>       eval' (ConIO  f) = liftIO f
>       eval' (OrchIO f) = void $ liftIO f
>       eval' (NodeIO _) = return (error "NodeIO can't be used in the orchestrator decision path")

Meanwhile, worker nodes ignore an OrchIO and lift the NodeIO action.

Chart

The Graphics.Rendering.Chart.Backend.Impl module defines the the backend instructions:

> data ChartBackendInstr a where
>   StrokePath :: Path -> ChartBackendInstr ()
>   FillPath   :: Path -> ChartBackendInstr ()
>   GetTextSize :: String -> ChartBackendInstr TextSize
>   DrawText    :: Point -> String -> ChartBackendInstr ()
>   GetAlignments :: ChartBackendInstr AlignmentFns
>   WithTransform  :: Matrix ->  Program ChartBackendInstr a -> ChartBackendInstr a
>   WithFontStyle  :: FontStyle -> Program ChartBackendInstr a -> ChartBackendInstr a
>   WithFillStyle  :: FillStyle -> Program ChartBackendInstr a -> ChartBackendInstr a
>   WithLineStyle  :: LineStyle -> Program ChartBackendInstr a -> ChartBackendInstr a
>   WithClipRegion :: Rect -> Program ChartBackendInstr a -> ChartBackendInstr a
> 
> type BackendProgram a = Program ChartBackendInstr a

Then the Graphics.Rendering.Chart.Backend.Cairo module provides a way to run a BackendProgram for the cairo graphics engine:

> runBackend' :: CEnv -> BackendProgram a -> C.Render a
> runBackend' env m = eval env (view m)
>   where
>     eval :: CEnv -> ProgramView ChartBackendInstr a -> C.Render a
>     eval env (Return v)= return v
>     eval env (StrokePath p :>>= f) = cStrokePath env p >>= step env f
>     eval env (FillPath p :>>= f) = cFillPath env p >>= step env f
>     eval env (GetTextSize s :>>= f) = cTextSize s >>= step env f
>     eval env (DrawText p s :>>= f) = cDrawText env p s >>= step env f
>     eval env (GetAlignments :>>= f) = return (ceAlignmentFns env) >>= step env f
>     eval env (WithTransform m p :>>= f) = cWithTransform env m p >>= step env f
>     eval env (WithFontStyle font p :>>= f) = cWithFontStyle env font p >>= step env f
>     eval env (WithFillStyle fs p :>>= f) = cWithFillStyle env fs p >>= step env f
>     eval env (WithLineStyle ls p :>>= f) = cWithLineStyle env ls p >>= step env f
>     eval env (WithClipRegion r p :>>= f) = cWithClipRegion env r p >>= step env f

Meanwhile, Graphics.Rendering.Chart.Backend.Diagrams does the same but for diagrams:

> runBackend' tr m = eval tr $ view $ m
>   where
>     eval :: (D.Renderable (D.Path V2 (N b)) b, D.Renderable t b, D.TypeableFloat (N b))
>          => TextRender b t -> ProgramView ChartBackendInstr a
>          -> DState (N b) (D.QDiagram b V2 (N b) Any, a)
>     eval tr (Return v) = return (mempty, v)
>     eval tr (StrokePath p   :>>= f) = dStrokePath  p   <># step tr f
>     eval tr (FillPath   p   :>>= f) = dFillPath    p   <># step tr f
>     eval tr@TextRenderSvg    (DrawText   p s :>>= f) = dDrawTextSvg    p s <># step tr f
>     eval tr@TextRenderNative (DrawText   p s :>>= f) = dDrawTextNative p s <># step tr f
>     eval tr (GetTextSize  s :>>= f) = dTextSize      s <>= step tr f
>     eval tr (GetAlignments  :>>= f) = dAlignmentFns    <>= step tr f
>     eval tr (WithTransform m p :>>= f)  = dWithTransform  tr m  p <>= step tr f
>     eval tr (WithFontStyle fs p :>>= f) = dWithFontStyle  tr fs p <>= step tr f
>     eval tr (WithFillStyle fs p :>>= f) = dWithFillStyle  tr fs p <>= step tr f
>     eval tr (WithLineStyle ls p :>>= f) = dWithLineStyle  tr ls p <>= step tr f
>     eval tr (WithClipRegion r p :>>= f) = dWithClipRegion tr r  p <>= step tr f

redis-io

The Data.Redis.Command module defines heaps of commands:

> data Command :: * -> * where
>     -- Connection
>     Ping   :: Resp -> Command ()
>     Echo   :: FromByteString a => Resp -> Command a
>     Auth   :: Resp -> Command ()
>     Quit   :: Resp -> Command ()
>     Select :: Resp -> Command ()
> 
>     -- Many more here.
> 
> type Redis  = ProgramT Command

and Database.Redis.IO.Client, an internal module of redis-io, defines how to interpret the redis commands:

> eval f conn red = run conn [] red
>   where
>     run :: Connection -> [IO ()] -> Redis IO a -> IO (a, [IO ()])
>     run h ii c = do
>         r <- viewT c
>         case r of
>             Return a -> return (a, ii)
> 
>             -- Connection
>             Ping   x :>>= k -> f h x (matchStr "PING" "PONG") >>= \(a, i) -> run h (i:ii) $ k a
>             Echo   x :>>= k -> f h x (readBulk "ECHO")        >>= \(a, i) -> run h (i:ii) $ k a
>             Auth   x :>>= k -> f h x (matchStr "AUTH" "OK")   >>= \(a, i) -> run h (i:ii) $ k a
>             Quit   x :>>= k -> f h x (matchStr "QUIT" "OK")   >>= \(a, i) -> run h (i:ii) $ k a
>             Select x :>>= k -> f h x (matchStr "SELECT" "OK") >>= \(a, i) -> run h (i:ii) $ k a
> 
>             -- Many more here, snipped

language-puppet

The internal module Puppet.Interpreter.Types of language-puppet defines an interpreter instruction type:

> data InterpreterInstr a where
>     GetNativeTypes      :: InterpreterInstr (Container NativeTypeMethods)
>     GetStatement        :: TopLevelType -> Text -> InterpreterInstr Statement
>     ComputeTemplate     :: Either Text T.Text -> InterpreterState -> InterpreterInstr T.Text
>     ExternalFunction    :: Text -> [PValue] -> InterpreterInstr PValue
>     GetNodeName         :: InterpreterInstr Text
>     HieraQuery          :: Container Text -> T.Text -> HieraQueryType -> InterpreterInstr (Maybe PValue)
>     GetCurrentCallStack :: InterpreterInstr [String]
>     IsIgnoredModule     :: Text -> InterpreterInstr Bool
>     IsExternalModule    :: Text -> InterpreterInstr Bool
>     IsStrict            :: InterpreterInstr Bool
>     PuppetPaths         :: InterpreterInstr PuppetDirPaths
>     -- Many more here, snipped.

Then Puppet.Interpreter.IO provides:

> -- The internal (not exposed) eval function
> eval :: Monad m
>      => InterpreterReader m
>      -> InterpreterState
>      -> ProgramViewT InterpreterInstr (State InterpreterState) a
>      -> m (Either PrettyError a, InterpreterState, InterpreterWriter)
> eval _ s (Return x) = return (Right x, s, mempty)
> eval r s (a :>>= k) =
>     let runInstr = interpretMonad r s . k -- run one instruction
>         thpe = interpretMonad r s . throwPosError . getError
>         pdb = r^.readerPdbApi
>         strFail iof errf = iof >>= \case
>             Left rr -> thpe (errf (string rr))
>             Right x -> runInstr x
>         canFail iof = iof >>= \case
>             S.Left err -> thpe err
>             S.Right x -> runInstr x
>         canFailX iof = runExceptT iof >>= \case
>             Left err -> thpe err
>             Right x -> runInstr x
>         logStuff x c = (_3 %~ (x <>)) <$> c
>     in  case a of
>             IsStrict                     -> runInstr (r ^. readerIsStrict)
>             ExternalFunction fname args  -> case r ^. readerExternalFunc . at fname of
>                                                 Just fn -> interpretMonad r s ( fn args >>= k)
>                                                 Nothing -> thpe (PrettyError ("Unknown function: " <> ttext fname))
>             GetStatement topleveltype toplevelname
>                                          -> canFail ((r ^. readerGetStatement) topleveltype toplevelname)
>             ComputeTemplate fn stt       -> canFail ((r ^. readerGetTemplate) fn stt r)
>             WriterTell t                 -> logStuff t (runInstr ())
>             WriterPass _                 -> thpe "WriterPass"
>             WriterListen _               -> thpe "WriterListen"
>             PuppetPaths                  -> runInstr (r ^. readerPuppetPaths)
>             GetNativeTypes               -> runInstr (r ^. readerNativeTypes)
>             ErrorThrow d                 -> return (Left d, s, mempty)
>             ErrorCatch _ _               -> thpe "ErrorCatch"
>             GetNodeName                  -> runInstr (r ^. readerNodename)
>             -- More cases here, snipped.

Since InterpreterInstr is a normal data type, it’s possible to write an instance of Pretty so that warnings or error messages look nicer:

> -- Puppet.Interpreter.PrettyPrinter
> 
> instance Pretty (InterpreterInstr a) where
> 
>     ...
> 
>     pretty (ExternalFunction fn args)  = pf (ttext fn) (map pretty args)
>     pretty GetNodeName                 = pf "GetNodeName" []
>     pretty (HieraQuery _ q _)          = pf "HieraQuery" [ttext q]
>     pretty GetCurrentCallStack         = pf "GetCurrentCallStack" []
>     pretty (ErrorThrow rr)             = pf "ErrorThrow" [getError rr]
>     pretty (ErrorCatch _ _)            = pf "ErrorCatch" []
>     pretty (WriterTell t)              = pf "WriterTell" (map (pretty . view _2) t)
>     pretty (WriterPass _)              = pf "WriterPass" []

References

Spectral graph partitioning in Haskell

2016-10-09

I was reading some notes from QUT about using spectral graph theory to partition a graph using eigenvalues of unnormalised Laplacian. Their Matlab code looks like this:

% Form W = Gaussian distribution based on distances
W = zeros(100, 100);
D = zeros(100, 100);

sigma = 2;
N = 100;
for i = 1:N
    for j = 1:N
        if (j ~= i)
            % Calculate the distance between two points
            dist = norm([A(i, 1) - A(j, 1); A(i, 2) - A(j, 2)]);
            expp = exp(-dist^2 / (2 * sigma^2));
            adjacency(i, j) = 1; 
            % Add the weights to the matrix
            W(i, j) = expp; 
            % Add up the row sum as we go
            D(i, i) = D(i, i) + expp;
        end
    end
end

L = D - W;

Find the eigenpairs of L

[vec, val] = eig(L, 'vector');
% Ensure the eigenvalues and eigenvectors are sorted in ascending order
[val,ind] = sort(val);
vec = vec(:, ind);

% Plot with clusters identified by marker
% v2
figure; 
hold on;
for i = 1:N
    plot(i, vec(i, 2), ['k' plot_syms{i}]);
end

Personally, this gives me nightmares from undergrad numerical methods classes. So here’s how to do it in Haskell. Full source (including cabal config) for this post is here.

> module Notes where

To create the W matrix we use buildMatrix:

> buildW :: Double -> Matrix Double -> Matrix Double
> buildW sigma a = buildMatrix n n $ \(i,j) -> f sigma a i j
>   where
>     n = rows a
> 
>     f sigma a i j = if j /= i
>                         then expp sigma a i j
>                         else 0.0
> 
>     dist :: Matrix Double -> Int -> Int -> Double
>     dist m i j = norm2 $ fromList [ m!i!0 - m!j!0, m!i!1 - m!j!1 ]
> 
>     expp :: Double -> Matrix Double -> Int -> Int -> Double
>     expp sigma m i j = exp $ (-(dist m i j)**2)/(2*sigma**2)

The D matrix is a diagonal matrix with each element being the sum of a row of W, which is nicely expressible by composing a few functions:

> buildD :: Matrix Double -> Matrix Double
> buildD w = diag
>          . fromList
>          . map (foldVector (+) 0)
>          . toRows
>          $ w
>   where
>     n = rows w

The L matrix is real and symmetric so we use eigSH which provides the eigenvalues in descending order.

> lapEigs :: Double -> Matrix Double -> (Vector Double, Matrix Double)
> lapEigs sigma m = eigSH l
>   where
>     w = buildW sigma m
>     d = buildD w
>     l = d - w

To finish up, the Fiedler eigenvector corresponds to the second smallest eigenvalue:

> fiedler :: Double -> Matrix Double -> (Double, Vector Double)
> fiedler sigma m = (val ! (n-2), vector $ concat $ toLists $ vec ¿ [n-2])
>   where
>     (val, vec) = lapEigs sigma m
>     n = rows m

To plot the eigenvalues and eigenvector we use the Chart library which uses the cairo backend.

> doPlot "eigenvalues.png" "Eigenvalues" "eigenvalue" $ zip [0..] (reverse $ toList val)
> 
> doPlot "fiedler.png"
>        "Second eigenvalue of unnormalised Laplacian"
>        "fiedler eigenvector"
>        (zip [0..] $ toList algConnecEigVec)

ghc-imported-from => ghc-mod

2016-09-25

I have a pull request to merge ghc-imported-from into ghc-mod. The main benefit of being part of ghc-mod is that I don't have to duplicate ghc-mod's infrastructure for handling sandboxes, GHC options, interfaces to other build tools like Stack, and compatibility with more versions of GHC.

The pull request is still under review, so until then you can try it out by cloning the development branches:

git clone -b imported-from https://github.com/DanielG/ghc-mod.git ghc-mod-imported-from
cd ghc-mod-imported-from
cabal update && cabal sandbox init && cabal install
export PATH=`pwd`/.cabal-sandbox/bin:$PATH

Assuming that you use Plugged for managing Vim/Neovim plugins, also grab my branch of ghcmod-vim:

git clone -b ghcmod-imported-from-cmd git@github.com:carlohamalainen/ghcmod-vim.git $HOME/.vim/plugged/ghcmod-vim

and make sure that the plugin is enabled:

call plug#begin('~/.vim/plugged')
Plug 'eagletmt/ghcmod-vim', {'for': 'haskell'}

and set some handy key mappings:

au FileType  haskell nnoremap            :GhcModType
au FileType  haskell nnoremap            :GhcModInfo
au FileType  haskell nnoremap    :GhcModTypeClear

au FileType lhaskell nnoremap            :GhcModType
au FileType lhaskell nnoremap            :GhcModInfo
au FileType lhaskell nnoremap    :GhcModTypeClear

au FileType haskell  nnoremap   :GhcModOpenDoc
au FileType lhaskell nnoremap   :GhcModOpenDoc

au FileType haskell  nnoremap   :GhcModDocUrl
au FileType lhaskell nnoremap   :GhcModDocUrl

au FileType haskell  vnoremap  : GhcModOpenHaddockVismode
au FileType lhaskell vnoremap  : GhcModOpenHaddockVismode

au FileType haskell  vnoremap  : GhcModEchoUrlVismode
au FileType lhaskell vnoremap  : GhcModEchoUrlVismode

On the command line, use the imported-from command. It tells you the defining module, the exporting module, and the Haddock URL:

$ ghc-mod imported-from Foo.hs 9 34 show
base-4.8.2.0:GHC.Show.show Prelude https://hackage.haskell.org/package/base-4.8.2.0/docs/Prelude.html

From Vim/Neovim, navigate to a symbol and hit F4 which will open the Haddock URL in your browser, or F5 to echo the command-line output.

Reflex FRP gallery editor

2016-09-17

When I post a series of photos to a personal blog I find myself editing HTML in Vim and switching back and forth to a browser to see if I have written comments in the right places and ordered the photos correctly. I could use a HTML editor to do this, but why not try FRP with Haskell? :) Apparently I sort of use FRP at work so trying out Reflex wasn’t too much of a leap.

This post is adapted from the todo list and drag ‘n’ drop examples in this Reflex repo.

Let’s start with the types. My basic blob of data is an image (a URL), a comment about the image, and a flag to indicate if the image should appear in the final output:

> data Image = Image
>     { imageFileName :: T.Text   -- ^ e.g. "file:///tmp/cat.jpg"
>     , imageVisible  :: Bool     -- ^ Output in HTML render?
>     , imageRemark   :: T.Text   -- ^ Comment that goes before the image.
>     }
>     deriving (Eq, Ord, Show)

The images have to be rendered in a particular order, so we’ll use a Map

> Map Int a

where the integer keys provide the ordering and a is some type.

To toggle visibility in the final rendering, we flip imageVisible:

> toggleVisibility :: Int -> Map Int Image -> Map Int Image
> toggleVisibility k m = M.adjust f k m
>   where
>     f (Image x b c) = Image x (not b) c

We can set the description for an image:

> setDesc :: (Int, T.Text) -> Map Int Image -> Map Int Image
> setDesc (k, c) m = M.adjust f k m
>   where
>     f (Image x b _) = Image x b c

We can move the kth image up:

> moveUp :: Int -> Map Int Image -> Map Int Image
> moveUp 0 m = m
> moveUp k m
>   = let xs = M.elems m in
>     M.fromList $ zip [0..] $ take (k-1) xs ++ [xs !! k, xs !! (k-1)] ++ drop (k+1) xs
>     -- ^^^ Assumes contiguous keys!

and down:

> moveDown :: Int -> Map Int Image -> Map Int Image
> moveDown k m
>   | k == fst (M.findMax m) = m
>   | otherwise = let xs = M.elems m in
>       M.fromList $ zip [0..] $ take k xs ++ [xs !! (k+1), xs !! k] ++ drop (k+2) xs

It’s not efficient to completely rebuild the map by converting it to a list and back again, but this’ll do for now.

In terms of the user interface there are a few events to consider:

  • user toggles visibility of the kth image;
  • user moves the kth image up;
  • user moves the kth image down; and
  • user changes the comment text for the kth image.

We’ll put these four events into our own type. The first three are of type Event t Int where the Int is the key for the image in question. The last one has type Event t (Int, T.Text) since we need the key and the text that was entered into the textbox. In Reflex, the event type is Event.

> data ImageEvent t = ImageEvent
>     { evToggle  :: Event t Int
>     , evUp      :: Event t Int
>     , evDown    :: Event t Int
>     , evKey     :: Event t (Int, T.Text)
>     }

Next, imageW creates an unnumbered list of images, consisting of a text field indicating if the image will be visible; a text box for writing a comment; buttons to toggle visibility and move the image up and down; and finally the image itself.

> imageW
>     :: forall m t. (MonadWidget t m)
>     => Dynamic t (Map Int Image)
>     -> m (Dynamic t (Map Int (ImageEvent t)))
> imageW xs = elClass "ul" "list" $ listWithKey xs $ \k x -> elClass "li" "element" $ do
>     dynText $ fmap (T.pack . show . imageVisible) x
>     el "br" $ return ()
> 
>     let xEvent = imageRemark <$> uniqDyn x
> 
>     ti <- textInput $ textBoxAttrs & setValue .~ (updated xEvent)
> 
>     tEvent <- updated <$> return (zipDynWith (,) (constDyn k) (_textInput_value ti))
> 
>     el "br" $ return ()
> 
>     (visibleEvent, moveUpEvent, moveDownEvent) <- elClass "div" "my buttons" $ do
>                                                     visibleEvent  <- (fmap $ const k) <$> button "visible"
>                                                     moveUpEvent   <- (fmap $ const k) <$> button "up"
>                                                     moveDownEvent <- (fmap $ const k) <$> button "down"
>                                                     return (visibleEvent, moveUpEvent, moveDownEvent)
> 
>     elClass "p" "the image" $ elDynAttr "img" (fmap f x) (return ())
> 
>     return $ ImageEvent visibleEvent moveUpEvent moveDownEvent tEvent
> 
>   where
> 
>     f :: Image -> Map T.Text T.Text
>     f i = M.fromList
>             [ ("src",   imageFileName i)
>             , ("width", "500")
>             ]
> 
>     textBoxAttrs :: TextInputConfig t
>     textBoxAttrs = def { _textInputConfig_attributes = constDyn $ M.fromList [("size", "100")] }

To process the dynamic map we use listWithKey:

> listWithKey
>   :: forall t k v m a. (Ord k, MonadWidget t m)
>   => Dynamic t (Map k v)
>   -> (k -> Dynamic t v -> m a)
>   -> m (Dynamic t (Map k a))

Specialised to our usage, the type is:

> listWithKey
>   :: forall t m. (MonadWidget t m)
>   => Dynamic t (Map Int Image)
>   -> (Int -> Dynamic t Image -> m (ImageEvent t))
>   -> m (Dynamic t (Map Int (ImageEvent t)))

It’s like mapping over the elements of the dynamic input:

> listWithKey xs $ \k x -> ...

We use elClass to produce the elements on the page. For example the text attribute showing if the image is visible or not can be rendered using dynText:

>     dynText $ fmap (T.pack . show . imageVisible) x

We have an fmap since x :: Dynamic t Image and Dynamic has a Functor instance.

The image list and all the events are wrapped up in imageListW. Here’s the main part:

> imageListW
>     :: forall t m. MonadWidget t m
>     => Dynamic t T.Text
>     -> m ()
> imageListW dynDrop = do
>     let eventDrop = fmap const $ updated $ fmap parseDrop dynDrop :: Event t (MM Image -> MM Image)
> 
>     rec xs <- foldDyn ($) emptyMap $ mergeWith (.)
>             [ eventDrop
>             , switch . current $ toggles
>             , switch . current $ ups
>             , switch . current $ downs
>             , switch . current $ keys
>             ]
> 
>         bs <- imageW xs
> 
>         let toggles :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             ups     :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             downs   :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
>             keys    :: Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))
> 
>             toggles = (mergeWith (.) . map (fmap $ toggleVisibility) . map evToggle . M.elems) <$> bs
>             ups     = (mergeWith (.) . map (fmap $ moveUp)           . map evUp     . M.elems) <$> bs
>             downs   = (mergeWith (.) . map (fmap $ moveDown)         . map evDown   . M.elems) <$> bs
>             keys    = (mergeWith (.) . map (fmap $ setDesc)          . map evKey    . M.elems) <$> bs
> 
>     ta <- textArea $ (def :: TextAreaConfig t)
>                         { _textAreaConfig_setValue   = (T.concat . map rawHTML . M.elems) <$> updated xs
>                         , _textAreaConfig_attributes = taAttrs
>                         }
> 
>     return ()

Notice that toggles is used before it is defined! This is made possible by using the recursive do extension which provides the ability to do value recursion.

The key bit is the use of mergeWith that combines all of the events.

> mergeWith :: Reflex t => (a -> a -> a) -> [Event t a] -> Event t a

Here, mergeWith (.) will left-fold simultaneous events.

>     rec xs <- foldDyn ($) emptyMap $ mergeWith (.)
>             [ eventDrop
>             , switch . current $ toggles
>             , switch . current $ ups
>             , switch . current $ downs
>             , switch . current $ keys
>             ]

The toggles has type

> Dynamic t (Event t (M.Map Int Image -> M.Map Int Image))

so we use switch and current to get to an Event type:

ghci> :t switch
switch :: Reflex t => Behavior t (Event t a) -> Event t a

ghci> :t current
current :: Reflex t => Dynamic t a -> Behavior t a

ghci> :t switch . current
switch . current :: Reflex t => Dynamic t (Event t a) -> Event t a

This merge is also where we bring in the drag ‘n’ drop event via eventDrop which is how we get a list of images into the dynamic map.

Try it out

To try it out without setting up Reflex, grab gallery_editor.zip, unzip it, and open gallery_editor/gallery.jsexe/index.html in your browser. Drag some images onto the top area of the page using your file manager. Tested on Ubuntu 16.

Or, grab the source from Github.

Monadic Yesod form with ReCAPTCHA

2016-05-08

Applicative forms in Yesod are nifty but they don't let you customise layout, CSS, and so on. I had this form for comments on my blog:

commentFormOLD :: EntryId -> Form Comment
commentFormOLD entryId = renderDivs $ Comment
    <$> pure entryId
    <*> lift (liftIO getCurrentTime)
    <*> areq textField (fieldSettingsLabel MsgCommentName) Nothing
    <*> aopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
    <*> aopt urlField (fieldSettingsLabel MsgCommentUrl) Nothing
    <*> areq htmlField (fieldSettingsLabel MsgCommentText) Nothing
    <*> pure False <* recaptchaAForm

I wanted to tweak the layout so I had to convert it to a monadic form. The only quirk was that the second part of the return value of recaptchaMForm is of type [FieldView site], not FieldView site. It looks like the first element of the list does the job for rendering the Yesod-ReCAPTCHA widget. So we set

    let recapView0 = recapView DL.!! 0

and then write

                <p>
                    ^{fvInput recapView0}

in the whamlet.

Here's the full function. Note the fvId bits where we can specify the width and height. Also, to the form as a parameter to generateFormPost, we must have a parameter of type Html. So we put the EntryId at the front so that we can use Currying.

commentForm :: EntryId -> Html -> MForm Handler (FormResult Comment, Widget)
commentForm entryId extra = do
    (nameRes, nameView)     <- mreq textField  (fieldSettingsLabel MsgCommentName)  Nothing
    (emailRes, emailView)   <- mopt emailField (fieldSettingsLabel MsgCommentEmail) Nothing
    (urlRes, urlView)       <- mopt urlField   (fieldSettingsLabel MsgCommentUrl)   Nothing
    (textRes, textView)     <- mreq htmlField  (fieldSettingsLabel MsgCommentText)  Nothing

    (recapRes, recapView)   <- recaptchaMForm

    let recapView0 = recapView DL.!! 0

    now <- liftIO getCurrentTime

    let c = Comment
              <$> pure entryId
              <*> pure now
              <*> nameRes
              <*> emailRes
              <*> urlRes
              <*> textRes
              <*> pure False <* recapRes

    let widget = do
            toWidget
                [lucius|
                    ##{fvId nameView} {
                        width: 70ch;
                    }
                    ##{fvId emailView} {
                        width: 70ch;
                    }
                    ##{fvId urlView} {
                        width: 70ch;
                    }
                    ##{fvId textView} {
                        width: 120ch;
                        height: 120ch;
                    }
                |]
            [whamlet|
                #{extra}
                <p>
                    Name                 #
                <p>
                    ^{fvInput nameView}
                <p>
                    \ Email (not shown)  #
                <p>
                    ^{fvInput emailView}
                <p>
                    \ URL (optional)     #
                <p>
                    ^{fvInput urlView}
                <p>
                    \ Comment            #
                <p>
                    ^{fvInput textView}
                <p>
                    ^{fvInput recapView0}
            |]

    return (c, widget)


The diff starting from line 71 shows the change in Handler/Home.hs: ce76215f72cbcf748bef89bfa3b09a077ceb9ab9#diff-1d7a37a0e3408faaa1abf31093eb8a50L71.

Structured logging

2016-04-27

Up until now I have used basic log files of strings for applications that I have developed. But as the applications have become larger and been used by more people, dealing with huge verbose log files becomes a real time-waster.

In a blog post on structured logging, Gregory Szorc wrote about five major use cases of logging:

  1. Recording application errors (e.g. exceptions and stack traces).
  2. Low-level information to aid in debugging or human observation.
  3. Application monitoring (includes metrics and alerting).
  4. Business analytics (use the logged data to help make decisions).
  5. Security auditing.

For concreteness, consider a program that I developed for processes imaging datasets for cloud storage. It handles a pipeline for retrieving imaging datasets from MRI scanners, conversion to a format suitable for archival and further processing, and access control for staff in projects.

The five use cases correspond to these kinds of questions:

  1. Which users are encountering errors? (Some users quietly give up and don't tell you about errors!) Does a new imaging instrument provide data that is not supported? Produce a report listing users by number of errors, or by instrument, or by ...
  2. User X walks up to you and asks why they can't see their dataset titled "something something something". Find where their dataset is in the pipeline, and give a concrete estimate for when they will be able to see their data.
  3. How long does each dataset take to process? Where are the bottlenecks?
  4. How many datasets does department X process per week? What about department Y? Or user Z?
  5. When did user X get access to dataset Y?

Answering these questions is possible with logs containing plain strings, but it becomes difficult to run more complicated queries since regexes don't scale very well.

The idea with structured logging is to log JSON blobs or key/value pairs instead of plain strings, so a structured log might look like this:

2016-04-24-1722 [patient_id=12345] [action=started_processing]

2016-04-24-1723 [patient_id=12345] [action=identified] [user=uqbobsmith] [dataset_id=10000]
                                   [project_id=999] [nr_files=20000] [patient_name="john something doe"]

2016-04-24-1724 [patient_id=12345] [action=processing_subset] [user=uqbobsmith] [dataset_id=10000]
                                   [project_id=999] [subset_id=3434]

2016-04-24-1734 [patient_id=12345] [action=uploaded_subset] [user=uqbobsmith] [dataset_id=10000] [project_id=999]
                                   [subset_id=3434] [remote_id=77a70620-3f22-47b5-8da2-ba2c2100f001] [time=600]

2016-04-24-1730 [patient_id=12345] [action=error] [user=uqbobsmith] [dataset_id=10000] [project_id=999]
                                   [subset_id=3435] [error_string="Did not find header file"]

2016-04-24-1742 [patient_id=12345] [action=finished] [user=uqbobsmith] [dataset_id=10000] [project_id=999] [time=1200]

2016-04-24-1750 [patient_id=12345] [action=acl] [acl_action=add] [user=uqbobsmith] [dataset_id=10000] [project_id=999]

Now the five questions can be framed in terms of queries over the structured data:

  1. Find all entries with "action=error" and report on the field "user=...".
  2. Find the most recent log entries with user=uqbobsmith; filter by patient_name field; filter errors; or view "time" field to see how long each subset is taking to process.
  3. Plot the time-series of "time=..." values.
  4. Filter on "user=..." and "action=finished"; join with a table linking user IDs to departments; group by department names.
  5. Filter on "acl_action=add" and "user=uqbobsmith"; sort by timestamp.

As a proof of concept I knocked up django-struct-log. Since Postgresql has support for key-value pairs (including queries and indexes!) it is the logical choice for the storage of log items. The django-hstore package extends Django's ORM model to handle Postgresql key/value pairs. And django-rest-framework provides a REST API for programs to post log entries.

The Django model for log items is in models.py:

class LogItem(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    # e.g. 'rdiff-backup'
    name = models.CharField(max_length=200)

    # e.g. 'my-server-1'
    host = models.CharField(max_length=200)

    # e.g. 'carlo'
    user = models.CharField(max_length=200)

    # Maybe something nice for a graph or email subject line.
    # e.g. 'daily rdiff-backup of something'
    description = models.TextField()

    # key/value blob
    attributes = HStoreField()

We don't have to define any views, just a serializer and view-set for the model. This is in urls.py:

class LogItemSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = LogItem
        fields = ('name', 'host', 'user', 'description', 'attributes', 'created_at', 'updated_at',)

class LogItemViewSet(viewsets.ModelViewSet):
    authentication_classes = (SessionAuthentication, BasicAuthentication, TokenAuthentication)
    permission_classes = (IsAuthenticated,)

    queryset = LogItem.objects.all()
    serializer_class = LogItemSerializer

router = routers.DefaultRouter()
router.register(r'logitems', LogItemViewSet)

Surprisingly, that's pretty much it.

Using TokenAuthentication makes authorization easy (no username/passwords stored in plain text) and with the REST API we can post a log item using curl:

curl                                                                    \
    -H "Content-Type: application/json"                                 \
    -H "Authorization: Token 5a610d074e24692c9084e6c845da39acc0ee0002"  \
    -X POST                                                             \
    -d '{"name": "rdiff-backup", "host": "my-server-1", "description": "blah", "user": "carlo", "description": "daily rdiff-backup", "attributes": {"time_s": "1230"} }' \
    http://localhost:8000/logitems/

The response should be something like:

{"name":"rdiff-backup",
 "host":"my-server-1",
 "user":"carlo",
 "description":"daily rdiff-backup",
 "attributes":{"time_s":"1230"},
 "created_at":"2016-04-10T14:47:31.393234Z",
 "updated_at":"2016-04-10T14:47:31.393259Z"}

This means that almost anything can send data to the log server - shell scripts, Python scripts, Haskell programs, anything.

Pulling out data for plotting is easy using Django's ORM model. For example to get all the log items for "server1" with a "time_s" attribute:

data = LogItem.objects.filter(name='server1', attributes__has_key='time_s').all()
x = [z.created_at for z in data]
y = [float(z.attributes['time_s'])/60.0 for z in data] # minutes

Here is a sample script for a nightly report. It uses Matplotlib to draw a graph and the email library to format a HTML email.

import django
import os
import sys

from email.mime.text import MIMEText
from subprocess import Popen, PIPE
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.MIMEImage import MIMEImage

import matplotlib
matplotlib.use('Agg') # Force matplotlib to not use any Xwindows backend.
import matplotlib.pyplot as plt

import datetime as dt

sys.path.append("..")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "djangostructlog.settings")
import django
django.setup()

from structlog.models import LogItem
import numpy as np

def send_email(subject, body, plot_png):
    # adapted from http://code.activestate.com/recipes/473810/

    strFrom = 'carlo@example.com'
    strTo   = 'carlo@example.com'

    msgRoot = MIMEMultipart('related')
    msgRoot['Subject'] = subject
    msgRoot['From']    = strFrom
    msgRoot['To']      = strTo
    msgRoot.preamble   = 'This is a multi-part message in MIME format.'

    # Encapsulate the plain and HTML versions of the message body in an
    # 'alternative' part, so message agents can decide which they want to display.
    msgAlternative = MIMEMultipart('alternative')
    msgRoot.attach(msgAlternative)

    msgText = MIMEText('This is the alternative plain text message.')
    msgAlternative.attach(msgText)

    # We reference the image in the IMG SRC attribute by the ID we give it below
    msgText = MIMEText(body + '\n<br> <img src="cid:image1">', 'html')
    msgAlternative.attach(msgText)

    # This example assumes the image is in the current directory
    fp = open(plot_png, 'rb')
    msgImage = MIMEImage(fp.read())
    fp.close()

    # Define the image's ID as referenced above
    msgImage.add_header('Content-ID', '<image1>')
    msgRoot.attach(msgImage)

    p = Popen(["/usr/sbin/sendmail", "-t", "-oi"], stdin=PIPE)
    p.communicate(msgRoot.as_string())

def make_plot(n):
    data = LogItem.objects.filter(name=n, attributes__has_key='time_s').all()
    x = [z.created_at for z in data]
    y = [float(z.attributes['time_s'])/60.0 for z in data]

    for (a, b) in zip(x, y):
        print n, a, b

    fig, ax = plt.subplots()
    plt.plot_date(x, y)
    delta = dt.timedelta(days=2)
    ax.set_xlim(min(x).date() - delta, max(x).date() + delta)
    ax.set_ylim(0, max(y) + 20)

    ax.xaxis.set_ticks(x)

    start, end = ax.get_xlim()

    from matplotlib.dates import YEARLY, WEEKLY, DateFormatter, rrulewrapper, RRuleLocator, drange
    rule = rrulewrapper(WEEKLY)
    loc = RRuleLocator(rule)

    formatter = DateFormatter('%Y-%m-%d')
    ax.xaxis.set_major_locator(loc)
    ax.xaxis.set_major_formatter(formatter)

    from matplotlib.ticker import MultipleLocator, FormatStrFormatter
    from matplotlib.dates import DayLocator
    minorLocator = DayLocator()
    ax.xaxis.set_minor_locator(minorLocator)

    plt.xlabel('Date')
    plt.ylabel('Minutes')

    plt.savefig('foo.png') # FIXME use temp file instead!
    return 'foo.png'


#########################################################################

rdiff_server1_png = make_plot('rdiffbackup-server1')
send_email('rdiff-backup report - server1', '<p> Time to run rdiff-backup for server1</p>', rdiff_server1_png)

rdiff_server2_png = make_plot('rdiffbackup-server2')
send_email('rdiff-backup report - server2', '<p> Time to run rdiff-backup for server2</p>', rdiff_server2_png)

print 'done'

Sample plot:

For exploring the log items you can poke around in the Django admin interface, or use the django-rest-framework's endpoint:

Pointy edges

Links


Posts: RSS