Structured logging to AWS ElasticSearch

2024-03-13 These days we have an excellent ecosystem with Grafana Loki, Promtail, Tempo, and friends

A while ago I wrote about how to set up a structured logging service using PostgreSQL. AWS now makes it possible to have the same functionality (plus more) in the “serverless” style. For background on the idea of serverless architecture, watch this talk: GOTO 2017 • Serverless: the Future of Software Architecture • Peter Sbarski. Parts of this post are based on this guide on serverless AWS lambda elasticsearch and kibana.

First, create an Amazon Elasticsearch Service Domain. I used the smallest instance size since it’s just for personal use. Full docs are here.

For programmatic access control, create an AWS IAM user and make a note of its “arn” identifier, e.g. arn:aws:iam::000000000000:user/myiamuser. Then add an access policy as follows. We also add access to our IP address for the kibana interface. I made an ElasticSearch domain called “logs”; see the Resource field below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::000000000000:user/myiamuser"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:ap-southeast-1:000000000000:domain/logs/*"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:ap-southeast-1:000000000000:domain/logs/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": "xxx.xxx.xxx.xxx"
        }
      }
    }
  ]
}

To post to the ElasticSearch instance we use requests-aws4auth:

sudo pip install requests-aws4auth

Then we can post a document, a json blob, using the following script. Set the host, region, AWS key, and AWS secret key. This script saves the system temperature under an index system-stats with the ISO date attached.

import datetime

from elasticsearch import Elasticsearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

HOST        = 'search-logs-xxxxxxxxxxxxxxxxxxxxxxxxxx.ap-southeast-1.es.amazonaws.com' # see 'Endpoint' in ES status page
REGION      = 'ap-southeast-1' # choose the correct region
AWS_KEY     = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET  = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'

def get_temp():
    return 42.0 # actually read from 'sensors' or similar

if __name__ == '__main__':
    now  = datetime.datetime.now()
    date = now.date().isoformat()

    doc = {'host': 'blah', 'temperature': get_temp(), 'datetime': now.isoformat()}

    awsauth = AWS4Auth(AWS_KEY, AWS_SECRET, REGION, 'es')

    es = Elasticsearch(
            hosts=[{'host': HOST, 'port': 443}],
            http_auth=awsauth, use_ssl=True, verify_certs=True,
            connection_class=RequestsHttpConnection)

    _index = 'system-stats-' + date
    _type  = 'temperature'
    print doc
    print es.index(index=_index, doc_type=_type, body=doc)

To query data we use elasticsearch-dsl.

sudo pip install elasticsearch-dsl

from elasticsearch import Elasticsearch
from elasticsearch import RequestsHttpConnection
from requests_aws4auth import AWS4Auth

from elasticsearch_dsl import Search

from datetime import datetime

HOST        = 'search-logs-xxxxxxxxxxxxxxxxxxxxxxxxxx.ap-southeast-1.es.amazonaws.com' # see 'Endpoint' in ES status page
REGION      = 'ap-southeast-1' # choose the correct region
AWS_KEY     = 'XXXXXXXXXXXXXXXXXXXX'
AWS_SECRET  = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'

awsauth = AWS4Auth(AWS_KEY, AWS_SECRET, REGION, 'es')

client = Elasticsearch(
            hosts=[{'host': HOST, 'port': 443}],
            http_auth=awsauth, use_ssl=True, verify_certs=True,
            connection_class=RequestsHttpConnection)

plot_date      = '2017-08-06'
monitored_host = 'blah'

s = Search(using=client, index='system-stats-' + plot_date) \
       .query('match', host=monitored_host)

response = s.execute()

xy = [(datetime.strptime(hit.datetime, '%Y-%m-%dT%H:%M:%S.%f'), hit.temperature) for hit in response]
xy = sorted(xy, key=lambda z: z[0])

for (x, y) in xy:
    print(x,y)

Sample output:

$ python3 dump.py
2017-08-06 04:00:02.337370 32.0
2017-08-06 05:00:01.779796 37.0
2017-08-06 07:00:01.789370 37.0
2017-08-06 11:00:01.711586 40.0
2017-08-06 12:00:02.054906 42.0
2017-08-06 16:00:02.075869 44.0
2017-08-06 18:00:01.619764 43.0
2017-08-06 19:00:02.319470 38.0
2017-08-06 20:00:03.098032 43.0
2017-08-06 22:00:03.629017 43.0

For exploring the data you can also use kibana, which is included with the ElasticSearch service from AWS.

Another nifty thing about the AWS infrastructure is that you can use Lambda to create ElasticSearch entries when objects drop in an S3 bucket. More details in this post.