Bookkeeping With Libcloud

11 Apr 2012

Apache Libcloud is a Python library that aims to provide a single interface for various cloud services, such as computing and storage. It has a pretty extensive list of providers, and can mitigate vendor lock-in for common tasks such as creating, rebooting, and destroying computing instances.

Perhaps one of the lesser-known superpowers of libcloud is the inclusion of instance costs for many of the providers. These numbers are not provided by the APIs themselves but are built into the code, and the committers do their best to keep them up to date.

With this in mind, we can render some neat 10,000-foot views.

Motivation and Caveats

“You can’t do much without measuring.” - Coda Hale

Single-click provisioning and auto-scaling (for varying definitions) are becoming easy, if not commonplace, with today’s tools. In this fast-paced world, an exact measure of usage is a powerful and important metric to have, and better yet, to have automated.

Imagine a homegrown Google PowerMeter for your EC2 farm or Linode Hadoop cluster. Graphite immediately comes to mind for receiving, storing, and displaying data. If Graphite is not to your taste, substitute your favorite datastore and couple it with an in-browser visualization framework such as d3.js or Crossfilter; the possibilities are plentiful.

What we can glean from libcloud is a base number: a minimum cost. Additional considerations such as bandwidth, add-on services such as load balancers, and special pricing such as EC2’s Spot Instances fall outside the scope of our basic calculations.

A Simple Example

We can generate a table of monthly costs, broken down both by instance size and by individual server. The provider in this example will be Rackspace Cloud Servers. Hourly costs will be extrapolated to per-month costs; if you prefer more granularity, you can tweak the multiplier to calculate a per-day cost, for example.
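As a minimal sketch of that extrapolation, the arithmetic is just an hourly price times a number of hours. The 730.484 figure is roughly the number of hours in an average month (a year of about 365.24 days, divided by 12). The hourly rate and the extrapolate helper here are hypothetical, chosen only to illustrate the multiplier.

```python
HOURS_PER_MONTH = 730.484  # average hours in a month
HOURS_PER_DAY = 24


def extrapolate(hourly_price, hours):
    """Scale an hourly price to a longer billing period."""
    return hourly_price * hours


# Hypothetical hourly rate, chosen only for illustration
hourly_price = 0.015

per_month = extrapolate(hourly_price, HOURS_PER_MONTH)
per_day = extrapolate(hourly_price, HOURS_PER_DAY)

print('per month: ${:,.2f}'.format(per_month))
print('per day:   ${:,.2f}'.format(per_day))
```

Swapping the multiplier is all it takes to change the reporting period.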

To get started, simply install libcloud and then obtain your API key via the Control Panel.

First, the code (gist here):

#!/usr/bin/env python

import locale
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

RACKSPACE_USER = 'YOUR_USERNAME_HERE'
RACKSPACE_KEY = 'YOUR_API_KEY_HERE'
HOURS_PER_MONTH = 730.484 # according to magical google calculator
DELIMITER = '\t'

# helper to convert items in list to str, and join with DELIMITER
def assemble_line(words):
    return DELIMITER.join([str(word) for word in words])

def format_price(price):
    # for US-style dollar amounts (the locale name can vary by system)
    locale.setlocale(locale.LC_ALL, 'en_US')
    return '${}'.format(locale.format_string('%0.2f', price, grouping=True))

def main():
    Driver = get_driver(Provider.RACKSPACE)
    conn = Driver(RACKSPACE_USER, RACKSPACE_KEY)

    nodes = conn.list_nodes()
    sizes = conn.list_sizes()

    nodes_by_size = {}
    summary = []
    output = []

    # sort nodes by name
    nodes.sort(key=lambda node: node.name)

    for node in nodes:
        size = [s for s in sizes if s.id == node.extra['flavorId']][0]
        size_name = str(size.ram)

        if size in nodes_by_size:
            nodes_by_size[size].append(node)
        else:
            nodes_by_size[size] = [node]

        month_price = size.price * HOURS_PER_MONTH
        output.append((node.name, size_name, format_price(month_price)))

    # calculate total monthly price, broken down by size
    for size, size_nodes in nodes_by_size.items():
        num_nodes = len(size_nodes)
        month_price = size.price * HOURS_PER_MONTH
        nodes_total_price = month_price * num_nodes

        summary.append((size.ram,                         # size
                        num_nodes,                        # number of nodes
                        format_price(month_price),        # price per node per month
                        format_price(nodes_total_price))) # total price per month

    total_num_nodes = sum(len(nodes) for nodes in nodes_by_size.values())
    total_price = sum(size.price * HOURS_PER_MONTH * len(nodes)
                      for size, nodes
                      in nodes_by_size.items())

    summary.append(('',
                    total_num_nodes,
                    '',
                    format_price(total_price)))

    # output
    print('Summary')
    print(assemble_line(('Size',
                         'Nodes',
                         '$/Nodes/Month',
                         '$/Month')))
    print('\n'.join([assemble_line(line) for line in summary]))
    print('')
    print('Inventory')
    print('\n'.join([assemble_line(line) for line in output]))

if __name__ == '__main__':
    main()

We begin by getting the Provider.RACKSPACE driver, then making a connection with our username and API key, and finally, listing our existing nodes and the available sizes.

From there we loop through all the existing nodes and populate the dict nodes_by_size where the key is the instance size, and the values are lists of the instances. Note that the size of the instance is determined by a semi-cryptic list comprehension:

    size = [s for s in sizes if s.id == node.extra['flavorId']][0]

This searches through the list of sizes for the one that matches the node’s size, which is buried in the flavorId value of node.extra. This key is specific to Rackspace; other drivers record the size under a different key in node.extra (the EC2 driver, for example, stores an instance type rather than a flavorId).
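If the list comprehension feels too cryptic, one alternative is to index the sizes by id once and look each node up in that dict. The sketch below uses hypothetical namedtuple stand-ins for libcloud’s size and node objects, with made-up prices, so it runs without an account; size_for and its key parameter are illustrative names, not part of libcloud. It assumes, as the comparison above does, that the size id and the flavorId value have the same type.

```python
from collections import namedtuple

# Hypothetical stand-ins for libcloud's size and node objects
Size = namedtuple('Size', ['id', 'ram', 'price'])
Node = namedtuple('Node', ['name', 'extra'])

sizes = [Size('1', 256, 0.015), Size('2', 512, 0.03)]
nodes = [Node('kif1', {'flavorId': '1'}), Node('flexo0', {'flavorId': '2'})]

# Build the index once instead of scanning the size list for every node
sizes_by_id = {size.id: size for size in sizes}


def size_for(node, key='flavorId'):
    """Return the node's size, or None if it is not in the size list."""
    return sizes_by_id.get(node.extra.get(key))


for node in nodes:
    size = size_for(node)
    print(node.name, size.ram if size else 'unknown')
```

Unlike indexing the comprehension with [0], this returns None rather than raising IndexError when a node’s flavor is missing from the size list, and the key argument leaves room for providers that use a different name.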

We also populate a list called output which will have the node’s name, its instance size (by way of RAM), and the extrapolated monthly cost.

Next, we loop through the nodes_by_size dict and do a simple rollup, saving a breakdown of the different instance sizes, how many per category, and the cost breakdown.
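The grouping and rollup can also be sketched on their own with collections.defaultdict, which removes the need for the if/else when populating the dict. The Server records and their prices below are hypothetical sample data, and the sketch assumes every server of a given RAM size shares one hourly price.

```python
from collections import defaultdict, namedtuple

HOURS_PER_MONTH = 730.484

# Hypothetical (name, ram, hourly price) records, for illustration only
Server = namedtuple('Server', ['name', 'ram', 'price'])
servers = [
    Server('kif1', 256, 0.015),
    Server('kif2', 256, 0.015),
    Server('flexo0', 512, 0.03),
]

# Group servers by RAM size; missing keys start out as empty lists
servers_by_ram = defaultdict(list)
for server in servers:
    servers_by_ram[server.ram].append(server)

# One row per size: (ram, node count, price per node, price for the group)
rollup = []
for ram in sorted(servers_by_ram):
    group = servers_by_ram[ram]
    month_price = group[0].price * HOURS_PER_MONTH
    rollup.append((ram, len(group), month_price, month_price * len(group)))

for ram, count, each, total in rollup:
    print('{}\t{}\t${:,.2f}\t${:,.2f}'.format(ram, count, each, total))
```

Iterating over sorted keys also keeps the summary rows in a stable order, smallest size first.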

At the very end, we print out summary and output.

The Result

The tabulated result is pretty simple, and by changing DELIMITER to a comma, you can have a poor man’s CSV.
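One wrinkle with the poor man’s approach: format_price groups thousands with commas, so any amount of $1,000 or more would split into two columns. A hedged sketch of a sturdier route is Python’s csv module, which quotes such fields. The rows below are hypothetical sample data in the same shape as the summary list, and the sketch assumes Python 3 (io.StringIO with the csv module).

```python
import csv
import io

# Hypothetical rows in the same shape as the script's summary list;
# the last amount is made up to show a value containing a comma
rows = [
    ('Size', 'Nodes', '$/Nodes/Month', '$/Month'),
    (256, 2, '$10.96', '$21.91'),
    (512, 50, '$21.91', '$1,095.73'),
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)  # fields containing the delimiter get quoted

csv_text = buffer.getvalue()
print(csv_text)
```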

Summary
Size    Nodes   $/Nodes/Month   $/Month
256     2       $10.96          $21.91
512     5       $21.91          $109.57
        7                       $131.48

Inventory
alkazar0        512     $21.91
flexo0          512     $21.91
kif1            256     $10.96
kif2            256     $10.96
leela6          512     $21.91
nibbler3        512     $21.91
nibbler4        512     $21.91

Extra Credit

Since we’re in Python, one could use the Google Data API Python Client to directly upload the TSV to Google Docs. For a more UNIX-y approach, GoogleCL is a command line utility that uses said API.

Thus, after the initial OAuth setup for GoogleCL, one can simply do:

$ ./bookkeeping.py > inventory.tsv && \
  google docs upload inventory.tsv

Your Mileage May Vary

Infrastructures come in many shapes and sizes with varying levels of flux. Measuring costs is hardly a glamorous or fun task, but it can reap long-term benefits and lead to smarter decision-making if done properly.