Keeping an Amazon Elastic Compute Cloud (EC2) Instance Up with Chef and Auto Scaling – Omnibus Update

Friday, Feb. 8th 2013

This post details how to use the Chef Omnibus chef-client install with an AWS Auto Scaling group. This allows an Ubuntu Precise image to bootstrap with Chef and then follow the run list you provide.
In 2011, I wrote a version of this which installs Ruby and the required dependencies. Recently, this started failing for me because of dependency conflicts with net-ssh. For a year now, Opscode has been providing an omnibus version of Chef which installs the exact versions of the dependencies it needs.
The procedure is the same as in the original post, but the user data is different. Note also that this version pins the Chef version to 10.18.2. This is so your setup doesn’t fail when it starts an instance with the very latest version of Chef, which may turn out to be incompatible with your cookbooks. Keeping the chef-client version pinned is a very good idea.
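The post does not include the omnibus user data itself. The following is a minimal sketch of how its first lines might be generated with the version pinned. It assumes Opscode's installer URL of the time and the installer's -v option; the rest of the user data (client.rb, validation key and the first chef-client run) would presumably follow these lines.

```ruby
# Pinned version from the post; a new instance should never get "latest".
CHEF_VERSION = "10.18.2"
# Assumption: Opscode's omnibus installer location as of early 2013.
INSTALLER_URL = "https://www.opscode.com/chef/install.sh"

# Return the opening lines of a user-data script that installs exactly
# `version` of chef-client through the omnibus installer's -v option.
def omnibus_user_data(version = CHEF_VERSION)
  <<~SCRIPT
    #!/bin/bash
    exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
    curl -L #{INSTALLER_URL} | bash -s -- -v #{version}
  SCRIPT
end

puts omnibus_user_data
```

Bumping Chef then becomes a deliberate change to CHEF_VERSION rather than something that happens whenever Auto Scaling launches a new instance.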

Posted by Edward Sargisson | in Uncategorized | No Comments »

Converting Jazz IBM Rational Team Concert (RTC) Work Items to GitHub Issues

Thursday, Feb. 2nd 2012

I have recently been converting my source code control and work item management from IBM Rational Team Concert (Jazz) to GitHub.

I originally used Jazz because I’d heard about it when I worked for IBM New Zealand and thought it was a very good idea. The source code control is nice, and the integration with work items and their flow works well. However, I now feel I’m in a bit of a backwater. Sharing code such as Chef cookbooks is hard; with git I can pull down the upstream changes and merge them with what I’m doing. We’re starting to use git at my current employer and I want to build up some experience.

I needed a way to get my work items from Jazz into GitHub’s issue system, so I wrote a Ruby script to do just that. This script may be a useful starting point if you need to access Jazz’s REST API or do the same conversion I did.
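The script itself is not reproduced here. As a rough illustration of the conversion step only, the sketch below maps an already-parsed work item onto the JSON body for GitHub's create-issue endpoint. The WorkItem fields, the closed-state names and the label scheme are all assumptions, not Jazz's actual field names.

```ruby
require "json"

# Hypothetical, already-parsed Jazz work item. The real field names depend on
# your RTC version and on how you query its REST API.
WorkItem = Struct.new(:id, :title, :description, :type, :state, keyword_init: true)

# Assumption: which RTC states count as "done"; adjust to your process template.
CLOSED_STATES = %w[Resolved Closed Done].freeze

# Body for GitHub's create-issue call (POST /repos/{owner}/{repo}/issues),
# which accepts title, body and labels.
def github_issue_payload(item)
  {
    "title" => item.title,
    "body" => "#{item.description}\n\nImported from Jazz work item #{item.id}",
    "labels" => [item.type.downcase]
  }
end

# GitHub creates issues open; finished items need a follow-up
# PATCH /repos/{owner}/{repo}/issues/{number} with {"state": "closed"}.
def needs_closing?(item)
  CLOSED_STATES.include?(item.state)
end

item = WorkItem.new(id: 42, title: "Fix map tiles", description: "Tiles missing at zoom 15",
                    type: "Defect", state: "Resolved")
puts JSON.pretty_generate(github_issue_payload(item))
```

Keeping the Jazz ID in the issue body makes it possible to trace each GitHub issue back to its original work item after the migration.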


Posted by Edward Sargisson | in Uncategorized | No Comments »

Amazon EC2 Backup: Purging old snapshots

Thursday, Oct. 27th 2011

In my post on Keeping an EC2 Instance up with Chef and Auto-Scaling I use regular snapshots for backups. Over time these snapshots add up and will start costing you money. Amazon also has a limit on the number of snapshots you can have.

I found a script that looked useful: ec2-manage-snapshots. This script takes a volume ID as a parameter and then applies a scheme where you specify which snapshots to keep based on their age. For example, you can keep every snapshot from the last 48 hours, one per day for the last week and one per month for every month before that.
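To make the scheme concrete, here is a sketch of such a retention rule in Ruby. It is not the ec2-manage-snapshots code; it assumes the snapshot start times are available as Time objects and keeps the newest snapshot in each day or month bucket.

```ruby
HOUR = 3600
DAY = 24 * HOUR

# Illustrative retention policy modelled on the scheme described above:
#   * every snapshot from the last 48 hours
#   * the newest snapshot of each calendar day for the last week
#   * the newest snapshot of each calendar month before that
def snapshots_to_keep(times, now)
  kept = []
  seen = {}                          # bucket key => already kept one
  times.sort.reverse_each do |t|     # newest first, so each bucket keeps its newest
    age = now - t                    # seconds (Float)
    if age <= 48 * HOUR
      kept << t
      next
    end
    bucket = age <= 7 * DAY ? t.strftime("day:%Y-%m-%d") : t.strftime("month:%Y-%m")
    next if seen[bucket]
    seen[bucket] = true
    kept << t
  end
  kept
end
```

Everything not returned would be a candidate for deletion.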

When a new instance is required, my system creates a new volume from the snapshot. However, this means that I can have many snapshots of the same data that are not all of the same volume. In EC2 you can tag snapshots with arbitrary key-value pairs. I modified my backup scripts to apply tags to every snapshot, and then modified the script above to use those tags to find the snapshots. The author kindly accepted my change, so you can now use it too.

For example, I have the following in osm_db_snapshot.sh. It backs up my database and tags the snapshot.
In my crontab I have:

This system has dropped my snapshot count from over 500 to around 100, and my costs with it!
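A sketch of the tag-based lookup, assuming snapshot records that carry their source volume ID and tags (EC2's DescribeSnapshots reports both). The tag key "backup" and all IDs are hypothetical. Selecting by tag gathers the snapshots taken from different volumes into one backup set, where selecting by volume ID would split them apart.

```ruby
# Illustrative snapshot records; the field names are not the EC2 API's.
Snapshot = Struct.new(:id, :volume_id, :tags, keyword_init: true)

# Select one logical backup set by tag, regardless of source volume.
def backup_set(snapshots, key, value)
  snapshots.select { |s| s.tags[key] == value }
end

snaps = [
  Snapshot.new(id: "snap-1", volume_id: "vol-a", tags: { "backup" => "osm_db" }),
  Snapshot.new(id: "snap-2", volume_id: "vol-b", tags: { "backup" => "osm_db" }),
  Snapshot.new(id: "snap-3", volume_id: "vol-b", tags: { "backup" => "blog_db" })
]
p backup_set(snaps, "backup", "osm_db").map(&:id)
```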

Posted by Edward Sargisson | in amazon-ec2 | No Comments »

Using satellite or mini sites for SEO – a bad idea

Thursday, Sep. 22nd 2011

I recently heard Patrick McKenzie on the Stack Exchange podcast extolling the virtues of exact match domains. Google gives a small boost in the page rankings to sites whose domain name, without hyphens, exactly matches the user’s query. My idea was to have exact match domains for the two trails that get the most traffic, in the hope of getting more visitors. The idea didn’t work out.

As an experiment, I set up www.thegrousegrind.com and www.badenpowelltrail.com. The links on my main site 301 redirected straight to those sites. This was important because Google penalizes duplicate content, and a 301 redirect tells Google that the content has moved permanently.
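A 301 is just a status code plus a Location header. Below is a minimal Rack-style sketch of a catch-all permanent redirect; the same mechanism serves both directions in this post (main site to satellite domain during the experiment, satellite back to main site afterwards). The host and helper names are illustrative, and TrailHunger itself runs on Tomcat, not Rack.

```ruby
# Minimal Rack-style application: any object whose call(env) returns
# [status, headers, body] can be mounted by a Rack server.
def permanent_redirect_app(target_host)
  lambda do |env|
    location = "http://#{target_host}#{env['PATH_INFO']}"
    query = env["QUERY_STRING"].to_s
    location += "?#{query}" unless query.empty?   # keep any query string
    [301, { "Location" => location, "Content-Type" => "text/plain" }, ["Moved Permanently\n"]]
  end
end

app = permanent_redirect_app("www.trailhunger.com")
status, headers, _body = app.call("PATH_INFO" => "/trails/", "QUERY_STRING" => "")
puts "#{status} #{headers['Location']}"
```

Preserving the path means each old URL maps to its equivalent page rather than to the home page.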

A few days after I submitted these URLs to Google I saw a very nice spike in traffic where I got ~150 visits more than normal. Then the traffic went away.

I then realised that I’d screwed the pages up and there were 404 Not Found errors everywhere. I fixed those and resubmitted to Google. Then I saw that Google was finding my content on a shorter URL: requests to /trails/blah/blah/trail/ resolve just as well as /trail/blah/blah/blah/trail/. I fixed that by detecting when a more specific URL should be used and redirecting to it.
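A sketch of the canonical-URL check, assuming each trail has exactly one canonical path built from hypothetical region, area and slug fields. The post does not show the real TrailHunger URL scheme, so Trail and canonical_path are illustrative only.

```ruby
# Hypothetical trail record; the real URL scheme is not shown in the post.
Trail = Struct.new(:region, :area, :slug)

# Build the one URL that should serve this trail.
def canonical_path(trail)
  "/trails/#{trail.region}/#{trail.area}/#{trail.slug}/"
end

# Return [301, location] when the request reached the trail through any other
# URL (for example a shorter one), or nil when it already used the canonical path.
def canonical_redirect(request_path, trail)
  target = canonical_path(trail)
  request_path == target ? nil : [301, target]
end
```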

I then used a trial of SEOmoz (great site!) to see what happened to my keyword rankings after a week. I wanted to see how my exact match domains compared with my main domain.

Rankings after one week (each cell lists Google CA / Bing en-CA / Yahoo CA; >50 means not in the top 50; n/a means no result was recorded):

keyword | Mini domain | Main domain
baden powell map | >50 / 5 / >50 | n/a
baden powell trail | >50 / 15 / 12 | >50 / >50 / >50
baden powell trail map | >50 / 12 / 15 | 35 / >50 / >50
baden-powell trail map | >50 / 12 / 21 | n/a
grouse grind | >50 / 30 / >50 | >50 / >50 / >50
grouse grind map | 46 / 12 / 15 | n/a
grouse grind trail map | >50 / 6 / 14 | 32 / 45 / >50
the grouse grind | >50 / 30 / >50 | n/a
the grouse grind trail map | >50 / 6 / 14 | 34 / 46 / >50

Based on this week’s keyword ranking results above, I’ve basically decided that the experiment with exact match domains (TheGrouseGrind.com and BadenPowellTrail.com) was a bad one.

Conclusions

BadenPowellTrail.com does not rank anywhere in the top 50 on Google for various combinations of trail and map, although it does rank as high as 5 on Bing. TheGrouseGrind.com reaches 46 for ‘grouse grind map’ on Google and 6 for ‘grouse grind trail map’ on Bing. For comparison, TrailHunger.com ranks 32 on Google for ‘grouse grind trail map’ and 35 for ‘baden powell trail map’.

I did some research on the SEOmoz Q&A site and found that opinion there was against the idea.

So I’ve now switched the satellite sites off. The domains now 301 Redirect back to the main site and I’ve told Google to go crawl them. Bing and Yahoo can come and crawl in their own time.

Posted by Edward Sargisson | in Uncategorized | No Comments »

Amazon Web Services: New features to help reliability and scaling

Sunday, Jul. 31st 2011

On July 26, Amazon announced new features which should be useful for keeping a service running on Amazon EC2.

Firstly, you can now use Amazon SNS to be informed directly when Auto Scaling decides to start or stop an instance. In my post on Keeping an Amazon EC2 Instance Up with Chef and Auto Scaling, I recommend using CloudWatch to monitor the Healthy Host Count so that you are warned when Auto Scaling should have started a new instance. Now we can be informed directly when this happens, which is far more reliable.
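If the SNS topic delivers to an HTTP endpoint or a script rather than straight to email, the Auto Scaling message (the Message field of the SNS notification) is JSON. A minimal sketch that turns it into a one-line alert, reading only the Event, AutoScalingGroupName and EC2InstanceId fields; the sample instance ID is invented.

```ruby
require "json"

# Auto Scaling event types worth alerting on, mapped to readable verbs.
EVENT_ACTIONS = {
  "autoscaling:EC2_INSTANCE_LAUNCH" => "launched",
  "autoscaling:EC2_INSTANCE_TERMINATE" => "terminated",
  "autoscaling:EC2_INSTANCE_LAUNCH_ERROR" => "failed to launch",
  "autoscaling:EC2_INSTANCE_TERMINATE_ERROR" => "failed to terminate"
}.freeze

# Summarize one Auto Scaling notification, or return nil for events we
# do not alert on (for example autoscaling:TEST_NOTIFICATION).
def summarize_scaling_event(message_json)
  msg = JSON.parse(message_json)
  action = EVENT_ACTIONS[msg["Event"]]
  return nil unless action
  "#{msg['AutoScalingGroupName']}: instance #{msg['EC2InstanceId']} #{action}"
end

sample = JSON.generate("Event" => "autoscaling:EC2_INSTANCE_LAUNCH",
                       "AutoScalingGroupName" => "qa-asg",
                       "EC2InstanceId" => "i-0123abcd")
puts summarize_scaling_event(sample)
```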

Secondly, you can now use the AddToLoadBalancerPolicy to affirmatively tell the Elastic Load Balancer that an instance is ready for traffic. Previously we had to set a grace period and hope that the instance was ready in time.

Lastly, there is now an official AWS SDK for Ruby (http://aws.amazon.com/about-aws/whats-new/2011/07/14/announcing-aws-sdk-for-ruby/).

I have yet to try any of these features out, but I’m very excited to.

Posted by Edward Sargisson | in amazon-ec2 | No Comments »

Keeping an Amazon Elastic Compute Cloud (EC2) Instance Up with Chef and Auto Scaling

Saturday, May. 28th 2011

Update 2012-04-10: Added instructions for setting the auto scaling group to explicitly notify when instances launch or terminate.
Update 2011-07-30: Amazon has announced some new features which should make the system created by the instructions in this post more reliable.
Update 2013-01-08: I now have a version of this which installs the Chef omnibus installation.

I use a single instance on Amazon EC2 for TrailHunger.com. I recently had an outage where that instance stopped responding to web requests. Unfortunately, I was at my day job and so could do nothing about it.

I asked a question on the Chef mailing list about approaches to fixing this and got a number of recommendations to use Amazon Auto Scaling with minimum and desired instances of 1. This blog post details how to do this.

The architecture of this solution is:

  • an Elastic Load Balancer serving both HTTP and HTTPS
  • an Auto Scaling Group using the Elastic Load Balancer health check with minimum and desired instances set to 1
  • a Launch Config using the default Ubuntu 10.10 Maverick AMI, with the user data set to a script which installs the Chef client and starts it up.

The load balancer is given access to your site’s certificate so the HTTPS connections terminate there. The load balancer then sends the traffic to your instances in plaintext HTTP.
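Because the load balancer terminates HTTPS, the application only ever sees plain HTTP and cannot tell from the connection whether the visitor used HTTPS. Elastic Load Balancing passes the original protocol in the X-Forwarded-Proto header. A minimal Rack-style sketch of reading it; TrailHunger runs on Tomcat, where a servlet would read the same header, and the helper names here are hypothetical.

```ruby
# Return the scheme the browser originally used. ELB records it in the
# X-Forwarded-Proto header, which Rack exposes as HTTP_X_FORWARDED_PROTO.
def original_scheme(env)
  forwarded = env["HTTP_X_FORWARDED_PROTO"].to_s.strip.downcase
  return forwarded unless forwarded.empty?
  env.fetch("rack.url_scheme", "http")   # no proxy header: use the direct scheme
end

# True when the visitor's connection to the load balancer was HTTPS.
def secure_request?(env)
  original_scheme(env) == "https"
end
```

Only trust this header when the instance's security group allows web traffic from the load balancer alone; otherwise a client could send the header itself.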

Steps

  1. Ensure your server certificate is in PEM format and convert it if necessary.
  2. Follow the steps in the Elastic Load Balancing Getting Started Guide. However, on the Create a New Load Balancer page add a Secure HTTP Server. On the next page you are given the choice to upload a new certificate or select an existing one. Upload your certificate. In my case, I used the iam-servercertupload command, documented in the AWS Identity and Access Management CLI Reference. Do not add any instances.
  3. Save the following user data script, using Unix line endings, and edit it to add your validation key and run list. The original of this script is from Avishai Ish-Shalom and is published here with kind permission; the original can be found on Avishai’s site. I modified this version to log its output to /var/log/user-data.log, create the /var subdirectories individually, set the node name to the EC2 instance ID to match knife bootstrap, and not start the chef client in daemon mode so that the base recipe can do this for me.
  4. #!/bin/bash
    # Log everything this script does to /var/log/user-data.log and the console
    exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
    apt-get update
    APT_GET="env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get -q"
    # Replace the stock Ruby 1.8 with Ruby 1.9.1, plus a compiler for native gems
    $APT_GET -y remove 'ruby1.8*'
    $APT_GET -y install ruby1.9.1 ruby1.9.1-dev libruby1.9.1
    $APT_GET -y install build-essential
    ln -sf gem1.9.1 /usr/bin/gem
    gem install --no-rdoc --no-ri chef
    ln -sf ruby1.9.1 /usr/bin/ruby
    # Directories referenced by client.rb below
    mkdir -p /var/log/chef
    mkdir -p /var/backups/chef
    mkdir -p /var/run/chef
    mkdir -p /var/cache/chef
    mkdir -p /var/lib/chef
    mkdir -p /etc/chef
    ln -s /var/lib/gems/1.9.1/bin/chef-client /usr/bin/chef-client
    # Initial run list; edit to suit
    cat - >/etc/chef/bootstrap.json <<EOF
    {
    "run_list": [
    "role[base]"
    ],
    "default_attributes": {
    },
    "override_attributes": {
    }
    }
    EOF
    # Replace VALIDATION CLIENT NAME and CHEF SERVER URL. The backticks are
    # expanded when this file is written, so node_name becomes the EC2 instance ID.
    cat - >/etc/chef/client.rb <<EOF
    log_level          :info
    log_location       "/var/log/chef/client.log"
    ssl_verify_mode    :verify_none
    validation_client_name "VALIDATION CLIENT NAME"
    validation_key         "/etc/chef/validation.pem"
    client_key               "/etc/chef/client.pem"
    chef_server_url    "CHEF SERVER URL"
    file_cache_path    "/var/cache/chef"
    file_backup_path  "/var/backups/chef"
    pid_file           "/var/run/chef/client.pid"
    node_name       "`curl -s http://169.254.169.254/latest/meta-data/instance-id`"
    Chef::Log::Formatter.show_time = true
    EOF
    # Paste your validation key between these two lines
    cat - >/etc/chef/validation.pem <<EOF
    -----BEGIN RSA PRIVATE KEY-----
    -----END RSA PRIVATE KEY-----
    EOF
    # Run chef-client once (not as a daemon) in the prod environment
    /usr/bin/chef-client -j /etc/chef/bootstrap.json -E prod
  5. Download and install:
  6. Develop your chef recipes to ensure your server can start without manual intervention.
    I used the following script to start my server from the command line: 

    set EC2_HOME=C:\Program Files\ec2-api-tools-1.4.2.4
    set JAVA_HOME=C:\Program Files\Java\jre1.6.0_07
    call "C:\Program Files\ec2-api-tools-1.4.2.4\bin\ec2-run-instances" ami-a6f504cf -g "TrailHunger Prod" --availability-zone us-east-1a -k TrailHungerProd -C "/path/to/my/amazon/key.pem" -K "/path/to/my/amazon/cert.pem" -f "C:\Users\Edward\Documents\Projects\TrailHunter\etc\app-server-host-user-data.txt"

    You need to make sure you have everything in place to start multiple instances, as the load balancer may decide that your instance is not responding even though it is still running. I use a couple of EBS volumes per instance. If my recipes pointed directly at existing volumes, the second instance would fail to start because the volumes would still be attached to the first instance.

    My solution was to have the recipes create a volume from the latest snapshot. Every night, my backup system uses xfs_freeze to freeze the volumes, takes a snapshot, stores the snapshot ID in a Chef data bag and unfreezes the volumes.
    In my recipe the following code finds the snapshot in the data bag and loads it:

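    # Assumption: the AWS credentials come from the "aws" data bag, item "main",
    # as in the aws cookbook README; adjust to wherever you keep yours.
    aws = data_bag_item("aws", "main")

    # Create a new volume from the snapshot ID saved in this environment's
    # snapshots data bag, then attach it to this instance as /dev/sdf.
    aws_ebs_volume "blog_db_volume" do
     aws_access_key aws['aws_access_key_id']
     aws_secret_access_key aws['aws_secret_access_key']
     snapshot_id data_bag_item("snapshots-#{node.chef_environment}", 'blog_db')["snapshot-id"]
     device "/dev/sdf"
     action [ :create, :attach ]
    end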

    The following snippet configures the snapshot script:

    template "/usr/local/sbin/blog_db_snapshot.sh" do
     source "blog_db_snapshot.sh.erb"
     mode 0744
     owner "root"
     group "root"
     variables({
     :directory => "/mnt/mysql",
     :description => "Prod Blog Database",
     :data_bag_item_id => "blog_db"
     })
    end

    My snapshot script is:

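    #!/bin/sh
    export JAVA_HOME=/usr/lib/jvm/java-6-sun
    export EC2_HOME=/opt/ec2-api-tools
    # Assumption: the EC2 API tools live in $EC2_HOME/bin (cron runs with a minimal PATH)
    export PATH="$PATH:$EC2_HOME/bin"
    
    description="<%= @description %> backup at `date`"
    
    TMPFILE=$(mktemp /tmp/databagXXXXXXXXXX.json)
    
    # Freeze the filesystem so the snapshot is consistent, snapshot the volume,
    # write {"id": ..., "snapshot-id": ...} to $TMPFILE, then unfreeze
    xfs_freeze -f <%= @directory %>
    ec2-create-snapshot <%= node[:aws][:ebs_volume][:blog_db_volume][:volume_id] %> -K /home/ubuntu/aws/pk-SnapshotUser.pem -C /home/ubuntu/aws/cert-SnapshotUser.pem -d "$description" | awk -F'\t' '{ printf "{\"id\":\"<%= @data_bag_item_id %>\",\"snapshot-id\": \"%s\"}\n", $2 }' > "$TMPFILE"
    xfs_freeze -u <%= @directory %>
    
    # Store the new snapshot ID in this environment's snapshots data bag
    knife data bag from file snapshots-<%= node.chef_environment %> "$TMPFILE" -c /etc/chef/client.rb
    rm -f "$TMPFILE"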

    Note that I attempted to use the same script everywhere. Unfortunately, the volume_id is only saved as an attribute when the aws_ebs_volume resource completes, which means that it is not available at compile time. I couldn’t find a useful way to parameterize this, so I have multiple copies of the snapshot template.

  7. Now set up your launch config and Auto Scaling group. I used the following script so the process was repeatable:
    set AWS_AUTO_SCALING_HOME=C:\Program Files\AWS\AutoScaling-1.0.33.1
    set JAVA_HOME=C:\Program Files\Java\jre1.6.0_07
    set AWS_CREDENTIAL_FILE=C:\Users\Edward\Documents\Projects\TrailHunter\etc\aws-auto-scaling.credentials
    call "C:\Program Files\AWS\AutoScaling-1.0.33.1\bin\as-create-launch-config" qa-lc --image-id ami-a6f504cf --instance-type m1.small --region us-east-1 --group "TrailHunger Prod" --key TrailHungerProd --user-data-file "C:\Users\Edward\Documents\Projects\TrailHunter\etc\app-server-host-user-data.txt" --monitoring-disabled
    call "C:\Program Files\AWS\AutoScaling-1.0.33.1\bin\as-create-auto-scaling-group" qa-asg --launch-configuration qa-lc --availability-zones us-east-1a --min-size 1 --max-size 1 --load-balancers qa-lb --grace-period 1200 --health-check-type ELB

    Wait a minute and watch an instance start up without you touching it.
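The awk one-liner in the snapshot script builds the data bag item by string formatting and has no check that the line it parsed really describes a snapshot. Below is a sketch of the same transformation in Ruby with a sanity check. It assumes the tab-separated output format the awk command relies on (snapshot ID in the second field); the "SNAPSHOT" and "snap-" checks are extra safeguards added here, and the sample line is made up.

```ruby
require "json"

# Turn one line of ec2-create-snapshot output into the data bag item
# {"id": item_id, "snapshot-id": ...}, refusing anything unexpected.
def snapshot_data_bag_item(cli_line, item_id)
  fields = cli_line.chomp.split("\t")
  unless fields[0] == "SNAPSHOT" && fields[1].to_s.start_with?("snap-")
    raise ArgumentError, "unexpected ec2-create-snapshot output: #{cli_line.inspect}"
  end
  { "id" => item_id, "snapshot-id" => fields[1] }
end

line = "SNAPSHOT\tsnap-1a2b3c4d\tvol-9f8e7d6c\tpending\t2011-05-28T03:00:00+0000"
puts JSON.generate(snapshot_data_bag_item(line, "blog_db"))
```

Failing loudly here means a broken backup run never overwrites the last good snapshot ID in the data bag.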

To shut down your instances:

  1. set AWS_AUTO_SCALING_HOME=C:\Program Files\AWS\AutoScaling-1.0.33.1
    set JAVA_HOME=C:\Program Files\Java\jre1.6.0_07
    set AWS_CREDENTIAL_FILE=C:\Users\Edward\Documents\Projects\TrailHunter\etc\aws-auto-scaling.credentials
    
    call "C:\Program Files\AWS\AutoScaling-1.0.33.1\bin\as-update-auto-scaling-group" qa-asg --min-size 0 --max-size 0
  2. Wait a few minutes for AWS to shut your instance down, then:
    set AWS_AUTO_SCALING_HOME=C:\Program Files\AWS\AutoScaling-1.0.33.1
    set JAVA_HOME=C:\Program Files\Java\jre1.6.0_07
    set AWS_CREDENTIAL_FILE=C:\Users\Edward\Documents\Projects\TrailHunter\etc\aws-auto-scaling.credentials
    
    call "C:\Program Files\AWS\AutoScaling-1.0.33.1\bin\as-delete-auto-scaling-group" qa-asg
    call "C:\Program Files\AWS\AutoScaling-1.0.33.1\bin\as-delete-launch-config" qa-lc

Notes

For my system a complete startup takes about ten minutes. This is because my chef scripts start from the base Ubuntu image. This means that I can do an Ubuntu upgrade fairly easily and that I could set up an instance in another region if I wanted. If you want your instances to start faster then you should follow Bryan Bandau’s suggestion to me and bake an image with everything you need and use Chef to enforce the configuration afterwards.
Before you stop your instance serving web pages to deploy a new release, make sure you suspend the Auto Scaling processes beforehand and resume them afterwards. Otherwise the load balancer will detect your instance as being down and Auto Scaling will start a new one.
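The resume step is easy to forget when a deployment fails halfway. One way to make it unconditional is to wrap the deployment in Ruby's begin/ensure. This is a sketch: client is any object with suspend_processes(group) and resume_processes(group) methods, hypothetical names that in practice would wrap the Auto Scaling command line tools' as-suspend-processes and as-resume-processes, or an SDK call.

```ruby
# Suspend Auto Scaling processes for `group`, run the block (the deployment),
# and always resume afterwards, even if the block raises.
def with_scaling_suspended(client, group)
  client.suspend_processes(group)
  begin
    yield
  ensure
    client.resume_processes(group)
  end
end

# Stand-in client that just records calls, for trying the wrapper locally.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def suspend_processes(group)
    @calls << [:suspend, group]
  end

  def resume_processes(group)
    @calls << [:resume, group]
  end
end

client = RecordingClient.new
begin
  with_scaling_suspended(client, "qa-asg") { raise "deploy failed" }
rescue RuntimeError => e
  puts "deploy error: #{e.message}"
end
p client.calls
```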

Monitoring

It’s important to me that I get a warning when an instance goes down, as I will probably need to go back and deal with data. The downside of having the database on the same instance as my web server and Java container is that I have to go back to the most recent backup. I could scale out and set up replication, but I have chosen not to do this yet.

In this section you can choose to monitor the Healthy Host Count or whether the Auto Scaling Group launches or terminates an instance.

Monitor Healthy Host Count

  1. Open the Amazon CloudWatch Developer Guide, navigate to User Instructions -> Create an Alarm that Sends Email -> Set Up Amazon SNS and follow the instructions.
  2. In the same guide navigate to User Instructions -> Create an Alarm that Sends Email -> Send Email Based on Load Balancer Alarm and follow the instructions. However, instead of the Latency metric use HealthyHostCount.
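For the HealthyHostCount alarm, the settings that matter are the threshold and how many consecutive periods must breach it. A simplified Ruby model of how such a threshold alarm evaluates, assuming a threshold of 1 healthy host and 2 consecutive periods (example values, not taken from the guide):

```ruby
# Simplified model of a CloudWatch threshold alarm on HealthyHostCount:
# ALARM when the last `periods` datapoints are all below `threshold`,
# INSUFFICIENT_DATA when there are not enough datapoints yet, else OK.
def alarm_state(datapoints, threshold: 1, periods: 2)
  return :insufficient_data if datapoints.size < periods
  datapoints.last(periods).all? { |v| v < threshold } ? :alarm : :ok
end

p alarm_state([1, 1, 0, 0])
```

With a single instance, requiring two periods filters out a one-off failed health check while still catching a real outage.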

Monitor launch or terminate on Auto Scaling Group

Posted by Edward Sargisson | in amazon-ec2 | No Comments »

Setting up a WordPress blog with a Tomcat web application

Sunday, Apr. 24th 2011

This site now has a WordPress blog beside the Tomcat application server that runs TrailHunger.com itself. The setup was a little complex, so I will detail it below.

My aim in setting up these blogs was to provide quality content so that people share it and Google sends visitors to the site. Therefore, I wanted to set up the blogs under the TrailHunger.com domain.

WordPress runs on PHP and typically uses Apache as the webserver. The original configuration of Tomcat had Tomcat listening to ports 80 and 443 and Apache wasn’t used at all.

I have just started using Opscode Chef to manage my infrastructure. I started with the cookbook for WordPress as that would do most of the work for me. I modified the cookbook to:

  • Install into /var/www/blog so that the resulting blogs would be at www.trailhunger.com/blog
  • Install WordPress 3.1.1 instead of the default 3.0.x

The blog-server role is this:

name "blog-server"
description "Wordpress Blog Server"
run_list(
  "recipe[apt]",
  "recipe[trailhunger::blog-server]"
)
default_attributes(
  "mysql" => { "server_root_password" => "password" },
  "wordpress" => {
    "version" => "3.1.1",
    "checksum" => "aa2ea71a34596a6d53c653c00a549e2abd190a7cb3e80812e63b6e9444500a9b",
    "dir" => "/var/www/blog"
  }
)

Because we are now using Apache as the front end server we need to communicate from Apache to Tomcat. The way to do this is to use AJP. Hence the wordpress.conf file in /etc/apache2/sites-enabled has:

JkMount / ajp13
JkMount /* ajp13
JkUnMount /blog* ajp13

This means that the default context (/) and everything below it is given to the mod_jk plugin to send to Tomcat. The JkUnMount means that /blog is left for Apache to deal with.
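A simplified Ruby model of how those three directives split requests between Tomcat and Apache. It treats a trailing "*" as a prefix match and, as we understand mod_jk's exclusion rules, gives unmount rules priority over mount rules; it is an illustration, not mod_jk's matcher.

```ruby
# The mapping from wordpress.conf, as data.
MOUNTS   = ["/", "/*"].freeze
UNMOUNTS = ["/blog*"].freeze

# A pattern ending in "*" matches any path with that prefix; otherwise the
# path must match exactly.
def matches?(pattern, path)
  pattern.end_with?("*") ? path.start_with?(pattern.chomp("*")) : path == pattern
end

# Unmount rules are checked first, so /blog stays with Apache even though
# "/*" would also match it.
def handler_for(path)
  return :apache if UNMOUNTS.any? { |p| matches?(p, path) }
  return :tomcat if MOUNTS.any? { |p| matches?(p, path) }
  :apache
end

p handler_for("/trails/grouse-grind")
p handler_for("/blog/site/")
```

One consequence in this model: "/blog*" has no slash before the wildcard, so a path such as /blogroll would also be left to Apache.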
In /var/lib/tomcat6/conf/server.xml, the Connector element for port 80 is commented out and the Connector element for AJP/1.3 on port 8009 is uncommented. I left the Connector for port 443 (HTTPS) in Tomcat because I didn’t want to deal with the details of passing SSL through Apache to Tomcat, and I don’t need it yet.
You will also need to set up /etc/apache2/mods-available/jk.conf to configure mod_jk; Rajeev Sharma’s instructions work perfectly. Lastly, you will need symlinks in /etc/apache2/mods-enabled to both jk.conf and jk.load.

I used the WordPress instructions for creating a network so that I could have two blogs. The way to do this is to set up the root blog at /blog following the normal instructions. You then create the network blogs (/blog/site and /blog/technical in this case). The problem is that you now have a blog at /blog that you may not want. To fix that, I added the following rules to /etc/apache2/sites-available/wordpress.conf to redirect traffic for /blog and /blog/ to /blog/site.

RewriteRule ^/blog/$ /blog/site [R=301,L]
RewriteRule ^/blog$ /blog/site [R=301,L]

I wanted to use the Google FeedBurner service to track the blog subscribers and all the other useful things that service provides. I burnt the feeds for my blogs at Feedburner and then used the MyBrands service so that the resulting links are feeds.trailhunger.com/TrailHungerSiteBlog and so on. The FeedBurner FeedSmith plugin works great to change all your public links to point to the new feeds. For a network install you have to Network Activate it and then go to each blog to activate it there.

Lastly, I used ThemesPress to create a theme from my existing HTML (hat tip to Patrick McKenzie for that). That’s why the header on the blog matches the header on the main site (except for Cufon not getting the fonts right, grrr).

Posted by Edward Sargisson | in Uncategorized | No Comments »

Welcome

Wednesday, Apr. 20th 2011

Welcome to the TrailHunger.com Technical Blog.

On this blog I will be talking about matters that software developers may find interesting. I have in mind posts about using StringTemplate and Spring, how to set up these blogs, etc.

For all other news about the site or the business please see the TrailHunger.com Site Blog.

Posted by Edward Sargisson | in Uncategorized | No Comments »