Migrating Installer Images to AWS S3 and CloudFront

This is the second in a series of posts which describe the adventures encountered while sticking our heads even further in the clouds.

  • The first article is mostly Project Introduction & Background, describes what we’re doing and why in higher level terms
  • This post is about getting our static data hosted at AWS as our first steps to using the AWS cloud
  • Then we’ll talk about Migrating Site Local MySQL database to AWS RDS
  • And the big kahuna part 1: Establishing the Ruby on Rails Web App Environment on AWS EBS

Ok firstly let’s think about the types of static content we want to serve from AWS, and then look at how this impacts our web app and then the build and distribution workflows for the Drum Score Editor app itself.

We’ve got 2 types of static content we want served. Firstly there’s the Drum Score App installer images for each platform, plus example scores and PDFs. We’re going to store these in an AWS S3 bucket, primarily to reduce the size of the web app so it can be deployed in later steps using AWS Elastic Beanstalk, which has a hard, untweakable maximum web app size it can deal with (500MB at last look).

Secondly there’s the Rails asset pipeline, all the static CSS, JavaScript etc that goes with an app. Some reading of the various opinions on the interweb reveals that AWS CloudFront is the CDN which caches copies of static assets closer to the user. You simply specify the origin of the files and it does its magic to make them appear. That origin could be the S3 bucket containing the installer images etc, or it could be your web site itself, or the assets can be precompiled to an S3 bucket too. We’ll look at the options and why we chose the solution we did later in this article.

Step 1 – getting the installer images into an S3 bucket

Before we can put anything in a bucket, we need to acquire that bucket. Should we just use the AWS Console as this is a one-off operation? Or maybe we want to have different buckets for UAT and Production separation, and so should create a reusable script for their creation? Do we need that complexity? Probably not at this stage given our overall use case.

First we create a bucket for these resources, imaginatively called drumscoreportal-resources, and we upload our installer image into it using the console (or AWS command line tools) and make it public, by right clicking on the uploaded filename. Selecting properties will show the public name of the file, so to test this works we copied the link presented and pasted it into a command line and used curl to pull it down. Remember it’s a common installer we want to be available to everybody, no need to control access to it so no need to work out any permission stuff. This all just worked, so in theory we can tweak the download links in the appropriate page in the web app and off we go.
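If you'd rather script that check than paste the link into curl, here's a minimal Ruby sketch. It swaps the full download for a HEAD request, so it only confirms the object answers publicly without pulling the installer down. The bucket and file names are the ones used in this post; the default endpoint is the S3 host quoted later on and is an assumption for any other region, and both method names are made up for illustration.

```ruby
require "net/http"
require "uri"

# Build the path-style public URL for an object in an S3 bucket.
# The default endpoint is an assumption taken from the host used in this post.
def public_object_url(bucket, key, endpoint: "https://s3-eu-west-1.amazonaws.com")
  "#{endpoint}/#{bucket}/#{key}"
end

# Send a HEAD request and report whether the server answered with a 2xx status.
# No body is transferred, so this does not run up download costs.
def publicly_readable?(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
end

# Usage (makes a network call, so it is left commented out here):
# url = public_object_url("drumscoreportal-resources", "DrumScoreEditor-2.23.dmg")
# puts publicly_readable?(url) ? "public" : "not reachable"
```

A 403 here would suggest the object was uploaded but never made public.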

However, everybody does local development right? You really don’t want to be running up your AWS network costs by pulling the copies down when in an iterative dev/test cycle. So we need to somehow make the development environment use local copies while UAT and production use the S3 objects.

Given all we’re doing though is pulling down the objects via http, we don’t need anything more clever than a local web server and the files in a similar URL so the app can switch through Rails environment specific initialisers. The secrets.yml file is already used to separate out which hosts are used for each environment for things like the Facebook and Paypal integration. Might be impure, but popping a line in there for the resource_host and using that in the link tags might be viable.

The download link in the view then becomes

<a href="#{Rails.application.secrets.resource_host}/drumscoreportal-resources/DrumScoreEditor-2.23.dmg" class="button radius" download>Download For Mac OS X</a>

The secrets.yml entry for the environment then specifies https://s3-eu-west-1.amazonaws.com for production and UAT and http://localhost:8080 for development. Why that URL for development? Well, every Mac comes with Python, and from the directory you want to serve files from, the command below works well (it uses the Python 2 module name; on Python 3 the equivalent is python3 -m http.server 8080).

python -m SimpleHTTPServer 8080

Simple really. In summary, our web app just serves up a link to the resource in the S3 bucket rather than from its own host. Really need a Rails guru to chime in and say what the best way to set the resource_host variable would be though. I’m sure secrets.yml isn’t meant for this!

Last thought before moving on: keeping that installer image in the S3 bucket costs money for storage and requests, every download costs money in transfer-out fees, and we’ve made it publicly available – this worries me.
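To make the switching idea concrete without dragging Rails into it, here's a framework-free Ruby sketch. The hash stands in for the per-environment resource_host values described above; RESOURCE_HOSTS, BUCKET and installer_url are names invented for this example, not anything in the real app.

```ruby
# Stand-in for the per-environment resource_host entries described in the post.
RESOURCE_HOSTS = {
  "development" => "http://localhost:8080",
  "uat"         => "https://s3-eu-west-1.amazonaws.com",
  "production"  => "https://s3-eu-west-1.amazonaws.com"
}.freeze

BUCKET = "drumscoreportal-resources"

# Return the download URL for an installer in the given environment.
# fetch raises KeyError for an unknown environment instead of emitting a broken link.
def installer_url(environment, filename)
  host = RESOURCE_HOSTS.fetch(environment)
  "#{host}/#{BUCKET}/#{filename}"
end

puts installer_url("development", "DrumScoreEditor-2.23.dmg")
puts installer_url("production", "DrumScoreEditor-2.23.dmg")
```

Because the local web server serves the same bucket-name directory, only the host part changes between environments and the view code stays identical.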

Step 2 – moving the rails assets pipeline to S3 & CloudFront

I’ve chosen not to do this at this stage. What? I thought this was an article about how to do that! Here’s the deal, if I follow the pretty simple advice out there to simply use CloudFront, e.g. https://www.happybearsoftware.com/use-cloudfront-and-the-rails-asset-pipeline-to-speed-up-your-app.html, then I’m simply running up my costs further.

Well that’s how it seems at the moment, with no control over bandwidth costs due to user behaviour or any other malevolent person choosing to do so, we’re exposed enough already. I’ll park this for now, and return when I understand better what techniques are available for controlling exposure here.

For reference, my current setup is a single VM which hosts both the UAT and Production websites as Apache virtual sites, with the databases as different schemas in a single MySQL instance. It costs less than a tenner a month and has unlimited (subject to fair use) bandwidth. For sure this project is about adding resilience and scalability, but as always there’s a decision about costs versus value. We don’t understand our total costs yet. Perhaps there’s a way of modelling and understanding potential costs based on the Apache and MySQL logs, but we also need to understand what Elastic Beanstalk (EBS in this series), RDS, S3 and maybe eventually CloudFront add to those costs.
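As a first stab at that modelling idea, here's a rough Ruby sketch that adds up the bytes served for installer downloads from Apache access log lines and turns them into an estimated transfer cost. Everything specific in it is an assumption: the log lines are made up, the /downloads/ path and file extensions are hypothetical, and price_per_gb is a placeholder you would need to replace with the current figure from the AWS pricing pages.

```ruby
# Matches the request, status and byte-count fields of an Apache common/combined log line.
LOG_LINE = /"(?:GET|HEAD) (?<path>\S+) [^"]*" (?<status>\d{3}) (?<bytes>\d+|-)/

# Sum the bytes served for successful requests whose path ends in one of the extensions.
def installer_bytes(lines, extensions: %w[.dmg .exe .deb])
  lines.sum do |line|
    m = LOG_LINE.match(line)
    next 0 unless m && m[:status].start_with?("2")
    next 0 unless extensions.any? { |ext| m[:path].end_with?(ext) }
    m[:bytes].to_i # a "-" byte count (no body sent) becomes 0
  end
end

# Convert bytes to an estimated transfer cost.
# price_per_gb is a placeholder, not a real AWS price.
def estimated_cost(bytes, price_per_gb:)
  bytes / 1_000_000_000.0 * price_per_gb
end

# Made-up sample lines, for illustration only.
sample = [
  '203.0.113.5 - - [10/Oct/2016:13:55:36 +0000] "GET /downloads/DrumScoreEditor-2.23.dmg HTTP/1.1" 200 50000000',
  '203.0.113.9 - - [10/Oct/2016:13:56:01 +0000] "GET /index.html HTTP/1.1" 200 5120',
  '203.0.113.7 - - [10/Oct/2016:14:02:11 +0000] "GET /downloads/DrumScoreEditor-2.23.dmg HTTP/1.1" 404 312'
]

total = installer_bytes(sample)
puts "installer bytes served: #{total}"
puts "estimated transfer cost: #{estimated_cost(total, price_per_gb: 0.10)}"
```

Run against a month of real logs, this would only cover download bandwidth; request charges, storage and the other services would still need their own estimates.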

Scalable Resilient Distribution of Drum Score Editor

AKA getting very cloudy out there! This is the first of a series of posts which will describe the adventures encountered while sticking our heads even further in the clouds.

This instalment is mostly Project Introduction & Background, rapidly followed by (which will be links once written):

Briefly, Drum Score Editor is a GUI app that installs locally on Windows, Mac OS X or Linux. It’s written in Java and pretty much lives the write once run (almost) anywhere dream, ok no tablets or phones as the industry pretty much needed a whole bunch of new challenges in that domain – not.

To get it to its users, Drum Score Editor is packaged into native installers for each platform (another process worthy of improvement and an article one day) and those installers are hosted on a bespoke website. So it’s not that simple, as there’s a free version which is very usable imho, and then with the application of a license key, it unlocks a bunch of productivity features.

To deliver this to its users a bespoke website has been written, as integrating with Mac, Windows, Ubuntu etc app stores is just a bridge too far. Oh, and Apple won’t let me host it on theirs, as the optional license key is regarded as an in-app purchase, and because it’s cross-platform it doesn’t use their technology, so they believe I’m avoiding the 30% fees they charge. Oh, and the Microsoft one just wanted ridiculous upfront costs. There’s probably another article all about this space waiting to be written.

So this website, on the face of it it’s fairly simple: you connect to it, you download the installer right from the home page, no tracking, no identity capture needed, no email spam afterwards, nada. Nothing complicated there, static installer images hosted on the website accessed from a (sort of) pretty html page.

But then there’s the making a financial contribution in exchange for a license, and that’s where it gets not-so-simple. Integrations through OAuth2 are needed to register a new account or sign in to an existing one. From there, if the user wishes to acquire a new license, we’re integrated with PayPal to collect the funds, and when that’s done, we encrypt a new license key by spawning some Java code which does all the complicated bits, returning the keys to stash in the user’s account. Oh, and just to be sure we’re good, everything’s behind an SSL cert (https FTW), which is also used to help protect the site when talking to the aforementioned authentication and payment sites.

So not quite so simple as bunging it in a simple call to AWS Elastic Beanstalk and hey presto. I didn’t mention AWS before, that’s the target cloudy environment chosen for its rich etc etc, it’s the first, biggest and most complete imho. We’re already a bit cloudy but it’s oh so last year and not as comprehensive a solution as offered by AWS, not that I’m sure we really need all those bells and whistles as it’s been going fine as a single web app on a VM I rent from a hosting provider on the internet for a few years now. Yes, single, so no resilience in the case of provider outages, no disaster recovery other than the fact I back up the database and email its contents to myself each day. Ahem, yes data protection, some of that too please.

Had a quick look at what it will take, and the first hurdle is that my current single git repo containing the web app and the installer images is about 1GB in size. The AWS Elastic Beanstalk (EBS) experts will now be jumping up and down to say it’s too big, fix it! We will, that’s the first step in getting this web app some more professional attention. It’s a shame though, as it breaks the model of a single repo for the whole app and its data. We’ll move the installer images to S3 and use CloudFront to serve the content more locally to the site’s users.

Second phase is to shift the database out of the single-instance web app, so it can be accessed by multiple instances of the app. This is going to be trickier. Currently there’s a very effective Capistrano integration (seems I forgot to say the web app is written in Ruby on Rails, talking to a MySQL database), which takes care of pushing updates to either the UAT or the Production website, both of which are hosted on the same VM, by the same Apache2 instance configured with virtual servers. Yeah, not exactly a lot of separation there either, another thing that’ll be fixed by getting cloudier. I like to call the current config “just good enough”, and the thinking man’s server consolidation (1990’s style).

Third phase has to be to move the main web app to EBS. There will be plenty challenges with this, hence wanting to separate out the two major changes above and make sure they’re working before this piece of heavy lifting. Just to finish this intro to the project, here’s a picture that tries to show the before and after for the whole shebang at a fairly high level.

[Figure: Slide1, the before and after picture at a high level]

By the way, just as a footnote, when we’re done with this, we’ll look to see how the many tools and github repos which address the complexity in this space are progressing. Right now, I couldn’t find anything that meets the requirements, without a complete rearchitecture and rewrite of the web app. Feel free to comment away with recommendations, I’ll look at them all, honest!

Oh and p.p.s., continuous integration, there must be answers in there for this whole toolchain! Another future post.