April 26th, 2015

Machine Factories with Windows Part 2: Packaging & automating AWS Environments

Using Packer and Fake, we create a Windows machine image pipeline that takes our URL Shortener code and turns it into a runnable & scalable AWS image.

— Matthew Fellows —

In part 1 of our series on Machine Factories with Windows, we looked at how we can automate our development environments with Vagrant, Packer and DSC.

In part 2, we explore the next step in our continuous delivery pipeline: building the AWS images required for our build servers and runtime environments, such as Production. In part 3, we'll launch these images as a running stack using Terraform.

At the end of it, we’ll have the following:

  • A build script and installable package format for our URL shortener application
  • A Build Agent AMI that has all of the dependencies required to build and test the application (typically used to host the CI server)
  • An Application AMI ready to be started as part of a Production stack

Important: I assume you are comfortable with basic AWS concepts and practices, such as using the console and API to do things like launch instances into subnets/VPCs, and have a set of credentials ready and configured in your local environment.

Update: 2016/02/19

Packer has merged the plugins from the Packer Windows Plugins project, meaning Windows now gets first-class support. Huzzah! The relevant code has been updated in the demo repo.

Base Image

Oftentimes, particularly when building microservices, you'll notice commonality amongst server requirements – CM tools, log forwarders, security settings and so on. Rather than repeat this provisioning step for every image, one strategy you can employ to speed things up is to create an intermediate, or 'Base', image with these things already on it. In our case, we're going to keep things simple and just ensure that IIS, DSC and MongoDB are installed. These actually take 10-15 minutes to install, so it's well worth doing now (of course, in real life we wouldn't be installing MongoDB on every web server, but for our purposes this is fine).

Our Base Packer configuration and provisioning script look like this:
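
(The full template and provisioning scripts live in the demo repo; the sketch below only captures the shape of the Base template, and the script names, region and instance type are illustrative.)

    {
      "variables": {
        "source_ami": "",
        "region": "ap-southeast-2"
      },
      "builders": [{
        "type": "amazon-ebs",
        "region": "{{user `region`}}",
        "source_ami": "{{user `source_ami`}}",
        "instance_type": "m3.medium",
        "ami_name": "url-shortener-base-{{timestamp}}",
        "user_data_file": "scripts/bootstrap.txt",
        "communicator": "winrm",
        "winrm_username": "vagrant"
      }],
      "provisioners": [
        {
          "type": "powershell",
          "scripts": [
            "scripts/install-iis.ps1",
            "scripts/install-dsc.ps1",
            "scripts/install-mongodb.ps1"
          ]
        },
        {
          "type": "powershell",
          "script": "scripts/Ec2Config.ps1"
        }
      ]
    }

The user_data_file bootstraps WinRM so Packer can connect; the PowerShell provisioners then install IIS, DSC and MongoDB before Ec2Config.ps1 prepares the instance for its next launch.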

Note that we run an Ec2Config.ps1 script at the end; it is responsible for re-enabling the Ec2Config plugins for the next run, so that a new instance can receive user-data and have its password and computer name set. This is really handy if you plan to create images in pipelines as we do. You can read more about the Ec2Config service in the AWS documentation.
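
A common way to do this, and roughly what our script does, is to flip the relevant plugins back to Enabled in the Ec2Config settings file (the path shown is the default; treat this as a sketch rather than the exact script from the repo):

    # Ec2Config.ps1 - re-arm the Ec2Config plugins for the next boot
    $EC2SettingsFile = "C:\Program Files\Amazon\Ec2ConfigService\Settings\Config.xml"
    $xml = [xml](Get-Content $EC2SettingsFile)

    # Re-enable setting the Administrator password, the computer name and
    # user-data handling the next time an instance is launched from this image
    foreach ($plugin in $xml.Ec2ConfigurationSettings.Plugins.Plugin) {
        if (@("Ec2SetPassword", "Ec2SetComputerName", "Ec2HandleUserData") -contains $plugin.Name) {
            $plugin.State = "Enabled"
        }
    }

    $xml.Save($EC2SettingsFile)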

We can now run the build:
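
Assuming the Base template is saved as base.json (the file name in the demo repo may differ), it's a single Packer command:

    packer build base.json

Packer launches an instance from the source Windows AMI, runs the provisioners over WinRM, then creates the Base AMI and prints its ID.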

Build Agent

We're going to need a server that continuously integrates our code (for CI) and contains the tools required to continuously deploy it (for CD). Whilst the actual CI/CD tool selection and implementation is an exercise left to the reader, we will still create a Windows image that contains a superset of our Production image's dependencies, so that we have increased confidence that code will work when shipped from one to the other.

It's also important to be in a position to re-create this system rapidly if things go pear-shaped; if we lose our build server, our path to Production is essentially blocked – a state of emergency in CD circles – and we sure as hell don't want to have to rebuild it all by hand.

Our build agent Packer configuration and provisioning scripts look like this; note that we are installing Ruby, CfnDsl and Terraform so that we can deploy our stack later on:
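
(Again, the real buildagent.json is in the demo repo; this sketch just shows the important idea, that the Build Agent is layered on top of the Base AMI. The tool installation commands are simplified and assume Chocolatey is already available on the Base image.)

    {
      "builders": [{
        "type": "amazon-ebs",
        "region": "ap-southeast-2",
        "source_ami": "<Base AMI id from the previous step>",
        "instance_type": "m3.medium",
        "ami_name": "url-shortener-buildagent-{{timestamp}}",
        "user_data_file": "scripts/bootstrap.txt",
        "communicator": "winrm",
        "winrm_username": "vagrant"
      }],
      "provisioners": [
        {
          "type": "powershell",
          "inline": [
            "choco install ruby -y",
            "choco install terraform -y",
            "gem install cfndsl"
          ]
        },
        {
          "type": "powershell",
          "script": "scripts/Ec2Config.ps1"
        }
      ]
    }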

Let’s create that server now:
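
From the machine-factory directory of the demo repo, that's another one-liner:

    packer build buildagent.json

As before, Packer's output finishes with the new Build Agent AMI ID.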

Packaging

Typically, we would deploy a microservice such as this as a Chocolatey NuGet package and have it install itself into its target environment. That's worth taking a second to digest. Chocolatey gives us the ability to declaratively express our application's dependencies, and the capability to run custom install scripts once all dependencies have been resolved and installed – which is where we employ a small trick: using DSC within the package install process to configure the server. It's a bit sneaky, but the result is a really clean and almost native approach to installing our application onto a server.

Our Chocolatey install script looks like this:
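
A trimmed-down sketch of the idea follows; the real chocolateyInstall.ps1 is in the demo repo, and the DSC configuration name, parameters and paths here are purely illustrative:

    # chocolateyInstall.ps1 (sketch)

    # Hack: Chocolatey currently runs under the Invariant Culture, which breaks
    # DSC's localisation lookups, so force a concrete culture for this thread
    [System.Threading.Thread]::CurrentThread.CurrentCulture = [System.Globalization.CultureInfo]::GetCultureInfo("en-US")

    # The package contents (application binaries plus our DSC scripts) are
    # unpacked alongside this script
    $installDir = Split-Path -Parent $MyInvocation.MyCommand.Definition

    # Apply the same DSC configuration we used with Vagrant in part 1,
    # pointing the IIS site at the directory the package was installed into
    . (Join-Path $installDir "UrlShortener.ps1")   # defines the DSC configuration
    UrlShortener -PhysicalPath $installDir -OutputPath "$env:TEMP\UrlShortener"
    Start-DscConfiguration -Path "$env:TEMP\UrlShortener" -Wait -Force -Verbose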

Please excuse the hack at the top; it is a side-effect of the latest Chocolatey bundle, which runs under the Invariant Culture and in turn breaks the i18n component of DSC. The rest of the process should, however, look fairly straightforward – it simply runs the same DSC scripts that we applied with Vagrant in part 1, and points IIS at the location the package was installed to. That's it!

Now, we need a build and packaging process to tie it back together with Packer. To achieve automation, we really need to get away from clicking stuff and leverage the power of scripting – for the task of building and packaging our application, we have chosen Fake, which works cross-platform and has a bunch of neat helpers for things like packaging and file manipulation:
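
The build.fsx in the demo repo does the real work; its skeleton looks roughly like the following, with the target names, globs and package version shown here purely for illustration:

    // build.fsx - clean, compile the solution and package the output
    #r "packages/FAKE/tools/FakeLib.dll"
    open Fake

    let buildDir   = "./build/"
    let publishDir = "./publish/"

    Target "Clean" (fun _ -> CleanDirs [buildDir; publishDir])

    Target "Build" (fun _ ->
        !! "src/**/*.csproj"
        |> MSBuildRelease buildDir "Build"
        |> Log "Build output: ")

    Target "Package" (fun _ ->
        // Pack the application and its install scripts as a Chocolatey (NuGet) package
        "UrlShortener.nuspec"
        |> NuGetPack (fun p ->
            { p with OutputPath = publishDir; WorkingDir = buildDir; Version = "1.0.0" })

        // Bundle the package and its dependencies into source.zip for upload
        !! (publishDir + "*.nupkg")
        |> Zip publishDir (publishDir + "source.zip"))

    // Target ordering: Clean, then Build, then Package
    "Clean" ==> "Build" ==> "Package"

    RunTargetOrDefault "Package"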

Run it with:

./build.sh on *nix or ./build.bat on Windows.

Once finished, you should have a bunch of artifacts in the ./publish directory, including source.zip, which contains our package along with its dependencies (themselves also Chocolatey NuGet packages). This means that when it comes time to install the package we don't need to reach out to remote package repositories – we can simply upload the zip, extract it and run a Chocolatey install, using the target directory as the package source for all of the packages to be installed. We'll see this in the next section.

Application Image aka ‘Bake’

So now we have all the ingredients required to create our distributable image – the Application image. This is what we will use to create our Production stack, as the image for auto-scaling groups and so on.

As Warner discusses in our talk, our bake recipe takes the following ingredients:

  • 1 x Base Image
  • 1 x Application
  • 0 x Configuration

Why no configuration? Embedding configuration into our image at this point confines our AMI to a very specific life, for the environment that you are targeting – the result is images specific to 'staging', 'dev', 'performance', and the list goes on.

What we really want to do at this step is create an image that can be run in any context – we want to put the image into ‘stasis’ for ‘re-animation’ some time in the future. We’ll discuss strategies for re-animation in part 3.

For now, we are simply going to bake this image – here is our application Packer configuration:
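
(As before, the real application.json is in the demo repo; the inline install commands below are simplified, the package name is illustrative, and we assume Chocolatey is already present on the Base image.)

    {
      "builders": [{
        "type": "amazon-ebs",
        "region": "ap-southeast-2",
        "source_ami": "<Base AMI id>",
        "instance_type": "m3.medium",
        "ami_name": "url-shortener-app-{{timestamp}}",
        "user_data_file": "scripts/bootstrap.txt",
        "communicator": "winrm",
        "winrm_username": "vagrant"
      }],
      "provisioners": [
        {
          "type": "file",
          "source": "./publish/source.zip",
          "destination": "C:/Windows/Temp/source.zip"
        },
        {
          "type": "powershell",
          "inline": [
            "Add-Type -AssemblyName System.IO.Compression.FileSystem",
            "[System.IO.Compression.ZipFile]::ExtractToDirectory('C:/Windows/Temp/source.zip', 'C:/Windows/Temp/packages')",
            "choco install url-shortener -source 'C:/Windows/Temp/packages' -y"
          ]
        },
        {
          "type": "powershell",
          "script": "scripts/Ec2Config.ps1"
        }
      ]
    }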

Let’s build our server, shall we?
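
One more Packer run does the job:

    packer build application.json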

Note that all it does is spin up our Base image, upload and install our package, run the Ec2Config script and shut down, leaving us with a shiny new, runnable AMI ID.

To the cloud!

We are now ready for the final installment of the series – deploying to the cloud!

  • Kevin Littlejohn

    Having the Ec2Config run on the base image for me meant that when I built the application image, it was rebooting part-way through the build process (while the various provisioners were running). Is that expected?

    • http://www.onegeek.com.au Matt Fellows

      Oooh that’s definitely not expected. I’ll have to have a play with this and see if I can repro.

      Did you mean ‘vagrant’ user? It should be deleted in the application.json file (https://github.com/mefellows/windows-machine-factory-tutorial/blob/master/machine-factory/buildagent.json#L57-L62).

      • Kevin Littlejohn

        So, I followed the recipes the way I follow all recipes – badly. It’s possible I introduced errors. But I was running against AWS with the packer operations – I build a base AMI (with Ec2Config.ps1 run), then build an application AMI (and that process takes half an hour because it’s installing all sorts of crud at the moment) – and that’s the build that spontaneously reboots on me. The fix was to remove Ec2Config.ps1 from the base build and put it in the application build only.

        wrt user, I could see the vagrant user being removed, but packer with the darjeeling password is still operational/RDP’able on the final images – and I can’t see anywhere that’s explicitly removed. Again, that may be because I didn’t include something I should have, but I couldn’t at a look through see where it was explicitly removed in your repo…

        • Josh

          Same here, it’s rebooting for me, I might try removing EC2Config.ps1 from the base AMI

  • swade1987

    I am having some issues with my Packer file setup.

    I have the following json structure – https://gist.github.com/swade1987/b19599707e4f7ebf564d

    I have the following “user data file” – https://gist.github.com/swade1987/7f05255f3bcbe3925f5b

    However, I am getting a winrm timeout when building, see here – https://gist.github.com/swade1987/273e917f3f1a81e7e10e

    Any ideas?

    • http://www.onegeek.com.au Matt Fellows

      Couple of suggestions to help diagnose the problem:
      1. When running Packer, enable logging by exporting the PACKER_LOG=1 and PACKER_LOG_PATH=./packer.log environment variables.
      This should give some more diagnostic info (like what IP address it's trying to hit)
      2. Check if the Packer IP is public or private. You may have restrictions on your VPC / subnets that lock access. Consider setting "ssh_private_ip": false to stop Packer connecting to the internal IP.
      3. Check to see if ports are open via telnet or something. If you can telnet to port 5985 but can't winrm, that tells us it's an auth issue
      4. Finally, it looks like you might be authing with the wrong password: in your Packer file you use 'Password1' (https://gist.github.com/swade1987/b19599707e4f7ebf564d#file-gistfile1-txt-L14) but in the bootstrap script the password is set to [email protected] (https://gist.github.com/swade1987/7f05255f3bcbe3925f5b#file-gistfile1-txt-L7)

      Let me know how you go!

      • swade1987

        I can telnet but not WinRM

        • http://www.onegeek.com.au Matt Fellows

          This is usually a sure sign it's an authentication/authorization problem. If WinRM is running, then it has been properly configured in your user-data script. Can you RDP into the machine with the creds?

          • swade1987

            Interestingly enough when I run the following command from my windows VM it works …

            https://gist.github.com/swade1987/2cec957ea7102792e43e

            It's like there is an issue specific to running it from iTerm on my MacBook

          • http://www.onegeek.com.au Matt Fellows

            That’s strange. Have you tried RDPing into it? Or even a Remote PS Session (https://technet.microsoft.com/en-au/library/hh849717.aspx)?

          • http://www.onegeek.com.au Matt Fellows

            As discussed, it's likely that your password is too simple (and is not what was posted above). A good thing to try is pasting the user creation command into a test Windows VM to see if it works i.e. `cmd.exe /c net user /add vagrant mySimplePassword`. If this fails, then likely you won't be able to auth.

          • swade1987

            A couple of things fixed it …

            1. Increasing the WinRM timeout to 20 minutes
            2. Changing the WinRM username to anything other than “Administrator”
            3. Changing the WinRM password to be “more complex”

          • http://www.onegeek.com.au Matt Fellows

            Glad to hear that's working. The 20m timeout is a strange one. Occasionally I see very slow boot times (last night I had one take 13 minutes or so). Normally it will be up and connecting in about 4-5 mins.