AppHarbor: Horizontal vs. Vertical Scaling

Continuing my experiments with AppHarbor, this time it’s all about scaling

As a hosting company AppHarbor do a splendid job of simplifying the process of server configuration, with a minimalist approach that appealed to me instantly when I first saw it. But I’m assuming I’m not the only one who’s a little perplexed by the server scaling options with which one can tinker. Having used AppHarbor for a number of years for prototypes and modest SaaS projects I openly admit that I’ve never had to delve too deeply into the complexities of worker process scaling – the most I’ve done up to now is add an extra worker or two, having made safe assumptions about the definition of “horizontal scaling” and its effects. “Vertical scaling”, however, remained an enigma to me. The question is: why and when would it be useful?

As 2015 is turning out to be a big year for my SaaS rollouts it was time I got myself schooled. A quick email to AppHarbor’s support guys garnered the following explanation:

You’d usually scale out to multiple workers (i.e. instances of your app running on multiple servers) to increase your throughput, parallel processing and/or to increase availability of your application. Vertical scaling is usually beneficial when you need more resources per worker – for instance, you might want to process a CPU-intensive workload faster, and 2-4x the CPU resources would help you do that. Vertical scaling is described in more detail in this blog post which also provides common use cases for choosing this over (or in combination with) horizontal scaling.

Having read this explanation and the recommended blog post, and using my own SaaS projects as an example, I have surmised the following:

  • TuitionKit has a mundane computational complexity (mostly regular CRUD operations) but has a large user volume. An ongoing increase in user sign-ups will require more worker processes (horizontal scaling) to cope with the increase in page requests. As the system doesn’t do anything overly complex over and above serving pages and querying databases the horizontal scaling should cover most growth scenarios.
  • RopeWeaver involves serious number crunching for each company customer at a set number of times per day. The periodic nature of the computationally-heavy calculations means that they are done on a background worker process, removing the burden from the main web server thread, which does make it easier to manage the scaling. Ultimately most of the work is to be migrated to the database tier and done in T-SQL stored procedures, which will remove the burden from the web server entirely. Whilst it is obvious that careful management of vertical scaling is required, as a decision support system intended for use in factories and warehouses (where most users tend to be logged in for extended periods of time, e.g. an 8 hour shift) I fear a horizontal scaling aspect will also come into play. RopeWeaver poses a tricky server resource conundrum which I’ll have to experiment with further before I can come to any proper conclusions.

A Custom “RequireHttpsAttribute” for Use With AppHarbor

Further to my previous article, here’s how I handle one of the common code quirks of AppHarbor web hosting

In my article on AppHarbor gotchas I promised that I would detail my own method for handling the critical issue that the [RequireHttpsAttribute()] simply won’t work on their network of load balanced servers. By way of introduction I’ll quote from that article:

You’ll have to make several concessions in your code to the load balancer, accepting that it needs to do its thing in a certain way. Because of the manner in which web requests get internally routed through AppHarbor’s system of servers you’ll need to stop using the controller action attribute [RequireHttpsAttribute()], as this won’t work correctly; once the request has passed beyond the edge server it is stripped of its HTTPS distinction, and thus internally the controller action can’t be identified if this attribute is attached to it.

The solution – well, my solution at least – is a custom attribute with which you can decorate your secured actions, which I’m calling [RequireHttpsRemixAttribute()]. You attach it in exactly the same way as the regular HTTPS-enforcement attribute, but inside it’s doing some subtly different checks to confirm the specifics of the HTTP request. Here’s the code:

using System;

/// <summary>
/// Custom implementation of the RequireHttpsAttribute, serving 2 purposes:
/// 1) Allows debugging by ignoring the SSL requirement on localhost
/// 2) AppHarbor load balancing SSL corrector, see URL: http://support.appharbor.com/kb/tips-and-tricks/ssl-and-certificates
/// </summary>
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Method, Inherited = true, AllowMultiple = false)]
public sealed class RequireHttpsRemixAttribute : System.Web.Mvc.RequireHttpsAttribute
{
	#region " Public methods "

	public override void OnAuthorization(System.Web.Mvc.AuthorizationContext filterContext)
	{
		if (filterContext == null)
		{
			throw new ArgumentNullException("filterContext");
		}
		else
		{
			if (filterContext.HttpContext.Request.IsSecureConnection)
			{
				//Connection is direct SSL                      
				return;
			}
			else if (string.Equals(filterContext.HttpContext.Request.Headers["X-Forwarded-Proto"], Uri.UriSchemeHttps, StringComparison.InvariantCultureIgnoreCase))
			{
				//Connection is SSL beyond the load balancer level
				return;
			}
			else
			{
				HandleNonHttpsRequest(filterContext);
			}
		}
	}

	#endregion

	#region " Protected methods "

	/// <summary>
	/// Could use the outer "If" statement to factor in SSL ignoring on specific testbed environments (ie. by specifying server names)
	/// </summary>
	protected override void HandleNonHttpsRequest(System.Web.Mvc.AuthorizationContext filterContext)
	{
		if (!filterContext.HttpContext.Request.Url.Host.Contains("localhost"))
		{
			if (!string.Equals(filterContext.HttpContext.Request.HttpMethod, "GET", StringComparison.InvariantCultureIgnoreCase))
			{
				throw new InvalidOperationException("The requested resource can only be accessed via SSL");
			}
			else
			{
				string url = string.Format("{0}://{1}{2}", Uri.UriSchemeHttps, filterContext.HttpContext.Request.Url.Host, filterContext.HttpContext.Request.RawUrl);
				filterContext.Result = new System.Web.Mvc.RedirectResult(url);
			}
		}
	}

	#endregion
}

You’ll notice, if you read through the AppHarbor help page to which I’ve linked in the class’s comments, that this code is based on the AppHarbor team’s recommended implementation of the custom attribute. I have, however, spiced up their example with a few changes of my own, mainly the splitting of HTTPS and non-HTTPS checking into separate functions. The HandleNonHttpsRequest function allows you to run the same code on your localhost dev environment (which probably won’t have SSL, and thus would fail the previous checks) and also specify additional testbed server names (should you need that layer of staging before pushing the code to AppHarbor).
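If you want to unit test the decision itself without spinning up an MVC request, the two HTTPS checks can be pulled out into a pure function. What follows is only a minimal sketch: the ForwardedProto class and its IsEffectivelyHttps method are hypothetical names of my own, not part of ASP.NET MVC or AppHarbor, and it assumes (as the attribute above does) that the load balancer sets the X-Forwarded-Proto header to "https" for requests that arrived over SSL.

```csharp
using System;

// Hypothetical helper (not part of ASP.NET MVC or AppHarbor): isolates the
// "is this request effectively HTTPS?" decision so it can be tested without
// an HttpContext.
public static class ForwardedProto
{
    // isSecureConnection: the value of Request.IsSecureConnection.
    // forwardedProto: the raw X-Forwarded-Proto header value; null when the
    // header is absent.
    public static bool IsEffectivelyHttps(bool isSecureConnection, string forwardedProto)
    {
        if (isSecureConnection)
        {
            // Connection is direct SSL
            return true;
        }

        // Connection was SSL at the load balancer, which then forwarded it
        // internally over plain HTTP. string.Equals copes with a null header.
        return string.Equals(forwardedProto, Uri.UriSchemeHttps, StringComparison.OrdinalIgnoreCase);
    }
}
```

The attribute itself is attached like any other filter, for example by decorating a controller or action with [RequireHttpsRemix].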

And there you have it; short and sweet.

AppHarbor Gotchas

AppHarbor’s free-yet-brilliant tier of SaaS hosting continues to impress, but there are a few slippery bits

I’m a heavy AppHarbor user. I’ve never tried Azure – when I first launched into building scalable SaaS applications with ASP.NET MVC I adopted AppHarbor as my hosting platform of choice, and that was it; since then I’ve never even looked at another provider.

This doesn’t mean that I don’t have desk-head-butting moments when using it. AppHarbor is a curious beast at the best of times, but once you’ve gotten used to its quirks (mostly unavoidable side-effects caused by their load balancer) it’s fairly easy to live with. After you’ve crafted clever helper functions to manage the oddities you can replicate these in every future application you build, so the hard work is generally front-loaded. From my own experience here are the things on which to keep a watchful eye.

Caching

Caching at AppHarbor’s servers will screw you over when you least expect it, even if you’re adeptly updating and appending version tags like “?v1.2.3.4” to your CSS and JS file names when you include them in your HTML. The client-facing servers at AppHarbor are configured as a collective “load balancer”, and by default these servers will cache static content. As the balancing is spread over a series of servers the clients connecting to your site may get a mix of content from multiple sources, so numerous side-by-side requests might receive cached content from one server and newer stuff from a different server. This will mess up your CSS, JS and HTML page versioning, especially when using a single page application framework (in my case AngularJS).

To correct this you’ll want to make sure your Web.config tells IIS that only the client browser can cache stuff, so bung something like the following in there (cacheControlCustom="private" is the key item):

<system.webServer>
	<staticContent>
		<clientCache cacheControlCustom="private" cacheControlMode="UseMaxAge" cacheControlMaxAge="3.00:00:00" />
	</staticContent>
</system.webServer>

Before I made this Web.config tweak I’d often seen my applications load new AngularJS HTML view files just fine, but the accompanying new JS files didn’t. This had the effect of crashing the page when users interacted with it, as the AngularJS controller they’d been sent from the cache no longer functioned with the page it was managing. Ever since AppHarbor support informed me of the above tweak I’ve broken my habit of smashing F5 every time I load up a freshly updated version of an application.

HTTPS

You’ll have to make several concessions in your code to the load balancer, accepting that it needs to do its thing in a certain way. Because of the manner in which web requests get internally routed through AppHarbor’s system of servers you’ll need to stop using the controller action attribute [RequireHttpsAttribute()], as this won’t work correctly; once the request has passed beyond the edge server it is stripped of its HTTPS distinction, and thus internally the controller action can’t be identified if this attribute is attached to it.

There are several completely effective workarounds, and I’ll be showcasing my personal method of choice in a future article.

HTTP Status Codes

Another load balancer gripe: web server routing will pollute and mutate your HTTP status codes (again because the request isn’t hitting just a single server), so unless you manually manipulate these you’ll get vague exceptions being thrown, and ultimately reflected back to the client, which can make debugging a challenge.

Deployment

Pushing a new version of your website (in my case by linking an AppHarbor application to a Bitbucket repository) is slick and simple. However, the deployment process itself – once your code is on their servers and out of your hands – is inconsistent. Sometimes it’s lightning fast, other times it’s dog slow, and on one or two worrying occasions I’ve had it take several hours for the server side code to deploy while the client side files were deployed instantly. Of course this disparity caused baffling errors until I’d given the whole thing enough time to deploy completely, although for all I know this might have been caused by yet another bit of curious voodoo from the load balancer.

Shared SQL Server

AppHarbor’s shared SQL Server databases are awful. This awfulness is by-design; AppHarbor acknowledges their poor performance and recommends that you only use these for development purposes, and that a dedicated SQL Server is the way to go when launch time comes. Unfortunately I’ve found the shared SQL Server databases to be nigh-on useless, even during the tentative development stages of a project.

It doesn’t matter whether you’re using the free or paid shared hosting; on average every fourth query you fire will return a random I/O exception (if you’re canny and a little naughty, you’ll build a helper function that catches these and re-fires the database query until it pushes through). The exceptions are caused by capacity constraints on AppHarbor’s shared database servers, and whilst this is a valid technical reason it still leaves a bitter taste in the mouth. As a result you might find you have to plump for a dedicated SQL Server earlier than expected, making development much more expensive – especially for bootstrappers.
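For what it’s worth, the catch-and-re-fire helper mentioned above might look something like the following. Treat it as a rough sketch rather than my production code: the QueryRetry class, its Execute method and all of the parameter names are hypothetical, and I’ve deliberately left the "is this exception worth retrying?" test as a caller-supplied predicate, because the exact exception type you see will depend on your data access library. Unlike the "until it pushes through" approach, it gives up after a fixed number of attempts so that a genuinely broken query can’t loop forever.

```csharp
using System;
using System.Threading;

// Hypothetical helper: re-fires a query when it fails with an exception the
// caller considers transient, up to a maximum number of attempts.
public static class QueryRetry
{
    public static T Execute<T>(
        Func<T> query,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,
        int delayMilliseconds = 200)
    {
        if (query == null) throw new ArgumentNullException("query");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return query();
            }
            catch (Exception ex)
            {
                // Give up (rethrowing the original exception) when out of
                // attempts, or when the failure isn't one we agreed to retry.
                if (attempt >= maxAttempts || !isTransient(ex)) throw;

                // Back off a little longer after each failed attempt.
                Thread.Sleep(delayMilliseconds * attempt);
            }
        }
    }
}
```

A call would then wrap the data access code in a lambda, along the lines of QueryRetry.Execute(() => LoadCustomers(), ex => ex is System.IO.IOException), where LoadCustomers stands in for whatever query method you already have.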

A workaround is to create a shared SQL Server database on a cheap alternative hosting provider and connect that to the application housed within AppHarbor. When you’re ready for prime time you can sort out your dedicated SQL Server on AppHarbor, during that final month prior to launch when paying for stuff can’t be put off any longer. I appreciate that, when using the freebie, I can’t complain too much about the shared database hosting (I agree with the 20MB maximum database size; that’s a perfectly acceptable limitation to me). However the I/O performance is a big letdown, and there is a huge disparity between the quality of their free tier web hosting (utterly brilliant) and the robustness of their free tier database hosting (utterly appalling).

Conclusion

It feels like my petty grievances have strayed into “slagging off” territory, so to correct the balance I’ll close with plaudits: their customer service isn’t just good, it’s exemplary – a benchmark to others. I’ve sent a fair few support queries to them – a couple of them slightly cheeky out-of-bounds requests for changes to my hosting configuration that I really should have sorted out myself, or shouldn’t have been able to sort out at all. To avoid their support team getting deluged with similar pleas I won’t describe the specifics; suffice to say that AppHarbor have done me a few favours – saving me hours of time – without batting an eyelid. Plus they’re very active on StackOverflow and its brethren, which consolidates the image that there’s always a company representative to talk to, and that they do care about the discussion surrounding their platform. Hopefully this involvement with the community will be maintained as they continue to evolve the platform.