“…this is it; we built this great thing and we totally messed it up” – Kevin Systrom, Instagram Founder
Building a robust, distributed stack from day one is overkill, but you should already be thinking about how you’ll scale, how much load your current system can handle, and how long it will take to scale if your app/service blows up overnight. Instagram’s founders Kevin Systrom and Mike Krieger learned this firsthand when they launched their photo app in late 2010.
In an interview published to NPR’s How I Built This back in September, the founders of Instagram reflected on the early days of the app and how it was a miracle they survived the launch.
“To be clear, there is no reason we should have succeeded. The server was down every other hour.”
The entire backend of Instagram was running on a single self-hosted server. Luckily, at the time, mobile networks weren’t known for being reliable, so users often blamed the “network failures” on their mobile provider rather than the actual cause – connections timing out due to an overloaded server unable to accept any new requests. The initial feeling of success seeing all of the new users coming online quickly fleeted when they realized they didn’t build a system with scale in mind.
“It was nothing more exhilarating seeing all of those people stream in, and nothing more crushing than seeing people posting on Twitter or on their blogs, ‘Ugh, another startup that doesn’t know how to scale’.”
So what’s the Solution?
This may just be the Java speaking in me, but the more abstract you can build your stack, the easier it will be to swap components out. By abstracting service interaction through interfaces, you can build services that can interact agnostically between say a local filesystem and a distributed, “cloud” database.
When the time comes to scale, you can easily swap out the filesystem for something more robust like a remote database as long as it implements the interface. This can be hard to visualize without a concrete example sometimes, so let’s take a look at Cloud Campaign‘s stack.
At first glance, it looks like the system is comprised of many different servers and storage providers, but in reality there is only a single computing server running in production. By separating the functionality into micro-services, we can simply pull the services apart and put them on their own machines when the time comes. But until the system is hitting upper constraints, all of the micro-services can run together on a single machine.
This goes back to the idea of using interfaces to simplify service interactions. Right now it may just be a method call, tomorrow it might be RPC, next week it may be API. Regardless of the protocol, the data being transferred between the services is the same and the client call is the same since it’s just interacting with an interface.
Looking back at Cloud Campaign’s setup, Portal, Notification, and Trigger service are all running on the same server, bundled in a single Jar. The message bus is an ActiveMq broker running locally, and Zookeeper is also running as a local service.
Building the system this way takes a bit longer initially, but will save a lot of time when you need to grow. And saves a ton of money instead of just scaling from day 1.
Considerations as your Stack Changes
Each time you make a change to your stack, you should revaluate to adjust and optimize to the changes. Two major areas to focus on are:
- Means of communication (method call, RPC, API, persistent TCP, etc…)
- Caching / Network Costs
I think the means of communication will be fairly obvious, and in most cases will be dictated by how you configure your system. But, caching is often overlooked and is a crucial part of getting any system to scale. If you had two services that were once on the same machine, but now are running separately, adding a layer of cache can drastically decrease latency and increase scalability.
If you expand by adding an additional layer to the stack, add in-memory, local cache such as Guava or something similar, while if you’re scaling horizontally you should consider adding a distributed cache such as Hazelcast or Memcached.
If you’re interested in learning more about Cloud Campaign‘s tech stack or want to grow your startup by automating your social media campaign, fill out the subscription box with your email below.
Thanks for taking the time to read. Would love to hear your thoughts below, or list what you’re working on so other people can check it out.