Cloud deployment nirvana through Chef

I have a lot of appreciation for the amount of engineering that goes into building sophisticated tools for software deployment. A few years back, when I was working with Amazon, I learnt how large companies manage tasks like building and deploying complex, highly distributed software systems. The build system was called Bob and the deployment system was called Apollo. Though they seemed a little arcane in the beginning, once I got the hang of them I realized how critical it was for Amazon to have such systems in place. These two systems integrated with the developer's environment and exposed a seamless workflow from the moment code was changed until it was deployed to production. The developer checked in code with the right comments, scheduled a build and happily went on with more serious business (like playing TT!). Bob took the responsibility of identifying all packages that depended on this code change and built them all on specific hardware. If nothing broke, a shiny new version number would be given to the changed package. After this, the (poor) guy who was in charge of production deployment had to ensure that he created a version-set with this version number for the changed package. That's it!

Well, obviously I have cut a lot of details to save space (and not get sued by Amazon!) but the behaviour was roughly that. It seems like a straightforward job, but let me tell you that when you get to Amazon's scale of software development, keeping your systems working all the time is a mind-boggling exercise. Note that by scale, I mean the sheer amount of code that Amazon has to maintain. The scale of the problems that this code solves is a different story altogether.

I think you can say that a company has reached "escape velocity" when it has a good build+deploy system in place that can scale. Once it reaches this escape velocity, it is in a much better position to develop, maintain and grow complex code. I think Chef is helping Cloud computing reach that "escape velocity": the escape velocity that is needed to get onto the Cloud. First, some definitions. As far as I am concerned, Cloud computing comes with two guarantees: 1) Provisioning and releasing hardware is extremely quick, so scaling up to hundreds of machines or down to 1 (or 0) machines should be extremely quick. 2) The hardware cost to a company is exactly equal to what the company uses. This basically means that a company that is on the Cloud does not pay for 'unused' machines.

Let's take an example. Suppose I own a media house with a popular website. On a normal day, I would get about 1 lakh (100,000) visits to my site. That's about 1 hit per second. However, on a day when one of my editors has broken a sensational story, the number of visits may jump to, say, 50 lakh (5 million), or even more if the news is exclusive! So, how should I provision hardware for my website? Let's say we begin with an architecture that consists of a front-end load balancer, 2 app servers, and a database server, each residing on a dedicated host. This is our "normal" day setup. Now, assuming that the database is never going to be a bottleneck (a huge assumption), we scale the system horizontally by adding more app servers between the load balancer and the database server.

Architecture Diagram
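The traffic figures above can be turned into a rough sizing rule. Below is a minimal shell (awk) sketch under two stated assumptions: visits arrive evenly over a time window (24 hours unless you pass a shorter one), and one app server handles PER_SERVER_RPS requests per second. The default of 20 is a made-up placeholder, not a benchmark, and avg_rps and app_servers_needed are hypothetical names.

```shell
#!/bin/sh
# Back-of-the-envelope sizing for the media-site example.
# ASSUMPTION: PER_SERVER_RPS is a placeholder, not a measured figure;
# benchmark your own app servers before trusting the output.
PER_SERVER_RPS=${PER_SERVER_RPS:-20}
MIN_APP_SERVERS=2   # the "normal day" setup: two app servers

# Average requests per second if the day's visits were spread evenly
# over 24 hours (1 lakh = 100,000 visits).
avg_rps() {
  awk -v visits="$1" 'BEGIN { printf "%.2f\n", visits / 86400 }'
}

# App servers needed for a number of visits arriving over a window of
# hours (default 24), rounded up and never below the normal-day minimum.
app_servers_needed() {
  awk -v visits="$1" -v hours="${2:-24}" \
      -v cap="$PER_SERVER_RPS" -v min="$MIN_APP_SERVERS" 'BEGIN {
    rps = visits / (hours * 3600)
    n = int(rps / cap)
    if (n * cap < rps) n++   # round up to a whole server
    if (n < min) n = min
    print n
  }'
}

avg_rps 100000                 # prints 1.16 (about 1 visit per second)
avg_rps 5000000                # prints 57.87 (big-news day, 24-hour average)
app_servers_needed 5000000 4   # prints 18 (same visits within 4 hours)
```

The last call shows why the daily average is misleading: if the same 50 lakh visits arrive within 4 hours of the story breaking, the estimate jumps from 3 app servers to 18, which is exactly the kind of swing that makes quick provisioning matter.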

 

The only problem is that I need to provision extra hardware. This is when the men are separated from the boys! The Amazons and the Googles of the world do this with the snap of a finger, but what about me? I place a request with my provider and he says that it will be ready within 24 hours, as if I am supposed to feel good about that! And even if I get the hardware, by the time I wake up my ops guy and have him reconfigure my system, the traffic would have died down! Today's Cloud computing providers like Amazon and Rackspace solve the first problem (getting hardware quickly) pretty well. But the second problem (configuring it) is still a pain in the neck unless I want my ops team to label me a "slave driver"! This is where tools like Chef come in. If you had configured your system using Chef, this is what you would do:

$> knife node create new.machine1

$> knife node create new.machine2

$> knife node run_list add new.machine1 "role[appsrv]"

$> knife node run_list add new.machine2 "role[appsrv]"

$> knife bootstrap new.machine1

$> knife bootstrap new.machine2

$> ssh root@new.machine1 "chef-client"

$> ssh root@new.machine2 "chef-client"
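If you scale out often, the eight commands above collapse into a loop. This is a minimal shell sketch that runs exactly the same knife and ssh commands for each host name you pass in, one machine at a time. add_app_servers and the DRY_RUN switch (which prints the commands instead of running them) are hypothetical conveniences, not part of Chef.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (handy for previewing).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Turn each named machine into an app server using the same knife and
# ssh steps as the manual walkthrough; stop at the first failure.
add_app_servers() {
  for host in "$@"; do
    run knife node create "$host" || return 1
    run knife node run_list add "$host" "role[appsrv]" || return 1
    run knife bootstrap "$host" || return 1
    run ssh "root@$host" chef-client || return 1
  done
}
```

For example, `DRY_RUN=1 add_app_servers new.machine1 new.machine2` prints the eight commands listed above (grouped per machine) without touching anything, and dropping DRY_RUN runs them.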

That's it! Now, all that needs to be done is to tell the load balancer that two more app servers are available to share the load. If a dedicated machine running a proxy does the load balancing for you, all you need to do is run "chef-client" on the load balancer after changing the required configuration file. An alternative approach would be to create machine images, like Amazon's AMIs for EC2, from a working, fully configured system. But I find that tedious for the following reasons:

  1. It binds me to EC2 (or whoever has the image)
  2. I need to take a snapshot every time my configuration files change
  3. An AMI is of little use for fixing problems with an existing system (e.g., if I want to "re-install" only the MySQL server)
  4. I need to take a snapshot every time I upgrade some part of my software (e.g., if I upgrade from Rails 3.0 to Rails 3.2)
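For the load-balancer step described above, "changing the required configuration file" can be as simple as regenerating a backend section from the list of app servers. The sketch below assumes the proxy is HAProxy (the post does not name one) and that every app server listens on port 80; haproxy_backend, the backend name app_servers and APP_PORT are hypothetical.

```shell
#!/bin/sh
# ASSUMPTIONS: the proxy is HAProxy and every app server listens on APP_PORT.
APP_PORT=${APP_PORT:-80}

# Print an HAProxy backend section with one "server" line per host,
# balanced round-robin and health-checked.
haproxy_backend() {
  echo "backend app_servers"
  echo "    balance roundrobin"
  i=1
  for host in "$@"; do
    echo "    server app$i $host:$APP_PORT check"
    i=$((i + 1))
  done
}
```

For instance, `haproxy_backend appsrv1.example.com appsrv2.example.com new.machine1 new.machine2` prints a backend with four server lines; the output still has to be merged into haproxy.cfg and the proxy reloaded. With Chef, that file would more typically be rendered from a cookbook template, so that running chef-client on the load balancer, as the post suggests, applies the change.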

Chef takes a system admin problem and converts it into a development problem. This is because Chef does its work using what are called "Cookbooks". These Cookbooks contain "Recipes" that describe how to go about building the system. The best part is that recipes can depend on other recipes, much like one library can depend on another in a software application. Recipes are written in a Ruby DSL, which means that you can insert programming logic when cooking up recipes!! Better still, Opscode (the company behind Chef) has created a platform where people can share and use others' recipes. I think that Cookbooks and Recipes are the right abstractions for sharing system setup information. They are much more fine-grained than AMIs and thus more reusable. "Configuration as Code" and shareable Cookbooks create an ecosystem that, I think, lends itself to some very sophisticated configuration options. The relative ease with which this sophistication is achieved will, I think, accelerate the adoption of the Cloud.

O Application What Art Thou?

Apple and Google seem to be the two companies people look up to when trying to find out the future direction of the Web. With the Internet as the foundation, these companies dictate what gets built on top. When the iPhone came out, many developers started writing Objective-C code. Android-based phones had a similar effect on the more "open" developers. With the iPad, Apple renewed developers' interest in iOS programming. Mobile devices became more serious. The games became smarter, the apps became "full-featured" and web pages became more readable in the browser. The screens of iPads* were large enough for users to consume serious content.

Looking at the way freelancers were making money by developing iPhone and Android apps, I was convinced that learning Objective-C or the Android SDK was a worthwhile investment. But then Google came along and released Chrome OS. Google's thinking with Chrome OS confuses me. How can they firmly put their feet on two boats and expect to travel for long? If every app is supposed to be rendered through the browser, why on earth did they get us interested in developing apps for the Android OS?? Or is it that they don't expect people buying or renting Chromebooks to play games or edit contacts? Apple's strategy, on the other hand, is a little less confusing. As far as developers are concerned, their tablet strategy seems like a logical extension of their mobile strategy: "Learn Objective-C first. Then learn a few libraries to be able to write apps for the iPhone. Now, learn a few more libraries to be able to write apps for the iPad." Logical.

For content publishers, however, life was simpler before the iPad*. Any content had to be delivered in two forms: one for the PC* (with its larger screen) and another for the smartphone (with its smaller screen). Whether or not you wanted to support the not-so-smart phones was a matter of taste (and also of the demographics of your customer base). With the growing popularity of iPads and similar tablets, content publishers are in a situation where they have to hide their cries with laughter! It is a great thing for their customers, you see. If the publishers can deliver content optimized for the iPad*, which their new-age customers are now equipped with, they stand a chance of genuinely delighting their customers. But what happens to the tens of thousands of dollars that went into developing their iPhone app? Should they continue to support it? How do they make sure that all content is delivered in all three formats in a consistent way? Add to that the multiple platforms within each format (e.g., Android and iPhone in the smartphone segment), and what you have is a Royal Mess of multiple codebases! Remember that software development is not a core competency of many of these content publishers (like media houses). They are most likely to outsource their software development work.

Viewed in this light, it seems like Google has envisioned the right approach to this problem with Chrome OS (and the accompanying Chromebooks). Everything is now going to be delivered through the Browser. So, technically, publishing for the Browser is the only skill set needed to deliver content on any platform. Of course, you still have to optimize the content for the specific device, but the optimizing methodology will be the same everywhere. Roughly put, the Browser is the new OS! So, back to JavaScript and HTML5, you developers!! If you find it hard to imagine glossy apps delivered through the Browser, take a look at OnSwipe's YouTube video.

So, what is going to be the dominant form of application development in the future? Device-specific development, or intelligent rendering of content through the browser? Obviously, both approaches will exist in parallel for quite some time, but if I had to bet, I would bet that a lot of applications will be delivered through the Browser in the long run, simply because it is a lot less messy that way.

PC* - A desktop computer running Windows, Mac OS X, or Linux.
iPad*, iPads* - iPad and similar devices (like Samsung Galaxy Tabs)

Backup, backup, backup!

A few days back, a couple of computers in my uncle's office were infected by viruses (unfortunately, I have not been able to convince my uncle to buy Macs). After about two days of trying to remove them, the maintenance guy very calmly said that the machines had to be formatted and the OS (Windows) re-installed. Remember, these were machines running a licensed copy of a very popular anti-virus program. At this, my uncle, trying to hold back the choicest profanities, said that that was impossible, as he had more than a year's worth of accounts in files that he absolutely could not afford to lose. The man then assured him that *most* of the files could be saved!

I don't think this is a very uncommon situation. Although my research leaves a lot to be desired, I would be willing to bet that most small businesses in India make do with little or no backup of their files. The best that I have heard of is everything being backed up to an external hard disk. This solution is OK when what you are backing up is a bunch of photos that you might not mind losing... well, at least you won't hurt other people's lives by losing your photos. But when your entire business depends on it, I would not be able to sleep with this arrangement. I know my paranoia stems from knowing a bit about what it takes for these disks to crash (not much!), but even without that knowledge, it is amazing to see the amount of trust people are willing to place in commodity hardware!! The truth is that it is bound to fail one day! Why is it that we (Indians) are so reluctant to use technology to save ourselves from such situations? Storage space is dirt cheap. Now, with the "cloud" being omnipresent, you just have to pay for what you use. Even when converted to rupees, it's cheap. I think it would come to about Rs. 5 per GB per month! So, effectively, for less than the price of a cup of coffee, your data will be backed up in 3 different locations around the world! Unfortunately, until your data is replicated like that, it is not really safe from failures!

So, all you small business owners and proprietors out there, here are a couple of solutions you must consider:
1. JungleDisk
2. Dropbox

To complete my uncle's story... fortunately, all of his important files were restored. He has since spent about 3.5k rupees on an external hard disk to back up his files, and he has vowed to back up regularly. I am still trying to convince him to consider backing up to the cloud!
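The post's own numbers make the "cup of coffee" comparison easy to check. This sketch uses the estimated Rs. 5 per GB per month and the roughly Rs. 3,500 external disk from the story; both are rough figures, not price quotes, and monthly_cloud_cost and months_for_disk_price are hypothetical names. Whole rupees are used for simplicity.

```shell
#!/bin/sh
# Rough figures from the post (rupees); not actual price quotes.
RATE_PER_GB_MONTH=5   # estimated cloud storage cost per GB per month
DISK_PRICE=3500       # what the external hard disk cost

# Monthly cloud backup cost for a given number of GB.
monthly_cloud_cost() {
  echo $(( $1 * RATE_PER_GB_MONTH ))
}

# Whole months of cloud storage that add up to the price of the disk.
months_for_disk_price() {
  echo $(( DISK_PRICE / ($1 * RATE_PER_GB_MONTH) ))
}

monthly_cloud_cost 10      # prints 50
months_for_disk_price 10   # prints 70
```

At 10 GB, that is Rs. 50 a month, and the price of the disk would pay for about 70 months (almost six years) of cloud storage. The comparison leaves out the point of the whole post, though: the disk is one piece of commodity hardware sitting in one place.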

Can Facebook Kill Local Listing Sites?

I think so. Facebook has two very powerful weapons to do this: LIKE and Checkin.

What do the local search/listing sites have? One, a thorough listing of the businesses in the cities that they support; two, a bunch of reviews for each listing that act as decision influencers for their users. So, theoretically, if these two were replicated somewhere else, an end user would have no special incentive to go to the original site. OK, there is a third factor: user experience. The original site might offer a user experience good enough to keep its users coming back. Now, how does "Facebook" sound as a competitor?! The listing sites' attempts to be more "social" by adding features like "follow", "compliment", etc., are not helping them either. At best, they look like Facebook/Twitter "wannabes" when it comes to social engagement.

For businesses that the listing sites have missed, there are Checkins, the offline equivalent of LIKE (for practical purposes).

I think the tipping point will come when a lot of local businesses have a Fan Page on Facebook. Once they realize that their competitors are building a direct communication channel with their customers through a fan page, I think it will start a Gold Rush to get a fan page up on Facebook to harvest some LIKEs, or at least to mark their location on Google Maps so that their customers can check in.

Here is where it gets scary (for the local search/listing sites) - once the Gold Rush is on, people would start using Facebook search to find a good local business.

Facebook search results (screenshot)

The listing sites are still OK as long as people are using Google for their searches: there is the SEO route that they can use to drive traffic to their sites. Unfortunately, they cannot do the same with Facebook search. For now, at least, Facebook search shows Bing web results only below its own Facebook results. What is worse (for the listing sites) is that Fan pages are public (you don't need to be logged in to Facebook to view a Fan page), and Google ranks Facebook Fan pages quite highly!

What to get "right" the first time?


Life is, to a large extent, forgiving. The mistakes we commit are mostly not "end-all". This should allow us to take a few liberties and motivate us to experiment with the things our hearts want in the short time we spend on this planet. In spite of this, a vast majority of us let life pass without trying anything different from what our forefathers did. Why is that?

By experimentation, I mean the kind of things that give us a chance to make a positive change to our lives and/or the lives of people around us. As children, we don't hesitate to experiment. We break stuff, touch things we haven't seen before, and don't even hesitate to put weird things in our mouths! We get hurt, get scolded, fall sick, but are still able to bounce back and lead a pretty normal life. In fact, evolution should have taught us to experiment more! Even dangerous experiments seem to have very little long-term impact. My mom, when she was young, once decided to jump onto a pile of rocks from a compound wall about 20 feet high... she has a scar... yeah... that's it. I have no scars, but I also don't have an interesting tale on "how I got this scar".

In spite of all the "cushions" we have in life, we don't stop warning, scolding, even beating up a child that tries things out of the norm. The reason (apart from physical safety, which I think is something we rarely have to worry about) is that we want the child to get all things "right" the first time. At least for me, if my kid were to jump off a 20ft wall, it would mean that he was not able to judge whether he was going to get hurt by jumping off that wall. So, he "got it wrong". But that's OK; I am sure my kid would have learnt enough not to do that again. That's common sense. But I would still reprimand him. My main reason, subconsciously, would be to drill into him not to extend the experiment to a 100ft wall, because he could not afford to get it wrong then.

As we grow older, we don't need someone else to reprimand us; we warn, scold, and beat up the child inside us that wants to try the things our heart wishes for! Anything that doesn't kill you makes you stronger. But it is extremely important to know what can kill you! As adults, I will assume that we all know what will kill us literally or cause a slow, painful death. Excluding such experiments and other obvious stupidities (like drugs), there are still a huge number of things that we pass up without consideration. That's because as adults we build up a whole slew of mythical/artificial dangers around things. Or, even worse, we just don't care. So we complicate the equation of what we "think" will kill us by putting a lot of potentially unnecessary stuff on the "right hand side" of the equation. It is no longer life as it is; it becomes life as we know it. Societal pressure and a hurt ego are the two most important among the things that we think will "kill" us. "No risk, no reward" is only half the knowledge. "No risk, no fatalities" is the other, implied, half. So, logically, the safest thing to do would be to not take risks at all! We will miss out on opportunities to alter our lives in a positive way, yes, but at least we don't run the risk of negatively impacting them in case we do not get it right! Right?

The only flaw in the above argument, I think, is that the things we have to get "right" the first time we attempt them are a very, very small percentage of the things we want to do. In fact, the more stuff you remove from the right hand side of what you think will "kill" you, the fewer things you must get right the first time. False ego (oooh, how will I face my family in case I fail??) and societal pressure (what will my friends say in case I fail?) are the biggies that we can consider for removal. Even if we remove one of them, the percentage of things we need to get right the first time comes down drastically. It is not that we shouldn't care about society or our ego; the point is that if our intentions are honest, life does allow us to bounce back, in spite of mistakes, from a vast majority of situations!