Hardware, connectivity and load estimation

Posted by Jad on July 18, 2007

When developing a web application, you can’t think only about software (programming language, operating system, etc.); you also have to think about hardware, connectivity and load management.

In one of my early posts, I said that I would leave the web app on a VPS. I quickly understood that I was totally wrong to think it could handle the load, which depends not only on the number of users or the app’s success but also on the amount of data tracked, the system’s core functionality.

As a matter of fact, adding more features certainly added more value, but it definitely added more overhead too. So why don’t I just scratch those new features, at least for now? After all, Getting Real (by 37signals), a book I believe to be right on many points, repeats that advice over and over again. While I can’t deny that much of the advice in that book will prove very beneficial, sometimes you need to put aside a couple of general rules and follow your instinct.

And that’s what I decided to do.

Some details

The application’s business logic is pretty extensive and processes data all day long, whether or not users are online. It also communicates with different child servers all over the web.

The database will start with millions of entries across more than 30 tables, and will grow by millions of new records every day.
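To get a feel for what that kind of growth means for storage, a back-of-the-envelope estimate helps. The sketch below is only illustrative: the starting row count, the daily growth and the average row size are assumed placeholder values, not measurements from this application, and the function name is made up for the example.

```python
def estimate_storage_gb(initial_rows, rows_per_day, avg_row_bytes, days):
    """Rough raw-data size in GB after `days`, ignoring indexes and overhead."""
    total_rows = initial_rows + rows_per_day * days
    return total_rows * avg_row_bytes / 1024**3


# All figures below are hypothetical placeholders, not real measurements.
INITIAL_ROWS = 5_000_000    # "millions of entries" at launch (assumed)
ROWS_PER_DAY = 2_000_000    # "millions of new records every day" (assumed)
AVG_ROW_BYTES = 200         # assumed average row size

for days in (30, 180, 365):
    size = estimate_storage_gb(INITIAL_ROWS, ROWS_PER_DAY, AVG_ROW_BYTES, days)
    print(f"after {days:>3} days: ~{size:,.1f} GB of raw rows")
```

Even with rough inputs, this kind of estimate shows quickly whether the database server's disks will last months or years, and indexes will only add to the total.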

Sketching the virtual network

[Figure: development phase, servers setup (development_phase_-_servers_setup.gif)]

Choice of hardware

The application server needs to cope with heavy-duty tasks, as fast as possible. The database server doesn’t only need storage space but also processing speed. And finally, the web server needs to handle multiple concurrent connections while serving HTML.

After comparing several not-so-solid benchmarks, reading Web Hosting Talk and exchanging a few emails with some ISPs, I finally settled on the hardware configuration illustrated in green above.

Choosing your ISP

With all the hosting reviews and comparison sites available, you’d think that the task is easy. Let me tell you, it’s not. While some will strongly encourage you to look for a provider in your city (and for good reasons), I opted to choose on pricing, staff experience and a perfect reliability/support track record.

I researched a bunch of providers; some of the names that stuck: rackspace.com, theplanet.com, 365main.com, cari.net, softlayer.com. I won’t go into too much detail about how I finally made my choice because I am no expert, but let’s say that rackspace.com is over-priced (and I am being nice) and theplanet.com seems to have lost its touch by growing too big; otherwise it would have done a great job.

SoftLayer is the one I decided to go with. Established in 2005, with a management team that has worked together for many years, it looked pretty solid to me. Their staff, pricing structure, website and reputation in the market just reinforced that feeling.

UPDATE: Looks like I eliminated 365Main just in time! Less than 10 days after I made my choice, 365Main suffered a power outage that left BIG customers (Craigslist, Technorati, Six Apart, AdBrite, Yelp, RedEnvelope and others) offline for several hours. Sources: outage at 365 Main’s San Francisco datacenter 365 Main datacenter power outage - Six Apart Technorati Craigslist San Franciscon Power Outage - A Case Study in Downtime

Estimating your needs

For this web application, the counters don’t start at 0. The basic server configuration takes that into account but, as with any other application, the actual load generated by concurrent users’ requests can’t really be estimated or benchmarked in advance.

Not so long ago, only two solutions would have been available: start with just enough and be ready to upgrade fast, or start strong enough to handle the first couple of months while you discover your averages.

Today, there is a third solution: Amazon’s EC2 and S3 services. You pay as you go for storage and processor usage. What better way to avoid seeing your application crash under heavy load, or your bank account drained by hefty initial costs that turned out to be unnecessary?
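The trade-off between a fixed monthly server and pay-as-you-go pricing can be roughed out with a few lines of arithmetic. This is only a sketch under stated assumptions: the monthly fee, the hourly rate and the usage figures are hypothetical placeholders, not quotes from SoftLayer or Amazon, and the function names are invented for the example.

```python
def fixed_cost(monthly_fee, months):
    """Total cost of a server rented at a flat monthly fee."""
    return monthly_fee * months


def pay_as_you_go_cost(hourly_rate, instance_hours_per_month):
    """Total cost when billed per instance-hour actually used."""
    return sum(hourly_rate * hours for hours in instance_hours_per_month)


def break_even_hours(monthly_fee, hourly_rate):
    """Instance-hours per month above which the flat fee becomes cheaper."""
    return monthly_fee / hourly_rate


# Hypothetical figures, for illustration only.
MONTHLY_FEE = 300.0                          # assumed flat fee per month
HOURLY_RATE = 0.10                           # assumed price per instance-hour
USAGE = [200, 400, 900, 1500, 2500, 4000]    # assumed instance-hours, months 1-6

print("fixed:        ", fixed_cost(MONTHLY_FEE, len(USAGE)))
print("pay-as-you-go:", pay_as_you_go_cost(HOURLY_RATE, USAGE))
print("break-even hours per month:", break_even_hours(MONTHLY_FEE, HOURLY_RATE))
```

The point is less the exact numbers than the shape of the curve: pay-as-you-go is cheap while usage is low and unknown, and the break-even figure tells you when it is time to reconsider dedicated hardware.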

Conclusion

I am no expert in this field to start with, but I feel confident that I made the right choices. Time will tell, I guess.

Agree, disagree?
