Hardware, connectivity and load estimation
When developing a web application, you can’t think only of software (programming language, operating system, etc.); you also have to think of hardware, connectivity and load management.
In one of my early posts, I said that I would leave the web app on a VPS. I quickly understood that I was wrong to think it could handle the load, which depends not only on the number of users or the app’s success but also on the amount of data tracked, the system’s core functionality.
As a matter of fact, adding more features certainly added more value, but it definitely added more overhead. So why don’t I just scratch those new features out, at least for now? After all, one of the books I find right on many points, Getting Real (by 37Signals), repeats that over and over again. While I can’t deny that much of the advice found in that book will prove to be very beneficial, sometimes you need to put aside a couple of general rules and follow your instinct.
And that’s what I decided to do.
Some details
The application’s business logic is pretty extensive and processes data all day long, no matter if users are online or not. It also communicates with different child servers all over the web.
The database will start with millions of entries across more than 30 tables, and will grow by millions of new records every day.
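To get a feel for what "millions of new records every day" means for storage, a back-of-the-envelope calculation helps. The sketch below assumes linear growth; the starting row count, daily growth and average row size are placeholder numbers chosen for illustration, not figures from this project, and indexes are ignored.

```python
def estimate_rows(initial_rows, rows_per_day, days):
    """Total row count after `days` of linear growth."""
    return initial_rows + rows_per_day * days


def estimate_storage_gb(rows, bytes_per_row):
    """Approximate raw storage in GB (1 GB = 10**9 bytes), ignoring indexes."""
    return rows * bytes_per_row / 1e9


if __name__ == "__main__":
    # Hypothetical inputs: replace them with your own measurements.
    initial_rows = 5000000
    rows_per_day = 2000000
    bytes_per_row = 200

    for days in (30, 90, 365):
        rows = estimate_rows(initial_rows, rows_per_day, days)
        size = estimate_storage_gb(rows, bytes_per_row)
        print("after %d days: %d rows, about %.1f GB" % (days, rows, size))
```

Even with rough inputs, this kind of estimate shows quickly whether storage or processing is likely to become the bottleneck first.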
Sketching the virtual network
Choice of hardware
The application server needs to cope with heavy-duty tasks, as fast as possible. The database server doesn’t only need storage space but also processing speed. And finally, the web server needs to handle multiple concurrent connections while serving HTML.
After comparing multiple not-so-solid benchmarks, reading Web Hosting Talk and exchanging a couple of emails with some ISPs, I finally settled on the hardware settings illustrated in green above.
Choosing your ISP
With all the hosting reviews and comparison sites available, you’d think that the task is easy. Let me tell you it’s not. While some will strongly encourage you to look for a provider in your city (and for good reasons), I opted to go for pricing, staff experience and a perfect reliability/support track record.
I started by researching a bunch of providers. Some of the names that stuck: rackspace.com, theplanet.com, 365main.com, cari.net, softlayer.com. I won’t go into too much detail about how I finally made my choice because I am no expert, but let’s say that rackspace.com is over-priced (and I am being nice) and theplanet.com seems to have lost its touch by growing too big; otherwise it would have done a great job.
SoftLayer is the one I decided to go with. Established in 2005, with a management team that evolved together for many years, it looked pretty solid to me. Their staff, pricing structure, website and reputation in the market just reinforced that feeling.
UPDATE: Looks like I eliminated 365Main just in time! Less than 10 days after I made my choice, 365Main suffered a power outage that left BIG customers (Craigslist, Technorati, Six Apart, AdBrite, Yelp, RedEnvelope and others) offline for several hours.
Sources:
outage at 365 Main’s San Francisco datacenter
365 Main datacenter power outage - Six Apart Technorati Craigslist
San Francisco Power Outage - A Case Study in Downtime
Estimating your needs
For this web application, the counters just don’t start at 0. The basic server configuration takes that into account but, as with any other application, the actual load created by concurrent users’ requests can’t really be estimated or benchmarked in advance.
Not so long ago, only two solutions would have been available: start with just enough and be ready to upgrade fast, or start strong enough to handle the first couple of months while you discover your averages.
Today, there is a third solution: Amazon’s EC2 and S3 services. You pay as you go for storage and processor usage. What better way to avoid seeing your application crash under heavy load, or your bank account drained by hefty initial costs that turned out to be unnecessary?
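The trade-off between a flat monthly fee and pay-as-you-go pricing comes down to simple arithmetic. Here is a minimal sketch of that comparison. The monthly fee and hourly rate are made-up placeholders, not actual prices from Amazon or any of the providers mentioned above, and the function names are purely illustrative.

```python
def usage_cost(hourly_rate, hours_used):
    """Cost of a pay-as-you-go server billed per hour of use."""
    return hourly_rate * hours_used


def break_even_hours(monthly_fee, hourly_rate):
    """Hours per month at which pay-as-you-go costs as much as the flat fee."""
    return monthly_fee / hourly_rate


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    monthly_fee = 300.0   # flat fee for a dedicated server
    hourly_rate = 0.5     # per-hour price of an on-demand instance

    threshold = break_even_hours(monthly_fee, hourly_rate)
    print("break-even at %.0f hours per month" % threshold)

    for hours in (100, 400, 720):
        cost = usage_cost(hourly_rate, hours)
        cheaper = "pay-as-you-go" if cost < monthly_fee else "flat fee"
        print("%d hours: %.2f vs %.2f -> %s" % (hours, cost, monthly_fee, cheaper))
```

Below the break-even point, paying by the hour is cheaper; for a server that runs around the clock, the flat fee may win, which is why knowing your averages still matters.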
Conclusion
I am no expert in this field to start with, but I feel confident that I made the right choices. Time will tell, I guess.
Agree, disagree?
Delays - very often, uncontrollable
In my last post, I said I would start posting more frequently after being absent for a while and here I am, 7 days later, with no posts to show.
I haven’t been procrastinating, nor have I been making more changes to the application’s plan, no - I was recovering from an unexpected surgery!
Aside from affecting the project’s timeline, this hospital stay gave me time to take a couple steps back from the development/planning side of things. Time I spent refreshing my memory with books like Building Scalable Web Sites (by Cal Henderson) and Prioritizing Web Usability (by Jakob Nielsen and Hoa Loranger).
A couple of the things that got either set up, approved or coded during my absence:
- New company (legal documents, bank account, etc.)
- A new Google Adwords API account (GAA) - never heard back from them concerning the old account’s inquiry.
- A new Yahoo! Search Marketing API (YSMA) sandbox account.
- 75% of the data cleansing classes we need.
Lesson of the day:
Early in the development process, having a detailed plan (even if you know that delays will occur), combined with taking action on simple third-party requests (placing orders, registering accounts, etc.), helps keep things rolling when you encounter unexpected events or delays.
Trac + Subversion on CentOS
The new app will be running on a VDS from Myriad Network for the moment. To be more precise, I find that their $56.95 per month package gives enough room to test the waters without any big initial investment. Myriad runs CentOS and to be honest, we ran into a couple of problems installing what was supposed to be very easy.
Here is a quick write-up of how we finally got it to run.
IMPORTANT: The following steps have been tested and worked for us. I in no way guarantee they will work for every CentOS configuration, and I strongly suggest you do things carefully and read more documentation on the Trac and Subversion sites.
Installing Trac
- Make sure you have at least Python 2.4 (otherwise, get ready to face some serious issues)
- Download the latest Trac version - here
- Download ClearSilver - here
- Download Swig - here
- Download SQLite - here
- Download PySQLite - here
- Install Trac
  - Extract files
  - Change to extracted files’ directory
  - Run the following command: python setup.py install
- Install ClearSilver, Swig, SQLite, PySQLite. For each:
  - Extract files
  - Change to extracted files’ directory
  - Run: ./configure
  - Run: make && make install
- Copy svnlib to the Python libraries directory: cp -r /path/to/svnlib /path/to/python2.x/site-packages/
- Test svn bindings from Python CLI
  - Run: python
  - Run: import svn.core
  - If everything is ok, you shouldn’t get any errors
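The manual check of the bindings can also be scripted, which is handy when you redo the install a few times. This is only a sketch: the helper names are made up for illustration, and it relies on importlib, which needs Python 2.7 or later, so it is meant for a more recent interpreter than the 2.4 minimum mentioned above.

```python
import importlib
import sys


def check_python(minimum=(2, 4)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def check_module(name):
    """Try to import `name`; return (ok, detail) instead of raising."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, "imported " + name


if __name__ == "__main__":
    print("Python >= 2.4: %s" % check_python())
    # svn.core is the module tested by hand in the steps above.
    ok, detail = check_module("svn.core")
    print("svn.core: %s - %s" % ("OK" if ok else "FAILED", detail))
```

If the import fails, the detail message usually points at the missing piece, for example a module that is not on the Python path.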
Installing Subversion
- Download the latest version of Subversion - here
- Download the latest deps file - here
- Extract subversion
- Extract deps
- Copy/move extracted deps folder content into Subversion root directory
- Run: ./configure --without-neon
- Run: make && make swig-py && make install
Creating a repository
- Run: svnadmin create /path/to/repo
- Change to /path/to/repo/conf and edit svnserve.conf, disable authz-db
- To start SVN as a standalone server, run: svnserve -d -r /path/to/repo
  - You can optionally add --listen-port PORT to the command arguments
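Once svnserve is started, a quick way to confirm it is accepting connections is to try opening a TCP connection to its port. The sketch below assumes svnserve's default port, 3690, when no --listen-port is given; the host and the function name are placeholders. It only tests that something is listening, not that it is actually Subversion.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 3690 is svnserve's default port; adjust if you used --listen-port.
    host, port = "127.0.0.1", 3690
    state = "up" if is_listening(host, port) else "down"
    print("svnserve on %s:%d looks %s" % (host, port, state))
```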
Setup Trac for repository
- Run: trac-admin /path/to/trac-env-root initenv
  - Select a project name
  - Enter path to your subversion repository
- Run: tracd -d -p [PORT] /path/to/trac-env-root /path/to/another-trac-env-root
Subversion cheatsheets:
Notes: In daemon mode, Python’s stderr output is not shown anywhere, which makes problems hard to debug. When something goes wrong, run tracd in interactive mode rather than daemon mode by simply taking out the ‘-d’ from the tracd -d -p [PORT] command.
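If you switch between daemon and interactive mode often, a tiny helper can build the command line for you. This is just a sketch using the -d and -p options shown above; the function name, the port and the environment path are placeholders.

```python
def tracd_command(port, env_paths, daemonize=False):
    """Build the tracd argument list; leave out -d to stay in the foreground."""
    cmd = ["tracd"]
    if daemonize:
        cmd.append("-d")
    cmd += ["-p", str(port)]
    cmd += list(env_paths)
    return cmd


if __name__ == "__main__":
    # Port 8000 and the environment path are placeholders.
    cmd = tracd_command(8000, ["/path/to/trac-env-root"])
    print(" ".join(cmd))
    # To launch it in the foreground (blocks until interrupted):
    # import subprocess; subprocess.call(cmd)
```

Running the foreground variant from a terminal lets you see error messages as they happen.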

