December 30, 2003
Scale and Performance
Although there's a good argument that software performance matters per se, and other arguments that (in the long view) code performance really isn't worth worrying about... recently I've been working on some speedups, because they will help my applications scale. Performance is one of several gating factors (processor, memory, disk, network...) in the server-based application I've spent a lot of time with recently. Server-based? Yes: why, a Groove Server, natch. The Groove EIS is all the Groove peer-synchronization infrastructure pieces (comms, storage, crypto, awareness, dynamics, etc.) with minimal UI and some quite aggressive "passivation". The current release of EIS (2.5i) has some fairly low limits on the number of active shared spaces, and several people (the EIS development team, QA, a few very demanding customers, and some folks like myself kicking in from the sidelines) have worked hard to eliminate the big static limitations for the 2.5j release. So far, we're very pleased with the results. The process has shown me a few things about scale, optimisation and performance. I'm still not really sure how to quantify things, though.

Scale

Several things prevent an application from scaling. In the case of EIS, the first barrier has been memory: we've seen EIS hit Windows' 2GB address-space limit (with around a thousand simultaneously open Groove spaces), and it's not pretty: there just ain't no more memory. Adding RAM ($500 for a couple of gig extra seems cheap enough)? Makes no difference at all; 32-bit apps simply don't do that. It's possible to tweak the OS some, but the only way to escape the 32-bit address limits would appear to be some major esoteric low-level reconstruction, or a complete move to a 64-bit environment - and I'm not expecting that to happen overnight. Meanwhile, this is a wake-up call: even my work laptop is close to being "memmed-out". (Oh no! Another few years of thunks!) Fortunately, memory is only a problem when you use lots at once, and we've learned not to do that with our "bots".

After memory, CPU. My code (shuffling data between Groove shared space tools and an Oracle or SQLServer database) was eating processor cycles, even on a chunky Xeon box. While this wasn't an immediate showstopper, it did limit the number of shared spaces we could synchronize in a few-hour daily window (to the low thousands), and that in turn made it difficult to schedule various different things to happen at the right time. The integration code in this case is JavaScript, but I still scraped out somewhere above a fivefold performance gain in the test lab, and nailed at least one O(N^2) problem. (That was to be the topic of this entry; the gory details are interesting, I promise, but they'll wait.)
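The gory details will wait, but the general shape of that kind of O(N^2) fix is worth a sketch. This is purely illustrative - invented names, not the actual EIS code - but the pattern is the classic one: a nested scan over incoming records becomes a one-pass index plus constant-time lookups.

    // Before: for every incoming record we scanned the whole cached list.
    // N incoming records x N cached records = O(N^2) comparisons.
    function findByIdLinear(cached, id) {
        for (var i = 0; i < cached.length; i++) {
            if (cached[i].id == id) return cached[i];
        }
        return null;
    }

    // After: build a lookup table once (O(N)); each probe is then O(1),
    // so the whole pass is O(N) instead of O(N^2).
    function buildIndex(cached) {
        var byId = {};
        for (var i = 0; i < cached.length; i++) {
            byId[cached[i].id] = cached[i];
        }
        return byId;
    }

    // Usage sketch:
    //   var byId = buildIndex(cached);
    //   for (var j = 0; j < incoming.length; j++) {
    //       var match = byId[incoming[j].id];  // was findByIdLinear(...)
    //   }

At a few thousand records per space, that difference is exactly the sort of thing that shows up on a chunky Xeon.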
The kicker was to put the new code into production, and see... zip. Nada. Approximately zero (well, maybe 25%). Turns out, the lab environment has a gigabit network to a very lightly loaded and massively overspecified SQLServer. The production network to the production database is a little different, and all the code in the world won't make it much faster.

Is it worth it? The apparently trivial performance gain does mean a very significant gain in scalability. After all, the application was CPU-bound; now it's externality-bound, and some simple expedients (careful indexing, for example) can make a big difference on the database side.
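For the flavour of "careful indexing": since the integration code drives the database from JavaScript anyway, the fix can be as mundane as making sure the columns the synchronization queries filter on carry an index. Table, column and connection details below are made up for illustration.

    // Illustrative only: "SpaceData" and "SpaceId" are invented names.
    var conn = new ActiveXObject("ADODB.Connection");
    conn.Open("Provider=SQLOLEDB;Data Source=dbserver;" +
              "Initial Catalog=syncdb;Integrated Security=SSPI;");
    // Index the column every per-space sync query filters on.
    conn.Execute("CREATE INDEX IX_SpaceData_SpaceId ON SpaceData (SpaceId)");
    conn.Close();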
Since we need several distinct servers for this customer's environment, there's also a case for considering a virtual machine architecture. VMWare ESX, for example, which I've also been tinkering with: fascinating. That network-bound app? Just run two or three virtual servers on the one piece of hardware. The 32-bit address-space limit on a box with slots for 16GB? Run multiple machines on one.

Quantifying

Quantifying (setting guidelines for scalability) suddenly got a lot harder as a result of all this work. Previously, we could safely say: you'll have problems running more than 1500 shared spaces on a single device. Now, it's a multivariate problem. How much workload? What sorts of activity in those spaces? Are the users on the same LAN? What external systems are you talking to? What's the lifetime of your spaces? These are interesting questions, though.