Why running database applications over WAN connections is generally a bad idea

This is a problem I’ve run into more often than I’d like to admit in my day-to-day work as a network administrator. You’ve probably heard some version of the following dialogue…

HelpDesk : A customer has called in saying the ‘network is slow’ and would like you to investigate why the network is causing his application issues.

Me : I’ve checked all my traffic graphs, error counters, CPU stats, WAN links, etc., and everything looks like it’s running fine. What exactly is slow?

HelpDesk : Well – the user is complaining that he has to wait for up to 5 minutes to retrieve his inventory query through <insert application here>. It never used to take that long…

Me : Has this been a progressive slowness, or did it take 3 seconds yesterday and 5 minutes today?

HelpDesk : I’m not sure…

To be fair, this complaint was more common when Microsoft Access was in heavier use, so before we all go ahead and blame the network for everything, let’s take a quick look at how common database applications work in practice. A short disclaimer before I get too far into this: I’m not technically a database guy, so there are very likely going to be some errors in the finer aspects of what I’m describing here. Feel free to correct me in the comments!

Let’s assume we have a database of 100,000 rows and the particular application I’m running is going to perform some sort of operation on them. If I have a poorly written query such that the client goes out and requests all 100,000 rows individually, I have a problem. Each row is fetched with a separate request, which works out to 100,000 request/response round trips on the ‘wire’ to retrieve the information we’re looking for. This might work fine when you’re connected to a 1Gbps LAN, but let’s do the math to see what introducing a higher-latency WAN link will do to performance.
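To make that chatty pattern concrete, here’s a rough Python sketch. The `inventory` table and its schema are made up for illustration, and an in-memory SQLite database stands in for a real client/server database — against a real server, every query in the loop would be a full network round trip:

```python
import sqlite3

# Stand-in database: in-memory SQLite (table name and schema are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [(i, i % 10) for i in range(100_000)])

# Chatty pattern: one query per row. Against a client/server database,
# each iteration costs a full network round trip.
rows = []
for i in range(100_000):
    cur = conn.execute("SELECT qty FROM inventory WHERE id = ?", (i,))
    rows.append(cur.fetchone())

# Set-based pattern: one query, one round trip, same data.
rows = conn.execute("SELECT qty FROM inventory").fetchall()
```

On a LAN the two are nearly indistinguishable; over a WAN link the loop pays the round-trip penalty 100,000 times.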

At 1Gbps, the RTT (round trip time) for a large packet is going to be in the sub-millisecond range; let’s use 0.012ms for our example (see the serialization delay figures below), and assume there are no other impediments in the path to get this data, including disk I/O, etc.

0.012ms × 100,000 requests = 1,200ms, or 1.2 seconds

Let’s say our user puts in a request to work from home and connects to the corporate network over a home DSL or cable connection with a reasonable amount of bandwidth available. I would venture that most VPNs would be hard pressed to get better than a 25ms RTT to the corporate network, so we’ll use that as an example. Remember, the longer the physical distance, the higher the latency: you can’t speed up light, and every device in the path between client and server introduces some sort of forwarding delay during packet processing.

25ms × 100,000 requests = 2,500,000ms, or ~42 minutes!
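If you want to plug in your own numbers, the arithmetic is trivial to script. A minimal sketch, using the example figures from above:

```python
def total_query_time(rtt_ms: float, requests: int) -> float:
    """Wall-clock seconds for serialized, one-at-a-time request/response pairs."""
    return rtt_ms * requests / 1000.0

print(total_query_time(0.012, 100_000))  # LAN: 1.2 seconds
print(total_query_time(25.0, 100_000))   # VPN: 2500 seconds, ~42 minutes
```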

This problem is closely related to what has been described in great detail as the bandwidth-delay product: with a chatty, one-request-at-a-time workload, the link spends almost all of its time idle, waiting on round trips.

One thing to keep in mind with these types of complaints is that the word that starts to float around the office when problems like this become serious is bandwidth. Since most managers can buy more bandwidth, it becomes the magic bullet that will solve all problems. You’ve heard the conversation before… “Well, how much is it going to cost us to upgrade our T1 to a DS3? Just get it done!”

Now, consider this. Each database request is very small, and for smaller databases, each per-row response is small too. So if you were to monitor the overall bandwidth requirements, you might find them to be quite low. The root of the problem is the number of round trips (RTTs) the requests require, which is what produces the poor response times. This phenomenon is what fuelled the popularity of Citrix and other thin clients back in the day, whereby the database and client application would run on the same LAN in the office, and the remote user would simply receive screen updates (thin client) of the query results. More on that later…
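To put a rough number on how little bandwidth is actually in play, here’s a sketch; the 200-byte per-row payload is a made-up figure, since real sizes depend on the schema and wire protocol:

```python
# Hypothetical per-row payload size; real values depend on schema/protocol.
PAYLOAD_BYTES = 200
REQUESTS = 100_000
RTT_S = 0.025  # the 25ms VPN round trip from earlier

total_time_s = RTT_S * REQUESTS             # 2500 s, dominated by latency
total_bits = PAYLOAD_BYTES * 8 * REQUESTS   # data actually moved

print(f"Effective throughput: {total_bits / total_time_s / 1000:.0f} kbps")
# -> roughly 64 kbps
```

At ~64kbps of effective throughput, even a T1 sits around 4% utilized. The wire is mostly idle, waiting on round trips, which is exactly why buying more bandwidth doesn’t fix it.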

Fortunately, reducing latency in practice is much easier than accelerating light, since you can sometimes do it by accident, simply by increasing your bandwidth. The exception to this rule is increasing your bandwidth without changing the rate at which your physical interface clocks bits. Specifically – if you subscribe to a partial T1 service whereby you are allocated a certain number of timeslots (let’s use 768kbps), simply asking your provider to bump your line rate to the full T1 speed (1.544Mbps) will NOT improve your performance, because you were already serializing your data at T1 speeds and merely capped at 768kbps of bandwidth. Increasing your bandwidth will NOT decrease your serialization delay, and thus provides no performance improvement. More on serialization delay below!

Serialization delay is the amount of time it takes to clock a packet’s bits onto the wire. I may cover the details at a later date, but in the interim, check the article already published on it here. At slower link speeds, increasing the bandwidth really can reduce latency, simply because it takes less time to serialize the data onto the wire. You’ll end up with the same amount of data transiting your WAN link, but taking less time to do it. Referencing the figures below, you can see the serialization delay is halved by upgrading a 128k serial connection to a 256k serial connection. Your performance improves not because of more bandwidth per se, but because you’ve reduced your serialization delay.

Serialization delay for a 1500-byte (12,000-bit) packet:

128kbps ≈ 94ms
256kbps ≈ 47ms
1Mbps = 12ms
10Mbps = 1.2ms
100Mbps = 0.12ms
1Gbps = 0.012ms
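Those figures fall straight out of the definition: packet size in bits divided by line rate. A quick sketch that reproduces the table:

```python
PACKET_BITS = 1500 * 8  # a 1500-byte packet is 12,000 bits

for label, bps in [("128kbps", 128_000), ("256kbps", 256_000),
                   ("1Mbps", 1_000_000), ("10Mbps", 10_000_000),
                   ("100Mbps", 100_000_000), ("1Gbps", 1_000_000_000)]:
    print(f"{label:>8}: {PACKET_BITS / bps * 1000:.3f} ms")
```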

Another option that claims to solve this problem is products like the Riverbed Steelhead or Cisco WAAS, which perform data object caching, compression, and optimization of small-packet, high-transaction applications. Be aware that any time a device is analysing, replacing, or compressing your data in flight, extra delay is introduced. However, the optimization algorithms used in these appliances generally offset the extra latency by improving user performance enough that using them is still beneficial. In fact, there are a number of use cases where these appliances make stellar improvements to the user experience in remote offices.
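As a toy illustration of the compression half of that story (the appliances themselves do far more, including object caching and deduplication), here’s a sketch using Python’s built-in zlib on a deliberately repetitive, made-up query result:

```python
import zlib

# Made-up, highly repetitive query result, as row data often is.
payload = b"".join(b"row=%06d,qty=%02d;" % (i, i % 10) for i in range(10_000))

compressed = zlib.compress(payload, 6)
print(f"original:   {len(payload):,} bytes")
print(f"compressed: {len(compressed):,} bytes "
      f"({len(compressed) / len(payload):.1%} of original)")
```

Fewer bytes on the wire means less serialization delay per response, though compression alone does nothing about the round-trip count.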

Based on the above info, you now have a couple of solutions in your toolkit.

1. Increase your BW, but more importantly, reduce your latency.

2. Introduce a thin client solution, such as Citrix, to keep the client closer to the server.

3. Use an application acceleration engine such as Cisco WAAS or a Riverbed Steelhead to perform compression and optimization of your remote queries.

So, the next time a co-worker or manager comes to you with a BW problem, you’ll be in a better position to explain exactly what you are recommending as a solution, and more importantly, why.
