The PerfMon Blog

August 29, 2008

PerfMon PerfOrmance

Filed under: Performance Monitoring — Tyler Fullerton @ 9:54 am

Hi everyone,

I thought I’d post some graphs on the performance of the PerfMon Blog over the last couple of months. There are a few reasons for doing this:

  1. I want to see how consistent performance of a WordPress blog is.
  2. The graphs added to this post mean more objects to download, which increases the weight of the page. That in turn lets me look at how adding images affects the performance of a page.

This first graph shows the average load time of the PerfMon Blog over time. Performance is pretty consistent over a 30 day period (around 1.84 seconds) and errors are rare (the errors on the graph are timeouts, recorded when the perfmon.wordpress.com page took longer than 3 seconds to download):

Average load time performance of the PerfMon Blog.
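
As a rough illustration of how the numbers behind a graph like this could be summarized, here is a minimal Python sketch. The 3 second timeout comes from the monitoring setup described above; the function name and the sample values are made up for the example and are not the data behind the graph.

```python
from statistics import mean

# Threshold taken from the post: a download longer than this is a timeout error.
TIMEOUT_SECONDS = 3.0


def summarize(load_times):
    """Split samples into successes and timeouts, then average the successes."""
    successes = [t for t in load_times if t <= TIMEOUT_SECONDS]
    timeouts = len(load_times) - len(successes)
    return {
        # Average over successful samples only, rounded to hundredths of a second.
        "average_load_time": round(mean(successes), 2) if successes else None,
        "timeouts": timeouts,
        "error_rate": timeouts / len(load_times) if load_times else 0.0,
    }


# Hypothetical load times in seconds; one of them exceeds the timeout.
samples = [1.80, 1.90, 1.75, 3.40, 1.91]
print(summarize(samples))
```

Whether timed-out samples should be excluded from the average is a design choice; this sketch leaves them out so that a few slow outliers do not skew the typical load time.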


So far the performance looks pretty consistent from a high level. The timeout errors (when the page takes longer than 3 seconds to download) do not concern me too much (they’re bound to happen), and consistent performance (that is, a non-spiky graph) is always a good sign, especially when monitoring from geographically dispersed locations. Now let’s drill down a bit and see what the performance of the page looks like at the object level (images, CSS, JavaScript files). Here we have a waterfall style graph called a Full-Page Breakdown:

Full-Page breakdown for PerfMon Blog


The graph shows us the performance of the items on the page and breaks down that performance into three key values: DNS lookup time, latency (i.e. first packet time), and transfer time. Since monitoring is performed from an IE browser, we know that we’re seeing the actual performance of the page (JavaScript execution, rendering and layout, and any objects requested asynchronously, if there are any). The object level performance looks pretty good. The only thing I’d really like to comment on is the DNS lookup time (the yellow values on the graph). Because of a number of third party items (analytics and advertising code), there is a bit more DNS lookup time than I’d like. IE performs a DNS lookup for a domain only once and then caches that information for the remainder of the session, so every time a new domain is introduced a DNS lookup needs to be performed. We spent about 0.5 seconds doing DNS queries!
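
To make the once-per-domain behaviour concrete, here is a small Python sketch that adds up DNS time for a list of objects, charging a lookup only the first time each hostname appears. The third party hostnames and the per-lookup costs are hypothetical, chosen only so the total matches the roughly 0.5 seconds mentioned above; they are not measurements from the graph.

```python
from urllib.parse import urlparse


def dns_lookups(urls, lookup_cost_ms):
    """Return (unique hostnames, total DNS time in ms) for one browser session."""
    resolved = set()  # hostnames already cached for this session
    total_ms = 0
    for url in urls:
        host = urlparse(url).hostname
        if host not in resolved:
            # First object from this domain: pay for the lookup once.
            resolved.add(host)
            total_ms += lookup_cost_ms.get(host, 0)
    return len(resolved), total_ms


# Hypothetical page objects: the blog itself plus two third party domains.
page_objects = [
    "http://perfmon.wordpress.com/",
    "http://perfmon.wordpress.com/style.css",
    "http://stats.example.com/tracker.js",
    "http://ads.example.net/banner.js",
    "http://stats.example.com/pixel.gif",
]

# Assumed lookup costs in milliseconds, for illustration only.
costs = {
    "perfmon.wordpress.com": 100,
    "stats.example.com": 150,
    "ads.example.net": 250,
}

domains, dns_ms = dns_lookups(page_objects, costs)
print(domains, "domains,", dns_ms, "ms of DNS time")
```

Five objects trigger only three lookups here, which is why the number of distinct domains, rather than the number of objects, drives the DNS portion of the load time.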

This final graph is here to show how the PerfMon Blog performs for viewers coming from different locations.  It looks fairly good:

Uptime and average load time for PerfMon Blog


The average load time (per monitoring location) is represented by the green line. This line is rather straight, meaning that the performance is consistent regardless of where in the US you are viewing the PerfMon Blog from. That’s good news! Often this line will start out low on the right (meaning the performance is good from those locations) and will rise as it moves to the left of the graph, indicating that users further from the server hosting the website see poorer performance. The background of the graph is broken down into 3 values:

  1. Green – This is the percentage of successful samples taken from that location.
  2. Yellow – This is the percentage of unvalidated errors.  An error is unvalidated if another monitoring location is unable to duplicate the error that was reported.
  3. Red – This is the percentage of validated errors from the location.  An error is validated if a number of monitoring locations reported the same error.
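
The three background values can be sketched in Python as follows. This is only an illustration of the definitions above: the rule that a single confirming location is enough to validate an error is an assumption (the post only says "a number of" locations), and the sample data is invented.

```python
from collections import Counter


def classify(sample_ok, confirmations, min_confirmations=1):
    """Label one sample as success, validated error, or unvalidated error.

    confirmations is how many other monitoring locations reproduced the error.
    min_confirmations is an assumed threshold, not a documented one.
    """
    if sample_ok:
        return "success"
    return "validated" if confirmations >= min_confirmations else "unvalidated"


def background_percentages(samples, min_confirmations=1):
    """Return the green/yellow/red percentages for one monitoring location."""
    labels = ("success", "unvalidated", "validated")
    if not samples:
        return {label: 0.0 for label in labels}
    counts = Counter(classify(ok, n, min_confirmations) for ok, n in samples)
    return {label: 100.0 * counts[label] / len(samples) for label in labels}


# Hypothetical samples for one location: (succeeded?, confirming locations).
samples = [(True, 0)] * 8 + [(False, 0), (False, 2)]
print(background_percentages(samples))
```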

We can see that certain regions generally see more errors (validated and unvalidated): Salt Lake City, Boston, San Jose, Newark, Scranton, and Los Angeles. This coincides with their higher load times as well. Overall, the performance of the PerfMon Blog page is pretty good. The question is: is that because it has minimal content, is hosted on a solid infrastructure, or has very few people viewing it? ;)

August 22, 2008

SOA Management Framework

Filed under: Business Considerations, Performance Monitoring — Tyler Fullerton @ 2:47 pm

I’ve been reading a book on SOA by Nicolai M. Josuttis which provides a very accessible introduction to SOA (Service Oriented Architecture) design, benefits, and established best practices. One theme that keeps coming up is collaboration, and in fact Nicolai states:

One key requirement for SOA is collaboration (pg 104).

The collaboration that Nicolai is talking about in the book is among isolated departments or business units within a company and is a key factor in ensuring the success of SOA (pg 104). This need to collaborate is a major driving force behind the decisions that need to be made to manage a SOA application.

Take the example of any company that has successfully implemented a SOA based application. The company has overcome the inflexibility of a large number of complex, distributed systems by creating a framework of services and processes that expose functionality to the consumers (users) of the application. This leaves the company with a number of services and processes that interact in a choreographed manner, where no process has total control and all processes and services have limited knowledge of the overall application. The departments behind them have given up some knowledge (and control) to be part of a more flexible, federated application. Think about U.S. history: at the Constitutional Convention of 1787 the states were basically asked to give up some of their influence and power to a centralized federal government. Some influence is gone, but the resulting government is stronger and more flexible. Now there is a flexible (and scalable) infrastructure, but there isn’t any unifying view of the application (other than the application itself, and if you look only at the application you remain ignorant of the components underneath).

In a SOA environment, monitoring of these services and processes is going to become more and more critical because of the limited scope of knowledge each department has. I think SOA based applications are still a relatively new concept that companies are experimenting with, so there really hasn’t been any consideration as to how to ensure the performance of these applications and distribute the results to all interested and involved parties. What I suspect is that a need will arise (if it hasn’t already) for a platform in which all functional and non-functional requirements of a SOA based environment can be managed. I’ve heard that this isn’t even possible because of all the different ideas and methodologies for implementing SOA, but it seems clear that some base framework, one that doesn’t contribute to the underlying architecture, needs to be present to manage the architecture’s requirements. What I’d really like to see is a platform that allows you to plug in non-functional requirements (e.g. performance monitoring, SLA management, Business Process Management, etc.) as needed. A SOA management platform would help alleviate the pains that can occur when a company’s culture acts to resist collaboration. My experience with performance monitoring tells me that unless such a platform exists, there will never be widespread adoption of monitoring for SOA based applications.

August 12, 2008

I left my heart in San Francisco (Web 2.0 that is)

Filed under: Performance Monitoring — Tyler Fullerton @ 3:26 pm

Earlier this year (March/April) Webmetrics exhibited at the O’Reilly Web 2.0 conference in San Francisco and we found that there were quite a few unanswered questions on the minds of fellow exhibitors and attendees. The most prominent questions were:

  1. Service Level Agreements (SLA) – Just about everyone who came to the Webmetrics booth had some sort of requirement for SLA reporting. Mostly the requirement was to provide an SLA to users of a service, since many of the exhibitors were companies that have a SaaS model/platform or at least were providing web services that their clients could use to extend the functionality of existing products. There were some cases where tracking SLA values was geared more towards keeping tabs on SLAs offered by third parties, but the overwhelming majority were looking to provide their clients with the SLAs that had been agreed upon. This indicates that many companies are becoming proactive in sharing information with their clients (in the form of SLAs). Which leads to…
  2. Collaboration – Everyone understood it. Very rarely did someone not get the idea of collaborating with third parties or partners. One of the main ideas behind the Web 2.0 movement is to develop software using a service model, and just about everyone in attendance at the conference was involved with some sort of third party. People are naturally suspicious, which makes for a bad situation when a third party offers up metrics that were collected in house. Often reports are generated by the provider of a service and then handed over to the user without any explanation of what the errors are or where the data was collected from, or even, god forbid, with incomplete data sets.
  3. Problems – Finally, the majority of people who stopped by the booth had experienced some sort of performance issue. In most cases it was uptime, that is, the service being provided was not available for extended periods of time (or unavailable for short periods of time very frequently). Although users of web services are becoming more sophisticated with their consumption, they really need to buckle down, pay attention to SLAs, and demand (or track) SLA information (see the first point).
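
For anyone who wants to start tracking an uptime SLA themselves, the arithmetic is simple. The Python sketch below turns an SLA percentage into a downtime budget and checks a list of outages against it. The 99.9% target, the 30 day period, and the outage lengths are all hypothetical values picked for illustration.

```python
def downtime_budget_minutes(sla_percent, period_days=30):
    """Minutes of downtime allowed by an uptime SLA over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - sla_percent) / 100.0


def availability_percent(outage_minutes, period_days=30):
    """Measured availability given a list of outage lengths in minutes."""
    total_minutes = period_days * 24 * 60
    return 100.0 * (total_minutes - sum(outage_minutes)) / total_minutes


def meets_sla(sla_percent, outage_minutes, period_days=30):
    """True if the total downtime stays within the SLA's downtime budget."""
    return sum(outage_minutes) <= downtime_budget_minutes(sla_percent, period_days)


# Hypothetical month: one extended outage versus many short ones.
one_long_outage = [90]          # a single 90 minute outage
many_short_outages = [5] * 12   # twelve 5 minute outages

print(downtime_budget_minutes(99.9))
print(meets_sla(99.9, one_long_outage), meets_sla(99.9, many_short_outages))
```

Both patterns from the third point, one extended outage or frequent short ones, can break the same SLA, which is why the check sums the outages instead of looking only at the longest one.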

I will be at the Web 2.0 conference in New York City this September. Please feel free to stop by the Webmetrics booth and share your thoughts and opinions with me on the performance issues that surround web eco-systems and Web 2.0 applications.

The performance for this blog (today’s average load time and uptime):

  • Average load time is 1.78 seconds.
  • Availability (uptime) is 100%.

August 6, 2008

More MobileMe

Filed under: Load Testing, Performance Monitoring — Tyler Fullerton @ 11:22 am

The other day Om Malik wrote an interesting post about the availability issues with Apple’s MobileMe site. He made quite a few key observations related to monitoring and load testing that I wanted to reiterate here. They are:

  • There is no unified IT plan vis-a-vis applications; each has its own set of servers, IT practices and release scenarios. This is becoming more and more of a problem with the adoption of SOA, SaaS, and mashup models (see yesterday’s post for more information on monitoring these types of architectures). You now need to work more closely with your partners when performing load testing to ensure that all components of your application are thoroughly tested. A load test is of little value if it fails to test 25% of your application/infrastructure. This requires more up-front time planning a load test, and if you’re using a vendor for load testing, make sure that they provide consultative services and strive to understand your environment and infrastructure.
  • There’s no unified monitoring system. Monitoring data from all sources (server performance, external application performance, analytics, etc.) needs to be considered and in an ideal environment would be mashed up to gain new insight into the data. Consistent and fine-grained monitoring intervals are two properties that are important in effectively monitoring a web application.

Even the giants like Apple will feel the pain if they fail to ensure performance and availability of their applications through pre-release load testing of infrastructure and on-going monitoring for performance and continued availability.

The performance for this blog (today’s average load time and uptime):

  • Average load time is 1.81 seconds.
  • Availability (uptime) is 100%.

August 5, 2008

Web Eco-System Monitoring

Filed under: Performance Monitoring — Tyler Fullerton @ 3:08 pm

Seems like with cloud computing and the SaaS/PaaS models everyone is starting to really ask questions around availability and Service Level Agreements. And why shouldn’t they be? Moving your critical application functionality outside your firewall and onto the servers of a third party is a big step that makes you vulnerable to the downtime of systems you have little to no control over. Both Amazon and Twitter have had high profile downtime events recently which have caused users to really dig deep into the SLAs of these services.

The concept of monitoring performance of an application can become quite complex when third party services are used to provide key functionality. On the one hand you need to be able to drill down on the various components that make up the application. Previously it would have been sufficient to monitor the performance of the web based application from just an external user’s perspective, but now your servers are the users of third party components, so that monitoring needs to be done from behind your firewall. On top of the added complexity of the end user perspective, there is the sharing of the data that is being collected. If you’re a producer of services then you want to be able to share your uptime and performance SLA statistics (creating an open and trusted relationship with your users) and if you’re a consumer of third party services then you will want to be sharing the appropriate performance data with various departments in your organization (as well as be able to present that data to your third party providers if SLAs are ever in question).

If you have a moment, check out the Webmetrics Eco-System monitoring platform for a solution to managing the SLAs of third party components and mashups. I would be interested to hear any feedback you have. I promise I will not focus this blog too much on Webmetrics based products and will try to keep it focused on web performance monitoring concepts and best practices. In this case I feel that keeping track of the availability and performance of third party components and services is a critical factor in managing any web application that utilizes third party services.

The performance for this blog:

  • Average load time is 1.73 seconds.
  • Availability (uptime) is 100%.
