Installing HP Diagnostics in a Performance Testing environment is generally fairly simple. You will probably only be using it during a short performance testing phase, and therefore don't have to worry about the long-term management issues of a system that will be used for years at a time. Personally, I budget about half a day for an install (not including custom configuration) in a non-Production environment as long as I know that I have administrator/root access to all the servers, and there are no firewalls between any of the servers.
Installing HP Diagnostics when it is intended for Production Monitoring is a much more complicated exercise, and requires a much greater investment of time. Read on for my tips...
Basically, an installation of HP Diagnostics will be broken down something like this:
- Collect system information
- Organise Diagnostics Server infrastructure
- Install in Test environment
- Custom .points file development (+ developer training)
- Determine Diagnostics overhead
- Install in Production environment
- Production baselining + alerts setup
- User training
- Ensure processes are in place for ongoing operation
Collect System Information
The first step when planning an installation of Diagnostics is to get an idea of what the system looks like. If you are lucky, there will be some architecture documents that tell you everything you need to know, but usually you need to get some technical people to draw you some diagrams on the whiteboard.
The things you will want to know are:
- The names of all the servers, and what they do (e.g. database server, web server etc)
- The software that is running on each server, and the software version (e.g. SQL Server 2008, IIS 7.5). This should include the operating system version and whether the operating system is 32-bit or 64-bit, and also the JVM and .NET runtime versions.
- How many CPUs there are on each server.
- Whether there are any firewalls between the servers, or between the servers and the corporate network.
- How the servers communicate with each other, and what protocols they use (e.g. the Application server retrieves pricing data from the mainframe using WebSphere MQ, the web server verifies credit card numbers by calling the web service on the bank's servers over HTTP)
- Whether you are planning to integrate Diagnostics with BAC, Transaction Vision, LoadRunner, or Performance Center.
- The names of the time server (NTP) and mail server (SMTP).
Once you have all the system information, you will need to determine which servers you will install the Probe/Agent on. Some companies install probes on every server that runs .NET or Java code. Some choose to install on a single server, and get indicative performance measurements from that server.
Collectors are available for SAP NetWeaver, Oracle 10g database, WebSphere MQ, MS SQL Server, and CICS, so you will need to identify if any of these components are active in your environment.
It is absolutely critical that you check all of the software versions against the Product Availability Matrix for Diagnostics. You might find that your operating system, or the version of a software component is not supported by your version of Diagnostics. Some software versions are supported by HP, but do not have the right functionality for some Diagnostics features to work - a good example would be JVM versions before 1.4.2 that do not support some of the memory analysis features of Diagnostics, or early Linux kernel versions that do not support collection of CPU time metrics by Diagnostics.
Licensing cost is based on the number of CPUs that are in the servers with Probes installed on them. Note that licensing is not free in non-Production environments (as Diagnostics can be used during performance testing, or to monitor Production).
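As a rough illustration of the counting rule described above, the sketch below totals the CPUs on servers that will have a Probe installed. The server names, CPU counts, and the `Server` structure are hypothetical; the actual licence terms should always be confirmed with HP.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str        # hypothetical host name
    cpus: int        # number of CPUs in the server
    has_probe: bool  # True if a Probe/Agent will be installed on it


def licensed_cpus(servers):
    """Sum the CPUs of every server that will have a Probe installed."""
    return sum(s.cpus for s in servers if s.has_probe)


# Illustrative inventory only, not taken from a real system.
inventory = [
    Server("web01", 4, True),
    Server("app01", 8, True),
    Server("db01", 16, False),  # no Probe installed, so not counted
]

print(licensed_cpus(inventory))
```

Running the same calculation for a few candidate deployment options (every server versus a single representative server) makes the cost trade-off easy to see.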
Organise Diagnostics Server infrastructure
Once you know which servers will have Agents installed, and which servers will have metrics collected by the Diagnostics Collector, you will need to determine the sizing and location of your Diagnostics Server (in Commander mode), your Diagnostics Collector, and any Diagnostics Servers in Mediator mode.
Other factors will be what your Diagnostics Server has to interface with (BAC, LoadRunner, Performance Center), and whether you want high availability.
For 99% of Diagnostics installations, you will probably have a single server running a Diagnostics Server (in Commander mode) and a Diagnostics Collector. For a large-scale Diagnostics implementation (more than 40 Agents), you would start to run additional instances of the Diagnostics Server (in Mediator mode) on the same physical server (to utilise the hardware more efficiently). You would only have separate physical servers for Diagnostics Servers in Mediator mode if you had a really large Diagnostics installation, or you needed to have separate servers in separate areas of the network (for security reasons, or to limit the amount of network traffic between the monitored system and the Commanding server).
Note that BAC can only communicate with a single Diagnostics Server (in Commander mode), so if you have a separate Diagnostics Server for each monitored application, this might be a justification for separate servers.
Previously, rather than order a separate server for Diagnostics, I have installed it on the application's (WebSphere) "management" server, which had the advantage of not requiring me to provision extra hardware, and not requiring me to get too many extra firewall ports opened.
The hardware requirements for Diagnostics are quite modest. I have run a large (~$600K) installation on a single 4-CPU server, with only 10% CPU utilisation.
|Platform||Item||Up to 50 Java Probes||Up to 100 Java Probes||Up to 200 Java Probes|
|Windows||CPU||2x 2.4 GHz||2x 2.8 GHz||2x 3.4 GHz|
|Windows||Memory||4 GB||4 GB||4 GB|
|Solaris||CPU||2x Ultra Sparc 3||2x Ultra Sparc 4||2x Ultra Sparc 4|
|Solaris||Memory||4 GB||4 GB||4 GB|
|Linux||CPU||2x 2.0 GHz||2x 2.4 GHz||2x 2.8 GHz|
|Linux||Memory||2 GB||4 GB||4 GB|
|HP-UX||CPU||PA-RISC 2x 650 MHz||PA-RISC 2x 699 MHz||PA-RISC 2x 750 MHz|
|HP-UX||Memory||2 GB||4 GB||4 GB|
|All||Heap Size||512 M||750 M||1280 M|
|All||Disk||4 GB per probe|
The firewall ports required for Diagnostics are quite simple.
|Source||Destination||Port||Purpose|
|Desktop PC (corporate network)||Diagnostics Server||2006/HTTP||User interface|
|Desktop PC (corporate network)||Profiler on app servers||35000-350xx||Need 1 port for each JVM running on the app server|
|LoadRunner Controller (or Performance Center)||Diagnostics Server||2006/HTTP||For integration with LoadRunner|
|BAC||Diagnostics Server||2006/HTTP||For BAC integration|
|Diagnostics Server||Agent on App server||35000-350xx||Need 1 port for each JVM running on the app server|
|Agent on App Server||Diagnostics Server||2612/TCP||Probe registration|
|Agent on App Server||Diagnostics Server||2006/HTTP|
|Diagnostics Server||NTP Server||NTP||Time synchronisation, if you don't want to use system time|
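Before starting an install, it can save time to confirm that the ports in the table above are actually reachable. The following is a minimal sketch using only the Python standard library; the helper names are illustrative, and it only proves that a TCP connection can be opened, not that the service behind the port is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(targets, timeout=3.0):
    """Map 'host:port' to True/False for each (host, port) pair in targets."""
    return {
        f"{host}:{port}": port_is_open(host, port, timeout)
        for host, port in targets
    }
```

A call such as `check_ports([("diag-server.example.com", 2006), ("diag-server.example.com", 2612)])` (host names here are placeholders) needs to be run from the source machine in each row of the table, because a firewall rule is specific to the direction of the connection.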
Network sizing is quite difficult, as the amount of data sent between the Agents and the Diagnostics Server will vary depending on:
- The number of transactions per hour processed by the system being monitored
- The number of points being applied to the application being monitored
- Whether sampling is enabled, and what percentage of requests are being sampled
- Depth trimming
- Latency trimming
Install in Test environment
Hopefully the organisation you are installing Diagnostics at is mature enough that they like to ensure that something works (and doesn't break anything) before installing it in Production.
I have never seen Diagnostics break application functionality but, on some occasions, I have seen it cause poor performance (see section on Determining Diagnostics Overhead).
If a company chooses to install Diagnostics directly into the Production environment, they might choose a lower-risk deployment model and only install it on a single server to begin with. However, developing custom instrumentation usually requires a repeated change-restart-test cycle as the points file is developed.
Develop Custom Instrumentation
Diagnostics comes with default instrumentation (points) for common classes and methods (like Struts and JDBC), which will tell you a lot about where your application is spending its time. But if you don't use one of the frameworks that already have defined points, or if you want to see how much time is spent in your business logic, then you will have to create some of your own points.
Here is an example point.
[Servlet-service]
; ------------- extends HttpServlet ---------------------
; (See HttpCorrelation point for ignore documentation)
; In addition, ignore class we know we are not interested in.
class = javax.servlet.http.HttpServlet
method = !(service)
signature = !.*
ignore_cl = javax.servlet.http.HttpServlet, com.ibm.ws.jsp.runtime.HttpJspBase, com.ibm.ws.jsp.servlet.JspServlet, com.ibm.ws.webcontainer.servlet.FilterProxyServlet, com.ibm.wps.engine.Servlet, com.ibm.ws.webcontainer.jsp.servlet.JspServlet, com.ibm.ws.webcontainer.jsp.runtime.HttpJspBase, com.ibm.ws.console.core.servlet.NodeSyncStatusServlet
ignore_tree = org.apache.jasper.runtime.HttpJspBase
deep_mode = hard
layer = Mediator
active = true
As you can probably guess from looking at it, you need a developer-level understanding of the application that you want to create the custom instrumentation for. The best way to create your custom instrumentation (and code snippets) is to work side-by-side with someone from the development team. Even then, it is critical that you have good Java/.NET knowledge yourself. If you don't understand packages, classes/interfaces, method overloading, and inheritance, then you should read a book or two before you start working with this tool.
Both the Java and .NET versions of Diagnostics have tools that will show you all the classes and methods used by your application ("Reflector.exe" for .NET applications, and by enabling "Capture Class Map" in the Agent/Profiler for Java-based applications), but having a huge list of methods does not help someone with no knowledge of the application.
I like to first decompose the application into logical layers (in addition to the existing layers), and then pick methods that are the entry points to those logical layers.
As you can probably guess, developing custom instrumentation can turn into a huge time-sucking black hole, as it is possible to tweak the settings forever.
It is really important that you actually test your custom instrumentation. Does it give an appropriate level of visibility into where time is being spent? One common issue is the MVC problem. Imagine that you have a URL that looks like this: http://www.example.com/controller.aspx?action=generateReport, where "action" could be anything from "logout" to "placeOrder".
As the same server request is made in each case, Diagnostics will (by default) group all calls to /controller.aspx together, even though they do completely different things, and should be reported on separately. You will frequently see this when you find that your avg/min/max call tree instances show different methods being called, even though it is the same server request. This behaviour can be changed, but requires you to write a custom point to do it.
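The grouping problem can be illustrated outside of Diagnostics with a few lines of Python. This is not a points file and does not show how Diagnostics implements it; it is only a sketch of the difference between keying server requests on the path alone and keying them on the path plus the dispatching parameter. The function names are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def default_key(url):
    """Group by path only, which lumps every action together."""
    return urlsplit(url).path


def action_key(url, param="action"):
    """Group by path plus the value of the dispatching query parameter."""
    parts = urlsplit(url)
    values = parse_qs(parts.query).get(param)
    if values:
        return f"{parts.path}?{param}={values[0]}"
    return parts.path


urls = [
    "http://www.example.com/controller.aspx?action=generateReport",
    "http://www.example.com/controller.aspx?action=logout",
    "http://www.example.com/controller.aspx?action=placeOrder",
    "http://www.example.com/controller.aspx?action=logout",
]

# One bucket for everything versus one bucket per action.
print(Counter(default_key(u) for u in urls))
print(Counter(action_key(u) for u in urls))
```

With the first key, a fast "logout" and a slow "generateReport" are averaged together, which is why the avg/min/max call trees look unrelated to each other.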
As points are tightly coupled to the source code, they will need to be maintained as the code changes. Refactoring exercises, where lots of method names change, are highly likely to break at least some of your points. It is important that you leave someone from the development team with enough knowledge to be able to maintain the points files themselves.
It is best that points files (and other Diagnostics configuration files) are stored in the same version control system as the source code for your application, as they are so tightly coupled to the application's code. It is bad practice to manually deploy points files or make direct changes to them (as you end up with different configurations on different application servers).
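One way to catch the configuration drift described above is to compare the copy of the points file on each application server against the version-controlled copy. The sketch below assumes the deployed files have already been fetched to local paths (how they are fetched is left out), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def find_drift(reference, deployed):
    """Return the names of deployed copies that differ from the reference.

    reference: path to the version-controlled points file
    deployed:  dict mapping a server name to the path of its fetched copy
    """
    expected = file_digest(reference)
    return sorted(
        name for name, path in deployed.items()
        if file_digest(path) != expected
    )
```

An empty result means every server matches the reference. A byte-for-byte comparison like this will also flag harmless differences such as line endings, so it is best treated as a prompt to investigate rather than proof of a problem.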
Determine Diagnostics overhead
One of the first questions that a customer asks during the sales cycle is "how much overhead does Diagnostics have?"
The answer will really depend on your level of instrumentation (don't create a point for every method in your application) and the sampling/trimming settings for your Agents.
Personally, I have seen Diagnostics installations with lots of custom instrumentation that had an overhead that was too small to measure with LoadRunner, and I have also seen (default) instrumentation levels that made an application completely unusable under load.
I really like to measure the performance overhead of Diagnostics by running a load test (in a Test environment) with it enabled and with it disabled, and then comparing the results.
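The comparison itself is simple arithmetic. A minimal sketch, assuming you have exported response times (in milliseconds) for the same transaction from both load test runs; the numbers below are made up for illustration.

```python
from statistics import mean


def overhead_percent(baseline_ms, instrumented_ms):
    """Relative increase in mean response time, as a percentage of the baseline."""
    base = mean(baseline_ms)
    return (mean(instrumented_ms) - base) / base * 100.0


# Illustrative numbers only, not measurements from a real system.
without_agent = [200, 210, 190, 200]
with_agent = [210, 220, 200, 210]

print(f"{overhead_percent(without_agent, with_agent):.1f}% overhead")
```

Mean response time is only one view. It is worth repeating the comparison for throughput and for CPU utilisation on the application servers, and running each test more than once so that normal run-to-run variation is not mistaken for overhead.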
If you are installing directly into Production, you really need a good quality monitoring tool (like RUM) that will allow you to see any differences between a server with an active Diagnostics Agent, and one without.
Another question which is common is "can Diagnostics be left on all the time, or is it designed to be used only when there is a problem?" Yes, this tool is designed to be left on all the time, rather than turned on in times of crisis.
Install in the Production environment
When you install in Production, you might do a couple of things differently to how you installed in Test.
- Enable HTTPS for Agent communications, and for the user interface.
- Ensure that the account the Diagnostics Server runs under has the minimum necessary privileges, and will never expire.
- Reset all the passwords from their default values.
- Integrate with LDAP or BAC for user authentication.
- Connect to an SMTP server to send email alerts.
Production Baselining and Alerts
If you are using Diagnostics to generate alerts, you will want to set up realistic thresholds. While you can use the thresholds established when you ran the load test to determine the overhead of Diagnostics, it is still a good idea to let the tool run for a while in Production, so you can get typical Production values.
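As one possible way to turn a period of Production measurements into a starting threshold, the sketch below takes a high percentile of the observed response times and adds some headroom. The 95th percentile and the 20% headroom are assumptions chosen for illustration, not Diagnostics defaults.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def alert_threshold(samples, pct=95, headroom=1.2):
    """Suggested alert threshold: a high percentile plus some headroom."""
    return percentile(samples, pct) * headroom
```

Thresholds derived this way should be revisited after the first few weeks, once you have seen how often they fire during normal peaks.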
User training
A monitoring tool is useless without users. Diagnostics is a technical tool that the development team may pick up quickly, but this is not necessarily the case with support staff.
End-user training is a critical part of ensuring that a company gets enough benefit out of a tool that they will bother to invest time and money to maintain it.
It is good to show support staff how to diagnose a basic performance problem - i.e. start with slow server requests, check that system monitors are not showing high CPU, check time spent in outbound calls, and then drill down on specific call tree instances for key server requests.
This is a good time to make sure that everyone being trained has a user account for the Diagnostics Server, can log in, and can customise their view to be meaningful (make sure they hide all the items they are not using, like CICS).
Ensure processes are in place for long-term operation
A lot of people think that once the install is complete and Diagnostics is running in Production, their job is over and they can go home.
It is important to give some thought to the long-term maintenance of the system.
The main areas to think about are ongoing ownership, users, maintenance, and support.
- Ensure that the Diagnostics Server will automatically restart when the server it is running on is restarted. On Windows, the service should be set to start automatically, and on Linux/Unix it should be started from an init.d script.
- Points files will "rot" over time, as the application's code is updated. Make sure there is a developer who is responsible for maintaining the points file.
- Make sure the team that uses the tool knows who to call for support.
- Make sure that Diagnostics is listed in the document that specifies all the software used by the system. In a year or two, someone will need to remember to do an upgrade before HP stops supporting the installed version.
- Make sure that "monitoring with Diagnostics" is listed in the Non-Functional Requirements document. Hopefully this will mean that it will be on the Test Manager's list of things to test when a change is made to the application.
- Ensuring that there is a requirement for Diagnostics also means that (hopefully) there will be budget available if it breaks and needs support, or if it needs to be upgraded.
- Make sure that it is being backed up regularly.
- Someone will have to know to install the Java daylight savings patches on the JVM used by the Diagnostics Server whenever the dates for daylight savings change.
- Set up alerting for the server that the Diagnostics Server runs on (e.g. SiteScope alerts or equivalent). A good example would be a disk space monitor for the partition that the Diagnostics Server writes to.
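If no monitoring tool is available for the Diagnostics Server host, even a small scheduled script is better than nothing. The following is a minimal sketch of a disk space check using the Python standard library; the 10% threshold is an assumption, and the path to check is whichever partition your Diagnostics Server writes to.

```python
import shutil


def free_fraction(path):
    """Fraction of the partition containing path that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_alert(path, min_free=0.10):
    """Return a warning string if free space is below min_free, else None."""
    fraction = free_fraction(path)
    if fraction < min_free:
        return f"LOW DISK: {path} has {fraction:.0%} free"
    return None
```

The returned message could be passed to whatever notification mechanism is already in place, such as an email sent through the SMTP server identified during the information-gathering step.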
Obviously there is a lot more to installing Diagnostics than what is written here (the Install Guide for version 8.0 is more than 700 pages long). It is a good idea to do HP's training course on Diagnostics if you can. Reading the manual is very helpful. And there are some undocumented features/behaviours that you will only know about if you read the comments in some of the configuration files. Good luck!