What is ppProf?
ppProf is a pretty primitive profiler for Java 5.0 programs. It uses the
agent interface to instrument byte code during application loading.
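For readers who have not seen the agent interface before, here is a minimal sketch of what such an agent looks like. It assumes the interface in question is java.lang.instrument. The class name AgentSketch and the jar name are made up for illustration and are not ppProf's actual source. The sketch only counts the classes it is offered and leaves the byte code alone.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical names throughout; this is not ppProf's actual source.
class AgentSketch implements ClassFileTransformer {
    // Number of classes offered to this transformer so far.
    final AtomicInteger classesSeen = new AtomicInteger();

    // Called by the JVM before the application's main method when it is
    // started with -javaagent:sketch.jar and the jar manifest names this
    // class in its Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new AgentSketch());
    }

    // Called with the byte code of each class as it is loaded.
    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        classesSeen.incrementAndGet();
        // A real profiler would return rewritten byte code here.
        // Returning null tells the JVM to keep the class unchanged.
        return null;
    }
}
```

In a real agent the class would be public and live in its own source file; it is package-private here only to keep the sketch in one piece.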
What is "primitive" about it?
It doesn't do much. It tries to be as simple as possible, and to
generate as little overhead as possible. Other profilers offer
graphical output, allocation of run time to stack traces, and remote
control.
ppProf just counts method invocations and produces a "flat" profile,
i.e. a list of time spent per method. This alone can make your program
several times slower, depending on the circumstances.
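To make "flat" concrete: the data behind such a profile can be as small as one counter pair per method. The following is a sketch under my own assumptions, not ppProf's implementation. FlatProfileSketch, record and report are hypothetical names, and the method names and timings in main are invented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a flat profile: one row per method, no call stacks.
class FlatProfileSketch {
    static final class Row {
        final String method;
        long calls;  // number of invocations
        long nanos;  // total time attributed to this method
        Row(String method) { this.method = method; }
    }

    private final Map<String, Row> rows = new HashMap<>();

    // Record one finished invocation of the given method.
    void record(String method, long elapsedNanos) {
        Row row = rows.computeIfAbsent(method, Row::new);
        row.calls++;
        row.nanos += elapsedNanos;
    }

    // Return all rows, most expensive method first.
    List<Row> report() {
        List<Row> result = new ArrayList<>(rows.values());
        result.sort(Comparator.comparingLong((Row r) -> r.nanos).reversed());
        return result;
    }

    public static void main(String[] args) {
        FlatProfileSketch profile = new FlatProfileSketch();
        profile.record("Parser.parse", 3000);  // invented sample data
        profile.record("Parser.parse", 2000);
        profile.record("App.run", 1000);
        for (Row r : profile.report()) {
            System.out.println(r.method + " calls=" + r.calls + " ns=" + r.nanos);
        }
    }
}
```

Note that nothing here knows who called whom, which is exactly what keeps the bookkeeping cheap.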
Why does it still take so long?
My pet example, which takes 1-2 minutes of CPU time when not profiled,
performs about 1155 million method calls on "user" code, i.e. not
counting java.* classes and the like. If profiling introduces
a 1 microsecond overhead per call, that's 1155 seconds additional
runtime when the application is profiled. That's about 20 minutes
extra, or 10 to 20 times the original runtime.
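The estimate above is easy to redo for your own application. Here is a small sketch of the arithmetic; the class and method names are made up, and the 1 microsecond per call is the assumed figure from the text, not a measurement.

```java
// Hypothetical helper for the back-of-the-envelope estimate above.
class OverheadEstimate {
    // Extra runtime in seconds for the given number of profiled calls,
    // assuming a fixed per-call overhead in microseconds.
    static double extraSeconds(long calls, double overheadMicrosPerCall) {
        return calls * overheadMicrosPerCall / 1_000_000.0;
    }

    public static void main(String[] args) {
        double seconds = extraSeconds(1_155_000_000L, 1.0);
        System.out.println(seconds + " s extra, or " + (seconds / 60.0) + " min");
    }
}
```

With the pet example's numbers this comes to 1155 seconds, or 19.25 minutes.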
There are two obvious countermeasures: profile fewer methods, and
reduce the overhead.
Profiling fewer methods
This is partly achieved by application defaults, which can be
overridden, and the restriction to classes loaded by the
SystemClassLoader (hardwired in ProfTransformer.java). For the pet
example, the bulk of invocations are inside the XML parser. After
excluding org.apache.xerces and org.jdom, the number of method calls
drops to 68 million, or by a factor of 17.
Future versions may attempt to attribute time spent in excluded
packages to that package, rather than to the caller.
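Excluding packages boils down to a prefix test on the class name before a class is instrumented. A minimal sketch, assuming dotted class names and a plain list of excluded package prefixes; ExclusionFilterSketch and shouldProfile are hypothetical names, and ppProf's real option syntax is not shown here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical package filter; not ppProf's actual code or option format.
class ExclusionFilterSketch {
    private final List<String> excludedPackages;

    ExclusionFilterSketch(String... excludedPackages) {
        this.excludedPackages = Arrays.asList(excludedPackages);
    }

    // True if the class is outside all excluded packages and their
    // sub-packages. The trailing dot avoids matching look-alike names
    // such as "org.jdomx" when "org.jdom" is excluded.
    boolean shouldProfile(String className) {
        for (String pkg : excludedPackages) {
            if (className.startsWith(pkg + ".")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ExclusionFilterSketch filter =
                new ExclusionFilterSketch("org.apache.xerces", "org.jdom");
        System.out.println(filter.shouldProfile("org.jdom.Element"));
        System.out.println(filter.shouldProfile("com.example.App"));
    }
}
```

Inside a class file transformer the class name arrives with slashes instead of dots (org/jdom/Element), so a real filter would have to convert one form or the other first.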
Reducing the overhead
Reducing the profiling overhead is done by restricting the
functionality to the bare minimum. In particular, ppProf does not
attempt to associate time with "call stacks" like other profilers. That
just eats up cycles and space (since you have to build the call
stacks).
Also, ppProf doesn't do any heap profiling, or other fancy stuff. And
it calls the timing function only once per method invocation, roughly.
Consequently, even with badly chosen settings, ppProf should only
very rarely die with an OutOfMemoryError. It should just take longer.
Less obvious problems
I was quite amazed when my AMD X2 4400 Linux box was four times
slower than my tiny sub-notebook with a 1100 MHz ULV Pentium CPU.
This turned out to be an OS issue: The performance of the gettimeofday
system call. SMP slows it down by a factor of ~16 on my hardware. Some
experiments yielded a little JNI quickie to circumvent this.
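If you suspect the same problem on your machine, you can measure what a single timer call costs from Java. The following is a rough sketch, not a careful benchmark. It assumes that System.currentTimeMillis() is representative of the timer in use, which may not hold for your JVM or for ppProf itself, and TimerCostSketch is a made-up name.

```java
// Hypothetical micro-measurement of the cost of one timer call.
class TimerCostSketch {
    // Average cost in nanoseconds of one System.currentTimeMillis() call.
    static double nanosPerCall(int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += System.currentTimeMillis();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot drop the loop.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        nanosPerCall(1_000_000);  // warm-up run, result discarded
        System.out.println(nanosPerCall(10_000_000) + " ns per timer call");
    }
}
```

Running this on two machines gives a quick idea of whether the timer is what makes one of them slow under profiling.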
The last resort
If timing is too slow, you can use an option to just count calls. The
overhead for this is pretty small. Based on the result, you can exclude
the most frequently called classes or packages.
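Picking the packages to exclude from a call-count run is a simple aggregation. Here is a sketch that sums per-class call counts by package and sorts them, assuming the counts are available as a map from dotted class name to count. CallCountSketch and byPackage are hypothetical names, and the numbers in main are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing of a call-count-only profile.
class CallCountSketch {
    // Sum per-class call counts by package, most-called package first.
    static List<Map.Entry<String, Long>> byPackage(Map<String, Long> callsPerClass) {
        Map<String, Long> totals = new HashMap<>();
        for (Map.Entry<String, Long> e : callsPerClass.entrySet()) {
            String cls = e.getKey();
            int dot = cls.lastIndexOf('.');
            String pkg = dot < 0 ? "(default)" : cls.substring(0, dot);
            totals.merge(pkg, e.getValue(), Long::sum);
        }
        List<Map.Entry<String, Long>> rows = new ArrayList<>(totals.entrySet());
        rows.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("org.jdom.Element", 500L);   // invented sample data
        counts.put("org.jdom.Document", 100L);
        counts.put("com.example.App", 50L);
        for (Map.Entry<String, Long> row : byPackage(counts)) {
            System.out.println(row.getKey() + " " + row.getValue());
        }
    }
}
```

The packages at the top of this list are the first candidates for exclusion.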