What is ppProf?

ppProf is a pretty primitive profiler for Java 5.0 programs. It uses the agent interface to instrument byte code
during application loading.
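
For readers unfamiliar with the agent interface, the following is a minimal sketch of how a java.lang.instrument agent hooks into class loading. It is not ppProf's actual code: the class names SketchAgent and PassThroughTransformer are made up for illustration, and the transformer leaves every class unchanged instead of instrumenting it.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent skeleton; names are illustrative, not taken from ppProf.
public class SketchAgent {

    // The JVM calls premain before the application's main method when it is
    // started with -javaagent:<jar> and the jar manifest names this class
    // in its Premain-Class attribute.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new PassThroughTransformer());
    }

    // Sees the bytes of each class as it is loaded.
    public static class PassThroughTransformer implements ClassFileTransformer {
        public int seen = 0; // number of classes offered to this transformer

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            seen++;
            // A profiler would rewrite classfileBuffer here and return the
            // modified bytes. Returning null leaves the class unchanged.
            return null;
        }
    }
}
```

Note that className arrives in internal form with slashes (for example java/lang/String), not with dots.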

What is "primitive" about it?

It doesn't do much. It tries to be as simple as possible and to generate as little overhead as possible. Other profilers offer graphical output, attribution of run time to stack traces, and remote control.
ppProf just counts method invocations and produces a "flat" profile, i.e. a list of time spent per method. Even this can make your program several times slower, depending on the circumstances.

Why does it still take so long?

My pet example, which takes 1-2 minutes of CPU time when not profiled, performs about 1155 million method calls on "user" code, i.e. not counting java.* classes and the like. If profiling adds 1 microsecond of overhead per call, that is 1155 seconds of additional runtime when the application is profiled: about 20 minutes extra, or roughly 10 to 20 times the original runtime.
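
The estimate above is plain multiplication, sketched here so that other call counts can be plugged in. The 1 microsecond figure is the assumed per-call overhead from the text, not a measured value, and the class and method names are made up for this example.

```java
// Back-of-the-envelope estimate of profiling overhead.
public class OverheadEstimate {

    // Extra runtime in seconds for a number of profiled calls and an
    // assumed overhead per call in microseconds.
    public static double extraSeconds(long calls, double overheadMicros) {
        return calls * overheadMicros / 1_000_000.0;
    }

    public static void main(String[] args) {
        long calls = 1_155_000_000L;          // calls in the pet example
        double extra = extraSeconds(calls, 1.0);
        System.out.println("extra seconds: " + extra);
        System.out.println("extra minutes: " + extra / 60.0);
    }
}
```

For 1155 million calls this gives 1155 seconds, a little over 19 minutes.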

There are two obvious counter-measures: profile fewer methods, and reduce the overhead.

Profiling fewer methods

This is partly achieved by application defaults, which can be overridden, and by the restriction to classes loaded by the system class loader (hardwired in ProfTransformer.java). For the pet example, the bulk of invocations are inside the XML parser. After excluding org.apache.xerces and org.jdom, the number of method calls drops from 1155 million to 68 million, i.e. by a factor of 17.
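
A filter of this kind might look like the sketch below. It is an illustration only: the class ExcludeFilter and its method are hypothetical and do not reflect how ProfTransformer.java is actually written. It assumes class names with dots; names passed to a class file transformer use slashes and would need converting first.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical filter: decides whether a class should be instrumented.
public class ExcludeFilter {
    private final List<String> excludedPrefixes;

    public ExcludeFilter(String... prefixes) {
        this.excludedPrefixes = Arrays.asList(prefixes);
    }

    // Returns true only for classes loaded by the system class loader
    // that are not inside one of the excluded packages.
    public boolean shouldProfile(ClassLoader loader, String className) {
        if (loader != ClassLoader.getSystemClassLoader()) {
            return false; // bootstrap (null) or other loaders are skipped
        }
        for (String prefix : excludedPrefixes) {
            // Match the package itself and everything below it, but not
            // unrelated packages that merely share the leading characters.
            if (className.equals(prefix) || className.startsWith(prefix + ".")) {
                return false;
            }
        }
        return true;
    }
}
```

With the exclusions from the pet example, new ExcludeFilter("org.apache.xerces", "org.jdom") would skip org.jdom.Element but still profile a class such as com.example.Main.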

Future versions may attempt to attribute time spent in excluded packages to the package itself, rather than to the calling method.

Reducing the overhead

Reducing the profiling overhead is done by restricting the functionality to the bare minimum. In particular, ppProf does not attempt to associate time with "call stacks" as other profilers do. That just eats up cycles and space (since you have to build the call stacks).
Also, ppProf doesn't do any heap profiling or other fancy stuff, and it calls the timing function roughly once per method invocation.
Consequently, ppProf should only very rarely die with an OutOfMemoryError if your settings are not good enough. It should just take longer.

Less obvious problems

I was quite amazed when my AMD X2 4400 Linux box was four times slower than my tiny sub-notebook with a 1100 MHz ULV Pentium CPU. This turned out to be an OS issue: the performance of the gettimeofday system call. SMP slows it down by a factor of ~16 on my hardware. Some experiments yielded a little JNI quickie to circumvent this.
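
To check whether a machine is affected, a rough measurement of the cost of one clock read can help. The sketch below is not part of ppProf and the class name TimerCost is made up. Which system call the JVM uses underneath System.currentTimeMillis and System.nanoTime depends on the JVM and OS, so treat the numbers as an indication only.

```java
import java.util.function.LongSupplier;

// Rough micro-benchmark: average cost of one call to a clock function.
public class TimerCost {

    // Calls the given clock 'iterations' times and returns the average
    // cost per call in nanoseconds.
    public static double nanosPerCall(LongSupplier clock, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += clock.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the JIT cannot drop the loop.
        if (sink == 42) {
            System.out.println("unlikely");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        int n = 5_000_000;
        // Warm-up runs so that the loop is compiled before measuring.
        nanosPerCall(System::nanoTime, n);
        nanosPerCall(System::currentTimeMillis, n);
        System.out.println("nanoTime:          "
                + nanosPerCall(System::nanoTime, n) + " ns per call");
        System.out.println("currentTimeMillis: "
                + nanosPerCall(System::currentTimeMillis, n) + " ns per call");
    }
}
```

Running this on two machines gives a direct comparison of how expensive timing is on each.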

The last resort

If timing is too slow, you can use an option to just count calls. The overhead for this is pretty small. Based on the result, you can exclude the most frequently called classes or packages.
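
The idea of counting calls and then ranking packages can be sketched as follows. This is an illustration under assumptions, not ppProf's implementation or output format: the class CallCounter and its methods are hypothetical, and counts are keyed by fully qualified class name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical call counter: one atomic counter per class, no timing.
public class CallCounter {
    private final ConcurrentHashMap<String, AtomicLong> counts =
            new ConcurrentHashMap<>();

    // Instrumented code would call this once at every method entry.
    public void count(String className) {
        AtomicLong counter = counts.get(className);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong();
            counter = counts.putIfAbsent(className, fresh);
            if (counter == null) {
                counter = fresh; // this thread installed the counter
            }
        }
        counter.incrementAndGet();
    }

    // Package part of a dotted class name, or "" for the default package.
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    // Call counts summed per package, most frequently called first.
    public List<Map.Entry<String, Long>> packagesByCalls() {
        Map<String, Long> perPackage = new HashMap<>();
        for (Map.Entry<String, AtomicLong> e : counts.entrySet()) {
            perPackage.merge(packageOf(e.getKey()), e.getValue().get(), Long::sum);
        }
        List<Map.Entry<String, Long>> sorted =
                new ArrayList<>(perPackage.entrySet());
        sorted.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return sorted;
    }
}
```

The packages at the top of the returned list are the first candidates for exclusion before a timed run.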