Full Instrumentation vs. Sampled Instrumentation
During CPU profiling,
a "method entry" call is injected at the beginning of each profiled
method, and a "method exit" call before each return.
These "method entry"/"method exit" calls are where timestamps are taken.
The difference between full and sampled instrumentation lies in when these calls take a timestamp:
- Full instrumentation mode
Each "method entry" and "method
exit" call generates a timestamp, and the time spent in the method
is calculated as a difference between these timestamps.
- Sampled instrumentation mode
Timestamps are taken only by those
"method entry"/"method exit" calls that fall approximately at the end of each
specified sampling period.
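The following is a hand-written Java sketch of what an instrumented method might look like. In practice the profiler injects these calls, so nobody writes them by hand. The names `InjectionSketch`, `methodEntry`, and `methodExit` are hypothetical stand-ins, and here they only count invocations. The sketch also ignores methods that exit by throwing an exception.

```java
import java.util.HashMap;
import java.util.Map;

public class InjectionSketch {
    // Hypothetical stand-ins for the injected calls; in this sketch they only count.
    static final Map<String, Integer> entries = new HashMap<>();
    static final Map<String, Integer> exits = new HashMap<>();

    static void methodEntry(String method) { entries.merge(method, 1, Integer::sum); }
    static void methodExit(String method)  { exits.merge(method, 1, Integer::sum); }

    // Original: int sign(int x) { if (x < 0) return -1; if (x == 0) return 0; return 1; }
    // Instrumented: one entry call at the top, one exit call before each return.
    static int sign(int x) {
        methodEntry("sign");
        if (x < 0) { methodExit("sign"); return -1; }
        if (x == 0) { methodExit("sign"); return 0; }
        methodExit("sign");
        return 1;
    }

    public static void main(String[] args) {
        for (int x = -2; x <= 2; x++) sign(x);
        System.out.println(entries.get("sign") + " " + exits.get("sign")); // prints "5 5"
    }
}
```

Every path through the method passes exactly one entry call and one exit call, so the two counts always match.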
Full Instrumentation
When the default full instrumentation is used, both the "method entry"
and "method exit" injected calls record the timestamp every time
each of them is invoked. The execution time for a target application method
is calculated as the difference between the two timestamps. In this way, you
get locally precise timings.
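Below is a minimal sketch of the full-instrumentation bookkeeping. It uses `System.nanoTime()` as the high-resolution timer and keeps a per-thread stack of entry timestamps. The class and method names are hypothetical and are not the profiler's API. Because the time is the difference between the entry and exit timestamps, it includes any time spent in callees.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class FullInstrumentationSketch {
    // Per-thread stack of entry timestamps, one frame per active instrumented method.
    private static final ThreadLocal<Deque<Long>> entryTimes =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Accumulated results (single-threaded sketch: these maps are not synchronized).
    static final Map<String, Long> totalNanos = new HashMap<>();
    static final Map<String, Long> calls = new HashMap<>();

    // Injected at method start: always reads the timer.
    static void methodEntry(String method) {
        entryTimes.get().push(System.nanoTime());
    }

    // Injected before each return: reads the timer again and charges the difference.
    static void methodExit(String method) {
        long elapsed = System.nanoTime() - entryTimes.get().pop();
        totalNanos.merge(method, elapsed, Long::sum);
        calls.merge(method, 1L, Long::sum);
    }

    static long work(int n) {
        methodEntry("work");
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        methodExit("work");
        return sum;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) work(1_000);
        System.out.println("calls=" + calls.get("work") + " ns=" + totalNanos.get("work"));
    }
}
```

Every call pays for two timer reads, which is the source of the overhead discussed next.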
The drawback is that the OS call that returns a high-resolution timestamp is
expensive: on the order of a few hundred nanoseconds on machines running the
Solaris and Linux operating systems, and over 1 microsecond on machines running
Windows. Thus, if your application contains many small methods (a few lines of
code each) that are executed frequently, you may find that the overhead of full
instrumentation is very high: anywhere from a few tens to a few thousand percent,
depending on the application.
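A back-of-the-envelope estimate shows why small methods suffer most. The figures here are for illustration only. Assume each timestamp costs t_ts = 300 ns, which is within the "few hundred nanoseconds" quoted above, and compare a method body that runs for 50 ns with one that runs for 5 µs. Every call pays for two timestamps, one on entry and one on exit. Ignoring the rest of the bookkeeping, the relative overhead for each method is roughly:

```latex
\text{overhead} \approx \frac{2\,t_{\mathrm{ts}}}{t_{\mathrm{body}}},
\qquad
\frac{2 \times 300\ \mathrm{ns}}{50\ \mathrm{ns}} = 12\ (1200\%),
\qquad
\frac{2 \times 300\ \mathrm{ns}}{5\ \mu\mathrm{s}} = 0.12\ (12\%)
```

The same timer cost is negligible for the longer method. This is why the total overhead depends so strongly on how much of the run is spent in small, frequently called methods.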
Sampled Instrumentation
Sampled instrumentation is a hybrid method that combines the advantages of
traditional instrumentation (counting the exact number of method invocations) and
traditional sampling (low overhead). When this method is used, the "method
entry" and "method exit" calls still count the number of invocations,
but don't take a timestamp every time they are called. Instead, they check a
per-thread flag that a separate thread of execution, managed by Profiler, sets
at a specified period.
When the flag for a thread is set, the next call to
"method entry" or "method exit" in that thread, whichever comes first,
takes a timestamp. It then charges the difference between this timestamp and
the previous one recorded in the same way to the method that is currently on
top of the thread's stack. In this way, the number of calls to the OS high-precision
timer is reduced dramatically.
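The scheme above can be sketched as follows. This is an illustrative model, not the profiler's implementation. `SampledInstrumentationSketch`, `requestSample()`, and the injectable clock are all hypothetical. One instance stands for one profiled thread. To keep the run deterministic, the periodic sampler thread is simulated by calling `requestSample()` by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.LongSupplier;

public class SampledInstrumentationSketch {
    private final LongSupplier clock;                                  // stands in for the OS high-resolution timer
    private final AtomicBoolean sampleDue = new AtomicBoolean(false); // per-thread flag set by the sampler
    private final Deque<String> stack = new ArrayDeque<>();            // active methods of this thread
    private long lastTimestamp;                                        // previous sampled timestamp

    final Map<String, Long> calls = new HashMap<>();
    final Map<String, Long> chargedTime = new HashMap<>();

    SampledInstrumentationSketch(LongSupplier clock) {
        this.clock = clock;
        this.lastTimestamp = clock.getAsLong();
    }

    // Would be called every sampling period by a separate sampler thread.
    void requestSample() { sampleDue.set(true); }

    // Reads the timer only if a sample is pending, then charges the top-of-stack method.
    private void maybeSample() {
        if (sampleDue.compareAndSet(true, false)) {
            long now = clock.getAsLong();
            String top = stack.peek();
            if (top != null) chargedTime.merge(top, now - lastTimestamp, Long::sum);
            lastTimestamp = now;
        }
    }

    void methodEntry(String method) {
        calls.merge(method, 1L, Long::sum); // invocations are always counted exactly
        maybeSample();                      // charges the caller, which was running until now
        stack.push(method);
    }

    void methodExit(String method) {
        maybeSample();                      // charges the method that is about to return
        stack.pop();
    }

    public static void main(String[] args) {
        long[] now = {0};                   // simulated clock, in ns
        SampledInstrumentationSketch p = new SampledInstrumentationSketch(() -> now[0]);

        p.methodEntry("main");
        now[0] = 100;  p.methodEntry("helper"); // short, infrequent method
        now[0] = 105;  p.methodExit("helper");
        now[0] = 1000; p.requestSample();       // sampler tick
        now[0] = 1010; p.methodEntry("compute"); // first call after the tick: charges main 1010
        now[0] = 2000; p.requestSample();
        now[0] = 2500; p.methodExit("compute");  // charges compute 2500 - 1010 = 1490
        now[0] = 2600; p.methodExit("main");     // no pending sample: timer not read

        System.out.println("main=" + p.chargedTime.get("main")
                + " compute=" + p.chargedTime.get("compute")
                + " helper=" + p.chargedTime.get("helper")); // prints "main=1010 compute=1490 helper=null"
        System.out.println("helper calls=" + p.calls.get("helper")); // prints "helper calls=1"
    }
}
```

In this run, `helper` is counted exactly but receives no time, because no sample fell inside it. This illustrates the drawback for short, infrequent methods discussed later in this section.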
For highly call-intensive applications that make
around 10,000 to 1,000,000 calls per second, this translates into a substantial
overhead reduction (a factor of 10 or more). Furthermore, it appears that
for such applications this profiling method may actually give more precise results.
This is because it tends to create less disruption to optimizations that the
dynamic compiler in the JVM and the CPU may apply in the course of the program
execution.
The only drawback of this scheme is that it does not give precise timings for
methods that are executed infrequently and run only briefly. However, unlike
traditional sampling, it still records the exact number of invocations
for these methods.
Which should you choose?
Our advice is to start with full instrumentation. If you observe that the
overhead is 100 percent or greater and the call intensity is more than 10,000
calls per second (the two usually go together), consider switching to sampled
instrumentation. For highly call-intensive
applications, results produced by full and sampled instrumentation may be slightly
different. In this case, sampled instrumentation typically produces more accurate
results for the top 10-20 methods, whereas full instrumentation is more accurate
for the rest of the application code.
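The rule of thumb above can be expressed as a small helper. `ProfilingModeAdvisor`, `choose`, and `overheadPercent` are hypothetical names. The thresholds come from the advice above. Overhead is assumed to be the extra wall-clock time of a fully instrumented run, relative to an unprofiled run of the same workload.

```java
public class ProfilingModeAdvisor {
    enum Mode { FULL, SAMPLED }

    // Thresholds from the advice above; rules of thumb, not hard limits.
    static final double OVERHEAD_THRESHOLD_PERCENT = 100.0;
    static final double CALL_RATE_THRESHOLD_PER_SEC = 10_000.0;

    // Suggests sampled instrumentation only when overhead is high AND the app is call-intensive.
    static Mode choose(double overheadPercent, double callsPerSecond) {
        boolean highOverhead = overheadPercent >= OVERHEAD_THRESHOLD_PERCENT;
        boolean callIntensive = callsPerSecond > CALL_RATE_THRESHOLD_PER_SEC;
        return (highOverhead && callIntensive) ? Mode.SAMPLED : Mode.FULL;
    }

    // Overhead in percent from two wall-clock measurements of the same workload.
    static double overheadPercent(double profiledSeconds, double baselineSeconds) {
        return (profiledSeconds - baselineSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        double overhead = overheadPercent(25.0, 10.0);  // illustrative timings: 150% overhead
        System.out.println(choose(overhead, 200_000));  // prints "SAMPLED"
        System.out.println(choose(40.0, 200_000));      // prints "FULL"
    }
}
```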
See also