VGprof
VGprof is a modified version of gprof used for interpreting the files
generated by the vgprof skin for Valgrind. The output is essentially
the same as for normal gprof usage, except that:
- any program can be profiled without having to recompile
- vgprof works with threaded programs
- vgprof can profile within and between shared libraries, as well
as the main program.
Getting vgprof
The patches against binutils-2.13 are available here.
Here is a pre-compiled version of vgprof.
Changes to gprof
New somap tag
The vgprof skin can include a new somap record in the vgmon.out file.
These records contain a mapping between a range of virtual addresses
and a full path to a shared library. This allows vgprof itself to
extract symbolic information from the shared libraries.
Histograms over the whole address space
Standard gprof only allows histogram records to specify a single address
range (that is, if more than one histogram record exists in the file, all
must cover the same address range). Since the vgmon.out files generated
by vgprof can contain information about code mapped all over the address
space, they include multiple histogram tags with addresses spread wherever
samples were recorded. I modified gprof to handle a sparse array of
histogram samples all over the address space.
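As a rough illustration of the idea (not the actual gprof patch, which is written in C), a sparse histogram can be kept as a map from bucket start address to sample count, so records from unrelated address ranges coexist and empty buckets cost nothing. The names add_record and samples_in_range are hypothetical, and the sketch assumes each record's starting address is aligned to the bucket size.

```python
from collections import defaultdict

BUCKET_BYTES = 16  # matches the documented --histo-scale default


def add_record(hist, low_pc, counts, bucket_bytes=BUCKET_BYTES):
    """Fold one histogram record into a sparse map keyed by bucket start address."""
    for i, count in enumerate(counts):
        if count:  # empty buckets are simply not stored
            hist[low_pc + i * bucket_bytes] += count


def samples_in_range(hist, start, end):
    """Sum samples for buckets whose start address lies in [start, end)."""
    return sum(c for addr, c in hist.items() if start <= addr < end)


hist = defaultdict(int)
# Two records covering unrelated parts of the address space
add_record(hist, 0x08048000, [5, 0, 3])
add_record(hist, 0x40001000, [7, 2])
# A further record over the first range accumulates into the same buckets
add_record(hist, 0x08048000, [1, 1, 0])
```

Attributing samples to a symbol is then a matter of calling samples_in_range with the symbol's start and end addresses, whichever object it was mapped from.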
Performance improvements with large numbers of symbols
Several algorithms in gprof were sped up (mostly with hash tables) to
deal with large numbers of source files and symbols. Unfortunately some
algorithms are still too slow, particularly line-level profiling.
Invoking the vgprof skin
The basic form is:
valgrind --skin=vgprof program
By default, this will record function call graph edges and an instruction
count in the histogram. It will not record any basic-block level information
or generate an somap.
Command line options are (defaults in bold):
- --histo=yes | no
- Include histogram information in the output file.
- --histo-scale=16
- How much code is accounted into each histogram bucket. The default
is 16 bytes per bucket.
- --units=instructions | walltime | cputime
- What is accumulated in the histogram. Instructions counts x86
instructions executed, with a simple weighting scheme to account for the relative
execution times (this is almost completely bogus). Walltime counts
real time, in milliseconds. Since this doesn't count across system calls
(at the moment), it doesn't account for any time spent blocked, and is therefore
much the same as cputime (which isn't implemented at all yet).
- --unit-scale=auto | N
- The scale factor applied to each histogram sample before being written
to the file. One of the limitations of the gmon.out file format is
that histograms are only 16 bits per bucket, whereas the counts generated
by vgprof are often considerably larger. This is the scale factor applied
to the counts as the file is generated to prevent information loss due to
clipping. The default, auto, will determine a power-of-10 scaling factor
which prevents any buckets from overflowing. This is only computed the
first time a profile output is generated; subsequent profiles use the same
scale factor, even if this would lead to overflow.
- --bbcount=yes | no
- Include a count of each basic block's usage.
- --call-graph=yes | no
- Include edges in the control flow of the program (the resolution is
set by --calls-only).
- --calls-only=yes | no
- Include only function calls in the call graph. Otherwise, the
call-graph will contain every edge from basic-block to basic-block.
- --text-only=yes | no
- Only instrument code in the text segment of an object. This excludes
basic blocks which are part of the dynamic linker's mechanism, which only
confuse the output (running the program with LD_BIND_NOW set also helps).
- --somap=yes | no
- Include a set of somap records in the output.
- --prof-output=vgmon.out
- Sets the output file name. The actual name used is vgmon.out.pid[.count],
where count is appended if more than one profile file is generated.
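The auto setting of --unit-scale described above can be sketched roughly as follows. This is only an illustration of choosing a power-of-10 factor that fits the largest count into 16 bits; the function names are hypothetical, and the rounding (integer division) and clipping behaviour are assumptions rather than a description of what the skin actually does.

```python
MAX_BUCKET = 0xFFFF  # 16-bit histogram bucket limit in the gmon.out format


def auto_unit_scale(counts):
    """Smallest power of 10 that brings the largest count within 16 bits."""
    scale = 1
    peak = max(counts, default=0)
    while peak // scale > MAX_BUCKET:
        scale *= 10
    return scale


def scaled_buckets(counts, scale):
    """Divide by the scale and clip, since a reused scale may still overflow."""
    return [min(c // scale, MAX_BUCKET) for c in counts]


# Scale is chosen once, from the first profile that is written out
first_dump = [7_000_000, 1234]
scale = auto_unit_scale(first_dump)
first_buckets = scaled_buckets(first_dump, scale)
# A later dump reuses the same scale, so a larger count gets clipped
later_buckets = scaled_buckets([90_000_000], scale)
```

The last line shows the caveat mentioned for auto: because the factor is fixed after the first dump, a later and larger count can still hit the 16-bit ceiling. Passing an explicit N avoids this when the expected magnitude is known.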
Using vgprof to view results
The basic usage is:
vgprof exe vgmon.out [vgmon.out...]
If multiple vgmon.out files are specified, they are added together and
treated as one.
The only new command line option is:
- --object-path=path:path
- This is the path searched when looking up objects listed in an somap
record. This allows profiles to be viewed from a machine other than
the one which recorded the profile, while still allowing symbols to be extracted
over NFS.
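How the search might work can be sketched as below. The exact lookup rule is not stated here, so this assumes each directory in --object-path is treated as an alternate root under which the recorded full path is looked up, with the recorded path itself as a fallback. The function find_object is hypothetical and is not part of vgprof.

```python
import os


def find_object(recorded_path, object_path):
    """Return the first existing candidate for a library named in a somap record."""
    rel = recorded_path.lstrip("/")
    for root in object_path.split(":"):
        if not root:
            continue  # ignore empty entries such as a trailing colon
        candidate = os.path.join(root, rel)
        if os.path.isfile(candidate):
            return candidate
    # Fall back to the recorded path itself (the recording machine's layout)
    return recorded_path if os.path.isfile(recorded_path) else None
```

Under this assumption, mounting the recording machine's filesystem at, say, /mnt/target and passing that directory as the object path would let the recorded library paths resolve on the viewing machine.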
All the options which write out summary gmon.out files are currently disabled.
VGprof skin client requests
VGprof implements a couple of client requests. These are:
VALGRIND_DUMP_PROFILE(zero)
- This dumps out a snapshot of the current profiling information. If
zero is true, then the counters are atomically set to zero after
writing out the file.
VALGRIND_ZERO_STATS()
- This zeros the current counters. This is useful for timing a
specific piece of code.
vgprof will take multiple vgmon.out files and accumulate the results, so
repeatedly using DUMP_PROFILE with zero set doesn't actually lose any information.
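The reasoning can be checked with a toy model. The Profile class below is a hypothetical stand-in for the skin's counters, not the real implementation; it only shows that snapshots taken with dump-and-zero add back up to the totals a single uninterrupted profile would have recorded.

```python
from collections import Counter


class Profile:
    """Toy stand-in for the skin's counters (not the real implementation)."""

    def __init__(self):
        self.counts = Counter()

    def sample(self, addr, n=1):
        self.counts[addr] += n

    def dump(self, zero):
        """Return a snapshot; reset the counters if zero is true."""
        snapshot = Counter(self.counts)
        if zero:
            self.counts.clear()
        return snapshot


def accumulate(snapshots):
    """Add snapshots together, as the viewer does with several input files."""
    total = Counter()
    for snap in snapshots:
        total.update(snap)  # Counter.update adds counts rather than replacing
    return total


prof = Profile()
prof.sample(0x1000, 3)
first = prof.dump(zero=True)
prof.sample(0x1000, 2)
prof.sample(0x2000, 5)
second = prof.dump(zero=True)
total = accumulate([first, second])
```

Here total holds 5 samples at each address, the same as if no intermediate dump had been taken.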