I recently needed to profile an algorithm for cache efficiency, and went in search of a tool that could measure cache misses on Linux. Along the way, I ended up learning about the Linux Perf tools. Here are my notes.
The Linux Perf tools provide a set of terminal-based tools for counting, sampling, and visualising both hardware and software performance events on Linux. This package of useful tools is wrapped up in a single CLI called perf.
The perf list command displays all of the available events that can be measured, such as cache-misses and stalled-cycles-frontend.
These events can be specified using the -e <event> argument to the majority of the other performance tools.
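As a rough sketch of how -e is typically used, the helpers below join several event names into the comma-separated form that -e accepts and pass them to perf stat. The function names join_events and count_events, and the program ./my_program in the usage note, are hypothetical; the events actually available depend on your hardware, so check perf list first.

```shell
# Join event names with commas, the form -e accepts for multiple events.
# (join_events is a hypothetical helper name, not part of perf.)
join_events() (
    IFS=,
    printf '%s\n' "$*"
)

# Count the given comma-separated events while running a command.
# The first argument is the event list; the rest is the command to run.
count_events() (
    events=$1
    shift
    perf stat -e "$events" -- "$@"
)
```

For example, count_events "$(join_events cache-references cache-misses)" ./my_program would count both events for a single run of the program.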
The perf stat command can be used to compute a running total of the number of times one or more events are triggered. This is particularly useful for gaining an overview of whether your application is generally compute bound, memory bound, stalled, etc. For example, running Maya 2014 using perf stat --detailed -- M2014 shows that the majority of its startup is spent stalled in idle cycles.
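Coming back to the original goal of measuring cache misses, here is a minimal sketch of turning perf stat counts into a miss percentage. It assumes the counts were written in CSV form (the -x, option) with the count in the first field of each line; the rest of the CSV layout varies between perf versions, so the script only looks at the first field and the event name. The function name miss_ratio and the file names are hypothetical.

```shell
# Read perf stat CSV output on stdin and print cache misses as a
# percentage of cache references. Prints nothing if the reference
# count is missing or not numeric (e.g. "<not supported>").
miss_ratio() {
    awk -F, '
        /cache-references/ { refs = $1 }
        /cache-misses/     { misses = $1 }
        END {
            if (refs + 0 > 0)
                printf "%.2f\n", 100 * misses / refs
        }
    '
}
```

One way to feed it is to run perf stat -x, -o stats.csv -e cache-references,cache-misses -- ./my_program and then miss_ratio < stats.csv. Writing to a file with -o keeps the counters separate from the program's own output.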
The perf record command can be used to sample a given set of performance counters at a given frequency, and is typically more useful for isolating performance hotspots in a program. For example, we can sample stalled cycles in Maya 2014 by running:
perf record -e stalled-cycles-frontend -F 1000 --call-graph -- M2014
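The --call-graph argument is optional, and specifies that full stack traces should be recorded. On recent versions of perf, --call-graph expects a recording mode such as fp or dwarf, and -g enables call graphs with the default mode.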
The results of perf record are written to a file called perf.data (unless specified otherwise). Whilst the file itself is difficult to parse by hand, the perf report tool provides a terminal-based interface for interactively visualising and navigating the results.
If stack traces are recorded, then the interface allows you to dive into the stack trace, and also zoom into the annotated source code if it can be found on the system.
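When the interactive interface is not convenient (for example, over a plain log or in a script), perf report can also print a text report with --stdio. The sketch below records stalled front-end cycles with call graphs and then prints such a report. The function name profile_stalls is hypothetical, and the event and sampling frequency are simply the ones used earlier in these notes.

```shell
# Record stalled front-end cycles with call graphs into a chosen file,
# then print a non-interactive text report from that file.
# The first argument is the output file; the rest is the command to profile.
profile_stalls() (
    data_file=$1
    shift
    perf record -e stalled-cycles-frontend -F 1000 -g -o "$data_file" -- "$@" &&
        perf report --stdio -i "$data_file"
)
```

For example, profile_stalls maya.data M2014 would profile Maya's startup and write the samples to maya.data rather than the default perf.data.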
It is also possible to perform a system-wide capture across all CPUs by using the perf record -a command (note that you need elevated privileges for this). For system-wide performance monitoring, the perf top command is also really useful: it provides a running sample of the most expensive processes and symbols, which is automatically refreshed at regular intervals.
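A common way to bound a system-wide capture is to give perf record a sleep command to run, so that recording stops when the sleep finishes. The sketch below does that for a chosen number of seconds. The function name record_system and the output file system.data are hypothetical, and the command will usually need to be run as root or with relaxed perf permissions.

```shell
# Capture samples with call graphs across all CPUs for a fixed duration.
# Recording ends when the sleep command exits.
record_system() (
    seconds=$1
    perf record -a -g -o system.data -- sleep "$seconds"
)
```

For example, record_system 10 captures ten seconds of system-wide activity, which can then be inspected with perf report -i system.data. For a live view of a specific event, perf top also accepts -e, as in perf top -e cache-misses.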