Simpleperf: The Premier CPU Profiling Tool
simpleperf is the native CPU profiling tool included in the Android NDK and AOSP. It uses the Linux perf_events subsystem to profile Android applications and native processes. Unlike Dalvik/ART-specific profilers, simpleperf provides a unified view of both Java/Kotlin and C/C++ execution, making it indispensable for AOSP platform engineers.
Running Simpleperf
You can capture a CPU profile of a running surfaceflinger process directly via adb shell. Profiling a system process such as surfaceflinger requires root access (for example, a userdebug build):
# Find the PID of surfaceflinger
adb shell pidof surfaceflinger
# Record a 10-second profile using simpleperf
adb shell simpleperf record -p <PID> -g --duration 10 -o /data/local/tmp/perf.data
# Pull the recorded profile to your host machine
adb pull /data/local/tmp/perf.data
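The three steps above can be combined into one helper. This is a minimal sketch, not part of simpleperf: the function name `profile_process` and its argument order are hypothetical, and it assumes `adb` is on the PATH with a single device attached.

```shell
# Hypothetical helper: record a CPU profile of a named process and pull it.
# Usage: profile_process <process-name> [seconds] [output-file]
profile_process() (
    name=$1
    seconds=${2:-10}
    out=${3:-perf.data}
    remote=/data/local/tmp/perf.data

    # pidof may print several PIDs; keep the first one and strip the CR
    # that some adb versions append to each line.
    pid=$(adb shell pidof "$name" | tr -d '\r' | awk 'NR == 1 { print $1 }')
    if [ -z "$pid" ]; then
        echo "profile_process: no running process named '$name'" >&2
        exit 1
    fi

    # Record with call graphs (-g), then copy the data file to the host.
    adb shell simpleperf record -p "$pid" -g --duration "$seconds" -o "$remote" &&
        adb pull "$remote" "$out"
)
```

For example, `profile_process surfaceflinger 10 sf.data` would record for 10 seconds and leave `sf.data` in the current directory. The body runs in a subshell, so its variables do not leak into the calling shell.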
Sampling vs. Instrumentation Profiling
How the data is collected determines how you should interpret the results:
- Sampling Profiling: Periodically interrupts the CPU (e.g., 1000 times a second) to record the current instruction pointer and call stack. It has low overhead and provides a statistical overview of where CPU time is spent. simpleperf relies heavily on sampling.
- Instrumentation Profiling: Injects code at the entry and exit points of functions. It provides precise call counts and durations but introduces massive overhead, which can distort real-world timing. Debug.startMethodTracing() in Android relies on instrumentation.
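Because sampling is statistical, a sample count converts to CPU time only approximately. The sketch below illustrates that arithmetic. Both function names are hypothetical. It assumes frequency-based sampling (for instance a rate set with simpleperf's `-f` option), so each sample stands for roughly 1/freq seconds of on-CPU time, and it uses the 1/sqrt(n) rule of thumb for the uncertainty of a count.

```shell
# Hypothetical helper: turn a sample count into an estimated CPU time.
# Assumes sampling at <freq_hz> samples per second while the thread runs.
# Usage: estimate_cpu_ms <samples> <freq_hz>
estimate_cpu_ms() {
    awk -v n="$1" -v f="$2" 'BEGIN { printf "%.1f\n", n * 1000 / f }'
}

# Rough relative uncertainty, in percent, of a count of <samples>,
# using the 1/sqrt(n) rule of thumb for counting statistics.
# Usage: sample_error_pct <samples>
sample_error_pct() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / sqrt(n) }'
}
```

For example, `estimate_cpu_ms 250 1000` prints `250.0`, and `sample_error_pct 25` prints `20.0`. A function that appears in only a few dozen samples should therefore be treated as a rough signal, not a precise measurement.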
CPU Hotspot Identification
A CPU hotspot is a specific function or block of code consuming a disproportionate amount of processor cycles. Once you have a perf.data file, you can analyze it to find these hotspots.
The simpleperf report command prints a text report of the hottest functions to the terminal:
adb shell simpleperf report -i /data/local/tmp/perf.data --sort dso,symbol
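Text reports can be long, so it can help to keep only rows above a chosen threshold. The filter below is a sketch: `hot_rows` is a hypothetical name, and it assumes each data row of the report begins with a percentage such as `12.34%`. The exact column layout may differ between simpleperf versions.

```shell
# Hypothetical helper: keep only report rows at or above a threshold.
# Assumes each data row starts with a percentage such as "12.34%";
# header lines and blank lines are dropped.
# Usage: <report command> | hot_rows <min-percent>
hot_rows() {
    awk -v min="$1" '
        $1 ~ /^[0-9]+(\.[0-9]+)?%$/ {
            pct = $1
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= min + 0) print
        }'
}
```

For example, piping the report command above through `hot_rows 5` would show only entries that account for at least 5% of the samples.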
In AOSP development, generating a flame graph is often the best way to visualize hotspots.
# Generate an HTML report, which includes a flame graph, using the simpleperf scripts
python simpleperf/scripts/report_html.py -i perf.data -o report.html
Lock Contention Analysis
Often, a process is slow not because it is computing but because its threads are blocked waiting for a lock. Lock contention occurs when multiple threads try to acquire the same mutex or monitor simultaneously.
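ART thread dumps can reveal lock contention in Java code. Sending SIGQUIT (signal 3) to a process makes ART dump the stack and state of every thread, in the same format as an ANR trace:
# Send SIGQUIT to request a thread dump (the process keeps running)
adb shell kill -3 <PID>
# Read the resulting trace (recent Android releases may write a timestamped file under /data/anr/ instead of traces.txt)
adb shell cat /data/anr/traces.txt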
Look for threads in the Blocked state waiting on a specific monitor held by another thread. In native code, you can use simpleperf to track sched:sched_switch events to see exactly when and why threads yield the CPU.
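Scanning a large dump by eye is tedious, so a small filter can list the blocked threads and the monitor each one is waiting on. This is a sketch: `blocked_threads` is a hypothetical name, and the header and `waiting to lock` line formats described in the comments are assumptions about the ART dump layout that may vary between releases.

```shell
# Hypothetical helper: list Java threads that are blocked on a monitor.
# Assumes the ART dump format, where a thread header looks like
#   "main" prio=5 tid=1 Blocked
# and the contended monitor appears on a later line such as
#   - waiting to lock <0x...> (a java.lang.Object) held by thread 12
# Usage: blocked_threads < trace-file
blocked_threads() {
    awk '
        /^"/ {                                # a new thread header
            blocked = ($NF == "Blocked")      # the state is the last field
            header = $0
            next
        }
        blocked && /waiting to lock/ {
            sub(/^[ \t]*-[ \t]*/, "")         # drop the leading "  - "
            print header
            print "    " $0
            blocked = 0
        }'
}
```

Each match prints the thread header followed by the monitor line, which names the thread holding the lock. For native code, a recording such as `adb shell simpleperf record -e sched:sched_switch -p <PID> -g --duration 10 -o /data/local/tmp/sched.data` captures a call stack at each context switch; tracepoint events typically require root.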
CPU Frequency Scaling Impact
Modern Android devices use dynamic voltage and frequency scaling (DVFS) to balance performance and battery life. CPU profiling results can vary wildly depending on the current CPU frequency.
If the device aggressively scales down the CPU during your test, a lightweight function might appear as a hotspot. To stabilize profiles, you can lock the CPU frequency using adb shell:
# Disable thermal throttling temporarily (root required; thermal-engine is the Qualcomm thermal daemon, so the service name may differ on other devices)
adb shell stop thermal-engine
# Force the highest CPU frequency for a specific core (e.g., CPU 0)
adb shell "echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"
Note: After profiling, always restore the default governor (usually schedutil) and restart the thermal service (adb shell start thermal-engine) to prevent overheating.
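Restoring the governor is easy to forget, so the switch and the restore can be wrapped around the profiling command itself. This is a minimal sketch: `with_performance_governor` is a hypothetical name, and it assumes a rooted device, `adb` on the PATH, and the cpu0 sysfs path used above. It does not touch the thermal service.

```shell
# Hypothetical helper: run a command with cpu0 on the performance governor,
# then put the previous governor back even if the command fails.
# Usage: with_performance_governor <command> [args...]
with_performance_governor() (
    gov=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

    # Remember the current governor; strip the CR some adb versions append.
    previous=$(adb shell cat "$gov" | tr -d '\r')
    if [ -z "$previous" ]; then
        echo "with_performance_governor: cannot read $gov" >&2
        exit 1
    fi

    adb shell "echo performance > $gov" || exit 1

    # Run the wrapped command and keep its exit status for the caller.
    status=0
    "$@" || status=$?

    adb shell "echo $previous > $gov"
    exit "$status"
)
```

For example, `with_performance_governor adb shell simpleperf record -p <PID> -g --duration 10 -o /data/local/tmp/perf.data` records with the governor pinned and then restores whatever governor was active beforehand, without assuming it was schedutil.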