GPU Counter Profiling
When optimizing graphics in AOSP, you must look beyond the CPU. The GPU operates asynchronously, and CPU-side timings do not reflect actual rendering costs. GPU hardware counters provide precise metrics on how the GPU utilizes its execution units, texture cache, and bandwidth.
Each System-on-Chip (SoC) vendor provides specialized tools to read these hardware counters:
- Qualcomm Snapdragon Profiler: Crucial for Adreno GPUs.
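- Arm Mobile Studio: Essential for Mali GPUs. Its Streamline profiler reads the hardware counters, while Graphics Analyzer (formerly Mali Graphics Debugger) handles API-level debugging.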
- PowerVR Graphics SDK: Used for Imagination GPUs.
These tools connect over adb and sample the GPU's internal counter registers, allowing you to track metrics such as ALU utilization, texture fetch stalls, and memory bandwidth saturation in real time.
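Most of the metrics these tools chart are derived from raw, cumulative counters sampled at intervals. The sketch below shows only that arithmetic. The function names and any numbers passed to them are hypothetical and are not part of any vendor tool.

```shell
#!/bin/sh
# Hypothetical helpers: turn deltas of cumulative counters into the
# derived metrics a profiler would typically chart.

# utilization_pct BUSY_DELTA TOTAL_DELTA
# Percentage of cycles in which a unit (for example the ALUs) was busy.
utilization_pct() {
  awk -v busy="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "n/a"; exit }
    printf "%.1f\n", 100 * busy / total
  }'
}

# bandwidth_mb_s BYTES_BEFORE BYTES_AFTER INTERVAL_MS
# Average memory traffic between two samples in MB/s (1 MB = 10^6 bytes).
bandwidth_mb_s() {
  awk -v b0="$1" -v b1="$2" -v ms="$3" 'BEGIN {
    if (ms <= 0) { print "n/a"; exit }
    printf "%.1f\n", (b1 - b0) * 1000 / ms / 1e6
  }'
}
```

For instance, `utilization_pct 450 1000` reports 45.0, and `bandwidth_mb_s 0 50000000 10` reports 5000.0, which you would then compare against the SoC's documented peak bandwidth to judge saturation.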
RenderDoc for GPU Frame Capture
RenderDoc is a widely used open-source tool for frame capture and graphics debugging. It captures a single frame of an OpenGL ES or Vulkan application and lets you step through each draw call.
To use RenderDoc on an Android device:
- Ensure the target application has android:debuggable="true" in its manifest, or the device is rooted.
- Launch the RenderDoc host application.
- Connect to the Android device via the Remote Server tab.
- Launch the target package through RenderDoc.
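Before launching a capture, it can save time to confirm that the package is actually debuggable. The helper below is a sketch that scans `dumpsys package` output for the DEBUGGABLE flag. The function name is hypothetical, and the assumed flags layout (`flags=[ ... ]` or `pkgFlags=[ ... ]`) varies between Android releases, so treat a negative result with caution.

```shell
#!/bin/sh
# is_debuggable: reads `dumpsys package <pkg>` output on stdin and
# succeeds (exit status 0) if a flags line lists DEBUGGABLE.
# Assumption: flags are printed as "flags=[ ... ]" or "pkgFlags=[ ... ]".
is_debuggable() {
  grep -Eq '[fF]lags=\[[^]]*DEBUGGABLE'
}

# Usage against a device (com.example.app is a placeholder package name):
#   adb shell dumpsys package com.example.app | is_debuggable \
#     && echo "debuggable" || echo "not debuggable (or flags not found)"
```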
RenderDoc allows you to inspect:
- Pipeline State: View the exact shaders, blending modes, and depth tests active for a draw call.
- Textures and Buffers: Inspect the contents of render targets, uniform buffers, and vertex data.
Shader Bottleneck Identification
Shaders are small programs running directly on the GPU cores. Because a fragment shader runs for every covered pixel, a poorly optimized one can dominate the GPU frame time.
Identifying shader bottlenecks involves checking if the GPU is:
- ALU Bound: The shader performs too many complex mathematical operations. Mitigation involves simplifying math or using lower precision types like mediump instead of highp.
- Texture/Memory Bound: The shader is waiting on memory reads. Mitigation involves reducing texture resolution, enabling texture compression (e.g., ASTC), or optimizing UV access patterns.
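The two cases above can be turned into a simple triage rule once you have read the relevant counters off a profiler. The sketch below is only an illustration: the function name, the choice of inputs (ALU busy percentage and texture stall percentage, as integers), and the 70% threshold are all assumptions, not vendor guidance.

```shell
#!/bin/sh
# classify_bottleneck ALU_BUSY_PCT TEX_STALL_PCT
# Hypothetical heuristic over two integer percentages.
# The 70% threshold is arbitrary and chosen for illustration only.
classify_bottleneck() {
  alu=$1
  tex=$2
  threshold=70
  if [ "$alu" -ge "$threshold" ] && [ "$alu" -ge "$tex" ]; then
    echo "alu-bound"       # simplify math, consider mediump
  elif [ "$tex" -ge "$threshold" ]; then
    echo "texture-bound"   # shrink or compress textures, improve locality
  else
    echo "unclear"         # neither counter dominates at this threshold
  fi
}
```

A call such as `classify_bottleneck 85 30` yields `alu-bound`, while `classify_bottleneck 40 90` yields `texture-bound`. Real workloads are often mixed, so a result of `unclear` means the counters need a closer look rather than that the shader is fine.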
Vendor toolchains typically provide a shader analysis view or offline compiler that estimates the cycle cost of GLSL or SPIR-V shaders, in some cases line by line.
GPU Memory Usage Analysis
Excessive GPU memory usage leads to out-of-memory (OOM) crashes and system-wide jank as the system comes under memory pressure and starts reclaiming pages or killing processes. In AOSP, you can track graphics memory allocated through the GraphicBuffer allocator (gralloc).
To inspect system-wide graphics memory allocations, use the dumpsys command against the meminfo or SurfaceFlinger services:
# Check overall memory info, look for "Graphics" and "GL mtrack"
adb shell dumpsys meminfo
# Check SurfaceFlinger's specific GraphicBuffer allocations
adb shell dumpsys SurfaceFlinger
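The meminfo output is long, so it helps to filter it down to the graphics rows. The helpers below are a sketch with hypothetical function names. The summing function assumes the system-wide category rows look like `12,345K: GL mtrack`, which is an assumption about the output layout and may differ between Android releases.

```shell
#!/bin/sh
# gfx_mem_lines: keep only graphics-related rows from `dumpsys meminfo`.
# The "GL mtrack" pattern also matches "EGL mtrack" as a substring.
gfx_mem_lines() {
  grep -E 'Graphics|GL mtrack|Gfx dev'
}

# gfx_category_total_kb: sum rows of the assumed form "  12,345K: GL mtrack"
# for the three graphics categories and print the total in kB.
gfx_category_total_kb() {
  awk '
    /^[ \t]*[0-9,]+K: (EGL mtrack|GL mtrack|Gfx dev)[ \t\r]*$/ {
      v = $1
      gsub(/[,K:]/, "", v)   # "12,345K:" -> "12345"
      total += v
    }
    END { print total + 0 }
  '
}

# Usage against a device:
#   adb shell dumpsys meminfo | gfx_mem_lines
#   adb shell dumpsys meminfo | gfx_category_total_kb
```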
A common AOSP optimization strategy involves ensuring UI elements do not allocate overly large textures and that off-screen buffers are promptly destroyed or reused.
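To judge whether a texture is "overly large", it helps to estimate its footprint before it is allocated. The sketch below compares an uncompressed RGBA8888 texture with an ASTC-compressed one. The function names are hypothetical, and the figures cover the base level only, without mipmaps or driver padding.

```shell
#!/bin/sh
# rgba8888_bytes WIDTH HEIGHT
# Uncompressed 32-bit texture: 4 bytes per texel.
rgba8888_bytes() {
  echo $(( $1 * $2 * 4 ))
}

# astc_bytes WIDTH HEIGHT BLOCK_W BLOCK_H
# ASTC stores every block in 128 bits (16 bytes) regardless of the block
# footprint, so the size is the block count (rounded up per axis) times 16.
astc_bytes() {
  blocks_x=$(( ($1 + $3 - 1) / $3 ))
  blocks_y=$(( ($2 + $4 - 1) / $4 ))
  echo $(( blocks_x * blocks_y * 16 ))
}
```

For a 2048x2048 texture, `rgba8888_bytes 2048 2048` gives 16777216 bytes (16 MiB), `astc_bytes 2048 2048 4 4` gives 4194304 bytes (4 MiB), and `astc_bytes 2048 2048 8 8` gives 1048576 bytes (1 MiB). A full mipmap chain adds roughly one third on top of each figure.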