AOSP Expert & Production Engineering

Trace Analysis Workflow

End-to-End Performance Debugging

Performance problems in AOSP are rarely solved by looking at a single metric. A dropped frame or a slow app launch is usually the result of a chain of events spanning the application, the system server, the kernel, and the display hardware.

Mastering the trace analysis workflow means knowing how to combine tools like Perfetto and Winscope to build a complete narrative of what went wrong.

Combining Perfetto and Winscope

The most complete debugging setup captures a Perfetto trace (for CPU, threads, and timings) and a Winscope trace (for window states and layer hierarchy) at the same time.

Modern versions of ui.perfetto.dev support importing Winscope .pb files alongside Perfetto traces. When you do this, the timelines are synchronized.

  1. You can see a long draw call in Perfetto.
  2. You click on that timestamp.
  3. The Winscope panel immediately updates to show you the exact state of the UI layers at the moment that draw call started.

This correlation ties what a thread was executing to the UI state it was producing at that moment, which is usually what explains why the work happened at all.
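The lookup behind that correlation can be sketched as a search for the latest state snapshot taken at or before a given timestamp. This is only an illustration of the idea, not how either tool implements it: the function name `snapshot_at` and all timestamps are hypothetical, and it assumes both traces use the same clock.

```python
from bisect import bisect_right


def snapshot_at(snapshot_ts_ns, query_ts_ns):
    """Return the index of the latest snapshot taken at or before query_ts_ns.

    snapshot_ts_ns must be sorted in ascending order. Returns None when the
    query time is earlier than the first snapshot.
    """
    i = bisect_right(snapshot_ts_ns, query_ts_ns)
    return i - 1 if i > 0 else None


# Hypothetical layer-snapshot timestamps in nanoseconds (made up for illustration).
snapshots = [1_000, 17_600, 34_200, 50_900]

# Hypothetical start time of a long draw slice seen in the Perfetto trace.
draw_start = 36_000

print(snapshot_at(snapshots, draw_start))  # 2
```

A binary search keeps the lookup cheap even when a trace contains thousands of snapshots.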

Frame Timeline in Perfetto

One of the most powerful features in modern Perfetto for UI analysis is the "Frame Timeline". This track simplifies the incredibly complex choreography of Android rendering.

Drawing a frame involves three main steps:

  1. App: The application's UI thread measures, lays out, and records the drawing commands for the UI elements.
  2. RenderThread: The hardware-accelerated rendering thread takes those drawing commands and issues them to the GPU, which renders them into a buffer.
  3. SurfaceFlinger: The system compositor takes the buffers from all apps and the system UI, and combines them into the final image sent to the display.

The Frame Timeline visually connects these three phases.

Expected Timeline vs Actual Timeline

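For every frame, the Frame Timeline shows an "Expected" slice derived from the display's VSYNC period (roughly 16.7ms on a 60Hz display). It marks the window in which the frame should be completed. Directly below the Expected track is the "Actual" track, which shows how long the frame really took.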

  • If the Actual block finishes before the Expected block ends, the frame is on time (Green).
  • If the Actual block extends past the Expected block, the frame missed its deadline and shows up as jank (Red), which is what is usually called a dropped frame.

When you click on a red frame, Perfetto draws arrows connecting the App's work, the RenderThread's work, and SurfaceFlinger's work for that specific frame.
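The on-time rule described above comes down to simple arithmetic. The following sketch assumes each frame is described by a start timestamp and a duration for both its expected and actual slices; the function names and sample values are hypothetical and only illustrate the comparison.

```python
def frame_budget_ms(refresh_rate_hz):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_rate_hz


def is_on_time(expected_ts, expected_dur, actual_ts, actual_dur):
    """A frame is on time when its actual slice ends no later than the expected one."""
    return actual_ts + actual_dur <= expected_ts + expected_dur


print(round(frame_budget_ms(60), 2))   # 16.67
print(round(frame_budget_ms(90), 2))   # 11.11
print(round(frame_budget_ms(120), 2))  # 8.33

# Hypothetical frames: (expected_ts, expected_dur, actual_ts, actual_dur) in ms.
frames = [
    (0.0, 16.7, 0.0, 12.0),    # finishes early
    (16.7, 16.7, 16.7, 25.0),  # overruns its deadline
]
for frame in frames:
    print("on time" if is_on_time(*frame) else "jank")
```

Note how the budget shrinks as the refresh rate rises: work that fits comfortably at 60Hz can miss the deadline at 120Hz.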

Identifying the Root Cause of Jank

When you identify a dropped frame in the Frame Timeline, your next step is to figure out which component missed its deadline.

1. Long App Draws (Main Thread Blocked)

Follow the arrow from the dropped frame to the App's UI thread.

  • Is the Choreographer#doFrame slice very long?
  • Look inside it. Is the thread doing heavy JSON parsing, inflating layouts (inflate), reading from disk (orange Uninterruptible Sleep state), or being paused by garbage collection (GC)?
  • Solution: Move I/O or heavy computation to a background thread. Optimize layout hierarchies to speed up inflation.
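One way to quantify what the UI thread was doing during a long doFrame is to add up how much of the slice was spent in each thread state. The sketch below assumes the thread states have already been exported as (start, duration, state) intervals; the function name, the state labels, and the numbers are hypothetical.

```python
from collections import defaultdict


def state_breakdown(slice_ts, slice_dur, state_intervals):
    """Sum the time spent in each thread state within [slice_ts, slice_ts + slice_dur)."""
    slice_end = slice_ts + slice_dur
    totals = defaultdict(int)
    for ts, dur, state in state_intervals:
        # Only count the part of the interval that overlaps the slice.
        overlap = min(ts + dur, slice_end) - max(ts, slice_ts)
        if overlap > 0:
            totals[state] += overlap
    return dict(totals)


# Hypothetical UI-thread states as (ts, dur, state), in milliseconds.
states = [
    (90, 15, "Running"),
    (105, 20, "Uninterruptible Sleep"),
    (125, 5, "Runnable"),
    (130, 20, "Running"),
]

# Hypothetical doFrame slice starting at 100 ms and lasting 40 ms.
breakdown = state_breakdown(100, 40, states)
for state, total in sorted(breakdown.items(), key=lambda kv: -kv[1]):
    print(f"{state}: {total} ms")
```

In this made-up example half of the slice is spent in uninterruptible sleep, which would point toward blocking I/O rather than heavy computation.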

2. Long RenderThread Times (GPU Bound)

If the App thread finished quickly but the RenderThread took too long, the UI is probably too complex for the GPU to render within the VSYNC window.

  • Are there many overlapping layers causing overdraw?
  • Are you using expensive rendering effects like blurs, complex paths, or offscreen buffers (saveLayer, hardware layer updates)?
  • Solution: Simplify the UI, reduce overdraw, or flatten view hierarchies.

3. Scheduling Delays (CPU Starvation)

Sometimes the doFrame slice is short, but it started very late, missing the VSYNC deadline entirely.

  • Look at the state of the UI thread just before doFrame starts. Is it marked as "Runnable" (Blue) for a long time?
  • This means the thread wanted to run, but the kernel scheduler didn't give it a CPU core.
  • Look at the CPU tracks at the top of the trace. Are other background processes (like Play Store updating apps, or dex2oat compiling code) hogging all the CPU cores?
  • Solution: In AOSP, this often requires tuning cgroups, adjusting thread priorities (nice values), or ensuring UI threads are pinned to big cores via the cpuset subsystem.
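The three cases above can be turned into a rough first-pass triage. The sketch below is an illustrative heuristic, not a Perfetto feature: it treats the three phases as if they ran back to back and blames the largest one. The function name, category labels, and sample numbers are all hypothetical.

```python
def likely_jank_cause(budget_ms, runnable_before_ms, do_frame_ms, render_thread_ms):
    """Name the largest contributor when a frame's total time exceeds its budget.

    Simplification: the phases are summed as if strictly sequential, which
    ignores any overlap between the UI thread and the RenderThread.
    """
    phases = {
        "scheduling delay": runnable_before_ms,
        "long app draw": do_frame_ms,
        "long RenderThread": render_thread_ms,
    }
    if sum(phases.values()) <= budget_ms:
        return "within budget"
    return max(phases, key=phases.get)


budget = 1000.0 / 60  # 60Hz display

# Hypothetical measurements in milliseconds: (runnable, doFrame, RenderThread).
print(likely_jank_cause(budget, 1, 30, 4))   # long app draw
print(likely_jank_cause(budget, 1, 4, 25))   # long RenderThread
print(likely_jank_cause(budget, 14, 3, 3))   # scheduling delay
print(likely_jank_cause(budget, 1, 5, 5))    # within budget
```

A result like this only tells you where to look first; the actual cause still has to be confirmed in the trace.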

End-to-End Trace Analysis Example

Let's walk through debugging a "Slow App Launch".

  1. Capture: Start recording a Perfetto trace and launch the app from the launcher. Stop the trace when the app is fully drawn.
  2. Locate: Search the trace for the bindApplication slice in the app's process track. This marks the start of the app's initialization.
  3. Analyze System Server: Look at the system_server track right before bindApplication. You will see ActivityManager handling the intent and PackageManager looking up the app. Is there a delay here? Perhaps system_server is blocked on a slow disk read?
  4. Analyze App Startup: Go back to the app's main thread. Look at the time between bindApplication and the first Choreographer#doFrame.
    • Is Application.onCreate() taking a long time initializing SDKs?
    • Is Activity.onCreate() inflating a massive XML layout?
  5. Analyze First Frame: Look at the first draw call. Is it taking 100ms instead of 16ms because it has to load bitmaps from disk for the first time?
  6. Correlate with Winscope: If the app claims it drew the frame quickly but the screen was still black, open the Winscope trace. Look at the SurfaceFlinger layers. Did the StartingWindow (splash screen) get removed before the app's first buffer was actually ready to be composited?
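Once the key timestamps from steps 2 to 5 have been read off the trace, splitting the launch into phases shows where to dig first. This is a minimal sketch with hypothetical names and made-up timestamps; it assumes you have noted the launch start, the start of bindApplication, the start of the first Choreographer#doFrame, and the end of the first frame.

```python
def launch_phases(launch_start, bind_application_start, first_do_frame_start, first_frame_end):
    """Split an app launch into three phases, in the same unit as the inputs."""
    return {
        "system_server": bind_application_start - launch_start,
        "app_startup": first_do_frame_start - bind_application_start,
        "first_frame": first_frame_end - first_do_frame_start,
    }


# Hypothetical timestamps in milliseconds, relative to the launch intent.
phases = launch_phases(0, 120, 540, 640)
for name, duration in phases.items():
    print(f"{name}: {duration} ms")

slowest = max(phases, key=phases.get)
print(f"slowest phase: {slowest}")  # slowest phase: app_startup
```

With these example numbers most of the launch is spent between bindApplication and the first frame, so Application.onCreate() and Activity.onCreate() would be the first places to inspect.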

By methodically following the execution flow across process boundaries, you can pinpoint the exact source of performance degradation in Android.