Android Architecture: From Touch Input to Pixel on Screen (Complete AOSP Journey)

Follow a single touch event through the entire Android operating system—from the Linux kernel and input drivers to the Android Framework, Zygote, Activity Manager, rendering pipeline, SurfaceFlinger, Hardware Composer (HWC), and finally the display. This end-to-end AOSP guide explains how Android transforms a simple screen tap into visible pixels while uncovering the architecture, IPC mechanisms, and system components that power every user interaction.

You tap an icon on your phone. Less than a second later, an application fills the screen. It feels instantaneous, almost like magic. But it's not magic. It's a carefully choreographed dance involving dozens of components, spanning every layer of the Android operating system.

In this final article of our AOSP Insider series, we're going to trace that single tap through the entire stack. We'll follow the signal from the moment your finger makes contact with the glass, down into the Linux kernel, up through the native daemons and framework services, into a brand new application process, and finally back down to the display hardware that lights up the pixels under your finger.

This is the full story. Let's begin.

Section 1: It All Starts with a Spark: The Kernel's Input Subsystem

Every interaction with your device begins as a physical event. So, when your finger touches the screen, what is the very first piece of software to know about it? The answer lies deep inside the Linux kernel, long before any Android-specific code gets involved.

The touchscreen hardware itself is just a sensor grid. When you touch it, it detects a change in capacitance and calculates a set of coordinates. This hardware then needs to tell the CPU that something has happened. It does this by asserting an Interrupt Request, or IRQ. This is a physical signal that forces the CPU to stop what it's doing and execute a special piece of code: the kernel's input driver for that specific touch controller.

The driver's job is to read the raw coordinate data from the hardware and translate it into a standardized format. This is where the Linux evdev (event device) interface comes in. The driver packages the touch data into a series of small, generic event structures. It doesn't know about "taps" or "swipes"; it only knows about absolute X/Y positions and whether a finger is currently down or up. It reports these events to the kernel's input core, which exposes them to userspace through a character device node such as /dev/input/event2 (the number varies from device to device).

This is the kernel's one and only job in this process: to act as a universal translator for hardware.

The hardware fires an IRQ -> the kernel driver translates it -> a standardized event lands in a device file ready for userspace.

You can see this for yourself. On a physical device with root access, you can use the getevent tool to dump these raw kernel events directly.

# First, find your touchscreen device. "name" will vary.
$ adb shell getevent -il
...
add device 7: /dev/input/event2
  name:     "touchscreen"
...

# Now, listen to that device
$ adb shell getevent -lt /dev/input/event2

Now, tap the screen once. You'll see output similar to this:

[   4313.123456] EV_ABS       ABS_MT_TRACKING_ID   00000abc
[   4313.123456] EV_ABS       ABS_MT_POSITION_X    00000546
[   4313.123456] EV_ABS       ABS_MT_POSITION_Y    000009a8
[   4313.123456] EV_KEY       BTN_TOUCH            DOWN
[   4313.123456] EV_SYN       SYN_REPORT           00000000
[   4313.234567] EV_ABS       ABS_MT_TRACKING_ID   ffffffff
[   4313.234567] EV_KEY       BTN_TOUCH            UP
[   4313.234567] EV_SYN       SYN_REPORT           00000000

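This is the raw language of input. We see X and Y coordinates, a "touch down" event, and then a "touch up" event. The values are printed in hexadecimal: X is 0x546 (1350) and Y is 0x9a8 (2472) in the touch controller's own coordinate space, and a tracking ID of ffffffff (-1) means the contact has lifted. The SYN_REPORT acts as a delimiter, telling any reader that a complete packet of information has been sent.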
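To make the packet format concrete, here is a minimal Python sketch that decodes the binary records a reader would pull out of the device node. It assumes the 64-bit layout of struct input_event (24 bytes per event); the helper names decode and split_packets are made up for this illustration, and the sample bytes are rebuilt from the touch-down packet shown above rather than read from a real device.

```python
import struct

# Assumed layout of struct input_event on a 64-bit kernel: two 8-byte
# timestamp fields (seconds, microseconds), u16 type, u16 code, s32 value.
EVENT_FORMAT = "=qqHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes

# Constants from linux/input-event-codes.h
EV_SYN, EV_KEY, EV_ABS = 0x00, 0x01, 0x03
SYN_REPORT = 0x00
BTN_TOUCH = 0x14A
ABS_MT_POSITION_X, ABS_MT_POSITION_Y, ABS_MT_TRACKING_ID = 0x35, 0x36, 0x39


def decode(buffer):
    """Unpack a byte buffer into (timestamp, type, code, value) tuples."""
    events = []
    for offset in range(0, len(buffer) - EVENT_SIZE + 1, EVENT_SIZE):
        sec, usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, buffer, offset)
        events.append((sec + usec / 1_000_000, etype, code, value))
    return events


def split_packets(events):
    """Group events into packets; SYN_REPORT closes each packet."""
    packets, current = [], []
    for event in events:
        _, etype, code, _ = event
        if etype == EV_SYN and code == SYN_REPORT:
            packets.append(current)
            current = []
        else:
            current.append(event)
    return packets


# Rebuild the touch-down packet from the getevent output above as raw bytes.
raw = b"".join(
    struct.pack(EVENT_FORMAT, 4313, 123456, etype, code, value)
    for etype, code, value in [
        (EV_ABS, ABS_MT_TRACKING_ID, 0xABC),
        (EV_ABS, ABS_MT_POSITION_X, 0x546),
        (EV_ABS, ABS_MT_POSITION_Y, 0x9A8),
        (EV_KEY, BTN_TOUCH, 1),
        (EV_SYN, SYN_REPORT, 0),
    ]
)
packets = split_packets(decode(raw))
for _, etype, code, value in packets[0]:
    print(f"type={etype:#04x} code={code:#06x} value={value}")
```

The SYN_REPORT itself carries no data, so the sketch drops it and keeps only the four events that describe the touch.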

Did You Know: The evdev interface isn't Android-specific. getevent itself ships with Android, but the same event stream is available on any Linux machine, where a tool such as evtest will dump events from the mouse on your desktop. Android is just one consumer of a standard kernel interface.

This raw stream of numbers in a device file is the kernel's final output. It's clean, standardized, and completely hardware-agnostic. But it's also useless until a userspace process reads and interprets it. That is the framework's first point of entry.

Section 2: Crossing the Border: From Kernel to Framework

We have a stream of events sitting in /dev/input/eventX. Who is on the other side of that file, waiting to read it? This is where the system_server process comes into play. It's the heart of the Android framework, and it contains the native services responsible for input.

The problem is that the raw evdev format is too low-level for the rest of Android. The system needs something more structured, like an object that represents a complete touch gesture. This translation is a two-step process handled by a pair of components running on a dedicated thread inside system_server: InputReader and InputMapper.

First, the InputReader thread, through its EventHub component, opens all the /dev/input/eventX device nodes and uses the epoll system call to efficiently monitor them for new data. When the kernel makes a new event available, the thread wakes up and reads it. Its only job at this stage is to read the raw struct input_event data as fast as possible.

Second, InputReader passes this raw data to the appropriate InputMapper. There are different mappers for different device types (KeyboardInputMapper, TouchInputMapper, etc.). The TouchInputMapper is the component that understands the evdev protocol for touchscreens. It collects the raw EV_ABS and EV_KEY events, and once it sees a SYN_REPORT, it assembles them into a higher-level Android MotionEvent. This is the object that represents the user's touch.

InputReader polls the kernel device file via epoll(). The TouchInputMapper assembles raw events into a structured MotionEvent, which is queued for the InputDispatcher.
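The "sleep until any device has data" pattern is easy to try outside Android. The sketch below is not the AOSP code: it uses Python's selectors module (which picks epoll on Linux), and two pipes stand in for two device nodes. The function name wait_for_data is hypothetical.

```python
import os
import selectors


def wait_for_data(read_fds, timeout=1.0):
    """Block until at least one descriptor is readable; return {fd: bytes}."""
    results = {}
    with selectors.DefaultSelector() as selector:
        for fd in read_fds:
            selector.register(fd, selectors.EVENT_READ)
        # select() sleeps without burning CPU until data arrives or the timeout expires.
        for key, _ in selector.select(timeout):
            results[key.fd] = os.read(key.fd, 4096)
    return results


# Two pipes stand in for a touchscreen node and a keyboard node.
touch_r, touch_w = os.pipe()
keys_r, keys_w = os.pipe()

# Only the "touchscreen" produces data.
os.write(touch_w, b"touch-event")

ready = wait_for_data([touch_r, keys_r])
print(ready)

for fd in (touch_r, touch_w, keys_r, keys_w):
    os.close(fd)
```

Only the descriptor that actually has data shows up in the result, which is why one thread can watch every input device at once.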

If you look at the AOSP source for TouchInputMapper.cpp, you'll find the exact logic that processes the event codes we saw from getevent. There are large switch statements that handle EV_ABS and EV_KEY, accumulating state about finger positions. This is where the raw numbers get their meaning.
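The accumulate-then-flush idea can be sketched in a few lines of Python. This is a simplified single-finger model, not a port of TouchInputMapper.cpp: the TouchAccumulator and Motion names are invented for the example, and real devices also track slots, pressure, and multiple contacts.

```python
from dataclasses import dataclass

# Constants from linux/input-event-codes.h
EV_SYN, EV_KEY, EV_ABS = 0x00, 0x01, 0x03
SYN_REPORT = 0x00
BTN_TOUCH = 0x14A
ABS_MT_POSITION_X, ABS_MT_POSITION_Y, ABS_MT_TRACKING_ID = 0x35, 0x36, 0x39


@dataclass(frozen=True)
class Motion:
    """Hypothetical stand-in for a MotionEvent."""
    action: str
    x: int
    y: int


class TouchAccumulator:
    """Collects raw events and emits one Motion per SYN_REPORT."""

    def __init__(self):
        self.x = 0
        self.y = 0
        self.touching = False
        self.was_touching = False

    def process(self, etype, code, value):
        if etype == EV_ABS and code == ABS_MT_POSITION_X:
            self.x = value
        elif etype == EV_ABS and code == ABS_MT_POSITION_Y:
            self.y = value
        elif etype == EV_KEY and code == BTN_TOUCH:
            self.touching = bool(value)
        elif etype == EV_SYN and code == SYN_REPORT:
            return self._flush()
        return None  # packet not complete yet (or an event this sketch ignores)

    def _flush(self):
        if self.touching and not self.was_touching:
            action = "DOWN"
        elif self.touching:
            action = "MOVE"
        elif self.was_touching:
            action = "UP"
        else:
            return None
        self.was_touching = self.touching
        return Motion(action, self.x, self.y)


# The same stream getevent printed earlier, as (type, code, value) triples.
stream = [
    (EV_ABS, ABS_MT_TRACKING_ID, 0xABC),
    (EV_ABS, ABS_MT_POSITION_X, 0x546),
    (EV_ABS, ABS_MT_POSITION_Y, 0x9A8),
    (EV_KEY, BTN_TOUCH, 1),
    (EV_SYN, SYN_REPORT, 0),
    (EV_ABS, ABS_MT_TRACKING_ID, -1),
    (EV_KEY, BTN_TOUCH, 0),
    (EV_SYN, SYN_REPORT, 0),
]

accumulator = TouchAccumulator()
motions = [m for m in (accumulator.process(*event) for event in stream) if m]
print(motions)
```

Eight raw events collapse into two motions, a DOWN and an UP at the same coordinates, which is exactly what a tap looks like to the rest of the system.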

Tip: An easy way to remember the difference is with an analogy. InputReader is the mailroom clerk who just opens envelopes and sorts the raw mail. InputDispatcher, which we meet next, is the executive assistant who reads the sorted mail and decides which department (app window) needs to see it.

So now we have a well-formed Android MotionEvent. The system knows the precise coordinates and action of the touch. But it still faces a critical problem: which of the many windows on the screen should receive this event?

Section 3: The Dispatcher: Who Gets the Touch?

On a typical screen, you might have a status bar, a home screen launcher, and maybe a notification shade partially visible. When a touch occurs, how does the system know it was for the app icon on the launcher and not the wallpaper behind it? This crucial decision is the responsibility of the InputDispatcher.

The InputDispatcher runs on another dedicated thread within system_server. It takes the MotionEvent from the queue filled by InputReader and begins the process of finding a target. It can't do this alone, however. It needs a map of the screen. That map is maintained by another core service: the WindowManagerService (WMS).

WMS knows about every window on the screen: its position, its size, its visibility, and its Z-order (which windows are on top of others). Here's the sequence of events:

InputDispatcher gets a MotionEvent with coordinates (X, Y).
It determines which window is the topmost touchable target at those coordinates. Rather than querying WMS for every event, it hit-tests against a list of window handles that WMS keeps up to date.
The result is a token representing the target window: in our case, the main window of the Launcher app.
InputDispatcher maintains a list of InputChannels for every window. An InputChannel is a socket pair that forms a direct, private communication pipe to a specific application.
It finds the InputChannel for the Launcher's window and writes the MotionEvent data into it.
The dispatcher then waits for the application to signal that it has finished processing the event. If a response doesn't come back within a few seconds (typically 5), the InputDispatcher concludes the app is frozen and triggers an Application Not Responding (ANR) dialog.
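The target-finding step boils down to a hit test in Z-order. Here is a minimal sketch, assuming each window is a simple rectangle with a Z value and a touchable flag. The Window class, the find_touch_target function, and the screen dimensions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Window:
    name: str
    left: int
    top: int
    right: int
    bottom: int
    z: int
    touchable: bool = True

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def find_touch_target(windows, x, y):
    """Return the topmost touchable window under (x, y), or None."""
    for window in sorted(windows, key=lambda w: w.z, reverse=True):
        if window.touchable and window.contains(x, y):
            return window
    return None


windows = [
    Window("Wallpaper", 0, 0, 1080, 2400, z=0, touchable=False),
    Window("Launcher", 0, 0, 1080, 2400, z=1),
    Window("StatusBar", 0, 0, 1080, 100, z=2),
]

print(find_touch_target(windows, 540, 1200).name)  # a tap in the middle of the screen
print(find_touch_target(windows, 540, 40).name)    # a tap near the top edge
```

The wallpaper covers the whole screen but never wins, because it is both at the bottom of the stack and not touchable.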

InputDispatcher consults WMS to find the correct target (App B) and delivers the event directly through the InputChannel socket. App A never sees the event.

You can inspect this state using dumpsys. Running dumpsys input will show you the state of the InputDispatcher, including a list of all open InputChannels to various windows. Running dumpsys window will show you the complete list of windows that WMS is managing, which you can correlate with the input channels.

The system pushes events to a specific app through this dedicated channel. An app is completely blind to any events not explicitly sent to it. This is a fundamental security and performance feature.
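You can model the channel and the ANR timeout with an ordinary socket pair. This sketch is an illustration only: the message format, the "finished" acknowledgement, and the dispatch function are invented, and it uses Python's default stream sockets rather than whatever socket type the platform configures. The frozen-app case uses a short timeout so the demo finishes quickly.

```python
import socket
import threading

ANR_TIMEOUT_SECONDS = 5.0  # the typical figure quoted above


def dispatch(channel, payload, timeout):
    """Send an event, then wait for the app's acknowledgement.

    Returns True if the app answered in time, False if it timed out.
    """
    channel.sendall(payload)
    channel.settimeout(timeout)
    try:
        return channel.recv(16) == b"finished"
    except socket.timeout:
        return False


def responsive_app(channel):
    """App side: read the event, handle it, acknowledge."""
    channel.recv(64)
    channel.sendall(b"finished")


# Case 1: a healthy app acknowledges the event.
dispatcher_end, app_end = socket.socketpair()
worker = threading.Thread(target=responsive_app, args=(app_end,))
worker.start()
handled = dispatch(dispatcher_end, b"DOWN 1350 2472", ANR_TIMEOUT_SECONDS)
worker.join()

# Case 2: a frozen app never reads its end of the channel.
frozen_dispatcher_end, frozen_app_end = socket.socketpair()
timed_out = not dispatch(frozen_dispatcher_end, b"DOWN 1350 2472", timeout=0.1)

print(handled, timed_out)

for sock in (dispatcher_end, app_end, frozen_dispatcher_end, frozen_app_end):
    sock.close()
```

Because each window has its own pair, nothing written to one channel is ever readable from another.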

In our journey, the InputDispatcher and WMS determine the tap was on a specific app icon within the Launcher. The Launcher's code receives the event, identifies the target app, and prepares to launch it. But what if that app's process doesn't exist yet? This is where the system pivots from handling input to managing processes.

Section 4: The Zygote's Fork: Birthing a New App Process

The Launcher, a system application we covered in the previous article, now needs to start a new activity. It does this by making a call to the ActivityManagerService (AMS), the great orchestrator of application lifecycle. AMS receives the request, checks permissions, and determines that it needs to create a new process for the target application.

Starting a new process from scratch on a Linux system can be slow. You have to load a runtime (ART, the Android Runtime), initialize core libraries, and set up all the basic framework classes. Doing this for every app launch would make your phone feel sluggish. Android's solution to this problem is one of its most elegant and foundational concepts: the Zygote.

The Zygote is a special process that is started during boot. It loads the ART virtual machine, pre-loads all the core framework classes, and then sits and waits. It's a pre-warmed, initialized template of an Android application process.

When AMS needs a new process, it doesn't start one from scratch. Instead, it sends a request over a local socket to the Zygote. The Zygote's response is simple and incredibly efficient: it calls the Linux fork() system call. fork() creates an exact, copy-on-write duplicate of the Zygote process. This new process inherits the already-initialized ART instance and all the pre-loaded classes. From there, it just needs to load the application-specific code, which is a much faster operation.
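The preload-then-fork trick works in any Unix process, so you can try it in Python. The sketch below is a toy model: PRELOADED stands in for the classes the Zygote loads at boot, and fork_app is a hypothetical helper. It relies on os.fork, so it runs on Linux and macOS but not on Windows.

```python
import os

# Stand-in for the classes and resources the Zygote preloads at boot.
PRELOADED = {"framework_classes": ["Activity", "View", "Looper"]}


def fork_app(package_name):
    """Fork a child that reuses PRELOADED and reports back through a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: PRELOADED is already in memory, nothing is reloaded.
        os.close(read_fd)
        report = (
            f"{package_name} pid={os.getpid()} ppid={os.getppid()} "
            f"classes={len(PRELOADED['framework_classes'])}"
        )
        os.write(write_fd, report.encode())
        os._exit(0)  # leave immediately without running the parent's cleanup
    # Parent: wait for the child's report, then reap it.
    os.close(write_fd)
    report = os.read(read_fd, 1024).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, report


child_pid, child_report = fork_app("com.example.app")
print(child_report)
```

The child sees all three preloaded entries without loading anything itself, and its parent PID is the process that forked it.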

This factory model is why Android apps can launch so quickly.

AMS acts as the central coordinator. The Zygote forks a copy-on-write clone of itself — inheriting the pre-warmed ART and all core libraries — then the new process announces itself back to AMS.

Interview Note: A classic interview question is "How does Android start apps so fast?" The answer is Zygote, pre-initialization, and the fork() system call. You can prove this relationship on any device. Run ps -e | grep zygote to find the Zygote's Process ID (PID); on 64-bit devices you will usually see both zygote and zygote64. Then launch a new app and run ps -e again. Find your new app's process and look at its PPID (Parent Process ID). It will match the PID of one of the Zygote processes.

AMS is the policy-maker; it decides when to launch an app. Zygote is the mechanic; it performs the low-level OS task of creating the process.

A new process now exists, but it's just a generic copy of the Zygote. The next step is for this new process to load the specific app's code and prepare to run its UI.

Section 5: Hello, World: The App's Main Thread Springs to Life

The Zygote has forked a new process. What happens inside that process to turn it from a generic shell into our running application? The entry point is a static method: ActivityThread.main().

This is the code that runs in the new process shortly after the fork(), once the Zygote's child-side initialization hands control to it. It does several critical setup tasks:

It creates the main thread's Looper. The Looper is the engine of the UI thread. It runs an infinite message-processing loop, ensuring that all UI events, drawing operations, and lifecycle callbacks are executed sequentially.
It makes a Binder IPC call back to the ActivityManagerService to "attach" itself. It's essentially telling the system, "Hello, I'm process 12345, I'm alive and ready for instructions."
AMS, now knowing the app is ready, sends a Binder call back to the app's ActivityThread, instructing it to launch the target Activity.
ActivityThread then uses reflection to instantiate the Activity class, and calls its lifecycle methods: onCreate(), onStart(), and onResume().
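The message loop at the heart of this sequence is a small idea. Here is a minimal sketch using Python's queue module. This Looper class is a toy with invented method names (post, quit, loop), not the Android API, and the lifecycle callbacks are just strings appended to a list.

```python
import queue


class Looper:
    """A single-threaded loop that runs posted callbacks in order."""

    def __init__(self):
        self._messages = queue.Queue()

    def post(self, callback):
        self._messages.put(callback)

    def quit(self):
        self._messages.put(None)  # sentinel that ends the loop

    def loop(self):
        while True:
            callback = self._messages.get()  # blocks until a message arrives
            if callback is None:
                return
            callback()


log = []
looper = Looper()

# The system "instructs" the app by posting work to its main thread.
for stage in ("onCreate", "onStart", "onResume"):
    looper.post(lambda stage=stage: log.append(stage))

looper.quit()
looper.loop()
print(log)
```

Everything posted to the loop runs one item at a time in arrival order, which is why a slow callback on the main thread delays everything queued behind it.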

It's inside onCreate() that the developer's code finally runs, calling setContentView() to inflate the UI layout. This process creates the tree of View objects that define the app's screen. Once the activity resumes and its window is added to the WindowManager, the framework also creates a crucial object called ViewRootImpl, which acts as the bridge between the app's window and the WindowManagerService.

The ActivityThread sets up the Looper and MessageQueue as the UI thread's engine. ViewRootImpl connects the app's view hierarchy back to WMS and coordinates draw scheduling via the Choreographer.

Common Mistake: Many developers believe onCreate() is the first thing that runs in their app. In reality, a huge amount of setup (ActivityThread.main, preparing the Looper, attaching to AMS) happens before any of your application code is ever executed.

The app is now running and its View hierarchy exists in memory. This is just a logical tree of objects. The next step is to translate that tree into actual pixels that can be displayed on the screen.

Section 6: From Layouts to Pixels: The Rendering Pipeline

We have a tree of View objects. The CPU understands what a Button or a TextView is. The GPU, however, does not. The GPU understands triangles, textures, and shaders. The rendering pipeline is the bridge between these two worlds.

This process happens in three phases, coordinated by the ViewRootImpl on every frame that needs to be drawn.

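Measure Pass: The system walks the View tree from the root down, calling measure() on each View, which in turn runs onMeasure(). Each parent passes size constraints to its children, and each child reports back how big it would like to be. This pass figures out the dimensions of every element.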
Layout Pass: The system walks the tree from the top down (onLayout). Each parent uses the information from the measure pass to tell its children exactly where to place themselves on the screen.
Draw Pass: Finally, the system walks the tree again (onDraw). Each View's onDraw method issues a series of drawing commands (drawRect, drawText, etc.) that are recorded into a DisplayList rather than writing pixels directly.
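The three passes can be sketched with a tiny view tree. This is a simplified model, not the Android View API: it supports only a vertical stack, the size constraints are plain maximums, and the class and method names are chosen for the example.

```python
class View:
    def __init__(self, name, desired_width, desired_height):
        self.name = name
        self.desired_width = desired_width
        self.desired_height = desired_height
        self.measured = (0, 0)
        self.frame = None

    def measure(self, max_width, max_height):
        """Report the size this view wants, clamped to the space offered."""
        self.measured = (min(self.desired_width, max_width),
                         min(self.desired_height, max_height))
        return self.measured

    def layout(self, left, top):
        width, height = self.measured
        self.frame = (left, top, left + width, top + height)

    def draw(self, commands):
        # Record a command instead of touching pixels.
        commands.append(("drawRect", self.name, self.frame))


class VerticalLayout(View):
    def __init__(self, name, children):
        super().__init__(name, 0, 0)
        self.children = children

    def measure(self, max_width, max_height):
        width = height = 0
        for child in self.children:
            child_w, child_h = child.measure(max_width, max_height - height)
            width = max(width, child_w)
            height += child_h
        self.measured = (width, height)
        return self.measured

    def layout(self, left, top):
        super().layout(left, top)
        y = top
        for child in self.children:
            child.layout(left, y)  # parent decides where each child goes
            y += child.measured[1]

    def draw(self, commands):
        super().draw(commands)
        for child in self.children:
            child.draw(commands)


title = View("title", 1080, 200)
button = View("button", 400, 150)
root = VerticalLayout("root", [title, button])

root.measure(1080, 2400)   # pass 1: sizes
root.layout(0, 0)          # pass 2: positions
commands = []
root.draw(commands)        # pass 3: recorded drawing commands

print(root.measured, title.frame, button.frame)
```

Note that draw() needs frames from layout(), and layout() needs sizes from measure(), which is why the passes always run in this order.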

This DisplayList is then handed off to a dedicated rendering thread within the app (the RenderThread), driven by the HardwareRenderer. This thread takes the high-level commands from the DisplayList and translates them into low-level OpenGL or Vulkan commands that the GPU can understand. It executes these commands to draw the UI into an off-screen memory buffer called a GraphicBuffer.

Three sequential passes convert the logical View hierarchy into a GraphicBuffer filled with pixels, ready for the compositor.

Performance Note: The DisplayList system is a key optimization. Instead of re-executing all onDraw methods for every frame of an animation, the system can often just re-play the existing DisplayList with updated properties (like translation or alpha), which is much faster. Enable Profile HWUI rendering -> On screen as bars in developer options to see per-frame timings broken down by phase.
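Here is a rough sketch of why replaying is cheaper than redrawing. The DisplayList class below is a toy: the record and replay methods and the translation_x and alpha properties are invented for the illustration, and the "canvas" is just a list.

```python
class DisplayList:
    """Recorded drawing commands plus properties applied at replay time."""

    def __init__(self):
        self.commands = []
        self.translation_x = 0
        self.alpha = 1.0

    def record(self, op, *args):
        self.commands.append((op, args))

    def replay(self, canvas):
        for op, args in self.commands:
            canvas.append((op, args, self.translation_x, self.alpha))


stats = {"on_draw_calls": 0}


def on_draw(display_list):
    """Stand-in for a View's onDraw: potentially expensive app code."""
    stats["on_draw_calls"] += 1
    display_list.record("drawRect", 0, 0, 400, 150)
    display_list.record("drawText", "OK", 180, 90)


display_list = DisplayList()
on_draw(display_list)  # recorded once

frames = []
for step in range(3):  # three animation frames
    display_list.translation_x = step * 10  # only a property changes
    canvas = []
    display_list.replay(canvas)
    frames.append(canvas)

print(stats["on_draw_calls"], frames[2][0])
```

Three frames were produced, but the app's drawing code ran only once; each later frame reused the recording with a different translation.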

The app has now successfully produced a buffer containing its UI. This buffer is just one of many. The status bar has its own buffer, as does the navigation bar. A master compositor is needed to combine them into the final image.

Section 7: The Grand Compositor: Assembling the Final Scene

Our application has rendered its UI into a GraphicBuffer. The SystemUI process has done the same for the status bar. How do these separate images get combined into the single frame you see on screen? This is the job of SurfaceFlinger.

SurfaceFlinger is a critical system daemon that runs in its own process. It is the sole owner of the display framebuffer and the only process that can write to the screen. Every application that wants to draw something on screen gets a Surface from SurfaceFlinger. This Surface is one end of a BufferQueue, a shared memory mechanism for passing graphics buffers.

The application is the producer. It dequeues a free buffer, draws its UI into it, and queues it. SurfaceFlinger is the consumer. It acquires the queued buffer when it's ready to compose a new frame and releases it back to the queue once the buffer is no longer on screen.

This producer-consumer model decouples the app's rendering loop from the system's display refresh cycle. The app renders at its own pace, limited only by the number of free buffers, and SurfaceFlinger picks up a completed frame on each VSYNC tick.
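A buffer queue is easier to reason about once you see the slots move. The sketch below is a simplified model with a fixed number of slots. The method names are modeled on the dequeue/queue/acquire/release vocabulary used for BufferQueue, but the class itself is hypothetical, and the "pixels" are just strings. It also releases a slot right away, whereas a real compositor holds the buffer while it is on screen.

```python
from collections import deque


class BufferQueue:
    def __init__(self, slot_count=3):
        self.free = deque(range(slot_count))  # slots the producer may draw into
        self.queued = deque()                 # finished frames, oldest first
        self.contents = {}

    # Producer side (the app)
    def dequeue_buffer(self):
        """Take a free slot, or None if every slot is in use."""
        return self.free.popleft() if self.free else None

    def queue_buffer(self, slot, pixels):
        self.contents[slot] = pixels
        self.queued.append(slot)

    # Consumer side (the compositor)
    def acquire_buffer(self):
        """Take the oldest finished frame, or None if nothing is ready."""
        return self.queued.popleft() if self.queued else None

    def release_buffer(self, slot):
        self.free.append(slot)


bq = BufferQueue(slot_count=2)

# Producer: render two frames back to back.
for frame in ("frame-1", "frame-2"):
    slot = bq.dequeue_buffer()
    bq.queue_buffer(slot, frame)

# Both slots are queued, so the producer has to wait.
stalled = bq.dequeue_buffer() is None

# Consumer: one VSYNC tick acquires a frame, shows it, then releases the slot.
slot = bq.acquire_buffer()
shown = bq.contents[slot]
bq.release_buffer(slot)

# A slot is free again, so the producer can continue.
resumed = bq.dequeue_buffer() is not None

print(stalled, shown, resumed)
```

The stall in the middle is the back-pressure that stops a fast app from running arbitrarily far ahead of the display.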

VSYNC is the heartbeat of the display system. It's a signal from the display hardware that it's ready for a new frame (typically 60 or 120 times per second). On every VSYNC, SurfaceFlinger wakes up and performs these steps:

It looks at all the visible layers (from the app, status bar, wallpaper, etc.).
It acquires the latest GraphicBuffer from each layer's BufferQueue.
It uses the GPU or a special hardware composer to combine all these buffers into a single final frame, respecting their Z-order and transparency.
It pushes this final frame to the display hardware.
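Respecting Z-order and transparency comes down to blending layers from back to front. Here is a sketch for a single pixel, assuming the common "source over" rule, out = src * alpha + dst * (1 - alpha). The layer tuples and function names are made up for the example, and a real compositor does this for every pixel, usually in hardware.

```python
def blend(dst, src, alpha):
    """Source-over blend of one RGB pixel."""
    return tuple(round(s * alpha + d * (1 - alpha)) for s, d in zip(src, dst))


def compose(layers):
    """layers: iterable of (z, rgb, alpha). Returns the final pixel colour."""
    pixel = (0, 0, 0)  # start from black
    for _, rgb, alpha in sorted(layers, key=lambda layer: layer[0]):
        pixel = blend(pixel, rgb, alpha)  # lower z first, higher z on top
    return pixel


layers = [
    (2, (0, 0, 0), 0.2),        # translucent dark status bar on top
    (0, (0, 0, 255), 1.0),      # opaque blue wallpaper at the back
    (1, (255, 255, 255), 1.0),  # opaque white app window in the middle
]

final_pixel = compose(layers)
print(final_pixel)
```

The opaque app window completely hides the wallpaper, and the 20% dark status bar then dims the white to a light grey.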

SurfaceFlinger acts as the central hub. On each VSYNC tick, it acquires the latest buffer from every visible layer and composites them into the final frame sent to the display.

You can inspect SurfaceFlinger's state with dumpsys SurfaceFlinger. The output will show you every layer on the screen, its size, format, and position, providing a live snapshot of what SurfaceFlinger is currently managing.

Interview Note: Distinguishing between WindowManagerService and SurfaceFlinger is critical. WMS manages window policy (metadata, focus, Z-order). SurfaceFlinger manages window content (pixel buffers). WMS tells SurfaceFlinger the plan, and SurfaceFlinger executes the graphical composition.

SurfaceFlinger has now produced the final, complete frame. There's just one last step: handing this frame off to the specialized hardware that will actually drive the display panel's pixels.

Section 8: The Last Mile: The Hardware Composer HAL

SurfaceFlinger has a final, composed frame buffer ready to go. How does this buffer of pixels in memory actually get turned into light from the screen? This last step is handled by the Hardware Abstraction Layer, or HAL, specifically the Hardware Composer (HWC) HAL.

The HAL is a set of standard C/C++ interfaces that the Android platform uses to communicate with vendor-specific hardware drivers. It allows the Android framework to remain agnostic to the underlying hardware.

SurfaceFlinger could use the GPU to combine all the layers into a single buffer and then push that buffer to the display. This works, but using the powerful (and power-hungry) GPU for simple tasks like overlaying the static status bar on top of a full-screen app is wasteful.

This is where the HWC comes in. Most modern SoCs have a dedicated piece of hardware called a display controller that is very good at simple composition. It can take a few separate buffers (called hardware overlays) and composite them on the fly as it sends the data to the display panel. This is far more power-efficient than waking up the entire GPU.

Before composing a frame, SurfaceFlinger hands the HWC HAL the list of layers and asks, in effect: "Which of these can you compose yourself?" The HWC answers per layer. Layers it accepts are passed down as separate GraphicBuffers, and the specialized hardware handles their composition. If the scene is too complex (too many layers, or unsupported transformations), the HWC declines some or all of the layers. In that case, SurfaceFlinger falls back to the GPU, rendering the declined layers into a single buffer that the HWC then treats as one more layer.
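The negotiation can be sketched as a per-layer decision. The policy below is invented for illustration (a fixed number of overlay planes, and no support for rotated layers); real HWC implementations are vendor-specific and weigh many more factors. The DEVICE and CLIENT labels are borrowed from the HWC composition type names, and the sketch ignores the extra plane a real device would need for the GPU-composed result.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    rotated: bool = False


def choose_composition(layers, overlay_planes):
    """Return {layer name: "DEVICE" or "CLIENT"} under a toy policy.

    DEVICE: composed by the display hardware. CLIENT: composed on the GPU.
    """
    decisions = {}
    planes_left = overlay_planes
    for layer in layers:
        if not layer.rotated and planes_left > 0:
            decisions[layer.name] = "DEVICE"
            planes_left -= 1
        else:
            decisions[layer.name] = "CLIENT"  # fall back to GPU composition
    return decisions


scene = [
    Layer("Wallpaper"),
    Layer("App"),
    Layer("StatusBar"),
    Layer("Video", rotated=True),
]

decisions = choose_composition(scene, overlay_planes=2)
print(decisions)
```

With only two planes, the first two layers take the hardware path and the rest fall back to the GPU; the rotated layer would fall back even if a plane were free.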

SurfaceFlinger always prefers the power-efficient HWC path. The GPU fallback is used only when the scene exceeds the hardware overlay limits.

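You can see this in action with dumpsys SurfaceFlinger, which reports how each visible layer is being composited: by the HWC (shown as DEVICE on recent releases) or by the GPU (CLIENT). To compare the two paths, the "Disable HW overlays" developer option forces every layer through GPU composition. Note that the similarly named "Show hardware layers updates" option is unrelated: it flashes Views that use hardware layers green when they redraw, and says nothing about HWC overlays.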

And with that, the journey is complete. The HWC and display controller send the final pixel data to the display panel, and the screen illuminates in the pattern of the newly launched application.

The Full Picture

From a change in capacitance under your finger to photons of light from your screen, we've traced the entire relay race. It's a journey that crosses process boundaries and architectural layers multiple times.

Let's quickly revisit the key handoffs.

A series of well-defined handoffs, not a single monolithic operation. The Kernel passes an evdev stream to system_server. system_server pushes a MotionEvent to the App. The App produces a GraphicBuffer for SurfaceFlinger. SurfaceFlinger gives layers to the HWC.
An immense degree of decoupling. The input system doesn't know about window management. Window management doesn't know about rendering. Rendering doesn't know about composition. This separation makes the system robust and extensible.
IPC is the nervous system. Binder and sockets connect these specialized, isolated processes so that they work together as a coherent whole.
The invisible made visible by tools. getevent, dumpsys, and the developer options are not just for debugging; they are windows into the live, running system that confirm this architecture.

This series has given you a map of the territory. You now have the fundamental model and the debugging tools to pick any interaction in the Android system, from turning on Wi-Fi to taking a photo, and trace its journey for yourself. The exploration has just begun.