HAL Debugging Techniques - AOSP Framework & Internals

The Silent Failure of Native Code

You flash a new build containing your custom Hardware Abstraction Layer. The device boots, but your hardware feature is completely dead. There are no obvious Java exceptions in the standard logcat output. When an Android app crashes, the framework politely prints a stack trace. A failing HAL often dies silently or takes critical system services down with it.

Hardware integration requires a shift in how you hunt for bugs. HALs are native code, usually C++, sitting between the Android framework and the Linux kernel. They either run in their own isolated processes or are loaded as shared libraries into the system service that uses them. You cannot easily attach a standard Android Studio debugger to them. Instead, you rely on system-level daemons and command-line utilities to expose what went wrong in the native layer.

Verifying Process Stability

Checking the process state is the first step. A HAL that runs as its own process must be alive before the framework can talk to it. You can confirm that it is running with standard Linux process utilities.

# Check if your specific HAL process is alive
adb shell ps -A | grep android.hardware.audio

Common Mistake: Do not assume a HAL is stable just because it shows up in the process list once. Run ps multiple times to ensure the Process ID remains constant. A changing PID means the HAL is crashing and restarting in the background.
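
The repeated ps check can be scripted. The sketch below is one way to do it, assuming the toybox pidof command is available on the device. The helper names pids_stable and sample_hal_pid are made up for this example, and the process name you pass in is whatever ps -A shows for your HAL.

```shell
# Succeeds only if every PID sample on stdin is identical and non-empty.
# One sample per line; an empty line means the process was not found.
pids_stable() {
    first=""
    seen=0
    while IFS= read -r pid; do
        seen=1
        if [ -z "$pid" ]; then
            return 1    # process was missing for this sample
        fi
        if [ -z "$first" ]; then
            first="$pid"
        elif [ "$pid" != "$first" ]; then
            return 1    # PID changed: the HAL restarted between samples
        fi
    done
    [ "$seen" -eq 1 ]   # no samples at all is not proof of stability
}

# Prints one PID sample per second (hypothetical helper).
# $1 = process name as shown by `ps -A`, $2 = number of samples (default 5).
sample_hal_pid() {
    name="$1"
    count="${2:-5}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # echo guarantees a line even when pidof prints nothing
        echo "$(adb shell pidof "$name" </dev/null | tr -d '\r')"
        sleep 1
        i=$((i + 1))
    done
}
```

A possible invocation, with a placeholder process name: sample_hal_pid android.hardware.audio.service 10 | pids_stable && echo "PID stable". Separating the sampling from the comparison keeps the comparison usable on PID lists captured some other way.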

If the process is missing entirely, check if it is trapped in a rapid crash loop. The system might be repeatedly starting the HAL, only for it to immediately die. Catching these rapid failures requires checking the kernel message buffer and the dedicated crash log.

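# Look for Out-Of-Memory (OOM) kills and fatal signals in the kernel log (may need adb root)
adb shell dmesg | grep -iE "oom|kill|signal"
# Dump the dedicated crash buffer and exit instead of streaming
adb shell logcat -b crash -d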

Tip: The crash log is a ring buffer, so older entries are overwritten quickly on a busy system. Always run these checks immediately after reproducing the hardware failure.
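
To see at a glance which binary is crash-looping, the crash buffer can be summarized per executable. This is a minimal sketch that relies on the ">>> /path/to/binary <<<" marker printed in the header line of each native crash dump. The function name crashes_by_binary is an assumption made for this example.

```shell
# Reads crash-buffer text on stdin and prints "<count> <executable>" lines,
# most frequent first. Only lines carrying the ">>> path <<<" marker count.
crashes_by_binary() {
    sed -n 's/.*>>> \(.*\) <<<.*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

It could be fed with adb shell logcat -b crash -d | crashes_by_binary. A count above one for the same HAL binary within a short capture is a strong hint of a restart loop.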

Checking Visibility Through the Service Manager

Verifying a stable PID proves the binary can stay alive. But sometimes the framework still claims the hardware is unavailable. This means the process exists, but it failed to announce itself to the system.

Android uses service managers to keep track of available HALs. Older HIDL HALs register with hwservicemanager, while modern AIDL HALs register with servicemanager. The lshal utility queries hwservicemanager and dumps a list of every registered HIDL interface. AIDL HALs do not appear in that list; use adb shell service list or adb shell dumpsys -l to see them.

# List all registered HALs and their providing processes
adb shell lshal

# Filter for a specific hardware type
adb shell lshal | grep camera

Common Mistake: Do not assume your HAL is fully registered just because lshal outputs the interface name. Ensure the output lists a valid process ID instead of N/A.
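
lshal only covers HALs registered with hwservicemanager. For an AIDL HAL, a comparable check can be sketched against the output of adb shell service list, which prints one "index, name, descriptor" line per service registered with servicemanager. The helper name aidl_instance_listed and the instance name in the usage line are illustrative assumptions.

```shell
# Returns 0 if the AIDL instance "$1" appears in `service list` text on stdin.
# Expected line shape: "<index>\t<name>: [<interface descriptor>]"
aidl_instance_listed() {
    instance="$1"
    awk -v want="$instance" '
        {
            name = $2
            sub(/:$/, "", name)      # drop the trailing colon after the name
            if (name == want) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: adb shell service list | tr -d '\r' | aidl_instance_listed android.hardware.vibrator.IVibrator/default && echo registered. Matching the whole name field avoids false positives from instances that merely share a prefix.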

If your HAL is running but lacks a valid registration, the C++ logic is often fine. The failure is frequently a missing permission or declaration: a missing SELinux policy rule or a forgotten entry in the VINTF manifest blocks the HAL from registering with the system. Neither produces a Java exception, but SELinux denials are logged as "avc: denied" lines in logcat and the kernel log. Running this visibility check first saves you from hunting for memory leaks when the fix is just a missing XML tag.
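
SELinux denials are written to the logs as "avc: denied" lines that include the source context of the blocked process. A small filter, sketched here under the assumption that you know the SELinux domain your HAL runs in, narrows the log to the relevant entries. The function name denials_for and the domain hal_foo_default used in the usage line are hypothetical.

```shell
# Filters log text on stdin down to SELinux denials whose source context
# (scontext) is the given domain. Tolerates one or more spaces after "avc:".
denials_for() {
    domain="$1"
    grep -E 'avc: +denied' | grep -F "scontext=u:r:${domain}:"
}
```

Typical use would be adb logcat -d | denials_for hal_foo_default. A denial with tclass=service_manager points at a registration or lookup that the policy rejected.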

Intercepting Native Crashes

Eventually, you will write a HAL that actually crashes during operation. A null pointer dereference in a native process bypasses the Java runtime completely.

Android handles native crashes with a signal handler inside each native process and a daemon called tombstoned. When a native process like a HAL receives a fatal signal like SIGSEGV, the handler installed at process startup catches it and launches the crash_dump helper. crash_dump inspects the dying process and, using an output file handed out by tombstoned, writes a detailed post-mortem report to the /data/tombstones/ directory.

The sequence diagram below visualizes how a fatal signal from a failing HAL is turned into a tombstone. This helps you understand why the framework receives no immediate Java exception when hardware code fails. Pay attention to how the framework is left waiting for a response while the native layer handles the post-mortem dump.

The tombstone contains the thread state, memory maps, and register values at the exact moment of failure. The framework only finds out the connection died after the dump is complete. Pull the tombstones from the device and open the latest one to read the signal, the abort message, and the backtrace.

# Run adb root first on userdebug/eng builds; -a preserves the device file timestamps
adb pull -a /data/tombstones/

Tip: Tombstone files are numbered, and the numbers are reused once the limit is reached. The file named tombstone_00 might not be the newest one. Always check the file timestamps before starting your analysis.
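
Picking the newest file by timestamp is easy to automate once the tombstones are on your workstation. The sketch below assumes the files were pulled with timestamps preserved (adb pull -a), since a plain pull may stamp every file with the time of the copy. The function name newest_tombstone is made up for this example, and the .pb filter assumes the protobuf copies that newer Android releases store next to the text tombstones.

```shell
# Prints the most recently modified text tombstone in directory "$1"
# (default: ./tombstones). Protobuf copies ending in .pb are skipped.
newest_tombstone() {
    dir="${1:-tombstones}"
    # ls -t sorts by modification time, newest first
    ls -t "$dir"/tombstone_* 2>/dev/null | grep -v '\.pb$' | head -n 1
}
```

For example, less "$(newest_tombstone tombstones)" opens the latest report regardless of its index number.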

Symbolizing the Stack Trace

Raw tombstones often show only cryptic memory addresses instead of function names, because the binaries installed on the device are stripped of debug symbols to save space. To symbolize the stack trace, run the ndk-stack tool against the unstripped binaries generated during your AOSP build.

ndk-stack -sym out/target/product/YOUR_DEVICE/symbols -dump tombstone_00

Tip: Always check the "Build fingerprint" at the top of the tombstone. If it does not match the build that produced your unstripped symbols, the translated stack trace will point to the wrong lines of code.
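
The fingerprint comparison can be done mechanically before you spend time on a misleading trace. This is a minimal sketch: the function names are hypothetical, and the expected fingerprint is whatever identifies the build your symbols came from, for example the value of the ro.build.fingerprint property on a device running that exact build.

```shell
# Prints the fingerprint recorded in tombstone file "$1".
# The header line looks like: Build fingerprint: '<fingerprint>'
tombstone_fingerprint() {
    sed -n "s/^Build fingerprint: '\(.*\)'.*/\1/p" "$1" | head -n 1
}

# Succeeds only if the tombstone "$1" was produced by the build whose
# fingerprint is "$2".
fingerprint_matches() {
    [ "$(tombstone_fingerprint "$1")" = "$2" ]
}
```

One possible use: fingerprint_matches tombstone_00 "$(adb shell getprop ro.build.fingerprint | tr -d '\r')" || echo "symbols may not match".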

The Passthrough HAL Trap

Sometimes you will have a crashing HAL, but you cannot find its standalone process anywhere. Not all HALs run in perfect isolation.

Some older HALs operate in "passthrough" mode. Instead of running as an independent process, a passthrough HAL is a shared library that gets loaded directly into the client's memory space. If the client is system_server, the HAL runs entirely inside system_server.

When a passthrough HAL crashes, it takes the host process down with it. You will not find a tombstone for the HAL itself. Finding the root cause requires pulling the tombstone for the client process and reading down its backtrace until you reach frames that belong to the HAL's shared library. Understanding this architectural difference prevents you from searching for a phantom process that never existed.
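
Scanning the client's backtrace for the HAL library can be sketched as a simple filter. It assumes the usual tombstone frame layout, where each backtrace line starts with a frame number such as #03 followed by "pc", an address, and the path of the mapped library. The function name frames_in_library and the library name in the usage line are hypothetical.

```shell
# Prints the backtrace frames in tombstone file "$2" that were executing
# inside the shared library named "$1", in stack order.
frames_in_library() {
    lib="$1"
    file="$2"
    grep -E '^[[:space:]]*#[0-9]+[[:space:]]+pc[[:space:]]' "$file" \
        | grep -F "$lib"
}
```

For example, frames_in_library foo.default.so tombstone_03 lists every frame that ran HAL code inside the crashed client. The lowest-numbered match is the HAL frame closest to the point of failure.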

Debugging hardware integration is fundamentally an exercise in process tracking. Finding the HAL process using standard Linux tools tells you whether the binary is executing and staying up. Checking lshal confirms whether the HAL actually registered its interface, which is what lets the framework find it. When a crash occurs, the tombstone and ndk-stack turn cryptic memory addresses into specific lines of C++ code.

You now know how to isolate and identify a failing HAL process. But how does the actual data flow between this native C++ implementation and the Java framework?