Android - Live debugging of Watchdog kills

By default, Android's Watchdog restarts the Android framework when it finds the framework in an unrecoverable state. This makes sense for end users, since the failure is handled gracefully. For platform developers, however, it makes the root cause harder to find.

I came across a scenario where the UI would freeze and the Watchdog would eventually restart the framework. The Watchdog does generate minimal debug information, such as stack traces of system_server, mediaserver, SurfaceFlinger etc., but this isn't helpful in all cases. In this case, Window Manager was blocked on a call to create a new surface. That call was serviced by SurfaceFlinger, which was itself waiting for a composition response from the hardware composer, and the logic beyond that point was specific to this particular vendor's HAL implementation.

Unfortunately, Android's Watchdog isn't generic enough to generate vendor-specific debug information, so the challenge was to make sure the Watchdog halts on failure instead of killing the framework. Fortunately, the default implementation avoids killing the framework when a debugger is attached to the system_process (Debug.isDebuggerConnected). This API basically boils down to the state of the JDWP thread of the specific Dalvik virtual machine.
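The debugger check described above can be sketched roughly as follows. This is a simplified model, not the AOSP Watchdog source: the class name WatchdogSketch, the Action enum and the onHangDetected method are made up for illustration, and the BooleanSupplier stands in for android.os.Debug.isDebuggerConnected() so that the sketch runs on a plain JVM. The "killing" message is illustrative as well.

```java
import java.util.function.BooleanSupplier;

/** Simplified model of the watchdog's final decision; not the AOSP source. */
public class WatchdogSketch {

    /** Hypothetical outcomes once the watchdog has decided the system is hung. */
    enum Action { KILL_SYSTEM_PROCESS, KEEP_RUNNING }

    /**
     * Decides what to do after a hang has been detected.
     *
     * @param debuggerConnected stand-in for android.os.Debug.isDebuggerConnected(),
     *                          which only exists on an Android runtime
     */
    static Action onHangDetected(BooleanSupplier debuggerConnected) {
        if (debuggerConnected.getAsBoolean()) {
            // Same wording as the log line the real Watchdog prints in this case.
            System.out.println("Debugger connected: Watchdog is *not* killing the system process");
            return Action.KEEP_RUNNING;
        }
        // Illustrative message only; the real implementation kills system_server here.
        System.out.println("Watchdog is killing the system process");
        return Action.KILL_SYSTEM_PROCESS;
    }

    public static void main(String[] args) {
        // No debugger attached: the framework would be restarted.
        System.out.println(onHangDetected(() -> false));
        // Debugger attached (e.g. via DDMS): the hung state is preserved.
        System.out.println(onHangDetected(() -> true));
    }
}
```

The point of the sketch is only that the kill is skipped whenever the debugger check returns true, which is what leaves the hung system available for inspection.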

  Use the Android SDK's DDMS to attach to system_process.


  Once the debugger is attached, the Watchdog will not restart the framework, which leaves enough time to collect debug information over the Android Debug Bridge (adb) or a serial port. With the debugger attached, the log around a Watchdog timeout looks like this:

 I Process : Sending signal. PID: 381 SIG: 3
 I dalvikvm: threadid=3: reacting to signal 3
 I dalvikvm: Wrote stack traces to '/data/anr/traces.txt'
 I Process : Sending signal. PID: 524 SIG: 3
 I dalvikvm: threadid=3: reacting to signal 3
 I dalvikvm: Wrote stack traces to '/data/anr/traces.txt'
 I Watchdog_N: dumpKernelStacks
 W Watchdog: Debugger connected: Watchdog is *not* killing the system process
