Android's Watchdog

The following applies only to pre-KitKat versions. For KitKat and above, refer to the updated version.

The Android framework's watchdog deals with cases where any of the following locks is held for more than a minute, or where ServerThread is too busy to respond.

ActivityManagerService.this
PowerManagerService.mLocks
WindowManagerService.mWindowMap
WindowManagerService.mKeyguardTokenWatcher
WindowManagerService.mKeyWaiter


The watchdog thread posts a MONITOR message to android.server.ServerThread. The message ends up in the queue of the message handler, HeartbeatHandler. HeartbeatHandler is defined in Watchdog.java, but its instance is actually created by ServerThread. ServerThread's looper reads all pending messages, including the watchdog's MONITOR message, and invokes the appropriate handler.


The handler of the MONITOR message simply checks whether the locks listed above are available. It doesn't involve any inter-thread communication (between ServerThread, ActivityManager, PowerManager or WindowManager). The key point is that these locks, along with variables such as mCompleted, are accessed across threads in system_server. If all locks are available, mCompleted (Watchdog.java) is set to true and the watchdog continues to post MONITOR messages once every minute.



mCompleted stays false only when one of the above locks is held by some thread of system_server for more than a minute, or when the MONITOR message isn't handled by ServerThread.
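
The mechanism described above can be approximated outside Android with plain java.util.concurrent classes. The following is only a sketch of the idea, not the code in Watchdog.java: the class and method names (WatchdogSketch, heartbeat, runScenario) are hypothetical, a single-thread executor stands in for ServerThread's looper, a CountDownLatch plays the role of mCompleted, and the one-minute interval is shortened to 200 ms.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; none of these names come from Watchdog.java.
public class WatchdogSketch {

    // Posts a "MONITOR" task to the server thread and waits for it to finish.
    // Returns true (mCompleted == true) if every lock could be acquired in time.
    static boolean heartbeat(ExecutorService serverThread, List<Object> locks, long timeoutMs) {
        CountDownLatch completed = new CountDownLatch(1);
        serverThread.execute(() -> {
            for (Object lock : locks) {
                synchronized (lock) {
                    // Entering the monitor is the whole check: it proves the lock is free.
                }
            }
            completed.countDown();
        });
        try {
            return completed.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Runs one heartbeat. If holdLock is true, a second thread (standing in for
    // a binder thread) holds the first lock until the heartbeat has returned.
    static boolean runScenario(boolean holdLock) {
        Object amsLock = new Object();       // stands in for ActivityManagerService.this
        Object windowMapLock = new Object(); // stands in for WindowManagerService.mWindowMap
        ExecutorService serverThread = Executors.newSingleThreadExecutor();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread binderThread = new Thread(() -> {
            synchronized (amsLock) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // Nothing to clean up in this sketch.
                }
            }
        });
        try {
            if (holdLock) {
                binderThread.start();
                held.await(); // make sure the lock is really taken before the heartbeat
            }
            return heartbeat(serverThread, Arrays.asList(amsLock, windowMapLock), 200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown();     // let the lock holder finish
            serverThread.shutdown(); // the pending MONITOR task can now complete
        }
    }

    public static void main(String[] args) {
        System.out.println("locks free: completed=" + runScenario(false));
        System.out.println("lock held:  completed=" + runScenario(true));
    }
}
```

Running main should report completed=true when the locks are free and completed=false when another thread holds one of them. The real watchdog would kill system_server in the second case, whereas this sketch only returns the flag.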



Here is a scenario in which ServerThread is busy handling a different message and couldn't service the MONITOR request within a minute:

"android.server.ServerThread" prio=5 tid=8 NATIVE
group="main" sCount=1 dsCount=0 s=N obj=0x2fb1df28 self=0x1fece8
sysTid=201 nice=-2 sched=0/0 cgrp=unknown handle=1196944
at android.view.Surface.lockCanvasNative(Native Method)
at android.view.Surface.lockCanvas(Surface.java:314)
at android.view.ViewRoot.draw(ViewRoot.java:1341)
at android.view.ViewRoot.performTraversals(ViewRoot.java:1163)
at android.view.ViewRoot.handleMessage(ViewRoot.java:1727)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:123)
at com.android.server.ServerThread.run(SystemServer.java:513)

And in this case, MONITOR is handled but can't be serviced because the ActivityManagerService lock is unavailable:

"android.server.ServerThread" prio=5 tid=8 MONITOR
group="main" sCount=1 dsCount=0 s=N obj=0x4690be50 self=0x54e440
sysTid=206 nice=-2 sched=0/0 cgrp=unknown handle=5562400
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:~14726)
- waiting to lock (0x4691a9b8) (a com.android.server.am.ActivityManagerService) held by threadid=41 (Binder Thread #7)
at com.android.server.Watchdog$HeartbeatHandler.handleMessage(Watchdog.java:306)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:123)
at com.android.server.ServerThread.run(SystemServer.java:517)

In such cases, the watchdog kills system_server and captures the necessary stack traces in /data/anr/traces.txt. In recent versions of Android, the Java and kernel stack traces of system_server are saved halfway through the watchdog interval. This helps in debugging watchdog kills.
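
When going through a trace like the second one above, the "waiting to lock" line names both the contended lock and the thread holding it. As a rough aid, the sketch below pulls those fields out of such a line. The class and method names (BlockedLockParser, describe) are hypothetical, and the regular expression is modelled only on the line format shown in this post; other Android versions may print the line differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the pattern is based solely on the trace line quoted in this post.
public class BlockedLockParser {

    // Groups: 1 = lock address, 2 = lock class, 3 = holder thread id, 4 = holder thread name.
    private static final Pattern WAITING = Pattern.compile(
            "- waiting to lock \\((0x[0-9a-fA-F]+)\\) \\(a ([\\w.$]+)\\) held by threadid=(\\d+) \\((.+)\\)");

    // Returns a short summary of the blocked lock, or null if the line does not match.
    static String describe(String line) {
        Matcher m = WAITING.matcher(line);
        if (!m.find()) {
            return null;
        }
        return m.group(2) + " held by thread " + m.group(3) + " (" + m.group(4) + ")";
    }

    public static void main(String[] args) {
        String line = "- waiting to lock (0x4691a9b8) (a com.android.server.am.ActivityManagerService)"
                + " held by threadid=41 (Binder Thread #7)";
        System.out.println(describe(line));
    }
}
```

The thread id reported here can then be matched against the tid= field of the other threads in traces.txt to see what the lock holder was doing.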


