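Binder IPC Exception Case Studies

This article records some less commonly discussed details of Binder IPC, along with typical cases of Binder-related exceptions such as DeadSystemException and DeadObjectException.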

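Binder Background

A Binder IPC call (process A -> process B) can be roughly broken into three steps:

  1. In the calling process (caller), the binder framework serializes the IPC arguments into a contiguous block of binary data and stores it in a Parcel.
  2. The kernel driver copies the data in the caller's Parcel into the binder buffer of the target process (callee).
  3. The target process deserializes the binary data in its binder buffer back into argument objects and executes the requested IPC call.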
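Binder Buffer Key Points
  • The binder driver caps the binder buffer that a process can request at 4MB.
  • The Android binder framework actually requests a binder buffer of BINDER_VM_SIZE = 1024KB - 4KB*2 = 1016KB (with 4KB pages).
  • A synchronous binder call can transfer at most BINDER_VM_SIZE of data to the target process in one call.
  • Synchronous and asynchronous binder calls share the same binder buffer, but asynchronous (oneway) binder calls may occupy at most BINDER_VM_SIZE/2 (508KB) of it; see alloc->free_async_space = alloc->buffer_size / 2;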
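
The sizes above are easy to get wrong by a few kilobytes, so the arithmetic is spelled out here as a minimal C++ sketch. It assumes 4KB pages. The names kPageSize, kBinderVmSize, kAsyncSpace and fitsInEmptyBuffer are illustrative, not AOSP identifiers, and the check deliberately ignores the driver's per-buffer bookkeeping overhead and any space already occupied by other transactions.

```cpp
#include <cstddef>

// Assumption: 4KB pages.
constexpr std::size_t kPageSize = 4 * 1024;

// Mirrors BINDER_VM_SIZE = 1MB - 2 pages: 1040384 bytes = 1016KB.
constexpr std::size_t kBinderVmSize = 1024 * 1024 - 2 * kPageSize;

// Mirrors alloc->free_async_space = alloc->buffer_size / 2: 520192 bytes = 508KB.
constexpr std::size_t kAsyncSpace = kBinderVmSize / 2;

// Returns true if a payload of `bytes` could fit into a completely empty
// binder buffer. Simplification: a real buffer is shared with other in-flight
// transactions, so a smaller payload can still fail.
constexpr bool fitsInEmptyBuffer(std::size_t bytes, bool oneway) {
    return bytes <= (oneway ? kAsyncSpace : kBinderVmSize);
}
```

For instance, fitsInEmptyBuffer(1084868, false) and fitsInEmptyBuffer(546784, true) are both false, which matches the two TransactionTooLargeException cases later in this article.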

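Each step of the IPC flow above can hit errors that cannot be avoided, for example:

  • In step 2, while copying data to the target process, the data to be written turns out to exceed BINDER_VM_SIZE.
  • In step 2, while copying data to the target process, the target process suddenly force-closes (FC) or is killed by the system.
  • Several other processes are also sending IPCs to the same target process, and the target responds slowly (complex business logic or jank). Its binder buffer fills up with pending requests, so IPCs issued later fail.
  • Likewise, while the caller is sending an IPC to the target process, other processes may be sending IPCs to the caller in the background and exhaust the caller's binder buffer. The target process then fails to send the IPC result back because the caller's binder buffer is full.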

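However, the binder driver reports only a few error codes to the binder framework above it:

BR_DEAD_REPLY

The target process has died and the binder driver cannot deliver this IPC. The binder framework throws DeadObjectException to the Java layer.

BR_FAILED_REPLY

There are three possible underlying causes:

  1. The target process's binder buffer does not have enough free space for the data carried by this IPC.
  2. The target process died while the IPC data was being copied into its binder buffer.
  3. The target process returns a very large reply, or the caller is simultaneously serving IPCs from other processes, so the caller's binder buffer does not have enough free space for the reply.

Because the driver does not return a more detailed error code, the framework can only guess from the size of the data sent to the target process: if it exceeds 200KB it throws TransactionTooLargeException, otherwise it throws DeadObjectException.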

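BR_FROZEN_REPLY

Both AOSP and OEM vendors have their own policies for freezing app processes. The goal is to reduce the CPU and memory usage of background apps, which lowers power consumption and extends standby time.

If a synchronous Binder IPC is sent to a frozen process, the binder driver rejects the call directly. On receiving this error, the binder framework throws TransactionTooLargeException or DeadObjectException, just as it does for BR_FAILED_REPLY.

Related code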

// frameworks/base/core/jni/android_util_Binder.cpp
static jboolean android_os_BinderProxy_transact(JNIEnv* env, jobject obj,
        jint code, jobject dataObj, jobject replyObj, jint flags) {
    Parcel* data = parcelForJavaObject(env, dataObj);
    if (data == NULL) {
        return JNI_FALSE;
    }
    Parcel* reply = parcelForJavaObject(env, replyObj);
    if (reply == NULL && replyObj != NULL) {
        return JNI_FALSE;
    }

    IBinder* target = getBPNativeData(env, obj)->mObject.get();
    if (target == NULL) {
        jniThrowException(env, "java/lang/IllegalStateException", "Binder has been finalized!");
        return JNI_FALSE;
    }

    // The actual interaction with the binder driver happens in IPCThreadState::transact()
    status_t err = target->transact(code, *data, reply, flags);

    if (err == NO_ERROR) {
        return JNI_TRUE;
    }
    if (err == UNKNOWN_TRANSACTION) {
        return JNI_FALSE;
    }
    signalExceptionForError(env, obj, err, true /*canThrowRemoteException*/, data->dataSize());
    return JNI_FALSE;
}

IPCThreadState::waitForResponse() waits for the reply to this IPC and converts the return command from the binder driver into a status_t error code:

// frameworks/native/libs/binder/IPCThreadState.cpp
status_t IPCThreadState::waitForResponse(Parcel *reply, status_t *acquireResult) {
    uint32_t cmd;
    int32_t err;

    while (1) {
        if ((err=talkWithDriver()) < NO_ERROR) break;
        err = mIn.errorCheck();
        if (err < NO_ERROR) break;
        if (mIn.dataAvail() == 0) continue;

        cmd = (uint32_t)mIn.readInt32();

        switch (cmd) {
            // ...
            case BR_DEAD_REPLY:
                err = DEAD_OBJECT;
                goto finish;

            case BR_FAILED_REPLY:
                err = FAILED_TRANSACTION;
                goto finish;

            case BR_FROZEN_REPLY:
                ALOGW("Transaction failed because process frozen.");
                err = FAILED_TRANSACTION;
                goto finish;
            // ...
        }
    }
finish:
    // ...
    return err;
}

// frameworks/base/core/jni/android_util_Binder.cpp
void signalExceptionForError(JNIEnv* env, jobject obj, status_t err,
        bool canThrowRemoteException, int parcelSize) {
    switch (err) {
        // ...
        case DEAD_OBJECT:
            // DeadObjectException is a checked exception, only throw from certain methods.
            jniThrowException(env, canThrowRemoteException
                    ? "android/os/DeadObjectException"
                            : "java/lang/RuntimeException", NULL);
            break;
        case FAILED_TRANSACTION: {
            ALOGE("!!! FAILED BINDER TRANSACTION !!!  (parcel size = %d)", parcelSize);
            const char* exceptionToThrow;
            std::string msg;
            // TransactionTooLargeException is a checked exception, only throw from certain methods.
            // TODO(b/28321379): Transaction size is the most common cause for FAILED_TRANSACTION
            //        but it is not the only one.  The Binder driver can return BR_FAILED_REPLY
            //        for other reasons also, such as if the transaction is malformed or
            //        refers to an FD that has been closed.  We should change the driver
            //        to enable us to distinguish these cases in the future.
            if (canThrowRemoteException && parcelSize > 200*1024) {
                // bona fide large payload
                exceptionToThrow = "android/os/TransactionTooLargeException";
                msg = base::StringPrintf("data parcel size %d bytes", parcelSize);
            } else {
                // Heuristic: a payload smaller than this threshold "shouldn't" be too
                // big, so it's probably some other, more subtle problem.  In practice
                // it seems to always mean that the remote process died while the binder
                // transaction was already in flight.
                exceptionToThrow = (canThrowRemoteException)
                        ? "android/os/DeadObjectException"
                        : "java/lang/RuntimeException";
                msg = "Transaction failed on small parcel; remote process probably died, but "
                      "this could also be caused by running out of binder buffer space";
            }
            jniThrowException(env, exceptionToThrow, msg.c_str());
        } break;
        // ...
    }
}

Java Binder exception class hierarchy

    android.os.RemoteException
        |_ TransactionTooLargeException
        |_ DeadObjectException
            |_ DeadSystemException

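DeadSystemException is thrown when an IPC to system_server runs into a DeadObjectException. As the background section above shows, however, a DeadObjectException does not necessarily mean that the target process, system_server, has died.

Binder Exception Cases

TransactionTooLargeException

Case 1: Synchronous binder call transfers too much data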

AndroidRuntime: FATAL EXCEPTION: main
AndroidRuntime: Process: com.mmi.player, PID: 27332
AndroidRuntime: java.lang.RuntimeException: android.os.TransactionTooLargeException: data parcel size 1084868 bytes
AndroidRuntime: 	at android.app.ContextImpl.sendBroadcast(ContextImpl.java:1240)
AndroidRuntime: 	at android.content.ContextWrapper.sendBroadcast(ContextWrapper.java:515)
AndroidRuntime: 	at com.mmi.player.newplayer.PlayCenterUtil.j(SourceFile:61)
AndroidRuntime: 	at com.mmi.player.newplayer.manager.module.MusicWidgetManager.h(SourceFile:9)
AndroidRuntime: 	at com.mmi.player.newplayer.manager.module.MusicWidgetManager.b(SourceFile:1)
AndroidRuntime: 	at com.mmi.player.newplayer.manager.module.l.run(SourceFile:1)
AndroidRuntime: 	at android.os.Handler.handleCallback(Handler.java:958)
AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:99)
AndroidRuntime: 	at android.os.Looper.loopOnce(Looper.java:222)
AndroidRuntime: 	at android.os.Looper.loop(Looper.java:314)
AndroidRuntime: 	at android.app.ActivityThread.main(ActivityThread.java:8790)
AndroidRuntime: 	at java.lang.reflect.Method.invoke(Native Method)
AndroidRuntime: 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:561)
AndroidRuntime: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)
AndroidRuntime: Caused by: android.os.TransactionTooLargeException: data parcel size 1084868 bytes
AndroidRuntime: 	at android.os.BinderProxy.transactNative(Native Method)
AndroidRuntime: 	at android.os.BinderProxy.transact(BinderProxy.java:639)
AndroidRuntime: 	at android.app.IActivityManager$Stub$Proxy.broadcastIntentWithFeature(IActivityManager.java:6281)
AndroidRuntime: 	at android.app.ContextImpl.sendBroadcast(ContextImpl.java:1235)
AndroidRuntime: 	... 13 more

The player sent 1084868 bytes (about 1059.4KB) to the target process, which exceeds BINDER_VM_SIZE (1016KB).

Case 2: Asynchronous binder call transfers too much data

AndroidRuntime: FATAL EXCEPTION: main
AndroidRuntime: Process: com.ss.android.ugc.aweme, PID: 30553
AndroidRuntime: java.lang.RuntimeException: android.os.TransactionTooLargeException: data parcel size 546784 bytes
AndroidRuntime: 	at android.app.ActivityClient.activityStopped(ActivityClient.java:86)
AndroidRuntime: 	at android.app.servertransaction.PendingTransactionActions$StopInfo.run(PendingTransactionActions.java:143)
AndroidRuntime: 	at android.os.Handler.handleCallback(Handler.java:938)
AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:99)
AndroidRuntime: 	at android.os.Looper.loopOnce(Looper.java:210)
AndroidRuntime: 	at android.os.Looper.loop(Looper.java:299)
AndroidRuntime: 	at android.app.ActivityThread.main(ActivityThread.java:8269)
AndroidRuntime: 	at java.lang.reflect.Method.invoke(Native Method)
AndroidRuntime: 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:576)
AndroidRuntime: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1073)
AndroidRuntime: Caused by: android.os.TransactionTooLargeException: data parcel size 546784 bytes
AndroidRuntime: 	at android.os.BinderProxy.transactNative(Native Method)
AndroidRuntime: 	at android.os.BinderProxy.transact(BinderProxy.java:624)
AndroidRuntime: 	at android.app.IActivityClientController$Stub$Proxy.activityStopped(IActivityClientController.java:1297)
AndroidRuntime: 	at android.app.ActivityClient.activityStopped(ActivityClient.java:83)
AndroidRuntime: 	... 9 more

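Activity provides the onSaveInstanceState/onRestoreInstanceState callbacks for saving and restoring page state. The saved data is sent to system_server, so that even if the app process dies, the page can be restored to its previous state when the user returns to it (provided the developer has implemented this).
ActivityClient.activityStopped is the call that passes the app's page state to the system_server process. It is a oneway binder call, though, so it can transfer at most 508KB. In this example, one of Douyin's Activities must have saved 546784 bytes (about 534KB) in its onSaveInstanceState callback, which clearly exceeds the limit.

Tips
Android's Activity lifecycle callbacks are fairly complex. Avoid relying on onSaveInstanceState to store data; where persistence is really needed, writing to a file is more reliable.

DeadObjectException

Case 1: The remote process died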

04:37:47.638  2313  2453 I ActivityManager: Killing 26784:com.mmi.guardprovider/u0a159 (adj 250): MemoryReclaimService(service)
04:37:47.668  1658  1658 I Zygote  : Process 26784 exited due to signal 9 (Killed)

04:37:47.682 31103 31103 E AndroidRuntime: FATAL EXCEPTION: main
04:37:47.682 31103 31103 E AndroidRuntime: Process: com.mmi.securitycenter.remote, PID: 31103
04:37:47.682 31103 31103 E AndroidRuntime: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
04:37:47.682 31103 31103 E AndroidRuntime: 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:571)
04:37:47.682 31103 31103 E AndroidRuntime: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)
04:37:47.682 31103 31103 E AndroidRuntime: Caused by: java.lang.reflect.InvocationTargetException
04:37:47.682 31103 31103 E AndroidRuntime: 	at java.lang.reflect.Method.invoke(Native Method)
04:37:47.682 31103 31103 E AndroidRuntime: 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:561)
04:37:47.682 31103 31103 E AndroidRuntime: 	... 1 more
04:37:47.682 31103 31103 E AndroidRuntime: Caused by: android.os.DeadObjectException
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.BinderProxy.transactNative(Native Method)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.BinderProxy.transact(BinderProxy.java:621)
04:37:47.682 31103 31103 E AndroidRuntime: 	at com.mmi.guardprovider.aidl.IAntiVirusServer$Stub$a.q0(Unknown Source:21)
04:37:47.682 31103 31103 E AndroidRuntime: 	at re.l.i(Unknown Source:81)
04:37:47.682 31103 31103 E AndroidRuntime: 	at re.l.a(Unknown Source:0)
04:37:47.682 31103 31103 E AndroidRuntime: 	at re.k.a(Unknown Source:4)
04:37:47.682 31103 31103 E AndroidRuntime: 	at f9.a$d.onServiceConnected(Unknown Source:38)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.app.LoadedApk$ServiceDispatcher.doConnected(LoadedApk.java:2253)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.app.LoadedApk$ServiceDispatcher$RunConnection.run(LoadedApk.java:2286)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.Handler.handleCallback(Handler.java:958)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:99)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.Looper.loopOnce(Looper.java:224)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.os.Looper.loop(Looper.java:318)
04:37:47.682 31103 31103 E AndroidRuntime: 	at android.app.ActivityThread.main(ActivityThread.java:8669)
04:37:47.682 31103 31103 E AndroidRuntime: 	... 3 more

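The log shows that the target process 26784 had already been killed by the system before the IAntiVirusServer$Stub$a.q0 call was made.

Case 2: The target process's binder buffer is exhausted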

11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: FATAL EXCEPTION: Thread-16
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: Process: com.wwm.mtbf, PID: 9413
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: java.lang.RuntimeException: android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died, but this could also be caused by running out of binder buffer space
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at com.wwm.mtbf.SlowBinderCallDemo$2.run(SlowBinderCallDemo.java:58)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at java.lang.Thread.run(Thread.java:1012)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: Caused by: android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died, but this could also be caused by running out of binder buffer space
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at android.os.BinderProxy.transactNative(Native Method)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at android.os.BinderProxy.transact(BinderProxy.java:684)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at com.wwm.mtbf.IMyAidlInterface$Stub$Proxy.mockSlowResonse(IMyAidlInterface.java:124)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	at com.wwm.mtbf.SlowBinderCallDemo$2.run(SlowBinderCallDemo.java:56)
11-08 19:38:41.848 10095  9413 10265 E AndroidRuntime: 	... 1 more

The binder driver log at that moment:

[188157.002191] binder: undelivered TRANSACTION_COMPLETE
[188157.002213] binder: undelivered transaction 15278855, process died.
# The IPC issued by thread 10265 of process 9413 needs a 95296-byte allocation
[188166.932812] binder_alloc: 9905: binder_alloc_buf size 95296 failed, no address space
# The target process 9905 has 10 binder calls in flight, and only 87424 bytes of buffer space remain
[188166.932819] binder_alloc: allocated: 952960 (num: 10 largest: 95296), free: 87424 (num: 1 largest: 87424)
[188166.932824] binder: 9413:10265 transaction failed 29201/-28, size 95296-0 line 3381
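
The binder_alloc summary line above already contains what is needed to see why the allocation failed. As a minimal sketch (not part of the original analysis), the C++ snippet below parses that line with sscanf and compares a request against the largest free chunk. BinderAllocStats, parseAllocStats and requestFits are hypothetical helper names, and the format string assumes the exact log layout shown above, which may differ between kernel versions.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Fields of a "binder_alloc: allocated: ... free: ..." kernel log line.
// Hypothetical helper type, not a kernel or AOSP structure.
struct BinderAllocStats {
    unsigned long allocatedBytes, allocatedCount, largestAllocated;
    unsigned long freeBytes, freeCount, largestFree;
};

// Parses the summary line; returns false if the layout does not match.
bool parseAllocStats(const char* line, BinderAllocStats* out) {
    assert(line != nullptr && out != nullptr);
    const char* start = std::strstr(line, "allocated:");
    if (start == nullptr) return false;
    return std::sscanf(start,
                       "allocated: %lu (num: %lu largest: %lu), "
                       "free: %lu (num: %lu largest: %lu)",
                       &out->allocatedBytes, &out->allocatedCount,
                       &out->largestAllocated, &out->freeBytes,
                       &out->freeCount, &out->largestFree) == 6;
}

// A request can only succeed if one contiguous free chunk is large enough.
bool requestFits(const BinderAllocStats& stats, unsigned long requestBytes) {
    return requestBytes <= stats.largestFree;
}
```

For the log above, allocated + free is 952960 + 87424 = 1040384 bytes, exactly BINDER_VM_SIZE (1016KB), and the 95296-byte request is larger than the 87424-byte largest free chunk, so the allocation cannot succeed.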

Case 3: The caller's binder buffer is exhausted

02-22 17:42:56.314  1000  2495  6267 W system_server: Large reply transaction of 1000776 bytes, interface descriptor , code 1
02-22 17:42:56.316  1000  1720  1720 W BpBinder: Large or Failed outgoing transaction of 4 bytes, interface descriptor , code 1

02-22 17:42:56.316  1000  1720  1720 E JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 4)
02-22 17:42:56.316  1000  1720  1720 D AndroidRuntime: Shutting down VM
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: FATAL EXCEPTION: main
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: Process: com.android.systemui, PID: 1720
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: android.os.BadParcelableException: Failure retrieving array; only received 16 of 22
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.BaseParceledListSlice.<init>(BaseParceledListSlice.java:104)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.ParceledListSlice.<init>(ParceledListSlice.java:42)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.ParceledListSlice.<init>(Unknown Source:0)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.ParceledListSlice$1.createFromParcel(ParceledListSlice.java:80)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.ParceledListSlice$1.createFromParcel(ParceledListSlice.java:78)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.Parcel.readTypedObject(Parcel.java:4025)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.app.INotificationManager$Stub$Proxy.getActiveNotificationsFromListener(INotificationManager.java:4451)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.service.notification.NotificationListenerService.getActiveNotifications(NotificationListenerService.java:1067)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.service.notification.NotificationListenerService.getActiveNotifications(NotificationListenerService.java:991)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at com.android.systemui.statusbar.phone.NotificationListenerWithPlugins.getActiveNotifications(go/retraceme e705dac4e8523a576a9f2e57230c33a5a6b8c24d494a7f72a88e51a609ebdefc:1)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at com.android.systemui.statusbar.NotificationListener.onListenerConnected(go/retraceme e705dac4e8523a576a9f2e57230c33a5a6b8c24d494a7f72a88e51a609ebdefc:14)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at com.android.systemui.statusbar.notification.NotificationListener.onListenerConnected(go/retraceme e705dac4e8523a576a9f2e57230c33a5a6b8c24d494a7f72a88e51a609ebdefc:1)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.service.notification.NotificationListenerService$MyHandler.handleMessage(NotificationListenerService.java:2416)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:106)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.Looper.loopOnce(Looper.java:224)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.Looper.loop(Looper.java:318)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.app.ActivityThread.main(ActivityThread.java:8762)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at java.lang.reflect.Method.invoke(Native Method)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:561)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1013)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: Caused by: android.os.DeadObjectException: Transaction failed on small parcel; remote process probably died, but this could also be caused by running out of binder buffer space
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.BinderProxy.transactNative(Native Method)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.os.BinderProxy.transact(BinderProxy.java:639)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	at android.content.pm.BaseParceledListSlice.<init>(BaseParceledListSlice.java:95)
02-22 17:42:56.317  1000  1720  1720 E AndroidRuntime: 	... 19 more

The corresponding binder driver log:

[ 7508.578077][ T6267] binder_alloc: 1720: binder_alloc_buf size 1047296 failed, no address space
# systemui's binder buffer currently has 70528 bytes in use and 969856 bytes free, with a largest free chunk of 969848 bytes (used + free = 1040384 bytes = 1016KB)
[ 7508.578093][ T6267] binder_alloc: allocated: 70528 (num: 1 largest: 70528), free: 969856 (num: 2 largest: 969848)
[ 7508.578105][ T6267] binder: 2495:6267 transaction failed 29201/-28, size 1000776-46520 line 3325
[ 7508.578154][ T6267] binder: send failed reply for transaction 7111893 to 1720:1720
[ 7509.171866][ T3979] binder_alloc: 1720: binder_alloc_buf, no vma
[ 7509.179961][ T6267] binder: 2495:6267 transaction failed 29189/-22, size 132-0 line 3133

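systemui requested the notification list from system_server, but the reply that system_server sent back was about 1022KB (1000776 bytes of data plus 46520 bytes of offsets, 1047296 bytes in total). The binder buffer of the systemui process simply cannot hold that much.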
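
The `transaction failed 29201/-28` entries in the driver log pair a binder return command with a negative errno. The sketch below shows how those numbers decode. It assumes the Linux _IO() encoding used on arm64 and x86, (type << 8) | nr, and the BR_* numbering from the binder UAPI header; ioCmd and returnCommandName are illustrative helper names.

```cpp
#include <cerrno>
#include <cstdint>

// _IO(type, nr) with no direction or size bits: (type << 8) | nr.
constexpr std::uint32_t ioCmd(char type, std::uint32_t nr) {
    return (static_cast<std::uint32_t>(type) << 8) | nr;
}

// Positions as enumerated in the binder UAPI header
// (enum binder_driver_return_protocol).
constexpr std::uint32_t kBrDeadReply   = ioCmd('r', 5);   // 29189
constexpr std::uint32_t kBrFailedReply = ioCmd('r', 17);  // 29201
constexpr std::uint32_t kBrFrozenReply = ioCmd('r', 18);  // 29202

// Maps the first number of "transaction failed <cmd>/<errno>" to a name.
constexpr const char* returnCommandName(std::uint32_t cmd) {
    return cmd == kBrDeadReply   ? "BR_DEAD_REPLY"
         : cmd == kBrFailedReply ? "BR_FAILED_REPLY"
         : cmd == kBrFrozenReply ? "BR_FROZEN_REPLY"
                                 : "other";
}

// The second number is a negated errno value.
static_assert(ENOSPC == 28 && EINVAL == 22, "Linux errno values");
```

Read this way, 29201/-28 is BR_FAILED_REPLY with -ENOSPC (no buffer space), and 29189/-22 is BR_DEAD_REPLY with -EINVAL.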

DeadSystemException

Case 1: Asynchronous binder calls piling up

09-03 20:42:26.204  1284  1318 E IPCThreadState: Process seems to be sending too many oneway calls.
09-03 20:42:27.243 25810 25810 E IPCThreadState: Binder transaction failure: 22351626/29201/-28
09-03 20:42:27.243 25810 25810 W BpBinder: Large or Failed outgoing transaction of 80 bytes, interface descriptor , code 51
09-03 20:42:27.243 25810 25810 E JavaBinder: !!! FAILED BINDER TRANSACTION !!!  (parcel size = 80)
09-03 20:42:27.244  1284  1318 E IPCThreadState: Binder transaction failure: 22351631/29201/-28
# code 3 is the third binder call of INetdEventListener, i.e. onConnectEvent
09-03 20:42:27.244  1284  1318 W BpBinder: Large or Failed outgoing transaction of 140 bytes, interface descriptor android.net.metrics.INetdEventListener, code 3

09-03 20:42:27.244 25810 25810 D AndroidRuntime: Shutting down VM
09-03 20:42:27.244 25810 25810 E AndroidRuntime: FATAL EXCEPTION: main
09-03 20:42:27.244 25810 25810 E AndroidRuntime: Process: com.chinamworld.main, PID: 25810
09-03 20:42:27.244 25810 25810 E AndroidRuntime: DeadSystemException: The system died; earlier logs will point to the root cause

The binder log at that moment:

# pid 1633 is netd
[ 1805.078745][ T1672] binder_alloc: 2420: pid 1633 spamming oneway? 1678 buffers allocated for a total size of 416208
[ 1805.078851][ T1672] binder_alloc: 2420: pid 1633 spamming oneway? 1679 buffers allocated for a total size of 416456
[ 1805.079049][ T1672] binder_alloc: 2420: pid 1633 spamming oneway? 1680 buffers allocated for a total size of 416704
[ 1805.079282][ T1672] binder_alloc: 2420: pid 1633 spamming oneway? 1681 buffers allocated for a total size of 416952
[ 1805.079355][ T1672] binder_alloc: 2420: pid 1633 spamming oneway? 1682 buffers allocated for a total size of 417200

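The logs show that system_server received a flood of INetdEventListener.onConnectEvent requests and could not process them quickly enough, which exhausted its binder buffer. The cause was that app processes opened a large number of network connections during this period. Our networking colleagues had discussed this with Google before: Google regards it as malicious behavior by a specific app and recommends identifying the offending app and pushing it to fix the issue.

Asynchronous binder calls
Unlike synchronous binder calls, all asynchronous binder calls sent to the same interface (for example ActivityManagerService) are processed serially.
Google engineers sometimes think quite differently from us. Here, for instance, we would argue that if a malicious app is issuing a huge number of requests, the system should limit that app's use of network resources rather than leave it to negotiation with the app developer. If a malicious app can easily exhaust system_server's binder buffer and directly disrupt binder communication between system_server and other processes, the whole system becomes unstable.

Further reading: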