目录

记一次dex2oat内联优化缺陷导致的App崩溃

雨花,南京,2022冬

近期App业务给我们反馈了一个匿名内部类持有的外部类对象为null的问题,本文记录下调查过程

报错堆栈信息如下:

1
2
3
4
5
6
7
8
FATAL EXCEPTION: NetProxyThread 
 Process: com.mmi.health:device, PID: 17616
 java.lang.NullPointerException: Parameter specified as non-null is null: 
      method com.mi.fitness.netproxy.NetProxyManager.start$lambda$0, parameter this$0
   at com.mi.fitness.netproxy.NetProxyManager.start$lambda$0(SourceFile:3)
   at com.mi.fitness.netproxy.NetProxyManager.a(SourceFile:1)
   at zyb.run(SourceFile:3)
   at java.lang.Thread.run(Thread.java:1012)

查看crash的代码

从crash栈上看到最后崩溃在了com.mi.fitness.netproxy.NetProxyManager.start$lambda$0里,从函数的名称是可以推断出是NetProxyManager.start方法的里的第一个lambda

NetProxyManager.kt源码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
package com.mi.fitness.netproxy

class NetProxyManager : NetProxySender, DataHandlerWrapper() {
    private var jniContext: Long = 0
    private var tunnelThread: Thread? = null

    var started = false
    
    fun start() {
        Logger.i(TAG, "start() called, jniContext = $jniContext, started = $started")
        if (started) {
            return
        }
        started = true
        jniContext = NetProxy.netproxy_init(Build.VERSION.SDK_INT)
        NetProxy.setNetProxySender(this)
        if (tunnelThread == null) {
            NetProxy.netproxy_start(jniContext, Log.INFO)
            tunnelThread = Thread {
                NetProxy.netproxy_run(jniContext)
                NetProxy.netproxy_done(jniContext)
            }
            tunnelThread?.name = "NetProxyThread"
            tunnelThread?.start()
        }
        DeviceContact.instance.addDataHandler(WearPacket.TYPE_NET_PROXY, this)
    }
    ...

log显示NPE发生在NetProxyThread线程上,那么就是下面这个了

1
2
3
4
tunnelThread = Thread {
    NetProxy.netproxy_run(jniContext)
    NetProxy.netproxy_done(jniContext)
}

业务代码看着没问题

反编译Apk

从源码上看不出问题,那么我们用jadx工具反编译apk文件,看一下NetProxyManager.kt文件在被编译后的逻辑是怎么个情况

NetProxyManager-decompiled.png

NetProxyManager-start-decompiled.png

这时候能看到netproxy_runnetproxy_done都被移到到了一个编译工具生成的public static final void start$lambda$0(NetProxyManager this$0)方法里了,并且看到这个生成的方法内部首先就是检测参数this$0是不是null,这也正是上面发生NPE的地方!

匿名内部类和Lambda知识点
当创建一个匿名内部类时,它(可能会)需要访问外部类的成员变量和方法
为了实现这一点,匿名内部类会隐式地持有一个外部类的引用(即使它最终并没有访问外部类的任何成员变量)
这个引用就使得匿名内部类可以在其内部访问外部类的成员,就好像它是外部类的一个普通内部类一样

推荐在Activity/Service等类里使用static的内部类替换匿名内部类(尤其是继承自Thread/Handler等可能会长时间存活的类),static内部类不会自动捕获外部类实例,从而避免一些场景下的Activity的泄漏

上面jadx工具反编译出来的结果比较激进,直接把匿名内部类zyb的代码逻辑给集成进来了

实际的话应该是下面这样

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
public final class zyb extends Runnable {
    private final NetProxyManager a;

    public zyb(NetProxyManager npm) {
        this.a = npm;
    }

    @Override
    public void run() {
        NetProxyManager.start$lambda$0(this.a);
    }
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
public class NetProxyManager ... {

    public static final void start$lambda$0(NetProxyManager this$0) {
        Intrinsics.checkNotNullParameter(this$0, "this$0");
        NetProxy netProxy = NetProxy.INSTANCE;
        netProxy.netproxy_run(this$0.jniContext);
        netProxy.netproxy_done(this$0.jniContext);
    }
    
    public final void start() {
        ...
        if (this.tunnelThread == null) {
            netProxy.netproxy_start(this.jniContext, 4);
            Thread thread = new Thread(new zyb()); // <<<<<<
            this.tunnelThread = thread;
            thread.setName("NetProxyThread");
            Thread thread2 = this.tunnelThread;
            if (thread2 != null) {
                thread2.start();
            }
...

start$lambda$0方法的参数this$0是从zyb类的成员变量a ,而afinal的,所以它一定不会是null的,因为我们在创建zyb对象的时候,NetProxyManager对象就不可能是null

这个问题就变得有点诡异了

App研发那边也复现了一次,并反馈说是刚开始使用时,没有问题,但是放置半个小时之后,就会一直FC,没有办法恢复

我们知道Android有个后台dexopt服务,默认是在手机充电和空闲时会对app执行dexopt优化;但是内部的优化推断有个比较激进的优化方案,会在手机第一次使用后,就立即执行dexopt优化,所以就怀疑是dexopt优化生成的odex有点问题

排查dexopt优化

让App的同事在复现的手机上执行了dumpsys package命令

1
2
3
4
5
6
7
8
➜ adb shell dumpsys package com.mmi.health
...
Dexopt state:
  [com.mmi.health]
    path: /data/app/~~9t01WyFHgP1H6oFnkNOPEQ==/com.mmi.health-Wjc1lzE_TDIq0fRdzcBR4A==/base.apk
      arm64: [status=speed-profile] [reason=cmdline] [primary-abi]
          [location is /data/app/~~9t01WyFHgP1H6oFnkNOPEQ==/com.mmi.health-Wjc1lzE_TDIq0fRdzcBR4A==/oat/arm64/base.odex]
    base APK odex file size: base.art: 3800Kb base.odex: 16835Kb base.vdex: 112355Kb 

看到确实是执行了speed-profile的dexopt优化了

于是让他将这个/data/app/~~9t01WyFHgP1H6oFnkNOPEQ==/com.mmi.health-Wjc1lzE_TDIq0fRdzcBR4A==/oat/目录删除

确认了删除odex文件后重启app就不再crash

初步定位到问题在odex上

公司内部没有专注研究art虚拟机的团队,我们这边也不熟悉dexcode转机器码的具体流程
但既然是art的问题,首先是先排查是不是最近art里的代码提交导致的吧

排查ART代码改动

Android各厂商的手机海外版ROM上基本集成的都是google mainline的art,也就是只包含google的代码,不会包含各公司定制的逻辑

所以先让App的研发在海外版上也去试试能否复现

业务的研发同学在尝试了一段时间后,告诉我们海外版本不复现这个问题!

既然海外版都不复现,就说明是内部改动造成这个问题的概率变大了
🤣但其实后面才发现海外版也能复现 🤣

那去挨个先排查art的代码改动吧:但只发现2个内部团队的优化feature可能有一点点关系

1
2
3
➜ adb shell setprop persist.sys.parallel-image-loading false
➜ adb shell setprop persist.sys.app_dexfile_preload.enable false
➜ adb reboot

接着让关闭这个2个feature再去复现

并不是内部改动导致的

业务的研发先跟反馈关闭feature后问题不再复现,就自然以为是这2个feature导致的,于是就拉上feature owner一起进行排查

但过了一会儿又反馈关闭了这2个feature在国内版上也能复现,而且在海外版上也能复现了 -_-||

而我本地用的另一款机型的海外版却一直不能复现,对比发现是我的手机上Google的art的apex版本是860,而业务组那边的是380

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# 业务组提供的复现的机型上art的版本:
➜ adb shell dumpsys package com.google.android.art
com.google.android.art
    Version: 350820380
    Path: /data/apex/decompressed/com.android.art@350820380.decompressed.apex

# 而我本地不能复现问题的机器上art的版本:
➜ adb shell dumpsys package com.google.android.art
  com.google.android.art
    Version: 350820860
    Path: /data/apex/active/com.android.art@350820860.apex

排查google最新的apex是否已修复了问题

我手机上860版本的art apex应该是Google应用商店后台给我升级了,因为手机本身预装的版本是380的

  • 难道是手机性能有差异? 因为目前发现能复现的手机都是高端机,而我使用的那台是个入门机
  • 又或者是Google在860版本的art里已经修复了这个问题了吗?

所以一边让App的研发同事在海外版上安装这个860版本的art apex验证,另一边也提了个b/368471540给Google进行确认

这个问题的复现步骤相对繁琐,目前也只有Health报了这个问题,又需要使用连接手表,催了几次,Google art团队的研发也没有给出什么有效的分析

那还是只能靠我们自己分析了

先规避问题吧

root cause没定位到,但时间已经过去1周了,这个问题在一款新机型用户试用期间上报量还挺大的,并且一旦出现只能卸载app重新安装才能恢复,影响造成负面影响,于是分别在系统和app层面都增加了规避方案

  • 系统侧,先合入了一个兜底方案,检测到app发生频繁的crash后,就会去删除app的odex文件,避免app一直FC,无法使用
  • App侧,出问题既然是匿名内部类,而且它里面逻辑其实并不需要去访问外部类对象,所以也让Health那边改用static内部类从而规避

写个kotlin demo尝试复现一下?

我们的Android V版本用户试用已经有几周了了,虽然只有HealthApp报了这个匿名内部类的问题,但大家也还会担心其他app是不是也会爆发这个问题

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
class NetProxyManager : NetProxySender, DataHandlerWrapper() {
    
    fun start() {
        if (tunnelThread == null) {
            NetProxy.netproxy_start(jniContext, Log.INFO)
            tunnelThread = Thread {
                NetProxy.netproxy_run(jniContext)
                NetProxy.netproxy_done(jniContext)
            }
        }

所以我先写了个kotlin demo app去试试

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
package com.wwm.kttest

import android.app.Activity
import android.os.Bundle

class MainActivity : Activity() {
    private var jniContext: Long = 0
    
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // 为了让这个startThread能被优化成机器码,就执行个50000次,先让它变的hot起来
        for (i in 0..50000) {
            startThread(false);
        }
        startThread(true);
    }
    
    fun startThread(lunch: Boolean) {
        val thread = Thread {
            var proxy = NetProxy();
            proxy.run(jniContext);
        }
        if (lunch) {
            thread.start();
        }
    }
}

class NetProxy {

    fun run(jniContext: Long) {
        Log.i("WWM", "dd");
    }
}

但是对我的demo进行speed-profile模式的dexopt优化的话,实际执行的还是verify优化,也就不会生成机器码

1
2
3
➜ adb shell cmd package compile -f -m speed-profile com.wwm.kttest
➜ adb shell dumpsys package com.wwm.kttest | ag status=
      arm64: [status=verify] [reason=cmdline] [primary-abi]

于是推测art里应该是判断了小型app的话,根本就不会执行dexopt优化,所以又尝试对它进行everything优化

1
2
3
➜ adb shell cmd package compile -f -m everything com.wwm.kttest
➜ adb shell dumpsys package com.wwm.kttest | ag status=        
      arm64: [status=everything] [reason=cmdline] [primary-abi]

我的kotlin demo里的匿名内部类的类名被混淆成c.a了,那先看反编译demo app的结果

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
public final class a extends Runnable {
    private final MainActivity a;

    public a(MainActivity a) {
        this.a = a;
    }
     ...
}

public final class MainActivity extends Activity {

    /* renamed from: a, reason: collision with root package name */
    public static final /* synthetic */ int f35a = 0;

    @Override // android.app.Activity
    public final void onCreate(Bundle bundle) {
        super.onCreate(bundle);
        setContentView(R.layout.activity_main);
        for (int i = 0; i < 50001; i++) {
            new Thread(new Runnable() { // from class: c.a
                @Override // java.lang.Runnable
                public final void run() {
                    int i2 = MainActivity.f35a;
                    if (MainActivity.this != null) {
                        Log.i("WWM", "dd");
                        return;
                    }
                    ....

得到对应的字节码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
 adb shell oatdump --class-filter=wwm.kttest.MainActivity --oat-file=/data/app/~~y5ANJPNgtchDgUFojq6eAw==/com.wwm.kttest-6LXk6RkUonrK8QgDigdi6w==/oat/arm64/base.odex 

  1: void com.wwm.kttest.MainActivity.onCreate(android.os.Bundle) (dex_method_idx=205)
    DEX CODE:
      0x0000: 6f20 5f00 3200   | invoke-super {v2, v3}, void android.app.Activity.onCreate(android.os.Bundle) // method@95
      0x0003: 1503 077f        | const/high16 v3, #int +2131165184 // 0x7f070000
      0x0005: 6e20 6000 3200   | invoke-virtual {v2, v3}, void android.app.Activity.setContentView(int) // method@96
      0x0008: 1203             | const/4 v3, #+0
      0x0009: 1400 51c3 0000   | const v0, #+50001
      0x000c: 3503 0f00        | if-ge v3, v0, +15
      0x000e: 2200 5300        | new-instance v0, java.lang.Thread // type@TypeIndex[83]
      0x0010: 2201 3700        | new-instance v1, c.a // type@TypeIndex[55]
      0x0012: 7020 ca00 2100   | invoke-direct {v1, v2}, void c.a.<init>(com.wwm.kttest.MainActivity) // method@202
      0x0015: 7020 f800 1000   | invoke-direct {v0, v1}, void java.lang.Thread.<init>(java.lang.Runnable) // method@248
      0x0018: d803 0301        | add-int/lit8 v3, v3, #+1
      0x001a: 28ef             | goto -17
      0x001b: 2203 5300        | new-instance v3, java.lang.Thread // type@TypeIndex[83]
      0x001d: 2200 3700        | new-instance v0, c.a // type@TypeIndex[55]
      0x001f: 7020 ca00 2000   | invoke-direct {v0, v2}, void c.a.<init>(com.wwm.kttest.MainActivity) // method@202
      0x0022: 7020 f800 0300   | invoke-direct {v3, v0}, void java.lang.Thread.<init>(java.lang.Runnable) // method@248
      0x0025: 6e10 fb00 0300   | invoke-virtual {v3}, void java.lang.Thread.start() // method@251
      0x0028: 0e00             | return-void

c.a对象构造函数的字节码和App的zyb类一样简单,这里也贴一下它的字节码和机器码,有助于理解MainActivity.onCreate的机器码

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
  0: void c.a.<init>(com.wwm.kttest.MainActivity) (dex_method_idx=202)
    DEX CODE:
      0x0000: 7010 dc00 0000   | invoke-direct {v0}, void java.lang.Object.<init>() // method@220
      0x0003: 5b01 3a00        | iput-object v1, v0, Lcom/wwm/kttest/MainActivity; c.a.a // field@58
      0x0005: 0e00             | return-void
      
    CODE: (code_offset=0x00004340 size=28)...
      // 注:java对象在内存里的前8个字节分别保存类指针和用于实现synchronized的monitor_字段
      // 所以+8就是成员变量a
      0x00004340: b9000822        str w2, [x1, #8]
      // 下面的都是生成的一些art里的一些check逻辑,不用太关注
      0x00004344: 34000082        cbz w2, #+0x10 (addr 0x4354)
      0x00004348: f9404a70        ldr x16, [tr, #144] ; card_table
      0x0000434c: 530a7c31        lsr w17, w1, #10
      0x00004350: 38316a10        strb w16, [x16, x17]
      0x00004354: d5033abf        dmb ishst
      0x00004358: d65f03c0        ret

查看MainActivity.onCreate中0x1d那行创建c.a对象的机器码后的0x00011ea4那行,就能看到c.a的构造函数被正确内联进来了

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
      0x00011e80: f0000041        adrp x1, #+0xb000 (addr 0x1c000)
      0x00011e84: 1000007e        adr lr, #+0xc (addr 0x11e90)
      0x00011e88: b940f021        ldr w1, [x1, #240]
      0x00011e8c: b50153b4        cbnz x20, #+0x2a74 (addr 0x14900)
      0x00011e90: 340009c1        cbz w1, #+0x138 (addr 0x11fc8)
      0x00011e94: aa0003fa        mov x26, x0
      0x00011e98: aa0103e0        mov x0, x1
      0x00011e9c: f940ee7e        ldr lr, [tr, #472] ; pAllocObjectResolved
      0x00011ea0: d63f03c0        blr lr
        StackMap[18] (native_pc=0x11ea4, dex_pc=0x1d, register_mask=0x7c00000, stack_mask=0b)
          v2:r22 v3:r26 
      // 此时x0是刚分配的c.a类,w22是保存的MainActivity对象地址
      // 所以这就是c.a的构造函数的逻辑了
      0x00011ea4: b9000816        str w22, [x0, #8]
      0x00011ea8: f9404a70        ldr x16, [tr, #144] ; card_table
      0x00011eac: 530a7c11        lsr w17, w0, #10
      0x00011eb0: 38316a10        strb w16, [x16, x17]
      0x00011eb4: aa0003f0        mov x16, x0
      0x00011eb8: aa1703e0        mov x0, x23
      0x00011ebc: aa1003f7        mov x23, x16
      0x00011ec0: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized

这里跟Health有点不同,我们看到是用pAllocObjectResolved分配的c.a对象,因为dexopt时没有给我们的demo app生成.art文件

查看相关的逻辑就会发现是我们的demo app太小了,根本不会生成记录热点方法的profile文件

但是我们可以强制生成profile文件从而在dexopt时产生.art文件,See signal_catcher.cc#ForceProcessProfiles

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
# 确实没有profile
➜ adb shell ls /data/misc/profiles/cur/0/com.wwm.kttest/

# 于是强制生成profile文件
➜ adb shell kill -SIGUSR1 `adb shell pidof com.wwm.kttest`
# cur下的的文件也会在app使用时由于jit发现新的热点方法而被更新
➜ adb shell ls /data/misc/profiles/cur/0/com.wwm.kttest/
primary.prof

# 强制用profile文件去生成.art文件
➜ adb shell cmd package compile -f -m speed-profile --force-merge-profile  com.wwm.kttest

➜ adb shell ls -l /data/app/~~cYJ9NQORRvhdErrqzcD-MQ==/com.wwm.kttest-_1S2AjWEJB1AUFrG43jazw==/oat/arm64/
-rw-r--r-- 1 system all_a313  16416 2024-10-18 15:37 base.art
-rw-r--r-- 1 system all_a313 116504 2024-10-18 15:37 base.odex
-rw-r--r-- 1 system all_a313  48720 2024-10-18 15:37 base.vdex
# 也会在ref目录下生成一个新的prof文件
# dex2oat会读取里面的hot method和startup method,然后进行优化
➜ adb shell ls -l /data/misc/profiles/ref/com.mmi.health/
-rw-r--r-- 1 system all_a313 178 2024-10-18 15:37 primary.prof

之后再去dump odex文件就发现分配c.a类用的就和health一样的pAllocObjectInitialized了

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
      0x00004ae0: 17ffffbc        b #-0x110 (addr 0x49d0)
      0x00004ae4: aa0003fc        mov x28, x0
      0x00004ae8: aa1c03e0        mov x0, x28
      0x00004aec: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
      0x00004af0: d63f03c0        blr lr
        StackMap[17] (native_pc=0x4af4, dex_pc=0x1b, register_mask=0x7c00000, stack_mask=0b)
          v2:r22 
      0x00004af4: aa0003f0        mov x16, x0
      0x00004af8: aa1703e0        mov x0, x23
      0x00004afc: aa1003f7        mov x23, x16
      0x00004b00: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
      0x00004b04: d63f03c0        blr lr
        StackMap[18] (native_pc=0x4b08, dex_pc=0x1d, register_mask=0x7c00000, stack_mask=0b)
          v2:r22 v3:r23 
      // c.a类的构造函数里的指令
      0x00004b08: b9000816        str w22, [x0, #8]
      0x00004b0c: f9404a70        ldr x16, [tr, #144] ; card_table
      0x00004b10: 530a7c11        lsr w17, w0, #10
      0x00004b14: 38316a10        strb w16, [x16, x17]
      0x00004b18: aa0003f0        mov x16, x0
      0x00004b1c: aa1803e0        mov x0, x24
      0x00004b20: aa1003f8        mov x24, x16

我们的demo app使用了跟Healthapp基本是一模一样的代码,但在被dexopt后,我们看到机器码里是明确有执行c.a的构造函数逻辑的,所以并没复现Health那样的FC

也可以一定程度上反映出这个问题的的没那么容易复现

分析Healthodex里的汇编代码

在此之前,我还是想确认下crash时zyb.a到底是不是null的,最简单的办法自然是在crash时抓一个hprof

先抓个hprof确认下zyb对象的内容

但是这里比较麻烦的是我此时还没找到100%复现的步骤,我刷了几个自己编译的ROM版本,按照相同的复现步骤去做dexopt优化,就是怎么也没法复现,只有在正式版本上才能复现

所以就干脆改一下art的代码,在创建NetProxyThread时立即导出hprof文件,然后推到复现的手机上bind mount一下libart.so

改动的逻辑如下:

dump-hprof-before-starting-NetProxy-thread.png

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
# 编译libart
➜ make libart

➜ adb shell stop
➜ adb push out/target/product/ssi/apex/com.android.art/lib64/libart.so /data/local/tmp/libart.so
# 执行bind mount
➜ adb shell mount --bind /data/local/tmp/libart.so /apex/com.android.art/lib64/libart.so
# 因为/data/loca/tmp/的selinux label和/apex/下的不同,所以要关闭selinux
➜ adb shell setenforce 0 
➜ adb shell start

crash前抓的hprof:

zyb-a-is-null.png

用Eeclipse MAT解析hprof文件能看到zyb.a确实是null的

分析NetProxyManager类方法的汇编

继续使用oatdump导出NetProxyManager.start方法字节码

1
➜ adb shell oatdump --class-filter=com.mi.fitness.netproxy.NetProxyManager --method-filter=start --oat-file=/data/app/~~_clE1wJMsirMD3z6PJZW8Q==/com.mmi.health-PUoSwThSCagDrUZ3D-gKrg==/oat/arm64/base.odex

其中创建和构造zyb对象的部分如下:

1
2
3
4
5
6
7
8
  8: void com.mi.fitness.netproxy.NetProxyManager.start() (dex_method_idx=5996)
    DEX CODE:
      ... 
      0x0041: 2200 1e0e        | new-instance v0, java.lang.Thread // type@TypeIndex[3614]
      0x0043: 2201 5a12        | new-instance v1, zyb // type@TypeIndex[4698]
      0x0045: 7020 224f 5100   | invoke-direct {v1, v5}, void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager) // method@20258
      0x0048: 7020 963f 1000   | invoke-direct {v0, v1}, void java.lang.Thread.<init>(java.lang.Runnable) // method@16278
      ...

start方法的字节码对应的机器码特别长,这里就摘取最关键一部分

查找字节码对应的机器码
通过stackmap信息里的dex_pc能快速找到0x0041那行new-instance指令对应的机器码位置,See art/oatdump/oatdump.cc
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
 CODE: (code_offset=0x00c82a90 size=1440)...
  ...
  0x00c82c9c: d0002240        adrp x0, #+0x44a000 (addr 0x10cc000)
  0x00c82ca0: b9408c00        ldr w0, [x0, #140]
  0x00c82ca4: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
  0x00c82ca8: d63f03c0        blr lr  // 对应new-instance v0, java.lang.Thread
    StackMap[14] (native_pc=0xc82cac, dex_pc=0x41, register_mask=0x5400000, stack_mask=0b100000000000000000000000000100000000000)
      v0:r22 v4:r26 v5:r24 
  0x00c82cac: b90033e0        str w0, [sp, #48]

  0x00c82cb0: 900022a1        adrp x1, #+0x454000 (addr 0x10d6000)
  0x00c82cb4: b948cc21        ldr w1, [x1, #2252]
  0x00c82cb8: aa0003f6        mov x22, x0
  0x00c82cbc: aa0103e0        mov x0, x1
  0x00c82cc0: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
  0x00c82cc4: d63f03c0        blr lr  // 对应new-instance v1, zyb
    StackMap[15] (native_pc=0xc82cc8, dex_pc=0x43, register_mask=0x5400000, stack_mask=0b100000000000000000000000001100000000000)
      v0:r22 v4:r26 v5:r24                 

  // 从这里向下的一大段的指令都是执行Thread类的构造函数(因为内联了,所以很长)
  0x00c82cc8: d0002241        adrp x1, #+0x44a000 (addr 0x10cc000)
  0x00c82ccc: b9444021        ldr w1, [x1, #1088]
  0x00c82cd0: aa0003f9        mov x25, x0 
  0x00c82cd4: aa0103e0        mov x0, x1
  0x00c82cd8: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
  0x00c82cdc: d63f03c0        blr lr            
    StackMap[16] (native_pc=0xc82ce0, dex_pc=0x48, register_mask=0x7400000, stack_mask=0b100000000000000000000000001100000000000)
      v0:r22 v1:r25 v4:r26 v5:r24 
    InlineInfo[2] (depth=0, dex_pc=0x0, method_index=4521, is_in_bootclasspath=true, dex_file_index=0)
      v7:r22 v8:r25 
  0x00c82ce0: d5033abf        dmb ishst
  • pAllocObjectInitialized是用于分配一个已经被加载和初始化好的类对象实例,简单理解这个类对象的结构已经被记录到了app或fw的.art 里了(关于.art/.vdex/.odex的简单介绍See how_art_works
  • 对应的还有一个pAllocObjectResolved,就是用于加载一个不在.art里的类

那个pAllocObjectInitialized实际是跳到art_quick_alloc_object_initialized_region_tlab去执行

如何知道是art_quick_alloc_object_initialized_region_tlab?
之前我也没研究过这块代码,所以快速的方法就是用lldb attach到目标进程上,打个断点到0x00c82cd8,使用register read lr进行查看
还不会用lldb调试Android程序? See lldbclient.py

先看下art_quick_alloc_object_initialized_region_tlab的逻辑

1
2
3
GENERATE_ALLOC_OBJECT_RESOLVED_TLAB \
    art_quick_alloc_object_initialized_region_tlab, \
    artAllocObjectFromCodeInitializedRegionTLAB, /* isInitialized */ 1

GENERATE_ALLOC_OBJECT_RESOLVED_TLAB的定义

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
// The common code for art_quick_alloc_object_*region_tlab
// Currently the implementation ignores isInitialized. TODO(b/172087402): clean this up.
// Caller must execute a constructor fence after this.
.macro GENERATE_ALLOC_OBJECT_RESOLVED_TLAB name, entrypoint, isInitialized
ENTRY \name
    // Fast path region tlab allocation.
    // x0: type, xSELF(x19): Thread::Current
    // x1-x7: free.
    ALLOC_OBJECT_TLAB_FAST_PATH_RESOLVED .Lslow_path\name, \isInitialized
.Lslow_path\name:
    SETUP_SAVE_REFS_ONLY_FRAME                 // Save callee saves in case of GC.
    mov    x1, xSELF                           // Pass Thread::Current.
    bl     \entrypoint                         // (mirror::Class*, Thread*)
    RESTORE_SAVE_REFS_ONLY_FRAME
    REFRESH_MARKING_REGISTER
    RETURN_OR_DEOPT_IF_RESULT_IS_NON_NULL_OR_DELIVER
END \name
.endm

其实就是先跳转ALLOC_OBJECT_TLAB_FAST_PATH_RESOLVED到执行

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
// If isInitialized=1 then the compiler assumes the object's class has already been initialized.
// If isInitialized=0 the compiler can only assume it's been at least resolved.
.macro ALLOC_OBJECT_TLAB_FAST_PATH_RESOLVED slowPathLabel isInitialized
    ldr    x4, [xSELF, #THREAD_LOCAL_POS_OFFSET]
    ldr    x5, [xSELF, #THREAD_LOCAL_END_OFFSET]
    ldr    w7, [x0, #MIRROR_CLASS_OBJECT_SIZE_ALLOC_FAST_PATH_OFFSET]  // Load the object size (x7).
    add    x6, x4, x7                                         // Add object size to tlab pos.
    cmp    x6, x5                                             // Check if it fits, overflow works
                                                              // since the tlab pos and end are 32
                                                              // bit values.

    // If the class is not yet visibly initialized, or it is finalizable,
    // the object size will be very large to force the branch below to be taken.
    //
    // See Class::SetStatus() in class.cc for more details.
    bhi    \slowPathLabel
    str    x6, [xSELF, #THREAD_LOCAL_POS_OFFSET]              // Store new thread_local_pos.
    POISON_HEAP_REF w0
    str    w0, [x4, #MIRROR_OBJECT_CLASS_OFFSET]              // Store the class pointer.
    mov    x0, x4
    // No barrier. The class is already observably initialized (otherwise the fast
    // path size check above would fail) and new-instance allocations are protected
    // from publishing by the compiler which inserts its own StoreStore barrier.
    ret
.endm

ALLOC_OBJECT_TLAB_FAST_PATH_RESOLVED

  • 先判断分配的类对象的大小能不能在当前线程的thread local region放得下,放得下就在tlab里分配这个对象;

  • 否则返回到 GENERATE_ALLOC_OBJECT_RESOLVED_TLAB里的slowpath,也就是跳到artAllocObjectFromCodeInitializedRegionTLAB方法里,在常规的堆空间里分配对象

接着再瞅一眼artAllocObjectFromCodeInitializedRegionTLAB方法的实现

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
// art/runtime/entrypoints/quick/quick_alloc_entrypoints.cc

static constexpr bool kUseTlabFastPath = true;

template <bool kInitialized,
          bool kWithChecks,
          bool kInstrumented,
          gc::AllocatorType allocator_type>
static ALWAYS_INLINE inline mirror::Object* artAllocObjectFromCode(
    mirror::Class* klass,
    Thread* self) REQUIRES_SHARED(Locks::mutator_lock_) {
  ScopedQuickEntrypointChecks sqec(self);
  DCHECK(klass != nullptr);
  if (kUseTlabFastPath &&
      !kWithChecks &&
      !kInstrumented &&
      allocator_type == gc::kAllocatorTypeTLAB) {
    // 这段逻辑跟上面的 GENERATE_ALLOC_OBJECT_RESOLVED_TLAB 里开头的部分一模一样
    // 如果klass打包的类已经被虚拟机加载和初始化过了,
    // 那么GetObjectSizeAllocFastPath就会返回这个klass类的大小(包含一些padding),
    // 然后再尝试在tlab里分别一个klass实例对象,否则的话就会返回一个特别大的数,去执行下面的slowpath
    size_t byte_count = klass->GetObjectSizeAllocFastPath();
    if (LIKELY(byte_count < self->TlabSize())) {
      static_assert(kObjectAlignment == gc::space::BumpPointerSpace::kAlignment, "Alignment check");
      DCHECK_ALIGNED(byte_count, gc::space::BumpPointerSpace::kAlignment);
      mirror::Object* obj = self->AllocTlab(byte_count);
      DCHECK(obj != nullptr) << "AllocTlab can't fail";
      obj->SetClass(klass);  // 设置对象指针
      if (kUseBakerReadBarrier) {
        obj->AssertReadBarrierState();
      }
      QuasiAtomic::ThreadFenceForConstructor();
      return obj;
    }
  }
  if (kInitialized) {
    return AllocObjectFromCodeInitialized<kInstrumented>(klass, self, allocator_type).Ptr();
  } else if (!kWithChecks) {
    return AllocObjectFromCodeResolved<kInstrumented>(klass, self, allocator_type).Ptr();
  } else {
    return AllocObjectFromCode<kInstrumented>(klass, self, allocator_type).Ptr();
  }
}

extern "C" mirror::Object* artAllocObjectFromCodeInitialized##suffix##suffix2( \
    mirror::Class* klass, Thread* self) \
    REQUIRES_SHARED(Locks::mutator_lock_) { \
  return artAllocObjectFromCode<true, false, instrumented_bool, allocator_type>(klass, self); \
} \

#define GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(suffix, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, Instrumented, true, allocator_type) \
    GENERATE_ENTRYPOINTS_FOR_ALLOCATOR_INST(suffix, , false, allocator_type)

GENERATE_ENTRYPOINTS_FOR_ALLOCATOR(RegionTLAB, gc::kAllocatorTypeRegionTLAB)

我看这些的目的其实就是想100%确认 art_quick_alloc_object_initialized_region_tlab 只负责分配一个类的实例对象,不会执行这个类对象的构造函数(虽然从函数参数的数量上也能知道是不会的)

查看zyb的构造函数

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
 adb shell oatdump --class-filter=zyb --oat-file=/data/app/~~_clE1wJMsirMD3z6PJZW8Q==/com.mmi.health-PUoSwThSCagDrUZ3D-gKrg==/oat/arm64/base.odex

void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager) (dex_method_idx=20258)
  DEX CODE:
    0x0000: 7010 273f 0000   | invoke-direct {v0}, void java.lang.Object.<init>() // method@16167
    0x0003: 5b01 ffff        | iput-object v1, v0, Lcom/mi/fitness/netproxy/NetProxyManager; zyb.a // field@65535
    0x0005: 0e00             | return-void

  CODE: (code_offset=0x004254b0 size=28)...
    0x004254b0: b9000822        str w2, [x1, #8]
    0x004254b4: 34000082        cbz w2, #+0x10 (addr 0x4254c4)
    0x004254b8: f9404a70        ldr x16, [tr, #144] ; card_table
    0x004254bc: 530a7c31        lsr w17, w1, #10
    0x004254c0: 38316a10        strb w16, [x16, x17]
    0x004254c4: d5033abf        dmb ishst
    0x004254c8: d65f03c0        ret

回到NetProxyManager.start方法的汇编代码里,用dex_pc=0x45我们根本就找不到调用zyb构造函数的地方

  • 要么是zyb.<init>里的指令都被内联到NetProxyManager.start里了
  • 或者就是它丢了?
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
  0x00c82cb0: 900022a1        adrp x1, #+0x454000 (addr 0x10d6000)
  0x00c82cb4: b948cc21        ldr w1, [x1, #2252]
  0x00c82cb8: aa0003f6        mov x22, x0
  0x00c82cbc: aa0103e0        mov x0, x1
  0x00c82cc0: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
  0x00c82cc4: d63f03c0        blr lr  ; // 对应new-instance v1, zyb
    StackMap[15] (native_pc=0xc82cc8, dex_pc=0x43, register_mask=0x5400000, stack_mask=0b100000000000000000000000001100000000000)
      v0:r22 v4:r26 v5:r24                 

  // 从这里向下的一大段的指令都是执行Thread类的构造函数(内联了)
  0x00c82cc8: d0002241        adrp x1, #+0x44a000 (addr 0x10cc000)
  0x00c82ccc: b9444021        ldr w1, [x1, #1088]
  0x00c82cd0: aa0003f9        mov x25, x0 ; // 保存了zyb对象
  0x00c82cd4: aa0103e0        mov x0, x1
  0x00c82cd8: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
  0x00c82cdc: d63f03c0        blr lr      ; // 分配StringBuilder对象,也就是设置线程的默认名"Thread-" + nextThreadNum()
    StackMap[16] (native_pc=0xc82ce0, dex_pc=0x48, register_mask=0x7400000, stack_mask=0b100000000000000000000000001100000000000)
      v0:r22 v1:r25 v4:r26 v5:r24 
    InlineInfo[2] (depth=0, dex_pc=0x0, method_index=4521, is_in_bootclasspath=true, dex_file_index=0)
      v7:r22 v8:r25 
  0x00c82ce0: d5033abf        dmb ishst
  0x00c82ce4: aa0003e1        mov x1, x0  ; // x0此时是StringBuilder对象
  0x00c82ce8: aa0103fb        mov x27, x1 ; // x27里保存SB对象
  0x00c82cec: f0002240        adrp x0, #+0x44b000 (addr 0x10cd000)
  0x00c82cf0: b9437400        ldr w0, [x0, #884] 
  0x00c82cf4: f9400c1e        ldr lr, [x0, #24]
  0x00c82cf8: d63f03c0        blr lr      ; // StringBuilder.append

0x00c82cc4处分配的zyb对象后,就只有把zyb对象的地址保存到了x25里了,后来就没再使用它了

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
0x00c82d78: aa1703e7        mov x7, x23
0x00c82d7c: aa1603e1        mov x1, x22
0x00c82d80: aa1903e3        mov x3, x25 // x25里保存的zyb对象地址赋值到x3
0x00c82d84: aa0003e4        mov x4, x0
0x00c82d88: aa0403f7        mov x23, x4
0x00c82d8c: 52800006        mov w6, #0x0
0x00c82d90: aa0603e2        mov x2, x6
0x00c82d94: d2800005        mov x5, #0x0
0x00c82d98: f0002240        adrp x0, #+0x44b000 (addr 0x10cd000)
0x00c82d9c: b9485800        ldr w0, [x0, #2136]
0x00c82da0: f9400c1e        ldr lr, [x0, #24]
0x00c82da4: d63f03c0        blr lr  // 对应Thread.init(ThreadGroup g, Runnable target, String name,
                                    //                long stackSize, AccessControlContext acc)

再向下查看使用x25的地方,就只有把它复制到x3里,作为Thread类的构造函数参数使用了

也就是即没有用blr指令跳转到zyb构造函数去执行,也没有把zyb构造函数内联进来执行

这跟我们的demo app是明显不同!

zyb的构造函数真丢了

dexopt优化把zyb的构造函数弄没了,那下面就要分析dexopt优化NetProxyManager.start方法的过程了

分析字节码转机器码的流程

日志打印的太多了,logcat日志丢失怎么办?
  • 把logd的所有buffer都调大,adb shell logcat -G 8M -b all
  • 开发者选项里把log默认级别调到verbose或debug级别
  • logd有个限流逻辑,如果打印的太多了,log会别直接丢掉,这时候可以参考如下改动,编译liblog推到手机上替换
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
--- a/liblog/logd_writer.cpp
+++ b/liblog/logd_writer.cpp
@@ -189,6 +189,12 @@ int LogdWrite(log_id_t logId, struct timespec* ts, struct iovec* vec, size_t nr)
     logd_socket.Reconnect();
 
     ret = TEMP_FAILURE_RETRY(writev(logd_socket.sock(), newVec, i));
+  } else {
+    // 不要丢弃我们感兴趣的进程的log,这里也就是uid=artd的进程
+    while (ret < 0 && errno == EAGAIN && getuid() == 1082) {
+      ret = TEMP_FAILURE_RETRY(writev(logd_socket.sock(), newVec, i));
+    }
   }

dexopt流程

dexopt.png

dex2oat64流程

dex2oat-flow.png

dex2oat优化过程里针对字节码会进行多轮优化,其中例如常量展开,指令精简,内联优化等,最后由CodeGeneratorARM64生成最后优化后机器码

回到问题本身,我们知道zyb的构造函数丢了,但不太确定是具体在哪一个优化环节还是CodeGeneratorARM64出现了问题

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
// modified:   compiler/optimizing/optimizing_compiler.cc
@@ -650,7 +650,7 @@ void OptimizingCompiler::RunOptimizations(HGraph* graph,
       OptDef(OptimizationPass::kDeadCodeElimination,
              "dead_code_elimination$initial"),
       // Inlining.
-      OptDef(OptimizationPass::kInliner),
+      // OptDef(OptimizationPass::kInliner),
       // Simplification (if inlining occurred, or if we analyzed the invoke as "always throwing").
       OptDef(OptimizationPass::kConstantFolding,
              "constant_folding$after_inlining",

那其实可以把这里面的的优化环节逐个跳过,再去对比生成的机器码,就能大概定位到哪一个环节有问题

而我们都知道zyb的构造函数很小,肯定是会被内联到NetProxyManager.start的机器码里的,所以我有理由怀疑是inline环节有问题

那就直接把kInliner优化去掉,编译libart再进行dexopt优化,就能看到NetProxyManager.start的机器码变成了:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
8: void com.mi.fitness.netproxy.NetProxyManager.start() (dex_method_idx=5996)
  DEX CODE:
    // ...
    0x0041: 2200 1e0e      | new-instance v0, java.lang.Thread // type@TypeIndex[3614]
    0x0043: 2201 5a12      | new-instance v1, zyb // type@TypeIndex[4698]
    0x0045: 7020 224f 5100 | invoke-direct {v1, v5}, void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager) // method@20258
    0x0048: 7020 963f 1000 | invoke-direct {v0, v1}, void java.lang.Thread.<init>(java.lang.Runnable) // method@16278
    // ...
  CODE: (code_offset=0x0020fa70 size=880)...
    // ...
    0x0020fc44: 90000721        adrp x1, #+0xe4000 (addr 0x2f3000)
    0x0020fc48: b945c821        ldr w1, [x1, #1480]
    0x0020fc4c: aa0003f6        mov x22, x0
    0x0020fc50: aa0103e0        mov x0, x1
    0x0020fc54: f940f27e        ldr lr, [tr, #480] ; pAllocObjectInitialized
    0x0020fc58: d63f03c0        blr lr // 对应new-instance v1, zyb
      StackMap[15] (native_pc=0x20fc5c, dex_pc=0x43, register_mask=0x5400000, stack_mask=0b)
      
    0x0020fc5c: d5033abf        dmb ishst
    0x0020fc60: aa1803e2        mov x2, x24 // x24此时是NetProxyManager对象地址
    0x0020fc64: aa0003e1        mov x1, x0  // x0是zyb对象地址
    0x0020fc68: aa0103f7        mov x23, x1 // 因为后面还会用到zyb,将zyb对象临时保存到x23
    0x0020fc6c: f0000760        adrp x0, #+0xef000 (addr 0x2fe000)
    0x0020fc70: f9437000        ldr x0, [x0, #1760]
    0x0020fc74: f9400c1e        ldr lr, [x0, #24]
    0x0020fc78: d63f03c0        blr lr // 对应是去执行invoke-direct {v1, v5}, void zyb.<init>
      StackMap[16] (native_pc=0x20fc7c, dex_pc=0x45, register_mask=0x5c00000, stack_mask=0b)

能看到zyb的构造函数有被调用,也就能更能确定是kInliner优化把zyb构造函数弄丢了!

所以那下面就要分析具体是inliner的具体那个环节出现了问题

Inliner过程

其中TryBuildAndInline包含了以下几个部分

  1. 首先尝试用TryPatternSubstitution 对invoke-xxx的目标方法进行内联,如果优化成功,直接返回
  2. 如果TryPatternSubstitution优化失败,则回退到用MayInline对目标方法进行内联
  3. 通过MaybeReplaceAndRemove删除原来的invoke-xxx指令

art代码内部有个debug log开关,分别能控制是否输出如垃圾回收,类加载,odex文件加载,dex2oat编译等log

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
// art/libartbase/base/logging.h
// The members of this struct are the valid arguments to VLOG and VLOG_IS_ON in code,
// and the "-verbose:" command line argument.
struct LogVerbosity {
  bool class_linker;  // Enabled with "-verbose:class".
  bool collector;
  bool compiler;
  bool deopt;
  bool gc;
  bool heap;
  bool interpreter;  // Enabled with "-verbose:interpreter".
  bool jdwp;
  bool jit;
  ...

例如我们可以通过下面的方式开启跟dex2oat和类加载相关的log

1
2
3
➜ adb shell 'setprop dalvik.vm.extra-opts "-verbose:compiler,class"'
# App进程都是由zygote孵化的,所以得重启下zygote才能生效
➜ adb shell kill -9 `adb shell pidof zygote64`

但dex2oat和zygote有点不同,创建的art runtime的时候并没有解析这个extra-opts,所以我们得手动将LogVerbosity.compiler初始化成true的

之后再编译libart.so推送到手机上后再去触发dexopt,能看到下面的log

1
10-18 14:27:59.819  artd 22921 22921 I dex2oat64: Successfully replaced pattern of invoke void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager)

那么我就知道对zyb的构造函数的内联是由TryPatternSubstitution完成的

TryPatternSubstitution代码流程如下

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
// Try to recognize known simple patterns and replace invoke call with appropriate instructions.
bool HInliner::TryPatternSubstitution(HInvoke* invoke_instruction,
                                      ArtMethod* method,
                                      const CodeItemDataAccessor& accessor,
                                      HInstruction** return_replacement) {
  InlineMethod inline_method;
  if (!InlineMethodAnalyser::AnalyseMethodCode(method, &accessor, &inline_method)) {
    return false;
  }

  size_t number_of_instructions = 0u;  // Note: We do not count constants.
  switch (inline_method.opcode) {
    case kInlineOpNop:
      // ...
      break;
    case kInlineOpReturnArg:
      // ...
      break;
    case kInlineOpNonWideConst: {
        ...
        break
    }
    case kInlineOpIGet: {
        ...
        break
    }
    case kInlineOpIPut: {
        // ...
        break
    }
    case kInlineOpConstructor: {
      ...
      break;
    }
    ...
  }

zyb.<init>是个构造函数,那肯定是执行case kInlineOpConstructor分支

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
bool HInliner::TryPatternSubstitution(HInvoke* invoke_instruction,
                                      ArtMethod* method,
                                      const CodeItemDataAccessor& accessor,
                                      HInstruction** return_replacement) {
  InlineMethod inline_method;
  if (!InlineMethodAnalyser::AnalyseMethodCode(method, &accessor, &inline_method)) {
    return false;
  }

  size_t number_of_instructions = 0u;  // Note: We do not count constants.
  switch (inline_method.opcode) {
    // ...
    case kInlineOpConstructor: {
      const InlineConstructorData& data = inline_method.d.constructor_data;
      // Get the indexes to arrays for easier processing.
      uint16_t iput_field_indexes[] = {
          data.iput0_field_index, data.iput1_field_index, data.iput2_field_index
      };
      uint16_t iput_args[] = { data.iput0_arg, data.iput1_arg, data.iput2_arg };
      // Count valid field indexes.
      size_t number_of_iputs = 0u;
      while (number_of_iputs != arraysize(iput_field_indexes) &&
          iput_field_indexes[number_of_iputs] != DexFile::kDexNoIndex16) {
        ++number_of_iputs;
      }
      // Create HInstanceFieldSet for each IPUT that stores non-zero data.
      HInstruction* obj = GetInvokeInputForArgVRegIndex(invoke_instruction,
                                                        /* arg_vreg_index= */ 0u);
      bool needs_constructor_barrier = false;
      for (size_t i = 0; i != number_of_iputs; ++i) {
        HInstruction* value = GetInvokeInputForArgVRegIndex(invoke_instruction, iput_args[i]);
        if (!IsZeroBitPattern(value)) {
          uint16_t field_index = iput_field_indexes[i];
          bool is_final;
          HInstanceFieldSet* iput =
              CreateInstanceFieldSet(field_index, method, obj, value, &is_final);
          invoke_instruction->GetBlock()->InsertInstructionBefore(iput, invoke_instruction);

          // Check whether the field is final. If it is, we need to add a barrier.
          if (is_final) {
            needs_constructor_barrier = true;
          }
        }
      }
      if (needs_constructor_barrier) {
        HConstructorFence* constructor_fence =
            new (graph_->GetAllocator()) HConstructorFence(obj, kNoDexPc, graph_->GetAllocator());
        invoke_instruction->GetBlock()->InsertInstructionBefore(constructor_fence,
                                                                invoke_instruction);
      }
      *return_replacement = nullptr;
      number_of_instructions = number_of_iputs + (needs_constructor_barrier ? 1u : 0u);
      break;
    }
  }

case kInlineOpConstructor这段就是基于AnalyseMethodCode返回的inlined_method里的iput[0-2]_field_index和iput[0-2]_arg,将zyb构造函数里的指令插入到invoke-direct {v1, v5}, void zyb.<init>指令之前 ,然后再删除掉原来invoke-direct那条字节码指令,从而实现对zyb构造函数的内联

那么就会想到是不是这里并没执行InsertInstructionBefore呢?

上面代码里的number_of_instructions是记录了插入了多少条字节码指令的,当我把它打印出来时发现它是0!

1
2
3
4
10-18 14:27:59.819  artd 22921 22921 I dex2oat64: TryPatternSubstitution: void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager)
# kInlineOpConstructor = 5
10-18 14:27:59.819  artd 22921 22921 I dex2oat64: TryPatternSubstitution: inline_method.opcode=5 
10-18 14:27:59.819  artd 22921 22921 I dex2oat64: TryPatternSubstitution: number_of_instructions=0

接着就再把number_of_iputs也打出来,发现它也是0

那么结合逻辑就能知道iput_field_indexes这个数据里面的元素应该全部都是DexFile::kDexNoIndex16

这时候就不用再继续打log了,直接lldb 直接attach上去,查看这段逻辑里各个变量的值,会更直观

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
ANDROID_PRODUCT_OUT=$PWD/out/target/product/ssi/ gdbclient.py -n dex2oat64
 (lldb) pro hand -p true -s false -n false SIGSEGV SIGCHLD 35

 # 设置断点
(lldb)  br set --file inliner.cc --line 1785
Breakpoint 1: 2 locations.
# dex2oat64此时处于stop状态,恢复执行
(lldb) c
Process 27128 resuming
Process 27128 stopped
* thread #1, name = 'dex2oat', stop reason = breakpoint 1.2
    frame #0: 0x000000710d41f6f0 libart.so`art::HInliner::TryBuildAndInline(art::HInvoke*, art::ArtMethod*, art::ReferenceTypeInfo, art::HInstruction**, bool) [inlined] art::HInliner::TryPatternSubstitution(this=0x0000007fdbfe9198, invoke_instruction=0xb400006e5dadfac0, method=0x00000000701c7c00, accessor=0x0000007fdbfe8be8, return_replacement=0x0000007fdbfe8e50) at inliner.cc:1785:39
   1782     case kInlineOpConstructor: {
   1783       const InlineConstructorData& data = inline_method.d.constructor_data;
   1784       // Get the indexes to arrays for easier processing.
-> 1785       uint16_t iput_field_indexes[] = {
   1786           data.iput0_field_index, data.iput1_field_index, data.iput2_field_index
   1787       };
   1788       uint16_t iput_args[] = { data.iput0_arg, data.iput1_arg, data.iput2_arg };
   
 (lldb) fr v inline_method.d.constructor_data 
(art::InlineConstructorData) inline_method.d.constructor_data = {
  iput0_field_index = 65535 # DexFile::kDexNoIndex16
  iput1_field_index = 65535
  iput2_field_index = 65535
  iput0_arg = 1
  iput1_arg = 0
  iput2_arg = 0
  reserved = 0
}

iput_field_indexes里各项是来自inline_method.d.constructor_data

得再去分析inline_method.d.constructor_data里的iput[0-2]_field_index是怎么被赋值的

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
// inline_method_analyser.cc
bool AnalyseConstructor(const CodeItemDataAccessor* code_item,
                        ArtMethod* method,
                        InlineMethod* result)
    REQUIRES_SHARED(Locks::mutator_lock_) {
  // kMaxConstructorIPuts=3
  ConstructorIPutData iputs[kMaxConstructorIPuts];
  if (!DoAnalyseConstructor(code_item, method, iputs)) {
    return false;
  }

// 这段就是把ipus数组里的3项一次复制到
// result->d.constructor_data.iputs[0-2]_field_index 和
// result->d.constructor_data.iputs[0-2]_arg里
#define STORE_IPUT(n)                                                         \
  do {                                                                        \
    result->d.constructor_data.iput##n##_field_index = iputs[n].field_index;  \
    result->d.constructor_data.iput##n##_arg = iputs[n].arg;                  \
  } while (false)

  STORE_IPUT(0);
  STORE_IPUT(1);
  STORE_IPUT(2);
#undef STORE_IPUT

  result->opcode = kInlineOpConstructor;
  result->d.constructor_data.reserved = 0u;
  return true;
}

bool DoAnalyseConstructor(const CodeItemDataAccessor* code_item,
                          ArtMethod* method,
                          /*inout*/ ConstructorIPutData (&iputs)[kMaxConstructorIPuts])
      REQUIRES_SHARED(Locks::mutator_lock_) {
  // ...
  for (const DexInstructionPcPair& pair : *code_item) {
    const Instruction& instruction = pair.Inst();
    if (instruction.Opcode() == Instruction::RETURN_VOID) {
      break;
    } else if (instruction.Opcode() == Instruction::INVOKE_DIRECT) {
      // 如果是继续调用当前类的构造函数或父类的构造函数的话,就递归执行DoAnalyseConstructor
      // ...
    } else if (IsInstructionDirectConst(instruction.Opcode())) {
      // ...
    } else {
      if (!RecordConstructorIPut(method, &instruction, this_vreg, zero_vreg_mask, iputs)) {
        return false;
      }
    }
  }
  return true;
}

zyb.<init>里的第一条字节码是调用父类Object的构造函数,对应for-loop里的第一个else if分支,而Object类的构造函数里其实啥指令也没有,那么无需关注
zyb.<init>里的第二条字节码是iput-object v1, v0, Lcom/mi/fitness/netproxy/NetProxyManager; zyb.a,即初始化成员变量a,对应最后一个else分支

所以继续看RecordConstructorIPut函数

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
bool RecordConstructorIPut(ArtMethod* method,
                           const Instruction* new_iput,
                           uint16_t this_vreg,
                           uint16_t zero_vreg_mask,
                           /*inout*/ ConstructorIPutData (&iputs)[kMaxConstructorIPuts])
    REQUIRES_SHARED(Locks::mutator_lock_) {
  uint32_t field_index = new_iput->VRegC_22c();
  ClassLinker* class_linker = Runtime::Current()->GetClassLinker();
  ArtField* field = class_linker->LookupResolvedField(field_index, method, /* is_static= */ false);
  if (UNLIKELY(field == nullptr)) {
    return false;
  }
  // Remove previous IPUT to the same field, if any. Different field indexes may refer
  // to the same field, so we need to compare resolved fields from the dex cache.
  for (size_t old_pos = 0; old_pos != arraysize(iputs); ++old_pos) {
    if (iputs[old_pos].field_index == DexFile::kDexNoIndex16) {
      break;
    }
    ArtField* f = class_linker->LookupResolvedField(iputs[old_pos].field_index,
                                                    method,
                                                    /* is_static= */ false);
    if (f == field) {
      auto back_it = std::copy(iputs + old_pos + 1, iputs + arraysize(iputs), iputs + old_pos);
      *back_it = ConstructorIPutData();
      break;
    }
  }
  // If the stored value isn't zero, record the IPUT.
  if ((zero_vreg_mask & (1u << new_iput->VRegA_22c())) == 0u) {
    size_t new_pos = 0;
    while (new_pos != arraysize(iputs) && iputs[new_pos].field_index != DexFile::kDexNoIndex16) {
      ++new_pos;
    }
    if (new_pos == arraysize(iputs)) {
      return false;  // Exceeded capacity of the output array.
    }
    
    iputs[new_pos].field_index = field_index;
    iputs[new_pos].arg = new_iput->VRegA_22c() - this_vreg;
  }
  return true;
}

iputs数组的field_index来自new_iput->VRegC_22c()

突然一下就感觉是不是因为zyb.a这个成员变量的field index刚好就是0xFFFF?

再用lldb attach上去

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
(lldb) br set --file inline_method_analyser.cc --line 220

* thread #1, name = 'dex2oat', stop reason = breakpoint 1.2
    frame #0: 0x000000777bbb0dfc libart.so`art::(anonymous namespace)::DoAnalyseConstructor(art::CodeItemDataAccessor const*, art::ArtMethod*, art::(anonymous namespace)::ConstructorIPutData (&) [3]) (.__uniq.313498270130157561975199163952389749517) [inlined] art::(anonymous namespace)::RecordConstructorIPut(method=0x00000000970b1c48, new_iput=0x00000074de32b876, this_vreg=0, zero_vreg_mask=0, iputs=0x0000007fd679a958)::ConstructorIPutData (&) [3]) (.__uniq.313498270130157561975199163952389749517) at inline_method_analyser.cc:220:5
   217    DCHECK(IsInstructionIPut(new_iput->Opcode()));
   218    uint32_t field_index = new_iput->VRegC_22c();
   219    if (field_index == DexFile::kDexNoIndex16) {
-> 220      LOG(INFO) << "Filed index too big";
   221    }

// 变量field_index被编译器优化掉了,无法直接查看它的值
// 但是我们可以直接基于new_iput->VRegC_22c()的实现直接从内存里读出field_index
(lldb) fr v new_iput 
(const art::Instruction *) new_iput = 0x00000074de32b876
(lldb) memory read --format hex --size 2 --count 2 0x00000074de32b876
0x74de32b876: 0x015b 0xffff
// 其中0x5b对应iput-object,0x01对应第一个参数
// 0xffff就是zyb.a的field_index

这里其实也可以编译make com.android.art.debug 然后用这个debug art下的libart.so去替换手机上的libart.so,因为正式art用的是 -O2 优化,而debug art里用的是 -Og 优化,保留了变量的调试信息,就能在上面直接用fr v field_index进行查看

编译器优化选项:
-Og(调试优化)是一种专门为调试而设计的优化级别,提供一定程度的优化以提高性能,但也尽可能地保留了代码的调试信息和原始结构
-O0 编译器不进行任何优化
-O1 编译器会进行一些基本的优化,如减少代码中的冗余计算、简单的常量折叠等
-O2 编译器会进行更多的优化操作,除了包含 - O1 的优化内容外,还会进行诸如函数内联、循环展开、指令重排等复杂的优化

再回去看对应的字节码

1
2
3
4
5
void zyb.<init>(com.mi.fitness.netproxy.NetProxyManager) (dex_method_idx=20258)
  DEX CODE:
  0x0000: 7010 273f 0000 | invoke-direct {v0}, void java.lang.Object.<init>() // method@16167
  0x0003: 5b01 ffff      | iput-object v1, v0, Lcom/mi/fitness/netproxy/NetProxyManager; zyb.a // field@65535
  0x0005: 0e00           | return-void

zyb类的唯一成员变量a的索引刚好就是0xFFFF!

确认root cause

再去梳理这块的逻辑,发现DoAnalyseConstructor的大致逻辑是:

  • 只允许出现iput指令(或调用当前类重载的更多参数的构造函数,或父类的构造函数)
  • 最多支持分析一个构造函数里3条iput指令,所以有result->d.constructor_data.iputs2_field_index

参数iputs数组的大小是3,里面的field_index全部初始化成0xFFFF

并用这个0xFFFF表示iputs数组的那个位置还没有被使用过

TryPatternSubstitution里也跟着就用0xFFFF来统计constructor_data数组里有效的iput指令数量 然后就会错位的统计出zyb.<init>里的iput指令数量是0,于是最终导致zyb的构造函数被丢弃了

但其实0xFFFF是一个合法的field索引 (有效范围是0~655535)!

为啥Android U上没问题?

Inliner: Always try code pattern recognition.

inliner_always_try_code_pattern_recognition.png

我们知道bug出现在TryPatternSubstitution里,而U上是先执行MayInline,只有MayInline失败了才会执行TryPatternSubstitution,而MayInlineNetProxyManager.startzyb的进行内联优化不会失败,所以根本不用执行TryPatternSubstitution

TryPatternSubstitution的逻辑相比MayInline简单很多,它只会内联特别小的方法(<=2条字节码)Android V 应该是为了降低内联优化的处理耗时,加快dexopt整体速度,改成了先执行TryPatternSubstitution,然后就出问题了

为啥我手里机器的海外版以及自己编译的ROM版本都没问题?

App的同学在海外版 + google art apex 380版本能复现问题,而我之前的机器上google art apex 860版本没问题复现,然后我又试了卸载860版本的,将手机恢复到380版本的,也还是没法复现

1
2
3
4
5
6
7
➜ adb shell dumpsys package com.google.android.art
...
Path: /data/apex/decompressed/com.android.art@350820380.decompressed.apex
...
# 强制使用everything编译所有的字节码,就不需要依赖jit生成的profile了
➜ adb shell cmd package compile -f -m everything com.mmi.health
➜ adb shell oatdump --class-filter=com.mi.fitness.netproxy.NetProxyManager --method-filter=start --oat-file=/data/app/~~47Dsk4zrAeIhC-LE_wtVSg==/com.mmi.health-gEJCQRwdd2rNMikMbJ7ecQ==/oat/arm64/base.odex

汇编里有执行zyb的构造函数

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
8: void com.mi.fitness.netproxy.NetProxyManager.start() (dex_method_idx=5996)
...
  CODE: (code_offset=0x0b62f160 size=1640)
  ...
    0x0b62f39c: b90033e0  str w0, [sp, #48]
    0x0b62f3a0: b00218c1  adrp x1, #+0x4319000 (addr 0xf948000)
    0x0b62f3a4: b940e421  ldr w1, [x1, #228]
    0x0b62f3a8: 34001c81  cbz w1, #+0x390 (addr 0xb62f738)
    0x0b62f3ac: aa0003f6  mov x22, x0
    0x0b62f3b0: aa0103e0  mov x0, x1
    0x0b62f3b4: f940ee7e  ldr lr, [tr, #472] ; pAllocObjectResolved
    0x0b62f3b8: d63f03c0  blr lr // new-instance v1, zyb
      StackMap[15] (native_pc=0xb62f3bc, dex_pc=0x43, register_mask=0x5400000, stack_mask=0b100000000000000000000000001100000000000)
        v0:r22 v4:r26 v5:r24 
    // zyb.<init>
    0x0b62f3bc: b9000818  str w24, [x0, #8]
    0x0b62f3c0: f9404a70  ldr x16, [tr, #144] ; card_table
    0x0b62f3c4: 530a7c11  lsr w17, w0, #10
    0x0b62f3c8: 38316a10  strb w16, [x16, x17]
    0x0b62f3cc: b0021061  adrp x1, #+0x420d000 (addr 0xf83c000)
    0x0b62f3d0: b94ab421  ldr w1, [x1, #2740]
    0x0b62f3d4: aa0003f9  mov x25, x0
    0x0b62f3d8: aa0103e0  mov x0, x1
    0x0b62f3dc: f940f27e  ldr lr, [tr, #480] ; pAllocObjectInitialized
    0x0b62f3e0: d63f03c0  blr lr

最终确认是打开everything编译时zyb的类状态kVerifiedNeedsAccessChecks,不满足IsVerified()检查,导致DoAnalyseConstructor失败,TryPatternSubstitution里的替换逻辑就不会执行了

注意
关于class verify状态的部分不在这里介绍,请查阅class_verifier.cc相关代码
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
// inline_method_analyser.cc
bool DoAnalyseConstructor(const CodeItemDataAccessor* code_item,
                          ArtMethod* method,
                          /*inout*/ ConstructorIPutData (&iputs)[kMaxConstructorIPuts])
  // ...
  if (!method->GetDeclaringClass()->IsVerified() ||
      code_item->InsnsSizeInCodeUnits() > kMaxCodeUnits ||
      code_item->RegistersSize() > kMaxVRegs ||
      !Matcher::Match(code_item, kConstructorPattern)) {
    return false;
  }
  ...

然后注意到只有使用speed-profile编译,才会生成.art文件,See PackageDexOptimizer.java#702

再使用oatdump去查看.art文件里zyb类的状态,就发现它是是处于kVisiblyInitialized状态,满足IsVerified()

1
2
3
4
5
6
7
8
adb shell oatdump --app-image=/data/app/~~O_CMe2TlYNdY5FboyC-Skw==/com.mmi.health-rElfyfpC3SUCSSgr0f6ADQ==/oat/arm64/base.art --oat-file=/data/app/~~O_CMe2TlYNdY5FboyC-Skw==/com.mmi.health-rElfyfpC3SUCSSgr0f6ADQ==/oat/arm64/base.odex --no-disassemble 
...
  0x448c1288: java.lang.Class "zyb" (VisiblyInitialized)
    shadow$_klass_: 0x700016b0   Class: java.lang.Class
    shadow$_monitor_: 536870912 (0x20000000)
    accessFlags: 4113 (0x1011)
    classFlags: 0 (0x0)
    ...

我之前在复现的时候,zyb/NetProxyManager里的方法被调用的不够频繁,使用speed-profile编译并又不会对它们优化

得想办法让NetProxyManager里的方法变得起来:
  • 反复运行app的相关功能,让它俩变成hot method,再触发dexopt (手动操作起来太麻烦了)
  • 参考 performance/baselineprofiles,把上面手动的流程自动化,在apk里生成assets/dexopt/baseline.prof,安装apk时就会对baseline.prof里的类方法自动进行dexopt优化 (App的研发都建议了解下)
  • 【hack一下】修改libart里判断某个方法是否是hot的逻辑,强制去优化它们

然后再去触发dexopt问题就都可以复现了

修复方案

review/3309082:既然0xFFFF是合法的索引值,那其实不应该用它去作为未使用的field index的标识,这个改动已经被合入aosp主线分支了

稳定性相关的问题经常是调查过程很漫长,但是修复起来却很简单,就只有几行代码改动 😄