Crash on retrieving Gecko profile

RESOLVED FIXED in Firefox 40

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: esawin, Assigned: erahm)

Tracking

41 Branch
mozilla41
ARM
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox40 verified, firefox41 fixed)

Details

Attachments

(3 attachments)

Posted file stack.txt
STR:
1. Install Gecko Profiler add-on.
2. Connect to Android device via USB.
3. Press "Analyze" in the add-on to retrieve the profile.

Expected: Profile is retrieved and displayed.
Actual: Fennec crashes (see stack trace), no profile is being retrieved.

Workaround: Removing mUniqueStacks.reset() in ThreadProfile::StreamJSON prevents the crash.
Is this intermittent? And is it reproducible in an emulator?
I got this (or something with the same symptoms) in the browser toolbox today. Couldn't reproduce.
(In reply to PTO until 05/25 from comment #1)
> Is this intermittent? And is it reproducible in an emulator?

It is not intermittent and is reproducing on all Android devices I have tried. I haven't tested on an emulator.
(In reply to Eugen Sawin [:esawin] from comment #3)
> (In reply to PTO until 05/25 from comment #1)
> > Is this intermittent? And is it reproducible in an emulator?
> 
> It is not intermittent and is reproducing on all Android devices I have
> tried. I haven't tested on an emulator.

Could you please try and see? I don't have an actual device handy. Thanks!
Flags: qe-verify+
I guess std::string isn't memmove-safe on Android. Switch to nsCString instead;
I've confirmed locally that nsCString doesn't crash.
Attachment #8610944 - Flags: review?(mstange)
Attachment #8610944 - Flags: review?(mstange) → review+
Assignee: nobody → shu
Duplicate of this bug: 1168473
Duplicate of this bug: 1165859
I just built b2g with inbound and the problem seems to still exist. I'm relatively new to building my own B2G so there is a chance I have done something wrong. Is anyone able to clarify this has actually made profiling devices via WebIDE-devtools possible?
Comment on attachment 8610944 [details] [diff] [review]
Use nsCString instead of std::string in FrameKey in the profiler.

Approval Request Comment
[Feature/regressing bug #]: 1167252, the new performance tool
[User impact if declined]: Won't ship the performance tool
[Describe test coverage new/current, TreeHerder]: There are try pushes in Bug 1167252 with all patches needing uplift
[Risks and why]: Requesting uplift for the accumulated changes in the performance tool since the 40 merge date, so these changes haven't had the full 6 weeks to bake.  Risks are generally contained within devtools, specifically within the performance panel.
[String/UUID change made/needed]: None
Attachment #8610944 - Flags: approval-mozilla-aurora?
@wilsonpage wcosta confirmed this fixed the issue on B2G -- I suspect you may have had trouble with this as the patch has not yet made it's way to b2g-inbound. Not sure how often those get updated, but definitely keeping an eye out for this issue
https://hg.mozilla.org/mozilla-central/rev/1118d0695452
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Note: I had verbal confirmation for these uplifts from Sylvestre even before he's flagged them as a+.  See https://bugzilla.mozilla.org/show_bug.cgi?id=1167252#c26
I just flashed m-c and the bug still exists.

VIDEO:

https://www.youtube.com/watch?v=-UWO-3_hrDQ

DEVICE VERSIONS:

Build ID               20150528010203
Gaia Revision          f84772d2640d825eaf3f9fcd906b4756e19e8a3f
Gaia Date              2015-05-26 12:47:07
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/baa9c64fea6f
Gecko Version          41.0a1
Device Name            flame
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.cltbld.20150212.043653
Firmware Date          Thu Feb 12 04:37:04 EST 2015
Bootloader             L1TC000118D0
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 8610944 [details] [diff] [review]
Use nsCString instead of std::string in FrameKey in the profiler.

Change approved to skip one train as part of the spring campaign.
Attachment #8610944 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Wilson Page [:wilsonpage] from comment #15)
> I just flashed m-c and the bug still exists.
> 
> VIDEO:
> 
> https://www.youtube.com/watch?v=-UWO-3_hrDQ
> 
> DEVICE VERSIONS:
> 
> Build ID               20150528010203
> Gaia Revision          f84772d2640d825eaf3f9fcd906b4756e19e8a3f
> Gaia Date              2015-05-26 12:47:07
> Gecko Revision        
> https://hg.mozilla.org/mozilla-central/rev/baa9c64fea6f
> Gecko Version          41.0a1
> Device Name            flame
> Firmware(Release)      4.4.2
> Firmware(Incremental)  eng.cltbld.20150212.043653
> Firmware Date          Thu Feb 12 04:37:04 EST 2015
> Bootloader             L1TC000118D0

Hi, can you confirm if you have this commit [1] in your repo?

[1] http://hg.mozilla.org/integration/b2g-inbound/rev/1118d0695452
(In reply to Wander Lairson Costa [:wcosta] from comment #17)
> (In reply to Wilson Page [:wilsonpage] from comment #15)
> > I just flashed m-c and the bug still exists.
> > 
> > VIDEO:
> > 
> > https://www.youtube.com/watch?v=-UWO-3_hrDQ
> > 
> > DEVICE VERSIONS:
> > 
> > Build ID               20150528010203
> > Gaia Revision          f84772d2640d825eaf3f9fcd906b4756e19e8a3f
> > Gaia Date              2015-05-26 12:47:07
> > Gecko Revision        
> > https://hg.mozilla.org/mozilla-central/rev/baa9c64fea6f
> > Gecko Version          41.0a1
> > Device Name            flame
> > Firmware(Release)      4.4.2
> > Firmware(Incremental)  eng.cltbld.20150212.043653
> > Firmware Date          Thu Feb 12 04:37:04 EST 2015
> > Bootloader             L1TC000118D0
> 
> Hi, can you confirm if you have this commit [1] in your repo?
> 
> [1] http://hg.mozilla.org/integration/b2g-inbound/rev/1118d0695452

My device reports itself at Gecko revision baa9c64fea6f. From the m-c log, it appears this revision falls after revision 1118d0695452 (when this patch landed on m-c). If that is the case, then yes, my device does contain this commit :)

As discussed yesterday on IRC (and in comment 9), I flashed my device with mozilla-inbound which also contained this commit. Testing todays m-c B2G build clarified yesterday's findings.
(In reply to Wilson Page [:wilsonpage] from comment #18)
> 
> My device reports itself at Gecko revision baa9c64fea6f. From the m-c log,
> it appears this revision falls after revision 1118d0695452 (when this patch
> landed on m-c). If that is the case, then yes, my device does contain this
> commit :)
> 
> As discussed yesterday on IRC (and in comment 9), I flashed my device with
> mozilla-inbound which also contained this commit. Testing todays m-c B2G
> build clarified yesterday's findings.

Do you have a stack for the crash?
(In reply to Shu-yu Guo [:shu] from comment #19)
> (In reply to Wilson Page [:wilsonpage] from comment #18)
> > 
> > My device reports itself at Gecko revision baa9c64fea6f. From the m-c log,
> > it appears this revision falls after revision 1118d0695452 (when this patch
> > landed on m-c). If that is the case, then yes, my device does contain this
> > commit :)
> > 
> > As discussed yesterday on IRC (and in comment 9), I flashed my device with
> > mozilla-inbound which also contained this commit. Testing todays m-c B2G
> > build clarified yesterday's findings.
> 
> Do you have a stack for the crash?

I'm not familiar with retrieving crash dumps and couldn't seem to find the .dmp file following these instructions [1]. Although :wcosta has been able to reproduce, so he should be able to provide them.

[1] https://developer.mozilla.org/en-US/docs/Debugging_a_minidump
I tried latest eng build in pvtbuild and indeed it crashes. I then cloned m-c, built and it doesn't crash, although I can't profile yet due to Bug 1168784. My only guess is that the build in pvtbuilds does not contain the fix yet.
(In reply to Wander Lairson Costa [:wcosta] from comment #21)
> I tried latest eng build in pvtbuild and indeed it crashes. I then cloned
> m-c, built and it doesn't crash, although I can't profile yet due to Bug
> 1168784. My only guess is that the build in pvtbuilds does not contain the
> fix yet.

You can use WebIDE and the devtools performance tool to record profiles on b2g without using the profile.sh script and gecko profiler, as well -- let me know if you need any help with recreating it via that
I've repro'd this w/ m-c and got a SIGBUS.

STR:
  1. Build against m-c (standard build)
  2. Launch clock
  3. Start profiling clock: |./profile.sh start -p 9030 -t Compositor,GeckoMain -f js,leaf,stackwalk -e 2000000 -i 1|
  4. Capture profile: |./profile.sh capture|
  5. Note clock crashes

In gdb:
> Program received signal SIGBUS, Bus error.
> operator[] (aIndex=1, this=0xb09b76e4) at /Users/ericrahm/dev/mozilla-central/tools/profiler/ProfileEntry.cpp:658
> 658             ProfileEntry readAheadEntry = mEntries[readAheadPos];

We believe this is due to unaligned access on ARM. As it turns out ProfileEntry is packed, so this does make some sense. Removing the |#pragma pack| clears up the sigbus for me.
Attachment #8612563 - Flags: review?(shu)
Assignee: shu → erahm
Status: REOPENED → ASSIGNED
Comment on attachment 8612563 [details] [diff] [review]
Don't pack ProfileEntry on ARM

Review of attachment 8612563 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch!

::: tools/profiler/ProfileEntry.h
@@ +24,5 @@
>  #include "mozilla/UniquePtr.h"
>  
>  class ThreadProfile;
>  
> +// NB: Packing this strucure has been shown to cause SIGBUS issues on ARM.

Nit: typo in structure
Attachment #8612563 - Flags: review?(shu) → review+
How come this wasn't a problem before?
(In reply to Markus Stange [:mstange] from comment #27)
> How come this wasn't a problem before?

That's what I was wondering too, since it's been this way for 2 years? But indeed, without packing makes the SIGBUS go away.
I suppose it's possible that my rewrite of profile streaming perturbed something such that GCC started generating different code?
(In reply to Shu-yu Guo [:shu] from comment #29)
> I suppose it's possible that my rewrite of profile streaming perturbed
> something such that GCC started generating different code?

I guess... I can't come up with a better explanation.
Latest patch seems to fix crashes for me :)
https://hg.mozilla.org/mozilla-central/rev/b65d8a1df422
Status: ASSIGNED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
No longer blocks: 1168473
I tried reproducing this issue using an Android 4.4.4 device via WebIDE, but each time I started recording from Desktop, a permanent hang would occur on the device. I'm pretty sure there's something wrong with my test environment.

Wilson, could you please confirm this fix on the latest Aurora [1] as well? I'd really appreciate it.


[1] https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora/
Flags: qe-verify+ → needinfo?(wilsonpage)
Devtools performance tool works for me in latest Aurora.
Flags: needinfo?(wilsonpage)
(In reply to Wilson Page [:wilsonpage] from comment #34)
> Devtools performance tool works for me in latest Aurora.

Thanks a lot for helping out here! Setting status and flags accordingly.
Status: RESOLVED → VERIFIED
I removed this by accident, bug 1176263 tracked restoring it.
Reopening as symptoms have reappeared. I can no longer profile using devtools or ./profile.sh the latter crashes with:

~/B2G (master) $ ./profile.sh capture
Signaling Profiled Processes: 1612 16021
Stabilizing 1612 b2g ...
Pulling /data/local/tmp/profile_0_1612.txt into profile_1612_b2g.txt
Adding symbols to profile_1612_b2g.txt and creating profile_1612_b2g.sym ...
Traceback (most recent call last):
  File "./scripts/profile-symbolicate.py", line 495, in <module>
    main()
  File "./scripts/profile-symbolicate.py", line 460, in main
    profile = json.load(open(args.filename, "rb"))
  File "/usr/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 184310 (char 184310)
Stabilizing 16021 Contacts protot ...
Pulling /data/local/tmp/profile_2_16021.txt into profile_16021_Contactsprotot.txt
Adding symbols to profile_16021_Contactsprotot.txt and creating profile_16021_Contactsprotot.sym ...
Traceback (most recent call last):
  File "./scripts/profile-symbolicate.py", line 495, in <module>
    main()
  File "./scripts/profile-symbolicate.py", line 460, in main
    profile = json.load(open(args.filename, "rb"))
  File "/usr/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 262141 (char 262141)
Merging profile:  profile_1612_b2g.sym profile_16021_Contactsprotot.sym
./gecko/tools/profiler/merge-profiles.py profile_1612_b2g.sym profile_16021_Contactsprotot.sym
Traceback (most recent call last):
  File "./gecko/tools/profiler/merge-profiles.py", line 107, in <module>
    MergeProfiles(sys.argv[1:])
  File "./gecko/tools/profiler/merge-profiles.py", line 32, in MergeProfiles
    fp = open(fname, "r")
IOError: [Errno 2] No such file or directory: 'profile_1612_b2g.sym'

Results: profile_captured.sym
Removing old profile files (from device) ... done
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
(In reply to Wilson Page [:wilsonpage] from comment #39)
> Reopening as symptoms have reappeared. I can no longer profile using
> devtools or ./profile.sh the latter crashes with:
> 
> ~/B2G (master) $ ./profile.sh capture
> Signaling Profiled Processes: 1612 16021
> Stabilizing 1612 b2g ...
> Pulling /data/local/tmp/profile_0_1612.txt into profile_1612_b2g.txt
> Adding symbols to profile_1612_b2g.txt and creating profile_1612_b2g.sym ...
> Traceback (most recent call last):
>   File "./scripts/profile-symbolicate.py", line 495, in <module>
>     main()
>   File "./scripts/profile-symbolicate.py", line 460, in main
>     profile = json.load(open(args.filename, "rb"))
>   File "/usr/lib/python2.7/json/__init__.py", line 278, in load
>     **kw)
>   File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
>     obj, end = self.scan_once(s, idx)
> ValueError: Unterminated string starting at: line 1 column 184310 (char
> 184310)
> Stabilizing 16021 Contacts protot ...
> Pulling /data/local/tmp/profile_2_16021.txt into
> profile_16021_Contactsprotot.txt
> Adding symbols to profile_16021_Contactsprotot.txt and creating
> profile_16021_Contactsprotot.sym ...
> Traceback (most recent call last):
>   File "./scripts/profile-symbolicate.py", line 495, in <module>
>     main()
>   File "./scripts/profile-symbolicate.py", line 460, in main
>     profile = json.load(open(args.filename, "rb"))
>   File "/usr/lib/python2.7/json/__init__.py", line 278, in load
>     **kw)
>   File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
>     return _default_decoder.decode(s)
>   File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>   File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
>     obj, end = self.scan_once(s, idx)
> ValueError: Unterminated string starting at: line 1 column 262141 (char
> 262141)
> Merging profile:  profile_1612_b2g.sym profile_16021_Contactsprotot.sym
> ./gecko/tools/profiler/merge-profiles.py profile_1612_b2g.sym
> profile_16021_Contactsprotot.sym
> Traceback (most recent call last):
>   File "./gecko/tools/profiler/merge-profiles.py", line 107, in <module>
>     MergeProfiles(sys.argv[1:])
>   File "./gecko/tools/profiler/merge-profiles.py", line 32, in MergeProfiles
>     fp = open(fname, "r")
> IOError: [Errno 2] No such file or directory: 'profile_1612_b2g.sym'
> 
> Results: profile_captured.sym
> Removing old profile files (from device) ... done

Wilson, can you provide further details? What type of build are you testing, what date is it from, can you grab a snippet of |adb logcat| from when you call |profile.sh|?
Build: Yesterday's B2G build
Logcat: https://pastebin.mozilla.org/8838132
Comment 37 suggests this will be fixed when bug 1176263 lands.
Depends on: 1176263
(In reply to Wilson Page [:wilsonpage] from comment #41)
> Build: Yesterday's B2G build

So a pvt flame-kk m-c eng build?

(In reply to Wilson Page [:wilsonpage] from comment #42)
> Comment 37 suggests this will be fixed when bug 1176263 lands.

AFAICT that's been on m-c since 2015-06-24 [1].

[1] http://hg.mozilla.org/mozilla-central/rev/ae84f9adafc7 (look at the push date)
We lost Shu's original patch to switch to nsCString. I cloned this to bug 1178961 to fix that.
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.