Closed
Bug 1291768
Opened 6 years ago
Closed 6 years ago
Firefox for Android Crashes when performance recording is started
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox48 wontfix, firefox49 fixed, fennec49+, firefox50 fixed, firefox51 fixed)
RESOLVED
FIXED
Firefox 51
People
(Reporter: tmcrodrigues, Assigned: glandium, NeedInfo)
References
Details
Attachments
(1 file)
58 bytes,
text/x-review-board-request
|
froydnj
:
review+
lizzard
:
approval-mozilla-aurora+
lizzard
:
approval-mozilla-beta+
|
Details |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36 Firefox for Android Steps to reproduce: - Open Firefox 48 on Mac OSX - Open Firefox 48 on Android - Setup remote debugging via USB - Open any website on Firefox for Android - Open performance profiler on inspector window - Start recording performance Actual results: Firefox on Android just crashes Expected results: It doesn't crash and I can use the performance profiler.
Comment 1•6 years ago
|
||
Tiago: Can you please provide the crash report if one is generated? You can find it in about:crashes? Thanks.
Flags: needinfo?(tmcrodrigues)
Reporter | ||
Comment 2•6 years ago
|
||
Here: https://crash-stats.mozilla.com/report/index/a6a73728-f4de-43ed-a60b-3cab42160803
Comment 3•6 years ago
|
||
Looks to be a dupe of Bug 1241536
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(tmcrodrigues)
Resolution: --- → DUPLICATE
Duplicate of bug: 1241536
Reporter | ||
Comment 4•6 years ago
|
||
Yeah, I noticed that now on the crash report (which I hadn't seen before). The keywords I looked for didn't really lead me to that bug before :(
Bug 1241536 is something different (no crash, just doesn't work). I can reproduce this one too, though.
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: DUPLICATE → ---
Component: Untriaged → General
Product: Firefox → Firefox for Android
tracking-fennec: --- → ?
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox50:
--- → affected
status-firefox51:
--- → affected
Mike this seems linker related
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 7•6 years ago
|
||
So, the address of the crash is 0xb4d10000, which, according to the module data in the raw dump is for libLLVM.so, but /proc/self/maps says libLLVM.so starts at 0xb4d34000, and that the memory at 0xb4d10000 is the placeholder between the two PT_LOADs of libcneconn.so. IOW, that doesn't match. The likely problem is that /system/lib/libLLVM.so has some funky elf headers. Can one of the people that can reproduce the bug pull that file from their device, and attach it to this bug.
Flags: needinfo?(mh+mozilla)
Reporter | ||
Comment 8•6 years ago
|
||
Couldn't attach the file directly because it's too large for bugzilla attachments, but you can get it from my Dropbox here: https://www.dropbox.com/s/s0bsmv3rh13sybx/libLLVM.so?dl=0 This is from this device https://crash-stats.mozilla.com/report/index/a6a73728-f4de-43ed-a60b-3cab42160803
Assignee | ||
Comment 9•6 years ago
|
||
This is exactly what I suspected: $ readelf -l ~/Downloads/libLLVM.so | grep '\(Type\|LOAD\)' Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x000000 0x00024000 0x00024000 0xa10e48 0xa10e48 R E 0x1000 LOAD 0xa114d0 0x00a364d0 0x00a364d0 0x49304 0x50c05 RW 0x1000 Most libraries will have both Offset and VirtAddr at 0x0000000 for the first LOAD. That's the assumption our linker does. The funny thing is that the address the system linker puts in the structures it gives to debuggers (which we use to find the libraries it has loaded) is for a VirtAddr of 0, which is why we're trying to read at 0xb4d10000 (which we got from there), and why /proc/self/maps places libLLVM.so at 0xb4d34000, which is 0x24000 further. This type of situation is what the combination of mincore and Elf::Ehdr::validate are trying to avoid hitting, but here we're unlucky because the zone we're reading *is* mapped, *and* not readable. It also doesn't help that the system linker doesn't put the full path to the library in its debugger-targetted structures (which is why our linker is trying to read the elf header from the memory in the first place). A way around this is to try to write to a pipe from that address, which will not segfault, but instead return a EFAULT error.
Assignee: nobody → mh+mozilla
Comment hidden (mozreview-request) |
tracking-fennec: ? → 49+
![]() |
||
Comment 11•6 years ago
|
||
mozreview-review |
Comment on attachment 8782175 [details] Bug 1291768 - Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. https://reviewboard.mozilla.org/r/72390/#review70310 r=me; I don't think it's absolutely necessary to get the error handling right here, but I'm sure you can appreciate how much extra work POSIX wants you to do. ::: mozglue/linker/ElfLoader.cpp:157 (Diff revision 1) > + do { > + ret = read(read_fd, raw_ehdr, sizeof(raw_ehdr)); > + } while (ret == -1 && errno == EINTR); The amount of error checking done here to be POSIX-ly correct is...stunning. But I'm not sure if it's actually sufficient. The POSIX page for `read` (http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html) has these rage-inducing paragraphs: > Earlier versions of this standard allowed two very different behaviors with regard to the handling of interrupts. In order to minimize the resulting confusion, it was decided that POSIX.1-2008 should support only one of these behaviors. Historical practice on AT&T-derived systems was to have read() and write() return -1 and set errno to [EINTR] when interrupted after some, but not all, of the data requested had been transferred. However, the US Department of Commerce FIPS 151-1 and FIPS 151-2 require the historical BSD behavior, in which read() and write() return the number of bytes actually transferred before the interrupt. If -1 is returned when any data is transferred, it is difficult to recover from the error on a seekable device and impossible on a non-seekable device. Most new implementations support this behavior. The behavior required by POSIX.1-2008 is to return the number of bytes transferred. > POSIX.1-2008 does not specify when an implementation that buffers read()s actually moves the data into the user-supplied buffer, so an implementation may choose to do this at the latest possible moment. Therefore, an interrupt arriving earlier may not cause read() to return a partial byte count, but rather to return -1 and set errno to [EINTR]. > Consideration was also given to combining the two previous options, and setting errno to [EINTR] while returning a short count. However, not only is there no existing practice that implements this, it is also contradictory to the idea that when errno is set, the function responsible shall return -1. And the Linux man page seems to concur: > On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of-file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes. AIUI, the same logic applies to write as well. That is, you cannot assume that `read` or `write` does everything in one go, because it might have been interrupted in the middle (and still return success), so you have to loop in the non-error case as well. I think to be fully correct here and above, you would need to track how much you've actually read/written, continuing when `nbytes != sizeof(raw_ehdr) || (ret == -1 && errno == EINTR)` or somesuch, depending on how you want to write the loop.
Attachment #8782175 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 12•6 years ago
|
||
Thankfully, that's not true for write: http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html > Complete/partial/deferred: A write request: > (...) > ret = write(fildes, buf, nbyte); > > may return: > > (...) > Partial > ret<nbyte > > This shall never happen if nbyte<= {PIPE_BUF}. If it does happen (with nbyte> {PIPE_BUF}), this volume of POSIX.1-2008 does not guarantee atomicity, even if ret<= {PIPE_BUF}, because atomicity is guaranteed according to the amount requested, not the amount written.
![]() |
||
Comment 13•6 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #12) > Thankfully, that's not true for write: > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html Ah, very nice. That's what I get for not reading both spec pages.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 15•6 years ago
|
||
Comment on attachment 8782175 [details] Bug 1291768 - Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. Would you mind looking at the interdiff?
Attachment #8782175 -
Flags: review+ → review?(nfroyd)
![]() |
||
Comment 16•6 years ago
|
||
Comment on attachment 8782175 [details] Bug 1291768 - Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. Looks good, thank you for the comments.
Attachment #8782175 -
Flags: review?(nfroyd) → review+
Comment 17•6 years ago
|
||
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/b9eeff3c5d37 Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. r=froydnj
Comment 18•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b9eeff3c5d37
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
Comment 19•6 years ago
|
||
Mike, I guess we want to uplift that to aurora and probably beta, could you fill the uplift request? Thanks
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 20•6 years ago
|
||
James, do we want to uplift this?
Flags: needinfo?(mh+mozilla) → needinfo?(snorp)
I think we do if you don't believe it's too risky
Flags: needinfo?(snorp)
Updated•6 years ago
|
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 22•6 years ago
|
||
Comment on attachment 8782175 [details] Bug 1291768 - Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. Approval Request Comment [User impact if declined]: Crash when enabling performance recording in some cases on some devices. Technically, this has affected Fennec for as long as the profiler has supported symbolication on android (bug 783331 for the linker part that introduced this bug) [Describe test coverage new/current, TreeHerder]: STR in comment 0. [Risks and why]: The worst that can happen is performance recording crashing or freezing the browser on more devices than what was affected initially, if the patch somehow doesn't work as it's expected to (but I expect the chances of that happening are rather slim). The patch doesn't affect anything else than the performance recording feature/profiler, so there is no risk for normal operation of the browser. [String/UUID change made/needed]: None
Flags: needinfo?(mh+mozilla)
Attachment #8782175 -
Flags: approval-mozilla-beta?
Attachment #8782175 -
Flags: approval-mozilla-aurora?
Hello Tiago, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(tmcrodrigues)
Reporter | ||
Comment 24•6 years ago
|
||
Hi there. A coworker of mine has verified it and already managed to debug the issue we were after. Thanks a lot for the fix :)
Comment on attachment 8782175 [details] Bug 1291768 - Avoid SIGSEGV trying to read ELF headers of libraries with a non-null base virtual address. Verified on nightly, let's uplift to aurora and beta so the profiler will avoid some crashes.
Attachment #8782175 -
Flags: approval-mozilla-beta?
Attachment #8782175 -
Flags: approval-mozilla-beta+
Attachment #8782175 -
Flags: approval-mozilla-aurora?
Attachment #8782175 -
Flags: approval-mozilla-aurora+
Comment 26•6 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/1954ae1ff39c
Comment 27•6 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/2afe1869db18
Updated•1 year ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•