Closed
Bug 1083461
Opened 10 years ago
Closed 10 years ago
Segmentation fault on Wikipedia search page, if Firefox was compiled with LLVM 3.5.0
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
()
RESOLVED
FIXED
mozilla36
People
(Reporter: Kaorukun, Assigned: dbaron)
References
()
Details
(Keywords: crash)
Attachments
(6 files, 1 obsolete file)
6.86 KB,
text/plain
|
Details | |
6.38 KB,
text/plain
|
Details | |
3.05 KB,
text/plain
|
Details | |
6.07 KB,
text/plain
|
Details | |
8.25 KB,
patch
|
bzbarsky
:
review+
bkerensa
:
approval-mozilla-esr31+
|
Details | Diff | Splinter Review |
5.19 KB,
patch
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:33.0) Gecko/20100101 Firefox/33.0 Build ID: 20141015214340 Steps to reproduce: 1. Compile Firefox 33.0 from source using LLVM-3.5.0 on a x86_64 Linux platform 2. Visit Wikipedia's fulltext search page : http://en.wikipedia.org/w/index.php?search=&title=Special%3ASearch&fulltext=Search 3. Click anywhere in the page, including the Firefox' navigation toolbar, or press Alt-Home to try to return to homepage or hover with the mouse over the Search button and then move the mouse outside of the button System: Gentoo Linux x86_64, kernel 3.17.0, all packages updated to the latest stable versions. Actual results: In step 3, firefox immediately crashes with a segmentation fault The issue is always reproducible. The issue is reproducible when using a fresh Firefox profile. I tested by creating a new user account, logged in and started Firefox, i.e. without any add-ons, themes or even any modified settings. The issue only occurs when Firefox was compiled with llvm-3.5.0. When compiled with llwm-3.4.2, everything works fine. I did not try gcc. The issue does not occur with the binary firefox package (www-client/firefox-bin-33.0 in Gentoo Portage) The issue occurs with a different version of Firefox as well - I tried 31.1. When compiled with llvm-3.5.0 it crashes, when compiled with llvm-3.4.2 it works without issues. In case of v31.1 following message appears on the terminal when crash occurs, but not with v33: [5046] ###!!! ABORT: Aborting on channel error.: file /var/tmp/portage/www-client/firefox-31.1.1/work/mozilla-esr31/ipc/glue/MessageChannel.cpp, line 1533 [5046] ###!!! ABORT: Aborting on channel error.: file /var/tmp/portage/www-client/firefox-31.1.1/work/mozilla-esr31/ipc/glue/MessageChannel.cpp, line 1533 Expected results: No segmentation fault or any other issue should occur.
Reporter | ||
Comment 1•10 years ago
|
||
My layman guess is that a tiny animation implemented in Wikipedias search entry field or a similar animation on the "Search" button is causing the issue. The entry field has a blue colored bar when it has focus. Once you click anywhere in the window, the field loses focus, playing brief animation to retract this bar and in that moment the crash occurs. In case when I hover with the mouse over the Search button, the crash occurs only when the mouse cursor leaves the button. Issue with Javascript, JIT?
Reporter | ||
Updated•10 years ago
|
Comment 2•10 years ago
|
||
Thank you for the bug report! > In step 3, firefox immediately crashes with a segmentation fault Can you please attach a stack trace to this bug using https://bugzilla.mozilla.org/attachment.cgi?bugid=1083461&action=enter ?
Updated•10 years ago
|
Flags: needinfo?(Kaorukun)
Reporter | ||
Comment 3•10 years ago
|
||
Flags: needinfo?(Kaorukun)
Reporter | ||
Comment 4•10 years ago
|
||
Reporter | ||
Comment 5•10 years ago
|
||
Disassembled fragment from nsCSSValue.o which was compiled using LLVM 3.4.2, showing the assembly code for the function bool nsCSSValueList::operator==(const nsCSSValueList& aOther) const
Reporter | ||
Comment 6•10 years ago
|
||
Disassembled fragment from nsCSSValue.o which was compiled using LLVM 3.5.0, showing the assembly code for the function bool nsCSSValueList::operator==(const nsCSSValueList& aOther) const
Reporter | ||
Comment 7•10 years ago
|
||
I think I figured out what happened here. The affected code is located in layout/style/nsCSSValue.cpp . The segfault happens because nsCSSValue::operator== receives a null pointer as &aOther reference, as you can see from the attachment 8506025 [details]. I disassembled the code of nsCSSValueList::operator== which calls nsCSSValue::operator== and found that the resulting x86 code looks quite differently when compiled with LLVM 3.5.0. If you go through the asm code in the attachment 8506319 [details] and assume that it was called with this != NULL and aOther == NULL, you can see that the call to nsCSSValue::operator== (from "if (p1->mValue != p2->mValue)") happens without checking the for loop's condition "p1 && p2" first. So, it seems like a bug in LLVM, I see nothing wrong with Mozilla's code of nsCSSValueList::operator==. I'll submit a bug to LLVM team. This one can be closed I guess.
Comment 8•10 years ago
|
||
Thank you for hunting that down! Marking invalid, but please reopen if it turns out there's an issue on our side after all.
Status: UNCONFIRMED → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Reporter | ||
Comment 9•10 years ago
|
||
Submitted bug at LLVM: http://llvm.org/bugs/show_bug.cgi?id=21308
Comment 10•10 years ago
|
||
Hi, LLVM developer here. This bug is indeed valid, please reopen it. See llvm.org/PR21308#c8 for details why. We also looked into why the clang warning on "&ref == nullptr" didn't fire, and the answer is that there were too many levels of indirection for our compiler warning to work through. It would slow down compiles too much. (It's about as complicated as the -Wuninitialized checker, which is extremely costly at compile time and we really don't want anything more like that.) The clang static analyzer probably would have caught this, though I haven't checked whether it does.
Comment 11•10 years ago
|
||
Nick, thanks for the analysis!
Status: RESOLVED → REOPENED
Component: General → CSS Parsing and Computation
Ever confirmed: true
Flags: needinfo?(dbaron)
Resolution: INVALID → ---
Comment 12•10 years ago
|
||
So presumably the issue is here: 3850 case eUnit_Dasharray: 3851 case eUnit_Filter: 3852 case eUnit_Shadow: 3853 case eUnit_BackgroundPosition: 3854 return *mValue.mCSSValueList == *aOther.mValue.mCSSValueList; in that one of the values has a null mCSSValueList for some reason.
Comment 13•10 years ago
|
||
Oh, that's in StyleAnimationValue::operator==.
Assignee | ||
Comment 14•10 years ago
|
||
Indeed. From the header file: eUnit_Dasharray, // nsCSSValueList* (never null) eUnit_Filter, // nsCSSValueList* (may be null) eUnit_Shadow, // nsCSSValueList* (may be null) eUnit_Transform, // nsCSSValueList* (never null) eUnit_BackgroundPosition, // nsCSSValueList* (never null) This is a regression from https://hg.mozilla.org/mozilla-central/rev/ae6c685d5a37 .
Blocks: 569719
Flags: needinfo?(dbaron)
Assignee | ||
Comment 15•10 years ago
|
||
I think all 4 callers converted from |Equal| methods in that patch are probably wrong.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → dbaron
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 16•10 years ago
|
||
Assignee | ||
Comment 17•10 years ago
|
||
Does this patch help? (I haven't actually tested it beyond compiling it, since right now the network connection between my laptop and the machine on which I compile is rather slow.)
Flags: needinfo?(Kaorukun)
Reporter | ||
Comment 18•10 years ago
|
||
Thank you! I could not successfully apply the patch to the v33.0 source. I guess you're working on a newer code version, and 33.0 is missing a few surrounding lines, so I reapplied the changes manually. It compiled and worked well, no more segfaults on Wikipedia search page. I'll attach the resulting patch for the v33.0 source, just for reference.
Flags: needinfo?(Kaorukun)
Reporter | ||
Comment 19•10 years ago
|
||
This is a copy of previously submitted patch from David Baron, only converted to be applicable to v33.0 source. I was able to successfully compile and test Firefox v33.0 with it. It eliminates the reported segfault issue.
Reporter | ||
Comment 20•10 years ago
|
||
This is a copy of previously submitted patch from David Baron, only converted to be applicable to v33.0 source. I was able to successfully compile and test Firefox v33.0 with it. It eliminates the reported segfault issue. I'm terribly sorry, I submitted wrong file by mistake previously
Attachment #8507502 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8507471 -
Flags: review?(bzbarsky)
Comment 21•10 years ago
|
||
Comment on attachment 8507471 [details] [diff] [review] Convert nsCSSValue{,Pair}List::operator== back to a static Equals method so that it can be validly called on null pointers Maybe add a comment about why the operator== and operator!= are deleted? r=me
Attachment #8507471 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 22•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/983af7fa41f5
Comment 23•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/983af7fa41f5
Status: ASSIGNED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Target Milestone: --- → mozilla36
Comment 24•9 years ago
|
||
Is there a workaround for this? Maybe via blocking something, or an addon? Due to corporate policy I won't be able to upgrade to a newer version of the browser for years, and this seems to happen a couple of times every day on various websites, which is quite annoying. Thanks.
Comment 25•9 years ago
|
||
Saurabh, you have a corporate policy that is forcing you to use a non-default compile of Firefox?
Comment 26•9 years ago
|
||
It's the esr version 31.3.0. Has the same bug.
Comment 27•9 years ago
|
||
Hmm. Maybe we should backport this to ESR, if we're building that with new enough clang...
Flags: needinfo?(dbaron)
Assignee | ||
Comment 28•9 years ago
|
||
Comment on attachment 8507471 [details] [diff] [review] Convert nsCSSValue{,Pair}List::operator== back to a static Equals method so that it can be validly called on null pointers [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixes optimization bug with newer compilers that ome uer seem to be hitting. User impact if declined: crashes on builds compiled with new clang versions (not sure if we do that or if somebody else is compiling them) Fix Landed on Version: 36 Risk to taking this patch (and alternatives if risky): low risk String or UUID changes made by this patch: none See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(dbaron)
Attachment #8507471 -
Flags: approval-mozilla-esr31?
Updated•9 years ago
|
status-firefox-esr31:
--- → affected
tracking-firefox-esr31:
--- → 36+
Updated•9 years ago
|
Attachment #8507471 -
Flags: approval-mozilla-esr31? → approval-mozilla-esr31+
Assignee | ||
Comment 29•9 years ago
|
||
https://hg.mozilla.org/releases/mozilla-esr31/rev/db219b061ce2
Assignee | ||
Updated•9 years ago
|
Updated•9 years ago
|
status-firefox36:
--- → fixed
Updated•9 years ago
|
Flags: qe-verify-
Comment 30•9 years ago
|
||
I have now esr version 31.6.0, and this bug is not fixed there. I am not sure from these comments what esr version is expected to have the fix.
Comment 32•9 years ago
|
||
No information in stack trace, there's a bunch of ?? from libxulrunner.so. The error message is: [8618] ###!!! ABORT: Aborting on channel error.: file /builddir/build/BUILD/firefox-31.6.0/mozilla-esr31/ipc/glue/MessageChannel.cpp, line 1533 [8618] ###!!! ABORT: Aborting on channel error.: file /builddir/build/BUILD/firefox-31.6.0/mozilla-esr31/ipc/glue/MessageChannel.cpp, line 1533 Here's a simple url that triggers the bug: https://www.xcdsystem.com/iie2015 (just visiting the site causes firefox to crash with the above message).
Assignee | ||
Comment 33•9 years ago
|
||
Could you try either (a) running the stack trace through tools/rb/fix_linux_stack.py or (b) attaching to the process with gdb before the crash (where 4828 is the PID): $ gdb (gdb) attach 4828 and then getting a stack trace from gdb once the crash happens: (gdb) backtrace
Flags: needinfo?(saurabh)
Comment 34•9 years ago
|
||
As I said the stack trace shows just a bunch of ?? from libxul.so. I downloaded fix_linux_stack.py from here: https://dxr.mozilla.org/mozilla-central/source/tools/rb/fix_linux_stack.py but I have no idea how to use it. It does not have a help flag or anything.
Assignee | ||
Comment 35•9 years ago
|
||
Where are you getting this build? I was under the impression it was a build you had built yourself.
Flags: needinfo?(saurabh)
Comment 36•9 years ago
|
||
From https://ftp.mozilla.org/pub/mozilla.org/firefox/releases/
Flags: needinfo?(saurabh)
Assignee | ||
Comment 37•9 years ago
|
||
Could you go to about:crashes, then, click a link to one or more crash reports for the crash you're seeing, and paste the URLs into this bug?
Flags: needinfo?(saurabh)
Comment 38•9 years ago
|
||
about:crashes gives a "Problem loading page" error and says "The url is not valid and cannot be loaded".
Flags: needinfo?(saurabh)
Assignee | ||
Comment 39•9 years ago
|
||
(In reply to Saurabh from comment #36) > From https://ftp.mozilla.org/pub/mozilla.org/firefox/releases/ If that's where you got the build, then it wasn't compiled with LLVM, and it's not this bug.
You need to log in
before you can comment on or make changes to this bug.
Description
•