Closed
Bug 1331209
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
VERIFIED
FIXED
mozilla58
People
(Reporter: Usul, Assigned: luke)
References
Details
(4 keywords, Whiteboard: [adv-main58+][adv-esr52.6+][post-critsmash-triage])
Crash Data
Attachments
(7 files, 2 obsolete files)
7.42 KB,
patch
|
Details | Diff | Splinter Review | |
1.33 KB,
patch
|
asuth
:
review+
Sylvestre
:
approval-mozilla-beta-
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
845 bytes,
patch
|
asuth
:
review+
Sylvestre
:
approval-mozilla-beta-
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
11.46 KB,
patch
|
asuth
:
review+
Sylvestre
:
approval-mozilla-beta-
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
8.92 KB,
patch
|
janv
:
review+
Sylvestre
:
approval-mozilla-beta-
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
5.87 KB,
patch
|
bbouvier
:
review+
|
Details | Diff | Splinter Review |
5.90 KB,
patch
|
ritu
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-df744d0d-68c3-420d-afeb-7a42a2170114. ============================================================= 0 XUL mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame(mozilla::ipc::MessageChannel&, mozilla::ipc::Direction, IPC::Message const*) obj-firefox/i386/dist/include/mozilla/ipc/MessageChannel.h:341 1 XUL mozilla::ipc::MessageChannel::Send(IPC::Message*) ipc/glue/MessageChannel.cpp:228 2 XUL mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead(mozilla::dom::asmjscache::Metadata const&) obj-firefox/i386/ipc/ipdl/PAsmJSCacheEntryParent.cpp:66 3 XUL mozilla::dom::asmjscache::(anonymous namespace)::ParentRunnable::Run() dom/asmjscache/AsmJSCache.cpp:1003 4 XUL nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1076 5 XUL NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:290 6 XUL mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:384 7 XUL MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:232 8 XUL nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:469 9 libnss3.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:216 Ø 10 libsystem_pthread.dylib libsystem_pthread.dylib@0x15fa Ø 11 libsystem_pthread.dylib libsystem_pthread.dylib@0x1484 Ø 12 libsystem_pthread.dylib libsystem_pthread.dylib@0x6cf1 On mac OS 10.9.5 the STR are as follow : 1) log on facebook 2) Writte on your wall 3) press Enter to validate your update Crash.
Updated•7 years ago
|
Component: General → IPC
Reporter | ||
Comment 1•7 years ago
|
||
FWIW I've switched my wife to nightly and doesn't crash anymore with those strs.
Assignee | ||
Comment 2•7 years ago
|
||
Bill: by any chance would you know, as a matter of general IPC design, if this assert is necessary? https://hg.mozilla.org/releases/mozilla-release/annotate/90f18f9c15f7/ipc/glue/MessageChannel.cpp#l250 The runnable in question (ParentRunable in dom/asmjscache/AsmJSCache.cpp) is dispatch to the main thread, QuotaManager IO thread, and the background thread. I don't know which of those violates the assumption that we're on a "worker thread".
Flags: needinfo?(wmccloskey)
An IPC channel is single-threaded (sort of like a JSRuntime used to be). Whatever thread you Open it on is the thread that it must be used on forever after. The assertion enforces that invariant. In this case it looks like a message is being sent on the wrong thread. The channel used here is PBackground. The parent side of any PBackground channel must run on the PBackground thread in the chrome process.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 4•7 years ago
|
||
Thanks! Since the message channel must be called from the PBackground thread, does that mean that in the crash we must be calling from some non-PBackground thread? Can you tell from the callstack in comment 0 what type of thread this is?
There might be a way to figure out the thread name by opening the crash in Visual Studio, but I don't know for sure. The stack is pretty generic, unfortunately.
Assignee | ||
Comment 6•7 years ago
|
||
Ok, thanks. And to make sure I understand your above comment, you can tell from the crash stack that this MessageChannel *needs* to be called from the PBackground thread? Re-reading the file, it seems everything is asserted to be on the right thread throughout. The only way I can see this failing is if all the asserts are actually failing, it's just that they're not release asserts. So one idea is to simply convert them to release asserts and see if that shows some interesting new crash signatures.
(In reply to Luke Wagner [:luke] from comment #6) > Ok, thanks. And to make sure I understand your above comment, you can tell > from the crash stack that this MessageChannel *needs* to be called from the > PBackground thread? Not from the crash, no. In general, all PBackground communication is from (random thread in random process) to (PBackground thread in main process). Since the actor is a Parent actor, it should be on the (PBackground thread in main process) side. In theory, it would be possible to create a PBackground channel that goes to some other random thread. But doing so would be pretty nutty. > Re-reading the file, it seems everything is asserted to be on the right > thread throughout. The only way I can see this failing is if all the > asserts are actually failing, it's just that they're not release asserts. > So one idea is to simply convert them to release asserts and see if that > shows some interesting new crash signatures. Yeah, it's kinda weird that this doesn't show up in automation. It's usually the sort of thing that always succeeds or always fails. I guess another possibility here is that the IPC channel has already been destroyed, and so the assertion is comparing the current thread ID to some bad memory. That sort of thing happens occasionally. Either way, moving the assertion up and out of the channel code would confirm.
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #7) > Yeah, it's kinda weird that this doesn't show up in automation. It's usually > the sort of thing that always succeeds or always fails. Yeah, that's my impression too. > I guess another possibility here is that the IPC channel has already been destroyed, and so > the assertion is comparing the current thread ID to some bad memory. Ooh yeah, that seems quite possible given that this is a low-volume crash sort of thing. Maybe I can repro locally putting in some sleep()s. Q: in general what's the pattern to detect channel shutdown and avoid these problems? (Once upon a time I reasoned through all this with the original dom/asmjscache impl, but when it was rewritten to use PBackground everything got way simpler but also I'm not familiar with the invariants.)
Assignee | ||
Comment 9•7 years ago
|
||
Hah, yeah, I put a sleep before the dispatch to the failing state and we reliably assert if I close the tab while it's sleeping. So I think this is just a plain race: ParentRunnable::ActorDestroy is called on the PBackground thread which calls Fail() which transitions to the eFinished state at the same time that the sleeping thread is on the QM's IO thread (in the eReadyToReadMetadata state). So when the sleep() wakes up, the ParentRunnable is dispatched to the PBackground thread despite being in the eFinished state. (This is a different kind of assert/failure than we're seeing here but I assume it's all the same underlying race condition causing both.) I think the fix is just to give ParentRunnable a lock (taken by both ParentRunnable::Run() and ActorDestroy (and probably Recv__delete__). I'll try that.
Assignee | ||
Comment 10•7 years ago
|
||
A fix, manually tested by sticking sleep(5)s at various points and closing the tab. I also upgraded the thread asserts so should the crash persist we have more clues.
Assignee: nobody → luke
Attachment #8890604 -
Flags: review?(jvarga)
Updated•7 years ago
|
Priority: -- → P2
Comment 11•7 years ago
|
||
Just came across this in triage - 56 and 57 are still affected. Any luck here (maybe for 57?)
status-firefox55:
--- → wontfix
status-firefox56:
--- → wontfix
status-firefox57:
--- → affected
status-firefox58:
--- → affected
Flags: needinfo?(luke)
Keywords: regression
Assignee | ||
Comment 12•7 years ago
|
||
I expect this is low-enough volume that it's not worth risk of taking for 57, but let me know if you think otherwise. Jan: polite review ping.
Flags: needinfo?(luke) → needinfo?(jvarga)
Comment 13•7 years ago
|
||
Oh, I see. We have this problem also in IDB, but it's solved w/o a mutex. Basically there is |bool mActorDestroyed| but also |Atomic<bool> mOperationMayProceed| Then we have two methods: bool IsActorDestroyed() const { AssertIsOnOwningThread(); return mActorDestroyed; } // May be called on any thread, but you should call IsActorDestroyed() if // you know you're on the background thread because it is slightly faster. bool OperationMayProceed() const { return mOperationMayProceed; } Each method that is invoked from Run() for given state, calls IsActorDestroyed() or OperationMayProceed() to bail out early if the actor was destroyed in the mean time (by calling Fail() or FailOnNonOwningThread(). I think ActorDestroy() could just set the bool variables and doesn't call Close() or Fail() directly.
Flags: needinfo?(jvarga)
Comment 14•7 years ago
|
||
Comment 15•7 years ago
|
||
Comment on attachment 8913102 [details] [diff] [review] wip patch Review of attachment 8913102 [details] [diff] [review]: ----------------------------------------------------------------- Something like this, note it's not complete, it's just an example.
Comment 16•7 years ago
|
||
Comment on attachment 8890604 [details] [diff] [review] fix-race Review of attachment 8890604 [details] [diff] [review]: ----------------------------------------------------------------- I would try to avoid a mutex if possible. If Run() acquired the lock and is doing I/O, then if we get ActorDestroy() on the PBackground thread trying to acquire the same lock, it means that PBackground is basically blocked by I/O. There are some exceptions when we can tolerate I/O on the PBackground thread or synchronous blocking of the thread, but I think we can avoid that in this case.
Attachment #8890604 -
Flags: review?(jvarga)
Comment 17•7 years ago
|
||
Comment on attachment 8913102 [details] [diff] [review] wip patch Review of attachment 8913102 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/asmjscache/AsmJSCache.cpp @@ +841,5 @@ > > case eWaitingToFinishInit: { > AssertIsOnOwningThread(); > > + if (!OperationMayProceed() || QuotaManager::IsShuttingDown()) { err, I meant IsActorDestroyed() here
Assignee | ||
Comment 18•7 years ago
|
||
Sounds like you have a pretty specific idea in mind; care to write it up?
Assignee | ||
Comment 19•7 years ago
|
||
Alternatively, I don't think this path is super hot (at all) so I think the overhead of a mutex would be insigificant, so we could just review the patch as is. This code will probably live for <2 years, so might be good to economize effort spent here. WDYT?
Flags: needinfo?(jvarga)
Assignee | ||
Comment 20•7 years ago
|
||
Oh hah, I didn't even notice you had a WIP. Thanks!
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(jvarga)
Comment 21•7 years ago
|
||
Ok, I will see if I have cycles for this, if not, we can take the mutex.
Comment 23•7 years ago
|
||
this crash signature is spiking up on release/beta in the past days.
Updated•7 years ago
|
Crash Signature: [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead] → [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead]
[@ mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCach…
Comment 27•7 years ago
|
||
Luke - where do we stand on landing a fix?
Comment 28•7 years ago
|
||
I'll resolve this today.
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(luke)
Updated•7 years ago
|
Crash Signature: mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead ] → mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead ]
[@ mozilla::ipc::IProtocol::OtherPid ]
Comment 29•7 years ago
|
||
Ok, I now have cycles for this, I'm going to finish the WIP patch.
Flags: needinfo?(jvarga)
Comment 30•7 years ago
|
||
[Tracking Requested - why for this release]: Carrying over the nomination from the duped bug - Bug 1411721#c1.
tracking-firefox57:
--- → ?
Comment 31•7 years ago
|
||
Luke, I have patches ready. You mentioned in comment 8 and comment 9 that you added some sleep(s) to reliably reproduce the race. Would you have time to test it with my patches ?
Flags: needinfo?(luke)
Assignee | ||
Comment 32•7 years ago
|
||
I'm unfortunately in a standards meeting all day and out-of-office tomorrow. Also, I don't have the particular patch I used to test, but I think the details in comment 9 tell you where to put it. As for a testing workload, you could use mega.nz (to repro crash before w/o patches and validate fix).
Flags: needinfo?(luke)
Comment 33•7 years ago
|
||
Ok, thanks.
Comment 34•7 years ago
|
||
Ok, I added some sleeps and manually verified the fix. It's now on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc39c2f234fd0a1729463565b512ed330e46e3b2
Comment 35•7 years ago
|
||
Andrew, do you want to review this ? Patches are not yet attached, but you can check them on try, see previous comment. If not, I guess Luke will take a look on Monday.
Flags: needinfo?(bugmail)
Comment 36•7 years ago
|
||
Fairly high crash volume on beta 57 (and release 56 ) Is this something you are hoping to land for 57 release? The next build will be the RC build next Monday.
Comment 37•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #36) > Fairly high crash volume on beta 57 (and release 56 ) Is this something you > are hoping to land for 57 release? The next build will be the RC build next > Monday. Yes, I'll try to land it before RC.
Comment 38•7 years ago
|
||
Attachment #8913102 -
Attachment is obsolete: true
Attachment #8925009 -
Flags: review?(bugmail)
Comment 39•7 years ago
|
||
Attachment #8925010 -
Flags: review?(bugmail)
Comment 40•7 years ago
|
||
Attachment #8925011 -
Flags: review?(bugmail)
Updated•7 years ago
|
Attachment #8925011 -
Attachment is patch: true
Comment 41•7 years ago
|
||
Attachment #8925012 -
Flags: review?(bugmail)
Comment 42•7 years ago
|
||
Comment on attachment 8925009 [details] [diff] [review] Part 1: Remove unused state Review of attachment 8925009 [details] [diff] [review]: ----------------------------------------------------------------- Restating: This is missed cleanup from Bug 1408333 which caused the state machine to skip over eBackgroundChildPending but didn't remove the enum state.
Attachment #8925009 -
Flags: review?(bugmail) → review+
Comment 43•7 years ago
|
||
Comment on attachment 8925010 [details] [diff] [review] Part 2: Fix incorrect FailOnNonOwningThread() calls Review of attachment 8925010 [details] [diff] [review]: ----------------------------------------------------------------- Restating: The FailOnNonOwningThread() calls were the wrong variant and should have been Fail() calls, consistent with the AssertIsOnOwningThread() above.
Attachment #8925010 -
Flags: review?(bugmail) → review+
Comment 44•7 years ago
|
||
Comment on attachment 8925011 [details] [diff] [review] Part 3: Allow sending of the __delete__ message in one direction only Review of attachment 8925011 [details] [diff] [review]: ----------------------------------------------------------------- Restating: This patch cleans up the destruction logic. Before the patch: - The PAsmJsCacheEntry protocol allows for deletion by either the parent or the child. - The child asks the PBackground parent to create a PAsmJSCacheEntry actor in ChildRunnable::Run when in state eInitial. - ParentRunnable::Fail triggers deletion in the event of failure to open and !mDeleteReceived (as set by Recv__delete__). - ParentRunnable maintains an mDeleteReceived boolean independent of mActorDestroyed. - ChildRunnable::RecvOnOpenMetadataForRead triggers deletion when !FindHashMatch - ChildRunnable::Run triggers deletion when in state eClosing (as set by Close()). After the patch: - The PAsmJsCacheEntry protocol requires the parent to delete the child. - PAsmsJsCacheEntry gains a new child-to-parent Close() method. - PAsmsJsCacheEntry::SelectCacheFileToRead gains an overload via union type OpenMetadataForReadResponse which is used to signal error when an AsmsJSCacheResult is passed. - The AsmsJSCacheResult type is only used to signal errors, as MOZ_ASSERTed in RecvSelectCacheFileToRead. - Still: The child asks the PBackground parent to create a PAsmJSCacheEntry actor in ChildRunnable::Run when in state eInitial. - ParentRunnable::Close triggers deletion if !mActorDestroyed (set by ActorDestroy) and sets state to eFinished. - ParentRunnable::Fail triggers deletion if !mActorDestroyed (set by ActorDestroy) and sets state to eFinished. - Control flow parity in the child: - ChildRunnable::RecvOnOpenMetadataForRead now sends an error overload to trigger Fail() in the parent to trigger protocol deletion. - ChildRunnable::Run's eClosing state now invokes SendClose() if !mActorDestroyed to trigger protocol deletion.
Attachment #8925011 -
Flags: review?(bugmail) → review+
Comment 45•7 years ago
|
||
Bug 1411721 also seems to be a duplicate.
Updated•7 years ago
|
status-firefox-esr52:
--- → ?
Comment 47•7 years ago
|
||
I believe ESR is affected too.
Comment 49•7 years ago
|
||
Comment on attachment 8925012 [details] [diff] [review] Part 4: Fix the actual race between ActorDestroy() on the PBackground thread and Run() on the I/O thread Review of attachment 8925012 [details] [diff] [review]: ----------------------------------------------------------------- Excellent block comment in ActorDestroy, thank you! Restating: ParentRunnable now understands two types of cancellation: - Actor destruction, as Atomic<>ally tracked by mOperationMayProceed and exposed via OperationMayProceed(). - By definition, set on the background/owning thread in ActorDestroy. - Really just for use by the main and IO threads, as it's just an atomic version of mActorDestroyed exposed by IsActorDestroyed(). - Never accessed as part of any greater synchronization block, so effectively limited to being an optimization. - QuotaManager shutdown, as indicated by its Client's IsShuttingDown* methods introduced in bug 1389561 part 5. - Set on the background/owning thread. - On the background threads, (non-mic) sInstance is safe and so is its (non-atomic) mShutdownRequested. Falling back to QM::IsShuttingDown. - On non-Background threads defers to QuotaManager::IsShuttingDown which is thread-safe thanks to QM's Atomic gShutdown. For the ActorDestroy path, the general idea is that either the state machine has already completed operation (it's in eOpened/mOpened==true) and Close() is appropriate, or the state-machine is still processing and it's up to the state machine to notice !OperationMayProceed/IsActorDestroyed() and divert to Fail() either directly (on owning thread) or indirectly via FailOnNonOwningThread() (on non-owning threads). To this end, every state handler in ParentRunnable::Run other than eWaitingToOpenDirectory (I added a line comment) and eFailing (used by FailOnNonOwningThread()) has one of these stanzas. This idiom explicitly corrects the "eSendingMetadataForRead" case which we're seeing so many of on crash-stats. There's also a dispatch guard added and an additional shutdown check. ::: dom/asmjscache/AsmJSCache.cpp @@ +996,5 @@ > > case eWaitingToFinishInit: { > AssertIsOnOwningThread(); > > + if (NS_WARN_IF(Client::IsShuttingDownOnBackgroundThread()) || Should this be added to eWaitingToOpenDirectory too? There was no pre-existing QuotaManager::IsShuttingDown() check to transform in this manner, which makes sense because the QM was explicitly just created if we're in that state. However, it's conceivable the ActorDestroy transition did/could occur in such a case.
Attachment #8925012 -
Flags: review?(bugmail) → review+
Comment 50•7 years ago
|
||
Awesome, thanks for such a quick review and description of the changes. I'm going to request a sec-approval.
Flags: needinfo?(bugmail)
Comment 51•7 years ago
|
||
Attachment #8925012 -
Attachment is obsolete: true
Attachment #8925228 -
Flags: review+
Comment 52•7 years ago
|
||
(In reply to Andrew Sutherland [:asuth] from comment #49) > Should this be added to eWaitingToOpenDirectory too? There was no > pre-existing QuotaManager::IsShuttingDown() check to transform in this > manner, which makes sense because the QM was explicitly just created if > we're in that state. However, it's conceivable the ActorDestroy transition > did/could occur in such a case. Yeah, I added it.
Updated•7 years ago
|
Attachment #8925228 -
Attachment description: Part 4: Fix the actual race between ActorDestroy() on the PBackground thread and Run() on the I/O thread → Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed
Attachment #8925228 -
Attachment filename: asmjs-race → s9
Comment 53•7 years ago
|
||
Comment on attachment 8925228 [details] [diff] [review] Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed [Security approval request comment] How easily could an exploit be constructed based on the patch? It would require quite thorough understanding of the code. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Check-in comments, no. Comments in the code, slightly. Which older supported branches are affected by this flaw? All supported branches, including ESR52. If not all supported branches, which bug introduced the flaw? Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? There are no backport patches yet, but this code hasn't been under heavy development, so there shouldn't be big conflicts, so not very high risk. How likely is this patch to cause regressions; how much testing does it need? The asmjscache module contains mochitests and these were run locally and on the try server, everything looks good. The risk for causing regressions is not very high.
Attachment #8925228 -
Flags: sec-approval?
Comment 54•7 years ago
|
||
Comment on attachment 8925011 [details] [diff] [review] Part 3: Allow sending of the __delete__ message in one direction only [Security approval request comment] See the sec-approval for Part 4
Attachment #8925011 -
Flags: sec-approval?
Comment 55•7 years ago
|
||
Comment on attachment 8925010 [details] [diff] [review] Part 2: Fix incorrect FailOnNonOwningThread() calls [Security approval request comment] See the sec-approval for Part 4
Attachment #8925010 -
Flags: sec-approval?
Comment 56•7 years ago
|
||
Comment on attachment 8925009 [details] [diff] [review] Part 1: Remove unused state [Security approval request comment] See the sec-approval for Part 4
Attachment #8925009 -
Flags: sec-approval?
Updated•7 years ago
|
Updated•7 years ago
|
Attachment #8925009 -
Flags: sec-approval? → sec-approval+
Updated•7 years ago
|
Attachment #8925010 -
Flags: sec-approval? → sec-approval+
Updated•7 years ago
|
Attachment #8925011 -
Flags: sec-approval? → sec-approval+
Comment 57•7 years ago
|
||
Comment on attachment 8925228 [details] [diff] [review] Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed sec-approval=dveditz
Attachment #8925228 -
Flags: sec-approval? → sec-approval+
Updated•7 years ago
|
Group: core-security → dom-core-security
Comment 58•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/501bc06827b656d258a73b6a1f6124065981cc9d https://hg.mozilla.org/integration/mozilla-inbound/rev/14f4abe67ddc10285cf9ca50fcc1b3487441dfcf https://hg.mozilla.org/integration/mozilla-inbound/rev/63f2e1ab0aca1e229bbd1ed2c844548707b5700d https://hg.mozilla.org/integration/mozilla-inbound/rev/e5169de977fa9970354a782febaaf1649f842999
Comment 59•7 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/501bc06827b6 https://hg.mozilla.org/mozilla-central/rev/14f4abe67ddc https://hg.mozilla.org/mozilla-central/rev/63f2e1ab0aca https://hg.mozilla.org/mozilla-central/rev/e5169de977fa
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Comment 60•7 years ago
|
||
Comment on attachment 8925228 [details] [diff] [review] Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed Approval Request Comment [Feature/Bug causing the regression]: The issue existed for long time, it just got more frequent with e10s I think. [User impact if declined]: High volume of crashes. [Is this code covered by automated tests?]: Yes, there are mochitests. [Has the fix been verified in Nightly?]: It landed on m-c just 12 hours ago, we'll see in a day or two if number of crashes decreases. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Slightly risky. [Why is the change risky/not risky?]: The fix is not trivial, but everything was double checked and heavily tested locally and on the try server. [String changes made/needed]: None
Attachment #8925228 -
Flags: approval-mozilla-beta?
Comment 61•7 years ago
|
||
Comment on attachment 8925011 [details] [diff] [review] Part 3: Allow sending of the __delete__ message in one direction only Approval Request Comment See approval-mozilla-beta for Part 4
Attachment #8925011 -
Flags: approval-mozilla-beta?
Comment 62•7 years ago
|
||
Comment on attachment 8925010 [details] [diff] [review] Part 2: Fix incorrect FailOnNonOwningThread() calls Approval Request Comment See approval-mozilla-beta for Part 4
Attachment #8925010 -
Flags: approval-mozilla-beta?
Comment 63•7 years ago
|
||
Comment on attachment 8925009 [details] [diff] [review] Part 1: Remove unused state Approval Request Comment See approval-mozilla-beta for Part 4
Attachment #8925009 -
Flags: approval-mozilla-beta?
Comment 64•7 years ago
|
||
Comment on attachment 8925228 [details] [diff] [review] Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed This is too late for 57. This will have to ride the train with 58.
Attachment #8925228 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Attachment #8925011 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Attachment #8925010 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Attachment #8925009 -
Flags: approval-mozilla-beta? → approval-mozilla-beta-
Updated•7 years ago
|
Comment 65•6 years ago
|
||
This is the #3 UAF in crash volume in 57/58 currently (close behind the top 2)... can we consider it for a ride-along? (And the fix will then get bake time on 58)
Flags: needinfo?(sledru)
Flags: needinfo?(lhenry)
Comment 66•6 years ago
|
||
Redirecting to Ritu as she is the release owner.
Flags: needinfo?(sledru)
Flags: needinfo?(rkothari)
Flags: needinfo?(lhenry)
Flags: needinfo?(rkothari)
We decided not to uplift this into 57 RC. Depending on how this impacts us post 57 launch, we will consider taking this in a 57.x dot release after weighing in on the risks.
Updated•6 years ago
|
Group: dom-core-security → core-security-release
Comment 68•6 years ago
|
||
(In reply to Ritu Kothari (:ritu) from comment #67) > We decided not to uplift this into 57 RC. Depending on how this impacts us > post 57 launch, we will consider taking this in a 57.x dot release after > weighing in on the risks. this is currently in the range of 500 daily reports on release & accounting for 1.85% of browser crashes on 57.0
Hi Luke, Jim, given that crash volume on this one, we'd like to consider uplifting this to 57.0.1. Could you please comment on the risk vs. reward here? If you also think it's a good idea, please nominate for uplift to moz-release. Thanks!
Flags: needinfo?(luke)
Flags: needinfo?(jmathies)
Assignee | ||
Comment 70•6 years ago
|
||
Jan wrote this patch and would be in a better position to comment.
Flags: needinfo?(luke) → needinfo?(jvarga)
Comment 71•6 years ago
|
||
Patches here depend on bug 1389561, so that one would have to be uplifted too. Anyway, I think it's a good idea, to take them all.
Flags: needinfo?(jvarga)
Thanks Jan. I think the size of this uplift is really worrisome. Looking at the changes in this bug and bug 1389561, I am leaning towards wontfix'ing it for 57. Sylvestre, FYI.
Flags: needinfo?(sledru)
Updated•6 years ago
|
Flags: needinfo?(jmathies)
Assignee | ||
Comment 75•6 years ago
|
||
If the crash is bad enough, another option we have for 57.0.1 is to simply disable the asm.js cache since it's not a critical optimization. That's a one-line code change so very low risk.
Comment 76•6 years ago
|
||
The crash volume on 57 is high enough that a low-risk mitigation like disabling the asm.js cache sounds worth considering. Luke, can you please write that patch and put it up for release approval?
Assignee | ||
Comment 77•6 years ago
|
||
Approval Request Comment [Feature/Bug causing the regression]: bug landed a long time ago, seems exacerbated by recent shift to e10s [User impact if declined]: crashes [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: no [Why is the change risky/not risky?]: trivial patch, just turns off a feature
Flags: needinfo?(luke)
Attachment #8934239 -
Flags: review?(bbouvier)
Attachment #8934239 -
Flags: approval-mozilla-release?
Comment 78•6 years ago
|
||
Comment on attachment 8934239 [details] [diff] [review] disable-asmjscache Review of attachment 8934239 [details] [diff] [review]: ----------------------------------------------------------------- Aouch.
Attachment #8934239 -
Flags: review?(bbouvier) → review+
Comment 79•6 years ago
|
||
The mitigation patch seems to have fallen through the cracks after being put up for m-r approval. Are we making another 57 release? Should this ride-along?
Flags: needinfo?(ryanvm)
Flags: needinfo?(lhenry)
Comment 80•6 years ago
|
||
It did not really fall through any cracks - I was aware. The problem was other chemspill situations for both 57.0.3 and 57.0.4 during the holidays, so we could not risk taking other fixes. Ritu, over to you to either uplift or wontfix for 57. We could uplift it in case we end up with another dot release in the next week. Since the first merge is this Thursday and we release 58 in 2 weeks it seems unlikely (but not impossible).
Flags: needinfo?(lhenry) → needinfo?(rkothari)
The likelihood of another dot release is slim. Hi Luke, Dan, Al, since this fix landed in 58, should we uplift the fix to ESR52.6? It was set to wontfix for some reason but I'd like to see it fixed in ESR52 if the risk is low. Thanks!
Flags: needinfo?(rkothari)
Flags: needinfo?(luke)
Flags: needinfo?(dveditz)
Flags: needinfo?(abillings)
Assignee | ||
Comment 82•6 years ago
|
||
(Forwarding ni? to patch author.)
Flags: needinfo?(luke) → needinfo?(jvarga)
Assignee | ||
Comment 83•6 years ago
|
||
If landing the 58 patch on ESR is hard/risky, we could also land my disable-asmjscache patch (comment 78) on ESR.
Updated•6 years ago
|
Flags: needinfo?(ryanvm)
Comment 84•6 years ago
|
||
I'd like it on ESR if possible but not in a dangerous sort of way.
Flags: needinfo?(abillings)
Updated•6 years ago
|
Whiteboard: [adv-main58+]
Updated•6 years ago
|
Component: IPC → JavaScript Engine
Comment 85•6 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #83) > If landing the 58 patch on ESR is hard/risky, we could also land my > disable-asmjscache patch (comment 78) on ESR. Given that AsmJSCache on ESR52 still uses QM's isApp flag and CacheMiss message, I think we should rather take Luke's patch for disabling the feature. I'm not saying it's not possible to update these patches for ESR52, but it's not trivial, since the state machine differs a bit, so another round of testing would be required. I'm busy with other stuff right now, but if there's someone who is able/willing to update and test original patches then I will be ok with that.
Flags: needinfo?(jvarga)
Comment 86•6 years ago
|
||
Comment on attachment 8934239 [details] [diff] [review] disable-asmjscache As m-r is 58 now, remove approval‑mozilla‑release flag.
Attachment #8934239 -
Flags: approval-mozilla-release?
Comment 87•6 years ago
|
||
This is topcrash #80 for ESR so it seems worthwhile to take the "disable asmjscache" patch there. The patch itself contains a log message like "disable cache for 57". Would we have to fix that? Does anything else in that patch need merging to the ESR branch?
tracking-firefox-esr52:
--- → 58+
Flags: needinfo?(dveditz) → needinfo?(luke)
Assignee | ||
Comment 88•6 years ago
|
||
Nothing else needed. Here's an updated patch that displays "disabled in ESR 52".
Flags: needinfo?(luke)
Assignee | ||
Comment 89•6 years ago
|
||
Comment on attachment 8943414 [details] [diff] [review] disable-asmjscache (ESR52) [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: see comment 87 User impact if declined: see comment 87 Fix Landed on Version: would have landed in 57, but didn't Risk to taking this patch (and alternatives if risky): low, turns off a feature
Attachment #8943414 -
Flags: approval-mozilla-esr52?
Hi Dan, I've already initiated ESER52.6 release build. Do you believe this is critical enough to start a build2? Please let me know.
Flags: needinfo?(dveditz)
Comment 91•6 years ago
|
||
The risk of an exploit is relatively low, but we will get asked why this didn't land by the enterprise folks.
Flags: needinfo?(dveditz)
Updated•6 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main58+] → [adv-main58+][post-critsmash-triage]
Comment on attachment 8943414 [details] [diff] [review] disable-asmjscache (ESR52) Low risk, sec-high, turns off a feature, ESR52+
Attachment #8943414 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 93•6 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/8368a9a379e3
Comment 94•6 years ago
|
||
uplift |
And a follow-up Werror fix. https://hg.mozilla.org/releases/mozilla-esr52/rev/a7c8e85285e2 As a QA note, Luke said we can verify this fix on ESR52 by doing the following: * Visit http://kripken.github.io/ammo.js/examples/webgl_demo/ammo.html and confirm the "Disabled in ESR" message shows up in the console (it's a JS warning) * Reload that page twice just to be sure
Flags: needinfo?(andrei.vaida)
Comment 95•6 years ago
|
||
Changing the qe-verify flag, based on Ryan's comment.
Flags: qe-verify- → qe-verify+
Updated•6 years ago
|
Flags: needinfo?(andrei.vaida)
Comment 96•6 years ago
|
||
This issue is verified fixed using 52.6.0 esr (BuildId:20180118122319) on Windows 10 64bit, macOS 10.12.6 and Ubuntu 16.04 64bit. The "Disabled in ESR" message shows up in the console.
Updated•6 years ago
|
Whiteboard: [adv-main58+][post-critsmash-triage] → [adv-main58+][adv-esr52.6+][post-critsmash-triage]
Updated•6 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•