Closed Bug 1331209 Opened 7 years ago Closed 7 years ago

Crash in mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead

Categories

(Core :: JavaScript Engine, defect, P2)

48 Branch
x86
macOS

Tracking


VERIFIED FIXED
mozilla58
Tracking Status
firefox-esr52 58+ verified
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 - wontfix
firefox58 --- fixed

People

(Reporter: Usul, Assigned: luke)

References

Details

(4 keywords, Whiteboard: [adv-main58+][adv-esr52.6+][post-critsmash-triage])

Crash Data

Attachments

(7 files, 2 obsolete files)

7.42 KB, patch
Details | Diff | Splinter Review
1.33 KB, patch
asuth
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
845 bytes, patch
asuth
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
11.46 KB, patch
asuth
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
8.92 KB, patch
janv
: review+
dveditz
: sec-approval+
Details | Diff | Splinter Review
5.87 KB, patch
bbouvier
: review+
Details | Diff | Splinter Review
5.90 KB, patch
Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-df744d0d-68c3-420d-afeb-7a42a2170114.
=============================================================
0 	XUL 	mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame(mozilla::ipc::MessageChannel&, mozilla::ipc::Direction, IPC::Message const*) 	obj-firefox/i386/dist/include/mozilla/ipc/MessageChannel.h:341
1 	XUL 	mozilla::ipc::MessageChannel::Send(IPC::Message*) 	ipc/glue/MessageChannel.cpp:228
2 	XUL 	mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead(mozilla::dom::asmjscache::Metadata const&) 	obj-firefox/i386/ipc/ipdl/PAsmJSCacheEntryParent.cpp:66
3 	XUL 	mozilla::dom::asmjscache::(anonymous namespace)::ParentRunnable::Run() 	dom/asmjscache/AsmJSCache.cpp:1003
4 	XUL 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1076
5 	XUL 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:290
6 	XUL 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:384
7 	XUL 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:232
8 	XUL 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:469
9 	libnss3.dylib 	_pt_root 	nsprpub/pr/src/pthreads/ptthread.c:216
Ø 10 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x15fa 	
Ø 11 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x1484 	
Ø 12 	libsystem_pthread.dylib 	libsystem_pthread.dylib@0x6cf1


On macOS 10.9.5 the STR are as follows:
1) Log in to Facebook
2) Write on your wall
3) Press Enter to submit your update

Crash.
Component: General → IPC
FWIW I've switched my wife to Nightly and she doesn't crash anymore with those STR.
Bill: by any chance would you know, as a matter of general IPC design, if this assert is necessary?
  https://hg.mozilla.org/releases/mozilla-release/annotate/90f18f9c15f7/ipc/glue/MessageChannel.cpp#l250
The runnable in question (ParentRunnable in dom/asmjscache/AsmJSCache.cpp) is dispatched to the main thread, the QuotaManager IO thread, and the background thread.  I don't know which of those violates the assumption that we're on a "worker thread".
Flags: needinfo?(wmccloskey)
An IPC channel is single-threaded (sort of like a JSRuntime used to be). Whatever thread you Open it on is the thread that it must be used on forever after. The assertion enforces that invariant. In this case it looks like a message is being sent on the wrong thread.

The channel used here is PBackground. The parent side of any PBackground channel must run on the PBackground thread in the chrome process.
Flags: needinfo?(wmccloskey)
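The invariant billm describes can be sketched as follows. This is a hypothetical toy model, not the real MessageChannel API: a channel records the thread that opened it and asserts that every later Send() happens on that same thread.

```cpp
#include <cassert>
#include <thread>

// ToyChannel is illustrative only: it models the single-threaded-channel
// invariant, where whatever thread calls Open() is the only thread allowed
// to call Send() afterwards.
class ToyChannel {
public:
  void Open() {
    mOwningThread = std::this_thread::get_id();
    mOpened = true;
  }
  bool Send() {
    // Analogous to the assertion this bug crashes on: sending from any
    // thread other than the opening thread is a bug.
    assert(mOpened && std::this_thread::get_id() == mOwningThread);
    return true;
  }
private:
  std::thread::id mOwningThread;
  bool mOpened = false;
};

bool SendOnOwningThread() {
  ToyChannel chan;
  chan.Open();
  return chan.Send();  // same thread that called Open(), so the assert holds
}
```

In this model, calling chan.Send() from a second std::thread would trip the assertion, which is the failure mode seen in the crash stack.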
Thanks!  Since the message channel must be called from the PBackground thread, does that mean that in the crash we must be calling from some non-PBackground thread?  Can you tell from the callstack in comment 0 what type of thread this is?
There might be a way to figure out the thread name by opening the crash in Visual Studio, but I don't know for sure. The stack is pretty generic, unfortunately.
Ok, thanks.  And to make sure I understand your above comment, you can tell from the crash stack that this MessageChannel *needs* to be called from the PBackground thread?

Re-reading the file, it seems everything is asserted to be on the right thread throughout.  The only way I can see this failing is if all the asserts are actually failing, it's just that they're not release asserts.  So one idea is to simply convert them to release asserts and see if that shows some interesting new crash signatures.
(In reply to Luke Wagner [:luke] from comment #6)
> Ok, thanks.  And to make sure I understand your above comment, you can tell
> from the crash stack that this MessageChannel *needs* to be called from the
> PBackground thread?

Not from the crash, no. In general, all PBackground communication is from (random thread in random process) to (PBackground thread in main process). Since the actor is a Parent actor, it should be on the (PBackground thread in main process) side.

In theory, it would be possible to create a PBackground channel that goes to some other random thread. But doing so would be pretty nutty.

> Re-reading the file, it seems everything is asserted to be on the right
> thread throughout.  The only way I can see this failing is if all the
> asserts are actually failing, it's just that they're not release asserts. 
> So one idea is to simply convert them to release asserts and see if that
> shows some interesting new crash signatures.

Yeah, it's kinda weird that this doesn't show up in automation. It's usually the sort of thing that always succeeds or always fails. I guess another possibility here is that the IPC channel has already been destroyed, and so the assertion is comparing the current thread ID to some bad memory. That sort of thing happens occasionally. Either way, moving the assertion up and out of the channel code would confirm.
(In reply to Bill McCloskey (:billm) from comment #7)
> Yeah, it's kinda weird that this doesn't show up in automation. It's usually
> the sort of thing that always succeeds or always fails. 

Yeah, that's my impression too.

> I guess another possibility here is that the IPC channel has already been destroyed, and so
> the assertion is comparing the current thread ID to some bad memory.

Ooh yeah, that seems quite possible given that this is a low-volume crash sort of thing.  Maybe I can repro locally putting in some sleep()s.

Q: in general, what's the pattern to detect channel shutdown and avoid these problems?  (Once upon a time I reasoned through all this with the original dom/asmjscache impl, but when it was rewritten to use PBackground everything got way simpler, and I'm no longer familiar with the invariants.)
Hah, yeah, I put a sleep before the dispatch to the failing state and we reliably assert if I close the tab while it's sleeping.

So I think this is just a plain race: ParentRunnable::ActorDestroy is called on the PBackground thread which calls Fail() which transitions to the eFinished state at the same time that the sleeping thread is on the QM's IO thread (in the eReadyToReadMetadata state).  So when the sleep() wakes up, the ParentRunnable is dispatched to the PBackground thread despite being in the eFinished state.  (This is a different kind of assert/failure than we're seeing here but I assume it's all the same underlying race condition causing both.)

I think the fix is just to give ParentRunnable a lock, taken by ParentRunnable::Run(), ActorDestroy(), and probably Recv__delete__().  I'll try that.
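The mutex idea above can be sketched like this. It is a hypothetical simplification, not the patch as landed: both ActorDestroy() (on the PBackground thread) and Run() (on whichever thread the runnable is dispatched to) take the same lock before touching the state, so a transition to eFinished can't race with a dispatch.

```cpp
#include <mutex>

// Names loosely mirror dom/asmjscache/AsmJSCache.cpp but ToyState and
// ToyParentRunnable are simplified stand-ins for illustration.
enum ToyState { eReadyToReadMetadata, eFinished };

class ToyParentRunnable {
public:
  // Called from ActorDestroy() on the PBackground thread.
  void Fail() {
    std::lock_guard<std::mutex> guard(mMutex);
    mState = eFinished;
  }
  // Called at the top of Run(): bail out instead of dispatching or sending
  // when the actor has already transitioned to eFinished.
  bool MayProceed() {
    std::lock_guard<std::mutex> guard(mMutex);
    return mState != eFinished;
  }
private:
  std::mutex mMutex;
  ToyState mState = eReadyToReadMetadata;
};
```

With the lock, the sleep()-induced interleaving described above would see MayProceed() return false and skip the dispatch rather than sending on a dead actor.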
Attached patch fix-race — Splinter Review
A fix, manually tested by sticking sleep(5)s at various points and closing the tab.  I also upgraded the thread asserts so that, should the crash persist, we have more clues.
Assignee: nobody → luke
Attachment #8890604 - Flags: review?(jvarga)
Priority: -- → P2
Just came across this in triage - 56 and 57 are still affected. Any luck here (maybe for 57?)
Flags: needinfo?(luke)
Keywords: regression
I expect this is low-enough volume that it's not worth the risk of taking for 57, but let me know if you think otherwise.  Jan: polite review ping.
Flags: needinfo?(luke) → needinfo?(jvarga)
Oh, I see. We have this problem also in IDB, but it's solved w/o a mutex.

Basically there is |bool mActorDestroyed| but also |Atomic<bool> mOperationMayProceed|.
Then we have two methods:

bool
IsActorDestroyed() const
{
  AssertIsOnOwningThread();

  return mActorDestroyed;
}

// May be called on any thread, but you should call IsActorDestroyed() if
// you know you're on the background thread because it is slightly faster.
bool
OperationMayProceed() const
{
  return mOperationMayProceed;
}

Each method that is invoked from Run() for a given state calls IsActorDestroyed() or OperationMayProceed() to bail out early if the actor was destroyed in the meantime (by calling Fail() or FailOnNonOwningThread()).

I think ActorDestroy() could just set the bool variables and not call Close() or Fail() directly.
Flags: needinfo?(jvarga)
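As a self-contained restatement of the IDB pattern above, with std::atomic standing in for mozilla::Atomic; ToyActor is illustrative, not Gecko code:

```cpp
#include <atomic>

class ToyActor {
public:
  // Runs on the owning (PBackground) thread only.
  void ActorDestroy() {
    mActorDestroyed = true;        // plain bool, owning thread only
    mOperationMayProceed = false;  // atomic, readable from any thread
  }

  // Owning-thread check; cheaper because no atomic load is needed.
  bool IsActorDestroyed() const { return mActorDestroyed; }

  // May be called on any thread; use IsActorDestroyed() when you know
  // you're on the owning thread because it is slightly faster.
  bool OperationMayProceed() const { return mOperationMayProceed; }

private:
  bool mActorDestroyed = false;
  std::atomic<bool> mOperationMayProceed{true};
};
```

The appeal over a mutex is that readers on the main or IO thread never block: they just observe the atomic flag and bail out.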
Attached patch wip patch (obsolete) — Splinter Review
Comment on attachment 8913102 [details] [diff] [review]
wip patch

Review of attachment 8913102 [details] [diff] [review]:
-----------------------------------------------------------------

Something like this; note it's not complete, it's just an example.
Comment on attachment 8890604 [details] [diff] [review]
fix-race

Review of attachment 8890604 [details] [diff] [review]:
-----------------------------------------------------------------

I would try to avoid a mutex if possible. If Run() acquired the lock and is doing I/O, then if we get ActorDestroy() on the PBackground thread trying to acquire the same lock, it means that PBackground is basically blocked by I/O. There are some exceptions when we can tolerate I/O on the PBackground thread or synchronous blocking of the thread, but I think we can avoid that in this case.
Attachment #8890604 - Flags: review?(jvarga)
Comment on attachment 8913102 [details] [diff] [review]
wip patch

Review of attachment 8913102 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/asmjscache/AsmJSCache.cpp
@@ +841,5 @@
>  
>      case eWaitingToFinishInit: {
>        AssertIsOnOwningThread();
>  
> +      if (!OperationMayProceed() || QuotaManager::IsShuttingDown()) {

err, I meant IsActorDestroyed() here
Sounds like you have a pretty specific idea in mind; care to write it up?
Alternatively, I don't think this path is hot (at all), so the overhead of a mutex would be insignificant and we could just review the patch as is.  This code will probably live for <2 years, so it might be good to economize effort spent here.  WDYT?
Flags: needinfo?(jvarga)
Oh hah, I didn't even notice you had a WIP.  Thanks!
Flags: needinfo?(jvarga)
Ok, I will see if I have cycles for this, if not, we can take the mutex.
This crash signature has been spiking on release/beta in the past few days.
Crash Signature: [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead] → [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead] [@ mozilla::ipc::MessageChannel::Send | mozilla::dom::asmjscache::PAsmJSCach…
Which route do we go here?
Flags: needinfo?(jvarga)
Luke - where do we stand on landing a fix?
Group: core-security
Flags: needinfo?(luke)
I'll resolve this today.
Flags: needinfo?(luke)
Crash Signature: mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead ] → mozilla::dom::asmjscache::PAsmJSCacheEntryParent::SendOnOpenMetadataForRead ] [@ mozilla::ipc::IProtocol::OtherPid ]
Ok, I now have cycles for this, I'm going to finish the WIP patch.
Flags: needinfo?(jvarga)
[Tracking Requested - why for this release]: Carrying over the nomination from the duped bug - Bug 1411721#c1.
Luke, I have patches ready.
You mentioned in comment 8 and comment 9 that you added some sleep(s) to reliably reproduce the race.
Would you have time to test it with my patches?
Flags: needinfo?(luke)
I'm unfortunately in a standards meeting all day and out of office tomorrow.  Also, I don't have the particular patch I used to test, but I think the details in comment 9 tell you where to put it.  As for a testing workload, you could use mega.nz (to repro the crash without the patches and then validate the fix).
Flags: needinfo?(luke)
Ok, thanks.
Ok, I added some sleeps and manually verified the fix.

It's now on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc39c2f234fd0a1729463565b512ed330e46e3b2
Andrew, do you want to review this?
Patches are not yet attached, but you can check them on try, see previous comment.
If not, I guess Luke will take a look on Monday.
Flags: needinfo?(bugmail)
Fairly high crash volume on beta 57 (and release 56). Is this something you are hoping to land for the 57 release?  The next build will be the RC build next Monday.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #36)
> Fairly high crash volume on beta 57 (and release 56). Is this something you
> are hoping to land for the 57 release?  The next build will be the RC build
> next Monday.

Yes, I'll try to land it before RC.
Attachment #8913102 - Attachment is obsolete: true
Attachment #8925009 - Flags: review?(bugmail)
Attachment #8925011 - Attachment is patch: true
Comment on attachment 8925009 [details] [diff] [review]
Part 1: Remove unused state

Review of attachment 8925009 [details] [diff] [review]:
-----------------------------------------------------------------

Restating: This is missed cleanup from Bug 1408333 which caused the state machine to skip over eBackgroundChildPending but didn't remove the enum state.
Attachment #8925009 - Flags: review?(bugmail) → review+
Comment on attachment 8925010 [details] [diff] [review]
Part 2: Fix incorrect FailOnNonOwningThread() calls

Review of attachment 8925010 [details] [diff] [review]:
-----------------------------------------------------------------

Restating: The FailOnNonOwningThread() calls were the wrong variant and should have been Fail() calls, consistent with the AssertIsOnOwningThread() above.
Attachment #8925010 - Flags: review?(bugmail) → review+
Comment on attachment 8925011 [details] [diff] [review]
Part 3: Allow sending of the __delete__ message in one direction only

Review of attachment 8925011 [details] [diff] [review]:
-----------------------------------------------------------------

Restating: This patch cleans up the destruction logic.

Before the patch:
- The PAsmJSCacheEntry protocol allows for deletion by either the parent or the child.
- The child asks the PBackground parent to create a PAsmJSCacheEntry actor in ChildRunnable::Run when in state eInitial.
- ParentRunnable::Fail triggers deletion in the event of failure to open and !mDeleteReceived (as set by Recv__delete__).
- ParentRunnable maintains an mDeleteReceived boolean independent of mActorDestroyed.
- ChildRunnable::RecvOnOpenMetadataForRead triggers deletion when !FindHashMatch
- ChildRunnable::Run triggers deletion when in state eClosing (as set by Close()).

After the patch:
- The PAsmJSCacheEntry protocol requires the parent to delete the child.
- PAsmJSCacheEntry gains a new child-to-parent Close() method.
- PAsmJSCacheEntry::SelectCacheFileToRead gains an overload via the union type OpenMetadataForReadResponse, which is used to signal an error when an AsmJSCacheResult is passed.
  - The AsmJSCacheResult type is only used to signal errors, as MOZ_ASSERTed in RecvSelectCacheFileToRead.
- Still: The child asks the PBackground parent to create a PAsmJSCacheEntry actor in ChildRunnable::Run when in state eInitial.
- ParentRunnable::Close triggers deletion if !mActorDestroyed (set by ActorDestroy) and sets state to eFinished.
- ParentRunnable::Fail triggers deletion if !mActorDestroyed (set by ActorDestroy) and sets state to eFinished.
- Control flow parity in the child:
  - ChildRunnable::RecvOnOpenMetadataForRead now sends an error overload to trigger Fail() in the parent to trigger protocol deletion.
  - ChildRunnable::Run's eClosing state now invokes SendClose() if !mActorDestroyed to trigger protocol deletion.
Attachment #8925011 - Flags: review?(bugmail) → review+
Bug 1411721 also seems to be a duplicate.
Is ESR affected?
Flags: needinfo?(luke)
I believe ESR is affected too.
Agreed.
Flags: needinfo?(luke)
Comment on attachment 8925012 [details] [diff] [review]
Part 4: Fix the actual race between ActorDestroy() on the PBackground thread and Run() on the I/O thread

Review of attachment 8925012 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent block comment in ActorDestroy, thank you!

Restating:

ParentRunnable now understands two types of cancellation:
- Actor destruction, as Atomic<>ally tracked by mOperationMayProceed and exposed via OperationMayProceed().
  - By definition, set on the background/owning thread in ActorDestroy.
  - Really just for use by the main and IO threads, as it's just an atomic version of mActorDestroyed exposed by IsActorDestroyed().
  - Never accessed as part of any greater synchronization block, so effectively limited to being an optimization.
- QuotaManager shutdown, as indicated by its Client's IsShuttingDown* methods introduced in bug 1389561 part 5.
  - Set on the background/owning thread.
  - On the background threads, the (non-atomic) sInstance is safe and so is its (non-atomic) mShutdownRequested, falling back to QM::IsShuttingDown.
  - On non-background threads, it defers to QuotaManager::IsShuttingDown, which is thread-safe thanks to QM's Atomic gShutdown.

For the ActorDestroy path, the general idea is that either the state machine has already completed operation (it's in eOpened/mOpened==true) and Close() is appropriate, or the state-machine is still processing and it's up to the state machine to notice !OperationMayProceed/IsActorDestroyed() and divert to Fail() either directly (on owning thread) or indirectly via FailOnNonOwningThread() (on non-owning threads).  To this end, every state handler in ParentRunnable::Run other than eWaitingToOpenDirectory (I added a line comment) and eFailing (used by FailOnNonOwningThread()) has one of these stanzas.

This idiom explicitly corrects the "eSendingMetadataForRead" case which we're seeing so many of on crash-stats.

There's also a dispatch guard added and an additional shutdown check.

::: dom/asmjscache/AsmJSCache.cpp
@@ +996,5 @@
>  
>      case eWaitingToFinishInit: {
>        AssertIsOnOwningThread();
>  
> +      if (NS_WARN_IF(Client::IsShuttingDownOnBackgroundThread()) ||

Should this be added to eWaitingToOpenDirectory too?  There was no pre-existing QuotaManager::IsShuttingDown() check to transform in this manner, which makes sense because the QM was explicitly just created if we're in that state.  However, it's conceivable the ActorDestroy transition did/could occur in such a case.
Attachment #8925012 - Flags: review?(bugmail) → review+
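The per-state bail-out stanza described in the review above can be condensed into a sketch. This is hypothetical illustration, not the actual patch: the atomic globals stand in for the real Client/QuotaManager shutdown checks and the actor's OperationMayProceed() flag, and the outcomes stand in for the Fail()/FailOnNonOwningThread() diversions.

```cpp
#include <atomic>

std::atomic<bool> gShuttingDown{false};
std::atomic<bool> gOperationMayProceed{true};

enum ToyOutcome { kProceed, kFail, kFailOnNonOwningThread };

// Stanza for a state handled on the owning (PBackground) thread:
// divert to Fail() directly.
ToyOutcome OwningThreadState() {
  if (gShuttingDown || !gOperationMayProceed) {
    return kFail;
  }
  return kProceed;
}

// Stanza for a state handled on the main or QuotaManager I/O thread:
// divert indirectly, as FailOnNonOwningThread() does, by dispatching the
// failure back to the owning thread.
ToyOutcome NonOwningThreadState() {
  if (gShuttingDown || !gOperationMayProceed) {
    return kFailOnNonOwningThread;
  }
  return kProceed;
}
```

Each state handler in Run() opening with one of these checks is what closes the eSendingMetadataForRead window seen on crash-stats: the state machine notices the destroyed actor before it attempts to Send().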
Awesome, thanks for such a quick review and description of the changes. I'm going to request a sec-approval.
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #49)
> Should this be added to eWaitingToOpenDirectory too?  There was no
> pre-existing QuotaManager::IsShuttingDown() check to transform in this
> manner, which makes sense because the QM was explicitly just created if
> we're in that state.  However, it's conceivable the ActorDestroy transition
> did/could occur in such a case.

Yeah, I added it.
Attachment #8925228 - Attachment description: Part 4: Fix the actual race between ActorDestroy() on the PBackground thread and Run() on the I/O thread → Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed
Attachment #8925228 - Attachment filename: asmjs-race → s9
Comment on attachment 8925228 [details] [diff] [review]
Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
It would require quite thorough understanding of the code.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
Check-in comments, no.
Comments in the code, slightly.

Which older supported branches are affected by this flaw?
All supported branches, including ESR52.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
There are no backport patches yet, but this code hasn't been under heavy development, so there shouldn't be big conflicts; the backports should not be very risky.

How likely is this patch to cause regressions; how much testing does it need?
The asmjscache module contains mochitests, and these were run locally and on the try server; everything looks good. The risk of causing regressions is not very high.
Attachment #8925228 - Flags: sec-approval?
Comment on attachment 8925011 [details] [diff] [review]
Part 3: Allow sending of the __delete__ message in one direction only

[Security approval request comment]
See the sec-approval for Part 4
Attachment #8925011 - Flags: sec-approval?
Comment on attachment 8925010 [details] [diff] [review]
Part 2: Fix incorrect FailOnNonOwningThread() calls

[Security approval request comment]
See the sec-approval for Part 4
Attachment #8925010 - Flags: sec-approval?
Comment on attachment 8925009 [details] [diff] [review]
Part 1: Remove unused state

[Security approval request comment]
See the sec-approval for Part 4
Attachment #8925009 - Flags: sec-approval?
Attachment #8925009 - Flags: sec-approval? → sec-approval+
Attachment #8925010 - Flags: sec-approval? → sec-approval+
Attachment #8925011 - Flags: sec-approval? → sec-approval+
Comment on attachment 8925228 [details] [diff] [review]
Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed

sec-approval=dveditz
Attachment #8925228 - Flags: sec-approval? → sec-approval+
Group: core-security → dom-core-security
Comment on attachment 8925228 [details] [diff] [review]
Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed

Approval Request Comment
[Feature/Bug causing the regression]: The issue has existed for a long time; I think it just got more frequent with e10s.
[User impact if declined]: High volume of crashes.
[Is this code covered by automated tests?]: Yes, there are mochitests.
[Has the fix been verified in Nightly?]: It landed on m-c just 12 hours ago; we'll see in a day or two if the number of crashes decreases.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Slightly risky.
[Why is the change risky/not risky?]: The fix is not trivial, but everything was double checked and heavily tested locally and on the try server.
[String changes made/needed]: None
Attachment #8925228 - Flags: approval-mozilla-beta?
Comment on attachment 8925011 [details] [diff] [review]
Part 3: Allow sending of the __delete__ message in one direction only

Approval Request Comment
See approval-mozilla-beta for Part 4
Attachment #8925011 - Flags: approval-mozilla-beta?
Comment on attachment 8925010 [details] [diff] [review]
Part 2: Fix incorrect FailOnNonOwningThread() calls

Approval Request Comment
See approval-mozilla-beta for Part 4
Attachment #8925010 - Flags: approval-mozilla-beta?
Comment on attachment 8925009 [details] [diff] [review]
Part 1: Remove unused state

Approval Request Comment
See approval-mozilla-beta for Part 4
Attachment #8925009 - Flags: approval-mozilla-beta?
Comment on attachment 8925228 [details] [diff] [review]
Part 4: Prevent the state machine from continuing if QM is shutting down or the actor has been destroyed

This is too late for 57. This will have to ride the train with 58.
Attachment #8925228 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8925011 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8925010 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8925009 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
This is currently the #3 UAF by crash volume in 57/58 (close behind the top 2)... can we consider it for a ride-along?  (And the fix would then get bake time on 58.)
Flags: needinfo?(sledru)
Flags: needinfo?(lhenry)
Redirecting to Ritu as she is the release owner.
Flags: needinfo?(sledru)
Flags: needinfo?(rkothari)
Flags: needinfo?(lhenry)
We decided not to uplift this into 57 RC. Depending on how this impacts us post 57 launch, we will consider taking this in a 57.x dot release after weighing in on the risks.
Group: dom-core-security → core-security-release
(In reply to Ritu Kothari (:ritu) from comment #67)
> We decided not to uplift this into 57 RC. Depending on how this impacts us
> post 57 launch, we will consider taking this in a 57.x dot release after
> weighing in on the risks.

This is currently in the range of 500 daily reports on release, accounting for 1.85% of browser crashes on 57.0.
Hi Luke, Jim, given that crash volume on this one, we'd like to consider uplifting this to 57.0.1. Could you please comment on the risk vs. reward here? If you also think it's a good idea, please nominate for uplift to moz-release. Thanks!
Flags: needinfo?(luke)
Flags: needinfo?(jmathies)
Jan wrote this patch and would be in a better position to comment.
Flags: needinfo?(luke) → needinfo?(jvarga)
Patches here depend on bug 1389561, so that one would have to be uplifted too.
Anyway, I think it's a good idea, to take them all.
Flags: needinfo?(jvarga)
Thanks Jan. I think the size of this uplift is really worrisome. Looking at the changes in this bug and bug 1389561, I am leaning towards wontfix'ing it for 57. Sylvestre, FYI.
ack
Flags: needinfo?(sledru)
Depends on: 1389561
Flags: needinfo?(jmathies)
If the crash is bad enough, another option we have for 57.0.1 is to simply disable the asm.js cache since it's not a critical optimization.  That's a one-line code change so very low risk.
The crash volume on 57 is high enough that a low-risk mitigation like disabling the asm.js cache sounds worth considering. Luke, can you please write that patch and put it up for release approval?
Approval Request Comment
[Feature/Bug causing the regression]: bug landed a long time ago, seems exacerbated by recent shift to e10s
[User impact if declined]: crashes
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: no
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: trivial patch, just turns off a feature
Flags: needinfo?(luke)
Attachment #8934239 - Flags: review?(bbouvier)
Attachment #8934239 - Flags: approval-mozilla-release?
Comment on attachment 8934239 [details] [diff] [review]
disable-asmjscache

Review of attachment 8934239 [details] [diff] [review]:
-----------------------------------------------------------------

Ouch.
Attachment #8934239 - Flags: review?(bbouvier) → review+
The mitigation patch seems to have fallen through the cracks after being put up for m-r approval.  Are we making another 57 release?  Should this ride along?
Flags: needinfo?(ryanvm)
Flags: needinfo?(lhenry)
It did not really fall through any cracks - I was aware. The problem was other chemspill situations for both 57.0.3 and 57.0.4 during the holidays, so we could not risk taking other fixes. 

Ritu, over to you to either uplift or wontfix for 57. We could uplift it in case we end up with another dot release in the next week.  Since the first merge is this Thursday and we release 58 in 2 weeks it seems unlikely (but not impossible).
Flags: needinfo?(lhenry) → needinfo?(rkothari)
The likelihood of another dot release is slim. 

Hi Luke, Dan, Al, since this fix landed in 58, should we uplift the fix to ESR52.6? It was set to wontfix for some reason but I'd like to see it fixed in ESR52 if the risk is low. Thanks!
Flags: needinfo?(rkothari)
Flags: needinfo?(luke)
Flags: needinfo?(dveditz)
Flags: needinfo?(abillings)
(Forwarding ni? to patch author.)
Flags: needinfo?(luke) → needinfo?(jvarga)
If landing the 58 patch on ESR is hard/risky, we could also land my disable-asmjscache patch (comment 78) on ESR.
Flags: needinfo?(ryanvm)
I'd like it on ESR if possible but not in a dangerous sort of way.
Flags: needinfo?(abillings)
Whiteboard: [adv-main58+]
Component: IPC → JavaScript Engine
(In reply to Luke Wagner [:luke] from comment #83)
> If landing the 58 patch on ESR is hard/risky, we could also land my
> disable-asmjscache patch (comment 78) on ESR.

Given that AsmJSCache on ESR52 still uses QM's isApp flag and CacheMiss message, I think we should rather take Luke's patch for disabling the feature.
I'm not saying it's not possible to update these patches for ESR52, but it's not trivial, since the state machine differs a bit, so another round of testing would be required.
I'm busy with other stuff right now, but if there's someone who is able/willing to update and test original patches then I will be ok with that.
Flags: needinfo?(jvarga)
Comment on attachment 8934239 [details] [diff] [review]
disable-asmjscache

As m-r is 58 now, removing the approval-mozilla-release flag.
Attachment #8934239 - Flags: approval-mozilla-release?
This is topcrash #80 for ESR so it seems worthwhile to take the "disable asmjscache" patch there. The patch itself contains a log message like "disable cache for 57". Would we have to fix that? Does anything else in that patch need merging to the ESR branch?
Flags: needinfo?(dveditz) → needinfo?(luke)
Nothing else needed.  Here's an updated patch that displays "disabled in ESR 52".
Flags: needinfo?(luke)
Comment on attachment 8943414 [details] [diff] [review]
disable-asmjscache (ESR52)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: see comment 87
User impact if declined: see comment 87
Fix Landed on Version: would have landed in 57, but didn't
Risk to taking this patch (and alternatives if risky): low, turns off a feature
Attachment #8943414 - Flags: approval-mozilla-esr52?
Hi Dan, I've already initiated the ESR52.6 release build. Do you believe this is critical enough to start a build2? Please let me know.
Flags: needinfo?(dveditz)
The risk of an exploit is relatively low, but the enterprise folks will ask why this didn't land.
Flags: needinfo?(dveditz)
Flags: qe-verify-
Whiteboard: [adv-main58+] → [adv-main58+][post-critsmash-triage]
Comment on attachment 8943414 [details] [diff] [review]
disable-asmjscache (ESR52)

Low risk, sec-high, turns off a feature, ESR52+
Attachment #8943414 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
And a follow-up Werror fix.
https://hg.mozilla.org/releases/mozilla-esr52/rev/a7c8e85285e2

As a QA note, Luke said we can verify this fix on ESR52 by doing the following:
* Visit http://kripken.github.io/ammo.js/examples/webgl_demo/ammo.html and confirm the "Disabled in ESR" message shows up in the console (it's a JS warning)
* Reload that page twice just to be sure
Flags: needinfo?(andrei.vaida)
Changing the qe-verify flag, based on Ryan's comment.
Flags: qe-verify- → qe-verify+
Flags: needinfo?(andrei.vaida)
This issue is verified fixed using 52.6.0 esr (BuildId:20180118122319) on Windows 10 64bit, macOS 10.12.6 and Ubuntu 16.04 64bit. The "Disabled in ESR" message shows up in the console.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Whiteboard: [adv-main58+][post-critsmash-triage] → [adv-main58+][adv-esr52.6+][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.