Closed Bug 1663499 Opened 4 years ago Closed 4 years ago

Crash in [@ mozilla::MozPromise<T>::ThenInternal | mozilla::MozPromise<T>::ThenCommand<T>::~ThenCommand]

Categories

(Core :: DOM: Navigation, defect, P1)

All
Windows
defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox80 --- unaffected
firefox81 --- unaffected
firefox82 --- fixed

People

(Reporter: gsvelto, Assigned: Gijs)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/bd541ee5-2352-4b94-8af4-4c0980200905

Top 10 frames of crashing thread:

0 xul.dll mozilla::MozPromise<nsresult, nsresult, 1>::ThenInternal xpcom/threads/MozPromise.h:862
1 xul.dll mozilla::MozPromise<nsresult, nsresult, 1>::ThenCommand<mozilla::MozPromise<nsresult, nsresult, 1>::ThenValue<`lambda at /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:358:11', `lambda at /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:361:11'> >::~ThenCommand xpcom/threads/MozPromise.h:906
2 xul.dll static mozilla::MozPromise<nsresult, nsresult, 1>::All xpcom/threads/MozPromise.h:356
3 xul.dll nsWebBrowserPersist::EndDownload dom/webbrowserpersist/nsWebBrowserPersist.cpp:2304
4 xul.dll mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1240
5 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:512
6 xul.dll mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:83:7'>::Run xpcom/threads/nsThreadUtils.h:577
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:327

This crash appears to be a regression from bug 1658202. Gijs can you have a look?

The raw crash reason is MOZ_DIAGNOSTIC_ASSERT(!IsExclusive || !mHaveRequest) (Using an exclusive promise in a non-exclusive fashion). Note that the only relevant crashes are those in the nightly channel, the other ones are unrelated.

Interestingly the non-nightly crashes (both on Windows 7) seem to indicate a kind of UAF (EXCEPTION_IN_PAGE_ERROR_EXEC and EXCEPTION_ACCESS_VIOLATION_WRITE). So maybe the MOZ_DIAGNOSTIC_ASSERT is just triggered by random memory content pointed to through some dangling ThenValue pointer ?

Anyway, the MOZ_DIAGNOSTIC_ASSERT has been re-enabled here: https://phabricator.services.mozilla.com/D14138. Any thoughts, Jean-Yves?

Group: core-security
Flags: needinfo?(jyavenard)

The non-nightly crashes have all different stacks so they appear to be unrelated. They also have very different crash reasons (EXCEPTION_IN_PAGE_ERROR_EXEC / STATUS_IO_TIMEOUT, EXCEPTION_IN_PAGE_ERROR_READ / STATUS_DEVICE_DATA_ERROR and EXCEPTION_PRIV_INSTRUCTION) and addresses so they're probably coming from machines with flaky hardware.

That assert happens if you happen to run multiple Then() on an exclusive promise.

From the previous comment, it seems to be a race on mFileClosePromises member

I'm not familiar with this code, nor is there any documentations on which thread thread all of this is running.

However, we have nsWebBrowserPersist::OnStopRequest that I'm guessing run on the main thread and https://searchfox.org/mozilla-central/source/dom/webbrowserpersist/nsWebBrowserPersist.cpp#2304-2309

Which can run on the background taskqueue added by bug 1658202.

Worse, the various dispatches are made using NS_DispatchToCurrentThread(; the background taskqueue is running on top of a thread pool with multiple threads.

So NS_DispatchToCurrentThread isn't the method you want to use here as there's no guarantee this is the thread you want. All of this code should be using GetCurrentSerialEventTarget()->Dispatch(...) instead

Flags: needinfo?(jyavenard) → needinfo?(gijskruitbosch+bugs)

(In reply to Gabriele Svelto [:gsvelto] (PTO until September 5th) from comment #2)

The non-nightly crashes have all different stacks so they appear to be unrelated. They also have very different crash reasons (EXCEPTION_IN_PAGE_ERROR_EXEC / STATUS_IO_TIMEOUT, EXCEPTION_IN_PAGE_ERROR_READ / STATUS_DEVICE_DATA_ERROR and EXCEPTION_PRIV_INSTRUCTION) and addresses so they're probably coming from machines with flaky hardware.

Sorry for the noise then. I cannot remove the security flag, it seems.

(In reply to Jean-Yves Avenard [:jya] from comment #3)

That assert happens if you happen to run multiple Then() on an exclusive promise.

From the previous comment, it seems to be a race on mFileClosePromises member

I'm not familiar with this code, nor is there any documentations on which thread thread all of this is running.

I'm confused; I did try to add documentation (next to the methods and data structures that are used away from the main thread) when I moved some of it away from the main thread. Where did you look and not find the documentation?

Specifically:

https://searchfox.org/mozilla-central/rev/76a83d0a218837ba6937d6a0fac51cb0008c2334/dom/webbrowserpersist/nsWebBrowserPersist.cpp#102-112
https://searchfox.org/mozilla-central/rev/76a83d0a218837ba6937d6a0fac51cb0008c2334/dom/webbrowserpersist/nsWebBrowserPersist.cpp#948-955
https://searchfox.org/mozilla-central/rev/76a83d0a218837ba6937d6a0fac51cb0008c2334/dom/webbrowserpersist/nsWebBrowserPersist.h#168-169,176-177

However, we have nsWebBrowserPersist::OnStopRequest that I'm guessing run on the main thread

Correct.

and https://searchfox.org/mozilla-central/source/dom/webbrowserpersist/nsWebBrowserPersist.cpp#2304-2309

Which can run on the background taskqueue added by bug 1658202.

Which part of the highlighted lines, and why/how? All of the callers of EndDownload are on the main thread, so I'd expect all of the highlighted lines to run on the main thread, too, as all the event targets they refer to will be the main thread, too?

Worse, the various dispatches are made using NS_DispatchToCurrentThread(; the background taskqueue is running on top of a thread pool with multiple threads.

I didn't add any of these, and all the callers of NS_DispatchToCurrentThread that I see in that file are supposed to run on the main thread. The dispatches I added either use the queue or NS_DispatchToMainThread...

So NS_DispatchToCurrentThread isn't the method you want to use here as there's no guarantee this is the thread you want. All of this code should be using GetCurrentSerialEventTarget()->Dispatch(...) instead

But this should be fine if the current thread is the main thread, right?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(jyavenard)

Removing sec group per comment #4.

Group: core-security

If I had to guess about what happens, here, it'd be that there's some existing confusion in the persist code that leads to us calling EndDownload multiple times, and that violates the assert because we don't clear the promise array from the Then call, so we try to Then the promises again, which violates the assert in comment #0.

discussed off-line on slack.

IPDL actor runs on the main thread, so there's no issue than=t EndDownload is called from multiple threads.

Adding MOZ_ASSERT to clarify the code would help.

Flags: needinfo?(jyavenard)
Flags: needinfo?(gijskruitbosch+bugs)
Regressed by: 1658202
Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1658202

AFAICT at least one case where we can call EndDownload multiple times is when someone calls Cancel() on the webbrowserpersist object. This sets mCancel to true and calls EndDownload directly.

But inside OnDataAvailable, if mCancel is true, we also call EndDownload. After bug 1658202 this is dispatched to the main thread but it was a problem already.

I can't tell from the stacks in the crash reports whether this is the case triggering these crashes, but we may as well fix it, I guess...

Flags: needinfo?(gijskruitbosch+bugs)

OK, running tests with asserts found me at least one more issue: SerializeNextFile, called when we complete any single file of many which we're persisting, dispatches FinishDownload to main thread, which calls EndDownload with NS_OK. But we may have failed by then (e.g. because something called Cancel(), or because one of the subresources was blocked), and already notified consumers that the download finished...

Component: DOM: Core & HTML → DOM: Navigation
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Severity: -- → S3
Priority: -- → P1

This fixes a few causes of firing EndDownload twice:

  1. firing it from OnDataAvailable if the download was already canceled when the
    method was initially invoked, rather than if we are ourselves wanting to
    cancel the download because we encountered issues inside OnDataAvailable.
  2. firing it from FinishDownload, dispatched from SerializeNextFile, after the
    download has already been ended elsewhere.
  3. calling Cancel() multiple times.

It also adds some code to avoid the re-entrancy on the Promise code that the
bug was originally filed for, and a diagnostic assert to check if there are any
other cases of repeated EndDownload calling that we've missed.

As a driveby, it adds a few thread assertions to help with code clarity.

Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ef3f6fba28f9
fix ending webbrowserpersist downloading so it only happens once, r=valentin
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: