Closed Bug 1605895 Opened 5 years ago Closed 3 years ago

heap-use-after-free in [@ mozilla::css::Loader::Sheets::Lookup]

Categories

(Core :: Networking, defect, P2)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
89 Branch
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 + wontfix
firefox76 + wontfix
firefox77 + wontfix
firefox78 + wontfix
firefox79 + wontfix
firefox80 + wontfix
firefox81 - wontfix
firefox82 - wontfix
firefox83 --- wontfix

People

(Reporter: tsmith, Assigned: dragana)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [sec-survey])

Crash Data

Attachments

(9 files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
dveditz
: sec-approval+
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
dveditz
: sec-approval+
Details | Review

This has been seen once by the fuzzers running build m-c 20191222-d2a4d549f995

Unfortunately it is not reproducible at this time. If we are able to collect and reduce a testcase we will attach it.

==9824==ERROR: AddressSanitizer: heap-use-after-free on address 0x11566a9b3a30 at pc 0x7ffc9e78e0dc bp 0x0035fbff9a40 sp 0x0035fbff9a88
READ of size 8 at 0x11566a9b3a30 thread T0
    #0 0x7ffc9e78e0db in mozilla::css::Loader::Sheets::Lookup \src\layout\style\Loader.cpp:530
    #1 0x7ffc9e79ce42 in mozilla::css::Loader::CreateSheet \src\layout\style\Loader.cpp:1127
    #2 0x7ffc9e7a80bd in mozilla::css::Loader::LoadChildSheet \src\layout\style\Loader.cpp:2207
    #3 0x7ffc9e76f5dc in LoadImportSheet \src\layout\style\GeckoBindings.cpp:1680
    #4 0x7ffc9e76f22f in Gecko_LoadStyleSheet \src\layout\style\GeckoBindings.cpp:1720
    #5 0x7ffca3a1a3fb in geckoservo::stylesheet_loader::{{impl}}::request_stylesheet \src\servo\ports\geckolib\stylesheet_loader.rs:61
    #6 0x7ffca3df3036 in cssparser::rules_and_declarations::parse_at_rule<style::stylesheets::rule_parser::TopLevelRuleParser,style_traits::StyleParseErrorKind> \src\third_party\rust\cssparser\src\rules_and_declarations.rs:475
    #7 0x7ffca3a2f222 in geckoservo::glue::Servo_CssRules_InsertRule \src\servo\ports\geckolib\glue.rs:2100
    #8 0x7ffc9e7c8202 in mozilla::ServoCSSRuleList::InsertRule \src\layout\style\ServoCSSRuleList.cpp:180
    #9 0x7ffc9e7e5eb3 in mozilla::StyleSheet::InsertRule \src\layout\style\StyleSheet.cpp:483
    #10 0x7ffc99f295a9 in mozilla::dom::CSSStyleSheet_Binding::insertRule \src\obj-firefox\dom\bindings\CSSStyleSheetBinding.cpp:215
    #11 0x7ffc9b3640c2 in mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> \src\dom\bindings\BindingUtils.cpp:3151
    #12 0xfdcc2bfc39  (<unknown module>)

0x11566a9b3a30 is located 80 bytes inside of 160-byte region [0x11566a9b39e0,0x11566a9b3a80)
freed by thread T0 here:
    #0 0x7ffcaec84ae4 in free Z:\task_1576855953\fetches\llvm-project\llvm\projects\compiler-rt\lib\asan\asan_malloc_win.cc:85
    #1 0x7ffc9e7ac72c in mozilla::css::SheetLoadData::~SheetLoadData \src\layout\style\Loader.cpp:362
    #2 0x7ffc9e78a625 in mozilla::css::SheetLoadData::Release \src\layout\style\Loader.cpp:241
    #3 0x7ffc9e7d74d2 in mozilla::css::StreamLoader::~StreamLoader \src\layout\style\StreamLoader.cpp:22
    #4 0x7ffc9e7d54d5 in mozilla::css::StreamLoader::Release \src\layout\style\StreamLoader.cpp:24
    #5 0x7ffc9541523f in mozilla::net::HttpBaseChannel::~HttpBaseChannel \src\netwerk\protocol\http\HttpBaseChannel.cpp:258
    #6 0x7ffc95456410 in mozilla::net::HttpChannelChild::~HttpChannelChild \src\netwerk\protocol\http\HttpChannelChild.cpp:210
    #7 0x7ffc954de97f in mozilla::net::HttpChannelChild::~HttpChannelChild \src\netwerk\protocol\http\HttpChannelChild.cpp:206
    #8 0x7ffc95457647 in mozilla::net::HttpChannelChild::Release \src\netwerk\protocol\http\HttpChannelChild.cpp:267
    #9 0x7ffc945ddc23 in mozilla::detail::RunnableMethodImpl<nsMemoryReporterManager *,nsresult (nsMemoryReporterManager::*)(),1,mozilla::RunnableKind::Standard>::Run \src\obj-firefox\dist\include\nsThreadUtils.h:1217
    #10 0x7ffc947b8228 in nsThread::ProcessNextEvent \src\xpcom\threads\nsThread.cpp:1241
    #11 0x7ffc947c4858 in NS_ProcessNextEvent \src\xpcom\threads\nsThreadUtils.cpp:486
    #12 0x7ffc95a6468f in mozilla::ipc::MessagePump::Run \src\ipc\glue\MessagePump.cpp:87
    #13 0x7ffc959a617e in MessageLoop::RunHandler \src\ipc\chromium\src\base\message_loop.cc:308
    #14 0x7ffc959a5f15 in MessageLoop::Run \src\ipc\chromium\src\base\message_loop.cc:290
    #15 0x7ffc9e161c3a in nsBaseAppShell::Run \src\widget\nsBaseAppShell.cpp:137
    #16 0x7ffc9e2ff1b8 in nsAppShell::Run \src\widget\windows\nsAppShell.cpp:406
    #17 0x7ffca24c1c08 in XRE_RunAppShell \src\toolkit\xre\nsEmbedFunctions.cpp:946
    #18 0x7ffc959a617e in MessageLoop::RunHandler \src\ipc\chromium\src\base\message_loop.cc:308

previously allocated by thread T0 here:
    #0 0x7ffcaec84bf4 in malloc Z:\task_1576855953\fetches\llvm-project\llvm\projects\compiler-rt\lib\asan\asan_malloc_win.cc:101
    #1 0x7ffcb12016dd in moz_xmalloc \src\memory\mozalloc\mozalloc.cpp:52
    #2 0x7ffc9e7a826d in mozilla::css::Loader::LoadChildSheet \src\layout\style\Loader.cpp:2227
    #3 0x7ffc9e76f5dc in LoadImportSheet \src\layout\style\GeckoBindings.cpp:1680
    #4 0x7ffc9e76f22f in Gecko_LoadStyleSheet \src\layout\style\GeckoBindings.cpp:1720
    #5 0x7ffca3a1a3fb in geckoservo::stylesheet_loader::{{impl}}::request_stylesheet \src\servo\ports\geckolib\stylesheet_loader.rs:61
    #6 0x7ffca3df3036 in cssparser::rules_and_declarations::parse_at_rule<style::stylesheets::rule_parser::TopLevelRuleParser,style_traits::StyleParseErrorKind> \src\third_party\rust\cssparser\src\rules_and_declarations.rs:475
    #7 0x7ffca3a2f222 in geckoservo::glue::Servo_CssRules_InsertRule \src\servo\ports\geckolib\glue.rs:2100
    #8 0x7ffc9e7c8202 in mozilla::ServoCSSRuleList::InsertRule \src\layout\style\ServoCSSRuleList.cpp:180
    #9 0x7ffc9e7e5eb3 in mozilla::StyleSheet::InsertRule \src\layout\style\StyleSheet.cpp:483
    #10 0x7ffc99f295a9 in mozilla::dom::CSSStyleSheet_Binding::insertRule \src\obj-firefox\dom\bindings\CSSStyleSheetBinding.cpp:215
    #11 0x7ffc9b3640c2 in mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy,mozilla::dom::binding_detail::ThrowExceptions> \src\dom\bindings\BindingUtils.cpp:3151
    #12 0xfdcc2bfc39  (<unknown module>)
Flags: needinfo?(emilio)

I'm almost sure this is a necko issue... OnStopRequest is not getting called... Dragana, do you know how that could happen, or who would know?

Flags: needinfo?(emilio) → needinfo?(dd.mozilla)

Maybe if OnStartRequest errors we don't call OnStopRequest or something?

Keywords: sec-high
Priority: -- → P2

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Maybe if OnStartRequest errors we don't call OnStopRequest or something?

We should always call onStop request. If it is not call it is a bug.Being able to reproduce the issue would help.
Emilio do you recognize any details that could narrow down where OnStart is called?

Flags: needinfo?(dd.mozilla) → needinfo?(emilio)

What I mean if you recognize that there was a special OnStartRequest's error code that could trigger this path, for example.

Flags: needinfo?(emilio)

@Emilio: I don't fully parse the implication of comment 5. Is this still actionable on our end, or a Necko issue?

Flags: needinfo?(emilio)
Assignee: nobody → emilio
Status: NEW → ASSIGNED

If I had to guess this is a Necko issue (I've read the style system code involved with those hashmaps and it looks sound).

I've posted a patch to try to get a better idea of what's going on.

Assignee: emilio → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(emilio)
Assignee: nobody → emilio
Keywords: leave-open

Comment on attachment 9126262 [details]
Bug 1605895 - Add some assertions to catch misuse of the stream loader / load datas. r=heycam

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easily, I'd think.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?: probably all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: This is just adding some sanity checking code, doesn't fix the potential bug.

The bug seems like some sort of network race so doesn't seem very exploitable, off hand...

  • How likely is this patch to cause regressions; how much testing does it need?:
Attachment #9126262 - Flags: sec-approval?

Hey tom, not sure what's the best pattern to land this diagnostic patch, halp?

I could land it as part of this patch as I think it doesn't seem to unveil too much of what's going wrong. But I could also just use the fact that this code is going to get changed soon in bug 1520690 to land it in a separate bug with a message like:

Bug XXX - Add stronger assertions around SheetLoadData.

Constructable Stylesheets are going to touch surronding code, and it
is going to be better if we catch potential mistakes in that new code early.

Or something like that.

Flags: needinfo?(tom)

If 'soon' is in the next week; then sure land it under cover; but otherwise given the aspects of this I think it's okay to land it under this bug id.

Flags: needinfo?(tom)
Attachment #9126262 - Flags: sec-approval? → sec-approval+
Group: layout-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla75

Why was this closed?

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Regressions: 1615832

Bug 1615832 has some crash reports of this happening, involving nested sync loads to nsSyncLoadService.

Dragana, the assumption the style system has always done is that if the AsyncOpen call succeeds, then at some point OnStopRequest will be called. It seems this can not hold in some cases. We don't have a repro yet but do the stacks in that report help? Any other info that may be interesting?

Flags: needinfo?(dd.mozilla)

I could repro this trivially with toolkit.legacyUserProfileCustomizations.stylesheets=true and @import "foo.css"; in my userChrome.css.

I'll take a look ASAP.

Flags: needinfo?(emilio)

This is a bug in the diagnostic patch, I wasn't accounting for sync loads.

Flags: needinfo?(emilio)
Flags: needinfo?(dd.mozilla)

https://crash-stats.mozilla.org/report/index/e2009119-0737-4030-b951-52fd60200217 does show the issue though.

Dragana, those are channels that get destroyed (along with their listeners) after AsyncOpen succeeds but before OnStopRequest gets called. Any idea what may be going on there?

Flags: needinfo?(dd.mozilla)

Honza, can you take a look?

Flags: needinfo?(dd.mozilla) → needinfo?(honzab.moz)

The c-rep in comment 18 suggest the child is killed, that is something else than the UAF. But I will be looking into this more.

(In reply to Honza Bambas (:mayhemer) from comment #20)

The c-rep in comment 18 suggest the child is killed, that is something else than the UAF. But I will be looking into this more.

Sure, but that's the assertions I just added kicking in, right?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #21)

(In reply to Honza Bambas (:mayhemer) from comment #20)

The c-rep in comment 18 suggest the child is killed, that is something else than the UAF. But I will be looking into this more.

Sure, but that's the assertions I just added kicking in, right?

Ah, I probably misread the IPC message. I somehow translated the code to be a killing of the PARENT process that releases the child channel in between states, but this is something else, right.

This will be hard to figure out. I suspect some rare and complicated race and possibly service worker or redirect and/or cancellation involvement. W/o a str this will be hard to figure out. We may have a workaround by fulfilling the stream listener contract in child channel's destructor, except that will hide the actual problem.

I wonder if the child channel is called HttpChannelChild::ActorDestroy and with what value of aWhy. I'll have a patch to extend the assertion to expose the value. We may just tweak the condition in that method and get what we want.

Crash Signature: [@ mozilla::css::Loader::Sheets::Lookup] → [@ mozilla::css::Loader::Sheets::Lookup] [@ mozilla::css::StreamLoader::~StreamLoader ]

Honza, Emilio, is there an update on this? Any chance we can get a safe fix this week for 75?

Flags: needinfo?(emilio)

I don't know the necko internals. Note that the crash with the diagnostic assert won't hit release though... I'd expect an actual crash as a result of this to be way more infrequent.

Flags: needinfo?(emilio)

I'm working on an additional patch to assert with a formatted message (expose some state data) when the channel is destroyed w/o onstop being called.

Let you know that I will soon change references between the channel and the loader. As part of the work on rel=preload, it will likely be exactly the StreamLoader class (: nsIStreamListener) that will hold a strong ref (until OnStopRequest) back to the channel to be able to reprioritize and cancel it. It will fix the crash, because when the channel is referred by its stream listener (a common thing!) we will rather leak than crash - there will be a cycle that only OnStopRequest call will break.

This could also be a quick fix for the crash in the short term.

Flags: needinfo?(honzab.moz)
Status: REOPENED → NEW
Target Milestone: mozilla75 → ---

Comment on attachment 9135391 [details]
Bug 1605895 - Add a diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest before being destroyed, r=michal

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Hard to say, but this narrows down the fact that we are hunting something around http channels lifetime and css loaders lifetime.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?:
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?:
Attachment #9135391 - Flags: sec-approval?

Comment on attachment 9135391 [details]
Bug 1605895 - Add a diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest before being destroyed, r=michal

sec-approval+

Attachment #9135391 - Flags: sec-approval? → sec-approval+

Thanks, this needs to pass try first. I think I forgot about one more case to NOT do this check (failed or never called AsyncOpen).

Flags: needinfo?(honzab.moz)

We have the first crash in our automation: bug 1627946. I will look at that. At the moment not sure if it's not just a red herring related to shutdown and needs even more care.

And one live case: bug 1628262.

Regressions: 1628262
Regressions: 1627946

What's the status on this? Looks like the volume has died down.

Honza: Feels like we should re-assign this to you and move to Core:Networking?

Flags: needinfo?(honzab.moz)

(In reply to Sean Voisen (:svoisen) from comment #37)

What's the status on this? Looks like the volume has died down.

The volume has reduced because I disabled the assert in beta. https://bugzilla.mozilla.org/show_bug.cgi?id=1615832#c28

Michal, sorry I didn't react to your review comments earlier (missed the email), can you please look at the patch again? No major changes. Thanks.

(In reply to Sean Voisen (:svoisen) from comment #37)

What's the status on this? Looks like the volume has died down.

Honza: Feels like we should re-assign this to you and move to Core:Networking?

Possibly, because this is a Necko issue. Note that the diagnostic assertions make this crash go away (we crash in a controlled way sooner).

Flags: needinfo?(honzab.moz)
Assignee: emilio → nobody
Component: CSS Parsing and Computation → Networking

setting NI based on comment #39

Flags: needinfo?(michal.novotny)
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

Beta/Release Uplift Approval Request

  • User impact if declined: because of unintentional review delays the simple assertion slipped to beta, we want to update it too to get more diagnostic info from that channel too.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): zero
  • String changes made/needed:
Attachment #9142471 - Flags: approval-mozilla-beta?

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

Approved for our next beta.

Attachment #9142471 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Backed out for build bustages at netwerk/protocol/http/HttpChannelChild.cpp and dist/include/mozilla/Assertions.h

https://hg.mozilla.org/integration/autoland/rev/ddeedc2eaec11363fd0c11d55a8dab2dff029c9a

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=d1bc7c0ea7b374382983d6ab6fbf194ea05cd5e4
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=301907185&repo=autoland

netwerk/protocol/http/HttpChannelChild.cpp:229:9: error: format specifies type 'unsigned int' but the argument has type 'nsTArray_base::size_type' (aka 'unsigned long') [-Werror,-Wformat]

Flags: needinfo?(honzab.moz)

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

This got backed out on autoland, retracting the uplift approval until we get an updated patch.

Attachment #9142471 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Attachment #9142471 - Flags: approval-mozilla-beta-

./mach try is broken on current m-c, so I risked landing directly. wish these checks were on windows local builds by default!

Flags: needinfo?(honzab.moz)
Flags: needinfo?(michal.novotny)

(In reply to Honza Bambas (:mayhemer) from comment #45)

./mach try is broken on current m-c, so I risked landing directly. wish these checks were on windows local builds by default!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1f130412f9a30841d0e528cac238812b84c4e769

Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Target Milestone: mozilla78 → ---

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

Beta/Release Uplift Approval Request

see Comment 41, this has now successfully landed on nightly, Ok to land on Beta too.

Attachment #9142471 - Flags: approval-mozilla-beta?

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

Approved for 77 beta 7, thanks.

Attachment #9142471 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9142471 [details]
Bug 1605895 - Advance the diagnostic assertion when an http channel on the child process loading CSS does not notify OnStopRequest, r=michal

Clearing approval flag to keep this off the sheriff's uplift queries since it's now landed on beta.

Attachment #9142471 - Flags: approval-mozilla-beta+

Did we get any closer on this with the latest diagnostic patch?

Flags: needinfo?(honzab.moz)

(In reply to Julien Cristau [:jcristau] from comment #52)

Did we get any closer on this with the latest diagnostic patch?

It's a bit narrowed down, nothing more. I've updated the assertion for more info to rule out other possible cause.

I was also looking at few first crashes and this could somehow be related to this change of how we handle events in http channel child. It landed shortly before first few crashes appeared. I'm only referring it here, I'm really not certain there is a connection - that change is not functionally changing anything.

Flags: needinfo?(honzab.moz)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #55)

Advance the diagnostic assertion to check status of channel event queues, r=michal,necko-reviewers
https://hg.mozilla.org/integration/autoland/rev/8f0ceddcef8514d62cebe5a71819bf8bb8c7c242
https://hg.mozilla.org/mozilla-central/rev/8f0ceddcef85

According the reports mEventQ->IsEmpty() is true and mBgChild is null. Another theory is gone...

~HttpChannelChild, mOnStopRequestCalled=false, mStatus=0x804b0002, mActorDestroyReason=1, 20200618 flags=3328

Looks like we never call CleanupBackgroundChannel(). The only possibility is that we are missing HandleAsyncAbort() for some situation. This will be hard to find. I'll try one patch with a possible theory fix and let it test in the field.

Status: REOPENED → ASSIGNED

Comment on attachment 9160966 [details]
Bug 1605895, r=kershaw

This patch is an attempt to actually fix this bug, no longer just a diagnostic assertion with data. It's based only on a theory but needs field testing, hence asking for landing this.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: The crash is so far from this code that it's very hard to figure this out. We already narrowed this down by adding some code specifically to the css loader with the diagnostic patches, those were also approved.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: easy to have them
  • How likely is this patch to cause regressions; how much testing does it need?: I will push to try (anonymously) and report back. This is a small change, but could have unforeseen consequences.
Attachment #9160966 - Flags: sec-approval?

Comment on attachment 9160966 [details]
Bug 1605895, r=kershaw

approval+

Attachment #9160966 - Flags: sec-approval? → sec-approval+

Looks like we're still seeing some crashes in Nightly builds with the patch from comment 64 applied.

Flags: needinfo?(honzab.moz)

So, there are two categories:
flags=3328
flags=3072

3328:
true: mCancelled (with 0x804b0002), mEventQ->IsEmpty(), nullBgChild,
mBackgroundChildQueueFinalState == BCKCHILD_UNKNOWN (HttpBackgroundChannelChild::OnChannelClosed never called)

3072:
true: mEventQ->IsEmpty(), nullBgChild
but NOT cancelled!
mBackgroundChildQueueFinalState == BCKCHILD_UNKNOWN (HttpBackgroundChannelChild::OnChannelClosed never called)

So, there are two code paths we fail to deliver...

See Also: → 1649560

What's the status for Fx79?

Flags: needinfo?(honzab.moz)

(In reply to Nhi Nguyen (:nhi) from comment #69)

What's the status for Fx79?

For now still wontfix. The last theory didn't prove. I fixed a different, more clear cause in Bug 1651661. There is then bug 1649728 that may have an effect on this one too that should get to 79, but that is not clear.

I'm still investigating the unknown and more frequent cause of the missing stream listener notifications as part of this bug.

Flags: needinfo?(honzab.moz)

New flags from the last assertion:
2297088 =

  • mCanceled
  • mEventQ->IsEmpty()
  • nullBgChild
  • mRemoteChannelExistedAtCancel
  • mEverHadBgChildAtAsyncOpen
  • mCanSendAtCancel
  • mBackgroundChildQueueFinalState == BCKCHILD_UNKNOWN => HttpBackgroundChannelChild::OnChannelClosed() not called; it can't happen that the member would be nullified before that call.

This all again points only at one possible case I tried to fix with D82037... But, there is one point we can get stuck at - when the channel is suspended at the time of cancellation and not released before all references to are freed:
https://searchfox.org/mozilla-central/rev/1b95a0179507a4dc7d4b0c94c2df420dc1a72885/netwerk/protocol/http/HttpBaseChannel.h#959-968

Holding a strong ref to self when establishing mCallOnResume would probably help. If confirmed, this is a bug in channel consumers, not the channel itself. But we need to be resilient to this.

Advancing the assertion even further to confirm.

20200717 flags=2297088 => the theory from comment 72 has been disproved. I'm officially out of ideas how this can happen :(

Comment on attachment 9164791 [details]
Bug 1605895, r=kershaw

This is an actual fix for this crash (kinda hacky, but safe, clear and effective).

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: As before, it's very far from the actual crash this is causing, so really not easily.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: This is easy to do and the change is not risky.
  • How likely is this patch to cause regressions; how much testing does it need?: Hard to say, I believe this is safe enough to not cause any problems.
    We can only do field testing, I was never able to reproduce the crash.
Attachment #9164791 - Flags: sec-approval?

Comment on attachment 9164791 [details]
Bug 1605895, r=kershaw

sec-approval+

Attachment #9164791 - Flags: sec-approval? → sec-approval+

Looks like we had just one crash since the last patch, bp-2cb5f530-feb9-4a7f-b8f4-276870200725.

(In reply to Julien Cristau [:jcristau] from comment #80)

Looks like we had just one crash since the last patch, bp-2cb5f530-feb9-4a7f-b8f4-276870200725.

OK, then there is the same case of a crash happening on the parent process (missed that before among all the crash reports). The fix landed in comment 79 is only for the content process crashes.

I'll look into the parent process crash next week.

Assignee: honzab.moz → nobody
Status: ASSIGNED → NEW

This needs to be retriaged by Necko. Per comment 81, what is likely required here is porting the fix from comment 79 to the parent code, so it is fixed there as well. Dragana, can you reassign this? Thanks!

Flags: needinfo?(dd.mozilla)

I will take a look.

Flags: needinfo?(dd.mozilla)
Depends on: 1703790

Comment 81 suggest to do similar thing as on the child process. That would mean to artificially prolong the lifetime of an object after ref-count goes to 0. That is in my opinion more risky. This crash is rare.
I have spent some time looking into this, but I think the right think to do is to make nsHttpChannel more maintainable (bug 1703790).
This will be done in H1 2021.

Meanwhile, I will submit a patch that removes some unnecessary code from nsHttpChannel and also add explicit check and ensure to set mStatus to a failure code if a channel is canceled. if this is ever set to non-failure code this crash may happen. I do not expect that the patch will fix this problem. (Bug 1703793)

Assignee: nobody → nhnt11

We will observe the rate now that bug 1703793 rode into release.

This looks like bug 1703793 fixed the issue. Lets wait for 1-2 weeks before closing this.

Assignee: nhnt11 → nobody

This bug should be fixed. Let's close this.

Status: NEW → RESOLVED
Closed: 4 years ago3 years ago
Keywords: leave-open
Resolution: --- → FIXED
Assignee: nobody → honzab.moz
Target Milestone: --- → 89 Branch

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(honzab.moz)
Whiteboard: [sec-survey]
Assignee: honzab.moz → dd.mozilla
Flags: needinfo?(honzab.moz) → needinfo?(dd.mozilla)

The survey: done.

Flags: needinfo?(dd.mozilla)
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: