Bugzilla

Assignee

Comment 3

•

7 years ago

(In reply to Kit Cambridge (he/him) [:kitcambridge] from comment #2)
> I don't understand what `RegisterVisitedCallback` does yet, or why moving a
> block of code in  around would cause this crash. Is this because the
> `KeyClass` doesn't live long enough now? Maybe `VisitedQuery::Start` is
> failing now...though, based on the comment and the stacks, we're always
> getting called with a null `aLink` "in IPC builds", which I'm guessing is
> e10s?

RegisterVisitedCallback is the way links on a page register with history to be updated (link coloring) when a page is visited. The content process passes a Link, while the parent process passes null (it doesn't have access to the link).

while Kit's change is the only one explicitly about an history change, there are another couple changes that could be related:
Bug 1408789 - Use VS2017 for official Windows builds
Bug 1385438 - AtomicRefCounted doesn't need to use sequential consistency

I must note that we have another crash bug pointing to History::KeyClass (Bug 1412647) so it's very likely there's a misuse here and it just became more evident. That crash also increased a lot recently, maybe after we enabled Stylo? Additionally bug 1382923 changed the relationship between History and Links and the crashes increased briefly after that landing.

I'm needinfo-ing Nathan, because he worked on the AtomicRefCounted, and he has a deep expertize in refcounted use patterns, I guess at this point any help is welcome and he may have ideas. We may need to completely re evaluate the history links relationship :(
I'll wrap again my head around this code and see if I can figure out something.

Flags: needinfo?(nfroyd)

Assignee

Updated

•

7 years ago

Flags: needinfo?(mak77)

Priority: -- → P1

Whiteboard: [fxsearch]

Assignee

Comment 4

•

7 years ago

taking to have it on my radar. As I said, feel free to evaluate or propose solutions if you see obvious culprits.

Assignee: nobody → mak77

Status: NEW → ASSIGNED

Nathan Froyd [:froydnj]

Comment 5

•

7 years ago

I don't have any super-specific refcounting advice to add; bug 1385438 should have had zero effect on things, other than making them ever-so-slightly faster.  We landed a virtually identical change in bug 1277709 that touched the XPCOM refcounting code, which is much more widely used than RefCounted, and that had no fallout that I know of.

The crash is pretty strange: we're removing an entry from the mObservers table, and we find that the entry (or perhaps memory the entry points to) has been freed.  But we were able to look up the entry without a problem earlier in the function, which suggests that either:

1) We have a race on modifying the mObservers table itself; or
2) Somebody is modifying the data in an mObservers entry out from underneath us.

Glancing through History.cpp, I don't see any evidence that mObservers is modified off the main thread.  Is that correct?  Is it possible that individual entries could be touched off the main thread?

Flags: needinfo?(nfroyd)

Updated

•

7 years ago

Keywords: csectype-uaf, sec-high

Assignee

Comment 6

•

7 years ago

Another insteresting fact is bug 1372433 changed the way we use mObservers, before it was all synchronous, now we dispatch to the document group or to the main thread. Though, we should still be on the same thread, unless the dispatch does something magic with threads.
It seems to go through xpcom's SchedulerGroup, though from there it's unclear to me if things go to the main-thread or not. It looks like documents, not being chrome, may have their own event target?
Nathan do you know anything about this scheduling, or do you know who could help? Could the doc->Dispatch calls end up executing the runnables off the main thread? Then yes, they could remove the entry between the mainthread gets it and removes it and we may need a mutex.

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 7

•

7 years ago

Dispatching to the document still ends up running things on the main thread.  No worries there.

Flags: needinfo?(nfroyd)

Assignee

Comment 8

•

7 years ago

I was looking at aggregates, the only interesting thing is that uptime is, in most cases, very short: less than 20 seconds. So either for these users this hits so often that it crashes at the first page load, or it's related to some startup events.

Assignee

Comment 9

•

7 years ago

I spent various hours checkin threads, hashtable usage and keys usage, I didn't find anything evident that may be causing this :(
The only things I found is:
1. nsURIHashTable has a public non-virtual destructor, that could be a footgun if someone deletes a derived entry from the base class. Though, it doesn't look like our case off-hand.
2. Looks like sometimes we could mutate an nsIURI (thus bug 677281), and then our code would fail. It's unclear if we could hit one of these cases.

I'm not sure how to proceed, apart from adding workarounds like converting the getEntry debug check just before the RemoveEntry call to an actual if, and maybe try to make nsURIHashTable destructor virtual.

Assignee

Comment 10

•

7 years ago

(In reply to Nathan Froyd [:froydnj] from comment #5)
> 1) We have a race on modifying the mObservers table itself; or

Provided everything seems to run on the main-thread, the only thing that could modify mObservers between the put and the remove calls is VisitedQuery::Start. But that method always uses NS_DispatchToMainThread(), thus unless NS_DispatchToMainThread does magic things like invoking ->Run synchronously (doesn't look like and it would be dumb) it doesn't seem possible.

> 2) Somebody is modifying the data in an mObservers entry out from underneath
> us.

History is the only consumer of the entries in mObservers, and I couldn't find anything between get and put that modifies those.

Marcia Knous [:marcia]

Reporter

Updated

•

7 years ago

status-firefox59: --- → affected

:Gijs (he/him)

Comment 12

•

7 years ago

(In reply to Marco Bonardo [::mak] from comment #8)
> I was looking at aggregates, the only interesting thing is that uptime is,
> in most cases, very short: less than 20 seconds. So either for these users
> this hits so often that it crashes at the first page load, or it's related
> to some startup events.

The dupe is a security researcher with a startup crash, in case that helps...

Assignee

Comment 13

•

7 years ago

yep, it's mostly startup crashes. I guess for now I'll try to go with a stop-gap, because I can't explain this crash.

Lina Butler [:lina]

Comment 14

•

7 years ago

I noticed `NotifyVisited` and `NotifyVisitedForDocument` hold script blockers when they access `mObservers`. Is it possible a script is spinning the event loop and changing `mObservers`, and we need to add blockers to `RegisterVisitedCallback` and `UnregisterVisitedCallback`, too?

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 15

•

7 years ago

The reason for the script blockers is that notifying the visited status calls SetLinkState() that may cause a modal dialog to be shown, that spins the events loop, causing a nested notification. We had a clear stack in that case (bug 668245) showing the modal loop and the nested call. The stack here doesn't show a nested call unfortunately.
Though, you could be onto something, between the putEntry and the removeEntry calls, the nsTHashTable may realloc, and that usually happens with a nested loop. It's hard to verify though.

Comment 16

•

7 years ago

Marco - can you at least take a shot at a stopgap, since this is a 58 regression and we need to resolve this soon.

Thanks

Flags: needinfo?(mak77)

Assignee

Comment 17

•

7 years ago

I'll make a stop-gap patch today, that hopefully will help.
Fwiw, this doesn't look like a topcrash, and it's not a new crash, since it existed before the regression. What regressed is the amount of times it hits.

Assignee

Comment 18

•

7 years ago

Attached patch patch v1 — Details — Splinter Review

Flags: needinfo?(mak77)

Attachment #8935373 - Flags: review?(gijskruitbosch+bugs)

:Gijs (he/him)

Comment 19

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

Review of attachment 8935373 [details] [diff] [review]:
-----------------------------------------------------------------

Tentative rs=me. Maybe Doug Thayer could do a second review here in case there's anything else obvious to do/try?

::: toolkit/components/places/History.cpp
@@ +2705,5 @@
>      if (NS_FAILED(rv) || !aLink) {
>        // Remove our array from the hashtable so we don't keep it around.
>        MOZ_ASSERT(key == mObservers.GetEntry(aURI), "The URIs hash mutated!");
> +      // In some rare case this crashes for unknown reasons.  Our suspect is
> +      // that something between PutEntry and this call causes a nested loop

Nit: "this crashes" - could do with clarifying like "calling RemoveEntry() with the current value of `key` " or something like that.

Attachment #8935373 - Flags: review?(gijskruitbosch+bugs) → review+

Assignee

Comment 20

•

7 years ago

(In reply to :Gijs from comment #19)
> Tentative rs=me. Maybe Doug Thayer could do a second review here in case
> there's anything else obvious to do/try?

We can continue the investigation in bug 1412647, that imo is the same exact thing wit ha slightly different signature, but I don't think there's anything obvious here, based on the hours I spent investigating it (surely I may have missed something, no doubt). The most likely theory (nested loop) looks hard to confirm and without a way to reproduce the crash (even by chance on Try!).

Assignee

Updated

•

7 years ago

Blocks: 1412647

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Assignee

Comment 21

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/94f1fb1043a81e619d7b73d48a86d66d1cf4eebc

Comment 22

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/94f1fb1043a8

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Julien Cristau [:jcristau]

Assignee

Comment 23

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

Approval Request Comment
[Feature/Bug causing the regression]: unknown
[User impact if declined]: UAF crash
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: not yet
[Needs manual test from QE? If yes, steps to reproduce]: no known steps
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: it's a simple stop-gap fix
[String changes made/needed]: none

Attachment #8935373 - Flags: approval-mozilla-beta?

Comment 24

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

sec-high, stop-gap fix for beta58

let's try this for 58.0b11

Attachment #8935373 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 25

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/ff2ac648087a

Does ESR52 have the same underlying defect (even if it's not actively crashing)?

status-firefox57: --- → wontfix

status-firefox58: affected → fixed

status-firefox-esr52: --- → ?

Flags: needinfo?(mak77)

Assignee

Comment 26

•

7 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #25)
> https://hg.mozilla.org/releases/mozilla-beta/rev/ff2ac648087a
> 
> Does ESR52 have the same underlying defect (even if it's not actively
> crashing)?

the code is the same, the cause (nested loop or event loop spinning in a notification) may be missing.

Flags: needinfo?(mak77)

Assignee

Comment 27

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: it's a safe stop-gap for a possible UAF
User impact if declined: possible crash
Fix Landed on Version: 58, 59
Risk to taking this patch (and alternatives if risky): none, it's quite simple
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #8935373 - Flags: approval-mozilla-esr52?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Updated

•

7 years ago

Group: toolkit-core-security → core-security-release

Updated

•

7 years ago

status-firefox-esr52: ? → affected

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 28

•

7 years ago

Hi Al, Dan, do we need a sec-approval for this one? I'll do A+ for ESR52 as I am assuming this is something we need for 58 and ESR52.6

Flags: needinfo?(dveditz)

Flags: needinfo?(abillings)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 29

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

Sec-high, ESR52+

Attachment #8935373 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Ryan VanderMeulen [:RyanVM]

Comment 30

•

7 years ago

(In reply to Ritu Kothari (:ritu) from comment #28)
> Hi Al, Dan, do we need a sec-approval for this one? I'll do A+ for ESR52 as
> I am assuming this is something we need for 58 and ESR52.6

Yes, this should have had sec-approval? and the template questions answered before it was checked in at all. We need this set and the questions answered.

Flags: needinfo?(abillings) → needinfo?(mak77)

Comment 31

•

7 years ago

https://hg.mozilla.org/releases/mozilla-esr52/rev/0cc1c9068714

status-firefox-esr52: affected → fixed

Updated

•

7 years ago

Flags: needinfo?(dveditz)

Assignee

Comment 32

•

7 years ago

Comment on attachment 8935373 [details] [diff] [review]
patch v1

[Security approval request comment]
How easily could an exploit be constructed based on the patch?
unlikely, the patch is a complete stop gap

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
it surely makes clear where we crashed, but there are no known ways to cause the crash and the stop gap fix makes that not possible

Which older supported branches are affected by this flaw?
Likely all of them

If not all supported branches, which bug introduced the flaw?
unknown

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?
The patch is trivial enough and not risky.

How likely is this patch to cause regressions; how much testing does it need?
Unlikely to cause regressions.

Flags: needinfo?(mak77)

Attachment #8935373 - Flags: sec-approval?

Assignee

Comment 33

•

7 years ago

Sorry for not setting sec-approval, I wasn't well informed about its usage.

Updated

•

7 years ago

tracking-firefox58: --- → +

tracking-firefox59: --- → +

tracking-firefox-esr52: --- → 58+

Updated

•

7 years ago

Attachment #8935373 - Flags: sec-approval? → sec-approval+

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

7 years ago

Whiteboard: [fxsearch] → [fxsearch][adv-main58+][adv-esr52.6+]

Updated

•

7 years ago

Flags: qe-verify-

Whiteboard: [fxsearch][adv-main58+][adv-esr52.6+] → [fxsearch][adv-main58+][adv-esr52.6+][post-critsmash-triage]