Closed Bug 1402251 Opened 7 years ago Closed 7 years ago

Content process gets stuck in nsRange::IsNodeSelected

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox-esr52 --- unaffected
firefox56 --- unaffected
firefox57 blocking fixed
firefox58 + fixed

People

(Reporter: freesamael, Assigned: catalinb)

References

Details

Attachments

(1 file, 1 obsolete file)

Nightly 58.0a1 build id 20170921220243 on macOS Sierra.

On a typical daily use of Firefox nightly today, I noticed some tabs suddenly became blank with a rotating circle in the center (the loading indicator). I found it was a specific content process got stuck.

Unfortunately I don't have an explicit STR, but the tabs in the content process were bugzilla, slack and searchfox.

I captured a profile after noticing this symptom: https://perfht.ml/2jPs2kX

I also attached lldb onto the process, but there were not enough debugging information built into nightly. All I know is that it never exit this loop:

(lldb) th step-inst
Process 36706 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x000000010c99d88d XUL`nsRange::IsNodeSelected(nsINode*, unsigned int, unsigned int) + 493
XUL`nsRange::IsNodeSelected:
->  0x10c99d88d <+493>: jne    0x10c99d800               ; <+352>
    0x10c99d893 <+499>: nopw   %cs:(%rax,%rax)
    0x10c99d8a0 <+512>: movq   0x28(%r13), %r13
    0x10c99d8a4 <+516>: testq  %r13, %r13
Target 0: (plugin-container) stopped.
(lldb) th step-inst
Process 36706 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step into
    frame #0: 0x000000010c99d800 XUL`nsRange::IsNodeSelected(nsINode*, unsigned int, unsigned int) + 352
XUL`nsRange::IsNodeSelected:
->  0x10c99d800 <+352>: testb  $0x1, 0xa8(%rbx)
    0x10c99d807 <+359>: je     0x10c99d840               ; <+416>
    0x10c99d809 <+361>: movq   0x78(%rbx), %rax
    0x10c99d80d <+365>: cmpq   0x90(%rbx), %rax
Target 0: (plugin-container) stopped.

Presumably it was this loop:
http://searchfox.org/mozilla-central/rev/15ce5cb2db0c85abbabe39a962b0e697c9ef098f/dom/base/nsRange.cpp#207-223
Catalin is looking some similar issue.
Priority: -- → P2
Given the severity of this problem, I'll track this as blocking 57. If we notice with broader Beta rollout the # of user reports does not increase, I will change it from "blocking" to just "+" tracked for 57.
Hi Catalin, mind chiming in here and suggesting the next step per comment 1? Thank you.
Flags: needinfo?(catalin.badea392)
An infinite loop in the the inner for statement might have been caused by the bug fixed in bug 1399091, because the same range could be added to two different ancestor linked lists. I would add a diagnostic assert in nsRange::RegisterCommonAncestor to see if this still occurs.
Flags: needinfo?(catalin.badea392)
Comment on attachment 8916597 [details] [diff] [review]
Prevent ranges from being added to multiple common ancestor lists.

This feels wrong. Either we trust the range is never in a list (and just assert), or we don't trust that and just have the if-check.
Doesn't it mean a bug elsewhere if we are still in a list.
Attachment #8916597 - Flags: review?(bugs) → review-
Assignee: nobody → catalin.badea392
Got r+ on irc for landing just the assertion.
Attachment #8916597 - Attachment is obsolete: true
Pushed by catalin.badea392@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/82957b8ff351
Assert ranges our not added to multiple common ancestor lists. r=smaug
Hi Catalin, should we consider uplifting this fix to 57? This bug is currently being tracked as a 57 blocker.
Flags: needinfo?(catalin.badea392)
(In reply to Ritu Kothari (:ritu) from comment #10)
> Hi Catalin, should we consider uplifting this fix to 57? This bug is
> currently being tracked as a 57 blocker.

It's not a fix, it just adds a diagnostic assert. I'm hoping to get a crash report that would better reveal the bug. There's also a chance this was fixed by bug 1399091.
Flags: needinfo?(catalin.badea392)
Do we know if this was fixed by bug 1399091?
Flags: needinfo?(catalin.badea392)
Priority: P2 → P1
(In reply to Andrew Overholt [:overholt] from comment #12)
> Do we know if this was fixed by bug 1399091?

I think it was fixed by bug 1399091.

An infinite loop would occur at [1] if the linked list was broken by adding the range to a second ancestor list without removing it from the previous one. An iterator over the first linked list would jump to the second list and loop forever since it expects to find the sentinel element from the first element, but it will be stuck in the second list.

 sentinel1 -> range <-> sentinel2

This can happen if we call RegisterCommonAncestor on a range twice without prior calling UnregisterCommonAncestor. Inspecting the call sites of RegisterCommonAncestor[2], we see that we always check for an existing common ancestor. This was not the case with nsRange::SetSelection, which was fixed in bug 1399091.

Also, I didn't get any hits on the diagnostic assert I added in RegisterCommonAncestor. 

[1] http://searchfox.org/mozilla-central/rev/d30462037ffea383e74c42542c820cf65b2b144e/dom/base/nsRange.cpp#212
[2] http://searchfox.org/mozilla-central/search?q=symbol:_ZN7nsRange22RegisterCommonAncestorEP7nsINode&redirect=false

I would close this as fixed.
Flags: needinfo?(catalin.badea392)
So is this fixed?
Flags: needinfo?(catalin.badea392)
(In reply to Olli Pettay [:smaug] from comment #14)
> So is this fixed?

yes.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(catalin.badea392)
Resolution: --- → FIXED
Depends on: 1399091
Keywords: leave-open
Target Milestone: --- → mozilla58
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: