Closed Bug 1202080 Opened 10 years ago Closed 10 years ago

[e10s] Reproducible crash in telemetry.mozilla.org: nsTArray_Impl<T>::AppendElements<T>(unsigned int) | nsLineBreaker::AppendText(nsIAtom*, unsigned char const*, unsigned int, unsigned int, nsILineBreakSink*)

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
e10s m8+ ---

People

(Reporter: vladan, Assigned: masayuki)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

Component: General → Layout: Text
Product: Firefox → Core
Group: core-security
Flags: needinfo?(dmajor)
Based on the stack this just looks like some kind of stack overflow, so I don't think this really needs to be hidden.
Is it okay to unhide this?
Flags: needinfo?(vdjeric)
The reason code is EXCEPTION_STACK_OVERFLOW. Is it reasonable to have a stack so deep?
Flags: needinfo?(dmajor) → needinfo?(dbaron)
If it turns out that this amount of stack is really necessary (and it's not "just a logic bug") then maybe we could look at the delta between ChildEBP values and slim down the heaviest frames. I bet PGO gets pretty aggressive on inlining in layout code.
This recursive step: 3d 0015b24c 5b47d457 xul!PresShell::ProcessReflowCommands+0x9c 3e 0015b308 5b47d248 xul!PresShell::FlushPendingNotifications+0x209 3f 0015b324 5c40e4ab xul!PresShell::FlushPendingNotifications+0x1d 40 0015b330 5c40e5a7 xul!mozilla::ContentEventHandler::InitBasic+0x1b 41 0015b348 5c40e42c xul!mozilla::ContentEventHandler::InitCommon+0x1d 42 0015b36c 5c40fbec xul!mozilla::ContentEventHandler::Init+0xf 43 0015b384 5bb0254d xul!mozilla::ContentEventHandler::OnQueryEditorRect+0xf 44 0015b5f0 5ba4a90d xul!mozilla::EventStateManager::PreHandleEvent+0x451e50 45 0015b690 5b740d13 xul!PresShell::HandleEvent+0x5d1102 46 0015b6bc 5b741a00 xul!nsViewManager::DispatchEvent+0xb8 47 0015b6dc 5c6af8e5 xul!nsView::HandleEvent+0x43 48 0015b70c 5c6adf6f xul!mozilla::widget::PuppetWidget::DispatchEvent+0x6a 49 0015b878 5c6b4603 xul!mozilla::ContentCacheInChild::CacheEditorRect+0x9a 4a 0015b88c 5c6b44c2 xul!mozilla::widget::PuppetWidget::NotifyIMEOfPositionChange+0x1f 4b 0015b89c 5c6aaba2 xul!mozilla::widget::PuppetWidget::NotifyIMEInternal+0x54 4c 0015b8b0 5bab88af xul!nsBaseWidget::NotifyIME+0x86 4d 0015b8e4 5b47f519 xul!mozilla::IMEStateManager::NotifyIME+0x4e99f4 4e 0015b8ec 5ba24698 xul!nsCOMPtr_base::~nsCOMPtr_base+0x15 4f 0015b914 6bbc2cad xul!nsDocShell::NotifyScrollObservers+0x62b9ac 50 0015b930 5c416fdf mozglue!arena_dalloc_small+0x3d 51 0015b968 5b687eee xul!mozilla::IMEContentObserver::PositionChangeEvent::Run+0x38 52 0015b970 5b817e86 xul!nsRunnable::Release+0x1f 53 0015b990 5b3d3d85 xul!nsAutoScriptBlocker::~nsAutoScriptBlocker+0x81 54 0015ba24 5b47d64e xul!mozilla::ScrollFrameHelper::ReflowFinished+0x17f 55 0015ba44 5b47a250 xul!PresShell::HandlePostedReflowCallbacks+0xd0 56 0015ba60 5b479288 xul!PresShell::DidDoReflow+0x1b 57 0015babc 5b47d457 xul!PresShell::ProcessReflowCommands+0xe2 seems like an infinite recursion bug. (The stuff from frames 00-3c is normal layout recursion; this bit is not.)
Flags: needinfo?(dbaron)
Maybe Mats can look?
Flags: needinfo?(mats)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #0) > STR: > > 1. Load <url> > 2. Type "firstpaint" (no quotes) into the histogram name box To clarify: 1. Load <url> 1b. wait for the page to fully load (takes about 20 sec) 1c. click on the TELEMETRY_PENDING_LOAD_FAILURE_PARSE "link" near the top of the page. In the drop-down menu that is presented, there is a "Search" text input field: this is the "histogram name box" 2. Type "firstpaint" (no quotes) into that box (please correct me if I misunderstood)
The bug doesn't seem to occur on Linux (trunk or aurora). A breakpoint at mozilla::ContentEventHandler::OnQueryEditorRect is never hit.
Looking at the stack in comment 6, it seems like an IME problem. Since PositionChangeEvent::Run leads to flushing layout, it seems like we should dispatch an async event instead of using a script runner?
Flags: needinfo?(mats) → needinfo?(masayuki)
Yeah, that looks the root cause is a call of IMEContentObserver::PositionChangeEvent::Run() fails to check if it's safe to notify IME of reflow. But I'm not sure why IMEContentObserver::AChangeEvent::IsSafeToNotifyIME() http://hg.mozilla.org/mozilla-central/annotate/fb720c90eb49/dom/events/IMEContentObserver.cpp#l1029 and IMEContentObserver::IsSafeToNotifyIME() http://hg.mozilla.org/mozilla-central/annotate/fb720c90eb49/dom/events/IMEContentObserver.cpp#l903 are returns true at that time.
Flags: needinfo?(masayuki)
Oh, but this should block the nested flush: http://hg.mozilla.org/mozilla-central/annotate/fb720c90eb49/dom/events/IMEContentObserver.cpp#l949 I guess that before flushing the pending notifications of IMEContentObserver, two or more IMEContentObserver::PositionChangeEvent is added to the script runner. Next, during a notification is flushed, script blocker is added again and destroyed during the previous script blocker is being destroyed. This must cause this symptom. I'll try to write the patch to fix this unexpected nest.
FYI: If it's true, this may be affected to current release too. Because IME or our IME event handler tries to retrieve something (e.g., caret rect) when it's notified of NOTIFY_IME_OF_POSITION_CHANGE.
(In reply to Andrew McCreight [:mccr8] from comment #2) > Is it okay to unhide this? I'm fine with unhiding it if you guys don't think there's a security risk (i.e. it's just stack space getting exhausted)
Flags: needinfo?(vdjeric)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #14) > (In reply to Andrew McCreight [:mccr8] from comment #2) > > Is it okay to unhide this? > > I'm fine with unhiding it if you guys don't think there's a security risk > (i.e. it's just stack space getting exhausted) Could you check it?
Flags: needinfo?(continuation)
I filed bug 1203381 as performance bug which I found before this bug was reported. I guess that the optimization will fix this bug.
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #15) > Could you check it? I'm sorry, I don't understand what you mean. Would you like me to unhide the bug? (I don't personally know if this is stack space exhaustion.)
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #17) > Would you like me to unhide the bug? Yes, if it's safe.
> 2. Type "firstpaint" (no quotes) into the histogram name box Where is the "histogram name box"? I cannot find it...
Flags: needinfo?(vladan.bugzilla)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #19) > > 2. Type "firstpaint" (no quotes) into the histogram name box > > Where is the "histogram name box"? I cannot find it... Click on the link in comment 0, click on the underlined text TELEMETRY_PENDING_LOAD_FAILURE_PARSE text. You will see a list box pop out as well as a search box. Type "firstpaint" (no quotes) into the search box
Flags: needinfo?(vladan.bugzilla)
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #20) > (In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #19) > > > 2. Type "firstpaint" (no quotes) into the histogram name box > > > > Where is the "histogram name box"? I cannot find it... > > Click on the link in comment 0, click on the underlined text > TELEMETRY_PENDING_LOAD_FAILURE_PARSE text. You will see a list box pop out > as well as a search box. Type "firstpaint" (no quotes) into the search box Thank you. But I cannot reproduce this bug with typing firstpaint there... Do you reproduce it randomly?
Flags: needinfo?(vladan.bugzilla)
This build includes performance optimized patches. With the patches, recursive call should be prevented completely. Do you reproduce with this build? https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0d27dc6b463 Win x86: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/masayuki@d-toybox.com-f0d27dc6b463/try-win32/
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #22) > This build includes performance optimized patches. With the patches, > recursive call should be prevented completely. Do you reproduce with this > build? I can't reproduce this crash in Nightly anymore, possibly because telemetry.mozilla.org has fewer histograms in that list box. I'll look into it and test your patch if I can find a way to reproduce the crash
Flags: needinfo?(vladan.bugzilla)
I was able to reproduce the crash again. It turns out e10s is required for this crash The crash does not reproduce when I use your build
tracking-e10s: --- → ?
Masayuki, assigning to you since your patch on bug 1203381 fixes this
Assignee: nobody → masayuki
This should be fixed by the fix of bug 1203381. Please reopen this if it's not so.
Status: NEW → RESOLVED
Closed: 10 years ago
Component: Layout: Text → Event Handling
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Did you uplift the fix in bug 1203381 to Aurora though?
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #27) > Did you uplift the fix in bug 1203381 to Aurora though? It was landed before the merge. So, already fixed on 43 Aurora.
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: