Closed Bug 616668 Opened 14 years ago Closed 14 years ago

crash [@ mozilla::dom::TabParent::RecvGetDPI ]

Categories

(Core :: DOM: Core & HTML, defect)

ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0b4+ ---

People

(Reporter: scoobidiver, Assigned: azakai)

References

Details

(Keywords: crash, Whiteboard: [fennec-checkin-postb3])

Crash Data

Attachments

(2 files, 2 obsolete files)

It is #12 top crasher in Fennec 4.0b3pre for the last 3 days. Signature mozilla::dom::TabParent::RecvGetDPI UUID d53ac02b-b5e2-4326-8f23-693052101203 Time 2010-12-03 21:46:00.907237 Uptime 435 Install Age 14572 seconds (4.0 hours) since version was first installed. Product Fennec Version 4.0b3pre Build ID 20101203042345 Branch 2.0 OS Linux OS Version 0.0.0 Linux 2.6.32.15-g6a358a9 #1 PREEMPT Fri Aug 6 22:36:38 CST 2010 armv7l CPU arm CPU Info Crash Reason SIGSEGV Crash Address 0x0 User Comments App Notes HTC HTC Desire htc_wwe/htc_bravo/bravo/bravo:2.2/FRF91/226611:user/release-keys Frame Module Signature [Expand] Source 0 libxul.so mozilla::dom::TabParent::RecvGetDPI dom/ipc/TabParent.cpp:547 1 libxul.so mozilla::dom::PBrowserParent::OnMessageReceived PBrowserParent.cpp:1199 2 libxul.so mozilla::dom::PContentParent::OnMessageReceived PContentParent.cpp:1196 3 libxul.so mozilla::ipc::SyncChannel::OnDispatchMessage ipc/glue/SyncChannel.cpp:172 4 libxul.so mozilla::ipc::RPCChannel::OnMaybeDequeueOne ipc/glue/RPCChannel.cpp:436 5 libxul.so RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run ipc/chromium/src/base/task.h:308 6 libxul.so mozilla::ipc::RPCChannel::DequeueTask::Run RPCChannel.h:475 7 libxul.so MessageLoop::RunTask ipc/chromium/src/base/message_loop.cc:344 8 libxul.so MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:354 9 libxul.so MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:451 10 libxul.so mozilla::ipc::DoWorkRunnable::Run ipc/glue/MessagePump.cpp:71 11 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:626 12 libxul.so NS_ProcessNextEvent_P nsThreadUtils.cpp:250 13 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:134 14 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:220 15 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:512 16 libxul.so nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:198 17 libxul.so nsAppStartup::Run toolkit/components/startup/src/nsAppStartup.cpp:192 18 libxul.so XRE_main toolkit/xre/nsAppRunner.cpp:3693 19 libxul.so GeckoStart toolkit/xre/nsAndroidStartup.cpp:131 20 libc.so libc.so@0x10f47 21 libc.so libc.so@0x10a33 More reports at: http://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=mozilla%3A%3Adom%3A%3ATabParent%3A%3ARecvGetDPI&version=Fennec%3A4.0b3pre
tracking-fennec: --- → ?
azakai@58528 543 TabParent::RecvGetDPI(float* aValue) azakai@58528 544 { azakai@58528 545 nsCOMPtr<nsIWidget> widget = GetWidget(); azakai@58528 546 NS_ABORT_IF_FALSE(widget, "Must have a widget to find the DPI!"); azakai@58528 547 *aValue = widget->GetDPI(); You don't have a widget. please stop crashing us :)
Assignee: nobody → azakai
Blocks: 613076
Attached patch patch? (obsolete) — Splinter Review
I guess this would fix the crash, but we should really understand why we can have a situation without a widget, and how bad it is to not have a proper DPI to return in that case...?
Attachment #495535 - Flags: feedback?(jones.chris.g)
Yeah, the options here aren't great. Can anyone reproduce?
Comment on attachment 495535 [details] [diff] [review] patch? Might as well start planning fixes for the worst-case here. What happens in-process if a DPI query is made for a tab for which a widget can't be located?
Can GetDPI static? I don't think DPI is really window/widget specific.
tracking-fennec: ? → 2.0b3+
(In reply to comment #5) > Can GetDPI static? I don't think DPI is really window/widget specific. Maybe on mobile that is unlikely, but in general DPI can be widget specific apparently, https://bugzilla.mozilla.org/show_bug.cgi?id=613076#c5
interesting. I suppose multiple monitors would make that true. Returning 200 is wrong. 96 is probably the better wrong value. if the child process is dying or has died, and there are pending events would we see this? For example, 1) child post a GetDPI event 2) child dies 3) parent start handling the crash 4) parent processes the GetDPI event and finds a null widget
I think that 96 is suitable for a desktop, but that something like 200 might make more sense for phones, http://en.wikipedia.org/wiki/List_of_displays_by_pixel_density If the child dies, don't its messages vanish? Or is there some short interval in which they still arrive, but the parent has already destroyed the widget - but *not* the TabParent? If this issue only happens in such a scenario then we should just apply the above patch and be done with this. However, the risk is that there are other cases without a widget - does anyone know? The IME code in particular is happy to ignore messages when there is no widget, perhaps the issue is the same there?
Everywhere else in that file, we test for widget before using it. In the GetDPI case, we abort. I think taking a null check is fine and something we'd want for b3 (its a top crasher). And a follow up bug to better understand when widgets can be null.
Comment on attachment 495535 [details] [diff] [review] patch? Sounds good to me.
Attachment #495535 - Flags: feedback?(jones.chris.g) → review?(jones.chris.g)
At the risk of sounding like a broken record, (In reply to comment #3) > Can anyone reproduce? (In reply to comment #4) > Comment on attachment 495535 [details] [diff] [review] > patch? > > What happens in-process if a DPI query is made for a tab for which a widget > can't be located? Returning |200.0*drand()| sounds like a recipe for pain.
I can't reproduce this crash.
tracking-fennec: 2.0b3+ → 2.0b4+
As suggested by dougt I strengthened the IME checks into runtime aborts, if there is no widget. But I couldn't get it to crash - so perhaps the IME tests are not really needed, or are needed to the same degree as the DPI check is needed.
Also on dougt's suggestion, I added additional checks for the widget, even where there weren't checks before. I found one situation where there is no widget - we receive several AsyncMessages *after* a tab is closed where that happens. Closing the tab sets mFrameElement to null. The AsyncMessages arrive later, and can't get a widget. It's impossible to know if this is what happened in the reported crashes of GetDPI. But in theory perhaps some CSS element was working just as they closed the tab, and that message arrives after the tab is closed. This seems possible. In that case, the patch above would be fine since it doesn't matter what we return for the DPI, as the tab is closing.
tn, quoted from IRC, about the GetWidget() function: > there are a few less common situations where it can fail > there are times when the content can have no primary frame, this would happen very early in page loading > and when we transition from one page to another we keep the old page alive in the bfcache but it has been "unhooked" so GetNearestWidget can return null I think These seem rare but I guess just as possible as the tab closing situation mentioned before.
I'm OK with a patch that saves away into a static var the last value TabParent saw for GetDPI() and returns that if it can't otherwise find a widget to ask. If we get into a situation where even that static val hasn't been set, then I guess we could return 200*drand() (but keep the static var uninit).
Actually, I still don't understand how the in-process code would deal with this situation.
Attached patch patch (obsolete) — Splinter Review
Attachment #495535 - Attachment is obsolete: true
Attachment #496249 - Flags: review?(jones.chris.g)
Attachment #495535 - Flags: review?(jones.chris.g)
I'm seeing this crash on the latest trunk build of Fennec on the Droid, when quickly closing and reopening a page like about:license or about:firefox, for example.
Those aren't remote pages, so they shouldn't use TabParent::RecvGetDPI. Are you sure it's this crash that you are seeing?
Yes, I'm sure. With the steps to reproduce in comment 19, I can usually recreate this crash within 10 seconds or so. This is a crash report where I added the url that I visited: http://crash-stats.mozilla.com/report/index/7e16eb08-9172-47b7-8f37-327972101208
Attached patch patch 2Splinter Review
Attachment #496249 - Attachment is obsolete: true
Attachment #496353 - Flags: review?(jones.chris.g)
Attachment #496249 - Flags: review?(jones.chris.g)
Comment on attachment 496353 [details] [diff] [review] patch 2 >diff --git a/dom/ipc/TabParent.cpp b/dom/ipc/TabParent.cpp > void >+TabParent::SetOwnerElement(nsIDOMElement* aElement) >+{ >+ mFrameElement = aElement; >+ >+ // Cache the DPI of the screen, since we Looks like this comment OK by me as far as my limited knowledge extends.
Attachment #496353 - Flags: review?(jones.chris.g) → review+
Whiteboard: [fennec-checkin-postb3]
tchung asked me to add my HTC Evo crash report to the bug: http://crash-stats.mozilla.com/report/index/c47e1e5a-3620-4ec8-9c7e-fe4742101209
doug, given this is easy to reproduce crash, should we reassess it for blocking Beta 3? both Martijn and marcia have stack traces and can reproduce.
reflagging blocking-fennec flag to ask for re-triage for b3
tracking-fennec: 2.0b4+ → ?
tracking-fennec: ? → 2.0b4+
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@ mozilla::dom::TabParent::RecvGetDPI ]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: