Closed Bug 616668 Opened 9 years ago Closed 9 years ago

crash [@ mozilla::dom::TabParent::RecvGetDPI ]

Categories

(Core :: DOM: Core & HTML, defect, critical)

ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0b4+ ---

People

(Reporter: scoobidiver, Assigned: azakai)

References

Details

(Keywords: crash, Whiteboard: [fennec-checkin-postb3])

Crash Data

Attachments

(2 files, 2 obsolete files)

It is #12 top crasher in Fennec 4.0b3pre for the last 3 days.

Signature	mozilla::dom::TabParent::RecvGetDPI
UUID	d53ac02b-b5e2-4326-8f23-693052101203
Time 	2010-12-03 21:46:00.907237
Uptime	435
Install Age	14572 seconds (4.0 hours) since version was first installed.
Product	Fennec
Version	4.0b3pre
Build ID	20101203042345
Branch	2.0
OS	Linux
OS Version	0.0.0 Linux 2.6.32.15-g6a358a9 #1 PREEMPT Fri Aug 6 22:36:38 CST 2010 armv7l
CPU	arm
CPU Info	
Crash Reason	SIGSEGV
Crash Address	0x0
User Comments	
App Notes 	HTC HTC Desire
htc_wwe/htc_bravo/bravo/bravo:2.2/FRF91/226611:user/release-keys

Frame 	Module 	Signature [Expand] 	Source
0 	libxul.so 	mozilla::dom::TabParent::RecvGetDPI 	dom/ipc/TabParent.cpp:547
1 	libxul.so 	mozilla::dom::PBrowserParent::OnMessageReceived 	PBrowserParent.cpp:1199
2 	libxul.so 	mozilla::dom::PContentParent::OnMessageReceived 	PContentParent.cpp:1196
3 	libxul.so 	mozilla::ipc::SyncChannel::OnDispatchMessage 	ipc/glue/SyncChannel.cpp:172
4 	libxul.so 	mozilla::ipc::RPCChannel::OnMaybeDequeueOne 	ipc/glue/RPCChannel.cpp:436
5 	libxul.so 	RunnableMethod<mozilla::ipc::RPCChannel, bool , Tuple0>::Run 	ipc/chromium/src/base/task.h:308
6 	libxul.so 	mozilla::ipc::RPCChannel::DequeueTask::Run 	RPCChannel.h:475
7 	libxul.so 	MessageLoop::RunTask 	ipc/chromium/src/base/message_loop.cc:344
8 	libxul.so 	MessageLoop::DeferOrRunPendingTask 	ipc/chromium/src/base/message_loop.cc:354
9 	libxul.so 	MessageLoop::DoWork 	ipc/chromium/src/base/message_loop.cc:451
10 	libxul.so 	mozilla::ipc::DoWorkRunnable::Run 	ipc/glue/MessagePump.cpp:71
11 	libxul.so 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:626
12 	libxul.so 	NS_ProcessNextEvent_P 	nsThreadUtils.cpp:250
13 	libxul.so 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
14 	libxul.so 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:220
15 	libxul.so 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:512
16 	libxul.so 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:198
17 	libxul.so 	nsAppStartup::Run 	toolkit/components/startup/src/nsAppStartup.cpp:192
18 	libxul.so 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3693
19 	libxul.so 	GeckoStart 	toolkit/xre/nsAndroidStartup.cpp:131
20 	libc.so 	libc.so@0x10f47 	
21 	libc.so 	libc.so@0x10a33 	

More reports at:
http://crash-stats.mozilla.com/report/list?range_value=4&range_unit=weeks&signature=mozilla%3A%3Adom%3A%3ATabParent%3A%3ARecvGetDPI&version=Fennec%3A4.0b3pre
tracking-fennec: --- → ?
azakai@58528   543 TabParent::RecvGetDPI(float* aValue)
azakai@58528   544 {
azakai@58528   545   nsCOMPtr<nsIWidget> widget = GetWidget();
azakai@58528   546   NS_ABORT_IF_FALSE(widget, "Must have a widget to find the DPI!");
azakai@58528   547   *aValue = widget->GetDPI();

You don't have a widget. please stop crashing us :)
Assignee: nobody → azakai
Blocks: 613076
Attached patch patch? (obsolete) — Splinter Review
I guess this would fix the crash, but we should really understand why we can have a situation without a widget, and how bad it is to not have a proper DPI to return in that case...?
Attachment #495535 - Flags: feedback?(jones.chris.g)
Yeah, the options here aren't great.

Can anyone reproduce?
Comment on attachment 495535 [details] [diff] [review]
patch?

Might as well start planning fixes for the worst-case here.

What happens in-process if a DPI query is made for a tab for which a widget can't be located?
Can GetDPI static?  I don't think DPI is really window/widget specific.
tracking-fennec: ? → 2.0b3+
(In reply to comment #5)
> Can GetDPI static?  I don't think DPI is really window/widget specific.

Maybe on mobile that is unlikely, but in general DPI can be widget specific apparently,

https://bugzilla.mozilla.org/show_bug.cgi?id=613076#c5
interesting.  I suppose multiple monitors would make that true.


Returning 200 is wrong.  96 is probably the better wrong value.



if the child process is dying or has died, and there are pending events would we see this?  For example,

1) child post a GetDPI event
2) child dies
3) parent start handling the crash
4) parent processes the GetDPI event and finds a null widget
I think that 96 is suitable for a desktop, but that something like 200 might make more sense for phones,

http://en.wikipedia.org/wiki/List_of_displays_by_pixel_density


If the child dies, don't its messages vanish? Or is there some short interval in which they still arrive, but the parent has already destroyed the widget - but *not* the TabParent?

If this issue only happens in such a scenario then we should just apply the above patch and be done with this. However, the risk is that there are other cases without a widget - does anyone know? The IME code in particular is happy to ignore messages when there is no widget, perhaps the issue is the same there?
Everywhere else in that file, we test for widget before using it.  In the GetDPI case, we abort.

I think taking a null check is fine and something we'd want for b3 (its a top crasher). And a follow up bug to better understand when widgets can be null.
Comment on attachment 495535 [details] [diff] [review]
patch?

Sounds good to me.
Attachment #495535 - Flags: feedback?(jones.chris.g) → review?(jones.chris.g)
At the risk of sounding like a broken record,

(In reply to comment #3)
> Can anyone reproduce?

(In reply to comment #4)
> Comment on attachment 495535 [details] [diff] [review]
> patch?
> 
> What happens in-process if a DPI query is made for a tab for which a widget
> can't be located?

Returning |200.0*drand()| sounds like a recipe for pain.
I can't reproduce this crash.
tracking-fennec: 2.0b3+ → 2.0b4+
As suggested by dougt I strengthened the IME checks into runtime aborts, if there is no widget. But I couldn't get it to crash - so perhaps the IME tests are not really needed, or are needed to the same degree as the DPI check is needed.
Also on dougt's suggestion, I added additional checks for the widget, even where there weren't checks before. I found one situation where there is no widget - we receive several AsyncMessages *after* a tab is closed where that happens.

Closing the tab sets mFrameElement to null. The AsyncMessages arrive later, and can't get a widget.

It's impossible to know if this is what happened in the reported crashes of GetDPI. But in theory perhaps some CSS element was working just as they closed the tab, and that message arrives after the tab is closed. This seems possible. In that case, the patch above would be fine since it doesn't matter what we return for the DPI, as the tab is closing.
tn, quoted from IRC, about the GetWidget() function:

> there are a few less common situations where it can fail
> there are times when the content can have no primary frame, this would happen very early in page loading
> and when we transition from one page to another we keep the old page alive in the bfcache but it has been "unhooked" so GetNearestWidget can return null I think

These seem rare but I guess just as possible as the tab closing situation mentioned before.
I'm OK with a patch that saves away into a static var the last value TabParent saw for GetDPI() and returns that if it can't otherwise find a widget to ask.  If we get into a situation where even that static val hasn't been set, then I guess we could return 200*drand() (but keep the static var uninit).
Actually, I still don't understand how the in-process code would deal with this situation.
Attached patch patch (obsolete) — Splinter Review
Attachment #495535 - Attachment is obsolete: true
Attachment #496249 - Flags: review?(jones.chris.g)
Attachment #495535 - Flags: review?(jones.chris.g)
I'm seeing this crash on the latest trunk build of Fennec on the Droid, when quickly closing and reopening a page like about:license or about:firefox, for example.
Those aren't remote pages, so they shouldn't use TabParent::RecvGetDPI. Are you sure it's this crash that you are seeing?
Yes, I'm sure. With the steps to reproduce in comment 19, I can usually recreate this crash within 10 seconds or so.
This is a crash report where I added the url that I visited:
http://crash-stats.mozilla.com/report/index/7e16eb08-9172-47b7-8f37-327972101208
Attached patch patch 2Splinter Review
Attachment #496249 - Attachment is obsolete: true
Attachment #496353 - Flags: review?(jones.chris.g)
Attachment #496249 - Flags: review?(jones.chris.g)
Comment on attachment 496353 [details] [diff] [review]
patch 2

>diff --git a/dom/ipc/TabParent.cpp b/dom/ipc/TabParent.cpp
> void
>+TabParent::SetOwnerElement(nsIDOMElement* aElement)
>+{
>+  mFrameElement = aElement;
>+
>+  // Cache the DPI of the screen, since we

Looks like this comment

OK by me as far as my limited knowledge extends.
Attachment #496353 - Flags: review?(jones.chris.g) → review+
Whiteboard: [fennec-checkin-postb3]
doug, given this is easy to reproduce crash, should we reassess it for blocking Beta 3?   both Martijn and marcia have stack traces and can reproduce.
reflagging blocking-fennec flag to ask for re-triage for b3
tracking-fennec: 2.0b4+ → ?
tracking-fennec: ? → 2.0b4+
http://hg.mozilla.org/mozilla-central/rev/ebac1eeb8192
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Crash Signature: [@ mozilla::dom::TabParent::RecvGetDPI ]
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.