Closed Bug 1153023 Opened 5 years ago Closed 5 years ago

[Gallery][Crop] When the gallery opens with a white screen to start, the user will not be able to resize the crop tool

Categories

(Core :: DOM: Content Processes, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla40
blocking-b2g 2.5?
Tracking Status
firefox40 --- fixed
b2g-v2.2 --- unaffected
b2g-master --- verified

People

(Reporter: dharris, Assigned: kats)

References

()

Details

(Keywords: regression, Whiteboard: [3.0-Daily-Testing])

Attachments

(6 files, 2 obsolete files)

Description:
The gallery will launch starting with a white screen and some parts of it will be broken. If the user tries to crop after this has happened, the cropping tool cannot be resized, unless a premade cropping size is selected.


Repro Steps:
1) Update a Flame to 20150409010203
2) Open the Gallery app> Wait till it loads
3) Press the home button
4) Repeat steps 2 and 3 until the screen is full white when launching gallery
5) Tap on a picture> Go to edit mode> Try to crop the picture


Actual:
The user cannot resize the cropping tool


Expected:
Cropping tool will always be functional


Environmental Variables:
Device: Flame 3.0 (319mb)(Kitkat)(Full Flash)
Build ID: 20150409010203
Gaia: 5dfd0460eb6e616205154b0d219aa5123bf1abb3
Gecko: 8f57f60ee58a
Gonk: b83fc73de7b64594cd74b33e498bf08332b5d87b
Version: 40.0a1 (3.0)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0


Repro frequency: 9/10
See attached: Logcat, Video - https://youtu.be/AzLfVgdhB9M
This issue does NOT occur on Flame 2.2

The gallery will never launch with a white screen and the cropping tool can always be used

Environmental Variables:
Device: Flame 2.2 (319mb)(Kitkat)(Full Flash)
Build ID: 20150409002503
Gaia: ea735c21bfb0d78333213ff0376fce1eac89ead6
Gecko: 0efd5cdbe224
Gonk: ebad7da532429a6f5efadc00bf6ad8a41288a429
Version: 37.0 (2.2)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:37.0) Gecko/37.0 Firefox/37.0
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
[Blocking Requested - why for this release]:
Functional regression of a core feature

Requesting a window
blocking-b2g: --- → 3.0?
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(pbylenga)
QA Contact: pcheng
Last working behavior: when white screen occurs, resizing the cropping tool still works.
First broken behavior: when white screen occurs, resizing the cropping tool does not work.

Last Working Environmental Variables:
Device: Flame
BuildID: 20150326141446
Gaia: 525c341254e08f07f90da57a4d1cd5971a3cc668
Gecko: c297ede37ac5
Version: 39.0a1 (3.0 Master)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

First Broken Environmental Variables:
Device: Flame
BuildID: 20150326141946
Gaia: 525c341254e08f07f90da57a4d1cd5971a3cc668
Gecko: fb063ad596fd
Version: 39.0a1 (3.0 Master)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:39.0) Gecko/39.0 Firefox/39.0

Gaia is the same so it's a Gecko issue.

Gecko pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c297ede37ac5&tochange=fb063ad596fd

Caused by Bug 1075670.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Bill, can you take a look at this please. This might have been caused by the landing for bug 1075670.
Blocks: 1075670
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker) → needinfo?(wmccloskey)
Can you please look at this kats? Sorry to keep asking for help, but I don't know this code at all.
Flags: needinfo?(wmccloskey) → needinfo?(bugmail.mozilla)
I tried reproducing this but so far have been unable to. I tried the apr13 pvtbuild for flame-kk-eng as well as a local build and on neither of them I was able to repro the situtation shown in the video where the gallery starts up with a white screen. I'll try with a 319MB configuration as well but I doubt that will make a difference. Bill, are you able to reproduce this at least?
Flags: needinfo?(bugmail.mozilla) → needinfo?(wmccloskey)
Ah, I can reproduce on a 319MB configuration. Let me look into it a bit more.
Flags: needinfo?(wmccloskey) → needinfo?(bugmail.mozilla)
Logging the SendUpdateDimensions calls in TabParent shows that on first start of the gallery app, the mChromeOffset gets set to (0, -854) and then during startup it gets set to (0, 0). It then stays at (0,0) (which is the correct value) for a while. Eventually with going in and out of the gallery app I see another SendUpdateDimensions with (0,-854) as the chrome offset, and that then stays that way, throwing everything out of whack.
This is the backtrace where the bad offset comes in, pretty uninteresting. The real problem is somewhere else, likely a race condition somewhere.
I strongly suspect the -854 is coming from the fact that when the app is not in the foreground it gets a translateY(-100%), at https://github.com/mozilla-b2g/gaia/blob/e7bccef7e8a8bb002666fb0f13bfd4ea69841253/apps/system/style/window.css#L553
Flags: needinfo?(bugmail.mozilla)
To be clear, my theory is that when the transform is applied/unapplied, it doesn't necessarily trigger a call to TabParent::UpdateDimensions, which it should be doing. In some cases we get a reflow before the transform is changed, and so we get an UpdateDimensions call with the translate still applied, and that's when the problem happens.
dbaron, does comment 11 sound reasonable? As far as I can tell changing the CSS transform property on an element doesn't necessarily trigger a reflow, and it's the nsSubDocumentFrame::ReflowFinished() code that ends up calling TabParent::UpdateDimensions (via nsFrameLoader::UpdatePositionAndSize). So changing the CSS transform property doesn't necessarily result in the TabParent dimensions getting updated. Is this behavior intentional/desirable?
Flags: needinfo?(dbaron)
It's expected that transforms shouldn't trigger reflow.

Maybe TabParent::UpdateDimensions shouldn't consider the effects of transforms?  Or if it needs to, it would need to be hooked up to something that's triggered more often.
Flags: needinfo?(dbaron)
(I don't know what TabParent::UpdateDimensions is used for.)
(And does whatever does what it does in non-e10s consider transforms?)
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #13)
> It's expected that transforms shouldn't trigger reflow.

Ok, thanks.

> Maybe TabParent::UpdateDimensions shouldn't consider the effects of
> transforms?

That might be a possibility, but it might be tricky. In this context, the transforms have an impact on the position of the child process on the screen, and so affect the screen-relative widget bounds of the child process. That's what we're trying to keep updated.

> Or if it needs to, it would need to be hooked up to something
> that's triggered more often.

Is there some callback/listener that we can hook into that satisfies this? I'm not familiar with all the hooks into layout code - I think the only other one I know about is the AddPostRefreshObserver, and I don't know if that's a good one to hook into.

(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #15)
> (And does whatever does what it does in non-e10s consider transforms?)

In non-e10s we don't have this problem because there are no child processes that need this adjustment.
Flags: needinfo?(dbaron)
Attached patch WIP (obsolete) — Splinter Review
This seems to work but it's probably very inefficient and may leak things and who knows what. Proof-of-concept that validates my hypothesis at least.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #16)
> Is there some callback/listener that we can hook into that satisfies this?
> I'm not familiar with all the hooks into layout code - I think the only
> other one I know about is the AddPostRefreshObserver, and I don't know if
> that's a good one to hook into.

This seems like it might be a little bit too frequent, in that it's checking the state of things on every refresh cycle.  I guess it would work if what it does when there's no change is really cheap, though.

I don't have any better ideas, though.


> (In reply to David Baron [:dbaron] ⏰UTC-7 from comment #15)
> > (And does whatever does what it does in non-e10s consider transforms?)
> 
> In non-e10s we don't have this problem because there are no child processes
> that need this adjustment.

I think there is still a non-e10s equivalent.  TabChild::RecvUpdateDimensions does various things to size a widget and a window; in non-e10s, there's also some path that makes us resize those things.

See, e.g., nsFrameLoader::UpdatePositionAndSize, which both calls mRemoteBrowser->UpdateDimensions() and does things in the non-e10s case.

I suspect we want those sizes to respond to transforms in the parent document the same way for e10s and non-e10s, although there might be a reason for them to be different.
Flags: needinfo?(dbaron)
(In reply to David Baron [:dbaron] ⏰UTC-7 from comment #18)
> This seems like it might be a little bit too frequent, in that it's checking
> the state of things on every refresh cycle.  I guess it would work if what
> it does when there's no change is really cheap, though.

It's pretty cheap, I think. And I can make it even cheaper with a few modifications. Unless there's some other way to only pick up changes in transforms I think this might be our best bet.

> I don't have any better ideas, though.

Ok, thanks.

> > (In reply to David Baron [:dbaron] ⏰UTC-7 from comment #15)
> I think there is still a non-e10s equivalent. 
> TabChild::RecvUpdateDimensions does various things to size a widget and a
> window; in non-e10s, there's also some path that makes us resize those
> things.
> 
> See, e.g., nsFrameLoader::UpdatePositionAndSize, which both calls
> mRemoteBrowser->UpdateDimensions() and does things in the non-e10s case.
> 
> I suspect we want those sizes to respond to transforms in the parent
> document the same way for e10s and non-e10s, although there might be a
> reason for them to be different.

I gave this some thought, and I think the difference between e10s and non-e10s is that in the e10s case, the event in the child proccess is relative to the PuppetWidget, and uses that as it's widget. So when the screen coordinates are computed at [1] in the child process, the mRefPoint already has the transform taken into account (from the call at [2] in the parent process), and so the PuppetWidget's WidgetToScreenOffset code must also take the transform into account to cancel it out properly. In the non-e10s case though the event's mRefPoint doesn't undergo the transformation to the child process, and so nothing needs changing. In the e10s case, for the transforms to cancel out properly, we need to make sure that the chrome displacement in the child process is always up-to-date (i.e. UpdateDimensions is called on the parent any time the result of GetChildProcessOffset() changes).

As I was writing this, I realized that another option to fix this might be that in functions like TabParent::SendRealMouseEvent where the refPoint is adjusted, we could also stash the child process offset on the event object that is sent over to the child. In the child process then we could use that instead of the widget's "chrome displacement". That might be better to do in a follow-up though as I'm not sure what else relies on the chrome displacement and whether that can be removed easily.

[1] http://mxr.mozilla.org/mozilla-central/source/dom/events/Event.cpp?rev=e60e056a230c#905
[2] http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp?rev=e812687a81cd#1172
Component: Gaia::Gallery → DOM: Content Processes
Product: Firefox OS → Core
Assignee: nobody → bugmail.mozilla
Attachment #8594851 - Flags: review?(wmccloskey)
Not really related to this bug, but this code was wrong and wouldn't update properly if the client offset dropped from nonzero to zero.
Attachment #8594852 - Flags: review?(wmccloskey)
This is the main fix.
Attachment #8592394 - Attachment is obsolete: true
Attachment #8594855 - Flags: review?(wmccloskey)
Comment on attachment 8594855 [details] [diff] [review]
Part 4 - Send dimensions update when the child process offset changes

Hm, that try push doesn't look so green. Will need to investigate a bit more.
Attachment #8594855 - Flags: review?(wmccloskey)
Thanks for working on this kats. I'm not a good reviewer for this code, though, since I don't know anything about widgets. I landed the patch in bug 1075670, but David Parks was the author and Olli reviewed it. Maybe smaug or roc could review these patches.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #25)
> Hm, that try push doesn't look so green. Will need to investigate a bit more.

Seems like in some cases the TabParent is destroyed while leaving a dangling PostRefresh listener. Even calling RemoveWindowListeners() in TabParent::Destroy doesn't fix it because at that point the document may have been disconnected from the presshell and there's no way to get a handle to the presshell and call RemovePostRefreshObserver on it.
New try push with holding a local refptr to the presShell. It's even uglier but I can't think of anything cleaner right now, let's see if it passes tests.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d05cf7f83c04
Comment on attachment 8594851 [details] [diff] [review]
Part 1 - Extract a helper method

Review of attachment 8594851 [details] [diff] [review]:
-----------------------------------------------------------------

I guess I can review stuff like this :-).
Attachment #8594851 - Flags: review?(wmccloskey) → review+
Attachment #8594852 - Flags: review?(wmccloskey) → review+
Comment on attachment 8594853 [details] [diff] [review]
Part 3 - (Cleanup) Convert nsIntPoint to LayoutDeviceIntPoint

Review of attachment 8594853 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/TabChild.cpp
@@ +2021,5 @@
>          return true;
>      }
>  
>      mOuterRect = rect;
> +    mChromeDisp = LayoutDeviceIntPoint::ToUntyped(chromeDisp);

It seems like, without much trouble, you could go a bit further here and change |mChromeDisp| and |GetChromeDisplacement()| to LayourDeviceIntPoint. Then maybe you could put the ToUntyped in PuppetWidget::GetChromeDimensions(). Up to you though.
Attachment #8594853 - Flags: review?(wmccloskey) → review+
Try push above was green.

roc, please let me know if there's a better way to do this - I'm not a huge fan of this but couldn't think of anything better.
Attachment #8594855 - Attachment is obsolete: true
Attachment #8595076 - Flags: review?(roc)
Comment on attachment 8595076 [details] [diff] [review]
Part 4 - Send dimensions update when the child process offset changes

Review of attachment 8595076 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK I guess. I'm not enthusiastic about it because I think in principle the offset could change without a refresh of the presshell, though I can't think of a case where that would happen in practice.

I think really the content process shouldn't need to know about any offsets from outside its puppetwidget subtree, though I understand that could be a big change. Is there a good reason for the content process to need these offsets?
Attachment #8595076 - Flags: review?(roc) → review+
I believe the rationale for that is buried somewhere in bug 1075670, or maybe :handyman can provide a quick summary. I'm mostly just helping with b2g regressions from that change (on the assumption that there is a good reason things are this way) because I'm more familiar with debugging b2g bugs.
Flags: needinfo?(davidp99)
(In reply to Bill McCloskey (:billm) from comment #30)
> It seems like, without much trouble, you could go a bit further here and
> change |mChromeDisp| and |GetChromeDisplacement()| to LayourDeviceIntPoint.
> Then maybe you could put the ToUntyped in
> PuppetWidget::GetChromeDimensions(). Up to you though.

Made this change and landed. We can file a follow-up for making changes based on comment 32/33 if needed.
The mChromeDisp field was needed to compute screen coordinates in content processes (say, because client JS code accesses event.screenX).  This is because of the behavior of the geometry, which is a bit complex.  Details:

* In the content proc, the PuppetWidget's bounds (PuppetWidget::GetBounds()) are the bounds of the OS widget, translated to the origin.  So the content proc's PuppetWidget::GetBounds() == main proc's nsWindow::GetBounds().
* widget::GetClientOffset is supposed to represent the offset from the widget's top-left corner to any gecko content.  In the main proc, this is just the OS's window decorations.  In the content proc, this includes the main proc's chrome, so the PuppetWidget::GetClientOffset is nsWindow::GetClientOffset + the mChromeDisp.  (Before bug 1075670, clientOffset was (0,0) -- so the geometry wasn't consistent, leading to the original bugs.)
* An alternative to keeping the PuppetWidget's bounds the same as the nsWindow's is to have its bounds be the content of the nsWindow.  Since a lot of information passed to the content proc is widget-relative, these places (mouse-events/touches, resize requests, compositor, etc) would also need to have their math updated.

Does that help?
Flags: needinfo?(davidp99)
That sort of makes sense. It sounds like it might be possible to pass the chrome offset on a per-event basis rather than every time the window is moved which might make it a little cleaner, but we can defer that to later, if we run into more problems with this approach.
This issue is verified fixed on Flame 3.0. When Gallery opens with a white screen to start, the crop tool can be resized.

Device: Flame 3.0 Master (full flashed, 319MB KK)
BuildID: 20150504010202
Gaia: e18cce173840d6ff07fb6f1f0e0ffb58b99aab3e
Gecko: dc5f85980a82
Gonk: a9f3f8fb8b0844724de32426b7bcc4e6dc4fa2ed
Version: 40.0a1 (3.0 Master)
Firmware Version: v18D-1
User Agent: Mozilla/5.0 (Mobile; rv:40.0) Gecko/40.0 Firefox/40.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.