Closed Bug 1049195 Opened 10 years ago Closed 10 years ago

[B2G][Window Mgmt][HERE Maps] HERE Maps is closed out by LMK when sitting idle in Preferences menu

Categories

(Core :: Graphics, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

VERIFIED FIXED
mozilla35
blocking-b2g 2.0+
Tracking Status
b2g-v1.4 --- unaffected
b2g-v2.0 --- verified
b2g-v2.0M --- verified
b2g-v2.1 --- unaffected
b2g-v2.2 --- verified

People

(Reporter: jdegeus, Assigned: sotaro)

References

()

Details

(Keywords: perf, regression, Whiteboard: [2.0-exploratory][c=memory p= u=2.0 s=][MemShrink:P1])

Attachments

(11 files, 17 obsolete files)

1.21 MB, text/plain
Details
82.81 KB, text/plain
Details
3.33 MB, application/zip
Details
3.31 MB, application/zip
Details
1.16 MB, text/plain
Details
47.04 KB, text/plain
Details
6.03 KB, text/plain
Details
49.50 KB, text/plain
Details
18.71 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
18.61 KB, patch
sotaro
: review+
Details | Diff | Splinter Review
2.41 MB, video/mp4
Details
Attached file HERE_now.txt
Description:
When users are in the HERE Maps app, if the user selects the Layers button on the bottom right corner and repeatedly taps "Miles", users will see the app is closed out by LMK. 

Setup: 
Navigate to Marketplace> Search "HERE Maps"> Install app

Repro Steps:
1) Update a Flame to 20140805000238
2) Launch "HERE Maps"
3) Select the layers button on the bottom right corner
4) Scroll down and repeatedly tap "Miles"
5) Observe app eventually closes resulting in LMK

Actual:
HERE Now app closes out by LMK when repeatedly tapping "Miles"

Expected:
App does not close out due to LMK

Environmental Variables:
Device: Flame 2.0 (319mb)
Build ID: 20140805000238
Gaia: 597975839c04e0198eb98c2c77474f057b5531e7
Version: 32.0 (2.0)
Gecko: ddeead927143
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Repro frequency: 3/3
See attached: Video, dmesg and Logcat
Dmesg.txt - HERE_now.txt - http://youtu.be/9GA3z7_gyz8
Attached file dmesg.txt
This issue DOES NOT occur on Flame 2.1 (319mb), Flame 2.0 (512mb), Flame 1.4 (319mb), Flame Base 122 (319mb), Flame Base 121-2 (319mb), Buri 2.1, Buri 2.0, Buri 1.4, Open C 2.1, Open C 2.0, and Open C 1.4

Actual: The app will not be closed out by LMK 

Flame 2.1 (319mb)

Environmental Variables:
Device: Flame Master
Build ID: 20140805040204
Gaia: 19bf9795263e2ccc15d824a52ebf23c2670fa9b9
Gecko: 7f81be7db528
Version: 34.0a1 (Master)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Flame 2.0 (512mb)

Environmental Variables:
Device: Flame 2.0
Build ID: 20140805000238
Gaia: 597975839c04e0198eb98c2c77474f057b5531e7
Gecko: ddeead927143
Version: 32.0 (2.0)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Flame 1.4 (319mb)

Environmental Variables:
Device: Flame 1.4
Build ID: 20140804183020
Gaia: 9377274b17200a60cebcd2427d489a7756c4cc72
Gecko: 24bc2ae19c59
Version: 30.0 (1.4)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Flame Base 122 (319mb)

Environmental Variables:
Device: Flame 1.3 (319mb)
Build ID: 20140616171114
Gaia: e1b7152715072d27e0880cdc6b637f82fa42bf4e
Gecko: e181a36ebafaa24e5390db9f597313406edfc794
Version: 28.0 (1.3)
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0

Flame Base 121-2 (319mb)

Environmental Variables:
Device: Flame 1.3
Build ID: 20140610200025
Gaia: e106a3f4a14eb8d4e10348efac7ae6dea2c24657
Gecko: b637b0677e15318dcce703f0358b397e09b018af
Version: 28.0 (1.3)
Firmware Version: v121-2
User Agent: Mozilla/5.0 (Mobile; rv:28.0) Gecko/28.0 Firefox/28.0

Buri 2.1

Environmental Variables:
Device: Buri Master
Build ID: 20140805073149
Gaia: 19bf9795263e2ccc15d824a52ebf23c2670fa9b9
Gecko: 2aaedcdf69f6
Version: 34.0a1 (Master)
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Buri 2.0

Environmental Variables:
Device: Buri 2.0
Build ID: 20140805063011
Gaia: 4959d1eb92fa82198409c90f471aad6d68c463b3
Gecko: c18fb97addb8
Version: 32.0 (2.0)
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Buri 1.4

Environmental Variables:
Device: Buri 1.4
Build ID: 20140804183020
Gaia: 9377274b17200a60cebcd2427d489a7756c4cc72
Gecko: 24bc2ae19c59
Version: 30.0 (1.4)
Firmware Version: v1.2device.cfg
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Open C 2.1

Environmental Variables:
Device: Open_C Master
Build ID: 20140805040204
Gaia: 19bf9795263e2ccc15d824a52ebf23c2670fa9b9
Gecko: 7f81be7db528
Version: 34.0a1 (Master)
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Open C 2.0 

Environmental Variables:
Device: Open_C 2.0
Build ID: 20140805000238
Gaia: 597975839c04e0198eb98c2c77474f057b5531e7
Gecko: ddeead927143
Version: 32.0 (2.0)
Firmware Version: P821A10V1.0.0B06_LOG_DL
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Open C 1.4

v1.4 Environmental Variables:
Device: Open_C v1.4 MOZ
BuildID: 20140728000238
Gaia: 0a864988f5dce7f9f3dea9609e8ef054679c30ff
Gecko: 2da96d621030
Version: 32.0
Firmware Version: v122
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?]
Please fix the product and component since this is a regression.

Seems like a tester bug. How many times do you have to tap the "Miles" button for this to occur? Does this only happen when tapping it repeatedly at a very fast rate?
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage-]
Flags: needinfo?(ktucker) → needinfo?(jdegeus)
Adjusted product, component, and the Summary 

After further investigation of this issue, the user does not need to tap "Miles". The user only needs to enter the Preferences menu and scroll to the base of the menu, then sit idle for about 15 seconds.
QA Whiteboard: [QAnalyst-Triage-] → [QAnalyst-Triage?]
Component: Preinstalled B2G Apps → Gaia::System::Window Mgmt
Flags: needinfo?(jdegeus) → needinfo?(ktucker)
Product: Tech Evangelism → Firefox OS
Summary: [B2G][Tech Evangelism][HERE Maps] HERE Maps is closed out by LMK when repeatidly tapping "Miles" → [B2G][Window Mgmt][HERE Maps] HERE Maps is closed out by LMK when sitting idle in Preferences menu
[Blocking Requested - why for this release]:

According to comment 3 the user will experience a LMK from just sitting in the preference menu so nominating this 2.0? since this is also a regression from 1.4.
blocking-b2g: --- → 2.0?
QA Whiteboard: [QAnalyst-Triage?]
Component: Gaia::System::Window Mgmt → Performance
Flags: needinfo?(ktucker)
QA Contact: jmercado
Triage group concluded that we should block on this because 1) it's a regression and 2) it's a preloaded app.
blocking-b2g: 2.0? → 2.0+
memory reports from before and after tapping "Miles" a lot of times would be a good start.
B2g-inbound Regression Window

Last working 
Environmental Variables:
Device: Flame Master
BuildID: 20140430023004
Gaia: ed00198def12a2df82fb8e4277fa5c3ed2471dd5
Gecko: c46b933d79a2
Version: 32.0a1 (Master) 
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


First Broken 
Environmental Variables:
Device: Flame Master
BuildID: 20140430053004
Gaia: adfe7c3fc8c8cf39bf14ab416ff531af1f3dd02f
Gecko: b5b912c44f82
Version: 32.0a1 (Master) 
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0


Last working gaia / First broken gecko - Issue does NOT occur
Gaia: ed00198def12a2df82fb8e4277fa5c3ed2471dd5
Gecko: b5b912c44f82


First broken gaia / Last working gekko - Issue DOES occur
Gaia: adfe7c3fc8c8cf39bf14ab416ff531af1f3dd02f
Gecko: c46b933d79a2


Gaia Pushlog: https://github.com/mozilla-b2g/gaia/compare/ed00198def12a2df82fb8e4277fa5c3ed2471dd5...adfe7c3fc8c8cf39bf14ab416ff531af1f3dd02f
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(jmitchell)
Keywords: perf
Whiteboard: [2.0-exploratory] → [2.0-exploratory][c=memory p= u= s=]
Priority: -- → P1
Severity: normal → blocker
Whiteboard: [2.0-exploratory][c=memory p= u= s=] → [2.0-exploratory][c=memory p= u=2.0 s=]
FxOS Perf Triage: Could you also provide the memory report for Kyle as requested in comment 6?
Flags: needinfo?(jmercado)
See Also: → 1049251
Assignee: nobody → wcosta
Assignee: wcosta → nobody
Attached file after pressing miles.zip (obsolete) —
Flags: needinfo?(jmercado)
I have attached the memory reports as requested in comment 6
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(jmitchell)
The diff of attachment 8472580 [details] and 8472581 shows this:

  74.63 MB (100.0%) -- kgsl-memory
  ├──93.65 MB (125.49%) -- HERE Maps (pid=5879)
  │  ├──80.64 MB (108.06%) ── texture [97] [+]
Off to graphics.
Component: Performance → Graphics
Product: Firefox OS → Core
I wonder is this about GPS, from attachment 8468096 [details], during 17:48:55:046 ~ 17:49:09.386, there are 228 nmea_cb:

  LocSvc_eng_nmea: I/<=== nmea_cb line 62 0xacf15dbc

which is ~16 times/sec. But in general I see 4 times/sec, i.e., 4x nmea_cb during the period. I don't know what exactly the callback is though.

The javascript warnings during 17:50:00.846 ~ 17:50:02.316 look also strange, is it possible the geolocation events are somehow queued and then dump to next listener?

My reproduce rate is low, I reproduced once but then never again (tried more than 10 times).
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][2.0-signoff-need]
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need] → [QAnalyst-Triage+][2.0-signoff-need+]
Could you please enable GPS logging (Settings->Developer->"Gelocation output in A...") and get another logcat?
Flags: needinfo?(jmercado)
I have attached a logcat created with geolocation turned on while reproducing this bug.
Flags: needinfo?(jmercado)
Thank you Jayme.

Kanru, could you please take a look at the attachment 8473761 [details], see if the geolocation callbacks before HERE gets killed point to different positions or anything you found abnormal? Thanks.
Flags: needinfo?(kanru)
I am not so sure are the positions steady:

08-15 09:55:26.703   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165528,A,4742.461570,N,12212.600014,W,0.0,,150814,0.0,E,D*23
08-15 09:55:27.733   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165529,A,4742.460783,N,12212.599848,W,0.0,,150814,0.0,E,D*2F
08-15 09:55:28.683   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165530,A,4742.460825,N,12212.599512,W,0.0,,150814,0.0,E,D*26
08-15 09:55:29.693   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165531,A,4742.460812,N,12212.600152,W,0.0,,150814,0.0,E,D*20
08-15 09:55:30.683   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165532,A,4742.460658,N,12212.599196,W,,,150814,,,D*6D
08-15 09:55:37.763   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165539,A,4742.460328,N,12212.598405,W,0.0,,150814,0.0,E,D*2F
08-15 09:55:38.683   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165540,A,4742.460331,N,12212.598379,W,0.0,,150814,0.0,E,D*25
08-15 09:55:39.713   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165541,A,4742.460331,N,12212.598526,W,0.0,,150814,0.0,E,D*28
08-15 09:55:40.693   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165542,A,4742.460327,N,12212.598514,W,0.0,,150814,0.0,E,D*2D
08-15 09:55:41.693   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165543,A,4742.460327,N,12212.598551,W,0.0,,150814,0.0,E,D*2D
08-15 09:55:42.743   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165544,A,4742.460335,N,12212.598568,W,0.0,,150814,0.0,E,D*23
08-15 09:55:43.693   326  1373 E GeckoConsole: NMEA: nmea:     	$GPRMC,165545,A,4742.460336,N,12212.598601,W,0.0,,150814,0.0,E,D*2D

Note there were frequent $GPGSV and $GLGSV which are detailed satellite data.
Probably print backtrace out at where allocating kgsl texture would be easier.
(In reply to Ting-Yu Chou [:ting] from comment #19)
> I am not so sure are the positions steady:

Compare with the numbers from Geoloc test application, those should be steady enough.
Flags: needinfo?(kanru)
Assignee: nobody → sotaro.ikeda.g
Jayme, does the problem still happen on latest 2.0? I tried the STR on Comment 0 on laster 2.0 flame. But I did not see the crash.

On which version of the os were memory reports(attachment 8472580 [details] and attachment 8472581 [details]) got? From the memory-reports, it seems like master flame.
Flags: needinfo?(jmercado)
Tagging qawanted in case someone can get to it before I can
Flags: needinfo?(jmercado)
Keywords: qawanted
I saw the crash by the following STR on latest v2.0 flame(319mb)

[1] start Here Maps
[2] Select the layers button on the bottom right corner
[3] Scroll down
[4] Tap "Public transport view" and "Live traffic view" repeatedly.
[5] Observe app eventually closes resulting in LMK
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> I saw the crash by the following STR on latest v2.0 flame(319mb)
> 
> [1] start Here Maps
> [2] Select the layers button on the bottom right corner
> [3] Scroll down
> [4] Tap "Public transport view" and "Live traffic view" repeatedly.
> [5] Observe app eventually closes resulting in LMK

I forgot to write one thing. Before [4] need to check "High-resolution map".
About [4] the STR in Comment 24, on mater flame, it is very difficult to select item in "Preferences".
I am not sure if the STR in Comment 24 is regression.
(In reply to Jayme Mercado [:JMercado] from comment #23)
> Tagging qawanted in case someone can get to it before I can

Issue DOES occur in latest 2.0 Flame build (319 MB memory). 

Actual Results:  Opening "Preferences" and scrolling down and up causes HERE Maps to crash. 

Environmental Variables:
Device: Flame 2.0
Build ID: 20140818021116
Gaia: 640ce38ca03f1e26a4524ff4215b8b3f7731e2f0
Gecko: 692c93509dc9
Version: 32.0 (2.0)
Firmware Version: v123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need+] → [QAnalyst-Triage?][2.0-signoff-need+]
Flags: needinfo?(jmitchell)
Keywords: qawanted
QA Whiteboard: [QAnalyst-Triage?][2.0-signoff-need+] → [QAnalyst-Triage+][2.0-signoff-need+]
Flags: needinfo?(jmitchell)
(In reply to Sotaro Ikeda [:sotaro] from comment #26)
> About [4] the STR in Comment 24, on mater flame, it is very difficult to
> select item in "Preferences".

Bug 1055214 is created.
See Also: → 1055214
(In reply to Sotaro Ikeda [:sotaro] from comment #24)
> I saw the crash by the following STR on latest v2.0 flame(319mb)
> 
> [1] start Here Maps
> [2] Select the layers button on the bottom right corner
> [3] Scroll down
> [4] Tap "Public transport view" and "Live traffic view" repeatedly.
> [5] Observe app eventually closes resulting in LMK

The crash happen on v2.0 flame(319mb), v2.0 flame(512mb).
The crash did not happen on v1.4 flame(319mb).

I can not test on master flame because of Bug 1055214.
Got about_memoyr, during selecting "Live traffic view"  in [4] in the STR in Comment 24. gralloc buffer (ion-memory) allocation is not too big. But texture in kgsl-memory is very big. It seems like that SkigGL do caches.

13.61 MB (100.0%) -- ion-memory
├───8.86 MB (65.09%) ── b2g (pid=277) [6]
├───1.58 MB (11.63%) ── fd900000.qcom,mdss_mdp (pid=1)
├───1.58 MB (11.63%) ── kworker/0:1 (pid=15)
├───1.58 MB (11.63%) ── mdss_fb0 (pid=484)
└───0.00 MB (00.03%) ── adsprpc-smd (pid=1)

100.55 MB (100.0%) -- kgsl-memory
├───83.82 MB (83.36%) -- HERE Maps (pid=1602)
│   ├──71.44 MB (71.05%) ── texture [81]
│   ├───7.27 MB (07.23%) ── egl_image [3]
│   ├───2.02 MB (02.01%) ── gl [11]
│   ├───1.13 MB (01.12%) ── command [18]
│   ├───1.07 MB (01.06%) ── arraybuffer [35]
│   └───0.91 MB (00.90%) ++ (4 tiny)
└───16.73 MB (16.64%) -- b2g (pid=277)
    ├───8.23 MB (08.19%) ── egl_image [19]
    ├───3.16 MB (03.15%) ── egl_surface [2]
    ├───2.79 MB (02.77%) ── gl [17]
    ├───1.30 MB (01.29%) ++ (6 tiny)
    └───1.25 MB (01.24%) ── command [20]
But SkiaGL cache size is limited to small size by Bug 1045680.
Attachment #8474844 - Attachment is patch: false
When the issue happen, glDrawElements() is called very frequently (680 times in 250ms) every few seconds and kgsl_ioctl_gpumem_alloc_id() gets called (21 times) but never freed.
(In reply to Ting-Yu Chou [:ting] from comment #33)
> When the issue happen, glDrawElements() is called very frequently (680 times
> in 250ms) every few seconds and kgsl_ioctl_gpumem_alloc_id() gets called (21
> times) but never 

I am not so sure that kgsl_ioctl_gpumem_alloc_id() is called from glDrawElements() in user space, will try to verify it.
(In reply to Sotaro Ikeda [:sotaro] from comment #32)
> But SkiaGL cache size is limited to small size by Bug 1045680.

I confirmed that the GrResourceCache in SkiaGL works correctly on Firefox OS v2.0.
On b2g v1.4, SkiaGL is not userd for "HERE Maps" app. It might be a regression of Bug 999841.
Whiteboard: [2.0-exploratory][c=memory p= u=2.0 s=] → [2.0-exploratory][c=memory p= u=2.0 s=][MemShrink]
Whiteboard: [2.0-exploratory][c=memory p= u=2.0 s=][MemShrink] → [2.0-exploratory][c=memory p= u=2.0 s=][MemShrink:P1]
QA Whiteboard: [QAnalyst-Triage+][2.0-signoff-need+] → [QAnalyst-Triage+][lead-review+][2.0-signoff-need+]
Scrach comment 33 & 34.

Even gecko calls glDeleteTextures(), but I don't see corresponding kgsl_ioctl_gpumem_free_id() when kgsl bloat.
Unlike b2g v2.0, CanvasRenderingContext2D::DrawImage() is never called in Here process on v2.1.
(In reply to Sotaro Ikeda [:sotaro] from comment #36)
> On b2g v1.4, SkiaGL is not userd for "HERE Maps" app. It might be a
> regression of Bug 999841.

By modifying the code of b2g v1.4, when SkiaGL is enabled on the "HERE Maps" app, the crash happened on the STR in Comment 24.
(In reply to Jordan de Geus(JordanD) from comment #1)
> Created attachment 8468102 [details]
> dmesg.txt
> 
> This issue DOES NOT occur on Flame 2.1 (319mb), Flame 2.0 (512mb), Flame 1.4
> (319mb), Flame Base 122 (319mb), Flame Base 121-2 (319mb), Buri 2.1, Buri
> 2.0, Buri 1.4, Open C 2.1, Open C 2.0, and Open C 1.4
> 
> Actual: The app will not be closed out by LMK 
> 
> Flame 2.1 (319mb)
> 
> Environmental Variables:
> Device: Flame Master
> Build ID: 20140805040204
> Gaia: 19bf9795263e2ccc15d824a52ebf23c2670fa9b9
> Gecko: 7f81be7db528
> Version: 34.0a1 (Master)
> Firmware Version: v123
> User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

The confirmation on b2g-v2.1 might be incorrect because of Bug 1055214. On  v2.1, touch items on "Preferences" does not work correctly.
(In reply to Sotaro Ikeda [:sotaro] from comment #30)
> (In reply to Sotaro Ikeda [:sotaro] from comment #24)
> > I saw the crash by the following STR on latest v2.0 flame(319mb)
> > 
> > [1] start Here Maps
> > [2] Select the layers button on the bottom right corner
> > [3] Scroll down
> > [4] Tap "Public transport view" and "Live traffic view" repeatedly.
> > [5] Observe app eventually closes resulting in LMK
> 
> The crash happen on v2.0 flame(319mb), v2.0 flame(512mb).
> The crash did not happen on v1.4 flame(319mb).
> 
> I can not test on master flame because of Bug 1055214.

I locally fixed Bug 1055214, by using it, I confirmed that the problem does not happen on master flame.
(In reply to Jayme Mercado [:JMercado] from comment #7)
> B2g-inbound Regression Window
> 
> Last working 
> Environmental Variables:
> Device: Flame Master
> BuildID: 20140430023004
> Gaia: ed00198def12a2df82fb8e4277fa5c3ed2471dd5
> Gecko: c46b933d79a2
> Version: 32.0a1 (Master) 
> Firmware Version: v123
> User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
> 
> 
> First Broken 
> Environmental Variables:
> Device: Flame Master
> BuildID: 20140430053004
> Gaia: adfe7c3fc8c8cf39bf14ab416ff531af1f3dd02f
> Gecko: b5b912c44f82
> Version: 32.0a1 (Master) 
> Firmware Version: v123
> User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

The above gecko's range does not include Bug 999841. But from comment 39, Bug 999841 seems to cause the regression.
And on master, some change fixed the problem.
(In reply to Sotaro Ikeda [:sotaro] from comment #43)
> And on master, some change fixed the problem.

From the nightly roms I checked from which rom the problem is fixed.

Last broken
Device: Flame Master
BuildID: 20140720160202
Gaia: dd5777c3ccfbbf04d0a67f0e21685a31895411e2
Gecko: a5f19ac4e94681d464dcac231dd718b4d748b0b9
Version: 33.0a1 (Master) 

First Fixed 
Device: Flame Master
BuildID: 20140721040210
Gaia: 0114839afc44aa98ae87db91004835f16a49dcab
Gecko: 7ab156d0b6b2401dde5688331fb2c29dfe268002
Version: 33.0a1 (Master)
Attached file git log
From comment 44, checked git log between the changes.
(In reply to Sotaro Ikeda [:sotaro] from comment #45)
> Created attachment 8476428 [details]
> git log
> 
> From comment 44, checked git log between the changes.

Did bisection. It becomes clear that Bug 1022612 fixed the problem on master, though it is still not clear why it fixed the problem.

And Bug 1022612 fix is too risky to uplift to v2.0. It is a very very big change and still some regressions exist on master.
From Comment 46, for b2g-v2.0, disable SkiaGL for HereMap case seems better.
Regarding comment 37, should we ask for vendor's help? Between raw_fDeleteTextures() and ioctl(), there are proprietary libraries libgsl.so and libGLESv2_adreno.so.
Depends on: 1042305
Set "qawanted". Does this problem still happen since Bug 1042305 fix? Bug 1042305 reduces memory usage under memory-pressure event.
Keywords: qawanted
(In reply to Sotaro Ikeda [:sotaro] from comment #50)
> Set "qawanted". Does this problem still happen since Bug 1042305 fix? Bug
> 1042305 reduces memory usage under memory-pressure event.

I confirmed the crash with bug 1042305 fix.
Keywords: qawanted
(In reply to Ting-Yu Chou [:ting] from comment #48)
> Regarding comment 37, should we ask for vendor's help? Between
> raw_fDeleteTextures() and ioctl(), there are proprietary libraries libgsl.so
> and libGLESv2_adreno.so.

Before asking help, at first we need to debug above(gekco) and below(kernel) as much as possible. I am going to investigate more about them from tomorrow. Almost all cases, gecko does something bad thing. We need to make clear if gecko works correctly.
I tested on master flame-kk and v2.0 flame-kk. The results were same to flame-jb.
In the STR in "Preferences", when the "Preferences" was scrolled down, the canvas was hidden by the menu, then CanvasLayer destroyed because CanvasLayer was not necessary for the rendering. Since then, no one calls glFinish(), but Canvas continue to render to SkiaGL. This might cause GL command queue full and deffer to release kgsl buffers.
By adding the glFlush() call, I confirmed that the problem was fixed. But calling of glFlush() should be minimum, need to think about how to add glFlush().
We need to minimize the glFlush() calls, otherwise an application's performance is degraded. When canvas has layers, glFlush() is called like the following.

ClientCanvasLayer::RenderLayer()
 ->CanvasClientSurfaceStream::Update()
  ->SurfaceStream_TripleBuffer::CopySurfaceToProducer()
  ->SurfaceStream_TripleBuffer::SwapProducer()
   ->SharedSurface_Gralloc::Fence()
    ->glFlush()

When Canvas is hidden, ClientCanvasLayer and CanvasClientSurfaceStream is destroyed.
(In reply to Sotaro Ikeda [:sotaro] from comment #55)
> By adding the glFlush() call, I confirmed that the problem was fixed. But
> calling of glFlush() should be minimum, need to think about how to add
> glFlush().

If affects to the performance significantly.
This bug is similar to Bug 859608. In the bug, on Tegra, for eash 100 WebGL's draw calls, WebGLContext calls glFlush(). On latest gecko, it is implemented by WebGLContext::Draw_cleanup(). But it's implementation is not ideal :-(

http://mxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContextDraw.cpp#347
CanvasRenderingContext2D needs to call glFlush() only when CanvasRenderingContext2D does not have related ClientCanvasLayer.
(In reply to Sotaro Ikeda [:sotaro] from comment #59)
> CanvasRenderingContext2D needs to call glFlush() only when
> CanvasRenderingContext2D does not have related ClientCanvasLayer.

Matt, can you give me an advice about Comment 59? What is a good way to do it?
Flags: needinfo?(matt.woodrow)
Blocks: 1049251
Jeff Gilbert should take a look at this.

My suggestion would be to look at nsIAppShell::RunInStableState. That can run an nsIRunnable once we've return to the event loop, and JS should have finished issuing calls to the <canvas> for the frame.
Flags: needinfo?(matt.woodrow) → needinfo?(jgilbert)
Perhaps we can take as an opportunity to improve how we deal with GL resource when we send an app to the background:
- We should call glFlush when we minimize a process. (Note, we usually send a memory pressure when we minimize a process, but we shouldn't assume that we do and still call glFinish first).
- If we get a memory pressure we should call glFinish to guarantee all resources we deleted can be freed. I assume we already free any cached resource.
** Do we force a GLContext lost here already? That's the nuclear option that would let us release -all- gl resources.
- flush/finish should not be done via the CanvasLayer because we can have some GLContext that are not attached to the DOM or presented. We should do this using some mechanism that can find any GLContext in any state.
- We should maybe check the behavior of releasing a GLContext. If the resources are not sync release we should consider calling a glFinish before releasing the context. This might hurt us on benchmark if some are churning GLContext.
We should be able to glFlush every frame, at least, if that helps. Otherwise, we could move the flush-per-100-draw-calls down into GLContext.
Flags: needinfo?(jgilbert)
Depends on: 1060790
(In reply to Benoit Girard (:BenWa) from comment #62)
> - We should call glFlush when we minimize a process. (Note, we usually send
> a memory pressure when we minimize a process, but we shouldn't assume that
> we do and still call glFinish first).

I think glFlush() is enough in this case. memory-pressure is system global event, it is not an application local event. From this point of view, glFlush() is quick enough.
(In reply to Benoit Girard (:BenWa) from comment #62)
> ** Do we force a GLContext lost here already? That's the nuclear option that
> would let us release -all- gl resources.

Current code does not do GLContext lost for SkiaGL. canvas could be rendered even when an application is in background. SkiaGL's GLContext is shared with all SkiaGLs. It seems to mean that all SkiaGL canvases are fallen back to Skia canvas(no hw acceleration) and release the GLContext. I do not know there is such necessity now.
(In reply to Benoit Girard (:BenWa) from comment #62)
> - flush/finish should not be done via the CanvasLayer because we can have
> some GLContext that are not attached to the DOM or presented. We should do
> this using some mechanism that can find any GLContext in any state.

In current gecko, glFlush() is triggered by SurfaceStream's buffer swap operation. And the buffer swap is triggered by CanvasLayer. On gonk, during SurfaceStream's buffer swap, EGL sync is created then glFlush() is called. I am not sure this order could be changed without problem/regression on gonk.

> mEGL->fCreateSync()
> mGL->fFlush();

If it is unavoidable, we keep trigger glFlush() during SurfaceStream's buffer swap that is triggered by CanvasLayer. In this case, we need to add calling glFlush() only when the CanvasLayer does not exist.
The patch works on v2.0 flame. When a SkiaGL canvas has CanvasLayer, glFlush() is called twice for each frame. But second glFlush() is called just after Layer Transaction, therefore impact to the performance seems minimum.
Attachment #8481872 - Attachment description: patch - Call glFlush() to SkiaGL's GLContext for each frame → patch for b2g-v2.0 - Call glFlush() to SkiaGL's GLContext for each frame
Reduce glFlush() calls.
Attachment #8481872 - Attachment is obsolete: true
Fix add/remove PostRefreshObserver problem.
Attachment #8481928 - Attachment is obsolete: true
Comment on attachment 8481969 [details] [diff] [review]
patch for master - Call glFlush() to SkiaGL's GLContext for each frame

Matt, can I have a feedback to the patch?
Attachment #8481969 - Flags: feedback?(matt.woodrow)
Comment on attachment 8481969 [details] [diff] [review]
patch for master - Call glFlush() to SkiaGL's GLContext for each frame

Review of attachment 8481969 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +843,5 @@
> +  {
> +    return;
> +  }
> +  context->mRefreshDriver = context->GetPresShell()->GetPresContext()->RefreshDriver();
> +  context->mRefreshDriver->AddPostRefreshObserver(context);

It's not obvious (without looking things up) why this code is within AddDemotableContext. You should add a comment explaining it, and probably MOZ_ASSERT(mStream).

@@ +869,5 @@
> +  }
> +  nsICanvasRenderingContextInternal::SetCanvasElement(aParentCanvas);
> +
> +  if (mStream) {
> +    mRefreshDriver = GetPresShell()->GetPresContext()->RefreshDriver();

We have a bunch of null checks in the other place we set mRefreshDriver, do we need them here too? Maybe share the code for finding the refresh driver and setting up the observer.

@@ +2610,5 @@
>  
> +void CanvasRenderingContext2D::DidRefresh()
> +{
> +  if (mStream && mStream->GLContext()) {
> +    if (mStream->GLContext()->GetDrawCallsSinceLastFlush() > 0) {

I'm not sure if jgilbert will be ok with adding this sort of thing into GLContext. Could we just have a bool mDrawn in CanvasRenderingContext2D that we set to true for all draw calls and check that?
Attachment #8481969 - Flags: feedback?(matt.woodrow) → feedback+
WebGL also has same problem. I am going to add the change also to WebGLContext.
(In reply to Matt Woodrow (:mattwoodrow) from comment #74)
> 
> @@ +2610,5 @@
> >  
> > +void CanvasRenderingContext2D::DidRefresh()
> > +{
> > +  if (mStream && mStream->GLContext()) {
> > +    if (mStream->GLContext()->GetDrawCallsSinceLastFlush() > 0) {
> 
> I'm not sure if jgilbert will be ok with adding this sort of thing into
> GLContext. Could we just have a bool mDrawn in CanvasRenderingContext2D that
> we set to true for all draw calls and check that?

When we use CLContext, we could correctly understand if calling glFlush() is necessary. If SurfaceStream already called glFlush(), the patch does not duplicate the glFlush() call.
Add WebGL change.
Attachment #8481969 - Attachment is obsolete: true
Attachment #8482863 - Attachment is patch: true
Attachment #8482863 - Attachment mime type: text/x-patch → text/plain
Attachment #8481971 - Attachment is obsolete: true
Attachment #8482863 - Attachment description: patch for master - Call glFlush() to SkiaGL's GLContext for each frame → patch for master - Call glFlush() for each frame if necessary
Attachment #8482864 - Attachment description: patch for b2g-v2.0 - Call glFlush() to SkiaGL's GLContext for each frame → patch for b2g-v2.0 - Call glFlush() for each frame if necessary
Add MakeCurrent() call.
Attachment #8482863 - Attachment is obsolete: true
Attachment #8482864 - Attachment is obsolete: true
Attachment #8482931 - Flags: review?(jgilbert)
Attachment #8482931 - Flags: review?(gwright)
Comment on attachment 8482931 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

Review of attachment 8482931 [details] [diff] [review]:
-----------------------------------------------------------------

It looks a lot like this should be done at the GLContext level.
In BeforeDrawCall, increment the counter. If the counter is >= some value, call fFlush.

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +761,5 @@
>    mCanvasElement->InvalidateCanvasContent(&r);
>  }
>  
>  void
> +CanvasRenderingContext2D::DidRefresh()

Don't do this here. Do it in CanvasRenderingContext2DUserData::PreTransactionCallback.

::: dom/canvas/WebGLContext.cpp
@@ +1742,5 @@
>  
> +void
> +WebGLContext::DidRefresh()
> +{
> +    if (gl && (gl->GetDrawCallsSinceLastFlush() > 0)) {

This is not necessary, as we already handle this in WebGLContext::Draw_cleanup().

::: gfx/gl/GLContext.h
@@ +1076,5 @@
>      void raw_fDrawArrays(GLenum mode, GLint first, GLsizei count) {
>          BEFORE_GL_CALL;
>          mSymbols.fDrawArrays(mode, first, count);
>          AFTER_GL_CALL;
> +        ++mDrawCallsSinceLastFlush;

No, put this in the AfterDrawCall function.

@@ +3161,5 @@
> +    int mDrawCallsSinceLastFlush;
> +
> +public:
> +    int GetDrawCallsSinceLastFlush() {
> +      return mDrawCallsSinceLastFlush;

4-space, mark function as constant.
Attachment #8482931 - Flags: review?(jgilbert) → review-
Attachment #8482931 - Flags: review?(gwright)
Jgilbert, I have some questions to the comment, can you answer them?

> In BeforeDrawCall, increment the counter. If the counter is >= some value, call fFlush.

">= some value" means that the draw calls could be accumulated to some level. This could increase kgsl memory usage on b2g. Why can's we just call glFlush() at most for each frame if draw calls are called?

> >  
> >  void
> > +CanvasRenderingContext2D::DidRefresh()
> 
> Don't do this here. Do it in
> CanvasRenderingContext2DUserData::PreTransactionCallback.

Why do we need to do it in PreTransactionCallback()? It is called only when canvas layer exist. This bug handles the case that canvas layer does not exit. And the PreTransactionCallback() calls SkCanvas::flush(), but it does not call glFlush() in current gecko.

> ::: dom/canvas/WebGLContext.cpp
> @@ +1742,5 @@
> >  
> > +void
> > +WebGLContext::DidRefresh()
> > +{
> > +    if (gl && (gl->GetDrawCallsSinceLastFlush() > 0)) {
> 
> This is not necessary, as we already handle this in
> WebGLContext::Draw_cleanup().

To reduce kgsl memory, I want to call glFlush() at most for each frame if draw calls are called since last flash. Can you explain about why it is a problem?
Flags: needinfo?(jgilbert)
(In reply to Sotaro Ikeda [:sotaro] from comment #84)
> ">= some value" means that the draw calls could be accumulated to some
> level. This could increase kgsl memory usage on b2g. Why can's we just call
> glFlush() at most for each frame if draw calls are called?
> 

Correction:
  at most once for each frame if draw calls are called
(In reply to Sotaro Ikeda [:sotaro] from comment #84)
> Jgilbert, I have some questions to the comment, can you answer them?
> 
> > In BeforeDrawCall, increment the counter. If the counter is >= some value, call fFlush.
> 
> ">= some value" means that the draw calls could be accumulated to some
> level. This could increase kgsl memory usage on b2g. Why can's we just call
> glFlush() at most for each frame if draw calls are called?
I need a detailed explanation of what the memory usage problem is. This is what it sounds like to me:
If we don't flush ever, commands can queue on the GL client in an unbounded manner. Flush clears this by flushing the commands to the GL server.

The solution would seem to be: Flush every N draw calls, to prevent >N draw calls queuing.

If we only flush every frame, an app can still queue a huge number of draw calls, causing this issue. Thus we should really just flush every N draw calls to prevent command queuing.

The only reason to flush every frame would be to get a more consistent framerate, by having each frame flush, instead of only flushing every N/DrawCallsPerFrame frames.
> 
> > >  
> > >  void
> > > +CanvasRenderingContext2D::DidRefresh()
> > 
> > Don't do this here. Do it in
> > CanvasRenderingContext2DUserData::PreTransactionCallback.
> 
> Why do we need to do it in PreTransactionCallback()? It is called only when
> canvas layer exist. This bug handles the case that canvas layer does not
> exit. And the PreTransactionCallback() calls SkCanvas::flush(), but it does
> not call glFlush() in current gecko.
When does a canvas layer not exist? Flush-every-N-draw-calls also handles this case, without all the extra complexity this patch introduces.

> 
> > ::: dom/canvas/WebGLContext.cpp
> > @@ +1742,5 @@
> > >  
> > > +void
> > > +WebGLContext::DidRefresh()
> > > +{
> > > +    if (gl && (gl->GetDrawCallsSinceLastFlush() > 0)) {
> > 
> > This is not necessary, as we already handle this in
> > WebGLContext::Draw_cleanup().
> 
> To reduce kgsl memory, I want to call glFlush() at most for each frame if
> draw calls are called since last flash. Can you explain about why it is a
> problem?
Because we should not have two systems handling additional flushing if they're both targeting similar bugs. We should have one system, and it should handle both cases.
The solution we already have for WebGL sounds like it should handle the bug discussed here. We should not add another workaround if the workaround we already have does the job already.
Flags: needinfo?(jgilbert) → needinfo?(sotaro.ikeda.g)
(In reply to Jeff Gilbert [:jgilbert] from comment #86)
> (In reply to Sotaro Ikeda [:sotaro] from comment #84)
> > Jgilbert, I have some questions to the comment, can you answer them?
> > 
> > > In BeforeDrawCall, increment the counter. If the counter is >= some value, call fFlush.
> > 
> > ">= some value" means that the draw calls could be accumulated to some
> > level. This could increase kgsl memory usage on b2g. Why can's we just call
> > glFlush() at most for each frame if draw calls are called?
> I need a detailed explanation of what the memory usage problem is. This is
> what it sounds like to me:
> If we don't flush ever, commands can queue on the GL client in an unbounded
> manner. Flush clears this by flushing the commands to the GL server.

In the STR in comment 0, when user is showing "Preferences" menu, canvas is not shown in the Display.
Then, CanvasLayer is destroyed. But CanvasRenderingContext2D continue to render to SkiaGL.

On gonk, glFlush() is called like the following sequence. It is triggered only from CanvasLayer. And SkiaGL does not call glFlush().

ClientCanvasLayer::RenderLayer()
->CanvasClientSurfaceStream::Update()
->CanvasClientSurfaceStream::Update()
->SurfaceStream_TripleBuffer::SwapProducer()
->SharedSurface_Gralloc::Fence()
->glFlush()

Therefore, if the CanvasLayer does not exist, no one calls glFlush(), but continue drawing by the application.

On HereMaps consume around 17MB kgsl memory normally. Without glFlush(), it grows up more than 90MB. And then killed by low memory killer. Before the kill, the appliation's response becomes also very slow.
Flags: needinfo?(sotaro.ikeda.g)
> > Why do we need to do it in PreTransactionCallback()? It is called only when
> > canvas layer exist. This bug handles the case that canvas layer does not
> > exit. And the PreTransactionCallback() calls SkCanvas::flush(), but it does
> > not call glFlush() in current gecko.
> When does a canvas layer not exist? Flush-every-N-draw-calls also handles
> this case, without all the extra complexity this patch introduces.

When the canvas is not shown like covered by other elements, CanvasLayer is destroyed for optimization.
The patch calls glFlush() for each 100 draw calls, but it does not fix the crash problem :-(
Attachment #8484359 - Attachment description: temporay patch - Call glFlush() for each 100 draw calls → temporay patch for b2g-2.0 - Call glFlush() for each 100 draw calls
(In reply to Sotaro Ikeda [:sotaro] from comment #89)
> Created attachment 8484359 [details] [diff] [review]
> temporay patch for b2g-2.0 - Call glFlush() for each 100 draw calls
> 
> The patch calls glFlush() for each 100 draw calls, but it does not fix the
> crash problem :-(

Call glFlush() for every 100 draw calls works.
(In reply to Sotaro Ikeda [:sotaro] from comment #90)
> (In reply to Sotaro Ikeda [:sotaro] from comment #89)
> > Created attachment 8484359 [details] [diff] [review]
> > temporay patch for b2g-2.0 - Call glFlush() for each 100 draw calls
> > 
> > The patch calls glFlush() for each 100 draw calls, but it does not fix the
> > crash problem :-(
> 
> Call glFlush() for every 100 draw calls works.

Sorry, glFlush() for every 50 draw calls works.
Attachment #8484359 - Attachment is obsolete: true
Attachment #8484407 - Attachment description: temporay patch for b2g-2.0 - Call glFlush() for each 100 draw calls → temporay patch for b2g-2.0 - Call glFlush() for each 50 draw calls
attachment 8484407 [details] [diff] [review] fixes the crash problem. But it degrade the WebGL performance significantly. On application's fps was degraded from 47fps to 41fps. I do not think this can be used on Firefox OS  as product.
(In reply to Jeff Gilbert [:jgilbert] from comment #86)
> 
> The only reason to flush every frame would be to get a more consistent
> framerate, by having each frame flush, instead of only flushing every
> N/DrawCallsPerFrame frames.

From Comment 93, when we used "Flushing every N DrawCalls" to fix the problem, it degrade WebGL performance significantly(maybe also SkiaGL). I think that it can not be used for Firefox OS product.

Only my another option is attachment 8482931 [details] [diff] [review].
jgilbert, do you have another idea about Comment 94?
Flags: needinfo?(jgilbert)
(In reply to Sotaro Ikeda [:sotaro] from comment #93)
> attachment 8484407 [details] [diff] [review] fixes the crash problem. But it
> degrade the WebGL performance significantly. On application's fps was
> degraded from 47fps to 41fps. I do not think this can be used on Firefox OS 
> as product.

This patch causes WebGL to flush twice every N draw calls. Also, it adds a bunch of code to a function which is inlined into every GLContext draw function. If you tested this patch with the printf, then the results from the test are not useful. `printf_stderr` has a large amount of overhead.
Flags: needinfo?(jgilbert)
Comment on attachment 8484407 [details] [diff] [review]
temporay patch for b2g-2.0 - Call glFlush() for each 50 draw calls

Review of attachment 8484407 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/WebGLContextDraw.cpp
@@ -349,5 @@
>          MOZ_ASSERT(!mBackbufferNeedsClear);
>      }
>  
>      if (gl->WorkAroundDriverBugs()) {
>          if (gl->Renderer() == gl::GLRenderer::Tegra) {

If you wanted to test the behavior of this with WebGL, just remove this tegra-only conditional, or add the vender/renderer that you're seeing this bug on. This bug isn't on all drivers, is it?
(In reply to Jeff Gilbert [:jgilbert] from comment #96)
> 
> This patch causes WebGL to flush twice every N draw calls.

jgilbert, can you explain how the patch cause flush twice every N draw calls on gonk? Though, the patch do it on tegra.

> Also, it adds a
> bunch of code to a function which is inlined into every GLContext draw
> function. If you tested this patch with the printf, then the results from
> the test are not useful. `printf_stderr` has a large amount of overhead.

I did not test it by using `printf_stderr`. To testing, I used an application from Bug 1002776. It shows fps on the display.
Flags: needinfo?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #97)
> 
> If you wanted to test the behavior of this with WebGL, just remove this
> tegra-only conditional, or add the vender/renderer that you're seeing this
> bug on. This bug isn't on all drivers, is it?

I created attachment 8484407 [details] [diff] [review] to just for testing on "firefox os flame device". "tegra-only conditional, or add the vender/renderer" does not affect to the testing. The problem could happen on all low memory firefox os devices. And the performance drop is not acceptable as Firefox os.
(In reply to Sotaro Ikeda [:sotaro] from comment #98)
> (In reply to Jeff Gilbert [:jgilbert] from comment #96)
> > 
> > This patch causes WebGL to flush twice every N draw calls.
> 
> jgilbert, can you explain how the patch cause flush twice every N draw calls
> on gonk? Though, the patch do it on tegra.

And can you explain why attachment 8482931 [details] [diff] [review] is not acceptable for you? It fixes the problem without performance regression. jrmuizel and BenWa agreed to it. The problem could happen on low memory firefox os devices. And we want to fix the problem without performance regression. There are already some applications that using SkiaGL and WebGL. They are implemented to work with once glFlush() calls for each frame.
Intent of attachment 8482931 [details] [diff] [review] is calling glFlush() at most once for each frame even when CanvasLayer does not exist.
(In reply to Sotaro Ikeda [:sotaro] from comment #100)
> And can you explain why attachment 8482931 [details] [diff] [review] is not
> acceptable for you? It fixes the problem without performance regression.
> jrmuizel and BenWa agreed to it. The problem could happen on low memory
> firefox os devices. And we want to fix the problem without performance
> regression. There are already some applications that using SkiaGL and WebGL.
> They are implemented to work with once glFlush() calls for each frame.

The problem happens only when CanvasLayer does not exist.
(In reply to Sotaro Ikeda [:sotaro] from comment #101)
> Intent of attachment 8482931 [details] [diff] [review] is calling glFlush()
> at most once for each frame even when CanvasLayer does not exist.

The applications works correctly when CanvasLayers exist. Therefore, if glFlush() is called same way also on CanvasLayers does not exist case, they should work correctly even no CanvasLayers case.

"flush every N draw calls" is totally different logic from the above. And there is no guarantee that the specific N fixes this bug of all applications on all firefox os devices. And it causes performance regression.
Remove redundant printf_stderr(). It did not affect to the FPS of the application of Bug 1002776.
Attachment #8484407 - Attachment is obsolete: true
Let's talk about this next week when I'm in Toronto.

The big thing for me, here, is you're proposing adding a large amount of complexity to handle something that is already handled in WebGL.

I just got done doing a last-second fix to lifetime issues regarding attachments to WebGL contexts, so I really don't want to add new ones if we can avoid it.

In order to solve this, I'm going to require a little paragraph explaining what the actual issue is. I feel like I do not yet completely understand this, since you say that the solution we have for Tegra devices is not sufficient, even though it seems fully-sufficient to me. (It solved an OUT_OF_MEMORY case we were getting from draw calls piling up without being flushed)


(In reply to Sotaro Ikeda [:sotaro] from comment #100)
> (In reply to Sotaro Ikeda [:sotaro] from comment #98)
> > (In reply to Jeff Gilbert [:jgilbert] from comment #96)
> > > 
> > > This patch causes WebGL to flush twice every N draw calls.
> > 
> > jgilbert, can you explain how the patch cause flush twice every N draw calls
> > on gonk? Though, the patch do it on tegra.
> 
> And can you explain why attachment 8482931 [details] [diff] [review] is not
> acceptable for you? It fixes the problem without performance regression.
> jrmuizel and BenWa agreed to it. The problem could happen on low memory
> firefox os devices. And we want to fix the problem without performance
> regression. There are already some applications that using SkiaGL and WebGL.
> They are implemented to work with once glFlush() calls for each frame.

Your patch adds a redundant path which flushes the WebGL context every refresh interval. We already have a workaround for this in WebGL, where we flush every N draw calls. This was added in response to getting GL_OUT_OF_MEMORY errors after too many draw calls. The problem here sounds similar to the reason we added the flush-every-N-draws.

The OUT_OF_MEMORY errors sound very similar to the increased memory usage you're seeing on some devices. It sounds like we need a general version of WebGL's flush-every-N-draws, perhaps with a better number for N based on the device/driver.

Note that if gl issues more than some N draw calls in a frame, we'll still get memory issues with the approach you propose.

I also do not like the counter being handled by GLContext, even though GLContext is not the one to act on it.

Basically, the patch seems like a hack, since it seems to me that we are perfectly capable of doing this more properly.
I need to be convinced that the extra complexity in this patch is both strictly necessary and is the best way to handle this issue. (Or than handling the issue properly is unnecessarily difficult)
Flags: needinfo?(jgilbert)
My thinking is simple, if an application works on Firefox OS devices when canvas have CanvasLayer, the application should also works without problem even when canvas does not have CanvasLayer.

If glFlush() is called similar timing both with/without Canvas Layer, the application works correctly also when canvas does not have CanvasLayer.
We discussed with jgilbert that and agreed attachment 8482931 [details] [diff] [review] to be a temporary solution. But this is a kind of a hack and not an ideal solution. It is going to be handled in a new bug.

By the way, attachment 8482931 [details] [diff] [review] need to be updated.
Attachment #8485283 - Attachment is obsolete: true
Blocks: 1064478
Bug 1064478 is created for long term solution.
Apply comments by jgilbert.
Attachment #8482931 - Attachment is obsolete: true
Comment on attachment 8485949 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

jgilbert, BenWa, can you review the patch? I do not know which GL functions are heavy height. They might not be correct.
Attachment #8485949 - Flags: review?(jgilbert)
Attachment #8485949 - Flags: review?(bgirard)
Attachment #8482933 - Attachment is obsolete: true
Comment on attachment 8485949 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

Review of attachment 8485949 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly fine, but there doesn't seem to be benefit from disabling and enabling the observer only when we 'need' it, and there's some tricky bits regarding WebGL with context loss.
It's much simpler to just have the lifetime for the observer match the lifetime of the rendering context. (Add in the ctor, remove in the dtor)

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +842,5 @@
>      return;
>  
>    DemotableContexts().push_back(context);
> +  // All SkiaGL enabled contexts are managed as DemotableContext.
> +  context->AddPostRefreshObserverIfNecessary();

Just leave the observer attached for the entire lifetime of the canvas context.

::: dom/canvas/CanvasRenderingContext2D.h
@@ +478,5 @@
> +    }
> +    return nullptr;
> +  }
> +  virtual bool NeedsPostRefresh() MOZ_OVERRIDE
> +  {

Put the { at the end of the single-line definition declaration.

::: dom/canvas/WebGLContext.cpp
@@ +399,5 @@
>          printf_stderr("--- WebGL context destroyed: %p\n", gl.get());
>      }
>  #endif
>  
> +    RemovePostRefreshObserver();

Just leave the observer attached for the entire lifetime of the canvas context.

::: dom/canvas/WebGLContext.h
@@ +174,5 @@
>      virtual int32_t GetWidth() const MOZ_OVERRIDE;
>      virtual int32_t GetHeight() const MOZ_OVERRIDE;
>  #endif
> +    virtual bool NeedsPostRefresh() MOZ_OVERRIDE
> +    {

virtual bool Foo() {
  // code
}

@@ +195,5 @@
>                                   JS::Handle<JS::Value> aOptions) MOZ_OVERRIDE;
>  
>      NS_IMETHOD SetIsIPC(bool b) MOZ_OVERRIDE { return NS_ERROR_NOT_IMPLEMENTED; }
> +
> +    virtual void DidRefresh() MOZ_OVERRIDE;

Leave a comment that this is from nsAPostRefreshObserver

::: dom/canvas/nsICanvasRenderingContextInternal.h
@@ +59,5 @@
> +    }
> +    return nullptr;
> +  }
> +
> +  virtual bool NeedsPostRefresh() = 0;

Assume this function always returns true, and remove it.

::: gfx/gl/GLContext.h
@@ +1644,5 @@
>              height = -1;
>              border = -1;
>          }
>          raw_fTexImage2D(target, level, internalformat, width, height, border, format, type, pixels);
> +        mHeavyGLCallsSinceLastFlush = true;

Put this in the raw_fTexImage2D call.

@@ +2271,5 @@
>          BEFORE_GL_CALL;
>          ASSERT_SYMBOL_PRESENT(fEGLImageTargetTexture2D);
>          mSymbols.fEGLImageTargetTexture2D(target, image);
>          AFTER_GL_CALL;
> +        mHeavyGLCallsSinceLastFlush = true;

I don't think this is that heavy, but maybe, so we can keep this.
Attachment #8485949 - Flags: review?(jgilbert) → review-
Apply the comments.
Attachment #8485949 - Attachment is obsolete: true
Attachment #8485949 - Flags: review?(bgirard)
Attachment #8486004 - Flags: review?(jgilbert)
Yesterday, we agreed that this fix is only for the branch. But I think that it might be better also to be checked-in on master to check if it cause the regression. And it will be replaced by Bug 1064478 when it will becomes ready.
jgilbert, BenWa how do you think about Comment 114?
Flags: needinfo?(jgilbert)
Flags: needinfo?(bgirard)
(In reply to Sotaro Ikeda [:sotaro] from comment #114)
> Yesterday, we agreed that this fix is only for the branch. But I think that
> it might be better also to be checked-in on master to check if it cause the
> regression. And it will be replaced by Bug 1064478 when it will becomes
> ready.

I agree. I don't think we should wait until we have the perfect fix before landing this quite-good fix.
Flags: needinfo?(jgilbert)
Yes, taking the 'branch approach' on master as well is better (more testing). Then later we can move the flush to where we think it should go permanently.
Flags: needinfo?(bgirard)
Comment on attachment 8486004 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

Review of attachment 8486004 [details] [diff] [review]:
-----------------------------------------------------------------

I missed two functions the last time, sorry.

::: dom/canvas/CanvasRenderingContext2D.cpp
@@ +489,5 @@
>  
>  NS_IMPL_CYCLE_COLLECTION_CLASS(CanvasRenderingContext2D)
>  
>  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(CanvasRenderingContext2D)
> +  tmp->RemovePostRefreshObserver();

Huh? Why is this not just in the dtor?

::: gfx/gl/GLContext.h
@@ +1936,5 @@
>      void fFramebufferTexture2D(GLenum target, GLenum attachmentPoint, GLenum textureTarget, GLuint texture, GLint level) {
>          BEFORE_GL_CALL;
>          mSymbols.fFramebufferTexture2D(target, attachmentPoint, textureTarget, texture, level);
>          AFTER_GL_CALL;
> +        mHeavyGLCallsSinceLastFlush = true;

This is not heavy.

@@ +3160,5 @@
>  #endif
> +
> +
> +protected:
> +    bool mHeavyGLCallsSinceLastFlush;

BufferData and BufferSubData are also heavy.
Comment on attachment 8486004 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

I am going to update the patch.
Attachment #8486004 - Flags: review?(jgilbert)
(In reply to Jeff Gilbert [:jgilbert] from comment #118)
> Comment on attachment 8486004 [details] [diff] [review]
> patch for master - Call glFlush() for each frame if necessary
> 
> Review of attachment 8486004 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I missed two functions the last time, sorry.
> 
> ::: dom/canvas/CanvasRenderingContext2D.cpp
> @@ +489,5 @@
> >  
> >  NS_IMPL_CYCLE_COLLECTION_CLASS(CanvasRenderingContext2D)
> >  
> >  NS_IMPL_CYCLE_COLLECTION_UNLINK_BEGIN(CanvasRenderingContext2D)
> > +  tmp->RemovePostRefreshObserver();
> 
> Huh? Why is this not just in the dtor?

I'll move it to destructor in a next patch. In the past it was necessary. The observer was unregistered via nsIPresShell.

> 
> ::: gfx/gl/GLContext.h
> @@ +1936,5 @@
> >      void fFramebufferTexture2D(GLenum target, GLenum attachmentPoint, GLenum textureTarget, GLuint texture, GLint level) {
> >          BEFORE_GL_CALL;
> >          mSymbols.fFramebufferTexture2D(target, attachmentPoint, textureTarget, texture, level);
> >          AFTER_GL_CALL;
> > +        mHeavyGLCallsSinceLastFlush = true;
> 
> This is not heavy.

Will update in a next patch.
> 
> @@ +3160,5 @@
> >  #endif
> > +
> > +
> > +protected:
> > +    bool mHeavyGLCallsSinceLastFlush;
> 
> BufferData and BufferSubData are also heavy.

Will update in a next patch.
Applied the comments.
Attachment #8486004 - Attachment is obsolete: true
Attachment #8486505 - Flags: review?(jgilbert)
Comment on attachment 8486505 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

Review of attachment 8486505 [details] [diff] [review]:
-----------------------------------------------------------------

I forgot that fReadPixels should count as a flush. Otherwise good. Just make raw_fReadPixels count as a flush as well.

::: gfx/gl/GLContext.h
@@ +1132,5 @@
>      void fFinish() {
>          BEFORE_GL_CALL;
>          mSymbols.fFinish();
>          AFTER_GL_CALL;
> +        mHeavyGLCallsSinceLastFlush = false;

fReadPixels is also a Flush.
Attachment #8486505 - Flags: review?(jgilbert) → review+
Apply the comment. Carry "r=jgilbert".
Attachment #8486505 - Attachment is obsolete: true
Attachment #8486646 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/0bf2ecd85d9a
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Please request b2g32 approval on this when you get a chance.
Flags: needinfo?(sotaro.ikeda.g)
Comment on attachment 8486646 [details] [diff] [review]
patch for master - Call glFlush() for each frame if necessary

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 999841
User impact if declined: Some applications cause a crash because of oom.
Testing completed: locally tested
Risk to taking this patch (and alternatives if risky): low risk.
String or UUID changes made by this patch:
Attachment #8486646 - Flags: approval-mozilla-b2g32?
Flags: needinfo?(sotaro.ikeda.g)
Attachment #8486646 - Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
QA please help with verification once this lands on 2.0.
Keywords: verifyme
Depends on: 1067561
verify with Flame KK v180 + v2.0 gaia/gecko, it's fine
Gaia      7edd3b0b9f65c3dde235c732d270e43e055a1254
Gecko     https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/f3639e825b3b
BuildID   20140915135336
Version   32.0
Status: RESOLVED → VERIFIED
Keywords: verifyme
This issue has been verified successfully on Woodduck 2.0;Flame2.0&2.2
Reproducing rate: 0/5
See attachment: Verify_Woodduck_Maps.MP4

Woodduck build version:
Gaia-Rev        87b23fa81c3b59f2ba24b84f7d686f4160d4e7cb
Gecko-Rev       d049d4ef127844121c9cf14d2e8ca91fd9045fcb
Build-ID        20141124050313
Version         32.0

Flame 2.0 build:
Gaia-Rev        124c2f85f1b956e9e8429dab5121de702a3bc197
Gecko-Rev       https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/6b59dd2837c1
Build-ID        20141123000206
Version         32.0

Flame2.2 build:
Gaia-Rev        c5bad6d78c5fe168e3bb894fc5cb70902c9b19b1
Gecko-Rev       https://hg.mozilla.org/mozilla-central/rev/c9cfa9b91dea
Build-ID        20141123040209
Version         36.0a1
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: