RESOLVED FIXED in Firefox 60

Status

()

defect
P3
normal
RESOLVED FIXED
2 years ago
6 months ago

People

(Reporter: intermittent-bug-filer, Assigned: aosmond)

Tracking

(Blocks 1 bug, {intermittent-failure})

unspecified
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox58 wontfix, firefox59 wontfix, firefox60 fixed)

Details

(Whiteboard: [gfx-noted][stockwell unknown])

Attachments

(3 attachments, 5 obsolete attachments)

this has started failing a bit more this past week, specifically on linux32-stylo debug:
https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1383682

:xidorn, could you look at this and help set a priority?
Flags: needinfo?(xidorn+moz)
Isn't that just because we didn't start testing linux32-stylo until this past week?

I don't believe this is related to stylo given it fails on normal builds as well.

As I mentioned in bug 1369957 comment 13, it may seem to happen more often on stylo builds simply because tests run more frequently on stylo builds than normal builds. We may want to set up SETA data for linux stylo builds for saving machine resources and reduce noises on intermittents.
Flags: needinfo?(xidorn+moz)
See Also: → 1383676
See Also: → 1382969
See Also: → 1383230
The see-also bugs fail the same way as this one: slight differences (I can't see them) around the "image" word. Collectively, these bugs account for about 100 failures / week.

:jmaher -- What's the best way to get some attention for these failures?
Flags: needinfo?(jmaher)
Whiteboard: [stockwell needswork]
starting with the right bugzilla component:
$ ./mach file-info bugzilla-component testing/web-platform/tests//css/css-backgrounds-3/background-size-031.html
Core :: CSS Parsing and Computation
  testing/web-platform/tests/css/css-backgrounds-3/background-size-031.html
Component: web-platform-tests → CSS Parsing and Computation
Flags: needinfo?(jmaher)
Product: Testing → Core
Version: Version 3 → unspecified
:jet, we have a series of failures here all seem to be related to the same root tests of background-size-xx.html- in total the failures are extremely high- can you help find someone to look at these failures, I assume there is a single root cause.
Flags: needinfo?(bugs)
Those are just images, and image showing something unstable (rather than, e.g. wrong position) sounds more like issue in ImageLib or Graphics, though it is also possible that Layout sends some subpixel position to them. Anyway I don't think it's related to style system.

Maybe the easiest way is to add some fuzzy, but we cannot do that for wpt reftest I suppose?
Component: CSS Parsing and Computation → ImageLib
:jgraham, in wpt reftests is there a way to add fuzzy conditions?
Flags: needinfo?(james)
No. It has been discussed, but there's no cross-vendor agreement that it's a good idea. But we will likely need something at some point if webrender makes exact references unworkable.
Flags: needinfo?(james)
Given the fact there is no immediate support for fuzzy in wpt-reftests, do you have suggestions of how to resolve this?
Flags: needinfo?(xidorn+moz)
I have no idea. Maybe ask people working on Graphics or ImageLib and see if they have any idea what may be happening here.
Flags: needinfo?(xidorn+moz)
See Also: → 1383061
:milan (when you get back) -- Do you have an idea of who could investigate these frequent intermittent test failures?
Flags: needinfo?(milan)
I'll leave needinfo on me, but chances are we won't get to the intermittents until 57 is out the door.
There have been 31 failures in the last week.

Number of failures per platform: 
 - 11 linux
 - 11 linux32-stylo-disabled
 - 5 linux64-ccov
 - 4 Linux x64
 
Number of failures per buid type: 
 - 26 debug
 - 5 opt

I see a peak in number of failures on November 24th and from that point on the number of failures is decreasing.

Here is an example of a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=147635673&lineNumber=13605
and a snippet from it:

[task 2017-11-25T22:55:00.041Z] 22:55:00     INFO - PID 5511 | 1511650500033	Marionette	INFO	Testing http://web-platform.test:8000/css/css-backgrounds/background-size-031.html == http://web-platform.test:8000/css/css-backgrounds/reference/background-size-031-ref.xht
13602
[task 2017-11-25T22:55:00.117Z] 22:55:00     INFO - PID 5511 | ++DOMWINDOW == 83 (0xdf4c4000) [pid = 5618] [serial = 83] [outer = 0xe7997040]
13603
[task 2017-11-25T22:55:00.259Z] 22:55:00     INFO - PID 5511 | ++DOMWINDOW == 84 (0xdf4c4c00) [pid = 5618] [serial = 84] [outer = 0xe7997040]
13604
[task 2017-11-25T22:55:00.356Z] 22:55:00     INFO - PID 5511 | 1511650500351	Marionette	INFO	Found 2388 pixels different, maximum difference per channel 22
13605
[task 2017-11-25T22:55:00.400Z] 22:55:00     INFO - TEST-UNEXPECTED-FAIL | /css/css-backgrounds/background-size-031.html | Testing http://web-platform.test:8000/css/css-backgrounds/background-size-031.html == http://web-platform.test:8000/css/css-backgrounds/reference/background-size-031-ref.xht
13606
[task 2017-11-25T22:55:00.404Z] 22:55:00     INFO - REFTEST   IMAGE 1 (TEST): 
13607
[task 2017-11-25T22:55:00.409Z] 22:55:00     INFO - REFTEST   IMAGE 2 (REFERENCE): 
13608
[task 2017-11-25T22:55:00.409Z] 22:55:00     INFO - TEST-INFO took 369ms
13609
[task 2017-11-25T22:55:00.553Z] 22:55:00     INFO - PID 5511 | --DOCSHELL 0xc72c1c00 == 3 [pid = 5511] [id = {cb9fef26-5b86-4dfc-86dc-514e6e4f09bd}]
13610
[task 2017-11-25T22:55:00.553Z] 22:55:00     INFO - PID 5511 | 
13611
[task 2017-11-25T22:55:00.554Z] 22:55:00     INFO - PID 5511 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
13612
[task 2017-11-25T22:55:00.554Z] 22:55:00     INFO - PID 5511 | 
13613
[task 2017-11-25T22:55:00.556Z] 22:55:00     INFO - PID 5511 | [Child 5618, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13614
[task 2017-11-25T22:55:00.657Z] 22:55:00     INFO - PID 5511 | [Parent 5511, Main Thread] WARNING: NS_ENSURE_TRUE(weak) failed: file /builds/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 339
13615
[task 2017-11-25T22:55:00.941Z] 22:55:00     INFO - PID 5511 | [Parent 5511, Main Thread] WARNING: NS_ENSURE_TRUE(weak) failed: file /builds/worker/workspace/build/src/dom/base/nsFrameMessageManager.cpp, line 339
13616
[task 2017-11-25T22:55:00.979Z] 22:55:00     INFO - PID 5511 | [Child 5561, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13617
[task 2017-11-25T22:55:01.156Z] 22:55:01     INFO - PID 5511 | --DOMWINDOW == 3 (0xe7933800) [pid = 5561] [serial = 3] [outer = (nil)] [url = about:blank]
13618
[task 2017-11-25T22:55:01.158Z] 22:55:01     INFO - PID 5511 | --DOMWINDOW == 2 (0xe792b400) [pid = 5561] [serial = 2] [outer = (nil)] [url = about:blank]
13619
[task 2017-11-25T22:55:01.158Z] 22:55:01     INFO - PID 5511 | --DOCSHELL 0xe676a000 == 0 [pid = 5561] [id = {d6f15acc-d59d-4955-9e3f-47866903ea62}]
Whiteboard: [stockwell unknown] → [stockwell needswork]
this bug also appears to be blocking bug 1383233 (13 failures last week), bug 1383676 (17 failures last week), bug 1382969 (23 failures last week), bug 1383230 (9 failures last week), bug 1383061 (46 failures last week).

In total that is 141 failures that we believe are related to this bug.

:milan, can we get some traction on here now that 57 has been released?
See Also: → 1383233
Flags: needinfo?(milan) → needinfo?(aosmond)
I have been able to reproduce this locally with some effort. Will continue to investigate.
Assignee: nobody → aosmond
Flags: needinfo?(aosmond)
Priority: -- → P3
Whiteboard: [stockwell needswork:owner] → [gfx-noted][stockwell needswork:owner]
Flags: needinfo?(aosmond)
From my reproductions, it appears the imgRequest for the image under test is validated, and purged from the cache. However before it is purged, the layout code checks imgRequestProxy::GetImageStatus which says the image is fully decoded. This messes with the invalidation and causes us to exit nsDOMWindowUtils::UpdateLayerTree without drawing with the downscale-on-decoded size, and instead the native size.

imgRequestProxy::GetImageStatus should not return any state if it has a validator, similar to how imgRequestProxy::Clone defers notifications until the validator has completed.
Status: NEW → ASSIGNED
Flags: needinfo?(aosmond)
There is a second issue with imgRequestProxy::StartDecodingWithResult. Callers also use this as a way to indirectly get the state of the image, as it will return true if a Draw call will succeed with either DrawResult::SUCCESS or DrawResult::WRONG_SIZE. This implies LOAD_COMPLETE, SIZE_AVAILABLE, FRAME_COMPLETE, and DECODE_COMPLETE are set. As with imgRequestProxy::GetImageStatus, if we are validating, the result can be misleading if we throw out the cache. As such it should continue to return false, even if it has everything, until validation has completed.
Flags: needinfo?(bugs)
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=08a094796acfbc2ccfb461712f4cddab359fdbb0

A consequence of this change is that it flashes red as it switches between the reference and the test displays in this test case. This is because when it switches to validating the cache, layout now believes the image to be incomplete and doesn't render any image at all. In theory this could be solved by keeping a previous imgIContainer reference around, so that if we switch from frame complete to frame incomplete, then we render what we used to have. In practice I'm not sure how important this is -- this would only affect new requests created via imgLoader::LoadImage which ended up causing a validate, or requests cloned from *that* request via imgRequestProxy::Clone. Existing imgRequestProxy objects will continue using the old RasterImage, even if the cache gets purged. As such, I believe this is much more likely to be observable when doing specific tests rather than general use.
I seem to be running into a similar root cause as to bug 1413378 by asserting notifications are deferred in GetImageStatus. Not all paths of validation will trip the assert in bug 1413378 so presumably I'm hitting the silent majority.

I'll take out the assert for now, and see about cornering the other problem as part of another bug.
Attachment #8934961 - Attachment is obsolete: true
Okay, this requires more thought given the sheer number of failures. I believe what we need is to distinguish between requests which were created before the validator, and requests created after. The former get grandfathered in and don't change state if and only if the validator evicts the cache. The latter don't get any state until the validator completes. The trouble with the attached patch (and why I tripped the assert in the first place) is that it does not distinguish between the two, and so all requests don't get any state anymore until validation completes.
The patch got a lot bigger. At this point, the only tests that should be failing are some intermittent accessiblity mochitests. I need to confirm whether or not I caused it to permafail due to timing, or an actual regression.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9bf2752358cd3079d5d1cb0f11b69be5a3574dd0
Attachment #8934986 - Attachment is obsolete: true
accessible/tests/mochitest/elm/test_HTMLSpec.html and accessible/tests/mochitest/tree/test_txtcntr.html are failing very consistently and block landing of this patch.
Depends on: 1408636
(In reply to Andrew Osmond [:aosmond] from comment #42)
> accessible/tests/mochitest/elm/test_HTMLSpec.html and
> accessible/tests/mochitest/tree/test_txtcntr.html are failing very
> consistently and block landing of this patch.

It appears that if the image is not displayed at the time of onload, there is an extra text leaf containing a single space. This breaks both tests. If I remove the spaces in the <figure> block of test_HTMLSpec.html, it passes. This is why it failed intermittently before -- it usually (wrong) reused an unvalidated cache, which then expired but only after the test completed. With my changes, it consistently uses a fresh entry, but nothing is decoded, and we see this extra text leaf artifact breaking the test.
Comment on attachment 8936648 [details] [diff] [review]
0001-Bug-1383682-imgRequestProxy-should-avoid-leaking-sta.patch, v4

I'm still figuring out the (new) accessibility test failures, but I think they are all related to the whitespace optimizations, rather than a problem with this patch.
Attachment #8936648 - Flags: review?(tnikkel)
Comment on attachment 8936648 [details] [diff] [review]
0001-Bug-1383682-imgRequestProxy-should-avoid-leaking-sta.patch, v4

Review of attachment 8936648 [details] [diff] [review]:
-----------------------------------------------------------------

Would it make more sense to split this into two patches, one that splits the two notions of deferring notifications, and one that avoids leaking state via GetImageStatus etc?

::: image/imgRequestProxy.h
@@ +116,1 @@
>    virtual bool NotificationsDeferred() const override

I think it would be better if we worked on the naming here a bit more. It's confusing/surprising to have NotificationsDeferred() to return something other than mDeferNotifications.
There are 40 failures in the past 7 days on Linux/ Linux x64 debug, linux32-stylo-disabled debug, linux64-ccov opt, and linux64-stylo-disabled debug.
Most recent log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=154938634
Relevant part of the log: 
[task 2018-01-09T05:14:41.488Z] 05:14:41     INFO - PID 5515 | 1515474881482	Marionette	INFO	Testing http://web-platform.test:8000/css/css-backgrounds/background-size-031.html == http://web-platform.test:8000/css/css-backgrounds/reference/background-size-031-ref.xht
13622
[task 2018-01-09T05:14:41.503Z] 05:14:41     INFO - PID 5515 | ++DOMWINDOW == 83 (0x7f28e6423c00) [pid = 5609] [serial = 83] [outer = 0x7f28f057a660]
13623
[task 2018-01-09T05:14:41.565Z] 05:14:41     INFO - PID 5515 | ++DOMWINDOW == 84 (0x7f28e6425400) [pid = 5609] [serial = 84] [outer = 0x7f28f057a660]
13624
[task 2018-01-09T05:14:41.621Z] 05:14:41     INFO - PID 5515 | 1515474881616	Marionette	INFO	Found 2388 pixels different, maximum difference per channel 22
13625
[task 2018-01-09T05:14:41.661Z] 05:14:41     INFO - TEST-UNEXPECTED-FAIL | /css/css-backgrounds/background-size-031.html | Testing http://web-platform.test:8000/css/css-backgrounds/background-size-031.html == http://web-platform.test:8000/css/css-backgrounds/reference/background-size-031-ref.xht
13626
[task 2018-01-09T05:14:41.667Z] 05:14:41     INFO - REFTEST   IMAGE 1 (TEST): 
13627
[task 2018-01-09T05:14:41.674Z] 05:14:41     INFO - REFTEST   IMAGE 2 (REFERENCE): 
13628
[task 2018-01-09T05:14:41.674Z] 05:14:41     INFO - TEST-INFO took 173ms
13629
[task 2018-01-09T05:14:41.752Z] 05:14:41     INFO - PID 5515 | 
13630
[task 2018-01-09T05:14:41.753Z] 05:14:41     INFO - PID 5515 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
13631
[task 2018-01-09T05:14:41.754Z] 05:14:41     INFO - PID 5515 | 
13632
[task 2018-01-09T05:14:41.755Z] 05:14:41     INFO - PID 5515 | [Child 5609, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13633
[task 2018-01-09T05:14:41.997Z] 05:14:41     INFO - PID 5515 | 
13634
[task 2018-01-09T05:14:41.998Z] 05:14:41     INFO - PID 5515 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
13635
[task 2018-01-09T05:14:41.999Z] 05:14:41     INFO - PID 5515 | 
13636
[task 2018-01-09T05:14:42.006Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13637
[task 2018-01-09T05:14:42.007Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13638
[task 2018-01-09T05:14:42.008Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13639
[task 2018-01-09T05:14:42.197Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 4 (0x7f96a0384400) [pid = 5570] [serial = 4] [outer = (nil)] [url = about:blank]
13640
[task 2018-01-09T05:14:42.198Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 3 (0x7f96a0378800) [pid = 5570] [serial = 3] [outer = (nil)] [url = about:blank]
13641
[task 2018-01-09T05:14:42.198Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 2 (0x7f96a00d4400) [pid = 5570] [serial = 2] [outer = (nil)] [url = about:blank]
13642
[task 2018-01-09T05:14:42.199Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7f96a024c000 == 0 [pid = 5570] [id = {1a452fd4-8ae2-4c98-beba-16212037e470}]
13643
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 1 (0x7f96a1679660) [pid = 5570] [serial = 1] [outer = (nil)] [url = http://web-platform.test:8000/testharness_runner.html]
13644
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13645
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13646
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13647
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13648
[task 2018-01-09T05:14:42.481Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13649
[task 2018-01-09T05:14:42.482Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13650
[task 2018-01-09T05:14:42.482Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13651
[task 2018-01-09T05:14:42.482Z] 05:14:42     INFO - PID 5515 | [Child 5570, Main Thread] WARNING: '!mMainThread', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 403
13652
[task 2018-01-09T05:14:42.482Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 0 (0x7f96a16eb400) [pid = 5570] [serial = 5] [outer = (nil)] [url = http://web-platform.test:8000/testharness_runner.html]
13653
[task 2018-01-09T05:14:42.489Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7fca6abbd800 == 3 [pid = 5515] [id = {c2e31cf2-1aba-4842-9463-5baa761268a5}]
13654
[task 2018-01-09T05:14:42.506Z] 05:14:42     INFO - PID 5515 | --DOMWINDOW == 10 (0x7fca92bd8800) [pid = 5515] [serial = 2] [outer = (nil)] [url = about:blank]
13655
[task 2018-01-09T05:14:42.506Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7fcab037a000 == 2 [pid = 5515] [id = {ac05895b-a43c-48d7-be75-be7a5bd04d7a}]
13656
[task 2018-01-09T05:14:42.528Z] 05:14:42     INFO - PID 5515 | nsStringStats
13657
[task 2018-01-09T05:14:42.528Z] 05:14:42     INFO - PID 5515 |  => mAllocCount:          17761
13658
[task 2018-01-09T05:14:42.528Z] 05:14:42     INFO - PID 5515 |  => mReallocCount:          112
13659
[task 2018-01-09T05:14:42.528Z] 05:14:42     INFO - PID 5515 |  => mFreeCount:           17761
13660
[task 2018-01-09T05:14:42.529Z] 05:14:42     INFO - PID 5515 |  => mShareCount:           8823
13661
[task 2018-01-09T05:14:42.530Z] 05:14:42     INFO - PID 5515 |  => mAdoptCount:           1526
13662
[task 2018-01-09T05:14:42.530Z] 05:14:42     INFO - PID 5515 |  => mAdoptFreeCount:       1526
13663
[task 2018-01-09T05:14:42.530Z] 05:14:42     INFO - PID 5515 |  => Process ID: 5570, Thread ID: 140285391853376
13664
[task 2018-01-09T05:14:42.530Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7fca877a4000 == 1 [pid = 5515] [id = {3207091e-0a5a-4a6d-ac69-5d8b3f6a949e}]
13665
[task 2018-01-09T05:14:42.530Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7fca9203c800 == 0 [pid = 5515] [id = {8546835e-dccd-4073-ab9a-fdd0ee82567b}]
13666
[task 2018-01-09T05:14:42.545Z] 05:14:42     INFO - PID 5515 | 
13667
[task 2018-01-09T05:14:42.546Z] 05:14:42     INFO - PID 5515 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
13668
[task 2018-01-09T05:14:42.546Z] 05:14:42     INFO - PID 5515 | 
13669
[task 2018-01-09T05:14:42.547Z] 05:14:42     INFO - PID 5515 | [Child 5667, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 791
13670
[task 2018-01-09T05:14:42.547Z] 05:14:42     INFO - PID 5515 | [Child 5667, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399
13671
[task 2018-01-09T05:14:42.568Z] 05:14:42     INFO - PID 5515 | --DOCSHELL 0x7f28ef2df000 == 0 [pid = 5609] [id = {754b921c-17e1-40e4-ae24-59fad7e2a669}]
13672
[task 2018-01-09T05:14:42.670Z] 05:14:42     INFO - PID 5515 | [Child 5667, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399
13673
[task 2018-01-09T05:14:42.671Z] 05:14:42     INFO - PID 5515 | [Child 5667, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399
13674
[task 2018-01-09T05:14:42.692Z] 05:14:42     INFO - PID 5515 | [Child 5667, Main Thread] WARNING: '!gThread', file /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp, line 399
13675
[task 2018-01-09T05:14:42.728Z] 05:14:42     INFO - PID 5515 | nsStringStats
:milan Could you please take a look?
Flags: needinfo?(milan)
This bug is assigned and there are patches just waiting for review comments to be addressed.
Flags: needinfo?(milan)
:tnikkel, there is a patch here for review from 2+ months ago- we continue to have a very high rate of failures- can you please take a look at this patch soon so we can land it or the patch can be updated to pass review and then land?
Flags: needinfo?(tnikkel)
I'm waiting for review comments to be addressed, see comment 48, comment 54.
Flags: needinfo?(tnikkel)
Comment on attachment 8936648 [details] [diff] [review]
0001-Bug-1383682-imgRequestProxy-should-avoid-leaking-sta.patch, v4

:aosmond, can you address the feedback from comment 48?
Flags: needinfo?(aosmond)
Attachment #8936648 - Flags: review?(tnikkel)
This patch mostly splits off the validation state tracking from regular notification deferral.
Attachment #8936648 - Attachment is obsolete: true
Flags: needinfo?(aosmond)
Attachment #8947568 - Flags: review?(tnikkel)
This part renames SetNotificationsDeferred to Mark/ClearPendingNotify to make things more clear.
Attachment #8947569 - Flags: review?(tnikkel)
This part prevents StartDecoding/StartDecodingWithResult/GetImageStatus from leaking state information when validating a request.
Attachment #8947570 - Flags: review?(tnikkel)
Comment on attachment 8947568 [details] [diff] [review]
0001-Bug-1383682-Part-1.-Split-off-imgRequestProxy-notifi.patch, v1

Review of attachment 8947568 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/imgLoader.cpp
@@ +2947,5 @@
> +
> +void
> +imgCacheValidator::UpdateProxies()
> +{
> +  for (auto& proxy : mProxies) {

ChangeOwner can SyncNotify too (if mCancelled) so shouldn't we make a copy of the mProxies array in case it is changed? I guess the old code didn't do this, so is that a bug in the old code?

::: image/imgRequestProxy.cpp
@@ +270,2 @@
>  
>    // If we'd previously requested a synchronous decode, request a decode on the

Why move the decode requesting into ClearValidating? It adds an ordering requirement for the ChangeOwner/ClearValidating call for what benefit?
Attachment #8947569 - Flags: review?(tnikkel) → review+
Attachment #8947570 - Flags: review?(tnikkel) → review+
(In reply to Timothy Nikkel (:tnikkel) from comment #65)
> Comment on attachment 8947568 [details] [diff] [review]
> 0001-Bug-1383682-Part-1.-Split-off-imgRequestProxy-notifi.patch, v1
> 
> Review of attachment 8947568 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: image/imgLoader.cpp
> @@ +2947,5 @@
> > +
> > +void
> > +imgCacheValidator::UpdateProxies()
> > +{
> > +  for (auto& proxy : mProxies) {
> 
> ChangeOwner can SyncNotify too (if mCancelled) so shouldn't we make a copy
> of the mProxies array in case it is changed? I guess the old code didn't do
> this, so is that a bug in the old code?
> 

Grrr, you are right. I missed this. This is a bug in the old code. Maybe I should make imgCacheValidator::RemoveProxy dispatch. It should only be called rarely.

> ::: image/imgRequestProxy.cpp
> @@ +270,2 @@
> >  
> >    // If we'd previously requested a synchronous decode, request a decode on the
> 
> Why move the decode requesting into ClearValidating? It adds an ordering
> requirement for the ChangeOwner/ClearValidating call for what benefit?

I wanted to call StartDecoding if deferred even if ChangeOwner was not called.
(In reply to Andrew Osmond [:aosmond] from comment #67)
> (In reply to Timothy Nikkel (:tnikkel) from comment #65)
> > Comment on attachment 8947568 [details] [diff] [review]
> > 0001-Bug-1383682-Part-1.-Split-off-imgRequestProxy-notifi.patch, v1
> > 
> > Review of attachment 8947568 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > ::: image/imgLoader.cpp
> > @@ +2947,5 @@
> > > +
> > > +void
> > > +imgCacheValidator::UpdateProxies()
> > > +{
> > > +  for (auto& proxy : mProxies) {
> > 
> > ChangeOwner can SyncNotify too (if mCancelled) so shouldn't we make a copy
> > of the mProxies array in case it is changed? I guess the old code didn't do
> > this, so is that a bug in the old code?
> > 
> 
> Grrr, you are right. I missed this. This is a bug in the old code. Maybe I
> should make imgCacheValidator::RemoveProxy dispatch. It should only be
> called rarely.
> 

Actually while ChangeOwner can call SyncNotify, it will be deferred as I haven't cleared the validation flag yet:

https://searchfox.org/mozilla-central/rev/f80722d4f3bfb722c5ec53880c4a7efb71285676/image/ProgressTracker.cpp#305

This strikes me as sort of strange, but it is *not* a change in behaviour as the original code does the same thing:

https://searchfox.org/mozilla-central/rev/f80722d4f3bfb722c5ec53880c4a7efb71285676/image/imgLoader.cpp#3065

I did however move the array onto the stack, just in case. I think the SyncNotifyListener in the second loop is still unsafe for similar reasons...

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=35ba549bbf86515591d0bb2dd9034eae2498a376
Attachment #8947568 - Attachment is obsolete: true
Attachment #8947568 - Flags: review?(tnikkel)
Attachment #8948670 - Flags: review?(tnikkel)
Attachment #8948670 - Flags: review?(tnikkel) → review+
Blocks: 1436323
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c657d87a0011
Part 1. Split off imgRequestProxy notification deferrals for validation. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/62497967f092
Part 2. Rename IProgressObserver::SetNotificationsDeferred to make purpose clear. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/30ee5a21a6a1
Part 3. Prevent imgRequestProxy from leaking the current state when validating. r=tnikkel
Duplicate of this bug: 1426587
Duplicate of this bug: 1383230
Depends on: 1445479
Depends on: 1523555
You need to log in before you can comment on or make changes to this bug.