Closed Bug 1325080 Opened 8 years ago Closed 7 years ago

acid 3 test 72 failed: expected '10' but got '19' - prerequisite failed: style didn't affect image

Categories

(Core :: Layout, defect, P3)

51 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: lando.orellana+bz, Assigned: kevin.hsieh)

References

()

Details

(Keywords: regression)

Attachments

(5 files, 1 obsolete file)

Attached image screenshot (obsolete) —
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0 Build ID: 20161219140923 Steps to reproduce: http://acid3.acidtests.org/ (Firefox 51 beta (all) 64 bits, windows 7 and windows 10) Actual results: Failed 2 tests. Test 26 passed, but took 153ms (less than 30fps) Test 35 failed: expected '0' but got '1' - root element, with no parent node, claims to be a :first-child Test 65 passed, but took 47ms (less than 30fps) Test 69 passed, but took 4 attempts (less than perfect). Test 71 passed, but took 111ms (less than 30fps) Test 72 failed: expected '10' but got '19' - prerequisite failed: style didn't affect image Total elapsed time: 5.39s Expected results: Passed 100
Severity: normal → minor
Has Regression Range: --- → yes
Group: firefox-core-security
Attachment #8820725 - Attachment filename: borrar.tmp.png → screenshot.png
Attachment #8820725 - Attachment is obsolete: true
Attached image screenshot —
Group: firefox-core-security
(In reply to lando.orellana from comment #0) > Test 35 failed: expected '0' but got '1' - root element, with no parent node, claims to be a :first-child This is bug 1311329. > Test 72 failed: expected '10' but got '19' - prerequisite failed: style didn't affect image This also affects 50 (release) and 53 (nightly). Morphing to reflect this bug. Dan, I don't suppose you know why this would be? (Moving over to Core, but it's quite plausible that "Layout" is not the best component)
Component: Untriaged → Layout
Flags: needinfo?(dholbert)
Product: Firefox → Core
Summary: acid 3 test not passing completely → acid 3 test 72 failed: expected '10' but got '19' - prerequisite failed: style didn't affect image
I can't reproduce the test 72 failure. I'm seeing 99/100 most of the time (occasionally 98/100), and in the one time that I saw a dialog pop up to explain the failures, it was 99/100 and #72 wasn't listed as a failure. (I can't reliably make the test pop up a dialog at the end to explain which pieces failed. Anyone know if there's something extra I need to do for that to happen?)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Apparently if you reload the page it does runs 99/100, and only if you close the browser and open it again or if you clean your cache (even with F12 tools), then it shows 98/100 consistently.
Ah, I see in the page's source now: > // for debugging [...] click the "A" in "Acid3" That's how to trigger the alert(). And I can confirm that I do get 98/100 (with #72 failing) on the first load in a fresh profile, echoing comment 4. I also see 98/100 (with #72 failing) reliably in my main browser profile (with lots of tabs open). So maybe there's a race condition involved here, and contention/background-work makes it easier to trigger. Or maybe the image cache is involved. Or both.
Attached file testcase 1 —
This more targeted testcase reproduces the issue for me, in Nightly. It also reproduces the issue in 50.1.0, but not in Firefox 37 (and old release build I happen to have lying around). I think it's basically a race condition around image decoding -- the testcase is trying to read the image's height, *before* we've decoded the image (at which point, it's an inline element showing its alt-text (to which 'height' legitimately does not apply). We never visually render that state, since it decodes so quickly, but it is an intermediate state which we briefly visit while loading a page with images. After the image is decoded, its decoded verison is cached & we never bother with the inline-element alt-text intermediate state again. So that's why this only happens on first load (or more often in my main browsing profile, where my image cache is probably fuller). Additionally, Chrome/Safari never has this problem, because they're not spec-compliant about rendering broken/loading images (as discussed in bug 851048) -- they honor width/height on those elements even when they're not supposed to.
Meant to say -- I think the "regression" here is from the fact that we *used* to synchronously decode data-URI-encoded images, so we'd never visit the intermediate alt-text state. But I think we decode asynchronously now. So, the test is making an invalid assumption & we should probably fix it to listen for the image's "load" event.
Blocks: acid3
Flags: needinfo?(dholbert)
Running mozregression with my reduced testcase, I end up at these csets for Bug 1163856: >86800b7a3362 Seth Fowler — Bug 1163856 (Part 2) - Fix tests that depended on image load event timing. r=tn >3d4f45ad3b8d Seth Fowler — Bug 1163856 (Part 1) - Delay the image load event until the size decoder gets finalized. r=tn This makes sense. Note "Part 2" in particular -- I think this acid3 test is in the same category as the tests that were fixed up there (inadvertently depending on image load event timing).
Blocks: 1163856
Here's a "fixed" version of testcase 1, which tests the "HTMLImageElement.complete" flag before measuring the size, and tries back later if that's false. This passes reliably for me. I think this is what the acid3 test should be doing, too (via the special "retry" return value used elsewhere in the test). (It could listen for img.onload, too, but that seems harder to shoehorn into the framework.) Note that this "complete" flag is supported in all browsers that MDN knows about, per https://developer.mozilla.org/en-US/docs/Web/API/HTMLImageElement#Browser_compatibility tn, could you sanity-check me -- does our behavior and this suggested-Acid3-fix make sense to you?
Flags: needinfo?(tnikkel)
Yes, everything you said sounds right to me. I don't think this is the first time that our doing the size decode async has come up in a compat type way. I'm not even sure it's an overall win since the cost of dispatching the job to another thread and then receiving the results on the main thread could exceed just reading the size from the image directly (for most images this is straight foward) on the main thread.
Flags: needinfo?(tnikkel)
Thanks, Timothy. I posted to www-archive to ask hixie to fix it: https://lists.w3.org/Archives/Public/www-archive/2016Dec/0001.html (That's a list we've used for Acid3 issues in the past, so I think it's the right forum for this.)
Should we move this over to Tech Evang? Looks unlikely that we're going to change anything on the Firefox side here.
Hixie replied to my thread and pointed to a different spec section which might suggest that we're indeed wrong here; I've been meaning to follow up on that. (Specifically: it sounds like the spec might actually require us to synchronously decode the size of data-URI-encoded images.) https://lists.w3.org/Archives/Public/www-archive/2016Dec/0005.html (Technically the spec requires us to synchronously decode external images as well, but it doesn't matter there because any delay for an async-decode for external images can't be detected, since the web page can't tell whether/when the image data has arrived over the network.) That would mean Bug 1163856 (and/or one of its dependencies) was in violation of the spec, for this particular use-case with data URIs. We should probably use this bug to fix that up.
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
So, this will Just Work if we can get local-variable canSyncDecodeMetadata set to "true" in RasterImage::OnImageDataComplete, here: https://dxr.mozilla.org/mozilla-central/rev/6f63f95e28ffc05c0d2f5ef6cd6e05905fe8ea5a/image/RasterImage.cpp#882-887 (With that variable set to true, we'll then call "DecodeMetadata(FLAG_SYNC_DECODE)", which gets us the size synchronously. And then that will mean that a later nsCSSFrameConstructor call to nsImageFrame::ShouldCreateImageFrameFor() will return "true" instead of "false", and we'll skip the intermediate alt-text frame that we otherwise end up creating.) We can get this all to happen if we set RasterImage::mSyncLoad to true. That flag is just a cached version of the INIT_FLAG_SYNC_LOAD init-flag, and it's documented here: https://dxr.mozilla.org/mozilla-central/rev/6f63f95e28ffc05c0d2f5ef6cd6e05905fe8ea5a/image/Image.h#147-148 > * [...] The container is being loaded synchronously, so > * it should avoid relying on async workers to get the container ready. This seems like exactly the behavior we want here. So -- we just need to pass in that INIT_FLAG_SYNC_LOAD flag. The init flags are calculated in a function called "ComputeImageFlags" in ImageFactor.cpp, so that's the code that the patch will modify. We *could* just add a check for |SchemeIs("data", ...)| there, BUT, bz has always steered me away from depending on exact scheme string-matching, in case e.g. someone comes up with a "betterData://" scheme in the future which all our existing checks should still work for. The better better way to whitelist "data:" is to call NS_URIChainHasFlags() and pass in the relevant flags that we care about. The data: flags are listed here... https://dxr.mozilla.org/mozilla-central/rev/6f63f95e28ffc05c0d2f5ef6cd6e05905fe8ea5a/netwerk/protocol/data/nsDataHandler.cpp#55-57 ...and I think nsIProtocolHandler::URI_IS_LOCAL_RESOURCE | nsIProtocolHandler::URI_SYNC_LOAD_IS_OK are the relevant ones we care about here. (The latter one is actually data-URI-specific, right now at least.) SO: Patch coming up, which checks those flags and adds the INIT_FLAG_SYNC_LOAD init-flag if they're set.
(In reply to Daniel Holbert [:dholbert] from comment #15) > We *could* just add a check for |SchemeIs("data", ...)| there, BUT, bz has > always steered me away from depending on exact scheme string-matching, in > case e.g. someone comes up with a "betterData://" scheme in the future which > all our existing checks should still work for. The better better way to > whitelist "data:" is to call NS_URIChainHasFlags() and pass in the relevant > flags that we care about. Just kidding -- we can't do this after all, because the function in question (mozilla::image::ComputeImageFlags) gets called off-main-thread, which means it can't call NS_URIChainHasFlags(). (In particular / most proximally: we can't modify the refcount of the underlying nsIURI, which which means we can't ask the ImageURL for an already_AddRef'ed reference to it, which means we don't have anything that we can pass into NS_URIChainHasFlags().) So, I guess we're stuck doing a protocol string-comparison for "data" after all. Oh well.
I'd actually prefer to just do all size decodes sync (and not just data URIs). Bug 1163856 also caused bug 1299776 and the image that seemed to the causing the problem wasn't a data URI, it was loaded over http (see bug 1299776, comment 16). Never was able to fully get to the bottom of that to determine what was going on because it was too hard to reduce what was going on in that site. Further, I have observed (what seems like a common case) where we have a size decode off-main-thread that is complete and there is a runnable waiting for its turn on the main thread to report the results to the image, meanwhile we ask to decode (or even ask to do some sync decoding on the main thread) of that image but we can't because we don't have the results of the size decode yet. Additionally, doing size decodes async can delay their results by up to 500ms (see bug 1299776, comment 16). 500ms we could be using to decode the image off main thread so it's ready to display sooner, but we can't because we are blocked from doing that. And if the page has layout dependent on the size of the image we could layout the page with the "wrong" image size, and then be forced to do another reflow once we have the "right" image size. For most images a size decode is very straight forward (reading the first few bytes of the image header for size info). It might take more overall cpu time to dispatch a size decode to a decoding thread, and then process the runnable for its results then to just do the size decode sync on the main thread. Also note bug 1320684 has an unlanded patch which touches this code.
Doesn't look like bug 1320684 helped. Timothy, given that 52 is going to be our next ESR, is there any chance we can take another look at this soon?
Flags: needinfo?(tnikkel)
(Timothy, feel free to steal this if you like -- I've been meaning to circle back & implement what you suggested in comment 17, but haven't yet.) (In reply to Ryan VanderMeulen [:RyanVM] from comment #18) > Doesn't look like bug 1320684 helped. Timothy, given that 52 is going to be > our next ESR, is there any chance we can take another look at this soon? (We should fix this, but I'm not sure we need to prioritize it for ESR - its impact is likely pretty minimal. This could only break sites if they use data-URI-images, and depend on querying the dimensions of those data-URI-images immediately after setting them up -- and even then, it only would cause unexpected results on the very first load. Seems like the sort of conditions that are unlikely to occur outside of benchmarks.)
Too late for 54 as 54 was released.
THIS BUG is probably CRITICAL. If first run is finished with 98/100 and second test is 99/100 (not stable) it means the Firefox is system with RACE HAZARD.
You need to catch the error: slow internet connection (mobile) + slow computer. Please delete cached files before retesting. I got 98% in 50% of cases in v52.2.1 ESR 32-bit. No new topic because this problem is reported many times by many people. "status-firefox52: affected → wontfix status-firefox53: affected → wontfix status-firefox55: --- → affected status-firefox-esr52: --- → fix-optional!!! (ESR version!!!) status-firefox54: affected → wontfix" Well, this is embarrassing... STRONGLY!!!
Flags: needinfo?(bzbarsky)
Comment 14 explains what's going on here. The test is racy unless we force synchronous decoding of the size of data: images, which may be a bad idea for performance reasons.
Flags: needinfo?(bzbarsky)
Oh, and this is definitely not critical. This is not a data race; it's a case of an operation happening asynchronously while the test expects it to happen sychronously.
I'm going to have Kevin take a look at this, in spare cycles that he's got during/after his main internship project. (tnikkel, aosmond, and I can help with questions that you might have here.)
Assignee: dholbert → kevin.hsieh
(In reply to Boris Zbarsky [:bz] from comment #23) > Comment 14 explains what's going on here. The test is racy unless we force > synchronous decoding of the size of data: images, which may be a bad idea > for performance reasons. You can start decoding and the size of image is stored in first bytes in GIF, BMP and JPG. The problem will not occur if partial decoding will be made and full decoding later. Detection of iVBOR etc. and partial decoding of first n bytes shall not be a performance problem. Chromium and MSIE hasn't this problem... The test is real - I have this problem in one of my commercial products but used in WAN (not online). I know now why.
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bzbarsky)
(In reply to Vader from comment #26) > You can start decoding and the size of image is stored in first bytes in > GIF, BMP and JPG. Only in this special case where the image is a data URI. Otherwise, the image load probably has to hit the network, so you can't depend on having synchronous access to any of it (even the first few bytes). > The test is real - I have this problem in one of my commercial products but > used in WAN (not online). I know now why. Does your product use a data URI for the image in question? If so, then that's a useful data point to know that this does affect actual real-world content. But if not, then you're likely hitting a different bug from this one.
Flags: needinfo?(vader.24)
(In reply to Daniel Holbert [:dholbert] from comment #27) > (In reply to Vader from comment #26) > > You can start decoding and the size of image is stored in first bytes in > > GIF, BMP and JPG. > > Only in this special case where the image is a data URI. Otherwise, the > image load probably has to hit the network, so you can't depend on having > synchronous access to any of it (even the first few bytes). > > > The test is real - I have this problem in one of my commercial products but > > used in WAN (not online). I know now why. > > Does your product use a data URI for the image in question? If so, then > that's a useful data point to know that this does affect actual real-world > content. But if not, then you're likely hitting a different bug from this > one. Yes, only for uri data: bacause using data uri is used to synchronous show small images that shall be loaded with rest part of content. My project used data uri to prevent saltation of content during loading. The images was changed (including size). These imeges presented state of item. I think this problem shall be solved for data uri that presents any image. Images loaded not locally (http://) - it is natural - they are loaded asynchronously.
Flags: needinfo?(vader.24)
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. https://reviewboard.mozilla.org/r/165800/#review171282 This generally looks good, but it still needs an automated test. r- for now since the test will need a review. ::: image/RasterImage.cpp:938 (Diff revision 1) > > - // Allow a synchronous metadata decode if mSyncLoad was set, or if we're > - // running on a single thread (in which case waiting for the async metadata > + // Always synchronously decode metadata > + if (!mHasSize) { > - // decoder could delay this image's load event quite a bit), or if this image Nit: code comments should generally end with a period. Though in this case I think we should change the wording as well, read on: Nit: this code-comment ("Always...decode") makes sense *in the context of this bug & patch*, but it makes less sense if I'm just looking at the code without knowing anything about this patch. (E.g., we still don't "always" decode here -- we only do it if we don't have the size yet.) How about we replace this with: // If we haven't decoded the metadata (e.g. dimensions) yet, then // do so immediately, now that we have all the image data.
Attachment #8894667 - Flags: review?(dholbert) → review-
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. https://reviewboard.mozilla.org/r/165800/#review171284 ::: commit-message-ec666:1 (Diff revision 1) > +Bug 1325080 - Always synchronously decode image size. r=dholbert This commit message isn't quite accurate. This patch is only changing one of the DecodeMetadata calls (in OnImageDataComplete), but there's one other call which I believe is still async -- in RasterImage::OnImageDataAvailable. So: either (a) this commit message needs to be softened, *or* (b) the patch needs to tweak the other call to live up to what the commit message is promising. (Personally I'd be fine with (a), but Timothy might disagree, depending on how comprehensive he intended "all size decodes" to be in comment 17 here.)
Attachment #8894667 - Flags: review?(dholbert) → review?(tnikkel)
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. https://reviewboard.mozilla.org/r/165800/#review172436 I punted review to tnikkel, but I'll post a few test nits while I'm here (hopefully this doesn't break the MozReview state): ::: image/test/mochitest/test_bug1325080.html:15 (Diff revision 2) > +</head> > +<body> > +<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1325080">Mozilla Bug 1325080</a> > +<pre id="test"> > +<script type="application/javascript"> > +/** Test for Bug 1325080**/ nit: add a space between the bug number and "**/", for symmetry ::: image/test/mochitest/test_bug1325080.html:19 (Diff revision 2) > +function createImage() { > + document.open(); > + document.write('<!DOCTYPE html><head><style>img { height: 10px; }</style></head><body><img src="%2FAMDAwAAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw%3D%3D" alt="alt-text"></body>'); > + document.close(); > +} Probably worth adding a comment here saying "This function's code comes from the Acid3 test #72", both to tacitly give credit and also because (on its own) this seems a bit arcane. ::: image/test/mochitest/test_bug1325080.html:27 (Diff revision 2) > + setTimeout(() => { > + ok(document.images[0].height == 10, "Style should set height of image."); > + SimpleTest.finish(); > + }, 0); Use SimpleTest.executeSoon() -- it's basically the same as setTimeout(..., 0) but you don't need the magic "0". (And setTimeout itself is frowned on in our tests these days, because arbitrary timeout values can be flaky. And I think we really treat setTimeout "0" as some nonzero value under the hood, to throttle things a bit, whereas we don't do that for executeSoon.)
Attachment #8894667 - Flags: review?(dholbert)
Attachment #8894667 - Flags: review?(dholbert)
Also, just to be sure -- have you verified that the mochitest *fails* locally, when your fix isn't applied?
Flags: needinfo?(kevin.hsieh)
Yes, I verified that the mochitest fails without the fix. Just pushed a revision with the nits addressed.
Flags: needinfo?(kevin.hsieh)
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. https://reviewboard.mozilla.org/r/165800/#review173832
Attachment #8894667 - Flags: review?(tnikkel) → review+
Pushed by dholbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/393eadbb146f Always synchronously decode image size. r=tnikkel
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Given that this is a simple fix that includes a test and we have confirmation of real-world impact in comment 28, it seems like we should probably nominate this for Beta uplift.
Flags: needinfo?(tnikkel)
Flags: needinfo?(kevin.hsieh)
Flags: in-testsuite+
Let's hold off on beta uplift until we've sorted out bug 1390799.
Flags: needinfo?(kevin.hsieh)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Backout by dholbert@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/69ae163ef3d7 Backed out changeset 393eadbb146f for causing a talos regression.
Abusing needinfo to re-request review, since mozreview has the old patch's review+ status still saved
Flags: needinfo?(tnikkel)
(Note: I backed this out on inbound rather than autoland, because autoland doesn't accept direct pushes like backouts [except from sheriffs]. That means the backout won't make it to autoland for a little while [needs to wait for a inbound->central merge, and then a central->autoland merge]. And this new patch can't be re-"autolanded" until those merges happen. So: if this updated patch gets signoff from tnikkel before that merge happens, we might just want to push the new version to inbound rather than waiting for the merging to happen for autoland's benefit.)
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. https://reviewboard.mozilla.org/r/165800/#review175020 r+ on the new patch. The flag was already set.
Flags: needinfo?(tnikkel)
(Yeah, MozReview bug -- it won't let you re-request review on a new versions of a patch that has previously been granted r+.) I'll probably land the new patch on inbound, per comment 46, but I'm gonna wait a bit because inbound is pretty backlogged right now after the closure (with lots of pending builds, and with a decent amount of the non-pending stuff being busted from other earlier changes). I want to be sure we've got some talos numbers (i.e. non-busted builds) between the backout and the re-landing.
Flags: needinfo?(dholbert)
Pushed by dholbert@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/072e904155c3 Synchronously decode image metadata for data URIs. r=tnikkel
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Flags: needinfo?(dholbert)
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. Approval Request Comment [Feature/Bug causing the regression]: Bug 1163856 [User impact if declined]: See comment 28 [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No [Why is the change risky/not risky?]: The effect of this patch is limited to data URIs. See Bug 1390799 regarding performance. [String changes made/needed]: None
Attachment #8894667 - Flags: approval-mozilla-beta?
Comment on attachment 8894667 [details] Bug 1325080 - Synchronously decode image metadata for data URIs. Fix a data URIs issue and include a test. Beta56+.
Attachment #8894667 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: