Closed Bug 1462983 Opened 3 years ago Closed 2 years ago

Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

Categories

(Core :: Layout, defect, P2)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- unaffected
firefox62 + fixed
firefox63 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: dholbert)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 1 obsolete file)

Filed by: rgurzau [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try

https://queue.taskcluster.net/v1/task/Zpm4hYitS16lO7LR1ZdeDw/runs/0/artifacts/public/logs/live_backing.log

Central as beta simulation: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ad3cb610bc3d44f05c3e1bcd34d68371d4516e3

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try&lineNumber=1871

[task 2018-05-20T12:22:10.406Z] 12:22:10     INFO -  15 INFO TEST-START | layout/base/tests/test_bug607529.html
[task 2018-05-20T12:22:20.721Z] 12:22:20     INFO -  Buffered messages logged at 12:22:14
[task 2018-05-20T12:22:20.722Z] 12:22:20     INFO -  16 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.722Z] 12:22:20     INFO -  Buffered messages logged at 12:22:16
[task 2018-05-20T12:22:20.723Z] 12:22:20     INFO -  17 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  Buffered messages logged at 12:22:17
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  18 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  Buffered messages finished
[task 2018-05-20T12:22:20.725Z] 12:22:20     INFO -  19 INFO TEST-UNEXPECTED-FAIL | layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data at window.onmessage@http://mochi.test:8888/tests/layout/base/tests/test_bug607529.html:53:17
This looks very similar to e.g. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=50a26bd1c5af69f7aa710a6b5513e37753201be4&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=162795661  which got fixed in bug 1427236 three months ago. Now just the first failure line is missing.
Summary: Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-05-20 → Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-06-14
[Tracking Requested - why for this release]: Hits the beta simulations approx. every second day.
Geoff, is this something you can take a look at? Or help us find the right owner? Thanks.
Flags: needinfo?(gbrown)
Sorry, I don't think I can help. Better to ping the triage owner (:mreavy) or someone familiar with the test (:bz?).
Flags: needinfo?(gbrown)
Maire, can you help out? Thanks.
Flags: needinfo?(mreavy)
Geoff - this is similar to  bug 1427236 which you resolved several months ago - can you take a look?  See comment 1.  thanks
Flags: needinfo?(mreavy) → needinfo?(gbrown)
(In reply to Maire Reavy [:mreavy] Plz needinfo? from comment #10)
> Geoff - this is similar to  bug 1427236 which you resolved several months
> ago - can you take a look?  See comment 1.  thanks

In bug 1427236 I only disabled the test, waited a while, then noticed that it was no longer failing and re-enabled it. I am otherwise not familiar with the test or the technologies under test. Thus, comment 6.
Flags: needinfo?(gbrown)
Olli, can you take a look at this? It hit the beta simulations frequently and bz is on PTO. Thank you.
Flags: needinfo?(bugs)
This hit also in February and April, but started getting frequent in mid-May. Tests running after this one can also be affected, see e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try&lineNumber=1871
I don't think this should block the merge to beta, as an intermittent, but it causing other tests to fail does seem like a bad problem so I'll keep tracking for 62 for now.
Summary: Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-06-14 → Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data
Daniel -- Can I ask you to look at this and see if it's a bug in the test or if there's a real issue here?

Rob, Daniel -- Could pernosco help us with this one?

Making this a P2 for investigation given its bump in frequency.
Assignee: nobody → dholbert
Flags: needinfo?(roc)
Flags: needinfo?(dholbert)
Priority: P5 → P2
It looks like this is Android-specific, FWIW, based on the reports so far.

(Comment 17 does show 1 "linux32-nightly" report, but that seems to have been a mis-star on this completely unrelated testrun: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ab0cd7871364576739ca44132f17c14fafd13ce&selectedJob=185847737 )
OS: Unspecified → Android
Brief analysis: the test has an iframe, which posts a message back to the main test file (the iframe's opener). That message is expected to be one of several different strings, *or* a JSON-encoded blob.  This bug's intermittent test failure is happening in a scenario where the message isn't any of our expected strings, and so we assume it's the JSON blob and we try to parse it as JSON, and JSON.parse encounters an unexpected character and throw an exception.

I'll post a patch to make us log the string that we're failing to parse, whenever JSON.parse fails here.  Hopefully this will help us in sorting out what's going wrong.
Comment on attachment 8991679 [details]
Bug 1462983 diagnostic: when test_bug607529.html fails JSON.parse operation, print the string that it choked on.

https://reviewboard.mozilla.org/r/256606/#review263510
Attachment #8991679 - Flags: review?(bzbarsky) → review+
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b93e0b5846d7
diagnostic: when test_bug607529.html fails JSON.parse operation, print the string that it choked on. r=bz
I uplifted the diagnostic to beta (with a=testonly), since that seems to be where this is failing the most:
https://hg.mozilla.org/releases/mozilla-beta/rev/273c096e4391

I'll circle back in a few days and see if this makes any more sense once we've got some new test failures.
(In reply to Maire Reavy [:mreavy] Plz needinfo? from comment #19)
> Rob, Daniel -- Could pernosco help us with this one?

Sure, searching for it now.
Flags: needinfo?(roc)
Got a hit from the diagnostics -- looks like the mystery non-JSON string is 'loaded':
===
> INFO TEST-UNEXPECTED-FAIL | layout/base/tests/test_bug607529.html | JSON.parse threw, when passed string 'loaded'
===
https://treeherder.mozilla.org/logviewer.html#?job_id=187931037&repo=mozilla-beta&lineNumber=1949
So "loaded" is what gets sent by the first file that we load in the iframe -- and our string-recognition chain only accepts the first one (via a "doneOneLoad" boolean).  I'm not sure whether the second one is unexpected or not, and if there's anything to be done for the second one or if we can just disregard it.

The test has this comment at the top:

>  /* General idea: Open a new window (needed because we don't bfcache
>     subframes) that uses requestAnimationFrame, navigate it, navigate it
>     back, and verify that the animations are still running.  */

That leads me to believe that we don't particularly care about the second "loaded" message and we could just disregard it...
Depends on: 1475546
No longer depends on: 1475546
Duplicate of this bug: 1475546
[not using mozreview anymore, since it'll get confused about the fact that the diagnostic patch already landed but subsequent parts haven't]

Here's a possible fix -- just watching for & explicitly ignoring a redundant "loaded" notification.

I'm guessing (?) that this notification indicates that the opened window wasn't in our bfcache -- if that turns out to be the case, then now the test will time out and fail, due to never receiving a "revived" message.  And I think (?) that'd be the test accurately indicating the presence of a bug with bfcache management, since it seems like we're expecting it to be guaranteed that this page is cached, though I'm not sure.  And if that happens, we can go from there and annotate the failure with an Android-specific "todo" perhaps...

For now, per the comment in the test, I'm unclear on whether that timeout will happen or whether this "loaded" message is just spurious & ignorable & the page will also fire pageshow and end up proceeding to the other parts.  bz, if you have a leaning about what's expected here and what a better way to approach the error would be, please let me know!
Flags: needinfo?(dholbert)
Attachment #8991959 - Flags: review?(bzbarsky)
Attachment #8991959 - Attachment description: possible fix v1 → patch to fix or at least improve the test v1
Note also that this test was failing for a while with "notcached" reports, in bug 1427236 -- that is kinda consistent with us reporting "loaded" a second time as we're doing here (though it's odd that we're not reporting the notcached issue anymore).

Bottom-line, there seems to be a real bug with bfcache management in this "w = window.open(); w.location = ...; w.history.back()" flow, specifically on Android, where we're failing to pull the page out of bfcache, and it manifested in bug 1427236 as well as this bug.

My diagnostic patch & the patch in comment 31 make the issue a bit clearer, but to really fix this, we probably need someone who knows the expected bfcache semantics (someone from the DOM team) to take a look here & see to what extent the test's expectations are valid & where they're falling over.
Comment on attachment 8991959 [details] [diff] [review]
patch to fix or at least improve the test v1

So you're right that if we're getting a second "loaded" that means we didn't get opened from bfcache.

Presumably we didn't get the "notcached" message, since that failure is not in the log?  That suggests that we went into bfcache correctly but didn't end up getting restored from it.  I guess it's possible we hit a memory pressure notification or something that evicted us from the bfcache before we could get restored....

I'm not sure whether we should count the test as failing or just as not testing anything useful when that happens.

In any case, the comment about waiting for "revived" is wrong.  IF we get a second "loaded", that "revived" will never come.

My worry is that if we just allow the second "loaded" then we might not notice when we stop being bfcached altogether and this test is just not testing anything useful anymore. So I think I would just prefer an explicit test failure when we fail to bfcache.  Of course that won't solve this intermittent orange problem.  :(
Attachment #8991959 - Flags: review?(bzbarsky) → review-
Thanks for the review feedback!  Yeah, it seems we're not getting the "notcached" message.  And memory-pressure does seem like a good idea for something that might be purging us from the cache regardless... That possibility makes this more understandable.

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #33)
> I'm not sure whether we should count the test as failing or just as not
> testing anything useful when that happens.

How about we treat it as a failure on desktop, but a "todo(false,...)" soft-fail on Android (known issue, not much we can do about it, presumably caused by memory pressure which we do want to respond to gracefully)?
Flags: needinfo?(bzbarsky)
If memory pressure is indeed what's triggering this, then the only way I can imagine of making this test reliably pass would be to:
 - manually use SpecialPowers.SomeNewSpecialFunction() to disable bfcache's responsiveness to memory pressure just for this test.
...or:
 - clear some caches at the start of this test to free up memory & reduce likelihood of memory pressure happening.

I'm not immediately sure how to do either of those, though, so I'm tempted to just go with comment 34 if that sounds OK (which means we'll be relying on desktop for the full strictness of this test, basically... which isn't terrible IMO.
> How about we treat it as a failure on desktop, but a "todo(false,...)" soft-fail on Android 

Sounds good to me.

One other thing we could do is have the test listen for memory pressure and always pass if it fires during the test, I guess.  But comment 34 seems simpler.
Flags: needinfo?(bzbarsky)
Here's a patch along the lines of comment 34.

(I'm leaving the JSON.parse() failure logging diagnostic in, for robustness, but removing the comment that suggested it was about specifically diagnosing this bug.)
Attachment #8992118 - Flags: review?(bzbarsky)
Attachment #8991959 - Attachment is obsolete: true
Two notes on the patch:
 - As you can see, I refactored out the "clean up and finish" code, since I'm adding a second place where we're doing that (to avoid timing out in this scenario where we know no more messages will be coming).

 - I cribbed the navigator.appVersion.includes("Android") check from two other mochitests in layout/style; I assume it works as an "is this Android" test.
Flags: needinfo?(bugs)
Comment on attachment 8992118 [details] [diff] [review]
patch to fix or at least improve the test v2

r=me.  Thank you for digging through this!
Attachment #8992118 - Flags: review?(bzbarsky) → review+
We weren't able to reproduce this bug in Pernosco, probably because we never triggered any memory pressure notifications. In fact desktop Linux Firefox doesn't ever dispatch memory-pressure notifications as far as I can tell... That is unfortunate!
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2bef68d99ba
Check for redundant "loaded" messages from opened window, in test_bug607529.html. r=bz
See Also: → 1475939
Uplifted to beta: https://hg.mozilla.org/releases/mozilla-beta/rev/b286967e4939


(In reply to Robert O'Callahan (:roc) (email my personal email if necessary) from comment #40)
> We weren't able to reproduce this bug in Pernosco, probably because we never
> triggered any memory pressure notifications.

That's a shame -- thank you for trying!  Fortunately this was straightforward enough to sort out on its own (unless I missed something -- fingers crossed...)
Status: NEW → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Yup, noted in comment 44 but forgot to update status flag. Thanks!
Duplicate of this bug: 1475939
You need to log in before you can comment on or make changes to this bug.