Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data

RESOLVED FIXED in Firefox 62

Status

()

defect
P2
normal
RESOLVED FIXED
Last year
11 months ago

People

(Reporter: intermittent-bug-filer, Assigned: dholbert)

Tracking

({intermittent-failure, regression})

unspecified
mozilla63
Unspecified
Android
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox60 unaffected, firefox61 unaffected, firefox62+ fixed, firefox63 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Reporter

Description

Last year
treeherder
Filed by: rgurzau [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try

https://queue.taskcluster.net/v1/task/Zpm4hYitS16lO7LR1ZdeDw/runs/0/artifacts/public/logs/live_backing.log

Central as beta simulation: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ad3cb610bc3d44f05c3e1bcd34d68371d4516e3

Log: https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try&lineNumber=1871

[task 2018-05-20T12:22:10.406Z] 12:22:10     INFO -  15 INFO TEST-START | layout/base/tests/test_bug607529.html
[task 2018-05-20T12:22:20.721Z] 12:22:20     INFO -  Buffered messages logged at 12:22:14
[task 2018-05-20T12:22:20.722Z] 12:22:20     INFO -  16 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.722Z] 12:22:20     INFO -  Buffered messages logged at 12:22:16
[task 2018-05-20T12:22:20.723Z] 12:22:20     INFO -  17 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  Buffered messages logged at 12:22:17
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  18 INFO TEST-PASS | layout/base/tests/test_bug607529.html | Should never end up not being cached
[task 2018-05-20T12:22:20.724Z] 12:22:20     INFO -  Buffered messages finished
[task 2018-05-20T12:22:20.725Z] 12:22:20     INFO -  19 INFO TEST-UNEXPECTED-FAIL | layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data at window.onmessage@http://mochi.test:8888/tests/layout/base/tests/test_bug607529.html:53:17
This looks very similar to e.g. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=50a26bd1c5af69f7aa710a6b5513e37753201be4&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=162795661  which got fixed in bug 1427236 three months ago. Now just the first failure line is missing.
Summary: Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-05-20 → Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-06-14
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
[Tracking Requested - why for this release]: Hits the beta simulations approx. every second day.
Geoff, is this something you can take a look at? Or help us find the right owner? Thanks.
Flags: needinfo?(gbrown)
Sorry, I don't think I can help. Better to ping the triage owner (:mreavy) or someone familiar with the test (:bz?).
Flags: needinfo?(gbrown)
Maire, can you help out? Thanks.
Flags: needinfo?(mreavy)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Geoff - this is similar to  bug 1427236 which you resolved several months ago - can you take a look?  See comment 1.  thanks
Flags: needinfo?(mreavy) → needinfo?(gbrown)
(In reply to Maire Reavy [:mreavy] Plz needinfo? from comment #10)
> Geoff - this is similar to  bug 1427236 which you resolved several months
> ago - can you take a look?  See comment 1.  thanks

In bug 1427236 I only disabled the test, waited a while, then noticed that it was no longer failing and re-enabled it. I am otherwise not familiar with the test or the technologies under test. Thus, comment 6.
Flags: needinfo?(gbrown)
Comment hidden (Intermittent Failures Robot)
Olli, can you take a look at this? It hit the beta simulations frequently and bz is on PTO. Thank you.
Flags: needinfo?(bugs)
This hit also in February and April, but started getting frequent in mid-May. Tests running after this one can also be affected, see e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=179405243&repo=try&lineNumber=1871
I don't think this should block the merge to beta, as an intermittent, but it causing other tests to fail does seem like a bad problem so I'll keep tracking for 62 for now.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Summary: Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data when Gecko 62 merges to Beta on 2018-06-14 → Intermittent layout/base/tests/test_bug607529.html | uncaught exception - SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data
Comment hidden (Intermittent Failures Robot)
Daniel -- Can I ask you to look at this and see if it's a bug in the test or if there's a real issue here?

Rob, Daniel -- Could pernosco help us with this one?

Making this a P2 for investigation given its bump in frequency.
Assignee: nobody → dholbert
Flags: needinfo?(roc)
Flags: needinfo?(dholbert)
Priority: P5 → P2
Assignee

Comment 20

11 months ago
It looks like this is Android-specific, FWIW, based on the reports so far.

(Comment 17 does show 1 "linux32-nightly" report, but that seems to have been a mis-star on this completely unrelated testrun: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ab0cd7871364576739ca44132f17c14fafd13ce&selectedJob=185847737 )
OS: Unspecified → Android
Assignee

Comment 21

11 months ago
Brief analysis: the test has an iframe, which posts a message back to the main test file (the iframe's opener). That message is expected to be one of several different strings, *or* a JSON-encoded blob.  This bug's intermittent test failure is happening in a scenario where the message isn't any of our expected strings, and so we assume it's the JSON blob and we try to parse it as JSON, and JSON.parse encounters an unexpected character and throw an exception.

I'll post a patch to make us log the string that we're failing to parse, whenever JSON.parse fails here.  Hopefully this will help us in sorting out what's going wrong.

Comment 23

11 months ago
mozreview-review
Comment on attachment 8991679 [details]
Bug 1462983 diagnostic: when test_bug607529.html fails JSON.parse operation, print the string that it choked on.

https://reviewboard.mozilla.org/r/256606/#review263510
Attachment #8991679 - Flags: review?(bzbarsky) → review+

Comment 24

11 months ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b93e0b5846d7
diagnostic: when test_bug607529.html fails JSON.parse operation, print the string that it choked on. r=bz
Assignee

Updated

11 months ago
Keywords: leave-open
Assignee

Comment 26

11 months ago
I uplifted the diagnostic to beta (with a=testonly), since that seems to be where this is failing the most:
https://hg.mozilla.org/releases/mozilla-beta/rev/273c096e4391

I'll circle back in a few days and see if this makes any more sense once we've got some new test failures.
(In reply to Maire Reavy [:mreavy] Plz needinfo? from comment #19)
> Rob, Daniel -- Could pernosco help us with this one?

Sure, searching for it now.
Flags: needinfo?(roc)
Assignee

Comment 28

11 months ago
Got a hit from the diagnostics -- looks like the mystery non-JSON string is 'loaded':
===
> INFO TEST-UNEXPECTED-FAIL | layout/base/tests/test_bug607529.html | JSON.parse threw, when passed string 'loaded'
===
https://treeherder.mozilla.org/logviewer.html#?job_id=187931037&repo=mozilla-beta&lineNumber=1949
Assignee

Comment 29

11 months ago
So "loaded" is what gets sent by the first file that we load in the iframe -- and our string-recognition chain only accepts the first one (via a "doneOneLoad" boolean).  I'm not sure whether the second one is unexpected or not, and if there's anything to be done for the second one or if we can just disregard it.

The test has this comment at the top:

>  /* General idea: Open a new window (needed because we don't bfcache
>     subframes) that uses requestAnimationFrame, navigate it, navigate it
>     back, and verify that the animations are still running.  */

That leads me to believe that we don't particularly care about the second "loaded" message and we could just disregard it...
Depends on: 1475546
Assignee

Updated

11 months ago
No longer depends on: 1475546
Duplicate of this bug: 1475546
Assignee

Comment 31

11 months ago
[not using mozreview anymore, since it'll get confused about the fact that the diagnostic patch already landed but subsequent parts haven't]

Here's a possible fix -- just watching for & explicitly ignoring a redundant "loaded" notification.

I'm guessing (?) that this notification indicates that the opened window wasn't in our bfcache -- if that turns out to be the case, then now the test will time out and fail, due to never receiving a "revived" message.  And I think (?) that'd be the test accurately indicating the presence of a bug with bfcache management, since it seems like we're expecting it to be guaranteed that this page is cached, though I'm not sure.  And if that happens, we can go from there and annotate the failure with an Android-specific "todo" perhaps...

For now, per the comment in the test, I'm unclear on whether that timeout will happen or whether this "loaded" message is just spurious & ignorable & the page will also fire pageshow and end up proceeding to the other parts.  bz, if you have a leaning about what's expected here and what a better way to approach the error would be, please let me know!
Flags: needinfo?(dholbert)
Attachment #8991959 - Flags: review?(bzbarsky)
Assignee

Updated

11 months ago
Attachment #8991959 - Attachment description: possible fix v1 → patch to fix or at least improve the test v1
Assignee

Comment 32

11 months ago
Note also that this test was failing for a while with "notcached" reports, in bug 1427236 -- that is kinda consistent with us reporting "loaded" a second time as we're doing here (though it's odd that we're not reporting the notcached issue anymore).

Bottom-line, there seems to be a real bug with bfcache management in this "w = window.open(); w.location = ...; w.history.back()" flow, specifically on Android, where we're failing to pull the page out of bfcache, and it manifested in bug 1427236 as well as this bug.

My diagnostic patch & the patch in comment 31 make the issue a bit clearer, but to really fix this, we probably need someone who knows the expected bfcache semantics (someone from the DOM team) to take a look here & see to what extent the test's expectations are valid & where they're falling over.
Comment on attachment 8991959 [details] [diff] [review]
patch to fix or at least improve the test v1

So you're right that if we're getting a second "loaded" that means we didn't get opened from bfcache.

Presumably we didn't get the "notcached" message, since that failure is not in the log?  That suggests that we went into bfcache correctly but didn't end up getting restored from it.  I guess it's possible we hit a memory pressure notification or something that evicted us from the bfcache before we could get restored....

I'm not sure whether we should count the test as failing or just as not testing anything useful when that happens.

In any case, the comment about waiting for "revived" is wrong.  IF we get a second "loaded", that "revived" will never come.

My worry is that if we just allow the second "loaded" then we might not notice when we stop being bfcached altogether and this test is just not testing anything useful anymore. So I think I would just prefer an explicit test failure when we fail to bfcache.  Of course that won't solve this intermittent orange problem.  :(
Attachment #8991959 - Flags: review?(bzbarsky) → review-
Assignee

Comment 34

11 months ago
Thanks for the review feedback!  Yeah, it seems we're not getting the "notcached" message.  And memory-pressure does seem like a good idea for something that might be purging us from the cache regardless... That possibility makes this more understandable.

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #33)
> I'm not sure whether we should count the test as failing or just as not
> testing anything useful when that happens.

How about we treat it as a failure on desktop, but a "todo(false,...)" soft-fail on Android (known issue, not much we can do about it, presumably caused by memory pressure which we do want to respond to gracefully)?
Assignee

Updated

11 months ago
Flags: needinfo?(bzbarsky)
Assignee

Comment 35

11 months ago
If memory pressure is indeed what's triggering this, then the only way I can imagine of making this test reliably pass would be to:
 - manually use SpecialPowers.SomeNewSpecialFunction() to disable bfcache's responsiveness to memory pressure just for this test.
...or:
 - clear some caches at the start of this test to free up memory & reduce likelihood of memory pressure happening.

I'm not immediately sure how to do either of those, though, so I'm tempted to just go with comment 34 if that sounds OK (which means we'll be relying on desktop for the full strictness of this test, basically... which isn't terrible IMO.
> How about we treat it as a failure on desktop, but a "todo(false,...)" soft-fail on Android 

Sounds good to me.

One other thing we could do is have the test listen for memory pressure and always pass if it fires during the test, I guess.  But comment 34 seems simpler.
Flags: needinfo?(bzbarsky)
Assignee

Comment 37

11 months ago
Here's a patch along the lines of comment 34.

(I'm leaving the JSON.parse() failure logging diagnostic in, for robustness, but removing the comment that suggested it was about specifically diagnosing this bug.)
Attachment #8992118 - Flags: review?(bzbarsky)
Assignee

Updated

11 months ago
Attachment #8991959 - Attachment is obsolete: true
Assignee

Comment 38

11 months ago
Two notes on the patch:
 - As you can see, I refactored out the "clean up and finish" code, since I'm adding a second place where we're doing that (to avoid timing out in this scenario where we know no more messages will be coming).

 - I cribbed the navigator.appVersion.includes("Android") check from two other mochitests in layout/style; I assume it works as an "is this Android" test.
Flags: needinfo?(bugs)
Comment on attachment 8992118 [details] [diff] [review]
patch to fix or at least improve the test v2

r=me.  Thank you for digging through this!
Attachment #8992118 - Flags: review?(bzbarsky) → review+
We weren't able to reproduce this bug in Pernosco, probably because we never triggered any memory pressure notifications. In fact desktop Linux Firefox doesn't ever dispatch memory-pressure notifications as far as I can tell... That is unfortunate!
Comment hidden (Intermittent Failures Robot)

Comment 42

11 months ago
Pushed by dholbert@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2bef68d99ba
Check for redundant "loaded" messages from opened window, in test_bug607529.html. r=bz
Reporter

Updated

11 months ago
See Also: → 1475939
Assignee

Comment 44

11 months ago
Uplifted to beta: https://hg.mozilla.org/releases/mozilla-beta/rev/b286967e4939


(In reply to Robert O'Callahan (:roc) (email my personal email if necessary) from comment #40)
> We weren't able to reproduce this bug in Pernosco, probably because we never
> triggered any memory pressure notifications.

That's a shame -- thank you for trying!  Fortunately this was straightforward enough to sort out on its own (unless I missed something -- fingers crossed...)
Status: NEW → RESOLVED
Closed: 11 months ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Assignee

Comment 46

11 months ago
Yup, noted in comment 44 but forgot to update status flag. Thanks!
Assignee

Updated

11 months ago
Duplicate of this bug: 1475939
Comment hidden (Intermittent Failures Robot)
You need to log in before you can comment on or make changes to this bug.