Closed Bug 1455824 Opened 2 years ago Closed 6 months ago

Intermittent layout/style/test/test_font_loading_api.html | application timed out after 370 seconds with no output

Categories

(Core :: CSS Parsing and Computation, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE
mozilla68
Tracking Status
firefox68 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: archaeopteryx [at] coole-files.de

https://treeherder.mozilla.org/logviewer.html#?job_id=174754089&repo=try

https://queue.taskcluster.net/v1/task/MqoNS-uSQ--v12hd7FlWAg/runs/0/artifacts/public/logs/live_backing.log

Seen for a central-as-beta simulation.

[task 2018-04-20T12:52:51.636Z] 12:52:51     INFO -  198 INFO None199 INFO TEST-START | layout/style/test/test_font_loading_api.html
[task 2018-04-20T13:00:22.274Z] 13:00:22     INFO -  Browser unexpectedly found running. Killing...
[task 2018-04-20T13:00:22.275Z] 13:00:22     INFO -  TEST-INFO | started process screentopng
[task 2018-04-20T13:00:22.642Z] 13:00:22     INFO -  TEST-INFO | screentopng: exit 0
[task 2018-04-20T13:00:37.218Z] 13:00:37  WARNING -  TEST-UNEXPECTED-FAIL | layout/style/test/test_font_loading_api.html | application timed out after 370 seconds with no output
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
New failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=180125611&repo=mozilla-inbound&lineNumber=1735
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 2 years agoLast year
Resolution: --- → INCOMPLETE
Recent failure log:

https://treeherder.mozilla.org/logviewer.html#?job_id=185972514&repo=mozilla-beta&lineNumber=1395

[task 2018-07-02T15:49:57.487Z] 15:49:57     INFO -  192 INFO None193 INFO TEST-START | layout/style/test/test_font_loading_api.html
[task 2018-07-02T15:57:27.530Z] 15:57:27     INFO -  Browser unexpectedly found running. Killing...
[task 2018-07-02T15:57:27.531Z] 15:57:27     INFO -  TEST-INFO | started process screentopng
[task 2018-07-02T15:57:27.887Z] 15:57:27     INFO -  TEST-INFO | screentopng: exit 0
[task 2018-07-02T15:57:42.572Z] 15:57:42  WARNING -  TEST-UNEXPECTED-FAIL | layout/style/test/test_font_loading_api.html | application timed out after 370 seconds with no output
[task 2018-07-02T15:57:42.572Z] 15:57:42     INFO -  INFO | automation.py | Application ran for: 0:40:18.077557
[task 2018-07-02T15:57:42.573Z] 15:57:42     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmp4ipoZ1pidlog
[task 2018-07-02T15:57:42.894Z] 15:57:42     INFO -  Contents of /data/anr/traces.txt:
[task 2018-07-02T15:57:42.894Z] 15:57:42     INFO -  ----- pid 807 at 2018-07-02 08:57:27 -----
[task 2018-07-02T15:57:42.894Z] 15:57:42     INFO -  Cmd line: org.mozilla.firefox_beta
[task 2018-07-02T15:57:42.895Z] 15:57:42     INFO -  JNI: CheckJNI is on; workarounds are off; pins=0; globals=265
[task 2018-07-02T15:57:42.895Z] 15:57:42     INFO -  DALVIK THREADS:
[task 2018-07-02T15:57:42.896Z] 15:57:42     INFO -  (mutexes: tll=0 tsl=0 tscl=0 ghl=0)
[task 2018-07-02T15:57:42.896Z] 15:57:42     INFO -  "main" prio=5 tid=1 NATIVE
[task 2018-07-02T15:57:42.897Z] 15:57:42     INFO -    | group="main" sCount=1 dsCount=0 obj=0x414c9578 self=0x2a00d090
[task 2018-07-02T15:57:42.897Z] 15:57:42     INFO -    | sysTid=807 nice=0 sched=0/0 cgrp=apps handle=1073811452
[task 2018-07-02T15:57:42.898Z] 15:57:42     INFO -    | state=S schedstat=( 178902406085 92980650526 216589 ) utm=11167 stm=6723 core=0
[task 2018-07-02T15:57:42.898Z] 15:57:42     INFO -    #00  pc 0001c5a4  /system/lib/libc.so (__futex_syscall3+8)
[task 2018-07-02T15:57:42.899Z] 15:57:42     INFO -    #01  pc 0000e688  /system/lib/libc.so (__pthread_cond_timedwait_relative+48)
[task 2018-07-02T15:57:42.899Z] 15:57:42     INFO -    #02  pc 0000e6e8  /system/lib/libc.so (__pthread_cond_timedwait+64)
[task 2018-07-02T15:57:42.900Z] 15:57:42     INFO -    #03  pc 00052e97  /system/lib/libdvm.so
[task 2018-07-02T15:57:42.901Z] 15:57:42     INFO -    #04  pc 00053461  /system/lib/libdvm.so (dvmChangeStatus(Thread*, ThreadStatus)+30)
[task 2018-07-02T15:57:42.901Z] 15:57:42     INFO -    #05  pc 00048d29  /system/lib/libdvm.so
[task 2018-07-02T15:57:42.902Z] 15:57:42     INFO -    #06  pc 00039afd  /system/lib/libdvm.so
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: Last yearLast year
Resolution: --- → INCOMPLETE
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=198432075&repo=mozilla-beta&lineNumber=1533

[task 2018-09-10T14:44:53.188Z] 14:44:53     INFO -  131 INFO TEST-START | layout/style/test/test_font_loading_api.html
[task 2018-09-10T14:53:29.253Z] 14:53:29     INFO -  wait for org.mozilla.firefox_beta complete; top activity=org.mozilla.firefox_beta
[task 2018-09-10T14:53:29.559Z] 14:53:29     INFO -  Browser unexpectedly found running. Killing...
[task 2018-09-10T14:53:29.559Z] 14:53:29     INFO -  TEST-INFO | started process screentopng
[task 2018-09-10T14:53:29.935Z] 14:53:29     INFO -  TEST-INFO | screentopng: exit 0
[task 2018-09-10T14:53:44.398Z] 14:53:44  WARNING -  TEST-UNEXPECTED-FAIL | layout/style/test/test_font_loading_api.html | application timed out after 370 seconds with no output
[task 2018-09-10T14:53:44.399Z] 14:53:44     INFO -  INFO | automation.py | Application ran for: 0:37:35.174032
[task 2018-09-10T14:53:44.400Z] 14:53:44     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpu7O23Opidlog
[task 2018-09-10T14:53:44.716Z] 14:53:44     INFO -  Contents of /data/anr/traces.txt:
[task 2018-09-10T14:53:44.717Z] 14:53:44     INFO -  ----- pid 814 at 2018-09-10 07:53:29 -----
[task 2018-09-10T14:53:44.717Z] 14:53:44     INFO -  Cmd line: org.mozilla.firefox_beta
[task 2018-09-10T14:53:44.717Z] 14:53:44     INFO -  JNI: CheckJNI is on; workarounds are off; pins=0; globals=274
[task 2018-09-10T14:53:44.718Z] 14:53:44     INFO -  DALVIK THREADS:
[task 2018-09-10T14:53:44.718Z] 14:53:44     INFO -  (mutexes: tll=0 tsl=0 tscl=0 ghl=0)
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Hi Geoff,

I found a very strange behavior on "android-em-4-3-armv7-api16". In short, merely adding a mochitest at layout/svg/test without running it drastically increases the intermittent failure rate on another irrelevant mochitest at layout/style/test/test_font_loading_api.html. Since there is no code change, I believe there is something wrong on the testing platform.

This https://treeherder.mozilla.org/#/jobs?repo=try&revision=f838d96cda8bb55312fe8df2c9bc6ab7d7f420fb is a patch that has nothing to do with layout/style/test/test_font_loading_api.html. But it makes this testcase 50% failure rate, before it's 0.1% failure rate. Also note this patch passes all the tests on other platforms.

After some investigation, I found the cause is the testcase layout/svg/tests/test_multiple_font_size.html associated with the patch. I remove the testcase from my patch, without touching the code, and rerun the test. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=f892dc9295281a1366d60f2a36f8f6593d2a9396&selectedJob=237086534. It becomes normal again.

It's unbelievable, since I just selected the mochitest at layout/style/test, my associated testcase at layout/svg/test didn't run in the first place, how can it impact the behavior of an irrelevant test?

What to you think about this strange behavior? Will a testcase that is not even selected be loaded somehow on Android?

Thanks!

Flags: needinfo?(gbrown)

A more straightforward example:

This is a commit from mozilla-central: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0e1d7b7825e10fb3186c1ff6f67497b3d32a22b. Everything is fine.

Then I just added the testcase without running it, without modifying any code:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe1eda26df49e0220a44bc0423fdf6d7a1fb6940

The intermittent failure rate increased drastically.

Adding a skip-if = os == 'android' on the testcase on my patch everything becomes normal. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=6dc79d04b0424f0aa7858451cca429bd773a9bce

So it's clear that Android testing platform will somehow load a testcase from another manifest, even if it's not the selected test, though I couldn't find anything about this loading process in the logcat logging file. Not sure if it's the intended behavior ...

Hi Emilio,

Are you familiar with layout/style/test/test_font_loading_api.html? I suspect this testcase isn't particularly robust on Android platform. In short, merely adding a testcase https://hg.mozilla.org/try/rev/2bbc8e6b08f7c7b35a24f5ab0f2e15c435379ae1 on layout/svg/test can increase the failure rate of this test to 50%. For more context, see Comment 34 and 35 on this thread.

Flags: needinfo?(emilio)

I'm not particularly familiar, no, but that failure is not quite actionable since it's crashing and the stack doesn't show anything. I'm not sure if the test is crashing or just getting killed for timing out. Does increasing the timeout value on android using SimpleTest.requestLongerTimeout help?

If it's just a timeout, the usual way to go about debugging this is figuring out where is the test when getting killed, by adding logging to the test. My next guess with who could be familiar with this would be Cameron (:heycam).

Flags: needinfo?(emilio)

but that failure is not quite actionable since it's crashing and the stack doesn't show anything. I'm not sure if the test is crashing or just getting killed for timing out.

I've investigated the detailed logging, it is a hang. In the TEST 18 in that file, a promise for loading font from a url is always in a pending state. I'm not able to debug Android related stuff since I don't have an Android device... Honestly I discovered this bizarre behavior just because a patch I submitted got backed out for breaking this test. It turns out the breaking is caused by adding the associated testcase of my patch, not the code (which really has nothing to do with font loading...)

I guess you could try to log something here:

https://searchfox.org/mozilla-central/rev/532e4b94b9e807d157ba8e55034aef05c1196dc9/layout/style/FontFace.cpp#419

And figure out whether there's some promise that hasn't been resolved. Other than that I'm out of ideas, but I'd defer to Geoff or Cam on whether it makes sense to disable the test on android to unblock you.

Maybe it's just the test being really slow and increasing the timeout does help. The reason why adding a test may exacerbate this is because (AFAIK) adding a test can change in which order tests are run, and with which other tests they're run at the same time. So if this test starts running when the machine is under way more load, maybe we hit the timeout more often.

Maybe it's just the test being really slow and increasing the timeout does help.

The original test already had a 5x factor for more timeout, I increased the factor to 50x, the failure rate did drop significantly. But 2/20 is probably still a nontrivial failure rate. https://treeherder.mozilla.org/#/jobs?repo=try&revision=c954cc8621372252a500fe1db04fb06edd636d9c

My added testcase must be doing some magic to the scheduling algorithm so that it managed to slow an irrelevant testcase that much ... :(

(In reply to violet.bugreport from comment #33)

Will a testcase that is not even selected be loaded somehow on Android?

In short, no. The log is an accurate reflection of which tests have been run.

Keep in mind that a test task or chunk, like mochitest-56, is not a fixed set of tests, but rather a slice from a list of all the tests: If you add or remove any test, it can affect which tests are run in any chunk. For example, from the links in comment 34:

https://taskcluster-artifacts.net/TUjF9M35SsunuefzQJiT-A/0/public/logs/live_backing.log

[task 2019-03-29T10:50:37.924Z] 10:50:37 INFO - 0 INFO SimpleTest START
[task 2019-03-29T10:50:37.925Z] 10:50:37 INFO - 1 INFO TEST-START | layout/style/test/test_flexbox_order.html
[task 2019-03-29T10:50:58.359Z] 10:50:58 INFO - 2 INFO TEST-OK | layout/style/test/test_flexbox_order.html | took 14017ms
[task 2019-03-29T10:50:58.360Z] 10:50:58 INFO - 3 INFO TEST-START | layout/style/test/test_flexbox_order_abspos.html
[task 2019-03-29T10:51:08.575Z] 10:51:08 INFO - 4 INFO TEST-OK | layout/style/test/test_flexbox_order_abspos.html | took 11783ms
[task 2019-03-29T10:51:08.576Z] 10:51:08 INFO - 5 INFO TEST-START | layout/style/test/test_flexbox_order_table.html
[task 2019-03-29T10:51:29.009Z] 10:51:29 INFO - 6 INFO TEST-OK | layout/style/test/test_flexbox_order_table.html | took 15293ms
[task 2019-03-29T10:51:29.010Z] 10:51:29 INFO - 7 INFO TEST-START | layout/style/test/test_flexbox_reflow_counts.html
[task 2019-03-29T10:51:29.013Z] 10:51:29 INFO - 8 INFO TEST-OK | layout/style/test/test_flexbox_reflow_counts.html | took 3563ms
[task 2019-03-29T10:51:39.228Z] 10:51:39 INFO - 9 INFO TEST-START | layout/style/test/test_font_face_cascade.html
[task 2019-03-29T10:51:39.228Z] 10:51:39 INFO - 10 INFO TEST-OK | layout/style/test/test_font_face_cascade.html | took 2882ms
[task 2019-03-29T10:51:39.229Z] 10:51:39 INFO - 11 INFO TEST-START | layout/style/test/test_font_face_parser.html
[task 2019-03-29T10:51:59.802Z] 10:51:59 INFO - 12 INFO TEST-OK | layout/style/test/test_font_face_parser.html | took 19787ms
[task 2019-03-29T10:51:59.803Z] 10:51:59 INFO - 13 INFO TEST-START | layout/style/test/test_font_family_parsing.html
[task 2019-03-29T10:53:24.057Z] 10:53:24 INFO - 14 INFO TEST-OK | layout/style/test/test_font_family_parsing.html | took 81097ms
[task 2019-03-29T10:53:24.058Z] 10:53:24 INFO - 15 INFO TEST-START | layout/style/test/test_font_family_serialization.html
[task 2019-03-29T10:53:34.379Z] 10:53:34 INFO - 16 INFO TEST-OK | layout/style/test/test_font_family_serialization.html | took 7079ms
[task 2019-03-29T10:53:34.380Z] 10:53:34 INFO - 17 INFO TEST-START | layout/style/test/test_font_loading_api.html
[task 2019-03-29T10:57:17.354Z] 10:57:17 INFO - 18 INFO TEST-OK | layout/style/test/test_font_loading_api.html | took 216342ms

https://taskcluster-artifacts.net/Tqool193TBCJeofUHZtI5g/0/public/logs/live_backing.log

[task 2019-03-30T06:13:56.876Z] 06:13:56 INFO - 0 INFO SimpleTest START
[task 2019-03-30T06:13:56.877Z] 06:13:56 INFO - 1 INFO TEST-START | layout/style/test/test_flexbox_order_abspos.html
[task 2019-03-30T06:14:17.308Z] 06:14:17 INFO - 2 INFO TEST-OK | layout/style/test/test_flexbox_order_abspos.html | took 14275ms
[task 2019-03-30T06:14:17.308Z] 06:14:17 INFO - 3 INFO TEST-START | layout/style/test/test_flexbox_order_table.html
[task 2019-03-30T06:14:27.529Z] 06:14:27 INFO - 4 INFO TEST-OK | layout/style/test/test_flexbox_order_table.html | took 9907ms
[task 2019-03-30T06:14:27.530Z] 06:14:27 INFO - 5 INFO TEST-START | layout/style/test/test_flexbox_reflow_counts.html
[task 2019-03-30T06:14:37.746Z] 06:14:37 INFO - 6 INFO TEST-OK | layout/style/test/test_flexbox_reflow_counts.html | took 4536ms
[task 2019-03-30T06:14:37.747Z] 06:14:37 INFO - 7 INFO TEST-START | layout/style/test/test_font_face_cascade.html
[task 2019-03-30T06:14:37.748Z] 06:14:37 INFO - 8 INFO TEST-OK | layout/style/test/test_font_face_cascade.html | took 2937ms
[task 2019-03-30T06:14:37.748Z] 06:14:37 INFO - 9 INFO TEST-START | layout/style/test/test_font_face_parser.html
[task 2019-03-30T06:14:48.101Z] 06:14:48 INFO - 10 INFO TEST-OK | layout/style/test/test_font_face_parser.html | took 7763ms
[task 2019-03-30T06:14:48.104Z] 06:14:48 INFO - 11 INFO TEST-START | layout/style/test/test_font_family_parsing.html
[task 2019-03-30T06:16:01.511Z] 06:16:01 INFO - 12 INFO TEST-OK | layout/style/test/test_font_family_parsing.html | took 72024ms
[task 2019-03-30T06:16:01.512Z] 06:16:01 INFO - 13 INFO TEST-START | layout/style/test/test_font_family_serialization.html
[task 2019-03-30T06:16:11.930Z] 06:16:11 INFO - 14 INFO TEST-OK | layout/style/test/test_font_family_serialization.html | took 5754ms
[task 2019-03-30T06:16:11.930Z] 06:16:11 INFO - 15 INFO TEST-START | layout/style/test/test_font_loading_api.html
[task 2019-03-30T06:23:28.825Z] 06:23:28 INFO - wait for org.mozilla.fennec_aurora complete; top activity=org.mozilla.fennec_aurora

Notice in the first case, layout/style/test/test_flexbox_order.html was run; in the second case it was not (it would probably have been run in mochitest-55 instead).

You can see when Firefox for Android is started:

[task 2019-03-29T10:47:27.826Z] 10:47:27 INFO - adb launch_application: am start -W -n org.mozilla.fennec_aurora/org.mozilla.gecko.BrowserApp ...

It is unlikely that anything that ran before that could affect a test in that process. On the other hand, it is fairly easy -- and too common! -- for a test to influence the behavior of another test in the same browser session (if one test modifies state in firefox that a later test depends on).

Note that the test harness creates a new profile for each firefox session. I think that mechanism is working fine, but if it isn't, then that is one way that we could see tests in one firefox session influenced by tests from another session. I don't think that could be relevant to your case because the failure happens in the first firefox session of the task: In a new emulator session, with android freshly booted.

Flags: needinfo?(gbrown)

layout/style/test/test_font_loading_api.html is fragile on Android in the sense
that adding a new mochitest in an irrelevant directory (e.g. layout/svg/test)
might cause a failure rate close to 50%. See Comment 34 in Bug 1455824.
Increasing SimpleTest.requestLongerTimeout() doesn't make much difference, see
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c954cc8621372252a500fe1db04fb06edd636d9c.

Comment 43 and 44 from Testing team suggested the drastical failure increasing
is probably only related to different task grouping when a new testcase is added
elsewhere.

We should probably disable it now on Android platform, because it might cause
false-negative and confusion when someone lands an unrelated mochitest in
an irrelevant directory.

Is it true that TEST 18 is the only sub-test that causes a problem? If so, I might prefer to just disable that particular sub-test on Android, rather than the whole test. You can check for navigator.appVersion.includes("Android") and skip over that test.

If there are other things in the test causing problems, I'm fine with disabling it entirely on Android.

Attachment #9055036 - Attachment description: Bug 1455824 - Disable test_font_loading_api.html on Android for fragile timeout behavior → Bug 1455824 - Disable TEST 18 of test_font_loading_api.html on Android for fragile timeout behavior

(In reply to Cameron McCormack (:heycam) from comment #46)

Is it true that TEST 18 is the only sub-test that causes a problem? If so, I might prefer to just disable that particular sub-test on Android, rather than the whole test.

Yes, disabling TEST 18 is enough. See try server result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0994efb692eda6414ce61e32f6000dfdb394da5f

I've updated the patch.

Pushed by violet.bugreport@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d2d7645639c7
Disable TEST 18 of test_font_loading_api.html on Android for fragile timeout behavior r=emilio
Status: REOPENED → RESOLVED
Closed: Last year8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
Assignee: nobody → violet.bugreport

The test is still flaky, it was just disabled on android.

Assignee: violet.bugreport → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 8 months ago6 months ago
Resolution: --- → INCOMPLETE
See Also: → 1577119
You need to log in before you can comment on or make changes to this bug.