Closed Bug 1425698 Opened 2 years ago Closed 9 months ago

wpt missing Ahem font on OSX in CI

Categories

(Testing :: web-platform-tests, defect)

Version 3
defect
Not set

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: emilio, Unassigned)

Attachments

(2 files)

Bug 1425227 fixed a test that relied on the "ahem" flag to run correctly, and it worked fine on my machine (which has Ahem installed) and on the Linux try server.

However, this test failed on OSX machines like this:

  https://treeherder.mozilla.org/logviewer.html#?job_id=151982071&repo=autoland

So I had to back out my patch.

There are tons of tests with:

  <meta name="flags" content="ahem">

It'd be nice to run them correctly.
Blocks: 1425227
So in theory starting wptrunner with --install-fonts does the right thing. But whilst this works on OSX locally, it didn't seem to make any difference in CI. I need to get a loaner or something to figure out what's happening.
Update:

I got a loaner and tried running wpt with --install-fonts. The font appears to be correctly installed to ~/Library/Fonts but it isn't picked up as a font by the system. I have no idea what's going on; I can't find any documentation on account configuration that might stop it working and the same font installed into the global font directory does work. A solution is to add Ahem directly to the puppet configs, but when I looked at that it was hard to figure out where it would go.
Summary: Figure out a way to run WPT tests with the Ahem flag correctly. → wpt missing Ahem font on OSX in CI
jfkthame: Do you have any idea why trying to install a font on OSX by adding it to ~/Library/Fonts isn't working here? It seems to depend on something specific to the OSX configuration since it was working locally but not on the CI images.
Flags: needinfo?(jfkthame)
https://github.com/web-platform-tests/results-collection/issues/218 is a similar issue observed on Sauce Labs. In that case it seemed like restarting the font server after creating ~/Library/Fonts helped, but locally deleting that directory, rebooting, and running wpt with --install-fonts doesn't make the test fail. It is possible that ensuring ~/Library/Fonts exists for the test user in the puppet config would solve this, but I'm not sure how to test that.
(In reply to James Graham [:jgraham] from comment #4)
> jfkthame: Do you have any idea why trying to install a font on OSX by adding
> it to ~/Library/Fonts isn't working here? It seems to depend on something
> specific to the OSX configuration since it was working locally but not on
> the CI images.

Is it possible the tests are being run under a different user than the font is installed for? Fonts in a user's ~/Library/Fonts would only be available to that specific user.

Hmm, comment 5 makes it sound like that probably isn't the issue.
Flags: needinfo?(jfkthame)
I wonder if it has something to do with how the font files end up on the machine: are they being marked as "quarantined" by macOS in some way, and need an explicit user confirmation that it's OK to go ahead and use them?
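The quarantine theory could be checked by listing the extended attributes of the installed font file. A minimal Python sketch, assuming the macOS `xattr` tool prints one attribute name per line for a single file and that quarantined files carry the `com.apple.quarantine` attribute; the function names are hypothetical and nothing in this bug confirms that the font was quarantined:

```python
import subprocess

# Extended attribute that macOS sets on quarantined files.
QUARANTINE_ATTR = "com.apple.quarantine"


def has_quarantine_attr(xattr_listing):
    """Return True if the attribute names printed by `xattr <file>` include quarantine."""
    names = [line.strip() for line in xattr_listing.splitlines()]
    return QUARANTINE_ATTR in names


def is_quarantined(path):
    """Run `xattr` (macOS only) on path and look for the quarantine attribute."""
    result = subprocess.run(
        ["xattr", path], capture_output=True, text=True, check=True
    )
    return has_quarantine_attr(result.stdout)
```

Keeping the parsing separate from the `xattr` call means the check can be exercised on captured output from a CI machine without shell access to it.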
bhearsum: Can we ensure that ~/Library/Fonts exists for the builder user on the macs? I assume this requires some configuration change, but it was pretty unclear to me where best to put this in the puppet configs. It is of course possible that this directory already exists and there's some other problem; I'd need to get a loaner again to verify one way or another.
Flags: needinfo?(bhearsum)
I'll look at a builder and see what I can find out.
Flags: needinfo?(dhouse)
For clarity: I'm assuming that the builder user is the one that actually runs mozharness/tests, if that isn't the case please substitute in the correct user there :)
Flags: needinfo?(bhearsum)
If you can't get the ~/Library/Fonts install working, then it might be
worth trying to install it globally in /Library/Fonts instead.
I don't know if we have enough permissions to get sudo access from tasks? If not we obviously can just install the font onto the machines directly, but that does have the disadvantage that if we change the font we need to update the worker configuration, so it seems less than ideal.
Depends on: 1492703
We do not allow sudo from the tasks. I agree with keeping the fonts for tests in the test user's ~/Library/Fonts. In the little I researched this issue, I saw some reported issues on osx with user fonts not appearing, but I've seen nothing very clear or testable. In my experience, I have seen problems with specific fonts not having the correct postscript font name, or other internal font data, and that preventing OSX from finding the font correctly. But that would consistently fail and if that were the problem with "Ahem", I'd expect it to not work when manually tested on a macbook with similar osx version.

I requested a loaner to test it out myself to see if I can break and fix local fonts, and CIDuty may be able to test and research continuing with what I find.
dhouse: Did you have any luck with the loaner? It should be possible to reproduce with a gecko checkout / artifact build running something like `mach wpt --install-fonts testing/web-platform/tests/infrastructure/assumptions/ahem.html` (or something similar with the tests packages from a build). If the font is available the test will pass, otherwise it won't. You can also copy the font to ~/Library/Fonts and verify that the OSX font tools don't immediately recognise it. I previously found that installing it in the system font directory *does* work, and it works on a mac machine locally, so the theory that the font is corrupted doesn't seem to fit the facts.
Is there any progress on this?
So I spent some time with this on a local mac. It seems like it's possible to reproduce the problem by creating a user on the command line and su-ing into that user. The available fonts (according to `system_profiler SPFontsDataType` or starting e.g. Firefox and trying) don't include any of the user fonts in ~/Library/Fonts. Switching to that user in the GUI and trying the same thing *does* show the user fonts. So it seems like one of two things must be true: either a) the setup code that runs when you first enter the GUI is required to register the font directory, or b) the available font paths only depend on the GUI user and not the user starting the application process. To distinguish these possibilities I logged in as a user and then did su to access another user for which the GUI setup steps had previously been run. In this case the listed fonts still corresponded to the GUI login user. So it seems like b) is true, i.e. we get fonts for the user running the graphical session, not for the user launching the Firefox process.

I'm assuming that at some point in the setup on automation we su to the build user and therefore don't have access to fonts in ~/Library/Fonts?

If the setup is as I expect I'm not sure what solution is possible here. Presumably making the build user run the GUI session would be a big change? Maybe it's possible to make /Library/Fonts globally writable so we can install the font there? That might also have downsides.
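The comparison described above (which user's fonts a given process context can see) can be scripted by filtering the `Location:` lines of `system_profiler SPFontsDataType`. A minimal Python sketch, assuming the output lists each font file on a line of the form `Location: <path>`; the function names are hypothetical:

```python
import subprocess


def font_locations(profiler_output):
    """Return every path listed on a 'Location:' line of the profiler output."""
    locations = []
    for line in profiler_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Location:"):
            locations.append(stripped[len("Location:"):].strip())
    return locations


def user_fonts(profiler_output, user_font_dir):
    """Keep only the fonts that live under the given user font directory."""
    prefix = user_font_dir.rstrip("/") + "/"
    return [p for p in font_locations(profiler_output) if p.startswith(prefix)]


def visible_user_fonts(user_font_dir):
    """Run system_profiler (macOS only) and list the user fonts it reports."""
    result = subprocess.run(
        ["system_profiler", "SPFontsDataType"],
        capture_output=True, text=True, check=True,
    )
    return user_fonts(result.stdout, user_font_dir)
```

Running `visible_user_fonts` once from the GUI session and once after `su` to another user, each time with both users' font directories, would show whose fonts the process actually gets.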
I suspect that's the expected behavior of the su command.
Does it work if you change "su" to "su -" instead?
(i.e. make it behave as a login shell)
AFAICT when it's a login shell I just don't see any user fonts at all; neither the ones for the current user nor the ones for the GUI session user. Restarting the font server using `atsutil server -shutdown` doesn't seem to make any difference.
Similar to user su/switching, it may be a problem with how the worker processes are run automatically. Maybe they start before the gui is active or are not given the full environment (like `su -`).

The configuration in automation is:
* The builder user is set as the autoLoginUser
* A LaunchAgent is set to run the worker wrapper script
* There is a fake display (hdmi dongle I think) attached
So when the mac boots, it auto-logs-in as the builder user and runs the worker to then pull and run tests.
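For readers unfamiliar with this kind of setup, a LaunchAgent is defined by a property list. The sketch below generates one with Python's `plistlib`, using the standard launchd keys `Label`, `ProgramArguments` and `RunAtLoad`. The label and script path are hypothetical, and the real automation configuration is not shown in this bug, so treat it as an illustration of the mechanism only:

```python
import plistlib


def launch_agent_plist(label, program_arguments):
    """Serialize a LaunchAgent definition that starts when the agent is loaded."""
    agent = {
        "Label": label,
        "ProgramArguments": list(program_arguments),
        "RunAtLoad": True,
    }
    return plistlib.dumps(agent)


# Hypothetical label and script path, for illustration only.
plist_bytes = launch_agent_plist(
    "org.example.worker-wrapper",
    ["/usr/local/bin/run-worker.sh"],
)
```

The resulting bytes would be written to a `.plist` file in a LaunchAgents directory so that launchd loads it as part of the auto-logged-in user's session.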

I tested gui/none on the loaner I have, and confirmed that I see user fonts when there is an active gui session for the user (whether I am checking the font list from that gui or from a remote shell). And a full shell/environment is required (If I `su` to the active gui user, then the user fonts are not displayed. If I `su -`, then the user fonts are displayed).

I'll test running my loaner with an autologinuser and see what I can find out about the environment there.
Well, it doesn't look like it is caused by the autologin. But it could still be a timing issue, or that the worker process does not have the full environment:

I set autologin for my newly created user account. After a reboot, I ssh'd in (no vnc session) and checking for the user font and display succeeded:
```
[dhouse@t-yosemite-r7-246.test.releng.mdc1.mozilla.com ~]$ /usr/local/bin/font-check.sh 
Tue Oct  9 22:22:54 PDT 2018
/usr/local/bin/font-check.sh
      Location: /Local/Users/dhouse/Library/Fonts/OpenDyslexic-Regular.otf
Graphics/Displays:

    Intel Iris:

      Chipset Model: Intel Iris
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel (0x8086)
      Device ID: 0x0a2e
      Revision ID: 0x0009
      Displays:
        Display:
          Resolution: 1600 x 1200 @ 60 Hz
          Pixel Depth: 32-Bit Color (ARGB8888)
          Display Serial Number: 1600x1200 60 
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Rotation: Supported
```

I checked for a display because I found that, when logged into the gui with vnc, (user fonts are found and) a display is also reported:
```
[cltbld@t-yosemite-r7-246.test.releng.mdc1.mozilla.com ~]$ system_profiler SPDisplaysDataType
Graphics/Displays:

    Intel Iris:

      Chipset Model: Intel Iris
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel (0x8086)
      Device ID: 0x0a2e
      Revision ID: 0x0009
      Displays:
        Display:
          Resolution: 1600 x 1200 @ 60 Hz
          Pixel Depth: 32-Bit Color (ARGB8888)
          Display Serial Number: 1600x1200 60 
          Main Display: Yes
          Mirror: Off
          Online: Yes
          Rotation: Supported

# logged out of the cltbld user gui session:
[cltbld@t-yosemite-r7-246.test.releng.mdc1.mozilla.com ~]$ system_profiler SPDisplaysDataType
Graphics/Displays:

    Intel Iris:

      Chipset Model: Intel Iris
      Type: GPU
      Bus: Built-In
      VRAM (Dynamic, Max): 1536 MB
      Vendor: Intel (0x8086)
      Device ID: 0x0a2e
      Revision ID: 0x0009
```
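The difference between the two outputs above (a `Displays:` subsection is present only while the GUI session is active) can be detected mechanically. A small Python sketch, assuming the `system_profiler SPDisplaysDataType` layout shown in these logs; the function name is hypothetical:

```python
def reports_display(profiler_output):
    """Return True if the output contains a 'Displays:' subsection.

    The top-level 'Graphics/Displays:' header is always present, so only a
    line that is exactly 'Displays:' after stripping whitespace counts.
    """
    return any(
        line.strip() == "Displays:" for line in profiler_output.splitlines()
    )
```

A worker wrapper could log this value at startup to record whether a GUI session was active when the tests began.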
Flags: needinfo?(dhouse)
12+ hours later, without user activity, the user font and display are still found. So, the system isn't breaking them with a screensaver, hardware/software power savings, or something else related to inactive time.

Other things I want to test:
1. Does a launchagent process see the user fonts and display the same as an active console session?
  a. Does this change within seconds of starting?
  b. Is the agent started before the gui is fully active?
  c. When is the font daemon started?
2. Is the test's install of the font doing something different or failing in some way?
  a. Users do not have a Fonts directory by default. Is the install creating the directory?
3. Test with the actual test's font.
  a. Is the test font in the same font format that I've tested?
4. If a font is installed by a process,
  a. when is that font registered and visible by the same process?
  b. or by another/child process?
5. Is the font visibility the problem? (Is this testing/investigation not relevant?)
  a. review the actual problem
  b. If it might be "visible" by the process, but not usable:
    1. Is the test trying to use the font too soon (before it is loaded/parsed?)?
    2. Can we get more logging or debug within the test processes to see the font object?
re: 2.

The install of the font within the test is done by wptrunner: https://github.com/web-platform-tests/wpt/blob/master/tools/wptrunner/wptrunner/font.py#L68-L88
It copies the font into the ~username/Library/Fonts directory (with shutil's copy2), and then uses system_profiler to check for the font.
So that is the same as what we've tested.

I saw mention of font auto-activation in some issues, and confirmed auto-activation is turned on for the builder user (default. viewed in font book's preferences).
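The install step described above can be approximated as follows. This is a simplified Python sketch and not the actual wptrunner code: it only covers copying the font with `shutil.copy2` and creating the directory when it is missing, and the function names are hypothetical:

```python
import os
import shutil


def install_font(font_path, fonts_dir):
    """Copy font_path into fonts_dir, creating the directory if needed.

    Returns (installed_path, created_dir) so the caller can undo both later.
    """
    created_dir = not os.path.isdir(fonts_dir)
    os.makedirs(fonts_dir, exist_ok=True)
    installed_path = os.path.join(fonts_dir, os.path.basename(font_path))
    shutil.copy2(font_path, installed_path)
    return installed_path, created_dir


def remove_font(installed_path, created_dir):
    """Delete the installed font, and the directory too if we created it."""
    os.remove(installed_path)
    fonts_dir = os.path.dirname(installed_path)
    if created_dir and not os.listdir(fonts_dir):
        os.rmdir(fonts_dir)
```

Tracking whether the directory was created by the install makes it possible to leave the machine in its original state, which matters when testing whether a pre-existing ~/Library/Fonts changes the outcome.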
re: 2a.

I cloned and ran wpt directly to perform the infrastructure/assumptions/ahem.html test. When run from a console, separate from the gui but with the user gui session active, the test copies the Ahem font into the ~user/Library/Fonts directory and the test passes (the ~/Library/Fonts directory is created, and the font is left after). (The font is being used: I added a test loading a font that I could recognize in the test as loaded and confirmed I see the newly installed user font is being used.)

Whether the test is run headless or not does not change the font's visibility in the test.

So I did not get a font failure with wpt/Ahem.

6. Does the test with the user font (Ahem) always fail? When I delete the font from the user fonts directory, the font cache keeps a copy in its database. If the font is being removed at some point, we might have a mismatch.
Here's another try run [1] with the patch to use --install-fonts rather than the (also-broken-on-mac) custom thing we were doing before that. The Ahem test in testing/web-platform/tests/infrastructure/assumptions/ahem.html is still failing on mac (the job passes because it's marked as an expected fail), but there isn't a test that runs before that one, or anything else obvious that would fit the cache-corruption theory.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a0e97d1a1e6ff59044935c8db98f782ec772e66e&selectedJob=204821422
I tested some more of this but no cause found yet:

(In reply to Dave House [:dhouse] from comment #21)
> 1. Does a launchagent process see the user fonts and display the same as an
> active console session?

When the user fonts were installed before reboot:
Yes, from an at-load LaunchAgent.
Yes, from a LaunchAgent triggered by a semaphore, created by a system at-boot LaunchDaemon (like we use in automation for the test task).
Also, I tested the LaunchAgent running wpt for the ahem test and it passes. NOTE: It doesn't look like the LaunchAgent gets the full gui user's environment (PATH did not include /usr/local/bin to find the virtualenv binary).

When the user fonts were not installed before reboot, and I tested wpt again from a LaunchAgent:
It installs the user font and the test passes.
```
user: dhouse
Thu Oct 11 16:44:14 PDT 2018
ls: /Users/dhouse/Library/Fonts/: No such file or directory
[...]
 0:49.21 pid:449 1539301512816	Marionette	INFO	Testing http://web-platform.test:8000/infrastructure/assumptions/ahem.html == http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html
[...]
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 1 checks (1 tests)
Expected results: 1
OK
 0:50.51 INFO Closing logging queue
 0:50.51 INFO queue closed
Thu Oct 11 16:45:14 PDT 2018
#system_profiler:
      Location: /Users/dhouse/Library/Fonts/Ahem.ttf
      Location: /Users/dhouse/Library/Fonts/OpenDyslexicMono-Regular.ttf
total 360
drwxr-xr-x   4 dhouse  1000     136 Oct 11 16:45 .
drwx------@ 44 dhouse  1000    1496 Oct 11 16:44 ..
-rw-r--r--   1 dhouse  1000   21768 Oct 10 15:51 Ahem.ttf
-rw-r--r--   1 dhouse  1000  157400 Oct 10 14:58 OpenDyslexicMono-Regular.ttf
```

>   a. Does this change within seconds of starting?

It does not appear to change.

>   b. Is the agent started before the gui is fully active?

Possibly. The gui starts at the end of system startup. "As the final part of system initialization, launchd launches loginwindow" "[...] loginwindow bypasses the usual login prompt and begins the user session immediately [...] when the system administrator has configured the computer to automatically log in as a specified user." (https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/Lifecycle.html)

In my testing, the gui/autologin were ready when the test ran. However, the startup timing could vary. If the test sometimes passes then that may be why. If that happens, we could set another semaphore from a user LaunchAgent (...how entangled will this become?).
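If startup timing did turn out to matter, an alternative to another semaphore would be for the worker to poll until the font is visible before starting tests. A generic Python sketch of such a wait loop; the `check` callable is hypothetical (it could, for example, query `system_profiler` for the font), and nothing in this bug confirms that timing is the cause:

```python
import time


def wait_until(check, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or timeout seconds pass.

    Returns True as soon as check() succeeds, otherwise False once the
    deadline is reached. check() is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real delays.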

>   c. When is the font daemon started?

OSX does not guarantee services are started, but starts them on first use if they are not already started:
"In OS X v10.4 and later, most low-level services are started with launchd. By the time your startup item starts executing, launchd is running, and any attempt to access any of the services provided by a launchd daemon will result in that daemon starting. Thus, you can safely assume (or at least pretend) that any of these services are running by the time your startup item is called." (https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/StartupItems.html#//apple_ref/doc/uid/20002132-106927)

So we can assume fontd is started when we use it.
(In reply to James Graham [:jgraham] from comment #24)
> Here's another try run [1] with the patch to use --install-fonts rather than
> the (also-broken-on-mac) custom thing we were doing before that. The Ahem
> test in testing/web-platform/tests/infrastructure/assumptions/ahem.html is
> still failing on mac (the job passes because it's marked as an expected
> fail), but there isn't a test that runs before that one, or anything else
> obvious that would fit the cache-corruption theory.
> 
> [1]
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=a0e97d1a1e6ff59044935c8db98f782ec772e66e&selectedJob=2
> 04821422

Thank you! I'll review this (and maybe run it on a staging worker to compare with my manual set-up).
(In reply to Dave House [:dhouse] from comment #26)
> (In reply to James Graham [:jgraham] from comment #24)
> > Here's another try run [1] with the patch to use --install-fonts rather than
> > the (also-broken-on-mac) custom thing we were doing before that. The Ahem
> > test in testing/web-platform/tests/infrastructure/assumptions/ahem.html is
> > still failing on mac (the job passes because it's marked as an expected
> > fail), but there isn't a test that runs before that one, or anything else
> > obvious that would fit the cache-corruption theory.
> > 
> > [1]
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=a0e97d1a1e6ff59044935c8db98f782ec772e66e&selectedJob=2
> > 04821422
> 
> Thankyou! I'll review this (and maybe run it on a staging worker to compare
> with my manual set-up).

I triggered a try run on a copy of the changes on a staging server with a debug build (https://treeherder.mozilla.org/#/jobs?repo=try&revision=88e0b88a9bdfd2e869330c65e313312daf95d6a2&selectedJob=205005065),
and it passes (https://tools.taskcluster.net/groups/VQYIYONZStSE9vAaohpOSw/tasks/aEZEK27yQr-5lC8WyxjO1A/runs/0/logs/public%2Flogs%2Flive_backing.log).
```
01:21:31     INFO - Copy/paste: /Users/cltbld/tasks/task_1539332354/build/venv/bin/python -u /Users/cltbld/tasks/task_1539332354/build/tests/web-platform/runtests.py --log-raw=- --log-raw=/Users/cltbld/tasks/task_1539332354/build/blobber_upload_dir/wpt_raw.log --log-wptreport=/Users/cltbld/tasks/task_1539332354/build/blobber_upload_dir/wptreport.json --log-errorsummary=/Users/cltbld/tasks/task_1539332354/build/blobber_upload_dir/wpt_errorsummary.log --binary=/Users/cltbld/tasks/task_1539332354/build/application/NightlyDebug.app/Contents/MacOS/firefox --symbols-path=https://queue.taskcluster.net/v1/task/TueANkUOQFqNYpr5G63Nvg/artifacts/public/build/target.crashreporter-symbols.zip --stackwalk-binary=/Users/cltbld/tasks/task_1539332354/build/macosx64-minidump_stackwalk --stackfix-dir=/Users/cltbld/tasks/task_1539332354/build/tests/bin --run-by-dir=3 --no-pause-after-test --install-fonts --exclude=css --test-type=reftest --test-type=reftest --stylo-threads=4 [...]
[...]
01:28:25     INFO - PID 2158 | ++DOCSHELL 0x12daee800 == 8 [pid = 2158] [id = {6d859c70-8fe8-b743-b0e6-09e1be87baf9}]
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 19 (0x12dec9e00) [pid = 2158] [serial = 19] [outer = 0x0]
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 20 (0x12dedcc00) [pid = 2158] [serial = 20] [outer = 0x12dec9e00]
01:28:25     INFO - PID 2158 | ++DOCSHELL 0x11efe5800 == 1 [pid = 2162] [id = {02c2aff7-9133-654f-af55-4ffd3201661d}]
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 1 (0x11703c800) [pid = 2162] [serial = 1] [outer = 0x0]
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 2 (0x11efc5000) [pid = 2162] [serial = 2] [outer = 0x11703c800]
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 3 (0x1263b1400) [pid = 2162] [serial = 3] [outer = 0x11703c800]
01:28:25     INFO - PID 2158 | 1539332905745	Marionette	INFO	Testing http://web-platform.test:8000/infrastructure/assumptions/ahem.html == http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html
01:28:25     INFO - PID 2158 | [Child 2162, Main Thread] WARNING: site security information will not be persisted: file /builds/worker/workspace/build/src/security/manager/ssl/nsSiteSecurityService.cpp, line 553
01:28:25     INFO - PID 2158 | ++DOMWINDOW == 4 (0x1263b7c00) [pid = 2162] [serial = 4] [outer = 0x11703c800]
01:28:26     INFO - PID 2158 | ++DOMWINDOW == 5 (0x12dc4c000) [pid = 2162] [serial = 5] [outer = 0x11703c800]
01:28:26     INFO - PID 2158 | 1539332906094	Marionette	INFO	Testing http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html != http://web-platform.test:8000/infrastructure/assumptions/ahem-notref.html
01:28:26     INFO - PID 2158 | ++DOMWINDOW == 6 (0x12ddca800) [pid = 2162] [serial = 6] [outer = 0x11703c800]
01:28:26     INFO - PID 2158 | ++DOMWINDOW == 7 (0x12ddd1800) [pid = 2162] [serial = 7] [outer = 0x11703c800]
01:28:26     INFO - TEST-UNEXPECTED-PASS | /infrastructure/assumptions/ahem.html | Testing http://web-platform.test:8000/infrastructure/assumptions/ahem.html == http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html
01:28:26     INFO - Testing http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html != http://web-platform.test:8000/infrastructure/assumptions/ahem-notref.html
```

I pulled the images out of the log and verified it is drawing with the user font (boxes in test). I'll attach the images.

I am running more tests and checking on the staging machines to see what is different and if they all succeed in installing and displaying the user font.
Could you make another try push to test again if the font will get installed and load using the new wpt --install-fonts argument? In my test on staging, it worked.
Flags: needinfo?(james)
Depends on: 1500081
Flags: needinfo?(james)
I pushed bug 1500081 to autoland, and if you look at the results in the associated jobs, the ahem test is still failing on mac:

07:49:28     INFO - TEST-START | /infrastructure/assumptions/ahem.html
07:49:29     INFO - PID 1001 | ++DOCSHELL 0x12a049800 == 9 [pid = 1001] [id = {6c24771e-18c0-2146-9b2d-3b5380e5fac8}]
07:49:29     INFO - PID 1001 | [Parent 1001, Main Thread] WARNING: NS_ENSURE_TRUE(browserChrome) failed: file /builds/worker/workspace/build/src/docshell/base/nsDocShell.cpp, line 12726
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 22 (0x129f03c00) [pid = 1001] [serial = 22] [outer = 0x0]
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 23 (0x129d05c00) [pid = 1001] [serial = 23] [outer = 0x129f03c00]
07:49:29     INFO - PID 1001 | ++DOCSHELL 0x126f09800 == 1 [pid = 1005] [id = {503da3fa-abce-df49-bee1-2944781e50f8}]
07:49:29     INFO - PID 1001 | [Child 1005, Main Thread] WARNING: NS_ENSURE_TRUE(browserChrome) failed: file /builds/worker/workspace/build/src/docshell/base/nsDocShell.cpp, line 12726
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 1 (0x117a40400) [pid = 1005] [serial = 1] [outer = 0x0]
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 2 (0x11fcd1800) [pid = 1005] [serial = 2] [outer = 0x117a40400]
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 3 (0x126fbfc00) [pid = 1005] [serial = 3] [outer = 0x117a40400]
07:49:29     INFO - PID 1001 | 1539874169294	Marionette	INFO	Testing http://web-platform.test:8000/infrastructure/assumptions/ahem.html == http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html
07:49:29     INFO - PID 1001 | [Child 1005, Main Thread] WARNING: site security information will not be persisted: file /builds/worker/workspace/build/src/security/manager/ssl/nsSiteSecurityService.cpp, line 553
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 4 (0x126fc5c00) [pid = 1005] [serial = 4] [outer = 0x117a40400]
07:49:29     INFO - PID 1001 | ++DOMWINDOW == 5 (0x12e852000) [pid = 1005] [serial = 5] [outer = 0x117a40400]
07:49:29     INFO - PID 1001 | 1539874169627	Marionette	INFO	Found 134880 pixels different, maximum difference per channel 255
07:49:29     INFO - TEST-FAIL | /infrastructure/assumptions/ahem.html | took 716ms

https://treeherder.mozilla.org/logviewer.html#?job_id=206381763&repo=autoland&lineNumber=14553
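When comparing many runs like the ones quoted in this bug, it can help to pull the test status and the reftest pixel difference out of the raw log. A Python sketch, assuming the line formats seen in the excerpts here (`TEST-FAIL | <test id>`, `TEST-UNEXPECTED-PASS | <test id>`, and `Found N pixels different, maximum difference per channel M`); the function name is hypothetical:

```python
import re

# Matches harness status lines such as "INFO - TEST-FAIL | /path/to/test.html".
STATUS_RE = re.compile(r"INFO - (TEST-[A-Z-]+) \| (\S+)")
# Matches the reftest comparison message logged via Marionette.
PIXELS_RE = re.compile(
    r"Found (\d+) pixels different, maximum difference per channel (\d+)"
)


def summarize(log_text, test_id):
    """Return (statuses, pixel_diffs) extracted from a log excerpt.

    statuses holds the non-START status strings reported for test_id.
    pixel_diffs holds every (pixels, max_channel_difference) pair in the
    excerpt; these lines do not name a test, so pass a single-test excerpt.
    """
    statuses = []
    pixel_diffs = []
    for line in log_text.splitlines():
        status = STATUS_RE.search(line)
        if status and status.group(2) == test_id and status.group(1) != "TEST-START":
            statuses.append(status.group(1))
        pixels = PIXELS_RE.search(line)
        if pixels:
            pixel_diffs.append((int(pixels.group(1)), int(pixels.group(2))))
    return statuses, pixel_diffs
```

Applied to the excerpt above, this separates a missing font (a large pixel difference followed by `TEST-FAIL`) from a run where the font was picked up (`TEST-UNEXPECTED-PASS`).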
(In reply to James Graham [:jgraham] from comment #31)
> 07:49:29     INFO - PID 1001 | 1539874169627	Marionette	INFO	Found 134880
> pixels different, maximum difference per channel 255
> 07:49:29     INFO - TEST-FAIL | /infrastructure/assumptions/ahem.html | took
> 716ms

I ran a larger set of tests and I also got a failure. I'll retry a few times to see if I get a pass again, and to see if something is different.
Flags: needinfo?(dhouse)
Did something change? I just saw the Ahem test pass on an unrelated try push, and I see something similar on inbound: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&searchStr=Wr&selectedJob=206614523
My attempts show unexpected test passes for the ahem reftest again on
staging: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9df4569f94b8808acbf26bcc4b2c25e3714d8374&selectedJob=206438717
and try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=94f398e13649a5acad825c536cf595db4d31c33e&selectedJob=206479020
```
14:16:41     INFO - TEST-UNEXPECTED-PASS | /infrastructure/assumptions/ahem.html | Testing http://web-platform.test:8000/infrastructure/assumptions/ahem.html == http://web-platform.test:8000/infrastructure/assumptions/ahem-ref.html
```

I imported the changeset from the bug 1500081 push, and re-ran tests with that on staging. And I get the same unexpected-pass results:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=097ab68455f3250027b74bc2a8b47148ea92f2e2&selectedJob=206539016
I've triggered the same (with only the changeset and not my changes to run the tasks on staging) on the try pool:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7324c55b85681685fa41bf42cfdcb91fd59fa840

So, I'll wait and see if those unexpectedly pass also.
(In reply to James Graham [:jgraham] from comment #33)
> Did something change? I just saw the Ahem test pass on an unrelated try
> push, and I see something similar on inbound:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&searchStr=Wr&selectedJob=206614523

I wonder if the font is being left behind by tests that run with "--install-fonts". I checked over the minis and see the Ahem.ttf font in the user fonts directory for many of the machines (but they are actively running tests, and so it could be in place for the test).

That is the only change I know of (running these tests). But that does not match with my testing on staging where I know the font is not being left behind (I looked just now and it is not in the ~cltbld/Library/Fonts/ directory on the four staging minis where I ran the tests last night).
(In reply to Dave House [:dhouse] from comment #35)
> (In reply to James Graham [:jgraham] from comment #33)
> > Did something change? I just saw the Ahem test pass on an unrelated try
> > push, and I see something similar on inbound:
> > https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> > inbound&searchStr=Wr&selectedJob=206614523
> 
> I wonder if the font is being left behind by tests that run with
> "--install-fonts". I checked over the minis and see the Ahem.ttf font in the
> user fonts directory for many of the machines (but they are actively running
> tests, and so it could be in place for the test).
> 
> That is the only change I know of (running these tests). But that does not
> match with my testing on staging where I know the font is not being left
> behind (I looked just now and it is not in the ~cltbld/Library/Fonts/
> directory on the four staging minis where I ran the tests last night).

I think the ~cltbld/Library/Fonts/ directory is being created and left when the font file is removed after the test (the directory does not exist by default). This might allow the old code to install the font without needing to switch to --install-fonts.
(In reply to Dave House [:dhouse] from comment #36)
> I think the ~cltbld/Library/Fonts/ directory is being created and left when
> the font file is removed after the test (the directory does not exist by
> default). This might allow the old code to install the font without needing
> to switch to --install-fonts.

I'll change the staging minis to force remove the Fonts directory before running each test. Then I can see if the --install-fonts test passes without that (or if it was only passing on the second+ runs when the directory existed).
So --install-fonts is now being used. But yeah, it seems plausible that the font is picked up iff a previous test run created that directory on the machine. So perhaps the solution here is to ensure that puppet creates the directory?
(In reply to James Graham [:jgraham] from comment #38)
> So --install-fonts is now being used. But yeah it seems plausible that font
> is picked up iff a previous test run created that directory on the machine.
> So perhaps the solution here is to ensure that puppet creates the directory?

Okay, I didn't know the change to use --install-fonts was merged.

I'd like to not create the directory if we don't have to since OSX doesn't create it by default. I'll test and see if we need to create it.
I re-ran the tests on staging with the ~cltbld/Library/Fonts directory removed between tests, and the tests still pass (https://treeherder.mozilla.org/#/jobs?repo=try&revision=238c12bc8ebbf0b8b591e26c3416fb7afd3f1a23&selectedJob=206636725).

Is it possible that there was code (that changed) to cause the failure in the autoland runs, and your previous try run? There has been no change to the test machines, and so I think the changes in code or tests are the cause of it now passing.
Flags: needinfo?(dhouse) → needinfo?(james)
Looking at the history of the tests on inbound, the failures (i.e. tests passing) don't correspond to a wpt change, and there are some cases where it's failing on debug and passing in opt on the same build (see [1]), whereas recently it seems to fail everywhere. So I still think there's some state being propagated as it runs, or similar.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&searchStr=Wr%2COSX&fromchange=c06d1f31c0914b09091a7e2d531c782607504d0e&selectedJob=206296455
Flags: needinfo?(james)

Given bug 1490969 comment 47 "All CSS tests now run on Mac and Windows (both with Ahem)", can we close this bug as fixed now?

Flags: needinfo?(james)

I don't think we ever fully understood why things started working, but it seems to work stably, so resolving as WFM.

Status: NEW → RESOLVED
Closed: 9 months ago
Flags: needinfo?(james)
Resolution: --- → WORKSFORME