Closed Bug 1832469 Opened 2 years ago Closed 2 years ago

resource:// URLs fail to load in Windows 10 builds

Categories

(Core :: Audio/Video: GMP, defect, P2)

Desktop
Windows 10

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 --- fixed
firefox115 --- fixed

People

(Reporter: aminomancer, Assigned: aosmond)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Last week I noticed my artifact builds on Windows 10 have some major problems. The first thing I noticed is that about:newtab fails to load. Same with about:welcome. The content devtools and the multiprocess browser toolbox also fail to load; they get stuck on a blank page. However, there isn't any useful console output. Eventually I pieced together that no resource:// URL works. That is, typing any resource:// URL into the address bar results in a blank page (though without the browser console message you'd get if you typed a resource:// URL to a truly nonexistent address).

However, I tried this on my MacBook and could not reproduce the problem. I asked several other people and nobody had noticed it. So I thought maybe it's a problem on my end. But I tried checking out older revisions and those builds worked just fine. Eventually I did a bisection, with these results:

a66b13fb69269efc9d10e9b99cc8b98e2565acc0 is the first bad commit
commit a66b13fb69269efc9d10e9b99cc8b98e2565acc0
Author: Andrew Osmond <aosmond@mozilla.com>
Date:   Tue May 2 16:39:54 2023 +0000

    Bug 1763978 - Fix resolving final paths on Windows with third party driver mount points. r=cmartin,mhowell

    Some third party driver can create mount points without using the
    Mount Manager. Without that, GetFinalPathNameByHandleW calls with
    VOLUME_NAME_DOS cannot succeed. Instead we need to call it with
    VOLUME_NAME_NT and convert the NT path to a DOS path using
    QueryDosDevice to perform the mapping.

    Prior to this patch, we would fail to load GMP plugins from
    profiles mounted in a ramdisk, and potentially other situations.

    See documentation for GetFinalPathNameByHandleW for more details:
    https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfinalpathnamebyhandlew

    Differential Revision: https://phabricator.services.mozilla.com/D174887

 widget/windows/WinUtils.cpp | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
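
For readers unfamiliar with the mapping the commit message describes, here is a minimal sketch of the NT-path-to-DOS-path idea. It is not the code from WinUtils.cpp: the function name, the `DEVICE_MAP` table, and the device names in it are hypothetical stand-ins for what QueryDosDevice would report for each drive letter on a real machine.

```python
from typing import Dict, Optional

# Hypothetical drive -> NT device table. On a real system these values
# would come from QueryDosDevice; the names here are made up for illustration.
DEVICE_MAP: Dict[str, str] = {
    "C:": r"\Device\HarddiskVolume3",
    "R:": r"\Device\RamDisk0",
}


def nt_path_to_dos_path(nt_path: str, device_map: Dict[str, str]) -> Optional[str]:
    """Replace a leading NT device name with the matching drive letter.

    Returns None when no entry in device_map is a prefix of nt_path.
    """
    for drive, device in device_map.items():
        # Match whole path components only, so that HarddiskVolume3
        # does not accidentally match HarddiskVolume30.
        if nt_path == device or nt_path.startswith(device + "\\"):
            return drive + nt_path[len(device):]
    return None


if __name__ == "__main__":
    print(nt_path_to_dos_path(r"\Device\RamDisk0\profile\gmp", DEVICE_MAP))
```

A path on a device with no drive letter in the table yields None, which is the kind of failure case that would need handling in any real implementation.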

Now I think this makes more sense. I confirmed that this is the first commit with the problem, and previous commits are fine. Tested the parent, d8f66c, which works as expected. But I haven't found anyone else who can reproduce this, so perhaps there is something about my Windows 10 PC in particular that doesn't cooperate nicely with the changes in bug 1763978. Oddly, my full builds also seem to suffer from this problem, but not with every URL. The devtools fail to load, but the newtab and welcome pages do load.

I wish I had some useful console output to share with you so you could debug this issue. However, if you're unable to reproduce the issue but have some ideas for how to fix it, please r? me and I'll test your fix on the machine where I can reproduce it. Then I should be able to tell whether it's fixed. I suppose it's possible I'm the only engineer affected, just by some odd coincidence, but since this is the PC all my work stuff is set up on, I would hugely appreciate it if we could prioritize this, and I will help in any way possible. Feel free to ping me on Slack or Matrix at aminomancer if you need any info or assistance from me.

Thanks so much 🙏

Flags: needinfo?(aosmond)

Set release status flags based on info from the regressing bug 1763978

Some thoughts:

  • Are you using symlinks? Those are not enabled by default on Windows, but maybe the build system uses them when possible and they break here?
  • If this does happen on full builds, can you maybe add some printf_stderr calls to WinUtils::ResolveJunctionPointsAndSymLinks? Alternatively, what does MOZ_LOG=Widget:5 ./mach run print when a file fails to load? It seems we have logs for this stuff that would be useful.
Flags: needinfo?(shughes)

We are still in early beta, so if we don't get to the bottom of this soon, a backout may be in order so the patch can soak for another cycle.

Presumably it is either NtPathToDosPath or GetFinalPathNameByHandleW that is failing, and emilio's logging suggestion will reveal which it is.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Some thoughts:

  • Are you using symlinks? Those are not enabled by default on Windows, but maybe the build system uses them when possible and they break here?

Wow, yeah... good idea. Pretty sure it does use symlinks. That makes sense, because I often don't need to rebuild Firefox between edits; certain kinds of files just seem to get mapped into the obj dir. Anyway, I ran dir /AL /S c:\ to check and there's a dumpster fire of symlinks in my obj dirs for everything Mozilla-related.
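
For anyone who wants to run the same check on a single obj dir rather than the whole drive, a rough Python equivalent might look like the sketch below. The function name is hypothetical, and note one difference: dir /AL lists all reparse points, including junctions, while os.path.islink only reports symlinks.

```python
import os
from typing import Iterator, Tuple


def find_symlinks(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (link_path, target) for every symlink found under root."""
    # followlinks=False keeps os.walk from descending into linked
    # directories, so each link is reported once and loops are avoided.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                yield path, os.readlink(path)


if __name__ == "__main__":
    import sys

    # Usage: python find_symlinks.py <path-to-objdir>
    for link, target in find_symlinks(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(f"{link} -> {target}")
```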

  • If this does happen on full builds, can you maybe add some printf_stderr calls to WinUtils::ResolveJunctionPointsAndSymLinks? Alternatively, what does MOZ_LOG=Widget:5 ./mach run print when a file fails to load? It seems we have logs for this stuff that would be useful.

Here's what happens when I do your second suggestion (MOZ_LOG) and try opening a resource:// URL.

[Parent 20548: StreamTrans #39]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\mozilla-git\mozilla-unified-2\browser\components\customizableui\CustomizableUI.jsm

The log is also full of these messages just for loading random stuff. The chrome:// URLs work fine, it's just the resource:// URLs that don't work. But both seem to produce the same log output:

[Parent 20548: StreamTrans #63]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\mozilla-git\mozilla-unified-2\browser\themes\shared\icons\login.svg

You can see the above are paths to the repo, not the obj dir. But some of the other files show direct paths to the obj dir, like

[Parent 20548: StreamTrans #55]: D/Widget ResolveJunctionPointsAndSymLinks: Resolving path: C:\mozilla-git\mozilla-unified-2\obj-firefox\artifact\dist\bin\browser\chrome\browser\skin\classic\browser\bookmark.svg
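
To sort a large log into source-tree paths and obj-dir paths, something like the following sketch could help. It assumes the message format matches the lines quoted above; the helper names are hypothetical, and the obj dir has to be supplied by whoever runs it.

```python
import re
from typing import Optional, Tuple

# Matches the two message shapes seen in the log excerpts above.
LINE_RE = re.compile(
    r"ResolveJunctionPointsAndSymLinks: (Resolving path|Resolved path to): (.+)$"
)


def parse_resolve_line(line: str) -> Optional[Tuple[str, str]]:
    """Return ("input" | "resolved", path), or None for unrelated lines."""
    m = LINE_RE.search(line.rstrip())
    if m is None:
        return None
    kind = "input" if m.group(1) == "Resolving path" else "resolved"
    return kind, m.group(2)


def in_objdir(path: str, objdir: str) -> bool:
    """True if path lies inside objdir (compared case-insensitively)."""
    prefix = objdir.lower().rstrip("\\") + "\\"
    return path.lower().startswith(prefix)
```

Feeding each log line through parse_resolve_line and then in_objdir gives a quick tally of how many resolved paths point at the repo versus the obj dir.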
Flags: needinfo?(shughes)

Those logs are a bit confusing. It is successfully resolving? That's not what I was expecting given my change is localized to this method...

These are the logs with my patch right? Would you be able to provide similar logs from a build without my patch? I wonder if it resolves to something different.

Flags: needinfo?(shughes)

Also, please attach the full log, just in case there is some other hint. A debug build would be best if possible to capture warnings elsewhere in the code that may provide some hints as to what went wrong.

(In reply to Andrew Osmond [:aosmond] (he/him) from comment #5)

Those logs are a bit confusing. It is successfully resolving? That's not what I was expecting given my change is localized to this method...

These are the logs with my patch right? Would you be able to provide similar logs from a build without my patch? I wonder if it resolves to something different.

Yeah these are logs with your patch. The files are not loading, i.e. I just see a blank page. But it does seem that ResolveJunctionPointsAndSymLinks is resolving, or we shouldn't get a log message at all, right?

Yes, I can try the latest working commit, which was the parent commit.

FWIW I don't see any of the log messages that are visible in your diff, like NtPathToDosPath failed. But the files do successfully load.

Bad:

[Parent 38944: StreamTrans #24]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\mozilla-git\mozilla-unified-2\browser\themes\shared\icons\reload-to-stop.svg
[Parent 38944, Main Thread] ###!!! ASSERTION: uninitialized: 'mHost.mLen >= 0', file /builds/worker/checkouts/gecko/netwerk/base/nsStandardURL.cpp:1801

Good:

[Parent 43672: StreamTrans #31]: D/Widget ResolveJunctionPointsAndSymLinks: Resolved path to: C:\mozilla-git\mozilla-unified-2\browser\components\customizableui\CustomizableUI.jsm
[Parent 43672, Main Thread] ###!!! ASSERTION: uninitialized: 'mHost.mLen >= 0', file /builds/worker/checkouts/gecko/netwerk/base/nsStandardURL.cpp:1801
Attached file bad debug build output
Sure, here's what I've got from a bad artifact debug build with MOZ_LOG=Widget:5

(In reply to Emilio Cobos Álvarez (:emilio) from comment #2)

Are you using symlinks? Those are not enabled by default on Windows, but maybe the build system uses them when possible and they break here?

It's enabled when one enables Developer Mode and yes the build system uses it. See also bug 1643072 and bug 1635428.

Here's the output from a good artifact debug build, in this case the immediate parent commit I linked before.

From all the scrambled output I'll assume that's probably an issue with my terminal or something, and unrelated to the path resolution issue. (Since that second build works fine, in spite of the scrambled logs)

Flags: needinfo?(shughes)

I'll have to check this as my machine is also affected.

Flags: needinfo?(krosylight)

All of these logs are from the parent process, and seem to load just fine. There should be additional logs for the child processes. Would you be able to provide those as well? I wonder if we are hitting a permissions issue given chrome is fine (but always the parent process?) and resources are not (presumably in a content process?).

Flags: needinfo?(aosmond) → needinfo?(shughes)

Hmm... I just copied the output from the terminal from which I launched fx. How can we get logs for the child process when the files for the devtools can't load and the devtools frame content is just a blank page?

Flags: needinfo?(shughes)

Hm? DevTools should load in the parent process.

Go to about:logging, add Widget:5 to the log modules, set log to a file and specify the filename. It should output one file per Firefox process. That might solve the interleaving as well.
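
If only interleaved terminal output is available, the lines can also be separated after the fact. This is a minimal sketch, assuming every relevant line starts with a bracketed prefix like the ones quoted in this bug (for example "[Parent 38944: StreamTrans #24]" or "[Parent 38944, Main Thread]"). The function name is hypothetical, and lines without such a prefix are simply skipped.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# "[<process name> <pid>: <thread>]" or "[<process name> <pid>, <thread>]"
PREFIX_RE = re.compile(
    r"^\[(?P<proc>[^\]]*?) (?P<pid>\d+)[:,] (?P<thread>[^\]]*)\]"
)


def group_by_process(lines: Iterable[str]) -> Dict[Tuple[str, int], List[str]]:
    """Group log lines by (process name, pid), preserving their order."""
    groups: Dict[Tuple[str, int], List[str]] = defaultdict(list)
    for line in lines:
        m = PREFIX_RE.match(line)
        if m:
            key = (m.group("proc"), int(m.group("pid")))
            groups[key].append(line.rstrip("\n"))
    return dict(groups)
```

Writing each group to its own file would give roughly the per-process layout that about:logging produces directly.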

Flags: needinfo?(shughes)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #15)

Hm? DevTools should load in the parent process.

The mainstay of it is, but surely there are parts that run in the content process to interact with content and maybe those failing to load bring the whole thing down?

Oh, I just meant my way of reading content logs, to date, has been to open the console in a content toolbox lol. But one of the consequences of this bug is content toolbox and multiprocess toolbox get stuck while loading.

I uploaded the about:logging output to Google Drive, since it was too big to upload to BMO. I also sent it to you via Slack.

Basically what I did in the session was follow the about:logging steps, then open a new tab, enter the path to CustomizableUI.jsm in the address bar, hit Enter (resulting in a blank page), then press Ctrl+Shift+I to open a toolbox (which got stuck loading, as expected), then quit out.

Flags: needinfo?(shughes)

This patch backs out bug 1763978 for breaking resource:// loading on
Windows in certain circumstances. It isn't clear why this is happening
since all of the log evidence is showing that the paths are resolving
correctly, but we have verified that without the patch, the builds work
as expected.

We can reland the prior patch with updates once we understand the
problem. The prior patch was not critical so there is no urgency to
reuplift a fix.

Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Severity: -- → S2
Priority: -- → P2

Comment on attachment 9333374 [details]
Bug 1832469 - Backout recent changes to final patch resolution on Windows.

Beta/Release Uplift Approval Request

  • User impact if declined: resource:// URLs seem to have trouble loading sometimes, but not consistently. This can break devtools for some users, for example.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It is not risky because it is just a backout, reverting us to the same behaviour we have shipped for many years without issue.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9333374 - Flags: approval-mozilla-beta?
Pushed by aosmond@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f9a06942d2c4 Backout recent changes to final patch resolution on Windows. r=cmartin

Over the weekend I can try emilio's other suggestion, that seems like it could be revealing:

If this does happen on full builds, can you maybe add some printf_stderr calls to WinUtils::ResolveJunctionPointsAndSymLinks

Btw, I realized I didn't make this very clear before: resource:// URLs apparently fail to load in-content and possibly in some other contexts, but in the browser console I don't have any problems importing them, e.g. ChromeUtils.import("resource:///modules/CustomizableUI.jsm") works.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

Comment on attachment 9333374 [details]
Bug 1832469 - Backout recent changes to final patch resolution on Windows.

Approved for 114 beta 5, thanks.

Attachment #9333374 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Hi Andrew, are you planning to reland the patch soon? If so, I can investigate which file is failing to load; if not, I probably won't.

Flags: needinfo?(krosylight) → needinfo?(aosmond)
Flags: needinfo?(aosmond)