Closed
Bug 1401250
Opened 7 years ago
Closed 6 years ago
Permafailing Win10 toolkit/xre/test/browser_checkdllblockliststate.js | Windows dll blocklist status should be true, indicating it is running properly. A failure in this test is considered a release blocker. -
Categories
(Toolkit :: Blocklist Implementation, defect, P1)
Toolkit
Blocklist Implementation
Tracking
()
RESOLVED
FIXED
mozilla59
People
(Reporter: intermittent-bug-filer, Assigned: away)
References
Details
Attachments
(2 files)
1.80 KB,
patch
|
ted
:
review+
|
Details | Diff | Splinter Review |
39.87 KB,
image/png
|
Details |
Filed by: rvandermeulen [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=132006346&repo=mozilla-beta https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-beta-win64/mozilla-beta_win10_64_test-mochitest-e10s-browser-chrome-4-bm109-tests1-windows-build0.txt.gz This showed up when I did a sync between m-c and m-b today. What we know so far: * Fails only on Desktop Firefox builds. DevEdition is passing. And only opt builds - debug is green. * We switched Win64 mochitest-browser-chrome from Win8 to Win10 (see bug 1397229) in that range. It was previously passing on Win8. * It is passing on Trunk. The failures are only happening on Beta.
Comment 1•7 years ago
|
||
[Tracking Requested - why for this release]: According to the error message in the test, this is bad.
status-firefox57:
--- → affected
tracking-firefox57:
--- → ?
Comment 2•7 years ago
|
||
If the test says that, it is probably right :)
Updated•7 years ago
|
Flags: needinfo?(jmathies)
Priority: P5 → P1
Updated•7 years ago
|
Flags: needinfo?(jmathies)
Updated•7 years ago
|
Component: Security: Process Sandboxing → Blocklisting
Flags: needinfo?(jmathies)
Product: Core → Toolkit
Version: unspecified → Trunk
Comment hidden (Intermittent Failures Robot) |
Comment 6•7 years ago
|
||
Local beta build passes running on one of our low end win10 reference laptops. Joel, do you know what kind of hardware these tests run on? I think it's pretty easy to gain access to one of these slaves, I'd like to do that to investigate*. Curious if you can point me to some docs on how to do that? * only if I can install dev tools on it, check out source and build.
Flags: needinfo?(jmaher)
Comment 7•7 years ago
|
||
these were running on hardware, as of 2 days ago I moved them to virtual machines (and oddly the failures have gone away). This was on trunk, we could uplift the patch to beta if needed. Possibly we can revisit this bug in a week to see if it is still a problem?
Flags: needinfo?(jmaher)
Comment 8•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #7) > and oddly the failures have gone away Where did they go away from? As far as I know, this only affected Beta and is still happening as of the latest push there today. But yes, I'd be happy to land the patches that moved the tests over to VMs if you think it might help.
Flags: needinfo?(jmaher)
Comment 9•7 years ago
|
||
:ryanvm, lets upload the patch from bug 1402068 to mozilla-beta, that would give us a good chance to debug on what we have ensured is greenest on trunk.
Flags: needinfo?(jmaher)
Comment 10•7 years ago
|
||
Still failing on the VMs. https://treeherder.mozilla.org/logviewer.html#?job_id=133424523&repo=mozilla-beta
Comment 11•7 years ago
|
||
thanks :RyanVM! this was failing on hardware: https://wiki.mozilla.org/Buildbot/Talos/Misc#Hardware_Profile_of_machines_used_in_automation now it continues to fail on a VM. here is how to get a windows 10 vm loaner: https://wiki.mozilla.org/ReleaseEngineering/How_To/Self_Provision_a_TaskCluster_Windows_Instance I do not think you will easily be able to build on there- probably installing mozilla-build and maybe some additional tools will help, it will be a longer bootstrap problem though. Historically on all loaners we have not had the ability to easily build and run debugging tools- the builders have different packages installed than the testers.
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
|
Flags: needinfo?(jmathies)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 15•7 years ago
|
||
Any luck with this so far, Jim? This failure is still hitting on the 58-as-Beta Try simulation runs as well.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•7 years ago
|
||
Hi Jim, Any progress here? It seems it is still the #28 most frequent failure.
Comment 22•7 years ago
|
||
(In reply to Gerry Chang [:gchang] - On PTO - Back on October 23 from comment #21) > Hi Jim, > Any progress here? It seems it is still the #28 most frequent failure. Nope not yet. Need to get a hold of one of these slaves and poke around, figure out what's going on. I haven't had the time to go step through the "borrow a slave" process. If you would like to pick this up and have the cycles, please do.
Flags: needinfo?(jmathies)
Comment 23•7 years ago
|
||
Aaron, is this something you might be able to look at? I know you're busy - so I'll n-i to dmajor or handyman as well. From what I understand here, the test is failing, and we aren't sure if there's a real problem with dll blocklists for 57, or if it is a minor problem with the test itself. DLL blocklisting is pretty important for stability on the release channel (more so than on beta), so the test failure is marked as a release blocker.
Flags: needinfo?(dmajor)
Flags: needinfo?(davidp99)
Flags: needinfo?(aklotz)
Assignee | ||
Comment 24•7 years ago
|
||
Does this ever happen on debug builds? There ought to be enough printf in there to help narrow this down.
Comment 25•7 years ago
|
||
All opt builds, unfortunately. It's easily reproducible on Try if you want to attempt any extra logging patches, though.
Assignee | ||
Comment 26•7 years ago
|
||
Can you try this? It'll turn the failure into a crash, and we can read the variables from the crash .extra file. --- a/mozglue/build/WindowsDllBlocklist.cpp +++ b/mozglue/build/WindowsDllBlocklist.cpp @@ -924,6 +924,6 @@ MFBT_API bool DllBlocklist_CheckStatus() { if (sBlocklistInitFailed || sUser32BeforeBlocklist) - return false; + __debugbreak(); return true; }
Comment 27•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #23) > Aaron, is this something you might be able to look at? > I know you're busy - so I'll n-i to dmajor or handyman as well. > > From what I understand here, the test is failing, and we aren't sure if > there's a real problem with dll blocklists for 57, or if it is a minor > problem with the test itself. DLL blocklisting is pretty important for > stability on the release channel (more so than on beta), so the test failure > is marked as a release blocker. I don't think there's a real problem here, we've tested this on various real hardware and it doesn't reproduce. the failure was introduced when we switched a set of tests to aws vms. Akotz is busy with a11y, davidp is busy with flash sandbox hardening. dmajor, if you have cycles I'd appreciate the help.. Otherwise I'll try to get back to this in a week or so.
Flags: needinfo?(davidp99)
Flags: needinfo?(aklotz)
Assignee | ||
Comment 28•7 years ago
|
||
https://queue.taskcluster.net/v1/task/JhPUd8uvSjafeZEKzPJmtA/runs/0/artifacts/public/test_info/5f440e7d-3176-4cd3-b346-418d9e825bbb.extra User32BeforeBlocklist=1 This is troubling.
Assignee | ||
Comment 29•7 years ago
|
||
The build from Ryan's push has a load-time dependency on user32. Here's the module list before the initial process breakpoint. I'll investigate further tomorrow. ModLoad: 00007ff6`199d0000 00007ff6`19a3f000 firefox.exe ModLoad: 00007fff`8ce70000 00007fff`8d04b000 ntdll.dll ModLoad: 00007fff`8aba0000 00007fff`8ac4e000 C:\WINDOWS\System32\KERNEL32.DLL ModLoad: 00007fff`89f00000 00007fff`8a149000 C:\WINDOWS\System32\KERNELBASE.dll ModLoad: 00007fff`81cc0000 00007fff`81d3e000 C:\WINDOWS\SYSTEM32\apphelp.dll ModLoad: 00007fff`738e0000 00007fff`73937000 C:\WINDOWS\AppPatch\AppPatch64\AcGenral.DLL ModLoad: 00007fff`8a530000 00007fff`8a5cd000 C:\WINDOWS\System32\msvcrt.dll ModLoad: 00007fff`8b550000 00007fff`8b5a9000 C:\WINDOWS\System32\sechost.dll ModLoad: 00007fff`8a690000 00007fff`8a7b5000 C:\WINDOWS\System32\RPCRT4.dll ModLoad: 00007fff`8acc0000 00007fff`8ad11000 C:\WINDOWS\System32\SHLWAPI.dll ModLoad: 00007fff`8cab0000 00007fff`8cda9000 C:\WINDOWS\System32\combase.dll ModLoad: 00007fff`8a270000 00007fff`8a366000 C:\WINDOWS\System32\ucrtbase.dll ModLoad: 00007fff`89e90000 00007fff`89efa000 C:\WINDOWS\System32\bcryptPrimitives.dll ModLoad: 00007fff`8aa20000 00007fff`8aa47000 C:\WINDOWS\System32\GDI32.dll ModLoad: 00007fff`89430000 00007fff`895b7000 C:\WINDOWS\System32\gdi32full.dll ModLoad: 00007fff`89390000 00007fff`8942a000 C:\WINDOWS\System32\msvcp_win.dll ModLoad: 00007fff`8aa50000 00007fff`8ab9a000 C:\WINDOWS\System32\USER32.dll
Assignee | ||
Comment 30•7 years ago
|
||
For reference here's a list from a good build; these modules can be cleared of suspicion. start end module name 00007ff6`90310000 00007ff6`90380000 firefox (deferred) 00007fff`5dc10000 00007fff`5dcaf000 MSVCP140 (deferred) 00007fff`61880000 00007fff`618ae000 mozglue (deferred) 00007fff`730e0000 00007fff`730f6000 VCRUNTIME140 (deferred) 00007fff`78700000 00007fff`788a9000 dbghelp (deferred) 00007fff`859c0000 00007fff`859ca000 VERSION (deferred) 00007fff`88d30000 00007fff`88d3b000 CRYPTBASE (deferred) 00007fff`89e90000 00007fff`89efa000 bcryptPrimitives (deferred) 00007fff`89f00000 00007fff`8a149000 KERNELBASE (deferred) 00007fff`8a270000 00007fff`8a366000 ucrtbase (deferred) 00007fff`8a530000 00007fff`8a5cd000 msvcrt (deferred) 00007fff`8a690000 00007fff`8a7b5000 RPCRT4 (deferred) 00007fff`8aba0000 00007fff`8ac4e000 KERNEL32 (deferred) 00007fff`8b550000 00007fff`8b5a9000 sechost (deferred) 00007fff`8cdb0000 00007fff`8ce51000 ADVAPI32 (deferred) 00007fff`8ce70000 00007fff`8d04b000 ntdll (pdb symbols)
Assignee | ||
Comment 31•7 years ago
|
||
AcGenral.DLL is my biggest suspect right now. Why on earth is it in our process? Did we accidentally request some appcompat thing?
Assignee | ||
Comment 32•7 years ago
|
||
Ok, I think I know what's triggering this. According to the Compatibility Administrator in the Windows ADK, Windows 10 ships with an appcompat setting for Firefox called "DamThrottleControl". It keys off PRODUCT_NAME="Firefox" and FILE_DESCRIPTION="Firefox" so we don't trigger it on trunk where our app calls itself "Nightly". This "DamThrottleControl" must be new to Windows 10 because I don't see it in my Windows 8.1 machine's compat database. I have no idea what this compat fix was intended to solve, or how to get around it (short of using a different product name in our app metadata!). Milan, I've heard you mention a "Microsoft list" in another bug. Would you be able to get some help on this? To recap: this compat setting causes AcGenrl.dll to load early in our process, which in turn pulls in user32.dll, which loads AppInit_DLLs before our DLL blocklist is initialized, which prevents us from blocking crashy injected libraries.
Flags: needinfo?(dmajor) → needinfo?(milan)
Comment 33•7 years ago
|
||
That sounds plausible. So far, I've ruled out via Try pushes the version number change and --enable-profiling as related to this, which isn't leaving much left other than official branding :)
Comment 34•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #32) > Ok, I think I know what's triggering this. > > According to the Compatibility Administrator in the Windows ADK, Windows 10 > ships with an appcompat setting for Firefox called "DamThrottleControl". It > keys off PRODUCT_NAME="Firefox" and FILE_DESCRIPTION="Firefox" so we don't > trigger it on trunk where our app calls itself "Nightly". This > "DamThrottleControl" must be new to Windows 10 because I don't see it in my > Windows 8.1 machine's compat database. > > I have no idea what this compat fix was intended to solve, or how to get > around it (short of using a different product name in our app metadata!). > > Milan, I've heard you mention a "Microsoft list" in another bug. Would you > be able to get some help on this? To recap: this compat setting causes > AcGenrl.dll to load early in our process, which in turn pulls in user32.dll, > which loads AppInit_DLLs before our DLL blocklist is initialized, which > prevents us from blocking crashy injected libraries. I've run this test manually on a fully updated win 10 laptop and didn't get a failure. Could this be something weird aws installed on it's slaves?
Assignee | ||
Comment 35•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #34) > I've run this test manually on a fully updated win 10 laptop and didn't get > a failure. Could this be something weird aws installed on it's slaves? Was it on a local build? You need to have official branding enabled so that the product name is "Firefox" rather than "Nightly".
Message sent to the microsoft discuss list, :dmajor is CC-d on it.
Flags: needinfo?(milan)
Assignee | ||
Comment 37•7 years ago
|
||
If we don't hear back from Microsoft, I think this is our only option: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3e0a693a081920b58fea13415a2c4bad185bfa40 This patch changes our Windows file-description metadata from "Firefox" to "Mozilla Firefox" in order to work around an issue on Windows 10. To be clear: This patch does NOT change any branding normally presented to a typical user. Short of attaching a debugger, the only way to see this change is to right-click firefox.exe and choose Properties then Details. We're doing this because Windows 10 ships with an appcompat fix called DamThrottleControl that applies to any firefox.exe with product name and description equal to the string "Firefox". I have no idea what that fix was intended to do, but Nightly and DevEdition appear to be doing fine without it. In any case, the appcompat fix comes with the side effect of loading user32.dll very early in our process, which allows AppInit_DLLs to sneak in before our blocklist initializes. That's bad (and it fails a mochitest). This patch evades the Windows appcompat engine by making us no longer match the string that it's testing for.
Assignee: jmathies → dmajor
Attachment #8925932 -
Flags: review?(core-build-config-reviews)
Assignee | ||
Comment 38•7 years ago
|
||
Example of what the patch affects
Comment 39•7 years ago
|
||
For the curious, here's what I've found about "DAM": https://channel9.msdn.com/Shows/Going+Deep/Inside-Windows-8-Jon-Berry-Desktop-Activity-Moderator-and-Connected-Standby
Comment 40•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #35) > (In reply to Jim Mathies [:jimm] from comment #34) > > I've run this test manually on a fully updated win 10 laptop and didn't get > > a failure. Could this be something weird aws installed on it's slaves? > > Was it on a local build? You need to have official branding enabled so that > the product name is "Firefox" rather than "Nightly". Yes, it's a non-branded build. Thanks for picking this up.
Assignee | ||
Comment 41•7 years ago
|
||
Pike, I'm planning to change an internal string in comment 37. I pulled up a ru build (without my patch) and the file properties still had their English, Latin-script values, so I'm pretty sure these strings are not localized. But I wanted to give you a heads up in case you foresee any problems I'm not aware of.
Flags: needinfo?(l10n)
Assignee | ||
Comment 42•7 years ago
|
||
dbolter: I don't expect comment 37 to break any screen readers (they better not be testing for product name "Firefox" or else they'd be broken already in Nightly and FirefoxDeveloperEdition) but just to be safe, in case we rush this to release, could you take the try build for a spin? Thanks!
Flags: needinfo?(dbolter)
Comment 43•7 years ago
|
||
That string is coming from https://dxr.mozilla.org/mozilla-central/source/browser/branding/official/configure.sh#5, I think, so keeping it hard-coded in English is fine. We also don't want to modify our branding from a trademarks POV, so this is OK from an l10n pov. Thanks for asking.
Flags: needinfo?(l10n)
Comment 44•7 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) - Backlogged on reviews and needinfos from comment #39) > For the curious, here's what I've found about "DAM": > https://channel9.msdn.com/Shows/Going+Deep/Inside-Windows-8-Jon-Berry- > Desktop-Activity-Moderator-and-Connected-Standby WOW. This might explain some problems that we've had in a11y-land.
David, Jim, :Philipp and a bunch of others have reviewed the severity of this issue and whether it should block 57. The findings so far is that this is not a new problem in 57. This has been a problem with all Firefox versions (55, 56) on win10. The crash rates from 56 are not concerning around this specific issue. David and Jim also mentioned that uplifting this patch to 57 is risky and should be avoided if possible. To summarize, while this is a problem that needs to be fixed in 58+. For 57, this is not release blocking as the crash rates due to this problem is low (malware crashes on win10 due to some of the dll blocklisting code not working as expected). I would like to keep this tracking for 57 and status affected for a bit longer post 57 launch.
(In reply to David Major [:dmajor] from comment #37) > Created attachment 8925932 [details] [diff] [review] > Change firefox.exe description metadata Graphics drivers are keying off the name of the executable, so as long as that stays firefox.exe, I don't think the change in the meta data will have any negative effect on the graphics.
Comment 47•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #42) > dbolter: I don't expect comment 37 to break any screen readers (they better > not be testing for product name "Firefox" or else they'd be broken already > in Nightly and FirefoxDeveloperEdition) but just to be safe, in case we rush > this to release, could you take the try build for a spin? Thanks! I'm betting MarcoZ will beat me to this (and he has more screen readers to try out). Transferring NI. Marco can you jump on this?
Flags: needinfo?(dbolter) → needinfo?(mzehe)
Comment 48•7 years ago
|
||
(In reply to David Major [:dmajor] from comment #37) > the appcompat fix comes with the side effect of loading > user32.dll very early in our process Maybe that was the goal of that fix?
Comment 49•7 years ago
|
||
Comment on attachment 8925932 [details] [diff] [review] Change firefox.exe description metadata Review of attachment 8925932 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/app/module.ver @@ +2,5 @@ > WIN32_MODULE_COPYRIGHT=©Firefox and Mozilla Developers; available under the MPL 2 license. > WIN32_MODULE_PRODUCTVERSION=@MOZ_APP_WINVERSION@ > WIN32_MODULE_PRODUCTVERSION_STRING=@MOZ_APP_VERSION@ > WIN32_MODULE_TRADEMARKS=Firefox is a Trademark of The Mozilla Foundation. > +WIN32_MODULE_DESCRIPTION=Mozilla @MOZ_APP_DISPLAYNAME@ Hardcoding Mozilla here is not really great. Sadly the branding's configure.sh doesn't have an equivalent to brandFullName.
Assignee | ||
Comment 50•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #48) > (In reply to David Major [:dmajor] from comment #37) > > the appcompat fix comes with the side effect of loading > > user32.dll very early in our process > > Maybe that was the goal of that fix? Given the DAM context, I'm doubtful. I think it's more likely that user32 is merely along for the ride, as it's an (indirect) dependency of the appcompat DLLs.
Comment 51•7 years ago
|
||
Comment on attachment 8925932 [details] [diff] [review] Change firefox.exe description metadata Review of attachment 8925932 [details] [diff] [review]: ----------------------------------------------------------------- I don't see any reason this shouldn't be OK from a build system perspective. It is a little worrying that we are working around a Microsoft appcompat fix, but it's also annoying that they would ship something like that without asking us to fix a bug on our side first.
Attachment #8925932 -
Flags: review?(core-build-config-reviews) → review+
Comment 52•7 years ago
|
||
(In reply to David Bolter [:davidb] (NeedInfo me for attention) from comment #47) > (In reply to David Major [:dmajor] from comment #42) > > dbolter: I don't expect comment 37 to break any screen readers (they better > > not be testing for product name "Firefox" or else they'd be broken already > > in Nightly and FirefoxDeveloperEdition) but just to be safe, in case we rush > > this to release, could you take the try build for a spin? Thanks! > > I'm betting MarcoZ will beat me to this (and he has more screen readers to > try out). Transferring NI. Marco can you jump on this? I tried the try build from comment #37, and both NVDA and JAWS, our main assistive technologies, still recognize that they're running in Firefox. So I am reasonably sure we're safe.
Flags: needinfo?(mzehe)
Updated•7 years ago
|
status-firefox56:
--- → wontfix
status-firefox58:
--- → affected
Keywords: checkin-needed
Comment 53•7 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0b07ff291caa Change firefox.exe description metadata to evade the Windows 10 appcompat engine. r=ted
Keywords: checkin-needed
Comment 54•7 years ago
|
||
Jim based on the "DamThrottleControl" email thread and a discussion just now on IRC (release-drivers) I understand we're not taking the current fix on 57 correct?
Flags: needinfo?(jmathies)
Assignee | ||
Comment 55•7 years ago
|
||
Per the MS discussion list, this appcompat setting is applied only on Windows 10 Creators Update (1703), and it allows Firefox to run without throttling in that version. I've confirmed that we don't get AcGenral/user32 in Anniversary Update (1607) or Fall Creators Update (1709). Since our workaround means Firefox will be subject to throttling, it's probably doing more harm than good, and we'd be better off living with the weakened blocklist in 1703. Ryan, could you please back out this patch?
Flags: needinfo?(ryanvm)
Updated•7 years ago
|
Flags: needinfo?(ryanvm)
Comment 56•7 years ago
|
||
Backout by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/6d174f463c1a Backed out changeset 0b07ff291caa because it'll probably do more harm than good.
Updated•7 years ago
|
Flags: needinfo?(jmathies)
Comment hidden (Intermittent Failures Robot) |
Sylvestre - I suggest we remove this bug from tracking 57 because it isn't a regression relative to 56. The issue is fixed in Win 10 1709 which came out on the 17th of October so is slowly making its way out to users.
Flags: needinfo?(sledru)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 65•6 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/76c9e778b0e1 Skip browser_checkdllblockliststate.js on Win10 since it permafails on the version we use in automation. r=me
Comment 66•6 years ago
|
||
Hopefully we remember to un-skip this once we update to a newer version of Win10 in CI.
status-firefox59:
--- → affected
Whiteboard: [checkin-needed-beta]
Comment 67•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/d2a5a5d18781
Whiteboard: [checkin-needed-beta]
Comment 68•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/76c9e778b0e1
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•5 years ago
|
Component: Blocklist Policy Requests → Blocklist Implementation
You need to log in
before you can comment on or make changes to this bug.
Description
•