Closed Bug 1625452 Opened 4 years ago Closed 4 years ago

The target.cppunittest.tests fails to hook QueryDosDeviceW from kernelbase.dll for TestDllInterceptor.exe

Categories

(Core :: mozglue, defect)

75 Branch
x86
Windows 10
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla76
Tracking Status
firefox-esr68 --- wontfix
firefox74 --- wontfix
firefox75 + wontfix
firefox76 --- verified

People

(Reporter: atrif, Assigned: toshi)

Details

(Keywords: regression, regressionwindow-wanted)

Attachments

(5 files)

Attached image failed dll.png

Affected versions

  • 75.0b10 (20200326191140)

Affected platforms

  • Windows 10 x32

Steps to reproduce

  1. Download target.cppunittest.tests.zip and target.zip from https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=c6d1ad75d7ca1ab66cf4b269ef73e3020a4abc90 (Fx75.0b10 - Windows 2012 shippable opt Bpgo-B).
  2. Unzip the content.
  3. Copy mozglue.dll from target.zip in the unzipped file from above.
  4. Open the PowerShell window in the folder the content was extracted in the previous step.
  5. Type .\TestDllInterceptor.exe and hit Enter.

Expected result

  • All checks passed.

Actual result

  • TEST-UNEXPECTED-FAIL | WindowsDllInterceptor | Failed to hook QueryDosDeviceW from kernelbase.dll First 5 bytes of function: 6A 7C 68 D0 F0

Regression Range

  • Reproducible with Firefox 75.0b8, Firefox 75.0b9 and Firefox 75.0b1. Will search for one ASAP.

Notes

  • Attached a screenshot with the issue.

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Has Regression Range: --- → no
Has STR: --- → yes

Toshi, David, can you take a look?

Flags: needinfo?(tkikuchi)
Flags: needinfo?(davidp99)

What happens if you try the test in a regular console (instead of PowerShell)?

Flags: needinfo?(davidp99) → needinfo?(alexandru.trif)

FYI, I am asking because of Bug 1601071.

(In reply to David Parks (dparks) (he/him) [:handyman] from comment #3)

What happens if you try the test in a regular console (instead of PowerShell)?

Same error with cmd too.

TestDllInterceptorCrossProcess.exe test pass.

Flags: needinfo?(alexandru.trif)

I think I reproduced this issue. When our detour meets the pattern of 6A 7C 68 D0 F0, it can process the first instruction 6a 7c (= push 7c). After that, its head needs to jump to the 3rd byte 68, which is push imm32, but it jumped to the 4th byte D0. Probably something wrong happened in CountPrefixBytes. Let me continue investigation and find out the root cause..

Flags: needinfo?(tkikuchi)

I got the root cause and came up with a simple patch. I'll b e able to start a review by EOD tomorrow.

When our detour processes instructions, we pass ReadOnlyTargetFunction to
CountPrefixBytes to determine whether a lock prefix exists or not.
In that case, we don't need to pass both ReadOnlyTargetFunction and an offset
as a parameter because ReadOnlyTargetFunction has an offset as a member.

Assignee: nobody → tkikuchi
Status: NEW → ASSIGNED

What's the likely effect of this issue beyond the failed test? Is it bad enough that we should get the fix in 75 (we're in RC week)?

Flags: needinfo?(tkikuchi)

(In reply to Julien Cristau [:jcristau] from comment #9)

What's the likely effect of this issue beyond the failed test? Is it bad enough that we should get the fix in 75 (we're in RC week)?

An immediate impact is our detour fails to hook QueryDosDeviceW here, so ChromiumCDMAdapter may not function correctly. I'm not sure how it emerges, though. Media team can answer that..?

Flags: needinfo?(tkikuchi)
Flags: needinfo?(bvandyk)

I believe the reason we hook this function is so the GMP doesn't need to use the system version, which means we can make the sandbox more restrictive. So I think if we fail to hook we'll run into the sandbox when we're trying to playback premium media (Netflix, Amazon prime, etc.). My preference from the media side would be try and uplift fixes if possible.

Flags: needinfo?(bvandyk)
Component: General → mozglue
Product: Firefox Build System → Core

It should be possible to check on a Windows 10 32-bit environment (which doesn't sound like its common) to see if those (Netflix, Amazon Prime, ...) work or not. I'm guessing that will define our urgency in uplifting.

Note that this was found by our own tests, if this had actually been broken in beta this might've been reported?

Also I'm not clear what regressed this, or if this had been broken in previous versions?

(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)

It should be possible to check on a Windows 10 32-bit environment (which doesn't sound like its common) to see if those (Netflix, Amazon Prime, ...) work or not. I'm guessing that will define our urgency in uplifting.

Alexandru, can you test this?

Note that this was found by our own tests, if this had actually been broken in beta this might've been reported?

I'd hope so, but probably better to make sure... Thanks :)

Flags: needinfo?(alexandru.trif)

(In reply to Bryce Seager van Dyk (:bryce) from comment #15)

Wonder if this is the cause of bug 1496607.

I believe it is. As shown below (that's from my local system where no repro), KernelBase!QueryDosDeviceW starts with push 7Ch, followed by a code pushing an immediate value. This detour failure occurs depending on a number of that immediate value, which means this issue could easily disappear/reappear release by release.

KernelBase!QueryDosDeviceW:
1012e5e0 6a7c            push    7Ch
1012e5e2 6898321b10      push    offset KernelBase!HviIsAnyHypervisorPresent+0xcdc (101b3298)

My proposed patch affects only 32bit binaries (i.e. 32bit Windows or WOW64), and the change is simple. The regression risk from uplift is low.

(In reply to Julien Cristau [:jcristau] from comment #14)

(In reply to Gian-Carlo Pascutto [:gcp] from comment #12)

It should be possible to check on a Windows 10 32-bit environment (which doesn't sound like its common) to see if those (Netflix, Amazon Prime, ...) work or not. I'm guessing that will define our urgency in uplifting.

Alexandru, can you test this?

I get the attached console errors when trying to play videos on Netflix or amazon prime. Tested with two machines. On one machine Netflix was working after trying to update the widevine plugin (even if no updates were installed) and playing another video (I could not check the other machine due to boot errors) but amazon prime was not at all on neither on the machines. On the other machine, Netflix was not working. These errors were presented in Firefox 75.0 (20200331175109) win 10x86. If more information is needed please let me know.

Flags: needinfo?(alexandru.trif)

So I retested on another win10x86 machine. It seems that Netflix is working after a browser restart.

Tested on Amazon Prime too and from the first try, an error message is shown that the stream cannot be played. After restarting the PC, and accessing Amazon Prime again, movies/TV series loaded as expected. This happened on a Win 10 x86 system with Firefox 75.0.

Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bc31229fd23f
No need to pass an offset to CountPrefixBytes.  r=handyman
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76

Given that this is Windows 10 32-bit only (which is likely not very common?), restarting can make it work and it's been there for a while, it probably does not make sense to try to uplift it aggressively to a dot release.

Attached image Win 10 x86 dll.png

Verified the fix on 76 (beta - treeherder) under Windows 10 32-bit and all tests are passed.

Status: RESOLVED → VERIFIED

Part of me thinks we should take this on ESR68 since this breaks potentially-important functionality, but the other part of me says we've made it this far into it's life cycle without anybody complaining and I wonder who will notice. Calling this wontfix for now but I'm open to arguments for why we should if someone feels strongly about it.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: