Closed Bug 1546498 Opened 5 months ago Closed 5 months ago

[trunk] Crash on startup (mozilla::ReadAheadLib() Line 388) when building with Windows SDK 10.0.17763

Categories

(Firefox :: General, defect, P1, blocker)

x86_64
Windows 7
defect

Tracking

RESOLVED FIXED
Firefox 68
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- unaffected
firefox68 + fixed

People

(Reporter: mark, Assigned: dthayer)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Following the instructions to build Firefox using the bootstrap method from mozilla-central, I consistently run into the following crash after a successful build on startup:

Exception thrown at 0x000000013F7B1B10 in firefox.exe: 0xC0000005:
Access violation reading location 0x000007FEED96B000. occurred

firefox.exe!mozilla::ReadAheadLib(const wchar_t * aFilePath) Line 388 C++
firefox.exe!mozilla::GetBootstrap(const char * aXPCOMFile, mozilla::LibLoadingStrategy aLibLoadingStrategy) Line 374 C++
firefox.exe!InitXPCOMGlue(mozilla::LibLoadingStrategy aLibLoadingStrategy) Line 222 C++
firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 281 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 131 C++

This happens both with a default nightly build (no mozconfig) and my intended mozconfig (primarily some optimizations for my system).
No crash signature since this is a hard crash before the reporter is initialized.

Are you able to bisect when this failure starts?
Hard crash before crash reporter is a drop-everything-and-fix.

Severity: critical → blocker
Priority: -- → P1

I'm not sure; I've never run into something like this before.

What would be the best way to go about bisecting something like this?

I am experiencing this as well. Running Windows 7.

I tried running mozregression both on shippable builds and debug builds, and all seem to run right up to the current date. I have tried doing clobber builds and the like locally, but they all just won't run for me.

I recently updated my Windows 10 SDK to version 10.0.17763 due to the recent ANGLE update, so I partly suspect this might have something to do with it. A hypothesis here is that maybe we are broken on Windows 7 when building with some SDK versions?

What version of the Windows 10 SDK(s) do you have installed? You can check the "Add or Remove Programs" dialog to see...

Flags: needinfo?(mark)
Attached image MSVC debugger info

Perhaps an overview of the state (using the JIT debugger of VS) when it crashes is helpful. It seems the problem occurs when it tries to pre-read xul.dll (judging by the indicated file size). Maybe xul.dll is simply getting too big?

What version of the Windows 10 SDK(s) do you have installed?

I have 3 versions installed:

  • 10.0.26624
  • 10.0.17763.132
  • 10.0.15063.674
Flags: needinfo?(mark)

A related question: Why are we trying to pre-read these files to begin with? Any modern file system and storage medium will already use read-ahead. And it's not like we're still on XP or Vista with suboptimal I/O. Can't we trust the O.S. to handle file reading properly instead of trying to force the matter?

I downgraded my Windows 10 SDK version from 10.0.17763 to 10.0.17134 and it finally worked for me! It does seem to be a problem with the 10.0.17763 SDK somehow interacting with Windows 7 and us in a bad way.

Until (or unless) we find a better explanation for why this is happening, I guess this is an okay workaround for now, unless we continue to be broken in subsequent SDKs as well...

I can confirm that removing the 10.0.17763.132 SDK and installing the 10.0.17134.12 SDK instead solved the startup crash. Of note: without 17134 installed (using just 15063, as stated on the Windows build page), the tree also doesn't build, because ANGLE is missing a required header in that case.

This works as a workaround but it should most definitely be mentioned on the Windows build page https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Windows_Prerequisites

I can also see this being a problem in the future if it continues to be broken in later SDKs that might become required by third party libs.

Summary: [trunk] Crash on startup (mozilla::ReadAheadLib() Line 388) → [trunk] Crash on startup (mozilla::ReadAheadLib() Line 388) when building with Windows SDK 10.0.17763
Regressed by: 1538279
Flags: needinfo?(dothayer)

Temporarily just sidestep the issue in bug 1546498 (crash with latest SDK
on startup in Windows 7) by just continuing to use the old method in
Windows 7. We saw no wins in telemetry for Windows 7 anyway, so we should
investigate why that is, and why we see a mysterious crash in the fallback
code, in a followup bug.
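
For readers following along, the sidestep described above amounts to choosing the read-ahead method by OS version. The following is only a rough sketch of that idea, not the landed patch: `ReadAheadMethod`, `OsVersion`, `IsWindows8OrLater` and `ChooseReadAheadMethod` are hypothetical names made up for illustration. The only outside fact it relies on is that Windows 7 reports version 6.1 and Windows 8 reports 6.2.

```cpp
// Hypothetical names throughout; this is an illustration, not Firefox code.
enum class ReadAheadMethod {
  OldMethod,    // the pre-regression read-ahead path, kept for Windows 7
  NewPrefetch,  // the newer path introduced by the regressing bug
};

struct OsVersion {
  unsigned major;
  unsigned minor;
};

// Windows 7 reports 6.1; Windows 8 reports 6.2; Windows 10 reports 10.0.
constexpr bool IsWindows8OrLater(OsVersion v) {
  return v.major > 6 || (v.major == 6 && v.minor >= 2);
}

// Keep Windows 7 (and anything older) on the old method, and use the
// newer path everywhere else.
constexpr ReadAheadMethod ChooseReadAheadMethod(OsVersion v) {
  return IsWindows8OrLater(v) ? ReadAheadMethod::NewPrefetch
                              : ReadAheadMethod::OldMethod;
}

static_assert(ChooseReadAheadMethod({6, 1}) == ReadAheadMethod::OldMethod,
              "Windows 7 stays on the old method");
```

Keeping the version check separate from the choice of method makes the Windows 7 special case easy to remove once the followup investigation is done.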

Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1c79adcd8483
Split out MaybePrefetchMemory's check into two functions r=aklotz
See Also: → 1548048
Pushed by dothayer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cbdf3fdfcbac
Split out MaybePrefetchMemory's check into two functions r=aklotz
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 68
Assignee: nobody → dothayer
Regressions: 1548419
Flags: needinfo?(dothayer)

If the binary is built with Windows SDK 10.0.17763, it will contain a .retplne section that has PAGE_NOACCESS protection, so reading through the mapped image causes an access violation when the read reaches that section.
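
To illustrate the failure mode, here is a minimal, portable sketch of code that touches a mapped image while skipping sections that are not readable. It is a simplified model under stated assumptions, not the actual fix: `Section`, `Range` and `ReadableRanges` are hypothetical, the `readable` flag stands in for the real page protection, and the sections are assumed to be sorted by offset. Real Windows code would more likely ask the OS for page protection (for example via `VirtualQuery`) than carry its own flag.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one section of a mapped image.
struct Section {
  std::string name;
  std::size_t offset;  // start offset within the mapped image
  std::size_t size;    // size in bytes
  bool readable;       // false models PAGE_NOACCESS protection
};

// Half-open byte range [begin, end) within the mapped image.
struct Range {
  std::size_t begin;
  std::size_t end;
};

// Returns the byte ranges that are safe to touch, skipping every section
// that is not readable. Assumes `sections` is sorted by offset.
std::vector<Range> ReadableRanges(const std::vector<Section>& sections) {
  std::vector<Range> ranges;
  for (const Section& s : sections) {
    if (!s.readable || s.size == 0) {
      continue;  // touching a no-access section would fault
    }
    const std::size_t end = s.offset + s.size;
    if (!ranges.empty() && ranges.back().end == s.offset) {
      ranges.back().end = end;  // merge with the adjacent readable range
    } else {
      ranges.push_back({s.offset, end});
    }
  }
  return ranges;
}
```

With a layout such as `.text` (readable), `.retplne` (no access), `.rdata` (readable), this yields two ranges with a gap where `.retplne` sits. A loop that reads straight through the whole image, as the crashing code apparently did, would fault in that gap.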

See Also: → 1554380