Startup crash in abort | _acrt_RtlGenRandom (Lenovo OneKey Theater)

RESOLVED FIXED

Status

defect
--
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: philipp, Assigned: marco)

Tracking

(Depends on 1 bug, Blocks 2 bugs, {crash})

unspecified
x86_64
Windows 7

Firefox Tracking Flags

(relnote-firefox 56+, firefox-esr52 wontfix, firefox55 wontfix, firefox56+ fixed, firefox57 fixed)

Details

(crash signature)

Attachments

(2 attachments, 2 obsolete attachments)

This bug was filed from the Socorro interface and is 
report bp-7c5d3993-2a37-4751-97a4-9a5701170601.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	ucrtbase.dll 	abort 	
1 	ucrtbase.dll 	_acrt_RtlGenRandom 	
2 	ucrtbase.dll 	rand_s 	
3 	nss3.dll 	winRandomness 	db/sqlite3/src/sqlite3.c:43256
4 	nss3.dll 	sqlite3_randomness 	db/sqlite3/src/sqlite3.c:26694
5 	nss3.dll 	writeJournalHdr 	db/sqlite3/src/sqlite3.c:48251
6 	nss3.dll 	pager_open_journal 	db/sqlite3/src/sqlite3.c:52471
7 	nss3.dll 	pager_write 	db/sqlite3/src/sqlite3.c:52656
8 	nss3.dll 	sqlite3PagerWrite 	db/sqlite3/src/sqlite3.c:52829
9 	nss3.dll 	newDatabase 	db/sqlite3/src/sqlite3.c:61930
10 	nss3.dll 	sqlite3BtreeBeginTrans 	db/sqlite3/src/sqlite3.c:62080
11 	nss3.dll 	sqlite3VdbeExec 	db/sqlite3/src/sqlite3.c:81184
12 	nss3.dll 	sqlite3_step 	db/sqlite3/src/sqlite3.c:76477
13 	nss3.dll 	sqlite3_exec 	db/sqlite3/src/sqlite3.c:110601
14 	xul.dll 	mozilla::storage::Connection::executeSql(sqlite3*, char const*) 	storage/mozStorageConnection.cpp:1178
15 	xul.dll 	mozilla::storage::Connection::ExecuteSimpleSQL(nsACString_internal const&) 	storage/mozStorageConnection.cpp:1646
16 	xul.dll 	mozilla::places::Database::InitTempEntities() 	toolkit/components/places/Database.cpp:1121
17 	xul.dll 	mozilla::places::Database::Init() 	toolkit/components/places/Database.cpp:488
18 	xul.dll 	mozilla::places::Database::GetSingleton() 	toolkit/components/places/Database.cpp:320
19 	xul.dll 	mozilla::places::Database::GetDatabase() 	toolkit/components/places/Database.cpp:429
20 	xul.dll 	nsNavHistory::Init() 	toolkit/components/places/nsNavHistory.cpp:308
21 	xul.dll 	nsNavHistory::GetSingleton() 	toolkit/components/places/nsNavHistory.cpp:266
22 	xul.dll 	nsNavHistoryConstructor 	toolkit/components/places/nsPlacesModule.cpp:18
23 	xul.dll 	nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) 	xpcom/components/nsComponentManager.cpp:1057
24 	xul.dll 	nsComponentManagerImpl::GetService(nsID const&, nsID const&, void**) 	xpcom/components/nsComponentManager.cpp:1300
25 	xul.dll 	nsJSCID::GetService(JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) 	js/xpconnect/src/XPCJSID.cpp:693
26 	xul.dll 	XPTC__InvokebyIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm:97
27 		@0x2d9227 	
28 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp:1296
29 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:983
30 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:460
31 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2989
32 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
33 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
34 	xul.dll 	js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/CrossCompartmentWrapper.cpp:333
35 	xul.dll 	js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/Proxy.cpp:421
36 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:448
37 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2989
38 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
39 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
40 	xul.dll 	js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/CrossCompartmentWrapper.cpp:333
41 	xul.dll 	js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) 	js/src/proxy/Proxy.cpp:421
42 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:448
43 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2989
44 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
45 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
46 	xul.dll 	JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) 	js/src/jsapi.cpp:2788
47 	xul.dll 	nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJSClass.cpp:1213
48 	xul.dll 	nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJS.cpp:613
49 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/md/win32/xptcstubs_x86_64.cpp:174
50 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/md/win32/xptcstubs_asm_x86_64.asm:57
51 	xul.dll 	nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverList.cpp:112
52 	xul.dll 	nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) 	xpcom/ds/nsObserverService.cpp:281
53 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4402
54 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4654
55 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4745
56 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:305
57 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
58 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
59 	kernel32.dll 	BaseThreadInitThunk 	
60 	ntdll.dll 	RtlUserThreadStart

we had a user reporting this startup crash on this thread: https://support.mozilla.org/questions/1162522
looking through a couple of reports with the same signature, all had the unversioned DLLs "WindowsApiHookDll64.dll" & "ActiveDetect64.dll" in the modules list hooking into firefox. these modules belong to Lenovo OneKey Theater. though the overall crash volume for it is currently very low, this is a 64bit-only firefox crash and may get more common once we migrate more users to win64.

in the small sample we have available, the crash also seems to be more prevalent in some Eastern European locales (pl, cs, ru, uk), so maybe it's dependent on particular notebook models that are sold there.
I'm fine with blocklisting if we can get that user to report that the blocklisting works and doesn't cause other problems/crashes. Marco can you take this?
Component: OpenH264 → Other
Flags: needinfo?(mcastelluccio)
Posted patch Patch (obsolete) — Splinter Review
I'll spin a try build so that we can ask users to test.
Assignee: nobody → mcastelluccio
Status: NEW → ASSIGNED
Flags: needinfo?(mcastelluccio)
Philipp asked the user to try and reproduce the bug with the custom build.

We don't have other options as in the past 14 days nobody affected by this crash has left an email address. There are addresses in crashes older than 14 days, but I think it's not useful to contact them (if they are not crashing anymore, they can't verify the fix).
These rand_s crashes are caused by bad third-party software injecting hooks into advapi32.dll. Other rand_s crashes over the years: bug 723447, bug 1094945, bug 1167248, bug 1240589. We patched those crashes by calling the RtlGenRandom API instead of rand_s, but in this case SQLite is calling rand_s.
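As a rough illustration of that earlier workaround (not the actual Firefox patch, which is C++), the sketch below asks the OS for random bytes directly instead of going through `rand_s`. `RtlGenRandom` is exported from advapi32.dll under the name `SystemFunction036`; the helper name `os_random_bytes` is hypothetical, and the non-Windows branch is an assumption added only so the snippet runs anywhere.

```python
import ctypes
import os
import sys


def os_random_bytes(count):
    """Return `count` random bytes without going through the CRT's rand_s."""
    if sys.platform == "win32":
        # RtlGenRandom is exported from advapi32.dll as SystemFunction036.
        rtl_gen_random = ctypes.windll.advapi32.SystemFunction036
        rtl_gen_random.argtypes = [ctypes.c_void_p, ctypes.c_ulong]
        rtl_gen_random.restype = ctypes.c_ubyte  # BOOLEAN
        buffer = ctypes.create_string_buffer(count)
        if not rtl_gen_random(buffer, count):
            # The caller sees the failure and can decide what to do,
            # instead of the process aborting as in frames 0-2 above.
            raise OSError("RtlGenRandom failed")
        return buffer.raw
    # Assumption for portability only: use the platform's entropy source.
    return os.urandom(count)
```

The relevant difference is the explicit return-value check: the caller decides how to handle a failed call rather than inheriting whatever the CRT wrapper does.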
There are some email addresses now, I've triggered a new try build and will send an email to those (eight) users.
Marco, is this crash signature spiking? We just migrated Beta users running 32-bit Firefox to 64-bit (in build 56.0b9), so we are seeing other 64-bit specific crash signatures starting to spike.

Can we land your patch from comment 3 to block these DLLs in Beta 56? Over the last six months, 100% of these crashes are on Windows 7, so we can limit the block to Windows 7.

Curiously, I don't see this crash signature at all on Nightly, even looking back six months.
Flags: needinfo?(mcastelluccio)
(In reply to Chris Peterson [:cpeterson] from comment #7)
> Marco, is this crash signature spiking? We just migrated Beta users running
> 32-bit Firefox to 64-bit (in build 56.0b9), so we are seeing other 64-bit
> specific crash signatures starting to spike.
> 
> Can we land your patch from comment 3 to block these DLLs in Beta 56? Over
> the last six months, 100% of these crashes are on Windows 7, so we can limit
> the block to Windows 7.

No check for this currently but should be easy to add. We already have a win8 or higher flag we can crib from.
(In reply to Chris Peterson [:cpeterson] from comment #7)
> Marco, is this crash signature spiking? We just migrated Beta users running
> 32-bit Firefox to 64-bit (in build 56.0b9), so we are seeing other 64-bit
> specific crash signatures starting to spike.
> 
> Can we land your patch from comment 3 to block these DLLs in Beta 56? Over
> the last six months, 100% of these crashes are on Windows 7, so we can limit
> the block to Windows 7.
> 
> Curiously, I don't see this crash signature at all on Nightly, even looking
> back six months.

It looks like it is more common now than before, 161 crashes over the last week (from approximately 27 installations).

The only problem is that we don't know if the blocklist works or if it causes problems/crashes. I've sent emails to the users who left their address in the reports, but I haven't heard back from them yet.
Julien, can you rebase Marco's DLL blocklist patch for these Lenovo DLLs? Unfortunately, we don't know anyone who has reproduced the crash or can test whether blocking those Lenovo DLLs fixes the Firefox crash without causing other problems. I can try asking on Lenovo's support forum for any testers.
Crash Signature: [@ abort | _acrt_RtlGenRandom] → [@ abort | _acrt_RtlGenRandom] [@ abort | __acrt_RtlGenRandom] [@ __acrt_RtlGenRandom]
Flags: needinfo?(jcristau)
btw, here is the installer for Lenovo OneKey Theater for Windows 7 on the IdeaPad Z460, Z560. The latest version is from February 2012, so we should not expect Lenovo to provide a fix.

https://support.lenovo.com/us/en/downloads/ds012956
Some Firefox users kindly included the email address in their crash reports. I emailed them to see if they can help test the DLL blocklist fix.
(In reply to Chris Peterson [:cpeterson] from comment #12)
> Some Firefox users kindly included the email address in their crash reports.
> I emailed them to see if they can help test the DLL blocklist fix.

I've already done this yesterday :)

The updated build I sent them is here: https://queue.taskcluster.net/v1/task/ETKqHlG-Q8K_RyfrjZqVtg/runs/0/artifacts/public/build/install/sea/target.installer.exe.
Flags: needinfo?(mcastelluccio)
Flags: needinfo?(jcristau)
(In reply to Marco Castelluccio [:marco] from comment #13)
> > I emailed them to see if they can help test the DLL blocklist fix.
> 
> I've already done this yesterday :)

oops! Sorry I missed that detail. <:)
I received feedback from a user. Crash report: https://crash-stats.mozilla.com/report/index/0e8258d2-9aaf-4220-b377-b5ef81170911
According to his description, Firefox crashes after installation, and Safe Mode does not work either.

Can I ask him to try the installer from comment 13? If the problem is solved, Firefox should run normally and no longer crash.
(In reply to yxu from comment #15)
> I received a user feedback. Crash report:
> https://crash-stats.mozilla.com/report/index/0e8258d2-9aaf-4220-b377-
> b5ef81170911
> According to his description, Firefox crashed after the installation and the
> safe mode can not work as well.
> 
> So I can ask him try to use EXE in comment 13? If the problem is solved, the
> normal performance is that firefox can use fine and no longer crash.

Yes please! It would help us make a decision here.
Flags: needinfo?(yxu)
Marco, dmajor already blocked these Lenovo OneKey Theater DLLs in Firefox 36 (bug 1123778), but he only blocked UNVERSIONED. The activedetect64.dll and windowsapihookdll64.dll DLLs in yxu's crash report bp-0e8258d2-9aaf-4220-b377-b5ef81170911 are unversioned, so I'm not sure why the UNVERSIONED block is not working.

So your new patch can just update the existing entries for activedetect64.dll and windowsapihookdll64.dll to block ALL_VERSIONS. You might as well block ALL_VERSIONS for the 32-bit activedetect32.dll and windowsapihookdll32.dll too, since the 32-bit DLLs were causing problems in the past and the UNVERSIONED block doesn't seem to be working for 64-bit.
Blocks: 1123778
Flags: needinfo?(mcastelluccio)
Posted patch Patch (obsolete) — Splinter Review
Thanks, I hadn't noticed we were already blocking them when unversioned.
Attachment #8873508 - Attachment is obsolete: true
Flags: needinfo?(mcastelluccio)
Attachment #8906805 - Flags: review?(dmajor)
I'm confused. If we're seeing continuing crash reports with unversioned DLLs, which are already on the blocklist, why do we expect that upgrading the blocklist to ALL_VERSIONS will change anything?
Flags: needinfo?(mcastelluccio)
Tracking for 56, since this may affect more users as they migrate to 64-bit.
(In reply to David Major [:dmajor] from comment #19)
> I'm confused. If we're seeing continuing crash reports with unversioned
> DLLs, which are already on the blocklist, why do we expect that upgrading
> the blocklist to ALL_VERSIONS will change anything?

The unversioned block used to work, maybe it is still working for a subset of users. So my hope is that the versioned one will block at least a subset of the crashes.
Flags: needinfo?(mcastelluccio)
(In reply to Marco Castelluccio [:marco] from comment #21)
> (In reply to David Major [:dmajor] from comment #19)
> > I'm confused. If we're seeing continuing crash reports with unversioned
> > DLLs, which are already on the blocklist, why do we expect that upgrading
> > the blocklist to ALL_VERSIONS will change anything?
> 
> The unversioned block used to work, maybe it is still working for a subset
> of users. So my hope is that the versioned one will block at least a subset
> of the crashes.

My point isn't about whether the unversioned block is effective. What I'm saying is that all the reports I looked at have unversioned binaries, so changing the version condition isn't going to change the set of binaries that we attempt to block (regardless of whether the block works or not). Am I missing something?
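A toy model may make this point easier to see. It assumes a blocklist entry blocks a module whose version is at or below the entry's maximum, and that a module with no version info counts as version 0. All names here (`is_blocked`, `blocked_set`, `make_version`) are hypothetical, and this is a simplification, not the real blocklist code.

```python
UNVERSIONED = 0            # entry matches only modules with no version info
ALL_VERSIONS = 2**64 - 1   # entry matches every version


def make_version(a, b, c, d):
    """Pack a four-part version into one integer (hypothetical helper)."""
    return (a << 48) | (b << 32) | (c << 16) | d


def is_blocked(module_version, max_blocked_version):
    # Assumption: a module without version info is treated as version 0.
    version = 0 if module_version is None else module_version
    return version <= max_blocked_version


def blocked_set(modules, max_blocked_version):
    return {name for name, version in modules.items()
            if is_blocked(version, max_blocked_version)}


# The modules seen in the crash reports carry no version info.
seen_in_reports = {
    "activedetect64.dll": None,
    "windowsapihookdll64.dll": None,
}

# Hypothetical later release that does carry a version resource.
hypothetical_versioned = {"activedetect64.dll": make_version(2, 0, 1, 8)}
```

Under these assumptions, `blocked_set(seen_in_reports, UNVERSIONED)` and `blocked_set(seen_in_reports, ALL_VERSIONS)` return the same two modules. Only the hypothetical versioned module is treated differently, and no such module appears in the reports.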
David, you are right. Lenovo's support website [1] lists a product version (2.0.1.8), so I figured that there was no harm in expanding the block to all DLL versions, in case some Lenovo DLLs included a version in a later release. But that would not solve this crash because we know the unversioned DLLs are still crashing and not being blocked. So we need to figure out why these DLLs are not being blocked. Bob says we are seeing such a problem with Quick Heal Antivirus (bug 1347867).

What are our options for preventing these early DLL loads? I don't expect Lenovo to fix the crash in their DLL because their last update was in 2012. Can we do something such as shipping a dummy DLL in the Firefox directory that will get loaded instead of Lenovo's?

[1] https://support.lenovo.com/us/en/downloads/ds012956
I wonder what has changed since Firefox 36. The Lenovo DLLs were successfully blocked when bug 1123778 was resolved and verified, but now they're not. Maybe the difference is that the Firefox 36 crashes were in the 32-bit DLLs and now we are crashing in the 64-bit DLLs?
mak or ddurst, I think we should patch our copy of SQLite to stop calling rand_s to work around these Lenovo DLL crashes in Beta 56. Can you help with this or recommend someone else I can ask?

We are crashing in SQLite's call to the Windows CRT function `rand_s`. We (and Chrome) have had recurring problems with antivirus DLL hooks causing crashes in rand_s, e.g. bug 1240589, bug 1167248, bug 723447, and bug 694344. We replaced some rand_s calls with RtlGenRandom (e.g. changeset [1]) and those crashes seemed to go away.

There are three calls to rand_s in mozilla-central [2], but the SQLite call is the only one that shows up in crash reports now, probably because it is called very early during browser startup.

[1] https://hg.mozilla.org/mozilla-central/rev/795a10bae428
[2] https://searchfox.org/mozilla-central/search?case=true&regexp=true&q=%5Cbrand_s%5Cb
Flags: needinfo?(mak77)
Flags: needinfo?(ddurst)
I reworked my rand_s() patch from bug 1167248 to replace rand_s() with RtlGenRandom() in sqlite3.c. If this works around the Lenovo DLL crashes, we might consider pushing this change to upstream SQLite.

A sqlite3.c code comment says rand_s() isn't available with MinGW. I don't know if RtlGenRandom() is available with MinGW or not.

Try run:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=488ac239bd6be858eeedf158123d3a73df6cf586
Flags: needinfo?(mak77)
Flags: needinfo?(ddurst)
(In reply to Chris Peterson [:cpeterson] from comment #24)
> I wonder what has changed since Firefox 36. The Lenovo DLLs were
> successfully blocked when bug 1123778 was resolved and verified, but now
> they're not. Maybe the difference is that the Firefox 36 crashes were in the
> 32-bit DLLs and now we are crashing in the 64-bit DLLs?

FWIW, the block didn't completely work on Chrome either, and that was back in 2014, see https://bugs.chromium.org/p/chromium/issues/detail?id=379218#c15 and later comments.

I support your "rewrite rand_s" patch. It's probably the best technical means at our disposal right now. I didn't bring it up because I assumed it would be too much pain/process to patch sqlite. If that's not the case, then great!
Comment on attachment 8906805 [details] [diff] [review]
Patch

I hadn't noticed the modules were always unversioned.
Attachment #8906805 - Attachment is obsolete: true
Attachment #8906805 - Flags: review?(dmajor)
Attachment #8906805 - Flags: review-
(In reply to Chris Peterson [:cpeterson] from comment #25)
> I think we should patch our copy of SQLite to stop calling
> rand_s to work around these Lenovo DLL crashes in Beta 56. 

We (the SQLite developers) will make sure that the next release of SQLite does not call rand_s().

To be clear, rand_s() is currently only used to help seed SQLite's own internal PRNG on Windows.  rand_s() should only be called once, when SQLite first initializes.  On UNIX we have /dev/random and rand() is never used.
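To make the "seed once" point concrete, here is a minimal sketch of a generator that consults an external entropy source only on first use. It is not SQLite's generator: the class name, the hash-based byte stream, and the injected `entropy_source` callable are all assumptions chosen for illustration, and the output is not suitable for cryptographic use.

```python
import hashlib
import os


class SeedOncePrng:
    """Toy PRNG that calls its entropy source exactly once, on first use."""

    def __init__(self, entropy_source=lambda: os.urandom(32)):
        self._entropy_source = entropy_source
        self._state = None
        self.seed_calls = 0  # exposed so the one-time seeding is observable

    def randomness(self, count):
        if self._state is None:
            # First request: this is the only place the OS source is touched.
            self.seed_calls += 1
            self._state = hashlib.sha256(self._entropy_source()).digest()
        out = bytearray()
        while len(out) < count:
            # Derive output and next state separately from the current state.
            out += hashlib.sha256(b"out" + self._state).digest()
            self._state = hashlib.sha256(b"next" + self._state).digest()
        return bytes(out[:count])
```

In a design like this, whichever caller asks for randomness first pays for the seeding. That is consistent with the stack in the report, where the first journal header write reaches `rand_s` through `sqlite3_randomness`.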
Sorry for the late reply; I was waiting for the user's response. The user installed the build from the comment above for testing. So far he has not hit the problem in normal use, and the startup crash is resolved.

As for the Lenovo DLL version, this user has OneKey Theater 2.0.2.7.
Flags: needinfo?(yxu)
Comment on attachment 8907319 [details]
Bug 1369361 - Backport SQLite fix removing rand_s() in an attempt to avoid Lenovo DLL crashes.

https://reviewboard.mozilla.org/r/178994/#review184232

We don't patch SQLite, and I'd not want to go this direction unless the world is falling upon us. The current setup is really cheap to maintain: anyone can update the library without the need for peers or the module owner to do it (there's no one officially assigned to work on mozStorage, we just contribute our time from other teams). This creates a precedent, and people will start wanting to patch this thing and that thing. I don't really want to lose the precious testing the SQLite team does on the original code.
This crash is caused by old libraries from 2012. I'm sure it's not the only one, and we can't just patch all of them.

Richard promised a solution upstream in comment 30 and that's likely what we'll take. At most, we can patch our SQLite with the upstream patch, without a patches folder and without complicating the upgrade process, so that on upgrade we'll just pick up upstream.

Did we already figure out why the DLL block doesn't work? That sounds like it would have a nicer impact on the product.
Attachment #8907319 - Flags: review?(mak77) → review-
If you would like a quick SQLite 3.20.2 release that contains this one patch, let me know.
(In reply to Marco Bonardo [::mak] from comment #32)
> We don't patch Sqlite, and I'd not want to go this direction unless the
> world is falling upon us. The current setup is really cheap to maintain,
> anyone can indeed update the library without the need for peers or the
> module owner to do it (there's no one officially assigned to work on
> mozStorage, we just contribute our time from other teams). This creates a
> precedent, and people will start willing to patch this thing and that thing,

That's a good point. Upstream SQLite just removed the rand_s() call [1] so we could just apply their tiny fix to our copy of SQLite 3.20.1 without committing patch files in hg. Whenever the next version of SQLite is available, it will already have the rand_s() fix and won't need to be patched.

[1] https://www.sqlite.org/src/info/3a2793aa65727cbb


> Did we already figured out why the dll block doesn't work?  That sounds like
> would have a nicer impact on the product.

I don't know why the DLL blocklist isn't working, but I am talking to some other people about this.


(In reply to D. Richard Hipp from comment #33)
> If you would like a quick SQLite 3.20.2 release that contains this one
> patch, let me know.


I don't think you need to release a SQLite 3.20.2 just for this fix. I can just apply that fix to our copy of 3.20.1, which is the same effect but less work for you. :-)
Comment on attachment 8907319 [details]
Bug 1369361 - Backport SQLite fix removing rand_s() in an attempt to avoid Lenovo DLL crashes.

https://reviewboard.mozilla.org/r/178994/#review184824

Much better. Considering this only affects Windows, it won't even be a problem for Linux builds that use the system SQLite.
Thanks!
Attachment #8907319 - Flags: review?(mak77) → review+
Pushed by cpeterson@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/14207baa622e
Backport SQLite fix removing rand_s() in an attempt to avoid Lenovo DLL crashes. r=mak
Comment on attachment 8907319 [details]
Bug 1369361 - Backport SQLite fix removing rand_s() in an attempt to avoid Lenovo DLL crashes.

Approval Request Comment

[Feature/Bug causing the regression]: Migrating Beta and Dev Edition users to 64-bit Firefox caused this existing bug in a Lenovo DLL to start crashing more than before.

[User impact if declined]: Some 64-bit Firefox 56 users on Windows 7 will see startup crashes. Beta 56 has had 266 Lenovo DLL crashes total, Beta 55 had 66, and Beta 54 had zero.

[Is this code covered by automated tests?]: No.

[Has the fix been verified in Nightly?]: No because we don't know anyone who can reproduce the crash. We don't know for certain that this change actually fixes the crash or just moves it to a different crash signature.

[Needs manual test from QE? If yes, steps to reproduce]: No because we don't know anyone who can reproduce the crash.

[List of other uplifts needed for the feature/fix]: None

[Is the change risky?]: No

[Why is the change risky/not risky?]: This should be a safe fix because it just removes some Windows code that is now also removed from SQLite upstream. SQLite runs its own extensive tests.

[String changes made/needed]: None
Attachment #8907319 - Flags: approval-mozilla-beta?
(In reply to Chris Peterson [:cpeterson] from comment #38)
> [Has the fix been verified in Nightly?]: No because we don't know anyone who
> can reproduce the crash. We don't know for certain that this change actually
> fixes the crash or just moves it to a different crash signature.
> 
> [Needs manual test from QE? If yes, steps to reproduce]: No because we don't
> know anyone who can reproduce the crash.

There's the user who contacted yxu who can reproduce, but he wasn't able to reproduce with my try build (which shouldn't have fixed the crash).
Maybe he can't reproduce on Nightly at all. Perhaps he can test once this is in beta.
btw, this should be a very safe fix, but this particular crash's volume is quite low (266 reports from about 30 Beta users). So not taking this uplift in Beta 56 final RC build would not be the end of the world.
https://hg.mozilla.org/mozilla-central/rev/14207baa622e
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Comment on attachment 8907319 [details]
Bug 1369361 - Backport SQLite fix removing rand_s() in an attempt to avoid Lenovo DLL crashes.

Let's try this for the RC build on Monday. I'd love for us to avoid this startup crash on 56 release.
Attachment #8907319 - Flags: approval-mozilla-release+
Attachment #8907319 - Flags: approval-mozilla-beta?
Attachment #8907319 - Flags: approval-mozilla-beta-
Liz, I think we should relnote this Lenovo crash and resolution for Firefox 56.

[Why is this notable]:
Windows 7 users running 64-bit Firefox and Lenovo's OneKey Theater software will hit startup crashes. We plan to automatically migrate some 32-bit Firefox users to 64-bit during the 56 release, so some users may hit this crash but not know they are now running 64-bit Firefox.
 
[Affects Firefox for Android]:
No

[Suggested wording]:
Lenovo's "OneKey Theater" software for IdeaPad laptops can cause 64-bit Firefox startup crashes on Windows 7. To fix this crash, please [re-install 32-bit Firefox](https://www.mozilla.org/firefox/all/).

[Links (documentation, blog post, etc)]:
https://www.mozilla.org/firefox/all/
relnote-firefox: --- → ?
Flags: needinfo?(lhenry)
(In reply to Chris Peterson [:cpeterson] from comment #44)
> Lenovo's "OneKey Theater" software for IdeaPad laptops can cause 64-bit
> Firefox startup crashes on Windows 7. To fix this crash, please [re-install
> 32-bit Firefox](https://www.mozilla.org/firefox/all/).

Have we actually confirmed that reinstalling 32-bit Firefox fixes these crashes? I am wondering if they merely show up with different signatures...
Flags: needinfo?(cpeterson)
(In reply to David Major [:dmajor] from comment #45)
> (In reply to Chris Peterson [:cpeterson] from comment #44)
> > Lenovo's "OneKey Theater" software for IdeaPad laptops can cause 64-bit
> > Firefox startup crashes on Windows 7. To fix this crash, please [re-install
> > 32-bit Firefox](https://www.mozilla.org/firefox/all/).
> 
> Have we actually confirmed that reinstalling 32-bit Firefox fixes these
> crashes? I am wondering if they merely show up with different signatures...

I haven't confirmed that downgrading to 32-bit avoids the crashes, but it seems like a safe suggestion. If migrating a user from 32- to 64-bit causes enough crashes that the user seeks out information in the Firefox release notes, then they probably had no or few crashes with 32-bit Firefox.

When we've had rand_s crash spikes in the past, removing the call to rand_s seemed to make the crash reports go away. If the crashes just moved to a different signature, we might have seen a new spike. But if the new crash signatures are random, we might not see a new spike. I'm not sure how to search for crash reports that include these Lenovo DLLs (ActiveDetect64.dll and WindowsApiHookDll64.dll). Socorro's Super Search doesn't appear to allow searching for module names, but it does have a field called "app init dlls" that has some DLL module names (but not ActiveDetect64.dll or WindowsApiHookDll64.dll).
Flags: needinfo?(cpeterson)
Noted, slightly reworded: "Startup crashes with 64-bit Firefox on Windows 7, for users of Lenovo's "OneKey Theater" software for IdeaPad laptops. To fix this crash, please re-install 32-bit Firefox."
Flags: needinfo?(lhenry)
I wonder if we should try to detect these DLLs and prevent migration of those users to 64-bit.
(In reply to Chris Peterson [:cpeterson] from comment #46)
> I'm not sure how to
> search for crash reports that these Lenovo DLLs (ActiveDetect64.dll and
> WindowsApiHookDll64.dll). Socorro's Super Search doesn't appear to allow
> searching for module names, but it does have a field called "app init dlls"
> that has some DLL module names (but not ActiveDetect64.dll or
> WindowsApiHookDll64.dll).

The DLLs are not available via SuperSearch, they are only available in the Telemetry Socorro dataset (which is not as easy to use).
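Once module lists have been exported from that dataset, the filter itself is simple. The sketch below assumes each report is a dict with a `modules` list of file names. That record layout, the helper name, and the sample data are made up for illustration and do not reflect the dataset's real schema.

```python
LENOVO_MODULES = {"activedetect64.dll", "windowsapihookdll64.dll"}


def reports_with_modules(reports, wanted=LENOVO_MODULES):
    """Return the reports whose module list includes any wanted DLL."""
    wanted = {name.lower() for name in wanted}
    matches = []
    for report in reports:
        # Module names appear with mixed case in reports, so compare lowercased.
        loaded = {name.lower() for name in report.get("modules", [])}
        if loaded & wanted:
            matches.append(report)
    return matches


# Hypothetical sample records; only the first signature comes from this bug.
sample_reports = [
    {"signature": "abort | _acrt_RtlGenRandom",
     "modules": ["xul.dll", "ActiveDetect64.dll", "WindowsApiHookDll64.dll"]},
    {"signature": "hypothetical unrelated signature",
     "modules": ["xul.dll", "nss3.dll"]},
    {"signature": "hypothetical moved signature",
     "modules": ["xul.dll", "WindowsApiHookDll64.dll"]},
]

matching_signatures = [r["signature"] for r in reports_with_modules(sample_reports)]
```

Grouping the matches by signature rather than filtering on one known signature is what would reveal whether the crash merely moved elsewhere after the rand_s() change.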

(In reply to David Major [:dmajor] from comment #48)
> I wonder if we should try to detect these DLLs and prevent migration of
> those users to 64-bit.

This is a good idea and should be pretty trivial to implement. I don't know how we could deploy it to users though (56 is ready, we would need a system addon).
This is a good idea, but since we don't have time to add a DLL check in 56 (or probably 57), we could exclude Windows 7 users from 56's migration. This Lenovo crash only affects Windows 7. Unfortunately, Windows 7 is the most common OS version among Firefox users (46%). Windows 10 is 33% and Windows 8.x is 10%.

I think we should still include all Windows versions in our 1% migration of 56. If we see similar crashes in 56 even though we have patched SQLite to avoid rand_s(), we can then decide what to do about these DLLs and Windows 7 users in future migration attempts.
I tried to reproduce the issue with 55.0.3, and I saw the troublesome dlls are blocked. But then I found this comment:

  bp-ad568e71-46bc-4179-be81-14a490170924: sudden crash after Avast install

I installed Avast and then can reproduce that the dlls are not blocked. I'll try to understand why.
(In reply to Ting-Yu Chou [:ting] from comment #51)
> I tried to reproduce the issue with 55.0.3, and I saw the troublesome dlls
> are blocked. But then I found this comment:
> 
>   bp-ad568e71-46bc-4179-be81-14a490170924: sudden crash after Avast install
> 
> I installed Avast and then can reproduce that the dlls are not blocked. I'll
> try to understand why.

Oh, that's a good point! The crash reports that we're getting could be the intersection of "has OneKey software" and "has another software that breaks our blocklist" -- either of those would be OK in isolation, but together they spell trouble.

This makes me wonder if the relnote is maybe too broad; not all OneKey users would be affected by this. Same for the plans re blocking migration.
 
Can anyone check whether these reports are setting BlocklistInitFailed (not sure if that's the exact spelling)? I don't see it on crash-stats but maybe it's a private field.

Specifically in the Avast case, I bet they hook LdrLoadDll before we do, and then our disassembler no longer understands it.
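For readers unfamiliar with the failure mode guessed at here: an inline hook typically overwrites the first bytes of the target function with a jump, so a second hooker that expects an ordinary prologue finds a jump instead. The sketch below is a toy classifier over raw x86-64 bytes, not Firefox's interceptor. The function name and the three patterns it checks are illustrative assumptions, and real hooks use other encodings too.

```python
def looks_already_hooked(prologue: bytes) -> bool:
    """Guess whether a function's first bytes were replaced by a jump."""
    if prologue[:1] == b"\xe9":
        return True  # jmp rel32
    if prologue[:2] == b"\xff\x25":
        return True  # jmp qword ptr [rip+disp32]
    if prologue[:2] == b"\x48\xb8" and prologue[10:12] == b"\xff\xe0":
        return True  # mov rax, imm64 ; jmp rax
    return False


# An ordinary prologue: mov [rsp+8], rbx ; push rdi
clean = bytes.fromhex("48 89 5c 24 08 57")
# The same entry point after someone else installed a rel32 jump.
patched = bytes.fromhex("e9 78 56 34 12 57")
# An absolute jump through rax (the target address here is a dummy value).
patched_abs = bytes.fromhex("48 b8 00 10 00 00 00 00 00 00 ff e0")
```

A hooking layer that can only relocate prologues like `clean` would have to give up on `patched` or `patched_abs`, which is one way a blocklist could fail to install when another product hooks first.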
(In reply to David Major [:dmajor] from comment #52)
> This makes me wonder if the relnote is maybe too broad; not all OneKey users
> would be affected by this. Same for the plans re blocking migration.

I see what you mean. I don't think the relnote will be a problem. I assume that only Lenovo users actually affected by the startup crashes would actually seek out or act upon the relnote's recommendation to reinstall 32-bit Firefox. And it's not that big a deal if anyone else reinstalls 32-bit Firefox.


> Can anyone check whether these reports are setting BlocklistInitFailed (not
> sure if that's the exact spelling)? I don't see it on crash-stats but maybe
> it's a private field.

I spot checked some of these Lenovo crash reports from different Firefox versions. All of these crash reports had "BlocklistInitFailed: 1". You do need to sign into Socorro to view BlocklistInitFailed in the crash report's Metadata tab.


> Specifically in the Avast case, I bet they hook LdrLoadDll before we do, and
> then our disassembler no longer understands it.

Do we have a bug on file for this type of LdrLoadDll problem? I don't see an exact match in any of the bugs blocking the "injecteject" bug 1306406.

Should I reach out to Avast?
(In reply to Chris Peterson [:cpeterson] from comment #53)
> Do we have a bug on file for this type of LdrLoadDll problem? I don't see an
> exact match in any of the bugs blocking the "injecteject" bug 1306406.

I'm sure Avast isn't the only one; it's in the nature of AV software to want to filter DLL loads. I recall seeing bugs for specific instances over the years but not one for the problem in general.
 
> Should I reach out to Avast?

I'll leave that up to you and Jim. (Maybe it's pointless if we're planning to take a harder stance on external DLLs anyway)
This problem has appeared on a massive scale since 32-bit Firefox was automatically upgraded to 64-bit in 56.0.1. The difference now is that when the user starts Firefox, it no longer shows a crash but instead displays an application error pop-up window (see attachment), at which point Firefox cannot open any page.

Re-installing 32-bit Firefox makes it work properly again. The application error code is 0x80000003.
Posted image error message
yxu, can you still reproduce this crash? In which Firefox versions? Do you have the Lenovo "OneKey Theater" software installed?

Unfortunately, there's not much we can do. We can't block Lenovo's crashing DLL. I landed a speculative fix in Beta 56b13 (comment 43) that seemed to make the crashes "go away". There have only been 17 crash reports in the last week, all from people running Firefox < 56b13, suggesting my speculative fix worked (or moved the crash to a different signature):

https://crash-stats.mozilla.com/search/?signature=~acrt_RtlGenRandom&product=Firefox&platform=Windows&date=%3E%3D2017-11-13T14%3A33%3A20.000Z&date=%3C2017-11-20T14%3A33%3A20.000Z&_sort=-date&_facets=signature&_facets=version&_facets=platform_pretty_version&_columns=date&_columns=signature&_columns=version#facet-version
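For anyone who wants to re-run this search over a different week, the query string above can be rebuilt programmatically. This is a minimal sketch: the function name `build_search_url` is made up, and the parameter names are simply copied from the URL above rather than taken from any Socorro API documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

BASE = "https://crash-stats.mozilla.com/search/"


def build_search_url(signature_fragment, start, end):
    """Rebuild the crash-stats search URL for a given date window."""
    params = [
        # The "~" prefix is copied from the URL above.
        ("signature", "~" + signature_fragment),
        ("product", "Firefox"),
        ("platform", "Windows"),
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_sort", "-date"),
        ("_facets", "signature"),
        ("_facets", "version"),
        ("_facets", "platform_pretty_version"),
        ("_columns", "date"),
        ("_columns", "signature"),
        ("_columns", "version"),
    ]
    return BASE + "?" + urlencode(params) + "#facet-version"


url = build_search_url(
    "acrt_RtlGenRandom",
    "2017-11-13T14:33:20.000Z",
    "2017-11-20T14:33:20.000Z",
)
print(url)
```

Repeated keys such as `date` and `_facets` are passed as a list of pairs so that `urlencode` keeps every occurrence.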
Flags: needinfo?(yxu)
(In reply to Chris Peterson [:cpeterson] from comment #57)
> yxu, can you still reproduce this crash? In which Firefox versions? 

Yes, it can be reproduced, but unfortunately not on my computer. I have been in contact with some users, and they run into the situation I described above.

> Do you have the Lenovo "OneKey Theater" software installed?

Not all of them have this software installed; some said they cannot find it in the Control Panel. But it seems that all of these users are on Lenovo machines running Windows 7.

The problem began to appear in large numbers after we upgraded 32-bit Firefox to 64-bit in 56.0.1. From the user's point of view, the symptom is that they can't use Firefox after the upgrade. I think the previous fix took effect, but it seems to have turned the problem into a different one: in the past this problem led to a crash at startup, whereas now Firefox does not crash but pops up the window shown in my attachment above. No crash means no crash report, so we have no way of tracking it.
Flags: needinfo?(yxu)
(In reply to yxu from comment #58)
> > Do you have the Lenovo "OneKey Theater" software installed?
> 
> Not all of them have this software installed; some said they cannot find it in
> the Control Panel. But it seems that all of these users are on Lenovo machines
> running Windows 7.

Do you know how many people are hitting this crash? How can we estimate how many people are affected?

If this is a very common problem, we could add a Firefox stub installer check for Lenovo software and give those users 32-bit Firefox.


> The problem began to appear in large numbers after we upgraded 32-bit Firefox
> to 64-bit in 56.0.1. From the user's point of view, the symptom is that they
> can't use Firefox after the upgrade. I think the previous fix took effect, but
> it seems to have turned the problem into a different one: in the past this
> problem led to a crash at startup, whereas now Firefox does not crash but pops
> up the window shown in my attachment above. No crash means no crash report, so
> we have no way of tracking it.

That is unfortunate. I don't know what we can do for users we have already upgraded to 64-bit. They will keep crashing until they re-install 32-bit Firefox. We can't automatically downgrade them to 32-bit if they can't run Firefox (to receive the new update).

btw, I have been watching for _acrt_RtlGenRandom crash reports and sending instructions for re-installing 32-bit Firefox to people whose crash reports included their email address. Unfortunately, it sounds like I am missing those people who crash before Firefox sends a crash report. :(
(In reply to Chris Peterson [:cpeterson] from comment #59)
> (In reply to yxu from comment #58)

> If this is a very common problem, we could add a Firefox stub installer
> check for Lenovo software and give those users 32-bit Firefox.

Yes, this is a common problem, but I am not sure whether this issue is related only to Lenovo OneKey, because there is no crash report, just the application error code (0x80000003). According to this article (http://iknow.lenovo.com/detail/dc_R0039.html), using the Lenovo OneKey software requires the corresponding graphics driver, sound card driver, Energy Management, Lenovo SlideNav (some of this software may not be available) and OneKey Theater to all be installed at the same time. I think it is necessary to add a Firefox stub installer check for Lenovo software because this has become a major problem for Chinese users. Also, for bug 1274659, we may need to suspend the 32-bit to 64-bit migration for this group of users.

> That is unfortunate. I don't know what we can do for users we have already
> upgraded to 64-bit. They will keep crashing until they re-install 32-bit
> Firefox. We can't automatically downgrade them to 32-bit if they can't run
> Firefox (to receive the new update).

I have written an FAQ article about this problem for the Chinese user community. Users who encounter the problem should be able to understand how to solve it.
(In reply to yxu from comment #60)
> Yes, this is a common problem, but I am not sure whether this issue is related
> only to Lenovo OneKey, because there is no crash report, just the application
> error code (0x80000003). According to this article
> (http://iknow.lenovo.com/detail/dc_R0039.html), using the Lenovo OneKey
> software requires the corresponding graphics driver, sound card driver, Energy
> Management, Lenovo SlideNav (some of this software may not be available) and
> OneKey Theater to all be installed at the same time. I think it is necessary to
> add a Firefox stub installer check for Lenovo software because this has become
> a major problem for Chinese users. Also, for bug 1274659, we may need to
> suspend the 32-bit to 64-bit migration for this group of users.

Do you know how we can check whether the Lenovo software is installed or enabled? Is there a certain DLL the stub installer or auto updater should look for?


> I have written an FAQ article about this problem for the Chinese user
> community. Users who encounter the problem should be able to understand how to
> solve it.

Thanks! Should we promote your FAQ on Mozilla social media or create a SUMO page? We currently only have a comment in the Firefox 56 release notes:

"Startup crashes with 64-bit Firefox on Windows 7, for users of Lenovo's "OneKey Theater" software for IdeaPad laptops. To fix this crash, please re-install 32-bit Firefox"

https://www.mozilla.org/en-US/firefox/56.0/releasenotes/
(In reply to yxu from comment #60)
> Yes, this is a common problem, but I am not sure whether this issue is related
> only to Lenovo OneKey, because there is no crash report, just the application
> error code (0x80000003).

Ted, do you know why Breakpad is not able to catch these "unknown software exception 0x80000003" crashes in firefox.exe? Exception code 0x80000003 (STATUS_BREAKPOINT) often points to corrupted system files or buggy anti-virus hooks.

We used to catch these Lenovo users' crash reports when SQLite called Windows' rand_s() API. Removing the call to rand_s() turned these crashes into exception code 0x80000003 instead of just crashing in other Firefox code.

There is a screenshot of the exception dialog in attachment 8929992.
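For reference, 0x80000003 can be decoded with the standard NTSTATUS bit layout (severity, customer flag, facility, code). The sketch below is only an illustration; the function name and the small lookup table are made up here, and the table lists just two well-known values.

```python
SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}

# A tiny, hand-picked lookup table; far from the full list of status codes.
KNOWN_STATUS = {
    0x80000003: "STATUS_BREAKPOINT",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def decode_ntstatus(value: int) -> dict:
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY[(value >> 30) & 0x3],   # bits 31-30
        "customer": bool((value >> 29) & 0x1),       # bit 29
        "facility": (value >> 16) & 0xFFF,           # bits 27-16
        "code": value & 0xFFFF,                      # bits 15-0
        "name": KNOWN_STATUS.get(value, "unknown"),
    }


print(decode_ntstatus(0x80000003))
```

Decoded this way, the value is a warning-severity breakpoint exception rather than an access violation, which fits a process that deliberately trapped instead of dereferencing a bad pointer.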
Flags: needinfo?(ted)
I've opened bug 1420251 to track the new crash; we can continue the discussion there.
Flags: needinfo?(ted)
Blocks: 1435797
No longer blocks: injecteject
(In reply to Chris Peterson [:cpeterson] from comment #53)
> Should I reach out to Avast?

Did you take any action regarding this?
Even if we don't contact them, it might be worth getting a bug on file for the InjectEject project.
Flags: needinfo?(cpeterson)
(In reply to Marco Castelluccio [:marco] from comment #64)
> (In reply to Chris Peterson [:cpeterson] from comment #53)
> > Should I reach out to Avast?
> 
> Did you take any action regarding this?
> Even if we don't contact them, it might be worth getting a bug on file for
> the InjectEject project.

No, I didn't reach out to them. I can file a new bug for InjectEject. Based on comments 51 and 52, it sounds like the issue is that Avast does something that prevents our DLL blocklist from blocking ActiveDetect64.dll and WindowsApiHookDll64.dll.
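As a sketch of what a name-based check for those two DLLs could look like (for example in a stub installer or a diagnostic script), assuming a list of module paths is already available from somewhere: the function name and the sample paths are hypothetical, and this is not how the real DLL blocklist is implemented.

```python
import ntpath

# The two DLL names mentioned above, lowercased for case-insensitive matching.
PROBLEM_DLLS = {"activedetect64.dll", "windowsapihookdll64.dll"}


def find_problem_modules(module_paths):
    """Return the paths whose file name matches one of the problem DLLs."""
    hits = []
    for path in module_paths:
        # ntpath handles Windows-style backslash paths on any host OS.
        name = ntpath.basename(path).lower()
        if name in PROBLEM_DLLS:
            hits.append(path)
    return hits


# Made-up paths for illustration; the real install locations are not known here.
modules = [
    r"C:\Windows\System32\ntdll.dll",
    r"C:\Example\Lenovo\ActiveDetect64.dll",
    r"C:\Example\Lenovo\WINDOWSAPIHOOKDLL64.DLL",
]
print(find_problem_modules(modules))
```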
Flags: needinfo?(cpeterson)
Depends on: 1444030