Status

()

Toolkit
Autocomplete
P2
critical
9 months ago
8 months ago

People

(Reporter: lizzard, Unassigned)

Tracking

({crash})

Firefox Tracking Flags

(Not tracked)

Details

(crash signature)

(Reporter)

Description

9 months ago
This bug was filed from the Socorro interface and is 
report bp-d637e49e-b902-4228-aaee-cb76b0170908.
=============================================================

Startup crash, new in beta 8. The one comment on the crash reports so far mentions dev edition is crashing over and over.
Looks like DLL injection to me. Second stack frame is xpcom/reflect/xptcall/md/win32/xptcstubs_asm_x86_64.asm:57. Maybe nfroyd can shed some light here?
Flags: needinfo?(nfroyd)
OK, so just to establish a shared baseline, in the crash report from comment 0, we have:

0 		@0x180019183 	
1 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/md/win32/xptcstubs_asm_x86_64.asm:57
2 	xul.dll 	nsAutoCompleteController::HandleEnter(bool, nsIDOMEvent*, bool*) 	toolkit/components/autocomplete/nsAutoCompleteController.cpp:370
3 	xul.dll 	XPTC__InvokebyIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm:97
4 		@0x83e719bb02 	
5 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp:1282
6 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:967
7 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:469

The HandleEnter frame (#2) is actually calling EnterMatch (https://hg.mozilla.org/releases/mozilla-beta/annotate/da291ce74248/toolkit/components/autocomplete/nsAutoCompleteController.cpp#l1442), which was inlined by the compiler and is...rather large.  What's getting called that gets us into SharedStub?

Looking at a different crash report, https://crash-stats.mozilla.com/report/index/604f8dbf-e145-4adf-b571-63bca0170915 , we have:

0 		@0x180019183 	
1 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/md/win32/xptcstubs_asm_x86_64.asm:57
2 	xul.dll 	nsAutoCompleteController::PostSearchCleanup() 	toolkit/components/autocomplete/nsAutoCompleteController.cpp:1743
3 		@0x18001946c 	
4 	xul.dll 	nsAutoCompleteController::HandleEnter(bool, nsIDOMEvent*, bool*) 	toolkit/components/autocomplete/nsAutoCompleteController.cpp:370
5 	xul.dll 	XPTC__InvokebyIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_64.asm:97
6 		@0x9 	
7 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp:1282
8 	xul.dll 	XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:967
9 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:469

Which shares a lot of similarities with the first crash report.  The bogus frames are a little concerning, but let's ignore those for now.  This crash report says we're calling into SharedStub from PostSearchCleanup:

https://hg.mozilla.org/releases/mozilla-beta/annotate/6bd183fc6921/toolkit/components/autocomplete/nsAutoCompleteController.cpp#l1743

So we're calling into nsIAutoCompleteInput::OnSearchComplete:

  nsCOMPtr<nsIAutoCompleteInput> input(mInput);
  ...elided lines...
  // notify the input that the search is complete
  input->OnSearchComplete();

which means that `input`, and therefore `mInput`, must be something implemented in JS.  This seems like a likely thing for the first crash report too.

Why is SharedStub crashing, then?  The line in xptcstubs_asm_x86_64.asm is calling PrepareAndDispatch:

https://dxr.mozilla.org/mozilla-central/source/xpcom/reflect/xptcall/md/win32/xptcstubs_x86_64.cpp#15

which does some really gritty asm-level stuff, but it's also the heart of our JS->C++ function call mechanism, so any problems here would show up a lot sooner than crashes in the wild.

Panos's thought of DLL injection doesn't sound too bad, but I'm not sure what would be injected; the correlations don't show up anything that looks particularly suspicious.  Did you have specific DLLs in mind?

I'd be a little more inclined to blame memory corruption, but that seems like a cop-out, especially since the crashes all look to be localized to the particular area of autocomplete.  If we had some sort of weird thing where JS and C++ disagreed on what methods nsIAutoCompleteInput implemented, that would definitely cause weird crashes like this...do we ship updates to autocomplete as a system addon or something?
Flags: needinfo?(nfroyd)
My DLL injection theory was mainly based on the stack frames with memory addresses (@0x180019183, @0x83e719bb02) and no module data, but I might be reading the crash report wrong. Marco, could any of the changes you made recently in this area be related?
Flags: needinfo?(mak77)

Comment 4

8 months ago
From aggregates, this doesn't look like a startup crash, I also see larger uptimes.

nsIAutoCompleteInput didn't change in the last year, so it's unlikely to be a mismatch.
OnSearchComplete is usually implemented in XBL, we keep an owning pointer on it it shouldn't go away, but it may be invalid, somehow.

I don't think this is strictly related to the recent changes in the Address Bar because:
* it goes back to Firefox 56. Most of the changes made 57.
* I looked at the whole list of fixes in 56, and I couldn't find anything in Location Bar, Autocomplete or Places that looks related. But there have been changes in XBL, build, xpcom and xpconnect. Things like bug 1380397 (just to name one).
* 13% of the crashes are in the Content process, that seems to exclude this being due to Location Bar changes.

The first reports seem to be at 20170926 that is more or less the 56 release date (well, at least when 56 appeared on the ftp).
I wonder if the problem is some kind of PGO bug activated by the changes to xpconnect or some of the middle layers between xpcom and js.
Flags: needinfo?(mak77)

Comment 5

8 months ago
Setting as P2, because it doesn't look immediately actionable off-hand.
There are various problems with the way we currently handle input, it can basically change at any time, be nullified, I can't exclude something could set it to a broken object. Bug 1406229 is the n-th example of a null mInput crash for example.
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.