Last Comment Bug 670542 - Crash in nsStandardURL::SchemeIs or nsSimpleURI::SchemeIs mainly close to startup, mainly with a random .cpl or .dll filename in stack (malware-related)
: Crash in nsStandardURL::SchemeIs or nsSimpleURI::SchemeIs mainly close to sta...
Status: VERIFIED FIXED
: crash, regression
Product: Core
Classification: Components
Component: Networking (show other bugs)
: 6 Branch
: x86 Windows NT
: -- critical (vote)
: mozilla7
Assigned To: Daniel Holbert [:dholbert]
:
: Patrick McManus [:mcmanus]
Mentors:
: 680127 680418 (view as bug list)
Depends on:
Blocks: malware-attacks 308590
  Show dependency treegraph
 
Reported: 2011-07-10 10:18 PDT by Marcia Knous [:marcia - use ni]
Modified: 2013-12-27 14:31 PST (History)
16 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
fixed
fixed


Attachments
stack from windbg (17.26 KB, text/plain)
2011-07-18 12:35 PDT, Ted Mielczarek [:ted.mielczarek]
no flags Details
alternate stack from MSVC (5.45 KB, text/plain)
2011-07-18 19:51 PDT, Daniel Holbert [:dholbert]
no flags Details
patch 1: shift nsIURI changes to end of interface (3.25 KB, patch)
2011-07-26 18:28 PDT, Daniel Holbert [:dholbert]
jst: review+
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
patch 2: rev IIDs in IDL files inheriting from nsIURI (4.25 KB, patch)
2011-07-26 18:32 PDT, Daniel Holbert [:dholbert]
jst: review+
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review
patch 3: Swap out previous nsIURI IID for new one, in nsBinaryInputStream::ReadObject (1.15 KB, patch)
2011-07-26 18:54 PDT, Daniel Holbert [:dholbert]
jst: review+
christian: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Marcia Knous [:marcia - use ni] 2011-07-10 10:18:25 PDT
This bug was filed from the Socorro interface and is 
report bp-0ac0593f-8bdf-47cd-851a-770f22110710 .
============================================================= 

Filing as this is a new crash showing up in the early 6.0 data (but I did see it showing up in smaller numbers before beta).  See https://crash-stats.mozilla.com/report/list?signature=nsStandardURL::SchemeIs%28char%20const*,%20int*%29. It also appears on 7.0a2 but in no other versions.

Some of the comments mentioned it happened right after they updated.

The stack varies quite a bit but this is one sample:

https://crash-stats.mozilla.com/report/index/0ac0593f-8bdf-47cd-851a-770f22110710

Frame 	Module 	Signature [Expand] 	Source
0 	xul.dll 	nsStandardURL::SchemeIs 	netwerk/base/src/nsStandardURL.cpp:1686
1 	yo0O9efo.cpl 	yo0O9efo.cpl@0xe67b 	
2 	yo0O9efo.cpl 	yo0O9efo.cpl@0x3321 	
3 	xul.dll 	nsDocShell::LoadURI 	docshell/base/nsDocShell.cpp:1431
4 	xul.dll 	nsLocation::SetURI 	dom/base/nsLocation.cpp:364
5 	xul.dll 	nsLocation::SetHrefWithBase 	dom/base/nsLocation.cpp:646
6 	xul.dll 	nsRefPtr<nsIDOMEventListener>::~nsRefPtr<nsIDOMEventListener> 	obj-firefox/xpcom/build/nsCOMPtr.cpp:81
7 	xul.dll 	nsLocation::SetHrefWithContext 	dom/base/nsLocation.cpp:593
Comment 1 Scoobidiver (away) 2011-07-10 11:16:55 PDT
It is #1 top crasher in 6.0 and happens almost exclusively at startup.

Stack traces contain either a dll or a cpl as frame 1 and 2.
Comment 2 Sheila Mooney 2011-07-11 14:54:55 PDT
Kairo, can we get some correlations for this one. It seems like there are different stacks...can we get a report with the groupings...might help.
Comment 3 Sheila Mooney 2011-07-11 14:56:08 PDT
Tracking this for FF6.
Comment 4 Robert Kaiser 2011-07-12 08:15:17 PDT
(In reply to comment #2)
> Kairo, can we get some correlations for this one.

We don't have correlations with modules for 6.0 right now, and there's no way to force those reports. We need to wait until we have enough crashes on it.
And I'm not even sure how chofmann's signature grouping reports work, unfortunately.
Comment 5 chris hofmann 2011-07-13 08:11:44 PDT
the correlation reports work off an older method of picking a few of the releases with the highest volume crashes to 'automatically' manage where we see the correlation reports.  I think arvind put that script together.  now that we have rapid release we might be able to do correlations based on channel for nightly aurora and beta without taxing the system for these extra reports, since the volume there should be lower.  

it might be a good idea to file a bug on this and cc' rhelmer who has picked up some of aravinds scripts.
Comment 6 Scoobidiver (away) 2011-07-13 08:17:58 PDT
(In reply to comment #5)
> it might be a good idea to file a bug on this and cc' rhelmer who has picked
> up some of aravinds scripts.
bug 623164?
Comment 7 chris hofmann 2011-07-13 11:36:28 PDT
sure.  added the additional background to the clean up parts that were initially talked about in that bug.
Comment 8 Asa Dotzler [:asa] 2011-07-18 00:03:47 PDT
(In reply to comment #3)
> Tracking this for FF6.

Sheila, do we have enough data with beta2 to make any better gauge of this?
Comment 9 Robert Kaiser 2011-07-18 10:23:25 PDT
#9 on 6.0b2 yesterday.
Comment 10 Sheila Mooney 2011-07-18 10:26:35 PDT
Asa, for b2 it's still high. No obvious leads when we look at the add on correlation. We do see it in previous versions 3.6 but very low volume. None on 5.0 in the last week.
Comment 11 Boris Zbarsky [:bz] (still a bit busy) 2011-07-18 10:53:42 PDT
Line 1686 on the 6.0 branch is this:

  1686    *result = SegmentIs(mScheme, scheme);

I spot-checked a few of the crashes, and they all share the following features:

1)  The crash is at address 0 or nearly 0 with a write violation.
2)  The crash is on that line above, which implies that |result| is a null
    pointer.
3)  The two stack frames above the SchemeIs call are from non-Gecko DLLs.  Some
    of the DLLs that appear there are:

  NativeMaindrv.dll
  RWeUJ7MEPKMj.cpl
  tDaqdM02O.cpl
  Directmap80.dll
  DevWebAgent.dll
  yo0O9efo.cpl

4) The stack frame above that is either the call to InternalLoad in
   nsDocShell::LoadURI or the call to InternalLoad in OnLinkClickSync.

I don't know what to make of item 3 above nor how reliable that is...
Comment 12 Ted Mielczarek [:ted.mielczarek] 2011-07-18 12:08:23 PDT
Frame 1 is almost certainly correct, since frame 0 is in our code and we have unwind information for it. Frame 2 and beyond may be suspect since we may resort to stack scanning there.
Comment 13 Ted Mielczarek [:ted.mielczarek] 2011-07-18 12:17:51 PDT
I amend my previous statement. I downloaded the dump from comment 0, as well as the matching xul.sym from the symbol server, and running minidump_stackwalk locally gives me:
Thread 0 (crashed)
 0  xul.dll!nsStandardURL::SchemeIs(char const *,int *) [nsStandardURL.cpp:0d82a53ffaa6 : 1686 + 0x7]
    eip = 0x57b6fcf8   esp = 0x0018b6f4   ebp = 0x0018b74c   ebx = 0x00000000
    esi = 0x128d82f0   edi = 0x00000000   eax = 0x00000000   ecx = 0x0018b728
    edx = 0x0000001c   efl = 0x00010246
    Found by: given as instruction pointer in context
 1  yo0O9efo.cpl + 0xe67b
    eip = 0x0c99e67c   esp = 0x0018b6fc   ebp = 0x0018b74c
    Found by: call frame info with scanning

so the stackwalker did resort to scanning here, which means this is not an entirely reliable frame.
Comment 14 Ted Mielczarek [:ted.mielczarek] 2011-07-18 12:18:19 PDT
I'll try WinDBG to see what it thinks...
Comment 15 Ted Mielczarek [:ted.mielczarek] 2011-07-18 12:32:31 PDT
FWIW, I question the integrity of that randomly-named .cpl file:
 C:\Program Files\c3QeO0ey0lx\yo0O9efo.cpl
Comment 16 Ted Mielczarek [:ted.mielczarek] 2011-07-18 12:35:35 PDT
Created attachment 546605 [details]
stack from windbg

WinDBG gave me something like:
xul!nsStandardURL::SchemeIs
xul!nsTHashtable<nsBaseHashtableET<nsCStringHashKey,nsFactoryEntry *> >::s_MatchEntry
xul!SearchTable
xul!PL_DHashTableOperate
xul!nsCOMPtr_base::~nsCOMPtr_base
xul!nsComponentManagerImpl::GetServiceByContractID
xul!nsDocShell::LoadURI
xul!nsLocation::SetURI
xul!nsLocation::SetHrefWithBase
xul!nsLocation::SetHrefWithContext
xul!nsLocation::SetHref
xul!nsWindowSH::SetProperty

(full stack attached). I don't know if that looks any saner, but it is nothing but Mozilla code on the stack.
Comment 17 Daniel Holbert [:dholbert] 2011-07-18 13:52:09 PDT
OK -- so nsDocShell::LoadURI has this:
> 1365     nsCOMPtr<nsIScriptSecurityManager> secMan =
> 1366         do_GetService(NS_SCRIPTSECURITYMANAGER_CONTRACTID, &rv);
http://mxr.mozilla.org/mozilla-beta/source/docshell/base/nsDocShell.cpp#1365
...which invokes the component manager code to look up that service.

The hashtable that's being searched looks like this:
> 151     nsDataHashtable<nsCStringHashKey, nsFactoryEntry*> mContractIDs;
http://mxr.mozilla.org/mozilla-central/source/xpcom/components/nsComponentManager.h#151

and the hashtable ::s_MatchEntry() function just does this:
>374   return ((const EntryType*) entry)->KeyEquals(
>375     reinterpret_cast<const KeyTypePointer>(key));
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsTHashtable.h#370

At first glance, I'd expect that we'd only hit string-comparison code there (to compare two nsCStringHashKey objects) -- not sure why we're hitting nsStandardURL-comparison code instead.  Maybe I'm misunderstanding, though.
Comment 18 Daniel Holbert [:dholbert] 2011-07-18 14:08:48 PDT
So if I break (in a debug trunk build) at the first chunk in Comment 17, and single-step, I eventually hit what looks like the "s_MatchEntry" call from Ted's stack.  Here's the function I'm in, copied from GDB:
> nsTHashtable<nsBaseHashtableET<nsCStringHashKey, nsFactoryEntry*> >::s_MatchEntry

When I single-step through that, I enter " nsCStringHashKey::KeyEquals" (defined in nsHashKeys.h), which just does a string comparison (nsTSubstring_CharT::Equals) and then returns.  No URI code is invoked at all.  

I have no idea how we could end up triggering a call to nsStandardURL::SchemeIs...
Comment 19 Daniel Holbert [:dholbert] 2011-07-18 15:07:54 PDT
(In reply to comment #11)
> 1)  The crash is at address 0 or nearly 0 with a write violation.

Some of them are at other addresses with a write violation, too -- e.g. the following are at 0x6ff7115f and 0x725c115f, respectively:
 bp-a41c6b48-ffd9-4f6e-a37f-8d9db2110718
 bp-5b60bcd6-394e-408c-ade3-07d8e2110718
Comment 20 Daniel Holbert [:dholbert] 2011-07-18 15:26:22 PDT
(In reply to comment #11)
> 2)  The crash is on that line above, which implies that |result| is a null
>     pointer.

BTW, the stack that Ted attached confirms this, for that crash.  It has "int * result = 0x00000015" in the first line.
Comment 21 Daniel Holbert [:dholbert] 2011-07-18 15:53:12 PDT
A few levels up Ted's stack, things are looking pretty busted -- here are 3 consecutive stack levels (partially trimmed for readability): 
> xul!SearchTable(struct PLDHashTable * table = 0x0c993322, void * key = 0x128d82f0, unsigned int keyHash = 0x15, PLDHashOperator op = 1470972173 (No matching enumerant)
> xul!PL_DHashTableOperate(struct PLDHashTable * table = 0x0018b728, void * key = 0x0000001c, PLDHashOperator op = <Memory access error>
> xul!nsCOMPtr_base::~nsCOMPtr_base(void)+0xe [e:\builds\moz2_slave\rel-m-beta-w32-bld\build\obj-firefox\xpcom\build\nscomptr.cpp @ 82]

Note in particular:
 * 1st line: Bogus PLDHashOperator value 1470972173  (Valid values are 0,1,2)
             Likely-bogus keyHash = 0x15 (that's a distinctly un-random-looking hash value, and it happens to end up exactly matching our "int * result" in prev comment, which is suspicious)
 * 2nd line: Likely-bogus "void * key" pointer, 0x0000001c
 * 3rd line: Looks like it could trigger a delete operation -- that's basically the only way for code to be invoked from ~nsCOMPtr.  Not sure why "delete" ends up translating to "do some bogus hash table operations" though.

I wonder if this is a double-free or something...
Comment 22 Daniel Holbert [:dholbert] 2011-07-18 16:19:14 PDT
(Looks like we have a bundle of crashes in nsSimpleURI::SchemeIs, with similar-looking stacks, BTW. e.g. bp-831337db-a24c-461c-b8d4-19ac32110717 )
Comment 23 Daniel Holbert [:dholbert] 2011-07-18 19:51:17 PDT
Created attachment 546700 [details]
alternate stack from MSVC

Here's a stack of a different (but related) crash report -- bp-831337db-a24c-461c-b8d4-19ac32110717 -- which is a crash in nsSimpleURI::SchemeIs.  (with user comments "Starting Aurora for the first time.")  In this case, we crash when setting the outparam *o_equals, because it's a null pointer.(Incidentally, |this| is null too, though.)

Snippet of stack:
> nsSimpleURI::SchemeIs(const char * i_Scheme, int * o_Equals)  Line 483
> nsScriptSecurityManager::QueryInterface(const nsID & aIID, void * * aInstancePtr)  Line 514 + 0x48 bytes
> nsComponentManagerImpl::GetServiceByContractID(const char * aContractID, const nsID & aIID, void * * result)  Line 1642 + 0x1d bytes
> nsDocShell::LoadURI(nsIURI * aURI, nsIDocShellLoadInfo * aLoadInfo, unsigned int aLoadFlags, int aFirstParty)  Line 1469 + 0x3e bytes
> nsLocation::SetURI(nsIURI * aURI, int aReplace)  Line 364 + 0x12 bytes
> nsLocation::SetHrefWithBase(const nsAString_internal & aHref, nsIURI * aBase, int aReplace)  Line 646 + 0x15 bytes

From debugging this minidump in MSVC, the stack mostly appears sensible, right up to the nsSimpleURI::SchemeIs call -- which makes no sense (how'd we get there?)  It appears that nsScriptSecurityManager::QueryInterface is calling "SchemeIs", but that's bogus, as the actual code there (implemented via NS_IMPL_ISUPPORTS4) does no such thing.
Comment 24 Ted Mielczarek [:ted.mielczarek] 2011-07-19 08:47:51 PDT
Per my comment 15, it's possible that there's malware involved here doing something awful.
Comment 25 Daniel Holbert [:dholbert] 2011-07-19 09:46:45 PDT
Hmmm, yeah...  The fact that things get mysterious (e.g. jump into the mysterious .cpl / .dll from ::LoadURI, according to crash stats) is a bit suspicious -- maybe this is malware trying to intercept visited URIs, either for redirection ("sure, paypal.com is right over here") or logging purposes.  (Maybe calling SchemeIs to check for https?)

This *does* seem to be entirely windows-only, which increases the apparent likelihood of it all being due to a particular piece of [windows] malware. (In the last 4 weeks, there were 3666 crashes at nsStandardURL::SchemeIs & 165 at nsSimpleURI::SchemeIs, 100% on Windows.)

I'm unclear on why WinDBG / MSVC would disagree with crash-stats about the mysterious-.cpl files being involved (in comment 16 & comment 23), though. I guess the moz-only alternative stacks don't make a ton of sense anyway...
Comment 26 Scoobidiver (away) 2011-07-19 10:00:59 PDT
Can some users who let their email addresses in crash reports be contacted to get more information? Which information is needed?
Ref: https://support.mozilla.com/kb/crash-report-email-faq
Comment 27 Ted Mielczarek [:ted.mielczarek] 2011-07-19 10:16:08 PDT
Yeah, I don't know if that's actually a code address in that .cpl, but if you look at the stack memory from the crashing thread in the dump from comment 0, you'll clearly see:
0018b6f4 98 06 0d 12 7c e6 99 0c f0 82 8d 12 24 b7 18 00 1c 00

That address (0x0c99e67c) is the second word of memory on the stack. Not a smoking gun, but certainly suspicious.
Comment 28 Daniel Holbert [:dholbert] 2011-07-19 12:34:12 PDT
So in every instance of this I've seen, the following is true:
 (a) There's a non-Mozilla .dll or .cpl file on the stack, inside of ::LoadURI (which is a suspicious place to be jumping into external code, per the first chunk in comment 25)

 (b) In the "Raw Dump" view, that dll/cpl is missing a bunch of metadata that's present in (virtually) all other loaded modules. e.g. here's the shady dll:
> Module|oleobjspl.dll||||0x10000000|0x1001efff|0
vs. here's what the rest of the modules look like:
> Module|srvcli.dll|6.1.7601.17514|srvcli.pdb|B83FAA07A5A644DEAF314DD746429E092|0x72330000|0x72348fff|0
https://crash-stats.mozilla.com/report/index/8cd29eea-4ccb-49a9-8130-f9c672110719
Ted says in IRC: "99% of legit software will have version/debug id info. anything lacking it is automatically suspicious. that's a pretty sure sign of a virus/malware/trojan/whatever."

 (c) In cases where the mysterious dll/cpl has a "normal"-looking name (e.g. SysPathVdm.dll, oleobjspl.dll, HpMainSnap.dll), a google-search for the filename turns up nothing, which suggests that it might be a randomly-generated-but-believable filename (from a wordlist, or from making tweaks to existing DLL names)

Conclusion: These facts *strongly* suggest that this is from one or more pieces of malware that use randomly-named (and metadata-lacking) DLL / CPL files.
Comment 29 Daniel Holbert [:dholbert] 2011-07-19 17:49:04 PDT
(In reply to comment #26)
> Can some users who let their email addresses in crash reports be contacted
> to get more information? Which information is needed?
> Ref: https://support.mozilla.com/kb/crash-report-email-faq

That's probably a good idea, though I'm not sure what we'd ask them.

I imagine we'd want them to run some malware or virus scanner, and in an ideal world, we'd end up with either or both of the following results.
 (a) the scanner would detect the same malware across different users
 (b) the scanner would remove some malware, and in doing so, fix the Firefox crashes

If we get (a), that'd tell us there's a pattern & that our hunch about the cause is likely correct.  If we get (b), that'd confirm the cause (and fix the problem for the users we contact, which is nice.)

This could also go wrong in a number of ways, though -- the scanner might detect false-positive malware that's not involved at all (since there are just a lot of malware-infected machines out there), or it might try & fail to remove malware and leave the machine in a less-functional state (unlikely but worth considering when we ask users to do something), or the scanner might fail to find anything 'cause it's missing signatures for whatever malware is involved here. (particularly considering the random-ish DLL / CPL filename)

So, the above are my thoughts/concerns about contacting users.  I have no idea which malware scanner(s) to recommend, or who (cww?) would drive the user-contacting effort...
Comment 30 Scoobidiver (away) 2011-07-20 00:55:46 PDT
I think it's the equivalent of bug 633445 which is #3 top browser crasher in 5.0 (159 occurrences in 6.0 but not related to malware) while this one is #3 top browser crasher in 6.0 (7 occurrences in 5.0 or 5.0.1 but not related to malware).
Let's assign it to Chris that will give email addresses to Cheng.
Comment 31 [:Cww] 2011-07-20 09:29:24 PDT
I need the email addresses and the name of the DLL... I can ask users to search their computers and email me that DLL.

That's about all that I think users will be able to do given the information we have.  I won't recommend AV software since I'd rather not troubleshoot other people's software and pretty much all malware these days tries its darndest to cripple your AV.
Comment 32 Scoobidiver (away) 2011-07-20 09:52:32 PDT
For info., here is the SUMO article about malware that targets Firefox:
http://support.mozilla.com/en-US/kb/Is%20my%20Firefox%20problem%20a%20result%20of%20malware
The failure to start Firefox is one of the symptoms.

According to me, the goal of contacting some users is not to help troubleshoot their problem (there's already the article above), but to get one or several kinds of dlls in order to make Firefox more robust against these threats.
But maybe it's better to have a big visible symptom than invisible malicious ones. The risk is that Mozilla will be held for responsible of their problem instead of the malware and they will switch to another browser.
Comment 33 chris hofmann 2011-07-20 09:57:06 PDT
sent mail to cww with a list of reports that have email addresses to see if he can make headway
Comment 34 Daniel Holbert [:dholbert] 2011-07-20 10:00:51 PDT
(In reply to comment #31)
> I need the email addresses and the name of the DLL...

(the name of the DLL or CPL file varies per-report -- hopefully you can grab that on a case-by-case basis from the reports that choffman sent you.  Thanks cww!)
Comment 35 chris hofmann 2011-07-20 10:20:11 PDT
agree with scoobidiver in comment 32, so as soon as we figure out what to tell users we should consider turning on the e-mail auto-responder for these signatures.
Comment 36 Scoobidiver (away) 2011-07-20 12:21:49 PDT
(In reply to comment #35)
> what to tell users
After Cheng got some dlls, we can sent something like that:
"Hello,

Some time ago, Firefox crashed on your computer and you sent us a crash report using the Mozilla Crash Reporter tool. We sent you an e-mail because we have more info about this crash, and you requested in the report that you wanted to be informed whenever we do. 

We established that this crash was caused by malware that targets among others Firefox. This malware uses randomly named dll or cpl files so as not to be identified by anti-virus and anti-spyware software. Please follow the procedure in the following support article to try to get rid of it:
http://support.mozilla.com/kb/Is%20my%20Firefox%20problem%20a%20result%20of%20malware

If you've questions about this email, see:
http://support.mozilla.com/kb/crash-report-email-faq

We hope this email helped you solve your crash issue.

   The Mozilla Security team"
Comment 37 Daniel Holbert [:dholbert] 2011-07-26 12:50:11 PDT
Cww spoke with me & LegNeato this morning about maaaaaaaybe backing out bug 308590 and all its dependencies / helper-bugs, with the goal of giving AV companies another 6 weeks to killkillkill this malware before we ship an update that will make it trip over itself and explode in Firefox.

Cww points out that the current situation is this: if we release Firefox 6.0 as-is, our #6 beta topcrash will likely become a much-more-frequent crash for our much larger release audience, because they're likely to be more malware-ridden than our beta audience.  That's bad.

From one perspective, it's better to crash like this than to let the malware hijack us to spy on users.  However, we have reason to believe that this malware targets multiple browsers -- so the crash doesn't really stop any spying, because users may continue to be spied on when they try another browser after hitting this bug.

So, just to experiment, I ran a trial backout locally, to see how feasible that'd be.  The results were good - I had no merge conflicts.  The bugs that backed out locally were:
  Bug 662242, Bug 659698, Bug 659177, Bug 658949, Bug 658845, Bug 308590
with these 15 csets:
  http://hg.mozilla.org/releases/mozilla-beta/rev/903365ff5efa
  http://hg.mozilla.org/releases/mozilla-beta/rev/84ef9983832e
  http://hg.mozilla.org/releases/mozilla-beta/rev/bb5904c365a2
  http://hg.mozilla.org/releases/mozilla-beta/rev/f5c062e99196
  http://hg.mozilla.org/releases/mozilla-beta/rev/c4d58bb8f5ad
  http://hg.mozilla.org/releases/mozilla-beta/rev/41371c9fe361
  http://hg.mozilla.org/releases/mozilla-beta/rev/96fae135f592
  http://hg.mozilla.org/releases/mozilla-beta/rev/76f08a32c046
  http://hg.mozilla.org/releases/mozilla-beta/rev/dbce2f14576e
  http://hg.mozilla.org/releases/mozilla-beta/rev/655514007ebd
  http://hg.mozilla.org/releases/mozilla-beta/rev/a006860c018d
  http://hg.mozilla.org/releases/mozilla-beta/rev/835e9789e934
  http://hg.mozilla.org/releases/mozilla-beta/rev/add8d8d67ba8
  http://hg.mozilla.org/releases/mozilla-beta/rev/e9c7616c4f72
  http://hg.mozilla.org/releases/mozilla-beta/rev/ee42941252f6

In addition to the mega-backout, we'd also need a one-off fix for the inverse of bug 662242 -- basically, in order for sessionrestore to work across the backout-update for current beta users, we'd need to add a chunk of code much like bug 662242's patch but with the IIDs reversed.  (because otherwise the nsIURI IID change will break their deserialization in one particular part of session restore.)

I'll leave it up to release drivers to decide whether or not we want this backout, but I have at least established that "it's possible and not too painful."
Comment 38 [:Cww] 2011-07-26 14:10:59 PDT
For the record, the malware is a variant (and possibly an undetected variant) of: http://www.symantec.com/security_response/writeup.jsp?docid=2010-121016-0900-99&tabid=3
Comment 39 Daniel Holbert [:dholbert] 2011-07-26 17:49:46 PDT
So, Bug 308590 made 3 tweaks to the nsIURI interface, and 2 of them are before the declaration of "SchemeIs".

jst says it's possible that if we shift those tweaks to all live at the *end* of the nsIURI interface, then this problem will go away, because the first chunk of the vtable will look like it used to.  That's worked with some similar bugs in the past, anyway.

That would require IID revving and an additional hackaround along the lines of bug 662242, but it's significantly more appealing than a full backout, and it might "fix" this for good regardless of how quickly AV companies can jump on this.

I'm spinning up a patch to hopefully make this change on trunk today, so that we can figure out whether this strategy works ASAP.
Comment 40 Daniel Holbert [:dholbert] 2011-07-26 18:28:23 PDT
Created attachment 548651 [details] [diff] [review]
patch 1: shift nsIURI changes to end of interface
Comment 41 Daniel Holbert [:dholbert] 2011-07-26 18:32:50 PDT
Created attachment 548652 [details] [diff] [review]
patch 2: rev IIDs in IDL files inheriting from nsIURI

This patch revs the IIDs of every interface that inherits (directly or indirectly) from nsIURI.

I verified that this is all of the interfaces whose IIDs we changed in the csets mentioned in comment 37.
Comment 42 Daniel Holbert [:dholbert] 2011-07-26 18:54:04 PDT
Created attachment 548654 [details] [diff] [review]
patch 3: Swap out previous nsIURI IID for new one, in nsBinaryInputStream::ReadObject

This third patch extends the fix in bug 662242 to catch the newly-obsolete nsIURI IID and replace it with the new one, when we QI a freshly-deserialized principal during session restore.
Comment 43 Johnny Stenback (:jst, jst@mozilla.com) 2011-07-26 19:02:43 PDT
I think it's very worth getting these patches landed asap to see if it seems this eliminates the crashes we've been tracking here. If so, we should consider this as an option for the branches as well.
Comment 44 Daniel Holbert [:dholbert] 2011-07-26 19:16:47 PDT
(confirmed via local testing that patch 2 re-breaks bug 662242 and patch 3 re-fixes it. Hooray!)

(In reply to comment #43)
> I think it's very worth getting these patches landed asap

Agreed - I intend to land them tonight, for tomorrow's nightly build.
Comment 45 Scoobidiver (away) 2011-07-27 01:11:34 PDT
(In reply to comment #43)
> I think it's very worth getting these patches landed asap to see if it seems
> this eliminates the crashes we've been tracking here. If so, we should
> consider this as an option for the branches as well.
It happens only once every 3-5 days in 8.0a1, so it will be hard to determine if it is fixed without a testcase:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A8.0a1&query_search=signature&query_type=exact&query=nsStandardURL%3A%3ASchemeIs%28char%20const*%2C%20int*%29&reason_type=contains&range_value=4&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsStandardURL%3A%3ASchemeIs%28char%20const*%2C%20int*%29

Cheng, can you ask to someone that reliably crashes in 6.0 to test the new build?
Comment 46 [:Cww] 2011-07-27 12:56:41 PDT
This is malware, now that I know how to fix, I've been telling users to delete the file so I don't have users who have reproducible crashes.
Comment 47 Daniel Holbert [:dholbert] 2011-07-27 13:38:48 PDT
Started discussion with release-drivers on this - it's not clear yet whether or not we want to take this fix on beta, due to how soon the next beta-->release switchover is & the potential for addon-author-pain caused by IID revving.

Will update this bug when we know more.
Comment 48 Daniel Holbert [:dholbert] 2011-07-27 14:13:05 PDT
Sorry -- I forgot to mention that I landed the attached patches on m-c last night, so they're included in today's nightly build.
 http://hg.mozilla.org/mozilla-central/rev/5a6baeef6e16
 http://hg.mozilla.org/mozilla-central/rev/bf75a52ec0fa
 http://hg.mozilla.org/mozilla-central/rev/59aa79bfa11d
Comment 49 Scoobidiver (away) 2011-07-27 14:28:37 PDT
(In reply to comment #46)
> This is malware, now that I know how to fix, I've been telling users to
> delete the file so I don't have users who have reproducible crashes.
Try some you didn't talk to previously by asking new email addresses to Chris.
Comment 50 [:Cww] 2011-07-27 14:43:06 PDT
(In reply to comment #49)
Already in progress... I was just referring to the fact that I can't just do it right away.
Comment 51 Daniel Holbert [:dholbert] 2011-08-02 11:53:40 PDT
Crash-stats sees no instances of this from 20110727 or newer nightlies. (which have the csets from comment 48):
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A8.0a1&query_search=signature&query_type=contains&query=nsStandardURL%3A%3ASchemeIs%28char%20const%2A%2C%20int%2A%29&reason_type=contains&date=08%2F02%2F2011%2011%3A43%3A46&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsStandardURL%3A%3ASchemeIs%28char%20const%2A%2C%20int%2A%29

There are 2 crashes that *happened* since 2011-7-27, but those users were running unpatched out-of-date-by-a-few-days nightly builds.

So, the (sparse) nightly crash data so far suggests that Comment 48 worked.

Cww contacted some users per Comment 50 to ask them if patched builds fix their crash, but he hasn't heard back yet.

I'm going to request approval to land on Aurora.  I intend to wait one more day before landing, for more crash-stats data and possible-user-responses-to-Cww, and then I'll assume the best & land.
Comment 52 Daniel Holbert [:dholbert] 2011-08-02 16:24:22 PDT
Landed patches on aurora:
 http://hg.mozilla.org/releases/mozilla-aurora/rev/276705ef5f1f
 http://hg.mozilla.org/releases/mozilla-aurora/rev/20657d07fec3
 http://hg.mozilla.org/releases/mozilla-aurora/rev/c87e7fe60bdc

Tentatively resolving as FIXED, but we should watch crash-stats (on nightly as well as aurora) to verify.

(In reply to comment #47)
> Started discussion with release-drivers on this - it's not clear yet whether
> or not we want to take this fix on beta, due to how soon the next
> beta-->release switchover is & the potential for addon-author-pain caused by
> IID revving.

To follow up on this: we ended up leaning against taking this on beta, with the feeling that the fix would likely cause more collective pain than leaving it unfixed on that branch (due to the IID rev likely breaking a bunch of binary addons, without much time for them to react before release).
Comment 53 Sheila Mooney 2011-08-08 14:31:06 PDT
We decided to fix this on Aurora, Trunk but not Beta. There is really no point continuing to track it for 6.0. If for some reason it becomes a serious issue after 6.0 is released, the craskill team can track it.
Comment 54 Daniel Holbert [:dholbert] 2011-08-09 00:39:31 PDT
(In reply to Daniel Holbert [:dholbert] from comment #52)
> Tentatively resolving as FIXED, but we should watch crash-stats (on nightly
> as well as aurora) to verify.

Just to follow up on this -- the fix does appear to have stuck!

Here's a search I just did, for crashes with this signature on Aurora over the last 2 weeks:
https://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A7.0a2&version=Firefox%3A7.0a1&platform=windows&query_search=signature&query_type=contains&query=nsStandardURL%3A%3ASchemeIs&reason_type=contains&date=08%2F09%2F2011%2000%3A23%3A31&range_value=2&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=nsStandardURL%3A%3ASchemeIs%28char%20const%2A%2C%20int%2A%29

That has 89 results (including some crashes submitted yesterday), but the last *build* with any results returned is 2011080200, which uncoincidentally is the day that the patches landed on Aurora (fixing the subsequent nightly).

Also, there's at least one crash submitted on each build for the week before 20110802 (with four days that have >10 crashes).  So, the fact that we've now had 6 straight days of generating apparently-non-crashy builds is significant.

(Same sort of thing on Nightly channel, though less dramatic.  The most recent build with crashes there is 2011072600, which is the last unpatched build on that channel.)

So, I'm marking this VERIFIED|FIXED, based on the above crash analysis.
Comment 55 Robert Kaiser 2011-08-17 15:10:57 PDT
As expected, this is hitting us pretty badly after 6.0 shipped. This is the #4 report overall in yesterday's data for 6.
Comment 56 Per Beijer 2011-08-18 10:35:46 PDT
This happened to me after firefox 5 updated itself to 6. The dll casuing the problem in my case was called apphelpinterval.dll. After removing it from the registry and rebooting firefox 6 started working.

This is my crash report: https://crash-stats.mozilla.com/report/index/c2720c47-7a6d-4a6a-96df-596282110817.

A virus scan of the dll file is here: http://www.virustotal.com/file-scan/report.html?id=904c029f2818d9136f6a4420abd26d4034b0f8970978e31255cf7034bac544c2-1307082643
Comment 57 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-18 17:00:11 PDT
Can we set up the auto-mailer to mail users crashing with this signature to tell them they have malware?
Comment 58 Scoobidiver (away) 2011-08-19 14:35:58 PDT
*** Bug 680418 has been marked as a duplicate of this bug. ***
Comment 59 Scoobidiver (away) 2011-08-23 00:25:29 PDT
bug 680418 shows malware with no randomly named dll: DesktopGLSvcs.dll.
Comment 60 Daniel Holbert [:dholbert] 2011-08-23 01:15:30 PDT
(In reply to Scoobidiver from comment #59)
> bug 680418 shows malware with no randomly named dll: DesktopGLSvcs.dll.

I don't think that's new information, if I'm understanding you correctly.  I've been using "random" to mean "different on each system" (which suggests that it's being randomly generated), but not necessarily gibberish.  See e.g. comment 11, comment 28, comment 56, which all have superficially-legit-sounding DLL filenames.
Comment 61 Marcia Knous [:marcia - use ni] 2011-08-26 16:10:10 PDT
*** Bug 680127 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.