Closed Bug 1337418 Opened 3 years ago Closed 3 years ago

Crash in nsACString_internal::Assign | nsACString_internal::Assign | mozilla::media::OriginKeyStore::OriginKeysTable::GetOriginKey

Categories

(Core :: WebRTC: Audio/Video, defect, P1, critical)

50 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 53+ fixed
firefox53 --- fixed
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: philipp, Assigned: baku)

Details

(4 keywords, Whiteboard: [adv-main53+][adv-esr52.1+])

Crash Data

This bug was filed from the Socorro interface and is 
report bp-81913aa2-e1af-498f-9d16-55fc52170207.
=============================================================
Crashing Thread (74)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsACString_internal::Assign(nsACString_internal const&, mozilla::fallible_t const&) 	xpcom/string/nsTSubstring.cpp:446
1 	xul.dll 	nsACString_internal::Assign(nsACString_internal const&) 	xpcom/string/nsTSubstring.cpp:412
2 	xul.dll 	mozilla::media::OriginKeyStore::OriginKeysTable::GetOriginKey(nsACString_internal const&, nsCString&, bool) 	dom/media/systemservices/MediaParent.cpp:83
3 	xul.dll 	mozilla::media::OriginKeyStore::OriginKeysLoader::GetOriginKey(nsACString_internal const&, nsCString&, bool) 	dom/media/systemservices/MediaParent.cpp:119
4 	xul.dll 	<lambda_c0892d3761bf70011bbdeba5a013f228>::operator() 	dom/media/systemservices/MediaParent.cpp:418
5 	xul.dll 	mozilla::media::LambdaRunnable<<lambda_c0892d3761bf70011bbdeba5a013f228> >::Run() 	dom/media/systemservices/MediaUtils.h:194
6 	xul.dll 	nsThreadPool::Run() 	xpcom/threads/nsThreadPool.cpp:225
7 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1067
8 	xul.dll 	NS_ProcessNextEvent(nsIThread*, bool) 	xpcom/glue/nsThreadUtils.cpp:311
9 	xul.dll 	mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:368
10 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:225
11 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:205
12 	xul.dll 	nsThread::ThreadFunc(void*) 	xpcom/threads/nsThread.cpp:465
13 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c:397
14 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c:95
15 	ucrtbase.dll 	thread_start<unsigned int (__stdcall*)(void*)> 	
16 	kernel32.dll 	BaseThreadInitThunk 	
17 	ntdll.dll 	__RtlUserThreadStart 	
18 	ntdll.dll 	_RtlUserThreadStart

this signature is on the rise since 3 days ago and occurring on windows and os x so far. two user comments seem to reference "Wayfair" - i'm not sure what this is exactly, but maybe url correlations for the signature would help.

the crashing address of a small fraction of these reports indicates a uaf-situation, so i'm marking the bug as security sensitive as a precaution too.
There are several www.argos.co.uk, e.mail.ru, www.wayfair.com, *.gap.com, shop.lenovo.com

Most URLs seem to be in the shopping section of websites.

Perhaps they are all using the same e-commerce CMS.
Why are they using the OriginKeyStore.... I thought that was for the DRM interface? is it being repurposed as a tracker?

Anthony: who should we CC on this one?
Group: core-security → media-core-security
Component: General → Audio/Video
Flags: needinfo?(ajones)
The crashes spiked in early February (slightly elevated a few days, then boom on Feb 5&6). Did we unthrottle the release on that date? The vast majority of the crashes are in 51.0.1, though there are a scattered few on earlier releases. We'll have to check with the release user numbers, but it looks more like a site started using a feature than release-related.

There seem to be two distinct symptoms here. The majority of the crashes are a read violation at 0x4: we're passing a null string to Assign() and it crashes here:
https://hg.mozilla.org/releases/mozilla-release/annotate/327e081221b0/xpcom/string/nsTSubstring.cpp#l427

The Use-after-free crashes happen 20 lines later, writing to freed memory when trying to get an owning reference to a shared string.
https://hg.mozilla.org/releases/mozilla-release/annotate/327e081221b0/xpcom/string/nsTSubstring.cpp#l446
(In reply to Daniel Veditz [:dveditz] from comment #3)
> The crashes spiked in early February (slightly elevated a few days, then
> boom on Feb 5&6). Did we unthrottle the release on that date?

the timing seems unrelated to our releases then. we unthrottled auto-updates for release users to 51.0.1 on jan 27.
Chris - is this you?
Flags: needinfo?(ajones) → needinfo?(cpearce)
dom/media/systemservices/* is WebRTC code, not media playback code.
Flags: needinfo?(cpearce) → needinfo?(rjesup)
NI-ing jib (who wrote much of the MediaParent code) and baku (who knows origins, etc, and just rewrote this).

Note that all the GetOriginKey/etc code was rewritten by baku to be GetPrincipalKey/etc. (bug 1320170) -- very recently (~ 2 weeks ago)
Flags: needinfo?(rjesup)
Flags: needinfo?(jib)
Flags: needinfo?(amarchesini)
Rank: 10
Component: Audio/Video → WebRTC: Audio/Video
Priority: -- → P1
(In reply to Randell Jesup [:jesup] from comment #7)
> Note that all the GetOriginKey/etc code was rewritten by baku to be
> GetPrincipalKey/etc. (bug 1320170) -- very recently (~ 2 weeks ago)

That seems to line up with when this started to happen...

These keys are stored and retreived from disk. See the enumerate_devices.txt in the profile dir. It looks to me like bug 1320170 broke the format of this file and didn't bump the internal version number from 1 to 2. 

Perhaps people are trying Nightly with the same profile as 51... I tried it, and wasn't immediately able to make it crash. Will look closer tomorrow.
Flags: needinfo?(jib)
Filed bug 1340163 on the issue in comment 8. Still working on reproducing a crash.
Note that this backtrace shows code older than bug 1320170: mozilla::media::OriginKeyStore::OriginKeysTable::GetOriginKey doesn't exist anymore after bug 1320170.
Flags: needinfo?(amarchesini)
i see no crashes in 53 or 54 in the last week, and no crashes with GetPrincipalKey() on the stack.  So I presume that bug 1320170 may have actually fixed this.  However... even beta isn't showing many crashes (very few); the spike is all in release when 51.0.1 went out to users.  The volume of testing in aurora/53 and 54 may be too low to show a signal.

baku: thoughts?

jib: given the crash frequency, it may be hard to repro
Flags: needinfo?(amarchesini)
It will be interesting to see if the crashes return once bug 1340163 lands.

From what I can tell, regardless of signature, all crashes come from this line [1]:

      OriginKey* key;
      if (!mKeys.Get(aOrigin, &key)) {
        nsCString salt; // Make a new one
        nsresult rv = GenerateRandomName(salt, key->EncodedLength);
        if (NS_WARN_IF(NS_FAILED(rv))) {
          return rv;
        }
        key = new OriginKey(salt);
        mKeys.Put(aOrigin, key);
      }
      if (aPersist && !key->mSecondsStamp) {
        key->mSecondsStamp = PR_Now() / PR_USEC_PER_SEC;
        mPersistCount++;
      }
 -->  aResult = key->mKey;

where key is null (the different signatures stem from internal state in nsCString, reuse of string-buffer optimizations) AFAICT.

The only way for key to be null here (short of new returning null, which I assume it can't) would be for mKeys to contain a nullptr entry. Looking at the two mKeys.Put(), I see no way for that to happen naturally either, short of memory-corruption.

A bit oddly, I found no crashes 4 lines above, at:

      if (aPersist && !key->mSecondsStamp) {

but that can be explained by aPersist being false on calls to gUM and enumerateDevices. The only time it would be true is on first-time gUM success on a new site, which I imagine is relatively rare.

Ideas welcome.

[1] https://hg.mozilla.org/releases/mozilla-release/annotate/327e081221b0/dom/media/systemservices/MediaParent.cpp#l83
(In reply to Randell Jesup [:jesup] from comment #11)
> i see no crashes in 53 or 54 in the last week, and no crashes with
> GetPrincipalKey() on the stack.  So I presume that bug 1320170 may have
> actually fixed this.  However... even beta isn't showing many crashes (very
> few); the spike is all in release when 51.0.1 went out to users.  The volume
> of testing in aurora/53 and 54 may be too low to show a signal.

Note that we had crashes in the 2/9 and 2/16 builds of 52 beta.  None since.  and bug 1320170 was NOT uplifted to 52, since it didn't affect 52.  

baku?  jib? more thoughts?
Flags: needinfo?(jib)
Bug 1320170 caused a lot of the deviceId machinery to stop working (bug 1340163), so it's possible it's concealing the problem, and that crashes will come back once bug 1340163 lands. We also can't consider uplifting bug 1320170 to 52 until then, if we're thinking about doing that.

It's also possible that bug 1320170 either truly replaced broken code that was causing this crash somehow, or that it just moved stuff around enough to make it not reproduce (if this is some kind of memory trashing problem).

Hard to tell, but it'll be informative to see if the crashes return once bug 1340163 lands (it currently seems blocked by a lot of rework of principals in bug 1340710).
Flags: needinfo?(jib)
I landed bug 1340163. Can we test this again?
Flags: needinfo?(amarchesini)
Mass wontfix for bugs affecting firefox 52.
No crashes on builds after 2017-3-23 except on 52 and 52esr
We should consider uplifting bug 1340163 to 52esr.
Flags: needinfo?(ryanvm)
Flags: needinfo?(amarchesini)
Flags: needinfo?(amarchesini)
ESR52 is affected by this. Are we going to backport it there?
Flags: needinfo?(madperson)
Flags: needinfo?(madperson) → needinfo?(amarchesini)
I marked the patches of bug 1340163 for be uplifted to 52ESR.
Flags: needinfo?(amarchesini)
Flags: needinfo?(ryanvm)
Whiteboard: [adv-main53+][adv-esr52.1+]
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Group: media-core-security → core-security-release
Assignee: nobody → amarchesini
Target Milestone: --- → mozilla55
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.