Crash in [@ js::NewStringCopyNDontDeflateNonStaticValidLength<T>] with ProxyAutoConfig
Categories
(Core :: Networking, defect, P1)
Tracking
()
People
(Reporter: aryx, Assigned: valentin)
References
Details
(Keywords: crash, csectype-uaf, sec-moderate, Whiteboard: [necko-triaged][necko-priority-review])
Crash Data
Attachments
(1 file)
67 crashes from 30 installations of Firefox 102.0esr. There are isolated reports for the non-esr 102.0 and earlier versions.
Most reports are for bcb.gov.br sites (several subdomains).
Crash report: https://crash-stats.mozilla.org/report/index/f2c6e8be-558a-44fc-a5b3-49dd10220718
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll js::NewStringCopyNDontDeflateNonStaticValidLength<js::CanGC, unsigned char> js/src/vm/StringType.cpp:1651
1 xul.dll JS_NewStringCopyZ js/src/jsapi.cpp:2968
2 xul.dll mozilla::net::PACDnsResolve netwerk/base/ProxyAutoConfig.cpp:326
3 xul.dll Interpret js/src/vm/Interpreter.cpp:3314
4 xul.dll js::RunScript js/src/vm/Interpreter.cpp:389
5 xul.dll js::ExecuteKernel js/src/vm/Interpreter.cpp:781
6 xul.dll js::Execute js/src/vm/Interpreter.cpp:813
7 xul.dll ExecuteScript js/src/vm/CompilationAndEvaluation.cpp:509
8 xul.dll JS_ExecuteScript js/src/vm/CompilationAndEvaluation.cpp:533
9 xul.dll mozilla::net::ProxyAutoConfig::SetupJS netwerk/base/ProxyAutoConfig.cpp:584
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 1•2 years ago
|
||
These signatures might be both new variants of [@ mozilla::net::PACDnsResolve]
which already affected Firefox 91esr.
Comment 2•2 years ago
|
||
Kershaw, is this something you can take a look at?
Comment 3•2 years ago
|
||
I found two crash reports that contains the poison value (0xe5e5)
- ffedbfa5-488c-446e-8c08-71a240220722
- 71ef8913-0c9d-486a-b599-78a1f0220724
Looks like the JSContext is already released, so this might be a UAF.
Updated•2 years ago
|
Comment 4•2 years ago
|
||
Calling this sec-moderate on the theory that most people don't use a PAC script, and those that do generally get them from trustworthy sources like their enterprise admins. But a UAF here is not great.
Assignee | ||
Comment 5•2 years ago
|
||
Looking at a dump in this report, in js::ErrorToException it seems errorNumber=JSMSG_BAD_BYTECODE
Not sure exactly what that means. Maybe Nicholas has an idea if this is really a UAF or not.
My first thought was that maybe we are destroying the context in ProxyAutoConfig::Shutdown while another thread is using it in ProxyAutoConfig::SetupJS but right now I don't see how that is actually possible.
Comment 6•2 years ago
|
||
From experience, I have never seen JSMSG_BAD_BYTECODE
before.
This code is produced by the C++ interpreter, which is the first execution stage to read the bytecode. It would be reported if some unknown opcode appears in the bytecode. This can happen if the bytecode gets corrupted, or if the JS engine fails to produce jumps to the proper offset and that an immediate value gets interpreted as an opcode. The last case is unlikely as debug builds have plenty of checks to verify the coherency of our jump-targets.
Another option would be if the bytecode comes from a previous version of Firefox. Caching the bytecode without checking for the version number could cause some opcode to have shifted. All our deserialization path should include a version check, but this might still happen with twilight builds of Firefox (= building it your-self, testing, caching bytecode, rebasing, rebuilding, loading old bytecode from the cache).
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 7•2 years ago
|
||
The most recent crash report https://crash-stats.mozilla.org/report/index/67ac0582-1540-4a06-b8c8-534520220809#allthreads doesn't have any networking signatures on the stack.
It's actually crashing when initializing the main thread JS context here
Crashing Thread (0), Name: MainThread
Frame Module Signature Source Trust
0 xul.dll js::NewStringCopyNDontDeflateNonStaticValidLength<js::CanGC, unsigned char>(JSContext*, unsigned char const*, unsigned int, js::gc::InitialHeap) js/src/vm/StringType.cpp:1648 context
1 xul.dll js::NewStringCopyUTF8N(JSContext*, const JS::UTF8Chars, js::gc::InitialHeap) js/src/vm/StringType.cpp:1790 cfi
2 xul.dll js::NewStringCopyUTF8Z(JSContext*, const JS::ConstUTF8CharsZ, js::gc::InitialHeap) js/src/vm/StringType.h:1455 cfi
3 xul.dll js::ErrorToException(JSContext*, JSErrorReport*, JSErrorFormatString const* (*)(void*, const unsigned int), void*) js/src/jsexn.cpp:320 cfi
4 xul.dll js::ReportErrorNumberVA(JSContext*, js::IsWarning, JSErrorFormatString const* (*)(void*, const unsigned int), void*, const unsigned int, js::ErrorArgumentsType, char*) js/src/vm/ErrorReporting.cpp:484 cfi
5 xul.dll JS_ReportErrorNumberASCII(JSContext*, JSErrorFormatString const* (*)(void*, const unsigned int), void*, const unsigned int, <NoType>) js/src/jsapi.cpp:3550 cfi
6 xul.dll js::ReportAllocationOverflow(JSContext*) js/src/vm/JSContext.cpp:373 cfi
7 xul.dll js::TempAllocPolicy::reportAllocOverflow() const js/src/util/AllocPolicy.cpp:20 cfi
8 xul.dll mozilla::Vector<char16_t, 32, js::StringBufferAllocPolicy>::growStorageBy(unsigned int) mfbt/Vector.h:996 cfi
9 xul.dll js::StringBuffer::append(char16_t const*, unsigned int) js/src/util/StringBuffer.h:210 cfi
10 xul.dll js::frontend::ParserAtomsTable::appendTo(js::StringBuffer&, js::frontend::TaggedParserAtomIndex) const js/src/frontend/ParserAtom.cpp:1086 cfi
11 xul.dll `anonymous namespace'::NameResolver::nameExpression(js::frontend::ParseNode*, bool*) js/src/frontend/NameFunctions.cpp:137 cfi
12 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
13 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
14 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
15 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
16 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
17 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
18 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
19 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
20 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
21 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
22 xul.dll `anonymous namespace'::NameResolver::visit(js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:482 cfi
23 xul.dll js::frontend::NameFunctions(JSContext*, js::frontend::ParserAtomsTable&, js::frontend::ParseNode*) js/src/frontend/NameFunctions.cpp:490 cfi
24 xul.dll js::frontend::BytecodeEmitter::emitScript(js::frontend::ParseNode*) js/src/frontend/BytecodeEmitter.cpp:2484 cfi
25 xul.dll CompileGlobalScriptToStencilAndMaybeInstantiate<mozilla::Utf8Unit>(JSContext*, js::LifoAlloc&, js::frontend::CompilationInput&, JS::SourceText<mozilla::Utf8Unit>&, js::ScopeKind, mozilla::Variant<mozilla::UniquePtr<js::frontend::ExtensibleCompilationStencil, JS::DeletePolicy<js::frontend::ExtensibleCompilationStencil> >, RefPtr<js::frontend::CompilationStencil>, js::frontend::CompilationGCOutput*>&) js/src/frontend/BytecodeCompiler.cpp:296 cfi
26 xul.dll js::frontend::CompileGlobalScriptToStencil(JSContext*, js::LifoAlloc&, js::frontend::CompilationInput&, JS::SourceText<mozilla::Utf8Unit>&, js::ScopeKind) js/src/frontend/BytecodeCompiler.cpp:376 cfi
27 xul.dll JSRuntime::initSelfHostingStencil(JSContext*, mozilla::Span<const unsigned char, 4294967295>, bool (*)(JSContext*, mozilla::Span<const unsigned char, 4294967295>)) js/src/vm/SelfHosting.cpp:2592 cfi
28 xul.dll JS::InitSelfHostedCode(JSContext*, mozilla::Span<const unsigned char, 4294967295>, bool (*)(JSContext*, mozilla::Span<const unsigned char, 4294967295>)) js/src/vm/Initialization.cpp:224 cfi
29 xul.dll static XPCJSContext::NewXPCJSContext() js/xpconnect/src/XPCJSContext.cpp:1420 cfi
30 xul.dll xpc::InitializeJSContext() js/xpconnect/src/nsXPConnect.cpp:103 cfi
31 xul.dll XREMain::XRE_mainRun() toolkit/xre/nsAppRunner.cpp:5415 cfi
The error number here is JSMSG_ALLOC_OVERFLOW.
The locals at internalEnsureCapacity are:
Name | Value | Type | |
---|---|---|---|
◢ | this | 0x0537e680 {mBegin=0x11213400 <Error reading characters of string.> mLength=0x711e5cc8 mTail={mBytes=...} } | mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy> * |
▶ js::StringBufferAllocPolicy | {impl_={cx_=0x11213400 {runtime_={...} kind_={value=??? } nurserySuppressions_={...} ...} } arenaId_=...} | js::StringBufferAllocPolicy | |
▶ mBegin | 0x11213400 <Error reading characters of string.> | char16_t * | |
mLength | 0x711e5cc8 | unsigned int | |
▶ mTail | {mBytes=0x0537e694 "\x10" } | mozilla::Vector<char16_t,32,js::StringBufferAllocPolicy>::CRAndStorage<32,0> | |
aNeeded | 0x6d6f7250 | unsigned int |
I'd like the JS team to also take a closer look at this, as this specific crash seems independent from necko code.
Assignee | ||
Updated•2 years ago
|
Comment 8•2 years ago
|
||
It's possible there are two different issues here though. The crash in comment 7 is probably some sort of memory corruption when compiling self-hosted code (because an allocation overflow shouldn't happen here, it's always exactly the same code...). This is likely unrelated to the PAC crashes.
@arai we're trying to allocate JS strings for an allocation-overflow exception thrown when compiling self-hosted code. Is this something that will be fixed by your frontend exception handling changes?
Comment 9•2 years ago
|
||
yes, the frontend code will stop allocating GC strings during compilation. we'll have a separate step to convert frontend-specific error (similar to off-thread errors) to runtime error, immediately after the compilation, and we can have dedicate step for self-hosted JS case to somehow report the error without using GC strings.
Comment 10•2 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #0)
67 crashes from 30 installations of Firefox 102.0esr. There are isolated reports for the non-esr 102.0 and earlier versions.
Most reports are for bcb.gov.br sites (several subdomains).
Knowing that this bug is about a government sub-domain and a proxy-auto-config, it sounds like this might be a recipe for targeted attacks, or a corrupted binary distributed to multiple computers.
Comment 11•2 years ago
|
||
One of the concerned I had was about Web Proxy Auto Discovery Protocol, but https://wpad.gov.br/wpad.dat does not seems to exists. This does not mean that an internal equivalent does not exists.
The fact that mozilla::net::PACDnsResolve
crashes would suggest that there is a PAC (Proxy Auto Config) script. (right?)
For the context, a PAC script has the capability of by-passing the DNS resolution.
The over-representation of internal domain names with *.gov.br
suggest something might be wrong, in the installations or configuration of PAC scripts.
On a side-note, the start-up crash mentioned in comment 7 would be in favor of a corrupted Firefox binary. However, if the binary were to be corrupted, we would expect all crashes to be start-up crashes. As these are manipulating self-hosted data, which is backed in the binary, but only 2.4% are reported as start-up crashes.
Everything seems to point towards memory corruptions, but the source of the corruption is unclear. The source of the memory corruption seems to happen at runtime, while parsing script content or while generating the bytecode for the script. These allocations are mostly managed by the LifoAlloc, which relies on the system allocator.
This crash-stat search, isolated to the sub-domain bcb.gov.br
, reports many crashes which are related to SpiderMonkey's Interpreter under the context of the PAC script execution.
Assignee | ||
Comment 12•2 years ago
|
||
Thank you for that evaluation.
The WPAD URL is "http://wpad/wpad.dat"
When resolving wpad
the OS resolver may append the DNS suffix to its name, resulting in wpad.gov.br.
It's entirely possible that either this specific organization in brazil has a PAC script that is triggering a bug/memory corruption, or maybe they have some specific software installed that is causing it?
(In reply to Valentin Gosu [:valentin] (he/him) from comment #5)
My first thought was that maybe we are destroying the context in ProxyAutoConfig::Shutdown while another thread is using it in ProxyAutoConfig::SetupJS but right now I don't see how that is actually possible.
This was my only approach, and we likely wouldn't be able to confirm a fix before it lands on ESR.
(In reply to Tooru Fujisawa [:arai] from comment #9)
yes, the frontend code will stop allocating GC strings during compilation. we'll have a separate step to convert frontend-specific error (similar to off-thread errors) to runtime error, immediately after the compilation, and we can have dedicate step for self-hosted JS case to somehow report the error without using GC strings.
I expect if we do this these crashes would either go away or turn into some other kind of crashes? Do we have a bug for this work?
Comment 13•2 years ago
|
||
Valentin, do you know if there's something that prevents us from calling Shutdown here while we're spinning the event loop here? There's a comment about nsPACMan
being responsible for allowing one PAC execution at a time, I think that's what mInProgress
is for? Maybe we can add some release asserts to the shutdown code?
I still think there are two separate issues here: a PAC issue that affects at least ESR 102 and the one in comment 7 is a one-off memory corruption.
Comment 14•2 years ago
|
||
In ProxyAutoConfig::SetupJS
we should have called SetRunning(this)
, is there any chance that state is being clobbered by the SetRunning
calls in ProxyAutoConfig::GetProxyForURI
?
Assignee | ||
Comment 15•2 years ago
|
||
Those are great leads. I'll take a look.
Comment 16•2 years ago
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #12)
(In reply to Tooru Fujisawa [:arai] from comment #9)
yes, the frontend code will stop allocating GC strings during compilation. we'll have a separate step to convert frontend-specific error (similar to off-thread errors) to runtime error, immediately after the compilation, and we can have dedicate step for self-hosted JS case to somehow report the error without using GC strings.
I expect if we do this these crashes would either go away or turn into some other kind of crashes? Do we have a bug for this work?
it will happen in bug 1785762 or followup bug.
Comment 17•2 years ago
|
||
(In reply to Valentin Gosu [:valentin] (he/him) from comment #12)
(In reply to Tooru Fujisawa [:arai] from comment #9)
yes, the frontend code will stop allocating GC strings during compilation. we'll have a separate step to convert frontend-specific error (similar to off-thread errors) to runtime error, immediately after the compilation, and we can have dedicate step for self-hosted JS case to somehow report the error without using GC strings.
I expect if we do this these crashes would either go away or turn into some other kind of crashes? Do we have a bug for this work?
If this is indeed a memory corruption, then the other kind of crashes would be as unactionable as the current one.
The fact that we have multiple JS signatures showing the bytecode corruption, or corruption in the computation of the bytecode would suggest that the change of the frontend error reporting will just move the error to else-where.
Also, the error reporting is likely to be a consequence of a memory corruption, as it is more likely that a distributed PAC script does not contain any errors preventing its execution.
Comment 18•2 years ago
|
||
I think we should focus on the PAC crashes like the one in comment 0. There are hundreds of crashes with this signature that are similar to that one (crash address 0x420, under PACDnsResolve
) so there's likely a real bug here.
The one in comment 7 is unrelated. The UAF crash in comment 5 might be a different symptom of the PAC crashes or could also be unrelated.
Assignee | ||
Comment 19•2 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #14)
In
ProxyAutoConfig::SetupJS
we should have calledSetRunning(this)
, is there any chance that state is being clobbered by theSetRunning
calls inProxyAutoConfig::GetProxyForURI
?
As far as I can tell this should not be possible. SetupJS should run before GetProxyForURI, or be called by it. So by the time we get to here there shouldn't be any JS running. But you're right, I'll add an assertion anyway.
(In reply to Jan de Mooij [:jandem] from comment #13)
Valentin, do you know if there's something that prevents us from calling Shutdown here while we're spinning the event loop here? There's a comment about
nsPACMan
being responsible for allowing one PAC execution at a time, I think that's whatmInProgress
is for? Maybe we can add some release asserts to the shutdown code?
I've been looking at this question - if anything were to call Shutdown while we're running a PAC script we should hit this condition here, and it shouldn't crash.
I'll try to add some assertions to see if we can rule out these scenarios.
Assignee | ||
Comment 20•2 years ago
|
||
Assignee | ||
Comment 21•2 years ago
|
||
We need to wait for this patch to go to beta and see if any new crashes happen.
Reporter | ||
Comment 22•2 years ago
|
||
Add diagnostic asserts that ProxyAutoConfig::SetRunning is called at the correct times r=necko-reviewers,kershaw
https://hg.mozilla.org/integration/autoland/rev/66820b92dcc7f26a2aaa25321604fa7d6547c21a
https://hg.mozilla.org/mozilla-central/rev/66820b92dcc7
Assignee | ||
Comment 23•2 years ago
|
||
106 was just released. No assertions or crashes on Nightly or Beta.
Let's wait for 2-3 more weeks. If no crashes we could uplift to ESR.
Assignee | ||
Comment 24•2 years ago
|
||
I think this check might have fixed the issues. Based on the previous crash rate, I would have expected at least one report on 106 until now - since there are none, maybe it's time to uplift and confirm.
Assignee | ||
Comment 25•2 years ago
|
||
Comment on attachment 9294417 [details]
Bug 1780094 - Add diagnostic asserts that ProxyAutoConfig::SetRunning is called at the correct times r=#necko
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Potential UAF.
- User impact if declined: Crash or security vulnerability
- Fix Landed on Version: 106
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): We now return an error if there's still a running script happening, instead of deleting the JS context.
Comment 26•2 years ago
|
||
:valentin diagnostic asserts are only enabled nightly and early beta.
Wondering about this ESR uplift request?
Assignee | ||
Comment 27•2 years ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #26)
:valentin diagnostic asserts are only enabled nightly and early beta.
Wondering about this ESR uplift request?
Apart from the diagnostic assert, it also adds an extra check:
if (GetRunning()) {
return NS_ERROR_ALREADY_INITIALIZED;
}
This is the only line that is actually changing behaviour, and if we actually find ourselves in that condition we really want to return early, otherwise we crash.
Uplifting to ESR would confirm if this is the proper fix, or if the crash rate on release is simply too low. In either case it's worth doing I think.
Updated•1 year ago
|
Comment 28•1 year ago
|
||
Comment on attachment 9294417 [details]
Bug 1780094 - Add diagnostic asserts that ProxyAutoConfig::SetRunning is called at the correct times r=#necko
Approved for 102.6esr.
Comment 29•1 year ago
|
||
uplift |
Comment on attachment 9294417 [details]
Bug 1780094 - Add diagnostic asserts that ProxyAutoConfig::SetRunning is called at the correct times r=#necko
Landed on ESR102. Removing the approval flag to get this off the needs-uplift radar.
https://hg.mozilla.org/releases/mozilla-esr102/rev/44a89d647667
Assignee | ||
Comment 30•1 year ago
|
||
Firefox 102.6 was released on 2022-12-13 and there seem to be no more PAC related crashes since then.
I think we can consider this FIXED.
A big thank you to everyone who helped resolve this one.
The one crash with this signature in 102.6 is https://crash-stats.mozilla.org/report/index/4bfc8c74-2671-4e02-9909-8f57d0221220 which does not seem PAC related at all. We might want to spin up a separate bug for that.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Description
•