Closed Bug 1373934 Opened 3 years ago Closed 2 years ago
Crash in mozilla::detail::Hash
This bug was filed from the Socorro interface and is report bp-52121e4c-b5be-4b5d-b058-196920170617. ============================================================= Crashing Thread (12) Frame Module Signature Source 0 xul.dll mozilla::detail::HashKnownLength<unsigned char>(unsigned char const*, unsigned __int64) obj-firefox/dist/include/mozilla/HashFunctions.h:231 1 xul.dll js::AtomizeChars<unsigned char>(JSContext*, unsigned char const*, unsigned __int64, js::PinningBehavior) js/src/jsatom.cpp:499 2 xul.dll js::XDRAtom<1>(js::XDRState<1>*, JS::MutableHandle<JSAtom*>) js/src/jsatom.cpp:649 3 xul.dll js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) js/src/jsscript.cpp:685 4 xul.dll js::XDRInterpretedFunction<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::MutableHandle<JSFunction*>) js/src/jsfun.cpp:668 5 xul.dll js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) js/src/jsscript.cpp:870 6 xul.dll js::XDRInterpretedFunction<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::MutableHandle<JSFunction*>) js/src/jsfun.cpp:668 7 xul.dll js::XDRScript<1>(js::XDRState<1>*, JS::Handle<js::Scope*>, JS::Handle<js::ScriptSourceObject*>, JS::Handle<JSFunction*>, JS::MutableHandle<JSScript*>) js/src/jsscript.cpp:870 8 xul.dll js::XDRState<1>::codeScript(JS::MutableHandle<JSScript*>) js/src/vm/Xdr.cpp:177 9 xul.dll js::MultiScriptsDecodeTask::parse(JSContext*) js/src/vm/HelperThreads.cpp:492 10 xul.dll js::HelperThread::handleParseWorkload(js::AutoLockHelperThreadState&) js/src/vm/HelperThreads.cpp:1857 11 xul.dll js::HelperThread::threadLoop() js/src/vm/HelperThreads.cpp:2119 12 xul.dll js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) js/src/threading/Thread.h:227 13 ucrtbase.dll o__realloc_base 14 kernel32.dll BaseThreadInitThunk 15 ntdll.dll RtlUserThreadStart 16 kernelbase.dll GetLegacyComposition reports with this signature and js::AtomizeChars in the stack are regressing in firefox 55 (possibly related to 654190). https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3Adetail%3A%3AHashKnownLength%3CT%3E&proto_signature=~js%3A%3AAtomizeChars%3CT%3E&date=%3E%3D2017-03-17T12%3A36%3A49.000Z#crash-reports
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>]
Shu, can you help find someone to investigate these crashes? Looking at just the [@ js::AtomizeChars<T> ] signature, this is the #11 topcrash on beta 55.
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #1) > Shu, can you help find someone to investigate these crashes? Looking at just > the [@ js::AtomizeChars<T> ] signature, this is the #11 topcrash on beta 55. nbp seems to know what's going on here.
Flags: needinfo?(shu) → needinfo?(nicolas.b.pierron)
(In reply to [:philipp] from comment #0) > reports with this signature and js::AtomizeChars in the stack are regressing > in firefox 55 (possibly related to 654190). That a good call, I will check if I can make an XDR test case based on Bug 654190 optimization.
Flags: needinfo?(nicolas.b.pierron) → needinfo?(evilpies)
Apparently evilpie did not managed to get the parser to generate any of the integer strings while making Bug 654190. The issue might be the we encode the bytecode after the execution, which would record the mutated state of JSOP_OBJECT object of run-once scripts. The compartment option singleton-as-templates was made to avoid this issue, by converting JSOP_OBJECT into a cloning function, in order to keep the original object un-mutated, and thus saving the unmutated JSOP_OBJECT. What might happen, is that we are encoding an object which got a computed string property, which is formatted into an integer string, and this integer string get serialized by XDR and fails while decoding. We should either: - encode the bytecode ahead of time. - use incremental encoding. - set the singleton-as-template flag on the compartments.  http://searchfox.org/mozilla-central/source/js/src/jsapi.h#2499-2502
Flags: needinfo?(evilpies) → needinfo?(kmaglione+bmo)
This signature spiked when enabling the JS start-up bytecode cache (Bug 900784) on a small portion of nightly users, i-e without the MultiScriptsDecodeTask. Also, this happens both on the main thread as well as on the helper thread.
(In reply to Nicolas B. Pierron [:nbp] from comment #5) > This signature spiked when enabling the JS start-up bytecode cache (Bug > 900784) on a small portion of nightly users, i-e without the > MultiScriptsDecodeTask. Also, this happens both on the main thread as well > as on the helper thread. This does not seems to be related to the JSBC at the moment, as there is only a single report from nightly, and most of the remaining crashes are on beta.
Some of these beta crashes are marked as high exploitability in crash-stats. Hiding the bug for now till someone can follow up.
This is fixed on trunk now I believe? Are we planning to ask for backport to 55 at this point in the cycle (one beta & RC build left)?
this is still ongoing in 55.0b13 after the patch from bug 1382329 has landed.
I will notice that HashKnownLength has many, many different callers in all these crash reports, even if most of them seems to be from AtomizeChars called from XDR. https://crash-stats.mozilla.com/report/index/578db4d4-4316-426c-86fd-d9dbf0170731 https://crash-stats.mozilla.com/report/index/6e082df6-7abd-474c-bf5d-002d50170728 Looking at the crash addresses, it sounds like either we provide a bad pointer, or we provide a bad length argument. Then we attempt to hash some random things, and SEGV at a page boundary. These could be explained by some corrupted data in the XDR buffer, or a non-reversible encoding. In any case, knowing that we don't know anything about this issue, and that this issue appears mostly on beta, investigating this issue will require multiple cycles. I will recommend to ask :kmag to disable this feature for the last beta, and start investigating these issues in 56.0b*.
Julien, is it too late to disable since we're in RC-land? (See Comment 10)
What feature would we disable? The last beta for 55 was a week ago...
Flags: needinfo?(jcristau) → needinfo?(nicolas.b.pierron)
(In reply to Julien Cristau [:jcristau] (away until Aug 21) from comment #12) > What feature would we disable? The last beta for 55 was a week ago... I have strictly no idea, and I do not even know if there is a feature flag. The feature is the MultiScript decoder which is currently used for loading Firefox resource files, from what I understand.
Crashes are still happening in builds that contain the fix for bug 1382329 :(
Given that this is sec-moderate & we are only 1 week from RC, mark 56 won't fix.
Commenting for Naveed. Since Firefox 55 made it to release we seems to have more than ~600 crashes per day, which are at 80% start-up crashes, and which are reproduced by the same users (same install time) over and over. So this crash has the potential of preventing users from running firefox once and for all. Unfortunately this signature does not sounds actionable. At the moment, our best chance for us would be to fix Bug 1401939, and hope we can get rid of these crashes too.
Clearing out my old needinfos. I don't really have anything to add to comment 17. This is mostly blocked on bug 1402167 at this point.
Track 58- as there are no actions at this point. Feel free to nominate again if it's getting worse.
Assigning this to myself because it seems similar to XDR-related corruption I'm investigating.
Assignee: nobody → tcampbell
Depends on: 1418894
Some of these crashes remain after Bug 1418894. The crash is due to a bad length field from XDR, causing use to keep reading string data and walk off the end of the XDR buffer triggering a fault. If this crash rate is a problem, we can add further mitigations to XDRBuffer::read / XDRState::codeBytes to check bounds before calling memcpy. Bug 1423616 is the work to address root-causes.
Depends on: 1423616
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>]
Crash Signature: [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>] → [@ mozilla::detail::HashKnownLength<T>] [@ js::AtomizeChars<T>] [@ JSAtom* js::AtomizeChars<T>] [@ js::XDRAtom<T>]
the volume of these signatures has significantly gone down in 59.0b12 - presumably with the help of bug 1438645 \o/
I'm curious how any of these are still getting through.. e.g. https://crash-stats.mozilla.com/report/index/9a5e9883-f192-4f14-b58b-fb2730180304
In the remaining crashes, the XDR buffer is valid and in range but we are definitely sharing someone's memory. In the above crash, the memory map is as follows: // AllocationBase, BaseAddress, RegionSize, State+Protect 00000c9b0000 00000c9b0000 (000fd000) (RESERVED+RW) 00000caad000 (00001000) (COMMITED+RW+PAGE_GUARD) 00000caae000 (00002000) (COMMITED+RW) 00000cab0000 00000cab0000 (00484000) (COMMITED+RO) The XDR buffer is 0x0ccad7e4..0x0ccb3870, and the data we are about to XDR is in range. This seems to cross from one VirtualAlloc chunk into another. The violation we hit is a page protect rather than unmapped pages we saw in the majority of these crashes (that is fixed).
Oops.. I read that wrong. Those first three pagemap entries are unrelated. This is all contained in the last one. What is odd that the AtomizeChars call is giving a ptr of 0x0ccaffff (which should also be readonly), but the fault happens on the second byte 0x0ccb0000.
Actually, these are all disk faults during OS page fault handling. The user has a bad disk so when the OS tries to page in an mmap page on-demand from the preload cache disk file, they trigger a fault and the browser goes down. Unfortunately they will keep crashing if the disk sector is bad.
Bug 1422087 will purge startup cache after crashes which should remove these remaining failures.
Depends on: 1422087
Just to confirm the disk failure hypothesis, could we have a "commitXDRBufferPages" functions, which just force the pages to be loaded in memory before resuming within XDR decoding? If this hypothesis is correct, all the signature should move to this commitXDRBufferPages function. Then we could potentially catch it with some RAII only used within this commitXDRBufferPages, and a SEGV handler to fallback nicely to loading the missing files if we are unable to load everything from the XDR file.
I think the EXCEPTION_IN_PAGE_ERROR_READ / STATUS_DEVICE_DATA_ERROR error is pretty clear that this is happening. The STATUS_DEVICE_DATA_ERROR is the file system error that Breakpad extracted (See ). What you propose sounds a little to heavyweight. I'm in favor of Bug 1422087 purging the startup cache so that the user doesn't get into a loop of these crashes. That will probably reduce crash rate. If we are still care after, we should approach this as a general problem across browser for anyone who mmaps files. It might be argued that once hitting disk corruption like this, some other important profile I/O will fail too if we didn't crash here. Query  shows there are a few other sources that run into these crashes too.  https://searchfox.org/mozilla-central/rev/bffd3e0225b65943364be721881470590b9377c1/toolkit/crashreporter/google-breakpad/src/processor/minidump_processor.cc#1089-1097  https://crash-stats.mozilla.com/search/?reason=~EXCEPTION_IN_PAGE_ERROR_READ&product=Firefox&date=%3E%3D2018-02-27T04%3A18%3A43.000Z&date=%3C2018-03-06T04%3A18%3A43.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Remaining crashes are disk problems. Tracking that general concern in Bug 1444442. We can close out this security bug now :)
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.