Closed Bug 1405843 Opened 3 years ago Closed 3 months ago

Crash in js::jit::RValueAllocation::layoutFromMode

Categories

(Core :: JavaScript Engine: JIT, defect, P3, critical)

57 Branch
All
Windows
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- fix-optional
firefox65 --- fix-optional
firefox66 --- fix-optional

People

(Reporter: philipp, Assigned: nbp)

References

Details

(Keywords: crash, regression, Whiteboard: [#jsapi:crashes-retriage])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-077f5c9e-da1d-423b-b87d-eb8f80171004.
=============================================================
Crashing Thread (6)
Frame 	Module 	Signature 	Source
0 	xul.dll 	js::jit::RValueAllocation::layoutFromMode(js::jit::RValueAllocation::Mode) 	js/src/jit/Snapshots.cpp:269
1 	xul.dll 	js::jit::CodeGeneratorShared::encode(js::jit::LSnapshot*) 	js/src/jit/shared/CodeGenerator-shared.cpp:620
2 	xul.dll 	js::jit::CodeGenerator::visitOsiPoint(js::jit::LOsiPoint*) 	js/src/jit/CodeGenerator.cpp:2799
3 	xul.dll 	js::jit::LOsiPoint::accept(js::jit::LElementVisitor*) 	js/src/jit/shared/LIR-shared.h:75
4 	xul.dll 	js::jit::CodeGenerator::generateBody() 	js/src/jit/CodeGenerator.cpp:5471
5 	xul.dll 	js::jit::CodeGenerator::generate() 	js/src/jit/CodeGenerator.cpp:9746
6 	xul.dll 	js::jit::GenerateCode(js::jit::MIRGenerator*, js::jit::LIRGraph*) 	js/src/jit/Ion.cpp:1948
7 	xul.dll 	js::HelperThread::handleIonWorkload(js::AutoLockHelperThreadState&) 	js/src/vm/HelperThreads.cpp:1886
8 	xul.dll 	js::HelperThread::threadLoop() 	js/src/vm/HelperThreads.cpp:2259
9 	xul.dll 	js::HelperThread::ThreadMain(void*) 	js/src/vm/HelperThreads.cpp:1795
10 	xul.dll 	js::detail::ThreadTrampoline<void (&)(void*), js::HelperThread*>::Start(void*) 	js/src/threading/Thread.h:227
11 	ucrtbase.dll 	_o__CIpow 	
12 	kernel32.dll 	BaseThreadInitThunk 	
13 	mozglue.dll 	patched_BaseThreadInitThunk 	mozglue/build/WindowsDllBlocklist.cpp:824
14 	ntdll.dll 	__RtlUserThreadStart 	
15 	ntdll.dll 	_RtlUserThreadStart

crash reports with this signature are rising during the 57.0b cycle on windows. they are all annotated with MOZ_CRASH(Wrong mode type?)...
If there is a bug here, then it must be isolated under the CodeGeneratorShared::encodeAllocation function.
This function is responsible for creating, and encoding the RValueAllocation.

As far as I can read in the source code, there is literally no way that the CodeGeneratorShared::encodeAllocation function can create a non-initialized RValueAllocation.  All the branches are ending with "alloc = RValueAllocation::…".

Then, the encoding and decoding of these functions, are well tested and covered with:
  http://searchfox.org/mozilla-central/source/js/src/jsapi-tests/testJitRValueAlloc.cpp

The only way to get a non-valid mode_ value in the switch case which is being reported here, is if one of the branches of CodeGeneratorShared::encodeAllocation is being miss-compiled by our compilers.

As far as I know, the only recent change which happened recently in this code is:
  https://hg.mozilla.org/mozilla-central/rev/68444a862226

And this patch cannot explain these issues.

I will note that all Firefox 57 / 58 crashes are only on Windows, and the earliest 57.0a1 started with the build-id 20170819100442.
So Bug 1388014 might be the source of the regression.  Strangely enough there is no report at all from 57.0b1 and 57.0b2.

I would have blamed PGO, but that would not explain the 57.0a1 crashes.
Maybe there is another compiler optimization which causes this crashes on some builds.
Priority: -- → P3
I don't see any reports in 60?
Attempt to get more information for Bug 1405843:
  1. Assert/Crash if the payload check is incorrect, as this is the only code
     path which can give an Invalid RValueAllocation in case of a register
     allocator issue or a memory corruption.
  2. Assert that the RValueAllocation is never invalid.
  3. Dump whatever content manage to gothrough this set of impossibilities.
Attachment #8954830 - Flags: review?(jdemooij)
Comment on attachment 8954830 [details] [diff] [review]
JIT RValueAllocation: The impossible happened.

This patch is adding a MOZ_CRASH_UNSAFE_PRINTF to an assertion which is currently being hit on crash-stat.

The value reported by this assertion is masked by all its callers with the 0x17f mask, limiting the leakage of any data.

This patch also add assertions ahead in order to catch the issue sooner without ever hitting this MOZ_CRASH.
Attachment #8954830 - Flags: review?(francois)
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Could you please fill out the data request form at https://github.com/mozilla/data-review/blob/master/request.md? You can attach it to this bug as a text file and r? me on it.

I will take care of doing the data review (i.e. https://wiki.mozilla.org/Firefox/Data_Collection#Step_2:_Request_is_reviewed) then.
Attachment #8954830 - Flags: review?(jdemooij) → review+
1. What questions will you answer with this data?

I am trying to answer if we ever generate any Invalid mode value, and knowing which value got generated.

2. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements? Some example responses:

Improve the knowledge of this crash.

3. What alternative methods did you consider to answer these questions? Why were they not sufficient?

Guarding this MOZ_CRASH_PRINTF with other assertion, as already present in the same patch, which should filter out any other known values, or reasons which might cause the previous assertion to fail.

4. Can current instrumentation answer these questions?

Yes, by displaying the mode_ field.

5. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the found on the Mozilla wiki.

  Measurement Description 	Data Collection Category 	Tracking Bug #
	MOZ_CRASH                    crash-stat                  Bug 1405843	

6. How long will this data be collected? Choose one of the following:

I want this data to be collected for 6 months initially (potentially renewable).
(and remove it once this bug is closed)

7. What populations will you measure?

All users crashing with this assertions, i.e. ~60 users per days (based on current crash-rate)

8. If this data collection is default on, what is the opt-out mechanism for users?

It is opt-in to report crashes.

9. Please provide a general description of how you will analyze this data.

One question is: is it a constant across all crashes, or a random number.
If this is a constant, then dig in to figure what might produce this constant.

10. Where do you intend to share the results of your analysis?

Here, Bug 1405843.
1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, in comment 6.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, crash reporting setting.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Nicolas will monitor for 6 months and either renew or remove at this point.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 4.

5) Is the data collection request for default-on or default-off?

Default-off.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

Yes, it will happen here in bug 1405843.
Comment on attachment 8954830 [details] [diff] [review]
JIT RValueAllocation: The impossible happened.

datareview+
Attachment #8954830 - Flags: review?(francois)
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/42237b8f4247
JIT RValueAllocation::Mode: Add more assertions. r=jandem datareview=francois
(leave-open: landing diagnostic assertions)
Keywords: leave-open
(In reply to Nicolas B. Pierron [:nbp] {backlog: ~36} from comment #3)
> Attempt to get more information for Bug 1405843:
>   1. Assert/Crash if the payload check is incorrect, as this is the only code
>      path which can give an Invalid RValueAllocation in case of a register
>      allocator issue or a memory corruption.

Recent beta crashes[1] seems to point to this hypothesis.

This implies that either we got a corrupted payload kind, or we got a payload which is still of kind USE.

[1] bp-912851ee-8bfb-46ac-af16-899ad0180314 : MOZ_CRASH("Unexpected payload type.");
Flags: needinfo?(nicolas.b.pierron)
Crash Signature: [@ js::jit::RValueAllocation::layoutFromMode] → [@ js::jit::RValueAllocation::layoutFromMode] [@ js::jit::CodeGeneratorShared::encodeAllocation ]
Still seeing moderate crash volume on beta 62, any luck looking at the new crash data you mention in comment 12?
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #13)
> Still seeing moderate crash volume on beta 62, any luck looking at the new
> crash data you mention in comment 12?

I have not. The crash report mentioned in comment 12 suggests that this is either a memory corruption, or a problem in our register allocator, which fails to allocate for a given case.

One thing I could do is landing more MOZ_CRASH_UNSAFE_PRINTF to help distinguish between the 2 hypothesis. The first one is unlikely to be actionable unless we can trace back the corrupting source, and the second one unlikely to be actionable either unless we get a test case to reproduce this issue.
Fairly low-volume crash, has a priority set, may not be actionable.
Marking fix-optional to remove this from regression triage. 
Happy to still take a patch in nightly.
See Also: → 1541817
Flags: needinfo?(nicolas.b.pierron)
Keywords: stalled
Whiteboard: [#jsapi:crashes-retriage]

The leave-open keyword is there and there is no activity for 6 months.
:nbp, maybe it's time to close this bug?

Flags: needinfo?(nicolas.b.pierron)

I will close this bug as Incomplete, as this is likely an untraceable memory corruption which causes random value to appear, as detailed in comment 14.

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Flags: needinfo?(nicolas.b.pierron)
Resolution: --- → INCOMPLETE

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
You need to log in before you can comment on or make changes to this bug.