Intermittent browser_net_statistics-02.js,browser_net_sort-02.js,browser_net_api-calls.js,browser_net_icon-preview.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]

RESOLVED FIXED in mozilla60

Status

()

defect
P2
critical
RESOLVED FIXED
2 years ago
7 months ago

People

(Reporter: intermittent-bug-filer, Assigned: sfink)

Tracking

(Blocks 1 bug, {crash, intermittent-failure})

unspecified
mozilla60
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 wontfix)

Details

(Whiteboard: [stockwell unknown], crash signature)

Attachments

(2 attachments, 2 obsolete attachments)

Comment hidden (Intermittent Failures Robot)
Component: Developer Tools: Netmonitor → JavaScript: GC
Product: Firefox → Core
Ryan, is there somebody who could bisect down the source of the revision on Treeherder? Thanks.
Blocks: 719114
Flags: needinfo?(ryanvm)
Maybe one of our intrepid stockwell folks can :)
Flags: needinfo?(ryanvm) → needinfo?(jmaher)
This started on autoland on May 18. It is a very frequent intermittent on Windows Debug e10s only.

https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=windows%20debug%20dt5&tochange=d18aa0ba5171b293f46fc5f095859d960a55c303&fromchange=249246e433dc06b270d0e5e5cfb446b97e879fe4

suggests this may have been triggered by 1365635.

:rickychien - can you have a look?
Flags: needinfo?(jmaher)
Flags: needinfo?(rchien)
Whiteboard: [stockwell needswork]
On the one hand it is surprising that a JS-only change could cause a GC crash. On the other hand, it was a change to netmonitor and the crashes are in netmonitor tests.
See Also: → 1366153
Comment hidden (Intermittent Failures Robot)
Assignee: nobody → rchien
Status: NEW → ASSIGNED
Flags: needinfo?(rchien)
Comment hidden (mozreview-request)
Honza, I'm not sure why bug 1365635 only caused this weird GC crash on win32 debug. I can't figure it out how that happens. However, most of platforms looks like irrelevant. I'd prefer to disable it for win32 debug.
I agree we should just disable the test on win32 and leave JS team to investigate further.
Keywords: leave-open
(In reply to Ricky Chien [:rickychien] from comment #9)
> Honza, I'm not sure why bug 1365635 only caused this weird GC crash on win32
> debug. I can't figure it out how that happens. However, most of platforms
> looks like irrelevant. I'd prefer to disable it for win32 debug.

A similar crash is also happening on Win64 (bug 1366296) and OSX (bug 1366153).
See Also: → 1366296
See Also: → 1365564
Comment hidden (Intermittent Failures Robot)
Attachment #8869671 - Attachment is obsolete: true
Attachment #8869671 - Flags: review?(odvarko)
After having a discussion with performance team engineers, we believe this might be a JS engine or memory relevant issue but we hit unluckily. We are going to leave it open and let us deal with it in bug 1366296.
Assignee: rchien → nobody
Status: ASSIGNED → NEW
(In reply to Ricky Chien [:rickychien] from comment #13)
> After having a discussion with performance team engineers, we believe this
> might be a JS engine or memory relevant issue but we hit unluckily. We are
> going to leave it open and let us deal with it in bug 1366296.

It would still probably be a good idea to disable this test on Win32 debug given how frequently this crash is happening, just to improve the orange factor.
Comment hidden (Intermittent Failures Robot)
after May 24th, this has reduced significantly (12 failures in 2 weeks):
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1366083&startday=2017-05-24&endday=2017-06-07&tree=all
Whiteboard: [stockwell needswork] → [stockwell unknown]
Summary: Intermittent browser_net_statistics-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)] → Intermittent browser_net_statistics-02.js,browser_net_sort-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Summary: Intermittent browser_net_statistics-02.js,browser_net_sort-02.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)] → Intermittent browser_net_statistics-02.js,browser_net_sort-02.js,browser_net_api-calls.js,browser_net_icon-preview.js | application crashed [@ js::GCMarker::processMarkStackTop(js::SliceBudget &)]
Duplicate of this bug: 1386875
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
this seems to have picked up a bit lately, not enough to make a stockwell bug though
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Bulk priority update of open intermittent test failure bugs. 

P3 => P5

https://bugzilla.mozilla.org/show_bug.cgi?id=1381960
Priority: P3 → P5
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Steve, would you mind taking a look at this intermittent failure, just enough to make sure we're not overlooking an obvious clue?

Could be lurking JSAPI misuse somewhere...
Flags: needinfo?(sphink)
Assignee

Comment 29

2 years ago
We're marking a value array, and one of the values is a nullptr JSObject*. The value array is probably the containing object's slots, but it might also be an Array.

I've begin seeing other instances recently of null object values, including ones in release crashes. So it's pretty great that we have these intermittents.

I'm not sure how these null JSObject Values might be introduced. Let's see...

In C++, these should mostly flow through Value::setObject(JSObject&). Which is a little worrisome, because although there are various asserts that the reference is valid, dereferencing a nullptr is UB and so I could imagine the compiler discarding all the null checks both before and after the setObject call. I haven't looked at the disassembly to check, though.

Other sources of Values I can think of are memcpy and the JIT.

This is a hot path, but this seems like it has enough value to warrant some diagnostics. Dumping out the containing JSObject seems like it could be very revealing -- assuming the dumping doesn't dereference the same null pointer, anyway. And it won't tell us what wrote that Value in, but it might make it easier to track down.
Flags: needinfo?(sphink)
Assignee

Comment 30

2 years ago
I'm going to upgrade the priority of this bug because I am hopeful that it might uncover a more serious problem that is dogging us right now, and for whatever reason, the work is happening here.
Priority: P5 → P2
Assignee

Comment 31

2 years ago
Oh, I thought I'd submitted this already, but I was waiting on a try run. Good news is that it doesn't seem to have broken anything. Bad news is that it didn't break anything and find bugs.
Attachment #8928768 - Flags: review?(jcoppeard)
Assignee

Updated

2 years ago
Assignee: nobody → sphink
Status: NEW → ASSIGNED
Comment on attachment 8928768 [details] [diff] [review]
diagnostic patch for ObjectValue(nullptr)

Review of attachment 8928768 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Marking.cpp
@@ +1760,5 @@
> +                fprintf(stderr,
> +                        "processMarkStackTop found ObjectValue(nullptr) "
> +                        "at %zu Values from end of array in object:\n",
> +                        end - (vp - 1));
> +                DumpObject(obj);

Worth a MOZ_CRASH here too?
Attachment #8928768 - Flags: review?(jcoppeard) → review+
Assignee

Comment 33

2 years ago
(In reply to Jon Coppeard (:jonco) (PTO until 20th Nov) from comment #32)
> Comment on attachment 8928768 [details] [diff] [review]
> diagnostic patch for ObjectValue(nullptr)
> 
> Review of attachment 8928768 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/gc/Marking.cpp
> @@ +1760,5 @@
> > +                fprintf(stderr,
> > +                        "processMarkStackTop found ObjectValue(nullptr) "
> > +                        "at %zu Values from end of array in object:\n",
> > +                        end - (vp - 1));
> > +                DumpObject(obj);
> 
> Worth a MOZ_CRASH here too?

I was leaving it out in hopes that the crashes would continue to be routed to this bug; would that still happen if I changed the crash site via MOZ_CRASH?

Comment 34

2 years ago
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3059e975eef8
diagnostic patch for ObjectValue(nullptr), r=jonco

Comment 37

2 years ago
Backout by archaeopteryx@coole-files.de:
https://hg.mozilla.org/mozilla-central/rev/010374bce606
Backed out 5 changesets (bug 1366083, bug 1410528, bug 1417267) for failing Spidermonkey jstests and jit-tests r=backout on a CLOSED TREE

Comment 38

2 years ago
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/29cc33a514b6
diagnostic patch for ObjectValue(nullptr), r=jonco
Assignee

Comment 40

2 years ago
While I'm at it, let me put one closer to the source. Just pushing to try with this for now.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Assignee

Comment 43

Last year
It seems it's time to up the bar. The problem that the previous assert was added for doesn't seem to be happening now. But ObjectValue(nullptr) appears to still be causing issues in other bugs. I would like to try landing a more draconian diagnostic assert for a while to see if it catches anything.
Attachment #8953192 - Flags: review?(jcoppeard)
Assignee

Updated

Last year
Attachment #8930766 - Attachment is obsolete: true
Flags: needinfo?(sphink)
Comment on attachment 8953192 [details] [diff] [review]
Diagnostic assert for ObjectValue(nullptr)

Review of attachment 8953192 [details] [diff] [review]:
-----------------------------------------------------------------

Let's do it.
Attachment #8953192 - Flags: review?(jcoppeard) → review+

Comment 45

Last year
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c28735015441
Diagnostic assert for ObjectValue(nullptr), r=jonco
The leave-open keyword is there and there is no activity for 6 months.
:sfink, maybe it's time to close this bug?
Flags: needinfo?(sphink)
Assignee

Updated

7 months ago
Depends on: 1509985
Assignee

Comment 48

7 months ago
Yes. I think we should probably back this out now, but I guess that makes sense to do in another bug since it's a totally different version now.
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Flags: needinfo?(sphink)
Resolution: --- → FIXED
Assignee

Updated

7 months ago
Keywords: leave-open
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.