Bug 1216175 - Opened 4 years ago, Updated 5 months ago

GC handling in workers is broken | crash in OOM | large | NS_ABORT_OOM | mozilla::dom::CallbackObject::Init

Categories

(Core :: DOM: Workers, defect, P2, critical)

x86_64
Windows 8.1
defect

Tracking


REOPENED

People

(Reporter: jujjyl, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, Whiteboard: DWS_NEXT)

Attachments

(4 files, 7 obsolete files)

This bug was filed from the Socorro interface and is report bp-55e6e5b0-125f-4454-a4be-c54252151019.
=============================================================

The call stack performs a large 3GB allocation and OOMs without handling possible failure. Unfortunately there is no consistent repro.
Hmm, mJSHolders tries to do some large allocation when we're close to OOM anyway?

Does the testcase perhaps create tons of setTimeout calls?


mccr8, we've talked at some point about splitting mJSHolders into several hashtables so that
black marking could be done in several steps. Something like
switch (reinterpret_cast<uintptr_t>(ptr) % 3) {
  case 0: mJSHolders0.Put(ptr); break;
  case 1: mJSHolders1.Put(ptr); break;
  case 2: mJSHolders2.Put(ptr); break;
}
That might or might not help here, by reducing the risk of any single hashtable growing very large.
(We could add more JS holder hashtables dynamically.)
If we're really doing a 3GB allocation here, splitting up the table into a few parts won't help. ;)
Jukka, can you try to figure out what this test case is doing? If that crash report is right, it seems like it is creating hundreds of millions of DOM callbacks...
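To make that concrete, something like the following hypothetical worker snippet would do it; it is purely illustrative and not taken from the actual page. Each pending timeout holds a callback alive, and the crash signature suggests these end up as CallbackObjects in the JS holders table.

// Hypothetical pattern only: schedule callbacks far faster than they can run,
// so millions of pending timeouts pile up in the worker.
function spam() {
  for (var i = 0; i < 100000; ++i) {
    setTimeout(function () {}, 60000);
  }
  setTimeout(spam, 0);
}
spam();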
I built current mozilla-central from source, attached Visual Studio to it, and waited for the crash. The call stack looks like this:

https://dl.dropboxusercontent.com/u/40949268/dump/callstack_1.png
https://dl.dropboxusercontent.com/u/40949268/dump/callstack_2.png
https://dl.dropboxusercontent.com/u/40949268/dump/callstack_3.png
https://dl.dropboxusercontent.com/u/40949268/dump/callstack_4.png
https://dl.dropboxusercontent.com/u/40949268/dump/callstack_5.png

How does that call trace look to you guys?

The page I'm seeing this on exhibits the problem systematically after pretty close to 1000 seconds of running. It is a large Emscripten-compiled application with some handwritten JS code on the side, though something about this particular Emscripten application must be unique, since other Emscripten pages don't have the problem.

Thanks Andrew for the pointer on checking setTimeout()s. Let me debug the setTimeout()s to see if the page is doing something bad there.
OS: Windows NT → Windows 8.1
Hardware: Unspecified → x86_64
Version: unspecified → Trunk
Yeah, it looks like the application must be calling setTimeout in workers an excessive number of times. (Or else there's an underflow bug in the hashtable.)
(In reply to Andrew McCreight [:mccr8] from comment #4)
> If we're really doing a 3GB allocation here, splitting up the table into a
> few parts won't help. ;)
Well, I was actually wondering if there is first some huge allocation and then some rather excessive setTimeout calls. But sure, sounds like a web app bug anyway.
(In reply to Olli Pettay [:smaug] from comment #8)
> Well, I was actually wondering if there is first some huge allocation and
> then some rather excessive setTimeout calls. But sure, sounds like a web app
> bug anyway.

If you look at the crash report, the "OOM Allocation Size" field is "3120562176"; that is the size of the allocation we were attempting when we crashed, so the JS holder table really is trying to grow itself to 3GB.
It's likely that the page is doing something rogue (I'm still trying to figure out what), but this should not be dismissed as a web app bug: it hard-crashes the browser in non-e10s, and in e10s it crashes all tabs, so whatever the page is doing, it is an effective denial-of-service attack.

I do see a few setTimeout()s used in the workers that the page is spawning, but statically analyzing them, the logic looks ok. I'm now implementing console.log() for the page's workers to get some visibility into them.
There are plenty of web app bugs which can't really be handled in browsers without breaking something.
Leaks for example - web apps tend to leak rather often.
Sure, I understand that not all OOM-related denial-of-service vectors are easily fixable, but we should still recognize when such DoS vectors exist. The ideal behavior would be to crash only the offending tab on OOM, free up all the memory that tab had leaked, and leave the other tabs alive.

In any case, we still don't know whether this is a web app bug or not. I'm debugging it further to diagnose.
Attached file settimeouts.zip
Given that the web worker on the page in question (sorry for not being able to link it directly in here, it is a nonpublic demo governed by Mozilla Partner NDA) is using setTimeout(0) to drive a continuous event loop, I tried to reproduce that scenario as a standalone test case, and it looks like I'm seeing similar behavior.

The Emscripten page is growing in memory consumption at about 10MB/sec (consistent with the crash happening after roughly 1000 seconds of running), and the crash occurs when it reaches about 13 GB of consumed memory. In about:memory, this memory is listed in the content process under heap-unclassified.

A standalone test case of the same setTimeout() behavior is shown in the attached .zip file. In that file, the main thread is simulated to be under heavy load by repeatedly calling performance.now() (not sure how relevant that is, it did seem to exaggerate the behavior), and the main thread spawns web workers, which run a busy loop of setTimeout() callbacks. These setTimeout() calls keep growing the used memory size up until the memory size reaches 13GB (on my system with 16GB of RAM). Once the memory consumption reaches close to 100% of total system RAM (as observed in Windows task manager), there is momentary large ~1GB/sec disk activity, after which memory usage goes down by about 1-3GB, and begins to grow again.

In the .zip sample, the OOM crash does not occur, but it does look like such a chain of setTimeout()s either leaks, or consumes a lot of memory which is GCd/CCd in an overly relaxed manner(?). Could you take a peek at the .zip file :smaug?
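Roughly, the attached test case is structured like this (a simplified sketch; the exact file names, worker count, and timing constants here are illustrative rather than copied from the .zip):

// main.js: spawn a few workers, then keep the main thread busy,
// as the test does with repeated performance.now() calls.
var workers = [];
for (var i = 0; i < 4; ++i) {
  workers.push(new Worker('worker.js'));
}
setInterval(function () {
  var start = performance.now();
  while (performance.now() - start < 50) {
    // Spin to simulate heavy main-thread load.
  }
}, 100);

// worker.js: drive a continuous event loop with setTimeout(0),
// so the worker keeps registering new timeout callbacks forever.
function tick() {
  setTimeout(tick, 0);
}
tick();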
Flags: needinfo?(bugs)
If I manually click the GC button on the about:memory page when the browser is at 12GB memory usage, the memory usage drops to 4GB, so at least it looks like this is not a direct memory leak; the GC has simply not decided to run in between?
Running the .zip test page in 32-bit Firefox stable did exhibit an OOM crash like this: https://crash-stats.mozilla.com/report/index/bb18caf8-b045-4fab-b568-58f7d2151020
Attached file simpler testcase
So, this is fun. In this case we never trigger a GC from the GC timers in workers.
We trigger the periodic timer, but then we don't have anything else in the event loop, so
we cancel it and start the idle timer; then the setTimeout timer runs and we do the same again
(starting the periodic timer cancels the idle timer...).
Attached patch wip (obsolete) — Splinter Review
This helps with the setTimeout case, but I still consider any SetGCTimerMode(NoTimer); call a bug.
I wonder whether to deal with those later, when I finally manage to get saner GC/CC scheduling for both the main thread and workers.
Flags: needinfo?(bugs)
The current setup in workers seems to be totally ad hoc, and even though we cancel some
timer, the relevant GarbageCollectRunnable can already be in the control queue, so cancelling doesn't
actually guarantee that anything is cancelled.
Component: DOM → DOM: Workers
Comment on attachment 8679734 [details] [diff] [review]
simpler

We need to not cancel the timers all the time, so we need to keep at least
the idle timer running.

hadDebuggerOrNormalRunnables is there so that we don't restart the idle timer all the time.


Btw, the periodic timer runs quite rarely, with or without the patch. We could possibly get rid of it.
Attachment #8679734 - Flags: review?(khuey)
Comment on attachment 8679734 [details] [diff] [review]
simpler

whoever gets a chance to review this first ;)
Attachment #8679734 - Flags: review?(amarchesini)
Comment on attachment 8679734 [details] [diff] [review]
simpler

Review of attachment 8679734 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +4578,4 @@
>    // periodically (PERIODIC_GC_TIMER_DELAY_SEC) while the worker is running.
>    // Once the worker goes idle we set a short (IDLE_GC_TIMER_DELAY_SEC) timer to
> +  // run a shrinking GC.
> +  mPeriodicGCTimer = do_CreateInstance(NS_TIMER_CONTRACTID);

you removed MOZ_ASSERT().

@@ +4583,5 @@
>      new GarbageCollectRunnable(this, false, false);
> +  nsCOMPtr<nsIEventTarget> target = new TimerThreadEventTarget(this, runnable);
> +  MOZ_ALWAYS_TRUE(NS_SUCCEEDED(mPeriodicGCTimer->SetTarget(target)));
> +
> +  mIdleGCTimer = do_CreateInstance(NS_TIMER_CONTRACTID);

an MOZ_ASSERT() here too?
Attachment #8679734 - Flags: review?(amarchesini) → review+
Yes, I removed the MOZ_ASSERT very much on purpose. It is useless to assert that something is non-null if we're going to crash (safely) immediately after that anyway.
Attachment #8679734 - Flags: review?(khuey)
bizarre, I'm not seeing any worker usage in that test.
pushing to try again, since I don't really know what could cause the issue
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ffe4a4ff2a1
Blocks: 1254240
I need to upload the patch again to try
See Also: → 1333035
I harshed your patches good in bug 1319278.  I want to uplift that, so maybe you can just depend on it for all branches.
Depends on: 1319278
Bug 1319278 changed the underlying code quite a bit, so I need to rebase the patch on top of that.
Attached patch worker_gc_scheduling_6.diff (obsolete) — Splinter Review
The patch isn't that different, but I think a new review doesn't hurt.
Basically it isn't using TimerThreadEventTarget anymore but WorkerControlEventTarget
(because of https://bugzilla.mozilla.org/show_bug.cgi?id=1319278#c24)

Let's see what tryserver thinks about it
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f312b423f34479da9b21fc3d8f08ae0288739ac7
Attachment #8830967 - Flags: review?(amarchesini)
Attachment #8830967 - Flags: review?(amarchesini) → review+
Pushed by opettay@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/15a5f1ecca37
ensure GC/CC are run in workers, r=baku
Huh, devtools.

I have no idea how to fix this.
Perhaps I could try a longer timeout for the test.
Hmm, but it is usually a very fast test.
FYI, I did notice some other worker intermittents after the landing of this and bug 1319278:

 Bug 1334378
 Bug 1334379
 Bug 1334383

I'm not sure if they are related, but it makes me a bit nervous to uplift.

Perhaps 5 seconds is too often to run a shrinking GC?  Or maybe the shrinking GC does not play well with worker termination?  The service worker tests probably test the termination paths more than other tests.
Flags: needinfo?(bugs)
v7 uses 10 seconds, but I doubt that is the reason.
The link you gave, bug 1003730, sounds like a much more likely reason.

But I re-triggered c2 several times.
Flags: needinfo?(bugs)
Attached patch v8 (obsolete) — Splinter Review
This passes the previously failing test, at least locally.
The key change is:

       while (mControlQueue.IsEmpty() &&
              !(debuggerRunnablesPending = !mDebuggerQueue.IsEmpty()) &&
              !(normalRunnablesPending = NS_HasPendingEvents(mThread))) {
+        if (previousStatus == Closing) {
+          // Nothing to do, let the code below to kill us.
+          break;
+        }
         WaitForWorkerEvents();

That fixes .close() handling. Right now .close() works somewhat accidentally, because we have a GC
timer scheduled and it spins the event loop, so we get out of that while loop.
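For context, the .close() pattern this affects is just a worker shutting itself down from its own global scope; an illustrative sketch (not taken from any test in the tree):

// worker.js (illustrative): the worker closes itself once it has replied.
onmessage = function (e) {
  postMessage('done');
  // close() should wind the worker down even with nothing else queued;
  // previously this only worked because a pending GC timer happened to
  // spin the event loop out of the wait.
  close();
};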
Attachment #8835147 - Flags: review?(amarchesini)
Attachment #8679060 - Attachment is obsolete: true
Attachment #8679734 - Attachment is obsolete: true
Attachment #8830081 - Attachment is obsolete: true
Attachment #8830967 - Attachment is obsolete: true
Attachment #8831338 - Attachment is obsolete: true
Attachment #8835147 - Flags: review?(amarchesini) → review+
And now there are some new test failures :/
Attached patch v9 (obsolete) — Splinter Review
This may fix bug 1137403.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f571403da29eb2f2142b298a146c9ad607c0753c

The change from the previous patch is that postMessage no longer throws if the other side is already closed.
Attachment #8835147 - Attachment is obsolete: true
Attached patch v10 — Splinter Review
yet another variant, trying to change the code less.
Won't ask for review until I get good enough tryserver results.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ea7d08c8b7321cefc373e5a7b52ee05517d6f02
Attachment #8835601 - Attachment is obsolete: true
(In reply to Olli Pettay [:smaug] from comment #48)
> Created attachment 8835701 [details] [diff] [review]
> v10
> 
> yet another variant, trying to change the code less.
> Won't ask review until I get good enough tryserver results.
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ea7d08c8b7321cefc373e5a7b52ee05517d6f02

That run looks sort of green... are you planning on investigating the remaining failures?
Flags: needinfo?(bugs)
Yes, once I have time. I did spend several days looking at those failures but couldn't figure out a setup which doesn't break any tests.
If anyone else has time, feel free to take a look.
Flags: needinfo?(bugs)
Priority: -- → P2
:smaug how's this bug looking?
Flags: needinfo?(bugs)
I guess I should find time to get back to this.
I never managed to get green try runs, but since then many things have changed.
rebased

remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f049eaa0d9a695e11e03bc2d03e08e74bac742c
remote: recorded changegroup in replication log in 0.018s
Flags: needinfo?(bugs)
Looks like I had pushed from a broken m-i base, so here is a new push:
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=188cb07a1a46362feae2941eaccf8e67aa023029
remote: recorded changegroup in replication log in 0.018s
Marion, it would be great if you could find someone to look at this. Clearly the tryserver push isn't green enough.

Comment 17 explains the issue.
Assignee: bugs → nobody
Flags: needinfo?(mdaly)
Summary: crash in OOM | large | NS_ABORT_OOM | mozilla::dom::CallbackObject::Init → GC handling in workers is broken | crash in OOM | large | NS_ABORT_OOM | mozilla::dom::CallbackObject::Init
Duplicate of this bug: 1451250
Perry, talk to Asuth and see if this may be up your alley?
Flags: needinfo?(mdaly) → needinfo?(perry)
This is definitely something we should pursue landing.  I'm really hoping :smaug's second push in comment 54 is also on top of bad commits, because the MessageChannel assertions are super odd.  :perry, I'd suggest starting with a fresh try build on top of a known clean mozilla-central commit.
Assignee: nobody → perry
Status: NEW → ASSIGNED
Flags: needinfo?(perry)
As far as I know, it is not on top of a bad commit.
Haven't gotten to this yet but it's on the todo list.

(resetting so we can assign to someone with more cycles)

Assignee: perry → nobody
Status: ASSIGNED → NEW
Whiteboard: DWS_NEXT

I had a better recap that got lost in tab bankruptcy. I think the appropriate summary, which I'm now stealing from bug 1514723, is: the test failures seem to be that the patch exacerbates some platform thread-shutdown issues. nsThread thinks the thread is shut down because it messaged back, so Gecko shutdown continues, but the thread wasn't actually shut down (we don't join on the thread); when the thread later hits full pthread-level shutdown and the thread-local-storage RAII machinery involving PBackground goes to clean up, assertions start exploding.

The next step was going to be to reproduce under rr since what was happening was more than a little complex.

Not a regression. We've had this GC/CC handling in workers basically forever.

Keywords: regression

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → WORKSFORME
Status: RESOLVED → REOPENED
Crash Signature: [@ OOM | large | NS_ABORT_OOM | mozilla::dom::CallbackObject::Init]
Resolution: WORKSFORME → ---

This bug is about a specific test case that fails, not failures on crash-stats.
