If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Frequent test_2d.composite.canvas.color.html | application timed out after 450.0 seconds with no output after ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()'

RESOLVED FIXED in Firefox OS v2.2

Status

()

Core
IPC
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: RyanVM, Assigned: jerry)

Tracking

({assertion, crash, intermittent-failure})

Other Branch
2.2 S14 (12june)
ARM
Gonk (Firefox OS)
assertion, crash, intermittent-failure
Points:
---

Firefox Tracking Flags

(b2g-v2.2 fixed, b2g-master unaffected)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
Created attachment 8588898 [details]
test log

I've conclusively been able to pinpoint the uplift of bug 1142360 as the "cause" of this frequent (near-permanent) failure on b2g37. Given the nature of the patch, it strongly points to changing chunking boundaries as exposing whatever underlying issue is at play here.

Looks like we're having stackwalking issues on b2g37 as well. I wonder if we're missing an uplift or something there.
Flags: needinfo?(bkelly)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 8

3 years ago
Bill, any ideas otherwise?
Flags: needinfo?(wmccloskey)
We've seen a couple bugs like this recently. Without a stack, though, it's hard to fix. Maybe a b2g person could run the test locally? It sounds like it's pretty reproducible.

I agree that the "regressing" patch probably didn't cause this.
Flags: needinfo?(wmccloskey)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Comment 55

3 years ago
Ryan, I'm sorry but I'm not realistically going to have time to work on this in the next month.  I have too many other fires to put out.  Is there anyone else you can ask?
Flags: needinfo?(bkelly) → needinfo?(ryanvm)
(Reporter)

Comment 56

3 years ago
Jed maybe? To recap, this is near perma-fail on b2g37 (v2.2), our recently announced long-term supported B2G release.
Flags: needinfo?(ryanvm) → needinfo?(jld)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 58

3 years ago
Actually, let's try Cervantes first.
Flags: needinfo?(jld) → needinfo?(cyu)
The stackwalking looks like it's working fine; what's needed is some way to translate the libxul.so offsets into names.  The breakpad symbols would work, but they're on pvtbuilds and I apparently don't have the right flavor of VPN set up to get there from where I currently am.

Piping the logs through fix_b2g_stack.py, with the unstripped binaries from the build in the appropriate places, would have taken care of this; I don't know what part(s) of that aren't happening for automated tests.

It's also a little worrying that the crash reporter isn't working — minidump_stackwalk can be seen complaining about 0-length .dmp files.
(Reporter)

Comment 60

3 years ago
You can access pvtbuilds with LDAP at https://pvtbuilds.mozilla.org/pvt/mozilla.org/
Comment hidden (Treeherder Robot)
Is this failure only seen on 2.2? I ran the test on emulator debug build but didn't see the abort() in question.
Flags: needinfo?(cyu)
(In reply to Cervantes Yu from comment #62)
> Is this failure only seen on 2.2? I ran the test on emulator debug build but
> didn't see the abort() in question.
  ^^^^^^^^^^^^^^^^^^^^^^ on m-c.
I didn't see this crash on local 2.2 built.
(Reporter)

Comment 65

3 years ago
Yes, we're only seeing it on v2.2. Note that per comment 0, this is heavily dependent on the specific chunk.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(In reply to Treeherder Robot from comment #71)
> log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-b2g37_v2_2&job_id=106044
> revision: cc02ee38b252

With the hint from comment #60, I have the symbols.  There ought to be a tool for this, but it's easy enough to interpret breakpad symbols manually with less(1).

> 03:10:30 INFO - #01: ???[libxul.so +0x5729b6]

mozilla::ipc::MessageChannel::AssertWorkerThread, ipc/MessageChannel.h line 402

> 03:10:30 INFO - #02: ???[libxul.so +0x57db8c]

mozilla::ipc::MessageChannel::Send, ipc/glue/MessageChannel.cpp line 720

> 03:10:30 INFO - #03: ???[libxul.so +0x6c525c]

mozilla::layers::PTextureChild::SendClearTextureHostSync, (generated) PTextureChild.cpp line 85

> 03:10:30 INFO - #04: ???[libxul.so +0x89ad04]

mozilla::layers::TextureClient::ForceRemove, gfx/layers/client/TextureClient.cpp line 511 

> 03:10:30 INFO - #05: ???[libxul.so +0x8c1af2]

mozilla::layers::ShadowLayerForwarder::RemoveTexture, line 523

> 03:10:30 INFO - #06: ???[libxul.so +0x8a06d4]
> 03:10:30 INFO - #07: ???[libxul.so +0x86be98]
> 03:10:30 INFO - #08: ???[libxul.so +0x86c3c4]
> 03:10:30 INFO - #09: ???[libxul.so +0x86c3f4]
> 03:10:30 INFO - #10: ???[libxul.so +0x56bb14]
> 03:10:30 INFO - #11: ???[libxul.so +0x8607b6]
> 03:10:30 INFO - #12: ???[libxul.so +0x86082e]
> 03:10:30 INFO - #13: ???[libxul.so +0x8608e2]
> 03:10:30 INFO - #14: ???[libxul.so +0x860908]
> 03:10:30 INFO - #15: ???[libxul.so +0x56bb14]
> 03:10:30 INFO - #16: ???[libxul.so +0x863ed4]
> 03:10:30 INFO - #17: ???[libxul.so +0x86407a]
> 03:10:30 INFO - #18: ???[libxul.so +0x86410c]
> 03:10:30 INFO - #19: ???[libxul.so +0x857384]
> 03:10:30 INFO - #20: ???[libxul.so +0x87913e]
> 03:10:30 INFO - #21: ???[libxul.so +0xdf2f54]
> 03:10:30 INFO - #22: ???[libxul.so +0xdd10e6]
> 03:10:30 INFO - #23: ???[libxul.so +0xdd10f8]
> 03:10:30 INFO - #24: ???[libxul.so +0xde11fa]
> 03:10:30 INFO - #25: ???[libxul.so +0xde6b9e]

mozilla::WebGLContext::cycleCollection::DeleteCycleCollectable, dom/canvas/WebGLContext.h line 192

> 03:10:30 INFO - #26: ???[libxul.so +0x379b7c]

SnowWhiteKiller::~SnowWhiteKiller, xpcom/base/nsCycleCollector.cpp line 2674

> 03:10:30 INFO - #27: ???[libxul.so +0x379c62]
> 03:10:30 INFO - #28: ???[libxul.so +0x379df2]
> 03:10:30 INFO - #29: ???[libxul.so +0x37a4c4]
> 03:10:30 INFO - #30: ???[libxul.so +0x37a5c4]
> 03:10:30 INFO - #31: ???[libxul.so +0x37a660]
> 03:10:30 INFO - #32: ???[libxul.so +0x3c2466]

mozilla::ShutdownXPCOM, xpcom/build/XPCOMInit.cpp line 942

> 03:10:30 INFO - #33: XRE_TermEmbedding[libxul.so +0x165f504]
> 03:10:30 INFO - #34: ???[libxul.so +0x574222]
> 03:10:30 INFO - #35: XRE_InitChildProcess[libxul.so +0x16602f6]
> 03:10:30 INFO - #36: ???[/system/b2g/plugin-container +0x8dc0]
> 03:10:30 INFO - #37: __libc_init[libc.so +0x16722]

I could resolve the rest of them, but this is already a little weird — if I'm reading the IPDL correctly, PTexture's toplevel protocol is PCompositor, CompositorChild expects to be owned by the main thread, and this *is* the main thread… but we're hitting the assertion for trying to send from the wrong thread.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Here's a more fixed version of the log from comment #71:

> 03:10:30 INFO - [Child 790] ###!!! ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()', file ../../dist/include/mozilla/ipc/MessageChannel.h, line 401
> 03:10:30 INFO - #01: mozilla::ipc::MessageChannel::AssertWorkerThread /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/ipc/MessageChannel.h:400 (libxul.so+0x5729b6)
> 03:10:30 INFO - #02: mozilla::ipc::MessageChannel::Send ipc/glue/MessageChannel.cpp:719 (libxul.so+0x57db8c)
> 03:10:30 INFO - #03: mozilla::layers::PTextureChild::SendClearTextureHostSync /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/ipc/ipdl/PTextureChild.cpp:84 (libxul.so+0x6c525c)
> 03:10:30 INFO - #04: mozilla::layers::TextureClient::ForceRemove gfx/layers/client/TextureClient.cpp:510 (libxul.so+0x89ad04)
> 03:10:30 INFO - #05: mozilla::layers::ShadowLayerForwarder::RemoveTexture gfx/layers/ipc/ShadowLayers.cpp:523 (libxul.so+0x8c1af2)
> 03:10:30 INFO - #06: mozilla::layers::TextureClient::Finalize gfx/layers/client/TextureClient.cpp:557 (libxul.so+0x8a06d4)
> 03:10:30 INFO - #07: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/layers/AtomicRefCountedWithFinalize.h:59 (libxul.so+0x86be98)
> 03:10:30 INFO - #08: mozilla::gl::SharedSurface_Gralloc::~SharedSurface_Gralloc /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/RefPtr.h:302 (libxul.so+0x86c3c4)
> 03:10:30 INFO - #09: mozilla::gl::SharedSurface_Gralloc::~SharedSurface_Gralloc gfx/gl/SharedSurfaceGralloc.cpp:164 (libxul.so+0x86c3f4)
> 03:10:30 INFO - #10: nsRefPtr<txACompileObserver>::assign_assuming_AddRef /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/nsRefPtr.h:47 (libxul.so+0x56bb14)
> 03:10:30 INFO - #11: mozilla::gl::ShSurfHandle::~ShSurfHandle /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/UniquePtr.h:255 (libxul.so+0x8607b6)
> 03:10:30 INFO - #12: mozilla::RefPtr<mozilla::gl::ShSurfHandle>::unref /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/RefPtr.h:142 (libxul.so+0x86082e)
> 03:10:30 INFO - #13: mozilla::gl::GLScreenBuffer::~GLScreenBuffer /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/RefPtr.h:243 (libxul.so+0x8608e2)
> 03:10:30 INFO - #14: mozilla::gl::GLScreenBuffer::~GLScreenBuffer gfx/gl/GLScreenBuffer.cpp:88 (libxul.so+0x860908)
> 03:10:30 INFO - #15: nsRefPtr<txACompileObserver>::assign_assuming_AddRef /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/nsRefPtr.h:47 (libxul.so+0x56bb14)
> 03:10:30 INFO - #16: mozilla::gl::GLContext::MarkDestroyed /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/UniquePtr.h:281 (libxul.so+0x863ed4)
> 03:10:30 INFO - #17: mozilla::gl::GLContextEGL::~GLContextEGL gfx/gl/GLContextProviderEGL.cpp:248 (libxul.so+0x86407a)
> 03:10:30 INFO - #18: mozilla::gl::GLContextEGL::~GLContextEGL gfx/gl/GLContextProviderEGL.cpp:274 (libxul.so+0x86410c)
> 03:10:30 INFO - #19: mozilla::detail::GenericRefCounted<(mozilla::detail::RefCountAtomicity)0u>::Release /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/mozilla/GenericRefCounted.h:95 (libxul.so+0x857384)
> 03:10:30 INFO - #20: nsRefPtr<mozilla::gl::GLContext>::assign_assuming_AddRef /builds/slave/b2g_m-b37_22_emu-d_ntly-000000/build/objdir-gecko/dist/include/nsRefPtr.h:47 (libxul.so+0x87913e)
> 03:10:30 INFO - #21: mozilla::WebGLContext::~WebGLContext dom/canvas/WebGLContext.cpp:305 (libxul.so+0xdf2f54)
> 03:10:30 INFO - #22: mozilla::WebGL1Context::~WebGL1Context dom/canvas/WebGL1Context.cpp:26 (libxul.so+0xdd10e6)
> 03:10:30 INFO - #23: mozilla::WebGL1Context::~WebGL1Context dom/canvas/WebGL1Context.cpp:26 (libxul.so+0xdd10f8)
> 03:10:30 INFO - #24: mozilla::WebGLContext::DeleteCycleCollectable dom/canvas/WebGLContext.cpp:1874 (libxul.so+0xde11fa)
> 03:10:30 INFO - #25: mozilla::WebGLContext::cycleCollection::DeleteCycleCollectable dom/canvas/WebGLContext.h:192 (libxul.so+0xde6b9e)
> 03:10:30 INFO - #26: SnowWhiteKiller::~SnowWhiteKiller xpcom/base/nsCycleCollector.cpp:2674 (libxul.so+0x379b7c)
> 03:10:30 INFO - #27: nsCycleCollector::FreeSnowWhite xpcom/base/nsCycleCollector.cpp:2841 (libxul.so+0x379c62)
> 03:10:30 INFO - #28: nsCycleCollector::BeginCollection xpcom/base/nsCycleCollector.cpp:3781 (libxul.so+0x379df2)
> 03:10:30 INFO - #29: nsCycleCollector::Collect xpcom/base/nsCycleCollector.cpp:3616 (libxul.so+0x37a4c4)
> 03:10:30 INFO - #30: nsCycleCollector::ShutdownCollect xpcom/base/nsCycleCollector.cpp:3568 (libxul.so+0x37a5c4)
> 03:10:30 INFO - #31: nsCycleCollector_shutdown xpcom/base/nsCycleCollector.cpp:4264 (libxul.so+0x37a660)
> 03:10:30 INFO - #32: mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:940 (libxul.so+0x3c2466)
> 03:10:30 INFO - #33: XRE_TermEmbedding[libxul.so +0x165f504]
> 03:10:30 INFO - #34: mozilla::ipc::ScopedXREEmbed::Stop ipc/glue/ScopedXREEmbed.cpp:115 (libxul.so+0x574222)
> 03:10:30 INFO - #35: XRE_InitChildProcess[libxul.so +0x16602f6]
> 03:10:30 INFO - #36: content_process_main ipc/contentproc/plugin-container.cpp:211 (plugin-container+0xdc0)
> 03:10:30 INFO - #37: __libc_init[libc.so +0x16722]
Comment hidden (Treeherder Robot)
(In reply to Jed Davis [:jld] {UTC-7} from comment #72)
> I could resolve the rest of them, but this is already a little weird — if
> I'm reading the IPDL correctly, PTexture's toplevel protocol is PCompositor,
> CompositorChild expects to be owned by the main thread, and this *is* the
> main thread… but we're hitting the assertion for trying to send from the
> wrong thread.

Maybe this is use after free? If the MessageChannel is already dead, it could fail this assertion check (the MessageLoop is still alive when this crash happends).
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
This appears to be use after free of the MessageChannel instance. I added log instrumentation and found that when the assertion fails, mWorkerLoopID contains a random integer value.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 196

2 years ago
This has shifted to test_2d.composite.canvas.color-dodge.html now.
Summary: Frequent test_2d.composite.canvas.color-burn.html | application timed out after 450.0 seconds with no output after ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()', file MessageChannel.h, line 401 → Frequent test_2d.composite.canvas.color-burn.html,test_2d.composite.canvas.color-dodge.html | application timed out after 450.0 seconds with no output after ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()'
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Updated

2 years ago
Blocks: 1167904
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 225

2 years ago
Josh, B2G debug mochitest-5 has been failing ~90% of the time on B2G v2.2 for nearly 2 months now. Please help me find someone who can investigate this before I'm forced to hide the suite due to lack of progress.
Flags: needinfo?(jocheng)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Comment 233

2 years ago
Hi Peter,
This seems to be gfx issue.
Could you help to find someone who can help to fix this?
Thanks!
Flags: needinfo?(pchang)
Jerry, please help to check the WebGL resource free problem based on comment 76 and comment 106.
Flags: needinfo?(pchang) → needinfo?(hshih)
(Assignee)

Comment 235

2 years ago
I will try to reproduce locally.
Assignee: nobody → hshih
Flags: needinfo?(hshih)
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED

Comment 236

2 years ago
Thanks, Peter and Jerry!
Flags: needinfo?(jocheng)
(Reporter)

Updated

2 years ago
Summary: Frequent test_2d.composite.canvas.color-burn.html,test_2d.composite.canvas.color-dodge.html | application timed out after 450.0 seconds with no output after ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()' → Frequent test_2d.composite.canvas.color.html | application timed out after 450.0 seconds with no output after ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()'
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 241

2 years ago
The mWorkerLoopID id in the broken case seems invalid. Investigating...


04:52:53.693 I/GonkMemoryPressure(  793): Observed XPCOM shutdown.
...........
04:53:13.653 I/Gecko   (  793): [SharedSurface_Gralloc 0x4670b820] destroyed
04:53:13.653 I/Gecko   (  793): bignose ForceRemove 1, addr:0x441bcb80, actor:0x451ab100, tid:793
04:53:13.653 I/Gecko   (  793): bignose thread error, tid:793, workerID:1515870810, MessageLoop::current()->id():1   <====
04:53:13.653 I/Gecko   (  793): [Child 793] ###!!! ABORT: not on worker thread!: 'mWorkerLoopID == MessageLoop::current()->id()', file ../../dist/include/mozilla/ipc/MessageChannel.h, line 404
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 247

2 years ago
I see some interested log. We delete the actor twice. Please see the address "0x44053af0".
Investigating...

23:00:15     INFO -  06-01 05:46:28.555 I/Gecko   (  791): bignose TextureClient::Finalize, actor:0x44053af0, forwarder:0x44a679b0, tid:815
23:00:15     INFO -  06-01 05:46:28.555 I/Gecko   (  791): bignose ForceRemove 1, addr:0x44053af0, actor:0x45aa3440, tid:815
...........
23:00:17     INFO -  06-01 05:50:51.606 I/Gecko   (  791): [SharedSurface_Gralloc 0x46586b50] destroyed
23:00:17     INFO -  06-01 05:50:51.615 I/Gecko   (  791): bignose TextureClient::Finalize, actor:0x44053af0, forwarder:0x44afb820, tid:791
23:00:17     INFO -  06-01 05:50:51.615 I/Gecko   (  791): bignose ForceRemove 1, addr:0x44053af0, actor:0x44f48d80, tid:791
23:00:17     INFO -  06-01 05:50:51.615 I/Gecko   (  791): bignose thread error, tid:791, worker:1515870810 current:1
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 253

2 years ago
TextureClient could be recycled, so ignore the comment 247.
(Assignee)

Comment 254

2 years ago
The crash stack from comment 76 show that we use ipc actor after ipc shutdown.

ipc shutdown:
https://hg.mozilla.org/mozilla-central/annotate/56241c1f8a3b/xpcom/build/XPCOMInit.cpp#l854

CC shutdown:
https://hg.mozilla.org/mozilla-central/annotate/56241c1f8a3b/xpcom/build/XPCOMInit.cpp#l946

Let me check our ActorDestroy() function.
Comment hidden (Treeherder Robot)
(Assignee)

Comment 256

2 years ago
I know what's going on here. Will upload a patch later.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 260

2 years ago
Created attachment 8614181 [details] [diff] [review]
remove the shutdown call of CompositorChild in child process. v1

Since we call CompositorChild::Shutdown() in gfxPlatform::ShutdownLayersIPC(), we also delete the LayerTransaction ipc. Then we use the ipc during cc. Please check comment 76.
And we don't have this in m-c. Only v2.2(b2g37) has this patch. I try to remove this patch now. https://bugzilla.mozilla.org/show_bug.cgi?id=1065536#c18
Attachment #8614181 - Flags: review?(continuation)
(Assignee)

Comment 261

2 years ago
wait for the try
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18d5f48b237e
Blocks: 1065536
Comment on attachment 8614181 [details] [diff] [review]
remove the shutdown call of CompositorChild in child process. v1

Review of attachment 8614181 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for causing trouble here. Please add something to the description saying that you are backing out commit 3cf774a0954d. It would probably also make more sense to land this backout in bug 1065536.
Attachment #8614181 - Flags: review?(continuation) → review+
(Reporter)

Comment 263

2 years ago
Comment on attachment 8614181 [details] [diff] [review]
remove the shutdown call of CompositorChild in child process. v1

I can't recall the last time I've seen so much green on a b2g37 debug M5 run. *sniff* It's beautiful!
Attachment #8614181 - Flags: feedback+
(Assignee)

Comment 264

2 years ago
Created attachment 8614459 [details] [diff] [review]
back out bug 1065536 part 3 for some ipc crash. v2. r=mccr8

update patch comment.
Attachment #8614181 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8614459 - Attachment description: back out bug 1065536 part 3 for some ipc crash. v2 → back out bug 1065536 part 3 for some ipc crash. v2. r=mccr8
Attachment #8614459 - Flags: review+
Comment hidden (Treeherder Robot)
(Assignee)

Comment 266

2 years ago
Comment on attachment 8614459 [details] [diff] [review]
back out bug 1065536 part 3 for some ipc crash. v2. r=mccr8

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
hit abort as https://bugzilla.mozilla.org/show_bug.cgi?id=1151711#c76
User impact if declined:
m5 test failed
Testing completed:
b2g37 try link https://treeherder.mozilla.org/#/jobs?repo=try&revision=18d5f48b237e
Risk to taking this patch (and alternatives if risky):
low
maybe we will have leak as https://bugzilla.mozilla.org/show_bug.cgi?id=1065536#c0
We back out a patch in 1065536 as m-c does here.
String or UUID changes made by this patch:
none
Attachment #8614459 - Flags: approval-mozilla-b2g37?

Comment 267

2 years ago
Comment on attachment 8614459 [details] [diff] [review]
back out bug 1065536 part 3 for some ipc crash. v2. r=mccr8

Hi Jerry,
Approving and please check again whether m5 test no longer fail.
Thanks!
Flags: needinfo?(hshih)
Attachment #8614459 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
(Assignee)

Updated

2 years ago
Flags: needinfo?(hshih)
Keywords: checkin-needed
(Assignee)

Comment 268

2 years ago
(In reply to Josh Cheng [:josh] from comment #267)
> Comment on attachment 8614459 [details] [diff] [review]
> back out bug 1065536 part 3 for some ipc crash. v2. r=mccr8
> 
> Hi Jerry,
> Approving and please check again whether m5 test no longer fail.
> Thanks!

The test result.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=18d5f48b237e
Needs more tests.
(Assignee)

Comment 269

2 years ago
please land the attachment 8614459 [details] [diff] [review] to b2g37.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Reporter)

Comment 272

2 years ago
Huzzah!

https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/2b930da45291
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-b2g-v2.2: --- → fixed
status-b2g-master: --- → unaffected
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S14 (12june)
(Reporter)

Updated

2 years ago
No longer blocks: 1167904
You need to log in before you can comment on or make changes to this bug.