Closed Bug 1857610 Opened 2 years ago Closed 2 years ago

VideoBridgeParent receives IPC close with reason=AbnormalShutdown with Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR

Categories

(Core :: Audio/Video, defect)

Unspecified
Linux
defect

Tracking

RESOLVED WORKSFORME
Tracking Status
firefox-esr115 --- unaffected
firefox118 --- wontfix
firefox119 --- wontfix
firefox120 --- wontfix

People

(Reporter: yoasif, Unassigned)

Details

(Keywords: regression, reproducible)

Attachments

(3 files)

Updated my Fedora 38 system today and rebooted the system, and found Firefox crashing upon start (I use session restore).

Safe mode didn't help, with errors:

[asif@hp-laptop firefox-nightly]$ ./firefox --safe-mode 
[fluent] Missing message in locale en-US: refresh-profile-instead
[fluent] Couldn't find a message: refresh-profile-instead
[dom/l10n] Could not complete initial document translation.
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
[Child 25488, MediaDecoderStateMachine #1] WARNING: Decoder=7f7faebc0100 Decode error: NS_ERROR_DOM_MEDIA_FATAL_ERR (0x806e0005) - auto mozilla::MediaChangeMonitor::CreateDecoderAndInit(MediaRawData *)::(anonymous class)::operator()(const MediaResult &) const: Unable to create decoder: file /builds/worker/checkouts/gecko/dom/media/MediaDecoderStateMachineBase.cpp:166
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
ExceptionHandler::GenerateDump cloned child 26988
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
Exiting due to channel error.
(the line above repeats 50 more times)
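
Long runs of identical lines, like the `Exiting due to channel error.` messages above, make it easy to miss the more telling ld.so assertions. A minimal sketch for condensing such a log before attaching it might look like the following. The function names and the shortened sample text are illustrative assumptions, not part of any Firefox tooling.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    return [(line, sum(1 for _ in run)) for line, run in groupby(lines)]


def summarize(log_text):
    """Return the collapsed log and the number of ld.so assertion lines."""
    lines = [ln.rstrip() for ln in log_text.splitlines() if ln.strip()]
    ldso_failures = sum(
        1 for ln in lines if ln.startswith("Inconsistency detected by ld.so")
    )
    return collapse_repeats(lines), ldso_failures


# Shortened excerpt in the shape of the terminal output above.
SAMPLE = """\
Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown
Exiting due to channel error.
Exiting due to channel error.
Exiting due to channel error.
"""

if __name__ == "__main__":
    collapsed, ldso_failures = summarize(SAMPLE)
    for line, count in collapsed:
        suffix = f"  (x{count})" if count > 1 else ""
        print(line + suffix)
    print(f"ld.so assertion failures: {ldso_failures}")
```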

This is not a recent regression. Older versions of Firefox seem to crash tabs instead of exiting the browser - that is better behavior.

28:06.96 INFO: Narrowed integration regression window from [0a06638b, e608dfe1] (4 builds) to [bdf8893a, e608dfe1] (2 builds) (~1 steps left)
28:06.96 INFO: No more integration revisions, bisection finished.
28:06.96 INFO: Last good revision: bdf8893a3dfdac6fe0ad445625a24e6d18e89fda
28:06.96 INFO: First bad revision: e608dfe11fc705c543ec4050ee01403eb5f66a52
28:06.96 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=bdf8893a3dfdac6fe0ad445625a24e6d18e89fda&tochange=e608dfe11fc705c543ec4050ee01403eb5f66a52
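
For readers unfamiliar with how the window above gets narrowed, here is a rough sketch of the underlying idea: a binary search over an ordered list of builds. This is not mozregression's actual code. The `is_bad` callback, the placeholder middle revision, and the assumed ordering of the four builds are illustrative assumptions.

```python
def bisect_first_bad(revisions, is_bad):
    """Binary-search an ordered list of revisions.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    Returns (last_good, first_bad).
    """
    lo, hi = 0, len(revisions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # regression is at or before mid
        else:
            lo = mid  # regression is after mid
    return revisions[lo], revisions[hi]


# Short hashes from the log above; "hypothetical-mid" stands in for the
# fourth build in the window, which the log does not name.
WINDOW = ["0a06638b", "hypothetical-mid", "bdf8893a", "e608dfe1"]

if __name__ == "__main__":
    last_good, first_bad = bisect_first_bad(WINDOW, lambda rev: rev == "e608dfe1")
    print("Last good revision:", last_good)
    print("First bad revision:", first_bad)
```

A bisection like this only reports the first build where the symptom appears. If the symptom depends on the state of the machine rather than the code, the "first bad" revision can be misleading.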

Regressed by: 1835804
Attached file installed-packages.txt

output of sudo dnf list installed in case that helps.

Two reported crashes that seem to correspond to this issue:

bp-4acd1f54-e879-4416-9619-a424a0231006
bp-ba150765-aa51-4d2d-9971-ef9d10231006

Flags: needinfo?(lissyx+mozillians)

Set release status flags based on info from the regressing bug 1835804

I have strong doubts about bug 1835804 being the cause of that:

While JSoracle runs in a utility process, it shares the utility audio decoder process, not the RDD one; the RDD or utility audio decoder process would only start once media is needed (video decoding for RDD; audio decoding for utility).

Flags: needinfo?(lissyx+mozillians)

Would have been nice to get more STR, btw.

And if it's really related to bug 1835804 then flipping media.allow-audio-non-utility in about:config to true should "fix" the crash but it would suggest something else is wrong on your system ...

Flags: needinfo?(yoasif)

Please test a fresh profile and share about:support as well.

But really, the sequence of events:

Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
[GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown

makes me much more worried about the system itself. Errors like that are more often symptoms of a hardware issue (SSD, RAM, or even CPU or PSU; I've seen it).

Thanks, I don't see anything more interesting in about:support. Can you test the pref I mentioned? And on a fresh profile, can you test, e.g., some audio-only content vs. some video+audio content?
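
As an alternative to flipping the pref by hand in about:config, a throwaway profile with the pref preset can be generated. The sketch below writes a `user.js` file (the standard per-profile prefs override file) into a temporary directory and prints a launch command. The helper name and the temp-directory prefix are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def render_pref(name, value):
    """Render one pref in user.js syntax."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        rendered = "true" if value else "false"
    elif isinstance(value, int):
        rendered = str(value)
    else:
        escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    return f'user_pref("{name}", {rendered});'


def make_test_profile(prefs):
    """Create a fresh, empty profile directory containing only user.js."""
    profile = Path(tempfile.mkdtemp(prefix="fx-test-profile-"))
    body = "\n".join(render_pref(k, v) for k, v in prefs.items()) + "\n"
    (profile / "user.js").write_text(body, encoding="utf-8")
    return profile


# The pref named in the comment above, set to true.
profile = make_test_profile({"media.allow-audio-non-utility": True})
print(f"./firefox --no-remote --profile {profile}")
```

Launching with the printed command keeps the test isolated from the main profile and its restored session.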

Attached file rpm -Va output

I tried flipping the pref and browsing some random Pocket new stories in a new profile.

Within a few seconds, I ended up with crashed tabs in URLs:

https://www.wired.com/story/heisse-preise-food-prices/?utm_source=pocket-newtab-en-us
https://www.rollingstone.com/culture/culture-features/maverick-miles-nehemiah-true-story-1234820794/
https://www.desy.de/news/news_search/index_eng.html?openDirectAnchor=2951&two_columns=0

I don't doubt that this could be a hardware issue - the OS upgrade (and the fact that going back to a snapshot prior to the upgrade resolves the issue) happening around the same time makes me doubt that a bit, but file corruption may be an issue as well.

I also went ahead and added the output of rpm -Va in case something is interesting there as well.
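
`rpm -Va` output is long, and most entries (changed config files, timestamps) are harmless. A small filter that keeps only missing files and non-config files with a digest mismatch (a `5` in the third column of the flags) could narrow the list. This is a sketch under assumptions: the sample lines are hypothetical and not taken from the attachment, and the parser only handles the common line shapes.

```python
import re

# Flags are nine characters (e.g. "S.5....T."), or the word "missing",
# followed by an optional one-letter file marker (c = config) and the path.
VERIFY_LINE = re.compile(
    r"^(?:(?P<flags>[SM5DLUGTP.?]{9})|(?P<missing>missing))"
    r"\s+(?:(?P<attr>[a-z])\s+)?(?P<path>/.*)$"
)


def parse_rpm_verify(text):
    """Parse `rpm -Va` output into a list of dicts; skip unrecognised lines."""
    entries = []
    for line in text.splitlines():
        m = VERIFY_LINE.match(line.strip())
        if m:
            entries.append(
                {
                    "flags": m.group("flags"),
                    "missing": m.group("missing") is not None,
                    "attr": m.group("attr"),
                    "path": m.group("path"),
                }
            )
    return entries


def suspicious_paths(entries):
    """Missing files, or digest mismatches on files that are not config files."""
    result = []
    for e in entries:
        if e["attr"] == "c":
            continue  # edited config files are expected to differ
        if e["missing"] or e["flags"][2] == "5":
            result.append(e["path"])
    return result


# Hypothetical sample lines, NOT taken from the attached rpm-va.txt.
SAMPLE = """\
S.5....T.  c /etc/hypothetical.conf
..5....T.    /usr/lib64/libhypothetical.so.1
missing     /usr/share/hypothetical/file
.......T.    /usr/lib64/other.so
"""

if __name__ == "__main__":
    for path in suspicious_paths(parse_rpm_verify(SAMPLE)):
        print(path)
```

Running `rpm -qf` on a reported path names the owning package, which is the one to reinstall or report.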

Flags: needinfo?(yoasif)
Attachment #9357190 - Attachment filename: rpm-va → rpm-va.txt
Attachment #9357190 - Attachment mime type: application/octet-stream → text/plain

(In reply to Asif Youssuff from comment #11)

> Created attachment 9357190 [details]
> rpm -Va output
>
> I tried flipping the pref and browsing some random Pocket new stories in a new profile.
>
> Within a few seconds, I ended up with crashed tabs in URLs:

So if flipping the pref still crashes, I am really skeptical that my change has any relationship to this.

> https://www.wired.com/story/heisse-preise-food-prices/?utm_source=pocket-newtab-en-us
> https://www.rollingstone.com/culture/culture-features/maverick-miles-nehemiah-true-story-1234820794/
> https://www.desy.de/news/news_search/index_eng.html?openDirectAnchor=2951&two_columns=0
>
> I don't doubt that this could be a hardware issue - the OS upgrade (and the fact that going back to a snapshot prior to the upgrade resolves the issue) happening around the same time makes me doubt that a bit, but file corruption may be an issue as well.

OK, so maybe you should verify this first? If you don't hit the issue after a rollback and you suspect some corruption, that would be much more consistent with:

    Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!
    [GFX1-]: VideoBridgeParent receives IPC close with reason=AbnormalShutdown

Combined with https://crash-stats.mozilla.org/report/index/ba150765-aa51-4d2d-9971-ef9d10231006, this would suggest that an RDD process tried to launch and failed in a way that made mozilla::PRDDChild::OtherPid() access nullptr? It would be consistent with something corrupted (fs, RAM, CPU?) that makes the process die early, and it would fit the "Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!" message accurately, IMHO.

Maybe corruption around ffmpeg and/or other libs? Maybe about:crashes has some other crashes you could submit?

> I also went ahead and added the output of rpm -Va in case something is interesting there as well.

I'm not sure how useful it is to me.

I don't deny that there may be an underlying issue within the OS upgrade that is making this bug's behavior worse. The only thing I would note is that both with the pref flipped and prior to bug 1835804, Firefox would crash the tab and not the entire browser.

Maybe the system seems so broken to you that you would rather Firefox crash here so that the user investigates further -- I do know that e.g. Element did not work even in a fresh Firefox profile in my "upgraded" system, while Gnome Web does.

Feel free to close in that case - if you have any idea of what package may be broken so that I can report to Fedora, I'd appreciate that - totally fine if you have no idea, though.

(In reply to Asif Youssuff from comment #13)

> I don't deny that there may be an underlying issue within the OS upgrade that is making this bug's behavior worse. The only thing I would note is that both with the pref flipped and prior to bug 1835804, Firefox would crash the tab and not the entire browser.

If you have such "before" content crashes, you should share them so we can investigate, but a content process crashing is orthogonal to a utility/RDD process crashing, so I don't really know whether it would help us or not.

> Maybe the system seems so broken to you that you would rather Firefox crash here so that the user investigates further -- I do know that e.g. Element did not work even in a fresh Firefox profile in my "upgraded" system, while Gnome Web does.

It's not what I prefer; it's that the crash is inconsistent with what we changed, and you have symptoms that really suggest something else very bad is happening, which might explain all the issues by itself.

> Feel free to close in that case - if you have any idea of what package may be broken so that I can report to Fedora, I'd appreciate that - totally fine if you have no idea, though.

Again, I am not sure that there is a broken package that needs to be reported to Fedora; I am worried that something on your system is broken. Please look a bit at this error: "Inconsistency detected by ld.so: ../sysdeps/x86_64/dl-machine.h: 487: elf_machine_rela_relative: Assertion `ELFW(R_TYPE) (reloc->r_info) == R_X86_64_RELATIVE' failed!" and you will understand it's not just me trying to avoid investigating the issue; I should be sleeping rather than commenting on this bug, to be honest.

Bad RAM could be triggered by something we do; you can even think of https://en.wikipedia.org/wiki/Row_hammer, for example.

I'm really sorry that you lost sleep on this issue - I reinstalled packages and things are working. But now I'm wondering why the updater thought the files on disk were written correctly.

Take it easy!
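
On the question of whether files on disk match what was meant to be written, one way to check is to hash an installed directory tree against a freshly extracted reference copy. The sketch below is a generic comparison, not something the updater or rpm does internally. The directory layout and file names in the demo are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large libraries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def compare_trees(installed, reference):
    """Report files that differ, are missing, or are extra in `installed`."""
    a, b = tree_digests(installed), tree_digests(reference)
    return {
        "mismatched": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        "missing": sorted(b.keys() - a.keys()),
        "extra": sorted(a.keys() - b.keys()),
    }


def demo():
    """Build two tiny trees that differ by one byte and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        installed, reference = Path(tmp, "installed"), Path(tmp, "reference")
        installed.mkdir()
        reference.mkdir()
        (reference / "libexample.so").write_bytes(b"\x7fELF" + bytes(64))
        (installed / "libexample.so").write_bytes(b"\x7fELF" + bytes(63) + b"\x01")
        (reference / "readme.txt").write_text("same")
        (installed / "readme.txt").write_text("same")
        return compare_trees(installed, reference)


if __name__ == "__main__":
    print(demo())
```

If repeated runs over the same unchanged files give different digests, that would point toward memory or I/O problems rather than a bad write at install time.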

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

I prefer that we checked and now know it's a system problem, rather than leaving a potential issue in the codebase :). Thanks for checking.

No longer regressed by: 1835804