Closed Bug 1991491 Opened 5 months ago Closed 5 months ago

Startup crash results in the equivalent of a fork bomb

Categories

(Toolkit :: Crash Reporting, defect)

Tracking

RESOLVED FIXED
145 Branch
Tracking Status
firefox-esr140 --- unaffected
firefox143 --- wontfix
firefox144 --- fixed
firefox145 --- fixed

People

(Reporter: yannis, Assigned: afranchuk)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

Reproducing the startup crash in bug 1991378 started the equivalent of a fork bomb on my Windows computer, with windows opening one after another in a loop, in a very quick and neverending sequence. These windows were the crash reporter window and the "could not load XPCOM" error window. Essentially, Firefox crashes, starting the crash reporter, which launches Firefox, which crashes, starting the crash reporter, etc.

The crash reporter starts with command line e.g.: "C:\Program Files\Mozilla Firefox Beta\crashreporter.exe" "C:\Users\yjugl\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\ccf22810-0c7e-484b-bafa-f3089bd4cf19.dmp"

Firefox starts with command line e.g.: "C:\Program Files\Mozilla Firefox Beta\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/75692631-adef-4bdb-808e-5dfae2fc2ad0 "crashreporter/144.0 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter16236-request0.json

Based on crashreporterNetworkBackend, I believe that bug 1866863 should be the regressor.

While my first test was in Beta, I have confirmed that I can also reproduce this in Release. This issue made me lose control of my computer to the point that I had to manually force it to shut down, and then I had to enter my BitLocker recovery key after turning the computer back on. It would have resulted in a huge data loss (all of my C: drive) for me if I didn't have the recovery key available.

I am now using the PowerShell script below when I test this issue, in order to guarantee that I can safely shut down my computer:

Start-Sleep -Seconds 20; while ($true) {
    $targets = @("crashreporter", "firefox")

    foreach ($name in $targets) {
        Get-Process -Name $name -ErrorAction SilentlyContinue | ForEach-Object {
            try {
                Stop-Process -Id $_.Id -Force -ErrorAction Stop
                Write-Host "Killed $name.exe with PID $($_.Id)"
            } catch {
                Write-Host "Failed to kill $name.exe with PID $($_.Id): $_"
            }
        }
    }
}

Here is the script that generated xul_corrupt.dll, for reference or in case you prefer to re-create it. For another version of Firefox, you will need to adapt the value of cut_at: it was chosen to be smaller than the PointerToRawData of the .tls section in the PE section table, but still high enough for the DLL corruption not to be detected at startup:

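cut_at = 152  # in MiB; adapt this value for other Firefox versions
cut_at *= 1024 * 1024
with open("xul.dll", "rb") as h:
    data = h.read()
# Keep the first cut_at bytes and zero-fill the rest, so the file size is unchanged.
data_corrupt = data[0:cut_at] + bytes([0] * (len(data) - cut_at))
with open("xul_corrupt.dll", "wb") as h:
    h.write(data_corrupt)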
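
To pick cut_at for another build, the PointerToRawData values can be read from the PE section table. The following is a minimal sketch, not part of the original report: the helper name section_raw_offsets is hypothetical, and the field offsets follow the standard PE/COFF header layout (e_lfanew at 0x3C, a 20-byte COFF header, 40-byte section entries).

```python
import struct


def section_raw_offsets(data: bytes) -> dict:
    """Map each PE section name to its PointerToRawData (a file offset)."""
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # The 20-byte COFF file header follows the signature.
    coff = pe_offset + 4
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (optional_header_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table starts right after the optional header.
    table = coff + 20 + optional_header_size
    offsets = {}
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        name = data[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # PointerToRawData sits 20 bytes into the section header.
        (raw_pointer,) = struct.unpack_from("<I", data, entry + 20)
        offsets[name] = raw_pointer
    return offsets
```

For example, after reading xul.dll into data, section_raw_offsets(data)[".tls"] gives the offset that cut_at (in bytes) should stay below.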

WARNING: The STR below will close all running Firefox instances 20 seconds after you run the PowerShell script. Make sure that this is fine with you!

STR:

  • install Firefox 143.0.1;
  • navigate to the installation directory;
  • rename xul.dll to xul_backup.dll;
  • download xul_corrupt.dll, place it in the installation directory, rename it to xul.dll;
  • open a PowerShell window, copy-paste the PowerShell script from above, hit Enter;
  • within 20 seconds, double-click firefox.exe in the installation directory;
  • once all crash reporter and/or Firefox windows are closed, hit Ctrl-C in the PowerShell window.

Note: These STR reproduce an existing issue that we know some actual users can run into after updating! See bug 1816848 comment 21 for more details.

Summary: fork bomb → Startup crash results in the equivalent of a fork bomb
See Also: → 1816848, 1991378

Set release status flags based on info from the regressing bug 1866863

:afranchuk, since you are the author of the regressor, bug 1866863, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(afranchuk)
Has STR: --- → yes
Assignee: nobody → afranchuk
Status: NEW → ASSIGNED
Flags: needinfo?(afranchuk)

Surprisingly, I'm not able to reproduce this on Linux. I can see that when the crash reporter client invokes the background task, it crashes as expected, but it doesn't launch a new crash reporter client. This makes it a bit annoying for me to verify my fix (my Windows dev setup is not ideal right now); however, the delay won't be large.

Aha, on Linux and Mac we hit this logic which prevents the crash reporter from launching for background tasks: https://searchfox.org/firefox-main/source/toolkit/crashreporter/nsExceptionHandler.cpp#1571-1575.

We have no such logic in the WER handler.
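
To illustrate why the missing guard matters, here is a toy Python model of the launch loop. It is not Firefox code, and the function and parameter names are hypothetical. It assumes that every firefox.exe start crashes (as with a corrupted xul.dll) and that each crash reporter client launches one background task to send the crash ping.

```python
def count_launches(suppress_reporter_for_background_task: bool, limit: int = 50) -> int:
    """Count crash reporter client launches in a toy model of the loop.

    `limit` caps the simulation so that the unguarded case terminates.
    """
    launches = 0
    is_background_task = False  # the first start is the user's normal launch
    while launches < limit:
        # firefox.exe crashes here; decide whether to launch the client.
        if is_background_task and suppress_reporter_for_background_task:
            break  # guard: a crashing background task launches no client
        launches += 1              # a crash reporter client starts...
        is_background_task = True  # ...and spawns a background task
    return launches
```

With the guard enabled, the model stops after the first crash reporter client; without it, launches continue until the artificial limit is reached.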

I'm having a bit of trouble reproducing the issue on Windows, too, though I'm trying to reproduce with a debug build. I will try again tomorrow with a release build.

(In reply to Alex Franchuk [:afranchuk] from comment #3)

We have no such logic in the WER handler.

The crash I'm reproducing is indeed a WER crash. Maybe this only occurs for WER startup crashes then?

With the maintenance service trying to launch Firefox for background updates, I got hit another time this morning without starting Firefox myself. Uninstalling works (it starts two crash reporter windows, but not more, and then Firefox is uninstalled).

The fix was as simple as I expected. Rather than continue to muck with altered STR, I just did yours (which reproduced as expected) and copied a modified crashreporter executable into the installation to verify the fix.

Duplicate of this bug: 1987145

Otherwise, if the background task crashes, a loop of crash reporter
clients launching will occur as each tries to launch another background
task to send the crash ping.

Pushed by afranchuk@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/e4aeb6a4de68 https://hg.mozilla.org/integration/autoland/rev/de236a54e877 Ensure the background task launched by the crash reporter client doesn't launch its own crash reporter r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 145 Branch

Hi Alex, in Nightly after the patch, same STR (corrupting the new xul.dll), I confirm that I don't run into the fork bomb-like behavior, so this part is fine. Still the user experience feels degraded to me compared to before bug 1866863. I have a recording available here if you want to double check this. Essentially:
(a) after the crash reporter starts, I have a Firefox window that appears saying "Couldn't load XPCOM.", and this window keeps reappearing every time I click OK as long as the crash reporter is running;
(b) when I send a report, I have another "Couldn't load XPCOM" window showing, however the report is properly sent.

Is this the intended behavior? Can we do something about it or not?

Below is the list of successive command lines ordered by process start time, based on the Windows Performance Recorder recording:

"C:\Program Files\Firefox Nightly\firefox.exe" 
C:\WINDOWS\System32\svchost.exe -k WerSvcGroup
C:\WINDOWS\system32\WerFault.exe -pss -s 472 -p 24244 -ip 24244
C:\WINDOWS\system32\WerFault.exe -u -p 24244 -s 600
"C:\Program Files\Firefox Nightly\crashreporter.exe" "C:\Users\yjugl\AppData\Roaming\Mozilla\Firefox\Crash Reports\pending\fbd6b066-d90f-4dbe-a90b-7390c0b829db.dmp"
taskhostw.exe LocalUserSyncDataAvailable
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/7cf70ff7-a6f3-41aa-8960-2833e9e1f550 "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request0.json
taskhostw.exe LocalUserSyncDataAvailable
C:\WINDOWS\system32\WerFault.exe -pss -s 596 -p 30192 -ip 30192
C:\WINDOWS\system32\WerFault.exe -u -p 30192 -s 608
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --header "Date: Thu, 02 Oct 2025 08:58:05 GMT" --header "Content-Encoding: gzip" --header "Content-Type: application/json; charset=utf-8" --header "Content-Length: 993" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/7cf70ff7-a6f3-41aa-8960-2833e9e1f550
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/37b62fee-0607-49dc-963c-5860bbc6547b "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request1.json
C:\WINDOWS\system32\WerFault.exe -pss -s 656 -p 10072 -ip 10072
C:\WINDOWS\system32\WerFault.exe -u -p 10072 -s 576
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "Content-Encoding: gzip" --header "Content-Length: 978" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --header "Content-Type: application/json; charset=utf-8" --header "Date: Thu, 02 Oct 2025 08:58:09 GMT" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/37b62fee-0607-49dc-963c-5860bbc6547b
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/4016167b-ef01-4a09-a142-bb73ff1fdc1d "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request2.json
C:\WINDOWS\system32\WerFault.exe -pss -s 716 -p 13260 -ip 13260
C:\WINDOWS\system32\WerFault.exe -u -p 13260 -s 592
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "Content-Length: 982" --header "Content-Encoding: gzip" --header "Content-Type: application/json; charset=utf-8" --header "Date: Thu, 02 Oct 2025 08:58:11 GMT" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/4016167b-ef01-4a09-a142-bb73ff1fdc1d
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/8b0222e3-d8c7-4fca-a1e3-2a9386928bab "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request3.json
C:\WINDOWS\system32\WerFault.exe -pss -s 560 -p 32516 -ip 32516
C:\WINDOWS\system32\WerFault.exe -u -p 32516 -s 608
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "Content-Encoding: gzip" --header "Date: Thu, 02 Oct 2025 08:58:14 GMT" --header "Content-Type: application/json; charset=utf-8" --header "Content-Length: 990" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/8b0222e3-d8c7-4fca-a1e3-2a9386928bab
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/5fb873e9-a940-40c0-9bd8-d7ad56f9cdcf "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request4.json
C:\WINDOWS\system32\WerFault.exe -pss -s 700 -p 28828 -ip 28828
C:\WINDOWS\system32\WerFault.exe -u -p 28828 -s 576
"C:\WINDOWS\system32\wermgr.exe" -upload
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://crash-reports.mozilla.com/submit?id={ec8030f7-c20a-464f-9b0e-13a3a9e97384}&version=145.0a1&buildid=20251001201601 "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request5.json
C:\WINDOWS\system32\WerFault.exe -pss -s 556 -p 25816 -ip 25816
C:\WINDOWS\system32\WerFault.exe -u -p 25816 -s 616
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --form extra=@-;filename=extra.json;type=application/json --form "upload_file_minidump=@\"C:\\Users\\yjugl\\AppData\\Roaming\\Mozilla\\Firefox\\Crash Reports\\pending\\fbd6b066-d90f-4dbe-a90b-7390c0b829db.dmp\"" https://crash-reports.mozilla.com/submit?id={ec8030f7-c20a-464f-9b0e-13a3a9e97384}&version=145.0a1&buildid=20251001201601
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --header "Content-Length: 973" --header "Content-Type: application/json; charset=utf-8" --header "Content-Encoding: gzip" --header "Date: Thu, 02 Oct 2025 08:58:16 GMT" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/5fb873e9-a940-40c0-9bd8-d7ad56f9cdcf
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/50abd3aa-5b15-47d1-92bd-70180471e444 "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request6.json
C:\WINDOWS\system32\WerFault.exe -pss -s 700 -p 28176 -ip 28176
C:\WINDOWS\system32\WerFault.exe -u -p 28176 -s 576
"C:\WINDOWS\system32\wermgr.exe" -upload
"curl" --user-agent "crashreporter/145.0a1 (windows x86_64)" --header "X-Telemetry-Agent: Glean/65.2.1 (Rust on Windows)" --header "Date: Thu, 02 Oct 2025 08:58:32 GMT" --header "Content-Length: 976" --header "Content-Encoding: gzip" --header "Content-Type: application/json; charset=utf-8" --data-binary @- https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/50abd3aa-5b15-47d1-92bd-70180471e444
\??\C:\WINDOWS\system32\conhost.exe 0xffffffff -ForceV1
"C:\Program Files\Firefox Nightly\firefox.exe" --backgroundtask crashreporterNetworkBackend https://incoming.telemetry.mozilla.org/submit/firefox-crashreporter/crash/1/e1879d80-f187-4716-bade-faa06948cb59 "crashreporter/145.0a1 (windows x86_64)" C:\Users\yjugl\AppData\Local\Temp\crashreporter3220-request7.json
C:\WINDOWS\system32\WerFault.exe -pss -s 620 -p 23776 -ip 23776
C:\WINDOWS\system32\WerFault.exe -u -p 23776 -s 596
"C:\WINDOWS\system32\wermgr.exe" -upload
Flags: needinfo?(afranchuk)
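
A list like the one above is easier to read when grouped by request URL. The following is a small sketch for doing that, assuming one command line per string; the helper name summarize is hypothetical, and the matching relies only on the substrings visible in the recording ("--backgroundtask crashreporterNetworkBackend" and a leading "curl").

```python
import re
from collections import defaultdict

URL = re.compile(r"https://\S+")


def summarize(command_lines):
    """Group command lines by request URL and count attempts per transport."""
    attempts = defaultdict(lambda: {"backgroundtask": 0, "curl": 0})
    for line in command_lines:
        match = URL.search(line)
        if match is None:
            continue  # WerFault.exe, conhost.exe, etc. carry no request URL
        if "--backgroundtask crashreporterNetworkBackend" in line:
            attempts[match.group()]["backgroundtask"] += 1
        elif line.lstrip().startswith('"curl"'):
            attempts[match.group()]["curl"] += 1
    return dict(attempts)
```

A URL with one background task attempt and one curl attempt indicates that the background task was tried first and the request was then retried through curl.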

(In reply to Yannis Juglaret [:yannis] from comment #11)

So, interestingly, I couldn't reproduce the "Couldn't load XPCOM" dialog coming up when I followed your STR.

(a) after the crash reporter starts, I have a Firefox window that appears saying "Couldn't load XPCOM.", and this window keeps reappearing every time I click OK as long as the crash reporter is running;

I agree that what you're experiencing is poor behavior. I was initially confused why it would continuously come up. However, now that I think about it, the logic around sending the ping isn't fully handled by us; Glean is the one that initiates the requests. So it must be retrying the ping. I wonder whether the curl fallback is failing. I'll double-check how failures are exposed to the Glean network interface to make sure that there's nothing weird there.

(b) when I send a report, I have another "Couldn't load XPCOM" window showing, however the report is properly sent.

This is something we can fix, by having some sticky state in the network backend that remembers the background task failure and goes right to using curl instead. And if we had that sticky state, it ought to avoid most of the weirdness encountered in (a).
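
The sticky state idea can be sketched as follows. This is only an illustration in Python, not the crash reporter client's actual code: the class name StickyFallbackSender and the two callables (standing in for the background task backend and the curl fallback) are assumptions.

```python
from typing import Callable


class StickyFallbackSender:
    """Use the primary transport until it fails once, then always use the fallback."""

    def __init__(self, primary: Callable[[bytes], None],
                 fallback: Callable[[bytes], None]):
        self._primary = primary
        self._fallback = fallback
        self._primary_failed = False  # the "sticky" state

    def send(self, payload: bytes) -> str:
        if not self._primary_failed:
            try:
                self._primary(payload)
                return "primary"
            except Exception:
                # Remember the failure so that later sends skip the primary.
                self._primary_failed = True
        self._fallback(payload)
        return "fallback"
```

With this shape, a crashing background task would be attempted at most once per crash reporter session, and every later request would go straight to the fallback.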

All this being said, I feel as if we should configure background tasks to not show that dialog (it's likely to only confuse users, I think?).

Flags: needinfo?(afranchuk)
Blocks: 1992267

The patch landed in nightly and beta is affected.
:afranchuk, is this bug important enough to require an uplift?

Flags: needinfo?(afranchuk)
See Also: → 1992420

Otherwise, if the background task crashes, a loop of crash reporter
clients launching will occur as each tries to launch another background
task to send the crash ping.

Original Revision: https://phabricator.services.mozilla.com/D266826

Attachment #9518171 - Flags: approval-mozilla-beta?

firefox-beta Uplift Approval Request

  • User impact if declined: In rare cases, a UI loop will occur which will prevent the user from reasonably using their computer.
  • Code covered by automated testing: no
  • Fix verified in Nightly: yes
  • Needs manual QE test: no
  • Steps to reproduce for manual QE testing:
  • Risk associated with taking this patch: low
  • Explanation of risk level: The patch is extremely minimal in the scope of changes.
  • String changes made/needed: No
  • Is Android affected?: no
Attachment #9518176 - Flags: approval-mozilla-beta?

Flags: needinfo?(afranchuk)
Attachment #9518171 - Attachment is obsolete: true
Attachment #9518171 - Flags: approval-mozilla-beta?
Attachment #9518176 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Alex Franchuk [:afranchuk] from comment #12)

So, interestingly, I couldn't reproduce the "Couldn't load XPCOM" dialog coming up when I followed your STR.

For completeness, my understanding is that reproducing the "Couldn't load XPCOM" requires crashing during DLL loading, so you need to repro [@ google::protobuf::EncodedDescriptorDatabase::Add ] (the 144 and 145 variation of bug 1816848) and not [@ mozilla::LinkedListElement<T>::setPreviousUnsafe ] (the 143- variation of bug 1816848), which was crashing after DLL loading. The DLL shared in comment 0 will repro the latter. In order to reproduce the former you need to corrupt the xul.dll from a 144 or 145 build using the Python script from comment 0 and use that corrupted file rather than the DLL I shared. Sorry if "corrupting the new xul.dll" in comment 11 wasn't clear.

(In reply to Yannis Juglaret [:yannis] from comment #18)

(In reply to Alex Franchuk [:afranchuk] from comment #12)

So, interestingly, I couldn't reproduce the "Couldn't load XPCOM" dialog coming up when I followed your STR.

For completeness, my understanding is that reproducing the "Couldn't load XPCOM" requires crashing during DLL loading, so you need to repro [@ google::protobuf::EncodedDescriptorDatabase::Add ] (the 144 and 145 variation of bug 1816848) and not [@ mozilla::LinkedListElement<T>::setPreviousUnsafe ] (the 143- variation of bug 1816848), which was crashing after DLL loading. The DLL shared in comment 0 will repro the latter. In order to reproduce the former you need to corrupt the xul.dll from a 144 or 145 build using the Python script from comment 0 and use that corrupted file rather than the DLL I shared. Sorry if "corrupting the new xul.dll" in comment 11 wasn't clear.

Ah I see, thanks for clarifying!
