Open Bug 1808676 Opened 1 year ago Updated 4 months ago

Crash in [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop]

Categories

(Core :: Graphics: ImageLib, defect, P3)

Unspecified
Windows
defect

Tracking Status
firefox-esr102 --- unaffected
firefox108 + wontfix
firefox109 + wontfix
firefox110 + wontfix
firefox111 + wontfix

People

(Reporter: aryx, Assigned: tnikkel, NeedInfo)

Details

(Keywords: crash)

Crash Data

There had been 5 crashes before 2023-01-01 with this signature, but since this year started 151 such crashes have been reported - all on Windows.

When users shared the URL of the crashing page, it was either https://www.wrtc2022.it/en/wrtc-2023-award-31.asp or https://www.qrz.com/db/ followed by a set of characters. Both appear to be built on the same framework for amateur radio websites.

Crash report: https://crash-stats.mozilla.org/report/index/011b29a5-14ce-48d3-846d-932cc0230102

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(!isSome())

Top 10 frames of crashing thread:

0  xul.dll  mozilla::Maybe<nsresult>::emplace  mfbt/Maybe.h:844
0  xul.dll  mozilla::PreloaderBase::NotifyStop  uriloader/preload/PreloaderBase.cpp:242
1  xul.dll  imgRequestProxy::OnLoadComplete  image/imgRequestProxy.cpp:1082
2  xul.dll  mozilla::image::SyncNotifyInternal<const mozilla::image::ObserverTable*>::<lambda_7>::operator const  image/ProgressTracker.cpp:356
2  xul.dll  mozilla::image::ImageObserverNotifier<const mozilla::image::ObserverTable*>::operator  image/ProgressTracker.cpp:286
2  xul.dll  mozilla::image::SyncNotifyInternal  image/ProgressTracker.cpp:355
2  xul.dll  mozilla::image::ProgressTracker::SyncNotifyProgress::<lambda_2>::operator const  image/ProgressTracker.cpp:374
2  xul.dll  mozilla::image::CopyOnWrite<mozilla::image::ObserverTable>::Read const  image/CopyOnWrite.h:155
2  xul.dll  mozilla::image::ProgressTracker::SyncNotifyProgress  image/ProgressTracker.cpp:380
3  xul.dll  mozilla::image::MultipartImage::OnLoadComplete  image/MultipartImage.cpp:288

Andrew, could you check why this started to crash? No success with reproduction attempts here.

Flags: needinfo?(aosmond)

The bug is marked as tracked for firefox108 (release). We have limited time to fix this, the soft freeze is in 6 days. However, the bug still isn't assigned.

:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(bhood)
Crash Signature: [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] → [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] {@ mozilla::Maybe<T>::emplace<T> | mozilla::PreloaderBase::NotifyStop]
Crash Signature: [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] {@ mozilla::Maybe<T>::emplace<T> | mozilla::PreloaderBase::NotifyStop] → [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] [@ mozilla::Maybe<T>::emplace<T> | mozilla::PreloaderBase::NotifyStop]

The code in question is dealing with multipart images, which are sent using an unusual type of channel where multiple images are sent in sequence in one channel and the new image is intended to replace the previous image. We get a "stop" for each image, so it's not too surprising that when we try to set the stop status we find that it has already been set. But we obviously don't hit this problem in the common case with multipart images for some reason.
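
A minimal sketch of the failure mode described above, for illustration only. It is not the Gecko code: `SketchPreloader`, `Status`, and both `NotifyStop*` variants are hypothetical names, and `std::optional` stands in for `mozilla::Maybe`. Unlike `Maybe::emplace`, which release-asserts `!isSome()`, `std::optional::emplace` would silently replace the old value, so the strict variant checks explicitly. The tolerant variant shows one possible way to survive a second "stop"; it is an assumption, not the patch discussed in this bug.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for nsresult; not the real Gecko type.
using Status = int;

// Hypothetical, heavily simplified preloader that records a stop status.
class SketchPreloader {
 public:
  // Strict variant: refuses a second stop, mirroring the spirit of
  // MOZ_RELEASE_ASSERT(!isSome()) by throwing instead of crashing.
  void NotifyStopStrict(Status aStatus) {
    if (mStopStatus.has_value()) {
      throw std::logic_error("stop status already set");
    }
    mStopStatus.emplace(aStatus);
  }

  // Tolerant variant (assumption): keeps the first status and ignores
  // later stops. Returns true if this call recorded the status.
  bool NotifyStopTolerant(Status aStatus) {
    if (mStopStatus.has_value()) {
      return false;
    }
    mStopStatus.emplace(aStatus);
    return true;
  }

  const std::optional<Status>& StopStatus() const { return mStopStatus; }

 private:
  std::optional<Status> mStopStatus;
};

// Simulates two parts of a multipart image each delivering a "stop" to the
// same preloader. Returns true if the strict variant rejected the second one.
inline bool StrictThrowsOnSecondStop() {
  SketchPreloader preloader;
  preloader.NotifyStopStrict(0);  // stop for the first part
  try {
    preloader.NotifyStopStrict(0);  // stop for the second part
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}
```

In this sketch the second stop is what trips the check, which matches the observation that one "stop" arrives per image in the channel.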

I put a fatal assert in some basic multipart image code and then tried to use the two linked websites, even creating a login, but I was not able to find a page that had a multipart image in it. I don't understand these websites very much, so I was just clicking around on what seemed like it would be the main parts.

Too late for 108 at this point, but still keeping an eye on this for 109.

Crash Signature: [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] [@ mozilla::Maybe<T>::emplace<T> | mozilla::PreloaderBase::NotifyStop] → [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop] [@ mozilla::Maybe<T>::emplace<T> | mozilla::PreloaderBase::NotifyStop] [@ mozilla::PreloaderBase::NotifyStop ]

I still haven't been able to reproduce. I looked into this pretty deeply and I think I have a patch which will fix the crash. But since this code is complicated and kind of fiddly, I want higher confidence that the patch (a) won't make things worse (breaking cases that currently work, or causing a worse crash in the broken cases), and (b) actually fixes the crash rather than just making us crash in a different/later place after entering a situation our code isn't prepared to handle. Now that I understand the condition better, I have an idea for how to reproduce by modifying an existing test for multipart images.

Uplifting this patch to beta less than a week before release does not make me comfortable.

If anyone understands those ham radio sites better and can get a multipart image to show up there, that would be the most helpful. A multipart image will usually look like a live video with a low frame rate (i.e. live webcams from a decade ago).

Severity: -- → S2
Priority: -- → P3

We're not going to block the 109.0 release on this, but we'll keep an eye on it for possible inclusion in the planned 109 dot release at the end of the month if the risk-reward looks good.

This is a reminder regarding comment #2!

The bug is marked as tracked for firefox110 (nightly). We have limited time to fix this, the soft freeze is today. However, the bug still isn't assigned and has low priority.

Assignee: nobody → tnikkel

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 AArch64 and ARM crashes on beta

Keywords: topcrash

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #0)

When users shared the URL of the crashing page, it was either https://www.wrtc2022.it/en/wrtc-2023-award-31.asp or https://www.qrz.com/db/ followed by a set of characters. Both appear to be built on the same framework for amateur radio websites.

Could you share with me privately (use my email) some of the urls on qrz.com that the crashes happen on?

Flags: needinfo?(aryx.bugmail)

I've actually been making progress on reproducing. At first I thought the stack was impossible, but I found a way to hit that exact stack, except that the Maybe value is none, so we don't hit the assert. If I could figure out a way to hit the same stack twice for the same image then I would have a reproduction. I just have to figure out why I'm not hitting the stack a second time.

Got some urls from someone else with access.

Flags: needinfo?(aryx.bugmail)
Flags: needinfo?(bhood)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

Keywords: topcrash

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

Severity: S2 → S3
Flags: needinfo?(aosmond)

:tnikkel this is marked as tracking for 110 and 111 as it's a recent crash.
It's looking likely that it's too late for 110, but I'm wondering whether we can expect any further investigation in time for 111.

Flags: needinfo?(tnikkel)

The number of crashes has decreased significantly, to around 10x less than at the peak. I hope to land an investigation patch, but can't make any promises on the timeline of that due to other work.

Flags: needinfo?(tnikkel)
Duplicate of this bug: 1862059

FYI, bug 1862059 contains a reliable testcase to hit this crash.

Tim, when you get some cycles, could you check the test case Ryan mentions above and see if that helps identify the issue?

Flags: needinfo?(tnikkel)

I've debugged the testcase and I have a patch. However, I think that the testcase in bug 1862059 is not the same issue as the crashes we have been seeing in the wild under this signature. This crash signature is strongly correlated with websites that use multipart JPEG images for streaming webcams. The testcase of bug 1862059 uses a multipart SVG image, and the issue is specific to SVG images and does not happen with raster (JPEG) images. So I will probably re-open that bug and land the patch there. If they turn out to be the same issue then we can close this, not a problem. I think this new failure mode gives me a few new ideas for how this crash might be happening.

No longer duplicate of this bug: 1862059
See Also: → 1862059

Hi, developer of hamaward.cloud here.
I think the cause of the crash is the image in the top right corner that counts the number of viewers on the page.

Some links where it crashes most of the time:
https://hamaward.cloud/aw321?iframe=1&nojs=&activator_call=II1WWA&awdwlbt=0
https://hamaward.cloud/aw314?iframe=1&nojs=1&tab=4&activator_call=IB2BGBS&callsign=IK2UCL&score=1&score_name=italian&awdwlbt=0

Maybe you didn't find the crash after some time because once the award has ended, the counter in the top right corner is no longer streamed as multipart.

I worked around the bug by sending the svg frames more slowly.

When the page had just finished loading, I was sending 2 SVGs without any sleep between them, because some browsers show the penultimate frame; if I waited 3 or 4 seconds instead, I had an empty image during that time.
Now I have added a 200ms sleep between the first two frames and the page is not crashing any more.
(Tested with 50ms and 100ms and it was still crashing.)
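
The pacing workaround described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the hamaward.cloud server code: `FormatSvgPart`, `SendPaced`, the boundary string, and the per-part `Content-Length` header are all hypothetical choices, and the sketch applies the gap between every pair of frames rather than only between the first two. The write and sleep steps are passed in as callbacks so the logic can be exercised without a socket; a real server might pass `std::this_thread::sleep_for` as the sleep step.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Millis = std::chrono::milliseconds;

// Formats one SVG frame as a part of a multipart/x-mixed-replace body.
// The boundary must match the one announced in the response's
// Content-Type header (not shown here).
std::string FormatSvgPart(const std::string& boundary, const std::string& svg) {
  std::string part;
  part += "--" + boundary + "\r\n";
  part += "Content-Type: image/svg+xml\r\n";
  part += "Content-Length: " + std::to_string(svg.size()) + "\r\n";
  part += "\r\n";
  part += svg;
  part += "\r\n";
  return part;
}

// Sends frames in order, waiting at least minGap before every frame except
// the first, so that two parts are never written back to back.
void SendPaced(const std::vector<std::string>& frames,
               const std::string& boundary, Millis minGap,
               const std::function<void(const std::string&)>& write,
               const std::function<void(Millis)>& sleep) {
  for (std::size_t i = 0; i < frames.size(); ++i) {
    if (i > 0) {
      sleep(minGap);
    }
    write(FormatSvgPart(boundary, frames[i]));
  }
}
```

With a 200ms gap this reproduces the spacing reported to avoid the crash; it works around the timing on the server side and does not address the underlying browser bug.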

In any case I would like to help fix this bug in firefox, if necessary I can host a page that reproduces the bug.

If you're sending svg-s it might be the same as bug 1862059 instead of this one.

Oh yeah if you put svg's into multipart images then it is most likely bug 1862059. I didn't think anyone did that out in the wild. Do all browsers support svg in multipart? It's a feature that we were wondering if we even needed to support.

Bug 1862059 was blocked on some testing issues that I was sorting out, but I can hack around that and get it landed.

Yes, I'm pretty sure it is the same bug. I just tested the attached PoC: if I add some time.sleep in the while loop it becomes a bit more stable.

(In reply to Timothy Nikkel (:tnikkel) from comment #23)

Do all browsers support svg in multipart?

Chrome, Safari, Edge and Firefox, yes.
Some browsers also support PNG (Safari).

It's a feature that we were wondering if we even needed to support.

Well, in the context of an iframe with JS disabled, this is one of the few ways to make the content dynamic.
And streaming SVG instead of MJPEG makes it less network heavy.

Bug 1862059 was blocked on some testing issues that I was sorting out, but I can hack around that and get it landed.

If you need any help let me know

Thanks for posting that info. I landed bug 1862059, it should be in version 123 on Feb 20.
