Crash in [@ mozilla::Maybe<T>::emplace | mozilla::PreloaderBase::NotifyStop]
Categories
(Core :: Graphics: ImageLib, defect, P3)
Tracking
()
People
(Reporter: aryx, Assigned: tnikkel, NeedInfo)
References
Details
(Keywords: crash)
Crash Data
There had been 5 crashes before 2023-01-01 with this signature, but since this year started 151 such crashes have been reported - all on Windows.
If the users shared the url of the crashing page, it was either https://www.wrtc2022.it/en/wrtc-2023-award-31.asp or https://www.qrz.com/db/
followed by a set of characters. Both look like they are featured by the same framework for an amateur radio website.
Crash report: https://crash-stats.mozilla.org/report/index/011b29a5-14ce-48d3-846d-932cc0230102
MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(!isSome())
Top 10 frames of crashing thread:
0 xul.dll mozilla::Maybe<nsresult>::emplace mfbt/Maybe.h:844
0 xul.dll mozilla::PreloaderBase::NotifyStop uriloader/preload/PreloaderBase.cpp:242
1 xul.dll imgRequestProxy::OnLoadComplete image/imgRequestProxy.cpp:1082
2 xul.dll mozilla::image::SyncNotifyInternal<const mozilla::image::ObserverTable*>::<lambda_7>::operator const image/ProgressTracker.cpp:356
2 xul.dll mozilla::image::ImageObserverNotifier<const mozilla::image::ObserverTable*>::operator image/ProgressTracker.cpp:286
2 xul.dll mozilla::image::SyncNotifyInternal image/ProgressTracker.cpp:355
2 xul.dll mozilla::image::ProgressTracker::SyncNotifyProgress::<lambda_2>::operator const image/ProgressTracker.cpp:374
2 xul.dll mozilla::image::CopyOnWrite<mozilla::image::ObserverTable>::Read const image/CopyOnWrite.h:155
2 xul.dll mozilla::image::ProgressTracker::SyncNotifyProgress image/ProgressTracker.cpp:380
3 xul.dll mozilla::image::MultipartImage::OnLoadComplete image/MultipartImage.cpp:288
Reporter | ||
Comment 1•1 year ago
Andrew, could you check why this started to crash? No success with reproduction attempts here.
Comment 2•1 year ago
The bug is marked as tracked for firefox108 (release). We have limited time to fix this, the soft freeze is in 6 days. However, the bug still isn't assigned.
:bhood, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 3•1 year ago
The code in question is dealing with multipart images, which are sent using an unusual type of channel where multiple images are sent in sequence in one channel and the new image is intended to replace the previous image. We get a "stop" for each image, so it's not too surprising that when we try to set the stop status we find that it has already been set. But we obviously don't hit this problem in the common case with multipart images for some reason.
I put a fatal assert in some basic multipart image code and then tried to use the two linked websites, even creating a login, but I was not able to find a page that had a multipart image in it. I don't understand these websites very much, so I was just clicking around on what seemed like it would be the main parts.
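The double-stop scenario described in this comment can be sketched in isolation. This is a minimal model, not the Firefox code: `StrictMaybe`, `ToyPreloader`, and `DeliverParts` are hypothetical names, the status is a plain `int` rather than `nsresult`, and the fatal `MOZ_RELEASE_ASSERT(!isSome())` is modeled as an exception so the behavior can be observed without aborting.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for a Maybe-like slot whose emplace() requires the
// slot to be empty. The real code asserts fatally; this sketch throws.
template <typename T>
class StrictMaybe {
 public:
  bool isSome() const { return mValue.has_value(); }

  template <typename... Args>
  void emplace(Args&&... aArgs) {
    if (isSome()) {
      throw std::logic_error("emplace() called while isSome()");
    }
    mValue.emplace(std::forward<Args>(aArgs)...);
  }

 private:
  std::optional<T> mValue;
};

// Hypothetical, heavily simplified preloader that records one stop status.
struct ToyPreloader {
  StrictMaybe<int> mOnStopStatus;
  void NotifyStop(int aStatus) { mOnStopStatus.emplace(aStatus); }
};

// Simulates a channel that delivers aParts images in sequence and sends one
// "stop" per part to the same preloader. Returns true if every stop was
// accepted, false if a repeated stop tripped the emptiness check.
bool DeliverParts(ToyPreloader& aPreloader, int aParts) {
  assert(aParts >= 0);
  try {
    for (int i = 0; i < aParts; ++i) {
      aPreloader.NotifyStop(0);
    }
  } catch (const std::logic_error&) {
    return false;
  }
  return true;
}
```

In this model a single-part load succeeds, while a second "stop" on the same preloader fails the check, which is the shape of the assertion failure in the crash report. It does not explain why ordinary multipart images avoid the problem.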
Comment 4•1 year ago
Too late for 108 at this point, but still keeping an eye on this for 109.
Assignee | ||
Comment 5•1 year ago
I still haven't been able to reproduce. I looked into this pretty deeply and I think I have a patch that will fix the crash, but since this code is complicated and kind of fiddly, I want higher confidence that the patch (a) won't make things worse (breaking cases that currently work, or causing a worse crash in the broken cases), and (b) will actually fix the crash rather than just make us crash in a different or later place after entering a situation our code isn't prepared to handle. Now that I understand the condition better, I have an idea for how to reproduce it by modifying an existing test for multipart images.
Uplifting this patch to beta less than a week before release does not make me comfortable.
If anyone understands those ham radio sites better and can get a multipart image to show up there, that would be the most helpful. A multipart image will usually look like a live video with a low frame rate (i.e. live webcams from a decade ago).
Comment 6•1 year ago
We're not going to block the 109.0 release on this, but we'll keep an eye on it for possible inclusion in the planned 109 dot release at the end of the month if the risk-reward looks good.
Comment 7•1 year ago
This is a reminder regarding comment #2!
The bug is marked as tracked for firefox110 (nightly). We have limited time to fix this, the soft freeze is today. However, the bug still isn't assigned and has low priority.
Comment 8•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 10 AArch64 and ARM crashes on beta
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 9•1 year ago
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #0)
> If the users shared the url of the crashing page, it was either https://www.wrtc2022.it/en/wrtc-2023-award-31.asp or
> https://www.qrz.com/db/
> followed by a set of characters. Both look like they are featured by the same framework for an amateur radio website.
Could you share with me privately (use my email) some of the urls on qrz.com that the crashes happen on?
Assignee | ||
Comment 10•1 year ago
I've actually been making progress on reproducing. At first I thought the stack was impossible, but I found a way to hit that exact stack, except that the Maybe value is none, so we don't hit the assert. If I could figure out a way to hit the same stack twice for the same image, I would have a reproduction; I just have to figure out why I'm not hitting the stack a second time.
Assignee | ||
Comment 11•1 year ago
Got some urls from someone else with access.
Comment 12•1 year ago
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 13•1 year ago
Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.
For more information, please visit auto_nag documentation.
Comment 14•1 year ago
:tnikkel this is marked as tracking for 110 and 111 as it's a recent crash.
It's looking likely that it's too late for 110, but wondering if we can expect any further investigation in time for 111
Assignee | ||
Comment 15•1 year ago
The number of crashes has decreased significantly, to around 10x less than at the peak. I hope to land an investigation patch, but can't make any promises about the timeline of that due to other work.
Comment 17•6 months ago
FYI, bug 1862059 contains a reliable testcase to hit this crash.
Comment 18•6 months ago
Tim, when you get some cycles, could you check the test case Ryan mentions above and see if that helps identify the issue?
Assignee | ||
Comment 19•6 months ago
I've debugged the testcase and I have a patch. However, I think that the testcase in bug 1862059 is not the same issue as the crashes we have been seeing in the wild under this signature. This crash signature is strongly correlated with websites that use multipart JPEG images to stream webcams. The testcase in bug 1862059 uses a multipart SVG image, and the issue is specific to SVG images and does not happen with raster (JPEG) images. So I will probably re-open that bug and land the patch there. If they turn out to be the same issue then we can close this one, not a problem. I think this new failure mode gives me a few new ideas for how this crash might be happening.
Comment 20•4 months ago
Hi, developer of hamaward.cloud here.
I think the cause of the crash is the image in the top right corner that counts the number of viewers on the page.
Some links where it crashes most of the time:
https://hamaward.cloud/aw321?iframe=1&nojs=&activator_call=II1WWA&awdwlbt=0
https://hamaward.cloud/aw314?iframe=1&nojs=1&tab=4&activator_call=IB2BGBS&callsign=IK2UCL&score=1&score_name=italian&awdwlbt=0
Maybe you didn't find the crash after some time because, after the award has ended, the counter in the top right corner is no longer streamed as multipart.
Comment 21•4 months ago
I worked around the bug by sending the SVG frames more slowly.
When the page had just finished loading, I was sending 2 SVGs without any sleep between them, because some browsers show the penultimate frame, so if I waited 3 or 4 seconds, I had an empty image during that time.
Now I added a 200ms sleep between the first two frames and the page is not crashing any more.
(Tested with 50ms and 100ms and it was still crashing.)
In any case I would like to help fix this bug in Firefox; if necessary I can host a page that reproduces the bug.
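The pacing workaround from this comment can be sketched as follows. It is only an illustration under stated assumptions: the boundary token `frame`, the helper names `ResponseContentType`, `BuildPart`, and `StreamFrames`, and the callback-based sink are all hypothetical, and the sketch pauses between every pair of frames, whereas the comment only describes a pause between the first two. The site's actual server code is not shown in this bug.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Boundary token separating parts; the value is arbitrary for this sketch.
const std::string kBoundary = "frame";

// Value for the HTTP response's Content-Type header.
std::string ResponseContentType() {
  return "multipart/x-mixed-replace; boundary=" + kBoundary;
}

// Wraps one SVG document as a single part of the multipart body.
std::string BuildPart(const std::string& aSvg) {
  return "--" + kBoundary + "\r\n" +
         "Content-Type: image/svg+xml\r\n" +
         "Content-Length: " + std::to_string(aSvg.size()) + "\r\n\r\n" +
         aSvg + "\r\n";
}

// Writes each frame through aSink, sleeping aGap before every frame except
// the first, so consecutive parts are never sent back to back.
void StreamFrames(const std::vector<std::string>& aFrames,
                  std::chrono::milliseconds aGap,
                  const std::function<void(const std::string&)>& aSink) {
  assert(aGap.count() >= 0);
  for (std::size_t i = 0; i < aFrames.size(); ++i) {
    if (i > 0) {
      std::this_thread::sleep_for(aGap);
    }
    aSink(BuildPart(aFrames[i]));
  }
}
```

A server would pass a sink that writes to the open connection and a gap such as the 200ms reported to avoid the crash here. This only sidesteps the timing that triggers the browser bug; it is not a fix.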
Comment 22•4 months ago
If you're sending SVGs, it might be the same as bug 1862059 instead of this one.
Assignee | ||
Comment 23•4 months ago
Oh yeah if you put svg's into multipart images then it is most likely bug 1862059. I didn't think anyone did that out in the wild. Do all browsers support svg in multipart? It's a feature that we were wondering if we even needed to support.
Bug 1862059 was blocked on some testing issues that I was sorting out, but I can hack around that and get it landed.
Comment 24•4 months ago
Yes, I'm pretty sure it is the same bug. I just tested the attached PoC; if I add some time.sleep in the while loop, it becomes a bit more stable.
Comment 25•4 months ago
(In reply to Timothy Nikkel (:tnikkel) from comment #23)
> Do all browsers support svg in multipart?
Chrome, Safari, Edge, and Firefox do.
Some browsers also support PNG (Safari).
> It's a feature that we were wondering if we even needed to support.
Well, in the context of an iframe with JS disabled, this is one of the few ways to make the content dynamic.
And streaming SVG instead of MJPEG makes it less network-heavy.
> Bug 1862059 was blocked on some testing issues that I was sorting out, but I can hack around that and get it landed.
If you need any help, let me know.
Assignee | ||
Comment 26•4 months ago
Thanks for posting that info. I landed bug 1862059, it should be in version 123 on Feb 20.