Closed Bug 1729539 Opened 4 years ago Closed 4 years ago

Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392

Categories

(Core :: Graphics: ImageLib, defect)

Tracking

()

VERIFIED FIXED
94 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox92 --- disabled
firefox93 --- fixed
firefox94 --- verified

People

(Reporter: tsmith, Assigned: jbauman)

References

(Regression)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed])

Crash Data

Attachments

(2 files)

Found while fuzzing m-c 20210907-eac402936496 (--enable-debug --enable-fuzzing)

Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392

#0 0x7f3c91f6a0e5 in MOZ_Crash /builds/worker/workspace/obj-build/dist/include/mozilla/Assertions.h:256:3
#1 0x7f3c91f6a0e5 in RustMozCrash src/mozglue/static/rust/wrappers.cpp:18:3
#2 0x7f3c91f6a064 in mozglue_static::panic_hook::h63b3c2e6144e67e9 src/mozglue/static/rust/lib.rs:91:9
#3 0x7f3c91f69adb in core::ops::function::Fn::call::h0d4763c52fdc30fd /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/ops/function.rs:70:5
#4 0x7f3c92d35218 in std::panicking::rust_panic_with_hook::h7ee9e1a2d0f8975a /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:626:17
#5 0x7f3c92d34c96 in std::panicking::begin_panic_handler::_$u7b$$u7b$closure$u7d$$u7d$::h8ab3b4491718b2c7 /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:517:13
#6 0x7f3c92d3103b in std::sys_common::backtrace::__rust_end_short_backtrace::hd489062ffa586a9f /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/sys_common/backtrace.rs:141:18
#7 0x7f3c92d34c28 in rust_begin_unwind /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/std/src/panicking.rs:515:5
#8 0x7f3c88f10070 in core::panicking::panic_fmt::hca6330e3e14086b4 /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:92:14
#9 0x7f3c88f0ffbc in core::panicking::panic::h1a48d878ff3dcd40 /rustc/a178d0322ce20e33eac124758e837cbd80a6f633/library/core/src/panicking.rs:50:5
#10 0x7f3c91010d6f in _$LT$qcms..iccread..curveType$u20$as$u20$core..convert..From$LT$qcms..iccread..TransferCharacteristics$GT$$GT$::from::_$u7b$$u7b$closure$u7d$$u7d$::ha817a1d3ccc3bdfb src/gfx/qcms/src/iccread.rs:1392:25
#11 0x7f3c91010d6f in qcms::iccread::build_trc_table::he6f51c21bd9a977f src/gfx/qcms/src/iccread.rs:920:22
#12 0x7f3c91010d6f in _$LT$qcms..iccread..curveType$u20$as$u20$core..convert..From$LT$qcms..iccread..TransferCharacteristics$GT$$GT$::from::hff68029ca57bb146 src/gfx/qcms/src/iccread.rs:1385:29
#13 0x7f3c910117ea in qcms::iccread::Profile::new_cicp::h90bf7e8ae50a442d src/gfx/qcms/src/iccread.rs:1545:21
#14 0x7f3c91003f59 in qcms_profile_create_cicp src/gfx/qcms/src/c_bindings.rs:71:5
#15 0x7f3c8aba1959 in mozilla::image::nsAVIFDecoder::Decode(mozilla::image::SourceBufferIterator&, mozilla::image::IResumable*) src/image/decoders/nsAVIFDecoder.cpp:1433:20
#16 0x7f3c8aba0058 in mozilla::image::nsAVIFDecoder::DoDecode(mozilla::image::SourceBufferIterator&, mozilla::image::IResumable*) src/image/decoders/nsAVIFDecoder.cpp:1144:25
#17 0x7f3c8aade2e7 in mozilla::image::Decoder::Decode(mozilla::image::IResumable*) src/image/Decoder.cpp:177:19
#18 0x7f3c8aae6bad in mozilla::image::DecodedSurfaceProvider::Run() src/image/DecodedSurfaceProvider.cpp:123:34
#19 0x7f3c8ab01533 in mozilla::image::DecodingTask::Run() src/image/DecodePool.cpp:146:12
#20 0x7f3c89100ddd in mozilla::TaskController::RunPoolThread() src/xpcom/threads/TaskController.cpp:287:33
#21 0x7f3c9f743957 in _pt_root src/nsprpub/pr/src/pthreads/ptthread.c:201:5
#22 0x7f3ca04bf608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8
#23 0x7f3ca0087292 in clone /build/glibc-eX1tMB/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Attached file testcase.avif
Attachment #9239903 - Attachment mime type: image/avif → application/octet-stream
Crash Signature: [@ qcms::iccread::{{impl}}::from ]
Flags: in-testsuite?
Keywords: testcase

A Pernosco session is available here: https://pernos.co/debug/uLyo26cfwLdJMr4q5BkJNw/index.html

Keywords: bugmon
Flags: needinfo?(jbauman)

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20210908032417-a4d2ca53b2a4.
The bug appears to have been introduced in the following build range:

Start: 8eb9e75580b68837ee7c91d1c03c218521f51b34 (20210805151303)
End: 4aa3a54f7d7202bf6868a76b10e4669b56ed3c35 (20210805165711)
Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=8eb9e75580b68837ee7c91d1c03c218521f51b34&tochange=4aa3a54f7d7202bf6868a76b10e4669b56ed3c35

Whiteboard: [bugmon:bisected,confirmed]

Ok, I see what's going on. Will have a patch up shortly

Flags: needinfo?(jbauman)

The assertion is due to an inappropriate test of exact floating-point values.
build_trc_table() handles this saturating case, so there's no need to assert.
Add more test coverage to be certain no fuzzing inputs will lead to crashes.
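
As a rough illustration of why an exact range assertion on a computed float is fragile, here is a minimal sketch. It assumes the asserted value is the output of the HLG inverse OETF, which matches the 1/12 threshold in the assertion. The function names are hypothetical and this is not the qcms code; the constants are the published HLG values from ITU-R BT.2100. Because those constants are rounded, the result at the endpoints can land a hair outside the ideal range, so saturating is safer than asserting.

```rust
// Hypothetical sketch, not the qcms implementation.
// Constants are the HLG values published in ITU-R BT.2100.
const A: f64 = 0.17883277;
const B: f64 = 0.28466892;
const C: f64 = 0.55991073;

/// Inverse HLG OETF: maps a non-linear signal in [0, 1] to scene-linear light.
/// Signals up to 0.5 use the square-law segment, the rest use the exponential one.
fn hlg_inverse_oetf(signal: f64) -> f64 {
    if signal <= 0.5 {
        signal * signal / 3.0
    } else {
        (((signal - C) / A).exp() + B) / 12.0
    }
}

/// Saturate the result into [0, 1] instead of asserting that it is already
/// inside that range; rounding may push it slightly past either bound.
fn hlg_inverse_oetf_saturated(signal: f64) -> f64 {
    hlg_inverse_oetf(signal).clamp(0.0, 1.0)
}
```

Calling `hlg_inverse_oetf_saturated(1.0)` always yields a value in [0, 1], whereas the raw function is only guaranteed to be close to 1.0 there.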

Assignee: nobody → jbauman
Status: NEW → ASSIGNED

:jbauman, since this bug contains a bisection range, could you fill (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jbauman)
Flags: needinfo?(jbauman)
Regressed by: 1723253
Has Regression Range: --- → yes
Pushed by jbauman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/db3338456ae1 Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel

Backed out for causing multiple failures

Flags: needinfo?(jbauman)

Ok, I think I see the issue

It's frustrating that mach try auto didn't catch this: https://treeherder.mozilla.org/jobs?repo=try&revision=f668f74164d6cc6be378bdf18e6756b5b4d5132d&selectedTaskRun=PrMv6LymQTKyvw6VU_6ZoA.0

Though my other try chooser run did hit it and I failed to catch it. Should be a straightforward fix.

Flags: needinfo?(jbauman)
Pushed by jbauman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bd23fb0c95cc Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel
Backout by mlaza@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/47983d533665 Backed out changeset bd23fb0c95cc for causing web platform failures.

Backed out changeset bd23fb0c95cc (Bug 1729539) for causing web platform failures.
Backout link
Push with failures WdH1
Failure Log

Flags: needinfo?(jbauman)

I don't see how this error is related to https://hg.mozilla.org/integration/autoland/rev/bd23fb0c95cc, can you help?

Flags: needinfo?(jbauman) → needinfo?(mlaza)

Based on multiple green runs of WdH1 in my pre-land try, I don't think this failure is being caused by bd23fb0c95cc.

The commit before mine has a lot of errors too, but doesn't look like it has the process exiting which is causing my commit to be flagged:

Additionally the changes in that revision seem far more related to the issue being seen. Given all that, I'd like to re-land unless you have objections.

Flags: needinfo?(mlaza)

Bug 1730234 also seems like it might be relevant, and predates my landing by several days

Henrik, do you have any insight why the changes in this bug started a permanent wdspec failure?

Jon, could you reland after 12pm PDT - this will prevent potential conflicts with the next merge candidate for Nightly. Thank you.

Flags: needinfo?(mlaza) → needinfo?(hskupin)

Jon, could you reland after 12pm PDT - this will prevent potential conflicts with the next merge candidate for Nightly. Thank you.

Will do! Thanks for the guidance

I had a look and the problem here is actually a different failure. The ones mentioned above are just log output, and are expected due to an expected failing test.

So here are the actual failing lines:
https://treeherder.mozilla.org/logviewer?job_id=351587981&repo=autoland&lineNumber=39967-39978

Something is forcing Firefox to shut down after the initial browser window (toplevel-window-ready notification) has been opened. We never actually reach the marionette-startup-requested notification, which is sent out by Firefox when all windows have had their session fully restored.

Flags: needinfo?(hskupin)
Pushed by jbauman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d629f462a605 Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel

I don't think this is likely to be related to this change. See the WdH2 green in my pre-landing try. Is there something that implicates this particular revision? It wasn't the responsible revision for the last backout.

Flags: needinfo?(jbauman) → needinfo?(abutkovits)
Flags: needinfo?(abutkovits) → needinfo?(aryx.bugmail)

The Try push uses a base from Sept 8; might anything have regressed since then? Comment 20 might indicate a crash.

Is there something that implicates this particular revision? It wasn't the responsible revision for the last backout.

I partially fail to parse the question. Each time the https://phabricator.services.mozilla.com/D125006 got backed out. Both WdH1 and WdH2 failed on macOS. The failing tasks didn't run initially for the pushes of bug 1729539 but do so for every 10th push. The code sheriffs added them to the previous pushes to identify with which push they started. The revision of that 10th push gets appended to the task name to indicate the tasks contain the same set of tests.

Flags: needinfo?(aryx.bugmail)

After running a lot of try runs, I'm becoming more and more convinced that while this change triggers a fault, it's not due to a fault in this code. Instead, I think that it's perturbing something that is making an otherwise rare or impossible fault condition elsewhere consistent. Unfortunately, I don't understand the test which is failing well enough to effectively investigate, so I'm at a bit of a loss for what to do. I'm going to continue breaking down my change and tweaking it to see if I can get it into a form which doesn't trigger the fault anymore, but I am concerned the real issue here is not getting investigated.

The change in D125006 is pretty small to begin with, but slicing it into orthogonal parts as finely as will compile/function, I've got (working all from the same base revision) this sequence of changes gradually building up to the original failing code:

  1. No changes: 🆗
  2. Remove just the assert line: 🆗
  3. Fully remove the assert (eliminating an unnecessary variable binding): 🆗
  4. derive Debug for qcms_CIE_xyY, qcms_CIE_xyYTRIPLE: 🆗
  5. Change Try to TryFrom<ColourPrimaries> for qcms_CIE_xyYTRIPLE: 🆗
  6. Change white_point to return a Result: ❌
  7. Change Try to TryFrom<TransferCharacteristics> for curveType: ❌
  8. Update rust unit tests: ❌

Based on that, it seems like the problem is step 6, the addition of the white_point change. But I tried another run just removing that code and leaving the rest (note that all 8 steps are pretty orthogonal):

  1. remove white_point changes: ❌

and continuing to remove other code that was failing:

  1. remove white_point and TryFrom<ColourPrimaries> changes: 🆗

I've run all the jobs at least twice and results are consistent. But I can't see any reason for the behavior. I'll keep trying some things, but so far my best guess is that this is causing a very specific perturbation in unrelated code. Note that I also can't reproduce this error locally despite being on a macOS system (though I'm using 10.14 SDK, so maybe it's worth changing to 10.14 10.15 to be more like the try runners).
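
For readers unfamiliar with the kind of change listed in steps 5 to 7, here is a minimal sketch of a fallible conversion expressed with `TryFrom`, so that unsupported inputs yield an `Err` instead of a panic. Every type and variant name below is a hypothetical stand-in; none of this is the actual qcms API or the content of D125006.

```rust
use std::convert::TryFrom;

// Hypothetical stand-ins for a transfer-characteristics enum and a curve type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Transfer {
    Linear,
    Gamma22,
    Reserved,
}

#[derive(Debug, PartialEq)]
struct Curve {
    gamma: f32,
}

/// Error carrying the value that could not be converted.
#[derive(Debug, PartialEq)]
struct Unsupported(Transfer);

impl TryFrom<Transfer> for Curve {
    type Error = Unsupported;

    // Known values map to a curve; anything else is reported to the caller
    // rather than hitting an assertion or unreachable!() deep in the decoder.
    fn try_from(t: Transfer) -> Result<Self, Self::Error> {
        match t {
            Transfer::Linear => Ok(Curve { gamma: 1.0 }),
            Transfer::Gamma22 => Ok(Curve { gamma: 2.2 }),
            other => Err(Unsupported(other)),
        }
    }
}
```

A caller can then decide what to do with `Curve::try_from(Transfer::Reserved)`, for example falling back to a default profile, which is the general motivation for preferring a `Result` on fuzzable input.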

(In reply to Jon Bauman [:jbauman:] from comment #25)

  1. Change white_point to return a Result: ❌

From what I can see so far, it's only Wd2 in headless mode which is permanently failing on MacOS 10.15 (not 10.14 as you said above). I tried yesterday to reproduce it locally too, but it's not failing on MacOS 11.5.2. Maybe it's something specific for 10.15?

Nevertheless I used the above try build to trigger some more jobs for that particular platform, just to see if other Wd2 jobs are failing too. The variations that I used are Fission and non-headless. As can be seen, only the headless tests are failing.

Given that your changes are graphics related I wonder if we could run a try build with some GFX logging via MOZ_LOG enabled? But not sure which type of logs the graphics component supports, and if that also includes Rust components.

Let's have a look at the differences...

Here the logs from a working test:

[task 2021-09-16T19:07:55.558Z] 19:07:55     INFO - PID 1899 | 1631819275557	Marionette	INFO	Marionette enabled
[task 2021-09-16T19:07:55.611Z] 19:07:55     INFO - PID 1899 | 1631819275609	Marionette	TRACE	Received observer notification toplevel-window-ready
[task 2021-09-16T19:07:55.616Z] 19:07:55     INFO - PID 1899 | 1631819275615	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:55.624Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker ClientManagerService: start destroying IPC actors early for phase xpcom-will-shutdown
[task 2021-09-16T19:07:55.633Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker Flush WebExtension StartupCache for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:55.672Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:55.706Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker MediaShutdownManager: shutdown for phase profile-before-change
[task 2021-09-16T19:07:55.718Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:55.744Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker ServiceWorkerShutdownBlocker: shutting down Service Workers for phase profile-change-teardown
[task 2021-09-16T19:07:55.799Z] 19:07:55     INFO - PID 1899 | 1631819275798	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:55.868Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.
[task 2021-09-16T19:07:55.873Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker ContentParent: id=142b83800 for phase xpcom-will-shutdown
[task 2021-09-16T19:07:55.874Z] 19:07:55     INFO - PID 1899 | DEBUG: Adding blocker ContentParent: id=142b83800 for phase profile-before-change
[task 2021-09-16T19:07:55.985Z] 19:07:55     INFO - PID 1899 | [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[task 2021-09-16T19:07:56.032Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker PageActions: purging unregistered actions from cache for phase profile-before-change
[task 2021-09-16T19:07:56.038Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Places Clients shutdown for phase profile-change-teardown
[task 2021-09-16T19:07:56.039Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Places Connection shutdown for phase profile-before-change
[task 2021-09-16T19:07:56.040Z] 19:07:56     INFO - PID 1899 | 1631819276039	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:56.047Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Remote Settings profile-before-change for phase profile-before-change
[task 2021-09-16T19:07:56.055Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker DoHController: clear state and remove observers for phase profile-before-change
[task 2021-09-16T19:07:56.207Z] 19:07:56     INFO - PID 1899 | 1631819276206	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:56.286Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Sqlite.jsm shutdown blocker for phase profile-before-change
[task 2021-09-16T19:07:56.287Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker content-prefs.sqlite#0: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
[task 2021-09-16T19:07:56.288Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Closing ContentPrefService2 connection. for phase Sqlite.jsm: wait until all clients have completed their task
[task 2021-09-16T19:07:56.289Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Transaction (0) for phase content-prefs.sqlite#0: waiting for clients
[task 2021-09-16T19:07:56.290Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Transaction (1) for phase content-prefs.sqlite#0: waiting for clients
[task 2021-09-16T19:07:56.290Z] 19:07:56     INFO - PID 1899 | DEBUG: Completed blocker Transaction (0) for phase content-prefs.sqlite#0: waiting for clients
[task 2021-09-16T19:07:56.291Z] 19:07:56     INFO - PID 1899 | DEBUG: Completed blocker Transaction (1) for phase content-prefs.sqlite#0: waiting for clients
[task 2021-09-16T19:07:56.308Z] 19:07:56     INFO - PID 1899 | 1631819276307	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:56.315Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker sanitize.js: Sanitize on shutdown for phase Places Clients shutdown
[task 2021-09-16T19:07:56.381Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Search service: shutting down for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:56.420Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker places.sqlite#1: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
[task 2021-09-16T19:07:56.421Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker Places Expiration: shutdown for phase Places Connection shutdown
[task 2021-09-16T19:07:56.422Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker PlacesUtils wrapped connection closing as part of Places shutdown for phase Places Connection shutdown
[task 2021-09-16T19:07:56.422Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker PlacesUtils wrapped connection must be closed before Sqlite.jsm for phase Sqlite.jsm: wait until all clients have completed their task
[task 2021-09-16T19:07:56.423Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker places.sqlite#1: PlacesExpiration.jsm: setup (0) for phase places.sqlite#1: waiting for clients
[task 2021-09-16T19:07:56.424Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker places.sqlite#0: waiting for shutdown for phase Sqlite.jsm: wait until all connections are closed
[task 2021-09-16T19:07:56.425Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker PlacesUtils read-only connection closing as part of Places shutdown for phase Places Connection shutdown
[task 2021-09-16T19:07:56.426Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker PlacesUtils read-only connection must be closed before Sqlite.jsm for phase Sqlite.jsm: wait until all clients have completed their task
[task 2021-09-16T19:07:56.466Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker ContentParent: id=14f4d9800 for phase xpcom-will-shutdown
[task 2021-09-16T19:07:56.466Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker ContentParent: id=14f4d9800 for phase profile-before-change
[task 2021-09-16T19:07:56.471Z] 19:07:56     INFO - PID 1899 | DEBUG: Completed blocker places.sqlite#1: PlacesExpiration.jsm: setup (0) for phase places.sqlite#1: waiting for clients
[task 2021-09-16T19:07:56.499Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:56.548Z] 19:07:56     INFO - PID 1899 | 1631819276547	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:56.689Z] 19:07:56     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:56.796Z] 19:07:56     INFO - PID 1899 | 1631819276795	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:56.871Z] 19:07:56     INFO - PID 1899 | 1631819276870	Marionette	TRACE	Received observer notification marionette-startup-requested

And here the logs from a failing test where Firefox seems to shut down during startup:

[task 2021-09-16T19:07:54.676Z] 19:07:54     INFO - PID 1899 | 1631819274676	Marionette	INFO	Marionette enabled
[task 2021-09-16T19:07:54.696Z] 19:07:54     INFO - PID 1899 | 1631819274695	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:54.731Z] 19:07:54     INFO - PID 1899 | 1631819274730	Marionette	TRACE	Received observer notification toplevel-window-ready
[task 2021-09-16T19:07:54.744Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker ClientManagerService: start destroying IPC actors early for phase xpcom-will-shutdown
[task 2021-09-16T19:07:54.752Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker Flush WebExtension StartupCache for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:54.791Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:54.826Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker MediaShutdownManager: shutdown for phase profile-before-change
[task 2021-09-16T19:07:54.838Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker JSON store: writing data for phase IOUtils: waiting for profileBeforeChange IO to complete
[task 2021-09-16T19:07:54.865Z] 19:07:54     INFO - PID 1899 | DEBUG: Adding blocker ServiceWorkerShutdownBlocker: shutting down Service Workers for phase profile-change-teardown
[task 2021-09-16T19:07:54.940Z] 19:07:54     INFO - PID 1899 | 1631819274939	geckodriver::marionette	TRACE	Connection refused (os error 61). Retrying in 100ms
[task 2021-09-16T19:07:55.166Z] 19:07:55     INFO - PID 1899 | 1631819275164	geckodriver::browser	DEBUG	Browser process stopped: exit status: 1
[task 2021-09-16T19:07:55.169Z] 19:07:55     INFO - PID 1899 | 1631819275165	webdriver::server	DEBUG	<- 500 Internal Server Error {"value":{"error":"unknown error","message":"Process unexpectedly closed with status 1","stacktrace":""}}
[task 2021-09-16T19:07:55.318Z] 19:07:55     INFO - STDOUT: ERROR

So the difference between both is the following line:

[task 2021-09-16T19:07:55.868Z] 19:07:55 INFO - PID 1899 | DEBUG: Adding blocker GMPProvider for phase AddonManager: Waiting for providers to shut down.

So the GMPProvider for AddonManager is not getting added as blocker for shutdown. So something between that and ServiceWorkerShutdownBlocker is triggering the shutdown. Sadly we do not yet collect minidumps for crashes in wdspec (see bug 1490906), which might have helped here. I'm working towards it but it might still take a bit.
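
The comparison above was done by eye; a small helper along the following lines could do it mechanically. This is only a sketch: the function names are hypothetical, and it assumes each relevant log line contains the text "Adding blocker <name> for phase <phase>" as in the excerpts above. It lists, in order, the blockers that appear in the working log but never in the failing one.

```rust
use std::collections::HashSet;

/// Extracts the blocker name from a line such as
/// "... | DEBUG: Adding blocker GMPProvider for phase AddonManager: ...".
/// Returns None for lines that do not register a blocker.
fn blocker_name(line: &str) -> Option<&str> {
    let marker = "Adding blocker ";
    let start = line.find(marker)? + marker.len();
    let rest = &line[start..];
    let end = rest.find(" for phase ").unwrap_or(rest.len());
    Some(&rest[..end])
}

/// Blockers registered in `working` that never show up in `failing`,
/// in the order they first appear in `working`, without repeats.
fn missing_blockers<'a>(working: &'a str, failing: &str) -> Vec<&'a str> {
    let seen: HashSet<&str> = failing.lines().filter_map(blocker_name).collect();
    let mut missing = Vec::new();
    for name in working.lines().filter_map(blocker_name) {
        if !seen.contains(name) && !missing.contains(&name) {
            missing.push(name);
        }
    }
    missing
}
```

The first entry of the returned list is the first blocker the failing run never reached, which narrows down where startup was interrupted.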

I wonder if Mn jobs would also fail on MacOS when run in headless mode, but it looks like they are not even defined for that platform in the taskcluster configuration. These use Firefox in a similar way and might also trigger the same issue, where we would be able to catch a minidump. Maybe you could add the headless config - marionette-headless to test-platforms.yml and push such a try build?

Flags: needinfo?(jbauman)
Pushed by jbauman@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7a757975d4b6 Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel

tl;dr I've queued for landing the minimal fix (just remove the assert) since I haven't seen that cause a backout-worthy failure in several runs. My hope is to uplift that for 93 since it will prevent a tab crash in the event of input which uses the HLG transfer function. That should be quite rare to nonexistent based on what we've seen so far with beta telemetry, but it's a simple fix.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #26)

From what I can see so far, it's only Wd2 in headless mode which is permanently failing on MacOS 10.15 (not 10.14 as you said above). I tried yesterday to reproduce it locally too, but it's not failing on MacOS 11.5.2. Maybe it's something specific for 10.15?

I interpreted the "OS X 10.15" in the job description to refer to the SDK used for building, not the OS it's actually running on, but I'm not yet very knowledgeable about the CI system generally. I was previously doing my local builds on the 10.14 SDK, but changed to 10.15 (my above comment was a typo) last night in an effort to reproduce. Unfortunately, I still wasn't able to reproduce locally, but it's probably worth trying again after changing my rust toolchain from 1.53 to 1.55 (which it seems most of the try servers are using, at least for macOS builds).

Given that your changes are graphics related I wonder if we could run a try build with some GFX logging via MOZ_LOG enabled? But not sure which type of logs the graphics component supports, and if that also includes Rust components.

I'll take a look at that, but probably in a separate bug I'll open for follow-up to this one.

I wonder if Mn jobs would also fail on MacOS when run in headless mode, but it looks like they are not even defined for that platform in the taskcluster configuration. These use Firefox in a similar way and might also trigger the same issue, where we would be able to catch a minidump. Maybe you could add the headless config - marionette-headless to test-platforms.yml and push such a try build?

Another thing I'll put on the to-do list when I can get to it. Right now my priorities are getting as much AVIF polish and telemetry uplifted for 93.

Flags: needinfo?(jbauman)
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 94 Branch

Comment on attachment 9240184 [details]
Bug 1729539 - Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel,tsmith

Beta/Release Uplift Approval Request

  • User impact if declined: Tab crash on AVIF inputs using HLG transfer functions (currently extremely rare based on telemetry)
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It's merely removing an assert that was ill-conceived to begin with
  • String changes made/needed:
Attachment #9240184 - Flags: approval-mozilla-beta?
See Also: → 1731398

Bugmon Analysis
Verified bug as fixed on rev mozilla-central 20210917215008-186467330eb1.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Status: RESOLVED → VERIFIED
Keywords: bugmon

(In reply to Jon Bauman [:jbauman:] from comment #28)

I interpreted the "OS X 10.15" in the job description to refer to the SDK used for building, not the OS it's actually running on, but I'm not yet very knowledgeable about the CI system generally.

10.15 there means that is the version of macOS those jobs are running on.

Bug 1475652 suggests we are using the 10.11 SDK to build in CI for Intel; I think we are using the 11 SDK to build for Apple silicon.

Comment on attachment 9240184 [details]
Bug 1729539 - Hit MOZ_CRASH(assertion failed: y2 > 1. / 12. && y2 <= 1.) at gfx/qcms/src/iccread.rs:1392. r=jrmuizel,tsmith

Approved for uplift in 93 beta 8, thanks.

Attachment #9240184 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1741934