Closed Bug 1917201 Opened 1 year ago Closed 1 year ago

Crash in [@ mozilla::detail::InvalidArrayIndex_CRASH | nsTArray_Impl<T>::ElementAt | nsWifiMonitor::CallWifiListeners]

Categories

(Core :: DOM: Geolocation, defect)

Desktop
Windows 11
defect

Tracking

()

VERIFIED FIXED
133 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 132+ verified
firefox130 --- unaffected
firefox131 --- wontfix
firefox132 + verified
firefox133 --- verified

People

(Reporter: atrif, Assigned: handyman)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [fidefe-quality-foundation?] )

Crash Data

Attachments

(5 files, 1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/431c368d-c45f-4a28-ae92-138560240906

Reason: EXCEPTION_BREAKPOINT

Top 10 frames:

0  mozglue.dll  MOZ_Crash(char const*, int, char const*)  mfbt/Assertions.h:317
0  mozglue.dll  mozilla::detail::InvalidArrayIndex_CRASH(unsigned long long, unsigned long long)  mfbt/Assertions.cpp:50
1  xul.dll  nsTArray_Impl<WifiListenerHolder, nsTArrayInfallibleAllocator>::ElementAt(uns...  xpcom/ds/nsTArray.h:1207
1  xul.dll  mozilla::ArrayIterator<WifiListenerHolder&, nsTArray_Impl<WifiListenerHolder,...  xpcom/ds/ArrayIterator.h:107
1  xul.dll  nsWifiMonitor::CallWifiListeners(nsTArray<RefPtr<nsIWifiAccessPoint> > const&...  netwerk/wifi/nsWifiMonitor.cpp:384
2  xul.dll  mozilla::detail::RunnableMethodArguments<nsTArray<RefPtr<nsIWifiAccessPoint> ...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/tuple:944
2  xul.dll  std::invoke(mozilla::detail::RunnableMethodArguments<nsTArray<RefPtr<nsIWifiA...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/type_traits:1739
2  xul.dll  std::_Apply_impl(mozilla::detail::RunnableMethodArguments<nsTArray<RefPtr<nsI...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/tuple:1077
2  xul.dll  std::apply(mozilla::detail::RunnableMethodArguments<nsTArray<RefPtr<nsIWifiAc...  /builds/worker/fetches/vs/VC/Tools/MSVC/14.39.33519/include/tuple:1088
2  xul.dll  mozilla::detail::RunnableMethodArguments<nsTArray<RefPtr<nsIWifiAccessPoint> ...  xpcom/threads/nsThreadUtils.h:1083

This was encountered when testing the new Geolocation tests provided by David (QA-2518).

  • 131.0b2

Affected versions

  • 132.0a1 (2024-09-05)
  • 131.0b2

Tested platforms

  • Affected platforms: Windows 11 aarch64 24H2, Windows 11 24h2 - Release Preview (26100.1591)
  • Unaffected platforms: Windows 10, Windows 11 23H2, macOS 12, Ubuntu 22

Preconditions

  • Windows 11 24H2 Release Preview (26100.1591)
  • Geolocation was not used before - make a new Windows user profile and the follow the steps

Steps to reproduce

  1. Open https://developer.mozilla.org/en-US/docs/Web/API/Geolocation_API/Using_the_Geolocation_API#result
  2. Click Show my location button and then Click on Allowbutton.
  3. Click Cancel on Firefox modal Windows.
  4. Click No or Yes on Windows modal Window.

Expected result

  • No crash.

Actual result

  • Firefox crashes.

Regression range

  • Most likely started with bug 1900225 since this is not reproducible with Firefox 130 because the Awaiting Location Permission modal window is not displayed on Firefox 130.0.
  • Marking bug 1900225 since performing an actual regression range is difficult due to the fact that this requires a new windows profile each time. Please let us know if this may not be correct.

Additional notes

  • Attached a screen recording.
  • This does not reproduce on Windows 23H2 because the Windows modal dialog is not displayed.
Flags: needinfo?(davidp99)
Keywords: regression
Regressed by: 1900225
Whiteboard: [fidefe-quality-foundation?]
Attached image crash_geo_step2.gif

Adding a new set of steps. This crash leads to this bugzilla ticket. Please let us know if a different issue is necessary for the following steps:

  1. Open https://developer.mozilla.org/en-US/docs/Web/API/Geolocation_API/Using_the_Geolocation_API#result
  2. Click Show my location button and then Click on Allow button.
  3. Open a new webapge in the tab where the geolocation is requested.
  4. Click the back arrow.
  5. Click Show my location button and click Yes on the Windows location request.
    AR: Firefox crashes. Attaching a screen recording.

Seems the nsIWifiListener is part of an iteration over the array that it removes itself from -- and this invalidates an iterator (also here). My sense is that it would be better to make the nsWifiMonitor collection able to be inserted/removed from while iterating, but I also believe maintaining the insertion order is required, so it's not a trivial change.

Assignee: nobody → davidp99
Flags: needinfo?(davidp99)

:handyman any chance of getting a fix for this for late beta 131? I see you mentioned it is not a trivial change and wondering if we should plan to accept this regression until next cycle as the volume is relatively low.

Flags: needinfo?(davidp99)

Eh, it looks like I spoke too soon in comment 3 -- the fix for this probably isn't just one thing, so I don't think beta 131 is likely. comment 3 would probably fix the crash, but it would not be the right fix.

The details:

First, the STR for this is issue are pretty weird. When Windows presents the one-time dialog that is always-on-top, they require the user to ignore it and (comment 0) dismiss the dimming modal dialog covering the tab in Firefox by pressing cancel or (comment 2) navigate to a new page with the URL bar and then return.

comment 0 and comment 2 aren't totally different issues but they suggest different fixes. For comment 0, the most obvious "fix" is to remove the cancel button in Firefox. The modal would be dismissed when the user makes a selection in the Windows one-time dialog. It was included because it was easier (it's needed for other new geolocation UX) and because the one-time dialog is currently a little flaky (appearing at seemingly random times, unrelated to any geolocation, etc). With the cancel button, the user can circumvent any unexpected behavior, but testing suggests that may not be needed after all.

For comment 2, the cancel isn't needed because the user can navigate to another page. We don't want to modally block Firefox during the one-time dialog (since it doesn't block anything else), so this will remain a possibility. The real issue here is that navigating to another page should be canceling the wifi listener. For example, the "fix" in comment 3 would leave the wifi listener running indefinitely, and accomplishing nothing -- essentially a leak.

I'm not totally certain that these are the right fixes, but comment 3 definitely isn't.

Flags: needinfo?(davidp99)

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit BugBot documentation.

Severity: S2 → S3

The steps in the video aren't as weird as I made them sound in comment 5. In my experiments, the VM screen was small and the system dialog covered the parts of Firefox that I needed to interact with to cause the bug (and the system dialog, while not exactly modal, cannot be moved). So I had the wrong impression of how unlikely the bug was. The video in comment 1 shows it is easy.

I've got a patch that removes the cancel button. It was unnecessary and was really only there because it is needed for the other geolocation dialog.

It would be preferable if the system dialog were in fact modal but it looks better without the confusing cancel button anyway.

Better video, showing interaction with browser but not with page. Also added a spinner to the Firefox dialog.

Attachment #9430028 - Attachment is obsolete: true

The cancel button is not really useful here and is probably confusing some
users, since they are pressing it anyway, and it is causing this crash. The
ideal fix would be for the Windows dialog to be modal instead of just immovable,
but it isn't.

Hi Gijs, just poking on the review for this bug as it's an uplift target for this cycle still. Thanks!

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)

Hi Gijs, just poking on the review for this bug as it's an uplift target for this cycle still. Thanks!

Sorry, a combination of travel and leave has meant I've been slow with reviews generally. Today a poor connection had waylaid my review making it through to phab; thanks for the flag, it's there now.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)

Hi Gijs, just poking on the review for this bug as it's an uplift target for this cycle still. Thanks!

Sorry, a combination of travel and leave has meant I've been slow with reviews generally. Today a poor connection had waylaid my review making it through to phab; thanks for the flag, it's there now.

Ugh, so apparently this wasn't the handyman patch I thought it was. But this one r+'d now too.

Thanks Gijs.

There was some concern that doing other things than pressing "cancel" could also crash. I toyed around with operations in the browser while the system dialog is up (things like reloading the relevant tab, going to another tab to load, etc -- all weird things to do while the dialog is up) and I haven't seen issues. It looks like Windows shuts off network access while it's showing that dialog, so most of those operations just end up with an empty tab. Functionality returns after the user makes a choice in the system dialog. So there at least aren't obvious issues beyond the cancel button that this patch fixes.

Pushed by daparks@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8e1ebf9c03bb Remove cancel button from modal while waiting for user to respond to Windows geolocation dialog r=Gijs

Comment on attachment 9430041 [details]
Bug 1917201: Remove cancel button from modal while waiting for user to respond to Windows geolocation dialog r=Gijs!

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: The crash that spawned this bug. This happens if the user pressed the "Cancel" button in Firefox while the browser is behind the system dialog.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The prompt change is only active for this rare Windows geolocation issue. It can only happen once per machine (not even once per install). The greatest concern would be that the cancel button is needed if the Windows APIs don't properly report when the system dialog is dismissed (since that's when we also dismiss our dialog), but the Windows behavior has been tested in QA and has been live for a couple of weeks and we have no reports of that happening. And, again, if that were an issue, it could only happen once per Windows OS install.
  • String changes made/needed: N/A
  • Is Android affected?: No
Attachment #9430041 - Flags: approval-mozilla-beta?
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch
Attachment #9430041 - Flags: approval-mozilla-esr128?

Backed out for causing mochitest failures on test_modal_prompts.html

Backout link

Push with failures

Failure log

Status: RESOLVED → REOPENED
Flags: needinfo?(davidp99)
Resolution: FIXED → ---
Target Milestone: 133 Branch → ---

This is the first time mach try auto missed something for me. I'm looking into it.

Flags: needinfo?(davidp99)

This is easy. There is already an exception for this case, elsewhere in the test. I just need to duplicate that conditional.

Pushed by daparks@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0bbd77193d50 Remove cancel button from modal while waiting for user to respond to Windows geolocation dialog r=Gijs
Blocks: 1925401
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch
Flags: needinfo?(davidp99)

Comment on attachment 9430041 [details]
Bug 1917201: Remove cancel button from modal while waiting for user to respond to Windows geolocation dialog r=Gijs!

A bit nervous about the new crash in comment 24, but it seems like this shouldn't make things any worse and will hopefully improve the current situation. Approved for 132.0rc1.

Attachment #9430041 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

A bit nervous about the new crash

Flags: in-testsuite+

Hello! We tried to verify this issue today with Firefox 133.0a1 (2024-10-20) and 132.0b10 from comment 27 on Windows 11 ARM 24H2 (26100.2152) and Win 11 24H2 (26100.2033) but it seems that we have mixed results between Nightly and beta 132.0b10 and some other issues like:

  1. Firefox 133.0a1 (2024-10-20) does not show the newly implemented Awaiting Location Permission dialog without the Cancel button when the location is the first time ever used (new Windows profile). This shows for some reason on 132.0b10. Screen recording: link
  2. Firefox 133.0a1 (2024-10-20) and 132.0b10 still crash when using the steps from comment 2. Screen recording: link.
  3. Firefox 133.0a1 (2024-10-20) crashes after clicking Allow on the location permission and then selecting Yes or No on the Let Windows and apps access your location dialog. Screen recording: link. This may be because of the problem stated in Step 1.
  4. Firefox 132.0b10 crashes when hitting the Esc key while the Awaiting Location Permission dialog is displayed and then select the Yes/ No options from the Windows dialog. Screen recording: link
  5. Having a standard user account with a disabled location service and clicking on the Open Settings Windows dialog does nothing. Note that this is not dependent on this bug but we wanted to make sure if this is a Firefox issue or not. Clicking the second time on the location permission will open Windows settings. Screen recording: link

Please let us know for which of the above steps should we open new issues. Note that each scenario was tested on a new clean Windows user.

Comment on attachment 9430041 [details]
Bug 1917201: Remove cancel button from modal while waiting for user to respond to Windows geolocation dialog r=Gijs!

Approved for 128.4esr.

Attachment #9430041 - Flags: approval-mozilla-esr128? → approval-mozilla-esr128+

Thanks @atrif. This is all pretty strange. I'm mostly concerned with your point #1.

  1. I am not able to reproduce this but it is the most concerning of the group. Right now I'm having problems with my VM so I haven't been able to try the nightly build yet. @atrif, is there any chance this was ARM only?
  2. I had forgotten about comment 2. When I tried this myself in comment 15, all I got was the spinner on the tab when I tried to go to a new URL, but I can reproduce this now. Still, this is a corner case that requires some weird user behavior. We don't need to block for it.
  3. I also think that this should not be possible without the main issue in point #1.
  4. Same feelings as point #2 -- this is fairly odd user behavior -- but I'm also surprised that this happens. I didn't think the dialog could be dismissed with escape. I should at least be able to stop that.
  5. This is a separate issue. I've never even seen that system dialog. Do you know why you are getting a dialog with "Open Settings" and "Got It" instead of the usual "Yes" and "No"? The settings page that comes up in the video says something about location settings being turned of by "an admin on this device" -- is that something new that creates the difference?

(In reply to Alexandru Trif, Desktop Test Engineering [:atrif] from comment #28)

Please let us know for which of the above steps should we open new issues. Note that each scenario was tested on a new clean Windows user.

Thanks, can you create an issue for each of these except #3 (which can be folded into issue #1)? Also lmk whether running ARM was relevant to these failures?

Flags: needinfo?(davidp99) → needinfo?(atrif)

I finally have some idea of what's happening here. The difference between 132.0b10 and 133.0a1 is (apparently) due to the patch in bug 1925401 breaking nightly. Since it's not slated for (or needed for) ESR and beta, the issue won't appear there. To be clear, I'm talking about the missing dialog in comment 28 point #1 (and subsequently point #3). From the lesser issues, point #4 presumably does apply to beta/ESR, as does point #2. I haven't been looking into #5.

I'm not sure why bug 1925401 breaks things -- I did think I saw it working before -- but I'm getting a tell-tale message in the browser console: "request for modal prompt with no buttons requires flags to be BUTTON_NONE", which came from bug 1925401 and explains why the modal isn't appearing. This (and point #4 about the ESC key) are probably not too serious after all. (And I can see that they aren't related to ARM.)

Issues #1/#3 are being dealt with in bug 1926149. @atrif, can you still file bugs with the details for #2, #4 and #5? Thanks.

Flags: qe-verify+

Thank you, David! Marking this issue verified fixed with Firefox 132.0 RC1 and 128.4.0esr. The Cancel button is no longer displayed when requesting and allowing location permission for the first time ever on Windows 11 24H2 (26100.2152- ARM) & (26100.2033- AMD). Screenshot attached.
Note: Firefox 133.0a1 (2024-10-21) cannot be verified because the dialog is not displayed (see bug 1926149 and comment 28 - issue #1). I will reverify this with 133 once bug 1926149 is fixed.

For the other issues stated in comment 28, there are separate issues filed as follows:

  1. For issues #1 and #3 stated in comment 28 bug 1926149 was filed.
  2. For issue #2 stated in comment 28 bug 1926196 was filed.
  3. For issue #4 stated in comment 28 bug 1926201 was filed.
  4. For issue #5 stated in comment 28 bug 1926216 was filed (note that this is not dependent on these changes since it's reproducing with Firefox 125.0)
Flags: needinfo?(atrif)
QA Whiteboard: [qa-triaged]

Bug 1926149 was fixed and verified with Firefox 133.0a1 (2024-10-23) on Windows 11x64 24H2 (26100.2152- ARM) & (26100.2033- AMD). The Awaiting Location Permission dialog without the Cancel button is correctly displayed when the location is used for the first time on the Windows account. Also, no crash can be seen after choosing Yes/ No on the Let Windows and apps access your location windows dialog.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: