Closed Bug 1364624 Opened 7 years ago Closed 6 years ago

Consider using SRWLOCK instead of CRITICALSECTION for our windows Mutex impl

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- fixed
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed
firefox62 --- fixed

People

(Reporter: erahm, Assigned: erahm)

References

(Depends on 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

Apparently there are perf benefits. This should be doable now that we've dropped support for XP.

Chromium did this a year ago [1], it looks pretty straightforward.

[1] https://codereview.chromium.org/1864323002
Attached patch Switch from CRITICALSECTION to SRWLOCK (obsolete) — — Splinter Review
MozReview-Commit-ID: 6JpGEQyUFz
Attachment #8868236 - Flags: review?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

Review of attachment 8868236 [details] [diff] [review]:
-----------------------------------------------------------------

This works for me, thanks.

I have seen http://stackoverflow.com/a/37357503 as a suggestion that SRW locks might not be that much faster, but I have also seen https://github.com/rust-lang/rust/pull/20367/commits/ee1ca88213133a58f0a9d234f03babbebeb7c5d8#diff-907be25f650d12cff49637dde8c9ae2eR28 suggesting SRW locks are faster.  The measurements in the Google bug was the first actual numbers I had seen...
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

Review of attachment 8868236 [details] [diff] [review]:
-----------------------------------------------------------------

Don't know why my previous review didn't flip the r+...
Attachment #8868236 - Flags: review?(nfroyd) → review+
Nothing super obvious either way in the talos results, I guess lets land this and see what happens.
https://hg.mozilla.org/mozilla-central/rev/5b6d169feb92
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Depends on: 1370323
Pushed by erahm@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3802f86e1bd1
Switch from CRITICALSECTION to SRWLOCK. r=froydnj
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

Approval Request Comment
[Feature/Bug causing the regression]: CRITICAL_SECTION has poorer perf characteristics and a higher memory overhead than SRWLOCK.
[User impact if declined]: Continued perf impact and memory overhead.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Landed previously for 5 weeks.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: N/A.
[Is the change risky?]: Low-med risk.
[Why is the change risky/not risky?]: Baked for 5 weeks before and we didn't see any clear issues, possible SRWLOCK can expose incorrect locking assumptions.
[String changes made/needed]: N/A
Attachment #8868236 - Flags: approval-mozilla-beta?
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Possible perf issues.
Fix Landed on Version: 61
Risk to taking this patch (and alternatives if risky): Low-med risk. Baked for 5 weeks before and we didn't see any clear issues, possible SRWLOCK can expose incorrect locking assumptions.
String or UUID changes made by this patch: 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8868236 - Flags: approval-mozilla-esr52?
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

We try to keep ESR changes to critical security patches or major stability problems. With the ESR 60 release coming up in a couple of weeks and only 2 more ESR 52 cycles to go, I don't think we should take this patch for esr52.
Attachment #8868236 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52-
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

On further discussion, the gains look significant and so let's take this for ESR52 as well as beta 60.
Attachment #8868236 - Flags: approval-mozilla-esr52-
Attachment #8868236 - Flags: approval-mozilla-esr52+
Attachment #8868236 - Flags: approval-mozilla-beta?
Attachment #8868236 - Flags: approval-mozilla-beta+
Backed out changeset 3802f86e1bd1 (bug 1364624) for shutdown hangs on reftests. a=backout

Log:
https://treeherder.mozilla.org/logviewer.html#?job_id=175581647&repo=mozilla-inbound&lineNumber=3798

INFO -  REFTEST TEST-PASS | file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-novalidate.html == file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-required-ref.html | image comparison, max difference: 0, number of differing pixels: 0
19:52:39     INFO -  REFTEST TEST-END | file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-novalidate.html == file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-required-ref.html
19:52:39     INFO -  REFTEST INFO | Slowest test took 147ms (file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-invalid-reset.html)
19:52:39     INFO -  REFTEST INFO | Total canvas count = 4
19:52:39     INFO -  [Parent 4952, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  [Child 1204, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  JavaScript error: chrome://reftest/content/reftest.jsm, line 1538: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPropertyBag2.getPropertyAsAString]
19:52:39     INFO -  [Child 1204, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  [Child 5236, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  [Child 5236, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  JavaScript error: chrome://reftest/content/reftest.jsm, line 1538: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPropertyBag2.getPropertyAsAString]
19:52:39     INFO -  [Parent 4952, Gecko_IOThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:52:39     INFO -  JavaScript error: chrome://reftest/content/reftest.jsm, line 1538: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPropertyBag2.getPropertyAsAString]
19:52:39     INFO -  *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
19:52:39     INFO -  1524685959764	Marionette	DEBUG	Received observer notification xpcom-will-shutdown
19:52:39     INFO -  1524685959764	Marionette	DEBUG	New connections will no longer be accepted
19:52:39     INFO -  [GPU 5532, Chrome_ChildThread] WARNING: pipe error: 109: file z:/build/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 346
19:53:43  WARNING -  TEST-UNEXPECTED-FAIL | file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-novalidate.html | application terminated with exit code 1
19:53:43     INFO -  REFTEST INFO | Downloading symbols from: https://queue.taskcluster.net/v1/task/JveFTRtOSwmgk8cAyW-aeg/artifacts/public/build/target.crashreporter-symbols.zip
19:53:47     INFO -  REFTEST INFO | Copy/paste: Z:\task_1524684969\build\win32-minidump_stackwalk.exe c:\users\task_1524684969\appdata\local\temp\tmpiq1ocr.mozrunner\minidumps\e1282faf-67f8-40e3-8098-edafe5b906de.dmp c:\users\task_1524684969\appdata\local\temp\tmp1eglrk
19:54:01     INFO -  REFTEST INFO | Saved minidump as Z:\task_1524684969\build\blobber_upload_dir\e1282faf-67f8-40e3-8098-edafe5b906de.dmp
19:54:01     INFO -  REFTEST INFO | Saved app info as Z:\task_1524684969\build\blobber_upload_dir\e1282faf-67f8-40e3-8098-edafe5b906de.extra
19:54:01    ERROR -  REFTEST PROCESS-CRASH | file:///Z:/task_1524684969/build/tests/reftest/tests/layout/reftests/css-ui-invalid/select/select-novalidate.html | application crashed [@ MOZ_CrashOOL]
19:54:01     INFO -  Crash dump filename: c:\users\task_1524684969\appdata\local\temp\tmpiq1ocr.mozrunner\minidumps\e1282faf-67f8-40e3-8098-edafe5b906de.dmp
19:54:01     INFO -  Operating system: Windows NT
19:54:01     INFO -                    6.1.7601 Service Pack 1
19:54:01     INFO -  CPU: x86
19:54:01     INFO -       GenuineIntel family 6 model 45 stepping 7
19:54:01     INFO -       8 CPUs
19:54:01     INFO -  GPU: UNKNOWN
19:54:01     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
19:54:01     INFO -  Crash address: 0x73e9e2bb
19:54:01     INFO -  Process uptime: 69 seconds
19:54:01     INFO -  Thread 46 (crashed)

Backout:
https://hg.mozilla.org/mozilla-central/rev/e33e5049d9f35e16893f5ba4ec823f69f04972dd

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/R_ncxU7xQVujC6PizsER8A/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
Status: RESOLVED → REOPENED
Flags: needinfo?(erahm)
Resolution: FIXED → ---
Okay so this is shutdown crash due to a dom worker hang detector triggering a crash:

> 19:54:01     INFO -  Thread 46 (crashed)
> 19:54:01     INFO -   0  mozglue.dll!MOZ_CrashOOL [Assertions.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 33 + 0x0]
> 19:54:01     INFO -      eip = 0x73e9e2bb   esp = 0x1ab4f818   ebp = 0x1ab4f818   ebx = 0x00000003
> 19:54:01     INFO -      esi = 0x0a703800   edi = 0x0a780520   eax = 0x0a780520   ecx = 0x0088f00c
> 19:54:01     INFO -      edx = 0x726f7272   efl = 0x00000212
> 19:54:01     INFO -      Found by: given as instruction pointer in context
> 19:54:01     INFO -   1  xul.dll!mozilla::dom::workerinternals::RuntimeService::CrashIfHanging() [RuntimeService.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 2018 + 0x16]
> 19:54:01     INFO -      eip = 0x5ecc2601   esp = 0x1ab4f820   ebp = 0x1ab4f870
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   2  xul.dll!static void mozilla::`anonymous namespace'::RunWatchdog(void *) [nsTerminator.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 213 + 0x5]

We can see xpcom shutting down and blocking on grabbing the IPC monitor [1]:

> 19:54:01     INFO -  Thread 0
> 19:54:01     INFO -   0  ntdll.dll!KiFastSystemCallRet + 0x0
> 19:54:01     INFO -      eip = 0x77c170b4   esp = 0x0029f368   ebp = 0x0029f3c0   ebx = 0x0029f3d0
> 19:54:01     INFO -      esi = 0xe5e5e5e5   edi = 0x1247bbb0   eax = 0xe5e5e5e5   ecx = 0x00000040
> 19:54:01     INFO -      edx = 0x124de080   efl = 0x00000247
> 19:54:01     INFO -      Found by: given as instruction pointer in context
> 19:54:01     INFO -   1  ntdll.dll!ZwWaitForKeyedEvent + 0xc
> 19:54:01     INFO -      eip = 0x77c169f4   esp = 0x0029f36c   ebp = 0x0029f3c0
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   2  ntdll.dll!RtlpWaitCouldDeadlock + 0x35
> 19:54:01     INFO -      eip = 0x77bed9f3   esp = 0x0029f370   ebp = 0x0029f3c0
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   3  mozglue.dll!static void arena_dalloc(void *, unsigned int, struct arena_t *) [mozjemalloc.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 3510 + 0x7]
> 19:54:01     INFO -      eip = 0x73e8401c   esp = 0x0029f3a8   ebp = 0x0029f3c0
> 19:54:01     INFO -      Found by: stack scanning
> 19:54:01     INFO -   4  mozglue.dll!mozilla::detail::MutexImpl::lock() [Mutex_windows.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 27 + 0x7]
> 19:54:01     INFO -      eip = 0x73e8d097   esp = 0x0029f3c8   ebp = 0x0029f3f8
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   5  mozglue.dll!mozilla::detail::MutexImpl::lock() [Mutex_windows.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 27 + 0x7]
> 19:54:01     INFO -      eip = 0x73e8d097   esp = 0x0029f3d8   ebp = 0x0029f3f8
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   6  xul.dll!mozilla::ipc::MessageChannel::Close() [MessageChannel.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 2716 + 0x8]
> 19:54:01     INFO -      eip = 0x5db0ef3c   esp = 0x0029f3e0   ebp = 0x0029f3f8
> 19:54:01     INFO -      Found by: call frame info
> 19:54:01     INFO -   7  xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [XPCOMInit.cpp:51c27d9d25eaca9db40ef561bd3f9cdab63bf9b0 : 843 + 0x5]

The stack is a bit wonky though. I'm not seeing anything super obvious right now.

[1] https://searchfox.org/mozilla-central/rev/78dbe34925f04975f16cb9a5d4938be714d41897/ipc/glue/MessageChannel.cpp#2705
Flags: needinfo?(erahm)
Okay so we're actually hanging when MonitorAutoUnlock grabs the lock again when it goes out of scope [1], which probably explains the intermittent nature of this.

[1] https://searchfox.org/mozilla-central/rev/78dbe34925f04975f16cb9a5d4938be714d41897/ipc/glue/MessageChannel.cpp#2716
Backed out changeset e21076f36898 (bug 1364624) for shutdown hangs on reftests. a=backout from beta

Backout:
https://hg.mozilla.org/releases/mozilla-beta/rev/e30deae7bb9432539f3ab66be4c5517392d9cff6
This patch also caused extreme breakage of accessibility on Windows when AccessibleHandler.dll is registered system wide (which it is by the installer and the updater, as this is a COM component needed for acceptable accessibility performance). See bug 1456991.
1. I built latest central (which has this patch backed out as per comment 18).
2. Registered AccessibleHandler.dll, ran Firefox and tested with the NVDA screen reader. All works as expected.
3. I re-applied 3802f86e1bd1.
4. Ran Firefox and tested with the NVDA screen reader. Bustage!
Target Milestone: mozilla55 → ---
Depends on: 1456991
Wontfix for 60.0 as we're entering RC week.
Comment on attachment 8868236 [details] [diff] [review]
Switch from CRITICALSECTION to SRWLOCK

Resetting uplift approval flags as this was backed out.
Attachment #8868236 - Flags: approval-mozilla-esr52+
Attachment #8868236 - Flags: approval-mozilla-beta+
aklotz, do you think you could help us debug what went horribly wrong with AccessibleHandler.dll (comment 22)?

dmajor, I have an usubstantiated hunch that the code generation got weird for `MessageChannel::Close` [1] on opt/pgo builds. Any chance you could help take a look at that?

[1] https://searchfox.org/mozilla-central/rev/78dbe34925f04975f16cb9a5d4938be714d41897/ipc/glue/MessageChannel.cpp#2700-2745
Flags: needinfo?(dmajor)
Flags: needinfo?(aklotz)
Switching from an AutoLock to manually calling lock/unlock seems to have fixed the shutdown hang:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=8abe49246936d69c64c15761cdc1d93bf312c48c

Switching to clang-cl also seems to have fixed the shutdown hang:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed1126240a049ac302c5f7eb6769d7c810e95b03

That seems to support the codegen hypothesis, but it's also possible this issue doesn't repro on try or I need more than 5 runs per reftest.
Okay and just to double check I've got a build with the origin changes going to see if this is some sort of try issue:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=529b9cc4a84a547d20274767de18306bec083436

And build that is a straight conversion of the auto(un)locks to manual (un)locks:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=87fc33cf7daa67c1b7f698cf0b8e016c55dc0182

Those should both fail, if only the first one fails I'd feel more comfortable pointing to a codegen issue. If neither fails I'm not sure what we can do.
I can easily repro the a11y problem, but there is no smoking gun. I do see some debug spew that points to some problems, but it is going to take some time for me to figure out where that is coming from.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #31)
> Okay and just to double check I've got a build with the origin changes going
> to see if this is some sort of try issue:
>  
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=529b9cc4a84a547d20274767de18306bec083436

The original patch does repro on try.
(In reply to Aaron Klotz [:aklotz] from comment #32)
> I can easily repro the a11y problem, but there is no smoking gun. I do see
> some debug spew that points to some problems, but it is going to take some
> time for me to figure out where that is coming from.

I found it! There is a Mutex reentry. I have filed bug 1459085 to fix this.
Depends on: 1459085
Flags: needinfo?(aklotz)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #31)
> And build that is a straight conversion of the auto(un)locks to manual
> (un)locks:
>  
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=87fc33cf7daa67c1b7f698cf0b8e016c55dc0182

This fails as well at roughly the same spot (so when we lock again when scoping out of the if).

To summarize from the various pushes: if we remove the extra lock (scoping out of if) / unlock (scoping out of the function) pair things work fine. We're clearly unlocking before calling `NotifyMaybeChannelError`, so again either there's something weird in the codegen optimizing out the unlock or possibly we're leaving the lock locked in the `NotifyMaybeChannelError` codepath, but if that were the case I'd expect to see an assertion in the deadlock detector on debug builds.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #35)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #31)
> > And build that is a straight conversion of the auto(un)locks to manual
> > (un)locks:
> >  
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=87fc33cf7daa67c1b7f698cf0b8e016c55dc0182
> 
> This fails as well at roughly the same spot (so when we lock again when
> scoping out of the if).

I think this is the key. The RAII objects result in a "superfluous" Lock+Unlock pair at line 1.32-1.36 that you had optimized out in your earlier translation to manual locking code. https://hg.mozilla.org/try/rev/256620673a9a#l1.32

I am still of the opinion that we should try to explain this with run-of-the-mill code misbehavior before turning to codegen problems. What if the Unlock at line 1.28 woke somebody up, and they took some action, that either made the Lock at line 1.32 take a long time, or be unsafe/incorrect somehow?
Flags: needinfo?(dmajor)
(In reply to David Major [:dmajor] from comment #36)
> I am still of the opinion that we should try to explain this with
> run-of-the-mill code misbehavior before turning to codegen problems. What if
> the Unlock at line 1.28 woke somebody up, and they took some action, that
> either made the Lock at line 1.32 take a long time, or be unsafe/incorrect
> somehow?

In theory we should see that in the CRITICAL_SECTION version as well, but SRWLock isn't fair so maybe there's a different ordering. OTOH the odd looking stack definitely makes this look like a re-entrant lock.
This switches over to manaually managing the locking in MessageChannel::Close
in order to avoid a deadlock on msvc opt builds. It has the added benefit of
avoid a superfluos lock/unlock pair.
Attachment #8973275 - Flags: review?(nfroyd)
Attachment #8868236 - Attachment is obsolete: true
Comment on attachment 8973276 [details] [diff] [review]
Part 2: Switch from CRITICALSECTION to SRWLOCK

Carry forward r+ (this is just a title change)
Attachment #8973276 - Flags: review+
Comment on attachment 8973275 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

Review of attachment 8973275 [details] [diff] [review]:
-----------------------------------------------------------------

Nit: "superfluous" and "manually managing" in the commit message.

::: ipc/glue/MessageChannel.cpp
@@ +2709,5 @@
> +        mMonitor->Lock();
> +
> +        // Instead just use a ScopeExit to manage the unlock.
> +        RefPtr<RefCountedMonitor> monitor(mMonitor);
> +        auto exit = MakeScopeExit([monitor] () {

Should this be &monitor instead, so we're not taking a separate strong reference?  Or [monitor = Move(monitor)] or something like that?
Attachment #8973275 - Flags: review?(nfroyd) → review+
This switches over to manually managing the locking in MessageChannel::Close
in order to avoid a deadlock on msvc opt builds. It has the added benefit of
avoid a superfluous lock/unlock pair.
Attachment #8973275 - Attachment is obsolete: true
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

Updated for typos in patch message, switched to Move() per review. Carrying forward r+.
Attachment #8973341 - Flags: review+
Given the code freeze, should I hold off until next week? I've got a few hundred reftest runs in so I feel okay about this being fixed. The a11y issue seems to be fixed and landed, but I can't guarantee their won't be another weird issue that doesn't get tested in our infra.

Given the timing maybe just getting this queued up for the next dot-releases is the best option.
Flags: needinfo?(ryanvm)
Yes, this missed the boat for previous cycle. We're into the Fx62 nightly cycle now.
Flags: needinfo?(ryanvm)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #37)
> maybe there's a different ordering. OTOH the odd
> looking stack definitely makes this look like a re-entrant lock.

I guess mutex re-entry wasn't the problem in this case. However, it certainly was for accessibility (bug 1459085). Is there any reason we don't assert when re-entering a mutex?
(In reply to James Teh [:Jamie] from comment #47)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #37)
> > maybe there's a different ordering. OTOH the odd
> > looking stack definitely makes this look like a re-entrant lock.
> 
> I guess mutex re-entry wasn't the problem in this case. However, it
> certainly was for accessibility (bug 1459085). Is there any reason we don't
> assert when re-entering a mutex?

The deadlock detector should assert in debug builds. From what I can tell, the accessibility DLL code isn't tested in automation.
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #48)
> The deadlock detector should assert in debug builds. From what I can tell,
> the accessibility DLL code isn't tested in automation.

Indeed it isn't. I just wanted to make sure there was an assert at the right point in the stack which would have allowed us to catch this easily in debug builds.
https://hg.mozilla.org/mozilla-central/rev/3c9daea7564e
https://hg.mozilla.org/mozilla-central/rev/fa9c21a4ab94
Status: REOPENED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

Lets do this again...

Approval Request Comment
[Feature/Bug causing the regression]: CRITICAL_SECTION has poorer perf characteristics and a higher memory overhead than SRWLOCK.
[User impact if declined]: Continued perf impact and memory overhead.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Baked on Nightly for 4 days.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: Bug 1459085, but that's already in 61.
[Is the change risky?]: Low-med risk.
[Why is the change risky/not risky?]: Baked for 5 weeks before, now on Nightly for a few days and we didn't see any clear issues. It's possible SRWLOCK can expose incorrect locking assumptions and/or deadlocks due to reentrancy. We think those bugs have been fixed.
[String changes made/needed]: N/A
Attachment #8973341 - Flags: approval-mozilla-beta?
Comment on attachment 8973276 [details] [diff] [review]
Part 2: Switch from CRITICALSECTION to SRWLOCK

Approval Request Comment

See patch 1.
Attachment #8973276 - Flags: approval-mozilla-beta?
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

Re-requesting ESR approval.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Possible perf issues.
Fix Landed on Version: 60, uplift requested for 61.
Risk to taking this patch (and alternatives if risky): Low-med risk. Baked for 5 weeks before, 4 days on Nightly now and we didn't see any clear issues, possible SRWLOCK can expose incorrect locking assumptions.
String or UUID changes made by this patch: N/A

This needs to land with bug 1459085 which is also nominated for ESR60.
Attachment #8973341 - Flags: approval-mozilla-esr60?
Comment on attachment 8973276 [details] [diff] [review]
Part 2: Switch from CRITICALSECTION to SRWLOCK

See patch 1.
Attachment #8973276 - Flags: approval-mozilla-esr60?
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

Taking this for 61.0b6 so we can see how it fares with a wider audience.
Attachment #8973341 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8973276 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

locking change for 60.1esr
Attachment #8973341 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Attachment #8973276 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Please nominate this for ESR52 also when you get a chance.
Flags: needinfo?(erahm)
Comment on attachment 8973276 [details] [diff] [review]
Part 2: Switch from CRITICALSECTION to SRWLOCK

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: Possible perf issues.
Fix Landed on Version: 60, uplift approved for 61 and esr 60.
Risk to taking this patch (and alternatives if risky): Low-med risk. Baked for 5 weeks before, 4 days on Nightly now and we didn't see any clear issues, possible SRWLOCK can expose incorrect locking assumptions.
String or UUID changes made by this patch: N/A

This needs to land with bug 1459085.
Flags: needinfo?(erahm)
Attachment #8973276 - Flags: approval-mozilla-esr52?
Comment on attachment 8973341 [details] [diff] [review]
Part 1: Manually manage locks in MessageChannel::Close

see patch 2 request
Attachment #8973341 - Flags: approval-mozilla-esr52?
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #61)
> Comment on attachment 8973276 [details] [diff] [review]
> Part 2: Switch from CRITICALSECTION to SRWLOCK
> 
> [Approval Request Comment]
> If this is not a sec:{high,crit} bug, please state case for ESR
> consideration:
> User impact if declined: Possible perf issues.
> Fix Landed on Version: 60, uplift approved for 61 and esr 60.
> Risk to taking this patch (and alternatives if risky): Low-med risk. Baked
> for 5 weeks before, 4 days on Nightly now and we didn't see any clear
> issues, possible SRWLOCK can expose incorrect locking assumptions.
> String or UUID changes made by this patch: N/A
> 
> This needs to land with bug 1459085.

Please ni aklotz again on bug 1459085 to verify that it isn't a blocker for this bug, he indicated it's not but I'm not 100% sure of that.
Comment on attachment 8973276 [details] [diff] [review]
Part 2: Switch from CRITICALSECTION to SRWLOCK

approved for 52.9esr
Attachment #8973276 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Attachment #8973341 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Part 2 needs rebasing for ESR52.
Flags: needinfo?(erahm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #66)
> Part 2 needs rebasing for ESR52.

Oh right, we reworked the mutex code since then. I'm on PTO for the next week, Nathan do you think you can take a look? Otherwise I'll take a look when I'm back.
Flags: needinfo?(nfroyd)
(In reply to Out until June 4 - Eric Rahm [:erahm] (please no mozreview requests) from comment #67)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #66)
> > Part 2 needs rebasing for ESR52.
> 
> Oh right, we reworked the mutex code since then. I'm on PTO for the next
> week, Nathan do you think you can take a look? Otherwise I'll take a look
> when I'm back.

Sure, np.

Actually, yes problem.  Because this requires taking bug 1312086 and bug 1312087 as well.  I can do/have done the port already, but how comfortable are we porting all of that to ESR52 at this stage?  Those changes have gone through a lot of testing at this point, but that is a lot of churn for a...ah, neglible (?) performance win.
Flags: needinfo?(nfroyd) → needinfo?(ryanvm)
No, not worth it.
Flags: needinfo?(ryanvm)
Flags: needinfo?(erahm)
Attachment #8973341 - Flags: approval-mozilla-esr52+ → approval-mozilla-esr52-
Attachment #8973276 - Flags: approval-mozilla-esr52+ → approval-mozilla-esr52-
See Also: → 1472137
Depends on: 1472018
Depends on: 1472137
Depends on: 1471824
Depends on: 1469603
Depends on: 1474007
See Also: → 1475776
You need to log in before you can comment on or make changes to this bug.