Closed Bug 1362626 Opened 7 years ago Closed 7 years ago

The lock in ChannelEventQueue::RunOrEnqueue() can be contended

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1354455
Performance Impact high

People

(Reporter: ehsan.akhgari, Assigned: schien)

References

Details

(Whiteboard: [necko-active])

See this profile: https://perfht.ml/2pjFHPq

This lock is grabbed here: <https://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/netwerk/ipc/ChannelEventQueue.h#150>

The other place where the lock is held is <https://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/netwerk/ipc/ChannelEventQueue.cpp#47> where events are run in a loop.  I'm not familiar with this code and don't know if this loop can have an expected upper bound...  It seems prone to contention at least.

This bug blocks a quantum release criteria use case.  This same thing also shows up in another profile of the same use case: https://perfht.ml/2pjx8UX

See https://docs.google.com/spreadsheets/d/1Kxn850VasyaG_WfRg3pMInW0hbIT8LP7pRPBYTIpdbM/edit#gid=622178261 for how to reproduce.
The mRunningMonitor is grabbed before the flush/suspension check because there can be a race condition that an event might be enqueued but not dequeued during the end of flushing operation.
One solution is to find a lock-free algorithm to avoid that issue above.
Another workaround is to release and re-grab the mRunningMonitor after processing a fixed number of events in queue, so that we can avoid the lock contention.
During the investigation, the solution to bug 1354455 might be able to fix the lock contention as well. With a correct fix of the lock-free algorithm we can remove the mRunningMonitor dependency between RunOrEnqueue and FlushQueue.
Assignee: nobody → schien
Whiteboard: [necko-active]
Whiteboard: [necko-active] → [necko-active][qf]
Whiteboard: [necko-active][qf] → [necko-active][qf:p1]
As comment #2 mentioned, the problematic code is removed after bug 1354455 is landed. So, we should be able to close this issue as well.

@Jean is there anyone can help to verify the result on current code base? I tried to follow the test procedure from the google doc but could not figure out how to get the numbers for comparison.
Flags: needinfo?(jgong)
Close this issue based on comment #2. feel free to file another bug if there is still lock contention issue found during test.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #3)
> As comment #2 mentioned, the problematic code is removed after bug 1354455
> is landed. So, we should be able to close this issue as well.
> 
> @Jean is there anyone can help to verify the result on current code base? I
> tried to follow the test procedure from the google doc but could not figure
> out how to get the numbers for comparison.
Duplicate bug resolved.
Flags: needinfo?(jgong)
Performance Impact: --- → P1
Whiteboard: [necko-active][qf:p1] → [necko-active]
You need to log in before you can comment on or make changes to this bug.