Closed Bug 1258072 Opened 9 years ago Closed 4 years ago

Hang in Nightly with stack sampling; happens when using a Mozilla internal Jenkins Ops tool

Categories

(Core :: Networking, defect, P3)

48 Branch
x86_64
macOS
defect

Tracking


RESOLVED DUPLICATE of bug 698882
Tracking Status
firefox48 --- affected

People

(Reporter: jrgm, Assigned: jrgm)

Details

(Whiteboard: [necko-backlog])

Attachments

(7 files)

Attached file NightlyHang.txt
Hi Bill,

This is the hang I experience when using a Jenkins internal ops tool. I'm attaching a process sample from Activity Monitor. Some notes:

- E10S is not enabled in this profile. (I used to have it on, though, and saw similar hangs.)
- Nightly entered this hang by first burning 250% CPU for a few minutes, then dropping to ~0% CPU with "Not Responding" shown in Activity Monitor.
- A Quit from Activity Monitor had no effect; I had to Force Quit.
Do you have enough information from these four stack samples, or shall I just keep submitting more?
Attached file And another one gone
Attached file And another one gone
It looks like the call to PR_SetPollableEvent is expected to be non-blocking, but in this case we're writing so much data that the call blocks while waiting for the queue to drain. This may actually be an NSPR bug, but I'll needinfo Patrick since he probably has a better idea.
Assignee: wmccloskey → nobody
Component: General → Networking
Flags: needinfo?(mcmanus)
Your timing is pretty amazing. This is a dup of bug 698882, which has been open for years and was merged to mozilla-central just one hour ago. Could you retest when it hits a Nightly build? So yes, PR_SetPollableEvent uses a blocking queue, which is a serious bug: it can cause a deadlock when tons of events are generated on the socket thread. That rarely happens, but apparently Jenkins makes it happen somehow. In any event, the deadlock should be fixed by bug 698882 whenever it sticks (it has uncovered several unrelated latent bugs and has been backed out a few times).
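The deadlock described above follows a well-known pattern: if "set the event" means writing a byte into a bounded buffer such as a pipe, and the write end is blocking, then the signalling thread stalls as soon as the buffer fills and nobody drains it. The Python sketch below is not NSPR's code and not the fix that landed in bug 698882. It assumes a pipe-backed event on a POSIX system, and the class and method names (`PollableEvent`, `signal`, `is_pending`, `clear`) are hypothetical. It shows one common mitigation: make the write end non-blocking and treat "buffer full" as "already signalled".

```python
import os
import select


class PollableEvent:
    """Hypothetical self-pipe wakeup (an illustration, not NSPR's API)."""

    def __init__(self) -> None:
        self._rfd, self._wfd = os.pipe()
        # Both ends are non-blocking, so a full pipe can never stall
        # the thread that signals, and draining never waits.
        os.set_blocking(self._rfd, False)
        os.set_blocking(self._wfd, False)

    def signal(self) -> bool:
        """Wake the poller; return False if the byte was dropped."""
        try:
            os.write(self._wfd, b"x")
            return True
        except BlockingIOError:
            # The pipe buffer is full, so a wakeup is already pending.
            # Dropping this byte loses no information and does not block.
            return False

    def is_pending(self) -> bool:
        """Zero-timeout poll: is at least one wakeup waiting?"""
        readable, _, _ = select.select([self._rfd], [], [], 0)
        return bool(readable)

    def clear(self) -> int:
        """Drain all pending wakeup bytes; return how many were read."""
        drained = 0
        while True:
            try:
                chunk = os.read(self._rfd, 4096)
            except BlockingIOError:
                return drained  # pipe is empty
            if not chunk:
                return drained  # write end closed (EOF)
            drained += len(chunk)

    def close(self) -> None:
        os.close(self._rfd)
        os.close(self._wfd)


if __name__ == "__main__":
    ev = PollableEvent()
    # A flood of signals with no reader: a blocking write end would hang
    # once the pipe buffer filled; here the extra signals are just dropped.
    sent = sum(ev.signal() for _ in range(200_000))
    print("accepted:", sent, "pending:", ev.is_pending())
    print("drained:", ev.clear(), "pending:", ev.is_pending())
    ev.close()
```

Dropping the extra bytes is safe in this design because the poller only needs to learn that at least one wakeup is pending; it drains everything in one `clear()` call and then processes whatever work queue the wakeup refers to.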
Flags: needinfo?(mcmanus)
Cool. I'll see if I get this hang again. (Given how often I've been hitting it recently, if I don't see it in a week or so, it has probably been fixed.)
Whiteboard: [necko-active]
Assignee: nobody → jrgm
Flags: needinfo?(jrgm)
So, I believe I hit the same hang in the past two weeks, but I didn't have time right then to capture a trace because I had a more pressing problem to address. Sorry. If I trigger it again and have a bit of time to capture the stack, I will.
Flags: needinfo?(jrgm)
Whiteboard: [necko-active] → [necko-backlog]
Priority: -- → P1
Priority: P1 → P3
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE