Closed Bug 774779 Opened 9 years ago Closed 9 years ago

Firefox sync is causing jank

Categories

(Firefox :: Sync, defect)

defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 600059

People

(Reporter: BenWa, Unassigned)

Details

We're seeing in profiles that Firefox sync is nesting the event loop to create async calls:
http://people.mozilla.com/~bgirard/cleopatra/?report=6d898dcec658b899f4fffc588b4b21c396b68752

Source:
http://mxr.mozilla.org/mozilla-central/source/services/common/async.js#89

I don't see any comments of analysis showing that spinning an event loop from this code is safe AND is fundamentally neccesary.
Yes we are. We don't like it either. Fixing this will require rewriting Sync or a time machine.

Bug 744321 tracks rewriting Sync. I promise we won't have any event loop spinning when we're done.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 744321
Do we have an ETA on the rewrite? If it's far out we should take steps to mitigate this since these are very serious problems.
(In reply to Benoit Girard (:BenWa) from comment #0)
> I don't see any comments of analysis showing that spinning an event loop
> from this code is safe AND is fundamentally neccesary.

FWIW, this is somewhat convenient in our metro work since we want to force a 'sync' sync on suspension of the browser. We will need to hold the calling thread no matter what since once we return from Metro's suspend event callback the process is immediately suspended.
(In reply to Benoit Girard (:BenWa) from comment #2)
> Do we have an ETA on the rewrite? If it's far out we should take steps to
> mitigate this since these are very serious problems.

Can you elaborate? Nobody has complained about this before. Given us bad looks because of event loop spinning, yes. But, nobody has really said it is a fundamental problem for performance.

(In reply to Jim Mathies [:jimm] from comment #3)
> FWIW, this is somewhat convenient in our metro work since we want to force a
> 'sync' sync on suspension of the browser. We will need to hold the calling
> thread no matter what since once we return from Metro's suspend event
> callback the process is immediately suspended.

If Metro needs to spin the event loop, Metro's Sync can deal with that problem, IMO.
Here is an example profile where sync is preventing UI from responding(switching tabs in this case). May this have something to do with our event filtering due to nested event loops?
http://people.mozilla.com/~bgirard/cleopatra/?report=6d898dcec658b899f4fffc588b4b21c396b68752
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Summary: Firefox sync is nesting event loop → Firefox sync is causing jank
Event loop spinning in Sync needs to DIAF. We are actively working towards that and this will get done when the rewrite is complete. I don't think it makes sense to track this general issue separately.

As for timeline, we would like to see things finished by EOY 2012, if not sooner. No clue what the actual timeline will be.

http://people.mozilla.com/~bgirard/cleopatra/?report=6d898dcec658b899f4fffc588b4b21c396b68752 points to Sync's main "do a sync" synchronous function - sync(). This function effectively spins the event loop while waiting for an observer to fire. If we make it async, we'll just get to the next layer in the onion and the profiler will report that function.

I don't believe we have any evidence that Sync's event loop spinning causes jank except in about:sync-tabs. The profiler points to it, but I think the profiler is wrong. If sync() caused UI freeze, Firefox would be freezing for seconds during sync, especially during initial sync. We don't see this. So, I'm inclined to think the event filtering doesn't account for event loop spinning properly.

Admittedly, I don't know what all can lead to UI freeze and jank and how the profiler and reporter actually work. So, maybe I'm just horribly ignorant. I welcome enlightenment.

I'm still inclined to resolve this as a dupe. If we identify particularly bad bits actually causing jank, we can look at paving over those until the complete async rewrite is complete.
(In reply to Gregory Szorc [:gps] from comment #6)
> Admittedly, I don't know what all can lead to UI freeze and jank and how the
> profiler and reporter actually work. So, maybe I'm just horribly ignorant. I
> welcome enlightenment.

Your not ignorant and this profiling tool is very new. It's not obvious from this profile what is causing the jank.

I filed this bug strictly because of the crash potential of spinning a nested event loop. Ehsan mention to me that he debugged a crash that was traced back to sync nesting an event loop but he couldn't remember what the outcome was.
(In reply to Benoit Girard (:BenWa) from comment #7)
> I filed this bug strictly because of the crash potential of spinning a
> nested event loop. Ehsan mention to me that he debugged a crash that was
> traced back to sync nesting an event loop but he couldn't remember what the
> outcome was.

If there were increased crash potential due to spinning a nested event loop, I would expect Sync users to experience significantly more crashes than non-Sync users due to prevalence of spinning in Sync. AFAIK, Sync is the single worst abuser of spinning in Firefox. Does anyone have data to test this theory?
I'm less concerned about crashing, and more concerned with bad behavior caused by unintentional spinning in observers (e.g., bug 697649).

As far as I understand it, there are no steps we can feasibly take to mitigate the use of spinning beyond "rewrite Sync to be asynchronous and not require event loop spinning". We're already working on that, as Greg mentioned.
(In reply to comment #8)
> (In reply to Benoit Girard (:BenWa) from comment #7)
> > I filed this bug strictly because of the crash potential of spinning a
> > nested event loop. Ehsan mention to me that he debugged a crash that was
> > traced back to sync nesting an event loop but he couldn't remember what the
> > outcome was.
> 
> If there were increased crash potential due to spinning a nested event loop, I
> would expect Sync users to experience significantly more crashes than non-Sync
> users due to prevalence of spinning in Sync. AFAIK, Sync is the single worst
> abuser of spinning in Firefox. Does anyone have data to test this theory?

I don't have data, but I have seen this in at least one occasion.  Essentially, nesting event loops can potentially cause functions to be re-entered in ways that they do not anticipate.  That can corrupt the program state, which may manifest in a number of ways, or it can result in crashes, etc.  I have seen this at least in one case quite a while ago while I was debugging an intermittent crash which turned out to be caused by this re-entrancy issue.
Duping this back to our "don't spin the event loop" bug. There's nothing we can do to directly mitigate this, other than… to not spin.
Severity: critical → major
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
OS: Mac OS X → All
Priority: P1 → --
Hardware: x86 → All
Resolution: --- → DUPLICATE
Duplicate of bug: 600059
Component: Firefox Sync: Backend → Sync
Product: Cloud Services → Firefox
You need to log in before you can comment on or make changes to this bug.