Closed Bug 1401236 Opened 2 years ago Closed 2 years ago

Sync hangs main process for multiple seconds, apparently after visiting links

Categories

(Firefox :: Sync, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
firefox57 --- affected

People

(Reporter: jryans, Assigned: lina)

References

Details

Attachments

(1 file)

I can't seem to trigger this issue reliably, so that's not the best...

In any case, I noticed in recently Nightlies that the main process would hang for multiple seconds occasionally.

Here's a profile of the issue happening:

https://perfht.ml/2hfamef

The key lines in the stack during hang appear to be:

addChangedID (resource://services-sync/engines.js:145)
js::RunScript
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct)
<name omitted>
js::ForwardingProxyHandler::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const
js::CrossCompartmentWrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const
js::Proxy::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&)
js::proxy_Call(JSContext*, unsigned int, JS::Value*)
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct)
Interpret(JSContext*, js::RunState&)
js::RunScript(JSContext*, js::RunState&)
onVisit
js::RunScript
js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct)
<name omitted>
JS_CallFunctionValue(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)
nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)
PrepareAndDispatch
SharedStub
nsNavHistory::NotifyOnVisit(nsIURI*, long long, long long, long long, int, nsTSubstring<char> const&, bool, unsigned int, unsigned int, nsTSubstring<char16_t> const&)
mozilla::places::(anonymous namespace)::NotifyVisitObservers::Run()
It's probably the history tracker reading persisted changed IDs from disk. We use `JSONFile#load` to read the file (https://searchfox.org/mozilla-central/rev/1c13d5cf85f904afb8976c02a80daa252b893fca/toolkit/modules/JSONFile.jsm#180)...which is async, except I see we don't check if `this.dataReady` is true *before* we read. That might be the issue here.
Assignee: nobody → kit
Status: NEW → ASSIGNED
Comment on attachment 8909891 [details]
Bug 1401236 - Don't reread already loaded data in `JSONFile#load`.

https://reviewboard.mozilla.org/r/181382/#review186624

This makes me nervous for 2 reasons:

1) It seems like it might be the case that some consumers *intend* to re-read data from disk at some point.
2) I don't see the JSON file thing in the profile? Instead I see us spending a lot of time in nsNavHistory::NotifyOnVisit and its callbacks. It seems activity stream especially is trying to call getScreenshotForURL and spending a lot of time doing that.

Maybe I'm misreading the profile? What makes you think this is the culprit?
Attachment #8909891 - Flags: review?(gijskruitbosch+bugs)
(In reply to :Gijs from comment #3)
> 1) It seems like it might be the case that some consumers *intend* to re-read
data from disk at some point.

Maybe, though it's not clear to me why they would, since we check `this.dataReady` and bail immediately after `read`. At best, we've done a lot of busy work to read the contents into memory again, only to discard them because the file is already loaded.

Sync calls `load` and spins a nested event loop unconditionally (https://searchfox.org/mozilla-central/rev/1c13d5cf85f904afb8976c02a80daa252b893fca/services/sync/modules/engines.js#100), so we'll reread the tracker file every time you visit a page. I can move the `dataReady` check into Sync, but thought it made more sense to do here.

> Maybe I'm misreading the profile? What makes you think this is the culprit?

It's likely not the main culprit, but I noticed `get changedIDs` showing up repeatedly in the profile. We can easily avoid that work, especially for large tracker files.
(In reply to Kit Cambridge (he/him) [:kitcambridge] (UTC-7) from comment #4)
> (In reply to :Gijs from comment #3)
> > 1) It seems like it might be the case that some consumers *intend* to re-read
> data from disk at some point.
> 
> Maybe, though it's not clear to me why they would, since we check
> `this.dataReady` and bail immediately after `read`. At best, we've done a
> lot of busy work to read the contents into memory again, only to discard
> them because the file is already loaded.
> 
> Sync calls `load` and spins a nested event loop unconditionally
> (https://searchfox.org/mozilla-central/rev/
> 1c13d5cf85f904afb8976c02a80daa252b893fca/services/sync/modules/engines.
> js#100), so we'll reread the tracker file every time you visit a page. I can
> move the `dataReady` check into Sync, but thought it made more sense to do
> here.

Fair point!


> > Maybe I'm misreading the profile? What makes you think this is the culprit?
> 
> It's likely not the main culprit, but I noticed `get changedIDs` showing up
> repeatedly in the profile. We can easily avoid that work, especially for
> large tracker files.

OK, that makes sense. We might need to keep the bug open to do further investigation?
Keywords: leave-open
Comment on attachment 8909891 [details]
Bug 1401236 - Don't reread already loaded data in `JSONFile#load`.

https://reviewboard.mozilla.org/r/181382/#review186642

Kit's right, I was wrong, let's do this. :-)
Attachment #8909891 - Flags: review+
Pushed by kcambridge@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c0decd9037fe
Don't reread already loaded data in `JSONFile#load`. r=Gijs
We were chatting about this on IRC, and while I agree the JSONFile patch is correct, I don't see how it explains jank. I suspect Gijs may be correct that ActivityStream is the actual culprit here (but is attributed to Sync due to us spinning event loops, which we are working hard to kill). :jryans, any chance you could try disabling AS to see if that theory may actually be correct?
Flags: needinfo?(jryans)
(In reply to Mark Hammond [:markh] from comment #8)
> We were chatting about this on IRC, and while I agree the JSONFile patch is
> correct, I don't see how it explains jank. I suspect Gijs may be correct
> that ActivityStream is the actual culprit here (but is attributed to Sync
> due to us spinning event loops, which we are working hard to kill). :jryans,
> any chance you could try disabling AS to see if that theory may actually be
> correct?

I mean, that and/or get another profile with a build that has this fix, which won't have sync in the stacks anymore. :-)
(In reply to :Gijs from comment #9)
> I mean, that and/or get another profile with a build that has this fix,
> which won't have sync in the stacks anymore. :-)

I think Sync may still appear (although it seems likely it will no longer be JSONFile.load) - sync has .promiseSpinningly which always spins an event loop and is probably implicated. We'll be "done" asyncifying sync when promiseSpinningly no longer exists :)
(In reply to Mark Hammond [:markh] from comment #11)
> sync has .promiseSpinningly which always spins an event
> loop 

hrm - that's possibly not actually true, depending on how these promises resolve and the semantics of nsIThreadManager - but I'm sure we can still expect sync to pop up in odd stack traces after this patch.
(In reply to Mark Hammond [:markh] from comment #8)
> We were chatting about this on IRC, and while I agree the JSONFile patch is
> correct, I don't see how it explains jank. I suspect Gijs may be correct
> that ActivityStream is the actual culprit here (but is attributed to Sync
> due to us spinning event loops, which we are working hard to kill). :jryans,
> any chance you could try disabling AS to see if that theory may actually be
> correct?

D'oh, I even knew about the issues with Sync event loops... Guess I should have expanded the profile further. :/

In an attempt to test this, I have not yet updated to a Nightly with the fix here.  Instead, I disabled "browser.newtabpage.activity-stream.enabled" in about:config.  It looks like AS listens for the pref change, so I did not restart Firefox.  (The AS content is now gone from about:newtab.)

I'll leave it this way for a bit and see if the issue happens again.  Keeping ni? for now while I test.
With AS disabled today, I did not notice this bug anymore.

Should I morph this into an AS bug somehow?
Flags: needinfo?(jryans)
See Also: → 1401817
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #14)
> With AS disabled today, I did not notice this bug anymore.
> 
> Should I morph this into an AS bug somehow?

Given this bug has landed a patch, I think another bug is best - I just opened bug 1401817. The patch here is now in central, so I'll close this.
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.