inhibit nsDNSservice restarts

RESOLVED FIXED in Firefox 61

Status

()

enhancement
P1
normal
RESOLVED FIXED
Last year
Last year

People

(Reporter: bagder, Assigned: bagder)

Tracking

Trunk
mozilla61
Points:
---

Firefox Tracking Flags

(firefox61 fixed)

Details

(Whiteboard: [necko-triaged][trr])

Attachments

(2 attachments)

The DNS service is shutdown and restarted again in several scenarios, most notably when one of its prefs is changed. This then drags the resolver, TRRService and others with it and they too have to restart. It is also shutdown and re-inited by nsIOService when going offline/online.

In the TRRService case, this makes short moments in time when there's no TRRService during which nsHostResolver will then not be able to check mode and for default to NATIVE which for users of TRR-only is a serious flaw and for trr-first and race users at least an annoyance.

The resolver restart means flushed DNS cache that also can be avoided.

It is also an unnecessary complicated procedure that takes time and CPU without a good reason (that I can see).
Okay, here's an initial take on this. Consider it a work in progress, not ripe for merge but I'm interested in feedback on the approach before I spend too much time going down this road

It currently leaks memory (mysteriously) as shown in this valgrind try run:
https://treeherder.mozilla.org/logviewer.html#?job_id=169435174&repo=try&lineNumber=38285
Attachment #8961328 - Flags: feedback?(valentin.gosu)
Comment on attachment 8961328 [details] [diff] [review]
0001-DNSservice-remove-all-restarts.patch

Review of attachment 8961328 [details] [diff] [review]:
-----------------------------------------------------------------

I assume the leak is because the patch doesn't call nsDNSService::Shutdown() anymore. Not sure if we should do that in the DNS service or IO service.

I agree that a full restart is an overkill here, so this is mostly a good idea. I think the reason we did it like this is because things got stuck sometimes, when changing networks or coming back from sleep, and restarting it flushes and reinitializes everything.
Attachment #8961328 - Flags: feedback?(valentin.gosu) → feedback+
Comment on attachment 8962738 [details]
bug 1447642 - no more DNSService restarts

https://reviewboard.mozilla.org/r/231598/#review237532

::: commit-message-de322:3
(Diff revision 1)
> +bug 1447642 - no more DNSService restarts r?valentin
> +
> +... and subsequently no nsHostResolver nor TRRService restarts.

We need a better description and comment

::: netwerk/base/nsIOService.cpp
(Diff revision 1)
> -
> -    // We need to get references to the DNS service so that we can shut it
> -    // down later. If we wait until the nsIOService is being shut down,
> -    // GetService will fail at that point.
> -
> -    mDNSService = do_GetService(NS_DNSSERVICE_CONTRACTID, &rv);

So, now the DNS service will get initialized by another do_GetService via nsDNSService::GetSingleton().

Are we sure we don't want to initialize it from here?

::: netwerk/dns/nsDNSService2.cpp:1168
(Diff revision 1)
> +        if (mResolverPrefsUpdated && mResolver) {
> +            mResolver->SetCacheLimits(mResCacheEntries, mResCacheExpiration,
> +                                      mResCacheGrace);
> -    }
> +        }
> +    } else if (!strcmp(topic, NS_XPCOM_SHUTDOWN_OBSERVER_ID)) {
> +        Shutdown();

This is OK. My only worry is that we might get another resolver request after shutting down. I haven't checked to see if that's possible, but I didn't see anything that prevents it.
Attachment #8962738 - Flags: review?(valentin.gosu) → review+
Comment on attachment 8962738 [details]
bug 1447642 - no more DNSService restarts

https://reviewboard.mozilla.org/r/231598/#review237832

::: commit-message-de322:3
(Diff revision 1)
> +bug 1447642 - no more DNSService restarts r?valentin
> +
> +... and subsequently no nsHostResolver nor TRRService restarts.

I'll clarify the motive much better in the next iteration.

::: netwerk/base/nsIOService.cpp
(Diff revision 1)
> -
> -    // We need to get references to the DNS service so that we can shut it
> -    // down later. If we wait until the nsIOService is being shut down,
> -    // GetService will fail at that point.
> -
> -    mDNSService = do_GetService(NS_DNSSERVICE_CONTRACTID, &rv);

Any particular reason why we should? All our tests run fine without it started here and my manual tests haven't shown any problems.

The IOservice doesn't need to know about nor care about the DNSservice anymore so why would it care about starting it?

I'm think the DNSService is already started in most cases when the IOservice starts although I'll admit I haven't researched this closely.

::: netwerk/dns/nsDNSService2.cpp:1168
(Diff revision 1)
> +        if (mResolverPrefsUpdated && mResolver) {
> +            mResolver->SetCacheLimits(mResCacheEntries, mResCacheExpiration,
> +                                      mResCacheGrace);
> -    }
> +        }
> +    } else if (!strcmp(topic, NS_XPCOM_SHUTDOWN_OBSERVER_ID)) {
> +        Shutdown();

I think it is a fair concern.

But this patch doesn't at all change how the resolver gets shutdown or how it resolves or doesn't resolve host names during shutdown.

Shutting down the resolver is done the exact same way as before: the DNSService shuts it down when it is told to shutdown. The DNSservice "own" the resolver in that regard, both before this change and after.

If a name resolve request is done after a shutdown has been initiated, it will be denied exactly like before. The difference in shutdown from before compared to with this patch, is this:

Before: the IOservice got triggered by the shutdown event, which sets IOService in offline mode and calls DNSService->shutdown()

Now: the DNSService itself waits for that same shutdown event and shuts down upon receival.

While it opens up for a small difference in handling depending on exactly when each service receives the event and acts on it, I can't see any obvious error in this approach. I don't think closing it from ioservice made it any more resilient for resolves during shutdown. I think it already deals with such in a sufficient manner.

Can you think of any specific code paths or series of events that could be more fragile with this approach?
(In reply to Daniel Stenberg [:bagder] from comment #5)
> Comment on attachment 8962738 [details]
> bug 1447642 - no more DNSService restarts
> 
> https://reviewboard.mozilla.org/r/231598/#review237832
> 
> ::: commit-message-de322:3
> (Diff revision 1)
> > +bug 1447642 - no more DNSService restarts r?valentin
> > +
> > +... and subsequently no nsHostResolver nor TRRService restarts.
> 
> I'll clarify the motive much better in the next iteration.
> 
> ::: netwerk/base/nsIOService.cpp
> (Diff revision 1)
> > -
> > -    // We need to get references to the DNS service so that we can shut it
> > -    // down later. If we wait until the nsIOService is being shut down,
> > -    // GetService will fail at that point.
> > -
> > -    mDNSService = do_GetService(NS_DNSSERVICE_CONTRACTID, &rv);
> 
> Any particular reason why we should? All our tests run fine without it
> started here and my manual tests haven't shown any problems.
> 
> The IOservice doesn't need to know about nor care about the DNSservice
> anymore so why would it care about starting it?
> 
> I'm think the DNSService is already started in most cases when the IOservice
> starts although I'll admit I haven't researched this closely.

So, if we don't do it in nsIOService::Init, it will be done the first time the DNSService is instantiated.
We should assert that Init is called on the main thread (just to be sure).
Also if not done at startup, it cause a delay for the first DNS request. On the other hand, startup could be faster. Not sure which one is better.

> ::: netwerk/dns/nsDNSService2.cpp:1168
> (Diff revision 1)
> > +        if (mResolverPrefsUpdated && mResolver) {
> > +            mResolver->SetCacheLimits(mResCacheEntries, mResCacheExpiration,
> > +                                      mResCacheGrace);
> > -    }
> > +        }
> > +    } else if (!strcmp(topic, NS_XPCOM_SHUTDOWN_OBSERVER_ID)) {
> > +        Shutdown();
> 
> I think it is a fair concern.
> 
> But this patch doesn't at all change how the resolver gets shutdown or how
> it resolves or doesn't resolve host names during shutdown.

Fair enough. In any case, the failures should be obvious if they happen - leaks/hangs/crashes.
(In reply to Valentin Gosu [:valentin] from comment #7)

> We should assert that Init is called on the main thread (just to be sure).

Yes that's a good idea, I'll make sure to add that. Being careful and making sure are good virtues!

> Also if not done at startup, it cause a delay for the first DNS request. On
> the other hand, startup could be faster. Not sure which one is better.

It ought to be a very rare startup that doesn't do any name resolves at all. There are for example a lot of DNS prefetches being done in a typical startup, and possibly restoring/loading content with names that get resolved.

But yes, removing the DNSService startup from IOservice->Init() delays it a bit in the startup sequence. I checked, and IOService now gets inited a while before DNSService.

At the same time it feels wrong to start the dnsservice from ioservice now that it won't be directly used by it any longer. Further, the dnsservice startup procedure should be fairly fast, so I don't think doing it on-demand slightly later is bad.

> > But this patch doesn't at all change how the resolver gets shutdown or how
> > it resolves or doesn't resolve host names during shutdown.
> 
> Fair enough. In any case, the failures should be obvious if they happen -
> leaks/hangs/crashes.

Indeed!
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/cc52c1744016
no more DNSService restarts r=valentin
https://hg.mozilla.org/mozilla-central/rev/cc52c1744016
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.