Closed Bug 478718 Opened 11 years ago Closed 10 years ago

Move last Places sync to xpcom-shutdown

Categories

(Toolkit :: Places, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla1.9.3a1

People

(Reporter: mak, Assigned: mak)

References

Details

Attachments

(2 files, 10 obsolete files)

to be sure everything is correctly synced down, we should try to move the last sync to xpcom-shutdown.
experimenting on this. it would simplify xpcshell testing a lot, plus the fact actually any xpcshell test using places out of places test folders is leaking/asserting.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Depends on: 508394
so, this is blocked by 2 issues:
- we leak as soon as any browser test uses a places service, or better, as soon as we notify something. that will cause PlacesDBFlush to initialize, and that's probably what we leak.
- bug 508394. A thread that is serving favicons has already shutdown here, for some reason, and the used async statement cannot be finalized. this does not allow us to close the db connection.
Depends on: 507199
No longer depends on: 507199
Attached patch wip v0.1 (obsolete) — Splinter Review
(In reply to comment #2)
> so, this is blocked by 2 issues:
> - we leak as soon as any browser test uses a places service, or better, as soon
> as we notify something. that will cause PlacesDBFlush to initialize, and that's
> probably what we leak.

Based on my investigation in bug 511735, this is indeed the case.  Specifically, nsPlacesDBFlush caches storage statements, and only finalizes them at xpcom-shutdown.
i solved the toolkit leaks probably, it was due to the fact PlacesDBFlush was being released too late, i'm removing the observer categories from the category manager and that feels like working.

I'm trying to remove any Places backend dependancy on quit-application, since it is never called in xpcshell tests, and cleanup should be guaranteed by xpcom-shutdown. moving work down the line to xpcom-shutdown does not show any problem so far.

the remaining failing tests are:
- toolkit: test_moz-anno_favicon_mime_type.js (due to bug 508394)
- browser: most tests are leaking something quite similar to nsBrowserGlue
solved the favicon issue, was due to the fact that test was not notifying anything, so nsPlacesDBFlush was not initing and we were trying to finalize the statement at ~FaviconService, most likely after xpcom-shutdown-threads.
Attached patch wip v0.2 (obsolete) — Splinter Review
Needs some clean-up but is mostly working.

the only real Places test failure is toolkit test_404630.js, this is still firing a strange assertion due to async favicons stuff:

WARNING: cannot post event if not initialized: file z:/mozilla-central/netwerk/protocol/http/src/nsHttpConnectionMgr.cpp, line 167
WARNING: nsExceptionService ignoring thread destruction after shutdown: file z:/mozilla-central/xpcom/base/nsExceptionService.cpp, line 194
###!!! ASSERTION: what happened to xpcom-shutdown?: 'mHostFiltersArray.Length() == 0 && mFilters == nsnull && mPACMan == nsnull', file z:/mozilla-central/netwerk/base/src/nsProtocolProxyService.cpp, line 319
necko!nsProtocolProxyService::`scalar deleting destructor'+0x000000000000000F
necko!nsProtocolProxyService::Release+0x00000000000000D4 (z:\mozilla-central\netwerk\base\src\nsprotocolproxyservice.cpp, line 290)
xpcom_core!nsCOMPtr_base::assign_assuming_AddRef+0x0000000000000059 (z:\mozilla-central\obj-i686-pc-mingw32\browser\dist\include\nscomptr.h, line 457)
xpcom_core!nsCOMPtr_base::assign_with_AddRef+0x000000000000002C (z:\mozilla-central\obj-i686-pc-mingw32\browser\xpcom\build\nscomptr.cpp, line 90)
xpcom_core!nsCOMPtr<nsISupports>::operator=+0x0000000000000012 (z:\mozilla-central\obj-i686-pc-mingw32\browser\dist\include\nscomptr.h, line 976)
xpcom_core!FreeServiceContractIDEntryEnumerate+0x000000000000002F (z:\mozilla-central\xpcom\components\nscomponentmanager.cpp, line 1736)
xpcom_core!PL_DHashTableEnumerate+0x00000000000000F8 (z:\mozilla-central\obj-i686-pc-mingw32\browser\xpcom\build\pldhash.c, line 754)
xpcom_core!nsComponentManagerImpl::FreeServices+0x000000000000005E (z:\mozilla-central\xpcom\components\nscomponentmanager.cpp, line 1748)
xpcom_core!mozilla::ShutdownXPCOM+0x000000000000026C (z:\mozilla-central\xpcom\build\nsxpcominit.cpp, line 831)
xpcom_core!NS_ShutdownXPCOM_P+0x000000000000000C (z:\mozilla-central\xpcom\build\nsxpcominit.cpp, line 744)
xpcom!NS_ShutdownXPCOM+0x000000000000000D (z:\mozilla-central\xpcom\stub\nsxpcomstub.cpp, line 179)
xpcshell!main+0x0000000000000CED (z:\mozilla-central\js\src\xpconnect\shell\xpcshell.cpp, line 1795)
xpcshell!__tmainCRTStartup+0x00000000000001A6 (f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c, line 586)
xpcshell!mainCRTStartup+0x000000000000000D (f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c, line 403)
Attachment #396562 - Attachment is obsolete: true
the nsProtocolProxyService xpcom-shutdown oberver is added in nsProtocolProxyService::Init, sounds like the test is so short that init happens just after xpcom-shutdowm, and the destructor runs just after that.
breaking the mainthread did help, just added a 0 timeout to the test. These situations are hard to be seen in real cases.
Attached patch patch v1.0 (obsolete) — Splinter Review
I've pushed this to the tryserver, Shawn, after vacuum review, if you want to give a first-review of this, would be awesome.
Attachment #399148 - Attachment is obsolete: true
Attachment #399207 - Flags: review?(sdwilsh)
hm oh well actually i did a wrong edit to test_404630.js, i forgot to make it pending so i practically disabled it, sigh.
tryservers did signal a crash in
test_database_sync_after_quit_application.js | application crashed (minidump found), actually trying to get the stack hoping tryservers have this feature.

and another one timed out in
test_database_sync_after_quit_application_with_removeAllPages.js

it is pretty awesome i did not hit any of these 2 locally, but i'm used to that at least.
Comment on attachment 399207 [details] [diff] [review]
patch v1.0

For more detailed review comments, please see http://reviews.visophyte.org/r/399207/

on file: browser/components/places/tests/unit/test_browserGlue_prefs.js line 232
>   dump("\nTEST " + (++testIndex) + ": " + test.description + "\n");

just change this to print


on file: browser/components/places/tests/unit/test_browserGlue_smartBookmarks.js line 189
> /*
>   dump("Found Children for folder with itemId " + aFolderItemId + ":\n");
>   for (let i = 0; i < cc; i++) {
>     let node = rootNode.getChild(i);
>     dump(i + ") " + node.title + "\n");
>   }
> */

should remove this in your final patch


on file: browser/components/places/tests/unit/test_browserGlue_smartBookmarks.js line 215
>   dump("\nTEST " + (++testIndex) + ": " + test.description + "\n");

print


on file: toolkit/components/places/public/nsPIPlacesDatabase.idl line 49
> [scriptable, uuid(8e6d4f8a-4b8e-4026-9fca-517c4494ddb7)]

We need to modify this uuid because of changes.


on file: toolkit/components/places/src/nsNavHistory.h line 437
>   nsresult FinalizeInternalStatements();

nit: NS_HIDDEN_(nsresult)


on file: toolkit/components/places/src/nsNavHistory.h line 444
>   nsresult CommitPendingChanges();

nit: NS_HIDDEN_(nsresult)


on file: toolkit/components/places/src/nsNavHistory.cpp line 5574
>     // need it for a final flush.  It could also be useful for tests.

drop the "It could be useful..." comment


on file: toolkit/components/places/src/nsPlacesDBFlush.js line 484
>     { category: "bookmark-observers", entry: "nsPlacesDBFlush"},
>     { category: "history-observers", entry: "nsPlacesDBFlush" },

I would prefer if we formatted this like this:
{ category: "bookmark-observers",
  entry: "nsPlacesDBFlush" },
...


on file: toolkit/components/places/src/nsPlacesMacros.h line 45
>     ENUMERATE_WEAKARRAY(array, type, method)                                   \
>     const nsCOMArray<type> &entries = cache.GetEntries();                      \
>     for (PRInt32 idx = 0; idx < entries.Count(); ++idx)                        \
>         entries[idx]->method;                                                  \

Can you explain this change please?  Why do we want to enumerate observers and
then categories?


on file: toolkit/components/places/tests/sync/test_database_sync_after_quit_application.js line 90
>   os.notifyObservers(null, "xpcom-shutdown", null);

This test's filename should be changed too!  This applies to the other tests
as well!


on file: toolkit/components/places/tests/unit/test_404630.js line 62
>   do_timeout(0, run);

explain this please (in commment in code)
Attachment #399207 - Flags: review?(sdwilsh)
(In reply to comment #13)

> on file: toolkit/components/places/src/nsPlacesMacros.h line 45
> >     ENUMERATE_WEAKARRAY(array, type, method)                                   \
> >     const nsCOMArray<type> &entries = cache.GetEntries();                      \
> >     for (PRInt32 idx = 0; idx < entries.Count(); ++idx)                        \
> >         entries[idx]->method;                                                  \
> 
> Can you explain this change please?  Why do we want to enumerate observers and
> then categories?

I'm just trying to reduce orangeness, this is an alternative path to bug 511965, even if i think won't change much, but could be worth trying.

I solved the problem with favicons on shutdown discarding lazy messages. nsProtocolProxyService shutdowns at xpcom-shutdown, we can't rely on it, so i'm just dropping favicons lazy messages if we are shutting down.
Attached patch patch v1.1 (obsolete) — Splinter Review
Attachment #399207 - Attachment is obsolete: true
PS: this builds on top of patch in bug 515952.
Attached patch patch v1.2 (obsolete) — Splinter Review
update, builds on top of Bug 516940 and some other patch expected to land before.
Attachment #400742 - Attachment is obsolete: true
Attached patch patch v1.3 (obsolete) — Splinter Review
my patches queue is getting better, unbitrot and cleanup.

i still have one issue to solve, the two sync/test_database_sync_after_shutdownXXX tests are crashing or timing out without apparent reasons, i suppose the fact i have to notify xpcom-shutdown manually is somehow involved, but i can't reproduce locally, so i have to use tryservers till i figure out what's up... sigh.
Attachment #402094 - Attachment is obsolete: true
Attached patch patch v1.4 (obsolete) — Splinter Review
one last detail to check, but Tryserver finally gave me a green!
Attachment #403845 - Attachment is obsolete: true
Whiteboard: [builds on top of bug 516940]
Attached patch patch v1.5 (obsolete) — Splinter Review
this should be fine
Attachment #405268 - Attachment is obsolete: true
Attachment #405499 - Flags: review?(sdwilsh)
test_database_sync_after_shutdown_with_removeAllPages is failing just on windows, not sure about the reason, on my vm i had a threading issues that caused an additive flush to be sent (onItemAdded called twice instead of once).
Apart that this is in good shape for a review i can keep investigate that.
Whiteboard: [builds on top of bug 516940]
Comment on attachment 405499 [details] [diff] [review]
patch v1.5

http://reviews.visophyte.org/r/405499/

on file: toolkit/components/places/src/nsNavHistory.h line 589
>   void CommitLazyMessages(PRBool aIsShutdown = PR_FALSE);

NS_HIDDEN_(void) since you are here please


on file: toolkit/components/places/src/nsNavHistory.cpp line 5913
> void
> nsNavHistory::CommitLazyMessages(PRBool aIsShutdown)

likewise, NS_HIDDEN_(void)_ here too.


r=sdwilsh
Attachment #405499 - Flags: review?(sdwilsh) → review+
Attached patch patch v1.6 (obsolete) — Splinter Review
Sounds like Windows did not love my category manager registration change, it is probably due to persistence (bug 364864?).
Btw, just removing the new category entry "entry" value fixes all issues, so my problem was most likely due to an old catman registered sync component that was serving duplicate flushes (so i was getting 2 flushes for onItemAdded). Odd. And this was just on Windows (both my vm and the tryserver), while Linux and OS X were behaving correctly.
I just got rid of the "entry" description going back to the old registration and let xpcomUtils cleanup the entry on shutdown, while i use the new syncStopped to ensure we don't try to sync after shutdown.
Attachment #405499 - Attachment is obsolete: true
Attached patch patch v1.7 (obsolete) — Splinter Review
better, this still removes the category observer. Notice i also remove the category in history service, just to be sure we don't leak if somebody registers as an observer and forgets to unregister. Previous version was leaking, this fixes the leak. Tested on win and lin.
And should be final version to push.
Attachment #405899 - Attachment is obsolete: true
Attached patch patch v1.8Splinter Review
fixed test orange with a polling strategy. me 1 tryserver 0
Attachment #405916 - Attachment is obsolete: true
crossing fingers
http://hg.mozilla.org/mozilla-central/rev/123e48c8edf4
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a1
Blocks: 515952
Given that we're deliberately playing with shutdown timings, this may not come as a surprise.

Nevertheless, this change is showing a 8.5% Ts Shutdown regression on Vista, but big wins on dirty profile shutdowns on Mac and Linux so far. Is this a trade-off we're making deliberately, or is the elevated Vista TShutdown cause for alarm?
it's a bit unclear, previous changes did show a large improvement on Ts shutdown dirty profiles on Vista so this 8.5% on Vista comes a bit surprising.
There are no changes that could hit Vista more than other OSes.
The vista data is noisy - this might take a couple more cycles to shake out before we call it an honest to goodness regression.
i have to revert a change to the fact i delete categories and sdwilsh pointed out components are cached and that could hurt, not sure if it will change anything. Btw will push it soon.

I also need more data abour what Ts Shutdown is exactly measuring, from when to when.
pushed additional changeset to avoid removing categories on shutdown, as per irc conversation with sdwilsh.
http://hg.mozilla.org/mozilla-central/rev/9c466d6f82f1

Now i see a 61% regression on XP MAX dirty, it is still largely below the value before bug 516940 and this, but there's lot of strange noise around these tests.
I don't actually understand this win.  We aren't doing any less work on shutdown, are we?  We just moved the work we did in quit-application to xpcom-shutdown.
yes, apart discarding lazy favicons messages (async channels opened after xpcom-shutdown cleanup) most of the stuff is just moving things later.

per irc discussion i will also push a change to make deleteCategoryEntry not persistent and file a bug to check why we leak if we don't remove the entry manually (the category cache should already clean that up on xpcom-shutdown).
better checking https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsICategoryManager/deleteCategory

looks like deleteCategory is safe, since it won't persist the removal across sessions

DeleteCategoryEntry instead is not safe, but thanks to bug 364864 it did not hurt us.

since nsICategoryManager is frozen and won't change in future, we could add back the deleteCategory calls in historyService and get rid of deleteCategoryEntry calls in DBFlush.

So 2 options:
- remove DeleteCategoryEntry calls and add back DeleteCategory
- make DeleteCategoryEntry calls get a false aPersist param
even if following code, DeleteCategory calls mTable.clear(), that removes entries from the hash table, instead DeleteCategoryEntry with aPersist=false, if the entry is a persistent one it sets leaf->nonpValue = nsnull; (nonpersistentvalue), if it's not a persistent entry it removes it from the hash...

So looks like documentation could be wrong, but this code is a bit messed up.
(In reply to comment #34)
> - remove DeleteCategoryEntry calls and add back DeleteCategory
I would prefer this.

Reopening to track that this isn't done.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
actually the leak is due to the fact for some reason in tests we can end up notifying PlacesDBFlush for the first time on xpcom-shutdown or later (i suppose during).

nsCategoryCache empties its local cache at xpcom-shutdown, if the category manager sends to the categoryCache an addEntry notification after this cleanup, it will be added to the local cache and never removed, causing the leak.

I'm filing a bug against xpcom, since the code allows this possibility, but probably we also have to follow the stack and see what is adding a category manager entry after xpcom-shutdown.
filed Bug 522353
Depends on: 522353
Attached patch be safeSplinter Review
i think while i investigate bug 522353, we should take this.
I prefer this rather than add back deleteCategory because these calls are really specific about requests, the third param specifies we don't want to persist the change.
DeleteCategory as well won't persist the change, but that is only documented on mdc, the idl does not say anything, and the code is following a different path from deleteCategoryEntry(persist=false). So i don't trust too much into that method.

there is no rush due to bug 364864, but i'd really prefer to put this in before next nightly, just to be safe.
Attachment #406321 - Flags: review?(sdwilsh)
Attachment #406321 - Flags: review?(sdwilsh) → review+
Comment on attachment 406321 [details] [diff] [review]
be safe

I'd really rather us just back this out until the issues are fixed, but this is fine as a stopgap.
well i did some deeper analysis in bug 522353, and the reason for the leak is quite clear. So backout won't give back anything really useful, will probably just make harder to try understanding Ts Shutdown roller-coaster behavior.
pushed.
http://hg.mozilla.org/mozilla-central/rev/1a780ff55536

Now, i'll wait tomorrow to check bug 522353 with Biesi.
i pushed bug 522353 and removed the workaround in nsPlacesDBFlush
http://hg.mozilla.org/mozilla-central/rev/5b631f59dda3
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Depends on: 529821
You need to log in before you can comment on or make changes to this bug.