Closed Bug 1568950 Opened 5 years ago Closed 5 years ago

Use counters are not recorded properly when a tab is closed

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

VERIFIED FIXED
mozilla70
Tracking Status
firefox70 + verified

People

(Reporter: bzbarsky, Assigned: edgar)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 1 obsolete file)

STEPS TO REPRODUCE:

  1. Start a nightly from today or later with a clean profile.
  2. Open a tab with http://web.mit.edu/bzbarsky/test-use-counter.html in it. This just exercises a codepath I know has a usecounter in today's build.
  3. Open a second tab with about:memory in it.
  4. Close the tab opened in step 2.
  5. Minimize memory usage in about:memory.
  6. Open about:telemetry.
  7. Type USE_COUNTER in the search field at top right and hit enter.

EXPECTED RESULTS: There is a histogram for the USE_COUNTER2_HTMLDOCUMENTNAMEDGETTERHIT_PAGE counter (and one for USE_COUNTER2_HTMLDOCUMENTNAMEDGETTERHIT_DOCUMENT).

ACTUAL RESULTS: There is no such histogram.

ADDITIONAL INFORMATION:

  • If I load a different page from http://web.mit.edu between steps 2 and 4, that does not change things.
  • If I load a different page from http://web.mit.edu between steps 2 and 4, and switch the order of steps 4 and 5 (so memory minimization happens before tab closing and gets to destroy the test document when it happens), then I do see the histograms.

We collect the use counters in ~Document. It seems plausible that when we close the tab we just kill the process without destroying its documents and therefore never collect use counters for anything that is loaded or in bfcache at that point?

This seems like it would bias our data, esp. for sites that claim you should close the tab after using them (e.g. bank sites).

[Tracking Requested - why for this release]: We are not getting correct data from our use counter telemetry.

I'd argue that having the code in ~Document is wrong since destructors shouldn't have side effects; they should just handle memory management (and may be deferred or skipped for reasons of memory management). Instead, it should live in code that runs at a known time as we navigate away from the document.

Not sure if you're taking this on bz, but if not, Hsin-Yi, can you help find an owner?

Flags: needinfo?(htsai)

I don't plan to fix this; I just ran into it when trying to understand the data from a use counter I had added that wasn't making sense to me.

(In reply to Liz Henry (:lizzard PTO till Aug 10 n-i to :pascald instead) from comment #3)

Not sure if you're taking this on bz, but if not, Hsin-Yi, can you help find an owner?

Will work on it.

Thanks Edgar!

Assignee: nobody → echen
Flags: needinfo?(htsai)
Priority: -- → P2

We do destruct Document and try to collect the data by calling Telemetry::Accumulate when we close the tab, it seems that Telemetry doesn't successfully report the data collected during XPCOM_shutdown procedure to parent process?

Telemetry schedules content data reporting in a IPC timer, but the content process is already in shutdown procedure, so the timeout handler doesn't have chance to be executed.

We do destruct Document and try to collect the data by calling Telemetry::Accumulate when we close the tab

Fwiw, I wasn't seeing us do that (via printfs added to ~Document and the like). It's possible that we have multiple issues going on here...

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #9)

We do destruct Document and try to collect the data by calling Telemetry::Accumulate when we close the tab

Fwiw, I wasn't seeing us do that (via printfs added to ~Document and the like). It's possible that we have multiple issues going on here...

The callstack for my case is something like,

#0 0x00007fdd35a2c2f0 in mozilla::dom::Document::ReportUseCounters(mozilla::dom::Document::UseCounterReportKind) (this=0x7fdd25ad9000, aKind=mozilla::dom::Document::UseCounterReportKind::eDefault) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:14465
#1 0x00007fdd35a2b061 in mozilla::dom::Document::~Document() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:1767
#2 0x00007fdd3786ec45 in nsHTMLDocument::~nsHTMLDocument() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/html/nsHTMLDocument.cpp:158
#3 0x00007fdd3786ec69 in nsHTMLDocument::~nsHTMLDocument() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/html/nsHTMLDocument.cpp:158
#4 0x00007fdd35a2d12e in mozilla::dom::Document::DeleteCycleCollectable() (this=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.cpp:1910
#5 0x00007fdd35aca715 in mozilla::dom::Document::cycleCollection::DeleteCycleCollectable(void*) (this=0x7fdd3fc907d0 <mozilla::dom::Document::_cycleCollectorGlobal>, p=0x7fdd25ad9000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/base/Document.h:496
#6 0x00007fdd333aca7c in SnowWhiteKiller::MaybeKillObject(SnowWhiteKiller::SnowWhiteObject&) (this=0x7ffcacc85550, aObject=...) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2429
#7 0x00007fdd333a3408 in SnowWhiteKiller::~SnowWhiteKiller() (this=0x7ffcacc85550) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2416
#8 0x00007fdd33393633 in nsCycleCollector::FreeSnowWhite(bool) (this=0x7fdd49e88ee0, aUntilNoSWInPurpleBuffer=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:2609
#9 0x00007fdd33396c73 in nsCycleCollector::BeginCollection(ccType, nsICycleCollectorListener*) (this=0x7fdd49e88ee0, aCCType=ShutdownCC, aManualListener=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3584
#10 0x00007fdd333965d7 in nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) (this=0x7fdd49e88ee0, aCCType=ShutdownCC, aBudget=..., aManualListener=0x0, aPreferShorterSlices=false) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3413
#11 0x00007fdd33396371 in nsCycleCollector::ShutdownCollect() (this=0x7fdd49e88ee0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3357
#12 0x00007fdd333975ce in nsCycleCollector::Shutdown(bool) (this=0x7fdd49e88ee0, aDoCollect=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3645
#13 0x00007fdd33399204 in nsCycleCollector_shutdown(bool) (aDoCollect=true) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/base/nsCycleCollector.cpp:3999
#14 0x00007fdd3358a184 in mozilla::ShutdownXPCOM(nsIServiceManager*) (aServMgr=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/build/XPCOMInit.cpp:706
#15 0x00007fdd33589d25 in NS_ShutdownXPCOM(nsIServiceManager*) (aServMgr=0x0) at /home/edgar/Workspace/mercurial/mozilla-central/xpcom/build/XPCOMInit.cpp:564
#16 0x00007fdd3b69ae2c in XRE_TermEmbedding() () at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:223
#17 0x00007fdd34132c92 in mozilla::ipc::ScopedXREEmbed::Stop() (this=0x7fdd49eb2978) at /home/edgar/Workspace/mercurial/mozilla-central/ipc/glue/ScopedXREEmbed.cpp:90
#18 0x00007fdd384e566c in mozilla::dom::ContentProcess::CleanUp() (this=0x7fdd49eb2000) at /home/edgar/Workspace/mercurial/mozilla-central/dom/ipc/ContentProcess.cpp:211
#19 0x00007fdd3b69baf3 in XRE_InitChildProcess(int, char**, XREChildData const*) (aArgc=13, aArgv=0x7ffcacc86048, aChildData=0x7ffcacc85ec0) at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/nsEmbedFunctions.cpp:758
#20 0x00007fdd3b6a6e27 in mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) (this=0x7fdd49e026c0, argc=15, argv=0x7ffcacc86048, aChildData=0x7ffcacc85ec0) at /home/edgar/Workspace/mercurial/mozilla-central/toolkit/xre/Bootstrap.cpp:67
#21 0x00005575466b418a in content_process_main(mozilla::Bootstrap*, int, char**) (bootstrap=0x7fdd49e026c0, argc=15, argv=0x7ffcacc86048) at /home/edgar/Workspace/mercurial/mozilla-central/browser/app/../../ipc/contentproc/plugin-container.cpp:56
#22 0x00005575466b42e9 in main(int, char**, char**) (argc=16, argv=0x7ffcacc86048, envp=0x7ffcacc860d0) at /home/edgar/Workspace/mercurial/mozilla-central/browser/app/nsBrowserApp.cpp:267

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #0)

  1. Open a tab with http://web.mit.edu/bzbarsky/test-use-counter.html in it. This just exercises a codepath I know has a usecounter in today's

BTW, the link in comment #0 seems broken, I got a "Page Not Found", so I try to reproduce the bug with following script.

<!DOCTYPE html>
<form name="x"></form>
<button id="btn">click me</button>
<pre id="result"></pre>
<script>
function w(msg) {
  result.textContent += msg + '\n';
}

btn.addEventListener("click", function(e) {
  w(document.x);
});
</script>
Attached file test.html

BTW, the link in comment #0 seems broken

Er, sorry, should be http://web.mit.edu/bzbarsky/www/test-use-counter.html

The callstack for my case is something like,

That's in a debug build, right? It's coming through ContentProcess::CleanUp which is only called in debug build per its documentation. In opt builds, at least according to the documentation and code inspection of ContentChild::ActorDestroy, we land at ProcessChild::QuickExit and kill the process without doing the CC pass that you're seeing collect the document.

So it does sound like we have two possible issues here:

  1. In an opt build the "report the use counters" code is not even reached.
  2. Even if it were reached (as in debug builds), it might not report them successfully due to being unable to send the IPC message.

Right?

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #13)

That's in a debug build, right?

Yes.

  1. In an opt build the "report the use counters" code is not even reached.
  2. Even if it were reached (as in debug builds), it might not report them successfully due to being unable to send the IPC message.

Right?

Exactly. Looks like we need to find a proper time to report data for Document, maybe in Document::OnPageHide and it requires change some logic to ensure that we don't report data duplicately.

Recording use counters in ~Document doesn't work as expected for some cases, for instance,
user closes tab right after using them:

  • In opt build, Document destructor isn't even reached given that we just kill the process
    without doing cleanup;
  • In debug build, we reach the recording code, but it doesn't report use counters successfully
    due to it is unable to send the IPC message.

This patch moves the recording code at the time that user navigates away from current page,
and since user might also navigate back same document from bfcache, in order to not report
duplicated data, we introduce new flags to keep track of whether the use counters need to
be recorded.

I'm trying to understand the proposed bfcache behavior. It would be pretty helpful if the commit message described the behavior being aimed for, not just the means used to accomplish it...

(In reply to Edgar Chen [:edgar] from comment #17)

Created attachment 9084630 [details]
Bug 1568950 - Part 1: Record use counters when user navigates away from current page;

This also affects the Telemetry::*_DOCUMENTS_DESTROYED, it is still recorded only once per document or page, but it is not recorded on document destroy any more, maybe we should rename they. However, I am not sure if renaming it will break the usage of use counters data. :froydnj, would you mind shed some light here? Thank you.

Flags: needinfo?(nfroyd)

(In reply to Boris Zbarsky [:bzbarsky, bz on IRC] from comment #19)

I'm trying to understand the proposed bfcache behavior. It would be pretty helpful if the commit message described the behavior being aimed for, not just the means used to accomplish it...

So the proposed behavior is something that, if a user

  • navigate to PAGE_X, and PAGE_X use FEATURE_A.
  • navigate to PAGE_Y
  • navigate back to PAGE_X from history.
  • PAGE_X now use another FEATURE_B.

In this case, we should record the usage of FEATURE_B, but don't want to record FEATURE_A twice, does this make sense?
(I will revise commit message a bit, thanks)

So the proposed behavior is that every time we get pagehide for a page we report use counters that are newly set from the time the document was created or the last time we reported, whichever happened last? That makes sense, I think, as long as the "number of pageloads" and "number of document creations" counters that I think telemetry uses as the denominator is not affected by this change.

(In reply to Edgar Chen [:edgar] from comment #20)

(In reply to Edgar Chen [:edgar] from comment #17)

Created attachment 9084630 [details]
Bug 1568950 - Part 1: Record use counters when user navigates away from current page;

This also affects the Telemetry::*_DOCUMENTS_DESTROYED, it is still recorded only once per document or page, but it is not recorded on document destroy any more, maybe we should rename they. However, I am not sure if renaming it will break the usage of use counters data. :froydnj, would you mind shed some light here? Thank you.

Renaming it to something else means you just have discontinuous data: you have one histogram before the change and one, different, histogram after the change.

I think as long as its still fundamentally recorded once per document, you should be fine keeping the name.

Flags: needinfo?(nfroyd)

(In reply to Nathan Froyd [:froydnj] from comment #23)

Renaming it to something else means you just have discontinuous data: you have one histogram before the change and one, different, histogram after the change.

I think as long as its still fundamentally recorded once per document, you should be fine keeping the name.

That is what I am worried about. I will keep the name and revise the comment around it a bit. Thanks.

(In reply to Edgar Chen [:edgar] from comment #14)

Looks like we need to find a proper time to report data for Document, maybe in Document::OnPageHide

I just realized this still doesn't work properly for the svg document which is from <img> element, in such setup the pagehide of svg document is called from https://searchfox.org/mozilla-central/rev/3a61fb322f74a0396878468e50e4f4e97e369825/image/SVGDocumentWrapper.cpp#54 which still have same issue as comment #13. We need to have a specifial handle for this case.

Attachment #9084631 - Attachment description: Bug 1568950 - Part 2: Get rid of using PromiseUtils.jsm in browser_use_counters.js; → Bug 1568950 - Part 3: Get rid of using PromiseUtils.jsm in browser_use_counters.js;

Remove nsIDOMWindowUtils.forceUseCounterFlush calls give that we won't manually
report use counters in real case.

Attached file Bug 1568950 - Part 5: Add more tests (obsolete) —
Attachment #9084630 - Attachment description: Bug 1568950 - Part 1: Record use counters when user navigates away from current page; → Bug 1568950 - Part 1: Record use counters in Document::Destroy, instead of in ~Document;
Attachment #9086159 - Attachment description: Bug 1568950 - Part 2: Record use counters for SVG image when its owning document naviagtes away from current page; → Bug 1568950 - Part 2: Record use counters for SVG image when the image is going to be discarded;
Attachment #9086435 - Attachment is obsolete: true
Pushed by echen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/81f72abb79ff
Part 1: Record use counters in Document::Destroy, instead of in ~Document; r=smaug
https://hg.mozilla.org/integration/autoland/rev/57d7a634ef8f
Part 2: Record use counters for SVG image when the image is going to be discarded; r=smaug
https://hg.mozilla.org/integration/autoland/rev/0521dcbe7c20
Part 3: Get rid of using PromiseUtils.jsm in browser_use_counters.js; r=smaug
https://hg.mozilla.org/integration/autoland/rev/1c1c4ca0bc14
Part 4: Update existing tests to make them cover real use case; r=smaug
https://hg.mozilla.org/integration/autoland/rev/2e365eaea75b
Part 5: Remove nsIDOMWindowUtils::forceUseCounterFlush; r=smaug

(In reply to Tim Smith 👨‍🔬 [:tdsmith] from comment #26)

Will this also fix bug 1413266?

I am not sure if this bug could also fix it, the scenario of bug 1413266 is different from this bug.
But now we report use counters earlier, I would expect bug 1413266 could benefit from this a bit.

Regressions: 1576154
Flags: qe-verify+

I have reproduced this issue using Firefox 70.0a1 (2019.07.25) on Win 10 x64.
I can confirm this issue is fixed, I verified using Firefox 70.0b8 Win 10 x64, Ubuntu 18.04 x64 and macOS 10.13.6.

Flags: qe-verify+
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: