Open Bug 1540812 Opened 6 years ago Updated 3 years ago

document.lastModified for local XML documents shouldn't cause main thread IO

Categories

(Core :: DOM: Core & HTML, defect, P3)

defect

Tracking

()

Performance Impact low

People

(Reporter: Gijs, Unassigned)

References

Details

(Keywords: perf, perf:responsiveness)

Stack:

nsThread::ProcessNextEvent(bool,bool *) [xul.dll]
nsresult nsInputStreamReadyEvent::Run() [xul.dll]
nsInputStreamPump::OnInputStreamReady []
nsInputStreamPump::OnStateStart []
nsresult nsInputStreamPump::OnInputStreamReady(class nsIAsyncInputStream *) [xul.dll]
nsBaseChannel::OnStartRequest(nsIRequest *) [xul.dll]
XMLHttpRequestMainThread::OnStartRequest []
nsresult mozilla::dom::XMLHttpRequestMainThread::OnStartRequest(class nsIRequest *) [xul.dll]
mozilla::dom::XMLDocument::StartDocumentLoad(char const *,nsIChannel *,nsILoadGroup *,nsISupports *,nsIStreamListener * *,bool,nsIContentSink *) [xul.dll]
mozilla::dom::Document::StartDocumentLoad(char const *,nsIChannel *,nsILoadGroup *,nsISupports *,nsIStreamListener * *,bool,nsIContentSink *) [xul.dll]
mozilla::dom::Document::RetrieveRelevantHeaders(nsIChannel *) [xul.dll]
nsresult nsLocalFile::GetLastModifiedTime(__int64 *) [xul.dll]
nsLocalFile::ResolveAndStat []
nsresult nsLocalFile::ResolveAndStat() [xul.dll]

At a minimum, this should be OMT for async XMLHttpRequest/fetch(), as I'd expect the fetch to be OMT, too.

If we're not able to do this, we should probably consider providing the current date/time, which is what we already do for chrome:// documents and what Chrome does for file:/// documents anyway. Or only doing the IO when the file:/// URL consumer asks the document for the info (which I assume will be "never" for 99.99% of docs).

Note that we also do this for SVG documents loaded as images, where it's not even possible for anyone to ask for the lastModified property...

Annnnd I was hoping to use fetch() to work around this, but no way to get an XML doc out of a fetch Response object except getting the string and throwing it through DOMParser, which requires allocating and then GC'ing the huge (blocklist.xml) string, which is also a bad idea. :-(

Priority: -- → P3
Summary: document.lastModified for XML documents shouldn't cause main thread IO → document.lastModified for local XML documents shouldn't cause main thread IO

Squinting at this, this looks like a t-shirt size of Large...

Whiteboard: [fxperf] → [fxperf:p2][fxperfsize:L]

(In reply to Mike Conley (:mconley) (:⚙️) from comment #3)

Squinting at this, this looks like a t-shirt size of Large...

I mean, from discussions online, looks like Chrome just lies and always returns the equivalent of Date.now()... if it's OK for us to do the same, this should be straightforward, as the code already falls back on that ( https://searchfox.org/mozilla-central/rev/b756e6d00728dda4121f8278a744381d8643317a/dom/base/Document.cpp#3042-3048 ). It'd just involve removing the file: implementation ( https://searchfox.org/mozilla-central/rev/b756e6d00728dda4121f8278a744381d8643317a/dom/base/Document.cpp#7486-7496 ) . This is OK per spec, AFAICT:

The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files. If the last modification date and time are not known, the attribute must return the current date and time in the above format.

( https://html.spec.whatwg.org/multipage/dom.html#dom-document-lastmodified )

Emilio, do you know who would make a decision about whether we were OK with doing so, given that that's what other browsers do and it seems (from my reading) to be allowed per the loophole in the spec?

Alternatively, we could probably update nsIFileChannel to provide access to this information directly, so that when we open the file off-main-thread we can store the information, and have it available for this code to use - but that's obviously slightly more complex than just removing stuff.

Flags: needinfo?(emilio)

So, that IO should only happen for local files, looking at:

https://searchfox.org/mozilla-central/rev/66086345467c69685434dd1c5177b30a7511b1a5/dom/base/Document.cpp#7491

And the issue is that it happens unconditionally, regardless of whether you access lastModified or not.

Note that Blink has a FIXME for this:

https://cs.chromium.org/chromium/src/third_party/blink/renderer/core/dom/document.cc?l=5399&rcl=35d81d3a437aed83241fae9cb30fe8f1ddef921f

So, thinking a bit out-loud:

We keep around a reference to the channel in Document::mChannel, can we stop accessing the modtype from RetrieveRelevantHeaders and do it lazily from the getter? I suspect that's not quite possible, or at least that'd defeat the purpose of lastModified at least slightly, as loading the document, then modifying the file, then reading lastModified could give the new rather than the old date.

If we want to get rid of this, we should probably raise an HTML spec issue. Probably Boris or Olli have opinions on that.

Flags: needinfo?(emilio)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #5)

So, thinking a bit out-loud:

We keep around a reference to the channel in Document::mChannel, can we stop accessing the modtype from RetrieveRelevantHeaders and do it lazily from the getter? I suspect that's not quite possible, or at least that'd defeat the purpose of lastModified at least slightly, as loading the document, then modifying the file, then reading lastModified could give the new rather than the old date.

Well, I assume that we do the reads of the file's actual content off-main-thread in the general case (certainly for async XHR which is what we're looking at here, but I'd expect the same for document requests tbh). We currently call RetrieveRelevantHeaders on the main thread from OnStartRequest, and we use the file reference that the nsIFileChannel keeps to check the last modified time. Presumably, the off-main-thread nsIFileChannel could get this information prior to notifying OnStartRequest and ensure it was available, exposing it directly through nsIFileChannel, thus avoiding the need to do IO on the main thread when looking this up? That wouldn't really change the time at which we check the LM time, we'd just ensure we do it off-main-thread. Am I missing an obvious reason this wouldn't work? :-)

Flags: needinfo?(emilio)

That should work off-hand, though I'm not the most familiar with necko.

That'd do the stat call also for file resources that aren't loaded as documents too, right? Maybe that's an acceptable price to pay. Or maybe we can coalesce the modified-time check with other stat() calls or such that we presumably may need to do.

Flags: needinfo?(emilio)

Downgrading to P3 because the XML blocklist is gone, long live the XML blocklist, so the number of cases where we hit this is probably (hopefully?) limited at this point.

Performance Impact: --- → P3
Whiteboard: [fxperf:p2][fxperfsize:L]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.