1452535 - Using MOZ HTTP log adds a lot of overhead

Reporter

Description

•

7 years ago

Attached image Change in Visual metrics with MOZ log turned on/off — Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36 Steps to reproduce: When we turn on the https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging HTTP log on Linux it affect Firefox and makes metrics that we measure a lot higher. I got two examples: 1. We run Browsertime on AWS c4.large where we test a couple of URLs of Wikipedia. We are using latest nightly of Firefox and tried turning on/off the HTTP log. I've tried it multiple times and it's always the same pattern (checkout the attached screenshot). Turning off the MOZ for the https://en.wikipedia.org/wiki/Barack_Obama page makes our first visual render become 1.5s faster. The same on other URLs with a lot of requests. Pages with lower amount of requests is affected, but not as much. We run with the following FF preferences: https://github.com/sitespeedio/browsertime/blob/master/lib/firefox/webdriver/firefoxPreferences.js And moz log configured with: timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5 2. We also use WebPageTest to measure the performance of our pages. The MOZ log is turned on there too and when we compare Firefox performance with Chrome we can see that Firefox is much slower but I think one of the main reasons for that is that the MOZ log is turned on and adds overhead. You can checkout one example where we compare the runs (we run on the exact same setup except that one is using latest stable Chrome and the other stable Firefox): http://wpt.wmftest.org/video/compare.php?tests=180409_1M_4X,180409_SV_4V First render happens at 1.1 s for Chrome and 4.0 for Firefox. I cannot turn of MOZ log though in that tool to verify that it is really the log that adds the overhead. Expected results: I emailed with Harald Kirschner who said I should file this bug and make sure that past (perf team eng manager) is cc:d (how do I do that).

Honza Bambas (:mayhemer)

Comment 1

•

7 years ago

It's expected to have an overhead. Also depends on if you use the "sync" module which flushes log writes after every line written to output file(s). Logging's purpose is to provide data for (known) bug analyzes and is not intended to run while doing any performance testing. If there is anything we can improve, let's do it. But there will IMO always be at least some overhead.

u408661

Comment 2

•

7 years ago

cc'ing (and ni?ing) :past as mentioned in the original report. Not sure there would be a whole lot for necko to do here - as Honza mentioned, enabling logging adds overhead, period. There may be an item here or there we could stop logging, or log slightly smarter, but most of our logging is pretty useful (I've honestly often wanted even more things to be logged). I have a sneaking suspicion this will end up either WONTFIX or in xpcom to look at the logging code in general.

Flags: needinfo?(past)

Eric Rahm [:erahm]

Comment 3

•

7 years ago

One option is to fine tune the use of log levels to move extremely verbose portions to LogLevel::Verbose, etc. For example all nsSocketTransport, nsHostResolver, & cache2 logging are done at the Debug (4) level. The default for nsHttp is also Debug with the exception of Http2Compression.cpp, Http2Session.cpp, Http2Stream.cpp, nsHttpConnection.cpp, nsHttpConnectionInfo.cpp, & nsHttpConnectionMgr.cpp defaulting to Verbose. Info is also used a fair amount [1]. Obviously that doesn't help in the reporter's case of "timestamp,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5" as those are all verbose. There are other options we can look into where the log data is emitted in a higher perf format that pushes off post processing to outside of Fx (similar to raw mach logs), but that hasn't been fully explored yet. Peter, what's your use case for enabling logging during perf testing? [1] https://searchfox.org/mozilla-central/search?q=symbol:M_9af0390facd94579&redirect=false

Flags: needinfo?(peter)

Peter Hedenskog

Reporter

Comment 4

•

7 years ago

WebPageTest.org has the HTTP log enabled and use it to create a HAR file for Firefox.

Peter Hedenskog

Reporter

Updated

•

7 years ago

Flags: needinfo?(peter)

Peter Hedenskog

Reporter

Comment 5

•

7 years ago

You can check it out here how it's used here: https://github.com/WPO-Foundation/wptagent/blob/master/internal/support/firefox_log_parser.py One problem IMHO is that since FF 54 the HTTP log has been the only way to automate to get HAR file for Firefox (since the HAR Export trigger was broken with 55). We will use https://github.com/devtools-html/har-export-trigger when it is ready (FF 61?) for our internal testing but we will also continue using WebPageTest.

Honza Bambas (:mayhemer)

Comment 6

•

7 years ago

Thanks for all the right questions and answers. I think we could do an optimization using the level, as mentioned by Eric. All top-level state stuff could go to just level 1 or 2, so that we'd lower the overhead while still providing what's needed. There is also some code that is log-enabled dependent as it does complex loops over pending or active connection and transaction lists. That could produce some overhead too and when enabled only for level 4+ we may have a win. Since I always wanted to have better layering of our net logging, this is a good opportunity to do it now!

Assignee: nobody → honzab.moz

Status: UNCONFIRMED → NEW

Ever confirmed: true

Priority: -- → P2

Panos Astithas (he/him) [:past] (please ni?)

Comment 7

•

7 years ago

Thanks for the ping, it seems that we've got a plan figured out now.

Flags: needinfo?(past)

Honza Bambas (:mayhemer)

Updated

•

7 years ago

Whiteboard: [necko-triaged]

Honza Bambas (:mayhemer)

Comment 8

•

7 years ago

Jason, when Kershaw is bask, can you please assign to him as we agreed? Thanks. Note that one of possible proposals was to have a separate module for logs that are interesting for webpage test. Comment 5 references the py code that does the log file parsing. It's the ultimate source of what we have to expose in this separate module or move to top-level (1) in the nsHttp module.

Assignee: honzab.moz → nobody

:Harald Kirschner :digitarald

Comment 9

•

7 years ago

Since this has been idle for a while I ni? Jason, assuming Honza meant to ping him.

Flags: needinfo?(jduell.mcbugs)

Panos Astithas (he/him) [:past] (please ni?)

Updated

•

7 years ago

Whiteboard: [necko-triaged] → [necko-triaged][perf-tools]

Kershaw Chang [:kershaw]

Assignee

Comment 10

•

7 years ago

(In reply to Honza Bambas (:mayhemer) from comment #8) > Jason, when Kershaw is bask, can you please assign to him as we agreed? > Thanks. > > Note that one of possible proposals was to have a separate module for logs > that are interesting for webpage test. > > Comment 5 references the py code that does the log file parsing. It's the > ultimate source of what we have to expose in this separate module or move to > top-level (1) in the nsHttp module. Assign to myself.

Assignee: nobody → kershaw

Flags: needinfo?(jduell.mcbugs)

Kershaw Chang [:kershaw]

Assignee

Comment 11

•

7 years ago

(In reply to Honza Bambas (:mayhemer) from comment #6) > Thanks for all the right questions and answers. > > I think we could do an optimization using the level, as mentioned by Eric. > All top-level state stuff could go to just level 1 or 2, so that we'd lower > the overhead while still providing what's needed. There is also some code > that is log-enabled dependent as it does complex loops over pending or > active connection and transaction lists. That could produce some overhead > too and when enabled only for level 4+ we may have a win. > > Since I always wanted to have better layering of our net logging, this is a > good opportunity to do it now! Honza, could you please explain more about what's top-level state? The log level 1 and 2 [1] actually refer to errors and warnings. I think really make error and warning logs go to level 1 and 2 and top-level logs go to level 3. What do you think? [1] https://searchfox.org/mozilla-central/rev/c9272ef398954288525e37196eada1e5a93d93bf/netwerk/protocol/http/HttpLog.h#45-46

Flags: needinfo?(honzab.moz)

Honza Bambas (:mayhemer)

Comment 12

•

7 years ago

See https://github.com/WPO-Foundation/wptagent/blob/master/internal/support/firefox_log_parser.py for logs WPO is using. Move those to level 1.

Flags: needinfo?(honzab.moz)

Kershaw Chang [:kershaw]

Assignee

Comment 13

•

7 years ago

Attached file Bug 1452535 - Move some logs to level 1 — Details

WPT parses HTTP log and create a HAR file. To reduce the overhead of logging, this patch moves some logs that are used by WPT parser to level 1.

Kershaw Chang [:kershaw]

Assignee

Comment 14

•

7 years ago

Hi Peter, I've created a build which produces logs that firefox_log_parser.py needs with lower log level. Could you download it from [1] and give it a test? From comment #0, I assume you run firefox on mac. If not, please let me know. Please configure MOZ_LOG to: timestamp,nsHttp:1,nsSocketTransport:1,nsHostResolver:1 Thanks. [1] https://queue.taskcluster.net/v1/task/eVaNCxeUQNOsax5zFZdyGQ/runs/0/artifacts/public/build/target.dmg

Flags: needinfo?(peter)

Peter Hedenskog

Reporter

Comment 15

•

7 years ago

Sorry, I have very limited time until Christmas. To be frank for me it doesn't matter if the log adds overhead for Firefox, as long the metrics I get out of Firefox is stable. But for you if you are using WebPageTest to compare metrics between Firefox and Chrome, I think it's really valuable so that when you compare, you comparison is fare.

Kershaw Chang [:kershaw]

Assignee

Comment 16

•

7 years ago

(In reply to Peter Hedenskog from comment #15) > Sorry, I have very limited time until Christmas. To be frank for me it > doesn't matter if the log adds overhead for Firefox, as long the metrics I > get out of Firefox is stable. But for you if you are using WebPageTest to > compare metrics between Firefox and Chrome, I think it's really valuable so > that when you compare, you comparison is fare. Thanks. I think I would still land this change. If you want, you can use "timestamp,nsHttp:1,nsSocketTransport:1,nsHostResolver:1" to make the log in the future.

Flags: needinfo?(peter)

Pulsebot

Comment 17

•

7 years ago

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e7720f22eab1 Move some logs to level 1 r=mayhemer

Andreea Pavel [:apavel]

Comment 18

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e7720f22eab1

Status: NEW → RESOLVED

Closed: 7 years ago

status-firefox65: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla65

Stephen Donner [:stephend] Not actively reading bugmail

Comment 19

•

7 years ago

Cross-referencing here, for those interested in the WebPageTest status, here; I've put up a PR over in https://github.com/WPO-Foundation/wptagent/pull/219, which will/should immediately help us in Firefox Nightly-runs' overheads. Also on the WebPageTest side, Patrick Meenan landed a fix to the base Ubuntu image for Firefox on Linux, in EC2 (https://github.com/WPO-Foundation/webpagetest/commit/7a4d01158c58a01a4d613beb036d4a856abce16c), and now we have (starting tomorrow), accurate, up-to-date Firefox Nightly timings. Thanks for filing this, Peter, and thanks for the whole team, and for fixing it, Kershaw!

Patrick Meenan

Comment 20

•

7 years ago

I just pushed the change selectively for Nightly builds and it should roll out over the next hour. I haven't run extensive testing to be sure, but in the one quick test I ran, it looks like I lost the connection setup information for OCSP checks. Not a huge deal but wondering if that goes down a section of code where the log levels weren't updated. A few other thoughts: - I don't necessarily need full sync semantics, I just need the buffers written within a few seconds of the activity. Without sync it looks like there is no time-based flushing so a chunk of data is kept in RAM until the browser is actually closed. If it can flush but not necessarily fsync() that would meet my needs perfectly. - On Linux I may be able to use eatmydata to disable the fsync's at the OS level but still get the buffers flushing quickly: http://manpages.ubuntu.com/manpages/trusty/man1/eatmydata.1.html

Honza Bambas (:mayhemer)

Comment 21

•

7 years ago

Thanks Patrick. Kershaw, do you think you could help with this? I have no clear idea how to provide flushing (fflush) on a time based manner. We could add it to the 'test' function [1] and simply keep a timestamp (std::time) of the last write and fflush the file when std::time() - epoch > something. Not sure how std::time can be expensive. [1] https://searchfox.org/mozilla-central/rev/55895c49f55073d82d977cb74ec1d3a71ae4b25f/xpcom/base/Logging.h#193

Flags: needinfo?(kershaw)

Patrick Meenan

Comment 22

•

7 years ago

Another optimization I can do on my side is to only use sync when a test has more than one step. Otherwise I can close the browser at the end of the test before reading out the log. I'll implement that across all platforms and eatmydata on Linux (for the multi-step case) and see where that gets us.

Patrick Meenan

Comment 23

•

7 years ago

Looks like at least some events are missing from the logs. I'm not seeing the granular transfer timings in the WPT waterfalls. Nightly (long, solid download bars): https://www.webpagetest.org/result/181121_QE_5ff4dd3c0ad4504dafaa36711c194803/1/details/#waterfall_view_step1 Stable (individual chunk timings): https://www.webpagetest.org/result/181121_WB_5d8c71b8e5ff89e051be10f7318d004b/2/details/#waterfall_view_step1 I'll see about getting the other optimizations in place but it looks like I'll probably need to switch back to log level 5.

Kershaw Chang [:kershaw]

Assignee

Comment 24

•

7 years ago

(In reply to Honza Bambas (:mayhemer) from comment #21) > Thanks Patrick. Kershaw, do you think you could help with this? I have no > clear idea how to provide flushing (fflush) on a time based manner. We > could add it to the 'test' function [1] and simply keep a timestamp > (std::time) of the last write and fflush the file when std::time() - epoch > > something. Not sure how std::time can be expensive. > > > [1] > https://searchfox.org/mozilla-central/rev/ > 55895c49f55073d82d977cb74ec1d3a71ae4b25f/xpcom/base/Logging.h#193 I've filed bug 1509090 for time-based flushing.

Flags: needinfo?(kershaw)

Kershaw Chang [:kershaw]

Assignee

Comment 25

•

7 years ago

(In reply to Patrick Meenan from comment #23) > Looks like at least some events are missing from the logs. I'm not seeing > the granular transfer timings in the WPT waterfalls. > Could you show me what events you are looking for in your parser code? Thanks.

Flags: needinfo?(patmeenan)

Patrick Meenan

Comment 26

•

7 years ago

There are quite a lot of them: https://github.com/WPO-Foundation/wptagent/blob/master/internal/support/firefox_log_parser.py#L136 FWIW, I just removed sync from the logging, added eatmydata and switched log level back to 5: https://github.com/WPO-Foundation/wptagent/commit/61c647ea88c0d97268a0c6a41900e3d051f7f238 I can switch the log level back again as needed but if I'm consuming most of the debug events then it won't make much of a difference. Hopefully the move away from sync logging gets most of the benefits.

Flags: needinfo?(patmeenan)

Honza Bambas (:mayhemer)

Comment 27

•

7 years ago

(In reply to Kershaw Chang [:kershaw] from comment #25) > (In reply to Patrick Meenan from comment #23) > > Looks like at least some events are missing from the logs. I'm not seeing > > the granular transfer timings in the WPT waterfalls. > > > > Could you show me what events you are looking for in your parser code? > Thanks. Note that during the review process I went through all the lines you expect in the parser code. All of them have been updated in our code base.

Stephen Donner [:stephend] Not actively reading bugmail

Comment 28

•

6 years ago

Attached image Screenshot comparing two WebPageTest runs with log levels 3 vs./and 5 — Details

Stephen Donner [:stephend] Not actively reading bugmail

Comment 29

•

6 years ago

(In reply to Stephen Donner [:stephend] from comment #28)

Created attachment 9035800 [details]
Screenshot comparing two WebPageTest runs with log levels 3 vs./and 5

Looking at those two runs: 1) log level 3 (Nightly) and (2) log level 5 (release), respectively), we can see the following missed requests, in the Firefox Nightly run with log level 3:

oscp.digicert.com - /
oscp.digicert.com - /
aus5.mozilla.org - update.xml
oscp.digicert.com - /
oscp.digicert.com - /
oscp.digicert.com - /
oscp.digicert.com - /
oscp.digicert.com - /
aus5.mozilla.org - /
push.services.mozilla.com

(the latter two appear to not be properly disabled via https://github.com/WPO-Foundation/wptagent/blob/30310759c4120bf5054d4f7541671916e8e0d7db/internal/support/Firefox/profile/prefs.js?)

Patrick, any corrections/comments?

Flags: needinfo?(patmeenan)

Patrick Meenan

Comment 30

•

6 years ago

(In reply to Stephen Donner [:stephend] from comment #29)

Patrick, any corrections/comments?

From the screenshot it looks like it is also missing the connection events (DNS, socket connect, TLS negotiation). I wouldn't necessarily be surprised if it is also missing the HTTP/2 stream information.

It's possible that the events are missing because the log strings changed slightly by going to log level 3 and it's failing to parse the events generally and just falling back to the dev tools events. It's also possible the events just aren't there.

Flags: needinfo?(patmeenan)

Change in Visual metrics with MOZ log turned on/off 7 years ago Peter Hedenskog 144.44 KB, image/png		Details
Bug 1452535 - Move some logs to level 1 7 years ago Kershaw Chang [:kershaw] 46 bytes, text/x-phabricator-request		Details \| Review
Screenshot comparing two WebPageTest runs with log levels 3 vs./and 5 6 years ago Stephen Donner [:stephend] Not actively reading bugmail 2.92 MB, image/png		Details