Open Bug 1744483 Opened 3 years ago Updated 8 months ago

Automate generate HAR file on desktop and mobile

Categories

(DevTools :: Netmonitor, enhancement)

Firefox 94
enhancement

Tracking

(Not tracked)

People

(Reporter: peter, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files, 1 obsolete file)

Steps to reproduce:

Hi, this is a feature wish: adding a way to automate getting a HAR file from Firefox on desktop and mobile. Today we use the HAR export trigger https://github.com/firefox-devtools/har-export-trigger for Firefox on desktop, but that does not work on mobile, and you need to have DevTools open, which adds overhead to the tests.

Let me explain why using a HAR is important for us:

At the Wikimedia Foundation we use RUM and synthetic tests to measure the performance of Wikipedia. For our synthetic tests (we start the browser, access one or multiple Wikipedia pages, and measure them by recording a video of the screen and analyzing the result) we also collect a HAR file for each test that we run. This has been really important for us because it has helped us a lot in tracking down regressions. Here are a couple of examples of how it helped us:

  1. Having the HAR of the tests lets us see what actually existed on the page at the time we ran the tests. Since every page on Wikipedia is user generated (users can also add JS and CSS), it changes a lot during the day, and it is sometimes hard to know whether a change in performance comes from changes in how the Wikimedia team built the page or from content changed by users. Having that snapshot helps us know.

  2. We have different layers of backend caches at Wikimedia where we cache output that is served to the user. Different content can have different cache times, and we have had cases in the past where content mismatched (the HTML that was served expected another version of the CSS). Having the HAR, including the HTML in the HAR response, helped us see what we actually tested at that moment and what was wrong.

  3. When teams make changes at Wikimedia, it is sometimes really hard to know what kind of performance impact they will have. Each page is served with its own CSS and JS, and since different browsers prioritize requests differently, the HAR file is really helpful to see how different browsers handle our changes.

Today we use the HAR from Chrome (both desktop and Android) and Firefox (only desktop) using WebDriver, and that helps us, but it would be great to be able to get a HAR on both desktop and mobile without overhead in Firefox.

Here's some background on how we do things today:

Chrome

For Chrome we get the HAR by parsing the Chrome event trace log. When you start Chrome, you can turn on that logging, and then we parse it with a tool a friend of mine, Tobias, built a couple of years ago (https://github.com/sitespeedio/chrome-har). Using the raw event log from Chrome is kind of annoying because the Chrome team sometimes makes small or larger changes to the format, and those changes are rarely (or rather never) documented or communicated outside the Chrome team. It's also hard to get the HAR right, especially in a scenario where you test multiple pages one after another. So I will open the same bug for Chrome, asking if they can also make it easy to automate getting a HAR.

Firefox

Through the years Firefox has had two different approaches. Years ago (I think this was before Firefox 57) we used an extension, https://github.com/firebug/har-export-trigger, that could be triggered to get a HAR. At the time, the extension made it possible to get the HAR using JavaScript (for example, with Selenium you triggered the export and got the HAR back using plain JS). That version could also automatically export HAR files to disk: at Firefox startup you could set preferences to always store the HAR and to choose the location. What was good about that extension was that we could not see any overhead from running it at the time, and it was also pretty neat to store the HAR directly to disk through configuration.

With the rewrite of Firefox we needed to switch to the newly created extension https://github.com/firefox-devtools/har-export-trigger (Firefox 60+). The idea is the same: you install the extension and then you can get the HAR file using JavaScript. The problem with the new extension is that you need to have the DevTools panel open for it to work (it uses https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/devtools/network/getHAR). So when we run our tests we open DevTools (as small as possible, because we don't want it to affect our video). This also makes it impossible to run the extension on Firefox on a mobile phone: the extension works there, but you cannot open DevTools. Also, since DevTools is open, it adds overhead to the test, something like 10-20% slower page loads. Another problem is that the larger the HAR file, the longer it takes Firefox to generate it.
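For reference, the extension-based approach is typically driven from Selenium roughly as follows. This is a sketch under stated assumptions: har-export-trigger is installed, DevTools is open, and `HAR.triggerExport()` (the page-level function the extension exposes) resolves with the HAR `log` object; `getHarViaExtension` is a hypothetical helper name.

```javascript
// Script executed in the page through WebDriver's executeAsyncScript; the last
// argument is the callback WebDriver injects to hand a value back to Node.
const TRIGGER_EXPORT_SCRIPT = `
  const done = arguments[arguments.length - 1];
  try {
    HAR.triggerExport()
      .then((log) => done({ har: { log } }))
      .catch((e) => done({ error: String(e) }));
  } catch (e) {
    // e.g. HAR is not defined when the extension or DevTools is missing
    done({ error: String(e) });
  }
`;

// Hypothetical helper: `driver` is a selenium-webdriver WebDriver instance.
async function getHarViaExtension(driver) {
  const result = await driver.executeAsyncScript(TRIGGER_EXPORT_SCRIPT);
  if (result && result.error) {
    throw new Error(`HAR export failed: ${result.error}`);
  }
  return result.har;
}
```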

There's one last option with Firefox that we haven't used: the "MOZ HTTP log" (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging). You can turn on a raw HTTP log for Firefox that writes all requests/responses to disk. You can use that file to generate a HAR file, but it's quite complicated, and I would prefer that the Mozilla team, who know the internals best, do it.
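As an illustration, the MDN HTTP logging page enables the log through the `MOZ_LOG` and `MOZ_LOG_FILE` environment variables. From Node, launching Firefox with them could look like the sketch below. The module list is the one suggested on that page, the Firefox binary path is an assumption, and Firefox may write several log files derived from `MOZ_LOG_FILE` (for example one per process).

```javascript
const { spawn } = require('child_process');
const os = require('os');
const path = require('path');

// Environment for Firefox with HTTP logging enabled (modules/levels per MDN).
function httpLogEnv(logFile, baseEnv = process.env) {
  return {
    ...baseEnv,
    MOZ_LOG: 'timestamp,rotate:200,nsHttp:5,cache2:5,nsSocketTransport:5,nsHostResolver:5',
    MOZ_LOG_FILE: logFile,
  };
}

// Launch Firefox (binary path supplied by the caller) with the log turned on.
function launchFirefoxWithHttpLog(firefoxBinary, url) {
  const logFile = path.join(os.tmpdir(), 'firefox-http.log');
  const child = spawn(firefoxBinary, [url], { env: httpLogEnv(logFile), stdio: 'ignore' });
  return { child, logFile };
}
```

Turning that raw log into a HAR is the complicated part described above; this only covers producing the log.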

A couple of key things that are important for us:

  • An easy way to get the HAR, either stored automatically to disk or retrieved through plain JavaScript (using Selenium/WebDriver)
  • It would be super handy if the HAR could contain the response HTML, CSS and JS, or if including them were configurable. If I remember correctly, Firefox includes everything and then we remove the responses we do not need.
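The "include everything, then remove what we don't need" step can be sketched as a small post-processing function over a HAR 1.2 object. The MIME allow-list and the function name are illustrative assumptions; the function mutates the HAR in place.

```javascript
// Keep response.content.text only for the listed MIME types (example list).
function stripResponseBodies(
  har,
  keepMimeTypes = ['text/html', 'text/css', 'application/javascript', 'text/javascript']
) {
  for (const entry of har.log.entries) {
    const content = entry.response && entry.response.content;
    if (!content || content.text === undefined) continue;
    // Drop parameters such as "; charset=utf-8" before comparing.
    const mime = (content.mimeType || '').split(';')[0].trim().toLowerCase();
    if (!keepMimeTypes.includes(mime)) {
      delete content.text;
      delete content.encoding; // encoding only describes content.text
    }
  }
  return har;
}
```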

The Bugbug bot thinks this bug should belong to the 'WebExtensions::Storage' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Storage
Product: Firefox → WebExtensions

The enhancement request described in this issue doesn't seem to fit in any of the WebExtensions Bugzilla components: the reporter would like to be able to create a HAR file of all requests/responses, and in particular to be able to do so while connected to a Firefox for Android instance through ADB.

Based on the details from comment 0 it may better fit in either "DevTools :: Netmonitor" (e.g. to expose it as a method for a Firefox Remote Debugging Protocol actor), "Testing :: Marionette" (e.g. to expose it as a Marionette / WebDriver command) or "Remote Protocol :: CDP" (e.g. to expose it through Chrome DevTools Protocol support as in Chrome).

As an additional side note: I haven't checked whether the CDP agent that is already part of Firefox exposes enough of the Chrome DevTools Protocol to support the same data collection the reporter uses with Chrome.

In the meantime, I'm moving it to "Toolkit :: General" to be retriaged (but also adding needinfo to increase the chances that we have decided which component this fits better by the time we get to it as part of Firefox :: General triage).

Hi Jan and Henrik,
what is your opinion about this request? Do you think it would be better exposed as a Marionette/WebDriver command, through the "Firefox Remote Debugging Protocol" server, or through the "Chrome DevTools Protocol" agent?

Flags: needinfo?(odvarko)
Flags: needinfo?(hskupin)
Component: Storage → General
Product: WebExtensions → Toolkit

Oops, sorry, I missed that this bug wasn't properly triaged.
This was meant to discuss the opportunities to improve the current HAR features of DevTools.
So for now, I would like to keep it under the netmonitor umbrella.
If relevant, we would request some new features in CDP/WebDriver, but we are not there yet.

Note that the current implementation of CDP in Firefox is still limited to Network.requestWillBeSent and Network.responseReceived,
whereas chrome-har also listens to Network.requestServedFromCache, Network.requestWillBeSentExtraInfo, Network.responseReceivedExtraInfo, Network.dataReceived, Network.loadingFinished, Network.loadingFailed and Network.resourceChangedPriority.
And for the two events we already support, we are probably missing some data or have compatibility issues.

Given that, improving our CDP implementation to support HAR generation on top of CDP is probably not the right call, as that would require quite a significant effort, whereas we might prefer to invest in WebDriver.
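To make the gap concrete, here is a rough sketch of what a HAR-like entry looks like when only Network.requestWillBeSent and Network.responseReceived are available. Field names follow the CDP Network domain; the helper name is hypothetical and the output is deliberately partial.

```javascript
// Merge the two supported CDP events by requestId into partial entries.
function partialEntriesFromCdp(events) {
  const byId = new Map();
  for (const { method, params } of events) {
    if (method === 'Network.requestWillBeSent') {
      // A redirect reuses the requestId, so this sketch keeps only the last hop.
      byId.set(params.requestId, {
        // wallTime is seconds since the epoch.
        startedDateTime: new Date(params.wallTime * 1000).toISOString(),
        request: { method: params.request.method, url: params.request.url },
        response: null,
      });
    } else if (method === 'Network.responseReceived') {
      const entry = byId.get(params.requestId);
      if (entry) {
        entry.response = {
          status: params.response.status,
          statusText: params.response.statusText,
          mimeType: params.response.mimeType,
        };
      }
    }
  }
  // Still missing: sizes and end time (loadingFinished), failures (loadingFailed),
  // cache hits, extra-info headers, priority changes, response bodies.
  return [...byId.values()];
}
```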

I'll report back on this bug on options we may have within DevTools without immediately going into protocols.
Support for Android and remote devices may be a follow-up.

Component: General → Netmonitor
Flags: needinfo?(odvarko)
Flags: needinfo?(hskupin)
Product: Toolkit → DevTools

Hi Peter,

We made enough progress on the WebDriver BiDi network events to try to use them for HAR automation. We would like to provide a HAR generation feature in Firefox, based on BiDi, that you can use with your usual performance test suite and then get your feedback.

Two questions though:
1 - How should we expose the feature to "start" or "enable" the recording so that it's easy to integrate for you? Would a preference or a command line argument for Firefox be ok? I imagine the outcome of the recording should be a file saved to disk?
2 - We don't generate "complete" HAR files at the moment. E.g., we are missing response.content.text, and we don't fill the page entries. For now it's mostly focused on recording the requests. I've seen in the summary that response.content.text is very useful for you, so we'll try to get that added, but is there anything else we should make sure to include on top of request/response data?
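For reference, the two pieces mentioned here (page entries and response.content.text) sit in a HAR 1.2 document as sketched below. This is a minimal illustration built from the HAR 1.2 field names, not the structure Firefox produces; the helper names, the placeholder creator and the fixed HTTP version are assumptions.

```javascript
function createPage(id, title, startedDateTime) {
  return {
    id,
    title,
    startedDateTime,
    // Milliseconds relative to startedDateTime; -1 means "not available".
    pageTimings: { onContentLoad: -1, onLoad: -1 },
  };
}

function createEntry({ pageref, startedDateTime, method, url, status, statusText, mimeType, text }) {
  const content = { size: 0, mimeType };
  if (text !== undefined) {
    // response.content.text carries the body (HTML, CSS, JS...); size is in bytes.
    content.text = text;
    content.size = Buffer.byteLength(text, 'utf8');
  }
  return {
    pageref, // links the entry to a page in log.pages
    startedDateTime,
    time: 0,
    request: {
      method, url, httpVersion: 'HTTP/1.1',
      cookies: [], headers: [], queryString: [], headersSize: -1, bodySize: -1,
    },
    response: {
      status, statusText, httpVersion: 'HTTP/1.1',
      cookies: [], headers: [], content, redirectURL: '', headersSize: -1, bodySize: -1,
    },
    cache: {},
    timings: { send: 0, wait: 0, receive: 0 },
  };
}

function createHar(pages, entries) {
  return {
    log: {
      version: '1.2',
      creator: { name: 'example-har-builder', version: '0.0.1' },
      pages,
      entries,
    },
  };
}
```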

Thanks!

Flags: needinfo?(peter)
Attached file example_bidi.har (obsolete)

To illustrate, here's a typical HAR that is easy to generate with the events we currently have in WebDriver BiDi.

Wow this is great news. Thank you all for the work!

  1. Would a preference or a command line argument for Firefox be ok? I imagine the outcome of the recording should be a file saved to disk?

A preference or a command line argument works fine, do what's best for you. Saved to disk is good; it would be great if we could configure where it is saved, or if there were a programmatic way to find it. For example, we launch Firefox and access a page, and afterwards we need to get hold of the HAR, so we need a way to know where it ends up.

is very useful for you, so we'll try to get that added, but is there anything else we should make sure to include on top of request/response data?

It would be great to have the text.

a) Another important thing is that the HAR is valid, so it can be viewed in http://www.softwareishard.com/har/viewer/ (drag and drop, and it works). Most HAR viewers rely on the required fields actually being present, so that would be great. Otherwise it's hard for me to visualise it.

b) If the HAR is saved to disk, when is it saved? How do you decide when the page is finished? If we access multiple pages one after another, will they end up in the same HAR file, or will it produce multiple HAR files? And how are those named in that case?

Flags: needinfo?(peter)

(In reply to Peter Hedenskog from comment #6)

Sorry for the delay here! So after discussing internally with the team, the preference was to provide something that you could directly use from selenium instead of shipping you a custom build of Firefox. I still need to figure out how to publish that somehow but I already pushed something on a selenium fork: https://github.com/juliandescottes/selenium/tree/firefox-bidi-har-example, specifically this commit.

Since you are using the selenium node bindings, I imagine you should be able to use it as shown in the example here:

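  // The require path for harBuilder is an assumption: the har/harBuilder
  // modules live in the selenium fork under javascript/node/selenium-webdriver/bidi,
  // so adjust it to wherever you copy them.
  const { Builder, until } = require('selenium-webdriver')
  const firefox = require('selenium-webdriver/firefox')
  const BrowsingContext = require('selenium-webdriver/bidi/browsingContext')
  const harBuilder = require('./bidi/harBuilder')

  async function recordHar() {
    let driver
    const options = new firefox.Options()
      .setBinary(firefox.Channel.NIGHTLY)
      .enableBidi()
    try {
      driver = await new Builder()
        .forBrowser('firefox')
        .setFirefoxOptions(options)
        .build()

      const id = await driver.getWindowHandle()
      const browsingContext = await BrowsingContext(driver, {
        browsingContextId: id,
      })

      // Start collecting BiDi network events for this browsing context.
      const har = await harBuilder(driver, [id])
      await har.startRecording()

      await browsingContext.navigate('http://www.example.com')
      await driver.wait(until.titleIs('Example Domain'), 1000)

      // Stop recording and get the HAR back as a plain object.
      const harExport = await har.stopRecording()
      console.log(JSON.stringify(harExport))
    } finally {
      if (driver) {
        await driver.quit()
      }
    }
  }

  recordHar().catch(console.error)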

Does that approach sound relevant to you? Am I correct to think that you don't necessarily need a custom version of Selenium here? After all, we are only using regular BiDi events, which you can already subscribe to using Selenium. So you could maybe directly copy the har and harBuilder modules, which I added under javascript/node/selenium-webdriver/bidi. Nevertheless I'll try to publish it somehow, but it might be easier for you this way.
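The point that regular BiDi events are enough can be illustrated without the fork. The sketch below pairs `network.beforeRequestSent` and `network.responseCompleted` events by request id and produces simplified HAR-like entries. It is not the fork's `har`/`harBuilder` code: the payload shapes follow the WebDriver BiDi specification (early Firefox releases may differ), the timing mapping is a simplification, and required HAR fields such as `cache`, `cookies` and `httpVersion` are left out.

```javascript
// BiDi header values are BytesValue objects ({ type: 'string', value });
// plain strings are accepted as a fallback.
function toHarHeaders(headers = []) {
  return headers.map(({ name, value }) => ({
    name,
    value: typeof value === 'string' ? value : value.value,
  }));
}

// `events` is an array of { method, params } BiDi event messages, collected
// however you subscribe to them (e.g. through Selenium).
function entriesFromBidiEvents(events) {
  const sentById = new Map();
  const entries = [];
  for (const { method, params } of events) {
    if (method === 'network.beforeRequestSent') {
      sentById.set(params.request.request, params); // keyed by BiDi request id
      continue;
    }
    if (method !== 'network.responseCompleted') continue;
    const sent = sentById.get(params.request.request);
    if (!sent) continue;

    // Simplified timings derived from the request's fetch timing info.
    const t = params.request.timings;
    const wait = Math.max(0, t.responseStart - t.requestStart);
    const receive = Math.max(0, t.responseEnd - t.responseStart);

    entries.push({
      startedDateTime: new Date(sent.timestamp).toISOString(),
      time: wait + receive,
      request: {
        method: sent.request.method,
        url: sent.request.url,
        headers: toHarHeaders(sent.request.headers),
      },
      response: {
        status: params.response.status,
        statusText: params.response.statusText,
        headers: toHarHeaders(params.response.headers),
        content: { size: params.response.bytesReceived, mimeType: params.response.mimeType },
      },
      timings: { send: 0, wait, receive },
    });
  }
  return entries;
}
```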

The version I added on the selenium fork does not include text for now, but it should produce valid HAR files for the HAR viewer, and it should also support several pages correctly (which Firefox DevTools doesn't really do at the moment).
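If separate files per page are preferred over one multi-page HAR, one option is to split on `pageref`, since each entry points to its page's `id`. This is a sketch; the file-naming scheme and the assumption that page titles hold URLs are both hypothetical.

```javascript
// Hostname from a page title, assuming titles hold URLs; falls back otherwise.
function hostOf(title) {
  try {
    return new URL(title).hostname;
  } catch {
    return 'page';
  }
}

// Split a multi-page HAR into one HAR per page, using entry.pageref.
// Entries without a matching pageref are dropped in this sketch.
function splitHarByPage(har) {
  return har.log.pages.map((page, index) => ({
    // Hypothetical naming scheme: <two-digit index>-<hostname>.har
    fileName: `${String(index + 1).padStart(2, '0')}-${hostOf(page.title)}.har`,
    har: {
      log: {
        ...har.log,
        pages: [page],
        entries: har.log.entries.filter((e) => e.pageref === page.id),
      },
    },
  }));
}
```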

Let me know if that could help you, in the meantime I'll try to publish my selenium fork.

I also had questions about the way you measure load and domContentLoaded in Browsertime. Both are supposed to be available in the page timings of HAR files, but considering you are not consistently using HAR for Firefox today, I suppose you measure them using something else? If so, can you explain the approach? I have a naive implementation which I use in my current HAR files, but the results I get are not consistent with what Firefox DevTools computes, for instance. In particular, I'm not sure what "origin" time should be used to compute them.

Thanks!

Flags: needinfo?(peter)

One important thing to note: you will need to use Firefox 110 or newer in order to have access to the necessary network events, so either Nightly or Beta / DevEdition.

Nice, yes, we want to continue using regular Selenium, that sounds good. I can use a patched version for the HAR fixes. Let's try to keep it simple for now (so you don't need to spend time on it), and then I can give feedback when I get it up and running.

The version I added on the selenium fork does not include text for now, but it should produce valid HAR files for HAR viewer

OK! I tested the HAR attached to this task and there were some errors, but maybe that one was not generated with the code in Selenium?

I also had questions about the way you measure load and domContentLoaded in Browsertime. Both are supposed to be available in the page timings of HAR files, but considering you are not consistently using HAR for Firefox today, I suppose you measure them using something else? If so, can you explain the approach?

We can backfill them from the Navigation Timing API (https://github.com/sitespeedio/browsertime/blob/main/browserscripts/timings/navigationTiming.js): once the HAR is produced, we can add those values later in Browsertime. We already add metrics using the underscore convention, which we use in the HAR viewer.
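Backfilling page timings this way could look like the sketch below. It takes Navigation Timing (Level 1) values as absolute epoch milliseconds, uses navigationStart as the origin, and adds extra metrics as underscore-prefixed custom fields, which HAR 1.2 allows. Mapping onContentLoad/onLoad to the `...EventEnd` marks is a choice made for illustration, not necessarily what Browsertime does; `backfillPageTimings` is a hypothetical name.

```javascript
// `t` holds Navigation Timing values in epoch ms, e.g. a copy of
// performance.timing taken in the page.
function backfillPageTimings(page, t) {
  // Relative to navigationStart; 0 (mark not reached) becomes -1 ("not available").
  const fromNavStart = (value) => (value > 0 ? value - t.navigationStart : -1);
  page.startedDateTime = new Date(t.navigationStart).toISOString();
  page.pageTimings.onContentLoad = fromNavStart(t.domContentLoadedEventEnd);
  page.pageTimings.onLoad = fromNavStart(t.loadEventEnd);
  // Custom (non-standard) fields must start with an underscore.
  page.pageTimings._domInteractive = fromNavStart(t.domInteractive);
  page.pageTimings._responseStart = fromNavStart(t.responseStart);
  return page;
}
```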

Flags: needinfo?(peter)

(In reply to Peter Hedenskog from comment #9)

Nice, yes, we want to continue using regular Selenium, that sounds good. I can use a patched version for the HAR fixes. Let's try to keep it simple for now (so you don't need to spend time on it), and then I can give feedback when I get it up and running.

Great, works for me :)

The version I added on the selenium fork does not include text for now, but it should produce valid HAR files for HAR viewer

OK! I tested the HAR attached to this task and there were some errors, but maybe that one was not generated with the code in Selenium?

Yes! This one was made with an earlier version. After you mentioned the HAR viewer, I updated the implementation to generate HAR files compatible with the HAR viewer validation. I imagine it's still possible that in some scenarios we would generate invalid HAR files, but at least all the navigations I tested were fine. If you manage to create a HAR file which fails validation, happy to take a look.
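A quick way to catch the most common HAR viewer failures is to check the fields HAR 1.2 marks as required. This sketch only checks presence, not types or the viewer's exact schema, and `findHarProblems` is a hypothetical name.

```javascript
// Required fields per HAR 1.2 (optional ones such as pages or pageref are skipped).
const REQUIRED = {
  log: ['version', 'creator', 'entries'],
  entry: ['startedDateTime', 'time', 'request', 'response', 'cache', 'timings'],
  request: ['method', 'url', 'httpVersion', 'cookies', 'headers', 'queryString', 'headersSize', 'bodySize'],
  response: ['status', 'statusText', 'httpVersion', 'cookies', 'headers', 'content', 'redirectURL', 'headersSize', 'bodySize'],
  content: ['size', 'mimeType'],
  timings: ['send', 'wait', 'receive'],
};

function missing(obj, keys, where) {
  return keys.filter((k) => obj[k] === undefined).map((k) => `${where}.${k} is missing`);
}

// Returns a list of human-readable problems; an empty list means nothing required is absent.
function findHarProblems(har) {
  const log = har && har.log;
  if (!log) return ['log is missing'];
  const problems = missing(log, REQUIRED.log, 'log');
  (log.entries || []).forEach((e, i) => {
    const at = `log.entries[${i}]`;
    problems.push(...missing(e, REQUIRED.entry, at));
    if (e.request) problems.push(...missing(e.request, REQUIRED.request, `${at}.request`));
    if (e.response) {
      problems.push(...missing(e.response, REQUIRED.response, `${at}.response`));
      if (e.response.content) {
        problems.push(...missing(e.response.content, REQUIRED.content, `${at}.response.content`));
      }
    }
    if (e.timings) problems.push(...missing(e.timings, REQUIRED.timings, `${at}.timings`));
  });
  return problems;
}
```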

I will attach a more recent example.

We can backfill them from the Navigation Timing API (https://github.com/sitespeedio/browsertime/blob/main/browserscripts/timings/navigationTiming.js): once the HAR is produced, we can add those values later in Browsertime. We already add metrics using the underscore convention, which we use in the HAR viewer.

Oh nice, I see you use navigationStart as the reference. That's also what I had in mind, but since I could see DevTools was using a different origin, I wasn't sure it was the right choice.

A more recent example of a HAR file generated using the WebDriver BiDi network events.

Attachment #9313124 - Attachment is obsolete: true
Attachment #9314326 - Attachment mime type: application/octet-stream → text/json

Tested the new HAR, great! I'll try to incorporate it in Browsertime on Monday next week.

See Also: → 1806833
Status: UNCONFIRMED → NEW
Ever confirmed: true

Ah, sorry it took me longer to actually try it out. When I tried today, it worked fine! I'll fine-tune the code a little on my side and share the branch in Browsertime. Right now I just copy-pasted your code, but I guess the plan is to include it in Selenium?

I could verify that it works on Android too, that is great news :) I pushed a branch here: https://github.com/sitespeedio/browsertime/tree/ff-har-pro

(In reply to Peter Hedenskog from comment #14)

Great news and thanks for sharing the branch!

Out of curiosity, did you compare the performance with the current implementation? In early tests the BiDi approach had less overhead than the DevTools approach, but was still slower than running without any HAR recording. I would be interested to know if you came to similar conclusions.

I guess then the plan is to include it in Selenium?

I imagine Henrik or James could provide better answers here, but from what I understood we are not yet sure where HAR generation should live. We will discuss it with Selenium peers once network events have been integrated there.

No, I haven't measured it; to do that I need to push it to a server dedicated to tests. I can do that later, since I need to verify and test against a lot of different URLs before we make the switch :)

I have been finding a few bugs in the initial module I provided, it could be worth creating a dedicated npm package to build a har file from BiDi network events, so that we can easily reuse it in our client as well as in the browsertime branch.

Whiteboard: [webdriver:triage]
Whiteboard: [webdriver:triage]

For information I created a repository to keep the BiDi HAR generation logic in a single place: https://github.com/juliandescottes/bidi-har-export/

I will probably propose a PR for your branch to integrate it, it might be easier to share fixes this way?

Sounds great, do it! :)

(In reply to Peter Hedenskog from comment #19)

Sorry about the delay! PR up at https://github.com/sitespeedio/browsertime/pull/1908

Hi Peter!

Just wanted to check if there was an update on your side, if by chance you had time to try the BiDi solution with browsertime. And if there's anything we can do to help, let us know :)

Flags: needinfo?(peter)

No, sorry, but I can test it later this week. Which Firefox version should I use?

Flags: needinfo?(peter)

(In reply to Peter Hedenskog from comment #22)

Sounds good! Any version of our current channels other than ESR should be fine here (so specifically: 111, 112 or 113).

Ah ok, I just tested it with 111 and it worked fine. Gonna roll it out on the test servers tomorrow and see how it goes. Thanks again! :)

OK, the tests do not pass on GitHub. The reason is that we run tests on Node.js 14, 16 and 18, and findLast (https://github.com/sitespeedio/browsertime/actions/runs/4535912120/jobs/7993067682?pr=1918) was introduced in 18. We try to make sure Browsertime works with each Node.js version until its EOL (https://github.com/nodejs/release#release-schedule). Is it possible to fix that? After that I can make a test release and push it to the test server.
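Until Node.js 14 and 16 are dropped, one option is a plain loop with the same semantics as Array.prototype.findLast. A minimal sketch, not necessarily how the module ended up being fixed; the standalone `findLast` function is a hypothetical helper.

```javascript
// Same semantics as Array.prototype.findLast (available from Node.js 18):
// walk from the end and return the first element matching the predicate.
function findLast(array, predicate) {
  for (let i = array.length - 1; i >= 0; i--) {
    if (predicate(array[i], i, array)) {
      return array[i];
    }
  }
  return undefined;
}
```

A call such as `entries.findLast((e) => e.pageref === id)` then becomes `findLast(entries, (e) => e.pageref === id)`.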

(In reply to Peter Hedenskog from comment #25)

Sure! We'll get that fixed at https://github.com/firefox-devtools/bidi-har-export/pull/6, and will release a new version of the module soon.

Screenshot comparing the change in metrics using HAR Export vs. the BiDi HAR, testing the Barack Obama page using WebPageReplay. The only thing changed here is that I've switched to the BiDi HAR. We can see that it has much less overhead than the old HAR export :)

(In reply to Peter Hedenskog from comment #28)

Created attachment 9327977
Compare HAR export and Bidi HAR

Thank you so much for the update and all the work here Peter! It's great to see some improvement with the BiDi based solution.

Two questions on my end:

  • Do you also have the figures without recording any HAR file, to get an idea of how much overhead we still have?
  • Did you see any improvement for metrics such as load or domContentLoaded?

Thanks!

Flags: needinfo?(peter)

do you also have the figures without recording any HAR file? to get an idea of how much overhead we still have

Let me do one test run without a HAR, I'll do that early next week.

did you see any improvement for metrics such as load or domContentLoaded

Yes. We run the tests on a dedicated server with the CPU pinned to a (low) frequency, so the page load is kind of slow, but I could see a change:
loadEventEnd: 8.36 s -> 6.21 s
domContentLoaded: 3.79 s -> 3.33 s
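In relative terms, those two numbers correspond to roughly a 26% lower loadEventEnd and a 12% lower domContentLoaded. The arithmetic is simply (after - before) / before; the snippet below only restates the reported values.

```javascript
// Relative change between two measurements, in percent (negative = lower/faster).
function percentChange(before, after) {
  return ((after - before) / before) * 100;
}

// Values in seconds, as reported in the comment above.
const reported = {
  loadEventEnd: [8.36, 6.21],
  domContentLoaded: [3.79, 3.33],
};

for (const [name, [before, after]] of Object.entries(reported)) {
  console.log(`${name}: ${percentChange(before, after).toFixed(1)}%`);
}
```

This prints `loadEventEnd: -25.7%` and `domContentLoaded: -12.1%`.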

Flags: needinfo?(peter)

Great, thanks for the update, looking forward to the "HAR-less" figures.

Attached image visual-bidi.jpg

Changes in visual metrics when running without BiDi and with the BiDi HAR.

Attached image bidi-no-bidi.jpg

Navigation Timing metrics with the BiDi HAR vs. no BiDi.

Hi Julian, I did a test run with the same setup, with and without the BiDi HAR, and we can see some overhead.

Thanks a lot for the update Peter! We were expecting some overhead so I would say this is not a surprise, but it's good to see the improvement compared to the devtools solution. We'll try to improve the recording of network events to reduce the overhead now, and I'll keep you posted!
