Open Bug 1753352 Opened 2 years ago Updated 20 days ago

www1.my.commbank.com.au - links fail with Firefox

Categories

(Core :: DOM: Content Processes, defect, P3)

defect

Tracking

()

Webcompat Priority P3
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- affected
firefox96 --- unaffected
firefox97 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- wontfix
firefox102 --- wontfix
firefox103 --- wontfix
firefox104 --- wontfix
firefox105 --- wontfix
firefox106 --- wontfix
firefox107 --- wontfix

People

(Reporter: karlcow, Unassigned, NeedInfo)

References

(Depends on 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(6 files)

As reported in https://webcompat.com/issues/99059 by the user:

When using Firefox (version is up to date today) certain links in the Netbank site stall at a dynamic loading page (10-20 seconds) then report a technical issue. This has been happening for about a month, as far as I've noticed. https://www1.my.commbank.com.au/netbank/container/ESD/Bills.Management/ContainerLaunch/1 is the URL for the link to bills due which fails. Even the Help menu link fails: https://www1.my.commbank.com.au/netbank/container/ESD/SupportHub.Provider/ContactUs/1?ei=nbmod-Header-Help

From mozregression:
Bug 1732358 - Part 5: Add the fission rollout slug to the GRADUATION_SET, r=mythmon

Depends on D133008

Differential Revision: https://phabricator.services.mozilla.com/D133659

Set release status flags based on info from the regressing bug 1732358

Has Regression Range: --- → yes

Patrick, can you please try bisecting this again with the fission.autostart pref set to true? Just add --pref "fission.autostart:true" to the mozregression command. That'll help narrow down the real culprit.
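
For reference, the bisection command described above could be assembled roughly as in the sketch below. The `--pref` flag and its `name:value` form come from the comment itself; the `--good 96` / `--bad 97` bounds are an assumption based on the tracking flags (96 unaffected, 97 affected), and the helper name is hypothetical.

```python
import shlex


def mozregression_argv(good, bad, prefs):
    """Build an argv list for a mozregression bisection run.

    good/bad are the bisection bounds (release numbers or dates);
    prefs maps preference names to the values forced in each test build.
    """
    argv = ["mozregression", "--good", str(good), "--bad", str(bad)]
    for name, value in prefs.items():
        # mozregression takes each pref as a single "name:value" string.
        argv += ["--pref", f"{name}:{value}"]
    return argv


# Assumed bounds: Firefox 96 is flagged unaffected, 97 affected.
argv = mozregression_argv(96, 97, {"fission.autostart": "true"})
print(shlex.join(argv))
```

The printed line can be pasted into a terminal; adjust the bounds to whatever range is actually being bisected.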

Flags: needinfo?(mpj.5)

Redid the bisection with fission.autostart set to true and got:
Bug 1320197 - Add reftest.

Differential Revision: https://phabricator.services.mozilla.com/D133653

Not sure if that is a plausible regressing bug. I have found that the issue doesn't occur in Firefox 96 with fission set to false

Flags: needinfo?(mpj.5)

No, that doesn't seem plausible. And looking at commits around the same time, that one was a few commits after bug 1732358 which gets back to comment 1.

Tried bisecting it again (with fission.autostart set to true) and got last good build 2021-08-09, first bad build 2021-08-10.

2022-02-05T12:35:11.920000: DEBUG : Found commit message:
Bug 1650089 - Part 9: Report errors back to caller when messaging from outside of a GeckoView controlled window, r=agi

Differential Revision: https://phabricator.services.mozilla.com/D122147

Not sure if that seems like a more likely candidate

Bug 1650089 seems plausible, yeah. Nika, any thoughts?

Flags: needinfo?(nika)

This will be difficult to diagnose conclusively based only on the regression range without STR, but it could potentially be related to the webpage in question relying on some timing behaviour around our initial about:blank documents (which might be impacted by :hsivonen's work in bug 1736570). There are also some observable behaviour changes in session history which might impact this, but it's hard to say what they are or how they would impact the situation.

The main things which that patch will have changed from a webpage's POV are navigation timing, and potentially some frames now loading in different processes than they would have previously. I could see them potentially leading to a website breaking due to a dependency on the previous ordering, but I don't know what would be broken or have an idea how to fix it.

It might be worth trying to reproduce this bug again with the patch from bug 1736570 applied, and see if it fixes the issue.

Flags: needinfo?(nika)

This will be difficult to diagnose conclusively based only on the regression range without STR

Yes and as usual bank websites are close to impossible to diagnose if you are not a bank account owner of that same bank. And the reporter in that case was anonymous.

I have an online account with the bank and am happy to assist

Thanks Patrick, that would be fantastic. Which platform are you on? If you're on Win64, here's a build you can try:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/GrJViMlcSwKWMY9_6S2hOw/runs/0/artifacts/public/build/target.zip

Flags: needinfo?(mpj.5)

The issue is reproducible on that build

Flags: needinfo?(mpj.5)

Set release status flags based on info from the regressing bug 1732358

Is there any kind of logging we could try to gather to help diagnose this issue? Just trying to figure out a way forward here.

Flags: needinfo?(nika)

Perhaps session history logs could be useful. ni? :smaug who might know what logs to look at and how to detect an issue from them.

Flags: needinfo?(nika) → needinfo?(bugs)

Given the patch which appears to have regressed the issue, MOZ_LOG=ProcessIsolation:5,DocumentChannel:5 might also be useful

SessionHistory:5 might reveal something, though this doesn't sound like a session history issue.

Flags: needinfo?(bugs)

Patrick, thank you so much for your help so far! Can you please try running Firefox with the MOZ_LOG=ProcessIsolation:5,DocumentChannel:5,SessionHistory:5 and MOZ_LOG_FILE=logfile.txt environment variables set and attach the resulting log to the bug? Thanks again!
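
One way to set these variables without touching system-wide settings is to start Firefox from a small script. This is a minimal sketch: the variable names and values are the ones requested above, while the helper names and the path to the Firefox binary are assumptions.

```python
import os
import subprocess


def logging_env(modules, log_file, base_env=None):
    """Return a copy of the environment with MOZ_LOG and MOZ_LOG_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = ",".join(modules)
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch_firefox(binary, env):
    """Start Firefox with the given environment.

    `binary` is the path to the Firefox executable (an assumption here;
    it differs per platform and install location).
    """
    return subprocess.Popen([binary], env=env)


env = logging_env(
    ["ProcessIsolation:5", "DocumentChannel:5", "SessionHistory:5"],
    "logfile.txt",
)
```

Calling `launch_firefox("firefox", env)` (or passing the full path to the executable) would then start the browser with logging enabled for that session only.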

Flags: needinfo?(mpj.5)

Where can I find instructions on setting environment variables?

Flags: needinfo?(mpj.5) → needinfo?(ryanvm)
Flags: needinfo?(ryanvm)
Attached file Logging output

Hope this is helpful

Severity: -- → S3
Priority: -- → P3

(triage based on webcompat priority)

Hi Nika, does the attached log show anything helpful?

Flags: needinfo?(nika)

I was able to reproduce the problem in a Nightly build, but it isn't consistently occurring. Let me know if there is anything I can contribute beyond what Patrick has done.

The logs help a ton, thanks for sharing them! I noticed a couple things with it, and I think I may have identified a bug, though I don't fully understand the issue here yet.

The first thing I noticed was that an iframe load of https://www.commbank.com.au/digital/identity/authenticate/sign-out?dpOnly=true (starts line 1179) becomes blocked by X-Frame-Options and ends up loading about:neterror instead at line 1211. This makes sense presuming the page serves the relevant header, as it has a different origin (https://www.commbank.com.au vs. https://www.my.commbank.com.au), but wouldn't really explain a fission difference, so I expect that this isn't the source of the problem.

I think the more likely cause of the issue is that at line 2022 a load of about:blank?hxtx=0&...[sic] is started. I'm not entirely sure how this load is starting, but it appears to resolve with a null principal and no precursor URI. This load ends up completing with a null principal (as far as DocumentLoadListener is aware at least). Because the principal doesn't match the existing process, the load ends up redirecting into a "web" remote type. If the page is going to try to access the document in that frame, it would make sense that it encounters an error, and would line up with the apparent regressing bug.

Based on that assumption I was able to create a test case which behaves differently between a chromium-based browser and Firefox by loading an <iframe src="about:blank?foo"></iframe> and checking whether the loaded document is same-origin. In Edge the test-case I'm attaching alerts [object HTMLDocument], whereas it alerts null in Firefox. This issue of the about:blank being cross-origin also occurs with Fission disabled on my local machine, however, so it is not the full cause of the issue. It would probably be good to make sure we align with chromium on our behaviour when loading about:blank with query parameters either way.

In addition to the about:blank principal behaviour, there was also an interesting apparent session history issue which I expect was the final cause of the problem (though it was triggered by the about:blank process switch). Immediately after the session history entry was added for the about:blank?foo URI (line 2090), it appears that another entry was added after it for another URI which had started loading earlier and had been process switched away from by the about:blank navigation (line 2146). I'm guessing that there is some race between the navigation handling with the process switch and the other load which stopped the loads from interrupting one-another.

ni? :smaug to potentially look into the session history issue more.

Flags: needinfo?(nika) → needinfo?(bugs)

Hmm, I think the log is still missing some information.
Adding ,nsSHistory:5 would be useful. It would tell us whether we actually do any session history loads.
(sorry about missing that before. We have so many different ways to log this data ;) )

Patrick, could you perhaps create another log?

Flags: needinfo?(bugs) → needinfo?(mpj.5)
Attached file logfile.txt
Flags: needinfo?(mpj.5)

Is that what you need?

Flags: needinfo?(nika)
Flags: needinfo?(bugs)

That is useful, yes, assuming it has nsSHistory in the log.
It tells us that we aren't doing any session history loads, so it would be rather surprising if session history were causing the issue.

nika, I'm not quite sure about your comment 25. It sounds like a race condition between loads rather than anything related to session history?

Flags: needinfo?(bugs)

But let me look at the log some more. Perhaps it hints something else.

Patrick, now that bug 1736570 has landed in Nightly, could you check whether Nightly still has the same issue?
Also, the logs are oddly cut at the end. Not sure why that is happening. Perhaps the process was killed?
One could use 'sync' in the MOZ_LOG to ensure everything is captured in the log.
(See for example https://firefox-source-docs.mozilla.org/xpcom/logging.html#enabling-logging how 'sync' is used.)
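
As a rough illustration, the MOZ_LOG value used so far could be extended with the extra module and the 'sync' option like this. The helper name is hypothetical; the module names and 'sync' are the ones mentioned in this bug.

```python
def extend_moz_log(current, extra_modules=(), options=()):
    """Append modules and options to a MOZ_LOG string, skipping duplicates."""
    parts = [p for p in current.split(",") if p]
    for item in list(extra_modules) + list(options):
        if item not in parts:
            parts.append(item)
    return ",".join(parts)


moz_log = extend_moz_log(
    "ProcessIsolation:5,DocumentChannel:5,SessionHistory:5",
    extra_modules=["nsSHistory:5"],
    options=["sync"],
)
print(moz_log)
```

The resulting string is what would be assigned to the MOZ_LOG environment variable before starting Firefox.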

If nothing is revealed this way, I'll add some more logging.

Flags: needinfo?(mpj.5)

The issue is still reproducible in Nightly. Additional logs to follow.

Attached file log.txt
Flags: needinfo?(mpj.5)

Oh, still one thing which might be useful.

If you set fission.autostart to false in about:config and restart the browser, the issue shouldn't happen.
Could you create a log from such a page load?
(comparing the logs might reveal something useful)

Can confirm issue is not reproducible with fission.autostart set to false

More things to try. (this is tricky to debug)

Could you try with these prefs (set in about:config)
fission.autostart=false
fission.sessionHistoryInParent=true
fission.bfcacheInParent=false
You need to restart after modifying those.
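
If flipping the prefs by hand in about:config is inconvenient, the same values could be written to a user.js file in a test profile instead. That is an alternative I am assuming here, not what was asked above, and the helper name is hypothetical. The sketch only renders the file contents.

```python
import json


def render_user_js(prefs):
    """Render prefs as user.js lines of the form user_pref("name", value);"""
    return "".join(
        # json.dumps yields true/false, numbers, and quoted strings,
        # which matches the literal syntax user.js expects.
        f"user_pref({json.dumps(name)}, {json.dumps(value)});\n"
        for name, value in prefs.items()
    )


user_js = render_user_js(
    {
        "fission.autostart": False,
        "fission.sessionHistoryInParent": True,
        "fission.bfcacheInParent": False,
    }
)
print(user_js, end="")
```

As with about:config, a restart is needed before the values take effect.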

Do you need more logs, or do you just want me to see if the issue is reproducible or not with those prefs?

Flags: needinfo?(bugs)

Just try those prefs.

(I'll need to add some more logging code to non-Fission case to get more useful data, but knowing how those prefs affect the behavior might make that part easier.)

Flags: needinfo?(bugs)

The issue is not reproducible with those prefs set

Great, thanks.

(Smells like a Fission page load issue, not SHIP issue. That reduces the needed logs at least)

Assignee: nobody → bugs
Status: NEW → ASSIGNED

Patrick, could you perhaps try the following
https://treeherder.mozilla.org/logviewer?job_id=372703080&repo=try
Scroll the top left area and there should be target.zip containing the build.
Or I guess setup.exe should work too.

Assignee: bugs → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(mpj.5)

I've had a look at the link you've provided, and can't find anything to download. Could you perhaps help me with a direct link or a screenshot pointing me in the right direction?

Flags: needinfo?(bugs)

(In reply to Patrick from comment #44)

I've had a look at the link you've provided, and can't find anything to download. Could you perhaps help me with a direct link or a screenshot pointing me in the right direction?

Please try:

Direct link to target.zip
Direct link to target installer

Thanks for your help!

Flags: needinfo?(bugs)

Thanks Jens for the direct link.

I can confirm that the issue is not reproducible in that build.

Flags: needinfo?(mpj.5)

(In reply to Patrick from comment #46)

I can confirm that the issue is not reproducible in that build.

Olli, any news how to move forward here? Thanks!

Flags: needinfo?(bugs)

I just hit this issue with Firefox 104.0a1 (2022-07-06) on Linux.

If I launch Firefox with a new, blank profile and login to NetBank, the "Bills & upcoming payments" page does not load.

If I launch Firefox with a new, blank profile, set "fission.autostart" to false, restart the browser, then login to NetBank, the page loads perfectly.

Following @Screwtape comment above, I tested today, Win 10 firefox 102.0.1 (64-bit)

With "fission.autostart" set True (default) some Netbank pages don't load, just show message "We're unable to find the page you were looking for due to a technical problem." Pages affected include Bills & upcoming payments, Apply for new account and others.

With "fission.autostart" set False, these pages load perfectly.

I'm a long time Netbank user, thanks to people working to resolve this bug.

This bug can still be reproduced on FF 105.0.1 64-bit on Win 10 21H2.
However, I found another CommBank link with a similar bug that seems to be accessible to anyone, without a bank account:
Go to their Car Insurance page: https://www.commbank.com.au/insurance/car-insurance.html
Then Click "Get a quote" and Click "Get a quote" under "New customer or don't bank online with us?". It will lead to this URL "https://www.my.commbank.com.au/netbank/container/ESD/CarInsurance.QuoteAndApply/ContainerLaunch?ei=get-a-quote-nav"

Flags: needinfo?(smaug)

Confirmed that the link from comment 52 stays stuck on "Loading..." for me as well. Smaug, is that enough to make progress on this?

Flags: needinfo?(smaug)
Flags: needinfo?(smaug)

Bumping this up to smaug's attention.

Flags: needinfo?(smaug)

That testcase in comment 52 seems to be broken also in other browsers and also without Fission. I see "loading" page and then eventually
"Sorry! We're unable to find the page you were looking for due to a technical problem."

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #55)

That testcase in comment 52 seems to be broken also in other browsers and also without Fission.

Not sure if Commbank is doing geo-blocking, since I'm accessing with an Australian IP.
At the moment, I could access their site on iOS 15.7 Safari with no problem, no matter if I enter the "container/ESD" URL directly or access from the "Get a quote" button.
On my end it redirects to "https://www2.my.commbank.com.au/netbank/container/ESD/CarInsurance.QuoteAndApply/ContainerLaunch?ei=get-a-quote-nav" and displays the correct page.

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #55)

That testcase in comment 52 seems to be broken also in other browsers and also without Fission.

Was able to borrow an old VM for a moment and did a quick test.
Windows Server 2008 R2, Firefox 91.7.1esr 64bit, Japanese IP, page successfully loaded.
Please find attached screenshot.

I'm also a Netbank user; can confirm this bug still affects Mozilla Firefox 106.0.1 on FreeBSD 13.1-RELEASE amd64.

I see this has been set to wontfix; is there anything I can do to assist debugging, test fixes, etc.? I'm a CBA customer with Firefox available on a range of Linux, FreeBSD, and macOS machines.

(In reply to Duncan Bayne from comment #59)

I see this has been set to wontfix; is there anything I can do to assist debugging, test fixes, etc.? I'm a CBA customer with Firefox available on a range of Linux, FreeBSD, and macOS machines.

The bug is not set to wontfix, it is the already released versions of Firefox that are set as wontfix because we don't have a fix for these branches.

I tried the STR in Comment 52.

I can reproduce the problem (Nightly on Linux), but I ran into the problem that if I tried it a 2nd time after a Ctrl-F5 type refresh then it would work regardless of other settings. I can get it to fail again if I click the padlock and click Clear Cookies and Site Data. I'm not sure if the actual net banking has that problem; I'm not a Commbank customer.

(In reply to Nika Layzell [:nika] (ni? for response) from comment #25)

It would probably be good to make sure we align with chromium on our behaviour when loading about:blank with query parameters either way.

I intend to pursue this when bug 543435 is in a better shape for about:blank without a query.

(In reply to Pascal Chevrel:pascalc from comment #60)

(In reply to Duncan Bayne from comment #59)

I see this has been set to wontfix; is there anything I can do to assist debugging, test fixes, etc.? I'm a CBA customer with Firefox available on a range of Linux, FreeBSD, and macOS machines.

The bug is not set to wontfix, it is the already released versions of Firefox that are set as wontfix because we don't have a fix for these branches.

Ah, my bad. Though the offer of testing still stands, since I'm a CBA customer with a range of OSs available :)

bump... any news?

Looking into bug 543435, it seems that is still a bit off, so I'm thinking it would be good to get ourselves a fix here before we've finished that work. I think we could potentially expand the checks to make about:blank URIs with query parameters inherit principals / precursor principals to align better with Chromium like was tested by :smaug in comment 43 / comment 46.

I'm guessing the check happens in https://searchfox.org/mozilla-central/rev/7426a35738cd542b9488c7b67f4f6d21edfeda0a/dom/base/nsContentUtils.cpp#7125 specifically for this case, though we can't just expand the checks there, as there are relevant checks in many other places too.

Perhaps we add a new NS_MatchesAboutBlank(nsIURI*) method which does the more relaxed check, to align with the spec wording? Some places (e.g. 1, 2) probably also would need to be relaxed to match the new behaviour, whereas others (e.g. 1) probably want the exact about:blank URI, even after the changes. We'd probably need to audit the specific callers, as well as other places checking URI specs against about:blank (e.g. 1, 2, 3).

In general, I think that if it's about principal inheritance, we'd probably need NS_MatchesAboutBlank, and if it's about specifically the URI of the implicitly-loaded about:blank document in chrome code, we probably want to stick with NS_IsAboutBlank. Perhaps we rename that method into NS_IsExactlyAboutBlank at the same time to make sure we've checked at least every caller of that c++ method?
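
To make the distinction concrete, here is a small sketch of the two checks outside of Gecko. It is not the proposed C++ implementation: the function names are hypothetical stand-ins for NS_IsExactlyAboutBlank and NS_MatchesAboutBlank, and the relaxed check follows my reading of the spec wording (scheme "about", path "blank", no host).

```python
from urllib.parse import urlsplit


def is_exactly_about_blank(url):
    """Strict check: the URL is the literal string about:blank."""
    return url == "about:blank"


def matches_about_blank(url):
    """Relaxed check: about:blank, optionally with a query and/or fragment."""
    parts = urlsplit(url)
    return parts.scheme == "about" and parts.netloc == "" and parts.path == "blank"


for candidate in ("about:blank", "about:blank?foo", "about:blank#x", "about:neterror"):
    print(candidate, is_exactly_about_blank(candidate), matches_about_blank(candidate))
```

With this split, the about:blank?hxtx=0&... load from the log would pass the relaxed check but not the strict one, which is the case the principal-inheritance callers would need to handle.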

On matrix, :smaug mentioned the potential option to have an [infallible] boolean matchesAboutBlank() method on nsIURI in order to also expose the method to JS callers, which might also be appealing, though we could also expose Services.io.matchesAboutBlank(uri) or implement an equivalent method somewhere in JS (NetUtil.jsm?).

Depends on: sync-about-blank
Flags: needinfo?(smaug)
Flags: needinfo?(nika)

Leaving a new ni? on myself to look into this again soon, though I don't have the time to right now.

Flags: needinfo?(nika)

Firefox v111.0.1: not sure if something has changed, but today while using NetBank I tried to access "Bills & upcoming payments" and some other pages with URLs like ".../netbank/container/ESD/..." ("Help" or "Credit Card Settings" for example), and the page loaded successfully about 1 out of 10 times.

The car insurance page mentioned in Comment 52, which does not require a bank account so everyone can try it, will also load after several tries.
( https://www.commbank.com.au/insurance/car-insurance.html >> [Get a quote] >> [New customer or don't bank online with us? > Get a quote] >> if the quote page fails to load, close it and click the "Get a quote" button to try again)

Car insurance page is also reported here a few months ago: https://github.com/webcompat/web-bugs/issues/112825

Problem still found in Firefox 114.0.2 running under Ubuntu 23.04 and Windows 10 running in Ubuntu VM.

Problem goes away when "fission.autostart" disabled.

Can confirm on Firefox 114.0.2 on macOS Ventura 13.4.1 as well (Apple M2 silicon).

It's still happening. I have disabled fission.autostart and the page will load, but what are the issues to watch out for with this disabled?

I know this could not possibly be the fault of Fission itself; I am just confirming that setting fission.autostart to false did make the Commbank NetBank pages load.

Commbank "Bills & upcoming payments" still fails in Firefox 120.0.1. Issue resolved by setting fission.autostart to false.

(In reply to Henri Sivonen (:hsivonen) from comment #62)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #25)

It would probably be good to make sure we align with chromium on our behaviour when loading about:blank with query parameters either way.

I intend to pursue this when bug 543435 is in a better shape for about:blank without a query.

Bug 543435 is taking longer than I expected when I wrote the above comment, so it makes sense not to wait for bug 543435.

I agree with comment 66 that we need two functions: "is this about:blank exactly?" (what we already have) and a new "is this about:blank possibly with fragment or query?".

Do we have sufficient confirmation that the attached test case represents the real-world problem? If we do, it seems the next step would be looking in Pernosco at what calls to "is this about:blank exactly?" are made, to get an idea of which of those calls should be to "is this about:blank possibly with fragment or query?" instead.

(In reply to Chris Thompson from comment #73)

Commbank "Bills & upcoming payments" still fails in Firefox 120.0.1. Issue resolved by setting fission.autostart to false.

I suggest "issue worked around", not "resolved". Project Fission is there for a reason: disabling it exposes the user to more risks.
