Closed Bug 1736570 Opened 3 years ago Closed 3 years ago

Outlook OWA Automatic Replies not working - _event error or broken fields.

Categories

(Core :: DOM: Navigation, defect, P1)

Firefox 82
defect

Tracking

()

RESOLVED FIXED
99 Branch
Webcompat Priority ?
Tracking Status
firefox-esr91 99+ fixed
firefox94 --- wontfix
firefox95 + wontfix
firefox96 + wontfix
firefox97 + wontfix
firefox98 + wontfix
firefox99 + fixed

People

(Reporter: itservice, Assigned: hsivonen)

References

(Depends on 1 open bug, Blocks 1 open bug, Regression)

Details

(Keywords: perf-alert, regression, Whiteboard: qa-not-reproducible, [wptsync upstream])

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0

Steps to reproduce:

You log in to the Outlook OWA portal. Then you go to the point -Automtische Reply Send- There you enter the e-mail that should be sent when you are written to

Actual results:

You get the error message:

Client Information

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:93.0) Gecko/20100101 Firefox/93.0
CPU Class: undefined
Platform: Win32
System Language: undefined
User Language: de-DE
CookieEnabled: true

Exception Details

Date: Tue Oct 19 2021 14:36:28 GMT+0200 (MitteleuropΓ€ische Sommerzeit)
Message: TypeError: a._events is undefined
Url: https://outlook.uni-polster.de/ecp/15.0.1497.18/scripts/microsoftajax.js
Line: 5

Call Stack

ErrorHandling.$EM@https://outlook.uni-polster.de/ecp/15.0.1497.18/scripts/common.js:1:172926
ErrorHandling.showUnhandledException@https://outlook.uni-polster.de/ecp/15.0.1497.18/scripts/common.js:1:172011

Dump Event

isTrusted = true
message = TypeError: a._events is undefined
filename = https://outlook.uni-polster.de/ecp/15.0.1497.18/scripts/microsoftajax.js
lineno = 5
colno = 58709
error = TypeError: a._events is undefined
composedPath = function composedPath() {
[native code]

}
stopPropagation = function stopPropagation() {
[native code]
}
stopImmediatePropagation = function stopImmediatePropagation() {
[native code]
}
preventDefault = function preventDefault() {
[native code]
}
initEvent = function initEvent() {
[native code]
}
type = error
target = [object Window]
srcElement = [object Window]
currentTarget = [object Window]
eventPhase = 2
bubbles = true
cancelable = true
returnValue = true
defaultPrevented = false
composed = false
timeStamp = 165576
cancelBubble = false
originalTarget = [object Window]
explicitOriginalTarget = [object Window]
NONE = 0
CAPTURING_PHASE = 1
AT_TARGET = 2
BUBBLING_PHASE = 3
ALT_MASK = 1
CONTROL_MASK = 2
SHIFT_MASK = 4
META_MASK = 8

Detailed Call Stack

Expected results:

The automatic e-mail should have been saved and set.

Hi,

Can you provide the link you are using to log in to the portal?

is it https://outlook.uni-polster.de ? or https://www.microsoft.com/en-us/microsoft-365/outlook/web-email-login-for-outlook ?

Does this issue occur in the latest nightly version of firefox? Here is a link from where you can download it:
https://www.mozilla.org/en-US/firefox/channel/desktop/

thanks.

Flags: needinfo?(itservice)

Hi,
the firefox version iss 93.0 and the link we are using is
https://outlook.uni-polster.de

Flags: needinfo?(itservice)

set component, not sure if its the correct one.
the site gives me a potential security risk and i have no access to it.

Component: Untriaged → General
Whiteboard: qa-not-reproducible

Can you tell me what version of Outlook you are using on the server?

I'm going to try to install a version locally for testing.

Or possibly give me a test account? (I know that's unlikely)

Flags: needinfo?(itservice)

Its on a Server with - Windows Server 2012 R2
Outlook Version: 15.0 (Build 1497.2)

Flags: needinfo?(itservice)

The best I can do for a regression window is August 28 of last year I think this covers it:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=56166cae2e26429f4786ad1013adae78189a12e8&tochange=29ba77e279e4346ecf071b81c8fb171115b266d1

This is a bigger issue now because of the new ESR which folks are switching to.

If you have a a response set, the bug is even worse because the fields are unclickable.

Severity: -- → S1
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Priority: -- → P1
Summary: Outlook OWA Automatic Anser → Outlook OWA Automatic Replies not working - _event error or broken fields.

I thought it might be bug 1649121 but I tried backing out part 76 (the only one I could) and it didn't fix it on trunk.

I was unable to get any of this revisions building so I could try to bisect exactly what broke it.

Olli/Masayuki, any ideas here?

Flags: needinfo?(masayuki)
Flags: needinfo?(bugs)

I'm still trying to get those old versions to build. My guess is that Bug 1589102 caused this, since the issue doesn't seem to be about editing.

(In reply to Mike Kaply [:mkaply] from comment #7)

The best I can do for a regression window is August 28 of last year I think this covers it:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=56166cae2e26429f4786ad1013adae78189a12e8&tochange=29ba77e279e4346ecf071b81c8fb171115b266d1

This is a bigger issue now because of the new ESR which folks are switching to.

If you have a a response set, the bug is even worse because the fields are unclickable.

FWIW, "the fields are unclickable" was randomly reproducible to me, not always reproducible but frequently. I still didn't figure out a reliable STR. I also agree the unclikable issue is even more severe.

Unfortunately I couldn't narrow it down any better than mkaply did.

One of the things I noticed is that I didn't have any font changes applied to either written text or selected text. Maybe look into any text editing changes done in the patch @mkaply? (that's my best guess).

The unclickable fields issue is indeed quite severe and I think I found out some reliable steps in reproducing it.

  1. Log into the outlook account.
  2. Click on the account settings button and choose 'set automatic replies' option.
  3. Choose 'Send automatic replies' and input some text in the fields.
  4. Click the save button and then the 'OK' button on the received error.
  5. Close Firefox (choose the 'Leave' option if prompted with the unsaved changes modal).
  6. Re-open the browser and login back into the outlook account.
  7. If the prompt to turn off the automatic replies is received choose NOT to turn off the replies (click NO on the prompt).
  8. Click on the account settings button and choose 'set automatic replies' option.

---> The textbox fields are unclickable.

Workaround:

  1. If the prompt to turn off the automatic replies is received, choose Yes to turn off the automatic replies (click Yes on the prompt). Access the 'set automatic replies' section and notice the fields can be edited.
  2. Go to the 'set automatic replies' section, choose the radio button to 'Don't send automatic replies' and click the save button. Click the outlook 'back' arrow and then access the section again. Once the 'Send automatic replies' radio button is selected the fields can be edited again (become active).

I've been trying to get old m-c to build, no luck yet. Even tryserver doesn't give me builds.

Now getting closer with builds.

If the send button triggers errors, only then the textareas become disabled.

And it is Bug 1589102 which causes the error.

Anny, could you take a look? I'll upload instructions how to get the old m-c build in a moment.

Flags: needinfo?(masayuki)
Flags: needinfo?(bugs)
Flags: needinfo?(agakhokidze)
Regressed by: 1589102
Has Regression Range: --- → yes

I needed to apply the patch and then also run the following commands to compile the old m-c
(https://hg.mozilla.org/mozilla-central/rev/e763b6e09a4914247ab86b3600acac8562e6c1d1 I didn't check which of the patches in that bug regressed the behavior).

rustup toolchain install 1.46.0
rustup override set 1.46.0
cargo install --version 0.14.3 cbindgen
export CBINDGEN=~/.cargo/bin/cbindgen

Because the changes that caused this regression are quite low level, I can't understand how exactly they caused this. It seems that there are two separate issues - first is that we get an error when we try to save the automatic response, and second, is that the text field for the automatic responses later become disabled and we can't type in them. I think that if we figure out what causes the first issue and solve it, the second issue will not be a problem. I will point out that the text fields become enabled once I clear the cookies for the site and login again.

The text fields in question are actually iframes whose location is about:blank. In my original patches, I changed about:blank loads to take place via DocumentChannel, which meant that about:blank pages took a little longer to load. I thought perhaps that's the issue, so I checked out the commit before my changes and added the following change (1) in nsFrameLoader to artificially delay loading of about:blank, but I could not reproduce the issue this way on my mac.

(1)

diff --git a/dom/base/nsFrameLoader.cpp b/dom/base/nsFrameLoader.cpp
index 39a85558280e4..9ad1dace7e1a7 100644
--- a/dom/base/nsFrameLoader.cpp
+++ b/dom/base/nsFrameLoader.cpp
@@ -476,6 +476,7 @@ void nsFrameLoader::LoadFrame(bool aOriginalSrc) {

   bool isSrcdoc = mOwnerContent->IsHTMLElement(nsGkAtoms::iframe) &&
                   mOwnerContent->HasAttr(kNameSpaceID_None, nsGkAtoms::srcdoc);
+  bool isBlank = false;
   if (isSrcdoc) {
     src.AssignLiteral("about:srcdoc");
     principal = mOwnerContent->NodePrincipal();
@@ -494,6 +495,7 @@ void nsFrameLoader::LoadFrame(bool aOriginalSrc) {
                                      nsGkAtoms::_true, eCaseMatters)) {
         return;
       }
+      isBlank = true;
       src.AssignLiteral("about:blank");
       principal = mOwnerContent->NodePrincipal();
       csp = mOwnerContent->GetCsp();
@@ -515,6 +517,12 @@ void nsFrameLoader::LoadFrame(bool aOriginalSrc) {
   if (rv == NS_ERROR_MALFORMED_URI) {
     rv = NS_NewURI(getter_AddRefs(uri), u"about:blank"_ns, encoding, base_uri);
   }
+  if (isBlank) {
+    for (int i = 0; i< 200000; i++) {
+      printf_stderr("i ");
+    }
+    printf_stderr("\n");
+  }

   if (NS_SUCCEEDED(rv)) {
     rv = LoadURI(uri, principal, csp, aOriginalSrc);

Hi Tom, I asked for some help in webcompat channel to look into the web site code. Karl suggested you may be the person who can help figure out what's happening. :)

Flags: needinfo?(twisniewski)

So there is definitely some kind of race condition with iframe loading here. If I log in, then open a new tab directly to https://outlook.uni-polster.de/ecp/Organize/AutomaticReplies.slab?showhelp=false (the iframe for the autoreply UI without the outer bits), then Chrome loads the default text for the rich text editor widgets containing the user's autoreplies, but Firefox ends up with both being empty iframes with none of that markup.

In Chrome, I see that the markup that's meant to be in those frames includes the string divtagdefaultwrapper, which only occurs in the base markup in this code in the HTML:

                Sys.Application.add_init(function() {
                    $create(AutomaticRepliesProperties, {
                        "Bindings": /* snip a bunch of stuff */
                        "JsonResults": "{\"__type\":\"AutoReplyConfigurationResults:ECP\", /* snip, includes divtagdefaultwrapper */]}",
                    }, null, {
                       /* snip more */
                    }, $get("ResultPanePlaceHolder_ctl00_ctl02_ctl01_AutomaticReplies"));
                });```

And if I place a breakpoint on that add_init call, both browsers are running it, and the delay introduced by me having to click to resume from that breakpoint also "un-breaks" the situation on Firefox, implicating possible iframe loading differences/race conditions.

Note that they are not actually setting any iframe src/srcdoc, they are simply altering the default empty HTML markup of the iframe (as given by about:blank) and setting its body via innerHTML to the JSONResults above. After some diggging around I found that ultimately this happens in https://outlook.uni-polster.de/ecp/15.0.1497.18/scripts/richtexteditor.js in:

  OwaEditor.Dom.Element.prototype.set_InnerHtml = function (n) {
    do if (n == null || n === undefined) throw new Error('Assert fail [' + '.\\dom/element.js' + ':' + 419 + ']: argument ' + 'sHTML' + ' can\'t be null or undefined');
    while (0);
    do if (n !== null && n !== undefined && String.isInstance != undefined && !String.isInstance(n)) throw new Error('Assert fail [' + '.\\dom/element.js' + ':' + 420 + ']: argument ' + 'sHTML' + ' is not of type ' + 'String');
    while (0);
    this._get_DomNode().innerHTML = n
  };

But if I log the node using a conditional logpoint with n.indexOf('divtagdefaultwrapper')>=0 then I see that while there is a body element being set to contain the expected text, by the time I change to the inspector to study that node it's stale and no longer in the document, with its replacement being a freshly-empty body element. This isn't happening on Chrome, leading me to believe either our iframe loading code is replacing the document after they've changed its innerHTML , or there is something else in their code triggering the same kind of result.

Would we expect the initial about:blank document to be replaced with a new document, even if it has no src/srcdoc? If so, could we maybe carry over any innerHTML that might have been set before, treating it as though it's a de-facto srcdoc?

Flags: needinfo?(twisniewski)

Henri or Kris may be able to answer the question in comment 19.

Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(hsivonen)

(In reply to Thomas Wisniewski [:twisniewski] from comment #19)

Would we expect the initial about:blank document to be replaced with a new document, even if it has no src/srcdoc? If so, could we maybe carry over any innerHTML that might have been set before, treating it as though it's a de-facto srcdoc?

It depends on exactly how the about:blank document is loaded. There's an initial implicit about:blank document which is always created instantly when it's needed, and a possible explicit about:blank document, which can be created a number of ways, like setting window.location or the src attribute to "about:blank". That one is the one that would be created by DocumentChannel, and is now asynchronous. I would expect it to replace the contents added to the initial one if there's an explicit about:blank load going on and the content is added to the initial one too early.

Flags: needinfo?(kmaglione+bmo)

Do you by chance know how the iframe is being created, and if/when "about:blank" is being loaded into it, either via a src attribute or something like a location assignment? My guess is that this is most likely to happen if we dynamically create an iframe with an explicit about:blank load and then immediately modify the DOM of the initial about:blank document.

Flags: needinfo?(twisniewski)

No, I don't know yet, but I'll take a closer look tomorrow. Thanks for the info, it does make me suspect that they're doing something (perhaps unintentionally) to trigger a document replacement.

The cause for the described symptoms is typically that the script adds content to the content viewer -created immediately-available synthetic about:blank and then the subsequent parser-generated about:blank replaces that document from the event loop.

Flags: needinfo?(hsivonen)
Component: General → DOM: Navigation
Product: Firefox → Core
Version: Firefox 95 → Firefox 82

Yes, I think Henri's right. I poked around a bit more, but don't see them doing anything interesting with the iframe after settings its body's innerHTML.

Flags: needinfo?(twisniewski)

But what changed in about:blank handling? Loading the new about:blank has always been async, so just modifying the initial about:blank immediately should have been broken before the changes too, but it wasn't.
Or is something perhaps asynchronously modifying the document? And before the changes it ended up modifying the loaded about:blank but now loading takes more time (does it?) so the code modifies the initial about:blank?

Good question. So far all I've seen is that the page has the iframe specified right in the markup:

<Iframe id="ResultPanePlaceHolder_ctl00_ctl02_ctl01_contentContainer_rteInternalMessage_ifBdy" class="richTextFrame" frameborder="0"></Iframe>

And then their scripts set its body's innerHTML as the page loads via the mechanisms I mentioned in my previous comment, and the after that at some point those changes are lost as the iframe's document seems to be replaced.

The most promising related changes in the regression range seem to be the ones in bug 1589102 .. maybe that changed the timings here, and where once the iframe's body would have been replaced earlier before their scripts set its innerHTML, they're now changed a bit later?

Flags: needinfo?(bugs)

Yeah, I believe that is what bug 1589102 does, effectively slows down loading the about:blank.
Obviously the site should use load event listener and modify document only after the page has been loaded.
But, in other browsers that isn't needed.

It isn't super clear to me why we need to go through the parent process in this kind of case though.

But I don't still quite understand what causes this. Do we end up stopping loading the about:blank before the patch, but new implementation just
continues loading?

Flags: needinfo?(bugs)

I'm afraid that based on my limited knowledge, I can't tell whether we actually stopped any such loads before, or just replaced the document sooner before the page changed it, so I'll have to defer to AnnyG for more insight.

Olli, I'm not sure I can answer your question, but I'll give a summary of what my patches did and then I'll NI Henri so hopefully he could have some more insights into this (as per my convo with Hsin-Yi).

Originally what prompted the changes was a small case written by zombie here
(in mochitests in a debug build with fission enabled)

let win = window.open();
win.location = "http://example.net/browser";
setTimeout(() => win.location = "about:blank", 300); // crash

Nika then provided her analysis of what happened

The root of the issue here is that loading about:blank inside of a BrowsingContext loads a document which inherits the principal from the global which triggered the load. This causes an issue, as we don't run about:blank loads through DocumentChannel, which means that we don't trigger a process switch to the correct process, and end up loading an about:blank with a principal which doesn't match the target process.
We need to support performing a process switch in this case so that the final loaded about:blank document is loaded in the current process.

Since as a team we agreed that all loads should be going via DocumentChannel, this was a remaining task - making about:blank go via DocumentChannel.

So even though my original work that prompted this has a lot of patches, there are only a few relevant ones

Flags: needinfo?(agakhokidze) → needinfo?(hsivonen)

When we figure this out, we also need to followup to this conversation on the enterprise list:

https://groups.google.com/a/mozilla.org/g/enterprise/c/69SK6V-weSc

(In reply to Anny Gakhokidze [:annyG] from comment #30)

Olli, I'm not sure I can answer your question, but I'll give a summary of what my patches did and then I'll NI Henri so hopefully he could have some more insights into this (as per my convo with Hsin-Yi).

Originally what prompted the changes was a small case written by zombie here
(in mochitests in a debug build with fission enabled)

let win = window.open();
win.location = "http://example.net/browser";
setTimeout(() => win.location = "about:blank", 300); // crash

Nika then provided her analysis of what happened

The root of the issue here is that loading about:blank inside of a BrowsingContext loads a document which inherits the principal from the global which triggered the load. This causes an issue, as we don't run about:blank loads through DocumentChannel, which means that we don't trigger a process switch to the correct process, and end up loading an about:blank with a principal which doesn't match the target process.
We need to support performing a process switch in this case so that the final loaded about:blank document is loaded in the current process.

Since as a team we agreed that all loads should be going via DocumentChannel, this was a remaining task - making about:blank go via DocumentChannel.

Do I understand correctly that there was a need to handle a process switch when navigating from a non-about:blank page to a non-initial about:blank, and as a side effect the initial about:blank started completing at a further distance than before in terms of event loop spins?

We don't need the initial about:blank to go through DocumentChannel to fix non-initial ones, right?

(Over in bug 543435, I'm experimenting with loading non-initial about:blank completely normally and loading initial about:blank even more specially than at present.)

Flags: needinfo?(hsivonen) → needinfo?(agakhokidze)

Nika, I'm NI-ing you in case I say something wrong, so please correct me if that's the case :) I think you can just read comment 30 and Henri's comment above for this.

Do I understand correctly that there was a need to handle a process switch when navigating from a non-about:blank page to a non-initial about:blank, and as a side effect the initial about:blank started completing at a further distance than before in terms of event loop spins?

I think the side effect you outlined is wrong. It's only non-initial about:blank's that take longer to complete than before, because the initial about:blank is still handled in a very special way (I didn't change the way we manually create initial about:blanks).

We don't need the initial about:blank to go through DocumentChannel to fix non-initial ones, right?
I don't think so.

Flags: needinfo?(agakhokidze) → needinfo?(nika)

It seems likely that the problem here is the slightly different timing of the implicit about:blank document load triggered by inserting the iframe, as has been mentioned before. The initial about:blank is still created synchronously, but this second load used to always complete in exactly 1 spin of the event loop, but may take an arbitrary amount of time (as it includes IPC) now. This means that any work being triggered on e.g. a setTimeout(..., 0) would've previously fired on the explicit about:blank load, but now fires on the implicit one and is clobbered.

As mentioned in comment 32, there is some work :hsivionen is doing in bug 543435 to try to align our about:blank load behaviour more with chrome & webkit's right now, which might fix this issue by removing the behaviour difference, but I don't know what the timelines on that look like, and we might need a faster solution. It might be possible to add a flag passed through the nsDocShellLoadState for these initial implicit about:blank loads which disables document channel just for these loads, though it's kinda a gross workaround. You could perhaps get away with checking if:

if (DocumentChannel::CanUseDocumentChannel(aLoadState->URI()) &&
    !(NS_IsAboutBlank(aLoadState->URI()) && inheritPrincipal &&
      // I'm using == here because I think it needs identity due to document.domain
      aLoadState->PrincipalToInherit() == GetInheritedPrincipal(false) &&
      (!mContentViewer || !mContentViewer->GetDocument() ||
       mContentViewer->GetDocument()->IsInitialDocument()))) {
  // use document channel
}

Which should effectively skip using document channel in roughly-ish the cases where we'd be using the sync about:blank in :hsivonen's code. It's pretty gross though.

Flags: needinfo?(nika)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #34)

It seems likely that the problem here is the slightly different timing of the implicit about:blank document load triggered by inserting the iframe, as has been mentioned before. The initial about:blank is still created synchronously, but this second load used to always complete in exactly 1 spin of the event loop, but may take an arbitrary amount of time (as it includes IPC) now. This means that any work being triggered on e.g. a setTimeout(..., 0) would've previously fired on the explicit about:blank load, but now fires on the implicit one and is clobbered.

As mentioned in comment 32, there is some work :hsivionen is doing in bug 543435 to try to align our about:blank load behaviour more with chrome & webkit's right now, which might fix this issue by removing the behaviour difference, but I don't know what the timelines on that look like, and we might need a faster solution.

Henri has posted some ideas in bug 543435 but at this moment it's not clear if the idea is viable and how much work needed. I'd agree that we should fix this issue faster, even it means we take a hacky workaround.

It might be possible to add a flag passed through the nsDocShellLoadState for these initial implicit about:blank loads which disables document channel just for these loads, though it's kinda a gross workaround. You could perhaps get away with checking if:

if (DocumentChannel::CanUseDocumentChannel(aLoadState->URI()) &&
    !(NS_IsAboutBlank(aLoadState->URI()) && inheritPrincipal &&
      // I'm using == here because I think it needs identity due to document.domain
      aLoadState->PrincipalToInherit() == GetInheritedPrincipal(false) &&
      (!mContentViewer || !mContentViewer->GetDocument() ||
       mContentViewer->GetDocument()->IsInitialDocument()))) {
  // use document channel
}

Which should effectively skip using document channel in roughly-ish the cases where we'd be using the sync about:blank in :hsivonen's code. It's pretty gross though.

See Also: → 1741058
Assignee: nobody → hsivonen
Status: NEW → ASSIGNED

Need to look at toolkit/components/antitracking/test/browser/browser_aboutblank.js

Shippable builds to run the steps to repro with to see if the problem goes away:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eecf787e32b577f289f25a4eb1a85fc31ef3139c

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Shippable builds to run the steps to repro with to see if the problem goes away:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eecf787e32b577f289f25a4eb1a85fc31ef3139c

This OWA issue seemed to be fixed by the patch. I can't reproduce it.

FWIW, https://bugzilla.mozilla.org/show_bug.cgi?id=1741058 is still reproducible here. But I think we can leave that bug separately.

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #40)

(In reply to Henri Sivonen (:hsivonen) from comment #39)

Shippable builds to run the steps to repro with to see if the problem goes away:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=eecf787e32b577f289f25a4eb1a85fc31ef3139c

This OWA issue seemed to be fixed by the patch. I can't reproduce it.

This matches what I'm seeing.

FWIW, https://bugzilla.mozilla.org/show_bug.cgi?id=1741058 is still reproducible here. But I think we can leave that bug separately.

That one seems to be a duplicate of bug 543435.

(In reply to Henri Sivonen (:hsivonen) from comment #38)

Need to look at toolkit/components/antitracking/test/browser/browser_aboutblank.js

Findings so far: In the DocumentChannel case, mIsThirdPartyContextToTopWindow of LoadInfo is set to false via IPC. With the patch, we end up defaulting it to true when constructing the LoadInfo.

AFAICT, the IPC case also comes from a default, but a different place where things are set to default values.

Try run after rebase in the hope of seeing which oranges are unrelated to the patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3cf4e8ff0f9f046db094374a38b728544829df6c

This patch makes history.length too short by one:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/html/browsers/browsing-the-web/history-traversal/browsing_context_name-0.html#19

I expect nsParser needs to graduate the initial about:blank into a non-initial state.

(In reply to Henri Sivonen (:hsivonen) from comment #47)

This patch makes history.length too short by one:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/html/browsers/browsing-the-web/history-traversal/browsing_context_name-0.html#19

I expect nsParser needs to graduate the initial about:blank into a non-initial state.

Wrong remark about what needs to happen here.

smaug, do you have ideas why merely making the parser-created about:blank get created sooner in terms of event loop spins might make history.length shorter? Naively, I'd think there'd be more opportunity to make it one longer this way.

Flags: needinfo?(bugs)

So far, it looks like session history in parent causes history.length to depend on IPC. Though I haven't confirmed it yet, it's likely that this change removing IPC from the about:blank path makes history.length update IPC messages to arrive after history.length is read.

Not sure what to do about this. Also not sure if this actually matters for Web compat.

Confirmed that this is about session history IPC.

smaug, any ideas about whether this is worth fixing and, if so, how?

(All the other WPT failures are also on the history theme and most likely have the same root cause.)

So the idea is to load about:blank in child process without going through parent process?
Do we still get the SessionHistoryCommit at some point?
it sounds like we're missing AddPendingHistoryChange() call in the child side. That would keep the history.length correct, while IPC messages are still going through the parent process.

(In reply to Olli Pettay [:smaug] from comment #52)

So the idea is to load about:blank in child process without going through parent process?

Yes. Thanks for the off-bug advice. The current patch seems to address the issue at least for the test I was looking at.

Let's see what happens with more tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a48e376b167c832bba2abb71f927e7d0c3395f0f

Flags: needinfo?(bugs)

Sigh. Copypaste errors.

Here's a try run that chains mActiveEntry for real:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ed19f0ae04f209992b8f7d79aeb64803ed1b299c

For https://searchfox.org/mozilla-central/source/testing/web-platform/tests/html/browsers/browsing-the-web/history-traversal/history-traversal-navigate-parent-while-child-loading.html :

The history.length assert before the load handler gets the expected result by having commited 1 and adding 2 from pending.

The assert in the load handler gets an unexpected result by having committed 2 without anything pending.

2 pending being two deltas of one each.

content-security-policy/navigate-to/unsafe-allow-redirects/blocked-end-of-chain.sub.html failing on try doesn't appear to be caused by this patch.

The patch here might be making bug 1744379 more frequent but not permaorange.

The patch here seems to be making bug 1727101 permaorange.

(In reply to Henri Sivonen (:hsivonen) from comment #68)

The patch here seems to be making bug 1727101 permaorange.

This should now be addressed in the patch. Try run to check:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b08cd9f3e663c2264ddb5f749dd274fe47d999f3

(In reply to Henri Sivonen (:hsivonen) from comment #68)

The patch here might be making bug 1744379 more frequent but not permaorange.

Disabling this one on Linux debug.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=55656d6ba9281c51e40b8f0cb7a7ad87d806d06a

Notably, the cases of tests turning more orange are with the socket process. I sure hope the socket process doesn't alter about:blank stream behavior.

Rebase:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=391f6b910faeaac7da57a6b64b73c8b7c97ef605

(In reply to Henri Sivonen (:hsivonen) from comment #69)

(In reply to Henri Sivonen (:hsivonen) from comment #68)

The patch here seems to be making bug 1727101 permaorange.

This should now be addressed in the patch. Try run to check:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b08cd9f3e663c2264ddb5f749dd274fe47d999f3

And this bit didn't work.

Blocks: 1727101

(In reply to Henri Sivonen (:hsivonen) from comment #71)

Rebase:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=391f6b910faeaac7da57a6b64b73c8b7c97ef605

(In reply to Henri Sivonen (:hsivonen) from comment #69)

(In reply to Henri Sivonen (:hsivonen) from comment #68)

The patch here seems to be making bug 1727101 permaorange.

This should now be addressed in the patch. Try run to check:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b08cd9f3e663c2264ddb5f749dd274fe47d999f3

And this bit didn't work.

Moving to disable the test in bug 1727101 per sheriff's recommendation.

Webcompat Priority: --- → ?

Deferring SetActiveSessionHistoryEntry call until mLoadingEntry becomes mActiveEntry:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3957ffcd43285d36855d1ad990d77e6d4362f153

(In reply to Henri Sivonen (:hsivonen) from comment #76)

Deferring SetActiveSessionHistoryEntry call until mLoadingEntry becomes mActiveEntry:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3957ffcd43285d36855d1ad990d77e6d4362f153

I confirmed that the try build of Comment#76 solved Bug 1741058 too.

(In reply to Henri Sivonen (:hsivonen) from comment #79)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4b7567e58845060b7a4d85944c8738f792ce0c6

Sigh. A new failure:
REFTEST TEST-UNEXPECTED-FAIL | layout/base/crashtests/378682.html | assertion count 1 is more than expected 0 assertions

(In reply to Henri Sivonen (:hsivonen) from comment #80)

(In reply to Henri Sivonen (:hsivonen) from comment #79)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4b7567e58845060b7a4d85944c8738f792ce0c6

Sigh. A new failure:
REFTEST TEST-UNEXPECTED-FAIL | layout/base/crashtests/378682.html | assertion count 1 is more than expected 0 assertions

This looks intermittent.

(In reply to Henri Sivonen (:hsivonen) from comment #81)

(In reply to Henri Sivonen (:hsivonen) from comment #80)

(In reply to Henri Sivonen (:hsivonen) from comment #79)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4b7567e58845060b7a4d85944c8738f792ce0c6

Sigh. A new failure:
REFTEST TEST-UNEXPECTED-FAIL | layout/base/crashtests/378682.html | assertion count 1 is more than expected 0 assertions

This looks intermittent.

It seems that if this occurs, it occurs upon reloading about:blank. That goes through a cross-process path and, therefore, IPC. I fail to see a mechanism how this patch would cause this intermittent.

Try run to see if the null parent WindowContext case is exercised by the test suite:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d61cd3edb1c52d9e3fc6f6f24d9a142eb960294

(In reply to Henri Sivonen (:hsivonen) from comment #81)

(In reply to Henri Sivonen (:hsivonen) from comment #80)

(In reply to Henri Sivonen (:hsivonen) from comment #79)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4b7567e58845060b7a4d85944c8738f792ce0c6

Sigh. A new failure:
REFTEST TEST-UNEXPECTED-FAIL | layout/base/crashtests/378682.html | assertion count 1 is more than expected 0 assertions

This looks intermittent.

5 times out of 19 looks bad as far as intermittent odds go.

Third-partiness rewritten:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b7bcf7f82392a0f90de1f733d8d4031a6cfe61f

(I still haven't gotten a Pernosco repro of the intermittent.)

Pushed by hsivonen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/15c3d4b61356
Avoid DocumentChannel for nsParser-created initial about:blank replacement. r=nika,smaug
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32875 for changes under testing/web-platform/tests
Whiteboard: qa-not-reproducible → qa-not-reproducible, [wptsync upstream]
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Upstream PR merged by moz-wptsync-bot

What are your thoughts on backporting to the ESR once it's baked?

I think it's an important use case for ESR, but it'll also need rebasing. It does graft cleanly to Beta.

(In reply to Mike Kaply [:mkaply] from comment #91)

What are your thoughts on backporting to the ESR once it's baked?

Let's see what this try run looks like:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9610359241b7770a15159b0c8491f0a20e507949

The patch landed in nightly and beta is affected.
:hsivonen, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(hsivonen)

Henri, do you plan on requesting an uplift to beta for the 98 release or should we let it ship with 99?

The patch touched session history which historically caused regressions. I think we should work with QA to test the patch on top websites and the websites which we had regressions before we uplift to ESR.

== Change summary for alert #33325 (as of Mon, 21 Feb 2022 09:35:32 GMT) ==

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
5% google-docs SpeedIndex linux1804-64-shippable-qr cold fission webrender 1,249.83 -> 1,186.69
4% espn dcf android-hw-p2-8-0-android-aarch64-shippable-qr warm webrender 644.48 -> 618.88
4% google-docs SpeedIndex linux1804-64-shippable-qr cold fission webrender 1,243.88 -> 1,195.92
3% google-docs ContentfulSpeedIndex linux1804-64-shippable-qr cold fission webrender 1,768.32 -> 1,719.42
2% espn dcf android-hw-g5-7-0-arm7-shippable-qr warm webrender 1,677.31 -> 1,643.75

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=33325

(In reply to Pascal Chevrel:pascalc from comment #96)

Henri, do you plan on requesting an uplift to beta for the 98 release or should we let it ship with 99?

Let's proceed like Hsin-yi said.

I updated the ESR patch to account for the uplift of bug 1671819, which fixed the orange in the previous try run. New try run just in case:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62bf677629b0948f9bb1f650b25610f0f403d91b

Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #99)

(In reply to Pascal Chevrel:pascalc from comment #96)

Henri, do you plan on requesting an uplift to beta for the 98 release or should we let it ship with 99?

Let's proceed like Hsin-yi said.

I updated the ESR patch to account for the uplift of bug 1671819, which fixed the orange in the previous try run. New try run just in case:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=62bf677629b0948f9bb1f650b25610f0f403d91b

This treeherder view should get shippable builds for testing shortly.

I messed up merging the test case. New try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad9ef992a4a012208a4c2d6c85d9ff7ba2d722b5

(This is a test-only change, so the shippable builds from above are valid for testing.)

Shippable-configuration beta-channel builds with the patch for testing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ffe5940ef21d8b8f0195e60e14e45c80fffe3c85

Henri, could you please provide a shippable try build for ESR 91.8 for QA testing? Thank you.

Flags: needinfo?(hsivonen)

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #103)

Henri, could you please provide a shippable try build for ESR 91.8 for QA testing? Thank you.

Shippable builds should appear at
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59e8fdd30c90bae78b0b3730e4986d72eec96a82

Flags: needinfo?(hsivonen)

(In reply to Henri Sivonen (:hsivonen) from comment #104)

(In reply to Hsin-Yi Tsai [:hsinyi] from comment #103)

Henri, could you please provide a shippable try build for ESR 91.8 for QA testing? Thank you.

Shippable builds should appear at
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59e8fdd30c90bae78b0b3730e4986d72eec96a82

Thanks Henri. Catalin from QA team shared back to me that "we are done testing with the ESR try build and all seems good there."

Please nominate this for ESR approval when you get a chance. Thanks for all the testing done in advance of this landing, too.

Flags: needinfo?(hsivonen)

Comment on attachment 9264813 [details]
Bug 1736570 - Avoid DocumentChannel for nsParser-created initial about:blank replacement (ESR backport).

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Without this patch, Web sites that implicitly depend on nsParser-created about:blank replacing the docshell-created about:blank soon (as in about two event loop spins) can break due to accidentally operating on the docshell-created about:blank and the the nsParser-created about:blank throwing away the changes due to the replacement happening later than expected.
  • User impact if declined: Some scripted behaviors mysteriously breaking on some sites. The concrete known case motivating uplift to ESR is the inability to enable the vacation autoresponder in Outlook Web App.
  • Fix Landed on Version: 99
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): This patch touches session history, which is always risky to change. However, the risk has been managed by QA testing of the patch and the patch having been on Nightly for almost a month already.
Flags: needinfo?(hsivonen)
Attachment #9264813 - Flags: approval-mozilla-esr91?

Comment on attachment 9264813 [details]
Bug 1736570 - Avoid DocumentChannel for nsParser-created initial about:blank replacement (ESR backport).

Fixes some important web compat issues, especially visible on enterprise apps like OWA. Thanks for all the extra testing done for this. Approved for 91.8esr.

Attachment #9264813 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+
Flags: in-testsuite+
Regressions: 1764272
Regressions: 1873068
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: