1574017 - postMessage() to a cross-origin parent without passing "*" as the second argument crashes the tab in Fission

(no longer active)

Reporter

Description

•

5 years ago

STR:
Run the following code in a cross-origin tab under Fission: parent.postMessage("foo");.

We crash here because callerDocumentURI is nullptr.

Henri Sivonen (:hsivonen)

Comment 1

•

5 years ago

This file appears to have undergone a Fission-related renaming without the follow-up. Is there tracking for the follow-up other than the 'fission' top-level meta bug?

Flags: needinfo?(kvijayan)

Priority: -- → P2

Kannan Vijayan [:djvj]

Comment 2

•

5 years ago

No, I haven't looked at this one yet. It should show up on my list later down the line. Still dealing with main docshell-related code for now.

Flags: needinfo?(kvijayan)

Neha Kochar [:neha]

Updated

•

5 years ago

Blocks: fission-dogfooding
No longer blocks: fission

Chris Peterson [:cpeterson]

Comment 3

•

5 years ago

Tentatively moving all bugs whose summaries mention "Fission" (or other Fission-related keywords) but are not assigned to a Fission Milestone to the "?" triage milestone.

This will generate a lot of bugmail, so you can filter your bugmail for the following UUID and delete them en masse:

0ee3c76a-bc79-4eb2-8d12-05dc0b68e732

Fission Milestone: --- → ?

Neha Kochar [:neha]

Updated

•

5 years ago

Fission Milestone: ? → M5

Andrew McCreight [:mccr8]

Updated

•

5 years ago

Keywords: testcase

Anny Gakhokidze [:annyG]

Assignee

Comment 4

•

5 years ago

I tried reproducing this by going on a website (google.com in my case), and executing window.open("https://en.wikipedia.org/wiki/Cymbidium_ensifolium"); in the console and in the newly opened tab's console executing parent.postMessage("foo");, but I could not reproduce the crash. There is a possibility that I am not following the steps correctly, can someone please confirm that I am doing the right thing?

Andrew McCreight [:mccr8]

Comment 5

•

5 years ago

Can you still reproduce this, Ehsan? Thanks.

Flags: needinfo?(ehsan)

Anny Gakhokidze [:annyG]

Assignee

Comment 6

•

5 years ago

I talked to Ehsan and here is what I found - I was going all wrong about this. In the case above I was posting a message to the window itself (I conflated the terms opener and parent), but what I should be doing is sending a message from a cross origin iframe to its parent. I tried doing it via devtools console (by going to page https://annygakh.github.io/bug1574017test/ and sending a message from the iframe to the parent, but that did not work. Ehsan suspects that it is possible that when we post a message via console, the code path might be different.

In Ehsan's case, the code was crashing here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/PostMessageEvent.cpp#146 because inside of NS_GetSanitizedURIStringFromURI we call aURI->GetSpec(...) when aURI is null.

When we called NS_GetSanitizedURIStringFromURI we passed callerDocumentURI to it, which was created here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/PostMessageEvent.cpp#68 via mCallerDocumentURI. mCallerDocumentURI is only set once, when we are creating PostMessageEvent for the first time. Let us now follow some paths that create a new PostMessageEvent.

One of them is here nsGlobalWindowOuter::PostMessageMozOuter -https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#6048. Earlier in function nsGlobalWindowOuter::PostMessageMozOuter, we pass callerDocumentURI to GatherPostMessageData to fill it out.

In nsGlobalWindowOuter::GatherPostMessageData, because of Fission, we don't have access to caller's inner window, and therefore we never end up taking code paths that would set aCallerDocumentURI and therefore it remains null.

Tomorrow I will try crafting the test case to trigger this and figuring out how to fix this.

Assignee: nobody → agakhokidze

Status: NEW → ASSIGNED

Flags: needinfo?(ehsan)

(no longer active)

Reporter

Comment 7

•

5 years ago

I talked about this with Anny in person. I couldn't remember how I reproduced this before but based on code inspection it appeared the bug is still present.

That being said I tried to create a minimized test case and that did not reproduce the bug, so it could be that I'm wrong about that and the bug doesn't happen any more... I don't see an error generated with Fission enabled but there is no crash either; so maybe the nature of the bug has changed a bit.

(no longer active)

Reporter

Comment 8

•

5 years ago

(In reply to Anny Gakhokidze [:annyG] from comment #6)

In nsGlobalWindowOuter::GatherPostMessageData, because of Fission, we don't have access to caller's inner window

After looking at the code more closely I'm not too sure if I was right about this BTW, it looks like we're just trying to obtain the subject principal's inner window so in theory that should work. The test case in the previous comment should be helpful for debugging what's going on...

Anny Gakhokidze [:annyG]

Assignee

Comment 9

•

5 years ago

•

Edited

If I can get the code to take the following path https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#5859-5863 when sending a post message, maybe this will replicate the bug.

The comment here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#5859-5860 gave me an idea of creating a test case like so https://wandering-ziconium.glitch.me [1]. I was hoping that when we clicked on the post message button, we wouldn't be able to get an inner window because we have a sandbox, but it appears I am wrong and we can't send a post message in this case (I guess because the parent iframe is in the sandbox?).

[1]
main page, https://wandering-ziconium.glitch.me :

<html>
  <body>
    <iframe src="https://silken-skipjack.glitch.me/" sandbox>
    </iframe>
  </body>
</html>

https://silken-skipjack.glitch.me/:

<html>
  <body style="background-color:powderblue;">
    <iframe src="https://humane-cicada.glitch.me/">
    </iframe>
  </body>
</html>

https://humane-cicada.glitch.me/:

<html>
  <body style="background-color:green">
    <button id="postmessagebutton" type="button">
      Post message to a parent
    </button>
    <script>
      document
        .getElementById("postmessagebutton")
        .addEventListener("click", function() {
          console.log("are we parent?");
          console.log(parent === window);
          parent.postMessage("foo");
          console.log("posted a message to the parent");
        });
    </script>
  </body>
</html>

(no longer active)

Reporter

Comment 10

•

5 years ago

Aha, perhaps using a sandbox is the key to reproducing this, great idea! Sandbox attribute by default removes scripting capabilities, so you probably want sandbox="allow-scripts" instead.

However when I modified my test case as such it seemed like the sandboxed iframe got kicked into the same content process as its parent, so we were no longer an out-of-process frame. Perhaps this needs a couple of levels of nesting, for example foo.com > bar.com > sandboxed-iframe?

Anny Gakhokidze [:annyG]

Assignee

Comment 11

•

5 years ago

I'm not sure how several levels of nesting would help actually... If we are sending a postMessage from sandboxed-iframe to bar.com, it will end up being in the same content process as its parent (as you mentioned, and I too have observed), but if we are sending a message from bar.com to foo.com, why do we need to have a sandboxed-iframe?

Anywhoo, I tried this with the following setup:

On linux, I edited my /etc/hosts file to look like this [1] so that I can easily get pages 'served' from different domains.
[1]

0.0.0.0 www.a.com
0.0.0.0 www.b.com
0.0.0.0 www.c.com

Then, I created ./dir/a/index.html[2], ./dir/b/index.html[3] and ./dir/c/index.html[4] and served them all from different ports, such that they can be accessed via the following links http://www.a.com:8001, http://www.b.com:8002 and http://www.a.com:8003.

[2] a.html

<html>
  <body>
    <iframe src="http://www.b.com:8002">
    </iframe>
  </body>
</html>

[3] b.html

<html>
  <body style="background-color:powderblue;">
    <iframe src="http://www.c.com:8003/" sandbox="allow-scripts">
    </iframe>
  </body>
</html>

[4] c.html

<!DOCTYPE html>
<html>
  <body style="background-color:green">
    <button id="postmessagebutton" type="button">
      Post message to a parent
    </button>
    <script>
      document
        .getElementById("postmessagebutton")
        .addEventListener("click", function() {
          console.log("are we parent?");
          console.log(parent === window);
          parent.postMessage("foo");
          console.log("posted a message to the parent");
        });
    </script>
  </body>
</html>

When I went to www.a.com:8001/ and waited for the page to load and then posted a message from c.com frame to b.com frame, (un)fortunately the code did not take the desired codepath... Perhaps my iframes are setup incorrectly?

Kris suggested[5] that I try to write a web-extensions test, which I will pursue next..

[5] https://mozilla.logbot.info/domfission/20191213#c16815812-c16815988

(no longer active)

Reporter

Comment 12

•

5 years ago

(In reply to Anny Gakhokidze [:annyG] from comment #11)

I'm not sure how several levels of nesting would help actually... If we are sending a postMessage from sandboxed-iframe to bar.com, it will end up being in the same content process as its parent (as you mentioned, and I too have observed), but if we are sending a message from bar.com to foo.com, why do we need to have a sandboxed-iframe?

Hmm, yeah now that I think about it again, what I said before doesn't make sense since the sandboxed iframe would still be in-process. (I had the exact setup you replicated in mind, so your test also demonstrates that I wasn't thinking clearly before.) Sorry about that!

Anny Gakhokidze [:annyG]

Assignee

Comment 13

•

5 years ago

I have come up with a following test case to crash the content process

<!DOCTYPE HTML>
<html>
<head>
  <meta charset="utf-8">
  <title>Test for Bug 1574017</title>
  <script src="/tests/SimpleTest/SimpleTest.js"></script>
  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css"/>
  <script type="application/javascript">
  add_task(async () => {
    var Cu = SpecialPowers.Cu;
    var testDone = {};
    testDone.promise = new Promise(resolve => { 
        testDone.resolve = resolve;
    });
    var princ = SpecialPowers.wrap(window.document).nodePrincipal;
    var sandbox = Cu.Sandbox(princ, { sameZoneAs: this});
    sandbox.win = window.frames.diffDomain;
    function receiveMessage(event) {
        testDone.resolve();
    }
    window.addEventListener("message", receiveMessage, false);

    Cu.evalInSandbox('win.postMessage("foo");', sandbox);
    dump("ANNY -- waiting for promise to resolve\n");
    await testDone.promise;
    dump("ANNY -- we finished waiting\n");
    ok(true, "great");
    Cu.nukeSandbox(sandbox);    
  });
  </script>
</head>
<body>
<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1574017">Mozilla Bug 1574017</a>
<p id="display"></p>
<div id="content" style="display: none">
</div>
<pre id="test">
</pre>
<iframe id="diffDomain" name="diffDomain" src="https://example.org/tests/dom/tests/mochitest/bugs/file_postmessage.html"></iframe>
</body>
</html>

<html>
<head>
<meta charset=UTF-8>
</head>
<body>
    <script>
    function receiveMessage(event) {
        top.postMessage("foo","*");
    }
    window.addEventListener("message", receiveMessage, false);
    </script>
</body>
</html>

In a non crashing case, the test would time out as posting a message to a cross origin iframe is not allowed, but the test is currently failing because of a content process crashing. I am thinking maybe it is enough that here https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/dom/base/PostMessageEvent.cpp#145-146 we assign providedOrigin to uriSpec if we don't have callerDocumentURI but I don't really know what caller document's URI is supposed to represent and how it differs from providedOrigin.

Anny Gakhokidze [:annyG]

Assignee

Comment 14

•

5 years ago

Hi Boris,

I hear you have been working a lot on error messages and trying to make them the most useful, so I have a question for you :)

In the above comment I have a test case, which causes a crash here https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.cpp#146 . I thought of using providedOrigin instead, but peterv suggested that when I create a PostMessageEvent here https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/dom/ipc/ContentChild.cpp#4107-4109 I should save the caller's principal as well, and in PostMessageEvent::Run if the caller's document URI is null, extract a URI from caller's principal and use that for the error message. However, this means that if we have a system principal, we won't have a URI and I would pass an empty string instead. Do you have a better idea for this or can I go ahead with this?

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 15

•

5 years ago

OK, so a few things:

In nsGlobalWindowOuter::GatherPostMessageData if callerInnerWin is null then we will end up with no window id and no document URI, right? That's what I assume the sandbox case comment 13 describes triggers, and I would expect this to not depend on fission in any way.
In the fission case, it looks like we are not sending the source windowid along. Why is that? Yes, it references a window in a different process, but if we want devtools to put the error in the right place we still need it, no? We just need to make sure to not use it for anything other than passing to error reporting. Have we checked how the error reporting here works if we're doing a cross-process postMessage with fission and the principals don't match? I expect this needs to be fixed independently of this crash bug.
I don't see an obvious way that callerDocumentURI can be null when we have (or could have, in the fission case) a caller window id. Am I missing anything there? If not, then the only case we need to worry about here is the "caller is not a window" case. In that case, yeah, I guess the best we can do is get a URI from the caller principal. I suggest using GetScriptLocation on the BasePrincipal for that; it might work better than just getting the URI, including providing a useful string in the system-principal case...

Flags: needinfo?(bzbarsky)

Anny Gakhokidze [:annyG]

Assignee

Comment 16

•

5 years ago

From my observations, when I run this test without Fission, the lack of an inner window does not lead to us having no window id. We don't have caller document URI, but we do have a window id and we end up in this case https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.cpp#141-143
I assumed the lack of window ID in fission case was done by design after reading the comment here https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.h#47-55 . I will look into this and see if its possible to send source windowid along.

Have we checked how the error reporting here works if we're doing a cross-process postMessage with fission and the principals don't match?

Quickly modifying my test case to pass https://example.com as the second argument to postMessage yields in a crash due to the same reason. So this means it will be fixed as a result of this bug.

I don't see an obvious way that callerDocumentURI can be null when we have (or could have, in the fission case) a caller window id.

I think I addressed this above in 1).

Flags: needinfo?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 17

•

5 years ago

From my observations, when I run this test without Fission, the lack of an inner window does not lead to us having no window id

Oh, I see. In that case the windowid is not Nothing, but is set to 0, so we take the "report with window id" codepath but just don't provide a useful one... Alright, then.

I assumed the lack of window ID in fission case was done by design

It was, but I'm not sure that was the right thing to do. Check with Peter, just in case?

Quickly modifying my test case to pass https://example.com as the second argument to postMessage

No, I meant a testcase where the sending code is in a window, so we do have a document URI but don't have a window id, and end up reporting via the "no window id" codepath. I expect that doesn't crash, but also doesn't report it to the right web console...

Flags: needinfo?(bzbarsky)

Anny Gakhokidze [:annyG]

Assignee

Comment 18

•

5 years ago

Attached file Bug 1574017 - Fix console error reporting for postMessage, r=peterv — Details

Rename mCallerDocumentURI field of PostMessageEvent to mCallerURI as we
might not always have a document, e.g. when we have a sandbox.
When we neither have caller's inner window (and thus we have no caller's
window ID) nor caller's URI, use script location as a source name for the error
object when reporting errors to console.
When the caller of postMessage has its window in a different process, pass along
the caller's window ID so that the error can be reported correctly.

Chris Peterson [:cpeterson]

Comment 20

•

5 years ago

Adding crash signature from duplicate bug 1608746.

Raising bug priority to P1 because this is a reproducible Fission crash. STR: open https://onedrive.live.com/about/en-us/plans/ in multiple tabs with Fission enabled.

Crash Signature: [@ NS_GetSanitizedURIStringFromURI ]

status-firefox74: --- → affected

Priority: P2 → P1

Pulsebot

Comment 21

•

5 years ago

Pushed by agakhokidze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/229b1e761a5f
Fix console error reporting for postMessage, r=peterv

Razvan Maries

Comment 22

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/229b1e761a5f

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox74: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla74

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

5 years ago

status-firefox72: --- → disabled

status-firefox73: --- → disabled

status-firefox-esr68: --- → unaffected

Vlad Lucaci, Thunderbird QA (:vlucaci)

Comment 23

•

4 years ago

Hello Chris,

Is QA needed to verify this issue? We have the required steps in Comment 20.

Flags: needinfo?(cpeterson)

Chris Peterson [:cpeterson]

Comment 24

•

4 years ago

(In reply to Vlad Lucaci, QA (:vlucaci) from comment #23)

Is QA needed to verify this issue? We have the required steps in Comment 20.

Fission is not enabled by default, so I don't think verifying this fix is a high priority. We have easy STR, so Anny was presumably able verify her fix adequately. She also add a new test for the fix.

Flags: needinfo?(cpeterson)