Closed Bug 1574017 Opened 5 years ago Closed 5 years ago

postMessage() to a cross-origin parent without passing "*" as the second argument crashes the tab in Fission

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla74
Fission Milestone M5
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- disabled
firefox73 --- disabled
firefox74 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: annyG)

References

Details

(Keywords: crash, testcase)

Crash Data

Attachments

(1 file)

STR:
Run the following code in a cross-origin tab under Fission: parent.postMessage("foo");.

We crash here because callerDocumentURI is nullptr.

This file appears to have undergone a Fission-related renaming without the follow-up. Is there tracking for the follow-up other than the 'fission' top-level meta bug?

Flags: needinfo?(kvijayan)
Priority: -- → P2

No, I haven't looked at this one yet. It should show up on my list later down the line. Still dealing with main docshell-related code for now.

Flags: needinfo?(kvijayan)
Blocks: fission-dogfooding
No longer blocks: fission

Tentatively moving all bugs whose summaries mention "Fission" (or other Fission-related keywords) but are not assigned to a Fission Milestone to the "?" triage milestone.

This will generate a lot of bugmail, so you can filter your bugmail for the following UUID and delete them en masse:

0ee3c76a-bc79-4eb2-8d12-05dc0b68e732

Fission Milestone: --- → ?
Fission Milestone: ? → M5
Keywords: testcase

I tried reproducing this by going on a website (google.com in my case), and executing window.open("https://en.wikipedia.org/wiki/Cymbidium_ensifolium"); in the console and in the newly opened tab's console executing parent.postMessage("foo");, but I could not reproduce the crash. There is a possibility that I am not following the steps correctly, can someone please confirm that I am doing the right thing?

Can you still reproduce this, Ehsan? Thanks.

Flags: needinfo?(ehsan)

I talked to Ehsan and here is what I found - I was going all wrong about this. In the case above I was posting a message to the window itself (I conflated the terms opener and parent), but what I should be doing is sending a message from a cross origin iframe to its parent. I tried doing it via devtools console (by going to page https://annygakh.github.io/bug1574017test/ and sending a message from the iframe to the parent, but that did not work. Ehsan suspects that it is possible that when we post a message via console, the code path might be different.

In Ehsan's case, the code was crashing here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/PostMessageEvent.cpp#146 because inside of NS_GetSanitizedURIStringFromURI we call aURI->GetSpec(...) when aURI is null.

When we called NS_GetSanitizedURIStringFromURI we passed callerDocumentURI to it, which was created here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/PostMessageEvent.cpp#68 via mCallerDocumentURI. mCallerDocumentURI is only set once, when we are creating PostMessageEvent for the first time. Let us now follow some paths that create a new PostMessageEvent.

One of them is here nsGlobalWindowOuter::PostMessageMozOuter -https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#6048. Earlier in function nsGlobalWindowOuter::PostMessageMozOuter, we pass callerDocumentURI to GatherPostMessageData to fill it out.

In nsGlobalWindowOuter::GatherPostMessageData, because of Fission, we don't have access to caller's inner window, and therefore we never end up taking code paths that would set aCallerDocumentURI and therefore it remains null.

Tomorrow I will try crafting the test case to trigger this and figuring out how to fix this.

Assignee: nobody → agakhokidze
Status: NEW → ASSIGNED
Flags: needinfo?(ehsan)

I talked about this with Anny in person. I couldn't remember how I reproduced this before but based on code inspection it appeared the bug is still present.

That being said I tried to create a minimized test case and that did not reproduce the bug, so it could be that I'm wrong about that and the bug doesn't happen any more... I don't see an error generated with Fission enabled but there is no crash either; so maybe the nature of the bug has changed a bit.

(In reply to Anny Gakhokidze [:annyG] from comment #6)

In nsGlobalWindowOuter::GatherPostMessageData, because of Fission, we don't have access to caller's inner window

After looking at the code more closely I'm not too sure if I was right about this BTW, it looks like we're just trying to obtain the subject principal's inner window so in theory that should work. The test case in the previous comment should be helpful for debugging what's going on...

If I can get the code to take the following path https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#5859-5863 when sending a post message, maybe this will replicate the bug.

The comment here https://searchfox.org/mozilla-central/rev/923eec8d2fb8078ebc7a33a9e1ce73eac01f7446/dom/base/nsGlobalWindowOuter.cpp#5859-5860 gave me an idea of creating a test case like so https://wandering-ziconium.glitch.me [1]. I was hoping that when we clicked on the post message button, we wouldn't be able to get an inner window because we have a sandbox, but it appears I am wrong and we can't send a post message in this case (I guess because the parent iframe is in the sandbox?).

[1]
main page, https://wandering-ziconium.glitch.me :

<html>
  <body>
    <iframe src="https://silken-skipjack.glitch.me/" sandbox>
    </iframe>
  </body>
</html>

https://silken-skipjack.glitch.me/:

<html>
  <body style="background-color:powderblue;">
    <iframe src="https://humane-cicada.glitch.me/">
    </iframe>
  </body>
</html>

https://humane-cicada.glitch.me/:

<html>
  <body style="background-color:green">
    <button id="postmessagebutton" type="button">
      Post message to a parent
    </button>
    <script>
      document
        .getElementById("postmessagebutton")
        .addEventListener("click", function() {
          console.log("are we parent?");
          console.log(parent === window);
          parent.postMessage("foo");
          console.log("posted a message to the parent");
        });
    </script>
  </body>
</html>

Aha, perhaps using a sandbox is the key to reproducing this, great idea! Sandbox attribute by default removes scripting capabilities, so you probably want sandbox="allow-scripts" instead.

However when I modified my test case as such it seemed like the sandboxed iframe got kicked into the same content process as its parent, so we were no longer an out-of-process frame. Perhaps this needs a couple of levels of nesting, for example foo.com > bar.com > sandboxed-iframe?

I'm not sure how several levels of nesting would help actually... If we are sending a postMessage from sandboxed-iframe to bar.com, it will end up being in the same content process as its parent (as you mentioned, and I too have observed), but if we are sending a message from bar.com to foo.com, why do we need to have a sandboxed-iframe?

Anywhoo, I tried this with the following setup:

On linux, I edited my /etc/hosts file to look like this [1] so that I can easily get pages 'served' from different domains.
[1]

0.0.0.0 www.a.com
0.0.0.0 www.b.com
0.0.0.0 www.c.com

Then, I created ./dir/a/index.html[2], ./dir/b/index.html[3] and ./dir/c/index.html[4] and served them all from different ports, such that they can be accessed via the following links http://www.a.com:8001, http://www.b.com:8002 and http://www.a.com:8003.

[2] a.html

<html>
  <body>
    <iframe src="http://www.b.com:8002">
    </iframe>
  </body>
</html>

[3] b.html

<html>
  <body style="background-color:powderblue;">
    <iframe src="http://www.c.com:8003/" sandbox="allow-scripts">
    </iframe>
  </body>
</html>

[4] c.html

<!DOCTYPE html>
<html>
  <body style="background-color:green">
    <button id="postmessagebutton" type="button">
      Post message to a parent
    </button>
    <script>
      document
        .getElementById("postmessagebutton")
        .addEventListener("click", function() {
          console.log("are we parent?");
          console.log(parent === window);
          parent.postMessage("foo");
          console.log("posted a message to the parent");
        });
    </script>
  </body>
</html>

When I went to www.a.com:8001/ and waited for the page to load and then posted a message from c.com frame to b.com frame, (un)fortunately the code did not take the desired codepath... Perhaps my iframes are setup incorrectly?

Kris suggested[5] that I try to write a web-extensions test, which I will pursue next..

[5] https://mozilla.logbot.info/domfission/20191213#c16815812-c16815988

(In reply to Anny Gakhokidze [:annyG] from comment #11)

I'm not sure how several levels of nesting would help actually... If we are sending a postMessage from sandboxed-iframe to bar.com, it will end up being in the same content process as its parent (as you mentioned, and I too have observed), but if we are sending a message from bar.com to foo.com, why do we need to have a sandboxed-iframe?

Hmm, yeah now that I think about it again, what I said before doesn't make sense since the sandboxed iframe would still be in-process. (I had the exact setup you replicated in mind, so your test also demonstrates that I wasn't thinking clearly before.) Sorry about that!

I have come up with a following test case to crash the content process

<!DOCTYPE HTML>
<html>
<head>
  <meta charset="utf-8">
  <title>Test for Bug 1574017</title>
  <script src="/tests/SimpleTest/SimpleTest.js"></script>
  <link rel="stylesheet" type="text/css" href="/tests/SimpleTest/test.css"/>
  <script type="application/javascript">
  add_task(async () => {
    var Cu = SpecialPowers.Cu;
    var testDone = {};
    testDone.promise = new Promise(resolve => { 
        testDone.resolve = resolve;
    });
    var princ = SpecialPowers.wrap(window.document).nodePrincipal;
    var sandbox = Cu.Sandbox(princ, { sameZoneAs: this});
    sandbox.win = window.frames.diffDomain;
    function receiveMessage(event) {
        testDone.resolve();
    }
    window.addEventListener("message", receiveMessage, false);

    Cu.evalInSandbox('win.postMessage("foo");', sandbox);
    dump("ANNY -- waiting for promise to resolve\n");
    await testDone.promise;
    dump("ANNY -- we finished waiting\n");
    ok(true, "great");
    Cu.nukeSandbox(sandbox);    
  });
  </script>
</head>
<body>
<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1574017">Mozilla Bug 1574017</a>
<p id="display"></p>
<div id="content" style="display: none">
</div>
<pre id="test">
</pre>
<iframe id="diffDomain" name="diffDomain" src="https://example.org/tests/dom/tests/mochitest/bugs/file_postmessage.html"></iframe>
</body>
</html>
<html>
<head>
<meta charset=UTF-8>
</head>
<body>
    <script>
    function receiveMessage(event) {
        top.postMessage("foo","*");
    }
    window.addEventListener("message", receiveMessage, false);
    </script>
</body>
</html>

In a non crashing case, the test would time out as posting a message to a cross origin iframe is not allowed, but the test is currently failing because of a content process crashing. I am thinking maybe it is enough that here https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/dom/base/PostMessageEvent.cpp#145-146 we assign providedOrigin to uriSpec if we don't have callerDocumentURI but I don't really know what caller document's URI is supposed to represent and how it differs from providedOrigin.

Hi Boris,

I hear you have been working a lot on error messages and trying to make them the most useful, so I have a question for you :)

In the above comment I have a test case, which causes a crash here https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.cpp#146 . I thought of using providedOrigin instead, but peterv suggested that when I create a PostMessageEvent here https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/dom/ipc/ContentChild.cpp#4107-4109 I should save the caller's principal as well, and in PostMessageEvent::Run if the caller's document URI is null, extract a URI from caller's principal and use that for the error message. However, this means that if we have a system principal, we won't have a URI and I would pass an empty string instead. Do you have a better idea for this or can I go ahead with this?

Flags: needinfo?(bzbarsky)

OK, so a few things:

  1. In nsGlobalWindowOuter::GatherPostMessageData if callerInnerWin is null then we will end up with no window id and no document URI, right? That's what I assume the sandbox case comment 13 describes triggers, and I would expect this to not depend on fission in any way.

  2. In the fission case, it looks like we are not sending the source windowid along. Why is that? Yes, it references a window in a different process, but if we want devtools to put the error in the right place we still need it, no? We just need to make sure to not use it for anything other than passing to error reporting. Have we checked how the error reporting here works if we're doing a cross-process postMessage with fission and the principals don't match? I expect this needs to be fixed independently of this crash bug.

  3. I don't see an obvious way that callerDocumentURI can be null when we have (or could have, in the fission case) a caller window id. Am I missing anything there? If not, then the only case we need to worry about here is the "caller is not a window" case. In that case, yeah, I guess the best we can do is get a URI from the caller principal. I suggest using GetScriptLocation on the BasePrincipal for that; it might work better than just getting the URI, including providing a useful string in the system-principal case...

Flags: needinfo?(bzbarsky)
  1. From my observations, when I run this test without Fission, the lack of an inner window does not lead to us having no window id. We don't have caller document URI, but we do have a window id and we end up in this case https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.cpp#141-143

  2. I assumed the lack of window ID in fission case was done by design after reading the comment here https://searchfox.org/mozilla-central/rev/b243debf6235b050b42fd2eb615fdc729636ca6b/dom/base/PostMessageEvent.h#47-55 . I will look into this and see if its possible to send source windowid along.

Have we checked how the error reporting here works if we're doing a cross-process postMessage with fission and the principals don't match?

Quickly modifying my test case to pass https://example.com as the second argument to postMessage yields in a crash due to the same reason. So this means it will be fixed as a result of this bug.

I don't see an obvious way that callerDocumentURI can be null when we have (or could have, in the fission case) a caller window id.

I think I addressed this above in 1).

Flags: needinfo?(bzbarsky)

From my observations, when I run this test without Fission, the lack of an inner window does not lead to us having no window id

Oh, I see. In that case the windowid is not Nothing, but is set to 0, so we take the "report with window id" codepath but just don't provide a useful one... Alright, then.

I assumed the lack of window ID in fission case was done by design

It was, but I'm not sure that was the right thing to do. Check with Peter, just in case?

Quickly modifying my test case to pass https://example.com as the second argument to postMessage

No, I meant a testcase where the sending code is in a window, so we do have a document URI but don't have a window id, and end up reporting via the "no window id" codepath. I expect that doesn't crash, but also doesn't report it to the right web console...

Flags: needinfo?(bzbarsky)
  1. Rename mCallerDocumentURI field of PostMessageEvent to mCallerURI as we
    might not always have a document, e.g. when we have a sandbox.
  2. When we neither have caller's inner window (and thus we have no caller's
    window ID) nor caller's URI, use script location as a source name for the error
    object when reporting errors to console.
  3. When the caller of postMessage has its window in a different process, pass along
    the caller's window ID so that the error can be reported correctly.

Adding crash signature from duplicate bug 1608746.

Raising bug priority to P1 because this is a reproducible Fission crash. STR: open https://onedrive.live.com/about/en-us/plans/ in multiple tabs with Fission enabled.

Crash Signature: [@ NS_GetSanitizedURIStringFromURI ]
Priority: P2 → P1
Pushed by agakhokidze@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/229b1e761a5f
Fix console error reporting for postMessage, r=peterv
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla74

Hello Chris,

Is QA needed to verify this issue? We have the required steps in Comment 20.

Flags: needinfo?(cpeterson)

(In reply to Vlad Lucaci, QA (:vlucaci) from comment #23)

Is QA needed to verify this issue? We have the required steps in Comment 20.

Fission is not enabled by default, so I don't think verifying this fix is a high priority. We have easy STR, so Anny was presumably able verify her fix adequately. She also add a new test for the fix.

Flags: needinfo?(cpeterson)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: