1305380 - Crash in IPCError-browser | (msgtype=0x4E0001,name=PCookieService::Msg_GetCookieString) Processing error: message was deserializ

Reporter

Description

•

9 years ago

This bug was filed from the Socorro interface and is report bp-0e31d7f8-7eca-4d88-83f5-786662160926. ============================================================= Crashing Thread (0) Frame Module Signature Source Ø 0 ntdll.dll ntdll.dll@0x46bf4 1 kernel32.dll WaitForSingleObjectExImplementation 2 kernel32.dll WaitForSingleObject 3 nss3.dll PR_WaitCondVar nsprpub/pr/src/threads/combined/prucv.c:525 4 xul.dll mozilla::CondVar::Wait(unsigned int) obj-firefox/dist/include/mozilla/CondVar.h:79 5 xul.dll mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) ipc/glue/WindowsMessageLoop.cpp:1081 6 xul.dll mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) ipc/glue/MessageChannel.cpp:1189 7 xul.dll mozilla::net::PCookieServiceChild::SendGetCookieString(mozilla::ipc::URIParams const&, bool const&, bool const&, IPC::SerializedLoadContext const&, nsCString*) obj-firefox/ipc/ipdl/PCookieServiceChild.cpp:80 8 xul.dll mozilla::net::CookieServiceChild::GetCookieStringInternal(nsIURI*, nsIChannel*, char**, bool) netwerk/cookie/CookieServiceChild.cpp:127 9 xul.dll js::frontend::Parser<js::frontend::FullParseHandler>::primaryExpr(js::frontend::YieldHandling, js::frontend::TripledotHandling, js::frontend::Parser<js::frontend::FullParseHandler>::PossibleError*, js::frontend::TokenKind, js::frontend::Parser<js::frontend::FullParseHandler>::InvokedPrediction) js/src/frontend/Parser.cpp:9517 10 xul.dll js::frontend::Parser<js::frontend::FullParseHandler>::memberExpr(js::frontend::YieldHandling, js::frontend::TripledotHandling, js::frontend::Parser<js::frontend::FullParseHandler>::PossibleError*, js::frontend::TokenKind, bool, js::frontend::Parser<js::frontend::FullParseHandler>::InvokedPrediction) js/src/frontend/Parser.cpp:8761 cross-platform crashes of the content process with this signatures started in firefox 50 and subsequent builds. the signature is currently making up 0.33% of content crashes in 50.0b1.

Ciprian Muresan [:cmuresan], Formerly of Ecosystem QA - please needinfo @mcoman for things regarding the team

Updated

•

9 years ago

Component: General → Networking: Cookies

Dragana Damjanovic [:dragana]

Comment 1

•

9 years ago

Amy, can you please take a look?

Flags: needinfo?(amchung)

Whiteboard: [necko-active]

Amy Chung [:Amy]

Assignee

Comment 2

•

9 years ago

Ok, I will take a look at this issue.

Flags: needinfo?(amchung)

Amy Chung [:Amy]

Assignee

Updated

•

9 years ago

Assignee: nobody → amchung

Shian-Yow Wu [:swu]

Updated

•

9 years ago

Updated

•

9 years ago

Comment 3

•

9 years ago

Amy, any news on this one? Any suspected regressing patch(es?) Thx!

Flags: needinfo?(amchung)

Amy Chung [:Amy]

Assignee

Comment 4

•

9 years ago

Hi Dragana, I found this issue relates by 1306928, because the cookie parent thread blocks to cause this crash. I created two tests and try to find the reason why cookie parent thread blocked the feedback. test 1: Block the feedback from parent thread. CookieServiceParent::RecvGetCookieString() int64_t startTimeInUsec = PR_Now(); while(1) { PRTime CurrentTime = PR_Now(); PRExplodedTime explodedTime; PR_ExplodeTime(CurrentTime , PR_GMTParameters, &explodedTime); char timeStringPreset[40]; PR_FormatTimeUSEnglish(timeStringPreset, 40, "%c GMT", &explodedTime); printf_stderr("AMY set time:%s \n",timeStringPreset); sleep(1); if ((CurrentTime - startTimeInUsec) >= 30*PR_USEC_PER_SEC) { break; } } And print time stamp every sec until time out. test 2: print time stamp on every steps. CookieServiceChild::SendGetCookieString()=>CookieServiceParent::RecvGetCookieString() But I don’t know what kind of test can reproduce this crash, and it failed in try server on winXP building test(can’t find hg share command). Do you have any suggestions for this crash?

Flags: needinfo?(amchung) → needinfo?(dd.mozilla)

Amy Chung [:Amy]

Assignee

Comment 5

•

9 years ago

Hi Jet, In my view, this crash relates by 1306928, it have to found the reason why the parent thread would be to block the requests from child thread.

Flags: needinfo?(dd.mozilla)

Dragana Damjanovic [:dragana]

Comment 6

•

9 years ago

The first crash with this signature was on 23. July 2016 on aurora. There is no Nightly crashes. Amy, do you remember if something has changed in cookie handling on aurora around this time or on nightly 50. Look like message times out, did we change this timeout value? We could look what made this regression and fix bug 1306928. :billm, did we change something for sync messages that would cause this to crash if it takes too long? Can you take a look, please! Thanks.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 7

•

9 years ago

This seems to be happening because RecvGetCookieString is returning false. The full text of the error message is "Processing error: message was deserialized, but the handler returned false (indicating failure)". It looks like we end up here: http://searchfox.org/mozilla-central/rev/e8dd21b1673c07307068303ab3e99360c78f4d12/netwerk/cookie/CookieServiceParent.cpp#109

Flags: needinfo?(wmccloskey)

Andrew Overholt [:overholt]

Comment 8

•

9 years ago

http://searchfox.org/mozilla-central/source/ipc/glue/URIUtils.cpp#122 could lead to http://searchfox.org/mozilla-central/rev/e8dd21b1673c07307068303ab3e99360c78f4d12/netwerk/cookie/CookieServiceParent.cpp#109 and was touched in bug 1117337. Maybe Blake has an idea?

Flags: needinfo?(mrbkap)

[:philipp]

Reporter

Comment 9

•

9 years ago

content crashes with this signature have seized with 50.0b6 and after aurora build 20161010004016

Andrew Overholt [:overholt]

Comment 10

•

9 years ago

(In reply to [:philipp] from comment #9) > content crashes with this signature have seized with 50.0b6 and after aurora > build 20161010004016 So we're no longer hitting this crash?

Flags: needinfo?(madperson)

[:philipp]

Reporter

Comment 11

•

9 years ago

that "seized" should have been "ceased", sorry. yes this crash signature is apparently gone now & 51.0a2 20161010004016 was the last build where we recorded the signature. this would be the pushlog from 50.0b5 to b6, where one of the patches apparently had the pleasant side-effect of solving this bug too: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_50_0b5_RELEASE&tochange=FIREFOX_50_0b6_RELEASE not sure if we should just close this bug or if the necko team still wants to investigate what happened...

Flags: needinfo?(madperson)

Shian-Yow Wu [:swu]

Comment 12

•

9 years ago

The crash happened from 2016-07-23 to 2016-10-11, which is same as bug 1289001. Could be same root cause.

Comment 13

•

9 years ago

Hi baku, Could you please take a look at comment 7? It shows that this crash is due to the failure of |DeserializeURI|. From the push log in comment 11, it looks like the only thing that might be related is bug 1288997. My guess is that something might cause |PrincipalToPrincipalInfo| to be failed during the process of serializing a URI [1]. So, the consequence will be that |PrincipalInfoToPrincipal| fail in |nsHostObjectURI::Deserialize| [2], and then the child process will be also killed. However, after the landing of bug 1288997, this crash is magically gone. Does my theory sound reasonable to you? If yes, do you have any idea about how to improve this? [1] http://searchfox.org/mozilla-central/rev/703b663355467293fad148ab7c2c5ee2b878e4d9/dom/base/nsHostObjectURI.cpp#113 [2] http://searchfox.org/mozilla-central/rev/703b663355467293fad148ab7c2c5ee2b878e4d9/dom/base/nsHostObjectURI.cpp#146

Flags: needinfo?(amarchesini)

Blake Kaplan (:mrbkap) (inactive)

Comment 14

•

9 years ago

I'm going to let baku respond here.

Flags: needinfo?(mrbkap)

Andrew Overholt [:overholt]

Comment 15

•

9 years ago

(For those playing along at home, baku's going to be back on Monday Oct 24)

Andrea Marchesini [:baku]

Comment 16

•

9 years ago

Before push https://hg.mozilla.org/releases/mozilla-beta/rev/70abfe990978, we had a bug about how blobURLs were propagated across processes. That patch disabled the propagation if the number of content processes was 1 or 0. Then, with the following patches I also fixed the real bug: the inputStream must be kept alive until the creation of the remote blob is completed in the parent processes. But all my patches didn't touch PrincipalToPrincipalInfo nor nsHostObjectURI.

Flags: needinfo?(amarchesini)

Andrea Marchesini [:baku]

Comment 17

•

9 years ago

Attached patch hostProtocol.patch — Details — Splinter Review

In general I think it's nice to have this security check.

Attachment #8805075 - Flags: review?(mrbkap)

Blake Kaplan (:mrbkap) (inactive)

Comment 18

•

9 years ago

Comment on attachment 8805075 [details] [diff] [review] hostProtocol.patch Review of attachment 8805075 [details] [diff] [review]: ----------------------------------------------------------------- I spoke with Andrea on IRC about this and I'm not a huge fan of this patch. Looking at the code, I don't see any reason that PrincipalToPrincipalInfo should fail and I'm worried that by working around that we'll be silently doing the wrong thing (possibly incorrect from a security perspective). If this is extremely severe, I guess I'd be OK with checking this in (or maybe using a null principal info or something hacky like that instead of void) as a stop-gap, but we should really understand why we're failing to serialize a principal. Also, is this bug really Windows-only?

Attachment #8805075 - Flags: review?(mrbkap)

Andrew Overholt [:overholt]

Comment 19

•

9 years ago

baku, can you provide an update here?

Flags: needinfo?(amarchesini)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 20

•

9 years ago

We reviewed this in platform triage today. It's a very low volume crash in 50, fix-optional.

status-firefox50: affected → fix-optional

Ryan VanderMeulen [:RyanVM]

Comment 21

•

9 years ago

The only hits on 52 in the last 6 months are from late September when it was still on m-c. Lots of hits on 50/51, but we're quickly running out of time to do anything about them. I'm inclined to INCOMPLETE this bug (or WFM) at this point.

status-firefox50: fix-optional → wontfix

status-firefox52: affected → unaffected

status-firefox53: --- → unaffected

Ryan VanderMeulen [:RyanVM]

Comment 22

•

9 years ago

Unless you have an idea of what might have fixed this on 52+ for a potential backport, baku?

Amy Chung [:Amy]

Assignee

Comment 23

•

9 years ago

Hi Baku, I found this crash which doesn't occur on 52, in your view, would we close this bug and set to WFM? Thanks!

Andrea Marchesini [:baku]

Comment 24

•

9 years ago

Yes, let's do it. and in case we can reopen it.

Status: NEW → RESOLVED

Closed: 9 years ago

Flags: needinfo?(amarchesini)

Resolution: --- → WORKSFORME