Closed Bug 1305380 Opened 8 years ago Closed 7 years ago

Crash in IPCError-browser | (msgtype=0x4E0001,name=PCookieService::Msg_GetCookieString) Processing error: message was deserializ

Categories

(Core :: Networking: Cookies, defect)

50 Branch
x86
All
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox50 --- wontfix
firefox51 --- affected
firefox52 --- unaffected
firefox53 --- unaffected

People

(Reporter: philipp, Assigned: amchung)

References

Details

(Keywords: crash, regression, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-0e31d7f8-7eca-4d88-83f5-786662160926.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
Ø 0 	ntdll.dll 	ntdll.dll@0x46bf4 	
1 	kernel32.dll 	WaitForSingleObjectExImplementation 	
2 	kernel32.dll 	WaitForSingleObject 	
3 	nss3.dll 	PR_WaitCondVar 	nsprpub/pr/src/threads/combined/prucv.c:525
4 	xul.dll 	mozilla::CondVar::Wait(unsigned int) 	obj-firefox/dist/include/mozilla/CondVar.h:79
5 	xul.dll 	mozilla::ipc::MessageChannel::WaitForSyncNotify(bool) 	ipc/glue/WindowsMessageLoop.cpp:1081
6 	xul.dll 	mozilla::ipc::MessageChannel::Send(IPC::Message*, IPC::Message*) 	ipc/glue/MessageChannel.cpp:1189
7 	xul.dll 	mozilla::net::PCookieServiceChild::SendGetCookieString(mozilla::ipc::URIParams const&, bool const&, bool const&, IPC::SerializedLoadContext const&, nsCString*) 	obj-firefox/ipc/ipdl/PCookieServiceChild.cpp:80
8 	xul.dll 	mozilla::net::CookieServiceChild::GetCookieStringInternal(nsIURI*, nsIChannel*, char**, bool) 	netwerk/cookie/CookieServiceChild.cpp:127
9 	xul.dll 	js::frontend::Parser<js::frontend::FullParseHandler>::primaryExpr(js::frontend::YieldHandling, js::frontend::TripledotHandling, js::frontend::Parser<js::frontend::FullParseHandler>::PossibleError*, js::frontend::TokenKind, js::frontend::Parser<js::frontend::FullParseHandler>::InvokedPrediction) 	js/src/frontend/Parser.cpp:9517
10 	xul.dll 	js::frontend::Parser<js::frontend::FullParseHandler>::memberExpr(js::frontend::YieldHandling, js::frontend::TripledotHandling, js::frontend::Parser<js::frontend::FullParseHandler>::PossibleError*, js::frontend::TokenKind, bool, js::frontend::Parser<js::frontend::FullParseHandler>::InvokedPrediction) 	js/src/frontend/Parser.cpp:8761

cross-platform crashes of the content process with this signature started in firefox 50 and subsequent builds. the signature is currently making up 0.33% of content crashes in 50.0b1.
Component: General → Networking: Cookies
Amy, can you please take a look?
Flags: needinfo?(amchung)
Whiteboard: [necko-active]
Ok, I will take a look at this issue.
Flags: needinfo?(amchung)
Assignee: nobody → amchung
See Also: → 1036928
See Also: 1036928 → 1306928
Amy, any news on this one? Any suspected regressing patch(es?) Thx!
Flags: needinfo?(amchung)
Hi Dragana,
I found that this issue is related to bug 1306928, because the cookie parent thread blocking is what causes this crash.
I created two tests to try to find out why the cookie parent thread blocks the response.

test 1:
Block the response from the parent side by stalling CookieServiceParent::RecvGetCookieString() for 30 seconds, printing a GMT timestamp every second until the timeout:

// Debugging hack inserted at the top of RecvGetCookieString():
// busy-loop for 30 seconds, logging a timestamp once per second,
// to simulate a blocked parent.
int64_t startTimeInUsec = PR_Now();
while (true) {
  PRTime currentTime = PR_Now();
  PRExplodedTime explodedTime;
  PR_ExplodeTime(currentTime, PR_GMTParameters, &explodedTime);
  char timeString[40];
  PR_FormatTimeUSEnglish(timeString, sizeof(timeString), "%c GMT",
                         &explodedTime);
  printf_stderr("AMY set time:%s \n", timeString);
  // sleep() is POSIX-only; PR_Sleep also works on Windows, where this crash occurs.
  PR_Sleep(PR_SecondsToInterval(1));
  if ((currentTime - startTimeInUsec) >= 30 * PR_USEC_PER_SEC) {
    break;
  }
}

test 2:
Print a timestamp at every step along the path
CookieServiceChild::SendGetCookieString() => CookieServiceParent::RecvGetCookieString().
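
Roughly like this (a sketch; the exact call sites and log tag are illustrative, not the code I actually pushed):

// Child side, just before the synchronous IPC send
// (e.g. in CookieServiceChild::GetCookieStringInternal()):
printf_stderr("AMY child send: %lld us\n",
              static_cast<long long>(PR_Now()));

// Parent side, on entry to CookieServiceParent::RecvGetCookieString():
printf_stderr("AMY parent recv: %lld us\n",
              static_cast<long long>(PR_Now()));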

But I don’t know what kind of test can reproduce this crash, and my try push failed on the WinXP build (it can’t find the hg share command).
Do you have any suggestions for this crash?
Flags: needinfo?(amchung) → needinfo?(dd.mozilla)
Hi Jet,
In my view, this crash is related to bug 1306928; we have to find out why the parent thread blocks the requests from the child process.
Flags: needinfo?(dd.mozilla)
The first crash with this signature was on 23 July 2016, on Aurora. There are no Nightly crashes. Amy, do you remember whether something changed in cookie handling on Aurora around this time, or on Nightly 50?

It looks like the message times out; did we change this timeout value?
We could look into what caused this regression and fix bug 1306928.

:billm, did we change something for sync messages that would cause this to crash if it takes too long? Can you take a look, please? Thanks.
Flags: needinfo?(wmccloskey)
This seems to be happening because RecvGetCookieString is returning false. The full text of the error message is "Processing error: message was deserialized, but the handler returned false (indicating failure)".

It looks like we end up here:
http://searchfox.org/mozilla-central/rev/e8dd21b1673c07307068303ab3e99360c78f4d12/netwerk/cookie/CookieServiceParent.cpp#109
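
That is, roughly this shape (a simplified sketch of the linked spot, not the exact tree code; most parameters are elided):

// When DeserializeURI() yields null, the handler returns false, and a
// sync message handler returning false is exactly what produces the
// "message was deserialized, but the handler returned false (indicating
// failure)" error that kills the content process.
bool
CookieServiceParent::RecvGetCookieString(const URIParams& aHost,
                                         /* other args elided */
                                         nsCString* aResult)
{
  nsCOMPtr<nsIURI> hostURI = DeserializeURI(aHost);
  if (!hostURI) {
    return false;  // the line at the searchfox link above
  }
  // ... normal cookie lookup fills *aResult ...
  return true;
}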
Flags: needinfo?(wmccloskey)
content crashes with this signature have seized with 50.0b6 and after aurora build 20161010004016
(In reply to [:philipp] from comment #9)
> content crashes with this signature have seized with 50.0b6 and after aurora
> build 20161010004016

So we're no longer hitting this crash?
Flags: needinfo?(madperson)
that "seized" should have been "ceased", sorry.

yes this crash signature is apparently gone now & 51.0a2 20161010004016 was the last build where we recorded the signature. this would be the pushlog from 50.0b5 to b6, where one of the patches apparently had the pleasant side-effect of solving this bug too: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_50_0b5_RELEASE&tochange=FIREFOX_50_0b6_RELEASE

not sure if we should just close this bug or if the necko team still wants to investigate what happened...
Flags: needinfo?(madperson)
The crash happened from 2016-07-23 to 2016-10-11, which is the same window as bug 1289001. It could be the same root cause.
See Also: → 1289001
Hi baku,

Could you please take a look at comment 7? It shows that this crash is due to the failure of |DeserializeURI|. From the push log in comment 11, it looks like the only thing that might be related is bug 1288997.

My guess is that something might cause |PrincipalToPrincipalInfo| to fail while serializing a URI [1]. The consequence would be that |PrincipalInfoToPrincipal| then fails in |nsHostObjectURI::Deserialize| [2], and the child process also gets killed.
However, after bug 1288997 landed, this crash is magically gone.

Does my theory sound reasonable to you? If so, do you have any idea how to improve this?

[1] http://searchfox.org/mozilla-central/rev/703b663355467293fad148ab7c2c5ee2b878e4d9/dom/base/nsHostObjectURI.cpp#113
[2] http://searchfox.org/mozilla-central/rev/703b663355467293fad148ab7c2c5ee2b878e4d9/dom/base/nsHostObjectURI.cpp#146
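
Spelled out as a sketch (simplified from [1] and [2]; not the exact tree code):

// [1] Child side: if PrincipalToPrincipalInfo() fails, the blob URI is
//     serialized without a usable principal.
// [2] Parent side: PrincipalInfoToPrincipal() then fails, Deserialize()
//     returns false, DeserializeURI() returns null, and
//     RecvGetCookieString() returns false, i.e. this bug's IPCError.
void nsHostObjectURI::Serialize(mozilla::ipc::URIParams& aParams)
{
  // ... serialize the inner URI ...
  mozilla::ipc::PrincipalInfo info;
  if (NS_FAILED(PrincipalToPrincipalInfo(mPrincipal, &info))) {
    return;  // [1] suspected failure point
  }
  // ... store |info| in aParams ...
}

bool nsHostObjectURI::Deserialize(const mozilla::ipc::URIParams& aParams)
{
  // ... deserialize the inner URI, then:
  nsresult rv;
  mPrincipal = PrincipalInfoToPrincipal(/* info from aParams */, &rv);
  return NS_SUCCEEDED(rv);  // [2] suspected failure point
}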
Flags: needinfo?(amarchesini)
I'm going to let baku respond here.
Flags: needinfo?(mrbkap)
(For those playing along at home, baku's going to be back on Monday Oct 24)
Before push https://hg.mozilla.org/releases/mozilla-beta/rev/70abfe990978, we had a bug about how blobURLs were propagated across processes. That patch disabled the propagation if the number of content processes was 1 or 0.

Then, with the following patches, I also fixed the real bug: the inputStream must be kept alive until the creation of the remote blob is completed in the parent process.

But none of my patches touched PrincipalToPrincipalInfo or nsHostObjectURI.
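
The lifetime pattern is roughly this (an illustrative sketch with hypothetical names, not the landed patch):

// Keep the input stream alive across the async round-trip: take a
// strong reference before sending, and drop it only after the parent
// confirms the remote blob exists.
class BlobCreationHelper final  // hypothetical name, for illustration
{
  nsCOMPtr<nsIInputStream> mStream;  // strong ref pins the stream

public:
  void Start(nsIInputStream* aStream)
  {
    mStream = aStream;  // the stream now outlives the caller's reference
    // ... send the blob-creation message to the parent process ...
  }

  void OnRemoteBlobCreated()  // called when the parent replies
  {
    mStream = nullptr;  // only now is it safe to release the stream
  }
};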
Flags: needinfo?(amarchesini)
In general I think it's nice to have this security check.
Attachment #8805075 - Flags: review?(mrbkap)
Comment on attachment 8805075 [details] [diff] [review]
hostProtocol.patch

Review of attachment 8805075 [details] [diff] [review]:
-----------------------------------------------------------------

I spoke with Andrea on IRC about this and I'm not a huge fan of this patch. Looking at the code, I don't see any reason that PrincipalToPrincipalInfo should fail and I'm worried that by working around that we'll be silently doing the wrong thing (possibly incorrect from a security perspective). If this is extremely severe, I guess I'd be OK with checking this in (or maybe using a null principal info or something hacky like that instead of void) as a stop-gap, but we should really understand why we're failing to serialize a principal.

Also, is this bug really Windows-only?
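
The null-principal stop-gap would look roughly like this (illustrative only; using nsNullPrincipal::Create() as the fallback is my assumption, and nothing like this was reviewed or landed):

// Hacky stop-gap: if the real principal cannot be serialized,
// substitute a fresh null principal instead of a void value, so
// deserialization on the receiving side still succeeds.
mozilla::ipc::PrincipalInfo info;
if (NS_FAILED(PrincipalToPrincipalInfo(mPrincipal, &info))) {
  nsCOMPtr<nsIPrincipal> fallback = nsNullPrincipal::Create();  // assumed API
  mozilla::Unused << PrincipalToPrincipalInfo(fallback, &info);
}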
Attachment #8805075 - Flags: review?(mrbkap)
baku, can you provide an update here?
Flags: needinfo?(amarchesini)
We reviewed this in platform triage today. It's a very low volume crash in 50, fix-optional.
The only hits on 52 in the last 6 months are from late September when it was still on m-c. Lots of hits on 50/51, but we're quickly running out of time to do anything about them. I'm inclined to INCOMPLETE this bug (or WFM) at this point.
Unless you have an idea of what might have fixed this on 52+ for a potential backport, baku?
Hi Baku,
I found that this crash doesn't occur on 52. In your view, should we close this bug and set it to WFM?

Thanks!
Yes, let's do it, and in case it comes back we can reopen it.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(amarchesini)
Resolution: --- → WORKSFORME
