Closed Bug 881996 Opened 13 years ago Closed 12 years ago

fatal assert in nsPrincipal::GetAppId()

Categories

(Core :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: Dolske, Assigned: bzbarsky)

References

Details

Attachments

(3 files, 1 obsolete file)

My patch in bug 839961 is a rather mundane refactoring of some login manager code. Most of it just moving some code from an xpcom component to a JSM, and adding a couple of event listeners in a frame script. Unfortunately it is reliably causing a bizarre assert in test_printpreview_bug396024.xul, part of mochitest-chrome / M(oth), on Windows and Ubuntu debug builds. Oddly OS X seems to work fine (no print preview there, I guess?). https://tbpl.mozilla.org/?tree=Try&rev=70157d046e35 From the log: INFO - Assertion failure: false, at ../../../caps/src/nsPrincipal.cpp:531 TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/layout/base/tests/chrome/test_printpreview_bug396024.xul | Exited with code 11 during test run Here's the stack (edited for readability): - Thread 0 0 libxul.so!nsPrincipal::GetAppId 1 libxul.so!nsPermissionManager::CommonTestPermission 2 libxul.so!nsComponentManagerImpl::GetServiceByContractID 3 libplds4.so!PL_HashTableLookup [plhash.c:70157d046e35 : 333 + 0xc] 4 libxul.so!nsContentUtils::IsSystemPrincipal 5 libxul.so!nsPermissionManager::TestPermissionFromPrincipal 6 libxul.so!TestSitePerm [nsContentUtils.cpp:70157d046e35 : 2895 + 0x18] 7 libxul.so!nsPrincipal::GetDomain 8 libxul.so!nsContentUtils::AllowXULXBLForPrincipal 9 libxul.so!XPCWrappedNativeScope::XPCWrappedNativeScope 10 libxul.so!xpc::CreateGlobalObject 11 libxul.so!mozilla::BlockingResourceBase::PrintCycle 12 libxul.so!XPCAutoLock::~XPCAutoLock 13 libxul.so!XPCNativeScriptableInfo::Construct 14 libxul.so!XPCWrappedNative::WrapNewGlobal 15 libxul.so!nsLogAddRefCamera [CameraCommon.h:70157d046e35 : 110 + 0xe] 16 libxul.so!nsLogReleaseCamera [CameraCommon.h:70157d046e35 : 115 + 0xe] 17 libxul.so!nsDOMClassInfo::Release 18 libxul.so!xpcObjectHelper::GetScriptableFlags 19 libxul.so!nsXPConnect::InitClassesWithNewWrappedGlobal 20 libxul.so!pref_HashTableLookup 21 libxul.so!nsGlobalWindow::SetNewDocument 22 libxul.so + 0x1021cb1 23 libxul.so!nsCOMPtr<nsINode>::operator= 24 libxul.so!nsDocumentViewer::SetDocumentInternal 25 libxul.so!NS_TableDrivenQI 26 libxul.so!nsHTMLDocument::QueryInterface 27 libxul.so!nsDocumentViewer::SetDOMDocument 28 libxul.so!nsPrintObject::Init 29 libxul.so!GetBloatEntry [nsTraceRefcntImpl.cpp:70157d046e35 : 429 + 0x7] 30 libxul.so!nsPrintEngine::DoCommonPrint 31 libxul.so!NS_LogAddRef [nsTraceRefcntImpl.cpp:70157d046e35 : 251 + 0x7] 32 libnspr4.so!PR_Unlock [ptsynch.c:70157d046e35 : 205 + 0x7] 33 libxul.so!nsPrintEngine::CommonPrint 34 libxul.so!nsPrintEngine::PrintPreview 35 libxul.so!nsDocumentViewer::PrintPreview 36 libxul.so!NS_InvokeByIndex [xptcinvoke_x86_64_unix.cpp:70157d046e35 : 164 + 0x40] 37 libxul.so!CallMethodHelper::InitializeDispatchParams 38 libxul.so!XPCJSRuntime::SetPendingException 39 libxul.so!CallMethodHelper::Call 40 libnspr4.so!PR_Unlock [ptsynch.c:70157d046e35 : 205 + 0x7] 41 libxul.so!xptiInterfaceEntry::GetMethodInfo 42 libxul.so!XPCWrappedNative::CallMethod 43 libxul.so + 0x11ce955 44 libxul.so!NS_LogAddRef [nsTraceRefcntImpl.cpp:70157d046e35 : 251 + 0x7] 45 libnspr4.so!PR_Unlock [ptsynch.c:70157d046e35 : 205 + 0x7] 46 libxul.so!IS_WN_WRAPPER_OBJECT [xpcpublic.h:70157d046e35 : 110 + 0x6] 47 libxul.so!XPCCallContext::XPCCallContext 48 libxul.so!js::GCMarker::markDelayedChildren 49 libxul.so!XPC_WN_GetterSetter 50 libxul.so!XPC_WN_CallMethod
Hmm, camera stuff around frame 16? Do we really muck around that in the middle of print preview? Or is some of this stack lies?
Where is the principal that's involved here created?
A little more info: This is easily reproducible on my Windows box. Apply the main patch from bug 839961, and run |mach mochitest-chrome layout/base/tests/chrome/test_printpreview_bug396024.xul|. The stack above does have some lies in it. MSVC gives me http://pastebin.mozilla.org/2529483, which is basically a setTimeout running some JS, which calls into nsDocumentViewer::PrintPreview(). *fatal assert* nsPrincipal::GetAppId nsPermissionManager::CommonTestPermission nsPermissionManager::TestPermissionFromPrincipal TestSitePerm nsContentUtils::IsSitePermAllow nsContentUtils::AllowXULXBLForPrincipal TreatAsRemoteXUL CreateNativeGlobalForInner nsGlobalWindow::SetNewDocument nsDocumentViewer::SetDocumentInternal nsDocumentViewer::SetDOMDocument nsPrintObject::Init nsPrintEngine::DoCommonPrint nsPrintEngine::CommonPrint nsPrintEngine::PrintPreview nsDocumentViewer::PrintPreview ...JS... ...Timer... DumpJSStack() seems to be broken for me, but the URI being passed in to CreateNativeGlobalForInner() 8 frames from the top is "chrome://mochitests/content/chrome/layout/base/tests/chrome/printpreview_bug396024_helper.xul". Some dump debugging pins this as happening at the end of http://mxr.mozilla.org/mozilla-central/source/layout/base/tests/chrome/printpreview_bug396024_helper.xul; we're in run5() calling printpreview() calling gWbp.printPreview().
Attached file principal creation
Ok, I set a breakpoint in nsPrincipal::Init(), and then clicked to start the test. There were a handful (< 10) initialized with aAppId == 0, and then one created with 4294967295 (-1). Continuing on from that then hit the fatal assert. Attached is the stack from that principal's creation. Does this help? It all means nothing to me. :-)
I'm utterly baffled at what my patch from bug 839961 is doing to break this... If I revert the changes to browser/base/content/content.js (a frame script), the test still asserts and dies. If I then remove the single LoginManangerContent call in nsLoginManager.js, it still asserts. If I then remove nsLoginManager's lazyModuleGetter for this new JSM, and also comment out observer setup that the module does on first load, it... wait for it... still asserts. At this point the only change from stock m-c is that I've deleted a bunch of code from nsLoginManager, and have a module which nothing calls into, should never be instantiated, and even if it did doesn't do anything but create a couple objects with methods on them. I guess the deleted code from login manager was causing some subtle side effect that other code has come to incorrectly rely upon?
(Oh, and I also reverted fully back to m-c and ran the test again -- works flawlessly.)
Yeah, so. From a stock mozilla-central... 1) run test for sanity --> test ok 2) comment out the addProgressListener() call from the init() in nsLoginManager.js --> test asserts 3) restore addProgressListener() call, but put an immediate return in the listener's onStatusChange --> test ok So it would appear something is relying on some side effect of having a web progress listener present, even if it doesn't actually do anything. [Background/context: login manager does 3 basic things: provide an API for login manipulation, watch for form submissions to save passwords, and watch for page loads to autofill logins. Only the last of these is relevant here. It uses a web progress listener to watch for documents starting to load, upon which it adds a DOMContentLoaded listener, and once that fires for a page we try to autofill logins. One of the changes in bug 839961 is to remove the web progress listener, and just add the DOMContentLoaded listener straight from a frame script.]
Sounds vaguely related to bug 878886.
Oh, re-reading the stack in comment 0, it sounds like the same issue actually. The presence of a WPL probably affects the behavior of nsScriptSecurityManager::GetChannelPrincipal ?
That bug and the stack in attachment 763191 [details] both go through nsScriptSecurityManager::GetChannelPrincipal(), so I guess it could be the same thing.
I'm landing the fix to bug 839961 with a gross workaround (attachment 763783 [details] [diff] [review]) to avoid hitting this bug. Would still like to fix this to remove the gross hack, so login manager code doesn't have to be called for every document load.
Basically, when creating the channel you need to provide an nsILoadContext (which is retrived through the loadgroup) which returns NO_APP_IP from the appid getter.
In the attached stack trace, it seems that the issue occurs during the cloning made as part of creating the print preview document. Any idea about why cloning the document is sometimes triggering the same issue as bug 878886? It seems related to timing or some relationship between the various objects being set lazily.
Attached patch patch? (obsolete) — Splinter Review
I think I narrowed this down... The document in question has no docshell because its an empty "clone" doc created by nsHTMLDocument::Clone. Copying over the loadgroup state from the document being cloned fixes this locally for me. I pushed that to try to confirm: https://tbpl.mozilla.org/?tree=Try&rev=5269ec1c3a64 Not sure what other implications this might have, or whether a more specific fix would be better.
Attachment #8418477 - Flags: feedback?(jonas)
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #14) > I pushed that to try to confirm: > https://tbpl.mozilla.org/?tree=Try&rev=5269ec1c3a64 No luck :( Will need to try again tomorrow.
OK, I think I figured this out. Reproducing locally was not 100% reliable which made things annoying. The test is causing us to clone a document for print preview. Cloning ends up calling Reset(), which calls GetChannelPrincipal to get a principal from the document's current channel, and eventually ends up setting it as the document's new principal (not sure what the point of this is...). GetChannelPrincipal tries to get the relevant docShell by querying the channel's notificationCallbacks, but the channel (which is an nsBaseChannel for the document's initial about:blank load) has no notificationCallbacks anymore, because its mCallbacks is cleared in nsBaseChannel's OnStopRequest (http://hg.mozilla.org/mozilla-central/annotate/8bd82defae82/netwerk/base/src/nsBaseChannel.cpp#l741). So we fall into the GetCodebasePrincipalInternal(uri, UNKNOWN_APP_ID, false) case (http://hg.mozilla.org/mozilla-central/annotate/8bd82defae82/caps/src/nsScriptSecurityManager.cpp#l309), which means the document's principal ends up being set to an UNKNOWN_APP_ID principal, and so then comment 3 happens. I think the only reason this isn't a perma-orange is because things work if the about:blank load hasn't hit OnStopRequest before we try cloning. The WPL hack from bug839961 happens to adjust timing to ensure that usually happens, apparently. Bug 910521 suggests there are other cases where it doesn't happen. This seems to be a print-preview- (or document-cloning-) specific issue, but not sure how to fix it. Should Reset() do something different? As a hackaround in the interim, we can probably adjust this test to avoid cloning a document whose about:blank load has finished. Or we could just fix bug 878886 and remove this assertion, because I don't think it's getting us anything useful (we've created plenty of valid scenarios where this happens in practice).
Attachment #8418477 - Attachment is obsolete: true
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #16) > Or we could just fix bug 878886 and remove this assertion, because I don't > think it's getting us anything useful (we've created plenty of valid > scenarios where this happens in practice). Removing the assertion and failing the permissions check as mcmanus suggests in bug 878886 comment 2 seems reasonable to me. Sicking, what do you think?
Flags: needinfo?(jonas)
Cloning should use the principal of the document being cloned, I'd think. And in fact, nsDocument::CloneDocHelper does the right SetPrincipal. But then we go and call Reset()... why are we doing that? We've already set the base URI and principal; the only other things it does are setting mChannel (which we could do directly) and calling ResetToURI. Olli, can we nix that Reset call?
Flags: needinfo?(bugs)
There used to be some reason for the Reset, perhaps it was just mChannel. But just calling ResetToURI and setting mChannel manually might be ok.
Flags: needinfo?(bugs)
Justin, does this fix the issue you were running into?
Attachment #8424095 - Flags: feedback?(dolske)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
(In reply to Boris Zbarsky [:bz] from comment #20) > Created attachment 8424095 [details] [diff] [review] > Make cloning static documents not reset the principal on the clone after > setting it to the right thing Thanks bz! I pushed a removal of the workaround to try, if this goes green we're good: https://tbpl.mozilla.org/?tree=Try&rev=8c2ab86c309b I also pushed without your fix just to confirm that it's still orange: https://tbpl.mozilla.org/?tree=Try&rev=55507c4b01bc
Flags: needinfo?(jonas)
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #21) > I pushed a removal of the workaround to try, if this goes green we're good: > https://tbpl.mozilla.org/?tree=Try&rev=8c2ab86c309b > > I also pushed without your fix just to confirm that it's still orange: > https://tbpl.mozilla.org/?tree=Try&rev=55507c4b01bc We have a winner!
Attachment #8424095 - Flags: feedback?(dolske) → feedback+
Comment on attachment 8424095 [details] [diff] [review] Make cloning static documents not reset the principal on the clone after setting it to the right thing Awesome, thanks! What's the workaround we should remove?
Attachment #8424095 - Flags: review?(bugs)
Flags: needinfo?(gavin.sharp)
The workaround removal is https://hg.mozilla.org/try/rev/55507c4b01bc (attached here).
Flags: needinfo?(gavin.sharp)
Comment on attachment 8424095 [details] [diff] [review] Make cloning static documents not reset the principal on the clone after setting it to the right thing Hmm, should we use NS_GetFinalChannelURI, not GetURI, for consistency.
Attachment #8424095 - Flags: review?(bugs) → review+
> Hmm, should we use NS_GetFinalChannelURI, not GetURI, for consistency. Yes! Good catch!
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Depends on: 1072991
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: