Closed Bug 51267 Opened 25 years ago Closed 25 years ago

Intermittent failure loading CSS from JARs

Categories

(Core :: Networking, defect, P1)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jrgmorrison, Assigned: sspitzer)

References

Details

(Whiteboard: [nsbeta3+] fix in hand)

Attachments

(4 files)

For a while now, when starting mozilla on Windows (at least) there has been this error message in the console for debug builds.

CSSLoaderImpl::DidLoadStyle: Load of URL 'chrome://communicator/skin/menu.css ' failed. Error code: 16389

I haven't seen this error in optimized builds, but there have been a number of bugs reported recently that basically boil down to a failure to load or resolve style on widgets and menus. There seems to be some failure in loading the style sheets from the JAR files. Here is the stack at the point of the error message, although the actual error may be on a different thread (I don't know). [build pulled ~9pm 09/03 win2k].

CSSLoaderImpl::DidLoadStyle(nsIStreamLoader * 0x02729660, nsString * 0x00000000, SheetLoadData * 0x02729518, unsigned int 2147500037) line 918
SheetLoadData::OnStreamComplete(SheetLoadData * const 0x02729518, nsIStreamLoader * 0x02729660, nsISupports * 0x00000000, unsigned int 2147500037, unsigned int 0, const char * 0x100a45e8 gCommonEmptyBuffer) line 644
nsStreamLoader::OnStopRequest(nsStreamLoader * const 0x02729664, nsIChannel * 0x0272ef38, nsISupports * 0x00000000, unsigned int 2147500037, const unsigned short * 0x100a45e8 gCommonEmptyBuffer) line 121 + 78 bytes
nsJARChannel::OnStopRequest(nsJARChannel * const 0x0272ef3c, nsIChannel * 0x0272fe40, nsISupports * 0x00000000, unsigned int 2147500037, const unsigned short * 0x100a45e8 gCommonEmptyBuffer) line 925 + 53 bytes
nsOnStopRequestEvent::HandleEvent(nsOnStopRequestEvent * const 0x026f22e0) line 302
nsStreamListenerEvent::HandlePLEvent(PLEvent * 0x026f2358) line 97 + 12 bytes
PL_HandleEvent(PLEvent * 0x026f2358) line 589 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b6fce0) line 526 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x003d04c8, unsigned int 49339, unsigned int 0, long 11992288) line 1059 + 9 bytes
USER32! 77e13eb0()
USER32! 77e1401a()
USER32! 77e192da()
nsAppShellService::Run(nsAppShellService * const 0x00bebf68) line 379
main1(int 1, char * * 0x00957a98, nsISupports * 0x00000000) line 958 + 32 bytes
main(int 1, char * * 0x00957a98) line 1139 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
KERNEL32! 77e87903()
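A note on the two numbers in this report: the console prints error code 16389, while the stack frames carry the status value 2147500037. The sketch below assumes the usual XPCOM nsresult layout (high bit set on failure, low 16 bits holding the code). Under that assumption both numbers are the same generic failure value, 0x80004005. The helper names are illustrative and are not the Mozilla macros.

```cpp
#include <cstdint>

// Status value copied from the stack frames in the report above.
constexpr std::uint32_t kStatusFromStack = 2147500037u;

// Assumption: bit 31 marks failure, the low 16 bits carry the error code.
// These helpers are hypothetical stand-ins, not real Mozilla APIs.
constexpr bool IsFailure(std::uint32_t rv) {
    return (rv & 0x80000000u) != 0;
}

constexpr std::uint32_t ErrorCode(std::uint32_t rv) {
    return rv & 0xFFFFu;
}

// 2147500037 is 0x80004005, and its low 16 bits are 0x4005 == 16389,
// which matches the "Error code: 16389" printed in the console.
static_assert(kStatusFromStack == 0x80004005u, "status is 0x80004005");
static_assert(ErrorCode(kStatusFromStack) == 16389u, "low 16 bits are 16389");
```

So the console message and the stack agree; the message only shows the code portion of the status.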
If this is the clue to bugs like bug 51225 and bug 51164, severity should be raised.
Of course, after I went home, it occurred to me that perhaps the error message was not bogus ... http://lxr.mozilla.org/seamonkey/search?string=communicator%2Fskin%2Fmenu.css shows the only places that this .css file is referenced, but when you track down the actual file, http://lxr.mozilla.org/seamonkey/source/themes/modern/communicator/menu.css, you find that it is a zero-length file (hence no style to load). Shunting this bug over to ben to see if that @import (and jar.mn line) should be removed (maybe, maybe not).
Assignee: warren → ben
updating summary from 'mozilla failing to load some stylesheets from the JAR files.'
Summary: mozilla failing to load some stylesheets from the JAR files. → remove(?) @import & jar.mn of communicator/skin/menu.css (blue,modern skins)
That particular one is bogus, but the bug is much worse. I intermittently have random stylesheet load failures, and I've seen this on practically every stylesheet we load (from the main global skin to others). The menu.css message is distinct and seems to be inaccurate. It should be the subject of a separate bug.
Assignee: ben → warren
Fixing summary.
Summary: remove(?) @import & jar.mn of communicator/skin/menu.css (blue,modern skins) → Intermittent failure loading CSS from JARs
Oops ... filed the specific menu.css issue as bug 51317. Although, looking at this again, it is curious that this is reported as an error when loading a zero-length CSS file from a JAR in a debug build. It is not an error in an optimized build (on windows), or from the file system in a debug build (linux).
*** Bug 51334 has been marked as a duplicate of this bug. ***
*** Bug 51335 has been marked as a duplicate of this bug. ***
Nominating for nsbeta3, this has been seen on Windows 98 as well.
Keywords: nsbeta3
See bug 51511 - it may be related or not, but it looks like this is a probable explanation. I will try to see if any errors pop up in the console when this occurs.
This is the entire console output on optimized build when symptoms described in bug 51511 manifest themselves:

stdout directed to dynamic console
stderr directed to dynamic console
WEBSHELL+ = 1
I am inside the initialize
Hey : You are in QFA Startup
(QFA)Talkback loaded Ok.
WEBSHELL+ = 2
WEBSHELL+ = 3
Setting content window
*** Pulling out the charset
in SetSecurityButton
has multiple monitor apis is 1
WEBSHELL+ = 4
Document http://www.mozilla.org/ loaded successfully
Attached an image. This was seen by hammerly in today's build. I talked to hyatt, and he said that this may also be a symptom of not reading in some CSS properly. Notice that the dropmarker for the forward, back, and print buttons is not appearing; an identical copy of the button image appears in its place instead. This may be because menubutton.css is not being read properly. This is crucial for using skins in mozilla. The end user should never see the screenshot I just posted. Affirming nsbeta3.
*** Bug 51925 has been marked as a duplicate of this bug. ***
adding myself to the cc list. I'm wondering if this is related to the intermittent failure to load the am-server.js file. (which has been seen on win32 and linux) see bug #51546
I've seen this too. It's fairly frequent.
I get a failure on one or another chrome css file every couple of times that I launch - usually menu.css, but sometimes several at once, like:

CSSLoaderImpl::DidLoadStyle: Load of URL 'chrome://global/skin/splitter.css' failed. Error code: 16389
CSSLoaderImpl::DidLoadStyle: Load of URL 'chrome://global/skin/tree.css' failed. Error code: 16389
CSSLoaderImpl::DidLoadStyle: Load of URL 'chrome://global/skin/radio.css' failed. Error code: 16389

on this launch. The browser looks like crap when this happens, and I think that this is really a pretty high priority bug.
Don't know how I'm going to get to this, but we need to fix it. nsbeta3+
Whiteboard: [nsbeta3+]
Should the priority of this bug be higher than a P3?
My $0.02 from a marketing perspective is that this is definitely a P1.
Ok, this may be redundant, but yes, this should be a higher priority. This intermittent problem makes Mozilla seem, well, a bit unprofessional. "Gee, it works most of the time. . . . "
for a pretty reproducible case of this, see bug #51546
*** Bug 50652 has been marked as a duplicate of this bug. ***
Warren, is there anything I can do to help get this narrowed down or fixed? I have a few spare cycles and can reproduce it pretty regularly...
Mark: You should try to coordinate with Seth who's working on 51546.
No longer blocks: 51546
mark / warren: it turns out 51546 is not related to this. but I'm more than willing to help debug this beast, too.
marking this P1, so it stays on the radar. (everyone seems to agree with p1 on this beast.) I'm doing some debugging on this today to see if I can make some headway.
Priority: P3 → P1
I'm getting somewhere, I think. it looks like we are processing multiple loads of the css file (example: menu.css) in parallel. we fail because the realsize (in the nsZipItem) is 0. I'm trying to figure out why the nsZipItem gets corrupted. I'll continue to debug. (all this code is new to me...)
I just noticed that there are *two* menu.css files getting packaged in the jar, one of length zero. it's in the top level jar.mn in modern. I'm going to take it out, repackage and see if that makes the problem go away. keep your fingers crossed!
that seems to fix it for me. I was able to reproduce this bug easily, and now I can't. I'm going to fix the jar.mn files and then remove the zero length menu.css from the tree. after I get a reviewer, of course.
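The duplicate-entry situation described above (two menu.css entries in one jar, one of them zero length) could be caught by a simple name count over the archive's entry list. This is only a sketch of such a check, not how the jar packaging or nsZipArchive works; the Entry struct, the function name, and the sizes used in the usage note are all hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified view of one archive entry.
struct Entry {
    std::string name;
    std::size_t realSize;  // uncompressed size in bytes
};

// Returns every entry name that appears more than once, in sorted order.
std::vector<std::string> FindDuplicateNames(const std::vector<Entry>& entries) {
    std::map<std::string, int> counts;
    for (const Entry& entry : entries) {
        ++counts[entry.name];
    }
    std::vector<std::string> duplicates;
    for (const auto& pair : counts) {
        if (pair.second > 1) {
            duplicates.push_back(pair.first);
        }
    }
    return duplicates;
}
```

Given a made-up list containing "skin/menu.css" twice (once with size 0) and "skin/tree.css" once, the function reports only "skin/menu.css".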
What about the other files that fail sometimes? See my comment on 2000-09-12 17:48. I'm really glad you fixed the menu.css one though, that was the most common.
It looks like john morrison figured out the problem 10 days ago. since other people have reported other css failures (tree.css, radio.css, etc.) I'll look into those problems before I mark this bug fixed.
*** Bug 52685 has been marked as a duplicate of this bug. ***
adding nbhatla to the cc list.
mark: I'm going to go try to reproduce the tree.css,radio.css,splitter.css bugs, and look for similar jar problems (that caused the menu.css bug.)
ok, after some painful debugging, here's what I've found: the random failures to load files from JARs are caused by "corruption" errors. we fail to verify the crc32. see approx line 1145 of nsZipArchive.cpp

nsZipArchive::InflateItem(const nsZipItem * 0x01a82b50, PRFileDesc * 0x00000000, char * 0x00d13020) line 1145
nsZipArchive::ReadInit(const char * 0x0270a500, nsZipRead * * 0x0270a564) line 410 + 18 bytes
nsJARInputStream::Init(nsJAR * 0x00d1c8c0, const char * 0x02709c20) line 110 + 29 bytes
nsJAR::GetInputStream(nsJAR * const 0x00d1c8c0, const char * 0x02709c20, nsIInputStream * * 0x0270ae2c) line 321 + 16 bytes
nsJARChannel::GetInputStream(nsJARChannel * const 0x02709ae8, nsIInputStream * * 0x0270ae2c) line 1035 + 37 bytes
nsFileTransport::Process() line 407 + 71 bytes
nsFileTransport::Run(nsFileTransport * const 0x0270add4) line 362
nsThreadPoolRunnable::Run(nsThreadPoolRunnable * const 0x026b2a40) line 689 + 12 bytes
nsThread::Main(void * 0x026b29f0) line 84 + 26 bytes
_PR_NativeRunThread(void * 0x026b2880) line 399 + 13 bytes
_threadstartex(void * 0x026b26d0) line 212 + 13 bytes
KERNEL32! 77f04ee8()

now I'm off to see why that is.
adding sgehani to the cc list. according to the blame logs, he added the crc checking code, so he might be able to help figure this one out.
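For readers unfamiliar with the check that is failing here: after inflating an entry, the reader computes a CRC-32 over the output and compares it to the value stored in the archive. The sketch below shows the general idea with a plain bitwise CRC-32 (the reflected 0xEDB88320 polynomial used by the ZIP format). It is not the nsZipArchive code, and MatchesStoredCrc is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 as used by the ZIP format (reflected polynomial 0xEDB88320).
constexpr std::uint32_t Crc32(const char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            // If the low bit is set, shift and XOR in the polynomial.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical helper: true when the inflated bytes match the CRC
// recorded for the entry, false when the data looks "corrupted".
constexpr bool MatchesStoredCrc(const char* inflated, std::size_t len,
                                std::uint32_t storedCrc) {
    return Crc32(inflated, len) == storedCrc;
}
```

If two threads scribble over shared inflate state, the output bytes change and a comparison like this fails, even though the archive on disk is intact.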
well, we think we've got it. nsZipArchive is not thread safe, and when the error occurs, we see that we have two threads calling into zlib to inflate an entry. samir and I are going to try to put a lock to prevent this, and see if it prevents the problem. and kill jar loading performance, I'm sure. samir's guess on why this doesn't show up on XPInstall is that XPInstall does all extraction on one thread.
*** Bug 52300 has been marked as a duplicate of this bug. ***
fix in hand. I'll attach the patch. I'm not seeing any failures anymore, and none of my failure breakpoints are getting hit. the downside: I've added a nsAutoLock to nsJAR which means performance goes down. correctness, then performance.
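The shape of this fix, serializing access to a reader that is not thread safe, can be sketched as follows. The actual patch uses nsAutoLock inside nsJAR; this sketch swaps in std::mutex and std::lock_guard from the C++ standard library, and both class names are made up for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Stand-in for a reader that is not thread safe: it reuses a single
// scratch buffer, so two concurrent callers would corrupt each other.
class UnsafeArchive {
public:
    std::string ReadEntry(const std::string& name) {
        mScratch.clear();
        mScratch += "data:";
        mScratch += name;
        return mScratch;
    }

private:
    std::string mScratch;
};

// Wrapper that lets only one thread at a time into the unsafe reader.
class LockedJar {
public:
    std::string LoadEntry(const std::string& name) {
        // Released automatically when 'guard' goes out of scope,
        // similar in spirit to an auto-lock helper.
        std::lock_guard<std::mutex> guard(mLock);
        return mArchive.ReadEntry(name);
    }

private:
    std::mutex mLock;
    UnsafeArchive mArchive;
};
```

The cost is the one noted above: loads from the same jar no longer run in parallel.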
whoops, a few extra debugging assertions in that patch. I'll re-attach an official patch. (with comments, too.)
r=dveditz for what you've got, looking around to see if ::LoadEntry() and ::GetInputStream() are really the only ones you need to lock. nsJAR::Open() has a small unrelated problem. If you try to open twice, mZip.OpenArchiveWithFileDesc() will return an error, and the open NSPR file will never be closed. Does the open need to be locked? The chances of double simultaneous use are small, but really it should. Ditto the Close().
Should nsJAR::Extract() be locked as well? Since you lock nsJAR::LoadEntry() and it calls nsJAR::GetInputStream() which you also lock, don't you get an NSPR error trying to lock the same lock twice?
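One common way to avoid taking a non-reentrant lock twice on the same thread is to have each public method take the lock once and then call a private helper that assumes the lock is already held. The sketch below illustrates that pattern with std::mutex. It is not the patch attached to this bug, and the class and helper names are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <string>

class Jar {
public:
    std::string GetInputStream(const std::string& name) {
        std::lock_guard<std::mutex> guard(mLock);
        return ReadLocked(name);
    }

    std::string LoadEntry(const std::string& name) {
        std::lock_guard<std::mutex> guard(mLock);
        // Calls the unlocked helper, not GetInputStream(): re-locking a
        // std::mutex on the same thread is undefined behavior.
        return ReadLocked(name);
    }

private:
    // Precondition: the caller already holds mLock.
    std::string ReadLocked(const std::string& name) {
        return "stream:" + name;
    }

    std::mutex mLock;
};
```

A recursive lock would also work, but keeping one lock acquisition per public entry point makes it easier to see which paths are protected.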
taking this bug from warren. I've gotten a review from warren, so I'll check this in when the tree opens.
Assignee: warren → sspitzer
accepting. just waiting for the tree to open...
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3+] → [nsbeta3+] fix in hand
dan, thanks for catching my mistake. I've got a new and improved patch, I'll attach it now.
marking m18.
Target Milestone: --- → M18
didn't get a chance to check this in tonight, I'll check it in tomorrow.
*** Bug 52754 has been marked as a duplicate of this bug. ***
r=dveditz when you get to it.
fixed. thanks to dveditz and samir for the help.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
changing QA contact to jrgm for verification
QA Contact: tever → jrgm
After several days launching builds (debug and opt) I have seen neither the error messages nor the failure to resolve style in chrome. So vrfy fixed.
Status: RESOLVED → VERIFIED
*** Bug 50519 has been marked as a duplicate of this bug. ***
Mass removing self from CC list.