Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1b1) Gecko/2006081703 BonEcho/2.0b1 Click on Page Setup. Switch to Landscape. Close Page Setup. Re-open Page Setup, see that it has reverted to Portrait. This could not be reproduced on Mac PPC and may be an Intel Mac only bug.
The following compiler warning on Intel is worrisome. I'm not sure exactly what "hidden" means in this case. Looking into it. /Users/josh/src/ff_debug/mozilla/widget/src/mac/../xpwidgets/nsPrintSettingsImpl.h:65: warning: 'virtual nsPrintSettings& nsPrintSettings::operator=(const nsPrintSettings&)' was hidden /Users/josh/src/ff_debug/mozilla/widget/src/mac/nsPrintSettingsX.h:63: warning: by 'nsPrintSettingsX& nsPrintSettingsX::operator=(const nsPrintSettingsX&)'
The compiler warning is telling us that the base class (nsPrintSettings) declares: nsPrintSettings& operator=(const nsPrintSettings& rhs); and the derived class (nsPrintSettingsX : public nsPrintSettings): nsPrintSettingsX& operator=(const nsPrintSettingsX& rhs); Declaring the latter (note the different signature) has the effect of hiding the former, but we don't care, because the only nsPrintSettings objects we'll ever see are nsPrintSettingsX objects. Anyway. Now I'm going to tell you what the bug was, how I found it, and how I fixed it. I'm going to do this because I have a cold, and writing a long explanation will take my mind off the fact that my head is about three stories above my shoulders. Sometimes, I was able to change the page setup settings. I found that I usually couldn't change orientation alone, but I'd have better luck if I changed the orientation along with something else like paper size or scale factor. Here's what I found in the console when I opened up a Page Setup dialog that reset itself back to the defaults (US Letter, portrait, 100%): 2006-08-21 16:55:50.095 firefox-bin CFLog (0): CFPropertyListCreateFromXMLData(): plist parse failed; the data is not proper UTF-8. The file name for this data could be: /Users/mmentovai/Library/KeyBindings/DefaultKeyBinding.dict The parser will retry as in 10.2, but the problem should be corrected in the plist. The top of stack looked like this: #0 0x9088b65a in CFLog () #1 0x90825109 in _CFPropertyListCreateFromXMLData () #2 0x90824d6f in CFPropertyListCreateFromXMLData () #3 0x91660a24 in PMXMLToTicket () #4 0x916814bd in OpaquePMPageFormat::PJCUnflattenPageFormat () #5 0x916813a2 in PMUnflattenPageFormat () #6 0x2a986d13 in nsPrintSettingsX::ReadPageFormatFromPrefs (this=0x2f9af850) at /lizard/1.8/mozilla/gfx/src/mac/nsPrintSettingsX.cpp:277 When this happened, PMUnflattenPageFormat would return -50 (paramErr). I noted that the base64 we were saving in WritePageFormatToPrefs was identical to the base64 we were reading back in ReadPageFormatFromPrefs. I figured that the base64 just encoded an XML plist (because of CFPropertyListCreateFromXMLData), and I always enjoy decoding base64, so I decoded it and found that it was indeed an XML plist, and that it was indeed well-formed. The differences between data that would be accepted and data that would cause PMUnflattenPageFormat to fail were insignificant. plutil and Property List Editor had no trouble reading any of the plists, and they both use CFPropertyListCreateFromXMLData. I also noted that if I took pageFormatHandle in WritePageFormatToPrefs and called PMUnflattenPageFormat on it, it would succeed. Because of all of this, it seemed most likely that we had base64-decoding problem. Here's where we do the base64 decode: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/widget/src/mac/nsPrintSettingsX.cpp&rev=1.14&mark=320-330#320 . Pay special attention to the PtrToHand call. A handle comprises a pointer and a field containing the size of the data block that the pointer references. In the PtrToHand call, we were setting the data size to 3/4 of the encoded base64 representation of the data block itself. The problem is that a proper base64 encoding, as performed by PL_Base64Encode, uses four encoded bytes for up to three bytes of input. "Up to" is what gets us into trouble. If you've just got one byte of data, you'll still get four bytes of base64-encoded data. If you do the data-length-is-3/4-base64-length computation as we did here, you'd assume that after decoding, you'd have three bytes of data. But you'd be wrong. You'd have one byte of data and two bytes of someting else: garbage. Now, in our case, we're working with XML data. What happens when you feed garbage to an XML parser? The good ones fail. We were giving up to two bytes of garbage to the XML parser after all of the well-formed XML we had stored. Of course, we're not always feeding garbage. Sometimes, the data length is exactly 3/4 the encoded length, and in that case, we're fine. Sometimes, the only extra data will be whitespace, and we'd be fine there too. That explains the intermittent nature of the problem. The reliance on random heap garbage could explain why nobody saw this on ppc, although ppc is in theory affected by this bug as well. (I didn't test ppc other than to verify that it's also using XML-encoded plists for flattened page setup data.) It's also possible that something else on ppc makes it more likely that the sizes would work out in favor of not showing this bug. Who knows? People don't print all that often anyway, and we've had this particular symptom (page setup settings not sticking on the Mac) at least three times over the past year for other causes. Maybe people just expect it to be broken. Unfortunately, PL_Base64Decode has no provisions to communicate the actual length of the decoded data back to the caller. I consider this an oversight, but we're not going to be changing any NSPR APIs today. We could add (or call out to) some other base64 decoder that returns the length, but I'm not going to do that either. With that in mind, the only real solution is to encode the actual data length along with the flattened page setup data. This patch implements that. Note that I've changed the name of the Page Setup pref to print.macosx.pagesetup-2. This is because the new length-encoded format is incompatible with the old format, and there's nothing else we can use to disambiguate them. Even though I've included very careful bounds-checking and it's exceedingly unlikely that passing the old format into the new code would do anything other than cause the page setup settings to revert to the defaults, I thought it wise to err on the side of caution because the flattened page setup data is opaque (although it happens to be an XML plist today, it may not have been in the past and may not be in the future), and because I wouldn't want to cause more problems for unpatched Mozillas attempting to read the updated pref. This has the effect of causing everyone's saved page setup settings to be lost once. I didn't think that this was important enough to account for in migration code. Migration code would be imperfect anyway, due to this bug. Now, astute hackers may point out that we also store Base64-encoded junk in the prefs for alias records pointing to files or directories. Do we need to fix those too? No. Alias records already include their length in the data payload, and we use that information to properly size their handles. Thanks for listening, Mark
Assignee: joshmoz → mark
Status: NEW → ASSIGNED
Attachment #234904 - Flags: review?(joshmoz)
Attachment #234904 - Flags: superreview?(mikepinkerton)
Comment on attachment 234904 [details] [diff] [review] Fix sr=pink great writeup, thanks a million. Is it worth putting more of this into the comments, or just leave it as-is?
Attachment #234904 - Flags: superreview?(mikepinkerton) → superreview+
Checked in on the trunk. I didn't add the writeup to code comments because it's more of a detective novel than anything else, I think the comments I put in the code are sufficient. Specifically, this sums it up: + // Save the handle in a struct that identifies the data length and + // the data itself, and wrap it all up in base64. The length must be + // included because PL_DecodeBase64 doesn't return the size of the + // decoded data, and the handle will need to be reconstructed later with + // the correct size.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment on attachment 234904 [details] [diff] [review] Fix This manifests itself as a ppc-x86 parity bug, although it probably shows up on ppc in certain cases too. If 1.8.1 weren't so close, I'd ask for it for 1.8.0.x.
Attachment #234904 - Flags: approval1.8.1?
Comment on attachment 234904 [details] [diff] [review] Fix a=schrep for UB/PPC Mac platform parity bug.
Attachment #234904 - Flags: approval1.8.1? → approval1.8.1+
Checked in on MOZILLA_1_8_BRANCH before 1.8.1rc1.
Target Milestone: --- → Firefox 2
You need to log in before you can comment on or make changes to this bug.