Closed Bug 463806 Opened 16 years ago Closed 16 years ago

[PATCH][@font-face] Downloaded font activation on Mac may fail due to ATS cache corruption

Categories

(Core :: Graphics, defect, P2)

PowerPC
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1b3

People

(Reporter: jfkthame, Assigned: jfkthame)

Details

(Keywords: fixed1.9.1)

Attachments

(1 file, 4 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_5; en-us) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.20.1
Build Identifier: 

This is related to bug 458861, which contains additional analysis of the underlying ATS cache problem. The fix for that bug aims to check fonts for validity before trying to activate them, in order to avoid damaging the cache. However, if the cache is already corrupted due to other circumstances (e.g., the user has installed a bad font, or another program on the system has tried to activate a font using invalid data), Mozilla may still fail to correctly load and activate a perfectly good font. In testing on OS X 10.5.5, I typically see 3 failures out of the 50 font-load operations in the downloadable-font-loadtest.html reftest.

See bug 452870#18 and following comments for further info on this.

The attached patch (revised from attachment 346907 [details] [diff] [review]) makes the font system more robust by allowing it to retry the activation of a downloaded font, in the event that the ATS generation value has changed during the first attempt; this appears to be a key indicator associated with the failures.



Reproducible: Sometimes

Steps to Reproduce:
See bug 458861 for a technique that can corrupt the ATS font cache, leading to intermittent failures. Note that since that bug-fix, it is harder to cause such corruption directly through Mozilla; however, if the cache is already damaged (perhaps due to flaws in other software or installed fonts), then font activation may still fail as shown by the downloadable-font-loadtest.html example.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: joshmoz → jdaggett
> // ATSFontGetPostScriptName() can fail if the font DB generation changed
> // somewhere during the initialization process, giving us an invalid
> // font entry; if that happened, retrying the activation should work
> if (ATSGetGeneration() != mATSGeneration) {
> 
> }
> // otherwise, we don't know what went wrong, so just exit the retry loop
> else
> 	break;

So the logic here is that if the generation value is not incremented, that indicates some internal ATS font DB failure?  This is essentially synonymous with ATSFontGetPostScriptName failing?

Seems like the retry loop could be tighter, can't the generation check/call to ATSFontGetPostScriptName can be done right after ATSFontFindFromContainer is called, before creating a MacOSUserFontData object?  Something like:

  while (retryCount++ < kMaxRetries) {
    ATSFontActivateFromMemory
    ATSFontFindFromContainer
    // check for font DB error
    // break if no error, cleanup otherwise
  }
  
  MacOSUserFontData *userFontData = new MacOSUserFontData(containerRef);
  
Also, the patch looks like it's a diff on top of another patch.  Using ATSFontRef/FMGetFontFromATSFontRef is fine here so the diff can be made without being dependent on another patch.
Assignee: jdaggett → nobody
Severity: normal → major
Component: GFX: Mac → GFX: Thebes
Flags: blocking1.9.1?
Priority: -- → P2
QA Contact: mac → thebes
Target Milestone: --- → mozilla1.9.1b3
Version: unspecified → Trunk
Summary: Downloaded font activation on Mac may fail due to ATS cache corruption → [@font-face] Downloaded font activation on Mac may fail due to ATS cache corruption
Assignee: nobody → jfkthame
The logic is actually that the generation value *only* appears to change after ATSFontGetPostScriptName fails. It does not get updated on a normal, successful activation. So if we get to the end of the retry loop (which means we didn't get a valid font), and yet the generation count hasn't changed, we don't bother retrying as it's not this issue - although if we ignored the generation, and allowed a couple of retries in all cases, it wouldn't really do any harm.

I have updated the patch and ensured it does not depend on other patches.

I tried using a "tight" retry, as suggested, but unfortunately this does not resolve the problem; it seems that we have to try and access the font in some way (such as ATSFontGetPostScriptName, called from the MacOSFontEntry constructor) before the generation counter may get bumped, and we can detect the failure.

So before we can tell whether we need to retry, we'd have to at least go ahead and call ATSFontFindFromContainer and ATSFontGetPostScriptName; and if we're going to do that, we might as well complete our normal process and check mIsValid to see if it works. Writing specific code to check if the activation was successful by calling these, yet without constructing our fontentry objects, seems like needless duplication.
Attachment #347064 - Attachment is obsolete: true
Attachment #351118 - Flags: review?(jdaggett)
Attachment #347064 - Flags: review?(jdaggett)
(In reply to comment #3)

> I tried using a "tight" retry, as suggested, but unfortunately this does not
> resolve the problem; it seems that we have to try and access the font in some
> way (such as ATSFontGetPostScriptName, called from the MacOSFontEntry
> constructor) before the generation counter may get bumped, and we can detect
> the failure.

Actually, I suspect (though have not traced carefully) that the generation count doesn't really change until we destroy the MacOSUserFontData that we created, which causes the font to be deactivated again. The only potential "win" would be to try calling ATSFontGetPostScriptName separately before constructing the MacOSUserFontData, but then we'd have to either pass the name as an additional parameter there, or call it again; I don't see the benefit. Better to leave the normal, successful path as straightforward as possible, and allow a little extra work to happen in the failure case.
So how about this, let's move the call to ATSFontGetPostScriptName out of the second MacOSFontEntry constructor and into MakePlatformFont, then add an extra parameter to the second constructor, similar to the first constructor: 

  MacOSFontEntry(const nsAString& aPostscriptName, ...

That way all the code to load and test the font is all in one place.  With the call to ATSFontGetPostScriptName in the constructor it's not clear that MakePlatformFont is *relying* on that to assure the font is valid.
OK, I have moved the call to ATSFontGetPostScriptName to be directly in MakePlatformFont, before the MacOSFontEntry is created, so we can retry immediately if it fails.

Also removed the comparison of ATSGeneration values, as I think the change of generation is merely a result of the deactivation we do when GetPSName fails; it's not really an indicator of failure in its own right.
Attachment #351118 - Attachment is obsolete: true
Attachment #351158 - Flags: review?(jdaggett)
Attachment #351118 - Flags: review?(jdaggett)
Attachment #351158 - Flags: review?(jdaggett) → review+
Comment on attachment 351158 [details] [diff] [review]
revised patch as suggested in comment #5


Looks good, one small nit:

+ // We don't retry from here; the ATS font cache issue would have caused failure earlier
+ // so if we get here, there's something else bad going on within our font data structures.
+ // Currently, there should be no way to reach here, as fontentry creation cannot fail
+ // except by memory allocation failure.
+ NS_NOTREACHED("invalid font entry for a newly activated font");
+ break;

I don't think this is quite right, invalid font data is not something we should assert about.  Bad content is handled as bad content, not something that violates program invariants. I think NS_WARNING is more appropriate here.
Flags: blocking1.9.1? → blocking1.9.1+
Summary: [@font-face] Downloaded font activation on Mac may fail due to ATS cache corruption → [PATCH][@font-face] Downloaded font activation on Mac may fail due to ATS cache corruption
Attachment #351158 - Attachment is obsolete: true
Attachment #353020 - Flags: superreview?(roc)
+        // We don't retry from here; the ATS font cache issue would have caused failure earlier
+        // so if we get here, there's something else bad going on within our font data structures.
+        // Currently, there should be no way to reach here, as fontentry creation cannot fail
+        // except by memory allocation failure.

I'm confused. Can we reach this point if we downloaded corrupt font data? If we can, then the comment is incorrect. But if we can't, then John's comment #7 doesn't make sense.
The comment is accurate as the code stands right now.

However, I can imagine that eventually we might do some kind of validation of the newly-created MacOSFontEntry, and clear the mIsValid flag if we detect something bad. If that were to happen, we'd have a legitimate path that could reach this point in the code.

So I think NS_NOTREACHED is currently correct, but this could change some day. And if that happens, issuing a warning and returning nsnull will be more consistent with the other potential failure paths here.
Attachment #353020 - Flags: superreview?(roc) → superreview+
Comment on attachment 353020 [details] [diff] [review]
changed NOTREACHED to WARNING as requested

ok
Pushed 98ea743c9156 to trunk.
Status: NEW → RESOLVED
Closed: 16 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 191 landing]
Backed out, this was causing the OS X 10.4 talos boxes to crash after loading a few pages of the Tp test. First box has gone green after backing this out.

EG: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1229522550.1229545859.12261.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I have not been able to reproduce the failures from the test boxes as yet. I can only think of one aspect of the patch that could possibly have affected cases that aren't actually using @font-face, and that's the switch from an internal Cocoa method to ATSFontFindFromPostScriptName in the MacOSFontEntry::GetFontID function.

Accordingly, I have updated the patch to revert to the previous approach here, and suggest that we re-land this and see if that resolves the issue. Unfortunately, the tryserver doesn't help us here, as submitting the previous patch there seemed fine.
Attachment #353682 - Flags: review?(roc)
-    ATSGeneration currentGeneration = ATSGeneration();
+    ATSGeneration currentGeneration = ATSGetGeneration();

So the original code was always setting currentGeneration to 0 I guess. I don't see how fixing this could possibly make a difference but it's something to try reverting if all else fails.
Keywords: checkin-needed
Whiteboard: [needs 191 landing] → [needs landing]
Attachment #353020 - Attachment is obsolete: true
Comment on attachment 353682 [details] [diff] [review]
revised patch, backing out one change to try and resolve crashes
[Checkin: Comment 16 & 19]

http://hg.mozilla.org/mozilla-central/rev/d20c18e98f8f
Attachment #353682 - Attachment description: revised patch, backing out one change to try and resolve crashes → revised patch, backing out one change to try and resolve crashes [Checkin: Comment 16]
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [c-n: baking for 1.9.1][needs landing]
(In reply to comment #16)
> (From update of attachment 353682 [details] [diff] [review])
> http://hg.mozilla.org/mozilla-central/rev/d20c18e98f8f

No need to include [PATCH] in the commit message.
(Also, we should use the status whiteboard for that stuff.)
(In reply to comment #17)
> (Also, we should use the status whiteboard for that stuff.)

(The status whiteboard doesn't show up in bug lists.)
Comment on attachment 353682 [details] [diff] [review]
revised patch, backing out one change to try and resolve crashes
[Checkin: Comment 16 & 19]

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8af7b470f2e4
Attachment #353682 - Attachment description: revised patch, backing out one change to try and resolve crashes [Checkin: Comment 16] → revised patch, backing out one change to try and resolve crashes [Checkin: Comment 16 & 19]
(In reply to comment #17)
> No need to include [PATCH] in the commit message.

I take whatever the summary is set to at the checkin-needed time...
Whiteboard: [c-n: baking for 1.9.1][needs landing]
(In reply to comment #20)
> I take whatever the summary is set to at the checkin-needed time...

When the patch has a commit message inside it already, as this one did, you should definitely prefer that one over using the bug summary (although you should make sure to include the bug number and reviewers).  Commit messages should be a summary of the changes being made, not just a description of the symptom they fix.

(In reply to comment #18)
> (The status whiteboard doesn't show up in bug lists.)

It does if you follow the "Change Columns" link at the bottom of your bug list (which I think sets a cookie, although maybe it's a hidden preference).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: