Open Bug 838845 Opened 11 years ago Updated 1 year ago

[meta] Full Review of GfxInfo and feature Blocklisting

Categories

(Core :: Graphics, defect)

defect

Tracking

()

Tracking Status
firefox19 - ---
firefox20 - ---

People

(Reporter: akeybl, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta)

Once we're done with the fallout from bug 838603, a full review of blocklisting on Android is a blocker for all future mobile blocklist pushes.

We also need to figure out what to do about pushing blocklists in the future that only apply to {version of Firefox vetted for blocklisting} and up.
The badness is not limited to Android --- I suggest we call this a full review of GfxInfo in general. My bias being that it needs to be rewritten from scratch.
Should bug 823253 be on here for a block-list removal again dealing with Stage-fright?
Summary: Full Review of Android Graphics Blocklisting → Full Review of GfxInfo and feature Blocklisting
(In reply to Aaron Train [:aaronmt] from comment #2)
> Should bug 823253 be on here for a block-list removal again dealing with
> Stage-fright?

Removing a blacklist entry should be much safer than adding one... though at this point, I don't want to display to much confidence.
We should complete this review as soon as possible, so we can make any necessary changes in FF20 late in Aurora or early in its beta cycle.
We're discussing this today in the gfx team. We're making an internal postmortem/proposal asap.

I'm not entirely sure that we should aim for FF20 though. At least, we should not rush this and not constrain the new design to accomodate a very near deadline.
(In reply to Benoit Jacob [:bjacob] from comment #5)
> We're discussing this today in the gfx team. We're making an internal
> postmortem/proposal asap.
> 
> I'm not entirely sure that we should aim for FF20 though. At least, we
> should not rush this and not constrain the new design to accomodate a very
> near deadline.

If possible, please try to separate the identification of issues with our current gfx blocklisting mechanism from a longterm solution. We should leave the door open to short term fixes, in case a critical graphics stability issue is identified prior to a redesign that forces us to react.
Absolutely. We'll keep these two things separate.
For reference here (bug 737176) is the previous postmortem we never really did seriously on the previous time that roughly the same thing blew up on us (bug 711656)
Here's what we discussed so far. It has an analysis of what went wrong, but does not yet have the proposal for the new design.

https://etherpad.mozilla.org/gfxinfo
https://etherpad.mozilla.org/gfxinfo now has a proposal for the new design and a discussion of how it would address issues we identified.
The etherpad seems to have stabilized so let's dump it here:

Part 1. Mini post mortem on the worst bugs we had

Bug 838603 - android startup crashes
Not fully understood yet, however we know that this was largely caused by a mixture of:
 - over-complication / absence of design leading to surprising bugs
 - absence of min-gecko-version field in XML blocklist entries
 - difficulty of QA'ing XML blocklist entries esp. on android
 - persistent gfxinfo state in user profile
Bug 711656 - D2D crashes
Was a combination of:
 - persistent gfxinfo state in user profile
 - GfxInfo API exposed to Gecko is too error-prone to use

***

Part 2. What's bad with current GfxInfo / what we should make better

1. Poor/complicated design leads to surprising bugs

    e.g. an unknown feature is interpreted as "all features" by accident

2. Persistent state in user profile

    Leads to unexpected bugs on real-world user profiles

    Very hard to predict or QA at all

    Moreover caching hardware-related info in user profile fails when a profile is used on multiple machines i.e. corp environment, portable firefox, etc.

    Moreover caching blacklisting decisions in user profile also confuses people who try to upgrade drivers

3. Absence of min-Gecko-version in XML blocklist

    Means that we can't ship blocklist updates that would break old Firefoxen because of e.g. 1.

4. Should not have a built-in blocklist --- should just fix all reasons why one might not want to use the XML blocklist
5. Should improve XML blocklisting capabilities

    Not experienced in XML format design. Need help there. E.g. do we want to allow matching hardware fields for substrings, regexes, etc.

6. Bad QAbility

    Persistent state makes some bugs (see part 1) virtually impossible to QA at all without knowing beforehand what the bug is going to be!

    No easy way to ping a blacklist update, especially on Android

    should have a ChromeOnly DOM API, see 9.

    No easy way to see what blacklist file was used, if it was correctly updated etc.

    No way to tell what blacklist rule determined the result

    Cannot test the blocklist entries of platform X while running on platform Y. See 8.

7. Bad testability

    Cannot test the blocklist entries of platform X while running on platform Y. See 8.

    XPCOM-only interface is not easy to test. ChromeOnly DOM API would be nicer. See 9.

    We could use compiled-code tests to test a large number of cases in reasonable time. Especially as some fuzz-like testing could be useful there to test resilience to unexpected XML blocklist entries.

    XPCshell test gave us the ability to test a browser restart to test XML blocklist updates. But we still can't test what happens when in the real world a build is run against a XML blocklist it wasn't designed for. So we need to change design approach here so that we're much more defensive towards unexpected XML blacklists entries. Then this design requirement would hopefully go away, allowing us to do without the XPCshell test. See below how the new design could allow that.

8. Platform-specific parts should be kept to the minimum i.e. system detection only.

    In particular that entails that system detection should be a separate self-contained part, separate from general blacklist processing. likely a separate class.

9. Should be a ChromeOnly WebIDL interface, not a XPCOM interface

    Will allow for mochitest-chrome test

    Will allow for ease of QA'ing by exposing stuff in a privileged web page

    Will avoid spreading bad design from XPCOM

    Will allow to have both a nice JS api and a nice C++ api

10. Bad API exposed to Gecko code

    Having a WebIDL interface instead of XPCOM interface (9) will go a long way toward making the C++ interface more sane.

    The interface should also avoid exposing gritty details to code that doesn't need them (precise blacklisting reason code when the user just wants a bool yes/no --- see bug 711656)

11. Impossible to know what blocklist rule determined the result

    This should be visible in about:support and in crash reports

***

Part 3. Proposed new design

Let's replace GfxInfo by 3 separate classes:

    SysInfo

    Blocklist

    BlocklistingService

A SysInfo instance would contain any information about the machine we're currently running on. One could get a SysInfo by calling a function that does actual system detection; or one could make up one's own custom SysInfo for testing purposes. SysInfo is the only part that would be platform-specific in any way. The SysInfo interface would still be platform-agnostic (otherwise, the whole blocklist processing would need to have platform-specific backends, which we don't want). The system detection code generating a SysInfo representing the host would be the only platform-specific part.
A Blocklist would be a set of blocklist rules. One could get one representing the actual blocklist in use (blocklist.xml) or one could make up one's own custom Blocklist for testing purposes. Blocklist would be completely platform-agnostic.
BlocklistingService would be a singleton --- not quite sure yet about that part. The BlocklistingService instance would be a good place to cache anything we want to cache for the duration of the Firefox session. BlocklistingService would be completely platform-agnostic.
To get a blocklisting decision, the user would call a method on BlocklistingService, passing it a Feature (some enum thing), a SysInfo and a Blocklist. There would be a helper overload only taking a Feature enum and just getting the real SysInfo and Blocklist; while tests would instead provide their own SysInfo and Blocklist.
These 3 classes would have ChromeOnly WebIDL bindings so they could be used in the same way by both JS and C++.
Blocklist is where all the XML processing would happen. From the outside --- even from BlocklistingService --- it should not matter that the blocklist came from XML
BlocklistingService would also offer some more testing/QAing features to force as blocklist ping, obtain the current blocklist and sysinfo, etc.
To be determined: what kind of caching would we do, if any, to make the XML blocklist processing not slow down startup? We will cross that bridge when we're there. The key design principle here is that we won't cache anything system-specific; we would just at most cache a more efficient representation of the blocklist entries that could be relevant on the host OS.

***

Part 4: How the new design (part 3) would solve the current problems (part 2)

1. Poor/complicated design leads to surprising bugs

    just taking this occasion to think the design through and code it cleanly should give us a fix for that

2. Persistent state in user profile

    By not caching anything beyond perhaps a system-agnostic form of the blocklist, we will avoid these problems

3. Absence of min-Gecko-version in XML blocklist

    We can make such a field mandatory. We can also add an optional max-Gecko-version field.

4. Should not have a built-in blocklist --- should just fix all reasons why one might not want to use the XML blocklist

    We can take all the existing built-in blocklist and turn it into XML entries, expanding our XML format as needed

5. Should improve XML blocklisting capabilities

    Will do as needed

6. Bad QAbility

    No persistent state

    DOM API will give access to useful features (i.e. ping the blocklist) which we'll be able to expose in privileged pages

    new design will reduce platform-specific parts to the minimum; will allow to test all platforms' blocklists on all platforms

    will return the information of which blocklist rule determined the result

7. Bad testability

    See part 6, the same applies here

    By not requiring a browser restart for anything, we will be able to everything except testing actual browser restart in mochitests as opposed to xpcshell tests

8. Platform-specific parts should be kept to the minimum i.e. system detection only.

    Proposed design does give that

9. Should be a ChromeOnly WebIDL interface, not a XPCOM interface

    Proposed design does give that

10. Bad API exposed to Gecko code

    New design + WebIDL bindings instead of XPCOM interface should give us that

11. Impossible to know what blocklist rule determined the result

    No reason why we couldn't write that new code in a way that makes that easy.
Which of the above (outside of the redesign) do we think are blockers for mobile gfx blocklisting vs new desktop gfx blocklisting? Can we get bugs on file for those already?
Mobile is no different from desktop as far as blocklisting. The above plan only details steps to a redesign; I didn't think of other things that could be done aside from the redesign, as I believe that the right approach is to just do that redesign.
Am I understanding this correctly, and you want a separate blocklist.xml only for driver blocks? Or is this about extending the current blocklist? I'm just confused because I see you mention XML processing, but I would assume that the current blocklist is already being read and parsed elsewhere.
This is a plan to redesign the C++ code that handles the blocklist.xml entries, and to expand the xml format there (add new tags). We haven't made plans so far to add a separate XML file, as I was under the assumption that it made sense to avoid adding more network traffic. When we get to the point where we are expanding or changing the XML format, we will definitely work with you to ensure that existing entries are correctly ported.
(In reply to Benoit Jacob [:bjacob] from comment #13)
> I didn't think of other things that could
> be done aside from the redesign, as I believe that the right approach is to
> just do that redesign.

Does that mean that the gfx team's recommendation is that we push no further blocks to desktop or mobile until this redesign is complete? If so, have we come to an ETA for the blocking work?
Not quite that bad.  We can push blocks in general, though there may be some types that we couldn't. They should be evaluated with the knowledge gained with bug 824118 until the redesign makes things safer.
(In reply to Benoit Jacob [:bjacob] from comment #15)
> This is a plan to redesign the C++ code that handles the blocklist.xml
> entries, and to expand the xml format there (add new tags). We haven't made
> plans so far to add a separate XML file, as I was under the assumption that
> it made sense to avoid adding more network traffic. When we get to the point
> where we are expanding or changing the XML format, we will definitely work
> with you to ensure that existing entries are correctly ported.

OK, I'll just note that it would be useful to reuse the nodes and attributes we are already using for extension and plugin blocks, which possibly have greater filtering capabilities than what you need for graphics blocks. It'll also make it easier to implement both in Firefox and on AMO.
@Jorge: interesting, please a link to some documentation about that?

@Alex: Is there anything more that you want to happen in Firefox 20 timeframe? Implementation will not be ready for Firefox 20. In the very near term I think all we can do is be extremely careful if we have to add any downloadable blacklist entry. We can still more relaxedly add compiled-in blacklist entries.
(In reply to Benoit Jacob [:bjacob] from comment #19)
> @Jorge: interesting, please a link to some documentation about that?

https://wiki.mozilla.org/Blocklisting/Admin
(In reply to Benoit Jacob [:bjacob] from comment #19)
> @Alex: Is there anything more that you want to happen in Firefox 20
> timeframe? Implementation will not be ready for Firefox 20. In the very near
> term I think all we can do is be extremely careful if we have to add any
> downloadable blacklist entry. We can still more relaxedly add compiled-in
> blacklist entries.

Nope, sounds like there's nothing to do in FF20. Will ask your opinion on bug 843273.
It would be nice if we could address that in a future release. We have been bitten by this in 33 and that hurt.
Milan, could you prioritize this?
Flags: needinfo?(milan)
There hasn't been a lot of push on this; we got in trouble in 21, and now in 33, and were lucky in between, but when bad things happen, usually they're really bad, as blocklisting, especially the downloadable part, is our last line of defense.
Flags: needinfo?(milan)
Assignee: jacob.benoit.1 → nobody
Severity: normal → S3
Summary: Full Review of GfxInfo and feature Blocklisting → [meta] Full Review of GfxInfo and feature Blocklisting
You need to log in before you can comment on or make changes to this bug.