Closed Bug 1439323 Opened 2 years ago Closed 1 year ago

Gloda facet/filters layout not properly formatted when choosing involved. DOM not built as it used to be after bug 1429125

Categories

(Thunderbird :: Theme, defect)

defect
Not set

Tracking

(thunderbird_esr52 unaffected, thunderbird_esr60+ fixed)

RESOLVED DUPLICATE of bug 1520467
Tracking Status
thunderbird_esr52 --- unaffected
thunderbird_esr60 + fixed

People

(Reporter: jorgk-bmo, Unassigned)

References

Details

(Keywords: regression, Whiteboard: [regression:TB57])

Attachments

(2 files, 2 obsolete files)

Attached image gloda-filters.png
See screenshot. The text is overlapping.
Flags: needinfo?(richard.marti)
How do you search to see "involving any of:" and "other participants:"? I can only make them appear through Inspector disabling a display: none rule. And then it looks correct.
Flags: needinfo?(richard.marti)
Like this:
I searched for "beta" and got:
Filters
 [ ] From Me  [ ] To Me
 [ ] Starred

People
 Bugzilla@Mozilla
 TB drivers

Account
 IRC - 
 jorgk
 Local Folders

Folder
 Mozilla (Local Folders)
 Sent (jorgk)

Now click onto any of the sub-items and select "Must be in ...".

The next screen is broken.
Okay, I see it now. This isn't a CSS issue but probably a JS issue. Does gloda use other libraries?

I checked with TB 52. The <li> you want to "Must be in ..." is moved from the .facet-remaindered <ul> to the .facet-included <ul> and all is good.

Now it moves the <li> as TB 52 but sets display: none; additionally it creates a new <li> with absolute positioning to position there the original <li> should be. But this doesn't move the following items down and this shows then the overwriting.

Somehow this must be a regression in JS part. TB 59 shows the same. So it must have regressed before.
OK, thanks for looking into it. If it's a regression, I hope Alice will help us find it. STR in comment #2.

Gloda hasn't changed in ages, so I don't know what's going on here.
Flags: needinfo?(alice0775)
Thanks Alice!!

So the regression came in on Jan 10th. That day, bug 1428745 and broke Gloda. We didn't notice and finally fixed in on 8th Feb in bug 1436590.

So I wonder what broke it in the first regression window. The datetimepicker stuff is not likely. Or maybe it is. There is all this complicated processing in  comm-central/mail/base/content/gloda* :-(
Aceman, you've dug around Gloda before, could you please take a look.
Gloda uses a 3rd party library as seen in bug 1436590. As said in bug 1436590 comment 11, we may need to fix the library to work in the newer JS version/engine now. But if there are no errors in the console, it is hard to find what to fix.
The protovis library source mentions JS version 1.8 and parsing of JS 1.6 so all hell may break loose if we run it in an engine that ignores all the JS versions and parses code as if it were the newest version.

We should drop 3rd party libraries that we can't maintain. We did get rid of jquery.
Or the other option is to convert the dialog from xhtml to xul, as in chrome versioned JS should still be supported.
But that direction is probably out of fashion with the push to messy html instead of xul.

Removing protovis from the cloudfile providers should be easy, we do not really need a pie chart for the free space. A plain rectangular bar done in XUL could be enough.

The gloda charts are harder.

Can you try disabling loading the date chart at https://dxr.mozilla.org/comm-central/source/mail/base/content/glodaFacetBindings.xml#1293 so we can at least determine if the problem is in the library or not?
Thanks. The problem is in the area on the left side, not the bar chart. That area is driven by glodaFacetView.xhtml in comm-central/mail/base/content/gloda*. I'll try your suggestion anyway.
Yes, that is just an experiment whether some silent error in the protovis code doesn't affect subsequest code for the left side panel.
I took out the "build" method around glodaFacetBindings.xml#1293 and that removed the bar chart, but the formatting error on the left was still present.
Regression range in local build with patch(36b19d54366f bug 1436590):
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d97ff9ec96fc&tochange=57386b58c057

And finally, I got
Last Good: c-c:c470737be118 +36b19d54366f m-c:0085a1b2cbdc
First Bad: c-c:c470737be118 +36b19d54366f m-c:2e705f777acd

Regressed by: 2e705f777acd	Emilio Cobos Álvarez — Bug 1429125: Enable lazy frame construction in the browser chrome, but not XUL yet. r=heycam,tnikkel
Thanks!!
I was just going to post the results of some investigation. As Richard wrote in comment #1, the results are made up of <li> elements, which are constructed here:
https://dxr.mozilla.org/comm-central/rev/f51fea20a88c099624cc0145b4168b38eed204c8/mail/base/content/glodaFacetBindings.xml#914

Now they are placed incorrectly. If I manually delete such an element and rerun the same steps, the list is correct, and no rewriting takes place. So the problem as an "intermittent" character as you noted in comment #5.

Emilio, can you please help us here. We basically create some DOM on the fly in glodaFacetBindings.xml (see link above) and now the result isn't as it was before, it seems that elements don't get inserted where they used to go.
Flags: needinfo?(emilio)
Huh, interesting... I assume these are all XBL bindings, right?

I suspect this is an issue with nested insertion points. These are broken since forever, see bug 1422747 and bug 472020. My changes may have affected it indeed, causing ranges of content to be inserted differently than before, and triggering this known bug for this particular case...

Does the patch in bug 1425362 by any chance fix this?

If not, a way to fix this could be to create the DOM under a display: none element, and remove the display: none when the DOM is constructed. I'd really prefer not to have to fix all the XBL insertion point bugs at this point :).
Flags: needinfo?(emilio) → needinfo?(jorgk)
Attached patch 1439323-pre-load.patch (obsolete) — Splinter Review
This fixes the issue. I don't really have a clue what I'm doing here.

Boris, we were busily removing those load calls in
https://hg.mozilla.org/comm-central/rev/8f403ab507743db0e61f5eec8b5349c8ff3486a9
Bug 1438328
and now it turns out that adding such a call fixes a problem.

Please advise.
Flags: needinfo?(bzbarsky)
Attachment #8952518 - Flags: feedback?(emilio)
Flags: needinfo?(jorgk)
Summary: Gloda filters not properly formatted → Gloda filters not properly formatted: DOM not built as it used to be after bug 1429125
Oh, wires crossed here, I will try the patch from bug 1425362.
Comment on attachment 8952518 [details] [diff] [review]
1439323-pre-load.patch

Review of attachment 8952518 [details] [diff] [review]:
-----------------------------------------------------------------

::: mail/base/content/glodaFacetTab.js
@@ +19,5 @@
>        type: "glodaSearch"
>      }
>    },
>    openTab: function glodaFacetTabType_openTab(aTab, aArgs) {
> +    document.loadBindingDocument("chrome://messenger/content/glodaFacetBindings.xml");

I wish I knew why would this fix the error, sorry... Feel free to redirect to bz, though looks somewhat hacky :)

All chrome:// XBL bindings should load synchronously as soon as frame construction runs...
Attachment #8952518 - Flags: feedback?(emilio)
(In reply to Jorg K (GMT+1) from comment #13)
> If I manually delete such an element and
> rerun the same steps, the list is correct, and no *over*rewriting takes place.
In fact, just repeating the steps gives the correct result.

I wonder whether we've done any damage with:
https://hg.mozilla.org/comm-central/rev/8f403ab507743db0e61f5eec8b5349c8ff3486a9
The need for those calls wasn't documented.
The patch from bug 1425362 doesn't fix the issue.
This does the "display: none" from comment #14. It doesn't work, at least not with the patch from bug 1425362 still applied.
Attachment #8952588 - Flags: feedback?(emilio)
> Boris, we were busily removing those load calls in
> https://hg.mozilla.org/comm-central/rev/8f403ab507743db0e61f5eec8b5349c8ff3486a9

Just to be clear, what I had asked for was for someone to check _whether_ they could be removed, and if not why not.

> I wonder whether we've done any damage with:
> https://hg.mozilla.org/comm-central/rev/8f403ab507743db0e61f5eec8b5349c8ff3486a9

That is what I had asked people to check: whether they were actually needed...  I wasn't saying they have to be removed, though it sure would be nice if they could be.

OK, so on to the actual issue in this bug.  Do you ever end up in a situation where https://searchfox.org/mozilla-central/rev/47cb352984bac15c476dcd75f8360f902673cb98/dom/xbl/nsXBLService.cpp#1008 is reached with aForceSyncLoad false and aBindingURI is chrome://messenger/content/glodaFacetBindings.xml ?

I ask because this code is forcing syncload if the _document_ URI is chrome://, not the binding URI.
Flags: needinfo?(bzbarsky)
And that last was apparently a behavior change introduced in https://github.com/mozilla/gecko-dev/commit/bfd6b88a89407917b0409344fc2d549e9fba78ba ?  Before that it looks like it was checking the binding URI.

So worth checking whether changing https://searchfox.org/mozilla-central/rev/4e5e8f7b73d59fc29d32db1a0c8d763732209fe1/dom/xbl/nsXBLService.cpp#938 to test the scheme of aBindingURI helps.
On the other han, I'd expect documentURI to be "chrome://messenger/content/messenger.xul" there when loading glodaFacetBindings.xml.
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #21)
> Just to be clear, what I had asked for was for someone to check _whether_
> they could be removed, and if not why not.
 
> That is what I had asked people to check: whether they were actually
> needed...  I wasn't saying they have to be removed, though it sure would be
> nice if they could be.
Indeed, we understood you 100%. However, the calls we removed to be nice were preceded by a comment "otherwise we have problems". That's not real good documentation. So the only check we saw possible was to actually remove the calls to see whether anything stopped working. If we find visual glitches, we will need to revisit this.

> OK, so on to the actual issue in this bug.
Indeed. I will poke around nsXBLService.cpp#1008.
Comment on attachment 8952588 [details] [diff] [review]
1439323-display-none.patch - NOT working

Review of attachment 8952588 [details] [diff] [review]:
-----------------------------------------------------------------

::: mail/base/content/glodaFacetBindings.xml
@@ +939,5 @@
>          }
>  
> +        // Hack to work around XBL problem, see bug 1439323.
> +        // Part 2, see part 1 above.
> +        remainderList.style.display = "initial";

Does it work if you do something like `remainderList.offsetTop` above this line?

Also, no need for "initial", just "" would remove the property, which would avoid overwriting the display value with "inline".
Attachment #8952588 - Flags: feedback?(emilio)
Comment on attachment 8952518 [details] [diff] [review]
1439323-pre-load.patch

Sorry about the confusion, this doesn't work after all.
Attachment #8952518 - Attachment is obsolete: true
Adding .offsetTop doesn't change the situation.
Attachment #8952588 - Attachment is obsolete: true
First I confirmed that backing out https://hg.mozilla.org/mozilla-central/rev/2e705f777acd changed the behaviour back to what we need. That's confirmed.

I added a bit of debug
    // Always load chrome synchronously
    bool chrome;
    if (NS_SUCCEEDED(documentURI->SchemeIs("chrome", &chrome)) && chrome)
      aForceSyncLoad = true;

    nsCOMPtr<nsIDocument> document;
    nsAutoCString dspec, bspec;
    documentURI->GetSpec(dspec);
    aBindingURI->GetSpec(bspec);
    printf("=== before calling FetchBindingDocument: force=%d, docURI %s, bindURI %s\n", aForceSyncLoad?1:0,
           dspec.get(), bspec.get());
    rv = FetchBindingDocument(aBoundElement, aBoundDocument, documentURI,
                              aBindingURI, aOriginPrincipal, aForceSyncLoad,
                              getter_AddRefs(document));

and when doing the operation, no extra debug shows. Running it again capturing all the debug I don't see anything related to chrome://messenger/... or glodaFacetBindings.xml. The only debug I see is loading chrome://global/... like:
=== before calling FetchBindingDocument: force=1, docURI chrome://global/content
/bindings/dialog.xml, bindURI chrome://global/content/bindings/dialog.xml#dialog

Backing out rev 2e705f777acd again, I see:
=== before calling FetchBindingDocument: force=1, docURI chrome://messenger/cont
ent/glodaFacetBindings.xml, bindURI chrome://messenger/content/glodaFacetBinding
s.xml#popup-menu
Partly ignore my previous comment. Even with rev 2e705f777acd backed out there occasions where
=== before calling FetchBindingDocument: force=1, docURI chrome://messenger/cont
ent/glodaFacetBindings.xml, bindURI chrome://messenger/content/glodaFacetBinding
s.xml#popup-menu
doesn't get logged, however I see the correct formatting. This is really puzzling.
Well, it's possible you're getting a hit in the startup cache.  In which case the binding is available sync.

And in any case, the document URI is chrome, so you get a syncload here if you end up in this code at all.

So the real problem, presumably, is that something else is depending on some binding being attached before we do frame construction in this case.

If you add a clone.getBoundingClientRect() get right after the aTab.panel.appendChild(clone) bit in openTab in glodaFacetTab.js, does that fix the problem?  That's not the right fix, or even a fix, but I'd like to check it as a diagnostic measure.
Thanks Boris, I thought I was going crazy. After each build, I say a whole lot more loading, and when repeating, it was less.

I'll try your suggestion as soon as I have recovered from the latest fire, that is, TB is busted and doesn't even compile.
Here we go:

  openTab: function glodaFacetTabType_openTab(aTab, aArgs) {
    // we have no browser until our XUL document loads
    aTab.browser = null;

    // First clone the page and set up the basics.
    let clone = document.getElementById("glodaTab")
                        .firstChild
                        .cloneNode(true);

    aTab.panel.appendChild(clone);
    clone.getBoundingClientRect(); <=== added this
    aTab.iframe = aTab.panel.querySelector("iframe");

doesn't make a change.
OK.  One more experiment, please.  If you do the loadBindingDocument thing, but for some binding document _other_ than glodaFacetBindings.xml, does that also fix things?  That is, is it the act of doing loadBindingDocument at all, or the act of loading this particular binding document that fixes things?

(I ask this because given that we load the binding sync anyway it's really not clear to me why the LoadBindingDocument() call would have any effect at all....)
I can do any number of experiments, but the "loadBindingDocument thing" did actually NOT fix things, see comment #26.
Aha.  OK, that at least makes sense, then.  ;)

Emilio, do you have any ideas for other things Jorg could try there?  If not, please let me know and I can do some digging....

Jorg, I assume there are no obvious errors in the browser console?
Flags: needinfo?(emilio)
OK, let me summarise what we know:

There is a certain operation in Thunderbird involving Global search that constructs some DOM on the fly to represent the search results. These are <li> nodes, and sometimes one of those nodes gets inserted into an incorrect position in the DOM tree.

The only secure remedy for that symptom is to back out:
https://hg.mozilla.org/mozilla-central/rev/2e705f777acd

Otherwise the results appear to be random. Yesterday I thought inserting the "loadBindingDocument thing" fixed the problem, but today I cannot confirm that. Yesterday I wrote that deleting the wrongly inserted node and running through the same steps gain produces the correct result (comment #13), but today I can't confirm that. I've seen cases where the next time around, the result is correct, I have also seen cases where that's not so. Yesterday I wrote that the second time around the result is correct (comment #18), today I find that this is also not so.

If you rerun the steps six times without restarting the result:
2x correct, 3x incorrect, 1x correct, 1x incorrect.
So now it's really not deterministic.
So there are two places where you're inserting stuff into the dom, one is comment 32, and the other one is all those list.appendChild(li).

If you add the getBoundingClientRect right below this line, is it fixed?

  https://dxr.mozilla.org/comm-central/rev/f51fea20a88c099624cc0145b4168b38eed204c8/mail/base/content/glodaFacetBindings.xml#932
I didn't know getBoundingClientRect on which element you wanted to add, to I added this whole-sale:
          } <== end if
          li.getBoundingClientRect();
          includeList.getBoundingClientRect();
          excludeList.getBoundingClientRect();
          remainderList.getBoundingClientRect();
        } <== end for

No difference, same random behaviour, incorrect, then a few times correct, then incorrect.

The correct ones show a funny "jumping" behaviour: It first looks wrong for a split second and then jumps into the correct position. So when the DOM is first drawn, it looks wrong but jumps to being right.
So where from here? The visual glitch on the second click in our Gloda results is not a big deal, I'm just puzzled about it, especially the random nature of the bug.
I have no more ideas that don't involve stepping in with a debugger and seeing what was going on before, and what's going on now on the frame constructor :(

I can try to take some of my unpaid time to do that, but can't make any promises since I have a pretty tough cold this week.
Flags: needinfo?(emilio)
A reduced test-case that one could reproduce on firefox loading it with dom.allow_XUL_XBL_for_file would be extremely helpful to figure this out too faster, if you could make it.
Yes, I understand that the STR are extremely annoying. Given the random nature of the problem, I assume that it will be very difficult to find a test case in FF. Sadly I have other problems burning much hotter at the moment.

Would it make a difference to move the DOM creation out of the XML file and do it in JS instead, that is, refactor this code? What's special about this code? I assume people create DOM on the fly in JS in the browser all the time and don't get random wrong results.

What's dom.allow_XUL_XBL_for_file? It doesn't appear to be a preference.
(In reply to Jorg K (GMT+1) from comment #42)
> Yes, I understand that the STR are extremely annoying. Given the random
> nature of the problem, I assume that it will be very difficult to find a
> test case in FF. Sadly I have other problems burning much hotter at the
> moment.
> 
> Would it make a difference to move the DOM creation out of the XML file and
> do it in JS instead, that is, refactor this code? What's special about this
> code? I assume people create DOM on the fly in JS in the browser all the
> time and don't get random wrong results.
> 
> What's dom.allow_XUL_XBL_for_file? It doesn't appear to be a preference.

It's a hidden preference to be able to load xul files locally. It's useful for reduced XUL / XBL test-cases :)
> Would it make a difference to move the DOM creation out of the XML file and do it in
> JS instead, that is, refactor this code?

Unlikely.

> What's special about this code?

It's inserting non-XUL things into XUL and expecting XBL to apply to them, right?  That's not something people do in the browser "all the time", or ever....
I think the elements are inserted into the DOM one by one. Jorg, can you try generating them into the respective arrays and then append them all in one go?
Duplicate of this bug: 1444801
Duplicate of this bug: 1455856
(I didn't find this because "facet" was not in summary - this distinguish gloda functions from quick filter and the like)
Blocks: 1429125
Summary: Gloda filters not properly formatted: DOM not built as it used to be after bug 1429125 → Gloda facet/filters not properly formatted: DOM not built as it used to be after bug 1429125
Boris and Emilio, sadly there has been no progress, so unless I hear an objection, I will back out bug 1429125
https://hg.mozilla.org/mozilla-central/rev/2e705f777acd#l1.12
from our TB 60 release branch, now on beta, soon on ESR.
Flags: needinfo?(emilio)
Flags: needinfo?(bzbarsky)
https://hg.mozilla.org/releases/mozilla-beta/rev/481fea2011e633e214ebba06a62caa8081b1c6b7
Bug 1439323 - Backed out bug 1429125 (changeset 2e705f777acd) to build Thunderbird 60. a=jorgk DONTBUILD
> sadly there has been no progress

Just so we're clear... who is expected to be making progress here?  If you were waiting for us to do it, that's news to us.

> I will back out bug 1429125

I'll leave it to Emilio to answer whether that's a sane thing to do.  I seem to recall patches going by that assumed we always do lazy frame construction...
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #51)
> Just so we're clear... who is expected to be making progress here?  If you
> were waiting for us to do it, that's news to us.
No offence intended. I wouldn't know where to start. Other than that, I refer you to comment #40. Anyway, I've (impatiently) done the backout, we'll see whether hell breaks loose now. Hopefully in the fullness of time we can debug this.
I _think_ it should be fine for now. It should buy you a bit more time, but as soon as de-XBL allows it we probably will remove that code path altogether. Of course that's not quite soon though.

I've written patches that take advantage of LazyFC invariants actually holding to assert more stuff, but I don't think I've broken non-LazyFC.
Flags: needinfo?(emilio)
I'm running the binary with the backout, now the page looks right, but the rendering is still very strange. It looks as if the elements were first rendered on top of one another, and then jump into place. Without the backout they are just rendered on top of one another and only sometimes jump into place.
> No offence intended.

This isn't about offence.  This is about having a credible plan for fixing this, as opposed to having no plan.  I just wanted to be clear that relying on me or Emilio debugging whatever weirdness is going on in the Thunderbird frontend is in the "no plan" bucket.

> I wouldn't know where to start.

Well, comment 40.  Or finding someone else to take it if you don't have the time/expertise...

> Hopefully in the fullness of time we can debug this.

All I'm saying is the Thunderbird leadership needsto pin down who "we" is.  ;)
Taking "tracking TB 60" off since it was fixed via backout.
Blocks: tb60found
Summary: Gloda facet/filters not properly formatted: DOM not built as it used to be after bug 1429125 → Gloda facet/filters layout not properly formatted when choosing involves. DOM not built as it used to be after bug 1429125
Whiteboard: [regression:TB57]
Summary: Gloda facet/filters layout not properly formatted when choosing involves. DOM not built as it used to be after bug 1429125 → Gloda facet/filters layout not properly formatted when choosing involved. DOM not built as it used to be after bug 1429125
https://hg.mozilla.org/releases/mozilla-esr60/rev/a86e665e8e8e7a4ece2d6cbc6f24c45c99ccf0db
Bug 1439323 - Backed out bug 1429125 (changeset 2e705f777acd) to build Thunderbird 60. a=jorgk

We're done here, the de-XBL work will take care of it in bug 1520467.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1520467
You need to log in before you can comment on or make changes to this bug.