Issue with self-closing MathML tags in planet

Status: NEW
Assignee: Unassigned
Product: Websites (planet.mozilla.org)
Opened 7 years ago, updated 4 years ago
Reporter: fredw

(Reporter)

Description

7 years ago
See my last post here:
http://www.maths-informatique-jeux.com/blog/frederic/

Some MathML formulas are rendered badly. This seems to be caused by self-closing <mspace/> tags. It may also happen with other empty elements such as <none/>, <mprescripts/>, etc.

Comment 1

7 years ago
An analysis of what is going on:

Frédéric's blog is using a version of DotClear which has been "hacked"[1] to serve content as application/xhtml+xml, but still encodes the content as html in the Atom feed[2].  Venus uses FeedParser to parse the feed, and FeedParser uses Python's sgmllib to parse HTML content, and sgmllib by design does not recognize trailing slashes in element names as being significant.

Venus does handle MathML encoded as xhtml in Atom feeds correctly, for example the MathML found in Jacques Distler's blog entries[3].

Encoding mspace as <mspace></mspace> in the original content will allow the content to be parsed correctly by HTML and XML parsers alike.

It is my recommendation that one of the above two options (encoding the feed content as xhtml, or writing empty elements with explicit end tags) be taken in addition to any possible changes in Planet Mozilla, as I presume that Frédéric intends for his feed to be consumable by other readers.

As to Planet Mozilla, if somebody identifies a set of elements which can never validly have child nodes (text or element), and are therefore always empty, I can update the feedparser to always fix up these elements.
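
A minimal sketch of what such a fix-up could look like, assuming the set of always-empty elements is the seven MathML3 presentation elements. The names `ALWAYS_EMPTY` and `expand_empty_elements` are hypothetical and are not part of feedparser or Venus; the regex is deliberately simple and does not handle attribute values that contain `<` or `>`.

```python
import re

# Assumption: elements that can never validly have children, so a
# self-closing tag can safely be rewritten as an open/close pair.
ALWAYS_EMPTY = frozenset([
    "mspace", "mglyph", "msline", "none",
    "mprescripts", "malignmark", "maligngroup",
])

# Matches a self-closing tag such as <mspace width="1em"/> or <m:none />.
# Group 1 is the qualified name, group 2 the local name, group 3 the attributes.
_SELF_CLOSING = re.compile(
    r"<((?:[A-Za-z][\w.-]*:)?([A-Za-z][\w.-]*))((?:\s[^<>]*?)?)\s*/>"
)


def expand_empty_elements(markup, names=ALWAYS_EMPTY):
    """Rewrite <name .../> as <name ...></name> for known empty elements."""
    def repl(match):
        qname, local, attrs = match.group(1), match.group(2), match.group(3)
        if local.lower() not in names:
            # Leave every other self-closing tag exactly as it was.
            return match.group(0)
        return "<{0}{1}></{0}>".format(qname, attrs.rstrip())
    return _SELF_CLOSING.sub(repl, markup)
```

Running this over the HTML-encoded entry content before it reaches a parser that ignores the trailing slash would give that parser an explicit end tag to work with, which is the same form recommended above for the original content.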

Longer term, the planet should use a parser that understands self-closing syntax.  The (still incomplete) rewrite of Venus in Ruby named Mars uses Nokogiri and therefore libxml2 which handles this just fine.  Alternatively, changing the feedparser to use html5lib would address this.  The way this would likely work is that the feedparser would optionally use html5lib if it is present, and fall back to sgmllib if not; and Venus would simply include html5lib.
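
The optional-dependency pattern described here could be sketched roughly as follows. This is only an illustration, not feedparser's actual code: `parse_fragment` and `children_of` are hypothetical helpers, and returning `None` stands in for "fall back to the existing sgmllib path".

```python
try:
    import html5lib  # optional: used only when it is installed
except ImportError:
    html5lib = None


def parse_fragment(markup):
    """Parse an HTML fragment with html5lib, or return None if unavailable.

    A None result tells the caller to use its legacy parser instead.
    """
    if html5lib is None:
        return None
    # html5lib follows the HTML5 parsing rules, which honour the
    # self-closing slash on MathML and SVG elements.
    return html5lib.parseFragment(markup, treebuilder="etree")


def children_of(tree, local_name):
    """Return the child-element lists of every element with this local name."""
    found = []
    for element in tree.iter():
        # Tags are namespace-qualified strings; comments have non-string tags.
        if str(element.tag).endswith("}" + local_name):
            found.append(list(element))
    return found


if __name__ == "__main__":
    tree = parse_fragment('<math><mspace width="1em"/><mi>x</mi></math>')
    if tree is None:
        print("html5lib not installed; caller would fall back")
    else:
        print(children_of(tree, "mspace"))
```

The design choice is that callers never import html5lib directly; they ask for a parsed tree and handle the `None` case, so the dependency stays optional.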

[1] http://www.maths-informatique-jeux.com/blog/frederic/?post/2010/05/02/Hacking-Dotclear-2-for-writing-XHTMLMathMLSVG
[2] http://www.maths-informatique-jeux.com/blog/frederic/?feed/atom
[3] http://golem.ph.utexas.edu/~distler/blog/atom10.xml
(Reporter)

Comment 2

7 years ago
Thanks for the update. I think my Atom feed was originally encoded as xhtml, but Google didn't like that and was not able to index my posts. I'll see later if I can do something to automatically transform empty tags.

BTW, there are also problems with a recent post from Daniel Glazman but it seems to be a different issue (SVG elements are closed and HTML mimetype is used):
http://www.glazman.org/weblog/dotclear/index.php?post/2010/12/28/The-current-CSS-Gradients-mess
(Reporter)

Comment 3

7 years ago
Here is the list of empty elements in MathML3. Most of them belong to mathml3-content but I think the most important are the seven from mathml3-presentation (mspace, mglyph, msline, none, mprescripts, malignmark, maligngroup).

fred@debian:~$ egrep "EMPTY|end of|start of" mathml3.dtd
<!-- end of MathML Character Entity section -->]]>
<!-- end of mathml3-strict-content.rng -->
<!-- start of mathml3-content.rng -->
<!-- start of mathml3-strict-content.rng -->
<!ELEMENT %share.qname; EMPTY>
<!ELEMENT %sep.qname; EMPTY>
<!ELEMENT %inverse.qname; EMPTY>
<!ELEMENT %ident.qname; EMPTY>
<!ELEMENT %domain.qname; EMPTY>
<!ELEMENT %codomain.qname; EMPTY>
<!ELEMENT %image.qname; EMPTY>
<!ELEMENT %ln.qname; EMPTY>
<!ELEMENT %log.qname; EMPTY>
<!ELEMENT %moment.qname; EMPTY>
<!ELEMENT %compose.qname; EMPTY>
<!ELEMENT %quotient.qname; EMPTY>
<!ELEMENT %divide.qname; EMPTY>
<!ELEMENT %minus.qname; EMPTY>
<!ELEMENT %power.qname; EMPTY>
<!ELEMENT %rem.qname; EMPTY>
<!ELEMENT %root.qname; EMPTY>
<!ELEMENT %factorial.qname; EMPTY>
<!ELEMENT %abs.qname; EMPTY>
<!ELEMENT %conjugate.qname; EMPTY>
<!ELEMENT %arg.qname; EMPTY>
<!ELEMENT %real.qname; EMPTY>
<!ELEMENT %imaginary.qname; EMPTY>
<!ELEMENT %floor.qname; EMPTY>
<!ELEMENT %ceiling.qname; EMPTY>
<!ELEMENT %exp.qname; EMPTY>
<!ELEMENT %max.qname; EMPTY>
<!ELEMENT %min.qname; EMPTY>
<!ELEMENT %plus.qname; EMPTY>
<!ELEMENT %times.qname; EMPTY>
<!ELEMENT %gcd.qname; EMPTY>
<!ELEMENT %lcm.qname; EMPTY>
<!ELEMENT %and.qname; EMPTY>
<!ELEMENT %or.qname; EMPTY>
<!ELEMENT %xor.qname; EMPTY>
<!ELEMENT %not.qname; EMPTY>
<!ELEMENT %implies.qname; EMPTY>
<!ELEMENT %equivalent.qname; EMPTY>
<!ELEMENT %forall.qname; EMPTY>
<!ELEMENT %exists.qname; EMPTY>
<!ELEMENT %eq.qname; EMPTY>
<!ELEMENT %gt.qname; EMPTY>
<!ELEMENT %lt.qname; EMPTY>
<!ELEMENT %geq.qname; EMPTY>
<!ELEMENT %leq.qname; EMPTY>
<!ELEMENT %neq.qname; EMPTY>
<!ELEMENT %approx.qname; EMPTY>
<!ELEMENT %factorof.qname; EMPTY>
<!ELEMENT %tendsto.qname; EMPTY>
<!ELEMENT %int.qname; EMPTY>
<!ELEMENT %diff.qname; EMPTY>
<!ELEMENT %partialdiff.qname; EMPTY>
<!ELEMENT %divergence.qname; EMPTY>
<!ELEMENT %grad.qname; EMPTY>
<!ELEMENT %curl.qname; EMPTY>
<!ELEMENT %laplacian.qname; EMPTY>
<!ELEMENT %union.qname; EMPTY>
<!ELEMENT %intersect.qname; EMPTY>
<!ELEMENT %cartesianproduct.qname; EMPTY>
<!ELEMENT %in.qname; EMPTY>
<!ELEMENT %notin.qname; EMPTY>
<!ELEMENT %notsubset.qname; EMPTY>
<!ELEMENT %notprsubset.qname; EMPTY>
<!ELEMENT %setdiff.qname; EMPTY>
<!ELEMENT %subset.qname; EMPTY>
<!ELEMENT %prsubset.qname; EMPTY>
<!ELEMENT %card.qname; EMPTY>
<!ELEMENT %sum.qname; EMPTY>
<!ELEMENT %product.qname; EMPTY>
<!ELEMENT %limit.qname; EMPTY>
<!ELEMENT %sin.qname; EMPTY>
<!ELEMENT %cos.qname; EMPTY>
<!ELEMENT %tan.qname; EMPTY>
<!ELEMENT %sec.qname; EMPTY>
<!ELEMENT %csc.qname; EMPTY>
<!ELEMENT %cot.qname; EMPTY>
<!ELEMENT %sinh.qname; EMPTY>
<!ELEMENT %cosh.qname; EMPTY>
<!ELEMENT %tanh.qname; EMPTY>
<!ELEMENT %sech.qname; EMPTY>
<!ELEMENT %csch.qname; EMPTY>
<!ELEMENT %coth.qname; EMPTY>
<!ELEMENT %arcsin.qname; EMPTY>
<!ELEMENT %arccos.qname; EMPTY>
<!ELEMENT %arctan.qname; EMPTY>
<!ELEMENT %arccosh.qname; EMPTY>
<!ELEMENT %arccot.qname; EMPTY>
<!ELEMENT %arccoth.qname; EMPTY>
<!ELEMENT %arccsc.qname; EMPTY>
<!ELEMENT %arccsch.qname; EMPTY>
<!ELEMENT %arcsec.qname; EMPTY>
<!ELEMENT %arcsech.qname; EMPTY>
<!ELEMENT %arcsinh.qname; EMPTY>
<!ELEMENT %arctanh.qname; EMPTY>
<!ELEMENT %mean.qname; EMPTY>
<!ELEMENT %sdev.qname; EMPTY>
<!ELEMENT %variance.qname; EMPTY>
<!ELEMENT %median.qname; EMPTY>
<!ELEMENT %mode.qname; EMPTY>
<!ELEMENT %determinant.qname; EMPTY>
<!ELEMENT %transpose.qname; EMPTY>
<!ELEMENT %selector.qname; EMPTY>
<!ELEMENT %vectorproduct.qname; EMPTY>
<!ELEMENT %scalarproduct.qname; EMPTY>
<!ELEMENT %outerproduct.qname; EMPTY>
<!ELEMENT %integers.qname; EMPTY>
<!ELEMENT %reals.qname; EMPTY>
<!ELEMENT %rationals.qname; EMPTY>
<!ELEMENT %naturalnumbers.qname; EMPTY>
<!ELEMENT %complexes.qname; EMPTY>
<!ELEMENT %primes.qname; EMPTY>
<!ELEMENT %emptyset.qname; EMPTY>
<!ELEMENT %exponentiale.qname; EMPTY>
<!ELEMENT %imaginaryi.qname; EMPTY>
<!ELEMENT %notanumber.qname; EMPTY>
<!ELEMENT %true.qname; EMPTY>
<!ELEMENT %false.qname; EMPTY>
<!ELEMENT %pi.qname; EMPTY>
<!ELEMENT %eulergamma.qname; EMPTY>
<!ELEMENT %infinity.qname; EMPTY>
<!-- end of mathml3-common.rng -->
<!-- end of mathml3-content.rng -->
<!-- start of mathml3-presentation.rng -->
<!ELEMENT %mspace.qname; EMPTY>
<!ELEMENT %mglyph.qname; EMPTY>
<!ELEMENT %msline.qname; EMPTY>
<!ELEMENT %none.qname; EMPTY>
<!ELEMENT %mprescripts.qname; EMPTY>
<!ELEMENT %malignmark.qname; EMPTY>
<!ELEMENT %maligngroup.qname; EMPTY>
<!-- end of mathml3-presentation.rng -->
<!-- start of mathml3-common.rng -->
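
For anyone who wants this list in a form a fix-up script can consume, here is a rough Python equivalent of the egrep above. It assumes the declarations look exactly like the lines shown (`<!ELEMENT %name.qname; EMPTY>`); `empty_element_names` is a hypothetical helper name.

```python
import re

# Matches declarations of the form: <!ELEMENT %mspace.qname; EMPTY>
_EMPTY_DECL = re.compile(r"<!ELEMENT\s+%([\w.-]+)\.qname;\s+EMPTY\s*>")


def empty_element_names(dtd_text):
    """Return the names of all elements declared EMPTY, in document order."""
    return _EMPTY_DECL.findall(dtd_text)


if __name__ == "__main__":
    # Assumes a local copy of the DTD, as in the shell session above.
    with open("mathml3.dtd") as dtd_file:
        for name in empty_element_names(dtd_file.read()):
            print(name)
```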

Comment 4

7 years ago
Explorations with html5lib and feedparser:

http://intertwingly.net/blog/2010/12/30/Dealing-with-HTML-in-Feeds
(Reporter)

Comment 5

7 years ago
(In reply to comment #2)
> Thanks, for the update. I think my Atom feed was originally encoded as xhtml
> but Google didn't like that and was not able to index my posts. I'll see later
> if I can do something to automatically transform empty tags.

I've gone back to the "clean" way (i.e. encoding the Atom content as xhtml) and the problem is gone. I now send an RSS feed to Google instead of the Atom feed, to prevent indexing issues.

> 
> BTW, there are also problems with a recent post from Daniel Glazman but it
> seems to be a different issue (SVG elements are closed and HTML mimetype is
> used):
> http://www.glazman.org/weblog/dotclear/index.php?post/2010/12/28/The-current-CSS-Gradients-mess

Of course I was wrong here: it is exactly the same issue, with SVG instead of MathML.

(In reply to comment #4)
> Explorations with html5lib and feedparser:
> 
> http://intertwingly.net/blog/2010/12/30/Dealing-with-HTML-in-Feeds

It is certainly a good idea to move to html5lib, since more and more people are going to use this technology and are likely to run into the same problem with MathML/SVG... Good luck!