Refactor the MathML Operator Dictionary

NEW
Assigned to

Status

()

defect
P5
normal
2 years ago
3 months ago

People

(Reporter: fredw, Assigned: tom.overlund, Mentored)

Tracking

(Blocks 2 bugs, {helpwanted})

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 affected)

Details

(Whiteboard: [good second bug])

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
layout/mathml/nsMathMLOperators.cpp could be rewritten a bit:

* Use a hardcoded static table instead of using XPCOM to parse mathfont.properties into memory.
* Use a sorted table instead of a hash table. See mfbt/BinarySearch.h for a helper search function.
* Minimize the size of an entry (OperatorData). For example, use bitfields for the rspace/space, flags or form.

See https://trac.webkit.org/browser/trunk/Source/WebCore/mathml/MathMLOperatorDictionary.cpp#L35 for the implementation in WebKit.

Comment 1

a year ago
I am looking for a first bug. Too bad it looks like all those changes have already been made to the file, right?
(Reporter)

Comment 2

a year ago
@Amir: Thanks I think that would be helpful. AFAIK, nothing has been done in Mozilla (note that the file I mention in comment 0 is from WebKit). The relevant files in Gecko are:


https://dxr.mozilla.org/mozilla-central/source/layout/mathml/mathfonts.properties (dictionary itself)
https://dxr.mozilla.org/mozilla-central/source/layout/mathml/nsMathMLOperators.cpp (class that loads and exposes the dictoinary)
Yeah.

The first link is broken, it should be https://searchfox.org/mozilla-central/source/layout/mathml/mathfont.properties.

Looks like that code could indeed get some love. If Fred doesn't have the time, I'm happy to mentor / review patches :).

Comment 4

a year ago
Thank you. I am working on this. I am trying to resolve all my own questions, but I'll let you know if I really need help.

Comment 5

a year ago
Okay, I need help.
If I am to hard code a sorted table of operators.
That means I will discard all of the functions dealing with constructing the hashtable?
But, I could use them in a separate program to pre-process the operators and generate the code for me.

So is mathfont.properties considered to be in sorted order?
It seems that the keys inserted in the hashtable are the hexadecimal unicode numbers and a digit for the form appended on the end. The only thing that doesn't make sense is how will binary search sort through this? 

For example:
operator.\u0021.postfix = lspace:1 rspace:0 # !
operator.\u0021\u0021.postfix = lspace:1 rspace:0 # !!
operator.\u0021\u003D.infix = lspace:4 rspace:4 # !=
operator.\u0025.infix = lspace:3 rspace:3 # percent sign

The keys would be:
00213
002100213
0021003D1
00251

So, the keys aren't sorted by value. I think I would just have to design my own complicated binary search because of these concatenated operators that appear at most 3 consecutively:
operator.\u002D\u002D.postfix = lspace:0 rspace:0 # --
operator.\u002D\u003D.infix = lspace:4 rspace:4 # -=
operator.\u002D\u003E.infix = lspace:5 rspace:5 # ->

I am guessing since we want this hard coded mathfont.properties is not likely to change? Which allows me to design the search with the above observation. The Webkit implementation does not include concatenated operators in their table. I am not sure if we can do this too.

I am not sure how to proceed. Hopefully I can get some explanation for each of my above questions.

Comment 6

a year ago
Oh, I see now why we have concatenated operators, we are supporting programming operators too.
(In reply to amir rasouli from comment #5)
> Okay, I need help.
> If I am to hard code a sorted table of operators.
> That means I will discard all of the functions dealing with constructing the
> hashtable?
> But, I could use them in a separate program to pre-process the operators and
> generate the code for me.
> 
> So is mathfont.properties considered to be in sorted order?

Yes, or that's what the comment there claims at least.

> It seems that the keys inserted in the hashtable are the hexadecimal unicode
> numbers and a digit for the form appended on the end. The only thing that
> doesn't make sense is how will binary search sort through this? 

So IIUC (Fred can correct me), the keys are a pair indeed (Operator, Form).

There are only three possible forms (infix, prefix, postfix), so as long as the array is sorted by digit / operator, you can binary search that and then look at the contiguous entries to check the form.

Furthermore, you can keep the list sorted first first by operator, second by form, and then you get BinarySearch to do the job for you I'd think.

Fred, is that right?
Flags: needinfo?(fred.wang)

Comment 8

a year ago
That all makes sense, I suppose the concatenated operators would also have to be searched linearly. Essentially just use the first unicode point as a search key and check up and down from there.
(Reporter)

Comment 9

a year ago
Yes, (Operator, Form) would be the key of the sorted table. In general, "Operator" is a Unicode character but it could indeed be a surrogate pairs (bug 1246657) or a longer string (but I think made of at most two characters). In any case, you can use lexical ordering on the character/string. "Form" takes only three values, so you can just pick one arbitrary order e.g. prefix, infix, postfix.

Some remarks:
* I think multiple characters are exceptional and rarely used, maybe you can just have a second sorted table to handle them. WebKit does not support such entries yet IIRC.
* If an "Operator" has several "Forms", it's nice to make the corresponding entries consecutive so you can quickly access all of them. See MathMLOperatorDictionary::search in WebKit. IIRC, in Mozilla we need that when retrieving the direction of an operator.
* I would suggest you write a program to generate the new format from the old format. FYI, there is an old Perl script updateOperatorDictionary.pl in layout/mathml/ that updates the content of mathfont.properties using the info from the "XML Entity Definitions for Characters" spec.
Flags: needinfo?(fred.wang)

Comment 10

a year ago
I finished all the changes, built it (it compiles without any warnings associated with my changes), ran Firefox, and looked at the MathML torture test. The tab crashes. I built Firefox with debug flag, and tried to run --debug, but it doesn't work. Also, the gdb command isn't recognized in the Mozilla shell. I tried to debug from a gdb shell, but it said it had no debugger flags. When I ran it, it was giving me information this time as it was running as opposed to before when I would run it without the debug build. The information doesn't make any sense to me. I don't really think that was the debugger, but I am not sure.

I wish I could debug this in VS, but I couldn't figure out how. I replicated and isolated nsMathMLOperators.cpp in a separate project and tested it by looking up operators and it works fine, but in my separate project I used std::wstring, because I couldn't figure out how to use the NS libraries in my project. So, I just don't know what to do anymore to find the issue.

Here is the information that was outputted when the tab crashed:
https://github.com/arasouli91/GenerateOperatorTable/blob/master/mathML/crash%20notes.txt

Here is my changes to nsMathMLOperators.cpp (the only changes in any files): 
https://github.com/arasouli91/GenerateOperatorTable/blob/master/mathML/nsMathMLOperators.cpp
(Assignee)

Comment 11

4 months ago
Is there still interest in this? I reworked Amir's code and got it working. This would be a first bug for me.
Yes, please go ahead! :)
(Assignee)

Comment 13

4 months ago
I'm attaching the complete file of what I have now, so it doesn't get lost. It could use a bit of cleanup in spots. But I tested it with:

$ ./mach reftest layout/reftests/mathml/reftest.list

and did not get any new failures on my end.
Is there any chance you could follow https://moz-conduit.readthedocs.io/en/latest/phabricator-user.html to submit patches?

Or alternatively attaching the patch to bugzilla also works. Otherwise reviewing the patch is going to be pretty hard.

Thanks!
(Assignee)

Comment 15

4 months ago
Sure, no problem. I'll work on that tomorrow.
(Assignee)

Comment 16

3 months ago
Removed all the parsing code from nsMathMLOperators.cpp. Added a statically
initialized array in sorted order for binary searching, which replaces the
XPCOM hashtable, satisfying deCOMtamination requirements. Special thanks to
Amir, a new contributor whose work this is based off of, but was unfinished
and abandoned.
(Reporter)

Comment 17

3 months ago

Thanks for working on this Tom (and Amir)!

Assignee: nobody → tom.overlund
Mentor: emilio
You need to log in before you can comment on or make changes to this bug.