Open Bug 539506 (mathml-clipboard) Opened 15 years ago Updated 2 years ago

[MathML3] Implement clipboard specification

Categories

(Core :: DOM: Copy & Paste and Drag & Drop, enhancement)

enhancement

Tracking

()

People

(Reporter: fredw, Unassigned, Mentored)

References

(Depends on 3 open bugs, )

Details

(Keywords: dev-doc-needed, helpwanted, Whiteboard: [lang=c++])

Attachments

(8 files, 3 obsolete files)

Section 6.3 "Transferring MathML" of MathML 3 describes a recommended clipboard format to use when transferring MathML code. For interoperability reasons, Mozilla browser/editor should support copy-and-paste and drag-and-drop following this specification.

FYI, an extract from David Carlisle's "introduction to MathML":

"An early version of the clipboard specification is already implemented in MathPlayer MathML renderer for Microsoft® Internet Explorer®, and in Microsoft Word 2007, which allows Math fragments to be reliably cut from Internet Explorer and pasted into Word."

https://www.ibm.com/developerworks/xml/library/x-mathml3/
Alias: mathml-clipboard
I'd like to add that Mathematica also has support for copying/pasting MathML (which at least works with Word 2007).

http://blog.wolfram.com/2008/07/31/going-wordless-at-the-advanced-mathematica-summer-school/
Here is a demo for both Word and Mathematica:
http://www.screenr.com/dAO
I'm cc'ing Andrii, as this bug could perhaps fit into his Summer of Code project, in the larger context of "interactive MathML". I have an experimental add-on that adds support for "copy MathML". Here is the source code:

https://raw.github.com/fred-wang/Mathzilla/master/mathzilla/lib/copy-mathml.js

(You can uncomment the commented-out part; if I remember correctly, I commented it out because of a problem with Components.classes and JetPack.)
Just for the record, in very old notes to myself I list these files:
layout/base/nsDocumentViewer.cpp
content/base/src/nsCopySupport.cpp
cc'ing Jacques Distler. I tried to respond to his mail but my smtp server is blacklisted by his spam filter...
Setting this as a mentored bug, although I don't really have any knowledge of the code referred to in comment 4.

Specifically, Jacques Distler's proposal is contained in the fifth point of "Recommended Behaviors":

"When an application exports a MathML fragment whose only child of the root element is a semantics element, it SHOULD offer, after the above flavors, a transfer flavor for each annotation or annotation-xml element, provided the transfer flavor can be recognized and named based on the encoding attribute value, and provided the annotation key is (the default) alternate-representation. The transfer content for each annotation should contain the character data in the specified encoding (for an annotation element), or a well-formed XML fragment (for an annotation-xml element), or the data that results by requesting the URL given by the src attribute (for an annotation reference)."

The particularly interesting annotation is <annotation encoding="application/x-tex"> which is generated by LaTeXML and that Jacques Distler wants to add to itex2MML. I guess MathJax could add this in the native MathML output too.
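
As a rough illustration of the recommendation quoted above, here is a minimal standalone C++ sketch (it is not Gecko code). The `Annotation` and `TransferFlavor` structs and the `FlavorNameFor` table are hypothetical; in particular, the "TeX" flavor name for application/x-tex is an assumption made for the example. It offers one flavor per annotation whose encoding is recognized and whose key is the default alternate-representation.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one <annotation>/<annotation-xml> child of <semantics>.
struct Annotation {
  std::string encoding;  // value of the encoding attribute
  std::string key;       // annotation key; empty means the default key
  std::string content;   // character data or serialized XML fragment
};

// Hypothetical clipboard entry: a flavor name plus the data to transfer.
struct TransferFlavor {
  std::string name;
  std::string data;
};

// Assumed mapping from encoding values to flavor names (illustration only).
// Returns an empty string when the encoding is not recognized.
static std::string FlavorNameFor(const std::string& encoding) {
  if (encoding == "application/x-tex") return "TeX";
  if (encoding == "MathML-Presentation") return "MathML Presentation";
  if (encoding == "MathML-Content") return "MathML Content";
  return "";
}

// Offer a flavor for each annotation that has a recognized encoding
// and the default "alternate-representation" key.
std::vector<TransferFlavor> AnnotationFlavors(
    const std::vector<Annotation>& annotations) {
  std::vector<TransferFlavor> flavors;
  for (const Annotation& a : annotations) {
    const bool defaultKey =
        a.key.empty() || a.key == "alternate-representation";
    const std::string name = FlavorNameFor(a.encoding);
    if (defaultKey && !name.empty()) {
      flavors.push_back({name, a.content});
    }
  }
  return flavors;
}
```

Annotations with an unknown encoding or a non-default key are simply skipped, which matches the two "provided..." conditions in the quoted text.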

Conversely, it would be great if this application/x-tex flavor could be used in our editor (used in Thunderbird, BlueGriffon, SeaMonkey, etc.), but I guess that would require a TeX parser in Gecko or a call to an external parser. The video from comment 2 shows how it is used to paste into MathType or Wikipedia.
Whiteboard: [mentor=fredw][lang=c++]
(In reply to Frédéric Wang (:fredw) from comment #6)
> Conversely, I would be great if this application/x-tex flavor could be used
> in our editor (used in thunderbird, bluegriffon, seamonkey etc) but that
> would require a TeX parser in Gecko or a call to an external parser I guess.
> The video from comment 2 shows how it is used to paste in MathType or
> Wikipedia.

Perhaps nsIProcess would be helpful for calling one of the existing converters, such as Blahtex, itex2MML or LaTeXML.

https://developer.mozilla.org/en-US/docs/XPCOM_Interface_Reference/nsIProcess

Otherwise, we could use a JavaScript converter like MathJax, but I don't think we want to include such a converter in our code base.
Hmm. Sorry about the Spam filter. I shall have to smack it around.

As to incorporating itex2MML (or one of the alternatives) in your editor, I should point out that's exactly the course taken by AbiWord. Their equation editor uses itex2MML and gtkmathview.

* http://www.dedoimedo.com/computers/abiword.html
I've started to work on this bug (only MathML copy for the moment, not other flavors in <semantics> like LaTeX, etc.). It would be great to see whether that works with tools like Word, Mathematica or MathType. I've added a first patch to my Mercurial queue, so it should normally be integrated automatically into Bill's builds soon:

http://www.wg9s.com/mozilla/firefox/
Keywords: student-project
QA Contact: fred.wang
Whiteboard: [mentor=fredw][lang=c++]
Assignee: nobody → fred.wang
Keywords: dev-doc-needed
QA Contact: fred.wang
I haven't found anything on Linux to inspect the clipboard content, so I've written a little Firefox add-on to do so. Using Gecko's own clipboard API to test the patch is perhaps a bit biased, so it would still be useful if someone could test with other programs (Word, Mathematica, etc.).

This add-on adds two icons on the bottom right corner:
- the first button opens a test case page
- the second button pastes the content of different clipboard flavors (MathML, TeX, HTML and plain text) at the end of the current page.
I've just updated my patch queue to fix some issues in mtable and implement TeX copy in <semantics>. But copying TeX is not always successful at the moment, so I'll have to improve that.
Fixing some selection bugs would probably be useful if MathML copy is implemented, so I'm making this bug depend on several of them.
Depends on: 175850, 487587, 759462, 175845
(In reply to Frédéric Wang (:fredw) from comment #11)
> I've just updated my patch queue to fix some issues in mtable and implement
> TeX copy in <semantics>. But copying TeX is not always successful at the
> moment, so I'll have to improve that.

I get the following compiler errors under windows:

c:/Users/wag/mozilla/mozilla2/widget/windows/nsClipboard.cpp(46) : error C2039: 'CF_MATHML' : is not a member of 'nsClipboard'
        c:\users\wag\mozilla\mozilla2\widget\windows\nsClipboard.h(23) : see declaration of 'nsClipboard'
c:/Users/wag/mozilla/mozilla2/widget/windows/nsClipboard.cpp(47) : error C2039: 'CF_TEX' : is not a member of 'nsClipboard'
        c:\users\wag\mozilla\mozilla2\widget\windows\nsClipboard.h(23) : see declaration of 'nsClipboard'

Defining those made it compile for me.
(In reply to Bill Gianopoulos [:WG9s] from comment #13)
> (In reply to Frédéric Wang (:fredw) from comment #11)
> > I've just updated my patch queue to fix some issues in mtable and implement
> > TeX copy in <semantics>. But copying TeX is not always successful at the
> > moment, so I'll have to improve that.
> 
> I get the following compiler errors under windows:
> 
> c:/Users/wag/mozilla/mozilla2/widget/windows/nsClipboard.cpp(46) : error
> C2039: 'CF_MATHML' : is not a member of 'nsClipboard'
>         c:\users\wag\mozilla\mozilla2\widget\windows\nsClipboard.h(23) : see
> declaration of 'nsClipboard'
> c:/Users/wag/mozilla/mozilla2/widget/windows/nsClipboard.cpp(47) : error
> C2039: 'CF_TEX' : is not a member of 'nsClipboard'
>         c:\users\wag\mozilla\mozilla2\widget\windows\nsClipboard.h(23) : see
> declaration of 'nsClipboard'
> 
> Defining those made it compile for me.

OK, I forgot to define them in nsClipboard.h. Thanks for the info. I hope that with this change, the copy and paste to Word will now work.
Assuming what I did to fix this in nsClipboard.h is correct, there are builds that can be used for testing at http://www.wg9s.com/mozilla/firefox/

Unfortunately I do not have Office installed on my home computer.
Using today's WG9s build, I now see the "MathML" flavour on the clipboard, but if I paste into Word I just get character data: the markup is not recognised.

If I look at the clipboard flavour in a little C# program, I see the text below, where I wouldn't expect to see the \0. I don't really know much about the Windows clipboard, but it looks like a UTF-16/UTF-8 mismatch somewhere.


		mathMl	"<\0?\0x\0m\0l\0 \0v\0e\0r\0s\0i\0o\0n\0=\0\"\01\0.\00\0\"\0?\0>\0\n\0<\0m\0a\0t\0h\0 \0x\0m\0l\0n\0s\0=\0\"\0h\0t\0t\0p\0:\0/\0/\0w\0w\0w\0.\0w\03\0.\0o\0r\0g\0/\01\09\09\08\0/\0M\0a\0t\0h\0/\0M\0a\0t\0h\0M\0L\0\"\0>\0\n\0<\0m\0o\0>\0�\0<\0/\0m\0o\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0<\0m\0s\0q\0r\0t\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0r\0o\0w\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0s\0u\0p\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0i\0>\0b\0<\0/\0m\0i\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0n\0>\02\0<\0/\0m\0n\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0/\0m\0s\0u\0p\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0o\0>\0-\0<\0/\0m\0o\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0r\0o\0w\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0n\0>\04\0<\0/\0m\0n\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0o\0>\0b <\0/\0m\0o\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0i\0>\0a\0<\0/\0m\0i\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0o\0>\0b <\0/\0m\0o\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0m\0i\0>\0c\0<\0/\0m\0i\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0<\0/\0m\0r\0o\0w\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0 \0 \0<\0/\0m\0r\0o\0w\0>\0\r\0\n\0 \0 \0 \0 \0 \0 \0<\0/\0m\0s\0q\0r\0t\0>\0\n\0<\0/\0m\0a\0t\0h\0>\0\0\0"	string
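
A NUL after every ASCII character, as in the dump above, is what UTF-16LE text looks like when its bytes are read one at a time. The following standalone C++ sketch shows a crude heuristic for spotting that situation in a byte buffer; the function name `LooksLikeUtf16LE` and the "more than half" threshold are assumptions made for the example, not anything used by Gecko or Word.

```cpp
#include <cstddef>
#include <string>

// Heuristic: treat the buffer as 16-bit little-endian units and count how
// many units have a zero high byte. Mostly-ASCII markup stored as UTF-16LE
// has a zero high byte in nearly every unit.
bool LooksLikeUtf16LE(const std::string& bytes) {
  if (bytes.size() < 2 || bytes.size() % 2 != 0) {
    return false;  // too short, or not a whole number of 16-bit units
  }
  std::size_t zeroHighBytes = 0;
  for (std::size_t i = 1; i < bytes.size(); i += 2) {
    if (bytes[i] == '\0') {
      ++zeroHighBytes;
    }
  }
  const std::size_t units = bytes.size() / 2;
  return zeroHighBytes * 2 > units;  // more than half of the units
}
```

The threshold is deliberately loose because characters outside Latin-1 (such as the invisible operators in the dump) have a non-zero high byte.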
Good to know that the "MathML" flavor was used this time. I tried with MathType and Word and that seems to work, but Word uses the plain text flavor (I don't know about MathType).

In general, Gecko uses UTF-16 for its internal strings. The MathML REC indicates: 

"The instance MUST specify the character encoding, if it uses an encoding other than UTF-8, either in the XML declaration, or by the use of a byte-order mark (BOM) for UTF-16-encoded data."

I thought Gecko would add this BOM, but I don't see it in the output of your program. So to comply with the spec we should do one of the following:

1) convert the string to UTF-8

2) add a BOM

3) declare UTF-16 in the XML declaration
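
The first two options can be sketched in standalone C++ as follows. This is only an illustration using the standard library; Gecko has its own string classes and conversion helpers, which are not used here, and the function names `Utf16ToUtf8` and `WithBom` are made up for the example.

```cpp
#include <cstddef>
#include <string>

// Option 1: convert UTF-16 code units to UTF-8 bytes.
// Lone surrogates are replaced with U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
    const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
    if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 &&
        in[i + 1] <= 0xDFFF) {
      // Combine a valid surrogate pair into one code point.
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;
    } else if (isHigh || isLow) {
      cp = 0xFFFD;
    }
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// Option 2: keep UTF-16 and prepend the byte-order mark U+FEFF.
std::u16string WithBom(const std::u16string& in) {
  return std::u16string(1, u'\uFEFF') + in;
}
```

Option 3 only changes the text of the XML declaration, so it needs no conversion code.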
Sorry my builds were a bit late today because I had issues on my build system.  Windows builds including the BOM fix are now available on my site.
The BOM improved things in my C# clipboard viewer: the string encoding was automatically identified, so the MathML XML markup appears correctly. However, Word 2010 still doesn't recognise the MathML on paste. Comparing with MathPlayer and the Windows Math Input Panel, both of which do paste into Word successfully, the main difference I see is that both of them also have an explicit encoding="UTF-16" in the XML declaration. They both also use namespace-prefixed forms, but I suspect it is the encoding declaration that is making the difference. I could ask Murray what exactly Word is looking for...
(In reply to David Carlisle from comment #19)
> Comparing with
> MathPlayer and the Windows Math Input Panel, both of which do work to paste
> into Word the main difference I see is that both of them also have an
> explicit encoding="UTF-16" in the xml declaration.

OK, I'll update my patch in a moment to add the encoding.
OK.  New builds on my site with the new patch (unless I screwed up, which is entirely possible) are now available.
Sorry, no success with Word. I now see the encoding="UTF-16", and if I cut and paste the XML out of my clipboard viewer (presumably just as the Unicode text flavour, since it's a basic textbox) then it does display as formatted mathematics in Word. But if I paste directly from the clipboard after copying the expression in Firefox, Word just shows the character data. I don't know what to suggest; I could ask Murray what exactly Word is looking for on the clipboard, or someone else who understands Word could try to debug.
(In reply to David Carlisle from comment #22)
> I could ask Murray what exactly Word is looking
> for on the clipboard, or someone else who understands Word could try to
> debug.

Yes, please ask Murray. Note that Firefox transmits MathML, HTML and plain text flavors. The other flavors are probably confusing Word.

Does anyone have access to other tools? I tried with MathType and it worked (although I don't know whether it really used the MathML flavor or the Unicode one).

I can probably modify my patch to temporarily remove the HTML and plain text flavors.
(In reply to Frédéric Wang (:fredw) from comment #23)
> I can probably modify my patch to temporarily remove the HTML and plain text
> flavors.

Done. Let's see if Word understands the MathML better and if MathType did not cheat by using the plain text flavor.
nsCopySupport and nsDocumentEncoder use essentially the same code to determine whether we are in a plaintext context. I need to modify this code to determine whether we are in a MathML context, but I feel a bit bad about adding even more code duplication. So let's consider a preliminary patch to avoid this issue, so that I can continue the work on this bug.

nsCopySupport::IsPlainTextContext is public but is only used in nsCopySupport::SelectionCopyHelper. Moreover it does (almost) a subset of the work of nsHTMLCopyEncoder::SetSelection and this latter function is called after nsCopySupport::IsPlainTextContext. By switching the order of function calls, I think we can just retrieve the boolean result stored in nsHTMLCopyEncoder::mIsTextWidget. I propose to add a GetContextParameters member to the nsIDocumentEncoder interface in order to do this task (I can probably make mIsTextWidget a read-only boolean attribute if you prefer). Note the 's' at the end of GetContextParameters: I plan to add another parameter for MathML.

I say "almost" above, because nsCopySupport::IsPlainTextContext uses an additional code fragment to skip non-HTML tags. I think it is the right thing because the conditions after this fragment don't check the namespace. Perhaps it is also a bit faster. This change is not a problem for input/textarea because they can not contain non-HTML elements anyway (I think?) and regarding body, we really just want to reach this element and don't care about what it contains.
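
To make the ancestor walk described above concrete, here is a toy standalone C++ sketch. It does not use Gecko types: the `Node` struct, its fields, and the `bodyIsPlainText` flag (a stand-in for whatever check is performed once the body is reached) are all hypothetical. The point is only the shape of the loop: skip non-HTML elements, stop at input/textarea or at body.

```cpp
#include <string>

// Hypothetical toy node; Gecko's real DOM types are not used here.
struct Node {
  std::string localName;
  bool isHTML;           // false for MathML, SVG, ...
  bool bodyIsPlainText;  // assumed flag, only meaningful on <body>
  const Node* parent;
};

// Walk up from the selection's common ancestor.
bool IsPlainTextContext(const Node* commonAncestor) {
  for (const Node* n = commonAncestor; n; n = n->parent) {
    if (!n->isHTML) {
      continue;  // skip non-HTML tags: the name checks below assume HTML
    }
    if (n->localName == "input" || n->localName == "textarea") {
      return true;  // text widgets are always a plain text context
    }
    if (n->localName == "body") {
      return n->bodyIsPlainText;  // reached the body: decide and stop
    }
  }
  return false;
}
```

Skipping non-HTML elements first means a MathML or SVG element that happens to be called "input" can never be mistaken for a text widget.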
Attachment #678409 - Flags: review?(ehsan)
Comment on attachment 678409 [details] [diff] [review]
Avoid code duplication between nsCopySupport and nsDocumentEncoder - V1

Review of attachment 678409 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/base/public/nsIDocumentEncoder.idl
@@ +327,5 @@
> +  /**
> +   * Determine whether the selection is inside a special context.
> +   * @param aIsPlainTextContext whether the selection is in a plain text context
> +   */
> +  void getContextParameters(out boolean aIsPlainTextContext);

You need to update the uuid of this interface.
Attachment #678409 - Flags: review?(ehsan) → review+
(In reply to Ehsan Akhgari [:ehsan] from comment #26)
> @@ +327,5 @@
> > +  /**
> > +   * Determine whether the selection is inside a special context.
> > +   * @param aIsPlainTextContext whether the selection is in a plain text context
> > +   */
> > +  void getContextParameters(out boolean aIsPlainTextContext);
> 
> You need to update the uuid of this interface.

Thanks. The next part will be to add an aMathMLContext parameter, so I'll update the interface in this second patch.
(In reply to comment #27)
> (In reply to Ehsan Akhgari [:ehsan] from comment #26)
> > @@ +327,5 @@
> > > +  /**
> > > +   * Determine whether the selection is inside a special context.
> > > +   * @param aIsPlainTextContext whether the selection is in a plain text context
> > > +   */
> > > +  void getContextParameters(out boolean aIsPlainTextContext);
> > 
> > You need to update the uuid of this interface.
> 
> Thanks. The next part will be to add a aMathMLContext parameter, so I'll update
> the interface in this second patch.

It's considered good practice to rev the uuid in every patch, since that will protect you in the future if, for example, one of the two patches gets backed out for some reason.
OK, here is a version with the uuid updated.
Attachment #678409 - Attachment is obsolete: true
(In reply to Frédéric Wang (:fredw) from comment #23)
> (In reply to David Carlisle from comment #22)
> > I could ask Murray what exactly Word is looking
> > for on the clipboard, or someone else who understands Word could try to
> > debug.
> 
> Yes, please ask Murray. Note that Firefox transmit MathML, HTML and plain
> text flavors. Probably the other flavors are disturbing Word.
> 

IIUC, Murray's quick guess about the problem is that Word expects UTF-8, not UTF-16. Apparently, on Windows, nsDataObj::GetText converts CF_TEXT and CF_HTML to UTF-8. So perhaps we should do so for CF_MATHML and CF_TEX too.
So unless I'm missing something, implementing the paste operation for MathML was pretty easy: it was just a matter of telling the editor to recognize the MathML MIME types (I hope the names of the flavors are already recognized by Windows thanks to the other patch for the copy operation). I've just updated my patch queue, so the patch won't be included in Bill's builds immediately. Also, you need an editor to test the paste operation, so I'm attaching a basic add-on to load the editor in Firefox. Once installed, the editor should be available at chrome://editor/content/editor.xul. Note that the DOM Inspector can be used to copy the HTML source code generated by the editor.

The HTML paste operation is already implemented in Gecko and, because MathML can be included in HTML, the editor already allows pasting MathML (so you can insert math in Thunderbird, see bug 593721 comment 6). If you really want to test my patch, you should either use an application that exports the MathML flavor without the HTML one (the Windows Math Input Panel, for example) or use the patch that is still in my patch queue to make the copy MathML operation export the MathML flavor only.
(In reply to Frédéric Wang (:fredw) from comment #31)
> Created attachment 680345 [details]

> other patch for the copy operation). I've just updated my patch queue, so
> the patch won't be included in Bill's builds immediately. Also, you need an

Actually you updated the queue just before my build got to the point where it downloads your queue.

Therefore, this patch will be included in the builds currently in progress for today.
(In reply to Frédéric Wang (:fredw) from comment #31)
> So unless I miss something, implementing the paste operation for MathML was
> pretty easy

OK, I know what I missed: when pasting MathML into MathML, I have to remove the <math> root from the imported MathML tree before pasting it and do more processing, like splitting the token elements, adding <mrow>s, etc. However, pasting into MathML, or cutting/deleting/copying a MathML subtree, may currently result in invalid markup (even without my patches), so I'd better handle all these cases in a separate bug...

(In reply to Bill Gianopoulos [:WG9s] from comment #32)
> Actually you updated the queue just before my build got to the point where
> it downloads your queue.
> 

OK, great! Good timing :-)
Attachment #678457 - Attachment description: Avoid code duplication between nsCopySupport and nsDocumentEncoder - V2 → Part 1: Avoid code duplication between nsCopySupport and nsDocumentEncoder - V2
I have two questions:

- Is there a way to parse an XML document from a string? The nsContentUtils::ParseFragmentHTML currently used by the editor code seems to do that, but I'm wondering if I should instead use XML for MathML (see point 6 below). nsContentUtils::ParseFragmentXML fails for me when the XML declaration is present.

- Can someone with knowledge of the clipboard code in widget/ help? I'm wondering how to register these "Universal Type Identifiers" for Mac. In general, I'm wondering if I need to add more code to claim support for MathML copy or paste transfer. At the moment, I only added this for Windows: attachment 680462 [details] [diff] [review]. See also points 5 and 6 below.

Maybe I should also give a status update on this bug:

1) When you select a MathML subexpression, the "copy" command will put additional MathML and TeX flavors into the clipboard (see 2). However, it is hard to select the whole <math> element with a mouse (often you select adjacent text nodes or elements). Adding a "copy formula" item to the context menu would probably help, but that would mean touching code outside Gecko, so I'd prefer to do that in a separate bug to avoid delaying the integration of these patches. Only single-range selections are considered for MathML, which I think is reasonable (the comments in the code seem to assume it is for table selection only, even if that works in more general cases).

2) The copy command for MathML will always put MathML & text unicode flavors representing the selected subexpression into the clipboard, as described in the MathML REC. The HTML-related flavors for this subexpression are still added to the clipboard (but the HTML context & info flavors will be empty). When your MathML selection has a <semantics> ancestor, the flavors for presentation MathML, content MathML and TeX are added to the clipboard, if found.

3) Inside the HTML editor, you can paste "MathML" and "MathML Presentation" flavors. Currently, I don't try to sniff the Unicode text flavor to see if it contains MathML and I don't use the TeX flavor either (that would require a TeX parser). I don't plan to include any of these features in this bug's work. I didn't add anything for the plaintext editor either. Perhaps it would be useful to have the TeX flavor pasted, e.g. when you edit a page in Wikipedia or other web forms. But I guess that would require an interface to let the user select among multiple flavors (here TeX and Unicode text). BTW, there is bug 73286 for SeaMonkey.

4) When you cut/copy/paste/delete a MathML subtree you may get invalid markup. This is already the case without my patch. The HTML editor doesn't seem to have been designed with mixing HTML & MathML (or SVG) in mind, so I'd better postpone this work to a separate bug too and focus on the main goal of this bug: allowing interaction with other applications (points 5 and 6).

5) Currently, I'm able to do copy & paste from Gecko to Gecko and to transfer data to David Carlisle's C# program (http://monet.nag.co.uk/~dpc/mmlclipboard.exe ; http://code.google.com/p/web-xslt/source/browse/trunk/mmlclipboard). Transfer from Gecko to MathType seems to work (only MathML Presentation and Unicode text with the MathType version I have access to). Transfer from Gecko to Word works via Unicode text only. David and I have tried many combinations but we have not been able to figure out how to make it work with a MathML flavor. Transfer from Gecko to other applications like Mathematica or Maple will probably work via Unicode text too. But I don't know if these programs support the MathML flavor and I haven't tested them. I have essentially done experiments on Windows, as I don't have Mac/Linux programs to communicate with besides MathType on Mac.

6) I'm able to transfer from David Carlisle's C# program to Gecko. Transfer from Word/MathPlayer/Windows Math Panel to Gecko does not seem to work at all. I think there are two issues: first, the MathML REC says that the MathML flavor must contain an XML document, but I use the HTML parser, and these three programs use namespace prefixes; second, I'm not sure that the platform-specific stuff I added in widget/ is enough.
(In reply to Frédéric Wang (:fredw) from comment #38)
> I have two questions:
> 
> - Is there a way to parse an XML document from a string? 

DOMParser is your friend there:

https://developer.mozilla.org/en-US/docs/DOM/DOMParser


> 
> 5) Currently, I'm able to do copy & paste from Gecko to Gecko and to
> transfer data to David Carlisle's C# program
> (http://monet.nag.co.uk/~dpc/mmlclipboard.exe ;
> http://code.google.com/p/web-xslt/source/browse/trunk/mmlclipboard).
> Transfer from Gecko to MathType seems to work (only MathML Presentation and
> unicode text with the MathType version I have access to). Transfer from
> Gecko to Word works via unicode text only. David and I have tried many
> combinations but we have not been able to figure out how to make it work
> with a MathML flavor. Transfer from Gecko to other applications like
> Mathematica or Mapple may probably work via unicode text too. But I don't
> know if these programs support the MathML flavor and I haven't tested them.
> I have essentially done experiments on Windows as I don't have Mac/Linux
> programs to communicate with besides MathType on Mac.
> 
> 6) I'm able to transfer from David Carlisle's C# program to Gecko. Transfer
> from Word/MathPlayer/Windows Math Panel to Gecko does not seem to work at
> all. I think there are two issues: the MathML REC says that the MathML
> flavor must contain an XML document but I use the HTML parser and these
> three programs use namespace prefixes + I'm not sure that the
> platform-specific stuff I added in widget/ are enough.

I'm fairly sure that the XML namespace prefixing is not relevant. What does seem to make a difference, though, is the type: if you send a string as the MathML flavor, Word never shows anything, probably because it passes it through the wrong encoding or null-at-end paths. What the C# test program you mentioned does for both cut and paste of the MathML flavour is to put an in-memory stream on the clipboard with a BOM at the front and no null at the end. If you paste that in, Word seems happy. So in my C# clipboard viewer I can see code that has been copied from Gecko; pasting it into Word fails, but if I just get the viewer to re-paste the same string as a MemoryStream with flavor MathML, then Word picks it up and formats it as a math zone.
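
For readers who want to see what "a BOM at the front and no null at the end" means at the byte level, here is a minimal standalone C++ sketch. It only builds the byte buffer; how such a buffer would be handed to the Windows clipboard is not shown, and the function name `Utf16LEStreamBytes` is made up for the example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Serialize UTF-16 text as little-endian bytes, prefixed with the
// UTF-16LE byte-order mark (FF FE) and with no terminating NUL unit.
std::vector<std::uint8_t> Utf16LEStreamBytes(const std::u16string& text) {
  std::vector<std::uint8_t> bytes = {0xFF, 0xFE};  // BOM at the front
  bytes.reserve(2 + 2 * text.size());
  for (char16_t unit : text) {
    bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));  // low byte
    bytes.push_back(static_cast<std::uint8_t>(unit >> 8));    // high byte
  }
  return bytes;  // deliberately no trailing 00 00
}
```

The resulting length is exactly two bytes for the BOM plus two bytes per code unit, so a consumer that reads the whole stream never sees a terminator.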
(In reply to David Carlisle from comment #39)
> DomParser is your friend there:
> 
> https://developer.mozilla.org/en-US/docs/DOM/DOMParser
> 

Thanks David, but I was really asking for a function from Gecko's C++ source. I'm not really willing to use XPCOM to call nsIDOMParser ;-)

> 
> I'm fairly sure that the xml namespace prefixing is not relevant

I agree for Gecko to * transfer, but I was really considering * to Gecko transfer. Word, MathPlayer & Windows Math Panel use XML namespace prefixing, and when I send their output to Gecko, it does not seem to work. It doesn't work either when I use your C# program to transfer this code with namespace prefixes to Gecko, so I guess nsContentUtils::ParseFragmentHTML is not really happy with that... hence my question about whether there is another function to parse an XML document.

(However, I seem to remember that when I hardcoded the string in Gecko's source it worked, but now I'm no longer sure which parser I used.)
(In reply to Frédéric Wang (:fredw) from comment #38)
> - Is there a way to parse an XML document from a string? The
> ContentUtils::ParseFragmentHTML currently used by the editor code seems to
> do that, but I'm wondering if I should instead use XML for MathML (see point
> 6 below).

Looking at the MathML spec for the clipboard flavor, yes.

> ContentUtils::ParseFragmentXML fails for me when the XML
> declaration is present.

You need nsContentUtils::ParseDocumentXML, but that method doesn’t exist yet, because no one has needed such a Gecko internal API so far. I guess now is the time to introduce it.

You need to set up an nsParser with an nsXMLContentSink, make sure everything is initialized in a suitable way and then call
nsresult
nsParser::Parse(const nsAString& aSourceBuffer,
                void* aKey,
                bool aLastCall).

Pass nullptr for aKey and true as aLastCall.

Unfortunately, if you are going to hack nsParser, you are going to have a bad time. Sorry that I haven’t gotten around to obsoleting nsParser, yet.

> 6) I'm able to transfer from David Carlisle's C# program to Gecko. Transfer
> from Word/MathPlayer/Windows Math Panel to Gecko does not seem to work at
> all. I think there are two issues: the MathML REC says that the MathML
> flavor must contain an XML document but I use the HTML parser and these
> three programs use namespace prefixes + I'm not sure that the
> platform-specific stuff I added in widget/ are enough.

Please don’t use the HTML parser for this.
(In reply to Henri Sivonen (:hsivonen) from comment #41)
> You need nsContentUtils::ParseDocumentXML, but that method doesn’t exist
> yet, because no one has needed such a Gecko internal API so far. I guess now
> is the time to introduce it.

Thanks, that's what I feared :-(

> You need to set up an nsParser with an nsXMLContentSink

Quick look: NS_NewXMLContentSink wants arguments nsIDocument* aDoc, nsIURI* aURI, nsISupports* aContainer and nsIChannel* aChannel. I guess aDoc is the target document as in other nsContentUtils::Parse* methods, but what should I use for the others?
(In reply to David Carlisle from comment #39)
> I'm fairly sure that the xml namespace prefixing is not relevant, what does
> seem to make a difference though is the type, if you sent a string as MathML
> flavor Word never shows anything, probably because it passes it through the
> wring encoding or null-at-end paths. What the c# test program you mentioned
> does for both cut and paste of MathML flavour is to put an in memory stream
> on the clipboard with a BOM at the front and no null at the end. If you
> paste in that for Word sees happy.
Gecko currently has no support for copying anything except TYMED_HGLOBAL, but all the code samples (except yours) I found assume that the source is the TYMED_HSTREAM that the Math Input Panel provides...
(In reply to neil@parkwaycc.co.uk from comment #43)

> Gecko currently has no support for copying anything except TYMED_HGLOBAL,
> but all the code samples (except yours) I found assume that the source is
> the TYMED_HSTREAM that the Math Input Panel provides...

Don't take my code as authoritative in any way (I'm _way_ out of my depth here :-). It was a mixture of information found somewhere on the internet and trial and error. The .NET MemoryStream type used there works for identifying the MathML flavor as put on the clipboard by the input panel, MathPlayer and Word. It also works for pasting into Word, but I have no idea what .NET/COM translation is going on behind the scenes, so I'm not sure how relevant it is to the C++ coding under discussion. If it is useful that's fine, but if it is irrelevant then that's OK too :-)
(In reply to Frédéric Wang (:fredw) from comment #42)
> Quick look: NS_NewXMLContentSink wants arguments nsIDocument* aDoc, nsIURI*
> aURI, nsISupports* aContainer and nsIChannel* aChannel. I guess aDoc is the
> target document as in other nsContentUtils::Parse* methods, but what should
> I use for the others?

You can do

  nsCOMPtr<nsIURI> uri;
  NS_NewURI(getter_AddRefs(uri), "about:blank");
  nsCOMPtr<nsIPrincipal> principal =
    do_CreateInstance(NS_NULLPRINCIPAL_CONTRACTID);
  nsCOMPtr<nsIDOMDocument> domDocument;
  nsresult rv = nsContentUtils::CreateDocument(EmptyString(),
                                               EmptyString(),
                                               nullptr,
                                               uri,
                                               uri,
                                               principal,
                                               nullptr,
                                               DocumentFlavorLegacyGuess,
                                               getter_AddRefs(domDocument));

And then
 * pass domDocument QIed to nsIDocument as aDoc
 * pass uri as aURI
 * pass nullptr as aContainer
 * pass nullptr as aChannel

However, instead of using NS_NewXMLContentSink as-is, it seems to me you will need to add a new way to instantiate nsXMLContentSink in such a way that mRunsToCompletion gets set to true in the constructor of nsXMLContentSink (before nsXMLContentSink::Init() is called).

You might even get away with not calling nsXMLContentSink::Init() if you have some other mechanism to set mTargetDocument to your doc and mNodeInfoManager to its nodeinfomanager.
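The ordering point above (the flag must be set in the constructor, before Init() runs) can be illustrated with a minimal sketch. ToyDocument, ToySink and NewToySinkForBufferParsing are hypothetical stand-ins invented for this illustration; they are not Gecko classes and do not reflect the real nsXMLContentSink API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; not Gecko types.
struct ToyDocument {};

class ToySink {
public:
  // The flag is fixed at construction time, so Init() can already rely on it.
  explicit ToySink(bool aRunsToCompletion)
    : mRunsToCompletion(aRunsToCompletion) {}

  void Init(ToyDocument* aDoc) {
    assert(aDoc && "Init() needs a target document");
    mTargetDocument = aDoc;
    // Record what Init() observed, to make the ordering visible.
    mFlagSeenByInit = mRunsToCompletion;
  }

  bool RunsToCompletion() const { return mRunsToCompletion; }
  bool FlagSeenByInit() const { return mFlagSeenByInit; }
  ToyDocument* TargetDocument() const { return mTargetDocument; }

private:
  bool mRunsToCompletion;
  bool mFlagSeenByInit = false;
  ToyDocument* mTargetDocument = nullptr;
};

// A separate factory for the "parse a string to completion" case, so the
// existing factory keeps its current behaviour (assumed design, not the patch).
std::unique_ptr<ToySink> NewToySinkForBufferParsing(ToyDocument* aDoc) {
  std::unique_ptr<ToySink> sink(new ToySink(/* aRunsToCompletion = */ true));
  sink->Init(aDoc);
  return sink;
}
```

A dedicated factory keeps the constructor argument out of every existing call site, which is presumably why a new instantiation path is suggested rather than a setter called after Init().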
Experimental patch (not tested).
(In reply to Henri Sivonen (:hsivonen) from comment #41)
> You need to set up an nsParser with an nsXMLContentSink, make sure
> everything is initialized in a suitable way and then call
> nsresult
> nsParser::Parse(const nsAString& aSourceBuffer,
>                 void* aKey,
>                 bool aLastCall).
> 
> Pass nullptr for aKey and true as aLastCall.
> 
> Unfortunately, if you are going to hack nsParser, you are going to have a
> bad time. Sorry that I haven’t gotten around to obsoleting nsParser, yet.
> 

nsContentUtils.cpp uses the nsIParser interface, but

nsParser::Parse(const nsAString& aSourceBuffer,
                void* aKey,
                bool aLastCall)

is private. Are you suggesting that I add this method to nsIParser?
Attachment #677945 - Attachment description: Testcase Add-on → Add-on to test MathML copy and analyse the clipboard
(In reply to Frédéric Wang (:fredw) from comment #47)
> nsContentUtils.cpp uses the nsIParser interface, but
> 
> nsParser::Parse(const nsAString& aSourceBuffer,
>                 void* aKey,
>                 bool aLastCall)
> 
> is private. Are you suggesting that I add this method to nsIParser?

Yes. :-(
Comment on attachment 680824 [details] [diff] [review]
Add a method nsContentUtils::ParseDocumentXML

>+  static nsresult ParseDocumentXML(const nsAString& aSourceBuffer,
>+                                   nsIDocument* aTargetDocument,
>+                                   bool aScriptingEnabledForNoscriptParsing);

aScriptingEnabledForNoscriptParsing doesn’t really make sense in the context of XML, since XML parsing of <noscript> does not vary.

>+    NS_NewXMLContentSinkFromString(&sXMLContentSink, aTargetDocument);
...
>+  sXMLFragmentSink->SetTargetDocument(aTargetDocument);

Creating sXMLContentSink but using sXMLFragmentSink.
Attachment #680463 - Attachment description: Part 5: Implement MathML paste and drop operations for the HTML editor - V1 → Part 6: Implement MathML paste and drop operations for the HTML editor - V1
Attachment #680463 - Attachment filename: mathml-clipboard-5.diff → mathml-clipboard-6.diff
New patch. I tried it, but I hit the assert in nsContentSink::DidBuildModelImpl:

    MOZ_ASSERT(mDocument->GetReadyStateEnum() ==
               nsIDocument::READYSTATE_LOADING, "Bad readyState");

I'm sure I'm doing something wrong in the parser initialization...
Attachment #680824 - Attachment is obsolete: true
Attachment #681895 - Flags: feedback?(hsivonen)
Comment on attachment 681895 [details] [diff] [review]
Part 5: Add a method nsContentUtils::ParseDocumentXML

>+NS_NewXMLContentSinkFromString(nsIXMLContentSink** aInstancePtrResult,
>+                               nsIDocument* aDoc);

“FromString”?

(In reply to Frédéric Wang (:fredw) from comment #50)
> Created attachment 681895 [details] [diff] [review]
> Part 5: Add a method nsContentUtils::ParseDocumentXML
> 
> New patch. I tried it, but I hit the assert in
> nsContentSink::DidBuildModelImpl:
> 
>     MOZ_ASSERT(mDocument->GetReadyStateEnum() ==
>                nsIDocument::READYSTATE_LOADING, "Bad readyState");
> 
> I'm sure I'm doing something wrong in the parser initialization...

You need to make sure the doc transitions to READYSTATE_LOADING before you call nsIParser::Parse(). Typically this is done by calling StartDocumentLoad on the document with kLoadAsData as the first argument.
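As a rough model of why the assertion fires, here is a self-contained sketch of the ready-state precondition. ToyDocument, ToyParse and StartLoad are hypothetical names made up for this example; StartLoad only plays the role that StartDocumentLoad with kLoadAsData plays in the advice above, and none of this is actual Gecko code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for illustration only; not Gecko types.
enum class ReadyState { Uninitialized, Loading, Complete };

class ToyDocument {
public:
  ReadyState GetReadyState() const { return mReadyState; }

  // Plays the role of StartDocumentLoad(kLoadAsData, ...) in this model.
  void StartLoad() { mReadyState = ReadyState::Loading; }

  // Plays the role of the sink finishing the model build.
  void FinishLoad() {
    assert(mReadyState == ReadyState::Loading && "Bad readyState");
    mReadyState = ReadyState::Complete;
  }

  std::string mContent;

private:
  ReadyState mReadyState = ReadyState::Uninitialized;
};

// Refuses to parse unless the document is already in the Loading state.
// The real code asserts instead; returning false keeps the sketch testable.
bool ToyParse(ToyDocument& aDoc, const std::string& aSource) {
  if (aDoc.GetReadyState() != ReadyState::Loading) {
    return false;
  }
  aDoc.mContent = aSource;
  aDoc.FinishLoad();
  return true;
}
```

In this model, calling ToyParse on a freshly created document fails, while calling StartLoad() first lets the parse finish and leaves the document in the Complete state.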
Attachment #681895 - Flags: feedback?(hsivonen) → feedback+
Assignee: fred.wang → nobody
Whiteboard: [mentor=fredw][lang=c++]
Attachment #681895 - Attachment is obsolete: true
Since I'm unlikely to work on these patches again soon, I've created an add-on for Firefox that implements copying MathML and TeX: https://addons.mozilla.org/en-US/firefox/addon/mathml-copy/
Mentor: fred.wang
Whiteboard: [mentor=fredw][lang=c++] → [lang=c++]
I would also like Firefox to fully implement the Clipboard API specification. I contacted the GitHub developers, since their text editor supports pasting images from the clipboard in principle. However, they need the paste processing model implemented according to http://www.w3.org/TR/clipboard-apis/#processing-model.

Right now, Chrome is the only browser that allows them to process images pasted from the clipboard. As soon as other browsers add this support, it will work on github.com.
Note that this bug is for the *MathML3* clipboard specification.
I am sorry. I was already pointed to the correct bug report:

https://bugzilla.mozilla.org/show_bug.cgi?id=803014

Clipboard API improvements are tracked by bug 1619251. Would MathML these days be covered by copying XML/HTML? It's not immediately apparent why it would need its own infrastructure.

(In reply to Anne (:annevk) from comment #57)

> Clipboard API improvements are tracked by bug 1619251. Would MathML these
> days be covered by copying XML/HTML? It's not immediately apparent why it
> would need its own infrastructure.

Per comment 0, presumably we should make sure that what we export can be pasted into Microsoft Word.

Severity: normal → S3