Bug 1692771 Comment 5 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor, without manually marking it as a link (which is what most people do), we detect it within the HTML (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we just unambiguously mark which <a href> was artificially inserted.)

This used to work, but apparently broke recently.

I have 3 theories about what might have changed and broken this:
* The mail sending code in Thunderbird was rewritten and broke it (I would think that's the most likely)
* Something in Thunderbird is now "sanitizing" the sent HTML during the sending process and removing the attribute.
* A change in the HTML parser in Gecko.

If it's the second or third, then an exact regression range would help.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between `scanHTML()` and the HTML->plaintext converter.
WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor (which is what most people do), without manually marking it as a link, we detect the plaintext URL within the HTML text nodes (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we just unambiguously mark which <a href> was artificially inserted.)

This used to work, but apparently broke recently.

I have 3 theories about what might have changed and broken this:
* The mail sending code in Thunderbird was rewritten and broke it (I would think that's the most likely)
* Something in Thunderbird is now "sanitizing" the sent HTML during the sending process and removing the attribute.
* A change in the HTML parser in Gecko.

If it's the second or third, then an exact regression range would help.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between `scanHTML()` and the HTML->plaintext converter.
WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor (which is what most people do), without manually marking it as a link, we detect the plaintext URL within the HTML text nodes (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we unambiguously mark which <a href> was artificially inserted.)

This used to work, but apparently broke recently.

I have 3 theories about what might have changed and broken this:
* The mail sending code in Thunderbird was rewritten and broke it (I would think that's the most likely)
* Something in Thunderbird is now "sanitizing" the sent HTML during the sending process and removing the attribute.
* A change in the HTML parser in Gecko.

If it's the second or third, then an exact regression range would help.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between `scanHTML()` and the HTML->plaintext converter.
WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor (which is what most people do), without manually marking it as a link, we detect the plaintext URL within the HTML text nodes (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we unambiguously mark which <a href> was artificially inserted.)

This used to work, but apparently broke recently.

Probably, the mail sending code in Thunderbird was rewritten recently and broke it. I suspect that this broke simply during the recent C++ to JS rewrite of the mail send code.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between `scanHTML()` and the HTML->plaintext converter.
WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor (which is what most people do), without manually marking it as a link, we detect the plaintext URL within the HTML text nodes (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we unambiguously mark which <a href> was artificially inserted.)

This used to work, but apparently broke recently.

Probably, the mail sending code in Thunderbird was rewritten recently and broke it. I suspect that this broke simply during the recent C++ to JS rewrite of the mail send code.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between the call to `scanHTML()` and the HTML->plaintext converter.
WaltS48, thanks for the details, including specific reproductions where it works and where it does not work in comment 3, and for the sample message in comment 1.

When you just type or paste a URL as plaintext in the HTML editor (which is what most people do), without manually marking it as a link, we detect the plaintext URL within the HTML text nodes (using the `scanHTML()` recognizer) and make it a link. This works as expected.

In the sample message, in the HTML part, you see
```
<a moz-do-not-send="true" href="https://bugzilla.mozilla.org/">https://bugzilla.mozilla.org</a>
```

The `moz-do-not-send="true"` means: When we downconvert to plaintext and see this attribute, we should output only the link text and not the link target. It's all correct until here.

(We do it in such an explicit way, instead of relying on heuristics like `target == linktext`, because we also convert other forms of URLs where the linktext does not match the URL precisely, but only vaguely, e.g. linktexts "foobar.com", "<http://foobar.com>" etc. give URL https://foobar.com . Instead of relying on heuristics that would reverse the recognition, we unambiguously mark which <a href> was artificially inserted.)

However, the `moz-do-not-send="true"` seems to get lost on the way.

This used to work, but apparently broke recently.

Probably, the mail sending code in Thunderbird was rewritten recently and broke it. I suspect that this broke simply during the recent C++ to JS rewrite of the mail send code.

This is not a bug in the HTML<->plaintext converters, but it's a regression in the sending process. Something gets lost between the call to `scanHTML()` and the HTML->plaintext converter.
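
To illustrate the mechanism described above, here is a minimal Python sketch of both steps: a recognizer that turns plaintext URLs into marked links, and a downconverter that honors the marker. This is not Thunderbird's actual code. The names `linkify` and `to_plaintext`, the simplified URL pattern, and the `linktext <url>` output format for unmarked links are all assumptions made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser

# Assumption: a deliberately simplified URL pattern. The real recognizer
# handles many more forms (e.g. "foobar.com" without a scheme).
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def linkify(text: str) -> str:
    """Wrap plaintext URLs in <a>, marking each link as artificially inserted."""
    out = []
    pos = 0
    for m in URL_RE.finditer(text):
        out.append(escape(text[pos:m.start()]))
        url = m.group(0)
        out.append(
            '<a moz-do-not-send="true" href="%s">%s</a>' % (escape(url), escape(url))
        )
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)


class _ToPlaintext(HTMLParser):
    """Hypothetical downconverter: emits link text, plus the target if unmarked."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.pending_href = None  # target to append at </a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            a = dict(attrs)
            if a.get("moz-do-not-send") == "true":
                # Marked as auto-inserted: output only the link text.
                self.pending_href = None
            else:
                self.pending_href = a.get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self.pending_href:
            self.out.append(" <%s>" % self.pending_href)
            self.pending_href = None

    def handle_data(self, data):
        self.out.append(data)


def to_plaintext(html: str) -> str:
    parser = _ToPlaintext()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


original = "See https://bugzilla.mozilla.org for details"
html = linkify(original)

# With the marker intact, the round trip restores the typed text.
print(to_plaintext(html))

# If something strips the marker before downconversion (the suspected
# regression), the URL shows up twice in the plaintext part.
stripped = html.replace(' moz-do-not-send="true"', "")
print(to_plaintext(stripped))
```

Under these assumptions, the second output shows the kind of symptom you would expect if the attribute is lost somewhere between the recognizer and the converter: the link target is appended after a link text that already is the URL.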
