Closed Bug 1681887 Opened 3 years ago Closed 3 years ago

Intermittent comm/mail/test/browser/openpgp/browser_viewMessage.js | Secret text should be contained in message - chrome://mochitests/content/browser/comm/mail/test/browser/openpgp/browser_viewMessage.js :: testDecryptInlineWithNBSPasQP :: line 400

Tracking

(thunderbird_esr78+ fixed, thunderbird86+ fixed)

Status:

RESOLVED FIXED

Milestone:

87 Branch

Tracking Flags:

Tracking

Status

thunderbird_esr78

fixed

thunderbird86

fixed

People

(Reporter: intermittent-bug-filer, Assigned: KaiE)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 3 obsolete files)

1681887-v1.patch 3 years ago Kai Engert (:KaiE:) 2.42 KB, patch		Details \| Diff \| Splinter Review
Bug 1681887: Delay test result checking until OpenPGP message parsing is done. r=mkmelin 3 years ago Kai Engert (:KaiE:) 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1681887 - attempted backport of split open_message_from_file for esr78. 3 years ago Kai Engert (:KaiE:) 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin 3 years ago Kai Engert (:KaiE:) 48 bytes, text/x-phabricator-request	wsmwk : approval-comm-beta+ wsmwk : approval-comm-esr78+	Details \| Review

Treeherder Bug Filer

Reporter

Description

•

3 years ago

treeherder

Filed by: mkmelin [at] iki.fi
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=324223945&repo=comm-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/HAwPTPzbQAmGc8-eVzdqMA/runs/0/artifacts/public/logs/live_backing.log

Magnus Melin [:mkmelin]

Comment 1

•

3 years ago

This is the new test case that landed yesterday.

Comment hidden (Intermittent Failures Robot)

Kai Engert (:KaiE:)

Assignee

Comment 3

•

3 years ago

The problem is probably related to how we process PGP/INLINE messages, which are loaded as is into the message display area, and only afterwards we run decryption and replace the shown message.

For PGP/MIME I think the decryption is done as part of the initial message loading, and consequently the message is shown immediately.

The test probably runs at the time the shown message is still the PGP/INLINE text, and haven't yet decrypted.

The problem isn't happening on Linux. We should try to confirm my theory, and if correct, figure out how to delay the test until after the decryption step has passed.

Comment hidden (Intermittent Failures Robot)

Kai Engert (:KaiE:)

Assignee

Comment 7

•

3 years ago

Attached patch 1681887-v1.patch (obsolete) — Details — Splinter Review

I had started to work on this mid december, but the fix probably isn't complete yet.

Assignee: nobody → kaie

Kai Engert (:KaiE:)

Assignee

Updated

•

3 years ago

Comment 8

•

3 years ago

Comment on attachment 9195589 [details] [diff] [review]
1681887-v1.patch

(In reply to Kai Engert (:KaiE:) from comment #7)

Created attachment 9195589 [details] [diff] [review]
1681887-v1.patch

Ok, this fix doesn't work, because obviously we cannot access the window's elements before it gets created/opened.
Marking obsolete. I have a different fix that appears to work.

Attachment #9195589 - Attachment is obsolete: true

Kai Engert (:KaiE:)

Assignee

Comment 9

•

3 years ago

The test probably runs at the time the shown message is still the PGP/INLINE text, and haven't yet decrypted.

I have a patch that fixes the issue.

This is a try run without changes - 1 out of 7 bct5 showed the symptom from bug 1681656:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=287061941133abbbd2c52dd3d8131987476c5905

This is a try run with a fix - no failures in 12 bct5 executions:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=c7285efa0a20e2f412af4127053e05e0b8296185

Look at the patch of the try run:
https://hg.mozilla.org/try-comm-central/rev/c7285efa0a20e2f412af4127053e05e0b8296185

It logs text "waitForDecryptDone" with empty status, if we are checking early for the OpenPGP result, but the processing hasn't completed yet.

This is one of the logs from a try run with the fix:
https://firefoxci.taskcluster-artifacts.net/ENCwzSjgSWqVzYT-kIN6Vw/0/public/logs/live_backing.log

In one place it can be seen that "waitForDecryptDone" is printed twice, once with empty status, once with a good status.
This confirms that "waitFor" completion works.

I'm uploading a patch that uses this strategy, with some cleanup.

Kai Engert (:KaiE:)

Assignee

Comment 10

•

3 years ago

Attached file Bug 1681887: Delay test result checking until OpenPGP message parsing is done. r=mkmelin (obsolete) — Details

Also fixes bug 1681656.

Comment hidden (Intermittent Failures Robot)

Kai Engert (:KaiE:)

Assignee

Comment 12

•

3 years ago

Attached file Bug 1681887 - attempted backport of split open_message_from_file for esr78. (obsolete) — Details

Phabricator Automation

Updated

•

3 years ago

Attachment #9195902 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9196892 - Attachment is obsolete: true

Magnus Melin [:mkmelin]

Comment 13

•

3 years ago

I did some debugging of this. It's not working on try for osx, and locally also fails on linux in verify chaos mode.
I think there's some confusion of where updatedSecurityStatus should be called or not.
Try: https://treeherder.mozilla.org/jobs?repo=try-comm-central&selectedTaskRun=Vt-SEKdWRkWx9G9qWPu62Q.0

Kai Engert (:KaiE:)

Assignee

Comment 14

•

3 years ago

Here are some details that I learned about the current implementation (inherited from Enigmail).

The code does several "trial and error" attempts when processing OpenPGP messages, because it isn't certain which encoding is the correct one to use.

First, the message is extracted from the window's DOM, some fixes are potentially applied to its structure, and an attempt is made to process it.
If this fails, the code will attempt to retry with different encodings.

There are retries numbered 1, 2, 3,

If the initial encoding is already UTF-8, then it will skip retry 1.

Retry 1 (potentially skipped), uses the message that was extracted (and adjusted), and processes it using
EnigmailData.convertFromUnicode(msgText, "UTF-8")

If retry 1 doesn't work, then the code decides that the text that was extracted from the DOM will not be used for the further attempts.

Retry 2 (msgDirectDecrypt/msgDirectCallback) will use the message URI and a stream listener to obtain the raw message. It then converts the raw message using EnigmailData.convertFromUnicode(msgText, "UTF-8") and process it.

If retry 2 failed, it do retry 3.

Retry 3 calls EnigmailData.convertToUnicode(msgText, "UTF-8") (in my understanding that reverts the conversion from retry 2, going back to the raw message source), and will process it again.

All of these steps are done without waiting for the results. I've added some async declaractions and added some await statements, which was insufficient (by itself) to fix it.

The problem is that retry 2 cannot be easily awaited. It uses setTimeout(0) to continue the processing of the data that is obtained from the stream listener. My attempt to avoid the setTimeout(0) call resulted in a deadlock.

The two tests that are failing currently are processed in retry 3. That's why it's difficult to wait for the result.

I had the thought that we potentially could rewrite the retry attempts. Maybe we can find a more direct way to obtain the raw message data, without having to rely on setTimeout(0).

However, when experimenting, I noticed that the code makes an (apparently) incorrect decision.

For the messages that work with retry 3, I noticed that actually retry 1 works for them, too!
For our test messages, which are already in UTF-8, the code decides to skip retry 1.
However, that seems like an incorrect decision.
The text that we extract from the DOM is probably in the unicode encoding - a result of whatever we have converted from the source and rendered for display.
Retry 1 takes unicode input and converts it to UTF-8.

I've changed the code to always allow retry (and never skip it).
This causes our test messages to be work in retry 1. This change avoids that we reach the setTimeout(0) code from retry 2, resulting in more direct execution.

These changes (async/await, don't skip retry 1) were still insufficient to fix the tests.
Waiting for the end of the processing was still necessary.

Because I couldn't get Magnus' suggestion to work, I went back to my original idea, using the attribute, and using utils.waitFor.

With this change, I got a working try run:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=3952b5381e0e3a4e6219aca3e94c8179237372b9

I've spent a lot of time on this issue, and I feel it's not worth it to spend more time on it right now.
I would like to suggest that we accept this workaround for now, even if it isn't the "perfect" approach.

I recommend to take this fix to get the tests going, and plan follow-up work to get this into better shape.

As part of the patch I switched the processing of retry 2 and retry 3. I think it makes sense to avoid one conversion, so if we reach retry 2, it will start with the raw code, and only does a conversion for retry 3.

Kai Engert (:KaiE:)

Assignee

Comment 15

•

3 years ago

Attached file Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin — Details

Kai Engert (:KaiE:)

Assignee

Comment 16

•

3 years ago

This is a try build that uses a delay that Magnus had suggested, it demonstrates that the test correctly waits for processing to be done, and the attribute to appear:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=ae858c1e4a795a86311c7b11c1384f5a18ba643b

It's true that waiting for retry 2 or retry 3 wouldn't work.

And here is a try with esr78:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=8e8cf66a63b43f41313ff2640069bc07fb506dd9

Magnus Melin [:mkmelin]

Comment 17

•

3 years ago

It seems for 78 timings are different enough that it doesn't work there (see the try windows bct5), which is all to show there's a bunch of things that are iffy to say the least in this code.

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 21

•

3 years ago

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/e10ae1d3fe56
Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED

Magnus Melin [:mkmelin]

Updated

•

3 years ago

status-thunderbird86: --- → affected

status-thunderbird_esr78: --- → affected

tracking-thunderbird86: --- → +

tracking-thunderbird_esr78: --- → +

Target Milestone: --- → 87 Branch

Magnus Melin [:mkmelin]

Comment 22

•

3 years ago

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Approval Request Comment]
Clears up test failures, and likely some real bugs as well.

Attachment #9197175 - Flags: approval-comm-beta?

Wayne Mery (:wsmwk)

Comment 23

•

3 years ago

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Triage Comment]
Approved for beta

Attachment #9197175 - Flags: approval-comm-beta? → approval-comm-beta+

Rob Lemley [:rjl]

Comment 24

•

3 years ago

bugherder uplift

Thunderbird 86.0b2:
https://hg.mozilla.org/releases/comm-beta/rev/56512e60f4ec

status-thunderbird86: affected → fixed

Comment hidden (Intermittent Failures Robot)

Magnus Melin [:mkmelin]

Comment 26

•

3 years ago

Uplift to 78?

Kai Engert (:KaiE:)

Assignee

Comment 27

•

3 years ago

(In reply to Magnus Melin [:mkmelin] from comment #26)

Uplift to 78?

Yes please. I've been running a local build with this change, and haven't seen problems.

Kai Engert (:KaiE:)

Assignee

Comment 28

•

3 years ago

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): no
User impact if declined: none
Testing completed (on c-c, etc.): yes, cc, beta, local build, experimental build given to testers of bug 1644085
Risk to taking this patch (and alternatives if risky): rather low

Attachment #9197175 - Flags: approval-comm-esr78?

Wayne Mery (:wsmwk)

Comment 29

•

3 years ago

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Triage Comment]
Approved for esr78

Attachment #9197175 - Flags: approval-comm-esr78? → approval-comm-esr78+

Kai Engert (:KaiE:)

Assignee

Comment 30

•

3 years ago

https://hg.mozilla.org/releases/comm-esr78/rev/d1f053495b2f60e3f7ce163d001e10e41525f3c2

= 78.8.1

status-thunderbird_esr78: affected → fixed

Kai Engert (:KaiE:)

Assignee

Updated

•

3 years ago

Regressions: 1701924

Magnus Melin [:mkmelin]

Updated

•

3 years ago