Closed Bug 1681887 Opened 10 months ago Closed 8 months ago

Intermittent comm/mail/test/browser/openpgp/browser_viewMessage.js | Secret text should be contained in message - chrome://mochitests/content/browser/comm/mail/test/browser/openpgp/browser_viewMessage.js :: testDecryptInlineWithNBSPasQP :: line 400

Categories

(MailNews Core :: Security: OpenPGP, defect, P5)

Tracking

(thunderbird_esr78+ fixed, thunderbird86+ fixed)

RESOLVED FIXED
87 Branch
Tracking Status
thunderbird_esr78 + fixed
thunderbird86 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: KaiE)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file, 3 obsolete files)

This is the new test case that landed yesterday.

The problem is probably related to how we process PGP/INLINE messages, which are loaded as is into the message display area, and only afterwards we run decryption and replace the shown message.

For PGP/MIME I think the decryption is done as part of the initial message loading, and consequently the message is shown immediately.

The test probably runs at the time the shown message is still the PGP/INLINE text, and haven't yet decrypted.

The problem isn't happening on Linux. We should try to confirm my theory, and if correct, figure out how to delay the test until after the decryption step has passed.

Attached patch 1681887-v1.patch (obsolete) — Splinter Review

I had started to work on this mid december, but the fix probably isn't complete yet.

Assignee: nobody → kaie
See Also: → 1681656

Comment on attachment 9195589 [details] [diff] [review]
1681887-v1.patch

(In reply to Kai Engert (:KaiE:) from comment #7)

Created attachment 9195589 [details] [diff] [review]
1681887-v1.patch

Ok, this fix doesn't work, because obviously we cannot access the window's elements before it gets created/opened.
Marking obsolete. I have a different fix that appears to work.

Attachment #9195589 - Attachment is obsolete: true

The test probably runs at the time the shown message is still the PGP/INLINE text, and haven't yet decrypted.

I have a patch that fixes the issue.

This is a try run without changes - 1 out of 7 bct5 showed the symptom from bug 1681656:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=287061941133abbbd2c52dd3d8131987476c5905

This is a try run with a fix - no failures in 12 bct5 executions:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=c7285efa0a20e2f412af4127053e05e0b8296185

Look at the patch of the try run:
https://hg.mozilla.org/try-comm-central/rev/c7285efa0a20e2f412af4127053e05e0b8296185

It logs text "waitForDecryptDone" with empty status, if we are checking early for the OpenPGP result, but the processing hasn't completed yet.

This is one of the logs from a try run with the fix:
https://firefoxci.taskcluster-artifacts.net/ENCwzSjgSWqVzYT-kIN6Vw/0/public/logs/live_backing.log

In one place it can be seen that "waitForDecryptDone" is printed twice, once with empty status, once with a good status.
This confirms that "waitFor" completion works.

I'm uploading a patch that uses this strategy, with some cleanup.

Attachment #9195902 - Attachment is obsolete: true
Attachment #9196892 - Attachment is obsolete: true

I did some debugging of this. It's not working on try for osx, and locally also fails on linux in verify chaos mode.
I think there's some confusion of where updatedSecurityStatus should be called or not.
Try: https://treeherder.mozilla.org/jobs?repo=try-comm-central&selectedTaskRun=Vt-SEKdWRkWx9G9qWPu62Q.0

Here are some details that I learned about the current implementation (inherited from Enigmail).

The code does several "trial and error" attempts when processing OpenPGP messages, because it isn't certain which encoding is the correct one to use.

First, the message is extracted from the window's DOM, some fixes are potentially applied to its structure, and an attempt is made to process it.
If this fails, the code will attempt to retry with different encodings.

There are retries numbered 1, 2, 3,

If the initial encoding is already UTF-8, then it will skip retry 1.

Retry 1 (potentially skipped), uses the message that was extracted (and adjusted), and processes it using
EnigmailData.convertFromUnicode(msgText, "UTF-8")

If retry 1 doesn't work, then the code decides that the text that was extracted from the DOM will not be used for the further attempts.

Retry 2 (msgDirectDecrypt/msgDirectCallback) will use the message URI and a stream listener to obtain the raw message. It then converts the raw message using EnigmailData.convertFromUnicode(msgText, "UTF-8") and process it.

If retry 2 failed, it do retry 3.

Retry 3 calls EnigmailData.convertToUnicode(msgText, "UTF-8") (in my understanding that reverts the conversion from retry 2, going back to the raw message source), and will process it again.

All of these steps are done without waiting for the results. I've added some async declaractions and added some await statements, which was insufficient (by itself) to fix it.

The problem is that retry 2 cannot be easily awaited. It uses setTimeout(0) to continue the processing of the data that is obtained from the stream listener. My attempt to avoid the setTimeout(0) call resulted in a deadlock.

The two tests that are failing currently are processed in retry 3. That's why it's difficult to wait for the result.

I had the thought that we potentially could rewrite the retry attempts. Maybe we can find a more direct way to obtain the raw message data, without having to rely on setTimeout(0).

However, when experimenting, I noticed that the code makes an (apparently) incorrect decision.

For the messages that work with retry 3, I noticed that actually retry 1 works for them, too!
For our test messages, which are already in UTF-8, the code decides to skip retry 1.
However, that seems like an incorrect decision.
The text that we extract from the DOM is probably in the unicode encoding - a result of whatever we have converted from the source and rendered for display.
Retry 1 takes unicode input and converts it to UTF-8.

I've changed the code to always allow retry (and never skip it).
This causes our test messages to be work in retry 1. This change avoids that we reach the setTimeout(0) code from retry 2, resulting in more direct execution.

These changes (async/await, don't skip retry 1) were still insufficient to fix the tests.
Waiting for the end of the processing was still necessary.

Because I couldn't get Magnus' suggestion to work, I went back to my original idea, using the attribute, and using utils.waitFor.

With this change, I got a working try run:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=3952b5381e0e3a4e6219aca3e94c8179237372b9

I've spent a lot of time on this issue, and I feel it's not worth it to spend more time on it right now.
I would like to suggest that we accept this workaround for now, even if it isn't the "perfect" approach.

I recommend to take this fix to get the tests going, and plan follow-up work to get this into better shape.

As part of the patch I switched the processing of retry 2 and retry 3. I think it makes sense to avoid one conversion, so if we reach retry 2, it will start with the raw code, and only does a conversion for retry 3.

This is a try build that uses a delay that Magnus had suggested, it demonstrates that the test correctly waits for processing to be done, and the attribute to appear:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=ae858c1e4a795a86311c7b11c1384f5a18ba643b

It's true that waiting for retry 2 or retry 3 wouldn't work.

And here is a try with esr78:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=8e8cf66a63b43f41313ff2640069bc07fb506dd9

It seems for 78 timings are different enough that it doesn't work there (see the try windows bct5), which is all to show there's a bunch of things that are iffy to say the least in this code.

Duplicate of this bug: 1681656

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/e10ae1d3fe56
Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

Status: NEW → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Approval Request Comment]
Clears up test failures, and likely some real bugs as well.

Attachment #9197175 - Flags: approval-comm-beta?

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Triage Comment]
Approved for beta

Attachment #9197175 - Flags: approval-comm-beta? → approval-comm-beta+

Uplift to 78?

(In reply to Magnus Melin [:mkmelin] from comment #26)

Uplift to 78?

Yes please. I've been running a local build with this change, and haven't seen problems.

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Approval Request Comment]
Regression caused by (bug #): no
User impact if declined: none
Testing completed (on c-c, etc.): yes, cc, beta, local build, experimental build given to testers of bug 1644085
Risk to taking this patch (and alternatives if risky): rather low

Attachment #9197175 - Flags: approval-comm-esr78?

Comment on attachment 9197175 [details]
Bug 1681887 - Reorganize OpenPGP message processing to allow our tests to complete without a race. r=mkmelin

[Triage Comment]
Approved for esr78

Attachment #9197175 - Flags: approval-comm-esr78? → approval-comm-esr78+
Regressions: 1701924
See Also: → 1715340
You need to log in before you can comment on or make changes to this bug.