1627437 - MOZ_CAN_RUN_SCRIPT related Thunderbird build failures of 2020-04-04

Magnus Melin [:mkmelin]

Reporter

Description

•

5 years ago

+++ This bug was initially created as a clone of Bug #1619914 +++

Thought I had fixed it since it build locally, but this check requires this to be set.

ac_add_options --enable-clang-plugin

I don't see a very easy solution here. Just adding MOZ_CAN_RUN_SCRIPT doesn't work for NS_IMETHODIMPs like nsMsgCopyService::CopyMessages

MOZ_CAN_RUN_SCRIPT_BOUNDARY NS_IMETHODIMP seem to help there, but that gives errors such as
/home/magnus/Code/tb/mozilla3/comm/mailnews/base/src/nsMessenger.cpp:1411:7: error: arguments must all be strong refs or caller's parameters when calling a function marked as MOZ_CAN_RUN_SCRIPT (including the implicit object argument). 'mTxnMgr' is neither.

Any good ideas?

An explanation of how it's supposed to work is apparently here: https://searchfox.org/comm-central/rev/14bbd9a262f7ca60026345024fbd2ee461c04268/mozilla/build/clang-plugin/CanRunScriptChecker.cpp#14-48

Magnus Melin [:mkmelin]

Reporter

Updated

•

5 years ago

Severity: normal → blocker

Priority: P3 → P1

Magnus Melin [:mkmelin]

Reporter

Comment 1

•

5 years ago

Failures: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=296263948&repo=comm-central&lineNumber=7335

Rob Lemley [:rjl]

Updated

•

5 years ago

Keywords: leave-open

Pulsebot

Comment 2

•

5 years ago

Pushed by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/0d3828ea7422 Temporarily disable clang-plugin. rs=bustage-fix

mac198442

Comment 3

•

5 years ago

I realize why you took this action, but this is entirely backwards. What really needs to be done is to enable clan-plugin on all developers builds so that this does not happen.

mac198442

Comment 4

•

5 years ago

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

Magnus Melin [:mkmelin]

Reporter

Comment 5

•

5 years ago

We'll enable it again as soon as we get CI build working with it enabled - just can't afford not to find other problems that could creep in while we figure it out. The real fix doesn't appear super trivial for a quick weekend action.

Magnus Melin [:mkmelin]

Reporter

Comment 6

•

5 years ago

Ben, please take this.

Assignee: nobody → benc

mac198442

Comment 7

•

5 years ago

I was not criticizing you for deciding to build without it. I was trying to make a comment on what should be done for proper development going forward. There is no reason for any developers to not be building with the clang-plugin enabled. Secondly if something odd happens and something lads it would be better if there were a way to make all error clang-plugin checks to be warnings so that you don't have to disable it completely so there is still a mechanism to see new things landing that are issues.

Comment hidden (Intermittent Failures Robot)

Rob Lemley [:rjl]

Comment 9

•

5 years ago

Another possibility is to have another build job in CI that can run daily with --enable-clang-plugin-alpha set. (I'm assuming these checks will show up there prior to being being made default.) The idea being that we will get a little more notice that something is coming. Make is a tier 2 job so it doesn't throw anything off.

Rob Lemley [:rjl]

Comment 10

•

5 years ago

(In reply to mac198442 from comment #4)

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

Is that a feature suggestion? It doesn't work:
0:04.37 mozbuild.configure.options.InvalidOptionError: --enable-clang-plugin takes 0 values

Rob Lemley [:rjl]

Comment 11

•

5 years ago

Attached patch Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta — Details — Splinter Review

This is breaking beta builds as well as we merged today. I'll let Magnus make the call if we want to disable on beta or not.

Attachment #9138642 - Flags: review?(mkmelin+mozilla)

Rob Lemley [:rjl]

Updated

•

5 years ago

Assignee: benc → rob

Status: NEW → ASSIGNED

mac198442

Comment 12

•

5 years ago

(In reply to Rob Lemley [:rjl] from comment #10)

(In reply to mac198442 from comment #4)

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

Is that a feature suggestion? It doesn't work:
0:04.37 mozbuild.configure.options.InvalidOptionError: --enable-clang-plugin takes 0 values

I am sorry that was a feature suggestion. Probably does not belong in this bug.

Magnus Melin [:mkmelin]

Reporter

Comment 13

•

5 years ago

I think it's not a problem to do a beta with it disabled.

Magnus Melin [:mkmelin]

Reporter

Updated

•

5 years ago

Attachment #9138642 - Flags: review?(mkmelin+mozilla) → review+

Ben Campbell

Assignee

Comment 14

•

5 years ago

(In reply to Magnus Melin [:mkmelin] from comment #0)

MOZ_CAN_RUN_SCRIPT_BOUNDARY NS_IMETHODIMP seem to help there, but that gives errors such as
/home/magnus/Code/tb/mozilla3/comm/mailnews/base/src/nsMessenger.cpp:1411:7: error: arguments must all be strong refs or caller's parameters when calling a function marked as MOZ_CAN_RUN_SCRIPT (including the implicit object argument). 'mTxnMgr' is neither.

Any good ideas?

I think it's mostly a matter of adding [can_run_script] decorators in .idl files, to all the interface methods which might eventually invoke a script (or call something else which might invoke a script). [can_run_script] adds the MOZ_CAN_RUN_SCRIPT into the generated C++ for you.
In practice, I think it'll be a case of the [can_run_script] methods in the M-C interfaces causing [can_run_script] ripples, spreading upward through the C-C codebase. A bit tedious, but all sane and sensible stuff.

There are a few other conditions to satisfy - eg scripts might decide to hold a reference to any params you pass in, way beyond the life of that function call. That includes the this pointer. So you've got to be careful about refcounting stuff you pass in. I can see the static analysis preventing all kinds of subtle footguns here!

Anyway, I've made a start on it. Will see how far the [can_run_script] ripples spread tomorrow!

Pulsebot

Comment 15

•

5 years ago

Pushed by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta. r=mkmelin

Rob Lemley [:rjl]

Comment 16

•

5 years ago

Comment on attachment 9138642 [details] [diff] [review] Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta [Approval Request Comment] This will need uplift so Beta 76 can proceed.

Attachment #9138642 - Flags: approval-comm-beta?

Ben Campbell

Assignee

Comment 17

•

5 years ago

Attached patch 1627437-WIP-1.patch (obsolete) — Details — Splinter Review

Uploading what I've got so far as I might not be around so much over the next couple of days.

marks most of the MOZ_CAN_RUN_SCRIPT functions needed to get the static analysis working, except for a couple of places where we're implementing non-can_run_script M-C interfaces. But they shouldn't be too hard to sort out.
still a bunch of calls to fix up where we need to observe the parameter restrictions. Mostly trivial stuff, just tedious. Usually just a matter of adding a local nsCOMPtr<> to be really explicit that the parameter is live.

So I don't think we're too far off being able to compile with the static analysis back on.

Rob Lemley [:rjl]

Updated

•

5 years ago

Assignee: rob → benc

Ben Campbell

Assignee

Comment 19

•

5 years ago

Hmm. The changes are still rippling outward. It's hard to gauge progress as it only prints out a few error messages at a time (and of course, each fix has the potential to cause other errors).
I'm going to have one more run on it tomorrow, and if that doesn't nail them all I'll step back and reassess. Most likely that means establishing a minimal MOZ_CAN_RUN_SCRIPT_BOUNDARY perimeter, and logging a bug to try and better define it all.

I think the bigger issue is that we've got no clear boundary between the native and and script sides. Virtually any XPCOM object could (in theory) be implemented in script.
In practice, a lot of interfaces only have native implementations, so it probably makes sense to go through and document the ones we're prepared to commit to being non-scriptable, for a start.
And even then, the unit tests mock up a lot of these otherwise-native-only interfaces in script (which is a great capability we'd want to retain).

Jorg K (CEST = GMT+2)

Comment 20

•

5 years ago

IMHO, MOZ_CAN_RUN_SCRIPT_BOUNDARY is the way to go, otherwise you'll end up with a huge patch, see:
https://searchfox.org/comm-central/search?q=MOZ_CAN_RUN_SCRIPT_BOUNDARY&case=false&regexp=false&path=mailnews

Ben Campbell

Assignee

Comment 21

•

5 years ago

(In reply to Jorg K (CEST = GMT+2) from comment #20)

IMHO, MOZ_CAN_RUN_SCRIPT_BOUNDARY is the way to go, otherwise you'll end up with a huge patch

For sure. That's the way I'll go now.

But it's essentially bypassing all the C++/script boundary static analysis, which would be nice to have.
I figured it was worth a shot to see if we could go the whole hog and mark everything "correctly". As various M-C interfaces are marked, there might have been a restricted effect in C-C code which was managable.... but no. The ripples quickly turn into a tsunami :-(

Oh well. For now I think its most important to just have the rest of the static analysis going, even if we don't get the benefit of the C++/script boundary checking. But further down the road, it'd be really nice to really nail down where the native/script boundaries lie in TB. I think that'd clarify a whole load of other issues too.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)

Comment 22

•

5 years ago

•

Edited

FYI: Adding [can_run_script] to editor interface methods means that the method calls may run mutation event listeners, mutation observers, input event listeners, selectionchange event listeners and layout information observers (I don't know the detail of this). JS can destroy the editor and can change the DOM tree. Therefore, each caller of them needs to guarantee the method's class lifetime and arguments' lifetime. On the other hand, if callers are sure that there are only safe event listeners which don't do that. I.e., cannot be handled by web contents (including HTML email), in other words, it changes only in chrome and which is not touchable from addons, the callers can be marked as MOZ_CAN_RUN_SCRIPT_BOUNDARY and it's better strategy than marking all callers as MOZ_CAN_RUN_SCRIPT.

Magnus Melin [:mkmelin]

Reporter

Comment 23

•

5 years ago

If you had a more complete wip patch for the [can_run_script], can you attach it?
And should we do both? Use MOZ_CAN_RUN_SCRIPT_BOUNDARY but mark stuff as [can_run_script] where it's needed?

Wayne Mery (:wsmwk)

Comment 24

•

5 years ago

Does this actually block 76 beta2?

Ben Campbell

Assignee

Comment 25

•

5 years ago

Attached patch 1627437-minimal-MOZ_CAN_RUN_SCRIPT_BOUNDARY-1.patch — Details — Splinter Review

This one is the tightest MOZ_CAN_RUN_SCRIPT_BOUNDARY circle I could draw around the nsITransaction methods. Gets it compiling for me with ac_add_options --disable-clang-plugin in my mozconfig.

Here's the approach I used, as I'm sure this will come up again:

Where possible add the MOZ_CAN_RUN_SCRIPT_BOUNDARYdecoration to the method declaration in the.h` file (seems to be the preferred approach).
In a lot of places the offending method is declared in a NS_DECL_... macro, so there I decorated the method definition in the .cpp file instead.
Any calls to [can_run_script] (aka MOZ_CAN_RUN_SCRIPT) methods need to ensure none of the arguments can possibly be destroyed during the call - including this. So I added nsCOMPtr<> or RefPtr<> locals where required.

Attachment #9141602 - Flags: review?(mkmelin+mozilla)

Ben Campbell

Assignee

Comment 26

•

5 years ago

Attached patch 1627437-abortive-MOZ_CAN_RUN_SCRIPT-WIP.patch — Details — Splinter Review

This is the abortive attempt to use MOZ_CAN_RUN_SCRIPT and avoid the use of MOZ_CAN_RUN_SCRIPT_BOUNDARY. There was a bit more, but I'm afraid I ditched it in frustration.

I still don't entirely grasp the intent of the MOZ_CAN_RUN_SCRIPT analysis, in the context of C-C.
Comment 22 seems to imply that it's for script running in addons or "in-the-wild" (I guess we don't have too much of that, although presumably there's some issue with html emails..).
Are we able to nail down where those boundaries lie, and extend out our MOZ_CAN_RUN_SCRIPT circle appropriately?

That said, it seems that a lot of the checking is applicable to any C++/Javascript boundary. But doing that kind of sucks in pretty much our entire codebase...

I don't know. I love the approach of having static analysis detect potentially dodgy calls to script land, but I don't quite grasp how we can best use it. Happy to have a go at improving the approach if anyone knows which direction we should go in...

In any case, if my minimal MOZ_CAN_RUN_SCRIPT_BOUNDARY patch works, at least we can reenable static analysis (and hopefully set or recommend it as a default for developers doing local builds).

Attachment #9139087 - Attachment is obsolete: true

Magnus Melin [:mkmelin]

Reporter

Comment 27

•

5 years ago

Comment on attachment 9141602 [details] [diff] [review] 1627437-minimal-MOZ_CAN_RUN_SCRIPT_BOUNDARY-1.patch Review of attachment 9141602 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine to me. r=mkmelin

Attachment #9141602 - Flags: review?(mkmelin+mozilla) → review+

Pulsebot

Comment 28

•

5 years ago

Pushed by mkmelin@iki.fi: https://hg.mozilla.org/comm-central/rev/3e7fa8d4bd21 Add attributes to cope with [can_run_script] methods in nsITransaction (Bug 1619914). r=mkmelin DONTBUILD

Magnus Melin [:mkmelin]

Reporter

Comment 29

•

5 years ago

Comment on attachment 9141604 [details] [diff] [review] 1627437-abortive-MOZ_CAN_RUN_SCRIPT-WIP.patch Review of attachment 9141604 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/search/src/nsMsgFilterService.cpp @@ +420,5 @@ > if (m_filters) (void)m_filters->FlushLogIfNecessary(); > > + if (m_callback) { > + nsCOMPtr<nsIMsgOperationListener> cb = m_callback; > + cb->OnStopOperation(mFinalResult); So should we salvage some of this? Is it useful?

Wayne Mery (:wsmwk)

Comment 30

•

5 years ago

Comment on attachment 9138642 [details] [diff] [review] Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta Approving - with ultimate judgement in the hands of Rob and Magnus of what to land on beta, and when. Including whether attachment 9141604 [details] [diff] [review] must be resolved.

Attachment #9138642 - Flags: approval-comm-beta? → approval-comm-beta+

Ben Campbell

Assignee

Comment 31

•

5 years ago

(In reply to Magnus Melin [:mkmelin] from comment #29)

Comment on attachment 9141604 [details] [diff] [review]
So should we salvage some of this? Is it useful?

No, I think that way madness lies. We end up with just about every single function requiring MOZ_CAN_RUN_SCRIPT.
I think, for now, using MOZ_CAN_RUN_SCRIPT_BOUNDARY to fence off code which calls MOZ_CAN_RUN_SCRIPT/[can_run_script] methods is the way to go. Ideally we'd have a clear idea of where the script/native boundaries really are, and use MOZ_CAN_RUN_SCRIPT up to there, but I don't think we have that clear delineation.

Magnus Melin [:mkmelin]

Reporter

Comment 32

•

5 years ago

Have you done a try build with https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 backed out?

Flags: needinfo?(benc)

Ben Campbell

Assignee

Comment 33

•

5 years ago

(In reply to Magnus Melin [:mkmelin] from comment #32)

Have you done a try build with https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 backed out?

I assume you meant backing out https://hg.mozilla.org/comm-central/rev/0d3828ea7422 also?

Try build here, which I think looks OK, as far as this bug goes:

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=e17029974177ca2d75b0975c9bc512f3e6ee4fa7

Flags: needinfo?(benc)

Magnus Melin [:mkmelin]

Reporter

Updated

•

5 years ago

Keywords: leave-open

Target Milestone: --- → Thunderbird 77.0

Pulsebot

Comment 34

•

5 years ago

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/f249a765cc4b
Backed out changeset 3cdb2ce79d79 (bug 1627437) now that 1627437 is fixed (adjustments for nsITransaction::Do/UndoTransaction MOZ_CAN_RUN_SCRIPT). r=backout
https://hg.mozilla.org/comm-central/rev/eaf2127db4f3
Backed out changeset 0d3828ea7422 (for beta/release) now that 1627437 is fixed (adjustments for nsITransaction::Do/UndoTransaction MOZ_CAN_RUN_SCRIPT). r=backout DONTBUILD

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Rob Lemley [:rjl]

Comment 35

•

5 years ago

backout bugherder uplift

Thunderbird 76.0b3:
https://hg.mozilla.org/releases/comm-beta/rev/da0a04cfc049

Rob Lemley [:rjl]

Comment 36

•

5 years ago

bugherder uplift

Thunderbird 76.0b3:
https://hg.mozilla.org/releases/comm-beta/rev/c6fef8bbb3fc
Thunderbird 76.0b3:
https://hg.mozilla.org/releases/comm-beta/rev/12de722b5dc2

Rob Lemley [:rjl]

Updated

•

5 years ago

status-thunderbird76: --- → fixed

Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta 5 years ago Rob Lemley [:rjl] 2.68 KB, patch	mkmelin : review+ wsmwk : approval-comm-beta+	Details \| Diff \| Splinter Review
1627437-WIP-1.patch 5 years ago Ben Campbell 74.73 KB, patch		Details \| Diff \| Splinter Review
1627437-minimal-MOZ_CAN_RUN_SCRIPT_BOUNDARY-1.patch 5 years ago Ben Campbell 16.69 KB, patch	mkmelin : review+	Details \| Diff \| Splinter Review
1627437-abortive-MOZ_CAN_RUN_SCRIPT-WIP.patch 5 years ago Ben Campbell 115.41 KB, patch		Details \| Diff \| Splinter Review