Closed Bug 1627437 Opened 5 years ago Closed 5 years ago

MOZ_CAN_RUN_SCRIPT related Thunderbird build failures of 2020-04-04

Categories

(MailNews Core :: Backend, defect, P1)

defect

Tracking

(thunderbird76 fixed)

RESOLVED FIXED
Thunderbird 77.0
Tracking Status
thunderbird76 --- fixed

People

(Reporter: mkmelin, Assigned: benc)

References

Details

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1619914 +++

Thought I had fixed it since it build locally, but this check requires this to be set.

ac_add_options --enable-clang-plugin

I don't see a very easy solution here. Just adding MOZ_CAN_RUN_SCRIPT doesn't work for NS_IMETHODIMPs like nsMsgCopyService::CopyMessages

MOZ_CAN_RUN_SCRIPT_BOUNDARY NS_IMETHODIMP seem to help there, but that gives errors such as
/home/magnus/Code/tb/mozilla3/comm/mailnews/base/src/nsMessenger.cpp:1411:7: error: arguments must all be strong refs or caller's parameters when calling a function marked as MOZ_CAN_RUN_SCRIPT (including the implicit object argument). 'mTxnMgr' is neither.

Any good ideas?

An explanation of how it's supposed to work is apparently here: https://searchfox.org/comm-central/rev/14bbd9a262f7ca60026345024fbd2ee461c04268/mozilla/build/clang-plugin/CanRunScriptChecker.cpp#14-48

Severity: normal → blocker
Priority: P3 → P1
Keywords: leave-open
Pushed by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/0d3828ea7422 Temporarily disable clang-plugin. rs=bustage-fix

I realize why you took this action, but this is entirely backwards. What really needs to be done is to enable clan-plugin on all developers builds so that this does not happen.

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

We'll enable it again as soon as we get CI build working with it enabled - just can't afford not to find other problems that could creep in while we figure it out. The real fix doesn't appear super trivial for a quick weekend action.

Ben, please take this.

Assignee: nobody → benc

I was not criticizing you for deciding to build without it. I was trying to make a comment on what should be done for proper development going forward. There is no reason for any developers to not be building with the clang-plugin enabled. Secondly if something odd happens and something lads it would be better if there were a way to make all error clang-plugin checks to be warnings so that you don't have to disable it completely so there is still a mechanism to see new things landing that are issues.

Another possibility is to have another build job in CI that can run daily with --enable-clang-plugin-alpha set. (I'm assuming these checks will show up there prior to being being made default.) The idea being that we will get a little more notice that something is coming. Make is a tier 2 job so it doesn't throw anything off.

(In reply to mac198442 from comment #4)

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

Is that a feature suggestion? It doesn't work:
0:04.37 mozbuild.configure.options.InvalidOptionError: --enable-clang-plugin takes 0 values

This is breaking beta builds as well as we merged today. I'll let Magnus make the call if we want to disable on beta or not.
Attachment #9138642 - Flags: review?(mkmelin+mozilla)
Assignee: benc → rob
Status: NEW → ASSIGNED

(In reply to Rob Lemley [:rjl] from comment #10)

(In reply to mac198442 from comment #4)

Actually, what would probably be better for situations such as this would be if you could specify --enable-clang-plugin=warn to make normally fatal errors be warnings, rather than building without it altogether.

Is that a feature suggestion? It doesn't work:
0:04.37 mozbuild.configure.options.InvalidOptionError: --enable-clang-plugin takes 0 values

I am sorry that was a feature suggestion. Probably does not belong in this bug.

I think it's not a problem to do a beta with it disabled.

Attachment #9138642 - Flags: review?(mkmelin+mozilla) → review+

(In reply to Magnus Melin [:mkmelin] from comment #0)

MOZ_CAN_RUN_SCRIPT_BOUNDARY NS_IMETHODIMP seem to help there, but that gives errors such as
/home/magnus/Code/tb/mozilla3/comm/mailnews/base/src/nsMessenger.cpp:1411:7: error: arguments must all be strong refs or caller's parameters when calling a function marked as MOZ_CAN_RUN_SCRIPT (including the implicit object argument). 'mTxnMgr' is neither.

Any good ideas?

I think it's mostly a matter of adding [can_run_script] decorators in .idl files, to all the interface methods which might eventually invoke a script (or call something else which might invoke a script). [can_run_script] adds the MOZ_CAN_RUN_SCRIPT into the generated C++ for you.
In practice, I think it'll be a case of the [can_run_script] methods in the M-C interfaces causing [can_run_script] ripples, spreading upward through the C-C codebase. A bit tedious, but all sane and sensible stuff.

There are a few other conditions to satisfy - eg scripts might decide to hold a reference to any params you pass in, way beyond the life of that function call. That includes the this pointer. So you've got to be careful about refcounting stuff you pass in. I can see the static analysis preventing all kinds of subtle footguns here!

Anyway, I've made a start on it. Will see how far the [can_run_script] ripples spread tomorrow!

Pushed by thunderbird@calypsoblue.org: https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta. r=mkmelin
Comment on attachment 9138642 [details] [diff] [review] Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta [Approval Request Comment] This will need uplift so Beta 76 can proceed.
Attachment #9138642 - Flags: approval-comm-beta?
Attached patch 1627437-WIP-1.patch (obsolete) — Splinter Review

Uploading what I've got so far as I might not be around so much over the next couple of days.

  • marks most of the MOZ_CAN_RUN_SCRIPT functions needed to get the static analysis working, except for a couple of places where we're implementing non-can_run_script M-C interfaces. But they shouldn't be too hard to sort out.
  • still a bunch of calls to fix up where we need to observe the parameter restrictions. Mostly trivial stuff, just tedious. Usually just a matter of adding a local nsCOMPtr<> to be really explicit that the parameter is live.

So I don't think we're too far off being able to compile with the static analysis back on.

Assignee: rob → benc

Hmm. The changes are still rippling outward. It's hard to gauge progress as it only prints out a few error messages at a time (and of course, each fix has the potential to cause other errors).
I'm going to have one more run on it tomorrow, and if that doesn't nail them all I'll step back and reassess. Most likely that means establishing a minimal MOZ_CAN_RUN_SCRIPT_BOUNDARY perimeter, and logging a bug to try and better define it all.

I think the bigger issue is that we've got no clear boundary between the native and and script sides. Virtually any XPCOM object could (in theory) be implemented in script.
In practice, a lot of interfaces only have native implementations, so it probably makes sense to go through and document the ones we're prepared to commit to being non-scriptable, for a start.
And even then, the unit tests mock up a lot of these otherwise-native-only interfaces in script (which is a great capability we'd want to retain).

IMHO, MOZ_CAN_RUN_SCRIPT_BOUNDARY is the way to go, otherwise you'll end up with a huge patch, see:
https://searchfox.org/comm-central/search?q=MOZ_CAN_RUN_SCRIPT_BOUNDARY&case=false&regexp=false&path=mailnews

(In reply to Jorg K (CEST = GMT+2) from comment #20)

IMHO, MOZ_CAN_RUN_SCRIPT_BOUNDARY is the way to go, otherwise you'll end up with a huge patch

For sure. That's the way I'll go now.

But it's essentially bypassing all the C++/script boundary static analysis, which would be nice to have.
I figured it was worth a shot to see if we could go the whole hog and mark everything "correctly". As various M-C interfaces are marked, there might have been a restricted effect in C-C code which was managable.... but no. The ripples quickly turn into a tsunami :-(

Oh well. For now I think its most important to just have the rest of the static analysis going, even if we don't get the benefit of the C++/script boundary checking. But further down the road, it'd be really nice to really nail down where the native/script boundaries lie in TB. I think that'd clarify a whole load of other issues too.

FYI: Adding [can_run_script] to editor interface methods means that the method calls may run mutation event listeners, mutation observers, input event listeners, selectionchange event listeners and layout information observers (I don't know the detail of this). JS can destroy the editor and can change the DOM tree. Therefore, each caller of them needs to guarantee the method's class lifetime and arguments' lifetime. On the other hand, if callers are sure that there are only safe event listeners which don't do that. I.e., cannot be handled by web contents (including HTML email), in other words, it changes only in chrome and which is not touchable from addons, the callers can be marked as MOZ_CAN_RUN_SCRIPT_BOUNDARY and it's better strategy than marking all callers as MOZ_CAN_RUN_SCRIPT.

If you had a more complete wip patch for the [can_run_script], can you attach it?
And should we do both? Use MOZ_CAN_RUN_SCRIPT_BOUNDARY but mark stuff as [can_run_script] where it's needed?

Does this actually block 76 beta2?

This one is the tightest MOZ_CAN_RUN_SCRIPT_BOUNDARY circle I could draw around the nsITransaction methods. Gets it compiling for me with ac_add_options --disable-clang-plugin in my mozconfig.

Here's the approach I used, as I'm sure this will come up again:

  • Where possible add the MOZ_CAN_RUN_SCRIPT_BOUNDARYdecoration to the method declaration in the.h` file (seems to be the preferred approach).
  • In a lot of places the offending method is declared in a NS_DECL_... macro, so there I decorated the method definition in the .cpp file instead.
  • Any calls to [can_run_script] (aka MOZ_CAN_RUN_SCRIPT) methods need to ensure none of the arguments can possibly be destroyed during the call - including this. So I added nsCOMPtr<> or RefPtr<> locals where required.
Attachment #9141602 - Flags: review?(mkmelin+mozilla)

This is the abortive attempt to use MOZ_CAN_RUN_SCRIPT and avoid the use of MOZ_CAN_RUN_SCRIPT_BOUNDARY. There was a bit more, but I'm afraid I ditched it in frustration.

I still don't entirely grasp the intent of the MOZ_CAN_RUN_SCRIPT analysis, in the context of C-C.
Comment 22 seems to imply that it's for script running in addons or "in-the-wild" (I guess we don't have too much of that, although presumably there's some issue with html emails..).
Are we able to nail down where those boundaries lie, and extend out our MOZ_CAN_RUN_SCRIPT circle appropriately?

That said, it seems that a lot of the checking is applicable to any C++/Javascript boundary. But doing that kind of sucks in pretty much our entire codebase...

I don't know. I love the approach of having static analysis detect potentially dodgy calls to script land, but I don't quite grasp how we can best use it. Happy to have a go at improving the approach if anyone knows which direction we should go in...

In any case, if my minimal MOZ_CAN_RUN_SCRIPT_BOUNDARY patch works, at least we can reenable static analysis (and hopefully set or recommend it as a default for developers doing local builds).

Attachment #9139087 - Attachment is obsolete: true
Comment on attachment 9141602 [details] [diff] [review] 1627437-minimal-MOZ_CAN_RUN_SCRIPT_BOUNDARY-1.patch Review of attachment 9141602 [details] [diff] [review]: ----------------------------------------------------------------- Seems fine to me. r=mkmelin
Attachment #9141602 - Flags: review?(mkmelin+mozilla) → review+
Pushed by mkmelin@iki.fi: https://hg.mozilla.org/comm-central/rev/3e7fa8d4bd21 Add attributes to cope with [can_run_script] methods in nsITransaction (Bug 1619914). r=mkmelin DONTBUILD
Comment on attachment 9141604 [details] [diff] [review] 1627437-abortive-MOZ_CAN_RUN_SCRIPT-WIP.patch Review of attachment 9141604 [details] [diff] [review]: ----------------------------------------------------------------- ::: mailnews/base/search/src/nsMsgFilterService.cpp @@ +420,5 @@ > if (m_filters) (void)m_filters->FlushLogIfNecessary(); > > + if (m_callback) { > + nsCOMPtr<nsIMsgOperationListener> cb = m_callback; > + cb->OnStopOperation(mFinalResult); So should we salvage some of this? Is it useful?
Comment on attachment 9138642 [details] [diff] [review] Follow-up: Temporarily disable clang-plugin on Thunderbird release/beta Approving - with ultimate judgement in the hands of Rob and Magnus of what to land on beta, and when. Including whether attachment 9141604 [details] [diff] [review] must be resolved.
Attachment #9138642 - Flags: approval-comm-beta? → approval-comm-beta+

(In reply to Magnus Melin [:mkmelin] from comment #29)

Comment on attachment 9141604 [details] [diff] [review]
So should we salvage some of this? Is it useful?

No, I think that way madness lies. We end up with just about every single function requiring MOZ_CAN_RUN_SCRIPT.
I think, for now, using MOZ_CAN_RUN_SCRIPT_BOUNDARY to fence off code which calls MOZ_CAN_RUN_SCRIPT/[can_run_script] methods is the way to go. Ideally we'd have a clear idea of where the script/native boundaries really are, and use MOZ_CAN_RUN_SCRIPT up to there, but I don't think we have that clear delineation.

Have you done a try build with https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 backed out?

Flags: needinfo?(benc)

(In reply to Magnus Melin [:mkmelin] from comment #32)

Have you done a try build with https://hg.mozilla.org/comm-central/rev/3cdb2ce79d79 backed out?

I assume you meant backing out https://hg.mozilla.org/comm-central/rev/0d3828ea7422 also?

Try build here, which I think looks OK, as far as this bug goes:

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=e17029974177ca2d75b0975c9bc512f3e6ee4fa7

Flags: needinfo?(benc)
Keywords: leave-open
Target Milestone: --- → Thunderbird 77.0

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/f249a765cc4b
Backed out changeset 3cdb2ce79d79 (bug 1627437) now that 1627437 is fixed (adjustments for nsITransaction::Do/UndoTransaction MOZ_CAN_RUN_SCRIPT). r=backout
https://hg.mozilla.org/comm-central/rev/eaf2127db4f3
Backed out changeset 0d3828ea7422 (for beta/release) now that 1627437 is fixed (adjustments for nsITransaction::Do/UndoTransaction MOZ_CAN_RUN_SCRIPT). r=backout DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: