Closed Bug 53849 Opened 24 years ago Closed 24 years ago

Java -> JavaScript calls blocked by security checks.

Categories

(Core Graveyard :: Java: OJI, defect, P1)

x86
Windows NT
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.6

People

(Reporter: jeff.dyer, Assigned: jeff.dyer)

References

()

Details

(Keywords: regression, Whiteboard: [nsbeta3-][rtm++])

Attachments

(7 files)

nsCSecurityContext is returning null origin and certid, causing the security 
check to fail.
This needs to be nsbeta3++.
Keywords: nsbeta3
Priority: P3 → P1
Because nsCSecurityContext is returning null origin and certid, any LiveConnect 
call from Java to JavaScript will fail because the browser fails to verify the 
origin of the applets, even the applet has enough priviledge to do so. This bug 
is significant because it will block all Java->JS call, and it means we won't 
have LiveConnect support in Netscape 6.
Attached patch FixSplinter Review
The attached fix needs to be tested by Stanley and reviewed by Mitch and Ed. It 
still leaves one know problem, which happens when trying to make a LC call while 
during applet init(). In that case, we still don't have a valid principal for 
the object (the page), and the call fails.

Things to look for:
1. Leaks.
2. Holes.
3. Is the method of getting the object principal general (mitch?)
4. Does it work for existing LC tests (stanley, raju?)
Do you really need to include both securityManager->GetSubjectPrincipal() and the 
code which gets a principal from the JS context? I think they're fundamentally 
the same.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: --- → M18
Mitch,

Jeff has handed this off to me to get checked in.  To answer you comment,
perhaps he does both things for the case when m_pJSCX is null?

I don't know.  I need to investigate it more.

Ed
Temporarily taking ownership
Assignee: jeff.dyer → edburns
Fix looks good to me code-wise. r=mstoltz. I'm not set up to test LiveConnect,
though. Please test this and make sure it's getting the correct Javascript
principal. 
In reply to Jeff's question in the patch, as a last resort you can always get
the URL of the page and call SecurityManager->GetCodebasePrincipal(uri,
CodebasePrincipal). This will give you codebase information only, not
certificate information. 
Agree with P1 and agree we should accept this for nsbeta3. It is high profile 
backward compatibility (existing Java/JavaScript LiveConnect content and applets 
on the web) and required for functional completeness of our browser and 
technology platform. Sun's also a high profile partner and the JDK is key 
high-profile functionality. It's important that we continue to offer clean 
interoperability between Java and JavaScript.
Keywords: 4xp, correctness
Someone, preferably Jeff, should look into mstoltz's suggestion about making a 
codebase principal if necessary.  Otherwise I'm left wondering (without testing 
myself) what happens in the case where there is no other principal to use.

/be
I made one small change to Jeff's fix:

Was:
    if (NS_FAILED(secMan->GetSubjectPrincipal(getter_AddRefs(principal))))


Is:

    if (NS_FAILED(secMan->GetSubjectPrincipal(getter_AddRefs(principal))) ||
        !principal)

That is, I check for the case when GetSubjectPrincipal() returns NS_OK, but the 
principal is still null.
Status: NEW → ASSIGNED
However, after applying my patch, I still get the error, even though 
nsCSecurityContext::GetOrigin() always return the right value when it's 
supposed to (to my knowledge).
Stanley gave me an alternate test case, which succeeds after applying the patch:

http://lonely.eng/Blackwood/src/OJI/PluginTest/sunw/LiveConnect/html/sjavatoscri
pt.html
Marking nsbeta3++ per ekrock's comments.
Whiteboard: [nsbeta3++]
I didn't notice this originally, but a return NS_ERROR_FAILURE; was removed in 
Jeff's patch, leaving us with

+    if (NS_FAILED(secMan->GetSubjectPrincipal(getter_AddRefs(principal))) ||
+        !principal)
+    if(m_pJSCX) 
+    {
          ...

which is passing strange: nested if-if-then should indent properly, at least, or 
(better) conjoin using

+    if ((NS_FAILED(secMan->GetSubjectPrincipal(getter_AddRefs(principal))) ||
+         !principal) &&
+        m_pJSCX) 
+    {

Was the original patch intentionally losing the return NS_ERROR_FAILURE?  If not 
then we need to restore that line.

/be
Why should a failure from secMan->GetSubjectPrincipal be suppressed?  I don't 
buy the missing-return-intentional thing, either it was an accident of the 
intention was wrong.  If secMan is running into memory exhaustion, the error 
should be propagated.

If the subject principal get succeeds, but the principal is null, *then* retry 
some other logic.

Or is the problem that secMan->GetSubjectPrincipal returns failure to mean both 
critical failure and non-critical "try something else" status?

/be
In order to make PR3, we need a clean review and checkin on 9/28. If we can't
meet that deadline, this fix will probably be deferred into the RTM cycle.
Brendan:

> Why should a failure from secMan->GetSubjectPrincipal be suppressed?  I don't 
> buy the missing-return-intentional thing, either it was an accident of the 
> intention was wrong.  If secMan is running into memory exhaustion, the error 
> should be propagated.

secMan is not running into memory exhaustion.  The only way 
secMan->GetSubjectPrincipal() can return anything other than NS_OK is if 
nsScriptSecurityManager::GetFramePrincipal() fails.  

>If the subject principal get succeeds, but the principal is null, *then* retry 
> some other logic.

This is what I have implemented.

> Or is the problem that secMan->GetSubjectPrincipal returns failure to mean 
> both critical failure and non-critical "try something else" status?

As far as I can tell, the real measure of GetSubjectPrincipal's success is 
whether or not the principal is null.  I'm attaching a patch that uses this 
logic.

Please let me know if you approve.

Thanks,

Ed
/be


Ed: secMan->GetSubjectPrincipals could run into what appear to me to be serious
errors at
http://lxr.mozilla.org/mozilla/source/caps/src/nsScriptSecurityManager.cpp#821
and
http://lxr.mozilla.org/mozilla/source/caps/src/nsScriptSecurityManager.cpp#1534
-- these are not due to memory exhaustion, from what I can tell, but they're not
good situations.  They indicate a "predicate failure" -- some earlier code that
should have run and ensured a principal was associated with an object or script
has failed to do so.  Such a failure sounds exploitable by security attackers.

So why shouldn't these failures be propagated?  I'm probably picking a bone with
mstoltz here, because Jeff ran into a real problem during applet init, where the
object (page) principal has not been set up yet.  But if there is no object
principal yet, then the new code you're adding, which tries to get the script
global object and from its data, the object principal that secMan can't yet
find, will probably fail also.  Is that the case?

This may even be a jst bug.  We need to get to the bottom of this.

I can approve a variation on your patch that simply adds an "// XXX why aren't
we able to propagate errors here?  See bug 53849" comment before the call to
secMan->GetSubjectPrincipal.  And we need to leave this bug open until it
records new bugs on file to track the open issues.

(mstoltz: there appears to be a bogus test, if (result), at line
http://lxr.mozilla.org/mozilla/source/caps/src/nsScriptSecurityManager.cpp#825
that governs a return NS_ERROR_FAILURE; this should be fixed -- can you file a
bug this and put the bug number here.  Thanks.)

/be
Brendan, here's the diff with the added comment.
Removing nsbeta3++.
Whiteboard: [nsbeta3++]
Bug fix authored by Jeff Dyer.

Reviewed by edburns, mstoltz, approved for trunk checkin by brendan.
Keywords: regression
PDT marking nsbeta3- because it is not clear how much this occurs, what 
percentage of those cases are fixed by this, and how many remain.  Also, the 
author is out of town. Nominating for rtm, please add sufficient information to 
enable triage for rtm.

Keywords: rtm
Whiteboard: [nsbeta3-][need info]
Just got off the phone with PDT.  Due to Jeff being out of town, they decided 
not to accept the fix for the branch.  I have decided not to put it in the 
trunk either.  I'm reassigning this back to Jeff.  Jeff, when you get back:

PDT wants to know:

1. How often do the problems fixed by this bug occurr?

2. How much more work remains to be done on this issue after this bug fix is 
applied?

Assignee: edburns → jeff.dyer
Status: ASSIGNED → NEW
This bug, as I understand it, blocks LiveConnect calls completely. It would be 
good to get this into PR3 if that's still possible.
Mitch, I was denied for PR3, due to concern about Jeff not being in town.
PDT marking [rtm need info]. If this is going to land on the branch, it needs a
code review and a super review. When those reviews are available, please change
the whiteboard to [rtm+] and PDT will review for [rtm++] consideration. Sooner
is better :-)
Whiteboard: [nsbeta3-][need info] → [nsbeta3-][rtm need info]
Ed, can you attach the final patch and I'll reiterate my a=?  Is Jeff back yet?

Phil, this did get approved by me for trunk checkin.

/be
Brendan, jeff's back.  I'll leave it to him.
I have a revised patch that I am testing. I'll try to post it tomorrow 
morning along with comments regarding the issues raised above.
The latest patch (10/5) makes sure that there is a jscontext associated with 
the security context. If there is no subject principal, then the jscontext is 
used to get the principal of the page through the dom. Thus a call to 
JSObject.getWindow() during applet initialization works.

Mitch's question "Are these two prinicpals always the same?" is a valid one. I 
don't know the answer. If they are we can simplify the logic in 
nsCSecurityContext::GetOrigin() and fix the problem addressed next. It is safer 
to leave it the way it is.

I fixed the confusing (but intentional) nested if() if()'s. I picked up Ed's 
change to check for !principal and make the control flow clearer. The problem 
of propagating result codes remains. Since 
nsSecurityManager::GetSubjectPrincipal() returns NS_ERROR_FAILURE when there is 
no principal to be had, we ignore the return value and continue trying to get a 
principal.

Finally, one issue remains. That is that we no long compare certificate ids. 
This means the effective policy for allow Java -> JS calls is that if the 
origin of the applet and the page are the same, then allow the call. Originally 
we had a more restrictive policy that the applet and the page had to have the 
same origin and same signer, which some argue is too restrictive and 
incompatible with Netscape 4.x.

I tried out the patch, and I still see couple problems:

1) Using nsCRT::strcasecmp to compare origin (I assume it is doing case 
sensitive), and it is problematic.

2) Using the patch, I get basic Java->JS call working. However, there is a 
remaining problem: When nsILiveConnect::ToString and nsILiveConnect::Finalize 
is called, we don't need to pass any security context from our side because we 
may not have any (This may be called by the finalizer thread, etc). However, by 
default, it will then call into

    jsj_env = jsj_enter_js(jEnv, NULL, NULL, &cx, NULL, &saved_state, NULL, 0, 
NULL );

in nsCLiveConnect.cpp, and it will result as zero and ignore the call.

For nsILiveConnect::ToString, we then get back NULL as the result. However, for 
nsILiveConnect::Finalize, it will result in browser crash because both the Java 
and JS are trying to finalize the same object!!

In both nsILiveConnect::ToString and nsILiveConnect::Finalize, the call MUST 
succeed no matter what the security context is because there is really no 
security implication. 

Regarding the security concern of not comparing the cert principle, this is 
okay because we only allow JavaScript and Java to communicate if and only if 
they are from the same host. In this case, the applet can obtain all the 
information in the HTML page anyway by connecting back to the same host. We 
won't be able to pass the cert principle because we don't have a common system 
in JS and Java to manage the cert.

Jeff, we see similar problems on Solaris/Linux as well. Hope you will be able 
to fix this problem in time. ;)

My mail server has been dead most of the entire afternoon, so if you need to 
contact me, better call my office or cell phone.


I looked at the problem more. There are more issues:

    For all the calls in nsILiveConnect, the following call will be used to 
obtain the JavaScript Context (cx) to perform the action:

    jsj_env = jsj_enter_js(jEnv, .....);

    The issue is, I don't think it is getting the proper JavaScript context in 
most of the cases. If I just do the following in my applet:

public class Test extends Applet
{
    public void start()
    {
	try
	{
	    JSObject win = JSObject.getWindow(this);
	    win.eval("alert('Hello World');");
	    JSObject doc = (JSObject) win.getMember("document");
	    System.out.println("doc: " + doc.toString());
	    System.out.println("location: " + doc.getMember("URL"));
	}
	catch (Throwable e)
	{
	    e.printStackTrace();
	}
    }
}

All the calls exception getWindow will fail. It means all the calls in 
nsILiveConnect::GetMemeber/SetMember/GetSlot/SetSlot/Call actually fails. This 
is because the cx obtained in nsILiveConnect is not correct.

It will still fail if I put some scripts in the HTML page:

<SCRIPT>
<!--- Hide 
function popHello()
{
   alert("Java To Javascript access is successful");
}

// ends hide here --->
</SCRIPT>

However, if I put the script in the HTML page BUT also call the script from 
Java, like

public class Test extends Applet
{
    public void start()
    {
	try
	{
	    JSObject win = JSObject.getWindow(this);

-->>	    win.call("popHello", null);

	    win.eval("alert('Hello World');");
	    JSObject doc = (JSObject) win.getMember("document");
	    System.out.println("doc: " + doc.toString());
	    System.out.println("location: " + doc.getMember("URL"));
	}
	catch (Throwable e)
	{
	    e.printStackTrace();
	}
    }
}

Once some script is actually get executed, then all the LiveConnect calls will 
succeed. 

I suspend "cx" is not initialized properly until the first call to the script, 
and it makes all the Liveconnect calls to fail until then. This is pretty bad 
because I don't know because it means it fails in most cases unless the user 
workaround it.




. In all the LiveConnect calls, only nsILiveConnect::GetWindow succeeded 
because it doesn't perform anything on cx. However, for the rest of the calls 
in LiveConnect, it will always fail because it doesn't find the properties or 
objects through GetMemeber/SetMember/GetSlot/SetSlot/Call. This is pretty bad

	    win.eval("alert('Hello World');");

Jeff, I tested out your latest patch, and it works after I use the workaround in
Java to make sure the JavaScript context is populated. Otherwise, I see
NullPointerException is being thrown from the applet because nsILiveConnect is
not succeeded.
This latest patch fixes the problems associated with this bug, as verified by 
Stanley. The problem managing JSContexts is unrelated to OJI Security and so 
will get its own bug #.
r=edburns
>------- Additional Comments From Phil Peterson 2000-09-27 23:34 -------
>PDT marking [rtm need info]. If this is going to land on the branch, it needs
>a code review and a super review. When those reviews are available, please
>change the whiteboard to [rtm+] and PDT will review for [rtm++] consideration.
>Sooner is better :-)

The patch is ready. The [rtm need info] is addressed by my comments above. A 
new patch is attached (No response from reviewers@mozilla.org yet.) What else 
should I do to get this bug fix done? 
Ok, nits only.  Please fix indentation and bracing to match the prevailing style 
(make Emacs do the work, per the modeline comment on line 1).  Example of wacky 
indentation:

+    else if(context->java_applet_obj)
+    {
+        java_applet_obj=context->java_applet_obj;
+    }
+
+       JSContext *pJSCX    = 
map_jsj_thread_to_js_context_impl(nsnull,java_applet_obj,jEnv,errp);
+    nsCOMPtr<nsIPrincipal> principal;
+
+       if (pNSISecurityContext != nsnull) {
+
+        if(pJSCX) 
+        {
+            JSPrincipals *jsprin = nsnull;
+
+            nsCOMPtr<nsIScriptContext> scriptContext = 
(nsIScriptContext*)JS_GetContextPrivate(pJSCX);
+            if (scriptContext)
+            {
+                nsCOMPtr<nsIScriptGlobalObject> global = 
scriptContext->GetGlobalObject();
+                NS_ASSERTION(global, "script context has no global object");
+
+                if (global) {
+                   nsCOMPtr<nsIScriptObjectPrincipal> globalData = 
do_QueryInterface(global);
+                   if (globalData) {

If you don't do this for the branch, do it in a subsequent trunk checkin that I 
will be happy to review.

Please file that other JSContext management bug if you haven't (and any other 
unfiled bugs -- the fact that GetSubjectPrincipal fails uselessly when there is 
a fallback via DOM interfaces that can compute a valid principal still seems 
like a bug to me), and put the bug #s here in a comment.  Thanks.

a=brendan@mozilla.org

/be
>Please fix indentation and bracing to match the prevailing style

Done. See final patch attached here.

>Please file that other JSContext management bug if you haven't 

Done. Its #55990.

>(and any other unfiled bugs -- the fact that GetSubjectPrincipal fails
>uselessly when there is a fallback via DOM interfaces that can compute
>a valid principal still seems like a bug to me)

It is not a bug that GetSubjectPrincipal does not provide a principal
in all cases. It is a bug that it returns NS_ERROR_FAILURE in the 
normal case that it can't provide one. Need a little API redesign
here. Mitch, have you opened a bug for this yet?

Marking as [rtm+] to initiate PDT review for [rtm++].
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm++]
Whiteboard: [nsbeta3-][rtm++] → [nsbeta3-][rtm+]
Just want to reassure the PDT that Jeff changed only white space (I verified by 
eyeball), so r= and a= should still apply.

/be
> It is not a bug that GetSubjectPrincipal does not provide a principal
> in all cases.

My concern was that GetSubjectPrincipal failed to find a principal, yet the LC 
code could cons one up from DOM interfaces.  What prevented the "subject" from 
doing the same at an earlier time, so that GetSubjectPrincipals could succeed 
when called in this case?

> It is a bug that it returns NS_ERROR_FAILURE in the 
> normal case that it can't provide one. Need a little API redesign
> here. Mitch, have you opened a bug for this yet?

Agree on this point, in any event.  NS_OK with a null out parameter is better 
than debasing and overloading NS_ERROR_FAILURE.

/be
>What prevented the "subject" from doing the same at an earlier time, 
>so that GetSubjectPrincipals could succeed when called in this case?

I could be wrong about this, but I think the script security manager
considers the subject principal to be the principal of the script
on the JS call stack. A null subject principal means that there is 
no JS stack, and hence no subject. Perhaps the fact that the subject
principal (when one exists) and the page prinicpal (which always
exists) are always the same, obscures the need for the distinction.
PDT would like a review from mstoltz. marking [rtm need info]
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
I noticed this comment:
// If one or the other context is signed, the comparision will fail.
in the section of code that compares certificates, which is ifdef'd out. I
assume this condition, that the comparison fails if one context is signed, holds
true even though this bit of code is ifdef'd out. If not, it needs to be. If we
can't compare certificates, we need to disallow access when certificates are
present. The rest of the patch looks good. When this is checked in, I'm going to
ask Guninski to try and break our LiveConnect security.

r=mstoltz with the above caveat.
Reviewed by Mitch. Marking [rtm+].
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
Blocks: 54339
PDT needs info: please land on trunk. Per MStoltz, please ask Guninski to try to
break this. 
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
PDT: Please do not make feedback from Guninski a condition for getting this
checked into the branch. I will ask him to look at this, but it could easily be
a week or two before we get any feedback from him, and by that time we may not
be able to get this fix in at all. Restoring rtm+ for reconsideration.
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
PDT still needs info: would like some assurance that this is safe and worthwhile
relative to the risk.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
The risk of not applying the patch is that Liveconnect is broken in a very 
visible way. The risk of applying the patch is that a potential security hole 
is created. Specifically, unsigned applets can access the dom with the same 
privileges as the page (possibly signed) that hosts it. Since the page controls 
which applets are loaded, it is only a security concern if the author of the 
signed page does not know the LiveConnect security policies.
Marking [rtm+] for consideration.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
My earlier concerns have been resolved to my satisfaction. I am comfortable with
the risk level here. Jeff and the Sun team have made security the first priority.
PDT says need info: Please land on trunk, and test there for 48 hours, then
renominate with some justification for why this is worth the risk.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
Jeff, I'll land this on the trunk tonight or tomorrow morning.  You verified
that it works right?
Yes it works. I plan to land it in the early morning (if the tree is open.) 
Check to see if it's done if you haven't done it by then. (I'll do the same.) 
This is a good call by the PDT. Let's get it tested so they know it's a good 
risk.
-jd
The patch has landed on the trunk.
Blocks: 14666
It's been 48 hours. Marking [rtm+] for review.
FYI-14666 is reopened because of this bug. 
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
This bug is in candidate limbo for the moment.  When we have a candidate build 
in our pocket, we'll come back and consider approviing this.  Until then, we 
can't approve it.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm+][InLimbo-OOH]
Guninski is evaluating the security of this checkin.
We are moving toward the candidate.  Please check this fix into the trunk so we 
can get get some cook time.
The patch was checked into the trunk on 10/19 per the PDT's instructions of 
10/18.
PDT marking [rtm++]. This bug is now out of limbo and approved for checkin to
the branch. Please check in ASAP.

Whiteboard: [nsbeta3-][rtm+][InLimbo-OOH] → [nsbeta3-][rtm++][InLimbo-OOH]
Whiteboard: [nsbeta3-][rtm++][InLimbo-OOH] → [nsbeta3-][rtm++]
Patch has landed.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
This fix may be the cause of 58551.  Can you help us find out?
Wait.  I think mscott et al. may have found a way that Jeff's fix tickled a 
latent bug, bug 31847.  The finger of blame is moving.

/be
Brendan, Without Java running (as in my situation, I believe), I don't see how 
Jeff's code could change anything.
Jband: I agree in your situation, but I took mscott et al. at their word, and 
they seemed to find a correlation.  Whatever the combination of causes, the one 
true bug to fix here is bug 31847, I think.

Although now at least selmer (for the PDT) seems to want a full diagnosis, which 
is worthwhile, but which should not lead to valid fixes being backed out just 
because they happened to expose 31847 (or so I say!).

/be
Since mscott backed out the change to fix bug 58551 I'm reopening this bug. 
Shouldn't it?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Yes, this should be reopened.
mscott said [in bug 58551]:
>Jeff, I'll check your changes back into the branch again once 
>PDT starts taking in limbo bugs again. 

What should the status whiteboard say?

Whiteboard: [nsbeta3-][rtm++] → [nsbeta3-][rtm++][InLimbo-OOH]
Marking rtm+ so we can go back into limbo with this change, now that it seems
not to be the cause of yesterday's big regression. I'll ask mscott to check it
back in, if that's ok with you Jeff.
Whiteboard: [nsbeta3-][rtm++][InLimbo-OOH] → [nsbeta3-][rtm+][InLimbo-OOH]
That would be great. Let me know if I can help.

-jd
rtm++, please checkin ASAP so we can build today.
Whiteboard: [nsbeta3-][rtm+][InLimbo-OOH] → [nsbeta3-][rtm++][InLimbo-OOH]
I checked this back into the branch again.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
This must be backed out per discussion with mstoltz.  Please bring back the 
complete fix with new reviews and [rtm+] status.
Whiteboard: [nsbeta3-][rtm++][InLimbo-OOH] → [nsbeta3-][rtm need info][InLimbo-OOH]
Backed out. Jeff and I are working on a fix for Bugscape 3109, which should be
checked in along with this patch.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
huh? Arrgh. Removing InLimbo-OOH tag.
Whiteboard: [nsbeta3-][rtm need info][InLimbo-OOH] → [nsbeta3-][rtm need info]
rtm-, can't stop ship for this.
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm-]
"New Netscape 6! Now without LiveConnect support!"
Since there is a respin planned, this bug is not trying to "stop ship".  It's
trying to get on the ship before the ship leaves.  Why should it not be allowed
on board?

/be
The respins of the N6 RTM branch at this Very Late point are now restricted to
including must fix problems (defined by marketing and legal requirements), as
well as high reward, small size, low risk changes, with narrow code-path impact,
that can be verified by QA in time for our target dates.  This patch, which
induced regressions in the second N6 limbo landing, is now too risky for
inclusion.  We can't possibly test and shake out the security ramifications of
this landing in time for our release dates.

Hopefully this will get a thorough shake-out on the trunk, and can be included
in the next release.
jar: "induced regressions" is plural, but I was involved in reviewing a patch to
fix only one regression, in security policy (bugscape 3109).  Jeff's patch did
not have anything to do with bug 58551 (which was essentially a dup of 31847). 
Were there any other regressions?

/be
Jim -
I understand that the criteria for assessing the value of a fix is subjective. 
None-the-less, I'm left wondering if the PDT has considered fully the risk of 
disabling LiveConnect calls from Java to JavaScript. The risk, of course, is  
of losing content developers because an key feature they have long depended on 
to do their jobs, has been erased. -jd
I do understand that risk management is going on.  However, this bug has been in 
since last week, has gotten good bake time, and has specifically been 
scrutinized against another problem, and it was evidently discovered that this 
patch had no effect on that other problem.  Of *course* this patch might cause 
other problems in the browser, but that's true of many, many patches, large and 
small.  The point is, you rolled the dice, took this in, and it didn't break 
anything for over a week.  If there's a security hole that somebody finds down 
the line, it'll get fixed right away.  But from what we know now, this patch 
seems to do good, not evil.  Why not take it?

It's patch number two on a pretty fair-sized list of patches.  The patch was 
actually reviewed several times and has been in the trunk since October 19.  I 
can't say that many of the other recent patches have been in that long.
I must jump in here.  Oracle Tools division NEEDS full liveconnect 
functionality in our Cherokee HTML form tool.  This is our HTML next-generation 
of Developer 2000.  It will be used by many of our Oracle Applications product. 
Without it, we will only be able to run on MSIE.  This is not good for us and 
for Netscape.  I think it wise that you make this functionality a MUST-FIX and
get it working as soon as possible.  I agree with George and Brendan on this.  
You should not lose functionality by moving up to a new version.  What if apps 
that depend on liveconnect now move up to NS6 and suddenly they don't work.  
That would not look very good and further hurt Netscape's marketshare.  Please 
don't just consider this from a purely technical standpoint.  
There are two points that need to be clarified here:

1) "This patch, which induced regressions in the second N6 limbo landing ..."
IIRC this statement is factually incorrect. After the regressions in the second
N6 limbo landing, some people initially believed this patch to be at fault and
it was backed out because of this suspicion, but this patch was later determined
to be not at fault for the regressions, yes?

2) "... is now too risky for inclusion." Even if this patch is indeed blameless,
I understand that some may argue (independently of the question of whether this
patch caused the earlier regressions) that we are now too close to release to
accept this patch due to the security model implications, even though we could
have taken the risk a week ago. In the end, this is a subjective judgment call.
Isn't it the case however that this patch has been baking on the trunk for over
a week with no ill effects however? In addition to that, we have multiple senior
engineers arguing that the patch should be low risk on the merits.

3) A third point: we now have the additional information that this breaks an
Oracle application, which is evidence of actual customer impact, increasing the
desirability of accepting this fix.

We need to keep in mind that since we already have a certified fallback build
and are working on certifying the limbo 2 build, here are the risks we would be
taking by accepting this patch:

a) if we accept the patch on the branch and (in spite of the fact that it's been
baking on the trunk without incident) it causes unacceptable regressions we
detect, we'd have to go with an earlier fallback build rather than the final,
losing the benefit of the other patches that made it into the later limbo builds

b) if we accept the patch on the branch and (in spite of the fact that it's been
baking on the trunk without incident) it causes unacceptable regressions we
*don't* detect by RTM, we'd ship a product with flaws that either would impact
the usability of the first release in ways we chose to live with or would force
an emergency point release from the branch

It's a judgment call. drapeau, brendan, jeff, mike -- would you be available to
be dialed in to today's 3 p.m. PDT meeting? If so, please voicemail the #s
you'll be at to my Netscape extension. (Will email you that # separately; my
receive email access is dubious today.)
For clarity: the point I'm making with my previous comments is that accepting
this patch won't be taking this risk of causing us to miss our ship date,
regardless.
Marking [rtm+] to trigger reconsideration at today's pdt meeting based on above
additional input from all sources. The r/a for the previous check-in should
still count since this bug turned out to be wrongly blamed for the limbo
regressions, IIRC.
Whiteboard: [nsbeta3-][rtm-] → [nsbeta3-][rtm+]
The regression in question is bugscape 3109 and any other unknowns that might be
introduced by this patch on the branch.

Eric's comments would make good sense in the context of a limbo 3 build.  We are
not doing a limbo 3 build because there is not enough time left.

The additional risk this patch would add is that the stop-ship bugs that were
taken might not be in the certified build.  The inclusion of this fix in a limbo
build was opportunistic, but 3109 caused it to miss that build.
Please note that we now have a well-tested patch for bugscape 3109.
selmer> The regression in question is bugscape 3109 and any other unknowns that
selmer> might be introduced by this patch on the branch.

If this is addressed at my questioning jar's use of "regressions" (plural), it 
won't do.  You can't hand-wave hypothetical unknowns into a "regression" set of 
one element and use plural to inflate your claim, at least not with me.  Kindly 
drop this word-play, it's weaselly.

About the rest of the reasoning (opportunistic, limbo), it seems to me that 
process is now king.  Is that just to get CDs burned by a certain date?  That's 
not a good reason to break LiveConnect Java=>JS in Netscape 6, IMHO.

/be
The regressionS were written up in bugscape 3109, and fortunately could be
handled by a patch in a single area.  When I heard about the issueS, there were
several items mentioned as I recall, but all surrounding the one fix in the end.
 The issue of singularity vs plurality seems of no consequence. I was trying to
explain why this bug had not been included in the limbo 2 build.  It was because
of the problemS we identified internally in our commercial build (1, 100 or
1000, it wouldn't matter).

At this point in time (beyond even limbo 2), we can no longer afford to mix in a
code landing of this size, and be able perform requisite regression tests and
ship on our target date.  I think most contributors were aware that when we
started taking limbo 1 and 2 fixes, that there was a chance that none of the
limbo landings would actually be included in Netscape's RTM.  At this point, we
no longer have the time to accept landings of this magnitude on the N6 branch.



You mean, again.  No time to put this fix into the branch, again.  It was
already in once, didn't break anything, and was taken out.

"The issue of singularity vs plurality seems of no consequence."

Your use of plural was puffery to inflate the case against this patch, and you
know it.  It could be easily read to imply that this patch caused bug 58551, and
it was read that way by drapeau.  I have read bugscape bug 3109, and it concerns
one problem, not several.  Its reporter wrote, in the initial comment, "The
problem is LiveConnect allows calling xxxxYyyyyy" (where xxxxYyyyyy is a single
well-known method).

I'm sick and tired of this weasel-wording.  Arguing about risks requires being
exact about the number and kind of regressions, and eschewing inflated, vague,
windy, political language.

/be
Adding "mozilla 0.6" keyword. Hey, if the PDT decides to continue like this,
then we could just get all the liveconnect
developers to switch from Netscape to Mozilla. How ironic would THAT be?

Keywords: mozilla0.6
To clarify again for brendan: I never suggested that bug 58551 was caused by
this bug. You and I both know it was not, and I will assert again today that it
was not.  I never mentioned that other bug, and it would be simpler to read this
bug if you would stop referenecing it, and suggesting a connection (or worse
yet, suggesting I raised the point).

The regressionS for N6 that were reported to me and the PDT (before ANY bug was
filed) were later fixed by the resolution given in the singular bug 3109 in the
Netsacpe internal database.  I don't know of any other bug reports relating to
to regressions.

Per your email request, I tried to carefully explained in this bug why the PDT
had not taken the patch in limbo 2 (in fact, we rolled it in and then back
during the limbo 2 landing period).  Your email indicated that there was no
explanation of why we were forced to roll it back out, and asked us to publicize
that.  You have now been attacking each comment I've made in servicing your
request. I don't understand what you are asking for at this point as an
explanation.  I've been very precise.

If we are given the chance to adjust our schedule, then it would of course be
highly desirable to take and test this change, along with the regression patch
offered in 3109.  At this point, we cannot afford to take this change.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm-]
jar, there you go again.  I'm not attacking each and every comment you make,
only the vague and misleading ones.  Please specify the plural regressions, as
in distinct bugs, caused by this patch.  There is only one, it is described by
the bugscape bug, and it has been fixed.  I hope that means it will soon be
moved to bugzilla so everyone can judge the singularity of the problem it
reports.  Of course, you could still assert that you and the PDT heard about
multiple regressions, but if you can't specify any others, why should anyone
believe you?

When you're right, you are right and I either say so in public (see bug 48064)
or (better) simply stand back, avoid glad-handing and rah-rah'ing, and let the
facts and the involved community members speak for themselves.  When you are
wrong and the facts are not in everyone's view due to bugscape, I'm a pit-bull. 
Get used to it!

/be
Brendan, I'm not getting in the middle of your argument with Jar.  Please don't
misconstrue my comment in that way or include me in your name calling.  I'm also
unhappy that this bug was mistakenly backed out of the limbo build.  Regardless,
it happened and it happened at a very bad time.

At this moment, there is one _known_ regression in 3109, a large patch in this
bug, and zero time.  There is great fear of the unknown because there is zero time.

The mistaken backout was unfortunate, but arguing about it won't change the ship
date or the need to drastically minimize risk in the candidate builds.  The
reality is that we *must* attempt to have the CD build identically match the
download build.
Setting target milestone to Mozilla 0.6. Maybe if marketing realizes that people
will be able
to do Oracle management along with other important corporate tasks with the
non-commerical version of the browser that they won't be able to do with the
commerical version, they will be sufficiently corporately embarrassed to let
this in to RTM.

Also adding ns601 keyword that (going by the current decisions made by the PDT)
will probably be released 2 weeks after RTM.

Keywords: ns601
Target Milestone: M18 → mozilla0.6
selmer: is the CD build the tail that should not wag the 6.0 dog?

jce2: please stop using ns601 -- it's ill-defined and badly named.  It is not 
likely to mean what you think here (a 6.01 security firedrill release done in 
the next little while; some people have used it to mean the next non-firedrill 
release).

Also, as you do not work for netscape.com, and as we have long ago passed 
nsbeta[123] and now have mozillaX.Y milestone keywords with which to nominate 
bugs, setting keywords that start with ns is something to be avoided unless 
you've gotten some Netscaper's blessing, and everyone knows it.  Jar has already 
asked you please to stop thrashing status whiteboards, so I don't believe that 
you are blessed or deputized.

What's more, mozilla0.6 stands for a mozilla.org release off the Netscape 6 
branch that we're contemplating *only* to ensure that MathML and other #ifdef'ed 
Gecko extensions can be built and dropped into a Netscape 6 installation.  It is 
not for random fixes that didn't make it into Netscape 6.  Such fixes belong in 
the Mozilla trunk.  This is all per the roadmap.

I think your frustration about this patch not making it could be better directed 
toward demonstrating that the patch is safe and sound, in case the PDT or parts 
of it reconsiders this bug in light of a respin.

/be
pdt: marking rtm++ , please check in ASAP into the branch; bugscape bug fix must 
go in a the same time.
Whiteboard: [nsbeta3-][rtm-] → [nsbeta3-][rtm++]
has this fix been checked in yet?  I'm waiting to respin.
Please check in soon.  We are only waiting on this bug fix to be checked in 
(plus bugscape 3109).
Fixes are checked in on the branch for this and bugscape 3109.
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Will the bug fix be available on the Windows nightly binary builds available on 
the Mozilla site?  I cannot test this functionality until the fixes are in these 
builds.  This is very important to my development effort.  Thanks.
Per Gary's question: will this be checked in on the trunk as well? Or is there a
reason that it won't--such as the desire to create some more-complete fix?
Thanks to Ed Burns for providing me the testcase files! The test 
"sjavatoscript.html" (java to javascript access) works now with the latest 
branch build on windows(110616). I see the alert dialog saying " Java to 
Javascript access is successful" coming up on this test. Not sure if this is 
checked in the trunk,so.. am not adding the "vtrunk" keyword. Somebody, pls 
respond back telling if this is going to be in the trunk build (so that I can 
close this bug and get off the rtm radar). Thanks !!
I must agree with Gary Kind from Oracle here. The core product from our company
(Caplin Systems) Uses this type of Java -> Javascript function call. It would be
extremely disadvantageous to both Netscape and ourselves if this funcitonality
were not present in the release of NS6. It would result in our clients needing
to exclusively use IE browsers..... and this can't be good.
shrirang is in Netscape QA and has verified this functionality is in our RTM
candidate builds of Netscape 6.

Need to know if this is checked into the trunk; I recall yes, but am not sure.  
This is checked into the trunk (afaik it was never backed out of the trunk). As
of Thursday it worked on the trnk. As of yesterday, the security fix (bugscape
3109) is also in the trunk. 
Please take a look at bug 59542 if it is related to the fix here.
Keywords: ns601highrisk
This bug [53849] has been fixed. If you think the fix had the effect of 
breaking something else, it is enough to say so in a bug that describes the 
effect. If you would like to get the attention of the owner of this bug, or any 
one else, please send them a message directly. My fear is that this fix has 
become a too easy target of blame for too many others.
Thanks for hearing me on this.
-jd 
adding keyword: vtrunk
Keywords: vtrunk
Keywords: highrisk
*** Bug 51640 has been marked as a duplicate of this bug. ***
Blocks: 51640
verified on windows trunk 110604. 
Status: RESOLVED → VERIFIED
Keywords: vtrunk
Keywords: mozilla0.6
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: