Closed Bug 84420 Opened 23 years ago Closed 23 years ago

BeOS and OS/2 should use Unix Async DNS Implementation

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.2

People

(Reporter: gordon, Assigned: mkaply)

Details

(Whiteboard: critical for 0.9.2)

Attachments

(3 files)

proposed patch for adding eviction of dns cache entries to OS/2 and BeOS. 23 years ago gordon 1.74 KB, patch		Details \| Diff \| Splinter Review
Optimistic patch for OS/2 and BeOS to use same async implementation as Unix. 23 years ago gordon 5.48 KB, patch		Details \| Diff \| Splinter Review
Fix to minimal patch for PR_CLIST_IS_EMPTY typo. 23 years ago gordon 696 bytes, patch		Details \| Diff \| Splinter Review

gordon

Reporter

Description

•

23 years ago

The DNS cache doesn't evict lookups on BEOS and OS/2.  We need to call 
AddToEvictionQ() after processing the request on new/pending lookups for 
platforms that don't use the dns thread for async lookups.

gordon

Reporter

Comment 1

•

23 years ago

Mike, do you test on both BEOS and OS/2, or just OS/2?  Could you test a patch 
once I get it done?

Status: NEW → ASSIGNED

Priority: -- → P3

Target Milestone: --- → mozilla0.9.2

timeless

Comment 2

•

23 years ago

afaik mkaply is just a concerned citizen and doesn't have a BeOS box.  I 
know cls builds on BeOS.

Summary: DNS cache doesn't evict on BEOS and OS/2 → DNS cache doesn't evict on BeOS and OS/2

benc

Comment 3

•

23 years ago

Someone else is going to need to pick up testing this.

Keywords: qawanted

gordon

Reporter

Comment 4

•

23 years ago

Attached patch proposed patch for adding eviction of dns cache entries to OS/2 and BeOS. — Details — Splinter Review

gordon

Reporter

Comment 5

•

23 years ago

If mkaply and cls can review the patch, I'll get an sr= and an a= and check it 
in.

Another possibility is that OS/2 and BeOS may be able to simply use the UNIX 
implementation.  That would clean up the code a bit as well.  I'll submit an 
optimistic patch so someone can try that as well.

Mike Kaply [:mkaply]

Assignee

Comment 6

•

23 years ago

How would I actually test the patch to see if it was working/failing?

gordon

Reporter

Comment 7

•

23 years ago

Attached patch Optimistic patch for OS/2 and BeOS to use same async implementation as Unix. — Details — Splinter Review

gordon

Reporter

Comment 8

•

23 years ago

The second patch is using PR_GetIPNodeByName() for OS/2.  Wan-Teh, is that 
available on OS/2?

Wan-Teh Chang

Comment 9

•

23 years ago

Yes, PR_GetIPNodeByName() is available on OS/2.

gordon

Reporter

Comment 10

•

23 years ago

Setting target milestone to 0.9.3, but I'll check it in as soon as I get test 
feedback and reviews.

Target Milestone: mozilla0.9.2 → mozilla0.9.3

Mike Kaply [:mkaply]

Assignee

Comment 11

•

23 years ago

OK, without any patches applied, DNS works fine.

First patch wouldn't compile - PRCLIST_IS_EMPTY is not defined.

Second patch compiled, but I got this assert coming up:

###!!! ASSERTION: bad DNS shutdown state: 'mState == 
DNS_SHUTTING_DOWN', file D:
/builds/current/mozilla/netwerk/dns/src/nsDnsService.cpp, line 1219

And DNS did not work at all.

gordon

Reporter

Comment 12

•

23 years ago

Attached patch Fix to minimal patch for PR_CLIST_IS_EMPTY typo. — Details — Splinter Review

gordon

Reporter

Comment 13

•

23 years ago

I don't understand the problem with the optimal patch.  I looks like the dns 
thread is failing to wait on the condition variable in DequeueLookup() or getting 
wrongly notified when there are no lookups in the queue.  It's not clear to me 
how this could be happening on OS/2, but not Unix.

Wan-Teh, are there any known differences with PR_WaitCondVar() on OS/2?

Wan-Teh Chang

Comment 14

•

23 years ago

You MUST ALWAYS call PR_WaitCondVar in a while loop.

The following code is wrong.
 
nsDNSLookup *
nsDNSService::DequeuePendingQ()
{
#if defined(XP_UNIX)
    // Wait for notification of a queued request, unless  we're shutting down.
    if (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {
        PRStatus  status = PR_WaitCondVar(mDNSCondVar, PR_INTERVAL_NO_TIMEOUT);
        NS_ASSERTION(status == PR_SUCCESS, "PR_WaitCondVar failed.");
    }
#endif

You should say
    while (PR_CLIST_IS_EMPTY(&mPendingQ) && (mState != DNS_SHUTTING_DOWN)) {

gordon

Reporter

Comment 15

•

23 years ago

Ah, great.  I'll fix it on the trunk as soon as I can push through the review 
process.  I wonder why we haven't seen this as a problem on Unix?

Darin Fisher

Comment 16

•

23 years ago

in the original "async dns" XP_UNIX code that i wrote for bug 10733, we had
something similar to:

DNS thread:
  mon.enter();
  if (!pending_lookup && !shutting_down)
      mon.wait();
  if (pending_lookup)
      do_lookup();
  else
      do_shutdown();
  mon.exit();

client thread:
  mon.enter();
  pending_lookup = true;
  mon.notify();
  mon.exit();

in other words, it seemed ok for there not to be a pending_lookup when the
monitor was notified... it meant to shutdown (note: the shutting_down flag was
just used to handle the shutdown case in which the DNS thread had not yet run).
the new code seems to be basically identical.

on OS/2 it seems to be the case that a pending_lookup is added before the
monitor (convar in the new world) is notified, but after the DNS thread resumes
pending_lookup is for some reason not set.

wtc: why is the while loop needed?  i always thought that a while loop was only
need in the case of multiple threads waiting on a condition upon which only one
thread may act.  is this not the case?  thanks!

Wan-Teh Chang

Comment 17

•

23 years ago

The OS/2 problem may be not caused by not using a while loop
here.  I am merely pointing out that as a rule, you must call
PR_WaitCondVar or PR_Wait in a while loop.  That's required by
the API.

Two reasons the while loop is required.
1. PR_WaitCondVar or PR_Wait may return due to a spurious wakeup.
   Note that an implementation may not actually generate spurious
   wakeups, but they are allowed by the API specification.
2. If multiple threads may call PR_WaitCondVar, another thread
   may be able to enter the monitor sooner than the awakened thread
   can (a thread awakened from a PR_WaitCondVar call needs to re-enter
   the monitor) and make the boolean expression false again.  This
   condition is called a stolen wakeup.

Again, I am not saying that the OS/2 problem is caused by a
spurious wakeup or a stolen wakeup.  In fact, you may be able
to analyze your code and the PR_WaitCondVar implementation on
OS/2 and determine that these don't happen.  I am just saying
that it is a syntactical requirement (one that can't be enforced
by a compiler) that all PR_WaitCondVar calls must be inside a
while loop.

Darin Fisher

Comment 18

•

23 years ago

fair enough... thanks for the thorough explanation :-)

in this case, it is not a stolen wakeup, because we only have one thread 
that may call PR_WaitCondVar.  perhaps it is the spurious wakeup problem,
since we are only seeing this on OS/2 and not any of the unix platforms.

sounds to me like we need to do a scan of all usages of PR_Wait or
PR_WaitCondVar to ensure that they are not subject to similar problems.
I suspect there may be problems with the thread pool implementation.

gordon

Reporter

Comment 19

•

23 years ago

The simple patch has been checked in.  So I'm morphing this bug to state the
desire to have BeOS and OS/2 use the Unix implementation of async dns lookups
(unless those platforms have a native async gethostbyname API that someone wants
to use).

Summary: DNS cache doesn't evict on BeOS and OS/2 → BeOS and OS/2 should use Unix Async DNS Implementation

Target Milestone: mozilla0.9.3 → mozilla1.0

Mike Kaply [:mkaply]

Assignee

Comment 20

•

23 years ago

OK, with the current level of nsDnsService.cpp, Os/2 can follow the 
UNIX path no problem (as per the optimistic patch)

Do we want to assume BEOS can and check in the changes? Or just check 
in the pieces with XP_OS2?

Mike Kaply [:mkaply]

Assignee

Comment 21

•

23 years ago

Gordon,

OS/2 can use the Unix implementation. Do you want to assume that BeOS can? Or 
should I create a patch that just fixes this for OS/2?

gordon

Reporter

Comment 22

•

23 years ago

You know, I'm tempted to change it for both; if it works - great! If it breaks
BeOS and somebody cares, then we should hear from them, which will let us know
who we can work with to resolve the issues.  I suppose we should make an
announcement on netscape.public.mozilla.netlib first.

timeless

Comment 23

•

23 years ago

you could also post to netscape.public.mozilla.beos, ... I just updated my tree 
so i might try a beos build this week.

Makoto Hamanaka

Comment 24

•

23 years ago

Sorry to be late, I tested the 'optimistic' patch on BeOS build and it is
working for me so far.
I'm posting this from patched Bezilla:-)

Mike Kaply [:mkaply]

Assignee

Comment 25

•

23 years ago

Why is this targeted for 1.0?

I'd like to see this in 0.9.2. Don't we leak without this?

Darin Fisher

Comment 26

•

23 years ago

mkaply: gordon is out of town for the next couple of weeks... so, this probably
won't go in for 0.9.2 unless some kind soul takes it... how 'bout you? ;-)

Mike Kaply [:mkaply]

Assignee

Comment 27

•

23 years ago

I'm taking this bug to push for checkin to 0.9.2.

Gordon wrote the code, and I looked at it and said it was OK, so

r=mkaply

Makoto Hamanaka verified it on BeOS, so I am going to take r=VYA04230@nifty.com 
for the BEOS part.

darin: Can I get an sr=darin so I can send this to drivers?

Assignee: gordon → mkaply

Status: ASSIGNED → NEW

Target Milestone: mozilla1.0 → mozilla0.9.2

Darin Fisher

Comment 28

•

23 years ago

sr=darin

Mike Kaply [:mkaply]

Assignee

Comment 29

•

23 years ago

Fix checked in to trunk and branch (branch approval from Asa by mail)

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Christopher Blizzard (:blizzard)

Updated

•

23 years ago

Whiteboard: critical for 0.9.2

benc

Updated

•

23 years ago

Keywords: verifyme

Mike Kaply [:mkaply]

Assignee

Comment 30

•

23 years ago

Verify fixed

Status: RESOLVED → VERIFIED

Simon Paquet [:sipaq]

Updated

•

16 years ago

Keywords: verifyme

Peter Bylenga [:PBylenga]

Updated

•

10 years ago

Keywords: qawanted

You need to log in before you can comment on or make changes to this bug.