Closed Bug 33337 Opened 26 years ago Closed 24 years ago

Universal Charset Autodetection

Categories

(Core :: Internationalization, defect, P1)

All
Other
defect

Tracking

()

RESOLVED FIXED
mozilla1.0

People

(Reporter: bobj, Assigned: shanjian)

References

Details

(Keywords: helpwanted, intl, Whiteboard: need sr / since 7/8)

Attachments

(3 files)

No description provided.
Status: NEW → ASSIGNED
Target Milestone: --- → M20
Target Milestone: M20 → M18
Target Milestone: M18 → M17
Need to support this to greatly improve usability for users viewing unlabeled pages in different encodings. IE has this feature, and it's a competitive disadvantage not to support it.
Reassigned to shanjian. He has the 3rd party technology and is working with ftang to integrate it using ftang's auto-detection interface. We may need to open a "confidential" bug in the Netscape bug system to discuss details of the 3rd party. However, we should leave this bug open in case someone wants to offer an open-source alternative technology.
Assignee: bobj → shanjian
Status: ASSIGNED → NEW
Here is the plan:
1. Build it in the commercial tree.
2. Initially target Win32.
3. Use the 3rd party library without modifying any source or headers.
4. Port the Solaris version of the 3rd party library to Linux.
5. Port the 3rd party library to Mac.
Windows should be complete by 6/1. Shortly (TBD) after, we will port to other platforms. mkaply, are you interested in helping to port to AIX?
Keywords: nsbeta2
This is a really critical feature to have in the NS6 product. The contract with a 3rd party is already complete, so we basically get this feature for free, except that we need to do the work to incorporate it into the build and port it to Mac and Linux. We are also using this feature in the AutoTranslate service, so most of its bugs and performance problems have already been flushed out.
Why is this important?
What's Happening on the Web: The international population on the web is increasing dramatically, so new users will be less technical. Less than 2% of web pages are tagged with encoding and/or language info. From UE studies we've found that, at worst, users can't find the Encoding menu. Many users can find the menu, but it is challenging to figure out and time-consuming to change it over and over again when they encounter web pages that are not in their preferred encoding.
Ease of Use: With this feature in place, very few customers will need to use the Encoding menu. They will only need it if and when LanguageDetect fails.
The Competition: This feature is currently available in IE 5.0.
What does the code offer?
Platforms: The Technology will be designed to operate on Sun Solaris and Microsoft Windows 95/98/NT. We get the source code to port to other platforms.
Languages: The Technology shall be able to detect, at a minimum, any of the following languages on a web page: Albanian, Arabic, Basque, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, or Turkish.
Status: NEW → ASSIGNED
This is on PDT's feature watch for exceptions tomorrow night.
Whiteboard: [NEED INFO]
PDT just added "[NEED INFO]" to this bug. What info is needed?
This is an nsbeta2+ feature exception for Windows only. Mac and Linux are being worked on by outside resources. We will not hold beta for them but we will take the fixes.
Whiteboard: [NEED INFO] → [nsbeta2+] Windows only
Bug 1058 in bugscape should be marked as nsbeta2+. This bug will remain open until an open-source solution is provided.
Removed nsbeta2 here and added it to 1058 in bugscape.
Keywords: nsbeta2
Whiteboard: [nsbeta2+] Windows only
Bugzilla's automatic bug linker created a bogus link in the 06-02 comment. This is a bugscape bug and the correct link is http://bugscape.netscape.com/show_bug.cgi?id=1058
Note that the bugscape bug http://bugscape.netscape.com/show_bug.cgi?id=1058 is for the Netscape commercial build only, because it uses a library licensed from a 3rd party. This bugzilla bug remains open to encourage someone to provide an open-source alternative that can also be used in Mozilla. Reassigned to nobody@netscape.com and added the helpwanted keyword.
Assignee: shanjian → nobody
Status: ASSIGNED → NEW
Keywords: helpwanted
Target Milestone: M17 → Future
Moving to mozilla.org
Assignee: nobody → nobody
Keywords: intl, nsbeta1
Changed QA contact to ylong@netscape.com.
QA Contact: teruko → ylong
*** Bug 81936 has been marked as a duplicate of this bug. ***
yokoyama, please move the Netscape code from /ns/intl/chardet to /mozilla/extension/chardet after removing the 3rd party code.
Assignee: nobody → yokoyama
Target Milestone: Future → ---
ftang: do we have an open-source alternative that can also be used in Mozilla? Moving the code from /ns/intl/chardet to /mozilla/extension/chardet is not hard, but what do we replace the code with after removing the 3rd party?
Status: NEW → ASSIGNED
The 3rd party code is currently used as a fallback inside an #ifdef, so simply removing the #ifdef .. #endif part should work fine. Also remember to replace any file name that contains the 3rd party code/company name with some other name, and ask me for a code review BEFORE you put in the patch so we don't cause any legal issues. Marking it as P1, moz0.9.3.
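To illustrate why deleting the guarded block is enough, here is a minimal C sketch of the fallback pattern described above. Every name in it (HYPOTHETICAL_USE_3RD_PARTY, open_detect, third_party_detect, detect_charset) is hypothetical and not taken from the Mozilla tree. The toy ASCII rule in open_detect only stands in for a real detector.

```c
#include <stddef.h>

/* Hypothetical open-source detector: returns a charset name, or NULL when
   it cannot decide. Toy rule for illustration only: pure 7-bit input is
   reported as ASCII, anything else is left undecided. */
static const char *open_detect(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return NULL;
    return "us-ascii";
}

#ifdef HYPOTHETICAL_USE_3RD_PARTY
/* Stand-in for the licensed fallback. Only ever compiled when the macro
   is defined; this is the part that gets deleted. */
static const char *third_party_detect(const unsigned char *buf, size_t len);
#endif

/* Hypothetical entry point: try the open detector first, then fall back
   to the 3rd party code only in builds that define the macro. */
static const char *detect_charset(const unsigned char *buf, size_t len)
{
    const char *charset = open_detect(buf, len);
#ifdef HYPOTHETICAL_USE_3RD_PARTY
    if (charset == NULL)
        charset = third_party_detect(buf, len);
#endif
    return charset; /* NULL means "no decision"; caller keeps its default */
}
```

With HYPOTHETICAL_USE_3RD_PARTY left undefined, the fallback never compiles in, so stripping the #ifdef/#endif blocks leaves the open build's behavior unchanged.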
Priority: P3 → P1
Target Milestone: --- → mozilla0.9.3
shanjian: I have a patch for this. ftang recommended that I contact you before I put in the patch. I'll contact you soon.
=========== pasting an email from shanjian ============================
Your name change seems complete to me. It seems that you generated 2 new numbers for NS_UNIVERSAL_DETECTOR_CID and NS_UNIVERSAL_STRING_DETECTOR_CID. That is the right thing to do. r=shanjian
Whiteboard: got /r=shanjian.
Checked in; changing QA contact to ftang.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
QA Contact: ylong → ftang
Resolution: --- → FIXED
Several of the files checked in have this notice at the top instead of the NPL/MPL:
 * CONFIDENTIAL AND PROPRIETARY SOURCE CODE
 * OF NETSCAPE COMMUNICATIONS CORPORATION
Can you please fix the licenses?
+ {
+   if ((unsigned char)*str >= (unsigned char)0xc4)
+     return 94*((unsigned char)str[0]-(unsigned char)0xc4) + (unsigned char)str[1] - (unsigned char)0xa1;
There's code like this all through this patch, and there are almost no comments. Can you please go through the code and add comments to all of the bit magic? It's completely unreadable as is.
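As an example of the kind of commenting requested, here is a sketch of what the quoted line appears to compute. The sketch assumes a two-byte EUC-style encoding whose trail bytes run 0xA1..0xFE (hence 94 cells per row). Under that assumption, the line maps a character to a zero-based cell index in a table whose first row starts at lead byte 0xC4. The helper name euc_order and the macro names are hypothetical. The trail-byte and upper-bound checks are additions that the quoted line does not have.

```c
/* Trail bytes in a 94x94 EUC-style table run from 0xA1 to 0xFE,
   so each row (one lead byte) holds 94 cells. */
#define CELLS_PER_ROW 94
#define FIRST_LEAD  0xC4 /* first lead byte covered by this table */
#define FIRST_TRAIL 0xA1 /* first valid trail byte */
#define LAST_BYTE   0xFE /* last valid lead or trail byte */

/* Hypothetical helper: maps a two-byte character to its zero-based cell
   index in the table, or -1 if the bytes fall outside it. */
static int euc_order(const char *str)
{
    /* Plain char may be signed; without the cast, a byte such as 0xC4
       would read as a negative number and the comparisons would fail. */
    unsigned char lead = (unsigned char)str[0];
    unsigned char trail = (unsigned char)str[1];

    if (lead < FIRST_LEAD || lead > LAST_BYTE)
        return -1; /* outside the rows this table tracks */
    if (trail < FIRST_TRAIL || trail > LAST_BYTE)
        return -1; /* not a valid cell (check absent from the quoted line) */

    /* Rows skipped times row width, plus column within the row. */
    return CELLS_PER_ROW * (lead - FIRST_LEAD) + (trail - FIRST_TRAIL);
}
```

For example, the bytes 0xC4 0xA1 map to cell 0, and 0xC5 0xA1 map to cell 94, the first cell of the next row.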
re-opening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
shanjian: since you wrote the code, can you add the comments? While you are at it, can you change the licenses as well? Thanks
Assignee: yokoyama → shanjian
Status: REOPENED → NEW
I spent half of the day adding those comments. They might still not seem enough. I have 2 documents in "http://jazz/users/shanjian/publish/UniversalCharDet/". One is the paper that I am going to submit to IUC; it introduces the basic idea and algorithm. The other is an explanation of the files and code, which will be very helpful in reviewing the code.
Whiteboard: got /r=shanjian. → need sr / since 7/8
1. shanjian: your latest patch is x-zip-compressed; please submit it as cvs diff -u.
2. Create a mozilla/extensions/universalchardet/doc directory and check in your paper.
Status: NEW → ASSIGNED
The document giving a bird's-eye view of the code has been checked in. The paper should not go public until the conference is over.
Pushing this to 1.0 until we can get an sr. Will reset the milestone then.
Target Milestone: mozilla0.9.3 → mozilla1.0
sr=blizzard
fix has been checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED