Closed Bug 971048 Opened 10 years ago Closed 10 years ago

Run language detection on webpages and display infobar when language is not the current UI locale

Categories

(Firefox :: Translations, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
Firefox 31

People

(Reporter: Felipe, Assigned: florian)

References

Details

(Whiteboard: [translation] p=5 s=it-31c-30a-29b.3 [qa!])

Attachments

(1 file, 4 obsolete files)

When a webpage finishes loading, we need to do the following:

- Collect the webpage text content (using module from bug 971043)
- Run language detection on this content (from bug 971047)
- Display an infobar (from bug 971044) offering to translate the webpage, iff:
    - Language Detection was deemed to be accurate
    - Source language is a supported language
    - Source language is not part of the user's known languages (Options > Content > Languages)
Blocks: 971052
Whiteboard: p=0
Blocks: 973275
Blocks: 973276
Whiteboard: p=0 → [translation] p=0
Depends on: 979424
Status: NEW → ASSIGNED
Assignee: nobody → bmo
Whiteboard: [translation] p=0 → [translation] p=5 s=it-31c-30a-29b.1
Assigning to Bogdan Maris for QA sign-off.
QA Contact: bogdan.maris
Whiteboard: [translation] p=5 s=it-31c-30a-29b.1 → [translation] p=5 s=it-31c-30a-29b.1 [qa+]
Whiteboard: [translation] p=5 s=it-31c-30a-29b.1 [qa+] → [translation] p=5 [qa+]
Whiteboard: [translation] p=5 [qa+] → [translation] p=5 s=it-31c-30a-29b.2 [qa+]
Attached patch WIP (obsolete) — Splinter Review
This is the WIP that I'm currently using to test my other patches.
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Attached patch WIP2 (obsolete) — Splinter Review
Updated WIP applying above the patch landed in bug 974461.
Attachment #8400782 - Attachment is obsolete: true
Depends on: 993338
Attached patch WIP3 (obsolete) — Splinter Review
Different approach compatible with e10s.

Not fully happy with this, mostly because I don't see an easy way to avoid loading cld if we have no translation engine available.
Attachment #8403194 - Attachment is obsolete: true
Depends on: 995321
Whiteboard: [translation] p=5 s=it-31c-30a-29b.2 [qa+] → [translation] p=5 s=it-31c-30a-29b.3 [qa+]
Attached patch Patch v4 (obsolete) — Splinter Review
Attachment #8405423 - Attachment is obsolete: true
Attachment #8406353 - Flags: review?(felipc)
Comment on attachment 8406353 [details] [diff] [review]
Patch v4

Review of attachment 8406353 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/content.js
@@ +410,5 @@
> +    // Grab a 64k sample of text from the page.
> +    let encoder = Cc["@mozilla.org/layout/documentEncoder;1?type=text/plain"]
> +                    .createInstance(Ci.nsIDocumentEncoder);
> +    encoder.init(content.document, "text/plain",
> +                 encoder.OutputWrap | encoder.SkipInvisibleContent);

is OutputWrap necessary?
Could we also add OutputDontRemoveLineEndingSpaces to skip the space trimming?

@@ +411,5 @@
> +    let encoder = Cc["@mozilla.org/layout/documentEncoder;1?type=text/plain"]
> +                    .createInstance(Ci.nsIDocumentEncoder);
> +    encoder.init(content.document, "text/plain",
> +                 encoder.OutputWrap | encoder.SkipInvisibleContent);
> +    let string = encoder.encodeToStringWithMaxLength(64 * 1024);

Let's be conservative and not get close to the edge of 64k.  60k should be enough
Attachment #8406353 - Flags: review?(felipc) → review+
Attached patch Patch v5Splinter Review
(In reply to :Felipe Gomes from comment #6)

> > +    encoder.init(content.document, "text/plain",
> > +                 encoder.OutputWrap | encoder.SkipInvisibleContent);
> 
> is OutputWrap necessary?

No, it was my mistake. I was using OutputRaw before and that didn't give good results. For some reason I thought OutputWrap was the opposite, but I can actually just omit it.

> Could we also add OutputDontRemoveLineEndingSpaces to skip the space
> trimming?

I don't think the loop at http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsPlainTextSerializer.cpp#1379 is really expensive, and I think the more actual text we feed to the language detection library, the better the results will be.

> @@ +411,5 @@
> > +    let encoder = Cc["@mozilla.org/layout/documentEncoder;1?type=text/plain"]
> > +                    .createInstance(Ci.nsIDocumentEncoder);
> > +    encoder.init(content.document, "text/plain",
> > +                 encoder.OutputWrap | encoder.SkipInvisibleContent);
> > +    let string = encoder.encodeToStringWithMaxLength(64 * 1024);
> 
> Let's be conservative and not get close to the edge of 64k.  60k should be
> enough

Ok, changed the 64 to 60 in this new patch.
Attachment #8406353 - Attachment is obsolete: true
> // Grab a 64k sample of text from the page.
nit: update comment
Assignee: bmo → florian
Summary: Run language detection on webpages and display infobar when language are not in the user's preferred languages list → Run language detection on webpages and display infobar when language is not the current UI locale
https://hg.mozilla.org/integration/fx-team/rev/7b4cfd722bcd

There's a try server build for QA at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/florian@queze.net-d50109811367/

Some notes for QA:
- Language detection is expected to run on http and https pages, but not on pages that are part of the browser (about: ...)
- I've intentionally disabled language detection on web pages that contain less than 100 characters of text; it's not a bug if the language isn't detected on really short pages.
- Language detection is now expected to work in both normal windows and e10s windows (File -> New e10s Window); previous test builds with translation UI didn't work in e10s windows.
https://hg.mozilla.org/mozilla-central/rev/7b4cfd722bcd
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 31
Hi Bogdan, confirming if this resolved bug can be verified before the end of the desktop iteration on Monday April 28?
Flags: needinfo?(bogdan.maris)
Yes Marco, I did some exploratory testing around this bug, I encountered two issues bug 997806 and bug 997818 (credits to Florian). I think we can close this issue as Verified.
My testing was done on Windows 7 64bit, Ubuntu 13.10 32bit, Mac OS X 10.9.2 and Surface Pro 2 with Windows 8.1 64bit.
Status: RESOLVED → VERIFIED
Flags: needinfo?(bogdan.maris)
Whiteboard: [translation] p=5 s=it-31c-30a-29b.3 [qa+] → [translation] p=5 s=it-31c-30a-29b.3 [qa!]
Depends on: 1003118
Mass move of translation bugs to the new Translation component.
Component: General → Translation
You need to log in before you can comment on or make changes to this bug.