Closed Bug 971052 Opened 10 years ago Closed 10 years ago

Cache detected language on SS data to avoid re-running detection

Categories

(Firefox :: Translations, defect)

defect
Not set
normal
Points:
5

Tracking

()

RESOLVED WONTFIX

People

(Reporter: Felipe, Assigned: yarik.sheptykin, Mentored)

References

Details

(Whiteboard: [diamond])

Attachments

(1 file, 2 obsolete files)

When we finish running the language detection in a page (bug 971048), we should cache it in a place that will be saved by both SS and bfcache to avoid running the algorithm again during sessionrestore and during back/forward navigation.
Whiteboard: p=0
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Whiteboard: p=0 → p=5
Whiteboard: p=5 → p=5 [mentor=felipe] [diamond]
Component: General → Translation
Blocks: 1015527
Mentor: felipc
Whiteboard: p=5 [mentor=felipe] [diamond] → p=5 [diamond]
Points: --- → 5
Whiteboard: p=5 [diamond] → [diamond]
Hi, everybody!

It seems to me that re-running the language detection algorithm when a page is retrieved from cache or after the session restore is not an issue anymore. I am new to mozilla and do not yet have a good idea on all processes, but based on the code in TranslationContentHandler.jsm (http://mxr.mozilla.org/mozilla-central/source/browser/components/translation/TranslationContentHandler.jsm) I guess that the part responsible for language detection will only execute if the content.detectedLanguage does not exist (http://mxr.mozilla.org/mozilla-central/source/browser/components/translation/TranslationContentHandler.jsm#74). Same with line #43 during the showpage handling. The code seems to be already optimized for dealing with restoring.

I am not sure how caching and SessionStore work exactly, so I might be wrong. However, this is what I learned from Firefox 1.5 caching (https://developer.mozilla.org/en-US/docs/Using_Firefox_1.5_caching)

"When a user navigates to a cached page, inline scripts and the onload handler do not run (steps 2 and 3), since in most cases, the effects of these scripts have been preserved."

Thus, the existing code seems to be already cache tolerant. As far as my understanding goes, SessionStore writes down urls and user-defined data for the opened tabs, but not the contents of the page itself. When a session is restored the contents might be (1) retrieved from the cache, or (2) from the web if cache is unavailable. The first case is already covered by the code. For the second case it might actually be wise to re-run the detection in case the page changed. Optionally SessionRestore API could be used to explicitly store the detectedLanguage.

The code for handling this was introduced in a patch for bug 1015527.

To sum up, I would like to work on fixing this bug, but I am not sure if this is a issue any longer. I could write a test to ensure this issue is gone. But it would be great to hear opinion of more experienced people before doing something wrong.
Iroslav - Good work on the investigation. Felipe, could you put a second set of eyes to this to confirm?
Thanks Iroslav for the investigation! You're correct that this bug is less important now, because we added support for the bfcache directly while implementing TranslationContentHanldler.jsm

However, just to clarify, there are two types of cache, and the one that stores the detected language will never be used while restoring session.

The cache that that link talks about, and is able to preserve the page state (effects of the scripts, etc) is the bfcache, and is only used in-memory while the program is running. It is used for Back/Forward navigation and Undo close tab.

The other cache is the network cache. When a session is restored, it's still able to avoid retrieving pages from the web if they are saved in the network cache. But on this one, what is saved is what came through the network.. It's only a way to speed up the loading of data, but the page state still restarts from 0 and scripts run again. Then after they are finished, SessionStore fills in some data that it had saved, like form input data.

So it would still be beneficial to extend support for the detected language to SessionStore. However, given that the default behavior is "don't load tabs until selected" (lazy restore), and that language detection runs on a separate thread, there's no big user impact from not doing so.


So I do suggest that you your contribution be focused on a higher impact bug! But if you think this bug itself is interesting for you, by all means feel free to take it, because it is still a valid bug.
Summary: Cache detected language on SS/bfcache data to avoid re-running detection → Cache detected language on SS data to avoid re-running detection
(In reply to :Felipe Gomes from comment #3)

Hello, Felipe! Thanks for the detailed explanation of caching!

> So it would still be beneficial to extend support for the detected language
> to SessionStore. However, given that the default behavior is "don't load
> tabs until selected" (lazy restore), and that language detection runs on a
> separate thread, there's no big user impact from not doing so.

This certainly makes sense. Does extending support for SessionStore include anything else than using SessionStore API? (https://developer.mozilla.org/en/docs/Session_store_API#Saving_a_value_with_a_tab). If adding SessionStore API calls to save / restore detected language is sufficient then I would work on this. There seem to be enough documentation and examples to get it done.
 
> So I do suggest that you your contribution be focused on a higher impact
> bug! But if you think this bug itself is interesting for you, by all means
> feel free to take it, because it is still a valid bug.

I will keep searching for higher-priority bugs up to my skills, but I will also try to fix this one, because I am here to learn and this looks like a good opportunity to explore translation at Firefox.
Hi there!

Here is my work in progress on this bug. I would love to get your opinion on this early draft, because I am not sure if I am moving in the right direction. As you see in the patch I tried to make the TranslationContentHandler to save detectedLanguage to SessionStore after the detection. If language can be restored we skip language detection detection, and let the "showpage" listener to consturct the Translation:DocumentState.

I have tried to come up with a test that could detect if the languageDetection ran, but I could not. Instead I wrote a test that checks if the SS value for a tab is set. The test I wrote isn't complete. It has to close the tab and recreate it and make sure detection does not repeat. I have difficulties detecting that without changing the body of the LanguageDetector callback. Do you have any ideas regarding this?
Attachment #8511714 - Flags: review?(felipc)
Comment on attachment 8511714 [details] [diff] [review]
Cache detected language on SS data to avoid re-running detection.

Review of attachment 8511714 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Iaroslav! This is definitely going in the right direction. This patch would actually be really close to what we need if it weren't for the fact that we now must also take e10s into consideration. With e10s, the SS and Translation code are split in two, one running in the parent (UI) process and one running in the child (content) process.

The TranslationContentHandler.jsm module runs in the child process (because it needs access to the webpage's content), but then it doesn't have access to the "tab" object, which is part of the UI, or the code to set/retrieve the values from SS.


For saving the value to SS, it is gonna be easy. The information is already passed up to the parent (because it must be displayed in the UI). So you can add the saving code there, which happens in Translation.jsm (more specifically, the documentStateReceived function).


To get the value during page load, take a look in content-sessionStore.js. The SS data is received through various messages in the receiveMessage function. You'll need to find out which one and go from there. I'm not entirely sure but start by looking at gContentRestore.restoreHistory.. I think that may be a good lead.

Hope that helps, and feel free to ask any follow-up questions!
Attachment #8511714 - Flags: review?(felipc) → feedback+
(In reply to :Felipe Gomes from comment #6)

Hi Felipe!

> For saving the value to SS, it is gonna be easy. The information is already
> passed up to the parent (because it must be displayed in the UI). So you can
> add the saving code there, which happens in Translation.jsm (more
> specifically, the documentStateReceived function).

I moved the saving code to the Translation.jsm, as you suggested.
 
> To get the value during page load, take a look in content-sessionStore.js.
> The SS data is received through various messages in the receiveMessage
> function. You'll need to find out which one and go from there. I'm not
> entirely sure but start by looking at gContentRestore.restoreHistory.. I
> think that may be a good lead.

Just as you mentioned, SS received tabData together with the SessionStore:restoreHistory message. This message is sent at the very beginning of the restore procedure. I subscribed TranslateContentHandler to this message and extract tab values out of tabData.extData. After processing this message TranslateContentHandler remembers detectedLanguage and thus the detection algorithm does not re-run.

> Hope that helps, and feel free to ask any follow-up questions!

Thanks for the great hints! I would have to spend a couple of days to figure out all this without your help.

Please have a close look on my test file. In order to make sure that the detection algorithm isnt running, I remove LanguageDetector.detectLanguage function to break the TranslateContentHandler. Then I wait for 2 sec for "Translation:DocumentState" and if it doesn't come I fail the test. If the implementation is correct, the message should come from showpage handler.
Attachment #8512789 - Flags: review?(felipc)
Comment on attachment 8512789 [details] [diff] [review]
Cache detected language on SS data to avoid re-running detection.

Review of attachment 8512789 [details] [diff] [review]:
-----------------------------------------------------------------

Nice patch! The translation parts of it look right to me! But I'm not the best person to properly review SessionStore usage, so I'm asking ttaubert to take a look at that for us.

I looked at the test carefully and have some comments below:

::: browser/components/translation/test/browser_translation_caching.js
@@ +55,5 @@
> +  tab = SS.undoCloseTab(window, 0);
> +
> +  // Stub the LanguageDetector to break TranslationContentHandler if detection runs.
> +  this.LanguageDetector._detectLanguage = this.LanguageDetector.detectLanguage;
> +  this.LanguageDetector.detectLanguage = null;

instead of "breaking" it, you should define a function here that, if called, will make the test to fail.  Like: 
function() { Assert.ok(false, "detection should not have run again"); }

@@ +62,5 @@
> +  browser = tab.linkedBrowser;
> +  yield waitForMessage(browser, "Translation:DocumentState", 2000);
> +
> +  // Restore the original detectLanguage function.
> +  this.LanguageDetector.detectLanguage = this.LanguageDetector._detectLanguage;

do this as another registerCleanupFunction before the yield is called

@@ +96,5 @@
> +    messageManager.addMessageListener(name, function onMessage() {
> +      messageManager.removeMessageListener(name, onMessage);
> +      resolve();
> +    });
> +    if(orFailAfter) {

I don't get this part of the code.. there should be something to cancel the setTimeout that is set, or to at least guarantee that it's not gonna call Assert.ok(false..)

Even if the test has already finished successfully, it's usually problematic to let timeouts behind because they might be called while a different test is running, and then the Assert.ok(false) will make it appear that the other test failed.

But is this orFailAfter part really needed? Why is it only used in the second usage of waitForMessage?  The Translation:DocumentState message should be sent regardless of the information being cached from SS or not, right? So you can always rely on it being sent..
Attachment #8512789 - Flags: review?(ttaubert)
Attachment #8512789 - Flags: review?(felipc)
Attachment #8512789 - Flags: feedback+
Assignee: nobody → yarik.sheptykin
Status: NEW → ASSIGNED
Flags: qe-verify?
(In reply to :Felipe Gomes from comment #8)

Thanks for the feedback!

> Nice patch! The translation parts of it look right to me! But I'm not the
> best person to properly review SessionStore usage, so I'm asking ttaubert to
> take a look at that for us.

I resubmitted the patch and carried over the review request for ttaubert.

> I looked at the test carefully and have some comments below:

Thanks for the comments! They were a big help! I modified the test code according to your feedback. I ran it with and without SS restore support and it seems to behave properly.

I will be waiting for ttaubert's review and then, if everything is alright, I will push the patch to the try server.
Attachment #8511714 - Attachment is obsolete: true
Attachment #8512789 - Attachment is obsolete: true
Attachment #8512789 - Flags: review?(ttaubert)
Attachment #8515316 - Flags: review?(ttaubert)
Hi Iaroslav, the test looks great now! Do you already have access to tryserver? If you don't, let me know and I'll help you set it up. You don't need to wait for the review before pushing your patch to try server. In fact it's even better to do it before, because then after getting the review we can land it right away and resolve the bug. And it also gives the reviewer more confidence that the patch will need no further modifications. So, go ahead and send it to try!
Comment on attachment 8515316 [details] [diff] [review]
Cache detected language on SS data to avoid re-running detection.

Review of attachment 8515316 [details] [diff] [review]:
-----------------------------------------------------------------

Hey Iaroslav, thanks for the patch! I unfortunately think we should go with a different approach that could at the same time maybe even cache more results for language detection? Please see my comments below:

::: browser/components/translation/TranslationContentHandler.jsm
@@ +168,5 @@
>          this.global.content.translationDocument.showTranslation();
>          break;
> +
> +      case "SessionStore:restoreHistory":
> +        // SessionStore key set in Translation.jsm

So... I have a few problems with this:

1) The translation service shouldn't need to know so many details about how SessionStore works.
2) extData is more or less only sent "by accident". The content process doesn't need to know about it and we might want to get rid of that in the future.
3) The tab state is something we save for the whole tab but we actually only want to save the detected language for a specific URL.

I would favor the translation module having its own language cache, ideally per URL. That should live in the parent and be queried before kicking off language detection. The onStateChange() listener could live in the parent and kick off language detection or send a cached language?
Attachment #8515316 - Flags: review?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #11)

Hey Tim! Thanks for the feedback!

> 3) The tab state is something we save for the whole tab but we actually only
> want to save the detected language for a specific URL.
> I would favor the translation module having its own language cache, ideally
> per URL. That should live in the parent and be queried before kicking off
> language detection. The onStateChange() listener could live in the parent
> and kick off language detection or send a cached language?

I don't understand this part. Do you mean that we should not use sessionstore at all? Because I don"t see api for saving a URL only. But maybe I am reading in a wrong place (https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Reference/Interface/nsISessionStore). Do you know of any code that uses a similar caching strategy? Would be great to have an example of how to hanlde this the right way.
Flags: needinfo?(ttaubert)
Flags: qe-verify? → qe-verify-
(In reply to Iaroslav Sheptykin from comment #12)
> (In reply to Tim Taubert [:ttaubert] from comment #11)
> > 3) The tab state is something we save for the whole tab but we actually only
> > want to save the detected language for a specific URL.
> > I would favor the translation module having its own language cache, ideally
> > per URL. That should live in the parent and be queried before kicking off
> > language detection. The onStateChange() listener could live in the parent
> > and kick off language detection or send a cached language?
> 
> I don't understand this part. Do you mean that we should not use
> sessionstore at all? Because I don"t see api for saving a URL only.

Yeah, I think that SessionStore is not the right place to cache this kind of information. You probably (felipe, please correct me if I'm wrong) want a cache for the whole browser. A detected language per URL, not just per tab.

> Do you know of any code that uses a similar
> caching strategy? Would be great to have an example of how to hanlde this
> the right way.

I think the best way to do it would be to send an async message from the TranslationContentHandler in onStateChange() to ask the parent process whether it has any cached language information for the current URL. The parent then responds with the cached and previously detected language or with just an empty response. That is the point at which either the content handler uses the given language or starts detecting itself.

Translation.jsm would simply maintain a map of URL => detectedLanguage and query that. The content handler must of course after a successful detection send a message to the parent so that it can cache that value for subsequent page loads.

Felipe, any thoughts on the proposal?
Flags: needinfo?(ttaubert) → needinfo?(felipc)
(In reply to Tim Taubert [:ttaubert] from comment #13) 
> > I don't understand this part. Do you mean that we should not use
> > sessionstore at all? Because I don"t see api for saving a URL only.
> 
> Yeah, I think that SessionStore is not the right place to cache this kind of
> information. You probably (felipe, please correct me if I'm wrong) want a
> cache for the whole browser. A detected language per URL, not just per tab.

I thought about it, but I think that for this particular case of language detection, a cache per tab, just for a session, is better than a global cache for URLs. There are two reasons:

- it's not uncommon for the language of a particular URL to change.. For example, there are sites that offer me portuguese content when I'm logged out (due to ip-geolocation). But then after logging in, in the same url the site will be in English due to my settings. So if we cached language per URL we would miss these things

- Language detection is fast enough most of the time, and runs in a dedicated thread. So there's no downside in running it that would justify managing a separate storage for its cache, handle invalidation, privacy, forget about this site, etc..  The only critical moment for language detection is during session restore (for a user who does not have lazy loading), because the detection for all the pages will be queued in the detection thread and it will create a spike of work/memory usage there.

So the purpose of this bug would be to optimize this, and it seems that SS would be the straightforward place to do that, if it's OK to add data to SS about arbitrary things like the detected language.


Does that make sense, Tim? With that in mind, would you say the original plan here is reasonable? If so, I guess it's just a matter of decoupling the usage of SS more. You said "The translation service shouldn't need to know so many details about how SessionStore works.". Any tips on how to approach that?
Flags: needinfo?(felipc) → needinfo?(ttaubert)
(In reply to :Felipe Gomes from comment #14)
> The only critical moment for language
> detection is during session restore (for a user who does not have lazy
> loading), because the detection for all the pages will be queued in the
> detection thread and it will create a spike of work/memory usage there.

Would there be other ways to avoid this edge case? Maybe starting detection only the first time a tab is selected?
(In reply to :Felipe Gomes from comment #14)
> - Language detection is fast enough most of the time, and runs in a
> dedicated thread. So there's no downside in running it that would justify
> managing a separate storage for its cache, handle invalidation, privacy,
> forget about this site, etc..

Well, great :)

> The only critical moment for language
> detection is during session restore (for a user who does not have lazy
> loading), because the detection for all the pages will be queued in the
> detection thread and it will create a spike of work/memory usage there.

restore_on_demand=true is the default. Anyone with lots of tabs setting that to false is crazy and I honestly don't think the overhead that language detection brings here actually matters. I would think that is negligible compared to actually loading pages.

> So the purpose of this bug would be to optimize this, and it seems that SS
> would be the straightforward place to do that, if it's OK to add data to SS
> about arbitrary things like the detected language.

I honestly don't know whether optimizing for a crazy non-default use case makes sense. It seems like a lot of work to me to offer a SessionStore-API for content, if that's a thing we want at all. My very personal opinion is that I don't think investing time here is worth the outcome.

If you really want to work on this you could still from onStateChange() send a message to the parent. That could use ss.getTabState() to look up a potentially saved language and should probably also clear it right afterwards. The parent sends message to content again to tell it to start language detection or use the language provided.
Flags: needinfo?(ttaubert)
(In reply to Florian Quèze [:florian] [:flo] from comment #15)
> (In reply to :Felipe Gomes from comment #14)
> > The only critical moment for language
> > detection is during session restore (for a user who does not have lazy
> > loading), because the detection for all the pages will be queued in the
> > detection thread and it will create a spike of work/memory usage there.
> 
> Would there be other ways to avoid this edge case? Maybe starting detection
> only the first time a tab is selected?

Starting language detection for only the visible tab makes a lot of sense I think. Upon switching tabs we could simply check whether we have already run that or still need to run it. +1
With all this new information at hand, (that bfcache is working, and making it work with SS would be over optimizing for a rare edge case), and since detection is already fast enough, I think the best to do here is to wontfix this bug.

Iaroslav: thanks for working on this! I apologize for filing this bug that ended up being a wontfix, but that's how bugs go sometimes. I'm sure the work was not lost because writing the patch for it hopefully was a nice way to learn about the translation, SS and e10s code, and it will definitely help with other bugs :)
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
(In reply to :Felipe Gomes from comment #18)
> Iaroslav: thanks for working on this! I apologize for filing this bug that
> ended up being a wontfix, but that's how bugs go sometimes. I'm sure the
> work was not lost because writing the patch for it hopefully was a nice way
> to learn about the translation, SS and e10s code, and it will definitely
> help with other bugs :)

It was fun working on this bug. Happy have it resolved!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: