Closed Bug 1036705 Opened 10 years ago Closed 10 years ago

Funsize requires file level diff caching for speedups in partial generation

Categories

(Release Engineering :: General, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ffledgling, Assigned: mtabara, Mentored)

Details

(Whiteboard: [funsize])

Right now Funsize (formerly Senbonzakura) caches complete MARs to save on download time. The other major time sink in the process is diffing the files inside the MARs. The XUL runner binary and omni.ja seem to take a lot of time to diff, so caching their diffs for reuse should speed up partial generation significantly.
This will require changing some bits and then using http://hg.mozilla.org/mozilla-central/file/tip/tools/update-packaging/make_incremental_updates.py instead of http://hg.mozilla.org/mozilla-central/file/tip/tools/update-packaging/make_incremental_update.sh, which is what is currently used.
More info:
Solving this raises a few more questions and results in the following additional sub-problems:

i) How do we decide what to cache?
i.e. cache everything vs. cache only some files, chosen by a benchmark such as size vs. time taken to diff (among others)

ii) Cache invalidation strategy
How and when do we decide to purge our file cache? How do we detect its corruption?


So, instead of tackling these problems before getting caching support in, the following should suffice for this bug:

i) Cache all files regardless of their size or how long it takes to generate a diff for them (this may add overhead in cases where generating a diff locally is cheaper and faster than fetching it from the cache).
The advantage of this strategy is that it's not overcomplicated and still allows us to modify our cache strategy further down the line (assuming it's implemented sanely, with future extension in mind).

ii) We assume that our cache never gets corrupted; if it does, it will be cleaned up manually, so the application needn't worry about it at the moment.
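
A minimal sketch of what "cache every file diff" could look like, written in Python since make_incremental_updates.py is the proposed entry point. Everything named here is an assumption for illustration only: the PatchCache class, the SHA-256 content key and the one-file-per-key directory layout are not taken from funsize or from the update-packaging scripts. In line with (i) and (ii), it stores every patch unconditionally and does no validation or purging.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_key(old_file, new_file):
    """Key a patch by the contents of both inputs, not by their names."""
    return f"{file_digest(old_file)}-{file_digest(new_file)}"


class PatchCache:
    """Hypothetical directory-backed cache: one patch file per key."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def lookup(self, old_file, new_file):
        """Return the cached patch path, or None on a miss."""
        candidate = self.root / cache_key(old_file, new_file)
        return candidate if candidate.is_file() else None

    def store(self, old_file, new_file, patch_file):
        """Copy a freshly generated patch into the cache, unconditionally."""
        target = self.root / cache_key(old_file, new_file)
        shutil.copyfile(patch_file, target)
        return target
```

Keying on file contents rather than paths means the same pair of files found in two different MARs would map to one cache entry, and a size or timing threshold could later be added inside store() without changing callers.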
Mentor: ffledgling
Severity: normal → critical
Depends on: 1045414
Assignee: nobody → ffledgling
Whiteboard: [funsize]
Assignee: ffledgling → mtabara
(In reply to Anhad Jai Singh (:ffledgling) from comment #1)
> More info:
> Solving this or the process of solving this will raise/raises a few more
> questions and results in the following additional sub-problems:
> 
> i) How do we decide what to cache?
> i.e. cache everything vs. cache some files based on a benchmark like size vs.
> time taken to diff (among others)
> 
> ii) Caching invalidation strategy
> How and when do we decide to purge our file cache? How do we detect its
> corruption?
> 
> 
> So, instead of tackling these problems before getting caching support in,
> the following should suffice for this bug:
> 
> i) Cache all files regardless of what size they are and/or how much time
> generating a diff for them takes (this will cause possible overheads in
> cases where generating diffs locally is cheaper and faster than actually
> fetching them from the cache).
> The advantage of this strategy is that it's not overcomplicated and still
> allows us to modify our cache strategy further down the line (assuming it's
> implemented sanely, keeping future extension in mind).
> 
> ii) We assume that our cache never gets corrupted, and if it does, it'll be
> cleaned up manually, thus the application needn't worry about it at the
> moment.

We have added file-level caching to speed up partial generation. This is very useful for generating locales, as 90% of the partial content is the same. The strategy we went with was:

* cache all the files as soon as they're generated within make_incremental_update.sh (the changes have been pushed to m-c here: http://hg.mozilla.org/mozilla-central/file/c70f62375f7d/tools/update-packaging/make_incremental_update.sh#l198)

* all the caching in funsize is done in AWS S3

* a flag can be set to also save them in a local directory for workers. If that is done, we do not handle garbage collection at the moment (we could use /tmp so they get cleaned up by the very frequent reboots in AWS)

* as regards potential corruption of the files during transmission: should the transfer of a patch fail, it falls back to the old behavior of diffing the file
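
The fallback described in the last point can be sketched roughly as follows. This is an illustration under assumptions, not the code that landed: get_patch and its callable parameters (fetch_cached, generate_diff, save_cached) are hypothetical names, and the storage backend (S3 or a local directory) is deliberately left behind those callables. Treating a failed cache write as non-fatal is also an assumption.

```python
def get_patch(old_file, new_file, fetch_cached, generate_diff, save_cached=None):
    """Return (patch, source), where source is "cache" or "diff".

    fetch_cached(old, new)   -> patch, or None on a miss; may raise on transfer errors
    generate_diff(old, new)  -> patch (the old, uncached behavior)
    save_cached(old, new, p) -> optional; stores a freshly generated patch
    """
    try:
        patch = fetch_cached(old_file, new_file)
    except Exception:
        # Assumption: any transfer failure is treated like a cache miss.
        patch = None

    if patch is not None:
        return patch, "cache"

    # Miss or failed transfer: fall back to diffing the file.
    patch = generate_diff(old_file, new_file)

    if save_cached is not None:
        try:
            # Cache the patch as soon as it has been generated.
            save_cached(old_file, new_file, patch)
        except Exception:
            # Assumption: a failed upload should not fail the partial.
            pass

    return patch, "diff"
```

With this shape, a broken or unreachable cache only costs the time of one failed lookup per file, after which generation proceeds exactly as it did before caching was added.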
I'm closing this bug. Should other tasks arise on this topic, please file a separate bug to improve funsize's bug granularity.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
No longer depends on: 1045414
Component: Tools → General