Closed Bug 825031 Opened 12 years ago Closed 11 years ago

change indexing task chunk size

Categories

(support.mozilla.org :: Search, defect, P4)

Tracking

(Not tracked)

RESOLVED FIXED
2013Q1

People

(Reporter: willkg, Assigned: willkg)

Details

(Whiteboard: u=sumo-team c=search p= s=)

Currently the indexing task chunk size is 50,000, which means everything that needs to be indexed gets split into chunks of 50,000 items. Right now that creates 6 celery tasks for reindexing everything.

Recently, we've upgraded mysql and elastic, so it probably makes sense to revisit the chunk size.

If we changed it to 40,000, that'd be 7 tasks: 1 kb, 1 forums, 5 questions.

If we changed it to 30,000, that'd be 9 tasks: 1 kb, 1 forums, 7 questions.
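
For illustration, here's a quick sketch of the arithmetic. The per-doctype item counts are made-up placeholders chosen to be consistent with the task counts above, not real production numbers:

import math

# Hypothetical per-doctype document counts (placeholders, not real data).
DOC_COUNTS = {"kb": 20000, "forums": 25000, "questions": 190000}

def task_count(chunk_size):
    # One celery task per chunk, so each doctype contributes
    # ceil(items / chunk_size) tasks.
    return sum(math.ceil(n / chunk_size) for n in DOC_COUNTS.values())

for size in (50000, 40000, 30000):
    print(size, task_count(size))
# -> 50000 6, 40000 7, 30000 9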
I can't tell the upsides/downsides of this or the time requirements. Would this parallelize indexing and bring the total indexing time down? Could someone elaborate?

This is an engineering thing.

Generally speaking, as long as the number of tasks is lower than the number of available celery workers, all the tasks will run at once; each task will be smaller, so indexing will take less time overall.

It's an easy change to make in the code. It's probably a half-day project because we'll need to test it out in production and probably want to see whether the index-time numbers change.
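
To make the parallelism concrete, here's a hypothetical sketch of the dispatch pattern (not the actual kitsune code; index_chunk, reindex, and CHUNK_SIZE are invented names):

from celery import shared_task

CHUNK_SIZE = 50000  # the current value; this bug proposes lowering it

def chunked(ids, size):
    # Yield successive size-item slices of ids.
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

@shared_task
def index_chunk(doc_ids):
    # Stand-in for the real per-chunk work: pull the rows from mysql,
    # serialize them, and bulk-index into ES.
    pass

def reindex(all_ids):
    # One task per chunk. With enough free workers, all chunks run at
    # once, so smaller chunks mean a shorter wall-clock reindex.
    for chunk in chunked(list(all_ids), CHUNK_SIZE):
        index_chunk.delay(chunk)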
Thanks for the info, Will. Adding this to the next sprint.
Priority: -- → P4
Whiteboard: u=sumo-team c=search p= s=2013.2
Target Milestone: --- → 2013Q1
In pull request: https://github.com/mozilla/kitsune/pull/1049
Assignee: nobody → willkg
Decided to try 20k for now. Landed in master in https://github.com/mozilla/kitsune/commit/fbfc5a10c6b777f7aadb91c7e3788343aed93071

Next:

1. deploy to staging
2. run non-destructive reindex on staging (which might make staging sad because it doesn't have a good ES cluster. we might want to make the chunk size configurable, but that's a project for another bug; there's a sketch of that at the end of this comment.)
3. deploy to production
4. run non-destructive reindex on production while watching ganglia

If ganglia doesn't show a massive spike and the indexing times are lower than the previous reindexing, then we should stay at 20k. Otherwise, we'll adjust as necessary.
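
As a sketch of the "make the chunk size configurable" idea from step 2 (hypothetical code; ES_INDEX_CHUNK_SIZE is an invented setting name, not something that exists in kitsune):

from django.conf import settings

# Fall back to the new default when the setting isn't defined, so staging
# could override it with a smaller value its weaker ES cluster can handle.
CHUNK_SIZE = getattr(settings, "ES_INDEX_CHUNK_SIZE", 20000)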
Pushed it to stage and reindexed there. During the reindexing, phx1 got very sad and lonely and all kinds of problems ensued. It had done enough indexing that I decided it was worth pushing to prod and testing there, so I did.

Previous indexing pass (50k chunk size) had 7 tasks and took 24 minutes. This indexing pass (20k chunk size) had 13 tasks and took 18 minutes. It's possible it took longer than it normally will because phx1 was still recovering.

I watched the mysql and ES graphs and they look fine. The biggest surge was on the mysql network graph, but since all the tasks execute in parallel, the number of tasks shouldn't affect network load.

Ergo, we're going to leave it at 20k.

As a side note, when we update elasticutils to a version that uses pyelasticsearch, I suspect we'll see some performance wins there.

Also, as a clarification, this work ONLY affects indexing from the admin. It doesn't affect indexing from cron jobs or normal site activity.

Marking as FIXED.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Taking this out of the 2013.2 sprint since it's done now.
Whiteboard: u=sumo-team c=search p= s=2013.2 → u=sumo-team c=search p= s=