Closed Bug 1183952 Opened 6 years ago Closed 6 years ago

ES timeout on large (55 MB) file

Categories: Webtools Graveyard :: DXR, defect
Importance: Not set, normal
Tracking: firefox42 affected
Status: RESOLVED FIXED
People: Reporter: fubar, Unassigned

Details

01:28:59 A worker failed while indexing /builds/dxr-build-env/src/mozilla-central-test/obj-x86_64-unknown-linux-gnu/all-tests.json:
01:28:59 Traceback (most recent call last):
01:28:59   File "/builds/dxr-build-env/venv/local/lib/python2.7/site-packages/dxr-0.1-py2.7.egg/dxr/build.py", line 579, in index_chunk
01:28:59     index_file(tree, tree_indexers, path, es, index)
01:28:59   File "/builds/dxr-build-env/venv/local/lib/python2.7/site-packages/dxr-0.1-py2.7.egg/dxr/build.py", line 550, in index_file
01:28:59     es.bulk(chunk, index=index, doc_type=LINE)
01:28:59   File "/builds/dxr-build-env/venv/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 93, in decorate
01:28:59     return func(*args, query_params=query_params, **kwargs)
01:28:59   File "/builds/dxr-build-env/venv/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 448, in bulk
01:28:59     query_params=query_params)
01:28:59   File "/builds/dxr-build-env/venv/local/lib/python2.7/site-packages/pyelasticsearch/client.py", line 281, in send_request
01:28:59     raise exc.info
01:28:59 ReadTimeoutError: HTTPConnectionPool(host='node46.bunker.scl3.mozilla.com', port=9200): Read timed out. (read timeout=60)


dxr-processor1.private.scl3# du -skh mozilla-central-test/obj-x86_64-unknown-linux-gnu/all-tests.json
55M	mozilla-central-test/obj-x86_64-unknown-linux-gnu/all-tests.json

This is the second time I've noticed it fail on that particular file. It's not clear to me where that timeout is set, since es_indexing_timeout is set to 120.
Is the content of all-tests.json one long line? How long is the longest line? If it's one enormous 55 MB line, the chunking machinery would find it indivisible and would be stuck sending 55 MB to ES in a single request, where the usual is around 10 KB.
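
(Purely for illustration, a minimal sketch of how size-capped chunking gets stuck when a single item exceeds the cap; this is a hypothetical helper, not DXR's actual chunking code, and the 10 KB cap is just the figure mentioned above:)

    def chunked_by_size(docs, max_bytes=10 * 1024):
        """Yield lists of docs whose combined size stays under max_bytes.

        A single doc bigger than max_bytes still has to be emitted on its
        own, so one enormous line turns into one enormous bulk request.
        """
        chunk, size = [], 0
        for doc in docs:
            doc_size = len(doc)
            if chunk and size + doc_size > max_bytes:
                yield chunk
                chunk, size = [], 0
            chunk.append(doc)
            size += doc_size
        if chunk:
            yield chunk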
It's one line. :-\
Rrrr. Maybe I'll just have to put a cap on long lines. Let me think about it over lunch and see if I come up with any better ideas.
I don't know why the underlying transport would ignore the passed timeout, either.
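
(For what it's worth, pyelasticsearch takes its read timeout at client construction time and defaults to 60 seconds, which matches the number in the traceback. A minimal sketch, assuming the config value simply isn't being threaded through to the client; whether DXR actually constructs its client this way is an assumption:)

    from pyelasticsearch import ElasticSearch

    # The read timeout defaults to 60 s unless overridden here; es.bulk()
    # does not appear to take a separate HTTP timeout, so the client-wide
    # value is what the failing request would have used.
    es = ElasticSearch('http://node46.bunker.scl3.mozilla.com:9200', timeout=120)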

I didn't come up with anything better than truncating the lines. I should be able to get to that Friday.
I think our best bet is actually to add all-tests.json to the ignore_patterns in the config file. The line-truncating solution would have a cascading effect on plugins and the line-building subsystem, which would then have to deal with offsets that suddenly point beyond the end of a line. We could spin through all the lines when reading each file and automatically class as "binary" any with ridiculously long lines, but I'm not sure how much that would slow down indexing. It might not be bad at all, but it bears benchmarking. If it's just a single file bothering us for now, let's just add it to the ignores. Sound decent?
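
(A rough sketch of the "treat files with ridiculously long lines as binary" idea; purely illustrative, with an arbitrary threshold and a hypothetical helper name:)

    def has_unreasonably_long_line(path, max_line_bytes=100 * 1024):
        """Return True if any line in the file exceeds max_line_bytes.

        Files that trip this check could be skipped the same way binary
        files already are; the cost is one extra pass over each file.
        """
        with open(path, 'rb') as f:
            for line in f:
                if len(line) > max_line_bytes:
                    return True
        return False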
Just to circle back: adding all-tests.json to ignore_patterns did the trick, though it did turn up that setting ignore_patterns is not additive to the default list.
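
(On the non-additive behavior: a hedged sketch of how a config loader could merge user-specified patterns with the defaults instead of replacing them. The default pattern values below are placeholders, not DXR's actual defaults:)

    DEFAULT_IGNORE_PATTERNS = ['*~', '.*.swp']  # placeholder defaults

    def effective_ignore_patterns(configured):
        """Union the configured patterns with the defaults rather than
        letting the config setting overwrite them wholesale."""
        return sorted(set(DEFAULT_IGNORE_PATTERNS) | set(configured))

    # With the current behavior, the config entry has to repeat the
    # defaults itself alongside all-tests.json.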
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Product: Webtools → Webtools Graveyard