Open Bug 1417177 Opened 7 years ago Updated 2 years ago

watchman frequently times out

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

People

(Reporter: sfink, Unassigned)

Details

watchman has been problematic for me for quite a while. Example: I just did

$ hg amend
warning: Watchman unavailable: timed out waiting for response, while executing ('query', '/home/sfink/src/mozilla', {'fields': ['mode', 'mtime', 'size', 'exists', 'name'], 'since': 'c:1508545429:8873:2:7407942', 'expression': ['not', ['anyof', ['dirname', '.hg'], ['name', '.hg', 'wholename']]], 'empty_on_fresh_instance': False, 'sync_timeout': 2000})

I think this watchman log message probably came from the same command:

2017-11-14T11:28:55,771: [client=0x9f4cc0:stm=0x9ee800:pid=10724] PERF: {"ru_nvcsw": 63, "ru_oublock": 0, "ru_inblock": 400, "ru_msgsnd": 0, "ru_nswap": 0, "ru_majflt": 50, "ru_nivcsw": 0, "ru_minflt": 0, "ru_idrss": 0, "ru_msgrcv": 0, "version": "4.9.0", "start_time": 1510687732.9232359, "meta": {"args": ["query", "/home/sfink/src/mozilla", {"sync_timeout": 2000, "fields": ["mode", "mtime", "size", "exists", "name"], "since": "c:1508545429:8873:2:7407942", "expression": ["not", ["anyof", ["dirname", ".hg"], ["name", ".hg", "wholename"]]], "empty_on_fresh_instance": false}], "client": {"pid": 10724}, "root": {"watcher": "inotify", "ticks": 7480897, "number": 2, "path": "/home/sfink/src/mozilla", "case_sensitive": true, "recrawl_count": 0}}, "description": "dispatch_command:query", "elapsed_time": 2.848096, "user_time": 0.001384, "ru_maxrss": 0, "system_time": 0.0068580000000000004, "ru_nsignals": 0, "pid": 8873, "ru_ixrss": 0}

I get mysterious pauses quite often, which I suspect may be related to watchman, and a few timeouts every day.
When watchman first registers a watch on a directory, it needs to perform a full filesystem walk to assemble a consistent snapshot of the directory.

`hg` commands that assess state of the working directory (update, commit, diff, etc) will trigger watchman integration and spawn a watchman daemon if one isn't running. This will incur a filesystem walk by watchman.

Mercurial is configured to wait N milliseconds for a watchman response before giving up and falling back to its normal implementation. When you see Mercurial print a warning about watchman timing out, this is likely what's happening. The warning is a bit misleading because it gives the impression watchman isn't working correctly. That should get fixed upstream.

As for the mysterious pauses, this could be watchman daemons going away or failing to keep up with filesystem events. Also, there was a performance regression in 4.4 you may be hitting. https://bz.mercurial-scm.org/show_bug.cgi?id=5716. Try upgrading to Mercurial 4.4.2 (released a few days ago) to see if perf improves.
Fwiw, I still see this error with hg 4.6.1 the first time I run any hg command like diff/wip etc.
I'm at the Mercurial developer meetup this weekend and will try to push on improving the warning messaging. The current state isn't great, as the warning is a bit misleading.

I still face this issue with Mercurial 4.9 and watchman 4.9.0. The first hg command always throws a warning and takes time even longer than it would have taken without watchman. One of the commands I run:

$ hg status
warning: Watchman unavailable: timed out waiting for response, while executing ('clock', '/home/x/src/mozilla-unified')

I get the same with Mercurial 5.1 on WSL. My native Linux environment doesn't have this issue. Probably just running slower on WSL.

You need to log in before you can comment on or make changes to this bug.