Closed Bug 951059 Opened 12 years ago Closed 11 years ago

Permission denied errors for some SVN commits

Product/Component: Developer Services :: General
Type: task
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Reporter: ashish
Assigned: bhourigan
Whiteboard: [firebug-p1]
Attachments: 1 file

Reported in #sysadmins:

23:48:10 < flod> | is SVN having trouble this morning? I keep getting strange "permission denied" errors that disappear after a retry
00:33:42 < flod> ashish: do you have any warnings coming from SVN (see message above)?
00:37:19 <@ashish> hmm, trying to figure which ssh server might have generated that permission denied error
00:37:47 < flod> | right now the ratio of my commits is 50% failures
00:38:34 < flod> | this one worked http://viewvc.svn.mozilla.org/vc?view=revision&revision=123503
00:38:41 < flod> | but it failed twice before that
Possibly correlated entries in /var/log/httpd/error_log:

> [Mon Dec 16 23:26:43 2013] [error] [client 10.8.74.212] Could not create activity /!svn/act/586ee596-cf41-499a-9812-a7e2f5cfb85e. [500, #0]
> [Mon Dec 16 23:26:43 2013] [error] [client 10.8.74.212] could not begin a transaction [500, #13]
> [Mon Dec 16 23:26:43 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
> [Mon Dec 16 23:27:18 2013] [error] [client 10.8.74.212] Could not create activity /!svn/act/5cfb94a4-ab92-45ca-a3fa-8367827fddc4. [500, #0]
> [Mon Dec 16 23:27:18 2013] [error] [client 10.8.74.212] could not begin a transaction [500, #13]
> [Mon Dec 16 23:27:18 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
> [Mon Dec 16 23:31:30 2013] [error] [client 10.8.74.212] Could not MERGE resource "/!svn/act/812a34d8-ecd5-4901-81e8-4cff8a9a8963" into "/projects/mozilla.com/tags/production/locales". [409, #0]
> [Mon Dec 16 23:31:30 2013] [error] [client 10.8.74.212] An error occurred while committing the transaction. [409, #13]
> [Mon Dec 16 23:31:30 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/write-lock': Permission denied [409, #13]
> [Mon Dec 16 23:35:55 2013] [error] [client 10.8.74.212] Access denied: - GET :/projects/directory, referer: http://stackoverflow.com/questions/6044622/svn-how-to-export-changed-files-with-all-the-log-messages-to-a-list
> [Mon Dec 16 23:47:18 2013] [error] [client 10.8.74.212] Could not create activity /!svn/act/2d13d7b3-cb67-48de-aecf-bcda4a05a700. [500, #0]
> [Mon Dec 16 23:47:18 2013] [error] [client 10.8.74.212] could not begin a transaction [500, #13]
> [Mon Dec 16 23:47:18 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
> [Tue Dec 17 00:06:14 2013] [error] [client 10.8.74.212] Could not MERGE resource "/!svn/act/ff5565aa-3bb9-489c-bd5b-fdfc045c15c4" into "/projects/mozilla.com/trunk/locales". [409, #0]
> [Tue Dec 17 00:06:14 2013] [error] [client 10.8.74.212] An error occurred while committing the transaction. [409, #13]
> [Tue Dec 17 00:06:14 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/write-lock': Permission denied [409, #13]
> [Tue Dec 17 00:31:10 2013] [error] [client 10.8.74.212] Could not create activity /!svn/act/71776889-e30a-4c76-b83a-852ae9b258b5. [500, #0]
> [Tue Dec 17 00:31:10 2013] [error] [client 10.8.74.212] could not begin a transaction [500, #13]
> [Tue Dec 17 00:31:10 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
> [Tue Dec 17 00:35:18 2013] [error] [client 10.8.74.212] Could not MERGE resource "/!svn/act/ca602a7b-0ca5-4f57-8fbb-c76ccc9da7ae" into "/projects/mozilla.com/tags/stage/locales". [409, #0]
> [Tue Dec 17 00:35:18 2013] [error] [client 10.8.74.212] An error occurred while committing the transaction. [409, #13]
> [Tue Dec 17 00:35:18 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/write-lock': Permission denied [409, #13]

Permissions on some of the lock files:

-rw-rw-r-- 1 svn svn_mozilla 0 Jan 14 2013 /repo/svn/mozilla/db/write-lock
-rw-rw-r-- 1 svn svn_mozilla 0 Jan 14 2013 /repo/svn/mozilla/db/txn-current-lock
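To see at a glance which files the "Permission denied" entries point at, the log can be tallied by path. This is only a rough sketch: the regular expression is written against the entries quoted above, the function name `count_denied_paths` is made up for illustration, and the sample contains three of the quoted lines rather than the whole log.

```python
import re
from collections import Counter

# Matches the "Can't open file '<path>': Permission denied" entries quoted above.
DENIED = re.compile(r"Can't open file '([^']+)': Permission denied")


def count_denied_paths(log_text):
    """Return a Counter mapping each denied path to its number of occurrences."""
    return Counter(DENIED.findall(log_text))


# Three entries copied from the error_log excerpt in this comment.
sample = """\
[Mon Dec 16 23:26:43 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
[Mon Dec 16 23:31:30 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/write-lock': Permission denied [409, #13]
[Mon Dec 16 23:47:18 2013] [error] [client 10.8.74.212] Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied [500, #13]
"""

for path, hits in count_denied_paths(sample).most_common():
    print(hits, path)
```

Run against the full excerpt, this would show that only the two lock files under /repo/svn/mozilla/db/ are involved at this point.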
svn2 is drained from the Zeus pools. Please make sure to put it back in service when this bug is resolved!
I'm trying to replicate this. Which username is affected?
I'm not sure how this service is working for the user.

[root@svn1.dmz.phx1 db]# groups bkero@mozilla.com
bkero@mozilla.com : users svn_sysadmins svn_mozilla svn_moco scm_level_1 scm_l10n bzr_bmo scm_level_2
[root@svn1.dmz.phx1 db]# groups flodolo@mozilla.com
groups: flodolo@mozilla.com: No such user

Perhaps the email address in LDAP is incorrect?
I found these buried in /var/log/messages:

Dec 17 01:49:49 svn2 nslcd[1426]: [d6b512] ldap_result() failed: Can't contact LDAP server.

The LDAP server in question is ldap.db.phx1.mozilla.com (10.8.70.108):

[root@svn2.dmz.phx1 log]# nc -v -z 10.8.70.108 389
Connection to 10.8.70.108 389 port [tcp/ldap] succeeded!
[root@svn2.dmz.phx1 log]# nc -v -z 10.8.70.108 636
nc: connect to 10.8.70.108 port 636 (tcp) failed: Connection refused
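The same reachability check can be scripted so it can be repeated while the nslcd errors are occurring. This is a minimal sketch of what `nc -z` does; the function name `port_open` and the timeout are illustrative choices. A successful TCP connect only shows that the port accepts connections, not that LDAP queries succeed.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, like `nc -z`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Usage with the address and ports from the comment above:
#   for port in (389, 636):
#       print(port, port_open("10.8.70.108", port))
```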
svn.mozilla.org is having problems again. It started yesterday evening for me.

> svn: Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied

I had to try multiple times before managing to commit a file (user: flodolo@mozilla.com). Another localizer is having similar problems (see bug 981502).
The "flodolo@mozilla.com" account didn't have the following bits set: scm_l10n, scm_level_1. However, that should not matter for these sorts of commits. It seems the issue was indeed permissions related (as per the error):

# ls -ltc /repo/svn/mozilla/db/txn-current-lock
-rw-rw-r-- 1 svn svn_mozilla 0 Jan 14 2013 /repo/svn/mozilla/db/txn-current-lock
# chown apache /repo/svn/mozilla/db/txn-current-lock
# ls -ltc /repo/svn/mozilla/db/txn-current-lock
-rw-rw-r-- 1 apache svn_mozilla 0 Mar 10 02:13 /repo/svn/mozilla/db/txn-current-lock
# ls -ltc /repo/svn/mozilla/db/write-lock
-rw-rw-r-- 1 svn svn_mozilla 0 Jan 14 2013 /repo/svn/mozilla/db/write-lock
# chown apache /repo/svn/mozilla/db/write-lock
# ls -ltc /repo/svn/mozilla/db/write-lock
-rw-rw-r-- 1 apache svn_mozilla 0 Mar 10 02:18 /repo/svn/mozilla/db/write-lock

Leaving the bug open so that repo admins can verify that all is well and perhaps locate what recently changed to cause the issue...
Forgot to add, verified with :flod that it is confirmed working (for him...).
Still failing for the localizer:

Commit failed (details follow):
Can't move '/repo/svn/mozilla/db/txn-protorevs/125266-2p5s.rev' to '/repo/svn/mozilla/db/revs/125/125267': Permission denied
Additional errors:
Can't open file '/repo/svn/mozilla/db/transactions/125266-2p5s.txn/props': No such file or directory
If this is a permission issue on some files, would it be good to do a 'chown -R apache' on the whole repo?
Note that this is broken again for me too (@mozilla.com account).
One note. While this morning the error was

> svn: Commit failed (details follow):
> svn: Can't open file '/repo/svn/mozilla/db/txn-current-lock': Permission denied

now it's consistently

> svn: Commit failed (details follow):
> svn: Can't create temporary file from template '/repo/svn/mozilla/db/svn-XXXXXX': Permission denied

As I explained on IRC, even if SSH works, this is still a major problem because most localizers use https:// to commit, and most of them don't even have an SSH key (we don't ask for one when filing bugs to get SVN access). My own @mozilla.com account doesn't have an SSH key associated.
And it worked only a few minutes after (log should have plenty of my failures) http://viewvc.svn.mozilla.org/vc?view=revision&revision=125270 Honestly no idea what's going on, but it doesn't seem a simple permission problem.
Does apache need to be in the svn_mozilla group to write to the correct files?
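The question can be reasoned through with the classic POSIX permission check applied to the -rw-rw-r-- svn:svn_mozilla lock files shown earlier in this bug. The sketch below ignores ACLs, root privileges and directory permissions along the path, and the numeric UIDs/GIDs are hypothetical placeholders, not the real values on the svn hosts.

```python
import stat


def can_write(mode, file_uid, file_gid, uid, gids):
    """Classic POSIX check (no ACLs, not root): the first matching class decides."""
    if uid == file_uid:
        return bool(mode & stat.S_IWUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)


# Hypothetical numeric IDs, for illustration only.
SVN_UID, APACHE_UID = 500, 48
SVN_MOZILLA_GID, APACHE_GID = 600, 48
MODE = 0o664  # -rw-rw-r--, as in the ls output in this bug

# apache is neither the owner nor in svn_mozilla: only the "other" bits apply.
print(can_write(MODE, SVN_UID, SVN_MOZILLA_GID, APACHE_UID, {APACHE_GID}))
# apache is in svn_mozilla: the group write bit applies.
print(can_write(MODE, SVN_UID, SVN_MOZILLA_GID, APACHE_UID, {APACHE_GID, SVN_MOZILLA_GID}))
# file chown'ed to apache: the owner write bit applies.
print(can_write(MODE, APACHE_UID, SVN_MOZILLA_GID, APACHE_UID, {APACHE_GID}))
```

Under these assumptions the first call returns False and the other two return True, which matches both the group-membership theory and the effect of the earlier chown on the two lock files.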
:pir, I think you're right, but I'm not certain. the svn docs on mana don't specifically say, but some of the examples would indicate that the file permissions are correct (which doesn't discount your theory). :digi, can you shed some light?
Flags: needinfo?(bhourigan)
I documented everything I learned during the migration in mana. Unfortunately, it was 2 years ago and the details in my memory are now vague. I'm going to take an educated guess and say that apache should be in svn_mozilla group, because only the mozilla repo is exposed via https and authz should take care of the granular permission side. It could have the unintended side effect of giving folks who are in authz but not svn_mozilla read access. https://mana.mozilla.org/wiki/display/SYSADMIN/Subversion?src=search
Flags: needinfo?(bhourigan)
The problem is still there, and we get emails from various localizers saying that they can't update the projects they are in charge of. This morning again, for the es-AR locale (client output translated from Spanish):

Transmitting file data ...svn: E000013: Commit failed (details follow):
svn: E000013: Can't move '/repo/svn/mozilla/db/txn-protorevs/125292-2p7w.rev' to '/repo/svn/mozilla/db/revs/125/125293': Permission denied
svn: E000002: Additional errors:
svn: E000002: Can't open file '/repo/svn/mozilla/db/transactions/125292-2p7w.txn/props': No such file or directory

Raising severity to major as this is blocking a lot of people from working.
Severity: normal → major
Assignee: server-ops-webops → eziegenhorn
The problem is definitely related to the crash of svn1 over the weekend, but I'm stuck on sorting out what changed (alas svn3, which appears to be behaving correctly, has been up 240 days; who knows how long svn1 had been up). I'm not sure if this is a red herring or not, but:

svn1.dmz.phx1# ps -Af | grep http
root 26197 1 0 06:21 ? 00:00:00 /usr/sbin/httpd
apache 26199 26197 0 06:21 ? 00:00:00 /usr/sbin/httpd
apache 26200 26197 0 06:21 ? 00:00:00 /usr/sbin/httpd
apache 26201 26197 0 06:21 ? 00:00:00 /usr/sbin/httpd
[snip]
svn1.dmz.phx1# cat /proc/26201/loginuid
4294967295

svn3.dmz.phx1# ps -Af | grep http
root 16054 1 0 2013 ? 00:12:01 /usr/sbin/httpd
apache 19779 16054 0 06:00 ? 00:00:00 /usr/sbin/httpd
[snip]
svn3.dmz.phx1# cat /proc/19779/loginuid
0

apache on svn1 definitely doesn't have access to the svn files, though I can find no configuration differences.
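For reading the two loginuid values: 4294967295 is 2^32 - 1, the value the Linux audit subsystem uses for "unset", while 0 is root's UID. The helper below is a small illustrative sketch (`describe_loginuid` is a made-up name) that interprets the text read from /proc/<pid>/loginuid.

```python
UNSET_LOGINUID = 2**32 - 1  # 4294967295, i.e. (uint32_t)-1


def describe_loginuid(raw):
    """Interpret the text read from /proc/<pid>/loginuid."""
    value = int(raw.strip())
    if value == UNSET_LOGINUID:
        return "unset (no login session in the process ancestry)"
    return "set to uid %d" % value


print(describe_loginuid("4294967295"))  # value seen on svn1
print(describe_loginuid("0"))           # value seen on svn3
```

One reading, which nothing in this bug confirms, is that httpd on svn1 was started at boot after the crash, while httpd on svn3 was last started from a root login session.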
Assignee: eziegenhorn → klibby
SVN on getfirebug.com is also affected by this bug. Honza
Any progress on this? We are really blocked by this problem. Honza
Whiteboard: [firebug-p1]
+1, for l10n, we need that fixed asap as we are going to receive the Australis content soon.
For the time being, only the working node (svn3.dmz.phx1) is in the pool. Everyone should be able to commit without any issues (at least not permissions related). We are still looking into the other "bad" nodes.
Works for getfirebug.com again, thanks! Honza
Honza or anyone else, please ping me on IRC to verify the "fix".
Setting needinfo's on people that have experienced the problem, to help :Aj verify
Flags: needinfo?(pascalc)
Flags: needinfo?(odvarko)
Flags: needinfo?(francesco.lodolo)
I haven't seen this error in a while (tbh I thought this bug was already closed).
Flags: needinfo?(francesco.lodolo)
It's not so much about whether or not you've seen the problem recently, but about finding someone to test/verify a fix with :Aj. (we're currently in a less than ideal state with the svn web service - he has a potential fix)
Kendall, can you clarify what help you need?
Flags: needinfo?(pascalc)
I need someone to work with :Aj to verify his potential fix (see comment 25).
I am not experiencing the problem anymore. Honza
Flags: needinfo?(odvarko)
This is currently being tested right NOW, so expect some "breakage", i.e. permissions errors, while testing...
So the cluster is back to running on just 1 node. The issue is OS specific, but I can't find which setting is causing it. On the working node, if one allows apache to have a login shell, one can ls the /repo/svn/mozilla/ directory. On the other nodes, this is not possible (hence the above errors). I checked ALL the files, umask etc. and cannot find an actual difference that would cause this behavior. At one point I rsync'ed the /etc directory from svn3 to svn2 and then it worked; however, on subsequent checks it stopped working. The working node (svn3) has been up for many, many days, so it is possible that a library still loaded there is what allows this to work. Essentially the "fix" in Bug 982633 would indeed fix this issue; however, I would like to determine what exactly is causing it to work on svn3.
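Besides files and umask, one more thing that could be compared between nodes is the supplementary group list of the running httpd workers, which Linux exposes on the "Groups:" line of /proc/<pid>/status. Whether this differs between svn3 and the other nodes is an assumption that the bug does not confirm. The excerpts and GIDs below are made up for illustration.

```python
def supplementary_groups(status_text):
    """Return the set of GIDs on the 'Groups:' line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Groups:"):
            return {int(g) for g in line.split()[1:]}
    return set()


# Hypothetical excerpts; the GIDs are placeholders, not values from the svn hosts.
working = "Name:\thttpd\nUid:\t48\t48\t48\t48\nGid:\t48\t48\t48\t48\nGroups:\t48 600 \n"
broken = "Name:\thttpd\nUid:\t48\t48\t48\t48\nGid:\t48\t48\t48\t48\nGroups:\t48 \n"

# GIDs held by the worker on the "working" node but not on the "broken" one.
print(supplementary_groups(working) - supplementary_groups(broken))
```

On a real host the input would come from reading /proc/<pid>/status for one of the apache-owned httpd PIDs on each node.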
Status: NEW → ASSIGNED
Assignee: klibby → afernandez
So I went the "extra mile" and "cloned" the working node, svn3.dmz.phx1, to a tmp VM, svn2.dmz.phx1 (the real svn2 currently has its network off). After bringing svn2.dmz.phx1 online, same issue as before. So it seems the issue is indeed library or permissions related. Apache on svn3 has been running since April 03. I have attached a comparison of loaded libraries, and svn3 is indeed running older libraries. There could also be an issue with the umask in effect when apache was started. In all, it would be best to proceed with Bug 982633, as it will or should fix the permissions issue.
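A comparison of loaded libraries like the one described can be approximated from /proc/<pid>/maps, where a library that was replaced on disk while the process kept running is listed with a " (deleted)" suffix. This is only a sketch of one way to do it, not necessarily how the attached comparison was produced; `mapped_libraries` and the sample lines are illustrative.

```python
def mapped_libraries(maps_text):
    """Return {path: deleted?} for shared objects listed in /proc/<pid>/maps."""
    libs = {}
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then an optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        deleted = path.endswith(" (deleted)")
        if deleted:
            path = path[: -len(" (deleted)")]
        if ".so" in path:
            libs[path] = deleted
    return libs


# Hypothetical maps excerpt, for illustration only.
sample_maps = """\
7f1c2a000000-7f1c2a18a000 r-xp 00000000 fd:00 131 /lib64/libc-2.12.so (deleted)
7f1c2a58e000-7f1c2a5a5000 r-xp 00000000 fd:00 140 /lib64/libpthread-2.12.so
7f1c2b000000-7f1c2b021000 rw-p 00000000 00:00 0
"""

libs = mapped_libraries(sample_maps)
stale = sorted(path for path, deleted in libs.items() if deleted)
print(stale)
```

Libraries reported as stale are those the long-running process still has mapped from a file that no longer exists on disk under that name, which is one way an old httpd could behave differently from a freshly started one.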
I'll leave the tmp VM running for now in case anyone wants to check etc but will switch back to the real svn2 by Wednesday (07/16) or on Monday. Once again, only svn3 is enabled for the https pool in Zeus.
Assignee: afernandez → server-ops-webops
Severity: major → normal
This caused operational problems. See 1037808
Original svn2 back online and re-puppetized. While "downtimed", following update:

P410i 5.76 -> 6.40
Bios 05/05/2011 -> 07/02/2013
(HP NC532i) Ethernet 5.2.7 -> 6.2.26
Kernel 2.6.32-431.17.1 -> 2.6.32-431.20.3

:fubar as per comment 34, please proceed with Bug 982633
I am investigating this. Please coordinate any changes with me until I update the bug.
Assignee: server-ops-webops → bhourigan
This has been solved in 982633.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services