Closed Bug 464906 Opened 11 years ago Closed 9 years ago

An unexpected error has occurred! when posting to forums while logged in

Categories

(support.mozilla.org :: Forum, task, critical)

task
Not set
critical

Tracking

(Not tracked)

VERIFIED INCOMPLETE

People

(Reporter: cww, Unassigned)

Details

(Whiteboard: 08/06/2009)

I can't tell if it's temporary or if something broke with 0.7.2 so I'm filing a bug.

Go to forums, try to reply to existing thread, get 
An unexpected error has occurred!

It may be a performance issue since everything seems to have slowed down.
According to Quarantine [1], this still happens often and is a major issue. Cww, did you want to set a target milestone on this?


[1]<http://support.mozilla.com/tiki-view_forum_thread.php?comments_parentId=211632&forumId=3>
Severity: normal → major
OS: Linux → All
Hardware: PC → All
Target Milestone: --- → 0.8
Sorry, I always assumed target milestones were set during sumodev meetings.  I would love to have this in the next release.  We could also try the 99999999 trick since I think reverting that is when this started.
This should be a performance issue. It's about time to optimize that part. It's probably an inefficient query.

The "unexpected error" is basically a DB error (which probably means the SQL server has gone away or something similar) caused by the time limit set on queries.

To reduce overall load on the forums, you can try lowering two settings in the Admin.. Forums admin panel: "Performance - Limit forum topic listings to recent (hours)" and "Performance - Limit ranking listings to recent posts (hours)".
Also, is the tiki_comments db table InnoDB or MyISAM? I remember we converted most tables to InnoDB but left some as MyISAM because they need FULLTEXT indexes. Is this table one of them? Just wondering if this might be behind the performance problems.
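One way to answer the engine question is to ask MySQL's information_schema. The following is only a sketch: the schema name "sumo" and the sample rows are hypothetical, and the query is built for a DB-API driver that uses the "format" paramstyle rather than being run against any real server.

```python
# Sketch: look up which storage engine a table uses.
# The schema name and the sample rows below are hypothetical.

ENGINE_QUERY = (
    "SELECT TABLE_NAME, ENGINE "
    "FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s"
)


def engine_lookup(schema, table):
    """Return (sql, params) ready to pass to cursor.execute()."""
    return ENGINE_QUERY, (schema, table)


def tables_on_engine(rows, engine):
    """Pick the table names whose engine matches, ignoring case."""
    wanted = engine.lower()
    return [name for name, eng in rows if eng and eng.lower() == wanted]


sql, params = engine_lookup("sumo", "tiki_comments")

# Hypothetical result rows, shaped like (TABLE_NAME, ENGINE).
sample_rows = [
    ("tiki_comments", "MyISAM"),
    ("tiki_preferences", "InnoDB"),
    ("tiki_logs", "MyISAM"),
]
myisam_tables = tables_on_engine(sample_rows, "myisam")
```

With a real connection, passing `sql` and `params` to `cursor.execute()` would return the one row for tiki_comments.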
Target Milestone: 0.8 → 0.9
What's the status of this? Are we still getting errors?

I imagine this was fixed with the recent query revisions.
Has anyone seen this recently?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Nope; resolving as verified.
Status: RESOLVED → VERIFIED
This is happening today. Both Noah and I are having the problem.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
It is now happening again. I also got the error.
This seems to be an on-and-off problem. Some threads give you this error, some don't.
Moving to server ops to see what happened to cause this recently.  We think that getting more servers helped last time but there doesn't seem to be a spike in traffic so we'll need some investigative work from IT to figure out what's killing the DB.

Can we get a SHOW FULL PROCESSLIST for prod unless you know of servers that are going down for other reasons?
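Once a SHOW FULL PROCESSLIST dump is available, something like the sketch below could narrow it down to the statements worth looking at. The column names (Id, User, Host, db, Command, Time, State, Info) are the ones MySQL returns; the rows themselves and the 10-second threshold are made up for illustration.

```python
# Sketch: filter SHOW FULL PROCESSLIST rows down to long-running,
# non-idle statements. The sample rows and threshold are hypothetical.


def slow_queries(processlist, min_seconds=10):
    """Return active (non-Sleep) rows running at least min_seconds, slowest first."""
    active = [
        row for row in processlist
        if row["Command"] != "Sleep" and row["Time"] >= min_seconds
    ]
    return sorted(active, key=lambda row: row["Time"], reverse=True)


sample = [
    {"Id": 1, "Command": "Sleep", "Time": 300, "State": "", "Info": None},
    {"Id": 2, "Command": "Query", "Time": 12, "State": "Sending data",
     "Info": "SELECT ... FROM tiki_comments ..."},
    {"Id": 3, "Command": "Query", "Time": 45, "State": "Locked",
     "Info": "UPDATE tiki_comments ..."},
    {"Id": 4, "Command": "Query", "Time": 0, "State": "", "Info": "SHOW FULL PROCESSLIST"},
]

suspects = slow_queries(sample)
suspect_ids = [row["Id"] for row in suspects]
```

Idle connections show up as Sleep with a large Time, so they are excluded rather than treated as slow queries.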
Assignee: nobody → server-ops
Severity: major → critical
Component: Forum → Server Operations
Product: support.mozilla.com → mozilla.org
QA Contact: forum → mrz
Target Milestone: 0.9 → ---
Version: unspecified → other
I don't see any InnoDB in there, it's entirely MyISAM.

No load at all on the database, gonna have to check the web server logs...
Assignee: server-ops → justdave
ok, the sumo site was recently moved to a new web cluster, and the error logging wasn't set up correctly (was trying to log to a directory that didn't exist).

I just fixed that and we're getting logs now.  Next time it happens get the timestamps and we'll find it in the logs.
Witnessed the error again today @ 6:57pm CST after posting in the forums.
Not a thing recorded in the logs near that timestamp.  Whatever error is happening isn't getting logged.
Also witnessed at 15:48 BST
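Since the sightings are reported in different time zones, it may help to normalize them to UTC before searching the logs. This is a rough sketch: the calendar date is a placeholder (none is given here), "CST" is taken literally as UTC-6 even though the reporter may have meant daylight time, and the 5-minute search window is an arbitrary choice.

```python
# Sketch: convert reported local times to UTC and build a log search window.
# The date "2009-07-01" is a placeholder; the offsets are fixed assumptions.
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # taken literally; may really be CDT (UTC-5)
BST = timezone(timedelta(hours=1), "BST")


def to_utc(day, clock, tz):
    """Parse 'YYYY-MM-DD' plus 'HH:MM' in the given zone and convert to UTC."""
    local = datetime.strptime(f"{day} {clock}", "%Y-%m-%d %H:%M").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


def search_window(moment, minutes=5):
    """Return (start, end) timestamps to grep for around a reported sighting."""
    delta = timedelta(minutes=minutes)
    return moment - delta, moment + delta


first = to_utc("2009-07-01", "18:57", CST)
second = to_utc("2009-07-01", "15:48", BST)
start, end = search_window(first)
```

If the web servers log in local time rather than UTC, the same helper can convert to that zone instead.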
Seeing a lot of this...Dave, how would you feel about turning on verbose logging maybe just on one webhead?
What I actually mean is on one *slave* since it's a config setting.
hmm, this apparently got left here while I was on vacation last week, I'm still getting caught up.  It's been a week, are we still seeing this?  I've got no qualms with turning on verbose logging on a webhead to test this.
I also get it while voting on comments on the KB...
(In reply to comment #19)
> I've got no qualms with turning on verbose logging on a webhead to test this.
You mean you didn't already do this? *Faceplant* Do it!

>It's been a week, are we still seeing this?
Yes! All day today, and last night. Right now I'm seeing it consistently after 6pm CDT; looking again, it's 6:20pm, and it's still happening. If I post/edit my post in the Firefox forum, it's there! Staring me in the face! *Stabs tiki wiki*
(In reply to comment #17)
> Seeing a lot of this...Dave, how would you feel about turning on verbose
> logging maybe just on one webhead?

How is that done?
Whiteboard: waiting on verbose steps
This is happening again today as of 7:20 pm CDT. It's causing users to think their posts aren't going through, making them post twice.
Laura, how do we turn on logging?  I think we're stuck debugging since nothing's logging anything interesting.
(In reply to comment #24)
> Laura, how do we turn on logging?  I think we're stuck debugging since
> nothing's logging anything interesting.

Via the admin interface usually. I *don't* think we want to do this on all boxes for perf reasons, so I'm wondering if we can somehow hive off one slave and set it there.

It's on
https://support.mozilla.com/tiki-admin.php?locale=en-US&page=general
The settings we want are "TikiWiki verbose error reporting" and "Log SQL"
We should also set it to "Report all PHP errors".

To set this on a single slave we'll need to set these directly on that slave.
In the database these settings correspond to name-value pairs in the tiki_preferences table.
error_reporting_level should be set to 2047
error_reporting_adminonly set to 'y'
tiki_error_reporting_verbose to 'y'
log_sql to 'y'
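For reference, here is a small sketch of what those four values amount to. 2047 is the sum of the eleven PHP error-level bits from E_ERROR (1) through E_USER_NOTICE (1024). The table layout (name and value columns) is an assumption based on the "name-value pairs" description, and an in-memory SQLite database stands in for the real MySQL one, so the upsert syntax would differ in production.

```python
# Sketch: decode error_reporting_level and write the four preferences.
# SQLite stands in for MySQL; the column names "name"/"value" are assumed.
import sqlite3

PHP_ERROR_LEVELS = {
    1: "E_ERROR", 2: "E_WARNING", 4: "E_PARSE", 8: "E_NOTICE",
    16: "E_CORE_ERROR", 32: "E_CORE_WARNING", 64: "E_COMPILE_ERROR",
    128: "E_COMPILE_WARNING", 256: "E_USER_ERROR", 512: "E_USER_WARNING",
    1024: "E_USER_NOTICE",
}

DEBUG_PREFERENCES = {
    "error_reporting_level": "2047",
    "error_reporting_adminonly": "y",
    "tiki_error_reporting_verbose": "y",
    "log_sql": "y",
}


def decode_error_level(level):
    """List the PHP error constants whose bits are set in level."""
    return [name for bit, name in sorted(PHP_ERROR_LEVELS.items()) if level & bit]


def apply_preferences(conn, prefs):
    """Insert or overwrite each preference row (SQLite upsert syntax)."""
    conn.executemany(
        "INSERT OR REPLACE INTO tiki_preferences (name, value) VALUES (?, ?)",
        sorted(prefs.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tiki_preferences (name TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO tiki_preferences VALUES ('log_sql', 'n')")
apply_preferences(conn, DEBUG_PREFERENCES)
stored = dict(conn.execute("SELECT name, value FROM tiki_preferences"))
enabled = decode_error_level(int(stored["error_reporting_level"]))
```

Keeping the previous values somewhere before overwriting them would make it easy to switch the logging back off afterwards.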
They all use the same pool of database slaves, and the config for which database to talk to is shared between all of them.  I don't think there's a way to set this on just one if it's stored in the database.
Maybe we can just do it for a short period of time during an off-peak time of day or something?
k, let's turn it on one evening.  Dave, I'll leave when up to you as you have a better view of the traffic patterns.
Does this error still happen when traffic is not at peak (just out of curiosity)?  Noah, do you have insights as to when this is most likely to happen?
Cww, I thought it was based on traffic, but since I have no access to those stats I couldn't say. Keeping in mind my central timezone, I witnessed the problem all day long (11am-9pm) as I was posting in the forums, even into late evening.

Which makes me think this isn't based on traffic at all, and something just broke wherever it broke.
Don't see a plan here so here's one - 

1. Turn on logging during tonight's maintenance window.  
2. Let logging run throughout the whole window
3. Attempt to reproduce problem

Will there be enough people online and around to help test?
I don't think I'll be online tonight, no.
Scratch tonight, plan for next week instead.  Dave might even be around in person!
Whiteboard: waiting on verbose steps → 07/21 ?
Don't want to do this during a release.
Whiteboard: 07/21 ? → 07/23
Flags: needs-downtime+
Whiteboard: 07/23 → 08/06/2009
I don't think this needs to be announced but it'd be great if everyone's around to get this tested tonight.
(In reply to comment #35)
> I don't think this needs to be announced but it'd be great it everyone's around
> to get this tested tonight.

I'll be around, but will have both a small SFx push and an AMO release tonight, by myself.
ok, so we have the logging enabled cluster-wide, and we're not getting anything new that we weren't already getting in the apache logs on the webheads, nor in the tiki_logs table in the database.  Anyone know if there's somewhere special the new logged items get stored?
I'm not seeing any performance impact at the moment from having this stuff enabled, might as well leave it on until we do.  Still haven't figured out if/where it's logging anything though.
I've just had this error, so if we find out where it's logging to, we should check the timestamps from about 4 minutes ago.
This is now happening on every thread I reply to.
Laura, any clue where to find debug logs?  I feel like we're stuck here.
Alternatively, if we can't figure out where it logs to, can we add some debugging output that will intentionally log stuff around the code that probably handles this action?
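The idea of wrapping the suspect code path with explicit logging can be sketched roughly as follows. This is Python for illustration only (the forum code itself is PHP), and `post_reply` plus the logger name are hypothetical stand-ins for whatever actually handles the posting action.

```python
# Sketch: log entry and any failure around a suspect function.
# post_reply and the logger name "forum.debug" are hypothetical.
import functools
import logging

logger = logging.getLogger("forum.debug")


def log_failures(func):
    """Log each call, and log the full traceback if the call raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("entering %s", func.__name__)
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception records the message plus the traceback.
            logger.exception("unexpected error in %s", func.__name__)
            raise
    return wrapper


@log_failures
def post_reply(thread_id, body):
    """Stand-in for the real posting code."""
    if not body:
        raise ValueError("empty reply")
    return {"thread_id": thread_id, "body": body}
```

The exception is re-raised after logging, so behavior for the user stays the same while the log gains a timestamped traceback.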
Move this back to server ops when there's actually something for IT to do again.
Assignee: justdave → nobody
Component: Server Operations → Forum
Flags: needs-downtime+
Product: mozilla.org → support.mozilla.com
QA Contact: mrz → forum
Version: other → unspecified
So long ago.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → INCOMPLETE
Meow.
Status: RESOLVED → VERIFIED