Closed Bug 27146 Opened 25 years ago Closed 24 years ago

Need evaluation of Bugzilla performance and scalability limits

Categories

(Bugzilla :: Bugzilla-General, defect, P2)

Tracking

()

VERIFIED FIXED
Bugzilla 2.12

People

(Reporter: mitchell, Assigned: endico)

Details

Risto,  filing (edited) questions/comments from chofmann as a bug as you 
requested.

__
It would also be good to get an idea about the source 
of the recent bugzilla performance problems that have been 
observed by many engineers and testers. 

If these performance problems are the result of 
the system not scaling to current levels of bug entry and 
query traffic, then this problem could only get worse. 

Can we get any estimates on the "entry and query transaction 
rates" that bugzilla can support?  Are we anywhere close to 
those limits?  Also, can we get someone to look at performance and scalability 


Active Users 
   90 netscape engineers + 30-40 active mozilla contributor engineers. 
    35 netscape qa testers + 200-300 active mozilla testers that download the 
daily builds. 
    -- ~465 active mozilla contributors 

    -- ???? -total number of bugzilla accounts (is this data available?) 
    ---???? -active bugzilla users( a transaction or two in the last month 
                 - is this data available and monitored, has it moved much 
lately)? 

Casual Users. 
~20,000-30,000  full circle build milestone testers 
~120,000 milestone testers 

We need a plan to preserve performance and usability of bugzilla for the roughly 
500 very active mozilla contributors.
We also need an evaluation of current peak-time usage,
which seems to be between 2 and 6pm PST, and of what
the peak rate limitations might be with the
given bugzilla configuration and setup.

thanks
I will look into this...being one of 3 important things with Mozilla right now 
(others being cvs performance and lounge's sendmail configs).

I would like to get as much data as possible. I.e., if you see slowness, please 
report exactly when it happened so I can go back to sar and other logs and look 
a little more. We had some issues with bonsai and addcheckin.pl using a lot of 
resources from time to time, so it's possible that database access was slow at 
those times. Terry has changed addcheckin.pl since then, and I'm wondering if 
things are any better.

Also, we were talking about moving bonsai to another database instance. That 
might help something too.
Status: NEW → ASSIGNED
One logfile that might be helpful:

I recently added some logging to Bugzilla, where it logs every SQL request it
makes, and also logs when they finish.

On lounge, check out /export2/webtools/bugzilla/data/sqllog.

(Right now, this file is not automatically rotated, which may be a problem.  If
you add rotation code, be aware that Bugzilla won't add to the logfile unless it
already exists.  Any rotating code would have to create a new, empty sqllog
file, and give it appropriate permissions.)
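
A minimal sketch of rotation code that respects that constraint, written in Python purely for illustration (Bugzilla itself is Perl). The function name, the timestamp suffix, and the default mode are assumptions, not anything that exists on lounge:

```python
import os
import time


def rotate_sqllog(path, mode=0o666):
    """Rename the current log and leave a fresh, empty one in its place."""
    if os.path.exists(path):
        # Keep the old log under a timestamped name (naming scheme is illustrative).
        os.rename(path, f"{path}.{time.strftime('%Y%m%d-%H%M%S')}")
    # Bugzilla only appends if the file already exists, so recreate it empty.
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL, mode)
    os.close(fd)
    # os.open applies the umask, so set the permissions explicitly.
    os.chmod(path, mode)
```

A request that tries to log between the rename and the re-create would find no file and skip its entry, so a few lines could be lost at rotation time.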
I added a script to check Bugzilla response times. It queries the tickets of the 
Server Operations group and then fetches 10 tickets, and I'm logging the response 
times. 

OK, we have plenty of logging and other data available. Now I just need to see 
Bugzilla being slow. So, can someone say exactly when they see it being slow, so 
we can go back to the logs and see what the cause was?

Right now mysqld is under heavy load because the tree was just opened and the 
addcheckin.pl processes are loading it. BUT... I haven't seen it slowing down 
bugzilla yet.

So, if you can give me the exact minutes when bugzilla is slow, it would help 
(unless my test script catches one).
Terry fixed some hook problems with addcheckin.pl that were the probable cause of 
the mysql/system load (30 processes, each using 92M of memory, caused some 
swapping). I would appreciate reports of any Bugzilla slowness you've 
seen since 2/10/00 7pm PST. My test script has been finishing in 11-15s (it 
makes about 10 queries to bugzilla: a query for a component and some tickets).
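
For anyone who wants to collect their own timings, a check along these lines can be sketched as follows. This is not the actual script; the function names are made up, and the URL list is whatever set of query and show_bug pages you want to probe:

```python
import time
import urllib.request


def fetch(url, timeout=600):
    """Download a page and discard the body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_requests(urls, fetch=fetch, clock=time.monotonic):
    """Return (total_seconds, [(url, seconds), ...]) for fetching each URL in turn."""
    timings = []
    for url in urls:
        start = clock()
        fetch(url)
        timings.append((url, clock() - start))
    return sum(seconds for _, seconds in timings), timings
```

Logging the start time next to the total makes it possible to line a slow run up against sar and the sqllog afterwards.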
So, be aware that some slowness may be occurring just because MySQL may be being
asked to do a lot at the same time.

For example: suppose person A does a big, complicated, slow query.  They are not
surprised when their query takes 15 seconds to complete.

But during that 15 seconds, persons B, C, and D all try to submit a change to a
bug.  Now, submitting a change requires getting an exclusive lock on (most of)
the database, so those people all have to wait for A's query to finish.  They
find their change, which might ordinarily take 3 seconds, now takes up to 20
seconds.

And then, just as B's change starts, person E comes along with the most
trivial request, but it takes 9 seconds as they wait for B, C, and D to all
finish.

This may all seem unlikely and pathological, but I bet similar and much worse
things happen fairly often.
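
The arithmetic in that example can be checked with a toy model. This is only a sketch: it assumes every request is served strictly one at a time in arrival order (a simplification of what MySQL really does), and the arrival time of 5 seconds for B, C, and D is invented for illustration.

```python
def simulate(requests):
    """requests: list of (name, arrival_time, duration), served one at a time
    in arrival order. Returns {name: seconds from arrival to completion}."""
    free_at = 0.0
    latency = {}
    for name, arrival, duration in sorted(requests, key=lambda r: r[1]):
        start = max(arrival, free_at)
        free_at = start + duration
        latency[name] = free_at - arrival
    return latency


latency = simulate([
    ("A", 0, 15),   # big slow query
    ("B", 5, 3),    # bug changes, each needing the exclusive lock
    ("C", 5, 3),
    ("D", 5, 3),
    ("E", 15, 0),   # trivial request arriving just as B starts
])
```

Under those assumptions D waits 19 seconds for a 3-second change, and E waits 9 seconds for a request that takes no time at all.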
OS: Mac System 8.5 → All
Hardware: Macintosh → All
Risto, we'll need to be prepared to give an overview of bugzilla performance,
scalability, bottlenecks, etc for jim hamerly's staff, which meets on Tuesday at
11.  Dmose has a specific set of questions and more details.
Severity: normal → major
I have now seen a few instances where the reply time from the automatic script 
has been rather long. Gotta go to the logs and see what caused those. 
I was just looking at the MySQL manual, and found a section which describes the
kind of locking problem I stated above, only worse. 

This doc can be found at
http://www.mysql.org/Manual_chapter/manual_Performance.html#Table_locking .
(Note that we are running version 3.22.29, almost the latest stable release;
many of the options described in that section only apply to 3.23.xx, which is a
development release.  I don't think we want to run development releases.)

Anyway, to quote that page:

> One main problem with this is the following: 
> 
>  *  A client issues a SELECT that takes a long time to run. 
>  *  Another client then issues an UPDATE on a used table; This client 
>     will wait until the SELECT is finished 
>  *  Another client issues another SELECT statement on the same table;
>     As UPDATE has higher priority than SELECT, this SELECT will wait 
>     for the UPDATE to finish. It will also wait for the first SELECT 
>     to finish! 

It's hard to prove, but I bet this happens often enough.  Someone does a big,
slow, stupid query; while that's happening, someone else changes a bug, and
suddenly *everyone* is locked up until the slow stupid query grinds to a
finish.

We can try some of the fixes outlined in that page.  I can hack Bugzilla to
always set SQL_LOW_PRIORITY_UPDATES, or (equivalently) Risto can restart mysqld
with the --low-priority-updates flag.  This means that changing or creating a
bug can sit and block for a really long time, but everything else would behave
better.

But first, it would be a good idea to prove that this is actually a problem
we're hitting.
Here are some times my script has caught when bugzilla replies were slow:

Date query started + time to finish:
2/11/00 5.00pm 6:07.3
2/11/00 5.05pm 1:08.3
2/11/00 5.40pm 0:43.6
2/11/00 5.45pm 7:45.8
2/11/00 5.50pm 2:48.4
2/12/00 10.25am 1:18.7
2/12/00 11.55am 1:02.1
2/12/00 1.45pm 0:21.3
2/12/00 3.35pm 1:13.5
2/13/00 6.00pm 1:58.0
2/13/00 10.05pm 4:41.2
2/14/00 9.45am 1:10.0

In a normal situation this query finishes in 11 seconds. I will check those times 
against the system logs to see if we had performance issues at those times. 
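
To put those numbers in perspective, here is a small calculation of the slowdown relative to the 11-second baseline. It assumes the times above are in minutes:seconds form; the helper name is made up.

```python
def to_seconds(elapsed):
    """Convert 'M:SS.s' (as in the list above) to seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


BASELINE = 11.0  # seconds for a normal run, per the comment above
samples = ["6:07.3", "1:08.3", "0:43.6", "7:45.8", "2:48.4", "1:18.7",
           "1:02.1", "0:21.3", "1:13.5", "1:58.0", "4:41.2", "1:10.0"]
slowdowns = [to_seconds(s) / BASELINE for s in samples]
print(f"worst: {max(slowdowns):.0f}x, mildest of the slow runs: {min(slowdowns):.1f}x")
```

So the slow runs range from roughly 2x to roughly 42x the normal time.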
I can't find any bottlenecks from the system... more later
OK, I have looked into system performance, and right now I'm rather convinced 
that we don't have I/O, CPU, memory, or other bottlenecks in the system itself. 
So the next step will be to look into MySQL issues. I have found one cluster of 
problems around 5:42-5:45pm last Friday, when my script had a long return time. 
Terry, you might want to take a look at /cvsmirror/tmp/problem.sqllog (I had to 
move the sqllogs here because /export2 started to fill up). Look at my comments 
starting with '#####'. There's one place where an INSERT is issued against 
profiles, and after that all SELECTs were blocked for a long time. Could this be 
what you described as the problem?

I'm going to bed now and will look into these more later.
The file I mention here is on lounge.
Mysqld is now running with the --low-priority-insert option. It didn't recognize
the --low-priority-updates flag, even though the manual refers to it. Weird.
If you're looking at the URLs I mentioned above, they may be talking about
options for the 3.23.xx versions of MySQL, which we're not running.

So, I found the culprit in the scenario you described.  There is the following
line (reformatted here for legibility):

    02/11/00 17:42:31 27732: SELECT bugs.bug_id, bugs.groupset,
    substring(bugs.bug_severity, 1, 3), substring(bugs.priority, 1, 3),
    substring(bugs.rep_platform, 1, 3), map_assigned_to.login_name,
    substring(bugs.bug_status,1,4), substring(bugs.resolution,1,4),
    substring(bugs.short_desc, 1, 60) FROM bugs, profiles map_assigned_to,
    profiles map_reporter LEFT JOIN profiles map_qa_contact ON
    bugs.qa_contact = map_qa_contact.userid, longdescs longdescs_ WHERE
    bugs.assigned_to = map_assigned_to.userid AND bugs.reporter =
    map_reporter.userid AND bugs.groupset & 0 = bugs.groupset AND
    longdescs_.bug_id = bugs.bug_id AND (bug_status = 'NEW' OR bug_status
    = 'ASSIGNED' OR bug_status = 'REOPENED' OR bug_status = 'RESOLVED' OR
    bug_status = 'VERIFIED' OR bug_status = 'CLOSED') AND
    (lower(longdescs_.thetext) regexp '(^|[^a-z0-9])window($|[^a-z0-9])'
    OR lower(longdescs_.thetext) regexp '(^|[^a-z0-9])loads($|[^a-z0-9])'
    OR lower(longdescs_.thetext) regexp '(^|[^a-z0-9])starts($|[^a-z0-9])'
    OR lower(longdescs_.thetext) regexp '(^|[^a-z0-9])in($|[^a-z0-9])' OR
    lower(longdescs_.thetext) regexp
    '(^|[^a-z0-9])background($|[^a-z0-9])') GROUP BY bugs.bug_id ORDER BY
    bugs.priority, bugs.bug_severity

In English, this translates to "generate the list of all bugs in the system that
have a comment containing any of the words 'window', 'loads', 'starts', 'in', or
'background'".  It's not very surprising that it returns every bug in the system,
and that it has to look through all 51 megabytes of comment text to do so.  So,
it's very slow to run, and very slow to finish delivering all the results.

Less than a second later, another process (27733) is generating email diffs for
a bug, using the new experimental email code.  This involves updating a
timestamp in a bug.  Which means it needs to get a write lock on the bug table,
which means it has to wait for the big grody query to finish.

3 seconds later, the process you noticed does its select.  It has to wait until
the write finishes.  This is exactly the scenario I found described in the
manual.

Now that you have turned on --low-priority-insert, the only people who should
see really slow behavior are people making changes to bugs.  I think this is an
improvement, but it is not really great.

I just realized what the right solution may be, but I'm scared of the details of
implementing it.

Bugzilla should keep two copies of the database around at all times.  The main
database works exactly as it does now.  There is also a shadow database.  All
changes to the main database get logged in a file.  A background process of some
kind reads the log and makes the same changes to the shadow database.

Then, we change the main query page to do all of its queries against the shadow
database, not the main one.  Theoretically, these queries might be incorrect as
they will be querying old data.  Realistically, the shadow database ought to be
able to be kept pretty well up-to-date, and everything will work great.

I'm pretty sure that BugSplat (Netscape's internal bugsystem) was using this
kind of scheme.

There's just the small matter of implementing it.  Yuck.  But at least it's a
plan...
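
To make the idea concrete, here is a minimal sketch of the log-and-replay scheme. It uses Python with two in-memory SQLite databases standing in for the real MySQL ones, an in-memory list standing in for the log file, and a toy one-table schema; none of these names come from Bugzilla.

```python
import sqlite3

SCHEMA = "CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, status TEXT)"

main = sqlite3.connect(":memory:")
shadow = sqlite3.connect(":memory:")
for db in (main, shadow):
    db.execute(SCHEMA)

change_log = []   # stands in for the log file of changes


def write(sql, params=()):
    """Apply a change to the main database and record it for replay."""
    main.execute(sql, params)
    main.commit()
    change_log.append((sql, params))


def replay(position):
    """Apply logged changes from `position` onward to the shadow; return the new position."""
    for sql, params in change_log[position:]:
        shadow.execute(sql, params)
    shadow.commit()
    return len(change_log)


write("INSERT INTO bugs VALUES (?, ?)", (27146, "NEW"))
write("UPDATE bugs SET status = ? WHERE bug_id = ?", ("ASSIGNED", 27146))
position = replay(0)
```

Between calls to replay() the shadow is stale, which is exactly the "querying old data" caveat above; the replay position is what tells the background process where to resume.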

REASSIGNing back to me, changing product to Webtools, component to Bugzilla,
priority to P1.
Assignee: rko → terry
Status: ASSIGNED → NEW
Component: Server Operations → Bugzilla
Priority: P3 → P1
Product: mozilla.org → Webtools
Yeah, I have.  The problem is that all the nifty tricks they talk about only
work for INSERT statements.  Which is fine, but I need to do a lot of UPDATE
statements too, and none of their tricks help there.
Status: NEW → ASSIGNED
Just now, when trying to bring up these two URIs:
   http://bugzilla.mozilla.org/show_bug.cgi?id=20394
   http://bugzilla.mozilla.org/show_bug.cgi?id=27164
...both stopped for a noticeable few seconds (the first trying to get the 
comments, the second trying to get the top part of the bug).

From the above comments I'm guessing this should not have happened, and that
you may be able to work out what caused this from the logs.
Whoops!  Yes, I'd call a 50-second delay a "noticeable few seconds".

My theory is that the change Risto made doesn't affect LOCK TABLES calls.  And
all of the interesting changes happen while the tables are locked.  So, adding
--low-priority-insert turns out to be a no-op.  (Well, it works when a new bug
is created, but not when an old one is edited.)

I have hacked the code to put in the LOW_PRIORITY parameter to the LOCK TABLES
calls.
You mentioned something about this in mail, too, but just to check that you 
were thinking the same thing: how about an architecture like this?

Main database                               Mirror database
-------------                               ---------------
All SELECT queries hit this                 All UPDATE/INSERT queries hit this
This side has high priority for             This side has writes prioritized
selects and penalizes writes                (like we used to have)
(like we have now)
                  <------------------------------
                  Sync once a minute; doesn't matter if it takes a long
                  time or if the main database lags a little.

This might bring more mid-air collisions but would make both 
selects and inserts/updates fast.

That's basically the picture I outlined above.

I disagree on your nomenclature.  To me, the Main database is the up-to-date one
that has the real truth in it.  And the second database (I called it "shadow")
is the one that is read-only which might lag behind the times.

In order to "Sync", we apply deltas.  That is, we replay all UPDATE/INSERT/etc
requests into the shadow database.  Rather than once a minute, I think I'll just
have an always-running-usually-idle background process try to do them as they
happen.

And I probably won't bother making every SELECT use the shadow database; just
SELECTs that are potentially expensive.
I just now had long delays viewing bugs 13534 and 27146.
My automatic check script caught something too.
Well, damn.  I have the bare glimmerings of a clue.

In /export2/webtools/bugzilla/data/sqllog, look at the entries for process
15525.  (Or, equivalently, in /opt/mysql-3.22.29/var/lounge.log, look at thread
ID 41144.)  Both logfiles seem to think that the last thing that this process
did was request a LOCK TABLES, at 13:58:59.  Neither logfile ever indicates that
the LOCK TABLES finished, or that the process ever did anything else.  But I
think the mysql logfile won't ever indicate it finished, it just indicates when
it next gets a legitimate request from that thread.

What I think happened was this:

The process tried to do a LOCK TABLES.  With the new LOW_PRIORITY stuff, it sat
and took a long time, because lots of people were busy reading tables and doing
queries and stuff.  Someone (either the user or the webserver) got bored hanging
around and killed the process.  But this somehow didn't propagate its way all
the way to mysqld, and so mysqld thought this process was still there waiting
for a lock.  Several minutes later (somewhere around 14:05, I think) mysqld
finally managed to honor this LOCK TABLE request.  But there was no longer an
active process behind the request, and so the thread then just hung around with
everything locked.  Finally, a couple minutes later (at about 14:07:01),
something timed out, the thread was quietly killed, the locked tables were
released, and the logjam of blocked Bugzilla requests was finally unloosed. 
There is quite a flurry of activity at that time.

I have actually seen some other recent evidence that mysqld doesn't notice very
quickly when a connection to it is dropped.  Maybe that can be fixed.

And, if I implement the shadow DB thing, this kind of problem should happen much
less frequently, because it won't take that long to get a write lock and
people/webservers won't get bored.
Updating committed changes has been *real slow* since about 5pm on 2/16.
It is causing great confusion, as it took hours to see changes, and the result
is bogus "midair collisions" during the time it takes to update.
The PDT team will not get prompt notification with this state of affairs.
I have done the shadowing stuff as described above.  It still probably needs to
be tuned a bit (for example, the logging table is going to grow without bounds
until we fill up a disk), but things should be much happier.
Making changes to bug reports is still taking 30-60 seconds for Bugzilla to progress 
beyond displaying:

>Bug List: (0 of 652) First Last Prev Next   Show list      Query page      Enter new bug 
> <HR>
So, 30-60 seconds on changing is not great, and I hope to further tune things to
make it better.  But I can't consider it a disaster, either.

In times of heavy usage, there will always be some delay.  But I have reason to
hope that we will not be approaching total gridlock like we were before.
Priority: P1 → P2
um. i seem to have the "can confirm bug" bit set in my prefs -> permissions. I
can not seem to confirm anything however. why so?
Let's please limit this bug to discussion about performance issues.  If you're
having other troubles, please open a new bug.  (If your other troubles are
preventing you from opening a bug, please send me mail.)
Attempting to close 28415 as a duplicate of 20901, it puts the "marked as 
duplicate" text into 20901, but times out before making any changes to 28415.
I submitted an additional comment to bug 28327 about 6 minutes ago. I am still 
waiting for it to finish displaying "Bug processed..."
I just submitted a change to bug 28555 (marking it RESOLVED-FIXED).  I waited
for about a minute (the very top of the response page showed, but not down to
the part about sending mail).   Then I hit stop and went back to the bug.  The
changes to the bug were made, but mail was never sent.  I usually get bugzilla
mail within seconds, so I think (considering it's Sunday morning) that I'm
probably not going to get any mail for this change.
I did get the mail after all, but it took about 10 minutes for it to be sent
(the message is dated 7:29, the change was made at 7:19).  That's a performance
problem in itself.  I suspect I would have had to wait until 7:29 if I'd wanted
to see the page finish loading...
The previous delay could also have been slowness in mail delivery, in any of the 
MTAs en route.
Hi, I saw this interesting thread and thought I would add several
comments/ideas/musings/opinions.

I've done quite a bit of database/SQL programming but have not used MySQL,
so some of these comments are more general.

From what I've gathered from scanning the MySQL documentation, 
it only supports table locks, so the main improvement that 
can be made (after the addition of the queued inserts/updates with the
main-mirror database) is to make the SELECT queries as fast as possible.

One way of doing this would be to add an 
option to the http://bugzilla.mozilla.org/query.cgi page 
to allow users to limit the # of results returned by their queries. A 
selection box that allows limits of say 25, 50, 100 or unlimited number of 
results. There is a LIMIT option for SELECT statements in MySQL which I think 
will let you accomplish this:
http://mysql.bluep.com/Manual_chapter/manual_Performance.html#LIMIT_optimization
You could default to 50, which for many may be adequate, and the 'unlimited' 
option would keep the power users happy. For example suppose I do a query 
on description using the word 'clipped' which unbounded yields 1000 rows. 
By using a LIMIT of 100, only 10% of the table needs to be read (assuming
an even distribution of the word 'clipped'). This also has a side benefit
of limiting how much HTML the web server has to spit out.
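
A rough illustration of the capped query, using Python and an in-memory SQLite table as a stand-in for MySQL. The table contents, the search() helper, and the default of 50 are all made up for the example:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE bugs (bug_id INTEGER PRIMARY KEY, short_desc TEXT)")
db.executemany(
    "INSERT INTO bugs (short_desc) VALUES (?)",
    [(f"text clipped in widget {n}",) for n in range(1000)],
)


def search(word, limit=50):
    """Return up to `limit` matching bugs; limit=None means unlimited."""
    sql = "SELECT bug_id, short_desc FROM bugs WHERE short_desc LIKE ?"
    params = [f"%{word}%"]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return db.execute(sql, params).fetchall()
```

One caveat: if the query also sorts on a column that has no suitable index, the server generally still has to find every match before it can return the first N, so in that case the saving may be mostly in the size of the output.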

You could also do a full-text index on the longdescs.thetext field.
Build a table of unique words called say 'unique_words' that has a 
field 'word'. This table will have a row for each unique word across all 
descriptions. Build a many to many table between it and the 'longdescs' table, 
indexing 'word' and the foreign key fields appropriately. Your select statement 
can be recoded so that queries that are searching for words in the description 
entry can utilize the indexed unique_words table to find data much faster. 
The downside of this is that inserts/updates would be slower because of 
breaking up the longdescs, however you could do the inserts/updates as you 
normally do them now and have a secondary process break up the newly added 
descriptions hourly. 
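
Roughly what I have in mind, sketched in Python against SQLite rather than MySQL. The 'word_map' table name and the 'id' column on longdescs are my own assumptions for the sake of the example; I haven't checked them against the real schema.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE longdescs (id INTEGER PRIMARY KEY, bug_id INTEGER, thetext TEXT);
    CREATE TABLE unique_words (word_id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE word_map (word_id INTEGER, longdesc_id INTEGER,
                           PRIMARY KEY (word_id, longdesc_id));
""")


def add_comment(bug_id, text):
    """Store a comment and index each distinct word in it."""
    cur = db.execute("INSERT INTO longdescs (bug_id, thetext) VALUES (?, ?)",
                     (bug_id, text))
    longdesc_id = cur.lastrowid
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        db.execute("INSERT OR IGNORE INTO unique_words (word) VALUES (?)", (word,))
        db.execute(
            "INSERT INTO word_map SELECT word_id, ? FROM unique_words WHERE word = ?",
            (longdesc_id, word))


def bugs_mentioning(word):
    """Look the word up through the index instead of scanning thetext."""
    rows = db.execute(
        """SELECT DISTINCT l.bug_id FROM unique_words w
           JOIN word_map m ON m.word_id = w.word_id
           JOIN longdescs l ON l.id = m.longdesc_id
           WHERE w.word = ? ORDER BY l.bug_id""", (word.lower(),))
    return [bug_id for (bug_id,) in rows]
```

Here the indexing happens inline in add_comment(); the hourly secondary process would run the same word-splitting loop over whatever comments were added since its last pass.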

Some databases I've used support a READ UNCOMMITTED transaction isolation mode
which allows SELECT statements to ignore locks and just read the data even if
another thread has a write lock. MySQL doesn't seem to support this 
(or transactions), but long term perhaps you can ask them to add a read 
uncommitted or 'dirty' read type feature.

You could consider using another DBMS (not a flame!). From what I see of MySQL 
it doesn't support page or row level locking, which would really help with the
types of problems you are having, and may eliminate the need for a second 
copy of the database. I believe PostgreSQL has page locking and 
READ UNCOMMITTED. (Haven't used it either). Some of the commercial 
products have built-in text indexing features. I know this is less of an 
option since a lot of porting work would be required (bug 1104). 
Optimizing the performance of SELECTs is not my highest priority right now. 
They seem to work pretty well.  Things can always be made better, but I don't
think it's bad right now.

I'm very surprised to hear about 6-minute delays.  I don't know what is causing
them.  I can understand that it's theoretically possible, but I would never
expect it to happen.  Which probably just means that I don't know what's going
on.

The biggest problem that I know of is that not only does MySQL do table-based
locking, but I force it to lock down most of the tables all at once when you
update a bug.  This is because I'm super-paranoid about two competing processes
causing inconsistencies by making changes simultaneously.  I am toying with the
idea of fixing this by simulating record-based locks using the MySQL GET_LOCK()
function.  I'm not sure whether I'll be able to pull this off, nor am I sure how
it will affect performance.
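
A sketch of what the GET_LOCK() approach might look like, in Python against any DB-API cursor connected to MySQL (assuming a driver that uses %s placeholders). GET_LOCK(name, timeout) and RELEASE_LOCK(name) are real MySQL functions; the bug_lock helper and the lock-name scheme are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def bug_lock(cursor, bug_id, timeout=10):
    """Hold a MySQL named lock for one bug while the body runs."""
    name = f"bugzilla.bug.{bug_id}"   # naming scheme is hypothetical
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout))
    (acquired,) = cursor.fetchone()
    if acquired != 1:                 # 0 means timed out, NULL means error
        raise TimeoutError(f"could not lock bug {bug_id}")
    try:
        yield
    finally:
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()
```

These locks are purely advisory, so this only protects anything if every code path that changes a bug takes the lock first. Also, as far as I know, in older MySQL versions a connection can hold only one named lock at a time, which would limit this to one bug per change.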
SETUP caused an invalid page fault in
module XPCOM.DLL at 015f:60c580b8.
Registers:
EAX=01320900 CS=015f EIP=60c580b8 EFLGS=00010246
EBX=00000000 SS=0167 ESP=006779e4 EBP=006779f0
ECX=60c6c80c DS=0167 ESI=78010c8e FS=3707
EDX=00000003 ES=0167 EDI=01324b20 GS=0000
Bytes at CS:EIP:
83 23 00 6a 26 68 60 3f c8 60 c7 00 01 00 00 00 
Stack dump:
80000000 01320900 00000000 00677a3c 60c42948 01324b08 60c83844 01320900 
00000000 60c53ea7 00000000 60c454a2 013208b0 60c45513 013208e0 013208b0 
I find that whenever I run a saved Query, it just takes forever to get the 
query. It is to the point where it is just faster to put in the query manually.
marking fixed for Seth. Thanks a lot, Seth!
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Um, what?

David, I think you just closed the wrong bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
sorry, I was running 5.0 - I must have got confused about which page I was
looking at.
I have been having a performance problem with mozilla and maybe this is the bug
to place it in.  The problem I notice is that the layout processing can take so
much of the CPU that network activity is starved.  For instance, if I go to
Slashdot there is an initial delay while the HTML page downloads.  During this
time the statusbar and the throbber are active.  Then, once it has enough info
to attempt layout the status bar animation and the throbber halt in midstream
and the network activity stops at the same time.

After a few seconds (depending on page complexity and speed of my machine)
mozilla presents the layout and then begins downloading inlined images.  Now,
IMHO this is definitely bad behaviour.  For those of us on dialup connections,
downloading images is the top delay in web browsing.  The network code should
never be starved of processing time.  The layout should be processed a little
slower rather than delaying download of the inline images for 2-3 seconds.

My $0.02
No, this bug is for recording problems in *Bugzilla*, not in mozilla itself.
I think I might have been cc'd by mistake, but before I go hide again, I'll offer 
what little advice I can even though it might be obvious.  (I don't know MySQL, 
so I can only make really generic observations.)

Since contention is the main scaling problem with concurrent usage, the only good 
approach I know is to reduce lock granularity size when possible, and to use more 
locks when this separates things into non-interfering spaces.

For big databases under heavy loads, I think there's a classical solution to 
reduce contention in circumstances when one wants to globally lock everything.  
And this is to shorten the duration of the global lock by using it only to guard 
transitions to smaller granularity locks in a tree structure.  To lock child C 
under parent P, one might lock P before C and release P while actually using C.  
(The partial ordering in lock sequence prevents deadlocks.)
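
In case a concrete picture helps, here is that parent-then-child pattern as a tiny in-process sketch in Python. It uses threading locks rather than database locks, and the Node class and with_child helper are invented for the example:

```python
import threading


class Node:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.children = {}

    def add(self, name):
        return self.children.setdefault(name, Node(name))


def with_child(parent, child_name, action):
    """Lock parent just long enough to find and lock the child, then work on the child."""
    parent.lock.acquire()
    try:
        child = parent.children[child_name]
        child.lock.acquire()      # always parent before child: a fixed order
    finally:
        parent.lock.release()     # parent is free again while the child is in use
    try:
        return action(child)
    finally:
        child.lock.release()
```

The parent lock is held only for the lookup and hand-off, so two callers working on different children block each other only for that brief moment.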

Database literature gets really hairy about the fine details of writing, 
intending, and sharing style locks, and transitions between these.  But you can 
just ignore that and aim to reduce contention by ad hoc means, just by thinking 
about the problem in general terms.  Sorry if all this is obvious.  I'm just 
trying to be helpful.
I might also point out that Dave Rothschild says in staff meetings that folks are 
now spending hours a day groveling over buglists to fine tune triage for beta. So 
the load might be more than the initial scenario projects.
data point:  I am having wretched performance problems today.  Simply clicking 
on a link to a bug is taking upwards of a minute to respond.  
It is believed that today's problems were due to some network problems, not
problems with the Bugzilla code itself.
All right, stupid question.  How do I get my email off the CC for this bug (my 
contribution was a late night boob mistake)?  There are so many on the cc list 
it doesn't display my email address in the truncated list.
it becomes really slow when visiting

http://zicon.stjernesludd.net/passionate





Thought y'all might want to know - mysql has some built in facilities for
mirroring - one of the startup options will generate a trace of exact sql
updates/inserts against the database into a log file. You can then run this log
file on the secondary db server to roll it forward.


Here is the relevant section from the mysql manual:

------------
The update log

When started with the --log-update=file_name option, mysqld writes a log file
containing all SQL commands that update data. The file is written in the data
directory and has a name of file_name.#, where # is a number that is
incremented each time you execute mysqladmin refresh or mysqladmin flush-logs,
the FLUSH LOGS statement, or restart the server.

If you use the --log or -l options, mysqld writes a general log with a filename
of `hostname.log', and restarts and refreshes do not cause a new log file to be
generated (although it is closed and reopened). By default, the mysql.server
script starts the MySQL server with the -l option. If you need better
performance when you start using MySQL in a production environment, you can
remove the -l option from mysql.server.

Update logging is smart since it logs only statements that really update data.
So an UPDATE or a DELETE with a WHERE that finds no rows is not written to the
log. It even skips UPDATE statements that set a column to the value it already
has.

If you want to update a database from update log files, you could do the
following (assuming your update logs have names of the form `file_name.#'):

shell> ls -1 -t -r file_name.[0-9]* | xargs cat | mysql

ls is used to get all the log files in the right order.

This can be useful if you have to revert to backup files after a crash and you
want to redo the updates that occurred between the time of the backup and the
crash.

You can also use the update logs when you have a mirrored database on another
host and you want to replicate the changes that have been made to the master
database.
--------------------

Hope this is useful.

I'm simply wondering if anyone else is getting intermittent e-mails supposedly 
from 1 bug either empty, or with only a last sentence or so of what appears to 
be a paragraph.

I've had a few of these now purporting to be from certain bugs with many posts 
in them, yet visiting these bugs the mysterious chunk of comment in the e-mail 
is not present.

I can dig through my e-mails to pull out some specific examples if necessary. I 
find it hard to believe I'd be the only one this is happening to though.
I haven't seen anything like that; please open a new bug with all possible
details.

And *PLEASE*, people, only put things about BUGZILLA PERFORMANCE PROBLEMS into
this bug (bug 27146)!
Terry:

I just got this when reassigning a bug:

-----------------------------------------------------------
Mid-air collision detected!

Someone else has made changes to this bug at the same time you were trying to. 
The changes made were: 

 Who
       What
             Old value
                      New value
                                 When

Content-type: text/html 

Software error:

SELECT attach_id FROM attachments WHERE bug_id = 30385: Table 'attachments' was 
not locked with LOCK TABLES at
globals.pl line 134. 

Please send mail to this site's webmaster for help. 
---------------------------

so here it is.
Whoops.  Fixed.

(But why oh why do people insist on reporting non-performance related things in
this bug which is supposed to be only for BUGZILLA PERFORMANCE PROBLEMS ???)
Adding an email string between mitchell and Rickg.
 Rick Gessner wrote:

       The bugzilla site has become a performance bottleneck. I'm wondering if
       we have any plans to add hardware to improve this.

   Mitchell Baker wrote:

     Rick

     last time we looked at this, Risto believed that hardware was not the 
problem.  (I'm planning to add some  anyway as a preventive measure as things 
heat up going forward.) Last time Risto was able to track down performance 
problems; to do so he needed specific data so he could check logs, etc.
As you generate  specific data, please  add it to bug 27146 .  That will allow us 
to look into other potential problems as well.

     mitchell

    
Rick Gessner wrote:

   So what do you need in terms of data? Should I make queries and run a 
stopwatch? Instinctively I know it's a problem 
   because we're all spending more time waiting for bugzilla to respond.

   Rick

I'll let risto answer definitively, as to the types of data that are needed to 
track network performance; i/o limits, database performance, etc.  But useful 
info includes:

whether you're making a query or updating;
what types of queries;
whether this is a constant "this seems slower" or periodic instances where 
something is really, really slow, etc.  




I wanted to put it down here again: I don't believe the Bugzilla problem has much 
to do with system performance; it's more of a Bugzilla architecture issue. On the 
occasions when Bugzilla is slow, I haven't seen bottlenecks in the system. It's 
much like a 3-lane freeway that is blocked due to maintenance: it doesn't go any 
faster if the freeway is widened to 5 lanes. The block might clear faster if disk 
I/O were faster, but the main problem IMHO is that Bugzilla doesn't scale, and we 
can't keep adding faster hardware forever as we get more bugs.

If anyone has exact times when Bugzilla has been slow, please give them to me so 
I can compare against the system logs again.
Since you wanted exact times:

Right now (as I type this) I am waiting for a change that I entered into bug
27999 to be submitted.  I've been waiting for the "Bug Processed" page for a
good 2 or 3 minutes so far.  It's 2000-03-16 12:08 PST.
For the record, the email I got about my above comment bug 27999 was dated
12:15:31 (on lounge.mozilla.org), but the comment in the bug was listed as
12:03.
tara@tequilarista.org is the new owner of Bugzilla and Bonsai.  (For details,
see my posting in netscape.public.mozilla.webtools,
news://news.mozilla.org/38F5D90D.F40E8C1A%40geocast.com .)
Assignee: terry → tara
Status: REOPENED → NEW
Things to look at for hints as to what is going on:

 mysqladmin processlist | grep -v Sleep

to see what is locked and because of what.

Then using "EXPLAIN SELECT ..." on some of the queries that are taking forever
can shed some light on how they could be optimized. But as Terry (I think)
mentioned, if a query needs to look through everything then it is going to take
a while no matter what.

Tweaking the various mysqld parameters for buffer sizes etc can often give big
improvements too.
It's often slow between 12:20 and 12:30 am, west coast us (pacific) time.
20 0 * * * cd /export2/mysqlbackup ; ./grabbackups >> log 2>&1
You know what?  This is more a mozilla.org thing.  I mean, I'm all for 
scalability, but I'm not going to worry about doing benchmarking for mozilla.org 
at this point.  I'm gonna reassign this to endico so she can figure out whether 
or not to continue caring.
Assignee: tara → endico
This seems to be pretty thoroughly evaluated and the problems we were having
have been mostly fixed. Closing this rambling bug.
Status: NEW → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
bbbbbooooooooo!
Verifying for endico.
Status: RESOLVED → VERIFIED
In search of accurate queries....  (sorry for the spam)
Target Milestone: --- → Bugzilla 2.12
Blocks: 71861
No longer blocks: 71861
Moving closed bugs to Bugzilla product
Component: Bugzilla → Bugzilla-General
Product: Webtools → Bugzilla
QA Contact: endico → matty
Version: other → unspecified
QA Contact: matty_is_a_geek → default-qa