Open Bug 487437 Opened 15 years ago Updated 11 years ago

[MS-SQL] Support MS SQL Server: Bugzilla::DB::Mssql

Categories

(Bugzilla :: Database, enhancement)

enhancement
Not set
normal

Tracking

()

ASSIGNED

People

(Reporter: mockodin, Assigned: jasmins)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(1 file, 13 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8
Build Identifier: Bugzilla CVS Head

Review ticket for adding Bugzilla::DB::Mssql for MSSQL support

Reproducible: Always
Blocks: bz-mssql
No longer blocks: bz-mssql
Blocks: bz-mssql
Assignee: database → m.thomas
OS: Windows NT → All
Hardware: x86_64 → All
Target Milestone: --- → Bugzilla 4.0
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attachment #372701 - Attachment description: Initial Patch for Bugzilla::Mssql → Initial Patch for Bugzilla::DB::Mssql
Attachment #372701 - Attachment is patch: true
Attachment #372701 - Attachment mime type: application/octet-stream → text/plain
Attachment #372701 - Flags: review?(mkanat)
Attached patch Patch for Bugzilla::DB::Mssql (obsolete) — Splinter Review
Corrected placement of _bz_build_schema_from_disk

accidentally move it under Bugzilla::DB::Mssql::st during some cleanup.

Incidentally, why are not making separate st.pm files?
Attachment #372701 - Attachment is obsolete: true
Attachment #372701 - Flags: review?(mkanat)
Attachment #372742 - Flags: review?(mkanat)
You don't need bz_build_schema_from_disk, and you don't need to implement bz_initial_schema.
It seem to be called however, I'll look into it.
I copied more than needed from some of the other modules, working weeding out extra calls and tweaking performance.
Attached patch Patch for Bugzilla::DB::Mssql (obsolete) — Splinter Review
Cleaned up un-needed functions.
Attachment #372742 - Attachment is obsolete: true
Attachment #372952 - Flags: review?(mkanat)
Attachment #372742 - Flags: review?(mkanat)
Attached patch Patch for Bugzilla::DB::Mssql (obsolete) — Splinter Review
Minor updates to remove some over thinking on my part.

removed the custom TO_DAYS , FROM_DAYS functions in favor of directly calling
DATEDIFF and DATEADD
Attachment #372952 - Attachment is obsolete: true
Attachment #372952 - Flags: review?(mkanat)
Attachment #374521 - Flags: review?(mkanat)
Comment on attachment 374521 [details] [diff] [review]
Patch for Bugzilla::DB::Mssql

>+use constant GROUPBY_REGEXP => '((CASE\s+WHEN.+END)|(SUBSTR.+\))|(TO_CHAR\(.+\))|(\(SCORE.+\))|(\(MATCH.+\))|(\w+(\.\w+)?))(\s+AS\s+)?(.*)?$';

Note that GROUPBY_REGEXP no longer exists. It can go away from your file.
Attached patch Patch v5 (obsolete) — Splinter Review
Corrected the creation logic on triggers to create single Update and delete
trigger instead of one for one when using a trigger instead of a FKey. This
required refactoring of the schema where references where concerned.

In the cases where a FKey is still used added logic to determine if index can
be filtered (ignore nulls) or if a normal index should be used

A result of the mix triggers and fkeys is that schema changes will trigger them
to be completely refreshed. By that I mean dropped and recreated.

This includes when creating custom multi select fields. Running checksetup
against mssql takes this into account and rebuilds the custom field table
fkeys/triggers in addition to the default bugzilla tables fkeys/triggers.

I need to run a few more tests against a loaded data set, about 200k bugs, 1.5M
comments. And see if I need to change this a bit, I probably need to make this
only drop and recreate if something changed for that specific FKey or
associated tables.

A fair number of other adjustments, overall performance under mssql improved
greatly. the biggest limitation at this point I think is the speed of TT2 under
windows. Pages like the admin edit user screen are slow while it parses the
product responsibilities (default assignee or QA contact etc..) I'm guessing
these are a bit sluggish, comparatively to other pages,  under any OS and Web
server however.
Attachment #374521 - Attachment is obsolete: true
Attachment #390548 - Flags: review?(mkanat)
Attachment #374521 - Flags: review?(mkanat)
Attachment #390548 - Flags: review?(mkanat)
Attached patch v6 (obsolete) — Splinter Review
Does NOT included the source code for the CLR Object
Attachment #390548 - Attachment is obsolete: true
Ignore comment 9

Pretty much everything there was either backed out due to complexity vs need or was worked around in a better manner.

I have not tested it yet, but I believe we will be able to drop the db requirement down and support MSSQL 2005 and MSSQL 2008. MSSQL 2000 can not be supported due to its maximum field size for varchar and handling differences for text fields. Also a lack of unicode support.

SQL 2005 introduced support for NVARCHAR(MAX) which more closely matches other databases support for text fields and their handle of comparison operations for the same. Additionally NVARCHAR supports unicode:

Variable-length Unicode character data. n can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes. The storage size, in bytes, is two times the number of characters entered + 2 bytes. The data entered can be 0 characters in length. The ISO synonyms for nvarchar  are national char varying and national character varying
Status: NEW → ASSIGNED
Attached patch v7 (obsolete) — Splinter Review
Added Source code for CLR dll to Contrib
Added better support for GROUP_CONCAT
      including sorting as third param, asc, desc and ntrl (or null/undef)
           null/undef/ntrl = default sorting, ie in the order the data was read

Changed the name of the dll to Bugzilla.MSSQL, instead of Mockodin.Bugzilla :-)
Attachment #429018 - Attachment is obsolete: true
Attachment #436371 - Flags: review?(mkanat)
Attached patch v7 (obsolete) — Splinter Review
quick repost, missed a item in search and replace for clr rename
Attachment #436371 - Attachment is obsolete: true
Attachment #436376 - Flags: review?(mkanat)
Attachment #436371 - Flags: review?(mkanat)
Attachment #436376 - Flags: review?(mkanat) → review-
Comment on attachment 436376 [details] [diff] [review]
v7

  Hey, so I have a LOT of comments below, but before I go into those, I want to let you know that I really, really, really appreciate that you have put so much time and effort into this, because I know what a tremendous undertaking this was. :-) There's a few times below that I get pretty critical, but I want you to know that that's just because I want this to be very maintainable and work perfectly for every Bugzilla user, and overall I am very very glad to have this patch.

  With that said:

>=== added file Bugzilla/DB/Mssql.pm
>+# The Initial Developer of the Original Code is Netscape Communications
>+# Corporation. Portions created by Netscape are
>+# Copyright (C) 1998 Netscape Communications Corporation. All
>+# Rights Reserved.

  Untrue.

>+=head1 NAME
>+
>+Bugzilla::DB::Mssql - Bugzilla database compatibility layer for MSSQL

  I know that a lot of other modules have this stuff up top, but could you put the POD at the bottom? I've decided that's the best standard for us.

>+my %locks;

  Why do you have a package-level "my" variable? (Or any package-level variable at all, for that matter.)

>+# This is how many comments of MAX_COMMENT_LENGTH we expect on a single bug.
>+# In reality, you could have a LOT more comments than this, because 
>+# MAX_COMMENT_LENGTH is big.
>+use constant MAX_COMMENTS => 50;

  You should not be overriding this in a driver. Why are you?

>+use constant EMPTY_STRING  => '';

  You don't need to specify that.

>+use constant BLOB_TYPE =>  undef ;

  MS-SQL can bind blobs correctly without any special stuff to DBI?

  Nit: Semicolon right after "undef".

>+# This module extends the DB interface via inheritance
>+use base qw(Bugzilla::DB);

  Nit: This should be right under "package" and doesn't need the comment.

>+sub new {
>+    my ($class, $user, $pass, $host, $dbname, $port, $sock) = @_;
>+
>+    # construct the DSN from the parameters we got
>+    my $dsn = "dbi:ODBC:Driver={SQL Server Native Client 10.0};

  Um, shouldn't that be specifyable somehow? Or at least in a comment.

Server=$host;UID=$user;PWD=$pass;Database=$dbname;APP=Bugzilla;MARS_Connection=Yes;";

  What's MARS_Connection mean? Also, what's APP? Also, why do you have to put the password into the DSN? That's dangerous, because it will very likely expose the password in error messages.

>+    my $attrs = { odbc_default_bind_type => '-9', LongReadLen => 5000000 };

  What are those for? Also, why isn't LongReadLen a constant?

>+    # Needed by TheSchwartz
>+    $self->{private_bz_dsn} = \($dsn, $user, $pass, $attrs);

  Um...  (a) You just tried to assign a list of references to a single hash item (b) that's not the format of private_bz_dsn (c) does TheSchwartz even support MS-SQL?

>+    # all class local variables stored in DBI derived class needs to have
>+    # a prefix 'private_'. See DBI documentation.
>+    $self->{private_bz_tables_locked} = "";

  That variable is actually obsolete and should be removed from all of Bugzilla.

>+sub bz_explain {
>+    my ($self, $sql) = @_;
>+    # effectly does nothing but allow SQL to display
>+    return '';
>+}

  Ummm, there's no way to get query analysis data out of MS-SQL via SQL?

>+sub bz_last_key {
>+    my ($self) = @_;
>+
>+    my ($last_insert_id) = $self->selectrow_array('SELECT @@IDENTITY');

  Umm, what if we just inserted several rows into several different tables? It's not something that I think we do, but what if we did?

>+sub sql_regexp {
>+    my ($self, $expr, $pattern) = @_;
>+
>+    $pattern = "'\\\\x00-\\\\x1f'" if ($pattern eq "'[[:cntrl:]]'");

  I think that instead, you need to replace [:cntrl:] with the item there.

>+    return "dbo.REGEXP($expr,$pattern, 0) = 1";

  What's the 0 for?

>+sub sql_string_concat {
>+    my ($self, @params) = @_;

>+    my @hack_for_hack;

  Nit: Should have a better variable name.

>+    foreach my $string (@params){
>+        push @hack_for_hack, "cast($string as nvarchar(max))";

  Why does this have a CAST on it? Is nvarchar(max) long enough to hold all the TEXT types?

  Nit: CAST should be capitalized.

>+# Currently Unsupportable, CONTAINS can not be used in GROUP BY

  Then let's not have the code here. But I don't see why it would be in a GROUP BY.

>+sub sql_from_days {
>+    my ($self, $days) = @_;
>+
>+    return " dateadd(dd,0, ( $days )) ";

  Why the spaces around the SQL?

>+sub sql_date_format {
>+    my ($self, $date, $format) = @_;
>+
>+    $format = trim($format);
>+    $format = '%Y.%m.%d %H:%i:%s' unless $format;
>+
>+    return " dbo.DATE_FORMAT($date,'$format') ";

  You just have a return here, and then there's a bunch of unreachable code after it.

>+sub sql_group_concat {
>+    my ($self, $column, $separator, $sortorder) = @_;

  We don't support $sortorder yet, so I'd wait until we actually have that before adding it to this patch.

>+sub LOCALTIMESTAMP {

  Why do you have this here?

>+sub MOD {

  And this?

>+sub bz_setup_database {
>+    my ($self) = @_;
>+
>+    print "Regenerating CLR Functions for mssql...";

  That should be an install_string. Also, it probably shouldn't happen every time somebody runs checksetup.pl. If it has to happen every time, then there shouldn't be a string printed.

>+    $self->do("IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FROM_DAYS]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
>+    DROP FUNCTION [dbo].[FROM_DAYS]");

  Can you put this into a subroutine instead of repeating the same code 7 times?

>+    $self->do(q|

  Usually we use q{ unless there's some reason not to.

>+        CREATE ASSEMBLY [Bugzilla.MSSQL]
>+        AUTHORIZATION [dbo]
>+        FROM 
> [snip]

  Instead of having an enormous string inside of this file, which will be loaded into memory for every httpd child, could the CLR code live in an external file?

>+    $self->do("CREATE FUNCTION [dbo].[FROM_DAYS](\@days [int])
>+    RETURNS [datetime] WITH EXECUTE AS CALLER
>+    AS 
>+    EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
>+    ");

  If you don't need to use ", then just use ' and you can remove the \ from @days. (Same comment for all of these.)

  Also, our normal style would be to simply put the "); at the end of the text.

  These might be more readable using the heredoc syntax, though:

$self->do(<<'END'
  CREATE FUNCTION [dbo].[FROM_DAYS](@days [int])
          RETURNS [datetime] WITH EXECUTE AS CALLER
               AS 
    EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
END
);

  (You might want to use something different than "END" though.)

  Alternately, since there are so many of these, you might want to put their name, arguments, and return values into a data structure and have a function go through the data structure and create them.

>+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_column_default' and type = 'P')

  Nit: Very long line.

>+    $self->do(q|CREATE PROC [sp_alter_column_default]

  I don't really understand what this is doing, but that's probably just because I haven't looked at the Schema part yet.

>+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_primarykey' and type = 'P')

  You're telling me that there's no way to do this with normal SQL?

>+sub _bz_add_field_table {
> [snip]

  This is not code you should be overriding, and I don't see any reason why you would have to. It looks just like a direct copy of the superclass's code.

>+sub bz_add_field_tables {
>+    my ($self, $field) = @_;
>+    
>+    $self->_bz_add_field_table($field->name,
>+                               $self->_bz_schema->FIELD_TABLE_SCHEMA, $field->type);
>+    if ($field->type == FIELD_TYPE_MULTI_SELECT) {
>+        my $ms_table = "bug_" . $field->name;
>+        $self->_bz_add_field_table($ms_table,
>+            $self->_bz_schema->MULTI_SELECT_VALUE_TABLE);
>+ 
>+        $self->updatecreate_column_refs;

  Instead of copying all of the code, you should be calling SUPER::bz_add_field_tables and then doing your custom piece afterward. Also, your function name should start with bz_ or _bz_.

>+sub get_random_name()

  This already exists in Bugzilla::Util as generate_random_password.

>+sub adjust_statement {
>+    my ($sql) = @_;
>+
>+    # We can't just assume any occurrence of "''" in $sql is an empty
>+    # string, since "''" can occur inside a string literal as a way of
>+    # escaping a single "'" in the literal.  Therefore we must be trickier...
>+
>+    # split the statement into parts by single-quotes.  The negative value
>+    # at the end to the split operator from dropping trailing empty strings
>+    # (e.g., when $sql ends in "''")
>+    my @parts = split /'/, $sql, -1;
> [snip]

  Please factor this code out into a separate function if you're going to share it with Oracle.

>+    # MSSQL doesn't have LIMIT, so if we find the LIMIT comment, wrap the
>+    # query with "SELECT * FROM (...) WHERE rownum < $limit"
>+    my ($limit,$offset) = ($part =~ m{/\* LIMIT (\d*) (\d*) \*/}o);

  I don't know if this is in the Oracle part too, but it occurs to me that one could specify a LIMIT that was a column, though we never do this currently.

>+        # Look for a LIMIT clause
>+        ($limit) = ($nonstring =~ m(/\* LIMIT (\d*) \*/)o);

  Why do you essentially have this code twice?

  Also, I think you don't need $has_from, $has_select, or probably this entire loop.

>+    if (defined($limit)) {
>+        if ($new_sql !~ /\bWHERE\b/) {
>+            $new_sql = $new_sql." WHERE 1=1";

  Nit: Space around the period.

  Also, you can't just paste a WHERE clause on some SQL--it's not valid at the end of every statement. And what if we start using UNION?

>+         my ($before_where, $after_where) = split /\bWHERE\b/i,$new_sql;

  What if it has the word WHERE in a literal string? (Like, what if I search for "WHERE" on query.cgi?)

>+         if (defined($offset)) {
>+             if ($new_sql =~ /(.*\s+)FROM(\s+.*)/i) { 

  And what if it has the word FROM in a literal string?

>+                 $before_where = "$before_from FROM ($before_from,"
>+                             . " ROW_NUMBER() OVER (ORDER BY (SELECT 1)) R "
>+                             . " FROM $after_from ) AS $random_name"; 
>+                 $after_where = " $random_name.R BETWEEN $offset+1 AND $limit+$offset";

  Why are you doing it before FROM? We need to limit after GROUP BY.

>+         $before_where =~ s/^(select +(distinct)?)/$1 top $limit /i;

  Nit: Use \s+ instead of a raw space with a plus character after it.

>+    $new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;

  Oh, CURRENT_DATE in a raw string....

>+    $new_sql =~ s/LOCALTIMESTAMP\(0\)/GETDATE()/igo; # Might have to change this is we do more than (0) in the future

  And so on....

>+    $new_sql =~ s/NOW\(\)/GETDATE()/igo;

  And so on....

>+    $new_sql =~ s/\(?CAST\((\b[\w\.]+\(?\)?) as (\w+\(?\)?)\) (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\/\)? (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\//CAST(DATEADD($5, $3$4, DATEADD($8, $6CAST($7 as int), $1)) as $2 )/;

  This is pretty hard to read. Could you use /x and make it more readable?

  Also, why are you doing this here instead of in sql_interval?

>+    $new_sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;

  MS-SQL doesn't support the ANSI syntax for that?

>+sub do {
> [snip]

  If you're going to share these with Oracle, they should go into their own class that you can subclass instead of Bugzilla::DB.


>+sub db_lock {

  What's this for?

>+# Should really live in it's own .pm file...

  Nit: its

  (it's is a contraction, "its" is a possessive pronoun like his or hers, which also don't have apostrophes in them.)

>+# DNI function override for MSSQL sql adjustments

  DNI? (Is that supposed to be DBI?)

>+package Bugzilla::DB::Mssql::st;
> [snip]

  And these should go into their own class as well, that both MS-SQL and Oracle can subclass.

  Let's call the package Bugzilla::DB::StatementModifier, and it can also contain Bugzilla::DB::StatementModifier::st.


>=== added file contrib/mssql_clr_functions/Bugzilla.MSSQL/AssemblyInfo.cs

  I would love to review these, but I simply don't have the knowledge or expertise. I'll put a call out to the developers list.
Noting I related alot of nits or items to being the result of copy and paste from Mysql.pm or Oracle.pm I will comment those items as <CPItem> purely to save typing a full response, generally take it mean that the item is removable or simple acknowledgment of the comment with the pointer back to a Copy and Paste.

(In reply to comment #14)
> (From update of attachment 436376 [details] [diff] [review])
>   Hey, so I have a LOT of comments below, but before I go into those, I want to
> let you know that I really, really, really appreciate that you have put so much
> time and effort into this, because I know what a tremendous undertaking this
> was. :-) There's a few times below that I get pretty critical, but I want you
> to know that that's just because I want this to be very maintainable and work
> perfectly for every Bugzilla user, and overall I am very very glad to have this
> patch.
No Worries, <puts on big boy pants>

> 

>   Untrue.

<CPItem>

> >+=head1 NAME
> >+
> >+Bugzilla::DB::Mssql - Bugzilla database compatibility layer for MSSQL
> 
>   I know that a lot of other modules have this stuff up top, but could you put
> the POD at the bottom? I've decided that's the best standard for us.

Should, could will, unless someone else gets to it first :-)

> >+my %locks;
> 
>   Why do you have a package-level "my" variable? (Or any package-level variable
> at all, for that matter.)

Actually that shouldn't be there, some internal code here uses it.
 
> >+# This is how many comments of MAX_COMMENT_LENGTH we expect on a single bug.
> >+# In reality, you could have a LOT more comments than this, because 
> >+# MAX_COMMENT_LENGTH is big.
> >+use constant MAX_COMMENTS => 50;
> 
<CPItem>

> 
> >+use constant EMPTY_STRING  => '';
> 
>   You don't need to specify that.

So long as that empty string placer holder doesn't get used.. <CPItem>

> 
> >+use constant BLOB_TYPE =>  undef ;
> 
>   MS-SQL can bind blobs correctly without any special stuff to DBI?

Using mssql data type of varbinary, Hasn't caused in issues here, so no apparently not. Least was I haven't seen an issue.

> 
>   Nit: Semicolon right after "undef".
> 
> >+# This module extends the DB interface via inheritance
> >+use base qw(Bugzilla::DB);
> 
>   Nit: This should be right under "package" and doesn't need the comment.
Noted and <CPItem>

> >+sub new {
> >+    my ($class, $user, $pass, $host, $dbname, $port, $sock) = @_;
> >+
> >+    # construct the DSN from the parameters we got
> >+    my $dsn = "dbi:ODBC:Driver={SQL Server Native Client 10.0};
> 
>   Um, shouldn't that be specifyable somehow? Or at least in a comment.
YES, left in that manner though since we don't seem to allow for this in other packages, how might we want to do this? Optimally it would be a item in localconfig I would assume, but unique to mssql felt odd when I thought about it. And this really is built for the SQL Server Native Client 10.0 driver. Which is freely downloadable from the Microsoft.
> 
> Server=$host;UID=$user;PWD=$pass;Database=$dbname;APP=Bugzilla;MARS_Connection=Yes;";
> 
>   What's MARS_Connection mean? Also, what's APP? Also, why do you have to put
> the password into the DSN? That's dangerous, because it will very likely expose
> the password in error messages.

MARS is Microsoft's version of Multiple Active Results Sets

App allows you see connection in mssql activity console and know what application is connecting otherwise reports as apache, which is not useful when you may have more than one apache app connection.. which I do.

PWD: I'll test it later. Copied from connections string site.
> 
> >+    my $attrs = { odbc_default_bind_type => '-9', LongReadLen => 5000000 };
> 
>   What are those for? Also, why isn't LongReadLen a constant?
> 
> >+    # Needed by TheSchwartz
> >+    $self->{private_bz_dsn} = \($dsn, $user, $pass, $attrs);
> 
>   Um...  (a) You just tried to assign a list of references to a single hash
> item (b) that's not the format of private_bz_dsn (c) does TheSchwartz even
> support MS-SQL?

Referencing a ..um.. disagreement you had with another contributor on yahoo groups, least ways I mean that in that you shot down his response to someone else.. yes it can work perfectly fine. It just takes effort to compile so modules. That said similar to %lock this could be removed.

> >+    # all class local variables stored in DBI derived class needs to have
> >+    # a prefix 'private_'. See DBI documentation.
> >+    $self->{private_bz_tables_locked} = "";
> 
>   That variable is actually obsolete and should be removed from all of
> Bugzilla.

<CPItem>

> >+sub bz_explain {
> >+    my ($self, $sql) = @_;
> >+    # effectly does nothing but allow SQL to display
> >+    return '';
> >+}
> 
>   Ummm, there's no way to get query analysis data out of MS-SQL via SQL?

If I work hard enough, probably, the effort I did put in at one point showed that what builtin function there were, were basically less useful, I mean that, than the raw sql being displayed. So yeah, unless someone has time to find a real way to accomplish this.

> >+sub bz_last_key {
> >+    my ($self) = @_;
> >+
> >+    my ($last_insert_id) = $self->selectrow_array('SELECT @@IDENTITY');
> 
>   Umm, what if we just inserted several rows into several different tables?
> It's not something that I think we do, but what if we did?
Then we would have a problem, but is that different than mysql's LAST_INSERT_ID()
> 
> >+sub sql_regexp {
> >+    my ($self, $expr, $pattern) = @_;
> >+
> >+    $pattern = "'\\\\x00-\\\\x1f'" if ($pattern eq "'[[:cntrl:]]'");
> 
>   I think that instead, you need to replace [:cntrl:] with the item there.
> 
> >+    return "dbo.REGEXP($expr,$pattern, 0) = 1";
> 
>   What's the 0 for?

One of my custom CLR functions, to enable (1) or disable (0) case sensitivity

> >+sub sql_string_concat {
> >+    my ($self, @params) = @_;
> 
> >+    my @hack_for_hack;
> 
>   Nit: Should have a better variable name.

All come on :-) you know you like it.
> 
> >+    foreach my $string (@params){
> >+        push @hack_for_hack, "cast($string as nvarchar(max))";
> 
>   Why does this have a CAST on it? Is nvarchar(max) long enough to hold all the
> TEXT types?

You will run you server dry of memory and possibly swap space before you fill a nvarchar(max) if I recall think terabytes. Granted that is a possible size not a practical one.

>   Nit: CAST should be capitalized.


> >+# Currently Unsupportable, CONTAINS can not be used in GROUP BY
> 
>   Then let's not have the code here. But I don't see why it would be in a GROUP
> BY.

Don't recall the scenarios, but if I removed it its because I ran in to it with no work around. Will review notes.

> >+sub sql_from_days {
> >+    my ($self, $days) = @_;
> >+
> >+    return " dateadd(dd,0, ( $days )) ";
> 
>   Why the spaces around the SQL?

Readability in code, but no reason, probably paste it from a query window.

> >+sub sql_date_format {
> >+    my ($self, $date, $format) = @_;
> >+
> >+    $format = trim($format);
> >+    $format = '%Y.%m.%d %H:%i:%s' unless $format;
> >+
> >+    return " dbo.DATE_FORMAT($date,'$format') ";
> 
>   You just have a return here, and then there's a bunch of unreachable code
> after it.

Probably junk to be removed. Will Check.

> 
> >+sub sql_group_concat {
> >+    my ($self, $column, $separator, $sortorder) = @_;
> 
>   We don't support $sortorder yet, so I'd wait until we actually have that
> before adding it to this patch.

With out altering the CLR, which I'd rather not it still need to be referenced internally to the function, but the param could be dropped.

> >+sub LOCALTIMESTAMP {
> 
>   Why do you have this here?

Ok to remove, left over.

> >+sub MOD {
> 
>   And this?
Only sort of phantom code, I believe this is needed for whine.pl, with an edit, several actually there as well. Either diff ticket, or missed rolling into the patch file, or forgot to open the ticket... or..

> 
> >+sub bz_setup_database {
> >+    my ($self) = @_;
> >+
> >+    print "Regenerating CLR Functions for mssql...";
> 
>   That should be an install_string. Also, it probably shouldn't happen every
> time somebody runs checksetup.pl. If it has to happen every time, then there
> shouldn't be a string printed.

Yes can be moved there, was living there so that I didn't have to wipe my database with each CLR edit. Hadn't really occurred to me, but makes sense. I don;t suppose there is a commandline switch to trigger specific install functions, at least for this it would be useful at times. Eg migrated data to a new database and you want to insure everything is enabled and registered etc.. That said it only takes about 2 seconds to deploy as is...?

> 
> >+    $self->do("IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FROM_DAYS]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
> >+    DROP FUNCTION [dbo].[FROM_DAYS]");
> 
>   Can you put this into a subroutine instead of repeating the same code 7
> times?

Yes, there are some differences, CLR vs a basic UDF, but I think it can be merged in to one.

> >+    $self->do(q|
> 
>   Usually we use q{ unless there's some reason not to.

Not really.

> >+        CREATE ASSEMBLY [Bugzilla.MSSQL]
> >+        AUTHORIZATION [dbo]
> >+        FROM 
> > [snip]
> 
>   Instead of having an enormous string inside of this file, which will be
> loaded into memory for every httpd child, could the CLR code live in an
> external file?

..Yes.. we had some debate here on that actually. Feel free to side with me on that one :-), since it appears you are :-P

> 
> >+    $self->do("CREATE FUNCTION [dbo].[FROM_DAYS](\@days [int])
> >+    RETURNS [datetime] WITH EXECUTE AS CALLER
> >+    AS 
> >+    EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
> >+    ");
> 
>   If you don't need to use ", then just use ' and you can remove the \ from
> @days. (Same comment for all of these.)
> 
>   Also, our normal style would be to simply put the "); at the end of the text.
> 
>   These might be more readable using the heredoc syntax, though:
> 
> $self->do(<<'END'
>   CREATE FUNCTION [dbo].[FROM_DAYS](@days [int])
>           RETURNS [datetime] WITH EXECUTE AS CALLER
>                AS 
>     EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
> END
> );
> 
>   (You might want to use something different than "END" though.)
> 
>   Alternately, since there are so many of these, you might want to put their
> name, arguments, and return values into a data structure and have a function go
> through the data structure and create them.
> 

Will think about this.

> >+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_column_default' and type = 'P')
> 
>   Nit: Very long line.

Hehe get a bigger monitor :-) Wrapping on 80 when possible yes? Think I heard about that after most of this was complete.

> >+    $self->do(q|CREATE PROC [sp_alter_column_default]
> 
>   I don't really understand what this is doing, but that's probably just
> because I haven't looked at the Schema part yet.
> 
> >+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_primarykey' and type = 'P')
> 
>   You're telling me that there's no way to do this with normal SQL?

Yep, unless someone knows a trick I haven't see, always possible.

> >+sub _bz_add_field_table {
> > [snip]
> 
>   This is not code you should be overriding, and I don't see any reason why you
> would have to. It looks just like a direct copy of the superclass's code.
> 
> >+sub bz_add_field_tables {
> >+    my ($self, $field) = @_;
> >+    
> >+    $self->_bz_add_field_table($field->name,
> >+                               $self->_bz_schema->FIELD_TABLE_SCHEMA, $field->type);
> >+    if ($field->type == FIELD_TYPE_MULTI_SELECT) {
> >+        my $ms_table = "bug_" . $field->name;
> >+        $self->_bz_add_field_table($ms_table,
> >+            $self->_bz_schema->MULTI_SELECT_VALUE_TABLE);
> >+ 
> >+        $self->updatecreate_column_refs;
> 
>   Instead of copying all of the code, you should be calling
> SUPER::bz_add_field_tables and then doing your custom piece afterward. Also,
> your function name should start with bz_ or _bz_.

Will review, there was a reason at one point, might have been more relevant before I ended up killing all the foreign keys in favor of triggers.

> >+sub get_random_name()
> 
>   This already exists in Bugzilla::Util as generate_random_password.

Again, I think there was a reason, will review.

> >+sub adjust_statement {
> >+    my ($sql) = @_;
> >+
> >+    # We can't just assume any occurrence of "''" in $sql is an empty
> >+    # string, since "''" can occur inside a string literal as a way of
> >+    # escaping a single "'" in the literal.  Therefore we must be trickier...
> >+
> >+    # split the statement into parts by single-quotes.  The negative value
> >+    # at the end to the split operator from dropping trailing empty strings
> >+    # (e.g., when $sql ends in "''")
> >+    my @parts = split /'/, $sql, -1;
> > [snip]
> 
>   Please factor this code out into a separate function if you're going to share
> it with Oracle.

Not sure what your asking, the whole function?, the mssql specific items?

> >+    # MSSQL doesn't have LIMIT, so if we find the LIMIT comment, wrap the
> >+    # query with "SELECT * FROM (...) WHERE rownum < $limit"
> >+    my ($limit,$offset) = ($part =~ m{/\* LIMIT (\d*) (\d*) \*/}o);
> 
>   I don't know if this is in the Oracle part too, but it occurs to me that one
> could specify a LIMIT that was a column, though we never do this currently.

Yep in Oracle.pm, I think I customized slightly if I recall.
 
> >+        # Look for a LIMIT clause
> >+        ($limit) = ($nonstring =~ m(/\* LIMIT (\d*) \*/)o);
> 
>   Why do you essentially have this code twice?

Is in Oracle.pm

>   Also, I think you don't need $has_from, $has_select, or probably this entire
> loop.
> 

In Oracle.pm

To expand, it wasn't broken, beyond the brokenness of having to do it to begin with, so wasn't looking to fix it.

> >+    if (defined($limit)) {
> >+        if ($new_sql !~ /\bWHERE\b/) {
> >+            $new_sql = $new_sql." WHERE 1=1";
> 
>   Nit: Space around the period.
> 
>   Also, you can't just paste a WHERE clause on some SQL--it's not valid at the
> end of every statement. And what if we start using UNION?

Again in Oracle.pm

> >+         my ($before_where, $after_where) = split /\bWHERE\b/i,$new_sql;
> 
>   What if it has the word WHERE in a literal string? (Like, what if I search
> for "WHERE" on query.cgi?)

One hopes that your using placeholders which haven't been populated at this point. Beyond that no idea, see Oracle.pm

> 
> >+         if (defined($offset)) {
> >+             if ($new_sql =~ /(.*\s+)FROM(\s+.*)/i) { 
> 
>   And what if it has the word FROM in a literal string?

ditto Oracle.pm

> 
> >+                 $before_where = "$before_from FROM ($before_from,"
> >+                             . " ROW_NUMBER() OVER (ORDER BY (SELECT 1)) R "
> >+                             . " FROM $after_from ) AS $random_name"; 
> >+                 $after_where = " $random_name.R BETWEEN $offset+1 AND $limit+$offset";
> 
>   Why are you doing it before FROM? We need to limit after GROUP BY.

Because we are wrapped in a () need to name the result set for mssql to see it as valid, randomizing a name here lets sql keep trucking along.

> >+         $before_where =~ s/^(select +(distinct)?)/$1 top $limit /i;
> 
>   Nit: Use \s+ instead of a raw space with a plus character after it.
> 
> >+    $new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;
> 
>   Oh, CURRENT_DATE in a raw string....
> 
> >+    $new_sql =~ s/LOCALTIMESTAMP\(0\)/GETDATE()/igo; # Might have to change this is we do more than (0) in the future
> 
>   And so on....
> 
> >+    $new_sql =~ s/NOW\(\)/GETDATE()/igo;
> 
>   And so on....
> 
> >+    $new_sql =~ s/\(?CAST\((\b[\w\.]+\(?\)?) as (\w+\(?\)?)\) (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\/\)? (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\//CAST(DATEADD($5, $3$4, DATEADD($8, $6CAST($7 as int), $1)) as $2 )/;
> 
>   This is pretty hard to read. Could you use /x and make it more readable?

Mayhaps, but why go for readability now :-) I'll see what I can do.

>   Also, why are you doing this here instead of in sql_interval?

If I recall because we have to factor in an interval of a interval or some such thing, I believe this occurs in whine.pl or collectstats.pl (or what ever it was called) So if it could be cleaned up there... then we can I suppose.

> >+    $new_sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> 
>   MS-SQL doesn't support the ANSI syntax for that?
Nope, we discussed this before, also in relation to the 3rd param being required, which should be, else you are relying on an implicit rather than explicit which can be confusing at times.

> >+sub do {
> > [snip]
> 
>   If you're going to share these with Oracle, they should go into their own
> class that you can subclass instead of Bugzilla::DB.

I believe I altered them all, slightly different, if not then sure.
> 
> >+sub db_lock {
> 
>   What's this for?
part of %lock, again can should be removed.
> 
> >+# Should really live in it's own .pm file...
> 
>   Nit: its
> 
>   (it's is a contraction, "its" is a possessive pronoun like his or hers, which
> also don't have apostrophes in them.)

yes, or simpler, if 'is' can follow it then it's else its

> 
> >+# DNI function override for MSSQL sql adjustments
> 
>   DNI? (Is that supposed to be DBI?)

Probably.

> 
> >+package Bugzilla::DB::Mssql::st;
> > [snip]
> 
>   And these should go into their own class as well, that both MS-SQL and Oracle
> can subclass.

Previous comment applies.

>   Let's call the package Bugzilla::DB::StatementModifier, and it can also
> contain Bugzilla::DB::StatementModifier::st.

Sound like a plan.

> 
> >=== added file contrib/mssql_clr_functions/Bugzilla.MSSQL/AssemblyInfo.cs
> 
>   I would love to review these, but I simply don't have the knowledge or
> expertise. I'll put a call out to the developers list.

Thanks MKanat

An I don't feel you were harsh at all, quite enjoyed reading the feedback. Thanks for you effort on this and Bugzilla as a whole.
Note, the C# can not be auto deployed from Visual Studio it need to be manually deployed. Visual Studio during deploy objects to setting MaxByteSize = -1, which allows for NVARCHAR(MAX) on the group_concat function. I'm hoping MS fixes that in Visual Studio 2010, but it doesn't look like it is. Someone might mention using [SqlFacet(MaxSize = -1)], that doesn't work for the Aggregate functions for [Microsoft.SqlServer.Server.SqlUserDefinedAggregate(Microsoft.SqlServer.Server.Format.UserDefined, MaxByteSize = -1)]

If anyone needs to know how to manually deploy, a quick google returns plenty of examples or drop me a line.
re Bugzilla::DB::StatementModifier

Oracle uses this Bugzilla::DB::Oracle::_fix_hashref($ref); in most of the functions, I dropped it's usage in MSSQL, should be fine to create Bugzilla::DB::StatementModifier by changing it to 
Bugzilla::DB::_fix_hashref($ref) and just return for mssql. Or alternatively perhaps put it in Util.pm as it could be used for non-sql operations. Make it a constant to to see if we need to use it or not.

Thoughts?
(In reply to comment #15)
> Noting I related alot of nits or items to being the result of copy and paste
> from Mysql.pm or Oracle.pm I will comment those items as <CPItem> purely to
> save typing a full response, generally take it mean that the item is removable
> or simple acknowledgment of the comment with the pointer back to a Copy and
> Paste.

  That's fine.

> >   Untrue.
> 
> <CPItem>

  Does need to be fixed. Should also be fixed in Oracle.pm by Xiaoou, but that's a separate issue.

> Actually that shouldn't be there, some internal code here uses it.

  Okay. Your internal code won't work right under mod_perl, FWIW.

> > >+# In reality, you could have a LOT more comments than this, because 
> > >+# MAX_COMMENT_LENGTH is big.
> > >+use constant MAX_COMMENTS => 50;
> > 
> <CPItem>

  Yeah, should go.

> > >+use constant EMPTY_STRING  => '';
> > 
> >   You don't need to specify that.
> 
> So long as that empty string placer holder doesn't get used.. <CPItem>

  It's Oracle-specific, because oracle thinks an empty string IS NULL.


> >   Um, shouldn't that be specifyable somehow? Or at least in a comment.
>
> YES, left in that manner though since we don't seem to allow for this in other
> packages, how might we want to do this? Optimally it would be a item in
> localconfig I would assume, but unique to mssql felt odd when I thought about
> it. And this really is built for the SQL Server Native Client 10.0 driver.
> Which is freely downloadable from the Microsoft.

  Hmm. So there's no way to have it auto-specify that, then, or just let something figure it out on its own? Does it really have to have an explicit version specifier? At the worst, it should be in a constant at the top of the file.

> MARS is Microsoft's version of Multiple Active Results Sets

  Okay, thanks. So I assume that's what lets us have multiple selects going at once on one connection?

> App allows you see connection in mssql activity console and know what
> application is connecting otherwise reports as apache, which is not useful when
> you may have more than one apache app connection.. which I do.

  Ah, that's great, okay. You might want to use $0 instead, though?

> PWD: I'll test it later. Copied from connections string site.

  Okay. :-) Yeah, $user and $pass should go into DBI arguments instead of the DSN.

> > >+    my $attrs = { odbc_default_bind_type => '-9', LongReadLen => 5000000 };
> > 
> >   What are those for? Also, why isn't LongReadLen a constant?

  I didn't see an answer there. (Was mostly curious what the odbc_default_bind_type is.)

> Referencing a ..um.. disagreement you had with another contributor on yahoo
> groups, least ways I mean that in that you shot down his response to someone
> else.. yes it can work perfectly fine. It just takes effort to compile so
> modules. 

  Okay. Have those patches been sent to the maintainer of those modules?

> That said similar to %lock this could be removed.

  Well, it doesn't hurt. But the code you have there (list of references, not a reference to a list) almost certainly doesn't actually work.

> >   Ummm, there's no way to get query analysis data out of MS-SQL via SQL?
> 
> If I work hard enough, probably, the effort I did put in at one point showed
> that what builtin function there were, were basically less useful, I mean that,
> than the raw sql being displayed. So yeah, unless someone has time to find a
> real way to accomplish this.

  Well, the raw SQL is always displayed. What are the built-in functions? This is not critical; it just helps with analyzing query.cgi slowness.

> >   Umm, what if we just inserted several rows into several different tables?
> > It's not something that I think we do, but what if we did?
> >
> Then we would have a problem, but is that different than mysql's
> LAST_INSERT_ID()

  Oh, yeah, I suppose you're right. OK.

> > >+    return "dbo.REGEXP($expr,$pattern, 0) = 1";
> > 
> >   What's the 0 for?
> 
> One of my custom CLR functions, to enable (1) or disable (0) case sensitivity

  Oh, that's nice. Okay.

> >   Nit: Should have a better variable name.
> 
> All come on :-) you know you like it.

  LOL!

> You will run you server dry of memory and possibly swap space before you fill a
> nvarchar(max) if I recall think terabytes. Granted that is a possible size not
> a practical one.

  Ahhh, okay. I was thinking of other databases where the max varchar size is like 4000 characters.

> >   Then let's not have the code here. But I don't see why it would be in a GROUP
> > BY.
> 
> Don't recall the scenarios, but if I removed it its because I ran in to it with
> no work around. Will review notes.

  Okay. I'd be happy to work with you on it; it's nice to have real fulltext.

> >   Why the spaces around the SQL?
> 
> Readability in code, but no reason, probably paste it from a query window.

  Okay. For consistency with the other drivers, let's get rid of the spaces.

> > >+sub sql_group_concat {
> > >+    my ($self, $column, $separator, $sortorder) = @_;
> > 
> >   We don't support $sortorder yet, so I'd wait until we actually have that
> > before adding it to this patch.
> 
> With out altering the CLR, which I'd rather not it still need to be referenced
> internally to the function, but the param could be dropped.

  Okay. But you might have to alter the CLR anyway, when we actually add this. sortorder will be a column name or an expression, not ASC or DESC or something like that.

> >   That should be an install_string. Also, it probably shouldn't happen every
> > time somebody runs checksetup.pl. If it has to happen every time, then there
> > shouldn't be a string printed.
> 
> Yes can be moved there, was living there so that I didn't have to wipe my
> database with each CLR edit. Hadn't really occurred to me, but makes sense. I
> don;t suppose there is a commandline switch to trigger specific install
> functions, at least for this it would be useful at times. Eg migrated data to a
> new database and you want to insure everything is enabled and registered etc..
> That said it only takes about 2 seconds to deploy as is...?

  Perhaps you should put a version number into the CLR or include some sort of date about it in the database, so we could check if we have to update it.

> >   Instead of having an enormous string inside of this file, which will be
> > loaded into memory for every httpd child, could the CLR code live in an
> > external file?
> 
> ..Yes.. we had some debate here on that actually. Feel free to side with me on
> that one :-), since it appears you are :-P

  Yes, I am siding with you. :-)

> Hehe get a bigger monitor :-) Wrapping on 80 when possible yes? Think I heard
> about that after most of this was complete.

  Yes, wrap on 80 when possible.

> > >+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_primarykey' and type = 'P')
> > 
> >   You're telling me that there's no way to do this with normal SQL?
> 
> Yep, unless someone knows a trick I haven't see, always possible.

  This seems to indicate it works just like other DBs:

  http://msdn.microsoft.com/en-us/library/ms174123.aspx

> >   Instead of copying all of the code, you should be calling
> > SUPER::bz_add_field_tables and then doing your custom piece afterward. Also,
> > your function name should start with bz_ or _bz_.
> 
> Will review, there was a reason at one point, might have been more relevant
> before I ended up killing all the foreign keys in favor of triggers.

  Okay.

> > >+sub get_random_name()
> > 
> >   This already exists in Bugzilla::Util as generate_random_password.
> 
> Again, I think there was a reason, will review.

  Without looking at the code, it might be that generate_random_password can use numbers and you don't want them in the name? In any case, if you keep this sub as a utility locally in the module, its name should be prefixed with an underscore.

> > >+sub adjust_statement {
> >   Please factor this code out into a separate function if you're going to share
> > it with Oracle.
> 
> Not sure what your asking, the whole function?, the mssql specific items?

  The part that splits the string and parses it in parts. If you need to do something to each part, you can give the new function a callback that tests each part.

> > >+        # Look for a LIMIT clause
> > >+        ($limit) = ($nonstring =~ m(/\* LIMIT (\d*) \*/)o);
> > 
> >   Why do you essentially have this code twice?
> 
> Is in Oracle.pm

  I think perhaps we should audit that code. 

  Perhaps ask Xiaoou why he put it there.

> >   Also, I think you don't need $has_from, $has_select, or probably this entire
> > loop.
> > 
> 
> In Oracle.pm

  Yes, but it uses those.

> >   Also, you can't just paste a WHERE clause on some SQL--it's not valid at the
> > end of every statement. And what if we start using UNION?
> 
> Again in Oracle.pm

  Seriously? Wow...I thought Oracle did all of this inside the loop.

> > >+         my ($before_where, $after_where) = split /\bWHERE\b/i,$new_sql;
> > 
> >   What if it has the word WHERE in a literal string? (Like, what if I search
> > for "WHERE" on query.cgi?)
> 
> One hopes that your using placeholders which haven't been populated at this
> point. Beyond that no idea, see Oracle.pm

  We're not using placeholders in Search.pm.

> > >+         if (defined($offset)) {
> > >+             if ($new_sql =~ /(.*\s+)FROM(\s+.*)/i) { 
> > 
> >   And what if it has the word FROM in a literal string?
> 
> ditto Oracle.pm

  Wow, you are correct. It makes no sense, let's see if you can fix it better for MS-SQL.

> >   Why are you doing it before FROM? We need to limit after GROUP BY.
> 
> Because we are wrapped in a () need to name the result set for mssql to see it
> as valid, randomizing a name here lets sql keep trucking along.

  Why not just wrap the entire query and do a SELECT *?

> >   Also, why are you doing this here instead of in sql_interval?
> 
> If I recall because we have to factor in an interval of a interval or some such
> thing, I believe this occurs in whine.pl or collectstats.pl (or what ever it
> was called) So if it could be cleaned up there... then we can I suppose.

  Yeah, I'd way rather see this in sql_interval. If there's some code that's not using sql_interval, it should be.

> > >+    $new_sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> > 
> >   MS-SQL doesn't support the ANSI syntax for that?
> Nope, we discussed this before, also in relation to the 3rd param being
> required, which should be, else you are relying on an implicit rather than
> explicit which can be confusing at times.

  Lame. Yeah, I do seem to recall we discussed it. Anyhow, you'll have to fix this inside of the parts loop, not outside of it.

> An I don't feel you were harsh at all, quite enjoyed reading the feedback.
> Thanks for you effort on this and Bugzilla as a whole.

  I'm glad. :-) Hey, you're welcome. :-)
(In reply to comment #17)
> Oracle uses this Bugzilla::DB::Oracle::_fix_hashref($ref); in most of the
> functions, I dropped it's usage in MSSQL, should be fine to create
> Bugzilla::DB::StatementModifier by changing it to 
> Bugzilla::DB::_fix_hashref($ref) and just return for mssql. Or alternatively
> perhaps put it in Util.pm as it could be used for non-sql operations. Make it 
a
> constant to to see if we need to use it or not.

  Hmm. Yeah, tricky, because it's being used in two classes. There are some Perl OO tricks you could do to handle it....create another class called Bugzilla::DB::HashrefFixer, and then do some clever stuff with @ISA in Oracle.pm, but I'm not sure that's a good idea.

  I don't know, let me know what you come up with. Don't put it in Util.pm, though, and I'd prefer it wasn't in Bugzilla::DB directly, also.
So on the CLR import code (the really big chuck of random looking text) what shall we do.. create /bugzilla/Bugzilla/DB/Mssql/CLR.inc ?

We can then read the file during install or as needed?
Some remarks I have in mind while reading the comments on this bug:

Sql server query plans can be obtained in text form with the following when within SSMS. I don't know how you would do the equivalent from perl:

SET SHOWPLAN_ALL ON
GO

--run query
GO

SET SHOWPLAN_ALL OFF
GO

Don't quote me on this since I cannot seem to find the source I had, but I am pretty sure that mssql 2005 introduced varchar(max), nvarchar(max) and varbinary(max). Mssql 2008 then optimized the storage of strings, unicode strings and bitstrings for these 3 types. It is currently considered depreciated to use the TEXT, NTEXT and IMAGE storage types. I am under the impression that nvarchar(#) is simply internally stored as nvarchar(max) with a constraint on the length of the string.

> +        CREATE ASSEMBLY [Bugzilla.MSSQL]
> +        AUTHORIZATION [dbo]
> +        FROM 0x4D5A9000030...
This varbinary constant here is the actual bits of the assembly, isn't it?

Couldn't you set the path+filename for the dll here?
from msdn (http://msdn.microsoft.com/en-us/library/ms189524.aspx):
> CREATE ASSEMBLY HelloWorld 
> FROM @SamplesPath + 'HelloWorld\CS\HelloWorld\bin\debug\HelloWorld.dll'
> WITH PERMISSION_SET = SAFE;

from message on ML:
Should Bugzilla style conventions be used for this project or standard C# ones?

Ignoring any of the style issues there are actual code issues I am seeing. I don't see any actual security issues here, but there could very well be race conditions (which could result in bad/missing data). There will be people who have issues with using managed functions in general, but I don't see a way around it particularly for the concat aggregate and the regex search (everything else I think is doable in raw sql functions; though I suppose a regex engine is possible there too, I wouldn't want to try writing it).


Anyway, assuming standard (mostly because I don't feel comfortable enough with the Bugzilla ones since not having done any bz or perl coding in quite a while now; I am pretty sure other things will need to change if it is to be Bugzilla):

> Bugzilla.MSSQL.pidb
> Bugzilla.MSSQL.suo
> Bugzilla.MSSQL.userprefs

these files should be ignored, not added (IDE/compiler generated)

into actual changes:
AssemblyInfo.cs
===============
> +using System.Runtime.CompilerServices;

this line is unused in the current file (hence should not be there)

> +[assembly: AssemblyTitle("Bugzilla.MSSQL")]

Nit: Standard C# naming practices would have this be Bugzilla.Mssql (or perhaps MSSql, but eww...; mssql is usually referred to in .NET as SqlServer though, don't know if that should be given any weight or not)
see: http://msdn.microsoft.com/en-us/library/ms229043.aspx
quote: Do capitalize only the first character of acronyms
with three or more characters, except the first word of a camel-cased
identifier

overall assemblyinfo.cs nits:
1. Every bit of information in this file should be filled in
2. The following addititional code should be in assemblyinfo.cs as standard practice:
> using System.Runtime.InteropServices;
>
> [assembly: AssemblyFileVersion("1.0.*")]
>
> // Setting ComVisible to false makes the types in this assembly not visible
> // to COM components.  If you need to access a type in this assembly from
> // COM, set the ComVisible attribute to true on that type.
> [assembly: ComVisible(false)]
>
> // The following GUID is for the ID of the typelib if this project is exposed to COM (create a new GUID for this)
> [assembly: Guid("f06345e0-a4d4-47d8-99fc-5af3232233cf")]

(reasoning: it is far better to be explicit up front than to add it in later on)

Bugzilla.MSSQL.cs
=================
Filename should be Bugzilla.Mssql.cs

> +using System.Data;
> +using System.Data.SqlClient;

unused, shouldn't be there

> +using System.Text;

unused, shouldn't be there

> +using NaturalComparer;

unused, shouldn't be there

> +public partial class UserDefinedFunctions

This class is not partial, it is fully defined. This class should be in a Bugzilla.Mssql namespace. This class should be marked static.

> +    [Microsoft.SqlServer.Server.SqlFunction()]

Namespace is unnecessary (included in using statements), used multiple times in the file, will not repeat myself over and over again

> +    public static SqlBoolean REGEXP([SqlFacet(MaxSize = -1)]SqlString subject, [SqlFacet(MaxSize = -1)]SqlString pattern, SqlBoolean ignorecase)

Function name is not in line with naming conventions for C# function names, Should be Regexp
nit: In .NET regexes are of the type Regex, not Regexp. Should the function name match?

> +        string str_subject = subject.Value;
> +        string str_pattern = pattern.Value;

names are nonstandard, should be strSubject and strPattern, though you should avoid any form of Hungarian notation. Better names would be subjectValue and patternValue.

> +            test = System.Text.RegularExpressions.Regex.IsMatch(str_subject, str_pattern,RegexOptions.IgnoreCase);

namespace is unnecessary (included in using statements)

> +            test = System.Text.RegularExpressions.Regex.IsMatch(str_subject, str_pattern);

namespace is unnecessary (included in using statements)

> +        if (test)
> +        {
> +            return SqlBoolean.True;
> +        }
> +        else
> +        {
> +            return SqlBoolean.False;
> +        }

nit: stylistically I do not like this code; better would be to pull out the Regexp function to:

> private static bool RegexpImpl(SqlString subject, SqlString pattern, SqlBoolean ignorecase)

then replace the implementation of Regexp with:

> return ToSqlBoolean(RegexpImpl(subject, pattern, ignorecase));

and have a function ToSqlBoolean:

> private static SqlBoolean ToSqlBoolean(bool test)
> {
>    return test ? SqlBoolean.True : SqlBoolean.False;
> }

This would allow you to also replace the contents of NOT_REGEX with

> return ToSqlBoolean(!RegexpImpl(subject, pattern, ignorecase));

cutting the file shorter by about 15 lines and making the methods slightly more optimized.

> +    public static SqlBoolean NOT_REGEXP([SqlFacet(MaxSize = -1)]SqlString subject, [SqlFacet(MaxSize = -1)]SqlString pattern, SqlBoolean ignorecase)

Function name should be NotRegexp

> +    public static SqlInt32 INSTR([SqlFacet(MaxSize = -1)]SqlString find, [SqlFacet(MaxSize = -1)]SqlString inthis, SqlBoolean CaseSensitive)

Function name should be InStr
parameter inthis should be inThis
parameter CaseSensitive should be caseSensitive

nit: Why does this one have caseSensitive and the regex functions have ignorecase? The parameters should be consistent with each other as the current method could be confusing and eventually lead to issues.

> +        if (ind == -1)
> +        {
> +            return 0;
> +        }
> +        else {
> +            return ind+1;
> +        }

condition is unnecessary, should be:

> +        return ind+1;

empty line at the end of this method should be removed

> +    [Microsoft.SqlServer.Server.SqlFunction()]
> +    public static SqlDateTime NOW()
> +    {
> +
> +        return DateTime.Now;
> +    }
> +
> +    [Microsoft.SqlServer.Server.SqlFunction()]
> +    public static SqlDateTime LOCALTIMESTAMP()
> +    {
> +        return NOW();
> +    }

Aren't both of these doing the same thing that getdate() does?
Again nonstandard names (will stop saying this now for future methods)

nit: empty line at beginning of NOW() (empty lines at the beginning and end of methods are quite pointless and IMO can only possibly serve to be an attempt to get the reader to disassociate the method name from its implementation or to make syntax errors when modifying and I decree that this is a bad thing in all cases)

> +    public static SqlInt32 TO_DAYS(SqlDateTime date)
> +    {
> +        DateTime SubjectDate = date.Value;
> +        DateTime Y1900 = new DateTime(1900, 1, 1);
> +
> +        long elapsedTicks = SubjectDate.Ticks - Y1900.Ticks;
> +        TimeSpan elapsedSpan = new TimeSpan(elapsedTicks);
> +        return Convert.ToInt32(elapsedSpan.TotalDays);
> +
> +    }

1. Shouldn't you be checking if date is null?
2. Why are you subtracting ticks like that? Can't you just do
> +        return Convert.ToInt32((SubjectDate - Y1900).TotalDays);
3. variable names (should be camelCase), will not repeat again

> +        Y1900.AddDays(days.Value);

shouldn't be here, statement has no side affect and is repeated anyway on the next line

> +        if (SqlDate.IsNull) return SqlChars.Null;
> +        string str_format = format.Value;
> +        DateTime Date = SqlDate.Value;
> +
> +        string a = Date.DayOfWeek.ToString();

if statement uses different syntax than before; should be consistent
variable "a" is unused

As for the if statements: most of the time spent inside string.Replace is spent doing the same thing that string.Contains does. Both take far longer than evaluating whatever the replacement would be.
I would replace that whole section (from the first if to the return statement) with:
>        return new SqlChars(str_format
>            .Replace(@"%a", Date.DayOfWeek.ToString().Substring(0, 3))
>            .Replace(@"%b", ((ShortMonthOfYear) Date.Month).ToString())
>            .Replace(@"%c", Date.Month.ToString())
>            .Replace(@"%D", Date.Day.ToString() + GetDateSuffix(Date))
>            .Replace(@"%d", Date.Day.ToString().PadLeft(2, '0'))
>            .Replace(@"%e", Date.Day.ToString())
>            .Replace(@"%f", Date.Millisecond.ToString().PadRight(6, '0'))
>            .Replace(@"%H", Date.Hour.ToString().PadLeft(2, '0'))
>            .Replace(@"%h", Date.Hour.ToString().PadLeft(2, '0'))
>            .Replace(@"%I", Date.Hour.ToString().PadLeft(2, '0'))
>            .Replace(@"%i", Date.Minute.ToString().PadLeft(2, '0'))
>            .Replace(@"%j", Date.DayOfYear.ToString().PadLeft(3, '0'))
>            .Replace(@"%k", Date.Hour.ToString().PadLeft(2, '0'))
>            .Replace(@"%l", Date.Hour.ToString())
>            .Replace(@"%M", ((LongMonthOfYear) Date.Month).ToString())
>            .Replace(@"%m", Date.Month.ToString().PadLeft(2, '0'))
>            .Replace(@"%p", Date.Hour >= 12 ? @"PM" : @"AM")
>            .Replace(@"%r", Date.TimeOfDay.ToString())
>            .Replace(@"%S", Date.Second.ToString().PadLeft(2, '0'))
>            .Replace(@"%s", Date.Second.ToString().PadLeft(2, '0'))
>            .Replace(@"%T", Date.ToShortTimeString().ToString())
>            .Replace(@"%U", WeekOfYear(Date).ToString())
>            .Replace(@"%u", WeekOfYear(Date).ToString())
>            .Replace(@"%V", WeekOfYear(Date).ToString())
>            .Replace(@"%v", WeekOfYear(Date).ToString())
>            .Replace(@"%W", Date.DayOfWeek.ToString())
>            .Replace(@"%w", ((LongDayofWeek) Date.DayOfWeek).ToString())
>            .Replace(@"%X", Date.Year.ToString())
>            .Replace(@"%x", Date.Year.ToString())
>            .Replace(@"%Y", Date.Year.ToString())
>            .Replace(@"%y", Date.Year.ToString().Substring(2, 2))
>            .Replace(@"%%", @"%"));

That said, I think there are some bugs here:
I doubt that %H, %h, %I and %k are supposed to return the same value (similarly U,u,V,v, and X,x,Y).
Is %f correct or should it also be PadLeft?
You have an if for capital S and then do a replace with lowercase s.
Could a test be written to ensure the correctness of this function?
Are there l10n issues here?

> +    [Serializable]
> +    private enum LongDayofWeek
> ...
> +
> +    [Serializable]
> +    private enum ShortMonthOfYear
> ...
> +
> +    [Serializable]
> +    private enum LongMonthOfYear

These should be public to prevent a future compiler from obfuciating them.

> +    private static string GetDateSuffix(SqlDateTime SqlDate)
> +    {
> +        if (SqlDate.IsNull) return null;
> +
> +        DateTime date = SqlDate.Value;
> +
> +        string day = date.Day.ToString();
> +        if (day.EndsWith("1"))
> +        {
> +            return day.StartsWith("1") && date.Day != 1 ? "th" : "st";
> +        }
> +        else if (day.EndsWith("2"))
> +        {
> +            return day.StartsWith("1") ? "th" : "nd";
> +        }
> +        else if (day.EndsWith("3"))
> +        {
> +            return day.StartsWith("1") ? "th" : "rd";
> +        }
> +        return "th";
> +    }

parameter being passed in is DateTime, not SqlDateTime (makes first if and assignment unnecessary)
first if syntax inconsistent with second if syntax
else keywords are unneeded

> +    private static int WeekOfYear(SqlDateTime SqlDate)
> +    {
> +        if (SqlDate.IsNull) return 0;
> +
> +        DateTime date = SqlDate.Value;
> +
> +        System.Globalization.CultureInfo ci = System.Threading.Thread.CurrentThread.CurrentCulture;
> +        System.Globalization.Calendar cal = ci.Calendar;
> +        System.Globalization.CalendarWeekRule cwr = ci.DateTimeFormat.CalendarWeekRule;
> +        DayOfWeek fdow = ci.DateTimeFormat.FirstDayOfWeek;
> +        return cal.GetWeekOfYear(date, cwr, fdow);
> +    }

parameter being passed in is DateTime, not SqlDateTime (makes first if and assignment unnecessary)
if syntax inconsistant again
using CurrentThread.CurrentCulture here, is this ok in sql app domain context (I think so but I am not certain)?
You use the current culture here but previously you used InvariantCulture for case sensitivity. Does that present an internationalization issue (how does mysql do case sensitivity particularly for accented characters? does it vary per culture and if so should this or should it do it however mssql does case sensitivity when it is configured as insensitive, if they are different?)?

> +public struct GROUP_CONCAT : Microsoft.SqlServer.Server.IBinarySerialize

should be in separate file, should be in Bugzilla.Mssql namespace, should be named GroupConcat
This class feels smelly:
1. A List<T> datatype is almost always better than an ArrayList (and in worst case is equivalent)
2. this keyword is not necessary for accessing your own members unless they share a name with something else you are trying to use
3. Read is not a compliment to Write
4. Lists are not thread safe. I wouldn't doubt there may be race conditions in this code.

I would write this class as follows (completely untested code, but it looks less buggy I think):
> public struct GROUP_CONCAT : Microsoft.SqlServer.Server.IBinarySerialize {
>    private static readonly object lockobj = new object();
>    private List<string> _remembered;
>    private List<GROUP_CONCAT> _merged;
>    private string _seperator;
>    private string _sortOrder;
>    public void Init() {
>        _remembered = new List<string>();
>        _merged = new List<GROUP_CONCAT>();
>        _seperator = null;
>        _sortOrder = null;
>    }
>    public void Accumulate([SqlFacet(MaxSize = -1)]SqlString value, [SqlFacet(MaxSize = -1, IsNullable = true)]SqlString seperator, SqlString sortOrder) {
>        // Default a Null to Seperator to an empty string
>        _seperator = seperator.IsNull ? "" : seperator.Value;
>
>        // Add the Seperator as the first array element
>        _sortOrder = sortOrder.IsNull ? "NTRL" : sortOrder.Value.ToUpper();
>
>        if (_sortOrder != "ASC" && _sortOrder != "DESC" && _sortOrder != "NTRL") {
>            throw new ArgumentException(sortOrder.Value + " is not valid for Sort Order");
>        }
>
>        if (value.IsNull) return;
>
>        lock (lockobj) {
>            _remembered.Add(value.Value);
>        }
>    }
>    public void Merge(GROUP_CONCAT group) {
>        if(group._seperator!=_seperator) {
>            throw new ArgumentException("Seperators do not match: " + group._seperator+", "+_seperator);
>        }
>        if(group._sortOrder!=_sortOrder) {
>            throw new ArgumentException("Sort Orders do not match: " + group._sortOrder + ", " + _sortOrder);
>        }
>
>        //not sure about the thread safety of Add/AddRange so...
>        lock (lockobj) {
>            _merged.Add(group);
>        }
>    }
>
>    private IEnumerable<string> Enumerate() {
>        lock (lockobj) {
>            foreach (var value in _remembered) {
>                yield return value;
>            }
>            foreach (var list in _merged) {
>                foreach (var value in list.Enumerate()) {
>                    yield return value;
>                }
>            }
>        }
>    }
>
>    public SqlString Terminate() {
>        List<string> all = new List<string>(Enumerate());
>
>        // If ASC or DESC then Sort
>        //new NaturalComparer(NaturalComparerOptions.RomanNumbers));
>        if (_sortOrder != "NTRL") all.Sort(new NaturalComparer.NaturalComparer());
>        // If DESC then reverse the Sort order
>        if (_sortOrder == "DESC") all.Reverse();
>        return new SqlString(string.Join(_seperator, all.ToArray()));
>    }
>
>    public void Read(System.IO.BinaryReader r) {
>        int itemCount = r.ReadInt32();
>
>        // Retrieve the Seperator for use by Terminate()
>        _seperator = r.ReadString();
>        _sortOrder = r.ReadString();
>        lock (lockobj) {
>            _remembered = new List<string>(itemCount - 2);
>        }
>        for (int i = 2; i <= itemCount - 1; i++) {
>            lock (lockobj) {
>                _remembered.Add(r.ReadString());
>            }
>        }
>    }
>
>    public void Write(System.IO.BinaryWriter w) {
>        List<string> all = new List<string>(Enumerate());
>        w.Write(all.Count);
>        w.Write(_seperator);
>        w.Write(_sortOrder);
>        foreach (string s in all) {
>            w.Write(s);
>        }
>    }
> }

Bugzilla.MSSQL.csproj
=====================
> +    <RootNamespace>Bugzilla.MSSQL</RootNamespace>
> +    <AssemblyName>Bugzilla.MSSQL</AssemblyName>
> +    <TargetFrameworkVersion>v3.5</TargetFrameworkVersion>

Names again...
Should this be targeting .net 3.5? or should it target 2.0 (for a potentially wider audience)? It doesn't appear to be using any 3.5 assemblies or syntax (unless you use the GROUP_CONCAT class I provided above, in which case you would have to remove the 3.5 var syntax).

NaturalSortComparer.cs
======================
Assuming this code is correct (not even going to bother going there at least yet) my most significant issue is that the variable names are not very explicit. In my company we use _camelCase for fields, camelCase for method variables and parameters and PascalCase for just about everything visible outside of the class.
These classes should also be in a Bugzilla.Mssql namespace and they should be in separate files per class (at least according to C# conventions that I know of, perhaps the bugzilla devs do not care to follow this).

I am finding myself questioning what the whole point of this class is.
What advantages does it provide that Bugzilla might need that will permit it to waste so much time (parsing strings) as part of the sql execution?
(In reply to comment #18)
> > <CPItem>
> 
>   Does need to be fixed. Should also be fixed in Oracle.pm by Xiaoou, but
> that's a separate issue.

We'll need to work with Xiaoou to move this all to Bugzilla::DB::StatementModifier anyhow so addressable then..


> > Actually that shouldn't be there, some internal code here uses it.
> 
>   Okay. Your internal code won't work right under mod_perl, FWIW.
Will relay that, didn't write that myself.

>   Hmm. So there's no way to have it auto-specify that, then, or just let
> something figure it out on its own? Does it really have to have an explicit
> version specifier? At the worst, it should be in a constant at the top of the
> file.

If you set up a System DSN, it kinda is, but your just moving where you manually selected at that point. And yes you can have multiple version of the driver installed one for e SQL 2005, for SQL 2008, etc.. Updates to a version are left named the base name, so doesn't update very often. Constant would work, maybe a prompt on install and store in data/params would be best though?

> > MARS is Microsoft's version of Multiple Active Results Sets
> 
>   Okay, thanks. So I assume that's what lets us have multiple selects going at
> once on one connection?
Yes, works well, though like the normal multiple result set warnings apply, possible odd behavior dependent on the application. Seems to be fine for Bugzilla however.

>   Ah, that's great, okay. You might want to use $0 instead, though?
Reasonable thought, makes for even better identification. Might still recommend concatenating $0 and bugzilla ('Bugzilla '.$0 on the off chance another app has the same file name.

> > > >+    my $attrs = { odbc_default_bind_type => '-9', LongReadLen => 5000000 };
> > > 
> > >   What are those for? Also, why isn't LongReadLen a constant?

odbc_default_bind_type => '-9' == SQLWCHAR  allows for NVARCHAR
LongReadLen sure I can make it a constant.

>   I didn't see an answer there. (Was mostly curious what the
> odbc_default_bind_type is.)
> 
> > Referencing a ..um.. disagreement you had with another contributor on yahoo
> > groups, least ways I mean that in that you shot down his response to someone
> > else.. yes it can work perfectly fine. It just takes effort to compile so
> > modules. 
> 
>   Okay. Have those patches been sent to the maintainer of those modules?
If I recall didn't need patches. Was just an issue of compiling on the server, we ended up going a different route in any case, so no nothing was submitted back. What we actually needed was a delayed mail send, similar to the send unsent mail script, to roll up multiple comments in to a single email, within the last x minutes.

> > > >+sub sql_group_concat {
> > > >+    my ($self, $column, $separator, $sortorder) = @_;
> > > 
> > >   We don't support $sortorder yet, so I'd wait until we actually have that
> > > before adding it to this patch.
> > 
> > With out altering the CLR, which I'd rather not it still need to be referenced
> > internally to the function, but the param could be dropped.
> 
>   Okay. But you might have to alter the CLR anyway, when we actually add this.
> sortorder will be a column name or an expression, not ASC or DESC or something
> like that.

Might I suggest rather than just column to all for any of? Sometimes you might just want the return sorted internally instead of based on another field. Not much different than a subselect with a order by.

>   Perhaps you should put a version number into the CLR or include some sort of
> date about it in the database, so we could check if we have to update it.

That I can do, in fact had one and pulled it out when I submitted the code. Ah well :-)

>   This seems to indicate it works just like other DBs:
> 
>   http://msdn.microsoft.com/en-us/library/ms174123.aspx

If I recall try it, and you'll get fun errors. Might have just given up late at night, I'll review. I know I did some investigating before hand.

> 
> > >   Why are you doing it before FROM? We need to limit after GROUP BY.
> > 
> > Because we are wrapped in a () need to name the result set for mssql to see it
> > as valid, randomizing a name here lets sql keep trucking along.
> 
>   Why not just wrap the entire query and do a SELECT *?

It is. In fact, I think, two of them Sort DESC TOP then Sort ASC TOP to give the
effect of limit x, y

> > >   Also, why are you doing this here instead of in sql_interval?
> > 
> > If I recall because we have to factor in an interval of a interval or some such
> > thing, I believe this occurs in whine.pl or collectstats.pl (or what ever it
> > was called) So if it could be cleaned up there... then we can I suppose.
> 
>   Yeah, I'd way rather see this in sql_interval. If there's some code that's
> not using sql_interval, it should be.

OK, I'll look at altering whine, it is using sql_interval but like this

$dbh-sql_interval(blah..) . ' - ' . $dbh-sql_interval(blah..)

> > > >+    $new_sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> > > 
> > >   MS-SQL doesn't support the ANSI syntax for that?
> > Nope, we discussed this before, also in relation to the 3rd param being
> > required, which should be, else you are relying on an implicit rather than
> > explicit which can be confusing at times.
> 
>   Lame. Yeah, I do seem to recall we discussed it. Anyhow, you'll have to fix
> this inside of the parts loop, not outside of it.

Think the issue was it splitting it up oddly, perhaps not I'll see if it works.
(In reply to comment #21)
> Some remarks I have in mind while reading the comments on this bug:
> 
> Sql server query plans can be obtained in text form with the following when
> within SSMS. I don't know how you would do the equivalent from perl:
> SET SHOWPLAN_ALL ON
> GO
> 
> --run query
> GO
> 
> SET SHOWPLAN_ALL OFF
> GO

That may have been the issue I think, using GO. GO indicating end of transaction I believe. Will review.

> Don't quote me on this since I cannot seem to find the source I had, but I am
> pretty sure that mssql 2005 introduced varchar(max), nvarchar(max) and
> varbinary(max). Mssql 2008 then optimized the storage of strings, unicode
> strings and bitstrings for these 3 types. It is currently considered
> depreciated to use the TEXT, NTEXT and IMAGE storage types. I am under the
> impression that nvarchar(#) is simply internally stored as nvarchar(max) with
> a constraint on the length of the string.

Well nvarchar vs varchar is basically that nvarchar accepts what ever the incoming encoding was with out altering it. Hence wchar support. That was introduced in 2005. Other factors sure but that is what we want. Which is why we use NVARCHAR everywhere instead of TEXT

> 
> > +        CREATE ASSEMBLY [Bugzilla.MSSQL]
> > +        AUTHORIZATION [dbo]
> > +        FROM 0x4D5A9000030...
> This varbinary constant here is the actual bits of the assembly, isn't it?

Yes

> Couldn't you set the path+filename for the dll here?
> from msdn (http://msdn.microsoft.com/en-us/library/ms189524.aspx):
> > CREATE ASSEMBLY HelloWorld 
> > FROM @SamplesPath + 'HelloWorld\CS\HelloWorld\bin\debug\HelloWorld.dll'
> > WITH PERMISSION_SET = SAFE;
No, if your db is on a different server the path and file must match and exist there too.

> from message on ML:
> Should Bugzilla style conventions be used for this project or standard C# ones?
> 
> Ignoring any of the style issues there are actual code issues I am seeing. I
> don't see any actual security issues here, but there could very well be race
> conditions (which could result in bad/missing data). There will be people who
> have issues with using managed functions in general, but I don't see a way
> around it particularly for the concat aggregate and the regex search
> (everything else I think is doable in raw sql functions; though I suppose a
> regex engine is possible there too, I wouldn't want to try writing it).
> 

I commented on the regex issue in comment above if I recall, so yah can be done but NO NO NO lol. Yes some others could be native, but C# provides more additional flexibility such as easier case management.

> Anyway, assuming standard (mostly because I don't feel comfortable enough with
> the Bugzilla ones since not having done any bz or perl coding in quite a while
> now; I am pretty sure other things will need to change if it is to be
> Bugzilla):
> 
> > Bugzilla.MSSQL.pidb
> > Bugzilla.MSSQL.suo
> > Bugzilla.MSSQL.userprefs
> 
> these files should be ignored, not added (IDE/compiler generated)

OK

> into actual changes:
> AssemblyInfo.cs
> ===============
> > +using System.Runtime.CompilerServices;
> 
> this line is unused in the current file (hence should not be there)

Is a default by Visual Studio and is only used by the compiler afaik, so doesn't really matter, will not impact the compiled object.

> > +[assembly: AssemblyTitle("Bugzilla.MSSQL")]
> 
> Nit: Standard C# naming practices would have this be Bugzilla.Mssql (or perhaps
> MSSql, but eww...; mssql is usually referred to in .NET as SqlServer though,
> don't know if that should be given any weight or not)
> see: http://msdn.microsoft.com/en-us/library/ms229043.aspx
> quote: Do capitalize only the first character of acronyms
> with three or more characters, except the first word of a camel-cased
> identifier

OK

> overall assemblyinfo.cs nits:
> 1. Every bit of information in this file should be filled in
> 2. The following addititional code should be in assemblyinfo.cs as standard
> practice:

I suppose.

> > using System.Runtime.InteropServices;
> >
> > [assembly: AssemblyFileVersion("1.0.*")]
> >
> > // Setting ComVisible to false makes the types in this assembly not visible
> > // to COM components.  If you need to access a type in this assembly from
> > // COM, set the ComVisible attribute to true on that type.
> > [assembly: ComVisible(false)]
> >
> > // The following GUID is for the ID of the typelib if this project is exposed to COM (create a new GUID for this)
> > [assembly: Guid("f06345e0-a4d4-47d8-99fc-5af3232233cf")]
> 
> (reasoning: it is far better to be explicit up front than to add it in later
> on)

Agreed

> 
> Bugzilla.MSSQL.cs
> =================
> Filename should be Bugzilla.Mssql.cs
> 
> > +using System.Data;
> > +using System.Data.SqlClient;
> 
> unused, shouldn't be there
> 
> > +using System.Text;
> 
> unused, shouldn't be there

Used by RegEx or something else if I recall.

> > +using NaturalComparer;
> 
> unused, shouldn't be there
> 
> > +public partial class UserDefinedFunctions
> 
> This class is not partial, it is fully defined. This class should be in a
> Bugzilla.Mssql namespace. This class should be marked static.

It being marked partial is based on the Visual Studio defaults, is there a reason it should be marked otherwise? Ie is it broken as partial or gain something being marked full?

> > +    [Microsoft.SqlServer.Server.SqlFunction()]
> 
> Namespace is unnecessary (included in using statements), used multiple times in
> the file, will not repeat myself over and over again

Will check that, thought it mattered.

> > +    public static SqlBoolean REGEXP([SqlFacet(MaxSize = -1)]SqlString subject, [SqlFacet(MaxSize = -1)]SqlString pattern, SqlBoolean ignorecase)
> 
> Function name is not in line with naming conventions for C# function names,
> Should be Regexp
> nit: In .NET regexes are of the type Regex, not Regexp. Should the function
> name match?

I'm matching thenaming used by other databases, such as MySQL. Not required either way.

> > +        string str_subject = subject.Value;
> > +        string str_pattern = pattern.Value;
> 
> names are nonstandard, should be strSubject and strPattern, though you should
> avoid any form of Hungarian notation. Better names would be subjectValue and
> patternValue.

OK

> > +            test = System.Text.RegularExpressions.Regex.IsMatch(str_subject, str_pattern,RegexOptions.IgnoreCase);
> 
> namespace is unnecessary (included in using statements)

OK

> > +            test = System.Text.RegularExpressions.Regex.IsMatch(str_subject, str_pattern);
> 
> namespace is unnecessary (included in using statements)

OK
> > +        if (test)
> > +        {
> > +            return SqlBoolean.True;
> > +        }
> > +        else
> > +        {
> > +            return SqlBoolean.False;
> > +        }
> 
> nit: stylistically I do not like this code; better would be to pull out the
> Regexp function to:
> 
> > private static bool RegexpImpl(SqlString subject, SqlString pattern, SqlBoolean ignorecase)
> 
> then replace the implementation of Regexp with:
> 
> > return ToSqlBoolean(RegexpImpl(subject, pattern, ignorecase));
> 
> and have a function ToSqlBoolean:

Again, matching other database usage. So you want 4 functions instead of 2..

> > private static SqlBoolean ToSqlBoolean(bool test)
> > {
> >    return test ? SqlBoolean.True : SqlBoolean.False;
> > }
> 
> This would allow you to also replace the contents of NOT_REGEX with
> 
> > return ToSqlBoolean(!RegexpImpl(subject, pattern, ignorecase));
> 
> cutting the file shorter by about 15 lines and making the methods slightly more
> optimized.

Will look at it.

> > +    public static SqlBoolean NOT_REGEXP([SqlFacet(MaxSize = -1)]SqlString subject, [SqlFacet(MaxSize = -1)]SqlString pattern, SqlBoolean ignorecase)
> 
> Function name should be NotRegexp

OK

> > +    public static SqlInt32 INSTR([SqlFacet(MaxSize = -1)]SqlString find, [SqlFacet(MaxSize = -1)]SqlString inthis, SqlBoolean CaseSensitive)
> 
> Function name should be InStr
> parameter inthis should be inThis
> parameter CaseSensitive should be caseSensitive

OK, though I have never understood the mid word caps without capping the first..
 
> nit: Why does this one have caseSensitive and the regex functions have
> ignorecase? The parameters should be consistent with each other as the current
> method could be confusing and eventually lead to issues.

OK, written at different times, laziness etc..

> > +        if (ind == -1)
> > +        {
> > +            return 0;
> > +        }
> > +        else {
> > +            return ind+1;
> > +        }
> 
> condition is unnecessary, should be:

Hmmm -1 + 1 = 0 that is true ;-)

> > +        return ind+1;
> 
> empty line at the end of this method should be removed

Ok

> > +    [Microsoft.SqlServer.Server.SqlFunction()]
> > +    public static SqlDateTime NOW()
> > +    {
> > +
> > +        return DateTime.Now;
> > +    }
> > +
> > +    [Microsoft.SqlServer.Server.SqlFunction()]
> > +    public static SqlDateTime LOCALTIMESTAMP()
> > +    {
> > +        return NOW();
> > +    }
> 
> Aren't both of these doing the same thing that getdate() does?
> Again nonstandard names (will stop saying this now for future methods)
> 
> nit: empty line at beginning of NOW() (empty lines at the beginning and end of
> methods are quite pointless and IMO can only possibly serve to be an attempt to
> get the reader to disassociate the method name from its implementation or to
> make syntax errors when modifying and I decree that this is a bad thing in all
> cases)

Yes, and I think its extra code not being used any longer at this point as I rolled it in to the fix sql code in perl.

> > +    public static SqlInt32 TO_DAYS(SqlDateTime date)
> > +    {
> > +        DateTime SubjectDate = date.Value;
> > +        DateTime Y1900 = new DateTime(1900, 1, 1);
> > +
> > +        long elapsedTicks = SubjectDate.Ticks - Y1900.Ticks;
> > +        TimeSpan elapsedSpan = new TimeSpan(elapsedTicks);
> > +        return Convert.ToInt32(elapsedSpan.TotalDays);
> > +
> > +    }
> 
> 1. Shouldn't you be checking if date is null?
> 2. Why are you subtracting ticks like that? Can't you just do
> > +        return Convert.ToInt32((SubjectDate - Y1900).TotalDays);
> 3. variable names (should be camelCase), will not repeat again

Yes
Ah.. I don't remember/know off hand will look.
OK

> > +        Y1900.AddDays(days.Value);

OK, will review

> shouldn't be here, statement has no side affect and is repeated anyway on the
> next line
> 
> > +        if (SqlDate.IsNull) return SqlChars.Null;
> > +        string str_format = format.Value;
> > +        DateTime Date = SqlDate.Value;
> > +
> > +        string a = Date.DayOfWeek.ToString();
> 
> if statement uses different syntax than before; should be consistent
> variable "a" is unused

OK

> As for the if statements: most of the time spent inside string.Replace is spent
> doing the same thing that string.Contains does. Both take far longer than
> evaluating whatever the replacement would be.
> I would replace that whole section (from the first if to the return statement)
> with:
> >        return new SqlChars(str_format
> >            .Replace(@"%a", Date.DayOfWeek.ToString().Substring(0, 3))
> >            .Replace(@"%b", ((ShortMonthOfYear) Date.Month).ToString())
> >            .Replace(@"%c", Date.Month.ToString())
> >            .Replace(@"%D", Date.Day.ToString() + GetDateSuffix(Date))
> >            .Replace(@"%d", Date.Day.ToString().PadLeft(2, '0'))
> >            .Replace(@"%e", Date.Day.ToString())
> >            .Replace(@"%f", Date.Millisecond.ToString().PadRight(6, '0'))
> >            .Replace(@"%H", Date.Hour.ToString().PadLeft(2, '0'))
> >            .Replace(@"%h", Date.Hour.ToString().PadLeft(2, '0'))
> >            .Replace(@"%I", Date.Hour.ToString().PadLeft(2, '0'))
> >            .Replace(@"%i", Date.Minute.ToString().PadLeft(2, '0'))
> >            .Replace(@"%j", Date.DayOfYear.ToString().PadLeft(3, '0'))
> >            .Replace(@"%k", Date.Hour.ToString().PadLeft(2, '0'))
> >            .Replace(@"%l", Date.Hour.ToString())
> >            .Replace(@"%M", ((LongMonthOfYear) Date.Month).ToString())
> >            .Replace(@"%m", Date.Month.ToString().PadLeft(2, '0'))
> >            .Replace(@"%p", Date.Hour >= 12 ? @"PM" : @"AM")
> >            .Replace(@"%r", Date.TimeOfDay.ToString())
> >            .Replace(@"%S", Date.Second.ToString().PadLeft(2, '0'))
> >            .Replace(@"%s", Date.Second.ToString().PadLeft(2, '0'))
> >            .Replace(@"%T", Date.ToShortTimeString().ToString())
> >            .Replace(@"%U", WeekOfYear(Date).ToString())
> >            .Replace(@"%u", WeekOfYear(Date).ToString())
> >            .Replace(@"%V", WeekOfYear(Date).ToString())
> >            .Replace(@"%v", WeekOfYear(Date).ToString())
> >            .Replace(@"%W", Date.DayOfWeek.ToString())
> >            .Replace(@"%w", ((LongDayofWeek) Date.DayOfWeek).ToString())
> >            .Replace(@"%X", Date.Year.ToString())
> >            .Replace(@"%x", Date.Year.ToString())
> >            .Replace(@"%Y", Date.Year.ToString())
> >            .Replace(@"%y", Date.Year.ToString().Substring(2, 2))
> >            .Replace(@"%%", @"%"));

Hmm, will review.

> That said, I think there are some bugs here:
> I doubt that %H, %h, %I and %k are supposed to return the same value (similarly
> U,u,V,v, and X,x,Y).
> Is %f correct or should it also be PadLeft?
> You have an if for capital S and then do a replace with lowercase s.
> Could a test be written to ensure the correctness of this function?
> Are there l10n issues here?

It's been tested here, in reality only a subset of the items are used by Bugzilla. Most of this was an attempt to clone the MySQL function of the same name. I can review again, wrote this about... a year ago :-) fuzzy now.

> > +    [Serializable]
> > +    private enum LongDayofWeek
> > ...
> > +
> > +    [Serializable]
> > +    private enum ShortMonthOfYear
> > ...
> > +
> > +    [Serializable]
> > +    private enum LongMonthOfYear
> 
> These should be public to prevent a future compiler from obfuciating them.

Well except then they can be altered externally. Is there another way? And there is little point to obfuscating this code given its open source.

> > +    private static string GetDateSuffix(SqlDateTime SqlDate)
> > +    {
> > +        if (SqlDate.IsNull) return null;
> > +
> > +        DateTime date = SqlDate.Value;
> > +
> > +        string day = date.Day.ToString();
> > +        if (day.EndsWith("1"))
> > +        {
> > +            return day.StartsWith("1") && date.Day != 1 ? "th" : "st";
> > +        }
> > +        else if (day.EndsWith("2"))
> > +        {
> > +            return day.StartsWith("1") ? "th" : "nd";
> > +        }
> > +        else if (day.EndsWith("3"))
> > +        {
> > +            return day.StartsWith("1") ? "th" : "rd";
> > +        }
> > +        return "th";
> > +    }
> 
> parameter being passed in is DateTime, not SqlDateTime (makes first if and
> assignment unnecessary)
> first if syntax inconsistent with second if syntax
> else keywords are unneeded

OK, will review

> > +    private static int WeekOfYear(SqlDateTime SqlDate)
> > +    {
> > +        if (SqlDate.IsNull) return 0;
> > +
> > +        DateTime date = SqlDate.Value;
> > +
> > +        System.Globalization.CultureInfo ci = System.Threading.Thread.CurrentThread.CurrentCulture;
> > +        System.Globalization.Calendar cal = ci.Calendar;
> > +        System.Globalization.CalendarWeekRule cwr = ci.DateTimeFormat.CalendarWeekRule;
> > +        DayOfWeek fdow = ci.DateTimeFormat.FirstDayOfWeek;
> > +        return cal.GetWeekOfYear(date, cwr, fdow);
> > +    }
> 
> parameter being passed in is DateTime, not SqlDateTime (makes first if and
> assignment unnecessary)
> if syntax inconsistant again
> using CurrentThread.CurrentCulture here, is this ok in sql app domain context
> (I think so but I am not certain)?
> You use the current culture here but previously you used InvariantCulture for
> case sensitivity. Does that present an internationalization issue (how does
> mysql do case sensitivity particularly for accented characters? does it vary
> per culture and if so should this or should it do it however mssql does case
> sensitivity when it is configured as insensitive, if they are different?)?

Mmm, Can run some tests see if it is an issue? Or provide some case example to test? I cobbled this together I think it is OK, but as commonly is I'm testing many thing from my situation.

> > +public struct GROUP_CONCAT : Microsoft.SqlServer.Server.IBinarySerialize
> 
> should be in separate file, should be in Bugzilla.Mssql namespace, should be
> named GroupConcat
> This class feels smelly:
> 1. A List<T> datatype is almost always better than an ArrayList (and in worst
> case is equivalent)
> 2. this keyword is not necessary for accessing your own members unless they
> share a name with something else you are trying to use
> 3. Read is not a compliment to Write
> 4. Lists are not thread safe. I wouldn't doubt there may be race conditions in
> this code.

OK, I can look, it worked though processing about 1M bug comments, with no errors in a reasonable amount of time.

>this keyword is not necessary
Can you explain further, not really sure what you are referring too.

>Read is not a compliment to Write
Can you explain further?

> I would write this class as follows (completely untested code, but it looks
> less buggy I think):
> > public struct GROUP_CONCAT : Microsoft.SqlServer.Server.IBinarySerialize {
> >    private static readonly object lockobj = new object();
> >    private List<string> _remembered;
> >    private List<GROUP_CONCAT> _merged;
> >    private string _seperator;
> >    private string _sortOrder;
> >    public void Init() {
> >        _remembered = new List<string>();
> >        _merged = new List<GROUP_CONCAT>();
> >        _seperator = null;
> >        _sortOrder = null;
> >    }
> >    public void Accumulate([SqlFacet(MaxSize = -1)]SqlString value, [SqlFacet(MaxSize = -1, IsNullable = true)]SqlString seperator, SqlString sortOrder) {
> >        // Default a Null to Seperator to an empty string
> >        _seperator = seperator.IsNull ? "" : seperator.Value;
> >
> >        // Add the Seperator as the first array element
> >        _sortOrder = sortOrder.IsNull ? "NTRL" : sortOrder.Value.ToUpper();
> >
> >        if (_sortOrder != "ASC" && _sortOrder != "DESC" && _sortOrder != "NTRL") {
> >            throw new ArgumentException(sortOrder.Value + " is not valid for Sort Order");
> >        }
> >
> >        if (value.IsNull) return;
> >
> >        lock (lockobj) {
> >            _remembered.Add(value.Value);
> >        }
> >    }
> >    public void Merge(GROUP_CONCAT group) {
> >        if(group._seperator!=_seperator) {
> >            throw new ArgumentException("Seperators do not match: " + group._seperator+", "+_seperator);
> >        }
> >        if(group._sortOrder!=_sortOrder) {
> >            throw new ArgumentException("Sort Orders do not match: " + group._sortOrder + ", " + _sortOrder);
> >        }
> >
> >        //not sure about the thread safety of Add/AddRange so...
> >        lock (lockobj) {
> >            _merged.Add(group);
> >        }
> >    }
> >
> >    private IEnumerable<string> Enumerate() {
> >        lock (lockobj) {
> >            foreach (var value in _remembered) {
> >                yield return value;
> >            }
> >            foreach (var list in _merged) {
> >                foreach (var value in list.Enumerate()) {
> >                    yield return value;
> >                }
> >            }
> >        }
> >    }
> >
> >    public SqlString Terminate() {
> >        List<string> all = new List<string>(Enumerate());
> >
> >        // If ASC or DESC then Sort
> >        //new NaturalComparer(NaturalComparerOptions.RomanNumbers));
> >        if (_sortOrder != "NTRL") all.Sort(new NaturalComparer.NaturalComparer());
> >        // If DESC then reverse the Sort order
> >        if (_sortOrder == "DESC") all.Reverse();
> >        return new SqlString(string.Join(_seperator, all.ToArray()));
> >    }
> >
> >    public void Read(System.IO.BinaryReader r) {
> >        int itemCount = r.ReadInt32();
> >
> >        // Retrieve the Seperator for use by Terminate()
> >        _seperator = r.ReadString();
> >        _sortOrder = r.ReadString();
> >        lock (lockobj) {
> >            _remembered = new List<string>(itemCount - 2);
> >        }
> >        for (int i = 2; i <= itemCount - 1; i++) {
> >            lock (lockobj) {
> >                _remembered.Add(r.ReadString());
> >            }
> >        }
> >    }
> >
> >    public void Write(System.IO.BinaryWriter w) {
> >        List<string> all = new List<string>(Enumerate());
> >        w.Write(all.Count);
> >        w.Write(_seperator);
> >        w.Write(_sortOrder);
> >        foreach (string s in all) {
> >            w.Write(s);
> >        }
> >    }
> > }

Thanks I'll take a look/test.

> Bugzilla.MSSQL.csproj
> =====================
> > +    <RootNamespace>Bugzilla.MSSQL</RootNamespace>
> > +    <AssemblyName>Bugzilla.MSSQL</AssemblyName>
> > +    <TargetFrameworkVersion>v3.5</TargetFrameworkVersion>
> 
> Names again...
> Should this be targeting .net 3.5? or should it target 2.0 (for a potentially
> wider audience)? It doesn't appear to be using any 3.5 assemblies or syntax
> (unless you use the GROUP_CONCAT class I provided above, in which case you
> would have to remove the 3.5 var syntax).

I though I was using something that required 3.5, perhaps not, 2.0 is preferred if not required.

> NaturalSortComparer.cs
> ======================
> Assuming this code is correct (not even going to bother going there at least
> yet) my most significant issue is that the variable names are not very
> explicit. In my company we use _camelCase for fields, camelCase for method
> variables and parameters and PascalCase for just about everything visible
> outside of the class.
> These classes should also be in a Bugzilla.Mssql namespace and they should be
> in separate files per class (at least according to C# conventions that I know
> of, perhaps the bugzilla devs do not care to follow this).
> 
> I am finding myself questioning what the whole point of this class is.
> What advantages does it provide that Bugzilla might need that will permit it to
> waste so much time (parsing strings) as part of the sql execution?

Used in GroupConcat for sorting, without if you get 'human' unexpected results
eg sort 1,2,30,40,6,7,8,9,10,20,50,60,3,4,5,
you get 1,10,2,20,3,30,4,40  or something along those lines
What should imo return is 
1,2,3,4,5,6,7,8,9,10,20,30,40,50,60

That is what it does.

Thanks for reviewing! I'll see about applying changes in the next few days.
For adjust_statement()
MKanat mentioned moving items like

>$new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;

inside the @parts loop, however, does running the regex mulitple times (once per loop) make more since than once at the end as it is now?
Depends on: 557929
(In reply to comment #20)
> So on the CLR import code (the really big chuck of random looking text) what
> shall we do.. create /bugzilla/Bugzilla/DB/Mssql/CLR.inc ?
> 
> We can then read the file during install or as needed?

  That sounds fine. Or you can put it in contrib with the rest of the CLR stuff.
(In reply to comment #22)
> >   Does need to be fixed. Should also be fixed in Oracle.pm by Xiaoou, but
> > that's a separate issue.
> 
> We'll need to work with Xiaoou to move this all to
> Bugzilla::DB::StatementModifier anyhow so addressable then..

  Okay, although that has to happen before this patch can pass review.

> If you set up a System DSN, it kinda is, but your just moving where you
> manually selected at that point. And yes you can have multiple version of the
> driver installed one for e SQL 2005, for SQL 2008, etc.. Updates to a version
> are left named the base name, so doesn't update very often. Constant would
> work, maybe a prompt on install and store in data/params would be best though?

  So you're telling me there's no way to just say "{SQL Server}" without a version number and have it use the latest one available?

> >   Okay, thanks. So I assume that's what lets us have multiple selects going at
> > once on one connection?
> Yes, works well, though like the normal multiple result set warnings apply,

  I'm not aware of any problems with multiple result sets in other DBs.

> Reasonable thought, makes for even better identification. Might still recommend
> concatenating $0 and bugzilla ('Bugzilla '.$0 on the off chance another app has
> the same file name.

  $0 is usually the full path to the current file, so you should be OK, but sure, "Bugzilla: $0" would be fine.

> > > >   What are those for? Also, why isn't LongReadLen a constant?
> 
> odbc_default_bind_type => '-9' == SQLWCHAR  allows for NVARCHAR

  Okay. Could you put a comment next to it?

> LongReadLen sure I can make it a constant.

  Okay. And you're sure that you need LongReadLen for MS-SQL?

> >   Okay. Have those patches been sent to the maintainer of those modules?
>
> If I recall didn't need patches. Was just an issue of compiling on the server,

  Okay, in that case you should contact the maintainer of the theoryx5 PPM repository and ask him to package the modules for you, and if he says there's a problem, assist him in resolving the issues.

> Might I suggest rather than just column to all for any of?

  I'm sorry, that sentence doesn't make sense to me.

> Sometimes you might
> just want the return sorted internally instead of based on another field. Not
> much different than a subselect with a order by.

  This isn't an appropriate place to discuss this feature, which will happen in another bug.

> >   Perhaps you should put a version number into the CLR or include some sort of
> > date about it in the database, so we could check if we have to update it.
> 
> That I can do, in fact had one and pulled it out when I submitted the code. Ah
> well :-)

  Ah, sounds good then. :-)

> >   This seems to indicate it works just like other DBs:
> > 
> >   http://msdn.microsoft.com/en-us/library/ms174123.aspx
> 
> If I recall try it, and you'll get fun errors. Might have just given up late at
> night, I'll review. I know I did some investigating before hand.

  Okay. If you can't get it to work with normal SQL, record the errors you're getting. If you need help, I can help research solutions or make suggestions.

> >   Why not just wrap the entire query and do a SELECT *?
> 
> It is. In fact, I think, two of them Sort DESC TOP then Sort ASC TOP to give
> the
> effect of limit x, y

  What it looks like you're actually doing is splitting on the existing FROM, though...perhaps I'm incorrect?

> OK, I'll look at altering whine, it is using sql_interval but like this
> 
> $dbh-sql_interval(blah..) . ' - ' . $dbh-sql_interval(blah..)

  That's fine...are you saying the problem is that MS-SQL cannot do math with the interval formats you've created?

> >   Lame. Yeah, I do seem to recall we discussed it. Anyhow, you'll have to fix
> > this inside of the parts loop, not outside of it.
> 
> Think the issue was it splitting it up oddly, perhaps not I'll see if it works.

  Yes, it will get split up oddly, but you have to work around that.
  Also, before I forget--if the utf8 parameter is on in Bugzilla, you need to set odbc_utf8_on in the driver: http://search.cpan.org/dist/DBD-ODBC/ODBC.pm#odbc_utf8_on

  Or alternately, you may need to check if odbc_has_unicode is enabled and die if not.
(In reply to comment #24)
> MKanat mentioned moving items like
> 
> >$new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;
> 
> inside the @parts loop, however, does running the regex mulitple times (once
> per loop) make more since than once at the end as it is now?

  Yes, because you *must* do replacements inside the loop, otherwise you're also modifying strings that people are searching for or otherwise using in SQL.
(In reply to comment #26)
>   So you're telling me there's no way to just say "{SQL Server}" without a
> version number and have it use the latest one available?

Correct, the drivers aren't called "SQL Server Native Client" they are called with the major version number.
ie SQL Server Native Client 9.0  (2005)
   SQL Server Native Client 10.0 (2008)

In most cases you can you version 9 or 10 on SQL 2005 or SQL 2008 but using the latest makes more sense generally.

>   Okay. And you're sure that you need LongReadLen for MS-SQL?

Had some truncation issues originally with out. None since even on a very very large file or nvarchar return. Was the commonly recommended solution for the truncation warnings I was receiving, believe in a MS article someplace as well.

> > >   Why not just wrap the entire query and do a SELECT *?
> > 
> > It is. In fact, I think, two of them Sort DESC TOP then Sort ASC TOP to give
> > the
> > effect of limit x, y
> 
>   What it looks like you're actually doing is splitting on the existing FROM,
> though...perhaps I'm incorrect?

On Where looks like, it works however the same for oracle, can't take credit for how it works, beyond some modifications to mssqlize it where it differed from Oracle. It was also a commonly recommended, see google search, method to achieve LIMIT functionality with mssql.

> 
> > OK, I'll look at altering whine, it is using sql_interval but like this
> > 
> > $dbh-sql_interval(blah..) . ' - ' . $dbh-sql_interval(blah..)
> 
>   That's fine...are you saying the problem is that MS-SQL cannot do math with
> the interval formats you've created?

Part of the issue it sql_interval accepts two params
>NOW() + INTERVAL 5 MINUTE == "NOW + " . sql_interval('MINUTE', 5)


So that means that in whine.pl it does this:

>        $sth = $dbh->prepare("UPDATE whine_schedules " .
>                                 "SET run_next = (CURRENT_DATE + " .
>                                 $dbh->sql_interval('?', 'DAY') . ") + " .
>                                 $dbh->sql_interval('?', 'HOUR') .
>                                 " WHERE id = ?")

meaning I get a result like this

>CURRENT_DATE + INTERVAL ? DAY + INTERVAL ? HOUR

using sql_interval in mssql modules result in:
>CURRENT_DATE + /*INTERVAL DAY ?*/ + /*INTERVAL HOUR ?*/

Which I reparse with sql_adjust as:
>DATEADD(DAY, ?, DATEADD(HOUR, ? CURRENT_DATE()))

I'm doing a bit more, casting to insure an INT etc.. but that is that's the basic result. If the datetime field or value we are adding or subtracting from were passed to the function I could completely forgo the REGEX eliminating the issue. I wonder that it wasn't built that way to begin with.

If sql_interval also accepted the datetime field or value eg:
>NOW() + INTERVAL 5 MINUTE   == "NOW() + " . sql_interval('NOW()', 'MINUTE', 5)
then the issue would be trivial.

> CURRENT_DATE +  sql_interval('DAY', '?') + sql_interval('HOUR', '?')
could become
> sql_interval(sql_interval('CURRENT_DATE','DAY', '?'),'HOUR', '?')
becoming with no regex.
>DATEADD(DAY, ?, DATEADD(HOUR, ? CURRENT_DATE()))
(In reply to comment #27)
>   Also, before I forget--if the utf8 parameter is on in Bugzilla, you need to
> set odbc_utf8_on in the driver:
> http://search.cpan.org/dist/DBD-ODBC/ODBC.pm#odbc_utf8_on
> 
>   Or alternately, you may need to check if odbc_has_unicode is enabled and die
> if not.

From that page:

>When building DBD::ODBC on Windows ($^O eq 'MSWin32') the WITH_UNICODE macro >is automatically added

>Do not confuse this with DBD::ODBC's unicode support. The odbc_utf8_on  >attribute only applies to non-unicode enabled builds of DBD::ODBC.

Taking those togeather, it means not required under Windows.
(In reply to comment #23)
> >
> > > +        CREATE ASSEMBLY [Bugzilla.MSSQL]
> > > +        AUTHORIZATION [dbo]
> > > +        FROM 0x4D5A9000030...
> > This varbinary constant here is the actual bits of the assembly, isn't it?
>
> Yes
I would assume you shouldn't need some sort or other file to read, just use the dll itself.

There's gotta be a short bit of perl that can read a file and store it as a hex string.

> > > +public partial class UserDefinedFunctions
> >
> > This class is not partial, it is fully defined. This class should be in a
> > Bugzilla.Mssql namespace. This class should be marked static.
>
> It being marked partial is based on the Visual Studio defaults, is there a
> reason it should be marked otherwise? Ie is it broken as partial or gain
> something being marked full?

No performance reason or brokenness, just the better to be explicit logic rearing its head again. The word partial being there in the diff had me looking for a second half of the class.

> > > +    public static SqlInt32 INSTR([SqlFacet(MaxSize = -1)]SqlString find, [SqlFacet(MaxSize = -1)]SqlString inthis, SqlBoolean CaseSensitive)
> >
> > Function name should be InStr
> > parameter inthis should be inThis
> > parameter CaseSensitive should be caseSensitive
>
> OK, though I have never understood the mid word caps without capping the
> first..

http://msdn.microsoft.com/en-us/library/ms229043.aspx

Seems like it is arbitrary to me, but this was the decision made and it is largely followed.

see also:
http://msdn.microsoft.com/en-us/library/ms229002.aspx

> > That said, I think there are some bugs here:
> > I doubt that %H, %h, %I and %k are supposed to return the same value (similarly
> > U,u,V,v, and X,x,Y).
> > Is %f correct or should it also be PadLeft?
> > You have an if for capital S and then do a replace with lowercase s.
> > Could a test be written to ensure the correctness of this function?
> > Are there l10n issues here?
>
> It's been tested here, in reality only a subset of the items are used by
> Bugzilla. Most of this was an attempt to clone the MySQL function of the same
> name. I can review again, wrote this about... a year ago :-) fuzzy now.

Perhaps this assembly should be spun off to its own project, not under the stewardship of Bugzilla. I am certain it has usefulness outside of Bugzilla.

Something like ???.SqlServer.MysqlExtensions (not concerned about this now though, maybe someday).

> > These should be public to prevent a future compiler from obfuciating them.
>
> Well except then they can be altered externally. Is there another way? And
> there is little point to obfuscating this code given its open source.

Altered externally? What do you mean?

I was talking about obfuscating as part of some future compiler optimizations that might come up.

Object names are not required to remain in the compiled assembly if they are not publically accessable per the spec IIRC. Some future compiler could very well notice that they are not public and simply use their integer values instead of the enums.

> > > +    private static int WeekOfYear(SqlDateTime SqlDate)
> > > +    {
> > > +        if (SqlDate.IsNull) return 0;
> > > +
> > > +        DateTime date = SqlDate.Value;
> > > +
> > > +        System.Globalization.CultureInfo ci = System.Threading.Thread.CurrentThread.CurrentCulture;
> > > +        System.Globalization.Calendar cal = ci.Calendar;
> > > +        System.Globalization.CalendarWeekRule cwr = ci.DateTimeFormat.CalendarWeekRule;
> > > +        DayOfWeek fdow = ci.DateTimeFormat.FirstDayOfWeek;
> > > +        return cal.GetWeekOfYear(date, cwr, fdow);
> > > +    }
> >
> > parameter being passed in is DateTime, not SqlDateTime (makes first if and
> > assignment unnecessary)
> > if syntax inconsistant again
> > using CurrentThread.CurrentCulture here, is this ok in sql app domain context
> > (I think so but I am not certain)?
> > You use the current culture here but previously you used InvariantCulture for
> > case sensitivity. Does that present an internationalization issue (how does
> > mysql do case sensitivity particularly for accented characters? does it vary
> > per culture and if so should this or should it do it however mssql does case
> > sensitivity when it is configured as insensitive, if they are different?)?
>
> Mmm, Can run some tests see if it is an issue? Or provide some case example to
> test? I cobbled this together I think it is OK, but as commonly is I'm testing
> many thing from my situation.

I think it might be OK too, but I wasn't certain and this is an area where I don't feel comfortable enough to say everything is fine. As a developer my weakest area overall is i18n. I'm happy to blame it on US culture, that for a pretty long time we never gave any thought to writing code that would be used in other countries. Fortunately this is starting to change, but it is still a very foreign concept to many devs I have met.

> >this keyword is not necessary
> Can you explain further, not really sure what you are referring too.

this.member can almost always just be retrieved by member. It is a convention choice more than anything. I prefer in my code to write as little cruft as possible.

> >Read is not a compliment to Write
> Can you explain further?

http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.server.ibinaryserialize_members%28v=VS.90%29.aspx

Pretty sure that the following test would fail:

var concat = new GROUP_CONCAT();
concat.Init();
concat.Accumulate(new SqlString("1"),new SqlString(", "), new SqlString("ASC"));
concat.Accumulate(new SqlString("2"),new SqlString(", "), new SqlString("ASC"));
concat.Accumulate(new SqlString("3"),new SqlString(", "), new SqlString("ASC"));
var concat2 = new GROUP_CONCAT();
concat2.Init();
concat2.Accumulate(new SqlString("4"),new SqlString(", "), new SqlString("ASC"));
concat2.Accumulate(new SqlString("5"),new SqlString(", "), new SqlString("ASC"));
concat2.Accumulate(new SqlString("6"),new SqlString(", "), new SqlString("ASC"));
concat.Merge(concat2);

var ms = new MemoryStream(); 
using(var w = new BinaryWriter(ms)) {
    concat.Write(w);
}
ms.Seek(0);
var concat3 = new Group_CONCAT();
using(var r = new BinaryReader(ms)) {
    concat3.Read(r);
}
var ms2 = new MemoryStream();
using(var w2 = new BinaryWriter(ms)) {
    concat3.Write(ms2);
}
var ms1Array = ms.ToArray();
var ms2Array = ms2.ToArray();

Debug.Assert(concat.Terminate().Value == concat3.Terminate().Value);
Debug.Assert(ms1Array.Length == ms2Array.Length);
for(int i=0;i<ms1Array.Length;++i) {
    Debug.Assert(ms1Array[i]==ms2Array[i]);
}

> > NaturalSortComparer.cs
> > ======================
> > Assuming this code is correct (not even going to bother going there at least
> > yet) my most significant issue is that the variable names are not very
> > explicit. In my company we use _camelCase for fields, camelCase for method
> > variables and parameters and PascalCase for just about everything visible
> > outside of the class.
> > These classes should also be in a Bugzilla.Mssql namespace and they should be
> > in separate files per class (at least according to C# conventions that I know
> > of, perhaps the bugzilla devs do not care to follow this).
> >
> > I am finding myself questioning what the whole point of this class is.
> > What advantages does it provide that Bugzilla might need that will permit it to
> > waste so much time (parsing strings) as part of the sql execution?
>
> Used in GroupConcat for sorting, without if you get 'human' unexpected results
> eg sort 1,2,30,40,6,7,8,9,10,20,50,60,3,4,5,
> you get 1,10,2,20,3,30,4,40  or something along those lines
> What should imo return is
> 1,2,3,4,5,6,7,8,9,10,20,30,40,50,60
>
> That is what it does.

What I meant was, what advantages does it have over something far simpler (easier to review, likely significantly faster) like the code below.

    public class NumericComparer : IComparer<string>, IComparer {
        public int Compare(string x, string y) {
            if(string.IsNullOrEmpty(x) || string.IsNullOrEmpty(y)) {
                return string.Compare(x, y);
            }
            double numX, numY;
            if (double.TryParse(x, out numX) && double.TryParse(y, out numY)) {
                return numX.CompareTo(numY);
            }
            return string.Compare(x, y);
        }
        public int Compare(object x, object y) {
            return Compare(x as string, y as string);
        }
    }

> Thanks for reviewing! I'll see about applying changes in the next few days.

You are welcome, Thanks for not taking it too harshly, it's my first public review.
(In reply to comment #31)
> (In reply to comment #23)
> > >
> > > > +        CREATE ASSEMBLY [Bugzilla.MSSQL]
> > > > +        AUTHORIZATION [dbo]
> > > > +        FROM 0x4D5A9000030...
> > > This varbinary constant here is the actual bits of the assembly, isn't it?
> >
> > Yes
> I would assume you shouldn't need some sort or other file to read, just use the
> dll itself.
> 
> There's gotta be a short bit of perl that can read a file and store it as a hex

Yeah probably, but then, is it necessary vs. us releasing a preconfigured file.

On obfuscation, I guess to me it boils down to, is it broken, is obfuscation even an issue if it changes the name? I don't think it is. Being what it is, it won't break anything.

> I think it might be OK too, but I wasn't certain and this is an area where I
> don't feel comfortable enough to say everything is fine. As a developer my
> weakest area overall is i18n. I'm happy to blame it on US culture, that for a
> pretty long time we never gave any thought to writing code that would be used
> in other countries. Fortunately this is starting to change, but it is still a
> very foreign concept to many devs I have met.

Same here, but in this case, baring using tool to guess the incoming encoding,  and setting culture to suite you end up having to accept it might not be perfect. I use a adapted guess encoding script found here http://www.codeproject.com/KB/recipes/DetectEncoding.aspx which might give us what we want but it probably unnecessary and would likely be high overhead. I use it for db conversions, mysql utf8 to mssql UCS-2 (a UTF-16 variant)

> > >this keyword is not necessary
> > Can you explain further, not really sure what you are referring too.
> 
> this.member can almost always just be retrieved by member. It is a convention
> choice more than anything. I prefer in my code to write as little cruft as
> possible.

Ah OK, matters not to me. We can alter it.

> > >Read is not a compliment to Write
> > Can you explain further?
> 
> http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.server.ibinaryserialize_members%28v=VS.90%29.aspx
> 
> Pretty sure that the following test would fail:
> 
> var concat = new GROUP_CONCAT();
> concat.Init();
> concat.Accumulate(new SqlString("1"),new SqlString(", "), new
> SqlString("ASC"));
> concat.Accumulate(new SqlString("2"),new SqlString(", "), new
> SqlString("ASC"));
> concat.Accumulate(new SqlString("3"),new SqlString(", "), new
> SqlString("ASC"));
> var concat2 = new GROUP_CONCAT();
> concat2.Init();
> concat2.Accumulate(new SqlString("4"),new SqlString(", "), new
> SqlString("ASC"));
> concat2.Accumulate(new SqlString("5"),new SqlString(", "), new
> SqlString("ASC"));
> concat2.Accumulate(new SqlString("6"),new SqlString(", "), new
> SqlString("ASC"));
> concat.Merge(concat2);
> 
> var ms = new MemoryStream(); 
> using(var w = new BinaryWriter(ms)) {
>     concat.Write(w);
> }
> ms.Seek(0);
> var concat3 = new Group_CONCAT();
> using(var r = new BinaryReader(ms)) {
>     concat3.Read(r);
> }
> var ms2 = new MemoryStream();
> using(var w2 = new BinaryWriter(ms)) {
>     concat3.Write(ms2);
> }
> var ms1Array = ms.ToArray();
> var ms2Array = ms2.ToArray();
> 
> Debug.Assert(concat.Terminate().Value == concat3.Terminate().Value);
> Debug.Assert(ms1Array.Length == ms2Array.Length);
> for(int i=0;i<ms1Array.Length;++i) {
>     Debug.Assert(ms1Array[i]==ms2Array[i]);
> }

I didn't test above, but compiled and running a very basic test of what I have, similar to your example above in sql see below:

--SQL Snip
DECLARE @tab TABLE (test int, test2 nvarchar(1))

DECLARE @var int
SET @var=0

WHILE @var < 10 BEGIN
INSERT INTO @tab SELECT @var, CASE WHEN @var < 6 THEN 'A' ELSE 'B' END
SET @var = @var + 1
END

SELECT dbo.GROUP_CONCAT(test, ',', 'ASC')  FROM @tab group by test2

-- Result
--Row 1
0,1,2,3,4,5
--Row 2
6,7,8,9

--End Snip


> > > NaturalSortComparer.cs

> What I meant was, what advantages does it have over something far simpler
> (easier to review, likely significantly faster) like the code below.
> 
>     public class NumericComparer : IComparer<string>, IComparer {
>         public int Compare(string x, string y) {
>             if(string.IsNullOrEmpty(x) || string.IsNullOrEmpty(y)) {
>                 return string.Compare(x, y);
>             }
>             double numX, numY;
>             if (double.TryParse(x, out numX) && double.TryParse(y, out numY)) {
>                 return numX.CompareTo(numY);
>             }
>             return string.Compare(x, y);
>         }
>         public int Compare(object x, object y) {
>             return Compare(x as string, y as string);
>         }
>     }

Can take into account items like Roman Numerals etc... if that matters.. maybe not much. Depends on what we want really, how far we want to go. Do we do only enough?

> it's my first public review.

You did good! If you disagree with me anything, stick to your guns if needed, MKanat or someone else can always arbitrate. :-)
(In reply to comment #29)
> Correct, the drivers aren't called "SQL Server Native Client" they are called
> with the major version number.
> ie SQL Server Native Client 9.0  (2005)
>    SQL Server Native Client 10.0 (2008)

  Wow. That is utterly lame.

  Okay, here's what we should do:

  Allow people to put a space in $db_driver, for MS-SQL. The item after the space will be the version number. It will default to 10.0 if not specified. This will be only for MS-SQL.

> >   Okay. And you're sure that you need LongReadLen for MS-SQL?
> 
> Had some truncation issues originally with out. None since even on a very very
> large file or nvarchar return. Was the commonly recommended solution for the
> truncation warnings I was receiving, believe in a MS article someplace as well.

  Okay, that sounds fine then.

> On Where looks like, it works however the same for oracle, can't take credit
> for how it works, beyond some modifications to mssqlize it where it differed
> from Oracle. It was also a commonly recommended, see google search, method to
> achieve LIMIT functionality with mssql.

  Okay. If it works, I'll let it slide, for now.

> I'm doing a bit more, casting to insure an INT etc.. but that is that's the
> basic result. If the datetime field or value we are adding or subtracting from
> were passed to the function I could completely forgo the REGEX eliminating the
> issue. I wonder that it wasn't built that way to begin with.

  It wasn't built that way to begin with because this is sql_interval, not sql_date_add or sql_date_subtract or sql_date_math or something like that.

  Is it possible to create a custom operator in MS-SQL for this? That is, you could return some data type that could be added or subtracted with a datetime...?

  sql_interval is used all over the code; I don't really want to change them all to sql_date_math, which would really get complex.
  For these things that you're doing a global regex on the SQL with, one option is to put some delimiters around them that you think will never, ever appear in a Search.pm query, and then you could parse them out that way.
> >When building DBD::ODBC on Windows ($^O eq 'MSWin32') the WITH_UNICODE macro >is automatically added
> 
> >Do not confuse this with DBD::ODBC's unicode support. The odbc_utf8_on  >attribute only applies to non-unicode enabled builds of DBD::ODBC.
> 
> Taking those togeather, it means not required under Windows.

  Yes, but somebody may want to host Bugzilla on *nix with the MS-SQL server on Windows, so we have to make this check--probably just assure that we have odbc_has_unicode on.
(In reply to comment #32)
> > There's gotta be a short bit of perl that can read a file and store it as a hex
> 
> Yeah probably, but then, is it necessary vs. us releasing a preconfigured file.

  That's fine, I'd rather you just have the binary dll, read it in, and convert it to hex in Perl. It won't take that long.

> I use a adapted guess encoding script found here
> http://www.codeproject.com/KB/recipes/DetectEncoding.aspx which might give us
> what we want but it probably unnecessary and would likely be high overhead. I
> use it for db conversions, mysql utf8 to mssql UCS-2 (a UTF-16 variant)

  I'm slightly concerned--are you saying that your MS-SQL driver is causing MS-SQL to store data as UCS-2 instead of UTF-8? Does it return it as UCS-2 or UTF-8?

  As far as any i18n issues go, ask me about them specifically and I'll help out. I usually know the theoretical answer, and then you can translate that into the appropriate language-specific answer.

> > What I meant was, what advantages does it have over something far simpler
> > (easier to review, likely significantly faster) like the code below.
> > 
> >     public class NumericComparer : IComparer<string>, IComparer {
> [snip]

  Comparison should be done exactly as the database would do it given an ORDER BY clause on the type of column passed in. Is that what the comparer is doing now?
(In reply to comment #33)
> 
>   Okay. If it works, I'll let it slide, for now.

Makes me wonder if that was the response to the original code hehe.

> 
> > I'm doing a bit more, casting to insure an INT etc.. but that is that's the
> > basic result. If the datetime field or value we are adding or subtracting from
> > were passed to the function I could completely forgo the REGEX eliminating the
> > issue. I wonder that it wasn't built that way to begin with.
> 
>   It wasn't built that way to begin with because this is sql_interval, not
> sql_date_add or sql_date_subtract or sql_date_math or something like that.
> 
>   Is it possible to create a custom operator in MS-SQL for this? That is, you
> could return some data type that could be added or subtracted with a
> datetime...?
> 
>   sql_interval is used all over the code; I don't really want to change them
> all to sql_date_math, which would really get complex.

Short answer, I have no idea :-) I have wondered.. but have not seen how. I would like to know, though as it would open up a lot of possibilities. I have a thought... but hadn't poked at it yet I'll try to do so in the next week.. if I can.
(In reply to comment #34)
>   For these things that you're doing a global regex on the SQL with, one option
> is to put some delimiters around them that you think will never, ever appear in
> a Search.pm query, and then you could parse them out that way.

I think the only real thing I needed it for was sql_interval, the rest can go in the loop. And I can't use delimiters for that since there is an out of data element required.

(In reply to comment #33)
>  sql_interval is used all over the code; I don't really want to change them
>all to sql_date_math, which would really get complex.

I'm not really sure how, is it a specific module we are doing something that would make this difficult?
(In reply to comment #36)
>   I'm slightly concerned--are you saying that your MS-SQL driver is causing
> MS-SQL to store data as UCS-2 instead of UTF-8? Does it return it as UCS-2 or
> UTF-8?

You answer can be found here:
http://search.cpan.org/~mjevans/DBD-ODBC-1.23/ODBC.pm#odbc_has_unicode
When odbc_has_unicode is 1, DBD::ODBC will:
bind columns the database declares as wide characters as SQL_Wxxx
    This means that UNICODE data stored in these columns will be returned to Perl in UTF-8 and with the UTF8 flag set.

Which you addressed in a prior comment, we can always check for unicode support.

> 
>   As far as any i18n issues go, ask me about them specifically and I'll help
> out. I usually know the theoretical answer, and then you can translate that
> into the appropriate language-specific answer.
> 
> > > What I meant was, what advantages does it have over something far simpler
> > > (easier to review, likely significantly faster) like the code below.
> > > 
> > >     public class NumericComparer : IComparer<string>, IComparer {
> > [snip]
> 
>   Comparison should be done exactly as the database would do it given an ORDER
> BY clause on the type of column passed in. Is that what the comparer is doing
> now?

Just regular mssql:
Given a single table with one field sort by that field with the following rows:
1
2
3
4
5
10
20
30
40
50

you would get:
1
10
2
20
3
30
4
40
5
50

So if that is what you looking for then yeah we wouldn't want to include the NaturalSort code as the would return
1,2,3,4,5,6,7,8,9,10,20,30,40,50 of sorting the aggregate field as asc, reverse for desc. A standard array/list sort would match the sort by ordering.
(In reply to comment #38)

>  since there is an out of data element required.

since there is an outside data element required.
Re (In reply to comment #21)
> I would write this class as follows (completely untested code, but it looks
> less buggy I think):

No Go.

lock() { } causes the following:
>Msg 6522, Level 16, State 1, Line 1
>A .NET Framework error occurred during execution of user-defined routine or
>aggregate "GROUPCONCAT": 
>System.Security.HostProtectionException: Attempted to perform an operation
>that >was forbidden by the CLR host.

>The protected resources (only available with full trust) were: All
>The demanded resources were: Synchronization, ExternalThreading

>System.Security.HostProtectionException: 
>   at GroupConcat.Accumulate(SqlString value, SqlString seperator, SqlString
>sortOrder)

If the DLL is imported with permission of UNRESTRICTED you get
>Msg 6522, Level 16, State 1, Line 1
>A .NET Framework error occurred during execution of user-defined routine or
>aggregate "GROUPCONCAT": 
>System.NullReferenceException: Object reference not set to an instance of an
>object.
>System.NullReferenceException: 
>   at GroupConcat.<Enumerate>d__0.MoveNext()
>   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
>   at GroupConcat.Terminate()

This might be owing to the only item being passed along to the very end being the the binary object passed to Read(), which is why I stored the sort and separator in the list and split them off there at the end. If however I can store these outside, or at least persist the and still mark the dll as SAFE I'd be quite happy. Ideas?
Update VS 2010 is out and better support auto deployment of CLR functions

2009 failed where our usage of GroupConcat was concerned, ie passing -1 to indicate nvarchar(max), as of 2010 it works.

2010 requires each UDF to be preceded with [Microsoft.SqlServer.Server.SqlFunction()] any with out are omitted in the deploy. this requirement does appear to exist in 2009. This may be similar to the former issue with groupconcat where manual deployment works fine.

Good progress on the CLR portion should have that up soon. I will likely post the CLR code as a separate patch file to allow for independent reviews.
Also of note VS 2010, it now generates a sql deploy file as well, which includes the dll and associated files converted to a binary format. I have updated my development files for reading in and converting the dll for deploy as well. So a little more work on that side then I'll get that patch file up as well.
Okay, although I don't know much about what a lot of that means, it sounds positive. :-)

Yeah, actually file a separate bug for the CLR code and attach the patch there. That way we can keep the reviews separate.
Blocks: 565720
No longer blocks: 565720
Depends on: 565720
Attached patch v8 (obsolete) — Splinter Review
Patch for latest trunk for review.
Attachment #559976 - Flags: review?(mkanat)
Attached file v8 (obsolete) —
Attachment #559976 - Attachment is obsolete: true
Attachment #559978 - Flags: review?(mkanat)
Attachment #559976 - Flags: review?(mkanat)
Attached patch v8 (obsolete) — Splinter Review
Attachment #559978 - Attachment is obsolete: true
Attachment #559979 - Flags: review?(mkanat)
Attachment #559978 - Flags: review?(mkanat)
I will go through the comments here and apply all suggestions where applicable if they haven't been done already and post another patch update.
Why the INSTR function?  Does the builtin CHARINDEX not do everything?
http://msdn.microsoft.com/en-us/library/ms186323.aspx
CARINDEX does NOT do that same function similar but enough different. I forget all the differences off the top of my head. PATINDEX is closer in function however still different.

I think part was one using base 0 vs base 1 also the issue of case sensitivity.

What my goal was during this process was to insure that all things equal a query using mysql would result in the same returns as mssql.

I do recall I spent a while on that function and did test with both CHRINDEX and PATINDEX before deciding to build a new function. Also the issue was the hard coding of function in the bugzilla code. There are a number of issues that had to be worked around specifically because assumed ANSI standards which weren't such as  the mysql limit function when used as LIMIT 3, 4 the ANSI standard is ONLY a match for LIMIT 99 the 2nd param is non-standard and causes considerable contortions for both MSSQL and Oracle.
CHARINDEX can do case sensitivity if you choose the correct collation.

select CHARINDEX('CD','abcdefg' collate LATIN1_GENERAL_CS_AS,1)
select CHARINDEX('cd','abcdefg' collate LATIN1_GENERAL_CS_AS,1)
Yes however Latin1 is not always the char set in use. As I said there were multiple factors not just the one. I'd have to l
Continued..

I would need to look back at notes as to all my reasoning at the time. It may have been due to hardcoding in modules or support scripts like email in. At that time convincing Max to allow updates to hard coded supposedly standard function all the other DBs used was not a fight I had time to wage. I'll see if I can refresh my memory. 

Google mssql version of instr I think there were some comment out there they do suggest chrindex or patindex and have various comments on differences etc..
To add if you can get it to work properly I don't I suppose have a real objection however
Attached patch v9 (obsolete) — Splinter Review
Attachment #559979 - Attachment is obsolete: true
Attachment #562295 - Flags: review?(mkanat)
Attachment #559979 - Flags: review?(mkanat)
(In reply to Max Kanat-Alexander from comment #14)
> Comment on attachment 436376 [details] [diff] [review] [diff] [details] [review]
> v7
> 
> >=== added file Bugzilla/DB/Mssql.pm
> >+# The Initial Developer of the Original Code is Netscape Communications
> >+# Corporation. Portions created by Netscape are
> >+# Copyright (C) 1998 Netscape Communications Corporation. All
> >+# Rights Reserved.
> 
>   Untrue.
> 

Fixed.

> >+=head1 NAME
> >+
> >+Bugzilla::DB::Mssql - Bugzilla database compatibility layer for MSSQL
> 
>   I know that a lot of other modules have this stuff up top, but could you
> put the POD at the bottom? I've decided that's the best standard for us.
> 

Done.

> >+my %locks;
> 
>   Why do you have a package-level "my" variable? (Or any package-level
> variable at all, for that matter.)
> 

Removed.

> >+# This is how many comments of MAX_COMMENT_LENGTH we expect on a single bug.
> >+# In reality, you could have a LOT more comments than this, because 
> >+# MAX_COMMENT_LENGTH is big.
> >+use constant MAX_COMMENTS => 50;
> 
>   You should not be overriding this in a driver. Why are you?

Removed.

> 
> >+use constant EMPTY_STRING  => '';
> 
>   You don't need to specify that.

It is being used in some methods so I've left it there for now.

> 
> >+use constant BLOB_TYPE =>  undef ;
> 
>   MS-SQL can bind blobs correctly without any special stuff to DBI?

Looks like it.

> 
>   Nit: Semicolon right after "undef".
> 

Fixed.

> >+# This module extends the DB interface via inheritance
> >+use base qw(Bugzilla::DB);
> 
>   Nit: This should be right under "package" and doesn't need the comment.
> 

Done.

> 
> >+    my $attrs = { odbc_default_bind_type => '-9', LongReadLen => 5000000 };
> 
>   What are those for? Also, why isn't LongReadLen a constant?

Mockdin explained what they are and I've made them into constants now.

> 
> >+    # Needed by TheSchwartz
> >+    $self->{private_bz_dsn} = \($dsn, $user, $pass, $attrs);
> 
>   Um...  (a) You just tried to assign a list of references to a single hash
> item (b) that's not the format of private_bz_dsn (c) does TheSchwartz even
> support MS-SQL?

TheSchwartz does not support ODBC and therefor MS-SQL. As it is using DBD and employs use of LIMIT and the fact it uses coalesce as one of the column names TheSchwartz module needs work to make it compatible. I have patched version that I got working which I will have to see what the TheSchwartz maintainers think about. The private_bz_dsn has been set up correctly now.

> 
> >+    # all class local variables stored in DBI derived class needs to have
> >+    # a prefix 'private_'. See DBI documentation.
> >+    $self->{private_bz_tables_locked} = "";
> 
>   That variable is actually obsolete and should be removed from all of
> Bugzilla.

Removed.

> >+sub sql_regexp {
> >+    my ($self, $expr, $pattern) = @_;
> >+
> >+    $pattern = "'\\\\x00-\\\\x1f'" if ($pattern eq "'[[:cntrl:]]'");
> 
>   I think that instead, you need to replace [:cntrl:] with the item there.

Done.

> >+sub sql_string_concat {
> >+    my ($self, @params) = @_;
> 
> >+    my @hack_for_hack;
> 
>   Nit: Should have a better variable name.
> 

Better name was used :)

> >+    foreach my $string (@params){
> >+        push @hack_for_hack, "cast($string as nvarchar(max))";
> 
>   Nit: CAST should be capitalized.
> 

Fixed.

> >+# Currently Unsupportable, CONTAINS can not be used in GROUP BY
> 
>   Then let's not have the code here. But I don't see why it would be in a
> GROUP BY.

Added support for fulltext searching via the custom REGEXP function.

> 
> >+sub sql_from_days {
> >+    my ($self, $days) = @_;
> >+
> >+    return " dateadd(dd,0, ( $days )) ";
> 
>   Why the spaces around the SQL?

Fixed.

> 
> >+sub sql_date_format {
> >+    my ($self, $date, $format) = @_;
> >+
> >+    $format = trim($format);
> >+    $format = '%Y.%m.%d %H:%i:%s' unless $format;
> >+
> >+    return " dbo.DATE_FORMAT($date,'$format') ";
> 
>   You just have a return here, and then there's a bunch of unreachable code
> after it.

Unreachable code removed.

 
> >+sub LOCALTIMESTAMP {
> 
>   Why do you have this here?

Removed.

> 
> >+sub MOD {
> 
>   And this?

Removed.

> 
> >+sub bz_setup_database {
> >+    my ($self) = @_;
> >+
> >+    print "Regenerating CLR Functions for mssql...";
> 
>   That should be an install_string. Also, it probably shouldn't happen every
> time somebody runs checksetup.pl. If it has to happen every time, then there
> shouldn't be a string printed.

Removed printing.

> 
> >+    $self->do("IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[FROM_DAYS]') AND type in (N'FN', N'IF', N'TF', N'FS', N'FT'))
> >+    DROP FUNCTION [dbo].[FROM_DAYS]");
> 
>   Can you put this into a subroutine instead of repeating the same code 7
> times?

Done.

> 
> >+    $self->do(q|
> 
>   Usually we use q{ unless there's some reason not to.

Done.

> 
> >+        CREATE ASSEMBLY [Bugzilla.MSSQL]
> >+        AUTHORIZATION [dbo]
> >+        FROM 
> > [snip]
> 
>   Instead of having an enormous string inside of this file, which will be
> loaded into memory for every httpd child, could the CLR code live in an
> external file?

Now loading the assembly dll file directly from contrib.

> 
> >+    $self->do("CREATE FUNCTION [dbo].[FROM_DAYS](\@days [int])
> >+    RETURNS [datetime] WITH EXECUTE AS CALLER
> >+    AS 
> >+    EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
> >+    ");
> 
>   If you don't need to use ", then just use ' and you can remove the \ from
> @days. (Same comment for all of these.)
> 
>   Also, our normal style would be to simply put the "); at the end of the
> text.

Done and done.

> 
>   These might be more readable using the heredoc syntax, though:
> 
> $self->do(<<'END'
>   CREATE FUNCTION [dbo].[FROM_DAYS](@days [int])
>           RETURNS [datetime] WITH EXECUTE AS CALLER
>                AS 
>     EXTERNAL NAME [Bugzilla.MSSQL].[UserDefinedFunctions].[FROM_DAYS]
> END
> );
> 
>   (You might want to use something different than "END" though.)
> 

Done.

> 
> >+    $self->do("IF  EXISTS (SELECT name FROM sys.procedures WHERE name = N'sp_alter_column_default' and type = 'P')
> 
>   Nit: Very long line.

Done.

> 
> >+sub _bz_add_field_table {
> > [snip]
> 
>   This is not code you should be overriding, and I don't see any reason why
> you would have to. It looks just like a direct copy of the superclass's code.

Removed.

> 
> >+sub bz_add_field_tables {
> >+    my ($self, $field) = @_;
> >+    
> >+    $self->_bz_add_field_table($field->name,
> >+                               $self->_bz_schema->FIELD_TABLE_SCHEMA, $field->type);
> >+    if ($field->type == FIELD_TYPE_MULTI_SELECT) {
> >+        my $ms_table = "bug_" . $field->name;
> >+        $self->_bz_add_field_table($ms_table,
> >+            $self->_bz_schema->MULTI_SELECT_VALUE_TABLE);
> >+ 
> >+        $self->updatecreate_column_refs;
> 
>   Instead of copying all of the code, you should be calling
> SUPER::bz_add_field_tables and then doing your custom piece afterward. Also,
> your function name should start with bz_ or _bz_.
> 

Removed as there was nothing custom.

> 
> >+sub adjust_statement {
> >+    my ($sql) = @_;
> >+
> >+    # We can't just assume any occurrence of "''" in $sql is an empty
> >+    # string, since "''" can occur inside a string literal as a way of
> >+    # escaping a single "'" in the literal.  Therefore we must be trickier...
> >+
> >+    # split the statement into parts by single-quotes.  The negative value
> >+    # at the end to the split operator from dropping trailing empty strings
> >+    # (e.g., when $sql ends in "''")
> >+    my @parts = split /'/, $sql, -1;
> > [snip]
> 
> >+        # Look for a LIMIT clause
> >+        ($limit) = ($nonstring =~ m(/\* LIMIT (\d*) \*/)o);
> 
>   Why do you essentially have this code twice?
 
One is looking for a normal LIMIT and the other looks for LIMIT with offset.

>   Also, I think you don't need $has_from, $has_select, or probably this
> entire loop.

Those vars are not needed and have been removed. Loop is essential and has to remain.

> 
> >+    if (defined($limit)) {
> >+        if ($new_sql !~ /\bWHERE\b/) {
> >+            $new_sql = $new_sql." WHERE 1=1";
> 
>   Nit: Space around the period.

Done.
 
> >+         my ($before_where, $after_where) = split /\bWHERE\b/i,$new_sql;
> 
>   What if it has the word WHERE in a literal string? (Like, what if I search
> for "WHERE" on query.cgi?)

This was changed so literal strings cannot be affected now as replacements are done from the loop.

> >+         $before_where =~ s/^(select +(distinct)?)/$1 top $limit /i;
> 
>   Nit: Use \s+ instead of a raw space with a plus character after it.

Done.

> 
> >+    $new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;
> 
>   Oh, CURRENT_DATE in a raw string....
> 
> >+    $new_sql =~ s/LOCALTIMESTAMP\(0\)/GETDATE()/igo; # Might have to change this is we do more than (0) in the future
> 
>   And so on....
> 
> >+    $new_sql =~ s/NOW\(\)/GETDATE()/igo;
> 
>   And so on....

Done, done and done.

> 
> >+    $new_sql =~ s/\(?CAST\((\b[\w\.]+\(?\)?) as (\w+\(?\)?)\) (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\/\)? (\+|\-) \/\*INTERVAL (\d+|\?) (\b\w+\b)\*\//CAST(DATEADD($5, $3$4, DATEADD($8, $6CAST($7 as int), $1)) as $2 )/;
> 
>   This is pretty hard to read. Could you use /x and make it more readable?
> 

Done. Hopefully now it is much clearer what is actually happening.

> 
> 
> >+sub db_lock {
> 
>   What's this for?

Removed.

> 
> >+# Should really live in it's own .pm file...
> 
>   Nit: its
> 
>   (it's is a contraction, "its" is a possessive pronoun like his or hers,
> which also don't have apostrophes in them.)

Corrected.

> 
> >+# DNI function override for MSSQL sql adjustments
> 
>   DNI? (Is that supposed to be DBI?)
> 

Corrected.

> >+package Bugzilla::DB::Mssql::st;
> > [snip]
> 
>   And these should go into their own class as well, that both MS-SQL and
> Oracle can subclass.
> 
>   Let's call the package Bugzilla::DB::StatementModifier, and it can also
> contain Bugzilla::DB::StatementModifier::st.

Haven't done anything about this just yet as I am not quite sure about the scope of this change just yet and may need a bit of discussion before I am sure what needs to be done here.

(In reply to Max Kanat-Alexander from comment #28)
> (In reply to comment #24)
> > MKanat mentioned moving items like
> > 
> > >$new_sql =~ s/CURRENT_DATE\(?\)?/CAST(GETDATE() as DATE)/igo;
> > 
> > inside the @parts loop, however, does running the regex mulitple times (once
> > per loop) make more since than once at the end as it is now?
> 
>   Yes, because you *must* do replacements inside the loop, otherwise you're
> also modifying strings that people are searching for or otherwise using in
> SQL.

This was done.



(In reply to Max Kanat-Alexander from comment #33)
> (In reply to comment #29)
> > Correct, the drivers aren't called "SQL Server Native Client" they are called
> > with the major version number.
> > ie SQL Server Native Client 9.0  (2005)
> >    SQL Server Native Client 10.0 (2008)
> 
>   Wow. That is utterly lame.
> 
>   Okay, here's what we should do:
> 
>   Allow people to put a space in $db_driver, for MS-SQL. The item after the
> space will be the version number. It will default to 10.0 if not specified.
> This will be only for MS-SQL.

I have only made the version number a constant in the pm for now. Allowing a space to be added in $db_driver will require a number of changes in DB.pm. Suggest we open another bug for this.



(In reply to Max Kanat-Alexander from comment #35)
> > >When building DBD::ODBC on Windows ($^O eq 'MSWin32') the WITH_UNICODE macro >is automatically added
> > 
> > >Do not confuse this with DBD::ODBC's unicode support. The odbc_utf8_on  >attribute only applies to non-unicode enabled builds of DBD::ODBC.
> > 
> > Taking those togeather, it means not required under Windows.
> 
>   Yes, but somebody may want to host Bugzilla on *nix with the MS-SQL server
> on Windows, so we have to make this check--probably just assure that we have
> odbc_has_unicode on.

odbc_utf8 attribute has been added.
Wow, amazing! Thank you, Jasmin! I will get to reviewing this some time in the next few days. If I don't, please email me directly.
Attached patch v10 (obsolete) — Splinter Review
Few little tweaks and removed full-text support as this currently can't be done directly. Full-text support should be worked on later.
Attachment #562295 - Attachment is obsolete: true
Attachment #562687 - Flags: review?(mkanat)
Attachment #562295 - Flags: review?(mkanat)
Comment on attachment 562687 [details] [diff] [review]
v10

Review of attachment 562687 [details] [diff] [review]:
-----------------------------------------------------------------

This is wonderful! I'm really happy that you've taken on this work! This is looking much much better!

I have a lot of comments on this patch, below, but I want you to know that it's totally normal to have this many comments on a patch of this size, and that I'm really really happy with the work you're doing here. :-)

::: C:/Bugzilla/MSSQL/Bugzilla/DB/Mssql.pm
@@ +10,5 @@
> +# implied. See the License for the specific language governing
> +# rights and limitations under the License.
> +#
> +# The Original Code is the Bugzilla Bug Tracking System.
> +#

This needs an Initial Developer section.

@@ +13,5 @@
> +# The Original Code is the Bugzilla Bug Tracking System.
> +#
> +# Contributor(s): 
> +#                 Michael Thomas <mockodin@gmail.com>
> +#                 Jasmin Sehic <jasmins@embedcard.com>

Nit: Only two-space indent past Contributor(s) below.

@@ +17,5 @@
> +#                 Jasmin Sehic <jasmins@embedcard.com>
> +
> +package Bugzilla::DB::Mssql;
> +use base qw(Bugzilla::DB);
> +use strict;

use strict should go above use base.

@@ +29,5 @@
> +
> +use constant EMPTY_STRING  => '';
> +
> +# needed for correct BLOB binding in MSSQL
> +use constant BLOB_TYPE =>  undef;

Nit: Extra spaces before "undef"

@@ +31,5 @@
> +
> +# needed for correct BLOB binding in MSSQL
> +use constant BLOB_TYPE =>  undef;
> +
> +# needed to avoid data truncation by increasing LongReadLen

Is this for blobs or for other fields as well?

@@ +32,5 @@
> +# needed for correct BLOB binding in MSSQL
> +use constant BLOB_TYPE =>  undef;
> +
> +# needed to avoid data truncation by increasing LongReadLen
> +use constant LONG_READ_LEN  => 5000000;

Nit: There's an extra space before =>

Also, Perl lets you insert _ anywhere into numbers for readability--for long numbers like this it would probably be best to do:

50_000_000

@@ +38,5 @@
> +# odbc_default_bind_type '-9' is SQLWCHAR which allows for NVARCHAR
> +use constant ODBC_BIND_TYPE => '-9'; 
> +
> +# SQL Server Native Client version default is SQL 2008 => 10.0
> +use constant CLI_VERSION => '10.0'; 

Okay. What should happen if users are using a different version? Perhaps put a comment here helping me and others to understand how this should be edited for various versions, and if any other work is required.

@@ +69,5 @@
> +
> +sub bz_explain {
> +    my ($self, $sql) = @_;
> +    # effectly does nothing but allow SQL to display
> +    return '';

You can actually just leave this empty--see Bugzilla::DB::Sqlite and copy what's there.

@@ +76,5 @@
> +sub bz_last_key {
> +    my ($self) = @_;
> +    my ($last_insert_id) = $self->selectrow_array('SELECT @@IDENTITY');
> +    return $last_insert_id;
> +}

Does the normal DBI last_insert_id work? If so, we don't need to override bz_last_key.

@@ +86,5 @@
> +}
> +
> +sub sql_regexp {
> +    my ($self, $expr, $pattern) = @_;
> +    $pattern =~ s/\'\[\[\:cntrl\:\]\]\'/\'\\\\x00-\\\\x1f\'/igo;

You don't need to escape the single-quote. You also don't need to escape the colon.

It looks like you're replacing a character class with a literal, though:

[[:cntrl:]] becomes:
\\x00-\\x1f

When really I would think what you want is for:

[:cntrl:]

to become:

\\x00-\\x1f

Also you always want to replace [:cntrl:], not just when it's the whole string. So you don't need to check for or replace the single-quotes, either, I would imagine.

@@ +92,5 @@
> +}
> +
> +sub sql_not_regexp {
> +    my ($self, $expr, $pattern) = @_;
> +    $pattern =~ s/\'\[\[\:cntrl\:\]\]\'/\'\\\\x00-\\\\x1f\'/igo;

Same comment there. Also might want to abstract this out into a separate method instead of duplicating the code.

@@ +98,5 @@
> +}
> +
> +sub sql_limit {
> +    my ($self, $limit, $offset) = @_;        
> +    if(defined $offset) {

Nit: Space after "if"

@@ +109,5 @@
> +    my ($self, @params) = @_;
> +    my @concat_strings;
> +    foreach my $string (@params) {
> +        push @concat_strings, "CAST($string as nvarchar(max))";
> +    }    

You could do this with a map instead:

my @concat_strings = map { "CAST($_ AS NVARCHAR(MAX))" } @params;

@@ +120,5 @@
> +}
> +
> +sub sql_from_days {
> +    my ($self, $days) = @_;
> +    return "dateadd(dd, 0, ($days))";

Generally we like to capitalize anything that's SQL, so I would go with DATEADD here.

@@ +125,5 @@
> +}
> +
> +sub sql_to_days {
> +    my ($self, $date) = @_;
> +    return "datediff(dd, 0, $date)";

And DATEDIFF here.

@@ +142,5 @@
> +}
> +
> +sub sql_iposition {
> +    my ($self, $fragment, $text) = @_;
> +    return "CAST(dbo.INSTR($fragment, $text, 1) AS NVARCHAR(MAX))";

iposition should return an integer, not an nvarchar.

@@ +147,5 @@
> +}
> +
> +sub sql_position {
> +    my ($self, $fragment, $text) = @_;
> +    return "CAST(dbo.INSTR($fragment, $text, 0) AS NVARCHAR(MAX))";

Same there, should be an integer, not an nvarchar.

@@ +154,5 @@
> +sub sql_group_by {
> +    my ($self, $needed_columns, $optional_columns) = @_;    
> +    return ($optional_columns ? "GROUP BY $needed_columns, $optional_columns" : 
> +                                "GROUP BY $needed_columns");
> +}

This is the same as the base sql_group_by, so this override isn't needed.

@@ +161,5 @@
> +    my ($self, $column, $separator, $sortorder) = @_;
> +    $column = trim($column);
> +    
> +    if(defined $separator) {
> +        $separator = "'$separator'" if $separator !~ /^'.*'$/;

You should never quote things manually in Bugzilla's DB code, you should always use $dbh->quote (or in this case, $self->quote).

However, presently if we pass in a separator, it should be quoted, so this code isn't necessary here at all.

@@ +163,5 @@
> +    
> +    if(defined $separator) {
> +        $separator = "'$separator'" if $separator !~ /^'.*'$/;
> +    } else {
> +        $separator = ""

The separator should default to a comma and space if not specified--look at Bugzilla::DB::Pg for an example.

@@ +168,5 @@
> +    }
> +    
> +    if(defined $sortorder) {
> +        $sortorder = 'NTRL' unless $sortorder =~ /ASC|DESC/;
> +        $sortorder = "'$sortorder'" if $sortorder !~ /^'.*'$/;

Same note there about quotes--you should always use $self->quote and never quote manually.

@@ +185,5 @@
> +    my ($self) = @_;
> +    
> +    # drop functions
> +    $self->_bz_drop_object("[dbo].[FROM_DAYS]", 
> +                           "N'FN', N'IF', N'TF', N'FS', N'FT'", 

Looks like this line is repeated. Perhaps what would be best would be to add a _bz_add_function($name, $code) that could do this drop before the function was created.

@@ +267,5 @@
> +     IF EXISTS (
> +        SELECT name FROM sys.procedures 
> +        WHERE name = N'sp_alter_column_default' and type = 'P'
> +     )
> +     DROP PROC [sp_alter_column_default]");

Same note here, we should have a _bz_add_procedure.

@@ +289,5 @@
> +                exec('alter table '+@table+' drop constraint '+@constraint)
> +            END
> +            exec('alter table [dbo].['+@table+'] 
> +                  add default ('''+@value+''') for ['+@column+']')
> +        END

My eyes start to glaze over when I read these, because I don't know the MS-SQL stored query language, or the MS-SQL information schema tables layout. Perhaps ask on the developers list for somebody who's familiar with this to take a look at it and give it an informal review?

@@ +337,5 @@
> +            push @{ $self->{schema}{$table}{FIELDS} }, 
> +                 "$table\_id" => { TYPE => 'MEDIUMSERIAL', 
> +                                   NOTNULL => 1, PRIMARYKEY => 1, };
> +        }
> +    }

Ah, you should never access {schema} directly, you should always use the methods that Schema provides, and access it via _bz_real_schema or _bz_schema.

Also, why is this being done? If we have to have a PK, can we simply add it after creating the table, during bz_add_table, instead? Also, you will have to modify all of the schema-modification functions to address that there is an "invisible" PK on the table, in the case where they add a PK. This seems like quite a bit of complexity.

@@ +368,5 @@
> +
> +# Add MSSQL assemblies from contrib
> +sub _bz_add_assembly {
> +    my ($self, $name) = @_;    
> +    my $file =  "contrib/$name.dll";

You'll need to use one of the bz_locations() values for that directory name--it's not always consistent and you're not always guaranteed to be running in the directory where your .cgi file is. I'm guessing the libpath (or whatever it's called) would work?

@@ +371,5 @@
> +    my ($self, $name) = @_;    
> +    my $file =  "contrib/$name.dll";
> +    
> +    # load entire CLR assembly from file
> +    open (FILE, $file) || die "Can't open $file: $!\n"; 

Ah, it's best to never use the two-argument form of open, and to always use the three-argument form.

Also, it's best not to use a global "glob" like FILE there. Instead, you can do:

open(my $fh, '<', $file) || die "$file: $!";

@@ +375,5 @@
> +    open (FILE, $file) || die "Can't open $file: $!\n"; 
> +    local $/;  
> +    binmode FILE;   
> +    my $assembly = <FILE>;
> +    close (FILE);    

The best way to do this is instead:

my $assembly;
binmode $assembly;
{ local $/; $assembly = <$fh>; }
close($fh) || warn "$file: $!";

@@ +389,5 @@
> +        WITH PERMISSION_SET = SAFE");
> +}
> +
> +# used by adjust statement to generate a unique column alias
> +sub _get_random_name {

Why aren't you using generate_random_password from Bugzilla::Util?

@@ +390,5 @@
> +}
> +
> +# used by adjust statement to generate a unique column alias
> +sub _get_random_name {
> +    my @chars=('a'..'z','A'..'Z');

Nit: Spaces around =

@@ +393,5 @@
> +sub _get_random_name {
> +    my @chars=('a'..'z','A'..'Z');
> +    my $random_string;
> +    foreach (1..22) {
> +        $random_string .= $chars[rand @chars];

Ah, never use Perl's rand in the Bugzilla codebase. (Sorry, this isn't documented anywhere, but it's now true.) Instead you want Bugzilla::RNG::irand, but you probably don't need this function at all, so we're probably fine there.

@@ +413,5 @@
> +      \(?CAST\((\b[\w\.]+\(?\)?)\s+as\s+(\w+\(?\)?)\) # date is $1 and $2 is type 
> +      \s*(\+|\-)\s*                                   # + or - is $3
> +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/\)?      # INTERVAL comment $4 & $5
> +      \s*(\+|\-)\s*                                   # + or - is $6
> +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/         # INTERVAL comment $7 & $8

Wow, these are so much clearer now, thank you so much. :-)

Instead of looking at nesting, why don't you just keep re-running the regex until you don't see INTERVAL in there anymore?

@@ +427,5 @@
> +     \(?(\b[\w\.]+\(?\)?)                       # date as $1
> +     \s*(\+|\-)\s*                              # + or - is $2
> +     \/\*INTERVAL\s+(\d+|\?)\s+(\b\w+\b)\*\/\)? # INTERVAL comment $3 & $4
> +     \s*(\+|\-)\s*                              # + or - is $5
> +     \/\*INTERVAL\s+(\d+|\?)\s+(\b\w+\b)\*\/    # INTERVAL comment $6 & $8

Same here. Also, why do we even care about CAST or not CAST?

@@ +456,5 @@
> +     
> +     # replace with MSSQL dateadd function instead of interval
> +     /DATEADD($4, $2CAST($3 AS int), $1)
> +    
> +     /xigo;

You probably don't want "i" in there or in any of these interval regexes actually, because your INTERVAL should always be in caps, since that's what you returned from sql_interval.

@@ +459,5 @@
> +    
> +     /xigo;
> +    
> +    # Length replace
> +    $sql =~ s/\bLENGTH\b\((.*)\)/LEN($1)/igo;

You probably want to make that .*? instead of just .*

@@ +462,5 @@
> +    # Length replace
> +    $sql =~ s/\bLENGTH\b\((.*)\)/LEN($1)/igo;
> +    
> +    # Substring replace
> +    $sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;

Same for that .* there.

@@ +465,5 @@
> +    # Substring replace
> +    $sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> +    
> +    # Modulo replace
> +    $sql =~ s/\bMOD\b\(\s*(.*)\s*,\s*(.*)\s*\)/($1 % $2)/igo;  

Does that work on our other DBs, just using the % operator? If it does, we should just switch.

@@ +470,5 @@
> +      
> +    return $sql;
> +}
> +
> +sub adjust_statement {

A lot of this is shared with the Oracle driver, right? If so, most or all of this method should be moved up into Bugzilla::DB.

@@ +565,5 @@
> +    }    
> +    return $new_sql;
> +}
> +
> +sub do {

And yeah, my comment about StatementModifier is still true, let's definitely still do that.
Attachment #562687 - Flags: review?(mkanat) → review-
Assignee: mockodin → jasmins
Attached patch v11Splinter Review
Attachment #562687 - Attachment is obsolete: true
Attachment #564013 - Flags: review?(mkanat)
(In reply to Max Kanat-Alexander from comment #59)
> Comment on attachment 562687 [details] [diff] [review] [diff] [details] [review]
> v10
> 
> Review of attachment 562687 [details] [diff] [review] [diff] [details] [review]:
> -----------------------------------------------------------------
> 
> This is wonderful! I'm really happy that you've taken on this work! This is
> looking much much better!
> 
> I have a lot of comments on this patch, below, but I want you to know that
> it's totally normal to have this many comments on a patch of this size, and
> that I'm really really happy with the work you're doing here. :-)
> 
> ::: C:/Bugzilla/MSSQL/Bugzilla/DB/Mssql.pm
> @@ +10,5 @@
> > +# implied. See the License for the specific language governing
> > +# rights and limitations under the License.
> > +#
> > +# The Original Code is the Bugzilla Bug Tracking System.
> > +#
> 
> This needs an Initial Developer section.

Done.

> 
> @@ +13,5 @@
> > +# The Original Code is the Bugzilla Bug Tracking System.
> > +#
> > +# Contributor(s): 
> > +#                 Michael Thomas <mockodin@gmail.com>
> > +#                 Jasmin Sehic <jasmins@embedcard.com>
> 
> Nit: Only two-space indent past Contributor(s) below.

Done.

> 
> @@ +17,5 @@
> > +#                 Jasmin Sehic <jasmins@embedcard.com>
> > +
> > +package Bugzilla::DB::Mssql;
> > +use base qw(Bugzilla::DB);
> > +use strict;
> 
> use strict should go above use base.

Done.

> 
> @@ +29,5 @@
> > +
> > +use constant EMPTY_STRING  => '';
> > +
> > +# needed for correct BLOB binding in MSSQL
> > +use constant BLOB_TYPE =>  undef;
> 
> Nit: Extra spaces before "undef"

Fixed.

> 
> @@ +31,5 @@
> > +
> > +# needed for correct BLOB binding in MSSQL
> > +use constant BLOB_TYPE =>  undef;
> > +
> > +# needed to avoid data truncation by increasing LongReadLen
> 
> Is this for blobs or for other fields as well?

This is for all long fields (e.g. VARCHAR, BLOB, etc)

> 
> @@ +32,5 @@
> > +# needed for correct BLOB binding in MSSQL
> > +use constant BLOB_TYPE =>  undef;
> > +
> > +# needed to avoid data truncation by increasing LongReadLen
> > +use constant LONG_READ_LEN  => 5000000;
> 
> Nit: There's an extra space before =>

Fixed.

> 
> Also, Perl lets you insert _ anywhere into numbers for readability--for long
> numbers like this it would probably be best to do:
> 
> 50_000_000

Done as suggested.

> 
> @@ +38,5 @@
> > +# odbc_default_bind_type '-9' is SQLWCHAR which allows for NVARCHAR
> > +use constant ODBC_BIND_TYPE => '-9'; 
> > +
> > +# SQL Server Native Client version default is SQL 2008 => 10.0
> > +use constant CLI_VERSION => '10.0'; 
> 
> Okay. What should happen if users are using a different version? Perhaps put
> a comment here helping me and others to understand how this should be edited
> for various versions, and if any other work is required.

Added a comment. Let me know if it isn't clear.

> 
> @@ +69,5 @@
> > +
> > +sub bz_explain {
> > +    my ($self, $sql) = @_;
> > +    # effectly does nothing but allow SQL to display
> > +    return '';
> 
> You can actually just leave this empty--see Bugzilla::DB::Sqlite and copy
> what's there.

Done.

> 
> @@ +76,5 @@
> > +sub bz_last_key {
> > +    my ($self) = @_;
> > +    my ($last_insert_id) = $self->selectrow_array('SELECT @@IDENTITY');
> > +    return $last_insert_id;
> > +}
> 
> Does the normal DBI last_insert_id work? If so, we don't need to override
> bz_last_key.

DBI interface for ODBC doesn't have a last_insert_id method so DBMS specific SELECT must be done to return the id from last insert.

> 
> @@ +86,5 @@
> > +}
> > +
> > +sub sql_regexp {
> > +    my ($self, $expr, $pattern) = @_;
> > +    $pattern =~ s/\'\[\[\:cntrl\:\]\]\'/\'\\\\x00-\\\\x1f\'/igo;
> 
> You don't need to escape the single-quote. You also don't need to escape the
> colon.
> 
> It looks like you're replacing a character class with a literal, though:
> 
> [[:cntrl:]] becomes:
> \\x00-\\x1f
> 
> When really I would think what you want is for:
> 
> [:cntrl:]
> 
> to become:
> 
> \\x00-\\x1f
> 
> Also you always want to replace [:cntrl:], not just when it's the whole
> string. So you don't need to check for or replace the single-quotes, either,
> I would imagine.
> 
> @@ +92,5 @@
> > +}
> > +
> > +sub sql_not_regexp {
> > +    my ($self, $expr, $pattern) = @_;
> > +    $pattern =~ s/\'\[\[\:cntrl\:\]\]\'/\'\\\\x00-\\\\x1f\'/igo;
> 
> Same comment there. Also might want to abstract this out into a separate
> method instead of duplicating the code.
> 

Done, done and done.

> @@ +98,5 @@
> > +}
> > +
> > +sub sql_limit {
> > +    my ($self, $limit, $offset) = @_;        
> > +    if(defined $offset) {
> 
> Nit: Space after "if"

Fixed.

> 
> @@ +109,5 @@
> > +    my ($self, @params) = @_;
> > +    my @concat_strings;
> > +    foreach my $string (@params) {
> > +        push @concat_strings, "CAST($string as nvarchar(max))";
> > +    }    
> 
> You could do this with a map instead:
> 
> my @concat_strings = map { "CAST($_ AS NVARCHAR(MAX))" } @params;

Done as suggested.

> 
> @@ +120,5 @@
> > +}
> > +
> > +sub sql_from_days {
> > +    my ($self, $days) = @_;
> > +    return "dateadd(dd, 0, ($days))";
> 
> Generally we like to capitalize anything that's SQL, so I would go with
> DATEADD here.
> 
> @@ +125,5 @@
> > +}
> > +
> > +sub sql_to_days {
> > +    my ($self, $date) = @_;
> > +    return "datediff(dd, 0, $date)";
> 
> And DATEDIFF here.

Done and done.

> 
> @@ +142,5 @@
> > +}
> > +
> > +sub sql_iposition {
> > +    my ($self, $fragment, $text) = @_;
> > +    return "CAST(dbo.INSTR($fragment, $text, 1) AS NVARCHAR(MAX))";
> 
> iposition should return an integer, not an nvarchar.
> 
> @@ +147,5 @@
> > +}
> > +
> > +sub sql_position {
> > +    my ($self, $fragment, $text) = @_;
> > +    return "CAST(dbo.INSTR($fragment, $text, 0) AS NVARCHAR(MAX))";
> 
> Same there, should be an integer, not an nvarchar.

Done and done.

> 
> @@ +154,5 @@
> > +sub sql_group_by {
> > +    my ($self, $needed_columns, $optional_columns) = @_;    
> > +    return ($optional_columns ? "GROUP BY $needed_columns, $optional_columns" : 
> > +                                "GROUP BY $needed_columns");
> > +}
> 
> This is the same as the base sql_group_by, so this override isn't needed.

Removed.

> 
> @@ +161,5 @@
> > +    my ($self, $column, $separator, $sortorder) = @_;
> > +    $column = trim($column);
> > +    
> > +    if(defined $separator) {
> > +        $separator = "'$separator'" if $separator !~ /^'.*'$/;
> 
> You should never quote things manually in Bugzilla's DB code, you should
> always use $dbh->quote (or in this case, $self->quote).
> 
> However, presently if we pass in a separator, it should be quoted, so this
> code isn't necessary here at all.
> 
> @@ +163,5 @@
> > +    
> > +    if(defined $separator) {
> > +        $separator = "'$separator'" if $separator !~ /^'.*'$/;
> > +    } else {
> > +        $separator = ""
> 
> The separator should default to a comma and space if not specified--look at
> Bugzilla::DB::Pg for an example.

Done.

> 
> @@ +168,5 @@
> > +    }
> > +    
> > +    if(defined $sortorder) {
> > +        $sortorder = 'NTRL' unless $sortorder =~ /ASC|DESC/;
> > +        $sortorder = "'$sortorder'" if $sortorder !~ /^'.*'$/;
> 
> Same note there about quotes--you should always use $self->quote and never
> quote manually.

Done.

> 
> @@ +185,5 @@
> > +    my ($self) = @_;
> > +    
> > +    # drop functions
> > +    $self->_bz_drop_object("[dbo].[FROM_DAYS]", 
> > +                           "N'FN', N'IF', N'TF', N'FS', N'FT'", 
> 
> Looks like this line is repeated. Perhaps what would be best would be to add
> a _bz_add_function($name, $code) that could do this drop before the function
> was created.
> 
> @@ +267,5 @@
> > +     IF EXISTS (
> > +        SELECT name FROM sys.procedures 
> > +        WHERE name = N'sp_alter_column_default' and type = 'P'
> > +     )
> > +     DROP PROC [sp_alter_column_default]");
> 
> Same note here, we should have a _bz_add_procedure.

Added _bz_add_function, _bz_add_aggregate, _bz_add_procedure and made _bz_add_assembly drop it before creating it too. I couldn't have the drop happen in the same sub as assembly cannot be dropped if there are references to it. So all CLR objects must be dropped first, assembly dropped and created and then all other CLR objects recreated.

> 
> @@ +289,5 @@
> > +                exec('alter table '+@table+' drop constraint '+@constraint)
> > +            END
> > +            exec('alter table [dbo].['+@table+'] 
> > +                  add default ('''+@value+''') for ['+@column+']')
> > +        END
> 
> My eyes start to glaze over when I read these, because I don't know the
> MS-SQL stored query language, or the MS-SQL information schema tables
> layout. Perhaps ask on the developers list for somebody who's familiar with
> this to take a look at it and give it an informal review?

Mockodin has written the sql and I have reviewed it myself if that counts :)

> 
> @@ +337,5 @@
> > +            push @{ $self->{schema}{$table}{FIELDS} }, 
> > +                 "$table\_id" => { TYPE => 'MEDIUMSERIAL', 
> > +                                   NOTNULL => 1, PRIMARYKEY => 1, };
> > +        }
> > +    }
> 
> Ah, you should never access {schema} directly, you should always use the
> methods that Schema provides, and access it via _bz_real_schema or
> _bz_schema.
> 
> Also, why is this being done? If we have to have a PK, can we simply add it
> after creating the table, during bz_add_table, instead? Also, you will have
> to modify all of the schema-modification functions to address that there is
> an "invisible" PK on the table, in the case where they add a PK. This seems
> like quite a bit of complexity.

This is part of Bug 535821 so we should discuss this there. Mockodin indicates that this is necessary to reduce page locking and table scanning. 

> 
> @@ +368,5 @@
> > +
> > +# Add MSSQL assemblies from contrib
> > +sub _bz_add_assembly {
> > +    my ($self, $name) = @_;    
> > +    my $file =  "contrib/$name.dll";
> 
> You'll need to use one of the bz_locations() values for that directory
> name--it's not always consistent and you're not always guaranteed to be
> running in the directory where your .cgi file is. I'm guessing the libpath
> (or whatever it's called) would work?

Yup libpath works.

> 
> @@ +371,5 @@
> > +    my ($self, $name) = @_;    
> > +    my $file =  "contrib/$name.dll";
> > +    
> > +    # load entire CLR assembly from file
> > +    open (FILE, $file) || die "Can't open $file: $!\n"; 
> 
> Ah, it's best to never use the two-argument form of open, and to always use
> the three-argument form.
> 
> Also, it's best not to use a global "glob" like FILE there. Instead, you can
> do:
> 
> open(my $fh, '<', $file) || die "$file: $!";
> 
> @@ +375,5 @@
> > +    open (FILE, $file) || die "Can't open $file: $!\n"; 
> > +    local $/;  
> > +    binmode FILE;   
> > +    my $assembly = <FILE>;
> > +    close (FILE);    
> 
> The best way to do this is instead:
> 
> my $assembly;
> binmode $assembly;
> { local $/; $assembly = <$fh>; }
> close($fh) || warn "$file: $!";

Done as suggested.

> 
> @@ +389,5 @@
> > +        WITH PERMISSION_SET = SAFE");
> > +}
> > +
> > +# used by adjust statement to generate a unique column alias
> > +sub _get_random_name {
> 
> Why aren't you using generate_random_password from Bugzilla::Util?

In MS-SQL column aliases cannot being with numbers. generate_random_password will create invalid aliases so we cannot use it.

> 
> @@ +390,5 @@
> > +}
> > +
> > +# used by adjust statement to generate a unique column alias
> > +sub _get_random_name {
> > +    my @chars=('a'..'z','A'..'Z');
> 
> Nit: Spaces around =
> 
> @@ +393,5 @@
> > +sub _get_random_name {
> > +    my @chars=('a'..'z','A'..'Z');
> > +    my $random_string;
> > +    foreach (1..22) {
> > +        $random_string .= $chars[rand @chars];
> 
> Ah, never use Perl's rand in the Bugzilla codebase. (Sorry, this isn't
> documented anywhere, but it's now true.) Instead you want
> Bugzilla::RNG::irand, but you probably don't need this function at all, so
> we're probably fine there.

Ok I've changed this.

> 
> @@ +413,5 @@
> > +      \(?CAST\((\b[\w\.]+\(?\)?)\s+as\s+(\w+\(?\)?)\) # date is $1 and $2 is type 
> > +      \s*(\+|\-)\s*                                   # + or - is $3
> > +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/\)?      # INTERVAL comment $4 & $5
> > +      \s*(\+|\-)\s*                                   # + or - is $6
> > +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/         # INTERVAL comment $7 & $8
> 
> Wow, these are so much clearer now, thank you so much. :-)
> 
> Instead of looking at nesting, why don't you just keep re-running the regex
> until you don't see INTERVAL in there anymore?

Nesting probably wasn't my best choice of words to put in the comment. There are specific ways INTERVAL is being used in Bugzilla, sometimes on its own, sometimes in a CAST and a lot of the time there are calculations being done to subtract or add time. These calculations simply cannot be done in same order with MS-SQL specific date functions so these statement adjustments are very specific.

> 
> @@ +427,5 @@
> > +     \(?(\b[\w\.]+\(?\)?)                       # date as $1
> > +     \s*(\+|\-)\s*                              # + or - is $2
> > +     \/\*INTERVAL\s+(\d+|\?)\s+(\b\w+\b)\*\/\)? # INTERVAL comment $3 & $4
> > +     \s*(\+|\-)\s*                              # + or - is $5
> > +     \/\*INTERVAL\s+(\d+|\?)\s+(\b\w+\b)\*\/    # INTERVAL comment $6 & $8
> 
> Same here. Also, why do we even care about CAST or not CAST?
> 
> @@ +456,5 @@
> > +     
> > +     # replace with MSSQL dateadd function instead of interval
> > +     /DATEADD($4, $2CAST($3 AS int), $1)
> > +    
> > +     /xigo;
> 
> You probably don't want "i" in there or in any of these interval regexes
> actually, because your INTERVAL should always be in caps, since that's what
> you returned from sql_interval.

I think Mockodin did that just as a safety net in case caps were not used somewhere. Is there much harm in doing insensitive comparisons? I've taken i out for now.

> 
> @@ +459,5 @@
> > +    
> > +     /xigo;
> > +    
> > +    # Length replace
> > +    $sql =~ s/\bLENGTH\b\((.*)\)/LEN($1)/igo;
> 
> You probably want to make that .*? instead of just .*
> 
> @@ +462,5 @@
> > +    # Length replace
> > +    $sql =~ s/\bLENGTH\b\((.*)\)/LEN($1)/igo;
> > +    
> > +    # Substring replace
> > +    $sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> 
> Same for that .* there.

Done and done.

> 
> @@ +465,5 @@
> > +    # Substring replace
> > +    $sql =~ s/SUBSTRING\((.*) FROM (\d+) FOR (\d+)\)/SUBSTRING($1, $2, $3)/igo;
> > +    
> > +    # Modulo replace
> > +    $sql =~ s/\bMOD\b\(\s*(.*)\s*,\s*(.*)\s*\)/($1 % $2)/igo;  
> 
> Does that work on our other DBs, just using the % operator? If it does, we
> should just switch.

Looked at all DB's and found that all support the % operator except for Oracle. 

> 
> @@ +470,5 @@
> > +      
> > +    return $sql;
> > +}
> > +
> > +sub adjust_statement {
> 
> A lot of this is shared with the Oracle driver, right? If so, most or all of
> this method should be moved up into Bugzilla::DB.

I'd say very little is shared instead of a lot. The logic has changed quite a bit from Oracle and there are many more MS-SQL specific adjustments than there were Oracle ones.

> 
> @@ +565,5 @@
> > +    }    
> > +    return $new_sql;
> > +}
> > +
> > +sub do {
> 
> And yeah, my comment about StatementModifier is still true, let's definitely
> still do that.

There is a separate Bug 557929 made for this so let's discuss this there.
Thanks for the updated patch! By the way, do you have the CLR code, also, somewhere? I can actually read and review C# now, so I'm happy to comment on that as well.
(In reply to Max Kanat-Alexander from comment #62)
> Thanks for the updated patch! By the way, do you have the CLR code, also,
> somewhere? I can actually read and review C# now, so I'm happy to comment on
> that as well.

Latest CLR code lives in Bug 565720.
Attachment #436376 - Attachment is obsolete: true
Comment on attachment 564013 [details] [diff] [review]
v11

Review of attachment 564013 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, way better!! Getting really close!! :-)

::: Bugzilla/DB/Mssql.pm
@@ +31,5 @@
> +use Bugzilla::Error;
> +use Bugzilla::DB::Schema::Mssql;
> +use List::Util qw(max);
> +
> +use constant EMPTY_STRING => '';

I don't think you need a constant for this.

@@ +50,5 @@
> +#
> +#   For MS-SQL 2005 use 9.0
> +#   For MS-SQL 2008 use 10.0
> +#   For MS-SQL 2008 R2 use 10.5
> +use constant CLI_VERSION => '10.0'; 

I'd rather have this in DB_MODULE, as a hash member. People already do modify Bugzilla::Constants now and again, so putting modifyable stuff there is better.

@@ +94,5 @@
> +}
> +
> +sub _sql_regex_fix {
> +    $_[1] =~ s/\[:cntrl:\]/\\\\x00-\\\\x1f/igo;
> +    return $_[1];

Don't modify $_[1]--that actually modifies the value in the caller!

@@ +193,5 @@
> +    # recreate CLR assembly
> +    $self->_bz_add_assembly();
> +    
> +    # add functions
> +    $self->_bz_add_function('FROM_DAYS', '@days [int]', '[datetime]');

These bz_add_function calls are awesome. :-) Makes the code very readable. :-) I have an idea--how about putting them into a constant hash at the top of the file? Then you could just iterate through the hash for dropping and adding. It could look like:

use constant FUNCTIONS => {
  INSTR => {
      code => '@find [nvarchar](max), 
               @inthis [nvarchar](max), 
               @CaseSensitive [bit]',
      returns => '[int]'
  },
};

@@ +232,5 @@
> +                IF (@constraint is not null) BEGIN
> +                    EXEC('ALTER TABLE '+@table+' DROP CONSTRAINT '+@constraint)
> +                END
> +                EXEC('ALTER TABLE [dbo].['+@table+'] 
> +                      ADD DEFAULT ('''+@value+''') for ['+@column+']')});

Probably best to format this using the <<ENDSQL system you used for triggers in Schema::Mssql. You can do <<'ENDSQL' to make it act like q{}.

@@ +250,5 @@
> +                    EXEC('ALTER TABLE [dbo].['+@table+'] 
> +                          DROP CONSTRAINT ['+@constraint+']')
> +                END
> +                EXEC('ALTER TABLE [dbo].['+@table+'] 
> +                      ADD PRIMARY KEY CLUSTERED (['+@column+'] ASC)')});

Same there.

@@ +268,5 @@
> +            push @{ $self->{schema}{$table}{FIELDS} }, 
> +                 "$table\_id" => { TYPE => 'MEDIUMSERIAL', 
> +                                   NOTNULL => 1, PRIMARYKEY => 1, };
> +        }
> +    }

Ah, you should probably be doing this when you generate the "CREATE TABLE" SQL, instead, and you should probably be giving it a random (or fixed) name that nobody will use. I'm concerned about what happens when people call bz_add_table outside of bz_setup_database, otherwise.

@@ +277,5 @@
> +sub _bz_drop_object {
> +    my ($self, $name, $type, $object) = @_;    
> +    $self->do("
> +         IF EXISTS (
> +             SELECT * FROM sys.objects 

Wouldn't "SELECT 1" be faster there?

@@ +356,5 @@
> +}
> +
> +# used by adjust statement to generate a unique column alias
> +sub _get_random_name {
> +    my @chars=('a'..'z','A'..'Z');

Probably best to make that a constant.

@@ +358,5 @@
> +# used by adjust statement to generate a unique column alias
> +sub _get_random_name {
> +    my @chars=('a'..'z','A'..'Z');
> +    my $random_string;
> +    foreach (1..22) {

Probably best to make that 22 a constant.

@@ +379,5 @@
> +      \(?CAST\((\b[\w\.]+\(?\)?)\s+as\s+(\w+\(?\)?)\) # date is $1 and $2 is type 
> +      \s*(\+|\-)\s*                                   # + or - is $3
> +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/\)?      # INTERVAL comment $4 & $5
> +      \s*(\+|\-)\s*                                   # + or - is $6
> +      \/\*INTERVAL\s*(\d+|\?)\s*(\b\w+\b)\*\/         # INTERVAL comment $7 & $8

That's really hard to read with all of the backslash-escaping. Instead of doing that, how about storing it somewhere else and doing \Q \E or something? Would that work? Probably once this becomes more readable I'll have more comments, too.

@@ +390,5 @@
> +    # Check for multiple interval calculations for a single date not using CAST
> +    $sql =~ s/
> +    
> +     \(?(\b[\w\.]+\(?\)?)                       # date as $1
> +     \s*(\+|\-)\s*                              # + or - is $2

The code here repeats a lot of patterns, like this one. Instead, how about storing them in variables that can be interpolated here?

@@ +422,5 @@
> +     
> +     # replace with MSSQL dateadd function instead of interval
> +     /DATEADD($4, $2CAST($3 AS int), $1)
> +    
> +     /xg;

Do the above cover every possible case where somebody could possibly use an INTERVAL in SQL? We're going to be adding code to Bugzilla, and I don't want somebody to have to modify Mssql.pm every time we want to use INTERVAL in some different context.

@@ +486,5 @@
> +        } else {
> +           push @result, $string;
> +           push @result, $nonstring;
> +        }
> +    }

I suspect that everything above this point can be shared with Oracle--this whole while loop. Then both Mssql and Oracle can implement their own _replace_sql--perhaps even by passing it in as a callback.

@@ +504,5 @@
> +        $new_sql = _replace_sql($new_sql);
> +    }
> +        
> +    # MSSQL doesn't have LIMIT, so if we find the LIMIT comment, wrap the
> +    # query with "SELECT * FROM (...) WHERE rownum < $limit" 

What happens if there's a LIMIT in a sub-select?

@@ +509,5 @@
> +    if (defined($limit)) {
> +        if ($new_sql !~ /\bWHERE\b/) {
> +            $new_sql = $new_sql . " WHERE 1=1";
> +        }    
> +        my ($before_where, $after_where) = split(/\bWHERE\b/i,$new_sql,2);    

Nit: Spaces after commas.

@@ +512,5 @@
> +        }    
> +        my ($before_where, $after_where) = split(/\bWHERE\b/i,$new_sql,2);    
> +        if (defined($offset)) {
> +            if ($new_sql =~ /(.*\s+)FROM(\s+.*)/i) { 
> +                my ($before_from,$after_from) = ($1,$2);

Nit: Spaces after commas.
Attachment #564013 - Flags: review?(mkanat) → review-
Jasmin: still alive? :)
Yes still alive. Just haven't had any time to dedicate to Bugzilla development after we restructured the development approach in our company. We've moved away from Bugzilla to using tools that better integrate with Scrum based development approach.

I've spent a fair bit of time re-factoring patches originally submitted by Mockodin and based on Mkanat's last review feedback there is still a lot of work to do to improve the standard of the code which is a bit of a shame since the code is a major improvement on the Oracle code this was based on

I can look at spending some more time over the weekends to see if I can knock this out. One thing I really don't want to get into is re-factoring code to make it shared between MSSQL and Oracle since I have no way of verifying whether I have broken something in Oracle or not and all this will do is increase the impact to the code base and make it more of an effort to complete than this has already been.

One major issue I have noted in my past attempts to get a fully working Bugzilla system on MSSQL is that TheSchwartz will need a few patches as well as one of the columns in one of the tables it uses is a reserved keyword in MSSQL and it pretty much falls over when trying to do anything.
(In reply to Jasmin Sehic from comment #66)
> One thing I really don't want to get into is re-factoring
> code to make it shared between MSSQL and Oracle

Yeah, that's fine.


> One major issue I have noted in my past attempts to get a fully working
> Bugzilla system on MSSQL is that TheSchwartz will need a few patches as well
> as one of the columns in one of the tables it uses is a reserved keyword in
> MSSQL and it pretty much falls over when trying to do anything.

Hum, I guess this is going to be really problematic, because TheSchwartz expects some given table format, AFAIK, not something we can change ourselves. Has this problem been reported to TheSchwartz developers on CPAN?
(In reply to Frédéric Buclin from comment #67)
> Hum, I guess this is going to be really problematic, because TheSchwartz
> expects some given table format, AFAIK, not something we can change
> ourselves. Has this problem been reported to TheSchwartz developers on CPAN?

I haven't yet as I was planning on getting more detail about the issue before reporting it but got sidetracked. Main issue is the keyword coalesce. I have tried decorating the keyword in square braces but TheSchwartz still doesn't like it when insert a send_mail job

Inserting a send_mail job into the Job Queue failed with the following error: Cannot find column '[coalesce]' for class 'TheSchwartz::Job' at c:/perl/site/lib/Data/ObjectDriver/Driver/DBI.pm line 326

To get this far I had to add ODBC as a DBD driver as MSSQL is being accessed through ODBC.

Internally TheSchwartz also uses LIMIT sql keyword which MSSQL doesn't like so that will need a patch as well.

As per Bug 636150 jobqueue.pl will also only work in foreground on win32 so I will need to work out how to make it a service. With Win32::Daemon perhaps? gd_redirect_output will need to be redirected to something like log4perl.

I will spend some time this week putting everything I've done so far together and incorporate as many of MKanat's coments as I can.
One thing which could be done is to ignore TheSchwartz (jobqueue) for now, and we can add to the release notes that email queuing doesn't work with MS-SQL.
OK no problem. I guess we can just have a separate bug that isn't blocking bz-mssql for the jobqueue. I will report the issues to TheSchwartz maintainers nonetheless.
We are going to branch for Bugzilla 4.4 next week and this bug is either too invasive to be accepted for 4.4 at this point or shows no recent activity. The target milestone is reset and will be set again *only* when a patch is attached and approved.

I ask the assignee to reassign the bug to the default assignee if you don't plan to work on this bug in the near future, to make it clearer which bugs should be fixed by someone else.
Target Milestone: Bugzilla 4.4 → ---
I also have a patch for this but I need to write as simple SQL statement parser as I can in order to inject ORDER BY clause in SQL statements that specify LIMIT without the ORDER BY clause. In MSSQL 2012 you can now do LIMIT but ORDER BY clause must be present. 

This poses the same problem MKanat brought up with the current way of limiting results which doesn't work so well for nested SQL statements hence needing to parse entire SQL statement to ensure no literal strings are corrupted and nested SQL statements are supported.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: