.NET Forum / Languages / C# / November 2006
Serious Bug System.Collections Sort
|
|
Thread rating:  |
william.hooper@gmail.com - 09 Nov 2006 12:46 GMT There is a longer article about this subject here: http://www.codeproject.com/useritems/SortedList_Bug.asp See the main article and the reply thread started by Robert Rohde.
Alternatively look at this code:
ArrayList a=new ArrayList();
string s1 = "-0.67:-0.33:0.33"; string s2 = "0.67:-0.33:0.33"; string s3 = "-0.67:0.33:-0.33";
a.Add(s1); a.Add(s2); a.Add(s3);
a.Sort(); for (int i=0; i<3; i++) Console.WriteLine( a[i] );
Console.WriteLine();
a.Clear(); a.Add(s1); a.Add(s3); a.Add(s2);
a.Sort(); for (int i=0; i<3; i++) Console.WriteLine( a[i] );
This code produces the following six lines of output:
-0.67:0.33:-0.33 0.67:-0.33:0.33 -0.67:-0.33:0.33
-0.67:-0.33:0.33 -0.67:0.33:-0.33 0.67:-0.33:0.33
Note that the .Sort produces different outputs depending on the order the strings are added.
It looks like the Sort algorithm is ignoring the "-" mark.
This is a very serious Bug impacting the System.Collections Array, SortedList etc.
Marc Gravell - 09 Nov 2006 13:11 GMT This appears to relate to the culture-specific comparer, which is presumably ignoring symbols (one of the CompareOptions flags); perhaps switch to ordinal comparison, which resolves this. You may be able to use StringComparer.Ordinal; can't remember if that exists in 1.1, but if not something like this should do (and use it in the Sort() calls).
public class OrdinalStringComparer : IComparer { public readonly static OrdinalStringComparer Singleton = new OrdinalStringComparer(); private OrdinalStringComparer() { } public int Compare(object x, object y) { return string.CompareOrdinal((string) x, (string) y); } }
Marc
william.hooper@gmail.com - 09 Nov 2006 13:39 GMT I don't agree....
string s2 = "0.67:-0.33:0.33"; string s3 = "-0.67:0.33:-0.33";
Console.WriteLine( String.Compare(s2,s3)); Console.WriteLine( String.Compare(s2,s3));
returns -1 and 1 showing that the Sting.Compare function is working as expected.
I don't think this is a culture issue and writing an ICompared for evxery System.Collections string comparison is a nighmare.
Marc Gravell - 09 Nov 2006 14:19 GMT Well... it would appear that the problem is a dodgy comparer; try this (using your previous values for s1, s2, s3):
Console.WriteLine(Comparer.Default.Compare(s1, s2)); Console.WriteLine(Comparer.Default.Compare(s2, s3)); Console.WriteLine(Comparer.Default.Compare(s3, s1));
Yields 1, 1, 1 meaning there is a comparer loop. Oops! Anybody [MS?] want to comment on whether this is intentional or a fubar? It would appear to violate the "transitive" rule of comparers, which should be enforced for non-zero results (0 being transitive as long as it agrees in each direction).
Actually, it's quite lucky that this returns at all! Perhaps there is a panic "oops, shouldn't possibly have taken more than n^2 iterations...".
Anyway, the point of my post was that this can be avoided using an ordinal comparer. And you can re-use the same one each time... no need to write anything more.
Marc
william.hooper@gmail.com - 09 Nov 2006 14:45 GMT Yes, Marc, sorry you are right.
The code
string s1 = "-0.67:-0.33:0.33"; string s2 = "0.67:-0.33:0.33"; string s3 = "-0.67:0.33:-0.33";
Console.WriteLine( String.Compare(s1,s2)); Console.WriteLine( String.Compare(s2,s3)); Console.WriteLine( String.Compare(s3,s1));
Console.WriteLine();
Console.WriteLine( String.CompareOrdinal(s1, s2)); Console.WriteLine( String.CompareOrdinal(s2, s3)); Console.WriteLine( String.CompareOrdinal(s3, s1));
returns
1 1 1
-3 3 3
Ie String.Compare can not handle the "-" marks but String.CompareOrdinal can.
Is there not some regional setting you can put once in the code so String.Compare and ArrayList.Sort all work that way from then on? Would much prefer this...
----
Note it annoys me that the documentation writes:
"The .NET Framework supports word, string, and ordinal sort rules....For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list...."
But in fact the hypen is being completly ignored rather than given a low weight.
I think this behaviour by default is very undesirable.
Marc Gravell - 09 Nov 2006 15:01 GMT Well, it isn't being ignored. If it was being ignored I would expect the result to be 0,0,0.
> I think this behaviour by default is very undesirable. I think it is buggy, but it isn't in System.Collections - it is in String.CompareTo. I can't think of a valid occasion in a well-ordered system when a < b < c < a or a > b > c > a. It just makes no logical sense unless you are Maurits Escher.
Now, a = b = c = a = 0 I could live with (i.e. hyhpens completely ignored).
Marc
w.hooper@hotmail.com - 09 Nov 2006 15:10 GMT If you play around with examples like this:
s1 = "0.67:0.33:-0.33"; s2 = "0.67:-0.33:0.33"; s3 = "-0.67:0.33:0.33";
Console.WriteLine( String.Compare(s1,s2)); Console.WriteLine( String.Compare(s2,s3)); Console.WriteLine( String.Compare(s3,s1));
It all looks OK. But when you put two hyphens into the lines the String.Compare gets confused and gives silly results.
OK the String.CompareOrdinal function fixes the problem but this is not behaviour by design surely?
We need a comment from our lord and master MS....
Marc Gravell - 09 Nov 2006 15:33 GMT Logged: http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID= 236900
Marc
w.hooper@hotmail.com - 09 Nov 2006 15:43 GMT Thanks very much that's great. I will keep an eye on it.
Marc Gravell - 13 Nov 2006 13:25 GMT Still no MS viewpoint?
For ref, I think this is actually quite important, as it could (as illustrated on the now-deleted CodeProject link) cause a whole range of sort-critical operations to fail... SortedList etc, or any custom collections that assume that .Sort() might actually work, and then trust the results.
Any thoughts?
Marc
william.hooper@gmail.com - 13 Nov 2006 17:30 GMT Yes absolutely, it cost me lots of time in the SortedList failing. I think it's to do with the hyphen algorithm which is designed to sort words with hyphens in a smart way. Obviously with two hyphens in the word it fails. Anyone who puts two hyphens in strings is taking a huge risk - yet using hyphens in strings is common enough. A really dangerous bug I would say, but I guess Microsoft haven't seen that yet. You could mention on the article that it causes problems with SortedList and you think it's a very serious problem. I think we have discovered a real corker and it deserves to be given a lot of attention.
Chris Dunaway - 09 Nov 2006 14:34 GMT > I don't agree.... > [quoted text clipped - 9 lines] > I don't think this is a culture issue and writing an ICompared for > evxery System.Collections string comparison is a nighmare.
>From the docs, note how it mentions that the hyphen might be given a low weight so that similar words will sort together:
"The .NET Framework supports word, string, and ordinal sort rules. A word sort performs a culture-sensitive comparison of strings in which certain nonalphanumeric Unicode characters might have special weights assigned to them. For example, the hyphen ("-") might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. A string sort is similar to a word sort, except that there are no special cases and all nonalphanumeric symbols come before all alphanumeric Unicode characters. An ordinal sort compares strings based on the numeric value of each Char in the string. For more information about word, string, and ordinal sort rules, see the System.Globalization.CompareOptions topic.
Comparison and search procedures are case-sensitive by default and use the culture associated with the current thread unless specified otherwise. By definition, any string, including the empty string (""), compares greater than a null reference, and two null references compare equal to each other.
If your application makes security decisions based on the result of a comparison or case change operation, then the operation should use the invariant culture to ensure the result is not affected by the value of the current culture. For more information, see the CultureInfo.InvariantCulture topic."
I don't know if this is what is causing the behavior you reported, but at least it's worth looking into.
Chris
Chris Dunaway - 09 Nov 2006 15:03 GMT And FWIW, I didn't see this issue officially reported on the feedback site. You may wish to post it there.
w.hooper@hotmail.com - 09 Nov 2006 15:40 GMT > And FWIW, I didn't see this issue officially reported on the feedback > site. You may wish to post it there. Chris, what is the feedback site? I would liek to post it there. It's really annoying and you never know, they might take an interest.
Marc Gravell - 09 Nov 2006 15:44 GMT See my previous post; since you didn't seem familiar with "connect" I posted it as a bug. Feel free to go in and click "validate"... and vote! http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID= 236900
And note: I wasn't trying to steal your thunder... just to get it logged with the least fuss...
Marc
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|