.NET Forum / .NET Framework / New Users / August 2007
Inconsistent derivation for generic collections
|
|
Thread rating:  |
Michi Henning - 28 Aug 2007 02:49 GMT Looking at List<T> and LinkedList<T>, we have:
public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>, IList, ICollection, IEnumerable public class LinkedList<T> : ICollection<T>, IEnumerable<T>, ICollection, IEnumerable, ISerializable, IDeserializationCallback
Interesting that LinkedList<T> derives from ISerializable and IDeserializationCallback, but List<T> does not. Why the inconsistency?
But, more interstingly, both LinkedList<T> and List<T> derived from ICollection<T> and from ICollection.
Now compare that to Queue<T> and Stack<T>:
public class Queue<T> : IEnumerable<T>, ICollection, IEnumerable public class Stack<T> : IEnumerable<T>, ICollection, IEnumerable
These derive from ICollection, but *not* from ICollection<T>.
So, I can pass a List<T> or a LinkedList<T> as ICollection<T>, but I *cannot* pass Queue<T> or Stack<T> as ICollection<T>.
I can pass all four collection concrete types as ICollection, but that has other drawbacks:
- It incurs the cost of boxing and unboxing.
- ICollection<T> has an Add() method, but ICollection does *not* have an Add() method. So, if I pass collections as ICollection, I cannot add to the collections in the callee.
Why these gratuitous differences?
Cheers,
Michi.
Mattias Sjögren - 28 Aug 2007 20:46 GMT >Interesting that LinkedList<T> derives from ISerializable and >IDeserializationCallback, but List<T> does not. Why the inconsistency? Implementing ISerializable is just a way to implement your own serialization logic. Why that's required should probably be seen as an implementation detail. List<T> is still Serializable, so for you as a consumer of the class it shouldn't matter.
>- ICollection<T> has an Add() method, but ICollection does *not* have >an Add() method. So, if I pass collections as ICollection, I cannot >add to the collections in the callee. > >Why these gratuitous differences? See
http://blogs.msdn.com/kcwalina/archive/2005/09/23/Collections.aspx http://connect.microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?FeedbackID= 110995
Mattias
 Signature Mattias Sjögren [C# MVP] mattias @ mvps.org http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com Please reply only to the newsgroup.
Michi Henning - 28 Aug 2007 23:02 GMT On Aug 29, 5:46 am, Mattias Sj?gren <mattias.dont.want.s...@mvps.org> wrote:
> See > > http://blogs.msdn.com/kcwalina/archive/2005/09/23/Collections.aspxhttp://connect .microsoft.com/VisualStudio/feedback/ViewFeedback.aspx?... I had a look, thanks. However, that article doesn't explain why Stack<T> and Queue<T> don't derive from ICollection<T>, as far as I can see.
Cheers,
Michi.
Mattias Sjögren - 29 Aug 2007 05:58 GMT >I had a look, thanks. However, that article doesn't explain why >Stack<T> >and Queue<T> don't derive from ICollection<T>, as far as I can see. I thought Anthony Moore's comment from the second link made it pretty clear
"This apparent incongruity was intentional and is the result of a difference between ICollection and ICollection<T>. ICollection was essentially just IEnumerable with a count, and as such did not add much value to IEnumerable. The heirarchy was re-designed for the generic namespace, so ICollection<T> represents a mutable collection with Add and Remove methods. As such Queue does not comply with this extended interface because it does not support unconstrained addition and removal."
Mattias
 Signature Mattias Sjögren [C# MVP] mattias @ mvps.org http://www.msjogren.net/dotnet/ | http://www.dotnetinterop.com Please reply only to the newsgroup.
Michi Henning - 29 Aug 2007 13:12 GMT On Aug 29, 2:58 pm, Mattias Sj?gren <mattias.dont.want.s...@mvps.org> wrote:
> I thought Anthony Moore's comment from the second link made it pretty > clear [quoted text clipped - 7 lines] > extended interface because it does not support unconstrained addition > and removal." Hmmm... Queue<T> certainly coold support Add() and Remove(). For Stack<T>, Remove() is problematic. It would probably have been better to have a Remove() that is the inverse off Add(). In other words, if Add() adds an element "to the end" of the collection, Remove() should remove the element "at the end" of the collection. This would make Add() and Remove() suitable for any collection, and make them analogous to C++ STL push_back and pop_back() functions.
This makes me wonder about the design of ICollection<T> in general though. I mean, what's the point of having a Remove() operation that removes a specified element when the run-time cost is O(n) for many collection types? This strikes me as singularly inappropriate for a generic collection interface. In fact, efficient Remove() is achievable only for hash tables and trees, as far as I can see. It certainly won't be efficient for arrays, linked lists, queues, deques, or priority queues.
A bit of a mis-design here, maybe?
What seems to be a big problem in the .NET collection classes is that there is no concept of separating container, algorithm, and iterator. For example, with a hash table or sorted hash table, I cannot remove an entry *and* get at the value that was removed. Instead, I have to first do a lookup. and then do a remove. With an iterator concept, the lookup could return an iterator that I can pass to the remove as hint, so the implementation can efficiently locate.
But I digress--the lack of a generic Add() operation for all the collection types still strikes me as strange. It cetinaly makes it quite a bit harder to work with collections generically.
Cheers,
Michi.
Peter Duniho - 29 Aug 2007 00:26 GMT > [...] > Now compare that to Queue<T> and Stack<T>: [quoted text clipped - 11 lines] > > - It incurs the cost of boxing and unboxing. In what way do you mean?
> - ICollection<T> has an Add() method, but ICollection does *not* have > an Add() method. So, if I pass collections as ICollection, I cannot > add to the collections in the callee. > > Why these gratuitous differences? IMHO, you should leave the word "gratuitous" out of your posts, especially when asking this sort of question. You are asking the question, which means you obviously don't know the reason the differences exist, which means you have no knowledge that would allow you to accuse the differences of being "gratuitous".
That sort of phrasing simply colors your posts as prejudiced and signals to others that they may not have much luck trying to answer your questions, because you've already apparently made up your mind. That, of course, discourages people from replying at all.
Now, that said...
In this particular case, there are very specific differences between ICollection and ICollection<>. You've even noted one, but the more general observation is that an ICollection<> is read/write while an ICollection is read-only.
Because a Queue<> and a Stack<> have a very specific structure and it is a design error (*) to modify those structures without going through their defined accessors (enqueue/dequeue, push/pop), those classes don't implement the ICollection<> interface. Since ICollection is read-only, it's safe for them to implement that interface.
Notes:
(*) It's my opinion that modifying a queue or stack outside the intended accessors is a design error. Others may disagree (in particular, I know that Jon Skeet wrote a queue-like class that he calls a queue but which allows for rearranging of the members according to priority; while we don't disagree on the utility of such a class, we disagree on what you should call it :) ). However, it seems to me that the designers of the .NET classes would agree with me, given the implementation of the classes.
2) IMHO, where you write that a class "inherits" an interface, it would be more correct to write that the class "implements" the interface. To me, "inherits" implies that there's some implementation that is actually inherited. I grant that this implication isn't 100% (see abstract classes for example), but as a general rule inheritance is about getting already-implemented behavior, while interfaces are about promising to implement specific behavior.
Pete
Michi Henning - 29 Aug 2007 01:58 GMT > IMHO, you should leave the word "gratuitous" out of your posts, > especially when asking this sort of question. You are asking the > question, which means you obviously don't know the reason the > differences exist, which means you have no knowledge that would allow > you to accuse the differences of being "gratuitous". You are right, and I apologize.
> Because a Queue<> and a Stack<> have a very specific structure and it is > a design error (*) to modify those structures without going through > their defined accessors (enqueue/dequeue, push/pop), those classes don't > implement the ICollection<> interface. Since ICollection is read-only, > it's safe for them to implement that interface. I hear your reasoning. But what would be wrong to have Add() simply enqueue on a queue, and push on a stack? I mean, after all, LinkedList supports ICollection<T> and therefore supports Add(). For LinkedList, Add() appens to the list, but it could equally well prepend, at the same run-time cost.
So, if Add() is deemed inappropriate for queues and stacks, I think I could mount an argument that Add() is equally inappropriate for a LinkedList.
So, I don't think this is a particularly strong argument.
And the cost of it is rather serious, because I cannot pass List, LinkedList, Stack, and Queue generically other than as an ICollection, but that incurs quite a performance hit for all four of them.
> 2) IMHO, where you write that a class "inherits" an interface, it would > be more correct to write that the class "implements" the interface. Fine, I'm happy to call it that. I guess I could have also said that List derives from ICollection--the exact terminology doesn't change the gist of the problem.
Cheers,
Michi.
Peter Duniho - 29 Aug 2007 05:10 GMT > I hear your reasoning. But what would be wrong to have Add() simply > enqueue on a queue, and push on a stack? I can't say that if you're just considering Add(), that would necessarily break the paradigm. It's slightly ambiguous, but as you suggest is unlikely to be a huge problem.
But that's not all that ICollection<> does. It also has Remove(), which specifically does not enforce one-ended removal.
> I mean, after all, LinkedList > supports ICollection<T> and therefore supports Add(). For LinkedList, > Add() appens to the list, but it could equally well prepend, at the > same run-time cost. I don't see how LinkedList<> is a significantly different example from List<>. List<> also has an InsertAt() that would allow prepending. But both List<> and LinkedList<> have a specific Add() method, and that always adds at the end. It's not a stretch for someone to assume that ICollection<>.Add() would do the same.
> So, if Add() is deemed inappropriate for queues and stacks, I think I > could mount an argument that Add() is equally inappropriate for a > LinkedList. IMHO it's not really Add() that is the big problem.
> So, I don't think this is a particularly strong argument. You're welcome to your opinion. And I suppose that you may still find the difference gratuitous after the explanations given. Personally, I think that the differences are significant enough to justify the difference in which interfaces are supported.
> And the cost of it is rather serious, because I cannot pass List, > LinkedList, Stack, and Queue generically other than as an ICollection, > but that incurs quite a performance hit for all four of them. I'm still not clear on what performance issues you're talking about. In what way does using ICollection instead of ICollection<> incur "quite a performance hit"?
I'm also not clear on what situation in which it would make sense to require having a single method that can handle all four types of collections. I'm especially suspicious of a method that treats a list (List<> or LinkedList<>) the same as a Queue<> or Stack<> (primarily for the same semantic reasons I've discussed already).
IMHO, you would probably have more success discussing the specific scenario that you're trying to address. The fact is, the .NET collection interfaces are what they are; that's not going to change. So it's probably more productive to focus on whatever specific issue you're trying to solve. There may actually be a reasonably appropriate alternative to what you're trying to do now. But it's hard to say without the specific scenario being addressed.
>> 2) IMHO, where you write that a class "inherits" an interface, it would >> be more correct to write that the class "implements" the interface. > > Fine, I'm happy to call it that. I guess I could have also said that > List derives from ICollection--the exact terminology doesn't change > the gist of the problem. IMHO "derives from" is the same as "inherits". But no, you're right...the exact terminology doesn't change the problem. It could affect how well you're understood, but so far I don't think that's been a problem either.
Pete
Michi Henning - 29 Aug 2007 12:06 GMT > I'm still not clear on what performance issues you're talking about. In > what way does using ICollection instead of ICollection<> incur "quite a > performance hit"? ICollection incurs the cost of boxing and unboxing, whereas ICollection<T> does not. If I iterate over a collection of ints (or other value types), that's significant.
> I'm also not clear on what situation in which it would make sense to > require having a single method that can handle all four types of > collections. I'm especially suspicious of a method that treats a list > (List<> or LinkedList<>) the same as a Queue<> or Stack<> (primarily for > the same semantic reasons I've discussed already). The reason I'm raising this question is that I'm unmarshaling sequences off the wire, as part of ZeroC's Ice middleware (www.zeroc.com). What I would like to do is allow the user to determine the concrete collection type into which the data that arrives over the wire will be unmarshaled. So the user gets a LinkedList, a List, a Queue, or Stack, depending on their preference. As part of the unmarshaling code, I have something like the following method:
void fillWithData(ICollection<T> c)
The caller passes an empty collection, and the implemetnation of this method reads the data off the wire and calls Add() to put the data into the collection.
Unfortunately, I can do this only for List and LinkedList, but not for Stack and Queue.
Alternatively, I can write the method as:
void fillWithData(ICollection c)
That works for all four collection types but, unfortunately, ICollection don't have an Add() method, so I can't fill the sequence.
So, I'm forced to write four overloads:
void fillWithData(List<T> c) void fillWithData(LinkedList<T> c) void fillWithData(Queue<T> c) void fillWithData(Stack<T> c)
Now I can do what I want, but only at substantially increased effort.
Yet another alternative is to go back to the previous signature:
void fillWithData(ICollection c)
and use downcasts or a type test to see what I'm dealing with. But the code still is different for each collection because the four collection types use different operation names for what, conceptually, is the same operation: Add() for List<T>, AddLast() for LinkedList<T>, Enqueue() for Queue<T>, and Push() for Stack<T>.
So, no matter how hard I try, I can't treat these collections generically.
What I'm really looking for is something like C++ STL's push_back() function, which can add elements to any collection, no matter what its type.
Cheers,
Michi.
Peter Duniho - 29 Aug 2007 18:42 GMT >> I'm still not clear on what performance issues you're talking about. In >> what way does using ICollection instead of ICollection<> incur "quite a >> performance hit"? > > ICollection incurs the cost of boxing and unboxing, whereas > ICollection<T> does not. Sorry, I'm still confused. As near as I can tell (from the rest of your reply) you can't even use ICollection, since it's a read-only interface.
> If I iterate over a collection of ints (or > other value types), that's significant. Your example was of adding data to the collection, which ICollection doesn't support. Are you actually also using the enumerator for the collection to iterate through the elements? And if so, why not just use the IEnumerable<> generic interface there? It's implemented by all four generic classes you're talking about.
Assuming you have a very good reason for using ICollection instead of IEnumerable<> I can see how the boxing would come into play there, but it didn't seem to me that you were doing that.
In any case, you should measure the cost of boxing before you worry about it. In particular, I suspect that unless _all_ you're doing is copying data around, the boxing is a negligible part of the algorithm. And note that by "copying data around" I don't mean across a network or to/from a disk. Those activities have a significant i/o overhead that I would expect to be at least an order of magnitude greater than the cost of copying boxed data within memory and so in those cases the boxing shouldn't be an issue either.
Have you actually measured the performance difference? Does using ICollection for iterating really cause a significant slowdown in your algorithm?
All of this, of course, assumes that there's a good reason IEnumerable<> doesn't work for you. If it does, the point is moot.
> [...] > So, I'm forced to write four overloads: [quoted text clipped - 5 lines] > > Now I can do what I want, but only at substantially increased effort. I'm not really convinced that the increased effort is substantial. But I guess that's in the eye of the beholder. At the very least, you only need three overloads, because you can handle List<> and LinkedList<> with an ICollection<>. Just because you need the explicit overload for Stack<> and Queue<>, that's not to say you need to use it for everything else.
So there's a 25% reduction in your substantially increased effort right there. No need to thank me. :)
Personally, I don't see how writing separate overloads is all that much extra effort. Any code that is the same between the overloads you should be able to share in a common method somewhere, and of course the code that the framework requires to be different will have to be different, but that should amount to one line (the line that actually adds an item to the collection).
Heck, for that matter, the implementation of the Stack<> and Queue<> versions could easily just create a temporary List<> instance, use the base ICollection<> implementation you use for actual List<> and LinkedList<> instances, and then copy the results into the Stack<> or Queue<> instance using the push/enqueue methods. Yes, there's a doubling of effort because of the extra copy but, again, assuming this data is coming from somewhere (like a disk or network) that overhead should be tiny compared to the bandwidth of the i/o source.
If that overhead is unacceptable to you, see above for an alternative that doesn't have that overhead.
But, assuming that the effort seems too substantial, you always have the option of implementing your own generic queue and stack classes that do support ICollection<>. Just because the built-in classes don't do what you want, that doesn't mean you're stuck with them.
Pete
Michi Henning - 29 Aug 2007 22:44 GMT > > ICollection incurs the cost of boxing and unboxing, whereas > > ICollection<T> does not. > > Sorry, I'm still confused. As near as I can tell (from the rest of your > reply) you can't even use ICollection, since it's a read-only interface. The example I quoted is about unmarshaling. However, I also have the opposite, where I need to marshal the containers and serialize them onto the wire. In this case, I have to iterate over the containers to pull out the elements.
> > If I iterate over a collection of ints (or > > other value types), that's significant. > > Your example was of adding data to the collection, which ICollection > doesn't support. Are you actually also using the enumerator for the > collection to iterate through the elements? Yes.
> And if so, why not just use > the IEnumerable<> generic interface there? It's implemented by all four > generic classes you're talking about. IEnumerable<T> works for getting at the elements for all four containers, yes. However, when I marshal a collection onto the wire, I have to write the number of elements, followed by the actual elements. If I pass the collections as IEnumreable<T>, I can no longer get at the count of elements, because that count is part of ICollection<T>, not IEnumerable<T>.
> In any case, you should measure the cost of boxing before you worry > about it. In particular, I suspect that unless _all_ you're doing is [quoted text clipped - 4 lines] > of copying boxed data within memory and so in those cases the boxing > shouldn't be an issue either. Your expectation is incorrect. I have measured this by marshaling in one process, sending the data via TCP/IP to another process, and unmarshaling it there. Boxing makes transmission and receipt of sequences of 50,000 ints slower by a factor of 5.5.
> > So, I'm forced to write four overloads: > [quoted text clipped - 6 lines] > > I'm not really convinced that the increased effort is substantial. I don't need to do this only for sequences of int, but also for sequences of bool, byte, string, etc. I have ten types overall. So, instead of having to write ten methods, I have to write forty. I believe that writing forty methods instead of ten is substantial extra effort, even if the method implementations are trivial.
> I guess that's in the eye of the beholder. At the very least, you only > need three overloads, because you can handle List<> and LinkedList<> [quoted text clipped - 4 lines] > So there's a 25% reduction in your substantially increased effort right > there. No need to thank me. :) There are other reasons why I need to distinguish between List<> and LinkedList<>, so I still need the overload.
But that's beside the point: the core problem is that we have four containers in the framework, but two of those containers cannot be filled generically. I see no good reason for this limitation.
> Heck, for that matter, the implementation of the Stack<> and Queue<> > versions could easily just create a temporary List<> instance, use the [quoted text clipped - 4 lines] > data is coming from somewhere (like a disk or network) that overhead > should be tiny compared to the bandwidth of the i/o source. Again, your assumption is incorrect. I have tried this and benchmarked it, and it is slower by something like a factor of two, depending on the element type. The code I'm writing is performance critical, so this matters.
> If that overhead is unacceptable to you, see above for an alternative > that doesn't have that overhead. I'm not sure what alternative you are referring to.
> But, assuming that the effort seems too substantial, you always have the > option of implementing your own generic queue and stack classes that do > support ICollection<>. Just because the built-in classes don't do what > you want, that doesn't mean you're stuck with them. I don't see the sense of that argument. There are queue and stack implementations in the framework. They should be usable, but are not. Writing my own is no good because the code I'm talking about is part of a library that application code links with. It is that application code that wants to use stacks and queues and pass them to other libraries, that expect them to be the .NET versions, not some homegrown version. If I do what you suggest, the application would have to copy between the homegrown versions and the .NET versions as it passes these containers around. Doing this is unacceptable, both because it is messy in the code because it incurs an unacceptable performance penalty.
Cheers,
Michi.
Peter Duniho - 30 Aug 2007 01:25 GMT > [...] >> And if so, why not just use [quoted text clipped - 7 lines] > the count of elements, because that count is part of ICollection<T>, > not IEnumerable<T>. Surely the caller could just pass the count as a parameter. I mean, you also could obviously enumerate the collection twice, but I don't think that should be necessary.
> Your expectation is incorrect. I have measured this by marshaling in > one process, sending the data via TCP/IP to another process, and > unmarshaling it there. Boxing makes transmission and receipt of > sequences of 50,000 ints slower by a factor of 5.5. Could you post a concise-but-complete example of code that demonstrates this difference?
That difference is, obviously, far larger than I would have expected. It's large enough for me to wonder if there's not something else going on. But it's hard to comment without seeing the code.
>>> So, I'm forced to write four overloads: >>> void fillWithData(List<T> c) [quoted text clipped - 9 lines] > believe that writing forty methods instead of ten is substantial extra > effort, even if the method implementations are trivial. So, your overloads are not actually generic, even though that's what you posted above? Okay.
I submit that in the time it's taken you to discuss the shortcomings of ICollection versus ICollection<>, you could have had all forty methods written.
Any of the "interesting" code should be shared, and so should only have to be written ten times (once for each non-generic type you have to deal with). The "uninteresting" code that has to be written four times should be able to simply be copied and pasted, replacing each non-generic type as necessary in each group of copied methods.
Only the "uninteresting" part is overhead anyway (you'd have to write the ten versions regardless), and even estimating conservatively, it shouldn't take more than ten minutes to do the whole thing (one minute per copied-and-pasted block of code, including the necessary search-and-replace operation).
> There are other reasons why I need to distinguish between List<> and > LinkedList<>, so I still need the overload. Now I'm really confused. If you need the overload anyway, then in what way is the difference between ICollection and ICollection<> relevant _at all_?
I mean, I see the theoretical difference. But it apparently doesn't apply to you anyway. Even if ICollection<> was implemented by Stack<> and Queue<> you'd have three different methods anyway (assuming that just as List<> and LinkedList<> need to be handled separately, they each would need to be separate from Stack<> and Queue<>).
> But that's beside the point: the core problem is that we have four > containers in the framework, but two of those containers cannot be > filled generically. I see no good reason for this limitation. I'm sorry that even after the explanations you still cannot understand the justification. Suffice to say, opinions obviously vary and in this case yours is in the minority.
>> Heck, for that matter, the implementation of the Stack<> and Queue<> >> versions could easily just create a temporary List<> instance, use the [quoted text clipped - 9 lines] > the element type. The code I'm writing is performance critical, so > this matters. And again, I don't understand. For copying a Stack<> or Queue<> to a List<> and then processing it that way to double the time it takes to process, _all_ of the processing time would have to be spent managing the structure. And that's assuming that you are iterating through the final List<> without doing anything more than the copy of each element that is required to create the List<> (typically code does something more interesting than just copying the element once).
There's no way that CPU and memory bandwidth is anywhere near as costly time-wise as network or disk i/o. So again, I'm left wanting a concise-but-complete example of code that demonstrates this difference.
>> If that overhead is unacceptable to you, see above for an alternative >> that doesn't have that overhead. > > I'm not sure what alternative you are referring to. Well, one was using IEnumerable<> instead of ICollection<> and another was just biting the bullet and writing your three^H^H^H^H^Hfour overloads.
>> But, assuming that the effort seems too substantial, you always have the >> option of implementing your own generic queue and stack classes that do >> support ICollection<>. Just because the built-in classes don't do what >> you want, that doesn't mean you're stuck with them. > > I don't see the sense of that argument. Why not? Sometimes, one finds that the framework-supplied classes just don't do what you want. In those cases, you generally either extend the class (by inheritance in .NET 2.0) or write your own. That's just a fact of life.
> There are queue and stack > implementations in the framework. They should be usable, but are not. I don't think it's helpful to say that the implementations aren't usable. Plenty of people use them all the time. Your own use of them is a relatively esoteric issue, and one that most people don't run into.
> Writing my own is no good because the code I'm talking about is part > of a library that application code links with. It is that application [quoted text clipped - 5 lines] > because it is messy in the code because it incurs an unacceptable > performance penalty. I understand "messy", at least to the extent that you might think that a simple search-and-replace of the words "Stack<" and "Queue<" with "MyStack<" and "MyQueue<" is messy (personally, I don't think that's a terrible requirement, but whatever).
I certainly don't understand your claim of "an unacceptable performance penalty". The whole point would be to avoid the performance penalty you are complaining about with respect to ICollection. There's no reason that your own stack or queue implementation should be any slower than the .NET implementations.
For reading from the collections, it seems to me that IEnumerable<> is a fine solution, and completely avoids whatever performance issues using ICollection might introduce (I'm skeptical that ICollection itself could be responsible for such a significant difference, but that's between you and your profiler I guess...without example code, it's impossible to discuss the specifics).
For writing to the collections, ICollection won't work anyway, so any performance issue is irrelevant. You can either implement a stack and queue yourself, or you can just spend the ten minutes writing the overloads you need.
Either way, I just don't see the big deal. It's certainly not enough of a problem to go around saying that the Stack<> and Queue<> classes are simply useless. You may not like the implementations, but it's just not productive to go projecting your own frustrations onto everyone else. Not everyone has the same difficulties you're having.
Pete
Michi Henning - 30 Aug 2007 22:06 GMT On Aug 30, 10:25 am, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
> > IEnumerable<T> works for getting at the elements for all four > > containers, yes. However, when I marshal a collection onto the wire, I [quoted text clipped - 6 lines] > also could obviously enumerate the collection twice, but I don't think > that should be necessary. Well, what's nicer?
prx.func(mySequence);
or
prx.func(mySequence, mySequence.Count);
Not exactly ergonomic, and also inconsistent with how we do this in Java, C++, Python, and Ruby.
> > Your expectation is incorrect. I have measured this by marshaling in > > one process, sending the data via TCP/IP to another process, and [quoted text clipped - 3 lines] > Could you post a concise-but-complete example of code that demonstrates > this difference? No, not really. This is in the context of a complete client and server written with Ice. I can assure you though that the measurements are accurate and conscienscous. If you like to find out for yourself, grab a copy of Ice from www.zeroc.com, and create a client and server to benchmark. (The throughput demo is a good starting point.) The modify the demo to use generics instead by editing the generated code.
> That difference is, obviously, far larger than I would have expected. > It's large enough for me to wonder if there's not something else going > on. But it's hard to comment without seeing the code. I can assure you that there is nothing else going on. I've taken these measuremeants with arrays, collections that are derived from CollectionBase, and generics.
> I submit that in the time it's taken you to discuss the shortcomings of > ICollection versus ICollection<>, you could have had all forty methods > written. I am not here to argue. I noticed a deficiency in the collection classes and pointed it out. What I'm trying to achieve can be done with ease in both Java and C++, but it can't be achieved with C#. That tells me that there is something wrong with the collection classes in .NET, not with what I'm trying to do.
> Either way, I just don't see the big deal. It's certainly not enough of > a problem to go around saying that the Stack<> and Queue<> classes are > simply useless. I don't recall having said that they are useless, only that they don't behave like their list and linked list counterparts for generic insertion.
> You may not like the implementations, but it's just not > productive to go projecting your own frustrations onto everyone else. > Not everyone has the same difficulties you're having. Projecting frustrations? I thought this group was about the .NET Framework? (If I want a psychoanalyst, I already know where to find one.)
Anyway, I can do what I want with ease in C++ and Java, but I can't do it in C#. You keep telling me that this is my fault and has nothing to do with C#. I guess we'll just have to disagree.
Michi.
Peter Duniho - 31 Aug 2007 02:26 GMT > Well, what's nicer? > [quoted text clipped - 3 lines] > > prx.func(mySequence, mySequence.Count); I don't see any significant difference. Other than one works and the other doesn't. I generally try to write code that works, so there is that.
> Not exactly ergonomic, and also inconsistent with how we do this in > Java, C++, Python, and Ruby. Um, whatever. There's lots of stuff in every language that doesn't have an exact analog in other languages. I don't see that as a problem.
>> Could you post a concise-but-complete example of code that demonstrates >> this difference? > > No, not really. Well, there's not much else I can offer in the way of advice then. Suffice to say, I am suspicious of the numbers, even if you did take care in measuring them. I remain unconvinced that boxing incurs the penalty you say it incurs, never mind that boxing appears to be a red herring anyway (since there's no need to use the class that boxes anyway).
I don't doubt the confidence you have in your measurements. But until I see code that reproduces the result myself, I'm going to suspect a mistake in the measurement somewhere. A 5x difference is just too severe to be real.
> [...] > I am not here to argue. I noticed a deficiency in the collection > classes and pointed it out. What I'm trying to achieve can be done > with ease in both Java and C++, but it can't be achieved with C#. That > tells me that there is something wrong with the collection classes > in .NET, not with what I'm trying to do. I can't speak for Java, but C++ isn't a framework. It is not in any way comparable to .NET, which is what's being discussed here, nor is really actually as hard as you claim to do what you want using .NET or C# (depending on the specific approach you want to take).
>> Either way, I just don't see the big deal. It's certainly not enough of >> a problem to go around saying that the Stack<> and Queue<> classes are [quoted text clipped - 3 lines] > behave like their list and linked list counterparts for generic > insertion. You wrote (and I quoted, though you snipped it out of your reply):
> There are queue and stack implementations in the framework. They should be usable, but are not. How can something not be usable, and yet also not be useless?
It seems to me that you definitely did write that the classes were useless.
>> You may not like the implementations, but it's just not >> productive to go projecting your own frustrations onto everyone else. >> Not everyone has the same difficulties you're having. > > Projecting frustrations? I thought this group was about the .NET > Framework? It is. But your comments aren't really so much about .NET Framework so much as they are about how much you don't like .NET Framework and how much you think everyone else shouldn't like it either.
If you had written simply that .NET Framework isn't meeting your needs, fine. But you're making absolute statements about things that you feel are flaws in .NET Framework. You are projecting.
> [...] > Anyway, I can do what I want with ease in C++ and Java, but I can't do > it in C#. You keep telling me that this is my fault and has nothing to > do with C#. I guess we'll just have to disagree. I suppose. I disagree with your characterization of me having placed blame somewhere, but I admit that nothing I can write about .NET appears to have any effect on influencing your opinion.
Perhaps I should have taken your initial hints of being prejudiced (see "gratuitous") to heart and ignored the thread altogether.
I have pointed out a number of alternatives, none of which are really all that hard. If you don't find them suitable, then I guess my attempt to assist was pointless. I'm sorry for having wasted your time.
Pete
Dilip - 31 Aug 2007 02:55 GMT > I have pointed out a number of alternatives, none of which are really > all that hard. If you don't find them suitable, then I guess my attempt > to assist was pointless. I'm sorry for having wasted your time. FWIW, in a private communication, a couple of folks from the C# team did confirm that Michi's gripe is legitimate and the design of ICollection and ICollection<> leaves a lot to be desired. In fact someone has even filed a bug about a variation of the same problem with Microsoft and has received sympathetic response from them (but they are unable to do anything because its constituting a breaking change at this stage).
I have known Michi for a while -- he has enormous respect in the middleware community. You could Google his name just for kicks (or in comp.object.corba if you want). So if he says boxing incurs that kind of penalty he sure as hell knows what he is talking about.
Peter Duniho - 31 Aug 2007 03:04 GMT > FWIW, in a private communication, a couple of folks from the C# team > did confirm that Michi's gripe is legitimate and the design of > ICollection and ICollection<> leaves a lot to be desired. I never said the design was perfect. I'm saying it doesn't seem to me to be nearly as intractable as Michi claims, nor do I find the issues being complained about to be nearly as arbitrary as the tone in the thread implies.
> [...] > I have known Michi for a while -- he has enormous respect in the > middleware community. You could Google his name just for kicks (or in > comp.object.corba if you want). So if he says boxing incurs that kind > of penalty he sure as hell knows what he is talking about. Whatever. I don't code by reputation.
If there's really the kind of over-the-wire slowdown claimed, it should be simple enough to reproduce. If no such code is forthcoming, I reserve my right to be skeptical, regardless of how highly regarded a particular individual is.
I'm happy to be shown. I'm not happy to have someone just say "trust me, it's as bad as I say it is".
Pete
Dilip - 31 Aug 2007 04:12 GMT > > FWIW, in a private communication, a couple of folks from the C# team > > did confirm that Michi's gripe is legitimate and the design of [quoted text clipped - 4 lines] > being complained about to be nearly as arbitrary as the tone in the > thread implies. I have no problems with that. It just means you are entitled to your opinion the same way he is entitled to his.
> > I have known Michi for a while -- he has enormous respect in the > > middleware community. You could Google his name just for kicks (or in > > comp.object.corba if you want). So if he says boxing incurs that kind > > of penalty he sure as hell knows what he is talking about. > > Whatever. I don't code by reputation. You don't "code" by reputation? I don't know what that means.
> If there's really the kind of over-the-wire slowdown claimed, it should > be simple enough to reproduce. Why? Maybe that kind of slowdown is intertwined inside that enormous code base that is not easily extractable. You don't have any experience with the Ice code base nor do you have any idea how complicated it can be. What makes you think its "simple" enough to take it out and reproduce? If Don Box says that some obscure corner of the WCF plumbing doesn't perform as expected would you ask him to either produce the code and prove it or else to keep his mouth shut? He architected it for Christ's sake -- he should know!
In any case you were already told how you can reproduce the problem yourself. If you don't want to do it, then that's your problem. People have better things to do with their time than establish their honesty to you.
> If no such code is forthcoming, I > reserve my right to be skeptical, regardless of how highly regarded a > particular individual is. If someone were posting with the deliberate intention of creating FUD you would be right. Which is why reputation precedes an individual. If Joe Average Programmer happens across this newsgroup and starts claiming .NET doesn't make coffee for him you are entitled to be skeptical. If someone as reputed as Michi Henning claims that there is a penalty incurred in *his* product that *he* created from the ground up, then questioning that is simply being argumentative for the sake of it.
Peter Duniho - 31 Aug 2007 04:36 GMT > I have no problems with that. It just means you are entitled to your > opinion the same way he is entitled to his. As long as opinions are stated as such, that's fine. Michi has not qualified his comments as opinion.
> You don't "code" by reputation? I don't know what that means. I mean that I don't see the point in bringing up someone's reputation as a way justifying some technical point. Technical points, by definition, can be justified in an objective way without needing to consider someone's reputation.
>> If there's really the kind of over-the-wire slowdown claimed, it should >> be simple enough to reproduce. > > Why? Maybe that kind of slowdown is intertwined inside that enormous > code base that is not easily extractable. If that's true, then the slowdown is not due solely to the boxing.
Conversely, if boxing is truly responsible for the slowdown, it would be reproducible without all the other stuff.
> You don't have any > experience with the Ice code base nor do you have any idea how > complicated it can be. It doesn't matter. If using Ice is required to reproduce the problem, it is hardly reasonable to blame the entire performance issue on .NET's boxing of value types.
> [...] > In any case you were already told how you can reproduce the problem > yourself. If you don't want to do it, then that's your problem. Actually, it's not my problem. I'm not the one who is having trouble figuring out how to get .NET's Stack<> and Queue<> classes to fit into my code. I don't have a problem to solve here.
> People have better things to do with their time than establish their > honesty to you. Who said anything about honesty? I'm not questioning anyone's honesty, and it's insulting and misleading for you to accuse me of doing so.
> [...] > If someone as reputed as Michi Henning claims that there > is a penalty incurred in *his* product that *he* created from the > ground up, then questioning that is simply being argumentative for the > sake of it. I have never said that I doubt he has noticed a penalty. What I said I doubt is that the penalty can be blamed on boxing.
As far as being "argumentative for the sake of it" goes, well...I find both Michi's and your own behavior to fall squarely into that category. You don't even have a horse in this race, and Michi continues to discount perfectly reasonable alternatives suggested to him (including the most obvious one, pointing out that simply writing all of the overloads he needs is NOT really all that much work).
Pete
Dilip - 31 Aug 2007 14:02 GMT On Aug 30, 10:36 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
> You don't even have a horse in this race, That much is certainly true. I guess my objective was to just inform you that I did converse with a few Microsoft folks and they did admit that for Michi's particular use-case, the way ICollection and friends are designed today, solutions _are_ a bit awkward. Thats not to say a workaround is impossible, I am just saying there is no straightforward way to do it when other languages seem to not suffer from this problem.
I should have stopped there. But I guess I got carried away. Michi certainly doesn't need _me_ to defend himself. He is quite capable of running circles around me!
Peter Duniho - 31 Aug 2007 17:59 GMT > That much is certainly true. I guess my objective was to just inform > you that I did converse with a few Microsoft folks and they did admit > that for Michi's particular use-case, the way ICollection and friends > are designed today, solutions _are_ a bit awkward. And I agree with that. I just don't agree that the awkwardness is gratuitous or that it renders the classes useless.
> Thats not to say a > workaround is impossible, I am just saying there is no straightforward > way to do it when other languages seem to not suffer from this > problem. I guess that depends on your definition of "straightforward". It's certainly a matter of opinion, and I think the workarounds I've offered here are pretty straightforward (including the suggestion to just write the overloads and be done with it).
Others may disagree, and I have no problem with that. It's the more general impugning of the .NET Framework that I find annoying, especially when it implies that everyone should share those feelings of discontentment.
Pete
briangru@microsoft.com - 29 Aug 2007 00:39 GMT On serialization, you might find this post useful. (Please put any follow-up comments on the MSDN forum - I don't read newsgroups.)
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2067909&SiteID=1&mode=1
I think a review of serialization would help a bit. Let's look at these types a little more closely. Here's my view of these types:
[DebuggerTypeProxy(typeof(Mscorlib_CollectionDebugView<>))] [DebuggerDisplay("Count = {Count}")] [Serializable()] <------------------------- public class List<T> : IList<T>, System.Collections.IList
[Serializable()] <------------------------- [System.Runtime.InteropServices.ComVisible(false)] [DebuggerTypeProxy(typeof(System_CollectionDebugView<>))] [DebuggerDisplay("Count = {Count}")] public class LinkedList<T>: ICollection<T>, System.Collections.ICollection, ISerializable, IDeserializationCallback {
Remember that serializability of a type, at least for the CLR's BinaryFormatter & SoapFormatter, is defined by the presence of the SerializableAttribute on the type. ISerializable is more of an implementation detail of the type, meaning that the type either has need of some special versioning constraints around compatibility with previous versions of its serialized format on disk (or over the network), or that it may explicitly need some additional code to run on deserialization to restore internal invariants. Sometimes these needs can be met by marking fields as optionally serializable, but ISerializable's more general methods were necessary.
IDeserializationCallback takes this a step further, saying that deserialization of this type is so special, we must explicitly run some code after the rest of the object graph has been deserialized. As an example of where this is necessary, consider a hash table that must call GetHashCode on all keys within the hash table. Note that we can't persist out the return value of GetHashCode, since hash functions are likely to be improved from one version of a library to another. So we must call GetHashCode on the key sometime during deserialization, but after the individual key has been deserialized. If the hash table is deserialized before the keys are deserialized, then calling GetHashCode may return the wrong value. IDeserializationCallback allows you a chance of working around simple ordering problems, by providing a post-deserialization callback.
The only confusing thing about ISerializable and IDeserializationCallback are that if a base type implements these interfaces, it suggests that derived types should be serializable (ie, they should have the SerializableAttribute on them), may need to implement a deserialization constructor, and if they override these methods, they may need to call the base type's implementation in the appropriate places. The most common place where this shows up in our library is Exception - very few developers remember the deserialization constructor on their own exceptions (which only shows up as a problem once you start using multiple appdomains, remoting, or the CLR's new add-in model). But the slightly fishy serialization design issue is that the implementation details of the type tell you whether special serialization semantics are required, and this leaks through in the public interface in terms of these marker interfaces & requirements on derived types. Given that serialization of a type isn't something to be done lightly for any type with non-trivial internal state, this requirement on subclasses is well within reason.
Back to our collections - clearly, List<T> and LinkedList<T> are serializable. The original poster's question might then be, how would these type's serialization needs be different? The issue comes down to only implementation details. For List<T>, it's basically a T[], a count, and some other uninteresting state around enumeration. Assuming that T is serializable (and all subclasses of T that are stored in the List are also serializable), then List<T> can be serialized & deserialized safely.
So what about LinkedList<T>? Unlike our hash tables, it doesn't require hash functions with may vary with the CLR or third party library version changes. So why do anything special here? The answer is performance - we don't want to serialize out a list of LinkedListNodes. We instead write out just the interesting data. Now let's turn a critical eye to this class. Do we need the OnDeserialization method? Probably not in this version. Does writing out just the values instead of the complete object graph break scenarios when a consumer of the list held references to individual LinkedListNodes? Yes, this does indeed break the notion of serializing out an entire object graph then being able to recover all of it faithfully, potentially on another machine with a different version of the CLR. But our collection designers apparently felt the performance of serializing out linked lists efficiently outweighed the one scenario we've (perhaps accidentally) disabled.
I hope this helps you to understand serialization more clearly.
Brian Grunkemeyer CLR Base Class Library Team
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|