.NET Forum / Languages / C# / March 2008
C# generic containers from a "C++ perspective"
|
|
Thread rating:  |
Giovanni Dicanio - 21 Mar 2008 13:04 GMT When I have a List<T> in C#, I think that the "list" (i.e.: dynamically growing array) stores "pointers" to T instances, e.g. considering a pseudo-equivalent C++ code, my understanding is that the C# version:
List< MyClass >
is equivalent to something like this in C++:
std::vector< MyClass * >
Is that correct?
Now, the question is: if we have List of "primitive" types (like an 'int', or 'double'...), how are these "primitive types" stored in the List? Are they stored "by value"? Or are they stored "by pointers"?
i.e. the C# version:
List< double >
is equivalent to C++ :
vector< double >
or instead:
vector< double * >
?
Thanks in advance, Giovanni
Anthony Jones - 21 Mar 2008 14:04 GMT > When I have a List<T> in C#, I think that the "list" (i.e.: dynamically > growing array) stores "pointers" to T instances, e.g. considering a [quoted text clipped - 7 lines] > > Is that correct? Close enough. All memory for classes are allocated from a heap anyway and all references point to that space in the heap. Hence in C# there is no need to differentiate between a class instance and a class pointer.
> Now, the question is: if we have List of "primitive" types (like an 'int', > or 'double'...), how are these "primitive types" stored in the List? Are [quoted text clipped - 13 lines] > > ? They are stored by value IOW vector< double >.
 Signature Anthony Jones - MVP ASP/ASP.NET
ignacio machin - 21 Mar 2008 15:34 GMT On Mar 21, 8:04 am, "Giovanni Dicanio" <giovanni.dica...@invalid.com> wrote:
> When I have a List<T> in C#, I think that the "list" (i.e.: dynamically > growing array) stores "pointers" to T instances, e.g. considering a [quoted text clipped - 5 lines] > > std::vector< MyClass * > Hi,
They are stored by value. That is the good thing about generics in .NET, you do not need to box (create a reference) a value type. The opposite happens when you use for example ArrayList , in that case the value type is converted to a reference type
Ignacio Machin ( .NET/ C# MVP ) machin TA laceupsolutions.com
Jon Harrop - 21 Mar 2008 16:22 GMT > i.e. the C# version: > [quoted text clipped - 7 lines] > > vector< double * > The former. This is one of the things that C++ screwed up and is fixed in C#, making it much nicer to program in.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Giovanni Dicanio - 21 Mar 2008 18:34 GMT >> i.e. the C# version: >> [quoted text clipped - 10 lines] > The former. This is one of the things that C++ screwed up and is fixed in > C#, making it much nicer to program in. I agree with you. And that is not the only thing done better in C# than in C++ (other things include better string management and built-in support for Unicode strings, reflection, etc.)
However, there are also some things that IMHO are better in C++ than C#, like RAII and deterministic destruction. (The garbage collector is good for memory resources, but there are also *non*-memory resources, like files, sockets, textures... for them garbage collector is useless, and RAII and deterministic destruction are better.)
Giovanni
Ben Voigt [C++ MVP] - 21 Mar 2008 18:42 GMT >> i.e. the C# version: >> [quoted text clipped - 10 lines] > The former. This is one of the things that C++ screwed up and is > fixed in C#, making it much nicer to program in. Go ahead, spread the FUD. Or explain exactly how you feel the .NET (not C#) way is better than the stl containers. Surely the OP would be interested in knowing the differences and advantages of each.
Actually templates and generics are different in a lot of ways, but templates are much more powerful, effectively a superset of generics (modulo some bad error messages which will be fixed in C++0x).
Jon Skeet [C# MVP] - 21 Mar 2008 20:54 GMT <snip>
> Actually templates and generics are different in a lot of ways, but > templates are much more powerful, effectively a superset of generics (modulo > some bad error messages which will be fixed in C++0x). In terms of being a superset, a couple of things I don't *think* you can do with C++ templates:
1) Create a constructed type at execution time via reflection. I don't know how much reflection exists at *all* in C++ (or standard/common libraries) but my understanding of templates would suggest this would be harder.
2) Ship a generic type without any source. I could very well be wrong on this, but my understanding is that when you use a template you need the source code, so the compiler can build the specialized version. In .NET the compiler just says "I want a List<int>" and refers to the *binary* List<T>. This comes with a bunch of corollaries to do with versioning/upgrades, of course.
Please, please correct me if I'm wrong though :)
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet World class .NET training in the UK: http://iterativetraining.co.uk
Ben Voigt [C++ MVP] - 21 Mar 2008 22:28 GMT > <snip> > [quoted text clipped - 9 lines] > libraries) but my understanding of templates would suggest this would > be harder. C++ reflection gives dynamic type testing, but nothing more. COM gives more reflection-like capabilities, as long as you are using type libraries. Using CoCreateInstance or the factory pattern for instantiation, if you compiled the template against an interface type (pure abstract class), you could then use it with any coclass implementing that interface, pretty much the same as a generic. These execution-time constructed types aren't offering compile-time type safety in .NET anyway, so I don't consider it a drawback that C++/COM might require calls to QueryInterface.
> 2) Ship a generic type without any source. I could very well be wrong > on this, but my understanding is that when you use a template you need > the source code, so the compiler can build the specialized version. In > .NET the compiler just says "I want a List<int>" and refers to the > *binary* List<T>. This comes with a bunch of corollaries to do with > versioning/upgrades, of course. The C++ standard supports this, but Visual C++ doesn't. On other platforms that define a C++ ABI, this might give you some ability to upgrade templates without a recompile, but not much.
> Please, please correct me if I'm wrong though :) .NET generics do preserve "type identity" across instantiations in different modules as well, while C++ needs COM to accomplish that.
But this list pales in comparison to the things templates do that generics don't and can't emulate.
Arne Vajhøj - 24 Mar 2008 02:26 GMT > .NET generics do preserve "type identity" across instantiations in different > modules as well, while C++ needs COM to accomplish that. If C++ needs COM to do something then the C++ language/C++ standard environment does not support it.
It is a fair assumption that C# code is running on Windows, but it is a very bad assumption that C++ code is running on Windows.
Arne
Jon Harrop - 22 Mar 2008 02:14 GMT >> The former. This is one of the things that C++ screwed up and is >> fixed in C#, making it much nicer to program in. > > Go ahead, spread the FUD. The design flaws in C++ are widely appreciated.
> Or explain exactly how you feel the .NET (not C#) way is better than the > stl containers. Surely the OP would be interested in knowing the > differences and advantages of each. This has nothing to do with containers or even templates. The OP's confusion stems from the fact that C++ forces you to use pointers in order to make full use of its object system. In reality, pointers are not necessary for OOP.
> Actually templates and generics are different in a lot of ways, but > templates are much more powerful, effectively a superset of generics Turing argument.
Your subjective notion of "power" equates to slow compilation and poor error messages in practice, neither of which are desirable. The advantage of templates is a rather naff form of metaprogramming, long since obsoleted by JIT compilers.
> (modulo some bad error messages which will be fixed in C++0x). And Fortran 2003 will rule the world...
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Arne Vajhøj - 24 Mar 2008 02:31 GMT >> Actually templates and generics are different in a lot of ways, but >> templates are much more powerful, effectively a superset of generics [quoted text clipped - 5 lines] > templates is a rather naff form of metaprogramming, long since obsoleted by > JIT compilers. Try and search for C# generic calculate.
There are plenty of functionality in C++ templates that are not in C# generics.
And JIT does not change anything about that.
Arne
PS: That something is missing from a language does not necessarily mean that it will be good to add it. More is not always better.
Jon Harrop - 24 Mar 2008 05:47 GMT > Try and search for C# generic calculate. I cannot find what you are indicating. Do you have a specific reference?
> There are plenty of functionality in C++ templates that are not > in C# generics. Exactly. C# provides parametric polymorphism in the form of generics (which is one use of C++ templates) and JIT compilation for metaprogramming (which is the other use of templates).
> And JIT does not change anything about that. On the contrary, template metaprogramming in C++ is used to work around the absence of a JIT compiler. This is a fundamental aspect of lots of mainstream C++ libraries like Blitz++ and Boost. Reams have been written on the subject. But it is all a waste of time because run-time code generation is vastly more productive than template metaprogramming: orders of magnitude faster to compile and easier to debug.
> PS: That something is missing from a language does not necessarily mean > that it will be good to add it. More is not always better. The problem is that people like Ben Voigt spread mindless propaganda like "templates are much more powerful, effectively a superset of generics", implying that the Turing completeness of templates makes them superior with no regard for the crippling degradation in programming productivity incurred by slow compile times and awful error messages in C++.
In practice, the usability of generics easily outweights the benefits of templates.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 24 Mar 2008 14:54 GMT >> Try and search for C# generic calculate. > > I cannot find what you are indicating. Do you have a specific > reference? C# generics cannot possibly use any feature of the type argument that is not expressed as an interface implementation. Some things, like constructor parameters, cannot be expressed with interfaces, while others, like simple arithmetics, aren't on the built-in types. Whatever the reason, if there's no interface, C# generics can't do it.
Just so you understand fully, I challenge you to create a .NET generic for a moving average (ring buffer of N entries that returns the mean) which can be instantiated on both double and decimal (the only operations needed are add and divide). For bonus points make it work with any UDT that defines those two operations.
The C++ template is both more straightforward and more efficient.
>> There are plenty of functionality in C++ templates that are not >> in C# generics. [quoted text clipped - 7 lines] > On the contrary, template metaprogramming in C++ is used to work > around the absence of a JIT compiler. This is a fundamental aspect of Or gain the same capabilities but precompiled, so the deployed application is lightweight and ready to run immediately. JIT is a disaster if you need fast startup.
> lots of mainstream C++ libraries like Blitz++ and Boost. Reams have > been written on the subject. But it is all a waste of time because > run-time code generation is vastly more productive than template > metaprogramming: orders of magnitude faster to compile and easier to > debug. More FUD, especially the "easier to debug" part is just outright wrong. It's like saying that dynamic languages are more productive -- well they are for small tasks, but for larger ones the benefits of static typing outweigh the additional effort.
As an example of how much "easier to debug" run-time generated code is, try just getting a stack trace of a .NET program from a minidump. It's both far more complicated a user interface and far more complicated underneath, plus it won't work with a selective dump, you need all the internal JIT data structures included, bloating your dumps by hundreds of megabytes.
>> PS: That something is missing from a language does not necessarily >> mean that it will be good to add it. More is not always better. [quoted text clipped - 8 lines] > In practice, the usability of generics easily outweights the benefits > of templates. Jon Harrop - 24 Mar 2008 22:21 GMT > C# generics cannot possibly use any feature of the type argument that is > not > expressed as an interface implementation. Some things, like constructor > parameters, cannot be expressed with interfaces, while others, like simple > arithmetics, aren't on the built-in types. Whatever the reason, if > there's no interface, C# generics can't do it. Sure. I'm not contesting that templates try to solve problems that generics do not. I am saying that all of the other problems solved by templates are better solved by other techniques, most notably explicit code generation. Moreover, in the overlap between generics and templates, generics are a clear win because they are so much easier to use.
> Just so you understand fully, I challenge you to create a .NET generic for > a moving average (ring buffer of N entries that returns the mean) which [quoted text clipped - 4 lines] > > The C++ template is both more straightforward and more efficient. You have drawn that conclusion before looking at any evidence.
Let's look at some real examples. For trivial cases like the (useless) static factorial function you can encode your metaprograms in C++ templates:
http://aszt.inf.elte.hu/~gsd/halado_cpp/ch06s04.html
Here is their factorial implementation:
#include <iostream> template <int N> struct Factorial { enum { value = N * Factorial<N-1>::value }; }; template <> struct Factorial<1> { enum { value = 1 }; }; int main() { const int fact5 = Factorial<15>::value; std::cout << fact5 << endl; return 0; }
This is useless because you can just use a lookup function.
Now let's look at something non-trivial, like a high-performance FFT implementation. The fastest cross-platform FFT implementation is FFTW, which is C code generated by an OCaml program.
Many C++ advocates have tried to do the same code generation using C++ templates because it would make such a compelling example of templates actually being good for metaprogramming. However, every single person to have ever attempted to do this has completely and utterly failed. Dr Dobb's even published an article on exactly this:
http://www.ddj.com/cpp/199500857
If you look at the OCaml source code of FFTW you will see many non-trivial symbolic optimizations implemented using pattern matching over variant types that optimize the generated code in ways that are prohibitively difficult to implement using C++ templates.
The fact that C++ templates are theoretically capable of doing this because they are Turing complete is useless because it is practically impossible to do so because templates are so cumbersome to use for metaprogramming.
>> On the contrary, template metaprogramming in C++ is used to work >> around the absence of a JIT compiler. This is a fundamental aspect of > > Or gain the same capabilities That is wrong. C++ templates cannot perform specialization based upon run-time values.
> but precompiled, JIT provides both precompilation and run-time compilation. C++ only provides precompilation.
> so the deployed application is lightweight and ready to run immediately. That is also wrong. Because C++ templates cannot handle run-time values the only efficient solution is to precompile every conceivable permutation of inputs. This results in combinatorial code explosion with incredibly heavy-weight libraries and very slow loading times. C++ is famously bad for this.
In contrast, JIT compilation can lazily compile code as and when it is needed. This results in leaner libraries and executables that load faster and even work cross-platform.
> JIT is a disaster if you need fast startup. If your application requires submillisecond startup times, sure.
>> lots of mainstream C++ libraries like Blitz++ and Boost. Reams have >> been written on the subject. But it is all a waste of time because [quoted text clipped - 6 lines] > are for small tasks, but for larger ones the benefits of static typing > outweigh the additional effort. I'm just going to pull rank here. You need to learn some decent languages. Look at modern statically-typed functional programming languages for a start.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 24 Mar 2008 23:01 GMT >> C# generics cannot possibly use any feature of the type argument >> that is not [quoted text clipped - 19 lines] >> >> The C++ template is both more straightforward and more efficient. [snip talking about OCaml]
> I'm just going to pull rank here. You need to learn some decent > languages. Look at modern statically-typed functional programming > languages for a start. Last I checked we were comparing C++ templates to .NET generics. Are you claiming that you could insert F# everywhere you said OCaml and all your claims remain true? Last I heard you were complaining about the miserable performance of F#'s allocator.
You need to learn about .NET generics before you start saying they are oh-so-much-better than C++ templates. They aren't. Maybe OCaml generic programming is, I haven't played with that for a while, but definitely .NET generics fall far short.
Also you completely avoided my example.
For that matter, I don't think the OCaml moving average is going to beat the C++ template either, for performance or readability. Plus anyone trying to do an FFT with metaprogramming is doing it for the challenge, no other reason. A much better way is selecting between several pre-optimized routines based on a couple of conditions like magnitude of length and whether the length is a power of 2.
For example, how much .NET code (pick your language) would you need to accomplish the same thing as this?
template <typename Integral> inline bool isPowerOfTwo(Integral const n) { return (n & (n-1)) == 0; }
Jon Harrop - 25 Mar 2008 05:54 GMT > Last I checked we were comparing C++ templates to .NET generics. Yes. In the process, you brought up a problem (metaprogramming) that templates can solve but generics do not. My point is that .NET languages already have better ways to solve that problem, independently of generics.
> Are you claiming that you could insert F# everywhere you said OCaml and > all your claims remain true? Yes.
> Last I heard you were complaining about the miserable performance of F#'s > allocator. The single-threaded symbolic optimizer from FFTW would currently be ~5x slower in F# than it is in OCaml. There is no easy way to improve this.
> You need to learn about .NET generics before you start saying they are > oh-so-much-better than C++ templates. They aren't. The error reporting from generics is much better than templates.
> Maybe OCaml generic programming is, I haven't played with that for a > while, but definitely .NET generics fall far short. That's because you're trying to use a tool for parametric polymorphism for code generation. They are completely different problems best solved by completely independent solutions like generics and JIT compilation.
> Also you completely avoided my example. Your problem statement does not make sense to me. If you want the moving average to work properly over ints and floats then you must use different algorithms. Or do you want to cast the ints to floats?
> For that matter, I don't think the OCaml moving average is going to beat > the C++ template either, for performance or readability. On what basis?
> Plus anyone trying to do an FFT with metaprogramming is doing it for the > challenge, no other reason. On the contrary, this is how professionals write production-quality code. FFTW has millions of users and is at the core of one of the world's most successful technical computing environments, MATLAB.
> A much better way is selecting between several pre-optimized > routines based on a couple of conditions like magnitude of length and > whether the length is a power of 2. All of the world's best FFT implementations used to do that. Actually, they even wrote dozens of painstakingly hand-optimized routines and chose between them at run-time.
Then FFTW came along with its thousands of automatically optimized routines (codelets) and beat all other implementations hands down on performance whilst remaining platform independent.
> For example, how much .NET code (pick your language) would you need to > accomplish the same thing as this? [quoted text clipped - 4 lines] > return (n & (n-1)) == 0; > } In F#:
let inline is_pow2(n : #Math.INumeric) = n &&& (n - n.One) = n.Zero
As long as you stick to trivial examples you won't see the downside of C++ templates.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 25 Mar 2008 14:32 GMT >> Last I checked we were comparing C++ templates to .NET generics. > [quoted text clipped - 19 lines] > > The error reporting from generics is much better than templates. That is a known issue which I spoke of in my first post in this thread. C++0x fixes that too.
>> Maybe OCaml generic programming is, I haven't played with that for a >> while, but definitely .NET generics fall far short. [quoted text clipped - 3 lines] > problems best solved by completely independent solutions like > generics and JIT compilation. Generics, which are early bound, don't provide parametric polymorphism. Templates do.
>> Also you completely avoided my example. > > Your problem statement does not make sense to me. If you want the > moving average to work properly over ints and floats then you must > use different algorithms. Or do you want to cast the ints to floats? What part of "both double and decimal" didn't you understand?
And why would different algorithms be needed? In C++ the same algorithm will work for both int and float (until overflow becomes a problem, but the accumulator can be a different type) except for roundoff, which is expected for int.
>> For that matter, I don't think the OCaml moving average is going to >> beat the C++ template either, for performance or readability. [quoted text clipped - 19 lines] > routines (codelets) and beat all other implementations hands down on > performance whilst remaining platform independent. through the use of JIT?
No, as I recall the comparison process is very slow (I may be thinking of ATLAS rather than FFTW), and the compile time is only a small fraction of that.
>> For example, how much .NET code (pick your language) would you need >> to accomplish the same thing as this? [quoted text clipped - 9 lines] > let inline is_pow2(n : #Math.INumeric) = > n &&& (n - n.One) = n.Zero Oh, F# defines the bitwise and operation for all types it pretends to implement INumeric for? It's not part of the INumeric interface. It's not even part of the IIntegral interface. And it must turn that into something pretty ugly in order for the CLR to accept it.
> As long as you stick to trivial examples you won't see the downside > of C++ templates. If they are so trivial, how come .NET generics can't solve them?
Jon Harrop - 25 Mar 2008 16:37 GMT > That is a known issue which I spoke of in my first post in this thread. > C++0x fixes that too. Has C++0x ever been implemented?
> Generics, which are early bound, don't provide parametric polymorphism. I suggest you read up on parametric polymorphism.
> What part of "both double and decimal" didn't you understand? "decimal" because it is meaningless.
> And why would different algorithms be needed? Because these types have completely different properties in this context:
. Ints do not accumulate round-off error. Floats do.
. The "/" operator you use is Euclidean quotient for ints and an approximation to real division for floats.
> In C++ the same algorithm > will work for both int and float (until overflow becomes a problem, but > the accumulator can be a different type) except for roundoff, which is > expected for int. As I thought, your algorithm is broken. This is difficult to demonstrate because you failed to provide working C++ code.
>> Then FFTW came along with its thousands of automatically optimized >> routines (codelets) and beat all other implementations hands down on >> performance whilst remaining platform independent. > > through the use of JIT? FFTW precompiles the most common codelets and puts them in its standard library. With a JIT, you can compile any codelet as and when you need it.
> No, as I recall the comparison process is very slow (I may be thinking of > ATLAS rather than FFTW), and the compile time is only a small fraction of > that. That is generally the case, yes.
>>> For example, how much .NET code (pick your language) would you need >>> to accomplish the same thing as this? [quoted text clipped - 12 lines] > Oh, F# defines the bitwise and operation for all types it pretends to > implement INumeric for? No, the INumeric interface is only used for the n.One and n.Zero. The inlining makes this function generic over everything else just like a C++ template.
> It's not part of the INumeric interface. Yes.
> It's not even part of the IIntegral interface. Yes, of course.
> And it must turn that into something pretty ugly in order for the CLR to > accept it. I've no idea how it is compiled but it provides strictly more functionality than your C++ template.
>> As long as you stick to trivial examples you won't see the downside >> of C++ templates. > > If they are so trivial, how come .NET generics can't solve them? How come a ripe banana can't solve them?
Nobody cares when modern languages are more expressive in every respect.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 25 Mar 2008 17:08 GMT >>>> For example, how much .NET code (pick your language) would you need >>>> to accomplish the same thing as this? [quoted text clipped - 16 lines] > inlining makes this function generic over everything else just like a > C++ template. Ok, now I understand.
The ironic thing is that you've proved my point, by showing that F# needed to implement a feature similar to C++ templates in order to overcome the deficiencies in .NET generics.
Jon Harrop - 25 Mar 2008 19:39 GMT >> No, the INumeric interface is only used for the n.One and n.Zero. The >> inlining makes this function generic over everything else just like a [quoted text clipped - 5 lines] > to implement a feature similar to C++ templates in order to overcome the > deficiencies in .NET generics. Exactly. Templates were obsoleted by a variety of alternatives (for parametric polymorphism, inlining and metaprogramming). Pretending that C++ is better because templates provide an unusably poor solution to problems that generics did not even attempt to address is just deceitful propaganda.
As I said, templates can do more metaprogramming than a ripe banana. That information is also of no use but at least it isn't deliberately misleading.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 25 Mar 2008 17:15 GMT >> What part of "both double and decimal" didn't you understand? > > "decimal" because it is meaningless. We are posting in a C# group about .NET.
decimal is a C# keyword just like double. System.Decimal is a standard .NET type.
Before you "pull rank" you should make sure you are familar with the subject matter. http://msdn2.microsoft.com/en-us/library/x53a06bb.aspx http://msdn2.microsoft.com/en-us/library/364x0z75.aspx
Jon Harrop - 25 Mar 2008 19:31 GMT >>> What part of "both double and decimal" didn't you understand? >> [quoted text clipped - 4 lines] > decimal is a C# keyword just like double. > System.Decimal is a standard .NET type. My explanation of why your algorithm is fundamentally broken still stands. If you implement the floating-point and integer algorithms properly you will see that there is nothing worth factoring out between them.
> Before you "pull rank"... Out of context.
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 25 Mar 2008 22:58 GMT >>>> What part of "both double and decimal" didn't you understand? >>> [quoted text clipped - 9 lines] > properly you will see that there is nothing worth factoring out > between them. The implementation would be:
Maintain the last N items in a ring buffer. (This consists of an array of size N and the index of the oldest element). Maintain the sum of all the items in an accumulator. On arrival of a new data point, deduct the oldest datum from the sum, overwrite it with the new one (updating the index), add the new one to the sum, return sum / N.
This works very well for any numeric data type as long as the accumulator doesn't overflow, but the accumulator doesn't need to be the same type as each buffer element.
Jon Harrop - 25 Mar 2008 23:42 GMT >> My explanation of why your algorithm is fundamentally broken still >> stands. If you implement the floating-point and integer algorithms [quoted text clipped - 9 lines] > overwrite it with the new one (updating the index), add the new one to the > sum, return sum / N. That is the fundamentally-broken algorithm that I was expecting you to describe: you have implicitly assumed that addition and subtraction commute, which is true for integers and not floats.
Specifically, your algorithm will suffer from cumulative round-off errors and will not handle special values (e.g. infinities) at all.
> This works very well for any numeric data type as long as the accumulator > doesn't overflow, That is not true. You only need one element 10^17x larger than any other currently in the buffer and you lose all precision in your accumulator from catastrophic round-off error, hundreds of orders of magnitude before any overflow.
> but the accumulator doesn't need to be the same type as each buffer > element. I see. And what type would you recommend for the accumulator? An arbitrary-precision rational, perhaps?
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 26 Mar 2008 14:38 GMT >>> My explanation of why your algorithm is fundamentally broken still >>> stands. If you implement the floating-point and integer algorithms [quoted text clipped - 30 lines] > I see. And what type would you recommend for the accumulator? An > arbitrary-precision rational, perhaps? The issue you describe is theoretically valid, but not applicable to real-world data created via analog-to-digital sampling and linear functions thereof, which is probably somewhere upwards of 99% of all data sets. If it is a problem, you can always recalculate the sum directly from the stored elements.
Jon Harrop - 28 Mar 2008 13:54 GMT > The issue you describe is theoretically valid, but not applicable to > real-world data created via analog-to-digital sampling and linear > functions thereof, which is probably somewhere upwards of 99% of all data > sets. I guess my experience has been tainted by my extensive use of really esoteric mathematical functions like sqrt and even quadratics. Nobody would want to use those so-called non-linear functions in the "real world". Or big numbers. Nobody uses big numbers in the real world either. Or small numbers.
Would you also say that positive numbers account for around 99% of all "real world" numbers too?
 Signature Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?u
Ben Voigt [C++ MVP] - 28 Mar 2008 18:36 GMT >> The issue you describe is theoretically valid, but not applicable to >> real-world data created via analog-to-digital sampling and linear [quoted text clipped - 7 lines] > big numbers. Nobody uses big numbers in the real world either. Or small > numbers. The question isn't whether people use big numbers, or people use small numbers. The question is whether people have both really big and really small numbers in the same dataset. Not often. In datasets where you want to talk about the moving average (via arithmetic mean)... almost never.
> Would you also say that positive numbers account for around 99% of all > "real > world" numbers too? Probably not 99%. If I had to guess I'd say non-negative (as opposed to strictly positive) are around 92%. Still a large enough fraction that an algorithm that doesn't handle any negative numbers is still pretty doggone useful. Which might be why most programming languages have "unsigned" types.
Just because some mathematical number exists that an algorithm can't handle doesn't mean the algorithm is useless.
Marc Gravell - 24 Mar 2008 23:52 GMT > Just so you understand fully, I challenge you to create a .NET generic for a > moving average (ring buffer of N entries that returns the mean) which can be > instantiated on both double and decimal (the only operations needed are add > and divide). For bonus points make it work with any UDT that defines those > two operations. If the mood took me, I'm pretty sure U could take that challenge, and the bonus points for UDT: www.pobox.com/~skeet/csharp/genericoperators.html
(full download available in MiscUtil, and for the record, I also have a .NET 2.0 version as well as the .NET 3.5 version cited)
> The C++ template is both more straightforward and more efficient. Stating it (by itself) isn't exactly conclusive. Besides; C++ templates are compile time; I can make new types and use them with generics at runtime. The JIT isn't perfect, but generally does a pretty good job... pretty flexible.
Marc
Marc Gravell - 24 Mar 2008 23:53 GMT >sure U could oops - I meant "I" could... QWERTY slip...
Marc
Ben Voigt [C++ MVP] - 25 Mar 2008 15:12 GMT >> Just so you understand fully, I challenge you to create a .NET >> generic for a moving average (ring buffer of N entries that returns [quoted text clipped - 15 lines] > generics at runtime. The JIT isn't perfect, but generally does a > pretty good job... pretty flexible. Except that .NET's JIT is neutered when it comes to reference types as generic arguments. Or does your .NET 3.5 Expression-based implementation overcome that as well?
> Marc Marc Gravell - 25 Mar 2008 16:53 GMT It depends on what you mean by neutered - plus in all the cases discussed above (decimal vs double vs int etc) we are talking about value-types, not reference types. As far as the JIT is concerned, it doesn't *need* more than one reference-type IL version, since every reference takes the same size and is accessed the same (possibly losing out on some "sealed" optimisations, but heh...)
As it happens, the generic-operator stuff will cope with either reference-type or value-type without issue (and in particular without boxing) - but if you clarify the question I might be able to give a better answer.
Marc
Jon Skeet [C# MVP] - 25 Mar 2008 17:03 GMT > It depends on what you mean by neutered - plus in all the cases discussed > above (decimal vs double vs int etc) we are talking about value-types, not [quoted text clipped - 8 lines] > boxing) - but if you clarify the question I might be able to give a better > answer. I think what Ben's driving at is that the JIT doesn't inline code which could be inlined for reference types. It might do for value types, because it JITs on a per-type basis, but for reference types it has to assume that any virtual method *might* be overridden, because it's going to use the same native code for all type arguments which are reference types.
There may well be other optimisations with similar issues.
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet World class .NET training in the UK: http://iterativetraining.co.uk
Ben Voigt [C++ MVP] - 25 Mar 2008 17:12 GMT > It depends on what you mean by neutered - plus in all the cases > discussed above (decimal vs double vs int etc) we are talking about [quoted text clipped - 8 lines] > boxing) - but if you clarify the question I might be able to give a > better answer. I was referring to this line in the page you linked me to:
<quote> the JIT-compiler is remarkably good as inlining the simple code that invokes the methods; enough that runtime performance is within a few percent of regular compiled code. </quote>
AFAIK, this sort of inlining doesn't take place at all when the generic argument is a reference type. I was wondering if maybe using the Expression class is causing the JIT to specialize for each different reference type, overcoming the problem.
> Marc Marc Gravell - 25 Mar 2008 21:18 GMT I'm not sure that is a meaningful comparison to standard generics, since operators are static and thus outside of what generics offers by default. But yes; I think (from your and Jon's comments) that this relates back to the "sealed" comment in my post above - it may well miss some optimisations due to not being optimised by type. The generic operator code *has* to be per-type (even for reference-types) simply to exist, since it isn't driven by generics. Basically I compile a delegate on the fly and cache it against the type.
In reality, the cost of invoking a delegate is probably (untested) higher than the cost of a virtual call, so I don't think that the delegate approach is going to be a blanket fix - but it would / perhaps/ merrit testing *if* there is a demonstrable case where the virtual call is actually being a bottleneck.
Marc
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|