.NET Forum / .NET Framework / New Users / August 2005
How to parse various types without a switch?
John - 07 Aug 2005 07:42 GMT Hi, I need to read a big CSV file, where different fields should be converted to different types, such as int, double, datetime, SqlMoney, etc.
I have an array, which describes the fields and their types. I would like to somehow store a reference to parsing operations in this array (such as Int32.Parse, Double.Parse, SqlMoney.Parse, etc), so I can invoke the appropriate one without writing a long switch.
Using reflection is not an option for performance reasons.
I tried to create a delegate, but since Int32.Parse, Double.Parse, etc. all have different return types, creating a common delegate type appears to be impossible.
For now I ended up writing wrappers around Parse methods for each type, such as static object ParseDouble(string str) {return double.Parse(str);} then inserting delegates to these methods into the array.
This seems to work, but it looks pretty ugly. I still hope that the .NET Framework has an official way to do it, which I overlooked. For example, it has the interface IConvertible, which can be used to achieve the opposite: converting an object to various types. But I cannot find an official way to do parsing.
Thank you John
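The wrapper-and-delegate table described in this post can be sketched roughly as follows. This is a minimal illustration, not John's actual code: the names FieldParser, WrapperDemo and ParseRow and the sample row are hypothetical, and parsing with the invariant culture is an assumption.

```csharp
using System;
using System.Data.SqlTypes;
using System.Globalization;

// One delegate type for all wrappers: each returns object, so value types are boxed.
public delegate object FieldParser(string text);

public static class WrapperDemo
{
    // One-line wrappers around the framework Parse methods.
    public static object ParseInt32(string s) { return int.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseDouble(string s) { return double.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseDateTime(string s) { return DateTime.Parse(s, CultureInfo.InvariantCulture); }
    public static object ParseSqlMoney(string s) { return SqlMoney.Parse(s); }

    // Applies the i-th parser to the i-th field: no switch and no reflection.
    public static object[] ParseRow(FieldParser[] columns, string[] fields)
    {
        object[] results = new object[columns.Length];
        for (int i = 0; i < columns.Length; i++)
        {
            results[i] = columns[i](fields[i]);
        }
        return results;
    }

    public static void Main()
    {
        // Hypothetical column layout and sample row.
        FieldParser[] columns = { ParseInt32, ParseDouble, ParseDateTime, ParseSqlMoney };
        string[] fields = { "42", "3.41", "2005-08-07", "19.99" };

        foreach (object value in ParseRow(columns, fields))
        {
            Console.WriteLine("{0}: {1}", value.GetType().Name, value);
        }
    }
}
```

The per-row cost is one delegate call plus one boxing operation per field; the type dispatch happens once, when the columns array is built.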
Cor Ligthert [MVP] - 07 Aug 2005 08:18 GMT John,
You have crossposted (nothing wrong with that); however, I think you would be better off crossposting this to a language newsgroup too, because we don't know which programming language you are using.
The two largest Microsoft developer newsgroups, besides Excel.developer and ASPNET, are
microsoft.public.dotnet.csharp and microsoft.public.dotnet.languages.vb
At the least, when you post to the two newsgroups you are posting to now, tell us which programming language you use.
I hope this helps,
Cor
hB - 07 Aug 2005 10:48 GMT Since Parse() is not declared in an interface, you have to create wrappers. It would be like object Parse(string csvstrword);
(In C we have sscanf, it can read into many datatypes.)
--- hB
John - 07 Aug 2005 21:36 GMT Thank you for the answer.
> since Parse() is not declared in an interface, you have to create wrappers.
Yes, this is what I ended up doing (see my original post):
static object ParseDouble(string str) {return double.Parse(str);} ....
I just wanted to double-check whether I was missing some standard solution and, actually, I was missing Convert.ChangeType, as Klaus H. Probst showed.
John
John - 07 Aug 2005 21:31 GMT Thank you for the answer.
> You have crossposted
It is very hard to figure out the difference between microsoft.public.dotnet.framework and microsoft.public.dotnet.general. What is the proper place to post questions about the library?
> I think you would be better off crossposting this to a language
> newsgroup too, because we don't know which programming language you are using.
> At the least, when you post to the two newsgroups you are posting to now,
> tell us which programming language you use.
I don't understand; what's the difference? My question is about the library (Framework), not the language. I guess all (or most) of the solutions, such as interfaces, delegates, reflection, etc., are available to both C# and VB. Currently I write in C#, but this should not matter.
Thank you John
Klaus H. Probst - 07 Aug 2005 11:14 GMT
> Using reflection is not an option for performance reasons.
Reflection doesn't have to be slow. You can't get rid of the overhead, but if you code it correctly it can be quite fast.
> I tried to create a delegate, but since Int32.Parse, Double.Parse, etc.
> all have different return types, creating a common delegate type
> appears to be impossible.
Return an object and unbox it (if applicable) after the delegate returns. This will have less overhead than reflection.
Or, if you can resolve the actual type of the value being parsed, you can create a sort of generic converter function using Convert.ChangeType:
public object TryParse(object /* string */ val, System.Type type)
{
    try
    {
        return Convert.ChangeType(val, type);
    }
    catch
    {
        // Conversion failed: signal it with null.
        return null;
    }
}
And call it like:
string s = "3.41";
double d = (double) TryParse(s, typeof(System.Double));
Klaus H. Probst, MVP http://www.simulplex.net/
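A variation on the Convert.ChangeType idea, sketched under a few assumptions: CSV text is taken to be culture-independent, so the overload accepting an IFormatProvider is used, and only the exceptions a failed conversion is expected to raise are caught. The names ChangeTypeDemo and ParseOrNull are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ChangeTypeDemo
{
    // Returns the converted (boxed) value, or null when the text cannot be converted.
    public static object ParseOrNull(string text, Type type)
    {
        try
        {
            return Convert.ChangeType(text, type, CultureInfo.InvariantCulture);
        }
        catch (FormatException) { return null; }       // text is not in a valid format
        catch (InvalidCastException) { return null; }  // target type is not supported
        catch (OverflowException) { return null; }     // value is out of range
    }

    public static void Main()
    {
        double d = (double)ParseOrNull("3.41", typeof(double));
        int n = (int)ParseOrNull("42", typeof(int));
        object bad = ParseOrNull("abc", typeof(int));
        Console.WriteLine("{0} {1} {2}", d, n, bad == null);
    }
}
```

Unboxing a null result with a cast such as (double) throws, so callers that expect bad input should test for null before casting.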
hB - 07 Aug 2005 11:31 GMT I have a better solution, if I understand your problem correctly :P
Assumption: the CSV data is all strings, like 1, "name", "rank", "1.2.2006" ....
Example:
[STAThread]
static void Main(string[] args)
{
    IFF[] i = new IFF[2];
    i[0] = new BFF(new Int32());
    i[1] = new BFF(new Double());

    object o = i[0].parse("1"); // You have CSV in a proper manner
    o = i[1].parse("1.1");
}

public interface IFF
{
    object parse(string s);
}

public class BFF : IFF
{
    private object myobj;

    public BFF(object ob)
    {
        myobj = ob;
    }

    public object parse(string s)
    {
        try
        {
            // Look up the static Parse(string) method on the wrapped object's type.
            Type tp = myobj.GetType();
            System.Reflection.MethodInfo mi = tp.GetMethod("Parse", new Type[] { typeof(System.String) });
            object[] param = new object[1];
            param[0] = s;
            object o = mi.Invoke(myobj, param);
            //myobj.Parse(o);
            return o;
        }
        catch
        {
            return null;
        }
    }
}
John - 07 Aug 2005 21:41 GMT
> I have a better solution ..
> System.Reflection.MethodInfo mi = tp.GetMethod("Parse",new
Sorry, I am not going to use reflection, for performance reasons.
John
John - 07 Aug 2005 21:39 GMT Thank you for the answer.
> Reflection doesn't have to be slow. You can't get rid of the overhead, but
> if you code it correctly it can be quite fast.
Maybe it does not have to be, but it is.
Unfortunately, in my own benchmarking, calling an empty static method without arguments using reflection (MethodInfo.Invoke) is 300 times slower than a direct call, while calling the same method through a delegate is 10% slower. Calling Int32.Parse using reflection is 15 times slower than a direct call; through a delegate it is 0.5% slower. So I will not use reflection in a loop (unless you show that my results are wrong).
> can create a sort of generic converter function
> Convert.ChangeType(val, type);
Thank you, I completely missed this one in my original search. However, I am not going to use it. Convert.ChangeType is implemented as a kind of switch, which I was trying to avoid in the first place. As a result:
- it is 30% slower than the delegate
- it can only handle standard types and not the Sql* types.
Thank you John
Cor Ligthert [MVP] - 07 Aug 2005 13:50 GMT John,
I thought that Paul had been more active in some other newsgroups lately.
However, I see he is here as well.
See what he wrote about your problem in this newsgroup. http://groups-beta.google.com/group/microsoft.public.dotnet.general/msg/a7c295497ae67bf4?hl=en&
I am not so familiar with those Ini files, so maybe you can search for that if Paul's message is not sufficient, or wait until he sees this. However, your subject line is, in my opinion, not one that will immediately catch Paul's eye.
I hope this helps,
Cor
hB - 07 Aug 2005 14:44 GMT I think I have provided John with a pretty easy way. Note that the reflection lookup needs to be done only once: just extract the MethodInfo and then keep invoking it in the parse() function.
--- hB
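The "look up once, invoke many times" idea might look like the sketch below. The class name ReflectionParser is hypothetical. Caching removes only the GetMethod lookup from the loop; each MethodInfo.Invoke call still carries reflection overhead, which is the cost John measured.

```csharp
using System;
using System.Reflection;

public sealed class ReflectionParser
{
    private readonly MethodInfo parse;

    public ReflectionParser(Type type)
    {
        // Resolve the public static Parse(string) overload once, up front.
        parse = type.GetMethod(
            "Parse",
            BindingFlags.Public | BindingFlags.Static,
            null,
            new Type[] { typeof(string) },
            null);
        if (parse == null)
        {
            throw new ArgumentException("No static Parse(string) on " + type.FullName);
        }
    }

    public object Parse(string text)
    {
        // Static method, so the target instance is null.
        // Parse failures surface wrapped in TargetInvocationException.
        return parse.Invoke(null, new object[] { text });
    }

    public static void Main()
    {
        ReflectionParser intParser = new ReflectionParser(typeof(int));
        Console.WriteLine(intParser.Parse("42"));
    }
}
```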
John - 07 Aug 2005 21:48 GMT
> See what he wrote about your problem in this newsgroup.
> http://groups-beta.google.com/group/microsoft.public.dotnet.general/msg/a7c295497ae67bf4?hl=en&
This is an interesting solution, but I am not sure I want to redesign my program to read the CSV into a dataset and then extract data from the dataset instead of parsing it directly.
Thank you John
Nick Malik [Microsoft] - 08 Aug 2005 02:25 GMT Your problem is easily handled using the decorator pattern with a builder pattern to construct the parsing. Observer pattern can be used, but is not terribly efficient. Switch statements are not needed in the parsing, but may be needed in the builder.
Take a look at the builder pattern and the decorator pattern by googling these names. They are standard OO patterns from the Gang of Four (GoF).
Let me know if you want help implementing them.
--- Nick Malik [Microsoft] MCSD, CFPS, Certified Scrummaster http://blogs.msdn.com/nickmalik
Disclaimer: Opinions expressed in this forum are my own, and not representative of my employer. I do not answer questions on behalf of my employer. I'm just a programmer helping programmers.
> Hi,
> I need to read a big CSV file, where different fields should be converted
[quoted text clipped - 27 lines]
> Thank you
> John
Cor Ligthert [MVP] - 08 Aug 2005 08:22 GMT Nick,
I have the idea that I am missing something here: why not convert the CSV using OleDb with an Ini file? I don't know whether setting the data types works completely, because without that almost everything stays a string.
Can you enlighten me as to what I am missing?
(seriously meant)
Cor
Nick Malik [Microsoft] - 09 Aug 2005 09:31 GMT Hi Cor,
I looked up the post that you refer to. For some reason, I hadn't seen that reply from Paul, but it is an excellent reply. Honestly, if the format of the CSV file is rarely changing or changes only with advance notice, his answer is far-and-away the best answer to use. The TEXT OleDb driver is debugged and easy to configure.
My suggestion would only be valid if the application needs to adapt itself to the data on the fly. In other words, if the app needs to allow the user to provide a format, or the format can be deduced, but it cannot be configured in advance.
In that case, you should create a simple decorator pattern. The endpoint object would discard the remainder of the line. You decorate the object with a class for each data type, reading right to left, to create the object structure in memory. The builder does this work. Then, it is a matter of sending each line through the data structure. The tokens are pulled off from left to right (reverse of the order in which it was built). It is fast and dynamic. No need for reflection.
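One possible reading of this description, as a rough sketch rather than Nick's actual design: every class and method name below (FieldHandler, EndOfLine, FieldDecorator, Int32Field, DoubleField, ChainBuilder) is hypothetical, and only two field types are shown. The builder wraps the endpoint right to left, and each line's tokens are then consumed left to right.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public abstract class FieldHandler
{
    public abstract void Handle(string[] tokens, int index, List<object> results);
}

// Endpoint: discards whatever remains of the line.
public sealed class EndOfLine : FieldHandler
{
    public override void Handle(string[] tokens, int index, List<object> results) { }
}

// Decorator: parses one token, then passes the rest of the line to the wrapped handler.
public abstract class FieldDecorator : FieldHandler
{
    private readonly FieldHandler next;
    protected FieldDecorator(FieldHandler next) { this.next = next; }
    protected abstract object Parse(string token);

    public override void Handle(string[] tokens, int index, List<object> results)
    {
        results.Add(Parse(tokens[index])); // throws if the line has too few tokens
        next.Handle(tokens, index + 1, results);
    }
}

public sealed class Int32Field : FieldDecorator
{
    public Int32Field(FieldHandler next) : base(next) { }
    protected override object Parse(string t) { return int.Parse(t, CultureInfo.InvariantCulture); }
}

public sealed class DoubleField : FieldDecorator
{
    public DoubleField(FieldHandler next) : base(next) { }
    protected override object Parse(string t) { return double.Parse(t, CultureInfo.InvariantCulture); }
}

public static class ChainBuilder
{
    // Builds right to left so that the first column ends up outermost.
    public static FieldHandler Build(Type[] columnTypes)
    {
        FieldHandler chain = new EndOfLine();
        for (int i = columnTypes.Length - 1; i >= 0; i--)
        {
            if (columnTypes[i] == typeof(int)) chain = new Int32Field(chain);
            else if (columnTypes[i] == typeof(double)) chain = new DoubleField(chain);
            else throw new NotSupportedException(columnTypes[i].FullName);
        }
        return chain;
    }

    public static void Main()
    {
        FieldHandler chain = Build(new Type[] { typeof(int), typeof(double) });
        List<object> results = new List<object>();
        chain.Handle("42,3.41,ignored".Split(','), 0, results);
        Console.WriteLine(results.Count); // two parsed fields; the third token is discarded
    }
}
```

The type dispatch lives only in the builder, which matches the earlier remark that a switch may be needed there but not in the parsing itself.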
> Nick,
>
[quoted text clipped - 8 lines]
>
> Cor
John - 09 Aug 2005 19:11 GMT Hi, Thank you for your answer.
> I looked up the post that you refer to. For some reason, I hadn't seen
> that reply from Paul, but it is an excellent reply.
As I already answered last Sunday, this is an interesting solution, but I am not sure I want to redesign my program to read the CSV into a dataset and then extract data from the dataset instead of parsing it directly. But I will remember this as an option for the future.
> My suggestion would only be valid if the application needs to adapt itself
> to the data on the fly.
Yes, currently I store the field names in the first row of the CSV. This allows easy viewing of the CSV in Excel and pretty flexible parsing.
> You decorate the object with a class for each data type.
If this requires writing a new class for each data type, then it will require at least twice as many lines of code as the wrapper/delegate approach. And in general the code looks more complex.
> No need for reflection.
I never considered reflection, for performance reasons, despite what some other people suggested.
Thank you John
Nick Malik [Microsoft] - 10 Aug 2005 07:16 GMT Hello John,
In your original post, you stated:
>> I still hope that the .NET Framework has an official way to do it,
>> which I overlooked.
I was simply pointing out that there is a way that you had overlooked... the decorator pattern. Note that you can do this with the visitor pattern as well. You chose the observer pattern. Personally, I would not have done so, but that is your choice.
>> You decorate the object with a class for each data type.
>
> If this requires writing a new class for each data type,
> then it will require at least twice as many lines of code as
> the wrapper/delegate approach. And in general the code looks more complex.
I would agree that there are more classes. I would also state that you have a much more OO approach this way. I would disagree that it looks more complex. On the contrary, it looks much simpler.
>> No need for reflection.
>
> I never considered reflection, for performance reasons,
> despite what some other people suggested.
I am aware of your other responses dealing with reflection. I pointed this out to let you know that, unlike other suggestions, this pattern does not require reflection.
John - 10 Aug 2005 19:37 GMT Thank you for your answer.
I guess this discussion drifted a bit off topic. I am well aware of patterns and was using them in various languages (when appropriate) but this question was about the .NET library (framework).
Patterns will not save me from calling the framework Parse() or Convert.ChangeType() routines. They will just pile up more code on top.
> Note that you can do this with the visitor pattern as well. You chose the
> observer pattern.
I don't believe I am using any pattern right now, least of all an observer. As you are probably aware, the essence of the observer pattern is that one or several listeners are registered to hear an event which may be raised by the observed object. My code certainly does not have any listeners, events or observed objects. All it has is a static table of structures describing types (with names and delegates for parsing), several one-or-two-line wrappers for the Parse routines of various types, an array of types for a particular CSV, and a simple loop which invokes the appropriate delegate for each field of a CSV row. So, if you insist on using OO terms here, I have a trivial case of polymorphism and no more than that.
> I would agree that there are more classes. I would also state that you
> have a much more OO approach this way.
I am not a purist who uses a particular approach just for the sake of using it. The OO approach is not a sacred cow, and using it is not the goal of my project. The code should be simple, reliable and efficient. If OO helps me to achieve that, great; if not, forget it.
> I would disagree that it looks more complex. On the contrary, it looks
> much simpler.
Your code (writing a dozen or two custom decorator classes for system types) will not fit on a screen. My table-driven approach does fit on one screen and has less executable code (only one-line wrappers and a loop). So it is simpler and easier to read, maintain and debug. Code bloat will achieve the opposite: more code that is harder to read and maintain, has more bugs, etc.
By the way, I posted in another response a thought that C++ might have allowed me to generate custom decorators for system types using templates. In that case I might have considered the decorator approach, if generating a single decorator took only one line (such as MyDecorator<Int32>, MyDecorator<double>, etc.). But C# and .NET generics are no good for this purpose.
> I pointed this out to let you know that, unlike other suggestions, this
> pattern does not require reflection.
Sure, I never said that it does. Patterns usually use virtual methods, which (I guess) should be similar to delegates in performance. So I don't reject your method because of performance, but because of code bloat.
Thank you John
John - 10 Aug 2005 21:04 GMT Actually I found that I can create templates using managed C++:
interface class MyParserIntf
{
public:
    virtual Object^ Parse(String^ str);
};

template <typename T>
ref class MyParser : public MyParserIntf
{
public:
    virtual Object^ Parse(String^ str)
    {
        return T::Parse(str);
    }
};
......
array<MyParserIntf^>^ Parsers = gcnew array<MyParserIntf^>
{
    // All types known to the CSV processor
    gcnew MyParser<Int32>(),
    gcnew MyParser<DateTime>(),
    gcnew MyParser<double>(),
    ..........
};
.....
array<MyParserIntf^>^ CurCSVParsers;
// generate CurCSVParsers array of parsers for the current CSV
......
for (int i = 0; i < CurCSVParsers->Length; i++)
{
    l_Results[i] = CurCSVParsers[i]->Parse(l_SampleRow[i]);
}
This way I can implement the decorator pattern as Nick suggested with only a single line per system type (if I find any advantage in doing so over the simple code above).
But I don't see a way to mix C++ and C# in the same project. And creating a separate DLL just for parsing seems like overkill.
John
Jay B. Harlow [MVP - Outlook] - 10 Aug 2005 22:19 GMT John, While reviewing another thread, I found Java 5.0 supports very powerful Enums.
http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html
From what I've read Java 5.0 got Enums right! At least they implemented Enums the way I probably would have, then added some features that are useful, but I'm not sure I would have included...
I could see you defining your Field type as a Java Enum instead of using System.Type, allowing you to give each enum constant its own unique behavior...
Although it doesn't help you on this project, I found it to be a very cool feature...
Just a thought Jay
| Actually I found that I can create templates using managed C++:
|
[quoted text clipped - 38 lines]
|
| John
John - 10 Aug 2005 23:19 GMT
> While reviewing another thread, I found Java 5.0 supports very powerful
> Enums.
>
> http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html
Yes, kind of overblown enums.
> Although it doesn't help you on this project, I found it to be a very cool
> feature...
Well, you are wrong. Reading about Java Enums, which now allow so-called "constant-specific methods", I realized that these are pretty similar to the old and familiar anonymous classes, which made me think that C# 2.0 also has similar anonymous methods, so I can throw away those separate Parse wrappers and implement the parsers inline, right in the table:
delegate object Parse(string str);

static Parse[] Types = new Parse[]
{
    delegate(string str) { return Int32.Parse(str); },
    delegate(string str) { return DateTime.Parse(str); },
    delegate(string str) { return double.Parse(str); },
};
....
for (int i = 0; i < l_SampleRow.Length; i++)
{
    l_Results[i] = (CSVTypes[i])(l_SampleRow[i]);
}
Looks simple enough.
Thank you John
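A fuller version of the same table-driven idea, written with lambdas and Func<string, object>, both of which need a compiler and framework newer than C# 2.0. This is a sketch under assumptions: the type-name keys, the class name ParserTable, and the methods ForColumns and ParseRow are hypothetical, and invariant-culture parsing is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class ParserTable
{
    // Hypothetical type names, as they might appear in a field-description array.
    public static readonly Dictionary<string, Func<string, object>> Known =
        new Dictionary<string, Func<string, object>>
        {
            { "int",      s => int.Parse(s, CultureInfo.InvariantCulture) },
            { "double",   s => double.Parse(s, CultureInfo.InvariantCulture) },
            { "datetime", s => DateTime.Parse(s, CultureInfo.InvariantCulture) },
        };

    // Resolve the parsers once per file, so the per-row loop does no lookups.
    public static Func<string, object>[] ForColumns(string[] typeNames)
    {
        Func<string, object>[] parsers = new Func<string, object>[typeNames.Length];
        for (int i = 0; i < typeNames.Length; i++)
        {
            parsers[i] = Known[typeNames[i]]; // throws KeyNotFoundException for unknown names
        }
        return parsers;
    }

    public static object[] ParseRow(Func<string, object>[] parsers, string[] fields)
    {
        object[] results = new object[parsers.Length];
        for (int i = 0; i < parsers.Length; i++)
        {
            results[i] = parsers[i](fields[i]);
        }
        return results;
    }

    public static void Main()
    {
        Func<string, object>[] parsers = ForColumns(new string[] { "int", "double" });
        object[] row = ParseRow(parsers, new string[] { "42", "3.41" });
        Console.WriteLine("{0} {1}", row[0].GetType().Name, row[1].GetType().Name);
    }
}
```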
Jay B. Harlow [MVP - Outlook] - 11 Aug 2005 00:15 GMT John,
| which made me think that C# 2.0 also has similar anonymous methods,
| so I can throw away those separate Parse wrappers and implement the parsers
| inline
| right in the table:
If you have the luxury of using C# 2.0, then yes, anonymous methods might be a good way to simplify the wrapper methods.
It appears that it should work, does it?
Thanks for the follow up. Jay
|> While reviewing another thread, I found Java 5.0 supports very powerful
| > Enums.
[quoted text clipped - 30 lines]
| Thank you
| John
Jay B. Harlow [MVP - Outlook] - 09 Aug 2005 18:28 GMT John, In addition to the other comments.
My first choice would be Convert.ChangeType as Klaus shows.
My second choice would be an Adapter pattern, similar to your array of Wrapper Delegates. In addition to using Delegates as you did, I would consider using a series of classes that implemented an interface or shared a common base class.
Something like:
Public Interface IConverter
    Function Parse(ByVal s As String) As Object
End Interface

Public Class DoubleConverter
    Implements IConverter

    Public Function Parse(ByVal s As String) As Object Implements IConverter.Parse
        Return Double.Parse(s)
    End Function
End Class

Public Class ConverterCollection
    Inherits DictionaryBase

    Public Sub Add(ByVal type As Type, ByVal converter As IConverter)
        MyBase.InnerHashtable.Add(type, converter)
    End Sub

    Default Public ReadOnly Property Item(ByVal type As Type) As IConverter
        Get
            Return DirectCast(MyBase.InnerHashtable.Item(type), IConverter)
        End Get
    End Property
End Class

Public Shared Sub Main()
    Dim converters As New ConverterCollection
    converters.Add(GetType(Double), New DoubleConverter)
End Sub
The disadvantage of the interface/class method is the proliferation of classes. The advantage of the delegate method is the elimination of all the classes...
As to performance: Remember the 80/20 rule. That is 80% of the execution time of your program is spent in 20% of your code. I will optimize (worry about performance, memory consumption) the 20% once that 20% has been identified & proven to be a performance problem via profiling (CLR Profiler is one profiling tool).
For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's article "Yet Another Optimization Article" at http://martinfowler.com/ieeeSoftware/yetOptimization.pdf
Hope this helps Jay
| Hi,
| I need to read a big CSV file, where different fields should be converted to
[quoted text clipped - 26 lines]
| Thank you
| John
Jay B. Harlow [MVP - Outlook] - 09 Aug 2005 18:38 GMT Doh!
I should add that System.ComponentModel.TypeConverter might be a class that you could leverage instead of creating your own IConverter class.
You can use TypeDescriptor.GetConverter to get the TypeConverter for a Type or Object. If performance were a consideration, I would consider caching the TypeConverters.
Of course, instead of storing the converters in their own hash table as I showed earlier, you could store them in the type describing each field...
Something like:
Public Class FieldDescription
    Public Name As String
    Public Type As Type
    Public Converter As IConverter
    ' alternate to IConverter or your delegate...
    Public Converter As TypeConverter
End Class
Hope this helps Jay
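A C# sketch of the cached TypeConverter idea, under assumptions: the class name ConverterCache and its methods are hypothetical, the cache is a plain dictionary with no thread safety, and ConvertFromInvariantString is used on the assumption that the CSV text is culture-independent.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

public static class ConverterCache
{
    // Not thread-safe; add locking if rows are parsed on several threads.
    private static readonly Dictionary<Type, TypeConverter> cache =
        new Dictionary<Type, TypeConverter>();

    public static TypeConverter For(Type type)
    {
        TypeConverter converter;
        if (!cache.TryGetValue(type, out converter))
        {
            // TypeDescriptor does the lookup; the result is stored for later rows.
            converter = TypeDescriptor.GetConverter(type);
            cache[type] = converter;
        }
        return converter;
    }

    public static object Parse(string text, Type type)
    {
        return For(type).ConvertFromInvariantString(text);
    }

    public static void Main()
    {
        object d = Parse("3.41", typeof(double));
        object n = Parse("42", typeof(int));
        Console.WriteLine("{0} {1}", d.GetType().Name, n.GetType().Name);
    }
}
```

Storing the resolved TypeConverter in each field description, as suggested in this post, would avoid even the dictionary lookup per field.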
| John,
| In addition to the other comments.
[quoted text clipped - 96 lines]
|| Thank you
|| John
John - 09 Aug 2005 19:36 GMT Hi, Thank you for your answer.
> The disadvantage of the interface/class method is the proliferation of
> classes.
Exactly. I asked this question in the first place because I didn't like the proliferation of wrappers. I am just lazy. After typing 2 or 3 wrappers I was bored and posted this message last Saturday. But the proliferation of classes is much worse, because it will require 2 (or 3) times more lines of code than the wrappers. I am certainly not going to type them.
By the way, in C++ I might have used templates to generate the wrappers automatically. I had big hopes for the generics feature in .NET, but I was disappointed when I found that generics cannot be used in this case (and in many others), because there is no base class or interface common to all the types that exposes a Parse method. Without such a common interface, the compiler refuses to compile the generic class. Stupid!
> Remember the 80/20 rule.
Yes, I am reminded of this rule every time I, as a user, have to suffer running a slow program. I know what the developers were thinking.
> (CLR Profiler is one profiling tool).
My CSV may have a few million records. Remembering that reflection is 15 times (1500%) slower than delegates in this case, I don't want to waste my time on the profiler (in this case a wristwatch is sufficient).
So, I am sticking to the wrappers/delegates, but I appreciate all answers, I learned about TEXT OleDb driver and Convert.ChangeType and may use them in the future.
Thank you John