.NET Forum / .NET Framework / New Users / August 2005

How to parse various types without a switch?

John - 07 Aug 2005 07:42 GMT
Hi,
I need to read a big CSV file, where different fields should be converted to
different types,
such as int, double, datetime, SqlMoney, etc.

I have an array, which describes the fields and their types. I would like
to somehow store a reference to parsing operations in this array
(such as Int32.Parse, Double.Parse, SqlMoney.Parse, etc),
so I can invoke the appropriate one without writing a long switch.

Using reflection is not an option for performance reasons.

I tried to create a delegate, but since Int32.Parse, Double.Parse, etc.
all have different return types, creating a common delegate type
appears to be impossible.

For now I ended up writing wrappers around Parse methods for each type,
such as
static object ParseDouble(string str) {return double.Parse(str);}
then inserting delegates to these methods into the array.

This seems to work, but it looks pretty ugly. I still hope that the .NET
Framework has an official way to do this that I overlooked. For example,
it has the IConvertible interface, which can be used to achieve the opposite:
converting an object to various types. But I cannot find an official way to
do parsing.

Thank you
John
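
For readers following along, here is a minimal sketch of the wrapper-plus-delegate table described in the post above. The names ParseField, CsvFieldParsers, Columns and ParseRow are made up for illustration, and the three-column layout is an assumption; only the framework Parse methods are real. The invariant culture is passed explicitly so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical delegate type: every wrapper returns a boxed object.
public delegate object ParseField(string text);

public static class CsvFieldParsers
{
    // One-line wrappers around the framework Parse methods.
    static object ParseInt32(string s) { return int.Parse(s, CultureInfo.InvariantCulture); }
    static object ParseDouble(string s) { return double.Parse(s, CultureInfo.InvariantCulture); }
    static object ParseDateTime(string s) { return DateTime.Parse(s, CultureInfo.InvariantCulture); }

    // Assumed column layout of one particular CSV file: int, double, DateTime.
    public static readonly ParseField[] Columns = new ParseField[]
    {
        ParseInt32, ParseDouble, ParseDateTime
    };

    // Converts one row by invoking the delegate stored for each column.
    public static object[] ParseRow(string[] fields)
    {
        object[] result = new object[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            result[i] = Columns[i](fields[i]);
        }
        return result;
    }
}
```

Calling CsvFieldParsers.ParseRow(new string[] { "42", "3.41", "2005-08-07" }) yields a boxed int, double and DateTime, with no switch in the loop.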
Cor Ligthert [MVP] - 07 Aug 2005 08:18 GMT
John,

You have crossposted (nothing wrong with that); however, I think you would be
better off crossposting this to a language newsgroup too, because we don't know
which programming language you are using.

Besides Excel.developer and ASPNET, the two largest Microsoft developer
newsgroups are

microsoft.public.dotnet.csharp
and
microsoft.public.dotnet.languages.vb

At least, when you post to the two newsgroups you are posting to now, tell
us which programming language you use.

I hope this helps,

Cor
hB - 07 Aug 2005 10:48 GMT
since Parse() is not declared in interface, you have to create
wrappers.
It would be like object Parse(string csvstrword);

(In C we have sscanf, it can read into many datatypes.)

---
hB
John - 07 Aug 2005 21:36 GMT
Thank you for the answer.

> since Parse() is not declared in interface, you have to create wrappers.

Yes, this is what I ended up doing (see my original post):

static object ParseDouble(string str) {return double.Parse(str);}
....

I just wanted to double check whether I am missing some standard solution,
and, actually, I was missing Convert.ChangeType, as Klaus H. Probst showed.

John
John - 07 Aug 2005 21:31 GMT
Thank you for the answer.

> You have crossposted

It is very hard to figure out the difference between
microsoft.public.dotnet.framework and microsoft.public.dotnet.general
What is the proper place to post questions about the library?

> I think that you be better of with crossposting this to a language
> newsgroup too, because we don't know what program language you are using
> now.
> At least when you post to the two newsgroups that you are posting now,
> tell us than what program language you use.

I don't understand, what's the difference? My question is about the library
(Framework),
not the language. I guess all (or most) of the solutions, such as
interfaces, delegates, reflection, etc. are available to both C# and VB.
Currently I write in C#, but this should not matter.

Thank you
John
Klaus H. Probst - 07 Aug 2005 11:14 GMT
> Using reflection is not an option for performance reasons.

Reflection doesn't have to be slow. You can't get rid of the overhead, but
if you code it correctly it can be quite fast.

> I tried to create a delegate, but since Int32.Parse, Double.Parse, etc.
> all have different return types, creating a common delegate type
> appears to be impossible.

Return an object and unbox it (if applicable) after the delegate returns.
This will have less overhead than reflection.

Or, if you can resolve the actual type of the value being parsed you can
create a sort of generic converter function using Convert.ChangeType:

public object TryParse(object /* string */ val, System.Type type) {
   try {
       return Convert.ChangeType(val, type);
   }
   catch {
       return null;
   }
}

And call it like:

string s = "3.41";
double d = (double) TryParse(s, typeof(System.Double));
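
A variation on the snippet above, offered as a sketch rather than a drop-in: it passes CultureInfo.InvariantCulture so that "3.41" parses the same way regardless of regional settings, and it catches only the exceptions a failed conversion is expected to raise instead of swallowing everything. The names ChangeTypeDemo and TryConvert are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ChangeTypeDemo
{
    // Returns null when the text cannot be converted to the requested type.
    public static object TryConvert(string text, Type type)
    {
        try
        {
            // Three-argument overload: value, target type, format provider.
            return Convert.ChangeType(text, type, CultureInfo.InvariantCulture);
        }
        catch (FormatException) { return null; }      // text is not in a valid format
        catch (InvalidCastException) { return null; } // conversion is not supported
        catch (OverflowException) { return null; }    // value is out of range
    }
}
```

Usage: object d = ChangeTypeDemo.TryConvert("3.41", typeof(double)); check for null before unboxing, because casting a null result to double throws.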

Signature

Klaus H. Probst, MVP
  http://www.simulplex.net/

hB - 07 Aug 2005 11:31 GMT
I have a better solution, if I understand your problem correctly :P

Assumption.
since CSV has data, all in strings, like
1, "name" , "rank", "1.2.2006"
....

Example:
[STAThread]
        static void Main(string[] args)
        {
            IFF[] i = new IFF[2];
            i[0] = new BFF(new Int32());
            i[1] = new BFF(new Double());

            object o = i[0].parse("1");//You have CSV in a proper manner
            o = i[1].parse("1.1");
        }

public interface IFF
    {
        object parse(string s);
    }

public class BFF : IFF
    {
        private object myobj;
        public BFF(object ob)
        {
            myobj = ob;
        }

        public object parse(string s)
        {
            try
            {
                Type tp = myobj.GetType();
                System.Reflection.MethodInfo mi = tp.GetMethod("Parse", new Type[]{typeof(System.String)});
                object[] param = new object[1];
                param[0] = s ;
                object o = mi.Invoke(myobj,param);
                //myobj.Parse(o);
                return o;
            }
            catch
            {
                return null;
            }
        }
    }
John - 07 Aug 2005 21:41 GMT
>I have a better solution
..
> System.Reflection.MethodInfo mi = tp.GetMethod("Parse",new

Sorry, I am not going to use reflection, for performance reasons.

John
John - 07 Aug 2005 21:39 GMT
Thank you for the answer.

> Reflection doesn't have to be slow. You can't get rid of the overhead, but
> if you code it correctly it can be quite fast.

Maybe it does not have to be, but it is.

Unfortunately, in my own benchmarks, calling an empty static method
without arguments using reflection (MethodInfo.Invoke) is 300 times
slower than a direct call,
while calling the same method through a delegate is 10% slower.
Calling Int32.Parse using reflection is 15 times slower than a direct call;
a delegate is 0.5% slower.
So I will not use reflection in a loop (unless you show that my results are
wrong).

> can create a sort of generic converter function
> Convert.ChangeType(val, type);

Thank you, I completely missed this one in my original search.
However, I am not going to use it.
Convert.ChangeType is implemented as a kind of switch,
which I tried to avoid in the first place. As a result:
- it is 30% slower than the delegate
- it can only handle standard types and not the Sql* types.

Thank you
John
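
Anyone who wants to reproduce this kind of comparison could start from a rough sketch like the one below. It is not John's benchmark (his code was not posted); ParseBenchmark and Run are hypothetical names, and the numbers will vary with runtime version, JIT warm-up and hardware, so treat the output as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class ParseBenchmark
{
    // Returns elapsed milliseconds for: direct call, delegate call, MethodInfo.Invoke.
    public static double[] Run(int iterations)
    {
        Func<string, int> viaDelegate = int.Parse;
        MethodInfo viaReflection = typeof(int).GetMethod("Parse", new Type[] { typeof(string) });
        object[] args = { "12345" };
        long sink = 0; // accumulates results so the calls are not optimized away

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += int.Parse("12345");
        double direct = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 0; i < iterations; i++) sink += viaDelegate("12345");
        double viaDel = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        // Static method: the target object passed to Invoke is null.
        for (int i = 0; i < iterations; i++) sink += (int)viaReflection.Invoke(null, args);
        double viaRefl = sw.Elapsed.TotalMilliseconds;

        GC.KeepAlive(sink);
        return new double[] { direct, viaDel, viaRefl };
    }
}
```

Run it once with a small iteration count to warm up the JIT, then again with a few million iterations, and compare the three timings.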
Cor Ligthert [MVP] - 07 Aug 2005 13:50 GMT
John,

I thought that Paul had been more active in some other newsgroups lately.

However, I see he is here as well.

See what he wrote about your problem in this newsgroup.
http://groups-beta.google.com/group/microsoft.public.dotnet.general/msg/a7c295497ae67bf4?hl=en

I am not so familiar with those Ini files, so maybe you can search for that
if the message from Paul is not sufficient, or wait until he sees
this. However, your subject line is not one that, in my opinion, would
immediately catch Paul's eye.

I hope this helps,

Cor
hB - 07 Aug 2005 14:44 GMT
I think i have provided John a pretty easy way.
Note that reflection can be used only once, just extract methodinfo and
then keep invoking it in parse() function.

---
hB
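
One way to act on the "use reflection only once" idea, sketched under assumptions: look up the static Parse(string) method a single time per type, bind it to a typed delegate with Delegate.CreateDelegate, and cache the result so the per-field cost is a delegate call rather than MethodInfo.Invoke. ReflectedParsers and For are hypothetical names, and the cache shown here is not thread-safe. Note that Parse(string) found this way uses the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ReflectedParsers
{
    static readonly Dictionary<Type, Func<string, object>> Cache =
        new Dictionary<Type, Func<string, object>>();

    // Returns a cached delegate that calls T.Parse(string) and boxes the result.
    public static Func<string, object> For<T>()
    {
        Func<string, object> parser;
        if (!Cache.TryGetValue(typeof(T), out parser))
        {
            // Reflection happens here, once per type.
            MethodInfo parse = typeof(T).GetMethod("Parse", new Type[] { typeof(string) });
            if (parse == null || !parse.IsStatic)
                throw new NotSupportedException(typeof(T) + " has no static Parse(string).");

            // Bind the static method to a strongly typed delegate.
            Func<string, T> typed =
                (Func<string, T>)Delegate.CreateDelegate(typeof(Func<string, T>), parse);

            parser = s => typed(s); // boxing wrapper gives a common return type
            Cache[typeof(T)] = parser;
        }
        return parser;
    }
}
```

Usage: Func<string, object> p = ReflectedParsers.For<int>(); object value = p("42");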
John - 07 Aug 2005 21:48 GMT
> See what he wrote about your problem in this newsgroup.
> http://groups-beta.google.com/group/microsoft.public.dotnet.general/msg/a7c295497ae67bf4?hl=en

This is an interesting solution, but I am not sure I want to redesign
my program to read CSV to a dataset, then extract data from the dataset
instead of parsing them directly.

Thank you
John
Nick Malik [Microsoft] - 08 Aug 2005 02:25 GMT
Your problem is easily handled using the decorator pattern with a builder
pattern to construct the parsing.  Observer pattern can be used, but is not
terribly efficient.  Switch statements are not needed in the parsing, but
may be needed in the builder.

Take a look at the builder pattern and the decorator pattern by googling
these names.  They are standard OO patterns from the Gang of Four (GoF).

Let me know if you want help implementing them.

Signature

--- Nick Malik [Microsoft]
   MCSD, CFPS, Certified Scrummaster
   http://blogs.msdn.com/nickmalik

Disclaimer: Opinions expressed in this forum are my own, and not
representative of my employer.
  I do not answer questions on behalf of my employer.  I'm just a
programmer helping programmers.
--

> Hi,
> I need to read a big CSV file, where different fields should be converted
[quoted text clipped - 27 lines]
> Thank you
> John
Cor Ligthert [MVP] - 08 Aug 2005 08:22 GMT
Nick,

I have the feeling I am missing something here. Why not convert the CSV using
OleDb with an Ini file?
I don't know whether setting the data types works completely, because without
that almost everything comes back as a string.

Can you enlighten me as to what I am missing?

(seriously meant)

Cor
Nick Malik [Microsoft] - 09 Aug 2005 09:31 GMT
Hi Cor,

I looked up the post that you refer to.  For some reason, I hadn't seen that
reply from Paul, but it is an excellent reply.  Honestly, if the format of
the CSV file is rarely changing or changes only with advance notice, his
answer is far-and-away the best answer to use.  The TEXT OleDb driver is
debugged and easy to configure.

My suggestion would only be valid if the application needs to adapt itself
to the data on the fly.  In other words, if the app needs to allow the user
to provide a format, or the format can be deduced, but it cannot be
configured in advance.

In that case, you should create a simple decorator pattern.  The endpoint
object would discard the remainder of the line.  You decorate the object
with a class for each data type, reading right to left, to create the object
structure in memory.  The builder does this work.  Then, it is a matter of
sending each line through the data structure.  The tokens are pulled off
from left to right (reverse of the order in which it was built).  It is fast
and dynamic.  No need for reflection.
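
One possible reading of the decorator-plus-builder description above, as a sketch only: Nick did not post code, so every class name here (LineParser, FieldDecorator, IntField, DoubleField, LineParserBuilder) and the "int"/"double" column type strings are assumptions. The endpoint ignores the rest of the line, each decorator converts one token and forwards the remainder, and the builder assembles the chain from right to left.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Endpoint of the chain: discards whatever is left of the line.
public class LineParser
{
    public virtual void Parse(Queue<string> tokens, List<object> values) { }
}

// Decorator base: converts the leftmost token, then delegates to the wrapped parser.
public abstract class FieldDecorator : LineParser
{
    private readonly LineParser next;
    protected FieldDecorator(LineParser next) { this.next = next; }

    protected abstract object ConvertToken(string token);

    public override void Parse(Queue<string> tokens, List<object> values)
    {
        values.Add(ConvertToken(tokens.Dequeue()));
        next.Parse(tokens, values);
    }
}

public sealed class IntField : FieldDecorator
{
    public IntField(LineParser next) : base(next) { }
    protected override object ConvertToken(string token)
    {
        return int.Parse(token, CultureInfo.InvariantCulture);
    }
}

public sealed class DoubleField : FieldDecorator
{
    public DoubleField(LineParser next) : base(next) { }
    protected override object ConvertToken(string token)
    {
        return double.Parse(token, CultureInfo.InvariantCulture);
    }
}

public static class LineParserBuilder
{
    // Builds right to left, so tokens are later consumed left to right.
    public static LineParser Build(params string[] columnTypes)
    {
        LineParser chain = new LineParser();
        for (int i = columnTypes.Length - 1; i >= 0; i--)
        {
            switch (columnTypes[i]) // the switch lives in the builder, not the parse loop
            {
                case "int": chain = new IntField(chain); break;
                case "double": chain = new DoubleField(chain); break;
                default: throw new ArgumentException("Unknown column type: " + columnTypes[i]);
            }
        }
        return chain;
    }
}
```

Usage: build the chain once with LineParserBuilder.Build("int", "double"), then call Parse for each line with a Queue<string> of its tokens and a List<object> to receive the values.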


> Nick,
>
[quoted text clipped - 8 lines]
>
> Cor
John - 09 Aug 2005 19:11 GMT
Hi,
Thank you for your answer.

> I looked up the post that you refer to.  For some reason, I hadn't seen
> that reply from Paul, but it is an excellent reply.

As I already answered last Sunday,
this is an interesting solution, but I am not sure I want to redesign
my program to read CSV to a dataset, then extract data from the dataset
instead of parsing them directly.
But I will remember this as an option for the future.

> My suggestion would only be valid if the application needs to adapt itself
> to the data on the fly.

Yes, currently I store the field names in the first row of the CSV.
This allows easy viewing of the CSV in Excel and pretty flexible parsing.

> You decorate the object  with a class for each data type.

If this requires writing a new class for each data type,
then it will require at least twice as many lines of code as
the wrapper/delegate approach. And in general the code looks more complex.

> No need for reflection.

I never considered reflection, for performance reasons,
despite what some other people suggested.

Thank you
John
Nick Malik [Microsoft] - 10 Aug 2005 07:16 GMT
Hello John,

In your original post, you stated:
>> I still hope that .NET framework has a way to do it in official way,
>> which I overlooked. <<

I was simply pointing out that there is a way that you had overlooked... the
decorator pattern.
Note that you can do this with the visitor pattern as well.  You chose the
observer pattern.  Personally, I would not have done, but that is your
choice.

>> You decorate the object  with a class for each data type.
>
> If this will require writing a new class for each data type,
> then it will require at least twice more lines of code than
> the wrapper/delegate approach. And in general the code looks more complex.

I would agree that there are more classes.  I would also state that you have
a much more OO approach this way.
I would disagree that it looks more complex.  On the contrary, it looks much
simpler.

>> No need for reflection.
>
> I never considered the reflection for performance reasons
> despite what some other people suggested.

I am aware of your other responses dealing with reflection.  I pointed this
out to let you know that, unlike other suggestions, this pattern does not
require reflection.

John - 10 Aug 2005 19:37 GMT
Thank you for your answer.

I guess this discussion drifted a bit off topic.
I am well aware of patterns and was using them in various languages
(when appropriate)
but this question was about the .NET library (framework).

Patterns will not save me from calling the framework Parse() or
Convert.ChangeType() routines. They will just pile up more code on top.

> Note that you can do this with the visitor pattern as well.  You chose the
> observer pattern.

I don't believe I am using any pattern right now, especially an observer.
As you are probably aware, the essence of an observer pattern is that
one or several listeners are registered to hear an event which
may be raised by the observed object. My code certainly does not
have any listeners, events or observed objects. All it has is a static
table of structures describing types  (with names and delegates for
parsing),
several one-or-two-line wrappers for Parse routines for various types,
an array of types for a particular CSV, and a simple loop,
which invokes an appropriate delegate for each field of CSV row.
So, if you insist on using OO terms here, I have a trivial case of
polymorphism and no more than that.

> I would agree that there are more classes.  I would also state that you
> have a much more OO approach this way.

I am not a purist, who is using a particular approach just for the sake of
using it.
OO approach is not a sacred cow and using it is not the goal of my project.
The code should be simple, reliable and efficient. If OO helps me to
achieve it, great, if not - forget it.

> I would disagree that it looks more complex.  On the contrary, it looks
> much simpler.

Your code (writing a dozen or two custom decorator classes for system
types)
will not fit on a screen. My table-driven approach does fit on one screen and has
less executable code (only one-line wrappers and a loop). So it is simpler and
easier to read, maintain and debug. Code bloat achieves the opposite:
more code that is harder to read and maintain, has more bugs, etc.

By the way, I posted in another response a thought that C++ may have allowed
me to generate custom decorators for system types using templates.
In that case I may have considered using the decorator approach
if generation of a single decorator will take only one line
(such as MyDecorator<Int32>, MyDecorator<double>, etc.).
But C# and .NET generics are no good for this purpose.

> I pointed this out to let you know that, unlike other suggestions, this
> pattern does not require reflection.

Sure, I never said that it does. Patterns usually use virtual methods,
which (I guess) should be similar to delegates in performance.
So, I don't reject your method because of performance, but because
of code bloat.

Thank you
John
John - 10 Aug 2005 21:04 GMT
Actually I found that I can create templates using managed C++:

interface class MyParserIntf
{
public:
   virtual Object^ Parse(String^ str);
};

template <typename T>
ref class MyParser : public MyParserIntf
{
public:
   virtual Object^ Parse(String^ str)
   {
       return T::Parse(str);
   }
};
......
array<MyParserIntf^> ^ Parsers = gcnew array<MyParserIntf^>
{//All types known to the CSV processor
       gcnew MyParser<Int32>(),
       gcnew MyParser<DateTime>(),
       gcnew MyParser<double>(),
      ..........
};
.....
array<MyParserIntf^> ^ CurCSVParsers;
generate CurCSVParsers array of parsers for the current CSV
......
for(int i = 0; i < CurCSVParsers->Length; i++)
{
       l_Results[i] = CurCSVParsers[i]->Parse(l_SampleRow[i]);
}

This way I can implement the decorator pattern as Nick suggested
with only a single line per system type
(if I find any advantage in doing so over the simple code above).

But I don't see a way to mix the C++ and C# in the same project.
And creating a separate DLL just for parsing seems to be an overkill.

John
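
For comparison, later versions of the framework did add a common interface for Parse. The sketch below assumes .NET 7 / C# 11 or newer, where System.IParsable<TSelf> and static abstract interface members exist; it would not have compiled with the tools available when this thread was written. IFieldParser, TypedParser and GenericParsers are hypothetical names mirroring the C++ MyParserIntf and MyParser above.

```csharp
using System;
using System.Globalization;

// Common non-generic interface so parsers of different types fit in one array.
public interface IFieldParser
{
    object Parse(string text);
}

// One generic class covers every type that implements IParsable<T> (.NET 7+).
public sealed class TypedParser<T> : IFieldParser where T : IParsable<T>
{
    public object Parse(string text)
    {
        // Static abstract interface member call; the result is boxed to object.
        return T.Parse(text, CultureInfo.InvariantCulture);
    }
}

public static class GenericParsers
{
    // One line per supported type, as in the C++ template version.
    public static readonly IFieldParser[] Known =
    {
        new TypedParser<int>(),
        new TypedParser<DateTime>(),
        new TypedParser<double>(),
    };
}
```

Types that do not implement IParsable<T> would still need a hand-written wrapper.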
Jay B. Harlow [MVP - Outlook] - 10 Aug 2005 22:19 GMT
John,
While reviewing another thread, I found Java 5.0 supports very powerful
Enums.

http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html

From what I've read Java 5.0 got Enums right! At least they implemented
Enums the way I probably would have, then added some features that are
useful, but I'm not sure I would have included...

I could see you defining your Field type as a Java Enum instead of using
System.Type, allowing you to give each enum constant its own unique
behavior...

Although it doesn't help you on this project, I found it to be a very cool
feature...

Just a thought
Jay

| Actually I found that I can create templates using managed C++:
|
[quoted text clipped - 38 lines]
|
| John
John - 10 Aug 2005 23:19 GMT
> While reviewing another thread, I found Java 5.0 supports very powerful
> Enums.
>
> http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html

Yes, kind of overblown enums.

> Although it doesn't help you on this project, I found it to be a very cool
> feature...

Well, you are wrong. Reading about Java Enums, which now allow
so-called "constant-specific methods",
I realized that these are pretty similar to the old and familiar anonymous
classes,
which made me think that C# 2.0 also has similar anonymous methods,
so I can throw away those separate Parse wrappers and implement the parsers
inline, right in the table:

       delegate object Parse(string str);
       static Parse[] Types = new Parse[] {
               delegate(string str) {return Int32.Parse(str);},
               delegate(string str) {return DateTime.Parse(str);},
               delegate(string str) {return double.Parse(str);},
           };
....
           for (int i = 0; i < l_SampleRow.Length; i++)
           {
               l_Results[i] = (CSVTypes[i])(l_SampleRow[i]);
           }
Looks simple enough.

Thank you
John
Jay B. Harlow [MVP - Outlook] - 11 Aug 2005 00:15 GMT
John,
| which made me thinking that C# 2.0 also have similar anonymous methods,
| so I can throw away those separate Parse wrappers and implement the parsers
| inline
| right in the table:
If you have the luxury of using C# 2.0, then yes, anonymous methods might be
a good way to simplify the wrapper methods.

It appears that it should work, does it?

Thanks for the follow up.
Jay

|> While reviewing another thread, I found Java 5.0 supports very powerful
| > Enums.
[quoted text clipped - 30 lines]
| Thank you
| John
Jay B. Harlow [MVP - Outlook] - 09 Aug 2005 18:28 GMT
John,
In addition to the other comments.

My first choice would be Convert.ChangeType as Klaus shows.

My second choice would be an Adapter pattern, similar to your array of
Wrapper Delegates. In addition to using Delegates as you did, I would
consider using a series of classes that implemented an interface or shared a
common base class.

Something like:

   Public Interface IConverter

       Function Parse(ByVal s As String) As Object

   End Interface

   Public Class DoubleConverter
       Implements IConverter

       Public Function Parse(ByVal s As String) As Object Implements IConverter.Parse
           Return Double.Parse(s)
       End Function

   End Class

   Public Class ConverterCollection
       Inherits DictionaryBase

       Public Sub Add(ByVal type As Type, ByVal converter As IConverter)
           MyBase.InnerHashtable.Add(type, converter)
       End Sub

       Default Public ReadOnly Property Item(ByVal type As Type) As IConverter
           Get
               Return DirectCast(MyBase.InnerHashtable.Item(type), IConverter)
           End Get
       End Property

   End Class

   Public Shared Sub Main()
       Dim converters As New ConverterCollection
       converters.Add(GetType(Double), New DoubleConverter)

   End Sub

The disadvantage of the interface/class method is the proliferation of
classes. The advantage of the delegate method is the elimination of all the
classes...

As to performance: Remember the 80/20 rule. That is 80% of the execution
time of your program is spent in 20% of your code. I will optimize (worry
about performance, memory consumption) the 20% once that 20% has been
identified & proven to be a performance problem via profiling (CLR Profiler
is one profiling tool).

For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's
article "Yet Another Optimization Article" at
http://martinfowler.com/ieeeSoftware/yetOptimization.pdf

Hope this helps
Jay

| Hi,
| I need to read a big CSV file, where different fields should be converted to
[quoted text clipped - 26 lines]
| Thank you
| John
Jay B. Harlow [MVP - Outlook] - 09 Aug 2005 18:38 GMT
Doh!

I should add that System.ComponentModel.TypeConverter might be a class that
you could leverage instead of creating your own IConverter class.

You can use TypeDescriptor.GetConverter to get the TypeConverter for a Type
or Object. If performance is a consideration, I would consider caching the
TypeConverters.

Of course, instead of storing the converters in their own hash table as I
showed earlier, you could store them in the type describing each field...

Something like:

   Public Class FieldDescription

       Public Name As String

       Public Type As Type

       Public Converter As IConverter

       ' Alternative to IConverter or your delegate (declare one or the other):
       ' Public Converter As TypeConverter

   End Class

Hope this helps
Jay
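
A C# sketch of the TypeConverter suggestion with the caching mentioned above. ConverterCache and its Parse method are hypothetical names; TypeDescriptor.GetConverter and TypeConverter.ConvertFromInvariantString are the framework calls. The dictionary is not thread-safe, and the exception type thrown for malformed input differs between framework versions, so none is caught here.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

public static class ConverterCache
{
    static readonly Dictionary<Type, TypeConverter> Cache =
        new Dictionary<Type, TypeConverter>();

    // Looks up the converter once per type, then reuses it for every field.
    public static object Parse(Type type, string text)
    {
        TypeConverter converter;
        if (!Cache.TryGetValue(type, out converter))
        {
            converter = TypeDescriptor.GetConverter(type);
            Cache[type] = converter;
        }
        // Invariant-culture parsing keeps results independent of regional settings.
        return converter.ConvertFromInvariantString(text);
    }
}
```

Usage: object value = ConverterCache.Parse(typeof(double), "2.5");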

| John,
| In addition to the other comments.
[quoted text clipped - 96 lines]
|| Thank you
|| John
John - 09 Aug 2005 19:36 GMT
Hi,
Thank you for your answer.

> The disadvantage of the interface/class method is the proliferation of
> classes.

Exactly. I asked this question in the first place because I didn't like
the proliferation of wrappers. I am just lazy. After typing 2 or 3 wrappers
I was bored and posted this message last Saturday.
But the proliferation of classes is much worse
because it will require 2 (or 3) times more lines of code than the wrappers.
I am certainly not going to type them.

By the way, in C++ I could have used templates to automatically
generate wrappers. I had big hopes for the generics feature in .NET,
but I was disappointed when I found that generics cannot be used
in this case (and in many others) because there is no
base class or interface common to all the types that exposes a Parse
method. Without such a common interface the compiler refuses
to compile the generic class. Stupid!

> Remember the 80/20 rule.

Yes, I am reminded of this rule every time I, as a user, have to suffer
running a slow program. I know what the developers were thinking.

> (CLR Profiler is one profiling tool).

My CSV may have a few million records. Remembering that
reflection is 15 times (1500%) slower than delegates in this case,
I don't want to waste my time on the profiler
(in this case a wristwatch is sufficient).

So, I am sticking to the wrappers/delegates,
but I appreciate all answers, I learned about TEXT OleDb driver
and Convert.ChangeType and may use them in the future.

Thank you
John
