.NET Forum / Languages / C# / June 2007

XMLReader skip current element

Alex - 05 Jun 2007 16:29 GMT
For example, I have this part of an XML file.

<AppSettings>
   <Object ClassVersion="1.0.0.0" Type="AppSettings">
     <Fields>
       <Field Name="App_ID" Type="System.Int32">
         <Value>
           <int>-1</int>
         </Value>
       </Field>
       <Field Name="AppDate Type="System.DateTime">
         <Value>
           <dateTime>2007-05-25T00:00:00</dateTime>
         </Value>
       </Field>
       <Field Name="AppFileName" Type="System.String">
         <Value>
           <string>TEST 03222007.daf</string>
         </Value>
       </Field>
       <Field Name="AppVersion" Type="System.String">
         <Value>
           <string>1.0.3.3</string>
         </Value>
       </Field>
       <Field Name="_ClassVersion" Type="System.String">
         <Value>
           <string>1.0.0.0</string>
         </Value>
       </Field>
     </Fields>
   </Object>
 </AppSettings>

As you can see, it's corrupted, because AppDate is missing its closing ".
I get an exception from reader.MoveToContent (after I read App_ID);
all of this is inside a try..catch block.
After that I end up with something like string fieldname == "AppDate
Type=";
I can't work out how to jump to AppFileName and skip the corrupted
AppDate.
So, how can I jump to the next element from the catch block? (At run
time, I don't know the name of the next element.)

Thanks
Jon Skeet [C# MVP] - 05 Jun 2007 16:34 GMT
<snip>

> As you can see, it's corrupted, because AppDate is missing its closing ".

Right. It's an invalid XML file. I would strongly recommend that you
completely reject such files - trying to cope with broken files like
this is a real pain, and I don't know whether XmlReader (or any of the
other .NET XML types) supports it.

Jon
Alex - 05 Jun 2007 16:40 GMT
> <snip>
>
[quoted text clipped - 6 lines]
>
> Jon

Sure, I made the file invalid by hand, because I want to improve my
code so that it avoids or handles this problem.

This is just a fragment; the file is 100KB now and will be bigger
later.
Also, the file is something like an XML serialization of some classes
I want to serialize.
So the stored data is large, and I really don't want the user to have
to fill it all out again.

So, if there is a solution for this, I would be glad to hear it.
Jon Skeet [C# MVP] - 05 Jun 2007 18:47 GMT
> > Right. It's an invalid XML file. I would strongly recommend that you
> > completely reject such files - trying to cope with broken files like
[quoted text clipped - 3 lines]
> Sure, I made the file invalid by hand, because I want to improve my
> code so that it avoids or handles this problem.

Is there any real reason why you need to handle an invalid XML file?
Most XML-based applications don't, as far as I'm aware. (Obviously XML
editors have to, but other than that...)

> This is just a fragment; the file is 100KB now and will be bigger
> later.
[quoted text clipped - 4 lines]
>
> So, if there is a solution for this, I would be glad to hear it.

Why would the user have to fill anything out again? Why are you
expecting invalid XML?

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
If replying to the group, please do not mail me too

Peter Duniho - 05 Jun 2007 19:50 GMT
> Is there any real reason why you need to handle an invalid XML file?
> Most XML-based applications don't, as far as I'm aware. (Obviously XML
> editors have to, but other than that...)

Well, and in fact I'm not sure that XML editors have to either.  As an  
imprecise but similar example, consider Visual Studio's code editor.  If  
you miss some sort of closing quote, comment closure, closing bracket,  
etc. the editor makes no attempt to recover from that.  It just shows you  
that there's a problem, treating the file as "valid" all the way up to the  
point where it knows for sure it's not valid (which is often the end of  
the file).

I can imagine someone writing an XML editor that goes to a lot of effort  
to try to detect and correct invalid XML, just as the OP wants to do in  
his program.  But it would surprise me if this is the norm, even when  
looking only at XML editors.

Pete
Jon Skeet [C# MVP] - 05 Jun 2007 20:47 GMT
> > Is there any real reason why you need to handle an invalid XML file?
> > Most XML-based applications don't, as far as I'm aware. (Obviously XML
[quoted text clipped - 7 lines]
> point where it knows for sure it's not valid (which is often the end of  
> the file).

It depends on quite how broken you make it.

If you miss off a semi-colon or have a random extra character like "+"
between statements, it's still syntactically invalid, but it recovers
quickly. An extra closing brace certainly confuses it though, yes.

> I can imagine someone writing an XML editor that goes to a lot of effort  
> to try to detect and correct invalid XML, just as the OP wants to do in  
> his program.  But it would surprise me if this is the norm, even when  
> looking only at XML editors.

Maybe it's just the ones I've used - and that's only from memory,
admittedly...

Peter Duniho - 05 Jun 2007 21:16 GMT
> It depends on quite how broken you make it.
>
> If you miss off a semi-colon or have a random extra character like "+"
> between statements, it's still syntactically invalid, but it recovers
> quickly. An extra closing brace certainly confuses it though, yes.

I suppose "recovers" is in the eye of the beholder.  What I see when one  
leaves off a semi-colon is that the end of the statement where the  
semi-colon was expected is flagged.  However, the only reason it can do  
that is that it is apparent upon seeing the first thing that doesn't make  
sense in that statement (ie, the next statement) where the error is.

But I don't really see that the editor has "recovered".  It is simply  
pointing out the first place it has detected a problem.  Just as the  
compiler won't compile a file even though it could usually correctly infer  
the correct location of the semicolon, it's not really like the VS editor  
has judged the remainder of the file correct and accurate.  In fact, it  
gives up on a variety of automatic stuff once it's stumbled (for example,  
I've lost count of the number of times that I don't get Intellisense  
feedback because of a localized compiler-type error in my source code).

Compilers, code editors, and XML editors alike can all make inferences  
about what the input data *should* look like, and try to produce correct  
behavior based on those inferences.  But my experience (granted, limited  
in the case of XML editors, but not so limited in other areas) is that if  
the input data does not comply exactly with what's expected, the user is  
simply told "this data is bad...I'm not going any further until you fix  
it".

Pete
Jon Skeet [C# MVP] - 05 Jun 2007 21:40 GMT
> > It depends on quite how broken you make it.
> >
[quoted text clipped - 10 lines]
> But I don't really see that the editor has "recovered".  It is simply  
> pointing out the first place it has detected a problem.

It recovers to the extent that it's able to find errors later on, and
you can still use Intellisense etc.

For example, take this code:

using System;

public class Test
{
   static void Main()
   {
       int x = 5
       int y = 10;
       
       Console.WriteLine("Hello");
   }
}

If you type another "Console." underneath the current call to
Console.WriteLine, VS (2005 at least) offers Intellisense.

It's hard for me to judge exactly how well VS does as opposed to
ReSharper, but if you change Console.WriteLine to Console.Foo, I
certainly get some feedback that Foo isn't a valid member of Console.

> Just as the  
> compiler won't compile a file even though it could usually correctly infer  
[quoted text clipped - 3 lines]
> I've lost count of the number of times that I don't get Intellisense  
> feedback because of a localized compiler-type error in my source code).

You should try Eclipse some time - it will compile (in some cases, at
least) syntactically invalid code, generating code which throws an
exception when it's got to somewhere that the compilation broke. Not
terribly handy, but quite cute.

> Compilers, code editors, and XML editors alike can all make inferences  
> about what the input data *should* look like, and try to produce correct  
[quoted text clipped - 3 lines]
> simply told "this data is bad...I'm not going any further until you fix  
> it".

Certainly things are more limited after an error, but there's often
still *some* functionality available. If I find the time I might see
what a few XML editors do past an error - whether they still
automatically close tags, find further errors etc. Certainly the VS
2005 XML editor was able to automatically close the "blech" tag in the
below XML, despite the previous error:

<?xml version="1.0" encoding="utf-8" ?>
<foo>
 <bar>
   <baz text="Hello otherText="There"/>

   <blech></blech>
 </bar>
</foo>

Also if you change </blech> to </blech2> it notices that as a second
error.

Peter Duniho - 05 Jun 2007 22:47 GMT
> [...]
> You should try Eclipse some time - it will compile (in some cases, at
> least) syntactically invalid code, generating code which throws an
> exception when it's got to somewhere that the compilation broke. Not
> terribly handy, but quite cute.

Well, sure.  I can appreciate "cute".  :)  But as you say, not terribly  
handy.  Likewise, just how handy would it be to just skip over an invalid  
section of XML, when you have no idea what the overall effect of doing so  
would be?  Just because the remaining XML can be parsed, that doesn't mean  
that it can be *used* without the part that was erroneous.

> [...] Certainly the VS
> 2005 XML editor was able to automatically close the "blech" tag in the
> below XML, despite the previous error:

I certainly agree that it *can* be done.  I just am not convinced it makes  
sense to bother writing the code to do so.  It does seem to me that in an  
editor, where the user is actively modifying the data, it makes more sense  
to put the effort in, but even there I wouldn't necessarily insist on it  
(even in VS there are limits to what it can recover from, and frankly it  
only handles the simplest situations).  I expect it's something you see in  
editors that are intended to be feature-laden, considered "heavy-duty"  
(that's certainly how I'd describe VS).

In a situation where the data is static though, I don't see the use in  
recovering.  You never know when the data that was in error was critical  
to the use of the larger XML document.  Just because you can successfully  
parse the rest of the document doesn't mean you should, just as just  
because a compiler could make an assumption about where to insert a  
missing semi-colon doesn't mean it should.

Pete
Jon Skeet [C# MVP] - 05 Jun 2007 23:00 GMT
> > [...]
> > You should try Eclipse some time - it will compile (in some cases, at
[quoted text clipped - 7 lines]
> would be?  Just because the remaining XML can be parsed, that doesn't mean  
> that it can be *used* without the part that was erroneous.

On the other hand, if I open an invalid XML file it's nice to know
whether there's just one error or whether the whole thing is pooched.

> > [...] Certainly the VS
> > 2005 XML editor was able to automatically close the "blech" tag in the
[quoted text clipped - 8 lines]
> editors that are intended to be feature-laden, considered "heavy-duty"  
> (that's certainly how I'd describe VS).

Agreed in the last bit - and I'm *certainly* not suggesting that the OP
should try to recover.

> In a situation where the data is static though, I don't see the use in  
> recovering.  You never know when the data that was in error was critical  
> to the use of the larger XML document.  Just because you can successfully  
> parse the rest of the document doesn't mean you should, just as just  
> because a compiler could make an assumption about where to insert a  
> missing semi-colon doesn't mean it should.

Oh absolutely. I was only talking about editors, where it can be handy
to be able to show more than the first error.

Even with static document reading, it *may* be useful to bomb out with
an error which has a good stab at working out where all the error parts
are, rather than just the first one. That's not the same as really
trying to recover though.

Peter Duniho - 06 Jun 2007 07:34 GMT
> On the other hand, if I open an invalid XML file it's nice to know
> whether there's just one error or whether the whole thing is pooched.

Sure, I agree.  If you're using an editor, that would be a nice feature to  
have.  But that still doesn't mean it would be a ubiquitous feature in all  
XML editors (though I can see how it might appear in advanced editors).

> [...]
> Even with static document reading, it *may* be useful to bomb out with
> an error which has a good stab at working out where all the error parts
> are, rather than just the first one. That's not the same as really
> trying to recover though.

Nope.  :)

If I wanted to provide feedback as to a place to look for the error, I
would tell the user the last place in the file where I had valid data.
That's not really the same as trying to do anything fancy with figuring
out the erroneous part though.  All it requires is keeping track of how far
into the file you got before you failed to generate new valid data.
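
A minimal C# sketch of that bookkeeping, offered as an illustration and not as code from the thread: the class and method names are made up, and it assumes the whole file is available as a string. It remembers the last element that was read successfully and, when the reader gives up, reports that name together with the line and position carried by XmlException. It makes no attempt to parse past the error.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: names are illustrative only.
public static class XmlErrorLocator
{
    // Returns an empty string when the text is well-formed; otherwise a
    // message naming the last element read and where parsing failed.
    public static string DescribeFirstError(string xmlText)
    {
        string lastElement = "(none)";
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xmlText)))
            {
                // Read() throws XmlException at the first well-formedness error.
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        lastElement = reader.Name;
                    }
                }
            }
            return string.Empty;
        }
        catch (XmlException ex)
        {
            return string.Format(
                "Last element read successfully: <{0}>. Parsing failed at line {1}, position {2}: {3}",
                lastElement, ex.LineNumber, ex.LinePosition, ex.Message);
        }
    }
}
```

The message only points the user at a place to start looking; everything after the reported position is still unread.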

It's the parsing bad data that I think is normally going to be outside the  
scope of typical software.  Sorry if I seem to have taken this thread off  
on a tangent.  I just got set off by the statement that an XML editor  
*has* to handle errors.  An XML editor *could* in fact just display the  
text beyond the error and tell the user "I'm not going to help you with  
this until you fix it".  :)

Pete
Martin Honnen - 05 Jun 2007 16:41 GMT
> For example, I have this part of an XML file.
>
[quoted text clipped - 7 lines]
>         </Field>
>         <Field Name="AppDate Type="System.DateTime">

> As you can see, it's corrupted, because AppDate is missing its closing ".
> I get an exception from reader.MoveToContent (after I read App_ID);
[quoted text clipped - 5 lines]
> So, how can I jump to the next element from the catch block? (At run
> time, I don't know the name of the next element.)

XML has strict rules, the sample markup is not well-formed and therefore
the XML parser will not parse it but throw an exception. There is no way
to simply skip markup that is not well-formed. So you will not be able
to parse that markup successfully with XmlReader. You have to fix
whatever application generates the markup to produce well-formed XML.
With .NET, using XmlWriter can help.
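
As a rough illustration of the XmlWriter suggestion (a sketch only: the FieldWriter class and its parameters are invented here, and the element layout is copied from the sample in the question), the writer emits the quotes and escaping itself, so an attribute cannot lose its closing quote:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper: names are illustrative only.
public static class FieldWriter
{
    // Writes one <Field> element in the shape used by the sample file.
    public static string WriteField(string name, string typeName, string valueTag, string value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;
        settings.Indent = true;

        StringWriter output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Field");
            // The writer adds the surrounding quotes and escapes special characters.
            writer.WriteAttributeString("Name", name);
            writer.WriteAttributeString("Type", typeName);
            writer.WriteStartElement("Value");
            writer.WriteElementString(valueTag, value);
            writer.WriteEndElement(); // </Value>
            writer.WriteEndElement(); // </Field>
        }
        return output.ToString();
    }
}
```

For example, FieldWriter.WriteField("AppDate", "System.DateTime", "dateTime", "2007-05-25T00:00:00") produces a Field element that XmlReader can read back.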

Signature

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/

Alex - 05 Jun 2007 17:06 GMT
OK :(

Is it possible to read the file some other way, still mostly automatically,
and skip a problem like that as I need?
I mean not using XmlReader, because it can't jump, but something else.
I definitely don't want to write every single field to the XML file by hand
(this is just serialization of the class fields I need).

But if an exception occurs, skip that field?
Peter Duniho - 05 Jun 2007 17:56 GMT
> Is it possible to read the file some other way, still mostly automatically,
> and skip a problem like that as I need?
[quoted text clipped - 3 lines]
>
> But if an exception occurs, skip that field?

No.  The general-purpose XML classes have no practical way to make  
intelligent decisions about where to start looking again for valid data.  
The only way to do what you want, even in some limited way, is to do  
everything yourself.

You as a person can look at the file visually and tell where valid data  
again starts, but that's because you have a LOT of "meta-information"  
about the XML and can recognize things that would never appear inside  
quoted text, but which are definitely part of the XML structure.  If you  
want your code to handle that, you will need to write it yourself, taking  
advantage of this knowledge.  If you do this, you will likely want to  
implement your entire XML reading code from scratch, so that when you run  
across something that doesn't make sense you can recover immediately based  
on where you've already read.

Personally, I would not bother.  As has been pointed out, the XML is  
simply invalid.  It's not going to be invalid unless some user hand-edits  
the file and starts mucking it up, and once you assume users may do that,  
it is impossible to ensure that you can in any sensible way recover from  
their doings.  You should definitely make sure that bad data doesn't bring  
your application crashing down, but it's not reasonable for a user to  
expect you to come up with some graceful way to reconstruct the invalid  
data in the general case, and so you should probably not waste a lot of  
time implementing code that does so.
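
One way to keep bad data from bringing the application down, sketched in C# under stated assumptions: the AppSettings class below is a hypothetical stand-in (the thread does not show the real classes), and the file is assumed to be read with XmlSerializer. A malformed file is rejected as a whole and the caller gets default values plus a flag, with no attempt to salvage individual fields.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical settings class, used only for illustration.
public class AppSettings
{
    public int AppId = -1;
    public string AppFileName = "";
}

public static class SettingsLoader
{
    // Returns the deserialized settings, or defaults when the text is not usable.
    public static AppSettings LoadOrDefault(string xmlText, out bool usedDefaults)
    {
        try
        {
            XmlSerializer serializer = new XmlSerializer(typeof(AppSettings));
            using (StringReader reader = new StringReader(xmlText))
            {
                usedDefaults = false;
                return (AppSettings)serializer.Deserialize(reader);
            }
        }
        catch (InvalidOperationException)
        {
            // XmlSerializer reports malformed XML as InvalidOperationException,
            // with the underlying XmlException as its InnerException.
            usedDefaults = true;
            return new AppSettings();
        }
    }
}
```

The usedDefaults flag lets the caller tell the user that the saved file was rejected, so the failure is not silent.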

Pete
