XMLReader skip current element
Alex - 05 Jun 2007 16:29 GMT
For example, I have this part of an XML file:
<AppSettings>
  <Object ClassVersion="1.0.0.0" Type="AppSettings">
    <Fields>
      <Field Name="App_ID" Type="System.Int32">
        <Value><int>-1</int></Value>
      </Field>
      <Field Name="AppDate Type="System.DateTime">
        <Value><dateTime>2007-05-25T00:00:00</dateTime></Value>
      </Field>
      <Field Name="AppFileName" Type="System.String">
        <Value><string>TEST 03222007.daf</string></Value>
      </Field>
      <Field Name="AppVersion" Type="System.String">
        <Value><string>1.0.3.3</string></Value>
      </Field>
      <Field Name="_ClassVersion" Type="System.String">
        <Value><string>1.0.0.0</string></Value>
      </Field>
    </Fields>
  </Object>
</AppSettings>
As you can see, it's corrupted, because AppDate doesn't have a second ". I am getting an exception at reader.MoveToContent (after I read App_ID). All of this is inside a try..catch block, and after that I get something like string fieldname == "AppDate Type=". I can't understand how I can jump to AppFileName and skip the corrupted AppDate. So, how can I jump to the next element in the catch section? (While the application is running, I don't know the name of the next element.)
Thanks
Jon Skeet [C# MVP] - 05 Jun 2007 16:34 GMT
<snip>
> As you can see, it's corrupted, because AppDate doesn't have a second ".
Right. It's an invalid XML file. I would strongly recommend that you completely reject such files - trying to cope with broken files like this is a real pain, and I don't know whether XmlReader (or any of the other .NET XML types) supports it.
Jon
Alex - 05 Jun 2007 16:40 GMT
> <snip>
> [quoted text clipped - 6 lines]
>
> Jon
Sure, I made the file invalid manually, because I want to add some improvements to my code, to avoid or solve this problem.
This is just a fragment; the file size is 100KB now and will be bigger later. Also, this file is essentially an XML serialization of some classes I want to persist. So the stored data is large, and I really don't want the user to have to fill it all out again.
So, if there is some solution for this, I will be glad to hear it.
Jon Skeet [C# MVP] - 05 Jun 2007 18:47 GMT
> > Right. It's an invalid XML file. I would strongly recommend that you
> > completely reject such files - trying to cope with broken files like
[quoted text clipped - 3 lines]
> Sure, I made the file invalid manually, because I want to add some
> improvements to my code, to avoid or solve this problem.
Is there any real reason why you need to handle an invalid XML file? Most XML-based applications don't, as far as I'm aware. (Obviously XML editors have to, but other than that...)
> This is just a fragment; the file size is 100KB now and will be bigger
> later.
[quoted text clipped - 4 lines]
>
> So, if there is some solution for this, I will be glad to hear it.
Why would the user have to fill anything out again? Why are you expecting invalid XML?
 Signature Jon Skeet - <skeet@pobox.com> http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet If replying to the group, please do not mail me too
Peter Duniho - 05 Jun 2007 19:50 GMT
> Is there any real reason why you need to handle an invalid XML file?
> Most XML-based applications don't, as far as I'm aware. (Obviously XML
> editors have to, but other than that...)
Well, and in fact I'm not sure that XML editors have to either. As an imprecise but similar example, consider Visual Studio's code editor. If you miss some sort of closing quote, comment closure, closing bracket, etc., the editor makes no attempt to recover from that. It just shows you that there's a problem, treating the file as "valid" all the way up to the point where it knows for sure it's not valid (which is often the end of the file).
I can imagine someone writing an XML editor that goes to a lot of effort to try to detect and correct invalid XML, just as the OP wants to do in his program. But it would surprise me if this is the norm, even when looking only at XML editors.
Pete
Jon Skeet [C# MVP] - 05 Jun 2007 20:47 GMT
> > Is there any real reason why you need to handle an invalid XML file?
> > Most XML-based applications don't, as far as I'm aware. (Obviously XML
[quoted text clipped - 7 lines]
> point where it knows for sure it's not valid (which is often the end of
> the file).
It depends on quite how broken you make it.
If you miss off a semi-colon or have a random extra character like "+" between statements, it's still syntactically invalid, but it recovers quickly. An extra closing brace certainly confuses it though, yes.
> I can imagine someone writing an XML editor that goes to a lot of effort
> to try to detect and correct invalid XML, just as the OP wants to do in
> his program. But it would surprise me if this is the norm, even when
> looking only at XML editors.
Maybe it's just the ones I've used - and that's only from memory, admittedly...
Peter Duniho - 05 Jun 2007 21:16 GMT
> It depends on quite how broken you make it.
>
> If you miss off a semi-colon or have a random extra character like "+"
> between statements, it's still syntactically invalid, but it recovers
> quickly. An extra closing brace certainly confuses it though, yes.
I suppose "recovers" is in the eye of the beholder. What I see when one leaves off a semi-colon is that the end of the statement where the semi-colon was expected is flagged. However, the only reason it can do that is that the location of the error becomes apparent as soon as it sees the first thing that doesn't make sense in that statement (i.e., the next statement).
But I don't really see that the editor has "recovered". It is simply pointing out the first place it has detected a problem. Just as the compiler won't compile a file even though it could usually correctly infer the correct location of the semicolon, it's not really like the VS editor has judged the remainder of the file correct and accurate. In fact, it gives up on a variety of automatic stuff once it's stumbled (for example, I've lost count of the number of times that I don't get Intellisense feedback because of a localized compiler-type error in my source code).
Compilers, code editors, and XML editors alike can all make inferences about what the input data *should* look like, and try to produce correct behavior based on those inferences. But my experience (granted, limited in the case of XML editors, but not so limited in other areas) is that if the input data does not comply exactly with what's expected, the user is simply told "this data is bad...I'm not going any further until you fix it".
Pete
Jon Skeet [C# MVP] - 05 Jun 2007 21:40 GMT
> > It depends on quite how broken you make it.
> >
[quoted text clipped - 10 lines]
> But I don't really see that the editor has "recovered". It is simply
> pointing out the first place it has detected a problem.
It recovers to the extent that it's able to find errors later on, and you can still use Intellisense etc.
For example, take this code:
using System;
public class Test
{
    static void Main()
    {
        int x = 5
        int y = 10;
        Console.WriteLine("Hello");
    }
}
If you type another "Console." underneath the current call to Console.WriteLine, VS (2005 at least) offers Intellisense.
It's hard for me to judge exactly how well VS does as opposed to resharper, but if you change Console.WriteLine to Console.Foo, I certainly get some feedback that Foo isn't a valid member of Console.
> Just as the
> compiler won't compile a file even though it could usually correctly infer
[quoted text clipped - 3 lines]
> I've lost count of the number of times that I don't get Intellisense
> feedback because of a localized compiler-type error in my source code).
You should try Eclipse some time - it will compile (in some cases, at least) syntactically invalid code, generating code which throws an exception when it's got to somewhere that the compilation broke. Not terribly handy, but quite cute.
> Compilers, code editors, and XML editors alike can all make inferences
> about what the input data *should* look like, and try to produce correct
[quoted text clipped - 3 lines]
> simply told "this data is bad...I'm not going any further until you fix
> it".
Certainly things are more limited after an error, but there's often still *some* functionality available. If I find the time I might see what a few XML editors do past an error - whether they still automatically close tags, find further errors etc. Certainly the VS 2005 XML editor was able to automatically close the "blech" tag in the below XML, despite the previous error:
<?xml version="1.0" encoding="utf-8" ?>
<foo>
  <bar>
    <baz text="Hello otherText="There"/>
    <blech></blech>
  </bar>
</foo>
Also if you change </blech> to </blech2> it notices that as a second error.
Peter Duniho - 05 Jun 2007 22:47 GMT
> [...]
> You should try Eclipse some time - it will compile (in some cases, at
> least) syntactically invalid code, generating code which throws an
> exception when it's got to somewhere that the compilation broke. Not
> terribly handy, but quite cute.
Well, sure. I can appreciate "cute". :) But as you say, not terribly handy. Likewise, just how handy would it be to just skip over an invalid section of XML, when you have no idea what the overall effect of doing so would be? Just because the remaining XML can be parsed, that doesn't mean that it can be *used* without the part that was erroneous.
> [...] Certainly the VS
> 2005 XML editor was able to automatically close the "blech" tag in the
> below XML, despite the previous error:
I certainly agree that it *can* be done. I just am not convinced it makes sense to bother writing the code to do so. It does seem to me that in an editor, where the user is actively modifying the data, it makes more sense to put the effort in, but even there I wouldn't necessarily insist on it (even in VS there are limits to what it can recover from, and frankly it only handles the simplest situations). I expect it's something you see in editors that are intended to be feature-laden, considered "heavy-duty" (that's certainly how I'd describe VS).
In a situation where the data is static though, I don't see the use in recovering. You never know when the data that was in error was critical to the use of the larger XML document. Just because you can successfully parse the rest of the document doesn't mean you should, just as just because a compiler could make an assumption about where to insert a missing semi-colon doesn't mean it should.
Pete
Jon Skeet [C# MVP] - 05 Jun 2007 23:00 GMT
> > [...]
> > You should try Eclipse some time - it will compile (in some cases, at
[quoted text clipped - 7 lines]
> would be? Just because the remaining XML can be parsed, that doesn't mean
> that it can be *used* without the part that was erroneous.
On the other hand, if I open an invalid XML file it's nice to know whether there's just one error or whether the whole thing is pooched.
> > [...] Certainly the VS
> > 2005 XML editor was able to automatically close the "blech" tag in the
[quoted text clipped - 8 lines]
> editors that are intended to be feature-laden, considered "heavy-duty"
> (that's certainly how I'd describe VS).
Agreed in the last bit - and I'm *certainly* not suggesting that the OP should try to recover.
> In a situation where the data is static though, I don't see the use in
> recovering. You never know when the data that was in error was critical
> to the use of the larger XML document. Just because you can successfully
> parse the rest of the document doesn't mean you should, just as just
> because a compiler could make an assumption about where to insert a
> missing semi-colon doesn't mean it should.
Oh absolutely. I was only talking about editors, where it can be handy to be able to show more than the first error.
Even with static document reading, it *may* be useful to bomb out with an error which has a good stab at working out where all the error parts are, rather than just the first one. That's not the same as really trying to recover though.
Peter Duniho - 06 Jun 2007 07:34 GMT
> On the other hand, if I open an invalid XML file it's nice to know
> whether there's just one error or whether the whole thing is pooched.
Sure, I agree. If you're using an editor, that would be a nice feature to have. But that still doesn't mean it would be a ubiquitous feature in all XML editors (though I can see how it might appear in advanced editors).
> [...]
> Even with static document reading, it *may* be useful to bomb out with
> an error which has a good stab at working out where all the error parts
> are, rather than just the first one. That's not the same as really
> trying to recover though.
Nope. :)
If I wanted to provide feedback about where to look for the error, I would tell the user the last place in the file where I had valid data. That's not really the same as trying to do anything fancy with figuring out the erroneous part though. All it requires is keeping track of how far into the file you got before you failed to generate new valid data.
It's the parsing bad data that I think is normally going to be outside the scope of typical software. Sorry if I seem to have taken this thread off on a tangent. I just got set off by the statement that an XML editor *has* to handle errors. An XML editor *could* in fact just display the text beyond the error and tell the user "I'm not going to help you with this until you fix it". :)
Pete
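The bookkeeping Pete describes can be sketched roughly as follows. This is only an illustration, assuming C# and System.Xml: the class name XmlErrorLocator and the method Describe are made up for the example, while XmlException.LineNumber and XmlException.LinePosition are the standard properties that say where the parser gave up.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, not code from the thread.
public static class XmlErrorLocator
{
    // Returns null if the text is well-formed XML; otherwise a message that
    // says where the parser stopped and which element was read last.
    public static string Describe(string xml)
    {
        string lastElement = "(none)";
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element)
                    {
                        // Remember how far we got with valid data.
                        lastElement = reader.Name;
                    }
                }
            }
            return null;
        }
        catch (XmlException ex)
        {
            // No recovery is attempted: report the position and stop.
            return string.Format(
                "Not well-formed at line {0}, position {1}; last element read: <{2}>. {3}",
                ex.LineNumber, ex.LinePosition, lastElement, ex.Message);
        }
    }

    public static void Main()
    {
        string broken =
            "<Fields>\n  <Field Name=\"AppDate Type=\"System.DateTime\"/>\n</Fields>";
        Console.WriteLine(Describe(broken) ?? "Document is well-formed.");
    }
}
```

The reader is abandoned as soon as the exception is thrown, so the sketch only reports a location and never tries to continue past the error.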
Martin Honnen - 05 Jun 2007 16:41 GMT
> For example, I have this part of an XML file:
> [quoted text clipped - 7 lines]
> </Field>
> <Field Name="AppDate Type="System.DateTime">
> As you can see, it's corrupted, because AppDate doesn't have a second ".
> I am getting an exception at reader.MoveToContent (after I read App_ID)
[quoted text clipped - 5 lines]
> So, how can I jump to the next element in the catch section? (While the
> application is running, I don't know the name of the next element.)
XML has strict rules, the sample markup is not well-formed and therefore the XML parser will not parse it but throw an exception. There is no way to simply skip markup that is not well-formed. So you will not be able to parse that markup successfully with XmlReader. You have to fix whatever application generates the markup to produce well-formed XML. With .NET, using XmlWriter can help.
 Signature Martin Honnen --- MVP XML http://JavaScript.FAQTs.com/
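As an illustration of Martin's XmlWriter suggestion, here is a minimal sketch, assuming the Field/Value layout from the sample at the top of the thread. The class FieldWriter and its methods are hypothetical names; the XmlWriter calls themselves are standard System.Xml API. Because the writer emits the quotes and escaping itself, a missing closing quote like the one in the AppDate field cannot be produced this way.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper, not code from the thread.
public static class FieldWriter
{
    // Writes one <Field> element in the layout used by the thread's sample.
    public static void WriteField(XmlWriter writer, string name, string type,
                                  string valueElement, string value)
    {
        writer.WriteStartElement("Field");
        writer.WriteAttributeString("Name", name); // quoting and escaping handled by XmlWriter
        writer.WriteAttributeString("Type", type);
        writer.WriteStartElement("Value");
        writer.WriteElementString(valueElement, value);
        writer.WriteEndElement(); // </Value>
        writer.WriteEndElement(); // </Field>
    }

    // Builds a small <Fields> fragment with two of the sample's fields.
    public static string BuildSample()
    {
        StringBuilder sb = new StringBuilder();
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Fields");
            WriteField(writer, "App_ID", "System.Int32", "int", "-1");
            WriteField(writer, "AppDate", "System.DateTime", "dateTime",
                       "2007-05-25T00:00:00");
            writer.WriteEndElement(); // </Fields>
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(BuildSample());
    }
}
```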
Alex - 05 Jun 2007 17:06 GMT
ok :(
Is it possible to read it in some other way, but a bit automatically, and skip problems like that as I need? I mean not using XmlReader, because it can't jump, but something else. But I certainly don't want to write all the fields to the XML file manually (this is just a serialization of the class fields I need).
But if an exception appears - skip the field?
Peter Duniho - 05 Jun 2007 17:56 GMT
> Is it possible to read it in some other way, but a bit automatically,
> and skip problems like that as I need?
[quoted text clipped - 3 lines]
>
> But if an exception appears - skip the field
No. The general-purpose XML classes have no practical way to make intelligent decisions about where to start looking again for valid data. The only way to do what you want, even in some limited way, is to do everything yourself.
You as a person can look at the file visually and tell where valid data again starts, but that's because you have a LOT of "meta-information" about the XML and can recognize things that would never appear inside quoted text, but which are definitely part of the XML structure. If you want your code to handle that, you will need to write it yourself, taking advantage of this knowledge. If you do this, you will likely want to implement your entire XML reading code from scratch, so that when you run across something that doesn't make sense you can recover immediately based on where you've already read.
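To make the "do it yourself" route concrete, here is a rough sketch of what such hand-written recovery might look like for this one file layout. It is an assumption-laden illustration, not something anyone in the thread proposes: FieldSalvager and SalvageFieldNames are hypothetical names, and the code relies on format-specific knowledge (every record is a `<Field ...>...</Field>` block, fields never nest, and the closing `</Field>` tag itself is intact).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, not code from the thread.
public static class FieldSalvager
{
    // Assumption: "<Field" followed by a word boundary starts a record and the
    // next "</Field>" ends it. "<Fields>" is not matched because of \b.
    private static readonly Regex FieldBlock =
        new Regex(@"<Field\b.*?</Field>", RegexOptions.Singleline);

    // Returns the Name attribute of every block that parses on its own;
    // blocks that are not well-formed are collected in 'rejected'.
    public static List<string> SalvageFieldNames(string text, List<string> rejected)
    {
        List<string> names = new List<string>();
        foreach (Match m in FieldBlock.Matches(text))
        {
            try
            {
                XmlDocument doc = new XmlDocument();
                doc.LoadXml(m.Value); // each block is parsed in isolation
                names.Add(doc.DocumentElement.GetAttribute("Name"));
            }
            catch (XmlException)
            {
                rejected.Add(m.Value); // skip the damaged block
            }
        }
        return names;
    }

    public static void Main()
    {
        string text =
            "<Fields>" +
            "<Field Name=\"App_ID\" Type=\"System.Int32\"><Value><int>-1</int></Value></Field>" +
            "<Field Name=\"AppDate Type=\"System.DateTime\"><Value><dateTime>2007-05-25T00:00:00</dateTime></Value></Field>" +
            "<Field Name=\"AppFileName\" Type=\"System.String\"><Value><string>TEST 03222007.daf</string></Value></Field>" +
            "</Fields>";
        List<string> rejected = new List<string>();
        foreach (string name in SalvageFieldNames(text, rejected))
        {
            Console.WriteLine("Recovered field: " + name);
        }
        Console.WriteLine("Rejected blocks: " + rejected.Count);
    }
}
```

The sketch breaks down quickly: if the damage hits a `</Field>` tag, one block swallows its neighbour, and nothing here can tell whether a skipped field was essential to the rest of the data.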
Personally, I would not bother. As has been pointed out, the XML is simply invalid. It's not going to be invalid unless some user hand-edits the file and starts mucking it up, and once you assume users may do that, it is impossible to ensure that you can in any sensible way recover from their doings. You should definitely make sure that bad data doesn't bring your application crashing down, but it's not reasonable for a user to expect you to come up with some graceful way to reconstruct the invalid data in the general case, and so you should probably not waste a lot of time implementing code that does so.
Pete
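One way to keep bad data from taking the application down, while still rejecting the file as a whole, is an all-or-nothing load: parse the entire document first and only hand values to the application if that succeeds. The sketch below assumes the Field/Value layout from the original sample; SettingsLoader and TryLoad are hypothetical names, and what the caller does on failure (defaults, a backup copy, a message to the user) is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not code from the thread.
public static class SettingsLoader
{
    // Returns true and fills 'fields' only if the whole document is well-formed.
    // On failure 'fields' stays null, so no half-read state reaches the caller.
    public static bool TryLoad(string xml,
                               out Dictionary<string, string> fields,
                               out string error)
    {
        fields = null;
        error = null;

        XmlDocument doc = new XmlDocument();
        try
        {
            doc.LoadXml(xml);
        }
        catch (XmlException ex)
        {
            error = ex.Message; // reject the file as a whole
            return false;
        }

        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (XmlNode node in doc.SelectNodes("//Field"))
        {
            XmlElement field = (XmlElement)node;
            result[field.GetAttribute("Name")] = field.InnerText.Trim();
        }
        fields = result;
        return true;
    }

    public static void Main()
    {
        string xml =
            "<Fields><Field Name=\"AppDate Type=\"System.DateTime\">" +
            "<Value><dateTime>2007-05-25T00:00:00</dateTime></Value></Field></Fields>";
        Dictionary<string, string> fields;
        string error;
        if (TryLoad(xml, out fields, out error))
        {
            Console.WriteLine("Loaded " + fields.Count + " field(s).");
        }
        else
        {
            Console.WriteLine("Settings file rejected: " + error);
        }
    }
}
```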