> "The Xml-document is not loaded into memory when using XmlTextReader, as
> opposed to using the DOM where the entire document is loaded in memory"
>
> but, when using XmlTextReader, how can I parse then if the document is not
> loaded ?
> something must be loaded no ?
XmlTextReader is a streaming, forward-only, non-caching reader. It reads and
holds in memory only one XML node at a time.

Oleg Tkachenko [XML MVP]
http://blog.tkachenko.com
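XmlTextReader provides a sequential, forward-only, read-only view of XML.
In general one works with XmlTextReader like this:
    // Read() returns false at the end of the input; the other
    // ReadXxx methods can be used inside the loop in the same way.
    while (reader.Read())
    {
        // Handle the node just read, using
        // reader.NodeType, reader.Name and reader.Value.
    }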
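For a complete version of that loop, here is a minimal sketch. The class name, method name and sample XML are made up for illustration; only XmlTextReader and its members come from the framework. It records one line per node, which shows that the reader only ever exposes the node it is currently positioned on.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PullLoopDemo
{
    // Returns one description per node the reader visits, in document order.
    public static List<string> DescribeNodes(string xml)
    {
        var lines = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Skip whitespace-only nodes so indentation does not show up.
            reader.WhitespaceHandling = WhitespaceHandling.None;

            while (reader.Read()) // false once the end of the input is reached
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        lines.Add("start " + reader.Name);
                        break;
                    case XmlNodeType.Text:
                        lines.Add("text " + reader.Value);
                        break;
                    case XmlNodeType.EndElement:
                        lines.Add("end " + reader.Name);
                        break;
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in DescribeNodes("<order><id>42</id></order>"))
            Console.WriteLine(line);
    }
}
```

Nothing earlier in the document can be revisited: once Read() has moved on, the previous node is gone unless your own code kept a copy.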
If you are going to store data that follows a predefined schema, you may
consider declaring a class, annotating it with XML serialization attributes,
and using XmlSerializer to read and write the XML.
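A rough sketch of the XmlSerializer route follows. The Customer class and its element and attribute names are hypothetical, chosen only to show the attributes in use; a real class would mirror your own schema.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical record type; the XML names below are illustrative only.
[XmlRoot("customer")]
public class Customer
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    [XmlElement("name")]
    public string Name { get; set; }
}

public static class SerializerDemo
{
    // Builds a Customer from XML text.
    public static Customer FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Customer));
        using (var input = new StringReader(xml))
            return (Customer)serializer.Deserialize(input);
    }

    // Writes a Customer back out as XML text.
    public static string ToXml(Customer customer)
    {
        var serializer = new XmlSerializer(typeof(Customer));
        using (var output = new StringWriter())
        {
            serializer.Serialize(output, customer);
            return output.ToString();
        }
    }

    public static void Main()
    {
        Customer c = FromXml("<customer id=\"7\"><name>Ann</name></customer>");
        Console.WriteLine(c.Id + " " + c.Name);
        Console.WriteLine(ToXml(c));
    }
}
```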
--
Vladimir Nesterovsky
e-mail: vladimir@nesterovsky-bros.com
home: http://www.nesterovsky-bros.com
Even outside the .NET world, there have been for some time two ways to
read XML. I've heard them referred to as "DOM" and "SAX".
"DOM" (short for "Document Object Model") parsers read the entire XML
document and build a representation of it as a hierarchy of objects in
memory. There are DOM parsers for Java, C, C++, and other languages as
well as the ones built into .NET.
DOM is, generally speaking, the easiest way in which to deal with XML
documents, but it has the disadvantage that it loads the entire
document into memory, which can be a problem if you have a
many-megabyte document.
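To make the DOM approach concrete in .NET terms, here is a small sketch using XmlDocument. The class name, method name and sample document are invented for the example. The point is that LoadXml parses everything up front, after which any part of the tree can be queried in any order.

```csharp
using System;
using System.Xml;

public static class DomDemo
{
    // Loads the whole document into memory, then counts matching nodes.
    public static int CountThings(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml); // the entire tree is built here

        // Random access: any XPath query can be run against the loaded tree.
        XmlNodeList things = doc.SelectNodes("/document/data/thing");
        return things.Count;
    }

    public static void Main()
    {
        string xml =
            "<document><data>" +
            "<thing><firstData /></thing>" +
            "<thing><secondData /></thing>" +
            "</data></document>";
        Console.WriteLine(CountThings(xml));
    }
}
```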
If you are reading XML into ADO.NET you really don't have any choice
but to use DOM in some form because all of MS's automated
XML-to-ADO.NET tools read the entire document into a dataset.
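As an illustration of what "reads the entire document into a dataset" looks like, here is a sketch that uses DataSet.ReadXml with schema inference. The element names (orders, order, id, total) are made up, and the table shape relies on the default inference rules, so treat it as an assumption to check against your own data.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetDemo
{
    // Reads the whole XML text into a DataSet and returns the inferred table.
    public static DataTable LoadOrders(string xml)
    {
        var ds = new DataSet();
        // ReadXml consumes the complete document; with no schema supplied,
        // tables and columns are inferred from the element structure.
        ds.ReadXml(new StringReader(xml));
        return ds.Tables["order"];
    }

    public static void Main()
    {
        string xml =
            "<orders>" +
            "<order><id>1</id><total>9.50</total></order>" +
            "<order><id>2</id><total>20.00</total></order>" +
            "</orders>";
        DataTable orders = LoadOrders(xml);
        foreach (DataRow row in orders.Rows)
            Console.WriteLine(row["id"] + " " + row["total"]);
    }
}
```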
"SAX" (short for "Simple API for XML") parsers read one XML
token at a time. You supply callback methods that the parser should
call when it encounters certain kinds of things in the document. For
example, "Call this method when you find an element called
'Address'." SAX parsers are extremely resource efficient, because they
hold only one XML token at a time. However, they leave it up to the
calling application to maintain state. When your "Address element"
method is called, you have no idea where in the document you are, only
that you hit an element called "Address". For this reason
programming for SAX parsers can be a pain in the butt.
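The state-keeping burden is easier to see in code. .NET does not ship a SAX parser, so the sketch below fakes one: MiniSax is a hypothetical push-style dispatcher layered on XmlTextReader, not a real library class. The handler is told only the element name, so the application has to keep its own stack of open elements to know where each Address sits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical SAX-like dispatcher, written only for this illustration.
public class MiniSax
{
    public event Action<string> StartElement;
    public event Action<string> EndElement;

    public void Parse(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    string name = reader.Name;
                    bool empty = reader.IsEmptyElement;
                    StartElement?.Invoke(name);
                    // <x/> produces no EndElement node, so report the end here.
                    if (empty) EndElement?.Invoke(name);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    EndElement?.Invoke(reader.Name);
                }
            }
        }
    }
}

public static class SaxDemo
{
    // Returns the full path of every <Address> element in the document.
    public static List<string> AddressPaths(string xml)
    {
        var open = new List<string>();  // state the application must maintain
        var found = new List<string>();

        var parser = new MiniSax();
        parser.StartElement += name =>
        {
            open.Add(name);
            if (name == "Address")
                found.Add("/" + string.Join("/", open));
        };
        parser.EndElement += name => open.RemoveAt(open.Count - 1);

        parser.Parse(xml);
        return found;
    }

    public static void Main()
    {
        string xml =
            "<Order><Customer><Address/></Customer>" +
            "<Shipping><Address/></Shipping></Order>";
        foreach (string path in AddressPaths(xml))
            Console.WriteLine(path);
    }
}
```

Without the `open` list, the two Address callbacks would be indistinguishable, which is the problem described above.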
MS claims that XmlTextReader improves upon the SAX parser, but I
remember thinking that it really wasn't a leap forward in technology,
back when I was investigating .NET's XML support.
I ended up writing what I consider the best balance between SAX and
DOM, and something that I wish MS (and Java, and ... ) would include
in their standard libraries: a sequential, forward-only parser that
reads an XML document's repeating record content one DOM tree at a
time.
In brief, there are two kinds of XML documents. Some represent
documents with little or no repeating structure (such as MS Word
files); for these you use DOM. Many others, however, represent large
record sets, where each "record" has complex substructure. DOM is
overkill for these, because you don't need all of the records in
memory at once: you're processing them serially, one-by-one. SAX,
however, is too simplistic and makes it difficult to work with each
record. What you really want is a parser that, given some information
about what constitutes a "record" in your XML document, reads one
"record" at a time into a mini DOM tree.
This is what I built for our own use here, and it works well. It reads
only a small portion of an XML file into memory at one time, but each
portion comes in as a DOM tree that is easy to work with.
Amol Kher [MSFT] - 01 Dec 2004 18:12 GMT
Good insight.
You can, however, easily build your program on top of XmlTextReader by
writing a custom XML reader that reads a record at a time. At the API level,
the designers can't choose whether the reader should read records or not:
how would you decide generically what is a record in your structure and what
is not? XmlTextReader is a low-level, forward-only streaming parser. You could
have built your record reader on top of it, if you haven't already done so.
Can you take your program and apply it to any XML? How would you know which
is a record and which is not? XmlTextReader is a parser that reads any XML and
at the same time checks conformance to the XML 1.0 spec. What you described is
a custom XML parser solution for your needs. You could have used XmlTextReader
underneath to read tokens and report only one record at a time. (ReadOuterXml
and ReadInnerXml do report one structure at a time, in a sense.)
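As a sketch of that idea, the code below walks the document with XmlTextReader and uses ReadOuterXml to pull out each record as a string. The class name, method name and the choice to identify records by element name are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class RecordReaderDemo
{
    // Returns the outer XML of every element named recordName.
    public static List<string> ReadRecords(string xml, string recordName)
    {
        var records = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            reader.Read();

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == recordName)
                {
                    // ReadOuterXml returns the element with all its content
                    // and leaves the reader on the node that follows it,
                    // so no extra Read() is needed in this branch.
                    records.Add(reader.ReadOuterXml());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return records;
    }

    public static void Main()
    {
        string xml =
            "<data>" +
            "<thing><firstData /></thing>" +
            "<thing><secondData /></thing>" +
            "</data>";
        foreach (string record in ReadRecords(xml, "thing"))
            Console.WriteLine(record);
    }
}
```

Each string could then be loaded into its own XmlDocument if a tree is wanted, at the cost of parsing the record a second time.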
HTH,
Amol
Bruce Wood - 02 Dec 2004 01:14 GMT
My DOM-tree-at-a-time XML parser is, in fact, built on top of
XmlTextReader and is generic. I called it XmlFragmentReader and it's a
subclass of XmlTextReader.
The generic parser needs one extra piece of information in order to do
its work, and adds one extra property for retrieving information.
The additional piece of information it needs is an XPath expression
for the node that encloses what I would call the "record". If I have a
document that looks like this:
<document>
  <header>
  </header>
  <data>
    <thing>
      <firstData />
    </thing>
    <thing>
      <secondData />
    </thing>
  </data>
  <footer>
  </footer>
</document>
Then I would pass "/document/data" to the XmlFragmentReader
constructor. This tells it that every child element of
"/document/data" should be returned as a complete DOM tree. So, on the
document above, my fragment reader would return
<thing>
  <firstData />
</thing>
as the first DOM tree, and
<thing>
  <secondData />
</thing>
as the second DOM tree. The third read would return "end of document".
The only remaining hitch is what to do about the rest of the document.
For this, the XmlFragmentReader has a RemainingDocument property that
returns the rest of the document _excluding any records read, and only
as far as it has been read thus far_. So, from opening the document up
to and including the second read, the RemainingDocument property would
return
<document>
  <header>
  </header>
  <data>
  </data>
</document>
as its DOM tree because it can read at least as far as the opening
<data> tag. Once a further read has run to the end of the document,
the RemainingDocument property would return
<document>
  <header>
  </header>
  <data>
  </data>
  <footer>
  </footer>
</document>
as its DOM tree, because the read that follows the second record scans
all the way to the end looking for the next <data> tag. There is, of
course, no requirement that all tags inside <data> be the same, nor
that there be only one <data> tag. The only rules are that anything
that is not a child element of the XPath "/document/data" is built
into the RemainingDocument progressively as the XmlTextReader passes
over the document, and that anything inside the XPath "/document/data"
is returned one-by-one as a sequence of DOM trees. This copes quite
nicely with the vast majority of documents containing repeating data,
providing the benefits of DOM trees with the memory savings of a
SAX-style parser.
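For readers who want to try the record-at-a-time idea, here is a rough sketch. It is not the XmlFragmentReader described above: it is not a subclass of XmlTextReader, it takes a plain "/a/b" style path rather than a real XPath expression, and it leaves out RemainingDocument entirely. The names FragmentSketch and ReadFragments are invented. It relies on XmlDocument.ReadNode, which builds one subtree from the reader and moves the reader past it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class FragmentSketch
{
    // Yields each child element of containerPath as its own small DOM tree.
    // containerPath is a simple slash-separated element path (an assumption
    // of this sketch), for example "/document/data".
    public static IEnumerable<XmlElement> ReadFragments(
        TextReader input, string containerPath)
    {
        var open = new List<string>(); // names of the currently open elements
        using (var reader = new XmlTextReader(input))
        {
            reader.WhitespaceHandling = WhitespaceHandling.None;
            reader.Read();

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    if ("/" + string.Join("/", open) == containerPath)
                    {
                        // Inside the container: materialise this one record.
                        // ReadNode leaves the reader on the following node.
                        var doc = new XmlDocument();
                        yield return (XmlElement)doc.ReadNode(reader);
                        continue;
                    }
                    // Empty elements never produce an EndElement node.
                    if (!reader.IsEmptyElement) open.Add(reader.Name);
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    open.RemoveAt(open.Count - 1);
                }
                reader.Read();
            }
        }
    }

    public static void Main()
    {
        string xml =
            "<document><header></header><data>" +
            "<thing><firstData /></thing>" +
            "<thing><secondData /></thing>" +
            "</data><footer></footer></document>";

        foreach (XmlElement record in
                 ReadFragments(new StringReader(xml), "/document/data"))
        {
            Console.WriteLine(record.OuterXml);
        }
    }
}
```

Because the method is an iterator, only one record's tree exists at a time as long as the caller does not hold on to earlier ones.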