> I need to read/parse XHTML aspx pages and look for certain tokens and
> content. How can I use an XmlTextReader for this? If not, any other ideas?
If the pages are well-formed XHTML, you can parse them with XmlReader
(in .NET 2.0/3.0) or XmlTextReader (in .NET 1.x). The other XML APIs in
.NET work on such documents too, so XPathNavigator and/or XmlDocument
might offer more comfort than XmlReader.
Here is an example using XmlReader that prints out all heading elements
(h1 .. h6 elements) assuming they have no child elements:
static public void PrintHeadings (string path) {
  XmlReaderSettings settings = new XmlReaderSettings();
  // Allow the DOCTYPE so that entities declared in the XHTML DTD resolve.
  settings.ProhibitDtd = false;
  using (XmlReader xmlReader = XmlReader.Create(path, settings)) {
    while (xmlReader.Read()) {
      // Only look at elements in the XHTML namespace.
      if (xmlReader.NodeType == XmlNodeType.Element &&
          xmlReader.NamespaceURI == "http://www.w3.org/1999/xhtml") {
        switch (xmlReader.LocalName) {
          case "h1":
          case "h2":
          case "h3":
          case "h4":
          case "h5":
          case "h6":
            Console.Out.WriteLine(
              "{0} heading has InnerText: \"{1}\".", xmlReader.LocalName,
              xmlReader.ReadString());
            break;
        }
      }
    }
  }
}

// Example call:
PrintHeadings("doc.xhtml");
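
For comparison, the XmlDocument/XPath route mentioned above could look roughly like the following sketch. The class name HeadingDemo, the prefix "x" and the inline sample markup are only illustrative assumptions, not taken from the pages in question. Unlike ReadString, InnerText also collects the text of child elements, so headings with nested markup are handled as well.

```csharp
using System;
using System.Xml;

static class HeadingDemo
{
    // Prints every h1..h6 element in the XHTML namespace.
    public static void PrintHeadings(XmlDocument doc)
    {
        // XPath 1.0 needs a prefix to address namespaced elements;
        // "x" is an arbitrary choice bound to the XHTML namespace.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        XmlNodeList headings = doc.SelectNodes(
            "//x:h1 | //x:h2 | //x:h3 | //x:h4 | //x:h5 | //x:h6", ns);

        foreach (XmlNode heading in headings)
        {
            Console.WriteLine("{0} heading has InnerText: \"{1}\".",
                heading.LocalName, heading.InnerText);
        }
    }

    static void Main()
    {
        // Hypothetical sample document without a DOCTYPE, so no DTD is loaded.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
            "<h1>Title</h1><h2>Sub <em>section</em></h2>" +
            "</body></html>");
        PrintHeadings(doc);
    }
}
```

For a file on disk, doc.Load(path) would take the place of LoadXml; if the file has a DOCTYPE, the DTD settings discussed in this thread apply there as well.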

Signature
Martin Honnen --- MVP XML
http://JavaScript.FAQTs.com/
Jose Antonio Reyes - 28 Jun 2007 20:18 GMT
Thanks Martin,
but how can I load the DTD for the aspx page? I need to deal with special
symbols like &nbsp; and so on...
For example:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Thanks in advance,
Jose Antonio Reyes.
Martin Honnen - 29 Jun 2007 12:55 GMT
> but how can I load the aspx page DTD?? I need to deal with special symbols
> like &nbsp; and so on...
>
> For example:
>
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
That is an SGML DTD; don't expect an XML parser to consume it.
If the document is an XHTML document (not an HTML 4 document), then you
can parse it with XmlReader. I have already included the settings for that:
static public void PrintHeadings (string path) {
  XmlReaderSettings settings = new XmlReaderSettings();
  settings.ProhibitDtd = false;
  using (XmlReader xmlReader = XmlReader.Create(path, settings)) {

Jose Antonio Reyes - 29 Jun 2007 13:52 GMT
Unfortunately, I may find some &nbsp; entities or JavaScript in the aspx page.
Would it be a good solution to preprocess the aspx page and wrap those parts in CDATA sections?
Thanks.
Martin Honnen - 29 Jun 2007 14:27 GMT
> Unfortunately I could find some &nbsp; items or javascript in the aspx page.
>
> Could be a good solution to parse after the aspx and include CDATA sections??
If the document is an XHTML document and the entity nbsp is defined in
the DTD then the XML parser can parse it.
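
A small sketch of that point, under some assumptions: the sample markup and the class name EntityDemo are made up for illustration, and the entity is declared in an internal DTD subset so that the example runs without fetching anything. A real XHTML page would normally get nbsp from the external XHTML DTD named in its DOCTYPE, which the reader's resolver then has to be able to load. The code uses DtdProcessing.Parse, the replacement for ProhibitDtd = false in .NET 4 and later; on .NET 2.0/3.0 the ProhibitDtd setting shown earlier in this thread plays the same role.

```csharp
using System;
using System.IO;
using System.Xml;

static class EntityDemo
{
    static void Main()
    {
        // Hypothetical page: nbsp is declared in the internal DTD subset.
        string xhtml =
            "<!DOCTYPE html [ <!ENTITY nbsp \"&#160;\"> ]>" +
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
            "<p>a&nbsp;b</p></body></html>";

        XmlReaderSettings settings = new XmlReaderSettings();
        // Let the parser read the DTD so the entity declaration is known.
        settings.DtdProcessing = DtdProcessing.Parse;

        XmlDocument doc = new XmlDocument();
        using (XmlReader reader =
            XmlReader.Create(new StringReader(xhtml), settings))
        {
            doc.Load(reader);
        }

        // The entity reference has been expanded to U+00A0 in the text.
        string text = doc.DocumentElement.InnerText;
        Console.WriteLine("Contains no-break space: {0}",
            text.IndexOf('\u00A0') >= 0);
    }
}
```

Without the entity declaration the same input fails to parse, because &nbsp; is then a reference to an undeclared entity. That is what happens with a page carrying the HTML 4.01 DOCTYPE.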
