>I don't know where you have gotten your information, but this is exactly
>what the DOM is for.
Scott,
I used this approach with a Windows Forms application back in 2001, with
.NET 1.0. It worked, but it was a bit clumsy and time-consuming. I used
the ActiveX Internet Browser control to load the page I was interested in,
and once the page was loaded, I could access the DOM from C# code. Did you
have a different technique in mind when you talk about the DOM?
Perhaps a faster technique would be to use regular expressions to parse the
HTML and find what you're looking for.
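Here is a minimal sketch of the regular-expression idea, in VB.NET like the other code in this thread. The module and the `GetParagraphTexts` function are hypothetical names. The pattern assumes simple, non-nested `<p>` elements. A regex is not a real HTML parser, so markup with comments, nested paragraphs, or `>` inside attribute values can fool it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module HtmlRegexSketch
    ' Hypothetical helper: returns the inner markup of every <p>...</p> element.
    ' Assumes simple, non-nested paragraphs; this is a quick scraping aid,
    ' not a general HTML parser.
    Function GetParagraphTexts(html As String) As List(Of String)
        Dim results As New List(Of String)()
        ' <p\b[^>]*> matches the opening tag with optional attributes
        ' (\b keeps it from matching tags such as <param>);
        ' (.*?) lazily captures everything up to the first </p>.
        Dim pattern As String = "<p\b[^>]*>(.*?)</p>"
        ' IgnoreCase: HTML tag names are case-insensitive (<P> or <p>).
        ' Singleline: lets "." match newlines inside a paragraph.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        For Each m As Match In Regex.Matches(html, pattern, options)
            results.Add(m.Groups(1).Value)
        Next
        Return results
    End Function
End Module
```

For example, `GetParagraphTexts("<P class=x>First</P><p>Second</p>")` should return the two strings `First` and `Second`.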
John
Scott M. - 25 Feb 2007 15:38 GMT
What I had in mind was that if the HTML in question is well-formed (XHTML),
you could just load it from a string into an XmlDocument object and use the
XML DOM to parse it from there.
Mohammad-Reza - 26 Feb 2007 08:22 GMT
Can you give some sample code for loading XHTML into an XmlDocument?
Scott M. - 26 Feb 2007 14:51 GMT
Well, XHTML is XML, so you'd really be loading XML into an XmlDocument, but
once it's loaded, you can parse out whatever you like using the DOM.
'Requires Imports System.Xml at the top of the file
Dim xmlDoc As New XmlDocument()
'You can load the XML in one of two ways...
'docPath represents a path to a file containing the XML
xmlDoc.Load(docPath)
'or
'Here you can load a string directly (xhtmlText is a String holding the XML)
xmlDoc.LoadXml(xhtmlText)
'Example of getting all the paragraph elements and then the text of the
'first one using the DOM. XML is case-sensitive and XHTML element names
'are lowercase, so ask for "p" rather than "P".
Dim theParagraphs As XmlNodeList = xmlDoc.GetElementsByTagName("p")
Dim firstParagraphText As String = theParagraphs(0).InnerText
-Scott
John Saunders - 26 Feb 2007 17:27 GMT
> What I had in mind was, if the HTML in question was well-formed (XHTML),
> you could just load it into an XMLDocument (from a string) object and use
> the XML DOM to parse from there.
That works well for XHTML. The problem is that most web sites still serve
plain HTML, which usually is not well-formed XML.
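One way to cope with that is to try the XML route first and detect failure. This is a minimal VB.NET sketch, and `TryLoadXhtml` is a hypothetical helper name. `XmlDocument.LoadXml` throws an `XmlException` when the markup is not well-formed. The helper turns that exception into `Nothing`, so the caller can fall back to another technique, such as the browser DOM or a regular expression.

```vbnet
Imports System
Imports System.Xml

Module XhtmlLoaderSketch
    ' Hypothetical helper: tries to parse markup as XML (XHTML).
    ' Returns Nothing when the markup is not well-formed, which is
    ' typical of ordinary HTML, so the caller can choose another approach.
    Function TryLoadXhtml(markup As String) As XmlDocument
        Dim doc As New XmlDocument()
        ' Avoid downloading external resources such as the XHTML DTD
        ' named in a DOCTYPE declaration.
        doc.XmlResolver = Nothing
        Try
            doc.LoadXml(markup)
            Return doc
        Catch ex As XmlException
            ' Not well-formed XML: signal failure to the caller.
            Return Nothing
        End Try
    End Function
End Module
```

For example, `TryLoadXhtml("<p>Hi</p>")` returns a document, while `TryLoadXhtml("<p>Hi<br></p>")` returns `Nothing` because the `<br>` tag is never closed.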
John
Scott M. - 26 Feb 2007 19:06 GMT
But we're not talking about most web pages. We are talking about one
particular page that is being used with a web service. In other words, it's
part of the OP's application, which he should have some control over.
John Saunders - 26 Feb 2007 22:49 GMT
> But, we're not talking about most web pages. We are talking about a
> particular page that is being used with a web service. In other words,
> it's part of the OP's application, which he should have some control over.
Sorry, I didn't recall that he said it was his application. I assumed he was
scraping pages from somebody else's application.
Even though it's his, there may be reasons why he can't guarantee that the
page he needs is XHTML today and will remain XHTML.
John