Thanks, Cor. That was a great help.
Do we have a similar facility available in the .NET libraries? Or can we convert
HTML to XML and then use XmlReader for the same?
-Ocean
Silent,
The big difference between HTML and XML is that the former has W3C-defined
tags while the latter has user-defined tags (defined directly or through a schema).
MSHTML can be used directly from .NET when you add a reference to it as
Microsoft.Mshtml.
Use it without a using/Imports statement: it exposes so many interfaces that
your IDE will probably almost freeze if you import the namespace.
I hope this was the information you were looking for.
Cor
UAError - 12 Jan 2005 13:21 GMT
>Silent,
>
>The big difference between HTML and XML is that the first has W3C defined
>tags while the last has user defined tags (direct or using a schema).
It's not that big, it's just big enough, but XHTML is trying
to bridge the gap:
XHTML 1.0 The Extensible HyperText Markup Language (Second
Edition)
http://www.w3.org/TR/xhtml1/
Under section 4 you can find the main obstacles to treating
HTML 4.0 as an XML document:
- XML documents must be well-formed.
- Attribute values must be quoted.
etc.
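
To make the two obstacles above concrete, here is a minimal sketch in Python (not .NET, purely for illustration) that feeds HTML-style and XHTML-style markup to a strict XML parser. The helper name is_well_formed and the two sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# HTML 4 style: unquoted attribute value, unclosed <img> and <br>.
html_style = '<p><img src=logo.png><br></p>'

# XHTML style: quoted attribute value, every element closed.
xhtml_style = '<p><img src="logo.png" /><br /></p>'

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

The first string is perfectly ordinary HTML, yet an XML parser rejects it, which is why plain HTML usually cannot be handed to an XML reader as-is.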
It should be possible to load an XHTML document into an
XmlDocument and then use XPath to select all the image
nodes.
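
A rough sketch of that idea, written in Python rather than .NET and using only the standard library. The sample document and the function name image_sources are assumptions for illustration. Note that XHTML elements live in the http://www.w3.org/1999/xhtml namespace, so the path expression needs a prefix mapped to it; with XmlDocument the corresponding step would be passing an XmlNamespaceManager to SelectNodes.

```python
import xml.etree.ElementTree as ET

# Prefix "x" is arbitrary; only the namespace URI matters.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample document, well-formed XHTML.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body>
    <p><img src="a.png" alt="A" /></p>
    <div><img src="b.png" alt="B" /></div>
  </body>
</html>"""

def image_sources(xhtml):
    """Return the src attribute of every img element, in document order."""
    root = ET.fromstring(xhtml)
    # ElementTree supports a limited XPath subset; ".//x:img" selects all
    # img descendants in the XHTML namespace.
    return [img.get("src") for img in root.findall(".//x:img", XHTML_NS)]

print(image_sources(SAMPLE))
```

Without the namespace mapping the query matches nothing, which is a common surprise when moving from HTML to XHTML.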
'Any fool can write code that a computer can understand.
Good programmers write code that humans can understand.'
Martin Fowler,
'Refactoring: improving the design of existing code', p.15
UAError - 16 Jan 2005 21:47 GMT
Addendum to my previous post.
There is an Open Source (W3C license) utility "HTML Tidy"
http://www.w3.org/People/Raggett/tidy/
http://tidy.sourceforge.net/
which can generate XHTML from HTML. So it should be possible
to "pre-process" (reasonable) HTML input and then work with
the resulting output as an XML document (and benefit from
all the other XML related functionality).
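
A hedged sketch of that pre-processing step, again in Python for illustration. It assumes the tidy command-line program is installed and on the PATH; the function names tidy_command and html_to_xhtml are made up here. The options used are -asxhtml (emit XHTML), -numeric (write entities as numeric references, so an XML parser does not need the HTML entity definitions) and -quiet.

```python
import shutil
import subprocess

def tidy_command(executable="tidy"):
    """Build the argument list for converting HTML on stdin to XHTML on stdout."""
    return [executable, "-asxhtml", "-numeric", "-quiet"]

def html_to_xhtml(html, executable="tidy"):
    """Run HTML Tidy on a string and return the XHTML it produces.

    Assumes the tidy executable is available; raises otherwise.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError("HTML Tidy executable not found: " + executable)
    result = subprocess.run(
        tidy_command(executable),
        input=html,
        capture_output=True,
        text=True,
    )
    # Tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    # Warnings are normal for real-world HTML, so only errors are fatal here.
    if result.returncode > 1:
        raise ValueError("tidy could not repair the input:\n" + result.stderr)
    return result.stdout

# Usage (requires tidy to be installed):
#   xhtml = html_to_xhtml("<p>Hello<br><img src=logo.png>")
```

The returned string can then be loaded into an XML parser in the usual way.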