
.NET Forum / .NET Framework / General / January 2005


HTML Parser

silent_ocean@fastmail.fm - 11 Jan 2005 07:41 GMT
Does Microsoft provide any HTML parser that could find the img/src
attribute, and others similar to it, for me?

If not, are there any third party tools available?
Cor Ligthert - 11 Jan 2005 09:43 GMT
mshtml
http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/hosting/hosting.asp

I hope this helps a little.

Cor
Silent Ocean - 11 Jan 2005 18:31 GMT
Thanks Cor. That was of great help.

Is a similar facility available in the .NET libraries? Or can we convert
the HTML to XML and then use XmlReader for the same purpose?

-Ocean

Cor Ligthert - 12 Jan 2005 07:26 GMT
Silent,

The big difference between HTML and XML is that the former has W3C-defined
tags while the latter has user-defined tags (defined directly or through a schema).

MSHTML can be used directly in .NET once you add a reference to it as
Microsoft.Mshtml.

Use it without a using/Imports statement: because of the huge number of
interfaces, your IDE will probably almost freeze if you import the namespace.

I hope this was the information you were looking for.

Cor
UAError - 12 Jan 2005 13:21 GMT
>Silent,
>
>The big difference between HTML and XML is that the former has W3C-defined
>tags while the latter has user-defined tags (defined directly or through a schema).

It's not that big, it's just big enough, but XHTML is trying
to bridge the gap:

XHTML™ 1.0 The Extensible HyperText Markup Language (Second
Edition)
http://www.w3.org/TR/xhtml1/

Under section 4 you can find the main obstacles for treating
HTML 4.0 as an XML document:
- XML documents must be well-formed.
- Attribute values must be quoted.
- etc.
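
To make the two obstacles concrete, here is a minimal sketch. It uses Python's standard XML parser rather than .NET, purely for illustration, and both markup snippets are made up for the example. An XML parser rejects typical HTML 4 markup with unquoted attribute values and an unclosed img tag, and accepts the XHTML equivalent:

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets, for illustration only.
# HTML 4 style: unquoted attribute values and an unclosed <img>.
HTML4_SNIPPET = '<p>Logo: <img src=logo.png alt=Logo></p>'
# XHTML style: quoted attribute values and a self-closed <img />.
XHTML_SNIPPET = '<p>Logo: <img src="logo.png" alt="Logo" /></p>'


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(HTML4_SNIPPET))  # False
print(is_well_formed(XHTML_SNIPPET))  # True
```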

It should be possible to load an XHTML document into an
XmlDocument and then use XPath to select all the image
nodes.
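
A rough sketch of that idea follows. It is written in Python with the standard library's ElementTree instead of XmlDocument, so treat it as an illustration of the approach rather than .NET code. The sample document and the helper name image_sources are assumptions of mine. Note that XHTML elements live in the XHTML namespace, so the path expression has to use a namespace prefix:

```python
import xml.etree.ElementTree as ET

# XHTML elements belong to this namespace; "x" is an arbitrary prefix.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample document, for illustration only.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p><img src="logo.png" alt="Logo" /></p>
    <div><img src="photos/cat.jpg" alt="Cat" /></div>
  </body>
</html>"""


def image_sources(xhtml: str) -> list:
    """Return the src attribute of every img element, in document order."""
    root = ET.fromstring(xhtml)
    images = root.findall(".//x:img", XHTML_NS)
    return [img.get("src") for img in images if img.get("src")]


print(image_sources(SAMPLE_XHTML))  # ['logo.png', 'photos/cat.jpg']
```

With XmlDocument the same pattern would presumably go through SelectNodes together with a namespace manager that maps a prefix to the XHTML namespace.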

'Any fool can write code that a computer can understand.
Good programmers write code that humans can understand.'
Martin Fowler,
'Refactoring: improving the design of existing code', p.15
UAError - 16 Jan 2005 21:47 GMT
Addendum to my previous post.

There is an Open Source (W3C license) utility "HTML Tidy"
http://www.w3.org/People/Raggett/tidy/

http://tidy.sourceforge.net/

which can generate XHTML from HTML. So it should be possible
to "pre-process" (reasonable) HTML input and then work with
the resulting output as an XML document (and benefit from
all the other XML related functionality).
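
The pre-processing step could look roughly like the sketch below. It is Python again, for illustration only, and it assumes a tidy executable is installed and on the PATH. The option set, the helper names and the messy sample input are my own choices, not something prescribed by the Tidy documentation:

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET

# Assumed command line: -asxhtml converts the input to XHTML, -numeric writes
# numeric entities (so the XML parser needs no DTD), -quiet trims extra output.
TIDY_COMMAND = ["tidy", "-asxhtml", "-numeric", "-quiet"]
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def html_to_xhtml(html: str) -> str:
    """Pipe HTML through the tidy executable and return its XHTML output."""
    result = subprocess.run(TIDY_COMMAND, input=html,
                            capture_output=True, text=True)
    # tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
    if result.returncode > 1:
        raise ValueError("tidy could not repair the input:\n" + result.stderr)
    return result.stdout


def image_sources(xhtml: str) -> list:
    """Return the src attribute of every img element in an XHTML document."""
    root = ET.fromstring(xhtml)
    return [img.get("src") for img in root.findall(".//x:img", XHTML_NS)]


if __name__ == "__main__":
    # Hypothetical tag-soup input: unquoted attributes, unclosed elements.
    messy = "<title>t</title><p>Logo: <img src=logo.png alt=Logo>"
    if shutil.which("tidy"):
        print(image_sources(html_to_xhtml(messy)))
    else:
        print("tidy not found on PATH")
```

Tidy's diagnostics arrive on stderr, so they are worth logging when the input is badly broken.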

