.NET Forum / Languages / C# / October 2007

Retrieve tag A from html

Matteo Migliore - 05 Oct 2007 16:43 GMT
Hi.

I need a regular expression to extract all A tags from HTML
code. I need the href and the text as two distinct objects.

Suggestions?

Thx! ;-)
Matteo Migliore.
Martin Honnen - 05 Oct 2007 17:14 GMT
> I need a regular expression to extract all A tags from HTML
> code. I need the href and the text as two distinct objects.

Why do you want to use regular expressions to parse HTML when there are
APIs for that like SgmlReader
<URL:http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>?

Signature

    Martin Honnen --- MVP XML
    http://JavaScript.FAQTs.com/

Matteo Migliore - 05 Oct 2007 17:49 GMT
> Why do you want to use regular expressions to parse HTML when there
> are APIs for that like SgmlReader

Thanks! I looked at SgmlReader, but it is too much for my problem,
and I prefer to use the .NET classes and Regex.

I downloaded the project but I don't like it very much :-).

Thx! ;-)
Matteo Migliore.
Ignacio Machin ( .NET/ C# MVP ) - 05 Oct 2007 19:53 GMT
Hi,

All you have to do is get the text from "<a" up to "</a>".

> Hi.
>
[quoted text clipped - 5 lines]
> Thx! ;-)
> Matteo Migliore.
Jesse Houwing - 05 Oct 2007 23:24 GMT
Hello Ignacio Machin ( .NET/ C# MVP ),

> Hi,
>
[quoted text clipped - 9 lines]
>> Thx! ;-)
>> Matteo Migliore.

Which would come down to something like this:

<a[^>]+href\s*=\s*"(?<href>[^"]+)"[^>]*>(?<text>(?:(?!</a).)*)

It would save the href to a group named href and the text to a group named
text.
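
To show how the two named groups can be read back, here is a minimal sketch that runs the pattern above through System.Text.RegularExpressions. The class name AnchorRegexDemo, the method name Extract and the sample HTML are only assumptions made for illustration; the pattern itself is unchanged and no RegexOptions are set.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorRegexDemo
{
    // The pattern as posted, written as a verbatim string ("" is an escaped quote).
    private static readonly Regex AnchorPattern = new Regex(
        @"<a[^>]+href\s*=\s*""(?<href>[^""]+)""[^>]*>(?<text>(?:(?!</a).)*)");

    // Hypothetical helper: returns one (href, text) pair per match, in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            // Named groups are read by name from the Groups collection.
            links.Add(new KeyValuePair<string, string>(
                m.Groups["href"].Value, m.Groups["text"].Value));
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample input.
        string html = "<p><a href=\"http://example.com/\">Example</a> and " +
                      "<a class=\"x\" href=\"/about\">About <b>us</b></a></p>";
        foreach (KeyValuePair<string, string> link in Extract(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

The text group keeps any nested markup (such as the <b> element in the sample), so it may need a second pass if only plain text is wanted.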

--
Jesse Houwing
jesse.houwing at sogeti.nl
Matteo Migliore - 06 Oct 2007 06:20 GMT
> Which would come down to something like this:
>
> <a[^>]+href\s*=\s*"(?<href>[^"]+)"[^>]*>(?<text>(?:(?!</a).)*)
>
> It would save the href to a group named href and the text to a group
> named text.

Sorry, but with this regex I can't retrieve all the links. I'm comparing its
results on a page fetched with the WebClient class against WebBrowser
(Document.Links). In the second case I get all the links; in the first I do not.

Thx a lot! ;-)
Matteo Migliore.
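
Some likely reasons the posted pattern misses links can be read off the pattern itself: it only accepts a double-quoted href, it is case-sensitive unless RegexOptions.IgnoreCase is set (so <A HREF=...> is skipped), and without RegexOptions.Singleline the text group stops at the first line break. WebBrowser also works on the parsed document, which may contain links that are not in the raw HTML downloaded by WebClient (for example links added by script). The sketch below is one possible more tolerant variant, not a full HTML parser; the class and method names are hypothetical, and it still fails on attribute values that contain ">".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TolerantAnchorDemo
{
    // Accepts double-quoted, single-quoted and unquoted href values.
    // .NET allows the same group name (href) in several alternatives.
    private static readonly Regex AnchorPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>]+))[^>]*>(?<text>.*?)</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns one (href, text) pair per match, in document order.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
        {
            links.Add(new KeyValuePair<string, string>(
                m.Groups["href"].Value, m.Groups["text"].Value));
        }
        return links;
    }

    public static void Main()
    {
        // Made-up sample covering the cases the original pattern skips.
        string html = "<A HREF='/one'>One</A>\n" +
                      "<a href=/two>Two</a>\n" +
                      "<a href=\"/three\">Three\nlines</a>";
        foreach (KeyValuePair<string, string> link in Extract(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

If the counts still differ from Document.Links after this, the remaining links are probably not present in the downloaded markup at all, and a parser such as the SgmlReader mentioned earlier in the thread would not find them either.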
