Regex Issues - Finding Qualified URLS

Mick Walker - 11 Dec 2007 12:08 GMT
Hi,
I am using the following function to match any URLs within a string
containing the HTML of a web page:

public List<string> DumpHrefs(String inputString)
{
    Regex r;
    Match m;
    List<string> LstURLs = new List<string>();

    r = new Regex("href\\s*=\\s*(?:\"(?<1>[^\"]*)\"|(?<1>\\S+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);
    for (m = r.Match(inputString); m.Success; m = m.NextMatch())
    {
        LstURLs.Add(m.Groups[1].ToString());
    }
    return LstURLs;
}
However, the problem with this is that it returns every link on the page,
and I only want to return fully qualified links such as
http://www.domain.com/page.html, not relative links.

I was given the following information by Kevin Spencer:
/* Start */
(?i)href\s*=\s*"?(?<1>http://[^"]+\"?[^>]*)>
First, rather than using an alternation, I just gave a rule that it could
have 0 or 1 quotes at the beginning and end. The (?i) indicates that the
regex is not case-sensitive. Group 1 consists of the character sequence
"http://" followed by one or more characters that are not quote marks,
followed by zero or 1 quote marks, followed by zero or more characters that
are not ">". The expression ends with the ">" character.
/* End */

I am unsure how to incorporate the regex given by Kevin into my
function. Does anyone have any suggestions?

Regards
christery@gmail.com - 12 Dec 2007 19:15 GMT
> /* Start */
> (?i)href\s*=\s*"?(?<1>http://[^"]+\"?[^>]*)>

Hmm, this link *might* help

http://www.regexplib.com/Search.aspx?k=url&c=-1&m=-1&ps=20

sorta new to regex, but there is a group in regexp I think... can't see
how the pattern above masks out links to the same page, though...

//CY
christery@gmail.com - 12 Dec 2007 20:21 GMT
> Hmm, this link *might* help
>
> http://www.regexplib.com/Search.aspx?k=url&c=-1&m=-1&ps=20

but it might be for POSIX... when I think a bit about the "p"

//CY
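
For anyone landing on this thread later: one possible way to get only fully qualified links is to keep the shape of the original DumpHrefs pattern and simply require group 1 to start with a scheme. This is a rough sketch, not the pattern Kevin posted; the class name HrefDumper, the method name DumpAbsoluteHrefs and the decision to accept https:// as well as http:// are assumptions made for illustration. Note that the pattern quoted above ends group 1 with \"?[^>]*, so it would probably also capture the closing quote and any attributes that follow the URL, which is why the sketch stays with the alternation instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class; the name is not from the original post.
public static class HrefDumper
{
    // Same structure as the original pattern, except that group 1 must
    // begin with http:// or https:// (the https part is an assumption).
    // Alternative 1: a double-quoted value. Alternative 2: an unquoted value.
    private static readonly Regex AbsoluteHref = new Regex(
        @"href\s*=\s*(?:""(?<1>https?://[^""]*)""|(?<1>https?://[^\s>]+))",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    public static List<string> DumpAbsoluteHrefs(string inputString)
    {
        List<string> urls = new List<string>();

        // Walk every match and collect whatever group 1 captured.
        for (Match m = AbsoluteHref.Match(inputString); m.Success; m = m.NextMatch())
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

Relative links such as href="page.html" never match because neither alternative can start without the scheme. Single-quoted attribute values are not handled, and a regex is only an approximation of real HTML parsing, so treat this as a starting point rather than a complete answer.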
