.NET Forum / .NET Framework / CLR / July 2004

UNANSWERED: strange framework exceptions with regex

Tim Mackey - 11 Jun 2004 23:40 GMT
hi,
posting this a third time in the hope that someone may explain why it happens, or that MS will acknowledge a bug and fix it in 2.0...

i have a regular expression and very occasionally i'm getting an index out
of bounds exception from one of the inner framework methods, when i use
Match().  i don't know what the input is because it's in a production
environment and all debugging is turned off.

my code is as follows:
----------------------------
Regex rex = new Regex(@"http\:\/\/([a-zA-z0-9\-]*\.?)*?(\:[0-9]*)??\/",
    RegexOptions.IgnoreCase);
Match match = rex.Match(absoluteUrl);        // exception happens here
// strip out the absolute part of the entire url, returning the relative url.
if(match.Success)
  return "/" + absoluteUrl.Replace(match.ToString(), "");
-------------------------------

> stack trace:
> -------------------------------
> IndexOutOfRangeException
>    at System.Text.RegularExpressions.RegexInterpreter.Go()
>    at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick)
>    at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
>    at System.Text.RegularExpressions.Regex.Match(String input)
> -------------------------------
>
> thanks for any help
> tim mackey.

\\ email: tim at mackey dot ie //
\\ blog: http://tim.mackey.ie //
Justin Rogers - 12 Jun 2004 09:08 GMT
Sorry Tim, but I'm afraid we just can't help if we can't reproduce the scenario
that you are running into.  Any number of things could be going
wrong in this scenario and the Interpreter code is sufficiently complex that
I'm not sure anyone could hazard a guess at why the exception is occurring.

Eat the cost of turning your debugging on for a few days and give us the
string that is tossing the exception.

Justin Rogers
DigiTec Web Consultants, LLC.
Blog: http://weblogs.asp.net/justin_rogers

Jay B. Harlow [MVP - Outlook] - 12 Jun 2004 15:59 GMT
Tim,
Have you considered calling Microsoft directly with the problem? If there is
an actual bug you will not be charged with the support call.

Have you considered asking in a different newsgroup?
microsoft.public.dotnet.framework or microsoft.public.dotnet.general have a
larger following; someone in one of those may have come across your
problem...

As Justin suggested, have you considered putting the RegEx.Match in a try
catch & writing out the URL that is causing problems? So as to identify the
URL that is causing an issue...

I would consider creating a custom exception class so as to log the URL &
other context info that is causing the exception...

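Match match;   // declared outside the try so it is still in scope below
try
{
   match = rex.Match(absoluteUrl);        // exception happens here
}
catch (Exception ex)
{
   // throw new exception with the input, pattern & innerException
   // (rex.ToString() returns the pattern the Regex was constructed with)
   throw new MyMatchException(absoluteUrl, rex.ToString(), ex);
}
if(match.Success)
  return "/" + absoluteUrl.Replace(match.ToString(), ""); // strip out the absolute part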
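
The MyMatchException type used above is never shown in the thread. A minimal sketch of what such a class might look like follows; the property names and the message format are assumptions made for illustration, not Jay's actual code.

```csharp
using System;

// Hypothetical exception type: the thread names MyMatchException but does not define it.
public class MyMatchException : Exception
{
    // The input string that was being matched when the failure occurred.
    public string Input { get; }

    // The regular expression pattern that was in use.
    public string Pattern { get; }

    public MyMatchException(string input, string pattern, Exception inner)
        : base("Regex match failed for input '" + input + "' with pattern '" + pattern + "'", inner)
    {
        Input = input;
        Pattern = pattern;
    }
}
```

Because the input and pattern are part of the message, Exception.ToString() on this type should carry them into whatever log the application writes, along with the inner exception's stack trace.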

Note for production environments I find it invaluable to add global
exception handlers to my application, where the global exception handler
logs Exception.ToString to the EventLog. With a custom exception class, the
log would contain the URL & other context info that caused the problem. The
Exception Management Block is useful for this logging & provides options as
to how & where things are logged...

http://msdn.microsoft.com/webservices/building/frameworkandstudio/default.aspx?pull=/library/en-us/dnbda/html/emab-rm.asp


Depending on the type of application you are creating, .NET has three
different global exception handlers.

For ASP.NET look at:
   System.Web.HttpApplication.Error event
   Normally placed in your Global.asax file.

For console applications look at:
   System.AppDomain.UnhandledException event
   Use AddHandler in your Sub Main.

For Windows Forms look at:
   System.Windows.Forms.Application.ThreadException event
   Use AddHandler in your Sub Main.

It can be beneficial to combine the above global handlers in your app, as
well as wrap your Sub Main in a try catch itself.
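
As a rough illustration of the console-application case, the sketch below hooks AppDomain.UnhandledException in C#. The GlobalHandlers class and its Describe helper are hypothetical names, and writing to Console.Error stands in for the EventLog logging mentioned above.

```csharp
using System;

// Hypothetical helper class; the names here are illustrative only.
public static class GlobalHandlers
{
    // Builds the text to log. ExceptionObject is typed as object,
    // so handle the case where it is not an Exception.
    public static string Describe(object exceptionObject)
    {
        Exception ex = exceptionObject as Exception;
        return ex != null
            ? ex.ToString()
            : "Non-exception object thrown: " + exceptionObject;
    }

    // Call once at startup (for example at the top of Main).
    public static void Install()
    {
        AppDomain.CurrentDomain.UnhandledException +=
            (sender, e) => Console.Error.WriteLine(Describe(e.ExceptionObject));
    }
}
```

A Windows Forms application would attach a similar handler to Application.ThreadException, and an ASP.NET application would handle the HttpApplication.Error event in Global.asax.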

There is an article in the June 2004 MSDN Magazine that shows how to
implement global exception handling in .NET and explains why & when you
would use more than one of the above handlers...

http://msdn.microsoft.com/msdnmag/issues/04/06/NET/default.aspx

For example: In my Windows Forms apps I would have a handler attached to the
Application.ThreadException event, plus a Try/Catch in my Main. The
Try/Catch in Main only catches exceptions if the constructor of the MainForm
raises an exception; the Application.ThreadException handler will catch all
uncaught exceptions from any form/control event handlers.

Hope this helps
Jay

Pandurang Nayak - 15 Jun 2004 11:34 GMT
Have you considered the possibility that certain URLs might be of the form:
http://www.somesite.com/something.aspx?site=http://somesitereference.com
In this case, your regex is probably matching the second URL, and then when you run the Replace command you're getting an out-of-range exception because it exceeds the string length.

You could add a condition to check whether the index received from the regex match is out of bounds (less than zero or greater than the string length, or if the string is empty). In any of these cases, you could log the URL for inspection later, as other people have suggested already.
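
One way to sidestep the embedded-URL case is to anchor the pattern at the start of the string and remove only the matched prefix by length, instead of calling Replace on the matched text. The sketch below is an assumption about how that could look: UrlHelper, ToRelative and the simplified anchored pattern are illustrative and are not the original poster's code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the pattern is a simplified, anchored stand-in
// for the one in the thread, with no lazy quantifiers.
public static class UrlHelper
{
    private static readonly Regex Prefix =
        new Regex(@"^http://[a-zA-Z0-9\-.]+(:[0-9]+)?/", RegexOptions.IgnoreCase);

    public static string ToRelative(string absoluteUrl)
    {
        Match match = Prefix.Match(absoluteUrl);
        if (!match.Success)
            return absoluteUrl; // no leading scheme, host and slash to strip

        // The match is anchored at index 0, so its length is the prefix length.
        return "/" + absoluteUrl.Substring(match.Length);
    }
}
```

With this approach a second http:// later in the query string is left untouched, because only the leading prefix is removed.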

Regards
Pandurang
blog: pandurang.thinkingMS.com

Tim Mackey - 19 Jun 2004 19:44 GMT
hi Pandurang,
unfortunately my code never got as far as the replace command because the
exception happens at the line above. i have added an error-logging
try/catch to inform me of the url that caused the problem. next time it
happens, i'll post the problem-causing url here.

thanks
tim
Lasse Vågsæther Karlsen - 15 Jun 2004 21:25 GMT
> hi,
> posting this a third time in the hope that someone may explain why it
> happens, or that MS will acknowledge a bug and fix it in 2.0...

In order to verify that it's a bug, we would probably need a copy of the
url string that produces the exception as well.

You need to turn on some kind of logging so that you get hold of the value.

Lasse Vågsæther Karlsen
http://www.vkarlsen.no/
PGP KeyID: 0x0270466B

David Gutierrez[MSFT] - 24 Jun 2004 17:00 GMT
Tim, I did some experimenting with your code snippet and found that an
absolute url without the trailing / will cause this exception.  For
example: "http://www.msn.com".  I'll enter a bug to track this and it
should get fixed in the next version.  Thanks for letting us know about
this!

David
Tim Mackey - 01 Jul 2004 17:41 GMT
hi David,
thanks for acknowledging that, glad it got brought to light before 2.0 is
officially released.
it could be caused by my dodgy regular expression as i'm new to this topic
:)
thanks
tim

\\ email: tim at mackey dot ie //
\\ blog: http://tim.mackey.ie //
67d0ebfec70e8db3
Tim Mackey - 21 Jul 2004 11:49 GMT
hi david,
further to my last email, i have found another example of a string
that causes the index out of range exception.

for the regex: http://([a-zA-z0-9\-]*\.?)*?(:[0-9]*)??/
the string: http://ks%20med%20test/
causes the exception.  it's the % character i believe.

hope you can include this in your testing.
thanks
tim
Niki Estner - 28 Jul 2004 23:58 GMT
Hi Tim,

I could reduce this to: "(a?)*?b", matching "a".
It seems the regex engine doesn't like a lazy quantifier applied to a
subexpression that can match nothing.
I'd suggest avoiding the lazy quantifier as a workaround (i.e.
"http://([a-zA-z0-9\-]*\.?)*(:[0-9]*)??/"), or adding some required part to
the capture (e.g. "http://([a-zA-z0-9\-]+\.?)*?(:[0-9]*)??/".)

Niki
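
The second workaround above (requiring at least one character per repetition of the group) can be tried out with a small harness like the following. WorkaroundDemo and Matches are hypothetical names; the pattern is copied from the post, including its original character class.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical test harness for the '+' workaround suggested in the thread.
public static class WorkaroundDemo
{
    // '+' instead of '*' means each repetition of the lazily quantified
    // group must consume at least one character, so it can never match nothing.
    public const string Pattern = @"http://([a-zA-z0-9\-]+\.?)*?(:[0-9]*)??/";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase);
    }
}
```

Calling Matches with the inputs reported in the thread ("http://www.msn.com/", "http://www.msn.com" and "http://ks%20med%20test/") should report a match only for the first one; the other two should simply fail to match, with no exception thrown.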

