.NET Forum / Languages / C# / March 2008

regex challenge

Jarlaxle - 04 Mar 2008 18:08 GMT
I'd like to issue a challenge (I already have a class to do it so I'm not
asking just to get the code)...

Write one regex expression that removes all comments from a string.

Hints:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. I have seen some that claim to remove all comments but are inadequate for
one reason...(QUOTES!)
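
To make hint 3 concrete, here is a minimal sketch of the kind of pattern that looks right but is inadequate because of quotes. The class name NaiveStripper is hypothetical and not from any poster's code; the only library calls are the standard System.Text.RegularExpressions ones.

```csharp
using System;
using System.Text.RegularExpressions;

static class NaiveStripper
{
    // Naive pattern: a /* ... */ block (Singleline lets '.' cross newlines)
    // or a // comment running to the end of the line.
    // It knows nothing about string literals, which is the flaw being shown.
    static readonly Regex Naive =
        new Regex(@"/\*.*?\*/|//[^\r\n]*", RegexOptions.Singleline);

    public static string Strip(string source)
    {
        return Naive.Replace(source, "");
    }
}
```

Calling NaiveStripper.Strip on `int x = 1; // one` removes the comment as intended, but on `string url = "http://example.com";` it also cuts everything from the `//` inside the string to the end of the line.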
Jon Skeet [C# MVP] - 04 Mar 2008 18:35 GMT
> I'd like to issue a challenge (I already have a class to do it so I'm not
> asking just to get the code)...
[quoted text clipped - 7 lines]
> 3. I have seen some that claim to remove all comments but are inadequate for
> one reason...(QUOTES!)

Well, you need to specify the quote behaviour as well. Hint: C# quote
handling would be different to Java handling. (In fact, that goes for
various other non-quote cases, given the way Java handles Unicode
escape sequences.)

Signature

Jon Skeet - <skeet@pobox.com>
http://www.pobox.com/~skeet   Blog: http://www.msmvps.com/jon.skeet
World class .NET training in the UK: http://iterativetraining.co.uk

Jarlaxle - 04 Mar 2008 21:31 GMT
We can keep it to C#:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. quotes must be supported
       a.  all text from a starting quote to closing quote must not be
matched
       b.  must support \" escape sequence inside quotes.

> > I'd like to issue a challenge (I already have a class to do it so I'm not
> > asking just to get the code)...
[quoted text clipped - 12 lines]
> various other non-quote cases, given the way Java handles Unicode
> escape sequences.)
Jon Skeet [C# MVP] - 04 Mar 2008 22:30 GMT
> we can keep it to c#:
>
[quoted text clipped - 4 lines]
> matched
>         b.  must support \" escape sequence inside quotes.

What about @"\"//This is a comment ?
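
The point of that example is that in a verbatim string the backslash is an ordinary character and only a doubled quote escapes a quote, so the literal is just @"\" and the rest of the line is a real comment. A minimal sketch of two literal patterns that reflect this (the names StringLiteralPatterns, Regular, Verbatim and FirstLiteral are hypothetical, for illustration only):

```csharp
using System;
using System.Text.RegularExpressions;

static class StringLiteralPatterns
{
    // Regular literal: a backslash escapes the next character.
    public const string Regular = @"""(?:\\.|[^""\\\r\n])*""";

    // Verbatim literal: backslash is ordinary; "" is the only escape.
    public const string Verbatim = @"@""(?:[^""]|"""")*""";

    // Returns the first string literal found in the source text, or null.
    // Scanning is leftmost-first, so a literal starting at '@' is claimed by
    // the verbatim alternative before the regular one reaches its quote.
    public static string FirstLiteral(string source)
    {
        Match m = Regex.Match(source, Verbatim + "|" + Regular);
        return m.Success ? m.Value : null;
    }
}
```

For the source text `@"\"//This is a comment`, FirstLiteral returns the four characters `@"\"`, leaving `//This is a comment` outside the literal.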


Paul E Collins - 04 Mar 2008 21:36 GMT
> I'd like to issue a challenge (I already have a class to do it so I'm
> not asking just to get the code)... Write one regex expression that
> removes all comments from a string.

How does your class handle code like this?

/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

I'm pretty sure it's impossible with a regular expression; you'd need a
true C# parser.

Eq.
Paul E Collins - 04 Mar 2008 21:43 GMT
> How does your class handle code like this?
> /* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

Hey, here's something weird.

If you type my (top-of-head) code above into Visual Studio, it produces
one long line that's somehow a single comment, and none of it gets
compiled as actual code.

/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

But if you put a line break where you'd expect the first multi-line
comment to end -- without changing anything else -- then the second
statement gets syntax-coloured and compiled as C# code.

/* string s = "/* hello */ // how are you?"; */
string s = "/* // */";

So I think I've accidentally found a bug. What do I win?

Eq.

P.S. My question to the original poster still stands, bug or none!
Paul E Collins - 04 Mar 2008 22:08 GMT
Blah, never mind :) I worked out what's going on with that line.

This can serve as a lesson to anyone who would combine // and /**/
comments.

Eq.
Ben Voigt [C++ MVP] - 05 Mar 2008 15:27 GMT
> Blah, never mind :) I worked out what's going on with that line.
>
> This can serve as a lesson to anyone who would combine // and /**/
> comments.

You absolutely should combine // and /**/ comments.  If you need to comment
out a block of code, you need to prefix each line of the block with //,
because /* */ style comments do not nest.
Ben Voigt [C++ MVP] - 05 Mar 2008 15:26 GMT
>> How does your class handle code like this?
>> /* string s = "/* hello */ // how are you?"; */ string s = "/* // */";
[quoted text clipped - 10 lines]
> comment to end -- without changing anything else -- then the second
> statement gets syntax-coloured and compiled as C# code.

This just goes to show that where you'd expect the first comment to end is
not, in fact, where it does end.

Here is the first comment:
/* string s = "/* hello */

The */ inside quotes is NOT skipped because it is NOT inside a quoted string
literal because a quote inside a comment is a comment, not the beginning of
a string literal.
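
The same conclusion can be checked mechanically. The following is a rough sketch, not a full C# lexer: it walks the text left to right and lists each comment or string literal it meets, so whichever construct opens first wins. The names CommentScanner and Tokens are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CommentScanner
{
    // Alternatives, in order: block comment, line comment,
    // verbatim string literal, regular string literal.
    const string Token =
        @"/\*.*?\*/|//[^\r\n]*|@""(?:[^""]|"""")*""|""(?:\\.|[^""\\\r\n])*""";

    // Returns the comments and string literals in the order they occur.
    public static List<string> Tokens(string source)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(source, Token, RegexOptions.Singleline))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

On the one-line version of Paul's code this yields two tokens: the block comment `/* string s = "/* hello */` and then a line comment that starts at `// how are you?` and swallows the rest of the line. With the line break added, the line comment stops at the end of the first line and the second line's `"/* // */"` is found as a string literal.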
Jesse Houwing - 05 Mar 2008 00:09 GMT
Hello Jarlaxle,

> I'd like to issue a challenge (I already have a class to do it so I'm
> not asking just to get the code)...
[quoted text clipped - 8 lines]
> inadequate for
> one reason...(QUOTES!)

On the one hand I would love to take on this challenge, but I am lacking the
time and incentive to do so right now.

There's a lot to keep in mind even when only considering C# as target language
for this exercise.

I say:

A that this can be done with regular expressions.
B that it will be bloody hard and unreadable
C that regex isn't the best tool for this job
D that if you mix up comment-like code with string constants as in the provided
samples, you're either crazy or need total job security ;)

So for those that want to try:

A) Verbatim strings are a pain in the a.s for this as it removes the security
of the normal line ends.
B) You need to understand balancing groups to get this to work
C) And a lot of lookaheads/lookbehinds
D) And greedy matching to solve performance issues

The best way is probably to make a regex that matches from the start to the
end and use a MatchEvaluator to null all the comments found... but that wouldn't
be a pure regex solution would it?

--
Jesse Houwing
jesse.houwing at sogeti.n
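
For readers who want to try the MatchEvaluator route mentioned above, here is a minimal sketch under stated assumptions. It is not Jarlaxle's class or Jesse's solution; the names CommentStripper and Strip and the group names comment and literal are hypothetical. It assumes C# source with regular strings, verbatim strings and char literals only, and it does not handle preprocessor directives or unterminated comments.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentStripper
{
    // One pattern matches either a comment or a literal. Literals are matched
    // only so that comment-like text inside them is consumed and kept.
    static readonly Regex Token = new Regex(
        @"(?<comment>/\*.*?\*/|//[^\r\n]*)" +
        @"|(?<literal>@""(?:[^""]|"""")*""|""(?:\\.|[^""\\\r\n])*""|'(?:\\.|[^'\\\r\n])*')",
        RegexOptions.Singleline);

    public static string Strip(string source)
    {
        // The evaluator blanks comments and returns literals unchanged.
        return Token.Replace(
            source,
            m => m.Groups["comment"].Success ? "" : m.Value);
    }
}
```

For example, CommentStripper.Strip leaves `string s = "/* // */";` intact while removing a trailing `// tail` comment after it. Because the decision is made in the evaluator rather than in the pattern, this is the "not a pure regex" compromise described above.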
