Regex replace where Search Value not between specific delimiters

Rory Becker - 07 Jun 2007 22:51 GMT
Hi all.

Over the last few years I have managed to get by with relatively little regex
knowledge.

However I now have what seems like a simple problem which I simply cannot
find a solution for.

As stated, regular expressions are not my strong point.

What I need is a way to replace a set string within a source, but only where
that string is not surrounded by brackets.

Thus I would like to (for instance) change "AB(A)BABA" into "CB(A)BCBC"

Must I match all the A's and then loop to find those that are not surrounded
or is there a better way?

Thanks in advance
Walter Wang [MSFT] - 08 Jun 2007 08:17 GMT
Hi Rory,

You can achieve this with "Negative Lookahead":

  using System.Diagnostics;
  using System.Text.RegularExpressions;
  Regex RegexObj = new Regex("(?!\\()A(?!\\))");
  Debug.Assert(RegexObj.Replace("AB(A)BABA", "C")=="CB(A)BCBC");

<quote>
#Grouping Constructs
http://msdn2.microsoft.com/en-us/library/bs2twtah.aspx

(?! subexpression)

(Zero-width negative lookahead assertion.) Continues match only if the
subexpression does not match at this position on the right. For example,
\b(?!un)\w+\b matches words that do not begin with un.
</quote>

Hope this helps.

Regards,
Walter Wang (wawang@online.microsoft.com, remove 'online.')
Microsoft Online Community Support

==================================================
When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.
==================================================

This posting is provided "AS IS" with no warranties, and confers no rights.
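A note on the pattern above, as a minimal sketch ported to Python's `re` module (Python rather than C# so it runs standalone; `re` supports the same `(?!...)` and `(?<!...)` syntax as .NET for these cases). The leading `(?!\()` is evaluated at the position where `A` starts, so the character it inspects is always the `A` itself and it can never fail; only the trailing `(?!\))` does any work. Testing the character before `A` needs a negative lookbehind, `(?<!\()`. The variable names are illustrative.

```python
import re

text = "AB(A)BABA"

# Pattern as posted: the leading (?!\() is evaluated at the position of "A",
# so the next character is always "A" and the assertion always succeeds.
as_posted = re.compile(r"(?!\()A(?!\))")

# Only the trailing lookahead does any work, so dropping the leading
# assertion gives the same result.
trailing_only = re.compile(r"A(?!\))")

# A negative lookbehind (?<!\() checks the character *before* "A" instead.
with_lookbehind = re.compile(r"(?<!\()A(?!\))")

print(as_posted.sub("C", text))        # CB(A)BCBC
print(trailing_only.sub("C", text))    # CB(A)BCBC
print(with_lookbehind.sub("C", text))  # CB(A)BCBC

# The difference shows up when "A" follows "(" but is not followed by ")":
print(as_posted.sub("C", "(AB"))        # (CB
print(with_lookbehind.sub("C", "(AB"))  # (AB
```

With both assertions in place, an `A` that touches either bracket is skipped; whether that matches the intended meaning of "surrounded" depends on the exact requirement.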
Rory Becker - 08 Jun 2007 09:30 GMT
> You can achieve this with "Negative Lookahead":
>
> [...]
> \b(?!un)\w+\b matches words that do not begin with un.
> </quote>

This is the definitive example of why "Community Rocks".

I could have spent hours looking for this :) (but I wouldn't have :P. Sadly,
my iterative process would have had to do.)

Not only do I now have the answer I was looking for, but I have the name
of the feature I was describing and a great reference back to the original
docs.

Cheers Walter, your answer is about as perfect as I could have wished for :)

Great job man
Rory Becker - 11 Jun 2007 00:36 GMT
>> You can achieve this with "Negative Lookahead":
>>
>> Regex RegexObj = new Regex("(?!\\()A(?!\\))");
>> Debug.Assert(RegexObj.Replace("AB(A)BABA", "C")=="CB(A)BCBC");

OK, so I have now integrated this into my code and it works very well.

However, I have found a problem which exists due to the greedy nature of the
modified regex I am using.

Time to elaborate more on the problem....

I am trying to place parentheses around key phrases within a text, but only
if they are not already contained within parentheses.

My code...
-------------------------------------------------------------
Dim Pattern As String = String.Format("(?!\(.*){0}(?!.*\))", Regex.Escape(SearchPhrase))

However I am trying to do this iteratively with several phrases.

If I try to do this with "Hello World" and "today" in the following phrase....
-------------------------------------------------------------
"Hello World. How are you today? Hello World"
-------------------------------------------------------------
...I wind up with...
-------------------------------------------------------------
"(Hello World). How are you today? (Hello World)"
-------------------------------------------------------------
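The symptom can be reproduced with a small Python port of the pattern above (`re.escape` standing in for `Regex.Escape`; `not_in_parens_pattern` and `wrap_all` are hypothetical helper names). The trailing `(?!.*\))` rejects a match if a `)` appears anywhere later on the same line (`.` does not match a newline), so once the final "Hello World" has been wrapped, every earlier candidate, including "today", is rejected. The leading `(?!\(.*)` sits on the phrase's first character and can only fail if the phrase itself starts with `(`.

```python
import re

def not_in_parens_pattern(phrase):
    # Python port of the posted VB pattern:
    # String.Format("(?!\(.*){0}(?!.*\))", Regex.Escape(SearchPhrase))
    return re.compile(r"(?!\(.*)" + re.escape(phrase) + r"(?!.*\))")

def wrap_all(text, phrases):
    # Apply one phrase at a time, as described in the post.
    for phrase in phrases:
        text = not_in_parens_pattern(phrase).sub(r"(\g<0>)", text)
    return text

result = wrap_all("Hello World. How are you today? Hello World", ["Hello World", "today"])
print(result)  # (Hello World). How are you today? (Hello World)
```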

This appears to be because the regex subsystem views the word ...
Rory Becker - 11 Jun 2007 00:51 GMT
>> You can achieve this with "Negative Lookahead":
>>
>> Regex RegexObj = new Regex("(?!\\()A(?!\\))");
>> Debug.Assert(RegexObj.Replace("AB(A)BABA", "C")=="CB(A)BCBC");

OK, so I have now integrated this into my code and it works very well.

However, I have found a problem which exists due to the greedy nature of the
modified regex I am using.

Time to elaborate more on the problem....

I am trying to place parentheses around key phrases within a text, but only
if they are not already contained within parentheses.

However I am trying to do this iteratively with several phrases.

Thus applying first...
-------------------------------------------------------------
(?!\(.*)Hello World(?!.*\))
-------------------------------------------------------------
... and then...
-------------------------------------------------------------
(?!\(.*)Hello(?!.*\))
-------------------------------------------------------------

If I try to do this with "Hello World" and "Hello" in the following phrase....
-------------------------------------------------------------
"Hello World. Hello. Hello World"
-------------------------------------------------------------
...I would like to get...
-------------------------------------------------------------
"(Hello World). (Hello). (Hello World)"
-------------------------------------------------------------
...but I wind up with...
-------------------------------------------------------------
"(Hello World). Hello. (Hello World)"
-------------------------------------------------------------

This appears to be because the regex subsystem views the word "Hello" as
already being surrounded by parentheses.
I admit that I have changed the original regex to include .*,
but without this I would have got...
-------------------------------------------------------------
"((Hello) World). (Hello). ((Hello) World)"
-------------------------------------------------------------
...as the "Hello"s inside "Hello World" were matched after that phrase had
already been parenthesised.

I think I need a non-greedy .*, which from my research appears to be .*?,
but this doesn't seem to change anything.

Any Ideas...
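The `.*?` experiment can be checked with a short sketch (a Python port; the patterns are the ones from the post). A negative lookahead only asks whether some match of its contents exists at that position. Lazy versus greedy changes which match the engine finds first, not whether one exists, so swapping `.*` for `.*?` cannot change the result. The real issue is that `.*\)` is free to reach a `)` that closes some later group rather than one enclosing the match.

```python
import re

text = "(Hello World). Hello. (Hello World)"  # state after the first pass

greedy = re.compile(r"(?!\(.*)Hello(?!.*\))")
lazy = re.compile(r"(?!\(.*?)Hello(?!.*?\))")
no_dot_star = re.compile(r"(?!\()Hello(?!\))")

wrap = r"(\g<0>)"  # surround the whole match with parentheses
print(greedy.sub(wrap, text))       # (Hello World). Hello. (Hello World)
print(lazy.sub(wrap, text))         # (Hello World). Hello. (Hello World)
print(no_dot_star.sub(wrap, text))  # ((Hello) World). (Hello). ((Hello) World)
```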
Rory Becker - 11 Jun 2007 01:06 GMT
Wouldn't you know it....

Found an answer... well one that will do for now.

All phrases are alphanumeric plus underscore, so instead of
-------------------------------------------------------------
(?!\(.*)SearchPhrase(?!.*\))
-------------------------------------------------------------
...I can use...
-------------------------------------------------------------
(?!\(\w*)SearchPhrase(?!\w*\))
-------------------------------------------------------------

Just the job

--
Ror
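A minimal sketch of this workaround, ported to Python and assuming (as stated above) that every phrase consists only of word characters (letters, digits, underscore). It uses `(?!\w*\))` as the trailing check, so a match is skipped only when it is followed by zero or more word characters and then a `)`. The phrases `Hello_World` and `Hello` and the helper `wrap_word_phrase` are hypothetical.

```python
import re

def wrap_word_phrase(text, phrase):
    # Assumes the phrase contains only word characters (letters, digits, _).
    # (?!\(\w*) mirrors the post; at the phrase's first character it never sees "(".
    # (?!\w*\)) skips a match that is followed by word characters and then ")".
    pattern = re.compile(r"(?!\(\w*)" + re.escape(phrase) + r"(?!\w*\))")
    return pattern.sub(r"(\g<0>)", text)

text = "Hello_World. Hello. Hello_World"  # hypothetical word-only phrases
for phrase in ["Hello_World", "Hello"]:
    text = wrap_word_phrase(text, phrase)
print(text)  # (Hello_World). (Hello). (Hello_World)
```

Because `\w*` stops at a space, this no longer helps once a phrase contains one: "Hello" inside "(Hello World)" would be wrapped again.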
Walter Wang [MSFT] - 11 Jun 2007 07:50 GMT
Hi Rory,

I understand that the new problem is caused by the fact that one phrase is
a substring of another phrase. I did some research, and this does seem
difficult to overcome with a few simple rules. It will probably require
some more conditions, as you have found. Please feel free to let me
know if there's anything I can help with. Thanks.

Regards,
Walter Wang (wawang@online.microsoft.com, remove 'online.')
Microsoft Online Community Support

Ben Voigt [C++ MVP] - 22 Jun 2007 00:19 GMT
>>> You can achieve this with "Negative Lookahead":
>>>
> [...]
> I think I need a non greedy .* which I have researched and apears to be
> .*?

How about '[^)]*?', which means a sequence of any characters except a closing
parenthesis? Then the match must end at the first subsequent closing
parenthesis instead of extending from the first open parenthesis to the last close.

> but this doesn't seem to change anything.
>
> Any Ideas...?
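Ben's idea of limiting what the lookahead may skip over can be taken one step further. On single-line text, `(?![^)]*\))` still rejects a match whenever any `)` appears later, exactly like `(?!.*\))`. Excluding `(` from the class as well, as in `(?![^()]*\))`, turns the check into "the next parenthesis to the right is not a `)`", i.e. the match is not inside an open group. This is a Python sketch; `wrap_outside_parens` is a hypothetical helper, and it assumes parentheses are balanced and never nested.

```python
import re

def wrap_outside_parens(text, phrase):
    # (?![^()]*\)) rejects a match when the next parenthesis to its right
    # is ")", i.e. when the match sits inside a "(...)" group.
    # Assumes parentheses are balanced and never nested.
    pattern = re.compile(re.escape(phrase) + r"(?![^()]*\))")
    return pattern.sub(r"(\g<0>)", text)

text = "Hello World. Hello. Hello World"
for phrase in ["Hello World", "Hello"]:
    text = wrap_outside_parens(text, phrase)
print(text)  # (Hello World). (Hello). (Hello World)

print(wrap_outside_parens("(Hello World). How are you today? (Hello World)", "today"))
# (Hello World). How are you (today)? (Hello World)
```

The order of phrases still matters: wrapping "Hello" before "Hello World" would split the longer phrase, so longer phrases should go first.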
Walter Wang [MSFT] - 22 Jun 2007 05:55 GMT
I think Rory's requirement is to first replace "Hello World" with
"(Hello World)", then replace "Hello" with "(Hello)". However, since
"Hello" is a substring of "Hello World", this results in "((Hello) World)".
Unless the replacement can be done in a single pass, I think this is
difficult to overcome.

Regards,
Walter Wang (wawang@online.microsoft.com, remove 'online.')
Microsoft Online Community Support
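The single-pass idea can be sketched with one alternation that lists longer phrases first. Python's `re`, like .NET's `Regex`, tries alternatives from left to right and resumes scanning after each match, so at a position where both "Hello World" and "Hello" could match, the longer phrase wins and the "Hello" inside it is never revisited. The helper name is hypothetical, and the "not inside (...)" check `(?![^()]*\))` assumes balanced, non-nested parentheses.

```python
import re

def wrap_phrases_one_pass(text, phrases):
    # Hypothetical helper: try longer phrases first so that "Hello World"
    # wins over its prefix "Hello" at the same starting position.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    # Reject a match whose next parenthesis to the right is ")";
    # assumes parentheses are balanced and never nested.
    pattern = re.compile("(?:" + alternation + r")(?![^()]*\))")
    return pattern.sub(r"(\g<0>)", text)

print(wrap_phrases_one_pass("Hello World. Hello. Hello World", ["Hello", "Hello World"]))
# (Hello World). (Hello). (Hello World)
```

Text that is already wrapped is left alone, so `wrap_phrases_one_pass("(Hello). Hello World", ["Hello", "Hello World"])` yields `(Hello). (Hello World)`.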


©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.