I have a simple regex need and I've already wasted too much time on it
spinning in circles. Can a regex god help a stranded soul? I just need to
replace all non-escaped ampersands in a file. It needs to skip escaped
ampersands such as &amp; and &nbsp;.
&[a-zA-Z0-9]+; will get the escaped ampersands (some improper escapes will
slip by, but good enough for my purposes), but I need to replace all the
ampersands that aren't escaped.
example:
&abc;&def& ghi&_jkl
after replace:
&abc;*def* ghi*_jkl
Paul E Collins - 18 Aug 2006 00:41 GMT
> &[a-zA-Z0-9]+; will get the escaped ampersands
> (some improper escapes will slip by, but good enough
[quoted text clipped - 4 lines]
> after replace:
> &abc;*def* ghi*_jkl
You can't do this unambiguously. If you've got a file that's somehow
been *partially* escaped, it's no longer in a state that makes any
sense, and you can't tell "&123;" (intended to be an escaped
character) from the identical "&123;" (an unescaped ampersand that
just happens to be followed by the string "123"). Where are you
getting this input from?
Eq.
Jesse Houwing - 18 Aug 2006 00:54 GMT
* Stephen Brown wrote, On 18-8-2006 1:13:
> I have a simple regex need and I've already wasted too much time on it
> spinning in circles. Can a regex god help a stranded soul? I just need to
[quoted text clipped - 9 lines]
> after replace:
> &abc;*def* ghi*_jkl
This will probably do for most circumstances, though Paul's remark does
apply, of course.
&(?![a-z0-9]+;)
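This will find every '&' that is not directly followed by one or more
letters or digits and then a ';'. Note that [a-z0-9] only covers lowercase
letters, so match case-insensitively (or use [a-zA-Z0-9]) if the file
contains entities such as &Eacute;.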
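For anyone who wants to try the pattern on the example from the original post, here is a minimal sketch in Python (the thread does not say which language is in use, so that choice is an assumption). The helper name replace_unescaped and the re.IGNORECASE flag are additions for illustration, not part of the regex as posted.

```python
import re

# '&' that is NOT directly followed by letters/digits and then ';'.
# re.IGNORECASE is an assumption added here so that mixed-case names
# such as &Eacute; also count as escaped.
UNESCAPED_AMP = re.compile(r"&(?![a-z0-9]+;)", re.IGNORECASE)


def replace_unescaped(text, replacement="*"):
    """Replace every '&' that does not start something shaped like an entity."""
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally rather than interpreted by re.sub.
    return UNESCAPED_AMP.sub(lambda _match: replacement, text)


print(replace_unescaped("&abc;&def& ghi&_jkl"))  # &abc;*def* ghi*_jkl
```

As Paul's remark implies, anything that merely looks like an entity (for example "&123;") is left alone, whether or not it was meant as one.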
To really ensure you're only escaping unescaped '&'s you'll need to
write a very long regex that looks like:
&(?!([0-9]+|copy|euml|amp|......);)
Where you'll have to fill the dots with all allowed escape sequences and
optimize afterwards (so that 'amp' & 'auml' become 'a(?:uml|mp)' for
improved speed). But my guess is that in most cases the first option will
suffice.
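Rather than typing the long alternation by hand, it could be generated from a list of names. The following is only a sketch in Python (again an assumed language): KNOWN_ENTITIES is a hypothetical and deliberately incomplete list, build_pattern and replace_unescaped_strict are made-up helper names, and no manual prefix factoring such as 'a(?:uml|mp)' is attempted.

```python
import re

# Hypothetical, deliberately incomplete list. A real one would hold every
# entity name the input is allowed to contain.
KNOWN_ENTITIES = ["amp", "lt", "gt", "quot", "nbsp", "copy", "euml", "auml"]


def build_pattern(names):
    """Build '&' + negative lookahead over digits or any known entity name."""
    alternatives = "|".join(re.escape(name) for name in names)
    return re.compile(r"&(?!(?:[0-9]+|" + alternatives + r");)")


STRICT_AMP = build_pattern(KNOWN_ENTITIES)


def replace_unescaped_strict(text, replacement="*"):
    """Replace '&' unless it starts a digit run or a listed name ending in ';'."""
    return STRICT_AMP.sub(lambda _match: replacement, text)


print(replace_unescaped_strict("&amp; &def; &123; &copy"))
# &amp; *def; &123; *copy
```

With this stricter pattern, "&def;" is no longer mistaken for an escape because "def" is not in the list, at the cost of having to keep the list complete.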
Jesse Houwing