I usually avoid regexes because of performance. In this case I haven't tested,
but would imagine the difference is approximately "who cares" ... nonetheless
I just think of regexes as overkill in many situations where people try to
use them.
A great way to use them though is to put the pattern in a config file so it
can be easily changed when requirements change or for different customers w/o
recompiling the app.
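As a rough illustration of the config-file idea, here is a minimal sketch. The dictionary stands in for whatever configuration source the app really uses, and the key "UserNameCleanup" and its pattern are invented for the example, not taken from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ConfigurablePattern
{
    // Stand-in for a real config source (app.config, a settings file, ...).
    // The key and the pattern value are assumptions made for this example.
    private static readonly Dictionary<string, string> Settings =
        new Dictionary<string, string>
        {
            { "UserNameCleanup", "[^A-Za-z0-9]" }
        };

    // Removes every character matched by the configured pattern.
    public static string Clean(string input)
    {
        string pattern = Settings["UserNameCleanup"];
        return Regex.Replace(input, pattern, string.Empty);
    }

    public static void Main()
    {
        Console.WriteLine(Clean("j.doe_42!")); // prints "jdoe42"
    }
}
```

Changing the allowed characters for a different customer then means editing the stored pattern, not the code that applies it.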
> > Regex is a bit overkill for that; you could...
> >
[quoted text clipped - 16 lines]
>
> Arne
>> I think that code is an overkill compared to a simple Regex.Replace !
>
[quoted text clipped - 5 lines]
> to
> use them.
It's funny. I agree with both statements, sort of. (Do you smell an
essay coming on? You should... :) )
Fundamentally, I think that Regex is a good thing. It's a concise,
reliable way to represent various string interpretations and
manipulations. As far as performance goes, I don't think there's a
reliable way to say that Regex is always better- or worse-performing than
an equivalent explicit algorithm.
However, I do think that it's likely that Regex performs better for at
least a broad variety of possible applications, if not the majority. As a
framework class, it's got the potential to be well-optimized and there's
good justification for it to be. On the other hand, explicit algorithms
may or may not be well-optimized, depending on who wrote the code and how
often it's likely to be used.
In addition, every time you write an explicit algorithm, you risk writing
it wrong. With Regex, yes there's the possibility of writing an incorrect
expression, but it's more likely in that case that it just won't work.
It's much harder to get those subtle "happens once in a while with only
this very specific input" bugs. Not impossible, but IMHO more difficult.
So those are all things in favor of Regex. I think that in general,
anything that allows you to specify an operation in a concise, error-free
way and then perform that operation with reasonable, or even optimal,
speed is a good thing.
But with Regex, the conciseness is IMHO a bit overboard. I recognize that
there are folks out there who have used regular expressions so much that
it's just like writing regular programming code to them. They know it
inside and out.
But for the rest of us, using Regex is an exercise in frustration as we
skip back and forth in the MSDN documentation trying to find just the
right syntax for representing some goal. There's an incredible amount of
capability there, and with that comes a fairly extensive grammar that
needs to be learned to use it effectively. But the syntax of that grammar
is pretty arcane IMHO, and has been very hard to learn, at least for me.
I wish we had something like Regex, but with a more natural-language-like
way to program it. Maybe something like a RegexBuilder class or something
that you can use to construct an appropriate regular expression. Or maybe
just a syntax that looks more like C# than like APL. Or maybe something
that takes actual C# code expressions and converts it into a suitable
regular expression. Or some alternative I've yet to consider.
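To make the "RegexBuilder" wish a bit more concrete, here is one possible shape for it. This is only a sketch: PatternBuilder and all of its method names are hypothetical and are not part of the .NET Framework. It simply assembles an ordinary pattern string and hands it to the real Regex class.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical fluent builder; every name here is invented for illustration.
public sealed class PatternBuilder
{
    private readonly StringBuilder _pattern = new StringBuilder();

    public PatternBuilder StartOfInput() { _pattern.Append('^'); return this; }
    public PatternBuilder EndOfInput() { _pattern.Append('$'); return this; }

    // Matches the given text literally, escaping any regex metacharacters.
    public PatternBuilder Literal(string text)
    {
        _pattern.Append(Regex.Escape(text));
        return this;
    }

    // Matches between min and max ASCII digits.
    public PatternBuilder Digits(int min, int max)
    {
        _pattern.Append("[0-9]{").Append(min).Append(',').Append(max).Append('}');
        return this;
    }

    public Regex Build() { return new Regex(_pattern.ToString()); }

    public override string ToString() { return _pattern.ToString(); }
}

public static class PatternBuilderDemo
{
    public static void Main()
    {
        PatternBuilder builder = new PatternBuilder()
            .StartOfInput().Literal("v").Digits(1, 3)
            .Literal(".").Digits(1, 3).EndOfInput();

        Console.WriteLine(builder);                          // ^v[0-9]{1,3}\.[0-9]{1,3}$
        Console.WriteLine(builder.Build().IsMatch("v1.25")); // True
    }
}
```

The calls are longer than the raw pattern, but each step names what it does, which is the trade-off being asked for here.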
I don't know what the actual solution is. All I know is that Regex itself
can be very trying to use if you're inexperienced with it, to a _much_
greater extent than, say, VB or C# might be. So in the end, for simple
operations I find myself thinking "well, some explicit C# code will be
clearer, and it should be easy to make it bug-free", and so I wind up not
using Regex there. And then for more complex operations, where the
conciseness and precision of Regex would be a benefit, I find myself
thinking "I just don't get how to do this in Regex and the docs aren't
helping me figure it out", and so I wind up not using Regex.
Which means that either way, I don't use Regex. I've posted questions
here asking how to write Regex expressions to do what I want, and to the
credit of the newsgroup experts who do know Regex, they've always come
through. For me, and for others who ask similar questions. Jesse Houwing
in particular deserves major kudos for his Regex "kung fu" and his
willingness to share it with others. But in the end, if I can't be
self-reliant on a technology, I tend not to use it.
Maybe if I had greater need to do string pattern matching, I'd take the
time and really learn regular expressions and then it'd be useful. But I
don't, and for the occasional moments when it'd be useful to me, it's just
not worth the time and effort to figure out that specific case.
I'd love to see someone fix that problem. :)
Pete
KWienhold - 27 Feb 2008 07:11 GMT
On 27 Feb., 02:39, "Peter Duniho" <NpOeStPe...@nnowslpianmk.com>
wrote:
> >> I think that code is an overkill compared to a simple Regex.Replace !
>
[quoted text clipped - 78 lines]
>
> Pete
While I do use Regex from time to time (input field validation,
parsing Sql-Connection-strings etc.), I totally agree with Peter.
Whenever I do use regular expressions, it would have been quite trivial
to achieve the same thing in code. When the pattern matching becomes
complex enough to really make you want the power the Regex engine
offers, I often find I just can't get the expression to work right in
all circumstances.
A library that would offer a more natural way of constructing regular
expressions would be great, but given the complexity of the syntax
(let alone the fact that there are several different implementations),
I don't quite see how that could be done...
Kevin Wienhold
Stefan Nobis - 27 Feb 2008 09:32 GMT
> Fundamentally, I think that Regex is a good thing.
Fundamentally a RegEx is a type 3 grammar, equivalent to a finite
automaton. :)
So a RegEx is more like an upper bound to a class of pattern matching
problems. Sometimes a RegEx is not enough; then you need to go up in
the hierarchy to type 2 grammars and write parsers. But in many cases
you don't need all of the expressiveness of a RegEx, so you can use
much simpler constructs.
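A standard example of a problem above type 3 is checking arbitrarily nested parentheses: a finite automaton cannot count to an unbounded depth. The sketch below (class and method names are made up for illustration) does the job with a plain depth counter instead of a pattern. Note that the .NET engine has extensions beyond the theoretical model, so this is about regular expressions in the formal sense.

```csharp
using System;

public static class ParenCheck
{
    // Returns true if every '(' has a matching ')' in the right order.
    // The unbounded counter is what a finite automaton cannot provide.
    public static bool IsBalanced(string s)
    {
        int depth = 0;
        foreach (char c in s)
        {
            if (c == '(')
            {
                depth++;
            }
            else if (c == ')' && --depth < 0)
            {
                return false; // closing parenthesis with nothing open
            }
        }
        return depth == 0;
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(a(b)c)")); // True
        Console.WriteLine(IsBalanced("(()"));     // False
    }
}
```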
BTW: In the class of parsing problems where regular expressions
suffice, using a RegEx parser is the most costly (sane) way to do the
job. Simple comparisons like IsDigitOrLetter (traversing the input
string only once, without the overhead of parser generation) are
always (much) faster and need (much) less memory.
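For comparison, here is roughly what the two approaches look like in C#. This is a sketch, not a benchmark: the class and method names are invented, and the pattern is my attempt to mirror what char.IsLetterOrDigit accepts (any Unicode letter or decimal digit), so treat that equivalence as an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LetterOrDigitCheck
{
    // One pass over the input; no pattern parsing, no extra allocations.
    public static bool IsAllLettersOrDigits(string s)
    {
        foreach (char c in s)
        {
            if (!char.IsLetterOrDigit(c))
            {
                return false;
            }
        }
        return true;
    }

    // Regex counterpart: \p{L} is any letter, \p{Nd} any decimal digit.
    // \A and \z anchor to the very start and end of the input.
    public static bool IsAllLettersOrDigitsRegex(string s)
    {
        return Regex.IsMatch(s, @"\A[\p{L}\p{Nd}]*\z");
    }

    public static void Main()
    {
        Console.WriteLine(IsAllLettersOrDigits("abc123"));       // True
        Console.WriteLine(IsAllLettersOrDigitsRegex("abc 123")); // False
    }
}
```

Both versions accept the empty string. Which one is actually faster for a given workload would need measuring.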
Some problems need full regular expression expressiveness, so in these
cases the cost and overhead of a RegEx is unavoidable.
> As far as performance goes, I don't think there's a reliable way to
> say that Regex is always better- or worse-performing than an
> equivalent explicit algorithm.
This class of problems is really well studied and understood. There
are quite reliable ways to say when a RegEx is needed, what performance
and memory characteristics follow, and when other ways are needed or more
efficient.
These and much more are the basics of computer science. There's more
to programming than just trial and error.
> other hand, explicit algorithms may or may not be well-optimized,
But a regular expression may also be badly written and as such induce
much more overhead and worse performance than a better written RegEx
running on the same regular expression engine. A regular
expression is a simple language but still complex enough to say the
same thing in different ways.
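A small sketch of "the same thing in different ways": the two patterns below are my own example, not from this thread. They accept exactly the same strings, but the first nests one quantifier inside another. On a backtracking engine, nested quantifiers like this can cause a lot of extra work on long inputs that almost match; nothing is timed here, so treat the performance remark as a general caution and not a measurement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EquivalentPatterns
{
    // Nested quantifiers: a group of one-or-more, repeated zero or more times.
    public static readonly Regex Nested = new Regex(@"^(\w+)*$");

    // Same language, expressed with a single quantifier.
    public static readonly Regex Flat = new Regex(@"^\w*$");

    public static void Main()
    {
        string[] inputs = { "user42", "user 42", "" };
        foreach (string input in inputs)
        {
            // Both columns agree for every input.
            Console.WriteLine("{0,-10} {1,-6} {2}",
                "\"" + input + "\"", Nested.IsMatch(input), Flat.IsMatch(input));
        }
    }
}
```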
If you do a basic comparison of algorithms, you always have to assume
that the implementations are written as well as possible (for example, a
routine to copy a 10-character string should not need 50MB RAM
and several minutes of runtime to do its job; it's always possible
to do worse, we are only interested in whether it's possible to do better).
> In addition, every time you write an explicit algorithm, you risk
> writing it wrong. With Regex, yes there's the possibility of
> writing an incorrect expression, but it's more likely in that case
> that it just won't work. It's much harder to get those subtle
> "happens once in awhile with only this very specific input". Not
> impossible, but IMHO more difficult.
You haven't written many complex regular expressions, have you? It's
quite easy to get those subtle problems with a RegEx. But you are not
wrong. A regular expression is a type 3 grammar, while C# has (more or
less) a type 2 grammar and is Turing complete as a language, so it's much
more expressive and there is much more potential for errors.
> But for the rest of us, using Regex is an exercise in frustration as
> we skip back and forth in the MSDN documentation trying to find just
[quoted text clipped - 3 lines]
> syntax of that grammar is pretty arcane IMHO, and has been very hard
> to learn, at least for me.
The concept of regular expressions is not that difficult. The most
common representation in today's languages is purely artificial. Other
representations and syntaxes are possible and do exist; for
Common Lisp there is a library called cl-ppcre implementing a
quite efficient regular expression engine (for some examples even
faster than the C engine) -- this engine understands the common
representation but also allows another syntax:
CL-USER> (ppcre::parse-string "^([\w\d_-])*$")
(:SEQUENCE :START-ANCHOR (:GREEDY-REPETITION 0 NIL (:REGISTER (:CHAR-CLASS #\w #\d #\_ #\-))) :END-ANCHOR)
It's quite a long representation, and maybe to some eyes even worse, but
it shows that other ways to notate a RegEx are quite possible.
> questions here asking how to write Regex expressions to do what I
> want
Maybe have a look at
http://weitz.de/regex-coach/
an IMHO quite useful tool to learn regular expressions and to
experiment with them.

Stefan.
Stefan Nobis - 27 Feb 2008 09:51 GMT
> CL-USER> (ppcre::parse-string "^([\w\d_-])*$")
> (:SEQUENCE :START-ANCHOR (:GREEDY-REPETITION 0 NIL (:REGISTER (:CHAR-CLASS #\w #\d #\_ #\-))) :END-ANCHOR)
Oops, bad example. The simple translator doesn't convert \w and
\d. Sorry. It should read more like this (to put everything except \w,
- and _ in the register):
(:SEQUENCE :START-ANCHOR
  (:GREEDY-REPETITION 0 NIL
    (:REGISTER
      (:INVERTED-CHAR-CLASS :WORD-CHAR-CLASS
                            #\_
                            #\-)))
  :END-ANCHOR)
The first two parameters to :GREEDY-REPETITION are the min and max
allowed number of repetitions (the above 0 NIL corresponds to the *,
something like (:GREEDY-REPETITION 3 5 ...) corresponds to
...{3,5}). The syntax #\_ is Common Lisp syntax for the single
character _.
Here is a handwritten example using the verbose syntax (I
don't have the perl-like version at hand, sorry):
(:sequence :start-anchor (:alternation #\# ";;;")
  (:positive-lookahead :word-char-class)
  (:register (:greedy-repetition 0 nil :word-char-class))
  (:positive-lookahead
    (:alternation :end-anchor
      (:sequence
        (:greedy-repetition 1 nil
          :whitespace-char-class)
        :non-whitespace-char-class)))
  (:greedy-repetition 0 1
    (:sequence
      (:greedy-repetition 1 nil :whitespace-char-class)
      (:register (:greedy-repetition 0 nil :everything)))))

Stefan.
> I usually avoid regexes because of performance. In this case I haven't tested,
> but would imagine the difference is approximately "who cares" ... nonetheless
> I just think of regexes as overkill in many situations where people try to
> use them.
Usually fewer lines of code is what is most cost effective overall.
A Regex is simple code (and if the reader knows regex as a general concept
it is even easy to read), and it is easy to modify for different
requirements.
It does come with a certain overhead. It may not be suited for
being called billions or trillions of times. But I doubt that was
the case here (the variable was named 'username').
Arne