Having some trouble with a regex that I hope someone can help me with.
The data I am processing looks as follows:
15 items per dataset. Most datasets are on only 1 line of text,
however on occasion a few text fields are multi-line making the
dataset span more than 1 line.
Each data item is surrounded by ", and the items are separated by a ;.
The last item however is not terminated with a ;.
There are no quotes within quotes.
So basically the whole thing looks like this:
"1";"2";"3";..."15"
The regex I came up with almost works the way I need it to, however on
occasion some data items are empty resulting in ""; and in that case
my regex just skips it and doesn't return a match. That of course
throws off the position of the next data items and everything goes all
bad...I would need it to return a 0 length string for those items.
Can anyone help me with what I need to modify to make this work? Here
is the current regex: [^\"]*[^\";]
I am very tempted to just go do it the old fashioned way manually, but
if I can get this regex to work, that would be nicer.
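For reference, the skipped-item behaviour described above can be reproduced with a short sketch. Python is only an assumption here (the post does not say which regex engine is in use), and the sample line is made up for illustration.

```python
import re

# The regex from the post (the backslashes before " are not needed in a raw string).
current = re.compile(r'[^"]*[^";]')

# Hypothetical sample line with an empty second item.
sample = '"1";"";"3"'

# Each match needs at least one character that is neither " nor ;,
# so the empty item "" produces no match at all.
items = current.findall(sample)
print(items)  # ['1', '3']
```

The empty item never shows up in the result, which is why every later item shifts one position to the left.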
Thanks all!
--
Stephan
2003 Yamaha R6
Chris Chilvers - 29 Apr 2006 20:43 GMT
Something like:
.*?"(.*?)".*?(?:;|$)
.*?" -- ignore any characters until we find an opening "
(.*?)" -- capture all characters until we find the closing speech mark
.*?(?:;|$) -- ignore any characters until we find a ; or the end of the line
That would match one item; you could call it repeatedly to get each match,
or make it:
(?:.*?"(.*?)".*?(?:;|$))*
to capture all the matches at once.
This is assuming you don't mind it accepting things like
aoeu "1"; "2" oaeuh; "3"
Which will find the values 1, 2, 3 and ignore the garbage outside the
quotes.
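A minimal sketch of the single-item pattern in use, assuming Python (the thread does not name an engine) and made-up sample lines. Python's re module keeps only the last repetition of a capturing group, so the sketch applies the single-item pattern repeatedly with finditer rather than using the repeated form. The helper name split_items is hypothetical.

```python
import re

# Single-item pattern from the reply above. re.DOTALL lets . cross line
# breaks, which multi-line text fields need (an assumption about the data).
item = re.compile(r'.*?"(.*?)".*?(?:;|$)', re.DOTALL)

def split_items(text):
    """Return the quoted values in order, keeping empty ones as ''."""
    return [m.group(1) for m in item.finditer(text)]

print(split_items('"1";"";"3"'))                # ['1', '', '3']
print(split_items('aoeu "1"; "2" oaeuh; "3"'))  # ['1', '2', '3']
print(split_items('"a";"two\nlines";"c"'))      # ['a', 'two\nlines', 'c']
```

The empty item comes back as a zero-length string, so the positions of the later items stay correct. A check such as len(split_items(text)) == 15 could then confirm that a dataset is complete.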
Kevin Spencer - 30 Apr 2006 13:23 GMT
Hi Stephan,
This may work for you:
"([\w]*)"
You didn't say, so I had to make some assumptions. First, I assumed that the
values in the items would be alphanumeric ("word" characters). I also
assumed that there would not be other content in the target string, which
does not conform to the pattern you laid down.
Basically, it works like this:
Find a quote, followed by zero or more word characters, followed by a quote.
Put the word characters into Group 1.
Under these conditions, it doesn't matter *what* delimits them, as anything
which doesn't match the pattern is eliminated. I tested it on the following:
"1";"2";"3";
"4";"5";"15";""
"..." " "";";";
ljkhl";"[]"; jhg"jkh"
Note that in the third line there are no delimiters, and a stray quote. It
found no values in the third line; the only thing that can match there is
the empty pair "". In the fourth line, however, it did match the
"jkh" at the end, because it satisfied the match condition.
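The behaviour on the well-formed test lines and on the fourth line can be sketched as follows, assuming Python for illustration. The second pattern, "([^"]*)", is a variant that is not from this post: it relies on the original statement that there are no quotes within quotes, and accepts any characters between the quotes, including spaces and line breaks. The last sample row is made up.

```python
import re

# Pattern from the post; [\w] is equivalent to plain \w.
word_item = re.compile(r'"([\w]*)"')

# Variant (not from the post): accept anything except a quote, so fields
# with spaces, punctuation or line breaks are kept too.
any_item = re.compile(r'"([^"]*)"')

print(word_item.findall('"1";"2";"3";'))           # ['1', '2', '3']
print(word_item.findall('"4";"5";"15";""'))        # ['4', '5', '15', '']
print(word_item.findall('ljkhl";"[]"; jhg"jkh"'))  # ['jkh']

# Hypothetical row with a space, an empty item and a multi-line field.
row = '"a b";"";"two\nlines"'
print(word_item.findall(row))  # ['']
print(any_item.findall(row))   # ['a b', '', 'two\nlines']
```

The variant is only safe on well-formed input. With stray quotes it can pair a closing quote with the next opening quote, which the \w version avoids by rejecting the delimiter characters.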

HTH,
Kevin Spencer
Microsoft MVP
Professional Numbskull
Hard work is a medication for which
there is no placebo.