Given a string that represents a sentence, is there a quick way of extracting
the words that make up the sentence? For example, given a string "See Dick
run." it would return an array of strings {"See","Dick","run"}. One catch is
that it has to be locale-sensitive.
I thought about just removing all the punctuation, but that leaves problems
like the word "don't". I don't want to end up with two words, "don" and "t".
Bill Priess (MCP) - 23 Aug 2005 22:47 GMT
Hi David,
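Use regular expressions:
// Requires: using System; using System.Text.RegularExpressions;
// \w+ (rather than \w*) avoids empty matches at word boundaries.
Regex re = new Regex(@"\b\w+\b");
MatchCollection mc = re.Matches("See Dick Run");
foreach (Match m in mc)
{
    Console.WriteLine(m.Value);
}
(I didn't test the code, but it *should* be mostly correct.)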
HTH,
Bill Priess MCP
Nick Hertl - 23 Aug 2005 23:06 GMT
You could also use System.String.Split(), passing the ' ' char as the
argument.
Something like this:
string s = "See Dick Run";
string[] a = s.Split(' ');
David - 24 Aug 2005 14:05 GMT
Thanks for the help, Bill and Nick.
I tried Split(), but it had some issues. In particular, given "See Dick
run." the third word that it returned would be "run.", where what I wanted
was "run". (The period isn't part of the word.)
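One way around the trailing-period problem is to strip punctuation from the ends of each token after splitting. The following is only a minimal sketch: the class and method names (SplitAndTrim.Words) are hypothetical, and it assumes words are separated by whitespace, which does not hold for every language. Punctuation inside a token, such as the apostrophe in "don't", is left alone.

```csharp
using System;
using System.Collections.Generic;

public static class SplitAndTrim
{
    // Hypothetical helper: split on whitespace, then strip
    // punctuation from both ends of each token.
    public static string[] Words(string sentence)
    {
        var words = new List<string>();
        // A null separator array makes Split break on any whitespace.
        string[] tokens = sentence.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            int start = 0, end = token.Length;
            while (start < end && char.IsPunctuation(token[start])) start++;
            while (end > start && char.IsPunctuation(token[end - 1])) end--;
            // Skip tokens that were nothing but punctuation.
            if (end > start) words.Add(token.Substring(start, end - start));
        }
        return words.ToArray();
    }
}
```

Calling SplitAndTrim.Words("See Dick run.") should give the three words without the period.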
The function I'm looking for would actually have to be quite sophisticated,
because the application allows the user to select a language, and so the
function would have to work with whatever rules were appropriate for the
culture of the string. I can come up with the correct rules for English and
French, but after that, I get a bit lost.
I was hoping that maybe the wizards who did all this .NET Framework
globalization stuff had already tackled the problem. It was a long shot, but
sometimes I am amazed at what is buried in there, so I was hoping.
I think I'll have to use the regular expression route, which I hadn't
thought of before, and count on user feedback to improve the situation for
languages that don't seem to work.
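As a starting point for the regular expression route, here is a rough sketch that uses Unicode categories instead of \w, so that accented letters count as word characters and an apostrophe between letters stays inside the word. The names WordExtractor and Words are hypothetical, and the pattern is an assumption about what counts as a word, not a locale-aware rule set. It will not segment languages that are written without spaces between words.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordExtractor
{
    // Runs of letters, combining marks, and digits, optionally joined
    // by an internal apostrophe (straight ' or curly U+2019).
    private static readonly Regex WordPattern =
        new Regex(@"[\p{L}\p{M}\p{N}]+(?:['\u2019][\p{L}\p{M}\p{N}]+)*");

    // Hypothetical helper: returns every match of WordPattern, in order.
    public static string[] Words(string sentence)
    {
        var words = new List<string>();
        foreach (Match m in WordPattern.Matches(sentence))
        {
            words.Add(m.Value);
        }
        return words.ToArray();
    }
}
```

With this pattern, "I don't run." should yield "I", "don't", and "run", and a French elision such as "l'été" stays a single token. Whether that is the right behavior for French is a design choice that user feedback would have to settle.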