.NET Forum / .NET Framework / New Users / February 2007


Help with Regex.replace

maheshvd@gmail.com - 07 Feb 2007 00:21 GMT
Hi Group,

I have an HTML document with all sorts of HTML tags. I need to provide
a search-and-replace feature for text in HTML documents. The user can
enter any phrase to search for and any phrase to replace it with. When
searching, I strip all HTML tags from the document and then search.
The user can then select the document(s) in which to replace the text.
The problem is the replacement: how do I replace the found string with
the new one?
For example, the HTML document may contain:

<li>This is a test document</li> All the  <b>articles</b> here are
written for general public. <strong>Tip: <strong>If you do not find
desired articles, please mail <SPAN id="test" style="FONT-WEIGHT:
bold; COLOR: #ff0000">developer@test.com</SPAN >

The user may want to find
"All the articles here"
and replace with
"all the documents here".

The resulting document would be:
<li>This is a test document</li> All the  documents here are written
for general public. <strong>Tip: <strong>If you do not find desired
articles, please mail <SPAN id="test" style="FONT-WEIGHT: bold; COLOR:
#ff0000">developer@test.com</SPAN >

So, when replacing the string, can I somehow ignore the HTML tags and
still perform the replacement? The rest of the HTML tags must be kept
in the HTML document.
Any thoughts will be appreciated.

Regards,
dev
Alexey Smirnov - 10 Feb 2007 22:53 GMT

string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

string searchPattern = searchTxt.replace(" ","(.*?)");
string replaceString = replaceTxt;

int i = 0;

while (replaceString.indexOf(" ") > -1) {
i+=1;
replaceString = Regex.Replace(" ", "$" + i.toString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchTxt, replaceString);
Alexey Smirnov - 10 Feb 2007 23:46 GMT

A silly typo, sorry:

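using System.Text.RegularExpressions;

string sourceTxt = "....";

string searchTxt = "All the articles here";
string replaceTxt = "all the documents here";

// "All the articles here" -> "All(.*?)the(.*?)articles(.*?)here"
string searchPattern = searchTxt.Replace(" ", "(.*?)");
string replaceString = replaceTxt;

int i = 0;

// Replace the spaces in the replacement text, one at a time, with $1, $2, ...
// "all the documents here" -> "all$1the$2documents$3here"
Regex r = new Regex(@"\s");
while (replaceString.IndexOf(" ") > -1)
{
    i += 1;
    replaceString = r.Replace(replaceString, "$" + i.ToString(), 1);
}

string finalTxt = Regex.Replace(sourceTxt, searchPattern, replaceString);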
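For anyone who wants to run it, here is a minimal sketch of the same approach packaged as a method; the class name `LazyGapReplace` and the method `Apply` are hypothetical. It builds the same search pattern (each space becomes `(.*?)`) and the same kind of replacement, but writes the group references as `${1}`, `${2}` rather than `$1`, `$2`, on the assumption that a replacement word might start with a digit and should not be read as part of the group number.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the "(.*?) between words" approach above.
public static class LazyGapReplace
{
    public static string Apply(string source, string search, string replace)
    {
        // "All the articles here" -> "All(.*?)the(.*?)articles(.*?)here".
        // As in the post, the search words are not escaped.
        string pattern = search.Replace(" ", "(.*?)");

        // "all the documents here" -> "all${1}the${2}documents${3}here".
        string[] words = replace.Split(' ');
        var sb = new StringBuilder(words[0]);
        for (int i = 1; i < words.Length; i++)
        {
            sb.Append("${").Append(i).Append('}').Append(words[i]);
        }

        return Regex.Replace(source, pattern, sb.ToString());
    }
}
```

With the HTML sample from the first post, whatever sits between the words (here `  <b>` and `</b> `) is captured and copied into the output, so the tags survive. According to the .NET substitution rules, a reference to a group the pattern does not define is copied into the result as literal text, which is where stray `$3`, `$4`-style tokens come from when the replacement has more words than the search phrase.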
maheshvd@gmail.com - 13 Feb 2007 18:31 GMT
Hey Alexey,
Thanks a ton. That's a great solution.
There is a small hitch, though. If the replacement string has more words
than the search string, the replacement carries extra $3, $4 references.
I'm counting the words in both strings, and whatever remains goes
into the last replacement.
I hope this is the right way.
Regards,
Mahesh
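The word-count fix described above might look roughly like this. It is a sketch that assumes the search pattern has exactly one capturing group per gap between search words, as the `(.*?)` pattern does; `ReplacementBuilder.Build` is a hypothetical name. Replacement words beyond the last gap are joined with plain spaces. If the replacement has fewer words than the search phrase, the unused gaps (and any tags captured in them) are dropped.

```csharp
using System.Text;

// Hypothetical helper: builds a replacement string for a pattern that has
// one capturing group per gap between the search words.
public static class ReplacementBuilder
{
    public static string Build(int searchWordCount, string replace)
    {
        string[] words = replace.Split(' ');
        int gaps = searchWordCount - 1; // capturing groups in the pattern
        var sb = new StringBuilder(words[0]);
        for (int i = 1; i < words.Length; i++)
        {
            if (i <= gaps)
                sb.Append("${").Append(i).Append('}'); // reuse the captured gap (tags, spaces)
            else
                sb.Append(' '); // no gap left: separate extra words with a plain space
            sb.Append(words[i]);
        }
        return sb.ToString();
    }
}
```

For example, `Build(2, "x y z")` gives `x${1}y z`, so replacing `a(.*?)b` in `a <i>b</i>` yields `x <i>y z</i>` instead of leaving a stray `${2}`.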
Alexey Smirnov - 14 Feb 2007 22:21 GMT

Yup, it could be a problem. Maybe we have to look for a better
approach.
maheshvd@gmail.com - 15 Feb 2007 02:14 GMT

Moreover, (.*?) will not only skip HTML tags; it may skip whole
sentences. For example, if I have something like
"This is a test where we need to replace words. Also test words"
and I search for "test words" and try to replace it with "test
sentences", it will replace in two places, because in the first sentence
"test" and "words" are separated by many other words, which (.*?) also
skips. Is there any way to say: skip what lies between the words only
if it is an HTML tag?
Thanks for all the help. I desperately need a solution to this.
Mahesh
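The false match is easy to reproduce. This minimal sketch (the class name `LazyGapProblem` is hypothetical) applies the pattern that the space-to-`(.*?)` approach builds for "test words":

```csharp
using System.Text.RegularExpressions;

public static class LazyGapProblem
{
    public static string Demo()
    {
        string text = "This is a test where we need to replace words. Also test words";
        // (.*?) accepts any characters, including ordinary words,
        // so the first match runs from the first "test" to the first "words".
        return Regex.Replace(text, "test(.*?)words", "test${1}sentences");
    }
}
```

Both occurrences change: in the first sentence, "words" becomes "sentences" even though it is not part of the phrase "test words".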
Alexey Smirnov - 15 Feb 2007 07:45 GMT

Sure, there is a way to do that.

Use this pattern:

test(((<[^>]*>)|\s)*?)words

It will skip HTML tags and spaces between words.
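Putting the pieces together, here is a sketch of a complete replace built around this pattern. Beyond what the thread states, it assumes a few changes: each search word is passed through `Regex.Escape`; the inner groups are made non-capturing with `(?:...)` so the gaps are numbered `${1}`, `${2}`, ... (with the pattern as posted, the gap groups for a longer phrase would be numbered 1, 4, 7, ...); `$` in the replacement text is escaped as `$$`; and extra replacement words are appended with plain spaces, as Mahesh describes earlier in the thread. `HtmlAwareReplace` is a hypothetical name.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class HtmlAwareReplace
{
    // Between two search words, lazily match only HTML tags and whitespace.
    // (?:...) keeps the alternation from creating extra numbered groups.
    private const string Gap = @"((?:<[^>]*>|\s)*?)";

    public static string Replace(string html, string search, string replace)
    {
        string[] searchWords = search.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] replaceWords = replace.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (searchWords.Length == 0)
            return html; // nothing to search for

        // Escape each word so characters like '.', '?' or '(' match literally.
        string pattern = string.Join(Gap, searchWords.Select(w => Regex.Escape(w)));

        // Reuse the captured gaps; words beyond the last gap get a plain space.
        int gaps = searchWords.Length - 1;
        var sb = new StringBuilder();
        for (int i = 0; i < replaceWords.Length; i++)
        {
            if (i > 0)
                sb.Append(i <= gaps ? "${" + i + "}" : " ");
            // "$$" is a literal '$' in a .NET replacement string.
            sb.Append(replaceWords[i].Replace("$", "$$"));
        }

        return Regex.Replace(html, pattern, sb.ToString());
    }
}
```

On the sentence example, only "Also test words" is rewritten, because "where we need to replace" is neither a tag nor whitespace. Note that `*?` also allows an empty gap, so, as with the posted pattern, "testwords" would match "test words" too.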
maheshvd@gmail.com - 22 Feb 2007 00:51 GMT
Yes, thats exactly what I was looking for. I tested it with few
strings, working fine. I'll test it thoroughly.
Thanks a ton.
