.NET Forum / .NET Framework / General / March 2008


Regex Help

Christopher Robin - 29 Mar 2008 01:58 GMT
I'm inserting a SharePoint List into a SQL Database, but some of the text has
oddly formed HTML tags.  I want to remove these tags with a regular
expression, but I'm having some difficulty.  My code is below.

Imports System
Imports System.Net
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Runtime
Imports System.Xml
Imports SharePointServices
Imports SharePointServices.NorthwindSync
Imports System.Text.RegularExpressions
Imports System.IO

Public Class ScriptMain

   Public Sub Main()

       Dim DocLoc As String
       Dim TextDoc As TextWriter
       Dim listService As New Lists()
       Dim node As XmlNode
       Dim strHtmlString As String
       Dim pattern As String = "<[/]?(font|span|div|del|ins|color:\w+)[^>]*?"

       DocLoc = "\\MYSERVER\MyFolder\MyFile.xml"

       listService.PreAuthenticate = True
       listService.Credentials = CredentialCache.DefaultNetworkCredentials

       Try

           node = ListHelper.GetAllListItems(listService, "My List Name")
           strHtmlString = node.InnerXml()
           Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()

           TextDoc = File.CreateText(DocLoc)
           TextDoc.WriteLine(strHtmlString)
           TextDoc.Flush()
           TextDoc.Close()

       Catch ex As Exception

           'Raise the error again and set the result to failure.
           Dts.Events.FireError(1, ex.TargetSite.ToString(), ex.Message, "", 0)
           Dts.TaskResult = Dts.Results.Failure

       End Try

       Dts.TaskResult = Dts.Results.Success

   End Sub

End Class

And here are a few samples of what I'm trying to remove with the Regex.

"<div></div>"
"<font size=2 color="#1F497D">"
"</font><br>&nbsp;"

Any help would be greatly appreciated.

Thanks,
Chris
Jesse Houwing - 31 Mar 2008 20:02 GMT
Hello Christopher,

What is it that isn't working right now? It looks like you're nearly there.

Your pattern isn't quite how I'd write it. If the pattern is what's
currently bothering you, try the following:

</?(?:font|span|div|more tags here)[^>]*>

And there seems to be a little error in your code: Regex.Replace doesn't
alter the original string (strings are immutable in .NET), but it returns
a new string instead, so the following code needs to be changed:

strHtmlString = node.InnerXml()
strHtmlString = Regex.Replace(strHtmlString, pattern, String.Empty, RegexOptions.IgnoreCase).Trim()

If that doesn't work, then please explain what it is that isn't working :).
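
As a minimal sketch of the two points above, the following console program applies the suggested pattern to the three samples from the original post and assigns the return value of Regex.Replace. The module name StripTagsDemo, the helper StripTags, and the exact tag list (font, span, div, del, ins) are assumptions made for illustration, not part of the original package.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StripTagsDemo

    ' Assumed tag list for illustration; extend the alternation as needed.
    Private Const Pattern As String = "</?(?:font|span|div|del|ins)[^>]*>"

    ' Hypothetical helper: returns a copy of html with the listed tags removed.
    Function StripTags(ByVal html As String) As String
        ' Regex.Replace returns a new string; the input string is not modified,
        ' so the result has to be returned or assigned.
        Return Regex.Replace(html, Pattern, String.Empty, RegexOptions.IgnoreCase).Trim()
    End Function

    Sub Main()
        ' The three samples quoted in the original post.
        Dim samples As String() = { _
            "<div></div>", _
            "<font size=2 color=""#1F497D"">", _
            "</font><br>&nbsp;"}

        For Each sample As String In samples
            Console.WriteLine("[{0}] -> [{1}]", sample, StripTags(sample))
        Next
    End Sub

End Module
```

With this tag list, <br> and &nbsp; are left in place because neither is covered by the alternation. The pattern also matches any tag whose name merely starts with one of the listed names; adding \b after the group is one way to tighten it.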

Jesse

--
Jesse Houwing
jesse.houwing at sogeti.n
