How to remove HTML tags from Word content?

I know there are a couple of threads about this that suggest simply using

Regex.Replace(input, "<.*?>", String.Empty);

but I can't use it on text that has already been written to a Word document. My code looks like this:

Microsoft.Office.Interop.Word.Document wBelge = oWord.Documents.Add(ref oMissing,
    ref oMissing, ref oMissing, ref oMissing);
Microsoft.Office.Interop.Word.Paragraph paragraf2;
paragraf2 = wBelge.Paragraphs.Add(ref oMissing);
paragraf2.Range.Text = "some long text";

I can remove a single tag with find and replace, like this:

Word.Find findObject = oWord.Selection.Find;
findObject.ClearFormatting();
findObject.Text = "<strong>";
findObject.Replacement.Text = "";
findObject.Replacement.ClearFormatting();               

object replaceAllc = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAllc, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

Do I need to do this for every HTML tag?

Answers


Try the following:

Convert the text containing HTML markup to a plain string using

string unFormatted = paragraf2.ToString(SaveOptions.DisableFormatting);

and then replace the content of paragraf2 with the unFormatted string.


With some help from the comments, I arrived at the following working solution:

findObject.ClearFormatting();
findObject.Text = @"\<*\>";
findObject.MatchWildcards=true;                     
findObject.Replacement.ClearFormatting();
findObject.Replacement.Text = "";                       

object replaceAll = Word.WdReplace.wdReplaceAll;
findObject.Execute(ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
    ref replaceAll, ref oMissing, ref oMissing, ref oMissing, ref oMissing);

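This uses the search pattern \<*\>. Because the pattern contains the wildcard character *, findObject.MatchWildcards must be set to true. The backslashes are needed because < and > are themselves special characters in Word wildcard searches (start and end of a word), so they have to be escaped to match literal angle brackets.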
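
If the text is still available as a plain string before it is assigned to paragraf2.Range.Text, the tags could also be stripped at that point with the Regex.Replace call from the question, so that no Find/Replace pass over the document is needed. The following is only a minimal sketch under that assumption; the HtmlTagStripper class and StripTags method are hypothetical names introduced for illustration, and the pattern removes anything that looks like <...> rather than parsing HTML properly.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
static class HtmlTagStripper
{
    // Removes every non-greedy "<...>" match from the input.
    // Note: '.' does not match newlines by default, so a tag that spans
    // several lines is left alone unless RegexOptions.Singleline is added.
    public static string StripTags(string input)
    {
        return Regex.Replace(input, "<.*?>", String.Empty);
    }
}

static class Program
{
    static void Main()
    {
        string raw = "some <strong>long</strong> text";
        string clean = HtmlTagStripper.StripTags(raw);
        Console.WriteLine(clean); // prints "some long text"

        // In the question's code the cleaned string would then be written to Word:
        // paragraf2.Range.Text = clean;
    }
}
```

This only helps when the string passes through your own code first. For text that is already in the document, the wildcard replace shown above remains the way to go.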

