Issues decoding strings from Xml

I have been given a large number of XML files, and I need to pull out parts of the text elements and reuse them for other purposes. (I am using XDocument to read the XML data.)

But, how do I decode the text contained in the elements? What is even the formatting used here? A few examples:

"What is the meaning of this® asks Sonny."
"The big centre cost 1&amp;#190; million pounds"
"... lost it. ® The next ..."

I have tried HttpUtility.HtmlDecode, but that did not do the trick. If I decode twice, the "&amp;#174;" turns into a ®, which is obviously not right.

Looks like ® are line breaks. The ® are probably question marks. The 190 one, I don't even know. Perhaps a dot or comma?

Any ideas would be welcome.

Answers


It does appear that the strings you show have been HTML encoded, and then XML encoded (or HTML again).

It is correct that &amp;#174; -> &#174; -> ® (the registered trademark symbol) per the ISO Latin-1 entities. &amp;reg; should behave the same way.

Similarly, &amp;#190; would turn into ¾, the fraction three quarters.
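
To illustrate the two decoding layers, here is a minimal sketch, assuming the text really was entity-encoded twice as described above. The class name DoubleDecodeDemo, the helper DecodeText, and the sample XML string are hypothetical and only there for illustration. It uses System.Net.WebUtility.HtmlDecode instead of HttpUtility.HtmlDecode so that no reference to System.Web is needed; both decode numeric and named HTML entities.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

public static class DoubleDecodeDemo
{
    // Removes one layer of HTML entity encoding from a string.
    public static string DecodeText(string value) => WebUtility.HtmlDecode(value);

    public static void Main()
    {
        // Hypothetical sample: the text was entity-encoded twice before being stored.
        const string xml = "<item>The big centre cost 1&amp;#190; million pounds</item>";

        // The XML parser undoes the outer layer: &amp;#190; becomes &#190;
        string once = XDocument.Parse(xml).Root.Value;

        // HtmlDecode undoes the inner layer: &#190; becomes the three-quarters character
        string twice = DecodeText(once);

        Console.WriteLine(once);
        Console.WriteLine(twice);
    }
}
```

Since XDocument already removes the first layer when it parses the file, a single HtmlDecode call on the element value should be enough. If the decoded characters still look wrong in context, the problem is more likely in how the files were produced than in the decoding.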

