How to read text files with unknown encoding?

I want to read several text files (e.g. CSV), but I don't know their encoding.

As the text files may contain special characters like umlauts, choosing the right encoding seems to be crucial.

new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));

I tried reading with ISO_8859_1, which did not handle the umlauts properly. So I tried UTF-8, which works.
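The umlaut problem is what you would expect if the files were saved as UTF-8. The small sketch below assumes that (the class and method names are made up for illustration). In UTF-8, "ü" is stored as the two bytes 0xC3 0xBC. ISO-8859-1 maps every single byte to one character, so it turns those two bytes into "Ã¼" instead of "ü".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautDemo {
    // Decode raw bytes with the given charset, like InputStreamReader does for a stream
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "Müller" encoded as UTF-8: the 'ü' becomes the two bytes 0xC3 0xBC (7 bytes total)
        byte[] utf8Bytes = "M\u00fcller".getBytes(StandardCharsets.UTF_8);

        // ISO-8859-1 reads 0xC3 as 'Ã' and 0xBC as '¼'
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1)); // MÃ¼ller
        // Decoding with the charset the bytes were written in restores the text
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8));      // Müller
    }
}
```

So "UTF-8 works" really means "these particular files were written as UTF-8". It says nothing about files produced by other tools.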

But I don't know whether this might cause problems with other files in the future, and I never know in advance which encoding a file uses.
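One reason such problems can go unnoticed is that an `InputStreamReader` built with a charset silently replaces malformed input with U+FFFD. A non-UTF-8 file then "reads" without any error. The sketch below makes failures loud instead. It passes a strict `CharsetDecoder` to `InputStreamReader`, so a file that is not valid UTF-8 throws a `CharacterCodingException`. The class and method names are illustrative, not from any library.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Reader that reports malformed UTF-8 instead of substituting U+FFFD
    static BufferedReader strictUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return new BufferedReader(new InputStreamReader(in, decoder));
    }

    // True if the whole stream decodes cleanly as UTF-8
    static boolean readsAsUtf8(InputStream in) {
        try (BufferedReader reader = strictUtf8Reader(in)) {
            while (reader.readLine() != null) {
                // consume every line; decoding errors surface here
            }
            return true;
        } catch (CharacterCodingException e) {
            return false; // malformed or unmappable byte sequence
        } catch (IOException e) {
            throw new UncheckedIOException(e); // a real I/O failure, not an encoding problem
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "M\u00fcller".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "M\u00fcller".getBytes(StandardCharsets.ISO_8859_1); // 'ü' is the single byte 0xFC
        System.out.println(readsAsUtf8(new ByteArrayInputStream(utf8)));   // true
        System.out.println(readsAsUtf8(new ByteArrayInputStream(latin1))); // false
    }
}
```

In real code you would pass `resource.getInputStream()` instead of a `ByteArrayInputStream`. A strict reader does not tell you the right encoding. It only makes sure you find out when your assumption is wrong.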

So what is the best way to read files whose encoding is unknown?

Answers


Strictly speaking, the other two answers are right: you have to know what the encoding is to guarantee anything. However, there are libraries that let you make educated guesses about the encoding. Check out ICU4J or jchardet, for example.
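If you don't want an extra library, here is a much simpler standard-library heuristic. It is not what ICU4J or jchardet do, and it is far less capable. It checks for a byte order mark (BOM) first, then accepts UTF-8 only if the bytes are well-formed UTF-8, and otherwise falls back to a single-byte charset. The fallback `ISO_8859_1` is an assumption you would adjust to your data; Windows-generated CSVs are often `windows-1252`. The names `EncodingGuess`, `guessCharset`, and `decode` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    // Assumed fallback for input that is not valid UTF-8; ISO-8859-1 accepts any byte sequence
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    static Charset guessCharset(byte[] bytes) {
        // A byte order mark identifies the encoding outright (UTF-32 BOMs are not handled here)
        if (bytes.length >= 3 && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB && (bytes[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xFF, b1 = bytes[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        // No BOM: accept UTF-8 only if every byte sequence is well-formed
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return FALLBACK;
        }
    }

    static String decode(byte[] bytes) {
        String text = new String(bytes, guessCharset(bytes));
        // These charsets keep a leading BOM as U+FEFF, so strip it
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        byte[] utf8 = "M\u00fcller".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "M\u00fcller".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessCharset(utf8));   // UTF-8
        System.out.println(guessCharset(latin1)); // ISO-8859-1
        System.out.println(decode(latin1));       // Müller
    }
}
```

For a file on disk you could call `decode(Files.readAllBytes(path))`, which means reading the whole file into memory. The heuristic works because legacy 8-bit text with umlauts is rarely valid UTF-8 by accident. Pure ASCII is valid in both, so it is reported as UTF-8 without any harm. The fallback itself is only a guess: the code cannot tell ISO-8859-1 from windows-1252 or other single-byte encodings, which is where a detector like ICU4J's is more useful.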

