How to read textfiles with unknown encoding?
I want to read several text files (eg CSV), but I don't know the encoding.
As the textfiles may contain special chars like umlauts, chosing the right encoding seems to be crucial.
new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));
I tried reading with ISO_8859_1 which did not work propertly with umlauts encoded. So I tried UTF-8, which works.
But I don't know in future if this might also cause problems with different files. And I never now before reading a file in which encoding the file is.
So how should I best read files with encoding unknown?
Strictly speaking the other two answers are right - you just have to know what the encoding is to be guaranteed of anything. However, there are libraries out there that will allow you to make educated guesses about the encoding. Check out ICU4J or jchardet, for example.