How do I read a text file having Unicode codes?

I initialize a string using the following code.

  std::string unicode8String = "\u00C1 M\u00F3ti S\u00F3l";

Printing it using cout, the output is Á Móti Sól.

But when I read the same string from a text file using ifstream, store it in a std::string, and print it, the output is \u00C1 M\u00F3ti S\u00F3l.

The content of my file is \u00C1 M\u00F3ti S\u00F3l and I want to print it as Á Móti Sól. Is there any way to do this?

Answers


Off the top of my head (completely untested)

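#include <cctype>
#include <cstddef>
#include <string>

// Numeric value (0-15) of a single hex digit; the caller has already
// checked the character with isxdigit.
static int hex_value(unsigned char c)
{
    return std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
}

// Replaces each \u00XX escape with the single byte 0xXX. That byte is the
// Latin-1 encoding of the character, not UTF-8.
std::string convert_string(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); )
    {
        if (i + 5 < in.size() && in[i] == '\\' && in[i+1] == 'u' &&
            in[i+2] == '0' && in[i+3] == '0' &&
            std::isxdigit((unsigned char)in[i+4]) &&
            std::isxdigit((unsigned char)in[i+5]))
        {
            // Combine the two hex digits into one byte value.
            out += (char)(16 * hex_value(in[i+4]) + hex_value(in[i+5]));
            i += 6;
        }
        else
        {
            out += in[i];
            ++i;
        }
    }
    return out;
}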

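But this won't work with any Unicode value above 255 (e.g. \u1234), because of a fundamental problem: each element of your string is an 8-bit char, while a Unicode code point can need up to 21 bits (the highest is U+10FFFF). To go further you have to store each character in a multi-byte encoding such as UTF-8, or use a wider character type.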

As I said, completely untested, but I'm sure you get the idea.
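One way around the 8-bit limit is to keep the std::string but store each character as UTF-8 bytes. The following is a minimal sketch, assuming the file only contains four-digit \uXXXX escapes and that the terminal expects UTF-8. The names append_utf8 and unescape_to_utf8 are made up for this illustration, and surrogate pairs (two escapes forming one character above U+FFFF) are not combined.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of a code point in the range U+0000..U+FFFF.
void append_utf8(std::string& out, unsigned cp)
{
    if (cp < 0x80) {                 // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                         // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: replace every \uXXXX escape (exactly four hex digits)
// with its UTF-8 bytes; everything else is copied unchanged.
std::string unescape_to_utf8(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ) {
        // An escape needs six characters: backslash, 'u', four hex digits.
        bool is_escape = i + 5 < in.size() && in[i] == '\\' && in[i + 1] == 'u';
        for (std::size_t k = 2; is_escape && k < 6; ++k)
            is_escape = std::isxdigit(static_cast<unsigned char>(in[i + k])) != 0;

        if (is_escape) {
            unsigned cp = static_cast<unsigned>(
                std::stoul(in.substr(i + 2, 4), nullptr, 16));
            append_utf8(out, cp);
            i += 6;
        } else {
            out += in[i++];
        }
    }
    return out;
}
```

Each line read with std::getline can be passed through unescape_to_utf8 before it is sent to std::cout. Incomplete escapes such as a trailing \u00 are left as they are.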


Can you try printing using std::wcout?


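Unicode characters have a different representation in a text file (there is no \u). The \u escapes are only understood by the compiler inside string literals, so a file written by the program contains the encoded bytes of each character, not the escape text.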
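To make the difference concrete, here is a small sketch comparing the two strings. The function names are made up for this illustration, and the comment about the literal assumes the compiler encodes narrow string literals as UTF-8 (the default for GCC and Clang).

```cpp
#include <string>

// Written as a universal-character-name: the compiler replaces \u00C1 with
// the encoded character at compile time (two bytes, assuming UTF-8 literals).
inline std::string compiled_literal() { return "\u00C1"; }

// Six ordinary characters: backslash, 'u', '0', '0', 'C', '1'. This is what
// std::getline returns when the file itself contains the escape text.
inline std::string text_from_file() { return "\\u00C1"; }
```

The first string prints as Á on a UTF-8 terminal. The second prints as \u00C1, because nothing at run time interprets the backslash sequence.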

For Evaluation

#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    // Write: the compiler has already turned the \u escapes into encoded bytes
    {
        std::string s = "\u00C1 M\u00F3ti S\u00F3l";
        std::ofstream out("/tmp/test.txt");
        out << s;
    }
    // Read Text
    {
        std::string s;
        std::ifstream in("/tmp/test.txt");
        std::getline(in, s);
        std::cout << "Result: " << s << std::endl;
    }
    // Read Binary: dump every byte of the file in hex
    {
        std::ifstream in("/tmp/test.txt", std::ios::binary);
        in.unsetf(std::ios_base::skipws);
        std::istream_iterator<unsigned char> first(in);
        std::istream_iterator<unsigned char> last;
        std::vector<unsigned char> v(first, last);
        std::cout << "Result: ";
        for(unsigned c: v) std::cout << std::hex << c << ' ';
        std::cout << std::endl;
    }
    return 0;
}

On Linux with UTF-8:

Result: Á Móti Sól
Result: c3 81 20 4d c3 b3 74 69 20 53 c3 b3 6c

