UTF-8 string size in bytes

I need to determine the length of a UTF-8 string in bytes in C. How do I do it correctly? As far as I know, the terminating character in UTF-8 is 1 byte in size. Can I use the strlen function for this?

Answers


Can I use strlen function for this?

Yes, strlen gives you the number of bytes before the first '\0' character, so

strlen(utf8) + 1

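is the number of bytes in utf8 including the 0-terminator. This works because no character other than '\0' contains a 0 byte in UTF-8, so strlen cannot stop early in the middle of a multi-byte character. Note that the result is a byte count, not a character count.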

Of course, that only works if utf8 is actually UTF-8 encoded; otherwise you need to convert it to UTF-8 first.
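
To make the difference between bytes and characters concrete, here is a minimal sketch. The function names utf8_size_with_terminator and utf8_code_points are made up for this illustration and are not part of any standard library, and the code-point counter assumes its input is already valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store s, including the
 * terminating '\0'. This is just strlen(s) + 1. */
size_t utf8_size_with_terminator(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper: number of code points in s.
 * Assumes s is valid UTF-8. Continuation bytes have the bit pattern
 * 10xxxxxx, so every byte that is NOT of that form starts a new
 * code point. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" ("héllo", where é is encoded as the two bytes C3 A9), strlen returns 6, utf8_size_with_terminator returns 7, and utf8_code_points returns 5. Use the byte count when allocating or copying buffers; the code-point count only matters if you need the number of characters.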

