UTF-8 string size in bytes

I need to determine the length of a UTF-8 string in bytes in C. How do I do it correctly? As far as I know, the terminating character in UTF-8 is 1 byte in size. Can I use the strlen function for this?

Answers


Can I use strlen function for this?

Yes, strlen gives you the number of bytes before the first '\0' character, so

strlen(utf8) + 1

is the number of bytes in utf8 including the 0-terminator, since no character other than '\0' contains a 0 byte in UTF-8.

Of course, that only works if utf8 is actually UTF-8 encoded; otherwise, you need to convert it to UTF-8 first.
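
To make the difference between bytes and characters concrete, here is a minimal sketch. The helper names utf8_size_with_terminator and utf8_code_points are hypothetical (they are not standard library functions), and the input is assumed to be valid, NUL-terminated UTF-8. The first helper is just strlen plus one; the second counts code points by skipping continuation bytes, which always have the bit pattern 10xxxxxx.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed to store s, including the '\0' terminator. */
size_t utf8_size_with_terminator(const char *s)
{
    return strlen(s) + 1;
}

/* Hypothetical helper: number of code points in s.
 * Assumes s is valid UTF-8. Every byte that is NOT a continuation
 * byte (10xxxxxx) starts a new code point, so we count those. */
size_t utf8_code_points(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (the word "héllo", where é is encoded as the two bytes C3 A9), strlen returns 6, utf8_size_with_terminator returns 7, and utf8_code_points returns 5. So strlen is the right tool for the size in bytes, but not for the number of characters.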

