Writing a string to the output stream of a socket
I am working on a project that uses a client-server architecture. Messages transferred between the clients and the server are combinations of strings and byte arrays. I need to send the size of the whole message beforehand.
It is trivial to find the byte size of a byte array, but not of a string. Obviously, I could convert the strings to byte arrays (taking the encoding into account), but they can be long and I don't want to allocate memory for copies of them (e.g. getBytes() allocates a new array).
My question is, what is the most memory efficient way of doing the following?
- Find byte size of a string (using UTF-8 encoding)
- Write that size to the output stream
- Write the string to the output stream
Iterate over the string one code point at a time. Call codePointAt() at each position to get the Unicode code point, then advance by Character.charCount() so that a surrogate pair is counted once rather than twice. From the code point you can deduce how many bytes it needs when encoded in UTF-8:
Code point range | UTF-8 bytes
-----------------------------
0 - 127          | 1
128 - 2047       | 2
2048 - 65535     | 3
65536 +          | 4
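The table can be turned into a counting loop. The following is only a sketch: the class name Utf8Framing, the 4-byte length prefix and the chunk size are assumptions made for illustration, not part of the question. It also assumes the string is well formed. An unpaired surrogate would be counted as 3 bytes here, whereas Java's encoder substitutes a replacement for it, so the prefix could then disagree with the bytes actually written. The string is handed to the writer in slices because the writer may copy the chars it is given.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class Utf8Framing {
    // Hypothetical chunk size: number of chars handed to the encoder per call.
    private static final int CHUNK_CHARS = 4096;

    /** Counts the bytes s needs in UTF-8 without building a byte array. */
    static long utf8Length(CharSequence s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            if (cp < 0x80) {
                bytes += 1;          // 0 - 127
            } else if (cp < 0x800) {
                bytes += 2;          // 128 - 2047
            } else if (cp < 0x10000) {
                bytes += 3;          // 2048 - 65535
            } else {
                bytes += 4;          // 65536 and above
            }
            i += Character.charCount(cp); // skip both halves of a surrogate pair
        }
        return bytes;
    }

    /** Writes an assumed frame: 4-byte big-endian length, then the UTF-8 bytes. */
    static void writeFramed(OutputStream out, String s) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(Math.toIntExact(utf8Length(s))); // throws if it does not fit in an int
        Writer w = new OutputStreamWriter(data, StandardCharsets.UTF_8);
        for (int off = 0; off < s.length(); off += CHUNK_CHARS) {
            w.write(s, off, Math.min(CHUNK_CHARS, s.length() - off));
        }
        w.flush(); // push the encoder's buffered bytes to the stream; does not close it
    }
}
```

With a socket, this would be called as Utf8Framing.writeFramed(socket.getOutputStream(), message). Wrapping the socket stream in a BufferedOutputStream first is probably worthwhile either way.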
But before you do that, you should first check whether this is really necessary. It is quite likely that a String passed to a socket is internally copied to a byte array anyway.
If the size on the wire is not a critical issue, use the UTF-16BE encoding for strings. In that case the byte size is simply the string's length() * 2.
In this mode you can write the Java chars one by one, with no additional processing (Unicode high/low surrogates etc.).
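A minimal sketch of this variant, assuming the same kind of 4-byte length prefix (the class name Utf16Framing and the frame layout are made up for illustration). DataOutputStream.writeChars() writes each char as two bytes, high byte first, which matches UTF-16BE. The BufferedOutputStream is there only so that the socket does not see a separate write for every char.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class Utf16Framing {
    /** Writes an assumed frame: 4-byte big-endian length, then the chars as UTF-16BE. */
    static void writeFramedUtf16(OutputStream out, String s) throws IOException {
        DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
        // Two bytes per Java char; multiplyExact throws if the size overflows an int.
        data.writeInt(Math.multiplyExact(s.length(), 2));
        data.writeChars(s); // each char as 2 bytes, high byte first
        data.flush();       // flush the buffer without closing the underlying stream
    }
}
```

The trade-off is size: text that is mostly ASCII takes roughly twice as many bytes as it would in UTF-8.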