UTF-8 support in v2019

Happy holidays!

with recent changes( https://github.com/fluffos/fluffos/pull/544 and https://github.com/fluffos/fluffos/pull/550) merged in, FluffOS is on track to have full UTF-8 support.

What does this mean:

  1. FluffOS depends on ICU library, which is the most widely used and robust framework to handle unicode data.
  2. LPC compiler will only accept source code in UTF-8 encoding. If it is not, you will see compiler errors about invalid UTF-8 string.
  3. LPC string is stored internally using UTF-8 encoding, unlike old programming languages like java, javascript, fluffos doesn’t use UTF-16LE, we are more like rust.
  4. FluffOS,(unlike ldmud) fully supports extended grapheme clusters, which means strlen() returns EGC counts in the string, and substrings operations like str[0…1], will correctly slice at EGC boundaries, not codepoint boundary . This means full support for multi-codepoint emojis!
  5. for maximum backward compatibility reason string index operation like str[0] still works for single codepoint EGC, both as rvalue and lvalue, that means you can still write str[0] = ‘a’ , and it will do the right thing TM.
  6. Also there are fixes to sprintf to consider character width . which means padding and justification works as expected, treating wide characters to be 2 column and not 1.

There are still some rough edges to be worked out. If you see a case not covered in https://github.com/fluffos/fluffos/blob/master/testsuite/single/tests/compiler/utf8.c feel free to chime in!

Cheers and happy holiday.

Also, There are full support of input/output transparent transcoding, the default input/output encoding is no translation at all, which means UTF-8

set_encoding

set input/output encoding for current user.

query_encoding

Use for query the current encoding of the user.

And now we have