Happy holidays!
With recent changes (https://github.com/fluffos/fluffos/pull/544 and https://github.com/fluffos/fluffos/pull/550) merged in, FluffOS is on track to have full UTF-8 support.
What this means:
- FluffOS now depends on the ICU library, the most widely used and robust framework for handling Unicode data.
- The LPC compiler will only accept source code encoded in UTF-8. If a file is not valid UTF-8, you will see compiler errors about an invalid UTF-8 string.
- LPC strings are stored internally as UTF-8. Unlike older languages such as Java and JavaScript, FluffOS does not use UTF-16; in this respect we are closer to Rust.
- FluffOS (unlike ldmud) fully supports extended grapheme clusters (EGCs). This means strlen() returns the number of EGCs in the string, and substring operations like str[0..1] slice correctly at EGC boundaries, not codepoint boundaries. This means full support for multi-codepoint emojis!
- For maximum backward compatibility, string index operations like str[0] still work for single-codepoint EGCs, both as an rvalue and as an lvalue. That means you can still write str[0] = 'a', and it will do the right thing (TM).
- sprintf has also been fixed to take character width into account, so padding and justification work as expected, treating wide characters as 2 columns rather than 1.
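
For anyone unsure why bytes, codepoints, and display columns need to be treated separately, here is a small sketch of the distinction. It is written in Python rather than LPC and is not FluffOS code; the helper names (is_valid_utf8, display_width, pad_right) are made up for illustration, and the width rule is a simplification that looks at one codepoint at a time, so it does not handle ZWJ emoji sequences the way a real EGC-aware implementation would.

```python
import unicodedata


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def display_width(text: str) -> int:
    """Approximate terminal columns, one codepoint at a time.

    Wide/fullwidth codepoints count as 2 columns, combining marks as 0,
    everything else as 1. This is a simplification, not what ICU does.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def pad_right(text: str, columns: int) -> str:
    """Left-justify text in a field measured in display columns."""
    return text + " " * max(0, columns - display_width(text))


# One family emoji: man + ZWJ + woman + ZWJ + girl.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

byte_count = len(family.encode("utf-8"))  # 18 bytes
codepoint_count = len(family)             # 5 codepoints, yet one visible glyph

# Two wide characters occupy 4 columns, so only 2 spaces of padding are needed.
padded = pad_right("中文", 6)
```

A reader sees the family emoji as a single character even though it is 5 codepoints and 18 bytes, which is the gap that EGC-based strlen() and slicing are meant to close. The padding helper shows the same idea for column alignment: counting codepoints alone would over-pad wide characters.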
There are still some rough edges to be worked out. If you see a case not covered in https://github.com/fluffos/fluffos/blob/master/testsuite/single/tests/compiler/utf8.c, feel free to chime in!
Cheers and happy holidays.