ase/unicode.hh file


Namespaces

namespace Ase
The Anklang C++ API namespace.

Functions

auto decodefs(const std::string& utf8str) →  std::string
Decode UTF-8 string back into file system path representation, extracting surrogate code points as bytes.
auto displayfs(const std::string& utf8str) →  std::string
Convert UTF-8 encoded file system path into human readable display format, the conversion is lossy but readable.
auto encodefs(const std::string& fschars) →  std::string
Encode a file system path consisting of bytes into UTF-8, using surrogate code points to store non UTF-8 bytes.
auto string_is_ncname(const String& input) →  bool
auto string_to_ncname(const String& input, uint32_t substitute) →  String
auto unicode_is_assigned(uint32_t u) →  bool constexpr
Return whether u matches any of the assigned Unicode planes.
auto unicode_is_character(uint32_t u) →  bool constexpr
Return whether u is not one of the 66 Unicode noncharacters.
auto unicode_is_control_code(uint32_t u) →  bool constexpr
Return whether u is one of the 65 Unicode control codes.
auto unicode_is_noncharacter(uint32_t u) →  bool constexpr
Return whether u is one of the 66 Unicode noncharacters.
auto unicode_is_private(uint32_t u) →  bool constexpr
Return whether u is in one of the 3 private use areas of Unicode.
auto unicode_is_valid(uint32_t u) →  bool constexpr
Return whether u is an allowed Unicode codepoint within 0x10FFFF and not part of a UTF-16 surrogate pair.
auto utf8_to_unicode(const std::string& str, std::vector<uint32_t>& codepoints) →  size_t
auto utf8_to_unicode(const char* str, uint32_t* codepoints) →  size_t
auto utf8decode(const std::string& utf8str) →  std::vector<uint32_t>
Convert valid UTF-8 sequences to Unicode codepoints, invalid sequences are treated as Latin-1 characters.
auto utf8encode(const std::vector<uint32_t>& codepoints) →  std::string
Convert codepoints into an UTF-8 string, using the shortest possible encoding.
auto utf8encode(const uint32_t* codepoints, size_t n_codepoints) →  std::string
Convert codepoints into an UTF-8 string, using the shortest possible encoding.
auto utf8len(const std::string& str) →  size_t
Count valid UTF-8 sequences, invalid sequences are counted as Latin-1 characters.
auto utf8len(const char* str) →  size_t
Count valid UTF-8 sequences, invalid sequences are counted as Latin-1 characters.