Non-Basic Multilingual Plane Regex Ranges to UTF-16 Compliant Regex

rampaa/UnicodeRangeToUtf16CompliantRegex

Some programming languages that use UTF-16 for strings have trouble matching characters outside the Basic Multilingual Plane (e.g., CJK Unified Ideographs Extension B) when a RegEx specifies them as a Unicode range (see: dotnet/runtime#79865). Each such character is stored as a surrogate pair of two UTF-16 code units, so a character class spanning those code points cannot be matched directly. This program converts unsupported Unicode range RegExes into UTF-16 compliant RegExes that match the equivalent surrogate pairs. For example, [\U00020000-\U0002A6DF] will be converted into \uD840[\uDC00-\uDFFF]|[\uD841-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF].
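
The conversion can be illustrated with a short Python sketch. This is not the code in this repository; it only shows one way to derive the surrogate-pair alternation, and the names to_surrogates and astral_range_to_utf16 are hypothetical. It assumes both endpoints lie in U+10000..U+10FFFF and handles a single range rather than a full RegEx.

```python
def to_surrogates(code_point):
    """Split a supplementary-plane code point into its UTF-16 (high, low) surrogates."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def _unit(value):
    # Format one UTF-16 code unit as a \uXXXX escape.
    return f"\\u{value:04X}"


def _unit_range(first, last):
    # A single code unit needs no brackets; otherwise emit a character class.
    if first == last:
        return _unit(first)
    return f"[{_unit(first)}-{_unit(last)}]"


def astral_range_to_utf16(start, end):
    """Hypothetical helper: rewrite [start-end] as an alternation of surrogate pairs."""
    if not (0x10000 <= start <= end <= 0x10FFFF):
        raise ValueError("both endpoints must be in U+10000..U+10FFFF, with start <= end")

    high_start, low_start = to_surrogates(start)
    high_end, low_end = to_surrogates(end)

    # Both endpoints share a high surrogate: one branch is enough.
    if high_start == high_end:
        return _unit(high_start) + _unit_range(low_start, low_end)

    # First high surrogate: from the starting low surrogate up to 0xDFFF.
    parts = [_unit(high_start) + _unit_range(low_start, 0xDFFF)]

    # High surrogates strictly in between pair with every low surrogate.
    if high_end - high_start > 1:
        parts.append(_unit_range(high_start + 1, high_end - 1)
                     + _unit_range(0xDC00, 0xDFFF))

    # Last high surrogate: from 0xDC00 up to the ending low surrogate.
    parts.append(_unit(high_end) + _unit_range(0xDC00, low_end))
    return "|".join(parts)


print(astral_range_to_utf16(0x20000, 0x2A6DF))
# \uD840[\uDC00-\uDFFF]|[\uD841-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF]
```

When the result is combined with other pattern parts, the alternation generally needs to be wrapped in a non-capturing group such as (?:...) so that the | does not split the surrounding expression.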

The code is largely taken from https://stackoverflow.com/a/47627127 with minor modifications. This repository exists solely for convenience.