Some programming languages that use UTF-16 for strings have trouble with Unicode ranges outside the Basic Multilingual Plane (e.g., CJK Unified Ideographs Extension B) when matching those characters with regular expressions (see: dotnet/runtime#79865). This program converts regexes that contain such unsupported Unicode ranges into UTF-16-compatible regexes built from surrogate pairs. For example, [\U00020000-\U0002A6DF] will be converted into \uD840[\uDC00-\uDFFF]|[\uD841-\uD868][\uDC00-\uDFFF]|\uD869[\uDC00-\uDEDF].
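To illustrate the idea behind the conversion, here is a minimal Python sketch. It is not this repo's actual code: the function names (`to_surrogates`, `astral_range_to_utf16`) are hypothetical, and it assumes both endpoints lie in the supplementary planes (U+10000 to U+10FFFF). Each endpoint is split into a high and a low surrogate, and the range is then expressed as up to three alternatives: the partial first block, the full middle blocks, and the partial last block.

```python
def to_surrogates(cp):
    """Split a supplementary code point into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def _unit(u):
    # Format one UTF-16 code unit as a \uXXXX regex escape.
    return f"\\u{u:04X}"


def _cls(lo, hi):
    # A single escape when the range has one member, otherwise a character class.
    return _unit(lo) if lo == hi else f"[{_unit(lo)}-{_unit(hi)}]"


def astral_range_to_utf16(start, end):
    """Hypothetical helper: rewrite [start-end] as a surrogate-pair regex."""
    if start > end:
        raise ValueError("start must not exceed end")
    hi_s, lo_s = to_surrogates(start)
    hi_e, lo_e = to_surrogates(end)

    # Both endpoints share one high surrogate: a single alternative suffices.
    if hi_s == hi_e:
        return _unit(hi_s) + _cls(lo_s, lo_e)

    # First block: from the start's low surrogate up to the end of the block.
    parts = [_unit(hi_s) + _cls(lo_s, 0xDFFF)]
    # Middle blocks (if any): every low surrogate is allowed.
    if hi_e - hi_s > 1:
        parts.append(_cls(hi_s + 1, hi_e - 1) + _cls(0xDC00, 0xDFFF))
    # Last block: from the start of the block up to the end's low surrogate.
    parts.append(_unit(hi_e) + _cls(0xDC00, lo_e))
    return "|".join(parts)


if __name__ == "__main__":
    # CJK Unified Ideographs Extension B
    print(astral_range_to_utf16(0x20000, 0x2A6DF))
```

For the Extension B range, this sketch produces the same three-part pattern as the example above. It handles a single range only; parsing a full regex and rewriting every range inside it is left to the actual program.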

The code is adapted from https://stackoverflow.com/a/47627127 with small modifications. This repo exists solely for convenience.