Unescape JSON Unicode sequences #1175

diamondsw · 2024-06-19T09:37:56Z

What type of request is this?

New tool idea

Clear and concise description of the feature you are proposing

Convert between unicode symbols (💯) and JSON escape sequences (\uD83D\uDCAF).

Is their example of this tool in the wild?

https://magictool.ai/tool/unicode-decoder-encoder/

Additional context

Honestly not sure if this should be a separate tool, added to "Text to Unicode", be made part of a JSON tool, etc, since it touches so many concepts (Unicode, JSON, escaping, encoding, etc).

Validations

Check the feature is not already implemented in the project.
Check that there isn't already an issue that request the same feature to avoid creating a duplicate.
Check that the feature can be implemented in a client side only app (IT-Tools is client side only, no server).

lionel-rowe · 2024-06-20T02:35:04Z

My PR #1087 will fix this if merged — it adds more conversion options to the text-to-unicode tool. One of those options is UTF-16 Code Units, of which 💯 ↔ \uD83D\uDCAF is an example.

it-tools/src/tools/text-to-unicode/text-to-unicode.service.test.ts

Lines 91 to 102 in 7e90365

    
           { 
        
             text: 'a 💩 b', 
        
             skipAscii: true, 
        
             results: { 
        
               // eslint-disable-next-line unicorn/escape-case 
        
               fullUnicode: String.raw`a \u{1f4a9} b`, 
        
               // eslint-disable-next-line unicorn/escape-case 
        
               utf16: String.raw`a \ud83d\udca9 b`, 
        
               hexEntities: String.raw`a &#x1f4a9; b`, 
        
               decimalEntities: String.raw`a &#128169; b`, 
        
             }, 
        
           },

Note that it isn't accurate to call this "JSON Unicode", as the best* way of encoding 💯 in JSON is as a literal 💯 Unicode character. For example, JavaScript's JSON.stringify just uses the literal characters where possible, and JSON.parse accepts them with no problem:

JSON.stringify({ '💯': '💯' })
// result: '{"💯":"💯"}'

JSON.parse('{"💯":"💯"}')
// result: { '💯': '💯' }

Meanwhile, encoding it as a surrogate pair may yield correct results when parsed:

JSON.parse(String.raw`{"\uD83D\uDCAF":"\uD83D\uDCAF"}`)
// result: { '💯': '💯' }

But the JSON spec doesn't require that parsers convert the surrogate pair to its corresponding code point, so it's not guaranteed that even fully spec-compliant parsers will "do the right thing" here.

*For most use cases. If you can't guarantee that the consumer will use UTF-8 as required by RFC 8259, but you can guarantee that the consumer will handle surrogate pairs correctly, then it'd make sense to use the surrogate pair. But that seems like a pretty niche situation.

diamondsw added enhancement New feature or request triage labels Jun 19, 2024

lionel-rowe linked a pull request Jun 20, 2024 that will close this issue

fix(text-to-unicode): handle non-BMP + more conversion options #1087

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unescape JSON Unicode sequences #1175

Unescape JSON Unicode sequences #1175

diamondsw commented Jun 19, 2024

lionel-rowe commented Jun 20, 2024 •

edited

Loading

Unescape JSON Unicode sequences #1175

Unescape JSON Unicode sequences #1175

Comments

diamondsw commented Jun 19, 2024

What type of request is this?

Clear and concise description of the feature you are proposing

Is their example of this tool in the wild?

Additional context

Validations

lionel-rowe commented Jun 20, 2024 • edited Loading

lionel-rowe commented Jun 20, 2024 •

edited

Loading