Support inspect on strings with unicode #79

Merged
merged 3 commits into from
May 11, 2024
10 changes: 9 additions & 1 deletion src/helpers.ts
@@ -93,6 +93,10 @@ export function normaliseOptions(
   return options
 }
 
+function isHighSurrogate(char: string): boolean {
+  return char >= '\ud800' && char <= '\udbff'
+}
+
 export function truncate(string: string | number, length: number, tail: typeof truncator = truncator) {
   string = String(string)
   const tailLength = tail.length
@@ -101,7 +105,11 @@ export function truncate(string: string | number, length: number, tail: typeof t
     return tail
   }
   if (stringLength > length && stringLength > tailLength) {
-    return `${string.slice(0, length - tailLength)}${tail}`
+    let end = length - tailLength
+    if (end > 0 && isHighSurrogate(string[end - 1])) {
+      end = end - 1
+    }
+    return `${string.slice(0, end)}${tail}`
   }
   return string
 }
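
The change above can be tried in isolation. The following is a minimal sketch, not the library's code: `naiveTruncate`, `safeTruncate`, and `ELLIPSIS` are hypothetical names, and the early-return branches of the real `truncate` are simplified. It only illustrates why slicing on UTF-16 code units can leave a dangling high surrogate and how backing off by one code unit avoids that.

```typescript
// Hypothetical standalone sketch; these names are illustrative, not the library's exports.
const ELLIPSIS = '…'

// True when the single UTF-16 code unit is the first half of a surrogate pair.
function isHighSurrogate(unit: string): boolean {
  return unit >= '\ud800' && unit <= '\udbff'
}

// Cuts purely by code-unit count, so it may end on half of a surrogate pair.
function naiveTruncate(text: string, length: number, tail: string = ELLIPSIS): string {
  if (text.length <= length) return text
  const end = Math.max(0, length - tail.length)
  return text.slice(0, end) + tail
}

// Same cut, but steps back one code unit if the cut would split a surrogate pair.
function safeTruncate(text: string, length: number, tail: string = ELLIPSIS): string {
  if (text.length <= length) return text
  let end = Math.max(0, length - tail.length)
  if (end > 0 && isHighSurrogate(text.charAt(end - 1))) {
    end -= 1
  }
  return text.slice(0, end) + tail
}

const cats = '🐱🐱🐱' // 3 code points, 6 UTF-16 code units
const naive = naiveTruncate(cats, 4) // ends with a lone '\ud83d' before the tail
const safe = safeTruncate(cats, 4) // '🐱…'

console.log(naive.length) // 4
console.log(safe.length) // 3
```

The safe variant may return a string one code unit shorter than the requested length, which matches the trade-off visible in the tests: a shorter result is preferred over an ill-formed one.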
19 changes: 19 additions & 0 deletions test/strings.js
@@ -59,6 +59,25 @@ describe('strings', () => {
     expect(inspect('foobarbaz', { truncate: 3 })).to.equal("'…'")
   })
 
+  it('truncates strings involving surrogate pairs longer than truncate (7)', () => {
+    // not '🐱🐱\ud83d…' (length 7) but '🐱🐱…' (length 6)
+    expect(inspect('🐱🐱🐱', { truncate: 7 })).to.equal("'🐱🐱…'")
+  })
+
+  it('truncates strings involving surrogate pairs longer than truncate (6)', () => {
+    expect(inspect('🐱🐱🐱', { truncate: 6 })).to.equal("'🐱…'")
+  })
+
+  it('truncates strings involving surrogate pairs longer than truncate (5)', () => {
+    // not '🐱\ud83d…' (length 5) but '🐱…' (length 4)
+    expect(inspect('🐱🐱🐱', { truncate: 5 })).to.equal("'🐱…'")
+  })
+
+  it('truncates strings involving graphemes than truncate (5)', () => {
Review comment from the PR author (Contributor):

Supporting grapheme clusters is far more complex than supporting surrogate pairs. Splitting a grapheme cluster (a single visual unit from the user's point of view) is acceptable because the resulting string is still valid Unicode. Splitting a surrogate pair, by contrast, produces a string that is ill-formed from a Unicode point of view, because it contains a lone surrogate.

+    // partial support: valid string for unicode
+    expect(inspect('👨‍👩‍👧‍👧', { truncate: 5 })).to.equal("'👨…'")
+  })
+
   it('disregards truncate when it cannot truncate further (2)', () => {
     expect(inspect('foobarbaz', { truncate: 2 })).to.equal("'…'")
   })
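
The distinction drawn in the review comment can be checked directly. This is a minimal sketch under stated assumptions: `hasLoneSurrogate` is a hypothetical helper written for illustration and is not part of the library. The family emoji is spelled out with escapes so that the zero-width joiners are visible.

```typescript
// Hypothetical helper: true when the string contains an unpaired surrogate code unit.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate (charCodeAt is NaN past the end).
      const next = text.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++ // valid pair, skip the low half
        continue
      }
      return true
    }
    // Low surrogate with no preceding high surrogate.
    if (code >= 0xdc00 && code <= 0xdfff) return true
  }
  return false
}

// Family emoji: four emoji (one surrogate pair each) joined by three zero-width joiners (U+200D).
const family = '\ud83d\udc68\u200d\ud83d\udc69\u200d\ud83d\udc67\u200d\ud83d\udc67'

// Cutting between code points breaks the grapheme cluster but keeps the string well-formed.
const splitGrapheme = family.slice(0, 2) // just the first emoji, '👨'

// Cutting inside a surrogate pair leaves a lone high surrogate.
const splitPair = family.slice(0, 1) // '\ud83d'

console.log(family.length) // 11
console.log(hasLoneSurrogate(splitGrapheme)) // false
console.log(hasLoneSurrogate(splitPair)) // true
```

This is why the grapheme test is labelled "partial support": the output shows only one member of the family, but it remains a well-formed string.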