utf8: enhancements for handling of multibyte sequences #9687

Merged on Dec 13, 2024 (11 commits)

edsiper (Member) commented Dec 4, 2024

Fix and enhance the way UTF-8 bytes are handled when encoded as JSON. The unit tests have been fixed too.


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

This patch refactors how UTF-8 decoding works: the old lookup table for special
characters/codepoints is replaced with a new routine, plus an optional lookup table
that depends on the compiler type (GNU/Clang).

It also supports proper encoding of multibyte sequences.
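
For readers unfamiliar with the problem, here is a minimal sketch of what decoding a multibyte UTF-8 sequence and escaping it for JSON can look like. It is not the code from this patch: `sketch_utf8_decode` and `sketch_json_escape_cp` are hypothetical names, and emitting `\uXXXX` escapes (with surrogate pairs above U+FFFF) is an assumption about the output format, not a statement of what Fluent Bit emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not a Fluent Bit API): decode one UTF-8 sequence
 * from s[0..len). Returns the number of bytes consumed (1-4), or 0 if
 * the sequence is invalid or truncated.
 */
static size_t sketch_utf8_decode(const unsigned char *s, size_t len,
                                 uint32_t *cp)
{
    size_t i;
    size_t need;    /* number of continuation bytes expected */
    uint32_t c;
    uint32_t min;   /* smallest codepoint allowed for this length */

    assert(s != NULL && cp != NULL);

    if (len == 0) {
        return 0;
    }
    if (s[0] < 0x80) {              /* plain ASCII */
        *cp = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) { need = 1; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; c = s[0] & 0x07; min = 0x10000; }
    else {
        return 0;                   /* stray continuation or invalid lead byte */
    }

    if (len < need + 1) {
        return 0;                   /* truncated sequence */
    }
    for (i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return 0;               /* not a continuation byte */
        }
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* reject overlong encodings, surrogates and out-of-range values */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        return 0;
    }
    *cp = c;
    return need + 1;
}

/*
 * Hypothetical helper: write the JSON \u escape for cp into out.
 * Returns the number of characters written (6 or 12), or 0 if the
 * buffer is too small.
 */
static size_t sketch_json_escape_cp(uint32_t cp, char *out, size_t size)
{
    int n;

    if (cp < 0x10000) {
        n = snprintf(out, size, "\\u%04x", (unsigned) cp);
    }
    else {
        /* codepoints above U+FFFF need a UTF-16 surrogate pair in JSON */
        cp -= 0x10000;
        n = snprintf(out, size, "\\u%04x\\u%04x",
                     (unsigned) (0xD800 + (cp >> 10)),
                     (unsigned) (0xDC00 + (cp & 0x3FF)));
    }
    return (n > 0 && (size_t) n < size) ? (size_t) n : 0;
}
```

A caller would loop over the input, call `sketch_utf8_decode` at each position, advance by the returned length, and decide what to do when it returns 0 (for example, emit a replacement character or skip the byte). That policy is left out here on purpose.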

Signed-off-by: Eduardo Silva <[email protected]>
@edsiper edsiper changed the title utf8: enhancements for handling of multibyte sequences (WIP) utf8: enhancements for handling of multibyte sequences Dec 5, 2024
@edsiper edsiper added this to the Fluent Bit Next milestone Dec 5, 2024
@edsiper edsiper marked this pull request as ready for review December 5, 2024 23:53
edsiper (Member, Author) commented Dec 5, 2024

@cosmo0920 would you please take a look at this?

cosmo0920 (Contributor) left a comment

I have finished reviewing the newly written UTF-8 decoding code, and it looks good to me.
I found one issue in this PR: the dead code for flb_sds_cat_utf8 is still present.
Can we remove it to simplify the code base?

Review comment on src/flb_sds.c (outdated, resolved)
Signed-off-by: Eduardo Silva <[email protected]>