Incorrect handling of supplementary unicode characters #9

danieldk · 2014-08-04T08:00:39Z

Tokenizing the following sentence:

"Dabei handelt es sich um Sequenzen aus zwei Zeichen, die Länderkürzeln nach ISO 3166-1 ALPHA-2 entsprechen, beispielsweise 🇩🇪 (U+1F1E9 U+1F1EA) für Deutschland."

Results in incorrect XML.

This is probably related to:
https://issues.apache.org/jira/browse/XALANJ-2419
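
For illustration, here is a minimal sketch of how this kind of failure can arise. It assumes, as the linked Xalan issue suggests, that the serializer escapes each UTF-16 code unit separately instead of each code point. The class and method names (`SupplementaryDemo`, `escapeByCodePoint`, `escapeByChar`) are hypothetical and are not taken from the tokenizer or from Xalan.

```java
public class SupplementaryDemo {
    // Expected behaviour: one numeric character reference per code point.
    static String escapeByCodePoint(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp > 0x7F) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
            // Advance by 2 code units for supplementary characters, 1 otherwise.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // Assumed faulty behaviour: one reference per UTF-16 code unit, so each
    // half of a surrogate pair gets its own (invalid) character reference.
    static String escapeByChar(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F1E9 U+1F1EA, written as two surrogate pairs.
        String flag = "\uD83C\uDDE9\uD83C\uDDEA";

        System.out.println(flag.length());                          // 4 code units
        System.out.println(flag.codePointCount(0, flag.length()));  // 2 code points
        System.out.println(escapeByCodePoint(flag)); // &#127465;&#127466;
        System.out.println(escapeByChar(flag));      // &#55356;&#56809;&#55356;&#56810;
    }
}
```

References to surrogate values such as `&#55356;` are not legal XML characters, so a conforming parser rejects output of the second kind.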

danieldk · 2014-08-04T08:26:46Z

This bug is fixed in at least the latest JDK 7. Unfortunately, it persists in the latest JDK 6.
