line breaking in splitParagraphIntoLines for non Western Languages #3327

Closed
FrankYFTang opened this issue Dec 2, 2021 · 13 comments

Comments

@FrankYFTang

My intent here is to find a way to communicate with the creators of jsPDF, in particular whoever works on splitParagraphIntoLines. Maybe @HackbrettXXX?

I am trying to find someone at jsPDF to help me lobby TC39.
I am working on a new ECMA402 proposal (Intl.Segmenter v2) and on my draft slides for the TC39 meeting on Dec 15-16, 2021. The key functionality of this proposal is to add line breaking to Intl.Segmenter. While we were working on Intl.Segmenter, we received opposition from some members who claimed there is no need for line breaking in JavaScript, since the browser should solve that in HTML layout instead of letting developers break lines of text in JavaScript. Because of that, 2-3 years ago we stripped this functionality out of Intl.Segmenter. Now that I am putting together v2 to add line breaking back, I am looking for counterexamples that show the web does need this functionality in JavaScript, outside the context of HTML. I searched and found your project, jsPDF, which 1) uses JavaScript, 2) has font metrics information, and 3) needs to break lines. I hope you can join me in lobbying TC39 for this. I implemented all the Intl features in V8, and my colleague is working with Mozilla folks to implement their new replacement line wrapping feature in Firefox, so if TC39 agrees, all browsers can implement such an API. (It will take me several days of work to implement a prototype in V8, since I am also working on ICU.) But I do need web developers to express the need. Please contact me if you are interested. My email is [email protected]. The TC39 presentation is scheduled for Dec 15.

@HackbrettXXX
Collaborator

While I haven't written splitParagraphIntoLines myself, I definitely agree that such a line breaking API has many applications. For example:

  • creating text documents like PDF in the browser
  • web applications that use SVG, Canvas or WebGL to draw text. E.g. the yFiles library from yWorks (where I work) can draw diagrams (containing text) in any of the three rendering techniques and we had to implement our own line breaking code for rendering multiline text (see TextRenderSupport). There are two issues:
    • The output is never perfect, especially for non-western languages
    • It is quite slow: in many scenarios, breaking the text into lines is what slows down the initial rendering of larger diagrams.

Both issues are addressed by this proposal.

What immediately comes to my mind is this: the user of the line breaking API should be able to pass a custom text measuring function, because the two use cases above need different kinds of text measuring. jsPDF has its own text measuring for custom Unicode fonts, while yFiles needs different text measuring depending on whether it renders with Canvas or SVG.

@HackbrettXXX
Collaborator

Another thing that popped into my mind is this: it might also be good to support splitting text with mixed formatting (e.g. mixed bold, italic, sub/superscript, different font sizes, etc.). This can probably be handled by the text measuring function, as long as it knows where the text chunk to measure is in the whole string.
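To make the mixed-formatting idea concrete, here is a minimal sketch of a range-based measuring function. It assumes the formatting is described as a list of style runs over the whole string; the names `measureRange`, `runs`, and `measureStyled` are hypothetical and do not come from jsPDF, yFiles, or the proposal.

```javascript
// Hypothetical sketch: measure the substring [start, end) of a string whose
// formatting is given as style runs. Each run covers [run.start, run.end) of
// the full string and carries an opaque `style` object.
function measureRange(text, start, end, runs, measureStyled) {
  let width = 0;
  for (const run of runs) {
    // Intersect the requested range with this run.
    const from = Math.max(start, run.start);
    const to = Math.min(end, run.end);
    if (from < to) {
      // The caller-supplied function measures one uniformly styled piece.
      width += measureStyled(text.slice(from, to), run.style);
    }
  }
  return width;
}

// Toy usage: width is character count times a per-style scale factor.
const styledText = "Hello bold world";
const styleRuns = [
  { start: 0, end: 6, style: { scale: 1 } },
  { start: 6, end: 10, style: { scale: 2 } }, // "bold" is twice as wide
  { start: 10, end: 16, style: { scale: 1 } },
];
const toyMeasure = (piece, style) => piece.length * style.scale;
const rangeWidth = measureRange(styledText, 4, 12, styleRuns, toyMeasure);
```

Because the function receives offsets into the whole string rather than a detached chunk, it can look up which style applies to each part of the chunk.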

@FrankYFTang
Author

FrankYFTang commented Dec 2, 2021

> What immediately comes to my mind is this: the user of the line breaking API should be able to pass a custom text measuring function:

That for sure WILL NOT happen. The reason is that the API is designed as a low-level API with no knowledge of text measurement. You can create a class on top of it that accepts a custom text measuring function, calls the API, checks the candidates with that measuring function, and returns the result. But the TC39 API is designed not to deal with text measurement; it just returns the potential line break positions and lets the caller invoke the custom text measuring function to figure out the answer. In a way, it really just tries to implement something similar to the "let word = text.split(" ");" part (but without removing any " ").
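A minimal sketch of that layering, under stated assumptions: `splitKeepingSpaces` is a hypothetical stand-in for the low-level segmentation step (it only handles ASCII spaces and is not the proposed Intl.Segmenter API), and `wrapGreedy` is a hypothetical caller-side helper that combines the segments with a custom measuring function.

```javascript
// Hypothetical stand-in for the low-level step: split at spaces but keep each
// run of spaces attached to the preceding word, so that joining the segments
// reproduces the input exactly. Real line breaking rules are far richer.
function splitKeepingSpaces(text) {
  return text.match(/[^ ]+ *| +/g) || [];
}

// Hypothetical caller-side layer: greedy line filling. `measure` is supplied
// by the caller (font metrics, canvas, SVG, ...); the segmenter never sees it.
function wrapGreedy(segments, measure, maxWidth) {
  const lines = [];
  let current = "";
  for (const segment of segments) {
    const candidate = current + segment;
    // Trailing whitespace is ignored when checking whether the line fits.
    if (current !== "" && measure(candidate.trimEnd()) > maxWidth) {
      lines.push(current.trimEnd());
      current = segment;
    } else {
      // A single segment wider than maxWidth is kept whole on its own line.
      current = candidate;
    }
  }
  if (current !== "") lines.push(current.trimEnd());
  return lines;
}

// Toy usage: every character is one unit wide.
const wrapped = wrapGreedy(
  splitKeepingSpaces("the quick brown fox"),
  (s) => s.length,
  10
);
```

The segmentation step and the measuring step stay independent, which is the separation described above: only the caller-side layer knows about widths.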

@FrankYFTang
Author

Notice that ECMA402 is the Intl library for ECMAScript, which does not always run within a browser (for example, it could run in Node.js or on small devices). Anything dealing with the DOM, HTML, or fonts is out of scope for this standards body. Anything dealing with RegEx or Unicode properties (and many other things) is in scope. It is a language standard, not a web platform standard. Just as the C++ language standard (even with the STL, etc.) will not have an API for fonts or text measurement, TC39 and ECMA402 will not have those either.

@FrankYFTang
Author

FrankYFTang commented Dec 2, 2021

See https://www.ecma-international.org/publications-and-standards/standards/ecma-402/ for what is ALREADY in it. Notice the charter is
"This Standard defines the application programming interface for ECMAScript objects that support programs that need to adapt to the linguistic and cultural conventions used by different human languages and countries."

@FrankYFTang
Author

Anything about "rendering" is out of the scope of the standard. Text boundary analysis, on the other hand, is IN the scope of that standard.

@HackbrettXXX
Collaborator

HackbrettXXX commented Dec 6, 2021

@FrankYFTang ah, thanks for clarifying this. Nevertheless, the proposal still makes implementing line breaking algorithms easier for the scenarios I mentioned above.

@FrankYFTang
Author

Would it be possible for you to make a statement, as a JS web developer, supporting the addition of "line" to Intl.Segmenter and explaining why it would help the jsPDF project, in a form I can quote in my TC39 presentation? TC39 really likes to see new proposals reflect demand from real-world developers. And since I am working on the browser implementation (V8), your words will carry a different kind of weight (heavier) than mine alone. You do not need to endorse the proposal in any particular shape; you just need to express the need for TC39 to add a way to ALSO support "line break" in Intl.Segmenter.

@FrankYFTang
Author

BTW, one key argument against such a proposal (from another browser implementer) is that code using it will need access to font metrics. My counterargument is that Canvas already has measureText on its context. For non-Canvas usage such as jsPDF, could you explain how you measure font metrics? Do you load the entire font files and parse them in JS code?

@FrankYFTang
Author

@HackbrettXXX - I would encourage you to add your comment to tc39/proposal-intl-segmenter-v2#1 if you think adding line breaking is important.

@HackbrettXXX
Collaborator

> BTW, one key argument against such a proposal (from another browser implementer) is that code using it will need access to font metrics. My counterargument is that Canvas already has measureText on its context. For non-Canvas usage such as jsPDF, could you explain how you measure font metrics? Do you load the entire font files and parse them in JS code?

There are different ways to measure text: in jsPDF, we load the font files and calculate the text size ourselves from the glyph sizes. However, there are at least two other ways to measure text in the browser: the canvas measureText function and the SVG API: place a temporary SVG in the DOM and measure it for example with SVGTextContentElement.getComputedTextLength() or SVGGraphicsElement.getBBox().
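As a rough illustration of the first approach (summing glyph sizes), here is a minimal sketch. It is not jsPDF's actual code: the font object, its `advances` table, and the function name `measureWithGlyphAdvances` are hypothetical, and the sketch ignores kerning, ligatures, and complex script shaping.

```javascript
// Hypothetical font description: advance widths in font units per character,
// as they might be extracted from a parsed font file.
const demoFont = {
  unitsPerEm: 1000,
  defaultAdvance: 500, // fallback for characters missing from the table
  advances: new Map([
    ["i", 250],
    ["m", 750],
    [" ", 250],
  ]),
};

// Sum the advance widths of all code points and scale to the font size.
function measureWithGlyphAdvances(text, font, fontSize) {
  let units = 0;
  for (const ch of text) {
    // for...of iterates by code point, so surrogate pairs stay together.
    units += font.advances.get(ch) ?? font.defaultAdvance;
  }
  return (units * fontSize) / font.unitsPerEm;
}

const demoWidth = measureWithGlyphAdvances("mi m", demoFont, 10);
```

A function with this shape can be passed as the measuring callback of a line breaking routine, in the same way a wrapper around canvas measureText or SVGTextContentElement.getComputedTextLength() could be.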

@HackbrettXXX
Collaborator

You might also reach out to the author of html2canvas, a library that renders HTML to canvas, which obviously also needs to break text into lines.

@github-actions

This issue is stale because it has been open 90 days with no activity. It will be closed soon. Please comment/reopen if this issue is still relevant.
