Do we really need ASCII-only text output? #540

Closed
mpusz opened this issue Jan 4, 2024 · 13 comments
Labels
design Design-related discussion help wanted Extra attention is needed question Further information is requested

Comments

@mpusz
Owner

mpusz commented Jan 4, 2024

Besides Unicode text output, mp-units can also output ASCII-only text.

Standardizing such ASCII-only text output will be hard, as the ISO and SI standards do not specify ASCII alternatives for their non-ASCII symbols. This means we would have to guess and pick arbitrary replacements. Moreover, it complicates the design (e.g., it requires an additional unit_symbol class template that stores two fixed_string objects).
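
To make the design cost concrete, here is a minimal sketch of a symbol type that carries both spellings. The `fixed_string` and `unit_symbol` shown here are simplified stand-ins written for this illustration, not the actual mp-units definitions, and the choice of "ohm" as the ASCII spelling is an assumption. The Unicode text is written as explicit UTF-8 bytes so the example does not depend on the compiler's literal encoding.

```cpp
#include <cstddef>
#include <string_view>

// Simplified compile-time string (hypothetical, not the mp-units type).
template<std::size_t N>
struct fixed_string {
  char data[N + 1] = {};
  constexpr fixed_string(const char (&txt)[N + 1])
  {
    for (std::size_t i = 0; i <= N; ++i) data[i] = txt[i];
  }
  constexpr std::string_view view() const { return {data, N}; }
};
template<std::size_t N>
fixed_string(const char (&)[N]) -> fixed_string<N - 1>;

// A symbol that stores two spellings: one Unicode, one ASCII-only.
template<std::size_t N, std::size_t M>
struct unit_symbol {
  fixed_string<N> unicode;
  fixed_string<M> ascii;
  constexpr unit_symbol(const char (&u)[N + 1], const char (&a)[M + 1]) : unicode(u), ascii(a) {}
};
template<std::size_t N, std::size_t M>
unit_symbol(const char (&)[N], const char (&)[M]) -> unit_symbol<N - 1, M - 1>;

// "\xCE\xA9" is the UTF-8 encoding of U+03A9 (Greek capital omega).
inline constexpr unit_symbol ohm{"\xCE\xA9", "ohm"};
```

Every unit with a non-ASCII symbol needs such a pair, which is the extra complexity described above.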

Please let us know if you have issues with removing support for ASCII-only output and what your rationale for keeping it is.

@mpusz mpusz added help wanted Extra attention is needed question Further information is requested design Design-related discussion labels Jan 4, 2024
@JohelEGP
Collaborator

JohelEGP commented Jan 4, 2024

We can follow [time.duration.io]:

(1.5)
Otherwise, if Period::type is micro, it is implementation-defined whether units-suffix is "μs" ("\u00b5\u0073") or "us".
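
As a rough illustration of that rule, the sketch below picks a suffix for a few periods and leaves the micro case to a flag. The function `units_suffix` and its `use_micro_sign` parameter are hypothetical names for this example and only mimic the wording quoted above; this is not the standard library's implementation.

```cpp
#include <ratio>
#include <string_view>
#include <type_traits>

// Hypothetical helper: returns a suffix for a handful of periods.
// For std::micro, the caller decides between the two permitted spellings.
template<class Period>
constexpr std::string_view units_suffix([[maybe_unused]] bool use_micro_sign)
{
  using P = typename Period::type;  // normalized ratio
  if constexpr (std::is_same_v<P, std::micro>) {
    // "\xC2\xB5s" is U+00B5 U+0073 encoded as UTF-8.
    return use_micro_sign ? std::string_view{"\xC2\xB5s"} : std::string_view{"us"};
  } else if constexpr (std::is_same_v<P, std::milli>) {
    return "ms";
  } else if constexpr (std::is_same_v<P, std::ratio<1>>) {
    return "s";
  } else {
    return "";  // other periods are omitted from this sketch
  }
}
```

In chrono this choice affects a single suffix; a units library would face the same choice for every symbol that contains a non-ASCII character.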

@mpusz
Owner Author

mpusz commented Jan 4, 2024

Yes, we could, but I do not think that is a good idea. For chrono, it was a single exceptional case; in our library, there are plenty of such cases.

@mpusz
Owner Author

mpusz commented Jan 4, 2024

@JohelEGP
Collaborator

JohelEGP commented Jan 4, 2024

Our support for ASCII can be the one exceptional case in the specification.
Rather than specifying how each string representing a dimension, unit, and eventually quantity, maps to ASCII,
just specify that the format specifier for ASCII does an implementation-defined mapping of the Unicode equivalent.

@mpusz
Owner Author

mpusz commented Jan 4, 2024

I think that is not an option. The alternative symbol for each Unicode sign has to be explicitly provided so that text logs written by one application can then be read as input by another (see #541).

@JohelEGP
Collaborator

JohelEGP commented Jan 4, 2024

Does it?
What do scnlib or WG21 say about round-tripping the one case in std::chrono?

@tahonermann

From a standardization perspective, symbols that utilize only characters from the basic literal character set are required since the complete set of Unicode characters is not supported by all character encodings allowed by the C++ standard. I think the question posed in this issue is therefore misguided.

I believe the desired design is for a unit specification to have a preferred symbol selected from all of the characters available in the Unicode standard as well as a fallback symbol selected from the basic literal character set ([lex.charset]p7). By default, the preferred symbol would be used if the target encoding supports the full range of Unicode characters and the fallback symbol used otherwise. For those that wish to restrict output to ASCII-only, an option should be provided to use the fallback symbol in cases where the preferred symbol could otherwise be used but is not desired.
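
The selection rule described here could be sketched roughly as follows. All names (`symbol_pair`, `symbol_request`, `select_symbol`) are hypothetical and chosen for this illustration only; how an implementation detects whether the target encoding supports Unicode is left out and modeled as a plain `bool`.

```cpp
#include <string_view>

// Hypothetical pair of spellings for one unit.
struct symbol_pair {
  std::string_view preferred;  // may use any Unicode character (UTF-8 bytes here)
  std::string_view fallback;   // basic literal character set only
};

// Hypothetical user option: let the implementation decide, or force the fallback.
enum class symbol_request { automatic, fallback_only };

constexpr std::string_view select_symbol(const symbol_pair& s, bool encoding_supports_unicode,
                                         symbol_request request = symbol_request::automatic)
{
  // The fallback is used when the encoding cannot represent the preferred
  // symbol, or when the user explicitly asked for it.
  if (request == symbol_request::fallback_only || !encoding_supports_unicode) return s.fallback;
  return s.preferred;
}

// Assumed example data: "\xC2\xB5s" is U+00B5 followed by 's' in UTF-8.
inline constexpr symbol_pair microsecond{"\xC2\xB5s", "us"};
```

Note that the fallback is needed in this scheme even if the explicit `fallback_only` option were dropped.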

@mpusz
Owner Author

mpusz commented Jan 7, 2024

Exactly! I tried to form a question so that most C++ developers would understand it. I believe that most have heard about ASCII but may have no clue what "basic literal character set" means 😉

Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option? Having both complicates the design and potential support for text input in the future, but may be required by some users, and I would love to hear about such cases.

@mpusz
Owner Author

mpusz commented Jan 7, 2024

@ChrisRyan98008 stated on LinkedIn:

... from a general engineering opinion I would like to keep the ascii version. I could foresee uses for it. It is just sometimes too hard to type special unicode characters so I presume it would maintain symmetry with that input method.

@mpusz
Owner Author

mpusz commented Jan 7, 2024

@ChrisRyan98008 also suggested:

Maybe you could just do the unicode output but with a units translations output utility layer to ascii. Maybe this would open up the translation output option for other formats like LaTeX.

For now, we do not plan to provide a translation layer for text output, but a user could probably do something on their own to implement it. Please let us know in case someone has a good idea of how to incorporate such a feature into the framework.
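
One way a user might build such a layer on their own is a small lookup from UTF-8 byte sequences to ASCII replacements, applied to already formatted text. The sketch below is only an illustration: the table contents, the names `ascii_replacements`, `ascii_buffer`, and `to_ascii`, and the 64-byte limit are all assumptions, and unmapped non-ASCII bytes are passed through unchanged.

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical mapping: UTF-8 byte sequence -> ASCII replacement.
using mapping = std::pair<std::string_view, std::string_view>;
inline constexpr std::array<mapping, 4> ascii_replacements{{
  {"\xC2\xB5", "u"},    // U+00B5 micro sign
  {"\xCE\xA9", "ohm"},  // U+03A9 Greek capital omega
  {"\xC2\xB0", "deg"},  // U+00B0 degree sign
  {"\xC2\xB2", "^2"},   // U+00B2 superscript two
}};

// Fixed-capacity output so the sketch also works without dynamic allocation.
struct ascii_buffer {
  char data[64] = {};
  std::size_t size = 0;
  constexpr void append(std::string_view s)
  {
    for (const char c : s)
      if (size < sizeof(data)) data[size++] = c;  // silently truncates when full
  }
  constexpr std::string_view view() const { return {data, size}; }
};

constexpr ascii_buffer to_ascii(std::string_view text)
{
  ascii_buffer out{};
  while (!text.empty()) {
    bool replaced = false;
    for (const auto& entry : ascii_replacements) {
      if (text.substr(0, entry.first.size()) == entry.first) {
        out.append(entry.second);
        text.remove_prefix(entry.first.size());
        replaced = true;
        break;
      }
    }
    if (!replaced) {  // copy one byte as-is
      out.append(text.substr(0, 1));
      text.remove_prefix(1);
    }
  }
  return out;
}
```

A different table could target another format, such as LaTeX, using the same loop.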

@tahonermann

> Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option?

A fallback symbol is required for standardization since there is no guarantee that characters outside the basic literal character set are representable at all. That fallback symbol is needed regardless of whether the proposed std::format grammar includes an option to explicitly opt out of using symbols that potentially include characters from outside the basic literal character set.

The question to pose is whether the units-text-encoding grammar option currently present in the D3045R0 draft is needed, or whether it suffices for the implementation to determine on its own when to use the fallback symbol. The responses so far suggest that the grammar option would be used and appreciated. I don't see a reason not to provide that option.

@kwikius
Contributor

kwikius commented Feb 7, 2024

> Exactly! I tried to form a question so that most C++ developers would understand it. I believe that most have heard about ASCII but may have no clue what "basic literal character set" means 😉
>
> Anyway, the main question remains. Do we want to limit the implementation to Unicode characters only, or do we also want to provide a fallback option? Having both complicates the design and potential support for text input in the future, but may be required by some users, and I would love to hear about such cases.

Use case: I use my quantities library on an 8-bit MCU (ATmega328), e.g. https://github.com/kwikius/ultrasonic_wind_sensor/blob/master/libraries/UltrasonicWindSensor/wind_sensor_impl.cpp. For that type of use, the serial port is often used to output ASCII text.
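
For a setup like this, a simple guard could verify that formatted text is ASCII-only before it is written to the serial port. The helper below is a hypothetical sketch (the name `is_ascii_only` is an assumption) and is not taken from the linked project.

```cpp
#include <string_view>

// Returns true if every byte is in the 7-bit ASCII range (0x00..0x7F).
constexpr bool is_ascii_only(std::string_view text)
{
  for (const char c : text) {
    if (static_cast<unsigned char>(c) > 0x7F) return false;
  }
  return true;
}
```

With ASCII-only symbols such as "us", the check passes; with a UTF-8 encoded micro sign, it does not.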

@mpusz
Owner Author

mpusz commented Feb 28, 2024

Based on the feedback we got, we decided to keep ASCII-only text output.

@mpusz mpusz closed this as completed Feb 28, 2024