-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Algebraic Data Types #1448
Comments
I don't think it's wise to use an arbitrary encoding. At least not at this time. Maybe non-serialisable algebraic types is the reasonable step forward. And it's forward compatible with fixing a serialisation for algebraic types in the future. |
I also prefer non-serialisable ADTs for now. The |
I believe That being said, I don't think this proposal must depend on it either, but having an |
For the syntax, another thing that came to mind: I like the anonymous struct/record feature of Rust and OCaml, that is, you can use a record type inside an alternative and give names to the parameters of your variant. It's great for pattern matching, it's not order-dependent, etc. now I almost use them exclusively, except maybe for variants with one argument. We could enforce that for Nickel by default, having something like Additionally, ADTs will surely have to be exported by libraries, so separately from the purely type/syntax side, we might want to start thinking about how to define, export and import user-defined types (cf #422) |
I've got an idea to make union types possible. I'm not necessarily advocating this, just putting it out there. Basically, the idea is to take only what we need from tagged unions, but still be isomorphic to untagged unions. You have to tag your value, but only with the type that it is. (I'm making up
Perhaps (hopefully!) this could be inferred for types that can be painlessly unioned (like Number and String), so the real use-case would be something like this:
For this to be practical, we need to be able to bind types to variables, which as I recall is on the todo list but hasn't been implemented yet. |
@Radvendii: I wonder if this solution would differ from having native ADTs and serializing them as untagged unions? Because it seems you need to indicate which part of the union your take, so constructors are the same as for a tagged union, at first sight. But I'm not sure to understand completely, though. How do you match on such values? What would be the corresponding runtime contract? |
The difference is we guarantee that no information is lost in that translation. If I define the ADT (in haskell notation)
and then try to serialize it as an untagged union, those two options get merged and we lose information. The idea that I have is to make such an ADT unrepresentable. since values are tagged by type. It looks like a tagged union but behaves like an untagged union. Trying to construct a tagged union using the notation I used above would look like:
But this
Good question. Presumably we'd need a construct that allows us to match on the different types a union could be. I don't see off the top of my head what would make that impossible. It would look like matching on a tagged union, but as before the tags are actually just types.
You're asking one of two things. I'm not sure which so I'll answer both:
|
Side note: I have a hope in the back of my mind that the |
I feel like those two sentences are somehow contradictory. What would the "double Number" example be serialized to? If it's serialized as an untagged union, then don't you lose information as well? If both values are indistinguishable, how do you match on them? Or would you reject such an example?
Then they're not really untagged unions either. I have a hard time understanding, in practice, what do you mean by "tagged by type" and how does it differ in practice from tagged by an ADT constructor, if you need to specify this tag when you build a value. I guess my general question is: what issue would this proposal solve in practice, for example when compared to the ADT + serialize_with solution? |
Yes. Likely it would not be allowed. If it were allowed, it would be isomorphic to
For the moment I'm tagging everything explicitly using |
Just a few thoughts, but I've been so disconnected with the project that they shouldn't be taken too seriously. I think calling tagged unions ADTs is a bit confusing, at least for me ADTs is what you get from combining sum and product types. Some languages like Rust have enum types that give you all, but I don't think that's the best way to understand them. @Radvendii I can't say I completely understood your idea, but it seems very similar to C++'s any and variant. As Yann mentioned on the first message Nickel's structural type system may get something like this to work a bit harder (I think). I agree that going for the most restrictive (full on ADTs with pattern matching but without serialization) is a good first approach. It also seems that all you really want are contracts over alternatives (union or dependent records are too complex) and pattern matching, have you considered having a discriminated union type/contract, where you would specify which field is the discrimator, and impose some restrictions on the type of the discrimator (it should be flat, maybe just allow enums) and on the fact that each case of the union has to have a unique discriminator. Something like:
And then you could (maybe easily) create contracts and pattern matching for this case:
|
I don't want to completely dominate this issue discussion, so I tried to write up my idea more clearly in another issue: #1533 |
ADTs are reasonably implemented in master, with a proper syntax, pattern matching, destructuring, and typechecking. We can still discuss further improvements or challenge the current approach somewhere, but I would consider this as done. |
Algebraic Data Types
Algebraic data types (ADTs) - also called tagged unions - and the ability to pattern match on them is a common and very well appreciated feature of most typed functional languages. If record types can be understood as product types, that is a logical
and
({n : Number, s : String}
is a string and a number together), ADTs are sum types, corresponding toor
. Having both of them allow to represent most common shapes of data.Nickel enums are a very limited form of tagged unions, which can't hold any data, but have only a tag.
Motivation
ADTs are widely useful, so it's difficult to write a precise and exhaustive list of their applications, but we missed them in many occasions while working on Nickel code:
Result
)std.array.fold
. The "state" of the loop is passed along, and it's been often the case that this state is naturally an ADT.std.string.find
returns{matched = "", index = -1, groups = []}
if the pattern isn't found in the string, which is not very elegant.Representation
Currently, all Nickel primitive values (numbers, strings, booleans, arrays and records) directly map to an equivalent in the JSON data model. This is deliberate as JSON interop is of high importance.
This property isn't true for ADTs. Take a hypothetical Nickel ADT:
type Either = 'Left Number | 'Right String
. A reasonable encoding in JSON could be something like a record{tag, value}
wheretag : [| 'Left, 'Right |]
andvalue
is either of typeNumber
orString
, depending on thetag
.This is in fact how we've encoded tagged union in Nickel in practice, up to now (of course, by doing it manually, we don't have any pattern matching capability and we can represent many invalid values).
This encoding isn't unique. For once, unions are used in the wild in configurations, but they are not necessarily tagged: a more natural and direct representation of
Either
could be just one value that is either a string or a number, corresponding to the contract:Second, even if the tag+value encoding is used, the choice of field names is arbitrary. We could have chosen
discriminant
anddata
for example.Finally, ADTs usually allow multiple arguments, for example
'Binop Ast Ast
, which are harder to translate using a fixed scheme. Although I advocate for having 1-ary ADTs at most and use records to represent multiple arguments anyway.In conclusion, there doesn't seem to be a canonical encoding of ADTs into the JSON data model. Because unions (tagged or not) are used in the wild, this means in particular that choosing an arbitrary encoding could hurt interop with existing configurations.
Possible solutions
Union types
The first idea we explored a long time ago was to offer a somehow lower-level construct, (untagged) union types. Together with some magic around branching (flow typing), one can achieve a similar level of expressivity and ergonomics as with ADTs. This is the approach of TypeScript. Bare unions also fit well in a structural type system like the one of Nickel.
By writing a union, the user specifies the encoding at the same time, relieving Nickel from doing an arbitrary choice.
Nickel's gradual typing discipline requires that we derive a contract from any static type, and thus that we can derive general union contracts. Alas, union contracts are problematic and rather incompatible with lazyness1.
Dependent records
After the union contracts dead-end, we started to explore the route of adding a very limited form of depending record types, that would allow to express the encoding with a tag and a value, still in the spirit of letting the user specify the representation together with the type while providing ADT-like capabilities.
This work was stopped when @teofr left and haven't been worked on since. Dependent record types might still be a valid solution, but even restricted forms of dependent types have non trivial consequence on the type system and the implementation of the typechecker.
Pick an arbitrary encoding
We could bite the bullet and choose an arbitrary encoding, in spite of the non-canonicity.
Don't serialize ADTs
Another possible stance is to look back at our motivation section, and realize that most of the time, we need ADTs to model internal values that are passed to and return from helper functions. Among the experiments done by the Nickel team, all use-cases where we needed ADTs were values that weren't exported nor imported (this is not true of the nullable use-case, but this one probably deserve an ad hoc type anyway, because it's so pervasive and because we don't want to serialize it as a tagged union but as a bare union).
If ADTs are mostly used internally, the argument around (de)serializing them is less important. So, as for functions, we could simply make ADTs non serializable. Being able to (de)serialize unions is then a separate issue, that can be handled purely on the contract side (using proposals of limited union contracts from the Union and Intersection Contracts are Hard paper for example).
If users really need to export an ADT value, the
export_with
metadata proposal would help (the idea is to be able to provide a normalization/formatting/post-processing function called before export), as the user could provide a function from their ADT to a serializable value. Deserialization can be achieved simply through contracts, or we could have a dual ofimport_with
.Proposal
We showed that union types aren't a viable approach in Nickel (it's not technically impossible, but the cost of doing so is just too high on many levels).
Dependent record types are an interesting direction, but with a lot of uncertainty around both the feasibility and the ergonomics (I would dare to say random programmers aren't really familiar with dependent types yet).
Thus, I'm partial to the two last propositions, which aren't very different in the end. If we implement
export_with
, or any kind of equivalent metadata, the user can control the export if needed, so it might be ok to chose an arbitrary encoding by default.Or we can pick the last proposition (don't serialize ADTs), but provide a
std.enum.encode
function as for a default encoder, always forcing the user to explicitly specify an encoding.Structural vs nominal
Most ADTs in the wild (in other languages) are nominal. But Nickel's type system is structural. Thus, structural ADTs could make more sense: one example is OCaml's polymorphic variants.
On the other hand, polymorphic variants require subtyping (including in the source language) to be used as ADTs, which is arguably more complex to implement and to use.
Syntax
A second question is how ADTs would fit in the existing syntax. This section assumes that we go with proper tagged unions (not bare unions or dependent records).
We already have enum types, so it would make sense to enrich them to accept data inside the variants:
This syntax would fit particularly well with structural ADTs. If we go with nominal ADTs, we probably need a separate syntax, as the typing would be quite different (typically, an ADT couldn't have a row tail).
|
is usually picked to delimit alternatives in many functional languages, but it's already used a lot in Nickel for metadata. The separator,
might be an appealing alternative:This last example doesn't look particularly good, it's just fuel to start a discussion. Feel free to add your own suggestion for a syntax for nominal ADTs!
Roadmap
Footnotes
we wrote a paper about the issue of union contracts in a lazy language. ↩
The text was updated successfully, but these errors were encountered: