Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add structure to UCM command args #5480

Draft
wants to merge 4 commits into
base: trunk
Choose a base branch
from

Conversation

sellout
Copy link
Contributor

@sellout sellout commented Dec 5, 2024

Overview

This changes the UCM command parser quite a bit.

The args now have to match the parameter definitions

First, this renames some things to make the distinction between parameters and arguments provided to those parameters clearer. This is most apparent in the InputPattern.args to InputPattern.params change.

Previously, there were a lot of commands which omitted or had incorrect parameter definitions. This prevented completion from working for those, and just allowed things to get a bit out of sync. This is no longer possible. They now have to match.

But an InputPattern’s Parameters are now used to determine whether an argument is “structured” or not. Structured args have numbered arg substitution applied, while unstructured args do not. So the arguments to run are unstructured, meaning numbers can be passed as arguments now.

Also, the arguments are lexed with Unison’s lexer, which (among other things) handles quoted strings consistently. This now allows for file paths containing spaces, and arguments to other functions (run, create.author, etc. now consistently allow quoted strings)

Fixes #2805 and #5193.

Implementation notes

It creates a more rigid Parameters structure is then folded alongside a list of arguments. As each argument is processed, it is optionally (based on the individual Parameter) passed through numeric expansion. That can cause an argument to be replaced with multiple arguments (because of ranges), which then cascade to the later parameters.

The result of the fold is a pair of remaining parameters and the expanded arguments that have been processed. If there are unsatisfied required parameters, then it its handed off to fuzzy completion.

Interesting/controversial decisions

This uses the existing Unison lexer. It seems like a better approach might be to use some of the lexing functions in a new (simpler) lexer that also supports lexemes for projects, branches, etc.

Test coverage

The transcripts check a lot of this.

Loose ends

This is still a WIP. It mostly works, but there are a couple remaining rough edges, and the code needs to be cleaned up.

Right now, I think it requires quoting strings a bit more often than would be ideal (basically, anything that doesn’t form a valid Unison lexeme, so "@project/branch", versions like "1.2.3", and a couple other cases). The error reporting needs to be improved as well.

I mentioned having a more specific lexer for this as one bit of further work (which would eliminate the extra quoting). Alternatively, we could lex incrementally – with each ParameterType specifying whether the next argument should be lexed or just taken to the next space.

Another good change would be to move the argument handling functions from the individual command parsers to the ParameterType structure.

- distinguish between “parameters” and “arguments” – a command has a
  fixed number of parameters, each of which maps to some number of
  arguments (from 0 to many, depending on the parameter)
- change the type of parameters to eliminate invalid parameter structures
- require `InputPattern` parameters to cover all arguments (there are a
  number of commands whose parameters were under-specified).
This allows us to incrementally expand numbered arguments, only doing so when
we want a Unison value and not unstructured text.

Fixes unisonweb#2805.
Good:
- handling of quoted strings is now solid (e.g., paths can now contain spaces, `create.author` now takes a single
string, rather than processing the tail of the input)

Bad:
- have to quote some things we rather wouldn’t – e.g., project/branch refs with @ or /
- still some breakage to iron out
@sellout sellout requested a review from aryairani December 5, 2024 01:43
@sellout
Copy link
Contributor Author

sellout commented Dec 5, 2024

I would recommend only looking at the transcript diffs for now – the code diffs are overly complicated and messy.

@aryairani
Copy link
Contributor

Yeah IMO the quoted strings are generally going to be a dealbreaker. I think they should be optional on filenames, and probably absent from branches, version number. view "1-2" also feels bad.

I want to think about it some more though.

@sellout
Copy link
Contributor Author

sellout commented Dec 10, 2024

Yeah IMO the quoted strings are generally going to be a dealbreaker. I think they should be optional on filenames, and probably absent from branches, version number. view "1-2" also feels bad.

I agree. I think the solution is either

  1. make a UCM-specific lexer that shares some lexemes with Unison or
  2. try the parse one token, if it fails, just breakOn " ". So that gives us quoting, etc., but we end up with a spaceless String if something goes wrong.

Option 2 is a bit more cowboy, and there are issues with ranges. 1-4 will successfully lex as [Numeric 1, Numeric (-4)], so won’t fall back to a String.

I think both options have an issue with the find : syntax, because the following type is a series of lexemes that we then need to parse to get an actual type. But I haven’t tried to fix that.

Also, the first three commits here should get their own PR – they fix the “pass numbers to run” issue and have some other improvements to UCM command parameter definitions. I can get those cleaned up without worrying about the lexer bit of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ucm run : Numeric arguments are changed to find results
2 participants