From 1aba5b667ddf3f40bb6b0fe92b3bef2b9bbd8b2f Mon Sep 17 00:00:00 2001
From: Paul Jolly
Date: Thu, 4 Jul 2024 15:31:51 +0100
Subject: [PATCH] designs/3264-embed: initial commit

Initial commit of the embed proposal destined for
https://cuelang.org/issue/3264.

Signed-off-by: Paul Jolly
Change-Id: If96e82ca68ff25db59d032c27d770c70acf2c285
Reviewed-on: https://review.gerrithub.io/c/cue-lang/proposal/+/1197180
Reviewed-by: Roger Peppe
Reviewed-by: Marcel van Lohuizen
TryBot-Result: CUEcueckoo
---
 designs/3264-embed.md | 192 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 192 insertions(+)
 create mode 100644 designs/3264-embed.md

diff --git a/designs/3264-embed.md b/designs/3264-embed.md
new file mode 100644
index 0000000..831132e
--- /dev/null
+++ b/designs/3264-embed.md
@@ -0,0 +1,192 @@
+# Objective / Abstract
+
+We propose a solution for directly loading files of any type as part of CUE
+evaluation.
+
+# Background
+
+Users frequently need to load JSON, YAML, or other types of files into their CUE
+code. As CUE only supports `import` declarations that reference CUE packages,
+users currently resort to the CUE tooling layer (`cue cmd`) to load non-CUE
+files, which can be overly complex for their needs. The tooling layer was
+introduced to handle external influences that make a configuration non-hermetic,
+typically files.
+
+However, files that are part of a CUE module can be considered hermetic. We
+aim to make it easier to reference these files.
+
+# Overview / Proposal
+
+We propose the `@embed` attribute for embedding.
+
+```
+@extern(embed) // Enable processing of embedding.
+
+package foo
+
+// Load a single JSON file
+a: _ @embed(file=foo.json)
+
+// Load all files with a name containing a dot (".") in the images directory
+// as binary files.
+b: _ @embed(glob=images/*.*, type=binary)
+b: [string]: bytes
+
+// Unusual file names may be quoted to prevent
+// misinterpretation.
+c: _ @embed(file="a file.json")
+```
+
+Key aspects:
+
+- Embedding must be enabled by a file-level `@extern(embed)` attribute. This
+  allows for quick identification of the use of embeddings by tooling.
+- Embedded files can be resolved and interpreted at load time, before
+  evaluation: it is a syntactic operation.
+- The `@embed` attribute can use a `file` argument for a single file and a `glob`
+  argument for multiple files.
+- By default, files are decoded using the encoding implied by the
+  file name extension. It's an error if the extension is not known.
+  This can be overridden using `type=$filetype`, where `$filetype` can be
+  any file type described in `cue help filetypes`.
+- For `glob`, if the extension is not given, the `type` argument is required.
+
+
+# Detailed Design
+
+## Embedding variants
+
+When an embed attribute refers to a file, the file path is interpreted relative
+to the directory containing the embed attribute and may not include ‘.’ or ‘..’
+or empty path elements. It is not possible to embed a file that is outside the
+containing module.
+
+File paths must be `/`-separated, even if CUE is used on Windows or another OS
+that does not use `/`-separated paths.
+
+Multiple `@embed` attributes may be associated with the same field, in which
+case the respective values are unified.
+
+### `@embed(file=$filepath)`
+
+Specifies a single file to be loaded. The encoding of the file is determined by
+the file extension unless overridden by `type`.
+
+It is an error if the file does not exist.
+
+### `@embed(glob=$pattern)`
+
+An embed attribute with the `glob` argument embeds all matching files as a map from
+file path to embedded file. The `$pattern` is matched according to the syntax
+used with [`path.Match`](https://pkg.go.dev/cuelang.org/go/pkg/path#Match).
+
+All files must be of the same type, as identified by the extension.
+In case the extension is not fully specified (for example `@embed(glob=file.json*)`), the
+type needs to be explicitly specified.
+
+We currently do not support `**` to allow selecting files in arbitrary
+subdirectories. To allow for this in the future, we do not allow `**` to appear
+in the glob.
+
+Files starting with a ‘.’ are not included. We could later add an option to
+allow including those.
+
+## File types
+
+File types, when not derived from the file extension, are indicated with the
+`type` argument. The values this argument can take follow those described in
+`cue help filetypes`. In summary, a type can specify the encoding, interpretation, or
+both.
+
+Initially we will not support the CUE filetype. Support for the `cue+data` file
+type, or more generally self-contained CUE files, could be added at a later
+date.
+
+We will also not support [`.jsonl`](https://jsonlines.org/) or multi-doc `.yaml`
+file types initially. Instead, these files can be embedded as `type=text` and
+decoded via `encoding/json` and `encoding/yaml`.
+
+Unlike the command line, `@embed` does not automatically detect the
+interpretation based on the contents. For instance, to interpret a JSON file as
+OpenAPI, `openapi` needs to be explicitly specified in the `type` argument.
+
+Just as on the command line, if the extension reflects neither the encoding nor
+the interpretation, they can both be specified in `type`, such as in
+`type=openapi+yaml`.
+
+The interpretation of `type` is already internally implemented in the
+[`internal/filetypes`](https://pkg.go.dev/cuelang.org/go/internal/filetypes)
+package. This could be exposed via a non-internal package.
+
+In the future we could consider an auto-detect option as is available on the
+command line.
+
+We will not initially support schema-guided decoders, such as text protocol
+buffer values, as part of the `@embed` mechanism.
+In these cases, users will have to load the files as text and use CUE builtins and CUE evaluation to decode
+the embedded files. Using the
+[`ExternInterpreter`](https://pkg.go.dev/cuelang.org/go/cue/cuecontext#ExternInterpreter)
+infrastructure, we are at least prepared for such a change in the future.
+
+## Build information
+
+We propose listing files that are selected for embedding in the
+[`cue/build.Instance.EmbedFiles`](https://pkg.go.dev/cuelang.org/go/cue/build#Instance)
+field.
+
+## Implementation
+
+The embedding proposal can use the `internal/filetypes` and `internal/encoding`
+packages to compute the parameters of the decoding. We should investigate if we
+can reuse the `runtime.Interpreter` implementation for processing the
+attributes, as it is similar, though not identical, to how the `@extern`
+attribute is processed.
+
+# Other Considerations
+
+## Only support bytes for now
+
+We wanted to see if we could support a simpler approach that only supports bytes
+and forces users to convert bytes to the format they want. However, most of the
+converter packages assume UTF-8. This is fine to assume for strings within CUE,
+but like package [`cue/load`](https://pkg.go.dev/cuelang.org/go/cue/load), it
+should not be assumed when loading files.
+
+We could still support only loading bytes if we ensure that all encoder
+functionality properly handles BOMs. We may eventually want to do that
+regardless.
+
+## Supporting `**` in `glob` patterns
+
+We currently do not support `**` in a `glob` pattern to match arbitrary
+subdirectories. If we do adopt such a feature in the future, we will likely use
+the syntax adopted as part of the [LSP
+specification](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.18/specification/#documentFilter),
+where `**` is used to match any number of path segments, including none.
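The glob rules described in this proposal (no `**`, no files starting with ‘.’, matching in the style of `path.Match`) can be sketched in Go as follows. This is only an illustration under stated assumptions, not the proposed implementation: `matchEmbedGlob` is a hypothetical helper, it uses Go's standard-library `path.Match` as a stand-in for the CUE `path.Match` builtin, it treats "starting with a ‘.’" as applying to the final path element only, and it filters an in-memory list of candidate paths rather than reading a directory.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// matchEmbedGlob is a hypothetical helper (not part of any CUE API).
// It returns the candidate paths selected by pattern under the rules
// sketched in the proposal.
func matchEmbedGlob(pattern string, candidates []string) ([]string, error) {
	// `**` is reserved for possible future use, so reject it outright.
	if strings.Contains(pattern, "**") {
		return nil, errors.New("embed: '**' is not supported in glob patterns")
	}
	var matched []string
	for _, name := range candidates {
		// Assumption: a file is a "dot file" if its base name starts with '.'.
		if strings.HasPrefix(path.Base(name), ".") {
			continue
		}
		// Go's path.Match never lets '*' cross a '/' separator.
		ok, err := path.Match(pattern, name)
		if err != nil {
			return nil, fmt.Errorf("embed: invalid glob %q: %w", pattern, err)
		}
		if ok {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	files := []string{
		"images/a.png",
		"images/.hidden.png",
		"images/sub/b.png",
		"images/README",
	}
	got, err := matchEmbedGlob("images/*.*", files)
	fmt.Println(got, err) // prints "[images/a.png] <nil>"
}
```

Rejecting `**` up front, rather than letting it behave like a single `*`, keeps the pattern syntax free for the LSP-style meaning if it is adopted later.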
+
+## Parent directories
+
+Map keys generated for the glob option are file paths relative to the directory
+containing the `@embed` attribute. We could, instead, create path keys relative
+to a module root. This would make it possible to embed files from parent
+directories (as long as they are within the same module). We could make this an
+option later on and denote such paths by starting them with `/` to represent the
+module root.
+
+## Security
+
+Embedding is always enabled and may pull in files that end up being exposed in a
+configuration.
+
+The restrictions that disallow embedding files from parent directories, and that
+limit any embedding to files within the containing CUE module, preclude the
+loading of sensitive files from arbitrary places on disk.
+
+A CUE module's
+[`source.kind`](https://cuelang.org/docs/reference/modules/#determining-zip-file-contents)
+ensures that the contents of a published module correspond to a VCS commit.
+Assuming that sensitive files are not included as part of a VCS commit, this
+ensures that a published CUE module will also not contain sensitive files.
+
+It is ultimately, however, the responsibility of the module author to ensure
+that sensitive files are not accidentally included.
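
The path restrictions that the security argument relies on (`/`-separated paths with no ‘.’, ‘..’, or empty elements) can be sketched in Go as follows. This is a minimal illustration, not the proposed implementation: `validateEmbedPath` is a hypothetical name, and the check is purely syntactic, so it does not verify that the file exists or that the directory lies inside a module.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateEmbedPath is a hypothetical helper (not part of any CUE API).
// It rejects paths that contain ".", "..", or empty elements, which also
// rules out absolute paths and references to parent directories.
func validateEmbedPath(p string) error {
	if p == "" {
		return errors.New("embed: empty file path")
	}
	for _, elem := range strings.Split(p, "/") {
		switch elem {
		case "":
			// Covers leading "/", trailing "/", and "//".
			return fmt.Errorf("embed: path %q contains an empty element", p)
		case ".", "..":
			return fmt.Errorf("embed: path %q contains %q", p, elem)
		}
	}
	return nil
}

func main() {
	for _, p := range []string{"foo.json", "data/a file.json", "../secret.json", "/etc/passwd", "a//b.json"} {
		if err := validateEmbedPath(p); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", p)
	}
}
```

Because every element is checked, a path can only descend from the directory containing the `@embed` attribute, which is what prevents reaching files elsewhere on disk.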