
Poor performance on importing big records #1428

Closed
YorikSar opened this issue Jul 3, 2023 · 4 comments
Labels
P2 major: an upcoming release

Comments

@YorikSar
Member

YorikSar commented Jul 3, 2023

I'm trying to generate a big Nickel file with metadata for all packages in nixpkgs. When I import it, Nickel takes a relatively long time to do anything with it. To test this, I removed all values and kept only the nested record structure. See nixpkgs.zip for example records. no_recurse.ncl doesn't include nested attribute sets from nixpkgs and is thus smaller, only ~17k lines/fields. with_recurs.ncl includes all nested attribute sets, totalling ~60k lines/fields. Both take very similar time to process, though:

% echo '(import "no_recurse.ncl").hello' | time nickel export
{}
 export  3.71s user 0.09s system 99% cpu 3.818 total 
% echo '(import "with_recurs.ncl").hello' | time nickel export
{}
 export  4.15s user 0.11s system 99% cpu 4.290 total

This makes me think that the time is not dominated by the size of the input, but rather by some fixed cost of the input merely being big.

We would like to use this in nickel-nix to provide a better interface for importing packages from Nixpkgs.

@yannham
Member

yannham commented Jul 3, 2023

Here are two flamegraphs of exporting full_nixpkgs.ncl, in release mode and in debug mode. While nothing looks obviously wrong at first sight, we spend a lot of time hashing and querying hashmaps (which is expected, given the size of the records, but still probably too much). I'll try to see how many "generated variables" there are: they might cause many more environment accesses when exporting, which we could avoid entirely if we had a proper AST/bytecode for the runtime.

Also, exporting currently performs substitution, which might not be the most efficient approach. We might also skip the validation pass altogether and just fail during serialization when we encounter something we can't handle.

flamegraphs.zip

@yannham
Member

yannham commented Jul 3, 2023

So, simple tests show that neither parsing nor program transformations seem to be at fault here. Evaluation is:

$ time nickel -f full_nixpkgs.ncl
nickel -f full_nixpkgs.ncl  9,71s user 0,22s system 99% cpu 10,017 total
$ time nickel -f full_nixpkgs.ncl export
nickel -f full_nixpkgs.ncl   8,80s user 0,23s system 98% cpu 9,150 total
$ time nickel <<< 'std.deep_seq (import "full_nixpkgs.ncl") 1'
1
nickel <<<   8,46s user 0,09s system 99% cpu 8,553 total
$ time nickel <<< 'std.seq (import "full_nixpkgs.ncl") 1'
1
nickel <<<   5,12s user 0,07s system 99% cpu 5,196 total
$ time nickel <<< 'let x = import "full_nixpkgs.ncl" in 2'
2
nickel <<<   0,71s user 0,08s system 99% cpu 0,794 total

I put them in order from slowest to fastest:

  1. The first example fully (deeply) evaluates the term, substitutes all variables, and pretty-prints the result. This is the slowest.
  2. The second example fully (deeply) evaluates the term, substitutes all variables, and exports it. The serialization code seems at least more efficient than the pretty-printer, which isn't totally surprising; still the same order of magnitude.
  3. The third example fully evaluates the import but doesn't use the result. The time is very close to exporting (within a few tenths of a second, and for some reason there is high variability across runs), so the culprit seems to be evaluation.
  4. Very interestingly, the fourth example only shallowly evaluates the main record. This should be quite fast, but it still takes more than 5 seconds (against 8+ for deep_seq). Without the last test, we might conclude that parsing + program transformations also play a role. But see the next one.
  5. The last one imports the file but doesn't use it at all. This means the imported file is parsed, typechecked and transformed, but not evaluated. This "only" takes 0.79s. I don't know if that is reasonable, but it's one order of magnitude faster than the full export.

Given 5., I strongly suspect our share normal form transformation plays a role here. Since the record has around 70 to 80k entries, this transformation generates 70 to 80k leading let-bindings with generated variables, inserting them into the environment only to immediately look them up again.
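
For readers unfamiliar with the transformation, here is a toy model of what a share-normal-form pass does to a record literal. This is a hypothetical Python sketch for illustration only (Nickel's actual implementation is in Rust and its AST differs); the `%gen_` naming for generated variables is an assumption:

```python
# Toy sketch of a share-normal-form pass (NOT Nickel's real code):
# every field expression of a record is hoisted into a fresh
# generated let-binding, so the field's value can be shared.
import itertools

_counter = itertools.count()

def fresh_var():
    # Hypothetical naming scheme for compiler-generated variables.
    return f"%gen_{next(_counter)}"

def share_normalize(record):
    """Turn {f1 = e1, ..., fn = en} into
    let %g1 = e1 in ... let %gn = en in {f1 = %g1, ..., fn = %gn},
    represented here as (bindings, record-of-variables)."""
    bindings = []
    new_record = {}
    for field, expr in record.items():
        var = fresh_var()
        bindings.append((var, expr))
        new_record[field] = ("var", var)
    return bindings, new_record

bindings, rec = share_normalize({"a": ("num", 1), "b": ("num", 2)})
```

With a record of 80k fields, this model produces 80k bindings, matching the count of environment insertions described above.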

Additionally, because we don't have let-blocks, the interpreter doesn't know that all those lets are independent. I believe each new let-binding opens a new scope, which means the environment, instead of being one hashmap with 80k entries, is more like a linked list of 80k nodes where each node is a hashmap with a single element... if that's true, it's a miracle that the whole thing runs under 10 seconds 🥵
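
The cost difference between the two environment shapes can be modeled with a small sketch (an assumed structure for illustration, not Nickel's real data types): in a chain of single-entry scopes, looking up the oldest binding walks the entire chain, while a flat hashmap answers in one probe.

```python
# Toy model of the two environment shapes: a chain of one-entry
# scopes (as produced by N nested lets) vs. one flat hashmap.

class ScopeChain:
    """Linked list of scopes; each scope is a dict with one entry."""
    def __init__(self):
        self.head = None  # node = (scope_dict, parent_node)

    def bind(self, name, value):
        # Each let opens a fresh scope on top of the chain.
        self.head = ({name: value}, self.head)

    def lookup(self, name):
        # Walk the chain from the innermost scope outward,
        # counting how many scopes we traverse.
        node, steps = self.head, 0
        while node is not None:
            scope, parent = node
            steps += 1
            if name in scope:
                return scope[name], steps
            node = parent
        raise KeyError(name)

n = 10_000
chain, flat = ScopeChain(), {}
for i in range(n):
    chain.bind(f"x{i}", i)
    flat[f"x{i}"] = i

# The oldest binding sits at the tail: the lookup visits all n scopes.
_, steps = chain.lookup("x0")
```

Here `steps` comes out equal to `n`, so a program that touches every binding does quadratic work overall, whereas the flat hashmap stays at one probe per lookup.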

This needs more investigation, but one obvious fix (or at least a substantial improvement) that has been at the back of my mind for some time would be to have a dedicated construct in the AST to hold closures, so that we don't have to store them artificially in the environment anymore (in fact, the best would be to have several ASTs, with one specifically tailored for evaluation). From there, we could even try #837 again.
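
A minimal sketch of that idea, assuming a thunk-based lazy interpreter (a toy Python model; Nickel's interpreter is written in Rust and its actual types differ): an AST node that owns its thunk directly, avoiding the insert-then-lookup round-trip through the environment.

```python
# Hypothetical sketch of a dedicated closure construct in the AST
# (not Nickel's real AST): the node carries a memoized thunk inline,
# so evaluating it needs no generated variable and no environment access.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Thunk:
    """Lazily evaluated, memoized computation (a closure)."""
    compute: Callable[[], Any]
    _value: Any = None
    _forced: bool = False

    def force(self):
        if not self._forced:
            self._value = self.compute()
            self._forced = True
        return self._value

@dataclass
class ClosureNode:
    """AST node holding its thunk directly."""
    thunk: Thunk

node = ClosureNode(Thunk(lambda: 1 + 1))
result = node.thunk.force()
```

Forcing the node evaluates the computation once and caches it; subsequent forces return the cached value, which is the sharing behavior the generated let-bindings were emulating through the environment.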

@yannham
Member

yannham commented Oct 16, 2023

Generated variables indeed seem to be the culprit: on the HEAD of #1647 (soon to be merged), which gets rid of them, I get:

$ time target/release/nickel export <<< '(import "no_recurse.ncl").hello' 
{}
target/release/nickel export <<< '(import "no_recurse.ncl").hello'  0,13s user 0,06s system 99% cpu 0,183 total
$ time target/release/nickel export <<< '(import "with_recurs.ncl").hello'
{}
target/release/nickel export <<< '(import "with_recurs.ncl").hello'  0,39s user 0,09s system 99% cpu 0,487 total

@YorikSar, are those timings reasonable, or would you need a more drastic reduction to make this usable?

@YorikSar
Member Author

Well, a 10x speedup is surely great, and now that it's sub-second, it should be acceptable from a UX perspective to use these files in Nickel. Now that we have this option, we'll need to decide whether we want to switch to it. Thanks!
