Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement contract deduplication optimization #1631

Merged
merged 14 commits into from
Sep 28, 2023
Merged

Implement contract deduplication optimization #1631

merged 14 commits into from
Sep 28, 2023

Conversation

yannham
Copy link
Member

@yannham yannham commented Sep 26, 2023

Fixes #1646. Partially fixes #1622. Related #1484.

Implement the contract deduplication optimization, which avoid applying many times the same contract to a field value. This optimization requires that contracts are idempotent, or it could change the meaning of programs.

Content

  • Add push_dedup method on RuntimeContract, which conditionally pushes a contract to an existing list of pending contracts, given that it's not already in the list. Theoretically, this makes the complexity of merging two pending contract lists size(left)*size(right). In practice, we expect pending contracts list to be very small (< 10), and in fact the deduplication optimization typically participate in keeping the size small.
  • Add a combine_dedup method to combine contract annotation and deduplicate them based on an unsound, "syntax equality" like check. This is to avoid accumulating and printing many copies of the same initial contract when showing the result of evaluation.
  • Fix the contract equality check for the eval environment. In particular, the contract equality checker actually needs to access to the initial environment, which comes with a few techcal complications as to how to properly define the signature in the TermEnvironment trait.
  • Add a unsound mode to contract equality checking, corresponding to comparing them syntactically, used by |
  • The previous point required to thread the initial environment into me functions, to make it accessible when deduplicating contracts. In the end, I put the initial environment into the virtual machine, because it looks like it shouldn't change much over the course of any Nickel subcommands - it's always the environment populated with the stdlib - and got rid of a lot of environment threading, a bit of simplification around the query subcommand as well.

Results

On the codebase mentioned in #1622, this got the runtime down by roughly 50%, equivalently across nickel export or the normal evaluation.

Future work

  • Apply the same optimization to lazy array contracts

@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 10:52 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 16:32 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 16:37 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 16:53 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 17:43 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 26, 2023 18:06 Inactive
@yannham yannham changed the title Implement contract deduplication and merge idempotency optimizations Implement contract deduplication optimization Sep 26, 2023
@yannham yannham marked this pull request as ready for review September 26, 2023 18:08
core/src/typecheck/eq.rs Outdated Show resolved Hide resolved
/// Push a pending contract to a vector of contracts if the contract to add isn't already
/// present in the vector, according to the notion of contract equality defined in
/// [crate::typecheck::eq].
pub fn push_dedup(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is dwarfed by the cost of evaluating the contracts, but will we ever need to worry about the fact that this will be quadratic if we dedup a lot of (distinct) contracts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?

If it is a duplicate, we're likely to see that immediately since we do check the last applied contract first, but if it's not a duplicate we have to iterate through all of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, isn't it problematic to dedup in this way?

foo | PosNumber | Number | PosNumber

is different than

foo | PosNumber | Number

e.g. the latter can't get passed to a function that takes a PosNumber

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is discussed in the PR description:

Add push_dedup method on RuntimeContract, which conditionally pushes a contract to an existing list of pending contracts, given that it's not already in the list. Theoretically, this makes the complexity of merging two pending contract lists size(left)*size(right). In practice, we expect pending contracts list to be very small (< 10), and in fact the deduplication optimization typically participate in keeping the size small.

At least, that's the assumption. It could be interesting to add a bit of infrastructure to enable different level of optimizations, such that we can continue to check that the assumption is verified (at least it is on the codebase I'm experimenting on).

@Radvendii : deduplication currently only happens at runtime, for things that are merged together. That is, it doesn't impact typechecking at all (there's no dedup at this pointà), and if you write explicitly a duplicated contract annotation, such as {foo | Number | Number}, it will not be deduplicated (after all, that's your intent). Deduplication would happen for example when doing {foo | Number} & {foo | Number} & {foo = 1}.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?

I think in the real world case we have, most contracts are in fact singleton, so it probably doesn't change anything. In practice I'm not sure about this assumption of order though, because merge order is left unspecified (who knows, we might decide to reorder merges at some point for other optimizations - it's assumed to be commutative and idempotent), so I'm a bit wary of introducing any effect that depends on the order, unless we have practical clues that doing what we're doing now is actually an issue in practice (in that we exhibit quadratic behavior AND it has a noticeable impact: in practice, I'm yet to see a value with more than 3 distinct contracts applied, linear in 3 elements is not really an issue I suspect)

/// Push a pending contract to a vector of contracts if the contract to add isn't already
/// present in the vector, according to the notion of contract equality defined in
/// [crate::typecheck::eq].
pub fn push_dedup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed before that it might be sufficient to just check the last contract applied, since most of the time duplicates come together. Is that indeed how it panned out in the real-world case we tested?

If it is a duplicate, we're likely to see that immediately since we do check the last applied contract first, but if it's not a duplicate we have to iterate through all of them.

Comment on lines +611 to +614
/// **Warning**: the contract equality check used in this function behaves like syntactic
/// equality, and doesn't take the environment into account. It's unsound for execution (it
/// could equate contracts that are actually totally distinct), but we use it only to trim
/// accumulated contracts before pretty-printing. Do not use prior to any form of evaluation.
Copy link
Contributor

@Radvendii Radvendii Sep 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be sound because all the contracts are being applied in the same environment? There's no way for foo | Foo | Foo to sneak in a shadow of for Foo in the second contract application without being different syntactically.

If it's not sound, it seems problematic for pretty-printing as well. We've run into issues when the pretty-printer produces output that doesn't get parsed back the same way, right?

/// Push a pending contract to a vector of contracts if the contract to add isn't already
/// present in the vector, according to the notion of contract equality defined in
/// [crate::typecheck::eq].
pub fn push_dedup(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, isn't it problematic to dedup in this way?

foo | PosNumber | Number | PosNumber

is different than

foo | PosNumber | Number

e.g. the latter can't get passed to a function that takes a PosNumber

core/src/typecheck/eq.rs Outdated Show resolved Hide resolved
Comment on lines +141 to +145
#[derive(Clone, Copy)]
pub struct EvalEnvsRef<'a> {
pub eval_env: &'a eval::Environment,
pub initial_env: &'a eval::Environment,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the issue here is that we can't construct a &EvalEnvsOwned from two &Environments. Do I have that right? Is this a common pattern in rust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly: what I want is not a reference to a tuple but a tuple of references instead. I don't know if it's pattern to define a companion reference type and call it Ref like that, but at least it seemed natural.

yannham and others added 11 commits September 27, 2023 11:57
The previous attempt wasn't fruitful, because we don't pass the initial
environment to the contract equality checker. This commit thread the
initial environment through the virtual machine and various functions to
make it available to the contract deduplicator
Use a special contract deduplication to avoid pretty printing many many
identical type annotations when evaluating. The actual contract
applications were already elided as far as evaluation is concerned, but
contracts annotations were still accumulated without deduplication.
@github-actions github-actions bot temporarily deployed to pull request September 27, 2023 10:06 Inactive
@github-actions github-actions bot temporarily deployed to pull request September 27, 2023 11:58 Inactive
core/src/program.rs Outdated Show resolved Hide resolved
Co-authored-by: Viktor Kleen <[email protected]>
@yannham yannham enabled auto-merge September 28, 2023 16:01
@github-actions github-actions bot temporarily deployed to pull request September 28, 2023 16:04 Inactive
@yannham yannham added this pull request to the merge queue Sep 28, 2023
Merged via the queue into master with commit f06bda9 Sep 28, 2023
4 checks passed
@yannham yannham deleted the bench-q64 branch September 28, 2023 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants