
indexer-alt: separate updates for consistent sequential pipelines #20482

Open · wants to merge 5 commits into `main`
Conversation

@amnn (Member) commented Dec 2, 2024

Description

Use object changes from transaction effects to determine which changes to consistent tables correspond to new rows and which correspond to updates. This lets us avoid `INSERT ... ON CONFLICT DO UPDATE`, which requires postgres to attempt an insert, detect the constraint violation, and then perform the update, so it should improve performance. On the other hand, it means that these pipelines will not work at all if they are started at an arbitrary point in time (because the `UPDATE`s will fail).
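The difference between the two write strategies can be sketched with a small, hypothetical `balances` table (using sqlite3 for portability; the actual pipelines target postgres tables like `sum_coin_balances`):

```python
import sqlite3

# Hypothetical stand-in for a consistent table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (object_id TEXT PRIMARY KEY, balance INTEGER)")

# Strategy 1 (before this PR): blind upsert -- the DB must attempt the
# insert, detect the primary-key conflict, then fall back to the update.
conn.execute(
    "INSERT INTO balances (object_id, balance) VALUES (?, ?) "
    "ON CONFLICT (object_id) DO UPDATE SET balance = excluded.balance",
    ("0xa", 100),
)

# Strategy 2 (this PR): transaction effects already say whether an object
# was created or mutated, so new rows get a plain INSERT and existing rows
# get a plain UPDATE -- no conflict detection needed.
conn.execute("INSERT INTO balances (object_id, balance) VALUES (?, ?)", ("0xb", 50))
conn.execute("UPDATE balances SET balance = ? WHERE object_id = ?", (150, "0xa"))

print(sorted(conn.execute("SELECT object_id, balance FROM balances")))
# [('0xa', 150), ('0xb', 50)]
```

Strategy 2 only works because the effects tell us up front which bucket each row falls into; starting mid-stream would send rows down the `UPDATE` path that were never inserted.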

The largest part of this change was adding support for bulk updates to diesel (see diesel-rs/diesel#2879). This requires opting in to breaking changes by exposing diesel's internals, so to limit the fallout, that support has been added in its own crate.

Finally, as part of this change, I ran into a flag that can be set on model types: `#[diesel(treat_none_as_default_value = ...)]`, which defaults to `true`. Setting it to `false` on models that contain optional values should improve statistics collection and may improve performance through prepared-statement caching.

Test plan

Unit tests for update_from query generation, and E2E tests for running updates on a DB with the new DSL:

```
sui$ cargo nextest run -p diesel-update-from
```

Run the indexer before and after the change, dump the resulting tables and make sure the results are the same:

```
sui$ cargo run -p sui-indexer-alt -- generate-config > /tmp/indexer.toml
sui$ cargo run -p sui-indexer-alt -- indexer            \
  --remote-store-url https://checkpoints.mainnet.sui.io/ \
  --last-checkpoint 50000 --config /tmp/indexer.toml    \
  --pipeline sum_obj_types --pipeline sum_coin_balances

sui$ psql postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt
sui_indexer_alt=# COPY
    (SELECT object_id, object_version, owner_kind, owner_id FROM sum_obj_types ORDER BY object_id)
TO
    '/tmp/objs.csv'
WITH
    DELIMITER ',' CSV HEADER;
sui_indexer_alt=# COPY
    (SELECT object_id, object_version, owner_id, coin_balance FROM sum_coin_balances ORDER BY object_id)
TO
    '/tmp/coins.csv'
WITH
    DELIMITER ',' CSV HEADER;
```

Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

## Description

Although postgres supports bulk-updating rows using `VALUES`, diesel
does not support it natively. This change adds that support. It lives
in its own crate so that we can limit the fallout of depending on
diesel's breaking-changes feature (which we need in order to access the
types diesel converts collections of model types into, ready to be
inserted into the DB).
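The shape of the SQL being generated can be sketched with a toy renderer (a hypothetical Python sketch of the statement shape, not the `diesel-update-from` crate's actual API):

```python
def update_from_values(table, key, columns, rows):
    """Render a postgres-style bulk UPDATE driven by a VALUES list.

    Hypothetical helper for illustration; the real crate builds this
    through diesel's query builder, not string formatting.
    """
    n = len(columns)
    # One numbered placeholder per cell: ($1, $2), ($3, $4), ...
    placeholders = ", ".join(
        "(" + ", ".join(f"${i * n + j + 1}" for j in range(n)) + ")"
        for i in range(len(rows))
    )
    # Every non-key column is overwritten from the VALUES alias.
    sets = ", ".join(f'"{c}" = excluded."{c}"' for c in columns if c != key)
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = (
        f'UPDATE "{table}" SET {sets} '
        f'FROM (VALUES {placeholders}) AS excluded ({cols}) '
        f'WHERE ("{table}"."{key}" = excluded."{key}")'
    )
    binds = [v for row in rows for v in row]  # row-major bind order
    return sql, binds

sql, binds = update_from_values(
    "objects", "object_id", ["object_id", "version"],
    [(b"\x01", 1), (b"\x02", 2)],
)
print(sql)
```

This mirrors the `UPDATE ... SET ... FROM (VALUES ...) AS excluded (...) WHERE ...` form checked by the crate's snapshot tests.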

## Test plan

Unit tests for the query generation, and E2E tests for running updates
on a DB with this new DSL:

```
sui$ cargo nextest run -p diesel-update-from
```
## Description

Use object changes from transaction effects to figure out which changes
to consistent tables correspond to new rows and which changes correspond
to updates.

This means we can avoid using `INSERT ... ON CONFLICT DO UPDATE`, which
requires postgres to attempt an insert, detect the constraint violation,
and then perform the update, so it should improve performance.

On the other hand, it means that these pipelines will not work at all if
they are started at an arbitrary point in time (because the `UPDATE`s
will fail).
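The classification step can be read as a partition over the transaction's object changes (a Python sketch with hypothetical change-kind names; the indexer's real types come from the transaction effects in `sui_types`):

```python
# Hypothetical sketch of the classification step, not the indexer's code.
def partition_changes(changes):
    """Split (object_id, kind) pairs into insert / update / delete buckets."""
    inserts, updates, deletes = [], [], []
    for object_id, kind in changes:
        if kind in ("created", "unwrapped"):
            inserts.append(object_id)   # row did not exist: plain INSERT
        elif kind == "mutated":
            updates.append(object_id)   # row already exists: plain UPDATE
        elif kind in ("deleted", "wrapped", "unwrapped_then_deleted"):
            deletes.append(object_id)   # row must be removed
    return inserts, updates, deletes

ins, upd, dels = partition_changes(
    [("0xa", "created"), ("0xb", "mutated"), ("0xc", "wrapped")]
)
print(ins, upd, dels)  # ['0xa'] ['0xb'] ['0xc']
```

Starting from an arbitrary checkpoint breaks the `updates` bucket: those rows were never inserted, so the corresponding `UPDATE`s have nothing to modify.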

## Test plan

Run the indexer before and after the change, dump the resulting tables
and make sure the results are the same:

```
sui$ cargo run -p sui-indexer-alt -- generate-config > /tmp/indexer.toml
sui$ cargo run -p sui-indexer-alt -- indexer            \
  --remote-store-url https://checkpoints.mainnet.sui.io \
  --last-checkpoint 50000 --config /tmp/indexer.toml    \
  --pipeline sum_obj_types --pipeline sum_coin_balances

sui$ psql postgres://postgres:postgrespw@localhost:5432/sui_indexer_alt
sui_indexer_alt=# COPY
    (SELECT object_id, object_version, owner_kind, owner_id FROM sum_obj_types ORDER BY object_id)
TO
    '/tmp/objs.csv'
WITH
    DELIMITER ',' CSV HEADER;
sui_indexer_alt=# COPY
    (SELECT object_id, object_version, owner_id, coin_balance FROM sum_coin_balances ORDER BY object_id)
TO
    '/tmp/coins.csv'
WITH
    DELIMITER ',' CSV HEADER;
```
## Description

Add `#[diesel(treat_none_as_default_value = false)]` to model types that
include optional fields. This affects how those fields are written out
to SQL when they contain `None`. Previously (and by default), such a
field would be rendered as the SQL keyword `DEFAULT`; after this change,
it is rendered as a parameter binding, bound to `NULL`.

This is semantically identical in our case, because we don't set default
values, but it also results in less variety among prepared statements
(regardless of the content of fields, they are now all represented by
bindings), which improves per-statement grouping of statistics, and
could also improve performance if those prepared statements can be
cached and re-used.
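The effect on statement variety can be shown with a toy renderer (a sketch; diesel's real output differs in detail): splicing the `DEFAULT` keyword in for `None` produces a different statement text per row shape, while binding every column produces one stable text.

```python
def render(row, none_as_default):
    """Toy INSERT renderer: either splice DEFAULT for missing values,
    or bind every column (None becomes a bound NULL)."""
    parts = []
    for value in row:
        if value is None and none_as_default:
            parts.append("DEFAULT")  # literal keyword: changes the SQL text
        else:
            parts.append("?")        # bind parameter: SQL text stays stable
    return f"INSERT INTO t (a, b) VALUES ({', '.join(parts)})"

rows = [(1, None), (2, 3), (None, None)]

# With DEFAULT splicing, each None-pattern yields a distinct statement...
assert len({render(r, True) for r in rows}) == 3
# ...with uniform bindings, every row shares one cacheable statement.
assert len({render(r, False) for r in rows}) == 1
```

One statement text means the server's per-statement statistics aggregate cleanly, and a prepared-statement cache gets hits instead of a fresh plan per `None`-pattern.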

## Test plan

Re-run the indexer on the first 100,000 checkpoints.
@amnn amnn self-assigned this Dec 2, 2024

Avoid the dependency on `pq-sys`.
@amnn (Member, Author) commented:

I actually think I don't need this anymore.

@gegaowp (Contributor) left a comment:

looks good overall, now I can relate more about the pain caused by diesel ..

@@ -59,38 +60,49 @@ impl Processor for SumCoinBalances {
}
}

// Deleted and wrapped coins
// Do a first pass to add updates without their associated contents into the `values`
// mapping, based on the transaction's object changes.
for change in tx.effects.object_changes() {
A contributor commented:

TransactionEffectsAPI has methods of

    fn created(&self) -> Vec<(ObjectRef, Owner)>;
    fn mutated(&self) -> Vec<(ObjectRef, Owner)>;
    fn unwrapped(&self) -> Vec<(ObjectRef, Owner)>;
    fn deleted(&self) -> Vec<ObjectRef>;
    fn unwrapped_then_deleted(&self) -> Vec<ObjectRef>;
    fn wrapped(&self) -> Vec<ObjectRef>;

it seems cleaner to use that instead of classifying object changes again?

@@ -14,6 +14,7 @@ use crate::schema::{

#[derive(Insertable, Debug, Clone, FieldCount)]
#[diesel(table_name = kv_objects, primary_key(object_id, object_version))]
#[diesel(treat_none_as_default_value = false)]
A contributor commented:

👍

},
]);

assert_display_snapshot!(debug_query::<Pg, _>(&query), @r###"UPDATE "objects" SET "version" = excluded."version", "kind" = excluded."kind", "owner" = excluded."owner", "type_" = excluded."type_" FROM (VALUES ($1, $2, $3, $4, $5), ($6, $7, $8, $9, $10)) AS excluded ("object_id", "version", "kind", "owner", "type_") WHERE ("objects"."object_id" = excluded."object_id") -- binds: [[1, 2, 3], 1, 2, None, Some("type"), [4, 5, 6], 2, 3, Some([7, 8, 9]), None]"###);
A contributor commented:

👍
