Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support window functions in Cube Store #8932

Open
ChloeBellm opened this issue Nov 8, 2024 · 4 comments
Open

Support window functions in Cube Store #8932

ChloeBellm opened this issue Nov 8, 2024 · 4 comments
Assignees
Labels
cube store Issues relating to Cube Store enhancement New feature proposal pre-aggregations Issues related to pre-aggregations

Comments

@ChloeBellm
Copy link

ChloeBellm commented Nov 8, 2024

Describe the bug
I think this is a bug based on the error message but if this isn't supported would love to hear workaround suggestions as this is currently preventing us from using pre aggregations.

I'm expecting to create a pre aggregation with column A. I then want to query column B, which is a measure based on a window function on A but get the following error:

Internal: Internal error: unsupported operation. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

The use case for this is to get a "totals" row, so my window function is just sum(sum(A)) over ()

To Reproduce
Steps to reproduce the behavior:

  1. Create cube with columns A and B, where B is a measure defined as:
total: {
          sql: (CUBE) => `sum(${CUBE[`A`]}) over ()`,
          type: `number`,
          title: `A Total`,
}
  1. Create a rollup pre aggregation with A as the only measure, no dimensions
  2. Try querying B, and it generates the following SQL:
SELECT
  sum(`my_cube__A`) `my_cube__A`,
  sum(sum(`my_cube__A`)) over () `my_cube__B`
FROM
  prod_pre_aggregations.my_cube_rollup AS `my_cube__rollup`

This is correct but throws the error above.

Expected behavior
I expected it to run the SQL query above and give me a total column as column B.

Example expected output:
A | B
1 | 12
3 | 12
8 | 12

Where B is the total of everything in column A. We should assume filters have been applied, which is why this needs to be calculated after pre aggregations created.

Screenshots
Error on playground:

Screenshot 2024-11-08 at 11 29 51

Minimally reproducible Cube Schema

cube(`Orders`, {
  sql: `
  select 1 as id, 100 as amount, 'new' status
  UNION ALL
  select 2 as id, 200 as amount, 'new' status
  UNION ALL
  select 3 as id, 300 as amount, 'processed' status
  UNION ALL
  select 4 as id, 500 as amount, 'processed' status
  UNION ALL
  select 5 as id, 600 as amount, 'shipped' status
  `,
  measures: {
    totalAmount: {
      sql: `amount`,
      type: `sum`,
    },
    grandTotalAmount: {
          sql: `sum(${CUBE[`totalAmount`]}) over ()`,
          type: `number`,
    },
  },
  dimensions: {
    status: {
      sql: `status`,
      type: `string`,
    },
  },
});

Version:
0.36.7

Additional context
If there are any other suggestions as to how to do totals (and sub totals) we would love to hear these too!

@igorlukanin igorlukanin added question The issue is a question. Please use Stack Overflow for questions. data modeling labels Nov 8, 2024
@igorlukanin
Copy link
Member

Thanks for a very elaborate question @ChloeBellm 🙌

A couple of points here:

  • Currently, Cube Store (pre-aggregations) does not support window functions.
  • In principle, calculating totals/subtotals should be unlocked by the multi-stage calculations feature that is on the roadmap (Multi-stage calculations #8486).
  • For now, the recommended approach to calculate totals/subtotals is to run a granular query and also run a query that has less dimensions. Naturally, the results of the second query will be subtotals/totals for the first one. Then you can combine the results on the presentation side. And both queries will be accelerated by pre-aggregations without any issues.

I hope this helps.

@igorlukanin igorlukanin self-assigned this Nov 8, 2024
@ChloeBellm
Copy link
Author

Thanks @igorlukanin! Good to hear this is on the roadmap.

How would running a query with fewer dimensions work if we want to apply measure filters at the dimension breakdown the user has selected?

For example, if we have some data:
A | B
apple | 20
orange | 30

Where A is a dimension, and B is a "sum" measure.

First query asks for dimension A and measure B with a filter on B > 25 and this returns just orange. The total we'd like to show is therefore 30.
The second "totals" query would just ask for B with the same filter for B > 25? If we did this the filter would perform a check of 50 > 25 which is true so it would return 50 as the total, which is not correct.

Please let me know if I should open a new issue for this question.

@igorlukanin
Copy link
Member

@ChloeBellm Oh, I see now. You have a measure filter, and this basically renders my "less dimensions" workaround useless.

How do you plat to consume the data? It looks like you might need to calculate totals on the client side then—until we get either multi-stage cals or window functions support in Cube Store.

@igorlukanin igorlukanin changed the title Window functions error with pre aggregations Support window functions in Cube Store Nov 8, 2024
@igorlukanin igorlukanin added enhancement New feature proposal cube store Issues relating to Cube Store pre-aggregations Issues related to pre-aggregations and removed question The issue is a question. Please use Stack Overflow for questions. data modeling labels Nov 8, 2024
@ChloeBellm
Copy link
Author

@igorlukanin we want to show users a table of data and a totals row.
The main challenge is with calculated measures as we have measures that are one dimension divided by the other, for example, so to calculate this client side we'd have to specify how those calculations should work. We had actually done this previously but we have many metrics and defining the logic for those in two places is not ideal!
We also use Cube's pagination which means the total in the data returned wouldn't necessarily be the whole total.
Another idea I had was to join a separate Cube with FILTER PARAMS, but I'm not sure I can see how this would work with pre aggregations?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cube store Issues relating to Cube Store enhancement New feature proposal pre-aggregations Issues related to pre-aggregations
Projects
None yet
Development

No branches or pull requests

2 participants