Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WITHIN GROUP query syntax #433

Closed
samuelcolvin opened this issue Sep 18, 2024 · 7 comments
Closed

WITHIN GROUP query syntax #433

samuelcolvin opened this issue Sep 18, 2024 · 7 comments

Comments

@samuelcolvin
Copy link
Member

samuelcolvin commented Sep 18, 2024

@@frankie567:

I just tried with this new database engine and it's clearly day-and-night in terms of performance. We struggled before to load our dashboards for timespans > 3 hours; now it's able to load 30 days span pretty quickly. Really great move 👏

I've an existing query that's no longer compatible. The goal was to get request durations percentiles. It looks like this:

WITH duration AS (
  SELECT
    time_bucket('%time_bucket_duration%', start_timestamp) AS x,
    extract(milliseconds from end_timestamp - start_timestamp) as duration
  FROM records
  WHERE otel_scope_name = 'opentelemetry.instrumentation.asgi' and parent_span_id is null
)
SELECT
  x,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY duration) as percentile_99,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY duration) as percentile_95
FROM duration
GROUP BY x
ORDER BY x DESC;

But DataFusion states it doesn't support WITHIN GROUP:

Capture d’écran 2024-09-18 à 17 19 35

I'm pretty sure there is another way to achieve that query but don't have time right now to explore this ☺️

Originally posted by @frankie567 in #408 (comment)

@samuelcolvin
Copy link
Member Author

Thanks @frankie567 we'll look into this, I think @alexmojaki has something similar working until we can support that syntax in datafusion.

@samuelcolvin
Copy link
Member Author

I've created apache/datafusion#12533, and commented on apache/datafusion#11732. Once I get some guidance on next steps, I'll try to work on it.

@jules-ch
Copy link
Contributor

Same here I have a query using WITHIN thet fails, here getting slowest SQL ordering by P90

WITH query_durations AS (
    SELECT 
        attributes->>'db.statement' as query, 
        EXTRACT(EPOCH FROM (end_timestamp - start_timestamp)) * 1000 AS duration_ms 
    FROM 
        records
    WHERE 
        attributes->>'db.system' = 'influxdb'
        AND service_name = 'server'
),
p90_durations AS (
    SELECT 
        query, 
        percentile_cont(0.90) WITHIN GROUP (ORDER BY duration_ms) AS p90_duration_ms
    FROM 
        query_durations
    GROUP BY 
        query
)
SELECT 
    query, 
    p90_duration_ms
FROM 
    p90_durations
ORDER BY 
    p90_duration_ms DESC
LIMIT 10;

@alexmojaki
Copy link
Contributor

I've updated the query in https://logfire.pydantic.dev/docs/integrations/use-cases/web-frameworks/#query-http-requests-duration-per-percentile:

WITH dataset AS (
  SELECT
    time_bucket('%time_bucket_duration%', start_timestamp) AS x,
    (extract(ms from end_timestamp - start_timestamp)) as duration_ms
  FROM records
  WHERE attributes ? 'http.method'
)
SELECT
  x,
  approx_percentile_cont(duration_ms, 0.50) as percentile_50,
  approx_percentile_cont(duration_ms, 0.90) as percentile_90,
  approx_percentile_cont(duration_ms, 0.95) as percentile_95,
  approx_percentile_cont(duration_ms, 0.99) as percentile_99
FROM dataset
GROUP BY x
ORDER BY x

This works with the new database and doesn't need WITHIN GROUP at all.

@jules-ch
Copy link
Contributor

Yeah I managed to find a workaround with datafusion fucntions

@frankie567
Copy link
Contributor

Works perfectly well, thank you @alexmojaki 🙏

@alexmojaki
Copy link
Contributor

OK, closing this because I think the main problem was the docs recommending a query that didn't work with datafusion. With time this syntax may be supported but it's not really a logfire issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants