Merge pull request #6359 from EnterpriseDB/release/2024-12-18a

Release: 2024-12-18a

Showing 17 changed files with 488 additions and 351 deletions.
66 changes: 66 additions & 0 deletions
advocacy_docs/edb-postgres-ai/analytics/external_tables.mdx

@@ -0,0 +1,66 @@
---
title: Querying Delta Lake Tables in S3-compatible object storage
navTitle: External Tables
description: Access and query data stored as Delta Lake Tables in S3-compatible object storage using external tables
deepToC: true
---

## Overview

External tables let you access and query data stored in S3-compatible object storage using SQL. You create an external table that references data in S3-compatible object storage, and then query that data with standard SQL commands.

## Prerequisites

* An EDB Postgres AI account and a Lakehouse node.
* An S3-compatible object storage location with data stored as Delta Lake Tables.
  * See [Bringing your own data](reference/loadingdata) for more information on how to prepare your data.
* Credentials to access the S3-compatible object storage location, unless it's a public bucket.
  * These credentials are stored within the database, so we recommend creating a separate user with limited permissions for this purpose.

!!! Note Regions, latency and cost
Using an S3 bucket that isn't in the same region as your node will:

* be slow because of cross-region latencies
* incur AWS costs (between $0.01 and $0.02/GB) for data transfer. Currently, these egress costs aren't passed through to you, but we do track them and reserve the right to terminate an instance.
!!!

## Creating an External Storage Location

The first step is to create an external storage location that references the S3-compatible object storage where your data resides. A storage location is an object within the database that you use to access the data; each storage location has a name for this purpose.

You create a named storage location with SQL by executing the `pgaa.create_storage_location` function.
`pgaa` is the name of the extension and namespace that provides the functionality to query external storage locations.
The `create_storage_location` function takes a name for the new storage location and the URI of the S3-compatible object storage location as parameters.
The function can optionally take a third parameter, `options`, which is a JSON object for specifying optional settings, detailed in the [functions reference](reference/functions#pgaacreate_storage_location).
For example, in the options, you can specify the access key ID and secret access key for the storage location to enable access to a private bucket.

The following example creates an external storage location that references a public S3-compatible object storage location:

```sql
SELECT pgaa.create_storage_location('sample-data', 's3://pgaa-sample-data-eu-west-1');
```

The next example creates an external storage location that references a private S3-compatible object storage location:

```sql
SELECT pgaa.create_storage_location('private-data', 's3://my-private-bucket',
  '{"access_key_id": "my-access-key-id", "secret_access_key": "my-secret-access-key"}');
```

## Creating an External Table

After creating the external storage location, you can create an external table that references the data in the storage location.
The following example creates an external table that references a Delta Lake Table in the S3-compatible object storage location:

```sql
CREATE TABLE public.customer ()
USING PGAA
WITH (pgaa.storage_location = 'sample-data', pgaa.path = 'tpch_sf_1/customer');
```

Note that no schema is defined in the `CREATE TABLE` statement. The `pgaa` extension expects the schema to be defined in the storage location: it derives the schema from the metadata stored at the path specified in the `pgaa.path` option and infers the best Postgres-equivalent data types for the columns in the Delta Table.

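To confirm what was inferred, you can inspect the new table through the standard Postgres catalog. A minimal sketch using `information_schema` (the columns returned depend on the Delta Table at your path):

```sql
-- List the columns and the Postgres types that pgaa inferred for them
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_schema = 'public'
  AND table_name = 'customer'
ORDER BY ordinal_position;
```
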
## Querying an External Table

After creating the external table, you can query the data in the external table using standard SQL commands. The following example queries the external table created in the previous step:

```sql
SELECT COUNT(*) FROM public.customer;
```
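
Filters, joins, and aggregates work on external tables just as they do on regular tables. For example, assuming the sample data follows the standard TPC-H `customer` schema (so a `c_mktsegment` column exists), you could run:

```sql
-- Count customers per market segment; c_mktsegment is assumed from
-- the standard TPC-H customer schema and may differ in your data
SELECT c_mktsegment, COUNT(*) AS customers
FROM public.customer
GROUP BY c_mktsegment
ORDER BY customers DESC;
```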