Analytics Resource Hub

Welcome to our curated collection of AWS analytics resources. This repository is a guide for anyone who wants to deepen their understanding of the main analytics categories in the AWS ecosystem. Our analytics team selected these resources (blog posts, whitepapers, YouTube videos, and hands-on workshops) to suit a range of learning styles and expertise levels.

We have organized these materials into distinct analytics domains to facilitate easy navigation and targeted learning. Whether you're looking to build a data lake, implement real-time analytics, or explore the latest in AI-driven insights, you'll find relevant and up-to-date information here.

To navigate this repository, browse the sections below to find the area(s) of analytics you're interested in. We regularly update this collection so that it reflects the latest developments and best practices in AWS analytics services and solutions.

SageMaker Unified Studio

Amazon SageMaker Platform - YouTube Playlist
Introducing the next generation of Amazon SageMaker: The center for all your data, analytics, and AI - Blog

Data Engineering and Integration

Data Engineering Immersion Day - Workshop
Simplify AWS Glue job orchestration and monitoring with Amazon MWAA - Blog
Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud - Blog
Amazon MWAA for Analytics - Workshop
Choosing the Right Orchestration Service for Your Data Pipeline - Community Article
Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog - Blog
Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless - Blog
Monitoring & Troubleshooting for AWS Glue - YouTube Video

Data Lakes & Open Table Formats

Choosing an Open Table Format for Your Transactional Data Lake on AWS - Blog
Use Apache Iceberg in a Data Lake to Support Incremental Data Processing - Blog
Simplify Operational Data Processing in Data Lakes Using AWS Glue and Apache Hudi - Blog
Get Started Managing Partitions for Amazon S3 Tables Backed by the AWS Glue Data Catalog - Blog
Create a Transactional Data Lake with Apache Iceberg - Workshop
Automated Optimization for Apache Iceberg Tables Using AWS Glue Data Catalog - YouTube Video
Modern Data Architecture with AWS Lake Formation - Whitepaper

Data Governance & Security

Enhance data governance with enforced metadata rules in Amazon DataZone - Blog
Data Governance Master Class - Guide
Data Governance in the Age of Generative AI - Blog
Automated Data Governance with AWS Glue Data Quality, Sensitive Data Detection, and AWS Lake Formation - Blog
Apply Enterprise Data Governance and Management Using AWS Lake Formation and AWS IAM Identity Center - Blog
Data Governance on AWS - Workshop
Amazon DataZone - YouTube Playlist

Data Warehousing

Unleash deeper insights with Amazon Redshift data sharing for data lake tables - Blog
Using Amazon Redshift with dbt and MWAA - Workshop
Amazon RDS for MySQL zero-ETL integration with Amazon Redshift Demo - YouTube Video
Amazon Redshift Reference Architectures Powering Customer Success - Guide
Connect and Analyze all your Data with Zero-ETL Approaches - Workshop
Getting started guide for near-real time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift - Blog
Amazon Redshift Deep Dive - Workshop

Real-time Analytics and Streaming

A side-by-side comparison of Apache Spark and Apache Flink for Common Streaming Use Cases - Blog
Exploring Real-Time Streaming for Generative AI Applications - Blog
Real-time Event Driven Decisions with Amazon Redshift Streaming and Amazon MSK - Workshop
Real-time Event Driven Decisions with Amazon MSK and Amazon Redshift Streaming - Workshop
Best Practices for Running Production Workloads using Amazon MSK Tiered Storage - Blog
Streaming Data Solution for Amazon Kinesis - AWS Solution
Build a Real-Time Streaming Analytics Application on Apache Kafka - Community Article
Proactively addressing customer concern in real-time with GenAI, Flink, Kafka, and Kinesis - Workshop
Spark Streaming examples with EMR - Self-Paced Labs

Big Data Analytics

EMR Best practices guide and benchmarks - GitHub Repository
EMR on EKS - YouTube Playlist
EMR Serverless Samples - GitHub QuickStart
Big Data Analytics Options on AWS - Whitepaper
Orchestrate Amazon EMR Serverless jobs with AWS Step Functions - Blog

Search and Discovery

New Amazon CloudWatch and Amazon OpenSearch Service launch an integrated analytics experience - Blog
Dive into Amazon OpenSearch Service - Workshop
Building Retrieval Augmented Generation (RAG) Workflows with Amazon OpenSearch Service - SkillBuilder Self-Paced Labs
Amazon OpenSearch Service Vector Database Capabilities Explained - Blog
Try Semantic Search with the Amazon OpenSearch Service Vector Engine - Blog
Operational Best Practices for Amazon OpenSearch Service - Documentation
Unlock Your Enterprise Data with Intelligent Document Search - Workshop
Amazon OpenSearch Service - YouTube Channel

Data Democratization and Marketplaces

Develop a business chargeback model within your organization using Amazon Redshift multi-warehouse writes - Blog
Demystify Data Sharing and Collaboration Patterns on AWS - Blog
Build and govern your data mesh with Amazon DataZone - Workshop
Set up Cross-Account AWS Glue Data Catalog Access - Blog
Introducing Data Products in Amazon DataZone - Blog
Guidance for Deduplicating Syndicated Data on AWS - AWS Solution
Simplifying workflows with Data Products in Amazon DataZone - YouTube Video
AWS Data Exchange - Workshop

Data for Generative AI

Unlocking the Value of Data as your Differentiator - Blog
10 Tips for Building a Data Foundation for Generative AI - Guide
Data Foundation for Generative AI on AWS Vector Databases - Workshop
Build scalable and serverless RAG workflows with a vector engine for Amazon OpenSearch Serverless and Amazon Bedrock Claude models - Blog
Data Governance in the Age of Generative AI - Blog
Differentiate generative AI applications with your data using AWS analytics and managed databases - Blog
Build up-to-date generative AI applications with real-time vector embedding blueprints for Amazon MSK - Blog

Cost Optimization for Analytics

Speed up queries with the cost-based optimizer in Amazon Athena - Blog
Query big data with resilience using Trino in Amazon EMR with Amazon EC2 Spot Instances for less cost - Blog
Reduce your compute costs for stream processing applications with Kinesis Client Library 3.0 - Blog
Optimize storage costs in Amazon OpenSearch Service using Zstandard compression - Blog
EMR on EKS Best Practice for Cost Performance - GitHub Repository
Monitor and optimize cost on AWS Glue for Apache Spark - Blog
Amazon EMR Cost Optimization - Workshop
Optimize Spark data pipeline on Amazon EKS - Workshop
Optimizing Amazon EMR clusters for cost and scale - Hands-On Lab

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
