Welcome to our curated collection of AWS analytics resources. This repository serves as a comprehensive guide for individuals seeking to deepen their understanding of key analytics categories within the AWS ecosystem. Our analytics team has carefully selected these resources, which include blog posts, white papers, YouTube videos, and hands-on workshops, to provide a diverse range of learning materials suitable for various learning styles and expertise levels.
We have organized these materials into distinct analytics domains to facilitate easy navigation and targeted learning. Whether you're looking to build a data lake, implement real-time analytics, or explore the latest in AI-driven insights, you'll find relevant and up-to-date information here.
To navigate this repository, please use the table of contents below to find the specific area(s) of analytics you're interested in. We regularly update this collection to ensure it reflects the latest developments and best practices in AWS analytics services and solutions.
- SageMaker Unified Studio
- Data Engineering and Integration
- Data Lakes and Lake Houses
- Data Governance & Security
- Data Warehousing
- Real-time Analytics and Streaming
- Open Source Analytics
- Search and Discovery
- Data Democratization and Marketplaces
- Data for Generative AI
- Cost Optimization for Analytics
- Amazon SageMaker Platform - YouTube Playlist
- Introducing the next generation of Amazon SageMaker: The center for all your data, analytics, and AI - Blog
- Data Engineering Immersion Day - Workshop
- Simplify AWS Glue job orchestration and monitoring with Amazon MWAA - Blog
- Create a modern data platform using the Data Build Tool (dbt) in the AWS Cloud - Blog
- Amazon MWAA for Analytics - Workshop
- Choosing the Right Orchestration Service for Your Data Pipeline - Community Article
- Getting started with AWS Glue Data Quality from the AWS Glue Data Catalog - Blog
- Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless - Blog
- Monitoring & Troubleshooting for AWS Glue - YouTube Video
- Choosing an Open Table Format for Your Transactional Data Lake on AWS - Blog
- Use Apache Iceberg in a Data Lake to Support Incremental Data Processing - Blog
- Simplify Operational Data Processing in Data Lakes Using AWS Glue and Apache Hudi - Blog
- Get Started Managing Partitions for Amazon S3 Tables Backed by the AWS Glue Data Catalog - Blog
- Create a Transactional Data Lake with Apache Iceberg - Workshop
- Automated Optimization for Apache Iceberg Tables Using AWS Glue Data Catalog - YouTube Video
- Modern Data Architecture with AWS Lake Formation - Whitepaper
- Enhance data governance with enforced metadata rules in Amazon DataZone - Blog
- Data Governance Master Class - Guide
- Data Governance in the Age of Generative AI - Blog
- Automated Data Governance with AWS Glue Data Quality, Sensitive Data Detection, and AWS Lake Formation - Blog
- Apply Enterprise Data Governance and Management Using AWS Lake Formation and AWS IAM Identity Center - Blog
- Data Governance on AWS - Workshop
- Amazon DataZone - YouTube Playlist
- Unleash deeper insights with Amazon Redshift data sharing for data lake tables - Blog
- Using Amazon Redshift with dbt and MWAA - Workshop
- Amazon RDS for MySQL zero-ETL integration with Amazon Redshift Demo - YouTube Video
- Amazon Redshift Reference Architectures Powering Customer Success - Guide
- Connect and Analyze all your Data with Zero-ETL Approaches - Workshop
- Getting started guide for near real-time operational analytics using Amazon Aurora zero-ETL integration with Amazon Redshift - Blog
- Amazon Redshift Deep Dive - Workshop
- A side-by-side comparison of Apache Spark and Apache Flink for Common Streaming Use Cases - Blog
- Exploring Real-Time Streaming for Generative AI Applications - Blog
- Real-time Event Driven Decisions with Amazon Redshift Streaming and Amazon MSK - Workshop
- Real-time Event Driven Decisions with Amazon MSK and Amazon Redshift Streaming - Workshop
- Best Practices for Running Production Workloads using Amazon MSK Tiered Storage - Blog
- Streaming Data Solution for Amazon Kinesis - AWS Solution
- Build a Real-Time Streaming Analytics Application on Apache Kafka - Community Article
- Proactively addressing customer concern in real-time with GenAI, Flink, Kafka, and Kinesis - Workshop
- Spark Streaming examples with EMR - Self-Paced Labs
- EMR Best practices guide and benchmarks - GitHub Repository
- EMR on EKS - YouTube Playlist
- EMR Serverless Samples - GitHub QuickStart
- Big Data Analytics Options on AWS - Whitepaper
- Orchestrate Amazon EMR Serverless jobs with AWS Step Functions - Blog
- New Amazon CloudWatch and Amazon OpenSearch Service launch an integrated analytics experience - Blog
- Dive into Amazon OpenSearch Service - Workshop
- Building Retrieval Augmented Generation (RAG) Workflows with Amazon OpenSearch Service - SkillBuilder Self-Paced Labs
- Amazon OpenSearch Service Vector Database Capabilities Explained - Blog
- Try Semantic Search with the Amazon OpenSearch Service Vector Engine - Blog
- Operational Best Practices for Amazon OpenSearch Service - Documentation
- Unlock Your Enterprise Data with Intelligent Document Search - Workshop
- Amazon OpenSearch Service - YouTube Channel
- Develop a business chargeback model within your organization using Amazon Redshift multi-warehouse writes - Blog
- Demystify Data Sharing and Collaboration Patterns on AWS - Blog
- Build and govern your data mesh with Amazon DataZone - Workshop
- Set up Cross-Account AWS Glue Data Catalog Access - Blog
- Introducing Data Products in Amazon DataZone - Blog
- Guidance for Deduplicating Syndicated Data on AWS - AWS Solution
- Simplifying workflows with Data Products in Amazon DataZone - YouTube Video
- AWS Data Exchange - Workshop
- Unlocking the Value of Data as your Differentiator - Blog
- 10 Tips for Building a Data Foundation for Generative AI - Guide
- Data Foundation for Generative AI on AWS Vector Databases - Workshop
- Build scalable and serverless RAG workflows with a vector engine for Amazon OpenSearch Serverless and Amazon Bedrock Claude models - Blog
- Data Governance in the Age of Generative AI - Blog
- Differentiate generative AI applications with your data using AWS analytics and managed databases - Blog
- Build up-to-date generative AI applications with real-time vector embedding blueprints for Amazon MSK - Blog
- Speed up queries with the cost-based optimizer in Amazon Athena - Blog
- Query big data with resilience using Trino in Amazon EMR with Amazon EC2 Spot Instances for less cost - Blog
- Reduce your compute costs for stream processing applications with Kinesis Client Library 3.0 - Blog
- Optimize storage costs in Amazon OpenSearch Service using Zstandard compression - Blog
- EMR on EKS Best Practice for Cost Performance - GitHub Repository
- Monitor and optimize cost on AWS Glue for Apache Spark - Blog
- Amazon EMR Cost Optimization - Workshop
- Optimize Spark data pipeline on Amazon EKS - Workshop
- Optimizing Amazon EMR clusters for cost and scale - Hands-On Lab
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.