Commit

let's get started!
AnalyticJeremy authored Apr 14, 2023
1 parent 2646923 commit 9543848
Showing 1 changed file with 25 additions and 2 deletions.
27 changes: 25 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,25 @@
# adbx_vm_analysis
Databricks VM Usage Analysis
# Azure Databricks VM Usage Analysis

Azure Databricks customers should be using [instance pools](https://learn.microsoft.com/en-us/azure/databricks/clusters/pool-best-practices)
for their production workloads. These instance pools will help your jobs run faster (because you don't have to wait for
VMs to spin up) and will make your workload more resilient (because you won't get "Cloud Provisioning" errors).

One common challenge in creating instance pools is knowing how large to make them. Customers may have multiple production workspaces, each
with numerous jobs running at a variety of intervals. Determining the right size for the pools can require complex analysis.

... and that's why I created this tool! It's an accelerator that you can run in your environment to determine your VM usage patterns.
This will give you the insights you need to choose the size of each of your instance pools.

This tool has two phases:

1. **Data Acquisition** - use the Azure Activity Logs to gather information about VM creation and deletion over the past few days
1. **Data Analysis** - analyze the VM usage patterns to determine the most efficient size for your pools
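
To make the idea behind the two phases concrete, here is a minimal sketch (not the accelerator's actual code) of how VM creation and deletion events can be turned into a running count of active VMs. The `(timestamp, operation)` record layout, the sample data, and the `running_vm_counts` helper are all assumptions made for illustration; real Azure Activity Log records carry many more fields and would need to be filtered and mapped into this shape first.

```python
from datetime import datetime

# Hypothetical, simplified event records: (timestamp, operation).
# Real Activity Log entries would first be reduced to this shape.
events = [
    (datetime(2023, 4, 14, 9, 0), "create"),
    (datetime(2023, 4, 14, 9, 5), "create"),
    (datetime(2023, 4, 14, 9, 30), "delete"),
    (datetime(2023, 4, 14, 9, 45), "create"),
    (datetime(2023, 4, 14, 10, 0), "delete"),
    (datetime(2023, 4, 14, 10, 15), "delete"),
]


def running_vm_counts(events):
    """Return [(timestamp, active_vm_count)] after each event, in time order."""
    count = 0
    timeline = []
    for ts, op in sorted(events, key=lambda e: e[0]):
        # A creation adds one active VM; a deletion removes one.
        count += 1 if op == "create" else -1
        timeline.append((ts, count))
    return timeline


timeline = running_vm_counts(events)

# The peak concurrent count is one possible upper bound for a pool's size.
peak = max(count for _, count in timeline)
print(f"Peak concurrent VMs: {peak}")
```

The peak is only one candidate for a pool size. Depending on cost tolerance, a lower value (such as a typical or high-percentile count) may be a better fit, with the remaining VMs provisioned on demand.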

## Setup
TODO: How to set up this accelerator

## Phase 1: Data Acquisition
TODO: Running the first notebook

## Phase 2: Data Analysis
TODO: How to run the second notebook
