(adapted from Step by step approach to perform data analysis in Python)
So you have decided to learn Python, but you have no prior programming experience, and you are unsure where to start and how much Python to learn.
These are some of the common questions a beginner has when getting started with Python (for data-centric applications):
- “How long does it take to learn Python?”
- “How much Python should I learn to perform data analysis?”
- “What are the best books/courses to learn Python?”
- “Should I be an expert Python programmer in order to work with data sets?”
It is good to be confused when you begin to learn a new skill; that is what the author of “learn anything in 20 hours” says.
However, the key phrase here is: Don’t Panic!
Most people have the misconception that performing data analysis in Python requires being proficient in Python programming.
Coding is fun, but you don't really need to be a coding ninja in Python to do data analysis, and this tutorial has been designed to show you that.
All you need to get started is some basics of (Python) programming and a few very elementary software engineering concepts, just enough to avoid disasters when you go into production, whatever production means to you (e.g. deploying a system online, or sharing the code of your prototype or experiments on a public repo for reproducibility).
In this tutorial, you won't learn how to program in Python. If you are looking for a quick tutorial on Python programming, maybe this is the tutorial for you: Python Programming Tutorial
For a glimpse of what to expect from this tutorial, I would suggest this 5-minute read: 5 amazingly powerful Python libraries for Data Science
(Most of) The materials in this tutorial will be provided as Jupyter Notebooks.
If you don't know what a Jupyter notebook is, or how to use it, please take a look at this quick introductory tour: IPython Notebook Beginner Guide.
For additional details and materials on Jupyter and IPython, here are some other suggested readings:
- Jupyter Notebook the Definitive Guide:
- What is a Jupyter Notebook
- Practical Introduction
- Notebook Examples
The lecture materials are organised as follows:
- Introduction to Jupyter and IPython notebook format (notebook)
- Introduction to `numpy` for numerical computation (notebook)
- Data Representation in Machine Learning and `scipy.sparse` (notebook)
- Dataset for Machine Learning: `pmlb` (notebook)
- Introduction to `pandas` for data analysis (notebook)
- Data Science case study:
- Introduction to Data Visualisation using `bokeh` (notebook)
If you want an introductory overview of Python for Data Science, I strongly recommend the Scipy Lecture Notes: a community-driven project where you can find tutorials (for non-experts) on the scientific Python ecosystem.
Additional books for further reading: