title: Support Workshops for Digital Literacy and Curriculum F20
place: online
time: Oct. 23, Oct. 30, Nov. 12, Nov. 19, Dec. 4
instructors: K.G. Kjelmann, R.D. Kristensen-McLachlan, M. Jacomy, & K.L. Nielbo
Support Workshops are optional hands-on introductions to a technical topic (e.g., data wrangling, web-scraping, machine learning) offered exclusively to participants in Digital Literacy and Digital Curriculum. There is no sign-up; just drop in when a workshop begins. Every workshop requires approximately one hour of preparation (see below), but tech support offers a pre-workshop preparation session one hour before each workshop (e.g., for a workshop that starts at 09:00 AM, online pre-workshop preparation starts at 08:00 AM in the same Zoom room). Zoom links are distributed to participants through Slack and the mailing list. For questions, please contact M. Andesen.
As several of the instructors are trained Software and Data Carpenters, we re-use material from The Carpentries' workshop curricula and, to a lesser extent, CodeRefinery's lesson material.
Support Workshops will continue in S21. If you have any requests for content or other suggestions, please write to K.L. Nielbo. Planned topics for S21 are Machine Learning with Python #2 and Reproducible Coding with Python.
Date | Time | Content | Instructor |
---|---|---|---|
Oct. 23 | 09:00-12:00 | Managing Data with OpenRefine | K.G. Kjelmann |
Oct. 30 | 09:00-12:00 | Basic Scripting with Python* #1 | R.D. Kristensen-McLachlan & K.L. Nielbo |
Nov. 12 | 09:00-12:00 | Basic Scripting with Python* #2 | R.D. Kristensen-McLachlan & K.L. Nielbo |
Nov. 19 | 12:00-15:00 | Introduction to Web-Scraping | M. Jacomy & K.G. Kjelmann |
Dec. 04 | 09:00-12:00 | Machine Learning with Python* #1 | R.D. Kristensen-McLachlan & K.L. Nielbo |
*) To accommodate R users, tech support will provide parallel scripts in R.
Part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected and formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis. OpenRefine is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another.
This lesson will teach you to use OpenRefine to clean and format data effectively and to automatically track any changes you make. Many users report that the tool saves them months of work that would otherwise go into making these edits by hand.
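A typical cleaning task is consolidating values that mean the same thing but are spelled differently. OpenRefine offers clustering for this; the sketch below approximates in plain Python the idea behind its "fingerprint" key-collision method (trim, lowercase, strip punctuation, fold accented characters to ASCII, sort the unique tokens) so you can see why such variants end up in the same cluster. It is a simplified illustration, not OpenRefine's own code, and the sample values are made up.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified approximation of OpenRefine's 'fingerprint' keying method."""
    value = value.strip().lower()
    # Remove punctuation characters.
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Fold accented characters to ASCII: decompose them, then drop the accent marks.
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Split into tokens, drop duplicates, sort, and join back together.
    return " ".join(sorted(set(value.split())))


def cluster(values: list[str]) -> dict[str, list[str]]:
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vs for key, vs in groups.items() if len(set(vs)) > 1}


raw = ["Aarhus University", "aarhus university ", "University, Aarhus",
       "Café Nord", "Cafe Nord", "Copenhagen"]
print(cluster(raw))
```

Each cluster is a candidate for merging into one canonical spelling; in OpenRefine you review and confirm such merges, and the tool records them in the project history.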
Preparation: install OpenRefine; first DOWNLOAD it, then follow these installation instructions: INSTALL. Tech support will provide help with installing OpenRefine on Oct. 23, 08:00-09:00 AM.
The workshop is adapted from the lesson OpenRefine for Social Science Data.
The workshop, which consists of two sessions, introduces how researchers can use basic scripting in Python (and R) to manipulate data, automate analysis, and make research pipelines reproducible. The goal is to provide tools that make it easier to get more done with less work while, at the same time, facilitating open and reproducible science.
The lessons will teach you how to use variables, data structures, control structures, functions, and error handling in Python. We use Jupyter Notebooks to interactively run Python code in the browser.
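As a taste of these concepts, here is a minimal sketch (the names and values are invented for illustration) that combines variables, a list, a loop, a condition, a function, and error handling. It can be pasted into a single Jupyter Notebook cell.

```python
# Variables holding basic data types
title = "Digital Literacy"
n_sessions = 2
weights = [2.5, 3.0, "n/a", 4.5]  # a list (data structure) with one messy entry


def mean_of_numbers(values):
    """Return the mean of the entries that can be read as numbers."""
    total = 0.0
    count = 0
    for value in values:                 # control structure: loop
        try:                             # error handling
            number = float(value)
        except ValueError:
            print(f"Skipping non-numeric value: {value!r}")
            continue
        total += number
        count += 1
    if count == 0:                       # control structure: condition
        raise ValueError("no numeric values found")
    return total / count


print(title, n_sessions)
print(mean_of_numbers(weights))
```

Instead of crashing on the `"n/a"` entry, the function reports it and carries on, which is the kind of robustness the Errors and Exceptions episode is about.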
Preparation: tech support offers Jupyter in the cloud, but if you want to install it locally, please download and install the Individual Edition of the Anaconda Distribution, obtain the lesson material, and follow Option A: Jupyter Notebook.
Episode | Content | Instructor |
---|---|---|
A Python Calculator | Basic data types; variable assignment | R.D. Kristensen-McLachlan & K.L. Nielbo |
Analyzing Tabular Data | Why tabular data? Processing tabular data | R.D. Kristensen-McLachlan & K.L. Nielbo |
Visualizing Tabular Data | Basic visualization; group visualizations | R.D. Kristensen-McLachlan & K.L. Nielbo |
Repeating Operations | Computers love repetitions; applying the same operation to different values | R.D. Kristensen-McLachlan & K.L. Nielbo |
Collections of Values | How to bundle values; list operations | R.D. Kristensen-McLachlan & K.L. Nielbo |
Analyzing Multiple Files | Reading many files; visualizing many files | R.D. Kristensen-McLachlan & K.L. Nielbo |
Logical Conditions | Making choices; Boolean operators | R.D. Kristensen-McLachlan & K.L. Nielbo |
Packaging Code in Functions | Function definition; defining vs. calling | R.D. Kristensen-McLachlan & K.L. Nielbo |
Errors and Exceptions | Is Python rude? Error handling | R.D. Kristensen-McLachlan & K.L. Nielbo |
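The Analyzing Multiple Files episode applies one operation to a whole batch of files. A minimal sketch of that pattern follows, assuming hypothetical CSV files with a `score` column; it uses only the standard library's `glob` and `csv` modules so it runs without extra packages (the lesson material may use other libraries).

```python
import csv
import glob
import os
import tempfile

# Create two small example CSV files (hypothetical data) in a temporary folder.
folder = tempfile.mkdtemp()
examples = {
    "survey_2019.csv": [("respondent", "score"), ("a", "3"), ("b", "5")],
    "survey_2020.csv": [("respondent", "score"), ("c", "4"), ("d", "6"), ("e", "8")],
}
for name, rows in examples.items():
    with open(os.path.join(folder, name), "w", newline="") as f:
        csv.writer(f).writerows(rows)


def mean_score(path):
    """Read one CSV file and return the mean of its 'score' column."""
    with open(path, newline="") as f:
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    return sum(scores) / len(scores)


# Apply the same function to every file whose name matches the pattern.
results = {}
for path in sorted(glob.glob(os.path.join(folder, "survey_*.csv"))):
    results[os.path.basename(path)] = mean_score(path)

print(results)
```

Because the per-file logic lives in one function, adding a third file requires no code changes; the loop simply picks it up.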
The workshop introduces basic internet technology and shows how to automatically query and extract data available on the internet using Beautiful Soup and Hyphe:
Episode | Content | Instructor |
---|---|---|
How does the internet work? | The structure of the internet and the web (IP, DNS, browser, HTML, etc.); what you need to know as a scholar | M. Jacomy |
Accessing the internet with Python | Making an HTTP request; downloading basic data | K.G. Kjelmann |
Parsing HTML with Beautiful Soup in Python | Dealing with web data; writing a simple script | K.G. Kjelmann |
Web Crawlers | Differences between scraping and crawling; different tools for different needs (harvesting, exploring, archiving, etc.); an example with the crawler Hyphe | M. Jacomy & K.G. Kjelmann |
Working with the internet | Methodological, ethical and legal considerations | M. Jacomy & K.G. Kjelmann |
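To make the difference between scraping and crawling concrete, here is a minimal sketch. Scraping extracts data from one page; crawling repeatedly scrapes pages and follows the links it finds. The workshop uses Beautiful Soup for parsing; this sketch swaps in the standard library's `html.parser` so it runs without extra packages. The three-page site is hypothetical and stored in a dictionary, so the demo works offline; `fetch` shows how a real page could be downloaded but is not called here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag and the text of the <title> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html, base_url):
    """Scraping: extract the title and absolute link URLs from one page."""
    parser = LinkParser()
    parser.feed(html)
    return parser.title.strip(), [urljoin(base_url, link) for link in parser.links]


def fetch(url):
    """Download a page over HTTP (not used in the offline demo below)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, get_page, max_pages=10):
    """Crawling: visit pages breadth-first, scraping each one and queueing new links."""
    seen, queue, titles = {start_url}, deque([start_url]), {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        title, links = scrape(get_page(url), url)
        titles[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


# Offline demo: a tiny hypothetical site held in a dict instead of the web.
site = {
    "https://example.org/": '<title>Home</title><a href="/a">A</a> <a href="/b">B</a>',
    "https://example.org/a": '<title>Page A</title><a href="/b">B</a>',
    "https://example.org/b": '<title>Page B</title><a href="/">Home</a>',
}
print(crawl("https://example.org/", site.__getitem__))
```

Passing `fetch` instead of `site.__getitem__` would crawl real pages; a real crawler should also respect robots.txt, pause between requests, and stay within a clearly defined scope, which ties into the methodological, ethical and legal considerations above.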
The Support Holiday Special is an introduction to machine learning with Python (and R). Throughout Digital Literacy and Curriculum, we have seen examples of how AI, machine learning, and deep learning can accelerate (and automate) research tasks in the humanities and social sciences. Now it is time to get our hands dirty! The workshop focuses on text classification with classical machine learning, using scikit-learn in Python. At the end, we briefly extend the use case to deep neural networks and image classification.
Episode | Content | Instructor |
---|---|---|
Data preparation | Feature engineering; train-test split | K.L. Nielbo |
Model definition | Task definition; parameters vs. hyperparameters | K.L. Nielbo |
Model training | Training goal; training steps | K.L. Nielbo |
Model evaluation | Performance metrics; validation procedures | K.L. Nielbo |
Parameter tuning | Hyperparameter tuning; art form vs. science | K.L. Nielbo |
Application | Predictive vs. statistical modeling | K.L. Nielbo |
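The episodes above map roughly onto a standard scikit-learn text-classification workflow. The following is a minimal sketch, assuming a tiny invented two-class corpus and a TF-IDF plus logistic-regression pipeline; the workshop's actual data and model choices may differ. It requires scikit-learn to be installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: short texts labeled "sports" or "politics".
sports = [
    "the team won the match", "a late goal decided the game", "the striker scored twice",
    "fans cheered the home team", "the coach praised the players", "the league title race",
    "the goalkeeper saved a penalty", "the match ended in a draw", "players trained before the final",
    "the club signed a new striker", "a thrilling game in the cup", "the referee stopped the match",
]
politics = [
    "parliament passed the new law", "the minister announced a budget", "voters went to the polls",
    "the election campaign began", "the government raised taxes", "the party chose a new leader",
    "a debate on the reform bill", "the prime minister met the cabinet", "the opposition criticized the budget",
    "a vote in parliament today", "the senator proposed a policy", "the coalition government collapsed",
]
texts = sports + politics
labels = ["sports"] * len(sports) + ["politics"] * len(politics)

# Data preparation: hold out a stratified test set before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Model definition: TF-IDF features (feature engineering) feeding a linear classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hyperparameter tuning: C is chosen by 3-fold cross-validation on the training set;
# the classifier's learned weights are the model's parameters.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)  # model training (refits on all training data with the best C)

# Model evaluation on held-out texts never seen during training or tuning.
predictions = search.predict(X_test)
print("best C:", search.best_params_["clf__C"])
print("test accuracy:", accuracy_score(y_test, predictions))
print(classification_report(y_test, predictions))
```

With only six test texts the scores are illustrative at best; the point is the order of steps, in particular that the test set is split off before any tuning so the final evaluation stays honest.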