GitHub - pbvahlst/support-workshops-f20

title: Support Workshops for Digital Literacy and Curriculum F20
place: online
time: Oct. 23, Oct. 30, Nov. 12, Nov. 19, Dec. 4
instructors: K.G. Kjelmann, R.D. Kristensen-McLachlan, M. Jacomy, & K.L. Nielbo

Support Workshops are optional hands-on introductions to a technical topic (e.g., data wrangling, web scraping, machine learning) offered exclusively to participants of the Digital Literacy and Digital Curriculum courses. There is no sign-up; just drop in when a workshop begins. Every workshop requires approximately one hour of preparation (see below), and tech support offers a pre-workshop preparation session one hour before each workshop (e.g., a workshop that starts at 09:00 AM will offer online pre-workshop preparation at 08:00 AM in the same Zoom room). Zoom links are distributed to participants through Slack and the mailing list. For questions, please contact M. Andesen.

As several of the instructors are trained Software and Data Carpenters, we re-use material from The Carpentries' workshop curricula and, to a lesser extent, CodeRefinery's lesson material.

Support Workshops will continue in S21. If you have any requests for content or suggestions, please write to K.L. Nielbo. Planned topics for S21 are Machine Learning with Python #2 and Reproducible Coding with Python.

Workshops at a Glance

Date	Time	Content	Instructor
Oct. 23	09:00-12:00	Managing Data with OpenRefine	K.G. Kjelmann
Oct. 30	09:00-12:00	Basic Scripting with Python* #1	R.D. Kristensen-McLachlan & K.L. Nielbo
Nov. 12	09:00-12:00	Basic Scripting with Python* #2	R.D. Kristensen-McLachlan & K.L. Nielbo
Nov. 19	12:00-15:00	Introduction to Web-Scraping	M. Jacomy & K.G. Kjelmann
Dec. 04	09:00-12:00	Machine Learning with Python* #1	R.D. Kristensen-McLachlan & K.L. Nielbo

*) To accommodate R users, tech support will provide parallel scripts in R.

Managing Data with OpenRefine

A part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting made consistent. This step must be taken with the same care and attention to reproducibility as the analysis. OpenRefine is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another.

This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them literally months of work trying to make these edits by hand.
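To give a feel for what "making formatting consistent" means in practice, here is a minimal Python sketch of grouping spelling variants under a shared key. It is loosely modeled on the idea behind key-collision clustering in OpenRefine, but it is a simplified illustration and not OpenRefine's actual implementation. The function names and the sample values are hypothetical.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    no_punct = str.maketrans("", "", string.punctuation)
    cleaned = value.strip().lower().translate(no_punct)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a key; keep only groups with several variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical messy column: three variants of one name plus one other value.
names = [
    "Aarhus University",
    "University, Aarhus",
    " aarhus university ",
    "Aalborg University",
]

for group in cluster(names):
    print(group)
```

In OpenRefine the same kind of grouping is done interactively, and every merge is recorded in the project history so it can be reviewed or replayed.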

Preparation: install OpenRefine. First DOWNLOAD it, then follow the installation instructions (INSTALL). Tech support will help with the installation of OpenRefine on Oct. 23, 08:00-09:00 AM.

The workshop is cloned and modified from OpenRefine for Social Science Data.

Basic Scripting with Python

The workshop, which consists of two episodes, introduces how researchers can use basic scripting in Python (and R) to manipulate data, automate analysis, and make research pipelines reproducible. The goal is to provide tools that make it easier to get more done with less work while, at the same time, facilitating open and reproducible science.

The lessons will teach you how to use variables, data structures, control structures, functions, and error handling in Python. We use Jupyter Notebooks to interactively run Python code in the browser.
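As a rough preview of how these building blocks fit together, the short sketch below combines variable assignment, a list, a loop with a condition, a function, and basic error handling. It is an illustration written for this overview and is not taken from the lesson material; the names and numbers are made up.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    return sum(values) / len(values)


def safe_mean(values):
    """Return the mean, or None when the input is empty (error handling)."""
    try:
        return mean(values)
    except ValueError as err:
        print(f"Skipping: {err}")
        return None


# Variable assignment and a data structure (a list of made-up readings).
readings = [2.0, 4.0, 6.0]
threshold = 3.0

# Control structures: a loop with a condition.
high = []
for value in readings:
    if value > threshold:
        high.append(value)

print(mean(readings))
print(high)
print(safe_mean([]))
```

Each cell of a Jupyter Notebook can hold a fragment like this, so the pieces can be run and changed one at a time.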

Preparation: Tech support offers Jupyter in the cloud. Should you want to install it locally, please download and install the individual Anaconda Distribution, obtain the lesson material, and follow Option A: Jupyter Notebook.

Episode	Content	Instructor
A Python Calculator	Basic data type	R.D. Kristensen-McLachlan & K.L. Nielbo
	Variable assignment
Analyzing Tabular Data	Why tabular data?	R.D. Kristensen-McLachlan & K.L. Nielbo
	Process tabular data
Visualizing Tabular Data	Basic visualization	R.D. Kristensen-McLachlan & K.L. Nielbo
	Group visualizations
Repeating Operations	Computers love repetitions	R.D. Kristensen-McLachlan & K.L. Nielbo
	Applying the same operation to different values
Collections of Values	How to bundle values	R.D. Kristensen-McLachlan & K.L. Nielbo
	List operations
Analyzing Multiple Files	Read many files	R.D. Kristensen-McLachlan & K.L. Nielbo
	Visualize many files
Logical Conditions	Making choices	R.D. Kristensen-McLachlan & K.L. Nielbo
	Boolean operators
Packaging Code in Functions	Function definition	R.D. Kristensen-McLachlan & K.L. Nielbo
	Defining vs. calling
Errors and Exceptions	Is Python rude?	R.D. Kristensen-McLachlan & K.L. Nielbo
	Error handling

Introduction to Web Scraping and Crawling

The workshop introduces basic internet technology and shows how to automatically query and extract data available through the internet using Beautiful Soup and Hyphe:

Episode	Content	Instructor
How does the internet work?	The structure of internet and the web: IP, DNS, browser, HTML...	M. Jacomy
	What you need to know as a scholar
Accessing the internet with Python	Making an HTTP request	K.G. Kjelmann
	Downloading basic data
Parsing HTML with Beautiful Soup in Python	Dealing with web data	K.G. Kjelmann
	Writing a simple script
Web Crawlers	Differences between scraping and crawling	M. Jacomy & K.G. Kjelmann
	Different tools for different needs (harvesting, exploring, archiving...)
	An example with the crawler Hyphe
Working with the internet	Methodological, ethical and legal considerations	M. Jacomy & K.G. Kjelmann
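
To illustrate what parsing HTML involves, the sketch below extracts every link and its text from a small page. It deliberately uses `html.parser` from Python's standard library as a stand-in so that it runs without extra installs; the workshop itself uses Beautiful Soup. The page content and the class name `LinkCollector` are hypothetical, and in a real script the HTML string would come from an HTTP response.

```python
from html.parser import HTMLParser

# Hypothetical page content, invented for this example.
PAGE = """
<html><body>
  <h1>Workshops</h1>
  <a href="/openrefine">OpenRefine</a>
  <a href="/python">Python</a>
  <p>No link here</p>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(PAGE)
links = collector.links
print(links)
```

With Beautiful Soup installed, the same extraction is usually shorter: `BeautifulSoup(PAGE, "html.parser").find_all("a")` returns the link tags directly.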

Machine Learning with Python

The Support Holiday special is an introduction to machine learning with Python (and R). Throughout Digital Literacy and Curriculum, we have seen examples of how AI, machine learning, and deep learning can accelerate (and automate) research tasks in the humanities and social sciences. Now it is time to get our hands dirty! The workshop focuses on text classification with classical machine learning using scikit-learn in Python. At the end, we briefly extend the use case to deep neural networks and image classification.

Episode	Content	Instructor
Data preparation	Feature engineering	K.L. Nielbo
	Train-test split
Model definition	Task definition	K.L. Nielbo
	Parameters vs. hyperparameters
Model training	Training goal	K.L. Nielbo
	Training steps
Model evaluation	Performance metrics	K.L. Nielbo
	Validation procedures
Parameter tuning	Hyperparameter tuning	K.L. Nielbo
	Art form vs. science
Application	Predictive vs. statistical modeling	K.L. Nielbo
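
The episodes above follow a common pattern: prepare the data, hold out a test set, train, and evaluate. The plain-Python sketch below walks through that pattern for a toy text classification task. It is only an illustration: the tiny corpus is invented, and the word-overlap "model" is a deliberately naive stand-in for the scikit-learn classifiers used in the workshop.

```python
import random
from collections import Counter

# Hypothetical toy corpus of (text, label) pairs, invented for illustration.
DATA = [
    ("great film loved it", "pos"),
    ("wonderful acting great story", "pos"),
    ("loved the wonderful music", "pos"),
    ("great fun loved it", "pos"),
    ("terrible film hated it", "neg"),
    ("awful acting boring story", "neg"),
    ("hated the boring music", "neg"),
    ("terrible awful boring", "neg"),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Feature engineering + training: count words per label (bag of words)."""
    counts = {}
    for text, label in rows:
        counts.setdefault(label, Counter()).update(text.split())
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap most with the text."""
    scores = {label: sum(c[w] for w in text.split()) for label, c in counts.items()}
    return max(sorted(scores), key=scores.get)


def accuracy(counts, rows):
    """Performance metric: share of rows whose label is predicted correctly."""
    hits = sum(predict(counts, text) == label for text, label in rows)
    return hits / len(rows)


train_rows, test_rows = train_test_split(DATA)
model = train(train_rows)
print(f"train={len(train_rows)} test={len(test_rows)} "
      f"accuracy={accuracy(model, test_rows):.2f}")
```

The key design point is that accuracy is computed only on rows the model never saw during training. Here `test_fraction` and `seed` play the role of settings chosen by the analyst, while the word counts are what the model learns from the data.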
