Jon Chun jon-chun

AI Digital Humanities

Jon A Chun
Co-Founder, Kenyon DH Colab

Contents (UPDATED July 2024, see www.jonachun.com for archive 2023 information)

Overview
Research
Innovation in Higher Ed
Diversity from A Human-Centered AI Curriculum
Code, Products and Patents
Kenyon AI Digital Humanities
Social Media
Mentored Research
Course Descriptions
Organizations

Overview

Hello, my name is Jon Chun and I'm an interdisciplinary ML/AI researcher and educator. I focus on bridging traditional academic divisions, AI research, industry best practices, and related social topics like government regulation, ethics and entrepreneurship. My research centers on ML/AI approaches to language, narrative, emotion, cognition, and persuasion/deception using data science, statistical machine learning and deep learning including NLP, LLM, and LMM. I also work on eXplainable AI (XAI), fairness-accuracy-transparency-explainability (FATE), AI ethical auditing and AI regulation.

I'm a lifetime entrepreneur, intrapreneur, and innovator in diverse fields from network security and education to finance, insurance, and healthcare across environments ranging from hyper-efficient Silicon Valley startups to hidebound traditions in higher education. I've presented and published the first interdisciplinary AI research on storytelling and emotion at major conferences and journals like Narrative and the Modern Language Association (MLA). In 2016 I co-founded the world’s first human-centered AI curriculum to engage domain experts from every discipline from literature and music to political science to economics. To the best of our knowledge, we coined the term ‘AI Digital Humanities’ and have mentored over 300 ML/AI DH projects with approximately 60,000 downloads from top institutions worldwide as of October 2024. I'm a co-principal investigator for the US NIST AI Safety Institute representing the Modern Language Association and with the IBM-Notre Dame Tech Ethics Lab researching LLM prediction capabilities.

Previously, I co-founded the world’s largest privacy and anonymity website with investors including In-Q-Tel. I took over as CEO to pivot the company in the face of collapsing ad revenue, co-authored several patents on the first browser-based VPN appliance and successfully sold the Silicon Valley startup to Symantec. There, as a Director of Development, I oversaw the successful launch of our rebranded product. Before that, I served as CIO for the premier boutique return-to-work firm in Silicon Valley and co-founded and was CTO for two international startups in Latin America and Japan. In medical school, I was an American Heart Association research fellow and published on gene therapy and the first web-based electronic medical record system in the American Medical Informatics Journal. In grad school, I was the first US-based Japanese localization engineer for DELL and first engineer analyst of Japanese patents for the US semiconductor consortium SEMATECH. I also worked in financial reporting in Tokyo, the Lawrence Berkeley Lab’s synchrotron facility (ALS), and Computer Associates serving the aerospace IT industry.

There isn’t a topic I’m not curious about, although keeping up with AI is my primary obsession. It’s exciting to work at the epicenter of technology that mirrors so many core human traits in a field that progresses weekly and is poised to terraform every sphere of humanity. I have enjoyed working on exceptionally driven, focused, curious and creative teams on high-impact projects. I like collaboration, creative engineering, making functional design beautiful, presentations, and sales. I speak English (native), Spanish (US Foreign Service Exam), Japanese (日本語能力試験), French (college) and would like a chance to relearn my forgotten Chinese (college). Former baseball, soccer, wrestling, robotics, and improv Destination Imagination coach.

Research

I created the open-source library SentimentArcs in 2019, at the time the largest ensemble for diachronic sentiment analysis and the basis for Katherine Elkins's “The Shapes of Stories” (Cambridge UP 2022). I presented some of the earliest GPT-2 story generation work at Narrative2020 and have since published in Cultural Analytics and Narrative on AI and narrative. I've mentored approximately three hundred computational Digital Humanities projects since 2017 across virtually every department of Kenyon College as part of the Integrated Program for Humane Studies and the Scientific Computing programs. I co-founded the AI Digital Humanities Colab, the world's first human-centered AI Digital Humanities curriculum at Kenyon, and our AI KDH research colab. I currently have research papers pending on using LLMs to compare multiple translations of Proust, multimodal (dialog+image) diachronic sentiment arcs in film, emotional hacking of LLM high-stakes decision-making, a novel benchmark on semantic similarity, IP infringement and creativity using Narrative theory, and an updated and expanding ethical audit of the leading LLMs. I'm also a co-author on an ICML position paper that was invited as an oral presentation this year in Vienna. My current research projects focus on AI persuasion, manipulation and deception as well as using LLMs for predictive analytics and decision-making on structured tabular data. (July 2024)

Google Scholar: jonchun2000
Academia.edu: kenyon.academia.edu/jchun
Following: ArXiv CS.(x)

Recent Highlights

"How Well Can GenAI Predict Human Behavior? Auditing State-of-the-Art Large Language Models for Fairness, Accuracy, Transparency, and Explainability (FATE)"
IBM-Notre Dame Tech Ethics Lab Grant (paper forthcoming)

This research project targets a pivotal issue at the intersection of technology and ethics: surfacing how Large Language Models (LLMs) reason in high-stakes decision-making over humans. Our central challenge is enhancing the explainability and transparency of opaque black-box LLMs and our specific use-case is predicting recidivism, a real-world application that influences sentencing, bail, and early release decisions. To the best of our knowledge, this is the first study to integrate and contrast three different sources of ethical decisions: human, statistical machine learning (ML), and LLMs. Methodologically, we propose a novel framework that combines state-of-the-art (SOTA) qualitative analyses of LLMs with SOTA quantitative performance of traditional statistical ML models. Additionally, we compare these two approaches with documented predictions by human experts. This multi-model human-AI approach aims to surface both faulty predictions across all three as well as correlate patterns of both valid and faulty reasoning by LLMs. This configuration offers a more comprehensive evaluation of their performance, fairness, and reliability that is essential for building trust in LLMs. The anticipated outcomes of our project include a test pipeline to analyze and identify discrepancies and edge cases in both predictions and the reasoning behind them. This pipeline includes automated API scripts, an array of simple to complex prompt engineering strategies, as well as various statistical analyses and visualizations. The pipeline architecture will be designed to generalize to other use cases and accommodate future models and prompt strategies to provide maximal reuse for the AI safety community and future studies. This project not only seeks to advance the field of XAI but also to foster a deeper understanding of how AI can be aligned with ethical principles.
By highlighting the intricacies of AI decision-making in a context fraught with moral implications, we underscore the urgent need for models that are not only technologically advanced but also ethically sound and transparent.
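
The three-way comparison described above (human experts, statistical ML, and LLMs) can be sketched roughly as follows. This is only an illustration of the general idea, not the project's pipeline: the `Case` record, its field names, and the helper functions are all hypothetical, and any values passed in would be the caller's own data.

```python
from dataclasses import dataclass

# Hypothetical record layout: one row per case, one prediction per source.
SOURCES = ("human", "ml", "llm")


@dataclass(frozen=True)
class Case:
    case_id: str
    outcome: bool  # observed outcome (True = reoffended)
    human: bool    # human expert prediction
    ml: bool       # statistical ML model prediction
    llm: bool      # LLM prediction


def accuracy_by_source(cases):
    """Fraction of cases each source predicted correctly."""
    return {
        s: sum(getattr(c, s) == c.outcome for c in cases) / len(cases)
        for s in SOURCES
    }


def disagreements(cases):
    """Cases where the three sources do not all give the same prediction."""
    return [c for c in cases if len({getattr(c, s) for s in SOURCES}) > 1]


def shared_errors(cases):
    """Cases every source got wrong: candidates for closer qualitative review."""
    return [c for c in cases if all(getattr(c, s) != c.outcome for s in SOURCES)]
```

Separating disagreements from shared errors is a design choice in this sketch: the first points at cases where the sources reason differently, the second at cases that are hard for all of them.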

DESCRIPTION: Emotional hacking high-stakes AI decision-making models (accepted, under review)

ANONYMIZED ABSTRACT: As artificial intelligence becomes increasingly integrated into various technologies and decision-making processes, concerns about trust, safety, and potential manipulation of humans by AI systems are growing. This study, however, explores the reverse scenario: how humans might influence AI decision-making. The research examines the impact of prompt reframing and empathetic backstories on the ethical decision-making processes of advanced language models. A novel benchmark is introduced, enabling human-in-the-loop evaluations of how both confidence and compassion in AI ethical decision-making are affected by framing and empathy. This research represents a pioneering effort in understanding the bidirectional nature of human-AI influence in ethical contexts.

"Comparative Global AI Regulation: Policy Perspectives from the EU, China, and the US"
SSRN and ArXiv (final edits)

As a powerful and rapidly advancing dual-use technology, AI offers both immense benefits and worrisome risks. In response, governing bodies around the world are developing a range of regulatory AI laws and policies. This paper compares three distinct approaches taken by the EU, China and the US. Within the US, we explore AI regulation at both the federal and state level, with a focus on California's pending Senate Bill 1047. Each regulatory system reflects distinct cultural, political and economic perspectives. Each also highlights differing regional perspectives on regulatory risk-benefit tradeoffs, with divergent judgments on the balance between safety versus innovation and cooperation versus competition. Finally, differences between regulatory frameworks reflect contrastive stances in regards to trust in centralized authority versus trust in a more decentralized free market of self-interested stakeholders. Taken together, these varied approaches to AI innovation and regulation influence each other, the broader international community, and the future of AI regulation.

"AIStorySimilarity: Quantifying Story Similarity Using Narrative for Search, IP Infringement, and Guided Creativity"
ACL EMNLP/CoNLL, Miami (12-16 November 2024)

Stories are central for interpreting experiences, communicating and influencing each other via films, medical, media, and other narratives. Quantifying the similarity between stories has numerous applications including detecting IP infringement, detecting hallucinations, search/recommendation engines, and guiding human-AI collaborations. Despite this, traditional NLP text similarity metrics are limited to short text distance metrics like n-gram overlaps and embeddings. Larger texts require preprocessing with significant information loss through paraphrasing or multi-step decomposition. This paper introduces AIStorySimilarity, a novel benchmark to measure the semantic distance between long-text stories based on core structural elements drawn from narrative theory and script writing. Based on four narrative elements (characters, plot, setting, and themes) as well as 31 sub-features within these, we use a SOTA LLM (gpt-3.5-turbo) to extract and evaluate the semantic similarity of a diverse set of major Hollywood movies. In addition, we compare human evaluation with story similarity scores computed three ways: extracting elements from film scripts before evaluation (Elements), directly evaluating entire scripts (Scripts), and extracting narrative elements from the parametric memory of SOTA LLMs without any provided scripts (GenAI). To the best of our knowledge, AIStorySimilarity is the first benchmark to measure long-text story similarity using a comprehensive approach to narrative theory. Code and data are available at https://github.com/anon.
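
One way to picture how per-element scores could roll up into a single story similarity is sketched below. The four element names come from the abstract; everything else is an assumption made for illustration: the sub-feature keys are whatever the caller supplies, the scores are assumed to be numbers on a common scale already produced elsewhere (for example by an LLM), and the equal default weights and function names are hypothetical rather than the benchmark's actual scoring.

```python
from statistics import mean

# The four narrative elements named in the abstract.
NARRATIVE_ELEMENTS = ("characters", "plot", "setting", "themes")


def element_score(subfeature_scores):
    """Average the sub-feature similarity scores for one narrative element."""
    if not subfeature_scores:
        raise ValueError("each element needs at least one sub-feature score")
    return mean(subfeature_scores.values())


def story_similarity(scores, weights=None):
    """Combine per-element scores into one weighted overall similarity.

    `scores` maps element name -> {sub-feature name: similarity score}.
    `weights` (hypothetical, equal by default) maps element name -> weight.
    """
    weights = weights or {e: 1.0 for e in NARRATIVE_ELEMENTS}
    missing = [e for e in NARRATIVE_ELEMENTS if e not in scores]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    per_element = {e: element_score(scores[e]) for e in NARRATIVE_ELEMENTS}
    total_weight = sum(weights[e] for e in NARRATIVE_ELEMENTS)
    overall = sum(per_element[e] * weights[e] for e in NARRATIVE_ELEMENTS) / total_weight
    return overall, per_element
```

Returning the per-element breakdown alongside the overall score keeps the result inspectable, which matters when a similarity judgment has to be explained rather than just reported.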

"Affective AI, Multimodal Sentiment Analysis, Diachronic Sentiment Analysis, Open-Source AI, LLM, LMM, Narrative, Storytelling, Video Analysis"
Frontiers in Computer Science

Affective artificial intelligence and multimodal sentiment analysis play critical roles in designing safe and effective human-computer interactions and are used in diverse applications ranging from social chatbots to eldercare robots. However, emotionally intelligent artificial intelligence can also manipulate, persuade, and otherwise compromise human autonomy. We face a constant stream of ever more capable models that can better understand nuanced, complex, and interrelated sentiments across different modalities including text, vision, and speech. This paper introduces MultiSentimentArcs, a combination of an open and extensible multimodal sentiment analysis framework, a challenging movie dataset, and a novel benchmark. This enables the quantitative and qualitative identification, comparison, and prioritization of conflicting sentiments commonly arising from different models and modalities. Diachronic multimodal sentiment analysis is especially challenging in film narratives where actors, directors, cinematographers and editors use dialog, characters, and other elements in contradiction with each other to accentuate dramatic tension. MultiSentimentArcs uses local open-source software models to democratize artificial intelligence. We demonstrate how a simple 2-step pipeline of specialized open-source software with a large multimodal model followed by a large language model can approximate video sentiment analysis of the commercial state-of-the-art Claude 3 Opus. To the best of our knowledge, MultiSentimentArcs is the first fully open-source diachronic multimodal sentiment analysis framework, dataset, and benchmark to enable automatic or human-in-the-loop exploration, analysis, and critique of multimodal sentiment analysis on long-form narratives. We demonstrate two novel coherence metrics and a methodology to identify, quantify, and explain real-world sentiment models and modalities.
MultiSentimentArcs integrates artificial intelligence with traditional narrative studies and related fields like film, linguistic and cultural studies. It also contributes to eXplainable artificial intelligence and artificial intelligence safety by enhancing artificial intelligence transparency in surfacing emotional persuasion, manipulation, and deception techniques. Finally, it can filter noisy emotional input and prioritize information rich channels to build more performant real-world human computer interface applications in fields like e-learning and medicine. This research contributes to the field of Digital Humanities by giving non-artificial intelligence experts access to directly engage in analysis and critique of research around affective artificial intelligence and human-AI alignment. Code and non-copyrighted data will be available at https://github.com/jon-chun/multisentimentarcs.
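
To make the idea of comparing sentiment across modalities concrete, here is a minimal sketch. It does not reproduce the paper's two coherence metrics, which are not defined on this page; plain Pearson correlation is swapped in as a simple stand-in for agreement between a text arc and an image arc, and the function names and the `threshold` parameter are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sentiment series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def conflicting_segments(text_arc, image_arc, threshold=0.5):
    """Indices where the two modalities differ in sign and by more than threshold."""
    return [
        i
        for i, (t, v) in enumerate(zip(text_arc, image_arc))
        if t * v < 0 and abs(t - v) > threshold
    ]
```

A low correlation flags a film whose dialog and imagery pull apart overall, while the segment indices point at the specific moments worth a human look.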

"In search of a translator: using AI to evaluate what's lost in translation"
Frontiers in Computer Science, 12 August 2024, Sec. Human-Media Interaction, Volume 6 - 2024 | https://doi.org/10.3389/fcomp.2024.1444021 (Credited in Paper)

Machine translation metrics often fall short in capturing the challenges of literary translation in which translators play a creative role. Large Language Models (LLMs) like GPT4o and Mistral offer new approaches to assessing how well a translation mirrors the reading experience from one language to another. Our case study focuses on the first volume of Marcel Proust's “A la recherche du temps perdu,” a work known for its lively translation debates. We use stylometry and emotional arc leveraging the newest multilingual generative AI models to evaluate loss in translation according to different translation theories. AI analysis reveals previously undertheorized aspects of translation. Notably, we uncover changes in authorial style and the evolution of sentiment language over time. Our study demonstrates that AI-driven approaches leveraging advanced LLMs yield new perspectives on literary translation assessment. These methods offer insight into the creative choices made by translators and open up new avenues for understanding the complexities of translating literary works.

"Near to Mid-term Risks and Opportunities of Open-Source Generative AI" ICLR 2024, May 7-11, Vienna, Austria
"Risks and Opportunities of Open-Source Generative AI" (Long form version)

In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about potential risks and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open source Generative AI. We argue for the responsible open sourcing of generative AI models in the near and medium term. To set the stage, we first introduce an AI openness taxonomy system and apply it to 40 current large language models. We then outline differential benefits and risks of open versus closed source AI and present potential risk mitigation, ranging from best practices to calls for technical and scientific contributions. We hope that this report will add a much needed missing voice to the current public discourse on near to mid-term AI safety and other societal impact.

"Informed AI Regulation: Comparing the Ethical Frameworks of Leading LLM Chatbots Using an Ethics-Based Audit to Assess Moral Reasoning and Normative Values"
ArXiv.org (Jan 9, 2024)

With the rise of individual and collaborative networks of autonomous agents, AI is deployed in more key reasoning and decision-making roles. For this reason, ethics-based audits play a pivotal role in the rapidly growing fields of AI safety and regulation. This paper undertakes an ethics-based audit to probe the 8 leading commercial and open-source Large Language Models including GPT-4. We assess explicability and trustworthiness by a) establishing how well different models engage in moral reasoning and b) comparing normative values underlying models as ethical frameworks. We employ an experimental, evidence-based approach that challenges the models with ethical dilemmas in order to probe human-AI alignment. The ethical scenarios are designed to require a decision in which the particulars of the situation may or may not necessitate deviating from normative ethical principles. A sophisticated ethical framework was consistently elicited in one model, GPT-4. Nonetheless, troubling findings include underlying normative frameworks with clear bias towards particular cultural norms. Many models also exhibit disturbing authoritarian tendencies. Code is available at https://github.com/jonchun/llm-sota-chatbots-ethics-based-audit.

"eXplainable AI with GPT4 for story analysis and generation: A novel framework for diachronic sentiment analysis"
Springer International Journal of Digital Humanities 5, 507–532 (2023). https://doi.org/10.1007/s42803-023-00069-8 (Oct 11, 2023)

The recent development of Transformers and large language models (LLMs) offers unique opportunities to work with natural language. They bring a degree of understanding and fluidity far surpassing previous language models, and they are rapidly progressing. They excel at representing and interpreting ideas and experiences that involve complex and subtle language and are therefore ideal for Computational Digital Humanities research. This paper briefly surveys how XAI can be used to augment two Computational Digital Humanities research areas relying on LLMs: (a) diachronic text sentiment analysis and (b) narrative generation. We also introduce a novel XAI greybox ensemble for diachronic sentiment analysis generalizable to any AI classification data points within a structured time series. Under human-in-the-loop supervision (HITL), this greybox ensemble combines the high performance of SOTA blackbox models like gpt-4-0613 with the interpretability, efficiency, and privacy-preserving nature of whitebox models. Two new local (EPC) and global (ECC) metrics enable multi-scale XAI at both the local and global levels. This greybox ensemble framework extends the SentimentArcs framework with OpenAI’s latest GPT models, new metrics and a modified supervisory HITL workflow released as open source software at https://github.com/jon-chun/SentimentArcs-Greybox.

"The Crisis of Artificial Intelligence: A New Digital Humanities Curriculum for Human-Centred AI"
International Journal of Humanities and Arts Computing

This article outlines what a successful artificial intelligence digital humanities (AI DH) curriculum entails and why it is so critical now. Artificial intelligence is rapidly reshaping our world and is poised to exacerbate long-standing crises including (1) the crisis of higher education and the humanities, (2) the lack of diversity, equity and inclusion (DEI) in computer science and technology fields and (3) the wider social and economic crises facilitated by new technologies. We outline a number of ways in which an AI DH curriculum offers concrete and impactful responses to these many crises. AI DH yields meaningful new avenues of research for the humanities and the humanistic social sciences, and offers new ways that higher education can better prepare students for the world into which they graduate. DEI metrics show how an AI DH curriculum can engage students traditionally underserved by conventional STEM courses. Finally, AI DH educates all students for civic engagement in order to address both the social and economic impacts of emerging AI technologies. This article provides an overview of an AI DH curriculum, the motivating theory behind design decisions, and a detailed look into two sample courses.

"How to Identify, Understand, and Analyze ChatGPT AI Narratives"
Narrative 2023, March 1-4th, Dallas, TX
"Augmenting Narrative Generation with Visual Imagery Using Integrated Prompt Engineering (ChatGPT, DALL-E 2)"
Narrative 2023, March 1-4th, Dallas, TX -Narrative Society
Roundtable: "Generative AI Art and Writing: ChatGPT and Generative AI Art: How it Works, Where It's Going, and What It Means for Our Future"
(video and links to generative AI resources), 17th January 2023, AI DHColab, Kenyon College, Gambier, OH
Living In Difficult Times
The Helix Center, Nov 19th, 2022, NY, NY
Chun, Jon, and Katherine Elkins. "What the Rise of AI Means for Narrative Studies: A Response to “Why Computers Will Never Read (or Write) Literature” by Angus Fletcher."
Narrative 30, no. 1 (2022): 104-113. doi:10.1353/nar.2022.0005.
Chun, Jon. "SentimentArcs: A Novel Method for Self-Supervised Sentiment Analysis of Time Series Shows SOTA Transformers Can Struggle Finding Narrative Arcs."
ArXiv abs/2110.09454 (2021): n. page.

SOTA Transformer and DNN short text sentiment classifiers report over 97% accuracy on narrow domains like IMDB movie reviews. Real-world performance is significantly lower because traditional models overfit benchmarks and generalize poorly to different or more open domain texts. This paper introduces SentimentArcs, a new self-supervised time series sentiment analysis methodology that addresses the two main limitations of traditional supervised sentiment analysis: limited labeled training datasets and poor generalization. A large ensemble of diverse models provides a synthetic ground truth for self-supervised learning. Novel metrics jointly optimize an exhaustive search across every possible corpus:model combination. The joint optimization over both the corpus and model solves the generalization problem. Simple visualizations exploit the temporal structure in narratives so domain experts can quickly spot trends, identify key features, and note anomalies over hundreds of arcs and millions of data points. To our knowledge, this is the first self-supervised method for time series sentiment analysis and the largest survey directly comparing real-world model performance on long-form narratives.
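
The ensemble idea in this abstract can be illustrated with a small sketch: score each sentence with several models, smooth each series into an arc, and treat the ensemble mean as a synthetic reference against which individual models are compared. This is not the SentimentArcs library or its API. The two toy lexicons, the window size, and every function name below are hypothetical stand-ins for real sentiment models and the paper's metrics.

```python
from statistics import mean

# Hypothetical toy lexicons standing in for real sentiment models.
LEXICON_A = {"love": 1.0, "joy": 0.8, "loss": -0.7, "fear": -0.9}
LEXICON_B = {"love": 0.6, "hope": 0.9, "loss": -1.0, "dark": -0.5}


def lexicon_scorer(lexicon):
    """Build a scorer that sums the lexicon weights of the words in a sentence."""
    def score(sentence):
        return sum(lexicon.get(w.strip(".,!?").lower(), 0.0) for w in sentence.split())
    return score


def moving_average(values, window=3):
    """Smooth a series with a centered moving average (truncated at the edges)."""
    half = window // 2
    return [mean(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def sentiment_arcs(sentences, models, window=3):
    """Return one smoothed arc per model plus the ensemble-mean arc."""
    arcs = {
        name: moving_average([model(s) for s in sentences], window)
        for name, model in models.items()
    }
    arcs["ensemble"] = [mean(column) for column in zip(*arcs.values())]
    return arcs


def distance_to_ensemble(arcs):
    """Mean absolute distance of each model's arc from the ensemble arc."""
    reference = arcs["ensemble"]
    return {
        name: mean(abs(a - r) for a, r in zip(arc, reference))
        for name, arc in arcs.items()
        if name != "ensemble"
    }
```

With many diverse models, the distance to the ensemble gives a rough, label-free signal of which models behave as outliers on a given text, which is the spirit of using the ensemble as a synthetic ground truth.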

Chun, Jon. AI Improv DivaBot in collaboration with Katherine Elkins, James Dennen (Denison University and Wexner Arts), Lauren Katz (Thymele Arts, LA), 100th anniversary of the premiere of “R.U.R.,” by Czechoslovakian playwright Karel Capek. “R.U.R.” (for “Rossum’s Universal Robots”) opened on January 25th, 1921, at the National Theater of Prague and marks the first use of the word “robot,” coined by Capek and derived from the Czech word for “forced labor.”, 25 Jan 2021
Elkins, Katherine, and Jon Chun. "Can GPT-3 pass a Writer’s Turing Test?." Journal of Cultural Analytics 5, no. 2 (2020): 17212.

Until recently the field of natural language generation relied upon formalized grammar systems, small-scale statistical models, and lengthy sets of heuristic rules. This older technology was fairly limited and brittle: it could remix language into word salad poems or chat with humans within narrowly defined topics. Recently, very large-scale statistical language models have dramatically advanced the field, and GPT-3 is just one example. It can internalize the rules of language without explicit programming or rules. Instead, much like a human child, GPT-3 learns language through repeated exposure, albeit on a much larger scale. Without explicit rules, it can sometimes fail at the simplest of linguistic tasks, but it can also excel at more difficult ones like imitating an author or waxing philosophical.

(AI Story Generation / AI Narrative Generation) How Artificial Intelligence Tells Stories: Natural Language Generation and Narrative, Narrative 2020 Conference (page 28), March 5-7 The Intercontinental Hotel, New Orleans

SentimentArcs is the open-source code for The Shapes of Stories by Katherine Elkins (Cambridge University Press, Aug 2022)

Sentiment analysis has gained widespread adoption in many fields, but not―until now―in literary studies. Scholars have lacked a robust methodology that adapts the tool to the skills and questions central to literary scholars. Also lacking has been quantitative data to help the scholar choose between the many models. Which model is best for which narrative, and why? By comparing over three dozen models, including the latest Deep Learning AI, the author details how to choose the correct model―or set of models―depending on the unique affective fingerprint of a narrative. The author also demonstrates how to combine a clustered close reading of textual cruxes in order to interpret a narrative. By analyzing a diverse and cross-cultural range of texts in a series of case studies, the Element highlights new insights into the many shapes of stories.

Innovation in Higher Ed

I creatively apply the best of industry practices and state-of-the-art AI/ML techniques on interesting and high-impact interdisciplinary research. The combination of AI/ML, math/statistics and a diversity of domain expertise provides fresh insights and countless new paths of discovery.

I've also long been interested in bringing diverse voices to urgent debates surrounding technology’s growing impact on society. As of 2022, our AI Digital Humanities computing curriculum has attracted students who are majority female (61%), overwhelmingly non-STEM (91%), and include Under-Represented Minorities (11% Hispanic, 13% Black). Enrollments have steadily grown, making our foundational course one of the most popular on campus. Both our research and that of our students have seen rapid growth in citations and thousands of visits from top academic institutions around the world.

Over most of the last decade, I have been developing a new human-first approach to teaching computation grounded in ML, AI and Data Science with real-world applications inseparable from ethics. One challenge was to bridge the STEM and non-STEM divide. Another challenge was harmonizing the rigorous specialization of academia with practical, interdisciplinary and generalizable real-world solutions. The final challenge was to bootstrap an entirely new AI Digital Humanities computing curriculum without a budget, support staff, or academic credit toward any major/minor.

Over the first 6 years, our foundational course has become one of the most popular on campus. Both our professors' and students' research have been published in top journals, presented at leading conferences and have been read by thousands from top universities and research centers around the world. Both founders of our program have been involved in several organizations beyond Kenyon dedicated to AI, Ethics and innovating CS Education.

Philosophically, my goal is to cultivate in students a technologically informed worldview grounded in universal humanistic values. This integrated worldview is designed to intimately align the core strengths of traditional education with more ethical, practical and beneficial uses of technology for all.

Diversity from A Human-Centered AI Curriculum

UPDATE: Progress on URM Diversity

Fall 2022 IPHS 200 Programming Humanity (estimate)

Category	Count	Percent
Male	41	53%
Female	36	47%
TOTAL	78	100%

13% African-American (10)

Progress on Gender Diversity in AI Digital Humanities curriculum since the 2017-2018 academic year (61% female as of Spring 2022)

At Kenyon College, I co-founded the world’s first human-centric AI curriculum. I am the sole technical advisor and the primary collaborative content creator. Over the last six years of teaching this curriculum, we have achieved the following milestones:

• Research: Published research in top publications and conferences (Cambridge UP, Narrative, Journal of Cultural Analytics, etc.) with clear growth in citations.

• AI Digital Humanities/DH Colab Research: Organically grew (no marketing/PR) to ~15k hits from top universities worldwide (#4 CMU, #5 Berkeley, #6 Stanford, #7 Columbia, #9 NYU, #16 Princeton, #22 Oxford, #23 MIT, #25 Cambridge, etc.)

• Diversity:

Female Grew from 18% to 61% between 2017-2021

Hispanic participation rates are often at or above college averages

Black 13% (Fall 2022 estimate above)

Non-STEM Our classes are ~90% non-STEM from across nearly all departments, enfranchising many students who may otherwise feel alienated by traditional CS programs

100% Pass rate (Quality of student work independently confirmed by success of their research archive at digital.kenyon.edu/dh)

0% Drop rate

• Enrollment: Grew enrollment from 20 to 120 between 2017 and 2022, becoming one of the largest classes at Kenyon as an elective with no credit toward the traditional STEM computing major/minor

• Budget: With no budget or antecedent, innovated from scratch a globally recognized computational DH Colab research center and AI Digital Humanities curriculum. This includes no funds for hardware, software, cloud computing, support staff or other common expenses. This is achieved through continual strategic planning and careful curation and testing of fully open-source, robust, best-of-breed and/or freely available resources, informed by decades of experience in industry.

Our interdisciplinary AI DH research has been published in top presses, journals, and conferences. We have also mentored hundreds of ML/AI DH projects that synthesize Artificial Intelligence with literature, history, political science, art, dance, music, law, medicine, economics and more. Various sample AI DH projects are given at the bottom of this page.

Timeline

1992-99: The Integrated Program for Humane Studies (IPHS, the oldest interdisciplinary program at Kenyon) established a computer lab in Timberlake House for DH scholarship under Director Michael Brint

2002 Jul: Katherine Elkins joined Kenyon and began mentoring traditional Digital Humanities projects (e.g. critiques of technology, websites, media, etc.) in the IPHS program

2003 May: Launched product Symantec Clientless VPN appliance as Director of Development and relocated from Silicon Valley

2005 Mar: Proposed new humanity-centered AI Digital Humanities curriculum in conjunction with a multi-million Ewing Marion Kauffman Foundation grant

2015 Aug: Formulated detailed interdisciplinary AI Digital Humanities curriculum after years of research and training

2017 Mar: Led DH Kenyon Team at the HackOH5 Hackathon to explore challenges and opportunities in implementing computational Digital Humanities and effecting collaboration across disciplines

2017 Aug: Kenyon supports the first 'Programming Humanity' course co-taught with a Humanities and Comparative Literature professor.

2018 Aug: Kenyon adds first 'AI for the Humanities' course with a differentiated approach to GOFAI/ML through DNN, RL, and GA

2018 Aug: Katherine Elkins awarded a multi-year National Endowment for the Humanities Distinguished Professorship to continue developing a campus-wide Digital Humanities program to include every interested department

2022 Jan: Collaboration with Scientific Computing program at Kenyon mentoring several majors on interdisciplinary research

2022 Aug: Kenyon offers first computational 'Cultural Analytics' DH methodology course for Social Sciences and Humanities

2022 Aug: First collaboration with local industry via 'Industrial IoT Independent Study' targeting technical reference implementation and strategic whitepaper

Kenyon College's
The National Endowment for the Humanities Professorship

Our AI research and DHColab were collaboratively developed, and the curriculum is currently co-taught by a technology expert (Jon Chun) and an accomplished academic (Katherine Elkins). Both have broad experiences, publications, and interests transcending traditional domain boundaries. Support was provided with a 3-year National Endowment for the Humanities (NEH) appointment described here.

Collaborator Katherine Elkins's work as
Kenyon College's National Endowment for the Humanities Professorship

A Humanity-First approach to AI Digital Humanities
consistently attracts over 90% non-STEM majors
(Kenyon College Institutional Research)

Back to Top

Code, Products and Patents

SentimentArcs: Github Repo

GitHub: jon-chun

SafeWeb: SEA Tsunami Products

Patents: SEA Tsunami

Block Diagram for
SentimentArcs Notebooks

Stories are everywhere. Here are a few examples of original research projects using SentimentArcs to extract and analyze narrative emotional arcs in:

Literature: Doubles and Reflections: Sentiment Analysis and Vladimir Nabokov’s Pale Fire

Translations: The Trials of Translation: A Cross-Linguistic Survey of Sentiment Analysis on Franz Kafka’s Trial

TV Scripts: Blood in the Water: Storytelling and Sentiment Analysis in ABC's Shark Tank

Medical End of Life Narratives: On Death and Emotion: Evaluating the Five Stages of Grief in End-of-Life Memoirs Using AI Deep Learning Models

Social Media (Government Collapse): How Did Sri Lankan Protestors End Up in the President’s Pool? Understanding the evolution of an occupy-style protest: A story of economic turmoil, declining social sentiment and resulting political change

Social Media (Elections): Quantifying Polarization around Election Denial: Measuring Public Sentiment Changes in the 2022 Midterms

Multimodal SentimentArcs: Royal Wedding (1951) Video 10% SMA Plot (2024)

Multimodal SentimentArcs: Royal Wedding (1951) Transcript 10% SMA Plot (2024)

Multimodal SentimentArcs: Royal Wedding (1951) KDE Plot (2024)
Back to Top

Kenyon AI Digital Humanities

Top 10 Institutions reading our AI DH Research in 2022
digital.kenyon.edu/dh

AI/ML Digital Humanities Projects

Kenyon Digital Colab

Kenyon AI Digital Humanities

Leading Institutions reading our AI DH Research in 2022
digital.kenyon.edu/dh

Eurasian Institutions
digital.kenyon.edu/dh

Institutions from The Americas
digital.kenyon.edu/dh

Countries Worldwide
digital.kenyon.edu/dh

Institutions Worldwide (2023 May)
digital.kenyon.edu/dh

Back to Top

Social Media

@jonchun2000
Main Social Media Account

Twitter: @jonchun2000

LinkedIn: jonchun2000

Instagram: jonchun2000

Back to Top

Mentored Research

Brainstorming to translate new theories into testable models for (a) Literary Analysis, (b) Financial Forensics and (c) the Latent Space of Generative Art Prompts.

Integrated Program for Humane Studies (2017-)

IPHS200 Programming Humanity (samples below)

Cultural Bias/DALL-E 2: Adjectivally-Oriented: Women Through the Decades: Stylistic Shifts In Magazines As Represented By Image-Generating AI

Political Science/Social Media: How Did Sri Lankan Protestors End Up in the President’s Pool? Understanding the evolution of an occupy-style protest: A story of economic turmoil, declining social sentiment and resulting political change

Gender Studies/Topic Modeling: The Second Meaning: Uncovering the Linguistic Interpretation of Simone de Beauvoir’s The Second Sex

Literature/Multilingual Sentiment Analysis: The Trials of Translation: A Cross-Linguistic Survey of Sentiment Analysis on Franz Kafka’s Trial

ChatGPT/Security: Breaking ChatGPT with Dangerous Questions

Art Communities/DALL-E 2: Do Androids Dream of Digital Art? Addressing the Spectrum of Perspectives on AI-Generated Artwork

Economics/Social Media Sentiment: How the Mighty Have Fallen: Analyzing Twitter Sentiment in the Wake of FTX Bankruptcy and Sam Bankman-Fried's Indictment

Literature/Spanish Sentiment Analysis: Multilingual Sentiment Analysis and Translation: Spanish and English Story Arcs in Juan Rulfo’s Pedro Páramo

Sociology/Social Media: Understanding Caste System in Nepal: Surfacing changes in public sentiment on Twitter over time

Political Science/Economics: Military Expenditures and Terrorism: Assessing Correlation Between Terror Attacks and Global Military Spending

Literature/Sentiment Analysis: Hitchhiker's Guide to Sentiment Analysis: A Comparison between Movie and Film

Sports Economics/Social Media: Values or Profit? An Analysis on the Impact of Legal Sports Betting on Sports Business

Environmental Studies/Social Media: Energy Conversation on Alternative Energy World Perspective vs Bangladesh

Political Science/Social Media: Understanding Public Opinion towards the Government of Bangladesh through Sentiment Analysis of Twitter

Poetry/GPT-3: The GPT3 Re-Imagining of “Howl” By Allen Ginsberg: What Are The Strengths and Weaknesses of This Representation?

Literature/GPT-2: 345M-GPT-2 After James Wright: Can AI Generate Convincing Contemporary Poetry?

Environmental Studies/Bayesian Time Series Analysis: Predicting Attitudes Toward the Environment Artificial Intelligence for the Humanities

Political Science/NLP Topic Modeling: Transitional Justice Terminology Analysis in United Nations General Assembly Speeches (1971-2015)

Literature/NLP Sentiment Analysis: Doubles and Reflections: Sentiment Analysis and Vladimir Nabokov’s Pale Fire

Conflict Studies/Data Science: Cold War Conflicts: Analyzing the Role of U.S. Arms Exports

Music/RNN Composition: RNN monophonic sheet music generation with LilyPond

Political Science/Machine Learning: Freedom, Democracy, and Well-Being: A Comparative Analysis of Global Progress Indexes Using K-Means Clustering

Environmental Studies/Machine Learning: LEED Certification Prediction with K-Means Clustering Algorithm

Economics/Time Series Forecasting: Computational Approaches to Predicting Cryptocurrency Prices

Film/NLG w/GPT-2: Digitizing Camp: Training a GPT-2 on "The Rocky Horror Picture Show"

Journalism/NLG w/GPT-2: GPT-2 Journalism: Can AI produce Mike Royko’s writing?

Modern Language/NLP Analysis: Lost in Translation: Using Sentiment Analysis to Analyze Translations of Homer’s Odyssey

Music/Machine Learning Recommendation: Analyzing popular music using Spotify’s Machine Learning Audio Features

Political Science/Data Science: COVID-19: Global Trends in Social Protection, Unemployment, and Economic Stimuli

Social Sciences/NLP: Building a Universal Human Trafficking Lexicon

Law/NLP Topic Modeling: Topic Modeling Analysis of Supreme Court Opinions Focusing on Privacy Rights in the Context of Abortion Law

Philosophy/NLG w/GPT-3 Prompt Engineering: Prompt Engineering Tips for Generating Text on Cognitive Science and Philosophy of Mind

Political Science/Social Media NLP Sentiment Analysis: Consequences of Social Network Architecture: Analyzing Sentiment in Reddit Posts About Donald Trump

Film/GPT-2: https://digital.kenyon.edu/dh_iphs_prog/19/

IPHS290 Cultural Analytics (New Fall 2022)

Upcoming Projects: (Intentionally vague for now)

Multi-Racial Identity

NLP Analysis of End-of-Life Medical Narratives

US-Latin American Geopolitical, Economic, & Military Aid and Analysis

Exploring Representations of Utopia in Literature over Time

Analysis of Economic Performance and Healthcare Quality Metrics after Private Equity Acquisitions

Global Supply Chain Analysis post-Covid

IPHS300 AI for the Humanities (samples below)

Jewish Studies/Information Science: Taxonomy Techniques for Holocaust-Related Image Digitization and Text

Law: Synthetic Biology: Analyzing Trends in Intellectual Property Rights vs. Open Access to Research, 1989-2019

Asian-American Studies/Sentiment Analysis: The Rise of Anti-Asian American Sentiment with COVID-19

Gender Studies/Data Science: TikTok’s Non-Inclusive Beauty Algorithm & Why We Should Care

Literature/NLP Stylometrics: Analyzing the Reading Levels of Fifty Shades of Grey and The DaVinci Code: Learning More About Blockbuster Books

Conflict Studies/Data Science: Does U.S. Conflict Intervention Provoke Terrorism in the Middle East and North Africa?

Mathematics/Cryptography: Homomorphic Encryption

Political Science/Sentiment Analysis: Killed by Division: Sentiment Analysis Towards Juan Guaido by Venezuelan Opposition Factions Between 2019-2021

Social Sciences/Data Science: Reframing the ways we understand Cancel Culture: Clickbait Campaigns in The Attention Economy

Literature/NLP Sentiment Analysis: Five Books, Same Story: Understanding Percy Jackson through Sentiment Analysis

Political Science/Data Science: 2020 Election Fraud: What Can Twitter Teach Us?

Literature/NLP Sentiment Analysis: Jane AI-sten: What is Sentiment Analysis’s Connection to Best-Selling Literature?

IPHS484 Senior Seminar/Research (samples below)

Film/ChatGPT & DALL-E2: When AI Met Screenwriting... Can AI Generate Beat Sheets and Storyboards?

Sociology/AI Visual Sentiment Analysis: AI reads Playboy (but not for the articles): Revealing Cover Trends with Deep Neural Networks

Political Science/Econometrics: An Econometric Measure of the Post-9/11 Growth of the Defense Budget: Quantifying the Military-Industrial Complex’s Growing Influence Over the Pentagon

Sociology/NLG with GPT-2 vs GPT-3: Black Box Karl Marx: What do large language models have to say about Das Kapital? A Comparison of GPT-2 and GPT-3 Outputs

Fiction Narrative/NLP Sentiment Analysis: Adapted Arcs: Sentiment Analysis and The Sorcerer’s Stone

Public Health/Statistical Machine Learning: Evaluating Ohio’s Opioid Overdose Epidemic with AI

TV Script/NLP Sentiment Analysis: Blood in the Water: Storytelling and Sentiment Analysis in ABC's Shark Tank

Social Media/NLG with GPT-2: GPT-2 Jomboy: Can AI produce exciting Baseball content?

Art/Deep Neural Networks Generative Art: An Artist's Guide to AI Art

Economics/Time Series Anomaly Detection: Analyzing Pump and Dump Schemes

Back to Top

Course Descriptions

The virtuous cycle, feedback and tension between
the 3 models that guide our interdisciplinary innovation

Integrated Program for Humane Studies (2017-)

IPHS200 Programming Humanity

IPHS290 Cultural Analytics

IPHS300 AI for the Humanities (samples)

IPHS494 Senior Seminar Research Projects

IPHS Independent Study Research

IPHS391 Frontiers in Generative AI (New Fall 2024, Approved Fall 2023)

OVERVIEW:

This upper-division course offers an in-depth exploration of advanced AI concepts, focusing on interdisciplinary applications of large language models, AI information systems, and autonomous agents. Over 15 weeks, students will engage with a progressive curriculum, starting with a review of Python and continuing through a series of four hands-on projects: (a) programming a GPT-based chatbot with the OpenAI API, (b) mechanistic interpretations of transformer internals using Hugging Face Transformers, (c) Retrieval-Augmented Generation (RAG) using LangChain, and (d) simulations of autonomous multi-agent systems using AutoGen. Together with one final project, these four substantive subprojects enable students to apply theoretical knowledge to practical, real-world AI challenges. This course is designed to equip students with the skills and knowledge necessary to innovate in the rapidly evolving field of artificial intelligence, emphasizing both technical proficiency and ethical considerations. Introductory Python programming experience required.

NOTE: These 4 broad frontiers of AI research are rapidly evolving and are based upon my AI research and industry consulting with Meta, IBM, the White House/NIST AI Safety Institute, etc. Major new AI research, libraries, frameworks and startups appear nearly every week. Since this course will begin 9 months after this syllabus was written, expect updates to reflect the most recent AI research breakthroughs, tooling, and industry best practices as of August 2024. Nonetheless, the class will be structured around these 4 broad and relatively consistent universal areas in AI.

Scientific Computing Mentored Projects (2020-)

SciComp Senior Seminar/Research

Noisy Time Series Filtering, Smoothing and Feature Detection

Narrative Metrics for NLG using LLM Transformers

Diachronic Sentiment Analysis of Central Bank Speeches using SentimentArcs

SciComp Independent Study

Industrial Revolution 4.0: End-to-End Industrial IoT Preventative Maintenance

Back to Top

Organizations

US NIST AI Safety Institute Consortium

*Principal Investigator (2024-) for the Modern Language Association

Announcement: For over 100 years, the MLA has been the principal organization of scholars in language and literature with over 25,000 members in over 100 countries. The MLA is joining more than 200 of the nation’s leading artificial intelligence (AI) stakeholders to participate in a US Department of Commerce initiative to support the development and deployment of trustworthy and safe AI. Established by the Department of Commerce’s National Institute of Standards and Technology (NIST) on 8 February 2024, the US AI Safety Institute Consortium (AISIC) brings together AI creators and users, academics, government and industry researchers, and civil society organizations to meet this mission. The MLA-sponsored team will be led by Katherine Elkins and Jon Chun at Kenyon College. The team will evaluate model capabilities with a special focus on linguistic edge cases and ethical frameworks.

The AISIC includes companies and organizations on the front lines of developing and using AI systems as well as the civil society and academic teams building the foundational understanding of AI’s potential to transform our society. Consortium members represent the nation’s largest companies and innovative startups; creators of the world’s most advanced AI systems and hardware; representatives of professions with deep engagement in AI’s use today; state and local governments; and nonprofits. The consortium will also work with organizations from other nations in order to establish interoperable and effective safety around the world.

Human Centered AI Lab.Org

*Cofounder (2023-)

About: Our mission is to facilitate efficient collaboration on interdisciplinary AI between individual researchers and domain experts separated by geographic, organizational, doctrinal, and legal divisions. We focus on human-centered AI topics like safety, bias, explainability, ethics, and policy grounded in careful experimentation and expert interpretation. Our goal is to enable fast, focused and flexible research funding and collaboration overlooked by traditional institutional research structures. We are a purely volunteer and wholly virtual non-profit corporation conducting human-centered AI research and education in the public interest.

The Helix Center, NY, NY

Executive Committee (2022-)

Round-Table: Living in Difficult Times, Nov 19, 2022

About: The original inspiration for interdisciplinary forums arose from the observations by our director, Dr. Edward Nersessian, of the constraints in both communication and creativity among scientists at professional meetings, fueled both by narrow specialization and the grant process, that with its demand for sharply defined investigation seemed, in fact, to be limiting curiosity and inquiry. This motivated him to form discussion groups drawing on multiple disciplines, the creative productivity of which inspired the formation of the Philoctetes Center for the Multidisciplinary Study of the Imagination.

Mission: The primary mission of The Helix Center is to draw together leaders from distinct spheres of knowledge in the arts, humanities, sciences, and technology for interdisciplinary roundtables, the unique format of which potentiates new ideas, new questions, and facilitates emergent creative qualities of mind less possible in conventional collaborations. Such a drawing together of leaders of various disciplines irrespective of their academic affiliation allows the Helix Center to function as a kind of university without walls. In addition, through audience attendance and its Q&A engagement with the roundtable participants, and live streamed and archived events, we aim to expand public understanding and appreciation of the sciences and technology, the arts and humanities.

Back to Top

Kenyon DHColab
(Kenyon AI Digital Humanities Colab)