Introduction to Data Ethics


Data Science Ethics - Sketchnote by @nitya

We are all data citizens living in a datafied world.

Market trends tell us that by 2022, 1-in-3 large organizations will buy and sell their data through online Marketplaces and Exchanges. As App Developers, we'll find it easier and cheaper to integrate data-driven insights and algorithm-driven automation into daily user experiences. But as AI becomes pervasive, we'll also need to understand the potential harms caused by the weaponization of such algorithms at scale.

Trends also indicate that we will create and consume over 180 zettabytes of data by 2025. As Data Scientists, this gives us unprecedented levels of access to personal data. This means we can build behavioral profiles of users and influence decision-making in ways that create an illusion of free choice while potentially nudging users towards outcomes we prefer. It also raises broader questions on data privacy and user protections.

Data ethics are now necessary guardrails for data science and engineering, helping us minimize potential harms and unintended consequences from our data-driven actions. The Gartner Hype Cycle for AI identifies relevant trends in digital ethics, responsible AI, and AI governance as key drivers for larger megatrends around democratization and industrialization of AI.

In this lesson, we'll explore the fascinating area of data ethics - from core concepts and challenges, to case studies and applied AI concepts like governance - that help establish an ethics culture in teams and organizations that work with data and AI.

Pre-lecture quiz 🎯

Basic Definitions

Let's start by understanding the basic terminology.

The word "ethics" comes from the Greek word "ethikos" (and its root "ethos") meaning character or moral nature.

Ethics is about the shared values and moral principles that govern our behavior in society. Ethics is based not on laws but on widely accepted norms of what is "right vs. wrong". However, ethical considerations can influence corporate governance initiatives and government regulations that create more incentives for compliance.

Data Ethics is a new branch of ethics that "studies and evaluates moral problems related to data, algorithms and corresponding practices". Here, "data" focuses on actions related to generation, recording, curation, processing, dissemination, sharing, and usage; "algorithms" focuses on AI, agents, machine learning, and robots; and "practices" focuses on topics like responsible innovation, programming, hacking, and ethics codes.

Applied Ethics is the practical application of moral considerations. It's the process of actively investigating ethical issues in the context of real-world actions, products and processes, and taking corrective measures to make sure that these remain aligned with our defined ethical values.

Ethics Culture is about operationalizing applied ethics to make sure that our ethical principles and practices are adopted in a consistent and scalable manner across the entire organization. Successful ethics cultures define organization-wide ethical principles, provide meaningful incentives for compliance, and reinforce ethics norms by encouraging and amplifying desired behaviors at every level of the organization.

Ethics Concepts

In this section, we'll discuss concepts like shared values (principles) and ethical challenges (problems) for data ethics - and explore case studies that help you understand these concepts in real-world contexts.

1. Ethics Principles

Every data ethics strategy begins by defining ethical principles - the "shared values" that describe acceptable behaviors, and guide compliant actions, in our data & AI projects. You can define these at an individual or team level. However, most large organizations outline these in an ethical AI mission statement or framework that is defined at corporate levels and enforced consistently across all teams.

Example: Microsoft's Responsible AI mission statement reads: "We are committed to the advancement of AI driven by ethical principles that put people first" - and its framework identifies 6 ethical principles, described below.

Let's briefly explore these principles. Transparency and accountability are foundational values that other principles are built upon - so let's begin there:

Accountability makes practitioners responsible for their data & AI operations, and compliance with these ethical principles.
Transparency ensures that data and AI actions are understandable (interpretable) to users, explaining the what and why behind decisions.
Fairness focuses on ensuring AI treats all people fairly, addressing any systemic or implicit socio-technical biases in data and systems.
Reliability & Safety ensures that AI behaves consistently with defined values, minimizing potential harms or unintended consequences.
Privacy & Security is about understanding data lineage, and providing data privacy and related protections to users.
Inclusiveness is about designing AI solutions with intention, adapting them to meet a broad range of human needs & capabilities.

🚨 Think about what your data ethics mission statement could be. Explore ethical AI frameworks from other organizations - here are examples from IBM, Google, and Facebook. What shared values do they have in common? How do these principles relate to the AI product or industry they operate in?

2. Ethics Challenges

Once we have ethical principles defined, the next step is to evaluate our data and AI actions to see if they align with those shared values. Think about your actions in two categories: data collection and algorithm design.

With data collection, actions will likely involve personal data or personally identifiable information (PII) for identifiable living individuals. This includes diverse items of non-personal data that collectively identify an individual. Ethical challenges can relate to data privacy, data ownership, and related topics like informed consent and intellectual property rights for users.
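
To see how items that look non-personal can collectively identify someone, here is a minimal Python sketch. It groups records by a chosen combination of fields and reports the size of the smallest group; a group of size 1 means that combination singles out one individual. The field names (`zip`, `birth_year`), the sample records, and the helper `smallest_group_size` are all hypothetical and used only for illustration.

```python
from collections import Counter


def smallest_group_size(records, fields):
    """Return the size of the smallest group of records that share the
    same values for every field in `fields` (0 if there are no records)."""
    groups = Counter(
        tuple(record[name] for name in fields) for record in records
    )
    return min(groups.values(), default=0)


# Hypothetical sample data: no names or IDs, only "harmless" attributes.
records = [
    {"zip": "98101", "birth_year": 1985},
    {"zip": "98101", "birth_year": 1990},
    {"zip": "98052", "birth_year": 1985},
    {"zip": "98052", "birth_year": 1990},
]

print(smallest_group_size(records, ["zip"]))                # 2
print(smallest_group_size(records, ["birth_year"]))         # 2
print(smallest_group_size(records, ["zip", "birth_year"]))  # 1
```

In this sample, each field on its own is shared by two people, but the two fields together make every record unique. Combinations like this are why seemingly non-personal items may need the same care as PII.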

With algorithm design, actions will involve collecting & curating datasets, then using them to train & deploy data models that predict outcomes or automate decisions in real-world contexts. Ethical challenges can arise from dataset bias, data quality issues, unfairness, and misrepresentation in algorithms - including some issues that are systemic in nature.
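
As a rough illustration of how unfairness might be surfaced in practice, the sketch below compares the rate of positive outcomes across groups in a small, made-up dataset. This is an assumption on our part: comparing outcome rates (sometimes called a demographic parity check) is only one of many possible fairness measures, and a gap does not by itself prove bias. The column names `group` and `approved` and both helper functions are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(rows, group_key, outcome_key):
    """Return {group: share of rows in that group with a truthy outcome}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[outcome_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest group rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical model decisions for two groups of applicants.
decisions = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = positive_rate_by_group(decisions, "group", "approved")
print(rates)              # {'A': 0.75, 'B': 0.25}
print(parity_gap(rates))  # 0.5
```

A large gap like this one is a prompt to investigate the dataset and the model further, not a verdict on its own.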

In both cases, ethics challenges highlight areas where our actions may encounter conflict with our shared values. To detect, mitigate, minimize, or eliminate these concerns, we need to ask moral "yes/no" questions related to our actions, then take corrective actions as needed. Let's take a look at some ethical challenges and the moral questions they raise:

2.1 Data Ownership

Data collection often involves personal data that can identify the data subjects. Data ownership is about control and user rights related to the creation, processing, and dissemination of data.

The moral questions we need to ask are:

Who owns the data? (user or organization)
What rights do data subjects have? (ex: access, erasure, portability)
What rights do organizations have? (ex: rectify malicious user reviews)

2.2 Informed Consent

Informed consent is the act of users agreeing to an action (like data collection) with a full understanding of relevant facts, including the purpose, potential risks, and alternatives.
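
One way a team might carry this idea into code is to record which purposes a user actually agreed to and to check that record before each use of their data. The sketch below is a simplified illustration under that assumption, not a compliance mechanism; `ConsentRecord`, `may_process`, and the purpose labels are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentRecord:
    """Hypothetical record of what a user agreed to."""
    user_id: str
    purposes: set = field(default_factory=set)  # purposes explained to and accepted by the user
    withdrawn: bool = False                     # users can change their mind later


def may_process(consent, purpose):
    """Allow processing only for a purpose the user explicitly agreed to,
    and only while consent has not been withdrawn."""
    return not consent.withdrawn and purpose in consent.purposes


consent = ConsentRecord(user_id="u-123", purposes={"order_fulfilment"})

print(may_process(consent, "order_fulfilment"))  # True
print(may_process(consent, "ad_targeting"))      # False
```

Tying each check to a named purpose reflects the point that consent covers a specific, understood use of data rather than any use at all.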