- Data science is about drawing useful conclusions from large and diverse data sets through exploration, prediction, and inference.
  - exploration involves identifying patterns in information
  - prediction involves using information we know to make informed guesses about unknown values
  - inference involves quantifying our degree of certainty
- primary tools for
  - exploration are visualizations and descriptive statistics
  - prediction are machine learning and optimization
  - inference are statistical tests and models
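
To make the three activities concrete, here is a minimal Python sketch using only the standard library. The data set (hours studied versus exam score) is invented purely for illustration, and the helper names `fit_line` and `bootstrap_slope_interval` are hypothetical. A least-squares line stands in for "machine learning" and a bootstrap percentile interval stands in for "statistical tests and models"; other tools would serve equally well.

```python
import random
import statistics

# Hypothetical toy data: hours studied and exam score for ten students.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
scores = [52, 55, 61, 60, 68, 71, 75, 74, 82, 85]

# Exploration: descriptive statistics summarize patterns in the data.
mean_score = statistics.mean(scores)
spread = statistics.stdev(scores)


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Prediction: use the fitted line to guess a score for an unseen value.
slope, intercept = fit_line(hours, scores)
predicted = slope * 12 + intercept


def bootstrap_slope_interval(xs, ys, reps=2000, seed=0):
    """Approximate 95% percentile interval for the slope via resampling."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(xs, ys))
    slopes = []
    for _ in range(reps):
        sample = [rng.choice(pairs) for _ in pairs]
        sx, sy = zip(*sample)
        if len(set(sx)) > 1:  # skip resamples with no spread in x
            slopes.append(fit_line(sx, sy)[0])
    slopes.sort()
    return slopes[int(0.025 * len(slopes))], slopes[int(0.975 * len(slopes))]


# Inference: quantify how uncertain the estimated slope is.
low, high = bootstrap_slope_interval(hours, scores)

print(f"mean score {mean_score:.1f}, standard deviation {spread:.1f}")
print(f"predicted score after 12 hours: {predicted:.1f}")
print(f"approximate 95% interval for the slope: ({low:.2f}, {high:.2f})")
```

The interval is the point of the inference step: the slope alone is a single guess, while the interval indicates how much that guess could move if the data had come out differently.
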
- Statistics is a central component of data science because it studies how to make robust conclusions based on incomplete information.
- Computing is a central component of data science because it allows us to apply analysis techniques to the large and diverse data sets that arise in real-world applications, including
  - numbers
  - text
  - images
  - videos
  - sensor readings
- Data science is all of these things, but it is more than the sum of its parts because of its applications.
- Through understanding a particular domain, data scientists learn to ask appropriate questions about their data and correctly interpret the answers provided by inferential and computational tools.
- The degree of uncertainty for many decisions can be reduced sharply by
  - access to large data sets, and
  - the computational tools required to analyze them effectively
- Data-driven decision making has already transformed a tremendous breadth of industries.
- A wide range of academic disciplines are evolving rapidly to incorporate large-scale data analysis into their theory and practice.