Fundamentals of Data Science
Computational and Inferential Thinking
1. What is Data Science?
1. Introduction
1.1. Computational Tools
1.2. Statistical Techniques
2. Why Data Science?
3. Plotting the Classics
3.1. Literary Characters
3.2. Another Kind of Character
2. Causality and Experiments
1. Observation and Visualization: John Snow and the Broad Street Pump
2. Randomization
3. Establishing Causality
4. Snow’s “Grand Experiment”
5. Endnote
3. Programming in Python
1. Expressions
2. Names
2.1. Example: Growth Rates
3. Call Expressions
4. Introduction to DataFrames
4. Data Types
1. Numbers
2. Strings
2.1. String Methods
3. Comparisons
5. Sequences
1. Arrays
2. Ranges
3. More on Arrays
6. DataFrames
1. Sorting Rows
2. Selecting Rows
3. Example: Population Trends
4. Example: Trends in Gender
7. Visualization
1. Visualizing Categorical Distributions
2. Visualizing Numerical Distributions
3. Overlaid Graphs
8. Functions and Tables
1. Applying a Function to a Column
2. Classifying by One Variable
3. Cross-Classifying by More than One Variable
4. Joining Tables by Columns
5. Bike Sharing in the Bay Area
9. Randomness
1. Conditional Statements
2. Iteration
3. Simulation
4. The Monty Hall Problem
5. Finding Probabilities
10. Sampling and Empirical Distributions
1. Empirical Distributions
2. Sampling from a Population
3. Empirical Distribution of a Statistic
11. Testing Hypotheses
1. Assessing Models
2. Multiple Categories
3. Decisions and Uncertainty
4. Error Probabilities
12. Comparing Two Samples
1. A/B Testing
2. Deflategate
3. Causality
13. Estimation
1. Percentiles
2. The Bootstrap
3. Confidence Intervals
4. Using Confidence Intervals
14. Why the Mean Matters
1. Properties of the Mean
2. Variability
3. The SD and the Normal Curve
4. The Central Limit Theorem
5. The Variability of the Sample Mean
6. Choosing a Sample Size
15. Prediction
1. Correlation
2. The Regression Line
3. The Method of Least Squares
4. Least Squares Regression
5. Visual Diagnostics
6. Numerical Diagnostics
16. Inference for Regression
1. A Regression Model
2. Inference for the True Slope
3. Prediction Intervals
17. Classification
1. Nearest Neighbors
2. Training and Testing
3. Rows of Tables
4. Implementing the Classifier
5. Accuracy of the Classifier
6. Home Prices
18. Updating Predictions
1. A “More Likely Than Not” Binary Classifier
2. Making Decisions
Index