Statistics For Data Science
The topics covered in this course are summarized below. This is a 25-hour course, and we suggest completing it in 3 weeks.
Introduction to Statistics for Data Scientists
- Introduction to basic statistics terms
- Types of statistics
- Types of data
- Levels of measurement (nominal, ordinal, and interval/ratio)
- Measures of central tendency
- Measures of dispersion
- Random variables
- Concept of sets
- Skewness, Kurtosis
- Covariance and correlation
- Data Visualization
- Data summarization methods
- Tables, graphs, charts, histograms
- Frequency distributions
- Box Plot
- Chebyshev's inequality (relating the standard deviation to the share of data near the mean)
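As a minimal sketch of the descriptive measures in this module, the following Python snippet computes central tendency and dispersion with the standard library `statistics` module and then checks Chebyshev's inequality on the same data. The data values and the helper name `share_within` are made up for illustration and are not part of the course material.

```python
import statistics

# Hypothetical sample; the values are invented for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measure of dispersion: population standard deviation
pstdev = statistics.pstdev(data)


def share_within(values, center, spread, k):
    """Return the fraction of values strictly within k * spread of center."""
    inside = [v for v in values if abs(v - center) < k * spread]
    return len(inside) / len(values)


# Chebyshev's inequality: at least 1 - 1/k^2 of the data lies
# within k standard deviations of the mean, for any distribution.
k = 2
observed = share_within(data, mean, pstdev, k)
chebyshev_bound = 1 - 1 / k**2

print(f"mean={mean}, median={median}, mode={mode}, std dev={pstdev}")
print(f"share within {k} std devs: {observed} (bound: {chebyshev_bound})")
```

The bound is deliberately loose because it makes no assumption about the shape of the distribution, so the observed share is usually well above it.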
Descriptive & Inferential Statistics for Data Scientists
- Types of probability distributions – discrete vs. continuous distributions
- Cumulative Probabilities, Normal & Standard Normal Distribution
- Discrete Distributions
- Binomial Distribution
- Poisson Distribution
- Continuous Distributions
- Uniform Distribution
- Normal Distribution
- Standard Normal Distribution
- Exponential Distribution
- Sampling methods
- Interval Estimation
- Central limit theorem – sampling, sampling distribution, properties of sampling distribution, central limit theorem, estimating mean using CLT
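The central limit theorem topic above can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is taken to be an exponential distribution with rate 1 (so its mean and standard deviation are both 1), and the sample size, number of samples, and seed are arbitrary choices made for the example. Only the Python standard library is used.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed population: exponential with rate 1, so mean = 1 and std dev = 1.
POP_MEAN = 1.0
POP_STD = 1.0
SAMPLE_SIZE = 50     # arbitrary choice for illustration
NUM_SAMPLES = 2000   # arbitrary choice for illustration


def draw_sample(n):
    """Draw n independent values from the assumed exponential population."""
    return [random.expovariate(1.0) for _ in range(n)]


# Sampling distribution of the mean: one mean per repeated sample.
sample_means = [statistics.mean(draw_sample(SAMPLE_SIZE)) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
std_of_means = statistics.stdev(sample_means)
theoretical_se = POP_STD / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Estimating the mean from a single sample with an approximate 95% interval.
sample = draw_sample(SAMPLE_SIZE)
z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% critical value
se = statistics.stdev(sample) / math.sqrt(len(sample))
ci = (statistics.mean(sample) - z * se, statistics.mean(sample) + z * se)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"std dev of sample means: {std_of_means:.3f} (sigma/sqrt(n) = {theoretical_se:.3f})")
print(f"approximate 95% CI from one sample: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Although the population is strongly skewed, the sample means cluster around the population mean with a spread close to sigma/sqrt(n), which is what justifies the normal-based interval in the last step.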
Hypothesis Testing for Data Scientists
- Concepts of hypothesis testing – business relevance, framing hypotheses, hypothesis testing process and p-value
- Types of hypothesis tests – left- and right-tailed tests, two-tailed tests, types of errors, hypothesis testing using T-distribution
- Industry demos on hypothesis testing (Excel) – two-sample mean test, two-sample proportion test, A/B testing
- Z-test, standard normal distribution
- T-test, t-statistic, Student's t-distribution
- T-statistic vs. Z-statistic
- Type I and Type II errors
- Bayesian statistics (Bayes' theorem)
- Confidence interval (CI), margin of error
- Interpreting confidence levels and confidence intervals
- Chi-square test
- Chi-square distribution using Python
- Chi-square for goodness of fit test
- When to use which statistical distribution?
- Analysis of variance (ANOVA)
- Assumptions to use ANOVA
- Three types of ANOVA
- Partitioning of variance in the ANOVA
- Calculating ANOVA using Python
- F-distribution
- F-test (variance ratio test)
- Determining the values of F
- F-distribution using Python
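The partitioning of variance and the F-test listed above can be sketched in a few lines of Python. This is an illustrative one-way ANOVA computed by hand with the standard library, not the course's own demo: the three groups and their values are invented, and the variable names are arbitrary.

```python
import statistics

# Hypothetical measurements for three groups; values are invented.
groups = {
    "A": [4, 5, 6],
    "B": [6, 7, 8],
    "C": [8, 9, 10],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = statistics.mean(all_values)

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-group sum of squares: squared distance of each value
# from its own group mean.
ss_within = sum(
    (x - statistics.mean(values)) ** 2
    for values in groups.values()
    for x in values
)

# Total sum of squares; it should equal ss_between + ss_within.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F statistic: ratio of the between-group to within-group mean squares.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"SS between={ss_between}, SS within={ss_within}, SS total={ss_total}")
print(f"F({df_between}, {df_within}) = {f_stat}")
```

Turning the F statistic into a p-value requires the upper tail of the F-distribution with these degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, df_between, df_within)` gives that tail probability, and `scipy.stats.f_oneway` runs the whole test in one call.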
Project & Resources
- Resources for practice
- Final assignment