Instructor. Jianqing Fan, Frederick L. Moore'18 Professor of Finance. Office:
205 Sherrerd Hall. Phone: 258-7924. E-mail: email@example.com
The four sessions are identical, and students are free to choose one. Precepts are held at Engineering Quad E-Wing, E225. The schedules are
Mondays 10:00am-11:00am and Wednesdays 11:00am-12:00pm, or by appointment, at 205 Sherrerd Hall
- AIs: Room 005 Sherrerd Hall (the instructor's office hours are held at 205 Sherrerd Hall)
--- Monday: 11:00am-12:00pm (Anna Guo), Mon 7:00pm-8:00pm (Anna Guo)
--- Tuesday: 9:30am-11:00am (Yvette Gong), 11:00am-12:30pm (Joe Zhong)
--- Wednesday: 10:00am-11:00am (Kevin Wang), 1:30pm-2:30pm (Huanran Lu)
--- Thursday: 11:00am-12:00pm (Anna Guo)
--- Friday: 10:00am-11:00am (Kevin Wang), 1:30pm-2:30pm (Huanran Lu)
Assistants in Instruction (AIs): All office hours will be held at
005 Sherrerd Hall.
- Yvette Gong (head AI), firstname.lastname@example.org,
222 Sherrerd Hall.
Office Hours: Tue 9:30am-11:00am
- Anna Guo, email@example.com,
222 Sherrerd Hall.
Office Hours: Mon 11:00am-12:00pm, Mon 7:00pm-8:00pm, Thurs 11:00am-12:00pm
- Huanran Lu, firstname.lastname@example.org,
Office: 219 Sherrerd Hall.
Office Hours: Wed 1:30pm-2:30pm, Fri 1:30pm-2:30pm
- Kevin Wang, email@example.com,
Office: 220 Sherrerd Hall.
Office Hours: Wed 10:00am-11:00am, Fri 10:00am-11:00am
- Joe Zhong, firstname.lastname@example.org,
Office: 222 Sherrerd Hall.
Office Hours: Tue 11:00am-12:30pm
- Statistics Lab, 609-258-9433 Location: 213 Sherrerd
- Financial Econometrics Lab, 609-258-8787 Location: 222 Sherrerd
Textbook and Reference Books:
- Jay Devore, Probability and Statistics for Engineering and the Sciences, 9th Edition.
Syllabus: A first introduction to probability, statistics and machine learning. This course provides the background needed to understand and produce rigorous statistical analysis, including estimation, confidence intervals, hypothesis testing, regression, logistic regression and machine learning. The applicability and limitations of these methods will be illustrated using a variety of modern real-world data sets and hands-on use of the statistical software R. Precepts are based on real data analysis using R.
The course will cover the following topics; some topics will be assigned as reading material.
Descriptive statistics
- Statistics vs. probability, sample vs. population
- Summary statistics: mean, SD, median, IQR
- Graphical summary: pie charts, histograms, box plots
Lecture Notes 1, Homework 1
Random variables and probability distributions
- Sample space, event, probability
- Conditional Probability, Bayes's Theorem
- Monte Carlo Simulations
Lecture Notes 2, Homework 2
Commonly used distributions
- Random variables and probability distributions
- Expected values and standard deviations
- Probability density functions
Lecture Notes 3, Homework 3
Joint Distributions and Random Samples
- Binomial distribution
- Hypergeometric, negative binomial
- Poisson distributions
- Normal distributions
- Normal approximations to data histograms
- Exponential and Gamma distributions
- Quantile-Quantile plot
Lecture Notes 4, Homework 4
Concepts and Methods of Estimation
- Discrete joint distribution
- Joint densities
- Covariance and correlation
- Multivariate random variables
- Square root law
- Central limit theorem
Lecture Notes 5, Homework 5
- Point Estimation
- Methods of Estimation
- Standard error
Lecture Notes 6,
Comparisons of two treatments
- Basic Concept
- Precision, sample size
- Intervals based on normal population
- One-sided confidence bounds
Lecture Notes 6, Homework 6
Simple linear regression
- Inference based on two samples
- Two-sample z-test
- Two-sample t-test
- Difference between two proportions
- Analysis of paired data
- $\chi^2$ tests and contingency tables
Lecture Notes 9, Homework 8
Multiple and Nonlinear Regression
- Models and summary statistics
- Estimation of model parameters
- Regression effect and goodness of fit
- Inference of model parameters
- Inference of Correlation
Lecture Notes 10, Homework 9
Logistic Regression and Classification
- Parameter estimation
- Variable Selection
- Statistical inference and ANOVA
- Model diagnostics
- Training and Testing
- Cross-validation and Prediction errors
- Polynomial and nonlinear regression
- Model building using dummies
Lecture Notes 11
- Logistic Regression
- Supervised learning and Bayesian classifiers
- Fisher and nearest-neighbor classification
- Support vector machine
- Unsupervised learning
Lecture Notes 12, Homework 10
The software package for this class is R. Implementing the statistical and machine learning ideas is essential to this class. Laptops may be used as calculators during exams; however, internet access and other communication tools must be turned off.
Popularity of Data Science Software
Popularity of Programming Languages
Class attendance is required. The class covers many conceptual issues and aspects of statistical thinking that are not covered in the textbook; these will appear on the midterm and final exams. In addition, random quizzes will be used to check understanding and attendance.
Problems will be assigned on the class website with due dates. They are
due at 8:00pm on the due date (placed in the "IN-OUT" box for
the course at Room 123, Sherrerd Hall). Missed homework will receive a grade of zero. The homework will be graded, and each assignment carries equal weight, except the lowest score, which will receive only 40% weight. You are allowed to work with other students on the homework problems; however, verbatim copying of homework is absolutely forbidden. Each student must therefore ultimately produce his or her own homework to be handed in and graded.
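The homework weighting rule can be illustrated with a small sketch. This is not the official grading computation: it assumes the weights are normalized by their sum (the policy does not say how they are normalized), and the function name `homework_average` and the example scores are hypothetical.

```python
def homework_average(scores, lowest_weight=0.4):
    """Weighted mean of homework scores, down-weighting the lowest one.

    Assumption: every assignment has weight 1 except the single lowest
    score, which has weight `lowest_weight`, and the result is divided
    by the sum of the weights. The normalization is an assumption.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("at least one homework score is required")
    lowest_index = scores.index(min(scores))  # first occurrence if tied
    weights = [1.0] * len(scores)
    weights[lowest_index] = lowest_weight
    weighted_total = sum(w * s for w, s in zip(weights, scores))
    return weighted_total / sum(weights)


# Hypothetical scores out of 100; the missed assignment counts as zero.
example_scores = [90, 80, 100, 0, 70]
example_average = homework_average(example_scores)
print(round(example_average, 2))
```

Under this assumed normalization, the example works out to 340/4.4, about 77.3, compared with an unweighted mean of 68, so a single missed assignment still lowers the average but less than it would with equal weights.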
There will be an in-class midterm exam and a final exam. All exams are required, and
there will be no make-up exams.
Missed exams will receive a grade of zero. All exams are
open-book and open-notes. Laptops with wireless turned off and calculators may be used during the exams.
Schedules and Grading:
Homework: due at 8:00pm on the due dates
Class/Precept Participation (5%)
Midterm Exam (20%): Wed, Oct 25, 2017, 3:00pm-4:20pm, Robertson Hall 100
Final Exam: to be scheduled.
R labs: The following
files are intended to help you become familiar with R commands.
Data sets used in the class
- Daily Stock Prices from 1/1/2000 to 9/8/2016:
Johnson & Johnson ,
- Tax data in years
- Salary data (in pounds) for 253 MBAs' first jobs in the UK in 2010
- 129 macroeconomic monthly time series from 1959 to 2016
- motorcycle data
- autism data
- Boston Housing Data
- Image Data: 500 photos with people and 500 photos without people,
and the associated R code to preprocess the data: human.r