ORF 245:  Fundamentals of Statistics

Fall Semester, 2017

MW 3:00pm--4:20pm in Robertson Hall 100
Home page: http://orfe.princeton.edu/~jqfan/

Instructor: Jianqing Fan, Frederick L. Moore '18 Professor of Finance. Office: 205 Sherrerd Hall. Phone: 258-7924. E-mail: jqfan@princeton.edu

Precept: The four sessions are identical, and students are free to choose one. Precepts are held in Engineering Quad E-Wing, E225. The schedule is:
        Tuesdays:   3:30pm--4:20pm         7:30pm--8:20pm
        Thursdays:  3:30pm--4:20pm         7:30pm--8:20pm

Office Hours:
  • Instructor: Mondays 10:00am--11:00am and Wednesdays 11:00am--12:00pm, or by appointment, at 205 Sherrerd Hall
  • AIs: Room 005 Sherrerd Hall (the instructor's office hours are held at 205 Sherrerd Hall)
      --- Monday: 11:00am-12:00pm (Anna Guo), 7:00pm-8:00pm (Anna Guo)
      --- Tuesday: 9:30am-11:00am (Yvette Gong), 11:00am-12:30pm (Joe Zhong)
      --- Wednesday: 10:00am-11:00am (Kevin Wang), 1:30pm--2:30pm (Huanran Lu)
      --- Thursday: 11:00am-12:00pm (Anna Guo)
      --- Friday: 10:00am--11:00am (Kevin Wang), 1:30pm--2:30pm (Huanran Lu)

Assistants in Instruction (AIs): All office hours will be held at 005 Sherrerd Hall.

  • Yvette Gong (head AI), wenyang@princeton.edu,  609-258-8787. Office:  222 Sherrerd Hall.
    Office Hours:
    Tue 9:30am-11:00am
  • Anna Guo, yongyig@princeton.edu,  609-258-8787. Office:  222 Sherrerd Hall.
    Office Hours:
    Mon 11:00am-12:00pm, Mon 7:00pm-8:00pm, Thurs 11:00am-12:00pm
  • Huanran Lu, huanranl@princeton.edu,  609-258-6239, Office: 219 Sherrerd Hall. 
    Office Hours:
    Wed 1:30pm-2:30pm, Fri 1:30pm-2:30pm
  • Kevin Wang, kaizheng@princeton.edu,  609-258-4660, Office: 220 Sherrerd Hall. 
    Office Hours:
    Wed 10:00am-11:00am, Fri 10:00am-11:00am
  • Joe Zhong, yiqiaoz@princeton.edu,  609-258-8787, Office: 222 Sherrerd Hall. 
    Office Hours:
    Tue 10:30am-12:00pm
  • Statistics Lab, 609-258-9433 Location: 213 Sherrerd Hall.
  • Financial Econometrics Lab, 609-258-8787 Location: 222 Sherrerd Hall.


 Text and Reference Books:

  • Jay Devore, Probability and Statistics for Engineering and the Sciences, 9th Edition.


Syllabus:
A first introduction to probability, statistics, and machine learning. The course provides the background needed to understand and produce rigorous statistical analyses, including estimation, confidence intervals, hypothesis testing, regression, logistic regression, and machine learning. The applicability and limitations of these methods are illustrated using a variety of modern real-world data sets and hands-on work in the statistical software R. Precepts are based on real data analysis using R.

The course will cover the following topics; some topics will be assigned as reading.

  1. Descriptive statistics
  2. Probability
    • Sample space, event, probability
    • Conditional Probability, Bayes's Theorem
    • Independence
    • Monte Carlo Simulations
      Lecture Notes 2,   Homework 2
  3. Random variables and probability distributions
    • Random variables and probability distributions
    • Expected values and standard deviations
    • Probability density functions
  4. Commonly used distributions
    • Binomial distribution
    • Hypergeometric, negative binomial
    • Poisson distributions
    • Normal distributions
    • Normal approximations to data histograms
    • Exponential and Gamma distributions
    • Quantile-Quantile plot
  5. Joint Distributions and Random Samples
    • Discrete joint distribution
    • Joint densities
    • Covariance and correlation
    • Multivariate random variables
    • Square root law
    • Central limit theorem
  6. Concepts and Methods of Estimation
    • Point Estimation
    • Methods of Estimation
    • Standard error
    • Bootstrap
  7. Confidence intervals
    • Basic Concept
    • Precision, sample size
    • Bootstrap
    • Intervals based on normal population
    • One-sided confidence bounds
  8. Hypothesis Testing
    • Basic concept
    • Test for population mean
    • t-test
    • Test for population proportion
  9. Comparisons of two treatments
    • Inference based on two samples
    • Two-sample z-test
    • Two-sample t-test
    • Difference between two proportions
    • Analysis of paired data
    • $\chi^2$ tests and contingency tables
  10. Simple linear regression
    • Models and summary statistics
    • Estimation of model parameters
    • Regression effect and goodness of fit
    • Inference of model parameters
    • Prediction
    • Inference of Correlation
  11. Multiple and Nonlinear Regression
    • Parameter estimation
    • Variable Selection
    • Statistical inference and ANOVA
    • Model diagnostics
    • Training and Testing
    • Cross-validation and Prediction errors
    • Polynomial and nonlinear regression
    • Model building using dummies
  12. Logistic Regression and Classification
    • Logistic Regression
    • Supervised learning and Bayesian classifiers
    • Fisher and nearest-neighbor classification
    • Support vector machine
    • Unsupervised learning

Computation: The software package for this class is R. Implementing the statistical and machine learning ideas in R is an essential part of the class. Laptops may be used as calculators during exams; however, internet access and other communication tools must be turned off.

Attendance: Class attendance is required. The lectures cover many conceptual issues and aspects of statistical thinking that are not covered in the textbook; these will appear in the midterm and final exams. In addition, random quizzes will be used to check understanding and attendance.

Homework: Problems will be assigned on the class website with due dates. Homework is due at 8:00pm on the due date (placed in the "IN-OUT" box for the course at Room 123, Sherrerd Hall). Missed homework will receive a grade of zero. The homework will be graded, and each assignment carries equal weight, except the lowest score, which will receive only 40% weight. You are allowed to work with other students on the homework problems; however, verbatim copying of homework is absolutely forbidden. Each student must ultimately produce his or her own homework to be handed in and graded.
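
The homework weighting rule can be read in more than one way. The following is a minimal sketch, assuming the homework average is a weighted mean in which every assignment has weight 1 except the single lowest score, which has weight 0.4. The function name `homework_average`, the normalization by total weight, and the sample scores are illustrative assumptions, not the official course formula; the sketch is written in Python purely for illustration.

```python
def homework_average(scores):
    """Weighted mean of homework scores under an ASSUMED interpretation.

    Every assignment has weight 1.0 except the single lowest score,
    which has weight 0.4. Missed homework should be entered as 0.
    """
    if not scores:
        raise ValueError("at least one homework score is required")
    ordered = sorted(scores)                      # lowest score comes first
    weights = [0.4] + [1.0] * (len(ordered) - 1)  # down-weight only the lowest
    weighted_sum = sum(w * s for w, s in zip(weights, ordered))
    return weighted_sum / sum(weights)


# Hypothetical scores out of 100
print(round(homework_average([90, 80, 100, 60]), 2))  # prints 86.47
```

Under this reading, the four hypothetical scores average 86.47 rather than the unweighted 82.5, because the lowest score (60) counts for less.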


Exams: There will be an in-class midterm exam and a final exam. All exams are required, and there will be no make-up exams. Missed exams will receive a grade of zero. All exams are open-book and open-notes. Laptops with wireless off and calculators may be used during the exams.


Schedules and Grading:
Homework (30%) ...................... 8:00pm on due dates
Class/Precept Participation (5%) .... Random quizzes
Midterm Exam (20%) .................. Wed, Oct 25, 2017 (3:00--4:20pm), Robertson Hall 100
Final Exam (45%) .................... To be scheduled
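
As a worked example of how the four percentages combine, here is a minimal sketch that assumes each component is scored on a 0-100 scale and that the course score is the plain weighted sum. The names `WEIGHTS` and `course_score` and the sample component scores are hypothetical; how scores map to letter grades is not stated here and is not modeled. The sketch is written in Python purely for illustration.

```python
# Weights taken from the grading breakdown; they sum to 1.
WEIGHTS = {
    "homework": 0.30,
    "participation": 0.05,
    "midterm": 0.20,
    "final": 0.45,
}


def course_score(components):
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical component scores
example = {"homework": 86, "participation": 100, "midterm": 75, "final": 80}
print(round(course_score(example), 2))  # prints 81.8
```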

 

R labs: The following files are intended to help you become familiar with R commands.

 

Data sets used in the class

2007 The Trustees of Princeton University. Last update: September 12, 2009