Description of the Phys5053 course Data Analysis Methods

Updated May 2020 for the Fall 2020 offering of Phys5053

The official course website is hosted on Canvas, and will be available to you when you register. This page contains a more extensive description of the course and its content.

This is a lab-style course that surveys data analysis methods in the physical sciences. Though designed for physics M.S. students and advanced undergraduates, the content of the labs is drawn from a wide spectrum of topics. The course has been taken by students from other departments and by students from other majors switching into a physics M.S. degree.

It is a lab course where the data is already available. The activities are built around quantitative tools and processes. Each analysis runs for two or three weeks, iterating from a simple first look to complex conclusions.

The course has no required textbook, but you will find it advantageous to have a reference on the statistical treatment of data and uncertainties. Typical physics undergraduate texts such as "Data Reduction and Error Analysis for the Physical Sciences" by Bevington and Robinson or "An Introduction to Error Analysis" by Taylor would suit, as probably would a text from a statistics course. Paper versions are available in the lab, and electronic versions of these resources may be made available; stay tuned.

The course assumes some prior experience. All course materials are supported in Python or Matlab, and previous programming experience is necessary. Experience with undergraduate-level statistics is also assumed. Physics majors will have gotten this experience in their junior- or senior-level (or possibly sophomore) laboratory courses, plus the concepts of probability typically covered in quantum courses. For non-physics majors, a post-calculus statistics course (like Stat3611 and Stat3612 at UMD) would do.

The data sets we will analyze come from many sources and from not especially obscure parts of physics. On purpose, the physical model behind each one, and the reason it is interesting, will usually be familiar to you. They are examples of the most common types of data found in the wild, both by "data analysts" working outside academia and by scientists and engineers. They are (almost all) real data, with all the caveats and tricky aspects that implies. Each data set will require up to three week-long cycles to unpack new layers of detail. Reporting conclusions in professional figures and written reports is a major part of the course. One or more "status update" reports and formal oral reports will be required.

Synthetic statistical data and human height distributions, probability distribution functions and their properties.

Furnace run time vs outdoor air temperature, Newton's Law of Cooling, model fitting

Lake Superior air temperature, Fourier transforms and spatial correlation functions

Cosmic ray arrival time, Poisson data, coincidence data, very large data sets

Solar panel yield and energy use on UMD campus, data-driven cost modeling

Skittles (the candy) color distributions, hypothesis testing, Monte Carlo simulation

Propagating input uncertainty in orbital dynamics numerical simulations

ANOVA style analysis using the classic plants and paints data sets

An independent project of your choosing.
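
To give a flavor of the hypothesis testing and Monte Carlo simulation named in the Skittles topic above, here is a minimal sketch in Python. It is not taken from the course materials: the color counts are made up for illustration, the function names are hypothetical, and the null hypothesis (all colors equally likely) is an assumption chosen for simplicity. The sketch computes a chi-square statistic for one bag, then estimates a p-value by simulating many bags under the null hypothesis.

```python
import random


def chi_square(observed, expected):
    """Pearson chi-square statistic for observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def monte_carlo_p_value(observed, n_trials=2000, seed=0):
    """Estimate the p-value of the observed counts under a uniform null.

    Assumption (for illustration only): every color is equally likely.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_colors = len(observed)
    total = sum(observed)
    expected = [total / n_colors] * n_colors
    observed_stat = chi_square(observed, expected)

    n_extreme = 0
    for _ in range(n_trials):
        # Simulate one bag: draw each candy's color uniformly at random.
        counts = [0] * n_colors
        for _ in range(total):
            counts[rng.randrange(n_colors)] += 1
        # Count simulated bags at least as far from uniform as the real one.
        if chi_square(counts, expected) >= observed_stat:
            n_extreme += 1

    return observed_stat, n_extreme / n_trials


# Hypothetical counts for five colors in one bag (not real course data).
bag = [14, 9, 12, 11, 14]
stat, p = monte_carlo_p_value(bag)
print(f"chi-square = {stat:.2f}, Monte Carlo p-value = {p:.3f}")
```

A large p-value means bags this uneven are common under the uniform assumption, so the counts give no evidence against it. The simulation approach is used here instead of a chi-square table because it needs no assumption about the statistic's distribution when counts are small.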