Schedule
Any changes to the schedule will be reflected here, so we advise you to check this page often.
We will use Canvas for class announcements, materials and other administrivia.
The class meets Thursdays, 10:30-11:50, in Green Earth Sciences, Room 150.
April 6, 13
Ryan Tibshirani (UC Berkeley) and Daniel J. McDonald (U of British Columbia)
Opportunities and Challenges in Auxiliary Surveillance for Public Health in the United States
In 2015, the Delphi group at Carnegie Mellon University launched an effort called the Epidata project, to collect and make publicly available signals that reflect infectious disease activity in real-time. The focus was primarily on seasonal influenza in the United States. In March 2020, this effort was massively expanded and accelerated to help support the COVID-19 response. Now, Epidata has over 4.5 billion records, with ~3 million records added daily, and receives between 100,000 and 1 million API queries per day. It covers a diverse set of data streams, both novel and traditional, for tracking COVID-19, influenza, and other diseases. The first lecture, on April 6, will give a high-level summary of the main goals behind Epidata, and the challenges and opportunities in auxiliary surveillance for public health. The second lecture, on April 13, will dive into some of the software packages that Delphi is building that support data access, as well as nowcasting and forecasting.
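Epidata's signals are served through a public HTTP API. As a hedged illustration (the endpoint and parameter names below follow Delphi's published COVIDcast API, but this specific query is illustrative and not drawn from the lectures themselves), here is how a query URL for one signal might be assembled:

```python
from urllib.parse import urlencode

# Base URL of Delphi's public Epidata COVIDcast endpoint.
BASE = "https://api.delphi.cmu.edu/epidata/covidcast/"

# Illustrative parameters: daily confirmed COVID-19 case counts
# for California on one day, as reported by JHU CSSE.
params = {
    "data_source": "jhu-csse",
    "signal": "confirmed_incidence_num",
    "time_type": "day",
    "geo_type": "state",
    "time_values": "20200401",
    "geo_value": "ca",
}

url = BASE + "?" + urlencode(params)
print(url)
```

Fetching this URL returns a JSON payload of matching records; Delphi also publishes client packages that wrap this kind of query.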
April 20, 27
Henrik Bengtsson (UC San Francisco)
Futureverse - A Unifying Parallelization Framework in R for Everyone
A future is a programming construct designed for concurrent and asynchronous evaluation of code, making it particularly useful for parallel processing. The future package implements the Future API for programming with futures in R. This minimal API provides sufficient constructs for implementing parallel versions of well-established, high-level map-reduce APIs. The future ecosystem supports exception handling, output and condition relaying, parallel random number generation, and automatic identification of globals, lowering the threshold for parallelizing code. The Future API bridges parallel frontends with parallel backends, following the philosophy that end users choose the parallel backend while the developer focuses on what to parallelize. A variety of backends exist, and third-party contributions that meet the specifications, which ensure the same code works on all backends, are automatically supported.
The lectures focus on R, but programmers from other languages will also find the material useful.
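The future construct is indeed not specific to R. As a minimal cross-language sketch (using Python's standard-library concurrent.futures module, not the R future package covered in the lectures), the same create-now, resolve-later semantics look like this:

```python
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # Stand-in for an expensive computation.
    return x * x

with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a future; evaluation
    # proceeds concurrently in the worker pool.
    futures = [pool.submit(slow_square, x) for x in range(5)]
    # result() blocks only until that particular future resolves.
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9, 16]
```

The R Future API generalizes this pattern across many backends (multicore, cluster, and third-party contributions) without changing the user's code.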
May 4, 11
Steven Diamond (Gridmatic)
Convex Optimization and Applications in Python
Convex optimization plays a central role in many fields such as statistics, machine learning, control and data science.
These lectures will bring you up to speed on basic (applied) convex optimization. We introduce Disciplined Convex Programming (DCP), a system for constructing convex optimization problems using building blocks called atoms. DCP is implemented in the Python package CVXPY, which enables practitioners to construct and solve optimization problems in a few lines of code. We explore convex optimization applications through a variety of CVXPY exercises and examples.
We conclude with pointers to other open source tools and relevant literature.
May 18
Balasubramanian Narasimhan (Stanford University)
Convex Optimization in R
This lecture will add to the previous lectures by Steven Diamond by discussing applications in the R language using the CVXR package.
May 25, June 1
Jehangir Amjad (Google)
Data Commons
Publicly available data from open sources (census.gov, cdc.gov, data.gov, etc.) are vital resources for students and researchers in a variety of disciplines. Unfortunately, processing these datasets is often tedious and cumbersome. Organizations follow distinctive practices for codifying datasets, and combining data from different sources requires mapping common entities (city, county, etc.) and resolving different types of keys and identifiers. This process is time-consuming, tedious, and repeated over and over. Our goal with Data Commons is to address this problem.
Data Commons synthesizes a single graph from these different data sources. It links references to the same entities (such as cities, counties, organizations, etc.) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources without data cleaning or joining. Data Commons is designed to be useful to students, researchers, and enthusiasts across different disciplines.
In these lectures we will discuss Data Commons in general, its programmatic usage, and applications.
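The entity-linking idea above can be sketched in a few lines. The following toy example (the identifiers, keys, and schema are invented for illustration and are not Data Commons' actual API or data model) shows two sources that name the same county differently, resolved onto a single graph node:

```python
# Two hypothetical sources describing the same county under different keys.
source_a = {"FIPS:06085": {"population": 1_930_000}}
source_b = {"SantaClaraCounty_CA": {"median_income": 140_000}}

# A mapping from each source-specific key to one canonical node id
# (here styled after Data Commons' geoId convention, for illustration).
id_map = {
    "FIPS:06085": "geoId/06085",
    "SantaClaraCounty_CA": "geoId/06085",
}

# Merge both sources' facts onto the canonical node.
graph = {}
for source in (source_a, source_b):
    for key, facts in source.items():
        node = id_map[key]          # resolve to the canonical entity
        graph.setdefault(node, {}).update(facts)

print(graph["geoId/06085"])
```

Once entities are linked this way, a user can query one node and receive facts aggregated from every contributing source, with no manual cleaning or joining.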





