

None of the results I show here should be considered authoritative.

I am NOT an epidemiologist or a professional data scientist. What prompted me to do a write-up was discovering a new function in Matt Dancho’s timetk package, tk_augment_lags, which makes short work of building multiple lags. Not too long ago, managing models for multiple lags and multiple states would have been a bit messy. The emerging tidymodels framework from RStudio, with its use of list columns, is immensely powerful for this sort of thing. It’s great to reduce so much analysis into so few lines of code. This was an exciting project because I got some validation of my approach.
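
To make "building multiple lags" concrete, here is a minimal sketch of the idea. It is not the code from this project: it uses only base R rather than timetk or tidymodels, the numbers are made up (deaths are constructed to trail cases by exactly two days), and `lag_vec` is a hypothetical helper written just for this illustration.

```r
# Toy data, invented for illustration only (not real COVID-19 figures).
cases <- c(10, 14, 20, 31, 45, 60, 72, 80, 77, 65, 50, 41, 30, 22, 15)
n <- length(cases)

# Hypothetical helper: shift a vector back by k positions, padding with NA.
lag_vec <- function(x, k) {
  if (k == 0) return(x)
  c(rep(NA, k), head(x, -k))
}

# Assumption: deaths are 2% of the cases seen two days earlier,
# plus a small deterministic wiggle so the fit is not exactly perfect.
true_lag <- 2
wiggle <- 0.05 * rep(c(1, -1), length.out = n)
toy <- data.frame(
  day    = seq_len(n),
  cases  = cases,
  deaths = 0.02 * lag_vec(cases, true_lag) + wiggle
)

# Build one lagged copy of cases per candidate lag.
candidate_lags <- 0:4
for (k in candidate_lags) {
  toy[[paste0("cases_lag", k)]] <- lag_vec(toy$cases, k)
}

# Fit deaths ~ lagged cases once per lag and keep each fit's R-squared.
# lm() drops the rows that contain NA by default.
r_squared <- sapply(candidate_lags, function(k) {
  fit <- lm(reformulate(paste0("cases_lag", k), response = "deaths"), data = toy)
  summary(fit)$r.squared
})
names(r_squared) <- paste0("lag", candidate_lags)

# The lag whose model explains the most variance in deaths.
best_lag <- candidate_lags[which.max(r_squared)]
print(round(r_squared, 3))
print(best_lag)
```

The loop that builds the lag columns is the bookkeeping that tk_augment_lags is meant to shorten, and the per-lag fits are the kind of objects a list column can hold, one row per lag and per state. Picking the lag by R-squared is just one simple choice made for this sketch.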
I have been thinking about how to measure mortality lags for a while now. The purpose of this project is, as usual, twofold: first, to explore an interesting data science question and, second, to explore some techniques and packages in the R universe. We will be looking at the relationship of COVID-19 cases to mortality. What is the lag between a positive case and a death? How does that vary among states? How has it varied as the pandemic has progressed? This is an interesting project because it combines elements of time series forecasting and dependent variable prediction.

I have a macabre fascination with tracking the course of the COVID-19 pandemic. I suspect there are two reasons for this. First, by delving into the numbers I imagine I have some control over this thing. Second, it feels like lighting a candle to show that science can reveal truth at a time when the darkness of anti-science is creeping across the land.
