Everybody Contributes: Charts for the Sewer Coronavirus Alert Network by Zan Armstrong

Launched on Christmas Eve 2020, these "overview" charts, which track concentrations of SARS-CoV-2 and other pathogens sampled from sewage, continue to be used daily to inform the decisions of public health officials and the public. The charts, along with scientific commentary, are also featured in weekly updates to members of the CDC, the California public health department, city manager and local public health departments, wastewater treatment plant managers, researchers, and medical doctors. Adapted for Monkeypox, they were used to update the White House on the emerging crisis. They have resonated with the public as well, and have been featured on trusted TikTok and Twitter accounts with tens to hundreds of thousands of followers. Although they appear to be simple line charts, several subtle data-encoding and design decisions underlie their efficacy, resilience, and flexibility. These decisions were informed both by the underlying scientific context and by the types of decisions that public health officials needed to make based on this data. Twenty-one months after launch, even as the nature of the pandemic has changed in unexpected ways, these charts are increasingly relied upon as a critical tool for understanding and mitigating the impact of SARS-CoV-2, its variants, other potentially deadly pathogens like RSV and influenza, and emerging threats like Monkeypox and HMPV.

In late 2020, scientists Dr. Alexandra Boehm and Dr. Marlene Wolfe, who specialize in studying 'pathogens in the environment', were leading a collaboration with Verily, sewage treatment plants, and county public health departments. They sampled "settled solids" from sewage treatment plants and extracted, measured, and normalized traces of SARS-CoV-2 RNA. Because "everybody contributes" to the sewage system with each flush, this data could answer the questions that matter most to public health officials' decisions: "In our community, is COVID-19 getting better or worse right now? How much better or worse? And are there indications of what is coming around the corner so our community can prepare or mitigate?"

A key missing piece was the translation from the scientific context to the decision-makers' context. Two principles underpinned the resulting charts. First, be true to the scientific processes that generated the data, and let them inform how we interpret (and therefore visualize) it. That way, the most obvious interpretation of the chart is also the most scientifically accurate one. Second, design the chart to answer public health officials' most critical questions as effectively as possible, because those answers inform the high-stakes policy decisions they must make for their communities despite the uncertainty of the unfolding crisis.

The subtle, and sometimes unconventional, design decisions critical to the effectiveness of these line charts in answering this core "better / worse" question include:

#1 - Defining the horizontal gridlines at 1/2x, 1x, and 2x the concentration of SARS-CoV-2 from two weeks earlier, so you can immediately eyeball whether things have been improving or worsening recently, and by how much. Increases and decreases were more important and more meaningful than a raw value like 0.0000234.

This was especially important because, at the time these graphs were designed, we did not know whether it was scientifically accurate to compare values across sewage treatment plants. It had been shown that, for each specific treatment plant, we could meaningfully compare values over time: an increase from 0.00002 to 0.00003 meant that things had gotten ~50% worse. But it was not yet known whether a value of 0.00002 in Oceanside vs 0.00003 in Palo Alto meant that more of the population was infected in the Palo Alto sewershed than in Oceanside's sewershed (sewage treatment plants aren't identical). Labeling with relative values instead of absolute values therefore focused the charts on the stakeholders' question, "getting better or worse?", while also avoiding comparisons, and therefore conclusions, that were not yet scientifically sound.

#2 - Smoothing with a 5-day trimmed, centered mean to balance the need for sensitivity to rapid changes in concentrations with robustness to single-day outlier samples.

#3 - By default, the charts are zoomed in to just the most recent six weeks of data, as we're most concerned with "what's changing now". The y-axis max value is set based on the max smoothed value during that time period, so that even when concentrations are relatively low we'll notice a small increase that might be the first hint of an upcoming surge. In a full-history chart, that critical signal may be an easy-to-overlook blip compared to past surges.

#4 - At the same time, it's easy to scroll or toggle to see the full historical context, and compare current levels to past peaks.

#5 - When variants were later added, shades of blue represented genes indicative of all types of SARS-CoV-2, while bright colors called attention to variant-specific traces. Points represent the early, less frequent, unsmoothed data, while lines represent the smoothed data from daily sampling.
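
As a rough illustration of decisions #1 through #3, here is a minimal Python sketch. It is not the code behind the production charts. The function names, the data layout (one value per day with no gaps), the choice to trim exactly the single highest and single lowest value in each 5-day window, the shorter windows at the edges, and the fallback to the latest earlier value when the reference day is missing are all assumptions made for the example.

```python
from datetime import date, timedelta


def trimmed_centered_mean(values, window=5, trim=1):
    """Smooth daily values with a centered window, dropping extremes first.

    Assumption: `trim` values are dropped from each end of the sorted window
    (the text only says "5-day trimmed, centered mean"). Near the edges of
    the series the window is simply shorter.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        nearby = sorted(values[max(0, i - half): i + half + 1])
        if len(nearby) > 2 * trim:  # only trim when something would remain
            nearby = nearby[trim:len(nearby) - trim]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed


def relative_gridlines(series, today, lookback_days=14):
    """Return [(label, y)] gridlines at 1/2x, 1x, and 2x the value two weeks ago.

    `series` maps date -> smoothed concentration (hypothetical layout).
    Assumption: if no value exists exactly `lookback_days` before `today`,
    the latest earlier value is used as the reference.
    """
    target = today - timedelta(days=lookback_days)
    earlier = [d for d in series if d <= target]
    if not earlier:
        raise ValueError("no value on or before the reference day")
    reference = series[max(earlier)]
    return [("1/2x", 0.5 * reference), ("1x", reference), ("2x", 2 * reference)]


def recent_y_max(smoothed, recent_days=42):
    """Y-axis maximum for the default view: the largest smoothed value in the
    most recent six weeks, rather than in the full history."""
    return max(smoothed[-recent_days:])


# Example using the illustrative values from the text: 0.00002 two weeks ago.
series = {date(2021, 1, 1): 0.00002, date(2021, 1, 15): 0.00003}
gridlines = relative_gridlines(series, today=date(2021, 1, 15))
```

With a reference value of 0.00002, the gridlines fall at 0.00001, 0.00002, and 0.00004, so a current value of 0.00003 sits between the 1x and 2x lines and reads as "worse, but not yet doubled" without anyone having to interpret the raw number. In the same spirit, a single-day spike inside an otherwise flat 5-day window is discarded by the trimming step, and a past surge outside the six-week window does not flatten the default y-axis.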

Nearly two years later, these fundamental design decisions have proved robust enough to field emerging questions like: Which variant is dominant, declining, or ascending in each community? How bad is it compared to past surges? What about RSV? Influenza? And emerging threats like Monkeypox? The audience has also broadened to include the medical community, which makes life-and-death decisions about patient treatment based on the predominant variant in their specific community (e.g., delta vs omicron), and the general public, who make decisions about their own day-to-day activities.

At first glance these charts might not seem that specialized. However, a familiar and appropriate form combined with nuanced data-encoding choices has made these charts vital, trusted, and beloved by those who return to them over and over again.