Pioreactor development log #5
⭐️ One of our goals with the Pioreactor is to design it so that you don't need to be a biologist, or an electrical engineer, or (relevant for this article) a statistician. This article describes our internal algorithm that computes the culture's growth rate, but importantly, you don't need to know this algorithm to use the Pioreactor! We've designed the internal statistical algorithm to be robust enough that you can sit back and watch. This article is for users who really want to dig deep into how we compute growth rates and the statistics behind it.
Introduction to cell density, optical density, and growth rate
An important metric in bio-processing is the cell density, or biomass, of a bioreactor. The cell density can be measured directly, but the cost of this is very high. Either you are pulling a sample out and counting cells (slow, manual, and noisy), or passing liquid through a flow cytometer (expensive). A common proxy for cell density is optical density: measuring the amount of light scattering off cells. This approach is fast and inexpensive, but has drawbacks, too:
- Optical density sensors have upper and lower thresholds: too few cells can be difficult to detect, and too many cells can saturate the sensors (multiple scattering events are unpredictable).
- Changes in the size of cells during metabolic shifts can change the scattering, and hence the optical density measurement.
- Optical density is unit-less, whereas cell density has units. A calibration curve is needed to convert optical density measurements to cell density, and that curve is culture-specific, too.
- Non-viable (dead) cells contribute to optical density.
- Optical density is a noisy measurement of cell density. That is, for any true cell density, the optical density measurement will have some statistical error that can't be reduced.
Suffice it to say, optical density ≠ cell density, but tuned correctly, optical density correlates very highly with cell density and is often good enough. This is why optical density is used so commonly in microbiology.
The Pioreactor, our affordable bioreactor, uses optical density to measure cell density. In fact, we have multiple optical density sensors. We call measurements from these sensors our raw optical density measurements, in volts. Taking a measurement is cheap, so we record a new measurement every 5 seconds. Because of the variability in each Pioreactor, the raw optical density measurements will vary between Pioreactors, and they don't have a direct interpretation anyway. For example, what does a raw optical density of 0.4341 V mean? Is that high, low, neither? So we don't normally expose the raw optical density to users.
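Instead, we will investigate the normalized optical density, defined as the current raw optical density divided by the mean of the raw optical density at the start of the experiment. That is, if $OD_t$ is the raw optical density at time $t$ and $\overline{OD}_0$ is the mean raw optical density at the start of the experiment, then the normalized optical density is

$$nOD_t = \frac{OD_t}{\overline{OD}_0}$$

Analogously, if $CD_t$ is the cell density at time $t$ and $CD_0$ is the cell density at the start of the experiment, then the normalized cell density is

$$nCD_t = \frac{CD_t}{CD_0}$$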
This quantity is interpreted as the excess new cell density (new biomass) created, relative to the initial cell density (biomass). Of course, we can't observe this quantity directly, but we wish to estimate it.
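The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not Pioreactor's actual code: the function name `normalize_od`, the parameter `n_baseline` (how many initial readings count as "the start of the experiment"), and the sample voltages are all assumptions made for the example.

```python
from statistics import mean


def normalize_od(raw_od_volts, n_baseline):
    """Divide every raw reading by the mean of the first `n_baseline` readings."""
    if not 1 <= n_baseline <= len(raw_od_volts):
        raise ValueError("n_baseline must be between 1 and the number of readings")
    baseline = mean(raw_od_volts[:n_baseline])
    return [volts / baseline for volts in raw_od_volts]


# Hypothetical raw readings in volts: four baseline readings, then a later one.
raw = [0.43, 0.44, 0.42, 0.43, 0.86]
normalized = normalize_od(raw, n_baseline=4)
# The baseline readings hover around 1.0, and the last reading is about
# twice the starting level, which is far easier to interpret than 0.86 V.
```

Because every Pioreactor is divided by its own baseline, normalized readings from different units become comparable even when their raw voltages differ.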
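We are also interested in the rate of change of the culture. In fact, as microbes grow exponentially, we are interested in the growth rate, $gr_t$, which relates the normalized cell density $nCD_t$ at time $t$ to its value a small time step $\Delta t$ later:

$$nCD_{t+\Delta t} = nCD_t \, e^{gr_t \Delta t}$$

This is kinda like the derivative from calculus, but there's an exponential in there. Because we know our culture grows exponentially, this mathematical form makes more sense: a growth rate of 0 means no growth, a positive growth rate means exponential growth, and a negative growth rate means exponential decay. In fact, for small $\Delta t$, this is equivalent to

$$gr_t = \frac{d}{dt} \log nCD_t$$

Note that using normalized cell density doesn't muddy the definition of our growth rate. The quantity $\log nCD_t$ differs from the log of the un-normalized cell density only by a constant (the log of the initial cell density), and that constant vanishes when we differentiate. The growth rate is therefore the same whether or not we normalize.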
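As a quick numerical check that normalization leaves the growth rate untouched, here is a small sketch. It assumes the growth rate is the exponent in exponential growth over a time step, so that `density_after = density_before * exp(gr * dt)`. The function name and the cell counts are hypothetical.

```python
import math


def growth_rate(density_before, density_after, dt_hours):
    """Growth rate (per hour) implied by exponential growth between two readings."""
    return math.log(density_after / density_before) / dt_hours


# Hypothetical cell densities (cells/mL) measured one hour apart.
cell_density = [2.0e6, 2.2e6]
initial_density = 1.0e6  # hypothetical density at the start of the experiment

normalized = [c / initial_density for c in cell_density]

gr_raw = growth_rate(cell_density[0], cell_density[1], dt_hours=1.0)
gr_norm = growth_rate(normalized[0], normalized[1], dt_hours=1.0)
# Both equal log(1.1): the normalizing constant cancels inside the ratio.
```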
Estimation
Now that the definitions are in place, how do we go about estimating these quantities, specifically the growth rate? Recall we don't have the cell density, so the best we can do is use the optical density measurements as a proxy. I've seen a few ways to estimate the growth rate, and I'll explain why I think they are problematic.
Parametric model approach
The first option is to fit the entire raw optical density curve to a logistic/Gompertz curve (see Figure 1.). This fitted parametric model of optical density provides us with interpretable parameters. However, this procedure has many problems:
- Fitting to the parametric model only makes sense after the experiment is completed. So you won't be able to get growth rate measurements in real time.
- Parametric models are rigid. See Figure 1. for an example of a poor fit. Even worse, if your culture exhibits diauxic growth (two or more growth spurts), it won't neatly fit into any simple "box" provided by parametric models.
- They provide a single estimate of growth rate, when in fact, growth rate changes over time due to metabolic changes in the culture. We should have a sequence of growth rates over time - not a single value.
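To illustrate the last point, the sketch below evaluates a logistic curve and the instantaneous growth rate it implies, computed as the numerical derivative of the log of the curve. The parameter values are made up for illustration, and no fitting is performed. Even when the curve has a single rate parameter `r`, the growth rate of the culture it describes falls from roughly `r` toward zero as the curve saturates.

```python
import math


def logistic(t, capacity, r, n0):
    """Logistic curve N(t) = K / (1 + (K / n0 - 1) * exp(-r * t))."""
    return capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-r * t))


def implied_growth_rate(t, capacity, r, n0, h=1e-4):
    """Central-difference estimate of d/dt log N(t)."""
    upper = math.log(logistic(t + h, capacity, r, n0))
    lower = math.log(logistic(t - h, capacity, r, n0))
    return (upper - lower) / (2.0 * h)


# Hypothetical parameters: carrying capacity 1.0, rate 0.8 per hour, start at 0.01.
K, r, n0 = 1.0, 0.8, 0.01
rates = [implied_growth_rate(t, K, r, n0) for t in (0.0, 5.0, 10.0, 20.0)]
# rates starts near r * (1 - n0 / K) = 0.792 and decays toward 0.
```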

Partitioning samples approach
To address the problems above, we can try something as follows: take a snapshot of the last $N$ optical density measurements, estimate a single growth rate from them, and repeat on the next $N$ measurements. This has problems as well:
- How many points, $N$, is best? A small $N$ means a higher resolution growth rate sequence, but also higher estimation error, which causes a noisy growth rate sequence. A high $N$ is robust to outliers, but is also a lagging measure.
- The procedure is not recursive, that is, it doesn't borrow relevant information from the past. If we estimate the growth rate in the previous step, we know the current growth rate is going to be close to that estimate. However, we totally ignore all previous estimates (and data points!) in the estimation of the current step.
- It has a different time period than the original signal. For every $N$ optical density measurements, we only observe a single growth rate measurement.
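A minimal sketch of this partitioning idea follows. How the growth rate is estimated within each snapshot is an assumption here: the example uses the least-squares slope of log optical density against time, which is one common choice. The function name and inputs are hypothetical.

```python
import math


def windowed_growth_rates(normalized_od, window, dt):
    """One growth rate per non-overlapping block of `window` readings.

    Each rate is the least-squares slope of log(normalized OD) versus time,
    where consecutive readings are `dt` time units apart.
    """
    if window < 2:
        raise ValueError("window must contain at least 2 readings")
    times = [i * dt for i in range(window)]
    t_mean = sum(times) / window
    denominator = sum((t - t_mean) ** 2 for t in times)
    rates = []
    for start in range(0, len(normalized_od) - window + 1, window):
        logs = [math.log(v) for v in normalized_od[start:start + window]]
        y_mean = sum(logs) / window
        numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
        rates.append(numerator / denominator)
    return rates


# Noise-free synthetic data: growth rate 0.3 per hour, one reading every 5 seconds.
dt_hours = 5.0 / 3600.0
synthetic = [math.exp(0.3 * i * dt_hours) for i in range(40)]
rates = windowed_growth_rates(synthetic, window=10, dt=dt_hours)
# 40 readings in blocks of 10 give only 4 growth rate values.
```

The output has one value per block rather than one per reading, and each block is fit in isolation, which is exactly the set of drawbacks listed above.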

Pioreactor's approach: Kalman Filters
Okay, now that I've torn down the existing methods, what method do we implement in Pioreactor? It's a process-based model that uses a Kalman Filter. That is, we model the entire system of unknowns and their relationships, and let statistics handle the estimation. We start by supposing that we know the normalized cell density at time
where
Given we know
We next write down how we think
We also need to model how that acceleration term evolves. To keep it simple, we assume it is close to its previous value:
What have we done? We have modeled the physical system's variables (observed and hidden) and the relationships between these variables, considering our domain knowledge. Next, with this model in hand, we can pass it to a Kalman Filter for estimation. So when we feed the Kalman Filter the observations
Let's see it in action! Figure 3. reports the output from the same optical density data as used above.
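For readers who want to experiment, here is a deliberately simplified sketch of the idea. It is not the Pioreactor implementation. It assumes a single sensor, works on the logarithm of the normalized optical density so that the model becomes linear, and tracks three hidden values: log density, growth rate, and the acceleration of the growth rate. All names, initial uncertainties, and noise variances are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kalman_growth_rates(normalized_od, dt, obs_var=1e-4,
                        process_var=(1e-8, 1e-8, 1e-10)):
    """Return one filtered growth rate estimate per reading."""
    # Hidden state: [log(nOD), growth rate, acceleration of growth rate].
    x = [math.log(normalized_od[0]), 0.0, 0.0]
    # Initial uncertainty (assumed): we know little about the rate at the start.
    P = [[1e-2, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1e-2]]
    # Transition: log density moves by rate * dt, rate moves by acceleration * dt.
    F = [[1.0, dt, 0.0], [0.0, 1.0, dt], [0.0, 0.0, 1.0]]
    F_T = [list(col) for col in zip(*F)]
    rates = [x[1]]
    for od in normalized_od[1:]:
        # Predict step: push the state and its covariance forward in time.
        x = [x[0] + dt * x[1], x[1] + dt * x[2], x[2]]
        P = matmul(matmul(F, P), F_T)
        for i in range(3):
            P[i][i] += process_var[i]
        # Update step: we observe only the first state entry, log(nOD).
        innovation = math.log(od) - x[0]
        s = P[0][0] + obs_var
        gain = [P[i][0] / s for i in range(3)]
        x = [x[i] + gain[i] * innovation for i in range(3)]
        P = [[P[i][j] - gain[i] * P[0][j] for j in range(3)] for i in range(3)]
        rates.append(x[1])
    return rates


# Noise-free synthetic culture growing at 0.01 per time step.
observations = [math.exp(0.01 * k) for k in range(500)]
estimates = kalman_growth_rates(observations, dt=1.0)
```

Unlike the partitioning approach, this produces a growth rate estimate for every reading, and each estimate reuses everything the filter learned from earlier readings.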

Tuning the parameters of the Kalman Filter
The most annoying part of using Kalman Filters is choosing appropriate entries for the covariance matrices (
Constructing the observed covariance matrix,
Let's start with the easier case of
This gives us a pretty good estimate of
Note: a high value here means that we require a stronger (more extreme) "signal" from the incoming data to move the estimate. In other words, the filter smooths more heavily. It's kinda like lowering the cutoff of a low-pass filter...
So our
Note the 0s in the off-diagonal. This is telling the Kalman Filter that our signals are independent. Is this true? Not totally, as we do observe some correlation. One possible improvement is to also calculate the covariance between the signals at the start of the experiment and use it to populate the off-diagonals.
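The off-diagonal improvement can be sketched as follows. This example assumes two sensors and uses the plain sample variance and covariance of their start-of-experiment readings as the estimator. That choice, the function name, and the sample values are assumptions for illustration, not a description of how Pioreactor computes this matrix.

```python
def observation_covariance(signal_a, signal_b, include_covariance=True):
    """2x2 sample covariance matrix of two sensors' start-of-experiment readings.

    With include_covariance=False the off-diagonals are left at 0, which treats
    the two signals as independent.
    """
    n = len(signal_a)
    if n != len(signal_b) or n < 2:
        raise ValueError("need two equally long signals with at least 2 readings")
    mean_a = sum(signal_a) / n
    mean_b = sum(signal_b) / n
    var_a = sum((a - mean_a) ** 2 for a in signal_a) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for b in signal_b) / (n - 1)
    cov_ab = 0.0
    if include_covariance:
        cov_ab = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(signal_a, signal_b)) / (n - 1)
    return [[var_a, cov_ab], [cov_ab, var_b]]


# Hypothetical normalized readings from two sensors at the start of an experiment.
sensor_1 = [1.00, 1.02, 0.98, 1.01, 0.99]
sensor_2 = [1.00, 1.03, 0.97, 1.02, 0.98]

independent = observation_covariance(sensor_1, sensor_2, include_covariance=False)
correlated = observation_covariance(sensor_1, sensor_2)
```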
Constructing the process covariance matrix,
If you were hoping for a lesson on how to estimate
Conclusion
We hope this blog article gave you more confidence in how we do our internal inference. I think this Kalman Filter is one of our most powerful features. Accurate, real-time growth rates are critical for protocols like the morbidostat and more. We haven't touched on it (yet), but there's also a "trick" to gracefully handle dilution events (which drop the cell density and optical density). The other estimation approaches above can't handle this, but the Kalman Filter can. More on that in the future!