1. eladar
    Joined
    12 Jul '08
    Moves
    13814
    02 Apr '20 19:24 (1 edit)
    @DeepThought

    Looks like you are confusing SD with Margin of Error.

    You use 1.96 SD to calculate the Margin of Error for a 95 percent confidence interval.
  2. Standard member Removed
    Joined
    10 Dec '06
    Moves
    8528
    02 Apr '20 19:26 (2 edits)
    @ponderable said
    you buy it now?
    I'm not trying to pick a fight, but no, I still don't buy it. Your prediction of 38,000 deaths by April 11th is nearly double the current projection. Just 2 days further and it's well past double.
  3. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    02 Apr '20 19:35
    @joe-shmo said
    The online regression software predicts 75,800 deaths by the inflection point (Day 56 - April 21), totaling 161,100. So they seem to be in agreement, with the exception of the inflection point date.
    A major problem with my method is that I can change the answer drastically by changing the start date for the linear regression. I get a quarter of a million deaths if I start the regression from 13/3 (this is in the US) - although that's 109,000 if I include negative numbers of deaths in the averaging. I don't know if there's a way of stabilizing it against that.
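    To make the instability concrete, here's a minimal sketch (Python, with made-up counts that grow slightly sub-exponentially - not the worldometers series):

    import numpy as np

    days = np.arange(0, 21)                             # day 0 = first report
    deaths = 5 * np.exp(0.25 * days - 0.003 * days**2)  # growth that slows a little

    for start in (0, 5, 10):                            # three candidate start dates
        x, y = days[start:], np.log(deaths[start:])
        m, c = np.polyfit(x, y, 1)                      # slope and intercept of the log-data fit
        print(f"start day {start}: m = {m:.3f}, "
              f"day-40 extrapolation = {np.exp(m * 40 + c):,.0f}")

    Shifting the start of the window changes the fitted slope m, and the extrapolation amplifies that difference enormously.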
  4. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    02 Apr '20 19:39
    @eladar said
    @DeepThought

    Looks like you are confusing SD with Margin of Error.

    You use 1.96 SD to calculate the Margin of Error for a 95 percent confidence interval.
    You didn't read my post very carefully, did you?
  5. eladar
    Joined
    12 Jul '08
    Moves
    13814
    02 Apr '20 19:50 (3 edits)
    @deepthought said
    Using the data from worldometers.com I get 76,500 deaths in the US with a standard deviation of 92,900 and the top of the 95% confidence interval at 186,000. I need to have a think about how to integrate errors in the linear regression into this.
    76,500 + 1.96(92,900) would put the max for your 95 percent confidence interval at 258,584


    Your low end would be smaller than -100,000. Your virus is actually making people, or raising the dead.
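    Checking that arithmetic (a quick Python sketch with the figures quoted above):

    mean, sd = 76_500, 92_900
    print(mean + 1.96 * sd)    # 258584.0, the upper bound
    print(mean - 1.96 * sd)    # -105584.0, negative deaths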
  6. Standard member Removed
    Joined
    10 Dec '06
    Moves
    8528
    02 Apr '20 21:26
    @deepthought said
    Hi joe, I've got a way of doing this on a spreadsheet. The logistic function is:

    f(x) = a/(1 + b exp(-mx)) + d

    Notice that if we send x to minus infinity we get f = d. Since we expect the total number of deaths far in the past to be zero I'm setting d = 0. This makes the analysis a lot easier. Also I've replaced c with m for reasons that'll become clear below. ...[text shortened]...
    So my 95% confidence interval is (1,800 to 42,750)

    I'll repeat the calculation with the US data.
    I have some questions so I can get a handle on what you are doing:

    "The next thing to notice is that for small x we have:

    f(x) ~ exp(mx)

    So we can get m by taking the log of our data and doing linear regression; m is the slope of the linear regression, which is why I renamed it. We can always write:

    b = exp(m*x0) = exp(c) where c is the intercept from linear regression.

    In other words b just determines when our zero in time is. Let's choose b so that f(0) = 1. In other words the date of the first case. Then we can write:

    f(0) = 1 = a/(1 + b) "

    Is the above bit in reverse order? If it isn't, I'm missing how we can go directly to f(x) ≈ e^(mx).

    It seems like first we do some algebra:

    f(x) = a*e^(mx)/(e^(mx) + b)

    Then we constrain b:

    f(0) = 1 = a/(1+b)

    Then it's clear that for small values of x:

    a/(e^(mx) + b) ≈ a/(1 + b) = 1

    Thus, f(x) ≈ e^(mx)

    Guess I'll start with that, but I'll have more questions to follow. I hope you don't think I'm being a pain; it's just that I don't do these manipulations of approximating functions over certain subsets that often (I feel like that's more physicist business), so I require pretty strict logical flow to follow along. 🙂
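    A quick numerical check of that approximation (a Python sketch with illustrative values of a and m, not fitted ones):

    import numpy as np

    a, m = 1000.0, 0.3
    b = a - 1.0                                  # from the constraint f(0) = a/(1 + b) = 1
    f = lambda x: a / (1.0 + b * np.exp(-m * x))

    for x in (0.0, 1.0, 5.0, 10.0):
        print(x, f(x), np.exp(m * x))            # f(x) tracks e^(mx) while f(x) << a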
  7. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    02 Apr '20 21:45
    @eladar said
    76,500 + 1.96(92,900) would put the max for your 95 percent confidence interval at 258,584


    Your low end would be smaller than -100,000. Your virus is actually making people, or raising the dead.
    I think we can safely constrain the low end of the confidence interval at the current fatality figure.
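    In code form (a sketch; the current-deaths figure is a hypothetical placeholder, not the real count):

    # Clamp the interval's lower end at the deaths already recorded,
    # since the cumulative toll cannot go down.
    current_deaths = 5_000                               # hypothetical placeholder value
    lower = max(76_500 - 1.96 * 92_900, current_deaths)
    print(lower)                                         # 5000 rather than -105,584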
  8. eladar
    Joined
    12 Jul '08
    Moves
    13814
    02 Apr '20 22:32 (1 edit)
    @deepthought said
    I think we can safely constrain the low end of the confidence interval at the current fatality figure.
    A 95 percent confidence interval assumes the underlying distribution is normal.

    It is impossible that your mean and SD could come from a normal distribution.
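    To put a number on that (a sketch, assuming the quoted mean and SD really did describe a normal distribution):

    from scipy.stats import norm

    # probability mass a Normal(76500, 92900) distribution puts below zero deaths
    print(norm.cdf(0, loc=76_500, scale=92_900))    # ≈ 0.205, about a 1-in-5 chance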
  9. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    02 Apr '20 23:50 (1 edit)
    @joe-shmo said
    I have some questions so I can get a handle on what you are doing:

    "The next thing to notice is that for small x we have:

    f(x) ~ exp(mx)

    So we can get m by taking the log of our data and doing linear regression; m is the slope of the linear regression, which is why I renamed it. We can always write:

    b = exp(m*x0) = exp(c) where c is the intercept from linear ...[text shortened]... l like that's more physicist business ), so I require pretty strict logical flow to follow along. 🙂
    I was just taking the exponential approximation as a given; the easiest way to see it is to start with the standard logistic function:

    f(x) = 1/(1 + exp(-x))

    Its rate of change is given by:

    f'(x) = exp(-x)/(1 + exp(-x))² = (1/f(x) - 1)f(x)² = f(x)(1 - f(x))

    which makes sense for an epidemic: the rate of change of the fraction of the population infected is proportional to the number of carriers and to the fraction of people left uninfected. For the full function we have:

    f(x) = a/(1 + b exp(-mx))

    so

    f'(x) = abm exp(-mx)/(1 + b exp(-mx))² = (m/a) (1 + b exp(-mx) - 1) f(x)² = (m/a) (a/f(x) - 1) f(x)²

    Giving our final result:

    f'(x) = (m/a) f(x) (a - f(x))

    Note first that b does not appear in this, which is my justification for treating it as an initial condition. Since we expect a to be large compared with f(x) for the initial part of the curve we have the approximation:

    f'(x) ≈ m f(x)

    giving us:

    f(x) ~ exp(mx)
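    A finite-difference check of that growth law (a sketch with arbitrary parameter values):

    import numpy as np

    a, b, m = 1000.0, 999.0, 0.3
    f = lambda x: a / (1.0 + b * np.exp(-m * x))

    x, h = 12.0, 1e-6
    numeric = (f(x + h) - f(x - h)) / (2 * h)    # central-difference derivative
    analytic = (m / a) * f(x) * (a - f(x))       # the growth law derived above
    print(numeric, analytic)                     # the two agree to several decimals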
  10. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    03 Apr '20 01:16 (1 edit)
    The problem with the above is that it seems to be numerically unstable. An alternative approach might be to use the tanh function. The standard logistic function can be written in terms of tanh:

    f(x) = 1/(1 + exp(-x)) = (1 + tanh(x/2))/2

    So for the whole function, writing b = exp(m·x₀) and μ = m/2, we have:

    f(x) = a/(1 + b exp(-mx)) + d = a/(1 + exp[-m(x - x₀)]) + d
    f(x) = a (1 + tanh(μ(x - x₀)))/2 + d
    f(x) = (d + a/2) + (a/2) tanh(μ(x - x₀))

    rationalising the constants we get:

    f(x) = A + B tanh(μ(x - x₀))

    At four reference points the function simplifies:

    f(-∞) = A - B
    f(0) = A + B tanh(-μx₀)
    f(x₀) = A
    f(∞) = A + B

    Now, we can either select A = B, which guarantees that the inflection point comes when exactly half of the eventual death toll has been reached - this is an assumption of my version of the model above. Or we could select f(0) = 0, when:

    A + B tanh(-μx₀) = 0 => A = B tanh(μx₀)

    The advantage of this form is that we can expand around x = 0 to get the initial part of the curve. I might have a go at implementing it tomorrow.
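    For anyone who wants to try it first, here's a sketch of what the fit might look like with scipy's curve_fit - synthetic data and my own helper names, not anything from the spreadsheet:

    import numpy as np
    from scipy.optimize import curve_fit

    def f(x, A, B, mu, x0):
        return A + B * np.tanh(mu * (x - x0))

    days = np.arange(0, 40)
    true = f(days, 500.0, 500.0, 0.15, 25.0)     # A = B, so the curve starts near zero
    rng = np.random.default_rng(0)
    noisy = true + rng.normal(0.0, 10.0, days.size)

    params, cov = curve_fit(f, days, noisy, p0=(400, 400, 0.1, 20))
    print("A, B, mu, x0 =", params)
    print("std errors   =", np.sqrt(np.diag(cov)))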
  11. eladar
    Joined
    12 Jul '08
    Moves
    13814
    03 Apr '20 07:42
    Ponderable,

    How did you calculate the margin of error for your 95 percent confidence interval? I do not think it is the same technique as in standard statistics.
  12. Standard member Removed
    Joined
    10 Dec '06
    Moves
    8528
    03 Apr '20 12:42 (3 edits)
    @deepthought said
    I was just taking the exponential approximation as a given; the easiest way to see it is to start with the standard logistic function:

    f(x) = 1/(1 + exp(-x))

    Its rate of change is given by:

    f'(x) = exp(-x)/(1 + exp(-x))² = (1/f(x) - 1)f(x)² = f(x)(1 - f(x))

    which makes sense for an epidemic, the rate of change of the fraction of the population infected is prop ...[text shortened]... itial part of the curve we have the approximation:

    f'(x) = m f(x)

    giving us:

    f(x) ~ exp(mx)
    Thank you, quite clever (at least by my standards) algebraic manipulations there!

    For anyone operating at my speed:

    f(x) = a/(1 + b*e^(-mx))

    b*e^(-mx) = a/f(x) - 1

    Then:

    f'(x) = m/a * [a²/(1 + b*e^(-mx))²]*[b*e^(-mx)] = m/a*[f(x)]²*[a/f(x) - 1]
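    A symbolic check of that chain of equalities (a sympy sketch):

    import sympy as sp

    x, a, b, m = sp.symbols('x a b m', positive=True)
    f = a / (1 + b * sp.exp(-m * x))
    lhs = sp.diff(f, x)
    rhs = (m / a) * f**2 * (a / f - 1)
    print(sp.simplify(lhs - rhs))    # prints 0, so the two sides agree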

    OK, so when you solved for "a" we were expecting a constant, but the results for it varied substantially? What is causing the variation? I guess, why wasn't it scrapped at that point, when "a" wasn't shown to be a well-behaved constant? Is the variability of "a" instead a hint at the failure of the logistic regression at this point in time?
  13. eladar
    Joined
    12 Jul '08
    Moves
    13814
    03 Apr '20 17:31 (1 edit)

    Removed by poster

  14. eladar
    Joined
    12 Jul '08
    Moves
    13814
    03 Apr '20 17:49 (1 edit)
    Take 2, bad data entry in round 1.

    TI-84 ExpReg equation:

    .95845*1.2478^t, with r² = .985

    where t is measured in days after March 13 and the output is in tens of deaths, so 156.0 means 1,560 deaths.

    OK, the residuals are more random now, but there is still a pattern and the residual distance from the model is growing.

    Predicted deaths for today (day 23): 1,560.
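    Roughly what the TI-84's ExpReg does, for anyone without the calculator (a Python sketch; the day span is illustrative and the points are taken from the fitted curve itself):

    import numpy as np

    t = np.arange(1, 11)                         # days after March 13 (illustrative span)
    y = 0.95845 * 1.2478 ** t                    # points from the fitted curve
    slope, intercept = np.polyfit(t, np.log(y), 1)
    a, b = np.exp(intercept), np.exp(slope)      # ExpReg's y = a * b^t form
    print(a, b)                                  # recovers 0.95845 and 1.2478
    print(10 * a * b ** 23)                      # day-23 prediction, ≈ 1,560 deaths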
  15. Standard member DeepThought
    Losing the Thread
    Quarantined World
    Joined
    27 Oct '04
    Moves
    87415
    03 Apr '20 17:57
    @joe-shmo said
    Thank you, quite clever (at least by my standards) algebraic manipulations there!

    For anyone operating at my speed:

    f(x) = a/(1 + b*e^(-mx))

    b*e^(-mx) = a/f(x) - 1

    Then:

    f'(x) = m/a * [a²/(1+b*e^(-mx))²]*[b*e^(-mx)] = m/a*[f(x)]² *[a/f(x) - 1]

    Ok, so when you solved for "a" we were expecting a constant, but the results for it varied substantially? What ...[text shortened]... variability of "a" instead a hint to the failures of the Logistic Regression in this point of time?
    There are a few potential sources of the problem. The data's noisy at the start of the curve, as the particular circumstances of the first 100 or so people to die affect the data. We're trying to impose a logistic curve onto nature, but the actual dynamics are a lot more complex; this can't be helped without using a very detailed model that goes way beyond what we can do on a spreadsheet.

    Methodologically, part of the problem is that I'm getting values for m and x₀/b from the same data I'm trying to estimate "a" from. The time series for the UK is longer, so I've stopped doing that for the UK data as of the 1st of this month: estimating m and b from the data up to the 1st and a from the data from the 1st onwards, I get a = 24,500 +/- 210 - but that's because there's only three data points. For the US data we don't have that luxury yet.

    One thing: the "logest" function on OpenOffice gives a 1.4% error for its estimate of m but a 10% error for the estimate of b (US data). This means there's a significant source of error in our estimate of where the function crosses the x-axis.
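    A rough Python equivalent of what logest reports, showing where error figures like those can come from (synthetic data, not the US series):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = np.arange(0, 15)
    y = 2.0 * np.exp(0.22 * x + rng.normal(0.0, 0.1, x.size))    # noisy exponential

    res = stats.linregress(x, np.log(y))
    print("m    =", res.slope, "+/-", res.stderr)                # small slope error
    print("ln b =", res.intercept, "+/-", res.intercept_stderr)  # larger intercept error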