
Thread: A Maths Issue (how to solve this?)

  1. #1
    Senior Member
    Join Date
    Jun 2005
    Posts
    8,493
    Thanks
    11
    Thanked 103 Times in 90 Posts

    A Maths Issue (how to solve this?)

    OK without going right through it, I've currently found myself tracking the coronavirus 'new infections' data in an attempt to establish the direction of travel we're heading in (it's my belief, incidentally, that we're now at a level below that which we locked down under), but as ever, it's a bit muddy.

    Now I'm using the daily DoH data, and running the reports as a 7-day moving/rolling average to smooth out the peaks of Tuesday and the troughs of Sunday and Monday. That's fine. I'm happy that's the right way of doing it. The problem, of course, occurs with the raw data report, which can show an increase in positive tests simply because the more people you test, the more cases you detect.
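
    A minimal sketch of the 7-day rolling average described above, in plain Python. The function name `rolling_mean` and the daily figures are made up for illustration; they are not DoH data. It uses a trailing window, so the first six days have no value.

```python
from collections import deque


def rolling_mean(values, window=7):
    """Trailing moving average; None until a full window is available."""
    buf = deque(maxlen=window)  # oldest value drops off automatically
    out = []
    for v in values:
        buf.append(v)
        out.append(sum(buf) / window if len(buf) == window else None)
    return out


# Invented daily counts, purely for illustration (not DoH figures)
daily = [10, 12, 30, 22, 20, 18, 8, 9, 14, 33]
print(rolling_mean(daily))
```

    Because the window is a whole week, every average contains exactly one of each weekday, which is what flattens the Tuesday peaks and the weekend troughs.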

    To try and put this on a level footing, I've introduced a baseline of 10,000 tests. That is to say, divide the number of positives by the number of people tested, and multiply by 10,000. This figure then goes into the moving average until such time as it falls off the end of the window.
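
    As a rough sketch, the baseline calculation above amounts to the following. `positives_per_10k` is a hypothetical helper name, and the inputs are invented for illustration.

```python
def positives_per_10k(positives, people_tested, baseline=10_000):
    """Scale a day's positives to a common baseline of people tested."""
    if people_tested <= 0:
        raise ValueError("people_tested must be positive")
    return positives / people_tested * baseline


# Invented example: 50 positives from 1,000 people tested
print(positives_per_10k(50, 1_000))
```

    Each day's scaled figure is then what gets fed into the 7-day average, rather than the raw count of positives.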

    What I'm wrestling with however is the sample construct. Throughout March the only people tested were those presenting with symptoms or those who had been in contact with people who'd tested positive. This creates a skewed sample, but is OK for analytical purposes so long as it's evenly applied, as you still get a trend line. In April (certainly the second half of the month) we began testing key workers (people who weren't reporting symptoms). Naturally the ratio of people testing positive to the number of people tested falls, which might also be attributable to the virus being less prevalent too (probably takes us into the realms of coefficients of determination).

    Altering a sample during a survey period is of course a nightmare

    Is there any method in quantitative analysis that anyone is aware of that could adjust for this?

    I'm looking at an output at the moment which has probably inflated the position of March because the sample was targeted. If you then project onto the 10,000-tested baseline you get a higher figure than was probably the case. This in turn leads to me potentially over-estimating the prevalence of Covid-19 from circa March 15th onwards, which could make the whole conclusion (that we're broadly back at our March 21st position) wrong.

    Last edited by Warbler; 3rd May 2020 at 10:14 PM. Reason: problems with letter d
    Don't be so gloomy. After all it's not that awful. Like the fella says, in Italy for 30 years under the Borgias they had warfare, terror, murder, and bloodshed, but they produced Michelangelo, Leonardo da Vinci, and the Renaissance. In Switzerland they had brotherly love - they had 500 years of democracy and peace, and what did that produce? The cuckoo clock. So long Holly. _ Harry Limes

  2. #2
    Banned
    Join Date
    Jul 2014
    Posts
    1,026
    Thanks
    80
    Thanked 49 Times in 47 Posts
    The long winter nights must just...

  3. #3
    Senior Member
    Join Date
    Nov 2009
    Posts
    477
    Thanks
    16
    Thanked 69 Times in 49 Posts
    Is this essentially a 'goodness of fit' problem?

    If so, employing statistical 'smoothing' tests might help

    Chi-Squared and Poisson Distribution spring to mind, both of which are available on Excel

    Not that I'd have a clue how to use them: I just know the names

    This forum might prove useful:
    https://math.stackexchange.com/

    This pandemic caused by such a seemingly atypical pathogen with so many unknowns and so many variables should have statisticians salivating, so good luck with the research
    Last edited by Drone; 4th May 2020 at 10:24 AM.

  4. #4
    Senior Member
    Join Date
    Aug 2011
    Posts
    7,308
    Thanks
    813
    Thanked 1,031 Times in 874 Posts
    Doesn't the DoH data include re-tests, which (I'm guessing) would skew the figures upwards anyway? The premise being that negative tests are less likely to be repeated.
    Last edited by reet hard; 5th May 2020 at 3:30 AM.

  5. #5
    Senior Member
    Join Date
    Jun 2005
    Posts
    8,493
    Thanks
    11
    Thanked 103 Times in 90 Posts
    Cheers Drone.

    I think you're broadly in the right area for treating the predictive element on the data, but I can't make it stick.

    The data roughly conforms to a normally distributed sample, so it wouldn't reject the null hypothesis, which makes a Pearson chi-squared test possible. I could re-examine it.

    Poisson I know less about in terms of how to apply the correction as I wouldn't know what to test for. It's perhaps closer to something like a confidence test by way of a forecast.

    The issue really stems from how the sampling changed during the survey period, and the method can clearly fall apart on a small, targeted data set. Clearly you can't extrapolate that 4 positives on 8 tests would equate to 5,000 on 10,000 tests and a 50% infection rate.
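
    To put a number on how little 4 positives from 8 tests can tell you, here is a sketch using a Wilson score interval for a proportion. This is a swapped-in illustration rather than anything used in the analysis above, `wilson_interval` is a hypothetical helper, and z = 1.96 assumes a 95% interval. Note that it only captures random sampling noise; it does nothing to correct for the sample being targeted.

```python
import math


def wilson_interval(positives, tested, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    p = positives / tested
    denom = 1 + z**2 / tested
    centre = (p + z**2 / (2 * tested)) / denom
    half = z * math.sqrt(p * (1 - p) / tested + z**2 / (4 * tested**2)) / denom
    return centre - half, centre + half


# The 4-positives-on-8-tests case from the post
low, high = wilson_interval(4, 8)
print(f"{low:.3f} to {high:.3f}")
```

    For 4 out of 8 the interval runs from roughly 22% to 78%, so on the 10,000 baseline the point estimate of 5,000 sits inside a range several thousand wide.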

    The problem I've got really is how to manipulate March's data safely. Something crude like using the lower quartile or standard error might be more accurate than what I've done, but it's pure guesswork. I suspect March's data is over-reporting the amount of infection because the testing that was done was much more targeted. Therefore when I say we've recovered our position as of March 20th, the chances are the amount of infection in the community on March 20th wasn't as high as I'm modelling.
    Last edited by Warbler; 5th May 2020 at 4:12 PM.

  6. #6
    Senior Member
    Join Date
    Jun 2005
    Posts
    8,493
    Thanks
    11
    Thanked 103 Times in 90 Posts
    Originally posted by reet hard:
    Doesn't the DoH data include re-tests, which (I'm guessing) would skew the figures upwards anyway? The premise being that negative tests are less likely to be repeated.
    I've been able to source reports for number of people tested as opposed to number of tests conducted.

    In any event, it's the trend line that I'm after. The actual Y number needn't be that important so long as the method used to generate it is consistent across the survey period.
    Last edited by Warbler; 5th May 2020 at 4:16 PM.
