item 43452779

It's not that we haven't thought up the statistical tools. The core theoretical tools you need are there. It's that gathering the data you need is extremely difficult and time-consuming.

If you gather EHR or medical claims data for vaccines, for example, you have to take very seriously the biases and the impact of missingness inherent in the data. Is that person you have no evidence of disease for truly not diseased, or do they just have missing data? Is it missing because they didn't go to the doctor because they're healthy enough to kick the disease on their own, or because they're so financially unstable that they can't afford to consistently see their primary care doctor? Is the missingness itself actually more correlated with the disease than the vaccination you are looking at?
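To make the missingness point concrete, here is a minimal simulation sketch in Python. Every number in it (share of financially unstable people, disease risk, visit probability) is a made-up assumption for illustration, not an estimate from any real dataset. It treats "no recorded diagnosis" as "not diseased", the way a naive claims analysis would, and compares that to the truth.

```python
import random

def simulate_records(n=20000, seed=1):
    """Simulate people whose disease is only recorded if they see a doctor.

    All probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        unstable = rng.random() < 0.3                          # financially unstable
        diseased = rng.random() < (0.3 if unstable else 0.1)   # true status
        visited = rng.random() < (0.3 if unstable else 0.9)    # has any record at all
        recorded = diseased and visited                        # what the data shows
        people.append((unstable, diseased, visited, recorded))
    return people

people = simulate_records()

# Prevalence in truth vs. prevalence if "no record" is read as "healthy".
true_prevalence = sum(d for _, d, _, _ in people) / len(people)
recorded_prevalence = sum(r for _, _, _, r in people) / len(people)

# True disease rate among people with and without any visit on file.
no_visit = [d for _, d, v, _ in people if not v]
has_visit = [d for _, d, v, _ in people if v]
rate_no_visit = sum(no_visit) / len(no_visit)
rate_has_visit = sum(has_visit) / len(has_visit)

print(f"true prevalence:      {true_prevalence:.3f}")
print(f"recorded prevalence:  {recorded_prevalence:.3f}")
print(f"disease rate, no visits on file: {rate_no_visit:.3f}")
print(f"disease rate, visits on file:    {rate_has_visit:.3f}")
```

Under these assumptions the recorded prevalence comes out well below the true one, and the people with no visits on file are sicker on average than the people with visits, so the missingness itself carries information about the disease.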

Example: if your outcome is dementia, then you may be using cognitive tests whose scores vary a lot, driven more by social class, education, and test-taking ability than by the disease. Is receiving a fancy vaccine more likely in an affluent area? That correlation by itself might completely explain away the positive effect the vaccine appears to have on cognitive test scores.
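A toy version of that confounding story, as a hedged Python sketch. The setup is entirely assumed: affluence raises both the chance of vaccination and the test score, and the vaccine has zero true effect. The naive comparison still shows vaccinated people scoring higher; comparing within affluence strata makes the gap disappear.

```python
import random
from statistics import mean

def simulate(n=20000, seed=0):
    """Affluence drives both vaccination and score; the vaccine does nothing.

    All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        affluent = rng.random() < 0.5
        vaccinated = rng.random() < (0.8 if affluent else 0.2)
        # Score depends on affluence only, plus noise; no vaccine term.
        score = 25 + (5 if affluent else 0) + rng.gauss(0, 3)
        rows.append((affluent, vaccinated, score))
    return rows

def diff_in_means(rows):
    """Mean score of vaccinated minus mean score of unvaccinated."""
    vax = [s for _, v, s in rows if v]
    unvax = [s for _, v, s in rows if not v]
    return mean(vax) - mean(unvax)

rows = simulate()

# Naive estimate: ignore affluence entirely.
naive = diff_in_means(rows)

# Stratified estimate: compare within each affluence group, then average.
adjusted = mean([
    diff_in_means([r for r in rows if r[0] == group])
    for group in (True, False)
])

print(f"naive difference:    {naive:.2f}")
print(f"adjusted difference: {adjusted:.2f}")
```

Stratifying only works here because the confounder is measured and binary. In real EHR or claims data, affluence is usually a noisy proxy at best, which is the hard part.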

In Alzheimer's you're often trying to correlate things that happen in early life with long-term damage that only surfaces many, many years later. Retrospective studies, where you go back and ask sick or healthy people, have recall bias: the sick ones remember more early issues than the healthy ones do, even with the same early-life issues.

Not trying to say epi is perfect or that there isn't room for improvement in tools (there absolutely is). But, as often happens when crossing over into the biological sciences, there are a lot of stickier problems than people outside the field realize.

nradov|11 months ago

Right, the data quality is usually crap. Beyond the issues you mentioned, patients often switch providers or health plans and their data doesn't get migrated. In the USA at least there is no centralized national repository for that data so the further back you try to go the more likely the data will just be missing (or incorrectly coded). In theory there are interoperability APIs and national networks to solve this problem but in practice a lot of systems still aren't properly connected.

For vaccinations specifically the CDC Immunization Gateway can be a good place to start. Most states also maintain their own immunization registries that can be queried through standard HL7 V2 Messaging and/or FHIR APIs if you have the appropriate permissions.

https://www.cdc.gov/iis/iz-gateway/index.html
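For a rough idea of what the FHIR side looks like, here is a sketch that only builds a search URL for the standard FHIR R4 `Immunization` resource. The base URL and patient id are hypothetical placeholders, this is generic FHIR search rather than the IZ Gateway's own interface, and which search parameters a given state registry actually supports (plus the authorization it requires) will vary. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical FHIR R4 endpoint; a real registry will have its own base URL
# and will require credentials that are not shown here.
FHIR_BASE = "https://fhir.example.org/r4"

def immunization_search_url(patient_id, since, base=FHIR_BASE):
    """Build a FHIR search URL for one patient's completed immunizations.

    `patient`, `date`, and `status` are search parameters defined for the
    Immunization resource in FHIR R4; `ge` is the standard
    "greater than or equal" date prefix.
    """
    params = {
        "patient": patient_id,
        "date": f"ge{since}",
        "status": "completed",
    }
    return f"{base}/Immunization?{urlencode(params)}"

url = immunization_search_url("123", "2015-01-01")
print(url)
```

A conforming server would answer a GET on that URL with a `Bundle` of `Immunization` resources, but how far back the records go depends on what the registry has, which is the original problem again.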