Me and many others have done this for a long time. The Harvard Personal Genome Project [1] is a large open database of people's genetic and phenotypic information. Here is my profile: https://my.pgp-hms.org/profile/hu1247AF
The Harvard Personal Genome Project is great (I'm a participant) but there are some other projects that are complementary as well, such as Open Humans [1] and Open SNP [2].
Ah, awesome! Shoot me an email, I'd love to chat more about that. The one thing I wanted to offer that's different from those other sources: if I were an engineer who wanted to see what working with EMR data is like, I couldn't find an EMR export anywhere online (CCDA and raw notes) to analyze with ML and NLP.
I have a friend who is a medical researcher and it definitely seems they are stuck in the past.
In order to study something, he has to:
* Come up with a hypothesis that X may cause Y
* Request access to data about that hypothesis
* He is only given the data regarding his hypothesis
* He can then study whether his hypothesis has merit or not
We should be dumping these whole datasets into machine learning and having computers give us potential links to explore. Obviously there will be plenty of things that turn out to be unrelated, but it's also very likely the computer can find links that a human would not have considered.
I don't see it changing any time soon in the US, but I suspect other countries with this data will use it, and we'll find the next generation of medical breakthroughs no longer come from the US.
Every usage of every bit of medical data requires patient consent.
The potential for abuse is not hypothetical.
While I was implementing medical information exchanges, every single participant considered patient data to be their own, to be used as they wish. Our (grand)parent company, a lab, was negotiating with Microsoft, Google, pharmas, etc. Each was trying to figure out how to monetize it. For example, targeted ads.
The C (executive) level players mocked HIPAA and the other (meager) patient and consumer protections the same way they mocked Sarbanes-Oxley, environmental protections, financial reporting requirements, etc. If you think Google and Facebook are bad...
---
My data, all that is known about me, is my identity. It's me.
At the very least, if someone's going to profit from my data, I want my cut.
The more statistical tests you run against a set of data (EDIT: the more variables you test against a dataset), the higher the chance you get a statistically significant result from random error alone.
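To put a rough number on that: if each test has a false-positive rate of alpha and the tests are independent (an assumption that real health indicators rarely satisfy exactly), the chance of at least one spurious "significant" result across m tests is 1 - (1 - alpha)^m. A minimal Python sketch; the function names and the simulation setup are purely illustrative:

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positives(num_tests: int, alpha: float = 0.05,
                             trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated studies with at least one 'significant' result
    when every null hypothesis is true.

    Under a true null, a p-value is uniform on [0, 1], so it is drawn
    directly instead of running a real statistical test.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # A study "finds something" if any of its tests dips below alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for m in (1, 10, 20, 100):
        print(m, round(familywise_error_rate(m), 3),
              round(simulate_false_positives(m), 3))
```

At alpha = 0.05, 20 independent tests already give roughly a 64% chance of at least one false positive. Testing each hypothesis at alpha / m (a Bonferroni-style correction) is one common, conservative remedy.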
Sadly, even countries with universal healthcare systems don't have universal health informatics systems (the NHS is a prime example — they spent £12B trying to build an integrated system [1]). Lots of countries attempt, including the US — HIPAA was actually originally about data portability [2], and we just spent another $40B [3]. Thus far only smaller countries have had success with integrated health IT systems [4].
Thank you for sharing this. We (I) founded a company to help with several niche aspects of healthcare and the bureaucratic issues faced by administrations. We are finding success with data transactions, and while there are some companies out there who work really hard to make transaction engines, it's not very efficient, very expensive, and doesn't benefit the consumer at all.
My past experience as a software developer was, "Give me all the data, and tell me what you need, then I'll make it work." I even worked for a very large EMR (probably the biggest on the planet), and getting a patient record out of their system is a nightmare, even though the foundation of their application is the patient record.
I'd love to converse more about what you're building, as we capture many unstructured documents and are now using ML to grab details out of these and match to criteria.
We should (as a society) consider open sourcing every medical record.
Medical privacy is ethically tricky. It (1) protects bad doctors, (2) makes it harder to develop treatments, (3) makes it hard for consumers to shop intelligently.
Medical privacy would be useful when negotiating cost of coverage with your insurer, but they have a contractual right to demand your complete medical record.
The best arguments I've heard for medical privacy are (1) you might not get a job if you're sick, (2) shame factor could prevent people from going for treatment and (3) you may not get a date if you have, say, herpes. (#3 is true but not necessarily a strong point from a social standpoint).
Your #1, #2, and #3 are all the same thing, but I think it's hugely important:
Medical records can show all kinds of markers about your past / current behavior that let people paint pretty horrible assumptions about each other.
Type 2 Diabetes? Man you must eat poorly.
Herpes? You must have gotten it from being promiscuous and risky.
Depression? Must not be able to deal with the shit that is real life.
Hormone therapy? Dental issues? Pain killers? Allergies? I mean the list is almost as long as the list of all medical issues that people have.
People paint just about every medical condition with behavioral, moral, or ethical judgement, which is almost entirely unfair. I think medical privacy is hugely important for society as it currently is; losing it would not change these judgements, but would instead make it easier to discriminate against the people affected.
Herpes. Mental illness. A history of suicide attempts. HIV+ status. The fact that you weren't born with your current gender. The fact that you've miscarried three times, and are currently pregnant. The fact that your child has fetal alcohol syndrome.
All super fun facts that people would love for friends, coworkers and strangers to be able to find out.
I understand that there are good arguments for releasing medical data, but this is just the "if you have nothing to hide, what are you worried about?" argument.
It is very easy for the typical software engineer to come up with the brilliant idea of open sourcing everything without thinking of any of the consequences. But the real world is much more complex than that.
Simple solution. Make medical records privacy opt-in.
Most people won't bother (strong default effect), so lots of data for research, and those who care can still have their privacy. It won't exactly be a random sample, but it should still be better than what's available currently.
This is a risky thing to do when the patient's name is attached to it. Insurance companies, salesmen, etc., could do quite a lot with such information.
I whole-heartedly support the general idea, and making a centralised database of things like this would be great. Such a database would probably make it easier to anonymise the data as well.
Even without the patient's name attached, it is easy to identify people because of the necessary metadata in the record. If you expect to get useful information about, for instance, lung disease, the record will have to contain information about exposure to likely causes, age, occupation, region of the country (possibly town), and sex. It will also contain marital status, whether one has children, drinking and smoking habits, weight, and ethnicity.
This is pretty close to unique, just like a browser fingerprint.
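One rough way to check this on a dataset is to count how many records share each combination of those fields (the k in k-anonymity). A minimal sketch with made-up records; the field names and values below are invented for illustration and do not come from any real record:

```python
from collections import Counter

# Hypothetical quasi-identifiers: fields that are not names but can
# still single a person out when combined.
QUASI_IDENTIFIERS = ("age", "sex", "town", "occupation", "smoker")


def uniqueness_report(records, keys=QUASI_IDENTIFIERS):
    """Return (k, unique_fraction) for the given records.

    k is the size of the smallest group sharing one combination of keys;
    unique_fraction is the share of records that are alone in their group.
    """
    combos = Counter(tuple(rec[key] for key in keys) for rec in records)
    k = min(combos.values())
    unique = sum(1 for count in combos.values() if count == 1)
    return k, unique / len(records)


# Invented toy data, not real patients.
records = [
    {"age": 34, "sex": "F", "town": "Springfield", "occupation": "welder", "smoker": True},
    {"age": 34, "sex": "F", "town": "Springfield", "occupation": "teacher", "smoker": False},
    {"age": 61, "sex": "M", "town": "Springfield", "occupation": "miner", "smoker": True},
    {"age": 61, "sex": "M", "town": "Springfield", "occupation": "miner", "smoker": True},
]

if __name__ == "__main__":
    print(uniqueness_report(records))                  # all fields
    print(uniqueness_report(records, ("age", "sex")))  # coarser view
```

The more fields you keep, the smaller the groups get; k = 1 means at least one person is uniquely identifiable from those fields alone.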
I'm just making my way out of a course called Health Informatics. Most of what we've done is look at HIPAA and the standards that make sending patient info from one hospital to another possible. In general the whole situation is a mess. I understand the purpose of not sharing identifiable data with the world: it stops people from being targeted because of their conditions. But we have a wealth of information that's been made effectively useless from a research perspective.
this isn't much of a question, just wanted to express my frustration with the whole thing as well. that said I've got a lot of respect for your mission, and the balls required to publish your otherwise HIPAA protected info.
Hey Brian, it's really great that you're doing this :)
If there's more to your medical history that you want to track down, or you want to get your data transformed into a structured format, you should reach out to us at PicnicHealth and we'll see what we can do.
What was the actual process of acquiring your entire medical record? My understanding was that this information can be highly fragmented depending on the number of different places one has received medical treatment.
Thanks for this! Have you considered sharing DICOM files, too (i.e. the actual images from your MRI, in addition to the reports)? If so, what went into the decision not to include these?
I'm really confused about what this article's purpose is.
You first talk about the issues with clinical trials, then you throw in a tidbit about just feeling like making your medical records public because you couldn't find many open medical records?
Good feedback. When I was initially diagnosed and wanted to start working on this problem, I had no idea what a medical record looked like, so I didn't know what the data I'd be working with would look like, and it's tricky to do a data project without knowing the data structure :) I just wanted to share mine so that anyone who wants to tackle something medical-record related in the future can see what the data sets they'll be working with may look like!
The clinical trial bit is our specific use of that data.
Unfortunately my provider doesn't offer that as a full download; I need to drive there and pick up a CD, and I haven't had the time to do that yet. I plan to get the full file soon!
PicnicHealth could easily add an "opt in" option whereby patients can opt their data into trials. Institutions could pay for access to all this curated data to use for testing or recruitment of patients.
howderek|8 years ago
You can add your Personal Health Record to it.
[1] http://personalgenomes.org/
abetusk|8 years ago
[1] https://www.openhumans.org/
[2] https://opensnp.org/
blaurenceclark|8 years ago
jpobst|8 years ago
specialist|8 years ago
mattjack|8 years ago
>You're describing P-value hacking
Here's an example of what can happen when you take a huge corpus of data and throw an equally huge number of hypotheses at it to see what sticks: https://io9.gizmodo.com/i-fooled-millions-into-thinking-choc...
tl;dr: he "proved" chocolate causes weight loss by comparing chocolate- and non-chocolate-eaters on a very high number of health indicators.
That also introduces the multiple testing problem: https://www.wikiwand.com/en/Multiple_comparisons_problem
kharms|8 years ago
You're describing P-value hacking, thus named because hack scientists use this technique to publish papers about nonsense.
troyastorino|8 years ago
[1] https://en.wikipedia.org/wiki/NHS_Connecting_for_Health
[2] https://en.wikipedia.org/wiki/Health_Insurance_Portability_a...
[3] https://en.wikipedia.org/wiki/Health_Information_Technology_...
[4] https://en.wikipedia.org/wiki/Healthcare_in_Denmark#eHealth
diegoprzl|8 years ago
blaurenceclark|8 years ago
voicedYoda|8 years ago
blaurenceclark|8 years ago
awinter-py|8 years ago
codemac|8 years ago
pavel_lishin|8 years ago
sweden|8 years ago
http://www.reuters.com/article/us-cybersecurity-hospitals-id...
comboy|8 years ago
maxerickson|8 years ago
Looks like that has a fair chance of changing though.
roywiggins|8 years ago
willpearse|8 years ago
kwhitefoot|8 years ago
See for instance: http://randomwalker.info/publications/no-silver-bullet-de-id...
herman5|8 years ago
blaurenceclark|8 years ago
kiddico|8 years ago
troyastorino|8 years ago
herman5|8 years ago
JoshMandel|8 years ago
tranv94|8 years ago
blaurenceclark|8 years ago
brynlewis|8 years ago
CDA documents are XML documents conforming to a schema specified for medical documents.
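As a rough illustration, here is a sketch that reads section titles out of a CDA-style document with Python's standard library. The sample XML is a hand-written toy that only mimics the ClinicalDocument / structuredBody / section nesting under the HL7 v3 namespace; it leaves out the header elements a real, schema-valid CDA carries, and the helper names are hypothetical:

```python
import xml.etree.ElementTree as ET

# CDA elements live in the HL7 v3 XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Hand-written toy document (assumption: NOT a schema-valid CDA export).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Toy Continuity of Care Document</title>
  <component>
    <structuredBody>
      <component>
        <section>
          <title>Medications</title>
          <text>Example medication note</text>
        </section>
      </component>
      <component>
        <section>
          <title>Problems</title>
          <text>Example problem note</text>
        </section>
      </component>
    </structuredBody>
  </component>
</ClinicalDocument>"""


def document_title(xml_text: str) -> str:
    """Return the top-level document title, or '' if it is missing."""
    root = ET.fromstring(xml_text)
    return root.findtext("hl7:title", default="", namespaces=NS)


def section_titles(xml_text: str) -> list:
    """Return the title of every section, in document order."""
    root = ET.fromstring(xml_text)
    return [
        section.findtext("hl7:title", default="", namespaces=NS)
        for section in root.iterfind(".//hl7:section", NS)
    ]


if __name__ == "__main__":
    print(document_title(SAMPLE))
    print(section_titles(SAMPLE))
```

Real exports are far larger and keep coded entries next to the narrative text, so treat this only as a first look at the nesting.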
IdleChris|8 years ago
blaurenceclark|8 years ago
Kinnard|8 years ago
ipunchghosts|8 years ago