dasmoth | 6 years ago

Also a guide to how to use all that to interpret medical nomenclature of mutations, like c.345G>E, would be handy

Those mutation descriptions are called HGVS (Human Genome Variation Society) nomenclature. In the example you give, "c." means that it's in a (protein) coding region, 345 is the position within the region, and G>E would be the change (although E isn't a valid "letter" in DNA sequence, even if you allow ambiguity codes -- you'd normally see something like G>T there instead).
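To make that breakdown concrete, here is a minimal Python sketch that splits a simple substitution such as c.345G>T into its parts. The function name `parse_substitution` and the regular expression are illustrative assumptions, not part of any HGVS tooling, and the pattern deliberately covers only plain single-base substitutions with a "c." or "g." prefix. Real HGVS descriptions also include deletions, insertions, intronic offsets and more, which this does not attempt.

```python
import re

# Hypothetical, simplified pattern: plain substitutions only, e.g. "c.345G>T".
# It does NOT handle deletions, insertions, intronic offsets, or ambiguity codes.
_SUBSTITUTION = re.compile(
    r"^(?P<kind>[cg])\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

def parse_substitution(desc):
    """Return the parts of a simple substitution, or None if it does not match."""
    m = _SUBSTITUTION.match(desc)
    if m is None:
        return None
    return {
        "kind": m["kind"],     # "c" = coding sequence, "g" = genomic
        "pos": int(m["pos"]),  # position within that reference sequence
        "ref": m["ref"],       # base in the reference
        "alt": m["alt"],       # base it is changed to
    }

print(parse_substitution("c.345G>T"))
print(parse_substitution("c.345G>E"))  # E is not a DNA base, so no match
```

Note that the result is only meaningful together with the gene or transcript it is relative to, which the description on its own does not carry.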

Complications include:

1) You need to know which gene this is relative to.

2) The "coding sequence" for the gene isn't always perfectly defined, due to splice variation and different versions of the annotation. Ideally, you'd see this code relative to a specific splice variant (which might have an ENST identifier, from http://www.ensembl.org/). But it depends...

More at http://varnomen.hgvs.org/ if you're curious.

andy-thomason | 6 years ago

How to represent variants is a whole can of worms. There are a number of competing systems.

* RSIDs (from dbSNP https://www.ncbi.nlm.nih.gov/snp/)
* HGVS as mentioned.
* Ensembl chrom-pos-ref-alt (CPRA).
* Variant key (Nicola Asuni)
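As a rough illustration of the chrom-pos-ref-alt style, the sketch below holds a variant as four fields and converts to and from a hyphen-separated string. The `Variant` type, the function names and the hyphen separator are assumptions made for this example rather than a documented Ensembl format, and the coordinates are made up.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str  # chromosome or contig name
    pos: int    # position on that chromosome
    ref: str    # reference allele
    alt: str    # alternate allele

def parse_cpra(text):
    # Split from the right so a contig name containing "-" stays intact.
    chrom, pos, ref, alt = text.rsplit("-", 3)
    return Variant(chrom, int(pos), ref, alt)

def format_cpra(v):
    return f"{v.chrom}-{v.pos}-{v.ref}-{v.alt}"

v = parse_cpra("1-12345-G-T")  # made-up coordinates, for illustration only
print(v)
print(format_cpra(v))
```

A position like this only identifies a variant relative to a particular genome assembly, so the same four fields can point at different bases in different reference builds.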

As dasmoth says, there is no fixed coding sequence for a gene or location in the genome.