Hi,
that's exactly what's already done with the human genome. The Variant Call Format (VCF) is mainly used for SNP data: http://www.1000genomes.org/node/101
The Personal Genome Project delivers its genome data in GFF format, which marks a region such as "chromosome 1, position 1-1243214" as REF if the data there matches the reference genome. So you only incur extra overhead for the variations that differ from the reference genome.
Those formats are still miles from perfect in terms of usability and memory efficiency, but they're a first step in the direction you mentioned. :)
cheers,
Bastian
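To make the VCF idea above a bit more concrete, here is a minimal Python sketch that reads one tab-separated VCF data line and keeps only the first five of its fixed columns (CHROM, POS, ID, REF, ALT). The VariantRecord class, the parse_vcf_line function and the example line (including the "rs0000" ID) are made up for illustration; real files also carry header lines, genotype columns and INFO fields that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantRecord:
    """Hypothetical container for the core fields of one VCF data line."""
    chrom: str
    pos: int          # 1-based position on the chromosome
    ref: str          # base(s) in the reference genome
    alt: List[str]    # alternative allele(s) observed in the sample


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse the first five tab-separated columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]
    # Multiple alternative alleles are comma-separated in the ALT column.
    return VariantRecord(chrom, int(pos), ref, alt.split(","))


# Invented example line: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO
example = "1\t12345\trs0000\tA\tG\t50\tPASS\t."
record = parse_vcf_line(example)
print(record)
```

Only positions that differ from the reference get a line like this, which is where the space saving comes from.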
On 26 November 2012 13:20, Cathal Garvey <cathalgarvey@gmail.com> wrote:
That raises another prospect of course, a "Genome diff": if a reference genome is known for a species that is "good enough", or if there are several known reference genomes for subgroups that narrow the gaps usefully, then your "genome" can become a string of differences between your genome and the chosen reference. Far smaller and easier to send/store/share.
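A toy version of the "genome diff" described above might look like the following Python sketch. It is deliberately simplified: it assumes the two sequences have the same length and differ only by single-base substitutions, so insertions, deletions and rearrangements (which a real tool would need an alignment step for) are out of scope. The function names and the short example sequences are invented for illustration.

```python
from typing import List, Tuple

# A diff is a list of (0-based position, replacement base) pairs.
Diff = List[Tuple[int, str]]


def make_diff(reference: str, genome: str) -> Diff:
    """Record every position where the genome differs from the reference."""
    if len(reference) != len(genome):
        raise ValueError("this toy diff only handles substitutions (equal lengths)")
    return [(i, g) for i, (r, g) in enumerate(zip(reference, genome)) if r != g]


def apply_diff(reference: str, diff: Diff) -> str:
    """Rebuild the personal genome from the reference plus the diff."""
    bases = list(reference)
    for pos, base in diff:
        bases[pos] = base
    return "".join(bases)


reference = "ACGTACGTAC"
genome = "ACGTTCGTAA"

diff = make_diff(reference, genome)
print(diff)                                   # [(4, 'T'), (9, 'A')]
print(apply_diff(reference, diff) == genome)  # True
```

Only the diff needs to be stored or sent, provided both sides agree on which reference it was made against.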
On 25 November 2012 21:06, Eric Kelsic <kelsic@gmail.com> wrote:
Since people have very similar genomes, storage requirements for multiple genomes drop considerably when the data is compressed:
Human genomes as email attachments
Scott Christley, Yiming Lu, Chen Li and Xiaohui Xie
(open access) http://bioinformatics.oxfordjournals.org/content/25/2/274.full
In this case they compress the SNPs and indels of a human genome compared to a reference in a 4 MB file. There are other types of genomic variation that this method doesn't handle, like structural rearrangements, but getting that info is more a problem with sequencing technology than with file compression.
Keeping the data for individual reads from a next generation sequencer requires a lot of storage. That's the easiest way to end up with terabytes of data. My main point is just that the differences you would actually care about for personal genomics are a relatively small part of the information contained in a human genome.
-e
On Saturday, November 24, 2012 4:21:31 PM UTC-5, Nathan McCorkle wrote:
Are you sure you're not thinking of before assembly vs. after assembly? If humans are around 6.5 gigabases, at 2 bits per base that's 1.625 gigabytes. If we instead use one byte per base, that gives us 6 extra bits for storing methylation status, etc., and comes to 6.5 gigabytes (also what it costs to store the sequence as ASCII).
Am I missing something?
On Fri, Nov 23, 2012 at 1:37 PM, Giovanni <giovanni...@gmail.com> wrote:
By one estimate I read, storing full genome sequence data would be pricey (50 terabytes), although I also read an article about a new optical disc that may be released in 2015 and will store 1-15 terabytes. I'm not sure what media formats genomic sequencing services use, but Blu-ray discs cost about $1 per BD-R and make exome data storage relatively affordable. It's not unlikely that terabyte optical discs will cost $4-5 in 2018. A full genome sequence would likely benefit from something like the above link, because 4 TB hard drives aren't inexpensive or lightweight.
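For the back-of-envelope storage figures in Nathan's reply above, the arithmetic can be checked with a few lines of Python. This is only a sketch under his stated assumption of roughly 6.5 gigabases for an assembled human genome, using decimal gigabytes (10^9 bytes); the function and constant names are made up here.

```python
def storage_bytes(bases: int, bits_per_base: int) -> float:
    """Bytes needed to store `bases` bases at `bits_per_base` bits each."""
    return bases * bits_per_base / 8


# Assumption taken from the thread: "around 6.5 gigabases".
GENOME_BASES = 6_500_000_000

two_bit = storage_bytes(GENOME_BASES, 2)   # A/C/G/T packed into 2 bits
one_byte = storage_bytes(GENOME_BASES, 8)  # one byte (e.g. ASCII) per base

print(f"2 bits per base: {two_bit / 1e9:.3f} GB")   # 1.625 GB
print(f"1 byte per base: {one_byte / 1e9:.3f} GB")  # 6.500 GB
```

Either way the assembled sequence is in the gigabyte range; the terabyte figures only appear when raw sequencer reads are kept.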
-- You received this message because you are subscribed to the Google Groups DIYbio group. To post to this group, send email to diy...@googlegroups.com. To unsubscribe from this group, send email to diybio+un...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/diybio?hl=en
Learn more at www.diybio.org
---
You received this message because you are subscribed to the Google Groups "DIYbio" group.
To unsubscribe from this group, send email to diybio+un...@googlegroups.com.
Visit this group at http://groups.google.com/group/diybio?hl=en.
To view this discussion on the web visit https://groups.google.com/d/msg/diybio/-/qAh8zVnfOZ0J.
For more options, visit https://groups.google.com/groups/opt_out.
--
-Nathan
www.indiebiotech.com
twitter.com/onetruecathal
joindiaspora.com/u/cathalgarvey
PGP Public Key: http://bit.ly/CathalGKey