On Thu, Apr 5, 2012 at 11:04 PM, Jonathan Cline <jncline@gmail.com> wrote:
By the way, the challenge isn't grabbing the data; it's verifying that the data is valid. Nature publishes trash, too (like most of the syn bio papers... *cough* *cough*). From information theory and the entropy laws of communication systems, more information is typically worse than a single, better piece of information, because the garbage data accumulates as noise.
I agree that verifying the data is important. I've received way too many contributed .tar.gz archives of PDFs with no metadata. OCR doesn't work the way it's supposed to (yet); don't expect it to figure out which exact esoteric journal a paper came from. Always grab the metadata, always verify whether you have a complete copy of an issue, etc. At the moment I'm assembling an index across a few publishers for this, but it's slow going.
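As a rough illustration of the "verify you have a complete copy of an issue" step, here is a minimal Python sketch. It is not the index described above. It assumes you already have per-article metadata for one issue (the `ArticleRecord` fields, the `audit_issue` and `find_page_gaps` helpers, and the sample journal and file names are all hypothetical), plus the issue's expected page range. It flags three things: page ranges that no article covers, articles listed in the metadata with no matching PDF, and PDFs on disk with no metadata at all.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical per-article metadata row (e.g. from a publisher's table of contents)."""
    journal: str
    volume: int
    issue: int
    first_page: int
    last_page: int
    pdf_name: Optional[str]  # None when the metadata lists an article we have no PDF for


def find_page_gaps(records: Iterable[ArticleRecord],
                   expected_first: int,
                   expected_last: int) -> List[Tuple[int, int]]:
    """Return inclusive page ranges in [expected_first, expected_last] covered by no record."""
    gaps: List[Tuple[int, int]] = []
    next_page = expected_first  # first page not yet accounted for
    for rec in sorted(records, key=lambda r: r.first_page):
        if next_page > expected_last:
            break  # the whole expected range is already covered
        if rec.first_page > next_page:
            gaps.append((next_page, min(rec.first_page - 1, expected_last)))
        next_page = max(next_page, rec.last_page + 1)
    if next_page <= expected_last:
        gaps.append((next_page, expected_last))
    return gaps


def audit_issue(records: List[ArticleRecord],
                pdf_files: Iterable[str],
                expected_first: int,
                expected_last: int) -> Dict[str, list]:
    """Cross-check one issue's metadata against the PDF filenames actually on hand."""
    files = set(pdf_files)
    named = {r.pdf_name for r in records if r.pdf_name is not None}
    return {
        # pages in the issue's range that no article record covers
        "page_gaps": find_page_gaps(records, expected_first, expected_last),
        # articles known from metadata whose PDF is absent or not on disk
        "missing_pdfs": sorted(
            f"{r.first_page}-{r.last_page}" for r in records
            if r.pdf_name is None or r.pdf_name not in files),
        # PDFs with no metadata: the ones OCR won't reliably place for you
        "orphan_pdfs": sorted(files - named),
    }


if __name__ == "__main__":
    # Made-up example issue: pages 101-150, three articles in the metadata.
    records = [
        ArticleRecord("J. Example", 12, 3, 101, 110, "a.pdf"),
        ArticleRecord("J. Example", 12, 3, 111, 118, None),
        ArticleRecord("J. Example", 12, 3, 125, 140, "c.pdf"),
    ]
    print(audit_issue(records, ["a.pdf", "c.pdf", "scan0042.pdf"], 101, 150))
    # {'page_gaps': [(119, 124), (141, 150)], 'missing_pdfs': ['111-118'], 'orphan_pdfs': ['scan0042.pdf']}
```

A page gap is only a flag, not proof of a missing article: real issues contain ads, blank pages and front matter, so each gap still needs a human (or a better table of contents) to decide whether something is actually missing.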
http://heybryan.org/
1 512 203 0507
--
You received this message because you are subscribed to the Google Groups "DIYbio" group.