Hm, interesting.
There are 450,000 well-described records, and their developer guidelines allow a maximum of 5,000 requests per day. Not too slow to copy.
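As a rough check on "not too slow," here is the arithmetic as a small Python sketch. It assumes one request fetches one record. `records_per_request` is a hypothetical parameter for batched queries, which the guidelines may or may not allow.

```python
import math


def days_to_mirror(total_records: int, requests_per_day: int,
                   records_per_request: int = 1) -> int:
    """Days needed to fetch every record under a daily request quota.

    records_per_request is hypothetical: one record per request is an
    assumption, not something the developer guidelines are known to state.
    """
    requests_needed = math.ceil(total_records / records_per_request)
    return math.ceil(requests_needed / requests_per_day)


print(days_to_mirror(450_000, 5_000))       # one record per request -> 90
print(days_to_mirror(450_000, 5_000, 100))  # hypothetical batches of 100 -> 1
```

At one record per request that is about three months of polling, which is manageable for a one-time copy.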
However, with 20 million RDF statements, the real issue isn't mirroring the records to a local copy but keeping that copy synchronized when they modify any of their records.
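One simple way to notice edits is to fingerprint each record and compare two snapshots. The Python sketch below assumes each record is available as a set of (predicate, object) pairs keyed by a record ID such as a DOI. That shape, and the IDs in the example, are hypothetical; how the snapshots are fetched is not shown.

```python
import hashlib
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: record ID -> set of (predicate, object) pairs.
Record = Set[Tuple[str, str]]


def fingerprint(statements: Iterable[Tuple[str, str]]) -> str:
    """Order-independent SHA-256 digest of one record's statements."""
    h = hashlib.sha256()
    for pred, obj in sorted(statements):
        # Unit/record separator bytes keep ("ab", "c") distinct from ("a", "bc"),
        # assuming the strings never contain these control characters.
        h.update(pred.encode("utf-8") + b"\x1f" + obj.encode("utf-8") + b"\x1e")
    return h.hexdigest()


def diff_snapshots(old: Dict[str, Record],
                   new: Dict[str, Record]) -> Dict[str, Set[str]]:
    """Classify record IDs as added, removed, or changed between two snapshots."""
    old_fp = {rid: fingerprint(s) for rid, s in old.items()}
    new_fp = {rid: fingerprint(s) for rid, s in new.items()}
    added = set(new_fp) - set(old_fp)
    removed = set(old_fp) - set(new_fp)
    changed = {rid for rid in set(old_fp) & set(new_fp)
               if old_fp[rid] != new_fp[rid]}
    return {"added": added, "removed": removed, "changed": changed}


# Made-up example records.
old = {"rec-1": {("title", "Old title")}, "rec-2": {("title", "Two")}}
new = {"rec-1": {("title", "New title")}, "rec-3": {("title", "Three")}}
print(diff_snapshots(old, new))
# -> {'added': {'rec-3'}, 'removed': {'rec-2'}, 'changed': {'rec-1'}}
```

This still means re-reading every record to catch an edit, which is exactly what the request quota makes expensive. A publisher-side change feed or modification timestamps would avoid that, if such a thing is offered.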
Maybe I will pair the records with the 350,000 Nature articles I already downloaded via university access for personal use.
By the way, the challenge isn't in grabbing the data; it's in verifying that the data is valid. Nature publishes trash, too (like most of the syn bio papers... *cough* *cough*). From information theory (communication systems theory), more information is often worse than a single, better piece of information, since the garbage data accumulates as noise.
I suppose the author metadata could be harvested to prove we are all within five degrees of separation from George Church? :-D
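Half-joking or not, the computation is a breadth-first search over a co-author graph. The Python sketch below builds that graph from per-article author lists (the input format is an assumption) and counts hops between two names. The names in the example are made up; a real run would use George Church as the target.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set


def build_coauthor_graph(author_lists: Iterable[List[str]]) -> Dict[str, Set[str]]:
    """Undirected graph: an edge joins every pair of authors sharing a paper."""
    graph: Dict[str, Set[str]] = {}
    for authors in author_lists:
        for name in authors:
            graph.setdefault(name, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degrees_of_separation(graph: Dict[str, Set[str]], start: str,
                          target: str) -> Optional[int]:
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if start not in graph or target not in graph:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical author lists, one per article.
papers = [["Alice", "Bob"], ["Bob", "Carol"], ["Carol", "Dave"], ["Eve"]]
g = build_coauthor_graph(papers)
print(degrees_of_separation(g, "Dave", "Alice"))  # -> 3
print(degrees_of_separation(g, "Eve", "Alice"))   # -> None
```

Author names are not unique identifiers, so real metadata would need some name disambiguation before the hop counts mean much; that is outside this sketch.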
A related topic is a PLoS torrent, which does not yet exist (a collection of all PLoS articles in one downloadable package). Perhaps a git repository would be more groovy.
## Jonathan Cline
## jcline@ieee.org
## Mobile: +1-805-617-0223
########################
On Wednesday, April 4, 2012 1:25:48 PM UTC-7, Bryan Bishop wrote:
So.. first person to make a complete copy of all bibliographic records, gets a chocolate chip cookie.

The data from this query is pretty nice:

describe <http://dx.doi.org/10.1038/nm.2129>

"Nature Publishing Group (NPG) today is pleased to join the linked data community by opening up access to its publication data via a linked data platform. NPG's Linked Data Platform is available at http://data.nature.com The platform includes more than 20 million Resource Description Framework (RDF) statements, including primary metadata for more than 450,000 articles published by NPG since 1869. In this first release, the datasets include basic citation information (title, author, publication date, etc) as well as NPG specific ontologies. These datasets are being released under an open metadata license, Creative Commons Zero (CC0), which permits maximal use/re-use of this data."

- Bryan
http://heybryan.org/
1 512 203 0507
You received this message because you are subscribed to the Google Groups "DIYbio" group.
To view this discussion on the web visit https://groups.google.com/d/msg/diybio/-/e5_XujarqZUJ.