Hey, this is a damned good idea! Normally the problem with making a
search engine is the nigh-infinite scale of the target: the internet.
With science, you've got a far better-defined network to crawl, and just
by following the refs you'll naturally find the most important (most
cited) articles quickly, reaching the "long tail" more gradually.
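
To make the "follow the refs" idea concrete, here is a rough sketch of a
crawl that always expands the paper cited most often by what it has seen
so far. The function name `crawl_order`, the `refs` dict and the paper IDs
are all made up for illustration; a real crawler would fetch reference
lists from the papers themselves.

```python
import heapq
from collections import defaultdict

def crawl_order(refs, seeds):
    """Return papers in visiting order, starting from the seeds and always
    expanding the unvisited paper with the most citations seen so far.

    refs maps a paper ID to the list of paper IDs it cites (hypothetical
    in-memory stand-in for reference lists scraped from real papers).
    """
    cited = defaultdict(int)            # paper -> citations counted so far
    seen = set()
    order = []
    heap = [(0, s) for s in seeds]      # (negative citation count, paper)
    heapq.heapify(heap)
    while heap:
        _, paper = heapq.heappop(heap)
        if paper in seen:               # older, lower-priority duplicate
            continue
        seen.add(paper)
        order.append(paper)
        for ref in refs.get(paper, []):
            if ref not in seen:
                cited[ref] += 1
                # Re-push with the updated count; the older entry for the
                # same paper is skipped when it eventually surfaces.
                heapq.heappush(heap, (-cited[ref], ref))
    return order

# Toy graph: "hub" is cited twice, so it is visited before "c" even
# though "c" was discovered first.
toy = {"seed": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["rare"]}
print(crawl_order(toy, ["seed"]))
```

Ties are broken alphabetically here only because that is how the tuples
sort; any tie-break would do.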
What would you need, then? I imagine your search engine would need OCR
baked in to read PDFs of older articles (might as well cache them as
plaintext as you go), some regexes to find the References list and
translate each entry to journal volume, issue and pages (or just Google
Scholar links for bootstrapping purposes), and a system for translating
that to direct links to journal websites.
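
As a minimal sketch of the regex step, assuming references end in a
"year;volume(issue):first-last" pattern: real reference styles vary a lot
between journals, so one pattern like this would only be a starting point.
The name `parse_reference` and the sample reference string are invented
for the example.

```python
import re

# Matches e.g. "2005;123(4):567-589" or "1999;12:34-56" (issue optional).
# Assumed format only; other citation styles need their own patterns.
CITATION = re.compile(
    r"(?P<year>\d{4});\s*"
    r"(?P<volume>\d+)"
    r"(?:\((?P<issue>\d+)\))?"
    r":\s*(?P<first_page>\d+)\s*[-\u2013]\s*(?P<last_page>\d+)"
)

def parse_reference(text):
    """Return a dict of year/volume/issue/pages, or None if no match."""
    match = CITATION.search(text)
    return match.groupdict() if match else None

# Invented reference string, not a real paper.
sample = "Doe J. An invented title. J Example Res. 2005;123(4):567-589."
print(parse_reference(sample))
```

Abbreviated page ranges ("567-89") match as written, so the last page
would still need expanding before building a journal link.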
For more recent papers, you can just periodically scan the journal sites
for new issues and jump to the "citations" list for spidering purposes.
Of course, given the size of the population we're dealing with, you
could also try another tack: offer scientists a browser plugin that only
kicks in when they visit a journal website, and scans the pages to get
information on the papers being read.
On 31/01/12 00:49, Bryan Bishop wrote:
> On Mon, Jan 30, 2012 at 6:47 PM, kingjacob <kingjacob@gmail.com> wrote:
>
>> Having one "for profit" gatekeeper, no matter how not evil they claim to
>> be, doesn't sit well with me. Maybe if it was an open source search engine
>> with a real privacy policy I'd be okay with that being the only option.
>
>
> Maybe. One of the problems with starting a scholarly search engine is that
> all of the publishers have given pdf priority to Google's IPs. It used to
> be by user agent. So I'm not sure how to route around this problem.
>
> - Bryan
> http://heybryan.org/
> 1 512 203 0507
>
--
www.indiebiotech.com
twitter.com/onetruecathal
joindiaspora.com/u/cathalgarvey
PGP Public Key: http://bit.ly/CathalGKey