Hello there!
So, part of what I do as a "researcher" is read new research papers. But a situation has come up where I need to download large quantities of research papers and format them in a presentable fashion.
I was wondering if anyone knew how I should start this (especially with the website/information archiving projects).
Basically I'd be going on websites like this:
http://www.sciencedirect.com/science/journal/00221694 (format could be different later)
And part of what I need to do right now is download the PDF file of the report and, based on the information presented on that page, create a nice, easy-to-read way of presenting the Title, Authors, and the Abstract.
Is there already a source I can work off of to get this started? I'm assuming Python is probably the easiest language to start with and work on this. Anyone have a general idea on how I should go about this?
I guess for the "bigger discussion", does anyone have any experience with programming web crawlers?
Edit: I guess I should have been a bit more specific. Currently I'm looking at using a variation of this: http://scrapy.org/.
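To make the formatting half concrete, here is a rough sketch of the kind of output I'm after. It uses only the Python standard library rather than Scrapy, and the markup is made up: the class names `title`, `author`, and `abstract`, the `PaperParser` and `format_paper` names, and the sample page are all assumptions for illustration, so the real journal pages will need different selectors.

```python
from html.parser import HTMLParser
import textwrap

# Hypothetical class names; the real site's markup will differ.
FIELDS = {"title", "author", "abstract"}


class PaperParser(HTMLParser):
    """Collects the text inside elements whose class is exactly one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.current = None  # field currently being captured, if any
        self.depth = 0       # nesting depth inside the captured element
        self.data = {name: [] for name in FIELDS}

    def handle_starttag(self, tag, attrs):
        if self.current is not None:
            # Nested tag (e.g. <i>) inside a captured element.
            # Note: unclosed void tags such as <br> are not handled here.
            self.depth += 1
            return
        cls = dict(attrs).get("class")
        if cls in FIELDS:
            self.current = cls
            self.depth = 1
            self.data[cls].append("")

    def handle_endtag(self, tag):
        if self.current is not None:
            self.depth -= 1
            if self.depth == 0:
                self.current = None

    def handle_data(self, text):
        if self.current is not None:
            self.data[self.current][-1] += text


def format_paper(html):
    """Return a plain-text summary: title, authors, wrapped abstract."""
    parser = PaperParser()
    parser.feed(html)

    def clean(s):
        # Collapse runs of whitespace and newlines into single spaces.
        return " ".join(s.split())

    title = clean(" ".join(parser.data["title"]))
    authors = ", ".join(clean(a) for a in parser.data["author"])
    abstract = textwrap.fill(clean(" ".join(parser.data["abstract"])), width=72)
    return f"Title:   {title}\nAuthors: {authors}\n\nAbstract:\n{abstract}\n"


# Made-up sample page, only to exercise the sketch.
SAMPLE = """
<h1 class="title">Example paper title</h1>
<span class="author">A. Author</span>
<span class="author">B. Author</span>
<div class="abstract">We study <i>runoff</i> in a small
catchment over two seasons.</div>
"""

print(format_paper(SAMPLE))
```

If I end up going with Scrapy, I'd expect its selectors to replace the hand-written parser, with something like `format_paper` kept as the presentation step.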