GitHub - bhuvanchennoju/NeurIPS-Papers-Crawler: This is data curation tool for the NeurIPS Proceddings. Based on year selection, papers could be downloaded into pdfs, abstarcts as needed.

NeurIPS Papers Crawler

This project is designed to curate the research papers for the language models, specific to the NeurIPS conference website from 1987 to the latest conference. The web scrapper fetches paper metadata (for years before 2020), summaries, and PDFs for either a given year or a specific range of years.

project_root/
├── data
│   ├── 2021
│   │   ├── paper_id
│   │   │   ├── paper_id_abstract.json
│   │   │   ├── paper_id_metadata.json
│   │   │   └── paper_id.pdf
│   └── ...
├── utils.py
├── crawler.py
├── requirements.txt
├── Readme.md

Installation

To install this crawler, Ensure you have Python 3.7 or later installed. You will also need to install the required Python packages:

pip install -r requirements.txt

Usage

To execute the crawler use the following command.

python crawler.py --start_year 2020 --end_year 2021 --output_dir './data/' --type 'all'

Arguments for the crawler:

--start_year: The starting year of the conference papers you want to download (default is 1987).
--end_year: The ending year of the conference papers you want to download (default is the latest year available).
--output_dir: The directory where the data will be saved (default is data).
--type: To download the specific data type., it takes 'metadata', 'pdf', 'abstract', 'all' as inputs (default is all)

Notes

Ensure the output directory exists before running the script.
The script fetches data concurrently to speed up the process.
If any errors occur while fetching the data, error message will be shown and exit out.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.gitignore		.gitignore
README.md		README.md
crawler.py		crawler.py
requirements.txt		requirements.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

NeurIPS Papers Crawler

Installation

Usage

Notes

About

Releases

Packages

Languages

bhuvanchennoju/NeurIPS-Papers-Crawler

Folders and files

Latest commit

History

Repository files navigation

NeurIPS Papers Crawler

Installation

Usage

Notes

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages