Regarding datasets #1

Open
alkakhurana opened this issue May 2, 2022 · 3 comments

Comments

@alkakhurana

Hi,
I found your paper titled "Extracting highlights of scientific articles: A supervised summarization approach". I am working on text summarization and, for evaluation purposes, I need the datasets used in your work. Is there any way to get the three datasets (BioPubSum, CSPubSumm and AIPubSum)?

Thanks

@MorenoLaQuatra
Owner

Hi @alkakhurana,

Thank you for your interest in the topic.
To download the data, follow the instructions provided in this repository. They will allow you to download CSPubSumm.

To download the additional datasets we provide (BioPubSum & AIPubSum), you just need to use the files containing the URLs provided in our repo: link. They follow exactly the same format as the original ones.

@alkakhurana
Author

Hi @MorenoLaQuatra,
The scientific articles in the CSPubSum dataset are not open access and cannot be retrieved with the API key method described in https://github.com/EdCo95/scientific-paper-summarisation/tree/master/DataDownloader

Can you provide the text/XML of the scientific articles in the three datasets?

Thanks

@MorenoLaQuatra
Owner

Unfortunately, I don't have the right to share the data collection; Elsevier is very strict about that. This is why no one shares the formatted version of the collection.

You should be able to access the required data from an institution that has an agreement with Elsevier.
