2017.5.11: For EDGAR MD&A extraction, see edgar-10k-mda

edgar-10k-sa

Section I. Download & extract MD&A from EDGAR 10-K forms

To see the full list of command-line options, run: python crawl10k.py -h

Class FormIndex:
- Download the full EDGAR indexes (URLs of the 10-K filings) for a range of years
- Save them to a CSV file
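The FormIndex step might look roughly like the following minimal sketch. It assumes the quarterly form.idx files under https://www.sec.gov/Archives/edgar/full-index/, and the function names index_urls, parse_form_idx and save_csv are hypothetical, not taken from crawl10k.py. Downloading is left out so the parsing logic can be run offline.

```python
import csv
import io

# Quarterly EDGAR full-index file listing filings by form type.
INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{qtr}/form.idx"


def index_urls(start_year, end_year):
    """Return one form.idx URL per quarter for an inclusive year range."""
    return [INDEX_URL.format(year=y, qtr=q)
            for y in range(start_year, end_year + 1)
            for q in range(1, 5)]


def parse_form_idx(text, form_type="10-K"):
    """Return (form, company, cik, date, path) tuples for one exact form type.

    form.idx rows are whitespace-aligned: the form type comes first, the
    company name (which may contain spaces) follows, and CIK, date filed and
    file path are the last three fields. Header and separator lines are
    skipped because their first field is not the requested form type.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != form_type:
            continue
        cik, date, path = fields[-3], fields[-2], fields[-1]
        company = " ".join(fields[1:-3])  # runs of spaces collapse to one
        rows.append((fields[0], company, cik, date, path))
    return rows


def save_csv(rows, out):
    """Write the parsed index rows, with a header, to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["form", "company", "cik", "date", "path"])
    writer.writerows(rows)


if __name__ == "__main__":
    sample = "\n".join([
        "Form Type   Company Name   CIK   Date Filed   File Name",
        "-" * 60,
        "10-K        ACME CORP      123456   2016-03-01   edgar/data/123456/0000123456-16-000001.txt",
    ])
    buf = io.StringIO()
    save_csv(parse_form_idx(sample), buf)
    print(buf.getvalue())
```

Matching the form type exactly means amendments such as 10-K/A are excluded; widen the check if they are wanted.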
Class Form:
- Download each 10-K over HTTP (EDGAR has not offered FTP access since 2017), using the previously downloaded form indexes
- The 10-K filings are stored in HTML format, so BeautifulSoup is used to parse the raw HTML and preprocess the text to make the MD&A section easier to find
- Save the text to the txt dir as 'filename.txt'
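A hedged sketch of the Class Form step follows. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages; fetch, html_to_text and save_text are hypothetical names, and the preprocessing rules (dropping script/style contents, breaking lines at block tags, collapsing whitespace) are assumptions rather than the repo's actual rules. SEC asks automated clients to identify themselves with a descriptive User-Agent, so fetch requires one.

```python
import re
import urllib.request
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    BLOCK_TAGS = ("p", "div", "br", "tr", "li")

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &nbsp; into "\xa0"
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # keep paragraph boundaries as newlines

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and normalize whitespace so section headings are easy to match."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).replace("\xa0", " ")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r"\s*\n\s*", "\n", text)  # drop blank lines and indentation
    return text.strip()


def fetch(url, user_agent):
    """Download one filing over HTTP with an explicit User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


def save_text(text, txt_dir, name):
    """Write preprocessed text to <txt_dir>/<name>.txt (naming is illustrative)."""
    path = Path(txt_dir) / f"{name}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Normalizing non-breaking spaces matters here because many filings write headings like "Item 7.&nbsp;Management's", which would otherwise defeat a plain-space pattern in the MD&A search.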

Class MDAParser:
- Try to extract the MD&A section from the preprocessed text
- Save it to the mda dir as 'filename.mda'
- Save the parsing results to 'parsing.log', which records SUCCESS or FAILURE for each file
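One possible shape for the MDAParser step, as a minimal sketch: it assumes the MD&A starts at an "Item 7. Management's Discussion and Analysis" heading and ends at the next "Item 7A" or "Item 8" heading, and it keeps the longest candidate so a table-of-contents entry is not mistaken for the section. The regexes, extract_mda, parse_file and the tab-separated log line are illustrative assumptions, not the repo's MDAParser.

```python
import re
from pathlib import Path

# Assumed heading patterns: case-insensitive, straight or curly apostrophe.
START = re.compile(
    r"item\s*7\.?\s*management[’']?s\s+discussion\s+and\s+analysis",
    re.IGNORECASE,
)
END = re.compile(r"item\s*(?:7a|8)\.?\s", re.IGNORECASE)


def extract_mda(text):
    """Return the longest Item 7 ... Item 7A/8 span, or None if none is found."""
    best = None
    for start in START.finditer(text):
        end = END.search(text, start.end())
        if end is None:
            continue
        body = text[start.end():end.start()].strip()
        # The table of contents also matches, but its span is usually tiny.
        if best is None or len(body) > len(best):
            best = body
    return best or None


def parse_file(txt_path, mda_dir, log_path="parsing.log"):
    """Extract the MD&A from one .txt file, save it as .mda and log the outcome."""
    txt_path = Path(txt_path)
    mda = extract_mda(txt_path.read_text(encoding="utf-8", errors="replace"))
    if mda:
        out_dir = Path(mda_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        (out_dir / f"{txt_path.stem}.mda").write_text(mda, encoding="utf-8")
    status = "SUCCESS" if mda else "FAILURE"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{txt_path.name}\t{status}\n")
    return status == "SUCCESS"
```

A cross-reference such as "see Item 8" inside the MD&A will cut the section short under this heuristic, so a SUCCESS entry in the log does not guarantee the extracted text is complete.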

Section II. Sentiment Analysis with Bill McDonald's Code (available at http://sraf.nd.edu/textual-analysis/)

Specify the MD&A files, the dictionary file and the result CSV file in Generic_Parser.py
Run 'python Generic_Parser.py'
The code has been modified for this repo to add the CIK (the CIK is included in each filename produced in Section I)
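To show the idea behind dictionary-based sentiment counts, here is a minimal sketch that tokenizes an MD&A, counts positive and negative words and writes one CSV row per file along with the CIK. It is not Generic_Parser.py: the word lists are tiny placeholders rather than the Loughran-McDonald dictionary, and the "<CIK>_..." filename pattern in cik_from_filename is a hypothetical scheme, since the repo's actual filename format is not described here.

```python
import csv
import io
import re

# Tiny placeholder word lists, NOT the Loughran-McDonald dictionary that
# Generic_Parser.py loads; swap in the real dictionary for actual analysis.
NEGATIVE = {"LOSS", "DECLINE", "IMPAIRMENT", "ADVERSE"}
POSITIVE = {"GROWTH", "IMPROVED", "STRONG", "GAIN"}


def cik_from_filename(filename):
    """Hypothetical naming scheme '<CIK>_<anything>.mda'; adjust to the real one."""
    return filename.split("_", 1)[0]


def sentiment_row(filename, text):
    """Count dictionary hits in one MD&A and return a CSV-ready dict."""
    tokens = re.findall(r"[A-Z]+", text.upper())
    n_words = len(tokens)
    n_neg = sum(1 for t in tokens if t in NEGATIVE)
    n_pos = sum(1 for t in tokens if t in POSITIVE)
    return {
        "cik": cik_from_filename(filename),
        "file": filename,
        "words": n_words,
        "negative": n_neg,
        "positive": n_pos,
        # Net tone scaled by document length; 0.0 for an empty document.
        "net_tone": (n_pos - n_neg) / n_words if n_words else 0.0,
    }


def write_results(rows, out):
    """Write result rows (dicts sharing the same keys) as CSV to a file object."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    write_results([sentiment_row("0000123456_2016.mda",
                                 "Strong growth offset a small loss.")], buf)
    print(buf.getvalue())
```

Scaling by total word count keeps long and short MD&A sections comparable; raw counts are kept in the row as well so other normalizations can be applied later.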
