2017.5.11: For Edgar MD&A Extraction, see edgar-10k-mda


edgar-10k-sa


Section I. Download & extract MD&A from EDGAR 10-K forms

To see all command-line options: python crawl10k.py -h

  1. Class FormIndex:
     - Download the full indexes for a given year range (these contain the URLs of the 10-K form files).
     - Save them to a CSV file.

  2. Class Form:
     - Download the forms with HTTP requests (EDGAR's FTP service has been closed since 2017), using the previously downloaded form indexes.
     - The 10-K forms are stored in HTML format, so BeautifulSoup is used to parse the raw HTML and preprocess the text to make the MDA section easier to find.
     - Save the result to the txt dir as 'filename.txt'.

  3. Class MDAParser:
     - Try to extract the MDA section from the preprocessed text.
     - Save the section to the mda dir as 'filename.mda'.
     - Save the parsing results to 'parsing.log', which shows SUCCESS/FAILURE for each file.
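
To illustrate the extraction step, here is a minimal sketch of how an MDA section might be located in preprocessed text. It is not the code in crawl10k.py: the function name `extract_mda`, the two regular expressions, and the sample text are all assumptions made for this example. It assumes the section starts at an "Item 7. Management's Discussion and Analysis" heading and ends at the next "Item 7A" or "Item 8" heading, and it keeps the longest candidate so that a short table-of-contents entry is not mistaken for the section.

```python
import re

# Assumed heading patterns (hypothetical); real filings vary a lot in wording and layout.
MDA_START = re.compile(
    r"item\s+7\.?\s*management'?s\s+discussion\s+and\s+analysis", re.IGNORECASE
)
MDA_END = re.compile(r"item\s+(7a|8)\.?\s", re.IGNORECASE)


def extract_mda(text):
    """Return the longest span from an 'Item 7' heading to the next
    'Item 7A'/'Item 8' heading, or None if no such span exists."""
    best = None
    for start in MDA_START.finditer(text):
        end = MDA_END.search(text, start.end())
        if end is None:
            continue
        candidate = text[start.start():end.start()].strip()
        # A table-of-contents entry yields a very short candidate; prefer the longest.
        if best is None or len(candidate) > len(best):
            best = candidate
    return best


# Made-up sample text, not taken from a real filing.
SAMPLE = (
    "Table of Contents\n"
    "Item 7. Management's Discussion and Analysis 20\n"
    "Item 7A. Quantitative Disclosures 35\n"
    "Item 1. Business\nWe sell widgets.\n"
    "Item 7. Management's Discussion and Analysis of Financial Condition\n"
    "Revenue increased 10% due to higher widget sales.\n"
    "Item 7A. Quantitative and Qualitative Disclosures About Market Risk\n"
)

mda = extract_mda(SAMPLE)
# Mirror the SUCCESS/FAILURE status that is written to the parsing log.
print("SUCCESS" if mda else "FAILURE")
```

A heuristic like this will miss filings that word the heading differently or incorporate the MDA by reference, which is why a per-file SUCCESS/FAILURE log is useful for checking the results.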

Section II. Sentiment analysis with Bill McDonald's code (the code can be found at http://sraf.nd.edu/textual-analysis/)

  1. Specify the mda files, dictionary file & result csv file in Generic_Parser.py
  2. Run 'python Generic_Parser.py'
  3. The code has been modified to add the CIK for this repo (the CIK is included in the filename produced in the first section)
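
As a rough illustration of what dictionary-based scoring with a CIK column involves, the sketch below counts positive and negative words in a text and reads the CIK from a file name. It is not Generic_Parser.py: the word lists are tiny made-up stand-ins for the real dictionary file, the helper names `cik_from_filename` and `score_text` are hypothetical, and the file name layout (CIK as the leading digits) is an assumption, since the exact naming scheme is not described here.

```python
import os
import re

# Tiny hypothetical word lists; the real dictionary file is far larger.
NEGATIVE = {"loss", "decline", "impairment"}
POSITIVE = {"gain", "improve", "strong"}


def cik_from_filename(path):
    """Assume the file name starts with the numeric CIK (an assumption
    for this example), e.g. '1234567_10-K_2016.mda'."""
    match = re.match(r"(\d+)", os.path.basename(path))
    return match.group(1) if match else None


def score_text(text):
    """Count total, negative, and positive words in a piece of text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        "words": len(tokens),
        "negative": sum(t in NEGATIVE for t in tokens),
        "positive": sum(t in POSITIVE for t in tokens),
    }


# One result row per file: the CIK plus the word counts.
row = {"cik": cik_from_filename("mda/1234567_10-K_2016.mda")}
row.update(score_text("Strong gain offset by a loss."))
print(row)
```

Rows like this could then be written to the result csv file with the standard `csv` module, one row per mda file.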