Last Update 03/14/2020
This article provides a basic understanding of how SEC Edgar works and how to use it with Python. SEC Edgar is a valuable data source for researchers. Recognizing this, the SEC has posted official guidance on how to use the data: https://www.sec.gov/oiea/Article/edgarguide.html
On SEC Edgar, you can find:
- Financial Information and Results of Operations (10-K, …)
- Shareholder meetings
- Executive compensation
- Insider transactions
- Beneficial ownership interests
- Business combinations
- Public Offerings (S-1, S-3,…)
- Securities-based crowdfunding
- Regulation A offerings
- Foreign Private Issuers
- Mutual Funds and ETFs
- Variable Annuities
Note: the names of the form types are listed at https://www.sec.gov/info/edgar/forms/edgform.pdf
Understanding the SEC Edgar Search Process
When searching for a specific company, the most widely used keys are the ticker and the Central Index Key (CIK). Users can also search by company name. A typical search page looks like this:
From the picture above, we can see that using the ticker or CIK is the fastest way to find the documents for a specific company. Fortunately, CIKs and tickers are easy to obtain. A sample of S&P 1500 CIKs is attached below.
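As a minimal sketch of working with these two keys in Python: Edgar writes a CIK as a 10-digit number with leading zeros, so a small helper can normalize whatever form the CIK arrives in. The `TICKER_TO_CIK` table and the function names below are hypothetical, chosen only for illustration; in practice the table would be loaded from a file such as the S&P 1500 sample.

```python
# Hypothetical ticker-to-CIK table for illustration only;
# a real table would be loaded from a file.
TICKER_TO_CIK = {
    "AAPL": 320193,  # Apple Inc.
    "MSFT": 789019,  # Microsoft Corp.
}


def format_cik(cik):
    """Return the CIK as a 10-digit, zero-padded string."""
    digits = str(cik).strip().lstrip("0") or "0"
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def lookup_cik(ticker, table=TICKER_TO_CIK):
    """Look up a ticker (case-insensitive) and return its padded CIK."""
    key = ticker.strip().upper()
    if key not in table:
        raise KeyError(f"ticker not in table: {ticker!r}")
    return format_cik(table[key])


print(lookup_cik("aapl"))  # 0000320193
```

Keeping the CIK as a string avoids silently losing the leading zeros when the value is later written into a URL or a CSV file.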
A. Traditional Function
We can filter the filings by date and by form type. Note that the date filter only returns filings made prior to a specified date. Each filing is available in both .HTML and .txt versions. In addition, URL links to the images and signed pages are provided.
An interesting function, “Interactive Data”, is now available for financial information and results. Clicking the Interactive Data button opens a separate page:
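The same type and date filters can be expressed as a URL. The sketch below assumes the query parameters of the classic company search page (`action`, `CIK`, `type`, `dateb`, `owner`, `count`) as they appeared at the time of writing; the function name `build_filing_list_url` is hypothetical, and the parameters should be checked against the live site before relying on them.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed endpoint of the classic company search page.
BROWSE_URL = "https://www.sec.gov/cgi-bin/browse-edgar"


def build_filing_list_url(cik, form_type="", prior_to="", count=40):
    """Build a filing-list URL filtered by form type and "prior to" date.

    prior_to is a YYYYMMDD string; an empty string means no date filter.
    """
    if prior_to:
        # Raises ValueError if the date is not a valid YYYYMMDD value.
        datetime.strptime(prior_to, "%Y%m%d")
    params = {
        "action": "getcompany",
        "CIK": str(cik),
        "type": form_type,
        "dateb": prior_to,
        "owner": "include",
        "count": count,
    }
    return f"{BROWSE_URL}?{urlencode(params)}"


url = build_filing_list_url("0000320193", form_type="10-K", prior_to="20200314")
print(url)
```

Because the date filter only works in the "prior to" direction, a date range has to be handled by requesting filings before the end date and discarding those older than the start date.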
This page will enable users to use the data and text inside the financial statements directly, which means:
- All data can be downloaded in Excel format, including the small tables in the note disclosures.
- The text is well organized, which makes text analysis of the 10-K much easier. This improvement might contribute to future corporate disclosure studies.
How to develop an SEC Edgar web crawler
Currently, many unofficial APIs are available, for example python-edgar by Edouard Swiac, sec-edgar-downloader by Jad Chaar, and SEC-Edgar-Crawler by jackmoody. Most of them focus on file downloading, that is, retrieving the HTML or txt files described above.
However, some research requires something more specific, such as the images, the accounting policy, or a particular part of the notes. In those cases, researchers may need to develop their own web crawler.
A few Python libraries are really helpful when developing a personal SEC Edgar web crawler:
- HTML-based webpages: requests, Beautiful Soup
- regular expressions: re
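To illustrate how these libraries fit together, here is a minimal sketch that pulls image URLs out of a filing's HTML. It is not the code behind any of the packages above. The function names, the sample HTML, and the base URL are all hypothetical. The parsing uses only `re`; an HTML parser such as Beautiful Soup would be more robust on messy markup. The fetch helper uses `requests` and sends a User-Agent header, since the SEC asks automated clients to identify themselves; the value you pass is your own contact string.

```python
import re
from urllib.parse import urljoin

# Matches the src attribute of <img> tags, in any letter case and with
# either quote style.
IMG_SRC_RE = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE
)


def fetch_html(url, user_agent):
    """Download a page; user_agent should identify you to the SEC."""
    # Imported here so the parsing helper works without requests installed.
    import requests

    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=30)
    response.raise_for_status()
    return response.text


def extract_image_urls(html, base_url):
    """Return absolute image URLs in document order, without duplicates."""
    urls = []
    for src in IMG_SRC_RE.findall(html):
        absolute = urljoin(base_url, src)
        if absolute not in urls:
            urls.append(absolute)
    return urls


# Hypothetical filing fragment and base URL, for illustration only.
sample_html = """
<html><body>
<p>Item 5. Performance graph</p>
<img src="g1.jpg" alt="graph">
<IMG alt="logo" SRC='images/logo.gif'>
<img src="g1.jpg">
</body></html>
"""
base = "https://www.sec.gov/Archives/edgar/data/0/example/filing.htm"

for image_url in extract_image_urls(sample_html, base):
    print(image_url)
```

In a real run, `sample_html` would be replaced by `fetch_html(filing_url, "Your Name your.email@example.com")`, with a pause between requests to stay within the SEC's fair-access limits.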
In addition, sample code for capturing all images in 10-K files on Edgar can be found at the Google Colab link below:
We are still working on a new web crawler that is compatible with the new functions on SEC Edgar and will post an update as soon as possible.