Last Update: 3/22/2020
Cheetah, a valuable website for tax documentation, can be used more effectively with Python. This article will help you understand how Cheetah works and how to use it with Python.
Understand how Cheetah works
To begin with, we need to understand how Cheetah works. After clicking "tax-state & local", you can find the tax documentation for a given state and year by selecting them from the drop-down menus.
One quirk of Cheetah is that if the file is too large, the site emails it to you instead of downloading it directly. If the file is relatively small, you can download it immediately by clicking the download button. To help the reader better understand the email part, we post a screenshot below.
In addition, there are some other things to take into consideration when designing the program.
- First, a particular state might not have the type of tax you are looking for in a particular year, and you cannot know this in advance.
- Second, the website might stop working for a little while if you request downloads continuously.
- Third, the webpage is sometimes slow, so an element may not appear in time.
- Fourth, you might need to go to your email inbox to download some of the documents.
- Fifth, downloaded files are not automatically named by year or state.
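The second and third points can be handled by retrying a failed step after a growing pause. The following is a minimal sketch, not the code from our program: the name `retry_with_backoff` and its default values are assumptions chosen for illustration.

```python
import time


def retry_with_backoff(action, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    The pause doubles after each failure (2s, 4s, 8s, ...), which gives a
    throttled or slow site time to recover before the next try.
    The helper name and defaults are illustrative assumptions.
    """
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Any step that clicks a download button or waits for an element can be wrapped this way, for example `retry_with_backoff(lambda: download_one(state, year))`, where `download_one` stands for whatever function performs a single download.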
How to use it in Python
Unlike SEC EDGAR or Google Trends data, there are barely any open-source Python libraries available for Cheetah, so the only option is to develop our own tool.
Cheetah has its own download tool, which is a paid feature. We have not used it, so we offer no opinion on it.
Our tool uses Selenium with ChromeDriver, so the path to ChromeDriver must be set. On a Mac, you can get the path by dragging the ChromeDriver file into the Terminal window, which prints its full path. If you are a first-time user, it is worth running a short test script to confirm that Selenium is set up correctly.
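A minimal sketch of such a test, assuming Selenium 4 or later. The driver path is a placeholder, and loading a Google search URL directly (rather than typing into the search box) is a simplification made here.

```python
from urllib.parse import urlencode


def google_search_url(query):
    """Build the URL of a Google search for the given query."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def run_smoke_test(driver_path, query="ChromeDriver"):
    """Open Chrome, load a Google search for the query, and return the page title."""
    # Imported here so the URL helper above works even before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Assumption: Selenium 4 style, where the driver path is passed via Service.
    driver = webdriver.Chrome(service=Service(driver_path))
    try:
        driver.get(google_search_url(query))
        return driver.title
    finally:
        driver.quit()  # always close the browser, even if the page fails to load
```

To run it, call `run_smoke_test("/path/to/chromedriver")` with the path found above; a Chrome window should open and close on its own.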
The test script should automatically open a Google page and search for ChromeDriver. If that works, the basic setup for ChromeDriver is finished.
Find the pattern and use Selenium
Once you have identified the pattern, you can find an element's XPath in Chrome by inspecting the element and clicking Copy XPath. We now provide a short example of an XPath and how we use it in Python.
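The sketch below shows the general shape of this step. The XPath pattern and the element id `state` are hypothetical and do not come from the Cheetah page, so replace them with the XPath you copied from Chrome. Waiting for the element before clicking also helps with slow pages.

```python
def option_xpath(select_id, label):
    """Build an XPath for one <option> in a drop-down, matched by its visible text.

    Hypothetical pattern for illustration; labels containing a single quote
    would need extra escaping.
    """
    return f"//select[@id='{select_id}']/option[normalize-space(.)='{label}']"


def click_when_ready(driver, xpath, timeout=20):
    """Wait until the element at xpath is clickable, then click it."""
    # Imported here so the XPath helper above can be used without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Raises TimeoutException if the element is not clickable within the timeout.
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    element.click()
    return element
```

With an open `driver`, selecting a state would then look like `click_when_ready(driver, option_xpath("state", "California"))`.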
For more details, please download the document below.
Once you are comfortable with XPath, you can begin designing the loops that make the selections. There are basically three layers of loops: state, year and type of tax code. All patterns can be found in the above document. Part B of the document also explains how to create a program that goes to your email, downloads your file automatically, reads the RTF file and renames it.
Sample code is also provided for this program.
We have not updated the code since February 2019 and cannot guarantee that it is compatible with the current Cheetah website. Feel free to reach out to firstname.lastname@example.org about updates.
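The three layers of loops can be sketched as follows. This is an illustration under assumptions rather than the sample program itself: the function names, the file-name scheme, and the convention that `download` returns `None` when a state has no such tax in that year are all hypothetical.

```python
from itertools import product


def build_jobs(states, years, tax_types):
    """Return every (state, year, tax_type) combination, with state as the outermost loop."""
    return list(product(states, years, tax_types))


def target_filename(state, year, tax_type, ext="rtf"):
    """Name a download after its state, year and tax type (hypothetical naming scheme)."""
    slug = tax_type.strip().lower().replace(" ", "-")
    return f"{state}_{year}_{slug}.{ext}"


def run_all(states, years, tax_types, download):
    """Run download(state, year, tax_type) for each combination.

    download is a caller-supplied function that returns the path of the saved
    file, or None when that combination does not exist on the site. Missing
    combinations are collected instead of stopping the run.
    """
    saved, missing = {}, []
    for state, year, tax_type in build_jobs(states, years, tax_types):
        path = download(state, year, tax_type)
        if path is None:
            missing.append((state, year, tax_type))
        else:
            # Map the file as downloaded to the name it should be renamed to.
            saved[path] = target_filename(state, year, tax_type)
    return saved, missing
```

The `saved` mapping can then be passed to `os.rename` one pair at a time, and `missing` records which state, year and tax type combinations were not available.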