This spider crawls GreatSchools.org for school district information. It can scrape districts from a selection of states or from all 50 states. For each district, the output provides:
- District Name
- City
- County
- District Website
- District Phone Number
- Number of schools in the district
- Grade Levels
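As a rough illustration of the shape of one output row, the fields above could be modeled as a small Python record. This is only a sketch: the class name `DistrictRecord`, the field names, and the example values are hypothetical placeholders, and the actual column headers produced by the spider may differ.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class DistrictRecord:
    """One row of spider output. Field names are hypothetical."""
    district_name: str
    city: str
    county: str
    website: str
    phone: str
    school_count: int
    grade_levels: str


# Placeholder values for illustration only, not real scraped data.
example = DistrictRecord(
    district_name="Example Unified School District",
    city="Exampleville",
    county="Example County",
    website="https://district.example",
    phone="(555) 555-0100",
    school_count=12,
    grade_levels="K-12",
)

# A plain dict is convenient for writing rows to CSV or JSON.
row = asdict(example)
column_names = [f.name for f in fields(DistrictRecord)]
```

Keeping the number of schools as an integer, rather than a string, makes it easier to sort or filter districts by size later.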
These instructions will help you get started with using the application.
- Python 3.6.5 - The scripting language used.
- Scrapy - The web crawling framework used to write the spider that scrapes the site.
Run the following command to start the spider:
scrapy runspider greatschools_org.py
To run the spider and write the results to a CSV file:
scrapy runspider greatschools_org.py -o output.csv
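Once the CSV file exists, it can be loaded with Python's standard `csv` module. The sketch below is a minimal example under stated assumptions: the column names (`district_name`, `county`, `school_count`) and the helper functions `load_districts` and `districts_in_county` are hypothetical, and the sample rows are placeholders rather than real output.

```python
import csv
import io


def load_districts(csv_file):
    """Read CSV rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(csv_file))


def districts_in_county(rows, county, column="county"):
    """Return rows whose county matches, ignoring case and outer whitespace."""
    wanted = county.strip().lower()
    return [
        r for r in rows
        if (r.get(column) or "").strip().lower() == wanted
    ]


# Placeholder data standing in for output.csv; column names are assumed.
sample = io.StringIO(
    "district_name,county,school_count\n"
    "Example Unified,Example County,12\n"
    "Sample Elementary,Other County,3\n"
)

rows = load_districts(sample)
matches = districts_in_county(rows, "example county")
```

To read the real file instead of the in-memory sample, open it with `open("output.csv", newline="", encoding="utf-8")` and pass the file object to `load_districts`. Note that `csv.DictReader` returns every value as a string, so numeric columns need an explicit `int()` conversion.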
Below is a screenshot of the CSV file produced by running the spider on https://www.greatschools.org/schools/districts/California/CA/
- Patrick Yu - Initial work - patrickgod1