Congressional Record Scraper

This package contains functions to scrape and parse the Congressional Record. It provides tools to extract metadata from congress.gov and download the text of the Congressional Record. Analysis functionality is planned but not yet implemented.

Features

- Scrape metadata from congress.gov
- Download text of the Congressional Record
- Analyze the Congressional Record (functionality to be implemented)

Main Functions

1. `get_cr_df(date, section)`

Scrapes metadata for all subsections of the Congressional Record for a specific date and section.

Parameters:
- date: Date object
- section: String (e.g., "senate-section", "house-section")

Returns: DataFrame with metadata including headers and URLs to raw text
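
To make the relationship between the two parameters concrete, here is a minimal sketch of how a date and a section could be combined into a congress.gov index URL. The helper name `build_cr_url` and the URL layout are assumptions for illustration; they are not taken from `scraper.py`, and the actual function may build its request differently.

```python
from datetime import date

# Assumed base path for the daily Congressional Record index pages.
BASE_URL = "https://www.congress.gov/congressional-record"


def build_cr_url(day: date, section: str) -> str:
    """Hypothetical helper: compose the index URL for one date and section."""
    # date.__format__ accepts strftime codes, so this yields e.g. 2007/03/01.
    return f"{BASE_URL}/{day:%Y/%m/%d}/{section}"


print(build_cr_url(date(2007, 3, 1), "senate-section"))
# https://www.congress.gov/congressional-record/2007/03/01/senate-section
```

Because `datetime` is a subclass of `date`, the same helper also accepts the `datetime` object used in the Usage section.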

2. `get_cr_htm(row, output_dir)`

Downloads the raw text of each subsection as an .htm file.

Parameters:
- row: DataFrame row containing metadata
- output_dir: Directory to save the downloaded files
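
As an illustration of the download step, the sketch below shows one possible way to derive a local .htm file name from a raw-text URL. The helper `htm_path_for` and the example URL are hypothetical; the real `get_cr_htm` may name its files differently.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def htm_path_for(url: str, output_dir: str) -> Path:
    """Hypothetical helper: map a raw-text URL to a file path in output_dir."""
    # Take the last segment of the URL path as the file name.
    name = PurePosixPath(urlparse(url).path).name
    # Ensure the saved file carries the .htm extension.
    if not name.endswith(".htm"):
        name += ".htm"
    return Path(output_dir) / name


# Made-up URL, used only to show the mapping.
print(htm_path_for("https://example.org/crec/2007/03/01/subsection-1.htm", "data/htm"))
```

Keeping the naming logic in a small pure function makes it easy to check which files already exist before requesting them again.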

Usage

from datetime import datetime
from congressional_record_scraper import get_cr_df, get_cr_htm

# Scrape metadata
date = datetime(2007, 3, 1)
cr_metadata = get_cr_df(date, section="senate-section")

# Download raw text
for _, row in cr_metadata.iterrows():
    get_cr_htm(row, "data/htm")

Requirements

- Python 3.6+
- requests
- BeautifulSoup4
- pandas
- tqdm

Data Storage

By default, the script creates a data directory to store:

- cr_metadata.csv: CSV file containing scraped metadata
- htm subdirectory: Contains downloaded raw text files
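
If you want to reproduce this layout yourself, for example in a different base directory, a minimal sketch could look like the following. The helper `prepare_data_dir` is hypothetical and is not part of the package; it only mirrors the directory structure described above.

```python
from pathlib import Path


def prepare_data_dir(base: str = "data"):
    """Hypothetical helper: create the layout and return (metadata_csv, htm_dir)."""
    base_path = Path(base)
    htm_dir = base_path / "htm"
    # parents=True also creates the base directory; exist_ok avoids errors on reruns.
    htm_dir.mkdir(parents=True, exist_ok=True)
    return base_path / "cr_metadata.csv", htm_dir


metadata_csv, htm_dir = prepare_data_dir("data")
print(metadata_csv, htm_dir)
```

The returned CSV path can then be passed to pandas, for example `cr_metadata.to_csv(metadata_csv, index=False)`, and `htm_dir` can be used as the `output_dir` argument of `get_cr_htm`.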

Note

This scraper is designed to respect the terms of service of congress.gov. Please use responsibly and avoid overloading their servers with too many requests in a short time.
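
One simple way to space out requests is to pause between downloads. The generator below is a sketch, not part of the package: the name `throttled` and the default delay of one second are assumptions, so choose a delay that suits your use.

```python
import time


def throttled(items, delay_seconds=1.0, sleep=time.sleep):
    """Yield items one at a time, pausing between consecutive items."""
    for index, item in enumerate(items):
        # No pause before the first item, only between items.
        if index > 0:
            sleep(delay_seconds)
        yield item


# Example with the loop from the Usage section (delay value is an assumption):
# for _, row in throttled(cr_metadata.iterrows(), delay_seconds=2.0):
#     get_cr_htm(row, "data/htm")
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use the default `time.sleep` applies.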

Reference

judgelord's project

Repository: Nuohai-muxi/scraper-for-US-congress