
scipeds.data.engine

IPEDSQueryEngine(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME)

A structured way to query the IPEDS tables and format the results for visualization.

Parameters:

    db_path (Optional[Path]): Path to the pre-processed database file. Defaults to SCIPEDS_CACHE_DIR / DB_NAME.

Raises:

    FileNotFoundError: If the pre-processed database file is not found.
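The existence check implied by the Raises section can be sketched in plain Python. This is a minimal illustration, not the scipeds source: the placeholder values for SCIPEDS_CACHE_DIR and DB_NAME and the helper name resolve_db_path are hypothetical, and treating an explicit None as the default location is an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical placeholders; the real constants are defined inside scipeds.
SCIPEDS_CACHE_DIR = Path.home() / ".cache" / "scipeds"
DB_NAME = "ipeds.duckdb"


def resolve_db_path(db_path: Optional[Path] = SCIPEDS_CACHE_DIR / DB_NAME) -> Path:
    """Hypothetical helper: return db_path if it is an existing file, else raise."""
    if db_path is None:
        # Assumption: an explicit None falls back to the default cache location.
        db_path = SCIPEDS_CACHE_DIR / DB_NAME
    path = Path(db_path)
    if not path.is_file():
        raise FileNotFoundError(f"Pre-processed database file not found: {path}")
    return path
```

Failing fast at construction time means a missing download surfaces immediately, rather than as a less obvious error on the first query.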

get_df_from_query(query: str, query_params: Optional[Dict[str, Any]] = None, show_query: bool = False) -> pd.DataFrame

Run the provided SQL query against the pre-processed DuckDB database and return the result as a DataFrame.

Parameters:

    query (str): SQL query, written in DuckDB syntax. Required.
    query_params (Optional[Dict[str, Any]]): Prepared statement variables for the query. Defaults to None.
    show_query (bool): Whether to print the query and parameters before executing. Defaults to False.

Returns:

    pd.DataFrame: Data returned by the query.

list_tables() -> List[str]

List all tables in the DuckDB database.

Returns:

    List[str]: Names of all available tables.

get_cip_table() -> pd.DataFrame

Get a table of every unique 2020 Classification of Instructional Programs (CIP) code.

Returns:

    pd.DataFrame: CIP codes and their corresponding taxonomy titles.

get_institutions_table(cols: str | list[str] | None = None) -> pd.DataFrame

Get the institution characteristics table, optionally limited to the specified column or columns.

Returns:

    pd.DataFrame: Institution characteristics.