pipeline.cip_crosswalk
The code in pipeline.cip_crosswalk handles transformations related to CIP (Classification of Instructional Programs) codes:
- The CIPCodeCrosswalk handles transforming old CIP codes (CIP90, CIP2k, and CIP2010) to the current standard (CIP2020)
- The DHSClassifier classifies CIP codes according to whether or not they appear in the Department of Homeland Security's STEM Designated Degree Program List
- The NCSESClassifier classifies CIP codes according to the NCSES Alternate Classification and the broad fields included in the NSF's Diversity and STEM reports
CIPCodeCrosswalk(crosswalk_dir: Path = pipeline.settings.RAW_DATA_DIR / pipeline.settings.CROSSWALKS_DIRNAME)
Handles cross-walking of CIP codes from earlier editions to CIP2020 codes
walk(year_range: Tuple[int, int], codes: pd.Series, titles: Optional[pd.Series] = None) -> Tuple[pd.Series, pd.Series]
Map from old set of codes (in old year range) to newer set of codes
convert_to_cip2020(year: int, codes: Union[str, List[str], pd.Series], titles: Optional[Union[str, List[str], pd.Series]] = None) -> pd.DataFrame
Convert from old CIP codes to CIP 2020 CIP Codes and Titles
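The idea of walking a code forward through successive editions can be illustrated with a minimal, standalone sketch. It is not the CIPCodeCrosswalk implementation: the names EDITIONS, STEPS, and to_cip2020 are hypothetical, the codes are placeholders rather than real CIP codes, and the rule that a code missing from a table is left unchanged is an assumption made for the example.

```python
from typing import Dict, List, Tuple

# Editions in chronological order, as named in the module overview.
EDITIONS: List[str] = ["CIP90", "CIP2k", "CIP2010", "CIP2020"]

# Hypothetical crosswalk tables keyed by (from_edition, to_edition).
# The codes below are placeholders, not real CIP codes.
STEPS: Dict[Tuple[str, str], Dict[str, str]] = {
    ("CIP90", "CIP2k"): {"00.0001": "00.0002"},
    ("CIP2k", "CIP2010"): {"00.0002": "00.0003"},
    ("CIP2010", "CIP2020"): {"00.0003": "00.0004", "00.0009": "00.0010"},
}


def to_cip2020(code: str, edition: str) -> str:
    """Walk a code forward one edition at a time until it reaches CIP2020."""
    start = EDITIONS.index(edition)
    for src, dst in zip(EDITIONS[start:], EDITIONS[start + 1:]):
        # Assumption: a code missing from a crosswalk table did not change.
        code = STEPS[(src, dst)].get(code, code)
    return code


print(to_cip2020("00.0001", "CIP90"))    # 00.0004
print(to_cip2020("00.0009", "CIP2010"))  # 00.0010
print(to_cip2020("00.0500", "CIP2k"))    # 00.0500
```

Chaining one table per edition keeps each table small and lets a code from any older edition reuse the later steps.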
NCSESClassifier(filepath: Path = PIPELINE_ASSETS / 'ncses_stem_classification_table.csv')
Class for converting CIP codes to NCSES hierarchical classification
Read the NCSES file into an internal df to use for classification
get_titles(codes: Union[str, Iterable[str]], fill_na: bool = True) -> pd.Series
Return NCSES title strings corresponding to 2020 CIP Codes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
codes | Union[str, List[str], Series] | CIP 2020 codes | required |
fill_na | bool | Whether to fill NA values with "Unknown". Default: True | True |
Returns:
Type | Description |
---|---|
Series | pd.Series: NCSES title strings |
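The effect of fill_na can be sketched with a plain dictionary lookup. This is a simplified illustration, not the NCSESClassifier code: TITLES and get_titles_sketch are hypothetical names, the codes and titles are placeholders, and a list is returned instead of a pd.Series so that the example has no third-party dependencies.

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical lookup table; codes and titles are placeholders.
TITLES: Dict[str, str] = {
    "00.0001": "Example Field A",
    "00.0002": "Example Field B",
}


def get_titles_sketch(
    codes: Union[str, Iterable[str]], fill_na: bool = True
) -> List[Optional[str]]:
    """Look up a title per code; unmatched codes become "Unknown" or None."""
    if isinstance(codes, str):
        # Accept a single code as well as an iterable of codes.
        codes = [codes]
    missing = "Unknown" if fill_na else None
    return [TITLES.get(code, missing) for code in codes]


print(get_titles_sketch(["00.0001", "99.9999"]))
# ['Example Field A', 'Unknown']
print(get_titles_sketch("99.9999", fill_na=False))
# [None]
```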
classify(original_codes: Union[str, List[str], pd.Series], codes_2020: str | List[str] | pd.Series | None = None) -> pd.DataFrame
Classify CIP code(s) in the NCSES classification.
In all cases, prefer the classification of the CIP2020, but use the original version if the 2020 version is unclassified.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original_codes | Union[str, List[str], Series] | CIP code(s) | required |
codes_2020 | Union[str, List[str], Series] | CIP 2020 code(s) to classify, optional. Default: None | None |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: Data frame indexed by CIP code with each level of NCSES classification as columns |
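The fallback rule described for classify (prefer the CIP2020 classification, use the original code only when the 2020 code is unclassified) can be sketched for a single code as follows. This is an illustration under stated assumptions, not the library's implementation: NCSES_TABLE and classify_sketch are hypothetical names, the codes and field names are placeholders, and a single field stands in for the multi-level classification the real method returns.

```python
from typing import Dict, Optional

# Hypothetical classification table: CIP code -> NCSES field (placeholders).
NCSES_TABLE: Dict[str, str] = {
    "00.0004": "Example Field A",
    "00.0001": "Example Field B",
}


def classify_sketch(
    original_code: str, code_2020: Optional[str] = None
) -> Optional[str]:
    """Prefer the CIP2020 classification; fall back to the original code."""
    if code_2020 is not None and code_2020 in NCSES_TABLE:
        return NCSES_TABLE[code_2020]
    # The 2020 code is absent or unclassified, so try the original code.
    return NCSES_TABLE.get(original_code)


print(classify_sketch("00.0001", "00.0004"))  # Example Field A
print(classify_sketch("00.0001", "00.9999"))  # Example Field B
print(classify_sketch("00.7777", "00.9999"))  # None
```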
DHSClassifier(filepath: Path = PIPELINE_ASSETS / 'dhs_stem_classification_table.csv')
Read the DHS file into an internal df to use for classification
classify(codes: Union[str, List[str], pd.Series, pd.Index]) -> pd.DataFrame
Classify a set of CIP codes as belonging (True) or not belonging (False) to the DHS set of STEM CIP codes
Parameters:
Name | Type | Description | Default |
---|---|---|---|
codes | Union[str, List[str], Series] | CIP code(s) | required |
Returns:
Type | Description |
---|---|
DataFrame | pd.DataFrame: DataFrame indexed by input codes with one bool column indicating DHS STEM |
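The True/False classification amounts to a membership test against the DHS list. The sketch below shows that idea with a plain set. It is not the DHSClassifier code: DHS_STEM_CODES and classify_dhs_sketch are hypothetical names, the codes are placeholders, and a dict keyed by input code stands in for the DataFrame the real method returns.

```python
from typing import Dict, Iterable, Set, Union

# Hypothetical set of STEM-designated codes; placeholders, not the real DHS list.
DHS_STEM_CODES: Set[str] = {"00.0001", "00.0003"}


def classify_dhs_sketch(codes: Union[str, Iterable[str]]) -> Dict[str, bool]:
    """Map each input code to True if it is in the STEM set, else False."""
    if isinstance(codes, str):
        # Accept a single code as well as an iterable of codes.
        codes = [codes]
    return {code: code in DHS_STEM_CODES for code in codes}


print(classify_dhs_sketch(["00.0001", "00.0002"]))
# {'00.0001': True, '00.0002': False}
```

With pandas inputs, Series.isin would be the vectorized equivalent of this membership test.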