Usage¶

This page documents basic concepts and functionality within scipeds.

The Data¶

First things first: scipeds is all about IPEDS data. While IPEDS provides an incredibly rich set of data across multiple decades, the data is split out across different files and formats on the IPEDS data portal, making it challenging to do anything but the most basic analyses.

One key thing we've done with scipeds is to standardize and centralize this data across multiple years into a single, query-able database. That database file is critical to using scipeds, so downloading it is the first step.

For this tutorial, we'll be focusing on IPEDS completions data. That said, we've written scipeds in a way that should make it easy to add other survey components like admissions and enrollment. As the package expands, approaches and concepts similar to the ones we share here will apply to these new IPEDS survey components.

In [1]:

Copied!





import scipeds

# Download the pre-processed database. You can control
# the file location by setting the SCIPEDS_CACHE_DIR
# environment variable or by specifying an output_path parameter.
# If nothing is provided (as in the example here),
# this path Defaults to platformdirs' user_cache_dir.
scipeds.download_db()
import scipeds

# Download the pre-processed database. You can control
# the file location by setting the SCIPEDS_CACHE_DIR
# environment variable or by specifying an output_path parameter.
# If nothing is provided (as in the example here),
# this path Defaults to platformdirs' user_cache_dir.
scipeds.download_db()

Downloading pre-processed IPEDS db to /opt/render/.cache/.scipeds/scipeds_0_0_6.duckdb

Download complete.
scipeds has 3 files and is using 923MB in /opt/render/.cache/.scipeds

Now that the database is downloaded, we can use the IPEDS Completions Query Engine to query it.

The Query Engine¶

scipeds is really just a thin wrapper for aggregating the pre-processed IPEDS data using SQL queries. This thin wrapper is implemented in the CompletionsQueryEngine, which is what we'll use to work with the pre-processed database file that we just downloaded.

In [2]:

Copied!

from scipeds.data.completions import CompletionsQueryEngine
from scipeds.data.completions import CompletionsQueryEngine

We also provide a few other tools in scipeds that makes it possible to construct different kinds of queries of the completions data.

In particular, we provide tools for filtering data and aggregating data across fields in the scipeds.data.queries submodule, and convenience enumerations in scipeds.data.enums that make it easier to specify the values you want to filter on without having to remember how every variable is specifically spelled.

We'll start by importing these helper tools and initializing the completions query engine.

In [3]:

Copied!





# Convenience tools for specifying different variables
from scipeds.data.enums import (
    AwardLevel,
    FieldTaxonomy,
    Grouping,
    NCSESSciGroup,
    RaceEthn,
)

# Tools for filtering data and aggregating fields within a taxonomy
from scipeds.data.queries import (
    QueryFilters,
    TaxonomyRollup,
)
# Convenience tools for specifying different variables
from scipeds.data.enums import (
    AwardLevel,
    FieldTaxonomy,
    Grouping,
    NCSESSciGroup,
    RaceEthn,
)

# Tools for filtering data and aggregating fields within a taxonomy
from scipeds.data.queries import (
    QueryFilters,
    TaxonomyRollup,
)

In [4]:

Copied!





# If you used a non-default location when you downlaoded
# the pre-processed database, you should specify that location
# with the db_path parameter here.
engine = CompletionsQueryEngine()
# If you used a non-default location when you downlaoded
# the pre-processed database, you should specify that location
# with the db_path parameter here.
engine = CompletionsQueryEngine()

Filtering the underlying data¶

First, we'll walk through how to use our pre-written wrappers to query the completions data.

All of the pre-written queries take a QueryFilter as input. This defines the set of filters that are applied to the underlying data before any aggregations are done. For example, you can use a QueryFilter to specify which years of data you want and to limit your results to just Bachelor's degrees.

In [5]:

Copied!





filters = QueryFilters(
    # The first year you want to include in your aggregation
    # (default: 1984)
    start_year=2022,
    # The last year you want to include in your aggregation
    # (default: 2023)
    end_year=2022,
    # The racial/ethnic groups you want to include
    # (default: all racial / ethnic groups)
    race_ethns=list(RaceEthn),
    # The award levels you want to include
    # (default: all award levels)
    award_levels=AwardLevel.bachelors,
    # The major numbers you want to include
    # (default: both first and second majors)
    majornums=[1, 2],
)
filters = QueryFilters(
    # The first year you want to include in your aggregation
    # (default: 1984)
    start_year=2022,
    # The last year you want to include in your aggregation
    # (default: 2023)
    end_year=2022,
    # The racial/ethnic groups you want to include
    # (default: all racial / ethnic groups)
    race_ethns=list(RaceEthn),
    # The award levels you want to include
    # (default: all award levels)
    award_levels=AwardLevel.bachelors,
    # The major numbers you want to include
    # (default: both first and second majors)
    majornums=[1, 2],
)

There's a few enumerations we use and provide that make it so you don't have to remember the exact string values for different variables. For instance, RaceEthn is an enumeration of the different race/ethnicity groups present in the dataset, Degree is an enumeration of all the award levels available, etc.

Rolling up over fields¶

The underlying completions table that we're looking at provides the total number of awards for each individual CIP code. Because we're usually interested in high-level field groupings (rather than individual CIP codes), we created a TaxonomyRollup which you use to tell scipeds the fields within a type of field grouping that you want to aggregate together.

There are two main concepts in the TaxonomyRollup:

The taxonomy_name, which defines which approach for grouping fields you want to use. For example, NCSES provides taxonomies for grouping CIP codes into broad field groups (like "Math & Computer Sciences") and detailed field groups (like "Computer Science"). You can see the options available in the enumeration FieldTaxonomy. Enumerated values are defined for some columns like the NCSES Science Group (NCSESSciGroup), but not for others (like CIP code) because there are simply too many values.
The taxonomy_values, which is a list of fields from the taxonomy you've chosen that you want to aggregate over. These are also available as enumerations, with each type of FieldTaxonomy having their own set of enumerated fields.

For the following examples, we'll be looking at the NCSES broadest categorization, aggregating over all fields that NCSES has categorized as "Science & Engineering". We do this by specifying the ncses_sci_group taxonomy and including any fields that are marked as NCSESSciGroup.sci.

In [6]:

Copied!





ncses_stem = TaxonomyRollup(
    taxonomy_name=FieldTaxonomy.ncses_sci_group,
    taxonomy_values=[NCSESSciGroup.sci],
)
ncses_stem = TaxonomyRollup(
    taxonomy_name=FieldTaxonomy.ncses_sci_group,
    taxonomy_values=[NCSESSciGroup.sci],
)

If you want to see which CIP codes are included in each FieldTaxonomy, you can get a table with all of the CIP codes and their respective categorizations with the .get_cip_table() function:

In [7]:

Copied!

engine.get_cip_table()
engine.get_cip_table()

Out[7]:

	cip_title	ncses_sci_group	ncses_field_group	ncses_detailed_field_group	nsf_broad_field	dhs_stem
cip2020
01.0000	Agriculture, General	Science and engineering	Life Sciences	Agricultural Sciences	Agricultural and biological sciences	False
01.0101	Agricultural Business and Management, General	Non-science and engineering	Business and Management	Business and Management	Non-science and engineering	False
01.0102	Agribusiness/Agricultural Business Operations	Non-science and engineering	Business and Management	Business and Management	Non-science and engineering	False
01.0103	Agricultural Economics	Science and engineering	Social Sciences	Economics	Social and behavioral sciences	False
01.0104	Farm/Farm and Ranch Management	Non-science and engineering	Business and Management	Business and Management	Non-science and engineering	False
...	...	...	...	...	...	...
61.2707	Unknown	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NSF broad field classification	False
61.2801	Unknown	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NSF broad field classification	False
80.9999	Unknown	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NSF broad field classification	False
95.0000	Unknown	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NCSES crosswalk	Not categorized in NSF broad field classification	False
95.9500	Undesignated Field Of Study	Non-science and engineering	Other Non-sciences or Unknown Disciplines	Other Non-sciences or Unknown Disciplines	Non-science and engineering	False

2015 rows × 6 columns

Pre-written queries to roll up across fields¶

Now that we've specified how we want to filter the raw data and which fields we want to aggregate over, we can run through some of the queries we've pre-written.

Aggregate by gender¶

First, we can calculate the total number of completions by gender by using the rollup_by_grouping function provided by the CompletionsQueryEngine. Remember: this will return the total completions across all of the fields we've specified (the rollup parameter), subject to the filters we've specified (the query_filters parameter).

In [8]:

Copied!





# Aggregate total bachelor's degree completions in 2022 in STEM fields
# by gender, including all races and both first and second majors
stem_completions_by_gender = engine.rollup_by_grouping(
    grouping=Grouping.gender,
    rollup=ncses_stem,
    query_filters=filters,
)
stem_completions_by_gender
# Aggregate total bachelor's degree completions in 2022 in STEM fields
# by gender, including all races and both first and second majors
stem_completions_by_gender = engine.rollup_by_grouping(
    grouping=Grouping.gender,
    rollup=ncses_stem,
    query_filters=filters,
)
stem_completions_by_gender

Out[8]:

	rollup_degrees_within_gender	rollup_degrees_total	uni_degrees_within_gender	uni_degrees_total
gender
men	381882	798298	884320	2138714
women	416416	798298	1254394	2138714

The returned dataframe contains four columns:

rollup_degrees_within_gender contains the number of degrees awarded across the fields specified by the TaxonomyRollup to members of the group specified in the row
rollup_degrees_total contains the total number of degrees awarded across the fields specified by the TaxonomyRollup across all groups in the grouping
uni_degrees_within_gender contains the total number of degrees awarded in all fields to members of the group specified in the row
uni_degrees_total contains the total number of degrees awarded in all fields across all groups in the grouping.

In other words, the columns that start with rollup provide total completions across just the fields specified in the TaxonomyRollup, while columns that start with uni provide total completions across all fields. Similarly, columns that end with _within_gender provide these totals for each gender separately, while columns that end with _total provide totals across all students in the query.

Note that all these values are subject to the specified QueryFilters.

The query returns some redundant data (and more data than needed) because that makes it easier to compute various percentages. For example, from the returned dataframe it's easy for us to compute the percentage of STEM Bachelor's degrees awarded to men and women as well as the percentage of all university Bachelor's degrees awarded to men and women:

In [9]:

Copied!





import pandas as pd

stem_pct = (
    stem_completions_by_gender[
        "rollup_degrees_within_gender"
    ]
    / stem_completions_by_gender["rollup_degrees_total"]
)
uni_pct = (
    stem_completions_by_gender["uni_degrees_within_gender"]
    / stem_completions_by_gender["uni_degrees_total"]
)

pct_df = pd.concat([stem_pct, uni_pct], axis=1)
pct_df.columns = ["stem_pct", "uni_pct"]
pct_df
import pandas as pd

stem_pct = (
    stem_completions_by_gender[
        "rollup_degrees_within_gender"
    ]
    / stem_completions_by_gender["rollup_degrees_total"]
)
uni_pct = (
    stem_completions_by_gender["uni_degrees_within_gender"]
    / stem_completions_by_gender["uni_degrees_total"]
)

pct_df = pd.concat([stem_pct, uni_pct], axis=1)
pct_df.columns = ["stem_pct", "uni_pct"]
pct_df

Out[9]:

	stem_pct	uni_pct
gender
men	0.47837	0.413482
women	0.52163	0.586518

Aggregate by race/ethnicity¶

If instead we wanted to aggregate by race/ethnicity instead of gender, the query engine lets us do that easily:

In [10]:

Copied!





# Aggregate to race/ethnicity-level within NCSES STEM
engine.rollup_by_grouping(
    grouping=Grouping.race_ethnicity,
    rollup=ncses_stem,
    query_filters=filters,
)
# Aggregate to race/ethnicity-level within NCSES STEM
engine.rollup_by_grouping(
    grouping=Grouping.race_ethnicity,
    rollup=ncses_stem,
    query_filters=filters,
)

Out[10]:

	rollup_degrees_within_race_ethnicity	rollup_degrees_total	uni_degrees_within_race_ethnicity	uni_degrees_total
race_ethnicity
American Indian or Alaska Native	2696	798298	8850	2138714
Asian	90626	798298	173760	2138714
Black or African American	61561	798298	197290	2138714
Hispanic or Latino	128678	798298	344957	2138714
Native Hawaiian or Other Pacific Islander	1342	798298	4529	2138714
Non-resident alien	55293	798298	103282	2138714
Two or more races	34614	798298	83073	2138714
Unknown	22065	798298	71277	2138714
White	401423	798298	1151696	2138714

You can see that we follow a similar naming convention for the columns, except that this time the grouping is _within_race_ethnicity rather than _within_gender.

Aggregate by intersectional group¶

Aggregating across intersectional race/ethnicity and gender groups is also straightforward:

In [11]:

Copied!





# Aggregate to intersectional groups
engine.rollup_by_grouping(
    grouping=Grouping.intersectional,
    rollup=ncses_stem,
    query_filters=filters,
)
# Aggregate to intersectional groups
engine.rollup_by_grouping(
    grouping=Grouping.intersectional,
    rollup=ncses_stem,
    query_filters=filters,
)

Out[11]:

		rollup_degrees_intersectional	rollup_degrees_total	uni_degrees_intersectional	uni_degrees_total
race_ethnicity	gender
American Indian or Alaska Native	men	1122	798298	3105	2138714
American Indian or Alaska Native	women	1574	798298	5745	2138714
Asian	men	46038	798298	77163	2138714
Asian	women	44588	798298	96597	2138714
Black or African American	men	23340	798298	68367	2138714
Black or African American	women	38221	798298	128923	2138714
Hispanic or Latino	men	53325	798298	129063	2138714
Hispanic or Latino	women	75353	798298	215894	2138714
Native Hawaiian or Other Pacific Islander	men	655	798298	1868	2138714
Native Hawaiian or Other Pacific Islander	women	687	798298	2661	2138714
Non-resident alien	men	31495	798298	53380	2138714
Non-resident alien	women	23798	798298	49902	2138714
Two or more races	men	15583	798298	33469	2138714
Two or more races	women	19031	798298	49604	2138714
Unknown	men	11161	798298	30582	2138714
Unknown	women	10904	798298	40695	2138714
White	men	199163	798298	487323	2138714
White	women	202260	798298	664373	2138714

Note that the output dataframe here is multi-indexed on both race_ethnicity and gender.

Pre-written queries to returning results for each field within a taxonomy¶

In addition to the rollup queries above, which provide results rolled up across a specified set of fields, we also provide queries for returning data for each field within a particular taxonomy.

In [12]:

Copied!





# Compute totals by gender for each NCSES Field Group
engine.field_totals_by_grouping(
    grouping=Grouping.gender,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)
# Compute totals by gender for each NCSES Field Group
engine.field_totals_by_grouping(
    grouping=Grouping.gender,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)

Out[12]:

		field_degrees_within_gender	field_degrees_total	uni_degrees_within_gender	uni_degrees_total
ncses_field_group	gender
Architecture and Environmental Design	men	4629	9535	884320	2138714
Architecture and Environmental Design	women	4906	9535	1254394	2138714
Arts and Music	men	34274	94139	884320	2138714
Arts and Music	women	59865	94139	1254394	2138714
Business and Management	men	216242	404518	884320	2138714
Business and Management	women	188276	404518	1254394	2138714
Communication and Librarianship	men	32387	90183	884320	2138714
Communication and Librarianship	women	57796	90183	1254394	2138714
Education	men	30601	118451	884320	2138714
Education	women	87850	118451	1254394	2138714
Engineering	men	95159	127300	884320	2138714
Engineering	women	32141	127300	1254394	2138714
Geosciences	men	3362	6419	884320	2138714
Geosciences	women	3057	6419	1254394	2138714
Humanities	men	32672	82213	884320	2138714
Humanities	women	49541	82213	1254394	2138714
Interdisciplinary or Other Sciences	men	8275	22359	884320	2138714
Interdisciplinary or Other Sciences	women	14084	22359	1254394	2138714
Law	men	1218	3916	884320	2138714
Law	women	2698	3916	1254394	2138714
Life Sciences	men	60441	176774	884320	2138714
Life Sciences	women	116333	176774	1254394	2138714
Life Sciences (Non-S&E)	men	47381	272793	884320	2138714
Life Sciences (Non-S&E)	women	225412	272793	1254394	2138714
Math and Computer Sciences	men	108053	148007	884320	2138714
Math and Computer Sciences	women	39954	148007	1254394	2138714
Other Non-sciences or Unknown Disciplines	men	33459	82243	884320	2138714
Other Non-sciences or Unknown Disciplines	women	48784	82243	1254394	2138714
Physical Sciences	men	13042	23698	884320	2138714
Physical Sciences	women	10656	23698	1254394	2138714
Psychology	men	27437	137620	884320	2138714
Psychology	women	110183	137620	1254394	2138714
Religion and Theology	men	7942	12378	884320	2138714
Religion and Theology	women	4436	12378	1254394	2138714
Science and Engineering Technologies	men	21132	43381	884320	2138714
Science and Engineering Technologies	women	22249	43381	1254394	2138714
Social Sciences	men	74388	178480	884320	2138714
Social Sciences	women	104092	178480	1254394	2138714
Social Service Professions	men	2666	23432	884320	2138714
Social Service Professions	women	20766	23432	1254394	2138714
Vocational Studies and Home Economics	men	29560	80875	884320	2138714
Vocational Studies and Home Economics	women	51315	80875	1254394	2138714

We've tried to keep the column names fairly self-explanatory, but you'll notice that for these queries, the rollup_ columns have been replaced by columns that start with field_. That's because this data is being returning for each pre-defined field in the taxonomy, rather than rolled up across a user-specified set of fields.

In [13]:

Copied!





# Compute totals for each race/ethnicity for each NCSES Field Group
engine.field_totals_by_grouping(
    grouping=Grouping.race_ethnicity,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)
# Compute totals for each race/ethnicity for each NCSES Field Group
engine.field_totals_by_grouping(
    grouping=Grouping.race_ethnicity,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)

Out[13]:

		field_degrees_within_race_ethnicity	field_degrees_total	uni_degrees_within_race_ethnicity	uni_degrees_total
ncses_field_group	race_ethnicity
Architecture and Environmental Design	American Indian or Alaska Native	34	9535	8850	2138714
	Asian	814	9535	173760	2138714
	Black or African American	560	9535	197290	2138714
	Hispanic or Latino	1758	9535	344957	2138714
	Native Hawaiian or Other Pacific Islander	11	9535	4529	2138714
...	...	...	...	...	...
Vocational Studies and Home Economics	Native Hawaiian or Other Pacific Islander	267	80875	4529	2138714
	Non-resident alien	1085	80875	103282	2138714
	Two or more races	2912	80875	83073	2138714
	Unknown	3027	80875	71277	2138714
	White	39000	80875	1151696	2138714

189 rows × 4 columns

In [14]:

Copied!





# Aggregate across each intersectional group for each field
engine.field_totals_by_grouping(
    grouping=Grouping.intersectional,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)
# Aggregate across each intersectional group for each field
engine.field_totals_by_grouping(
    grouping=Grouping.intersectional,
    taxonomy=FieldTaxonomy.ncses_field_group,
    query_filters=filters,
)

Out[14]:

			field_degrees_intersectional	field_degrees_total	uni_degrees_intersectional	uni_degrees_total
ncses_field_group	race_ethnicity	gender
Architecture and Environmental Design	American Indian or Alaska Native	men	17	9535	3105	2138714
	American Indian or Alaska Native	women	17	9535	5745	2138714
	Asian	men	332	9535	77163	2138714
	Asian	women	482	9535	96597	2138714
	Black or African American	men	274	9535	68367	2138714
...	...	...	...	...	...	...
Vocational Studies and Home Economics	Two or more races	women	1950	80875	49604	2138714
	Unknown	men	1511	80875	30582	2138714
	Unknown	women	1516	80875	40695	2138714
	White	men	15086	80875	487323	2138714
	White	women	23914	80875	664373	2138714

378 rows × 4 columns

In [15]:

Copied!





# Aggregate by gender for each NSF Broad Field 
# and each university each year
filters = QueryFilters(start_year=2011, end_year=2013)
df = engine.uni_field_totals_by_grouping(
    grouping=Grouping.gender,
    taxonomy=FieldTaxonomy.nsf_broad_field,
    query_filters=filters,
    by_year=True,
)
df
# Aggregate by gender for each NSF Broad Field 
# and each university each year
filters = QueryFilters(start_year=2011, end_year=2013)
df = engine.uni_field_totals_by_grouping(
    grouping=Grouping.gender,
    taxonomy=FieldTaxonomy.nsf_broad_field,
    query_filters=filters,
    by_year=True,
)
df

Out[15]:

					field_degrees_within_gender	field_degrees_total	uni_degrees_within_gender	uni_degrees_total
institution_name	unitid	nsf_broad_field	gender	year
Community College of the Air Force	100636	Agricultural and biological sciences	men	2011	11	19	14659	19263
		Agricultural and biological sciences	women	2011	8	19	4604	19263
		Mathematical and computer sciences	men	2011	770	984	14659	19263
		Mathematical and computer sciences	women	2011	214	984	4604	19263
		Non-science and engineering	men	2011	13736	18073	14659	19263
...	...	...	...	...	...	...	...	...
Arizona State University Digital Immersion	483124	Social and behavioral sciences	women	2013	177	260	870	1220
Louisiana Delta Community College	483212	Mathematical and computer sciences	men	2013	13	24	240	590
		Mathematical and computer sciences	women	2013	11	24	350	590
		Non-science and engineering	men	2013	227	566	240	590
		Non-science and engineering	women	2013	339	566	350	590

99786 rows × 4 columns

Custom queries¶

If you have a query you want to run that's not covered by the pre-written queries, you can write the SQL yourself and execute it using the engine's get_df_from_query function.

You can run any SQL query using this approach, which means you can use SQL to figure out which tables and columns are available in the pre-processed duckdb database file.

In [16]:

Copied!

query = "SHOW TABLES"

# Using the engine
engine.get_df_from_query(query)
query = "SHOW TABLES"

# Using the engine
engine.get_df_from_query(query)

Out[16]:

	name
0	cip_info
1	ipeds_completions_a
2	ipeds_directory_info

In [17]:

Copied!

from scipeds.constants import COMPLETIONS_TABLE

query = f"DESCRIBE {COMPLETIONS_TABLE}"
engine.get_df_from_query(query)
from scipeds.constants import COMPLETIONS_TABLE

query = f"DESCRIBE {COMPLETIONS_TABLE}"
engine.get_df_from_query(query)

Out[17]:

	column_name	column_type	null	key	default	extra
0	year	USMALLINT	YES	None	None	None
1	unitid	UINTEGER	YES	None	None	None
2	cipcode	ENUM('01.0000', '01.0101', '01.0102', '01.0103...	YES	None	None	None
3	awlevel	ENUM('Associate''s degree', 'Award of at least...	YES	None	None	None
4	majornum	UTINYINT	YES	None	None	None
5	cip2020	ENUM('01.0000', '01.0101', '01.0102', '01.0103...	YES	None	None	None
6	race_ethnicity	ENUM('American Indian or Alaska Native', 'Asia...	YES	None	None	None
7	gender	ENUM('men', 'women')	YES	None	None	None
8	n_awards	UINTEGER	YES	None	None	None
9	ncses_sci_group	ENUM('Non-science and engineering', 'Not categ...	YES	None	None	None
10	ncses_field_group	ENUM('Architecture and Environmental Design', ...	YES	None	None	None
11	ncses_detailed_field_group	ENUM('Aerospace Engineering', 'Agricultural Sc...	YES	None	None	None
12	nsf_broad_field	ENUM('Agricultural and biological sciences', '...	YES	None	None	None
13	dhs_stem	BOOLEAN	YES	None	None	None

You can then use these table and column names to write your own custom SQL:

In [18]:

Copied!





# Which CIP codes have the most awards over all recorded years?
query = f"""
SELECT cip2020, SUM(n_awards) as total_awards 
FROM {COMPLETIONS_TABLE} 
GROUP BY cip2020 
ORDER BY total_awards DESC;
"""
top_cips = (
    engine.get_df_from_query(query)
    .head(10)
    .set_index("cip2020")
)
top_cips
# Which CIP codes have the most awards over all recorded years?
query = f"""
SELECT cip2020, SUM(n_awards) as total_awards 
FROM {COMPLETIONS_TABLE} 
GROUP BY cip2020 
ORDER BY total_awards DESC;
"""
top_cips = (
    engine.get_df_from_query(query)
    .head(10)
    .set_index("cip2020")
)
top_cips

Out[18]:

	total_awards
cip2020
52.0201	9393674.0
24.0101	8794187.0
51.3801	5832908.0
42.0101	3749035.0
52.0301	2615257.0
26.0101	2348858.0
52.0101	2298456.0
13.1202	2291154.0
24.0102	2283145.0
51.0801	2089676.0