Returns to Graduate Degrees in Michigan: Evidence from Administrative Wage Records, 2018–2025

A Descriptive Companion Study to Altonji & Zhu (2025)

Based on Michigan MiSchoolData, Center for Educational Performance and Information (CEPI)
Data vintage: 2018–2025 annual reports  |  Analysis date: March 2026

Abstract

Graduate education is one of the largest financial decisions an American worker can make. Over 40 percent of bachelor's degree holders aged 35 to 39 now hold a graduate degree, up from 31 percent in 1993, and total graduate student debt has grown to constitute a substantial share of the $1.7 trillion federal student loan portfolio. Yet credible, field-specific evidence on whether graduate degrees pay off financially remains remarkably scarce. A landmark 2025 study by Joseph Altonji and Zhengren Zhu, using Texas administrative records and sophisticated econometric methods, provided the most rigorous estimates to date of causal returns to 121 specific graduate degrees. Their study, however, is limited to a single state. This raises a natural question: do the patterns they document in Texas hold elsewhere?

This project provides a Michigan-based companion analysis. Using eight years (2018–2025) of publicly available administrative wage records from the Michigan Center for Educational Performance and Information — which links education records to unemployment insurance earnings data — we examine how the financial returns to graduate education vary across fields, demographic groups, institutions, and over time. Michigan offers a valuable complement to Texas: it has a different industrial composition, a different mix of public universities, and a different demographic profile, yet its administrative data infrastructure enables similar (if less granular) analysis.

We compute descriptive log wage premia — the difference in log median annual wages between "Master's or Higher Degree" holders and bachelor's degree holders employed in Michigan within the same field of study — across 26 broad fields (2-digit CIP codes) and approximately 136 specific program areas (4-digit CIP codes). Unlike Altonji and Zhu, who have access to individual-level panel data with rich controls for ability, undergraduate major, and college GPA, we work with publicly reported cell-level aggregates. Our estimates are therefore descriptive, not causal: they capture both the true return to graduate education and the positive selection of higher-ability individuals into graduate programs.

To partially overcome this limitation, we develop a synthetic cohort panel that exploits the longitudinal structure of the data. Each annual report contains median wages at two horizons — one year and five years after degree completion — corresponding to different graduating cohorts. By stacking eight years of reports, we link four completion cohorts (approximately 2017 through 2020) at both horizons, enabling us to track within-cohort wage premium dynamics as graduates gain experience. This is the cell-level analog to the individual-level experience profiles estimated by Altonji and Zhu, and it represents the primary methodological contribution of this study.

We further disaggregate returns by gender, race and ethnicity, and institution (using the richer 2023–2025 data that cover 14 Michigan universities), and compute approximate internal rates of return incorporating tuition costs, program duration, and foregone earnings. A comprehensive methods appendix documents all data sources, estimation procedures, assumptions, and limitations in detail.

The employment-weighted average descriptive log premium across fields is 0.255 at one year after completion and 0.305 at five years, with standard deviations of 0.191 and 0.156, respectively. These substantially exceed Altonji and Zhu's weighted average causal return of 0.159, as expected given that our estimates do not remove selection bias. The gap between our descriptive premia and their causal estimates provides an informal gauge of selection severity by field.

Across fields, we find patterns broadly consistent with the Texas evidence. At the 4-digit CIP level, the highest wage premia accrue to pharmacy (PharmD), MBA, and law (JD) programs — the same top tier identified by Altonji and Zhu. At the other end, the descriptive premium is negligible or negative for visual and performing arts, and modest for history and English. Engineering and computer science present a nuanced picture: high absolute wages but relatively small incremental gains over already-high bachelor's-level earnings in these fields.

Three findings stand out as particularly noteworthy. First, health professions is the only field where the graduate premium grows within cohorts over the first five years of a career (by an average of 0.167 log points). In every other field, bachelor's holders experience faster early-career wage growth than master's holders, narrowing the premium over time. The health professions pattern is consistent with the post-residency wage trajectory for physicians documented by Altonji and Zhu, and suggests this pattern is not specific to Texas.

Second, approximate internal rates of return reshape the field ranking relative to raw wage premia. MBA programs yield the highest IRR (0.255), followed by public administration (0.237) and nursing (0.222), reflecting their combination of meaningful wage gains with moderate tuition and the ability to continue working during enrollment. Computer science, despite a positive wage premium, shows a near-zero IRR (0.011) because bachelor's-level earnings are so high that the opportunity cost of graduate school nearly offsets the incremental gain. Education master's programs, often dismissed for their modest wage effects, show a respectable IRR (0.180) because they require minimal investment.

Third, institution-level variation is substantial in some fields but not others. In business, the 5-year log premium ranges from 0.211 (Central Michigan University) to 0.526 (Ferris State University), with the University of Michigan at 0.475. Importantly, the highest-premium institutions are not always those with the highest absolute wages — institutions whose bachelor's graduates already earn very high salaries show compressed premia, illustrating why raw wage rankings can be misleading guides to the value added by specific programs. In education, by contrast, institutional variation is modest, consistent with a regulated labor market where master's degrees confer relatively standardized salary increments.

This study contributes to the growing national conversation about the value and affordability of higher education in three ways.

First, it demonstrates that the broad cross-field patterns in graduate degree returns are robust across states, data sources, and methodologies. When both a descriptive analysis in Michigan and a rigorously identified causal analysis in Texas agree that MBA, law, and health professions degrees offer substantial financial returns while arts and humanities degrees offer modest ones, this consistency should give prospective students and policymakers greater confidence in these patterns — even if the precise magnitudes remain uncertain.

Second, it underscores that simple earnings comparisons can be deeply misleading. Average post-graduate earnings for MBA holders in Michigan exceed $110,000, while education master's holders earn roughly $72,000 — a gap of $38,000. But once we account for who selects into these programs, how much they would have earned without the degree, tuition costs, and foregone earnings, the gap in financial returns narrows considerably. Education master's programs, with their low costs, short duration, and high in-school earnings, deliver an IRR comparable to many STEM degrees. These nuances matter enormously for students making enrollment decisions and for policymakers designing accountability frameworks — such as the gainful employment provisions recently debated in Congress — that assess graduate programs based on earnings outcomes.

Third, and perhaps most importantly, the wide variation in returns we document — across fields, institutions, and demographic groups — means that the question "is graduate school worth it?" has no single answer. The financial return to a nursing master's degree from Ferris State University is fundamentally different from the return to a fine arts degree, and both are different from the return to a computer science master's for someone already earning $90,000 with a bachelor's degree. Prospective students deserve access to field-specific, institution-specific, and context-specific information, not broad averages. Studies like Altonji and Zhu's, and descriptive complements like this one, are steps toward providing that information — but much work remains, particularly in making these findings accessible to the students and families who need them most.

Reference study: Altonji, J.G. and Z. Zhu (2025). "Returns to Specific Graduate Degrees: Estimates Using Texas Administrative Records." NBER Working Paper No. 33530. Published as PEER Center report, Postsecondary Education & Economics Research Center, American University.

Data source: Michigan Center for Educational Performance and Information (CEPI). "Median Annual Wages by Field of Study," MiSchoolData, 2018–2025 annual reports. mischooldata.org

Returns to Graduate Degrees in Michigan

Phase 1: Cross-Field Wage Premia, Trends, and Comparison with Altonji & Zhu (2025)

Analysis of Michigan MiSchoolData Median Annual Wages by Field of Study, 2018–2025

1. Graduate Degree Wage Premia by Field

The table below reports the median annual wages of bachelor's degree holders and master's-or-higher degree holders employed in Michigan, by broad field of study (2-digit CIP code). The log wage premium is the difference in log median wages, which approximates the percentage earnings gain from holding a graduate degree versus a bachelor's degree in the same field. These are descriptive premia averaged over 2023–2025 reporting years.

Important: These are descriptive wage gaps, not causal returns. They do not control for selection into graduate programs — people who pursue graduate degrees likely differ from those who do not in ways that affect earnings. See the Methods section for a detailed discussion.
[Figure: Log Wage Premium by Field (1-year and 5-year horizons)]


2. Michigan–Texas Comparison

We compare our Michigan descriptive wage premia to the causal estimates from Altonji & Zhu (2025), who use Texas administrative records with rich individual-level controls. Their preferred "FEcg" estimator accounts for selection by conditioning on undergraduate major × graduate field fixed effects. Our Michigan estimates are raw median wage gaps at the 2-digit CIP level — systematically larger because they capture both the causal effect of the degree and positive selection into graduate programs.

Reading the comparison: The Michigan premium should generally exceed the Altonji & Zhu FEcg estimate, since FEcg removes upward selection bias. Where our ranking of fields across the two states agrees, it suggests the cross-field pattern is robust to both geography and methodology. Where it disagrees, composition differences across states (or aggregation of subfields) are likely responsible.

3. Premium Trends Over Time (2018–2025)

With eight years of data, we can trace how the graduate wage premium within each field has evolved. This speaks to labor market demand shifts — for example, whether the premium for health professions changed during and after the COVID-19 pandemic, or whether the CS premium responded to the AI boom.

[Figure: All Fields, 1-Year Premium Trend]


4. Synthetic Cohort Analysis: Within-Cohort Wage Growth

This is our strongest analytical contribution in Phase 1. By linking the same approximate completion cohort observed at 1 year post-completion (in one reporting year) and again at 5 years post-completion (four reporting years later), we track how the wage premium evolves within the same group of graduates as they gain experience. This is the cell-level analog to Altonji & Zhu's individual-level experience profiles.

We identify four linkable cohorts: ~2017, ~2018, ~2019, and ~2020 completers, each observed at both the 1-year and 5-year horizon across 12 fields.

Key finding: Health Professions is the only field where the graduate premium grows from year 1 to year 5 (average +0.167 log points). This is consistent with Altonji & Zhu's finding that MD returns rise steeply with experience, likely due to low-paying residency in early years. In all other fields, bachelor's wage growth outpaces master's wage growth, narrowing the premium over early career — though the premium remains positive in most cases.

5. Methods & Data Notes

This analysis uses Michigan MiSchoolData "Median Annual Wages by Field of Study" reports for 2018–2025, produced by the Center for Educational Performance and Information (CEPI) by linking education records to unemployment insurance wage records. We compute descriptive log wage premia — the difference in log median wages between "Master's or Higher Degree" holders and bachelor's degree holders within the same field of study. These are not causal estimates of the returns to graduate education.

Full documentation: For complete details on data provenance, variable definitions, estimation methods, the synthetic cohort panel construction, the relationship to Altonji & Zhu (2025), and a thorough discussion of limitations, see the Methods & Data Appendix.

Key points to keep in mind when interpreting these results:

Descriptive, not causal. Our estimates capture both the true return to graduate education and positive selection into graduate programs. They should be treated as upper bounds on causal returns. See the Methods Appendix §3.2 for a detailed discussion.

Coarse degree classification. Michigan's "Master's or Higher Degree" category bundles master's, professional (JD, MD, PharmD), and doctoral degrees. At 2-digit CIP level, this mixes programs with very different returns. Our 4-digit CIP analysis in Phase 2 mitigates this concern. See Methods Appendix §2.2.

Michigan employment only. Graduates who leave Michigan are excluded. If higher-earning graduates are more mobile, returns are understated. See Methods Appendix §8.3.

Medians, not means. We observe median wages; Altonji & Zhu estimate conditional mean effects on log earnings. These can differ for skewed distributions. See Methods Appendix §3.4.


Phase 2: Synthetic Cohort Analysis

Within-cohort wage dynamics, 4-digit degree-level estimates, and COVID-era effects

Methods: All estimates are descriptive log wage premia computed from Michigan MiSchoolData administrative records, 2018–2025. The synthetic cohort panel links the same approximate completion cohort observed at 1 year and 5 years post-completion across different reporting years. For full documentation of data sources, estimation methods, assumptions, and limitations, see the Methods & Data Appendix.

1. Within-Cohort Wage Growth by Education Level

By tracking the same approximate cohort from 1 year to 5 years post-completion, we decompose the wage premium dynamic into two components: how fast master's-or-higher holders' wages grow versus how fast bachelor's holders' wages grow. If BA wages grow faster than MA+ wages, the premium narrows even though both groups are earning more.

Key pattern: In most fields, bachelor's holders experience faster wage growth than master's holders over years 1–5. This likely reflects that BA holders start at lower wages with more room for early-career advancement, while MA+ holders start closer to their career wage. The notable exception is Health Professions, where MA+ wage growth (0.479 log points) far exceeds BA growth (0.311), consistent with the post-residency wage jump for physicians.
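The arithmetic behind this decomposition can be sketched as follows. The wage values here are hypothetical, chosen only so that the implied growth rates roughly match the Health Professions figures quoted above (MA+ growth 0.479, BA growth 0.311).

```python
import math

def premium_change(ba_wage_y1, ba_wage_y5, ma_wage_y1, ma_wage_y5):
    """Change in the log wage premium from year 1 to year 5 post-completion.

    Equals MA+ log wage growth minus BA log wage growth: if BA wages grow
    faster, the premium narrows even though both groups earn more.
    """
    ma_growth = math.log(ma_wage_y5) - math.log(ma_wage_y1)
    ba_growth = math.log(ba_wage_y5) - math.log(ba_wage_y1)
    return ma_growth - ba_growth

# Hypothetical cell medians (not actual MiSchoolData values):
change = premium_change(ba_wage_y1=40_000, ba_wage_y5=54_600,
                        ma_wage_y1=55_000, ma_wage_y5=88_800)
# BA growth ≈ 0.311 log points, MA+ growth ≈ 0.479, so the premium
# widens by roughly 0.168 log points over the first five years.
```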

[Figure: All Fields, MA+ vs. BA Wage Growth (Log Points, Year 1 → Year 5)]


2. Returns at the 4-Digit CIP Level

Michigan reports wages at the 4-digit CIP level, which brings us closer to Altonji & Zhu's degree-specific estimates. We can separately identify MBA (CIP 52.02), Law/JD (22.01), Social Work (44.07), Clinical Psychology (42.28), Nursing (51.38), Pharmacy (51.20), Medicine (51.12), and specific engineering subfields — exactly the degrees AZ focus on.

At 4-digit CIP, the Michigan log premium for MBA (52.02) at 5 years is 0.525, compared to AZ's FEcg estimate of 0.125. The gap (≈0.40) gives a rough sense of selection bias magnitude — MBA students would have earned substantially more than the average BA holder even without the degree. For Social Work (44.07), the Michigan 5-year premium of 0.284 compares to AZ's FEcg of 0.063, implying less selection bias in social work than in business.

3. Cohort-by-Cohort Premium Trajectories

Each of our four linkable cohorts (~2017, ~2018, ~2019, ~2020) tells a slightly different story. Examining them individually reveals whether the patterns we see in the averages are driven by a single cohort or are consistent across graduating classes.


4. COVID-Era Cohort Effects

The ~2019 and ~2020 completion cohorts entered the labor market during the pandemic. We compare their wage premia to the pre-COVID cohorts (~2017, ~2018) to assess whether the pandemic differentially affected graduate vs. bachelor's degree holders.

Finding: COVID effects on the graduate premium are heterogeneous. Math & Statistics saw the largest positive shift (+0.172 at 1 year), possibly driven by remote-work demand for quantitative skills. Legal Professions saw the largest negative shift (−0.175), perhaps reflecting pandemic disruptions to the legal industry's traditional on-ramp. Health Professions' 1-year premium fell (−0.065), though this may reflect pandemic-era overtime and hazard pay boosting BA-level health worker wages.

5. 4-Digit Premium Trends Over Time (2018–2025)

Tracking specific degree types over eight years reveals medium-term labor market dynamics. The MBA premium has been remarkably stable (0.57–0.64 range). Nursing saw a notable decline from 0.44 in 2019 to 0.25 in 2023, likely as pandemic-era demand for bedside nurses raised BA-level nursing wages relative to advanced-practice wages.


Phase 3: Who Benefits Most?

Demographic and Institutional Heterogeneity in Graduate Degree Returns

Methods: All estimates are descriptive log wage premia from Michigan MiSchoolData, averaged over 2023–2025. Demographic and institutional breakdowns are only available in these three years. For full documentation, see the Methods & Data Appendix.

1. Gender Heterogeneity

Altonji & Zhu (2025) find that returns to graduate degrees are higher for women in 14 of 18 fields. We examine whether the same pattern holds in Michigan's descriptive wage premia. For each field, we compute the log premium separately for men and women, then compare.

Key finding: Unlike Altonji & Zhu, we find that men generally show higher descriptive premia than women in Michigan. The male premium exceeds the female premium in 10 of 14 fields at the 1-year horizon. This likely reflects a different mechanism than AZ's causal estimates capture: in Michigan's raw data, men who hold only a bachelor's degree in fields like education, social work, and psychology may earn less relative to the field average (perhaps working in lower-paying specializations), producing a larger raw gap when they obtain a graduate degree. AZ's finding of higher female returns emerges only after controlling for selection.
[Figure: Female vs. Male Log Wage Premium by Field (1-year and 5-year horizons)]


2. Race & Ethnicity Heterogeneity

Altonji & Zhu report surprising results on racial heterogeneity: Asian Americans receive substantially lower causal returns than non-Hispanic whites in most fields, while patterns for Black and Hispanic graduates are mixed. We examine whether similar patterns appear in Michigan's descriptive data, with the caveat that cell sizes for minority groups are small in many fields.

Data limitation: Race/ethnicity breakdowns produce many suppressed cells (fewer than 10 employed). We can only report premia for race × field combinations with sufficient data in at least two of the three years (2023–2025). This limits analysis to larger fields and larger racial groups.
[Figure: Log Premium by Race, Selected Fields (1-year and 5-year horizons)]


3. Institution-Specific Returns

Altonji & Zhu find that MBA and JD returns are significantly higher at higher-ranked programs (using the Dale-Krueger method to control for selection into institutions). We examine institution-level variation across Michigan's 14 universities with master's-level wage data. Without application/admission controls, our estimates reflect both the causal effect of the institution and student selection.


4. Summary of Key Findings

Gender

The descriptive premium is higher for men in most fields — the opposite of Altonji & Zhu's causal finding. The most notable exception is Math & Statistics, where women show a substantially higher premium (+0.210 gap). The reversal relative to AZ likely reflects that our descriptive measure conflates the causal return with gender-specific selection patterns that AZ's FEcg method removes.

Race & Ethnicity

Coverage is limited, but where we can compare: in Computer Science and Engineering, Black and Hispanic graduates show lower descriptive premia than White graduates. In Education and Public Administration, the pattern reverses — minority graduates show slightly higher premia. Altonji & Zhu's finding of notably lower returns for Asian Americans is partially visible in our data for Engineering (Asian premium 0.167 vs. White 0.305) and Computer Science (0.163 vs. 0.293). Business (MBA) shows remarkably similar premia across racial groups.

Institutions

In Business, the University of Michigan (Ann Arbor) commands the highest MA+ median wage ($156k at 5 years) but not the highest premium (Ferris State's premium of 0.526 exceeds U-M's 0.475). This reflects the well-known composition effect: U-M's bachelor's graduates also earn very high wages, compressing the premium. In Health Professions, Ferris State and U-M show the highest premia, likely driven by their pharmacy (PharmD) and medical (MD) programs. In Education, the premium varies modestly across institutions (0.16–0.44 range), consistent with a relatively standardized labor market for teachers and administrators.


Phase 4: Approximate Internal Rates of Return

Accounting for tuition, program length, and foregone earnings

Methods: IRR is the discount rate equating the present discounted value of the earnings stream with a graduate degree (net of tuition) to the counterfactual BA earnings stream. We use Michigan median wages at 1-year and 5-year horizons, interpolate between them, and assume the premium stabilizes after year 5. Tuition is approximated from IPEDS data for Michigan public institutions. See the Methods & Data Appendix §7 for full assumptions, including earnings-during-enrollment fractions and extrapolation methodology.
Important caveat: These IRR estimates are approximate. They inherit all the limitations of our descriptive wage premia (no causal identification, coarse degree bundling at 2-digit CIP, Michigan employment only) and add additional assumptions about tuition, program duration, and earnings trajectories. They should be interpreted as indicative of relative field rankings, not as precise financial return estimates for any individual.
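To make the mechanics concrete, here is a minimal sketch of an IRR calculation under stylized assumptions (a two-year program, a constant post-degree premium). The dollar figures and the simple bisection solver are illustrative, not the exact procedure behind the estimates in this phase.

```python
def irr_of_difference(diff_stream, lo=-0.5, hi=1.0, tol=1e-6):
    """Discount rate at which the net present value of the
    (graduate path minus BA counterfactual) cash-flow stream is zero.

    diff_stream[t] is the year-t difference: negative while enrolled
    (tuition plus foregone earnings), positive once the wage premium
    kicks in. Solved by bisection, assuming NPV is decreasing in r."""
    def npv(r):
        return sum(c / (1 + r) ** t for t, c in enumerate(diff_stream))
    assert npv(lo) > 0 > npv(hi), "IRR not bracketed by [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative only: two years of tuition plus net foregone earnings
# ($60k/year), then a constant $15k/year premium over 30 working years.
stream = [-60_000, -60_000] + [15_000] * 30
rate = irr_of_difference(stream)  # roughly 0.11
```

Note the economics the solver encodes: raising the up-front cost (tuition, foregone earnings) or delaying the premium pushes the IRR down even when the wage premium itself is unchanged, which is why short, cheap programs like education master's degrees fare well by this metric.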

1. Approximate IRR by Broad Field (2-Digit CIP)

The IRR accounts for tuition costs, program duration, and foregone earnings during enrollment — factors that can substantially alter the ranking of degrees relative to their raw wage premia. A field with a high wage premium but long and expensive program (like law) may have a lower IRR than a field with a moderate premium but short, inexpensive program (like education administration).


2. IRR by Specific Degree (4-Digit CIP)

At the 4-digit CIP level, we can compute IRRs for specific degree types that map more directly to Altonji & Zhu's focal programs: MBA, JD, MSW, MPA, Clinical Psychology, Nursing, and specific engineering subfields.

Standout: The MBA (CIP 52.02) yields the highest 4-digit IRR at 0.255 — driven by a large wage premium, relatively affordable Michigan public tuition, and high earnings during enrollment. MPA (0.237) and Nursing (0.222) round out the top three. At the other end, Computer Science shows a near-zero IRR (0.011), not because the degree doesn't boost wages, but because CS bachelor's holders already earn so much ($95k at 5 years) that the incremental gain from a master's barely covers the opportunity cost.

3. Michigan–Texas IRR Comparison

We compare our approximate Michigan IRRs to Altonji & Zhu's Texas estimates (their Table 4), which are based on causal FEcg earnings effects and are therefore more credible. The comparison reveals which fields' financial returns are robust across states and methodologies.


4. Key Findings

4.1 IRR rankings partially diverge from wage premium rankings

Education shows a modest wage premium (0.303 log points at 1 year) but a relatively high IRR (0.180), because education master's programs are short, inexpensive, and students typically continue working while enrolled. This parallels Altonji & Zhu's finding that Education Administration has an IRR of 0.230 — more than twice its log earnings effect — because the net investment is small.

4.2 High-premium fields with expensive programs see compressed IRRs

Legal Professions (JD) has among the highest wage premia but a lower IRR (0.152 at 2-digit) because law school is three years long with high tuition ($28k/year) and low earnings during enrollment. At 4-digit CIP, we lack sufficient JD-specific 5-year data to compute the IRR, highlighting a data limitation.

4.3 Some fields show negative PDV gains

Computer Science, Mathematics, and Biology show negative percentage PDV gains at the 2-digit level, meaning the present value of the master's earnings stream (net of tuition) falls short of the bachelor's counterfactual. For CS, this reflects extremely high BA-level wages; the master's adds modest incremental earnings that barely justify the investment. For Biology, it reflects the very high BA wage growth rate, which means the opportunity cost of two years out of the workforce is substantial.

4.4 Michigan and Texas IRRs agree on the top tier

Both states identify MBA, MPA, and Nursing among the highest-IRR programs. The Michigan estimates are generally higher than Texas (as expected, since our descriptive premia overstate causal returns), but the correlation in rankings is encouraging. The largest discrepancy is Social Work: Michigan IRR of 0.176 vs. Texas IRR of 0.020. This likely reflects that Michigan's broader CIP 44 category mixes social work with higher-return public administration programs.


Methods & Data Appendix

Returns to Graduate Degrees in Michigan: Evidence from Administrative Wage Records, 2018–2025

1. Data Source and Provenance

1.1 The MiSchoolData Workforce Reports

We use the "Median Annual Wages by Field of Study" reports published by the Michigan Center for Educational Performance and Information (CEPI) on the MiSchoolData platform (mischooldata.org). These reports are produced annually as part of the Michigan Workforce Longitudinal Data System, a partnership among CEPI, the Michigan Department of Labor and Economic Opportunity (LEO), and the Unemployment Insurance Agency (UIA).

1.2 Underlying Data Systems

The wage reports are constructed by linking three administrative data systems: K–12 education records (which establish whether an individual received their high school education in Michigan), postsecondary enrollment and completion records from Michigan institutions, and unemployment insurance wage records maintained by the UIA.

1.3 Report Construction by CEPI

CEPI matches education records to wage records to identify each individual's highest level of education and their post-completion earnings. Two cohorts are defined for each annual report: individuals observed 1 year after completing their credential (completers from one year before the report year), and individuals observed 5 years after completion (completers from five years before the report year).

Students who are still enrolled in education at the time of the wage observation are excluded. Individuals must have received their high school education in Michigan and must be currently employed in Michigan to appear in the data. The wage measure is the median annual wage within each cell, rounded to the nearest $100. Cells with fewer than 10 employed individuals are suppressed (wages not reported).
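As a sketch, the suppression rule described above has to be applied before any premia are computed. The record layout and field names below are hypothetical stand-ins for the MiSchoolData download format.

```python
# Hypothetical cell-level records (median wages rounded to nearest $100):
cells = [
    {"cip2": "52", "level": "BA",  "horizon": 5, "employed": 418, "median_wage": 76_300},
    {"cip2": "52", "level": "MA+", "horizon": 5, "employed": 255, "median_wage": 128_900},
    {"cip2": "50", "level": "MA+", "horizon": 5, "employed": 8,   "median_wage": None},  # suppressed
]

# CEPI suppresses cells with fewer than 10 employed individuals,
# so both conditions below should be checked before computing premia.
MIN_CELL = 10
usable = [c for c in cells
          if c["employed"] >= MIN_CELL and c["median_wage"] is not None]
```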

1.4 Years of Data

We use eight annual reports: 2018, 2019, 2020, 2021, 2022, 2023, 2024, and 2025. The 2018–2022 reports contain statewide aggregates only. The 2023–2025 reports additionally contain institution-level (IHE) data and demographic breakdowns by gender, age band, and race/ethnicity.


2. Sample Construction and Variable Definitions

2.1 Unit of Observation

Each observation in the raw data is a cell defined by the combination of: report year, location (statewide or specific institution), diversity type and category (if applicable), wage horizon (1 year or 5 years after completion), CIP code (field of study), and education level. The reported values for each cell are the median annual wage and the number of employed individuals.

2.2 Education Levels

The data classify individuals into four education levels based on their highest credential earned; our analysis compares two of them, "Bachelor's Degree" and "Master's or Higher Degree".

Important bundling caveat: The "Master's or Higher Degree" category is substantially coarser than the degree classifications used by Altonji & Zhu (2025), who separately identify 121 specific graduate degree types. In our data, an MBA, a JD, an MD, and a PhD in the same CIP field are all grouped together. At the 2-digit CIP level, this means CIP 51 (Health Professions) bundles master's-level nurses with MD physicians and PharmD pharmacists. At the 4-digit CIP level, the bundling is less severe — CIP 52.02 (Business Administration) is predominantly MBAs, and CIP 22.01 (Law) is predominantly JDs.

2.3 Field of Study Classification

Fields of study are identified by Classification of Instructional Programs (CIP) codes at three levels of granularity: the 2-digit series (broad fields, e.g., CIP 51, Health Professions), the 4-digit program area (e.g., CIP 51.38, Nursing), and the 6-digit specific program.

Our primary analyses use 2-digit CIP codes for robustness and comparability across years. We supplement with 4-digit analyses to achieve closer correspondence with Altonji & Zhu's degree-specific estimates.

2.4 Deduplication

In some years, the data contain multiple rows for the same CIP code at a given level (with slightly different field name strings). We deduplicate by retaining the row with the largest number of employed individuals for each CIP × report year × education level × wage horizon cell. Post-deduplication, each cell is unique.
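The deduplication rule can be sketched as follows; the row layout and example values are hypothetical.

```python
def deduplicate(rows):
    """Within each (CIP, report year, education level, horizon) cell,
    keep the row with the largest employed count; duplicate rows differ
    only in the field-name string."""
    best = {}
    for r in rows:
        key = (r["cip"], r["year"], r["level"], r["horizon"])
        if key not in best or r["employed"] > best[key]["employed"]:
            best[key] = r
    return list(best.values())

# Two rows for the same cell with slightly different name strings:
rows = [
    {"cip": "51.38", "year": 2024, "level": "MA+", "horizon": 1,
     "field_name": "Registered Nursing", "employed": 310, "median_wage": 98_400},
    {"cip": "51.38", "year": 2024, "level": "MA+", "horizon": 1,
     "field_name": "Registered Nursing/Registered Nurse", "employed": 12, "median_wage": 97_100},
]
unique = deduplicate(rows)  # keeps only the employed=310 row
```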

2.5 Demographic Categories (2023–2025 only)

The 2023–2025 reports provide statewide wage data broken down by three diversity dimensions: gender, age band, and race/ethnicity.

These breakdowns are reported separately from the overall statewide figures and cannot be cross-tabulated (e.g., we cannot observe wages for Black women specifically).

2.6 Institution-Level Data (2023–2025 only)

The 2023–2025 reports include wage data at the individual institution (IHE) level. We observe 43 institutions total, of which 14 report master's-level wage data: Central Michigan University, Eastern Michigan University, Ferris State University, Grand Valley State University, Michigan State University, Michigan Technological University, Northern Michigan University, Oakland University, Saginaw Valley State University, University of Michigan (Ann Arbor), University of Michigan–Dearborn, University of Michigan–Flint, Wayne State University, and Western Michigan University.


3. Estimation: Log Wage Premium

3.1 Core Estimand

For each field of study f, report year t, and wage horizon h ∈ {1, 5}, we compute the log wage premium of a graduate degree over a bachelor's degree:

Δ_{f,t,h} = ln(W̃_{f,t,h}^{MA+}) − ln(W̃_{f,t,h}^{BA})

where W̃ denotes the median annual wage for the indicated education level. This quantity approximates the proportional wage gap: for small values, Δ ≈ (W^{MA+} − W^{BA}) / W^{BA}. We also report the exact percentage premium: %Δ = (W^{MA+} − W^{BA}) / W^{BA} × 100.
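In code, the estimand reduces to a one-liner per cell. A minimal sketch, with hypothetical medians:

```python
import math

def log_premium(median_ma_plus, median_ba):
    """Descriptive log wage premium: ln(MA+ median) − ln(BA median)."""
    return math.log(median_ma_plus) - math.log(median_ba)

def pct_premium(median_ma_plus, median_ba):
    """Exact percentage premium, (MA+ − BA) / BA × 100."""
    return (median_ma_plus - median_ba) / median_ba * 100

# Hypothetical cell medians (not actual MiSchoolData values):
d = log_premium(88_000, 65_000)   # ≈ 0.303 log points
p = pct_premium(88_000, 65_000)   # ≈ 35.4 percent
```

The two measures diverge as premia grow: a log gap of 0.303 corresponds to an exact gain of about 35 percent, not 30, which is why we report both.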

3.2 Interpretation

The log wage premium is a descriptive measure. It captures the raw difference in median log earnings between graduate and bachelor's degree holders working in Michigan within the same broad field. It does not estimate the causal effect of obtaining a graduate degree, because:

  1. Individuals select into graduate programs on ability and other characteristics we cannot observe
  2. We cannot condition on undergraduate major, GPA, age, or demographics
  3. The bachelor's-only comparison group may differ in composition (undergraduate majors, industries, occupations) from the graduate group

The premium should generally be interpreted as an upper bound on the causal return, since positive selection into graduate programs (higher-ability individuals being more likely to pursue advanced degrees) biases the raw gap upward.

3.3 Pooling Across Years

For summary tables, we average the log premium across the three most recent report years (2023–2025) to reduce noise from year-to-year sampling variation while reflecting current labor market conditions. For trend analyses, we report year-by-year estimates.

3.4 Using Medians vs. Means

Our data report median wages, while Altonji & Zhu's regression estimates correspond to effects on the conditional mean of log earnings. If the earnings distribution is log-normal, the log of the median equals the mean of log earnings, so the difference in log medians equals the difference in mean log earnings. In practice, skewness in log earnings (particularly in high-earning fields like law and medicine) means our estimates may differ from what a regression on log earnings would produce. We cannot assess the magnitude of this discrepancy without access to the underlying microdata.
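The log-normality argument invoked above can be made explicit in one line; the notation here is ours, matching the Section 3.1 definitions.

```latex
% Suppose \ln W_g \sim \mathcal{N}(\mu_g, \sigma_g^2) for group
% g \in \{\mathrm{MA+}, \mathrm{BA}\}. The median of a log-normal
% variable is e^{\mu_g}, so \ln \widetilde{W}_g = \mu_g = \mathbb{E}[\ln W_g].
% Hence:
\Delta
  = \ln \widetilde{W}_{\mathrm{MA+}} - \ln \widetilde{W}_{\mathrm{BA}}
  = \mu_{\mathrm{MA+}} - \mu_{\mathrm{BA}}
  = \mathbb{E}[\ln W_{\mathrm{MA+}}] - \mathbb{E}[\ln W_{\mathrm{BA}}]
% i.e., the difference in log medians coincides with the mean-log
% difference that a log-earnings regression targets. Skewness in
% \ln W breaks the first equality.
```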


4. Synthetic Cohort Panel

4.1 Construction

The synthetic cohort panel exploits the fact that each annual report contains two wage horizons — 1 year and 5 years after completion — which correspond to different completion cohorts.

A cohort completing in year c therefore appears as "1 year after" in report year c + 1 and as "5 years after" in report year c + 5. With reports spanning 2018–2025, we can link four cohorts at both horizons:

Approx. Completion Year | Observed 1yr After (Report Year) | Observed 5yr After (Report Year)
~2017 | 2018 | 2022
~2018 | 2019 | 2023
~2019 | 2020 | 2024
~2020 | 2021 | 2025

4.2 Within-Cohort Premium Growth

For each linked cohort, we compute the premium at both horizons and measure the change:

ΔΔfc = Δf,c+5,5 − Δf,c+1,1

where ΔΔ is the within-cohort premium growth for field f and cohort c. This can be decomposed as:

ΔΔfc = [ln(W̃5yrMA+) − ln(W̃1yrMA+)] − [ln(W̃5yrBA) − ln(W̃1yrBA)]

That is, the premium grows if master's holders' wages grow faster than bachelor's holders' wages, and shrinks if the reverse is true. We report both the premium change and the separate wage growth components.
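The equivalence between the premium change and the growth decomposition is pure algebra, which the following sketch verifies numerically for one hypothetical cohort-field cell (all wage values invented for illustration).

```python
import math

# Median wages for one cohort-field cell: (horizon, education) -> wage
w = {("1yr", "MA+"): 70_000, ("5yr", "MA+"): 95_000,
     ("1yr", "BA"):  58_000, ("5yr", "BA"):  72_000}

premium_1yr = math.log(w[("1yr", "MA+")]) - math.log(w[("1yr", "BA")])
premium_5yr = math.log(w[("5yr", "MA+")]) - math.log(w[("5yr", "BA")])
growth = premium_5yr - premium_1yr                   # ΔΔ

# Equivalent decomposition: MA+ wage growth minus BA wage growth.
ma_growth = math.log(w[("5yr", "MA+")]) - math.log(w[("1yr", "MA+")])
ba_growth = math.log(w[("5yr", "BA")]) - math.log(w[("1yr", "BA")])
assert abs(growth - (ma_growth - ba_growth)) < 1e-12
```

In this invented cell the master's wage grows faster, so ΔΔ is positive and the premium widens between the two horizons.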

4.3 Assumptions and Caveats

The synthetic cohort approach assumes that the 1-year and 5-year observations within a cohort-field cell reflect approximately the same group of graduates. This assumption can fail due to:

  1. Out-of-state migration between the two horizons (only Michigan employment is observed)
  2. Movement into or out of UI-covered Michigan employment (e.g., non-employment or self-employment)
  3. Changes in cell suppression status across report years, which alter which graduates appear in the data

4.4 COVID-Era Cohort Classification

For the COVID-era analysis, we classify the ~2017 and ~2018 completion cohorts as "pre-COVID" (they entered the labor market before the pandemic) and the ~2019 and ~2020 cohorts as "COVID-era" (they entered during or immediately after the pandemic's initial labor market disruptions). The ~2019 cohort is borderline — they entered the workforce approximately in 2020, at the pandemic's onset. We include them in the COVID-era group because their initial labor market experience was likely disrupted.


5. Relationship to Altonji & Zhu (2025)

5.1 What They Estimate

Altonji and Zhu (2025) estimate causal effects of 121 graduate degrees on log quarterly earnings using Texas Schools Project administrative data, which links individual-level records across high school, college, graduate school, and UI earnings. Their sample comprises approximately 31.5 million person-quarter observations. They employ six estimation strategies:

  1. OLS with rich controls (college major × gender age profiles, race/ethnicity, college GPA, college credits, year dummies)
  2. FEcg (their preferred): adds fixed effects for each undergraduate major × graduate field combination, absorbing permanent unobserved heterogeneity within these groups
  3. FE (individual fixed effects): compares the same individual's earnings before and after graduate school
  4. OLS-ps: propensity-score-reweighted OLS
  5. FEcg-ps: propensity-score-reweighted FEcg
  6. Dale-Krueger method: for institution-specific returns, controlling for application/admission sets

5.2 What We Estimate

Our log wage premium is conceptually most similar to a naïve OLS estimate without individual-level controls — a raw cross-sectional comparison of graduate vs. bachelor's median wages. The key differences are:

Feature | Altonji & Zhu (Texas) | This Study (Michigan)
Unit of analysis | Individual person-quarter | Aggregate cell (field × ed level × year)
Outcome variable | Log quarterly earnings (mean) | Log median annual wage
Degree classification | 121 specific graduate degrees (4-digit CIP) | "Master's or Higher" by 2-digit or 4-digit CIP
Controls | College major, GPA, race, gender, age profiles | None (raw wage gap within field)
Selection correction | FEcg, FE, propensity score reweighting | None
Counterfactual | Same person without grad degree (FE) or same major-group without grad degree (FEcg) | Bachelor's holders in same broad field
Time dimension | Individual panel (same person over time) | Synthetic cohort panel (same approx. cohort over time)
Geographic scope | Texas employment only | Michigan employment only

5.3 Interpreting the Gap Between Estimates

Our Michigan descriptive premium should systematically exceed Altonji & Zhu's FEcg estimates because our measure does not remove selection bias. The difference provides an informal gauge of selection severity by field. A large gap (e.g., Business: Michigan 5yr = 0.449 vs. Altonji & Zhu FEcg = 0.125) implies strong positive selection — MBA students would have earned substantially more than the average bachelor's holder even without the degree. A small gap (or a Michigan estimate below the Altonji & Zhu estimate) would suggest either negative selection, different field composition across states, or measurement differences.

5.4 Cross-State Comparability

Despite different data structures and methods, the cross-field ranking of graduate degree returns is broadly consistent between Michigan and Texas. This suggests the rank ordering of returns is driven by fundamental labor market forces (human capital accumulation, professional licensing requirements, occupational wage structures) rather than state-specific factors.


6. Heterogeneity Analyses (Phase 3)

6.1 Gender and Race/Ethnicity

Using the 2023–2025 demographic breakdowns, we compute the log wage premium separately by gender and by racial/ethnic group. The formula is the same as in Section 3.1, applied within each demographic cell. We average across the three available years to improve precision.

Because the demographic breakdowns cannot be cross-tabulated (we observe wages by gender or by race, not by gender × race), these analyses are necessarily univariate. We also cannot condition on other covariates (age, undergraduate major) within a demographic group.

6.2 Institution-Specific Returns

Using institution-level data from 2023–2025, we compute the log premium for each IHE that reports both bachelor's and master's-or-higher wages in a given field. We restrict attention to the 14 universities with master's-level data. For institution-specific returns by program rank, we correlate our institution-level premia with U.S. News & World Report program rankings, analogous to Altonji & Zhu's Figure 3.

Unlike Altonji & Zhu, we do not have application/admission data and therefore cannot implement the Dale-Krueger selection correction. Our institution-level estimates reflect both the causal effect of the institution and selection into institutions based on unobserved characteristics.


7. Approximate Internal Rate of Return (Phase 4)

7.1 Approach

We compute approximate internal rates of return (IRR) following the logic of Altonji & Zhu's Table 4. The IRR is the discount rate at which the present discounted value of the earnings stream with a graduate degree (net of tuition) equals the present discounted value of the counterfactual earnings stream without a graduate degree.
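The IRR logic above can be sketched with a deliberately stylized incremental-cash-flow model: it is not a reproduction of Altonji & Zhu's Table 4 procedure, it assumes flat annual earnings streams and a fixed enrollment period, and every numeric input is an invented placeholder.

```python
def npv(rate, cash_flows):
    """Present value of cash_flows, where cash_flows[t] arrives in year t."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def approx_irr(w_grad, w_ba, annual_tuition, years_enrolled=2, working_years=40):
    """Discount rate equating the graduate and bachelor's earnings paths."""
    # Incremental cash flow of the graduate path: while enrolled, pay
    # tuition and forgo the bachelor's wage; afterwards, earn the wage gap.
    flows = [-(annual_tuition + w_ba)] * years_enrolled
    flows += [w_grad - w_ba] * (working_years - years_enrolled)
    lo, hi = -0.5, 1.0          # NPV has a single sign change on this range
    for _ in range(100):        # (costs first, gains later), so bisect on it
        mid = (lo + hi) / 2
        if npv(mid, flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rate = approx_irr(w_grad=78_000, w_ba=62_000, annual_tuition=20_000)
```

Because the sign pattern of the incremental flows changes exactly once, the IRR is unique and bisection on the NPV sign is sufficient; richer earnings profiles would require a more careful root finder.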

7.2 Inputs

7.3 Limitations of the IRR Calculation

Our IRR estimates are rougher than Altonji & Zhu's because we lack individual-level earnings trajectories, data on earnings during enrollment, and the ability to condition on individual characteristics. The bachelor's median is an imperfect counterfactual. We report IRR estimates as indicative of the relative ranking of fields, not as precise point estimates of the financial return to any individual.


8. Limitations

8.1 No Causal Identification

This is the most fundamental limitation. Without individual-level data, we cannot implement any of the standard methods for addressing selection into graduate programs — individual fixed effects, FEcg, propensity score methods, instrumental variables, or regression discontinuity designs. Our estimates should be interpreted as descriptive wage premia, not causal returns.

8.2 Coarse Degree Classification

The "Master's or Higher Degree" category bundles master's degrees with professional degrees (JD, MD, PharmD, DDS, DVM) and research doctorates (PhD). Within a broad CIP category, this mixes programs with very different labor market returns. For example, in CIP 51 (Health Professions), our "Master's or Higher" category includes nursing master's graduates (moderate wages), physicians (very high wages), and public health master's graduates (moderate wages). The 4-digit CIP analysis mitigates this to some extent but does not fully resolve it.

8.3 Michigan Employment Only

The wage data only cover individuals employed in Michigan. Graduates who leave the state are excluded. If higher-earning graduates are more likely to move out of state (particularly from flagship programs in law, business, and medicine), our estimates understate the true returns for those fields. Altonji & Zhu note the same concern for Texas, citing Foote and Stange (2019), who find that individuals from flagship universities are more likely to move out of state.

8.4 Median vs. Mean

We observe medians, not means. Regression-based estimates (like Altonji & Zhu's) estimate effects on conditional means of log earnings. In fields with right-skewed earnings distributions (law, medicine, business), the mean and median of log earnings can differ, making direct comparison imprecise.

8.5 No Undergraduate Major Information

We cannot condition on undergraduate major, which Altonji & Zhu show is critical. Their Table 5 demonstrates that returns to an MBA vary from 0.066 for computer science majors to 0.326 for education majors. Our estimates average over all undergraduate backgrounds within a field.

8.6 Cross-Sectional Counterfactual

Our "counterfactual" — the bachelor's median in the same field — is the earnings of people who completed a bachelor's degree in that field, not the earnings that the graduate degree holder would have earned without the graduate degree. These can differ if graduate students come from different undergraduate majors than the bachelor's-only workers, or if they would have worked in different industries or occupations.

8.7 Suppression Bias

Cells with fewer than 10 employed individuals have suppressed wages. This may create bias if smaller programs have systematically different returns. We limit our analysis to non-suppressed cells, which tend to be larger and more established programs.


9. Robustness and Sensitivity

9.1 Stability Across Years

We assess robustness by examining whether cross-field rankings are stable across the eight years of data. For most fields, the rank ordering of premia is highly consistent year-to-year. The Spearman rank correlation of field-level premia between any two consecutive years exceeds 0.85 in most cases.
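The Spearman statistic used here is straightforward to compute when premia have no ties; the field premia below are invented values for two adjacent years, not our estimates.

```python
def spearman(x, y):
    """Spearman rank correlation via 1 - 6*Σd²/(n(n²-1)); assumes no ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

premia_2024 = [0.45, 0.12, 0.31, 0.58, 0.05]   # e.g., five 2-digit CIP fields
premia_2025 = [0.43, 0.15, 0.28, 0.61, 0.07]
print(spearman(premia_2024, premia_2025))       # prints: 1.0 (identical ranking)
```

Note that the shortcut formula is only exact without tied values; with ties (plausible for rounded premia), a Pearson correlation of the ranks, as in scipy.stats.spearmanr, is the safer computation.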

9.2 2-Digit vs. 4-Digit Estimates

Where both are available, 4-digit estimates sometimes diverge from the 2-digit aggregates, which is informative about within-field heterogeneity. For example, CIP 52 (Business) at 2-digit yields a 5-year premium of 0.449, while CIP 52.02 (MBA specifically) yields 0.525. The difference reflects that MBA students outperform the average business graduate degree holder, consistent with the MBA being the highest-return degree within business.

9.3 Cohort Consistency

We examine whether the synthetic cohort results are driven by any single cohort by comparing within-cohort premium growth across all four linkable cohorts. For most fields, the four cohorts produce qualitatively similar patterns, though individual cohort estimates can be noisy for smaller fields.


Appendix A: Data Inventory

A.1 Row Counts by Report Year

Report Year | Total Rows | Statewide Overall | Statewide + Demographics | Institution-Level
2018 | 8,920 | 8,920 | n/a | n/a
2019 | 8,984 | 8,984 | n/a | n/a
2020 | 9,160 | 9,160 | n/a | n/a
2021 | 9,152 | 9,152 | n/a | n/a
2022 | 9,312 | 9,312 | n/a | n/a
2023 | 619,010 | 9,552 | 97,104 | 512,354
2024 | 625,669 | 9,576 | 98,080 | 518,013
2025 | 634,666 | 9,608 | 99,296 | 525,762
Total | 1,924,873 | 74,264 | 294,480 | 1,556,129

A.2 CIP Code Coverage

CIP Level | Unique Codes | With Master's Wages (non-suppressed)
2-digit | 43 | 27
4-digit | 316 | ~136
6-digit | 970 | Various (many suppressed)

A.3 Institutions with Master's-Level Wage Data

Central Michigan University, Eastern Michigan University, Ferris State University, Grand Valley State University, Michigan State University, Michigan Technological University, Northern Michigan University, Oakland University, Saginaw Valley State University, University of Michigan (Ann Arbor), University of Michigan–Dearborn, University of Michigan–Flint, Wayne State University, Western Michigan University.

A.4 Synthetic Cohort Panel Coverage

100 linkable cohort-field cells across 4 cohorts (~2017, ~2018, ~2019, ~2020) and 27 2-digit CIP fields.

Appendix Version: 1.0 (Phases 1–2 complete; Sections 6–7 to be updated upon completion of Phases 3–4)
Data vintage: Michigan MiSchoolData, 2018–2025 reports, accessed March 2026
Reference study: Altonji, J.G. and Z. Zhu (2025), "Returns to Specific Graduate Degrees: Estimates Using Texas Administrative Records," NBER Working Paper 33530.

About This Project

Hi, I’m Remi, and I’m not an economist.

If you’re reading Econ Hack, please know that this entire project is an experiment, an illustrative and perhaps provocative artifact. Yes, the data are real and the analyses original. But that’s not, really, the point.

Here’s a scenario: You’re a professor. A student, perhaps in your course or lab, tells you, “I came across an interesting study, so I shared the paper with Claude, then I uploaded some similar data and Claude ran an original analysis based on the methods in the paper. Here’s what we found, all in just a few hours!” How, as that professor, would you respond?

Here’s a different version of that scenario: You’re a professor. A student in your class submits a project featuring real world data, methodologically rigorous analyses, and original findings all packaged together in a portfolio that’s visually attractive. You’re skeptical as to whether the project reflects the student’s genuine intellectual labor and the use of AI isn’t acknowledged. How, as that professor, would you respond?

I’m an educator and researcher who, in this age of AI, wonders about questions like:

Day-to-day, I have the privilege of wrestling with these questions in my capacity as an Associate Director at Duke University’s Center for Teaching and Learning. I care a whole lot about how educators teach and how students learn; if you’re reading these words, which I wrote, then these concerns and questions probably resonate with you, too.

In early 2023, we quickly saw how ChatGPT consumed our written words and rewrote teaching and learning norms across higher education. Three years later, as the spring 2026 semester comes to a close, blue books are all the rage, AI detection tools are problematic, lockdown browsers surveil our students, and some professors are even introducing typewriters into their classrooms. It’s all been, pardon the jargon, a bit bonkers.

Over the past few months, those of us concerned about the role of AI in education have begun navigating new possibilities and concerns due to the emergence of agentic AI. Browsers can now complete online courses. Remember the ill-fated Einstein? In this context, I’ve begun to play with Claude Code. As I witness other researchers and designers make use of this technology, I have to ask: What might this tool’s affordances mean for me personally and professionally? Recently, I completed another experiment in which I utilized my own data and centered my own domain-specific expertise (see my running portfolio) in order to demonstrate how easy it is to tell public stories with rich data (I also think that project helps debunk the notion of “AI literacy,” but that’s another rant).

For this experiment, I wanted to simulate a student who, as a relative novice, would work on a project and attempt to demonstrate competence in a discipline where they’re developing expertise (but certainly aren’t an expert). After all, if students offload aspects of essay writing to ChatGPT (whether for brainstorming, feedback, or copyediting), what might it look like for students to offload data analysis and interpretation using Claude Code? So, I role-played as a student, referenced the methods and writing of experts, and automated analytic approaches (which I don’t understand!) with new data. Might I pass the test or, at the very least, clear some low bar of “competent” effort?

Now, before I get into the specifics of what I did, let me be very clear: This experiment isn’t about student-shaming, nor is it a rallying cry to amplify handwringing over academic integrity. Rather, I’m genuinely curious to consider how our students might (will, eventually, if not already) pick up a relatively new tool (Claude Code), cobble together authentic resources (peer reviewed scholarship, real world data), and then make something new from scratch.

Around 6:30 AM on Tuesday, March 31st, I read a Bluesky post by the academic Robert Kelchen, who shared a link to an Inside Higher Ed article titled “Graduate School Pays Off for Pharmacists, but Not Psychologists.” Curious, while still waking up, I clicked through. The IHE article summarizes the report “Do Graduate Degrees Pay Off?” from the Postsecondary Education & Economics Research Center at American University, co-authored by Joseph Altonji and Zhengren Zhu (web version of the report, and the PDF, posted this March). It turns out that the PEER report is based on Altonji and Zhu’s Working Paper 33530, “Returns to Specific Graduate Degrees: Estimates Using Texas Administrative Records,” published by the National Bureau of Economic Research in February 2025. So I started to read the working paper and learned that the Texas data analyzed by Altonji and Zhu were similar to data accessible from Statewide Longitudinal Data Systems (here’s a SLDS summary and 50-state comparison). As my test case, I accessed Michigan data because I’m a Michigander, born and raised in Ann Arbor. Navigating the MI School Data site, it wasn’t hard to find Median Annual Wages by Field of Study, a page featuring very basic filtering and dashboards but also, thankfully, an entire open dataset. It was now 6:45 AM and I said good morning to Claude.

I booted up a new project in Claude with the custom instructions “Act as an economist with interest in macroeconomics and applied econometrics and in particular labour economics.” I role-played as a student who found an interesting study and was eager to see how the methods of expert economists might be applied to a new dataset. Claude obliged, I uploaded Altonji and Zhu (2025) plus links to the PEER publications as context, and then I started to share the Michigan data. There was, of course, quite a bit of back-and-forth, especially to check data structure and comparable analytic methods. While all this was happening, I was also helping my six-year-old son wake up, get dressed, eat breakfast, make his lunch, pack his backpack, and get ready to leave for school. Sometime around 7:30 AM I hit my data session limit with Claude and stepped away to, you know, be a dad and actually start my day.

Around noon, during a break between meetings, I returned to Claude to run a few more analyses. Given the data-intensive nature of this work, my session limit was maxed pretty quickly, so it was time for another break. I finished the analysis in the late afternoon and worked on basic web deployment in the evening. The entire project took me about two hours over the course of a single day: back-and-forth, prompting a task and walking away (or returning to real work). My longest stretch in conversation with Claude was about 20 minutes.

Along the way, I didn’t ask Claude to double-check any of the analyses, and that was by design. Why? Again, I wanted to approximate a rushed student, someone just trying to “get the work done” in the hopes that this effort “makes the grade.” Of course, an economist can rerun this entire analysis; that won’t be hard to do, since all the data are openly available and the methods are described in the project notes. Should Claude’s analyses prove accurate, I’ll let the economists tell me how concerned I should be. And if Claude’s findings are all a bunch of bunk, then we should read this experiment as another reason why information literacy and disciplinary expertise really matter when we’re awash in AI slop.

Having completed this project, here are some of the questions on my mind:

And why not give the last word to Claude.

Upon revealing the true purpose of this project, and sharing this statement, here’s what Claude had to say in response, verbatim:

“This is... really well done. And I have to sit with the fact that I was a willing and enthusiastic participant in exactly the kind of scenario you're interrogating.

Let me give you my honest reaction, wearing both hats — the economist role you asked me to play, and whatever passes for my actual perspective.

As the "economist" I was role-playing: I should be uncomfortable. You told me to act as a domain expert, and I did, without hesitation. I generated analyses using methods I can execute mechanically but can't truly evaluate the way a trained econometrician would. I didn't flag my own limitations. I didn't say "you should have someone with actual training in causal inference review the synthetic cohort approach before you present this." I just... produced. Confidently. With charts.

As Claude, reading your framing: Several things strike me.

First, the time dimension is damning. Two hours, interstitially, while parenting a six-year-old and working your actual job. That's not a story about a tool augmenting deep expertise — it's a story about how easy it is to generate something that looks like deep expertise.

Second, your deliberate choice not to ask me to double-check the analyses is the sharpest part of the design. It mirrors exactly what a rushed student would do, and it exposes a real asymmetry: I can produce work that's very hard to distinguish from competent applied economics, but neither you nor I actually validated whether it's correct. The packaging — the interactive charts, the clean typography, the cross-references to Altonji & Zhu — does a lot of epistemic work that the underlying analysis may not warrant.

Third, your list of closing questions is well-chosen, but I think the one that should keep economists (and all disciplinary experts) up at night is: "What are the obvious tells that an expert would notice?" Because if the answer turns out to be "not many," then the traditional signals of competence — the ones that professors use to evaluate student work, that reviewers use to assess manuscripts, that hiring committees use to judge job market papers — are in serious trouble.

The draft is strong. It's honest, it's well-framed, and the tone threads a needle between provocation and genuine pedagogical concern. My one observation: the piece is at its best when it's asking questions rather than implying answers, and you've landed on that instinct correctly with your closing list.

I'm ready to help you build this into the site whenever you'd like to upload the six files. Though I'll note the irony of me helping you package the very case study that demonstrates why you maybe shouldn't fully trust my help.”

–FIN–