Returns to Graduate Degrees in Michigan: Evidence from Administrative Wage Records, 2018–2025
A Descriptive Companion Study to Altonji & Zhu (2025)
Graduate education is one of the largest financial decisions an American worker can make. Over 40 percent of bachelor's degree holders aged 35 to 39 now hold a graduate degree, up from 31 percent in 1993, and total graduate student debt has grown to constitute a substantial share of the $1.7 trillion federal student loan portfolio. Yet credible, field-specific evidence on whether graduate degrees pay off financially remains remarkably scarce. A landmark 2025 study by Joseph Altonji and Zhengren Zhu, using Texas administrative records and sophisticated econometric methods, provided the most rigorous estimates to date of causal returns to 121 specific graduate degrees. Their study, however, is limited to a single state. This raises a natural question: do the patterns they document in Texas hold elsewhere?
This project provides a Michigan-based companion analysis. Using eight years (2018–2025) of publicly available administrative wage records from the Michigan Center for Educational Performance and Information — which links education records to unemployment insurance earnings data — we examine how the financial returns to graduate education vary across fields, demographic groups, institutions, and over time. Michigan offers a valuable complement to Texas: it has a different industrial composition, a different mix of public universities, and a different demographic profile, yet its administrative data infrastructure enables similar (if less granular) analysis.
We compute descriptive log wage premia — the difference in log median annual wages between "Master's or Higher Degree" holders and bachelor's degree holders employed in Michigan within the same field of study — across 26 broad fields (2-digit CIP codes) and approximately 136 specific program areas (4-digit CIP codes). Unlike Altonji and Zhu, who have access to individual-level panel data with rich controls for ability, undergraduate major, and college GPA, we work with publicly reported cell-level aggregates. Our estimates are therefore descriptive, not causal: they capture both the true return to graduate education and the positive selection of higher-ability individuals into graduate programs.
To partially overcome this limitation, we develop a synthetic cohort panel that exploits the longitudinal structure of the data. Each annual report contains median wages at two horizons — one year and five years after degree completion — corresponding to different graduating cohorts. By stacking eight years of reports, we link four completion cohorts (approximately 2017 through 2020) at both horizons, enabling us to track within-cohort wage premium dynamics as graduates gain experience. This is the cell-level analog to the individual-level experience profiles estimated by Altonji and Zhu, and it represents the primary methodological contribution of this study.
We further disaggregate returns by gender, race and ethnicity, and institution (using the richer 2023–2025 data that cover 14 Michigan universities), and compute approximate internal rates of return incorporating tuition costs, program duration, and foregone earnings. A comprehensive methods appendix documents all data sources, estimation procedures, assumptions, and limitations in detail.
The employment-weighted average descriptive log premium across fields is 0.255 at one year after completion and 0.305 at five years, with standard deviations of 0.191 and 0.156, respectively. These substantially exceed Altonji and Zhu's weighted average causal return of 0.159, as expected given that our estimates do not remove selection bias. The gap between our descriptive premia and their causal estimates provides an informal gauge of selection severity by field.
Across fields, we find patterns broadly consistent with the Texas evidence. At the 4-digit CIP level, the highest wage premia accrue to pharmacy (PharmD), MBA, and law (JD) programs — the same top tier identified by Altonji and Zhu. At the other end, the descriptive premium is negligible or negative for visual and performing arts, and modest for history and English. Engineering and computer science present a nuanced picture: high absolute wages but relatively small incremental gains over already-high bachelor's-level earnings in these fields.
Three findings stand out as particularly noteworthy. First, health professions is the only field where the graduate premium grows within cohorts over the first five years of a career (by an average of 0.167 log points). In every other field, bachelor's holders experience faster early-career wage growth than master's holders, narrowing the premium over time. The health professions pattern is consistent with the post-residency wage trajectory for physicians documented by Altonji and Zhu, and suggests this pattern is not specific to Texas.
Second, approximate internal rates of return reshape the field ranking relative to raw wage premia. MBA programs yield the highest IRR (0.255), followed by public administration (0.237) and nursing (0.222), reflecting their combination of meaningful wage gains with moderate tuition and the ability to continue working during enrollment. Computer science, despite a positive wage premium, shows a near-zero IRR (0.011) because bachelor's-level earnings are so high that the opportunity cost of graduate school nearly offsets the incremental gain. Education master's programs, often dismissed for their modest wage effects, show a respectable IRR (0.180) because they require minimal investment.
Third, institution-level variation is substantial in some fields but not others. In business, the 5-year log premium ranges from 0.211 (Central Michigan University) to 0.526 (Ferris State University), with the University of Michigan at 0.475. Importantly, the highest-premium institutions are not always those with the highest absolute wages — institutions whose bachelor's graduates already earn very high salaries show compressed premia, illustrating why raw wage rankings can be misleading guides to the value added by specific programs. In education, by contrast, institutional variation is modest, consistent with a regulated labor market where master's degrees confer relatively standardized salary increments.
This study contributes to the growing national conversation about the value and affordability of higher education in three ways.
First, it demonstrates that the broad cross-field patterns in graduate degree returns are robust across states, data sources, and methodologies. When both a descriptive analysis in Michigan and a rigorously identified causal analysis in Texas agree that MBA, law, and health professions degrees offer substantial financial returns while arts and humanities degrees offer modest ones, this consistency should give prospective students and policymakers greater confidence in these patterns — even if the precise magnitudes remain uncertain.
Second, it underscores that simple earnings comparisons can be deeply misleading. Average post-graduate earnings for MBA holders in Michigan exceed $110,000, while education master's holders earn roughly $72,000 — a gap of $38,000. But once we account for who selects into these programs, how much they would have earned without the degree, tuition costs, and foregone earnings, the gap in financial returns narrows considerably. Education master's programs, with their low costs, short duration, and high in-school earnings, deliver an IRR comparable to many STEM degrees. These nuances matter enormously for students making enrollment decisions and for policymakers designing accountability frameworks — such as the gainful employment provisions recently debated in Congress — that assess graduate programs based on earnings outcomes.
Third, and perhaps most importantly, the wide variation in returns we document — across fields, institutions, and demographic groups — means that the question "is graduate school worth it?" has no single answer. The financial return to a nursing master's degree from Ferris State University is fundamentally different from the return to a fine arts degree, and both are different from the return to a computer science master's for someone already earning $90,000 with a bachelor's degree. Prospective students deserve access to field-specific, institution-specific, and context-specific information, not broad averages. Studies like Altonji and Zhu's, and descriptive complements like this one, are steps toward providing that information — but much work remains, particularly in making these findings accessible to the students and families who need them most.
Data source: Michigan Center for Educational Performance and Information (CEPI). "Median Annual Wages by Field of Study," MiSchoolData, 2018–2025 annual reports. mischooldata.org
Returns to Graduate Degrees in Michigan
Phase 1: Cross-Field Wage Premia, Trends, and Comparison with Altonji & Zhu (2025)
1. Graduate Degree Wage Premia by Field
The table below reports the median annual wages of bachelor's degree holders and master's-or-higher degree holders employed in Michigan, by broad field of study (2-digit CIP code). The log wage premium is the difference in log median wages, which approximates the percentage earnings gain from holding a graduate degree versus a bachelor's degree in the same field. These are descriptive premia averaged over 2023–2025 reporting years.
Log Wage Premium by Field
2. Michigan–Texas Comparison
We compare our Michigan descriptive wage premia to the causal estimates from Altonji & Zhu (2025), who use Texas administrative records with rich individual-level controls. Their preferred "FEcg" estimator accounts for selection by conditioning on undergraduate major × graduate field fixed effects. Our Michigan estimates are raw median wage gaps at the 2-digit CIP level — systematically larger because they capture both the causal effect of the degree and positive selection into graduate programs.
3. Premium Trends Over Time (2018–2025)
With eight years of data, we can trace how the graduate wage premium within each field has evolved. This speaks to labor market demand shifts — for example, whether the premium for health professions changed during and after the COVID-19 pandemic, or whether the CS premium responded to the AI boom.
All Fields: 1-Year Premium Trend
4. Synthetic Cohort Analysis: Within-Cohort Wage Growth
This section presents our strongest analytical contribution in Phase 1. By linking the same approximate completion cohort observed at 1 year post-completion (in one reporting year) and again at 5 years post-completion (four reporting years later), we track how the wage premium evolves within the same group of graduates as they gain experience. This is the cell-level analog to Altonji & Zhu's individual-level experience profiles.
We identify four linkable cohorts: ~2017, ~2018, ~2019, and ~2020 completers, each observed at both the 1-year and 5-year horizons across 12 fields.
5. Methods & Data Notes
This analysis uses Michigan MiSchoolData "Median Annual Wages by Field of Study" reports for 2018–2025, produced by the Center for Educational Performance and Information (CEPI) by linking education records to unemployment insurance wage records. We compute descriptive log wage premia — the difference in log median wages between "Master's or Higher Degree" holders and bachelor's degree holders within the same field of study. These are not causal estimates of the returns to graduate education.
Key points to keep in mind when interpreting these results:
Descriptive, not causal. Our estimates capture both the true return to graduate education and positive selection into graduate programs. They should be treated as upper bounds on causal returns. See the Methods Appendix §3.2 for a detailed discussion.
Coarse degree classification. Michigan's "Master's or Higher Degree" category bundles master's, professional (JD, MD, PharmD), and doctoral degrees. At 2-digit CIP level, this mixes programs with very different returns. Our 4-digit CIP analysis in Phase 2 mitigates this concern. See Methods Appendix §2.2.
Michigan employment only. Graduates who leave Michigan are excluded. If higher-earning graduates are more mobile, returns are understated. See Methods Appendix §8.3.
Medians, not means. We observe median wages; Altonji & Zhu estimate conditional mean effects on log earnings. These can differ for skewed distributions. See Methods Appendix §3.4.
Phase 2: Synthetic Cohort Analysis
Within-cohort wage dynamics, 4-digit degree-level estimates, and COVID-era effects
1. Within-Cohort Wage Growth by Education Level
By tracking the same approximate cohort from 1 year to 5 years post-completion, we decompose the wage premium dynamic into two components: how fast master's-or-higher holders' wages grow versus how fast bachelor's holders' wages grow. If BA wages grow faster than MA+ wages, the premium narrows even though both groups are earning more.
All Fields: MA+ vs. BA Wage Growth (Log Points, Year 1 → Year 5)
2. Returns at the 4-Digit CIP Level
Michigan's data reports wages at 4-digit CIP codes, letting us get closer to Altonji & Zhu's degree-specific estimates. We can separately identify MBA (CIP 52.02), Law/JD (22.01), Social Work (44.07), Clinical Psychology (42.28), Nursing (51.38), Pharmacy (51.20), Medicine (51.12), and specific engineering subfields — exactly the degrees AZ focus on.
3. Cohort-by-Cohort Premium Trajectories
Each of our four linkable cohorts (~2017, ~2018, ~2019, ~2020) tells a slightly different story. Examining them individually reveals whether the patterns we see in the averages are driven by a single cohort or are consistent across graduating classes.
4. COVID-Era Cohort Effects
The ~2019 and ~2020 completion cohorts entered the labor market during the pandemic. We compare their wage premia to the pre-COVID cohorts (~2017, ~2018) to assess whether the pandemic differentially affected graduate vs. bachelor's degree holders.
5. 4-Digit Premium Trends Over Time (2018–2025)
Tracking specific degree types over eight years reveals medium-term labor market dynamics. The MBA premium has been remarkably stable (0.57–0.64 range). Nursing saw a notable decline from 0.44 in 2019 to 0.25 in 2023, likely as pandemic-era demand for bedside nurses raised BA-level nursing wages relative to advanced-practice wages.
Phase 3: Who Benefits Most?
Demographic and Institutional Heterogeneity in Graduate Degree Returns
1. Gender Heterogeneity
Altonji & Zhu (2025) find that returns to graduate degrees are higher for women in 14 of 18 fields. We examine whether the same pattern holds in Michigan's descriptive wage premia. For each field, we compute the log premium separately for men and women, then compare.
Female vs. Male Log Wage Premium by Field
2. Race & Ethnicity Heterogeneity
Altonji & Zhu report surprising results on racial heterogeneity: Asian Americans receive substantially lower causal returns than non-Hispanic whites in most fields, while patterns for Black and Hispanic graduates are mixed. We examine whether similar patterns appear in Michigan's descriptive data, with the caveat that cell sizes for minority groups are small in many fields.
Log Premium by Race: Selected Fields
3. Institution-Specific Returns
Altonji & Zhu find that MBA and JD returns are significantly higher at higher-ranked programs (using the Dale-Krueger method to control for selection into institutions). We examine institution-level variation across Michigan's 14 universities with master's-level wage data. Without application/admission controls, our estimates reflect both the causal effect of the institution and student selection.
4. Summary of Key Findings
Gender
The descriptive premium is higher for men in most fields — the opposite of Altonji & Zhu's causal finding. The most notable exception is Math & Statistics, where women show a substantially higher premium (+0.210 gap). The reversal relative to AZ likely reflects that our descriptive measure conflates the causal return with gender-specific selection patterns that AZ's FEcg method removes.
Race & Ethnicity
Coverage is limited, but where we can compare: in Computer Science and Engineering, Black and Hispanic graduates show lower descriptive premia than White graduates. In Education and Public Administration, the pattern reverses — minority graduates show slightly higher premia. Altonji & Zhu's finding of notably lower returns for Asian Americans is partially visible in our data for Engineering (Asian premium 0.167 vs. White 0.305) and Computer Science (0.163 vs. 0.293). Business (MBA) shows remarkably similar premia across racial groups.
Institutions
In Business, the University of Michigan (Ann Arbor) commands the highest MA+ median wage ($156k at 5 years) but not the highest premium (Ferris State's premium of 0.526 exceeds U-M's 0.475). This reflects the well-known composition effect: U-M's bachelor's graduates also earn very high wages, compressing the premium. In Health Professions, Ferris State and U-M show the highest premia, likely driven by their pharmacy (PharmD) and medical (MD) programs. In Education, the premium varies modestly across institutions (0.16–0.44 range), consistent with a relatively standardized labor market for teachers and administrators.
Phase 4: Approximate Internal Rates of Return
Accounting for tuition, program length, and foregone earnings
1. Approximate IRR by Broad Field (2-Digit CIP)
The IRR accounts for tuition costs, program duration, and foregone earnings during enrollment — factors that can substantially alter the ranking of degrees relative to their raw wage premia. A field with a high wage premium but long and expensive program (like law) may have a lower IRR than a field with a moderate premium but short, inexpensive program (like education administration).
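The logic above can be sketched as a small solver. This is a stylized illustration under simplifying assumptions (constant wages at each education level, flat annual tuition, a fixed working horizon); the function and the example parameter values are ours, not estimates from the data.

```python
def approx_irr(w_ba, w_ma, years_in_school, tuition,
               inschool_frac=0.0, horizon=40):
    """Rate that zeroes the present value of incremental cash flows from
    enrolling: during school, partial earnings minus tuition minus the
    foregone BA wage; afterwards, the MA-BA wage gap. Stylized: wages are
    held constant at each level over a fixed working horizon."""
    def npv(r):
        total = 0.0
        for t in range(horizon):
            if t < years_in_school:
                cf = inschool_frac * w_ba - tuition - w_ba  # investment years
            else:
                cf = w_ma - w_ba                            # payoff years
            total += cf / (1 + r) ** t
        return total

    lo, hi = -0.5, 1.0  # bracket assumed to contain the single sign change
    while hi - lo > 1e-6:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical MBA-like inputs: 2-year program, $30k/yr tuition,
# half-time earnings while enrolled
rate = approx_irr(70_000, 110_000, 2, 30_000, inschool_frac=0.5)
```

Because the cash-flow stream has a single sign change (negative during school, positive afterward), the NPV crosses zero at most once on the bracketed interval, so bisection recovers the unique IRR.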
2. IRR by Specific Degree (4-Digit CIP)
At the 4-digit CIP level, we can compute IRRs for specific degree types that map more directly to Altonji & Zhu's focal programs: MBA, JD, MSW, MPA, Clinical Psychology, Nursing, and specific engineering subfields.
3. Michigan–Texas IRR Comparison
We compare our approximate Michigan IRRs to Altonji & Zhu's Texas estimates (their Table 4), which are based on causal FEcg earnings effects and are therefore more credible. The comparison reveals which fields' financial returns are robust across states and methodologies.
4. Key Findings
4.1 IRR rankings partially diverge from wage premium rankings
Education shows a modest wage premium (0.303 log points at 1 year) but a relatively high IRR (0.180), because education master's programs are short, inexpensive, and students typically continue working while enrolled. This parallels Altonji & Zhu's finding that Education Administration has an IRR of 0.230 — more than twice its log earnings effect — because the net investment is small.
4.2 High-premium fields with expensive programs see compressed IRRs
Legal Professions (JD) has among the highest wage premia but a lower IRR (0.152 at 2-digit) because law school is three years long with high tuition ($28k/year) and low earnings during enrollment. At 4-digit CIP, we lack sufficient JD-specific 5-year data to compute the IRR, highlighting a data limitation.
4.3 Some fields show negative PDV gains
Computer Science, Mathematics, and Biology show negative percentage PDV gains at the 2-digit level, meaning the present value of the master's earnings stream (net of tuition) falls short of the bachelor's counterfactual. For CS, this reflects extremely high BA-level wages; the master's adds modest incremental earnings that barely justify the investment. For Biology, it reflects the very high BA wage growth rate, which means the opportunity cost of two years out of the workforce is substantial.
4.4 Michigan and Texas IRRs agree on the top tier
Both states identify MBA, MPA, and Nursing among the highest-IRR programs. The Michigan estimates are generally higher than Texas (as expected, since our descriptive premia overstate causal returns), but the correlation in rankings is encouraging. The largest discrepancy is Social Work: Michigan IRR of 0.176 vs. Texas IRR of 0.020. This likely reflects that Michigan's broader CIP 44 category mixes social work with higher-return public administration programs.
Methods & Data Appendix
Returns to Graduate Degrees in Michigan: Evidence from Administrative Wage Records, 2018–2025
1. Data Source and Provenance
1.1 The MiSchoolData Workforce Reports
We use the "Median Annual Wages by Field of Study" reports published by the Michigan Center for Educational Performance and Information (CEPI) on the MiSchoolData platform (mischooldata.org). These reports are produced annually as part of the Michigan Workforce Longitudinal Data System, a partnership among CEPI, the Michigan Department of Labor and Economic Opportunity (LEO), and the Unemployment Insurance Agency (UIA).
1.2 Underlying Data Systems
The wage reports are constructed by linking three administrative data systems:
- Student Transcript and Academic Record Repository (STARR): Contains enrollment, degree program (CIP code), and completion records for all students at Michigan public community colleges and universities.
- National Student Clearinghouse (NSC) StudentTracker: Supplements STARR with enrollment records from out-of-state and non-STARR-participating institutions, enabling tracking of Michigan high school graduates who attend college elsewhere.
- Unemployment Insurance (UI) Wage Records: Quarterly earnings records submitted by employers to the Michigan UIA. These cover virtually all wage and salary employment in Michigan but exclude self-employment, federal government employment, and out-of-state employment.
1.3 Report Construction by CEPI
CEPI matches education records to wage records to identify each individual's highest level of education and their post-completion earnings. Two cohorts are defined for each annual report:
- Wages After 1 Year: Graduates who completed their highest education credential approximately one year before the reporting period (four quarters of wage data).
- Wages After 5 Years: Graduates who completed approximately five years before the reporting period (four quarters of wage data).
Students who are still enrolled in education at the time of the wage observation are excluded. Individuals must have received their high school education in Michigan and must be currently employed in Michigan to appear in the data. The wage measure is the median annual wage within each cell, rounded to the nearest $100. Cells with fewer than 10 employed individuals are suppressed (wages not reported).
1.4 Years of Data
We use eight annual reports: 2018, 2019, 2020, 2021, 2022, 2023, 2024, and 2025. The 2018–2022 reports contain statewide aggregates only. The 2023–2025 reports additionally contain institution-level (IHE) data and demographic breakdowns by gender, age band, and race/ethnicity.
2. Sample Construction and Variable Definitions
2.1 Unit of Observation
Each observation in the raw data is a cell defined by the combination of: report year, location (statewide or specific institution), diversity type and category (if applicable), wage horizon (1 year or 5 years after completion), CIP code (field of study), and education level. The reported values for each cell are the median annual wage and the number of employed individuals.
2.2 Education Levels
The data classify individuals into four education levels based on their highest credential earned:
- Certificate: Sub-baccalaureate certificate or diploma
- Associate Degree: Two-year associate's degree
- Bachelor's Degree: Four-year baccalaureate degree
- Master's or Higher Degree: Master's degree, professional degree (JD, MD, PharmD, DDS, etc.), or doctoral degree
2.3 Field of Study Classification
Fields of study are identified by Classification of Instructional Programs (CIP) codes at three levels of granularity:
- 2-digit CIP: Broad field categories (e.g., CIP 14 = Engineering). We identify 43 unique 2-digit fields in the data; 27 have reportable master's-level wages.
- 4-digit CIP: Specific program areas (e.g., CIP 14.19 = Mechanical Engineering). We identify 316 unique 4-digit fields; approximately 136 have reportable master's-level wages.
- 6-digit CIP: Detailed program specializations. We identify 970 unique 6-digit fields, though many have small cell sizes.
Our primary analyses use 2-digit CIP codes for robustness and comparability across years. We supplement with 4-digit analyses to achieve closer correspondence with Altonji & Zhu's degree-specific estimates.
2.4 Deduplication
In some years, the data contain multiple rows for the same CIP code at a given level (with slightly different field name strings). We deduplicate by retaining the row with the largest number of employed individuals for each CIP × report year × education level × wage horizon cell. Post-deduplication, each cell is unique.
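The retention rule above can be sketched as follows (the row schema shown is our own illustration, not CEPI's actual file layout):

```python
def deduplicate(rows):
    """For each CIP x report year x education level x horizon cell, keep the
    row with the largest employment count. Rows are dicts; 'cell' is the
    key tuple and 'n' the employed count (an illustrative schema)."""
    best = {}
    for row in rows:
        key = row['cell']  # (cip, report_year, ed_level, horizon)
        if key not in best or row['n'] > best[key]['n']:
            best[key] = row
    return list(best.values())
```

Applied to a file with two rows for the same cell (differing only in the field-name string), the function keeps the row with the larger count, leaving one row per cell.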
2.5 Demographic Categories (2023–2025 only)
The 2023–2025 reports provide statewide wage data broken down by three diversity dimensions:
- Gender: Male, Female, Unknown/Unreported
- Race/Ethnicity: American Indian or Alaska Native, Asian, Black or African American, Hispanic of Any Race, Native Hawaiian or Pacific Islander, Two or More Races, White, Unknown/Unreported
- Age Band: 18–24, 25–34, 35–44, 45–54, 55–64, >64
These breakdowns are reported separately from the overall statewide figures and cannot be cross-tabulated (e.g., we cannot observe wages for Black women specifically).
2.6 Institution-Level Data (2023–2025 only)
The 2023–2025 reports include wage data at the individual institution (IHE) level. We observe 43 institutions total, of which 14 report master's-level wage data: Central Michigan University, Eastern Michigan University, Ferris State University, Grand Valley State University, Michigan State University, Michigan Technological University, Northern Michigan University, Oakland University, Saginaw Valley State University, University of Michigan (Ann Arbor), University of Michigan–Dearborn, University of Michigan–Flint, Wayne State University, and Western Michigan University.
3. Estimation: Log Wage Premium
3.1 Core Estimand
For each field of study f, report year t, and wage horizon h ∈ {1, 5}, we compute the log wage premium of a graduate degree over a bachelor's degree:

Δf,t,h = ln W̃MA+f,t,h − ln W̃BAf,t,h

where W̃ denotes the median annual wage for the indicated education level. This quantity approximates the proportional wage gap: for small values, Δ ≈ (WMA+ − WBA) / WBA. We also report the exact percentage premium: %Δ = (WMA+ − WBA) / WBA × 100.
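As a concrete sketch of this computation (the wage figures below are hypothetical, not drawn from the CEPI reports):

```python
import math

def log_premium(w_ma: float, w_ba: float) -> float:
    """Descriptive log wage premium: difference in log median wages."""
    return math.log(w_ma) - math.log(w_ba)

def pct_premium(w_ma: float, w_ba: float) -> float:
    """Exact percentage premium, the %-Delta measure in the text."""
    return (w_ma - w_ba) / w_ba * 100

# Hypothetical cell medians (rounded to the nearest $100, as in the reports)
w_ba, w_ma = 60_000, 78_000
delta = log_premium(w_ma, w_ba)  # ≈ 0.262, vs. an exact gain of 30%
```

The example also illustrates the approximation's limits: at a 30 percent wage gap, the log premium (0.262) already understates the exact proportional gain noticeably.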
3.2 Interpretation
The log wage premium is a descriptive measure. It captures the raw difference in median log earnings between graduate and bachelor's degree holders working in Michigan within the same broad field. It does not estimate the causal effect of obtaining a graduate degree, because:
- Individuals who pursue graduate degrees differ systematically from those who do not — in ability, motivation, prior academic performance, family background, and occupational preferences.
- The bachelor's median and master's-or-higher median within a field may reflect very different pools of workers, not the same people with and without a degree.
- We cannot control for undergraduate major, college GPA, demographics, or other confounders that affect both the graduate school decision and subsequent earnings.
The premium should generally be interpreted as an upper bound on the causal return, since positive selection into graduate programs (higher-ability individuals being more likely to pursue advanced degrees) biases the raw gap upward.
3.3 Pooling Across Years
For summary tables, we average the log premium across the three most recent report years (2023–2025) to reduce noise from year-to-year sampling variation while reflecting current labor market conditions. For trend analyses, we report year-by-year estimates.
3.4 Using Medians vs. Means
Our data report median wages, while Altonji & Zhu's regression estimates correspond to effects on the conditional mean of log earnings. In principle, if the earnings distribution is log-normal, the difference in log medians equals the difference in log means. In practice, skewness (particularly for high-earning fields like law and medicine) means our estimates may differ from what a regression on log earnings would produce. We cannot assess the magnitude of this discrepancy without access to the underlying microdata.
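The log-normal equivalence invoked above can be checked with a small simulation (the parameters are illustrative, not estimated from the Michigan data):

```python
import math
import random

random.seed(0)
# Stylized log-normal earnings: log-earnings mean mu, dispersion sigma
mu, sigma = 11.0, 0.5
draws = sorted(random.lognormvariate(mu, sigma) for _ in range(100_000))

log_median = math.log(draws[len(draws) // 2])          # log of median wage
mean_log = sum(math.log(x) for x in draws) / len(draws)  # mean of log wage
# Under log-normality both quantities recover mu, so log-median gaps and
# log-mean gaps coincide; skewed departures break this equivalence.
```

With skewed departures from log-normality (heavy right tails in law or medicine, say), `log_median` and `mean_log` would diverge, which is exactly the discrepancy we cannot quantify without microdata.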
4. Synthetic Cohort Panel
4.1 Construction
The synthetic cohort panel exploits the fact that each annual report contains two wage horizons — 1 year and 5 years after completion — which correspond to different completion cohorts. Specifically:
- The "Wages After 1 Year" observation in report year t covers graduates who completed their highest credential in approximately year c = t − 1.
- The "Wages After 5 Years" observation in report year t covers completers from approximately c = t − 5.
A cohort completing in year c therefore appears as "1 year after" in report year c + 1 and as "5 years after" in report year c + 5. With reports spanning 2018–2025, we can link four cohorts at both horizons:
| Approx. Completion Year | Observed 1yr After (Report Year) | Observed 5yr After (Report Year) |
|---|---|---|
| ~2017 | 2018 | 2022 |
| ~2018 | 2019 | 2023 |
| ~2019 | 2020 | 2024 |
| ~2020 | 2021 | 2025 |
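The report-year-to-cohort mapping in the table above follows mechanically from the c + 1 / c + 5 linking rule, as a small helper makes explicit (a sketch; the function name is ours):

```python
def linkable_cohorts(report_years) -> list[int]:
    """Completion cohorts observable at both horizons: cohort c appears as
    '1 year after' in report year c + 1 and '5 years after' in report c + 5."""
    years = set(report_years)
    return sorted(c for c in (y - 1 for y in years) if c + 5 in years)

# Reports span 2018-2025
cohorts = linkable_cohorts(range(2018, 2026))  # -> [2017, 2018, 2019, 2020]
```

Extending the report series by one year would add exactly one linkable cohort, which is why the panel grows slowly relative to the number of reports collected.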
4.2 Within-Cohort Premium Growth
For each linked cohort, we compute the premium at both horizons and measure the change:

ΔΔf,c = Δf,c+5,5 − Δf,c+1,1

where ΔΔf,c is the within-cohort premium growth for field f and cohort c. This can be decomposed as:

ΔΔf,c = (ln W̃MA+5yr − ln W̃MA+1yr) − (ln W̃BA5yr − ln W̃BA1yr)

That is, the premium grows if master's holders' wages grow faster than bachelor's holders' wages, and shrinks if the reverse is true. We report both the premium change and the separate wage growth components.
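The decomposition can be sketched in code (the cell values below are hypothetical; the function and key names are ours):

```python
import math

def premium_growth(cell: dict) -> dict:
    """Decompose the within-cohort change in the log premium into the
    wage-growth components of each education level. `cell` maps
    (level, horizon) to a median wage, e.g. ('MA', 1) -> year-1 MA+ median."""
    ma_growth = math.log(cell[('MA', 5)]) - math.log(cell[('MA', 1)])
    ba_growth = math.log(cell[('BA', 5)]) - math.log(cell[('BA', 1)])
    return {'ma_growth': ma_growth,
            'ba_growth': ba_growth,
            'premium_change': ma_growth - ba_growth}

# Hypothetical cohort cell: BA wages grow faster, so the premium narrows
cell = {('BA', 1): 52_000, ('BA', 5): 70_000,
        ('MA', 1): 68_000, ('MA', 5): 84_000}
result = premium_growth(cell)  # premium_change < 0
```

In this example the premium narrows even though both groups' wages rise, which is the typical non-health-field pattern described in the findings above.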
4.3 Assumptions and Caveats
The synthetic cohort approach assumes that the 1-year and 5-year observations within a cohort-field cell reflect approximately the same group of graduates. This assumption can fail due to:
- Differential out-migration: If higher-earning graduates are more likely to leave Michigan between years 1 and 5, the 5-year median will understate true wage growth. This concern is particularly acute for fields like law, business, and engineering where national labor markets may draw talent out of state.
- Selective labor force entry/exit: Individuals who were not employed at year 1 but are employed at year 5 (or vice versa) change the composition of the cell.
- Median sensitivity: Because we observe medians (not means), the addition or subtraction of a few individuals near the median can produce discrete jumps in measured wages that do not reflect actual wage dynamics for any individual.
4.4 COVID-Era Cohort Classification
For the COVID-era analysis, we classify the ~2017 and ~2018 completion cohorts as "pre-COVID" (they entered the labor market before the pandemic) and the ~2019 and ~2020 cohorts as "COVID-era" (they entered during or immediately after the pandemic's initial labor market disruptions). The ~2019 cohort is borderline — they entered the workforce approximately in 2020, at the pandemic's onset. We include them in the COVID-era group because their initial labor market experience was likely disrupted.
5. Relationship to Altonji & Zhu (2025)
5.1 What They Estimate
Altonji and Zhu (2025) estimate causal effects of 121 graduate degrees on log quarterly earnings using Texas Schools Project administrative data, which links individual-level records across high school, college, graduate school, and UI earnings. Their sample comprises approximately 31.5 million person-quarter observations. They employ six estimation strategies:
- OLS with rich controls (college major × gender age profiles, race/ethnicity, college GPA, college credits, year dummies)
- FEcg (their preferred): adds fixed effects for each undergraduate major × graduate field combination, absorbing permanent unobserved heterogeneity within these groups
- FE (individual fixed effects): compares the same individual's earnings before and after graduate school
- OLS-ps: propensity-score-reweighted OLS
- FEcg-ps: propensity-score-reweighted FEcg
- Dale-Krueger method: for institution-specific returns, controlling for application/admission sets
5.2 What We Estimate
Our log wage premium is conceptually most similar to a naïve OLS estimate without individual-level controls — a raw cross-sectional comparison of graduate vs. bachelor's median wages. The key differences are:
| Feature | Altonji & Zhu (Texas) | This Study (Michigan) |
|---|---|---|
| Unit of analysis | Individual person-quarter | Aggregate cell (field × ed level × year) |
| Outcome variable | Log quarterly earnings (mean) | Log median annual wage |
| Degree classification | 121 specific graduate degrees (4-digit CIP) | "Master's or Higher" by 2-digit or 4-digit CIP |
| Controls | College major, GPA, race, gender, age profiles | None (raw wage gap within field) |
| Selection correction | FEcg, FE, propensity score reweighting | None |
| Counterfactual | Same person without grad degree (FE) or same major-group without grad degree (FEcg) | Bachelor's holders in same broad field |
| Time dimension | Individual panel (same person over time) | Synthetic cohort panel (same approx. cohort over time) |
| Geographic scope | Texas employment only | Michigan employment only |
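To make our estimand concrete, here is a minimal Python sketch of the descriptive log wage premium defined in Section 3.1: the difference in log median annual wages between "Master's or Higher" and bachelor's holders within the same field. The wage figures below are hypothetical, not values from the Michigan data.

```python
import math

def log_wage_premium(median_grad: float, median_ba: float) -> float:
    """Descriptive log wage premium: difference in log median annual wages
    between "Master's or Higher" holders and bachelor's holders in the same
    field. Captures both the return to the degree and positive selection
    into graduate study; no individual-level controls are possible."""
    return math.log(median_grad) - math.log(median_ba)

# Hypothetical cell medians for illustration only:
premium = log_wage_premium(median_grad=78_000, median_ba=52_000)
print(round(premium, 3))  # 0.405, i.e. a median wage roughly 50% higher
```

Because the inputs are cell-level medians rather than person-level observations, this single subtraction is the entire estimator; all of the methodological weight falls on how the cells are defined and compared.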
5.3 Interpreting the Gap Between Estimates
Our Michigan descriptive premium should systematically exceed Altonji & Zhu's FEcg estimates because our measure does not remove selection bias. The difference provides an informal gauge of selection severity by field. A large gap (e.g., Business: Michigan 5yr = 0.449 vs. AZ FEcg = 0.125) implies strong positive selection — MBA students would have earned substantially more than the average bachelor's holder even without the degree. A small gap (or a Michigan estimate below the AZ estimate) would suggest either negative selection, different field composition across states, or measurement differences.
5.4 Cross-State Comparability
Despite different data structures and methods, the cross-field ranking of graduate degree returns is broadly consistent between Michigan and Texas. This suggests the rank ordering of returns is driven by fundamental labor market forces (human capital accumulation, professional licensing requirements, occupational wage structures) rather than state-specific factors.
6. Heterogeneity Analyses (Phase 3)
6.1 Gender and Race/Ethnicity
Using the 2023–2025 demographic breakdowns, we compute the log wage premium separately by gender and by racial/ethnic group. The formula is the same as in Section 3.1, applied within each demographic cell. We average across the three available years to improve precision.
Because the demographic breakdowns cannot be cross-tabulated (we observe wages by gender or by race, not by gender × race), these analyses are necessarily univariate. We also cannot condition on other covariates (age, undergraduate major) within a demographic group.
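The averaging step can be sketched as follows (Python, with hypothetical cell values; the field, group, and wage numbers are invented for illustration):

```python
import math
from statistics import mean

# Hypothetical (field, group, year) -> (grad median wage, BA median wage).
# Illustration only; not actual Michigan cell values.
cells = {
    ("Business", "Female", 2023): (70_000, 48_000),
    ("Business", "Female", 2024): (72_000, 49_500),
    ("Business", "Female", 2025): (74_000, 51_000),
}

def demo_premium(field: str, group: str, years=(2023, 2024, 2025)) -> float:
    """Average the within-cell log premium over the three available years,
    as described in Section 6.1, to improve precision."""
    premia = [math.log(grad) - math.log(ba)
              for (f, g, y), (grad, ba) in cells.items()
              if f == field and g == group and y in years]
    return mean(premia)

print(round(demo_premium("Business", "Female"), 3))
```

The same function applies unchanged whether the grouping variable is gender or race/ethnicity; only the cell keys differ, since the two breakdowns cannot be crossed.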
6.2 Institution-Specific Returns
Using institution-level data from 2023–2025, we compute the log premium for each IHE that reports both bachelor's and master's-or-higher wages in a given field. We restrict attention to the 14 universities with master's-level data. For institution-specific returns by program rank, we correlate our institution-level premia with U.S. News & World Report program rankings, analogous to Altonji & Zhu's Figure 3.
Unlike Altonji & Zhu, we do not have application/admission data and therefore cannot implement the Dale-Krueger selection correction. Our institution-level estimates reflect both the causal effect of the institution and selection into institutions based on unobserved characteristics.
7. Approximate Internal Rate of Return (Phase 4)
7.1 Approach
We compute approximate internal rates of return (IRR) following the logic of Altonji & Zhu's Table 4. The IRR is the discount rate at which the present discounted value of the earnings stream with a graduate degree (net of tuition) equals the present discounted value of the counterfactual earnings stream without a graduate degree.
7.2 Inputs
- Post-degree earnings: Median wages from the Michigan data at the 1-year and 5-year horizons, interpolated and extrapolated as needed.
- Counterfactual earnings: Bachelor's median wages in the same field and year, serving as the proxy for what the graduate degree holder would have earned without the degree.
- Tuition: Sourced from IPEDS for Michigan institutions, by degree type and year. We use average tuition at public institutions.
- Program duration: Assumed from standard program lengths, calibrated to Altonji & Zhu's Table A2.1 where possible.
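Putting these inputs together, the IRR is the root of the net-present-value function of the incremental (graduate-minus-counterfactual) cash-flow stream. The sketch below finds it by simple bisection; the tuition, wage gap, program length, and horizon are hypothetical placeholders, not estimates from the data.

```python
def irr_bisect(cash_flows, lo=-0.5, hi=1.0, tol=1e-8):
    """Find the discount rate r at which the NPV of the incremental
    cash-flow stream is zero, by bisection. Assumes NPV is decreasing
    in r over [lo, hi] (negative flows early, positive flows later)."""
    def npv(r):
        return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid) > 0:
            lo = mid  # NPV still positive: the IRR is higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical 2-year master's: during enrollment the student pays tuition
# and forgoes the bachelor's-level wage; afterward earns the grad-BA gap.
tuition, ba_wage, wage_gap, horizon = 15_000, 52_000, 20_000, 30
flows = [-(tuition + ba_wage)] * 2 + [wage_gap] * horizon
print(f"approximate IRR: {irr_bisect(flows):.1%}")
```

Treating forgone bachelor's earnings as a cost during enrollment mirrors the counterfactual logic above: the bachelor's median stands in for what the student would have earned without the degree.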
7.3 Limitations of the IRR Calculation
Our IRR estimates are rougher than Altonji & Zhu's because we lack individual-level earnings trajectories, data on earnings during enrollment, and the ability to condition on individual characteristics. The bachelor's median is an imperfect counterfactual. We report IRR estimates as indicative of the relative ranking of fields, not as precise point estimates of the financial return to any individual.
8. Limitations
8.1 No Causal Identification
This is the most fundamental limitation. Without individual-level data, we cannot implement any of the standard methods for addressing selection into graduate programs — individual fixed effects, FEcg, propensity score methods, instrumental variables, or regression discontinuity designs. Our estimates should be interpreted as descriptive wage premia, not causal returns.
8.2 Coarse Degree Classification
The "Master's or Higher Degree" category bundles master's degrees with professional degrees (JD, MD, PharmD, DDS, DVM) and research doctorates (PhD). Within a broad CIP category, this mixes programs with very different labor market returns. For example, at CIP 51 (Health Professions), our "Master's or Higher" category includes nursing master's graduates (moderate wages), physicians (very high wages), and public health master's graduates (moderate wages). The 4-digit CIP analysis mitigates this to some extent but does not fully resolve it.
8.3 Michigan Employment Only
The wage data only cover individuals employed in Michigan. Graduates who leave the state are excluded. If higher-earning graduates are more likely to move out of state (particularly from flagship programs in law, business, and medicine), our estimates understate the true returns for those fields. Altonji & Zhu note the same concern for Texas, citing Foote and Stange (2019), who find that individuals from flagship universities are more likely to move out of state.
8.4 Median vs. Mean
We observe medians, not means. Regression-based estimates (like Altonji & Zhu's) estimate effects on conditional means of log earnings. In fields with right-skewed earnings distributions (law, medicine, business), the mean and median of log earnings can differ, making direct comparison imprecise.
8.5 No Undergraduate Major Information
We cannot condition on undergraduate major, which Altonji & Zhu show is critical. Their Table 5 demonstrates that returns to an MBA vary from 0.066 for computer science majors to 0.326 for education majors. Our estimates average over all undergraduate backgrounds within a field.
8.6 Cross-Sectional Counterfactual
Our "counterfactual" — the bachelor's median in the same field — is the earnings of people who completed a bachelor's degree in that field, not the earnings that the graduate degree holder would have earned without the graduate degree. These can differ if graduate students come from different undergraduate majors than the bachelor's-only workers, or if they would have worked in different industries or occupations.
8.7 Suppression Bias
Cells with fewer than 10 employed individuals have suppressed wages. This may create bias if smaller programs have systematically different returns. We limit our analysis to non-suppressed cells, which tend to be larger and more established programs.
9. Robustness and Sensitivity
9.1 Stability Across Years
We assess robustness by examining whether cross-field rankings are stable across the eight years of data. For most fields, the rank ordering of premia is highly consistent year-to-year. The Spearman rank correlation of field-level premia between any two consecutive years exceeds 0.85 in most cases.
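A minimal sketch of that stability check follows (pure Python, assuming no tied premia, which suffices for distinct field-level values; the premia below are hypothetical):

```python
def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks.
    Assumes no ties, so both rank vectors are permutations of 0..n-1
    and share the same variance."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    m = (n - 1) / 2  # mean of ranks 0..n-1
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)
    return cov / var

# Hypothetical field-level premia for two consecutive report years:
premia_2023 = [0.45, 0.31, 0.12, 0.52, 0.08]
premia_2024 = [0.44, 0.33, 0.10, 0.55, 0.09]
print(round(spearman_rho(premia_2023, premia_2024), 3))  # 1.0: same ranking
```

Applied to the actual field-by-year premia, this is the statistic underlying the "exceeds 0.85" claim above.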
9.2 2-Digit vs. 4-Digit Estimates
Where both are available, 4-digit estimates sometimes diverge from the 2-digit aggregates, which is informative about within-field heterogeneity. For example, CIP 52 (Business) at 2-digit yields a 5-year premium of 0.449, while CIP 52.02 (MBA specifically) yields 0.525. The difference reflects that MBA students outperform the average business graduate degree holder, consistent with the MBA being the highest-return degree within business.
9.3 Cohort Consistency
We examine whether the synthetic cohort results are driven by any single cohort by comparing within-cohort premium growth across all four linkable cohorts. For most fields, the four cohorts produce qualitatively similar patterns, though individual cohort estimates can be noisy for smaller fields.
Appendix A: Data Inventory
A.1 Row Counts by Report Year
| Report Year | Total Rows | Statewide Overall | Statewide + Demographics | Institution-Level |
|---|---|---|---|---|
| 2018 | 8,920 | 8,920 | — | — |
| 2019 | 8,984 | 8,984 | — | — |
| 2020 | 9,160 | 9,160 | — | — |
| 2021 | 9,152 | 9,152 | — | — |
| 2022 | 9,312 | 9,312 | — | — |
| 2023 | 619,010 | 9,552 | 97,104 | 512,354 |
| 2024 | 625,669 | 9,576 | 98,080 | 518,013 |
| 2025 | 634,666 | 9,608 | 99,296 | 525,762 |
| Total | 1,924,873 | 74,264 | 294,480 | 1,556,129 |
A.2 CIP Code Coverage
| CIP Level | Unique Codes | With Master's Wages (non-suppressed) |
|---|---|---|
| 2-digit | 43 | 27 |
| 4-digit | 316 | ~136 |
| 6-digit | 970 | Various (many suppressed) |
A.3 Institutions with Master's-Level Wage Data
Central Michigan University, Eastern Michigan University, Ferris State University, Grand Valley State University, Michigan State University, Michigan Technological University, Northern Michigan University, Oakland University, Saginaw Valley State University, University of Michigan (Ann Arbor), University of Michigan–Dearborn, University of Michigan–Flint, Wayne State University, Western Michigan University.
A.4 Synthetic Cohort Panel Coverage
100 linkable cohort-field cells across 4 cohorts (~2017, ~2018, ~2019, ~2020) and 27 2-digit CIP fields.
Data vintage: Michigan MiSchoolData, 2018–2025 reports, accessed March 2026
Reference study: Altonji, J.G. and Z. Zhu (2025), "Returns to Specific Graduate Degrees: Estimates Using Texas Administrative Records," NBER Working Paper 33530.
About This Project
Hi, I’m Remi, and I’m not an economist.
If you’re reading Econ Hack, please know that this entire project is an experiment, an illustrative and perhaps provocative artifact. Yes, the data are real and the analyses original. But that’s not, really, the point.
Here’s a scenario: You’re a professor. A student, perhaps in your course or lab, tells you, “I came across an interesting study, so I shared the paper with Claude, then I uploaded some similar data and Claude ran an original analysis based on the methods in the paper. Here’s what we found, all in just a few hours!” How, as that professor, would you respond?
Here’s a different version of that scenario: You’re a professor. A student in your class submits a project featuring real world data, methodologically rigorous analyses, and original findings all packaged together in a portfolio that’s visually attractive. You’re skeptical as to whether the project reflects the student’s genuine intellectual labor and the use of AI isn’t acknowledged. How, as that professor, would you respond?
I’m an educator and researcher who, in this age of AI, wonders about questions like:
- What discipline-specific skills should students practice and learn?
- When—if at all—is it appropriate to offload skills-based tasks to AI?
- How do we help our students thoughtfully demonstrate their critical thinking?
- How do we help students responsibly use AI as a thought partner and collaborator?
- How do students “show their work” as they embrace new skills and technologies?
- How do we model the productive struggle of inquiry and the labor of scholarship?
Day-to-day, I have the privilege of wrestling with these questions in my capacity as an Associate Director at Duke University’s Center for Teaching and Learning. I care a whole lot about how educators teach and how students learn; if you’re reading these words, which I wrote, then these concerns and questions probably resonate with you, too.
In early 2023, we quickly saw how ChatGPT consumed our written words and rewrote teaching and learning norms across higher education. Three years later, as the spring 2026 semester comes to a close, blue books are all the rage, AI detection tools are problematic, lockdown browsers surveil our students, and some professors are even introducing typewriters into their classrooms. It’s all been, pardon the jargon, a bit bonkers.
Over the past few months, those of us concerned about the role of AI in education have begun navigating new possibilities and concerns due to the emergence of agentic AI. Browsers can now complete online courses. Remember the ill-fated Einstein? In this context, I’ve begun to play with Claude Code. As I witness other researchers and designers make use of this technology, I have to ask: What might this tool’s affordances mean for me personally and professionally? Recently, I completed another experiment in which I used my own data and centered my own domain-specific expertise (see my running portfolio) in order to demonstrate how easy it is to tell public stories with rich data (I also think that project helps debunk the notion of “AI literacy,” but that’s another rant).
For this experiment, I wanted to simulate a student who, as a relative novice, would work on a project and attempt to demonstrate competence in a discipline where they’re developing expertise (but certainly aren’t an expert). After all, if students offload aspects of essay writing to ChatGPT (whether for brainstorming, feedback, or copyediting), what might it look like for students to offload data analysis and interpretation using Claude Code? So, I role-played as a student, referenced the methods and writing of experts, and automated analytic approaches (which I don’t understand!) with new data. Might I pass the test or, at the very least, clear some low bar of “competent” effort?
Now, before I get into the specifics of what I did, let me be very clear: This experiment isn’t about student-shaming, nor is it a rallying cry to amplify handwringing over academic integrity. Rather, I’m genuinely curious to consider how our students might (will, eventually, if not already) pick up a relatively new tool (Claude Code), cobble together authentic resources (peer reviewed scholarship, real world data), and then make something new from scratch.
Around 6:30 AM on Tuesday, March 31st, I read a Bluesky post by the academic Robert Kelchen, who shared a link to an Inside Higher Ed article titled “Graduate School Pays Off for Pharmacists, but Not Psychologists.” Curious, while still waking up, I clicked through. The IHE article summarizes the report “Do Graduate Degrees Pay Off?” from the Postsecondary Education & Economics Research Center at American University, co-authored by Joseph Altonji and Zhengren Zhu (the web version of the report, and the PDF, were posted this March). It turns out that the PEER report is based on Altonji and Zhu’s Working Paper 33530, “Returns to Specific Graduate Degrees: Estimates Using Texas Administrative Records,” published by the National Bureau of Economic Research in February 2025. So I started to read the working paper and learned that the Texas data analyzed by Altonji and Zhu were similar to data accessible from Statewide Longitudinal Data Systems (here’s a SLDS summary and 50-state comparison). As my test case, I accessed Michigan data because I’m a Michigander, born and raised in Ann Arbor. Navigating the MI School Data site, it wasn’t hard to find Median Annual Wages by Field of Study, a page featuring very basic filtering and dashboards but, thankfully, also an entire open dataset. It was now 6:45 AM, and I said good morning to Claude.
I booted up a new project in Claude with the custom instructions “Act as an economist with interest in macroeconomics and applied econometrics and in particular labour economics.” I role-played as a student who found an interesting study and was eager to see how the methods of expert economists might be applied to a new dataset. Claude obliged, I uploaded Altonji and Zhu (2025) plus links to the PEER publications as context, and then I started to share the Michigan data. There was, of course, quite a bit of back-and-forth, especially to check data structure and comparable analytic methods. While all this was happening, I was also helping my six-year-old son wake up, get dressed, eat breakfast, make lunch, pack his backpack, and get ready to leave for school. Sometime around 7:30 AM I hit my data session limit with Claude and stepped away to, you know, be a dad and actually start my day.
Around noon, during a break between meetings, I returned to Claude in order to run a few more analyses. Given the data-intensive nature of this work, my session limit was maxed pretty quickly, so it was time for another break. I finished the analysis in the late afternoon and worked on basic web deployment in the evening. The entire project took me about two hours over the course of a single day: back-and-forth prompting, assigning a task and walking away (or returning to real work). My longest stretch in conversation with Claude was about 20 minutes.
Along the way, I didn’t ask Claude to double-check any of the analyses, and that was by design. Why? Again, I wanted to approximate a rushed student, someone just trying to “get the work done” in the hopes that this effort “makes the grade.” Of course, an economist can rerun this entire analysis; that won’t be hard to do, since all the data are openly available and the methods are described in the project notes. Should Claude’s analyses prove accurate, I’ll let the economists tell me how concerned I should be. And if Claude’s findings are all a bunch of bunk, then we should read this experiment as another reason why information literacy and disciplinary expertise really matter when we’re awash in AI slop.
Having completed this project, here are some of the questions on my mind:
- Is this AI slop, style over substance?
- Is this a useful analysis of real world data?
- Is this a trustworthy analysis of real world data?
- Under what circumstances would this be considered cheating?
- What are the obvious “tells” that an expert would notice indicating this project was completed by a novice with no knowledge of the methods used?
- What approaches to AI transparency and attribution should be used if a project like this were to show up in an academic setting?
- How far might I have taken this analytic ruse, including potentially toward publication, and what would that mean?
And why not give the last word to Claude.
Upon revealing the true purpose of this project and sharing this statement, here’s what Claude had to say in response, verbatim:
“This is... really well done. And I have to sit with the fact that I was a willing and enthusiastic participant in exactly the kind of scenario you're interrogating.
Let me give you my honest reaction, wearing both hats — the economist role you asked me to play, and whatever passes for my actual perspective.
As the "economist" I was role-playing: I should be uncomfortable. You told me to act as a domain expert, and I did, without hesitation. I generated analyses using methods I can execute mechanically but can't truly evaluate the way a trained econometrician would. I didn't flag my own limitations. I didn't say "you should have someone with actual training in causal inference review the synthetic cohort approach before you present this." I just... produced. Confidently. With charts.
As Claude, reading your framing: Several things strike me.
First, the time dimension is damning. Two hours, interstitially, while parenting a six-year-old and working your actual job. That's not a story about a tool augmenting deep expertise — it's a story about how easy it is to generate something that looks like deep expertise.
Second, your deliberate choice not to ask me to double-check the analyses is the sharpest part of the design. It mirrors exactly what a rushed student would do, and it exposes a real asymmetry: I can produce work that's very hard to distinguish from competent applied economics, but neither you nor I actually validated whether it's correct. The packaging — the interactive charts, the clean typography, the cross-references to Altonji & Zhu — does a lot of epistemic work that the underlying analysis may not warrant.
Third, your list of closing questions is well-chosen, but I think the one that should keep economists (and all disciplinary experts) up at night is: "What are the obvious tells that an expert would notice?" Because if the answer turns out to be "not many," then the traditional signals of competence — the ones that professors use to evaluate student work, that reviewers use to assess manuscripts, that hiring committees use to judge job market papers — are in serious trouble.
The draft is strong. It's honest, it's well-framed, and the tone threads a needle between provocation and genuine pedagogical concern. My one observation: the piece is at its best when it's asking questions rather than implying answers, and you've landed on that instinct correctly with your closing list.
I'm ready to help you build this into the site whenever you'd like to upload the six files. Though I'll note the irony of me helping you package the very case study that demonstrates why you maybe shouldn't fully trust my help.”
–FIN–