Methodology and data notes

Overview

The Global Education Futures Readiness Index (GEFRI) benchmarks countries’ education futures readiness using globally comparable, openly available data. This methodology page documents how we select, process, normalize, and aggregate indicators, manage missing values, and assign confidence ratings to ensure a transparent, reproducible composite index.

Data sources

Indicator selection

GEFRI groups indicators into five dimensions. Each indicator was chosen for global coverage, policy relevance, and a clear connection to education futures. A population minimum is applied to certain per capita metrics to prevent microstates from distorting results.

| Dimension | Indicator Code | Indicator Description |
|---|---|---|
| Infrastructure | EG.ELC.ACCS.ZS | Access to electricity (% of population) |
| Infrastructure | IT.NET.USER.ZS | Internet users (% of population) |
| Infrastructure | IT.NET.SECR.P6 | Secure internet servers (per 1 million people) |
| Infrastructure | IT.CEL.SETS.P2 | Mobile cellular subscriptions (per 100 people) |
| Human Capital | SE.XPD.TOTL.GD.ZS | Government expenditure on education (% of GDP) |
| Human Capital | SE.ADT.LITR.ZS | Adult literacy rate (% age 15+) |
| Human Capital | SE.SEC.ENRR | School enrollment, secondary (% gross) |
| Human Capital | SE.TER.ENRR | School enrollment, tertiary (% gross) |
| School Access and Gender Parity | SE.ENR.SECO.FM.ZS | Secondary GPI (gross enrollment ratio, female/male) |
| School Access and Gender Parity | SE.ENR.TERT.FM.ZS | Tertiary GPI (gross enrollment ratio, female/male) |
| School Access and Gender Parity | SE.PRM.UNER.ZS | Children out of school, primary (% of primary school age) |
| School Access and Gender Parity | SE.SEC.UNER.LO.ZS | Adolescents out of school, secondary (% of lower secondary school age) |
| School Access and Gender Parity | SE.SEC.CMPT.LO.FE.ZS | Lower secondary completion rate, female (% of relevant age group) |
| School Access and Gender Parity | SE.SEC.CMPT.LO.MA.ZS | Lower secondary completion rate, male (% of relevant age group) |
| Innovation | GB.XPD.RSDV.GD.ZS | R&D expenditure (% of GDP) |
| Innovation | SP.POP.SCIE.RD.P6 | Researchers in R&D (per million people) |
| Innovation | IP.JRN.ARTC.SC (pop ≥ 1 million) | Scientific and technical journal articles per million population (countries with ≥1 million people) |
| Innovation | TX.VAL.TECH.CD | High-tech exports (current US$) |
| Governance | GE.EST | Government Effectiveness (WGI) |
| Governance | RQ.EST | Regulatory Quality (WGI) |
| Governance | CC.EST | Control of Corruption (WGI) |
| Governance | VA.EST | Voice and Accountability (WGI) |

All indicators are sourced from the World Bank Open Data API, with further attribution available in indicator metadata. Some indicators may be originally reported by UNESCO or other agencies.

Data imputation and confidence

GEFRI aims to maximize data comparability while remaining transparent about imputed values, that is, values estimated or filled in using averages from comparable countries. When indicator data are missing for a country, values are imputed using a structured, four-stage process:

  1. Region + Income Group Average: If both region and income group are known and there is sufficient data, the average for countries matching both is used.
  2. Regional Average: If no value is available for the combined group, the average for all countries in the same region is used.
  3. Income Group Average: If regional data is missing, the average for the income group is used.
  4. Global Average: If none of the above are available, the global average is imputed.

Special case: For Adult literacy rate in high-income countries with missing data, a value of 100% is imputed, with the imputation source recorded as Assumed (high income). This assumption reflects international norms and is applied unless other evidence is available.

All imputed values are clearly flagged, and the specific imputation method is recorded for each data point. For every country and indicator, the source (“Original” or “Imputed: [method]”) is displayed in the data tables. Microstates (countries with populations of less than 300,000) are not included in imputation calculations.
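
The fallback order described above can be sketched in a few lines of Python. This is a minimal illustration, not GEFRI's actual code: the row layout (`name`, `region`, `income`, `population`), the function name `impute`, and the choice to treat a single donor value as "sufficient data" are all assumptions made for the example.

```python
from statistics import mean

MICROSTATE_POP = 300_000  # microstates are left out of imputation averages

def impute(country, indicator, rows):
    """Return (value, source) for one country and indicator.

    rows is a hypothetical list of dicts with keys name, region, income,
    population, plus one key per indicator code (None when missing).
    """
    target = next(r for r in rows if r["name"] == country)
    if target.get(indicator) is not None:
        return target[indicator], "Original"

    # Special case from the text: adult literacy in high-income countries.
    if indicator == "SE.ADT.LITR.ZS" and target["income"] == "High income":
        return 100.0, "Imputed: Assumed (high income)"

    # Donor pool: reported values only, microstates excluded.
    donors = [r for r in rows
              if r.get(indicator) is not None
              and r["population"] >= MICROSTATE_POP]

    stages = [
        ("Region + income group average",
         lambda r: r["region"] == target["region"]
         and r["income"] == target["income"]),
        ("Regional average", lambda r: r["region"] == target["region"]),
        ("Income group average", lambda r: r["income"] == target["income"]),
        ("Global average", lambda r: True),
    ]
    for label, matches in stages:
        values = [r[indicator] for r in donors if matches(r)]
        if values:  # assumption: one donor value counts as sufficient data
            return mean(values), f"Imputed: {label}"
    return None, "Missing"
```

Returning the source label together with the value mirrors the "Original" or "Imputed: [method]" flag shown in the data tables.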

For each GEFRI dimension (Infrastructure, Human Capital, Innovation, and so on), a confidence rating is assigned based on how many underlying indicators were imputed:

  • High confidence: No imputed data — all indicators for the dimension are based on directly reported values.
  • Moderate confidence: Imputed indicators are present, but less than half of the dimension’s indicators are imputed.
  • Low confidence: Half or more of the dimension’s indicators are imputed.

The full list of imputed indicators, their imputation methods, and confidence ratings is displayed for each country profile. Users should interpret results with extra caution when confidence is low.

Special note: In rare cases where a specific indicator is unavailable for small-population countries (e.g., scientific articles per million for populations under 1 million), the relevant dimension is automatically rated as “Low confidence.”
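
As a rough sketch, the three confidence levels and the small-population special note can be written as a single function. The function name and its arguments are hypothetical. It follows the half-or-more rule listed above and does not model the 70% threshold mentioned under normalization and scoring.

```python
def dimension_confidence(imputed_flags, ineligible=False):
    """Rate one dimension as "High", "Moderate", or "Low".

    imputed_flags: one boolean per indicator, True when the value is imputed.
    ineligible: True when a required indicator is unavailable because of
    the population threshold (the special note in the text).
    """
    if ineligible:
        return "Low"
    n_imputed = sum(imputed_flags)
    if n_imputed == 0:
        return "High"
    if 2 * n_imputed < len(imputed_flags):  # less than half imputed
        return "Moderate"
    return "Low"  # half or more imputed
```

For a four-indicator dimension, one imputed value gives "Moderate" and two or more give "Low".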

Normalization and scoring

  • Min-max normalization: Most indicators are normalized to a 0–1 scale (and presented on a 0–100 scale) using min-max scaling. For each indicator, values are rescaled using the minimum and maximum among all available countries.
  • Log transformation: Indicators with highly skewed distributions—such as Secure Internet Servers, Researchers in R&D, and Journal Articles per million—are log-transformed (using log1p) before normalization. High-tech exports are also log-transformed before inclusion.
  • Population-based eligibility: For Innovation, the scientific journal articles per million indicator is only included for countries with populations of at least 1 million. Countries below this threshold receive a “Low” confidence rating for Innovation.
  • Dimension scoring:
    • For Infrastructure, Human Capital, Innovation, and Governance, scores are calculated as the arithmetic mean (unweighted average) of all normalized indicators within each dimension.
    • For School Access and Gender Parity, the score is the minimum of linearly transformed indicator scores, with the adjustments described below.
  • Confidence and penalties: If fewer than 70% of indicators in a dimension are original (not imputed), or if a country is ineligible for a required indicator, that dimension receives “Low” confidence and its score is penalized by 30% in the composite (multiplied by 0.7).
  • Composite GEFRI Score: The overall score is the mean of the (potentially penalized) dimension scores, rescaled to 0–100.
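
The normalization and aggregation steps above can be illustrated with a short Python sketch. The function names, the `LOG_INDICATORS` set, and the handling of an indicator whose values are all equal are assumptions for illustration. The sketch also assumes every dimension score is expressed on a 0–1 scale before the composite is taken.

```python
import math

# Indicators the text says are log-transformed before normalization.
LOG_INDICATORS = {"IT.NET.SECR.P6", "SP.POP.SCIE.RD.P6",
                  "IP.JRN.ARTC.SC", "TX.VAL.TECH.CD"}

def normalize(code, values):
    """Min-max scale {country: value} to 0-1, applying log1p first if skewed."""
    if code in LOG_INDICATORS:
        values = {c: math.log1p(v) for c, v in values.items()}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Assumption: a constant indicator carries no signal, so score 0.
        return {c: 0.0 for c in values}
    return {c: (v - lo) / (hi - lo) for c, v in values.items()}

def dimension_score(normalized_values):
    """Unweighted mean of one country's normalized indicators."""
    return sum(normalized_values) / len(normalized_values)

def composite(dimensions):
    """dimensions: list of (score on 0-1 scale, confidence label) pairs.

    Low-confidence dimensions are multiplied by 0.7 before averaging,
    and the mean is rescaled to 0-100.
    """
    adjusted = [s * 0.7 if conf == "Low" else s for s, conf in dimensions]
    return 100 * sum(adjusted) / len(adjusted)
```

Using `log1p` rather than a plain logarithm keeps zero values (for example, no secure servers reported) defined, since `log1p(0)` is 0.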

School Access and Gender Parity – details:

  • Calculation method: This dimension is scored using a minimum/threshold (“veto”) approach, ensuring that a low score on any key indicator will directly lower the final score. For each country, four main indicators are used:
    • Children out of school, primary (% of primary school age)
    • Adolescents out of school, secondary (% of lower secondary school age)
    • Lower secondary completion rate, female (% of relevant age group)
    • Secondary GPI (gross enrollment ratio, female/male)
    Each indicator is transformed to a 10–100 scale using a linear function:
    • Out-of-school rates (OOS, lower is better): Score declines linearly from 100 (0% out-of-school) to 10 (30% or more out-of-school).
    • Lower secondary completion (higher is better): Score increases linearly from 10 (65% or lower) to 100 (100% completion).
    • GPI (Gender Parity Index): Score declines linearly from 100 at perfect parity (1.0) to 10 at 0.3 above or below parity (GPI = 0.7 or 1.3). Both boys’ and girls’ exclusion are penalized equally.
    The final School Access and Gender Parity score for each country is the minimum of these four scores, ensuring that no single form of exclusion or inequity is masked by strong performance elsewhere.
    Plausibility adjustment: In rare cases where a high-income, non-conflict country would receive an unusually low score due to a single indicator—while all other indicators are very high (≥80)—GEFRI uses the second-lowest score instead. This adjustment helps prevent data artifacts or definitional issues (such as those affecting vocational tracks) from unfairly lowering a country’s score.
  • Imputation penalty: 5 points are subtracted for each imputed (estimated) indicator used in the calculation, up to a maximum of 25 points.
  • Conflict/fragility cap: For countries classified as fragile or conflict-affected, the score is capped at 40 to reflect severe structural barriers.
  • Transparency and limitations: All scoring formulas are published. Edge cases (such as low completion in countries with strong vocational tracks) are flagged internally for review and user transparency. GEFRI does not manually override scores. All adjustments are made transparently using automated, documented rules.
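
The scoring rules for this dimension can be sketched as follows. The linear transforms follow the breakpoints given above. Everything else is an assumption made for illustration: the function names, the reading of "unusually low" as a lowest score below 80, and the order in which the plausibility adjustment, imputation penalty, and fragility cap are applied.

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def oos_score(pct):
    """Out-of-school rate: 100 at 0%, falling linearly to 10 at 30% or more."""
    return 100 - 90 * clamp(pct, 0, 30) / 30

def completion_score(pct):
    """Completion rate: 10 at 65% or lower, rising linearly to 100 at 100%."""
    return 10 + 90 * (clamp(pct, 65, 100) - 65) / 35

def gpi_score(gpi):
    """GPI: 100 at parity (1.0), falling linearly to 10 at 0.3 from parity."""
    return 100 - 90 * clamp(abs(gpi - 1.0), 0, 0.3) / 0.3

def access_score(oos_primary, oos_lower_sec, completion_female, gpi,
                 n_imputed=0, high_income=False, fragile=False):
    scores = sorted([oos_score(oos_primary), oos_score(oos_lower_sec),
                     completion_score(completion_female), gpi_score(gpi)])
    score = scores[0]  # "veto": the weakest indicator sets the score
    # Plausibility adjustment (assumed threshold): one outlier below 80
    # while the other three scores are all at least 80.
    if high_income and not fragile and scores[0] < 80 <= scores[1]:
        score = scores[1]
    score -= min(5 * n_imputed, 25)  # imputation penalty, capped at 25
    if fragile:
        score = min(score, 40)  # conflict/fragility cap
    return score
```

For example, `access_score(3, 6, 90, 1.0)` is driven by the completion score (about 74.3), while the same inputs with `high_income=True` fall back to the second-lowest score of 82.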

Summary table of dimension scoring:

| Dimension | Calculation Method | Weighting |
|---|---|---|
| Infrastructure | Mean of normalized indicators | Equal/unweighted |
| Human Capital | Mean of normalized indicators | Equal/unweighted |
| Innovation | Mean of normalized indicators | Equal/unweighted |
| Governance | Mean of normalized indicators | Equal/unweighted |
| School Access and Gender Parity | Minimum of linearly transformed indicator scores (with plausibility adjustment for high-income, non-conflict countries), minus imputation penalty; capped for conflict/fragility | Not weighted; lowest (or plausibly second-lowest) indicator determines score |

Data limitations and use notes

GEFRI is based on internationally reported data and transparent processing, but several limitations remain. All indicators are pulled using the most recent available value, capped at a maximum of 7 years from the present to avoid the use of outdated or unrepresentative figures. Imputation introduces uncertainty, especially where many indicators are missing or are based on older data. Some indicators may still lag behind real-world changes, and reporting standards can vary between countries. The population filter on innovation metrics helps prevent distortion from microstates, though small-sample issues may still occur.
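
The 7-year recency rule can be illustrated with a small helper. This is a sketch under stated assumptions: the function name and the `{year: value}` input shape are hypothetical, and treating a value exactly 7 years old as still eligible is one possible reading of the rule.

```python
from datetime import date

MAX_AGE_YEARS = 7  # recency window stated in the text

def latest_within_window(series, current_year=None):
    """Return (year, value) for the most recent reported value, or None.

    series: {year: value or None}. Values older than MAX_AGE_YEARS are
    ignored, so the indicator is treated as missing and left to imputation.
    """
    if current_year is None:
        current_year = date.today().year
    for year in sorted(series, reverse=True):
        age = current_year - year
        if series[year] is not None and 0 <= age <= MAX_AGE_YEARS:
            return year, series[year]
    return None
```

With `current_year=2025`, a series whose last reported value is from 2019 returns that value, while one last reported in 2015 returns `None`.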

The index should not be used for simple league tables or to draw definitive conclusions about individual countries, especially where confidence is low. GEFRI is best used as a starting point for inquiry and policy dialogue, supported by local expertise and contextual understanding.

  • Scores reflect the latest available data published by the reporting agency (within a 7-year window), not current-year events or rapid recent changes.
  • High shares of imputed data will reduce confidence in those scores.

For feedback or to report issues, contact us.

Changelog and updates

  • 2025-05-27: [1] Microstates are no longer ranked and are no longer included in computations for imputed scores. This results in a new score set. [2] GEFRI applies a plausibility adjustment to the School Access and Gender Parity dimension for high-income, non-conflict countries. When three of four indicators are very high, but a single value is anomalously low, the score is based on the second-lowest indicator to reduce the risk of data artifacts distorting results (e.g., due to vocational tracking of students in secondary education that may not be reflected in reported data).
  • 2025-05-16: Initial public beta, test run of GEFRI Score and component scores for public comment.