ECM Mapping Project Plan (Draft)

Easy Curriculum Mapping (ECM): A Research Proposal

Building the Curriculum Intermediate Representation for Africa's Digital Public Infrastructure for Education

Proposed to: The Bill & Melinda Gates Foundation, Global Education Program

Proposed by: The Spix Foundation, for consideration by the African Union Development Agency (AUDA-NEPAD), in partnership with university research institutions in Africa, India, and the United States

Duration: 48 months (4 years)

Requested funding: USD 10 million (staged across two phases with a go/no-go gate)


1. Executive Summary

Nine out of ten children in Sub-Saharan Africa cannot read a simple sentence by age ten. Digital courseware that teaches foundational literacy and numeracy exists, yet it cannot deploy across African countries because mapping courseware to each country's curriculum standards is manual, expert-dependent, and prohibitively expensive. This curriculum-mapping bottleneck is a structural barrier to continental-scale deployment of effective EdTech.

This proposal requests USD 10 million over 48 months to build Easy Curriculum Mapping (ECM): a Curriculum Intermediate Representation (Curriculum IR) that collapses the combinatorial cost of curriculum mapping from O(Apps×Standards) to O(Apps+Standards). The architectural pattern is proven: LLVM demonstrated it for compilers, TCP/IP for computer networking, and FHIR for healthcare interoperability. ECM applies the same structural insight to education. Phase 1 includes a funded desk pilot validating the concept against real curricula. Deliverables include the open-source Curriculum IR specification, 12 digitized curricula (6 African, 6 Indian), validated crosswalks, and mapping tools for Ministries and courseware developers. AUDA-NEPAD leads. The Spix Foundation provides project management and software development. University partners in Africa, India, and the United States provide research expertise. The program's technical methodology leverages AI-assisted research and development tooling, from LLM-based concept extraction to AI-accelerated software engineering modeled on the open-source LLVM, TCP/IP, and FHIR codebases, compressing technical milestones and enabling proportionally greater investment in the institutional architecture that determines long-term adoption.


1A. Decision Package

What a Funder Commits To

The Gates Foundation commits USD 10M over 48 months, disbursed in two phases with a go/no-go gate at Month 24. Phase 1 (Months 1–24): USD 5.5M. Phase 2 (Months 25–48): USD 4.5M. Phase 2 funding is contingent on Phase 1 deliverables.

What a Funder Gets

Success Criteria by Month 24

Phase 1 success is defined by the Month-24 Proof of Capability outcome set (Section 1B). If Phase 1 deliverables are met, Phase 2 funding is released. If they are not met, the program convenes a technical review to determine whether the IR architecture requires revision, the timeline requires extension, or the approach is non-viable (see Section 13.2).

Governance and Reporting

The program provides quarterly progress reports to the Gates Foundation, including milestones achieved, accuracy metrics, budget expenditure, and risk register updates. AUDA-NEPAD provides institutional coordination. The Spix Foundation provides project management and software development. Independent evaluation is conducted at the Phase 1 gate (Month 24) and at program completion (Month 48) by an external evaluator nominated by the Gates Foundation.

Intellectual Property Posture

The PREMIER Institute owns all intellectual property resulting from ECM research. The Gates Foundation, as funding partner, receives a worldwide, paid-up, royalty-free, sub-licensable, non-exclusive license to all such IP. Code and specifications are released under the Apache License 2.0. Creative works (illustrations, documentation artwork) are released under the appropriate Creative Commons license. University research partners retain academic publication rights; all code and infrastructure deliverables are owned by the PREMIER Institute.

Sovereignty Posture

Attribution is distinct from authority. Founder attribution and SOCLE Board hosting rights are recognition mechanisms; they confer no governance authority over curriculum standards, data access, or platform operations. National curriculum authority remains with Ministries of Education. Continental coordination authority remains with AUDA-NEPAD. Technical infrastructure authority remains with the RESPECT Platform's technical steward. The Curriculum IR does not set curriculum policy; it maps existing curricula as authored by sovereign governments.


1B. Month-24 Proof of Capability

The following concrete outcomes define Phase 1 success and gate Phase 2 funding:


2. The Problem: Africa's Curriculum Mapping Bottleneck

2.1 The Learning Crisis

Sub-Saharan Africa faces the world's most severe learning crisis. According to the World Bank's Learning Poverty indicator, functional illiteracy among ten-year-olds in the region stands at approximately 90%. The African Union has identified the elimination of learning poverty as a continental priority. The Gates Foundation's Global Education Program has identified foundational learning in Sub-Saharan Africa as a core priority, committing more than USD 240 million over four years (announced April 2025) to help 15 million children in Sub-Saharan Africa and India learn more effectively, and, with ADQ, an additional USD 40 million for responsible AI and EdTech deployment across Sub-Saharan Africa (announced December 2025).

Digital courseware addressing foundational literacy and numeracy exists and has demonstrated impact in controlled settings. The barrier to continental-scale deployment is the curriculum-mapping bottleneck described below.

2.2 The Structural Barrier

Across Africa, an estimated 100 or more distinct national or sub-national curriculum standards govern learning expectations. These standards differ in conceptual decomposition, sequencing, representation conventions, linguistic realization, and cultural embedding. For digital courseware to be used in a country's public education system, it must be mapped to that country's curriculum standards.

Today, this mapping is performed manually by subject-matter experts, separately for each country, separately for each courseware application. If there are N courseware applications and M national curricula, the total mapping effort scales as O(Apps×Standards). Each new country requires N new mappings; each new application requires M new mappings. This combinatorial cost structure makes multi-country deployment economically irrational for all but the largest publishers.

At realistic rates using African curriculum specialists, expert-produced curriculum mapping costs an estimated USD 1,000–3,500 per application per country, depending on subject scope and curriculum complexity. (This estimate is based on 20–80 hours of curriculum specialist time at USD 30–50/hour for standards analysis, content mapping, gap analysis, and quality assurance, plus project management and coordination overhead, consistent with African education consultant market rates.) The counterfactual at scale is decisive: covering 55 AU member states for 30 courseware applications through manual mapping would cost USD 1.7–5.8 million per mapping cycle, and this cost recurs every time a curriculum is revised or a courseware application updates its content. National curricula are typically revised on 5–10 year cycles; courseware applications update far more frequently. Each revision triggers a new round of manual re-mapping across all affected jurisdictions and applications. Over a decade with expansion to 1,000+ global jurisdictions, manual mapping costs compound into hundreds of millions of dollars, with no infrastructure, no machine-readable standards, and no path to automation.
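The counterfactual arithmetic above can be reproduced in a few lines. This is an illustrative sketch only: the function names are hypothetical, and the per-mapping cost range is simply the USD 1,000–3,500 estimate quoted above. It does not attempt to price an IR-based mapping, which the proposal does not quantify here.

```python
def pairwise_mappings(apps: int, standards: int) -> int:
    """Manual approach: every application is mapped to every standard."""
    return apps * standards


def hub_mappings(apps: int, standards: int) -> int:
    """IR approach: each application and each standard maps once to the IR."""
    return apps + standards


def manual_cost_range(apps: int, standards: int,
                      low: int = 1_000, high: int = 3_500) -> tuple[int, int]:
    """Cost of one manual mapping cycle in USD (low and high estimates)."""
    count = pairwise_mappings(apps, standards)
    return count * low, count * high


apps, standards = 30, 55  # 30 courseware applications, 55 AU member states
print(pairwise_mappings(apps, standards))  # 1650
print(hub_mappings(apps, standards))       # 85
low, high = manual_cost_range(apps, standards)
print(f"USD {low:,} to {high:,} per mapping cycle")
```

The 1,650 pairwise mappings at USD 1,000–3,500 each give USD 1,650,000 to 5,775,000, which rounds to the USD 1.7–5.8 million figure in the text.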

The Curriculum IR transforms this cost structure. When a Ministry revises its national curriculum, the Ministry re-maps its standards to the Curriculum IR once; all courseware applications connected to the Curriculum IR receive the updated alignment automatically. When a courseware application updates its content, it re-maps to the Curriculum IR once; all jurisdictions receive the updated alignment automatically. An application mapped to an earlier version of the Curriculum IR can still be aligned to current national standards through the IR's versioning and transformation mechanisms. That alignment is imperfect, but it comes at zero marginal cost, whereas the current system requires a fresh manual mapping. The Curriculum IR converts a recurring, multiplicative expenditure into a one-time infrastructure investment that amortizes as the number of applications and jurisdictions grows.
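A minimal sketch of how alignment across IR versions might work, assuming (hypothetically) that each IR release ships a table from retired concept identifiers to their successors. The identifiers, the table, and the function name are invented for illustration; the proposal does not specify the actual versioning mechanism.

```python
# Hypothetical migration table shipped with an IR release:
# old concept ID -> successor concept IDs (a concept may be split).
MIGRATION_V1_TO_V2 = {
    "num.count.20": ["num.count.10", "num.count.11-20"],  # split in two
    "lit.phoneme": ["lit.phoneme"],                        # unchanged
}


def migrate(concepts, table):
    """Translate concept IDs from an older IR version to a newer one.

    IDs missing from the table are reported rather than guessed; this is
    where the imperfection of an automated migration would surface.
    """
    migrated, unresolved = set(), set()
    for concept in concepts:
        if concept in table:
            migrated.update(table[concept])
        else:
            unresolved.add(concept)
    return migrated, unresolved


app_v1 = {"num.count.20", "lit.phoneme", "num.shape.basic"}
migrated, unresolved = migrate(app_v1, MIGRATION_V1_TO_V2)
print(sorted(migrated))    # ['lit.phoneme', 'num.count.10', 'num.count.11-20']
print(sorted(unresolved))  # ['num.shape.basic']
```

Unresolved identifiers are returned separately so that a human reviewer could decide how to handle them.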

2.3 The Digitization Gap

The mapping bottleneck is compounded by a digitization gap. No African country has published its curriculum standards in an internationally interoperable machine-readable format. South Africa's CAPS exists as PDF documents. Kenya's CBC is digitized for internal KICD use in a format specific to that institution. The African Union's Decade of Education (2025–2034) has yet to address curriculum digitization.

The ECM program must therefore solve both problems simultaneously: build the Curriculum IR and produce the machine-readable curricula it requires as inputs.

2.4 Why Now

Four developments have converged to make ECM feasible today:

LLM-based concept extraction has reached usable accuracy. Current research reports up to 89% F1-accuracy for goal-to-skill matching using large language models, with human validation. Five years ago, automated concept extraction from curriculum documents was impractical. Today, LLMs enable the extraction pipeline that the Curriculum IR depends on.

India's Sunbird/DIKSHA has proven open-source education infrastructure at national scale. India's DPI ecosystem, specifically the Sunbird taxonomy service, provides a tested, MIT-licensed platform layer that ECM can adopt, dramatically reducing infrastructure development time.

AI-assisted research and development tooling has reached production capability. The Curriculum IR's software architecture can be modeled directly on the production codebases of LLVM, TCP/IP, and FHIR, open-source systems whose design patterns are well-documented and accessible to AI-accelerated development tools. Environments such as Claude Code, GitHub Copilot, and their successors compress the implementation timeline for the IR compiler, mapping tools, and validation infrastructure. This acceleration is methodologically significant: time saved on technical milestones is reinvested in the institutional work (Ministry engagement, governance formation, Mapper certification) that determines whether infrastructure achieves adoption. ECM is AI-era education infrastructure, built with AI-era tools.

The African Union's Decade of Education (2025–2034) has created institutional momentum. AUDA-NEPAD's mandate, the African Continental Qualifications Framework (ACQF), and the AU's renewed commitment to education reform provide the continental coordination structure that an IR-based approach requires.


3. The Proposed Solution: Easy Curriculum Mapping (ECM)

3.1 The Core Insight

At the foundational level, curriculum standards describe concepts. Representations vary: the concept of number exists regardless of notation system; phonemic awareness is a prerequisite for alphabetic decoding regardless of language. A canonical intermediate layer that captures concepts independently of any particular curriculum's representation enables a structural reduction in mapping cost.

3.2 The Curriculum Intermediate Representation

ECM centers on a Curriculum IR that encodes learning concepts at a stable, representation-independent level. National curriculum standards map once to the Curriculum IR; digital courseware maps once to the same Curriculum IR. This converts the O(Apps×Standards) mapping problem into two linear processes, Standards-to-IR and Courseware-to-IR, yielding O(Apps+Standards) total cost.

The Curriculum IR is designed to interoperate with existing education metadata standards, including CASE (Competency and Academic Standards Exchange) for standards representation and IEEE LOM for learning object metadata. The Curriculum IR extends these standards with concept-level semantics, dialect support (see Section 3.4), and weighting metadata that existing frameworks do not provide.

3.3 The TCP/IP, LLVM, and FHIR Precedents

This architectural pattern has been proven in three directly analogous domains.

TCP/IP introduced the Internet Protocol (IP) as a canonical intermediate layer between application protocols and network technologies. Before TCP/IP, each application had to be implemented separately for each network type, an N×M problem. IP collapses this: any application protocol maps to IP (N mappings), and IP maps to any network technology (M mappings), yielding O(N+M). Conceived by Vint Cerf and Robert Kahn in their 1974 paper "A Protocol for Packet Network Intercommunication" (IEEE Transactions on Communications), funded by DARPA, and formalized as RFC 791 (1981), TCP/IP became the mandatory standard for US defense networks on January 1, 1983 ("Flag Day") and the foundational infrastructure of the global internet. The IETF governs its evolution through an open, consensus-based standards process with no single controlling institution. (Full history in Appendix C3.)

LLVM introduced a compiler intermediate representation that enables any source language to compile to any hardware target through a single canonical layer. LLVM began as a graduate research project at the University of Illinois (2000), funded by NSF and DARPA, and became the standard infrastructure for new compiler and toolchain projects within a decade (ACM Software System Award, 2012). Its successor, MLIR, extends the IR concept through a "dialect" mechanism that enables domain-specific representations within a unified framework. (Full history in Appendix C1.)

FHIR introduced canonical resources, concept maps, and integration with the UMLS Metathesaurus to collapse the N×M problem across a dozen incompatible clinical coding systems. FHIR began as a volunteer initiative within HL7 International (2011), was accelerated by ONC's USD 15 million SMART on FHIR grant to Harvard/Boston Children's Hospital, became a normative standard in 2018, and was mandated for US healthcare systems by the ONC's 2020 Final Rule implementing the 21st Century Cures Act. (Full history in Appendix C2.)

All three precedents demonstrate that: (a) IR-based approaches work for complex, multi-stakeholder interoperability problems; (b) they require a governance body, an economic mandate, and sufficiently formal domain semantics; and (c) they can move from initial research to normative standard within 7–10 years, with deployable prototypes within 3–4 years.

3.4 MLIR's Dialect Concept: A Key Architectural Insight

MLIR's innovation, domain-specific "dialects" within a unified IR framework, directly addresses the most serious literature-based objection to a Curriculum IR (see Section 5). African curricula include competency-based frameworks (Kenya's CBC), content-standards frameworks (e.g., South Africa's CAPS), and outcomes-based frameworks (various). These are genuinely different organizational logics, each encoding distinct instructional commitments. A Curriculum IR with dialect support represents each curricular tradition in its own terms while enabling transformation between dialects, preserving cultural and pedagogical specificity while achieving interoperability.

Several African countries and most Indian states maintain sub-national curriculum variations (language-of-instruction differences, regional supplementary content). The Curriculum IR's dialect mechanism accommodates sub-national variation within the same national mapping.

3.5 Why This Has Not Been Attempted Before

Curriculum interoperability at this level has three prerequisites that did not co-exist until recently: (a) the foundational NLP/LLM technology for automated concept extraction from curriculum documents; (b) an institutional actor with both the continental mandate (AUDA-NEPAD) and the technical capacity (the MLIR/LLVM research community) to conceive the approach; and (c) an economic incentive at sufficient scale. The Global North's education systems, which fund most EdTech R&D, face the Apps×Standards problem at manageable scale (a few dozen curricula, mostly digitized). Africa's fragmentation is uniquely severe (an estimated 100+ curricula, none digitized in interoperable format), and Africa's institutions are uniquely positioned to solve it.


4. Positioning Within the Breakthrough Ecosystem

4.1 The Breakthrough System and DPI-Ed

ECM is one component within a larger system. The Breakthrough System addresses four structural barriers to EdTech deployment in Africa: Policy, Technology, Data, and Economics (see Essay 07, "Making Education Outcomes Finance-Grade"). ECM helps to address the Technology Barrier (the curriculum-mapping bottleneck) and enables the Data Barrier to be addressed through automated, curriculum-aligned assessment.

Africa's Digital Public Infrastructure for Education (DPI-Ed) is the open-source infrastructure layer that produces continuous, curriculum-aligned, auditable learning evidence. The Spix Foundation's RESPECT system is the first reference implementation of Africa's DPI-Ed. ECM provides the curriculum interoperability layer within DPI-Ed, enabling courseware to connect to any participating country's curriculum through a single mapping.

4.2 The Two-Track Curriculum Mapping Strategy

The Breakthrough System employs a two-track strategy for curriculum mapping:

Track 1: RESPECT Certified Mappers (Years 1–4). While ECM is under development, human curriculum experts (RESPECT Certified Mappers) perform manual, expert-validated curriculum alignments. RESPECT Certified Mappers are designed as a phase-limited profession: governance protocols include mandatory sunset clauses and transition pathways for Mappers into ECM-related auditing, standards-maintenance, and quality-assurance roles. (See Essay 23, "Mappers: Mapping Lessons to Curriculum Standards, Years 1–4.")

Track 2: ECM (Year 5+). By the end of Year 4, ECM is expected to deliver a deployable Curriculum IR and mapping toolset, collapsing the long-term cost of curriculum mapping. From Year 5 onward, ECM enables automated, curriculum-aligned assessment infrastructure, the foundation for cross-jurisdictional comparability that underpins Results-Based Finance for Education (RBF4Ed) at continental scale. (See Essay 22, "ECM: Mapping Lessons to Curriculum Standards, Year 5+.")

This 48-month research program spans Years 1 through 4, producing a deployable Curriculum IR by Month 48, aligned with the Breakthrough System's timeline. Mapper-produced curriculum alignments from Track 1 serve as expert ground-truth data for ECM validation, providing a natural bridge between the transitional manual system and the automated IR-based system.
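One way such validation might be scored is sketched below, assuming alignments are represented as (lesson, concept) pairs and Mapper-produced alignments are treated as ground truth. The data and identifiers are invented for illustration; precision, recall, and F1 follow their standard definitions.

```python
def precision_recall_f1(predicted, truth):
    """Score predicted alignment pairs against expert ground truth."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical Mapper (expert) alignments and automated ECM output.
mapper_truth = {
    ("lesson-1", "num.count.10"), ("lesson-1", "num.compare"),
    ("lesson-2", "lit.phoneme"), ("lesson-3", "num.add.10"),
}
ecm_output = {
    ("lesson-1", "num.count.10"), ("lesson-2", "lit.phoneme"),
    ("lesson-3", "num.add.10"), ("lesson-3", "num.sub.10"),
}

p, r, f1 = precision_recall_f1(ecm_output, mapper_truth)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.75 f1=0.75
```

In this toy data three of the four automated pairs agree with the Mappers and one expert pair is missed, so precision, recall, and F1 all come out at 0.75.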

4.3 Data Flow

A national curriculum document (e.g., Kenya's CBC, K–3 Mathematics) enters the system as a PDF. The digitization pipeline converts it into a CASE-compliant machine-readable format. The ECM research team maps the national standards to the Curriculum IR, in consultation with Ministry of Education personnel and curriculum experts to ensure the mapping reflects the jurisdiction's understanding of its own standards. A courseware application (e.g., a numeracy app) independently maps its lesson content to the same Curriculum IR. The crosswalk between the curriculum and the courseware is computed automatically from these two independent mappings. Ministries retain the right to challenge any IR-mediated alignment for cause (for example, if a courseware application claims curriculum alignment that a Ministry considers inaccurate or misleading) through formal contestability procedures with defined timelines and independent adjudication. Ministries are not burdened with the administrative overhead of reviewing or approving every courseware-to-curriculum alignment.
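The automatic crosswalk step described above can be pictured as a join on shared concept identifiers. The sketch below is a simplified illustration under stated assumptions: the standard codes, lesson IDs, concept IDs, and the `crosswalk` function are all hypothetical, and a real crosswalk would also need to handle granularity and weighting.

```python
from collections import defaultdict

# Hypothetical Standards-to-IR mapping: standard code -> IR concept IDs.
standards_to_ir = {
    "KE-CBC-G1-MATH-1.1": {"num.count.10", "num.compare"},
    "KE-CBC-G1-MATH-1.2": {"num.add.10"},
}

# Hypothetical Courseware-to-IR mapping: lesson ID -> IR concept IDs.
lessons_to_ir = {
    "app-lesson-01": {"num.count.10"},
    "app-lesson-02": {"num.add.10", "num.sub.10"},
}


def crosswalk(standards, lessons):
    """Join two independent mappings on shared IR concept IDs."""
    # Invert the courseware mapping: concept ID -> lessons teaching it.
    concept_to_lessons = defaultdict(set)
    for lesson, concepts in lessons.items():
        for concept in concepts:
            concept_to_lessons[concept].add(lesson)

    # For each standard, collect every lesson that shares a concept with it.
    result = {}
    for standard, concepts in standards.items():
        matched = set()
        for concept in concepts:
            matched |= concept_to_lessons.get(concept, set())
        result[standard] = matched
    return result


for standard, lessons in crosswalk(standards_to_ir, lessons_to_ir).items():
    print(standard, "->", sorted(lessons))
```

Neither mapping refers to the other; the curriculum side and the courseware side meet only through the concept identifiers.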

The program's ultimate goal is to produce mapping tools that are sufficiently intuitive and well-documented that Ministries choose to produce and maintain their own curriculum-to-IR mappings independently, perhaps with consulting support, but under their own sovereign authority. The incentive is direct: a Ministry that maintains its own Curriculum IR mapping gives every courseware application in the network automatic access to its curriculum, which enables curriculum-aligned assessment and the cross-jurisdictional comparability that underpins Results-Based Finance for Education (RBF4Ed). The tools must be good enough that this value proposition is self-evident.

4.4 Governance Architecture: Separation of Roles

The ECM program is designed as a loosely-coupled system with explicit separation of roles (see Essay 07):

This separation prevents any single institution from controlling the system and enables trust to accumulate across institutions with different mandates. All curriculum data, mapping outputs, and validation datasets are governed under Malabo Convention-compliant data sovereignty protocols, with each Ministry retaining full authority over its national curriculum data.

4.5 Theory of Change

If this program produces a validated Curriculum IR specification and mapping tools (outputs), then courseware developers can deploy across African and Indian jurisdictions by mapping once to the Curriculum IR (immediate outcome), which enables curriculum-aligned assessment at continental scale (intermediate outcome), which is the prerequisite for Results-Based Finance for Education at a projected benefit of approximately USD 35 per child per year (long-term impact; see Essay 07).


5. Literature-Informed Design Principles

A comprehensive review of the curriculum-mapping, ontology-alignment, and comparative education literatures identified five substantive objections to an IR-based approach. Each objection has been incorporated as a design requirement for the Curriculum IR:

Objection | Design Response
Interlingua problem (semantic drift in intermediate representations) | Continuous validation with formal feedback loops from Ministries; versioned Curriculum IR with built-in contestability
Pedagogical content knowledge (representation inseparable from concept) | Curriculum IR encodes concept relationships (prerequisites, co-requisites, and representational alternatives) alongside concept identifiers
Granularity problem (no universal "atomic" concept level) | Multi-granularity support; concepts can be leaf nodes in one curriculum's mapping and parent nodes in another's
Cultural embedding (curricula encode epistemological commitments) | Curriculum IR functions as mapping infrastructure, making limited, verifiable claims about concept overlap while preserving each curriculum's internal logic
Assessment validity (mapping does not preserve contextual weighting) | Curriculum IR encodes weighting and emphasis metadata alongside concept mappings

Each objection identifies a real design constraint. Each has been incorporated as a specification requirement for Phase 1, Milestone 1. The operating environment (Sub-Saharan Africa, where functional illiteracy among ten-year-olds stands at approximately 90%) demands infrastructure that is substantially better than the status quo. The status quo is no mapping system at all.

The architectural response to the literature's strongest objection, that heterogeneous curricular traditions cannot be represented in a single canonical layer, is MLIR's dialect concept (Section 3.4), which enables each tradition to retain its organizational logic within a unified transformation framework.
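To make the granularity and weighting responses more concrete, here is a small sketch assuming (hypothetically) that IR concepts form a hierarchy and that each curriculum attaches an emphasis weight to the concepts it maps. The concept IDs, weights, and helper names are invented; the actual IR data model is a Phase 1 deliverable and may differ.

```python
# Hypothetical IR concept hierarchy: child -> parent (None for a root).
PARENT = {
    "num": None,
    "num.add": "num",
    "num.add.10": "num.add",
    "num.add.20": "num.add",
}


def ancestors(concept):
    """Return the concept followed by all of its ancestors, nearest first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def related(a, b):
    """True if the concepts are identical or one contains the other."""
    return a in ancestors(b) or b in ancestors(a)


# Curriculum A maps addition as a single standard with heavy emphasis;
# curriculum B decomposes the same concept into finer-grained standards.
curriculum_a = {"num.add": 0.4}                        # concept -> weight
curriculum_b = {"num.add.10": 0.1, "num.add.20": 0.15}

for concept_a, weight_a in curriculum_a.items():
    for concept_b, weight_b in curriculum_b.items():
        if related(concept_a, concept_b):
            print(concept_a, weight_a, "<->", concept_b, weight_b)
```

Here "num.add" is the finest level curriculum A maps to, while in curriculum B's mapping it is the parent of two finer standards; the weights travel with each mapping rather than being discarded.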


6. Research Goals and Milestones

Program Timeline: 48 Months (2 Phases)

The 48-month timeline aligns with the Breakthrough System's ecosystem design: Years 1–4 develop and validate ECM; Year 5 transitions to operational deployment with automated assessment and RBF4Ed integration.

Phase 1: Research and Validation (Months 1–24), USD 5.5 million

Goal: Validate the Curriculum IR concept through a desk pilot, produce the IR v0.2 specification, digitize and map 12 curricula, and validate against real courseware.

Phase 1 requires no prior deliverables. This is the program's starting point.

Milestones:

Phase 2: Deployment and Operational Readiness (Months 25–48), USD 4.5 million

Goal: Prepare ECM for operational deployment, achieve operational readiness across all 12 jurisdictions, prepare the scaling pathway to at least 44 countries (80% of AU Member States), and execute the sustainability transition.

Phase 2 requires the following Phase 1 deliverables as inputs: Curriculum IR v0.2; validated crosswalks for all 12 curricula; at least 3 courseware-to-IR mappings; formal validation study results; LLM-based concept extraction pipeline; open-source mapping tools.

Milestones:


7. The Hybrid Build & Buy Strategy

7.1 Components to Be Built (Original Research)

7.2 Components to Be Bought or Adopted

Component | Source | Cost | Notes
Standards database (US/intl reference) | EdGate | USD 13,100/yr | EdGate Pro annual subscription (USD 12,500/yr) plus international standards library (from USD 600/organization). Post-grant recurring cost assumed by the Spix Foundation. EdGate's foundational correlation patent (US9373264, priority date 2002) has likely reached its 20-year term, but EdGate announced additional patents in 2018; license terms should be reviewed at contracting to determine which patent claims, if any, remain in force and whether the database is independently protected by copyright or trade-secret law.
Standards identifiers | AB GUIDs (US), CASE identifiers (global) | Custom pricing | US-centric; Curriculum IR provides the African/Indian layer
Platform infrastructure | Sunbird ED (MIT licensed), taxonomy service | Free (open source) | Designed for India; adaptation required for multi-country use
Curriculum digitization tooling | OpenSALT (MIT licensed) | Free (open source) | CASE framework management; requires curriculum source documents
LLM-based mapping acceleration | GPT-4o / Claude / Llama | API costs (variable) | Up to 89% F1-accuracy for goal-to-skill matching; human validation at every stage

8. Alignment with Gates Foundation Programs

ECM directly addresses three active Gates Foundation commitments, each representing a current funding stream:

Global Education Program (more than USD 240M over 4 years, announced April 2025): ECM directly enables the program's goal of helping children in Sub-Saharan Africa learn more effectively through evidence-based digital solutions. By collapsing the curriculum-mapping bottleneck, ECM allows effective courseware to reach learners across multiple countries, the prerequisite for the "regional exemplars" (the foundation's term) scaling strategy.

Digital Public Infrastructure (USD 200M+ commitment, announced September 2022): ECM is education DPI. It provides foundational, reusable digital infrastructure (the Curriculum IR, mapping tools, digitized curricula) designed for public benefit. The foundation's DPI investments (MOSIP for digital identity, Mojaloop for digital payments) establish the pattern. ECM is the curriculum interoperability layer within Africa's DPI-Ed, complementing the identity and payments layers the foundation already supports.

AI and EdTech for Africa (USD 40M ADQ partnership, announced December 2025): ECM uses LLMs to accelerate concept extraction and alignment suggestion, with human expert validation at every stage. The foundation's emphasis on responsible AI adoption (solutions that reflect local needs, empower teachers, and build capacity for sustained progress) aligns precisely with ECM's design: AI-accelerated, expert-validated, open-source, and governed by African institutions.

The program includes 6 Indian state curricula alongside 6 African national curricula for three reasons: (a) Indian state curricula are more digitized than African equivalents, providing a higher-fidelity validation environment for the Curriculum IR during early phases; (b) India's Sunbird/DIKSHA ecosystem is the primary open-source platform infrastructure the program adopts, and Indian state curriculum mapping enables direct integration testing; (c) cross-continental validation (Africa and India) provides stronger evidence of Curriculum IR generalizability than intra-continental validation alone. The primary beneficiaries of the program remain African children.

The Gates Foundation has invested in multiple EdTech platforms operating in Sub-Saharan Africa through its Global Education Program and ADQ partnership. ECM is infrastructure that amplifies the impact of all Gates-funded EdTech: any courseware application that maps to the Curriculum IR can deploy across all participating countries without additional mapping cost.


9. Competitive Landscape

No existing system provides curriculum interoperability at the level the Curriculum IR proposes. The closest approaches are:

ECM is complementary to all three. It occupies a layer that does not currently exist: the canonical concept representation that connects standards databases, curriculum expertise, and platform-specific content through a single interoperable infrastructure.

Structural precedents for the IR approach itself are described in Section 3.3 (TCP/IP, LLVM, FHIR). ECM differs from all three precedents in one critical respect: it must operate across sovereign jurisdictions with different educational philosophies, not merely across technical systems with different formats. This jurisdictional dimension, addressed through the dialect mechanism (Section 3.4), the contestability framework (Section 4.3), and the Sovereignty Posture (Section 1A), is ECM's distinctive contribution to the IR pattern.


10. Principal Investigator Profile and Team

10.1 Required Expertise

Three domains of expertise define the PI requirements for ECM:

Domain 1: Intermediate Representation Architecture. Deep expertise in designing, implementing, and scaling canonical intermediate representations for complex, heterogeneous systems. The PI must understand why IRs succeed (formal semantics, compositionality, separation of concerns) and why they fail (semantic drift, granularity mismatch, cultural embedding). Direct experience with TCP/IP, LLVM, MLIR, FHIR, or analogous IR systems is strongly preferred.

Domain 2: African and Indian Education Systems. Working knowledge of African curriculum structures, the differences among them, and the institutional landscape (Ministries of Education, ACQF, regional qualification frameworks). This expertise may reside in a co-PI or senior research partner.

Domain 3: Computational Linguistics / Knowledge Representation. Expertise in ontology alignment, multilingual concept representation, and LLM-based information extraction. The Curriculum IR's multilingual and multi-granularity requirements demand computational linguistics sophistication.

10.2 Proposed Principal Investigators

The following PI candidates are proposed based on expertise fit. Formal expressions of interest will be secured following AUDA-NEPAD's institutional endorsement of the program.

Vikram Adve (University of Illinois at Urbana-Champaign)
- Co-creator of LLVM. Donald B. Gillies Professor of Computer Science. ACM Fellow. ACM Software System Award (2012).
- Directly responsible for the most successful IR in computing history. Post-LLVM trajectory demonstrates domain transfer: from compilers to security (SVA, SAFECode, ALLVM; USD 5.6M NSF/ONR) to agricultural AI (AIFARMS; USD 100M NIFA/NSF program). Proven ability to apply IR-based thinking to domains beyond compilers.
- IIT Bombay alumnus. Established track record of securing large-scale federal research funding.
- Complementary requirement: strong co-PI in curriculum and pedagogy.

Uday Bondhugula (Indian Institute of Science, Bangalore) - Co-author of the foundational MLIR paper and contributor to its design. Professor in the Department of Computer Science and Automation, IISc. - Deep expertise in multi-level IR design β€” the specific architectural innovation (dialects) most relevant to ECM's challenge of representing heterogeneous curricular traditions. - Based at IISc Bangalore, in geographic and institutional proximity to EkStep Foundation (Sunbird/DIKSHA) and India's DPI architects. Provides a natural bridge between IR research and education infrastructure. - Founder of Polymage Labs (compiler building blocks for AI). Understands research-to-deployment transition. - Complementary requirement: strong co-PI in curriculum and pedagogy. India-based; AUDA-NEPAD coordination via program management.

Lesley Le Grange (Stellenbosch University, South Africa) - Distinguished Professor, Department of Curriculum Studies. Vice-President of the International Association for the Advancement of Curriculum Studies (IAACS). Over 250 publications. - The most internationally connected African curriculum scholar. Deep expertise in curriculum theory, decolonization of curriculum, and cross-cultural knowledge systems. - Based in Africa. Understands the cultural and epistemological dimensions that the Curriculum IR must navigate. - Complementary requirement: strong co-PI in IR architecture or computer science.

10.3 Project Management and Software Development: The Spix Foundation

Research produces knowledge. Deployment requires code. The Spix Foundation provides the engineering and project management capacity that bridges the two. The Spix Foundation's development team implements the Curriculum IR specification, builds the mapping tools, integrates the Sunbird taxonomy service, develops the LLM-based extraction pipeline, and delivers the open-source tooling that Ministries and courseware developers will use. (Organizational details in Appendix H.)

Project management is led by Jim Plamondon, CEO of the Spix Foundation, who managed multi-million-dollar research budgets at Microsoft Research, notably the program to integrate third-party programming languages into Microsoft's .NET Common Language Runtime and Visual Studio. That program required the same kind of coordination ECM demands: academic researchers defining the specification (the Common Language Infrastructure), industry partners implementing against it, and a central project management function ensuring that research insights translated into shipping infrastructure on a fixed timeline. The .NET CLR is itself built around an intermediate representation (the Common Intermediate Language, a virtual machine IR that lets multiple source languages compile to a single target), making this experience directly architecturally relevant.

10.4 Operational Structure

Core team (approximately 18 FTE across 48 months, scaling by phase): 8 FTE researchers (IR architecture, computational linguistics, curriculum studies), 4 FTE curriculum experts (digitization, mapping validation, Ministry liaison), and 6 FTE software engineers (IR implementation, mapping tools, platform integration). Staffing levels vary by phase: Phase 1 emphasizes research and digitization; Phase 2 emphasizes tooling, training, and deployment.

Budget allocation model: Approximately 55–60% of the program budget flows to personnel and university partner subgrants (researcher salaries, curriculum expert fees, field work). The remaining 40–45% covers infrastructure, LLM costs, program management, travel, evaluation, and contingency.

Partner selection: University research partners are selected based on: demonstrated expertise in IR architecture, computational linguistics, or curriculum studies; presence in or partnership with African or Indian institutions; prior experience with applied (deployment-oriented) research; and ability to meet open-source and open-access requirements.

Principal Investigator responsibilities: The Lead PI is responsible for Curriculum IR design, validation methodology, and overall technical research direction. Each co-PI is responsible for their domain's research quality, deliverable acceptance criteria, and publication. All PIs report to the program's governance structure and to the Gates Foundation through quarterly reports.

Quality assurance: Each phase undergoes independent external evaluation (Month 24 and Month 48). Subgrant agreements include deliverable acceptance criteria, milestone-based disbursement, and financial audit provisions. The desk pilot at Months 1–6 provides an early feasibility checkpoint before the program's full investment.

10.5 Recommended Structure: Co-PI Team

The program should be led by a co-PI team:


11. Institutional Partners

11.1 Lead Institution: AUDA-NEPAD

AUDA-NEPAD (African Union Development Agency) serves as the program's institutional home:

11.2 African University Partners

Institution | Country | Contribution
Kenya Institute of Curriculum Development (KICD) | Kenya | CBC curriculum source documents; curriculum expert validation; RESPECT pilot country
Ministry of Education, Liberia | Liberia | Liberian curriculum source documents; RESPECT pilot country; West African curriculum access
Ministry of Education, Eswatini | Eswatini | Eswatini curriculum source documents; RESPECT pilot country; Southern African curriculum access
Stellenbosch University | South Africa | Curriculum studies expertise; comparative curriculum analysis; validation research
University of Cape Town | South Africa | Education research; assessment validity
Makerere University | Uganda | Programming language and systems expertise (Bainomugisha); East African curriculum access
UNESCO-IBE Master's Programs | Senegal, Congo-Brazzaville, Mozambique | Trained curriculum specialists across West, Central, and Lusophone Africa

11.3 Indian Partners

Institution | Contribution
Indian Institute of Science (IISc), Bangalore | MLIR/IR research expertise (Bondhugula); proximity to EkStep/Sunbird ecosystem; Indian state curriculum digitization and mapping
EkStep Foundation | Sunbird taxonomy service expertise; DIKSHA implementation experience; open-source platform support
IIIT Bangalore (Center for Digital Public Infrastructure) | DPI architecture expertise; CDPI co-chaired by Pramod Varma; Indian state curriculum standards access and DPI-Ed integration

11.4 US Partners

Institution | Contribution
University of Illinois at Urbana-Champaign | LLVM/IR architecture expertise (Adve); large-scale research program management
The Spix Foundation | Project management and software development (see Section 10.3)

11A. Institutional Outputs

ECM produces governance and standards bodies that outlive the research program:

ECM's role is to produce the research and infrastructure that these institutions require; it does not govern them after handoff. Governance authority flows from the Breakthrough System's established structures.


11B. Standards-Based Interoperability

All ECM research outputs will comply with or align to relevant standards. The distinction matters: "comply with" means ECM will implement the standard and test/certify against it; "align to" means ECM will follow the standard's design principles and interoperate with its interfaces, adapting where the standard does not fully address African educational contexts.


12. Budget Framework

12.1 Summary

Category | Amount (USD)
Personnel (PI team, researchers, curriculum experts, developers) | 4,000,000
Curriculum digitization (6 African countries + 6 Indian states, K–3, math + literacy) | 800,000
Desk pilot (Phase 1 proof-of-concept: 2 curricula, 50 concepts each) | 300,000
Infrastructure (Sunbird adaptation, EdGate license, cloud computing, tools) | 650,000
LLM costs (API usage for concept extraction and alignment, 48 months) | 450,000
Partner institution subgrants (African and Indian universities) | 1,400,000
Travel and convening (Ministry engagement, connectathons, workshops) | 750,000
Program management and administration (AUDA-NEPAD + Spix Foundation) | 900,000
Independent evaluation (external evaluator, 3 assessments) | 275,000
Contingency (5%) | 475,000
Total | 10,000,000

12.2 Budget by Phase

Phase | Duration | Amount (USD) | Key Activities
Phase 1: Research + Validation | Months 1–24 | 5,500,000 | Desk pilot, IR v0.1→v0.2, 12 curricula digitized and mapped, courseware partnerships, validation study
Phase 2: Deployment + Operational Readiness | Months 25–48 | 4,500,000 | IR v1.0, tools delivered, Ministry training, governance framework, end-to-end deployment, scaling plan, sustainability transition

Funding is structured as staged commitments with a go/no-go gate (see Section 13).
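
The figures in Sections 12.1 and 12.2 can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary keys are shortened labels, and reading the "5%" contingency as a share of the non-contingency line items is an assumption, not something the proposal states.

```python
# Line items from the Section 12.1 summary table (USD).
BUDGET = {
    "Personnel": 4_000_000,
    "Curriculum digitization": 800_000,
    "Desk pilot": 300_000,
    "Infrastructure": 650_000,
    "LLM costs": 450_000,
    "Partner institution subgrants": 1_400_000,
    "Travel and convening": 750_000,
    "Program management and administration": 900_000,
    "Independent evaluation": 275_000,
    "Contingency": 475_000,
}

# Phase amounts from the Section 12.2 table (USD).
PHASES = {"Phase 1": 5_500_000, "Phase 2": 4_500_000}


def share(amount, total):
    """Return amount as a percentage of total."""
    return 100.0 * amount / total


total = sum(BUDGET.values())
phase_total = sum(PHASES.values())

# Personnel plus university subgrants, as a share of the whole budget.
people_share = share(
    BUDGET["Personnel"] + BUDGET["Partner institution subgrants"], total
)

# Assumption: contingency is measured against all other line items.
contingency_share = share(BUDGET["Contingency"], total - BUDGET["Contingency"])

print(f"Category total: {total:,}")
print(f"Phase total:    {phase_total:,}")
print(f"Personnel + subgrants: {people_share:.1f}% of budget")
print(f"Contingency: {contingency_share:.2f}% of other line items")
```

The category and phase totals both come to USD 10,000,000. Personnel plus subgrants comes to 54% of the total, marginally under the "approximately 55–60%" range given in Section 10.4.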

12.3 Budget Rationale

The personnel budget assumes approximately 8 FTE researchers, 4 FTE curriculum experts, and 6 FTE software engineers across all partner institutions over 48 months, with staffing levels varying by phase. A detailed staffing plan will be provided in Appendix D; the Spix Foundation will develop it during Phase 1.

Based on the project's Build vs. Buy analysis (Appendix A), the hybrid Build & Buy strategy reduces the estimated cost from USD 8–12 million for a pure-build approach to USD 10 million with adopted infrastructure. The 48-month timeline is within range of the TCP/IP precedent (7 years from Cerf and Kahn's 1974 paper to RFC 791 in 1981) and the LLVM precedent (5 years from first code to 1.0 release), and aggressive relative to the FHIR precedent (7 years from first proposal to normative standard). Four factors justify the pace: (a) the architectural pattern is well understood and the program builds on existing IR design knowledge from three proven open-source precedents whose production codebases are directly accessible; (b) AI-assisted research and development tooling (from LLM-based concept extraction to AI-accelerated software engineering) compresses technical milestones, enabling the team to model the Curriculum IR compiler and mapping tools directly on LLVM, FHIR, and TCP/IP architectures; (c) time saved on technical milestones is reinvested in institutional readiness (Ministry engagement, governance formation, and Mapper certification), which is the rate-limiting factor for adoption; (d) Africa's education crisis demands urgency: the children currently in K–3 will age out of foundational learning within 36 months.

12.4 Assumptions and Bounds

The budget and timeline rest on the following assumptions. If an assumption proves false, the corresponding bound applies.

Assumption | Bound (what ECM is not promising)
LLM-based concept extraction achieves an F1 score of ≥85% with human validation | If accuracy is lower, human expert effort increases; budget absorbs this through the contingency allocation. ECM does not promise fully automated extraction.
At least 6 African countries' curriculum documents are accessible through AUDA-NEPAD's Ministry relationships | If fewer are accessible, ECM substitutes additional Indian state curricula or other available African curricula. The Curriculum IR's validity depends on typological diversity, not on specific countries.
The Sunbird taxonomy service is adaptable for multi-country use within the budgeted infrastructure allocation | If adaptation proves more complex, ECM builds a lightweight alternative using the same API specification.
University partner institutions can recruit and retain qualified researchers within project budgets | If recruitment proves difficult, the co-PI structure provides redundancy: the program can proceed with any two of the three domain leads.
48 months is sufficient to reach Curriculum IR v1.0 with operational mapping tools | Phase 1 alone (24 months) produces 12 digitized curricula, a validated IR v0.2, and open-source mapping tools, which are valuable even if Phase 2 requires extension.
ECM does not promise that the Curriculum IR will replace all manual curriculum mapping by Month 48 | The IR reduces cost and enables automation; expert validation remains part of the process. The goal is O(Apps+Standards) cost structure, not zero human involvement.
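
To make the first assumption concrete, the sketch below shows one way an F1 score could be computed for a single curriculum by comparing LLM-extracted concepts with an expert-built ground truth. It assumes exact-match scoring on concept identifiers. The function name and the identifiers are hypothetical, and the scoring protocol for the formal validation study is not defined in this section.

```python
def f1_score(extracted, ground_truth):
    """Return (precision, recall, F1) for two collections of concept ids."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(ground_truth)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical concept identifiers, for illustration only.
expert = {"count-to-20", "add-within-10", "letter-sounds",
          "blend-cvc", "compare-lengths"}
llm = {"count-to-20", "add-within-10", "letter-sounds",
       "blend-cvc", "tell-time"}

precision, recall, f1 = f1_score(llm, expert)
meets_assumption = f1 >= 0.85  # threshold from the assumption above

print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
print("meets 85% assumption:", meets_assumption)
```

In this toy case four of five extracted concepts match, so the score falls short of the threshold, and the bound applies: expert effort increases rather than the extraction being accepted as-is.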

13. Evaluation and Go/No-Go Criteria

13.1 Independent Evaluation

The program will be independently evaluated at the end of Phase 1 and Phase 2 by an external evaluator nominated by the Gates Foundation during Phase 1. Evaluation criteria include: mapping accuracy against expert ground truth, time and cost per mapping, usability of mapping tools for Ministry personnel, effectiveness of contestability mechanisms, and courseware developer adoption rates.

13.2 Go/No-Go Gate

Phase 1 → Phase 2 gate (Month 24): The gate is passed when all of the following hold: the formal validation study achieves ≥85% concept-level accuracy across 12 curricula; at least 3 courseware developers have mapped content to the Curriculum IR; the Curriculum IR v0.2 specification is published; all 12 curricula are digitized and mapped; and results are submitted for peer-reviewed publication. If accuracy falls below 50%, the program convenes a technical review to determine whether the IR architecture requires fundamental revision or the approach is non-viable. Between 50% and 85%, the program may extend Phase 1 by up to 6 months for architectural refinement before re-evaluation.
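
The gate logic above can be written down as a small decision function. This is a sketch under stated assumptions: the names are hypothetical, accuracy is expressed as a fraction, and the HOLD outcome (accuracy met but another condition outstanding) is an assumption, since the proposal does not say what happens in that case.

```python
from enum import Enum


class GateOutcome(Enum):
    PROCEED = "release Phase 2 funding"
    EXTEND = "extend Phase 1 by up to 6 months, then re-evaluate"
    REVIEW = "convene technical review of IR viability"
    HOLD = "accuracy met, other conditions outstanding"  # assumption


def phase1_gate(accuracy, developers_mapped, ir_v02_published,
                curricula_mapped, results_submitted):
    """Apply the Month 24 thresholds; accuracy is a fraction in [0, 1]."""
    if accuracy < 0.50:
        return GateOutcome.REVIEW
    if accuracy < 0.85:
        return GateOutcome.EXTEND
    others_met = (developers_mapped >= 3 and ir_v02_published
                  and curricula_mapped >= 12 and results_submitted)
    return GateOutcome.PROCEED if others_met else GateOutcome.HOLD


print(phase1_gate(0.88, 3, True, 12, True).value)
print(phase1_gate(0.72, 4, True, 12, True).value)
print(phase1_gate(0.41, 4, True, 12, True).value)
```

Writing the rule this way makes the boundary cases explicit: exactly 50% falls in the extension band, and exactly 85% counts as meeting the accuracy target.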

13.3 Reporting

The program provides quarterly progress reports to the Gates Foundation, including: milestones achieved, accuracy metrics, budget expenditure, and risk register updates. A comprehensive mid-term review is conducted at the Phase 1 gate (Month 24).

13.4 What If the Curriculum IR Approach Proves Non-Viable?

If the Curriculum IR fails to achieve ≥50% accuracy at the Phase 1 gate, the program will have produced three outputs with independent value: (a) 6 digitized curricula in CASE-compliant format, the first machine-readable African curriculum canon; (b) a rigorous empirical assessment of the IR hypothesis, informing future research directions; (c) an LLM-based concept extraction pipeline with documented accuracy metrics. The digitized curricula and extraction pipeline serve the broader DPI-Ed ecosystem regardless of the IR's ultimate viability.


14. Risk Mitigation

Risk | Mitigation
IR architecture proves too lossy for foundational subjects | Phase 1 targets math and literacy (K–3), where cross-curricular concept overlap is highest and the IR approach is on strongest theoretical ground. Desk pilot validates before full investment.
Curriculum source documents unavailable or incomplete | AUDA-NEPAD's Ministry relationships provide direct access to African curricula; IISc and IIIT Bangalore provide access to Indian state curricula; KICD and other national/state curriculum bodies are named partners
Ministry engagement insufficient for validation | The program includes funded Ministry training and engagement activities; AUDA-NEPAD's existing relationships de-risk sovereign participation. Ministry adoption is incentivized by three outputs: free curriculum digitization, access to the full courseware network, and the eventual ability to produce their own curriculum-to-IR mappings (the foundation for cross-jurisdictional comparability that underpins RBF4Ed funding). Ministries are not burdened with approving individual courseware alignments.
PI recruitment contingent on institutional commitment | AUDA-NEPAD's endorsement is pursued first; PI recruitment follows. Co-PI structure with multiple candidates per domain provides redundancy; the program can proceed with any two of the three domain leads
LLM accuracy insufficient for production use | Human-in-the-loop validation is built into the design at every stage. LLMs accelerate expert judgment through automation of initial concept extraction and alignment suggestion.
Lock-in risk (premature standardization) | Explicit versioning from v0.1; sunset mechanisms for early mappings; open-source licensing (Apache 2.0) prevents single-institution control
48-month timeline proves insufficient | Phase structure allows useful outputs at each stage; Phase 1 alone (24 months) produces 12 digitized curricula, a validated IR v0.2, and open-source mapping tools, which are valuable even if Phase 2 requires extension
LLMs become accurate enough to map curricula without an IR | The Curriculum IR provides three capabilities that direct LLM mapping lacks: (a) governance and contestability (Ministries can audit mappings against a published specification); (b) compositionality (new curricula and courseware connect to the full network); (c) institutional permanence (the IR persists across LLM model generations). The IR and LLMs are complementary.
Ministries or courseware developers do not adopt the tools | Phase 1 includes partnership with 3–5 courseware developers and direct Ministry engagement in 12 jurisdictions. Phase 2 measures adoption rates. AUDA-NEPAD's relationships with all 55 AU Ministries provide the sovereign engagement pathway.
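
The compositionality point in the table can be illustrated with a minimal sketch. It assumes each mapping is a plain dictionary from an item to a set of identifiers. All identifiers and function names are hypothetical, and the actual Curriculum IR data model is not specified here.

```python
def direct_mappings(n_apps, n_standards):
    """Mappings needed when every app is mapped to every standard."""
    return n_apps * n_standards


def ir_mappings(n_apps, n_standards):
    """Mappings needed when apps and standards each map once to the IR."""
    return n_apps + n_standards


def compose(app_to_ir, ir_to_standard):
    """Derive an app-to-standard mapping by composing through the IR."""
    result = {}
    for item, concepts in app_to_ir.items():
        targets = set()
        for concept in concepts:
            targets |= ir_to_standard.get(concept, set())
        result[item] = targets
    return result


# Hypothetical identifiers, for illustration only.
app_to_ir = {"lesson-12": {"IR:add-within-10"},
             "lesson-13": {"IR:count-to-20", "IR:add-within-10"}}
ir_to_standard_a = {"IR:add-within-10": {"A-1.3"},
                    "IR:count-to-20": {"A-1.1"}}

derived = compose(app_to_ir, ir_to_standard_a)
print(derived["lesson-12"])

# With 5 courseware developers and 12 jurisdictions (the Phase 1 upper bound):
print(direct_mappings(5, 12), "direct vs", ir_mappings(5, 12), "via the IR")
```

Each new curriculum adds one IR mapping and immediately yields derived mappings for every app already connected, which is the sense in which new entrants connect to the full network.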

14.1 Degraded Operations: What Ships If Dependencies Slip

ECM's principal external dependencies are AUDA-NEPAD's Ministry access (for curriculum documents) and the Sunbird taxonomy service (for platform infrastructure). Neither is a hard blocker.

If Ministry access is delayed in specific countries:

If Sunbird adaptation proves more complex than expected:

If LLM accuracy falls short:

14.2 Go/No-Go Gates

Gate | Timing | Condition | Action if Not Met
Phase 1 → Phase 2 release | Month 24 | ≥85% concept-level accuracy across 12 curricula; at least 3 courseware-to-IR mappings; IR v0.2 published; all 12 curricula digitized and mapped | If accuracy ≥50% but <85%: extend Phase 1 by up to 6 months for architectural refinement. If accuracy <50%: convene technical review to assess viability.
Desk pilot checkpoint | Month 6 | Prototype Curriculum IR constructed; concept-mapping accuracy measured against expert ground truth for 2 curricula | If results are negative, program pivots design before committing Phase 1's full investment
Courseware validation | Month 18 | At least 1 courseware developer has mapped content to the Curriculum IR with measurable results | If no developer adoption, program intensifies partnership efforts and adjusts tooling priorities
Phase 2 completion | Month 48 | IR v1.0 published; operational tools delivered; at least 3 courseware applications deploying across all 12 jurisdictions | Joint funder-program review; scope and timeline adjustment for any incomplete deliverables

15. Expected Outcomes and Impact

15.1 Direct Outputs

15.2 Beneficiary Population

The 6 African countries and 6 Indian states targeted in this program collectively serve tens of millions of K–3 students. At USD 10 million for infrastructure serving this population, the per-student investment is negligible, and it amortizes toward zero as additional courseware applications and jurisdictions join the network. At full AU-wide deployment (approximately 170 million K–3 children across 55 member states), the per-student infrastructure cost would fall below USD 0.10.
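
The per-student figure can be reproduced as follows. This sketch assumes that only the USD 10 million program cost is counted and that it is spread evenly over the stated K–3 population; the names are illustrative.

```python
PROGRAM_COST_USD = 10_000_000
AU_K3_CHILDREN = 170_000_000  # approximate figure quoted above


def per_student_cost(cost_usd, students):
    """Infrastructure cost per student, in USD."""
    return cost_usd / students


def students_for_target(cost_usd, target_usd_per_student):
    """Population at which per-student cost reaches the target."""
    return cost_usd / target_usd_per_student


au_wide = per_student_cost(PROGRAM_COST_USD, AU_K3_CHILDREN)
break_even = students_for_target(PROGRAM_COST_USD, 0.10)

print(f"AU-wide cost per student: USD {au_wide:.3f}")
print(f"Students needed to reach USD 0.10 each: {break_even:,.0f}")
```

Under these assumptions the AU-wide figure is roughly USD 0.06 per student, and the USD 0.10 threshold is crossed once about 100 million students are served.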

15.3 Downstream Impact

If the Curriculum IR achieves its target of ≥85% concept-level accuracy for foundational literacy and numeracy across 12 curricula (6 African, 6 Indian), it will:

Every year without curriculum interoperability infrastructure is another year in which the 90% illiteracy rate compounds. The children currently in K–3 across Sub-Saharan Africa will age out of foundational learning within this program's 48-month timeline. ECM is the structural prerequisite for deploying effective courseware across African countries at a cost that education financing can sustain. The Curriculum IR makes this possible.

15.4 If the Program Exceeds Expectations

If Phase 2 validation achieves ≥85% accuracy, the immediate scaling path is: (a) extend to all 55 AU member states over 3–5 years, at an estimated cost of approximately USD 500,000 per additional country for digitization and mapping; (b) extend to secondary subjects (science, social studies) and upper grades; (c) invite non-African, non-Indian countries to contribute dialects to the Curriculum IR. The total estimated cost to reach all AU member states at K–3 is USD 30–35 million, fundable through a combination of Gates follow-on grants, GPE allocations, and Ministry co-funding.
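
The scaling estimate can be checked with simple arithmetic. The sketch below assumes that the USD 30–35 million total is cumulative (it includes this program's USD 10 million) and that the 6 African countries already covered need no further digitization spend. Both readings are assumptions, not statements from the proposal.

```python
AU_MEMBER_STATES = 55
COUNTRIES_IN_PROGRAM = 6            # African countries covered by this grant
COST_PER_ADDITIONAL_COUNTRY = 500_000  # USD, approximate
PROGRAM_COST = 10_000_000           # USD, this proposal

additional_countries = AU_MEMBER_STATES - COUNTRIES_IN_PROGRAM
extension_cost = additional_countries * COST_PER_ADDITIONAL_COUNTRY
cumulative_cost = PROGRAM_COST + extension_cost  # assumption: cumulative total

print(f"Additional countries: {additional_countries}")
print(f"Extension cost: USD {extension_cost:,}")
print(f"Cumulative cost: USD {cumulative_cost:,}")
```

On this reading, 49 additional countries cost about USD 24.5 million, for a cumulative USD 34.5 million, which sits at the upper end of the stated range.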


16. Sustainability and Scaling

16.1 Post-Grant Institutional Home

Following the 48-month program, Curriculum IR governance transfers to the proposed SOCLE Board (Standard for Open Curriculum Logic in Education), based in the Gulf for political neutrality. The Gulf is chosen because the Curriculum IR is designed as global infrastructure (serving African, Indian, and eventually Latin American, Southeast Asian, and other jurisdictions) and its governance body should be perceived as neutral by all participating regions. During the 48-month research phase, AUDA-NEPAD retains operational leadership; the transition to the Gulf-based SOCLE Board occurs as part of Phase 2's sustainability transition.

The GEOS Organization (also proposed for the Gulf) governs outcome certification for Results-Based Finance for Education. The GEOS Organization also assumes responsibility for training and certifying curriculum mapping auditors worldwide. This is a natural extension of its quality-assurance mandate, since the Curriculum IR mapping is upstream in the same pipeline as outcome certification.

16.2 Funding Mechanism and Global Scaling

The key post-grant challenge is scaling beyond the 12 research jurisdictions to the approximately 1,000 curriculum jurisdictions worldwide that will eventually need certified curriculum-to-IR mappings.

Infrastructure maintenance is funded through three channels: (a) the RESPECT Ecosystem Fund (see Essay 23), which allocates a percentage of platform-wide transaction fees to ecosystem maintenance; (b) successor grants or GPE allocations for curriculum update cycles; (c) AUDA-NEPAD's recurring continental education infrastructure budget for African-specific operations. Estimated annual maintenance cost: USD 500,000–800,000 for specification updates, curriculum re-digitization cycles, and tool maintenance.

Mapper training and certification is the larger scaling challenge. For every curriculum jurisdiction that participates in the Curriculum IR (ultimately numbering in the hundreds or thousands), Ministry personnel must be trained to produce certification-ready curriculum-to-IR mappings of their constantly evolving standards (especially as the Curriculum IR itself evolves across versions). The GEOS Organization governs this training and certification function, analogous to its role in certifying outcome assessors ("GEOSors"; see Essay 07). This is pipeline integration: GEOS certifies that a jurisdiction's curriculum-to-IR mapping meets quality standards (upstream) and separately certifies that learning outcomes measured through that mapping meet finance-grade evidence standards (downstream). The training and certification program is funded through certification fees, scaled to jurisdiction size, and cross-subsidized by RBF4Ed transaction fees flowing through the RESPECT Ecosystem.

16.3 Dissemination

Research outputs will be disseminated through: (a) peer-reviewed publications in education technology, computational linguistics, and standards interoperability venues; (b) presentation at AUDA-NEPAD's education technology convenings; (c) open-source release of all specifications, tools, and datasets on GitHub under Apache License 2.0; (d) a public-facing project website with documentation for Ministries and courseware developers.

16.4 Intellectual Property

The PREMIER Institute owns all intellectual property resulting from ECM research. Funding partners receive a worldwide, paid-up, royalty-free, sub-licensable, non-exclusive license to all such IP. Code and specifications (the Curriculum IR specification, mapping tools, validation infrastructure) are released under the Apache License 2.0. Creative works (documentation illustrations, training materials artwork) are released under the appropriate Creative Commons license. University research partners retain academic publication rights; all code and infrastructure deliverables are owned by the PREMIER Institute.

Digitized curricula are published under open licenses; the underlying curriculum content remains the intellectual property of the issuing government. EdGate's licensed database remains proprietary; the Curriculum IR specification is designed to function independently of any proprietary data source. The EdGate license should be structured to terminate when EdGate's relevant patents expire (expected during the program's timeline), unless EdGate can demonstrate independent copyright or other IP protection for its database content. Post-grant, the Spix Foundation assumes the recurring license cost for the duration of the license.


17. Conclusion

ECM is AI-era education infrastructure. The Curriculum IR applies a proven architectural pattern (demonstrated by TCP/IP for networking, LLVM for compilers, and FHIR for healthcare interoperability) to the structural barrier preventing digital courseware from reaching the children who need it most. The program's methodology reflects the same convergence: AI-assisted research and development tools compress the technical timeline, while the institutional architecture (AUDA-NEPAD coordination, Ministry engagement, SOCLE Board governance, and global Mapper certification) receives the sustained investment that determines whether infrastructure achieves adoption.

The 90% functional illiteracy rate among ten-year-olds in Sub-Saharan Africa is a structural failure requiring a structural solution. ECM provides that solution: a canonical intermediate layer that converts the prohibitive O(Apps×Standards) cost of curriculum mapping into sustainable O(Apps+Standards) infrastructure. The Curriculum IR makes it economically rational for courseware developers to serve every African country and, through automated curriculum-aligned assessment, creates the evidentiary foundation for Results-Based Finance for Education at continental scale.

This research program will determine whether the Curriculum IR works. If it does, the tools, specifications, and institutional framework produced over 48 months will enable deployment across the continent and beyond. The architectural pattern has been proven three times. The AI-accelerated tooling to build it exists. The institutional partners are ready. The children are waiting.


Appendices