Proposed as: A research project within the PREMIER Institute (Platform Research and Engineering for Modern Infrastructure in Education Readiness)
Proposed by: The Spix Foundation, for consideration by the African Union Development Agency (AUDA-NEPAD), in partnership with SIL International and university research institutions in Africa and the United States
Duration: 48 months (4 years, aligned with ECM research timeline)
Requested funding: USD 8 million (staged across two phases with a go/no-go gate)
Nine out of ten children in Sub-Saharan Africa cannot read a simple sentence by age ten. Digital courseware that teaches foundational literacy and numeracy (FLN) exists, but it cannot deploy across African languages because localizing FLN courseware requires redesigning the entire pedagogical architecture: the localization surface spans all instructional sequences, all phonics scaffolding, and all decodable-text inventories. This FLN localization bottleneck is a structural barrier to mother-tongue foundational learning at continental scale.
FLN courseware teaches the act of reading. In any given written language, the concepts of symbol, sound, meaning, and mnemonic are deeply intertwined. The pedagogical sequence (which letters to introduce first, which grapheme-phoneme correspondences to teach, which decodable words to construct, which blending exercises to scaffold) depends on the frequency and regularity of grapheme-phoneme correspondences in the specific target language. Changing the language changes the pedagogy. onebillion, co-winner of the Global Learning XPRIZE, reports that each language localization of its onecourse app requires approximately 180,000 words of contextually adapted content. Seven years after the codebase became open-source, it is available in only five languages (as of 2024).
A structural solution is feasible. Perfetti and Verhoeven's 2022 study of seventeen orthographies across five writing system types identified two universals: the Universal Writing System Constraint (all writing systems encode language and reflect basic properties of the linguistic system they encode) and the Universal Phonological Principle (reading activates phonology across all writing systems). Ziegler and Goswami's Psycholinguistic Grain Size Theory provides the parametric framework: all writing systems map written symbols to linguistic units, differing in the grain size and consistency of the mapping.
Africa has a structural advantage: the vast majority of its written languages use Latin or Arabic script; Ethiopic (Ge'ez) serves the languages of the Horn, and a small number of indigenous scripts (N'Ko, Tifinagh, Adlam, Vai) serve specific language communities. Most African Latin-script orthographies are transparent: they were designed by linguists in the 20th century with consistent grapheme-phoneme correspondences.
This proposal requests USD 8 million over 48 months to build Easy FLN Localization: a Writing Intermediate Representation (Writing IR) that captures the deep structural invariants among written languages: graphemes, phonemes, grapheme-phoneme correspondences, syllable structures, morphological rules, letter-introduction sequences, decodable word inventories, and pedagogical scaffolding patterns. The architectural pattern is the same one that makes Easy Curriculum Mapping (ECM) possible at O(Apps+Standards) cost. Languages will map once to the Writing IR; FLN courseware apps will map once to the same Writing IR. The result: automated FLN localization through parameterization, replacing manual redesign per language.
The funder commits USD 8M over 48 months, disbursed in two phases with a go/no-go gate at Month 24. Phase 1 (Months 1–24): USD 4.5M. Phase 2 (Months 25–48): USD 3.5M. Phase 2 funding is contingent on Phase 1 deliverables. Easy FLN Localization is a "Big Easy" housed within the PREMIER Institute but independently fundable, with its own Founder attribution.
Phase 1 success is defined by the Month-24 Proof of Capability outcome set (Section 1B). If Phase 1 deliverables are met, Phase 2 funding is released. If they are not met, the project convenes a technical review to determine whether the IR architecture requires revision, the timeline requires extension, or the approach is non-viable (see Section 10.2).
Easy FLN Localization reports through the PREMIER Institute's governance structure. The project's Lead PI reports quarterly to the funder and to the PREMIER Institute Director. Independent evaluation is conducted at the Phase 1 gate (Month 24) and at project completion (Month 48) by an external evaluator. AUDA-NEPAD provides institutional coordination.
The PREMIER Institute owns all intellectual property resulting from Easy FLN Localization research. Funding partners receive a worldwide, paid-up, royalty-free, sub-licensable, non-exclusive license to all such IP. Code and specifications (the Writing IR specification, parameterization tools, computational phonology pipeline) are released under the Apache License 2.0. Creative works (illustrations, audio recordings, training materials artwork) are released under the appropriate Creative Commons license. University research partners retain academic publication rights; all code and infrastructure deliverables are owned by the PREMIER Institute.
Attribution is distinct from authority. Founder attribution and any institutional hosting rights are recognition mechanisms; they confer no governance authority over research agendas, language policy, or platform operations. Language policy authority remains with national governments. Continental coordination authority remains with AUDA-NEPAD. Technical infrastructure authority remains with the RESPECT Platform's technical steward. The Writing IR does not prescribe orthography or language policy; it parameterizes existing written languages as defined by their linguistic communities and national standards.
The following concrete outcomes define Phase 1 success and gate Phase 2 funding:
FLN courseware is categorically harder to localize than courseware for already-literate learners. A Grade 5 history lesson assumes its students are already literate; an FLN app treats language as the content itself. The localization surface is the entire pedagogical architecture.
The cost consequence is severe. onebillion reports approximately 180,000 words of content per language, each requiring contextual adaptation so that content is "culturally relevant and introduces letters in an order that makes sense." Each localization involves linguists, phonics specialists, audio engineers, illustrators, and pedagogy experts. Cisco-funded tooling improvements enabled multiple languages to be localized simultaneously (a throughput optimization), but the total effort per language remains substantial because each new language requires rebuilding the pedagogical scaffolding from scratch.
PREMIER's planned Easy Text Localization project addresses general-purpose translation and adaptation of educational content for already-literate learners. It is designed for content where language is the delivery medium. Easy FLN Localization addresses content where language is the pedagogical substance. The two projects are complementary: Easy Text Localization handles post-literacy courseware; Easy FLN Localization handles pre-literacy courseware.
Africa has approximately 2,000 languages and dozens of national curriculum standards. The Breakthrough Project's Phase 1 targets six countries, K-3, Foundational Literacy and Foundational Numeracy, in the AU languages of those countries. Phase 2 expands to ~21 countries. Phase 3 targets at least 44 countries (80% of AU Member States). At each expansion, the number of African languages requiring FLN localization grows. Without structural cost reduction, each language × each FLN app is a year-long manual localization effort, which is simply too expensive to scale.
Foundational Numeracy presents a related but more tractable challenge. Arabic numerals (0–9) and positional notation are universal across African education systems. The mathematical concepts (counting, addition, subtraction, quantity comparison) are language-independent. What varies is the verbal layer: number words, counting conventions (some African languages use base-5 or base-20 counting alongside the base-10 system used in school), number-word irregularities, and word-problem contexts. Dehaene's Triple Code Model (1992) establishes that number representation involves three codes: a visual Arabic code (universal), an auditory verbal code (language-specific), and an analog magnitude code (universal). The verbal layer is a thinner localization surface than literacy's. Easy FLN Localization will treat Foundational Numeracy's verbal layer as a special case within the Writing IR: the same architectural framework, with a thinner parameter set. Specifically, the Numeracy verbal layer uses the Phoneme inventory, Morphological rules, and Pedagogical scaffolding patterns of the core Writing IR, adding a dedicated parameter set covering number words, counting conventions, and word-problem templates. This follows the Multi-Level IR (MLIR) design principle of accommodating domain-specific "dialects" within a unified IR infrastructure (Lattner et al., "MLIR: Scaling Compiler Infrastructure for Domain Specific Computation," IEEE/ACM CGO 2021).
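As a rough illustration of how thin the numeracy verbal layer could be, the sketch below composes regular number words from a small parameter set. The class name, field names, and the English glosses standing in for number words are all hypothetical; the proposal does not define this schema, and no real language's number system is represented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumeracyParams:
    """Hypothetical verbal-layer parameters for one language."""
    base: int          # counting base of the spoken number system
    unit_words: dict   # words for 1 .. base-1
    base_word: str     # word for the base itself


def number_word(n: int, p: NumeracyParams) -> str:
    """Compose a fully regular number word for 1 <= n < base**2."""
    if not 0 < n < p.base ** 2:
        raise ValueError("n is out of range for this sketch")
    high, low = divmod(n, p.base)
    parts = []
    if high:
        parts += [p.unit_words[high], p.base_word]
    if low:
        parts.append(p.unit_words[low])
    return " ".join(parts)


# English glosses used as placeholders, not as a claim about any language.
GLOSSES = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]

decimal = NumeracyParams(
    base=10,
    unit_words={i + 1: w for i, w in enumerate(GLOSSES[:9])},
    base_word="ten",
)
vigesimal = NumeracyParams(
    base=20,
    unit_words={i + 1: w for i, w in enumerate(GLOSSES)},
    base_word="twenty",
)

print(number_word(23, decimal))    # two ten three
print(number_word(23, vigesimal))  # one twenty three
```

The arithmetic (`divmod`) is shared; only the parameters change between the base-10 and base-20 cases. Number-word irregularities would need an additional override table, which this sketch omits.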
Four developments have converged to make Easy FLN Localization feasible today:
The psycholinguistic evidence base has matured. Perfetti and Verhoeven (2022) demonstrated universals across 17 orthographies. Ziegler and Goswami (2005) provided the parametric framework. Makalela (2024) validated the Orthographic Depth Hypothesis and Morphological Transparency Hypothesis with longitudinal data from African bilingual children. The theoretical foundation for a formal Writing IR is established.
Computational grapheme-to-phoneme models cover hundreds of languages. Deri and Knight (2016, ACL) built computational G2P models covering 531 languages. Transformer-based models have extended this coverage further. The component technology for automated phonological analysis exists.
SIL International has built component tools. PrimerPrep analyzes language data to recommend optimal letter-teaching sequences. Bloom creates decodable readers in any language by separating pedagogical structure from language-specific parameters. These tools demonstrate that aspects of the Writing IR problem have been solved in isolation.
ECM proposes the architectural precedent. If the Curriculum IR succeeds, it will have demonstrated that a formal intermediate representation can collapse an Apps×Standards cost problem to O(Apps+Standards) in the education domain. Easy FLN Localization applies the same hypothesis to a sibling problem, within the same institute, using shared methodology.
Written languages, despite their surface diversity, share deep structural invariants. All writing systems encode language through systematic mappings between written symbols and linguistic units. The mappings differ in grain size (phoneme, syllable, morpheme) and consistency (transparent vs. opaque), but these dimensions are parametric. A formal intermediate representation that captures these invariants enables FLN localization through parameterization: supply a new language's parameters, and the pedagogical scaffolding is generated from the shared framework.
The Writing IR will encode the structural elements of FLN pedagogy at an abstract level:
National languages will map once to the Writing IR by supplying their language-specific parameters. FLN courseware apps will map once to the Writing IR by expressing their pedagogical structure in terms of the IR's abstract elements. The crosswalk between a language and a courseware app will be computed automatically from these two independent mappings.
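The two independent mappings and the computed crosswalk can be sketched as follows. This is a minimal illustration under stated assumptions, not the Writing IR design: `LanguageParams`, `AppMapping`, `crosswalk`, and the toy language data are hypothetical names and values introduced here, and only single-character graphemes are handled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageParams:
    """Hypothetical language-to-IR mapping, supplied once per language."""
    code: str
    gpc: dict[str, str]          # grapheme -> phoneme (toy, fully transparent)
    letter_sequence: list[str]   # order in which graphemes are introduced
    words: list[str]             # candidate vocabulary for decodable text


@dataclass(frozen=True)
class AppMapping:
    """Hypothetical app-to-IR mapping, supplied once per courseware app."""
    name: str
    letters_per_unit: int        # new graphemes introduced in each unit


def decodable(word: str, known: set[str]) -> bool:
    """A word is decodable once every one of its letters has been taught."""
    return all(ch in known for ch in word)


def crosswalk(lang: LanguageParams, app: AppMapping) -> list[dict]:
    """Derive an app's unit plan for a language from the two mappings."""
    if not set(lang.letter_sequence) <= set(lang.gpc):
        raise ValueError("letter sequence uses graphemes missing from the GPC table")
    units, known = [], set()
    step = app.letters_per_unit
    for i in range(0, len(lang.letter_sequence), step):
        new = lang.letter_sequence[i:i + step]
        known.update(new)
        units.append({
            "new_graphemes": new,
            "decodable_words": [w for w in lang.words if decodable(w, known)],
        })
    return units


# Toy data only; this is not a real orthography.
toy = LanguageParams(
    code="xx",
    gpc={"a": "a", "m": "m", "t": "t", "i": "i"},
    letter_sequence=["a", "m", "t", "i"],
    words=["ma", "am", "mat", "tam", "tim"],
)
app = AppMapping(name="demo-app", letters_per_unit=2)
plan = crosswalk(toy, app)
print(plan[0])  # {'new_graphemes': ['a', 'm'], 'decodable_words': ['ma', 'am']}
```

Neither mapping refers to the other: adding a second app or a second language means writing one more `AppMapping` or `LanguageParams`, and `crosswalk` covers every pairing.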
The IR pattern is proven in other domains. LLVM captures computation at a stable, representation-independent level: source languages compile once to the IR, target architectures map once from the IR, and the result is linear cost instead of quadratic. TCP/IP captures network communication at a stable, representation-independent level: applications map once to the protocol stack, physical networks map once to the protocol stack. Easy FLN Localization applies the identical pattern to FLN pedagogy: a canonical intermediate layer, two families of linear mappings, and automatic crosswalk computation. ECM proposes to apply this pattern to curriculum alignment; Easy FLN Localization proposes to apply it to FLN localization. The two projects share IR design methodology, validation approach, governance framework, and sustainability model.
The Writing IR is analogous to the abstraction layer in Dynamic Tonality and the JIMS Isomorphic Music System (JIMS), to which the Spix Foundation's CEO, Jim Plamondon, was a contributor. In music, the intervals between notes in a major triad follow the same pattern regardless of the triad's root, not only in twelve-tone equal temperament but across the entire valid tuning range of the syntonic temperament, which includes the musical tunings of many non-Western cultures and eras. JIMS encodes the relationships between musical elements at a deep structural level, then renders them onto interfaces where the same physical gesture produces the same musical interval regardless of key or tuning. The Writing IR does the same for FLN pedagogy: it encodes the relationships among the elements of literacy at a deep structural level, then renders them into language-specific courseware where the same pedagogical pattern produces the same learning outcome regardless of target language.
The vast majority of Africa's written languages use Latin or Arabic script; Ethiopic (Ge'ez) serves the languages of the Horn, and a small number of indigenous scripts (N'Ko, Tifinagh, Adlam, Vai) serve specific language communities. Most African Latin-script orthographies were designed by linguists in the 20th century with consistent grapheme-phoneme correspondences; they are transparent. This constrains the parameter space for the Writing IR: transparent orthographies have regular GPC tables that map cleanly to the kind of formal representation the IR requires.
Mother-tongue literacy in a shared script transfers to colonial-language literacy at low cost. Research published in Economics of Education Review (2022) confirms that mother tongue reading materials serve as a bridge to second-language literacy. A child literate in Setswana has internalized the Latin alphabet's visual system and the cognitive operation of alphabetic decoding. Acquiring English literacy then requires learning English-specific grapheme-phoneme mappings and vocabulary; the concept of alphabetic reading itself does not need to be relearned.
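One simple way to make "regular GPC tables" concrete is to measure the share of graphemes that map to exactly one phoneme. The function below is a deliberately crude proxy for orthographic transparency, offered as a sketch; the function name and both toy tables are hypothetical and do not describe any real orthography.

```python
def gpc_consistency(gpc_table: dict) -> float:
    """Share of graphemes that map to exactly one phoneme (1.0 = fully regular)."""
    if not gpc_table:
        raise ValueError("empty GPC table")
    one_to_one = sum(1 for phonemes in gpc_table.values() if len(phonemes) == 1)
    return one_to_one / len(gpc_table)


# Toy tables: grapheme -> set of phonemes it can represent.
transparent_toy = {"a": {"a"}, "m": {"m"}, "t": {"t"}, "s": {"s"}}
opaque_toy = {"a": {"a", "e"}, "c": {"k", "s"}, "m": {"m"}, "t": {"t"}}

print(gpc_consistency(transparent_toy))  # 1.0
print(gpc_consistency(opaque_toy))       # 0.5
```

A fuller measure would weight graphemes by frequency and also check the phoneme-to-grapheme direction; this version only shows why a transparent table is easier to represent formally.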
This within-script transfer mechanism multiplies the Writing IR's value: localizing FLN courseware into a mother tongue using Latin script simultaneously prepares the child for literacy in (a) all of the African Union's official languages that share the same script (English, French, Portuguese, Spanish, Kiswahili), and (b) the vast majority of Sub-Saharan Africa's written languages.
Easy FLN Localization directly supports V&P_Core's scaling trajectory. Phase 1 targets six countries in AU official languages, which requires FLN courseware apps, each localized into at least one of those languages. Phase 2 expands to ~21 countries. Phase 3 targets at least 44 countries (80% of AU Member States). At each expansion, new languages require FLN localization. The Writing IR transforms this from a quadratic system cost (technically, O(Apps × Languages)) into a linear cost (O(Apps + Languages)).
ECM and Easy FLN Localization are siblings. Both build formal intermediate representations that capture deep structural invariants across a surface-diverse domain. Both collapse an N×M cost problem to O(N+M): from O(Apps×Standards) to O(Apps+Standards) for ECM, and from O(Apps×Languages) to O(Apps+Languages) for Easy FLN Localization. Both use manual expert work during Years 1–4 as the ground-truth foundation for the automated system that follows.
The two IRs are complementary. The Curriculum IR captures what must be taught: the learning objectives, concept sequences, and assessment expectations specified by a national curriculum. The Writing IR captures how literacy is taught in a given language: the grapheme-phoneme correspondences, letter-introduction sequences, and decodable-word inventories that constitute the pedagogy. Localizing an FLN app to a new language in a new country requires both: the Writing IR for the language-specific pedagogy, and the Curriculum IR for the curriculum-specific alignment.
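The difference between the two cost curves is easy to see with a few numbers. The ecosystem sizes below are hypothetical, chosen only to show the growth pattern, and the counts are units of work (localizations versus mappings), not dollar figures.

```python
def manual_cost(apps: int, languages: int) -> int:
    """Localization efforts when every app is localized separately per language."""
    return apps * languages


def ir_cost(apps: int, languages: int) -> int:
    """Mapping efforts when each app and each language maps once to the IR."""
    return apps + languages


# Hypothetical ecosystem sizes (apps, languages).
for apps, languages in [(5, 6), (10, 50), (20, 200)]:
    print(apps, languages, manual_cost(apps, languages), ir_cost(apps, languages))
```

For 10 apps and 50 languages this gives 500 pairwise localizations against 60 mappings. One mapping is not necessarily as cheap as one localization, so the comparison shows how each approach scales rather than predicting a cost ratio.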
Housing both projects in the PREMIER Institute enables internal handoff, researcher cross-pollination, and coordinated IR design.
Easy Text Localization facilitates localization of post-literacy courseware. Easy FLN Localization facilitates localization of pre-literacy courseware. A localized FLN app will use both: Easy Text Localization for UI text and instructional scaffolding addressed to literate users (e.g., teacher guides), and Easy FLN Localization for the phonics, decoding, and early reading content that constitutes the learner-facing pedagogy.
XPRIZE's Accelerate Learning Challenge ($10M, 2025–2029) will produce finalists (FLN courseware apps) that must be localized into the AU official languages of the Breakthrough Project's participating countries when they enter the RESPECT Ecosystem during Phase 2 (see Essay 28, XPRIZE & the Breakthrough Project). These finalists will arrive with content in one or a few languages; the RESPECT Ecosystem must localize them across dozens. During Phase 2, this localization will be performed manually (Track 1). If the Writing IR research succeeds, it will enable this at scale from Year 5 onward (Track 2).
The Breakthrough System will employ a two-track strategy for FLN localization, mirroring the ECM two-track strategy:
Track 1: Manual FLN Localization (Years 1–4). While the Writing IR is under development, localizing FLN courseware is simply too expensive for the Breakthrough Project to undertake.
Track 2: Writing IR (Year 5+). By the end of Year 4, the Writing IR is expected to reach operational readiness, enabling automated FLN localization through parameterization. From Year 5 onward, localizing an FLN app into a new language requires supplying the language-specific parameters to the Writing IR framework, a process measured in weeks and performed once per language, making that language available to all RESPECT Compatible FLN apps for free.
If this project produces a validated Writing IR specification and parameterization tools (outputs), then FLN courseware developers can localize across African languages by parameterizing once to the Writing IR (immediate outcome), which enables mother-tongue FLN delivery at continental scale (intermediate outcome), which directly addresses the 89% functional illiteracy rate among ten-year-olds in Sub-Saharan Africa (long-term impact).
A comprehensive review of reading science, computational linguistics, and FLN localization practice identified the empirical foundations and design constraints for the Writing IR:
| Evidence Base | Design Implication |
|---|---|
| Perfetti & Verhoeven (2022): Universal Writing System Constraint and Universal Phonological Principle across 17 orthographies | The Writing IR rests on empirically validated universals, applied to 3.7 billion speakers across 5 writing system types |
| Ziegler & Goswami (2005): Psycholinguistic Grain Size Theory, with grain size and consistency as parametric dimensions | The Writing IR parameterizes languages along these two dimensions, enabling systematic variation within a unified framework |
| Dehaene (1992): Triple Code Model, with visual, verbal, and magnitude codes for number representation | Foundational Numeracy localization reduces to the verbal code, a thinner parameter set handled as a special case within the Writing IR |
| Makalela (2024): Orthographic Depth Hypothesis and Morphological Transparency Hypothesis, validated with African bilingual children | Africa's transparent orthographies constrain the parameter space and facilitate cross-language transfer |
| Deri & Knight (2016): Computational G2P models covering 531 languages | Automated grapheme-to-phoneme analysis is available as a component technology for the Writing IR's GPC tables |
| SIL PrimerPrep and Bloom: letter-sequence optimization and decodable-reader generation | Component tools that solve individual Writing IR sub-problems; the integration into a unified abstraction is the missing step |
| Global Proficiency Framework (UNESCO/USAID/World Bank): universal constructs for reading proficiency | Provides the assessment alignment layer for validating Writing IR-localized courseware against learning outcomes |
| onebillion localization data: ~180,000 words per language, 1+ year per localization | Quantifies the cost that the Writing IR is designed to collapse |
Five design constraints emerge from the literature:
| Constraint | Design Response |
|---|---|
| Orthographic depth variation (transparent vs. opaque scripts) | The Writing IR parameterizes orthographic depth; Africa's predominantly transparent orthographies are the primary target |
| Grain size variation (phoneme-level vs. syllable-level vs. morpheme-level mappings) | Multi-granularity support: the Writing IR represents GPC mappings at the grain size appropriate to each language |
| Cultural embedding of mnemonics (letter-sound associations are culturally specific) | Mnemonic associations are language-specific parameters, not IR-level structures; supplied per language alongside GPC data |
| Pedagogical sequencing constraints (letter-introduction order depends on language-specific frequency and regularity) | Letter-introduction sequences are computed from language-specific GPC data and frequency distributions, using algorithms validated against expert-produced sequences |
| FN verbal layer variation (number words, counting conventions) | Treated as a special case within the Writing IR: universal arithmetic structure + language-specific verbal parameters |
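The pedagogical sequencing row above says letter-introduction order is computed from frequency and regularity data. A minimal sketch of that idea follows, ranking graphemes by corpus frequency weighted by a consistency score. This is an assumed, simplified scoring rule for illustration only; it is not PrimerPrep's algorithm, and the function name, toy corpus, and consistency values are hypothetical.

```python
from collections import Counter


def letter_sequence(corpus: list, consistency: dict) -> list:
    """Rank graphemes by (relative frequency x consistency), highest first.

    Graphemes missing from `consistency` are treated as fully regular (1.0).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(ch for word in corpus for ch in word)
    total = sum(counts.values())
    score = {g: (n / total) * consistency.get(g, 1.0) for g, n in counts.items()}
    return sorted(score, key=lambda g: (-score[g], g))


# Toy corpus, not drawn from any real language.
corpus = ["mama", "mata", "tima", "sita"]

print(letter_sequence(corpus, {}))          # ['a', 'm', 't', 'i', 's']
print(letter_sequence(corpus, {"t": 0.5}))  # ['a', 'm', 'i', 't', 's']
```

Lowering the consistency of "t" pushes it later in the sequence even though it is more frequent than "i", which is the kind of trade-off that validation against expert-produced sequences would need to test.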
The 48-month timeline aligns with the ECM research timeline and the Breakthrough System's ecosystem design: Years 1–4 develop and validate the Writing IR; Year 5 transitions to operational deployment.
Goal: Validate the Writing IR concept through a desk pilot, produce the IR v0.2 specification, build language parameter sets for 12 African languages, and validate against real FLN courseware.
Phase 1 requires no prior deliverables. This is the project's starting point.
Milestones:
Goal: Prepare the Writing IR for operational deployment, achieve operational readiness, prepare the scaling pathway to all AU languages, and execute the sustainability transition.
Phase 2 requires the following Phase 1 deliverables as inputs: Writing IR v0.2; validated language parameter sets for 12 languages; at least 2 courseware-to-IR mappings; formal validation study results; computational phonology pipeline; open-source parameterization tools.
Milestones:
Four domains of expertise define the PI requirements for Easy FLN Localization:
Domain 1: Intermediate Representation Architecture. Deep expertise in designing, implementing, and scaling canonical intermediate representations for complex, heterogeneous systems. Direct experience with LLVM, MLIR, FHIR, or analogous IR systems. This expertise is shared with the ECM research team; a PI who serves both projects provides architectural coherence.
Domain 2: Reading Science and Psycholinguistics. Deep expertise in cross-linguistic reading acquisition, orthographic transparency, grapheme-phoneme correspondence theory, and the empirical basis for FLN pedagogy across writing systems.
Domain 3: Computational Phonology and African Linguistics. Expertise in computational grapheme-to-phoneme modeling, African language phonology, orthography development, and digital language resources for low-resource African languages.
Domain 4: FLN Courseware Development. Practical experience building and localizing Foundational Literacy and Numeracy courseware for African learners, including direct knowledge of the manual localization process and its costs.
IR Architecture (shared with ECM):
Reading Science and Psycholinguistics:
Computational Phonology and African Linguistics:
FLN Courseware Development:
Project Management and Software Development
The PREMIER Institute provides the engineering and project management capacity that bridges research and deployment. The development team will implement the Writing IR specification, build the parameterization tools, develop the computational phonology pipeline, and deliver the open-source tooling that linguists and courseware developers will use. If the Institute manages both ECM and Easy FLN Localization, project management overhead is shared and architectural coherence is maintained. The Spix Foundation is expected to provide expertise and development support with respect to the DPI-Ed/RESPECT Platform.
Core team (approximately 10 FTE across 48 months, scaling by phase): 4 FTE researchers (reading science, computational phonology, African linguistics), 2 FTE linguists/phonologists for field parameterization, and 4 FTE software engineers for IR implementation and tooling. Staffing levels vary by phase: Phase 1 emphasizes research and language parameterization; Phase 2 emphasizes tooling, training, and deployment.
Budget allocation model: Approximately 58% ± 3% of the project budget flows to personnel and university partner subgrants (researcher salaries, linguist fees, field work). The remaining 40–45% covers infrastructure, NLP model costs, program management, travel, evaluation, and contingency. Project management and platform integration costs are partially shared with ECM.
Partner selection: Research partners are selected based on: demonstrated expertise in reading science, computational phonology, or African linguistics; prior experience with applied research in FLN contexts; presence in or partnership with African institutions; and ability to meet open-source and open-access requirements.
Principal Investigator responsibilities: The Lead PI (shared with ECM) is responsible for Writing IR design and architectural coherence with the Curriculum IR. Each co-PI is responsible for their domain's research quality, deliverable acceptance criteria, and publication. The senior partner from onebillion provides experiential validation against real FLN courseware.
Quality assurance: Each phase undergoes independent external evaluation (Month 24 and Month 48). The desk pilot at Months 1–6 provides an early feasibility checkpoint. SIL International's field linguists provide independent validation of language parameter sets against established phonological analyses.
AUDA-NEPAD serves as the program's institutional home, providing continental legitimacy, existing relationships with all 55 AU member state Ministries of Education, and coordination with the broader Breakthrough System.
| Institution | Country | Contribution |
|---|---|---|
| Radboud University (Verhoeven) | Netherlands | Cross-linguistic reading acquisition research; Writing IR validation framework; 17-orthography empirical base |
| Aix-Marseille University / CNRS (Ziegler) | France | Grain Size Theory expertise; parametric framework for the Writing IR's core dimensions |
| University of the Witwatersrand (Makalela) | South Africa | African bilingual literacy research; Orthographic Depth Hypothesis validation; South African language expertise |
| University of Illinois at Urbana-Champaign (Adve) | USA | LLVM/IR architecture expertise; shared with ECM |
| Indian Institute of Science (Bondhugula) | India | MLIR/dialect architecture expertise; shared with ECM |
| SIL International | Global | African language phonology; PrimerPrep and Bloom tools; decades of literacy program experience across hundreds of African languages |
| Masakhane NLP research community | Africa-wide | Low-resource African language NLP; computational G2P models; language data |
| onebillion | UK/Tanzania | FLN courseware localization experience; open-source codebase; XPRIZE alumni experiential knowledge |
| Ministries of Education (Phase 1 countries) | Africa | Literacy specialists; mother-tongue language expertise; classroom validation sites |
| Spix Foundation | USA | DPI-Ed/RESPECT Platform development expertise; project management |
| ECM research team (PREMIER) | n/a | Curriculum IR coordination; shared IR architecture; validation of combined pipeline |
Easy FLN Localization may produce governance or standards outputs that outlive the research project:
Writing IR governance: If the Writing IR achieves adoption, its specification requires ongoing maintenance: versioning, new-language onboarding procedures, parameter quality certification. This governance function transfers to the PREMIER Institute's ongoing standards process (or a dedicated Writing IR governance body if the scope warrants it). The Writing IR does not prescribe orthography or language policy; it parameterizes existing written languages. The parameter lifecycle follows four stages: (1) language parameter submission by a field linguist or literacy specialist using the parameterization toolkit; (2) validation against IR constraints (GPC consistency, syllable structure completeness, decodable-word coverage); (3) certification by a qualified literacy specialist confirming pedagogical validity; (4) deployment to the RESPECT Platform, making the language available to all RESPECT Compatible FLN apps.
Parameterization specialist roles: Literacy specialists and field linguists trained during the project transition into ongoing Writing IR parameter elicitation, quality assurance, and validation roles, maintaining and extending language coverage as the Breakthrough System scales from 6 to 21+ countries.
The project's role is to produce the research and infrastructure that these functions require; governance authority flows from the Breakthrough System's established structures.
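The four-stage parameter lifecycle described under Writing IR governance can be read as a small state machine in which a parameter set advances only when the check for its current stage passes. The sketch below is one assumed way to model it; the `Stage` enum and `advance` function are hypothetical names, not part of any specified tooling.

```python
from enum import Enum


class Stage(Enum):
    """Lifecycle stages, numbered in the order given in the text."""
    SUBMITTED = 1   # submitted by a field linguist or literacy specialist
    VALIDATED = 2   # passed IR constraint checks
    CERTIFIED = 3   # pedagogical validity confirmed by a literacy specialist
    DEPLOYED = 4    # available on the platform


def advance(stage: Stage, check_passed: bool) -> Stage:
    """Move to the next stage only if the gate for the current stage passed."""
    if stage is Stage.DEPLOYED:
        raise ValueError("parameter set is already deployed")
    if not check_passed:
        return stage  # stay put until the issue is resolved
    return Stage(stage.value + 1)


stage = Stage.SUBMITTED
stage = advance(stage, check_passed=True)    # constraint validation passes
stage = advance(stage, check_passed=False)   # certification withheld
print(stage)  # Stage.VALIDATED
```

Modeling the gates explicitly keeps a language from reaching deployment without both the automated constraint checks and the human certification step.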
All Easy FLN Localization outputs will comply with or align to relevant standards. The distinction matters: "comply with" means the project will implement the standard and test/certify against it; "align to" means the project will follow the standard's design principles and interoperate with its interfaces, adapting where the standard does not fully address African contexts.
| Category | Amount (USD) |
|---|---|
| Personnel (PI team, researchers, linguists, phonologists, developers) | 3,200,000 |
| Language parameterization (12 languages: field linguistics, GPC analysis, frequency studies, expert validation) | 600,000 |
| Desk pilot (Phase 1 proof-of-concept: 2 languages, full parameter extraction) | 200,000 |
| Infrastructure (computational phonology pipeline, IR tooling, cloud computing) | 400,000 |
| LLM and NLP model costs (G2P extraction, phonological analysis, 48 months) | 300,000 |
| Partner institution subgrants (universities, SIL, Masakhane) | 1,200,000 |
| Travel and convening (Ministry engagement, literacy specialist workshops, onebillion collaboration) | 600,000 |
| Program management and administration (AUDA-NEPAD + Spix Foundation, shared with ECM where possible) | 700,000 |
| Independent evaluation (external evaluator, 3 assessments) | 225,000 |
| Contingency (~5%) | 375,000 |
| Total | 7,800,000 |
Note: Budget rounds to USD 8 million for planning purposes.
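The line items can be checked against the stated total and percentages with a few lines of arithmetic. All figures below are copied from the budget table; the only assumption is that "personnel and university partner subgrants" corresponds to the Personnel and Partner institution subgrants rows combined.

```python
# Line items in USD, copied from the budget table.
budget = {
    "Personnel": 3_200_000,
    "Language parameterization": 600_000,
    "Desk pilot": 200_000,
    "Infrastructure": 400_000,
    "LLM and NLP model costs": 300_000,
    "Partner institution subgrants": 1_200_000,
    "Travel and convening": 600_000,
    "Program management and administration": 700_000,
    "Independent evaluation": 225_000,
    "Contingency": 375_000,
}

total = sum(budget.values())
contingency_share = budget["Contingency"] / total
people_share = (budget["Personnel"] + budget["Partner institution subgrants"]) / total

print(f"Total: {total:,}")                                      # Total: 7,800,000
print(f"Contingency share: {contingency_share:.1%}")            # 4.8%
print(f"Personnel + subgrants share: {people_share:.1%}")       # 56.4%
```

The items sum to the stated USD 7,800,000, contingency comes to about 4.8% of that total, and personnel plus subgrants come to about 56.4%, which falls inside the 58% ± 3% band given in the budget allocation model.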
| Phase | Duration | Amount (USD) | Key Activities |
|---|---|---|---|
| Phase 1: Research + Validation | Months 1–24 | 4,500,000 | Desk pilot, IR v0.1–v0.2, 12 languages parameterized, courseware partnerships, validation study |
| Phase 2: Deployment + Operational Readiness | Months 25–48 | 3,500,000 | IR v1.0, tools delivered, national team training, governance framework, end-to-end deployment, scaling plan, sustainability transition |
Funding is structured as staged commitments with a go/no-go gate (see Section 10).
The personnel budget assumes approximately 4 FTE researchers (reading science, computational phonology, African linguistics), 2 FTE linguists/phonologists for field parameterization, and 4 FTE software engineers for IR implementation and tooling, with staffing levels varying by phase.
The budget is lower than ECM's ($8M vs. $10M) for three reasons: (a) the Writing IR and the Curriculum IR share a common architectural methodology and co-develop shared infrastructure: the two projects run simultaneously (Years 1–4) with a single IR design team serving both, reducing duplicated engineering effort; (b) the computational phonology pipeline leverages existing G2P models and SIL tools, keeping the NLP research scope narrower than ECM's curriculum digitization effort; (c) project management and platform integration costs are partially shared with ECM.
The budget and timeline rest on the following assumptions. If an assumption proves false, the corresponding bound applies.
| Assumption | Bound (what Easy FLN Localization is not promising) |
|---|---|
| Africa's predominantly transparent orthographies constrain the Writing IR's parameter space to a tractable level | The Writing IR targets transparent Latin-script orthographies in Phase 1. Opaque orthographies (e.g., English) and non-Latin scripts (Arabic, Ethiopic) are explicitly bounded to Phase 2 or later. |
| Computational G2P models covering 500+ languages (Deri & Knight, 2016) are usable for automated phonological analysis | If G2P model quality is insufficient for specific languages, manual phonological analysis by SIL linguists substitutes. Budget absorbs this through contingency. |
| SIL International's PrimerPrep and Bloom tools are adaptable as component technologies for the Writing IR pipeline | If adaptation proves complex, the project builds equivalent components using the same specification. SIL's tools are a starting point, not a hard dependency. |
| At least 12 African languages have sufficient phonological documentation for complete parameter extraction | SIL International has documented phonological systems for hundreds of African languages. Language selection will prioritize well-documented languages for Phase 1. |
| 48 months is sufficient to reach Writing IR v1.0 with operational parameterization tools | Phase 1 alone (24 months) produces 12 language parameter sets, a computational phonology pipeline, and a validated IR v0.2, all valuable even if Phase 2 requires extension. |
| The Writing IR reduces, rather than eliminates, manual FLN localization effort by Month 48 | Easy FLN Localization does not promise that the Writing IR will eliminate all manual FLN localization effort. The IR reduces localization from months of expert redesign to weeks of parameterization and validation. Cultural content (mnemonics, illustrations, audio) still requires local teams. |
The project will be independently evaluated at the end of Phase 1 and Phase 2 by an external evaluator. Evaluation criteria include: letter-sequence accuracy against expert-produced sequences, decodable-text quality, phonological validity of GPC tables, literacy specialist satisfaction with generated localizations, and, where classroom trials are possible, learning outcomes compared to manually localized courseware.
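The proposal does not fix a formula for letter-sequence accuracy. One possible operationalization, sketched here purely as an assumption, is pairwise order agreement: the fraction of grapheme pairs that the generated sequence introduces in the same relative order as the expert sequence (closely related to Kendall's tau). The function name and the toy sequences are hypothetical and are not drawn from any real language.

```python
from itertools import combinations


def sequence_agreement(generated, expert):
    """Return the fraction of grapheme pairs ordered the same way in both sequences.

    1.0 means identical ordering; 0.0 means fully reversed.
    Hypothetical metric: the evaluator may choose a different one.
    """
    if (
        len(generated) != len(expert)
        or len(set(generated)) != len(generated)
        or set(generated) != set(expert)
    ):
        raise ValueError("sequences must contain the same unique graphemes")
    if len(generated) < 2:
        return 1.0

    # Position of each grapheme in the expert-produced sequence.
    expert_rank = {grapheme: i for i, grapheme in enumerate(expert)}

    # combinations() preserves input order, so in each pair (a, b)
    # grapheme a is introduced before b in the generated sequence.
    pairs = list(combinations(generated, 2))
    concordant = sum(1 for a, b in pairs if expert_rank[a] < expert_rank[b])
    return concordant / len(pairs)


# Toy example with made-up sequences (not real language data).
expert_sequence = ["a", "i", "m", "k", "u"]
generated_sequence = ["a", "m", "i", "k", "u"]
score = sequence_agreement(generated_sequence, expert_sequence)
print(score)
```

In the toy example only one of the ten grapheme pairs (m, i) is ordered differently, so the score is 0.9. A pairwise measure like this tolerates small local swaps while penalizing large reorderings, which may or may not match how literacy specialists judge sequence quality.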
Phase 1 → Phase 2 gate (Month 24): The formal validation study achieves ≥80% pedagogical validity (as rated by literacy specialists) for Writing IR-generated FLN localizations across 12 languages. At least 2 FLN courseware developers have mapped their pedagogical structure to the Writing IR. The Writing IR v0.2 specification is published. All 12 language parameter sets are complete. Peer-reviewed results are submitted for publication. If pedagogical validity falls below 60%, the project convenes a technical review to determine whether the IR architecture requires fundamental revision or the approach is non-viable.
If the Writing IR fails to achieve ≥60% pedagogical validity at the Phase 1 gate, the project will still have produced three outputs with independent value: (a) complete phonological parameter sets for 12 African languages in machine-readable format, a digital language resource contribution; (b) a rigorous empirical assessment of the Writing IR hypothesis, informing future research; (c) a computational phonology pipeline with documented capabilities across African languages. These resources serve the broader African NLP and literacy communities regardless of the IR's ultimate viability.
| Risk | Mitigation |
|---|---|
| Writing IR too coarse for pedagogy: generated sequences prove pedagogically invalid | Phase 1 targets transparent orthographies (simplest case) where the IR is on strongest theoretical ground. Desk pilot validates before full investment. Opaque orthographies (e.g., English) are explicitly out of initial scope. |
| Language data scarcity: insufficient phonological data for target languages | SIL International has documented phonological systems for hundreds of African languages. Masakhane community provides computational G2P models. Partnership with Ministries provides access to mother-tongue literacy specialists. |
| Mnemonic and cultural content proves non-parameterizable: cultural embedding too deep for formal representation | Mnemonics and culturally specific content are treated as language-specific parameters, not IR-level abstractions. The IR does not attempt to generate cultural content; it generates pedagogical scaffolding into which culturally appropriate content is inserted by local teams. |
| ECM Curriculum IR delayed: blocks the combined pipeline validation | The Writing IR can be validated independently for language-specific pedagogical quality. Combined pipeline validation is a Phase 2 milestone; ECM delays shift this milestone but do not block Writing IR development. |
| FLN courseware developers do not adopt: tools unused | Phase 1 includes partnership with onebillion and 1–2 additional developers. onebillion's open-source codebase provides a validation target regardless of developer adoption. XPRIZE finalists provide a captive adoption audience during Phase 2. |
| PI recruitment contingent on institutional commitment: senior researchers unavailable | Co-PI structure with multiple candidates per domain provides redundancy. The project can proceed with any combination of one IR architect, one reading scientist, and one African linguist. |
| 48-month timeline proves insufficient: IR development takes longer than planned | Phase structure ensures useful outputs at each stage; Phase 1 alone produces language parameter sets and a computational phonology pipeline with independent value. |
Easy FLN Localization's principal external dependency is ECM's Curriculum IR (for combined pipeline validation). The Writing IR can be validated independently; the combined pipeline is a Phase 2 milestone.
If ECM's Curriculum IR is delayed:
If onebillion or other FLN courseware partners are unavailable:
If language data for specific target languages proves insufficient:
The following gates operationalize the Month-24 Proof of Capability (Section 1B) with specific action protocols for each contingency, and define additional checkpoints for operational risk management.
| Gate | Timing | Condition | Action if Not Met |
|---|---|---|---|
| Phase 1 → Phase 2 release | Month 24 | ≥80% pedagogical validity across 12 languages; at least 2 FLN courseware-to-IR mappings; IR v0.2 published; all 12 language parameter sets complete | If validity ≥60% but <80%: extend Phase 1 for architectural refinement. If validity <60%: convene technical review to assess viability. |
| Desk pilot checkpoint | Month 6 | Prototype Writing IR constructed; algorithmically generated letter-introduction sequences compared against expert-produced sequences for 2 languages | If results are negative, project pivots design before committing Phase 1's full investment |
| Courseware validation | Month 18 | At least 1 FLN courseware developer has mapped pedagogical structure to the Writing IR with measurable results | If no developer adoption, project intensifies partnership efforts (onebillion, XPRIZE entrants) and adjusts tooling priorities |
| Phase 2 completion | Month 48 | IR v1.0 published; operational tools delivered; at least 2 FLN courseware applications localized through the IR into all 12 languages | Joint funder-project review; scope and timeline adjustment for any incomplete deliverables |
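The Month-24 release rule in the table can be expressed as a small decision function. This is a minimal sketch: the thresholds and deliverable counts come from the gate table, while the function name, the enum, and the handling of the case the table leaves open (validity at or above 80% but other deliverables incomplete) are assumptions made for illustration.

```python
from enum import Enum


class GateOutcome(Enum):
    RELEASE_PHASE_2 = "release Phase 2 funding"
    EXTEND_PHASE_1 = "extend Phase 1 for architectural refinement"
    TECHNICAL_REVIEW = "convene technical review to assess viability"
    # Assumption: the gate table does not say what happens in this case.
    UNSPECIFIED = "deliverables incomplete; action not defined in the gate table"


def month_24_gate(validity, courseware_mappings, ir_v02_published, parameter_sets_complete):
    """Apply the Phase 1 -> Phase 2 gate conditions.

    validity: pedagogical validity as a fraction between 0.0 and 1.0.
    courseware_mappings: number of FLN courseware-to-IR mappings completed.
    ir_v02_published: whether the IR v0.2 specification is published.
    parameter_sets_complete: number of completed language parameter sets.
    """
    if not 0.0 <= validity <= 1.0:
        raise ValueError("validity must be a fraction between 0.0 and 1.0")
    if validity < 0.60:
        return GateOutcome.TECHNICAL_REVIEW
    if validity < 0.80:
        return GateOutcome.EXTEND_PHASE_1
    deliverables_met = (
        courseware_mappings >= 2
        and ir_v02_published
        and parameter_sets_complete >= 12
    )
    return GateOutcome.RELEASE_PHASE_2 if deliverables_met else GateOutcome.UNSPECIFIED


print(month_24_gate(0.85, 2, True, 12).value)
print(month_24_gate(0.72, 2, True, 12).value)
```

Writing the rule out this way shows that the table defines actions only along the validity axis; a funder may want an explicit action for the case where validity clears 80% but a deliverable is missing.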
The 6 Phase 1 countries collectively serve tens of millions of K-3 students across dozens of AU languages. At USD 8 million for infrastructure enabling FLN courseware to reach these students in their mother tongues, the per-student investment is negligible, and it amortizes toward zero as additional languages and countries join the Writing IR framework.
If the Writing IR achieves its target of ≥80% pedagogical validity for FLN localizations across 12 languages, it will:
Following the 48-month research project, Writing IR governance and maintenance transfer to the PREMIER Institute's ongoing standards process. The Writing IR's operational infrastructure (parameterization tools and platform APIs) will be maintained by the RESPECT Platform engineering team, funded through V&P_Core's trademark and certification revenue.
The primary scaling dimension is the number of parameterized languages. The Writing IR shifts the bulk of FLN localization cost away from app developers and onto a Development Partner, who pays the one-time cost per language to map that language into the Writing IR. Each new language requires: (a) phonological data collection and GPC table construction; (b) parameter encoding using the parameterization toolkit; (c) validation by literacy specialists. Once a language has been parameterized, every IR-compatible FLN application can localize into that language: the Development Partner pays once, and all apps benefit.
Extending the Writing IR to all AU languages used in formal education (estimated at 200–300 languages) is fundable through a combination of follow-on grants, GPE allocations, and Ministry co-funding, spread over the Phase 2–3 expansion period.
The PREMIER Institute owns all intellectual property resulting from Easy FLN Localization research. Funding partners receive a worldwide, paid-up, royalty-free, sub-licensable, non-exclusive license to all such IP. Code and specifications (the Writing IR specification, parameterization tools, computational phonology pipeline) are released under the Apache License 2.0. Creative works (illustrations, audio recordings, training materials artwork) are released under the appropriate Creative Commons license. University research partners retain academic publication rights; all code and infrastructure deliverables are owned by the PREMIER Institute.
SIL's existing tools (PrimerPrep, Bloom) remain under their current licenses; the Writing IR specification is designed to interoperate with but not depend on any single proprietary or licensed tool.
Easy FLN Localization addresses the most expensive localization problem in education technology. It is the precise problem that the Breakthrough Project's Phase 1 scope (six countries, K-3, FLN) and the XPRIZE Accelerate Learning Challenge require solving: first in six countries, then in 21, then continent-wide.
The evidence from reading science establishes that the expense is addressable. Written languages share deep structural universals (Perfetti & Verhoeven, 2022). The variation among them is parametric and systematically characterizable (Ziegler & Goswami, 2005). Africa's transparent orthographies constrain the parameter space (Makalela, 2024). The component technologies (computational G2P models, language-parameter elicitation tools, decodable-text generators) exist. The formal abstraction that integrates them does not.
Easy FLN Localization will build that abstraction. The Writing IR applies a proven architectural pattern, the same one that makes Easy Curriculum Mapping possible, to a sibling problem, within the same institute, using shared methodology and shared IR architecture. The project's 48-month timeline aligns with ECM, the Breakthrough System's phased deployment, and the XPRIZE competition cycle. Manual FLN localization during Years 1–4 produces the ground truth; the Writing IR during Years 3–4 produces the automation; Year 5 begins operational deployment at the moment when V&P_Core is scaling from six to 21+ countries and XPRIZE finalists are entering the Ecosystem.
The children currently in K-3 across Sub-Saharan Africa will age out of foundational learning within this project's 48-month timeline. Africa's best FLN courseware exists. The Writing IR will make it available in every mother tongue, on the RESPECT Platform, at a cost that scales.