# Methodology & Roadmap

## Purpose
MemeLingua aims to communicate as explicitly as possible with a small, fixed inventory of semantic roots. The current locked inventory (the v1.5 baseline) contains 100 single-emoji roots.

This repo tracks:
- canonical roots (emoji + short root word)
- grammar rules and morphology (doubling, pointer-direction compounds, time compounds, naming)
- benchmark coverage
- experimental tests (functional prompts, narratives)

## Root selection principles
- Semantic primitives and high-yield relational words first.
- Replace low-yield content roots with compositional compounds.
- Prefer clear, iconic emojis for roots; allow compounds for derived meanings.
- Treat Arabic numerals as literal notation outside the emoji root inventory.

Examples:
- :loud_sound: SOUND (music = :loud_sound::page_facing_up:)
- :control_knobs: CHOOSE/DECIDE
- :ok_hand: AGREE/COMMIT
- :pray: POLITE/GRATITUDE/ACK
- :traffic_light: IF
- :bulb: REASON/BECAUSE
- TIME relations: :alarm_clock::point_left: BEFORE, :alarm_clock::point_right: AFTER, :alarm_clock::point_down: NOW
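
The compound examples above can be read as a greedy two-token lookup that runs before single-root lookup. The following is a minimal sketch under that assumption, not the repo's actual implementation: the names `ROOTS`, `COMPOUNDS`, `tokenize`, and `gloss` are hypothetical, and the tables contain only the entries listed in this section. Arabic numerals pass through as literals, and shortcodes with no gloss in this section are returned unchanged.

```python
import re

# Hypothetical tables, filled only with the examples from this section.
ROOTS = {
    ":loud_sound:": "SOUND",
    ":control_knobs:": "CHOOSE/DECIDE",
    ":ok_hand:": "AGREE/COMMIT",
    ":pray:": "POLITE/GRATITUDE/ACK",
    ":traffic_light:": "IF",
    ":bulb:": "REASON/BECAUSE",
}

COMPOUNDS = {
    (":alarm_clock:", ":point_left:"): "BEFORE",
    (":alarm_clock:", ":point_right:"): "AFTER",
    (":alarm_clock:", ":point_down:"): "NOW",
    (":loud_sound:", ":page_facing_up:"): "music",
}

# A token is either an emoji shortcode or a run of Arabic digits.
TOKEN = re.compile(r":[a-z0-9_]+:|\d+")


def tokenize(text):
    """Split a string into shortcode and numeral tokens."""
    return TOKEN.findall(text)


def gloss(text):
    """Gloss a token string, trying two-token compounds before single roots."""
    tokens = tokenize(text)
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])
            i += 2
        elif tokens[i].isdigit():
            # Numerals are literal notation, outside the root inventory.
            out.append(tokens[i])
            i += 1
        else:
            # Unknown shortcodes are passed through unchanged.
            out.append(ROOTS.get(tokens[i], tokens[i]))
            i += 1
    return out


print(gloss(":traffic_light::alarm_clock::point_left:3"))
# ['IF', 'BEFORE', '3']
```

Matching compounds first keeps `:alarm_clock::point_left:` from being glossed as two separate roots. Whether the real grammar resolves compounds greedily is an assumption of this sketch.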

## Benchmarks
See `docs/tests.md`.

We track:
- Natural Semantic Metalanguage (NSM) semantic primes + 150 sentences
- Leipzig–Jakarta 100 concepts
- Swadesh 100/200
- ASJP / Holman 40
- Dolgopolsky 15
- Leipzig Glossing Rules adaptation
- Conlang Syntax Test Cases (218 sentences)
- functional prompts across daily domains
- narrative plot compression and clarity
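
For the list-based benchmarks above, coverage can be summarized as the share of benchmark concepts that have an encoding. The sketch below is one possible way to compute it. The `coverage` function and the toy data are hypothetical placeholders, not the contents of `docs/tests.md` and not a claim about the real inventory.

```python
def coverage(benchmark, lexicon):
    """Return (fraction covered, list of concepts with no encoding)."""
    if not benchmark:
        raise ValueError("benchmark must contain at least one concept")
    missing = [concept for concept in benchmark if concept not in lexicon]
    covered = len(benchmark) - len(missing)
    return covered / len(benchmark), missing


# Toy data for illustration only.
toy_benchmark = ["sound", "music", "if", "because", "water"]
toy_lexicon = {
    "sound": ":loud_sound:",
    "music": ":loud_sound::page_facing_up:",
    "if": ":traffic_light:",
    "because": ":bulb:",
}

ratio, missing = coverage(toy_benchmark, toy_lexicon)
print(ratio, missing)
# 0.8 ['water']
```

Returning the missing concepts alongside the ratio makes it easier to see which gaps call for a new compound rather than a new root.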

## Next steps
1. Adversarial minimal pairs for role-dependent ambiguity (AGREE vs ACCEPT vs ACKNOWLEDGE; WANT vs CHOOSE vs NEED vs CAN vs MAYBE).
2. Long-form encodings of mysteries, disagreements, medical consults, travel, rules, and flashbacks.
3. Blind encode/decode with participants to measure semantic retention per root-token.
4. Keep the 100-root v1.5 baseline locked; Arabic numerals handle number concepts outside the root inventory.
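
Step 3 leaves open how "semantic retention per root-token" is scored. One possible operationalization, offered as an assumption rather than the project's definition: annotate each source message with a set of meaning units, have a blind decoder produce its own set, and divide the number of recovered units by the number of root tokens in the encoding. The function name `retention_scores` and the example units are hypothetical.

```python
def retention_scores(source_units, decoded_units, n_root_tokens):
    """Score one blind encode/decode trial.

    Returns (retention, retained_per_token):
      retention          = recovered source units / all source units
      retained_per_token = recovered source units / root tokens used
    """
    source = set(source_units)
    if not source:
        raise ValueError("source_units must not be empty")
    if n_root_tokens <= 0:
        raise ValueError("n_root_tokens must be positive")
    retained = len(source & set(decoded_units))
    return retained / len(source), retained / n_root_tokens


# Hypothetical trial: three source units, two recovered, four root tokens used.
retention, per_token = retention_scores(
    ["IF", "BEFORE", "CHOOSE"],
    ["IF", "CHOOSE", "WANT"],
    n_root_tokens=4,
)
print(round(retention, 3), per_token)
# 0.667 0.5
```

Units that the decoder adds but the source never contained (here `WANT`) do not lower either score in this sketch. A stricter variant could penalize them, which would matter for the minimal-pair tests in step 1.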
