# AMR/Smatch-Style Eval

This eval compares how well MemeLingua and Toki Pona expressions preserve semantic graph structure.

## Method

- Cases live in `eval/amr-smatch-cases.json`.
- Each case has an English target, a hand-authored canonical AMR-style gold graph, a MemeLingua expression, and a Toki Pona expression.
- `npm run eval:amr` asks a model to decode each expression into canonical AMR triples after reading only that language's compact primer.
- The scorer computes precision, recall, and F1 from exact matches between normalized concept/relation triples.
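A minimal sketch of the scoring step described above, in TypeScript since the eval runs through npm. It assumes a triple is a `[source concept, relation, target concept]` array and that normalization means trimming and lowercasing each field. `Triple`, `Counts`, `scoreCase`, and `summarize` are hypothetical names, not the repo's actual scorer. Counts are pooled across cases before P/R/F1 are computed (micro-averaging), which is consistent with the Matched/Gold/Predicted totals in the results table.

```typescript
// Hypothetical triple shape: [source concept, relation, target concept],
// e.g. ["want-01", ":ARG0", "i"]. The real scorer's types may differ.
type Triple = [string, string, string];

interface Counts {
  matched: number;
  gold: number;
  predicted: number;
}

// Assumed normalization: trim and lowercase each field, then join into one key.
function tripleKey(triple: Triple): string {
  return triple.map((part) => part.trim().toLowerCase()).join("\t");
}

// Exact-match overlap for one case, treating gold and predicted as sets of keys.
function scoreCase(gold: Triple[], predicted: Triple[]): Counts {
  const goldKeys = new Set(gold.map(tripleKey));
  const predKeys = new Set(predicted.map(tripleKey));
  const matched = Array.from(predKeys).filter((k) => goldKeys.has(k)).length;
  return { matched, gold: goldKeys.size, predicted: predKeys.size };
}

// Micro-average: pool counts over all cases, then compute P/R/F1 once.
function summarize(cases: Counts[]) {
  const total = cases.reduce<Counts>(
    (acc, c) => ({
      matched: acc.matched + c.matched,
      gold: acc.gold + c.gold,
      predicted: acc.predicted + c.predicted,
    }),
    { matched: 0, gold: 0, predicted: 0 },
  );
  const precision = total.predicted > 0 ? total.matched / total.predicted : 0;
  const recall = total.gold > 0 ? total.matched / total.gold : 0;
  const f1 =
    precision + recall > 0 ? (2 * precision * recall) / (precision + recall) : 0;
  return { ...total, precision, recall, f1 };
}
```

Feeding `summarize` the pooled Toki Pona counts (74 matched, 103 gold, 110 predicted) reproduces the 67.3% / 71.8% / 69.5% row. Treating each side as a set means a decoder cannot earn extra credit by repeating a correct triple; whether the real scorer deduplicates this way is an assumption.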

This is a Smatch-style graph-overlap eval, not a full Smatch run over PENMAN graphs: it does not search for the best alignment between predicted and gold variables. It tests the property Smatch rewards, namely whether semantic relations are preserved.
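One way to get variable-free triples without an alignment search is to substitute each PENMAN variable with its concept label before comparing. The sketch below assumes a graph is stored as a variable-to-concept map plus role edges; `Edge`, `ConceptTriple`, and `toConceptTriples` are hypothetical, and the repo may normalize differently.

```typescript
// Hypothetical stored form of a PENMAN-style graph such as
//   (w / want-01 :ARG0 (i / i) :ARG1 (t / this))
// as a variable-to-concept map plus [variable, role, variable-or-constant] edges.
type Edge = [string, string, string];
type ConceptTriple = [string, string, string];

function toConceptTriples(
  instances: Map<string, string>,
  edges: Edge[],
): ConceptTriple[] {
  // Replace each variable with its concept; constants such as "-" pass through.
  const label = (x: string): string => instances.get(x) ?? x;
  return edges.map(
    ([src, role, tgt]): ConceptTriple => [label(src), role, label(tgt)],
  );
}
```

The trade-off is that two distinct nodes sharing a concept collapse into one label, and a concept with no edges yields no triple at all; full Smatch avoids both by scoring instance triples and aligning variables.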

## Why This Eval

The vocabulary coverage evals reward broad lexical coverage, which favors Toki Pona. AMR-style graph scoring instead rewards whether the language preserves semantic roles and relations:

- actor vs object
- recipient vs source
- destination vs purpose
- condition vs time
- negation
- location-in vs location-on
- topic/aboutness

These are the distinctions MemeLingua is designed to make explicit.

## Run

```bash
npm run eval:amr
npm run eval:amr -- --write
```

Generated JSONL results and Markdown reports are written under `eval/results/` and `eval/reports/`, respectively.
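The report name cited under the results below embeds an ISO timestamp with `:` and `.` replaced by `-`. A minimal sketch of deriving such paths, assuming the JSONL file follows the same `amr-smatch-<timestamp>` pattern (only the Markdown name is confirmed); `reportPaths` is a hypothetical helper.

```typescript
// Hypothetical helper: derives output paths from a run timestamp.
// Only the Markdown naming is attested; the .jsonl name is an assumption.
function reportPaths(runAt: Date): { jsonl: string; markdown: string } {
  // ISO time with ":" and "." replaced so the stamp is safe in file names.
  const stamp = runAt.toISOString().replace(/[:.]/g, "-");
  return {
    jsonl: `eval/results/amr-smatch-${stamp}.jsonl`,
    markdown: `eval/reports/amr-smatch-${stamp}.md`,
  };
}
```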

## Current Case Set

The case set now contains 41 semantic graph cases: the 16 original relation-preservation comparison cases plus 25 added cases covering:

- perception and speech
- source/destination and topic relations
- ownership and money transfer
- negation, condition, and cause
- feeling, fear, love, sleep, change, connection, questions, liquid/drinking, and completion/celebration

## Current Result

Latest report written by a local run: `eval/reports/amr-smatch-2026-06-20T05-10-55-490Z.md`

| System | Matched | Gold | Predicted | Precision | Recall | F1 |
|---|---:|---:|---:|---:|---:|---:|
| MemeLingua | 103 | 103 | 103 | 100% | 100% | 100% |
| Toki Pona | 74 | 103 | 110 | 67.3% | 71.8% | 69.5% |
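The Toki Pona row can be checked directly from its pooled triple counts (74 matched, 103 gold, 110 predicted):

```math
P = \frac{74}{110} \approx 0.673, \qquad
R = \frac{74}{103} \approx 0.718, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2 \cdot 74}{103 + 110} = \frac{148}{213} \approx 0.695
```

Because $F_1$ reduces to $2 \cdot \text{matched} / (\text{gold} + \text{predicted})$, and Toki Pona decodes produced more triples than gold (110 vs 103), its precision sits below its recall.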

The Toki Pona misses cluster around distinctions that are intentionally broad or context-dependent in Toki Pona:

- `wile` does not distinguish WANT from NEED.
- `mi wile e ni` expresses wanting this, not choosing this.
- `mi pilin sama sina` expresses feeling/thinking similarly, not explicit agreement.
- `kama jo ... tan` expresses coming to have/acquiring from, not a direct TAKE event.
- `la`, `lon`, and `tawa` are broad context, location, and direction markers, so topic, source, destination, condition, and purpose can be underspecified.
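To trace misses like these back to specific triples, a per-case diff of missing and spurious triples is useful. This sketch assumes a `[concept, relation, concept]` triple shape with trim/lowercase normalization; `diffTriples` and the example triples are hypothetical illustrations, not taken from `eval/amr-smatch-cases.json`.

```typescript
type Triple = [string, string, string];

// Hypothetical per-case diagnostic: which normalized gold triples were missed,
// and which predicted triples have no gold counterpart. Trim + lowercase is assumed.
function diffTriples(gold: Triple[], predicted: Triple[]) {
  const key = (t: Triple): string =>
    t.map((part) => part.trim().toLowerCase()).join(" ");
  const goldKeys = new Set(gold.map(key));
  const predKeys = new Set(predicted.map(key));
  return {
    missing: Array.from(goldKeys).filter((k) => !predKeys.has(k)),
    spurious: Array.from(predKeys).filter((k) => !goldKeys.has(k)),
  };
}
```

With concept labels inside each triple, decoding `mi wile e ni` as WANT rather than CHOOSE fails every triple headed by that concept, not just one: hypothetical gold triples `choose-01 :ARG0 i` and `choose-01 :ARG1 this` against predicted `want-01` edges yield two missing and two spurious triples.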

MemeLingua scores higher here because it has explicit roots or relation markers for NEED, CHOOSE, AGREE, TAKE/FROM, ABOUT, CAUSE, TO/FROM/IN/ON, and line-broken temporal frames.
