Every number Answena produces is reproducible from the signals on your page, with formulas published below. No black box, no hidden weights. This is the full spec of the scoring engine — what we measure, how we combine it, and which peer-reviewed papers each signal is grounded in.
Cluster weights are defined in engine/components.js. Weights sum to 1; the current baseline + learned blend is shown at /api/learn/weights. Accumulated penalty severity then damps the weighted sum through a sigmoid:

penaltyMultiplier = 1 / (1 + exp(1.5 · (totalSeverity − 1.2)))

Multiple small penalties compound; a single severe penalty (e.g. broken schema + thin content) clamps the score hard.

| Cluster | What it measures | Weight |
|---|---|---|
| semanticRelevance | Topical focus, intent match, query–body embedding similarity (when API key present). | 0.13 |
| intentCoverage | Informational / commercial / transactional / navigational intent signals, H-hierarchy completeness. | 0.10 |
| contentClarity | Readability (Flesch / Ateşman / Amstad / Oborneva / OSMAN per language), sentence rhythm, cognitive load. | 0.10 |
| trustReliability | HTTPS, author byline, datePublished / dateModified, outbound authoritative citations. | 0.12 |
| entityAuthority | Organisation schema completeness, sameAs graph, Wikipedia/Wikidata presence, named-entity density. | 0.10 |
| structureReadability | Heading hierarchy, paragraph shape, list and table presence, schema presence & depth. | 0.10 |
| citationPotential | Quotable sentences, statistics, tables — the 9 Princeton GEO tactics (§4). | 0.13 |
| freshness | Publish/modify dates, visible last-updated text, recency of cited sources. | 0.07 |
| crossReference | Internal linking depth, breadcrumb presence, hub-and-spoke topical network. | 0.07 |
| toolActionability | FAQ / HowTo schema, step lists, tool links, embedded calculators, copyable code blocks. | 0.08 |
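Under these definitions, the composite reduces to a weighted cluster sum damped by the sigmoid penalty. A minimal sketch; the constant names and function signatures here are illustrative, not the exact exports of engine/components.js:

```javascript
// Baseline cluster weights from the table above (sum to 1).
const WEIGHTS = {
  semanticRelevance: 0.13, intentCoverage: 0.10, contentClarity: 0.10,
  trustReliability: 0.12, entityAuthority: 0.10, structureReadability: 0.10,
  citationPotential: 0.13, freshness: 0.07, crossReference: 0.07,
  toolActionability: 0.08,
};

// Sigmoid damper from the spec: small severities barely bite,
// but total severity past ~1.2 clamps the score hard.
function penaltyMultiplier(totalSeverity) {
  return 1 / (1 + Math.exp(1.5 * (totalSeverity - 1.2)));
}

// clusters: { semanticRelevance: 0..1, ... }; penalties: array of severities.
function compositeScore(clusters, penalties) {
  const base = Object.entries(WEIGHTS)
    .reduce((sum, [name, w]) => sum + w * (clusters[name] ?? 0), 0);
  const totalSeverity = penalties.reduce((a, b) => a + b, 0);
  return 100 * base * penaltyMultiplier(totalSeverity);
}
```

Note how the damper compounds: two penalties of severity 0.6 each cost as much as one of 1.2, which already halves the multiplier.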
We dispatch each page to the published formula for its language, because readability coefficients do not transfer across languages.
| Language | Ease formula | Grade formula | Citation |
|---|---|---|---|
| English | Flesch Reading Ease | Flesch–Kincaid, Gunning Fog, SMOG, CLI, ARI (averaged) | Flesch 1948; Kincaid 1975; Gunning 1952; McLaughlin 1969; Coleman & Liau 1975; Senter & Smith 1967 |
| Turkish | Ateşman (1997) | Bezirci–Yılmaz (2010) | Dergipark article/492667; NCBI PMC11102775 |
| German | Amstad (1978) | Wiener Sachtextformel 1 (Bamberger & Vanecek 1984) | Amstad, Lesbarkeit deutscher Texte; Bamberger / Vanecek, Lesen–Verstehen–Lernen–Schreiben |
| Russian | Oborneva (2005) | Derived from Oborneva ease | Oborneva, Автоматизированная оценка сложности учебных текстов |
| Arabic | OSMAN-lite | Derived from OSMAN ease + long-word penalty | El-Haj & Rayson 2016, OSMAN: A Novel Arabic Readability Metric |
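The dispatch itself is a table lookup. A sketch using the published English (Flesch 1948) and Turkish (Ateşman 1997) ease coefficients, with raw counts supplied by an upstream tokenizer:

```javascript
// Published ease formulas. Coefficients are language-specific and do not
// transfer; each entry takes raw counts from the tokenizer.
const EASE_FORMULAS = {
  // Flesch (1948): Reading Ease for English.
  en: ({ words, sentences, syllables }) =>
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words),
  // Ateşman (1997): Turkish adaptation with re-fitted coefficients.
  tr: ({ words, sentences, syllables }) =>
    198.825 - 40.175 * (syllables / words) - 2.610 * (words / sentences),
};

function readingEase(lang, counts) {
  const formula = EASE_FORMULAS[lang];
  if (!formula) throw new Error(`no ease formula for ${lang}`);
  return formula(counts);
}
```

For example, English text with 100 words, 5 sentences and 150 syllables scores 206.835 − 1.015·20 − 84.6·1.5 = 59.635, i.e. mid-range difficulty.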
We audit your page against the 9 content interventions measured by Aggarwal et al., KDD 2024.
Each tactic has a published citation-visibility uplift. Your audit card shows the tactics you have applied, the tactics you haven't, and a priority ranking (uplift × gap), so the highest-leverage unclaimed wins come first.
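The priority ranking reduces to a one-liner. Tactic names and uplift values below are placeholders, not the measured uplifts from the paper:

```javascript
// tactics: [{ name, uplift, applied: 0..1 }] where `applied` is how fully
// the page already uses the tactic; gap = 1 - applied.
function prioritizeTactics(tactics) {
  return tactics
    .map(t => ({ ...t, priority: t.uplift * (1 - t.applied) }))
    .sort((a, b) => b.priority - a.priority);
}

// Illustrative values only, not the KDD 2024 numbers.
const ranked = prioritizeTactics([
  { name: 'addStatistics', uplift: 0.30, applied: 0.0 },
  { name: 'citeSources',   uplift: 0.40, applied: 0.9 },
  { name: 'addQuotes',     uplift: 0.25, applied: 0.2 },
]);
```

Note that a high-uplift tactic you have mostly applied (citeSources above) ranks below a lower-uplift tactic you haven't touched: the gap term is what surfaces unclaimed wins.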
After the global composite, we re-weight the 10 clusters to produce 5 platform-specific scores — because ChatGPT, Claude, Gemini, Perplexity and Google AIO retrieve differently. Example: Perplexity weights citationPotential and trustReliability higher (sources shown inline); Google AIO weights structureReadability and entityAuthority higher (Knowledge-Graph-tight retrieval).
Full weight matrices: engine/platformScores.js.
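The re-weighting step is a normalised dot product between the cluster scores and a platform weight vector. The example vector here is illustrative, not one of the matrices in engine/platformScores.js:

```javascript
// Re-weight cluster scores (each 0..1) with a platform-specific vector.
// Normalising by the vector's sum keeps the result in 0..1 even if the
// platform weights don't sum to exactly 1.
function platformScore(clusters, platformWeights) {
  const total = Object.values(platformWeights).reduce((a, b) => a + b, 0);
  return Object.entries(platformWeights)
    .reduce((sum, [name, w]) => sum + (w / total) * (clusters[name] ?? 0), 0);
}

// Illustrative: a Perplexity-like vector that up-weights citations and trust.
const perplexityLike = { citationPotential: 0.25, trustReliability: 0.20,
  semanticRelevance: 0.15, structureReadability: 0.10, entityAuthority: 0.10,
  contentClarity: 0.08, intentCoverage: 0.05, freshness: 0.04,
  crossReference: 0.02, toolActionability: 0.01 };
```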
With ANTHROPIC_API_KEY set, we actually ask the LLMs. The probe runs each query at 3 temperatures ([0, 0.3, 0.7]) and takes the median outcome — a pattern from Wang et al.'s self-consistency decoding (ICLR 2023) and Google AI Mode's 2025 Query Fan-Out architecture. This cuts single-call variance (which can be ±15%) to roughly ±3% on the final citation rate.
Probes are cached for 6 hours by {domain, query, temperature} hash, so a re-scan doesn't re-spend API credits.
Every 6 hours a background sweep runs a controlled seed corpus (~37 curated URLs across tiers and sectors) through the full scorer and records the real citation rate from LLM probes. The results are appended to data/groundtruth-YYYY-MM.jsonl.
Against that corpus we report: Spearman ρ (rank correlation between our score and citation rate), Pearson r (linear fit), AUC-ROC (ability to separate cited from non-cited at ≥30% threshold), and NDCG@10 (top-10 ranking match). Live numbers: /validation.html.
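Spearman ρ is the headline metric: it is Pearson r computed on the ranks of each series. A minimal tie-aware sketch in plain JS:

```javascript
// Convert values to average ranks (ties share the mean of their positions).
function ranks(xs) {
  const order = xs.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const r = new Array(xs.length);
  let i = 0;
  while (i < order.length) {
    let j = i;
    while (j + 1 < order.length && order[j + 1][0] === order[i][0]) j++;
    const avg = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) r[order[k][1]] = avg;
    i = j + 1;
  }
  return r;
}

function pearson(xs, ys) {
  const n = xs.length;
  const mx = xs.reduce((a, b) => a + b, 0) / n;
  const my = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - mx) * (ys[i] - my);
    dx += (xs[i] - mx) ** 2;
    dy += (ys[i] - my) ** 2;
  }
  return num / Math.sqrt(dx * dy);
}

// Spearman rho = Pearson r on the ranks.
const spearman = (xs, ys) => pearson(ranks(xs), ranks(ys));
```

Rank correlation is the right headline here because the score only needs to order pages the way citation rates order them; the linear fit (Pearson r) is reported separately.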
The 10 cluster weights aren't fixed. An L2-regularised logistic regression continuously learns from the ground-truth corpus (engine/metaLearner.js). New weights are only accepted if Spearman and AUC hold on the validation set — regressions trigger an automatic rollback to the previous snapshot.
Each accepted version is stored in data/weights/weights_YYYY_Qn_<timestamp>.json, so historical calibration can be diffed and any version can be restored via POST /api/admin/retrain/rollback.
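The acceptance gate can be sketched as below. The field names, tolerance, and return convention are assumptions for illustration, not the actual engine/metaLearner.js interface:

```javascript
// Accept candidate weights only if neither validation metric regresses
// beyond a small tolerance; otherwise keep the current snapshot, which
// is what "automatic rollback" amounts to.
function acceptNewWeights(candidate, current, tol = 0.005) {
  const holds =
    candidate.metrics.spearman >= current.metrics.spearman - tol &&
    candidate.metrics.auc >= current.metrics.auc - tol;
  return holds ? candidate : current;
}
```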
Presence-only scoring can't tell a stub {"@type":"FAQPage"} from one with 10 Q&A items and full acceptedAnswer.text. Our schemaDepth module measures completeness per entity (filled / expected) with type-specific bonuses (FAQPage needs answer text on ≥80% of items, HowTo needs step text on ≥80% of steps, BreadcrumbList needs ≥2 levels), and a graph-connectivity bonus when @id / @graph stitches entities.
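A sketch of that completeness calculation. The thresholds mirror the spec; the expected-property lists and bonus sizes are illustrative assumptions, not the schemaDepth module's actual tables:

```javascript
// Expected top-level properties per type (illustrative subset).
const EXPECTED = {
  FAQPage: ['mainEntity'],
  HowTo: ['name', 'step'],
  BreadcrumbList: ['itemListElement'],
};

// Completeness = filled / expected, plus type-specific bonuses:
// FAQPage needs acceptedAnswer.text on >=80% of items,
// BreadcrumbList needs >=2 levels.
function schemaDepth(entity) {
  const expected = EXPECTED[entity['@type']] ?? [];
  const filled = expected.filter(p => entity[p] != null).length;
  let score = expected.length ? filled / expected.length : 0;

  if (entity['@type'] === 'FAQPage') {
    const items = entity.mainEntity ?? [];
    const answered = items.filter(q => q.acceptedAnswer?.text).length;
    if (items.length > 0 && answered / items.length >= 0.8) score += 0.2;
  }
  if (entity['@type'] === 'BreadcrumbList') {
    if ((entity.itemListElement ?? []).length >= 2) score += 0.2;
  }
  return Math.min(score, 1.2); // bonus headroom above bare presence
}
```

Under this sketch the stub {"@type":"FAQPage"} scores 0 while a fully answered FAQ scores 1.2, which is exactly the distinction presence-only scoring misses.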
Every signal traces to code in engine/pageAnalyzer.js, a formula in engine/components.js, and a weight in engine/metaLearner.js.

Source: github.com/siyahtonu/gec · Last updated with every deploy · Calibration refreshed every 6h