
Methodology

Every number Answena produces is reproducible from the signals on your page, with formulas published below. No black box, no hidden weights. This is the full spec of the scoring engine — what we measure, how we combine it, and which peer-reviewed papers each signal is grounded in.

Short version: we audit your page for citation readiness — would an LLM (ChatGPT, Claude, Gemini, Perplexity, Google AIO) pick this up and quote it? We score 10 clusters of signals, apply a penalty gate, then cross-check against real citation outcomes on a controlled corpus.

1. The three-layer scorer

  1. Hard filters — kill-switches that cap the final score: empty body, thin content (< 150 words), blocked robots, no schema at all, no headings, soft-404 pattern, JavaScript-only rendering.
  2. 10 weighted clusters — each a 0..1 number produced by engine/components.js. Weights sum to 1. Current baseline + learned blend is shown at /api/learn/weights.
  3. Penalty multiplier — logistic gate 1 / (1 + exp(1.5 · (totalSeverity − 1.2))). Multiple small penalties compound; a single severe penalty (e.g. broken schema + thin content) clamps the score hard.
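The penalty gate above can be sketched directly from the published formula. A minimal sketch — `penaltyMultiplier` is a hypothetical name for illustration, not the engine's actual API:

```javascript
// Penalty gate: maps total penalty severity to a multiplier in (0, 1),
// per the published formula 1 / (1 + exp(1.5 · (totalSeverity − 1.2))).
function penaltyMultiplier(totalSeverity) {
  return 1 / (1 + Math.exp(1.5 * (totalSeverity - 1.2)));
}

penaltyMultiplier(0.0); // no penalties   -> ~0.86 (a soft ceiling, not exactly 1)
penaltyMultiplier(0.5); // one mild issue -> ~0.74
penaltyMultiplier(1.5); // e.g. broken schema + thin content -> ~0.39
```

Because the gate is logistic, two mild penalties land on the steep part of the curve and clamp the score far harder than either would alone.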

2. The 10 clusters

| Cluster | What it measures | Weight |
| --- | --- | --- |
| semanticRelevance | Topical focus, intent match, query–body embedding similarity (when API key present). | 0.13 |
| intentCoverage | Informational / commercial / transactional / navigational intent signals, H-hierarchy completeness. | 0.10 |
| contentClarity | Readability (Flesch / Ateşman / Amstad / Oborneva / OSMAN per language), sentence rhythm, cognitive load. | 0.10 |
| trustReliability | HTTPS, author byline, datePublished / dateModified, outbound authoritative citations. | 0.12 |
| entityAuthority | Organisation schema completeness, sameAs graph, Wikipedia/Wikidata presence, named-entity density. | 0.10 |
| structureReadability | Heading hierarchy, paragraph shape, list and table presence, schema presence & depth. | 0.10 |
| citationPotential | Quotable sentences, statistics, tables — the 9 Princeton GEO tactics (§4). | 0.13 |
| freshness | Publish/modify dates, visible last-updated text, recency of cited sources. | 0.07 |
| crossReference | Internal linking depth, breadcrumb presence, hub-and-spoke topical network. | 0.07 |
| toolActionability | FAQ / HowTo schema, step lists, tool links, embedded calculators, copyable code blocks. | 0.08 |
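The baseline composite is just the weighted sum of the ten cluster scores. A minimal sketch with the weights from the table (the blended weights actually served come from /api/learn/weights):

```javascript
// Baseline cluster weights from the table above; they sum to 1.
const WEIGHTS = {
  semanticRelevance: 0.13, intentCoverage: 0.10, contentClarity: 0.10,
  trustReliability: 0.12, entityAuthority: 0.10, structureReadability: 0.10,
  citationPotential: 0.13, freshness: 0.07, crossReference: 0.07,
  toolActionability: 0.08,
};

// Composite = Σ weight_i · score_i, where each cluster score is in [0, 1].
function composite(scores) {
  return Object.entries(WEIGHTS)
    .reduce((sum, [cluster, w]) => sum + w * (scores[cluster] ?? 0), 0);
}
```

Because the weights sum to 1, a page scoring 1.0 on every cluster gets a composite of exactly 1.0 before the penalty gate is applied.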

3. Readability (multi-language)

We dispatch the published per-language formula — coefficients do not transfer across languages.

| Language | Ease formula | Grade formula | Citation |
| --- | --- | --- | --- |
| English | Flesch Reading Ease | Flesch–Kincaid, Gunning Fog, SMOG, CLI, ARI (averaged) | DuBay 2004; Kincaid 1975; Gunning 1952; McLaughlin 1969; Coleman & Liau 1975; Senter & Smith 1967 |
| Turkish | Ateşman (1997) | Bezirci–Yılmaz (2010) | Dergipark article/492667; NCBI PMC11102775 |
| German | Amstad (1978) | Wiener Sachtextformel 1 (Bamberger & Vanecek 1984) | Amstad, Lesbarkeit deutscher Texte; Bamberger & Vanecek, Lesen–Verstehen–Lernen–Schreiben |
| Russian | Oborneva (2005) | Derived from Oborneva ease | Oborneva, Automated Assessment of the Complexity of Educational Texts (original in Russian) |
| Arabic | OSMAN-lite | Derived from OSMAN ease + long-word penalty | El-Haj & Rayson 2016, OSMAN: A Novel Arabic Readability Metric |
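To make the "coefficients do not transfer" point concrete, here is a minimal dispatch sketch for two of the languages, using the published Flesch (1948) and Ateşman (1997) coefficients; the function names are illustrative, not the engine's:

```javascript
// Per-language ease formulas. Both take mean words-per-sentence (wps)
// and mean syllables-per-word (spw), but the coefficients differ:
// an English formula applied to Turkish text produces nonsense scores.
const EASE = {
  en: (wps, spw) => 206.835 - 1.015 * wps - 84.6 * spw,   // Flesch Reading Ease
  tr: (wps, spw) => 198.825 - 2.610 * wps - 40.175 * spw, // Ateşman (1997)
};

function readingEase(lang, wps, spw) {
  const formula = EASE[lang];
  if (!formula) throw new Error(`no ease formula for language: ${lang}`);
  return formula(wps, spw);
}
```

Note how the syllable term dominates in English (−84.6) while the sentence-length term is near-negligible (−1.015); Ateşman's Turkish coefficients sit in a completely different regime, which is why the scorer dispatches per language instead of reusing one formula.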

4. Princeton GEO tactics

We audit your page against the 9 content interventions measured by Aggarwal et al. (KDD 2024). Each tactic has a published citation-visibility uplift. Your audit card shows the tactics you have applied, the tactics you haven't, and a priority ranking (uplift × gap), so the highest-leverage unclaimed wins come first.
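The uplift × gap ranking reduces to a few lines. A sketch under two stated assumptions: `applied` is modeled as a 0..1 coverage fraction, and the uplift numbers below are placeholders, not the measured KDD 2024 values:

```javascript
// Priority = published uplift × remaining gap, so the biggest unclaimed
// win surfaces first. Tactic names and uplifts here are illustrative.
function prioritize(tactics) {
  return tactics
    .map(t => ({ ...t, priority: t.uplift * (1 - t.applied) }))
    .sort((a, b) => b.priority - a.priority);
}

const ranked = prioritize([
  { name: "cite_sources", uplift: 0.40, applied: 0.0 }, // not applied at all
  { name: "statistics",   uplift: 0.35, applied: 0.9 }, // nearly done already
  { name: "quotations",   uplift: 0.30, applied: 0.2 },
]);
// ranked[0].name === "cite_sources": biggest uplift AND biggest gap
```

A tactic you have already nearly saturated ranks low even if its headline uplift is high, which is the intended behaviour: the card is a to-do list, not a leaderboard.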

5. Per-platform weights

After the global composite, we re-weight the 10 clusters to produce 5 platform-specific scores — because ChatGPT, Claude, Gemini, Perplexity and Google AIO retrieve differently. Example: Perplexity weights citationPotential and trustReliability higher (sources shown inline); Google AIO weights structureReadability and entityAuthority higher (Knowledge-Graph-tight retrieval).

Full weight matrices: engine/platformScores.js.
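The re-weighting step is scale-then-renormalise. A minimal sketch — the multipliers and the three-cluster subset below are invented for illustration; the real matrices are in engine/platformScores.js:

```javascript
// Scale selected cluster weights by per-platform multipliers, then
// renormalise so the weights again sum to 1 before recomputing the score.
function reweight(baseWeights, multipliers) {
  const scaled = Object.fromEntries(
    Object.entries(baseWeights).map(([k, w]) => [k, w * (multipliers[k] ?? 1)])
  );
  const total = Object.values(scaled).reduce((a, b) => a + b, 0);
  for (const k in scaled) scaled[k] /= total; // Σw = 1 again
  return scaled;
}

// e.g. a Perplexity-style profile boosting citation + trust signals:
const perplexity = reweight(
  { citationPotential: 0.13, trustReliability: 0.12, freshness: 0.07 },
  { citationPotential: 1.5, trustReliability: 1.3 }
);
```

Renormalising matters: without it, boosting two clusters would inflate every platform score instead of shifting relative emphasis between clusters.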

6. Self-consistency probing

With ANTHROPIC_API_KEY set, we actually ask the LLMs. The probe runs each query at 3 temperatures ([0, 0.3, 0.7]) and takes the median outcome — a pattern from Wang et al.'s self-consistency decoding (ICLR 2023) and Google AI Mode's 2025 Query Fan-Out architecture. This cuts single-call variance (which can be ±15%) to roughly ±3% on the final citation rate.
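The probe loop is small. A sketch where `probe` stands in for the real LLM call (the actual client and outcome type live in the engine, not here):

```javascript
// Self-consistency probing: run the same query at several temperatures
// and keep the median outcome, so one noisy sample cannot flip the result.
const TEMPERATURES = [0, 0.3, 0.7];

function median(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)]; // exact for odd-length input, as here
}

async function probeWithConsistency(probe, query) {
  const outcomes = await Promise.all(
    TEMPERATURES.map(t => probe(query, t))
  );
  return median(outcomes);
}
```

With three samples, a single outlier (say, one temperature-0.7 run that fails to cite you) is discarded by the median, which is where the variance reduction comes from.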

Probes are cached for 6 hours by {domain, query, temperature} hash, so a re-scan doesn't re-spend API credits.

7. Ground truth & calibration

Every 6 hours a background sweep runs a controlled seed corpus (~37 curated URLs across tiers and sectors) through the full scorer and records the real citation rate from LLM probes. Each sweep's results are appended to data/groundtruth-YYYY-MM.jsonl.

Against that corpus we report: Spearman ρ (rank correlation between our score and citation rate), Pearson r (linear fit), AUC-ROC (ability to separate cited from non-cited at ≥30% threshold), and NDCG@10 (top-10 ranking match). Live numbers: /validation.html.
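Spearman ρ, the headline metric, is just Pearson correlation applied to ranks. A minimal sketch without tie correction (tied values get their first-seen rank here, which the real implementation would handle properly):

```javascript
// Rank each value 1..n by sort order.
function ranks(xs) {
  const order = xs.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const r = new Array(xs.length);
  order.forEach(([, originalIndex], rank) => { r[originalIndex] = rank + 1; });
  return r;
}

// Standard Pearson correlation coefficient.
function pearson(a, b) {
  const n = a.length;
  const ma = a.reduce((s, v) => s + v, 0) / n;
  const mb = b.reduce((s, v) => s + v, 0) / n;
  let num = 0, da = 0, db = 0;
  for (let i = 0; i < n; i++) {
    num += (a[i] - ma) * (b[i] - mb);
    da += (a[i] - ma) ** 2;
    db += (b[i] - mb) ** 2;
  }
  return num / Math.sqrt(da * db);
}

// Spearman ρ = Pearson on the rank vectors: it rewards getting the
// ORDER of pages right, not the exact score values.
const spearman = (a, b) => pearson(ranks(a), ranks(b));
```

Using rank correlation is the right call here: the scorer only needs to order pages the same way the LLMs do, not predict citation rates on an absolute scale.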

Published GEO targets (Princeton; Ahrefs Brand Radar): ρ ≥ 0.45 and AUC ≥ 0.70 are "production-ready" calibration. Answena's current numbers are public on the validation page — we don't hide regressions.

8. Meta-learner (auto-retrain)

The 10 cluster weights aren't fixed. An L2-regularised logistic regression continuously learns from the ground-truth corpus (engine/metaLearner.js). New weights are only accepted if Spearman and AUC hold on the validation set — regressions trigger an automatic rollback to the previous snapshot.

Each accepted version is stored in data/weights/weights_YYYY_Qn_<timestamp>.json, so historical calibration can be diffed and any version can be restored via POST /api/admin/retrain/rollback.
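The acceptance gate described above reduces to a single guarded swap. A sketch with illustrative names; the thresholds mirror the published targets (ρ ≥ 0.45, AUC ≥ 0.70) rather than the engine's exact configuration:

```javascript
// Accept retrained weights only if both calibration metrics hold on the
// validation set; otherwise keep the current snapshot (automatic rollback).
function acceptWeights(candidate, current, metrics) {
  const ok = metrics.spearman >= 0.45 && metrics.auc >= 0.70;
  return ok
    ? { weights: candidate, rolledBack: false }
    : { weights: current, rolledBack: true };
}
```

The point of the gate is that retraining can never silently degrade calibration: a candidate that regresses either metric is discarded before it is ever served.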

9. Schema depth (JSON-LD completeness)

Presence-only scoring can't tell a stub {"@type":"FAQPage"} from one with 10 Q&A items and full acceptedAnswer.text. Our schemaDepth module measures completeness per entity (filled / expected) with type-specific bonuses (FAQPage needs answer text on ≥80% of items, HowTo needs step text on ≥80% of steps, BreadcrumbList needs ≥2 levels), and a graph-connectivity bonus when @id / @graph stitches entities.
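A minimal sketch of the filled/expected idea with the FAQPage bonus — the expected-field lists and the +0.2 bonus size are invented for illustration, not the module's real spec:

```javascript
// Schema depth: completeness = filled / expected fields per entity,
// plus a type-specific bonus. EXPECTED lists below are illustrative.
const EXPECTED = {
  FAQPage: ["mainEntity"],
  Organization: ["name", "url", "logo", "sameAs"],
};

function schemaDepth(entity) {
  const expected = EXPECTED[entity["@type"]] ?? [];
  if (expected.length === 0) return 0;
  const filled = expected.filter(k => entity[k] != null).length;
  let score = filled / expected.length;

  // FAQPage bonus: ≥ 80% of items must carry real acceptedAnswer text.
  if (entity["@type"] === "FAQPage" && Array.isArray(entity.mainEntity)) {
    const answered = entity.mainEntity.filter(
      q => q.acceptedAnswer && q.acceptedAnswer.text
    ).length;
    if (answered / entity.mainEntity.length >= 0.8) {
      score = Math.min(1, score + 0.2);
    }
  }
  return score;
}
```

This is exactly the distinction presence-only scoring misses: a bare `{"@type":"FAQPage"}` stub scores 0 here, while the same type with fully answered items scores 1.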

10. What we deliberately don't do

11. References

Source: github.com/siyahtonu/gec · Last updated with every deploy · Calibration refreshed every 6h