Loss ratio analytics for long-term health insurance — cohort development analysis, stage-adaptive projection, regime detection, and backtest validation.
Overview
lossratio is a loss ratio analytics toolkit for long-term health insurance, covering cohort development analysis, stage-adaptive projection, regime detection, and backtest validation. Input is long-format experience data — each row (cohort × dev × demographic) maps to one Triangle cell, with loss and premium columns (loss, premium).
In long-term health insurance, new claims and premium are generated and earned continuously within each cohort, so cumulative loss and premium grow together. Age-to-age (ATA) factors tend to show high variability in early development, and premium (≈ risk premium) becomes a more stable and reliable anchor. Product redesigns, underwriting changes, or regulatory actions can also produce structural breaks that accumulate across cohorts.
In this setting, lossratio provides stage-adaptive (SA) loss-ratio projection, supported by maturity point and regime detection. SA uses an exposure-driven (ED) model before the maturity point and chain ladder (CL) after it. Regime detection identifies homogeneous groups of cohorts (regimes) that share similar loss dynamics, separating structural break points and determining which cells to use for estimation.
It provides:
- Three aggregation frameworks of the experience data: cohort × dev (`Triangle`), calendar period (`Calendar`), and portfolio total (`Total`)
- Age-to-age (ATA) and exposure-driven (ED) development modeling via the worker layer (`fit_cl`, `fit_ed`, `fit_ata`, `fit_intensity`)
- Role-specific dispatchers (`fit_loss`, `fit_premium`) that project a single side with standard errors and confidence intervals
- Loss ratio projection (`fit_ratio`), which composes loss and premium fits with three methods:
  - `"ed"`: exposure-driven (default), additive recursion across all dev; the unconditional safe baseline
  - `"cl"`: classical chain ladder (Mack model)
  - `"sa"`: stage-adaptive, ED before maturity and CL after; a composition of the above two (requires maturity detection)

  Standard errors via `se_method = "fixed"` (premium treated as known) or `"delta"` (delta method on `L / P`); CIs via `conf_level`; bootstrap option for empirical CIs.
- Cell-selection diagnostics (which cells to use for estimation):
  - `detect_maturity`: dev axis, the link beyond which ATA factors are stable
  - `detect_regime`: cohort axis, structural breaks across underwriting
- Projection diagnostic:
  - `detect_convergence`: the valuation at which the projected ultimate loss ratio stops revising (operates on a fitted `RatioFit`)
- Backtest and triangle visualisations
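As a rough illustration of the two `se_method` options listed above, the sketch below computes a standard error for a loss ratio `L / P` both ways in base R. It is not the package's implementation: the helper names `ratio_se_fixed()` and `ratio_se_delta()`, the input values, and the normal-approximation interval are all assumptions made for illustration.

```r
# Hypothetical helpers, not part of lossratio.

# "fixed": premium treated as known, so only the loss side contributes.
ratio_se_fixed <- function(se_L, P) se_L / P

# "delta": first-order delta method on L / P, allowing premium uncertainty.
ratio_se_delta <- function(L, P, se_L, se_P, cov_LP = 0) {
  r <- L / P
  sqrt(r^2 * (se_L^2 / L^2 + se_P^2 / P^2 - 2 * cov_LP / (L * P)))
}

# Made-up projected ultimates and standard errors for one cohort.
L <- 45; P <- 150; se_L <- 4.5; se_P <- 3

se_fixed <- ratio_se_fixed(se_L, P)
se_delta <- ratio_se_delta(L, P, se_L, se_P)

# Normal-approximation interval at a given confidence level (an assumption).
conf_level <- 0.95
z <- qnorm((1 + conf_level) / 2)
ci_delta <- L / P + c(-1, 1) * z * se_delta
```

With `se_P = 0` and no covariance, the delta form reduces to the fixed form, which is one way to see how the two options relate.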
Expected input
A long-format data.frame / data.table. Column names are configurable – pass them via as_triangle() arguments and the function standardises internally.
| `as_triangle()` argument | Meaning | Example |
|---|---|---|
| `cohort` | Cohort period (typically UY for long-term health) (Date) | `"uy_m"`, `"uy"` |
| `calendar` or `dev` | Calendar period (Date) or dev period (integer) | `"cy_m"` / `"dev_m"` |
| `loss` | Per-period or cumulative claim amount | `"incr_loss"` / `"loss"` |
| `premium` | Per-period or cumulative premium (risk premium) | `"incr_premium"` / `"premium"` |
| `cell_type` (default) | Interpretation of loss / premium values | `"incremental"` / `"cumulative"` |
| `groups` (optional) | Grouping column(s): product, coverage, age, … | `"coverage"` |
Two more arguments govern interpretation:
- `cell_type`: `"incremental"` (default) or `"cumulative"`. Raw experience is typically incremental; if your data is pre-summed cumulative, pass `cell_type = "cumulative"` and `as_triangle()` derives the incremental form via per-cohort diff.
- `grain`: `"auto"` (default, inferred from `cohort` dates) or `"M"` / `"Q"` / `"H"` / `"Y"`. Aggregates to monthly / quarterly / half-yearly / yearly granularity.
as_triangle() validates the schema, coerces date columns, derives the missing axis when one of calendar / dev is supplied, bins to grain, and emits cumulative + incremental cell values plus the derived ratio, margin, profit columns.
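The per-cohort diff and the derived ratio columns described above can be pictured with a few lines of base R. This is only a sketch on made-up data, not what `as_triangle()` does internally; the helper `to_incr()` and the toy values are assumptions.

```r
# Toy long-format cumulative experience: 2 cohorts x 3 dev periods.
# Column names follow the README convention; the values are made up.
cum <- data.frame(
  cohort  = rep(c("2023-01", "2023-02"), each = 3),
  dev     = rep(1:3, times = 2),
  loss    = c(10, 25, 45, 8, 20, 36),
  premium = c(50, 100, 150, 40, 80, 120)
)

# Hypothetical helper: first difference that keeps the first cumulative value.
to_incr <- function(x) c(x[1], diff(x))

# Differences must be taken in dev order within each cohort.
cum <- cum[order(cum$cohort, cum$dev), ]
cum$incr_loss    <- ave(cum$loss,    cum$cohort, FUN = to_incr)
cum$incr_premium <- ave(cum$premium, cum$cohort, FUN = to_incr)

# Derived ratios in both forms.
cum$ratio      <- cum$loss / cum$premium
cum$incr_ratio <- cum$incr_loss / cum$incr_premium
print(cum)
```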
Column convention
Throughout the package, cumulative is the unmarked default and per-period values carry an incr_ (incremental) prefix:
| Metric | Cumulative (default) | Per-period (incr_) |
|---|---|---|
| Loss | `loss` | `incr_loss` |
| Premium | `premium` | `incr_premium` |
| Ratio | `ratio` | `incr_ratio` |
| Margin | `margin` | `incr_margin` |
| Profit | `profit` | `incr_profit` |
Raw experience input is typically per-period (`incr_loss`, `incr_premium`); `as_triangle()` produces both forms in the output. Worker fit functions (`fit_cl`, `fit_ed`, `fit_ata`, `fit_intensity`) take `loss` / `premium` / `weight` arguments; dispatcher functions (`fit_loss`, `fit_premium`) and the composition `fit_ratio` use role-specific `loss_*` / `premium_*` argument names. Cumulative slots (`"loss"`, `"premium"`) are the defaults.
Installation
# pak (recommended)
pak::pak("seokhoonj/lossratio-r")
# remotes (alternative)
remotes::install_github("seokhoonj/lossratio-r")
Quick Start
library(lossratio)
# Built-in calibrated synthetic experience data
# (per-coverage dev curve calibrated to a real portfolio's broad shape;
# cell-level values and cohort patterns are randomly generated)
data(experience)
# Build the canonical cohort × dev structure
tri <- as_triangle(
experience,
groups = "coverage",
cohort = "uy_m",
calendar = "cy_m",
loss = "incr_loss",
premium = "incr_premium",
cell_type = "incremental" # default; use "cumulative" for pre-summed cells
)
plot(tri) # cohort trajectories
plot_triangle(tri) # cell heatmap
# Exposure-driven fit (additive ED intensity)
ed <- fit_ed(tri, loss = "loss", exposure = "premium")
# Chain ladder fit (multiplicative ATA factors)
cl <- fit_cl(tri, loss = "loss")
plot(cl, type = "projection")
# Ratio fit (default: ED — exposure-driven baseline.
# Switch to method = "cl" or "sa" for cohort-anchored projection.)
ratio <- fit_ratio(tri)
plot(ratio, metric = "ratio", cell_type = "cumulative")
summary(ratio)
# Cell selection: maturity (dev axis) + regime (cohort axis)
detect_maturity(tri[coverage == "surgery"])
detect_regime(tri[coverage == "surgery"], method = "e_divisive")
# Projection diagnostic: when does the projected ultimate LR stop revising?
detect_convergence(ratio)
Aggregation Frameworks
The same long-format experience data can be viewed three ways:
| Builder | Output object | Dimension | Use case |
|---|---|---|---|
| `as_triangle()` | `Triangle` | cohort × dev (2D) | Chain ladder, ED, SA projection |
| `as_calendar()` | `Calendar` | calendar period (1D) | Calendar-year trend / diagonal effect |
| `as_total()` | `Total` | portfolio total (0D, per group) | High-level comparison across groups |
After `as_triangle()`, downstream columns are standardized to `cohort` and `dev` regardless of input granularity (`uy_m` / `uy_q` / `uy`, etc.). The original column names are preserved as attributes (`cohort`, `calendar`, `dev`), and the grain is stored in the `grain` attribute (`"M"` / `"Q"` / `"H"` / `"Y"`).
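To make the three views concrete, here is a minimal base R sketch that aggregates the same toy rows three ways. It does not call `as_triangle()`, `as_calendar()`, or `as_total()`; integer period indices and the relation `calendar = cohort + dev - 1` are simplifying assumptions, and the values are made up.

```r
# Toy incremental experience with integer period indices.
x <- data.frame(
  cohort       = c(1, 1, 1, 2, 2, 3),
  dev          = c(1, 2, 3, 1, 2, 1),
  incr_loss    = c(10, 15, 20, 8, 12, 9),
  incr_premium = c(50, 50, 50, 40, 40, 45)
)
x$calendar <- x$cohort + x$dev - 1

# Triangle view (2D): cohort x dev, NA where a cell is not yet observed.
tri_view <- tapply(x$incr_loss, list(cohort = x$cohort, dev = x$dev), sum)

# Calendar view (1D): sum along each diagonal of the triangle.
cal_view <- aggregate(cbind(incr_loss, incr_premium) ~ calendar,
                      data = x, FUN = sum)
cal_view$incr_ratio <- cal_view$incr_loss / cal_view$incr_premium

# Total view (0D): one row for the whole portfolio.
total_view <- data.frame(loss = sum(x$incr_loss),
                         premium = sum(x$incr_premium))
total_view$ratio <- total_view$loss / total_view$premium
```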
Methods
Exposure-Driven (default)
`fit_ratio(method = "ed")` (default) or `fit_ed()`. All loss increments use premium (risk premium) as the denominator. Unconditional safe baseline – no maturity dependency, robust under early-dev age-to-age volatility.
When to use: as a baseline. The pooled intensity assumes cohorts are reasonably homogeneous in loss-per-premium level; under cohort-level drift (e.g., underwriting / coverage regime change), the projection biases toward the pooled mean and may over-project post-change cohorts. See the regime argument for explicit filtering.
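The additive recursion can be sketched in base R on a toy triangle. The exact intensity definition used by `fit_ed()` is not spelled out here, so this sketch assumes a pooled intensity per dev equal to incremental loss over incremental premium, and it treats future premium as known. The helper `lag_cum()` and all values are illustrative.

```r
# Toy cumulative triangles (rows = cohorts, columns = dev 1..3).
# NA marks cells not yet observed; all values are made up.
loss <- rbind(c(10, 25, 45),
              c( 8, 20, NA),
              c( 9, NA, NA))
# Premium is taken as known for every cell (in practice the future part
# would itself be projected).
premium <- rbind(c(50, 100, 150),
                 c(40,  80, 120),
                 c(45,  90, 135))

# Cumulative -> incremental: subtract the previous dev's cumulative value.
lag_cum <- function(m) cbind(0, m[, -ncol(m), drop = FALSE])
d_loss <- loss - lag_cum(loss)
d_prem <- premium - lag_cum(premium)

# Assumed intensity per dev: pooled incremental loss / pooled incremental
# premium, using only cells where the loss increment is observed.
obs <- !is.na(d_loss)
g <- colSums(d_loss * obs, na.rm = TRUE) / colSums(d_prem * obs)

# Additive recursion: unobserved increments = intensity x premium increment.
proj_incr <- ifelse(obs, d_loss, sweep(d_prem, 2, g, `*`))
proj_loss <- t(apply(proj_incr, 1, cumsum))

ult_ratio <- proj_loss[, ncol(loss)] / premium[, ncol(premium)]
```

Because the intensity is pooled across cohorts, every cohort is pulled toward the same loss-per-premium level, which is the homogeneity assumption noted above.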
Chain Ladder
`fit_ratio(method = "cl")` or `fit_cl()`. Classical Mack (1993) chain ladder with analytic standard errors. The cohort’s own cumulative loss acts as the anchor, so cohort-level drift propagates naturally without explicit regime detection.
When to use: once age-to-age factors stabilise; particularly suited to cohort-level drift scenarios because the cohort’s observed trajectory anchors the projection.
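For orientation, the point-estimate part of chain ladder can be written in a few lines of base R: volume-weighted age-to-age factors, then a multiplicative roll-forward from each cohort's latest observed value. This is a generic textbook sketch on made-up data, not `fit_cl()` itself, and it leaves out the Mack standard errors.

```r
# Toy cumulative loss triangle (rows = cohorts, columns = dev 1..3).
# NA marks cells not yet observed; all values are made up.
loss <- rbind(c(10, 25, 45),
              c( 8, 20, NA),
              c(12, NA, NA))
n_dev <- ncol(loss)

# Volume-weighted ATA factor for each link k -> k + 1,
# using cohorts observed at both ends of the link.
f <- numeric(n_dev - 1)
for (k in seq_len(n_dev - 1)) {
  both <- !is.na(loss[, k]) & !is.na(loss[, k + 1])
  f[k] <- sum(loss[both, k + 1]) / sum(loss[both, k])
}

# Roll each cohort forward from its own latest observed cumulative loss.
proj <- loss
for (k in seq_len(n_dev - 1)) {
  fill <- is.na(proj[, k + 1])
  proj[fill, k + 1] <- proj[fill, k] * f[k]
}
```

The third cohort starts at 12 rather than roughly 9, and that higher level carries through every projected cell, which is the cohort-anchoring behaviour described above.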
Stage-Adaptive
fit_ratio(method = "sa"). Composition of the two: ED before maturity, CL after. Combines ED’s early-dev stability with CL’s cohort-anchored later-dev behaviour.
When to use: long-tail portfolios where early dev is volatile (ED phase) but cohort-level drift needs cohort-anchored projection in later dev (CL phase).
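A minimal base R sketch of the composition, under stated assumptions: the maturity point is given rather than detected, cells at dev up to and including `maturity` are filled additively (ED), later cells multiplicatively (CL), the ED intensity is pooled incremental loss over pooled incremental premium, and future premium is known. None of this is confirmed as the package's exact convention; the data are made up.

```r
# Toy cumulative triangles (rows = cohorts, columns = dev 1..4).
loss <- rbind(c(10, 25, 45, 54),
              c( 8, 20, 36, NA),
              c( 9, 27, NA, NA),
              c(18, NA, NA, NA))
premium <- rbind(c(50, 100, 150, 200),
                 c(40,  80, 120, 160),
                 c(45,  90, 135, 180),
                 c(60, 120, 180, 240))
n_dev <- ncol(loss)

maturity <- 2  # assumed: ED fills dev <= 2, CL fills dev > 2

proj <- loss
for (k in seq_len(n_dev - 1)) {
  obs  <- !is.na(loss[, k + 1])   # cohorts with an observed link k -> k + 1
  fill <- is.na(proj[, k + 1])    # cohorts that need a projected cell
  if (k + 1 <= maturity) {
    # ED step: add intensity x premium increment.
    g <- sum(loss[obs, k + 1] - loss[obs, k]) /
      sum(premium[obs, k + 1] - premium[obs, k])
    proj[fill, k + 1] <- proj[fill, k] +
      g * (premium[fill, k + 1] - premium[fill, k])
  } else {
    # CL step: multiply by the volume-weighted ATA factor.
    f <- sum(loss[obs, k + 1]) / sum(loss[obs, k])
    proj[fill, k + 1] <- proj[fill, k] * f
  }
}

ult_ratio <- proj[, n_dev] / premium[, n_dev]
```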
Prior-Anchored (BF / CC)
`fit_loss(method = "bf")` / `fit_bf()` and `fit_loss(method = "cc")` / `fit_cc()`. Both blend an expected loss ratio (ELR) with the observed loss. Bornhuetter-Ferguson (1972) takes the ELR from an external prior; Cape Cod (Stanard 1985) derives it from the data by payout weighting.
When to use: immature cohorts or post-rate-change cohorts where the observed data is too thin to anchor a projection on its own – BF when a credible external prior exists, CC when the portfolio suggests a single cohort-cohesive ELR target.
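The blend can be illustrated with the standard textbook forms of the two estimators in base R. This is a sketch rather than the code behind `fit_bf()` / `fit_cc()`: it assumes premium is a fixed, known exposure per cohort and that development factors to ultimate are supplied, and all numbers (including the prior ELR of 0.35) are made up.

```r
# Made-up inputs per cohort (oldest first).
loss_obs <- c(45, 20, 9)       # observed cumulative loss
premium  <- c(150, 120, 135)   # exposure, treated as known here
cdf      <- c(1, 1.8, 4.5)     # development factor to ultimate (e.g. from CL)

pct_dev <- 1 / cdf             # share of ultimate loss expected to date

# Bornhuetter-Ferguson: an external prior ELR fills the undeveloped share.
elr_prior <- 0.35              # hypothetical prior
ult_bf <- loss_obs + elr_prior * premium * (1 - pct_dev)

# Cape Cod: ELR estimated from the data, weighting premium by pct_dev.
elr_cc <- sum(loss_obs) / sum(premium * pct_dev)
ult_cc <- loss_obs + elr_cc * premium * (1 - pct_dev)
```

The less developed a cohort is, the more weight the ELR term carries, which is why these methods suit thin or immature cohorts.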
Visualisation
Both S3 generics dispatch on object class:
plot(x) # base plot generic: line / panel diagnostics
plot_triangle(x) # lossratio generic: cell heatmap layout
`plot()` and `plot_triangle()` work uniformly across `Triangle`, `Calendar`, `Link`, `ATAFit`, `EDFit`, `CLFit`, `RatioFit`, `Maturity`, `Convergence`, and `Regime` objects.
Documentation
?as_triangle
?fit_ratio
?detect_regime
vignette("diagnostics", package = "lossratio")
License
MPL-2.0. See LICENSE.md.
Author
Seokhoon Joo (seokhoonj@gmail.com)
