Validate raw experience data, aggregate it onto a (group, cohort, dev) grid, and assign the Triangle S3 class so the downstream
methods (fit_ratio(), fit_loss(), backtest(), plot(),
plot_triangle(), detect_maturity(), detect_regime(),
detect_convergence(), ...) can dispatch on the result.
Three steps happen inside this single call:
Validate – required columns are present, dates coerce cleanly, the grain is consistent. Hard errors on schema issues so downstream code never receives malformed input.
Standardise + aggregate – rename to package-canonical column names (cohort, calendar, dev, loss, premium, ...), auto-detect grain (M/Q/H/Y) from cohort spacing, derive dev from (cohort, calendar), aggregate to (group, cohort, dev), and enrich with cumulative / share / LR columns.
Tag – set S3 class c("Triangle", "data.table", "data.frame") so every *.Triangle method becomes available.
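The validate and tag steps can be illustrated with a minimal base R sketch. This is not the package's internal code: `check_input()`, `describe()` and its methods are hypothetical names introduced here only to show a hard error on a schema problem and how prepending a class makes S3 dispatch pick the `*.Triangle` method.

```r
# Hypothetical validator: stop with a hard error when columns are missing.
check_input <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0L) {
    stop("missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

df <- data.frame(uy_m = as.Date("2023-01-01"), incr_loss = 1)
msg <- tryCatch(
  check_input(df, c("uy_m", "cy_m", "incr_loss", "incr_premium")),
  error = function(e) conditionMessage(e)
)

# Hypothetical generic: dispatch follows the class vector from left to right.
describe <- function(x, ...) UseMethod("describe")
describe.default  <- function(x, ...) "plain object"
describe.Triangle <- function(x, ...) "Triangle object"

before <- describe(df)
class(df) <- c("Triangle", class(df))   # tag step (illustrative)
after <- describe(df)
```

Because the tag is prepended rather than replacing the existing classes, the object still behaves as a data.frame for any generic that has no Triangle method.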
lossratio's Triangle is a data.table in long format (one
row per (group, cohort, dev) cell) with the enriched columns
described above. The name Triangle refers to the conceptual
cohort x dev triangular region – older cohorts have more observed
dev cells than newer ones – not to a matrix layout.
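As a rough illustration of the long layout, the sketch below builds one row per (group, cohort, dev) cell in base R and adds the cumulative columns. It assumes `dev` is already available and uses toy numbers; the package itself works on a data.table and may differ in detail.

```r
# Toy per-period records; two rows fall into the same (group, cohort, dev) cell.
raw <- data.frame(
  group        = c("A", "A", "A", "A"),
  cohort       = as.Date(c("2023-01-01", "2023-01-01", "2023-01-01", "2023-02-01")),
  dev          = c(1L, 1L, 2L, 1L),
  incr_loss    = c(10, 5, 20, 8),
  incr_premium = c(30, 10, 40, 20)
)

# Aggregate to one row per (group, cohort, dev) cell.
cells <- aggregate(cbind(incr_loss, incr_premium) ~ group + cohort + dev,
                   data = raw, FUN = sum)
cells <- cells[order(cells$group, cells$cohort, cells$dev), ]

# Cumulate along dev within each (group, cohort).
key <- paste(cells$group, cells$cohort)
cells$loss    <- ave(cells$incr_loss, key, FUN = cumsum)
cells$premium <- ave(cells$incr_premium, key, FUN = cumsum)

# Per-period and cumulative loss ratios.
cells$incr_ratio <- cells$incr_loss / cells$incr_premium
cells$ratio      <- cells$loss / cells$premium
```

The older cohort (2023-01) ends up with two dev cells and the newer one (2023-02) with a single cell, which is the triangular shape the name refers to.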
The auto-grain detection (grain = "auto", default) reads cohort
value spacing; explicit values must be at least as coarse as the
input grain. The user does not need to pre-bin the data or supply a dev_*
column.
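One way to picture spacing-based detection is the sketch below. `infer_grain()` is a hypothetical helper, not the package's detection routine; it assumes the grain can be read off the smallest month gap between distinct cohort dates.

```r
# Hypothetical helper: guess the grain from the smallest gap, in months,
# between distinct cohort dates.
infer_grain <- function(cohort) {
  d <- sort(unique(as.Date(cohort)))
  if (length(d) < 2L) return(NA_character_)   # spacing is undefined
  lt <- as.POSIXlt(d)
  month_index <- lt$year * 12L + lt$mon       # months on a continuous scale
  step <- min(diff(month_index))
  if (step >= 12L) "Y" else if (step >= 6L) "H" else if (step >= 3L) "Q" else "M"
}

g_month   <- infer_grain(as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")))
g_quarter <- infer_grain(as.Date(c("2023-01-01", "2023-04-01", "2023-07-01")))
g_year    <- infer_grain(as.Date(c("2021-01-01", "2022-01-01", "2023-01-01")))
```

Under this reading, quarterly input can be viewed at "Q", "H" or "Y" but not at "M", because the monthly detail is not present in the data.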
The result contains:
cumulative loss and cumulative premium,
per-period and cumulative proportions,
per-period and cumulative margin,
profit indicators,
per-period loss ratio (incr_ratio = incr_loss / incr_premium) and cumulative loss ratio (ratio = loss / premium).
The cumulative loss ratio is defined as: $$\mathrm{ratio} = \frac{\mathrm{loss}}{\mathrm{premium}}$$
For long-term health insurance applications, risk premium is commonly
used as the premium measure.
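Written out per cell, the two ratios relate to the per-period values as follows. The subscripts g, c, d (group, cohort, dev) are notation introduced here for illustration, and the sums run over all development periods up to and including d:

$$\mathrm{ratio}_{g,c,d} = \frac{\mathrm{loss}_{g,c,d}}{\mathrm{premium}_{g,c,d}} = \frac{\sum_{k \le d} \mathrm{incr\_loss}_{g,c,k}}{\sum_{k \le d} \mathrm{incr\_premium}_{g,c,k}}, \qquad \mathrm{incr\_ratio}_{g,c,d} = \frac{\mathrm{incr\_loss}_{g,c,d}}{\mathrm{incr\_premium}_{g,c,d}}$$

The cumulative ratio is therefore a premium-weighted average of the per-period ratios, not their simple mean.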
Proportion variables are computed within each (cohort, dev) cell:
incr_loss_share = incr_loss / sum(incr_loss)
incr_premium_share = incr_premium / sum(incr_premium)
loss_share = loss / sum(loss)
premium_share = premium / sum(premium)
Therefore, for a fixed (cohort, dev) cell, the proportions
sum to 1 across groups. These are useful for examining the composition of
each development cell across products or other grouping variables.
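A small base R sketch of the share calculation, using made-up numbers and only the `loss_share` column, shows the sum-to-one property. The key construction and use of `ave()` are illustrative choices, not the package's implementation.

```r
# Two groups observed in the same two (cohort, dev) cells.
cells <- data.frame(
  group  = c("P001", "P002", "P001", "P002"),
  cohort = c("2023-01", "2023-01", "2023-02", "2023-02"),
  dev    = c(1L, 1L, 1L, 1L),
  loss   = c(30, 70, 20, 20)
)

# Divide each group's loss by the total over all groups in its (cohort, dev) cell.
key <- paste(cells$cohort, cells$dev)
cells$loss_share <- cells$loss / ave(cells$loss, key, FUN = sum)

# Shares within each (cohort, dev) cell add up to 1.
share_totals <- tapply(cells$loss_share, key, sum)
```

Here P001 carries 30% of the loss in cohort 2023-01 and 50% in cohort 2023-02.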
Usage
as_triangle(
df,
groups = NULL,
cohort,
calendar = NULL,
dev = NULL,
loss,
premium,
grain = "auto",
cell_type = c("incremental", "cumulative"),
fill_gaps = FALSE
)
Arguments
- df
A data.frame containing experience data with per-period loss and premium columns plus cohort and calendar Date columns (or any input that the internal Date coercion accepts: Date, POSIXt, integer yyyy/yyyymm/yyyymmdd, ISO string).
- groups
Column(s) used for grouping (e.g., product, gender).
- cohort
Single column (raw name) defining the underwriting / premium period start (e.g., "uy_m").
- calendar
Single column (raw name) defining the calendar period of the observation (e.g., "cy_m"). Optional – supply either calendar or dev (or both). When calendar is given, dev is derived internally via count_periods(cohort, calendar, grain).
- dev
Single column (raw name) holding pre-computed development periods (e.g., "dev_m"). Optional – supply either calendar or dev (or both). When only dev is given, the calendar axis is omitted from the attribute (downstream calendar-diagonal logic uses cohort + dev). When both are given, dev is cross-checked against count_periods(cohort, calendar, grain).
- loss
Single character; per-period loss column in df (raw name, e.g., "incr_loss").
- premium
Single character; per-period premium column in df (raw name, e.g., "incr_premium"). Premium measure used as denominator for loss ratio calculations. For long-term health insurance applications, risk premium is commonly used.
- grain
One of "auto" (default), "M", "Q", "H", "Y". "auto" infers the grain from the cohort value spacing. Explicit values must be at least as coarse as the input grain; the input is binned (floored) to that grain before aggregation.
- cell_type
One of "incremental" (default) or "cumulative". Whether loss and premium in df already hold per-period (incremental) values or cumulative-within-cohort values. The internal triangle is always built on the incremental representation; "cumulative" inputs are differenced first.
- fill_gaps
Logical; if TRUE, zero-fill missing (groups, cohort, dev) cells so that every cohort has a consecutive dev sequence. Default FALSE, which raises an error when gaps are detected. Use validate_triangle() to inspect gaps before deciding.
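The cell_type and fill_gaps descriptions can be pictured with a short base R sketch. It assumes rows are sorted by dev within each cohort and uses toy values; the differencing and gap test shown here are illustrative, not the package's internal logic.

```r
# Cumulative-within-cohort input; cohort 2023-02 has no dev = 2 row.
cum <- data.frame(
  cohort = c("2023-01", "2023-01", "2023-01", "2023-02", "2023-02"),
  dev    = c(1L, 2L, 3L, 1L, 3L),
  loss   = c(10, 25, 45, 8, 30)
)

# Difference cumulative values within each cohort to recover per-period values.
cum$incr_loss <- ave(cum$loss, cum$cohort, FUN = function(x) diff(c(0, x)))

# A cohort has a gap when its distinct dev values do not cover the full
# range from its smallest to its largest dev.
has_gap <- tapply(cum$dev, cum$cohort,
                  function(d) length(unique(d)) != diff(range(d)) + 1L)
```

In cohort 2023-02 the differenced value at dev 3 lumps two periods together because dev 2 is missing, which is one reason to inspect gaps before choosing between the default error and zero-filling.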
Value
A data.frame with class "Triangle", containing the following
derived columns:
- n_cohorts
Number of distinct cohorts observed
- loss, incr_loss
Cumulative and per-period loss
- premium, incr_premium
Cumulative and per-period premium
- ratio, incr_ratio
Cumulative and per-period loss ratio
- margin, incr_margin
Cumulative and per-period margin (premium - loss)
- profit, incr_profit
Profit indicator (factor "pos"/"neg")
- loss_share, incr_loss_share
Cumulative and per-period proportions of loss within each (cohort, dev) cell
- premium_share, incr_premium_share
Cumulative and per-period proportions of premium within each (cohort, dev) cell
Attributes set on the returned object: groups, cohort,
calendar, grain, dev (= "dev_<lower(grain)>", e.g.
"dev_m"), loss, premium, longer.
Examples
if (FALSE) { # \dontrun{
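# Each cohort is observed from its own start month through 2023-03, so the
# calendar month is never earlier than the cohort month and every cohort has
# a consecutive dev sequence.
df <- data.frame(
  pd_cd = rep(c("P001", "P002"), each = 6),
  pd_nm = rep(c("cancer", "health"), each = 6),
  uy_m = rep(as.Date(c("2023-01-01", "2023-01-01", "2023-01-01",
                       "2023-02-01", "2023-02-01", "2023-03-01")), 2),
  cy_m = rep(as.Date(c("2023-01-01", "2023-02-01", "2023-03-01",
                       "2023-02-01", "2023-03-01", "2023-03-01")), 2),
  incr_loss = runif(12, 80, 120),
  incr_premium = runif(12, 90, 110)
)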
# auto-detected monthly grain
res_m <- as_triangle(
df,
groups = "pd_cd",
cohort = "uy_m",
calendar = "cy_m",
loss = "incr_loss",
premium = "incr_premium"
)
# explicit quarterly view (re-bins monthly input to quarterly)
res_q <- as_triangle(
df,
groups = "pd_cd",
cohort = "uy_m",
calendar = "cy_m",
loss = "incr_loss",
premium = "incr_premium",
grain = "Q"
)
head(res_m)
attr(res_m, "longer")
} # }
