Compute group-wise summary statistics for age-to-age (ata) factors from an
object of class "ATA". This function serves two purposes:
- Diagnostics: provides descriptive statistics (mean, median, wt, cv) that help the user assess the stability and consistency of observed ata factors across cohorts.
- Estimation: fits a no-intercept weighted least squares (WLS) model per ata link to produce the WLS-estimated factor (f), its standard error (f_se), relative standard error (rse), and Mack sigma (sigma). These are used downstream by find_ata_maturity() and fit_ata().
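For reference, a weighted least squares fit through the origin for link \(k\), writing \(C_{i,k}\) for value_from and \(C_{i,k+1}\) for value_to of cohort \(i\) and using generic regression weights \(w_i\), has the standard estimates below, where \(n_k\) is the number of cohorts in the fit. How summary() derives \(w_i\) from alpha, or from a weight column, is internal to the function and is not reproduced here. With Mack's chain-ladder choice \(w_i = 1 / C_{i,k}\), \(\hat{f}_k\) reduces to \(\sum_i C_{i,k+1} / \sum_i C_{i,k}\) and \(\hat{\sigma}_k\) to Mack's sigma.

$$
\begin{aligned}
C_{i,k+1} &= f_k \, C_{i,k} + \varepsilon_{i,k}, \qquad \operatorname{Var}(\varepsilon_{i,k}) \propto 1 / w_i \\
\hat{f}_k &= \frac{\sum_i w_i \, C_{i,k} \, C_{i,k+1}}{\sum_i w_i \, C_{i,k}^2} \\
\hat{\sigma}_k^2 &= \frac{1}{n_k - 1} \sum_i w_i \left( C_{i,k+1} - \hat{f}_k \, C_{i,k} \right)^2 \\
SE(\hat{f}_k) &= \frac{\hat{\sigma}_k}{\sqrt{\sum_i w_i \, C_{i,k}^2}}
\end{aligned}
$$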
Usage
# S3 method for class 'ATA'
summary(object, alpha = 1, digits = 3, ...)
Arguments
- object
An object of class "ATA", typically produced by build_ata().
- alpha
Numeric scalar controlling the variance structure in the WLS fit. Default is 1.
- digits
Number of decimal places to round numeric columns to. Default is 3. Pass NULL to skip rounding.
- ...
Additional arguments passed to the internal WLS estimation.
Value
A data.table with class "ATASummary" containing one row
per ata link:
- ata_from, ata_to, ata_link
Link identifiers.
- mean
Arithmetic mean of observed ata factors.
- median
Median of observed ata factors.
- wt
Volume-weighted mean, \(\sum_i C_{i,k+1} / \sum_i C_{i,k}\). Independent of alpha.
- cv
Coefficient of variation of observed ata factors (\(SD / mean\)). Used by find_ata_maturity() to assess stability.
- f
WLS-estimated factor. Equals wt when alpha = 2 and no rows with value_from = 0 are present.
- f_se
Standard error of the WLS-estimated factor.
- rse
Relative standard error of the WLS-estimated factor (\(f\_se / f\)).
- sigma
Mack sigma (residual standard deviation from the WLS fit). Used in Mack variance estimation.
- n_obs
Total number of observations for the link.
- n_valid
Number of finite ata values.
- n_inf
Number of infinite ata values.
- n_nan
Number of NaN ata values.
- valid_ratio
Proportion of finite ata values (\(n\_valid / n\_obs\)).
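The descriptive and count columns can be illustrated for a single link with a few lines of base R. The helper describe_link() below is hypothetical (it is not part of the package), and it assumes that mean and median are computed from finite ata values only while wt uses every row whose two values are finite.

```r
# Hypothetical helper (not part of the package): descriptive and count
# columns for one ata link. Assumes mean and median use finite ata values only.
describe_link <- function(value_from, value_to) {
  ata   <- value_to / value_from                       # observed ata factors
  valid <- is.finite(ata)                              # drops Inf, -Inf, NaN, NA
  ok    <- is.finite(value_from) & is.finite(value_to) # rows that enter wt
  data.frame(
    mean        = mean(ata[valid]),
    median      = median(ata[valid]),
    wt          = sum(value_to[ok]) / sum(value_from[ok]),
    n_obs       = length(ata),
    n_valid     = sum(valid),
    n_inf       = sum(is.infinite(ata)),
    n_nan       = sum(is.nan(ata)),
    valid_ratio = mean(valid)                          # share of finite factors
  )
}

# Illustrative cohorts: the last two have value_from = 0
describe_link(value_from = c(100, 200, 400, 0, 0),
              value_to   = c(150, 260, 480, 30, 0))
```

Here the fourth cohort has claims emerging from a zero base (30 / 0 = Inf) and the fifth has nothing at either age (0 / 0 = NaN). Both are excluded from n_valid, yet both still enter wt because their values are finite.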
Relationship between wt and f
Both wt and f are weighted averages of the observed ata factors,
but they differ in how weights are assigned and which observations
are included:
- wt
Volume-weighted mean: \(wt = \sum_i C_{i,k+1} / \sum_i C_{i,k}\). Computed from all rows where value_from and value_to are finite, including rows where either value is zero. Independent of alpha.
- f
WLS-estimated factor. Only rows where value_from > 0 are used, since value_from = 0 causes numerical issues in the WLS weights (\(w = value\_from^{\alpha}\)). When alpha = 2, f and wt are numerically equivalent (assuming no rows with value_from = 0). When \(\alpha \ne 2\), they generally differ.
Therefore wt and f can differ for two reasons:
- Zero exclusion: rows with value_from = 0 are included in wt but excluded from f. This typically affects early development periods, where some cohorts have not yet accumulated any claims.
- Alpha effect: when \(\alpha \ne 2\), the WLS weights differ from the volume weights used in wt, leading to different estimates. Comparing wt and f can help diagnose whether the choice of alpha materially affects the estimated factor.
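The zero-exclusion effect can be reproduced in base R. This sketch fixes the regression weights to 1 / value_from, an assumption (Mack's chain-ladder choice) under which a no-intercept WLS fit returns exactly the volume-weighted mean of the rows it keeps; it does not reproduce how summary() maps alpha to weights, so only the zero-exclusion effect is isolated here.

```r
# Illustrative cohorts for one link: the last cohort has nothing at the
# earlier age (value_from = 0) but has 30 at the later age.
value_from <- c(100, 200, 400, 0)
value_to   <- c(150, 260, 480, 30)

# wt: volume-weighted mean over all finite rows, zero rows included
wt <- sum(value_to) / sum(value_from)

# f: no-intercept WLS fit on rows with value_from > 0 only. The weights
# 1 / x are an assumption (Mack's chain-ladder choice), not the package's
# alpha-based weights.
keep <- value_from > 0
x <- value_from[keep]
y <- value_to[keep]
fit <- lm(y ~ 0 + x, weights = 1 / x)
f <- unname(coef(fit))

round(c(wt = wt, f = f), 4)
```

Dropping the zero row lowers the estimate in this example: the 30 units that emerge from a zero base count in the numerator of wt but never reach f.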
Weights
When the input "ATA" object contains a weight column (added by
build_ata() when weight_var is supplied), that column is
automatically used as the WLS weight in place of value_from. This
is useful when value_var = "clr", where value_from carries no
exposure information and an external exposure variable such as crp
should be used instead.
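Why an external exposure matters can be seen without the package. The sketch contrasts two weighted means of the same loss-ratio ata factors, one weighted by value_from (here a loss ratio) and one by a premium-like exposure. All numbers are made up, and treating the weight as entering linearly, ignoring alpha, is a simplifying assumption.

```r
# Illustrative inputs for one link (made-up numbers): cumulative loss ratios
# at the two ages and a premium-like exposure for each cohort.
clr_from <- c(0.40, 0.55, 0.30)
clr_to   <- c(0.60, 0.66, 0.45)
crp      <- c(1000, 200, 5000)

ata <- clr_to / clr_from

# Weighting by value_from: cohorts with high loss ratios dominate,
# regardless of how much business they represent.
by_value_from <- weighted.mean(ata, w = clr_from)

# Weighting by the external exposure: large cohorts dominate.
by_exposure <- weighted.mean(ata, w = crp)

round(c(by_value_from = by_value_from, by_exposure = by_exposure), 4)
```

The small, high-loss-ratio second cohort pulls the value_from-weighted mean toward its factor of 1.2, while the exposure-weighted mean is driven by the large third cohort.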
Coefficient of variation (cv)
The coefficient of variation is defined as:
$$cv_k = \frac{SD(f_{i,k})}{\bar{f}_k}$$
where \(f_{i,k} = C_{i,k+1} / C_{i,k}\) is the observed ata value of cohort \(i\) for link
\(k\), the SD is taken across cohorts \(i\), and \(\bar{f}_k\) is the arithmetic mean of those values. The cv
reflects the relative spread of observed factors across cohorts,
regardless of the exposure scale. It is used by
find_ata_maturity() as one of the criteria for determining the
maturity point.
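A short base-R sketch of the cv and of its indifference to cohort size: enlarging one cohort tenfold without changing its ata factor leaves cv unchanged, while wt moves toward that cohort's factor. The numbers are illustrative, and restricting cv to finite ata values with R's sd() (denominator \(n - 1\)) is an assumption about summary().

```r
# Illustrative cohorts for one link (made-up numbers)
value_from <- c(100, 200, 400)
value_to   <- c(150, 260, 480)

# cv over finite ata values; every cohort counts once
cv_of <- function(from, to) {
  ata <- to / from
  ata <- ata[is.finite(ata)]
  sd(ata) / mean(ata)
}
# wt for comparison: dominated by large cohorts
wt_of <- function(from, to) sum(to) / sum(from)

# Make the first cohort ten times larger without changing its ata factor
enlarge  <- c(10, 1, 1)
big_from <- value_from * enlarge
big_to   <- value_to * enlarge

round(c(cv = cv_of(value_from, value_to), cv_big = cv_of(big_from, big_to),
        wt = wt_of(value_from, value_to), wt_big = wt_of(big_from, big_to)), 4)
```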
Relative standard error (rse)
The relative standard error is defined as:
$$rse = \frac{SE(\hat{f}_k)}{\hat{f}_k}$$
where \(SE(\hat{f}_k)\) is the standard error of the
WLS-estimated factor. Unlike cv, which treats all cohorts equally,
rse gives more weight to cohorts with larger exposures (via the
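WLS weights). A small rse indicates that the WLS estimate is
precise, which tends to occur when (1) there are many cohorts and
(2) the observed ata values are consistent across cohorts, particularly
among the cohorts with the largest WLS weights. Because sigma is
estimated from the WLS residuals, rse depends on the relative rather than
the absolute size of the exposures: rescaling value_from and value_to
for every cohort by the same constant leaves rse unchanged.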
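The quantities f, f_se, sigma and rse can be read off a standard lm() fit through the origin. This sketch fixes the weights to 1 / value_from (Mack's chain-ladder choice) as an assumption, since the mapping from alpha to weights inside summary() is not reproduced here; the numbers are illustrative.

```r
# Illustrative cohorts for one link (made-up numbers), all with value_from > 0
value_from <- c(100, 200, 400, 250)
value_to   <- c(150, 260, 480, 330)

# No-intercept WLS fit. The weights 1 / value_from are an assumption (Mack's
# chain-ladder choice); summary() derives its weights from alpha internally.
fit   <- lm(value_to ~ 0 + value_from, weights = 1 / value_from)
coefs <- summary(fit)$coefficients

f     <- coefs["value_from", "Estimate"]
f_se  <- coefs["value_from", "Std. Error"]
sigma <- summary(fit)$sigma      # residual SD of the weighted fit
rse   <- f_se / f

round(c(f = f, f_se = f_se, rse = rse, sigma = sigma), 4)
```

With these weights, f equals sum(value_to) / sum(value_from), sigma equals Mack's closed form sqrt(sum(value_from * (ata - f)^2) / (n - 1)) with ata = value_to / value_from, and f_se equals sigma / sqrt(sum(value_from)).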
