Estimating transition coefficients for reconstructing coherent series of mortality by cause of death

Abstract

Regular revisions of the classification of diseases and the consequent disruptions of mortality series are well-known issues in long-term cause-of-death analysis. Given basic assumptions and medical knowledge about possible exchanges across causes of death in the revision years, redistribution of counts of causes of death into a new classification can be viewed as a constrained optimization problem. Penalized likelihood within a quadratic programming framework allows estimation of exchanges that vary smoothly over age groups. The approach is illustrated using both German data on malignant neoplasms and French data on heart diseases.

Keywords

classification of causes of death; inequality constraints; mortality series; quadratic programming; smoothing

1 Introduction

Unlike medical and epidemiological research, demographic studies often use data based on national populations. In mortality research, sex, age at death and cause of death (CoD) are often the only information available. While deaths are easily categorized by sex and age, the classification by CoD is less clear-cut: classification schemes change over time and are specific to each country. By CoD, we refer to the medical description of the harmful condition that leads to death, without considering the factors that jointly lead to this condition; for example, lung cancer and liver cirrhosis are causes of death, but the associated smoking and drinking behaviours are not.

A widely used system for coding data on causes of death is the International Statistical Classification of Diseases and Related Health Problems (ICD), currently maintained by the World Health Organization (WHO). Since the classification was first published in 1893, it has been continuously upgraded and revised to reflect progress in medical knowledge (Moriyama et al., 2011). Currently in its tenth revision, the ICD-10 contains over 10 000 items, that is, codes for diseases and causes of death. Previously, ICD-9 contained nearly 5 000 and ICD-8 over 2 000 items.

An evident consequence of every revision is that time series of mortality by cause often exhibit disruptions that are due not to actual changes in mortality trends or levels but to changes in the coding system. When changes are due only to the merging of several items or to the division of one item into several new ones, the resulting discontinuities are relatively easy to deal with. Unfortunately, it is more common for several old items and some newly adopted ones to be fused into a new CoD, a case without an evident immediate solution. Such types of pooling are commonly labelled ‘complex associations’ by demographers (Pechholdová et al., 2017). We will stick to this expression here, despite the different meaning of associations in statistics. National statistical offices rarely produce a double classification (a cross-tabulation of deaths by both the current and the previous revision) that would allow deaths from previous periods to be redistributed directly according to the new classification.

Therefore, a pressing problem in mortality studies is the correct reconstruction of coherent time series of deaths by cause, which is a preliminary step for any further analysis of cause-specific mortality trends. Several methods have been proposed in the literature.

Rey et al. (2011) suggest assessing the presence of disruptions by using local polynomial kernel smoothing and estimating associated correction factors. These factors are then incorporated in the reconstruction of cause-specific mortality series by age and sex by means of generalized additive models. Time series models with constraints for achieving consistent numbers across causes of death have been proposed by van der Stegen et al. (2014). However, these methods do not fully integrate knowledge about the medical contents of changes in the ICD revision, and they treat age groups independently, neglecting their natural order.

Alternatively, important results have been obtained using a procedure that redistributes death counts from an earlier revision across causes in the newer classification by the construction of concordance tables, thereby linking the items in two successive ICD revisions based on medical content (Meslé and Vallin, 1996; Janssen and Kunst, 2004; Pechholdová, 2009; Grigoriev et al., 2012; Pechholdová et al., 2017). In short, this approach uses a correspondence table that, based on medical and clinical knowledge, makes it possible to match CoDs between two classifications. This table identifies independent groups of CoDs that fully exchange their death counts. In the second step, transition coefficients are computed. These transition coefficients are the proportions of deaths that move between old and new items. They are identified by determining reasonable trends for each cause and age. Visual inspection of the cause- and age-specific trends is crucial for making any ex-post corrections. This approach involves considerable manual effort and sometimes requires subjective adjustments.

In this article, we suggest a novel methodology that combines existing techniques to automate the estimation of the transition coefficients in associations across old and new CoDs. Within a mathematical structure, our approach combines the medical knowledge contained in a given correspondence table with an inherent demographic assumption: proportions among old CoDs are equal in the transition years (Meslé and Vallin, 1996). In other words, we assume that deaths in the first year of the new revision are distributed among old CoDs in the same proportions as deaths actually classified in the last year of the old revision.

Under these assumptions, the system for estimating transition coefficients can be formulated as a least-squares problem with inequality constraints that guarantee that the transition coefficients are bounded between 0 and 1. Moreover, we generalize the method so that the coefficients vary smoothly over age. The estimation of the proposed model can thus be addressed by using a penalization approach within a quadratic programming (QP) framework. QP allows us to easily incorporate our inequality constraints, while smooth estimates are obtained by putting a penalty on differences of neighbouring transition coefficients. In this way, we can routinely redistribute death counts and consequently reconstruct coherent series by CoDs.

The remainder of the article is organized as follows. A typical example of disruption due to a revision of the ICD will set the stage in the following section, after which we will introduce the proposed model. Here, different assumptions on the age patterns of transition coefficients will be presented, and corresponding estimation procedures will be suggested. In Section 4, we present an application of the approach to German deaths from malignant neoplasms and to French data on heart diseases. More complicated and more comprehensive applications are possible without changing the whole model structure. A critical discussion of the method concludes the article.

2 Example and assumptions

As an example of disruptions due to changes in the coding, we present the case of West German deaths between two ICD revisions. ICD-9 and ICD-10 were in use over the periods 1979–1997 and 1998–2013 (last available year), respectively. Here, we restrict our attention to some malignant neoplasms. In order to show longer reconstructed series, we also include data from ICD-8 (1968–1978) that were already redistributed according to the ICD-9 classification. The observed data are thus two three-dimensional arrays in which death counts are categorized by CoD, five-year age groups and calendar year, one array for each classification period. In order to avoid comparability issues for diseases that affect children and young adults differently, we only consider age groups starting from ages 30–34. The last available open-ended age category is 85+.

Figure 1 shows time series of death counts for two specific age groups (50–54 and 75–79). Disruptions in 1998 due to a change in CoD classification are evident. For ease of presentation, we identify CoDs in the old and new classifications with upper- and lower-case letters, respectively. A detailed breakdown of the series analysed in this example is given in Table A.1.

Figure 1:

Notes: The vertical dashed line indicates the year of the ICD revision. For detailed information about the underlying CoDs, see Table A.1.

The CoDs in this dataset belong to the same group of malignant neoplasms (they are in the same association, as we say). Therefore, we can assume that all deaths due to the old CoDs would have been classified within the new CoDs, if the new classification had existed during the first period. The aim is to reconstruct coherent mortality trends over both periods in accordance with the new classification, that is, to redistribute deaths that occurred in the first period across the new CoDs.

Changes in medical practices can also lead to specific irregularities in a mortality time series by CoD. However, we will assume that disruptions in the revision year are solely due to changes in the official classification, without considering variations in coding practices arising during the same year. This is a reasonable assumption, since it considers the only certain and measurable events in cause-specific mortality series, allowing a statistical treatment of the problem. Other changes that occur during the year of transition will then be regarded as part of the revision process, and known irregularities in other years will have to be treated separately.

As mentioned, all possible exchanges across CoDs in the two revisions are defined in a correspondence table. This table can be written succinctly as a Boolean $m \times n$ matrix $G$ in which rows are indexed by the new CoDs and columns by the CoDs of the first period (the old coding regime). Cells equal to $1$ ($0$) identify possible (impossible) exchanges between old and new CoDs. Hence, $m$ denotes the number of CoDs in the new coding and $n$ the number of CoDs in the old scheme.

Medical knowledge and a detailed understanding of the changes between two ICD revisions are necessary to create a correspondence table. In our German example, we used information provided by Pechholdová (2009), and we thus consider $G$ as a given input in our model:

G = \begin{array}{c|cccccc} & A & B & C & D & E & F \\ \hline a & 0 & 1 & 0 & 0 & 0 & 0 \\ b & 1 & 1 & 0 & 0 & 0 & 0 \\ c & 0 & 1 & 1 & 0 & 0 & 0 \\ d & 0 & 0 & 0 & 0 & 1 & 0 \\ e & 1 & 1 & 0 & 0 & 0 & 1 \\ f & 0 & 0 & 0 & 1 & 0 & 1 \\ g & 0 & 0 & 0 & 1 & 0 & 0 \\ h & 0 & 0 & 0 & 1 & 1 & 1 \end{array} .

(2.1)

Equation (2.1) informs us, for example, that counts belonging to the old CoD $B$ should be redistributed across new CoDs $[a, b, c, e]$ . Alternatively, we can read it the other way around: deaths classified in item $e$ by the ICD-10 were classified as either $A$ , $B$ or $F$ in the ICD-9 scheme.
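As an illustration only, the correspondence table in (2.1) can be stored as a 0/1 array and read in both directions. The minimal Python sketch below uses plain lists; the names `OLD`, `NEW`, `targets_of_old` and `sources_of_new` are our own and are not part of the authors' R routines.

```python
# Correspondence table G from (2.1): rows = new CoDs (a..h), columns = old CoDs (A..F).
OLD = ["A", "B", "C", "D", "E", "F"]
NEW = ["a", "b", "c", "d", "e", "f", "g", "h"]
G = [
    [0, 1, 0, 0, 0, 0],  # a
    [1, 1, 0, 0, 0, 0],  # b
    [0, 1, 1, 0, 0, 0],  # c
    [0, 0, 0, 0, 1, 0],  # d
    [1, 1, 0, 0, 0, 1],  # e
    [0, 0, 0, 1, 0, 1],  # f
    [0, 0, 0, 1, 0, 0],  # g
    [0, 0, 0, 1, 1, 1],  # h
]


def targets_of_old(code):
    """New CoDs that may receive deaths from the old CoD `code` (read one column)."""
    j = OLD.index(code)
    return [NEW[i] for i in range(len(NEW)) if G[i][j] == 1]


def sources_of_new(code):
    """Old CoDs from which the new CoD `code` may receive deaths (read one row)."""
    i = NEW.index(code)
    return [OLD[j] for j in range(len(OLD)) if G[i][j] == 1]


print(targets_of_old("B"))  # ['a', 'b', 'c', 'e']
print(sources_of_new("e"))  # ['A', 'B', 'F']
```

The total number of ones in `G` (15 here) is the number of links, that is, the number of transition coefficients that actually need to be estimated for a single age group.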

Given the information in (2.1), we propose a method to estimate the so-called transition coefficients: These are proportions of counts in the categories of the old CoD coding that will be re-classified into the new CoD categories. Each age group may have its own transition coefficients.

We maintain the assumption made by Meslé and Vallin (1996), and later retained in other studies on the reconstruction of mortality series by CoD (Pechholdová, 2009; Grigoriev et al., 2012; Pechholdová et al., 2017), that the proportions across the old CoDs are equal in the two years right before and after the change in CoD coding. In our West German example, this means that the proportions of counts classified as $[A, B, \dots, F]$ in 1997 are equal to the proportions of expected counts classified as $[A, B, \dots, F]$ in 1998. These expected counts are the shares of the observed total death counts (per age group) in 1998 that would have fallen into each old CoD had the old scheme still been in use. Estimation will thus be based on death counts from the last year of the old classification and the first year of the new classification.

For the complete reconstruction of cause-specific mortality series, we fix the estimated transition coefficients for the entire old period and redistribute all death counts in the old CoDs across the new CoDs. As a result, mortality trends in the old period (but expressed in the new CoD coding) will depend solely upon the actual trends among old CoDs.

3 Estimation of transition coefficients

In this section, we will build up our model and present the associated procedure for estimating the transition coefficients. We will start from the simple setting where all ages share the same transition coefficients. We will then generalize this approach to coefficients that may vary over ages, but in a smooth manner.

3.1 Constant transition coefficients over ages

The simplest assumption is to disregard age and redistribute the total number of deaths. Though relatively unrealistic, this first approach mirrors the conventional method in Meslé and Vallin (1996) and Pechholdová (2009), and it lays the groundwork for further steps.

Within this one-dimensional setting, we work with two vectors of death counts in the transition years. We denote by $v = (v_{1}, \dots, v_{i}, \dots, v_{m})'$ the observed deaths in the first year of the new revision. The vector $z = (z_{1}, \dots, z_{j}, \dots, z_{n})'$ contains the observed death counts in the last year of the old classification.

The requirement that the relative distribution of the causes of death should be the same in the two transition years can be expressed either in the new or in the old CoD coding. In the following, we express the observed proportions in the CoD categories in the old classification ( $n$ categories):

ρ_{j} = \frac{z_{j}}{\sum_{l = 1}^{n} z_{l}}, \qquad j = 1, \dots, n.

(3.1)

By assuming the same distribution among old CoDs in the transition years, we would expect that, of the total number of deaths observed in the first year of the new classification, $\sum_{i = 1}^{m} v_{i}$, the number

u_{j} = ρ_{j} \sum_{i = 1}^{m} v_{i}

would have been assigned to old CoD $j$ had they still been categorized according to the old scheme. We denote the vector of these expected counts by

u = (u_{1}, \dots, u_{j}, \dots, u_{n})' \quad \text{with} \quad u_{j} = z_{j} \frac{\sum_{i = 1}^{m} v_{i}}{\sum_{l = 1}^{n} z_{l}} .

(3.2)
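The computation of $ρ_{j}$ in (3.1) and of $u$ in (3.2) is simple arithmetic. A minimal Python sketch with hypothetical counts (three old CoDs, two new CoDs; these numbers are illustrative and not taken from the German data):

```python
def expected_old_counts(z, v):
    """Return (rho, u): old-CoD proportions (3.1) and expected counts (3.2).

    z: old-coding counts in the last year of the old revision (length n).
    v: new-coding counts in the first year of the new revision (length m).
    """
    total_z = sum(z)
    total_v = sum(v)
    rho = [z_j / total_z for z_j in z]               # (3.1)
    u = [z_j * total_v / total_z for z_j in z]       # (3.2): u_j = rho_j * sum_i v_i
    return rho, u


z = [30, 50, 20]   # hypothetical old-coding counts
v = [60, 60]       # hypothetical new-coding counts
rho, u = expected_old_counts(z, v)
print(rho)  # [0.3, 0.5, 0.2]
print(u)    # [36.0, 60.0, 24.0]
```

By construction, the expected counts $u$ add up to the first-year total $\sum_i v_i$ (120 here), while keeping the old-CoD proportions of the last year of the old revision.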

The matrix of transition coefficients $T$ has the same dimension as the matrix $G$ in (2.1):

T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix} .

(3.3)

Each element $t_{ij}$ is the proportion of the new category $i$ that was received from the old category $j$ . Consequently, $\sum_{j = 1}^{n} t_{ij} = 1$ for all $i = 1, \dots, m$ . The $m \cdot n$ transition coefficients are to be estimated. If there is no exchange between CoDs, as indicated by a $0$ in matrix $G$ , then the corresponding transition coefficient automatically equals zero. Therefore, in practice, fewer than $m \cdot n$ elements $t_{ij}$ need to be found.

For a matrix $T$ of transition coefficients and observed numbers of deaths $v = (v_{1}, \dots, v_{m})'$ in the first year of the new classification, the multiplication

T^{'} v = {(\sum_{i = 1}^{m} t_{i 1} v_{i}, \dots, \sum_{i = 1}^{m} t_{i n} v_{i})}^{'}

(3.4)

gives the $n$-vector of (expected) deaths that the observed counts would yield if they were classified according to the old coding system. The assumption of an unchanged distribution in the transition years hence requires that

T' v \overset{!}{=} u

(3.5)

with u as defined in Equation (3.2). In other words, we are looking for transition coefficients $t_{ij}$ that can redistribute the actually observed numbers of deaths $v$ over the old CoDs, so that the expected numbers of deaths in $u$ result.
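To see (3.4) and (3.5) at work, the following Python sketch computes $T'v$ for a hand-built matrix $T$ in a hypothetical association of two new and three old CoDs. The matrix was chosen so that (3.5) holds exactly; it is not an estimate from real data, and the helper name `transpose_times` is our own.

```python
def transpose_times(T, v):
    """Compute T'v as in (3.4): the n-vector of deaths re-expressed in the old coding."""
    m, n = len(T), len(T[0])
    return [sum(T[i][j] * v[i] for i in range(m)) for j in range(n)]


v = [60, 60]                 # hypothetical observed counts, first year of new revision
u = [36.0, 60.0, 24.0]       # hypothetical expected old-coding counts from (3.2)
T = [[0.6, 0.4, 0.0],        # new CoD 1 receives deaths from old CoDs 1 and 2
     [0.0, 0.6, 0.4]]        # new CoD 2 receives deaths from old CoDs 2 and 3

Tv = transpose_times(T, v)
print([round(x, 9) for x in Tv])                       # [36.0, 60.0, 24.0]
print(all(abs(a - b) < 1e-9 for a, b in zip(Tv, u)))   # True: (3.5) holds
```

In realistic associations, there are many more unknown $t_{ij}$ than equations in (3.5), which is why additional conditions and a regularization are needed.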

Because we are estimating proportions within a given association in which death counts are fully exchanged, two conditions must be fulfilled.

1. Transition coefficients must be between 0 and 1. An estimated transition coefficient equal to 0 implies the removal of a link between an old and a new CoD, even though the link was assumed to be present in the correspondence table $G$. Such a result might occur due to distinctive patterns in the death counts in the transition years, and it should not jeopardize the reconstruction in earlier years, when the lost link might be needed again. Therefore, we set a lower bound $ε$ for transition coefficients that have a nonzero entry in matrix $G$. The value $ε = 10^{-5}$ works well in our setting.

2. All deaths in the first year of a new revision need to be redistributed according to the old classification, that is,

\sum_{j = 1}^{n} t_{ij} v_{i} = v_{i}, \qquad i = 1, \dots, m.

For a single age group, we could immediately simplify this equation to $\sum_{j = 1}^{n} t_{ij} = 1$; however, we keep the previous formulation because it will turn out to be useful when more ages are considered simultaneously, by smoothing $t_{ij}$ over ages.
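Both conditions are easy to verify for a candidate matrix $T$. The Python sketch below checks them on a hypothetical association of two new and three old CoDs; the helper name `check_conditions` and the numerical tolerance `tol` are our own assumptions, while the default lower bound follows the value $ε = 10^{-5}$ given in the text.

```python
EPS = 1e-5  # lower bound epsilon for linked coefficients, as in the text


def check_conditions(T, G, v, eps=EPS, tol=1e-9):
    """Return (cond1, cond2) for a candidate matrix of transition coefficients.

    cond1: eps <= t_ij <= 1 wherever G has a link, and t_ij == 0 where it has none.
    cond2: every row redistributes all its deaths, sum_j t_ij * v_i == v_i.
    """
    cond1 = all(
        (eps - tol <= t <= 1 + tol) if g else abs(t) <= tol
        for row_t, row_g in zip(T, G)
        for t, g in zip(row_t, row_g)
    )
    cond2 = all(
        abs(sum(row) * v_i - v_i) <= tol * max(1.0, v_i)
        for row, v_i in zip(T, v)
    )
    return cond1, cond2


G = [[1, 1, 0], [0, 1, 1]]                       # hypothetical links
v = [60, 60]                                     # hypothetical first-year counts
T_ok = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4]]
T_short = [[0.6, 0.2, 0.0], [0.0, 0.6, 0.4]]     # first row redistributes only 80 %
print(check_conditions(T_ok, G, v))     # (True, True)
print(check_conditions(T_short, G, v))  # (True, False)
```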

Alternatively, the goal stated in (3.5) and condition 2 can be viewed as filling a matrix with prescribed row sums (right margin) and column sums (bottom margin), whose entries are functions of the unknown transition coefficients:

M = \begin{array}{cccc|c} t_{11} v_{1} & t_{12} v_{1} & \cdots & t_{1n} v_{1} & v_{1} \\ t_{21} v_{2} & t_{22} v_{2} & \cdots & t_{2n} v_{2} & v_{2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ t_{m1} v_{m} & t_{m2} v_{m} & \cdots & t_{mn} v_{m} & v_{m} \\ \hline u_{1} & u_{2} & \cdots & u_{n} & \end{array}

(3.6)

Whereas sums over rows concern observed death counts, column sums concern expected deaths based on the equal proportionality assumption. Constraints associated with row sums will thus be more stringent within our estimation procedure.

Equation (3.5) can be translated into a least-squares optimization problem. We denote by $τ = vec(T)$ the vector of all possible transition coefficients. The aim is to minimize the following objective function:

\underset{τ}{\text{minimize}} \; f(τ) = \frac{1}{2} \lVert X τ - u \rVert^{2}

(3.7)

where the full design matrix $X \in ℝ^{n \times mn}$ is given by

X = I_{n} \otimes v^{'} .

$I_{n}$ is the identity matrix of size $n$ and symbol $\otimes$ denotes the Kronecker product of two matrices.
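The identity behind this design matrix is $(I_{n} \otimes v') \, vec(T) = vec(v' T) = T' v$, with $vec$ stacking the columns of $T$. A dependency-free Python sketch that checks it on a hypothetical $2 \times 3$ matrix $T$ (the helpers `kron`, `identity`, `vec` and `matvec` are our own):

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def identity(k):
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]


def vec(T):
    """Stack the columns of T (column-major order), as in tau = vec(T)."""
    return [T[i][j] for j in range(len(T[0])) for i in range(len(T))]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


v = [60, 60]                              # hypothetical first-year counts (m = 2)
T = [[0.6, 0.4, 0.0], [0.0, 0.6, 0.4]]    # hypothetical coefficients (n = 3)
n = len(T[0])
X = kron(identity(n), [v])                # [v] is v' as a 1 x m matrix; X is n x (m n)
tau = vec(T)
print(len(X), len(X[0]))                          # 3 6
print([round(x, 9) for x in matvec(X, tau)])      # [36.0, 60.0, 24.0], i.e. T'v
```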

By combining minimization in (3.7) and the previously presented conditions, we can treat the whole system as a QP problem

\underset{τ}{\text{minimize}} \; f(τ) = \frac{1}{2} τ' X' X τ - τ' X' u \quad \text{s.t.} \quad B τ \geq b,

(3.8)

where

B = [\begin{matrix} B^{L} \\ B^{U} \\ B^{S} \end{matrix}], b = [\begin{matrix} b^{L} \\ b^{U} \\ b^{S} \end{matrix}] .

The condition $t_{ij} \in [ε, 1]$ is expressed by the first two constraint matrices and vectors:

B^{L} = I_{mn}, \quad B^{U} = -I_{mn}, \qquad b^{L} = ε 1_{mn}, \quad b^{U} = -1_{mn} .

Condition 2 about row sums is incorporated into the system by:

B^{S} = 1_{n}' \otimes \mathrm{diag}(-v), \qquad b^{S} = -v .

Although condition 2 was introduced as an equality, it enters the estimation procedure as a series of inequality constraints. We opted for this formulation because, for specific data and for transition coefficients bounded between $ε$ and 1, the corresponding system with equality constraints may have no solution. This frequently happens when the estimation is performed for a specific age group, as we will show in Section 3.2 and in the application in Section 4.1. In these cases, especially when the correspondence table poses no problem for all ages combined, it is preferable to weaken the constraints rather than rebuild the correspondence table and restart the whole process from scratch. Nevertheless, whenever the equality-constrained problem admits one or more solutions, the QP always selects estimated $t_{ij}$ on the boundary of its feasible set, that is, $B^{S} τ = b^{S}$.

Given information about the presence of possible exchanges between causes expressed in the correspondence table $G$ , we discard all elements that will not be estimated. Specifically, we will remove columns in $X$ and $B$ and elements in $τ$ , $b^{L}$ and $b^{U}$ corresponding to $vec (G) = 0$ . We denote by $q \leq mn$ the number of nonzero elements in $vec (G)$ , that is, the number of effectively estimated transition coefficients.

Note that both the design matrix $X$ and the constraints are mechanically defined in the same way for all possible associations, and only the subsequent removal of columns and elements is association-specific, via its correspondence table $G$ .
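A Python sketch of this mechanical construction follows: it stacks $B^{L}$, $B^{U}$ and $B^{S}$ for $τ = vec(T)$ and then removes the coefficients with $vec(G) = 0$, together with the bound rows that referred to them. The association, counts and helper names (`build_constraints`, `drop_unlinked`) are hypothetical; the authors' own R routines may organize this differently.

```python
def build_constraints(v, n, eps=1e-5):
    """Stack B = [B^L; B^U; B^S] and b = [b^L; b^U; b^S] for tau = vec(T), T of size m x n."""
    m = len(v)
    mn = m * n
    eye = [[1.0 if r == c else 0.0 for c in range(mn)] for r in range(mn)]
    B_L, b_L = eye, [eps] * mn                                   # t_ij >= eps
    B_U, b_U = [[-x for x in row] for row in eye], [-1.0] * mn   # -t_ij >= -1
    # B^S = 1_n' kron diag(-v): row i holds -v_i at positions j*m + i (column-major)
    B_S = [[-v[i] if col % m == i else 0.0 for col in range(mn)] for i in range(m)]
    b_S = [-float(x) for x in v]                                 # -sum_j t_ij v_i >= -v_i
    return B_L + B_U + B_S, b_L + b_U + b_S


def drop_unlinked(B, b, G):
    """Remove coefficients with vec(G) = 0: their columns in B and their bound rows."""
    m, n = len(G), len(G[0])
    mn = m * n
    keep = [G[i][j] == 1 for j in range(n) for i in range(m)]    # column-major vec(G)
    # Rows 0..mn-1 belong to B^L and rows mn..2mn-1 to B^U (one per coefficient);
    # the remaining m rows belong to B^S and are always kept.
    rows = [r for r in range(len(B)) if r >= 2 * mn or keep[r % mn]]
    B_red = [[B[r][c] for c in range(mn) if keep[c]] for r in rows]
    return B_red, [b[r] for r in rows]


v = [60, 60]                  # hypothetical first-year counts, 2 new CoDs
G = [[1, 1, 0], [0, 1, 1]]    # hypothetical links to 3 old CoDs
B, b = build_constraints(v, n=3)
B_red, b_red = drop_unlinked(B, b, G)
print(len(B_red), len(B_red[0]))   # 10 4  (q = 4 coefficients: 4 + 4 + 2 rows)
```

Only `drop_unlinked` depends on the association; `build_constraints` is the same for every $G$ of a given size.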

In particular, given a specific $G$, some $t_{ij}$ can be identified immediately. For instance, in the German example in (2.1), the new CoD $a$ receives deaths only from the old CoD $B$, so the row constraint $\sum_{j} t_{1j} v_{1} \leq v_{1}$ reduces to $t_{12} v_{1} \leq v_{1}$, and $t_{12}$ equals 1 whenever this constraint is binding. Still, we include such coefficients in the estimation to provide a completely automated approach for estimating all transition coefficients simultaneously. Moreover, coefficients that are uniquely defined retain this feature only when each age group is analysed independently. When smoothness over age groups is incorporated, these specific coefficients may depart from their automatically estimated values (cf. Section 4.1).

As presented, the model has more unknown transition coefficients than observed independent marginals. Consequently, the associated system has infinitely many solutions. We thus need to introduce additional information to solve this ill-posed problem and select one of the equally optimal solutions. To ensure a unique solution, one could employ a simple ridge penalty that penalizes the squared norm of the vector $τ$ and favours smaller $t_{ij}$ (Hoerl and Kennard, 1970). We opt for an alternative that is more suitable in our setting: we regularize the system by mirroring the reasoning behind the manual computation of the transition coefficients, and require the proportions between cells and marginal totals in (3.6) to be as close as possible to the corresponding marginal proportions (Meslé and Vallin, 1996, p. 79).

To illustrate this approach, we take an $m \times n$ matrix that mirrors $M$ in (3.6), but from which all automatically determined transition coefficients and the associated cells have been removed. We denote by $\breve{v} = (\breve{v}_{1}, \dots, \breve{v}_{i}, \dots, \breve{v}_{m})'$ and $\breve{u} = (\breve{u}_{1}, \dots, \breve{u}_{j}, \dots, \breve{u}_{n})'$ the corresponding row and column sums.

Among the infinitely many solutions of the constrained system in (3.8), we favour estimated coefficients that satisfy the following condition:

\frac{t_{ij} v_{i}}{\breve{v}_{i}} = \frac{\breve{u}_{j}}{\breve{s}}

with $\breve{s} = \sum_{i} \breve{v}_{i} = \sum_{j} \breve{u}_{j}$; that is, each cell of $M$ (without the cells with automatically determined coefficients), divided by the corresponding row sum, should be as close as possible to the ratio between the corresponding column sum and the overall matrix sum.

We can express all conditions for all transition coefficients in a single matrix as follows:

H = (h_{ij}), \qquad h_{ij} = \frac{\breve{u}_{j} \breve{v}_{i}}{v_{i} \breve{s}},

(3.9)

and incorporate this additional condition into (3.8) as a penalty weighted by a small regularization parameter $κ_{r}$. The QP problem becomes:

\underset{τ}{\text{minimize}} \; f(τ) = \frac{1}{2} τ' (X' X + P_{r}) τ - τ' (X' u + p_{r}) \quad \text{s.t.} \quad B τ \geq b,

(3.10)

where

\begin{array}{l} P_{r} = κ_{r} I_{m n} \\ p_{r} = κ_{r} vec (H) . \end{array}

As before, we can remove all elements of $vec(H)$ corresponding to $vec(G) = 0$ and reduce $P_{r}$ to $P_{r} = κ_{r} I_{q}$.

In other words, among all possible solutions given by (3.8), we select the transition coefficients that maintain an internal proportion over rows and columns similar to the observed proportions at the marginal level in M.
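The target matrix $H$ in (3.9) only requires the reduced margins $\breve{v}$ and $\breve{u}$. The Python sketch below computes it for a hypothetical case in which no coefficient is determined automatically, so that $\breve{v} = v$ and $\breve{u} = u$; the function name `regularization_target` is our own.

```python
def regularization_target(v, v_breve, u_breve):
    """Matrix H of (3.9): h_ij = u_breve_j * v_breve_i / (v_i * s_breve)."""
    s_breve = sum(v_breve)
    if abs(s_breve - sum(u_breve)) > 1e-9 * max(1.0, s_breve):
        raise ValueError("row and column sums of the reduced matrix must agree")
    return [[u_breve[j] * v_breve[i] / (v[i] * s_breve) for j in range(len(u_breve))]
            for i in range(len(v))]


# Hypothetical case without automatically determined cells: v_breve = v, u_breve = u.
v = [60, 60]
u = [36.0, 60.0, 24.0]
H = regularization_target(v, v, u)
print(H)  # [[0.3, 0.5, 0.2], [0.3, 0.5, 0.2]]
```

In this special case every row of $H$ equals the marginal proportions $u_{j} / \breve{s}$, so the penalty pulls each row of $T$ towards the overall old-CoD distribution; entries where $G$ is zero would then be dropped as described in the text.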

Whereas the constraints $B τ \geq b$ force all solutions into the interval $[ε, 1]$ and guarantee the known row sums in (3.6), the penalty terms $P_{r}$ and $p_{r}$ act ‘gently’: among solutions that are equally good in terms of the residual sum of squares, they steer the estimates towards better internal proportionality. The value of $κ_{r}$ is therefore chosen to be rather small; here, we use $κ_{r} = 10^{-6}$.

To solve the QP problem, we used the dual method of Goldfarb and Idnani (1983) as implemented in the R routine solve.QP (package quadprog), which relies on Cholesky and QR factorizations.
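Before a QP solver can be called, problem (3.10) has to be written as "minimize $\frac{1}{2} τ' D τ - d' τ$ subject to $B τ \geq b$". The Python sketch below assembles $D = X'X + κ_{r} I$ and $d = X'u + κ_{r} vec(H)$ for a hypothetical single-age example and evaluates the objective; it does not solve the QP itself, and the function names are our own.

```python
def assemble_qp(X, u, h, kappa=1e-6):
    """Return (D, d) such that (3.10) reads: minimize 1/2 tau' D tau - d' tau.

    D = X'X + kappa * I and d = X'u + kappa * vec(H). Since X'X is only positive
    semidefinite, the kappa * I term also makes D positive definite.
    """
    p = len(X[0])
    D = [[sum(row[a] * row[c] for row in X) + (kappa if a == c else 0.0)
          for c in range(p)] for a in range(p)]
    d = [sum(X[r][a] * u[r] for r in range(len(X))) + kappa * h[a] for a in range(p)]
    return D, d


def qp_objective(D, d, tau):
    """Value of 1/2 tau' D tau - d' tau."""
    p = len(tau)
    quad = sum(tau[a] * D[a][c] * tau[c] for a in range(p) for c in range(p))
    return 0.5 * quad - sum(x * y for x, y in zip(d, tau))


# Hypothetical one-age example: X = I_3 kron v' with v = (60, 60), u from (3.2),
# and h = vec(H) with both rows of H equal to (0.3, 0.5, 0.2).
X = [[60, 60, 0, 0, 0, 0],
     [0, 0, 60, 60, 0, 0],
     [0, 0, 0, 0, 60, 60]]
u = [36.0, 60.0, 24.0]
h = [0.3, 0.3, 0.5, 0.5, 0.2, 0.2]            # column-major vec(H)
D, d = assemble_qp(X, u, h)

tau_exact = [0.6, 0.0, 0.4, 0.6, 0.0, 0.4]    # vec(T) with T'v = u
tau_other = [0.5, 0.0, 0.5, 0.6, 0.0, 0.4]    # misses two column sums
print(qp_objective(D, d, tau_exact) < qp_objective(D, d, tau_other))  # True
```

In R, these pieces would map onto the arguments of `quadprog::solve.QP`, which minimizes $-d'b + \frac{1}{2} b' D b$ subject to $A' b \geq b_{0}$: `Dmat = D`, `dvec = d`, `Amat = t(B)`, `bvec = b`. This mapping is our reading of the formulation, not a description of the authors' code; in practice the columns with $vec(G) = 0$ would be removed first.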

3.2 Changes in transition coefficients over age groups

So far, the approach has been applied to the total number of deaths, summed over age. It can be adapted to every single age group by estimating series of $t_{ij}^{k}$ independently for each age group $k = 1, \dots, ω$. Alternatively, we can construct a system in which all $t_{ij}^{k}$ over $k$ are estimated simultaneously. Specifically, we generalize the system in (3.10) to a two-dimensional setting, leading to a system with $mnω$ unknown transition coefficients.

In this setting, we use the matrices of death counts from the last year of the old classification, $Z = (z_{jk})$, and from the first year of the new classification, $V = (v_{ik})$. We can thus compute the matrix of expected death counts across old CoDs and age groups for the first year of the new revision:

U = (u_{jk}) = Z \, \mathrm{diag}\!\left(\frac{1_{m}' V}{1_{n}' Z}\right) \quad \text{(elementwise division)} .
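In words, each age column of $Z$ is rescaled so that its total equals the corresponding column total of $V$, which is (3.2) applied age by age. A minimal Python sketch with hypothetical counts for three old CoDs, two new CoDs and two age groups (the function name is our own):

```python
def expected_counts_2d(Z, V):
    """U = Z diag(1_m'V / 1_n'Z) with elementwise division.

    Z: old CoDs x age groups (last year of old revision).
    V: new CoDs x age groups (first year of new revision).
    """
    n, omega = len(Z), len(Z[0])
    col_v = [sum(V[i][k] for i in range(len(V))) for k in range(omega)]  # 1_m' V
    col_z = [sum(Z[j][k] for j in range(n)) for k in range(omega)]       # 1_n' Z
    return [[Z[j][k] * col_v[k] / col_z[k] for k in range(omega)] for j in range(n)]


Z = [[30, 10], [50, 20], [20, 10]]   # hypothetical: 3 old CoDs x 2 age groups
V = [[60, 25], [60, 25]]             # hypothetical: 2 new CoDs x 2 age groups
U = expected_counts_2d(Z, V)
print(U)  # [[36.0, 12.5], [60.0, 25.0], [24.0, 12.5]]
```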

In two dimensions, the response in the constrained optimization problem is the matrix $U$ arranged as a column vector, that is, $u = vec(U)$. Both the design matrix $X$ and the constraint matrix $B^{S}$ have a block-diagonal structure:

\begin{array}{l} X = diag (X^{1}, \dots, X^{k}, \dots, X^{ω}) \\ B^{S} = diag (B^{S, 1}, \dots, B^{S, k}, \dots, B^{S, ω}), \end{array}

(3.11)

where

\begin{array}{l} X^{k} = I_{n} \otimes V_{\cdot k}' \\ B^{S, k} = 1_{n}' \otimes \mathrm{diag}(-V_{\cdot k}) . \end{array}

The associated constraint vector is $b^{S} = -vec(V)$, and the other components are augmented versions of the previously presented elements:

B^{L} = I_{mnω}, \quad B^{U} = -I_{mnω}, \qquad b^{L} = ε 1_{mnω}, \quad b^{U} = -1_{mnω} .
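The block-diagonal structure in (3.11) can be assembled directly from the age-specific blocks. The Python sketch below does this for a hypothetical array $V$ with two new CoDs and two age groups, assuming three old CoDs; `block_diag` and `age_specific_blocks` are our own helper names.

```python
def block_diag(blocks):
    """Block-diagonal matrix from a list of matrices (lists of rows)."""
    n_rows = sum(len(blk) for blk in blocks)
    n_cols = sum(len(blk[0]) for blk in blocks)
    out = [[0.0] * n_cols for _ in range(n_rows)]
    r0 = c0 = 0
    for blk in blocks:
        for r, row in enumerate(blk):
            for c, x in enumerate(row):
                out[r0 + r][c0 + c] = x
        r0 += len(blk)
        c0 += len(blk[0])
    return out


def age_specific_blocks(V, n):
    """Per age k: X^k = I_n kron V_.k' (n x mn) and B^{S,k} = 1_n' kron diag(-V_.k) (m x mn)."""
    m, omega = len(V), len(V[0])
    X_blocks, BS_blocks = [], []
    for k in range(omega):
        vk = [V[i][k] for i in range(m)]
        X_blocks.append([[vk[c % m] if c // m == j else 0.0 for c in range(m * n)]
                         for j in range(n)])
        BS_blocks.append([[-vk[i] if c % m == i else 0.0 for c in range(m * n)]
                          for i in range(m)])
    return block_diag(X_blocks), block_diag(BS_blocks)


V = [[60, 25], [60, 25]]          # hypothetical: 2 new CoDs x 2 age groups
X, B_S = age_specific_blocks(V, n=3)
print(len(X), len(X[0]), len(B_S))   # 6 12 4
```

Here $τ$ is ordered as $ω$ consecutive blocks $vec(T^{1}), \dots, vec(T^{ω})$, which is the ordering the smoothness penalty below relies on.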

Concerning the regularization components, $P_{r} = κ_{r} I_{mnω}$ and $p_{r} = κ_{r} vec(H)$, where the matrix $H$ takes the following form:

H = [vec (H^{1}) : \dots : vec (H^{k}) : \dots : vec (H^{ω})]

with $H^{k} = (h_{ij}^{k})$, $h_{ij}^{k} = \frac{\breve{U}_{jk} \breve{V}_{ik}}{V_{ik} \breve{s}^{k}}$. Here, for each age group $k$, the matrices $\breve{V}$ and $\breve{U}$ contain the row and column sums of the cells $t_{ij}^{k} V_{ik}$ after excluding the cells with automatically determined coefficients, and $\breve{s}^{k} = \sum_{i} \breve{V}_{ik} = \sum_{j} \breve{U}_{jk}$.

By applying (3.10) with these generalized elements, we simultaneously obtain transition coefficients for each possible link and for all age groups, independently considered. Estimation results are thus identical to those obtained by treating each age group separately using the procedure presented in Section 3.1.

On the one hand, this approach is a straightforward extension over age of the conventional method, though embedded in a mathematical structure that routinely allows estimation of the coefficients. On the other hand, such generalization does not account for the continuous behaviour of mortality changes over age, that is, redistribution between CoDs in neighbouring age groups should not differ excessively. In other words, in the absence of specific and usually well-understood irregularities, a smooth change of the estimated transition coefficients over age groups is a reasonable and flexible approach.

Moreover, wiggly behaviour of $t_{ij}^{k}$ over $k$ might generate unreasonable trends in reconstructed mortality series. This situation is particularly worrisome when coefficients are estimated on CoDs with low death counts in the revision year but larger numbers of deaths in earlier periods.

It is thus reasonable to assume that underlying trends of t_ij^k vary smoothly over ages. We introduce this smooth behaviour to the series of coefficients by adding a roughness penalty in (3.10):

\underset{τ}{\text{minimize}} \; f(τ) = \frac{1}{2} τ' (X' X + P_{r} + P_{s}) τ - τ' (X' u + p_{r}) \quad \text{s.t.} \quad B τ \geq b

(3.12)

The term $P_{s}$ measures the roughness of neighbouring coefficients over age groups:

P_{s} = λ \, (D_{d}' D_{d}) \otimes I_{mn},

where $D_{d}$ computes the $d$th-order differences of $t_{ij}^{k}$ over $k$ (Eilers and Marx, 1996). In other words, in this two-dimensional setting, the vector $τ$ consists of $ω$ blocks, one for each age group, and $P_{s}$ is constructed so that smoothness is enforced across the first (second, third, ...) elements of these blocks.
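The penalty $τ' P_{s} τ$ equals $λ$ times the sum of squared $d$th-order differences of every coefficient series over age. The Python sketch below builds $D_{d}$ and $P_{s} = λ (D_{d}' D_{d}) \otimes I_{q}$ for $τ$ stored as $ω$ age blocks of length $q$ and checks this on two hypothetical series; the function names are our own.

```python
def diff_matrix(omega, d):
    """D_d: the (omega - d) x omega matrix of d-th order differences."""
    D = [[1.0 if r == c else 0.0 for c in range(omega)] for r in range(omega)]
    for _ in range(d):
        D = [[D[r + 1][c] - D[r][c] for c in range(omega)] for r in range(len(D) - 1)]
    return D


def smoothness_penalty(omega, q, d, lam):
    """P_s = lam * (D_d' D_d) kron I_q, for tau stored as omega age blocks of length q."""
    D = diff_matrix(omega, d)
    DtD = [[sum(row[a] * row[b] for row in D) for b in range(omega)] for a in range(omega)]
    return [[lam * DtD[ka][kb] if ia == ib else 0.0
             for kb in range(omega) for ib in range(q)]
            for ka in range(omega) for ia in range(q)]


def quad_form(P, x):
    return sum(x[a] * P[a][b] * x[b] for a in range(len(x)) for b in range(len(x)))


# Two hypothetical coefficient series over five age groups.
linear = [0.1, 0.2, 0.3, 0.4, 0.5]    # second differences are (numerically) zero
wiggly = [0.0, 1.0, 0.0, 1.0, 0.0]    # second differences: -2, 2, -2
tau = [s[k] for k in range(5) for s in (linear, wiggly)]   # age-major ordering
P = smoothness_penalty(omega=5, q=2, d=2, lam=1.0)
print(round(quad_form(P, tau), 9))    # 12.0: only the wiggly series is penalized
print(diff_matrix(3, 2))              # [[1.0, -2.0, 1.0]]
```

Because $D_{2}$ annihilates straight lines and $D_{1}$ annihilates constants, a very large $λ$ pushes each series towards a polynomial of degree $d - 1$ over age.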

The smoothing parameter $λ \geq 0$ controls the trade-off between fidelity to the data and smoothness over age groups of the transition coefficients: The larger $λ$ , the smoother $t_{ij}^{k}$ will be over age groups, especially when few deaths are involved in the algorithm. In the following, we use second-order differences ( $d = 2$ ), and we assume isotropic penalization of the series of transition coefficients, that is, a single $λ$ for all $t_{ij}^{k}$ over $k$ . Due to the combination of smoothing penalties within a QP approach as well as the inclusion of regularization into the system of equations, the smoothing parameter $λ$ could not be selected by standard criteria, so we chose $λ$ based on visual inspection.

Moreover, as will be shown in Section 4, differences between reconstructed series based on unsmooth and smooth transition coefficients are negligible, at least for the considered time horizons. The smoothing parameter can thus be tuned subjectively to incorporate a certain amount of smoothness, sufficient to prevent inconsistent trends in long-term reconstructed age- and cause-specific mortality series. The choice of $λ$ can also depend upon the CoDs and age groups under study.

Three different assumptions can thus be made about how $t_{ij}^{k}$ changes over $k$: a common transition coefficient for all ages (cf. Section 3.1), coefficients estimated independently for each age (using $λ = 0$ in (3.12)), and a smooth change of the estimated transition coefficients over age groups.

Furthermore, a fourth option is available: assuming constant transition coefficients over ages by selecting $d = 1$ in $D_{d}$ and an extremely large smoothing parameter, such as $λ = 10^{8}$. In this way, we approach the limit for strong smoothing, which is a polynomial of degree $d - 1$ (Eilers and Marx, 1996), with $d = 1$ giving a constant line over $k$. Outcomes from this last approach will differ slightly from the results obtained in Section 3.1, although the $t_{ij}^{k}$ will be constant over $k$ in both cases. Unlike in Section 3.1, a two-dimensional model with $d = 1$ and large $λ$ accounts for differences in death counts across ages, giving more leverage to ages with more deaths.

As in the one-dimensional setting, we remove from all components of the constrained optimization problem in (3.12) the elements corresponding to $vec(G) = 0$ for each age, leading to a system with $qω$ unknown transition coefficients. Routines for estimating $t_{ij}^{k}$ with all of the presented approaches were implemented in R (R Development Core Team, 2018) and are available on the journal's website.

4 Applications

4.1 Malignant neoplasms in Germany

Figure 2 presents the estimated transition coefficients over age groups, arranged as in the correspondence table $G$ in (2.1), for the German example introduced in Section 2. We plot all three approaches presented above. The coefficients depicted by dashed horizontal lines in Figure 2 are also reported in Table 1; by definition, these coefficients are constant for all ages. For comparison, outcomes with a ridge penalty are also presented. The second approach applies the same method to each age group independently or, equivalently, imposes $λ = 0$ in the smooth estimation procedure; see the dotted lines in Figure 2. These outcomes behave rather irregularly because of fluctuations in death counts over ages in the transition years.

Table 1

Estimated transition coefficients τ, ‰. Constant coefficients over age groups. Association for some malignant neoplasms (cf. Section 2)

${\hat{t}}_{i j}$	Old causes
New causes	A	B	C	D	E	F
a		1000.0
b	934.6 (828.2)	65.4 (171.8)
c		284.8	715.2
d					1000.0
e	759.4 (812.6)	53.1 (0.01)				187.4
f				1000.0
g				1000.0
h				19.8	21.7	958.5

Notes: Blank cells are excluded since no link exists between the two items. Transition coefficients estimated with a ridge penalty are given in parentheses where they differ from those of the proposed approach.

Figure 2:

Notes: The size of the circles indicates the number of deaths involved in each age- and cause-specific link, based on coefficients fitted independently for each age.

Note that when a single coefficient is assumed for all age groups, some transition coefficients are uniquely determined by the constraints: see $t_{cB}$, $t_{cC}$, $t_{eF}$, $t_{hD}$, $t_{hE}$ and $t_{hF}$ in Table 1. Consequently, differences between the proposed regularization and ridge regression are visible only for the remaining $t_{ij}$.
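The sketch below estimates one set of constant coefficients from aggregated counts as a box-constrained least-squares problem with a ridge penalty, solved with SciPy's `lsq_linear`. Several parts are assumptions rather than the paper's specification: the function name `fit_constant_coefficients`; the equations, in which each new-cause row of $t_{ij} v_i$ sums to $v_i$ and each old-cause column sums to the expected count $u_j$ (our reading of the rows summing to 1000‰ in Tables 1 and 2); and a plain ridge penalty in place of the proposed regularization. Only the bounds $0 \le t_{ij} \le 1$ are imposed. In the toy association every coefficient is fixed by the marginals, so any penalty returns the same answer, which mirrors why ridge and the proposed approach coincide on the determined coefficients.

```python
import numpy as np
from scipy.optimize import lsq_linear


def fit_constant_coefficients(G, v, u, lam=1e-6):
    """Hypothetical sketch: ridge-penalized, box-constrained least squares.

    G: (new x old) 0/1 correspondence table; v: deaths by new cause;
    u: expected deaths by old cause. One unknown t_ij per link G_ij = 1.
    Assumed equations: sum_j t_ij v_i = v_i and sum_i t_ij v_i = u_j.
    """
    G = np.asarray(G)
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    rows, cols = np.nonzero(G)                  # links, row-major order
    p = rows.size
    n_new, n_old = G.shape
    A = np.zeros((n_new + n_old, p))
    A[rows, np.arange(p)] = v[rows]             # new-cause (row) equations
    A[n_new + cols, np.arange(p)] = v[rows]     # old-cause (column) equations
    b = np.concatenate([v, u])
    # Ridge penalty lam * ||t||^2 written as extra rows of the design
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(p)])
    b_aug = np.concatenate([b, np.zeros(p)])
    res = lsq_linear(A_aug, b_aug, bounds=(0.0, 1.0), method="bvls")
    T = np.zeros(G.shape)
    T[rows, cols] = res.x
    return T


# Toy association: new a <- old A; new b <- old A and B (hypothetical counts)
T = fit_constant_coefficients([[1, 0], [1, 1]], v=[30, 70], u=[50, 50])
print(np.round(T, 4))
```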

In Figure 2, smooth transition coefficients are portrayed by solid lines. For this example, we use a smoothing parameter $\lambda = 10^6$. Most of the time the smooth estimates approximate the age-independent coefficients, especially when more deaths are involved in the age- and cause-specific link. The 'size' of these links is represented by the size of the circles in Figure 2. This information helps to explain why the smooth estimates are not exactly smooth counterparts of the age-independent ones: when few deaths are involved, enforcing smoothness is preferable at the cost of small errors in the age-specific marginal sums of the correspondence matrix (3.6).

To assess the performance of each approach, we compute the total absolute loss in death counts, that is, the sum of the absolute differences between estimated and observed/expected marginals in (3.6). Smooth transition coefficients lead to an error of only 1.8% of the overall sample size, whereas constant coefficients over ages lead to an error of 2.6%. Age-independent transition coefficients, with their wiggly behaviour, lead to a relative error of 0.2%. This last, nonzero error arises because, for some ages, coefficients are estimated within the region expressed by the inequality constraints $\sum_{j = 1}^{n} t_{i j} v_{i} < v_{i}$, which leads to a small discrepancy between actual and estimated marginals in $M$.
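The loss measure can be sketched as follows. The helper `absolute_loss` and the two age groups are hypothetical; for illustration we assume that the exchange matrix is $M_{ij} = t_{ij} v_i$ (new cause $i$, old cause $j$) and that the "overall sample size" is the total number of deaths in the new classification.

```python
import numpy as np


def absolute_loss(T, v, u):
    """Hypothetical helper: total absolute loss in death counts, one age group.

    Assumes M_ij = t_ij * v_i; row sums of M are compared with observed
    new-cause deaths v, column sums with expected old-cause deaths u.
    """
    T = np.asarray(T, dtype=float)
    v = np.asarray(v, dtype=float)
    u = np.asarray(u, dtype=float)
    M = T * v[:, None]
    return np.abs(M.sum(axis=1) - v).sum() + np.abs(M.sum(axis=0) - u).sum()


# Two hypothetical age groups, each given as (T, v, u)
ages = [
    (np.array([[1.0, 0.0], [0.5, 0.5]]), np.array([10.0, 20.0]), np.array([18.0, 12.0])),
    (np.array([[1.0, 0.0], [0.5, 0.5]]), np.array([5.0, 5.0]), np.array([7.5, 2.5])),
]
loss = sum(absolute_loss(T, v, u) for T, v, u in ages)
deaths = sum(v.sum() for _, v, _ in ages)
relative_error = loss / deaths          # share of the overall sample size
print(f"{100 * relative_error:.1f}%")
```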

Figure 3:

Notes: Vertical dotted lines depict the year of change in the ICD revision. For detailed information about the underlying CoDs, see Table A.1. For figures in colour, please refer to the online version.

The left panel of Figure 3 presents the reconstructed death series for the age group 75–79 for all three approaches. Note that estimating coefficients based solely on data in this age group (dashed line) may lead to unreasonable outcomes when few counts are involved in the reconstruction. Moreover, for certain CoDs, disruptions in the year of revision are noticeable when a constant coefficient is estimated for all ages (see the dotted lines for CoDs $b$ and $f$). Overall, coefficients $t_{ij}$ that are smooth over ages offer the most suitable and robust results for reconstructing coherent series of mortality by CoD.

The development of CoD $h$ (malignant neoplasm without specification of site) over both age and time is portrayed in the right panel of Figure 3. This reconstruction is based on smooth transition coefficients, and no clear sign of disruption is detectable for any age group in the last year of the ICD-9 classification, 1997.

4.2 Heart diseases in France

In this section, we present the results for an association of some major heart diseases in France. A complete list of items with their original names is given in Table A.2. Ages range from 30–34 up to the open age group 100+. Here, we analyse data from two ICD classifications: ICD-8, in use from 1968 to 1978, and ICD-9, in use from 1979 to 1999.

A crucial input for estimating transition coefficients is the correspondence table $G$, which describes possible exchanges across CoDs between the two classifications. Here, $G$ was provided by Meslé and Vallin (1996) and is given by

$$
G =
\begin{array}{c|cccccc}
  & A & B & C & D & E & F \\
\hline
a & 1 & 1 & 1 & 1 & 0 & 0 \\
b & 0 & 0 & 1 & 1 & 0 & 0 \\
c & 0 & 0 & 1 & 1 & 0 & 0 \\
d & 0 & 0 & 1 & 1 & 0 & 0 \\
e & 0 & 0 & 1 & 1 & 0 & 0 \\
f & 0 & 0 & 1 & 1 & 1 & 1 \\
g & 0 & 0 & 1 & 1 & 0 & 0
\end{array} .
\tag{4.1}
$$

As in the German example, we can build a system of equations as in (3.10). Table 2 shows the outcomes for this group of heart diseases in France at the introduction of ICD-9 in 1979, with data summed over all ages. Note that, owing to the proportionality constraints in (3.9) and the structure of (4.1), our approach favours equal transition coefficients within the third and fourth columns (old causes $C$ and $D$). For comparison, Table 2 also reports the estimates obtained with a ridge penalty (in parentheses) and the transition coefficients computed manually for this association by Meslé and Vallin (1996, p. 79; in square brackets).
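Since an association is a group of old and new CoDs that exchange deaths only among themselves, one way to check a correspondence table is to treat it as a bipartite graph and count its connected components. The sketch below does this for $G$ in (4.1) with SciPy's `connected_components` and also counts the links $q$, i.e. the unknown coefficients per age group. The number of age groups, $\omega = 15$, is an assumption based on five-year groups from 30–34 to 95–99 plus 100+.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Correspondence table (4.1): rows = new causes a-g, columns = old causes A-F
G = np.array([
    [1, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0],
])
n_new, n_old = G.shape

# Bipartite adjacency: nodes 0..n_new-1 are new causes, the rest old causes
adj = np.block([[np.zeros((n_new, n_new)), G],
                [G.T, np.zeros((n_old, n_old))]])
n_comp, labels = connected_components(csr_matrix(adj), directed=False)

q = int(G.sum())   # number of possible links = unknowns per age group
omega = 15         # assumed: 30-34, ..., 95-99 and 100+
print(n_comp, q, q * omega)
```

A single component confirms that all thirteen items form one association; more than one component would indicate independent sub-associations that could be estimated separately.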

Table 2

Estimated transition coefficients τ, ‰. Constant coefficients over age groups. Association for major heart diseases

${\hat{t}}_{i j}$	Old causes
New causes	A	B	C	D	E	F
a	6.1 [6]	45.3 [45]	62.3 (0.01)[62]	886.3 (948.6)[887]
b			65.1 (493.0)[67]	934.9 (507.0)[933]
c			66.1 (0.01)[65]	933.9 (999.99)[935]
d			65.2 (491.8)[57]	934.8 (508.2)[943]
e			65.0 (191.6)[65]	935.0 (808.4)[935]
f			63.7 (46.0)[65]	933.2 (950.9)[932]	0.5 [1]	2.6 [3]
g			64.0 (231.9)[65]	936.0 (768.1)[935]

Notes: Blank cells are excluded since no link exists between the two items. Transition coefficients estimated with a ridge penalty are given in parentheses (), and coefficients provided in Meslé and Vallin (1996) in square brackets [].

Figure 4:

Notes: The size of the circles indicates the number of deaths involved in each age- and cause-specific link, based on coefficients fitted independently for each age.

Figure 4 shows the estimated transition coefficients for this group of heart diseases in France at the introduction of ICD-9 in 1979. With data summed over all ages, the proposed regularization penalty selects the most suitable solution under our assumptions (dashed horizontal lines in Figure 4). The fluctuating dotted lines in Figure 4 illustrate the outcomes of an age-independent estimation of transition coefficients. As in the previous example, the size of the circles represents the number of deaths in each cause-age combination based on the age-independent estimates of $t_{ij}^k$. Finally, the solid curves portray the smooth estimated transition coefficients over age. As expected, these curves closely follow the cause-age combinations with more deaths.

In terms of total absolute losses in death counts, assuming constant coefficients for all ages leads to an error of 2.3%. A better outcome is obtained by smoothing the transition coefficients, with absolute losses equal to 1.9%. When the estimation procedure is applied to each age independently, the coefficients always satisfy $\sum_{j = 1}^{n} t_{i j} v_{i} < v_{i}$, and therefore the regularization simply selects one of the possible solutions, all equally accurate with respect to the actual marginals. As a consequence, a perfect fit between estimated and actual row and column sums in $M$ is obtained for each age group. However, this high precision comes at the price of unrealistically erratic coefficients over age groups.
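The non-uniqueness can be illustrated with a hypothetical, fully linked association of two new and two old causes, using row and column equations assumed for illustration (our reading, not the paper's system (3.10)). Two different coefficient sets reproduce the marginals exactly. The minimum-norm least-squares solution, which is the $\lambda \to 0$ limit of a ridge penalty, is one way of picking among them and is not necessarily the solution selected by the proposed regularization.

```python
import numpy as np

# Hypothetical counts: v = deaths by new cause, u = expected deaths by old cause
v = np.array([60.0, 40.0])
u = np.array([50.0, 50.0])

# Unknowns ordered t11, t12, t21, t22; equations: new-cause sums, old-cause sums
A = np.array([
    [60.0, 60.0, 0.0, 0.0],
    [0.0, 0.0, 40.0, 40.0],
    [60.0, 0.0, 40.0, 0.0],
    [0.0, 60.0, 0.0, 40.0],
])
b = np.concatenate([v, u])


def fits(t):
    """True if coefficients t reproduce all marginals exactly."""
    return np.allclose(A @ t, b)


t_one = np.array([0.5, 0.5, 0.5, 0.5])
t_two = np.array([0.75, 0.25, 0.125, 0.875])
print(fits(t_one), fits(t_two))   # both are exact, equally accurate fits

# A is rank deficient (rank 3); lstsq returns the minimum-norm exact solution
t_min, *_ = np.linalg.lstsq(A, b, rcond=1e-10)
print(np.round(t_min, 6))
```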

Figure 5:

Notes: Vertical dotted lines depict the year of change in the ICD revision. For detailed information about the underlying CoDs, see Table A.2. For figures in colour, please refer to the online version.

Figure 5 presents the final outcomes obtained by using the estimated transition coefficients to reconstruct coherent mortality series by CoD. The left panel shows reconstructed series of death counts for a specific age group (75–79) under the three different assumptions. As mentioned, outcomes from smooth and age-independent transition coefficients are very much alike (solid and dashed lines); exceptions are visible for CoDs with low death counts. Redistributing death counts for each age group using transition coefficients estimated for all ages together can lead to unsatisfactory disruptions in the year of transition (see dotted lines). The right panel of Figure 5 presents a shaded contour map of death counts for a specific CoD ($g$: cardiovascular disease, unspecified) over both age groups and years, reconstructed with smooth transition coefficients: it is hard to detect any disruption in 1978, the last year of the old classification.

5 Concluding remarks

Changes in the classification of causes of death lead to disruptions in mortality series by CoD. Therefore, a comprehensive analysis of mortality trends involves the important task of reconstructing series that coherently consider the CoD dimension. An important step in this task is the computation of transition coefficients, that is, the proportions of deaths moving from old to new CoDs in the year of transition between classifications.

The first step in the reconstruction procedure is the identification of associations: independent groups of old and new CoDs that exchange their death counts only among themselves in the transition years, with full or partial exchanges possible between causes within the association. The construction of these associations is based on the medical content of both revisions, and here they are treated as a fixed input. Often these associations involve several CoDs, and the transition coefficients are neither evident nor unique. Conventional approaches rely on manual computation of these coefficients, based mainly on experience acquired in the field. Moreover, this procedure is commonly performed for all age groups together, and ex-post adjustments are needed for a more coherent reconstruction in each age group.

In this article, we propose a model to estimate transition coefficients routinely. Starting from basic assumptions on possible exchanges across causes of death between two revisions, we formulate the problem as a least-squares optimization with inequality constraints. A QP approach allows us to keep these coefficients within an interval, which is needed since the unknowns are proportions. Moreover, as we are dealing with an ill-posed problem, we propose a specialized regularization that mirrors the logic used by researchers in the field.

Three options are provided for estimating transition coefficients. The first and simplest solution treats all age groups together, and the same coefficient is estimated for all ages. The second, more general approach estimates a series of transition coefficients for each age group independently. The final and most reasonable compromise assumes smooth behaviour of the coefficients over age groups. A roughness penalty incorporates this assumption in a system that deals with all age groups simultaneously.

We presented two applications whose outcomes are more than satisfactory. First, we considered an association of specific malignant neoplasms in Germany and the disruptions due to the change from ICD-9 to ICD-10 in 1997–1998. The estimated transition coefficients capture the levels of exchange across old and new CoDs, and visual examination of the associated reconstructed mortality series shows no disruptions. As a second example, we used French data on major heart diseases across the ICD-8 and ICD-9 revisions, where disruptions are due to the change in classification in 1978–1979. The outcomes here are equally good. A statistical assessment of the presence of disruptions in cause-specific mortality series is beyond the scope of this article and was presented elsewhere (Camarda and Pechholdová, 2014).

The assumption that the proportions of counts by CoD remain unchanged across the years of the classification revision, though reasonable, needs further consideration. In the current version of the model, we weaken this assumption in the estimation procedure if there is good reason to believe that the correspondence table is correct. Alternatively, we plan to extend the model by assuming a smooth change of the proportions over the new period and back-forecasting these proportions to the last year of the old classification. For modelling proportions, Compositional Data Analysis seems a reasonable methodology to employ in this context (Pawlowsky-Glahn and Buccianti, 2011). However, this extension will only modify the way the vector of expected deaths ($u$) in (3.12) is obtained, without modifying the subsequent estimation procedure. Furthermore, this alternative procedure may help to reduce instability due to small counts for specific CoDs in the years of transition.

Sometimes the presence of possible links between old and new CoDs is unclear within certain associations. Additional regularization might be envisaged to select possible exchanges between CoDs. Regression methods with regularization by the Lasso and the Elastic Net (Tibshirani, 1996; Zou and Hastie, 2005) seem good candidates for extending our approach in this direction.

Finally, we want to point out that the proposed method has been used extensively in the recently established Human Cause-of-Death Database (2018), a collection of coherent time series of cause-specific mortality for 16 countries. The suggested approach can be extended to other contexts where time series are disrupted for known reasons, for example, series with historic border changes.

Acknowledgments

I would like to show my gratitude to Paul Eilers for sharing his pearls of wisdom with me during the preparation of the manuscript. I am deeply grateful to Jutta Gampe for many helpful comments on earlier versions of this article, both statistical and stylistic. The credit for introducing me to the field of cause-of-death analysis goes to Markéta Pechholdová and France Meslé.

Declaration of conflicting interests

The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

This work was supported by the ‘AXA project on Mortality Divergence and Causes of Death’ and ‘Project ANR-12-FRAL-0003-01 DIMOCHA’.

Appendix

The original cause-of-death titles

For ease of presentation, in the article we use upper- and lower-case letters to designate CoDs within a given association. In this Appendix, we list the corresponding original names as provided by the World Health Organization. Table A.1 shows the items for specific malignant neoplasms in West Germany used for the ICD-9 and ICD-10 classifications during the periods 1978–1997 and 1998–2013, respectively. These causes were used to illustrate the proposed model and, in Section 4.1, to show the outcomes of an application. Table A.2 presents the original CoDs employed in Section 4.2 for a specific group of heart diseases in France under ICD-8 (until 1978) and ICD-9.

Table A.1

Cause of death for specific malignant neoplasms in West Germany from 1968 to 2013 during two ICD revision periods: ICD-9 and ICD-10. Original WHO codes are also provided

	Code	WHO-code	Cause of Death
1968–1997	A	163	Malignant neoplasm of pleura
	B	164	Malignant neoplasm of thymus, heart, and mediastinum
	C	165	Malignant neoplasm of other and ill-defined sites within the respiratory system
	D	171	Malignant neoplasm of connective and other soft tissue
	E	172	Malignant melanoma of skin
	F	199	Malignant neoplasm without specification of site
1998–2013	a	C37	Malignant neoplasm of thymus
	b	C38	Malignant neoplasm of heart, mediastinum and pleura
	c	C39	Malignant neoplasm of other and ill-defined sites in the respiratory system
	d	C43	Malignant melanoma of skin
	e	C45	Mesothelioma
	f	C47	Malignant neoplasm of peripheral nerves and autonomic nervous system
	g	C49	Malignant neoplasm of other connective and soft tissue
	h	C80	Malignant neoplasm without specification of site

Table A.2

Cause of death for specific heart diseases in France from 1968 to 1999 during two ICD revision periods: ICD-8 and ICD-9. Original WHO codes are also provided

	Code	WHO-code	Cause of Death
1968–1978	A	411.0	Other acute and subacute forms of ischaemic heart disease with hypertensive disease
	B	411.9	Other acute and subacute forms of ischaemic heart disease without mention of hypertension
	C	412.0	Chronic ischaemic heart disease with hypertension
	D	412.9	Chronic ischaemic heart disease without mention of hypertension
	E	414.0	Ischaemic heart disease, asymptomatic with hypertension
	F	414.9	Ischaemic heart disease without hypertension
1979–1999	a	411	Other acute and subacute forms of ischaemic heart disease
	b	412	Old myocardial infarction
	c	414.0	Coronary atherosclerosis
	d	414.1	Aneurysm of heart
	e	414.8	Other specified forms of chronic ischaemic heart disease
	f	414.9	Chronic ischaemic heart disease, unspecified
	g	429.2	Cardiovascular disease, unspecified

References

Camarda and Pechholdová (2014) Assessing the presence of disruptions in cause-specific mortality series: A statistical approach. In European Population Conference 2014. Budapest: European Association for Population Studies.

Eilers PHC and Marx (1996) Flexible smoothing with B-splines and penalties (with discussion). Statistical Science, 11, 89–102.

Goldfarb and Idnani (1983) A numerically stable dual method for solving strictly convex quadratic programs. Mathematical Programming, 27, 1–33.

Grigoriev, Meslé and Vallin (2012) Reconstruction of continuous time series of mortality by cause of death in Belarus, 1965–2010 (MPIDR Working Paper 2012-023). Rostock: Max Planck Institute for Demographic Research.

Hoerl and Kennard (1970) Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12, 55–67.

Human Cause-of-Death Database (2018). French Institute for Demographic Studies (France) and Max Planck Institute for Demographic Research (Germany). Available at www.causeofdeath.org.

Janssen and Kunst (2004) ICD coding changes and discontinuities in trends in cause-specific mortality in six European countries, 1950–99. Bulletin of the World Health Organization, 82, 904–913.

Meslé and Vallin (1996) Reconstructing long-term series of causes of death: The case of France. Historical Methods, 29, 72–87.

Moriyama, Loy and Robb-Smith AHT (2011) History of the Statistical Classification of Diseases and Causes of Death. Atlanta, GA: Centers for Disease Control and Prevention; Hyattsville, MD: National Center for Health Statistics.

Pawlowsky-Glahn and Buccianti (eds) (2011) Compositional Data Analysis: Theory and Applications. Hoboken, NJ: Wiley.

Pechholdová (2009) Results and observations from the reconstruction of continuous time series of mortality by cause of death: Case of West Germany, 1968–1997. Demographic Research, 21, 535–568.

Pechholdová, Camarda, Meslé and Vallin (2017) Reconstructing long-term coherent cause-of-death series: A necessary step for analyzing trends. European Journal of Population, 33, 629–650.

R Development Core Team (2018) R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. http://www.R-project.org last accessed 8 February 2019.

Rey, Aouba, Pavillon, Hoffmann, Plug, Westerling, Jougla and Mackenbach (2011) Cause-specific mortality time series analysis: A general method to detect and correct for abrupt data production changes. Population Health Metrics, 9, 1–11.

Tibshirani (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

van der Stegen RHM, Koren LGH, Harteloh PPM, Kardaun JWPF and Janssen (2014) A novel time series approach to bridge coding changes with a consistent solution across causes of death. European Journal of Population, 30, 317–335.

Zou and Hastie (2005) Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 301–320.