Abstract
The growth patterns of different anatomic structures in the human body vary in terms of growth amount over time, growth rate and growth periods. The oral and pharyngeal structures, also known as vocal tract structures, are housed in the craniofacial complex where the cranium/brain follows a distinct neural growth pattern, and the face follows a distinct somatic or skeletal growth pattern. Thus, it is reasonable to expect the oral and pharyngeal structures to follow a combined or mixed growth pattern. Existing parametric growth models are limited in that they are mainly focused on modeling one particular type of growth pattern. In this paper, we propose a novel composite growth model using neural and somatic baseline curves to fit the combined growth pattern of select vocal tract structures. The method can also determine the overall percent contribution of each of the growth types.
1 Introduction
Growth curves of the various structures of the human anatomy are of clinical interest: the estimated growth curves serve as normative references against which growth is evaluated and atypical growth is identified. Clinical growth charts established by the Centers for Disease Control and Prevention (CDC) (www.cdc.gov/growthcharts) for weight, height and head circumference (HC) confirm the two major types of growth pattern, namely the somatic and neural growth patterns. 1
These two major growth patterns are depicted in Figure 1. Figure 1(a) displays the growth of HC, which follows a neural growth pattern. The neural growth pattern is characterized by a period of rapid postnatal growth in which about 80% of the adult size is achieved during early childhood, followed by slower, steady growth until adulthood. Figure 1(b) and (c) display body weight and height, both of which follow a somatic growth pattern. Much like the neural growth pattern, the somatic pattern begins with rapid postnatal growth. The growth achieved during this early childhood phase, however, is less than 40% of the adult size. It is followed by slower growth, but only up to puberty, when there is a second period of markedly accelerated growth that tapers off at about age 15 years for females and about age 18 years for males. These two major growth patterns also characterize the growth of the head–craniofacial complex, where the cranium/brain follows a distinct neural growth pattern and the face follows a distinct somatic or skeletal growth curve.
Figure 1. Nellhaus head circumference and CDC height and weight growth curves for males and females from birth to 20 years of age, with a schematic of the proposed growth model (3a): (a) Nellhaus head circumference, (b) CDC weight, (c) CDC height, and (d) proposed mixture growth model.
Figure 2. Example of data generated in Study 1 for female and male; green solid and red/blue dashed lines indicate population average curves fitted by the double logistic model (7) and the proposed model (3a), respectively.

While HC, weight and height each follow one particular type of growth pattern, some structures may display developmental changes that cannot be characterized by a single growth pattern. For example, structures housed in the craniofacial complex, such as the vocal tract structures, appear to follow a mixture of both neural and somatic growth patterns. 2
Existing nonlinear human growth models lack flexibility in describing the complex growth pattern of the vocal tract. The empirical evidence so far suggests the vocal tract to have a composite growth model of the form
The main contribution of this paper is the introduction of the data-driven composite growth model of the form (1) and showing how the model is subsequently used to determine the contributions of different growth types. This is the first paper that models human growth as a composition of two different growth shapes.
2 Previous growth models
As Gasser 3 pointed out, efforts in analyzing human growth curves can be broadly divided into fixed and mixed-effects approaches. In this section, we provide a brief survey of notable models in each class.
2.1 Fixed-effects models
The model-fitting procedure in the fixed model approach can be parametric, fully nonparametric or semiparametric. Parametric models are currently the most commonly used in studying human growth. Crude nonlinear parametric models were first introduced to fit human growth locally. The Count model 4
In an attempt to complement parametric models, Gasser et al. 9 applied a nonparametric model to a longitudinal study of human height growth:
As an alternative to the classical parametric and nonparametric models, the shape invariant model (SIM), also known as the self-modeling nonlinear regression model, was introduced and applied to human growth data by Lawton et al. 10 The semiparametric approach postulates that a population has a common characteristic function and that all the individual growth curves within the population can be modeled by shifting and scaling the characteristic curve. The individual growth curves can be written in the form
2.2 Mixed-effects models
The fixed-effects approach of fitting nonlinear curves to individual subjects and then summarizing the parameter estimates for the population is inadequate when within-subject dependency is taken into account. Mixed-effects models provide a solution to this problem. For an extensive survey of mixed-effects models, see Pinheiro and Bates. 12
Ke and Wang 13 proposed a semiparametric mixed-effects model:
2.3 Vocal tract growth modeling
Modeling vocal tract growth is a challenge, in that a good model requires a great deal of fine-tuning toward specific growth patterns such as the adolescent growth spurt. This requirement rules out a number of classical parametric models that are confined to describing less complex growth patterns. Polynomial curves and complicated parametric models, as well as nonparametric and semiparametric models, would in theory provide good fits. Vorperian et al. 16 modeled the growth of various vocal tract portions from birth to adulthood using a fourth-order polynomial model. Owing to their great flexibility and computational simplicity, polynomial curves remain good candidates in practice for modeling complex growth patterns such as vocal tract growth. 17 However, the main limitation of polynomial curves is that they bend downward in late adolescence. 16
Barbier et al. 18 used a double logistic model to fit the growth of the vocal tract from fetus to adulthood. While the double logistic model closely imitates the vocal tract growth pattern, parameter estimation is nearly impossible for a highly unbalanced dataset when random effects are incorporated. The same issues arise when other complex parametric models are fitted with random effects. The much more flexible spline and kernel smoothing techniques are computationally demanding when the dataset is large. The proposed composite growth model, on the other hand, easily accommodates random effects even with large and unbalanced datasets, and it preserves the patterns specific to vocal tract growth.
3 Methods
The term composite growth refers to a linear combination of two different growth types. With the proper choice of the baseline curves, it is possible to model any complex vocal tract growth. For the current study, we use published normative HC and weight growth curves that are representative of neural and somatic growth. The neural growth curve N(t) represented by the HC growth was obtained by Vorperian et al. 16 from a study conducted by Nellhaus, 19 where gender-specific population mean growth curves were estimated (Figure 1(a)). The somatic growth curve S(t) represented by the sex-specific CDC weight growth curves is based upon several national health examination survey datasets taken between the years 1963 and 1994 1 (Figure 1(b)).
3.1 Mixture growth model
Let G(t) represent the measurement of a vocal tract structure at age t. Consider neural N(t) and somatic S(t) curves that characterize two different types of growth. We are interested in modeling G as a linear combination of N and S. Figure 1(d) shows a schematic of composite growth built from the two baseline growth patterns N and S. We fit the following three models simultaneously:
Since the fixed-effects parameters α’s can be interpreted as the population averages for the corresponding mixed-effects parameters γ’s, we can construct the following formulas to quantify the population growth type based on the respective fixed-effects residual sums of squares
The proposed model (3a) can be interpreted, from the shape invariant point of view, as the scaling of additive characteristic somatic and neural functions. The variability of individual subjects within the population is incorporated in the random effects of the intercept and scaling factors. The proposed model (3a) has several advantages over the existing growth models. (1) Classical models often model a single growth type, whereas the proposed approach models the linear combination of two distinct types of growth. (2) In terms of computation, the proposed model (3a) can be easily implemented when the sample dataset is large, as opposed to the computationally demanding nonparametric and semiparametric mixed-effects models. (3) Since the normative baseline curves S(t) and N(t) originate from sources independent of the dataset, the proposed approach is less biased than estimating the baseline functions and fitting the model from a single dataset.
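To make the linear-combination idea concrete, the following is a minimal sketch of the fixed-effects part only, assuming it takes the form G(t) = a0 + a1 S(t) + a2 N(t) with an intercept and two scaling factors. The baseline functions below are hypothetical stand-ins, not the Nellhaus or CDC curves, and the coefficient values are invented for illustration. Random effects are omitted; in practice they would be handled by a linear mixed-effects routine.

```python
import math

def neural_baseline(t):
    """Hypothetical stand-in for N(t): rapid early growth, then a plateau."""
    return 1.0 - math.exp(-0.8 * t)

def somatic_baseline(t):
    """Hypothetical stand-in for S(t): early growth plus a pubertal spurt."""
    return 0.4 * (1.0 - math.exp(-0.5 * t)) + 0.6 / (1.0 + math.exp(-(t - 13.0)))

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_composite(ages, sizes):
    """Ordinary least squares for G(t) = a0 + a1*S(t) + a2*N(t)."""
    design = [[1.0, somatic_baseline(t), neural_baseline(t)] for t in ages]
    # Normal equations: (X'X) a = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, sizes)) for i in range(3)]
    return solve_linear_system(xtx, xty)

ages = [0.5 * k for k in range(41)]   # birth to 20 years in half-year steps
true_coef = [4.0, 6.0, 5.0]           # hypothetical a0, a1, a2
sizes = [true_coef[0] + true_coef[1] * somatic_baseline(t)
         + true_coef[2] * neural_baseline(t) for t in ages]
coef = fit_composite(ages, sizes)     # recovers a0, a1, a2 on noise-free data
```

Because every unknown enters linearly, the fit reduces to solving a small linear system and needs no starting values, which is the computational advantage claimed in point (2).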
3.2 Simulations
For the simulation studies, the baseline longitudinal data were generated using gender-specific fourth-degree polynomials:
The signals and noises are assumed to be independent, and their variances
Study 1.
Study 2.
Data generated in Study 1 are noisier than those generated in Study 2. Figures 2 and 3 show examples of simulated data in Studies 1 and 2. One hundred simulations were run in each study, and our composite and double logistic models were fitted in each simulation. The results do not differ greatly when we increase the number of simulations or change the parameters in the model, which indicates the robustness of our simulation framework.
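A data-generating step of this kind might be sketched as follows. This is an illustration only: the polynomial coefficients, the variances, the number of visits per subject, and the way the subject-level signal enters (here, as an additive shift) are all assumptions, since the exact settings of Studies 1 and 2 are not reproduced in this text.

```python
import random

def quartic(t, coef):
    """Evaluate c0 + c1*t + c2*t^2 + c3*t^3 + c4*t^4 by Horner's rule."""
    value = 0.0
    for c in reversed(coef):
        value = value * t + c
    return value

def simulate_subjects(n_subjects, visits_per_subject, coef,
                      signal_sd, noise_sd, seed=0):
    """Return (subject, age, size) records around a quartic baseline.

    Assumption: each subject gets one additive random shift (the signal),
    drawn independently of the per-visit measurement noise.
    """
    rng = random.Random(seed)
    records = []
    for subject in range(n_subjects):
        shift = rng.gauss(0.0, signal_sd)
        for _ in range(visits_per_subject):
            age = rng.uniform(0.0, 20.0)
            size = quartic(age, coef) + shift + rng.gauss(0.0, noise_sd)
            records.append((subject, age, size))
    return records
```

Raising `noise_sd` relative to `signal_sd` gives a noisier dataset, in the spirit of Study 1 compared with Study 2.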
Figure 3. Example of data generated in Study 2 for female and male; green solid and red/blue dashed lines indicate population average curves fitted by the double logistic model (7) and the proposed model (3a), respectively.
Figure 4. Midsagittal images displaying the anatomic landmarks used for making oral and pharyngeal measurements; the highlighted segments illustrate the actual measurements; left to right: VTL, VT-H, LP-V, and HP-V. The landmarks used to define the four variables are L, VF, PPW, PNS, and H.

We compared the proposed model (3a, 3b, 3c) against a mixed-effects version of the gender-specific double logistic model used by Barbier et al. 18 for vocal tract growth from fetus to adulthood
Table 1. Mean squared error (MSE) and one standard deviation for the double logistic model (7) and the proposed model (3a) over 100 simulations.
Note that the datasets generated in both Studies 1 and 2 were fairly balanced. The parameter estimation for the mixed-effects double logistic models was relatively easy to handle. However, for many highly unbalanced datasets we have attempted, the mixed-effects double logistic models often failed to converge, whereas the proposed model (3a) converged quickly in every case. The simulation studies suggest that the proposed model (3a) would make a better candidate in modeling unbalanced large-scale longitudinal vocal tract data in practice.
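For readers unfamiliar with the comparison model, the sketch below shows one common parameterization of a double logistic curve (the sum of two logistic components) together with the MSE used to compare fits. The parameter names are illustrative, and this form is an assumption; it is not necessarily the exact parameterization of model (7).

```python
import math

def double_logistic(t, a1, b1, c1, a2, b2, c2):
    """Sum of two logistic components.

    a1, a2: asymptotic contribution of each component
    b1, b2: rate of each component
    c1, c2: age at the midpoint of each component
    """
    first = a1 / (1.0 + math.exp(-b1 * (t - c1)))
    second = a2 / (1.0 + math.exp(-b2 * (t - c2)))
    return first + second

def mean_squared_error(observed, fitted):
    """Average squared difference between observed and fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)
```

Four of the six parameters (the rates and midpoints) enter nonlinearly, so fitting requires iterative optimization with starting values, which is where convergence difficulties with unbalanced data can arise.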
4 Application
We applied the proposed method to model the growth of the four vocal tract portions based on measurements secured from CT images.
4.1 Vocal tract data
Measurements were obtained from 771 CT and MRI imaging studies of individuals between birth and 19 years of age. All measurements were made from the midsagittal plane of 419 male and 352 female scans. Some of the individuals had repeated scans, and therefore the number of scans was highly unbalanced among subjects. For example, between birth and 19 years, 229 subjects had a single scan, while some subjects had up to 10 scans.
Figure 4 displays the four variables we used to assess the composite growth model proposed in this paper. The four variables are: (a) VTL, vocal tract length, defined as the curvilinear distance along the midline of the vocal tract starting at the level of the vocal fold (VF) to the intersection with a line drawn tangentially to the lips (L); (b) VT-H, vocal tract-horizontal, defined as the horizontal distance from where the VTL ends to the posterior pharyngeal wall (PPW); (c) HP-V, hyoid posterior nasal spine-vertical, defined as the vertical distance from the posterior nasal spine (PNS) to the anterior-inferior border of the hyoid bone (H); and (d) LP-V, larynx posterior nasal spine-vertical, defined as the vertical distance from the PNS to the larynx at the level of the VF. The abbreviations of the variables are consistent with those used by Vorperian et al. 2
4.2 Results
The mixed-effects models based on (3a), (3b) and (3c) were fitted separately for male and female using the lme4.0 package in R. 20 All combinations of random effects (single, double and full combination) were fitted based on the full fixed-effects model. The Akaike information criterion (AIC) was used as a criterion in comparing the models. 15
Table 2 displays the AICs for all the random-effects combinations of model (3a) for VTL, VT-H, LP-V and HP-V. The chosen combinations are those with the smallest AICs. For instance, for VTL we fit random effects on the intercept and the neural growth for females, and random effects on the somatic and neural growth for males. The AICs of the chosen models are set in bold face in the table. Figures 5 to 8 show the estimated population average growth patterns for VTL, VT-H, LP-V and HP-V. All four measurements show a sharp growth spurt between birth and approximately two years of age, followed by a second, smoother growth spurt during adolescence.
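The selection step described above can be sketched as follows: enumerate every non-empty combination of the three candidate random-effect terms (single, double and full), compute an AIC for each fitted model, and keep the combination with the smallest value. The term names and helper functions are illustrative assumptions; the log-likelihoods would come from whatever mixed-effects fitting routine is used and are not computed here.

```python
from itertools import combinations

# Candidate random-effect terms in the composite model (names are illustrative).
TERMS = ("intercept", "somatic", "neural")

def random_effect_combinations(terms=TERMS):
    """All non-empty subsets of the terms: single, double and full."""
    combos = []
    for size in range(1, len(terms) + 1):
        combos.extend(combinations(terms, size))
    return combos

def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 log L."""
    return 2.0 * n_parameters - 2.0 * log_likelihood

def select_by_aic(aic_by_combo):
    """Return the random-effects combination with the smallest AIC."""
    return min(aic_by_combo, key=aic_by_combo.get)
```

With three terms there are seven candidate random-effects structures per variable and sex.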
Figure 5. VTL: population growth curve (left) and rate (right) based on model (3a).
Figure 6. VT-H: population growth curve (left) and rate (right) based on model (3a).
Figure 7. LP-V: population growth curve (left) and rate (right) based on model (3a).
Figure 8. HP-V: population growth curve (left) and rate (right) based on model (3a).
Table 2. AIC for mixed-effects models based on (3a); models with the smallest AIC are selected. VTL: vocal tract length; VT-H: vocal tract-horizontal; LP-V: larynx posterior nasal spine-vertical; HP-V: hyoid posterior nasal spine-vertical; AIC: Akaike information criterion.



We also compared the performance of the proposed composite growth model with that of the existing double logistic model. Table 3 compares the MSEs of the double logistic model and the composite model chosen in Table 2. Our model is in general comparable to or better than the mixed-effects double logistic model. In fact, the double logistic model fails to converge for the male case of LP-V. Figure 9 depicts the VTL population growth trend estimated by the double logistic model and by the proposed composite growth model. Although the double logistic model manages to capture the overall growth trend, it does not depict the sharp growth and the plateau that respectively characterize early childhood and late-teen development as well as the proposed growth model does, particularly for the male curve.
Figure 9. Population average growth curves of VTL based on the mixed-effects double logistic model (7) (left) and the composite growth model (3a) (right).
Table 3. Mean squared errors (MSEs) for the mixed-effects double logistic model (7) and the mixed-effects composite growth model (3a) chosen in Table 2 for VTL, VT-H, LP-V and HP-V; NA indicates failure of convergence. VTL: vocal tract length; VT-H: vocal tract-horizontal; LP-V: larynx posterior nasal spine-vertical; HP-V: hyoid posterior nasal spine-vertical.
Fixed-effects residual sums of squares
In the last column, somatic/neural indicates dominance of somatic over neural growth and vice versa.
VTL: vocal tract length; VT-H: vocal tract-horizontal; LP-V: larynx posterior nasal spine-vertical; HP-V: hyoid posterior nasal spine-vertical.
Growth velocity is an important growth characteristic that can be easily computed based on a fitted model and visualized. The population growth velocity for a vocal tract portion is approximated from the population average of a model. At a particular age t, we estimate the growth velocity discretely using the finite difference
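A finite-difference velocity estimate of this kind can be sketched as below. The forward difference and the step size are assumptions made for illustration; the source does not specify here which difference scheme or step was used, and `population_curve` stands for any fitted population average curve.

```python
def growth_velocity(population_curve, t, step=0.5):
    """Forward-difference estimate of growth velocity at age t.

    population_curve: callable returning the fitted population average size
    step: age increment in years (an assumed value, not from the paper)
    """
    return (population_curve(t + step) - population_curve(t)) / step

def velocity_profile(population_curve, ages, step=0.5):
    """Velocity estimates over a grid of ages, e.g. for plotting a rate curve."""
    return [growth_velocity(population_curve, t, step) for t in ages]
```

A smaller step tracks the derivative of a smooth fitted curve more closely.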
5 Conclusion and discussion
The proposed method uses two existing normative growth curves to model the growth of more complex vocal tract structures as a composition of somatic and neural growth types. Since this is an empirical approach based on available growth curves, the resulting growth model can closely represent documented growth trends. Compared with the traditional parametric growth models, our method is numerically simpler to implement and computationally more efficient. The traditional models depict finer features such as mid-growth spurts accurately only by adding parameters and nonlinearity to the model, which adds considerable computational difficulty for large and highly unbalanced datasets. Algorithms for fitting nonlinear models require sensible and stable initial values, which are difficult to obtain when the model contains several nonlinear parameters. When random effects are added to the model, convergence may be difficult to achieve because the number of observations is unbalanced across subjects. Our composite growth model, on the other hand, has only linear parameters, which rarely cause divergence when fitting random effects.
The obvious limitation of the proposed model (3a), however, lies with the requirement of distinct baseline growth curves that behave like basis functions in representing more complex growth patterns. If a biological structure does not follow a documented combination of distinct growth trends, our approach may not offer an accurate depiction of the growth. Neither would it be useful when reliable reference growth curves do not exist.
One possible application and extension of the proposed model is toward the landmark-based morphometric study of the human maxillary complex, 22 which is closely related to vocal tract structures in terms of growth characteristics. Given the composite biological structure of the maxillary complex, we can expect the distances between various landmarks on the complex to exhibit composite growth patterns similar to those found in the vocal tract structures. We can therefore model the growth of the human maxillary complex by a system of our proposed models. The fitted models could serve as normative references in medical and dental treatments such as maxillary expansion.
Footnotes
Acknowledgements
We thank Dr. Meghan M. Cotter for assistance securing CDC data and Michael Kelly for helpful comments.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported, in part, by National Institute on Deafness and Other Communication Disorders Grants R03 DC4362 (Anatomic Development of the Vocal Tract: MRI Procedures) and R01 DC6282 (MRI and CT Studies of the Developing Vocal Tract) as well as by National Institute of Child Health and Human Development Core Grant P-30 HD03352, awarded to the Waisman Center.
