## Abstract

Multivariate estimates of genetic parameters are subject to substantial sampling variation, especially for smaller data sets and more than a few traits. A simple modification of standard, maximum-likelihood procedures for multivariate analyses to estimate genetic covariances is described, which can improve estimates by substantially reducing their sampling variances. This is achieved by maximizing the likelihood subject to a penalty. Borrowing from Bayesian principles, we propose a mild, default penalty—derived assuming a Beta distribution of scale-free functions of the covariance components to be estimated—rather than laboriously attempting to determine the stringency of penalization from the data. An extensive simulation study is presented, demonstrating that such penalties can yield very worthwhile reductions in loss, *i.e.*, the difference from population values, for a wide range of scenarios and without distorting estimates of phenotypic covariances. Moreover, mild default penalties tend not to increase loss in difficult cases and, on average, achieve reductions in loss of similar magnitude to computationally demanding schemes to optimize the degree of penalization. Pertinent details required for the adaptation of standard algorithms to locate the maximum of the likelihood function are outlined.

ESTIMATION of genetic parameters, *i.e.*, partitioning of phenotypic variation into its causal components, is one of the fundamental tasks in quantitative genetics. For multiple characteristics of interest, this involves estimation of covariance matrices due to genetic, residual, and possibly other random effects. It is well known that such estimates can be subject to substantial sampling variation. This holds especially for analyses comprising more than a few traits, as the number of parameters to be estimated increases quadratically with the number of traits considered, unless the covariance matrices of interest have a special structure and can be modeled more parsimoniously. Indeed, a sobering but realistic view is that “Few datasets, whether from livestock, laboratory or natural populations, are of sufficient size to obtain useful estimates of many genetic parameters” (Hill 2010, p. 75). This not only emphasizes the importance of appropriate data, but also implies that a judicious choice of methodology for estimation—which makes the most of limited and precious records available—is paramount.

A measure of the quality of an estimator is its “loss,” *i.e.*, the deviation of the estimate from the true value. This is an aggregate of bias and sampling variation. We speak of improving an estimator if we can modify it so that the expected loss is lessened. In most cases, this involves reducing sampling variance at the expense of some bias—if the additional bias is small and the reduction in variance sufficiently large, the loss is reduced. In statistical parlance “regularization” refers to the use of some kind of additional information in an analysis. This is often used to solve ill-posed problems or to prevent overfitting through some form of penalty for model complexity; see Bickel and Li (2006) for a review. There has been longstanding interest, dating back to Stein (1975) and earlier (James and Stein 1961), in regularized estimation of covariance matrices to reduce their loss. Recently, as estimation of higher-dimensional matrices is becoming more ubiquitous, there has been a resurgence in interest (*e.g.*, Bickel and Levina 2008; Warton 2008; Witten and Tibshirani 2009; Ye and Wang 2009; Rothman *et al.* 2010; Fisher and Sun 2011; Ledoit and Wolf 2012; Deng and Tsui 2013; Won *et al.* 2013). In particular, estimation encouraging sparsity is an active field of research for estimation of covariance matrices (*e.g.*, Pourahmadi 2013) and in related areas, such as graphical models and structural equations.

## Improving Estimates of Genetic Parameters

As emphasized above, quantitative genetic analyses require at least two covariance matrices to be estimated, namely those due to additive genetic and residual effects. The partitioning of the total variation into its components creates substantial sampling correlations between them and tends to exacerbate the effects of sampling variation inherent in estimation of covariance matrices. However, most studies on regularization of multivariate analyses considered a single covariance matrix only, and the literature on regularized estimation of more than one covariance matrix is sparse. In a classic article, Hayes and Hill (1981) proposed to modify estimates of the genetic covariance matrix, $\boldsymbol{\Sigma}_G$, by shrinking the canonical eigenvalues of $\boldsymbol{\Sigma}_G$ and the phenotypic covariance matrix $\boldsymbol{\Sigma}_P$ toward their mean, a procedure described as “bending” the estimate of $\boldsymbol{\Sigma}_G$ toward that of $\boldsymbol{\Sigma}_P$. The underlying rationale was that $\boldsymbol{\Sigma}_P$, the sum of all the causal components, is typically estimated much more accurately than any of its components, so that bending would “borrow strength” from the estimate of $\boldsymbol{\Sigma}_P$, while shrinking estimated eigenvalues toward their mean would counteract their known, systematic overdispersion. The authors demonstrated by simulation that use of “bent” estimates in constructing selection indexes could increase the achieved response to selection markedly, as these were closer to the population values than unmodified estimates and thus provided more appropriate estimates of index weights. However, no clear guidelines to determine the optimal amount of shrinkage were available; bending was thus primarily used to modify nonpositive definite estimates of covariance matrices and was all but forgotten when methods that allowed estimates to be constrained to the parameter space became common.

Modern analyses to estimate genetic parameters are generally carried out fitting a linear mixed model and using restricted maximum-likelihood (REML) or Bayesian methodology. The Bayesian framework directly offers the opportunity for regularization through the choice of appropriate priors. Yet, this is rarely exploited for this purpose and “flat” or minimally informative priors are often used instead (Thompson *et al.* 2005). In a maximum-likelihood context, estimates can be regularized by imposing a penalty on the likelihood function aimed at reducing their sampling variance. This provides a direct link to Bayesian estimation: For a given prior distribution of the parameters of interest or functions thereof, an appropriate penalty can be obtained as a multiple of minus the logarithmic value of the probability density function. For instance, shrinkage of eigenvalues toward their mean through a quadratic penalty on the likelihood is equivalent to assuming a normal distribution of the eigenvalues while the assumption of a double exponential prior distribution results in a LASSO-type penalty (Huang *et al.* 2006).

Meyer and Kirkpatrick (2010) demonstrated that a REML analog to bending is obtained by imposing a penalty proportional to the variance of the canonical eigenvalues on the likelihood and showed that this can yield substantial reductions in loss for estimates of both $\boldsymbol{\Sigma}_G$ and the residual covariance matrix $\boldsymbol{\Sigma}_E$. Subsequent simulations (Meyer *et al.* 2011; Meyer 2011) examined the scope for penalties based on different functions of the parameters to be estimated and prior distributions for them and found them to be similarly effective, depending on the population values for the covariance matrices to be estimated.

A central component of the success of regularized estimation is the choice of how much to penalize. A common practice is to scale the penalty by a so-called “tuning factor” to regulate stringency of penalization. Various studies (again for a single covariance matrix, see above) demonstrated that this can be estimated reasonably well from the data at hand, using cross-validation techniques. Adopting these suggestions for genetic analyses and using *k*-fold cross-validation, Meyer (2011) estimated the appropriate tuning factor as that which maximized the average, unpenalized likelihood in the validation sets. However, this procedure was laborious and afflicted by problems in locating the maximum of a fairly flat likelihood surface for analyses involving many traits and not so large data sets. These technical difficulties all but prevented practical applications so far. Moreover, it was generally less successful than reported for studies considering a single covariance matrix. This led to the suggestion of imposing a mild penalty, determining the tuning factor as the largest value that did not cause a decrease in the (unpenalized) likelihood equivalent to a significant change in a single parameter. This pragmatic approach yielded reductions in loss that were generally of comparable magnitude to those achieved using cross-validation (Meyer 2011). However, it still required multiple analyses and thus considerably increased computational demands compared to standard, unpenalized estimation.

## Simple Penalties

In the Bayesian framework, the influence of the prior and thus the amount of regularization is generally specified through the so-called hyperparameters of the prior distribution, which determine its shape, scale, or location. This suggests that an alternative, tuning factor-free formulation for a penalty on the likelihood can be obtained by expressing it in terms of the distribution-specific (hyper)parameters. For instance, when assuming a normal prior for canonical eigenvalues, the regulating parameter is the variance of the normal distribution, with more shrinkage induced the lower its value. This may lend itself to applications employing default values for these parameters. Furthermore, such formulation may facilitate direct estimation of the regulating parameter, denoted henceforth as *ν*, simultaneously with the covariance components to be estimated (G. de los Campos, personal communication). In contrast, in a setting involving a tuning factor the penalized likelihood is, by definition, highest for a factor of zero (*i.e.*, no penalty) and thus does not provide this opportunity.

### Aims

This article examines the scope for REML estimation imposing penalties regulated by choosing the parameters of the selected prior distribution. Our aim is to obtain a formulation that allows uncomplicated use on a routine basis, free from laborious and complicated additional calculations to determine the appropriate strength of penalization. It differs from previous work in that we do not strive to obtain maximum benefits, but are content with lesser—but still highly worthwhile—improvements in estimates that can be achieved through mild penalties, at little risk of detrimental effects for unusual cases where the population parameters do not match our assumptions for their prior distributions well. The focus is on penalties involving scale-free functions of covariance components that fall into defined intervals and are thus better suited to a choice of default-regulating parameters than functions that do not.

We begin with the description of suitable penalties together with a brief review of pertinent literature and outline the adaptation of standard REML algorithms. This is followed by a large-scale simulation study showing that the penalties proposed can yield substantial reductions in loss of estimates for a wide range of population parameters. In addition, the impact of penalized estimation is demonstrated for a practical data set, and the implementation of the methods proposed in our REML package wombat is described. We conclude with a discussion and recommendations on selection of default parameters for routine use in multivariate analyses.

## Penalized Maximum-Likelihood Estimation

Consider a simple mixed, linear model for *q* traits with covariance matrices $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ due to additive genetic and residual effects, respectively, to be estimated. Let $\log\mathcal{L}(\boldsymbol{\theta})$ denote the log-likelihood in a standard, unpenalized maximum-likelihood (or REML) analysis, with $\boldsymbol{\theta}$ the vector of parameters, composed of the distinct elements of $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ or the equivalent. The penalized likelihood is then (Green 1987)

$$\log\mathcal{L}_P(\boldsymbol{\theta}) = \log\mathcal{L}(\boldsymbol{\theta}) - \tfrac{1}{2}\,\phi\,\mathcal{P}(\boldsymbol{\theta}) \tag{1}$$

with the penalty $\mathcal{P}(\boldsymbol{\theta})$ a nonnegative function of the parameters to be estimated and *ϕ* the so-called tuning factor that modulates the strength of penalization (the factor of 1/2 is used for algebraic consistency and could be omitted).

The penalty can be derived by assuming a suitable prior distribution for the parameters (or functions thereof) as minus the logarithmic value of the pertaining probability density. As in Bayesian estimation, the choice of the prior is often somewhat *ad hoc* or driven by aspects of convenience (such as conjugacy of the priors) and computational feasibility. In the following, we assume $\phi = 1$ throughout and regulate the amount of penalization instead via the parameters of the distribution from which the penalty is derived.

### Functions to be penalized

We consider two types of scale-free functions of the covariance matrices to be estimated as the basis for regularization.

#### Canonical eigenvalues:

Following Hayes and Hill (1981), the first type comprises the canonical eigenvalues of $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_P$.

Multivariate theory shows that for two symmetric, positive definite matrices $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_P$ of the same size, there is a transformation $\mathbf{T}$ that yields $\mathbf{T}\boldsymbol{\Sigma}_P\mathbf{T}' = \mathbf{I}$ and $\mathbf{T}\boldsymbol{\Sigma}_G\mathbf{T}' = \boldsymbol{\Lambda}$, with $\boldsymbol{\Lambda}$ the diagonal matrix of canonical eigenvalues with elements $\lambda_i$ (Anderson 1984). This can be thought of as transforming the traits considered to new variables that are uncorrelated and have phenotypic variance of unity; *i.e.*, the canonical eigenvalues are equal to heritabilities on the new scale and fall in the interval $[0, 1]$ (Hayes and Hill 1980). It is well known that estimates of eigenvalues of covariance matrices are systematically overdispersed—the largest values are overestimated and the smallest are underestimated—while their mean is expected to be estimated correctly (Lawley 1956). Moreover, a major proportion of the sampling variation of covariance matrices can be attributed to this overdispersion of eigenvalues (Ledoit and Wolf 2004). Hence there have been various suggestions to modify the eigenvalues of sample covariance matrices in some way to reduce the loss in estimates; see Meyer and Kirkpatrick (2010) for a more detailed review.
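For illustration, the canonical eigenvalues can be computed numerically as generalized eigenvalues of the genetic and phenotypic covariance matrices. The following is a minimal sketch (the function name is ours, not part of the estimation machinery described here):

```python
import numpy as np
from scipy.linalg import eigh

def canonical_eigenvalues(sigma_g, sigma_p):
    """Canonical eigenvalues: solutions lambda of |sigma_g - lambda*sigma_p| = 0.
    When sigma_p and sigma_p - sigma_g are positive definite, they lie in
    (0, 1) and equal heritabilities on the transformed scale."""
    # eigh solves the generalized symmetric-definite eigenproblem
    # sigma_g v = lambda * sigma_p v
    lam = eigh(sigma_g, sigma_p, eigvals_only=True)
    return np.sort(lam)[::-1]
```

For example, with $\boldsymbol{\Sigma}_G = \mathrm{diag}(0.3, 0.5)$ and $\boldsymbol{\Sigma}_P = \mathbf{I}$, the canonical eigenvalues are simply the heritabilities, 0.5 and 0.3.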

#### Correlations:

The second type of functions comprises correlations between traits, in particular genetic correlations. A number of Bayesian approaches to the estimation of covariance matrices decompose the problem into variances (or standard deviations) and correlations with separate priors, thus alleviating the inflexibility of the widely used conjugate prior given by an inverse Wishart distribution (Barnard *et al.* 2000; Daniels and Kass 2001; Zhang *et al.* 2006; Daniels and Pourahmadi 2009; Hsu *et al.* 2012; Bouriga and Féron 2013; Gaskins *et al.* 2014). However, overall few suitable families of prior density functions for correlation matrices have been considered and practical applications have been limited. In particular, estimation using Monte Carlo sampling schemes has been hampered by difficulties in sampling correlation matrices conforming to the constraints of positive definiteness and unit diagonals.

Most statistical literature concerned with Bayesian or penalized estimation of correlation matrices considered shrinkage toward an identity matrix, *i.e.*, shrinkage of individual correlations toward zero, although other, simple correlation structures have been proposed (Schäfer and Strimmer 2005). As outlined above, the motivation for bending (Hayes and Hill 1981) included the desire to borrow strength from the estimate of the phenotypic covariance matrix. Similar arguments may support shrinkage of the genetic toward the phenotypic correlation matrix. This dovetails with what has become known as “Cheverud’s conjecture”: Reviewing a large body of literature, Cheverud (1988) found that estimates of genetic correlations were generally close to their phenotypic counterparts and thus proposed that phenotypic values should be substituted when genetic correlations could not be estimated. Subsequent studies reported similar findings for a range of traits in laboratory species, plants, and animals (*e.g.*, Koots *et al.* 1994; Roff 1995; Waitt and Levin 1998).

### Partial correlations

Often a reparameterization can transform a constrained matrix problem to an unconstrained one. For instance, it is common practice in REML estimation of covariance matrices to estimate the elements of their Cholesky factors, coupled with a logarithmic transformation of the diagonal elements, to remove constraints on the parameter space (Meyer and Smith 1996). Pinheiro and Bates (1996) examined various transformations for covariance matrices and their impact on convergence behavior of maximum-likelihood analyses, and corresponding forms for correlation matrices have been described (Rapisarda *et al.* 2007).
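As a sketch of the kind of reparameterization referred to (our own minimal illustration, with hypothetical function names): a covariance matrix can be mapped to and from an unconstrained parameter vector via its Cholesky factor, log-transforming the diagonal elements.

```python
import numpy as np

def cov_to_theta(S):
    """Unconstrained parameter vector: lower triangle of the Cholesky
    factor of S, with the (positive) diagonal elements log-transformed."""
    L = np.linalg.cholesky(S)
    d = np.diag_indices_from(L)
    L[d] = np.log(L[d])
    return L[np.tril_indices_from(L)]

def theta_to_cov(theta, q):
    """Inverse map: rebuild the Cholesky factor, exponentiating the
    diagonal, and form S = L L'. Any real-valued theta yields a valid,
    positive definite covariance matrix."""
    L = np.zeros((q, q))
    L[np.tril_indices(q)] = theta
    d = np.diag_indices(q)
    L[d] = np.exp(L[d])
    return L @ L.T
```

The round trip `theta_to_cov(cov_to_theta(S), q)` recovers `S`, which is why derivative-based maximization can proceed over the unconstrained vector.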

To alleviate the problems encountered in sampling valid correlation matrices, Joe (2006) proposed a reparameterization of correlations to partial correlations, which vary independently over the interval $(-1, 1)$. This involves a one-to-one transformation between partial and standard correlations, *i.e.*, is readily reversible. Hence, using the reparameterization, we can readily sample a random correlation matrix that is positive definite by sampling individual, partial correlations. Moreover, it allows for flexible specification of priors in a Bayesian context by choosing individual and independent distributions for each one. Daniels and Pourahmadi (2009) referred to these quantities as partial autocorrelations (PACs), interpreting them as correlations between traits *i* and *j* conditional on the intervening traits $i + 1$ to $j - 1$.

#### Details:

Consider a correlation matrix **R** of size $q \times q$ with elements $r_{ij}$ (for $i \neq j$) and $r_{ii} = 1$. As **R** is symmetric, let $j > i$. For $j = i + 1$, the PACs $\pi_{ij}$ are equal to the standard correlations $r_{ij}$, as there are no intervening variables. For $j > i + 1$, partition the submatrix of **R** composed of rows and columns *i* to *j* as

$$\mathbf{R}_{[i:j]} = \begin{pmatrix} 1 & \mathbf{r}_1' & r_{ij} \\ \mathbf{r}_1 & \mathbf{R}_2 & \mathbf{r}_3 \\ r_{ij} & \mathbf{r}_3' & 1 \end{pmatrix} \tag{2}$$

with $\mathbf{r}_1$ and $\mathbf{r}_3$ vectors of length $j - i - 1$ with elements $r_{ik}$ and $r_{kj}$, respectively, and $\mathbf{R}_2$ the corresponding matrix with elements $r_{kl}$, for $k, l = i + 1, \ldots, j - 1$. This gives PAC

$$\pi_{ij} = \frac{r_{ij} - \mathbf{r}_1'\mathbf{R}_2^{-1}\mathbf{r}_3}{\sqrt{\big(1 - \mathbf{r}_1'\mathbf{R}_2^{-1}\mathbf{r}_1\big)\big(1 - \mathbf{r}_3'\mathbf{R}_2^{-1}\mathbf{r}_3\big)}} \tag{3}$$

and the reverse transformation is

$$r_{ij} = \mathbf{r}_1'\mathbf{R}_2^{-1}\mathbf{r}_3 + \pi_{ij}\sqrt{\big(1 - \mathbf{r}_1'\mathbf{R}_2^{-1}\mathbf{r}_1\big)\big(1 - \mathbf{r}_3'\mathbf{R}_2^{-1}\mathbf{r}_3\big)} \tag{4}$$

(Joe 2006).
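The transformations (3) and (4) can be sketched as follows (a minimal illustration under our own naming; the reverse map loops over increasing lag so that the entries of **R** it needs are already available):

```python
import numpy as np

def corr_to_pac(R):
    """Partial autocorrelations pi_ij from a correlation matrix R:
    for j > i + 1, the correlation of traits i and j given the
    intervening traits i+1, ..., j-1 (Joe 2006)."""
    q = R.shape[0]
    P = np.eye(q)
    for i in range(q):
        for j in range(i + 1, q):
            if j == i + 1:
                P[i, j] = P[j, i] = R[i, j]   # no intervening variables
            else:
                k = slice(i + 1, j)           # intervening variables
                R2inv = np.linalg.inv(R[k, k])
                r1, r3 = R[i, k], R[k, j]
                num = R[i, j] - r1 @ R2inv @ r3
                den = np.sqrt((1 - r1 @ R2inv @ r1) * (1 - r3 @ R2inv @ r3))
                P[i, j] = P[j, i] = num / den
    return P

def pac_to_corr(P):
    """Reverse transformation: rebuild the correlation matrix from its
    partial autocorrelations, filling in order of increasing lag."""
    q = P.shape[0]
    R = np.eye(q)
    for lag in range(1, q):
        for i in range(q - lag):
            j = i + lag
            if lag == 1:
                R[i, j] = R[j, i] = P[i, j]
            else:
                k = slice(i + 1, j)
                R2inv = np.linalg.inv(R[k, k])
                r1, r3 = R[i, k], R[k, j]
                den = np.sqrt((1 - r1 @ R2inv @ r1) * (1 - r3 @ R2inv @ r3))
                R[i, j] = R[j, i] = r1 @ R2inv @ r3 + P[i, j] * den
    return R
```

Because each $\pi_{ij}$ varies freely on $(-1, 1)$, any such matrix of PACs maps back to a valid, positive definite correlation matrix.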

### Penalties

We derive penalties on canonical eigenvalues and correlations or partial autocorrelations, choosing independent Beta distributions as priors. This choice is motivated by the flexibility of this class of distributions and the property that its hyperparameters—which determine the strength of penalization—can be “set” by specifying a single value.

#### Beta distribution:

The Beta distribution is a continuous probability distribution that is widely used in Bayesian analyses and encompasses functions with many different shapes, determined by two parameters, *α* and *β*. While the standard Beta distribution is defined on the interval $[0, 1]$, it is readily extended to a different interval. The probability density function for a variable $x$ following a standard Beta distribution is of the form

$$p(x \,|\, \alpha, \beta) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)} \tag{5}$$

(Johnson *et al.* 1995, Chap. 25) with $\alpha, \beta > 0$ and $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$, with $B(\cdot)$ and $\Gamma(\cdot)$ denoting the Beta and Gamma functions, respectively.

When employing a Beta prior in Bayesian estimation, the sum of the shape parameters, $\alpha + \beta$, is commonly interpreted as the effective sample size of the prior (PSS) (Morita *et al.* 2008). It follows that we can specify the parameters of a Beta distribution with mode *m* as a function of the PSS ($\nu = \alpha + \beta$) as

$$\alpha = m(\nu - 2) + 1 \qquad \text{and} \qquad \beta = (1 - m)(\nu - 2) + 1 \tag{6}$$

For $\nu > 2$ this yields a unimodal distribution, and for $m = 0.5$ the distribution is symmetric, with $\alpha = \beta$. For given *m*, this provides a mechanism to regulate the strength of penalization through a single, intuitive parameter, the PSS *ν*.
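A one-line sketch of (6) (the function name is ours): given mode *m* and PSS *ν*, the shape parameters follow directly, and the implied mode and prior sample size can be checked.

```python
def beta_shape_from_mode(m, nu):
    """Shape parameters of a Beta distribution with mode m (0 < m < 1)
    and prior effective sample size nu = alpha + beta > 2."""
    alpha = m * (nu - 2) + 1
    beta = (1 - m) * (nu - 2) + 1
    return alpha, beta
```

For example, `beta_shape_from_mode(0.3, 8)` gives $\alpha = 2.8$ and $\beta = 5.2$: their sum is the PSS of 8 and the mode $(\alpha - 1)/(\alpha + \beta - 2) = 0.3$.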

Figure 1 shows the probability density of a variable with a standard Beta distribution on the interval $[0, 1]$ with mode of 0.3, together with the resulting penalty (*i.e.*, minus the logarithmic value of the density), for three values of PSS. For $\nu = 2$ the density function would be a uniform distribution, depicted by a horizontal line at height of 1, resulting in no penalty. With increasing values of *ν*, the distribution becomes more and more peaked and the penalty on values close to the extremes of the range becomes more and more severe. Conversely, in spite of substantial differences in point mass around the mode, penalty values in proximity of the mode differ relatively little for different values of *ν*. While $\mathcal{P}(\boldsymbol{\theta})$ in (1) was considered nonnegative, penalty values close to the mode can be negative—this does not affect suitability of the penalty and can be overcome by adding a suitable constant.

#### Penalty on canonical eigenvalues:

Canonical eigenvalues fall in the interval $[0, 1]$. For *q* traits, there are likely to be *q* different values, and attempts to determine the mode of their distribution may be futile. Hence we propose to substitute the mean canonical eigenvalue, $\bar{\lambda}$, for the mode. Taking minus logarithmic values of (5) and assuming the same mode and PSS for all *q* eigenvalues then gives penalty

$$\mathcal{P}_\lambda = -\sum_{i=1}^{q}\left[(\alpha - 1)\log\hat{\lambda}_i + (\beta - 1)\log\big(1 - \hat{\lambda}_i\big)\right] + qC \tag{7}$$

with $C = \log B(\alpha, \beta)$, $\alpha = \bar{\lambda}(\nu - 2) + 1$, and $\beta = (1 - \bar{\lambda})(\nu - 2) + 1$. This formulation results in shrinkage of all eigenvalues toward $\bar{\lambda}$, with the highest and lowest values shrunk the most.
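Evaluating such a penalty can be sketched as follows (our own minimal implementation, not the software described later): minus the summed log Beta densities of the estimated canonical eigenvalues, with the mode set to their mean.

```python
import numpy as np
from scipy.special import betaln

def eigenvalue_penalty(lam_hat, nu):
    """Penalty on canonical eigenvalues in (0, 1): minus the sum of log
    Beta(alpha, beta) densities, with the mode set to the mean eigenvalue
    and alpha + beta = nu (the prior effective sample size)."""
    lam = np.asarray(lam_hat, dtype=float)
    m = lam.mean()                      # mean substituted for the mode
    alpha = m * (nu - 2) + 1
    beta = (1 - m) * (nu - 2) + 1
    return (-np.sum((alpha - 1) * np.log(lam)
                    + (beta - 1) * np.log(1 - lam))
            + lam.size * betaln(alpha, beta))
```

For $\nu = 2$ the implied prior is uniform and the penalty vanishes; for $\nu > 2$, spreading the eigenvalues away from their mean increases the penalty, which is what induces shrinkage toward the mean.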

#### Penalty on correlations:

Following Joe (2006) and Daniels and Pourahmadi (2009), we assume independent shifted Beta distributions on the interval $(-1, 1)$ for PACs $\pi_{ij}$.

Daniels and Pourahmadi (2009) considered several Bayesian priors for correlation matrices formulated via PACs, suggesting uniform distributions for the individual $\pi_{ij}$. In addition, they showed that the equivalent to the joint uniform prior for **R** proposed by Barnard *et al.* (2000) is obtained by assuming Beta priors for PACs with shape parameters that depend on the number of intervening variables. Similarly, priors proportional to higher powers of the determinant of **R** are obtained for correspondingly adjusted shape parameters (Daniels and Pourahmadi 2009). Gaskins *et al.* (2014) extended this framework to PAC-based priors with more aggressive shrinkage toward zero for higher lags, suitable to encourage sparsity in estimated correlation matrices for longitudinal data.

Both Joe (2006) and Daniels and Pourahmadi (2009) considered Beta priors for PACs with $\alpha_{ij} = \beta_{ij}$, *i.e.*, shrinkage of all $\pi_{ij}$ toward zero. We generalize this by allowing for different shrinkage targets $\tau_{ij}$—and thus different shape parameters $\alpha_{ij}$ and $\beta_{ij}$—for individual values $\pi_{ij}$. This gives penalty

$$\mathcal{P}_\pi = -\sum_{i<j}\left[(\alpha_{ij} - 1)\log\!\left(\frac{1 + \hat{\pi}_{ij}}{2}\right) + (\beta_{ij} - 1)\log\!\left(\frac{1 - \hat{\pi}_{ij}}{2}\right) - \log B(\alpha_{ij}, \beta_{ij})\right] \tag{8}$$

with $\alpha_{ij} = \tfrac{1}{2}(1 + \tau_{ij})(\nu - 2) + 1$ and $\beta_{ij} = \tfrac{1}{2}(1 - \tau_{ij})(\nu - 2) + 1$. Again, this assumes equal PSS for all values, but could of course readily be expanded to allow for different values for different PACs. For all $\tau_{ij} = 0$, (8) reduces to

$$\mathcal{P}_\pi^0 = -\left(\frac{\nu}{2} - 1\right)\sum_{i<j}\log\!\big(1 - \hat{\pi}_{ij}^2\big) + \text{const.} \tag{9}$$

with $\alpha_{ij} = \beta_{ij} = \nu/2$.
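A corresponding sketch for (8) and (9) (again our own minimal implementation): independent shifted Beta densities on $(-1, 1)$ for the upper-triangle PACs, with modes at the chosen shrinkage targets.

```python
import numpy as np
from scipy.special import betaln

def pac_penalty(pac_hat, nu, target=None):
    """Penalty on partial autocorrelations (strict upper triangle of
    pac_hat): minus summed log shifted-Beta densities on (-1, 1), up to
    an additive constant. With target=None, all PACs are shrunk toward
    zero, i.e., alpha_ij = beta_ij = nu / 2."""
    q = pac_hat.shape[0]
    iu = np.triu_indices(q, k=1)
    p = pac_hat[iu]
    t = np.zeros_like(p) if target is None else np.asarray(target)[iu]
    alpha = 0.5 * (1 + t) * (nu - 2) + 1    # mode at t on (-1, 1)
    beta = 0.5 * (1 - t) * (nu - 2) + 1
    return np.sum(-(alpha - 1) * np.log((1 + p) / 2)
                  - (beta - 1) * np.log((1 - p) / 2)
                  + betaln(alpha, beta))
```

Passing the matrix of phenotypic PACs as `target` corresponds to shrinking genetic toward phenotypic values; omitting it corresponds to shrinkage toward zero.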

### Maximizing the penalized likelihood

REML estimation in quantitative genetics usually relies on algorithms exploiting derivatives of the log-likelihood function to locate its maximum. In particular, the so-called average information algorithm (Gilmour *et al.* 1995) is widely used due to its relative computational ease, good convergence properties, and implementation in several REML software packages. It can be described as a Newton(–Raphson)-type algorithm where the Hessian matrix is approximated by the average of observed and expected information. To adapt the standard, unpenalized algorithm for penalized estimation, we need to adjust first and second derivatives of $\log\mathcal{L}(\boldsymbol{\theta})$ for the corresponding derivatives of the penalties with respect to the parameters $\boldsymbol{\theta}$ to be estimated. These differ if we choose fixed values to determine the modes of the assumed Beta priors (*e.g.*, values obtained from a preliminary, unpenalized analysis) or employ penalties that derive these from the current parameter estimates.

#### Canonical eigenvalues:

If $\bar{\lambda}$ in (7) is estimated from the data, its derivatives are nonzero. This gives

$$\frac{\partial\mathcal{P}_\lambda}{\partial\theta_k} = -\sum_{i=1}^{q}\left[\frac{\alpha - 1}{\hat{\lambda}_i} - \frac{\beta - 1}{1 - \hat{\lambda}_i}\right]\frac{\partial\hat{\lambda}_i}{\partial\theta_k} - (\nu - 2)\,\frac{\partial\bar{\lambda}}{\partial\theta_k}\left[\sum_{i=1}^{q}\log\frac{\hat{\lambda}_i}{1 - \hat{\lambda}_i} - q\big(\psi(\alpha) - \psi(\beta)\big)\right] \tag{10}$$

and

$$\begin{aligned}
\frac{\partial^2\mathcal{P}_\lambda}{\partial\theta_k\,\partial\theta_l} ={}& -\sum_{i=1}^{q}\left[\frac{\alpha - 1}{\hat{\lambda}_i} - \frac{\beta - 1}{1 - \hat{\lambda}_i}\right]\frac{\partial^2\hat{\lambda}_i}{\partial\theta_k\,\partial\theta_l} + \sum_{i=1}^{q}\left[\frac{\alpha - 1}{\hat{\lambda}_i^2} + \frac{\beta - 1}{\big(1 - \hat{\lambda}_i\big)^2}\right]\frac{\partial\hat{\lambda}_i}{\partial\theta_k}\frac{\partial\hat{\lambda}_i}{\partial\theta_l} \\
& - (\nu - 2)\sum_{i=1}^{q}\frac{1}{\hat{\lambda}_i\big(1 - \hat{\lambda}_i\big)}\left(\frac{\partial\bar{\lambda}}{\partial\theta_l}\frac{\partial\hat{\lambda}_i}{\partial\theta_k} + \frac{\partial\bar{\lambda}}{\partial\theta_k}\frac{\partial\hat{\lambda}_i}{\partial\theta_l}\right) \\
& - (\nu - 2)\,\frac{\partial^2\bar{\lambda}}{\partial\theta_k\,\partial\theta_l}\left[\sum_{i=1}^{q}\log\frac{\hat{\lambda}_i}{1 - \hat{\lambda}_i} - q\big(\psi(\alpha) - \psi(\beta)\big)\right] \\
& + q(\nu - 2)^2\,\frac{\partial\bar{\lambda}}{\partial\theta_k}\frac{\partial\bar{\lambda}}{\partial\theta_l}\big(\psi'(\alpha) + \psi'(\beta)\big)
\end{aligned} \tag{11}$$

Derivatives of *C* involve the digamma and trigamma functions, *e.g.*, $\partial C/\partial\alpha = \psi(\alpha) - \psi(\nu)$, with *ψ* the digamma function. Derivatives of $\hat{\lambda}_i$ required in (10) and (11) are easiest to evaluate if the analysis is parameterized to the canonical eigenvalues $\lambda_i$ and the elements of the corresponding transformation matrix **T** (see Meyer and Kirkpatrick 2010), so that $\partial\hat{\lambda}_i/\partial\lambda_i = 1$ and $\partial\bar{\lambda}/\partial\lambda_i = 1/q$, and all other derivatives of $\hat{\lambda}_i$ and $\bar{\lambda}$ are zero. A possible approximation is to ignore contributions of derivatives of $\bar{\lambda}$, arguing that the mean eigenvalue is expected to be unaffected by sampling overdispersion and thus should change little.

#### Partial correlations:

Analogous arguments hold for penalties involving correlations. This gives

$$\frac{\partial\mathcal{P}_\pi}{\partial\theta_k} = -\sum_{i<j}\left[\frac{\alpha_{ij} - 1}{1 + \hat{\pi}_{ij}} - \frac{\beta_{ij} - 1}{1 - \hat{\pi}_{ij}}\right]\frac{\partial\hat{\pi}_{ij}}{\partial\theta_k} - \frac{\nu - 2}{2}\sum_{i<j}\frac{\partial\tau_{ij}}{\partial\theta_k}\left[\log\frac{1 + \hat{\pi}_{ij}}{1 - \hat{\pi}_{ij}} - \psi(\alpha_{ij}) + \psi(\beta_{ij})\right] \tag{12}$$

and

$$\begin{aligned}
\frac{\partial^2\mathcal{P}_\pi}{\partial\theta_k\,\partial\theta_l} ={}& -\sum_{i<j}\left[\frac{\alpha_{ij} - 1}{1 + \hat{\pi}_{ij}} - \frac{\beta_{ij} - 1}{1 - \hat{\pi}_{ij}}\right]\frac{\partial^2\hat{\pi}_{ij}}{\partial\theta_k\,\partial\theta_l} + \sum_{i<j}\left[\frac{\alpha_{ij} - 1}{\big(1 + \hat{\pi}_{ij}\big)^2} + \frac{\beta_{ij} - 1}{\big(1 - \hat{\pi}_{ij}\big)^2}\right]\frac{\partial\hat{\pi}_{ij}}{\partial\theta_k}\frac{\partial\hat{\pi}_{ij}}{\partial\theta_l} \\
& - (\nu - 2)\sum_{i<j}\frac{1}{1 - \hat{\pi}_{ij}^2}\left(\frac{\partial\tau_{ij}}{\partial\theta_l}\frac{\partial\hat{\pi}_{ij}}{\partial\theta_k} + \frac{\partial\tau_{ij}}{\partial\theta_k}\frac{\partial\hat{\pi}_{ij}}{\partial\theta_l}\right) \\
& - \frac{\nu - 2}{2}\sum_{i<j}\frac{\partial^2\tau_{ij}}{\partial\theta_k\,\partial\theta_l}\left[\log\frac{1 + \hat{\pi}_{ij}}{1 - \hat{\pi}_{ij}} - \psi(\alpha_{ij}) + \psi(\beta_{ij})\right] \\
& + \frac{(\nu - 2)^2}{4}\sum_{i<j}\frac{\partial\tau_{ij}}{\partial\theta_k}\frac{\partial\tau_{ij}}{\partial\theta_l}\big(\psi'(\alpha_{ij}) + \psi'(\beta_{ij})\big)
\end{aligned} \tag{13}$$

with obvious simplifications if shrinkage targets $\tau_{ij}$ are fixed or treated as such, so that derivatives of $\tau_{ij}$ are zero. As shown in the *Appendix*, derivatives of correlations and PACs are readily calculated from the derivatives of covariance components for any of the parameterizations commonly utilized in (unpenalized) REML algorithms for variance component estimation.

## Simulation Study

A large-scale simulation study, considering a wide range of population parameters, was carried out to examine the efficacy of the penalties proposed above.

### Setup

Data were sampled from multivariate normal distributions, assuming a balanced paternal half-sib design composed of *s* unrelated sire families with 10 progeny each. Sample sizes considered were 400 and 1000 families, with records for all traits for each of the progeny but no records for sires.

Population values for genetic and residual variance components were generated by combining 13 sets of heritabilities with six different types of correlation structures to generate 78 cases. Details are summarized in the *Appendix* and population canonical eigenvalues for all sets are shown in Supplemental Material, File S1. To assess the potential for detrimental effects of penalized estimation, values were chosen deliberately to generate both cases that approximately matched the priors assumed in deriving the penalties and cases where this was clearly not the case. The latter included scenarios where population canonical eigenvalues were widely spread and in multiple clusters and cases where genetic and phenotypic correlations were highly dissimilar. A total of 500 samples per case and sample size were obtained and analyzed.

### Analyses

REML estimates of $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ for each sample were obtained without penalization and imposing a penalty $\mathcal{P}_\lambda$ on canonical eigenvalues as given in (7), as well as penalties on partial autocorrelations shrinking all values toward zero [$\mathcal{P}_\pi^0$; see (9)] and with shrinkage targets for each value equal to the corresponding phenotypic counterpart [$\mathcal{P}_\pi$; see (8)]. For the latter, penalties on genetic PACs only and on both genetic and residual values were examined.

Analyses were carried out considering fixed values for the effective sample size, ranging up to $\nu = 24$. For penalties on both genetic and residual PACs, either the same value of *ν* was used for both or the PSS for residual PACs was fixed at a mild value. In addition, direct estimation of a suitable PSS for each replicate was attempted. As shown in (1), penalties were subtracted from the standard log-likelihood, incorporating a factor of $\tfrac{1}{2}$.

The model of analysis was a simple animal model, fitting means for each trait as the only fixed effects. A method-of-scoring algorithm together with simple derivative-free search steps was used to locate the maximum of the (penalized) likelihood function. To facilitate easy computation of derivatives of the penalties, this was done using a parameterization to the elements of the canonical decomposition (see Meyer and Kirkpatrick 2010), restraining estimates of the canonical eigenvalues to their permissible interval.

Direct estimation of *ν* was performed by evaluating points on the profile likelihood for *ν* [*i.e.*, maximizing $\log\mathcal{L}_P$ with respect to the covariance components to be estimated for selected, fixed values of *ν*], combined with quadratic approximation steps to locate the maximum of the profile, using Powell’s (2006) Fortran subroutine NEWUOA. To avoid numerical problems, estimates of *ν* were constrained to a suitable interval. All calculations were carried out using custom Fortran programs (available on request).

### Summary statistics

For each sample and analysis, the quality of estimates was evaluated through their entropy loss (James and Stein 1961)

$$L_1\big(\boldsymbol{\Sigma}_X, \hat{\boldsymbol{\Sigma}}_X\big) = \operatorname{tr}\!\big(\boldsymbol{\Sigma}_X^{-1}\hat{\boldsymbol{\Sigma}}_X\big) - \log\big|\boldsymbol{\Sigma}_X^{-1}\hat{\boldsymbol{\Sigma}}_X\big| - q \tag{14}$$

for $X = G$, *E*, and *P*, with $\boldsymbol{\Sigma}_X$ denoting the matrix of population values for genetic, residual, and phenotypic covariances and $\hat{\boldsymbol{\Sigma}}_X$ the corresponding estimate. As suggested by Lin and Perlman (1985), the percentage reduction in average loss (PRIAL) was used as the criterion to summarize the effects of penalization

$$\text{PRIAL} = 100\,\frac{\bar{L}_1\big(\boldsymbol{\Sigma}_X, \hat{\boldsymbol{\Sigma}}_X\big) - \bar{L}_1\big(\boldsymbol{\Sigma}_X, \tilde{\boldsymbol{\Sigma}}_X\big)}{\bar{L}_1\big(\boldsymbol{\Sigma}_X, \hat{\boldsymbol{\Sigma}}_X\big)} \tag{15}$$

where $\bar{L}_1(\cdot, \cdot)$ denotes the entropy loss averaged over replicates, and $\tilde{\boldsymbol{\Sigma}}_X$ and $\hat{\boldsymbol{\Sigma}}_X$ represent the penalized and corresponding unpenalized REML estimates of $\boldsymbol{\Sigma}_X$, respectively.
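For concreteness, these two summary statistics can be computed as follows (a sketch with our own function names):

```python
import numpy as np

def entropy_loss(sigma, sigma_hat):
    """Entropy loss L1: tr(S^-1 S_hat) - log|S^-1 S_hat| - q;
    zero when the estimate equals the population matrix."""
    q = sigma.shape[0]
    M = np.linalg.solve(sigma, sigma_hat)
    _, logdet = np.linalg.slogdet(M)
    return np.trace(M) - logdet - q

def prial(losses_unpenalized, losses_penalized):
    """Percentage reduction in average loss due to penalization."""
    lu = np.mean(losses_unpenalized)
    lp = np.mean(losses_penalized)
    return 100.0 * (lu - lp) / lu
```

A PRIAL of 50, say, means penalization halved the average entropy loss relative to standard REML; negative values flag detrimental penalization.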

In addition, the average reduction in $\log\mathcal{L}$ due to penalization was calculated as the mean difference across replicates between the unpenalized log-likelihood evaluated at the unpenalized estimates and the corresponding value evaluated at the penalized estimates.

### Data availability

The author states that all data necessary for confirming the conclusions presented in the article are represented fully within the article.

## Results

Distributions of PRIAL across the 78 sets of population values, together with the corresponding change in likelihood, are shown in Figure 2 for two sample sizes, with the PAC penalties applied to genetic PACs only. Distributions shown are trimmed; *i.e.*, their range reflects minimum and maximum values observed. Selected mean and minimum values are also reported in Table 1. More detailed results for individual cases are reported in File S1.

### Genetic covariances

Overall, for fixed PSS there were surprisingly few differences between penalties in mean PRIAL values achieved, especially for estimates of the genetic covariance matrix. Correlations among PRIAL values for the different penalties, ranging from 0.9 to unity, suggested similar modes of action.

For comparison, additional analyses considered penalties on standard correlations, obtained by substituting correlations $r_{ij}$ for PACs $\pi_{ij}$ in (8), (9), (12), and (13). Corresponding PRIAL values (not shown) for these were consistently lower and, more importantly, a considerable number of minimum values were negative, even for small values of the PSS. Clearly this reflected a discrepancy between the priors assumed and the distribution of correlations. Transformation to partial autocorrelations yielded a better match and thus penalties markedly less likely to have detrimental effects. While standard correlations are easier to interpret than PACs, penalties on them based on independent Beta priors are thus not recommended.

Even for small values of *ν* there were worthwhile reductions in loss for estimates of $\boldsymbol{\Sigma}_G$, in particular for the smallest sample size. Means increased with increasing stringency of penalization, along with an increasing spread in results for individual cases, especially for the largest sample size. This pattern was due to the range of population values for genetic parameters used.

Moreover, for small samples or low PSS, it did not appear to be all that important whether the priors on which the penalties were based approximately matched population values or not: “any” penalty proved beneficial, *i.e.*, resulted in positive PRIAL for $\boldsymbol{\Sigma}_G$. For more stringent penalization, however, there was little improvement (or even adverse effects) for the cases where there was a clear mismatch. For instance, for penalty $\mathcal{P}_\lambda$ with high PSS and the largest number of sires, there were two cases with negative PRIAL for $\boldsymbol{\Sigma}_G$. Both of these had clusters of high and low population values for the canonical eigenvalues, so that the assumption of a unimodal distribution invoked in deriving $\mathcal{P}_\lambda$ was inappropriate and led to sufficient overshrinkage to be detrimental. On the whole, however, unfavorable effects of penalization were few and restricted to the most extreme cases considered.

Paradoxically, PRIAL values for $\mathcal{P}_\lambda$ were also relatively low for cases where heritabilities were approximately the same and genetic and phenotypic correlations were similar, so that canonical eigenvalues differed little from their mean (see File S1). This could be attributed to the shape of the penalty function, as illustrated in Figure 1, resulting in little penalization for values close to the mode. In other words, these were cases where the prior also did not quite match the population values: while the assumption of a common mean for canonical eigenvalues clearly held, that of a distribution on the interval $[0, 1]$ did not. This can be rectified by specifying a more appropriate interval. As the unpenalized estimates of the canonical eigenvalues are expected to be overdispersed, their extremes may provide a suitable range to be used. Additional simulations (not shown) replaced the interval limits of 0 and 1 used to derive $\mathcal{P}_\lambda$ in (7) with the smallest and largest canonical eigenvalue estimates, $\hat{\lambda}_{\min}$ and $\hat{\lambda}_{\max}$, from a preliminary, unpenalized analysis for each replicate. This increased PRIAL for both $\boldsymbol{\Sigma}_G$ and $\boldsymbol{\Sigma}_E$ substantially for these cases. However, as the proportion of such cases was low (see File S1), overall results were little affected.

### Residual covariances

Differences between the penalties were apparent in the terms involved [see (7)], *i.e.*, whether they acted on the canonical eigenvalues or on correlations. Hence, the penalty on canonical eigenvalues yielded substantial PRIAL for estimates of both the genetic and residual covariances, especially for the smaller samples where sampling variances and losses were high. Conversely, applying penalties on genetic PAC resulted in some, but lower, improvement in the residual estimates (except for large *ν*), but only as a by-product of negative sampling correlations between the genetic and residual components. As shown in Table 2, imposing a corresponding penalty on residual PAC in addition could increase the PRIAL in the residual estimates markedly without reducing the PRIAL for the genetic estimates, provided the PSS chosen corresponded to a relatively mild degree of penalization. Shrinking toward the phenotypic PAC yielded somewhat less spread in PRIAL than shrinking toward zero, accompanied by smaller changes in the likelihood.
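For concreteness, the partial autocorrelations on which these penalties act can be computed from a correlation matrix with the standard partial-correlation formula, conditioning each pair of traits on the intervening ones; a minimal sketch (illustrative only, not the estimation code):

```python
import numpy as np

def partial_autocorrelations(R):
    """Matrix of partial autocorrelations: entry (i, j) holds the
    correlation of traits i and j given the intervening traits
    i+1, ..., j-1; entries adjacent to the diagonal are the plain
    correlations."""
    k = R.shape[0]
    P = R.copy()
    for i in range(k):
        for j in range(i + 2, k):
            s = list(range(i + 1, j))                 # conditioning set
            Rss = R[np.ix_(s, s)]
            ri, rj = R[i, s], R[j, s]
            bi, bj = np.linalg.solve(Rss, ri), np.linalg.solve(Rss, rj)
            num = R[i, j] - ri @ bj
            den = np.sqrt((1.0 - ri @ bi) * (1.0 - rj @ bj))
            P[i, j] = P[j, i] = num / den
    return P

# For an AR(1)-type correlation matrix, all PACs beyond lag 1 are zero:
rho = 0.5
R = rho ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
P = partial_autocorrelations(R)
```

A practical appeal of this parameterization is that the PACs are unconstrained on (−1, 1), so each can be shrunk toward zero (or another target) independently without risking an invalid correlation matrix.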

### Phenotypic covariances

We argued above that imposing penalties based on the estimate of the phenotypic covariance matrix would allow us to borrow strength from it, because the phenotypic covariances are typically estimated more precisely than any of their components. In doing so, we would hope to have little—and certainly no detrimental—effect on the phenotypic estimates. Loosely speaking, we would expect penalized estimation to redress, at least to some extent, any distortion in the partitioning of the phenotypic covariances due to sampling correlations. As demonstrated in Figure 2, this was generally the case for fixed values of the PSS up to 12. Conversely, negative PRIAL for estimates of the phenotypic covariances at higher values of *ν* flagged overpenalization in cases where population values for genetic parameters did not sufficiently match the assumptions on which the penalties were based.

### Canonical eigenvalues

Figure 3 shows the distribution of the largest and smallest canonical eigenvalues, contrasting population values with mean estimates from unpenalized and penalized analyses for a medium sample size and a small, fixed PSS. Results clearly illustrate the upward bias in estimates of the largest and the downward bias in estimates of the smallest eigenvalues. As expected, imposing a penalty on the canonical eigenvalues reduced the mean of the largest and increased the mean of the smallest eigenvalues, with some overshrinkage, especially of the largest eigenvalue, evident. In contrast, for the small value of the PSS chosen, the distribution of the largest values from penalized and unpenalized analyses differed little under the penalty on genetic PAC; *i.e.*, penalizing genetic PAC did not affect the leading canonical eigenvalues markedly, acting predominantly on the smaller values. For more stringent penalties, however, some shrinkage of the leading eigenvalues was evident under both types of penalty; detailed results for selected cases are given in File S1.
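The canonical eigenvalues examined here are the eigenvalues of the product of the inverse phenotypic and the genetic covariance matrices, *i.e.*, generalized heritabilities. They can be computed stably via a Cholesky factorization of the phenotypic matrix; a minimal sketch:

```python
import numpy as np

def canonical_eigenvalues(sigma_a, sigma_p):
    """Eigenvalues of inv(sigma_p) @ sigma_a ('generalized
    heritabilities'), obtained from the similar symmetric matrix
    inv(L) @ sigma_a @ inv(L).T, where sigma_p = L @ L.T."""
    L = np.linalg.cholesky(sigma_p)
    M = np.linalg.solve(L, np.linalg.solve(L, sigma_a).T)
    return np.sort(np.linalg.eigvalsh(M))[::-1]

# For diagonal matrices the canonical eigenvalues reduce to the
# individual heritabilities:
lam = canonical_eigenvalues(np.diag([0.4, 0.1]), np.eye(2))
```

Working with the symmetric congruent form avoids the numerically less reliable eigendecomposition of the nonsymmetric product.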

### Estimating PSS

On the whole, attempts to estimate the appropriate value of *ν* from the data were not all that successful. In numerous cases, the estimate of *ν* was close to the lower end of the range allowed, *i.e.*, implied virtually no penalty. Conversely, a substantial number of cases yielded estimates of *ν* close to the upper bound allowed. This increased PRIAL (compared to fixed values of *ν*) for cases that approximately matched the priors, but caused reduced or negative PRIAL and substantial changes in the likelihood otherwise. A possible explanation was that the penalized likelihood, and thus the estimate of *ν*, might be dominated by one of its components. However, as shown in Table 2, neither estimating a value of *ν* for one penalty while fixing the PSS for the other, nor estimating values for both (either separately or jointly), improved results greatly. Moreover, doing so yielded more cases for which penalization resulted in substantial, negative PRIAL. Repeating selected analyses using a sequential grid search to determine optimal values of *ν* gave essentially the same results; *i.e.*, results could not be attributed to inadequacies in the quadratic approximation procedure. Given the additional computational requirements and the fact that suitable values could not be estimated reliably, joint estimation of the PSS together with the covariance components cannot be recommended.

## Discussion

Sampling variation is the bane of multivariate analyses in quantitative genetics. While nothing can replace large numbers of observations with informative data and appropriate relationship structure, we often need to obtain reasonably trustworthy estimates of genetic parameters from relatively small data sets. This holds especially for data from natural populations but is also relevant for new or expensive-to-measure traits in livestock improvement or plant breeding schemes. We have shown that regularized estimation in a maximum-likelihood framework through penalization of the likelihood function can provide “better” estimates of covariance components, *i.e.*, estimates that are closer to the population values than those from standard, unpenalized analyses. This is achieved through penalties targeted at reducing sampling variation.

### Simplicity

The aim of this study was to obtain a procedure to improve REML estimates of covariance components, and thus genetic parameters, that is easy to use, does not increase computational requirements considerably, and is suitable for routine application. We have demonstrated that it is feasible to choose default values for the strength of penalization that yield worthwhile reductions in loss for a wide range of scenarios, are robust, *i.e.*, unlikely to result in penalties with detrimental effects, and are technically simple. While such a tuning-free approach may not yield the maximum reduction in loss, it appears to achieve a substantial proportion thereof in most cases, with modest changes in the likelihood compared to its unpenalized maximum. In contrast to earlier attempts to estimate a tuning factor (Meyer and Kirkpatrick 2010; Meyer 2011), it does not require multiple additional analyses to be carried out, and the effects of penalization on computational requirements are thus mostly unimportant.

In addition, we can again make the link to Bayesian estimation, where the idea of mildly or weakly informative priors has been gaining popularity. Discussing priors for variance components in hierarchical models, Gelman (2006) advocated a half-t or half-Cauchy prior with a large scale parameter. Huang and Wand (2013) extended this to prior distributions for covariance matrices that resulted in half-t priors for standard deviations and marginal densities of correlations *ρ* proportional to a power of 1 − *ρ*². Chung *et al.* (2015) proposed a prior for covariance matrices proportional to a Wishart distribution with a diagonal scale matrix and low degrees of belief to obtain a penalty on the likelihood function that ensured nondegenerate estimates of variance components.

### Choice of penalty

We have presented two types of suitable penalties that fit well within the standard framework of REML estimation. Both achieved overall comparable reductions in loss but acted slightly differently, with penalties on correlations mainly affecting the smallest eigenvalues of the covariance matrices while penalties on canonical eigenvalues acted on both the smallest and largest values. Clearly it is the effect on the smallest eigenvalues (which have the largest sampling variances) that contributes most to the overall reduction in loss for a covariance matrix. An advantage of the penalty on correlations is that it is readily implemented for the parameterizations commonly employed in REML estimation, and it is straightforward to extend it to models with additional random effects and covariance matrices to be estimated or cases where traits are recorded on distinct subsets of individuals so that some residual covariances are zero. It also lends itself to scenarios where we may be less interested in a reduction in sampling variance but may want to shrink correlations toward selected target values.

### Strength of penalization

Results suggest that penalties on canonical eigenvalues or PAC assuming a Beta prior with a conservative choice of PSS of up to 10 will not only result in substantial improvements in estimates of genetic covariance components in many cases, but also carry little risk of detrimental effects in cases where the assumed prior does not quite match the underlying population values. Reanalyzing a collection of published heritability estimates from Mousseau and Roff (1987), M. Kirkpatrick (personal communication) suggested that their empirical distribution could be modeled as a Beta distribution with a mode of 0.3.
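Under the parameterization assumed here (PSS *ν* = *a* + *b*, mode (*a* − 1)/(*ν* − 2)), a given mode and PSS map directly to the Beta shape parameters; a minimal sketch for a mode of 0.3 and a mild *ν* of 8:

```python
def beta_params(mode, nu):
    """Beta shape parameters (a, b) from the mode and the prior
    effective sample size nu = a + b (an assumed parameterization)."""
    a = 1.0 + mode * (nu - 2.0)
    return a, nu - a

a, b = beta_params(0.3, 8.0)   # a mildly informative prior
```

Larger values of *ν* concentrate the prior around its mode and thus penalize deviations more stringently, while *ν* close to 2 approaches a flat prior.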

### Additional bias

The price for a reduction in sampling variances from penalized analyses is an increase in bias. It is often forgotten that REML estimates of covariance components are biased even if no penalty is applied, as estimates are constrained to the parameter space; *i.e.*, the smallest eigenvalues are truncated at zero or, in practice, at a small positive value to ensure estimated matrices are positive definite. As shown in Figure 3, penalization tended to increase the lower limits for the smallest canonical eigenvalues, and thus also for the corresponding eigenvalues of the genetic covariance matrix, adding to this inherent bias. Previous work examined the bias due to penalization on specific genetic parameters in more detail (Meyer and Kirkpatrick 2010; Meyer 2011), showing that changes from unpenalized estimates were usually well within the range of standard errors. Employing a mild penalty with fixed PSS, changes in the likelihood from its maximum in an unpenalized analysis were generally small and well below significance levels. This suggests that the additional bias in REML estimates due to a mild penalty on the likelihood is of minor concern and far outweighed by the benefits of a reduction in sampling variance.

### Alternatives

Obviously, there are many other possible choices of penalty. As in Bayesian estimation, results of penalized estimation are governed by the prior distribution selected, the more so the smaller the data set available. Mean reductions in loss obtained in a previous study, which attempted to estimate tuning factors and used penalties derived assuming a normal distribution of canonical eigenvalues or inverse Wishart distributions of covariance or correlation matrices, were again by and large of similar magnitude (Meyer 2011). Other opportunities to reduce sampling variation arise through more parsimonious modeling, *e.g.*, by estimating genetic covariance matrices at reduced rank or assuming a factor-analytic structure. Future work should examine the scope for penalization in this context and consider the effects on model selection.

## Application

Practical application of REML estimation using the simple penalties proposed is illustrated using the data set considered by Meyer and Kirkpatrick (2010) (available as supplemental material in their article). In brief, this was composed of six traits measured on beef cattle, with 912–1796 records per trait. Including pedigree information on parents, grandparents, etc., without records yielded a total of 4901 animals in the analysis. The model of analysis fitted animals’ additive genetic effects as random and so-called contemporary group effects, with up to 282 levels per trait, as fixed. The latter were subclasses representing combinations of systematic environmental factors, fitted to ensure that only animals subject to similar conditions were directly compared.

Estimates of the genetic and residual covariance matrices were obtained by REML using WOMBAT (Meyer 2007), applying penalties on canonical eigenvalues and on partial correlations (shrinking toward zero), with PSS of 4 and 8, and were contrasted to unpenalized results. Lower-bound sampling errors for estimates of canonical eigenvalues were approximated empirically (Meyer and Houle 2013).

Results are summarized in Table 3. For all penalties and both levels of PSS, deviations of the unpenalized likelihood from its maximum (in the “standard,” unpenalized analysis) were small and well below the significance threshold of −1.92 for a likelihood-ratio test involving a single parameter (at an error probability of 5%), emphasizing the mildness of penalization even for the larger PSS. Similarly, changes in estimates of canonical eigenvalues were well within their 95% confidence limits, except for one estimate that was just outside the lower limit. As expected from the simulation results above (*cf*. Figure 3), the penalty on canonical eigenvalues affected estimates of the largest values more than the penalties on partial correlations. Conversely, the latter yielded somewhat larger changes in the lowest eigenvalues. Estimates agreed well with results from Meyer and Kirkpatrick (2010) (of 0.69, 0.50, 0.38, 0.27, 0.17, and 0.05 for the six canonical eigenvalues, respectively), which were obtained by applying a quadratic penalty on the canonical eigenvalues and using cross-validation to determine the stringency of penalization.
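The threshold of −1.92 quoted above is simply minus half the 5% critical value of a χ² distribution with 1 d.f.; it can be reproduced with the Python standard library alone, using the fact that the χ²₁ quantile is the square of the corresponding standard normal quantile:

```python
from statistics import NormalDist

# chi-squared(1) 95% quantile equals the squared 97.5% normal quantile
z = NormalDist().inv_cdf(0.975)
threshold = -(z * z) / 2.0   # on the log-likelihood scale, approx. -1.92
```

A drop in log likelihood smaller in magnitude than this value would not be judged significant in a one-parameter likelihood-ratio test.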

Corresponding changes in individual parameters were small throughout and again well within the range of the approximate 95% confidence intervals (sampling errors derived from the inverse of the average information matrix) for the unpenalized analysis. The largest changes in correlations occurred for the pairs of traits with the smallest numbers of records and for the higher PSS. For instance, the estimate of the residual correlation between traits 1 and 4 changed from 0.82 (with a standard error of 0.17) to 0.65, while the penalty on both genetic and residual PAC reduced the estimate of the genetic correlation from 0.48 (standard error of 0.25) to 0.37. Plots with detailed results for individual parameters are given in File S1.

In summary, none of the penalties applied changed estimates significantly, and none of the changes in individual parameter estimates due to penalization appeared questionable. Indeed, the larger changes described yielded values more in line with literature results for the traits concerned, suggesting that penalization stabilized somewhat erratic estimates based on small numbers of observations.

## Implementation

Penalized estimation with the penalties proposed, for fixed values of *ν*, has been implemented in our mixed-model package WOMBAT (available at http://didgeridoo.une.edu.au/km/wombat.php) (Meyer 2007). For the penalty on canonical eigenvalues, a parameterization to the elements of the canonical decomposition is used for ease of implementation, while penalties on correlations use the standard parameterization to elements of the Cholesky factors of the covariance matrices to be estimated. Maximization of the likelihood is carried out using the average information algorithm, combined with derivative-free search steps where necessary to ensure convergence. Example runs for a simulated data set are shown in File S1.

Experience with applications so far has identified small to moderate effects of penalization on computational requirements compared with an unpenalized analysis, with the bulk of extra computations arising from derivative-free search steps used to check for convergence. The parameterization to the elements of the canonical decomposition, however, tended to increase the number of iterations required even without penalization. Detrimental effects on convergence behavior when parameterizing to eigenvalues of covariance matrices have been reported previously (Pinheiro and Bates 1996).

Convergence rates of iterative maximum-likelihood analyses are dictated by the shape of the likelihood function. Newton–Raphson-type algorithms, including the average information algorithm, involve a quadratic approximation of the likelihood. When the surface is not well approximated by a quadratic, the algorithm may become “stuck” and fail to locate the maximum. This happens quite frequently for standard (unpenalized) multivariate analyses comprising more than a few traits when estimated covariance matrices have eigenvalues close to zero. For such cases, additional maximization steps using alternative schemes, such as expectation-maximization-type algorithms or a derivative-free search, are usually beneficial. For small data sets, we expect the likelihood surface around the maximum to be relatively flat. Adding “information” through the assumed prior distribution (*a.k.a*. the penalty) can improve convergence by adding curvature to the surface and creating a more distinct maximum. Conversely, too stringent a penalty may alter the shape of the surface sufficiently that a quadratic approximation fails. Careful checking of convergence should be an integral part of any multivariate analysis, penalized or not.
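To illustrate how a penalty adds curvature, consider a one-dimensional Newton–Raphson sketch on a toy objective (hypothetical, not the REML likelihood): the penalized gradient and Hessian are simply the sums of the likelihood and penalty terms.

```python
def newton_max(grad, hess, theta, tol=1e-10, maxit=100):
    """Newton-Raphson maximization in one dimension."""
    for _ in range(maxit):
        step = grad(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Toy log-likelihood with weak curvature and maximum at 0.4 ...
g_l = lambda t: -0.5 * (t - 0.4)
h_l = lambda t: -0.5
# ... plus a Beta(3, 3)-type log penalty on (0, 1), mode at 0.5
g_p = lambda t: 2.0 / t - 2.0 / (1.0 - t)
h_p = lambda t: -2.0 / t**2 - 2.0 / (1.0 - t)**2

unpen = newton_max(g_l, h_l, 0.3)
pen = newton_max(lambda t: g_l(t) + g_p(t),
                 lambda t: h_l(t) + h_p(t), 0.3)
```

Here the penalty's Hessian contribution dwarfs the flat likelihood curvature, so the penalized maximum sits between the unpenalized estimate and the penalty mode, and the Newton steps are better conditioned.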

## Conclusion

We propose a simple but effective modification of standard multivariate maximum-likelihood analyses to “improve” estimates of genetic parameters: Imposing a penalty on the likelihood designed to reduce sampling variation will yield estimates that are, on average, closer to the population values than unpenalized estimates. There are numerous choices for such penalties. We demonstrate that penalties derived under the assumption of a Beta distribution for scale-free functions of the covariance components to be estimated, namely generalized heritabilities (*a.k.a*. canonical eigenvalues) and genetic correlations, are well suited and tend not to distort estimates of the total, phenotypic variance. In addition, invoking a Beta distribution allows the stringency of penalization to be regulated by a single, intuitive parameter, known as the effective sample size of the prior in a Bayesian context. Aiming at moderate rather than optimal improvements in estimates, suitable default values for this parameter can be identified that yield a mild penalty. This allows us to abandon the laborious quest to identify tuning factors suited to particular analyses. Choosing the penalty to be sufficiently mild all but eliminates the risk of detrimental effects and results in only minor changes in the likelihood compared to unpenalized analyses. Mildly penalized estimation is recommended for multivariate analyses in quantitative genetics considering more than a few traits, to alleviate the inherent effects of sampling variation.

## Acknowledgments

The Animal Genetics and Breeding Unit is a joint venture of the New South Wales Department of Primary Industries and the University of New England. This work was supported by Meat and Livestock Australia grant B.BFG.0050.

## Appendix

### Population Values

Population values for the 13 sets of heritabilities used are summarized in Table A1. Six constellations of genetic and residual correlations between traits *i* and *j* (correlations I–VI) were used. Each was constructed as a simple function of the trait indexes *i* and *j*, in part defined piecewise with zero values otherwise, and with residual correlations for the last constellation computed from (4).

Phenotypic variances were equal to 1 throughout for correlation set I and were set to 2, 1, 3, 2, 1, 2, 3, 1, and 2 for traits 1–9 otherwise.

### Derivatives of Partial Autocorrelations

Partial autocorrelations [see (3)] can be written as the ratio of the covariance between two traits, adjusted for the intervening traits, to the product of the corresponding adjusted standard deviations. Applying the chain rule then gives the partial derivatives of the partial autocorrelations with respect to the parameters from the derivatives of these components. Decompose the correlation matrix as **R** = **S**⁻¹**Σ**\ **S**⁻¹, with **S** the diagonal matrix of standard deviations for covariance matrix **Σ**. This gives the required derivatives of the correlation matrix from those of **Σ** and **S**. Finally, assuming derivatives of the variances are available, the required derivative of the *i*th standard deviation is the derivative of the *i*th variance divided by twice that standard deviation. When estimating the elements of **Σ** directly, only the corresponding derivatives are nonzero. Derivatives of covariance matrices when employing a parameterization to the elements of their Cholesky factors are given by Meyer and Smith (1996).
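The chain-rule derivatives just described can be verified numerically; a minimal sketch (illustrative, with a finite-difference check of the analytic derivative of **R** = **S**⁻¹**Σ**\ **S**⁻¹ with respect to an element of **Σ**):

```python
import numpy as np

def corr(sigma):
    """Correlation matrix R = Sinv @ sigma @ Sinv."""
    s = np.sqrt(np.diag(sigma))
    return sigma / np.outer(s, s)

def dcorr_dsigma(sigma, k, l):
    """Analytic derivative of R with respect to element (k, l) of sigma
    (symmetric perturbation when k != l), using the standard-deviation
    derivative d s_k / d sigma_kk = 1 / (2 s_k)."""
    n = sigma.shape[0]
    s = np.sqrt(np.diag(sigma))
    dSig = np.zeros((n, n))
    dSig[k, l] = dSig[l, k] = 1.0
    dS = np.zeros((n, n))
    if k == l:
        dS[k, k] = 1.0 / (2.0 * s[k])
    Sinv = np.diag(1.0 / s)
    dSinv = -Sinv @ dS @ Sinv                 # derivative of S^{-1}
    return dSinv @ sigma @ Sinv + Sinv @ dSig @ Sinv + Sinv @ sigma @ dSinv

sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
```

The three terms of the returned matrix are the product-rule expansion of **S**⁻¹**Σ**\ **S**⁻¹; note that the diagonal of **R** is constant, so its derivative vanishes there.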

## Footnotes

*Communicating editor: L. E. B. Kruuk*

Supplemental material is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.186114/-/DC1.

- Received December 14, 2015.
- Accepted June 9, 2016.

- Copyright © 2016 by the Genetics Society of America