Multivariate permutation tests for two sample testing in presence of nondetects with application to microarray data

Abstract

Very often, data collected in medical research are characterized by censored observations and/or data with mass on the value zero. This happens, for example, when some measurements fall below the detection limits of the specific instrument used. Left-censored observations of this type are called “nondetects”. A data set with an excessive number of zeros is also referred to as zero-inflated data. In the present work, we aim at comparing different multivariate permutation procedures in two-sample testing for data with nondetects. The effect of censoring is investigated with regard to the different values that may be attributed to nondetected values, both under the null hypothesis and under the alternative. We motivate the problem using data from allergy research.

Keywords

Two-sample test, censored data, permutation tests, multivariate tests, nonparametric combination

1 Introduction

Investigating the Immunoglobulin E (IgE) values related to specific allergens is crucial for assessing possible allergic diseases. Indeed, the presence of IgE antibodies is a necessary, but not sufficient, condition for the development of the corresponding allergy. The ImmunoCAP ISAC® is a miniaturized immunoassay platform that allows for multiplex measurement of specific IgE antibodies to many allergen components from human serum or plasma, resulting in fluorescent signals (microarray) which are then transformed into a semi-quantitative ISAC Standardized Unit (ISU). It is common when dealing with microarray data that observations may fall below the detection limits, thus leading to a large proportion of left-censored data (“nondetects”), often resulting in recorded zero values, or the smallest value detectable. When this data collection process results in an excessive number of zeros, this problem is also known as zero-inflated data.¹ These types of zeroes or nondetects cannot be considered missing values since they contain valuable information about the phenomenon of interest as they contain information on dependence relations among the many observed variables. Thus, we must take them into account in the analysis. Most of the literature on censored data revolves around survival analysis with right-censoring.^2–4 Quite recently, Dobler and Pauly⁵ have proposed nonparametric statistical inference procedures for both the Mann-Whitney effect P and the odds of the Mann-Whitney effect W in a classical survival model with non-informative randomly censored data, even allowing for ties in the data.

Furthermore, in case of diagnostic markers, it is common to use the receiver operating characteristic (ROC) curve and the area under the curve (AUC).^6–8 Schisterman et al.⁹ discuss an approach based on the Box–Cox power transformation, useful when dealing with ROC curves.^10,11 The authors show how this transformed normal approach can be developed to estimate the AUC for positive continuous data that may have zero values with positive probability.

In the framework of ANOVA and in the field of likelihood ratio tests, Farcomeni¹² has developed an asymptotic test for multivariate lognormal data with a point mass at zero, generalizing Wilks' lambda and Hotelling's T²-test to the case of semicontinuous data. In the same framework, Thulin¹³ discusses some parametric and nonparametric multivariate tests for biomarker data with nondetects. They argue that parametric tests perform better than their nonparametric alternatives.

In this paper, we aim at providing and comparing different solutions for multivariate two-sample testing with nondetects, with particular attention towards allergy research or immunology. In such a context, it is common to investigate specific IgE values related to several allergens, thus we refer to multiple endpoints. In many biological or medical data applications, the data distribution is far from normal, in particular in a multivariate setting, and this can impact the reliability of traditional parametric approaches. This is the main reason why we decide to stay within a nonparametric framework in the context of permutation tests. Here, no assumption for the underlying population is required, except for the “equality of underlying distributions” under null hypothesis, that is, when there are no treatment effects and treatments are randomized to subjects.

Bathke et al.^14–18 and Liu et al.¹⁹ proposed nonparametric versions of some parametric multivariate test statistics, also providing small sample approximations. The four rank-based tests they considered were of the ANOVA type, Wilks' Lambda type, Lawley-Hotelling type, and Bartlett-Nanda-Pillai type. Furthermore, recently Ellis et al.²⁰ and Burchett and Ellis²¹ developed an R package (npmv) where these inference procedures for samples of multivariate observations are implemented. Each of the four test statistics is also used for a multivariate permutation test, worked out by a random sample from the space of all possible permutations when the latter is too large to be fully exhausted. Since these procedures are based on (mid-)ranks, they represent suitable options for the problem at hand. The correct null hypothesis should be the equality of underlying distribution functions.

In the present paper, we also propose three permutation tests based on known test statistics following the nonparametric combination (NPC) methodology²² as a tool for obtaining multivariate results. In particular, we consider (1) a permutation test based on the Anderson-Darling test statistic with normalized distribution function,²³ (2) a permutation test based on Mann–Whitney test statistics with mid-ranks, and (3) a permutation test based on the traditional difference of sample means.

The following sections describe the permutation procedures and give an overview of the NPC methodology. A comparative simulation study is presented to assess the behaviour of the competing tests with data affected by nondetects, both under the null hypothesis and with regard to their power. We investigate the effect of censoring in different situations depending on the value assigned to non-detected observations (say ND). We also investigate the effect of different types of nondetected values with different scores. Finally, a recent study on exposure to allergens and the influence on IgE sensitization is considered as a genuine application example.

2 Competing permutation tests

In the present section, we review some permutation tests based on the R package npmv, and we introduce some new permutation solutions by extending the NPC theory. Formalizing the problem, let us consider two independent samples of K-variate ( $K \geq 2$ ) independent observation vectors $X_{ij} = (X_{ij}^{(1)}, \dots, X_{ij}^{(K)})$ with $i = 1, \dots, n_{j}$ , j = 1, 2, where n_j is the sample size of the j-th sample. Let $X = X_{1} \uplus X_{2}$ of size $N = n_{1} + n_{2}$ be the pooled sample.

We assume that $X_{ij} \sim F_{j}$ . That is, all observations in the j-th group follow the same K-variate distribution. The K variables may be dependent, and the dependence structure does not have to be specified. We are interested in testing the null hypothesis $H_{0} : F_{1} = F_{2}$ against the alternative $H_{1} : F_{1} \neq F_{2}$ . As permutation methods are conditional on the observed data, it is worth noting that under $H_{0} : F_{1} = F_{2}$ , the pooled data X are always a set of sufficient statistics for whatever underlying distribution F.

In order to clarify these ideas, Table 1 reports an extract of genuine data from allergic research,²⁴ where each row displays the IgE values related to five major indoor allergens as Der p 1 and Der p 2 originating from mite (columns Mite 1 and Mite 2), Fel d 1 originating from cat (column Cat), Can f 1 originating from dog (column Dog) and Alt a 1 originating from mold (column Mold) of each participant in the study. Each participant is either a pet owner (column Pet = 1) or not a pet owner (column Pet = 0). Note that these data are characterized by a large amount of zeros.

Table 1.

Example allergy data.

Pet	Mite 1	Mite 2	Cat	Dog	Mold
1	0	0	0	0	0
1	0	0	0	0	0
1	0	0.3	0	0	0
…	…	…	…	…	…
1	0	0	13.86	0	0
0	0	1.12	0	0	0
0	0	17.89	1.49	0	0
0	0	0	49.2	0	0
…	…	…	…	…	…
0	15.93	0	24.8	0	0

2.1 Multivariate tests from npmv package

Before elaborating on permutation tests for inference on multivariate data using the R package npmv, we first introduce some notation. Let $R_{ij}^{(k)}$ be the midrank of $X_{ij}^{(k)}$ among all N observations, and denote by R the K × N matrix containing the midranks for all variables and all observations. Let us define the matrix pairs $H_{1} = R (\bigoplus_{j = 1}^{2} \frac{1}{n_{j}} J_{n_{j}} - \frac{1}{N} J_{N}) R^{T}$ and $G_{1} = \frac{1}{N - 2} R (\bigoplus_{j = 1}^{2} P_{n_{j}}) R^{T}$ , as well as $H_{2} = R [(\bigoplus_{j = 1}^{2} \frac{1}{n_{j}} 1_{n_{j}}) P_{2} (\bigoplus_{j = 1}^{2} \frac{1}{n_{j}} 1_{n_{j}}^{T})] R^{T}$ and $G_{2} = \frac{1}{2} R (\bigoplus_{j = 1}^{2} \frac{1}{n_{j} (n_{j} - 1)} P_{n_{j}}) R^{T}$ . The pair $(H_{1}, G_{1})$ corresponds to a weighted means analysis, while the pair $(H_{2}, G_{2})$ uses unweighted means. In a balanced design with $n_{j} = n$ , j = 1, 2, we have $H_{1} = n H_{2}$ and $G_{1} = n G_{2}$ . Therefore, in a balanced design, both pairs lead to the same test statistic. Finally, we introduce the following four types of test statistics.

2.1.1 ANOVA type statistic

The ANOVA type statistic is defined as

T_{Anova} = tr (H_{2}) / tr (G_{2})

(1)
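As an illustration of equation (1), the following is a minimal sketch in plain Python, not the npmv implementation. It assumes that $P_{n}$ denotes the centering matrix $I_{n} - \frac{1}{n} J_{n}$ and $J_{n}$ the n × n matrix of ones (these are not defined explicitly above), in which case both traces reduce to sums of squared deviations of midranks. The function names `midranks` and `anova_type_statistic` are hypothetical.

```python
def midranks(values):
    """Return 1-based midranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def anova_type_statistic(sample1, sample2):
    """tr(H2) / tr(G2) for two samples given as lists of K-variate rows.

    Assumes P_n is the centering matrix, so that
      tr(H2) = sum over variables and groups of
               (group mean rank - unweighted average of group mean ranks)^2,
      tr(G2) = 1/2 * sum over variables and groups of
               within-group sum of squares / (n_j * (n_j - 1)).
    Raises ZeroDivisionError if there is no within-group rank variation.
    """
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    n_vars = len(pooled[0])
    tr_h2 = 0.0
    tr_g2 = 0.0
    for k in range(n_vars):
        r = midranks([row[k] for row in pooled])
        groups = (r[:n1], r[n1:])
        means = [sum(g) / len(g) for g in groups]
        unweighted = sum(means) / 2
        tr_h2 += sum((m - unweighted) ** 2 for m in means)
        tr_g2 += 0.5 * sum(
            sum((x - m) ** 2 for x in g) / (len(g) * (len(g) - 1))
            for g, m in zip(groups, means)
        )
    return tr_h2 / tr_g2
```

Because only the diagonals of $H_{2}$ and $G_{2}$ are needed, the statistic can be accumulated variable by variable without forming any matrix, which is also why it remains computable when the rank covariance matrix is singular.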

2.1.2 Wilks' Lambda type

Wilks' Lambda type statistic is defined as

T_{Λ} = \frac{det [(N - 2) \cdot G_{1}]}{det [(N - 2) \cdot G_{1} + H_{1}]}

(2)

2.1.3 Lawley Hotelling type

The Lawley Hotelling type statistic is calculated as

T_{LH} = tr [H_{1} ((N - 2) G_{1})^{- 1}]

(3)

2.1.4 Bartlett Nanda Pillai type

The Bartlett Nanda Pillai type statistic is defined as

T_{BNP} = tr \{H_{1} [H_{1} + (N - 2) G_{1}]^{- 1}\}

(4)

For details see Bathke et al.,¹⁶ Liu et al.¹⁹ and Ellis et al.²⁰

For each of these tests, the permutation distribution is computed by permuting the N data vectors and recalculating the multivariate test statistic each time. The corresponding p-value statistics are computed from the quantiles of the permutation distribution, that is, as the proportion of values of the permutation distribution that are greater than or equal to the observed value of the test statistic. Indeed, the so-called p-value statistic, considered as a transformation of a permutation test T, namely $λ_{T} = \Pr \{T^{*} \geq T^{O} \mid X\}$ , has α as its critical value at level α. That is, $λ_{T_{α}} = α$ , which is uniform for all suitable test statistics.²⁵
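The permutation scheme just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the npmv code: the function names, the default of 2000 random permutations, the seed argument, and the toy statistic `sum_sq_mean_diff` are all chosen here for illustration. Any multivariate statistic for which large values speak against the null hypothesis can be plugged in.

```python
import random


def permutation_p_value(sample1, sample2, statistic, n_perm=2000, seed=0):
    """Monte Carlo permutation p-value for a two-sample multivariate statistic.

    Whole observation vectors (rows) are permuted, so the dependence among
    the K components of each subject is left intact.
    """
    rng = random.Random(seed)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    observed = statistic(pooled[:n1], pooled[n1:])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reorders rows only, never values within a row
        if statistic(pooled[:n1], pooled[n1:]) >= observed:
            count += 1
    # proportion of permutation values greater than or equal to the observed one
    return count / n_perm


def sum_sq_mean_diff(g1, g2):
    """Toy statistic: sum over components of squared differences of means."""
    n_vars = len(g1[0])
    return sum(
        (sum(r[k] for r in g1) / len(g1) - sum(r[k] for r in g2) / len(g2)) ** 2
        for k in range(n_vars)
    )
```

A call such as `permutation_p_value(pet_rows, no_pet_rows, sum_sq_mean_diff)` would return the estimated p-value statistic, where `pet_rows` and `no_pet_rows` are hypothetical lists of K-variate observation vectors.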

2.2 NPC-based tests

Let us now define the multivariate permutation tests in the NPC framework.²² It is quite common in testing hypotheses that two or more aspects are of interest, and in such a case it is convenient to process data using a finite number of different partial tests (e.g. one for each aspect of interest). These partial tests, if jointly considered, can provide information on a general overall hypothesis that typically represents the objective of the simultaneous or multivariate testing problem. However, in the majority of real situations, independence between partial tests is difficult to be justified. Indeed, often partial tests are dependent in a way that is difficult to model explicitly, and to handle.

Furthermore, due to conditioning on the pooled data vector X and to its sufficiency property for the underlying distribution F under H₀, if g is any measurable function from $ℜ^{K} \to ℜ^{1}$ then, for every $t \in ℜ^{1}$ , $\Pr \{g (T_{1}^{*}, \dots, T_{K}^{*}) \leq t; F \mid X\} = \Pr \{g (T_{1}^{*}, \dots, T_{K}^{*}) \leq t \mid X\}$ . That is, the permutation null distribution of the combination of any set of K statistics does not depend on F (where F is a non-degenerate family of unknown distributions). In particular, it does not depend on its dependence parameters, provided that permutations are on individual data vectors. This is the main reason we decided to deal with that dependence in a nonparametric way by the NPC.^26–29 In this paper, we propose three types of permutation test statistics which we consider adequate for the problem at hand. As the first test statistic, we consider the Anderson–Darling test statistic with normalized distribution function.²³ The second test statistic is based on the Mann–Whitney test statistic computed using mid-ranks, and the third is based on the traditional difference of means.^30–33

Let us define the three test statistics:

2.2.1 Anderson–Darling

The Anderson–Darling test statistic with normalized distribution function is defined as

T_{AD}^{(k)} = \sum_{i = 1}^{N} \frac{{[F_{1}^{(k)} (X_{i}) - F_{2}^{(k)} (X_{i})]}^{2}}{F^{(k)} (X_{i}) (1 - F^{(k)} (X_{i}))}, k = 1, \dots, K

(5)

where

F_{1}^{(k)} (x) = [\# (X_{1}^{(k)} < x) + \frac{1}{2} \# (X_{1}^{(k)} = x)] / n_{1}, \quad F_{2}^{(k)} (x) = [\# (X_{2}^{(k)} < x) + \frac{1}{2} \# (X_{2}^{(k)} = x)] / n_{2}

and

F^{(k)} (x) = [\# (X^{(k)} < x) + \frac{1}{2} \# (X^{(k)} = x)] / N
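To make equation (5) concrete, here is a minimal sketch for a single variable k, written directly from the definitions of the normalized distribution functions above. The names `mid_cdf` and `anderson_darling` are hypothetical and the code is not taken from the authors' repository.

```python
def mid_cdf(sample, x):
    """Normalized empirical distribution function: [#(v < x) + #(v = x) / 2] / n."""
    less = sum(1 for v in sample if v < x)
    equal = sum(1 for v in sample if v == x)
    return (less + 0.5 * equal) / len(sample)


def anderson_darling(x1, x2):
    """Anderson-Darling type statistic of equation (5) for one variable.

    The sum runs over all N pooled observations. Each pooled point is
    counted in its own tie group, so the pooled mid-distribution value
    lies strictly between 0 and 1 and the denominator never vanishes,
    even when most observations are tied at the nondetect value.
    """
    pooled = list(x1) + list(x2)
    total = 0.0
    for x in pooled:
        f = mid_cdf(pooled, x)
        diff = mid_cdf(x1, x) - mid_cdf(x2, x)
        total += diff ** 2 / (f * (1.0 - f))
    return total
```

For instance, with `x1 = [0, 0]` and `x2 = [1, 1]` each of the four pooled points contributes 0.25 / (0.25 · 0.75) = 4/3, giving a statistic of 16/3.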

2.2.2 Mann–Whitney

The Mann–Whitney test statistic based on midranks is defined as

T_{MW}^{(k)} = max (\sum_{i = 1}^{n_{1}} R_{i 1}^{(k)} / n_{1}, \sum_{i = 1}^{n_{2}} R_{i 2}^{(k)} / n_{2}), k = 1, \dots, K

(6)
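Equation (6) can be illustrated with a short sketch for one variable. The helper `midranks` and the function name `mann_whitney_statistic` are hypothetical; the sketch only follows the definition above, with midranks taken over the pooled sample.

```python
def midranks(values):
    """Return 1-based midranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


def mann_whitney_statistic(x1, x2):
    """Larger of the two group means of pooled midranks, as in equation (6)."""
    r = midranks(list(x1) + list(x2))
    n1 = len(x1)
    mean1 = sum(r[:n1]) / n1
    mean2 = sum(r[n1:]) / len(x2)
    return max(mean1, mean2)
```

With `x1 = [0, 0, 0.3]` and `x2 = [0, 1.12, 17.89]`, the three zeros share midrank 2, the group mean ranks are 8/3 and 13/3, and the statistic equals 13/3. All nondetects are tied at the same midrank, so the statistic does not depend on which value below the detected observations is assigned to them.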

2.2.3 Difference of means

The difference of means test statistic is defined as

T_{DM}^{(k)} = abs (\frac{1}{n_{1}} \sum_{i = 1}^{n_{1}} X_{i 1}^{(k)} - \frac{1}{n_{2}} \sum_{i = 1}^{n_{2}} X_{i 2}^{(k)}), k = 1, \dots, K

(7)

Note that these test statistics have to be computed separately for each of the K components, but jointly (or synchronized) in the sense of being computed with just the same permutation of individual vectors so as to preserve sufficiency of pooled data X, and thus the underlying unknown dependence structure between the response variables, resulting in a K-dimensional vector of test statistics. Also in this case we compute the permutation distribution of each test by permuting the N data vectors and recalculating the test statistics each time and for each variable. Based on these distributions, we can compute a K-dimensional vector of p-value-statistics $λ = (λ^{(1)}, \dots, λ^{(K)})$ , considering the empirical quantiles.

In order to obtain a multivariate result, let us introduce the method of NPC of a finite number of dependent tests. Substantially, this approach corresponds to a method of analysis carried out in two successive phases, the first focusing on K partial aspects, and the second on their combination. Obviously, in order to preserve the sufficiency property of X, especially with respect to the underlying dependence structure among the variables, permutations must always be carried out on individual data vectors, so that all component variables must be jointly analyzed.

The NPC in one second-order test $T'' = ψ (λ^{(1)}, \dots, λ^{(K)})$ is achieved by a continuous, non-increasing, univariate, measurable, and non-degenerate real function $ψ : (0, 1)^{K} \to ℜ^{1}$ . For details about assumptions on partial tests and on desirable properties of combining functions, we refer to Pesarin and Salmaso,²² Pesarin et al.,³⁴ and Arboretti et al.^35,36 Let us refer to the combined observed value of the second-order test as $T''^{O} = ψ (λ^{(1)}, \dots, λ^{(K)})$ and to the r-th combined value of the vector statistics as $T''^{* r} = ψ (L^{* r (1)}, \dots, L^{* r (K)})$ where $L^{* r (k)} = [\frac{1}{2} + \sum_{b = 1}^{R} I (T^{* b (k)} \geq T^{* r (k)})] / (R + 1)$ is the empirical significance level function of $T^{(k)}$ at the r-th permutation, and R is the number of random permutations considered. Based on the distribution of $(T''^{O}, T''^{* 1}, \dots, T''^{* R})$ , the p-value-statistic of the combined test is obtained in the usual way: $λ'' = \sum_{r = 1}^{R} I (T''^{* r} \geq T''^{O}) / R$ . Combining functions ψ mostly used in practice are:

- the Fisher omnibus combining function defined as $T''_{F} = - 2 \cdot \sum_{k = 1}^{K} \log (λ^{(k)})$ ;

- the Liptak combining function defined as $T''_{L} = \sum_{k = 1}^{K} Φ^{- 1} (1 - λ^{(k)})$ ;

- the Tippett combining function defined as $T''_{T} = \max_{1 \leq k \leq K} (1 - λ^{(k)})$ .
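The two phases of the NPC procedure can be sketched as follows, using the difference of means of equation (7) as the partial statistic. This is a minimal sketch and not the code in the NPCnondetects repository: the function names, the defaults, and the choice to evaluate the observed partial p-values with the same significance level function (so that they stay strictly inside (0, 1), as the Fisher and Liptak functions require) are assumptions made here.

```python
import math
import random
from statistics import NormalDist


def abs_mean_diffs(rows, n1):
    """Vector of K partial statistics: |mean of group 1 - mean of group 2|."""
    g1, g2 = rows[:n1], rows[n1:]
    return [
        abs(sum(r[k] for r in g1) / len(g1) - sum(r[k] for r in g2) / len(g2))
        for k in range(len(rows[0]))
    ]


def significance_level(value, perm_values):
    """L = [1/2 + #(T*b >= value)] / (R + 1), always strictly inside (0, 1)."""
    hits = sum(1 for t in perm_values if t >= value)
    return (0.5 + hits) / (len(perm_values) + 1)


COMBINERS = {
    "fisher": lambda lam: -2.0 * sum(math.log(p) for p in lam),
    "liptak": lambda lam: sum(NormalDist().inv_cdf(1.0 - p) for p in lam),
    "tippett": lambda lam: max(1.0 - p for p in lam),
}


def npc_test(sample1, sample2, combine="fisher", n_perm=1000, seed=0):
    """Return (partial p-values, combined p-value) for two multivariate samples."""
    rng = random.Random(seed)
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    n_vars = len(pooled[0])
    observed = abs_mean_diffs(pooled, n1)

    # Phase 1: one row permutation per iteration, shared by all K partial tests.
    perm_stats = []
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_stats.append(abs_mean_diffs(pooled, n1))
    columns = [[s[k] for s in perm_stats] for k in range(n_vars)]

    # Phase 2: map statistics to significance levels and combine them.
    psi = COMBINERS[combine]
    partial = [significance_level(observed[k], columns[k]) for k in range(n_vars)]
    t_obs = psi(partial)
    count = 0
    for s in perm_stats:
        lam_r = [significance_level(s[k], columns[k]) for k in range(n_vars)]
        if psi(lam_r) >= t_obs:
            count += 1
    return partial, count / n_perm
```

Replacing `abs_mean_diffs` by a function returning the K Anderson–Darling or Mann–Whitney partial statistics would give the corresponding combined tests. The second phase costs on the order of K · R² comparisons in this naive form, which is acceptable for a few thousand permutations.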

3 Nondetects in allergy research

Exposure to indoor allergens is crucial for IgE sensitization and development of allergic symptoms. Exposure to allergens is a prerequisite for initiating an allergic sensitization leading to the production of allergen-specific IgE antibodies. However, there exist findings that show that the repeated exposure to allergens can lead to an immune response with an increase of the IgE antibody levels in the blood.³⁷ A study conducted in Austria²⁴ involved 501 adolescents from Salzburg, whose serum samples were analyzed for specific IgE to Der p 1 (Mite 1), Der p 2 (Mite 2), Fel d 1 (Cat), Can f 1 (Dog), and Alt a 1 (Mold) using a multiplex array ImmunoCAP ISAC.

One of the aims of the analysis was to assess whether there was a difference in the level of IgE antibodies between participants who had pets and those who did not have pets. As we can see from Table 2, the data are characterized by a large amount of zero or nondetect values.

Table 2.

Summary of nondetects in example data.

IgE	Nondetects Pets (%)	Nondetects No Pets (%)
Mite 1	83	88
Mite 2	83	79
Cat	80	88
Dog	96	100
Mold	97	99

Figure 1 shows the sample distributions in the two groups through violin plots of IgE values for five antibodies. As we can see from the solid dots which represent the sample means, in each variable, the data present very close means, but rather different distributions. Furthermore, some descriptive statistics of the data are shown in Table 3.

Figure 1.

Violin plot of IgE values for five allergens: comparison between participants with pet and no pet. Dot represents sample mean.

Table 3.

Main descriptive statistics of the example data.

	Mite 1	Mite 2	Cat	Dog	Mold
PET
Min:	0.000	0.000	0.000	0.000	0.000
1st Qu.:	0.000	0.000	0.000	0.000	0.000
Median:	0.000	0.000	0.000	0.000	0.000
Mean:	0.613	2.954	0.695	0.201	0.415
3rd Qu.:	0.000	0.000	0.000	0.000	0.000
Max:	32.880	111.52	18.58	27.310	93.75
NO PET
Min:	0.000	0.000	0.000	0.000	0.000
1st Qu.:	0.000	0.000	0.000	0.000	0.000
Median:	0.000	0.000	0.000	0.000	0.000
Mean:	0.833	3.214	1.026	0.000	0.166
3rd Qu.:	0.000	0.000	0.000	0.000	0.000
Max:	17.340	50.420	24.800	0.000	9.610

Table 4 shows the global p-values obtained applying different tests to the data, whereas Table 5 shows partial p-values, that is for individual variables, available for the NPC-based tests. We can see that the test based on differences of means fails in detecting any difference between the two groups, both partially as well as globally. This may indicate that in such an extreme censoring context, probably due to the too different error terms attached to non-detected data compared to detected ones, we need test statistics that are more robust. Global results show that only T_MW leads to a significant p-value at a significance level of 0.05, whereas the other tests, except T_DM, find evidence for an effect only at a significance level of 0.10. Looking at the partial results, we can see that T_AD identifies a difference between the two groups for Dog antibodies, whereas T_MW shows a difference for Mite 1 and Cat antibodies. Furthermore, note that with NPC-based tests, we can also perform directional tests, such as many one-sided instead of two-sided tests in order to investigate the direction of differences.

Table 4.

P-values of global tests: Anderson-Darling (T_AD), Mann–Whitney (T_MW), difference of means (T_DM), Anova type (T_Anova), Wilks' Lambda type ( $T_{Λ}$ ), Lawley Hotelling type (T_LH), and Bartlett Nanda Pillai type (T_BNP) using 10,000 permutations.

T_AD	0.069
T_MW	0.015
T_DM	0.462
T_Anova	0.081
$T_{Λ}$	0.081
T_LH	0.081
T_BNP	0.081

Table 5.

P-values of partial tests for NPC-based tests Anderson-Darling (T_AD), Mann–Whitney (T_MW), difference of means (T_DM) using 10,000 permutations.

	Mite 1	Mite 2	Cat	Dog	Mold
T_AD	0.220	0.159	0.122	0.048	0.496
T_MW	0.044	0.124	0.013	0.149	0.171
T_DM	0.432	0.805	0.287	0.114	0.742

4 Simulation study

In this section, we show the results of a simulation study aiming at comparing the seven permutation tests described in Section 2.

4.1 Assessing different values assigned to nondetects

In this first part of the simulation study, we investigate how the type I error rates of the compared tests are affected by censoring. Here we also try to assess if different values assigned to ND observations affect the results of the tests. It is common that when a measure falls below detection limit, it is assigned the value zero or the lowest detection limit (LDL). In this simulation study, we investigate the following situations: (a) ND fixed at 0; (b) ND fixed at LDL; (c) ND fixed at $LDL / 2$ ; (d) ND fixed at $LDL / 4$ . The general setting of the simulation study is the following:

- Data randomly generated by $Γ_{K} (2)$ , K = 3;

- Independent variables vs. correlated variables ( $ρ_{hk} = 0.4$ $\forall h \neq k$ with $h, k = 1, 2, 3$ );

- Equal sample sizes: $n_{1} = n_{2} = 12$ and 24;

- $X_{2} = δ X_{1}$ , δ = 1 (i.e. H₀);

- Fixed percentage of ND: 25%, 50%, 80%;

- $LDL_{25\%} = 0.96$ ; $LDL_{50\%} = 1.68$ and $LDL_{80\%} = 2.98$

We performed 2000 random permutations on each of 1000 MC simulations.

Note that for these settings, we kept the percentage of ND fixed, and for each sample size and correlation setting, we compared the seven tests in each situation (a)–(d). For example, when the percentage of ND was fixed at 25%, the lowest detection limit was 0.96. Therefore, values generated below 0.96 were set to (a) 0, (b) 0.96, (c) 0.48, (d) 0.24.
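The four substitution rules (a)–(d) amount to a one-line transformation of each simulated variable. The sketch below is illustrative only; the function name and the rule labels are hypothetical.

```python
def substitute_nondetects(values, ldl, rule="zero"):
    """Replace every value below the lowest detection limit (LDL).

    rule: "zero"  -> (a) ND fixed at 0
          "ldl"   -> (b) ND fixed at LDL
          "ldl/2" -> (c) ND fixed at LDL / 2
          "ldl/4" -> (d) ND fixed at LDL / 4
    """
    replacement = {
        "zero": 0.0,
        "ldl": ldl,
        "ldl/2": ldl / 2,
        "ldl/4": ldl / 4,
    }[rule]
    return [replacement if v < ldl else v for v in values]
```

For example, with the 25% setting, `substitute_nondetects([0.5, 0.96, 2.0], 0.96, "ldl/2")` returns `[0.48, 0.96, 2.0]`: only the value strictly below 0.96 is treated as a nondetect.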

Tables 6 and 7 show the sizes of the tests for different percentages of ND with independent and correlated variables, respectively. It is worth noting that for some simulated data sets, the tests $T_{Λ}$ , T_LH and T_BNP could not be computed due to singular covariance matrices, but this happened in less than 0.5% of cases for each simulation setting. Taking into account that the number of simulations was not very high, we can cautiously conclude that the nominal alpha level of 5% is reached by all tests both with small ( $n_{1} = n_{2} = 12$ ) and moderate ( $n_{1} = n_{2} = 24$ ) sample size.

Table 6.

Simulated sizes of the tests Anderson-Darling (T_AD), Mann–Whitney (T_MW), difference of means (T_DM), Anova type (T_Anova), Wilks' Lambda type ( $T_{Λ}$ ), Lawley Hotelling type (T_LH), and Bartlett Nanda Pillai type (T_BNP) with independent variables with percentage of ND fixed at 25%, 50%, and 80%.

	ND (25%)				ND (50%)				ND (80%)
	0	LDL	$LDL / 2$	$LDL / 4$	0	LDL	$LDL / 2$	$LDL / 4$	0	LDL	$LDL / 2$	$LDL / 4$
n = 12
T_AD	0.055	0.051	0.050	0.051	0.051	0.051	0.051	0.051	0.050	0.050	0.050	0.050
T_MW	0.050	0.052	0.046	0.050	0.049	0.049	0.049	0.049	0.048	0.048	0.048	0.048
T_DM	0.052	0.047	0.043	0.054	0.053	0.043	0.055	0.055	0.048	0.043	0.056	0.045
T_Anova	0.053	0.047	0.053	0.054	0.046	0.046	0.046	0.046	0.057	0.057	0.057	0.057
$T_{Λ}$	0.054	0.046	0.054	0.054	0.045	0.045	0.045	0.045	0.054	0.054	0.054	0.055
T_LH	0.054	0.046	0.054	0.054	0.045	0.045	0.045	0.045	0.055	0.055	0.055	0.055
T_BNP	0.054	0.046	0.054	0.054	0.045	0.045	0.045	0.045	0.055	0.055	0.055	0.055
n = 24
T_AD	0.047	0.055	0.047	0.052	0.042	0.042	0.042	0.044	0.047	0.047	0.047	0.047
T_MW	0.046	0.054	0.046	0.054	0.040	0.040	0.040	0.042	0.048	0.048	0.048	0.048
T_DM	0.045	0.050	0.051	0.052	0.045	0.047	0.045	0.047	0.048	0.048	0.052	0.05
T_Anova	0.045	0.055	0.042	0.051	0.049	0.049	0.049	0.049	0.042	0.042	0.042	0.042
$T_{Λ}$	0.045	0.055	0.042	0.051	0.049	0.049	0.049	0.049	0.042	0.042	0.042	0.042
T_LH	0.045	0.055	0.042	0.051	0.049	0.049	0.049	0.049	0.042	0.042	0.042	0.042
T_BNP	0.045	0.055	0.042	0.051	0.049	0.049	0.049	0.049	0.042	0.042	0.042	0.042

Table 7.

Size of the tests Anderson-Darling (T_AD), Mann-Whitney (T_MW), difference of means (T_DM), Anova type (T_Anova), Wilks' Lambda type ( $T_{Λ}$ ), Lawley Hotelling type (T_LH), and Bartlett Nanda Pillai type (T_BNP) for correlated variables with percentage of ND fixed at 25%, 50%, and 80%.

	ND (25%)				ND (50%)				ND (80%)
	0	LDL	$LDL / 2$	$LDL / 4$	0	LDL	$LDL / 2$	$LDL / 4$	0	LDL	$LDL / 2$	$LDL / 4$
n = 12
T_AD	0.034	0.052	0.053	0.040	0.047	0.047	0.047	0.047	0.045	0.045	0.045	0.045
T_MW	0.035	0.054	0.049	0.041	0.047	0.047	0.047	0.047	0.048	0.048	0.048	0.048
T_DM	0.045	0.053	0.047	0.044	0.047	0.053	0.049	0.049	0.048	0.050	0.049	0.047
T_Anova	0.036	0.035	0.048	0.054	0.044	0.044	0.044	0.044	0.051	0.051	0.051	0.051
$T_{Λ}$	0.036	0.035	0.048	0.055	0.044	0.044	0.044	0.044	0.048	0.048	0.048	0.048
T_LH	0.036	0.035	0.048	0.055	0.044	0.044	0.044	0.044	0.048	0.048	0.048	0.048
T_BNP	0.036	0.035	0.048	0.055	0.044	0.044	0.044	0.044	0.049	0.049	0.049	0.049
n = 24
T_AD	0.047	0.048	0.048	0.055	0.047	0.047	0.047	0.047	0.054	0.054	0.050	0.05
T_MW	0.048	0.047	0.053	0.054	0.049	0.049	0.049	0.049	0.044	0.044	0.044	0.044
T_DM	0.043	0.051	0.053	0.052	0.050	0.054	0.054	0.051	0.044	0.045	0.045	0.045
T_Anova	0.051	0.040	0.052	0.043	0.052	0.052	0.052	0.052	0.045	0.044	0.045	0.044
$T_{Λ}$	0.051	0.040	0.053	0.043	0.051	0.051	0.051	0.051	0.044	0.044	0.044	0.044
T_LH	0.051	0.040	0.053	0.043	0.051	0.051	0.051	0.051	0.044	0.044	0.044	0.044
T_BNP	0.051	0.040	0.053	0.043	0.051	0.051	0.051	0.051	0.044	0.044	0.044	0.044

In order to investigate the behaviour under the alternative hypothesis, we consider a multiplicative shift $δ = 1.5$ . Since from the previous results it emerges that the value assigned to ND does not seem to affect the performance of tests, for the power of the tests, we show only the results with ND set equal to 0. Figures 2 and 3 show the power of each test in each simulated scenario with small ( $n_{1} = n_{2} = 12$ ) and moderate ( $n_{1} = n_{2} = 24$ ) sample size, respectively. Results with independent and correlated variables are, respectively, in the left and right panel of each figure. For small sample sizes, a dashed line indicates a power of 50%, whereas with moderate sample sizes, we indicate with a dashed line a power of 80%.

Figure 2.

Rejection rates of the seven tests Anderson-Darling (T_AD), Mann-Whitney (T_MW), difference of means (T_DM), Anova type (T_Anova), Wilks' Lambda type ( $T_{Λ}$ ), Lawley Hotelling type (T_LH), and Bartlett Nanda Pillai type (T_BNP) under the alternative hypothesis for independent variables (left panel) and correlated variables (right panel) with small sample size ( $n_{1} = n_{2} = 12$ ) with percentage of ND fixed at 25%, 50% and 80% and ND value equal to 0. In each graphic, a dashed line indicates power equal to 50%.

Figure 3.

Rejection rates of the seven tests Anderson-Darling (T_AD), Mann-Whitney (T_MW), difference of means (T_DM), Anova type (T_Anova), Wilks' Lambda type ( $T_{Λ}$ ), Lawley Hotelling type (T_LH), and Bartlett Nanda Pillai type (T_BNP) under the alternative hypothesis for independent variables (left panel) and correlated variables (right panel) with moderate sample size ( $n_{1} = n_{2} = 24$ ) with percentage of ND fixed at 25%, 50% and 80% and ND value equal to 0. In each graphic, a dashed line indicates power equal to 80%.

In general, we can see that methods from npmv package behaved very similarly to each other in all situations whereas NPC-based methods performed quite similarly to each other and always better than methods from npmv package. Note that correlation among variables seemed to affect NPC-based tests by reducing the power of multivariate tests. This issue is common to most parametric and nonparametric multivariate tests with positively dependent components where Fisher's information is maximal when these components are independent (see Giancristofaro et al.³⁸). Nevertheless, NPC-based procedures here still presented the highest power also with moderately correlated ( $ρ_{hk} = 0.4$ ) variables.

4.2 Evaluating performance for increasing percentage of nondetects

Having assessed that the values assigned to ND do not affect the performance, we further investigate the general behaviour of the tests as the percentage of ND increases. In what follows, we consider the value of ND fixed at 0. We first continue to consider $Γ (2)$ as data distribution and $δ = 1.5$ . Figure 4(a) to (e) shows the rejection rates of the tests for increasing percentage of ND in case of independent variables ( $Σ = I_{K}$ , K = 3) and two types of variance/covariance matrices $Σ_{1} = (\begin{matrix} 1 & 0.4 & 0.4 \\ 0.4 & 1 & 0.4 \\ 0.4 & 0.4 & 1 \end{matrix})$ and $Σ_{2} = (\begin{matrix} 1 & 0.4 & - 0.7 \\ 0.4 & 1 & 0.25 \\ 0.7 & 0.25 & 1 \end{matrix})$ . Results confirm that NPC-based tests show the best performances for all percentages of ND considered (up to 99%). For relatively small sample size ( $n_{1} = n_{2} = 24$ ), when the percentages of ND were 95% and 99%, the proportion of times in which $T_{Λ}, T_{LH}$ and T_BNP could not be computed due to singular covariance matrices exceeded 2.5% and 5%, respectively. In Figure 4(d) we also show the curves for a larger sample size ( $n_{1} = n_{2} = 48$ ). Furthermore, Figure 4(e) shows the results under the null hypothesis: the nominal α-level is met for most of the considered ND percentages, except for the case of 99% of ND, where NPC-based tests become conservative whereas the tests from npmv become very liberal.

Figure 4.

Effects of increasing percentage of ND for different sample sizes and variance/covariance matrices. (a) Rejection rates for $n_{1} = n_{2} = 24, Σ = I_{3}$ ; (b) Rejection rates for $n_{1} = n_{2} = 24, Σ = Σ_{1}$ ; (c) Rejection rates for $n_{1} = n_{2} = 24, Σ = Σ_{2}$ ; (d) Rejection rates for $n_{1} = n_{2} = 48, Σ = Σ_{1}$ ; and (e) Size for $n_{1} = n_{2} = 48, Σ = Σ_{2}$ .

In order to consider a distribution which mostly reflects the extreme situation of our real dataset, we chose the Pareto distribution with shape parameter a = 1. As we can see from Figure 5(a), in this extreme situation, T_DM becomes ineffective for detecting differences as shown by the lower rejection rates. Figure 6 displays the rejection rates of the tests as the effect δ increases for a percentage of ND fixed at 85%. The curves of the tests from the npmv package coincide whereas the curves of NPC-based tests present higher rejection rates, except in this case for T_DM. In particular T_MW appears to perform best in this situation.

Figure 5.

Effects of increasing percentage of ND when data are generated from $Pareto_{3} (a = 1)$ and $Σ = Σ_{2}$ . (a) Rejection rates for $n_{1} = n_{2} = 50$ and (b) Size for $n_{1} = n_{2} = 50$ .

Figure 6.

Power of the tests with data from $Pareto_{3} (a = 1)$ and $Σ = Σ_{2}, n_{1} = n_{2} = 50$ and percentage of ND fixed at 85%.

5 Conclusions and future perspectives

In this work, we have addressed the problem of analyzing data including nondetects, that is, left-censored data where the censoring may often be due to insufficient precision of the measurement instrument when trying to detect small data values. For example, this is a very common feature of data originating from microarrays. We motivate the problem showing a genuine example from allergy research, where data were characterized by a large amount of zeroes or nondetect values. In the application example, we were interested in assessing whether being a pet owner impacted the IgE levels for different groups of antibodies. Several of the measured IgE values were zero (or too small to be detected), and this was useful information related to the problem.

Regarding two-sample permutation tests studied in the present paper, it emerged that T_DM may become ineffective in extreme situations such as when the data come from a distribution like $Pareto (a = 1)$ . In general, we found that NPC-type test statistics based on nonparametric effects, and in particular T_MW, performed best, both with simulated data and when applied to real data.

The R code implementing the NPC-based tests presented in this work is available in a public GitHub repository under the name NPCnondetects.

In a next step, similar to testing in the presence of non-ignorable missing data²² (pp.232–251), we plan to consider a multi-aspect test³⁹ approach. That is, for each endpoint two tests are performed separately but simultaneously, one for the proportion of zeroes or nondetects and one for the observed positive data, and these are combined using NPC. In this way, we can investigate the probability of zeros or nondetects, as well as the “visible” effects of the treatment (or pseudo-treatment). From preliminary simulations, we conjecture that this way of testing presents good operating characteristics both under the null hypothesis and in terms of power. Moreover, since $T_{DM}''$ behaves better than $T_{MW}''$ in most settings, while $T_{MW}''$ in turn behaves better in other settings, it would be of interest to investigate the behaviour of their NPC by considering $T_{C}^{\prime\prime\prime *} = T^{\prime\prime\prime}(T_{DM}^{\prime\prime *}, T_{MW}^{\prime\prime *})$. This is because their combination is normally ruled by the best of the two.
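The multi-aspect idea can be sketched for a single endpoint as follows. This Python example is only an illustration under stated assumptions: the two partial statistics (absolute difference in the proportion of zeros, absolute difference in the mean of the positive values), the use of Fisher's combining function, and the function names `partial_stats` and `multi_aspect_npc` are choices made here, not the procedure the authors plan to study.

```python
import numpy as np

def partial_stats(x, labels):
    """Two aspects for one endpoint (both choices are assumptions)."""
    g1, g2 = x[labels == 0], x[labels == 1]
    # Aspect 1: absolute difference in the proportion of zeros (nondetects).
    t_zero = abs(np.mean(g1 == 0) - np.mean(g2 == 0))
    # Aspect 2: absolute difference in the mean of the positive values;
    # a group without positive values contributes a mean of 0.
    p1, p2 = g1[g1 > 0], g2[g2 > 0]
    m1 = p1.mean() if p1.size else 0.0
    m2 = p2.mean() if p2.size else 0.0
    return np.array([t_zero, abs(m1 - m2)])

def multi_aspect_npc(x1, x2, n_perm=499, seed=0):
    """Permutation test combining both aspects with Fisher's function."""
    rng = np.random.default_rng(seed)
    x = np.concatenate([x1, x2]).astype(float)
    labels = np.repeat([0, 1], [len(x1), len(x2)])
    # Row 0 holds the observed statistics, rows 1..n_perm the permuted ones.
    # The same label permutation is used for both aspects, which preserves
    # their dependence.
    stats = np.empty((n_perm + 1, 2))
    stats[0] = partial_stats(x, labels)
    for b in range(1, n_perm + 1):
        stats[b] = partial_stats(x, rng.permutation(labels))
    # Partial p-value of every row: share of rows whose statistic is at
    # least as large (each row counts itself, so no p-value is zero).
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)
    # Fisher combination, large values being evidence against the null.
    combined = -2.0 * np.log(pvals).sum(axis=1)
    p_global = np.mean(combined >= combined[0])
    return pvals[0], p_global

# Toy data: group 1 has many zeros and small positives, group 2 the opposite.
x1 = np.array([0.0] * 40 + [1.0] * 10)
x2 = np.array([0.0] * 10 + [5.0] * 40)
partial_p, global_p = multi_aspect_npc(x1, x2)
print("partial p-values:", partial_p, "global p-value:", global_p)
```

Because the combined statistic is evaluated on the same permutations as the partial ones, the global p-value accounts for the dependence between the two aspects without modelling it explicitly.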

Footnotes

Acknowledgements

The authors would like to thank the Editor and Referees for useful comments and suggestions which helped improve the manuscript, and Professor Ludwig Hothorn for pointing out the problem and for several interesting discussions in this context.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research was supported by Austrian Science Fund (FWF) I 2697-N31 and UNIPD BIRD185315/18.

References

1. Aitchison. On the distribution of a positive random variable having a discrete probability mass at the origin. J Am Stat Assoc 1955; 50: 901–908.

2. Gehan. A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 1965; 52: 203–224.

3. Efron B. The two sample problem with censored data. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. Berkeley, US, 1967, pp.831–835.

4. Koziol and Jia. The concordance index C and the Mann-Whitney parameter Pr(X > Y) with randomly censored data. Biom J 2009; 51: 467–474.

5. Dobler D and Pauly M. Bootstrap- and permutation-based inference for the Mann-Whitney effect for right-censored and tied data. TEST 2017; 1–20. DOI: 10.1007/s11749-017-0565-z.

6. Zhou, Obuchowski and McClish. Statistical methods in diagnostic medicine. New York, NY: Wiley, 2002.

7. Shapiro. The interpretation of diagnostic tests. Stat Method Med Res 1999; 8: 113–134.

8. Lange and Brunner. Sensitivity, specificity and ROC-curves in multiple reader diagnostic trials-a unified, nonparametric approach. Stat Methodol 2012; 9: 490–500.

9. Schisterman, Reiser and Faraggi. ROC analysis for markers with mass at zero. Stat Med 2006; 25: 623–638.

10. Zou, Hall and Shapiro. Smooth non-parametric receiver operating characteristic (ROC) curves for continuous diagnostic tests. Stat Med 1997; 16: 2143–2156.

11. Faraggi and Reiser. Estimation of the area under the ROC curve. Stat Med 2002; 21: 3093–3106.

12. Farcomeni. A MANOVA test for multivariate lognormal observations with a spike at zero, with application to ecological niches of South Africa. Biom J 2016; 58: 320–330.

13. Thulin. Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects. Stat Med 2016; 35: 3623–3644.

14. Bathke and Harrar. Nonparametric methods in multivariate factorial designs for large number of factor levels. J Stat Plann Inf 2008; 138(3): 588–610.

15. Harrar and Bathke. Nonparametric methods for unbalanced multivariate data and many factor levels. J Multiv Anal 2008; 99(8): 1635–1664.

16. Bathke, Harrar and Madden. How to compare small multivariate samples using nonparametric tests. Comput Stat Dat An 2008; 52: 4951–4965.

17. Harrar and Bathke. A nonparametric version of the Bartlett-Nanda-Pillai multivariate test. Asymptotics, approximations, and applications. Am J Math Manage Sci 2008; 28(3–4): 309–335.

18. Bathke, Harrar and Ahmad. Some contributions to the analysis of multivariate data. Biom J 2009; 51: 285–303.

19. Liu, Bathke and Harrar. A nonparametric version of Wilks' lambda-Asymptotic results and small sample approximations. Stat Probabil Lett 2011; 81: 1502–1506.

20. Ellis, Burchett, Harrar, et al. Nonparametric inference for multivariate data: the R Package npmv. J Stat Softw 2017; 76: 1–18.

21. Burchett WW and Ellis AR. npmv: Nonparametric comparison of multivariate samples. R package version 2.4.0, https://CRAN.R-project.org/package=npmv (2017) (accessed 9 January 2017).

22. Pesarin and Salmaso. Permutation tests for complex data: theory, applications and software. Hoboken, NJ: John Wiley and Sons, 2010.

23. Ruymgaart. A unified approach to the asymptotic distribution theory of certain midrank statistics. In: Raoult (ed) Statistique non Parametrique Asymptotique. Berlin: Springer, 1980, pp.1–18.

24. Stemeseder, Schweidler, Doppler, et al. Exposure to indoor allergens in different residential settings and its influence on IgE sensitization in a geographically confined Austrian cohort. PloS One 2017; 12: e0168686.

25. Pesarin. Some elementary theory of permutation tests. Commun Stat Theory Meth 2015; 44: 4880–4892.

26. Pesarin and Salmaso. Finite-sample consistency of combination-based permutation tests with application to repeated measures designs. J Nonparametr Stat 2010; 22: 669–684.

27. Pesarin and Salmaso. On the weak consistency of permutation tests. Commun Stat Simul Comput 2013; 42: 1368–1379.

28. Salmaso. Combination-based permutation tests: equipower property and power behaviour in presence of correlation. Commun Stat Theory Meth 2015; 44: 5225–5239.

29. Pesarin and Salmaso. A review and some new results on permutation testing for multivariate problems. Stat Comput 2012; 22: 639–646.

30. Arboretti Giancristofaro, Bonnini and Pesarin. A permutation approach for testing heterogeneity in two-sample categorical variables. Stat Comput 2009; 19: 209–216.

31. Arboretti Giancristofaro and Bonnini. Moment-based multivariate permutation tests for ordinal categorical data. J Nonparametr Stat 2008; 20: 383–393.

32. Arboretti Giancristofaro and Bonnini. Some new results on univariate and multivariate permutation tests for ordinal categorical variables under restricted alternatives. Stat Method Appl 2009; 18: 221–236.

33. Arboretti Giancristofaro, Pesarin and Salmaso. Permutation Anderson-Darling type and moment based test statistics for univariate ordered categorical data. Commun Stat Simul Comput 2007; 36: 139–150.

34. Pesarin, Salmaso, Carrozzo, et al. Union-intersection permutation solution for two-sample equivalence testing. Stat Comput 2016; 26: 693–701.

35. Arboretti, Carrozzo, Pesarin, et al. A multivariate extension of union-intersection permutation solution for two-sample testing. J Stat Theory Pract 2017; 11: 436–448.

36. Arboretti, Carrozzo, Pesarin, et al. Testing for equivalence: an intersection-union permutation solution. Stat Biopharm Res 2018; 10: 130–138.

37. von Mutius. The microbial environment and its influence on asthma prevention in early life. J Allergy Clin Immunol 2016; 137: 680–689.

38. Giancristofaro, Carrozzo, Cichi, et al. The influence of the dependency structure in combination-based multivariate permutation tests in case of ordered categorical responses. In: Melas VB, Mignani S, Monari P and Salmaso L (eds) Topics in statistical simulation. New York, NY: Springer, 2014, pp.229–238.

39. Salmaso and Solari. Multiple aspect testing for case-control designs. Metrika 2005; 62: 331–340.