2016-10-01T20:33:16Z
http://oai.repec.org/oai.php
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:861-872
2013-08-16
RePEc:oup:biomet
article
Nonparametric estimation of the probability of illness in the illness-death model under cross-sectional sampling
Cross-sectional sampling is an attractive design that saves resources but results in biased data. For proper inference, one should first discover the bias function and then weight observations appropriately. We consider cross-sectioning of the illness-death model with the aim of estimating the probability of visiting the illness state before death. We develop simple consistent and asymptotically normal estimators under various assumptions on the model and data collection and, in particular, compare designs with and without a follow-up. These designs are common in surveillance of hospital acquired infections, but estimators currently in use do not properly correct the bias. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
861
872
http://hdl.handle.net/10.1093/biomet/asp046
application/pdf
Access to full text is restricted to subscribers.
M. Mandel
R. Fluss
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:975-982
2013-08-16
RePEc:oup:biomet
article
Maximum likelihood estimation using composite likelihoods for closed exponential families
In certain multivariate problems the full probability density has an awkward normalizing constant, but the conditional and/or marginal distributions may be much more tractable. In this paper we investigate the use of composite likelihoods instead of the full likelihood. For closed exponential families, both are shown to be maximized by the same parameter values for any number of observations. Examples include log-linear models and multivariate normal models. In other cases the parameter estimate obtained by maximizing a composite likelihood can be viewed as an approximation to the full maximum likelihood estimate. An application is given to an example in directional data based on a bivariate von Mises distribution. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
975
982
http://hdl.handle.net/10.1093/biomet/asp056
application/pdf
Access to full text is restricted to subscribers.
Kanti V. Mardia
John T. Kent
Gareth Hughes
Charles C. Taylor
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:887-901
2013-08-16
RePEc:oup:biomet
article
Marginal hazards model for case-cohort studies with multiple disease outcomes
Case-cohort study designs are widely used to reduce the cost of large cohort studies while achieving the same goals, especially when the disease rate is low. A key advantage of the case-cohort study design is its capacity to use the same subcohort for several diseases or for several subtypes of disease. In order to compare the effect of a risk factor on different types of diseases, times to different events need to be modelled simultaneously. Valid statistical methods that take the correlations among the outcomes from the same subject into account need to be developed. To this end, we consider marginal proportional hazards regression models for case-cohort studies with multiple disease outcomes. We also consider generalized case-cohort designs that do not require sampling all the cases, which is more realistic for multiple disease outcomes. We propose an estimating equation approach for parameter estimation with two different types of weights. Consistency and asymptotic normality of the proposed estimators are established. In simulation studies, the large-sample approximation works well in small samples. The proposed methods are applied to the Busselton Health Study. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
887
901
http://hdl.handle.net/10.1093/biomet/asp059
application/pdf
Access to full text is restricted to subscribers.
S. Kang
J. Cai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1019-1023
2013-08-16
RePEc:oup:biomet
article
A note on a conjectured sharpness principle for probabilistic forecasting with calibration
This note proves a weak sharpness principle as conjectured by Gneiting et al. (2007) in connection with probabilistic forecasting subject to calibration constraints. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1019
1023
http://hdl.handle.net/10.1093/biomet/asp054
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:835-845
2013-08-16
RePEc:oup:biomet
article
Bayesian lasso regression
The lasso estimate for linear regression corresponds to a posterior mode when independent, double-exponential prior distributions are placed on the regression coefficients. This paper introduces new aspects of the broader Bayesian treatment of lasso regression. A direct characterization of the regression coefficients' posterior distribution is provided, and computation and inference under this characterization are shown to be straightforward. Emphasis is placed on point estimation using the posterior mean, which facilitates prediction of future observations via the posterior predictive distribution. It is shown that the standard lasso prediction method does not necessarily agree with model-based, Bayesian predictions. A new Gibbs sampler for Bayesian lasso regression is introduced. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
835
845
http://hdl.handle.net/10.1093/biomet/asp047
application/pdf
Access to full text is restricted to subscribers.
Chris Hans
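To make the hierarchy behind the Bayesian lasso concrete, here is a minimal sketch of the widely used Park & Casella (2008) Gibbs sampler. It is NOT the new sampler introduced in this paper. Several things are assumptions of the sketch: the lasso parameter `lam` is held fixed, y is centred, X has no intercept column, and the function name and simulated data are hypothetical. The posterior mean of the draws gives the point estimate emphasised in the abstract.

```python
import numpy as np


def bayesian_lasso_gibbs(X, y, lam, n_iter=2000, burn_in=500, seed=0):
    """Park & Casella (2008)-style Gibbs sampler with the lasso parameter lam held fixed.

    Hierarchy: y ~ N(X beta, sigma2 I); beta_j | tau2_j, sigma2 ~ N(0, sigma2 tau2_j);
    tau2_j ~ Exp(rate = lam^2 / 2); p(sigma2) proportional to 1 / sigma2.
    Integrating out tau2_j gives the double-exponential prior on beta_j.
    Assumes y is centred and X has no intercept column.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    sigma2, inv_tau2 = 1.0, np.ones(p)
    draws = []
    for it in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sigma2 A^{-1}) with A = X'X + diag(1 / tau2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)  # A = L L', so solve(L', z) has covariance A^{-1}
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sigma2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # sigma2 | rest ~ Inverse-Gamma(shape, scale), drawn as scale / Gamma(shape, 1)
        resid = y - X @ beta
        shape = (n - 1) / 2 + p / 2
        scale = resid @ resid / 2 + beta @ (inv_tau2 * beta) / 2
        sigma2 = scale / rng.gamma(shape)
        # 1 / tau2_j | rest ~ Inverse-Gaussian(mean = sqrt(lam^2 sigma2 / beta_j^2), shape = lam^2)
        inv_tau2 = rng.wald(np.sqrt(lam**2 * sigma2 / beta**2), lam**2)
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)


# Usage on simulated data (illustrative values only)
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + 0.5 * rng.standard_normal(100)
y = y - y.mean()
draws = bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=1500, burn_in=500)
posterior_mean = draws.mean(axis=0)  # point estimate via the posterior mean
```

The Cholesky factor is used instead of an explicit inverse, so each beta draw stays numerically stable even when some `1 / tau2_j` become large.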
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:847-860
2013-08-16
RePEc:oup:biomet
article
Generalized fiducial inference for wavelet regression
We apply Fisher's fiducial idea to wavelet regression, first developing a general methodology for handling model selection problems within the fiducial framework. We propose fiducial-based methods for wavelet curve estimation and the construction of pointwise confidence intervals. We show that these confidence intervals have asymptotically correct coverage. Simulations demonstrate that they possess promising empirical properties. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
847
860
http://hdl.handle.net/10.1093/biomet/asp050
application/pdf
Access to full text is restricted to subscribers.
Jan Hannig
Thomas C. M. Lee
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1024-1024
2013-08-16
RePEc:oup:biomet
article
'Generalized method of moments estimation for linear regression with clustered failure time data'
4
2009
96
Biometrika
1024
1024
http://hdl.handle.net/10.1093/biomet/asp061
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:917-932
2013-08-16
RePEc:oup:biomet
article
A unified approach to linearization variance estimation from survey data after imputation for item nonresponse
Variance estimation after imputation is an important practical problem in survey sampling. When deterministic imputation or stochastic imputation is used, we show that the variance of the imputed estimator can be consistently estimated by a unifying linearize and reverse approach. We provide some applications of the approach to regression imputation, fractional categorical imputation, multiple imputation and composite imputation. Results from a simulation study, under a factorial structure for the sampling, response and imputation mechanisms, show that the proposed linearization variance estimator performs well in terms of relative bias, assuming a missing at random response mechanism. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
917
932
http://hdl.handle.net/10.1093/biomet/asp041
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:971-974
2013-08-16
RePEc:oup:biomet
article
Construction of orthogonal Latin hypercube designs
Latin hypercube designs have found wide application. Such designs guarantee uniform samples for the marginal distribution of each input variable. We propose a method for constructing orthogonal Latin hypercube designs in which all the linear terms are orthogonal not only to each other, but also to the quadratic terms. This construction method is convenient and flexible, and the resulting designs can accommodate many more factors than can existing ones. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/asp058
application/pdf
Access to full text is restricted to subscribers.
Fasheng Sun
Min-Qian Liu
Dennis K. J. Lin
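The orthogonality property described in this abstract can be checked mechanically: linear columns orthogonal to each other and to the quadratic columns. The sketch below does not implement the Sun, Liu and Lin construction. It only tests a given design, assuming levels centred at zero. Under that assumption each linear column has mean zero, so raw inner products are proportional to covariances. The 5-run, 2-factor design `D` is a hypothetical, hand-checked illustration, not an output of the paper's method.

```python
from itertools import combinations


def is_latin_hypercube(design):
    """True if every column is a permutation of the same n distinct levels."""
    n = len(design)
    levels = sorted(row[0] for row in design)
    if len(set(levels)) != n:
        return False
    return all(sorted(row[j] for row in design) == levels
               for j in range(len(design[0])))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_second_order_orthogonal(design):
    """Linear columns orthogonal to each other and to every quadratic column.

    Assumes levels centred at zero, so inner products are proportional to covariances.
    """
    cols = [list(c) for c in zip(*design)]
    quads = [[x * x for x in c] for c in cols]
    linear_ok = all(dot(u, v) == 0 for u, v in combinations(cols, 2))
    quad_ok = all(dot(c, q) == 0 for c in cols for q in quads)
    return linear_ok and quad_ok


# Hand-checked 5-run, 2-factor example with levels -2..2 (illustrative only)
D = [(-2, -1), (-1, 2), (0, 0), (1, -2), (2, 1)]
print(is_latin_hypercube(D), is_second_order_orthogonal(D))  # True True
```

For instance, the diagonal design `[(-2, -2), (-1, -1), (0, 0), (1, 1), (2, 2)]` is a Latin hypercube but fails the check, because its two linear columns are perfectly correlated.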
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:983-990
2013-08-16
RePEc:oup:biomet
article
Adaptive approximate Bayesian computation
Sequential techniques can enhance the efficiency of the approximate Bayesian computation algorithm, as in Sisson et al.'s (2007) partial rejection control version. While this method is based upon the theoretical works of Del Moral et al. (2006), the application to approximate Bayesian computation results in a bias in the approximation to the posterior. An alternative version based on genuine importance sampling arguments bypasses this difficulty, in connection with the population Monte Carlo method of Cappé et al. (2004), and it includes an automatic scaling of the forward kernel. When applied to a population genetics example, it compares favourably with two other versions of the approximate algorithm. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
983
990
http://hdl.handle.net/10.1093/biomet/asp052
application/pdf
Access to full text is restricted to subscribers.
Mark A. Beaumont
Jean-Marie Cornuet
Jean-Michel Marin
Christian P. Robert
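The importance-sampling version of sequential approximate Bayesian computation can be sketched on a toy problem. The sketch estimates the mean theta of N(theta, 1) data under a uniform prior. Particles are resampled from the previous population and perturbed with a Gaussian kernel whose variance is twice the weighted empirical variance. They are then reweighted by prior density over mixture proposal density. The toy model, the sample-mean summary statistic, the tolerance schedule and all names are assumptions made for illustration. This is not the paper's population genetics example.

```python
import numpy as np


def abc_pmc(y_obs, epsilons, n_particles=500, prior=(-10.0, 10.0), seed=0):
    """ABC population Monte Carlo for theta in a toy model y_i ~ N(theta, 1).

    Summary statistic: sample mean. Prior: Uniform(prior). epsilons: decreasing
    tolerances, one per generation. Returns the final particles and weights.
    """
    rng = np.random.default_rng(seed)
    lo, hi = prior
    n, s_obs = len(y_obs), np.mean(y_obs)

    def close_enough(theta, eps):
        # Simulate a data set at theta and compare summary statistics
        return abs(rng.normal(theta, 1.0, size=n).mean() - s_obs) <= eps

    # Generation 0: plain rejection sampling from the prior
    theta = np.empty(n_particles)
    for i in range(n_particles):
        cand = rng.uniform(lo, hi)
        while not close_enough(cand, epsilons[0]):
            cand = rng.uniform(lo, hi)
        theta[i] = cand
    weights = np.full(n_particles, 1.0 / n_particles)

    for eps in epsilons[1:]:
        # Gaussian kernel with variance twice the weighted empirical variance
        centre = np.sum(weights * theta)
        tau = float(np.sqrt(2.0 * np.sum(weights * (theta - centre) ** 2)))
        new_theta = np.empty(n_particles)
        for i in range(n_particles):
            while True:
                cand = rng.normal(rng.choice(theta, p=weights), tau)
                if lo <= cand <= hi and close_enough(cand, eps):
                    break
            new_theta[i] = cand
        # Importance weight: prior density over mixture proposal density; the uniform
        # prior and the Gaussian normalizing constant cancel after normalization.
        kernel = np.exp(-0.5 * ((new_theta[:, None] - theta[None, :]) / tau) ** 2)
        new_weights = 1.0 / (kernel @ weights)
        theta, weights = new_theta, new_weights / new_weights.sum()
    return theta, weights


# Usage on simulated data (illustrative values only)
rng = np.random.default_rng(42)
y = rng.normal(1.5, 1.0, size=50)
particles, w = abc_pmc(y, epsilons=[1.0, 0.5, 0.25, 0.1])
posterior_mean = np.sum(w * particles)
```

The weights in this sketch are what correct the posterior approximation. Dropping them and treating the accepted particles as equally weighted would ignore the fact that they were proposed from the previous population rather than from the prior.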
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:873-886
2013-08-16
RePEc:oup:biomet
article
Nonparametric estimation for right-censored length-biased data: a pseudo-partial likelihood approach
To estimate the lifetime distribution of right-censored length-biased data, we propose a pseudo-partial likelihood approach that allows us to derive two nonparametric estimators. With its closed-form estimators and explicit limiting variances, this approach retains the simplicity of conditional analysis, and has only a small efficiency loss compared with the unconditional analysis. Under some regularity conditions, we show that the two estimators are uniformly consistent and converge weakly to Gaussian processes. A simulation study demonstrates that the proposed estimators have satisfactory finite-sample performance. Application to an Alzheimer's disease study is reported. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
873
886
http://hdl.handle.net/10.1093/biomet/asp064
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:821-834
2013-08-16
RePEc:oup:biomet
article
Bayesian analysis of matrix normal graphical models
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
821
834
http://hdl.handle.net/10.1093/biomet/asp049
application/pdf
Access to full text is restricted to subscribers.
Hao Wang
Mike West
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:781-792
2013-08-16
RePEc:oup:biomet
article
A new look at time series of counts
This paper proposes a simple new model for stationary time series of integer counts. Previous work has focused on thinning methods and classical time series autoregressive moving-average difference equations; in contrast, our methods use a renewal process to generate a correlated sequence of Bernoulli trials. By superpositioning independent copies of such processes, stationary series with binomial, Poisson, geometric or any other discrete marginal distribution can be readily constructed. The model class proposed is parsimonious, non-Markov and readily generates series with either short- or long-memory autocovariances. The model can be fitted with linear prediction techniques for stationary series. As an example, a stationary series with binomial marginal distributions is fitted to the number of rainy days in 210 consecutive weeks at Key West, Florida. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
781
792
http://hdl.handle.net/10.1093/biomet/asp057
application/pdf
Access to full text is restricted to subscribers.
Yunwei Cui
Robert Lund
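The renewal construction described in this abstract can be simulated directly. A 0/1 series marks the renewal times of a discrete-time renewal process, and superposing M independent copies gives counts with a Binomial(M, 1/E[L]) marginal. Stationarity comes from drawing the first renewal time from the equilibrium distribution P(L0 = k) = P(L >= k) / E[L], the standard renewal-theory choice and an assumption of this sketch. The lifetime pmf below is hypothetical and was not fitted to the Key West rainfall data. M = 7 merely mirrors counting rainy days per week.

```python
import numpy as np


def renewal_bernoulli(pmf, T, rng):
    """Stationary 0/1 sequence with X_t = 1 if a renewal occurs at time t (t = 1..T).

    pmf[k-1] = P(L = k) for lifetimes L on {1, 2, ...}. The first renewal time is
    drawn from P(L0 = k) = P(L >= k) / E[L], so that P(X_t = 1) = 1 / E[L] for all t.
    """
    pmf = np.asarray(pmf, dtype=float)
    support = np.arange(1, len(pmf) + 1)
    mean_life = np.sum(support * pmf)
    tail = pmf[::-1].cumsum()[::-1]  # P(L >= k) for k = 1..len(pmf)
    x = np.zeros(T, dtype=int)
    t = rng.choice(support, p=tail / mean_life)  # 1-based time of the first renewal
    while t <= T:
        x[t - 1] = 1
        t += rng.choice(support, p=pmf)
    return x


def binomial_count_series(pmf, M, T, seed=0):
    """Superpose M independent stationary copies; marginally Binomial(M, 1 / E[L])."""
    rng = np.random.default_rng(seed)
    return sum(renewal_bernoulli(pmf, T, rng) for _ in range(M))


# Hypothetical lifetime pmf on {1, 2, 3}: E[L] = 0.2 + 0.6 + 1.5 = 2.3
counts = binomial_count_series([0.2, 0.3, 0.5], M=7, T=10000, seed=3)
```

Because each copy is a renewal indicator rather than an autoregression, the dependence between successive counts is governed entirely by the lifetime distribution. A heavy-tailed lifetime pmf would be the route to long-memory behaviour.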
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:998-1004
2013-08-16
RePEc:oup:biomet
article
A note on the variance of doubly-robust G-estimators
A recursive variance calculation is derived for doubly-robust G-estimators for dynamic treatment regimes in a multi-interval setting. Treatment decision parameters are not assumed to be shared across treatment intervals; this independence of parameters permits sequential estimation of the G-estimators' variance when G-estimation is performed in a sequential fashion. The recursive variance calculation is both natural and computationally feasible. This development can easily be adapted to other complex estimating procedures that require nuisance parameter estimation. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
998
1004
http://hdl.handle.net/10.1093/biomet/asp043
application/pdf
Access to full text is restricted to subscribers.
E. E. M. Moodie
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:945-956
2013-08-16
RePEc:oup:biomet
article
Sliced space-filling designs
We propose an approach to constructing a new type of design, a sliced space-filling design, intended for computer experiments with qualitative and quantitative factors. The approach starts with constructing a Latin hypercube design based on a special orthogonal array for the quantitative factors and then partitions the design into groups corresponding to different level combinations of the qualitative factors. The points in each group have good space-filling properties. Some illustrative examples are given. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
945
956
http://hdl.handle.net/10.1093/biomet/asp044
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:903-915
2013-08-16
RePEc:oup:biomet
article
Tests and confidence intervals for secondary endpoints in sequential clinical trials
In a sequential clinical trial whose stopping rule depends on the primary endpoint, inference on secondary endpoints is an important long-standing problem. Ignoring the possibility of early stopping based on the primary endpoint may result in substantial bias. To address this problem, a commonly used approach is to develop bias correction by estimating the bias in the case of bivariate normal outcomes and appealing to joint asymptotic normality of the statistics associated with the primary and secondary endpoints. We propose herein a new approach that uses resampling and a novel ordering scheme in the sample space of sequential statistics observed up to a stopping time. This approach is shown to provide accurate inference in complex clinical trials, including time-sequential trials with survival endpoints and covariates. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
903
915
http://hdl.handle.net/10.1093/biomet/asp063
application/pdf
Access to full text is restricted to subscribers.
Tze Leung Lai
Mei-Chiung Shih
Zheng Su
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:957-970
2013-08-16
RePEc:oup:biomet
article
Nested Latin hypercube designs
We propose an approach to constructing nested Latin hypercube designs. Such designs are useful for conducting multiple computer experiments with different levels of accuracy. A nested Latin hypercube design with two layers is defined to be a special Latin hypercube design that contains a smaller Latin hypercube design as a subset. Our method is easy to implement and can accommodate any number of factors. We also extend this method to construct nested Latin hypercube designs with more than two layers. Illustrative examples are given. Some statistical properties of the constructed designs are derived. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
957
970
http://hdl.handle.net/10.1093/biomet/asp045
application/pdf
Access to full text is restricted to subscribers.
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:1012-1018
2013-08-16
RePEc:oup:biomet
article
A note on adaptive Bonferroni and Holm procedures under dependence
Hochberg & Benjamini (1990) first presented adaptive procedures for controlling the familywise error rate. However, until now, it has not been proved that these procedures control the familywise error rate. We introduce a simplified version of Hochberg & Benjamini's adaptive Bonferroni and Holm procedures. Assuming a conditional dependence model, we prove that the former procedure controls the familywise error rate in finite samples while the latter controls it approximately. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
1012
1018
http://hdl.handle.net/10.1093/biomet/asp048
application/pdf
Access to full text is restricted to subscribers.
Wenge Guo
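The idea behind adaptive Bonferroni and Holm procedures is a plug-in: an estimate of the number m0 of true null hypotheses replaces m in the critical values. The sketch below shows that idea with a hypothetical Storey-type estimator of m0. That estimator is neither Hochberg & Benjamini's estimator nor the simplified version studied in this note. The adaptive Holm critical value alpha / min(m0, m - i + 1) is likewise an assumed form, and nothing here carries the note's error-rate guarantees under dependence.

```python
import math


def estimate_m0(pvals, lam=0.5):
    """Hypothetical Storey-type estimate of the number of true nulls, clipped to [1, m]."""
    m = len(pvals)
    est = (sum(p > lam for p in pvals) + 1) / (1 - lam)
    return min(m, max(1, math.ceil(est)))


def bonferroni(pvals, alpha, m0=None):
    """Reject H_i when p_i <= alpha / m0; m0 = m gives the ordinary procedure."""
    m0 = len(pvals) if m0 is None else m0
    return [p <= alpha / m0 for p in pvals]


def holm(pvals, alpha, m0=None):
    """Step-down: the i-th smallest p-value is compared with alpha / min(m0, m - i + 1)."""
    m = len(pvals)
    m0 = m if m0 is None else m0
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank = i - 1 in 1-based notation
        if pvals[i] <= alpha / min(m0, m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


p = [0.001, 0.008, 0.012, 0.02, 0.03, 0.2, 0.7, 0.9]
m0_hat = estimate_m0(p)  # (2 + 1) / 0.5 = 6
print(sum(holm(p, 0.05)), sum(holm(p, 0.05, m0_hat)))  # 1 2
```

With these illustrative p-values, the plug-in estimate 6 < 8 loosens the first critical values from 0.05/8 and 0.05/7 to 0.05/6. That is enough to add a second rejection.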
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:793-804
2013-08-16
RePEc:oup:biomet
article
Bias reduction in exponential family nonlinear models
In Firth (1993, Biometrika) it was shown how the leading term in the asymptotic bias of the maximum likelihood estimator is removed by adjusting the score vector, and that in canonical-link generalized linear models the method is equivalent to maximizing a penalized likelihood that is easily implemented via iterative adjustment of the data. Here a more general family of bias-reducing adjustments is developed for a broad class of univariate and multivariate generalized nonlinear models. The resulting formulae for the adjusted score vector are computationally convenient, and in univariate models they directly suggest implementation through an iterative scheme of data adjustment. For generalized linear models a necessary and sufficient condition is given for the existence of a penalized likelihood interpretation of the method. An illustrative application to the Goodman row-column association model shows how the computational simplicity and statistical benefits of bias reduction extend beyond generalized linear models. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
793
804
http://hdl.handle.net/10.1093/biomet/asp055
application/pdf
Access to full text is restricted to subscribers.
Ioannis Kosmidis
David Firth
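The canonical-link case mentioned at the start of this abstract can be sketched for logistic regression. Firth's adjusted score adds h_i (1/2 - p_i) to each residual, where h_i are the hat-matrix diagonals. The sketch solves the adjusted score with Fisher-scoring steps. The step cap is an ad hoc safeguard assumed here. The general bias-reducing adjustments for nonlinear models developed in the paper are not implemented.

```python
import numpy as np


def firth_logistic(X, y, n_iter=100, tol=1e-10, max_step=5.0):
    """Bias-reduced logistic regression via Firth's (1993) adjusted score.

    Adjusted score: U*(beta) = X' (y - p + h * (1/2 - p)), where h is the diagonal of
    W^{1/2} X (X'WX)^{-1} X' W^{1/2} and W = diag(p (1 - p)).
    Iterates beta <- beta + (X'WX)^{-1} U*(beta), with each step capped at max_step.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))  # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonals
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.clip(info_inv @ score, -max_step, max_step)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Completely separated toy data: the ordinary MLE of the slope diverges
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta_br = firth_logistic(X, y)  # finite intercept (0 by symmetry) and slope
```

The separated example shows one practical benefit of bias reduction. In canonical-link models the adjustment corresponds to a Jeffreys-prior penalty, and that penalty keeps the estimate finite where maximum likelihood fails.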
oai:RePEc:oup:biomet:v:96:y:2009:i:4:p:991-997
2013-08-16
RePEc:oup:biomet
article
Semiparametric methods for evaluating risk prediction markers in case-control studies
The performance of a well-calibrated risk model for a binary disease outcome can be characterized by the population distribution of risk and displayed with the predictiveness curve. Better performance is characterized by a wider distribution of risk, since this corresponds to better risk stratification in the sense that more subjects are identified at low and high risk for the disease outcome. Although methods have been developed to estimate predictiveness curves from cohort studies, most studies to evaluate novel risk prediction markers employ case-control designs. Here we develop semiparametric methods that accommodate case-control data. The semiparametric methods are flexible, and naturally generalize methods previously developed for cohort data. Applications to prostate cancer risk prediction markers illustrate the methods. Copyright 2009, Oxford University Press.
4
2009
96
Biometrika
991
997
http://hdl.handle.net/10.1093/biomet/asp040
application/pdf
Access to full text is restricted to subscribers.
Ying Huang
Margaret Sullivan Pepe
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:571-586
2013-11-29
RePEc:oup:biomet
article
Statistics of orthogonal axial frames
An orthogonal axial frame is a set of orthonormal unit vectors which are known only up to sign. Such frames arise in crystallography and seismology and as principal axes of multivariate data or of some physical tensors. We develop methods for analysing data of this form. A test of uniformity is given. Parametric models for orthogonal axial frames are presented and tests of location are considered. A brief illustrative example involving earthquakes is given. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
571
586
http://hdl.handle.net/10.1093/biomet/ast017
application/pdf
Access to full text is restricted to subscribers.
R. Arnold
P. E. Jupp
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:778-780
2013-11-29
RePEc:oup:biomet
article
Convergence of Luo and Tsai's iterative algorithm for estimation in proportional likelihood ratio models
Luo & Tsai (2012, Biometrika) introduced the proportional likelihood ratio model. They proposed an iterative algorithm for the estimation of the baseline distribution function but did not establish its convergence. Here we provide a proof of convergence. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
778
780
http://hdl.handle.net/10.1093/biomet/ast019
application/pdf
Access to full text is restricted to subscribers.
O. Davidov
G. Iliopoulos
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:695-708
2013-11-29
RePEc:oup:biomet
article
More efficient estimators for case-cohort studies
The case-cohort study design, used to reduce costs in large cohort studies, involves a random sample of the entire cohort, called the subcohort, augmented with subjects having the disease of interest but not in the subcohort sample. When several diseases are of interest, multiple case-cohort studies may be conducted using the same subcohort, with each disease analysed separately, ignoring the additional exposure measurements collected on subjects with the other diseases. This is not an efficient use of the data, and in this paper we propose more efficient estimators. We consider both joint and separate analyses for the multiple diseases. We propose an estimating equation approach with a new weight function, and we establish the consistency and asymptotic normality of the resulting estimator. Simulation studies show that the proposed methods using all available information lead to gains in efficiency. We apply our proposed method to data from the Busselton Health Study. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
695
708
http://hdl.handle.net/10.1093/biomet/ast018
application/pdf
Access to full text is restricted to subscribers.
S. Kim
J. Cai
W. Lu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:727-740
2013-11-29
RePEc:oup:biomet
article
Nonparametric estimation of the mean function for recurrent event data with missing event category
Recurrent event data frequently arise in longitudinal studies when study subjects possibly experience more than one event during the observation period. Often, such recurrent events can be categorized. However, part of the categorization may be missing due to technical difficulties. If the event types are missing completely at random, then a complete case analysis may provide consistent estimates of regression parameters in certain regression models, but estimates of the baseline event rates are generally biased. Previous work on nonparametric estimation of these rates has utilized parametric missingness models. In this paper, we develop fully nonparametric methods in which the missingness mechanism is completely unspecified. Consistency and asymptotic normality of the nonparametric estimators of the mean event functions accommodate nonparametric estimators of the event category probabilities, which converge more slowly than the parametric rate. Plug-in variance estimators are provided and perform well in simulation studies, where complete case estimators may exhibit large biases and parametric estimators generally have a larger mean squared error when the model is misspecified. The proposed methods are applied to data from a cystic fibrosis registry. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
727
740
http://hdl.handle.net/10.1093/biomet/ast016
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Jianwen Cai
Jason P. Fine
Huichuan J. Lai
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:655-670
2013-11-29
RePEc:oup:biomet
article
High-dimensional semiparametric bigraphical models
In multivariate analysis, a Gaussian bigraphical model is commonly used for modelling matrix-valued data. In this paper, we propose a semiparametric extension of the Gaussian bigraphical model, called the nonparanormal bigraphical model. A projected nonparametric rank-based regularization approach is employed to estimate sparse precision matrices and produce graphs under a penalized likelihood framework. Theoretically, our semiparametric procedure achieves the parametric rates of convergence for both matrix estimation and graph recovery. Empirically, our approach outperforms the parametric Gaussian model for non-Gaussian data and is competitive with its parametric counterpart for Gaussian data. Extensions to the categorical bigraphical model and the missing data problem are discussed. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
655
670
http://hdl.handle.net/10.1093/biomet/ast009
application/pdf
Access to full text is restricted to subscribers.
Yang Ning
Han Liu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:671-680
2013-11-29
RePEc:oup:biomet
article
Inverse probability weighting with error-prone covariates
Inverse probability-weighted estimators are widely used in applications where data are missing due to nonresponse or censoring and in the estimation of causal effects from observational studies. Current estimators rely on ignorability assumptions, under which response indicators or treatment assignment and outcomes are conditionally independent given observed covariates that are assumed to be measured without error. However, measurement error is common for the variables collected in many applications. For example, in studies of educational interventions, student achievement as measured by standardized tests is almost always used as the key covariate for removing hidden biases, but standardized test scores may have substantial measurement errors. We provide several expressions for a weighting function that can yield a consistent estimator for population means using incomplete data and covariates measured with error. We propose a method to estimate the weighting function from data. The results of a simulation study show that the estimator is consistent and has no bias and small variance. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
671
680
http://hdl.handle.net/10.1093/biomet/ast022
application/pdf
Access to full text is restricted to subscribers.
Daniel F. McCaffrey
J. R. Lockwood
Claude M. Setodji
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:587-606
2013-11-29
RePEc:oup:biomet
article
Automatic declustering of rare events
The analysis of events with low probability but disastrous impact entails understanding how they cluster in time. We present an automatic three-step procedure for identifying clusters, estimating the cluster size distribution and constructing confidence intervals for the extremal index, which measures the degree of clustering of rare events. The third step combines empirical likelihood and parametric likelihood approaches. Simulations show that our new procedure performs very well for finite samples and outperforms previous methods in constructing confidence intervals for the extremal index when there is clustering in the data, as well as in estimating probabilities for small clusters. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
587
606
http://hdl.handle.net/10.1093/biomet/ast013
application/pdf
Access to full text is restricted to subscribers.
C. Y. Robert
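The link between clusters of exceedances and the extremal index can be illustrated with the classical runs declustering scheme. A new cluster starts when at least `run_length` consecutive non-exceedances separate two exceedances. The extremal index is then estimated as the number of clusters divided by the number of exceedances. This is a well-known baseline, not the paper's automatic three-step procedure. The threshold and run length are user choices here, and the series is a hypothetical toy example.

```python
def runs_declustering(x, threshold, run_length):
    """Group the indices of exceedances of `threshold` into clusters.

    A new cluster starts when at least `run_length` consecutive non-exceedances
    separate two exceedances.
    """
    clusters, current, gap = [], [], run_length
    for t, value in enumerate(x):
        if value > threshold:
            if gap >= run_length and current:
                clusters.append(current)
                current = []
            current.append(t)
            gap = 0
        else:
            gap += 1
    if current:
        clusters.append(current)
    return clusters


def runs_extremal_index(x, threshold, run_length):
    """Runs estimator: number of clusters / number of exceedances."""
    clusters = runs_declustering(x, threshold, run_length)
    n_exceed = sum(len(c) for c in clusters)
    return len(clusters) / n_exceed if n_exceed else float("nan")


series = [0, 5, 6, 0, 0, 0, 7, 0, 0, 8, 9, 9, 0]
print(runs_declustering(series, 4, 2))    # [[1, 2], [6], [9, 10, 11]]
print(runs_extremal_index(series, 4, 2))  # 0.5
```

The estimate depends on the run length. With `run_length=3` the second and third clusters merge, and the estimate drops to 2/6. This sensitivity to tuning choices is one motivation for automatic declustering.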
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:555-569
2013-11-29
RePEc:oup:biomet
article
A unified approach to robust estimation in finite population sampling
We argue that the conditional bias associated with a sample unit can be a useful measure of influence in finite population sampling. We use the conditional bias to derive robust estimators that are obtained by downweighting the most influential sample units. Under the model-based approach to inference, our proposed robust estimator is closely related to the well-known estimator of Chambers (1986). Under the design-based approach, it possesses the desirable feature of being applicable with most sampling designs used in practice. For stratified simple random sampling, it is essentially equivalent to the estimator of Kokic & Bell (1994). The proposed robust estimator depends on a tuning constant. In this paper, we propose a method for determining the tuning constant and show that the resulting estimator is consistent. Results from a simulation study suggest that our approach improves the efficiency of standard nonrobust estimators when the population contains units that may be influential if selected in the sample. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
555
569
http://hdl.handle.net/10.1093/biomet/ast010
application/pdf
Access to full text is restricted to subscribers.
J.-F. Beaumont
D. Haziza
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:771-777
2013-11-29
RePEc:oup:biomet
article
Species sampling models: consistency for the number of species
This paper considers species sampling models using constructions that arise from Bayesian nonparametric prior distributions. A discrete random measure, used to generate a species sampling model, can have either a countably infinite number of atoms, which has been the emphasis in the recent literature, or a finite number of atoms K, while allowing K to be assigned a prior probability distribution on the positive integers. It is the latter class of model we consider here, due to the interpretation of K as the number of species. We demonstrate the consistency of the posterior distribution of K as the sample size increases. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
771
777
http://hdl.handle.net/10.1093/biomet/ast006
application/pdf
Access to full text is restricted to subscribers.
P. G. Bissiri
A. Ongaro
S. G. Walker
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:681-694
2013-11-29
RePEc:oup:biomet
article
Robust estimation of optimal dynamic treatment regimes for sequential treatment decisions
A dynamic treatment regime is a list of sequential decision rules for assigning treatment based on a patient's history. Q- and A-learning are two main approaches for estimating the optimal regime, i.e., that yielding the most beneficial outcome in the patient population, using data from a clinical trial or observational study. Q-learning requires postulated regression models for the outcome, while A-learning involves models for that part of the outcome regression representing treatment contrasts and for treatment assignment. We propose an alternative to Q- and A-learning that maximizes a doubly robust augmented inverse probability weighted estimator for population mean outcome over a restricted class of regimes. Simulations demonstrate the method's performance and robustness to model misspecification, which is a key concern. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
681
694
http://hdl.handle.net/10.1093/biomet/ast014
application/pdf
Access to full text is restricted to subscribers.
Baqun Zhang
Anastasios A. Tsiatis
Eric B. Laber
Marie Davidian
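The core idea of maximizing a doubly robust augmented inverse probability weighted value estimate over a restricted regime class can be sketched in a single-stage setting. The paper itself addresses sequential decisions. Everything below is an assumption made for illustration: one decision point, simulated randomized-trial data with known propensity 0.5, a least-squares outcome model, and the hypothetical class of threshold regimes g(x) = 1{x > eta} searched on a grid. In this simulation the true value is 1 + phi(eta), where phi is the standard normal density, so it is maximized at eta = 0, with value about 1.399.

```python
import numpy as np


def aipw_value(y, a, x, eta, propensity, outcome_model):
    """AIPW estimate of the mean outcome under the regime g(x) = 1{x > eta}.

    propensity(x) -> P(A = 1 | X = x); outcome_model(x, a) -> fitted E[Y | X = x, A = a].
    The estimate remains consistent if either of the two models is correct.
    """
    g = (x > eta).astype(int)
    c = (a == g).astype(float)               # 1 if the observed treatment follows the regime
    pi1 = propensity(x)
    pi_c = np.where(g == 1, pi1, 1.0 - pi1)  # P(A = g(X) | X)
    m_g = outcome_model(x, g)
    return np.mean(c * y / pi_c - (c - pi_c) / pi_c * m_g)


# Simulated single-stage randomized trial (illustrative values only)
rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)
a = rng.binomial(1, 0.5, size=n)             # P(A = 1) = 0.5 by design
y = 1 + x + a * x + rng.standard_normal(n)   # treatment helps only when x > 0

# Outcome regression by least squares on (1, x, a, a x)
design = np.column_stack([np.ones(n), x, a, a * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)


def outcome_model(xx, aa):
    return coef[0] + coef[1] * xx + coef[2] * aa + coef[3] * aa * xx


def propensity(xx):
    return np.full_like(xx, 0.5)


# Maximize the estimated value over the restricted class of threshold regimes
grid = np.linspace(-1.0, 1.0, 9)
values = [aipw_value(y, a, x, eta, propensity, outcome_model) for eta in grid]
best_eta = grid[int(np.argmax(values))]
```

Replacing `outcome_model` with a deliberately wrong model, for example one that always returns zero, should still give a value estimate near 1.399 at eta = 0, because the propensity model is correct here. That is the robustness to misspecification the abstract refers to.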
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:539-553
2013-11-29
RePEc:oup:biomet
article
A general modelling framework for multivariate disease mapping
This paper deals with multivariate disease mapping. We propose a novel framework that encompasses most of the models already proposed. Our framework starts with a simple identity, reformulating Kronecker products of covariance matrices as simple matrix products. This formula is computationally convenient, and its generalizations reproduce most of the proposals in the disease mapping literature. Use of the identity leads to a flexible, general and computationally convenient modelling framework, making it possible to combine spatial dependence structures and different relationships between diseases with limited effort. Moreover, as the proposed modelling framework covers most of the Gaussian Markov random field-based multivariate disease mapping models in the literature, it allows comparison of all these models in a common context, thus helping us to understand them better. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/ast023
application/pdf
Access to full text is restricted to subscribers.
Miguel A. Martinez-Beneito
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:757-7632013-11-29RePEc:oup:biomet
article
Adjusted regression estimation for time-to-event data with differential measurement error
Data with differential measurement error plausibly arise in epidemiology and biomedical studies but have rarely been dealt with explicitly, especially for time-to-event data. We propose an estimating equation correction method in semiparametric censored linear regression to deal with differential measurement error in time-to-event data with validation samples. The method does not require explicit modelling of the error-prone covariates and leads to unbiased estimation. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
757
763
http://hdl.handle.net/10.1093/biomet/ast007
application/pdf
Access to full text is restricted to subscribers.
Menggang Yu
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:764-7702013-11-29RePEc:oup:biomet
article
Survival analysis without survival data: connecting length-biased and case-control data
We show that relative mean survival parameters of a semiparametric log-linear model can be estimated using covariate data from an incident sample and a prevalent sample, even when there is no prospective follow-up to collect any survival data. Estimation is based on an induced semiparametric density ratio model for covariates from the two samples, and it shares the same structure as a logistic regression model for case-control data. Likelihood inference coincides with well-established methods for case-control data. We show two further related results. First, estimation of interaction parameters in a survival model can be performed using covariate information only from a prevalent sample, analogous to a case-only analysis. Second, propensity score and conditional exposure effect parameters on survival can be estimated using only covariate data collected from incident and prevalent samples. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
764
770
http://hdl.handle.net/10.1093/biomet/ast008
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:3:p:709-7262013-11-29RePEc:oup:biomet
article
Robust analysis of semiparametric renewal process models
A rate model is proposed for a modulated renewal process comprising a single long sequence, where the covariate process may not capture the dependencies in the sequence as in standard intensity models. We consider partial likelihood-based inferences under a semiparametric multiplicative rate model, which has been widely studied in the context of independent and identically distributed data. Under an intensity model, gap times in a single long sequence may be used naively in the partial likelihood, with variance estimation utilizing the observed information matrix. Under a rate model, the gap times cannot be treated as independent and studying the partial likelihood is much more challenging. We employ a mixing condition in the application of limit theory for stationary sequences to obtain consistency and asymptotic normality. The estimator's variance is quite complicated, owing to the unknown dependence structure of the gap times. We adapt block bootstrapping and cluster variance estimators to the partial likelihood. Simulation studies and an analysis of a semiparametric extension of a popular model for neural spike train data demonstrate the practical utility of the rate approach in comparison with the intensity approach. Copyright 2013, Oxford University Press.
3
2013
100
Biometrika
709
726
http://hdl.handle.net/10.1093/biomet/ast011
application/pdf
Access to full text is restricted to subscribers.
Feng-Chang Lin
Young K. Truong
Jason P. Fine
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:899-9142013-01-01RePEc:oup:biomet
article
Simultaneous supervised clustering and feature selection over a graph
In this article, we propose a regression method for simultaneous supervised clustering and feature selection over a given undirected graph, where homogeneous groups or clusters are estimated as well as informative predictors, with each predictor corresponding to one node in the graph and a connecting path indicating a priori possible grouping among the corresponding predictors. The method seeks a parsimonious model with high predictive power through identifying and collapsing homogeneous groups of regression coefficients. To address computational challenges, we present an efficient algorithm integrating the augmented Lagrange multipliers, coordinate descent and difference convex methods. We prove that the proposed method not only identifies the true homogeneous groups and informative features consistently but also leads to accurate parameter estimation. A gene network dataset is analysed to demonstrate that the method can make a difference by exploring dependency structures among the genes. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
899
914
http://hdl.handle.net/10.1093/biomet/ass038
application/pdf
Access to full text is restricted to subscribers.
Xiaotong Shen
Hsin-Cheng Huang
Wei Pan
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:945-9582013-01-01RePEc:oup:biomet
article
Penalized balanced sampling
Linear mixed models cover a wide range of statistical methods, which have found many uses in the estimation for complex surveys. The purpose of this work is to consider methods by which linear mixed models may be used at the design stage of a survey to incorporate available auxiliary information. This paper reviews the ideas of balanced sampling and the cube algorithm, and proposes an implementation of the latter by which penalized balanced samples can be selected. Such samples can reduce or eliminate the need for linear mixed model weight adjustments, a result demonstrated theoretically and via simulation. Horvitz--Thompson estimators for such samples will be highly efficient for any responses well approximated by a linear mixed model in the auxiliary information. In Monte Carlo experiments using nonparametric and temporal linear mixed models, the strategy of penalized balanced sampling with Horvitz--Thompson estimation dominates a variety of standard strategies. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
945
958
http://hdl.handle.net/10.1093/biomet/ass033
application/pdf
Access to full text is restricted to subscribers.
F. J. Breidt
G. Chauvet
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:915-9282013-01-01RePEc:oup:biomet
article
On the sparsity of signals in a random sample
This article proposes a method of moments technique for estimating the sparsity of signals in a random sample. This involves estimating the largest eigenvalue of a large Hermitian trigonometric matrix under mild conditions. As illustration, the method is applied to two well-known problems. The first focuses on the sparsity of a large covariance matrix and the second investigates the sparsity of a sequence of signals observed with stationary, weakly dependent noise. Simulation shows that the proposed estimators can have significantly smaller mean absolute errors than their main competitors. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
915
928
http://hdl.handle.net/10.1093/biomet/ass039
application/pdf
Access to full text is restricted to subscribers.
Binyan Jiang
Wei-Liem Loh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:799-8112013-01-01RePEc:oup:biomet
article
Choosing trajectory and data type when classifying functional data
In some problems involving functional data, it is desired to undertake prediction or classification before the full trajectory of a function is observed. In such cases, it is often preferable to suffer somewhat greater error in return for making a decision relatively early. The prediction and classification problems can be treated similarly, using mean squared prediction error, or classification error, respectively, as the means for quantifying performance, so in this paper we focus principally on classification. We introduce a method for determining when an early decision can reasonably be made, using only part of the trajectory, and we show how to use the method to choose among data types. Our approach is fully nonparametric, and no specific model is required. Properties of the error rate are studied as a function of time and data type. The effectiveness of the proposed method is illustrated in both theoretical and numerical terms. The classification referred to in this paper would be termed supervised classification in machine learning, to distinguish it from unsupervised classification, or clustering. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
799
811
http://hdl.handle.net/10.1093/biomet/ass011
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Tapabrata Maiti
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:865-8772013-01-01RePEc:oup:biomet
article
A two-stage dimension-reduction method for transformed responses and its applications
Researchers in the biological sciences nowadays often encounter the curse of dimensionality. To tackle this, sufficient dimension reduction aims to estimate the central subspace, in which all the necessary information supplied by the covariates regarding the response of interest is contained. Subsequent statistical analysis can then be made in a lower-dimensional space while preserving relevant information. Many studies are concerned with the transformed response rather than the original one, but they may have different central subspaces. When estimating the central subspace of the transformed response, direct methods will be inefficient. In this article, we propose a more efficient two-stage estimator of the central subspace of a transformed response. This approach is extended to censored responses and is applied to combining multiple biomarkers. Simulation studies and data examples support the superiority of the procedure. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
865
877
http://hdl.handle.net/10.1093/biomet/ass042
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:775-7862013-01-01RePEc:oup:biomet
article
Classification based on a permanental process with cyclic approximation
We introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only 2--3 parameters for the covariance function. The classification criterion involves a permanental ratio for which an approximation using a polynomial-time cyclic expansion is proposed. The approximation is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. An application to DNA microarray analysis indicates that the cyclic approximation is effective even for high-dimensional data. It can employ feature variables in an efficient way to reduce the prediction error significantly. This is critical when the true classification relies on nonreducible high-dimensional features. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
775
786
http://hdl.handle.net/10.1093/biomet/ass047
application/pdf
Access to full text is restricted to subscribers.
J. Yang
K. Miescke
P. McCullagh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:929-9442013-01-01RePEc:oup:biomet
article
Two-stage testing procedures with independent filtering for genome-wide gene-environment interaction
Several two-stage multiple testing procedures have been proposed to detect gene-environment interaction in genome-wide association studies. In this article, we elucidate general conditions that are required for validity and power of these procedures, and we propose extensions of two-stage procedures using the case-only estimator of gene-treatment interaction in randomized clinical trials. We develop a unified estimating equation approach to proving asymptotic independence between a filtering statistic and an interaction test statistic in a range of situations, including marginal association and interaction in a generalized linear model with a canonical link. We assess the performance of various two-stage procedures in simulations and in genetic studies from Women's Health Initiative clinical trials. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
929
944
http://hdl.handle.net/10.1093/biomet/ass044
application/pdf
Access to full text is restricted to subscribers.
James Y. Dai
Charles Kooperberg
Michael Leblanc
Ross L. Prentice
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:981-9882013-01-01RePEc:oup:biomet
article
Finite population estimators in stochastic search variable selection
Monte Carlo algorithms are commonly used to identify a set of models for Bayesian model selection or model averaging. Because empirical frequencies of models are often zero or one in high-dimensional problems, posterior probabilities calculated from the observed marginal likelihoods, renormalized over the sampled models, are often employed. Such estimates are the only recourse in several newer stochastic search algorithms. In this paper, we prove that renormalization of posterior probabilities over the set of sampled models generally leads to bias that may dominate mean squared error. Viewing the model space as a finite population, we propose a new estimator based on a ratio of Horvitz--Thompson estimators that incorporates observed marginal likelihoods, but is approximately unbiased. This is shown to lead to a reduction in mean squared error compared to the empirical or renormalized estimators, with little increase in computational cost. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
981
988
http://hdl.handle.net/10.1093/biomet/ass040
application/pdf
Access to full text is restricted to subscribers.
Merlise A. Clyde
Joyee Ghosh
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:995-10002013-01-01RePEc:oup:biomet
article
Proportional mean residual life model for right-censored length-biased data
To study disease association with risk factors in epidemiologic studies, cross-sectional sampling is often more focused and less costly for recruiting study subjects who have already experienced initiating events. For time-to-event outcome, however, such a sampling strategy may be length biased. Coupled with censoring, analysis of length-biased data can be quite challenging, due to induced informative censoring in which the survival time and censoring time are correlated through a common backward recurrence time. We propose to use the proportional mean residual life model of Oakes & Dasu (Biometrika 77, 409--10, 1990) for analysis of censored length-biased survival data. Several nonstandard data structures, including censoring of onset time and cross-sectional data without follow-up, can also be handled by the proposed methodology. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
995
1000
http://hdl.handle.net/10.1093/biomet/ass049
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
Ying Qing Chen
Chong-Zhi Di
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:879-8982013-01-01RePEc:oup:biomet
article
Scaled sparse linear regression
Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
879
898
http://hdl.handle.net/10.1093/biomet/ass043
application/pdf
Access to full text is restricted to subscribers.
Tingni Sun
Cun-Hui Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:763-7742013-01-01RePEc:oup:biomet
article
Testing one hypothesis twice in observational studies
In a matched observational study of treatment effects, a sensitivity analysis asks about the magnitude of the departure from random assignment that would need to be present to alter the conclusions of an analysis that assumes that matching for measured covariates removes all bias. The reported degree of sensitivity to unmeasured biases depends on both the process that generated the data and the chosen methods of analysis, so a poor choice of method may lead to an exaggerated report of sensitivity to bias. This suggests the possibility of performing more than one analysis with a correction for multiple inference, say testing one null hypothesis using two or three different tests. In theory and in an example, it is shown that, in large samples, the gains from testing twice will often be large, because testing twice has the larger of the two design sensitivities of the component tests, and the losses due to correcting for two tests will often be small, because two tests of one hypothesis will typically be highly correlated, so a correction for multiple testing that takes this into account will be small. An illustration uses data from the U.S. National Health and Nutrition Examination Survey concerning lead in the blood of cigarette smokers. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
763
774
http://hdl.handle.net/10.1093/biomet/ass032
application/pdf
Access to full text is restricted to subscribers.
P. R. Rosenbaum
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:973-9802013-01-01RePEc:oup:biomet
article
Statistical properties of an early stopping rule for resampling-based multiple testing
Resampling-based methods for multiple hypothesis testing often lead to long run times when the number of tests is large. This paper presents a simple rule that substantially reduces computation by allowing resampling to terminate early on a subset of tests. We prove that the method has a low probability of obtaining a set of rejected hypotheses different from those rejected without early stopping, and obtain error bounds for multiple hypothesis testing. Simulation shows that our approach saves more computation than other available procedures. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
973
980
http://hdl.handle.net/10.1093/biomet/ass051
application/pdf
Access to full text is restricted to subscribers.
Hui Jiang
Julia Salzman
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:1001-10072013-01-01RePEc:oup:biomet
article
An efficient empirical likelihood approach for estimating equations with missing data
We explore the use of estimating equations for efficient statistical inference in the presence of missing data. We propose a semiparametric efficient empirical likelihood approach, and show that the empirical likelihood ratio statistic and its profile counterpart asymptotically follow central chi-square distributions when evaluated at the true parameter. The theoretical properties and practical performance of our approach are demonstrated through numerical simulations and data analysis. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
1001
1007
http://hdl.handle.net/10.1093/biomet/ass045
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Yongsong Qin
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:851-8642013-01-01RePEc:oup:biomet
article
Bidirectional discrimination with application to data visualization
Linear classifiers are very popular, but can have limitations when classes have distinct subpopulations. General nonlinear kernel classifiers are very flexible, but do not give clear interpretations and may not be efficient in high dimensions. We propose the bidirectional discrimination classification method, which generalizes linear classifiers to two or more hyperplanes. This new family of classification methods gives much of the flexibility of a general nonlinear classifier while maintaining the interpretability, and much of the parsimony, of linear classifiers. They provide a new visualization tool for high-dimensional, low-sample-size data. Although the idea is generally applicable, we focus on the generalization of the support vector machine and distance-weighted discrimination methods. The performance and usefulness of the proposed method are assessed using asymptotics and demonstrated through analysis of simulated and real data. Our method leads to better classification performance in high-dimensional situations where subclusters are present in the data. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
851
864
http://hdl.handle.net/10.1093/biomet/ass029
application/pdf
Access to full text is restricted to subscribers.
Hanwen Huang
Yufeng Liu
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:959-9722013-01-01RePEc:oup:biomet
article
Bootstrap confidence bands for sojourn distributions in multistate semi-Markov models with right censoring
Transient semi-Markov processes have traditionally been used to describe the transitions of a patient through the various states of a multistate survival model. A survival distribution in this context is a sojourn through the states until passage to a fatal absorbing state or certain endpoint states. Using complete sojourn data, this paper shows how such survival distributions and associated hazard functions can be estimated nonparametrically and also how nonparametric bootstrap pointwise confidence bands can be constructed for them when patients are subject to independent right censoring from each state during the sojourn. Limitations to the estimability of such survival distributions that result from random censoring with bounded support are clarified. The methods are applicable to any sort of sojourn through any finite state process of arbitrary complexity involving feedback into previously occupied states. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
959
972
http://hdl.handle.net/10.1093/biomet/ass036
application/pdf
Access to full text is restricted to subscribers.
Ronald W. Butler
Douglas A. Bronson
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:813-8322013-01-01RePEc:oup:biomet
article
Dispersion operators and resistant second-order functional data analysis
Inferences related to the second-order properties of functional data, as expressed by covariance structure, can become unreliable when the data are non-Gaussian or contain unusual observations. In the functional setting, it is often difficult to identify atypical observations, as their distinguishing characteristics can be manifold but subtle. In this paper, we introduce the notion of a dispersion operator, investigate its use in probing the second-order structure of functional data, and develop a test for comparing the second-order characteristics of two functional samples that is resistant to atypical observations and departures from normality. The proposed test is a regularized M-test based on a spectrally truncated version of the Hilbert--Schmidt norm of a score operator defined via the dispersion operator. We derive the asymptotic distribution of the test statistic, investigate the behaviour of the test in a simulation study and illustrate the method on a structural biology dataset. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
813
832
http://hdl.handle.net/10.1093/biomet/ass037
application/pdf
Access to full text is restricted to subscribers.
David Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:989-9942013-01-01RePEc:oup:biomet
article
Compatible weighted proper scoring rules
Many proper scoring rules such as the Brier and log scoring rules implicitly reward a probability forecaster relative to a uniform baseline distribution. Recent work has motivated weighted proper scoring rules, which have an additional baseline parameter. To date two families of weighted proper scoring rules have been introduced, the weighted power and pseudospherical scoring families. These families are compatible with the log scoring rule: when the baseline maximizes the log scoring rule over some set of distributions, the baseline also maximizes the weighted power and pseudospherical scoring rules over the same set. We characterize all weighted proper scoring families and prove a general property: every proper scoring rule is compatible with some weighted scoring family, and every weighted scoring family is compatible with some proper scoring rule. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
989
994
http://hdl.handle.net/10.1093/biomet/ass046
application/pdf
Access to full text is restricted to subscribers.
P. G. M. Forbes
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:787-7982013-01-01RePEc:oup:biomet
article
Orthogonalization of vectors with minimal adjustment
Two transformations are proposed that give orthogonal components with a one-to-one correspondence between the original vectors and the components. The aim is that each component should be close to the vector with which it is paired, orthogonality imposing a constraint. The transformations lead to a variety of new statistical methods, including a unified approach to the identification and diagnosis of collinearities, a method of setting prior weights for Bayesian model averaging, and a means of calculating an upper bound for a multivariate Chebychev inequality. One transformation has the property that duplicating a vector has no effect on the orthogonal components that correspond to nonduplicated vectors, and is determined using a new algorithm that also provides the decomposition of a positive-definite matrix in terms of a diagonal matrix and a correlation matrix. The algorithm is shown to converge to a global optimum. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
787
798
http://hdl.handle.net/10.1093/biomet/ass041
application/pdf
Access to full text is restricted to subscribers.
Paul H. Garthwaite
Frank Critchley
Karim Anaya-Izquierdo
Emmanuel Mubwandarikwa
oai:RePEc:oup:biomet:v:99:y:2012:i:4:p:833-8492013-01-01RePEc:oup:biomet
article
A geometric approach to projective shape and the cross ratio
Projective shape consists of the information about a configuration of points that is invariant under projective transformations. It is an important tool in machine vision to pick out features that are invariant to the choice of camera view. The simplest example is the cross ratio for a set of four collinear points. Recent work involving ideas from multivariate robustness enables us to introduce here a natural preshape on projective shape space. This makes it possible to adapt the Procrustes analysis that forms the basis of much methodology in the simpler setting of similarity shape space. Copyright 2012, Oxford University Press.
4
2012
99
Biometrika
833
849
http://hdl.handle.net/10.1093/biomet/ass055
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Kanti V. Mardia
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:585-5982014-04-02RePEc:oup:biomet
article
On the estimation of an average rigid body motion
This paper investigates the definition and the estimation of the Fréchet mean of a random rigid body motion in ℝᵖ. The sample space SE(p) contains objects M=(R,t) where R is a p×p rotation matrix and t is a p×1 translation vector. This work is motivated by applications in biomechanics where the posture of a joint at a given time is expressed as M∈SE(3), the rigid body displacement needed to map a system of axes on one segment of the joint to a similar system on the other segment. This posture can also be reported as M⁻¹=(Rᵀ, -Rᵀt) by interchanging the role of the two segments. Several definitions of a Fréchet mean for a random motion are proposed using weighted least squares distances. A special emphasis is given to a Fréchet mean that is equivariant with respect to the inverse transform; this means that if P is the Fréchet mean for M then P⁻¹ is the Fréchet mean for M⁻¹, where M is a random SE(p) object. The sampling properties of moment estimators of the Fréchet means are studied in a large concentration setting, where the scatter of the random Ms around their mean value P is small, and as the sample size goes to ∞. Some simple exponential family models for SE(p) data that generalize Downs' (1972) Fisher--von Mises matrix distribution for rotation matrices are introduced and the least squares mean values for these distributions are calculated. Asymptotic comparisons between the estimators presented in this work are carried out for a particular model when p=2. A numerical example involving the motion of the ankle is presented to illustrate the methodology. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
585
598
http://hdl.handle.net/10.1093/biomet/ass020
application/pdf
Access to full text is restricted to subscribers.
Karim Oualkacha
Louis-Paul Rivest
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:421-437.2015-07-30RePEc:oup:biomet
article
Effective dimension reduction for sparse functional data
We propose a method of effective dimension reduction for functional data, emphasizing the sparse design where one observes only a few noisy and irregular measurements for some or all of the subjects. The proposed method borrows strength across the entire sample and provides a way to characterize the effective dimension reduction space, via functional cumulative slicing. Our theoretical study reveals a bias-variance trade-off associated with the regularizing truncation and decaying structures of the predictor process and the effective dimension reduction space. A simulation study and an application illustrate the superior finite-sample performance of the method.
2
2015
102
Biometrika
421
437
http://hdl.handle.net/10.1093/biomet/asv006
application/pdf
Access to full text is restricted to subscribers.
F. Yao
E. Lei
Y. Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:831-847.2015-07-30RePEc:oup:biomet
article
Interactive model building for Q-learning
Evidence-based rules for optimal treatment allocation are key components in the quest for efficient, effective health-care delivery. Q-learning, an approximate dynamic programming algorithm, is a popular method for estimating optimal sequential decision rules from data. Q-learning requires the modelling of nonsmooth, nonmonotone transformations of the data, complicating the search for adequately expressive, yet parsimonious, statistical models. The default Q-learning working model is multiple linear regression, which not only is misspecified under most data-generating models but also results in nonregular regression estimators, complicating inference. We propose an alternative strategy for estimating optimal sequential decision rules for which the requisite statistical modelling does not depend on nonsmooth, nonmonotone transformed data, does not result in nonregular regression estimators, is consistent under more data-generating models than Q-learning is, results in estimated sequential decision rules that have better sampling properties, and is amenable to established statistical methods for exploratory data analysis, model building and validation. We derive the new method, IQ-learning, via an interchange in the order of certain steps in Q-learning. In simulated experiments, IQ-learning improves upon Q-learning in terms of integrated mean-squared error and power. The method is illustrated using data from a study of major depressive disorder.
4
2014
101
Biometrika
831
847
http://hdl.handle.net/10.1093/biomet/asu043
application/pdf
Access to full text is restricted to subscribers.
Eric B. Laber
Kristin A. Linn
Leonard A. Stefanski
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:267-280.2015-07-30RePEc:oup:biomet
article
Hierarchical recognition of sparse patterns in large-scale simultaneous inference
We study how to separate signals from noisy data accurately and determine the patterns of the selected signals. Controlling the inflation of false positive errors is important in large-scale simultaneous inference but has not been addressed in the pattern recognition literature. We develop a decision-theoretic framework and formulate the sparse pattern recognition problem as a simultaneous inference problem with multiple decision trees. Oracle and adaptive classifiers are proposed for maximizing the expected number of true positives subject to a constraint on the overall false positive rate. Existing results on multiple testing are extended by allowing more than two states of nature, hierarchical decision-making and new error rate concepts.
2
2015
102
Biometrika
267
280
http://hdl.handle.net/10.1093/biomet/asv012
application/pdf
Access to full text is restricted to subscribers.
Wenguang Sun
Zhi Wei
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:319-332.2015-07-30RePEc:oup:biomet
article
Latin hypercube designs with controlled correlations and multi-dimensional stratification
Various methods have been proposed to construct Latin hypercube designs with small correlations. Orthogonal arrays have been used to construct Latin hypercube designs with multi-dimensional stratification. To integrate these two ideas, we propose a method to construct Latin hypercube designs with both controlled correlations and multi-dimensional stratification. For numerical integration, the constructed designs not only filter out lower-dimensional variance components as effectively as ordinary orthogonal array-based Latin hypercube designs, but also filter out bilinear terms more effectively. The proposed construction method entails no iterative searches. Sampling properties of the constructed designs are derived. Examples are given to illustrate the proposed construction method and the theoretical results.
2
2014
101
Biometrika
319
332
http://hdl.handle.net/10.1093/biomet/ast062
application/pdf
Access to full text is restricted to subscribers.
Jiajie Chen
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:151-168.2015-07-30RePEc:oup:biomet
article
Doubly robust learning for estimating individualized treatment with censored data
Individualized treatment rules recommend treatments based on individual patient characteristics in order to maximize clinical benefit. When the clinical outcome of interest is survival time, estimation is often complicated by censoring. We develop nonparametric methods for estimating an optimal individualized treatment rule in the presence of censored data. To adjust for censoring, we propose a doubly robust estimator which requires correct specification of either the censoring model or survival model, but not both; the method is shown to be Fisher consistent when either model is correct. Furthermore, we establish the convergence rate of the expected survival under the estimated optimal individualized treatment rule to the expected survival under the optimal individualized treatment rule. We illustrate the proposed methods using a simulation study and data from a Phase III clinical trial on non-small cell lung cancer.
1
2015
102
Biometrika
151
168
http://hdl.handle.net/10.1093/biomet/asu050
application/pdf
Access to full text is restricted to subscribers.
Y. Q. Zhao
D. Zeng
E. B. Laber
R. Song
M. Yuan
M. R. Kosorok
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:205-218.2015-07-30RePEc:oup:biomet
article
Model averaging and weight choice in linear mixed-effects models
This article studies model averaging for linear mixed-effects models. We establish an unbiased estimator of the squared risk for the model averaging, and use the estimator as a criterion for choosing weights. The resulting model average estimator is proved to be asymptotically optimal under some regularity conditions. Simulation experiments show it is superior or comparable to estimators based on the final models selected by commonly used methods and to some existing averaging procedures. The proposed procedure is applied to data from an AIDS clinical trial.
1
2014
101
Biometrika
205
218
http://hdl.handle.net/10.1093/biomet/ast052
application/pdf
Access to full text is restricted to subscribers.
Xinyu Zhang
Guohua Zou
Hua Liang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:978-984.2015-07-30RePEc:oup:biomet
article
Tests for Kronecker envelope models in multilinear principal components analysis
We develop likelihood methods for the Kronecker envelope model in the principal components analysis of matrix observations that have a multivariate normal distribution. Maximum likelihood estimates are derived and the associated likelihood ratio statistic for a test of this Kronecker envelope model is obtained. The asymptotic null distribution of the likelihood ratio statistic is derived as some nuisance parameters approach infinity, and a saddlepoint approximation for this limiting distribution is given. An alternative composite test for the Kronecker envelope model, which can be used when the sample size is too small to use the likelihood ratio test, is also given. Simulation results demonstrate the accuracy of our approximations.
4
2014
101
Biometrika
978
984
http://hdl.handle.net/10.1093/biomet/asu029
application/pdf
Access to full text is restricted to subscribers.
James R. Schott
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:393-408.2015-07-30RePEc:oup:biomet
article
A combined estimating function approach for fitting stationary point process models
A composite likelihood technique based on pairwise contributions provides a computationally simple but potentially inefficient approach for fitting spatial point process models. We propose a new estimation procedure that improves the efficiency. Our approach combines estimating functions derived from pairwise composite likelihood estimation and estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate the efficacy of our proposed method through a simulation study and an application to the longleaf pine data.
2
2014
101
Biometrika
393
408
http://hdl.handle.net/10.1093/biomet/ast069
application/pdf
Access to full text is restricted to subscribers.
C. Deng
R. P. Waagepetersen
Y. Guan
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:849-864.2015-07-30RePEc:oup:biomet
article
Estimation of a semiparametric natural direct effect model incorporating baseline covariates
Establishing cause-effect relationships is a standard goal of empirical science. Once the existence of a causal relationship is established, the precise causal mechanism involved becomes a topic of interest. A particularly popular type of mechanism analysis concerns questions of mediation, i.e., to what extent an effect is direct, and to what extent it is mediated by a third variable. A semiparametric theory has recently been proposed that allows multiply robust estimation of direct and mediated marginal effect functionals in observational studies (Tchetgen Tchetgen & Shpitser, 2012). In this paper we extend the theory to handle parametric models of natural direct and indirect effects within levels of pre-exposure variables with an identity or log link function, where the model for the observed data likelihood is otherwise unrestricted. We show that estimation is generally infeasible in such a model because of the curse of dimensionality associated with the required estimation of auxiliary conditional densities or expectations, given high-dimensional covariates. Thus, we consider multiply robust estimation and propose a more general model which assumes that a subset, but not the entirety, of several working models holds.
4
2014
101
Biometrika
849
864
http://hdl.handle.net/10.1093/biomet/asu044
application/pdf
Access to full text is restricted to subscribers.
E. J. Tchetgen Tchetgen
I. Shpitser
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:365-375.2015-07-30RePEc:oup:biomet
article
Locally ϕp-optimal designs for generalized linear models with a single-variable quadratic polynomial predictor
Finding optimal designs for generalized linear models is a challenging problem. Recent research has identified the structure of optimal designs for generalized linear models with single or multiple unrelated explanatory variables that appear as first-order terms in the predictor. We consider generalized linear models with a single-variable quadratic polynomial as the predictor under a popular family of optimality criteria. When the design region is unrestricted, our results establish that optimal designs can be found within a subclass of designs based on a small support with symmetric structure. We show that the same conclusion holds with certain restrictions on the design region, but in other cases a larger subclass may have to be considered. In addition, we derive explicit expressions for some D-optimal designs.
2
2014
101
Biometrika
365
375
http://hdl.handle.net/10.1093/biomet/ast071
application/pdf
Access to full text is restricted to subscribers.
Hsin-Ping Wu
John Stufken
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:711-718.2015-07-30RePEc:oup:biomet
article
Posterior expectation based on empirical likelihoods
Posterior expectation is widely used as a Bayesian point estimator. In this note we extend it from parametric models to nonparametric models using empirical likelihood, and develop a nonparametric analogue of James–Stein estimation. We use the Laplace method to establish asymptotic approximations to our proposed posterior expectations, and show by simulation that they are often more efficient than the corresponding classical nonparametric procedures, especially when the underlying data are skewed.
3
2014
101
Biometrika
711
718
http://hdl.handle.net/10.1093/biomet/asu018
application/pdf
Access to full text is restricted to subscribers.
A. Vexler
G. Tao
A. D. Hutson
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:37-55.2015-07-30RePEc:oup:biomet
article
Information criteria for variable selection under sparsity
The optimization of an information criterion in a variable selection procedure leads to an additional bias, which can be substantial for sparse, high-dimensional data. One can compensate for the bias by applying shrinkage while estimating within the selected models. This paper presents modified information criteria for use in variable selection and estimation without shrinkage. The analysis motivating the modified criteria follows two routes. The first, which we explore for signal-plus-noise observations only, proceeds by comparing estimators with and without shrinkage. The second, discussed for general regression models, describes the optimization or selection bias as a double-sided effect, which we call a mirror effect: among the numerous insignificant variables, those with large, noisy values appear more valuable than an arbitrary variable, while in fact they carry more noise than an arbitrary variable. The mirror effect is investigated for Akaike’s information criterion and for Mallows’ Cp, with special attention paid to the latter criterion as a stopping rule in a least-angle regression routine. The result is a new stopping rule, which focuses not on the quality of a lasso shrinkage selection but on the least-squares estimator without shrinkage within the same selection.
1
2014
101
Biometrika
37
55
http://hdl.handle.net/10.1093/biomet/ast055
application/pdf
Access to full text is restricted to subscribers.
Maarten Jansen
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:397-408.2015-07-30RePEc:oup:biomet
article
Jump information criterion for statistical inference in estimating discontinuous curves
Nonparametric regression analysis when the regression function is discontinuous has many applications. Existing methods for estimating a discontinuous regression curve usually assume that the number of jumps in the regression curve is known beforehand, which is unrealistic in some situations. Although there has been research on estimation of a discontinuous regression curve when the number of jumps is unknown, the problem remains mostly open because such research often requires assumptions on other related quantities, such as a known minimum jump size. In this paper we propose a jump information criterion which consists of a term measuring the fidelity of the estimated regression curve to the observed data and a penalty related to the number of jumps and the jump sizes. The number of jumps can then be determined by minimizing our criterion. Theoretical and numerical studies show that our method works well.
2
2015
102
Biometrika
397
408
http://hdl.handle.net/10.1093/biomet/asv018
application/pdf
Access to full text is restricted to subscribers.
Zhiming Xia
Peihua Qiu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:971-977.2015-07-30RePEc:oup:biomet
article
Generalized Cornfield conditions for the risk difference
A central question in causal inference with observational studies is the sensitivity of conclusions to unmeasured confounding. The classical Cornfield condition allows us to assess whether an unmeasured binary confounder can explain away the observed relative risk of the exposure on the outcome. It states that for an unmeasured confounder to explain away an observed relative risk, the association between the unmeasured confounder and the exposure and the association between the unmeasured confounder and the outcome must both be larger than the observed relative risk. In this paper, we extend the classical Cornfield condition in three directions. First, we consider analogous conditions for the risk difference and allow for a categorical, not just a binary, unmeasured confounder. Second, we provide more stringent thresholds that the maximum of the above-mentioned associations must satisfy, rather than weaker conditions that both must satisfy. Third, we show that all the earlier results on Cornfield conditions hold under weaker assumptions than previously used. We illustrate the potential applications with real examples, where our new conditions give more information than the classical ones.
4
2014
101
Biometrika
971
977
http://hdl.handle.net/10.1093/biomet/asu030
application/pdf
Access to full text is restricted to subscribers.
Peng Ding
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:494-499.2015-07-30RePEc:oup:biomet
article
Optimum designs for two treatments with unequal variances in the presence of covariates
Optimum designs are described for two treatments with different variances when covariates are included in the model. The designs, a generalization of Neyman allocation, are required in personalized medicine to model the effect of covariates on the choice of treatment. The use of the designs in clinical trials is indicated. D-optimality of the designs is established using results from Kiefer’s general equivalence theorem. The results are obtained with the use of surprisingly elementary algebra.
2
2015
102
Biometrika
494
499
http://hdl.handle.net/10.1093/biomet/asu071
application/pdf
Access to full text is restricted to subscribers.
A. C. Atkinson
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:1-14.2015-07-30RePEc:oup:biomet
article
Warped functional regression
A characteristic feature of functional data is the presence of phase variability in addition to amplitude variability. Existing functional regression methods do not handle time variability in an explicit and efficient way. In this paper we introduce a functional regression method that incorporates time warping as an intrinsic part of the model. The method achieves good predictive power in a parsimonious way and allows unified statistical inference about phase and amplitude components. The asymptotic distribution of the estimators is derived and their finite-sample properties are studied by simulation. An application involving ground-level ozone trajectories is presented.
1
2015
102
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asu054
application/pdf
Access to full text is restricted to subscribers.
Daniel Gervini
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:377-392.2015-07-30RePEc:oup:biomet
article
Logistic regression for spatial Gibbs point processes
We propose a computationally efficient technique, based on logistic regression, for fitting Gibbs point process models to spatial point pattern data. The score of the logistic regression is an unbiased estimating function and is closely related to the pseudolikelihood score. Implementation of our technique does not require numerical quadrature, and thus avoids a source of bias inherent in other methods. For stationary processes, we prove that the parameter estimator is strongly consistent and asymptotically normal, and propose a variance estimator. We demonstrate the efficiency and practicability of the method on a real dataset and in a simulation study.
2
2014
101
Biometrika
377
392
http://hdl.handle.net/10.1093/biomet/ast060
application/pdf
Access to full text is restricted to subscribers.
Adrian Baddeley
Jean-François Coeurjolly
Ege Rubak
Rasmus Waagepetersen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:219-228.2015-07-30RePEc:oup:biomet
article
Identifiability of Gaussian structural equation models with equal error variances
We consider structural equation models in which variables can be written as a function of their parents and noise terms, which are assumed to be jointly independent. Corresponding to each structural equation model is a directed acyclic graph describing the relationships between the variables. In Gaussian structural equation models with linear functions, the graph can be identified from the joint distribution only up to Markov equivalence classes, assuming faithfulness. In this work, we prove full identifiability in the case where all noise variables have the same variance: the directed acyclic graph can be recovered from the joint Gaussian distribution. Our result has direct implications for causal inference: if the data follow a Gaussian structural equation model with equal error variances, then, assuming that all variables are observed, the causal structure can be inferred from observational data only. We propose a statistical method and an algorithm based on our theoretical findings.
1
2014
101
Biometrika
219
228
http://hdl.handle.net/10.1093/biomet/ast043
application/pdf
Access to full text is restricted to subscribers.
J. Peters
P. Bühlmann
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:85-101.2015-07-30RePEc:oup:biomet
article
Graph estimation with joint additive models
In recent years, there has been considerable interest in estimating conditional independence graphs in high dimensions. Most previous work assumed that the variables are multivariate Gaussian or that the conditional means of the variables are linearly related. Unfortunately, if these assumptions are violated, the resulting conditional independence estimates can be inaccurate. We propose a semiparametric method, graph estimation with joint additive models, which allows the conditional means of the features to take an arbitrary additive form. We present an efficient algorithm for computation of our estimator, and prove that it is consistent. We extend our method to estimation of directed graphs with known causal ordering. Using simulated data, we show that our method performs better than existing methods when there are nonlinear relationships among the features, and is comparable to methods that assume multivariate normality when the conditional means are linear. We illustrate our method on a cell signalling dataset.
1
2014
101
Biometrika
85
101
http://hdl.handle.net/10.1093/biomet/ast053
application/pdf
Access to full text is restricted to subscribers.
Arend Voorman
Ali Shojaie
Daniela Witten
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:239-246.2015-07-30RePEc:oup:biomet
article
A Wilcoxon–Mann–Whitney-type test for infinite-dimensional data
The Wilcoxon–Mann–Whitney test is a robust competitor of the $t$ test in the univariate setting. For finite-dimensional multivariate non-Gaussian data, several extensions of the Wilcoxon–Mann–Whitney test have been shown to outperform Hotelling's $T^{2}$ test. In this paper, we study a Wilcoxon–Mann–Whitney-type test based on spatial ranks in infinite-dimensional spaces; we investigate its asymptotic properties and compare it with several existing tests. The proposed test is shown to be robust with respect to outliers and to have better power than some competitors for certain distributions with heavy tails. We study its performance using real and simulated data.
1
2015
102
Biometrika
239
246
http://hdl.handle.net/10.1093/biomet/asu072
application/pdf
Access to full text is restricted to subscribers.
Anirvan Chakraborty
Probal Chaudhuri
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:484-490.2015-07-30RePEc:oup:biomet
article
Convergence of sample eigenvalues, eigenvectors, and principal component scores for ultra-high dimensional data
The development of high-throughput biomedical technologies has led to increased interest in the analysis of high-dimensional data where the number of features is much larger than the sample size. In this paper, we investigate principal component analysis under the ultra-high dimensional regime, where both the number of features and the sample size increase as the ratio of the two quantities also increases. We bridge the existing results from the finite and the high-dimension low sample size regimes, embedding the two regimes in a more general framework. We also numerically demonstrate the universal application of the results from the finite regime.
2
2014
101
Biometrika
484
490
http://hdl.handle.net/10.1093/biomet/ast064
application/pdf
Access to full text is restricted to subscribers.
Seunggeun Lee
Fei Zou
Fred A. Wright
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:47-64.2015-07-30RePEc:oup:biomet
article
Selection and estimation for mixed graphical models
We consider the problem of estimating the parameters in a pairwise graphical model in which the distribution of each node, conditioned on the others, may have a different exponential family form. We identify restrictions on the parameter space required for the existence of a well-defined joint density, and establish the consistency of the neighbourhood selection approach for graph reconstruction in high dimensions when the true underlying graph is sparse. Motivated by our theoretical results, we investigate the selection of edges between nodes whose conditional distributions take different parametric forms, and show that efficiency can be gained if edge estimates obtained from the regressions of particular nodes are used to reconstruct the graph. These results are illustrated with examples of Gaussian, Bernoulli, Poisson and exponential distributions. Our theoretical findings are corroborated by evidence from simulation studies.
1
2015
102
Biometrika
47
64
http://hdl.handle.net/10.1093/biomet/asu051
application/pdf
Access to full text is restricted to subscribers.
Shizhe Chen
Daniela M. Witten
Ali Shojaie
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:371-380.2015-07-30RePEc:oup:biomet
article
Maximum projection designs for computer experiments
Space-filling properties are important in designing computer experiments. The traditional maximin and minimax distance designs consider only space-filling in the full-dimensional space; this can result in poor projections onto lower-dimensional spaces, which is undesirable when only a few factors are active. Restricting maximin distance design to the class of Latin hypercubes can improve one-dimensional projections but cannot guarantee good space-filling properties in larger subspaces. We propose designs that maximize space-filling properties on projections to all subsets of factors. We call our designs maximum projection designs. Our design criterion can be computed at no more cost than a design criterion that ignores projection properties.
2
2015
102
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asv002
application/pdf
Access to full text is restricted to subscribers.
V. Roshan Joseph
Evren Gul
Shan Ba
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:229-236.2015-07-30RePEc:oup:biomet
article
Multivariate sign-based high-dimensional tests for sphericity
This article concerns tests for sphericity in cases where the data dimension is larger than the sample size. The existing multivariate sign-based procedure (Hallin & Paindaveine, 2006) for sphericity is not robust with respect to high dimensionality, producing tests with Type I error rates that are much larger than the nominal levels. This is mainly due to bias from estimating the location parameter. We develop a correction that makes the existing test statistic robust with respect to high dimensionality, and show that the proposed test statistic is asymptotically normal under elliptical distributions. The proposed method allows the dimensionality to increase as the square of the sample size. Simulations demonstrate that it has good size and power in a wide range of settings.
1
2014
101
Biometrika
229
236
http://hdl.handle.net/10.1093/biomet/ast040
application/pdf
Access to full text is restricted to subscribers.
Changliang Zou
Liuhua Peng
Long Feng
Zhaojun Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:733-740.2015-07-30RePEc:oup:biomet
article
Inadmissibility of the best equivariant predictive density in the unknown variance case
This work treats the problem of estimating the predictive density of a random vector when both the mean vector and the variance are unknown. We prove that the reference density in this context is inadmissible under the Kullback–Leibler loss in a nonasymptotic framework. Our result holds even when the dimension of the vector is strictly lower than three, which is surprising compared to the known variance setting. Finally, we discuss the relationship between the prediction and the estimation problems.
3
2014
101
Biometrika
733
740
http://hdl.handle.net/10.1093/biomet/asu024
application/pdf
Access to full text is restricted to subscribers.
A. Boisbunon
Y. Maruyama
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:303-317.2015-07-30RePEc:oup:biomet
article
Bayesian monotone regression using Gaussian process projection
Shape-constrained regression analysis has applications in dose-response modelling, environmental risk assessment, disease screening and many other areas. Incorporating the shape constraints can improve estimation efficiency and avoid implausible results. We propose a novel method, focusing on monotone curve and surface estimation, which uses Gaussian process projections. Our inference is based on projecting posterior samples from the Gaussian process. We develop theory on continuity of the projection and rates of contraction. Our approach leads to simple computation with good performance in finite samples. The proposed projection method can also be applied to other constrained-function estimation problems, including those in multivariate settings.
2
2014
101
Biometrika
303
317
http://hdl.handle.net/10.1093/biomet/ast063
application/pdf
Access to full text is restricted to subscribers.
Lizhen Lin
David B. Dunson
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:465-476.2015-07-30RePEc:oup:biomet
article
Bootstrap for the case-cohort design
The case-cohort design facilitates economical investigation of risk factors in a large survival study, with covariate data collected only from the cases and a simple random subset of the full cohort. Methods that accommodate the design have been developed for various semiparametric models, but most inference procedures are based on asymptotic distribution theory. Such inference can be cumbersome to derive and implement, and does not permit confidence band construction. While the bootstrap is an obvious alternative, it is unclear how to resample because of complications from the two-stage sampling design. We establish an equivalent sampling scheme, and propose a novel and versatile nonparametric bootstrap for robust inference with an appealingly simple single-stage resampling. Theoretical justification and numerical assessment are provided for a number of procedures under the proportional hazards model.
2
2014
101
Biometrika
465
476
http://hdl.handle.net/10.1093/biomet/asu004
application/pdf
Access to full text is restricted to subscribers.
Yijian Huang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:381-395.2015-07-30RePEc:oup:biomet
article
Automatic structure recovery for additive models
We propose an automatic structure recovery method for additive models, based on a backfitting algorithm coupled with local polynomial smoothing, in conjunction with a new kernel-based variable selection strategy. Our method produces estimates of the set of noise predictors, the sets of predictors that contribute polynomially at different degrees up to a specified degree M, and the set of predictors that contribute beyond a polynomial of degree M. We prove consistency of the proposed method, and describe an extension to partially linear models. Finite-sample performance of the method is illustrated via Monte Carlo studies and a real-data example.
2
2015
102
Biometrika
381
395
http://hdl.handle.net/10.1093/biomet/asu070
application/pdf
Access to full text is restricted to subscribers.
Yichao Wu
Leonard A. Stefanski
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:169-180.2015-07-30RePEc:oup:biomet
article
Using covariate-specific disease prevalence information to increase the power of case-control studies
Public registration databases and large cohort studies provide vital information on disease prevalence at various levels of a risk factor. This auxiliary information can be helpful in conducting statistical inference in a new study. We aim to develop a statistical procedure that improves the efficiency of the logistic regression model for a case-control study by utilizing auxiliary information on covariate-specific disease prevalence via a series of unbiased estimating equations. We adopt empirical likelihood for statistical inference, and demonstrate its advantages through simulation and an application.
1
2015
102
Biometrika
169
180
http://hdl.handle.net/10.1093/biomet/asu048
application/pdf
Access to full text is restricted to subscribers.
Jing Qin
Han Zhang
Pengfei Li
Demetrius Albanes
Kai Yu
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:325-343.2015-07-30RePEc:oup:biomet
article
Information-theoretic optimality of observation-driven time series models for continuous responses
We investigate information-theoretic optimality properties of the score function of the predictive likelihood as a device for updating a real-valued time-varying parameter in a univariate observation-driven model with continuous responses. We restrict our attention to models with updates of one lag order. The results provide theoretical justification for a class of score-driven models which includes the generalized autoregressive conditional heteroskedasticity model as a special case. Our main contribution is to show that only parameter updates based on the score will always reduce the local Kullback–Leibler divergence between the true conditional density and the model-implied conditional density. This result holds irrespective of the severity of model misspecification. We also show that use of the score leads to a considerably smaller global Kullback–Leibler divergence in empirically relevant settings. We illustrate the theory with an application to time-varying volatility models. We show that the reduction in Kullback–Leibler divergence across a range of different settings can be substantial compared to updates based on, for example, squared lagged observations.
2
2015
102
Biometrika
325
343
http://hdl.handle.net/10.1093/biomet/asu076
application/pdf
Access to full text is restricted to subscribers.
F. Blasques
S. J. Koopman
A. Lucas
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:599-612.2015-07-30RePEc:oup:biomet
article
Characterization of the likelihood continual reassessment method
This paper deals with the design of the likelihood continual reassessment method, which is an increasingly widely used model-based method for dose-finding studies. It is common to implement the method in a two-stage approach, whereby the model-based stage is activated after an initial sequence of patients has been treated. While this two-stage approach is practically appealing, it lacks a theoretical framework, and it is often unclear how the design components should be specified. This paper develops a general framework based on the coherence principle, from which we derive a design calibration process. A real clinical-trial example is used to demonstrate that the proposed process can be implemented in a timely and reproducible manner, while offering competitive operating characteristics. We explore the operating characteristics of different models within this framework and show the performance to be insensitive to the choice of dose-toxicity model.
3
2014
101
Biometrika
599
612
http://hdl.handle.net/10.1093/biomet/asu012
application/pdf
Access to full text is restricted to subscribers.
Xiaoyu Jia
Shing M. Lee
Ying Kuen Cheung
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:121-140.2015-07-30RePEc:oup:biomet
article
Nonparametric estimation of a periodic sequence in the presence of a smooth trend
We investigate a nonparametric regression model including a periodic component, a smooth trend function, and a stochastic error term. We propose a procedure to estimate the unknown period and the function values of the periodic component as well as the nonparametric trend function. The theoretical part of the paper establishes the asymptotic properties of our estimators. In particular, we show that our estimator of the period is consistent. In addition, we derive the convergence rates and the limiting distributions of our estimators of the periodic component and the trend function. The asymptotic results are complemented with a simulation study and an application to global temperature anomaly data.
1
2014
101
Biometrika
121
140
http://hdl.handle.net/10.1093/biomet/ast051
application/pdf
Access to full text is restricted to subscribers.
Michael Vogt
Oliver Linton
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:245-251.2015-07-30RePEc:oup:biomet
article
The mode functional is not elicitable
This article is concerned with point forecasting of a real-valued random variable with a general Lebesgue density. Answering a question of Gneiting (2011), it is shown that the mode is not elicitable, or, in other words, that it is impossible to find a loss or scoring function under which the mode is the Bayes predictor.
1
2014
101
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/ast048
application/pdf
Access to full text is restricted to subscribers.
C. Heinrich
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:1-15.2015-07-30RePEc:oup:biomet
article
Efficient inference for spatial extreme value processes associated to log-Gaussian random functions
Max-stable processes arise as the only possible nontrivial limits for maxima of affinely normalized identically distributed stochastic processes, and thus form an important class of models for the extreme values of spatial processes. Until recently, inference for max-stable processes has been restricted to the use of pairwise composite likelihoods, due to intractability of higher-dimensional distributions. In this work we consider random fields that are in the domain of attraction of a widely used class of max-stable processes, namely those constructed via manipulation of log-Gaussian random functions. For this class, we exploit limiting d-dimensional multivariate Poisson process intensities of the underlying process for inference on all d-vectors exceeding a high marginal threshold in at least one component, employing a censoring scheme to incorporate information below the marginal threshold. We also consider the d-dimensional distributions for the equivalent max-stable process, and perform full likelihood inference by exploiting the methods of Stephenson & Tawn (2005), where information on the occurrence times of extreme events is shown to dramatically simplify the likelihood. The Stephenson–Tawn likelihood is in fact simply a special case of the censored Poisson process likelihood. We assess the improvements in inference from both methods over pairwise likelihood methodology by simulation.
1
2014
101
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/ast042
application/pdf
Access to full text is restricted to subscribers.
Jennifer L. Wadsworth
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:269-284.2015-07-30RePEc:oup:biomet
article
Variance estimation in high-dimensional linear models
The residual variance and the proportion of explained variation are important quantities in many statistical models and model fitting procedures. They play an important role in regression diagnostics and model selection procedures, as well as in determining the performance limits in many problems. In this paper we propose new method-of-moments-based estimators for the residual variance, the proportion of explained variation and other related quantities, such as the ℓ2 signal strength. The proposed estimators are consistent and asymptotically normal in high-dimensional linear models with Gaussian predictors and errors, where the number of predictors d is proportional to the number of observations n; in fact, consistency holds even in settings where d/n → ∞. Existing results on residual variance estimation in high-dimensional linear models depend on sparsity in the underlying signal. Our results require no sparsity assumptions and imply that the residual variance and the proportion of explained variation can be consistently estimated even when d > n and the underlying signal itself is nonestimable. Numerical work suggests that some of our distributional assumptions may be relaxed. A real-data analysis involving gene expression data and single nucleotide polymorphism data illustrates the performance of the proposed methods.
2
2014
101
Biometrika
269
284
http://hdl.handle.net/10.1093/biomet/ast065
application/pdf
Access to full text is restricted to subscribers.
Lee H. Dicker
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:457-477.2015-07-30RePEc:oup:biomet
article
On the degrees of freedom of reduced-rank estimators in multivariate regression
We study the effective degrees of freedom of a general class of reduced-rank estimators for multivariate regression in the framework of Stein's unbiased risk estimation. A finite-sample exact unbiased estimator is derived that admits a closed-form expression in terms of the thresholded singular values of the least-squares solution and hence is readily computable. The results continue to hold in the high-dimensional setting where both the predictor and the response dimensions may be larger than the sample size. The derived analytical form facilitates the investigation of theoretical properties and provides new insights into the empirical behaviour of the degrees of freedom. In particular, we examine the differences and connections between the proposed estimator and a commonly used naive estimator. The use of the proposed estimator leads to efficient and accurate prediction risk estimation and model selection, as demonstrated by simulation studies and a data example.
2
2015
102
Biometrika
457
477
http://hdl.handle.net/10.1093/biomet/asu067
application/pdf
Access to full text is restricted to subscribers.
A. Mukherjee
K. Chen
N. Wang
J. Zhu
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:655-671.2015-07-30RePEc:oup:biomet
article
Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation
Approximate Bayesian computation has emerged as a standard computational tool when dealing with intractable likelihood functions in Bayesian inference. We show that many common Markov chain Monte Carlo kernels used to facilitate inference in this setting can fail to be variance bounding and hence geometrically ergodic, which can have consequences for the reliability of estimates in practice. This phenomenon is typically independent of the choice of tolerance in the approximation. We prove that a recently introduced Markov kernel can inherit the properties of variance bounding and geometric ergodicity from its intractable Metropolis–Hastings counterpart, under reasonably weak conditions. We show that the computational cost of this alternative kernel is bounded whenever the prior is proper, and present indicative results for an example where spectral gaps and asymptotic variances can be computed, as well as an example involving inference for a partially and discretely observed, time-homogeneous, pure jump Markov process. We also supply two general theorems, one providing a simple sufficient condition for lack of variance bounding for reversible kernels and the other providing a positive result concerning inheritance of variance bounding and geometric ergodicity for mixtures of reversible kernels.
3
2014
101
Biometrika
655
671
http://hdl.handle.net/10.1093/biomet/asu027
application/pdf
Access to full text is restricted to subscribers.
Anthony Lee
Krzysztof Łatuszyński
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:703-710.2015-07-30RePEc:oup:biomet
article
Extended empirical likelihood for estimating equations
We derive an extended empirical likelihood for parameters defined by estimating equations which generalizes the original empirical likelihood to the full parameter space. Under mild conditions, the extended empirical likelihood has all the asymptotic properties of the original empirical likelihood. The first-order extended empirical likelihood is easy to use and substantially more accurate than the original empirical likelihood.
3
2014
101
Biometrika
703
710
http://hdl.handle.net/10.1093/biomet/asu014
application/pdf
Access to full text is restricted to subscribers.
Min Tsao
Fan Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:285-302.2015-07-30RePEc:oup:biomet
article
Bayes and empirical Bayes: do they merge?
Bayesian inference is attractive due to its internal coherence and for often having good frequentist properties. However, eliciting an honest prior may be difficult, and common practice is to take an empirical Bayes approach using an estimate of the prior hyperparameters. Although not rigorous, the underlying idea is that, for a sufficiently large sample size, empirical Bayes methods should lead to similar inferential answers to a proper Bayesian inference. However, precise mathematical results on this asymptotic agreement seem to be missing. In this paper, we give results in terms of merging Bayesian and empirical Bayes posterior distributions. We study two notions of merging: Bayesian weak merging and frequentist merging in total variation. We also show that, under regularity conditions, the empirical Bayes approach asymptotically gives an oracle selection of the prior hyperparameters. Examples include empirical Bayes density estimation with Dirichlet process mixtures.
2
2014
101
Biometrika
285
302
http://hdl.handle.net/10.1093/biomet/ast067
application/pdf
Access to full text is restricted to subscribers.
S. Petrone
J. Rousseau
C. Scricciolo
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:964-970.2015-07-30RePEc:oup:biomet
article
Analytical p-value calculation for the higher criticism test in finite-d problems
The higher criticism test is effective for testing a joint null hypothesis against a sparse alternative, e.g., for testing the effect of a gene or genetic pathway that consists of d genetic markers. Accurate p-value calculations for the higher criticism test based on the asymptotic distribution require a very large d, which is not the case for the number of genetic variants in a gene or a pathway. In this paper we propose an analytical method for accurately computing the p-value of the higher criticism test for finite-d problems. Unlike previous treatments, this method does not rely on asymptotics in d or on simulation, and is exact for arbitrary d when the test statistics are normally distributed. The method is particularly computationally advantageous when d is not large. We illustrate the proposed method with a case-control genome-wide association study of lung cancer and compare its power with competing methods through simulations.
4
2014
101
Biometrika
964
970
http://hdl.handle.net/10.1093/biomet/asu033
application/pdf
Access to full text is restricted to subscribers.
Ian J. Barnett
Xihong Lin
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:985-991.2015-07-30RePEc:oup:biomet
article
Semiparametric maximum likelihood inference by using failed contact attempts to adjust for nonignorable nonresponse
In marketing research, social science and epidemiological studies, call-back of nonrespondents is standard. If respondents and nonrespondents tend to give different answers, the missing data are called nonignorable, and analysing the respondents' data alone may produce biased results. To extend earlier work on nonresponse in the presence of call-backs, Alho (1990) proposed modelling the probability of response at each attempt through logistic regression, where outcomes of interest and covariates are explanatory variables. In this paper we propose a semiparametric maximum likelihood approach, and discuss large-sample properties and the semiparametric likelihood ratio statistic used to test whether the data are missing completely at random. Simulations are conducted to evaluate this approach and a modification of the method of Alho (1990). Data from the National Health Interview Survey are used for illustration.
4
2014
101
Biometrika
985
991
http://hdl.handle.net/10.1093/biomet/asu046
application/pdf
Access to full text is restricted to subscribers.
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:71-84.2015-07-30RePEc:oup:biomet
article
Better subset regression
This paper studies the relationship between model fitting and screening performance to find efficient screening methods for high-dimensional linear regression models. Under a sparsity assumption we show in a general asymptotic setting that a subset that includes the true submodel always yields a smaller residual sum of squares than those that do not. To seek such a subset, we consider the optimization problem associated with best subset regression. An EM algorithm, known as orthogonalizing subset screening, and its accelerated version are proposed for searching for the best subset. Although the algorithms do not always find the best subset, their monotonicity makes the subset fit the data better than initial subsets, and thus the subset can have better screening performance asymptotically. Simulation results show that our methods are very competitive.
1
2014
101
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/ast041
application/pdf
Access to full text is restricted to subscribers.
Shifeng Xiong
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:359-370.2015-07-30RePEc:oup:biomet
article
A Möbius transformation-induced distribution on the torus
We propose a five-parameter bivariate wrapped Cauchy distribution as a unimodal model for toroidal data. It is highly tractable, displays numerous desirable properties, including marginal and conditional distributions that are all wrapped Cauchy, and arises as an appealing submodel of a six-parameter distribution obtained by applying Möbius transformation to a pre-existing bivariate circular model. Method of moments and maximum likelihood estimation of its parameters are fast, and tests for independence and goodness-of-fit are available. An analysis involving dihedral angles of the proteinogenic amino acid tyrosine illustrates the distribution’s application. A Markov process for circular data is also explored.
2
2015
102
Biometrika
359
370
http://hdl.handle.net/10.1093/biomet/asv003
application/pdf
Access to full text is restricted to subscribers.
Shogo Kato
Arthur Pewsey
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:625-640.2015-07-30RePEc:oup:biomet
article
Multicategory angle-based large-margin classification
Large-margin classifiers are popular methods for classification. Among existing simultaneous multicategory large-margin classifiers, a common approach is to learn k different functions for a k-class problem with a sum-to-zero constraint. Such a formulation can be inefficient. We propose a new multicategory angle-based large-margin classification framework. The proposed angle-based classifiers consider a simplex-based prediction rule without the sum-to-zero constraint, and enjoy more efficient computation. Many binary large-margin classifiers can be naturally generalized for multicategory problems through the angle-based framework. Theoretical and numerical studies demonstrate the usefulness of the angle-based methods.
3
2014
101
Biometrika
625
640
http://hdl.handle.net/10.1093/biomet/asu017
application/pdf
Access to full text is restricted to subscribers.
Chong Zhang
Yufeng Liu
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:333-350.2015-07-30RePEc:oup:biomet
article
Permuting regular fractional factorial designs for screening quantitative factors
Fractional factorial designs are widely used in screening experiments. They are often chosen by the minimum aberration criterion, which regards factor levels as symbols. For designs with quantitative factors, however, permuting the levels for one or more factors could alter their geometrical structures and statistical properties. We provide a justification of the minimum β-aberration criterion for quantitative factors and study level permutations for regular fractional factorial designs in order to improve their efficiency for screening quantitative factors. We show how regular designs can be linearly permuted to reduce contamination of nonnegligible interactions on the estimation of linear effects without increasing the run size. We further show that such linear permutations are unique under the minimum β-aberration criterion and the best level permutations can be determined without an exhaustive search. We establish additional theoretical results for three-level designs and obtain the best level permutations for regular designs with 27 and 81 runs. We illustrate the practical benefits of level permutation with an antiviral drug combination experiment.
2
2014
101
Biometrika
333
350
http://hdl.handle.net/10.1093/biomet/ast073
application/pdf
Access to full text is restricted to subscribers.
Yu Tang
Hongquan Xu
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:231-238.2015-07-30RePEc:oup:biomet
article
Generalized Ewens–Pitman model for Bayesian clustering
We propose a Bayesian method for clustering from discrete data structures that commonly arise in genetics and other applications. This method is equivariant with respect to relabelling units; unsampled units do not interfere with sampled data; and missing data do not hinder inference. Cluster inference using the posterior mode performs well on simulated and real datasets, and the posterior predictive distribution enables supervised learning based on a partial clustering of the sample.
1
2015
102
Biometrika
231
238
http://hdl.handle.net/10.1093/biomet/asu052
application/pdf
Access to full text is restricted to subscribers.
Harry Crane
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:203-214.2015-07-30RePEc:oup:biomet
article
Double-bootstrap methods that use a single double-bootstrap simulation
We show that, when the double bootstrap is used to improve performance of bootstrap methods for bias correction, techniques based on using a single double-bootstrap sample for each single-bootstrap sample can produce third-order accuracy for much less computational expense than is required by conventional double-bootstrap methods. However, this improved level of performance is not available for the single double-bootstrap methods that have been suggested to construct confidence intervals or distribution estimators.
1
2015
102
Biometrika
203
214
http://hdl.handle.net/10.1093/biomet/asu060
application/pdf
Access to full text is restricted to subscribers.
Jinyuan Chang
Peter Hall
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:439-456.2015-07-30RePEc:oup:biomet
article
Envelopes and reduced-rank regression
We incorporate the nascent idea of envelopes (Cook et al., Statist. Sinica 20, 927–1010) into reduced-rank regression by proposing a reduced-rank envelope model, which is a hybrid of reduced-rank and envelope regressions. The proposed model has no more parameters in total than either reduced-rank regression or envelope regression. The resulting estimator is at least as efficient as both existing estimators. The methodology of this paper can be adapted to other envelope models, such as partial envelopes (Su & Cook, Biometrika 98, 133–46) and envelopes in predictor space (Cook et al., J. R. Statist. Soc. B 75, 851–77).
2
2015
102
Biometrika
439
456
http://hdl.handle.net/10.1093/biomet/asv001
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
Xin Zhang
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:191-202.2015-07-30RePEc:oup:biomet
article
Adaptive randomized trial designs that cannot be dominated by any standard design at the same total sample size
Prior work has shown that the power of adaptive designs with rules for modifying the sample size can always be matched or improved by suitably chosen, standard, group sequential designs. A natural question is whether analogous results hold for other types of adaptive designs. We focus on adaptive enrichment designs, which involve preplanned rules for modifying enrollment criteria based on accrued data in a randomized trial. Such designs often involve multiple hypotheses, e.g., one for the total population and one for a predefined subpopulation, such as those with high disease severity at baseline. We fix the total sample size, and consider overall power, defined as the probability of rejecting at least one false null hypothesis. We present adaptive enrichment designs whose overall power at two alternatives cannot simultaneously be matched by any standard design. In some scenarios there is a substantial gap between the overall power achieved by these adaptive designs and that of any standard design. We also prove that such gains in overall power come at a cost. To attain overall power above what is achievable by certain standard designs, it is necessary to increase power to reject some hypotheses and reduce power to reject others. We demonstrate that adaptive enrichment designs allow certain power trade-offs that are not available when restricting to standard designs.
1
2015
102
Biometrika
191
202
http://hdl.handle.net/10.1093/biomet/asu057
application/pdf
Access to full text is restricted to subscribers.
Michael Rosenblum
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:141-154.2015-07-30RePEc:oup:biomet
article
Frequentist estimation of an epidemic’s spreading potential when observations are scarce
We consider the problem of inferring the potential of an epidemic for escalating into a pandemic on the basis of limited observations in its initial stages. Classical results of Becker & Hasofer (J. R. Statist. Soc. B, 59, 415–29) illustrate that frequentist estimation of the complete set of parameters of an epidemic modelled as a birth and death process remains feasible even when one is able to observe only the deaths and the total number of births. These assumptions on the observation mechanism, however, are too strong to be met in practice. We consider a more realistic scenario where only temporally aggregated random proportions of the deaths are observed over time. We demonstrate that the frequentist estimation of the Malthusian parameter governing the growth of the epidemic is still feasible in this context. We construct explicit straightforwardly calculable estimators motivated heuristically by the martingale dynamics of the process, and show that they admit a rigorous quasilikelihood interpretation. We establish the consistency and asymptotic normality of these estimators, allowing for the construction of approximate confidence intervals that can be used to infer the spreading potential of the epidemic. A simulation study and an application to the initial outbreak data of the 2009 H1N1 influenza pandemic illustrate that the method can be expected to give reasonable results in practice.
1
2014
101
Biometrika
141
154
http://hdl.handle.net/10.1093/biomet/ast049
application/pdf
Access to full text is restricted to subscribers.
Andrea Kraus
Victor M. Panaretos
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:281-294.2015-07-30RePEc:oup:biomet
article
On random-effects meta-analysis
Meta-analysis is widely used to compare and combine the results of multiple independent studies. To account for between-study heterogeneity, investigators often employ random-effects models, under which the effect sizes of interest are assumed to follow a normal distribution. It is common to estimate the mean effect size by a weighted linear combination of study-specific estimators, with the weight for each study being inversely proportional to the sum of the variance of the effect-size estimator and the estimated variance component of the random-effects distribution. Because the estimator of the variance component involved in the weights is random and correlated with study-specific effect-size estimators, the commonly adopted asymptotic normal approximation to the meta-analysis estimator is grossly inaccurate unless the number of studies is large. When individual participant data are available, one can also estimate the mean effect size by maximizing the joint likelihood. We establish the asymptotic properties of the meta-analysis estimator and the joint maximum likelihood estimator when the number of studies is either fixed or increases at a slower rate than the study sizes and we discover a surprising result: the former estimator is always at least as efficient as the latter. We also develop a novel resampling technique that improves the accuracy of statistical inference. We demonstrate the benefits of the proposed inference procedures using simulated and empirical data.
2
2015
102
Biometrika
281
294
http://hdl.handle.net/10.1093/biomet/asv011
application/pdf
Access to full text is restricted to subscribers.
D. Zeng
D. Y. Lin
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:423-437.2015-07-30RePEc:oup:biomet
article
Measurement bias and effect restoration in causal inference
This paper highlights several areas where graphical techniques can be harnessed to address the problem of measurement errors in causal inference. In particular, it discusses the control of unmeasured confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models. We derive new conditions under which causal effects can be restored by observing proxy variables of unmeasured confounders with or without external studies.
2
2014
101
Biometrika
423
437
http://hdl.handle.net/10.1093/biomet/ast066
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
Judea Pearl
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:215-230.2015-07-30RePEc:oup:biomet
article
Multivariate max-stable spatial processes
Max-stable processes allow the spatial dependence of extremes to be modelled and quantified, so they are widely adopted in applications. For a better understanding of extremes, it may be useful to study several variables simultaneously. To this end, we study the maxima of independent replicates of multivariate processes, both in the Gaussian and Student-t cases. We define a Poisson process construction and introduce multivariate versions of the Smith Gaussian extreme-value, the Schlather extremal-Gaussian and extremal-t, and the Brown–Resnick models. We develop inference for the models based on composite likelihoods. We present results of Monte Carlo simulations and an application to daily maximum wind speed and wind gust.
1
2015
102
Biometrika
215
230
http://hdl.handle.net/10.1093/biomet/asu066
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Simone A. Padoan
Huiyan Sang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:315-323.2015-07-30RePEc:oup:biomet
article
A useful variant of the Davis–Kahan theorem for statisticians
The Davis–Kahan theorem is used in the analysis of many statistical procedures to bound the distance between subspaces spanned by population eigenvectors and their sample versions. It relies on an eigenvalue separation condition between certain population and sample eigenvalues. We present a variant of this result that depends only on a population eigenvalue separation condition, making it more natural and convenient for direct application in statistical contexts, and provide an improvement in many cases to the usual bound in the statistical literature. We also give an extension to situations where the matrices under study may be asymmetric or even non-square, and where interest is in the distance between subspaces spanned by corresponding singular vectors.
2
2015
102
Biometrika
315
323
http://hdl.handle.net/10.1093/biomet/asv008
application/pdf
Access to full text is restricted to subscribers.
Y. Yu
T. Wang
R. J. Samworth
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:505-518.2015-07-30RePEc:oup:biomet
article
Self-consistent nonparametric maximum likelihood estimator of the bivariate survivor function
As usually formulated, the nonparametric likelihood for the bivariate survivor function is overparameterized, resulting in uniqueness problems for the corresponding nonparametric maximum likelihood estimator. Here the estimation problem is redefined to include parameters for marginal hazard rates, and for double failure hazard rates only at informative uncensored failure time grid points where there is pertinent empirical information. Double failure hazard rates at other grid points in the risk region are specified rather than estimated. With this approach the nonparametric maximum likelihood estimator is unique, and can be calculated using a two-step procedure. The first step involves setting aside all doubly censored observations that are interior to the risk region. The nonparametric maximum likelihood estimator from the remaining data turns out to be the Dabrowska (1988) estimator. The omitted doubly censored observations are included in the procedure in the second stage using self-consistency, resulting in a noniterative nonparametric maximum likelihood estimator for the bivariate survivor function. Simulation evaluation and asymptotic distributional results are provided. Moderate sample size efficiency for the survivor function nonparametric maximum likelihood estimator is similar to that for the Dabrowska estimator as applied to the entire dataset, while some useful efficiency improvement arises for the corresponding distribution function estimator, presumably due to the avoidance of negative mass assignments.
3
2014
101
Biometrika
505
518
http://hdl.handle.net/10.1093/biomet/asu010
application/pdf
Access to full text is restricted to subscribers.
R. L. Prentice
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:181-190.2015-07-30RePEc:oup:biomet
article
A tractable and interpretable four-parameter family of unimodal distributions on the circle
This article presents a class of four-parameter distributions for circular data that are unimodal, possess simple characteristic and density functions and a tractable distribution function, can be interpretably parameterized directly in terms of their trigonometric moments, afford a very wide range of skewness and kurtosis, envelop numerous interesting submodels including the wrapped Cauchy and cardioid distributions, allow straightforward parameter estimation by both method of moments and maximum likelihood, and are closed under convolution. This class of distributions exhibits the widest range of attractive properties yet available while retaining unimodality.
1
2015
102
Biometrika
181
190
http://hdl.handle.net/10.1093/biomet/asu059
application/pdf
Access to full text is restricted to subscribers.
Shogo Kato
M. C. Jones
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:613-624.2015-07-30RePEc:oup:biomet
article
Estimation of mean response via the effective balancing score
We introduce the effective balancing score for estimation of the mean response under a missing-at-random mechanism. Unlike conventional balancing scores, the proposed score is constructed via dimension reduction free of model specification. Three types of such scores are introduced, distinguished by whether they carry the covariate information about the missingness, the response, or both. The effective balancing score leads to consistent estimation with little or no loss in efficiency. Compared to existing estimators, it reduces the burden of model specification and is more robust. It is a near-automatic procedure which is most appealing when high-dimensional covariates are involved. We investigate its asymptotic and numerical properties, and illustrate its application with an HIV disease study.
3
2014
101
Biometrika
613
624
http://hdl.handle.net/10.1093/biomet/asu022
application/pdf
Access to full text is restricted to subscribers.
Zonghui Hu
Dean A. Follmann
Naisyin Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:155-173.2015-07-30RePEc:oup:biomet
article
On the stationary distribution of iterative imputations
Iterative imputation, in which variables are imputed one at a time conditional on all the others, is a popular technique that can be convenient and flexible, as it replaces a potentially difficult multivariate modelling problem with relatively simple univariate regressions. In this paper, we begin to characterize the stationary distributions of iterative imputations and their statistical properties, accounting for the conditional models being iteratively estimated from data rather than being prespecified. When the families of conditional models are compatible, we provide sufficient conditions under which the imputation distribution converges in total variation to the posterior distribution of a Bayesian model. When the conditional models are incompatible but valid, we show that the combined imputation estimator is consistent.
1
2014
101
Biometrika
155
173
http://hdl.handle.net/10.1093/biomet/ast044
application/pdf
Access to full text is restricted to subscribers.
Jingchen Liu
Andrew Gelman
Jennifer Hill
Yu-Sung Su
Jonathan Kropko
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:77-94.2015-07-30RePEc:oup:biomet
article
Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems
We develop uniformly valid confidence regions for regression coefficients in a high-dimensional sparse median regression model with homoscedastic errors. Our methods are based on a moment equation that is immunized against nonregular estimation of the nuisance part of the median regression function by using Neyman’s orthogonalization. We establish that the resulting instrumental median regression estimator of a target regression coefficient is asymptotically normally distributed uniformly with respect to the underlying sparse model and is semiparametrically efficient. We also generalize our method to a general nonsmooth Z-estimation framework where the number of target parameters is possibly much larger than the sample size. We extend Huber's results on asymptotic normality to this setting, demonstrating uniform asymptotic normality of the proposed estimators over rectangles, constructing simultaneous confidence bands on all of the target parameters, and establishing asymptotic validity of the bands uniformly over underlying approximately sparse models.
1
2015
102
Biometrika
77
94
http://hdl.handle.net/10.1093/biomet/asu056
application/pdf
Access to full text is restricted to subscribers.
A. Belloni
V. Chernozhukov
K. Kato
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:121-134.2015-07-30RePEc:oup:biomet
article
Moment-type estimators for the proportional likelihood ratio model with longitudinal data
Luo & Tsai, Biometrika 99, 211–22, 2012, proposed a proportional likelihood ratio model and discussed a maximum likelihood method for its parameter estimation. In this paper, we use this model as the marginal distribution to analyse longitudinal data, where the maximum likelihood method is not directly applicable because the joint distribution is not fully specified. We propose a moment-type method that is an extension of the generalized estimating equation method. The resulting estimators are consistent, asymptotically normal and perform well in our simulation study.
1
2015
102
Biometrika
121
134
http://hdl.handle.net/10.1093/biomet/asu055
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:567-585.2015-07-30RePEc:oup:biomet
article
New approaches to nonparametric and semiparametric regression for univariate and multivariate group testing data
We consider nonparametric and semiparametric estimation of a conditional probability curve in the case of group testing data, where the individuals are pooled randomly into groups and only the pooled data are available. We derive a nonparametric weighted estimator that has optimality properties accounting for group sizes, and show how to extend it to multivariate settings, including the partially linear model. In the group testing context, it is natural to assume that the probability curve depends on the covariates only through a linear combination of them. Motivated by this condition, we develop a nonparametric estimator based on the single-index model. We study theoretical properties of the proposed estimators and derive data-driven procedures. Practical properties of the methods are demonstrated via real and simulated examples, and our estimators are shown to have smaller median integrated square error than existing competitors.
3
2014
101
Biometrika
567
585
http://hdl.handle.net/10.1093/biomet/asu025
application/pdf
Access to full text is restricted to subscribers.
A. Delaigle
P. Hall
J. R. Wishart
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:748-754.2015-07-30RePEc:oup:biomet
article
Inference on multiple correlation coefficients with moderately high dimensional data
When the multiple correlation coefficient is used to measure how strongly a given variable can be linearly associated with a set of covariates, it suffers from an upward bias that cannot be ignored in the presence of a moderately high dimensional covariate. Under an independent component model, we derive an asymptotic approximation to the distribution of the squared multiple correlation coefficient that depends on a simple correction factor. We show that this approximation enables us to construct reliable confidence intervals on the population coefficient even when the ratio of the dimension to the sample size is close to unity and the variables are non-Gaussian.
3
2014
101
Biometrika
748
754
http://hdl.handle.net/10.1093/biomet/asu023
application/pdf
Access to full text is restricted to subscribers.
Shurong Zheng
Dandan Jiang
Zhidong Bai
Xuming He
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:673-688.2015-07-30RePEc:oup:biomet
article
The asymptotic inadmissibility of the spatial sign covariance matrix for elliptically symmetric distributions
The asymptotic efficiency of the spatial sign covariance matrix relative to affine equivariant estimators of scatter is studied. In particular, the spatial sign covariance matrix is shown to be asymptotically inadmissible, i.e., the asymptotic covariance matrix of the consistency-corrected spatial sign covariance matrix is uniformly larger than that of its affine equivariant counterpart, namely Tyler’s scatter matrix. Although the spatial sign covariance matrix has often been recommended when one is interested in principal components analysis, its inefficiency is shown to be most severe in situations where principal components are of greatest interest. Simulation shows that the inefficiency of the spatial sign covariance matrix also holds for small sample sizes, and that the asymptotic relative efficiency is a good approximation to the finite-sample efficiency for relatively modest sample sizes.
3
2014
101
Biometrika
673
688
http://hdl.handle.net/10.1093/biomet/asu020
application/pdf
Access to full text is restricted to subscribers.
Andrew F. Magyar
David E. Tyler
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:719-725.2015-07-30RePEc:oup:biomet
article
Estimation from cross-sectional samples under bias and dependence
A population can be entered at a known sequence of discrete times; it is sampled cross-sectionally, and the sojourn times of individuals in the sample are observed. It is well known that cross-sectioning leads to length-bias, but less well known and often ignored that it may also result in dependence among the observations. We show that observed sojourn times are independent only under a multinomial entrance process. We study asymptotic properties of parametric and nonparametric estimators of the sojourn time distribution that use the product of marginals in spite of dependence, and provide conditions under which this approach leads to proper inference and conditions under which it leads to improper, wrong inference. We apply the proposed methods to data on hospitalization time after bowel and hernia surgeries collected by a cross-sectional design.
3
2014
101
Biometrika
719
725
http://hdl.handle.net/10.1093/biomet/asu013
application/pdf
Access to full text is restricted to subscribers.
Micha Mandel
Yosef Rinott
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:477-483.2015-07-30RePEc:oup:biomet
article
Hypothesis testing for band size detection of high-dimensional banded precision matrices
Many statistical analysis procedures require a good estimator for a high-dimensional covariance matrix or its inverse, the precision matrix. When the precision matrix is banded, the Cholesky-based method often yields a good estimator of the precision matrix. One important aspect of this method is determination of the band size of the precision matrix. In practice, crossvalidation is commonly used; however, we show that crossvalidation not only is computationally intensive but can be very unstable. In this paper, we propose a new hypothesis testing procedure to determine the band size in high dimensions. Our proposed test statistic is shown to be asymptotically normal under the null hypothesis, and its theoretical power is studied. Numerical examples demonstrate the effectiveness of our testing procedure.
2
2014
101
Biometrika
477
483
http://hdl.handle.net/10.1093/biomet/asu006
application/pdf
Access to full text is restricted to subscribers.
Baiguo An
Jianhua Guo
Yufeng Liu
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:237-244.2015-07-30RePEc:oup:biomet
article
On adjustment for auxiliary covariates in additive hazard models for the analysis of randomized experiments
We consider additive hazard models (Aalen, 1989) for the effect of a randomized treatment on a survival outcome, adjusting for auxiliary baseline covariates. We demonstrate that the Aalen least-squares estimator of the treatment effect parameter is asymptotically unbiased, even when the hazard's dependence on time or on the auxiliary covariates is misspecified, and even away from the null hypothesis of no treatment effect. We furthermore show that adjustment for auxiliary baseline covariates does not change the asymptotic variance of the estimator of the effect of a randomized treatment. We conclude that, in view of its robustness against model misspecification, Aalen least-squares estimation is attractive for evaluating treatment effects on a survival outcome in randomized experiments, and the primary reasons to consider baseline covariate adjustment in such settings could be interest in subgroup effects or the need to adjust for informative censoring or baseline imbalances. Our results also shed light on the robustness of Aalen least-squares estimators against model misspecification in observational studies.
1
2014
101
Biometrika
237
244
http://hdl.handle.net/10.1093/biomet/ast045
application/pdf
Access to full text is restricted to subscribers.
S. Vansteelandt
T. Martinussen
E. J. Tchetgen Tchetgen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:103-120.2015-07-30RePEc:oup:biomet
article
Sparse precision matrix estimation via lasso penalized D-trace loss
We introduce a constrained empirical loss minimization framework for estimating high-dimensional sparse precision matrices and propose a new loss function, called the D-trace loss, for that purpose. A novel sparse precision matrix estimator is defined as the minimizer of the lasso penalized D-trace loss under a positive-definiteness constraint. Under a new irrepresentability condition, the lasso penalized D-trace estimator is shown to have the sparse recovery property. Examples demonstrate that the new condition can hold in situations where the irrepresentability condition for the lasso penalized Gaussian likelihood estimator fails. We establish rates of convergence for the new estimator in the elementwise maximum, Frobenius and operator norms. We develop a very efficient algorithm based on alternating direction methods for computing the proposed estimator. Simulated and real data are used to demonstrate the computational efficiency of our algorithm and the finite-sample performance of the new estimator. The lasso penalized D-trace estimator is found to compare favourably with the lasso penalized Gaussian likelihood estimator.
1
2014
101
Biometrika
103
120
http://hdl.handle.net/10.1093/biomet/ast059
application/pdf
Access to full text is restricted to subscribers.
Teng Zhang
Hui Zou
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:439-448.2015-07-30RePEc:oup:biomet
article
Propensity score adjustment with several follow-ups
Propensity score weighting adjustment is commonly used to handle unit nonresponse. When the response mechanism is nonignorable in the sense that the response probability depends directly on the study variable, a follow-up sample is commonly used to obtain an unbiased estimator using the framework of two-phase sampling, where the follow-up sample is assumed to respond completely. In practice, the follow-up sample is also subject to missingness. We consider propensity score weighting adjustment for nonignorable nonresponse when there are several follow-ups and the final follow-up sample is also subject to missingness. We propose a method-of-moments estimator for estimating parameters in the response probability. The proposed method can be implemented using the generalized method of moments and a consistent variance estimate can be obtained relatively easily. A limited simulation study shows the robustness of the proposed method. The proposed methods are applied to a Korean household survey of employment.
2
2014
101
Biometrika
439
448
http://hdl.handle.net/10.1093/biomet/asu003
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
Jongho Im
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:927-942.2015-07-30RePEc:oup:biomet
article
Testing independence and goodness-of-fit in linear models
We consider a linear regression model and propose an omnibus test to simultaneously check the assumption of independence between the error and predictor variables and the goodness-of-fit of the parametric model. Our approach is based on testing for independence between the predictor and the residual obtained from the parametric fit by using the Hilbert–Schmidt independence criterion (Gretton et al., 2008). The proposed method requires no user-defined regularization, is simple to compute based on only pairwise distances between points in the sample, and is consistent against all alternatives. We develop distribution theory for the proposed test statistic, under both the null and the alternative hypotheses, and devise a bootstrap scheme to approximate its null distribution. We prove the consistency of the bootstrap scheme. A simulation study shows that our method has better power than its main competitors. Two real datasets are analysed to demonstrate the scope and usefulness of our method.
4
2014
101
Biometrika
927
942
http://hdl.handle.net/10.1093/biomet/asu026
application/pdf
Access to full text is restricted to subscribers.
A. Sen
B. Sen
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:175-188.2015-07-30RePEc:oup:biomet
article
Protective estimation of mixed-effects logistic regression when data are not missing at random
We consider estimation of mixed-effects logistic regression models for longitudinal data when missing outcomes are not missing at random. A typology of missingness mechanisms is presented that includes missingness dependent on observed or missing current outcomes, observed or missing lagged outcomes and subject-specific effects. When data are not missing at random, consistent estimation by maximum marginal likelihood generally requires correct parametric modelling of the missingness mechanism, which hinges on unverifiable assumptions. We show that standard maximum conditional likelihood estimators are protective in the sense that they are consistent for monotone or intermittent missing data under a wide range of missingness mechanisms. Our approach requires neither specification of parametric models for the missingness mechanism nor refreshment samples and is straightforward to implement in standard software.
1
2014
101
Biometrika
175
188
http://hdl.handle.net/10.1093/biomet/ast054
application/pdf
Access to full text is restricted to subscribers.
A. Skrondal
S. Rabe-Hesketh
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:449-464.2015-07-30RePEc:oup:biomet
article
Testing equality of a large number of densities
The problem of testing equality of a large number of densities is considered. The classical k-sample problem compares a small, fixed number of distributions and allows the sample size from each distribution to increase without bound. In our asymptotic analysis the number of distributions tends to infinity but the size of individual samples remains fixed. The proposed test statistic is motivated by the simple idea of comparing kernel density estimators from the various samples to the average of all density estimators. However, a novel interpretation of this familiar type of statistic arises upon centring it. The asymptotic distribution of the statistic under the null hypothesis of equal densities is derived, and power against local alternatives is considered. It is shown that a consistent test is attainable in many situations where all but a vanishingly small proportion of densities are equal to each other. The test is studied via simulation, and an illustration involving microarray data is provided.
2
2014
101
Biometrika
449
464
http://hdl.handle.net/10.1093/biomet/asu002
application/pdf
Access to full text is restricted to subscribers.
D. Zhan
J. D. Hart
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:519-533.2015-07-30RePEc:oup:biomet
article
Nonparametric inference on bivariate survival data with interval sampling: association estimation and testing
In many biomedical applications, interest focuses on the occurrence of two or more consecutive failure events and the relationship between event times, such as age of disease onset and residual lifetime. Bivariate survival data with interval sampling arise frequently when disease registries or surveillance systems collect data based on disease incidence occurring within a specific calendar time interval. The initial event is then retrospectively confirmed and the subsequent failure event may be observed during follow-up. In life history studies, the initial and two consecutive failure events could correspond to birth, disease onset and death. The statistical features and bias of observed data in relation to interval sampling were discussed by Zhu & Wang (2012). Here we propose nonparametric estimation of the association between bivariate failure times based on Kendall’s tau for data collected with interval sampling. A nonparametric estimator is given, where the contribution of each comparable and orderable pair is weighted by the inverse of the associated selection probability. Analysis methods for bivariate survival data with interval sampling rely on the assumption of quasi-independence, i.e., that bivariate failure times and the time of the initial event are independent in the observable region. This paper develops a nonparametric test of quasi-independence based on a bivariate conditional Kendall’s tau for such data. Simulation studies demonstrate that the association estimator and testing procedure perform well with moderate sample sizes. Illustrations with two real datasets are provided.
3
2014
101
Biometrika
519
533
http://hdl.handle.net/10.1093/biomet/asu005
application/pdf
Access to full text is restricted to subscribers.
Hong Zhu
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:15-32.2015-07-30RePEc:oup:biomet
article
Varying-coefficient additive models for functional data
Both varying-coefficient and additive models have been studied extensively in the literature as extensions to linear models. They have also been extended to deal with functional response data. However, existing extensions are still not flexible enough to reflect the functional nature of the responses. In this paper, we extend varying-coefficient and additive models to obtain a much more flexible model and propose a simple algorithm to estimate its nonparametric additive and varying-coefficient components. We establish the asymptotic properties of each component function. We demonstrate the applicability of the new model through analysis of traffic data.
1
2015
102
Biometrika
15
32
http://hdl.handle.net/10.1093/biomet/asu053
application/pdf
Access to full text is restricted to subscribers.
Xiaoke Zhang
Jane-Ling Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:957-963.2015-07-30RePEc:oup:biomet
article
Nearly orthogonal arrays mappable into fully orthogonal arrays
We develop a method for construction of arrays which are nearly orthogonal, in the sense that each column is orthogonal to a large proportion of the other columns, and which are convertible to fully orthogonal arrays via a mapping of the symbols in each column to a possibly smaller set of symbols. These arrays can be useful in computer experiments as designs which accommodate a large number of factors and enjoy attractive space-filling properties. Our construction allows both the mappable nearly orthogonal array and the consequent fully orthogonal array to be either symmetric or asymmetric. Resolvable orthogonal arrays play a key role in the construction.
4
2014
101
Biometrika
957
963
http://hdl.handle.net/10.1093/biomet/asu042
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Fasheng Sun
Boxin Tang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:553-566.2015-07-30RePEc:oup:biomet
article
Statistical inference methods for recurrent event processes with shape and size parameters
This paper proposes a unified framework to characterize the rate function of a recurrent event process through shape and size parameters. In contrast to the intensity function, which is the event occurrence rate conditional on the event history, the rate function is the occurrence rate unconditional on the event history, and thus it can be interpreted as a population-averaged count of events in unit time. In this paper, shape and size parameters are introduced and used to characterize the association between the rate function λ(⋅) and a random variable X. Measures of association between X and λ(⋅) are defined via shape- and size-based coefficients. Rate-independence of X and λ(⋅) is studied through tests of shape-independence and size-independence, where the shape- and size-based test statistics can be used separately or in combination. These tests can be applied when X is a covariable possibly correlated with the recurrent event process through λ(⋅) or, in the one-sample setting, when X is the censoring time at which observation of the recurrent event counting process N(⋅) is terminated. Because the proposed tests are shape- and size-based, when a null hypothesis is rejected the test results can serve to distinguish the source of violation.
3
2014
101
Biometrika
553
566
http://hdl.handle.net/10.1093/biomet/asu016
application/pdf
Access to full text is restricted to subscribers.
Mei-Cheng Wang
Chiung-Yu Huang
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:479-485.2015-07-30RePEc:oup:biomet
article
Effective degrees of freedom: a flawed metaphor
To most applied statisticians, a fitting procedure’s degrees of freedom is synonymous with its model complexity, or its capacity for overfitting to data. In particular, the degrees of freedom is often used to parameterize the bias-variance trade-off in model selection. We argue that, on the contrary, model complexity and degrees of freedom may correspond very poorly. We exhibit and theoretically explore various fitting procedures for which the degrees of freedom is not monotonic in the model complexity parameter and can exceed the total dimension of the ambient space even in very simple settings. We show that the degrees of freedom for any nonconvex projection method can be unbounded.
2
2015
102
Biometrika
479
485
http://hdl.handle.net/10.1093/biomet/asv019
application/pdf
Access to full text is restricted to subscribers.
Lucas Janson
William Fithian
Trevor J. Hastie
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:641-654.2015-07-30RePEc:oup:biomet
article
Latent factor models for density estimation
Although discrete mixture modelling has formed the backbone of the literature on Bayesian density estimation, there are some well-known disadvantages. As an alternative to discrete mixtures, we propose a class of priors based on random nonlinear functions of a uniform latent variable with an additive residual. The induced prior for the density is shown to have desirable properties, including ease of centring on an initial guess, large support, posterior consistency and straightforward computation via Gibbs sampling. Some advantages over discrete mixtures, such as Dirichlet process mixtures of Gaussian kernels, are discussed and illustrated via simulations and an application.
3
2014
101
Biometrika
641
654
http://hdl.handle.net/10.1093/biomet/asu019
application/pdf
Access to full text is restricted to subscribers.
S. Kundu
D. B. Dunson
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:57-70.2015-07-30RePEc:oup:biomet
article
Asymptotic properties for combined L1 and concave regularization
Two important goals of high-dimensional modelling are prediction and variable selection. In this article, we consider regularization with combined L1 and concave penalties, and study the sampling properties of the global optimum of the suggested method in ultrahigh-dimensional settings. The L1 penalty provides the minimum regularization needed for removing noise variables in order to achieve oracle prediction risk, while a concave penalty imposes additional regularization to control model sparsity. In the linear model setting, we prove that the global optimum of our method enjoys the same oracle inequalities as the lasso estimator and admits an explicit bound on the false sign rate, which can be asymptotically vanishing. Moreover, we establish oracle risk inequalities for the method and the sampling properties of computable solutions. Numerical studies suggest that our method yields more stable estimates than using a concave penalty alone.
1
2014
101
Biometrika
57
70
http://hdl.handle.net/10.1093/biomet/ast047
application/pdf
Access to full text is restricted to subscribers.
Yingying Fan
Jinchi Lv
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:992-998.2015-07-30RePEc:oup:biomet
article
Robust Bayesian variable selection in linear models with spherically symmetric errors
This paper studies Bayesian variable selection in linear models with general spherically symmetric error distributions. We construct the posterior odds based on a separable prior, which arises as a class of mixtures of Gaussian densities. The posterior odds for comparing among nonnull models are shown to be independent of the error distribution, provided it is spherically symmetric. Because of this invariance, we refer to our method as a robust Bayesian variable selection method. We demonstrate that our posterior odds have model selection consistency, and that the priors in our class are the only ones within a large class that are robust in our sense.
4
2014
101
Biometrika
992
998
http://hdl.handle.net/10.1093/biomet/asu039
application/pdf
Access to full text is restricted to subscribers.
Yuzo Maruyama
William E. Strawderman
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:587-598.2015-07-30RePEc:oup:biomet
article
Semiparametric group testing regression models
Group testing, through the use of pooling, has proven to be an efficient method of reducing the time and cost associated with screening for a binary characteristic of interest, such as infection status. A topic of key interest in the statistical literature involves the development of regression models that relate individual-level covariates to testing responses observed from pooled specimens. In this article, we propose a general semiparametric framework that allows for the inclusion of multi-dimensional covariates, decoding information, and imperfect testing. The asymptotic properties of our estimators are presented and guidance on finite sample implementation is provided. We illustrate the performance of our methods through simulation and by applying them to chlamydia and gonorrhea data collected by the Nebraska Public Health Laboratory as a part of the Infertility Prevention Project.
3
2014
101
Biometrika
587
598
http://hdl.handle.net/10.1093/biomet/asu007
application/pdf
Access to full text is restricted to subscribers.
D. Wang
C. S. McMahan
C. M. Gallagher
K. B. Kulasekera
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:409-422.2015-07-30RePEc:oup:biomet
article
Distances and inference for covariance operators
A framework is developed for inference concerning the covariance operator of a functional random process, where the covariance operator itself is an object of interest for statistical analysis. Distances for comparing positive-definite covariance matrices are either extended or shown to be inapplicable to functional data. In particular, an infinite-dimensional analogue of the Procrustes size-and-shape distance is developed. Convergence of finite-dimensional approximations to the infinite-dimensional distance metrics is also shown. For inference, a Fréchet estimator of both the covariance operator itself and the average covariance operator is introduced. A permutation procedure to test the equality of the covariance operators between two groups is also considered. Additionally, the use of such distances for extrapolation to make predictions is explored. As an illustration of the proposed methodology, we consider a philological study of cross-linguistic dependence in which covariance operators have been suggested as a way to incorporate quantitative phonetic information. It is shown that distances between languages derived from phonetic covariance functions can provide insight into the relationships between the Romance languages.
2
2014
101
Biometrika
409
422
http://hdl.handle.net/10.1093/biomet/asu008
application/pdf
Access to full text is restricted to subscribers.
Davide Pigoli
John A. D. Aston
Ian L. Dryden
Piercesare Secchi
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:689-702.2015-07-30RePEc:oup:biomet
article
Multivariate functional-coefficient regression models for nonlinear vector time series data
Vector time series data are widely encountered in practice. In this paper we propose a multivariate functional-coefficient regression model with heteroscedasticity for modelling such data. A local linear smoother is employed to estimate the unknown coefficient matrices. Asymptotic normality of the proposed estimators is established, and bandwidth selection is considered. To deal with the co-integration commonly observed in financial markets, we propose an error-corrected multivariate functional-coefficient model. Simulations show that our proposed estimation procedures capture nonlinear structures of coefficients well. Analysis of United States interest rates illustrates the proposed methods.
3
2014
101
Biometrika
689
702
http://hdl.handle.net/10.1093/biomet/asu011
application/pdf
Access to full text is restricted to subscribers.
Jiancheng Jiang
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:741-747.2015-07-30RePEc:oup:biomet
article
Construction of orthogonal and nearly orthogonal designs for computer experiments
This paper presents new infinite families of orthogonal designs for computer experiments. In cases where orthogonal designs cannot exist, we construct alternative, nearly orthogonal designs. Our designs can accommodate many factors and a large set of levels. No iterative computer search is required. To build up the desired orthogonal designs we develop and use new infinite classes of periodic Golay pairs.
3
2014
101
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/asu021
application/pdf
Access to full text is restricted to subscribers.
S. D. Georgiou
S. Stylianou
K. Drosou
C. Koukouvinos
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:726-732.2015-07-30RePEc:oup:biomet
article
Simple relaxed conditional likelihood
When the data are sparse but not exceedingly so, we face a trade-off between bias and precision that makes the usual choice between conducting either a fully unconditional inference or a fully conditional inference unduly restrictive. We propose a method to relax the conditional inference that relies upon commonly available computer outputs. In the rectangular array asymptotic setting, the relaxed conditional maximum likelihood estimator has smaller bias than the unconditional estimator and smaller mean square error than the conditional estimator.
3
2014
101
Biometrika
726
732
http://hdl.handle.net/10.1093/biomet/asu028
application/pdf
Access to full text is restricted to subscribers.
John J. Hanfelt
Lijia Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:253-268.2015-07-30RePEc:oup:biomet
article
Direct estimation of differential networks
It is often of interest to understand how the structure of a genetic network differs between two conditions. In this paper, each condition-specific network is modelled using the precision matrix of a multivariate normal random vector, and a method is proposed to directly estimate the difference of the precision matrices. In contrast to other approaches, such as separate or joint estimation of the individual matrices, direct estimation does not require those matrices to be sparse, and thus can allow the individual networks to contain hub nodes. Under the assumption that the true differential network is sparse, the direct estimator is shown to be consistent in support recovery and estimation. It is also shown to outperform existing methods in simulations, and its properties are illustrated on gene expression data from late-stage ovarian cancer patients.
2
2014
101
Biometrika
253
268
http://hdl.handle.net/10.1093/biomet/asu009
application/pdf
Access to full text is restricted to subscribers.
Sihai Dave Zhao
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:785-797.2015-07-30RePEc:oup:biomet
article
Variable selection in regression with compositional covariates
Motivated by research problems arising in the analysis of gut microbiome and metagenomic data, we consider variable selection and estimation in high-dimensional regression with compositional covariates. We propose an ℓ1 regularization method for the linear log-contrast model that respects the unique features of compositional data. We formulate the proposed procedure as a constrained convex optimization problem and introduce a coordinate descent method of multipliers for efficient computation. In the high-dimensional setting where the dimensionality grows at most exponentially with the sample size, model selection consistency and ℓ∞ bounds for the resulting estimator are established under conditions that are mild and interpretable for compositional data. The numerical performance of our method is evaluated via simulation studies and its usefulness is illustrated by an application to a microbiome study relating human body mass index to gut microbiome composition.
4
2014
101
Biometrika
785
797
http://hdl.handle.net/10.1093/biomet/asu031
application/pdf
Access to full text is restricted to subscribers.
Wei Lin
Pixu Shi
Rui Feng
Hongzhe Li
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:65-76.2015-07-30RePEc:oup:biomet
article
Conditional quantile screening in ultrahigh-dimensional heterogeneous data
To accommodate the heterogeneity that is often present in ultrahigh-dimensional data, we propose a conditional quantile screening method, which enables us to select features that contribute to the conditional quantile of the response given the covariates. The method can naturally handle censored data by incorporating a weighting scheme through redistribution of the mass to the right; moreover, it is invariant to monotone transformation of the response and requires substantially weaker conditions than do alternative methods. We establish sure independence screening properties for both the complete and the censored response cases. We also conduct simulations to evaluate the finite-sample performance of the proposed method, and compare it with existing approaches.
1
2015
102
Biometrika
65
76
http://hdl.handle.net/10.1093/biomet/asu068
application/pdf
Access to full text is restricted to subscribers.
Yuanshan Wu
Guosheng Yin
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:95-106.2015-07-30RePEc:oup:biomet
article
Dimension reduction based on the Hellinger integral
Sufficient dimension reduction is a useful tool for studying the dependence between a response and a multi-dimensional predictor. In this article, a new formulation is proposed that is based on the Hellinger integral of order two, introduced as a natural measure of the regression information contained in the predictor subspace. The response may be either continuous or discrete. We establish links between local and global central subspaces, and propose an efficient local estimation algorithm. Simulations and an application show that our method compares favourably with existing approaches.
1
2015
102
Biometrika
95
106
http://hdl.handle.net/10.1093/biomet/asu062
application/pdf
Access to full text is restricted to subscribers.
Qin Wang
Xiangrong Yin
Frank Critchley
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:913-926.2015-07-30RePEc:oup:biomet
article
A distribution-free two-sample run test applicable to high-dimensional data
We propose a multivariate generalization of the univariate two-sample run test based on the shortest Hamiltonian path. The proposed test is distribution-free in finite samples. While most existing two-sample tests perform poorly or are even inapplicable to high-dimensional data, our test can be conveniently used in high-dimension, low-sample-size situations. We investigate its power when the sample size remains fixed and the dimension of the data grows to infinity. Simulated and real datasets demonstrate our method’s superiority over existing nonparametric two-sample tests.
4
2014
101
Biometrika
913
926
http://hdl.handle.net/10.1093/biomet/asu045
application/pdf
Access to full text is restricted to subscribers.
Munmun Biswas
Minerva Mukhopadhyay
Anil K. Ghosh
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:865-882.2015-07-30RePEc:oup:biomet
article
Robust estimators for nondecomposable elliptical graphical models
Robust estimators of the restricted covariance matrices associated with elliptical graphical models are studied. General asymptotic results, which apply to both decomposable and nondecomposable graphical models, are presented for robust plug-in type estimators. These extend results previously established only for the decomposable case. Furthermore, a class of graphical M-estimators for the restricted covariance matrices is introduced and compared with the corresponding plug-in M-estimators. The two approaches are shown to be asymptotically equivalent under random sampling from an elliptical distribution. A simulation study demonstrates the superiority of the graphical M-estimators for small samples.
4
2014
101
Biometrika
865
882
http://hdl.handle.net/10.1093/biomet/asu041
application/pdf
Access to full text is restricted to subscribers.
D. Vogel
D. E. Tyler
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:345-358.2015-07-30RePEc:oup:biomet
article
On the dependence structure of bivariate recurrent event processes: inference and estimation
Bivariate or multivariate recurrent event processes are often encountered in longitudinal studies in which more than one type of event is of interest. There has been much research on regression analysis for such data, but little has been done to measure the dependence between recurrent event processes. We propose a time-dependent measure, termed the rate ratio, to assess the local dependence between two types of recurrent event processes. We model the rate ratio as a parametric function of time, and leave unspecified all other aspects of the distribution. We develop a composite likelihood procedure for model fitting and parameter estimation. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is evaluated by simulation and illustrated by an application to a soft tissue sarcoma study.
2
2015
102
Biometrika
345
358
http://hdl.handle.net/10.1093/biomet/asu073
application/pdf
Access to full text is restricted to subscribers.
Jing Ning
Yong Chen
Chunyan Cai
Xuelin Huang
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:999-1002.2015-07-30RePEc:oup:biomet
article
General type-token distribution
We consider the problem of estimating the number of types in a corpus using the number of types observed in a sample of tokens from that corpus. We derive exact and asymptotic distributions for the number of observed types, conditioned on the number of tokens and the latent type distribution. We use the asymptotic distributions to derive an estimator of the latent number of types and validate this estimator numerically.
4
2014
101
Biometrika
999
1002
http://hdl.handle.net/10.1093/biomet/asu035
application/pdf
Access to full text is restricted to subscribers.
S. Hidaka
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:943-956.2015-07-30RePEc:oup:biomet
article
Circular designs balanced for neighbours at distances one and two
We define three types of neighbour-balanced designs for experiments where the units are arranged in a circle or single line in space or time. The designs are balanced with respect to neighbours at distance one and at distance two. The variants come from allowing or forbidding self-neighbours, and from considering neighbours to be directed or undirected. For two of the variants, we give a method of constructing a design for all values of the number of treatments, except for some small values where it is impossible. In the third case, we give a partial solution that covers all sizes likely to be used in practice.
4
2014
101
Biometrika
943
956
http://hdl.handle.net/10.1093/biomet/asu036
application/pdf
Access to full text is restricted to subscribers.
R. E. L. Aldred
R. A. Bailey
Brendan D. Mckay
Ian M. Wanless
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:499-504.2015-07-30RePEc:oup:biomet
article
Multiscale variance stabilization via maximum likelihood
This article proposes maximum likelihood approaches for multiscale variance stabilization transformations for independently and identically distributed data. For two multiscale variance stabilization transformations we present new unified theoretical results on their Jacobians, a key component of the likelihood. The results provide a deeper understanding of the transformations and the ability to compute the likelihood in linear time. The transformations are shown empirically to compare favourably with the Box–Cox transformation.
2
2014
101
Biometrika
499
504
http://hdl.handle.net/10.1093/biomet/ast072
application/pdf
Access to full text is restricted to subscribers.
G. P. Nason
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:33-45.2015-07-30RePEc:oup:biomet
article
Covariance-enhanced discriminant analysis
Linear discriminant analysis has been widely used to characterize or separate multiple classes via linear combinations of features. However, the high dimensionality of features from modern biological experiments defies traditional discriminant analysis techniques. Possible interfeature correlations present additional challenges and are often underused in modelling. In this paper, by incorporating possible interfeature correlations, we propose a covariance-enhanced discriminant analysis method that simultaneously and consistently selects informative features and identifies the corresponding discriminable classes. Under mild regularity conditions, we show that the method can achieve consistent parameter estimation and model selection, and can attain an asymptotically optimal misclassification rate. Extensive simulations have verified the utility of the method, which we apply to a renal transplantation trial.
1
2015
102
Biometrika
33
45
http://hdl.handle.net/10.1093/biomet/asu049
application/pdf
Access to full text is restricted to subscribers.
Peirong Xu
Ji Zhu
Lixing Zhu
Yi Li
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:883-898.2015-07-30RePEc:oup:biomet
article
Nonparametric Bayes dynamic modelling of relational data
Symmetric binary matrices representing relations are collected in many areas. Our focus is on dynamically evolving binary relational matrices, with interest centred on inference about the relationship structure and on prediction. We propose a nonparametric Bayesian dynamic model, which reduces dimensionality in characterizing the binary matrix through a lower-dimensional latent space representation, with the latent coordinates evolving in continuous time via Gaussian processes. By using a logistic mapping function from the link probability matrix space to the latent relational space, we obtain a flexible and computationally tractable formulation. Employing Pólya-gamma data augmentation, an efficient Gibbs sampler is developed for posterior computation, with the dimension of the latent space automatically inferred. We provide theoretical results on flexibility of the model, and illustrate its performance via simulation experiments. We also consider an application to co-movements in world financial markets.
4
2014
101
Biometrika
883
898
http://hdl.handle.net/10.1093/biomet/asu040
application/pdf
Access to full text is restricted to subscribers.
Daniele Durante
David B. Dunson
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:247-266.2015-07-30RePEc:oup:biomet
article
Testing differential networks with applications to the detection of gene-gene interactions
Model organisms and human studies have yielded increasing empirical evidence that interactions among genes contribute broadly to genetic variation of complex traits. In the presence of gene-gene interactions, the dimensionality of the feature space becomes extremely high relative to the sample size. This poses a significant methodological challenge in the identification of gene-gene interactions. In this paper, by using a Gaussian graphical model framework, we translate the problem of identifying gene-gene interactions associated with a binary trait D into an inference problem on the difference of two high-dimensional precision matrices that summarize the conditional dependence network structures of the genes. We propose a procedure for testing the differential network globally, which is particularly powerful against sparse alternatives. In addition, a multiple testing procedure with false discovery rate control is developed to infer the specific structure of the differential network. Theoretical justification is provided to ensure the validity of the proposed tests, and optimality results are derived under sparsity assumptions. Through a simulation study we demonstrate that the proposed tests maintain the desired error rates under the null hypothesis and have good power under the alternative hypothesis. The methods are applied to a breast cancer gene expression study.
2
2015
102
Biometrika
247
266
http://hdl.handle.net/10.1093/biomet/asu074
application/pdf
Access to full text is restricted to subscribers.
Yin Xia
Tianxi Cai
T. Tony Cai
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:135-150.2015-07-30RePEc:oup:biomet
article
An extended hazard model with longitudinal covariates
In clinical trials and other medical studies, it has become increasingly common to observe simultaneously an event time of interest and longitudinal covariates. In the literature, joint modelling approaches have been employed to analyse both survival and longitudinal processes and to investigate their association. However, these approaches focus mostly on developing adaptive and flexible longitudinal processes based on a prespecified survival model, most commonly the Cox proportional hazards model. In this paper, we propose a general class of semiparametric hazard regression models, referred to as the extended hazard model, for the survival component. This class includes two popular survival models, the Cox proportional hazards model and the accelerated failure time model, as special cases. The proposed model is flexible for modelling event data, and its nested structure facilitates model selection for the survival component through likelihood ratio tests. A pseudo joint likelihood approach is proposed for estimating the unknown parameters and components via a Monte Carlo EM algorithm. Asymptotic theory for the estimators is developed together with theory for the semiparametric likelihood ratio tests. The performance of the procedure is demonstrated through simulation studies. A case study featuring data from a Taiwanese HIV/AIDS cohort study further illustrates the usefulness of the extended hazard model.
1
2015
102
Biometrika
135
150
http://hdl.handle.net/10.1093/biomet/asu058
application/pdf
Access to full text is restricted to subscribers.
Y. K. Tseng
Y. R. Su
M. Mao
J. L. Wang
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:491-498.2015-07-30RePEc:oup:biomet
article
Sequential combination of weighted and nonparametric bagging for classification
We propose a simple sequential procedure for bagged classification, which modifies nonparametric bagging by randomizing class labels of resampled data points. The random labelling feature of the procedure also enables us to undertake unsupervised classification with the benefit of supervised learning. Theoretical properties are given for the nearest neighbour classifier in the case of supervised learning and a hard-thresholding indicator in the case of unsupervised learning, showing that sequential bagging accelerates convergence of the bagged predictor to the Bayes rule. Simulation results are provided in support of the proposed method.
2
2014
101
Biometrika
491
498
http://hdl.handle.net/10.1093/biomet/ast068
application/pdf
Access to full text is restricted to subscribers.
M. Soleymani
S. M. S. Lee
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:17-36.2015-07-30RePEc:oup:biomet
article
A sum characterization of hidden regular variation with likelihood inference via expectation-maximization
A fundamental deficiency of classical multivariate extreme value theory is the inability to distinguish between asymptotic independence and exact independence. In this work, we examine multivariate threshold modelling in the framework of regular variation on cones. Tail dependence is described by a limiting measure, which in some cases is degenerate on joint tail regions despite strong subasymptotic dependence in such regions. Hidden regular variation, a higher-order tail decay on these regions, offers a refinement of the classical theory. We develop a representation of random vectors possessing hidden regular variation as the sum of independent regularly varying components. The representation is shown to be asymptotically valid via a multivariate tail equivalence result. We develop a likelihood-based estimation procedure from this representation via a Monte Carlo expectation-maximization algorithm which has been modified for tail estimation. The method is demonstrated on simulated data and applied to air pollution measurements.
1
2014
101
Biometrika
17
36
http://hdl.handle.net/10.1093/biomet/ast046
application/pdf
Access to full text is restricted to subscribers.
Grant B. Weller
Daniel Cooley
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:252.2015-07-30RePEc:oup:biomet
article
‘Objective Bayesian analysis for the Student-t regression model’
1
2014
101
Biometrika
252
252
http://hdl.handle.net/10.1093/biomet/asu001
application/pdf
Access to full text is restricted to subscribers.
T. C. O. Fonseca
M. A. R. Ferreira
H. S. Migon
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:295-313.2015-07-30RePEc:oup:biomet
article
Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator
When an unbiased estimator of the likelihood is used within a Metropolis–Hastings chain, it is necessary to trade off the number of Monte Carlo samples used to construct this estimator against the asymptotic variances of the averages computed under this chain. Using many Monte Carlo samples will typically result in Metropolis–Hastings averages with lower asymptotic variances than the corresponding averages that use fewer samples; however, the computing time required to construct the likelihood estimator increases with the number of samples. Under the assumption that the distribution of the additive noise introduced by the loglikelihood estimator is Gaussian with variance inversely proportional to the number of samples and independent of the parameter value at which it is evaluated, we provide guidelines on the number of samples to select. We illustrate our results by considering a stochastic volatility model applied to stock index returns.
2
2015
102
Biometrika
295
313
http://hdl.handle.net/10.1093/biomet/asu075
application/pdf
Access to full text is restricted to subscribers.
A. Doucet
M. K. Pitt
G. Deligiannidis
R. Kohn
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:815-829.2015-07-30RePEc:oup:biomet
article
Transformed sufficient dimension reduction
We propose a general framework for dimension reduction in regression to fill the gap between linear and fully nonlinear dimension reduction. The main idea is to first transform each of the raw predictors monotonically and then search for a low-dimensional projection in the space defined by the transformed variables. Both user-specified and data-driven transformations are suggested. In each case, the methodology is first discussed in generality and then a representative method is proposed and evaluated by simulation. The proposed methods are applied to a real dataset.
4
2014
101
Biometrika
815
829
http://hdl.handle.net/10.1093/biomet/asu037
application/pdf
Access to full text is restricted to subscribers.
T. Wang
X. Guo
L. Zhu
P. Xu
oai:RePEc:oup:biomet:v:101:y:2014:i:1:p:189-204.2015-07-30RePEc:oup:biomet
article
Retrospective-prospective symmetry in the likelihood and Bayesian analysis of case-control studies
Prentice & Pyke (1979) established that the maximum likelihood estimate of an odds ratio in a case-control study is the same as would be found by fitting a logistic regression; in other words, for this specific target the incorrect prospective model is inferentially equivalent to the correct retrospective model. Similar results have been obtained for other models, and conditions have also been identified under which the corresponding Bayesian property holds, namely that the posterior distribution of the odds ratio is the same whether it is computed using the prospective or the retrospective likelihood. In this article we demonstrate how these results follow directly from certain parameter independence properties of the models and priors, and identify prior laws that support such reverse analysis, for both standard and stratified designs.
1
2014
101
Biometrika
189
204
http://hdl.handle.net/10.1093/biomet/ast050
application/pdf
Access to full text is restricted to subscribers.
Simon P. J. Byrne
A. Philip Dawid
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:771-784.2015-07-30RePEc:oup:biomet
article
When does more regularization imply fewer degrees of freedom? Sufficient conditions and counterexamples
Regularization aims to improve prediction performance by trading an increase in training error for better agreement between training and prediction errors, which is often captured through decreased degrees of freedom. In this paper we give examples which show that regularization can increase the degrees of freedom in common models, including the lasso and ridge regression. In such situations, both training error and degrees of freedom increase, making the regularization inherently without merit. Two important scenarios are described where the expected reduction in degrees of freedom is guaranteed: all symmetric linear smoothers and convex constrained linear regression models like ridge regression and the lasso, when compared to unconstrained linear regression.
4
2014
101
Biometrika
771
784
http://hdl.handle.net/10.1093/biomet/asu034
application/pdf
Access to full text is restricted to subscribers.
S. Kaufman
S. Rosset
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:799-814.2015-07-30RePEc:oup:biomet
article
Censored rank independence screening for high-dimensional survival data
In modern statistical applications, the dimension of covariates can be much larger than the sample size. In the context of linear models, correlation screening (Fan & Lv, J. R. Statist. Soc. B, 70, 849–911, 2008) has been shown to reduce the dimension of such data effectively while achieving the sure screening property, i.e., all of the active variables can be retained with high probability. However, screening based on the Pearson correlation does not perform well when applied to contaminated covariates and/or censored outcomes. In this paper, we study censored rank independence screening of high-dimensional survival data. The proposed method is robust to predictors that contain outliers, works for a general class of survival models, and enjoys the sure screening property. Simulations and an analysis of real data demonstrate that the proposed method performs competitively on survival datasets of moderate size and high-dimensional predictors, even when these are contaminated.
4
2014
101
Biometrika
799
814
http://hdl.handle.net/10.1093/biomet/asu047
application/pdf
Access to full text is restricted to subscribers.
Rui Song
Wenbin Lu
Shuangge Ma
X. Jessie Jeng
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:409-420.2015-07-30RePEc:oup:biomet
article
A validated information criterion to determine the structural dimension in dimension reduction models
A crucial component of performing sufficient dimension reduction is to determine the structural dimension of the reduction model. We propose a novel information criterion-based method for this purpose, a special feature of which is that when examining the goodness-of-fit of the current model, one needs to perform model evaluation by using an enlarged candidate model. Although the procedure does not require estimation under the enlarged model of dimension k+1, the decision as to how well the current model of dimension k fits relies on the validation provided by the enlarged model; thus we call this procedure the validated information criterion, vic(k). Our method is different from existing information criterion-based model selection methods; it breaks free from dependence on the connection between dimension reduction models and their corresponding matrix eigenstructures, which relies heavily on a linearity condition that we no longer assume. We prove consistency of the proposed method, and its finite-sample performance is demonstrated numerically.
2
2015
102
Biometrika
409
420
http://hdl.handle.net/10.1093/biomet/asv004
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Xinyu Zhang
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:899-911.2015-07-30RePEc:oup:biomet
article
A class of improved hybrid Hochberg–Hommel type step-up multiple test procedures
In this paper we derive a new p-value based multiple testing procedure that improves upon the Hommel procedure by gaining power as well as having a simpler step-up structure similar to the Hochberg procedure. The key to this improvement is that the Hommel procedure can be improved by a consonant procedure. Exact critical constants of this new procedure can be numerically determined. The zeroth-order approximations to the exact critical constants, albeit slightly conservative, are simple to use and need no tabling, and hence are recommended in practice. The proposed procedure is shown to control the familywise error rate under independence among the p-values. Simulations empirically demonstrate familywise error rate control under positive and negative dependence. Power superiority of the proposed procedure over competing ones is also empirically demonstrated. Illustrative examples are given.
4
2014
101
Biometrika
899
911
http://hdl.handle.net/10.1093/biomet/asu032
application/pdf
Access to full text is restricted to subscribers.
Jiangtao Gou
Ajit C. Tamhane
Dong Xi
Dror Rom
oai:RePEc:oup:biomet:v:102:y:2015:i:2:p:486-493.2015-07-30RePEc:oup:biomet
article
Semiparametric exponential families for heavy-tailed data
We propose a semiparametric method for fitting the tail of a heavy-tailed population given a relatively small sample from that population and a larger sample from a related background population. We model the tail of the small sample as an exponential tilt of the better-observed large-sample tail, using a robust sufficient statistic motivated by extreme value theory. In particular, our method induces an estimator of the small-population mean, and we give theoretical and empirical evidence that this estimator outperforms methods that do not use the background sample. We demonstrate substantial efficiency gains over competing methods in simulation and on data from a large controlled experiment conducted by Facebook.
2
2015
102
Biometrika
486
493
http://hdl.handle.net/10.1093/biomet/asu065
application/pdf
Access to full text is restricted to subscribers.
William Fithian
Stefan Wager
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:755-769.2015-07-30RePEc:oup:biomet
article
Classification with confidence
A framework for classification is developed with a notion of confidence. In this framework, a classifier consists of two tolerance regions in the predictor space, with a specified coverage level for each class. The classifier also produces an ambiguous region where the classification needs further investigation. Theoretical analysis reveals interesting structures of the confidence-ambiguity trade-off, and the optimal solution is characterized by extending the Neyman–Pearson lemma. We provide general estimating procedures, along with rates of convergence, based on estimates of the conditional probabilities. The method can be easily implemented with good robustness, as illustrated through theory, simulation and a data example.
4
2014
101
Biometrika
755
769
http://hdl.handle.net/10.1093/biomet/asu038
application/pdf
Access to full text is restricted to subscribers.
Jing Lei
oai:RePEc:oup:biomet:v:101:y:2014:i:2:p:351-363.2015-07-30RePEc:oup:biomet
article
Indicator functions and the algebra of the linear-quadratic parameterization
Indicator functions are constructed under the linear-quadratic parameterization for contrasts, and applied to the study of partial aliasing properties for three-level fractional factorial designs. An algebraic operation is introduced for the calculation of indicator function coefficients. This operation connects design construction methods to the analysis under the linear-quadratic system, and helps establish simple conditions for the estimability of interactions.
2
2014
101
Biometrika
351
363
http://hdl.handle.net/10.1093/biomet/ast070
application/pdf
Access to full text is restricted to subscribers.
Arman Sabbaghi
Tirthankar Dasgupta
C. F. Jeff Wu
oai:RePEc:oup:biomet:v:101:y:2014:i:4:p:1003.2015-07-30RePEc:oup:biomet
article
On exact forms of Taylor’s theorem for vector-valued functions
Exact forms of Taylor expansion for vector-valued functions have been incorrectly used in many statistical publications. We offer two methods to correct this error.
4
2014
101
Biometrika
1003
1003
http://hdl.handle.net/10.1093/biomet/asu061
application/pdf
Access to full text is restricted to subscribers.
Changyong Feng
Hongyue Wang
Tian Chen
Xin M. Tu
oai:RePEc:oup:biomet:v:101:y:2014:i:3:p:535-552.2015-07-30RePEc:oup:biomet
article
Tests for comparing estimated survival functions
We describe a class of statistical tests for the comparison of two or more survival curves, typically estimated using the Kaplan–Meier method. The class is based on the construction of O’Quigley (2003), and some special cases are of particular interest. Underlying the inferential development are the arguments of Efron & Hinkley (1978), leading to a theoretical sampling model that is in some sense closer to the observed data. The log-rank and weighted log-rank tests arise as special members of the class. In practice the log-rank test will often be a suboptimal, even poor, test due to the presence of non-proportional hazards. The proposed test maintains good power and, in all the cases considered, has greater power than the log-rank test under non-proportional hazards. The power will depend on the alternatives being considered, and under reasonable assumptions on the alternatives, we conclude that the proposed test is more powerful than the log-rank test. Simulations support these conclusions. An example is given as an illustration.
3
2014
101
Biometrika
535
552
http://hdl.handle.net/10.1093/biomet/asu015
application/pdf
Access to full text is restricted to subscribers.
C. Chauvel
J. O'Quigley
oai:RePEc:oup:biomet:v:102:y:2015:i:1:p:107-119.2015-07-30RePEc:oup:biomet
article
A transformation approach in linear mixed-effects models with informative missing responses
We consider a linear mixed-effects model in which the response panel vector has missing components and the missing data mechanism depends on observed data as well as missing responses through unobserved random effects. Using a transformation of the data that eliminates the random effects, we derive asymptotically unbiased and normally distributed estimators of certain model parameters. Estimators of model parameters that cannot be estimated using the transformed data are also constructed, and their asymptotic unbiasedness and normality are established. Simulation results are presented to examine the finite sample performance of the proposed estimators and a real data example is discussed.
1
2015
102
Biometrika
107
119
http://hdl.handle.net/10.1093/biomet/asu069
application/pdf
Access to full text is restricted to subscribers.
J. Shao
J. Zhang
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:601-6152013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood for proportional hazards models with biased-sampling data
We obtain a pseudo-partial likelihood for proportional hazards models with biased-sampling data by embedding the biased-sampling data into left-truncated data. The log pseudo-partial likelihood of the biased-sampling data is the expectation of the log partial likelihood of the left-truncated data conditioned on the observed data. In addition, asymptotic properties of the estimator that maximizes the pseudo-partial likelihood are derived. Applications to length-biased data, biased samples with right censoring and proportional hazards models with missing covariates are discussed. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
601
615
http://hdl.handle.net/10.1093/biomet/asp026
application/pdf
Access to full text is restricted to subscribers.
Wei Yann Tsai
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:677-6902013-06-14RePEc:oup:biomet
article
Optimal repeated measurement designs for a model with partial interactions
We consider crossover designs for a model with partial interactions. In this model, the carryover effect depends on whether the treatment is preceded by itself or not. When the aim of the experiment is to study the total effects corresponding to a single treatment, we obtain approximate optimal symmetric designs, within the competing class of circular designs, by generalizing the method introduced by Kushner (1997) and Kunert & Martin (2000). This generalization places the method proposed by Bailey & Druilhet (2004) into Kushner's context. The optimal designs obtained are not binary, as in Kunert & Martin (2000). We also propose efficient designs generated by only one sequence. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
677
690
http://hdl.handle.net/10.1093/biomet/asp034
application/pdf
Access to full text is restricted to subscribers.
P. Druilhet
W. Tinsson
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:723-7342013-06-14RePEc:oup:biomet
article
Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data
Considerable recent interest has focused on doubly robust estimators for a population mean response in the presence of incomplete data, which involve models for both the propensity score and the regression of outcome on covariates. The usual doubly robust estimator may yield severely biased inferences if neither of these models is correctly specified and can exhibit nonnegligible bias if the estimated propensity score is close to zero for some observations. We propose alternative doubly robust estimators that achieve comparable or improved performance relative to existing methods, even with some estimated propensity scores close to zero. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
723
734
http://hdl.handle.net/10.1093/biomet/asp033
application/pdf
Access to full text is restricted to subscribers.
Weihua Cao
Anastasios A. Tsiatis
Marie Davidian
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:497-5122013-06-14RePEc:oup:biomet
article
Objective Bayesian model selection in Gaussian graphical models
This paper presents a default model-selection procedure for Gaussian graphical models that involves two new developments. First, we develop a default version of the hyper-inverse Wishart prior for restricted covariance matrices, called the hyper-inverse Wishart g-prior, and show how it corresponds to the implied fractional prior for selecting a graph using fractional Bayes factors. Second, we apply a class of priors that automatically handles the problem of multiple hypothesis testing. We demonstrate our methods on a variety of simulated examples, concluding with a real example analyzing covariation in mutual-fund returns. These studies reveal that the combined use of a multiplicity-correction prior on graphs and fractional Bayes factors for computing marginal likelihoods yields better performance than existing Bayesian methods. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
497
512
http://hdl.handle.net/10.1093/biomet/asp017
application/pdf
Access to full text is restricted to subscribers.
C. M. Carvalho
J. G. Scott
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:529-5442013-06-14RePEc:oup:biomet
article
Asymptotic properties of penalized spline estimators
We study the class of penalized spline estimators, which enjoy similarities to both regression splines, without penalty and with fewer knots than data points, and smoothing splines, with knots equal to the data points and a penalty controlling the roughness of the fit. Depending on the number of knots, sample size and penalty, we show that the theoretical properties of penalized regression spline estimators are either similar to those of regression splines or to those of smoothing splines, with a clear breakpoint distinguishing the cases. We prove that using fewer knots results in better asymptotic rates than when using a large number of knots. We obtain expressions for bias and variance and asymptotic rates for the number of knots and penalty parameter. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
529
544
http://hdl.handle.net/10.1093/biomet/asp035
application/pdf
Access to full text is restricted to subscribers.
Gerda Claeskens
Tatyana Krivobokova
Jean D. Opsomer
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:751-7602013-06-14RePEc:oup:biomet
article
A Student t-mixture autoregressive model with applications to heavy-tailed financial data
We introduce the class of Student t-mixture autoregressive models, which is promising for financial time series modelling. The model is able to capture serial correlations, time-varying means and volatilities, and the shape of the conditional distributions can vary over time from short-tailed to long-tailed, or from unimodal to multimodal. The use of t-distributed errors in each component of the model allows conditional leptokurtic distributions that account for the commonly observed excess unconditional kurtosis in financial data. Methods of parameter estimation and model selection are given. Finally, the proposed modelling procedure is illustrated through a real example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
751
760
http://hdl.handle.net/10.1093/biomet/asp031
application/pdf
Access to full text is restricted to subscribers.
C. S. Wong
W. S. Chan
P. L. Kam
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:617-6332013-06-14RePEc:oup:biomet
article
Pseudo-partial likelihood estimators for the Cox regression model with missing covariates
By embedding the missing covariate data into a left-truncated and right-censored survival model, we propose a new class of weighted estimating functions for the Cox regression model with missing covariates. The resulting estimators, called the pseudo-partial likelihood estimators, are shown to be consistent and asymptotically normal. A simulation study demonstrates that, compared with the popular inverse-probability weighted estimators, the new estimators perform better when the observation probability is small and improve efficiency of estimating the missing covariate effects. Application to a practical example is reported. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
617
633
http://hdl.handle.net/10.1093/biomet/asp027
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
Qiang Xu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:691-7092013-06-14RePEc:oup:biomet
article
Use of functionals in linearization and composite estimation with application to two-sample survey data
An important problem associated with two-sample surveys is the estimation of nonlinear functions of finite population totals such as ratios, correlation coefficients or measures of income inequality. Computation and estimation of the variance of such complex statistics are made more difficult by the existence of overlapping units. In one-sample surveys, the linearization method based on the influence function approach is a powerful tool for variance estimation. We introduce a two-sample linearization technique that can be viewed as a generalization of the one-sample influence function approach. Our technique is based on expressing the parameters of interest as multivariate functionals of finite and discrete measures and then using partial influence functions to compute the linearized variables. Under broad assumptions, the asymptotic variance of the substitution estimator, derived from Deville (1999), is shown to be the variance of a weighted sum of the linearized variables. The paper then focuses on a general class of composite substitution estimators, and from this class the optimal estimator for minimizing the asymptotic variance is obtained. The efficiency of the optimal composite estimator is demonstrated through an empirical study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
691
709
http://hdl.handle.net/10.1093/biomet/asp039
application/pdf
Access to full text is restricted to subscribers.
C. Goga
J.-C. Deville
A. Ruiz-Gazen
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:577-5902013-06-14RePEc:oup:biomet
article
Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data
This paper extends the induced smoothing procedure of Brown & Wang (2006) for the semiparametric accelerated failure time model to the case of clustered failure time data. The resulting procedure permits fast and accurate computation of regression parameter estimates and standard errors using simple and widely available numerical methods, such as the Newton–Raphson algorithm. The regression parameter estimates are shown to be strongly consistent and asymptotically normal; in addition, we prove that the asymptotic distribution of the smoothed estimator coincides with that obtained without the use of smoothing. This establishes a key claim of Brown & Wang (2006) for the case of independent failure time data and also extends such results to the case of clustered data. Simulation results show that these smoothed estimates perform as well as those obtained using the best available methods at a fraction of the computational cost. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
577
590
http://hdl.handle.net/10.1093/biomet/asp025
application/pdf
Access to full text is restricted to subscribers.
Lynn M. Johnson
Robert L. Strawderman
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:545-5582013-06-14RePEc:oup:biomet
article
Empirical Bayes estimation for additive hazards regression models
We develop a novel empirical Bayesian framework for the semiparametric additive hazards regression model. The integrated likelihood, obtained by integration over the unknown prior of the nonparametric baseline cumulative hazard, can be maximized using standard statistical software. Unlike the corresponding full Bayes method, our empirical Bayes estimators of regression parameters, survival curves and their corresponding standard errors have easily computed closed-form expressions and require no elicitation of hyperparameters of the prior. The method guarantees a monotone estimator of the survival function and accommodates time-varying regression coefficients and covariates. To facilitate frequentist-type inference based on large-sample approximation, we present the asymptotic properties of the semiparametric empirical Bayes estimates. We illustrate the implementation and advantages of our methodology with a reanalysis of a survival dataset and a simulation study. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
545
558
http://hdl.handle.net/10.1093/biomet/asp024
application/pdf
Access to full text is restricted to subscribers.
Debajyoti Sinha
M. Brent McHenry
Stuart R. Lipsitz
Malay Ghosh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:635-6442013-06-14RePEc:oup:biomet
article
Approximating the α-permanent
The standard matrix permanent is the solution to a number of combinatorial and graph-theoretic problems, and the α-weighted permanent is the density function for a class of Cox processes called boson processes. The exact computation of the ordinary permanent is known to be #P-complete, and the same appears to be the case for the α-permanent for most values of α. At present, the lack of a satisfactory algorithm for approximating the α-permanent is a formidable obstacle to the use of boson processes in applied work. This paper proposes an importance-sampling estimator using nonuniform random permutations generated in a cycle format. Empirical investigation reveals that the estimator works well for the sorts of matrices that arise in point-process applications, involving up to a few hundred points. We conclude with a numerical illustration of the Bayes estimate of the intensity function of a boson point process, which is a ratio of α-permanents. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
635
644
http://hdl.handle.net/10.1093/biomet/asp036
application/pdf
Access to full text is restricted to subscribers.
S. C. Kou
P. McCullagh
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:711-7222013-06-14RePEc:oup:biomet
article
Effects of data dimension on empirical likelihood
We evaluate the effects of data dimension on the asymptotic normality of the empirical likelihood ratio for high-dimensional data under a general multivariate model. Data dimension and dependence among components of the multivariate random vector affect the empirical likelihood directly through the trace and the eigenvalues of the covariance matrix. The growth rates to infinity we obtain for the data dimension improve the rates of Hjort et al. (2008). Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
711
722
http://hdl.handle.net/10.1093/biomet/asp037
application/pdf
Access to full text is restricted to subscribers.
Song Xi Chen
Liang Peng
Ying-Li Qin
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:735-7492013-06-14RePEc:oup:biomet
article
A negative binomial model for time series of counts
We study generalized linear models for time series of counts, where serial dependence is introduced through a dependent latent process in the link function. Conditional on the covariates and the latent process, the observation is modelled by a negative binomial distribution. To estimate the regression coefficients, we maximize the pseudolikelihood that is based on a generalized linear model with the latent process suppressed. We show the consistency and asymptotic normality of the generalized linear model estimator when the latent process is a stationary strongly mixing process. We extend the asymptotic results to generalized linear models for time series, where the observation variable, conditional on covariates and a latent process, is assumed to have a distribution from a one-parameter exponential family. Thus, we unify in a common framework the results for Poisson log-linear regression models of Davis et al. (2000), negative binomial logit regression models and other similarly specified generalized linear models. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
735
749
http://hdl.handle.net/10.1093/biomet/asp029
application/pdf
Access to full text is restricted to subscribers.
Richard A. Davis
Rongning Wu
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:663-6762013-06-14RePEc:oup:biomet
article
Gaussian process emulation of dynamic computer codes
Computer codes are used in scientific research to study and predict the behaviour of complex systems. Their run times often make uncertainty and sensitivity analyses impractical because of the thousands of runs that are conventionally required, so efficient techniques have been developed based on a statistical representation of the code. The approach is less straightforward for dynamic codes, which represent time-evolving systems. We develop a novel iterative system to build a statistical model of dynamic computer codes, which is demonstrated on a rainfall-runoff simulator. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
663
676
http://hdl.handle.net/10.1093/biomet/asp028
application/pdf
Access to full text is restricted to subscribers.
S. Conti
J. P. Gosling
J. E. Oakley
A. O'Hagan
oai:RePEc:oup:biomet:v:96:y:2009:i:3:p:559-5752013-06-14RePEc:oup:biomet
article
Improving point and interval estimators of monotone functions by rearrangement
Suppose that a target function is monotonic and an available original estimate of this target function is not monotonic. Rearrangements, univariate and multivariate, transform the original estimate to a monotonic estimate that always lies closer in common metrics to the target function. Furthermore, suppose an original confidence interval, which covers the target function with probability at least 1-α, is defined by upper and lower endpoint functions that are not monotonic. Then the rearranged confidence interval, defined by the rearranged upper and lower endpoint functions, is monotonic, shorter in length in common norms than the original interval, and covers the target function with probability at least 1-α. We illustrate the results with a growth chart example. Copyright 2009, Oxford University Press.
3
2009
96
Biometrika
559
575
http://hdl.handle.net/10.1093/biomet/asp030
application/pdf
Access to full text is restricted to subscribers.
V. Chernozhukov
I. Fernández-Val
A. Galichon
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:273-2902015-03-25RePEc:oup:biomet
article
Sample size and power analysis for sparse signal recovery in genome-wide association studies
Genome-wide association studies have successfully identified hundreds of novel genetic variants associated with many complex human diseases. However, there is a lack of rigorous work on evaluating the statistical power for identifying these variants. In this paper, we consider sparse signal identification in genome-wide association studies and present two analytical frameworks for detailed analysis of the statistical power for detecting and identifying the disease-associated variants. We present an explicit sample size formula for achieving a given false non-discovery rate while controlling the false discovery rate based on an optimal procedure. Sparse genetic variant recovery is also considered and a boundary condition is established in terms of sparsity and signal strength for almost exact recovery of both disease-associated variants and nondisease-associated variants. A data-adaptive procedure is proposed to achieve this bound. The analytical results are illustrated with a genome-wide association study of neuroblastoma. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
273
290
http://hdl.handle.net/10.1093/biomet/asr003
application/pdf
Access to full text is restricted to subscribers.
Jichun Xie
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:243-2502015-03-25RePEc:oup:biomet
article
Testing a linear time series model against its threshold extension
This paper derives the asymptotic null distribution of a quasilikelihood ratio test statistic for an autoregressive moving average model against its threshold extension. The null hypothesis is that of no threshold, and the error term could be dependent. The asymptotic distribution is rather complicated, and all existing methods for approximating a distribution in the related literature fail to work. Hence, a novel bootstrap approximation based on stochastic permutation is proposed in this paper. Besides being robust to the assumptions on the error term, our method enjoys more flexibility and needs less computation when compared with methods currently used in the literature. Monte Carlo experiments give further support to the new approach, and an illustration is reported. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
243
250
http://hdl.handle.net/10.1093/biomet/asq074
application/pdf
Access to full text is restricted to subscribers.
Guodong Li
Wai Keung Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:489-4942015-03-25RePEc:oup:biomet
article
The dimple in Gneiting's spatial-temporal covariance model
Gneiting (2002) proposed a nonseparable covariance model for spatial-temporal data. In the present paper we show that in certain circumstances his model possesses a counterintuitive dimple. In some cases, the magnitude of the dimple can be nontrivial. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
489
494
http://hdl.handle.net/10.1093/biomet/asr006
application/pdf
Access to full text is restricted to subscribers.
John T. Kent
Mohsen Mohammadzadeh
Ali M. Mosammam
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:355-3702015-03-25RePEc:oup:biomet
article
Efficient semiparametric regression for longitudinal data with nonparametric covariance estimation
For longitudinal data, when the within-subject covariance is misspecified, the semiparametric regression estimator may be inefficient. We propose a method that combines the efficient semiparametric estimator with nonparametric covariance estimation, and is robust against misspecification of covariance models. We show that kernel covariance estimation provides uniformly consistent estimators for the within-subject covariance matrices, and the semiparametric profile estimator with substituted nonparametric covariance is still semiparametrically efficient. The finite sample performance of the proposed estimator is illustrated by simulation. In an application to CD4 count data from an AIDS clinical trial, we extend the proposed method to a functional analysis of the covariance model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/asq080
application/pdf
Access to full text is restricted to subscribers.
Yehua Li
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:481-4882015-03-25RePEc:oup:biomet
article
On the likelihood function of Gaussian max-stable processes
We derive a closed-form expression for the likelihood function of a Gaussian max-stable process indexed by ℝᵈ at p≤d+1 sites, d≥1. We demonstrate the gain in efficiency in the maximum composite likelihood estimators of the covariance matrix from p=2 to p=3 sites in ℝ² by means of a Monte Carlo simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
481
488
http://hdl.handle.net/10.1093/biomet/asr020
application/pdf
Access to full text is restricted to subscribers.
Marc G. Genton
Yanyuan Ma
Huiyan Sang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:997-10012015-03-25RePEc:oup:biomet
article
A note on overadjustment in inverse probability weighted estimation
Standardized means, commonly used in observational studies in epidemiology to adjust for potential confounders, are equal to inverse probability weighted means with inverse weights equal to the empirical propensity scores. More refined standardization corresponds with empirical propensity scores computed under more flexible models. Unnecessary standardization induces efficiency loss. However, according to the theory of inverse probability weighted estimation, propensity scores estimated under more flexible models induce improvement in the precision of inverse probability weighted means. This apparent contradiction is clarified by explicitly stating the assumptions under which the improvement in precision is attained. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asq049
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Lingling Li
Xiaochun Li
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:851-8652015-03-25RePEc:oup:biomet
article
Nonparametric Bayesian density estimation on manifolds with applications to planar shapes
Statistical analysis on landmark-based shape spaces has diverse applications in morphometrics, medical diagnostics, machine vision and other areas. These shape spaces are non-Euclidean quotient manifolds. To conduct nonparametric inferences, one may define notions of centre and spread on this manifold and work with their estimates. However, it is useful to consider full likelihood-based methods, which allow nonparametric estimation of the probability density. This article proposes a broad class of mixture models constructed using suitable kernels on a general compact metric space and then on the planar shape space in particular. Following a Bayesian approach with a nonparametric prior on the mixing distribution, conditions are obtained under which the Kullback–Leibler property holds, implying large support and weak posterior consistency. Gibbs sampling methods are developed for posterior computation, and the methods are applied to problems in density estimation and classification with shape-based predictors. Simulation studies show improved estimation performance relative to existing approaches. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
851
865
http://hdl.handle.net/10.1093/biomet/asq044
application/pdf
Access to full text is restricted to subscribers.
Abhishek Bhattacharya
David B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:237-2422015-03-25RePEc:oup:biomet
article
Recapture models under equality constraints for the conditional capture probabilities
We introduce a general class of capture-recapture models in which capture probabilities depend on capture history. We discuss constrained versions of the saturated model based on equality constraints. Inference can be performed through a simple estimating equation. The approach is illustrated on a dataset concerning Great Copper butterflies in the Willamette Valley of Oregon. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asq068
application/pdf
Access to full text is restricted to subscribers.
A. Farcomeni
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:147-1622015-03-25RePEc:oup:biomet
article
Estimation of covariate effects in generalized linear mixed models with informative cluster sizes
In standard regression analyses of clustered data, one typically assumes that the expected value of the response is independent of cluster size. However, this is often false. For example, in studies of surgical interventions, investigators have frequently found surgery volume and outcomes to be related to the skill level of the surgeons. This paper examines the effect of ignoring response-dependent, informative, cluster sizes on standard analytical methods such as mixed-effects models and conditional likelihood methods using analytic calculations, simulation studies and an example from a study of periodontal disease. We consider the case in which cluster sizes and responses share random effects which we assume to be independent of the covariates. Our focus is on maximum likelihood methods that ignore informative cluster sizes, and we show that they exhibit little bias in estimating covariate effects that are uncorrelated with the random effects associated with cluster sizes. However, estimation of covariate effects that are associated with the random effects can be biased. In particular, for models with random intercepts only, ignoring informative cluster sizes can yield biased estimators of the intercept but little bias in estimation of all covariate effects. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
147
162
http://hdl.handle.net/10.1093/biomet/asq066
application/pdf
Access to full text is restricted to subscribers.
John M. Neuhaus
Charles E. McCulloch
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:81-902015-03-25RePEc:oup:biomet
article
A self-normalized confidence interval for the mean of a class of nonstationary processes
We construct an asymptotic confidence interval for the mean of a class of nonstationary processes with constant mean and time-varying variances. Due to the large number of unknown parameters, traditional approaches based on consistent estimation of the limiting variance of the sample mean through moving block or non-overlapping block methods are not applicable. Under a block-wise asymptotically equal cumulative variance assumption, we propose a self-normalized confidence interval that is robust against the nonstationarity and dependence structure of the data. We also apply the same idea to construct an asymptotic confidence interval for the mean difference of nonstationary processes with piecewise constant means. The proposed methods are illustrated through simulations and an application to global temperature series. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
81
90
http://hdl.handle.net/10.1093/biomet/asq076
application/pdf
Access to full text is restricted to subscribers.
Zhibiao Zhao
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:905-9202015-03-25RePEc:oup:biomet
article
Penalized high-dimensional empirical likelihood
We propose penalized empirical likelihood for parameter estimation and variable selection for problems with diverging numbers of parameters. Our results are demonstrated for estimating the mean vector in multivariate analysis and regression coefficients in linear models. By using an appropriate penalty function, we show that penalized empirical likelihood has the oracle property. That is, with probability tending to 1, penalized empirical likelihood identifies the true model and estimates the nonzero coefficients as efficiently as if the sparsity of the true model were known in advance. The advantage of penalized empirical likelihood as a nonparametric likelihood approach is illustrated by testing hypotheses and constructing confidence regions. Numerical simulations confirm our theoretical findings. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
905
920
http://hdl.handle.net/10.1093/biomet/asq057
application/pdf
Access to full text is restricted to subscribers.
Cheng Yong Tang
Chenlei Leng
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1013-10132015-03-25RePEc:oup:biomet
article
Amendments and Corrections
4
2010
97
Biometrika
1013
1013
http://hdl.handle.net/10.1093/biomet/asq052
application/pdf
Access to full text is restricted to subscribers.
Soumik Pal
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:867-8802015-03-25RePEc:oup:biomet
article
A weighted estimating equation approach for inhomogeneous spatial point processes
We introduce a new estimation method for parametric intensity function models of inhomogeneous spatial point processes based on weighted estimating equations. The weights can incorporate information on both inhomogeneity and dependence of the process. Simulations show that significant efficiency gains can be achieved for non-Poisson processes, compared to the Poisson maximum likelihood estimator. An application to tropical forest data illustrates the use of the proposed method. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
867
880
http://hdl.handle.net/10.1093/biomet/asq043
application/pdf
Access to full text is restricted to subscribers.
Yongtao Guan
Ye Shen
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:825-8382015-03-25RePEc:oup:biomet
article
Noncrossing quantile regression curve estimation
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to those of the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The constrained estimator also shows significantly improved performance, owing to the added smoothing and stability across the quantile levels. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
825
838
http://hdl.handle.net/10.1093/biomet/asq048
application/pdf
Access to full text is restricted to subscribers.
Howard D. Bondell
Brian J. Reich
Huixia Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:49-632015-03-25RePEc:oup:biomet
article
Bootstrap inference for mean reflection shape and size-and-shape with three-dimensional landmark data
Working within the framework of a multi-dimensional scaling approach to shape analysis, we develop bootstrap methods for inference about mean reflection shape and size-and-shape based on labelled landmark data. The approach is developed in general dimensions, though we focus on the three-dimensional case. We consider two pivotal statistics, which we use to construct bootstrap confidence regions for the mean reflection shape or size-and-shape, and present simulation results showing that these statistics perform well in a variety of examples. We also suggest regularized versions of the test statistics that are suitable for more challenging cases where the sample size is not sufficiently large in relation to the number of landmarks, and present numerical results confirming that regularization indeed leads to better performance. An algorithm for producing a graphical representation of the confidence region for the mean reflection shape is presented and applied in an example involving molecular dynamics simulation data. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
49
63
http://hdl.handle.net/10.1093/biomet/asq065
application/pdf
Access to full text is restricted to subscribers.
S. P. Preston
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:231-2362015-03-25RePEc:oup:biomet
article
A novel reversible jump algorithm for generalized linear models
We propose a novel methodology to construct proposal densities in reversible jump algorithms that obtain samples from parameter subspaces of competing generalized linear models with differing dimensions. The derived proposal densities are not restricted to moves between nested models and are applicable even to models that share no common parameters. We illustrate our methodology on competing logistic regression and log-linear graphical models, demonstrating how our suggested proposal densities, together with the resulting freedom to propose moves between any models, improve the performance of the reversible jump algorithm. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
231
236
http://hdl.handle.net/10.1093/biomet/asq071
application/pdf
Access to full text is restricted to subscribers.
M. Papathomas
P. Dellaportas
V. G. S. Vasdekis
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:921-9342015-03-25RePEc:oup:biomet
article
Estimation of controlled direct effects on a dichotomous outcome using logistic structural direct effect models
We consider the problem of assessing whether an exposure affects a dichotomous outcome other than by modifying a given mediator. The standard approach, logistic regression adjusting for both exposure and the mediator, is known to be biased in the presence of confounders for the mediator-outcome relationship. Because additional regression adjustment for such confounders is only justified when they are not affected by the exposure, inverse probability weighting has been advocated, but is not ideally tailored to mediators that are continuous or have strong measured predictors. We overcome this limitation by developing inference for a novel class of causal models that are closely related to Robins' logistic structural direct effect models, but do not inherit their difficulties of estimation. We study identification and efficient estimation under the assumption that all confounders for the exposure-outcome and mediator-outcome relationships have been measured, and find adequate performance in simulation studies. We discuss extensions to case-control studies and relevant implications for the generic problem of adjustment for time-varying confounding. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
921
934
http://hdl.handle.net/10.1093/biomet/asq053
application/pdf
Access to full text is restricted to subscribers.
Stijn Vansteelandt
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:391-4012015-03-25RePEc:oup:biomet
article
The union closure method for testing a fixed sequence of families of hypotheses
Statistical analyses often involve testing multiple hypotheses that are naturally grouped into a fixed sequence of families. An effective approach to control the familywise error rate is to prioritize the importance of prespecification in the testing order. A gatekeeping testing procedure examines the first family with no multiple adjustment and then examines the subsequent family depending on the decision made with respect to the previous one. In this paper, we describe the union closure method that can be used to design gatekeeping procedures. A bipolar disorder trial with three primary and two secondary outcomes is presented as an example. Power comparisons based on the bipolar disorder trial show that the proposed gatekeeping procedures under the union closure framework are more powerful than competing methods. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
391
401
http://hdl.handle.net/10.1093/biomet/asr015
application/pdf
Access to full text is restricted to subscribers.
Han-Joo Kim
A. Richard Entsuah
Justine Shults
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:433-4482015-03-25RePEc:oup:biomet
article
Maximum likelihood estimation of a generalized threshold stochastic regression model
There is hardly any literature on modelling nonlinear dynamic relations involving nonnormal time series data. This is a serious lacuna because nonnormal data are far more abundant than normal ones, for example, time series of counts and positive time series. While there are various forms of nonlinearities, the class of piecewise-linear models is particularly appealing for its relative ease of tractability and interpretation. We propose to study the generalized threshold model which specifies that the conditional probability distribution of the response variable belongs to an exponential family, and the conditional mean response is linked to some piecewise-linear stochastic regression function. We introduce a likelihood-based estimation scheme, and the consistency and limiting distribution of the maximum likelihood estimator are derived. We illustrate the proposed approach with an analysis of a hare abundance time series, which gives new insights on how phase-dependent predator-prey-climate interactions shaped the ten-year hare population cycle. A simulation study is conducted to examine the finite-sample performance of the proposed estimation method. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
433
448
http://hdl.handle.net/10.1093/biomet/asr008
application/pdf
Access to full text is restricted to subscribers.
Noelle I. Samia
Kung-Sik Chan
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:119-1322015-03-25RePEc:oup:biomet
article
Parametric fractional imputation for missing data analysis
Parametric fractional imputation is proposed as a general tool for missing data analysis. Using fractional weights, the observed likelihood can be approximated by the weighted mean of the imputed data likelihood. Computational efficiency can be achieved using the idea of importance sampling and calibration weighting. The proposed imputation method provides efficient parameter estimates for the model parameters specified in the imputation model and also provides reasonable estimates for parameters that are not part of the imputation model. Variance estimation is discussed and results from a limited simulation study are presented. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asq073
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:107-1182015-03-25RePEc:oup:biomet
article
Horvitz-Thompson estimators for functional data: asymptotic confidence bands and optimal allocation for stratified sampling
When dealing with very large datasets of functional data, survey sampling approaches are useful for obtaining estimators of simple functional quantities without having to store all the data. We propose a Horvitz-Thompson estimator of the mean trajectory. In the context of a superpopulation framework, we prove, under mild regularity conditions, that we obtain uniformly consistent estimators of the mean function and of its variance function. With additional assumptions on the sampling design we state a functional central limit theorem and obtain asymptotic confidence bands. Stratified sampling is studied in detail, and we also obtain a functional version of the usual optimal allocation rule, considering a mean variance criterion. These techniques are illustrated by a test population of N=18 902 electricity meters for which we have individual electricity consumption measures every 30 minutes over one week. We show that stratification can substantially improve the accuracy of the estimators and reduce the width of the global confidence bands compared with simple random sampling without replacement. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
107
118
http://hdl.handle.net/10.1093/biomet/asq070
application/pdf
Access to full text is restricted to subscribers.
Hervé Cardot
Etienne Josserand
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:839-8502015-03-25RePEc:oup:biomet
article
Censored quantile regression with partially functional effects
Quantile regression offers a flexible approach to analyzing survival data, allowing each covariate effect to vary with quantiles. In practice, constancy is often found to be adequate for some covariates. In this paper, we study censored quantile regression tailored to the partially functional effect setting with a mixture of varying and constant effects. Such a model can offer a simpler view regarding covariate-survival association and, moreover, can enable improvement in estimation efficiency. We propose profile estimating equations and present an iterative algorithm that can be readily and stably implemented. Asymptotic properties of the resultant estimators are established. A simple resampling-based inference procedure is developed and justified. Extensive simulation studies demonstrate efficiency gains of the proposed method over a naive two-stage procedure. The proposed method is illustrated via an application to a recent renal dialysis study. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
839
850
http://hdl.handle.net/10.1093/biomet/asq050
application/pdf
Access to full text is restricted to subscribers.
Jing Qian
Limin Peng
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:215-2242015-03-25RePEc:oup:biomet
article
Assessing the validity of weighted generalized estimating equations
The inverse probability weighted generalized estimating equations approach (Robins et al. 1994; Robins et al. 1995) effectively removes bias and provides valid statistical inference for regression parameter estimation in marginal models when longitudinal data contain missing values. Whether the weighted generalized estimating equations yield consistent estimates depends on whether the underlying missing data process is properly modelled. However, little work is available to examine whether or not this condition holds. In this paper we propose a test constructed from two sets of estimating equations: one set is known to be unbiased, while the unbiasedness of the other is unknown. We utilize the quadratic inference function (Qu et al. 2000) method to assess their compatibility, which is equivalent to testing the validity of the weighted generalized estimating equations approach. We conduct simulation studies to assess the performance of the proposed method. The test procedure is illustrated through a real data example. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
215
224
http://hdl.handle.net/10.1093/biomet/asq078
application/pdf
Access to full text is restricted to subscribers.
A. Qu
G. Y. Yi
P. X.-K. Song
P. Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:177-1862015-03-25RePEc:oup:biomet
article
Nonparametric estimation for length-biased and right-censored data
This paper considers survival data arising from length-biased sampling, where the survival times are left truncated by uniformly distributed random truncation times. We propose a nonparametric estimator that incorporates the information about the length-biased sampling scheme. The new estimator retains the simplicity of the truncation product-limit estimator with a closed-form expression, and has a small efficiency loss compared with the nonparametric maximum likelihood estimator, which requires an iterative algorithm. Moreover, the asymptotic variance of the proposed estimator has a closed form, and a variance estimator is easily obtained by plug-in methods. Numerical simulation studies with practical sample sizes are conducted to compare the performance of the proposed method with its competitors. A data analysis of the Canadian Study of Health and Aging is conducted to illustrate the methods and theory. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
177
186
http://hdl.handle.net/10.1093/biomet/asq069
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:163-1752015-03-25RePEc:oup:biomet
article
A unified framework for studying parameter identifiability and estimation in biased sampling designs
Based on the odds ratio representation of a joint density, we propose a unified framework to study parameter identifiability in biased sampling designs. It is shown that most of these designs encountered in practice can be reformulated within the proposed framework and, as a result, the question of parameter identifiability can be largely clarified. Estimation of the identifiable parameters is considered and traditional results on the equivalence of the prospective and retrospective likelihoods are extended. Information contained in data on certain identifiable parameters is often very limited. Such parameters can be poorly estimated by the likelihood approach with practically attainable sample sizes, which can substantially affect the estimates of parameters of primary interest. A partially penalized likelihood approach is proposed to address this. Simulation results suggest that the proposed approach has good performance. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
163
175
http://hdl.handle.net/10.1093/biomet/asq059
application/pdf
Access to full text is restricted to subscribers.
Hua Yun Chen
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:199-2142015-03-25RePEc:oup:biomet
article
The effect of correlation in false discovery rate estimation
The objective of this paper is to quantify the effect of correlation in false discovery rate analysis. Specifically, we derive approximations for the mean, variance, distribution and quantiles of the standard false discovery rate estimator for arbitrarily correlated data. This is achieved using a negative binomial model for the number of false discoveries, where the parameters are found empirically from the data. We show that correlation may increase the bias and variance of the estimator substantially with respect to the independent case, and that in some cases, such as an exchangeable correlation structure, the estimator fails to be consistent as the number of tests becomes large. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
199
214
http://hdl.handle.net/10.1093/biomet/asq075
application/pdf
Access to full text is restricted to subscribers.
Armin Schwartzman
Xihong Lin
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:251-2712015-03-25RePEc:oup:biomet
article
False discovery rates and copy number variation
Copy number changes, the gains and losses of chromosome segments, are a common type of genetic variation among healthy individuals as well as an important feature in tumour genomes. Microarray technology enables us to simultaneously measure, with moderate accuracy, copy number variation at more than a million chromosome locations and for hundreds of subjects. This leads to massive data sets and complicated inference problems concerning which locations are more likely to vary. In this paper we consider a relatively simple false discovery rate approach to copy number analysis. More careful parametric change-point methods can then be focused on promising regions of the genome. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
251
271
http://hdl.handle.net/10.1093/biomet/asr018
application/pdf
Access to full text is restricted to subscribers.
Bradley Efron
Nancy R. Zhang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:473-4802015-03-25RePEc:oup:biomet
article
Empirical likelihood for small area estimation
Current methodologies in small area estimation are mostly either parametric or heavily dependent on the assumed linearity of the estimators of the small area means. We discuss an alternative empirical likelihood-based Bayesian approach, which neither requires a parametric likelihood nor assumes linearity of the estimators, and can handle both discrete and continuous data in a unified manner. Empirical likelihoods for both area- and unit-level models are introduced. We discuss the suitability of the proposed likelihoods in Bayesian inference and illustrate their performance on a real dataset and in a simulation study. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr004
application/pdf
Access to full text is restricted to subscribers.
Sanjay Chaudhuri
Malay Ghosh
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:947-9602015-03-25RePEc:oup:biomet
article
Enhancing the sample average approximation method with U designs
Many computational problems in statistics can be cast as stochastic programs, that is, optimization problems whose objective functions are multi-dimensional integrals. The sample average approximation method is widely used for solving such problems: it first constructs a sampling-based approximation to the objective function and then finds the solution to the approximated problem. Independent and identically distributed sampling is the prevailing choice for constructing such approximations. Recently it was found that the use of Latin hypercube designs can improve sample average approximations. In computer experiments, U designs are known to possess better space-filling properties than Latin hypercube designs. Inspired by this fact, we propose to use U designs to further enhance the accuracy of the sample average approximation method. Theoretical results are derived to show that sample average approximations with U designs can significantly outperform those with Latin hypercube designs. Numerical examples are provided to corroborate the theoretical results. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asq046
application/pdf
Access to full text is restricted to subscribers.
Qi Tang
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:990-9962015-03-25RePEc:oup:biomet
article
On the equivalence of prospective and retrospective likelihood methods in case-control studies
We present new approaches to analyzing case-control studies using prospective likelihood methods. In the classical framework, we extend the equality of the profile likelihoods to the Barndorff-Nielsen modified profile likelihoods for prospective and retrospective models. This enables simple and accurate approximate conditional inference for stratified case-control studies of moderate stratum size. In the Bayesian framework, we provide sufficient conditions on priors for the prospective model parameters to yield a prospective marginal posterior density equal to its retrospective counterpart. Our results extend the prospective-retrospective equivalence in the Bayesian paradigm with a more general class of priors than has previously been investigated. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
990
996
http://hdl.handle.net/10.1093/biomet/asq054
application/pdf
Access to full text is restricted to subscribers.
Ana-Maria Staicu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:371-3802015-03-25RePEc:oup:biomet
article
Sure independence screening and compressed random sensing
Compressed sensing is a very powerful and popular tool for sparse recovery of high-dimensional signals. Random sensing matrices are often employed in compressed sensing. In this paper we introduce a new method named aggressive betting using sure independence screening for sparse noiseless signal recovery. The proposal exploits the randomness structure of random sensing matrices to greatly boost computation speed. When using sub-Gaussian sensing matrices, which include the Gaussian and Bernoulli sensing matrices as special cases, our proposal has the exact recovery property with overwhelming probability. We also consider sparse recovery with noise and explicitly reveal the impact of the noise-to-signal ratio on the probability of sure screening. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
371
380
http://hdl.handle.net/10.1093/biomet/asr010
application/pdf
Access to full text is restricted to subscribers.
Lingzhou Xue
Hui Zou
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:325-3402015-03-25RePEc:oup:biomet
article
Nonparametric inference for competing risks current status data with continuous, discrete or grouped observation times
New methods and theory have recently been developed to nonparametrically estimate cumulative incidence functions for competing risks survival data subject to current status censoring. In particular, the limiting distribution of the nonparametric maximum likelihood estimator and a simplified naive estimator have been established under certain smoothness conditions. In this paper, we establish the large-sample behaviour of these estimators in two additional models, namely when the observation time distribution has discrete support and when the observation times are grouped. These asymptotic results are applied to the construction of confidence intervals in the three different models. The methods are illustrated on two datasets regarding the cumulative incidence of different types of menopause from a cross-sectional sample of women in the United States and subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
325
340
http://hdl.handle.net/10.1093/biomet/asq083
application/pdf
Access to full text is restricted to subscribers.
M. H. Maathuis
M. G. Hudgens
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:381-3902015-03-25RePEc:oup:biomet
article
Testing against a high-dimensional alternative in the generalized linear model: asymptotic type I error control
Testing a low-dimensional null hypothesis against a high-dimensional alternative in a generalized linear model may lead to a test statistic that is a quadratic form in the residuals under the null model. Using asymptotic arguments, we show that the distribution of such a test statistic can be approximated by a ratio of quadratic forms in normal variables, for which algorithms are readily available. For generalized linear models, the asymptotic distribution shows good control of type I error for moderate to small samples, even when the number of covariates in the model far exceeds the sample size. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
381
390
http://hdl.handle.net/10.1093/biomet/asr016
application/pdf
Access to full text is restricted to subscribers.
Jelle J. Goeman
Hans C. van Houwelingen
Livio Finos
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:495-5012015-03-25RePEc:oup:biomet
article
An Akaike-type information criterion for model selection under inequality constraints
The Akaike information criterion for model selection presupposes that the parameter space is not subject to order restrictions or inequality constraints. Anraku (1999) proposed a modified version of this criterion, called the order-restricted information criterion, for model selection in the one-way analysis of variance model when the population means are monotonic. We propose a generalization of this to the case when the population means may be restricted by a mixture of linear equality and inequality constraints. If the model has no inequality constraints, then the generalized order-restricted information criterion coincides with the Akaike information criterion. Thus, the former extends the applicability of the latter to model selection in multi-way analysis of variance models when some models may have inequality constraints while others may not. Simulation shows that the information criterion proposed in this paper performs well in selecting the correct model. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
495
501
http://hdl.handle.net/10.1093/biomet/asr002
application/pdf
Access to full text is restricted to subscribers.
R. M. Kuiper
H. Hoijtink
M. J. Silvapulle
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:935-9462015-03-25RePEc:oup:biomet
article
Compound optimal allocation for individual and collective ethics in binary clinical trials
In recent years, several authors have investigated response-adaptive allocation rules for comparative clinical trials, in order to favour, at each stage of the trial, the treatment that appears to be best. In this paper, we define admissible allocations, namely treatment assignments that cannot be simultaneously improved upon with respect to both a specific design criterion, reflecting the inferential properties of the experiment, and the proportion of patients assigned to the best treatment or treatments; we survey existing designs from this viewpoint. We also suggest combining information and ethical considerations by taking a suitable weighted mean of two corresponding standardized criteria, with weights that depend on the actual treatment effects. This compound criterion leads to a locally optimal allocation that can be targeted by some response-adaptive randomization rule. The paper mainly deals with the case of two treatments, but the suggested methodology is shown to extend to more than two. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
935
946
http://hdl.handle.net/10.1093/biomet/asq055
application/pdf
Access to full text is restricted to subscribers.
Alessandro Baldi Antognini
Alessandra Giovagnoli
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:1002-10052015-03-25RePEc:oup:biomet
article
Parameter redundancy with covariates
We show how to determine the parameter redundancy status of a model with covariates from that of the same model without covariates, thereby simplifying the calculation considerably. A matrix decomposition is necessary to ensure that symbolic computation programs return correct results. The paper is illustrated by mark-recovery and latent-class models, with associated Maple code. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
1002
1005
http://hdl.handle.net/10.1093/biomet/asq041
application/pdf
Access to full text is restricted to subscribers.
Diana J. Cole
Byron J. T. Morgan
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:985-9892015-03-25RePEc:oup:biomet
article
Some insights into continuum regression and its asymptotic properties
Continuum regression encompasses ordinary least squares regression, partial least squares regression and principal component regression under the same umbrella using a nonnegative parameter Gamma. However, there seems to be no literature discussing the asymptotic properties for arbitrary continuum regression parameter Gamma. This article establishes a relation between continuum regression and sufficient dimension reduction and studies the asymptotic properties of continuum regression for arbitrary Gamma under inverse regression models. Theoretical and simulation results show that the continuum seems unnecessary when the conditional distribution of the predictors given the response follows the multivariate normal distribution. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
985
989
http://hdl.handle.net/10.1093/biomet/asq024
application/pdf
Access to full text is restricted to subscribers.
Xin Chen
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:893-9042015-03-25RePEc:oup:biomet
article
Consistent selection of the number of clusters via crossvalidation
In cluster analysis, one of the major challenges is to estimate the number of clusters. Most existing approaches attempt to minimize some distance-based dissimilarity measure within clusters. This article proposes a novel selection criterion that is applicable to all kinds of clustering algorithms, both distance-based and non-distance-based. The key idea is to select the number of clusters that minimizes the algorithm's instability, which measures the robustness of any given clustering algorithm against the randomness in sampling. A novel estimation scheme for clustering instability is developed based on crossvalidation. The effectiveness of the proposed selection criterion is demonstrated in a variety of numerical experiments, and its asymptotic selection consistency is established when the dataset is properly split. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
893
904
http://hdl.handle.net/10.1093/biomet/asq061
application/pdf
Access to full text is restricted to subscribers.
Junhui Wang
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:291-3062015-03-25RePEc:oup:biomet
article
Sparse Bayesian infinite factor models
We focus on sparse modelling of high-dimensional covariance matrices using Bayesian latent factor models. We propose a multiplicative gamma process shrinkage prior on the factor loadings which allows introduction of infinitely many factors, with the loadings increasingly shrunk towards zero as the column index increases. We use our prior on a parameter-expanded loading matrix to avoid the order dependence typical in factor analysis models and develop an efficient Gibbs sampler that scales well as data dimensionality increases. The gain in efficiency is achieved by the joint conjugacy property of the proposed prior, which allows block updating of the loadings matrix. We propose an adaptive Gibbs sampler for automatically truncating the infinite loading matrix through selection of the number of important factors. Theoretical results are provided on the support of the prior and truncation approximation bounds. A fast algorithm is proposed to produce approximate Bayes estimates. Latent factor regression methods are developed for prediction and variable selection in applications with high-dimensional correlated predictors. Operating characteristics are assessed through simulation studies, and the approach is applied to predict survival times from gene expression data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
291
306
http://hdl.handle.net/10.1093/biomet/asr013
application/pdf
Access to full text is restricted to subscribers.
A. Bhattacharya
D. B. Dunson
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:1-152015-03-25RePEc:oup:biomet
article
Joint estimation of multiple graphical models
Gaussian graphical models explore dependence relationships between random variables, through the estimation of the corresponding inverse covariance matrices. In this paper we develop an estimator for such models appropriate for data from several graphical models that share the same variables and some of the dependence structure. In this setting, estimating a single graphical model would mask the underlying heterogeneity, while estimating separate models for each category does not take advantage of the common structure. We propose a method that jointly estimates the graphical models corresponding to the different categories present in the data, aiming to preserve the common structure, while allowing for differences between the categories. This is achieved through a hierarchical penalty that targets the removal of common zeros in the inverse covariance matrices across categories. We establish the asymptotic consistency and sparsity of the proposed estimator in the high-dimensional case, and illustrate its performance on a number of simulated networks. An application to learning semantic connections between terms from webpages collected from computer science departments is included. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
1
15
http://hdl.handle.net/10.1093/biomet/asq060
application/pdf
Access to full text is restricted to subscribers.
Jian Guo
Elizaveta Levina
George Michailidis
Ji Zhu
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:341-3542015-03-25RePEc:oup:biomet
article
Time-dependent cross ratio estimation for bivariate failure times
In the analysis of bivariate correlated failure time data, it is important to measure the strength of association among the correlated failure times. One commonly used measure is the cross ratio. Motivated by Cox's partial likelihood idea, we propose a novel parametric cross ratio estimator that is a flexible continuous function of both components of the bivariate survival times. We show that the proposed estimator is consistent and asymptotically normal. Its finite sample performance is examined using simulation studies, and it is applied to the Australian twin data. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
341
354
http://hdl.handle.net/10.1093/biomet/asr005
application/pdf
Access to full text is restricted to subscribers.
Tianle Hu
Bin Nan
Xihong Lin
James M. Robins
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:187-1982015-03-25RePEc:oup:biomet
article
Variance estimation for generalized Cavalieri estimators
The precision of stereological estimators based on systematic sampling is of great practical importance. This paper presents methods of data-based variance estimation for generalized Cavalieri estimators where errors in sampling positions may occur. Variance estimators are derived under perturbed systematic sampling, systematic sampling with cumulative errors and systematic sampling with random dropouts. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
187
198
http://hdl.handle.net/10.1093/biomet/asq064
application/pdf
Access to full text is restricted to subscribers.
Johanna Ziegel
Eva B. Vedel Jensen
Karl-Anton Dorph-Petersen
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:459-4712015-03-25RePEc:oup:biomet
article
On balanced random imputation in surveys
Random imputation methods are often used in practice because they tend to preserve the distribution of the variable being imputed, which is an important property when the goal is to estimate population quantiles. However, this type of imputation method introduces additional variability, the imputation variance, due to the random selection of residuals. In this paper, we propose a class of random balanced imputation methods under which the imputation variance is eliminated while the distribution of the variable being imputed is preserved. The rationale behind balanced imputation is to select residuals at random so that appropriate constraints are satisfied. We describe an algorithm for selecting the random residuals that can be viewed as an adaptation of the cube algorithm proposed in the context of balanced sampling (Deville & Tillé, 2004). Results of a simulation study support our findings. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/asr011
application/pdf
Access to full text is restricted to subscribers.
G. Chauvet
J.-C. Deville
D. Haziza
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:881-8922015-03-25RePEc:oup:biomet
article
Bootstrap confidence intervals and hypothesis tests for extrema of parameters
The bootstrap provides effective and accurate methodology for a wide variety of statistical problems which might not otherwise enjoy practicable solutions. However, there still exist important problems where standard bootstrap estimators are not consistent, and where alternative approaches, for example the m-out-of-n bootstrap and asymptotic methods, also face significant challenges. One of these is the problem of constructing confidence intervals or hypothesis tests for extrema of parameters, for example for the maximum of p parameters where each has to be estimated from data. In the present paper we suggest approaches to solving this problem. We use the bootstrap to construct an accurate estimator of the joint distribution of centred parameter estimators, and we base the procedure, either a confidence interval or a hypothesis test, on that distribution estimator. Our methodology is designed so that it errs on the side of conservatism, modulo the small inaccuracy of the bootstrap step. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
881
892
http://hdl.handle.net/10.1093/biomet/asq045
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Hugh Miller
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:417-4312015-03-25RePEc:oup:biomet
article
Distribution estimators and confidence intervals for stereological volumes
Assessing the precision of volume estimates from systematic samples is a question of great practical importance, but statistically a challenging task due to the strong spatial dependence of the data and typically small sample sizes. The approach taken in this paper is more ambitious than earlier methodologies, the goal of which was estimation of the variance of a volume estimator v̂, rather than estimation of the distribution of v̂. We shall show that bootstrap methods yield consistent estimators of the distribution of v̂, and also suggest a variety of confidence intervals for the true volume. Our new methodology covers cases where serial sections are exactly periodic, as well as instances where the physical slicing procedure introduces errors in the placement of the sampling points. Measurement errors within sections are also taken into account. The performance of the method is illustrated by a simulation study with synthetic data, and also applied to real datasets. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
417
431
http://hdl.handle.net/10.1093/biomet/asr012
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Johanna Ziegel
oai:RePEc:oup:biomet:v:98:y:2011:i:2:p:307-3232015-03-25RePEc:oup:biomet
article
Bayesian influence analysis: a geometric approach
In this paper we develop a general framework of Bayesian influence analysis for assessing various perturbation schemes to the data, the prior and the sampling distribution for a class of statistical models. We introduce a perturbation model to characterize these various perturbation schemes. We develop a geometric framework, called the Bayesian perturbation manifold, and use its associated geometric quantities including the metric tensor and geodesic to characterize the intrinsic structure of the perturbation model. We develop intrinsic influence measures and local influence measures based on the Bayesian perturbation manifold to quantify the effect of various perturbations to statistical models. Theoretical and numerical examples are examined to highlight the broad spectrum of applications of this local influence method in a formal Bayesian analysis. Copyright 2011, Oxford University Press.
2
2011
98
Biometrika
307
323
http://hdl.handle.net/10.1093/biomet/asr009
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:961-9682015-03-25RePEc:oup:biomet
article
Probability-based Latin hypercube designs for slid-rectangular regions
Existing space-filling designs are based on the assumption that the experimental region is rectangular, while in practice this assumption can be violated. Motivated by a data centre thermal management study, a class of probability-based Latin hypercube designs is proposed to accommodate a specific type of irregular region. A heuristic algorithm is proposed to search efficiently for optimal designs. Unbiased estimators are proposed, their variances are given and their performances are compared empirically. The proposed method is applied to obtain an optimal sensor placement plan to monitor and study the thermal distribution in a data centre. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
961
968
http://hdl.handle.net/10.1093/biomet/asq051
application/pdf
Access to full text is restricted to subscribers.
Ying Hung
Yasuo Amemiya
Chien-Fu Jeff Wu
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:133-1462015-03-25RePEc:oup:biomet
article
Partial envelopes for efficient estimation in multivariate linear regression
We introduce the partial envelope model, which leads to a parsimonious method for multivariate linear regression when some of the predictors are of special interest. It has the potential to achieve massive efficiency gains compared with the standard model in the estimation of the coefficients for the selected predictors. The partial envelope model is a variation on the envelope model proposed by Cook et al. (2010) but, as it focuses on part of the predictors, it has looser restrictions and can further improve the efficiency. We develop maximum likelihood estimation for the partial envelope model and discuss applications of the bootstrap. An example is provided to illustrate some of its operating characteristics. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
133
146
http://hdl.handle.net/10.1093/biomet/asq063
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:977-9842015-03-25RePEc:oup:biomet
article
On the Voronoi estimator for the intensity of an inhomogeneous planar Poisson process
The Voronoi estimator may be defined for any location as the inverse of the area of the corresponding Voronoi cell. We investigate the statistical properties of this estimator for the intensity of an inhomogeneous Poisson process, and demonstrate it is approximately unbiased with a gamma sampling distribution. We also introduce the centroidal Voronoi estimator, a simple extension based on spatial regularization of the point pattern. Simulations show the Voronoi estimator has remarkably low bias, while the centroidal Voronoi estimator has slightly more bias but is much less variable. The performance is compared to kernel estimators using two simulated datasets and a dataset consisting of earthquakes within the continental United States. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asq047
application/pdf
Access to full text is restricted to subscribers.
C. D. Barr
F. P. Schoenberg
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:91-1062015-03-25RePEc:oup:biomet
article
On asymptotic normality and variance estimation for nondifferentiable survey estimators
Survey estimators of population quantities such as distribution functions and quantiles contain nondifferentiable functions of estimated quantities. The theoretical properties of such estimators are substantially more complicated to derive than those of differentiable estimators. In this article, we provide a unified framework for obtaining the asymptotic design-based properties of two common types of nondifferentiable estimators. Estimators of the first type have an explicit expression, while those of the second are defined only as the solution to estimating equations. We propose both analytical and replication-based design-consistent variance estimators for both cases, based on kernel regression. The practical behaviour of the variance estimators is demonstrated in a simulation experiment. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
91
106
http://hdl.handle.net/10.1093/biomet/asq077
application/pdf
Access to full text is restricted to subscribers.
Jianqiang C. Wang
J. D. Opsomer
oai:RePEc:oup:biomet:v:98:y:2011:i:1:p:35-482015-03-25RePEc:oup:biomet
article
Bayesian geostatistical modelling with informative sampling locations
We consider geostatistical models that allow the locations at which data are collected to be informative about the outcomes. A Bayesian approach is proposed, which models the locations using a log Gaussian Cox process, while modelling the outcomes conditionally on the locations as Gaussian with a Gaussian process spatial random effect and adjustment for the location intensity process. We prove posterior propriety under an improper prior on the parameter controlling the degree of informative sampling, demonstrating that the data are informative. In addition, we show that the density of the locations and mean function of the outcome process can be estimated consistently under mild assumptions. The methods show significant evidence of informative sampling when applied to ozone data over Eastern U.S.A. Copyright 2011, Oxford University Press.
1
2011
98
Biometrika
35
48
http://hdl.handle.net/10.1093/biomet/asq067
application/pdf
Access to full text is restricted to subscribers.
D. Pati
B. J. Reich
D. B. Dunson
oai:RePEc:oup:biomet:v:97:y:2010:i:4:p:969-9762015-03-25RePEc:oup:biomet
article
Varying coefficient transformation models with censored data
A maximum likelihood method with spline smoothing is proposed for linear transformation models with varying coefficients. The estimation and inference procedures are computationally easy. Under some regularity conditions, the estimators are proved to be consistent and asymptotically normal. A simulation study using the Stanford transplant data is presented to show that the proposed method performs well with a finite sample and is easy to use in practice. Copyright 2010, Oxford University Press.
4
2010
97
Biometrika
969
976
http://hdl.handle.net/10.1093/biomet/asq032
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Xingwei Tong
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:385-3982013-08-02RePEc:oup:biomet
article
Weighting in survey analysis under informative sampling
Sampling related to the outcome variable of a regression analysis conditional on covariates is called informative sampling and may lead to bias in ordinary least squares estimation. Weighting by the reciprocal of the inclusion probability approximately removes such bias but may inflate variance. This paper investigates two ways of modifying such weights to improve efficiency while retaining consistency. One approach is to multiply the inverse probability weights by functions of the covariates. The second is to smooth the weights given values of the outcome variable and covariates. Optimal ways of constructing weights by these two approaches are explored. Both approaches require the fitting of auxiliary weight models. The asymptotic properties of the resulting estimators are investigated and linearization variance estimators are obtained. The approach is extended to pseudo maximum likelihood estimation for generalized linear models. The properties of the different weighted estimators are compared in a limited simulation study. The robustness of the estimators to misspecification of the auxiliary weight model or of the regression model of interest is discussed. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
385
398
http://hdl.handle.net/10.1093/biomet/ass085
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
C. J. Skinner
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:519-5242013-08-02RePEc:oup:biomet
article
A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices
Chatterjee et al. (2011) established the consistency of the maximum likelihood estimator in the β-model for undirected random graphs when the number of vertices goes to infinity. By approximating the inverse of the Fisher information matrix, we prove asymptotic normality of the maximum likelihood estimator under mild conditions. Simulation studies and a data example illustrate the theoretical results. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
519
524
http://hdl.handle.net/10.1093/biomet/ass084
application/pdf
Access to full text is restricted to subscribers.
Ting Yan
Jinfeng Xu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:485-4942013-08-02RePEc:oup:biomet
article
Log-mean linear models for binary data
This paper introduces a novel class of models for binary data, which we call log-mean linear models. They are specified by linear constraints on the log-mean linear parameter, defined as a log-linear expansion of the mean parameter of the multivariate Bernoulli distribution. We show that marginal independence relationships between variables can be specified by setting certain log-mean linear interactions to zero and, more specifically, that graphical models of marginal independence are log-mean linear models. Our approach overcomes some drawbacks of the existing parameterizations of graphical models of marginal independence. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
485
494
http://hdl.handle.net/10.1093/biomet/ass080
application/pdf
Access to full text is restricted to subscribers.
A. Roverato
M. Lupparelli
L. La Rocca
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:473-4842013-08-02RePEc:oup:biomet
article
The role of the range parameter for estimation and prediction in geostatistics
Two canonical problems in geostatistics are estimating the parameters in a specified family of stochastic process models and predicting the process at new locations. We show that asymptotic results for a Gaussian process over a fixed domain with Matérn covariance function, previously proven only in the case of a fixed range parameter, can be extended to the case of jointly estimating the range and the variance of the process. Moreover, we show that intuition and approximations derived from asymptotics using a fixed range parameter can be problematic when applied to finite samples, even for large sample sizes. In contrast, we show via simulation that performance is improved and asymptotic approximations are applicable for smaller sample sizes when the parameters are jointly estimated. These effects are particularly apparent when the process is mean square differentiable or the effective range of spatial correlation is small. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
473
484
http://hdl.handle.net/10.1093/biomet/ass079
application/pdf
Access to full text is restricted to subscribers.
C. G. Kaufman
B. A. Shaby
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:269-2762013-08-02RePEc:oup:biomet
article
Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation
We show that the proportional likelihood ratio model proposed recently by Luo & Tsai (2012) enjoys model-invariant properties under certain forms of nonignorable missing mechanisms and randomly double-truncated data, so that target parameters in the population can be estimated consistently from those biased samples. We also construct an alternative estimator for the target parameters by maximizing a pseudolikelihood that eliminates a functional nuisance parameter in the model. The corresponding estimating equation has a U-statistic structure. As an added advantage of the proposed method, a simple score-type test is developed to test a null hypothesis on the regression coefficients. Simulations show that the proposed estimator has a small-sample efficiency similar to that of the nonparametric likelihood estimator and performs well for certain nonignorable missing data problems. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
269
276
http://hdl.handle.net/10.1093/biomet/ass056
application/pdf
Access to full text is restricted to subscribers.
Kwun Chuen Gary Chan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:431-4452013-08-02RePEc:oup:biomet
article
Simple tiered classifiers
In this paper we propose simple, general tiered classifiers for relatively complex data. Empirical studies on real and simulated data show that three two-tier classifiers, which are respective extensions of linear discriminant analysis, linear logistic regression and support vector machines, can noticeably reduce the relatively high misclassification error of their original single-tier counterparts, without significantly increasing computational labour. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
431
445
http://hdl.handle.net/10.1093/biomet/ass086
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Yingcun Xia
Jing-Hao Xue
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:417-4302013-08-02RePEc:oup:biomet
article
Estimation with missing data: beyond double robustness
We propose an estimator that is more robust than doubly robust estimators, based on weighting complete cases using weights other than inverse probability when estimating the population mean of a response variable subject to ignorable missingness. We allow multiple models for both the propensity score and the outcome regression. Our estimator is consistent if any of the multiple models is correctly specified. Such multiple robustness against model misspecification is a significant improvement over double robustness, which allows only one propensity score model and one outcome regression model. Our estimator attains the semiparametric efficiency bound when one propensity score model and one outcome regression model are correctly specified, without requiring knowledge of which models are correct. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
417
430
http://hdl.handle.net/10.1093/biomet/ass087
application/pdf
Access to full text is restricted to subscribers.
Peisong Han
Lu Wang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:91-1102013-08-02RePEc:oup:biomet
article
Sampling decomposable graphs using a Markov chain on junction trees
Full Bayesian computational inference for model determination in undirected graphical models is currently restricted to decomposable graphs or other special cases, except for small-scale problems, say up to 15 variables. In this paper we develop new, more efficient methodology for such inference, by making two contributions to the computational geometry of decomposable graphs. The first of these provides sufficient conditions under which it is possible to completely connect two disconnected complete subsets of vertices, or perform the reverse procedure, yet maintain decomposability of the graph. The second is a new Markov chain Monte Carlo sampler for arbitrary positive distributions on decomposable graphs, taking a junction tree representing the graph as its state variable. The resulting methodology is illustrated with numerical experiments on three models. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
91
110
http://hdl.handle.net/10.1093/biomet/ass052
application/pdf
Access to full text is restricted to subscribers.
Peter J. Green
Alun Thomas
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:3-152013-08-02RePEc:oup:biomet
article
Karl Pearson's Biometrika: 1901–36
Karl Pearson edited Biometrika for the first 35 years of its existence. Not only did he shape the journal, he also contributed over 200 pieces and inspired, more or less directly, most of the other contributions. The journal could not be separated from the man. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
3
15
http://hdl.handle.net/10.1093/biomet/ass077
application/pdf
Access to full text is restricted to subscribers.
John Aldrich
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:525-5302013-08-02RePEc:oup:biomet
article
Efficient estimation of the censored linear regression model
In linear regression or accelerated failure time models, complications in efficient estimation arise from the multiple roots of the efficient score and density estimation. This paper proposes a one-step efficient estimation method based on a counting process martingale, which has several advantages: it avoids the multiple-root problem, the initial estimator is easily available and the variance estimator can be obtained by employing plug-in rules. A simple and effective data-driven bandwidth selector is provided. The proposed estimator is proved to be semiparametric efficient, with the same asymptotic variance as the efficient estimator when the error distribution is known up to a location shift. Numerical studies with supportive evidence are presented. The proposal is applied to the Colorado Plateau uranium miners data. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
525
530
http://hdl.handle.net/10.1093/biomet/ass073
application/pdf
Access to full text is restricted to subscribers.
Yuanyuan Lin
Kani Chen
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:511-5182013-08-02RePEc:oup:biomet
article
Composite likelihood estimation for the Brown–Resnick process
Genton et al. (2011) investigated the gain in efficiency when triplewise, rather than pairwise, likelihood is used to fit the popular Smith max-stable model for spatial extremes. We generalize their results to the Brown–Resnick model and show that the efficiency gain is substantial only for very smooth processes, which are generally unrealistic in applications. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
511
518
http://hdl.handle.net/10.1093/biomet/ass089
application/pdf
Access to full text is restricted to subscribers.
R. Huser
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:319-3382013-08-02RePEc:oup:biomet
article
Using shared genetic controls in studies of gene-environment interactions
With the advent of modern genomic methods to adjust for population stratification, the use of external or publicly available controls has become an attractive option for reducing the cost of large-scale case-control genetic association studies. In this article, we study the estimation of joint effects of genetic and environmental exposures from a case-control study where data on genome-wide markers are available on the cases and a set of external controls while data on environmental exposures are available on the cases and a set of internal controls. We show that under such a design, one can exploit an assumption of gene-environment independence in the underlying population to estimate the gene-environment joint effects, after adjustment for population stratification. We develop a semiparametric profile likelihood method and related pseudolikelihood and working likelihood methods that are easy to implement in practice. We propose variance estimators for the methods based on asymptotic theory. Simulation is used to study the performance of the methods, and data from a multi-centre genome-wide association study of bladder cancer is further used to illustrate their application. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
319
338
http://hdl.handle.net/10.1093/biomet/ass078
application/pdf
Access to full text is restricted to subscribers.
Yi-Hau Chen
Nilanjan Chatterjee
Raymond J. Carroll
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:213-2202013-08-02RePEc:oup:biomet
article
Spatially varying cross-correlation coefficients in the presence of nugget effects
We derive sufficient conditions for the cross-correlation coefficient of a multivariate spatial process to vary with location when the spatial model is augmented with nugget effects. The derived class is valid for any choice of covariance functions, and yields substantial flexibility between multiple processes. The key is to identify the cross-correlation coefficient matrix with a contraction matrix, which can be either diagonal, implying a parsimonious formulation, or a fully general contraction matrix, yielding greater flexibility but added model complexity. We illustrate the approach with a bivariate minimum and maximum temperature dataset in Colorado, allowing the two variables to be positively correlated at low elevations and nearly independent at high elevations, while still yielding a positive definite covariance matrix. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
213
220
http://hdl.handle.net/10.1093/biomet/ass057
application/pdf
Access to full text is restricted to subscribers.
William Kleiber
Marc G. Genton
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:189-2022013-08-02RePEc:oup:biomet
article
Benchmarking small area estimators
This paper considers benchmarking issues in the context of small area estimation. We find optimal estimators within the class of benchmarked linear estimators under linear constraints. This extends existing results for external and internal benchmarking, and also links the two. Necessary and sufficient conditions for self-benchmarking are found for an augmented model. Most results of this paper are found using ideas of orthogonal projection. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
189
202
http://hdl.handle.net/10.1093/biomet/ass063
application/pdf
Access to full text is restricted to subscribers.
W. R. Bell
G. S. Datta
M. Ghosh
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:339-3542013-08-02RePEc:oup:biomet
article
Estimating time-varying effects for overdispersed recurrent events data with treatment switching
In the analysis of multivariate event times, frailty models assuming time-independent regression coefficients are often considered, mainly due to their mathematical convenience. In practice, regression coefficients are often time dependent and the temporal effects are of clinical interest. Motivated by a phase III clinical trial in multiple sclerosis, we develop a semiparametric frailty modelling approach to estimate time-varying effects for overdispersed recurrent events data with treatment switching. The proposed model incorporates the treatment switching time in the time-varying coefficients. Theoretical properties of the proposed model are established and an efficient expectation-maximization algorithm is derived to obtain the maximum likelihood estimates. Simulation studies evaluate the numerical performance of the proposed model under various temporal treatment effect curves. The ideas in this paper can also be used for time-varying coefficient frailty models without treatment switching as well as for alternative models when the proportional hazard assumption is violated. A multiple sclerosis dataset is analysed to illustrate our methodology. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
339
354
http://hdl.handle.net/10.1093/biomet/ass091
application/pdf
Access to full text is restricted to subscribers.
Qingxia Chen
Donglin Zeng
Joseph G. Ibrahim
Mouna Akacha
Heinz Schmidli
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:229-2342013-08-02RePEc:oup:biomet
article
The Kolmogorov filter for variable screening in high-dimensional binary classification
Variable screening techniques have been proposed to mitigate the impact of high dimensionality in classification problems, including t-test marginal screening (Fan & Fan, 2008) and maximum marginal likelihood screening (Fan & Song, 2010). However, these methods rely on strong modelling assumptions that are easily violated in real applications. To circumvent the parametric modelling assumptions, we propose a new variable screening technique for binary classification based on the Kolmogorov–Smirnov statistic. We prove that this so-called Kolmogorov filter enjoys the sure screening property under much weakened model assumptions. We supplement our theoretical study by a simulation study. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
229
234
http://hdl.handle.net/10.1093/biomet/ass062
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:459-4712013-08-02RePEc:oup:biomet
article
Data augmentation for non-Gaussian regression models using variance-mean mixtures
We use the theory of normal variance-mean mixtures to derive a data-augmentation scheme for a class of common regularization problems. This generalizes existing theory on normal variance mixtures for priors in regression and classification. It also allows variants of the expectation-maximization algorithm to be brought to bear on a wider range of models than previously appreciated. We demonstrate the method on several examples, focusing on the case of binary logistic regression. We also show that quasi-Newton acceleration can substantially improve the speed of the algorithm without compromising its robustness. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
459
471
http://hdl.handle.net/10.1093/biomet/ass081
application/pdf
Access to full text is restricted to subscribers.
N. G. Polson
J. G. Scott
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:1-12013-08-02RePEc:oup:biomet
article
Editorial
1
2013
100
Biometrika
1
1
http://hdl.handle.net/10.1093/biomet/ast003
application/pdf
Access to full text is restricted to subscribers.
A. C. Davison
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:249-2532013-08-02RePEc:oup:biomet
article
Blocked two-level regular factorial designs with weak minimum aberration
This paper considers the construction of blocked two-level regular designs with weak minimum aberration. We first obtain the minimum value of the number of two-factor interactions which are aliased with the block effects. Based on this result, two methods are then proposed in two different scenarios to construct weak minimum aberration blocked two-level designs with respect to some existing combined wordlength patterns. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
249
253
http://hdl.handle.net/10.1093/biomet/ass061
application/pdf
Access to full text is restricted to subscribers.
Shengli Zhao
Pengfei Li
Rohana Karunamuni
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:241-2482013-08-02RePEc:oup:biomet
article
Bias attenuation results for nondifferentially mismeasured ordinal and coarsened confounders
Suppose we are interested in the effect of a binary treatment on an outcome where that relationship is confounded by an ordinal confounder. We assume that the true confounder is not observed but, rather, we observe a nondifferentially mismeasured version of it. We show that, under certain monotonicity assumptions about its effect on the treatment and on the outcome, an effect measure controlling for the mismeasured confounder will fall between the corresponding crude and true effect measures. We also present results for coarsened and, under further assumptions, multiple misclassified confounders. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
241
248
http://hdl.handle.net/10.1093/biomet/ass054
application/pdf
Access to full text is restricted to subscribers.
Elizabeth L. Ogburn
Tyler J. VanderWeele
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:157-1722013-08-02RePEc:oup:biomet
article
Simultaneous discovery of rare and common segment variants
Copy number variants are an important type of genetic structural variation in germline DNA, ranging from common to rare in a population. Both rare and common copy number variants have been reported to be associated with complex diseases, so it is important to identify both simultaneously based on a large set of population samples. We develop a proportion adaptive segment selection procedure that automatically adjusts to the unknown proportions of the carriers of the segment variants. We characterize the detection boundary that separates the region where a segment variant is detectable by some method from the region where it cannot be detected. Although the detection boundaries are very different for the rare and common segment variants, it is shown that the proposed procedure can reliably identify both whenever they are detectable. Compared with methods for single-sample analysis, this procedure gains power by pooling information from multiple samples. The method is applied to analyse neuroblastoma samples and identifies a large number of copy number variants that are missed by single-sample methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
157
172
http://hdl.handle.net/10.1093/biomet/ass059
application/pdf
Access to full text is restricted to subscribers.
X. Jessie Jeng
T. Tony Cai
Hongzhe Li
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:447-4582013-08-02RePEc:oup:biomet
article
Penalized multivariate Whittle likelihood for power spectrum estimation
Nonparametric estimation procedures that can flexibly account for varying levels of smoothness among different functional parameters, such as penalized likelihoods, have been developed in a variety of settings. However, geometric constraints on power spectra have limited the development of such methods when estimating the power spectrum of a vector-valued time series. This article introduces a penalized likelihood approach to nonparametric multivariate spectral analysis through the minimization of a penalized Whittle negative loglikelihood. This likelihood is derived from the large-sample distribution of the periodogram and includes a penalty function that forms a measure of regularity on multivariate power spectra. The approach allows for varying levels of smoothness among spectral components while accounting for the positive definiteness of spectral matrices and the Hermitian and periodic structures of power spectra as functions of frequency. The consistency of the proposed estimator is derived and its empirical performance is demonstrated in a simulation study and in an analysis of indoor air quality. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
447
458
http://hdl.handle.net/10.1093/biomet/ass088
application/pdf
Access to full text is restricted to subscribers.
Robert T. Krafty
William O. Collinge
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:203-2122013-08-02RePEc:oup:biomet
article
Unified inference for sparse and dense longitudinal models
In longitudinal data analysis, statistical inference for sparse data and dense data can be substantially different. For kernel smoothing estimates of the mean function, the convergence rates and the limiting variance functions differ between the two scenarios. This phenomenon poses challenges for statistical inference, as a subjective choice between the sparse and dense cases may lead to wrong conclusions. We develop methods based on self-normalization that adapt to the sparse and dense cases in a unified framework. Simulations show that the proposed methods outperform some existing methods. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
203
212
http://hdl.handle.net/10.1093/biomet/ass050
application/pdf
Access to full text is restricted to subscribers.
Seonjin Kim
Zhibiao Zhao
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:235-2402013-08-02RePEc:oup:biomet
article
Interval estimation of population means under unknown but bounded probabilities of sample selection
Applying concepts from partial identification to the domain of finite population sampling, we propose a method for interval estimation of a population mean when the probabilities of sample selection lie within a posited interval. The interval estimate is derived from sharp bounds on the Hájek (1971) estimator of the population mean. We demonstrate the method's utility for sensitivity analysis by applying it to a sample of needles collected as part of a syringe tracking and testing programme in New Haven, Connecticut. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
235
240
http://hdl.handle.net/10.1093/biomet/ass064
application/pdf
Access to full text is restricted to subscribers.
Peter M. Aronow
Donald K. K. Lee
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:75-892013-08-02RePEc:oup:biomet
article
Efficient Gaussian process regression for large datasets
Gaussian processes are widely used in nonparametric regression, classification and spatiotemporal modelling, facilitated in part by a rich literature on their theoretical properties. However, one of their practical limitations is expensive computation, typically of order n^3, where n is the number of data points, in performing the necessary matrix inversions. For large datasets, storage and processing also lead to computational bottlenecks, and numerical stability of the estimates and predicted values degrades with increasing n. Various methods have been proposed to address these problems, including predictive processes in spatial data analysis and the subset-of-regressors technique in machine learning. The idea underlying these approaches is to use a subset of the data, but this raises questions concerning sensitivity to the choice of subset and limitations in estimating fine-scale structure in regions that are not well covered by the subset. Motivated by the literature on compressive sensing, we propose an alternative approach that involves linear projection of all the data points onto a lower-dimensional subspace. We demonstrate the superiority of this approach from a theoretical perspective and through simulated and real data examples. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
75
89
http://hdl.handle.net/10.1093/biomet/ass068
application/pdf
Access to full text is restricted to subscribers.
Anjishnu Banerjee
David B. Dunson
Surya T. Tokdar
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:399-4152013-08-02RePEc:oup:biomet
article
Simple design-efficient calibration estimators for rejective and high-entropy sampling
For survey calibration, consider the situation where the population totals of auxiliary variables are known or where auxiliary variables are measured for all population units. For each situation, we develop design-efficient calibration estimators under rejective or high-entropy sampling. A general approach is to extend efficient estimators for missing-data problems with independent and identically distributed data to the survey setting. We show that this approach effectively resolves two long-standing issues in existing approaches: how to achieve design efficiency regardless of a linear superpopulation model in generalized regression and calibration estimation, and how to find a simple approximation in optimal regression estimation. Moreover, the proposed approach sheds light on several issues that seem not to be well studied in the literature. Examples include use of the weighted Kullback–Leibler distance in calibration estimation, and efficient estimation allowing for misspecification of a nonlinear superpopulation model. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
399
415
http://hdl.handle.net/10.1093/biomet/ass090
application/pdf
Access to full text is restricted to subscribers.
Z. Tan
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:355-3702013-08-02RePEc:oup:biomet
article
Estimation of a sparse group of sparse vectors
We consider estimating a sparse group of sparse normal mean vectors, based on penalized likelihood estimation with complexity penalties on the number of nonzero mean vectors and the numbers of their significant components, which can be performed by a fast algorithm. The resulting estimators are developed within a Bayesian framework and can be viewed as maximum a posteriori estimators. We establish their adaptive minimaxity over a wide range of sparse and dense settings. A simulation study demonstrates the efficiency of the proposed approach, which successfully competes with the sparse group lasso estimator. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
355
370
http://hdl.handle.net/10.1093/biomet/ass082
application/pdf
Access to full text is restricted to subscribers.
Felix Abramovich
Vadim Grinshtein
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:277-2812013-08-02RePEc:oup:biomet
article
Optimal estimation of Poisson intensity with partially observed covariates
Rathbun et al. (2007) and Waagepetersen (2008) propose estimating functions for parameters of Poisson point process intensity that may be applied when space- and/or time-varying covariates are sampled from a probability-based sampling design. This paper demonstrates that Waagepetersen's estimating function is optimal in a class of weighted estimating functions. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
277
281
http://hdl.handle.net/10.1093/biomet/ass069
application/pdf
Access to full text is restricted to subscribers.
S. L. Rathbun
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:495-5022013-08-02RePEc:oup:biomet
article
The optimal power puzzle: scrutiny of the monotone likelihood ratio assumption in multiple testing
In single hypothesis testing, power is a nondecreasing function of the Type I error rate; hence it is desirable to test at exactly the nominal level to achieve optimal power. The optimal power puzzle arises from the fact that for multiple testing under the false discovery rate paradigm, such a monotonic relationship may not hold. In particular, exact false discovery rate control may lead to a less powerful testing procedure if a test statistic fails to fulfil the monotone likelihood ratio condition. In this article, we identify different scenarios wherein the condition fails and give caveats for conducting multiple testing in practical settings. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
495
502
http://hdl.handle.net/10.1093/biomet/ast001
application/pdf
Access to full text is restricted to subscribers.
Hongyuan Cao
Wenguang Sun
Michael R. Kosorok
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:503-5102013-08-02RePEc:oup:biomet
article
A consistent multivariate test of association based on ranks of distances
We consider the problem of detecting associations between random vectors of any dimension. Few tests of independence exist that are consistent against all dependent alternatives. We propose a powerful test that is applicable in all dimensions and consistent against all alternatives. The test has a simple form, is easy to implement, and has good power. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
503
510
http://hdl.handle.net/10.1093/biomet/ass070
application/pdf
Access to full text is restricted to subscribers.
Ruth Heller
Yair Heller
Malka Gorfine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:301-3172013-08-02RePEc:oup:biomet
article
A multiple comparison procedure for hypotheses with gatekeeping structure
We develop gatekeeping procedures that focus on comparing multiple treatments with a control when there are multiple endpoints. Our procedures utilize estimated correlations among individual test statistics without parametric assumptions. We make comparisons with other gatekeeping procedures with respect to properties of the trade-off in statistical power between families of hypotheses. We introduce a reward function to facilitate these comparisons. We illustrate our methods by simulation and an analysis of data from a randomized, multi-armed clinical trial. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
301
317
http://hdl.handle.net/10.1093/biomet/ass083
application/pdf
Access to full text is restricted to subscribers.
Xiaolong Luo
Guang Chen
S. Peter Ouyang
Bruce W. Turnbull
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:125-1382013-08-02RePEc:oup:biomet
article
A nonparametric prior for simultaneous covariance estimation
In the modelling of longitudinal data from several groups, appropriate handling of the dependence structure is of central importance. Standard methods include specifying a single covariance matrix for all groups or independently estimating the covariance matrix for each group without regard to the others, but when these model assumptions are incorrect, these techniques can lead to biased mean effects or loss of efficiency, respectively. Thus, it is desirable to develop methods for simultaneously estimating the covariance matrix for each group that will borrow strength across groups in a way that is ultimately informed by the data. In addition, for several groups with covariance matrices of even medium dimension, it is difficult to manually select a single best parametric model among the huge number of possibilities given by incorporating structural zeros and/or commonality of individual parameters across groups. In this paper we develop a family of nonparametric priors using the matrix stick-breaking process of Dunson et al. (2008) that seeks to accomplish this task by parameterizing the covariance matrices in terms of their modified Cholesky decompositions (Pourahmadi, 1999). We establish some theoretical properties of these priors, examine their effectiveness via a simulation study, and illustrate the priors using data from a longitudinal clinical trial. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
125
138
http://hdl.handle.net/10.1093/biomet/ass060
application/pdf
Access to full text is restricted to subscribers.
Jeremy T. Gaskins
Michael J. Daniels
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:17-732013-08-02RePEc:oup:biomet
article
Biometrika highlights from volume 28 onwards
Highlights, trends and influences associated with the pages of Biometrika after the editorship of Karl Pearson are identified. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
17
73
http://hdl.handle.net/10.1093/biomet/ass076
application/pdf
Access to full text is restricted to subscribers.
D. M. Titterington
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:254-2602013-08-02RePEc:oup:biomet
article
Strong orthogonal arrays and associated Latin hypercubes for computer experiments
This paper introduces, constructs and studies a new class of arrays, called strong orthogonal arrays, as suitable designs for computer experiments. A strong orthogonal array of strength t enjoys better space-filling properties than a comparable orthogonal array in all dimensions lower than t while retaining the space-filling properties of the latter in t dimensions. Latin hypercubes based on strong orthogonal arrays of strength t are more space-filling than comparable orthogonal array-based Latin hypercubes in all g dimensions for any 2 ≤ g ≤ t - 1. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
254
260
http://hdl.handle.net/10.1093/biomet/ass065
application/pdf
Access to full text is restricted to subscribers.
Yuanzhen He
Boxin Tang
oai:RePEc:oup:biomet:v:100:y:2013:i:1:p:173-1872013-08-02RePEc:oup:biomet
article
Smoothed nonparametric estimation for current status competing risks data
We study the nonparametric estimation of the cumulative incidence function and the cause-specific hazard function for current status data with competing risks via kernel smoothing. A smoothed naive nonparametric maximum likelihood estimator and a smoothed full nonparametric maximum likelihood estimator are shown to have pointwise asymptotic normality and faster convergence rates than the corresponding unsmoothed nonparametric likelihood estimators. Using the smoothed estimators and the plug-in principle, we can estimate the cause-specific hazard function, which has not been studied previously. We also propose semi-smoothed estimators of the cause-specific hazard as an alternative to the smoothed estimator and demonstrate that neither is uniformly more efficient than the other. Numerical studies show that a smoothed bootstrap method works well for selecting the bandwidths in the smoothed nonparametric estimation. The use of the estimators is exemplified by an application to cumulative incidence and hazard of subtype-specific HIV infection from a sero-prevalence study in injecting drug users in Thailand. Copyright 2013, Oxford University Press.
1
2013
100
Biometrika
173
187
http://hdl.handle.net/10.1093/biomet/ass053
application/pdf
Access to full text is restricted to subscribers.
Chenxi Li
Jason P. Fine
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:283-3002013-08-02RePEc:oup:biomet
article
Simultaneous confidence intervals uniformly more likely to determine signs
Many studies draw inferences about multiple endpoints but ignore the statistical implications of multiplicity. Effects inferred to be positive when there is no adjustment for multiplicity can lose their statistical significance when multiplicity is taken into account, perhaps explaining why such adjustments are so often omitted. We develop new simultaneous confidence intervals that mitigate this problem; these are uniformly more likely to determine signs than are standard simultaneous confidence intervals. When one or more of the parameter estimates are small, the new intervals sacrifice some length to avoid crossing zero; but when all the parameter estimates are large, the new intervals coincide with standard simultaneous confidence intervals, so there is no loss of precision. When only a small fraction of the estimates are small, the procedure can determine signs essentially as well as one-sided tests with prespecified directions, incurring only a modest penalty in maximum length. The intervals are constructed by inverting level-α tests to form a 1 - α confidence set, and then projecting that set onto the coordinate axes to get confidence intervals. The tests have hyper-rectangular acceptance regions that minimize the maximum amount by which the acceptance region protrudes from the orthant that contains the hypothesized parameter value, subject to a constraint on the maximum side-length of the hyper-rectangle. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
283
300
http://hdl.handle.net/10.1093/biomet/ass074
application/pdf
Access to full text is restricted to subscribers.
Yoav Benjamini
Vered Madar
Philip B. Stark
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:371-3832013-08-02RePEc:oup:biomet
article
Efficiency loss and the linearity condition in dimension reduction
Linearity, sometimes jointly with constant variance, is routinely assumed in the context of sufficient dimension reduction. It is well understood that, when these conditions do not hold, blindly using them may lead to inconsistency in estimating the central subspace and the central mean subspace. Surprisingly, we discover that even if these conditions do hold, using them will bring efficiency loss. This paradoxical phenomenon is illustrated through sliced inverse regression and principal Hessian directions. The efficiency loss also applies to other dimension reduction procedures. We explain this empirical discovery by theoretical investigation. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
371
383
http://hdl.handle.net/10.1093/biomet/ass075
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Liping Zhu
oai:RePEc:oup:biomet:v:100:y:2013:i:2:p:531-5372013-08-02RePEc:oup:biomet
article
On the likelihood ratio test for envelope models in multivariate linear regression
We investigate the likelihood ratio test for a hypothesis regarding the dimension of the Σ-envelope of span(β) in a multivariate linear regression model. The asymptotic null distribution of the likelihood ratio statistic is obtained as some nuisance parameters approach infinity. A saddlepoint approximation is also given for this limiting distribution. The accuracy of this approximation and its comparison to the standard chi-squared approximation are assessed via simulation. The results can be used in a similar test for partial envelope models. Copyright 2013, Oxford University Press.
2
2013
100
Biometrika
531
537
http://hdl.handle.net/10.1093/biomet/ast002
application/pdf
Access to full text is restricted to subscribers.
James R. Schott
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:43-552012-05-01RePEc:oup:biomet
article
Modelling the distribution of the cluster maxima of exceedances of subasymptotic thresholds
A standard approach to modelling the extreme values of a stationary process is the peaks over threshold method, which consists of imposing a high threshold, identifying clusters of exceedances of this threshold and fitting the maximum value from each cluster using the generalized Pareto distribution. This approach is strongly justified by underlying asymptotic theory. We propose an alternative model for the distribution of the cluster maxima that accounts for the subasymptotic theory of extremes of a stationary process. This new distribution is a product of two terms, one for the marginal distribution of exceedances and the other for the dependence structure of the exceedance values within a cluster. We illustrate the improvement in fit, measured by the root mean square error of the estimated quantiles, offered by the new distribution over the peaks over threshold analysis using simulated and hydrological data, and we suggest a diagnostic tool to help identify when the proposed model is likely to lead to an improved fit. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
43
55
http://hdl.handle.net/10.1093/biomet/asr078
application/pdf
Access to full text is restricted to subscribers.
Emma F. Eastoe
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:230-2372012-05-01RePEc:oup:biomet
article
Estimating overdispersion when fitting a generalized linear model to sparse data
We consider the problem of fitting a generalized linear model to overdispersed data, focussing on a quasilikelihood approach in which the variance is assumed to be proportional to that specified by the model, and the constant of proportionality, φ, is used to obtain appropriate standard errors and model comparisons. It is common practice to base an estimate of φ on Pearson's lack-of-fit statistic, with or without Farrington's modification. We propose a new estimator that has a smaller variance, subject to a condition on the third moment of the response variable. We conjecture that this condition is likely to be achieved for the important special cases of count and binomial data. We illustrate the benefits of the new estimator using simulations for both count and binomial data. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
230
237
http://hdl.handle.net/10.1093/biomet/asr083
application/pdf
Access to full text is restricted to subscribers.
D. J. Fletcher
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:85-1002012-05-01RePEc:oup:biomet
article
Combining data from two independent surveys: a model-assisted approach
Combining information from two or more independent surveys is a problem frequently encountered in survey sampling. We consider the case of two independent surveys, where a large sample from survey 1 collects only auxiliary information and a much smaller sample from survey 2 provides information on both the variables of interest and the auxiliary variables. We propose a model-assisted projection method of estimation based on a working model, but the reference distribution is design-based. We generate synthetic or proxy values of a variable of interest by first fitting the working model, relating the variable of interest to the auxiliary variables, to the data from survey 2 and then predicting the variable of interest associated with the auxiliary variables observed in survey 1. The projection estimator of a total is simply obtained from the survey 1 weights and associated synthetic values. We identify the conditions for the projection estimator to be asymptotically unbiased. Domain estimation using the projection method is also considered. Replication variance estimators are obtained by augmenting the synthetic data file for survey 1 with additional synthetic columns associated with the columns of replicate weights. Results from a simulation study are presented. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
85
100
http://hdl.handle.net/10.1093/biomet/asr063
application/pdf
Access to full text is restricted to subscribers.
Jae Kwang Kim
J. N. K. Rao
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:167-1842012-05-01RePEc:oup:biomet
article
Estimating treatment effects with treatment switching via semicompeting risks models: an application to a colorectal cancer study
Treatment switching is a frequent occurrence in clinical trials, where, during the course of the trial, patients who fail on the control treatment may change to the experimental treatment. Analysing the data without accounting for switching yields highly biased and inefficient estimates of the treatment effect. In this paper, we propose a novel class of semiparametric semicompeting risks transition survival models to accommodate treatment switches. Theoretical properties of the proposed model are examined and an efficient expectation-maximization algorithm is derived for obtaining the maximum likelihood estimates. Simulation studies are conducted to demonstrate the superiority of the model compared with the intent-to-treat analysis and other methods proposed in the literature. The proposed method is applied to data from a colorectal cancer clinical trial. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
167
184
http://hdl.handle.net/10.1093/biomet/asr062
application/pdf
Access to full text is restricted to subscribers.
Donglin Zeng
Qingxia Chen
Ming-Hui Chen
Joseph G. Ibrahim
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:15-282012-05-01RePEc:oup:biomet
article
Factor profiled sure independence screening
We propose a method of factor profiled sure independence screening for ultrahigh-dimensional variable selection. The objective of this method is to identify nonzero components consistently from a sparse coefficient vector. The new method assumes that the correlation structure of the high-dimensional data can be well represented by a set of low-dimensional latent factors, which can be estimated consistently by eigenvalue-eigenvector decomposition. The estimated latent factors are then profiled out from both the response and the predictors. Such an operation, referred to as factor profiling, produces uncorrelated predictors. Therefore, sure independence screening can be applied subsequently and the resulting screening is consistent for model selection, a major advantage that standard sure independence screening does not share. We refer to the new method as factor profiled sure independence screening. Numerical studies confirm its outstanding performance. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
15
28
http://hdl.handle.net/10.1093/biomet/asr074
application/pdf
Access to full text is restricted to subscribers.
H. Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:101-1132012-05-01RePEc:oup:biomet
article
Optimal allocation to maximize the power of two-sample tests for binary response
We study allocations that maximize the power of tests of equality of two treatments having binary outcomes. When a normal approximation applies, the asymptotic power is maximized by minimizing the variance, leading to a Neyman allocation that assigns observations in proportion to the standard deviations. This allocation, which in general requires knowledge of the parameters of the problem, is recommended in a large body of literature. Under contiguous alternatives the normal approximation indeed applies, and in this case the Neyman allocation reduces to a balanced design. However, when studying the power under a noncontiguous alternative, a large deviations approximation is needed, and the Neyman allocation is no longer asymptotically optimal. In the latter case, the optimal allocation depends on the parameters, but is rather close to a balanced design. Thus, a balanced design is a viable option for both contiguous and noncontiguous alternatives. Finite sample studies show that a balanced design is indeed generally quite close to being optimal for power maximization. This is good news as implementation of a balanced design does not require knowledge of the parameters. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
101
113
http://hdl.handle.net/10.1093/biomet/asr077
application/pdf
Access to full text is restricted to subscribers.
D. Azriel
M. Mandel
Y. Rinott
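The Neyman allocation mentioned in the abstract above assigns a fraction of the sample to each arm in proportion to the Bernoulli standard deviation sqrt{p(1 - p)}, which minimizes the variance of the difference in sample proportions. The sketch below (function names are hypothetical) covers only this normal-approximation criterion; it does not reproduce the large deviations analysis the paper uses for noncontiguous alternatives.

```python
import math

def bernoulli_sd(p: float) -> float:
    """Standard deviation of a single Bernoulli(p) outcome."""
    return math.sqrt(p * (1.0 - p))

def neyman_fraction(p1: float, p2: float) -> float:
    """Fraction of the total sample allocated to arm 1 under Neyman allocation."""
    s1, s2 = bernoulli_sd(p1), bernoulli_sd(p2)
    if s1 + s2 == 0:
        # Both arms deterministic: variance is zero either way, so use balance.
        return 0.5
    return s1 / (s1 + s2)

def variance_of_difference(p1: float, p2: float, n: int, frac1: float) -> float:
    """Var(p1_hat - p2_hat) when n * frac1 units go to arm 1 and the rest to arm 2."""
    n1, n2 = n * frac1, n * (1.0 - frac1)
    return p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2

# With equal success probabilities the Neyman allocation is exactly balanced.
print(neyman_fraction(0.3, 0.3))            # 0.5
# Otherwise it depends on the unknown parameters.
print(round(neyman_fraction(0.1, 0.5), 3))  # 0.375
```

Because `neyman_fraction` needs p1 and p2, it can only be used with guessed or estimated values, which is the practical drawback that motivates the balanced design recommended in the abstract.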
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:71-842012-05-01RePEc:oup:biomet
article
Optimal fractions of two-level factorials under a baseline parameterization
Two-level fractional factorial designs are considered under a baseline parameterization. The criterion of minimum aberration is formulated in this context and optimal designs under this criterion are investigated. The underlying theory and the concept of isomorphism turn out to be significantly different from their counterparts under orthogonal parameterization, and this is reflected in the optimal designs obtained. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
71
84
http://hdl.handle.net/10.1093/biomet/asr071
application/pdf
Access to full text is restricted to subscribers.
Rahul Mukerjee
Boxin Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:151-1652012-05-01RePEc:oup:biomet
article
A functional generalized method of moments approach for longitudinal studies with missing responses and covariate measurement error
Covariate measurement error and missing responses are typical features in longitudinal data analysis. There has been extensive research on either covariate measurement error or missing responses, but relatively little work has been done to address both simultaneously. In this paper, we propose a simple method for the marginal analysis of longitudinal data with time-varying covariates, some of which are measured with error, while the response is subject to missingness. Our method has a number of appealing properties: assumptions on the model are minimal, with none needed about the distribution of the mismeasured covariate; implementation is straightforward and its applicability is broad. We provide both theoretical justification and numerical results. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
151
165
http://hdl.handle.net/10.1093/biomet/asr076
application/pdf
Access to full text is restricted to subscribers.
Grace Y. Yi
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:245-2512012-05-01RePEc:oup:biomet
article
Optimality of group testing in the presence of misclassification
Several optimality properties of Dorfman's (1943) group testing procedure are derived for estimation of the prevalence of a rare disease whose status is classified with error. Exact ranges of disease prevalence are obtained for which group testing provides more efficient estimation when group size increases. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
245
251
http://hdl.handle.net/10.1093/biomet/asr064
application/pdf
Access to full text is restricted to subscribers.
Aiyi Liu
Chunling Liu
Zhiwei Zhang
Paul S. Albert
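For context on the estimation problem in the abstract above, the sketch below shows a standard prevalence estimator for pooled (Dorfman-type) testing with a misclassifying assay. It assumes independent individuals, pools of common size k, and a known sensitivity Se and specificity Sp, so that P(pool positive) = Se{1 - (1 - p)^k} + (1 - Sp)(1 - p)^k. Inverting this gives the estimator below. The function names are hypothetical, and the paper's efficiency comparisons across group sizes are not reproduced.

```python
def pool_positive_prob(p: float, k: int, se: float, sp: float) -> float:
    """Probability that a pool of k individuals tests positive."""
    q = (1.0 - p) ** k  # probability that all k members are truly negative
    return se * (1.0 - q) + (1.0 - sp) * q

def estimate_prevalence(num_positive_pools: float, num_pools: int,
                        k: int, se: float, sp: float) -> float:
    """Invert the pool-positive probability to estimate individual prevalence p."""
    if se + sp <= 1.0:
        raise ValueError("the assay must satisfy Se + Sp > 1")
    theta = num_positive_pools / num_pools
    # Clip so the inverse is defined: theta must lie in [1 - sp, se].
    theta = min(max(theta, 1.0 - sp), se)
    q = (se - theta) / (se + sp - 1.0)  # estimated (1 - p)^k
    return 1.0 - q ** (1.0 / k)

# Hypothetical data: 12 positive pools out of 200 pools of size 10.
print(estimate_prevalence(12, 200, k=10, se=0.95, sp=0.98))
```

With k = 1 the formula reduces to the familiar Rogan-Gladen correction (theta + Sp - 1)/(Se + Sp - 1) for individual testing.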
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:238-2442012-05-01RePEc:oup:biomet
article
On robust estimation via pseudo-additive information
We consider a robust parameter estimator minimizing an empirical approximation to the q-entropy and show its relationship to minimization of power divergences through a simple parameter transformation. The estimator balances robustness and efficiency through a tuning constant q and avoids kernel density smoothing. We derive an upper bound on the mean squared error of the estimator under a contaminated reference model and use it as a min-max criterion for selecting q. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
238
244
http://hdl.handle.net/10.1093/biomet/asr061
application/pdf
Access to full text is restricted to subscribers.
Davide Ferrari
Davide La Vecchia
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:57-692012-05-01RePEc:oup:biomet
article
Conservative hypothesis tests and confidence intervals using importance sampling
Importance sampling is a common technique for Monte Carlo approximation, including that of p-values. Here it is shown that a simple correction of the usual importance sampling p-values provides valid p-values, meaning that a hypothesis test created by rejecting the null hypothesis when the p-value is at most α will also have a Type I error rate of at most α. This correction uses the importance weight of the original observation, which gives valuable diagnostic information under the null hypothesis. Using the corrected p-values can be crucial for multiple testing and also in problems where evaluating the accuracy of importance sampling approximations is difficult. Inverting the corrected p-values provides a useful way to create Monte Carlo confidence intervals that maintain the nominal significance level and use only a single Monte Carlo sample. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
57
69
http://hdl.handle.net/10.1093/biomet/asr079
application/pdf
Access to full text is restricted to subscribers.
Matthew T. Harrison
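The correction described in the abstract above can be sketched as follows. The abstract states only that the correction uses the importance weight of the original observation, so the exact forms here are an assumption: with w = (null density)/(proposal density), the sketch adds w(x0) to the numerator and to the denominator of the usual importance sampling estimate. With w identically 1, both corrected forms reduce to the familiar Monte Carlo p-value (1 + #{t(y_i) >= t(x0)})/(n + 1). The function name and the normal example are hypothetical.

```python
import math
import random

def corrected_is_pvalues(x0, samples, stat, weight):
    """Return (usual, corrected, self-normalized corrected) IS p-value estimates.

    samples: draws from the proposal distribution.
    stat: test statistic; large values count as evidence against H0.
    weight: importance weight, null density divided by proposal density.
    """
    if not samples:
        raise ValueError("need at least one importance sample")
    t0 = stat(x0)
    w0 = weight(x0)  # importance weight of the original observation
    ws = [weight(y) for y in samples]
    tail = sum(w for y, w in zip(samples, ws) if stat(y) >= t0)
    n = len(samples)
    p_usual = tail / n                             # standard IS estimate
    p_corrected = (w0 + tail) / (n + 1)            # assumed corrected form
    p_corrected_sn = (w0 + tail) / (w0 + sum(ws))  # assumed self-normalized form
    return p_usual, p_corrected, p_corrected_sn

def density_ratio(v: float) -> float:
    """N(0, 1) density divided by N(3, 1) density, i.e. exp(4.5 - 3v)."""
    return math.exp(4.5 - 3.0 * v)

# Hypothetical example: H0: X ~ N(0, 1), proposal N(3, 1), statistic t(x) = x.
rng = random.Random(1)
ys = [rng.gauss(3.0, 1.0) for _ in range(1000)]
print(corrected_is_pvalues(2.5, ys, stat=lambda v: v, weight=density_ratio))
```

A large w(x0) inflates the corrected p-values, which is one way to read the abstract's remark that this weight carries diagnostic information under the null hypothesis.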
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:211-2222012-05-01RePEc:oup:biomet
article
A proportional likelihood ratio model
We propose a semiparametric proportional likelihood ratio model which is particularly suitable for modelling a nonlinear monotonic relationship between the outcome variable and a covariate. This model extends the generalized linear model by leaving the distribution unspecified, and has a strong connection with semiparametric models such as the selection bias model (Gilbert et al., 1999), the density ratio model (Qin, 1998; Fokianos & Kaimi, 2006), the single-index model (Ichimura, 1993) and the exponential tilt regression model (Rathouz & Gao, 2009). A maximum likelihood estimator is obtained for the new model and its asymptotic properties are derived. An example and simulation study illustrate the use of the model. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
211
222
http://hdl.handle.net/10.1093/biomet/asr060
application/pdf
Access to full text is restricted to subscribers.
Xiaodong Luo
Wei Yann Tsai
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:223-2292012-05-01RePEc:oup:biomet
article
Proportional likelihood ratio models for mean regression
The proportional likelihood ratio model introduced in Luo & Tsai (2012) is adapted to explicitly model the means of observations. This is useful for estimating and drawing inferences about treatment effects, particularly in designed experiments, and allows the data analyst greater control over model specification and parameter interpretation. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
223
229
http://hdl.handle.net/10.1093/biomet/asr075
application/pdf
Access to full text is restricted to subscribers.
Alan Huang
Paul J. Rathouz
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:115-1262012-05-01RePEc:oup:biomet
article
Directed acyclic graphs with edge-specific bounds
We give a definition of a bounded edge within the causal directed acyclic graph framework. A bounded edge generalizes the notion of a signed edge and is defined in terms of bounds on a ratio of survivor probabilities. We derive rules concerning the propagation of bounds. Bounds on causal effects in the presence of unmeasured confounding are also derived using bounds related to specific edges on a graph. We illustrate the theory developed by an example concerning estimating the effect of antihistamine treatment on asthma in the presence of unmeasured confounding. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
115
126
http://hdl.handle.net/10.1093/biomet/asr059
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
Zhiqiang Tan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:127-1402012-05-01RePEc:oup:biomet
article
Bayesian analysis of multistate event history data: beta-Dirichlet process prior
Bayesian analysis of a finite state Markov process, which is popularly used to model multistate event history data, is considered. A new prior process, called a beta-Dirichlet process, is introduced for the cumulative intensity functions and is proved to be conjugate. In addition, the beta-Dirichlet prior is applied to a Bayesian semiparametric regression model. To illustrate the application of the proposed model, we analyse a dataset of credit histories. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
127
140
http://hdl.handle.net/10.1093/biomet/asr067
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Lancelot James
Rafael Weissbach
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:185-1972012-05-01RePEc:oup:biomet
article
Mean residual life models with time-dependent coefficients under right censoring
The mean residual life provides the remaining life expectancy of a subject who has survived to a certain time-point. When covariates are present, regression models are needed to study the association between the mean residual life function and potential regression covariates. In this paper, we propose a flexible class of semiparametric mean residual life models where some effects may be time-varying and some may be constant over time. In the presence of right censoring, we use the inverse probability of censoring weighting approach and develop inference procedures for estimating the model parameters. In addition, we provide graphical and numerical methods for model checking and tests for examining whether or not the covariate effects vary with time. Asymptotic and finite sample properties of the proposed estimators are established and the approach is applied to real life datasets collected from clinical trials. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
185
197
http://hdl.handle.net/10.1093/biomet/asr065
application/pdf
Access to full text is restricted to subscribers.
Liuquan Sun
Xinyuan Song
Zhigang Zhang
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:29-422012-05-01RePEc:oup:biomet
article
A direct approach to sparse discriminant analysis in ultra-high dimensions
Sparse discriminant methods based on independence rules, such as the nearest shrunken centroids classifier (Tibshirani et al., 2002) and features annealed independence rules (Fan & Fan, 2008), have been proposed as computationally attractive tools for feature selection and classification with high-dimensional data. A fundamental drawback of these rules is that they ignore correlations among features and thus could produce misleading feature selection and inferior classification. We propose a new procedure for sparse discriminant analysis, motivated by the least squares formulation of linear discriminant analysis. To demonstrate our proposal, we study the numerical and theoretical properties of discriminant analysis constructed via lasso penalized least squares. Our theory shows that the method proposed can consistently identify the subset of discriminative features contributing to the Bayes rule and at the same time consistently estimate the Bayes classification direction, even when the dimension can grow faster than any polynomial order of the sample size. The theory allows for general dependence among features. Simulated and real data examples show that lassoed discriminant analysis compares favourably with other popular sparse discriminant proposals. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
29
42
http://hdl.handle.net/10.1093/biomet/asr066
application/pdf
Access to full text is restricted to subscribers.
Qing Mai
Hui Zou
Ming Yuan
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:199-2102012-05-01RePEc:oup:biomet
article
A maximum pseudo-profile likelihood estimator for the Cox model under length-biased sampling
This paper considers semiparametric estimation of the Cox proportional hazards model for right-censored and length-biased data arising from prevalent sampling. To exploit the special structure of length-biased sampling, we propose a maximum pseudo-profile likelihood estimator, which can handle time-dependent covariates and is consistent under covariate-dependent censoring. Simulation studies show that the proposed estimator is more efficient than its competitors. A data analysis illustrates the methods and theory. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
199
210
http://hdl.handle.net/10.1093/biomet/asr072
application/pdf
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Jing Qin
Dean A. Follmann
oai:RePEc:oup:biomet:v:99:y:2012:i:1:p:1-142012-05-01RePEc:oup:biomet
article
Studies in the history of probability and statistics, L: Karl Pearson and the Rule of Three
Karl Pearson's role in the transformation that took the 19th-century statistics of Laplace and Gauss into the modern era of 20th-century multivariate analysis is examined from a new point of view. By viewing Pearson's work in the context of a motto he adopted from Charles Darwin, a philosophical theme is identified in Pearson's statistical work, and his three major achievements are briefly described. Copyright 2012, Oxford University Press.
1
2012
99
Biometrika
1
14
http://hdl.handle.net/10.1093/biomet/asr046
application/pdf
Access to full text is restricted to subscribers.
Stephen M. Stigler
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:253-2722012-08-31RePEc:oup:biomet
article
Dependence modelling for spatial extremes
Current dependence models for spatial extremes are based upon max-stable processes. Within this class, there are few inferentially viable models available, and we propose one further model. More problematic are the restrictive assumptions that must be made when using max-stable processes to model dependence for spatial extremes: it must be assumed that the dependence structure of the observed extremes is compatible with a limiting model that holds for all events more extreme than those that have already occurred. This problem has long been acknowledged in the context of finite-dimensional multivariate extremes, in particular when data display dependence at observable levels, but are independent in the limit. We propose a flexible class of models that is suitable for such data in a spatial context. In addition, we consider the situation where the extremal dependence structure may vary with distance. We apply our models to spatially referenced significant wave height data from the North Sea, finding evidence that their extremal structure is not compatible with a limiting dependence model. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
253
272
http://hdl.handle.net/10.1093/biomet/asr080
application/pdf
Access to full text is restricted to subscribers.
Jennifer L. Wadsworth
Jonathan A. Tawn
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:473-4802012-08-31RePEc:oup:biomet
article
A new residual for ordinal outcomes
We propose a new residual for regression models of ordinal outcomes, defined as E{sign(y,Y)}, where y is the observed outcome and Y is a random variable from the fitted distribution. This new residual is a single value per subject irrespective of the number of categories of the ordinal outcome, contains directional information between the observed value and the fitted distribution, and does not require the assignment of arbitrary numbers to categories. We study its properties, describe its connections with other residuals, ranks and ridits, and demonstrate its use in model diagnostics. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
473
480
http://hdl.handle.net/10.1093/biomet/asr073
application/pdf
Access to full text is restricted to subscribers.
Chun Li
Bryan E. Shepherd
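The residual defined in the abstract above, E{sign(y, Y)}, equals P(Y < y) - P(Y > y) when sign(y, Y) is taken to be +1, 0 or -1 according as y > Y, y = Y or y < Y (this sign convention is an assumption of the sketch). Given a subject's fitted category probabilities, it can be computed as below. Only the ordering of the categories is used, and the result is a single number in [-1, 1] regardless of the number of categories. The function name is hypothetical.

```python
def ordinal_residual(y: int, probs: list) -> float:
    """E{sign(y, Y)} = P(Y < y) - P(Y > y), with Y drawn from the fitted distribution.

    y: observed category index in 0..K-1 (only the order of categories matters).
    probs: fitted probabilities P(Y = 0), ..., P(Y = K-1) for this subject.
    """
    if not 0 <= y < len(probs):
        raise ValueError("category index out of range")
    if abs(sum(probs) - 1.0) > 1e-8:
        raise ValueError("fitted probabilities must sum to 1")
    below = sum(probs[:y])      # P(Y < y): sign(y, Y) = +1
    above = sum(probs[y + 1:])  # P(Y > y): sign(y, Y) = -1
    return below - above        # ties (Y == y) contribute 0

# Hypothetical fitted distribution over three ordered categories.
fitted = [0.2, 0.5, 0.3]
print([round(ordinal_residual(y, fitted), 3) for y in range(3)])
```

A positive value indicates that the observed category lies above the bulk of the fitted distribution, and a negative value that it lies below, which is the directional information the abstract refers to.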
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:327-3432012-08-31RePEc:oup:biomet
article
Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests that the estimator has better small- and large-sample properties than alternative estimators. An example using prostate cancer data illustrates the method. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
327
343
http://hdl.handle.net/10.1093/biomet/ass006
application/pdf
Access to full text is restricted to subscribers.
Yongseok Park
Jeremy M. G. Taylor
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:569-5832012-08-31RePEc:oup:biomet
article
On multilinear principal component analysis of order-two tensors
Principal component analysis is commonly used for dimension reduction in analysing high-dimensional data. Multilinear principal component analysis aims to serve a similar function for analysing tensor structure data, and has empirically been shown effective in reducing dimensionality. In this paper, we investigate its statistical properties and demonstrate its advantages. Conventional principal component analysis, which vectorizes the tensor data, may lead to inefficient and unstable prediction due to the often extremely large dimensionality involved. Multilinear principal component analysis, in trying to preserve the data structure, searches for low-dimensional projections and, thereby, decreases dimensionality more efficiently. The asymptotic theory of order-two multilinear principal component analysis, including asymptotic efficiency and distributions of principal components, associated projections, and the explained variance, is developed. A test of dimensionality is also proposed. Finally, multilinear principal component analysis is shown to improve on conventional principal component analysis in analysing the Olivetti faces dataset, by extracting a more modularly oriented basis set for reconstructing the test faces. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
569
583
http://hdl.handle.net/10.1093/biomet/ass019
application/pdf
Access to full text is restricted to subscribers.
Hung Hung
Peishien Wu
Iping Tu
Suyun Huang
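The low-dimensional projections mentioned in the abstract above take the form U'XV for each p x q data matrix X, with column-orthonormal U (p x p0) and V (q x q0), instead of projecting the vectorized pq-dimensional data. The sketch below uses a generic alternating eigendecomposition to choose U and V. It is an illustration under that assumption, not the paper's estimator, and it does not include the asymptotic theory or the dimensionality test. The function name and dimensions are hypothetical.

```python
import numpy as np

def mpca_order2(X, p0, q0, n_iter=10):
    """Order-two multilinear PCA by alternating eigendecompositions.

    X: array of shape (n, p, q) holding n matrix-valued observations.
    Returns U (p, p0), V (q, q0) and the scores U' (X_i - mean) V of shape (n, p0, q0).
    """
    Xc = X - X.mean(axis=0)
    # Initialize V from the column covariance: sum_i Xc_i' Xc_i.
    B = np.einsum('nki,nkj->ij', Xc, Xc)
    V = np.linalg.eigh(B)[1][:, ::-1][:, :q0]
    for _ in range(n_iter):
        # Update U given V: top eigenvectors of sum_i Xc_i V V' Xc_i'.
        XV = Xc @ V                                     # (n, p, q0)
        A = np.einsum('nik,njk->ij', XV, XV)            # (p, p)
        U = np.linalg.eigh(A)[1][:, ::-1][:, :p0]
        # Update V given U: top eigenvectors of sum_i Xc_i' U U' Xc_i.
        UX = np.transpose(Xc, (0, 2, 1)) @ U            # (n, q, p0)
        B = np.einsum('nik,njk->ij', UX, UX)            # (q, q)
        V = np.linalg.eigh(B)[1][:, ::-1][:, :q0]
    scores = U.T @ Xc @ V                               # (n, p0, q0)
    return U, V, scores

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6, 5))
U, V, S = mpca_order2(X, p0=2, q0=3)
print(U.shape, V.shape, S.shape)
```

Each observation is summarized by p0 * q0 scores while the loadings need only p * p0 + q * q0 entries, compared with pq entries per component when the matrices are vectorized first.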
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:615-6302012-08-31RePEc:oup:biomet
article
Predictive accuracy of covariates for event times
We propose a graphical measure, the generalized negative predictive function, to quantify the predictive accuracy of covariates for survival time or recurrent event times. This new measure characterizes the event-free probabilities over time conditional on a thresholded linear combination of covariates and has direct clinical utility. We show that this function is maximized at the set of covariates truly related to event times and thus can be used to compare the predictive accuracy of different sets of covariates. We construct nonparametric estimators for this function under right censoring and prove that the proposed estimators, upon proper normalization, converge weakly to zero-mean Gaussian processes. To bypass the estimation of complex density functions involved in the asymptotic variances, we adopt the bootstrap approach and establish its validity. Simulation studies demonstrate that the proposed methods perform well in practical situations. Two clinical studies are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
615
630
http://hdl.handle.net/10.1093/biomet/ass018
application/pdf
Access to full text is restricted to subscribers.
Li Chen
D. Y. Lin
Donglin Zeng
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:741-7472012-08-31RePEc:oup:biomet
article
The fitting of complex parametric models
Consider parametric models that are too complicated to allow calculation of a likelihood but from which observations can be simulated. We examine parameter estimators that are linear functions of a possibly large set of candidate features. A combination of simulations based on a fractional design and sets of discriminant analyses is then used to find an optimal estimator of the vector parameter and its covariance matrix. The procedure is an alternative to the approximate Bayesian computation scheme. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
741
747
http://hdl.handle.net/10.1093/biomet/ass030
application/pdf
Access to full text is restricted to subscribers.
D. R. Cox
Christiana Kartsonaki
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:687-7022012-08-31RePEc:oup:biomet
article
Inner envelopes: efficient estimation in multivariate linear regression
In this article we propose a new model, called the inner envelope model, which leads to efficient estimation in the context of multivariate normal linear regression. The asymptotic distribution and the consistency of its maximum likelihood estimators are established. Theoretical results, simulation studies and examples all show that the efficiency gains can be substantial relative to standard methods and to the maximum likelihood estimators from the envelope model introduced recently by Cook et al. (2010). Compared to the envelope model, the inner envelope model is based on a different construction and it can produce substantial efficiency gains in situations where the envelope model offers no gains. In effect, inner envelopes open a new frontier to the way in which reducing subspaces can be used to improve efficiency in multivariate problems. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
687
702
http://hdl.handle.net/10.1093/biomet/ass024
application/pdf
Access to full text is restricted to subscribers.
Zhihua Su
R. Dennis Cook
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:405-4212012-08-31RePEc:oup:biomet
article
Corrected-loss estimation for quantile regression with covariate measurement errors
We study estimation in quantile regression when covariates are measured with errors. Existing methods require stringent assumptions, such as spherically symmetric joint distribution of the regression and measurement error variables, or linearity of all quantile functions, which restrict model flexibility and complicate computation. In this paper, we develop a new estimation approach based on corrected scores to account for a class of covariate measurement errors in quantile regression. The proposed method is simple to implement. Its validity requires only linearity of the particular quantile function of interest, and it requires no parametric assumptions on the regression error distributions. Finite-sample results demonstrate that the proposed estimators are more efficient than the existing methods in various models considered. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
405
421
http://hdl.handle.net/10.1093/biomet/ass005
application/pdf
Access to full text is restricted to subscribers.
Huixia Judy Wang
Leonard A. Stefanski
Zhongyi Zhu
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:649-6622012-08-31RePEc:oup:biomet
article
Modelling covariance structure in bivariate marginal models for longitudinal data
It can be more challenging to efficiently model the covariance matrices for multivariate longitudinal data than for the univariate case, due to the correlations arising between multiple responses. The positive-definiteness constraint and the high dimensionality are further obstacles in covariance modelling. In this paper, we develop a data-based method by which the parameters in the covariance matrices are replaced by unconstrained and interpretable parameters with reduced dimensions. The maximum likelihood estimators for the mean and covariance parameters are shown to be consistent and asymptotically normally distributed. Simulations and real data analysis show that the new approach performs very well even when modelling bivariate nonstationary dependence structures. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
649
662
http://hdl.handle.net/10.1093/biomet/ass031
application/pdf
Access to full text is restricted to subscribers.
Jing Xu
Gilbert Mackenzie
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:511-5312012-08-31RePEc:oup:biomet
article
Nonparametric estimation of diffusions: a differential equations approach
We consider estimation of scalar functions that determine the dynamics of diffusion processes. It has been recently shown that nonparametric maximum likelihood estimation is ill-posed in this context. We regularize the problem probabilistically by adopting a prior distribution for the unknown functional. A Gaussian prior measure is chosen in the function space by specifying its precision operator as an appropriate differential operator. We establish that a Bayesian–Gaussian conjugate analysis for the drift of one-dimensional nonlinear diffusions is feasible using high-frequency data, by expressing the log-likelihood as a quadratic function of the drift, with sufficient statistics given by the local time process and the end points of the observed path. Computationally efficient posterior inference is carried out using a finite element method. We embed this technology in partially observed situations and adopt a data augmentation approach whereby we iteratively generate missing data paths and draws from the unknown functional. Our methodology is applied to estimate the drift of models used in molecular dynamics and financial econometrics using high- and low-frequency observations. We discuss extensions to other partially observed schemes and connections to other types of nonparametric inference. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
511
531
http://hdl.handle.net/10.1093/biomet/ass034
application/pdf
Access to full text is restricted to subscribers.
Omiros Papaspiliopoulos
Yvo Pokern
Gareth O. Roberts
Andrew M. Stuart
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:551-5682012-08-31RePEc:oup:biomet
article
Analysis of principal nested spheres
A general framework for a novel non-geodesic decomposition of high-dimensional spheres or high-dimensional shape spaces for planar landmarks is discussed. The decomposition, principal nested spheres, leads to a sequence of submanifolds with decreasing intrinsic dimensions, which can be interpreted as an analogue of principal component analysis. In a number of real datasets, an apparent one-dimensional mode of variation curving through more than one geodesic component is captured in the one-dimensional component of principal nested spheres. While analysis of principal nested spheres provides an intuitive and flexible decomposition of the high-dimensional sphere, an interesting special case of the analysis results in finding principal geodesics, similar to those from previous approaches to manifold principal component analysis. An adaptation of our method to Kendall's shape space is discussed, and a computational algorithm for fitting principal nested spheres is proposed. The result provides a coordinate system to visualize the data structure and an intuitive summary of principal modes of variation, as exemplified by several datasets. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
551
568
http://hdl.handle.net/10.1093/biomet/ass022
application/pdf
Access to full text is restricted to subscribers.
Sungkyu Jung
Ian L. Dryden
J. S. Marron
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:502-5082012-08-31RePEc:oup:biomet
article
Inference for additive interaction under exposure misclassification
Results are given concerning inferences that can be drawn about interaction when binary exposures are subject to certain forms of independent nondifferential misclassification. Tests for interaction, using the misclassified exposures, are valid provided the probability of misclassification satisfies certain bounds. Results are given for additive statistical interactions, for causal interactions corresponding to synergism in the sufficient cause framework and for so-called compositional epistasis. Both two-way and three-way interactions are considered. The results require only that the probability of misclassification be no larger than 1/2 or 1/4, depending on the test. For additive statistical interaction, a method to correct estimates and confidence intervals for misclassification is described. The consequences for power of interaction tests under exposure misclassification are explored through simulations. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
502
508
http://hdl.handle.net/10.1093/biomet/ass012
application/pdf
Access to full text is restricted to subscribers.
Tyler J. Vanderweele
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:631-6482012-08-31RePEc:oup:biomet
article
An efficient method of estimation for longitudinal surveys with monotone missing data
Panel attrition is frequently encountered in panel sample surveys. When it is related to the observed study variable, the classical approach of nonresponse adjustment using a covariate-dependent dropout mechanism can be biased. We consider an efficient method of estimation with monotone panel attrition when the response probability depends on the previous values of the study variable as well as other covariates. Because of the monotone structure of the missing pattern, the response mechanism is missing at random. The proposed estimator is asymptotically optimal in the sense that it minimizes the asymptotic variance of a class of estimators that can be written as a linear combination of the unbiased estimators of the panel estimates for each wave, and incorporates all available information using generalized least squares. Variance estimation is discussed and results from a simulation study are presented. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
631
648
http://hdl.handle.net/10.1093/biomet/ass026
application/pdf
Access to full text is restricted to subscribers.
Ming Zhou
Jae Kwang Kim
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:457-4722012-08-31RePEc:oup:biomet
article
Empirical bootstrap bias correction and estimation of prediction mean square error in small area estimation
We develop a method for bias correction, which models the error of the target estimator as a function of the corresponding estimator obtained from bootstrap samples, and the original estimators and bootstrap estimators of the parameters governing the model fitted to the sample data. This is achieved by considering a number of plausible parameter values, generating a pseudo original sample for each parameter and bootstrap samples for each such sample, and then searching for an appropriate functional relationship. Under certain conditions, the procedure also permits estimation of the mean square error of the bias corrected estimator. The method is applied to estimating the prediction mean square error in small area estimation of proportions under a generalized mixed model. Empirical comparisons with jackknife and bootstrap methods are presented. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
457
472
http://hdl.handle.net/10.1093/biomet/ass010
application/pdf
Access to full text is restricted to subscribers.
D. Pfeffermann
S. Correa
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:703-7162012-08-31RePEc:oup:biomet
article
Penalized empirical likelihood and growing dimensional general estimating equations
When a parametric likelihood function is not specified for a model, estimating equations may provide an instrument for statistical inference. Qin and Lawless (1994) illustrated that empirical likelihood makes optimal use of these equations in inferences for fixed low-dimensional unknown parameters. In this paper, we study empirical likelihood for general estimating equations with growing high dimensionality and propose a penalized empirical likelihood approach for parameter estimation and variable selection. We quantify the asymptotic properties of empirical likelihood and its penalized version, and show that penalized empirical likelihood has the oracle property. The performance of the proposed method is illustrated via simulated applications and a data analysis. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
703
716
http://hdl.handle.net/10.1093/biomet/ass014
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Cheng Yong Tang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:481-4872012-08-31RePEc:oup:biomet
article
Structuring shrinkage: some correlated priors for regression
This paper develops a rich class of sparsity priors for regression effects that encourage shrinkage of both regression effects and contrasts between effects to zero whilst leaving sizeable real effects largely unshrunk. The construction of these priors uses some properties of normal-gamma distributions to include design features in the prior specification, but has general relevance to any continuous sparsity prior. Specific prior distributions are developed for serial dependence between regression effects and correlation within groups of regression effects. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
481
487
http://hdl.handle.net/10.1093/biomet/asr082
application/pdf
Access to full text is restricted to subscribers.
J. E. Griffin
P. J. Brown
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:733-7402012-08-31RePEc:oup:biomet
article
Positive definite estimators of large covariance matrices
Using convex optimization, we construct a sparse estimator of the covariance matrix that is positive definite and performs well in high-dimensional settings. A lasso-type penalty is used to encourage sparsity and a logarithmic barrier function is used to enforce positive definiteness. Consistency and convergence rate bounds are established as both the number of variables and sample size diverge. An efficient computational algorithm is developed and the merits of the approach are illustrated with simulations and a speech signal classification example. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
733
740
http://hdl.handle.net/10.1093/biomet/ass025
application/pdf
Access to full text is restricted to subscribers.
Adam J. Rothman
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:299-3132012-08-31RePEc:oup:biomet
article
Componentwise classification and clustering of functional data
The infinite dimension of functional data can challenge conventional methods for classification and clustering. A variety of techniques have been introduced to address this problem, particularly in the case of prediction, but the structural models that they involve can be too inaccurate, or too abstract, or too difficult to interpret, for practitioners. In this paper, we develop approaches to adaptively choose components, enabling classification and clustering to be reduced to finite-dimensional problems. We explore and discuss properties of these methodologies. Our techniques involve methods for estimating classifier error rate and cluster tightness, and for choosing both the number of components and their locations, to optimize these quantities. A major attraction of this approach is that it allows identification of parts of the function domain that convey important information for classification and clustering. It also permits us to determine regions that are relevant to one of these analyses but not the other. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
299
313
http://hdl.handle.net/10.1093/biomet/ass003
application/pdf
Access to full text is restricted to subscribers.
A. Delaigle
P. Hall
N. Bathia
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:599-6132012-08-31RePEc:oup:biomet
article
Nonparametric incidence estimation from prevalent cohort survival data
Incidence is an important epidemiological concept most suitably studied using an incident cohort study. However, data are often collected from the more feasible prevalent cohort study, whereby diseased individuals are recruited through a cross-sectional survey and followed in time. In the absence of temporal trends in survival, we derive an efficient nonparametric estimator of the cumulative incidence based on such data and study its asymptotic properties. Arbitrary calendar time variations in disease incidence are allowed. Age-specific incidence and adjustments for both stratified sampling and temporal variations in survival are also discussed. Simulation results are presented and data from the Canadian Study of Health and Aging are analysed to infer the incidence of dementia in the Canadian elderly population. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
599
613
http://hdl.handle.net/10.1093/biomet/ass017
application/pdf
Access to full text is restricted to subscribers.
Marco Carone
Masoud Asgharian
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:273-2842012-08-31RePEc:oup:biomet
article
Stochastic blockmodels with a growing number of classes
We present asymptotic and finite-sample results on the use of stochastic blockmodels for the analysis of network data. We show that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size. We also establish finite-sample confidence bounds on maximum-likelihood blockmodel parameter estimates from data comprising independent Bernoulli random variates; these results hold uniformly over class assignment. We provide simulations verifying the conditions sufficient for our results, and conclude by fitting a logit parameterization of a stochastic blockmodel with covariates to a network data example comprising self-reported school friendships, resulting in block estimates that reveal residual structure. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
273
284
http://hdl.handle.net/10.1093/biomet/asr053
application/pdf
Access to full text is restricted to subscribers.
D. S. Choi
P. J. Wolfe
E. M. Airoldi
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:717-7312012-08-31RePEc:oup:biomet
article
On the robustness of the adaptive lasso to model misspecification
Penalization methods have been shown to yield both consistent variable selection and oracle parameter estimation under correct model specification. In this article, we study such methods under model misspecification, where the assumed form of the regression function is incorrect, including generalized linear models for uncensored outcomes and the proportional hazards model for censored responses. Estimation with the adaptive least absolute shrinkage and selection operator (lasso) penalty is proven to achieve sparse estimation of regression coefficients under misspecification. The resulting estimators are selection consistent, asymptotically normal and oracle, where the selection is based on the limiting values of the parameter estimators obtained using the misspecified model without penalization. We further derive conditions under which the penalized estimators from the misspecified model may yield selection consistency under the true model. The robustness is explored numerically via simulation and an application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
717
731
http://hdl.handle.net/10.1093/biomet/ass027
application/pdf
Access to full text is restricted to subscribers.
W. Lu
Y. Goldberg
J. P. Fine
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:315-3252012-08-31RePEc:oup:biomet
article
Global optimality of nonconvex penalized estimators
Nonconvex penalties such as the smoothly clipped absolute deviation or minimax concave penalties have desirable properties such as the oracle property, even when the dimension of the predictive variables is large. However, checking whether a given local minimizer has such properties is not easy since there can be many local minimizers. In this paper, we give sufficient conditions under which a local minimizer is unique, and show that the oracle estimator becomes the unique local minimizer with probability tending to one. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
315
325
http://hdl.handle.net/10.1093/biomet/asr084
application/pdf
Access to full text is restricted to subscribers.
Yongdai Kim
Sunghoon Kwon
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:379-3922012-08-31RePEc:oup:biomet
article
Efficient estimation for the Cox model with varying coefficients
A proportional hazards model with varying coefficients allows one to examine the extent to which covariates interact nonlinearly with an exposure variable. A global partial likelihood method, in contrast with the local partial likelihood method of Fan et al. (2006), is proposed for estimation of varying coefficient functions. The proposed estimators are proved to be consistent and asymptotically normal. Semiparametric efficiency of the estimators is demonstrated in terms of their linear functionals. Evidence in support of the superiority of the method is presented in numerical studies and real examples. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
379
392
http://hdl.handle.net/10.1093/biomet/asr081
application/pdf
Access to full text is restricted to subscribers.
Kani Chen
Huazhen Lin
Yong Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:439-4562012-08-31RePEc:oup:biomet
article
Improved double-robust estimation in missing data and causal inference models
Recently proposed double-robust estimators for a population mean from incomplete data and for a finite number of counterfactual means can have much higher efficiency than the usual double-robust estimators under misspecification of the outcome model. In this paper, we derive a new class of double-robust estimators for the parameters of regression models with incomplete cross-sectional or longitudinal data, and of marginal structural mean models for cross-sectional data with similar efficiency properties. Unlike the recent proposals, our estimators solve outcome regression estimating equations. In a simulation study, the new estimator shows improvements in variance relative to the standard double-robust estimator that are in agreement with those suggested by asymptotic theory. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
439
456
http://hdl.handle.net/10.1093/biomet/ass013
application/pdf
Access to full text is restricted to subscribers.
Andrea Rotnitzky
Quanhong Lei
Mariela Sued
James M. Robins
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:285-2982012-08-31RePEc:oup:biomet
article
Doubly misspecified models
Estimation bias arising from local model uncertainty and incomplete data has been studied by Copas & Eguchi (2005) under the assumption of a correctly specified marginal model. We extend the approach to allow additional local uncertainty in the assumed marginal model, arguing that this is almost unavoidable for nonlinear problems. We present a general bias analysis and sensitivity procedure for such doubly misspecified models and illustrate the breadth of application through three examples: logistic regression with a missing confounder, measurement error for binary responses and survival analysis with frailty. We show that a double-the-variance rule is not conservative under double misspecification. The ideas are brought together in a meta-analysis of studies of rehabilitation rates for juvenile offenders. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
285
298
http://hdl.handle.net/10.1093/biomet/asr085
application/pdf
Access to full text is restricted to subscribers.
N. X. Lin
J. Q. Shi
R. Henderson
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:393-4042012-08-31RePEc:oup:biomet
article
Nonparametric inference for assessing treatment efficacy in randomized clinical trials with a time-to-event outcome and all-or-none compliance
To evaluate the biological efficacy of a treatment in a randomized clinical trial, one needs to compare patients in the treatment arm who actually received treatment with the subgroup of patients in the control arm who would have received treatment had they been randomized into the treatment arm. In practice, subgroup membership in the control arm is usually unobservable. This paper develops a nonparametric inference procedure to compare subgroup probabilities with right-censored time-to-event data and unobservable subgroup membership in the control arm. We also present a procedure to estimate the onset and duration of treatment effect. The performance of our method is evaluated by simulation. An illustration is given using a randomized clinical trial for melanoma. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
393
404
http://hdl.handle.net/10.1093/biomet/ass004
application/pdf
Access to full text is restricted to subscribers.
Robert M. Elashoff
Gang Li
Ying Zhou
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:345-3612012-08-31RePEc:oup:biomet
article
Analysing bivariate survival data with interval sampling and application to cancer epidemiology
In biomedical studies, ordered bivariate survival data are frequently encountered when bivariate failure events are used as outcomes to identify the progression of a disease. In cancer studies, interest could be focused on bivariate failure times, for example, time from birth to cancer onset and time from cancer onset to death. This paper considers a sampling scheme, termed interval sampling, in which the first failure event is identified within a calendar time interval, the time of the initiating event can be retrospectively confirmed and the occurrence of the second failure event is observed subject to right censoring. In a cancer data application, the initiating, first and second events could correspond to birth, cancer onset and death. The fact that the data are collected conditional on the first failure event occurring within a time interval induces bias. Interval sampling is widely used for collection of disease registry data by governments and medical institutions, though the interval sampling bias is frequently overlooked by researchers. This paper develops statistical methods for analysing such data. Semiparametric methods are proposed under semi-stationarity and stationarity. Numerical studies demonstrate that the proposed estimation approaches perform well with moderate sample sizes. We apply the proposed methods to ovarian cancer registry data. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
345
361
http://hdl.handle.net/10.1093/biomet/ass009
application/pdf
Access to full text is restricted to subscribers.
Hong Zhu
Mei-Cheng Wang
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:488-4932012-08-31RePEc:oup:biomet
article
Information dynamics and optimal sampling in capture-recapture
The build-up of information in a continued capture-recapture experiment of simple random sampling of an open population is studied by predicting the conditional approximate Fisher information for abundance in data from one survey given the previous data. By neglecting the stochasticity in survival, a simple approximate likelihood is obtained. Optimal temporal allocation of a given total effort is found by numerical optimization for various objective functions based on the approximate Fisher information. For aerial photographic surveys of bowhead whales, the performance of estimates of abundance and of demographic parameters is compared between constant yearly survey effort and nominally optimal sampling by simulating a realistic model over 50 years. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
488
493
http://hdl.handle.net/10.1093/biomet/ass001
application/pdf
Access to full text is restricted to subscribers.
T. Schweder
D. Sadykova
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:675-6862012-08-31RePEc:oup:biomet
article
Objective Bayes, conditional inference and the signed root likelihood ratio statistic
Bayesian properties of the signed root likelihood ratio statistic are analysed. Conditions for first-order probability matching are derived by the examination of the Bayesian posterior and frequentist means of this statistic. Second-order matching conditions are shown to arise from matching of the Bayesian posterior and frequentist variances of a mean-adjusted version of the signed root statistic. Conditions for conditional probability matching in ancillary statistic models are derived and discussed. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
675
686
http://hdl.handle.net/10.1093/biomet/ass028
application/pdf
Access to full text is restricted to subscribers.
Thomas J. Diciccio
Todd A. Kuffner
G. Alastair Young
oai:RePEc:oup:biomet:v:99:y:2012:i:3:p:755-7622012-08-31RePEc:oup:biomet
article
Quadratic inference function approach to merging longitudinal studies: validation and joint estimation
Merging data from multiple studies has been widely adopted in biomedical research. In this paper, we consider two major issues related to merging longitudinal datasets. We first develop a rigorous hypothesis testing procedure to assess the validity of data merging, and then propose a flexible joint estimation procedure that enables us to analyse merged data and to account for different within-subject correlations and follow-up schedules in different studies. We establish large sample properties for the proposed procedures. We compare our method with meta-analysis and generalized estimating equations and show that our test provides robust control of Type I error against both misspecification of working correlation structures and heterogeneous dispersion parameters. Our joint estimating procedure leads to an improvement in estimation efficiency on all regression coefficients after data merging is validated. Copyright 2012, Oxford University Press.
3
2012
99
Biometrika
755
762
http://hdl.handle.net/10.1093/biomet/ass021
application/pdf
Access to full text is restricted to subscribers.
Fei Wang
Lu Wang
Peter X.-K. Song
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:423-4382012-08-31RePEc:oup:biomet
article
Multiple imputation in quantile regression
We propose a multiple imputation estimator for parameter estimation in a quantile regression model when some covariates are missing at random. The estimation procedure fully utilizes the entire dataset to achieve increased efficiency, and the resulting coefficient estimators are root-n consistent and asymptotically normal. To protect against possible model misspecification, we further propose a shrinkage estimator, which automatically adjusts for possible bias. The finite sample performance of our estimator is investigated in a simulation study. Finally, we apply our methodology to part of the Eating at America's Table Study data, investigating the association between two measures of dietary intake. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
423
438
http://hdl.handle.net/10.1093/biomet/ass007
application/pdf
Access to full text is restricted to subscribers.
Ying Wei
Yanyuan Ma
Raymond J. Carroll
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:494-5012012-08-31RePEc:oup:biomet
article
A generalized Dunnett test for multi-arm multi-stage clinical studies with treatment selection
We generalize the Dunnett test to derive efficacy and futility boundaries for a flexible multi-arm multi-stage clinical trial for a normally distributed endpoint with known variance. We show that the boundaries control the familywise error rate in the strong sense. The method is applicable for any number of treatment arms, number of stages and number of patients per treatment per stage. It can be used for a wide variety of boundary types or rules derived from α-spending functions. Additionally, we show how sample size can be computed under a least favourable configuration power requirement and derive formulae for expected sample sizes. Copyright 2012, Oxford University Press.
2
2012
99
Biometrika
494
501
http://hdl.handle.net/10.1093/biomet/ass002
application/pdf
Access to full text is restricted to subscribers.
D. Magirr
T. Jaki
J. Whitehead
oai:RePEc:oup:biomet:v:99:y:2012:i:2:p:509-5092012-08-31RePEc:oup:biomet
article
'On measuring the variability of small area estimators under a basic area level model'
2
2012
99
Biometrika
509
509
http://hdl.handle.net/10.1093/biomet/ass016
application/pdf
Access to full text is restricted to subscribers.
Gauri Sankar Datta
J. N. K. Rao
David Daniel Smith
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:371-3852013-03-04RePEc:oup:biomet
article
Pairwise dependence diagnostics for clustered failure-time data
Frailty and copula models specify a parametric dependence structure for multivariate failure-time data. Estimation of some joint quantities can be highly sensitive to the assumed parametric form, and hence model fit is an important issue. This paper lays out a general diagnostic framework for evaluating and selecting frailty and copula models. The approach is based on the cumulative sum of residuals that are calculated in bivariate time. The residuals reflect the difference between the observed and expected bivariate association structures. The proposed model-checking process is interpretable with a limiting distribution which can be approximated using the bootstrap. Simulations and a data example illustrate the practical application of the method. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
371
385
http://hdl.handle.net/10.1093/biomet/asm024
application/pdf
Access to full text is restricted to subscribers.
David V. Glidden
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:763-7832013-03-04RePEc:oup:biomet
article
Estimation of treatment effects in randomised trials with non-compliance and a dichotomous outcome using structural mean models
We consider estimation of the received treatment effect on a dichotomous outcome in randomised trials with non-compliance. We explore inference about the parameters of the structural mean models of Robins (1994, 1997) and Robins et al. (1999). We show that, in contrast to the additive and multiplicative structural mean models for continuous and count outcomes, unbiased estimating functions for a nonzero (structural) treatment effect parameter do not exist in the presence of many continuous and discrete baseline covariates, even when the randomisation probabilities are known. The best that can be hoped for are estimators, such as those proposed in this paper, that are guaranteed both to estimate consistently the (null) treatment effect when the null hypothesis of no treatment effect is true and to have small bias when the true treatment effect is close to but not equal to zero. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
763
783
http://hdl.handle.net/10.1093/biomet/91.4.763
text/html
Access to full text is restricted to subscribers.
James Robins
Andrea Rotnitzky
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:591-6022013-03-04RePEc:oup:biomet
article
Model selection for Gaussian concentration graphs
A multivariate Gaussian graphical Markov model for an undirected graph G, also called a covariance selection model or concentration graph model, is defined in terms of the Markov properties, i.e. conditional independences associated with G, which in turn are equivalent to specified zeros among the set of pairwise partial correlation coefficients. By means of Fisher's z-transformation and Šidák's correlation inequality, conservative simultaneous confidence intervals for the entire set of partial correlations can be obtained, leading to a simple method for model selection that controls the overall error rate for incorrect edge inclusion. The simultaneous p-values corresponding to the partial correlations are partitioned into three disjoint sets, a significant set S, an indeterminate set I and a nonsignificant set N. Our model selection method selects two graphs, a graph Ĝ_SI whose edges correspond to the set S ∪ I, and a more conservative graph Ĝ_S whose edges correspond to S only. Similar considerations apply to covariance graph models, which are defined in terms of marginal independence rather than conditional independence. The method is applied to some well-known examples and to simulated data. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
591
602
Mathias Drton
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:601-6192013-03-04RePEc:oup:biomet
article
Joint modelling of paired sparse functional data using principal components
We propose a modelling framework to study the relationship between two paired longitudinally observed variables. The data for each variable are viewed as smooth curves measured at discrete time-points plus random errors. While the curves for each variable are summarized using a few important principal components, the association of the two longitudinal variables is modelled through the association of the principal component scores. We use penalized splines to model the mean curves and the principal component curves, and cast the proposed model into a mixed-effects model framework for model fitting, prediction and inference. The proposed method can be applied in the difficult case in which the measurement times are irregular and sparse and may differ widely across individuals. Use of functional principal components enhances model interpretation and improves statistical and numerical stability of the parameter estimates. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
601
619
http://hdl.handle.net/10.1093/biomet/asn035
application/pdf
Access to full text is restricted to subscribers.
Lan Zhou
Jianhua Z. Huang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:179-1952013-03-04RePEc:oup:biomet
article
A shrinkage estimator for spectral densities
We propose a shrinkage estimator for spectral densities based on a multilevel normal hierarchical model. The first level captures the sampling variability via a likelihood constructed using the asymptotic properties of the periodogram. At the second level, the spectral density is shrunk towards a parametric time series model. To avoid selecting a particular parametric model for the second level, a third level is added which induces an estimator that averages over a class of parsimonious time series models. The estimator derived from this model, the model averaged shrinkage estimator, is consistent, is shown to be highly competitive with other spectral density estimators via simulations, and is computationally inexpensive. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
179
195
http://hdl.handle.net/10.1093/biomet/93.1.179
text/html
Access to full text is restricted to subscribers.
Carsten H. Botts
Michael J. Daniels
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:619-6322013-03-04RePEc:oup:biomet
article
Semiparametric Box–Cox power transformation models for censored survival observations
The accelerated failure time model specifies that the logarithm of the failure time is linearly related to the covariate vector without assuming a parametric error distribution. In this paper, we consider the semiparametric Box–Cox transformation model, which includes the above regression model as a special case, to analyse possibly censored failure time observations. Inference procedures for the transformation and regression parameters are proposed via a resampling technique. Prediction of the survival function of future subjects with a specific covariate vector is also provided via pointwise and simultaneous interval estimates. All the proposals are illustrated with datasets from two clinical studies. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
619
632
http://hdl.handle.net/10.1093/biomet/92.3.619
text/html
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
L. J. Wei
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:919-9312013-03-04RePEc:oup:biomet
article
Small area estimation when auxiliary information is measured with error
Small area estimation methods typically combine direct estimates from a survey with predictions from a model in order to obtain estimates of population quantities with reduced mean squared error. When the auxiliary information used in the model is measured with error, using a small area estimator such as the Fay–Herriot estimator while ignoring measurement error may be worse than simply using the direct estimator. We propose a new small area estimator that accounts for sampling variability in the auxiliary information, and derive its properties, in particular showing that it is approximately unbiased. The estimator is applied to predict quantities measured in the U.S. National Health and Nutrition Examination Survey, with auxiliary information from the U.S. National Health Interview Survey. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
919
931
http://hdl.handle.net/10.1093/biomet/asn048
application/pdf
Access to full text is restricted to subscribers.
Lynn M. R. Ybarra
Sharon L. Lohr
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:976-9812013-03-04RePEc:oup:biomet
article
Conditional likelihood inference under complex ascertainment using data augmentation
In many applications, particularly in genetics, samples are drawn under complex ascertainment rules. For example, families may only be selected for study if two or more siblings have trait values exceeding some threshold. The correct likelihood for inference in such situations involves the probabilities of ascertainment, and these are frequently intractable. A consistent, but not fully efficient, method of analysis of such studies is proposed. The main idea is to augment the data with additional pseudo-observations simulated under the ascertainment scheme, and to analyse using a conditional likelihood for discrimination between true observations and pseudo-observations. Ascertainment probabilities cancel in this likelihood. The method is illustrated with a simple example involving left-truncated failure times. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
976
981
David Clayton
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:719-7332013-03-04RePEc:oup:biomet
article
Survival analysis with temporal covariate effects
We propose a natural generalization of the Cox regression model, in which the regression coefficients have direct interpretations as temporal covariate effects on the survival function. Under the conditionally independent censoring mechanism, we develop a smoothing-free estimation procedure with a set of martingale-based equations. Our estimator is shown to be uniformly consistent and to converge weakly to a Gaussian process. A simple resampling method is proposed for approximating the limiting distribution of the estimated coefficients. Second-stage inferences with time-varying coefficients are developed accordingly. Simulations and a real example illustrate the practical utility of the proposed method. Finally, we extend this proposal of temporal covariate effects to the general class of linear transformation models and also establish a connection with the additive hazards model. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
719
733
http://hdl.handle.net/10.1093/biomet/asm058
application/pdf
Access to full text is restricted to subscribers.
Limin Peng
Yijian Huang
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:627-6462013-03-04RePEc:oup:biomet
article
Simulation and inference for stochastic volatility models driven by Lévy processes
We study Ornstein-Uhlenbeck stochastic processes driven by Lévy processes, and extend them to more general non-Ornstein-Uhlenbeck models. In particular, we investigate the means of making the correlation structure in the volatility process more flexible. For one model, we implement a method for introducing quasi long-memory into the volatility model. We demonstrate that the models can be fitted to real share price returns data. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
627
646
http://hdl.handle.net/10.1093/biomet/asm048
application/pdf
Access to full text is restricted to subscribers.
Matthew P. S. Gander
David A. Stephens
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:627-6402013-03-04RePEc:oup:biomet
article
Efficient estimation of semiparametric transformation models for counting processes
A class of semiparametric transformation models is proposed to characterise the effects of possibly time-varying covariates on the intensity functions of counting processes. The class includes the proportional intensity model and linear transformation models as special cases. Nonparametric maximum likelihood estimators are developed for the regression parameters and cumulative intensity functions of these models based on censored data. The estimators are shown to be consistent and asymptotically normal. The limiting variances for the estimators of the regression parameters achieve the semiparametric efficiency bounds and can be consistently estimated. The limiting variances for the estimators of smooth functionals of the cumulative intensity function can also be consistently estimated. Simulation studies reveal that the proposed inference procedures perform well in practical settings. Two medical studies are provided. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
627
640
http://hdl.handle.net/10.1093/biomet/93.3.627
text/html
Access to full text is restricted to subscribers.
Donglin Zeng
D. Y. Lin
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:997-10012013-03-04RePEc:oup:biomet
article
On consistency of Kendall's tau under censoring
Necessary and sufficient conditions for consistency of a simple estimator of Kendall's tau under bivariate censoring are presented. The results are extended to data subject to bivariate left truncation as well as right censoring. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
997
1001
http://hdl.handle.net/10.1093/biomet/asn037
application/pdf
Access to full text is restricted to subscribers.
David Oakes
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:873-8922013-03-04RePEc:oup:biomet
article
A General Approach to the Predictability Issue in Survival Analysis with Applications
Very often in survival analysis one has to study martingale integrals where the integrand is not predictable and where the counting process theory of martingales is not directly applicable, as for example in nonparametric and semiparametric applications where the integrand is based on a pilot estimate. We call this the predictability issue in survival analysis. The problem has been resolved by approximations of the integrand by predictable functions which have been justified by ad hoc procedures. We present a general approach to the solution of this problem. The usefulness of the approach is shown in three applications. In particular, we argue that earlier ad hoc procedures do not work in higher-dimensional smoothing problems in survival analysis. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
873
892
http://hdl.handle.net/10.1093/biomet/asm062
application/pdf
Access to full text is restricted to subscribers.
Enno Mammen
Jens Perch Nielsen
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:599-6142013-03-04RePEc:oup:biomet
article
Testing parametric assumptions of trends of a nonstationary time series
The paper considers testing whether the mean trend of a nonstationary time series has a certain parametric form. A central limit theorem for the integrated squared error is derived, and a hypothesis-testing procedure is proposed. The method is illustrated in a simulation study, and is applied to assess the mean pattern of lifetime-maximum wind speeds of global tropical cyclones from 1981 to 2006. We also revisit the trend pattern in the central England temperature series. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
599
614
http://hdl.handle.net/10.1093/biomet/asr017
application/pdf
Access to full text is restricted to subscribers.
Ting Zhang
Wei Biao Wu
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:809-8252013-03-04RePEc:oup:biomet
article
Generalized Spatial Dirichlet Process Models
Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zero-mean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
809
825
http://hdl.handle.net/10.1093/biomet/asm071
application/pdf
Access to full text is restricted to subscribers.
Jason A. Duan
Michele Guindani
Alan E. Gelfand
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:415-4262013-03-04RePEc:oup:biomet
article
Aster models for life history analysis
We present a new class of statistical models, designed for life history analysis of plants and animals, that allow joint analysis of data on survival and reproduction over multiple years, allow for variables having different probability distributions, and correctly account for the dependence of variables on earlier variables. We illustrate their utility with an analysis of data taken from an experimental study of Echinacea angustifolia sampled from remnant prairie populations in western Minnesota. These models generalize both generalized linear models and survival analysis. The joint distribution is factorized as a product of conditional distributions, each an exponential family with the conditioning variable being the sample size of the conditional distribution. The model may be heterogeneous, each conditional distribution being from a different exponential family. We show that the joint distribution is from a flat exponential family and derive its canonical parameters, Fisher information and other properties. These models are implemented in an R package 'aster' available from the Comprehensive R Archive Network, CRAN. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
415
426
http://hdl.handle.net/10.1093/biomet/asm030
application/pdf
Access to full text is restricted to subscribers.
Charles J. Geyer
Stuart Wagenius
Ruth G. Shaw
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:461-4702013-03-04RePEc:oup:biomet
article
Efficient Robbins--Monro procedure for binary data
The Robbins--Monro procedure does not perform well in the estimation of extreme quantiles, because the procedure is implemented using asymptotic results, which are not suitable for binary data. Here we propose a modification of the Robbins--Monro procedure and derive the optimal procedure for binary data under some reasonable approximations. The improvement obtained by using the optimal procedure for the estimation of extreme quantiles is substantial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
461
470
V. Roshan Joseph
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:175-1862013-03-04RePEc:oup:biomet
article
Reducing variability of crossvalidation for smoothing-parameter choice
One of the attractions of crossvalidation, as a tool for smoothing-parameter choice, is its applicability to a wide variety of estimator types and contexts. However, its detractors comment adversely on the relatively high variance of crossvalidatory smoothing parameters, noting that this compromises the performance of the estimators in which those parameters are used. We show that the variability can be reduced simply, significantly and reliably by employing bootstrap aggregation or bagging. We establish that in theory, when bagging is implemented using an adaptively chosen resample size, the variability of crossvalidation can be reduced by an order of magnitude. However, it is arguably more attractive to use a simpler approach, based for example on half-sample bagging, which can reduce variability by approximately 50%. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
175
186
http://hdl.handle.net/10.1093/biomet/asn068
application/pdf
Access to full text is restricted to subscribers.
Peter Hall
Andrew P. Robinson
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:891-9052013-03-04RePEc:oup:biomet
article
Model diagnostic tests for selecting informative correlation structure in correlated data
In the generalized method of moments approach to longitudinal data analysis, unbiased estimating functions can be constructed to incorporate both the marginal mean and the correlation structure of the data. Increasing the number of parameters in the correlation structure corresponds to increasing the number of estimating functions. Thus, building a correlation model is equivalent to selecting estimating functions. This paper proposes a chi-squared test to choose informative unbiased estimating functions. We show that this methodology is useful for identifying which source of correlation it is important to incorporate when there are multiple possible sources of correlation. This method can also be applied to determine the optimal working correlation for the generalized estimating equation approach. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
891
905
http://hdl.handle.net/10.1093/biomet/asn051
application/pdf
Access to full text is restricted to subscribers.
Annie Qu
J. Jack Lee
Bruce G. Lindsay
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:777-7902013-03-04RePEc:oup:biomet
article
Observation-driven models for Poisson counts
This paper is concerned with a general class of observation-driven models for time series of counts whose conditional distributions given past observations and explanatory variables follow a Poisson distribution. These models provide a flexible framework for modelling a wide range of dependence structures. Conditions for stationarity and ergodicity of these processes are established from which the large-sample properties of the maximum likelihood estimators can be derived. Simulations are provided to give additional insight into the finite-sample behaviour of the estimators. Finally, an application to a regression model for daily counts of asthma presentations at a Sydney hospital is described. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
777
790
Richard A. Davis
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:705-7142013-03-04RePEc:oup:biomet
article
The geometry of biplot scaling
A simple geometry allows the main properties of matrix approximations used in biplot displays to be developed. It establishes orthogonal components of an analysis of variance, from which different contributions to approximations may be assessed. Particular attention is paid to approximations that share the same singular vectors, in which case the solution space is a convex cone. Two- and three-dimensional approximations are examined in detail and then the geometry is interpreted for different forms of the matrix being approximated. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
705
714
J. C. Gower
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:195-2092013-03-04RePEc:oup:biomet
article
Generalised likelihood ratio tests for spectral density
There are few techniques available for testing whether or not a family of parametric time series models fits a set of data reasonably well without serious restrictions on the forms of alternative models. In this paper, we consider generalised likelihood ratio tests of whether or not the spectral density function of a stationary time series admits certain parametric forms. We propose a bias correction method for the generalised likelihood ratio test of Fan et al. (2001). In particular, our methods can be applied to test whether or not a residual series is white noise. Sampling properties of the proposed tests are established. A bootstrap approach is proposed for estimating the null distribution of the test statistics. Simulation studies investigate the accuracy of the proposed bootstrap estimate and compare the power of the various ways of constructing the generalised likelihood ratio tests as well as some classic methods like the Cramér--von Mises and Ljung--Box tests. Our results favour the newly proposed bias reduction method using the local likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
195
209
Jianqing Fan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:459-4682013-03-04RePEc:oup:biomet
article
Optimal nested row-column designs with specified components
We consider nested row-column designs where each of the row and column component designs is specified. For the case that each of the component designs has second-order balance, we define such a nested row-column design to be special if it is generally balanced, with the smallest possible number of canonical treatment contrasts having the lower canonical efficiency factor in both components. We show that if any special row-column design exists then it is A-optimal over all nested row-column designs with the given components. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
459
468
http://hdl.handle.net/10.1093/biomet/asm039
application/pdf
Access to full text is restricted to subscribers.
R. A. Bailey
E. R. Williams
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:27-432013-03-04RePEc:oup:biomet
article
Bayesian information criteria and smoothing parameter selection in radial basis function networks
By extending Schwarz's (1978) basic idea we derive a Bayesian information criterion which enables us to evaluate models estimated by the maximum penalised likelihood method or the method of regularisation. The proposed criterion is applied to the choice of smoothing parameters and the number of basis functions in radial basis function network models. Monte Carlo experiments were conducted to examine the performance of the nonlinear modelling strategy of estimating the weight parameters by regularisation and then determining the adjusted parameters by the Bayesian information criterion. The simulation results show that our modelling procedure performs well in various situations. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
27
43
Sadanori Konishi
oai:RePEc:oup:biomet:v:91:y:2004:i:4:p:849-8622013-03-04RePEc:oup:biomet
article
A semiparametric changepoint model
A semiparametric changepoint model is considered and the empirical likelihood method is applied to detect the change from a distribution to a weighted distribution in a sequence of independent random variables. The maximum likelihood changepoint estimator is shown to be consistent. The empirical likelihood ratio test statistic is proved to have the same limit null distribution as that with parametric models. A data-based test for the validity of the models is also proposed. Simulation shows the sensitivity and robustness of the semiparametric approach. The methods are applied to some classical datasets such as the Nile River data and stock price data. Copyright 2004, Oxford University Press.
4
2004
91
December
Biometrika
849
862
http://hdl.handle.net/10.1093/biomet/91.4.849
text/html
Access to full text is restricted to subscribers.
Zhong Guan
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:469-4852013-03-04RePEc:oup:biomet
article
Resampling-based empirical prediction: an application to small area estimation
Best linear unbiased prediction is well known for its wide range of applications including small area estimation. While the theory is well established for mixed linear models and under normality of the error and mixing distributions, the literature is sparse for nonlinear mixed models under nonnormality of the error distribution or of the mixing distributions. We develop a resampling-based unified approach for predicting mixed effects under a generalized mixed model set-up. Second-order-accurate nonnegative estimators of mean squared prediction errors are also developed. Given the parametric model, the proposed methodology automatically produces estimators of the small area parameters and their mean squared prediction errors, without requiring explicit analytical expressions for the mean squared prediction errors. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
469
485
http://hdl.handle.net/10.1093/biomet/asm035
application/pdf
Access to full text is restricted to subscribers.
Soumendra N. Lahiri
Tapabrata Maiti
Myron Katzoff
Van Parsons
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:462-4692013-03-04RePEc:oup:biomet
article
On some models for multivariate binary variables parallel in complexity with the multivariate Gaussian distribution
It is shown that both the simple form of the Rasch model for binary data and a generalisation are essentially equivalent to special dichotomised Gaussian models. In these the underlying Gaussian structure is of single factor form; that is, the correlations between the binary variables arise via a single underlying variable, called in psychometrics a latent trait. The implications for scoring of the binary variables are discussed, in particular regarding the scoring system as in effect estimating the latent trait. In particular, the role of the simple sum score, in effect the total number of 'successes', is examined. Relations with the principal component analysis of binary data are outlined and some connections with the quadratic exponential binary model are sketched. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
462
469
D. R. Cox
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:119-1332013-03-04RePEc:oup:biomet
article
Multiscale generalised linear models for nonparametric function estimation
We present a method for extracting information about both the scale and trend of local components of an inhomogeneous function in a nonparametric generalised linear model. Our multiscale framework combines recursive partitions, which allow for the incorporation of scale in a natural manner, with systems of piecewise polynomials supported on the partition intervals, which serve to summarise the smooth trend within each interval. Our estimators are formulated as solutions of complexity-penalised likelihood optimisations, where the penalty seeks to limit the number of intervals used to model the data. The actual calculation of the estimators may be accomplished using standard software routines for generalised linear models, within the context of efficient, tree-based, polynomial-time algorithms. A risk analysis shows that these estimators achieve the same asymptotic rates in the nonparametric generalised linear model as the classical wavelet-based estimators in the Gaussian 'function plus noise' model, for suitably defined ranges of Besov spaces. Numerical simulations show that the method tends to perform at least as well as, and often better than, alternative wavelet-based methodologies in the context of finite samples, while applications to gamma-ray burst data in astronomy and packet loss data in computer network traffic analysis confirm its practical relevance. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
119
133
http://hdl.handle.net/10.1093/biomet/92.1.119
text/html
Access to full text is restricted to subscribers.
Eric D. Kolaczyk
Robert D. Nowak
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:119-1322013-03-04RePEc:oup:biomet
article
Tapered empirical likelihood for time series data in time and frequency domains
We investigate data tapering in two formulations of empirical likelihood for time series. One empirical likelihood is formed from tapered data blocks in the time domain and a second is based on the tapered periodogram in the frequency domain. Limiting distributions are provided for both empirical likelihood versions under tapering. Theoretical and simulation evidence indicates that a data taper improves the coverage accuracy of empirical likelihood confidence intervals for time series parameters, such as means and correlations. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
119
132
http://hdl.handle.net/10.1093/biomet/asn071
application/pdf
Access to full text is restricted to subscribers.
Daniel J. Nordman
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:147-1612013-03-04RePEc:oup:biomet
article
On least-squares regression with censored data
The semiparametric accelerated failure time model relates the logarithm of the failure time linearly to the covariates while leaving the error distribution unspecified. The present paper describes simple and reliable inference procedures based on the least-squares principle for this model with right-censored data. The proposed estimator of the vector-valued regression parameter is an iterative solution to the Buckley--James estimating equation with a preliminary consistent estimator as the starting value. The estimator is shown to be consistent and asymptotically normal. A novel resampling procedure is developed for the estimation of the limiting covariance matrix. Extensions to marginal models for multivariate failure time data are considered. The performance of the new inference procedures is assessed through simulation studies. Illustrations with medical studies are provided. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
147
161
http://hdl.handle.net/10.1093/biomet/93.1.147
text/html
Access to full text is restricted to subscribers.
Zhezhen Jin
D. Y. Lin
Zhiliang Ying
oai:RePEc:oup:biomet:v:91:y:2004:i:1:p:81-942013-03-04RePEc:oup:biomet
article
Statistical inference for infinite-dimensional parameters via asymptotically pivotal estimating functions
Suppose that a consistent estimator for an infinite-dimensional parameter can be readily obtained via a set of estimating functions which has a 'good' local linear approximation around the true value of the parameter. However, it may be difficult to estimate the variance function of this estimator well. We show that, if the set of estimating functions evaluated at the true parameter value is 'asymptotically pivotal', then the 'fiducial' distribution of the parameter can be used to approximate the distribution of this consistent estimator. We present three examples to illustrate that the corresponding inference for the parameter can be made via a simple simulation technique without involving complex, high-dimensional nonparametric density estimates. Copyright Biometrika Trust 2004, Oxford University Press.
1
2004
91
March
Biometrika
81
94
M. A. Goldwasser
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:843-8602013-03-04RePEc:oup:biomet
article
Building mixture trees from binary sequence data
We develop a new method for building a hierarchical tree from binary sequence data. It is based on an ancestral mixture model. The sieve parameter in the model plays the role of time in the evolutionary tree of the sequences. By varying the sieve parameter, one can create a hierarchical tree that estimates the population structure at each fixed backward point in time. Application to the clustering of the mitochondrial DNA sequences of Griffiths & Tavaré (1994) shows that the approach performs well. Theoretical and computational properties of the ancestral mixture model are further developed. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
843
860
http://hdl.handle.net/10.1093/biomet/93.4.843
text/html
Access to full text is restricted to subscribers.
Shu-Chuan Chen
Bruce G. Lindsay
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:465-4762013-03-04RePEc:oup:biomet
article
Saddlepoint approximations for the Bingham and Fisher–Bingham normalising constants
The Fisher--Bingham distribution is obtained when a multivariate normal random vector is conditioned to have unit length. Its normalising constant can be expressed as an elementary function multiplied by the density, evaluated at 1, of a linear combination of independent noncentral χ²₁ random variables. Hence we may approximate the normalising constant by applying a saddlepoint approximation to this density. Three such approximations, implementation of each of which is straightforward, are investigated: the first-order saddlepoint density approximation, the second-order saddlepoint density approximation and a variant of the second-order approximation which has proved slightly more accurate than the other two. The numerical and theoretical results we present show that this approach provides highly accurate approximations in a broad spectrum of cases. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
465
476
http://hdl.handle.net/10.1093/biomet/92.2.465
text/html
Access to full text is restricted to subscribers.
A. Kume
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:893-9042013-03-04RePEc:oup:biomet
article
Uniform designs limit aliasing
When fitting a linear regression model to data, aliasing can adversely affect the estimates of the model coefficients and the decision of whether or not a term is significant. Optimal experimental designs give efficient estimators assuming that the true form of the model is known, while robust experimental designs guard against inaccurate estimates caused by model misspecification. Although it is rare for a single design to be both maximally efficient and robust, it is shown here that uniform designs limit the effects of aliasing to yield reasonable efficiency and robustness together. Aberration and resolution measure how well fractional factorial designs guard against the effects of aliasing. Here it is shown that the definitions of aberration and resolution may be generalised to other types of design using the discrepancy. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
893
904
Fred J. Hickernell
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:228-2342013-03-04RePEc:oup:biomet
article
A note on kernel polygons
Jones (1989) has pointed out that piecewise linear interpolated kernel density estimators on a sufficiently fine grid can be visually indistinguishable from the true density. A simple device, the kernel polygon, is proposed for eliminating the evaluation of the normalisation constant of the estimator while retaining its property of being a density function as well as providing practical advantages. The class of uniform and linear kernels of the kernel polygons is given. Finally, we present a simulation study and a real data example in which we compare bandwidth selectors for the kernel polygons. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
228
234
http://hdl.handle.net/10.1093/biomet/93.1.228
text/html
Access to full text is restricted to subscribers.
Chien-Tai Lin
Jyh-Shyang Wu
Chia-Hung Yen
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:709-7192013-03-04RePEc:oup:biomet
article
The Benjamini--Hochberg method with infinitely many contrasts in linear models
Benjamini and Hochberg's method for controlling the false discovery rate is applied to the problem of testing infinitely many contrasts in linear models. Exact, easily calculated critical values are derived, defining a new multiple comparisons method for testing contrasts in linear models. The method is adaptive, depending on the data through the F-statistic, like the Waller--Duncan Bayesian multiple comparisons method. Comparisons with Scheffé's method are given, and the method is extended to the simultaneous confidence intervals of Benjamini and Yekutieli. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
709
719
http://hdl.handle.net/10.1093/biomet/asn033
application/pdf
Access to full text is restricted to subscribers.
Peter H. Westfall
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:105-1182013-03-04RePEc:oup:biomet
article
Theory for penalised spline regression
Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982). Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
105
118
http://hdl.handle.net/10.1093/biomet/92.1.105
text/html
Access to full text is restricted to subscribers.
Peter Hall
J. D. Opsomer
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:961-9772013-03-04RePEc:oup:biomet
article
Estimating the false discovery rate using the stochastic approximation algorithm
Testing of multiple hypotheses involves statistics that are strongly dependent in some applications, but most work on this subject is based on the assumption of independence. We propose a new method for estimating the false discovery rate of multiple hypothesis tests, in which the density of test scores is estimated parametrically by minimizing the Kullback--Leibler distance between the unknown density and its estimator using the stochastic approximation algorithm, and the false discovery rate is estimated using the ensemble averaging method. Our method is applicable under general dependence between test statistics. Numerical comparisons between our method and several competitors, conducted on simulated and real data examples, show that our method achieves more accurate control of the false discovery rate in almost all scenarios. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
961
977
http://hdl.handle.net/10.1093/biomet/asn036
application/pdf
Access to full text is restricted to subscribers.
Faming Liang
Jian Zhang
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:183-1962013-03-04RePEc:oup:biomet
article
On measuring the variability of small area estimators under a basic area level model
In this paper, based on a basic area level model, we obtain second-order accurate approximations to the mean squared error of model-based small area estimators, using the Fay & Herriot (1979) iterative method of estimating the model variance based on weighted residual sum of squares. We also obtain mean squared error estimators unbiased to second order. Based on simulations, we compare the finite-sample performance of our mean squared error estimators with those based on method-of-moments, maximum likelihood and residual maximum likelihood estimators of the model variance. Our results suggest that the Fay--Herriot method performs better, in terms of relative bias of mean squared error estimators, than the other methods across different combinations of number of areas, pattern of sampling variances and distribution of small area effects. We also derive a noninformative prior on the model parameters for which the posterior variance of a small area mean is second-order unbiased for the mean squared error. The posterior variance based on such a prior possesses both Bayesian and frequentist interpretations. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
183
196
http://hdl.handle.net/10.1093/biomet/92.1.183
text/html
Access to full text is restricted to subscribers.
Gauri Sankar Datta
J. N. K. Rao
David Daniel Smith
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:217-2292013-03-04RePEc:oup:biomet
article
Variable selection for the single‐index model
We consider variable selection in the single-index model. We prove that the popular leave-m-out crossvalidation method has different behaviour in the single-index model from that in linear regression models or nonparametric regression models. A new consistent variable selection method, called separated crossvalidation, is proposed. Further analysis suggests that the method has better finite-sample performance and is computationally easier than leave-m-out crossvalidation. Separated crossvalidation, applied to the Swiss banknotes data and the ozone concentration data, leads to single-index models with selected variables that have better prediction capability than models based on all the covariates. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
217
229
http://hdl.handle.net/10.1093/biomet/asm008
application/pdf
Access to full text is restricted to subscribers.
Efang Kong
Yingcun Xia
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:995-9992013-03-04RePEc:oup:biomet
article
Wild bootstrap for quantile regression
The existing theory of the wild bootstrap has focused on linear estimators. In this note, we broaden its validity by providing a class of weight distributions that is asymptotically valid for quantile regression estimators. As most weight distributions in the literature lead to biased variance estimates for nonlinear estimators of linear regression, we propose a modification of the wild bootstrap that admits a broader class of weight distributions for quantile regression. A simulation study on median regression is carried out to compare various bootstrap methods. With a simple finite-sample correction, the wild bootstrap is shown to account for general forms of heteroscedasticity in a regression model with fixed design points. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
995
999
http://hdl.handle.net/10.1093/biomet/asr052
application/pdf
Access to full text is restricted to subscribers.
Xingdong Feng
Xuming He
Jianhua Hu
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:484-4892013-03-04RePEc:oup:biomet
article
Hypothesis testing when a nuisance parameter is present only under the alternative: Linear model case
The results of Davies (1977, 1987) are extended to a linear model situation with unknown residual variance. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
484
489
Robert B. Davies
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:291-3032013-03-04RePEc:oup:biomet
article
Regression methods for gap time hazard functions of sequentially ordered multivariate failure time data
Sequentially ordered multivariate failure time data are often observed in biomedical studies and inter-event, or gap, times are often of interest. Generally, standard hazard regression methods cannot be applied to the gap times because of identifiability issues and induced dependent censoring. We propose estimating equations for fitting proportional hazards regression models to the gap times. Model parameters are shown to be consistent and asymptotically normal. Simulation studies reveal the appropriateness of the asymptotic approximations in finite samples. The proposed methods are applied to renal failure data to assess the association between demographic covariates and both time until wait-listing and time from wait-listing to kidney transplantation. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
291
303
Douglas E. Schaubel
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:443-4582013-03-04RePEc:oup:biomet
article
Bayesian predictive information criterion for the evaluation of hierarchical Bayesian and empirical Bayes models
The problem of evaluating the goodness of the predictive distributions of hierarchical Bayesian and empirical Bayes models is investigated. A Bayesian predictive information criterion is proposed as an estimator of the posterior mean of the expected loglikelihood of the predictive distribution when the specified family of probability distributions does not contain the true distribution. The proposed criterion is developed by correcting the asymptotic bias of the posterior mean of the loglikelihood as an estimator of its expected loglikelihood. In the evaluation of hierarchical Bayesian models with random effects, regardless of our parametric focus, the proposed criterion considers the bias correction of the posterior mean of the marginal loglikelihood because it requires a consistent parameter estimator. The use of the bootstrap in model evaluation is also discussed. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
443
458
http://hdl.handle.net/10.1093/biomet/asm017
application/pdf
Access to full text is restricted to subscribers.
Tomohiro Ando
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:719-7232013-03-04RePEc:oup:biomet
article
Probabilistic model for two dependent circular variables
Motivated by problems in molecular biology and molecular physics, we propose a five-parameter torus analogue of the bivariate normal distribution for modelling the distribution of two circular random variables. The conditional distributions of the proposed distribution are von Mises. The marginal distributions are symmetric around their means and are either unimodal or bimodal. The type of shape depends on the configuration of parameters, and we derive the conditions that ensure a specific shape. The utility of the proposed distribution is illustrated by the modelling of angular variables in a short linear peptide. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
719
723
Harshinder Singh
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:53-712013-03-04RePEc:oup:biomet
article
Pattern-mixture models with proper time dependence
Recently, pattern-mixture modelling has become a popular tool for modelling incomplete longitudinal data. Such models are under-identified in the sense that, for any drop-out pattern, the data provide no direct information on the distribution of the unobserved outcomes, given the observed ones. One simple way of overcoming this problem, ordinary extrapolation of sufficiently simple pattern-specific models, often produces rather unlikely descriptions; several authors consider identifying restrictions instead. Molenberghs et al. (1998) have constructed identifying restrictions corresponding to missing at random. In this paper, the family of restrictions where drop-out does not depend on future, unobserved observations is identified. The ideas are illustrated using a clinical study of Alzheimer patients. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
53
71
M. G. Kenward
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:35-472013-03-04RePEc:oup:biomet
article
Population intervention models in causal inference
We propose a new causal parameter, which is a natural extension of existing approaches to causal inference such as marginal structural models. Modelling approaches are proposed for the difference between a treatment-specific counterfactual population distribution and the actual population distribution of an outcome in the target population of interest. Relevant parameters describe the effect of a hypothetical intervention on such a population and therefore we refer to these models as population intervention models. We focus on intervention models estimating the effect of an intervention in terms of a difference and ratio of means, called risk difference and relative risk if the outcome is binary. We provide a class of inverse-probability-of-treatment-weighted and doubly-robust estimators of the causal parameters in these models. The finite-sample performance of these new estimators is explored in a simulation study. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
35
47
http://hdl.handle.net/10.1093/biomet/asm097
application/pdf
Access to full text is restricted to subscribers.
Alan E. Hubbard
Mark J. van der Laan
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:958-9612013-03-04RePEc:oup:biomet
article
A note on a partial empirical likelihood
A partial profile empirical likelihood for a semiparametric mixture model (Zou et al., 2002) is shown to originate in a conditional likelihood involving additional nuisance parameters. The partial likelihood is the conditional likelihood with the nuisance parameters replaced by their estimators from the full likelihood. The conditional likelihood suggests alternative estimators. We demonstrate that the partial likelihood estimator is more efficient than an estimator for which the nuisance parameters are known. The practical implications of this counter-intuitive result are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
958
961
F. Zou
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:859-8742013-03-04RePEc:oup:biomet
article
Bayesian nonparametric inference on stochastic ordering
We consider Bayesian inference about collections of unknown distributions subject to a partial stochastic ordering. To address problems in testing of equalities between groups and estimation of group-specific distributions, we propose classes of restricted dependent Dirichlet process priors. These priors have full support in the space of stochastically ordered distributions, and can be used for collections of unknown mixture distributions to obtain a flexible class of mixture models. Theoretical properties are discussed, efficient methods are developed for posterior computation using Markov chain Monte Carlo simulation and the methods are illustrated using data from a study of DNA damage and repair. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
859
874
http://hdl.handle.net/10.1093/biomet/asn043
application/pdf
Access to full text is restricted to subscribers.
David B. Dunson
Shyamal D. Peddada
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:763-7752013-03-04RePEc:oup:biomet
article
Analysing panel count data with informative observation times
In this paper, we study panel count data with informative observation times. We assume nonparametric and semiparametric proportional rate models for the underlying event process, where the form of the baseline rate function is left unspecified and a subject-specific frailty variable inflates or deflates the rate function multiplicatively. The proposed models allow the event processes and observation times to be correlated through their connections with the unobserved frailty; moreover, the distributions of both the frailty variable and observation times are considered as nuisance parameters. The baseline rate function and the regression parameters are estimated by maximising a conditional likelihood function of observed event counts and solving estimating equations. Large-sample properties of the proposed estimators are studied. Numerical studies demonstrate that the proposed estimation procedures perform well for moderate sample sizes. An application to a bladder tumour study is presented. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
763
775
http://hdl.handle.net/10.1093/biomet/93.4.763
text/html
Access to full text is restricted to subscribers.
Chiung-Yu Huang
Mei-Cheng Wang
Ying Zhang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:979-9862013-03-04RePEc:oup:biomet
article
Identification of the age-period-cohort model and the extended chain-ladder model
We consider the identification problem that arises in the age-period-cohort models as well as in the extended chain-ladder model. We propose a canonical parameterization based on the accelerations of the trends in the three factors. This parameterization is exactly identified and eases interpretation, estimation and forecasting. The canonical parameterization is applied to a class of index sets which have trapezoidal shapes, including various Lexis diagrams and the insurance-reserving triangles. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
979
986
http://hdl.handle.net/10.1093/biomet/asn026
application/pdf
Access to full text is restricted to subscribers.
D. Kuang
B. Nielsen
J. P. Nielsen
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:982-9842013-03-04RePEc:oup:biomet
article
Conditional and marginal association for binary random variables
The relationship between marginal and conditional distributions of binary random variables is analysed via a log-linear model. Conditions for the Yule--Simpson effect are established and the implications for latent class analysis examined. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
982
984
D. R. Cox
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:919-9342013-03-04RePEc:oup:biomet
article
Quantifying the failure of bootstrap likelihood ratio tests
When testing geometrically irregular parametric hypotheses, the bootstrap is an intuitively appealing method to circumvent difficult distribution theory. It has been shown, however, that the usual bootstrap is inconsistent in estimating the asymptotic distributions involved in such problems. This paper is concerned with the asymptotic size of likelihood ratio tests when critical values are computed using the inconsistent bootstrap. We clarify how the asymptotic size of such a test can be obtained from the size of the corresponding bootstrap test in the relevant limiting normal experiment. For boundary problems, that is, hypotheses given by convex cones, we show the bootstrap test to always be anticonservative, and we compute the size numerically for different two-dimensional examples. The examples illustrate that the size can be below or above the nominal level, and reveal that the relationship between the size of the test and the geometry of the considered hypotheses is surprisingly subtle. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
919
934
http://hdl.handle.net/10.1093/biomet/asr033
application/pdf
Access to full text is restricted to subscribers.
Mathias Drton
Benjamin Williams
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:911-9262013-03-04RePEc:oup:biomet
article
A functional-based distribution diagnostic for a linear model with correlated outcomes
In this paper we present an easy-to-implement graphical distribution diagnostic for linear models with correlated errors. Houseman et al. (2004) constructed quantile--quantile plots for the marginal residuals of such models, suitably transformed. We extend the pointwise asymptotic theory to address the global stochastic behaviour of the corresponding empirical cumulative distribution function, and describe a simulation technique that serves as a computationally efficient parametric bootstrap for generating representatives of its stochastic limit. Thus, continuous functionals of the empirical cumulative distribution function may be used to form global tests of normality. Through the use of projection matrices, we generalise our methods to include tests that are directed at assessing the normality of particular components of the error. Thus, tests proposed by Lange & Ryan (1989) follow as a special case. Our method works well both for models having independent units of sampling and for those in which all observations are correlated. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
911
926
http://hdl.handle.net/10.1093/biomet/93.4.911
text/html
Access to full text is restricted to subscribers.
E. Andres Houseman
Brent A. Coull
Louise M. Ryan
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:799-8122013-03-04RePEc:oup:biomet
article
Covariance reducing models: An alternative to spectral modelling of covariance matrices
We introduce covariance reducing models for studying the sample covariance matrices of a random vector observed in different populations. The models are based on reducing the sample covariance matrices to an informational core that is sufficient to characterize the variance heterogeneity among the populations. They possess useful equivariance properties and provide a clear alternative to spectral models for covariance matrices. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
799
812
http://hdl.handle.net/10.1093/biomet/asn052
application/pdf
Access to full text is restricted to subscribers.
R. Dennis Cook
Liliana Forzani
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:645-6592013-03-04RePEc:oup:biomet
article
On multiple regression models with nonstationary correlated errors
We consider the estimation of parameters of a multiple regression model with nonstationary errors. We assume the nonstationary errors satisfy a time-dependent autoregressive process and describe a method for estimating the parameters of the regressors and the time-dependent autoregressive parameters. The parameters are rescaled as in nonparametric regression to obtain the asymptotic sampling properties of the estimators. The method is illustrated with an example taken from global temperature anomalies. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
645
659
Suhasini Subba Rao
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:455-4632013-03-04RePEc:oup:biomet
article
A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations
We introduce a family of multivariate binary distributions with a certain conditional linearity property. This family is particularly useful for efficient and easy simulation of correlated binary variables with a given marginal mean vector and correlation matrix. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
455
463
Bahjat F. Qaqish
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:345-3622013-03-04RePEc:oup:biomet
article
Stochastic multitype epidemics in a community of households: Estimation of threshold parameter R_* and secure vaccination coverage
This paper is concerned with estimation of the threshold parameter R_* for a stochastic model for the spread of a susceptible → infective → removed epidemic among a closed, finite population that contains several types of individual and is partitioned into households. It turns out that R_* cannot be estimated consistently from final outcome data, so a Perron--Frobenius argument is used to obtain sharp lower and upper bounds for R_*, which can be estimated consistently. Determining the allocation of vaccines that reduces the upper bound for R_* to its threshold value of one, thus preventing the occurrence of a major outbreak, with minimum vaccine coverage is shown to be a linear programming problem. The estimates of R_*, before and after vaccination, and of the secure vaccination coverage, i.e. the proportion of individuals that have to be vaccinated to reduce the upper bound for R_* to 1 assuming an optimal vaccination scheme, are equipped with standard errors, thus yielding conservative confidence bounds for these key epidemiological parameters. The methodology is illustrated by application to data on influenza outbreaks in Tecumseh, Michigan. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
345
362
Frank G. Ball
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:1-182013-03-04RePEc:oup:biomet
article
Maxima of discretely sampled random fields, with an application to 'bubbles'
A smooth Gaussian random field with zero mean and unit variance is sampled on a discrete lattice, and we are interested in the exceedance probability or P-value of the maximum in a finite region. If the random field is smooth relative to the mesh size, then the P-value can be well approximated by results for the continuously sampled smooth random field (Adler, 1981; Worsley, 1995a; Taylor & Adler, 2003; Adler & Taylor, 2007). If the random field is not smooth, so that adjacent lattice values are nearly independent, then the usual Bonferroni bound is very accurate. The purpose of this paper is to bridge the gap between the two, and derive a simple, accurate upper bound for intermediate mesh sizes. The result uses a new improved Bonferroni-type bound based on discrete local maxima. We give an application to the 'bubbles' technique for detecting areas of the face used to discriminate fear from happiness. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
1
18
http://hdl.handle.net/10.1093/biomet/asm004
application/pdf
Access to full text is restricted to subscribers.
J. E. Taylor
K. J. Worsley
F. Gosselin
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:319-3262013-03-04RePEc:oup:biomet
article
Bayesian empirical likelihood
Research has shown that empirical likelihood tests have many of the same asymptotic properties as those derived from parametric likelihoods. This leads naturally to the possibility of using empirical likelihood as the basis for Bayesian inference. Different ways in which this goal might be accomplished are considered. The validity of the resultant posterior inferences is examined, as are frequentist properties of the Bayesian empirical likelihood intervals. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
319
326
Nicole A. Lazar
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:775-7892013-03-04RePEc:oup:biomet
article
Nonparametric estimation of the variogram and its spectrum
In the study of intrinsically stationary spatial processes, a new nonparametric variogram estimator is proposed through its spectral representation. The methodology is based on estimation of the variogram's spectrum by solving a regularized inverse problem through quadratic programming. The estimated variogram is guaranteed to be conditionally negative-definite. Simulation shows that our estimator is flexible and generally has smaller mean integrated squared error than the parametric estimator under model misspecification. Our methodology is applied to a spatial dataset of decadal temperature changes. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
775
789
http://hdl.handle.net/10.1093/biomet/asr056
application/pdf
Access to full text is restricted to subscribers.
Chunfeng Huang
Tailen Hsing
Noel Cressie
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:711-7202013-03-04RePEc:oup:biomet
article
Sudoku-based space-filling designs
Sudoku is played by millions of people across the globe. It has simple rules and is very addictive. The game board is a nine-by-nine grid of numbers from one to nine. Several entries within the grid are provided and the remaining entries must be filled in subject to no row, column, or three-by-three subsquare containing duplicate numbers. By exploiting these three types of uniformity, we propose an approach to constructing a new type of design, called a Sudoku-based space-filling design. Such a design can be divided into groups of subdesigns so that the complete design and each subdesign achieve maximum uniformity in univariate and bivariate margins. Examples are given illustrating the proposed construction method. Applications of such designs include computer experiments with qualitative and quantitative factors, linking parameters in engineering and crossvalidation. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
711
720
http://hdl.handle.net/10.1093/biomet/asr024
application/pdf
Access to full text is restricted to subscribers.
Xu Xu
Ben Haaland
Peter Z. G. Qian
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:807-8172013-03-04RePEc:oup:biomet
article
A new Bayesian method for nonparametric capture-recapture models in presence of heterogeneity
The intrinsic heterogeneity of individuals is a potential source of bias in estimation procedures for capture-recapture models. To account for this heterogeneity in the model a hierarchical structure has been proposed whereby the probabilities that each animal is caught on a single occasion are modelled as independent draws from a common unknown distribution F. However, there is general agreement that modelling F by a simple parametric curve may lead to unsatisfactory results. Here we propose an alternative Bayesian approach that relies on a different parameterisation which imposes no assumption on the shape of F but drives the problem back to a finite-dimensional setting. Our approach avoids some identifiability issues related to such a recapture model while allowing for a formal Bayesian default analysis. Results of analyses of computer simulations and of real data show that the method performs well. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
807
817
Luca Tardella
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:267-2832013-03-04RePEc:oup:biomet
article
A weighted multivariate sign test for cluster-correlated data
We consider the multivariate location problem with cluster-correlated data. A family of multivariate weighted sign tests is introduced for which observations from different clusters can receive different weights. Under weak assumptions, the test statistic is asymptotically distributed as a chi-squared random variable as the number of clusters goes to infinity. The asymptotic distribution of the test statistic is also given for a local alternative model under multivariate normality. Optimal weights maximizing Pitman asymptotic efficiency are provided. These weights depend on the cluster sizes and on the intracluster correlation. Several approaches for estimating these weights are presented. Using Pitman asymptotic efficiency, we show that appropriate weighting can substantially increase efficiency compared with a test that gives the same weight to each cluster. A multivariate weighted t-test is also introduced. The finite-sample performance of the weighted sign test is explored through a simulation study which shows that the proposed approach is very competitive. A real data example illustrates the practical application of the methodology. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
267
283
http://hdl.handle.net/10.1093/biomet/asm026
application/pdf
Access to full text is restricted to subscribers.
Denis Larocque
Jaakko Nevalainen
Hannu Oja
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:759-7712013-03-04RePEc:oup:biomet
article
Extended Bayesian information criteria for model selection with large model spaces
The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
759
771
http://hdl.handle.net/10.1093/biomet/asn034
application/pdf
Access to full text is restricted to subscribers.
Jiahua Chen
Zehua Chen
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:87-992013-03-04RePEc:oup:biomet
article
The unobserved heterogeneity distribution in duration analysis
In a large class of hazard models with proportional unobserved heterogeneity, the distribution of the heterogeneity among survivors converges to a gamma distribution. This convergence is often rapid. We derive this result as a general result for exponential mixtures and explore its implications for the specification and empirical analysis of univariate and multivariate duration models. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
87
99
http://hdl.handle.net/10.1093/biomet/asm013
application/pdf
Access to full text is restricted to subscribers.
Jaap H. Abbring
Gerard J. Van Den Berg
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:701-7102013-03-04RePEc:oup:biomet
article
Generalized varying coefficient models with unknown link function
We propose a new estimation method for generalized varying coefficient models where the link function is specified up to some smoothness conditions. Consistency and asymptotic normality of the estimated varying coefficient functions are established. Simulation results and a real data application demonstrate the usefulness of the new method. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
701
710
http://hdl.handle.net/10.1093/biomet/asr031
application/pdf
Access to full text is restricted to subscribers.
C. N. Kuruwita
K. B. Kulasekera
C. M. Gallagher
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:996-10022013-03-04RePEc:oup:biomet
article
Identification of a competing risks model with unknown transformations of latent failure times
This paper is concerned with identification of a competing risks model with unknown transformations of latent failure times. The model includes, as special cases, competing risks versions of proportional hazards, mixed proportional hazards and accelerated failure time models. It is shown that covariate effects on latent failure times, cause-specific link functions and the joint survivor function of the disturbance terms can be identified without relying either on parametric modelling of the dependence between latent failure times or on an exclusion restriction among covariates. As a result, the paper provides an identification result about the joint survivor function of the latent failure times conditional on covariates. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
996
1002
http://hdl.handle.net/10.1093/biomet/93.4.996
text/html
Access to full text is restricted to subscribers.
Sokbae Lee
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:807-8202013-03-04RePEc:oup:biomet
article
Sparse estimation of a covariance matrix
We suggest a method for estimating a covariance matrix on the basis of a sample of vectors drawn from a multivariate normal distribution. In particular, we penalize the likelihood with a lasso penalty on the entries of the covariance matrix. This penalty plays two important roles: it reduces the effective number of parameters, which is important even when the dimension of the vectors is smaller than the sample size since the number of parameters grows quadratically in the number of variables, and it produces an estimate which is sparse. In contrast to sparse inverse covariance estimation, our method's close relative, the sparsity attained here is in the covariance matrix itself rather than in the inverse matrix. Zeros in the covariance matrix correspond to marginal independencies; thus, our method performs model selection while providing a positive definite estimate of the covariance. The proposed penalized maximum likelihood problem is not convex, so we use a majorize-minimize approach in which we iteratively solve convex approximations to the original nonconvex problem. We discuss tuning parameter selection and demonstrate on a flow-cytometry dataset how our method produces an interpretable graphical display of the relationship between variables. We perform simulations that suggest that simple elementwise thresholding of the empirical covariance matrix is competitive with our method for identifying the sparsity structure. Additionally, we show how our method can be used to solve a previously studied special case in which a desired sparsity pattern is prespecified. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
807
820
http://hdl.handle.net/10.1093/biomet/asr054
application/pdf
Access to full text is restricted to subscribers.
Jacob Bien
Robert J. Tibshirani
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:579-5892013-03-04RePEc:oup:biomet
article
A diagnostic procedure based on local influence
Cook's (1986) normal curvature measure is useful for sensitivity analysis of model assumptions in statistical models. However, there is no rigorous approach based on the normal curvature for addressing two fundamental issues: to assess the extent of discrepancy between an assumed model and the underlying model from which the data are generated, and to identify suspicious data points for which the discrepancy is most evident. Our purpose is to establish a theoretically sound procedure for resolving these issues for case-weight perturbation under the framework of independent distributions. We show that the local influence measure, Cook's distance and likelihood distance are asymptotically equivalent. A diagnostic procedure, based on local influence, is proposed for evaluating model misspecification and for detecting influential points simultaneously. We analyse two real datasets. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
579
589
Hongtu Zhu
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:283-2982013-03-04RePEc:oup:biomet
article
A flexible additive multiplicative hazard model
We present a new additive-multiplicative hazard model which consists of two components. The first component contains additive covariate effects through an additive Aalen model while the second component contains multiplicative covariate effects through a Cox regression model. The Aalen model allows for time-varying covariate effects, while the Cox model allows only a common time-dependence through the baseline. Approximate maximum likelihood estimators are derived by solving the simultaneous score equations for the nonparametric and parametric components of the model. The suggested estimators are provided with large-sample properties and are shown to be efficient. The efficient estimators depend, however, on some estimated weights. We therefore also consider unweighted estimators and describe their large-sample properties. We finally extend the model to allow for time-varying covariate effects in the multiplicative part of the model as well. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
283
298
Torben Martinussen
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:447-4592013-03-04RePEc:oup:biomet
article
Assessing robustness of generalised estimating equations and quadratic inference functions
In the presence of data contamination or outliers, some empirical studies have indicated that the two methods of generalised estimating equations and quadratic inference functions appear to have rather different robustness behaviour. This paper presents a theoretical investigation from the perspective of the influence function to identify the causes for the difference. We show that quadratic inference functions lead to bounded influence functions and the corresponding M-estimator has a redescending property, but the generalised estimating equation approach does not. We also illustrate that, unlike generalised estimating equations, quadratic inference functions can still provide consistent estimators even if part of the data is contaminated. We conclude that the quadratic inference function is a preferable method to the generalised estimating equation as far as robustness is concerned. This conclusion is supported by simulations and real-data examples. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
447
459
Annie Qu
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:971-9742013-03-04RePEc:oup:biomet
article
A note on reducing the bias of the approximate Bayesian bootstrap imputation variance estimator
Rubin & Schenker (1986) proposed the approximate Bayesian bootstrap, a two-stage resampling procedure, as a method of creating multiple imputations when missing data are ignorable. Kim (2002) showed that the multiple imputation variance estimator is biased for moderate sample sizes when this method is used. To reduce the bias, Kim (2002) proposed modifying the number of samples drawn at the first stage of the Bayesian bootstrap procedure. In this note, we suggest an alternative method for reducing the bias via a simple correction factor applied to the standard multiple imputation variance estimate. The proposed correction is more easily implemented and more efficient than the procedure proposed by Kim (2002). Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
971
974
http://hdl.handle.net/10.1093/biomet/92.4.971
text/html
Access to full text is restricted to subscribers.
Michael Parzen
Stuart R. Lipsitz
Garrett M. Fitzmaurice
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:529-5412013-03-04RePEc:oup:biomet
article
Case-control current status data
In this paper, we show that the distribution function of survival times is identified, up to a one-parameter family of distribution functions, based on information from case-control current status data. With supplementary information on the population frequency of cases relative to controls, a simple weighted version of the nonparametric maximum likelihood estimator for prospective current status data provides a natural estimator for case-control samples. Following the parametric results of Scott & Wild (1997), we show that this estimator is, in fact, the nonparametric maximum likelihood estimator. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
529
541
Nicholas P. Jewell
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:909-9202013-03-04RePEc:oup:biomet
article
First-order intrinsic autoregressions and the de Wijs process
We discuss intrinsic autoregressions for a first-order neighbourhood on a two-dimensional rectangular lattice and give an exact formula for the variogram that extends known results to the asymmetric case. We obtain a corresponding asymptotic expansion that is more accurate and more general than previous ones and use this to derive the de Wijs variogram under appropriate averaging, a result that can be interpreted as a two-dimensional spatial analogue of Brownian motion obtained as the limit of a random walk in one dimension. This provides a bridge between geostatistics, where the de Wijs process was once the most popular formulation, and Markov random fields, and also explains why statistical analysis using intrinsic autoregressions is usually robust to changes of scale. We briefly describe corresponding calculations in the frequency domain, including limiting results for higher-order autoregressions. The paper closes with some practical considerations, including applications to irregularly-spaced data. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
909
920
http://hdl.handle.net/10.1093/biomet/92.4.909
text/html
Access to full text is restricted to subscribers.
Julian Besag
Debashis Mondal
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:243-2482013-03-04RePEc:oup:biomet
article
Plant-capture estimation of the size of a homogeneous population
We consider maximum likelihood estimation of the size of a target population to which has been added a known number of planted individuals. The standard equal-catchability model used in mark-recapture is assumed to be applicable to the augmented population. After proving the unimodality of the profile likelihood for the target population size, we obtain both the maximum likelihood estimator of this size and interval estimators based on its asymptotic distribution. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
243
248
http://hdl.handle.net/10.1093/biomet/asm012
application/pdf
Access to full text is restricted to subscribers.
I. B. J. Goudie
P. E. Jupp
J. Ashbridge
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:743-7502013-03-04RePEc:oup:biomet
article
Nonparametric confidence intervals for receiver operating characteristic curves
We study methods for constructing confidence intervals and confidence bands for estimators of receiver operating characteristics. Particular emphasis is placed on the way in which smoothing should be implemented, when estimating either the characteristic itself or its variance. We show that substantial undersmoothing is necessary if coverage properties are not to be impaired. A theoretical analysis of the problem suggests an empirical, plug-in rule for bandwidth choice, optimising the coverage accuracy of interval estimators. The performance of this approach is explored. Our preferred technique is based on asymptotic approximation, rather than a more sophisticated approach using the bootstrap, since the latter requires a multiplicity of smoothing parameters all of which must be chosen in nonstandard ways. It is shown that the asymptotic method can give very good performance. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
743
750
Peter Hall
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:199-2082013-03-04RePEc:oup:biomet
article
A nonparametric test for panel count data
Panel count data arise when a recurrent event is under investigation and each study subject is observed only at discrete time points. In this situation, observed data include only the numbers of occurrences of the event of interest between observation time points and no information is available on subjects between their observation time points. We propose a nonparametric test for comparing the point processes characterising the recurrent event when only panel count data are available. The asymptotic distribution of the test statistic is derived and a simulation study is conducted to evaluate its performance. The method is illustrated using data from a medical follow-up study. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
199
208
Jianguo Sun
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:519-5282013-03-04RePEc:oup:biomet
article
A note on composite likelihood inference and model selection
A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
519
528
http://hdl.handle.net/10.1093/biomet/92.3.519
text/html
Access to full text is restricted to subscribers.
Cristiano Varin
Paolo Vidoni
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:159-1822013-03-04RePEc:oup:biomet
article
Spectral models for covariance matrices
A new model for the simultaneous eigenstructure of multiple covariance matrices is proposed. The model is much more flexible than existing models and subsumes most of them as special cases. A Fisher scoring algorithm for computing maximum likelihood estimates of the parameters under normality is given. Asymptotic distributions of the estimators are derived under normality as well as under arbitrary distributions having finite fourth-order cumulants. Special attention is given to elliptically contoured distributions. Likelihood ratio tests are described and sufficient conditions are given under which the test statistics are asymptotically distributed as chi-squared random variables. Procedures are derived for evaluating Bartlett corrections under normality. Some conjectures made by Flury (1988) are verified; others are refuted. A small simulation study of the adequacy of the Bartlett correction is described and the new procedures are illustrated on two datasets. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
159
182
Robert J. Boik
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:355-3662013-03-04RePEc:oup:biomet
article
A serially correlated gamma frailty model for longitudinal count data
A Poisson-gamma model is introduced to account for between-subjects heterogeneity and within-subjects serial correlation occurring in longitudinal count data. The model extends the usual time-constant shared frailty approach to allow time-varying serially correlated gamma frailty whilst retaining standard marginal assumptions. A composite likelihood approach to estimation and testing for serial correlation is proposed. The work is motivated by a clinical trial on patient-controlled analgesia where the number of analgesic doses taken by hospital patients in successive time intervals following abdominal surgery is recorded. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
355
366
Robin Henderson
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:905-9192013-03-04RePEc:oup:biomet
article
Using Hierarchical Likelihood for Missing Data Problems
Most statistical solutions to the problem of statistical inference with missing data involve integration or expectation. This can be done in many ways: directly or indirectly, analytically or numerically, deterministically or stochastically. Missing-data problems can be formulated in terms of latent random variables, so that hierarchical likelihood methods of Lee & Nelder (1996) can be applied to missing-value problems to provide one solution to the problem of integration of the likelihood. The resulting methods effectively use a Laplace approximation to the marginal likelihood with an additional adjustment to the measures of precision to accommodate the estimation of the fixed effects parameters. We first consider missing at random cases where problems are simpler to handle because the integration does not need to involve the missing-value mechanism and then consider missing not at random cases. We also study tobit regression and refit the missing not at random selection model to the antidepressant trial data analyzed in Diggle & Kenward (1994). Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
905
919
http://hdl.handle.net/10.1093/biomet/asm063
application/pdf
Access to full text is restricted to subscribers.
Sung-Cheol Yun
Youngjo Lee
Michael G. Kenward
oai:RePEc:oup:biomet:v:90:y:2003:i:2:p:289-3022013-03-04RePEc:oup:biomet
article
Fully Bayesian spline smoothing and intrinsic autoregressive priors
There is a well-known Bayesian interpretation for function estimation by spline smoothing using a limit of proper normal priors. The limiting prior and the conditional and intrinsic autoregressive priors popular for spatial modelling have a common form, which we call partially informative normal. We derive necessary and sufficient conditions for the propriety of the posterior for this class of partially informative normal priors with noninformative priors on the variance components, a condition crucial for successful implementation of the Gibbs sampler. The results apply for fully Bayesian smoothing splines, thin-plate splines and L-splines, as well as models using intrinsic autoregressive priors. Copyright Biometrika Trust 2003, Oxford University Press.
2
2003
90
June
Biometrika
289
302
Paul L. Speckman
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:629-6412013-03-04RePEc:oup:biomet
article
A Bayesian justification of Cox's partial likelihood
In this paper, we establish both naive and formal Bayesian justifications of Cox's (1975) partial likelihood and its various modifications. We extend the original work of Kalbfleisch (1978), who showed that the partial likelihood is a limiting marginal posterior under noninformative priors for baseline hazards. We further extend the result to scenarios with time-dependent covariates and time-varying regression parameters. We establish results for continuous time as well as grouped survival data. In addition, we present a Bayesian justification of a modified partial likelihood for handling ties. We also present tools for simplification of the Gibbs sampling algorithm for implementing partial likelihood based Bayesian inference in various practical applications. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
629
641
Debajyoti Sinha
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:63-742013-03-04RePEc:oup:biomet
article
Shared parameter models under random effects misspecification
A common objective in longitudinal studies is the investigation of the association structure between a longitudinal response process and the time to an event of interest. An attractive paradigm for the joint modelling of longitudinal and survival processes is the shared parameter framework, where a set of random effects is assumed to induce their interdependence. In this work, we propose an alternative parameterization for shared parameter models and investigate the effect of misspecifying the random effects distribution on the parameter estimates and their standard errors. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
63
74
http://hdl.handle.net/10.1093/biomet/asm087
application/pdf
Access to full text is restricted to subscribers.
Dimitris Rizopoulos
Geert Verbeke
Geert Molenberghs
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:481-4852013-03-04RePEc:oup:biomet
article
Models for recurring events with marginal proportional hazards
Semiparametric methods were proposed by Wei et al. (1989) to analyse recurring event-time data. They modelled the marginal distribution of each event time with a Cox proportional hazards model without imposing any constraint on the joint distribution of different event times. Therefore, it is unclear whether or not event times can simultaneously satisfy their respective marginal proportional hazards assumptions while having a continuous joint distribution. This often makes simulation studies difficult to conduct. In this note we construct parametric marginal proportional hazards models for recurring event times with proper joint density functions. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
481
485
http://hdl.handle.net/10.1093/biomet/93.2.481
text/html
Access to full text is restricted to subscribers.
Nader Ebrahimi
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:149-1582013-03-04RePEc:oup:biomet
article
Standard errors and covariance matrices for smoothed rank estimators
A 'pseudo-Bayesian' interpretation of standard errors yields a natural induced smoothing of statistical estimating functions. When applied to rank estimation, the lack of smoothness which prevents standard error estimation is remedied. Efficiency and robustness are preserved, while the smoothed estimation has excellent computational properties. In particular, convergence of the iterative equation for standard error is fast, and standard error calculation becomes asymptotically a one-step procedure. This property also extends to covariance matrix calculation for rank estimates in multi-parameter problems. Examples, and some simple explanations, are given. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
149
158
http://hdl.handle.net/10.1093/biomet/92.1.149
text/html
Access to full text is restricted to subscribers.
B. M. Brown
You-Gan Wang
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:95-1102013-03-04RePEc:oup:biomet
article
Estimating and interpolating a Markov chain from aggregate data
Given aggregated longitudinal data generated by a Markov chain, which may be nonhomogeneous, the problem considered is that of modelling, estimating and interpolating the logarithms of partial odds and hence the transition probabilities. By partial odds is meant the probability of a transition to another state divided by the probability of no transition. A result establishing asymptotic normality leads to vector weighted least squares estimation of parameterised partial odds using standard regression methods. It is shown how to obtain estimates of one-step transition probabilities from widely or irregularly spaced data. The methods are illustrated on an example concerning competing causes of death. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
95
110
B. A. Davis
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:197-2122013-03-04RePEc:oup:biomet
article
Adaptive two-stage test procedures to find the best treatment in clinical trials
A main objective in clinical trials is to find the best treatment in a given finite class of competing treatments and then to show superiority of this treatment against a control treatment. The traditional procedure estimates the best treatment in a first trial. Then in an independent second trial superiority of this treatment, estimated as best in the first trial, is to be shown against the control treatment by a size α test. In this paper we investigate these two trials of this traditional procedure as a two-stage test procedure. Additionally we introduce competing two-stage group-sequential test procedures. Then we derive formulae for the expected number of patients. These formulae depend on unknown parameters. When we have a prior for the unknown parameters we can determine the two-stage test procedure of size α and power β that is optimal, in that it needs a minimal number of observations. The results are illustrated by a numerical example, which indicates the superiority of the group-sequential procedures. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
197
212
http://hdl.handle.net/10.1093/biomet/92.1.197
text/html
Access to full text is restricted to subscribers.
Wolfgang Bischoff
Frank Miller
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:573-5862013-03-04RePEc:oup:biomet
article
A semiparametric regression cure model with current status data
This paper considers the analysis of current status data with a cured proportion in the population using a mixture model that combines a logistic regression formulation for the probability of cure with a semiparametric regression model for the time to occurrence of the event. The semiparametric regression model belongs to the flexible class of partly linear models that allows one to explore the possibly nonlinear effect of a certain covariate on the response variable. A sieve maximum likelihood estimation method is proposed and the asymptotic properties of the proposed estimators are discussed. Under some mild conditions, the estimators are shown to be strongly consistent. The convergence rate of the estimator for the unknown smooth function is obtained and the estimator for the unknown parameter is shown to be asymptotically efficient and normally distributed. Simulation studies were carried out to investigate the performance of the proposed method and the model is fitted to a dataset from a study of calcification of the hydrogel intraocular lenses, a complication of cataract treatment. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
573
586
http://hdl.handle.net/10.1093/biomet/92.3.573
text/html
Access to full text is restricted to subscribers.
K. F. Lam
Hongqi Xue
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:979-9852013-03-04RePEc:oup:biomet
article
False discovery rate for scanning statistics
The false discovery rate is a criterion for controlling Type I error in simultaneous testing of multiple hypotheses. For scanning statistics, due to local dependence, clusters of neighbouring hypotheses are likely to be rejected together. In such situations, it is more intuitive and informative to group neighbouring rejections together and count them as a single discovery, with the false discovery rate defined as the proportion of clusters that are falsely declared among all declared clusters. Assuming that the number of false discoveries, under this broader definition of a discovery, is approximately Poisson and independent of the number of true discoveries, we examine approaches for estimating and controlling the false discovery rate, and provide examples from biological applications. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
979
985
http://hdl.handle.net/10.1093/biomet/asr057
application/pdf
Access to full text is restricted to subscribers.
D. O. Siegmund
N. R. Zhang
B. Yakir
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:285-2962013-03-04RePEc:oup:biomet
article
Marginal tests with sliced average variance estimation
We present a new computationally feasible test for the dimension of the central subspace in a regression problem based on sliced average variance estimation. We also provide a marginal coordinate test. Under the null hypothesis, both the test of dimension and the marginal coordinate test involve test statistics that asymptotically have chi-squared distributions given normally distributed predictors, and have a distribution that is a linear combination of chi-squared distributions in general. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
285
296
http://hdl.handle.net/10.1093/biomet/asm021
application/pdf
Access to full text is restricted to subscribers.
Yongwu Shao
R. Dennis Cook
Sanford Weisberg
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:399-4182013-03-04RePEc:oup:biomet
article
Semiparametric maximum likelihood estimation exploiting gene-environment independence in case-control studies
We consider the problem of maximum-likelihood estimation in case-control studies of gene-environment associations with disease when genetic and environmental exposures can be assumed to be independent in the underlying population. Traditional logistic regression analysis may not be efficient in this setting. We study the semiparametric maximum likelihood estimates of logistic regression parameters that exploit the gene-environment independence assumption and leave the distribution of the environmental exposures to be nonparametric. We use a profile-likelihood technique to derive a simple algorithm for obtaining the estimator and we study the asymptotic theory. The results are extended to situations where genetic and environmental factors are independent conditional on some other factors. Simulation studies investigate small-sample properties. The method is illustrated using data from a case-control study designed to investigate the interplay of BRCA1/2 mutations and oral contraceptive use in the aetiology of ovarian cancer. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
399
418
http://hdl.handle.net/10.1093/biomet/92.2.399
text/html
Access to full text is restricted to subscribers.
Nilanjan Chatterjee
Raymond J. Carroll
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:399-4102013-03-04RePEc:oup:biomet
article
Optimal testing of multiple hypotheses with common effect direction
We present a theoretical basis for testing related endpoints. Typically, it is known how to construct tests of the individual hypotheses, but not how to combine them into a multiple test procedure that controls the familywise error rate. Using the closure method, we emphasize the role of consonant procedures, from an interpretive as well as a theoretical viewpoint. Surprisingly, even if each intersection test has an optimality property, the overall procedure obtained by applying closure to these tests may be inadmissible. We introduce a new procedure, which is consonant and has a maximin property under the normal model. The results are then applied to PROactive, a clinical trial designed to investigate the effectiveness of a glucose-lowering drug on macrovascular outcomes among patients with type 2 diabetes. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
399
410
http://hdl.handle.net/10.1093/biomet/asp006
application/pdf
Access to full text is restricted to subscribers.
Richard M. Bittman
Joseph P. Romano
Carlos Vallarino
Michael Wolf
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:241-2472013-03-04RePEc:oup:biomet
article
A note on path-based variable selection in the penalized proportional hazards model
We propose an efficient and adaptive shrinkage method for variable selection in the Cox model. The method constructs a piecewise-linear regularization path connecting the maximum partial likelihood estimator and the origin. Then a model is selected along the path. We show that the constructed path is adaptive in the sense that, with a proper choice of regularization parameter, the fitted model works as well as if the true underlying submodel were given in advance. A modified algorithm of the least-angle-regression type efficiently computes the entire regularization path of the new estimator. Furthermore, we show that, with a proper choice of shrinkage parameter, the method is consistent in variable selection and efficient in estimation. Simulation shows that the new method tends to outperform the lasso and the smoothly-clipped-absolute-deviation estimators with moderate samples. We apply the methodology to data concerning nursing homes. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
241
247
http://hdl.handle.net/10.1093/biomet/asm083
application/pdf
Access to full text is restricted to subscribers.
Hui Zou
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:509-5112013-03-04RePEc:oup:biomet
article
Adjusting estimative prediction limits
This note presents a direct adjustment of the estimative prediction limit to reduce the coverage error from a target value to third-order accuracy. The adjustment is asymptotically equivalent to those of Barndorff-Nielsen & Cox (1994, 1996) and Vidoni (1998). It has a simpler form with a plug-in estimator of the coverage probability of the estimative limit at the target value. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
509
511
http://hdl.handle.net/10.1093/biomet/asm032
application/pdf
Access to full text is restricted to subscribers.
Masao Ueki
Kaoru Fueda
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:385-3972013-03-04RePEc:oup:biomet
article
Some nonregular designs from the Nordstrom–Robinson code and their statistical properties
The Nordstrom–Robinson code is a well-known nonlinear code in coding theory. This paper explores the statistical properties of this nonlinear code. Many nonregular designs with 32, 64, 128 and 256 runs and 7–16 factors are derived from it. It is shown that these nonregular designs are better than regular designs of the same size in terms of resolution, aberration and projectivity. Furthermore, many of these nonregular designs are shown to have generalised minimum aberration among all possible designs. Seven orthogonal arrays are shown to have unique word-length pattern and four of them are shown to be unique up to isomorphism. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
385
397
http://hdl.handle.net/10.1093/biomet/92.2.385
text/html
Access to full text is restricted to subscribers.
Hongquan Xu
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:553-5662013-03-04RePEc:oup:biomet
article
Bayesian analysis of covariance matrices and dynamic models for longitudinal data
Parsimonious modelling of the within-subject covariance structure while heeding its positive-definiteness is of great importance in the analysis of longitudinal data. Using the Cholesky decomposition and the ensuing unconstrained and statistically meaningful reparameterisation, we provide a convenient and intuitive framework for developing conditionally conjugate prior distributions for covariance matrices and show their connections with generalised inverse Wishart priors. Our priors offer many advantages with regard to elicitation, positive definiteness, computations using Gibbs sampling, shrinking covariances toward a particular structure with considerable flexibility, and modelling covariances using covariates. Bayesian estimation methods are developed and the results are compared using two simulation studies. These simulations suggest simpler and more suitable priors for the covariance structure of longitudinal data. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
553
566
Michael J. Daniels
oai:RePEc:oup:biomet:v:96:y:2009:i:2:p:293-3062013-03-04RePEc:oup:biomet
article
Generalized method of moments estimation for linear regression with clustered failure time data
We propose a generalized method of moments approach to the accelerated failure time model with correlated survival data. We study the semiparametric rank estimator using martingale-based moments. We circumvent direct estimation of correlation parameters by concatenating the moments and minimizing a quadratic objective function. We establish the consistency and asymptotic normality of the parameter estimators, and derive the limiting distribution of the objective function. We carry out simulation studies to examine the finite-sample properties of the method, and demonstrate its substantial efficiency gain over the conventional method. Finally, we illustrate the new proposal with an example from a diabetic retinopathy study. Copyright 2009, Oxford University Press.
2
2009
96
Biometrika
293
306
http://hdl.handle.net/10.1093/biomet/asp005
application/pdf
Access to full text is restricted to subscribers.
Hui Li
Guosheng Yin
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:921-9362013-03-04RePEc:oup:biomet
article
Towards reconciling two asymptotic frameworks in spatial statistics
Two asymptotic frameworks, increasing domain asymptotics and infill asymptotics, have been advanced for obtaining limiting distributions of maximum likelihood estimators of covariance parameters in Gaussian spatial models with or without a nugget effect. These limiting distributions are known to be different in some cases. It is therefore of interest to know, for a given finite sample, which framework is more appropriate. We consider the possibility of making this choice on the basis of how well the limiting distributions obtained under each framework approximate their finite-sample counterparts. We investigate the quality of these approximations both theoretically and empirically, showing that, for certain consistently estimable parameters of exponential covariograms, approximations corresponding to the two frameworks perform about equally well. For those parameters that cannot be estimated consistently, however, the infill asymptotic approximation is preferable. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
921
936
http://hdl.handle.net/10.1093/biomet/92.4.921
text/html
Access to full text is restricted to subscribers.
Hao Zhang
Dale L. Zimmerman
oai:RePEc:oup:biomet:v:93:y:2006:i:2:p:357-3662013-03-04RePEc:oup:biomet
article
Confidence bands for hazard rates under random censorship
We suggest a completely empirical approach to the construction of confidence bands for hazard functions, based on smoothing the Nelson-Aalen estimator. In particular, we introduce a local bandwidth-choice method. Our approach uses empirical information about both the survival rate and the censoring rate, and employs undersmoothing to alleviate difficulties caused by bias. We use both Edgeworth expansion and numerical simulation, the former to develop a basic formula and the latter to modify it for general use. Copyright 2006, Oxford University Press.
2
2006
93
June
Biometrika
357
366
http://hdl.handle.net/10.1093/biomet/93.2.357
text/html
Access to full text is restricted to subscribers.
Ming-Yen Cheng
Peter Hall
Dongsheng Tu
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:471-4902013-03-04RePEc:oup:biomet
article
Efficient importance sampling for events of moderate deviations with applications
We propose a method for finding the alternative distribution in importance sampling. The alternative distribution is optimal in the sense that the asymptotic variance is minimised for estimating tail probabilities of asymptotically normal statistics. Our contribution to importance sampling is three-fold. To begin with, we obtain an explicit expression for the mean of the optimal alternative distribution and the expression motivates a recursive approximation algorithm. Secondly, a new multi-dimensional exponential tilting formula is presented. Lastly, a conservative estimator of the variance is given to facilitate a quick comparison among different stratified sampling schemes in conjunction with importance sampling. Several numerical examples illustrating the efficacy of the proposed method are also included. These results indicate that the proposed method is considerably more efficient than the method based on large deviations theory and the efficiency gain is more significant in higher dimensions. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
471
490
Cheng-Der Fuh
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:393-4082013-03-04RePEc:oup:biomet
article
Nonparametric inference for stochastic linear hypotheses: Application to high-dimensional data
The Mann-Whitney-Wilcoxon rank sum test is limited to the comparison of two groups with univariate responses. In this paper, we introduce a class of stochastic linear hypotheses that addresses these limitations within a nonparametric setting. We formulate hypotheses for simultaneous comparisons of several multivariate response groups, without modelling the response distributions. Inference is developed based on U-statistics theory and an exchangeability assumption. The latter condition is required to identify testable hypotheses for high-dimensional response vectors, such as those arising in genomic and psychosocial research. The methodology is illustrated with two real-data applications. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
393
408
Jeanne Kowalski
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:283-3012013-03-04RePEc:oup:biomet
article
Additive hazards Markov regression models illustrated with bone marrow transplant data
When there are covariate effects to be considered, multi-state survival analysis is dominated either by parametric Markov regression models or by semiparametric Markov regression models using Cox's (1972) proportional hazards models for transition intensities between the states. The purpose of this research work is to study alternatives to Cox's model in a general finite-state Markov process setting. We shall look at two alternative models, Aalen's (1989) nonparametric additive hazards model and Lin & Ying's (1994) semiparametric additive hazards model. The former allows the effects of covariates to vary freely over time, while the latter assumes that the regression coefficients are constant over time. With the basic tools of the product integral and the functional delta-method, we present an estimator of the transition probability matrix and develop the large-sample theory for the estimator under each of these two models. Data on 1459 HLA identical sibling transplants for acute leukaemia from the International Bone Marrow Transplant Registry serve as illustration. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
283
301
http://hdl.handle.net/10.1093/biomet/92.2.283
text/html
Access to full text is restricted to subscribers.
Youyi Shu
John P. Klein
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:19-362013-03-04RePEc:oup:biomet
article
Efficient nonparametric estimation of causal effects in randomized trials with noncompliance
Causal approaches based on the potential outcome framework provide a useful tool for addressing noncompliance problems in randomized trials. We propose a new estimator of causal treatment effects in randomized clinical trials with noncompliance. We use the empirical likelihood approach to construct a profile random sieve likelihood and take into account the mixture structure in outcome distributions, so that our estimator is robust to parametric distribution assumptions and provides substantial finite-sample efficiency gains over the standard instrumental variable estimator. Our estimator is asymptotically equivalent to the standard instrumental variable estimator, and it can be applied to outcome variables with a continuous, ordinal or binary scale. We apply our method to data from a randomized trial of an intervention to improve the treatment of depression among depressed elderly patients in primary care practices. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
19
36
http://hdl.handle.net/10.1093/biomet/asn056
application/pdf
Access to full text is restricted to subscribers.
Jing Cheng
Dylan S. Small
Zhiqiang Tan
Thomas R. Ten Have
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:647-6662013-03-04RePEc:oup:biomet
article
The accelerated gap times model
This paper develops a new semiparametric model for the effect of covariates on the conditional intensity of a recurrent event counting process. The model is a transparent extension of the accelerated failure time model for univariate survival data. Estimation of the regression parameter is motivated by semiparametric efficiency considerations, extending the class of weighted log-rank estimating functions originally proposed in Prentice (1978) and subsequently studied in detail by Tsiatis (1990) and Ritov (1990). A novel rank-based one-step estimator for the regression parameter is proposed. An Aalen-type estimator for the baseline intensity function is obtained. Asymptotics are handled with empirical process methods, and finite sample properties are studied via simulation. Finally, the new model is applied to the bladder tumour data of Byar (1980). Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
647
666
http://hdl.handle.net/10.1093/biomet/92.3.647
text/html
Access to full text is restricted to subscribers.
Robert L. Strawderman
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:747-7622013-03-04RePEc:oup:biomet
article
Censored linear regression for case-cohort studies
Right-censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design this subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analysed. These methods are derived by modifying the linear rank tests and estimating equations that arise from full-cohort data using methods that are similar to the pseudolikelihood estimating equation that has been used in relative risk regression for these models. The estimators so obtained are shown to be consistent and asymptotically normal. Variance estimation and numerical illustrations are also provided. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
747
762
http://hdl.handle.net/10.1093/biomet/93.4.747
text/html
Access to full text is restricted to subscribers.
Bin Nan
Menggang Yu
John D. Kalbfleisch
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:153-1652013-03-04RePEc:oup:biomet
article
Modelling the effects of partially observed covariates on Poisson process intensity
We propose an estimating function for parameters in a model for Poisson process intensity when time- or space-varying covariates are observed for both the events of the process and at sample times or locations selected from a probability-based sampling design. We investigate the large-sample properties of the proposed estimator under increasing domain asymptotics, demonstrating that it is consistent and asymptotically normally distributed. We illustrate our approach using data from an ecological momentary assessment of smoking. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
153
165
http://hdl.handle.net/10.1093/biomet/asm009
application/pdf
Access to full text is restricted to subscribers.
Stephen L. Rathbun
Saul Shiffman
Chad J. Gwaltney
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:585-5962013-03-04RePEc:oup:biomet
article
Using logistic regression procedures for estimating receiver operating characteristic curves
Estimation of a receiver operating characteristic, ROC, curve is usually based either on a fully parametric model such as a normal model or on a fully nonparametric model. In this paper, we explore a semiparametric approach by assuming a density ratio model for disease and disease-free densities. This model has a natural connection with the logistic regression model. The proposed semiparametric approach is more robust than a fully parametric approach and is more efficient than a fully nonparametric approach. Two real examples demonstrate that the ROC curve estimated by our semiparametric method is much smoother than that estimated by the nonparametric method. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
585
596
Jing Qin
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:225-2292013-03-04RePEc:oup:biomet
article
Optimal main effect plans with non-orthogonal blocking
The current literature on fractional factorial plans in block designs centres around orthogonal blocking, which may not, however, always be attainable because of practical restrictions on the block size. For general factorials, including asymmetric ones, sufficient conditions are indicated in this paper for a main effect plan to be universally optimal under possibly non-orthogonal blocking. A construction procedure is given using generalised Youden designs in conjunction with orthogonal arrays. We also illustrate how the procedure can be applied to obtain optimal main effect plans in the practically important situation where each factor has two or three levels and the block size is small. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
225
229
Rahul Mukerjee
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:953-9642013-03-04RePEc:oup:biomet
article
A Jackknife Variance Estimator for Unistage Stratified Samples with Unequal Probabilities
Existing jackknife variance estimators used with sample surveys can seriously overestimate the true variance under unistage stratified sampling without replacement with unequal probabilities. A novel jackknife variance estimator is proposed which is as numerically simple as existing jackknife variance estimators. Under certain regularity conditions, the proposed variance estimator is consistent under stratified sampling without replacement with unequal probabilities. The high entropy regularity condition necessary for consistency is shown to hold for the Rao-Sampford design. An empirical study of three unequal probability sampling designs supports our findings. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
953
964
http://hdl.handle.net/10.1093/biomet/asm072
application/pdf
Access to full text is restricted to subscribers.
Yves G. Berger
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:49-602013-03-04RePEc:oup:biomet
article
Optimal asymmetric one-sided group sequential tests
We extend the optimal symmetric group sequential tests of Eales & Jennison (1992) to the broader class of asymmetric designs. Two forms of asymmetry are considered, involving unequal type I and type II error rates and different emphases on expected sample sizes at the null and alternative hypotheses. We discuss the properties of our optimal designs and use them to assess the efficiency of the family of tests proposed by Pampallona & Tsiatis (1994) and two families of one-sided tests defined through error spending functions. We show that the error spending designs are highly efficient, while the easily implemented tests of Pampallona & Tsiatis are a little less efficient but still not far from optimal. Our results demonstrate that asymmetric designs can decrease the expected sample size under one hypothesis, but only at the expense of a significantly larger expected sample size under the other hypothesis. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
49
60
Stuart Barber
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:37-472013-03-04RePEc:oup:biomet
article
Graphical identifiability criteria for causal effects in studies with an unobserved treatment/response variable
We consider the problem of using data in studies with an unobserved treatment/response variable in order to evaluate average causal effects, when cause-effect relationships between variables can be described by a directed acyclic graph and the corresponding recursive factorization of a joint distribution. The paper proposes graphical criteria to test whether average causal effects are identifiable even if a treatment/response variable is unobserved. If the answer is affirmative, we provide further formulations for average causal effects from the observed data. The graphical criteria enable us to evaluate average causal effects when it is difficult to observe a treatment/response variable. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
37
47
http://hdl.handle.net/10.1093/biomet/asm005
application/pdf
Access to full text is restricted to subscribers.
Manabu Kuroki
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:559-5782013-03-04RePEc:oup:biomet
article
Fractional hot deck imputation
To compensate for item nonresponse, hot deck imputation procedures replace missing values with values that occur in the sample. Fractional hot deck imputation replaces each missing observation with a set of imputed values and assigns a weight to each imputed value. Under the model in which observations in an imputation cell are independently and identically distributed, fractional hot deck imputation is shown to be an effective imputation procedure. A consistent replication variance estimation procedure for estimators computed with fractional imputation is suggested. Simulations show that fractional imputation and the suggested variance estimator are superior to multiple imputation estimators in general, and much superior to multiple imputation for estimating the variance of a domain mean. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
559
578
Jae Kwang Kim
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:248-2482013-03-04RePEc:oup:biomet
article
A note on time-ordered classification
1
2009
96
Biometrika
248
248
http://hdl.handle.net/10.1093/biomet/asn065
application/pdf
Access to full text is restricted to subscribers.
H. He
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:605-6182013-03-04RePEc:oup:biomet
article
Semiparametric inference in observational duration-response studies, with duration possibly right-censored
Once treatment is found to be effective in clinical studies, attention often focuses on optimum or efficacious treatment delivery. In treatment duration-response studies, the optimum treatment delivery refers to the treatment length that optimises the mean response. In many studies, the treatment length is often left to the discretion of an attending investigator or physician but may be abruptly terminated because of treatment-terminating events. Thus, a recommended treatment length often delineates a 'treatment duration policy' which prescribes that treatment be given for a specified length of time or until a treatment-terminating event occurs, whichever comes first. Estimating a functional relationship between the response and a treatment duration policy, continuously in time, is the focus of this paper. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
605
618
http://hdl.handle.net/10.1093/biomet/92.3.605
text/html
Access to full text is restricted to subscribers.
Brent A. Johnson
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:613-6282013-03-04RePEc:oup:biomet
article
Large-sample properties of the periodogram estimator of seasonally persistent processes
Seasonally persistent models were first introduced by Anděl (1986) and Gray et al. (1989) to extend autoregressive moving-average and fractionally differenced models and to encompass long-memory quasi-periodic behaviour. These models are stationary for certain ranges of parameters, and we prove here that the behaviour of the periodogram and other tapered estimators cannot simply be extended from the work of Künsch (1986) and Hurvich & Beltrão (1993) on long memory induced by a pole at the origin. We demonstrate that potentially large bias, both positive and negative, can arise for the same value of the long-memory parameter, and that the new distribution can easily be written down in the case of Gaussian processes. We also consider using both the cosine taper and the sine taper. The extended least squares estimator is also considered in this context. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
613
628
Sofia C. Olhede
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:953-9662013-03-04RePEc:oup:biomet
article
Asymptotic distributions of principal components based on robust dispersions
Algebraically, principal components can be defined as the eigenvalues and eigenvectors of a covariance or correlation matrix, but they are statistically meaningful as successive projections of the multivariate data in the direction of maximal variability. An attractive alternative in robust principal component analysis is to replace the classical variability measure, i.e. variance, by a robust dispersion measure. This projection-pursuit approach was first proposed in Li & Chen (1985) as a method of constructing a robust scatter matrix. Recent unpublished work of C. Croux and A. Ruiz-Gazen provided the influence functions of the resulting principal components. The present paper focuses on the asymptotic distributions of robust principal components. In particular, we obtain the asymptotic normality of the principal components that maximise a robust dispersion measure. We also explain the need to use a dispersion functional with a continuous influence function. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
953
966
Hengjian Cui
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:75-922013-03-04RePEc:oup:biomet
article
Predicting future responses based on possibly mis-specified working models
Under a general regression setting, we propose an optimal unconditional prediction procedure for future responses. The resulting prediction intervals or regions have a desirable average coverage level over a set of covariate vectors of interest. When the working model is not correctly specified, the traditional conditional prediction method is generally invalid. On the other hand, one can empirically calibrate the above unconditional procedure and also obtain its crossvalidated counterpart. Various large and small sample properties of these unconditional methods are examined analytically and numerically. We find that the 𝒦-fold crossvalidated procedure performs exceptionally well even for cases with rather small sample sizes. The new proposals are illustrated with two real examples, one with a continuous response and the other with a binary outcome. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
75
92
http://hdl.handle.net/10.1093/biomet/asm078
application/pdf
Access to full text is restricted to subscribers.
Tianxi Cai
Lu Tian
Scott D. Solomon
L.J. Wei
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:705-7222013-03-04RePEc:oup:biomet
article
Empirical Bayes block shrinkage of wavelet coefficients via the noncentral χ² distribution
Empirical Bayes approaches to the shrinkage of empirical wavelet coefficients have generated considerable interest in recent years. Much of the work to date has focussed on shrinkage of individual wavelet coefficients in isolation. In this paper we propose an empirical Bayes approach to simultaneous shrinkage of wavelet coefficients in a block, based on the block sum of squares. Our approach exploits a useful identity satisfied by the noncentral χ² density and provides some tractable Bayesian block shrinkage procedures. Our numerical results indicate that the new procedures perform very well. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
705
722
http://hdl.handle.net/10.1093/biomet/93.3.705
text/html
Access to full text is restricted to subscribers.
Xue Wang
Andrew T. A. Wood
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:67-822013-03-04RePEc:oup:biomet
article
D-optimal design of split-split-plot experiments
In industrial experimentation, there is growing interest in studies that span more than one processing step. Convenience often dictates restrictions in randomization in passing from one processing step to another. When the study encompasses three processing steps, this leads to split-split-plot designs. We provide an algorithm for computing D-optimal split-split-plot designs and several illustrative examples. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
67
82
http://hdl.handle.net/10.1093/biomet/asn070
application/pdf
Access to full text is restricted to subscribers.
Bradley Jones
Peter Goos
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:351-3702013-03-04RePEc:oup:biomet
article
Conditional Akaike information for mixed-effects models
This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, cAIC. The penalty term in cAIC is related to the effective degrees of freedom ρ for a linear mixed model proposed by Hodges & Sargent (2001); ρ reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The cAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
351
370
http://hdl.handle.net/10.1093/biomet/92.2.351
text/html
Access to full text is restricted to subscribers.
Florin Vaida
Suzette Blanchard
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:127-1372013-03-04RePEc:oup:biomet
article
Implementing matching priors for frequentist inference
Nuisance parameters do not pose any problems in Bayesian inference, as marginalisation allows for study of the posterior distribution solely in terms of the parameter of interest. However, no general solution is available for removing nuisance parameters under the frequentist paradigm. In this paper, we merge the two approaches to construct a general procedure for frequentist elimination of nuisance parameters through the use of matching priors. In particular, we perform Bayesian marginalisation with respect to a prior distribution under which posterior inferences have approximate frequentist validity. Matching priors are constructed as solutions to a partial differential equation. Unfortunately, except in simple cases, these partial differential equations yield neither to analytical nor even to standard numerical methods of solution. We present a numerical/Monte Carlo algorithm for obtaining the matching prior, in general, as a solution to the appropriate partial differential equation and for drawing posterior inferences. To be specific, we develop an automated routine through an implementation of the Metropolis-Hastings algorithm for deriving frequentist valid inferences via the matching prior. We illustrate our results in the contexts of fitting random effects models, fitting logistic regression models and fitting teratological data by beta-binomial models. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
127
137
Richard A. Levine
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:992-9982013-03-04RePEc:oup:biomet
article
Use of the Gibbs Sampler to Obtain Conditional Tests, with Applications
A random sample is drawn from a distribution which admits a minimal sufficient statistic for the parameters. The Gibbs sampler is proposed to generate samples, called conditionally sufficient or co-sufficient samples, from the conditional distribution of the sample given its value of the sufficient statistic. The procedure is illustrated for the gamma distribution. Co-sufficient samples may be used to give exact tests of fit; for the gamma distribution these are compared for size and power with approximate tests based on the parametric bootstrap. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
992
998
http://hdl.handle.net/10.1093/biomet/asm065
application/pdf
Access to full text is restricted to subscribers.
Richard A. Lockhart
Federico J. O'Reilly
Michael A. Stephens
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:939-9522013-03-04RePEc:oup:biomet
article
A Hybrid Pairwise Likelihood Method
A modification to the pairwise likelihood method is proposed, which aims to improve the estimation of the marginal distribution parameters. This is achieved by replacing the pairwise likelihood score equations, for estimating such parameters, by the optimal linear combinations of the marginal score functions. A further advantage of the proposed estimator of marginal parameters, over pairwise likelihood, is that it is robust to misspecification of the bivariate distributions as long as the univariate marginal distributions are correctly specified. While alternating logistic regression can be seen as a special case of the proposed method, it is shown that an existing generalization of alternating logistic regression applicable to ordinal data is not the same as and is inferior to the proposed method because it replaces certain conditional densities by pseudodensities that assume working independence. The fitting of the multivariate negative binomial distribution is another scenario involving intractable likelihood that calls for the use of pairwise likelihood methods, and the superiority of the modified method is demonstrated in a simulation study. Two examples, based on the analyses of salamander mating and patient-controlled analgesia data, demonstrate the usefulness of the proposed method. The possibility of combining optimally the pairwise, rather than marginal, scores is also considered and its difficulty and potential are discussed. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
939
952
http://hdl.handle.net/10.1093/biomet/asm051
application/pdf
Access to full text is restricted to subscribers.
Anthony Y. C. Kuk
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:469-4792013-03-04RePEc:oup:biomet
article
Determining the dimension of the central subspace and central mean subspace
The central subspace and central mean subspace are two important targets of sufficient dimension reduction. We propose a weighted chi-squared test to determine their dimensions based on matrices whose column spaces are exactly equal to the central subspace or the central mean subspace. The asymptotic distribution of the test statistic is obtained. Simulation examples are used to demonstrate the performance of this test. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
469
479
http://hdl.handle.net/10.1093/biomet/asn002
application/pdf
Access to full text is restricted to subscribers.
Peng Zeng
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:927-9412013-03-04RePEc:oup:biomet
article
Modelling of covariance structures in generalised estimating equations for longitudinal data
When used for modelling longitudinal data, generalised estimating equations specify a working structure for the within-subject covariance matrices, aiming to produce efficient parameter estimators. However, misspecification of the working covariance structure may lead to a large loss of efficiency of the estimators of the mean parameters. In this paper we propose an approach for joint modelling of the mean and covariance structures of longitudinal data within the framework of generalised estimating equations. The resulting estimators for the mean and covariance parameters are shown to be consistent and asymptotically Normally distributed. Real data analysis and simulation studies show that the proposed approach yields efficient estimators for both the mean and covariance parameters. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
927
941
http://hdl.handle.net/10.1093/biomet/93.4.927
text/html
Access to full text is restricted to subscribers.
Huajun Ye
Jianxin Pan
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:905-9162013-03-04RePEc:oup:biomet
article
Semiparametric inference in matched case-control studies with missing covariate data
We consider the problem of matched studies with a binary outcome that are analysed using conditional logistic regression, and for which data on some covariates are missing for some study participants. Methods for this problem involve either modelling the distribution of missing covariates or modelling the probability of data being missing. For this second approach, the previously proposed method did not make use of data for those persons with missing covariate data except in the model for the missingness. We propose a new class of estimators that use outcome and available covariate data for all study participants, and show that a particular member of this class always has better efficiency than the previously proposed estimator. We illustrate the efficiency gains that are possible with our approach using simulated data. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
905
916
Paul J. Rathouz
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:37-502013-03-04RePEc:oup:biomet
article
Partial and latent ignorability in missing-data problems
When an assumption of missing at random is untenable, it becomes necessary to model missing-data indicators, which carry information about the parameters of the complete-data population. Within a given application, however, researchers may believe that some aspects of missingness are ignorable but others are not. We argue that there are two different ways to formalize the notion that only part of the missingness is ignorable. These approaches correspond to assumptions that we call partially missing at random and latently missing at random. We explain these concepts and apply them in a latent-class analysis of survey questions with item nonresponse. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
37
50
http://hdl.handle.net/10.1093/biomet/asn069
application/pdf
Access to full text is restricted to subscribers.
Ofer Harel
Joseph L. Schafer
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:977-9842013-03-04RePEc:oup:biomet
article
Miscellanea Kernel-Type Density Estimation on the Unit Interval
We consider kernel-type methods for the estimation of a density on [0, 1] which eschew explicit boundary correction. We propose using kernels that are symmetric in their two arguments; these kernels are conditional densities of bivariate copulas. We give asymptotic theory for the version of the new estimator using Gaussian copula kernels and report on simulation comparisons of it with the beta-kernel density estimator of Chen ([1]). We also provide automatic bandwidth selection in the form of 'rule-of-thumb' bandwidths for both estimators. As well as its competitive integrated squared error performance, advantages of the new approach include its greater range of possible values at 0 and 1, the fact that it is a bona fide density and that the individual kernels and resulting estimator are comprehensible in terms of a single simple picture. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
977
984
http://hdl.handle.net/10.1093/biomet/asm068
application/pdf
Access to full text is restricted to subscribers.
M.C. Jones
D.A. Henderson
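The record above describes kernels that are conditional densities of a bivariate copula. The sketch below shows this for the Gaussian copula. It assumes that the smoothing parameter is the copula correlation `rho`, where values closer to 1 give less smoothing. It does not reproduce the paper's bandwidth parametrisation or its rule-of-thumb selector. The function names are hypothetical. Each kernel c(x, X_i) is itself a density in x on (0, 1), so the estimate is a bona fide density without boundary correction. The midpoint-rule check at the end illustrates this.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def gaussian_copula_density(u, v, rho):
    """Density c(u, v) of the bivariate Gaussian copula with correlation rho."""
    a = _STD_NORMAL.inv_cdf(u)
    b = _STD_NORMAL.inv_cdf(v)
    one_minus_rho2 = 1.0 - rho * rho
    exponent = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus_rho2)
    return math.exp(exponent) / math.sqrt(one_minus_rho2)


def copula_kde(x, data, rho):
    """Average of the kernels c(x, X_i); x and every X_i must lie strictly inside (0, 1)."""
    return sum(gaussian_copula_density(x, xi, rho) for xi in data) / len(data)


sample = [0.2, 0.35, 0.5, 0.6, 0.8]
grid_size = 2000
# Midpoint-rule check that the estimate integrates to (about) 1 over the unit interval.
total = sum(copula_kde((k + 0.5) / grid_size, sample, rho=0.8)
            for k in range(grid_size)) / grid_size
```

At `rho = 0` every kernel equals 1, so the estimate is the uniform density. This is the oversmoothed limit.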
oai:RePEc:oup:biomet:v:96:y:2009:i:1:p:237-2422013-03-04RePEc:oup:biomet
article
A note on cause-specific residual life
In medical research, investigators often wish to characterize the distributions of remaining lifetimes. While nonparametric analyses of residual life distributions have been widely studied with independently right-censored data, residual life analysis has not been examined in the competing risks setting, with multiple, potentially dependent, failure types. We define the cause-specific residual life distribution as the residual cumulative incidence function conditionally on survival to a given time. Because of the improper form of the cause-specific distribution, the mean cause-specific residual lifetime does not exist, theoretically. We develop nonparametric inferences for the cause-specific residual life function and its corresponding quantiles, which may exist. Theoretical justification, including uniform consistency and weak convergence, is established. Simulation studies and a breast cancer data analysis demonstrate the practical utility of the methods. Copyright 2009, Oxford University Press.
1
2009
96
Biometrika
237
242
http://hdl.handle.net/10.1093/biomet/asn063
application/pdf
Access to full text is restricted to subscribers.
J.-H. Jeong
J. P. Fine
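The record above defines the cause-specific residual life distribution as a residual cumulative incidence conditional on survival. The sketch below assumes a plug-in form of that definition, (F_k(t0 + s) - F_k(t0)) / S(t0). Here F_k is estimated by the Aalen-Johansen cumulative incidence and S by the overall Kaplan-Meier survival. A quantile is taken as the smallest s at which this reaches τ. F_k is improper, so the quantile is reported as `None` when τ exceeds the limiting value. The paper's own estimator and its variance theory are not reproduced here, and all names are hypothetical.

```python
def cumulative_incidence(times, causes, cause):
    """Aalen-Johansen estimate of the cumulative incidence of one failure type.

    causes[i] == 0 marks a right-censored time. Returns (t, S(t), F_cause(t))
    at each distinct observed time, where S is the overall Kaplan-Meier survival.
    """
    data = sorted(zip(times, causes))
    at_risk = len(data)
    surv, cif, steps = 1.0, 0.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        while j < len(data) and data[j][0] == t:
            j += 1
        tied = [c for _, c in data[i:j]]
        d_any = sum(1 for c in tied if c != 0)
        d_cause = sum(1 for c in tied if c == cause)
        cif += surv * d_cause / at_risk   # weight by S(t-) before updating it
        surv *= 1.0 - d_any / at_risk
        steps.append((t, surv, cif))
        at_risk -= j - i                  # events and censorings leave the risk set
        i = j
    return steps


def _step_value(steps, t):
    """Return (S(t), F(t)) from the right-continuous step estimates."""
    surv, cif = 1.0, 0.0
    for time, s, f in steps:
        if time > t:
            break
        surv, cif = s, f
    return surv, cif


def residual_cif(steps, t0, s):
    """Plug-in estimate of P(T <= t0 + s, failure of this cause | T > t0)."""
    surv0, cif0 = _step_value(steps, t0)
    _, cif1 = _step_value(steps, t0 + s)
    return (cif1 - cif0) / surv0


def residual_quantile(steps, t0, tau):
    """Smallest s with residual_cif >= tau, or None if the improper curve never gets there."""
    for time, _, _ in steps:
        if time > t0 and residual_cif(steps, t0, time - t0) >= tau:
            return time - t0
    return None


times = [1, 2, 3, 4, 5, 6]
causes = [1, 2, 1, 1, 2, 1]   # two failure types, no censoring in this toy example
steps = cumulative_incidence(times, causes, cause=1)
```

Without censoring the estimate reduces to a simple proportion. Of the four subjects still alive after t0 = 2, three eventually fail from cause 1. So the residual curve levels off at 0.75, and quantiles above that level do not exist.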
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:514-5202013-03-04RePEc:oup:biomet
article
A new class of average moment matching priors
We derive a new class of priors for the variance component in the Fay--Herriot model, a mixed regression model widely used in small area estimation. This class includes the well-known uniform or superharmonic prior. Through simulation we illustrate the use of our class of priors. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
514
520
http://hdl.handle.net/10.1093/biomet/asn008
application/pdf
Access to full text is restricted to subscribers.
N. Ganesh
P. Lahiri
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:785-8062013-03-04RePEc:oup:biomet
article
Bayesian model discrimination for multiple strata capture-recapture data
Extending the work of Dupuis (1995), we motivate a range of biologically plausible models for multiple-site capture-recapture and show how the original Gibbs sampling algorithm of Dupuis can be extended to obtain posterior model probabilities using reversible jump Markov chain Monte Carlo. This model selection procedure improves upon previous analyses in two distinct ways. First, Bayesian model averaging provides a robust parameter estimation technique which properly incorporates model uncertainty in the resulting intervals. Secondly, by discriminating among perhaps millions of competing models, we are able to discern fine structure within the data and thereby answer questions of primary biological importance. We demonstrate how reversible jump Markov chain Monte Carlo methods provide the only viable method for exploring model spaces of this size. We examine the lizard data discussed in Dupuis (1995) and show that most of the posterior mass is placed upon models not previously considered for these data. We discuss model discrimination and model averaging and focus upon the increased scientific understanding of the data obtained via the Bayesian model comparison procedure. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
785
806
R. King
oai:RePEc:oup:biomet:v:92:y:2005:i:2:p:451-4642013-03-04RePEc:oup:biomet
article
Monte Carlo conditioning on a sufficient statistic
In this paper we derive general formulae suitable for Monte Carlo computation of conditional expectations of functions of a random vector given a sufficient statistic. The problem of direct sampling from the conditional distribution is considered in particular. It is shown that this can be done by a simple parameter adjustment of the original statistical model, provided the model has a certain pivotal structure. A connection with a classical problem regarding fiducial and posterior distributions is pointed out. Copyright 2005, Oxford University Press.
2
2005
92
June
Biometrika
451
464
http://hdl.handle.net/10.1093/biomet/92.2.451
text/html
Access to full text is restricted to subscribers.
Bo Henry Lindqvist
Gunnar Taraldsen
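The parameter-adjustment idea in the record above can be shown in its simplest pivotal case, an exponential scale family. In this model X_i = θU_i with U_i ~ Exp(1), and the sufficient statistic is the sum. The sketch draws U and solves for the parameter value that reproduces the observed sum. It then returns θ̂U. In this case the draw is exact, because normalised exponentials are uniform on the simplex independently of their sum. The general weighting needed outside such pivotal structures is not shown, and the function name is hypothetical.

```python
import random


def conditional_exponential_sample(t, n, rng):
    """Draw X_1..X_n iid exponential conditioned on X_1 + ... + X_n = t.

    Pivotal structure: X_i = theta * U_i with U_i ~ Exp(1). Choosing
    theta_hat = t / sum(U) makes the sample reproduce the sufficient statistic.
    """
    u = [rng.expovariate(1.0) for _ in range(n)]
    theta_hat = t / sum(u)
    return [theta_hat * u_i for u_i in u]


rng = random.Random(2005)
draws = [conditional_exponential_sample(10.0, 5, rng) for _ in range(20000)]
# Monte Carlo estimate of E(X_1 | sum = 10); by symmetry the exact value is 10 / 5 = 2.
mean_x1 = sum(d[0] for d in draws) / len(draws)
```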
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:809-8302013-03-04RePEc:oup:biomet
article
Efficient estimation of covariance selection models
A Bayesian method is proposed for estimating an inverse covariance matrix from Gaussian data. The method is based on a prior that allows the off-diagonal elements of the inverse covariance matrix to be zero, and in many applications results in a parsimonious parameterisation of the covariance matrix. No assumption is made about the structure of the corresponding graphical model, so the method applies to both nondecomposable and decomposable graphs. All the parameters are estimated by model averaging using an efficient Metropolis--Hastings sampling scheme. A simulation study demonstrates that the method produces statistically efficient estimators of the covariance matrix, when the inverse covariance matrix is sparse. The methodology is illustrated by applying it to three examples that are high-dimensional relative to the sample size. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
809
830
Frederick Wong
oai:RePEc:oup:biomet:v:93:y:2006:i:4:p:1025-10262013-03-04RePEc:oup:biomet
article
Amendments and Corrections
Because of a term omitted from a calculation in the Appendix, the variance formulae in the paper should be adjusted. In particular, the constants in the numerators of equations (2·4) and (2·15) should be 6 rather than 18. Variances are, however, still higher than in the case of least-squares estimators. The changes are implied by the following corrections to the Appendix. On p. 423, $2c\delta\Delta'_{\cos}(\omega^{(k)})$ should be included within braces on lines 11 and 17, and $2c\delta\Delta'_{\sin}(\omega^{(k)})$ should be added within braces on lines 12 and 18, leading to the extra term $2cm^{-3/2}\{\Delta'_{\sin}(\omega^{(k)})\Gamma_{\sin}^{(k)} + \Delta'_{\cos}(\omega^{(k)})\Gamma_{\cos}^{(k)}\}$ on line 21. We are grateful to Barry Quinn for drawing our attention to this error. Copyright 2006, Oxford University Press.
4
2006
93
December
Biometrika
1025
1026
http://hdl.handle.net/10.1093/biomet/93.4.1025-b
text/html
Access to full text is restricted to subscribers.
Peter Hall
Ming Li
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:539-5532013-03-04RePEc:oup:biomet
article
A new approach to weighting and inference in sample surveys
The validity of design-based inference is not dependent on any model assumption. However, it is well known that estimators derived through design-based theory may be inefficient for the estimation of population totals when the design weights are weakly related to the variables of interest and have widely dispersed values. We propose estimators that have the potential to improve the efficiency of any estimator derived under the design-based theory. Our main focus is limited to the improvement of the Horvitz--Thompson estimator, but we also discuss the extension to calibration estimators. The new estimators are obtained by smoothing design or calibration weights using an appropriate model. Our approach to inference requires the modelling of only one variable, the weight, and it leads to a single set of smoothed weights in multipurpose surveys. This is to be contrasted with other model-based approaches, such as the prediction approach, in which it is necessary to postulate and validate a model for each variable of interest leading potentially to variable-specific sets of weights. Our proposed approach is first justified theoretically and then evaluated through a simulation study. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
539
553
http://hdl.handle.net/10.1093/biomet/asn028
application/pdf
Access to full text is restricted to subscribers.
Jean-François Beaumont
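The record above obtains new estimators by smoothing design weights with a model for the weight. The sketch below illustrates that idea next to the Horvitz-Thompson total. It assumes a linear working model for the weight given a single survey variable y. That model is an illustrative choice, not the paper's. The paper's models, its extension to calibration weights and its variance estimation are not reproduced, and the data are hypothetical. Fitted values from a least-squares line with an intercept keep the sum of the weights unchanged. They can turn negative, however, which a practical weight model would need to prevent.

```python
def horvitz_thompson_total(y, w):
    """Design-weighted estimate of a population total: sum of w_i * y_i."""
    return sum(w_i * y_i for w_i, y_i in zip(w, y))


def smoothed_weights(y, w):
    """Replace each design weight by its fitted value under a linear model w ~ a + b*y.

    The linear working model is an illustrative assumption, not the paper's model.
    """
    n = len(y)
    y_bar = sum(y) / n
    w_bar = sum(w) / n
    s_yy = sum((y_i - y_bar) ** 2 for y_i in y)
    s_yw = sum((y_i - y_bar) * (w_i - w_bar) for y_i, w_i in zip(y, w))
    slope = s_yw / s_yy
    intercept = w_bar - slope * y_bar
    return [intercept + slope * y_i for y_i in y]


y = [12.0, 15.0, 9.0, 20.0, 14.0]
w = [40.0, 10.0, 55.0, 12.0, 33.0]   # hypothetical, widely dispersed design weights
ht = horvitz_thompson_total(y, w)
smoothed = smoothed_weights(y, w)
ht_smoothed = horvitz_thompson_total(y, smoothed)
```

One set of smoothed weights can then be reused for every variable of interest. This reflects the multipurpose-survey point made in the abstract.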
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:17-332013-03-04RePEc:oup:biomet
article
Distortion of effects caused by indirect confounding
Undetected confounding may severely distort the effect of an explanatory variable on a response variable, as defined by a stepwise data-generating process. The best known type of distortion, which we call direct confounding, arises from an unobserved explanatory variable common to a response and its main explanatory variable of interest. It is relevant mainly for observational studies, since it is avoided by successful randomization. By contrast, indirect confounding, which we identify in this paper, is an issue also for intervention studies. For general stepwise-generating processes, we provide matrix and graphical criteria to decide which types of distortion may be present, when they are absent and how they are avoided. We then turn to linear systems without other types of distortion, but with indirect confounding. For such systems, the magnitude of distortion in a least-squares regression coefficient is derived and shown to be estimable, so that it becomes possible to recover the effect of the generating process from the distorted coefficient. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
17
33
http://hdl.handle.net/10.1093/biomet/asm092
application/pdf
Access to full text is restricted to subscribers.
Nanny Wermuth
D. R. Cox
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:603-6132013-03-04RePEc:oup:biomet
article
Sparse sufficient dimension reduction
Existing sufficient dimension reduction methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors, so that it is difficult to interpret the resulting estimates. We propose a unified estimation strategy, which combines a regression-type formulation of sufficient dimension reduction methods and shrinkage estimation, to produce sparse and accurate solutions. The method can be applied to most existing sufficient dimension reduction methods such as sliced inverse regression, sliced average variance estimation and principal Hessian directions. We demonstrate the effectiveness of the proposed method by both simulations and real data analysis. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
603
613
http://hdl.handle.net/10.1093/biomet/asm044
application/pdf
Access to full text is restricted to subscribers.
Lexin Li
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:425-4462013-03-04RePEc:oup:biomet
article
Bayes linear kinematics and Bayes linear Bayes graphical models
Probability kinematics (Jeffrey, 1965, 1983) furnishes a method for revising a prior probability specification based upon new probabilities over a partition. We develop a corresponding Bayes linear kinematic for a Bayes linear analysis given information which changes our beliefs about a random vector in some generalised way. We derive necessary and sufficient conditions for commutativity of successive Bayes linear kinematics which depend upon the eigenstructure of the joint kinematic resolution transform. As an application we introduce the Bayes linear Bayes graphical model, which is a mixture of fully Bayesian and Bayes linear graphical models, combining the simplicity of Gaussian graphical models with the ability to allow conditioning on marginal distributions of any form, and exploit Bayes linear kinematics to embed full conditional updates within Bayes linear belief adjustments. The theory is illustrated with a treatment of partition testing for software reliability. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
425
446
Michael Goldstein
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:489-5072013-03-04RePEc:oup:biomet
article
Diagnostic measures for empirical likelihood of general estimating equations
We develop diagnostic measures for assessing the influence of individual observations when using empirical likelihood with general estimating equations, and we use these measures to construct goodness-of-fit statistics for testing possible misspecification in the estimating equations. Our diagnostics include case-deletion measures, local influence measures and pseudo-residuals. Our goodness-of-fit statistics include the sum of local influence measures and the processes of pseudo-residuals. Simulation studies are conducted to evaluate our methods, and real datasets are analyzed to illustrate the use of our diagnostic measures and goodness-of-fit statistics. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
489
507
http://hdl.handle.net/10.1093/biomet/asm094
application/pdf
Access to full text is restricted to subscribers.
Hongtu Zhu
Joseph G. Ibrahim
Niansheng Tang
Heping Zhang
oai:RePEc:oup:biomet:v:89:y:2002:i:3:p:635-6482013-03-04RePEc:oup:biomet
article
Comparing nonnested Cox models
We derive the limiting distribution of the partial likelihood ratio under general conditions. The multiplicative hazards models being fitted may be nonnested and misspecified. The true model is not assumed to contain either model under consideration. The null hypothesis is that the models are equidistant in Kullback--Leibler metric applied to the rank likelihood. The statistic is consistent for the model which is closer to the truth. Its distribution depends on the unknown data-generating mechanism. A sequential testing procedure is proposed for nonnested comparisons which is valid regardless of the true model. This involves a novel statistic for the equality of the fitted models which is separate from the partial likelihood. The methodology has important applications in model assessment. Simulations and a real example demonstrate its utility in selecting the functional forms of covariates and relative risks. Copyright Biometrika Trust 2002, Oxford University Press.
3
2002
89
August
Biometrika
635
648
J. P. Fine
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:345-3582013-03-04RePEc:oup:biomet
article
Empirical likelihood-based inference in linear errors-in-covariables models with validation data
Linear errors-in-covariables models are considered, assuming the availability of independent validation data on the covariables in addition to primary data on the response variable and surrogate covariables. We first develop an estimated empirical loglikelihood with the help of validation data and prove that its asymptotic distribution is that of a weighted sum of independent standard $\chi^2_1$ random variables with unknown weights. By estimating the unknown weights consistently, we construct an estimated empirical likelihood confidence region for the regression parameter vector. We also suggest an adjusted empirical loglikelihood and prove that its asymptotic distribution is a standard $\chi^2$. To avoid estimating the unknown weights or the adjustment factor, we propose a partially smoothed bootstrap empirical loglikelihood for constructing a confidence region which has asymptotically correct coverage probability. A simulation study is conducted to compare the proposed methods with a method based on a normal approximation in terms of coverage accuracy and average length of the confidence interval. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
345
358
Qihua Wang
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:992-9962013-03-04RePEc:oup:biomet
article
On the consequences of overstratification
It is common, in particular in observational studies in epidemiology, to impose stratification to adjust for possible effects of age and other variables on the binary outcome of interest. Overstratification may lower the precision of the estimated effects of interest. Understratification risks bias. These issues are studied analytically. Asymptotic results show that loss of efficiency depends on the true effect and on a measure of the average imbalance across strata between exposed and unexposed individuals. Bias depends on the correlation between stratum-specific size imbalances and event rates in the unexposed. Approximate results are also given. An illustrative example is given. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
992
996
http://hdl.handle.net/10.1093/biomet/asn039
application/pdf
Access to full text is restricted to subscribers.
B. L. De Stavola
D. R. Cox
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:559-5712013-03-04RePEc:oup:biomet
article
Locally-efficient robust estimation of haplotype-disease association in family-based studies
Modelling human genetic variation is critical to understanding the genetic basis of complex disease. The Human Genome Project has discovered millions of binary DNA sequence variants, called single nucleotide polymorphisms, and millions more may exist. As coding for proteins takes place along chromosomes, organisation of polymorphisms along each chromosome, the haplotype phase structure, may prove to be most important in discovering genetic variants associated with disease. As haplotype phase is often uncertain, procedures that model the distribution of parental haplotypes can, if this distribution is misspecified, lead to substantial bias in parameter estimates even when complete genotype information is available. Using a geometric approach to estimation in the presence of nuisance parameters, we address this problem and develop locally-efficient estimators of the effect of haplotypes on disease that are robust to incorrect estimates of haplotype frequencies. The methods are demonstrated with a simulation study of a case-parent design. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
559
571
http://hdl.handle.net/10.1093/biomet/92.3.559
text/html
Access to full text is restricted to subscribers.
Andrew S. Allen
Glen A. Satten
Anastasios A. Tsiatis
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:491-5072013-03-04RePEc:oup:biomet
article
Adaptive linear step-up procedures that control the false discovery rate
The linear step-up multiple testing procedure controls the false discovery rate at the desired level q for independent and positively dependent test statistics. When all null hypotheses are true, and the test statistics are independent and continuous, the bound is sharp. When some of the null hypotheses are not true, the procedure is conservative by a factor which is the proportion $m_0/m$ of the true null hypotheses among the hypotheses. We provide a new two-stage procedure in which the linear step-up procedure is used in stage one to estimate $m_0$, providing a new level q′ which is used in the linear step-up procedure in the second stage. We prove that a general form of the two-stage procedure controls the false discovery rate at the desired level q. This framework enables us to study analytically the properties of other procedures that exist in the literature. A simulation study is presented that shows that two-stage adaptive procedures improve in power over the original procedure, mainly because they provide tighter control of the false discovery rate. We further study the performance of the current suggestions, some variations of the procedures, and previous suggestions, in the case where the test statistics are positively dependent, a case for which the original procedure controls the false discovery rate. In the setting studied here the newly proposed two-stage procedure is the only one that controls the false discovery rate. The procedures are illustrated with two examples of biological importance. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
491
507
http://hdl.handle.net/10.1093/biomet/93.3.491
text/html
Access to full text is restricted to subscribers.
Yoav Benjamini
Abba M. Krieger
Daniel Yekutieli
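The two-stage procedure in the record above can be sketched as follows. Stage one runs the linear step-up procedure at q/(1 + q) and estimates $m_0$ as m minus the number of rejections. Stage two reruns the procedure at the inflated level q/(1 + q) · m/$m_0$, which plays the role of the abstract's q′. The specific choice q/(1 + q) for stage one is our reading of the paper's recommended version and should be treated as an assumption. Function names are hypothetical. Each function returns a count r, and the hypotheses with the r smallest p-values are rejected.

```python
def linear_step_up(pvalues, q):
    """Number of rejections made by the linear step-up (Benjamini-Hochberg) procedure."""
    m = len(pvalues)
    rejections = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= i * q / m:
            rejections = i  # step-up: keep the largest rank that passes
    return rejections


def two_stage_step_up(pvalues, q):
    """Two-stage adaptive sketch: stage one estimates m0, stage two reuses it."""
    m = len(pvalues)
    q_stage1 = q / (1.0 + q)
    r1 = linear_step_up(pvalues, q_stage1)
    if r1 == 0:
        return 0          # nothing rejected at stage one: stop
    if r1 == m:
        return m          # everything rejected at stage one: stop
    m0_hat = m - r1
    return linear_step_up(pvalues, q_stage1 * m / m0_hat)


pvals = [0.001, 0.002, 0.003, 0.004, 0.005, 0.035, 0.5, 0.7, 0.9, 0.95]
```

With q = 0.05 the one-stage procedure rejects five hypotheses. The adaptive version estimates $m_0$ = 5 and also rejects the sixth (p = 0.035). This is the power gain the abstract attributes to tighter control of the false discovery rate.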
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:231-2422013-03-04RePEc:oup:biomet
article
Optimal sufficient dimension reduction for the conditional mean in multivariate regression
The aim of this article is to develop optimal sufficient dimension reduction methodology for the conditional mean in multivariate regression. The context is roughly the same as that of a related method by Cook & Setodji (2003), but the new method has several advantages. It is asymptotically optimal in the sense described herein and its test statistic for dimension always has a chi-squared distribution asymptotically under the null hypothesis. Additionally, the optimal method allows tests of predictor effects. A comparison of the two methods is provided. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
231
242
http://hdl.handle.net/10.1093/biomet/asm003
application/pdf
Access to full text is restricted to subscribers.
Jae Keun Yoo
R. Dennis Cook
oai:RePEc:oup:biomet:v:95:y:2008:i:2:p:509-5132013-03-04RePEc:oup:biomet
article
A note on deletion diagnostics for estimating equations
We describe an algorithm based upon the Sherman--Morrison--Woodbury formula for the inversion of matrices with special structure that occur in formulae for deletion diagnostics. Substantial computational savings relative to a method based upon Cholesky's decomposition are illustrated. The result has broad application to regression diagnostics for clustered data. Copyright 2008, Oxford University Press.
2
2008
95
Biometrika
509
513
http://hdl.handle.net/10.1093/biomet/asn019
application/pdf
Access to full text is restricted to subscribers.
John S. Preisser
Bahjat F. Qaqish
Jamie Perin
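The record above uses the Sherman-Morrison-Woodbury formula to avoid refactorising a matrix for every deleted case. The sketch below shows the rank-one case: ordinary least squares with one observation per cluster, where $(A - xx^{\mathrm{T}})^{-1} = A^{-1} + A^{-1}xx^{\mathrm{T}}A^{-1}/(1 - x^{\mathrm{T}}A^{-1}x)$. For a cluster of k observations, the Woodbury form replaces the scalar denominator with a k × k inverse. The 2 × 2 setting, the data and the helper names are hypothetical illustrations, not the paper's algorithm.

```python
def mat_vec(a, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def inverse_2x2(m):
    """Direct inverse of a 2x2 matrix (used once for the full-data matrix)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def downdate_inverse(a_inv, x):
    """(A - x x^T)^{-1} from A^{-1} by Sherman-Morrison, for symmetric A.

    Requires x^T A^{-1} x != 1 (leverage below one in the regression setting).
    """
    ax = mat_vec(a_inv, x)                       # A^{-1} x
    denom = 1.0 - sum(x_i * ax_i for x_i, ax_i in zip(x, ax))
    p = len(x)
    # A^{-1} x x^T A^{-1} = (A^{-1} x)(A^{-1} x)^T because A^{-1} is symmetric.
    return [[a_inv[i][j] + ax[i] * ax[j] / denom for j in range(p)] for i in range(p)]


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # intercept and one covariate
y = [1.0, 2.0, 2.0, 4.0]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
xty = [sum(r[i] * y_k for r, y_k in zip(X, y)) for i in range(2)]
xtx_inv = inverse_2x2(xtx)


def leave_one_out_beta(i):
    """Least-squares coefficients with case i deleted, reusing the full-data inverse."""
    x_i = X[i]
    rhs = [xty[k] - x_i[k] * y[i] for k in range(2)]
    return mat_vec(downdate_inverse(xtx_inv, x_i), rhs)
```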
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:119-1342013-03-04RePEc:oup:biomet
article
Constrained local likelihood estimators for semiparametric skew-normal distributions
A local likelihood estimator for a nonparametric nuisance function is proposed in the context of semiparametric skew-normal distributions. Constraints imposed on such functions lead to a nonparametric estimator whose target function for maximization differs from that of classical local likelihood estimators. The optimal asymptotic semiparametric efficiency bound on parameters of interest is achieved by using this estimator in conjunction with an estimating equation formed by summing efficient scores. A generalized profile likelihood approach is also proposed. This method has the advantage of providing a unique estimate in cases where an estimating equation has multiple solutions. Our nonparametric estimator of the nuisance function leads to an estimator of the semiparametric skew-normal density. Both the estimating equation and profile likelihood approaches are applicable to more general skew-symmetric distributions. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
119
134
http://hdl.handle.net/10.1093/biomet/asm020
application/pdf
Access to full text is restricted to subscribers.
Yanyuan Ma
Jeffrey D. Hart
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:821-8302013-03-04RePEc:oup:biomet
article
Forward adaptive banding for estimating large covariance matrices
We propose a simple forward adaptive banding method for estimating large covariance matrices using the modified Cholesky decomposition. Owing to the adaptive banding structure, this approach requires fitting only a prespecified set of models and can be implemented efficiently. Besides the method's computational attractiveness, we propose a novel Bayes information criterion that gives consistent model selection for estimating high-dimensional covariance matrices. The method compares favourably with its competitors in a simulation study. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
821
830
http://hdl.handle.net/10.1093/biomet/asr045
application/pdf
Access to full text is restricted to subscribers.
Chenlei Leng
Bo Li
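The banded modified Cholesky decomposition that the record above builds on works by regression. Each variable is regressed on at most its k immediate predecessors. The coefficients fill a unit lower-triangular T, the residual variances form a diagonal D, and the precision matrix is estimated by $T^{\mathrm{T}}D^{-1}T$. This sketch takes one fixed bandwidth k as given. It does not reproduce the paper's forward selection of row-specific bandwidths or its proposed Bayes information criterion. Passing a sample covariance as `sigma` is equivalent to least-squares regressions on centred data. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def banded_modified_cholesky(sigma, k):
    """Regress each variable on at most its k immediate predecessors.

    Returns (phi, d): phi[j] maps predecessor index -> coefficient and d[j] is the
    innovation variance, so that T sigma T^T is approximately diag(d) with T = I - phi.
    """
    phi, d = [], []
    for j in range(len(sigma)):
        prev = list(range(max(0, j - k), j))
        if not prev:
            phi.append({})
            d.append(sigma[j][j])
            continue
        coef = solve([[sigma[r][c] for c in prev] for r in prev], [sigma[r][j] for r in prev])
        phi.append(dict(zip(prev, coef)))
        d.append(sigma[j][j] - sum(cf * sigma[r][j] for cf, r in zip(coef, prev)))
    return phi, d


def precision_from_cholesky(phi, d):
    """Precision estimate T^T D^{-1} T built from the banded regressions."""
    p = len(d)
    t = [[1.0 if r == c else 0.0 for c in range(p)] for r in range(p)]
    for j, row in enumerate(phi):
        for r, cf in row.items():
            t[j][r] = -cf
    return [[sum(t[l][r] * t[l][c] / d[l] for l in range(p)) for c in range(p)]
            for r in range(p)]


# AR(1) covariance: the exact precision matrix is tridiagonal, so bandwidth 1 suffices.
sigma = [[0.5 ** abs(i - j) for j in range(5)] for i in range(5)]
phi, d = banded_modified_cholesky(sigma, 1)
prec = precision_from_cholesky(phi, d)
```

For this AR(1) example, widening the band to k = 2 leaves the extra coefficients at zero. That is the kind of redundancy an adaptive choice of bandwidth is meant to avoid.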
oai:RePEc:oup:biomet:v:92:y:2005:i:3:p:633-6462013-03-04RePEc:oup:biomet
article
Bayesian adaptive designs for clinical trials
A Bayesian adaptive design is proposed for a comparative two-armed clinical trial using decision-theoretic approaches. A loss function is specified, based on the cost for each patient and the costs of making incorrect decisions at the end of a trial. At each interim analysis, the decision to terminate or to continue the trial is based on the expected loss function while concurrently incorporating efficacy, futility and cost. The maximum number of interim analyses is determined adaptively by the observed data. We derive explicit connections between the loss function and the frequentist error rates, so that the desired frequentist properties can be maintained for regulatory settings. The operating characteristics of the design can be evaluated on frequentist grounds. Extensive simulations are carried out to compare the proposed design with existing ones. The design is general enough to accommodate both continuous and discrete types of data. We illustrate the methods with an animal study evaluating a medical treatment for cardiac arrest. Copyright 2005, Oxford University Press.
3
2005
92
September
Biometrika
633
646
http://hdl.handle.net/10.1093/biomet/92.3.633
text/html
Access to full text is restricted to subscribers.
Yi Cheng
Yu Shen
oai:RePEc:oup:biomet:v:94:y:2007:i:1:p:185-1982013-03-04RePEc:oup:biomet
article
Partially linear models with missing response variables and error-prone covariates
We consider partially linear models of the form $Y = X^{\mathrm{T}}\beta + \nu(Z) + \varepsilon$ when the response variable Y is sometimes missing with missingness probability π depending on (X, Z), and the covariate X is measured with error, where ν(z) is an unspecified smooth function. The missingness structure is therefore missing not at random, rather than the usual missing at random. We propose a class of semiparametric estimators for the parameter of interest β, as well as for the population mean E(Y). The resulting estimators are shown to be consistent and asymptotically normal under general assumptions. To construct a confidence region for β, we also propose an empirical-likelihood-based statistic, which is shown to have a chi-squared distribution asymptotically. The proposed methods are applied to an AIDS clinical trial dataset. A simulation study is also reported. Copyright 2007, Oxford University Press.
1
2007
94
Biometrika
185
198
http://hdl.handle.net/10.1093/biomet/asm010
application/pdf
Access to full text is restricted to subscribers.
Hua Liang
Suojin Wang
Raymond J. Carroll
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:851-8602013-03-04RePEc:oup:biomet
article
A practical affine equivariant multivariate median
A robust affine equivariant estimator of location for multivariate data is proposed that reduces to the univariate median for one-dimensional data. The estimator is robust in the sense that it has a bounded influence function and a positive breakdown value, and it has high efficiency compared to the sample mean for heavy-tailed distributions. Perhaps its greatest strength is that, unlike other affine equivariant multivariate medians, it is easily computed for data in any practical dimension. Copyright Biometrika Trust 2002, Oxford University Press.
4
2002
89
December
Biometrika
851
860
Thomas P. Hettmansperger
oai:RePEc:oup:biomet:v:98:y:2011:i:3:p:685-7002013-03-04RePEc:oup:biomet
article
Conditional Akaike information under generalized linear and proportional hazards mixed models
We study model selection for clustered data, when the focus is on cluster-specific inference. Such data are often modelled using random effects, and conditional Akaike information was proposed in Vaida & Blanchard (2005) and used to derive an information criterion under linear mixed models. Here we extend the approach to generalized linear and proportional hazards mixed models. Outside normal linear mixed models, exact calculations are not available and we resort to asymptotic approximations. In the presence of nuisance parameters, a profile conditional Akaike information is proposed. Bootstrap methods are considered for their potential advantage in finite samples. Simulations show that the performance of the bootstrap and analytic criteria is comparable, with the bootstrap demonstrating some advantages for larger cluster sizes. The proposed criteria are applied to two cancer datasets to select models when cluster-specific inference is of interest. Copyright 2011, Oxford University Press.
3
2011
98
Biometrika
685
700
http://hdl.handle.net/10.1093/biomet/asr023
application/pdf
Access to full text is restricted to subscribers.
M. C. Donohue
R. Overholser
R. Xu
F. Vaida
oai:RePEc:oup:biomet:v:94:y:2007:i:3:p:529-5422013-03-04RePEc:oup:biomet
article
Integrated likelihood functions for non-Bayesian inference
Consider a model with parameter θ = (ψ, λ), where ψ is the parameter of interest, and let L(ψ, λ) denote the likelihood function. One approach to likelihood inference for ψ is to use an integrated likelihood function, in which λ is eliminated from L(ψ, λ) by integrating with respect to a density function π(λ|ψ). The goal of this paper is to consider the problem of selecting π(λ|ψ) so that the resulting integrated likelihood function is useful for non-Bayesian likelihood inference. The desirable properties of an integrated likelihood function are analyzed and these suggest that π(λ|ψ) should be chosen by finding a nuisance parameter ϕ that is unrelated to ψ and then taking the prior density for ϕ to be independent of ψ. Such an unrelated parameter is constructed and the resulting integrated likelihood is shown to be closely related to the modified profile likelihood. Copyright 2007, Oxford University Press.
3
2007
94
Biometrika
529
542
http://hdl.handle.net/10.1093/biomet/asm040
application/pdf
Access to full text is restricted to subscribers.
Thomas A. Severini
oai:RePEc:oup:biomet:v:93:y:2006:i:3:p:735-7412013-03-04RePEc:oup:biomet
article
Prospective survival analysis with a general semiparametric shared frailty model: A pseudo full likelihood approach
We provide a simple estimation procedure for a general frailty model for the analysis of prospective correlated failure times. The large-sample properties of the proposed estimators of both the regression coefficient vector and the dependence parameter are described, and consistent variance estimators are given. A brief outline of the proofs is given. In a simulation study under the widely used gamma frailty model, our proposed approach was found to have essentially the same efficiency as the EM-based maximum likelihood approach considered by other authors, with negligible difference between the standard errors of the two estimators. However, the proposed approach provides a framework capable of handling general frailty distributions with finite moments and yields an explicit consistent variance estimator. Copyright 2006, Oxford University Press.
3
2006
93
September
Biometrika
735
741
http://hdl.handle.net/10.1093/biomet/93.3.735
text/html
Access to full text is restricted to subscribers.
Malka Gorfine
David M. Zucker
Li Hsu
oai:RePEc:oup:biomet:v:95:y:2008:i:4:p:947-9602013-03-04RePEc:oup:biomet
article
Semiparametric maximum likelihood estimation in normal transformation models for bivariate survival data
We consider a class of semiparametric normal transformation models for right-censored bivariate failure times. Nonparametric hazard rate models are transformed to a standard normal model and a joint normal distribution is assumed for the bivariate vector of transformed variates. A semiparametric maximum likelihood estimation procedure is developed for estimating the marginal survival distribution and the pairwise correlation parameters. This produces an efficient estimator of the correlation parameter of the semiparametric normal transformation model, which characterizes the dependence of bivariate survival outcomes. In addition, a simple positive-mass-redistribution algorithm can be used to implement the estimation procedures. Since the likelihood function involves infinite-dimensional parameters, empirical process theory is utilized to study the asymptotic properties of the proposed estimators, which are shown to be consistent, asymptotically normal and semiparametric efficient. A simple estimator for the variance of the estimates is derived. Finite sample performance is evaluated via extensive simulations. Copyright 2008, Oxford University Press.
4
2008
95
Biometrika
947
960
http://hdl.handle.net/10.1093/biomet/asn049
application/pdf
Access to full text is restricted to subscribers.
Yi Li
Ross L. Prentice
Xihong Lin
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:965-9752013-03-04RePEc:oup:biomet
article
Hochberg's Step-Up Method: Cutting Corners Off Holm's Step-Down Method
Holm's method and Hochberg's method for multiple testing can be viewed as step-down and step-up versions of the Bonferroni test. We show that both are special cases of partition testing. The difference is that, while Holm's method tests each partition hypothesis using the largest order statistic, setting a critical value based on the Bonferroni inequality, Hochberg's method tests each partition hypothesis using all the order statistics, setting a series of critical values based on Simes' inequality. Geometrically, Hochberg's step-up method 'cuts corners' off the acceptance regions of Holm's step-down method by making assumptions on the joint distribution of the test statistics. As can be expected, partition testing making use of the joint distribution of the test statistics is more powerful than partition testing using probabilistic inequalities. Thus, if the joint distribution of the test statistics is available, through modelling for example, we recommend partition step-down testing, setting exact critical values based on the joint distribution. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
965
975
http://hdl.handle.net/10.1093/biomet/asm067
application/pdf
Access to full text is restricted to subscribers.
Yifan Huang
Jason C. Hsu
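The Huang and Hsu abstract contrasts Holm's step-down method with Hochberg's step-up method. The sketch below shows the usual textbook forms of both procedures, not the paper's partition-testing formulation. Both compare the k-th smallest p-value with alpha / (m - k + 1) (written 0-based below). Holm starts at the smallest p-value and stops at the first non-rejection. Hochberg starts at the largest p-value and stops at the first rejection. The function names and the `alpha` default are illustrative choices. Hochberg's validity rests on Simes' inequality, so it assumes a suitable joint distribution of the test statistics.

```python
from typing import List, Sequence


def _ascending_order(pvalues: Sequence[float]) -> List[int]:
    """Indices of the hypotheses sorted by increasing p-value."""
    return sorted(range(len(pvalues)), key=lambda i: pvalues[i])


def holm(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Holm step-down: start at the smallest p-value, stop at the first failure."""
    m = len(pvalues)
    rejected = []
    for k, i in enumerate(_ascending_order(pvalues)):
        # k-th smallest (0-based) p-value is compared with alpha / (m - k)
        if pvalues[i] <= alpha / (m - k):
            rejected.append(i)
        else:
            break
    return sorted(rejected)


def hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Hochberg step-up: start at the largest p-value, stop at the first success."""
    m = len(pvalues)
    order = _ascending_order(pvalues)
    for k in range(m - 1, -1, -1):
        # same critical values as Holm, scanned in the opposite direction
        if pvalues[order[k]] <= alpha / (m - k):
            return sorted(order[: k + 1])
    return []
```

Take p-values 0.03, 0.04 and 0.045 at alpha = 0.05. Holm rejects nothing, because 0.03 > 0.05/3. Hochberg rejects all three, because the largest p-value already satisfies 0.045 ≤ 0.05/1. This is a concrete case of the "cut corners" in the acceptance region that the abstract describes.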
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:233-2402013-03-04RePEc:oup:biomet
article
Nonparametric estimation of cause-specific cross hazard ratio with bivariate competing risks data
We propose an alternative representation of the cause-specific cross hazard ratio for bivariate competing risks data. The representation leads to a simple plug-in estimator, unlike an existing ad hoc procedure. The large sample properties of the resulting inferences are established. Simulations and a real data example demonstrate that the proposed methodology may substantially reduce the computational burden of the existing procedure, while maintaining similar efficiency properties. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
233
240
http://hdl.handle.net/10.1093/biomet/asm089
application/pdf
Access to full text is restricted to subscribers.
Yu Cheng
Jason P. Fine
oai:RePEc:oup:biomet:v:91:y:2004:i:2:p:277-2902013-03-04RePEc:oup:biomet
article
Semiparametric regression analysis for doubly censored data
We analyse doubly censored data using semiparametric transformation models. We provide inference procedures for the regression parameters and derive the asymptotic distributions of the proposed estimators. Procedures for model checking and model selection are also discussed. We illustrate our approach with a viral-load dataset from a recent AIDS clinical trial. Copyright Biometrika Trust 2004, Oxford University Press.
2
2004
91
June
Biometrika
277
290
T. Cai
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:643-6542013-03-04RePEc:oup:biomet
article
A multiple-imputation Metropolis version of the EM algorithm
In this paper we introduce a new stochastic variant of the EM algorithm. The algorithm combines the principle of multiple imputation and the theory of simulated annealing to deal with cases where the E-step and the M-step can be intractable or numerically inefficient. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
643
654
Carlo Gaetan
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:31-462013-03-04RePEc:oup:biomet
article
Bayesian exponentially tilted empirical likelihood
While empirical likelihood has been shown to exhibit many of the properties of conventional parametric likelihoods, a formal probabilistic interpretation has so far been lacking. We show that a likelihood function very closely related to empirical likelihood naturally arises from a nonparametric Bayesian procedure which places a type of noninformative prior on the space of distributions. This prior gives preference to distributions having a small support and, among those sharing the same support, it favours entropy-maximising distributions. The resulting nonparametric Bayesian procedure admits a computationally convenient representation as an empirical-likelihood-type likelihood where the probability weights are obtained via exponential tilting. The proposed methodology provides an attractive alternative to the Bayesian bootstrap as a nonparametric limit of a Bayesian procedure for moment condition models. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
31
46
http://hdl.handle.net/10.1093/biomet/92.1.31
text/html
Access to full text is restricted to subscribers.
Susanne M. Schennach
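Schennach's abstract describes probability weights "obtained via exponential tilting". The sketch below is a minimal illustration for the single moment condition E[X - mu] = 0. It finds the tilting parameter lambda by Newton's method, since lambda minimises the convex function sum_i exp(lambda * g_i) with g_i = x_i - mu. It then forms the weights w_i ∝ exp(lambda * g_i) and the log of the empirical-likelihood-type function, sum_i log w_i. The scalar moment condition, the function names and the convergence settings are assumptions for illustration. It also assumes mu lies strictly inside the range of the data, so that a solution exists.

```python
import math
from typing import List, Sequence


def et_weights(xs: Sequence[float], mu: float, max_iter: int = 100) -> List[float]:
    """Exponentially tilted weights satisfying sum_i w_i * (x_i - mu) = 0."""
    if not min(xs) < mu < max(xs):
        raise ValueError("mu must lie strictly inside the range of the data")
    g = [x - mu for x in xs]
    lam = 0.0
    for _ in range(max_iter):
        e = [math.exp(lam * gi) for gi in g]
        grad = sum(gi * ei for gi, ei in zip(g, e))       # d/dlam of sum exp(lam*g)
        hess = sum(gi * gi * ei for gi, ei in zip(g, e))  # second derivative (> 0)
        step = grad / hess
        lam -= step
        if abs(step) < 1e-12:
            break
    e = [math.exp(lam * gi) for gi in g]
    total = sum(e)
    return [ei / total for ei in e]


def etel_loglik(xs: Sequence[float], mu: float) -> float:
    """Log of the exponentially tilted empirical-type likelihood at mu."""
    return sum(math.log(w) for w in et_weights(xs, mu))
```

In a Bayesian analysis of the kind the abstract describes, `etel_loglik` would be added to a log prior for mu. At the sample mean the weights are uniform, 1/n each, and the log-likelihood reaches its maximum, -n log n.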
oai:RePEc:oup:biomet:v:95:y:2008:i:3:p:621-6342013-03-04RePEc:oup:biomet
article
Pointwise testing with functional data using the Westfall–Young randomization method
We consider hypothesis testing with smooth functional data by performing pointwise tests and applying a multiple comparisons procedure. Methods based on general inequalities, such as Bonferroni's method, do not perform well because of the high correlation between observations at nearby points. We consider the multiple comparison procedure proposed by Westfall & Young (1993) and show that it approximates a multiple comparison correction for a continuum of comparisons as the grid for pointwise comparisons becomes finer. Simulations and an application verify that this result applies in practical settings. Copyright 2008, Oxford University Press.
3
2008
95
Biometrika
621
634
http://hdl.handle.net/10.1093/biomet/asn021
application/pdf
Access to full text is restricted to subscribers.
Dennis D. Cox
Jong Soo Lee
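Cox and Lee apply the Westfall & Young (1993) randomization method to pointwise tests on functional data. The sketch below shows one common variant, single-step maxT adjusted p-values for a two-group comparison. At each permutation of the group labels, it takes the maximum pointwise statistic over the whole grid. Each observed statistic is then compared with that maximum. Because the maximum is taken jointly over the grid, the procedure uses the correlation between nearby points that Bonferroni ignores. The statistic here, the absolute difference of group means, and the choice of maxT rather than the step-down minP form are assumptions for illustration. The paper's exact statistic and variant may differ.

```python
import random
from typing import List, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pointwise_stats(curves: Sequence[Sequence[float]], labels: Sequence[int]) -> List[float]:
    """Absolute difference of group means at every grid point."""
    g0 = [c for c, lab in zip(curves, labels) if lab == 0]
    g1 = [c for c, lab in zip(curves, labels) if lab == 1]
    m = len(curves[0])
    return [abs(_mean([c[j] for c in g1]) - _mean([c[j] for c in g0])) for j in range(m)]


def westfall_young_maxT(curves, labels, n_perm: int = 999, seed: int = 0) -> List[float]:
    """Single-step maxT adjusted p-values from random label permutations."""
    rng = random.Random(seed)
    observed = pointwise_stats(curves, labels)
    exceed = [1] * len(observed)  # the observed labelling counts as one permutation
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)
        max_stat = max(pointwise_stats(curves, perm))  # max over the whole grid
        for j, s in enumerate(observed):
            if max_stat >= s:
                exceed[j] += 1
    return [count / (n_perm + 1) for count in exceed]
```

As the grid becomes finer, the maximum over the grid approximates the maximum over the continuum, which is the limit studied in the abstract.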
oai:RePEc:oup:biomet:v:98:y:2011:i:4:p:935-9512013-03-04RePEc:oup:biomet
article
Elliptical graphical modelling
We propose elliptical graphical models based on conditional uncorrelatedness as a robust generalization of Gaussian graphical models. Letting the population distribution be elliptical instead of normal allows the fitting of data with arbitrarily heavy tails. We study the class of proportionally affine equivariant scatter estimators and show how they can be used to perform elliptical graphical modelling. This leads to a new class of partial correlation estimators and analogues of the classical deviance test. General expressions for the asymptotic variance of partial correlation estimators, unconstrained and under decomposable models, are given, and the asymptotic chi square approximation for the pseudo-deviance test statistic is proved. The feasibility of our approach is demonstrated by a simulation study, using, among others, Tyler's scatter estimator, which is distribution-free within the elliptical model. Copyright 2011, Oxford University Press.
4
2011
98
Biometrika
935
951
http://hdl.handle.net/10.1093/biomet/asr037
application/pdf
Access to full text is restricted to subscribers.
D. Vogel
R. Fried
oai:RePEc:oup:biomet:v:92:y:2005:i:4:p:801-8202013-03-04RePEc:oup:biomet
article
Nonparametric maximum likelihood estimation of the structural mean of a sample of curves
A random sample of curves can usually be thought of as noisy realisations of a compound stochastic process X(t) = Z{W(t)}, where Z(t) produces random amplitude variation and W(t) produces random dynamic or phase variation. In most applications it is more important to estimate the so-called structural mean μ(t) = E{Z(t)} than the cross-sectional mean E{X(t)}, but this estimation problem is difficult because the process Z(t) is not directly observable. In this paper we propose a nonparametric maximum likelihood estimator of μ(t). This estimator is shown to be √n-consistent and asymptotically normal under the assumed model and robust to model misspecification. Simulations and a real-data example show that the proposed estimator is competitive with landmark registration, often considered the benchmark, and has the advantage of avoiding time-consuming and often infeasible individual landmark identification. Copyright 2005, Oxford University Press.
4
2005
92
December
Biometrika
801
820
http://hdl.handle.net/10.1093/biomet/92.4.801
text/html
Access to full text is restricted to subscribers.
Daniel Gervini
Theo Gasser
oai:RePEc:oup:biomet:v:94:y:2007:i:2:p:427-4412013-03-04RePEc:oup:biomet
article
Uncertainty in prior elicitations: a nonparametric approach
A key task in the elicitation of expert knowledge is to construct a distribution from the finite, and usually small, number of statements that have been elicited from the expert. These statements typically specify some quantiles or moments of the distribution. Such statements are not enough to identify the expert's probability distribution uniquely, and the usual approach is to fit some member of a convenient parametric family. There are two clear deficiencies in this solution. First, the expert's beliefs are forced to fit the parametric family. Secondly, no account is then taken of the many other possible distributions that might have fitted the elicited statements equally well. We present a nonparametric approach which tackles both of these deficiencies. We also consider the issue of the imprecision in the elicited probability judgements. Copyright 2007, Oxford University Press.
2
2007
94
Biometrika
427
441
http://hdl.handle.net/10.1093/biomet/asm031
application/pdf
Access to full text is restricted to subscribers.
Jeremy E. Oakley
Anthony O'Hagan
oai:RePEc:oup:biomet:v:93:y:2006:i:1:p:127-1352013-03-04RePEc:oup:biomet
article
Efficient designs for one-sided comparisons of two or three treatments with a control in a one-way layout
The problem of providing lower confidence bounds for the mean improvements of p ≥ 2 test treatments over a control treatment is considered. The expected average and expected maximum allowances are two criteria for comparing different systems of confidence intervals or bounds. In this paper, lower bounds are derived for the expected average allowance and the expected maximum allowance of Dunnett's simultaneous lower confidence bounds for the p mean improvements. These lower bounds hold for any p ≥ 2 and any allocation of sample sizes. For p = 2 test treatments, sample allocations are given for which the bounds are achievable. For p = 3 test treatments, a tighter set of bounds is derived which enables easy determination of the sample allocation required to achieve highly efficient designs. A table of the bounds for the expected average and expected maximum allowances and the sample allocation that achieves these bounds is given for p = 2, 3. The theoretical results can easily be adapted to cover upper confidence bounds. Copyright 2006, Oxford University Press.
1
2006
93
March
Biometrika
127
135
http://hdl.handle.net/10.1093/biomet/93.1.127
text/html
Access to full text is restricted to subscribers.
Steven M. Bortnick
Angela M. Dean
Thomas J. Santner
oai:RePEc:oup:biomet:v:90:y:2003:i:4:p:991-9942013-03-04RePEc:oup:biomet
article
A note on 'Testing the number of components in a normal mixture'
In a recent paper, Lo et al. (2001) propose a test for the likelihood ratio statistic based on the Kullback–Leibler information criterion when testing the null hypothesis that a random sample is drawn from a mixture of k₀ normal components against the alternative hypothesis of a mixture with k₁ normal components, with k₀ < k₁. However, this result requires conditions that are generally not met when the null hypothesis holds. Consequently, the result is not proven and simulations suggest that it may not be correct. Copyright Biometrika Trust 2003, Oxford University Press.
4
2003
90
December
Biometrika
991
994
Neal O. Jeffries
oai:RePEc:oup:biomet:v:90:y:2003:i:3:p:669-6782013-03-04RePEc:oup:biomet
article
Assessing landmark influence on shape variation
Given two sets of landmark data which differ in shape, it is useful to determine the extent to which shape variation can be explained by the perturbations of individual landmarks. We propose a method for assessing this, based on analysing the relative reduction in distance between the shapes that can be achieved by varying the location of a single landmark. This method is applied to a set of landmark data from the cervical vertebrae of two subspecies of gorillas. Copyright Biometrika Trust 2003, Oxford University Press.
3
2003
90
September
Biometrika
669
678
Michael H. Albert
oai:RePEc:oup:biomet:v:95:y:2008:i:1:p:139-1472013-03-04RePEc:oup:biomet
article
Bayesian and frequentist confidence intervals arising from empirical-type likelihoods
For a general class of empirical-type likelihoods for the population mean, higher-order asymptotics are developed with a view to characterizing its members which allow, for any given prior, the existence of a confidence interval that has approximately correct posterior as well as frequentist coverage. In particular, it is seen that the usual empirical likelihood always allows such a confidence interval, while many of its variants proposed in the literature do not enjoy this property. An explicit form of the confidence interval is also given. Copyright 2008, Oxford University Press.
1
2008
95
Biometrika
139
147
http://hdl.handle.net/10.1093/biomet/asm088
application/pdf
Access to full text is restricted to subscribers.
In Hong Chang
Rahul Mukerjee
oai:RePEc:oup:biomet:v:89:y:2002:i:1:p:39-482013-03-04RePEc:oup:biomet
article
A semiparametric pseudolikelihood estimation method for panel count data
In this paper, we study panel count data with covariates. A semiparametric pseudolikelihood estimation method is proposed based on the assumption that, given a covariate vector Z, the underlying counting process is a nonhomogeneous Poisson process with the conditional mean function given by E{N(t) | Z} = Λ₀(t) exp(β₀′Z). The proposed estimation method is shown to be robust in the sense that the estimator converges to its true value regardless of whether or not N(t) is a conditional Poisson process, given Z. An iterative numerical algorithm is devised to compute the semiparametric maximum pseudolikelihood estimator of (β₀, Λ₀). The algorithm appears to be attractive, especially when β₀ is a high-dimensional regression parameter. Some simulation studies are conducted to validate the method. Finally, the method is applied to a real dataset from a bladder tumour study. Copyright Biometrika Trust 2002, Oxford University Press.
1
2002
89
March
Biometrika
39
48
Ying Zhang
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:19-292013-03-04RePEc:oup:biomet
article
Semiparametric regression analysis of mean residual life with censored survival data
As a function of time t, the mean residual life is the remaining life expectancy of a subject given survival up to t. The proportional mean residual life model, proposed by Oakes & Dasu (1990), provides an alternative to the Cox proportional hazards model for studying the association between survival times and covariates. In the presence of censoring, we use counting process theory to develop semiparametric inference procedures for the regression coefficients of the Oakes–Dasu model. Simulation studies and an application to the well-known Veterans' Administration lung cancer survival data are presented. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
19
29
http://hdl.handle.net/10.1093/biomet/92.1.19
text/html
Access to full text is restricted to subscribers.
Y. Q. Chen
S. Cheng
oai:RePEc:oup:biomet:v:94:y:2007:i:4:p:985-9912013-03-04RePEc:oup:biomet
article
Importance Sampling Via the Estimated Sampler
Monte Carlo importance sampling for evaluating numerical integration is discussed. We consider a parametric family of sampling distributions and propose the use of the sampling distribution estimated by maximum likelihood. The proposed method of importance sampling using the estimated sampling distribution is shown to improve the asymptotic variance of the ordinary method using the true sampling distribution. The argument is closely related to the discussion of the paradox in Henmi & Eguchi (2004). We focus on a condition under which the estimated integration value obtained by the proposed method has asymptotic zero variance. Copyright 2007, Oxford University Press.
4
2007
94
Biometrika
985
991
http://hdl.handle.net/10.1093/biomet/asm076
application/pdf
Access to full text is restricted to subscribers.
Masayuki Henmi
Ryo Yoshida
Shinto Eguchi
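The Henmi, Yoshida and Eguchi abstract proposes replacing the true sampling density in the importance weights with the member of the sampling family that is fitted by maximum likelihood to the same draws. The sketch below computes both estimators side by side. The setting is hypothetical: the integrand is E_p[X^2] with target p = N(0, 1), and the sampling family is N(mu, 1) with true mu0. Only the structure of the comparison reflects the abstract. The family, the target and h(x) = x^2 are illustrative choices.

```python
import math
import random


def normal_pdf(x: float, mean: float, sd: float = 1.0) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_estimates(n: int, mu0: float, rng: random.Random):
    """Estimate E_p[X^2] for p = N(0, 1) by importance sampling from N(mu0, 1).

    Returns (ordinary, estimated): the ordinary estimator weights by the true
    sampler density N(mu0, 1); the estimated-sampler version refits the sampler
    mean by maximum likelihood on the same draws and weights by N(mu_hat, 1).
    """
    xs = [rng.gauss(mu0, 1.0) for _ in range(n)]
    mu_hat = sum(xs) / n  # MLE of the normal mean when the variance is known

    def h(x: float) -> float:
        return x * x

    ordinary = sum(h(x) * normal_pdf(x, 0.0) / normal_pdf(x, mu0) for x in xs) / n
    estimated = sum(h(x) * normal_pdf(x, 0.0) / normal_pdf(x, mu_hat) for x in xs) / n
    return ordinary, estimated
```

Calling `importance_estimates` over many seeds and comparing the sample variances of the two returned values is one way to examine the variance reduction the abstract describes. No figures are quoted here.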
oai:RePEc:oup:biomet:v:90:y:2003:i:1:p:15-272013-03-04RePEc:oup:biomet
article
Generalised linear models for correlated pseudo-observations, with applications to multi-state models
In multi-state models regression analysis typically involves the modelling of each transition intensity separately. Each probability of interest, namely the probability that a subject will be in a given state at some time, is a complex nonlinear function of the intensity regression coefficients. We present a technique which models the state probabilities directly. This method is based on the pseudo-values from a jackknife statistic constructed from simple summary statistic estimates of the state probabilities. These pseudo-values are then used in a generalised estimating equation to obtain estimates of the model parameters. We illustrate how this technique works by studying examples of common regression problems. We apply the technique to model acute graft-versus-host disease in bone marrow transplants. Copyright Biometrika Trust 2003, Oxford University Press.
1
2003
90
March
Biometrika
15
27
Per Kragh Andersen
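The pseudo-observation technique in this abstract starts from jackknife pseudo-values, n·θ̂ - (n - 1)·θ̂₍₋ᵢ₎, of a simple estimate θ̂ of a state probability. The sketch below covers the simplest case, the two-state alive/dead model, where θ̂ is the Kaplan-Meier estimate of P(T > t). A multi-state model would use a multi-state analogue such as the Aalen-Johansen estimator. The function names are hypothetical. The follow-up step, fitting a generalised estimating equation to the pseudo-values, is not shown.

```python
from typing import List, Sequence


def km_survival(times: Sequence[float], events: Sequence[int], t: float) -> float:
    """Kaplan-Meier estimate of P(T > t); events[i] = 1 if observed, 0 if censored."""
    surv = 1.0
    event_times = sorted({ti for ti, di in zip(times, events) if di == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        deaths = sum(1 for ti, di in zip(times, events) if ti == u and di == 1)
        surv *= 1.0 - deaths / at_risk
    return surv


def pseudo_values(times: List[float], events: List[int], t: float) -> List[float]:
    """Jackknife pseudo-values n*S(t) - (n-1)*S_{-i}(t) for each subject i."""
    n = len(times)
    full = km_survival(times, events, t)
    out = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        out.append(n * full - (n - 1) * km_survival(loo_times, loo_events, t))
    return out
```

When there is no censoring, the pseudo-value for subject i reduces to the indicator I(Tᵢ > t). This shows why the pseudo-values can stand in for unobserved state indicators as responses in a regression model.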
oai:RePEc:oup:biomet:v:91:y:2004:i:3:p:715-7272013-03-04RePEc:oup:biomet
article
A superiority-equivalence approach to one-sided tests on multiple endpoints in clinical trials
This paper considers the problem of comparing a new treatment with a control based on multiple endpoints. The hypotheses are formulated with the goal of showing that the treatment is equivalent, i.e. not inferior, on all endpoints and superior on at least one endpoint compared to the control, where thresholds for equivalence and superiority are specified for each endpoint. Roy's (1953) union-intersection and Berger's (1982) intersection-union principles are employed to derive the basic test. It is shown that the critical constants required for the union-intersection test of superiority can be sharpened by a careful analysis of its type I error rate. The composite UI-IU test is illustrated by an example and compared in a simulation study to alternative tests proposed by Bloch et al. (2001) and Perlman & Wu (2004). The Bloch et al. test does not control the type I error rate because of its nonmonotone nature, and is hence not recommended. The UI-IU and the Perlman & Wu tests both control the type I error rate, but the latter test generally has a slightly higher power. Copyright Biometrika Trust 2004, Oxford University Press.
3
2004
91
September
Biometrika
715
727
Ajit C. Tamhane
oai:RePEc:oup:biomet:v:92:y:2005:i:1:p:91-1032013-03-04RePEc:oup:biomet
article
Exact likelihood ratio tests for penalised splines
Penalised-spline-based additive models allow a simple mixed model representation where the variance components control departures from linear models. The smoothing parameter is the ratio of the random-coefficient and error variances, and tests for linear regression reduce to tests for zero random-coefficient variances. We propose exact likelihood and restricted likelihood ratio tests for testing polynomial regression versus a general alternative modelled by penalised splines. Their spectral decompositions are used as the basis of fast simulation algorithms. We derive the asymptotic local power properties of the tests under weak conditions. In particular we characterise the local alternatives that are detected with asymptotic probability one. Confidence intervals for the smoothing parameter are obtained by inverting the tests for a fixed smoothing parameter versus a general alternative. We discuss F and R tests and show that ignoring the variability in the smoothing parameter estimator can have a dramatic effect on their null distributions. The powers of several known tests are investigated and a small set of tests with good power properties is identified. The restricted likelihood ratio test is among the best in terms of power. Copyright 2005, Oxford University Press.
1
2005
92
March
Biometrika
91
103
http://hdl.handle.net/10.1093/biomet/92.1.91
text/html
Access to full text is restricted to subscribers.
Ciprian Crainiceanu
David Ruppert
Gerda Claeskens
M. P. Wand
oai:RePEc:oup:biomet:v:89:y:2002:i:2:p:315-3312013-03-04RePEc:oup:biomet
article
Overestimation of the receiver operating characteristic curve for logistic regression
Logistic regression is often used to find a linear combination of covariates which best discriminates between two groups or populations. The ROC, receiver operating characteristic, curve is a good way of assessing the performance of the resulting score, but using the same data both to fit the score and to calculate its ROC leads to an over-optimistic estimate of the performance which the score would give if it were to be validated on a sample of future cases. The paper studies the extent of this overestimation, and suggests a shrinkage correction for the ROC curve itself and for the area under the curve. The correction is consistent with Efron's formula for the bias in the error rate of a binary prediction rule. Two medical examples are discussed. Copyright Biometrika Trust 2002, Oxford University Press.
2
2002
89
June
Biometrika
315
331
J. B. Copas
oai:RePEc:oup:biomet:v:89:y:2002:i:4:p:731-7432013-03-04RePEc:oup:biomet
article
On the applicability of regenerative simulation in Markov chain Monte Carlo
We consider the central limit theorem and the calculation of asymptotic standard errors for the ergodic averages constructed in Markov chain Monte Carlo. Chan & Geyer (1994) established a central limit theorem for ergodic averages by assuming that the underlying Markov chain is geometrically ergodic and that a simple moment condition is satisfied. While it is relatively straightforward to check Chan & Geyer's conditions, their theorem does not lead to a consistent and easily computed estimate of t