2021-10-16T03:31:31Z
http://oai.repec.org/oai.php
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:759-760 2020-05-28 RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:759-760
article
Stata tip 112: Where did my p-values go? (Part 2)
http://www.stata-journal.com/article.html?article=st0280
Maarten L. Buis
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:748-758 2020-05-28 RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:748-758
article
Speaking Stata: Matrices as look-up tables
Matrices in Stata can serve as look-up tables. Because Stata will accept references to matrix elements within many commands, most notably generate and replace, users can access and use values from a table in either vector or full matrix form. Examples are given for entry of small datasets, recoding of categorical variables, and quantile-based or similar binning of counted or measured variables. In the last case, the device grants easy exploration of the consequences of different binning conventions and the instability of bin allocation.
matrices, vectors, data entry, recoding, quantiles, binning, look-up tables
http://www.stata-journal.com/article.html?article=pr0054
Nicholas J. Cox
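The look-up idea in the abstract above, indexing into a vector of cut points to bin measured values, can be sketched outside Stata. A minimal Python analogue on invented data (variable names and sample are hypothetical, not from the article):

```python
import bisect

# Hypothetical sample of measured values
values = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10]

# Quartile cut points act as the "look-up vector": a value's bin is the
# number of cut points it exceeds, found by binary search.
srt = sorted(values)
cuts = [srt[int(len(srt) * q)] for q in (0.25, 0.50, 0.75)]

bins = [bisect.bisect_left(cuts, v) for v in values]
```

Switching `bisect_left` to `bisect_right` changes how values tied with a cut point are allocated, which is exactly the kind of binning-convention instability the article explores.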
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:571-574 2020-05-28 RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:571-574
article
The Stata Journal Editors' Prize 2012: David Roodman
http://www.stata-journal.com/article.html?article=gn0054
Editors
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:765 2020-05-28 RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:765
article
Stata tip 111: More on working with weeks, erratum
http://www.stata-journal.com/article.html?article=dm0065_1
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:761-764 2020-05-28 RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:761-764
article
Stata tip 113: Changing a variable's format: What it does and does not mean
http://www.stata-journal.com/article.html?article=dm0067
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:4:p:530-549 2018-07-12 RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:4:p:530-549
article
Bayesian analysis in Stata with WinBUGS
WinBUGS is a program for Bayesian model fitting by Gibbs sampling. WinBUGS has limited facilities for data handling, whereas Stata has no routines for Bayesian analysis; therefore, much can be gained by running Stata and WinBUGS together. We present a set of ado-files that enable data to be processed in Stata and then passed to WinBUGS for model fitting; finally, the results are read back into Stata for further processing.
wbarray, wbdata, wbscalar, wbstructure, wbvector, wbrun, wbscript, wbcoda, wbac, wbbgr, wbgeweke, wbintervals, wbsection, wbtrace, wbstats, wbdensity, wbdic, wbbull, wbdecode, Bayesian methods, MCMC, Gibbs sampling, WinBUGS
http://www.stata-journal.com/article.html?article=st0115
John Thompson
Tom Palmer
Santiago Moreno
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:550-589 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:550-589
article
Nonparametric frontier analysis using Stata
In this article, we describe five new Stata commands that fit and provide statistical inference in nonparametric frontier models. The tenonradial and teradial commands fit data envelopment models where nonradial and radial technical efficiency measures are computed (Färe, 1998, Fundamentals of Production Theory; Färe and Lovell, 1978, Journal of Economic Theory 19: 150–162; Färe, Grosskopf, and Lovell, 1994a, Production Frontiers). Technical efficiency measures are obtained by solving linear programming problems. The teradialbc, nptestind, and nptestrts commands provide tools for making statistical inference regarding radial technical efficiency measures (Simar and Wilson, 1998, Management Science 44: 49–61; 2000, Journal of Applied Statistics 27: 779–802; 2002, European Journal of Operational Research 139: 115–132). We provide a brief overview of nonparametric efficiency measurement, and we describe the syntax and options of the new commands. Additionally, we provide an example showing the capabilities of the new commands. Finally, we perform a small empirical study of productivity growth. Copyright 2016 by StataCorp LP.
tenonradial, teradial, teradialbc, nptestind, nptestrts, nonparametric efficiency analysis, data envelopment analysis, technical efficiency, radial measure, nonradial measure, linear programming, bootstrap, subsampling bootstrap, smoothed bootstrap, bias correction, frontier analysis
http://www.stata-journal.com/article.html?article=st0444
Oleg Badunenko
Pavlo Mozharovskyi
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:523-549 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:523-549
article
The Keane and Runkle estimator for panel-data models with serial correlation and instruments that are not strictly exogenous
In this article, we introduce the new command xtkr, which implements the Keane and Runkle (1992a, Journal of Business and Economic Statistics 10: 1–9) approach for fitting linear panel-data models when the available instruments are predetermined but not strictly exogenous. This is a common case that includes dynamic panel-data models as a leading example. Monte Carlo simulations show that, in certain situations, this approach offers an improvement over the popular difference generalized method of moments and system generalized method of moments estimators in terms of bias and root mean squared error. An empirical application to cigarette demand also demonstrates its usefulness for applied researchers. Copyright 2016 by StataCorp LP.
xtkr, forward filtering, GMM, panel data, lagged dependent variable, endogeneity, strict exogeneity, predetermination
http://www.stata-journal.com/article.html?article=st0443
Michael Keane
Timothy Neal
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:662-667 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:662-667
article
dynsimpie: A command to examine dynamic compositional dependent variables
In this article, we adapt the modeling strategy proposed by Philips, Rutherford, and Whitten (2016, American Journal of Political Science 60: 268–283) and create a user-friendly Stata command, dynsimpie. This command requires the installation of the clarify package of Tomz, Wittenberg, and King (2003, Journal of Statistical Software 8(1): 1–30) and uses the commands in the clarify package to produce estimates from models of compositional dependent variables over time. Users can also examine how counterfactual shocks play through the system with graphs that are easy to interpret. We illustrate this with a model of voter support for the three dominant political parties in the UK. Copyright 2016 by StataCorp LP.
dynsimpie, dynamic composition, counterfactual shocks
http://www.stata-journal.com/article.html?article=st0448
Andrew Q. Philips
Amanda Rutherford
Guy D. Whitten
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:702-716 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:702-716
article
strmst2 and strmst2pw: New commands to compare survival curves using the restricted mean survival time
We present strmst2, a new command to implement k-sample comparisons using the restricted mean survival time (RMST) as the summary measure of the survival-time distribution. Unlike model-based summary measures such as the hazard ratio, the validity of which relies on the adequacy of the proportional-hazards assumption, the measures based on the RMST (that is, the difference in RMST, the ratio of RMST, and the ratio of the restricted mean time lost) provide more robust and clinically interpretable results about the between-group differences. strmst2 performs analysis of covariance-type adjusted analyses for k-sample comparisons as well as the corresponding unadjusted analyses. Pairwise comparisons for the adjusted analyses are summarized using the new postestimation command strmst2pw. We briefly describe the issues of the hazard ratio, introduce details of the method for the RMST, and then illustrate how to use the new commands. Copyright 2016 by StataCorp LP.
strmst2, strmst2pw, restricted mean survival time, restricted mean time lost, survival analysis, time-to-event data
http://www.stata-journal.com/article.html?article=st0451
Angel Cronin
Lu Tian
Hajime Uno
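The RMST discussed above is the area under the Kaplan–Meier survival curve up to a truncation time τ. A stdlib-only Python sketch of the unadjusted between-group difference, on hypothetical two-arm data (this is not the strmst2 implementation, just the textbook definition):

```python
def km_rmst(times, events, tau):
    """Area under the Kaplan-Meier survival curve up to tau."""
    pts = sorted(set(t for t, d in zip(times, events) if d and t <= tau))
    rmst, last_t, surv = 0.0, 0.0, 1.0
    for t in pts:
        rmst += surv * (t - last_t)            # rectangle under the current step
        d = sum(1 for ti, di in zip(times, events) if ti == t and di)
        n = sum(1 for ti in times if ti >= t)  # number still at risk at t
        surv *= 1.0 - d / n
        last_t = t
    return rmst + surv * (tau - last_t)        # final rectangle up to tau

# Hypothetical two-arm data: (event/censoring times, event indicators)
arm0 = ([2, 4, 5, 7, 9], [1, 1, 0, 1, 1])
arm1 = ([3, 6, 8, 9, 9], [1, 0, 1, 0, 0])
diff = km_rmst(*arm1, tau=9) - km_rmst(*arm0, tau=9)  # RMST difference
```

Unlike a hazard ratio, the resulting difference is read directly in time units, for example, months of survival gained up to τ.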
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:678-690 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:678-690
article
Versatile tests for comparing survival curves based on weighted log-rank statistics
The log-rank test is perhaps the most commonly used nonparametric method for comparing two survival curves and yields maximum power under proportional hazards alternatives. While the assumption of proportional hazards is often reasonable, it need not hold. Several authors have therefore developed versatile tests using combinations of weighted log-rank statistics that are more sensitive to nonproportional hazards. Fleming and Harrington (1991, Counting Processes and Survival Analysis, Wiley) consider the family of Gρ statistics and their supremum versions, while Lee (1996, Biometrics 52: 721–725) and Lee (2007, Computational Statistics and Data Analysis 51: 6557–6564) propose tests based on the more extended Gρ,γ family. In this article, I consider Zm = max(|Z1|, |Z2|, |Z3|), where Z1, Z2, and Z3 are z statistics obtained from G0,0, G1,0, and G0,1 tests, respectively. G0,0 corresponds to the log-rank test, while G1,0 and G0,1 are more sensitive to early-difference and late-difference alternatives. I conduct a simulation study to compare the performance of Zm with the log-rank test, the more optimally weighted test, and Lee’s (2007) tests, under the null hypothesis, proportional hazards, early-difference, and late-difference alternatives. Results indicate that the method based on Zm maintains the type I error rate, provides increased power relative to the log-rank test under early-difference and late-difference alternatives, and entails only a small to moderate power loss compared with the more optimally chosen test. I apply the procedure to two datasets reported in the literature, both of which exhibit nonproportional hazards. Versatile tests such as Zm may be useful in clinical trial settings where there is concern that the treatment effect may not conform to the proportional hazards assumption. I also describe the syntax for a Stata command, verswlr, to implement the method. Copyright 2016 by StataCorp LP.
verswlr, survival curves, log-rank test, nonproportional hazards, versatile tests, power
http://www.stata-journal.com/article.html?article=st0449
Theodore G. Karrison
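A rough sketch of the weighted log-rank building block behind Zm: a Fleming–Harrington Gρ,γ z statistic, with weights based on the left-continuous pooled Kaplan–Meier estimate. This is an illustrative reading of the standard formulas on invented data, not the verswlr code:

```python
def fh_logrank_z(times, groups, events, rho=0, gamma=0):
    """Fleming-Harrington G^{rho,gamma}-weighted log-rank z statistic
    comparing group 1 against group 0 (illustrative sketch)."""
    data = list(zip(times, groups, events))
    s = 1.0                       # pooled KM estimate, left-continuous S(t-)
    o_minus_e, var = 0.0, 0.0
    for t in sorted(set(ti for ti, _, di in data if di)):
        n = sum(1 for ti, _, _ in data if ti >= t)       # at risk, pooled
        n1 = sum(1 for ti, g, _ in data if ti >= t and g == 1)
        d = sum(1 for ti, _, di in data if ti == t and di)
        d1 = sum(1 for ti, g, di in data if ti == t and di and g == 1)
        w = s ** rho * (1.0 - s) ** gamma                # weight uses S(t-)
        o_minus_e += w * (d1 - d * n1 / n)
        if n > 1:                                        # hypergeometric variance
            var += w * w * d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        s *= 1.0 - d / n                                 # update KM after t
    return o_minus_e / var ** 0.5
```

With ρ = γ = 0 this reduces to the ordinary log-rank z; Zm would take the maximum of |z| over the (0,0), (1,0), and (0,1) weightings.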
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:691-701 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:691-701
article
eq5dds: A command to analyze the descriptive system of EQ-5D quality-of-life instrument
In this article, we describe the eq5dds command, which presents the results of a descriptive system using individuals’ responses for mobility, self-care, usual activities, pain or discomfort, anxiety or depression, and visual analog scale from the EQ-5D health-related quality-of-life instrument (in its 3L and 5L versions). The command presents each of the tables and the figures recommended in the official user guide of the instrument (developed by the EuroQol Group). Copyright 2016 by StataCorp LP.
eq5dds, EQ-5D, descriptive system
http://www.stata-journal.com/article.html?article=st0450
Juan Manuel Ramos-Goñi
Yolanda Ramallo-Fariña
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:613-631 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:613-631
article
Hot and cold spot analysis using Stata
Spatial analysis is attracting more attention from Stata users because of the increasing availability of regional data. In this article, I present an implementation of hot and cold spot analysis using Stata. I introduce the new command getisord, which calculates the Getis–Ord G∗i(d) statistic. To implement this command, one only needs the latitude and longitude of regions as the additional required information. In combination with shape files, the results obtained from the getisord command can be visually displayed in Stata. In this article, I also offer an interesting illustration to explain how the getisord command works. Copyright 2016 by StataCorp LP.
getisord, Getis–Ord G∗i(d), local spatial autocorrelation, shape file
http://www.stata-journal.com/article.html?article=st0446
Keisuke Kondo
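The Getis–Ord statistic named above can be approximated in a few lines: for each region, compare the weighted sum of nearby values with its expectation under spatial randomness. A hedged Python sketch with binary distance-band weights and plain Euclidean distance (real use, as in getisord, would need great-circle distance from latitude and longitude; the data here are invented):

```python
from math import hypot, sqrt

def getis_ord_gstar(coords, x, d):
    """Getis-Ord G*_i(d) with binary distance-band weights (self included).
    Euclidean distance for simplicity; illustrative sketch only."""
    n = len(x)
    mean = sum(x) / n
    s = sqrt(sum(v * v for v in x) / n - mean * mean)
    out = []
    for xi, yi in coords:
        w = [1.0 if hypot(xi - xj, yi - yj) <= d else 0.0
             for xj, yj in coords]                  # includes the region itself
        sw, sw2 = sum(w), sum(wi * wi for wi in w)
        num = sum(wi * v for wi, v in zip(w, x)) - mean * sw
        den = s * sqrt((n * sw2 - sw * sw) / (n - 1))
        out.append(num / den if den > 0 else 0.0)
    return out
```

Large positive values flag hot spots (clusters of high values); large negative values flag cold spots.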
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:761-777 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:761-777
article
A test for exogeneity in the presence of nonlinearities
We provide a command, locmtest, that implements a test for exogeneity that is robust when the true relationship between the outcome variable and a discrete potentially endogenous variable is nonlinear. This test was developed in Lochner and Moretti (2015, Review of Economics and Statistics 97: 387–397), and it can be implemented even when only a single valid instrument is available. We present the motivation and general idea of the test. We also describe locmtest, which calculates the test, and provide empirical applications of the test and the command. Copyright 2016 by StataCorp LP.
locmtest, exogeneity, nonlinear
http://www.stata-journal.com/article.html?article=st0454
Michael P. Babington
Javier Cano-Urbina
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:805-812 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:805-812
article
Speaking Stata: Shading zones on time series and other plots
Background shading of time series and other plots is a common need. For example, we often wish to identify particular periods of recession or war or other distinct conditions. The shading should be laid down before the data elements (for example, data points or lines). Several methods of shading can be effective. Tips are included for automating the production of a series of similar graphs. Copyright 2016 by StataCorp LP.
shading, time series, line plots, scatterplots, twoway, graphics
http://www.stata-journal.com/article.html?article=gr0067
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:778-804 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:778-804
article
Estimation of panel vector autoregression in Stata
Panel vector autoregression (VAR) models have been increasingly used in applied research. While programs specifically designed to fit time-series VAR models are often included as standard features in most statistical packages, panel VAR model estimation and inference are often implemented with general-use routines that require some programming dexterity. In this article, we briefly discuss model selection, estimation, and inference of homogeneous panel VAR models in a generalized method of moments framework, and we present a set of programs to conveniently execute them. We illustrate the pvar package of programs by using standard Stata datasets. Copyright 2016 by StataCorp LP.
pvar, pvarfevd, pvargranger, pvarirf, pvarsoc, pvarstable, panel, vector autoregression, VAR, dynamic panel
http://www.stata-journal.com/article.html?article=st0455
Michael R. M. Abrigo
Inessa Love
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:650-661 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:650-661
article
Using mi impute chained to fit ANCOVA models in randomized trials with censored dependent and independent variables
In this article, we illustrate how to use mi impute chained with intreg to fit an analysis of covariance (ANCOVA) model to censored and nondetectable immunological concentrations measured in a randomized pretest–posttest design. Copyright 2016 by StataCorp LP.
mi impute chained, multiple imputation, intreg, censoring, detection limit, ANCOVA
http://www.stata-journal.com/article.html?article=st0447
Andreas Andersen
Andreas Rieckmann
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:717-739 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:717-739
article
Implementing Rubin's alternative multiple-imputation method for statistical matching in Stata
This article introduces two new commands, smpc and smmatch, that implement the statistical matching procedure proposed by Rubin (1986, Journal of Business and Economic Statistics 4: 87–94). The purpose of statistical matching in Rubin’s procedure is to generate a single dataset from various datasets, where each dataset contains a specific variable of interest and all contain some variables in common. For two variables of interest that are not observed jointly for any unit, smpc generates the predicted values of each as a function of the other variable of interest and a set of control variables by assuming a partial correlation value (defined by the user) between the two variables of interest (other statistical matching procedures assume that they are conditionally independent given the control variables). The smmatch command, on the other hand, matches observations of different datasets according to their predicted values (using a minimum distance criterion) conditional on a set of control variables, and it imputes the observed value of the match for the missing values. Copyright 2016 by StataCorp LP.
smmatch, smpc, data combination, missing data, multiple imputation, statistical matching
http://www.stata-journal.com/article.html?article=st0452
Anil Alpman
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:813-814 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:813-814
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:632-649 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:632-649
article
A sparser, speedier reshape
A new command, sreshape, supports sparse and speedy reshaping of data. Often reshaped data are “sparse” in the sense of containing many miss- ing values that are dropped after reshaping. sreshape automates the process of reshaping and dropping such missing information to avoid potential errors and, for both sparse and nonsparse data, yields speed improvements. Using large test datasets, sreshape achieves identical results to reshape 8 to 31 times faster in wide-to-long reshapes and 2 to 13 times faster in long-to-wide reshapes. Further suggested improvements may allow StataCorp to increase these speed gains in the built-in version. Copyright 2016 by StataCorp LP.
sreshape, reshape, sparse data, missing data, long form, wide form, data management, speed
http://www.stata-journal.com/article.html?article=dm0090
Kenneth L. Simons
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:740-760 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:740-760
article
The lag-length selection and detrending methods for HEGY seasonal unit-root tests using Stata
The article extends the previous Hylleberg, Engle, Granger, and Yoo seasonal unit-root test commands (Baum and Sperling, 2001, https://ideas.repec.org/c/boc/bocode/s416502.html; Depalo, 2009, Stata Journal 9: 422–438), which allow for the use of both quarterly and monthly data. It is also possible to choose between ordinary least-squares and generalized least-squares detrending (Rodrigues and Taylor, 2007, Journal of Econometrics 141: 548–573) procedures to deal with the deterministic part of the process. The command allows for the use of the sequential method proposed by Hall (1994, Journal of Business and Economic Statistics 12: 461–470) and Ng and Perron (1995, Journal of the American Statistical Association 90: 268–281), the adaptation of the modified Akaike information criteria to the case of seasonal unit-root tests (del Barrio Castro, Osborn, and Taylor, 2016, Econometric Reviews 35: 122–168) as well as the inclusion of Akaike information and Bayesian information criteria to determine the order of augmentation of the serial correlation in the augmented Hylleberg, Engle, Granger, and Yoo regression. Finally, the use of the command is illustrated with an empirical application to the case of monthly passenger airport arrivals to Palma de Mallorca. Copyright 2016 by StataCorp LP.
hegy, HEGY test, GLS detrending, optimal augmentation lag
http://www.stata-journal.com/article.html?article=st0453
Tomás del Barrio Castro
Andrii Bodnar
Andreu Sansó
oai:RePEc:tsj:stataj:v:16:y:2016:i:3:p:590-612 2020-07-16 RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:3:p:590-612
article
Multiple imputation for categorical time series
The mict package provides a method for multiple imputation of categorical time-series data (such as life course or employment status histories) that preserves longitudinal consistency, using a monotonic series of imputations. It allows flexible imputation specifications with a model appropriate to the target variable (mlogit, ologit, etc.). Where transitions in individual units' data are substantially less frequent than one per period and where missingness tends to be consecutive (as is typical of life course data), mict produces imputations with better longitudinal consistency than mi impute or ice. Copyright 2016 by StataCorp LP.
mict impute, mict prep, mict model gap, mict model initial, mict model terminal, multiple imputation, categorical time series
http://www.stata-journal.com/article.html?article=st0445
Brendan Halpin
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:251-252 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:251-252
article
Joseph M. Hilbe (1944-2017)
Editors
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:511-514 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:511-514
article
Stata tip 127: Use capture noisily groups
http://www.stata-journal.com/article.html?article=pr0066
Roger B. Newson
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:490-502 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:490-502
article
Rate decomposition for aggregate data using Das Gupta’s method
Social, behavioral, and health scientists frequently decompose changes or differences in outcome variables into components of change and assess their relative importance. Many Stata commands facilitate this exercise using unit-level data, notably by applying the Blinder–Oaxaca approach. However, none of the comparable user-written commands decompose changes or differences in aggregate data despite their availability and the widespread use of corresponding decomposition techniques. In this article, I present the user-written command rdecompose, which decomposes aggregate or cross-classified data based on Das Gupta’s (1993, Standardization and Decomposition of Rates: A User’s Manual, Volume 1) approach, and demonstrate its application in multiple settings. This command extends the original method by allowing multiple factors and flexible functional specifications.
rdecompose, decomposition, cross-classified, Das Gupta method
http://www.stata-journal.com/article.html?article=st0483
http://www.stata-journal.com/software/sj17-2/st0483/
Jinjing Li
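For the two-factor case, Das Gupta's decomposition has a closed form: each factor's contribution is its change weighted by the mean of the other factor, and the contributions sum exactly to the total change. A minimal sketch of that textbook formula (the example rate and numbers are invented; rdecompose generalizes well beyond this):

```python
def two_factor_decompose(a1, b1, a2, b2):
    """Das Gupta's two-factor decomposition of a change in a rate R = a * b:
    each factor's effect is its change weighted by the other factor's mean."""
    a_effect = (b1 + b2) / 2 * (a2 - a1)
    b_effect = (a1 + a2) / 2 * (b2 - b1)
    return a_effect, b_effect

# Hypothetical crude rate = (events per woman) * (women per population)
a_eff, b_eff = two_factor_decompose(0.08, 0.25, 0.06, 0.30)
total = 0.06 * 0.30 - 0.08 * 0.25   # total change; equals a_eff + b_eff
```

The exact additivity of the factor effects is the defining property of the decomposition.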
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:462-489 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:462-489
article
A flexible parametric competing-risks model using a direct likelihood approach for the cause-specific cumulative incidence function
In competing-risks analysis, the cause-specific cumulative incidence function (CIF) is usually obtained in a modeling framework by either 1) transforming on all cause-specific hazards or 2) transforming by using a direct relationship with the subdistribution hazard function. We expand on current competing-risks methodology from within the flexible parametric survival modeling framework and focus on the second approach. This approach models all cause-specific CIFs simultaneously and is more useful for answering prognostic-related questions. We propose the direct flexible parametric survival modeling approach for the cause-specific CIF. This approach models the (log cumulative) baseline hazard without requiring numerical integration, which leads to benefits in computational time. It is also easy to make out-of-sample predictions to estimate more useful measures and incorporate alternative link functions, for example, logit links. To implement these methods, we introduce a new estimation command, stpm2cr, and demonstrate useful predictions from the model through an illustrative melanoma dataset.
stpm2cr, survival analysis, competing risks, flexible parametric models, subdistribution hazard, cumulative incidence function
http://www.stata-journal.com/article.html?article=st0482
http://www.stata-journal.com/software/sj17-2/st0482/
Sarwar Islam Mozumder
Mark J. Rutherford
Paul C. Lambert
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:343-357 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:343-357
article
Fitting Bayesian item response models in Stata and Stan
Stata users have access to two easy-to-use implementations of Bayesian inference: Stata’s native bayesmh command and StataStan, which calls the general Bayesian engine, Stan. We compare these implementations on two important models for education research: the Rasch model and the hierarchical Rasch model. StataStan fits a more general range of models than can be fit by bayesmh and uses a superior sampling algorithm, that is, Hamiltonian Monte Carlo using the no-U-turn sampler. Furthermore, StataStan can run in parallel on multiple CPU cores, regardless of the flavor of Stata. Given these advantages and given that Stan is open source and can be run directly from Stata do-files, we recommend that Stata users interested in Bayesian methods consider using StataStan. Copyright 2017 by StataCorp LP.
stan, windowsmonitor, StataStan, bayesmh, Bayesian
http://www.stata-journal.com/article.html?article=st0477
http://www.stata-journal.com/software/sj17-2/st0477/
Robert L. Grant
Daniel C. Furr
Bob Carpenter
Andrew Gelman
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:442-461 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:442-461
article
Multilevel multiprocess modeling with gsem
Multilevel multiprocess models are simultaneous equation systems that include multilevel hazard equations with correlated random effects. Demographers routinely use these models to adjust estimates for endogeneity and sample selection. In this article, I demonstrate how multilevel multiprocess models can be fit with the gsem command. I distinguish between two classes of multilevel multiprocess models: nonrecursive systems of hazard equations without observed endogenous variables and recursive systems that include a hazard equation with observed endogenous qualitative variables. I illustrate the estimation of both classes of models using sample datasets shipped with the statistical software aML. I pay special attention to identifying structural coefficients in nonrecursive simultaneous systems.
survival analysis, multilevel multiprocess models, multilevel analysis, simultaneous equations, endogeneity, gsem
http://www.stata-journal.com/article.html?article=st0481
http://www.stata-journal.com/software/sj17-2/st0481/
Tamás Bartus
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:372-404 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:372-404
article
rdrobust: Software for regression-discontinuity designs
We describe a major upgrade to the Stata (and R) rdrobust package, which provides a wide array of estimation, inference, and falsification methods for the analysis and interpretation of regression-discontinuity designs. The main new features of this upgraded version are as follows: i) covariate-adjusted bandwidth selection, point estimation, and robust bias-corrected inference, ii) cluster-robust bandwidth selection, point estimation, and robust bias-corrected inference, iii) weighted global polynomial fits and pointwise confidence bands in regression-discontinuity plots, and iv) several new bandwidth selection methods, including different bandwidths for control and treatment groups, coverage error-rate optimal bandwidths, and optimal bandwidths for fuzzy designs. In addition, the upgraded package has superior performance because of several numerical and implementation improvements. We also discuss issues of backward compatibility and provide a companion R package with the same syntax and capabilities.
rdrobust, rdbwselect, rdplot, regression discontinuity
http://www.stata-journal.com/article.html?article=st0366_1
http://www.stata-journal.com/software/sj17-2/st0366_1/
Sebastian Calonico
Matias D. Cattaneo
Max H. Farrell
Rocío Titiunik
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:358-371 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:358-371
article
Instantaneous geometric rates via generalized linear models
The instantaneous geometric rate represents the instantaneous probability of an event of interest per unit of time. In this article, we propose a method to model the effect of covariates on the instantaneous geometric rate with two models: the proportional instantaneous geometric rate model and the proportional instantaneous geometric odds model. We show that these models can be fit within the generalized linear model framework by using two nonstandard link functions that we implement in the user-defined link programs log igr and logit igr. We illustrate how to fit these models and how to interpret the results with an example from a randomized clinical trial on survival in patients with metastatic renal carcinoma. Copyright 2017 by StataCorp LP.
log igr, logit igr, instantaneous geometric rate, generalized linear models, glm, survival analysis
http://www.stata-journal.com/article.html?article=st0478
http://www.stata-journal.com/software/sj17-2/st0478/
Andrea Discacciati
Matteo Bottai
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:515-516 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:515-516
article
Software updates
Updates for previously published packages are provided.
http://www.stata-journal.com/software/sj17-2/dm0078_2/
http://www.stata-journal.com/software/sj17-2/gr0064_1/
http://www.stata-journal.com/software/sj17-2/st0376_1/
http://www.stata-journal.com/software/sj17-2/st0389_4/
http://www.stata-journal.com/software/sj17-2/st0446_1/
http://www.stata-journal.com/software/sj17-2/st0470_1/
Editors
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:422-441 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:422-441
article
Estimating responsiveness scores using rscore
rscore computes unit-specific responsiveness scores using an iterated random-coefficient regression approach. The model fit by rscore considers a regression of a response variable y, that is, outcome, on a series of factors (or regressors) x, that is, varlist, by assuming a different reaction (or “responsiveness”) of each unit to each factor contained in x. rscore allows for i) ranking units according to the obtained level of the responsiveness score; ii) detecting more influential factors in driving unit performance; and iii) studying the distribution (heterogeneity) of factors’ responsiveness scores across units. Also, rscore offers useful graphical representation of results. We provide two illustrative applications of the model: the first is on a cross-section, and the second is on a longitudinal dataset.
rscore, responsiveness scores, random-coefficient regression
http://www.stata-journal.com/article.html?article=st0480
http://www.stata-journal.com/software/sj17-2/st0480/
Giovanni Cerulli
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:253-278 2021-04-29 RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:253-278
article
Estimating inverse-probability weights for longitudinal data with dropout or truncation: The xtrccipw command
Individuals may drop out of a longitudinal study, rendering their outcomes unobserved but still well defined. However, they may also undergo truncation (for example, death), beyond which their outcomes are no longer meaningful. Kurland and Heagerty (2005, Biostatistics 6: 241–258) developed a method to conduct regression conditioning on nontruncation, that is, regression conditioning on continuation (RCC), for longitudinal outcomes that are monotonically missing at random (for example, because of dropout). This method first estimates the probability of dropout among continuing individuals to construct inverse-probability weights (IPWs), then fits generalized estimating equations (GEE) with these IPWs. In this article, we present the xtrccipw command, which can both estimate the IPWs required by RCC and then use these IPWs in a GEE estimator by calling the glm command from within xtrccipw. In the absence of truncation, the xtrccipw command can also be used to run a weighted GEE analysis. We demonstrate the xtrccipw command by analyzing an example dataset and the original Kurland and Heagerty (2005) data. We also use xtrccipw to illustrate some empirical properties of RCC through a simulation study. Copyright 2017 by StataCorp LP.
xtrccipw, dropout, generalized estimating equations, inverse-probability weights, longitudinal data, missing at random, truncation, weighted GEE
http://www.stata-journal.com/article.html?article=st0474
http://www.stata-journal.com/software/sj17-2/st0474/
Eric J. Daza
Michael G. Hudgens
Amy H. Herring
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:405-4212021-04-29RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:405-421
article
A combined test for a generalized treatment effect in clinical trials with a time-to-event outcome
Most randomized controlled trials with a time-to-event outcome are designed and analyzed assuming proportional hazards of the treatment effect. The sample-size calculation is based on a log-rank test or the equivalent Cox test. Nonproportional hazards are seen increasingly in trials and are recognized as a potential threat to the power of the log-rank test. To address the issue, Royston and Parmar (2016, BMC Medical Research Methodology 16: 16) devised a new “combined test” of the global null hypothesis of identical survival curves in each trial arm. The test, which combines the conventional Cox test with a new formulation, is based on the maximal standardized difference in restricted mean survival time (RMST) between the arms. The test statistic is based on evaluations of RMST over several preselected time points. The combined test involves the minimum p-value across the Cox and RMST-based tests, appropriately standardized to have the correct null distribution. In this article, I outline the combined test and introduce a command, stctest, that implements the combined test. I point the way to additional tools currently under development for power and sample-size calculation for the combined test.
stctest, randomized controlled trial, time-to-event outcome, restricted mean survival time, treatment effect, hypothesis testing, flexible parametric model, jackknife
http://www.stata-journal.com/article.html?article=st0479
http://www.stata-journal.com/software/sj17-2/st0479/
Patrick Royston
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:503-5102021-04-29RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:503-510
article
Covariate-constrained randomization routine for achieving baseline balance in cluster-randomized trials
In cluster-randomized trials, groups or clusters of individuals, rather than individuals themselves, are randomly allocated to intervention or control. In this article, we describe a new command, ccrand, that implements a covariate-constrained randomization procedure for cluster-randomized trials. It can ensure balance of one or more baseline covariates between trial arms by restriction to allocations that meet specified balance criteria. We provide a brief overview of the theoretical background, describe ccrand and its options, and illustrate it using an example.
ccrand, covariate-constrained randomization, cluster-randomized trials
http://www.stata-journal.com/article.html?article=st0484
http://www.stata-journal.com/software/sj17-2/st0484/
Eva Lorenz
Sabine Gabrysch
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:279-3132021-04-29RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:279-313
article
Heuristic criteria for selecting an optimal aspect ratio in a two-variable line plot
Line plots encode a series of slopes from adjoining coordinates and aim to reveal suggestive patterns in the sequential rates of change. The choice of aspect ratio imposed on the line plot largely determines the judged prevalence of patterns in the bivariate series and the degree of steepness in the rates of change. Choosing an appropriate aspect ratio is key to designing informative line plots. The optaspect command calculates the optimal aspect ratio in a two-variable line graph using a number of heuristic criteria. Copyright 2017 by StataCorp LP.
optaspect, aspect ratio, line plot, time series, banking to 45 deg.
http://www.stata-journal.com/article.html?article=gr0069
http://www.stata-journal.com/software/sj17-2/gr0069/
Demetris Christodoulou
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:330-3422021-04-29RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:330-342
article
Introducing the StataStan interface for fast, complex Bayesian modeling using Stan
In this article, we present StataStan, an interface that allows simulation-based Bayesian inference in Stata via calls to Stan, the flexible, open-source Bayesian inference engine. Stan is written in C++, and Stata users can use the commands stan and windowsmonitor to run Stan programs from within Stata. We provide a brief overview of Bayesian algorithms, details of the commands available from Statistical Software Components, considerations for users who are new to Stan, and a simple example. Stan uses a different algorithm than bayesmh, BUGS, JAGS, SAS, and MLwiN. This algorithm provides considerable improvements in efficiency and speed. In a companion article, we give an extended comparison of StataStan and bayesmh in the context of item response theory models. Copyright 2017 by StataCorp LP.
stan, windowsmonitor, StataStan, Bayesian, bayesmh, interface, shell commands, Stan
http://www.stata-journal.com/article.html?article=st0476
http://www.stata-journal.com/software/sj17-2/st0476/
Robert L. Grant
Bob Carpenter
Daniel C. Furr
Andrew Gelman
oai:RePEc:tsj:stataj:v:17:y:2017:i:2:p:314-3292021-04-29RePEc:tsj:stataj
RePEc:tsj:stataj:v:17:y:2017:i:2:p:314-329
article
Regression clustering for panel-data models with fixed effects
In this article, we describe the xtregcluster command, which implements the panel regression clustering approach developed by Sarafidis and Weber (2015, Oxford Bulletin of Economics and Statistics 77: 274–296). The method classifies individuals into clusters, so that within each cluster, the slope parameters are homogeneous and all intracluster heterogeneity is due to the standard two-way error-components structure. Because the clusters are heterogeneous, they do not share common parameters. The number of clusters and the optimal partition are determined by the clustering solution, which minimizes the total residual sum of squares of the model subject to a penalty function that strictly increases in the number of clusters. The method is available for linear short panel-data models and useful for exploring heterogeneity in the slope parameters when there is no a priori knowledge about parameter structures. It is also useful for empirically evaluating whether any normative classifications are justifiable from a statistical point of view. Copyright 2017 by StataCorp LP.
xtregcluster, panel data, parameter heterogeneity
http://www.stata-journal.com/article.html?article=st0475
http://www.stata-journal.com/software/sj17-2/st0475/
Demetris Christodoulou
Vasilis Sarafidis
oai:RePEc:tsj:stataj:v:19:y:2019:i:1:p:83-862019-03-28RePEc:tsj:stataj
RePEc:tsj:stataj:v:19:y:2019:i:1:p:83-86
article
On the importance of syntax coloring for teaching statistics
In this article, I underscore the importance of syntax coloring in teaching statistics. I also introduce the statax package, which includes JavaScript and LaTeX programs for highlighting Stata code in HTML and LaTeX documents. Furthermore, I provide examples showing how to implement this package for developing educational materials on the web or for a classroom handout.
statax, syntax highlighting, statistical education, JavaScript, LaTeX
http://hdl.handle.net/10.1177/1536867X19830891
E.F. Haghish
oai:RePEc:tsj:stataj:v:19:y:2019:i:1:p:61-822019-03-28RePEc:tsj:stataj
RePEc:tsj:stataj:v:19:y:2019:i:1:p:61-82
article
Seamless interactive language interfacing between R and Stata
In this article, I propose a new approach to language interfacing for statistical software by allowing automatic interprocess communication between R and Stata. I advocate interactive language interfacing in statistical software by automating data communication. I introduce the rcall package and provide examples of how the R language can be used interactively within Stata or embedded into Stata programs using the proposed approach to interfacing. Moreover, I discuss the pros and cons of object synchronization in language interfacing.
rcall, rcall check, language interfacing, interprocess communication, synchronization, statistical programming, reproducible research
http://hdl.handle.net/10.1177/1536867X19830891
E.F. Haghish
oai:RePEc:tsj:stataj:y:19:y:2019:i:1:p:246-2592019-03-28RePEc:tsj:stataj
RePEc:tsj:stataj:y:19:y:2019:i:1:p:246-259
article
Speaking Stata: How best to generate indicator or dummy variables
Indicator or dummy variables record whether some condition is true or false in each observation by a value of 1 or 0. Values may also be missing if truth or falsity is not known, and that fact should be flagged. Such indicators may be created on the fly by using factor-variable notation. tabulate also offers one method for automating the generation of indicators. In this column, we discuss in detail how best to generate such variables directly by other means, with comments here and there on what not to do.
indicator variable, dummy variable, true or false, any, all, missing values, logical and relational operators, functions, merge
http://hdl.handle.net/10.1177/1536867X19830921
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:19:y:2019:i:1:p:4-602019-03-28RePEc:tsj:stataj
RePEc:tsj:stataj:v:19:y:2019:i:1:p:4-60
article
Fast and wild: Bootstrap inference in Stata using boottest
The wild bootstrap was originally developed for regression models with heteroskedasticity of unknown form. Over the past 30 years, it has been extended to models estimated by instrumental variables and maximum likelihood and to ones where the error terms are (perhaps multiway) clustered. Like bootstrap methods in general, the wild bootstrap is especially useful when conventional inference methods are unreliable because large-sample assumptions do not hold. For example, there may be few clusters, few treated clusters, or weak instruments. The package boottest can perform a wide variety of wild bootstrap tests, often at remarkable speed. It can also invert these tests to construct confidence sets. As a postestimation command, boottest works after linear estimation commands, including regress, cnsreg, ivregress, ivreg2, areg, and reghdfe, as well as many estimation commands based on maximum likelihood. Although it is designed to perform the wild cluster bootstrap, boottest can also perform the ordinary (nonclustered) version. Wrappers offer classical Wald, score/Lagrange multiplier, and Anderson–Rubin tests, optionally with (multiway) clustering. We review the main ideas of the wild cluster bootstrap, offer tips for use, explain why it is particularly amenable to computational optimization, state the syntax of boottest, artest, scoretest, and waldtest, and present several empirical examples.
boottest, artest, waldtest, scoretest, Anderson–Rubin test, Wald test, wild bootstrap, wild cluster bootstrap, score bootstrap, multiway clustering, few treated clusters
http://hdl.handle.net/10.1177/1536867X19830877
David Roodman
James G. MacKinnon
Morten Ørregaard Nielsen
Matthew D. Webb
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:329-3732018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:329-373
article
Confirmatory factor analysis using confa
This article describes the confa command, which fits confirmatory factor analysis models by maximum likelihood and provides diagnostics for the fitted models. Descriptions of the command and its options are given, and some illustrative examples are provided. Copyright 2009 by StataCorp LP.
confa, confa postestimation, bollenstine, Bollen–Stine bootstrap, confirmatory factor analysis, factor scores, Satorra–Bentler corrections
http://www.stata-journal.com/article.html?article=st0169
http://www.stata-journal.com/software/sj9-3/st0169/
Stanislav Kolenikov
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:439-4532018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:439-453
article
Robust regression in Stata
In regression analysis, the presence of outliers in the dataset can strongly distort the classical least-squares estimator and lead to unreliable results. To deal with this, several robust-to-outliers methods have been proposed in the statistical literature. In Stata, some of these methods are available through the rreg and qreg commands. Unfortunately, these methods resist only some specific types of outliers and turn out to be ineffective under alternative scenarios. In this article, we present more effective robust estimators that we implemented in Stata. We also present a graphical tool that recognizes the type of detected outliers.
mmregress, sregress, msregress, mregress, mcd, S-estimators, MM-estimators, outliers, robustness
http://www.stata-journal.com/article.html?article=st0173
http://www.stata-journal.com/software/sj9-3/st0173/
Vincenzo Verardi
Christophe Croux
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:496-4992018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:496-499
article
Stata tip 89: Estimating means and percentiles following multiple imputation
http://www.stata-journal.com/article.html?article=st0205
Peter A. Lachenbruch
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:503-5042018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:503-504
article
Stata tip 91: Putting unabbreviated varlists into local macros
http://www.stata-journal.com/article.html?article=dm0051
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:500-5022018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:500-502
article
Stata tip 90: Displaying partial results
http://www.stata-journal.com/article.html?article=st0206
Martin Weiss
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:281-3122018-03-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:281-312
article
Robust standard errors for panel regressions with cross-sectional dependence
I present a new Stata program, xtscc, that estimates pooled ordinary least-squares/weighted least-squares regression and fixed-effects (within) regression models with Driscoll and Kraay (Review of Economics and Statistics 80: 549–560) standard errors. By running Monte Carlo simulations, I compare the finite-sample properties of the cross-sectional dependence-consistent Driscoll–Kraay estimator with the properties of other, more commonly used covariance matrix estimators that do not account for cross-sectional dependence. The results indicate that Driscoll–Kraay standard errors are well calibrated when cross-sectional dependence is present. However, erroneously ignoring cross-sectional correlation in the estimation of panel models can lead to severely biased statistical results. I illustrate the xtscc program by considering an application from empirical finance. Thereby, I also propose a Hausman-type test for fixed effects that is robust to general forms of cross-sectional and temporal dependence. Copyright 2007 by StataCorp LP.
xtscc, robust standard errors, nonparametric covariance estimation
http://www.stata-journal.com/article.html?article=st0128
http://www.stata-journal.com/software/sj7-3/st0128/
Daniel Hoechle
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:25-292021-01-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:25-29
article
Regressions are commonly misinterpreted: Comments on the article
http://www.stata-journal.com/article.html?article=st0421
J. Scott Long
David M. Drukker
oai:RePEc:tsj:stataj:v:15:y:2015:i:3:p:698-7112021-01-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:3:p:698-711
article
Estimation of mean health care costs and incremental cost-effectiveness ratios with possibly censored data
In this article, we describe the hcost program for estimating mean health care costs and incremental cost-effectiveness ratios with possibly censored data. hcost estimates the mean survival time and the mean costs, as well as their variances and covariance, for a given time horizon. If the group variable is specified, hcost will report the differences between two groups, as well as the incremental cost-effectiveness ratio and its confidence interval (optional). hcost can estimate the mean costs using two methods corresponding to different types of data: the Bang and Tsiatis (2000, Biometrika 87: 329–343) estimator using only the total costs or the Zhao and Tian (2001, Biometrics 57: 1002–1008) estimator when cost-history data are available. The hcost program can also be used to specify the annual discounting rates for survival time and costs. Copyright 2015 by StataCorp LP.
hcost, mean costs, censored data, cost history, cost-effectiveness analysis
http://www.stata-journal.com/article.html?article=st0399
Shuai Chen
Jennifer Rolfes
Hongwei Zhao
oai:RePEc:tsj:stataj:v:15:y:2015:i:4:p:1046-10592021-01-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:4:p:1046-1059
article
Best subsets variable selection in nonnormal regression models
We present a new program, gvselect, that helps users perform variable selection in regression. Best subsets variable selection is performed and provides the user with the best combinations of predictors for each level of model complexity. The leaps-and-bounds (Furnival and Wilson, 1974, Technometrics 16: 499–511) algorithm is applied using the log likelihoods of candidate models. This allows the user to perform variable selection on a wide variety of normal and nonnormal regression models. Our method is described in Lawless and Singhal (1978, Biometrics 34: 318–327). Copyright 2015 by StataCorp LP.
gvselect, regress, vselect, variable selection
http://www.stata-journal.com/article.html?article=st0413
Charles Lindsey
Simon Sheather
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:39-642020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:39-64
article
Sar: Automatic generation of statistical reports using Stata and Microsoft Word for Windows
The output provided by most Stata commands is plain text not suitable to be presented or published. After the numerical and graphical outputs are obtained, the user has to copy them into a word processor to complete the editing process. Some Stata commands help you to obtain well-formatted output, especially tabulated results in LaTeX or other formats, but they are not a complete solution nor are they friendly tools. Stata automatic report (Sar) is an easy-to-use macro for Microsoft Word for Windows that allows a powerful integration between Stata and Word. With Sar, the user can retrieve numerical results and graphs from Stata and automatically insert them into a well-formatted Word document, exploiting all the functions of Word. This process is managed by Word while Stata is executed in the background. Sar requires Stata commands and some specific Sar commands to be written in ordinary Word comments. Thus the report is well documented, and this can encourage the sharing of the workflow of data analysis and the reproducibility of the research. With Sar, the user can create an automatic report, that is, a Word document that can be automatically updated if data have changed. Sar works only on Windows systems. Copyright 2013 by StataCorp LP.
sar, Stata Automation object, report automation, Microsoft Word, reproducible research, Automation, OLE
http://www.stata-journal.com/article.html?article=pr0055
Giovanni L. Lo Magno
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:141-1642020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:141-164
article
Two-stage nonparametric bootstrap sampling with shrinkage correction for clustered data
This article describes a new Stata command, tsb, for performing a stratified two-stage nonparametric bootstrap resampling procedure for clustered data. Estimates for uncertainty around the point estimate, such as standard error and confidence intervals, are derived from the resultant bootstrap samples. A shrinkage estimator proposed for correcting possible overestimation due to second-stage sampling is implemented as default. Although this command is written with cost effectiveness analyses alongside cluster trials in mind, it is applicable to the analysis of continuous endpoints in cluster trials more generally. The use of this command is exemplified with a case study of a cost effectiveness analysis undertaken alongside a cluster randomized trial. We also report bootstrap confidence interval coverage by using data from a published simulation study. Copyright 2013 by StataCorp LP.
tsb, tsbceprob, two-stage nonparametric bootstrap, shrinkage correction, clustered data, cost effectiveness, health economics
http://www.stata-journal.com/article.html?article=st0288
Edmond S.-W. Ng
Richard Grieve
James R. Carpenter
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:114-1352020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:114-135
article
A menu-driven facility for sample-size calculations in cluster randomized controlled trials
We introduce the Stata menu-driven command clustersampsi, which calculates sample sizes, detectable differences, and power for cluster randomized controlled trials. The command permits continuous, binary, and rate outcomes (with normal approximations) for comparisons of two-sided tests in two equal-sized arms. The command allows for specification of the number of clusters available, or the cluster size, or the average cluster size along with an estimate of the variation of cluster sizes. When the number of clusters available is insufficient to detect the required difference at the prespecified power, clustersampsi will return the minimum number of clusters required under the prespecified design along with the minimum detectable difference and maximum achievable power (both for the prespecified number of clusters). Cluster heterogeneity can be parameterized by using either the intracluster correlation or the coefficient of variation. The command is illustrated via examples. Copyright 2013 by StataCorp LP.
clustersampsi, sample size, cluster randomized controlled trials, minimum detectable difference, maximum achievable power
http://www.stata-journal.com/article.html?article=st0286
Karla Hemming
Jen Marsh
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:65-762020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:65-76
article
Within and between estimates in random-effects models: Advantages and drawbacks of correlated random effects and hybrid models
Correlated random-effects (Mundlak, 1978, Econometrica 46: 69–85; Wooldridge, 2010, Econometric Analysis of Cross Section and Panel Data [MIT Press]) and hybrid models (Allison, 2009, Fixed Effects Regression Models [Sage]) are attractive alternatives to standard random-effects and fixed-effects models because they provide within estimates of level 1 variables and allow for the inclusion of level 2 variables. I discuss these models, give estimation examples, and address some complications that arise when interaction effects are included. Copyright 2013 by StataCorp LP.
xtreg, xtmixed, multilevel data, panel data, fixed effects, random effects, correlated random effects, hybrid model
http://www.stata-journal.com/article.html?article=st0283
Reinhard Schunck
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:217-2192020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:217-219
article
Stata tip 114: Expand paired dates to pairs of dates
http://www.stata-journal.com/article.html?article=dm0068
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:185-2052020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:185-205
article
Doubly robust estimation in generalized linear models
A common aim of epidemiological research is to assess the association between a particular exposure and a particular outcome, controlling for a set of additional covariates. This is often done by using a regression model for the outcome, conditional on exposure and covariates. A commonly used class of models is the generalized linear models. The model parameters are typically estimated through maximum likelihood. If the model is correct, then the maximum likelihood estimator is consistent but may otherwise be inconsistent. Recently, a new class of estimators known as doubly robust estimators has been proposed. These estimators use two regression models, one for the outcome and one for the exposure, and are consistent if either model is correct, not necessarily both. Thus doubly robust estimators give the analyst two chances instead of only one to make valid inference. In this article, we describe a new Stata command, drglm, that implements the most common doubly robust estimators for generalized linear models. Copyright 2013 by StataCorp LP.
drglm, doubly robust, generalized linear model
http://www.stata-journal.com/article.html?article=st0290
Nicola Orsini
Rino Bellocco
Arvid Sjolander
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:2202020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:220
article
Software updates
Updates for a previously published package are provided.
Editors
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:165-1842020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:165-184
article
Joint modeling of longitudinal and survival data
The joint modeling of longitudinal and survival data has received remarkable attention in the methodological literature over the past decade; however, the availability of software to implement the methods lags behind. The most common form of joint model assumes that the association between the survival and the longitudinal processes is underlined by shared random effects. As a result, computationally intensive numerical integration techniques such as adaptive Gauss–Hermite quadrature are required to evaluate the likelihood. We describe a new user-written command, stjm, that allows the user to jointly model a continuous longitudinal response and the time to an event of interest. We assume a linear mixed-effects model for the longitudinal submodel, allowing flexibility through the use of fixed or random fractional polynomials of time. Four choices are available for the survival submodel: the exponential, Weibull or Gompertz proportional hazard models, and the flexible parametric model (stpm2). Flexible parametric models are fit on the log cumulative-hazard scale, which has direct computational benefits because it avoids the use of numerical integration to evaluate the cumulative hazard. We describe the features of stjm through application to a dataset investigating the effect of serum bilirubin level on time to death from any cause in 312 patients with primary biliary cirrhosis. Copyright 2013 by StataCorp LP.
stjm, stjmgraph, stjm postestimation, joint modeling, mixed effects, survival analysis, longitudinal data, adaptive Gauss–Hermite quadrature
http://www.stata-journal.com/article.html?article=st0289
Michael J. Crowther
Keith R. Abrams
Paul C. Lambert
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:3-202020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:3-20
article
Stata as a numerical tool for scientific thought experiments: A tutorial with worked examples
Thought experiments based on simulation can be used to explain the impact of the chosen study design, statistical analysis strategy, or the sensitivity of results to fellow researchers. In this article, we demonstrate with two examples how to implement quantitative thought experiments in Stata. The first example uses a large-sample approach to study the impact on the estimated effect size of dichotomizing an exposure variable at different values. The second example uses simulations of datasets of realistic size to illustrate the necessity of using sampling fractions as inverse probability weights in statistical analysis for protection against bias in a complex sampling design. We also give a brief outline of the general steps needed for implementing quantitative thought experiments in Stata. We demonstrate how Stata provides programming facilities for conveniently implementing such thought experiments, with the advantage of saving researchers time, speculation, and debate as well as improving communication in interdisciplinary research groups. Copyright 2013 by StataCorp LP.
quantitative thought experiments, simulations
http://www.stata-journal.com/article.html?article=st0281
Theresa Wimberley
Erik Parner
Henrik Støvring
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:92-1062020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:92-106
article
Regression anatomy, revealed
The regression anatomy theorem (Angrist and Pischke, 2009, Mostly Harmless Econometrics: An Empiricist’s Companion [Princeton University Press]) is an alternative formulation of the Frisch–Waugh–Lovell theorem (Frisch and Waugh, 1933, Econometrica 1: 387–401; Lovell, 1963, Journal of the American Statistical Association 58: 993–1010), a key finding in the algebra of ordinary least-squares multiple regression models. In this article, I present a command, reganat, to implement graphically the method of regression anatomy. This addition complements the built-in Stata command avplot in the validation of linear models, producing bidimensional scatterplots and regression lines obtained by controlling for the other covariates, along with several fine-tuning options. Moreover, I provide 1) a fully worked-out proof of the regression anatomy theorem and 2) an explanation of how the regression anatomy and the Frisch–Waugh–Lovell theorems relate to partial and semipartial correlations, whose coefficients are informative when evaluating relevant variables in a linear regression model. Copyright 2013 by StataCorp LP.
reganat, regression anatomy, Frisch–Waugh–Lovell theorem, linear models, partial correlation, semipartial correlation
http://www.stata-journal.com/article.html?article=st0285
Valerio Filoso
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:136-1402020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:136-140
article
Estimating Geweke’s (1982) measure of instantaneous feedback
In this article, we describe the gwke82 command, which implements a measure of instantaneous feedback for two time series following Geweke (1982, Journal of the American Statistical Association 77: 304–313). Copyright 2013 by StataCorp LP.
gwke82, Geweke, Granger, causality, vector autoregression, time series, instantaneous feedback
http://www.stata-journal.com/article.html?article=st0287
Mehmet F. Dicle
John Levendis
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:206-2112020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:206-211
article
Review of Data Analysis Using Stata, Third Edition, by Kohler and Kreuter
This article reviews Data Analysis Using Stata, Third Edition, by Ulrich Kohler and Frauke Kreuter (2012 [Stata Press]). Copyright 2013 by StataCorp LP.
data analysis, introductory, teaching, German Socio-Economic Panel
http://www.stata-journal.com/article.html?article=gn0056
L. Philip Schumm
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:77-912020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:77-91
article
Trial sequential boundaries for cumulative meta-analyses
We present a new command, metacumbounds, for the estimation of trial sequential monitoring boundaries in cumulative meta-analyses. The approach is based on the Lan–DeMets method for estimating group sequential boundaries in individual randomized controlled trials by using the package ldbounds in R statistical software. Through Stata’s metan command, metacumbounds plots the Lan–DeMets bounds, z-values, and p-values obtained from both fixed and random-effects cumulative meta-analyses. The analysis can be performed with count data or on the hazard scale for time-to-event data. Copyright 2013 by StataCorp LP.
metacumbounds, trial sequential analysis, cumulative meta-analysis, information size, Lan–DeMets bounds, monitoring boundary, cumulative z score, heterogeneity
http://www.stata-journal.com/article.html?article=st0284
Branko Miladinovic
Iztok Hozo
Benjamin Djulbegovic
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:107-1132020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:107-113
article
kmlmap: A Stata command for producing Google’s Keyhole Markup Language
kmlmap produces a Keyhole Markup Language file by using Stata data and geographical coordinates. This file produces a map when uploaded to Google Maps. The resulting map is a so-called choropleth or thematic map, where units of analysis are colored according to values of a variable. You can click on units of analysis on the map to display more information, to zoom, and to do all other things that Google Maps can do. The units of analysis can be points or polygons. Copyright 2013 by StataCorp LP.
kmlmap, geographic information systems, Keyhole Markup Language, XML, map, Google Maps, thematic, choropleth
http://www.stata-journal.com/article.html?article=gr0055
Adam Okulicz-Kozaryn
oai:RePEc:tsj:stataj:v:13:y:2013:i:1:p:21-382020-09-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:1:p:21-38
article
Versatile sample-size calculation using simulation
I present a new Stata command, simsam, that uses simulation to determine the sample size required to achieve a given statistical power for any hypothesis test under any probability model that can be programmed in Stata. simsam returns the smallest sample size (or smallest multiple of 5, 10, or some other user-specified increment) so that the estimated power exceeds the target. The user controls the precision of the power estimate, and power is reported with a confidence interval. The sample size returned is reliable to the extent that if simsam is repeated, it will, nearly every time, give a sample size no more than one increment away. Copyright 2013 by StataCorp LP.
simsam, sample size, power, simulation
http://www.stata-journal.com/article.html?article=st0282
Richard Hooper
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:975-9902018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:975-990
article
Speaking Stata: Design plots for graphical summary of a response given factors
Design plots, as defined in this article, show summaries of a response variable given the classes or distinct levels of numeric or string variables presented as influencing factors. Any summarize results can be plotted using statsby as an engine to produce summaries for groups of observations defined by classes and their cross-combinations. graph dot is used by default, but graphs may readily be recast using graph hbar or graph bar. Such plots offer scope for detailed yet concise data exploration and reporting. Copyright 2014 by StataCorp LP.
designplot, design plots, graphics, grmeanby, statsby, summarize
http://www.stata-journal.com/article.html?article=gr0061
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:909-9462018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:909-946
article
Robust data-driven inference in the regression-discontinuity design
In this article, we introduce three commands to conduct robust data-driven statistical inference in regression-discontinuity (RD) designs. First, we present rdrobust, a command that implements the robust bias-corrected confidence intervals proposed in Calonico, Cattaneo, and Titiunik (2014d, Econometrica 82: 2295–2326) for average treatment effects at the cutoff in sharp RD, sharp kink RD, fuzzy RD, and fuzzy kink RD designs. This command also implements other conventional nonparametric RD treatment-effect point estimators and confidence intervals. Second, we describe the companion command rdbwselect, which implements several bandwidth selectors proposed in the RD literature. Following the results in Calonico, Cattaneo, and Titiunik (2014a, Working paper, University of Michigan), we also introduce rdplot, a command that implements several data-driven choices of the number of bins in evenly spaced and quantile-spaced partitions that are used to construct the RD plots usually encountered in empirical applications. A companion R package is described in Calonico, Cattaneo, and Titiunik (2014b, Working paper, University of Michigan). Copyright 2014 by StataCorp LP.
rdrobust, rdbwselect, rdplot, regression discontinuity (RD), sharp RD, sharp kink RD, fuzzy RD, fuzzy kink RD, treatment effects, local polynomials, bias correction, bandwidth selection, RD plots
http://www.stata-journal.com/article.html?article=st0366
Sebastian Calonico
Matias D. Cattaneo
Rocio Titiunik
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:830-8462018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:830-846
article
iop: Estimating ex-ante inequality of opportunity
This article describes the user-written command iop to estimate ex-ante inequality of opportunity for different types of variables. Inequality of opportunity is the part of inequality that is due to circumstances beyond the control of the individual. Therefore, it is the ethically offensive part of inequality. Several estimation procedures have been proposed over the past years, and iop is a comprehensive and easy-to-use command that implements many of them. It handles continuous, dichotomous, and ordered variables. In addition to the point estimates, iop also provides bootstrap standard errors and two decomposition methods. Copyright 2014 by StataCorp LP.
iop, inequality of opportunity, dissimilarity index, mean log deviation, decomposition
http://www.stata-journal.com/article.html?article=st0361
Florian Wendelspiess Chavez Juarez
Isidro Soloaga
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:863-8832018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:863-883
article
Analysis of partially observed clustered data using generalized estimating equations and multiple imputation
Clustered data arise in many settings, particularly within the social and biomedical sciences. For example, multiple-source reports are commonly collected in child and adolescent psychiatric epidemiologic studies where researchers use various informants (for instance, parents and adolescents) to provide a holistic view of a subject’s symptoms. Fitzmaurice et al. (1995, American Journal of Epidemiology 142: 1194–1203) have described estimation of multiple-source models using a standard generalized estimating equation (GEE) framework. However, these studies often have missing data because additional stages of consent and assent are required. The usual GEE is unbiased when data are missing completely at random in the context of Little and Rubin (2002, Statistical Analysis with Missing Data [Wiley]). This is a strong assumption that may not be tenable. Other options, such as the weighted GEE, are computationally challenging when missingness is nonmonotone. Multiple imputation is an attractive method to fit incomplete data models while requiring only the less restrictive missing-at-random assumption. Previously, estimation of partially observed clustered data was computationally challenging. However, recent developments in Stata have facilitated using them in practice. We demonstrate how to use multiple imputation in conjunction with a GEE to investigate the prevalence of eating disorder symptoms in adolescents as reported by parents and adolescents and to determine the factors associated with concordance and prevalence. The methods are motivated by the Avon Longitudinal Study of Parents and their Children, a cohort study that enrolled more than 14,000 pregnant mothers in 1991–92 and has followed the health and development of their children at regular intervals. 
While point estimates for the missing-at-random model were fairly similar to those for the GEE under missing completely at random, the missing-at-random model had smaller standard errors and required less stringent assumptions regarding missingness. Copyright 2014 by StataCorp LP.
ALSPAC study, eating disorders, multiple informants, weighted estimating equations, generalized estimating equations, multiple imputation, missing data, missing at random, missing completely at random
http://www.stata-journal.com/article.html?article=st0363
Kathryn M. Aloisio
Sonja A. Swanson
Nadia Micali
Alison Field
Nicholas J. Horton
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:884-8942018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:884-894
article
Lee (2009) treatment-effect bounds for nonrandom sample selection
Nonrandom sample selection may render estimated treatment effects biased even if assignment of treatment is purely random. Lee (2009, Review of Economic Studies 76: 1071–1102) proposes an estimator for treatment-effect bounds that limit the possible range of the treatment effect. In this approach, the lower and upper bound correspond to extreme assumptions about the missing information that are consistent with the observed data. In contrast to conventional parametric approaches to correcting for sample-selection bias, Lee's bounds estimator rests on very few assumptions. I introduce the new command leebounds, which implements the estimator in Stata. The command allows for several options, such as tightening bounds by using covariates. Copyright 2014 by StataCorp LP.
leebounds, nonparametric, randomized trial, sample selection, attrition, bounds, treatment effect
http://www.stata-journal.com/article.html?article=st0364
Harald Tauchmann
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:9972018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:997
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:11:y:2011:i:4:p:6342018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:4:p:634
article
Software updates
Updates for a previously published package are provided. Copyright 2011 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:895-9082018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:895-908
article
General-to-specific modeling in Stata
Empirical researchers are frequently confronted with issues regarding which explanatory variables to include in their models. This article describes the application of a well-known model-selection algorithm to Stata: general-to-specific (GETS) modeling. This process provides a prescriptive and defendable way of selecting a few relevant variables from a large list of potentially important variables when fitting a regression model. Several empirical issues in GETS modeling are then discussed, specifically, how such an algorithm can be applied to estimations based upon cross-sectional, time-series, and panel data. A command is presented, written in Stata and Mata, that implements this algorithm for various data types in a flexible way. This command is based on Stata’s regress or xtreg command, so it is suitable for researchers in the broad range of fields where regression analysis is used. Finally, the genspec command is illustrated using data from applied studies of GETS modeling with Monte Carlo simulation. It is shown to perform as empirically predicted and to have good size and power (or gauge and potency) properties under simulation. Copyright 2014 by StataCorp LP.
genspec, model selection, general to specific, statistical analysis, specification tests
http://www.stata-journal.com/article.html?article=st0365
Damian Clarke
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:708-7372018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:708-737
article
Plotting regression coefficients and other estimates
Graphical display of regression results has become increasingly popular in presentations and in scientific literature because graphs are often much easier to read than tables. Such plots can be produced in Stata by the marginsplot command (see [R] marginsplot). However, while marginsplot is versatile and flexible, it has two major limitations: it can only process results left behind by margins (see [R] margins), and it can handle only one set of results at a time. In this article, I introduce a new command called coefplot that overcomes these limitations. It plots results from any estimation command and combines results from several models into one graph. The default behavior of coefplot is to plot markers for coefficients and horizontal spikes for confidence intervals. However, coefplot can also produce other types of graphs. I illustrate the capabilities of coefplot by using a series of examples. Copyright 2014 by StataCorp LP.
coefplot, marginsplot, margins, regression plot, coefficients plot, ropeladder plot
http://www.stata-journal.com/article.html?article=gr0059
Ben Jann
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:991-9962018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:991-996
article
Stata tip 121: Box plots side by side
http://www.stata-journal.com/article.html?article=gr0062
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:756-7772018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:756-777
article
Estimation of multiprocess survival models with cmp
Multilevel multiprocess hazard models are routinely used by demographers to control for endogeneity and selection effects. These models consist of multilevel proportional hazards equations, and possibly probit equations, with correlated random effects. Although Stata currently lacks a specialized command for fitting systems of multilevel proportional hazards models, systems of seemingly unrelated lognormal survival models can be fit with the user-written cmp command (Roodman 2011, Stata Journal 11: 159–206). In this article, we describe multiprocess survival models and demonstrate theoretical and practical aspects of estimation. We also illustrate the application of the cmp command using examples related to demographic research. The examples use a dataset shipped with the statistical software aML. Copyright 2014 by StataCorp LP.
survival analysis, multilevel analysis, multilevel multiprocess hazard model, simultaneous equations, SUR estimation, cmp
http://www.stata-journal.com/article.html?article=st0358
Tamas Bartus
David Roodman
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:847-8622018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:847-862
article
femlogit-Implementation of the multinomial logit model with fixed effects
Fixed-effects models have become increasingly popular in social-science research. The possibility to control for unobserved heterogeneity makes these models a prime tool for causal analysis. Fixed-effects models have been derived and implemented for many statistical software packages for continuous, dichotomous, and count-data dependent variables. Chamberlain (1980, Review of Economic Studies 47: 225–238) derived the multinomial logistic regression with fixed effects. However, this model has not yet been implemented in any statistical software package. Possible applications would be analyses of effects on employment status, with special consideration of part-time or irregular employment, and analyses of effects on voting behavior that implicitly control for long-time party identification rather than measuring it directly. This article introduces an implementation of this model with the new command femlogit. I show its application with British election panel data. Copyright 2014 by StataCorp LP.
femlogit, multinomial logit, fixed effects, panel data, multilevel data, unobserved heterogeneity, discrete choice, random effects, conditional logit
http://www.stata-journal.com/article.html?article=st0362
Klaus Pforr
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:738-7552018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:738-755
article
Tools for checking calibration of a Cox model in external validation: Approach based on individual event probabilities
The Cox proportional hazards model has been used extensively in medicine over the last 40 years. A popular application is to develop a multivariable prediction model, often a prognostic model to predict the clinical outcome of patients with a particular disorder from "baseline" factors measured at some initial time point. For such a model to be useful in practice, it must be "validated"; that is, it must perform satisfactorily in an external sample of patients independent of the sample on which the model was originally developed. One key aspect of performance is calibration, which is the accuracy of prediction, particularly of survival (or equivalently, failure or event) probabilities at any time after the time origin. We believe systematic evaluation of the calibration of a Cox model has been largely ignored in the literature. In this article, we suggest an approach to assessing calibration using individual event probabilities estimated at different time points. We exemplify the method by detailed analysis of two datasets in the disease primary biliary cirrhosis; the datasets comprise a derivation and a validation dataset. We describe a new command, stcoxcal, that performs the necessary calculations. Results for stcoxcal can be displayed graphically, which makes it easier for users to picture calibration (or lack thereof) according to follow-up time. Copyright 2014 by StataCorp LP.
stcoxcal, Cox proportional hazards model, multivariable model, prognostic factors, external validation, calibration, survival probabilities
http://www.stata-journal.com/article.html?article=st0357
Patrick Royston
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:759-7752018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:759-775
article
The chi-squared goodness-of-fit test for count-data models
In this article, we discuss the implementation of Andrews’s (1988a, Journal of Econometrics 37: 135–156; 1988b, Econometrica 56: 1419–1453) chi-squared goodness-of-fit test as a postestimation command. The new command chi2gof reports the test statistic, its degrees of freedom, and its p-value. chi2gof can be used after the poisson, nbreg, zip, and zinb commands. Copyright 2014 by StataCorp LP.
chi2gof, Andrews’s chi-squared goodness-of-fit test, m-tests, count-data models
http://www.stata-journal.com/article.html?article=st0360
Miguel Manjon
Oscar Martinez
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:965-9742018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:965-974
article
Collecting and organizing Stata graphs
Stata includes a powerful set of tools for constructing a wide array of graphs. Stata's graphing capabilities are well suited for describing exploratory or preliminary data analyses as well as producing publication-quality graphics. Currently, however, Stata does not have a built-in suite of commands for constructing various types of files (for example, HTML, TeX, or RTF files) to display multiple graphs. Such files can be invaluable for organizing and facilitating the interpretation of the numerous graphs needed throughout an analysis or in the final stage of a project. In this article, we provide an overview of two commands, graphsto and graphout, designed to organize and process multiple graphs across various file types. Copyright 2014 by StataCorp LP.
graphsto, graphout, graph, graph export, HTML, RTF, TeX
http://www.stata-journal.com/article.html?article=gr0060
Joseph D. Wolfe
Shawn Bauldry
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:947-9642018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:947-964
article
adjcatlogit, ccrlogit, and ucrlogit: Fitting ordinal logistic regression models
In this article, I present three commands that perform adjacent-category logistic regression (adjcatlogit), constrained continuation-ratio logistic regression (ccrlogit), and unconstrained continuation-ratio logistic regression (ucrlogit) for ordered response data. Copyright 2013 by StataCorp LP.
adjcatlogit, ccrlogit, ucrlogit, ordinal models, ordered regression, logistic models, adjacent category, continuation ratio
http://www.stata-journal.com/article.html?article=st0367
Morten W. Fagerland
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:778-7972018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:778-797
article
dhreg, xtdhreg, and bootdhreg: Commands to implement double-hurdle regression
The dhreg command implements maximum likelihood estimation of the double-hurdle model for continuously distributed outcomes. The command includes the option to fit a p-tobit model, that is, a model that estimates only an intercept for the hurdle equation. The bootdhreg command (the bootstrap version of dhreg) may be convenient if the data-generating process is more complicated or if heteroskedasticity is suspected. The xtdhreg command is a random-effects version of dhreg applicable to panel data. However, this estimator differs from standard random-effects estimators in the sense that the outcome of the first hurdle applies to the complete set of observations for a given subject instead of applying at the level of individual observations. Command options include estimation of a correlation parameter capturing dependence between the two hurdles. Copyright 2014 by StataCorp LP.
dhreg, xtdhreg, bootdhreg, hurdle model, double-hurdle model, random-effects double-hurdle model, tobit, p-tobit, inverse Mills ratio, bootstrapping
http://www.stata-journal.com/article.html?article=st0359
Christoph Engel
Peter G. Moffatt
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:703-7072018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:703-707
article
The Stata Journal Editors' Prize 2014: Roger Newson
http://www.stata-journal.com/article.html?article=gn0062
H. Joseph Newton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:13:y:2013:i:4:p:669-6712018-09-20RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:4:p:669-671
article
The Stata Journal Editors' Prize 2013: Erik Thorlund Parner and Per Kragh Andersen
http://www.stata-journal.com/article.html?article=gn0058
H. Joseph Newton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:682-6852018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:682-685
article
Stata tip 68: Week assumptions
http://www.stata-journal.com/article.html?article=dm0052
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:465-5062018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:465-506
article
Enhanced routines for instrumental variables/generalized method of moments estimation and testing
We extend our 2003 paper on instrumental variables and generalized method of moments estimation, and we test and describe enhanced routines that address heteroskedasticity- and autocorrelation-consistent standard errors, weak instruments, limited-information maximum likelihood and k-class estimation, tests for endogeneity and Ramsey's regression specification-error test, and autocorrelation tests for instrumental variable estimates and panel-data instrumental variable estimates. Copyright 2007 by StataCorp LP.
ivactest, ivendog, ivhettest, ivreg2, ivreset, overid, ranktest, instrumental variables, weak instruments, GMM, endogeneity, heteroskedasticity, serial correlation, HAC standard errors, LIML, CUE, overidentifying restrictions, Frisch-Waugh-Lovell theorem, RESET, Cumby-Huizinga test
http://www.stata-journal.com/article.html?article=st0030_3
http://www.stata-journal.com/software/sj7-4/st0030_3/
Christopher F Baum
Mark E. Schaffer
Steven Stillman
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:542-5552018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:542-555
article
A generic function evaluator implemented in Mata
When implementing new statistical procedures, there is often a need for simple--and yet computationally efficient--ways of numerically evaluating composite distribution functions. If the statistical procedure must support calculations for censored and noncensored cases, those calculations should be carried out using efficient computational implementations of both definite and indefinite integrals (e.g., calculation of tail areas of distribution functions). We developed a generic function evaluator such that users may specify a function using reverse Polish notation. As its argument the function evaluator takes a matrix of pointers and then applies the rows of this matrix to its internally defined stack of pointers. Accordingly, each row of the argument matrix defines a single operation such as evaluating a function on the current element of the stack, applying an algebraic operation to the two top elements of the stack, or manipulating the stack itself. Defining new composite distribution functions from other (atomic) distribution functions then corresponds to joining two or more function-defining matrices vertically. This approach can further be used to obtain integrals of any defined function. As an example we show how the density and distribution function for the minimum of two Weibull distributed random variables can be numerically evaluated and integrated. The procedure provides a flexible and extensible framework for implementing numerical evaluation of general, composite distributions. The procedure is numerically relatively efficient, although not optimal. Copyright 2007 by StataCorp LP.
rpnfcn(), RPN, Mata
http://www.stata-journal.com/article.html?article=pr0034
http://www.stata-journal.com/software/sj7-4/pr0034/
Henrik Stovring
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:689-6902018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:689-690
article
Stata tip 93: Handling multiple y axes on twoway graphs
http://www.stata-journal.com/article.html?article=gr0047
Vince Wiggins
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:445-4642018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:445-464
article
Multiple imputation of missing values: further update of ice, with an emphasis on interval censoring
Multiple imputation of missing data continues to be a topic of considerable interest and importance to applied researchers. In this article, the ice package for multiple imputation is further updated. Special attention in this article is paid to imputing interval-censored observations, and a suggestion to use imputation of right-censored survival data to elucidate covariate effects graphically. Copyright 2007 by StataCorp LP.
ice, uvis, micombine, ice reformat, multiple imputation, interval censoring, visualization, censored survival data
http://www.stata-journal.com/article.html?article=st0067_3
http://www.stata-journal.com/software/sj7-4/st0067_3/
Patrick Royston
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:507-5412018-06-14RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:507-541
article
Causal inference with observational data
Problems with inferring causal relationships from nonexperimental data are briefly reviewed, and four broad classes of methods designed to allow estimation of and inference about causal parameters are described: panel regression, matching or reweighting, instrumental variables, and regression discontinuity. Practical examples are offered, and discussion focuses on checking required assumptions to the extent possible. Copyright 2007 by StataCorp LP.
xtreg, psmatch2, nnmatch, ivreg, ivreg2, ivregress, rd, lpoly, xtoverid, ranktest, causal inference, match, matching, reweighting, propensity score, panel, instrumental variables, excluded instrument, weak identification, regression, discontinuity, local polynomial
http://www.stata-journal.com/article.html?article=st0136
http://www.stata-journal.com/software/sj7-4/st0136/
Austin Nichols
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:335-3472020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:335-347
article
Tests and confidence sets with correct size when instruments are potentially weak
We consider inference in the linear regression model with one endogenous variable and potentially weak instruments. We construct confidence sets for the coefficient on the endogenous variable by inverting the Anderson-Rubin, Lagrange multiplier, and conditional likelihood-ratio tests. Our confidence sets have correct coverage probabilities even when the instruments are weak. We propose a numerically simple algorithm for finding these confidence sets, and we present a Stata command that supersedes the one presented in Moreira and Poi (Stata Journal 3: 57–70). Copyright 2006 by StataCorp LP.
condivreg, instrumental variables, weak instruments, confidence set, similar test
http://www.stata-journal.com/article.html?article=st0033_2
http://www.stata-journal.com/software/sj6-3/st0033_2/
Anna Mikusheva
Brian P. Poi
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:348-3632020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:348-363
article
Graphical representation of interactions
We provide a program to illustrate interactions between treatment and covariates or between two covariates by using forest plots under either the Cox proportional hazards or the logistic regression model. The program is flexible in both the possibility of illustrating more than one interaction at a time and variable specifications of scale.
fintplot, interaction, forest plot, randomized controlled trial, survival analysis, logistic regression
http://www.stata-journal.com/article.html?article=gr0024
http://www.stata-journal.com/software/sj6-3/gr0024/
Friederike Maria-Sophie Barthel
Patrick Royston
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:420-4242020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:420-424
article
Review of A Gentle Introduction to Stata by Acock
This article reviews A Gentle Introduction to Stata by Acock.
introductory statistics, social science, teaching Stata
http://stata-press.com/books/acock-review.pdf
http://www.stata-journal.com/article.html?article=gn0033
Michael Mulcahy
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:425-4272020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:425-427
article
Stata tip 34: Tabulation by listing
http://www.stata-journal.com/article.html?article=dm0023
David A. Harrison
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:422-4382020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:422-438
article
A seasonal unit-root test with Stata
Many economic time series exhibit important systematic fluctuations within the year, i.e., seasonality. In contrast to usual practice, I argue that using original data should always be considered, although the process is more complicated than that of using seasonally adjusted data. Motivations to use unadjusted data come from the information contained in their peaks and troughs and from economic theory. One major complication is the possible unit root at seasonal frequencies. In this article, I tackle the issue of implementing a test to identify the source of seasonality. In particular, I follow Hylleberg et al. (1990, Journal of Econometrics 44: 215–238) for quarterly data. Copyright 2009 by StataCorp LP.
sroot, unit roots, seasonality, seasonal unit root, HEGY
http://www.stata-journal.com/article.html?article=st0172
http://www.stata-journal.com/software/sj9-3/st0172/
Domenico Depalo
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:478-4962020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:478-496
article
Speaking Stata: Creating and varying box plots
Box plots have been a standard statistical graph since John W. Tukey and his colleagues and students publicized them energetically in the 1970s. In Stata, graph box and graph hbox are commands available to draw box plots, but sometimes neither is sufficiently flexible for drawing some variations on standard box plot designs. This column explains how to use egen to calculate the statistical ingredients needed for box plots and twoway to re-create the plots themselves. That then allows variations such as adding means, connecting medians, or showing all data points beyond certain quantiles.
box plots, dispersion diagrams, distributions, egen, graphics, percentile, quantile, range bars, twoway
http://www.stata-journal.com/article.html?article=gr0039
http://www.stata-journal.com/software/sj9-3/gr0039/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:428-4292020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:428-429
article
Stata tip 35: Detecting whether data have changed
http://www.stata-journal.com/article.html?article=dm0024
William Gould
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:401-4122020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:401-412
article
Mata Matters: Macros
Mata is Stata's matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. Macros are the subject of this column.
Mata, macros
http://www.stata-journal.com/article.html?article=pr0040
http://www.stata-journal.com/software/s8-3/pr0040/
William Gould
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:398-4212020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:398-421
article
Implementing weak-instrument robust tests for a general class of instrumental-variables models
We present a minimum distance approach for conducting hypothesis testing in the presence of potentially weak instruments. Under this approach, we propose size-correct tests for limited dependent variable models with endogenous explanatory variables such as endogenous tobit and probit models. Additionally, we extend weak-instrument tests for the linear instrumental-variables model by allowing for variance–covariance estimation that is robust to arbitrary heteroskedasticity or intracluster dependence. We invert these tests to construct confidence intervals on the coefficient of the endogenous variable. We also provide a postestimation command for Stata, called rivtest, for computing the tests and estimating confidence intervals. Copyright 2009 by StataCorp LP.
rivtest, ivregress, ivprobit, ivtobit, condivreg, ivreg2, weak instruments, endogenous tobit, endogenous probit, two-stage least squares, hypothesis testing, confidence intervals
http://www.stata-journal.com/article.html?article=st0171
http://www.stata-journal.com/software/sj9-3/st0171/
Keith Finlay
Leandro M. Magnusson
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:364-3762020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:364-376
article
Jackknife instrumental variables estimation in Stata
The two-stage least-squares (2SLS) instrumental variables estimator is commonly used to address endogeneity. However, the estimator suffers from bias that is exacerbated when the instruments are only weakly correlated with the endogenous variables and when many instruments are used. In this article, I discuss jackknife instrumental variables estimation as an alternative to 2SLS. Monte Carlo simulations comparing the jackknife instrumental variables estimators to 2SLS and limited information maximum likelihood (LIML) show that two of the four variants perform remarkably well even when 2SLS does not. In a weak-instrument experiment, the two best performing jackknife estimators also outperform LIML. Copyright 2006 by StataCorp LP.
jive, 2SLS, LIML, JIVE, instrumental variables, endogeneity, weak instruments
http://www.stata-journal.com/article.html?article=st0108
http://www.stata-journal.com/software/sj6-3/st0108/
Brian P. Poi
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:384-3862020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:384-386
article
Importing Federal Reserve economic data
This note describes freduse, which imports datasets from the Federal Reserve economic data (FRED) repository. Copyright 2006 by StataCorp LP.
freduse, Federal Reserve economic data repository
http://www.stata-journal.com/article.html?article=st0110
http://www.stata-journal.com/software/sj6-3/st0110/
David M. Drukker
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:387-3962020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:387-396
article
Mata Matters: Interactive use
Mata is Stata’s matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this quarter’s column, we look at interactive use of Mata.
Mata, interactive use
http://www.stata-journal.com/article.html?article=pr0024
http://www.stata-journal.com/software/sj6-3/pr0024/
William Gould
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:377-3832020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:377-383
article
Difference-based semiparametric estimation of partial linear regression models
This article describes the plreg command, which implements the difference-based algorithm for fitting partial linear regression models. Copyright 2006 by StataCorp LP.
plreg, nonparametric regression, difference-based estimator, partial linear regression
http://www.stata-journal.com/article.html?article=st0109
http://www.stata-journal.com/software/sj6-3/st0109/
Michael Lokshin
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:430-4322020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:430-432
article
Stata tip 36: Which observations?
http://www.stata-journal.com/article.html?article=dm0025
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:309-3342020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:309-334
article
Confidence intervals for rank statistics: Somers' D and extensions
Somers' D is an asymmetric measure of association between two variables, which plays a central role as a parameter behind rank or nonparametric statistical methods. Given predictor variable X and outcome variable Y, we may estimate D(Y|X) as a measure of the effect of X on Y, or we may estimate D(X|Y) as a performance indicator of X as a predictor of Y. The somersd package allows the estimation of Somers' D and Kendall's tau-a with confidence limits as well as p-values. The Stata 9 version of somersd can estimate extended versions of Somers' D not previously available, including the Gini index, the parameter tested by the sign test, and extensions to left- or right-censored data. It can also estimate stratified versions of Somers' D, restricted to pairs in the same stratum. Therefore, it is possible to define strata by grouping values of a confounder, or of a propensity score based on multiple confounders, and to estimate versions of Somers' D that measure the association between the outcome and the predictor, adjusted for the confounders. The Stata 9 version of somersd uses the Mata language for improved computational efficiency with large datasets. Copyright 2006 by StataCorp LP.
somersd, Somers' D, Kendall's tau-a, Harrell's c, ROC area, Gini index, population-attributable risk, rank correlation, rank-sum test, Wilcoxon test, sign test, confidence intervals, nonparametric methods, propensity score
http://www.stata-journal.com/article.html?article=snp15_6
http://www.stata-journal.com/software/sj6-3/snp15_6/
Roger Newson
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:208-3082020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:208-308
article
Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables
Studying behavior in economics, sociology, and statistics often involves fitting models in which the response variable depends on a dummy variable, also known as a regime-switch variable, or in which the response variable is observed only if a particular selection condition is met. In either case, standard regression techniques deliver inconsistent estimators if unobserved factors that affect the response are correlated with unobserved factors that affect the switching or selection variable. Consistent estimators can be obtained by maximum likelihood estimation of a joint model of the outcome and switching or selection variable. This article describes a "wrapper" program, ssm, that calls gllamm (Rabe-Hesketh, Skrondal, and Pickles, GLLAMM Manual [University of California–Berkeley, Division of Biostatistics, Working Paper Series, Paper No. 160]) to fit such models. The wrapper accepts data in a simple structure, has a straightforward syntax, and reports output that is easily interpretable. One important feature of ssm is that the log likelihood can be evaluated using adaptive quadrature (Rabe-Hesketh, Skrondal, and Pickles, Stata Journal 2: 1–21; Journal of Econometrics 128: 301–323). Copyright 2006 by StataCorp LP.
endogenous switching, sample selection, binary variable, count data, ordinal variable, probit, Poisson regression, adaptive quadrature, gllamm, wrapper, ssm
http://www.stata-journal.com/article.html?article=st0107
http://www.stata-journal.com/software/sj6-3/st0107/
Alfonso Miranda
Sophia Rabe-Hesketh
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:397-4192020-11-26RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:397-419
article
Speaking Stata: Graphs for all seasons
Time series showing seasonality, marked variation with time of year, are of interest to many scientists, including climatologists, other environmental scientists, epidemiologists, and economists. The usual graphs plotting response variables against time, or even time of year, are not always the most effective at showing the fine structure of seasonality. I survey various modifications of the usual graphs and other kinds of graphs with a range of examples. Although I introduce here two new Stata commands, cycleplot and sliceplot, I emphasize exploiting standard functions, data management commands, and graph options to get the graphs desired.
cycleplot, sliceplot, seasonality, time series, graphics, cycle plot, rotation, state space, incidence plots, folding, repeating
http://www.stata-journal.com/article.html?article=gr0025
http://www.stata-journal.com/software/sj6-3/gr0025/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:605-6222018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:605-622
article
Graphical augmentations to the funnel plot to assess the impact of a new study on an existing meta-analysis
Funnel plots are currently advocated to investigate the presence of publication bias (and other possible sources of bias) in meta-analysis. A previously described augmentation to the funnel plot—to aid its interpretation in assessing publication biases—is the addition of statistical contours indicating regions where studies would have to be for a given level of significance, as implemented in the Stata package confunnel by Palmer et al. (2008, Stata Journal 8: 242–254). In this article, we describe the implementation of a new range of overlay augmentations to the funnel plot, many described in detail recently by Langan et al. (2012, Journal of Clinical Epidemiology 65: 511–519). The purpose of these overlays is to display the potential impact a new study would have on an existing meta-analysis, providing an indication of the robustness of the meta-analysis to the addition of new evidence. Thus these overlays extend the use of the funnel plot beyond assessments of publication biases. Two main graphical displays are described: 1) statistical significance contours, which define regions of the funnel plot where a new study would have to be located to change the statistical significance of the meta-analysis; and 2) heterogeneity contours, which show how a new study would affect the extent of heterogeneity in a given meta-analysis. We present the extfunnel command, which implements the methods of Langan et al. (2012, Journal of Clinical Epidemiology 65: 511–519), and, furthermore, we extend the graphical displays to illustrate the impact a new study has on lower and upper confidence interval values and the confidence interval width of the pooled meta-analytic result. We also describe overlays for the impact of a future study on user-defined limits of clinical equivalence. We implement inverse-variance weighted methods by using both explicit formulas for contour lines and a simulation approach optimized in Mata. Copyright 2012 by StataCorp LP.
extfunnel, funnel plots, meta-analysis, graphs
http://www.stata-journal.com/article.html?article=gr0054
Michael J. Crowther
Dean Langan
Alex J. Sutton
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:568-5842018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:568-584
article
Frequentist q-values for multiple-test procedures
Multiple-test procedures are increasingly important as technology increases scientists' ability to make large numbers of multiple measurements, as they do in genome scans. Multiple-test procedures were originally defined to input a vector of input p-values and an uncorrected critical p-value, interpreted as a familywise error rate or a false discovery rate, and to output a corrected critical p-value and a discovery set, defined as the subset of input p-values that are at or below the corrected critical p-value. A range of multiple-test procedures is implemented using the smileplot package in Stata (Newson and the ALSPAC Study Team 2003, Stata Journal 3: 109–132; 2010, Stata Journal 10: 691–692). The qqvalue command uses an alternative formulation of multiple-test procedures, which is also used by the R function p.adjust. qqvalue inputs a variable of p-values and outputs a variable of q-values that are equal in each observation to the minimum familywise error rate or false discovery rate that would result in the inclusion of the corresponding p-value in the discovery set if the specified multiple-test procedure was applied to the full set of input p-values. Formulas and examples are presented. Copyright 2010 by StataCorp LP.
qqvalue, smileplot, multproc, p.adjust, R, multiple-test procedure, data mining, familywise error rate, false discovery rate, Bonferroni, Šidák, Holm, Holland, Copenhaver, Hochberg, Simes, Benjamini, Yekutieli
http://www.stata-journal.com/article.html?article=st0209
Roger B. Newson
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:413-4392018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:413-439
article
Speaking Stata: Correlation with confidence, or Fisher's z revisited
Ronald Aylmer Fisher suggested transforming correlations by using the inverse hyperbolic tangent, or atanh function, a device often called Fisher's z transformation. This article reviews that function and its inverse, the hyperbolic tangent, or tanh function, with discussions of their definitions and behavior, their use in statistical inference with correlations, and how to apply them in Stata. Examples show the use of Stata and Mata in calculator style. New commands corrci and corrcii are also presented for correlation confidence intervals. The results of using bootstrapping to produce confidence intervals for correlations are also compared. Various historical comments are sprinkled throughout.
corrci, corrcii, correlation, confidence intervals, Fisher's z, transformation, bootstrap, Mata
http://www.stata-journal.com/article.html?article=pr0041
http://www.stata-journal.com/software/sj8-3/pr0041/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:520-5312018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:520-531
article
A closer examination of subpopulation analysis of complex-sample survey data
In recent years, general-purpose statistical software packages have incorporated new procedures that feature several useful options for design-based analysis of complex-sample survey data. A common and frequently desired technique for analysis of survey data in practice is the restriction of estimation to a subpopulation of interest. These subpopulations are often referred to interchangeably in a variety of fields as subclasses, subgroups, and domains. In this article, we consider two approaches that analysts of complex-sample survey data can follow when analyzing subpopulations; we also consider the implications of each approach for estimation and inference. We then present examples of both approaches, using selected procedures in Stata to analyze data from the National Hospital Ambulatory Medical Care Survey (NHAMCS). We conclude with important considerations for subpopulation analyses and a summary of suggestions for practice. Copyright 2008 by StataCorp LP.
survey data analysis, statistical software, complex sample designs, subpopulation analysis
http://www.stata-journal.com/article.html?article=st0153
http://www.stata-journal.com/software/sj8-4/st0153/
Brady T. West
Patricia Berglund
Steven G. Heeringa
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:461-4782018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:461-478
article
Partial frontier efficiency analysis
Despite their frequent use in applied work, nonparametric approaches to efficiency analysis, namely, data envelopment analysis and free disposal hull, have bad reputations among econometricians. This is mainly because data envelopment analysis and free disposal hull represent deterministic approaches that are highly sensitive to outliers and measurement errors. However, so-called partial frontier approaches have recently been developed, namely, order-m and order-α. These approaches generalize free disposal hull by allowing for superefficient observations to be located beyond the estimated production-possibility frontier. Although these methods are also purely nonparametric, the sensitivity to outliers is substantially reduced by partial frontier approaches enveloping just a subsample of observations. In this article, I introduce the new Stata commands orderm and orderalpha, which implement order-m, order-α, and free disposal hull efficiency analysis in Stata. The commands allow for several options, such as statistical inference based on subsampling bootstrapping.
orderalpha, orderm, nonparametric, efficiency, partial frontier, free disposal hull, outlier-robust, decision-making unit
http://www.stata-journal.com/article.html?article=st0270
Harald Tauchmann
oai:RePEc:tsj:stataj:v:13:y:2013:i:4:p:759-7752018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:13:y:2013:i:4:p:759-775
article
Flexible parametric illness-death models
It is usual in time-to-event data to have more than one event of interest, for example, time to death from different causes. Competing risks models can be applied in these situations where events are considered mutually exclusive absorbing states. That is, we have some initial state—for example, alive with a diagnosis of cancer—and we are interested in several different endpoints, all of which are final. However, the progression of disease will usually consist of one or more intermediary events that may alter the progression to an endpoint. These events are neither initial states nor absorbing states. Here we consider one of the simplest multistate models, the illness-death model. stpm2illd is a postestimation command used after fitting a flexible parametric survival model with stpm2 to estimate the probability of being in each of four states as a function of time. There is also the option to generate confidence intervals and transition hazard functions. The new command is illustrated through a simple example. Copyright 2013 by StataCorp LP.
illdprep, stpm2illd, survival analysis, multistate models, flexible parametric models
http://www.stata-journal.com/article.html?article=st0316
Sally R. Hinchliffe
David A. Scott
Paul C. Lambert
oai:RePEc:tsj:stataj:v:9:y:2009:i:1:p:70-852018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:1:p:70-85
article
A general-purpose method for two-group randomization tests
We outline a novel approach to calculate exact p-levels for two-sample randomization tests. The approach closely resembles permute in its applications, with the main difference being that the results are approximated only if the execution time needed to calculate exact p-levels would exceed a specified maximum. We demonstrate its use by deriving p-levels for the significance of Somers' D, the coefficient of variation, the difference in means and in medians, and the difference in two multinomials. Copyright 2009 by StataCorp LP.
tsrtest, mwtest, fptest, somersdtest, cvtest, vartest, randomization tests, Monte Carlo, two-sample problem
http://www.stata-journal.com/article.html?article=st0158
http://www.stata-journal.com/software/sj9-1/st0158/
Johannes Kaiser
Michael G. Lacy
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:299-3042018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:299-304
article
Generating random samples from user-defined distributions
Generating random samples in Stata is very straightforward if the distribution drawn from is uniform or normal. With any other distribution, an inverse method can be used; but even in this case, the user is limited to the built-in functions. For any other distribution function, its inverse must be derived analytically, or numerical methods must be used if analytical derivation of the inverse function is tedious or impossible. In this article, I introduce a command that generates a random sample from any user-specified distribution function using numeric methods that make this command very generic.
rsample, random sample, user-defined distribution function, inverse method, Monte Carlo exercise
http://www.stata-journal.com/article.html?article=st0229
Katarína Lukácsy
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:557-5682018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:557-568
article
Speaking Stata: Distinct observations
Distinct observations are those different with respect to one or more variables, considered either individually or jointly. Distinctness is thus a key aspect of the similarity or difference of observations. It is sometimes confounded with uniqueness. Counting the number of distinct observations may be required at any point from initial data cleaning or checking to subsequent statistical analysis. We review how far existing commands in official Stata offer solutions to this issue, and we show how to answer questions about distinct observations from first principles by using the by prefix and the egen command. The new distinct command is offered as a convenience tool.
distinct, by, egen, distinctness, uniqueness, data management
http://www.stata-journal.com/article.html?article=dm0042
http://www.stata-journal.com/software/sj8-4/dm0042/
Nicholas J. Cox
Gary M. Longton
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:252-2582018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:252-258
article
Computing Murphy–Topel-corrected variances in a heckprobit model with endogeneity
We outline a fairly simple method to obtain in Stata Murphy–Topel-corrected variances for a two-step estimation of a heckprobit model with endogeneity in the main equation. The procedure uses predict's score option and the powerful matrix tool accum in Stata and builds on previous works by Hardin (2002, Stata Journal 2: 253–266) and Hole (2006, Stata Journal 6: 521–529). Copyright 2010 by StataCorp LP.
binary choice model, heckprobit, selectivity, endogenous variables, two-step estimation, qualitative models, Murphy–Topel-corrected variances
http://www.stata-journal.com/article.html?article=st0191
Juan Muro
Cristina Suarez
María del Mar Zamora
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:686-6882018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:686-688
article
Stata tip 92: Manual implementation of permutations and bootstraps
http://www.stata-journal.com/article.html?article=st0214
Lars Ängquist
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:309-3292018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:309-329
article
Estimation of marginal effects using margeff
This article describes the user-written program margeff, which enables the fast estimation of (average) marginal effects. Besides describing the program, this article offers a new discussion of some problems that are related to computation of marginal effects. I will argue that (1) marginal effects computed at means are not good approximations of average marginal effects, computed as the mean of marginal effects evaluated at each observation, if some of the parameter estimates are large; (2) both average marginal effects and marginal effects computed at means might produce wrong estimates for dummies that are part of a set of indicator variables indicating different categories of a single underlying variable; and (3) the use of marginal effects computed at means is preferred if some of the regressors are mathematical transformations of other regressors. Copyright 2005 by StataCorp LP.
margeff, mfx, average marginal effects, marginal effects at the mean, dummy variables, squared variables
http://www.stata-journal.com/article.html?article=st0086
http://www.stata-journal.com/software/sj5-3/st0086/
Tamás Bartus
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:230-2512018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:230-251
article
Two techniques for investigating interactions between treatment and continuous covariates in clinical trials
There is increasing interest in the medical world in the possibility of tailoring treatment to the individual patient. Statistically, the relevant task is to identify interactions between covariates and treatments, such that the patient's value of a given covariate influences how strongly (or even whether) they are likely to respond to a treatment. The most valuable data are obtained in randomized controlled clinical trials of novel treatments in comparison with a control treatment. We describe two techniques to detect and model such interactions. The first technique, multivariable fractional polynomials interaction, is based on fractional polynomials methodology and provides a method of testing for continuous-by-binary interactions and of modeling the treatment effect as a function of a continuous covariate. The second technique, subpopulation treatment-effect pattern plot, aims to do something similar but is focused on producing a nonparametric estimate of the treatment effect, expressed graphically. Stata programs for both of these techniques are described. Real data for brain and breast cancer are used as examples. Copyright 2009 by StataCorp LP.
mfpi, mfpi plot, stepp tail, stepp window, stepp plot, continuous covariates, treatment-covariate interaction, clinical trials, fractional polynomials, subpopulation treatment-effect pattern plot
http://www.stata-journal.com/article.html?article=st0164
http://www.stata-journal.com/software/sj9-2/st0164/
Patrick Royston
Willi Sauerbrei
oai:RePEc:tsj:stataj:y:13:y:2013:i:2:p:366-3782018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:y:13:y:2013:i:2:p:366-378
article
Standardizing anthropometric measures in children and adolescents with functions for egen: Update
In this article, we describe an extension to the egen functions zanthro() and zbmicat() (Vidmar et al., 2004, Stata Journal 4: 50–55). All functionality of the original version remains unchanged. In the 2004 version of zanthro(), z scores could be generated using the 2000 U.S. Centers for Disease Control and Prevention Growth Reference and the British 1990 Growth Reference. More recent growth references are now available. For measurement-for-age charts, age can now be adjusted for gestational age. The zbmicat() function previously categorized children according to body mass index (weight/height^2) as normal weight, overweight, or obese. “Normal weight” is now split into normal weight and three grades of thinness. Finally, this updated version uses cubic rather than linear interpolation to calculate the values of L, M, and S for the child’s decimal age between successive ages (or length/height for weight-for-length/height charts). Copyright 2013 by StataCorp LP.
zanthro(), zbmicat(), z scores, LMS, egen, anthropometric standards
http://www.stata-journal.com/article.html?article=dm0004_1
Suzanna I. Vidmar
Tim J. Cole
Huiqi Pan
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:554-5562018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:554-556
article
Demand-system estimation: Update
The nlsur command is better suited to demand-system estimation than the suite of ado-files provided in Poi (2002, Stata Journal 2: 403–410) because it is faster and requires only one ancillary ado-file. This article replicates the results presented in Poi (2002) by using nlsur instead of ml.
nlsur, demand-system estimation, nonlinear estimation, maximum likelihood, demand equations
http://www.stata-journal.com/article.html?article=st0029_1
http://www.stata-journal.com/software/sj8-4/st0029_1/
Brian P. Poi
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:334-3532018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:334-353
article
Implementing double-robust estimators of causal effects
This article describes the implementation of a double-robust estimator for pretest-posttest studies (Lunceford and Davidian, 2004, Statistics in Medicine 23: 2937–2960) and presents a new Stata command (dr) that carries out the procedure. A double-robust estimator gives the analyst two opportunities for obtaining unbiased inference when adjusting for selection effects such as confounding by allowing for different forms of model misspecification; a double-robust estimator also can offer increased efficiency when all the models are correctly specified. We demonstrate the results with a Monte Carlo simulation study, and we show how to implement the double-robust estimator on a single simulated dataset, both manually and by using the dr command. Copyright 2008 by StataCorp LP.
dr, double-robust estimators, causal models, confounding, inverse probability of treatment weights, propensity score
http://www.stata-journal.com/article.html?article=st0149
http://www.stata-journal.com/software/sj8-3/st0149/
Richard Emsley
Mark Lunt
Andrew Pickles
Graham Dunn
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:46-632018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:46-63
article
Stata: The language of choice for time-series analysis?
This paper discusses the use of Stata for the analysis of time series and panel data. The evolution of time-series capabilities in Stata is reviewed. Facilities for data management, graphics, and econometric analysis from both official Stata and the user community are discussed. A new routine to provide moving-window regression estimates—rollreg—is described, and its use illustrated. Copyright 2005 by StataCorp LP.
time-series analysis, time-series data, time-series modeling, moving-window regression, rolling regression
http://www.stata-journal.com/software/sj5-1/st0080/
http://www.stata-journal.com/article.html?article=st0080
Christopher F Baum
oai:RePEc:tsj:stataj:y:13:y:2013:i:2:p:356-3652018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:y:13:y:2013:i:2:p:356-365
article
Goodness-of-fit tests for categorical data
A significant aspect of data modeling with categorical predictors is the definition of a saturated model. In fact, there are different ways of specifying it—the casewise, the contingency table, and the collapsing approaches—and they strictly depend on the unit of analysis considered. The analytical units of reference could be the subjects or, alternatively, groups of subjects that have the same covariate pattern. In the first case, the goal is to predict the probability of success (failure) for each individual; in the second case, the goal is to predict the proportion of successes (failures) in each group. The analytical unit adopted does not affect the estimation process; however, it does affect the definition of a saturated model. Consequently, measures and tests of goodness of fit can lead to different results and interpretations. Thus one must carefully consider which approach to choose. In this article, we focus on the deviance test for logistic regression models. However, the results and the conclusions are easily applicable to other linear models involving categorical regressors. We show how Stata 12.1 performs when implementing goodness of fit. In this situation, it is important to clarify which one of the three approaches is implemented as default. Furthermore, a prominent role is played by the shape of the dataset considered (individual format or events–trials format) in accordance with the analytical unit choice. In fact, the same procedure applied to different data structures leads to different approaches to a saturated model. Thus one must attend to practical and theoretical statistical issues to avoid inappropriate analyses. Copyright 2013 by StataCorp LP.
saturated models, categorical data, deviance, goodness-of-fit tests
http://www.stata-journal.com/article.html?article=st0299
Rino Bellocco
Sara Algeri
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:423-4572018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:423-457
article
Estimation of quantile treatment effects with Stata
In this article, we discuss the implementation of various estimators proposed to estimate quantile treatment effects. We distinguish four cases involving conditional and unconditional quantile treatment effects with either exogenous or endogenous treatment variables. The introduced ivqte command covers four different estimators: the classical quantile regression estimator of Koenker and Bassett (1978, Econometrica 46: 33–50) extended to heteroskedasticity-consistent standard errors; the instrumental-variable quantile regression estimator of Abadie, Angrist, and Imbens (2002, Econometrica 70: 91–117); the estimator for unconditional quantile treatment effects proposed by Firpo (2007, Econometrica 75: 259–276); and the instrumental-variable estimator for unconditional quantile treatment effects proposed by Frölich and Melly (2008, IZA discussion paper 3288). The implemented instrumental-variable procedures estimate the causal effects for the subpopulation of compliers and are only well suited for binary instruments. ivqte also provides analytical standard errors and various options for nonparametric estimation. As a by-product, the locreg command implements local linear and local logit estimators for mixed data (continuous, ordered discrete, unordered discrete, and binary regressors).
ivqte, locreg, quantile treatment effects, nonparametric regression, instrumental variables
http://www.stata-journal.com/article.html?article=st0203
Markus Frolich
Blaise Melly
oai:RePEc:tsj:stataj:v:14:y:2014:i:3:p:605-6222018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:3:p:605-622
article
Space-filling location selection
In this article, we describe an implementation of a space-filling location-selection algorithm. The objective is to select a subset from a list of locations so that the spatial coverage of the locations by the selected subset is optimized according to a geometric criterion. Such an algorithm designed for geographical site selection is useful for determining a grid of points that “covers” a data matrix as needed in various nonparametric estimation procedures. Copyright 2014 by StataCorp LP.
spacefill, spatial sampling, space-filling design, site selection, nonparametric regression, multivariate knot selection, point swapping
http://www.stata-journal.com/article.html?article=st0353
Michela Bia
Philippe Van Kerm
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:288-3082018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:288-308
article
Making regression tables from stored estimates
Organizing and archiving statistical results and processing a subset of those results for publication are important and often underestimated issues in conducting statistical analyses. Because automation of these tasks is often poor, processing results produced by statistical packages is quite laborious and vulnerable to error. I will therefore present a new package called estout that facilitates and automates some of these tasks. This new command can be used to produce regression tables for use with spreadsheets, LaTeX, HTML, or word processors. For example, the results for multiple models can be organized in spreadsheets and can thus be archived in an orderly manner. Alternatively, the results can be directly saved as a publication-ready table for inclusion in, for example, a LaTeX document. estout is implemented as a wrapper for estimates table but has many additional features, such as support for mfx. However, despite its flexibility, estout is—I believe—still very straightforward and easy to use. Furthermore, estout can be customized via so-called defaults files. A tool to make available supplementary statistics called estadd is also provided. Copyright 2005 by StataCorp LP.
estout, estoutdef, estadd, estimates, regression table, latex, html
http://www.stata-journal.com/article.html?article=st0085
http://www.stata-journal.com/software/sj5-3/st0085/
Ben Jann
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:403-4192018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:403-419
article
A closer examination of three small-sample approximations to the multiple-imputation degrees of freedom
Incomplete data is a common complication in applied research. In this study, we use simulation to compare two approaches to the multiple imputation of a continuous predictor: multiple imputation through chained equations and multivariate normal imputation. This study extends earlier work by being the first to 1) compare the small-sample approximations to the multiple-imputation degrees of freedom proposed by Barnard and Rubin (1999, Biometrika 86: 948–955); Lipsitz, Parzen, and Zhao (2002, Journal of Statistical Computation and Simulation 72: 309–318); and Reiter (2007, Biometrika 94: 502–508) and 2) ask if the sampling distribution of the t statistics is in fact a Student’s t distribution with the specified degrees of freedom. In addition to varying the imputation method, we varied the number of imputations (m = 5, 10, 20, 100) that were averaged over 500,000 replications to obtain the combined estimates and standard errors for a linear model that regressed the log price of a home on its age (years) and size (square feet) in a sample of 25 observations. Six age values were randomly set equal to missing for each replication. As assessed by the absolute percentage and relative percentage bias, the two approaches performed similarly. The absolute bias of the regression coefficients for age and size was roughly −0.1% across the levels of m for both approaches; the absolute bias for the constant was 0.6% for the chained-equations approach and 1.0% for the multivariate normal model. The absolute biases of the standard errors for age, size, and the constant were 0.2%, 0.3%, and 1.2%, respectively. In general, the relative percentage bias was slightly smaller for the chained-equations approach.
Graphical and numerical inspection of the empirical sampling distributions for the three t statistics suggested that the area from the shoulder to the tail was reasonably well approximated by a t distribution and that the small-sample approximations to the multiple-imputation degrees of freedom proposed by Barnard and Rubin and by Reiter performed satisfactorily. Copyright 2011 by StataCorp LP.
missing data, multiple imputation, small-sample degrees of freedom
http://www.stata-journal.com/article.html?article=st0235
David A. Wagstaff
Ofer Harel
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:480-4922018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:480-492
article
The Blinder–Oaxaca decomposition for nonlinear regression models
In this article, a general Blinder–Oaxaca decomposition for nonlinear models is derived, which allows the difference in an outcome variable between two groups to be decomposed into several components. We show how, using nldecompose, this general decomposition can be applied to different models with discrete and limited dependent variables. We further demonstrate how the standard errors of the estimated components can be calculated by using Stata’s bootstrap command as a prefix. Copyright 2008 by StataCorp LP.
nldecompose, Blinder–Oaxaca decomposition, nonlinear models
http://www.stata-journal.com/article.html?article=st0152
http://www.stata-journal.com/software/sj8-4/st0152
Mathias Sinning
Markus Hahn
Thomas K. Bauer
oai:RePEc:tsj:stataj:v:6:y:2006:i:4:p:482-4962018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:4:p:482-496
article
Testing for cross-sectional dependence in panel-data models
This article describes a new Stata routine, xtcsd, to test for the presence of cross-sectional dependence in panels with many cross-sectional units and few time-series observations. The command executes three different testing procedures—namely, Friedman’s (Journal of the American Statistical Association 32: 675–701) (FR) test statistic, the statistic proposed by Frees (Journal of Econometrics 69: 393–414), and the cross-sectional dependence (CD) test of Pesaran (General diagnostic tests for cross-section dependence in panels [University of Cambridge, Faculty of Economics, Cambridge Working Papers in Economics, Paper No. 0435]). We illustrate the command with an empirical example. Copyright 2006 by StataCorp LP.
xtcsd, panel data, cross-sectional dependence
http://www.stata-journal.com/article.html?article=st0113
http://www.stata-journal.com/software/sj6-4/st0113/
Rafael E. De Hoyos
Vasilis Sarafidis
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:540-5532018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:540-553
article
A shortcut through long loops: An illustration of two alternatives to looping over observations
It is well known that looping over observations can be slow and should be avoided. The objective of this article is to discuss two alternative solutions to looping over observations that can be used to overcome a particular data-management problem of merging datasets in which unique key identifiers changed over time. The first alternative, mapch, which is introduced in this article, uses a combination of appending, indexing, and merging to solve the problem, while the second alternative uses repeated merging. Both solutions are much quicker than looping over observations. However, depending on the nature of the problem, one solution may work better than the other. It is argued that the use of such dataset-type manipulations may be suitable to overcome other data-management problems. More generally speaking, the issue that is addressed—searching for an alternative to looping over observations—may be common and illustrates the importance of balancing the costs of developing an efficient solution with the benefits accruing from that solution. Copyright 2008 by StataCorp LP.
mapch, appending, data management, indexing, looping, merging
http://www.stata-journal.com/article.html?article=dm0041
http://www.stata-journal.com/software/sj8-4/dm0041/
Ward Vanlaar
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:420-4382018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:420-438
article
Comparing coefficients of nested nonlinear probability models
In a series of recent articles, Karlson, Holm, and Breen (Breen, Karlson, and Holm, 2011, http://papers.ssrn.com/sol3/papers.cfm?abstractid=1730065; Karlson and Holm, 2011, Research in Stratification and Social Mobility 29: 221–237; Karlson, Holm, and Breen, 2010, http://www.yale.edu/ciqle/Breen Scaling effects.pdf) have developed a method for comparing the estimated coefficients of two nested nonlinear probability models. In this article, we describe this method and the user-written program khb, which implements the method. The KHB method is a general decomposition method that is unaffected by the rescaling or attenuation bias that arises in cross-model comparisons in nonlinear models. It recovers the degree to which a control variable, Z, mediates or explains the relationship between X and a latent outcome variable, Y*, underlying the nonlinear probability model. It also decomposes effects of both discrete and continuous variables, applies to average partial effects, and provides analytically derived statistical tests. The method can be extended to other models in the generalized linear model family.
khb, decomposition, path analysis, total effects, indirect effects, direct effects, logit, probit, primary effects, secondary effects, generalized linear model, KHB method
http://www.stata-journal.com/article.html?article=st0236
Ulrich Kohler
Kristian Bernt Karlson
Anders Holm
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:718-7252018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:718-725
article
Incorporating complex sample design effects when only final survey weights are available
In this article, we consider the situation that arises when a survey data producer has collected data from a sample with a complex design (possibly featuring stratification of the population, cluster sampling, and unequal probabilities of selection) and for various reasons only provides secondary analysts of those survey data with a final survey weight for each respondent and “average” design effects for survey estimates computed from the data. In general, these “average” design effects, presumably computed by the data producer in a way that fully accounts for all the complex sampling features, already incorporate possible increases in sampling variance due to the use of the survey weights in estimation. The secondary analyst of the survey data—who uses the provided information to compute weighted estimates; computes design-based standard errors reflecting variance in the weights (by using Taylor series linearization, for example); and inflates the estimated variances using the “average” design effects provided—is applying a “double” adjustment to the standard errors for the effect of weighting on the variance estimates, leading to overly conservative inferences. We propose a simple method to prevent this problem and provide a Stata program for applying appropriate adjustments to variance estimates in this situation. We illustrate two applications of the method with survey data from the Monitoring the Future study and conclude with suggested directions for future research in this area.
deft2corr, survey design effects, survey weights, average design effects
http://www.stata-journal.com/article.html?article=st0277
Brady T. West
Sean Esteban McCabe
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:197-2102018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:197-210
article
Updated tests for small-study effects in meta-analyses
This article describes an updated version of the metabias command, which provides statistical tests for funnel plot asymmetry. In addition to the previously implemented tests, metabias implements two new tests that are recommended in the recently updated Cochrane Handbook for Systematic Reviews of Interventions (Higgins and Green 2008). The first new test, proposed by Harbord, Egger, and Sterne (2006, Statistics in Medicine 25: 3443–3457), is a modified version of the commonly used test proposed by Egger et al. (1997, British Medical Journal 315: 629–634). It regresses Z/√V against √V, where Z is the efficient score and V is Fisher’s information (the variance of Z under the null hypothesis). The second new test is Peters’ test, which is based on a weighted linear regression of the intervention effect estimate on the reciprocal of the sample size. Both of these tests maintain better control of the false-positive rate than the test proposed by Egger et al., while retaining similar power. Copyright 2009 by StataCorp LP.
metabias, meta-analysis, publication bias, small-study effects, funnel plots
http://www.stata-journal.com/article.html?article=sbe19_6
http://www.stata-journal.com/software/sj9-2/sbe19_6/
Roger M. Harbord
Ross J. Harris
Jonathan A. C. Sterne
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:374-3892018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:374-389
article
Creating print-ready tables in Stata
This article describes the new Stata command xml_tab, which outputs the results of estimation commands and Stata matrices directly into tables in XML format. The XML files can be opened with Microsoft Excel or OpenOffice Calc, or they can be linked with Microsoft Word files. By using XML, xml_tab allows Stata users to apply a rich set of formatting options to the elements of output tables. Copyright 2008 by StataCorp LP.
xml_tab, estimates, regression, matrices, xml, Excel, Word
http://www.stata-journal.com/article.html?article=dm0037
http://www.stata-journal.com/software/sj8-3/dm0037/
Michael Lokshin
Zurab Sajaia
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:585-6052018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:585-605
article
Making spatial analysis operational: Commands for generating spatial-effect variables in monadic and dyadic data
Spatial dependence exists whenever the expected utility of one unit of analysis is affected by the decisions or behavior made by other units of analysis. Spatial dependence is ubiquitous in social relations and interactions. Yet, there are surprisingly few social science studies accounting for spatial dependence. This holds true for settings in which researchers use monadic data, where the unit of analysis is the individual unit, agent, or actor, and even more true for dyadic data settings, where the unit of analysis is the pair or dyad representing an interaction or a relation between two individual units, agents, or actors. Dyadic data offer more complex ways of modeling spatial-effect variables than do monadic data. The commands described in this article facilitate spatial analysis by providing an easy tool for generating, with one command line, spatial-effect variables for monadic contagion as well as for all possible forms of contagion in dyadic data. Copyright 2010 by StataCorp LP.
spspc, spundir, spmon, spdir, spagg, spatial dependence, spatial analysis, contagion, spatial lag, spatial error, monadic data, dyadic data
http://www.stata-journal.com/article.html?article=st0210
Eric Neumayer
Thomas Plümper
oai:RePEc:tsj:stataj:v:9:y:2009:i:1:p:158-1602018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:1:p:158-160
article
Review of The Workflow of Data Analysis Using Stata, by J. Scott Long
This article reviews The Workflow of Data Analysis Using Stata, by J. Scott Long.
data analysis, workflow, data management, wdaus
http://www.stata-journal.com/article.html?article=gn0044
Alan C. Acock
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:736-7352018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:736-735
article
Robinson's square root of N consistent semiparametric regression estimator in Stata
In this article, we describe Robinson’s (1988, Econometrica 56: 931–954) double residual semiparametric regression estimator and Härdle and Mammen’s (1993, Annals of Statistics 21: 1926–1947) specification test implementation in Stata. We use some simple simulations to illustrate how this newly coded estimator outperforms the already available semiparametric plreg command (Lokshin, 2006, Stata Journal 6: 377–383).
semipar, semiparametric estimation, double residual estimator
http://www.stata-journal.com/article.html?article=st0278
Vincenzo Verardi
Nicolas Debarsy
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:299-3052018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:299-305
article
The Skillings–Mack test (Friedman test when there are missing data)
The Skillings–Mack statistic (Skillings and Mack, 1981, Technometrics 23: 171–177) is a general Friedman-type statistic that can be used in almost any block design with an arbitrary missing-data structure. The missing data can be either missing by design, for example, an incomplete block design, or missing completely at random. The Skillings–Mack test is equivalent to the Friedman test when there are no missing data in a balanced complete block design, and the Skillings–Mack test is equivalent to the test suggested in Durbin (1951, British Journal of Psychology, Statistical Section 4: 85–90) for a balanced incomplete block design. The Friedman test was implemented in Stata by Goldstein (1991, Stata Technical Bulletin 3: 26–27) and further developed in Goldstein (2005, Stata Journal 5: 285). This article introduces the skilmack command, which performs the Skillings–Mack test. The skilmack command is also useful when there are many ties or equal ranks (N.B. the Friedman statistic compared with the χ2 distribution will give a conservative result), as well as for small samples; appropriate results can be obtained by simulating the distribution of the test statistic under the null hypothesis.
skilmack, Skillings–Mack, Friedman, block design, nonparametric, unbalanced, missing data, ties
http://www.stata-journal.com/article.html?article=st0167
http://www.stata-journal.com/software/sj9-2/st0167/
Mark Chatfield
Adrian Mander
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:315-3202018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:315-320
article
A statistician’s perspective on “Mostly Harmless Econometrics: An Empiricist's Companion”, by Joshua D. Angrist and Jorn-Steffen Pischke
This article reviews Mostly Harmless Econometrics: An Empiricist's Companion, by Joshua D. Angrist and Jorn-Steffen Pischke.
experimentation, causal inference, ordinary least-squares regression, instrumental variables, difference in differences, fixed effects, discontinuity analyses, mhe
http://www.stata-journal.com/article.html?article=gn0046
Andrew Gelman
oai:RePEc:tsj:stataj:v:14:y:2014:i:1:p:159-1752018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:1:p:159-175
article
Sample size and power calculations for trials and quasi-experimental studies with clustering
This article considers the estimation of power and sample size in experimental and quasi-experimental intervention studies, where there is clustering of subjects within one or both intervention arms, for both continuous and binary outcomes. A new command, clsampsi, which has a wide range of options, calculates the power and sample size needed (that is, the number of clusters and cluster size) by using the noncentral F distribution as described by Moser, Stevens, and Watts (1989, Communications in Statistics—Theory and Methods 18: 3963–3975). For comparative purposes, this command can also produce power and sample-size estimates on the basis of existing methods that use a normal approximation. Copyright 2014 by StataCorp LP.
clsampsi, sample size, power calculation, intervention studies
http://www.stata-journal.com/article.html?article=st0329
Evridiki Batistatou
Chris Roberts
Steve Roberts
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:308-3312018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:308-331
article
Using the margins command to estimate and interpret adjusted predictions and marginal effects
Many researchers and journals place a strong emphasis on the sign and statistical significance of effects—but often there is very little emphasis on the substantive and practical significance of the findings. As Long and Freese (2006, Regression Models for Categorical Dependent Variables Using Stata [Stata Press]) show, results can often be made more tangible by computing predicted or expected values for hypothetical or prototypical cases. Stata 11 introduced new tools for making such calculations—factor variables and the margins command. These can do most of the things that were previously done by Stata’s own adjust and mfx commands, and much more. Unfortunately, the complexity of the margins syntax, the daunting 50-page reference manual entry that describes it, and a lack of understanding about what margins offers over older commands that have been widely used for years may have dissuaded some researchers from examining how the margins command could benefit them. In this article, therefore, I explain what adjusted predictions and marginal effects are, and how they can contribute to the interpretation of results. I further explain why older commands, like adjust and mfx, can often produce incorrect results, and how factor variables and the margins command can avoid these errors. The relative merits of different methods for setting representative values for variables in the model (marginal effects at the means, average marginal effects, and marginal effects at representative values) are considered. I show how the marginsplot command (introduced in Stata 12) provides a graphical and often much easier means for presenting and understanding the results from margins, and explain why margins does not present marginal effects for interaction terms. Copyright 2012 by StataCorp LP.
margins, marginsplot, adjusted predictions, marginal effects
http://www.stata-journal.com/article.html?article=st0260
Richard Williams
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:702-7172018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:702-717
article
HTML output in Stata
In this article, we present a suite of basic commands that facilitate the production of HTML files in Stata. Creating HTML files in Stata allows for the programmatic production of formatted statistical reports that users can easily open without proprietary software, a feature long desired by many Stata statisticians.
htclose, htexample, htlist, htlog, htopen, htput, htsummary, HTML, logging
http://www.stata-journal.com/article.html?article=dm0066
Llorenç Quintó
Sergi Sanz
Elisa De Lazzari
John J. Aponte
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:388-3972018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:388-397
article
Improved degrees of freedom for multivariate significance tests obtained from multiply imputed, small-sample data
We propose improvements to existing degrees of freedom used for significance testing of multivariate hypotheses in small samples when missing data are handled using multiple imputation. The improvements are for 1) tests based on unrestricted fractions of missing information and 2) tests based on equal fractions of missing information with M (p − 1) ≤ 4, where M is the number of imputations and p is the number of tested parameters. Using the mi command available as of Stata 11, we demonstrate via simulation that using these adjustments can result in a more sensible degrees of freedom (and hence closer-to-nominal rejection rates) than existing degrees of freedom. Copyright 2009 by StataCorp LP.
multiple imputation, degrees of freedom, sample, missing, testing, multivariate
http://www.stata-journal.com/article.html?article=st0170
http://www.stata-journal.com/software/sj9-3/st0170/
Yulia V. Marchenko
Jerome P. Reiter
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:532-5392018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:532-539
article
Erratum and discussion of propensity-score reweighting
xtreg, psmatch2, nnmatch, ivreg, ivreg2, ivregress, rd, lpoly, xtoverid, ranktest, causal inference, match, matching, reweighting, propensity score, panel, instrumental variables, excluded instrument, weak identiﬁcation, regression, discontinuity, local polynomial
http://www.stata-journal.com/article.html?article=st0136_1
http://www.stata-journal.com/software/sj8-4/st0136_1/
Austin Nichols
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:606-6272018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:606-627
article
Age-period-cohort modeling
Age–period–cohort models provide a useful method for modeling incidence and mortality rates. It is well known that age–period–cohort models suffer from an identifiability problem due to the exact relationship between the variables (cohort = period − age). In 2007, Carstensen published an article advocating the use of an analysis that models age, period, and cohort as continuous variables through the use of spline functions (Carstensen, 2007, Statistics in Medicine 26: 3018–3045). Carstensen implemented his method for age–period–cohort models in the Epi package for R. In this article, a new command is introduced, apcfit, that performs the methods in Stata. The identifiability problem is overcome by forcing constraints on either the period or cohort effects. The use of the command is illustrated through an example relating to the incidence of colon cancer in Finland. The example shows how to include covariates in the analysis.
apcfit, poprisktime, age–period–cohort models, incidence rates, mortality rates, Lexis diagrams
http://www.stata-journal.com/article.html?article=st0211
Mark J. Rutherford
Paul C. Lambert
John R. Thompson
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:224-2382018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:224-238
article
Stata in space: Econometric analysis of spatially explicit raster data
Realizing the importance of location, economists are increasingly adopting spatial analytical and spatial econometric perspectives to study questions such as the geographical targeting of policy interventions, regional agglomeration effects, the diffusion of technologies across space, or causes and consequences of land-cover change. Explicitly accounting for location in econometric estimations can be of great benefit for researchers working at the interface of economics or environmental sciences and geography. The objective of this article is to demonstrate how spatially explicit raster data derived from standard geographical information system (GIS) software can be used within Stata. Three programs implemented as ado-files are presented. These import geographic raster data into Stata (ras2dta), draw systematic spatial samples within Stata (spatsam), and export data and estimation results in a form usable by standard GIS software (dta2ras). A numerical example is presented to estimate the determinants of forest cover with a spatially explicit logit model, calculate predicted probabilities, and map the predictions with GIS software. Copyright 2005 by StataCorp LP.
ras2dta, spatsam, dta2ras, geographical information systems (GIS), raster data, spatial modeling, spatial econometrics
http://www.stata-journal.com/article.html?article=dm0014
http://www.stata-journal.com/software/sj5-2/dm0014/
Daniel Müller
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:454-4652018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:454-465
article
Nonparametric testing of distributions—the Epps–Singleton two-sample test using the empirical characteristic function
In statistics, two-sample tests are used to determine whether two samples have been drawn from the same population. An example of such a test is the widely used Kolmogorov–Smirnov two-sample test. There are other distribution-free tests that might be applied in similar occasions. In this article, we describe a two-sample omnibus test introduced by Epps and Singleton, which, although distribution free, usually has greater power than the Kolmogorov–Smirnov test. The superiority of the Epps–Singleton characteristic function test is illustrated in two examples. We compare the two tests and supplement this contribution with a Stata implementation of the omnibus test.
escftest, nonparametric tests, Kolmogorov–Smirnov, Epps–Singleton, two-sample case
http://www.stata-journal.com/article.html?article=st0174
http://www.stata-journal.com/software/sj9-3/st0174/
Sebastian J. Goerg
Johannes Kaiser
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:354-3732018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:354-373
article
A Stata package for the estimation of the dose–response function through adjustment for the generalized propensity score
In this article, we briefly review the role of the propensity score in estimating dose–response functions as described in Hirano and Imbens (2004, Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 73–84). Then we present a set of Stata programs that estimate the propensity score in a setting with a continuous treatment, test the balancing property of the generalized propensity score, and estimate the dose–response function. We illustrate these programs by using a dataset collected by Imbens, Rubin, and Sacerdote (2001, American Economic Review 91: 778–794). Copyright 2008 by StataCorp LP.
gpscore, doseresponse, doseresponse model, bias removal, dose-response function, generalized propensity score, weak unconfoundedness
http://www.stata-journal.com/article.html?article=st0150
http://www.stata-journal.com/software/sj8-3/st0150/
Michela Bia
Alessandra Mattei
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:265-2902018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:265-290
article
Further development of ﬂexible parametric models for survival analysis
Royston and Parmar (2002, Statistics in Medicine 21: 2175–2197) developed a class of flexible parametric survival models that were programmed in Stata with the stpm command (Royston, 2001, Stata Journal 1: 1–28). In this article, we introduce a new command, stpm2, that extends the methodology. New features for stpm2 include improvement in the way time-dependent covariates are modeled, with these effects far less likely to be overparameterized; the ability to incorporate expected mortality and thus fit relative survival models; and a superior predict command that enables simple quantification of differences between any two covariate patterns through calculation of time-dependent hazard ratios, hazard differences, and survival differences. The ideas are illustrated through a study of breast cancer survival and incidence of hip fracture in prostate cancer patients. Copyright 2009 by StataCorp LP.
stpm2, survival analysis, relative survival, time-dependent effects
http://www.stata-journal.com/article.html?article=st0165
http://www.stata-journal.com/software/sj9-2/st0165/
Paul C. Lambert
Patrick Royston
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:439-4592018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:439-459
article
GMM estimation of the covariance structure of longitudinal data on earnings
In this article, we discuss generalized method of moments estimation of the covariance structure of longitudinal data on earnings, and we introduce and illustrate a Stata program that facilitates the implementation of the generalized method of moments approach in this context. The program, gmmcovearn, estimates a variety of models that encompass those most commonly used by labor economists. These include models where the permanent component of earnings follows a random growth or random walk process and where the transitory component can follow either an AR(1) or an ARMA(1,1) process. In addition, time-factor loadings and cohort-factor loadings may be incorporated in the transitory and permanent components.
gmmcovearn, permanent inequality, transitory inequality, generalized method of moments, GMM, covariance structure of earnings
http://www.stata-journal.com/article.html?article=st0239
Aedín Doris
Donal O’Neill
Olive Sweetman
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:505-5142018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:505-514
article
Exact and mid-p confidence intervals for the odds ratio
The odds ratio is a frequently used effect measure for two independent binomial proportions. Unfortunately, the confidence intervals that are available for it in Stata and other standard software packages are generally wider than necessary, particularly for small-sample and exact estimation. The performance of the Cornfield exact interval—the only widely available exact interval for the odds ratio—may be improved by incorporating a small modification attributed to Baptista and Pike (1977, Journal of the Royal Statistical Society, Series C 26: 214–220). A further improvement is achieved when the Baptista–Pike method is combined with the mid-p approach. In this article, I present the command merci (mid-p and exact odds-ratio confidence intervals) and its immediate version mercii, which calculate the Cornfield exact, Cornfield mid-p, Baptista–Pike exact, and Baptista–Pike mid-p confidence intervals for the odds ratio. I compare these intervals with three well-known logit intervals. I strongly recommend the Baptista–Pike mid-p interval.
merci, mercii, confidence intervals, odds ratio, mid-p, exact, quasi-exact, Cornfield, Baptista-Pike
http://www.stata-journal.com/article.html?article=st0271
Morten W. Fagerland
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:93-1222018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:93-122
article
Tabulation of multiple responses
Although multiple-response questions are quite common in survey research, Stata’s official release does not provide much capability for an effective analysis of multiple-response variables. For example, in a study on drug addiction an interview question might be, “Which substances did you consume during the last four weeks?” The respondents just list all the drugs they took, if any; e.g., an answer could be “cannabis, cocaine, heroin” or “ecstasy, cannabis” or “none”, etc. Usually, the responses to such questions are stored as a set of variables and, therefore, cannot be easily tabulated. I will address this issue here and present a new module to compute one- and two-way tables of multiple responses. The module supports several types of data structure, provides significance tests, and offers various options to control the computation and display of the results. In addition, tools to create graphs of multiple-response distributions are presented. Copyright 2005 by StataCorp LP.
mrtab, mrgraph, mrsvmat, multiple responses, multiple testing, tabulate
http://www.stata-journal.com/software/sj5-1/st0082/
http://www.stata-journal.com/article.html?article=st0082
Ben Jann
oai:RePEc:tsj:stataj:v:14:y:2014:i:3:p:623-6612018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:3:p:623-661
article
Adaptive Markov chain Monte Carlo sampling and estimation in Mata
I describe algorithms for drawing from distributions using adaptive Markov chain Monte Carlo (MCMC) methods; I introduce a Mata function for performing adaptive MCMC, amcmc(); and I present a suite of functions, amcmc_*(), that allows an alternative implementation of adaptive MCMC. amcmc() and amcmc_*() can be used with models set up to work with Mata’s moptimize( ) (see [M-5] moptimize( )) or optimize( ) (see [M-5] optimize( )) or with standalone functions. To show how the routines can be used in estimation problems, I give two examples of what Chernozhukov and Hong (2003, Journal of Econometrics 115: 293–346) refer to as quasi-Bayesian or Laplace-type estimators—simulation-based estimators using MCMC sampling. In the first example, I illustrate basic ideas and show how a simple linear model can be fit by simulation. In the next example, I describe simulation-based estimation of a censored quantile regression model following Powell (1986, Journal of Econometrics 32: 143–155); the discussion describes the workings of the command mcmccqreg. I also present an example of how the routines can be used to draw from distributions without a normalizing constant and used in Bayesian estimation of a mixed logit model. This discussion introduces the command bayesmixedlogit. Copyright 2014 by StataCorp LP.
amcmc(), amcmc_*(), bayesmixedlogit, mcmccqreg, Mata, Markov chain Monte Carlo, drawing from distributions, Bayesian estimation, mixed logit
http://www.stata-journal.com/article.html?article=st0354
Matthew J. Baker
oai:RePEc:tsj:stataj:v:14:y:2014:i:3:p:511-5402018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:3:p:511-540
article
Merger simulation with nested logit demand
In this article, we show how to implement merger simulation in Stata as a postestimation command, that is, after estimating an aggregate nested logit demand system with a linear regression model. We also show how to implement merger simulation when the demand parameters are not estimated but instead calibrated to be consistent with outside information on average price elasticities and profit margins. We allow for a variety of extensions, including the role of (marginal) cost savings, remedies (divestiture), and conduct different from Bertrand–Nash behavior.
mergersim, merger simulation, aggregate nested logit model, unit demand, constant expenditures demand
http://www.stata-journal.com/article.html?article=st0349
Jonas Björnerstedt
Frank Verboven
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:315-3172018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:315-317
article
Stata tip 97: Getting at ρ’s and σ’s
Maarten L. Buis
oai:RePEc:tsj:stataj:v:5:y:2004:i:3:p:395-4042018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2004:i:3:p:395-404
article
Stings in the tails: Detecting and dealing with censored data
Variables often show evidence of clustering at extreme values and of graininess, that is, of a limited number of distinct values. Scores on two subscales of a quality-of-life measure, traditionally analyzed with OLS regression or ANOVA models, provide examples. Ignoring or failing to detect such features of the data will result in poor estimates of effect size. Copyright 2005 by StataCorp LP.
censored data, diagnostic plots, intreg
http://www.stata-journal.com/article.html?article=st0090
http://www.stata-journal.com/software/sj5-3/st0090/
Ronán M. Conroy
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:395-4072018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:395-407
article
metaan: Random-effects meta-analysis
This article describes the new meta-analysis command metaan, which can be used to perform fixed- or random-effects meta-analysis. Besides the standard DerSimonian and Laird approach, metaan offers a wide choice of available models: maximum likelihood, profile likelihood, restricted maximum likelihood, and a permutation model. The command reports a variety of heterogeneity measures, including Cochran’s Q, I², H²M, and the between-studies variance estimate τ̂². A forest plot and a graph of the maximum likelihood function can also be generated.
metaan, meta-analysis, random effect, effect size, maximum likelihood, profile likelihood, restricted maximum likelihood, REML, permutation model, forest plot
http://www.stata-journal.com/article.html?article=st0201
Evangelos Kontopantelis
David Reeves
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:211-2292018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:211-229
article
metandi: Meta-analysis of diagnostic accuracy using hierarchical logistic regression
Meta-analysis of diagnostic test accuracy presents many challenges. Even in the simplest case, when the data are summarized by a 2 × 2 table from each study, a statistically rigorous analysis requires hierarchical (multilevel) models that respect the binomial data structure, such as hierarchical logistic regression. We present a Stata package, metandi, to facilitate the fitting of such models in Stata. The commands display the results in two alternative parameterizations and produce a customizable plot. metandi requires either Stata 10 or above (which has the new command xtmelogit), or Stata 8.2 or above with gllamm installed. Copyright 2009 by StataCorp LP.
metandi, metandiplot, diagnosis, meta-analysis, sensitivity and specificity, hierarchical models, generalized mixed models, gllamm, xtmelogit, receiver operating characteristic (ROC), summary ROC, hierarchical summary ROC
http://www.stata-journal.com/article.html?article=st0163
http://www.stata-journal.com/software/sj9-2/st0163/
Roger M. Harbord
Penny Whiting
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:549-5612018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:549-561
article
Speaking Stata: Axis practice, or what goes where on a graph
Conventions about what information goes on each axis of a two-way plot are precisely that, conventions. This column discusses—historically, syntactically, and by example—the idea that flouting convention in various ways can lead to small but useful improvements in graph display. Putting y-axis information on the right or on the top, or putting x-axis information on the top, often is useful. The most substantial examples are for multiple quantile plots, for which the new command multqplot is offered, and table-like graphs, which are made even more table-like by mimicking column headers.
multqplot, qplot, tabplot, axes, coordinates, quantile plots, two-way bar charts
http://www.stata-journal.com/article.html?article=gr0053
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:374-3872018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:374-387
article
Graphical representation of multivariate data using Chernoff faces
Chernoff (1971, Technical Report 71, Department of Statistics, Stanford University; 1973, Journal of the American Statistical Association 68: 361–368) proposed the use of cartoon-like faces to represent points in k dimensions. This article describes a Stata implementation of a face-generating algorithm using the method proposed by Flury (1980, Technical Report 3, Institute of Mathematical Statistics and Actuarial Science, Bern University), Schüpbach (1987, Technical Report 25, Institute of Mathematical Statistics and Actuarial Science, Bern University), and Friendly (1991, http://www.math.yorku.ca/SCS/sasmac/faces.html). I present examples of applying Chernoff faces to data clustering and outlier detection. Copyright 2009 by StataCorp LP.
chernoff, Chernoff faces, graphs
http://www.stata-journal.com/article.html?article=gr0038
http://www.stata-journal.com/software/sj9-3/gr0038/
Rafal Raciborski
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:306-3142018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:306-314
article
Speaking Stata: I. J. Good and quasi-Bayes smoothing of categorical frequencies
I. J. Good (1916–2009) was a prolific scientist who contributed to many fields, mostly from a Bayesian standpoint. This column explains his idea of quasi-Bayes (a.k.a. pseudo-Bayes) estimation or smoothing of categorical frequencies in a contingency table, which is especially useful as a way of dealing with awkward sampling or random zeros. It shows how the method can be implemented, almost calculator-style, using a combination of Stata and Mata. Convenience commands qsbayesi and qsbayes are also introduced.
qsbayesi, qsbayes, categorical data, contingency tables, Mata, pseudo-Bayes, quasi-Bayes, random zeros, sampling zeros, smoothing
http://www.stata-journal.com/article.html?article=st0168
http://www.stata-journal.com/software/sj9-2/st0168/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:623-6382018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:623-638
article
Fitting and modeling cure in population-based cancer studies within the framework of flexible parametric survival models
When the mortality among a cancer patient group returns to the same level as in the general population, that is, when the patients no longer experience excess mortality, the patients still alive are considered “statistically cured”. Cure models can be used to estimate the cure proportion as well as the survival function of the “uncured”. One limitation of parametric cure models is that the functional form of the survival of the uncured has to be specified. It can sometimes be hard to find a survival function flexible enough to fit the observed data, for example, when there is high excess hazard within a few months from diagnosis, which is common among older age groups. This has led to the exclusion of older age groups in population-based cancer studies using cure models. Here we use flexible parametric survival models that incorporate cure as a special case to estimate the cure proportion and the survival of the uncured. Flexible parametric survival models use splines to model the underlying hazard function; therefore, no parametric distribution has to be specified. We have updated the stpm2 command for flexible parametric models to enable cure modeling.
stpm2, stpm2 postestimation, cure models, flexible parametric survival model, relative survival, survival analysis
http://www.stata-journal.com/article.html?article=st0165_1
Therese M.-L. Andersson
Paul C. Lambert
oai:RePEc:tsj:stataj:v:9:y:2009:i:1:p:17-392018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:1:p:17-39
article
Accommodating covariates in receiver operating characteristic analysis
Classification accuracy is the ability of a marker or diagnostic test to discriminate between two groups of individuals, cases and controls, and is commonly summarized by using the receiver operating characteristic (ROC) curve. In studies of classification accuracy, there are often covariates that should be incorporated into the ROC analysis. We describe three ways of using covariate information. For factors that affect marker observations among controls, we present a method for covariate adjustment. For factors that affect discrimination (i.e., the ROC curve), we describe methods for modeling the ROC curve as a function of covariates. Finally, for factors that contribute to discrimination, we propose combining the marker and covariate information, and we ask how much discriminatory accuracy improves (in incremental value) with the addition of the marker to the covariates. These methods follow naturally when representing the ROC curve as a summary of the distribution of case marker observations, standardized with respect to the control distribution. Copyright 2009 by StataCorp LP.
roccurve, comproc, rocreg, receiver operating characteristic analysis, ROC, covariates, sensitivity, specificity
http://www.stata-journal.com/article.html?article=st0155
http://www.stata-journal.com/software/sj9-1/st0155/
Holly Janes
Gary Longton
Margaret S. Pepe
oai:RePEc:tsj:stataj:v:9:y:2009:i:1:p:86-1362018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:1:p:86-136
article
How to do xtabond2: An introduction to difference and system GMM in Stata
The difference and system generalized method-of-moments estimators, developed by Holtz-Eakin, Newey, and Rosen (1988, Econometrica 56: 1371-1395); Arellano and Bond (1991, Review of Economic Studies 58: 277-297); Arellano and Bover (1995, Journal of Econometrics 68: 29-51); and Blundell and Bond (1998, Journal of Econometrics 87: 115-143), are increasingly popular. Both are general estimators designed for situations with "small T, large N" panels, meaning few time periods and many individuals; independent variables that are not strictly exogenous, meaning they are correlated with past and possibly current realizations of the error; fixed effects; and heteroskedasticity and autocorrelation within individuals. This pedagogic article first introduces linear generalized method of moments. Then it describes how limited time span and potential for fixed effects and endogenous regressors drive the design of the estimators of interest, offering Stata-based examples along the way. Next it describes how to apply these estimators with xtabond2. It also explains how to perform the Arellano-Bond test for autocorrelation in a panel after other Stata commands, using abar. The article concludes with some tips for proper use. Copyright 2009 by StataCorp LP.
xtabond2, generalized method of moments, GMM, Arellano-Bond test, abar
http://www.stata-journal.com/article.html?article=st0159
http://www.stata-journal.com/software/sj9-1/st0159/
David Roodman
oai:RePEc:tsj:stataj:v:12:y:2012:i:4:p:688-7012018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:4:p:688-701
article
A command to calculate age-standardized rates with efficient interval estimation
In this article, we illustrate the command distrate, which calculates age-standardized rates with efficient interval estimation by using formulas developed by Tiwari, Clegg, and Zou (2006, Statistical Methods in Medical Research 15: 547–569) as a modification of the method proposed by Fay and Feuer (1997, Statistics in Medicine 16: 791–801). This method is currently used in the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute in Bethesda, Maryland; the Italian Association of Cancer Registries (Associazione Italiana Registro Tumori, AIRTUM); and the Lombardy Mesothelioma and Sinonasal Cancer Registry in Northern Italy. The command produces a compact output and allows for the possibility of specifying a rate multiplier, for instance, ×100,000 or ×1,000,000. Furthermore, rates and confidence limits can be easily exported to an external dataset for further processing (for example, for making graphs). The command distrate is a useful addition to the official Stata command dstdize.
distrate, confounding, standardization, incidence rates, mortality rates, confidence intervals
http://www.stata-journal.com/article.html?article=st0276
Dario Consonni
Enzo Coviello
Carlotta Buzzoni
Carolina Mensi
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:162-1872018-08-09RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:162-187
article
Multilingual datasets
This insert describes a new command mlanguage that facilitates the creation and maintenance of “comprehensive” multilingual datasets. These are datasets with many variables, many of which are value labeled, with labels in different languages, all contained within the dataset. The tools make it easy to add labels in a new language by translating an existing set of labels, to switch between the sets of labels, to verify the integrity of such labels, and to assist in keeping the labels complete. Copyright 2005 by StataCorp LP.
mlanguage, multilingual datasets, data integrity, value labels
http://www.stata-journal.com/article.html?article=dm0013
http://www.stata-journal.com/software/sj5-2/dm0013/
Jeroen Weesie
oai:RePEc:tsj:stataj:v:19:y:2019:i:4:cumindex2020-01-16RePEc:tsj:stataj
RePEc:tsj:stataj:v:19:y:2019:i:4:cumindex
article
Cumulative author index, volumes 1-19
This index does not appear in the print version of the Stata Journal.
http://repec.org/tsj/SJindexVol1-19.pdf
Christopher F Baum
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:565-5692020-03-05RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:565-569
article
Stata tip 111: More on working with weeks
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:39-502021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:39-50
article
Bootstrap unit-root test for random walk with drift: The bsrwalkdrift command
In this article, we introduce the command bsrwalkdrift, which is primarily intended to perform a bootstrap unit-root test under the null hypothesis of random walk with drift. The method implemented in this command is considerably more precise than the corresponding case of the conventional augmented Dickey–Fuller test, which can be inaccurate when the true value of the drift term is small relative to the standard deviation of the innovations. The command also has an option to account for deterministic linear trend and another option to perform bootstrap unit-root tests under the null hypothesis of random walk without drift.
bsrwalkdrift, bootstrap, unit-root test, random walk, drift, augmented Dickey–Fuller test
http://hdl.handle.net/10.1177/1536867X211000003
Miguel Dorta
Gustavo Sanchez
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:263-2712021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:263-271
article
Stata tip 140: Shorter or fewer category labels with graph bar
http://hdl.handle.net/10.1177/1536867X211000032
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:3-382021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:3-38
article
Estimation of nested and zero-inflated ordered probit models
We introduce three new commands—nop, ziop2, and ziop3—for the estimation of a three-part nested ordered probit model, the two-part zero-inflated ordered probit models of Harris and Zhao (2007, Journal of Econometrics 141: 1073–1099) and Brooks, Harris, and Spencer (2012, Economics Letters 117: 683–686), and a three-part zero-inflated ordered probit model of Sirchenko (2020, Studies in Nonlinear Dynamics and Econometrics 24: 1) for ordinal outcomes, with both exogenous and endogenous switching. The three-part models allow the probabilities of positive, neutral (zero), and negative outcomes to be generated by distinct processes. The zero-inflated models address a preponderance of zeros and allow them to emerge in different latent regimes. We provide postestimation commands to compute probabilistic predictions and various measures of their accuracy, to assess the goodness of fit, and to perform model comparison using the Vuong test (Vuong, 1989, Econometrica 57: 307–333) with the corrections based on the Akaike and Schwarz information criteria. We investigate the finite-sample performance of the maximum likelihood estimators by Monte Carlo simulations, discuss the relations among the models, and illustrate the new commands with an empirical application to the U.S. federal funds rate target.
nop, ziop2, ziop3, ordinal outcomes, zero inflation, nested ordered probit, zero-inflated ordered probit, endogenous switching, Vuong test, federal funds rate target
http://hdl.handle.net/10.1177/1536867X211000002
David Dale
Andrei Sirchenko
oai:RePEc:tsj:stataj:y:20:y:2020:i:1:p:244-2492021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:20:y:2020:i:1:p:244-249
article
Stata tip 135: Leaps and bounds
http://hdl.handle.net/10.1177/1536867X20909707
Maarten L. Buis
oai:RePEc:tsj:stataj:y:19:y:2019:i:1:p:195-2052021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:19:y:2019:i:1:p:195-205
article
Calculating level-specific SEM fit indices for multilevel mediation analyses
Stata’s gsem command provides the ability to fit multilevel structural equation models (SEM) and related multilevel models. A motivating example is provided by multilevel mediation analyses (MA) conducted on patient data from Methadone Maintenance Treatment clinics in China. Multilevel MA conducted through the gsem command examined the mediating effects of patients’ treatment progression and rapport with counselors on their treatment satisfaction. Multilevel models accounted for the clustering of patient observations within clinics. SEM fit indices, such as the comparative fit index and the root mean squared error of approximation, are commonly used in the SEM model selection process. Multilevel models present challenges in constructing fit indices because there are multiple levels of hierarchy to account for in establishing goodness of fit. Level-specific fit indices have been proposed in the literature but have not been incorporated into the gsem command. I created the gsemgof command to fill this role. Model results from the gsem command are used to calculate the level-specific comparative fit index and root mean squared error of approximation fit indices. I illustrate the gsemgof command through multilevel MA applied to two-level Methadone Maintenance Treatment data.
gsemgof, gsem, sem, multilevel, structural equation model, mediation analysis, fit index
http://hdl.handle.net/10.1177/1536867X211000022
W. Scott Comulada
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:180-1942021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:180-194
article
Implementing blopmatching in Stata
The blopmatching estimator for average treatment effects in observational studies is a nonparametric matching estimator proposed by Díaz, Rau, and Rivera (2015, Review of Economics and Statistics 97: 803–812). This approach uses the solutions of linear programming problems to build the weighting schemes that are used to impute the missing potential outcomes. In this article, we describe blopmatch, a new command that implements these estimators.
blopmatch, average treatment effects, matching, linear programming, synthetic covariate
http://hdl.handle.net/10.1177/1536867X211000021
Juan D. Díaz
Iván Gutiérrez
Jorge Rivera
oai:RePEc:tsj:stataj:y:20:y:2020:i:1:p:236-2432021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:20:y:2020:i:1:p:236-243
article
Speaking Stata: Concatenating values over observations
Concatenation, or joining together, of strings or other values, possibly with extra punctuation such as spaces, is supported in Stata by addition of strings and by the egen function concat(), which concatenates values of variables within observations. In this column, I discuss basic techniques for concatenating values of variables over observations, emphasizing simple loops that can be tuned to suit variants as desired. Commonly, such concatenated strings report a profile or history of each individual within panel or longitudinal data. Such histories can then be analyzed further.
concatenation, strings, panel data, longitudinal data
http://hdl.handle.net/10.1177/1536867X20909698
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:81-962021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:81-96
article
Estimation of marginal effects for models with alternative variable transformations
margins is a powerful postestimation command that allows the estimation of marginal effects for official and community-contributed commands, with well-defined predicted outcomes (see predict). While the use of factor-variable notation allows one to easily estimate marginal effects when interactions and polynomials are used, estimation of marginal effects when other types of transformations such as splines, logs, or fractional polynomials are used remains a challenge. In this article, I describe how margins’s capabilities can be extended to analyze other variable transformations using the command f_able.
f_able, margins, marginal effects, predict, variable transformations, nonlinear
http://hdl.handle.net/10.1177/1536867X211000005
Fernando Rios-Avila
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:141-1512021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:141-151
article
From common to firm-specific event dates: A new version of the estudy command
The estudy command proposed by Pacicco, Vena, and Venegoni (2018, Stata Journal 18: 461–476) performs event studies only for event-date clustering, that is, when the event date is common to all securities. This constitutes a relevant limitation because the vast majority of this methodology’s applications concerns studies in which the events happen on different dates for each statistical unit considered. In this article, we propose and describe a substantial update to estudy, which 1) performs event studies in the absence of event-date clustering (that is, when each security has its own event date); 2) further customizes the output by producing LaTeX-formatted tables; 3) graphs the cumulative abnormal returns over a customized period set by the user; 4) makes more output data available through either the return list or Excel files; 5) allows a double possibility as input: either prices or returns; and 6) uses wildcards.
estudy, event study, financial econometrics
http://hdl.handle.net/10.1177/1536867X211000010
Fausto Pacicco
Luigi Vena
Andrea Venegoni
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:123-1402021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:123-140
article
msreg: A command for consistent estimation of linear regression models using matched data
Economists often use matched samples, especially when dealing with earning data where some observations are missing in one sample and need to be imputed from another sample. Hirukawa and Prokhorov (2018, Journal of Econometrics 203: 344–358) show that the ordinary least-squares estimator using matched samples is inconsistent and propose two consistent estimators. We describe a new command, msreg, that implements these two consistent estimators based on two samples. The estimators attain the parametric convergence rate if the number of continuous matching variables is no greater than four.
msreg, bias correction, linear regression, matching estimation
http://hdl.handle.net/10.1177/1536867X211000008
Masayuki Hirukawa
Di Lu
Artem Prokhorov
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:97-1222021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:97-122
article
Evaluating the maximum regret of statistical treatment rules with sample data on treatment response
In this article, we present the wald_tc command, which computes the maximum regret (MR) of a user-specified statistical treatment rule that uses sample data on realized treatment response (and optionally an instrumental variable) to determine a treatment choice for a population. Because the outcomes of counterfactual treatments are not observed and treatment selection in the study population may not be random, decision makers may be able only to partially identify average treatment effects. wald_tc allows users to compute the MR of a proposed statistical treatment rule under a flexible specification of the data-generating process and determines the state that generates MR.
wald_tc, maximum regret, average treatment effect, instrumental variable, partial identification
http://hdl.handle.net/10.1177/1536867X211000006
Valentyn Litvin
Charles F. Manski
oai:RePEc:tsj:stataj:y:19:y:2019:i:1:p:259-2622021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:19:y:2019:i:1:p:259-262
article
Review of Psychological Statistics and Psychometrics Using Stata, by Scott A. Baldwin
In this article, I review Psychological Statistics and Psychometrics Using Stata, by Scott A. Baldwin (2019, Stata Press).
book review, psychometrics, regression, ANOVA, multilevel, confirmatory factor analysis, exploratory factor analysis, Stata
http://hdl.handle.net/10.1177/1536867X211000031
Christine Wells
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:51-802021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:51-80
article
Testing for slope heterogeneity in Stata
In this article, we introduce a new community-contributed command, xthst, to test for slope heterogeneity in panels with many observations over cross-sectional units and time periods. The command implements such a test, the delta test (Pesaran and Yamagata, 2008, Journal of Econometrics 142: 50–93). Under its null, slope coefficients are homogeneous across cross-sectional units. Under the alternative, slope coefficients are heterogeneous in the cross-sectional dimension. xthst also includes two extensions. The first is a heteroskedasticity- and autocorrelation-consistent robust test along the lines of Blomquist and Westerlund (2013, Economics Letters 121: 374–378). The second extension is a cross-sectional-dependence robust version. We discuss all tests and present examples using an economic growth model. A Monte Carlo simulation shows that the size and the power behave as expected.
xthst, parameter heterogeneity, fixed effects, pooled OLS, mean-group estimator, cross-section dependence, heterogeneity, common correlated random effects
http://hdl.handle.net/10.1177/1536867X211000004
Tore Bersvendsen
Jan Ditzen
oai:RePEc:tsj:stataj:y:19:y:2019:i:1:p:220-2582021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:19:y:2019:i:1:p:220-258
article
Beyond linearity, stability, and equilibrium: The edm package for empirical dynamic modeling and convergent cross-mapping in Stata
How can social and health researchers study complex dynamic systems that function in nonlinear and even chaotic ways? Common methods, such as experiments and equation-based models, may be ill-suited to this task. To address the limitations of existing methods and offer nonparametric tools for characterizing and testing causality in nonlinear dynamic systems, we introduce the edm command in Stata. This command implements three key empirical dynamic modeling (EDM) methods for time series and panel data: 1) simplex projection, which characterizes the dimensionality of a system and the degree to which it appears to function deterministically; 2) S-maps, which quantify the degree of nonlinearity in a system; and 3) convergent cross-mapping, which offers a nonparametric approach to modeling causal effects. We illustrate these methods using simulated data on daily Chicago temperature and crime, showing an effect of temperature on crime but not the reverse. We conclude by discussing how EDM allows checking the assumptions of traditional model-based methods, such as residual autocorrelation tests, and we advocate for EDM because it does not assume linearity, stability, or equilibrium.
edm, empirical dynamic model, convergent cross-mapping, simplex projection, S-maps, causality, manifold, equilibrium
http://hdl.handle.net/10.1177/1536867X211000030
Jinjing Li
Michael J. Zyphur
George Sugihara
Patrick J. Laub
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:152-1792021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:152-179
article
segregsmall: A command to estimate segregation in the presence of small units
Suppose that a population, composed of a minority and a majority group, is allocated into units, which can be neighborhoods, firms, classrooms, etc. Qualitatively, there is some segregation whenever allocation leads to the concentration of minority individuals in some units more than in others. Quantitative measures of segregation have struggled with the small-unit bias. When units contain few individuals, indices based on the minority shares in units are upward biased. For instance, they would point to a positive amount of segregation even when allocation is strictly random. The command segregsmall implements three recent methods correcting for such bias: the nonparametric, partial identification approach of D’Haultfoeuille and Rathelot (2017, Quantitative Economics 8: 39–73); the parametric model of Rathelot (2012, Journal of Business & Economic Statistics 30: 546–553); and the linear correction of Carrington and Troske (1997, Journal of Business & Economic Statistics 15: 402–409). The package also allows for conditional analyses, namely, measures of segregation accounting for characteristics of the individuals or the units.
segregation indices, small-unit bias, partial identification, Duncan index, Theil index, Atkinson index, Coworker index, Gini index
http://hdl.handle.net/10.1177/1536867X211000018
Xavier D’Haultfoeuille
Lucas Girard
Roland Rathelot
oai:RePEc:tsj:stataj:y:19:y:2019:i:1:p:206-2192021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:y:19:y:2019:i:1:p:206-219
article
kmr: A command to correct survey weights for unit nonresponse using groups’ response rates
In this article, we describe kmr, a command to estimate a microcompliance function using groups’ nonresponse rates (Korinek, Mistiaen, and Ravallion, 2007, Journal of Econometrics 136: 213–235), which can be used to correct survey weights for unit nonresponse. We illustrate the use of kmr with an empirical example using the current population survey and state-level nonresponse rates.
kmr, survey weights, unit nonresponse, response rates, microcompliance function
http://hdl.handle.net/10.1177/1536867X211000025
Ercio Muñoz
Salvatore Morelli
oai:RePEc:tsj:stataj:v:21:y:2021:i:1:p:2722021-04-08RePEc:tsj:stataj
RePEc:tsj:stataj:v:21:y:2021:i:1:p:272
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:309-3122017-12-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:309-312
article
Stata tip 88: Efficiently evaluating elasticities with the margins command
Christopher F. Baum
oai:RePEc:tsj:stataj:v:9:y:2009:i:2:p:291-2982017-12-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:2:p:291-298
article
Implementing Horn’s parallel analysis for principal component analysis and factor analysis
I present paran, an implementation of Horn's parallel analysis criteria for factor or component retention in common factor analysis or principal component analysis in Stata. The command permits classical parallel analysis and more recent extensions to it for the pca and factor commands. paran provides a needed extension to Stata’s built-in factor- and component-retention criteria.
paran, parallel analysis, factor analysis, principal component analysis, factor retention, component retention, Horn’s criterion
http://www.stata-journal.com/article.html?article=st0166
http://www.stata-journal.com/software/sj9-2/st0166/
Alexis Dinno
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:303-3042017-12-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:303-304
article
Stata tip 86: The missing() function
Bill Rising
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:297-3022017-12-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:297-302
article
Review of Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modeling Continuous Variables, by Royston and Sauerbrei
This article reviews Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modeling Continuous Variables, by Patrick Royston and Willi Sauerbrei.
applied statistics, nonlinear regression, fractional polynomials
http://www.stata-journal.com/article.html?article=gn0050
William D. Dupont
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:305-3082017-12-21RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:305-308
article
Stata tip 87: Interpretation of interactions in nonlinear models
Maarten L. Buis
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:604-6062020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:604-606
article
Stata tip 27: Classifying data points on scatter plots
http://www.stata-journal.com/article.html?article=gr0023
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:242-2562020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:242-256
article
Graphing confidence ellipses: An update of ellip for Stata 8
This paper describes an update of the ellip command for graphing confidence ellipses in Stata 8. Two of the most notable new features are the option to graph confidence ellipses around variable means and the ability to add inscribed lines. These features allow a geometric characterization of linear regression with unequal error variances, as in McCartin (2003). Copyright 2004 by StataCorp LP.
ellip, confidence ellipse, error-variance regression, elliptical distribution
http://www.stata-journal.com/software/sj4-3/gr32.1/
http://www.stata-journal.com/sjpdf.html?articlenum=gr32_1
Anders Alexandersson
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:216-2192020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:216-219
article
Review of Statistics with Stata (Updated for Version 8) by Hamilton
The new book by Hamilton (2004) is reviewed.
introductory, teaching
http://www.stata-journal.com/sjpdf.html?articlenum=gn0012
Richard Williams
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:131-1362020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:131-136
article
Review of An Introduction to Modern Econometrics Using Stata by Baum
This article reviews An Introduction to Modern Econometrics Using Stata by Christopher F. Baum.
econometrics, textbook, ivreg
http://stata-press.com/books/baum-review.pdf
http://www.stata.com/bookstore/sjj.html
Austin Nichols
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:403-4102020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:403-410
article
From the help desk: Demand system estimation
This article provides an example illustrating how to use Stata to estimate systems of household demand equations. More generally, the techniques developed here can be used to estimate any system of nonlinear equations using Stata's maximum likelihood routines. Copyright 2002 by Stata Corporation.
nonlinear estimation, maximum likelihood, demand equations
http://www.stata-journal.com/software/sj2-4/st0029/
http://www.stata-journal.com/sjpdf.html?articlenum=st0029
Brian P. Poi
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:391-4022020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:391-402
article
The clustergram: A graph for visualizing hierarchical and nonhierarchical cluster analyses
In hierarchical cluster analysis, dendrograms are used to visualize how clusters are formed. I propose an alternative graph called a "clustergram" to examine how cluster members are assigned to clusters as the number of clusters increases. This graph is useful in exploratory analysis for nonhierarchical clustering algorithms such as k means and for hierarchical cluster algorithms when the number of observations is large enough to make dendrograms impractical. I present the Stata code and give two examples. Copyright 2002 by Stata Corporation.
dendrogram, tree, clustering, nonhierarchical, large data, asbestos
http://www.stata-journal.com/software/sj2-4/st0028/
http://www.stata-journal.com/sjpdf.html?articlenum=st0028
Matthias Schonlau
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:1-212020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:1-21
article
Estimating variance components in Stata
This article gives a brief overview of the popular methods for estimating variance components in linear models and describes several ways to obtain such estimates in Stata for various experimental designs. The article’s emphasis is on using xtmixed to estimate variance components. Prior to Stata 9, loneway could be used to estimate variance components for one-way random-effects models. For other experimental designs, variance components could be computed manually using saved results after anova. The latter approach is viable but requires tedious computations for complicated experimental designs. Instead, as of Stata 9, variance components are easily obtained by using xtmixed. Copyright 2006 by StataCorp LP.
variance components, experimental design, ANOVA, REML, ML, multilevel, random coefficients, mixed models
http://www.stata-journal.com/article.html?article=st0095
http://www.stata-journal.com/software/sj6-1/st0095/
Yulia Marchenko
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:178-1842020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:178-184
article
From the help desk: hurdle models
This article demonstrates that, although there is no command in Stata for fitting hurdle models, the parameters of a hurdle model can be estimated in Stata rather easily using a combination of existing commands. We also include a likelihood evaluator to be used with Stata's ml facilities to illustrate how to fit a hurdle model using ml's cluster(), svy, and constraints() options. Copyright 2003 by Stata Corporation.
hurdle model
http://www.stata-journal.com/sjpdf.html?articlenum=st0040
Allen McDowell
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:208-2102020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:208-210
article
Review of Generalized Estimating Equations by Hardin and Hilbe
The new book by Hardin and Hilbe (2003) is reviewed. Copyright 2003 by Stata Corporation.
generalized estimating equations, generalized linear models
http://www.stata-journal.com/sjpdf.html?articlenum=gn0008
Steven Stillman
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:101-1042020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:101-104
article
Residual diagnostics for cross-section time series regression models
These routines support the diagnosis of groupwise heteroskedasticity and cross-sectional correlation in the context of a regression model fit to pooled cross-section time series (xt) data. Copyright 2001 by Stata Corporation.
fixed effects, groupwise heteroskedasticity, contemporaneous correlation
http://www.stata-journal.com/software/sj1-1/st0004/
http://www.stata-journal.com/sjpdf.html?articlenum=st0004
Christopher F Baum
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:295-3012020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:295-301
article
Lean mainstream schemes for Stata 8 graphics
The new Stata 8 graphics are powerful and flexible. Now, a few months after the first release, the graphics still have some shortcomings, both in design and in the manual documenting the program, but progress is being made. The graph layout used throughout the Graphics Reference Manual has led some users to underestimate the potential of the program. This paper presents two schemes for a lean layout, conforming to the mainstream in scientific publishing. Copyright 2003 by StataCorp LP.
graphs, schemes
http://www.stata-journal.com/software/sj3-3/gr0002/
http://www.stata-journal.com/sjpdf.html?articlenum=gr0002
Svend Juul
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:147-1482020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:147-148
article
Stata tip 29: For all times and all places
http://www.stata-journal.com/article.html?article=dm0020
Charles H. Franklin
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:486-4872020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:486-487
article
Stata tip 14: Using value labels in expressions
http://www.stata-journal.com/sjpdf.html?articlenum=dm0009
Kenneth Higbee
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:127-1412020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:127-141
article
Funnel plots in meta-analysis
Funnel plots are a visual tool for investigating publication and other bias in meta-analysis. They are simple scatterplots of the treatment effects estimated from individual studies (horizontal axis) against a measure of study size (vertical axis). The name "funnel plot" is based on the precision in the estimation of the underlying treatment effect increasing as the sample size of component studies increases. Therefore, in the absence of bias, results from small studies will scatter widely at the bottom of the graph, with the spread narrowing among larger studies. Publication bias (the association of publication probability with the statistical significance of study results) may lead to asymmetrical funnel plots. It is, however, important to realize that publication bias is only one of a number of possible causes of funnel-plot asymmetry; funnel plots should be seen as a generic means of examining small-study effects (the tendency for the smaller studies in a meta-analysis to show larger treatment effects) rather than a tool to diagnose specific types of bias. This article introduces the metafunnel command, which produces funnel plots in Stata. In accordance with published recommendations, standard error is used as the measure of study size. Treatment effects expressed as ratio measures (for example, risk ratios or odds ratios) may be plotted on a log scale. Copyright 2004 by StataCorp LP.
metafunnel, funnel plots, meta-analysis, publication bias, small-study effects
http://www.stata-journal.com/software/sj4-2/st0061/
http://www.stata-journal.com/sjpdf.html?articlenum=st0061
Jonathan A.C. Sterne
Roger M. Harbord
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:188-2012020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:188-201
article
Multiple imputation of missing values: update
This article describes a substantial update to mvis, which brings it more closely in line with the feature set of S. van Buuren and C. G. M. Oudshoorn’s implementation of the MICE system in R and S-PLUS (for details, see http://www.multiple-imputation.com). To make a clear distinction from mvis, the principal program of the new Stata release is called ice. I will give details of how to use the new features and a practical illustrative example using real data. All the facilities of mvis are retained by ice. Some improvements to micombine for computing estimates from multiply imputed datasets are also described. Copyright 2005 by StataCorp LP.
ice, mvis, uvis, micombine, mijoin, misplit, missing data, missing at random, multiple imputation, multivariate imputation, regression modeling
http://www.stata-journal.com/article.html?article=st0067_1
http://www.stata-journal.com/software/sj5-2/st0067_1/
Patrick Royston
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:112-1232020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:112-123
article
Mata Matters: Creating new variables--sounds boring, isn’t
Mata is Stata’s matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this quarter’s column, we continue to explore the handling of Stata datasets in Mata and focus on creating new variables.
Mata, views, Stata datasets
http://www.stata-journal.com/article.html?article=pr0021
William Gould
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:270-2772020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:270-277
article
Do-it-yourself shuffling and the number of runs under randomness
A common class of problem in statistical science is estimating, as a benchmark, the probability of some event under randomness. For example, in a sequence of events in which several outcomes are possible and the length of the sequence and number of outcomes of each type known, the number of runs gives an indication of whether the outcomes are random, clustered, or alternating. This note explains and illustrates a simple method of random shuffling that is often useful. We show how the conditional probability distribution of the number of runs may be derived easily in Stata, thus yielding p-values for testing the null hypothesis that the type of outcome is random. We also compare our direct approach with that using the simulate command. Copyright 2003 by StataCorp LP.
alternation, categorical data, clustering, conditional distribution, forvalues, p-value, permutation, run, sequence, simulate, simulation
http://www.stata-journal.com/sjpdf.html?articlenum=st0044
Nigel Smeeton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:124-1372020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:124-137
article
Speaking Stata: Time of day
Many problems in statistical analysis include time-of-day variables, but Stata offers limited support for time-of-day calculations. Support is needed for dates with times, times alone, and durations or timings. This article presents two new programs as general utilities to convert back and forth between string and numeric representations.
ntimeofday, stimeofday, time of day, time series, calendar
http://www.stata-journal.com/article.html?article=dm0018
http://www.stata-journal.com/software/sj6-1/dm0018/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:357-3582020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:357-358
article
Stata tip 12: Tuning the plot region aspect ratio
http://www.stata-journal.com/sjpdf.html?articlenum=gr0007
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:402-4202020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:402-420
article
Controlling for time-dependent confounding using marginal structural models
Longitudinal studies in which exposures, confounders, and outcomes are measured repeatedly over time have the potential to allow causal inferences about the effects of exposure on outcome. There is particular interest in estimating the causal effects of medical treatments (or other interventions) in circumstances in which a randomized controlled trial is difficult or impossible. However, standard methods for estimating exposure effects in longitudinal studies are biased in the presence of time-dependent confounders affected by prior treatment. This article describes the use of marginal structural models (described by Robins, Hernán, and Brumback [2000]) to estimate exposure or treatment effects in the presence of time-dependent confounders affected by prior treatment. The method is based on deriving inverse-probability-of-treatment weights, which are then used in a pooled logistic regression model to estimate the causal effect of treatment on outcome. We demonstrate the use of marginal structural models to estimate the effect of methotrexate on mortality in persons suffering from rheumatoid arthritis. Copyright 2004 by StataCorp LP.
marginal structural models, causal models, weighted regression, survival analysis, logistic regression, confounding
http://www.stata-journal.com/software/sj4-4/st0075/
http://www.stata-journal.com/sjpdf.html?articlenum=st0075
Zoe Fewell
Frederick Wolfe
Hyon Choi
Miguel A. Hernán
Kate Tilling
Jonathan A. C. Sterne
oai:RePEc:tsj:stataj:v:5:y:2004:i:1:p:43-452020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2004:i:1:p:43-45
article
Stata at 20: a personal view
http://www.stata-journal.com/sjpdf.html?articlenum=gn0024
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:27-392020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:27-39
article
Semi-nonparametric estimation of extended ordered probit models
This paper presents a semi-nonparametric estimator for a series of generalized models that nest the ordered probit model and thereby relax the distributional assumption in that model. It describes a new Stata command for fitting such models and presents an illustration of the approach. Copyright 2004 by StataCorp LP.
ordered response models, ordered probit, semi-nonparametric estimation
http://www.stata-journal.com/software/sj4-1/st0056/
http://www.stata-journal.com/sjpdf.html?articlenum=st0056
Mark B. Stewart
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:89-922020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:89-92
article
Review of Veterinary Epidemiologic Research by Dohoo, Martin and Stryhn
The new book by Dohoo, Martin and Stryhn (2003) is reviewed.
veterinary, epidemiology
http://www.stata-journal.com/sjpdf.html?articlenum=gn0010
Laurent Audige
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:351-3602020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:351-360
article
Instrumental variables, bootstrapping, and generalized linear models
This paper discusses and illustrates the qvf command for fitting generalized linear models. The differences between this new command and Stata's glm command are highlighted. One of the most notable features of the qvf command is its ability to include instrumental variables. This functionality was added specifically to address measurement error but may be utilized by the user for other purposes. The qvf command was developed in the C language using Stata's new plugin features and executes much faster than the glm ado-file. Copyright 2003 by StataCorp LP.
measurement error, instrumental variables, Murphy–Topel, bootstrap, generalized linear models
http://www.stata-journal.com/software/sj3-4/st0049/
http://www.stata-journal.com/sjpdf.html?articlenum=st0049
James W. Hardin
Henrik Schmiediche
Raymond J. Carroll
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:32-342020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:32-34
article
In at the creation
http://www.stata-journal.com/sjpdf.html?articlenum=gn0019
Sean Becketti
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:278-2942020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:278-294
article
Multivariate probit regression using simulated maximum likelihood
We discuss the application of the GHK simulation method for maximum likelihood estimation of the multivariate probit regression model and describe and illustrate a Stata program mvprobit for this purpose. Copyright 2003 by StataCorp LP.
maximum likelihood estimation, multivariate probit regression model, GHK, mvprobit, mvppred
http://www.stata-journal.com/software/sj3-3/st0045/
http://www.stata-journal.com/sjpdf.html?articlenum=st0045
Lorenzo Cappellari
Stephen P. Jenkins
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:480-4832020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:480-483
article
Review of Statistical Evaluation of Measurement Errors by Dunn
The new edition of the book by Dunn (2004) is reviewed.
measurement errors, linear models, mixed models, gllamm
http://www.stata-journal.com/sjpdf.html?articlenum=gn0015
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:476-4792020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:476-479
article
Review of A Visual Guide to Stata Graphics by Mitchell
The new book by Mitchell (2004) is reviewed.
graphics, Stata texts
http://stata-journal.com/books/vgsg-review.pdf
http://www.stata-journal.com/sjpdf.html?articlenum=gn0014
Ulrich Kohler
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:331-3502020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:331-350
article
Using Aalen's linear hazards model to investigate time-varying effects in the proportional hazards regression model
In this paper, we describe a new Stata command, stlh, which estimates and tests for the significance of the time-varying regression coefficients in Aalen's linear hazards model; see Aalen (1989). We see two potential uses for this command. One may use it as an alternative to a proportional hazards or other nonlinear hazards regression model analysis to describe the effects of covariates on survival time. A second application is to use the command to supplement a proportional hazards regression model analysis to assist in detecting and then describing the nature of time-varying effects of covariates through plots of the estimated cumulative regression coefficients, with confidence bands, from Aalen's model. We illustrate the use of the command to perform this supplementary analysis with data from a study of residential treatment programs of different durations that are designed to prevent return to drug use. Copyright 2002 by Stata Corporation.
survival analysis, survival-time regression models, time-to-event analysis
http://www.stata-journal.com/software/sj2-4/st0024/
http://www.stata-journal.com/sjpdf.html?articlenum=st0024
David W. Hosmer
Patrick Royston
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:64-822020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:64-82
article
Visualizing main effects and interactions for binary logit models
This paper considers the role of covariates when using predicted probabilities to interpret main effects and interactions in logit models. While predicted probabilities are very intuitive for interpreting main effects and interactions, the pattern of results depends on the contribution of covariates. We introduce a concept called the covariate contribution, which reflects the aggregate contribution of all of the remaining predictors (covariates) in the model and a family of tools to help visualize the relationship between predictors and the predicted probabilities across a variety of covariate contributions. We believe this strategy and the accompanying tools can help researchers who wish to use predicted probabilities as an interpretive framework for logit models acquire and present a more comprehensive interpretation of their results. These visualization tools could be extended to other models (such as binary probit, multinomial logistic, ordinal logistic models, and other nonlinear models). Copyright 2005 by StataCorp LP.
logistic regression, predicted probabilities, main effects, interactions, covariate contribution
http://www.stata-journal.com/software/sj5-1/st0081/
http://www.stata-journal.com/article.html?article=st0081
Michael N. Mitchell
Xiao Chen
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:248-2582020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:248-258
article
Teaching statistics to physicians using Stata
The Clinical Research Training Program (CRTP) at the Albert Einstein College of Medicine at Yeshiva University is a two-year program for physicians leading to a Master of Science degree in Clinical Research Methods. Beginning in July 2004, the program began teaching data analysis using Stata 8 in order to better meet the advanced statistical needs of the students. This paper details the structure and content of the course, how Stata was introduced, and the problems we encountered. Student comments and suggestions on future enhancements to Stata are included. Although challenging, our first semester teaching Stata was a success: the students all learned Stata and, more importantly, continued to use it for the analysis of their own research data after the course was complete. Copyright 2005 by StataCorp LP.
teaching statistics to physicians, menu-driven interaction style
http://www.stata-journal.com/article.html?article=gn0027
Susan M. Hailpern
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:58-752020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:58-75
article
Haplotype analysis in population-based association studies
This paper describes how to use the command hapipf and introduces the command profhap written for Stata that analyzes population-based genetic data. For these studies, association can be linkage disequilibrium within a set of loci or allelic/haplotype association with disease status. Confidence intervals for odds ratios are calculated with or without adjustment for possible factors that are confounding the relationship. Additionally, this command allows the specification of many models of association that are not widely implemented. Copyright 2001 by Stata Corporation.
haplotype analysis, association tests, profile likelihood, odds ratio
http://www.stata-journal.com/software/sj1-1/st0003/
http://www.stata-journal.com/sjpdf.html?articlenum=st0003
A. P. Mander
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:420-4392020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:420-439
article
Speaking Stata: Problems with tables, Part II
Three user-written commands are reviewed as illustrations of different approaches to tabulation problems, each one step beyond what is possible to do directly through official Stata. tabcount is a wrapper for tabdisp written to produce tables that show how often specified values occur or specified conditions are satisfied so that, in particular, tables may include explicit zeros whenever desired. makematrix is designed for situations in which a table of results may be compiled by populating a matrix. matrix list or list may then be used to display the table. groups shows frequencies of combinations of values using list. Users should find these commands to be helpful additions to the toolkit. Programmers may be interested in examples of the wrapper approach, calculating the values to be tabulated before passing them to a workhorse display command. This is the second of two papers on this topic. Copyright 2003 by StataCorp LP.
tables, matrices, tabcount, makematrix, groups, tabdisp, list
http://www.stata-journal.com/sjpdf.html?articlenum=pr0011
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:81-992020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:81-99
article
Speaking Stata: On structure and shape: the case of multiple responses
A frequent problem in data management is that datasets may not arrive in the best structure for many analyses, so that it may be necessary to restructure the data in some way. The particular case of multiple response data is discussed at length, with special attention to different possible structures; the generation of new variables holding the data in different form; valuable inbuilt string and egen functions; using foreach and forvalues to loop over lists; and the use of the reshape command. Tabulations and graphics for such data are also reviewed briefly. Copyright 2003 by Stata Corporation.
composite variables, concatenation, egen, foreach, forvalues, graphics, indicator variables, multiple responses, reshape, split, string functions, tabulations
http://www.stata-journal.com/sjpdf.html?articlenum=pr0008
Nicholas J. Cox
Ulrich Kohler
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:106-1112020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:106-111
article
Decomposing inequality and obtaining marginal effects
This article describes a user-written command, descogini, that decomposes the Gini coefficient by income source and allows the calculation of the impact that a marginal change in a particular income source will have on inequality. descogini can be used with bootstrap to obtain standard errors and confidence intervals. Copyright 2006 by StataCorp LP.
descogini, Gini, Gini decomposition by income source, inequality
http://www.stata-journal.com/article.html?article=st0100
http://www.stata-journal.com/software/sj6-1/st0100/
Alejandro Lopez-Feldman
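As background to the abstract above: decompositions of this kind are commonly attributed to Lerman and Yitzhaki (1985). A sketch of the key identities as usually stated (notation mine, not taken from the article):

```latex
% Gini decomposition by income source k = 1, ..., K
% S_k : share of source k in total income
% G_k : Gini coefficient of source k
% R_k : Gini correlation of source k with total income
G = \sum_{k=1}^{K} S_k \, G_k \, R_k
% Marginal effect on G of a small proportional change e in source k:
\frac{\partial G}{\partial e_k} = S_k \left( G_k R_k - G \right)
```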
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:140-1502020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:140-150
article
strbee: Randomization-based efficacy estimator
strbee analyzes a two-group clinical trial with a survival outcome, in which some subjects may "cross over" to receive the treatment of the other arm. Adjustment for treatment crossover is done by a randomization-respecting method that preserves the intention-to-treat p-value. Copyright 2002 by Stata Corporation.
clinical trials, treatment changes, randomization-respecting
http://www.stata-journal.com/software/sj2-2/st0012/
http://www.stata-journal.com/sjpdf.html?articlenum=st0012
Ian R. White
Sarah Walker
Abdel Babiker
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:465-4662020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:465-466
article
Stata tip 22: Variable name abbreviation
http://www.stata-journal.com/article.html?article=dm0016
Philip Ryan
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:474-4772020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:474-477
article
Stata tip 102: Highlighting specific bars
http://www.stata-journal.com/article.html?article=gr0049
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:40-492020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:40-49
article
FIML estimation of an endogenous switching model for count data
This paper presents code for fitting a FIML endogenous switching Poisson count model for cross-sectional data in Stata 7: the espoisson command. The Poisson process depends on an unobserved heterogeneity term; a set of explanatory variables, x; and an endogenous dummy, d. The endogenous dummy depends in turn on an unobserved random term, and correlation between the two unobserved terms is allowed. If a model with exogenous d is fitted instead, this correlation will result in simultaneous equation bias. The endogenous switching model corrects this problem. After describing the underlying econometric theory behind the command, the paper discusses an example.
count models, endogenous switch, sample selection
http://www.stata-journal.com/software/sj4-1/st0057/
http://www.stata-journal.com/sjpdf.html?articlenum=st0057
Alfonso Miranda
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:227-2412020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:227-241
article
Multiple imputation of missing values
Following the seminal publications of Rubin about thirty years ago, statisticians have become increasingly aware of the inadequacy of "complete-case" analysis of datasets with missing observations. In medicine, for example, observations may be missing in a sporadic way for different covariates, and a complete-case analysis may omit as many as half of the available cases. Hotdeck imputation was implemented in Stata in 1999 by Mander and Clayton. However, this technique may perform poorly when many rows of data have at least one missing value. This article describes an implementation for Stata of the MICE method of multiple multivariate imputation described by van Buuren, Boshuizen, and Knook (1999). MICE stands for multivariate imputation by chained equations. The basic idea of data analysis with multiple imputation is to create a small number (e.g., 5-10) of copies of the data, each of which has the missing values suitably imputed, and analyze each complete dataset independently. Estimates of parameters of interest are averaged across the copies to give a single estimate. Standard errors are computed according to the "Rubin rules", devised to allow for the between- and within-imputation components of variation in the parameter estimates. This article describes five ado-files. mvis creates multiple multivariate imputations. uvis imputes missing values for a single variable as a function of several covariates, each with complete data. micombine fits a wide variety of regression models to a multiply imputed dataset, combining the estimates using Rubin's rules, and supports survival analysis models (stcox and streg), categorical data models, generalized linear models, and more. Finally, misplit and mijoin are utilities to interconvert datasets created by mvis and by the miset program from John Carlin and colleagues. The use of the routines is illustrated with an example of prognostic modeling in breast cancer. Copyright 2004 by StataCorp LP.
mvis, uvis, micombine, mijoin, misplit, missing data, missing at random, multiple imputation, multivariate imputation, regression modeling
http://www.stata-journal.com/software/sj4-3/st0067/
http://www.stata-journal.com/sjpdf.html?articlenum=st0067
Patrick Royston
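The pooling step summarized in the abstract above (average the estimates across imputed copies; combine within- and between-imputation variance by Rubin's rules) can be sketched in Python rather than Stata. A minimal illustration; the function name and the numbers are hypothetical, not from the article:

```python
import math

def rubin_pool(estimates, variances):
    """Pool m complete-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    qbar = sum(estimates) / m                    # pooled point estimate
    ubar = sum(variances) / m                    # within-imputation variance
    # between-imputation variance of the m estimates
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)
    t = ubar + (1 + 1 / m) * b                   # total variance
    return qbar, math.sqrt(t)

# Five imputed copies of the data give five estimates and variances:
est, se = rubin_pool([1.02, 0.97, 1.05, 0.99, 1.01],
                     [0.04, 0.05, 0.04, 0.05, 0.04])
```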
oai:RePEc:tsj:stataj:v:5:y:2004:i:2:p:239-2472020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2004:i:2:p:239-247
article
Using the file command to produce formatted output for other applications
The file command provides a way to produce tables for use in other application software. It can be especially useful for combining descriptive results (such as means and percentages) with results from significance tests. Extracting and manipulating the results directly from Stata matrices gives more control over arrangement, while other Stata functions may be used to control numeric formats. This tutorial includes examples of both plain text and HTML output, based on survey data. Copyright 2005 by StataCorp LP.
file, presentation of results, tables, HTML, spreadsheets, word processors, browsers
http://www.stata-journal.com/article.html?article=dm0015
Emma Slaymaker
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:227-2522020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:227-252
article
Structural choice analysis with nested logit models
The nested logit model has become an important tool for the empirical analysis of discrete outcomes. There is some confusion about its specification of the outcome probabilities. Two major variants show up in the literature. This paper compares both and finds that one of them (called random utility maximization nested logit, RUMNL) is preferable in most situations. Since the command nlogit of Stata 7.0 implements the other variant (called non-normalized nested logit, NNNL), an implementation of RUMNL called nlogitrum is introduced. Numerous examples support and illustrate the differences between both specifications. Copyright 2002 by Stata Corporation.
nlogitdn, nlogitrum, nested logit model, discrete choice, random utility maximization model
http://www.stata-journal.com/software/sj2-3/st0017/
http://www.stata-journal.com/sjpdf.html?articlenum=st0017
Florian Heiss
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:378-3902020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:378-390
article
Programmable GLM: Two user-defined links
With the release of Stata 7, the glm command for fitting generalized linear models underwent a substantial overhaul. Stata 7 glm contains an expanded array of variance estimators, regression diagnostics, and other enhancements. The overhaul took place to coincide with the release of Hardin and Hilbe (2001). With the new glm came a modular design that enables users to program customized link functions, variance functions, and weight functions to be used if Newey-West covariance estimates are desired. Because cases requiring customized link functions are more prevalent in the literature, only those are considered here. We give two examples where a nonstandard link function is required: the relative survival model of Hakulinen and Tenkanen (1987) and a logistic model that accounts for natural response as described in Collett (2003). The relative ease (over previous versions of Stata) with which these alternate links can be programmed into glm is demonstrated. Copyright 2002 by Stata Corporation.
GLM, survival analysis, Cox regression, programming
http://www.stata-journal.com/software/sj2-4/st0027/
http://www.stata-journal.com/sjpdf.html?articlenum=st0027
Weihua Guan
Roberto G. Gutierrez
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:208-2332020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:208-233
article
Data inspection using biplots
Biplots display interunit distances, as well as variances and correlations of variables, for large datasets. They can be used as a tool to reveal clustering, multicollinearity, and multivariate outliers, and to guide the interpretation of principal component analyses (PCA). This article describes the uses of biplots and their implementation in Stata. Copyright 2005 by StataCorp LP.
biplot, biplot8, principal component analysis, exploratory data analysis, multivariate statistics, Euclidean distance, Mahalanobis distance, relative variation diagram, projection
http://www.stata-journal.com/article.html?article=gr0011
http://www.stata-journal.com/software/sj5-2/gr0011/
Ulrich Kohler
Magdalena Luniak
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:290-3112020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:290-311
article
Implementing matching estimators for average treatment effects in Stata
This paper presents an implementation of matching estimators for average treatment effects in Stata. The nnmatch command allows you to estimate the average effect for all units or only for the treated or control units; to choose the number of matches; to specify the distance metric; to select a bias adjustment; and to use heteroskedastic-robust variance estimators. Copyright 2004 by StataCorp LP.
nnmatch, average treatment effects, matching, exogeneity, unconfoundedness, ignorability
http://www.stata-journal.com/software/sj4-3/st0072/
http://www.stata-journal.com/sjpdf.html?articlenum=st0072
Alberto Abadie
David Drukker
Jane Leber Herr
Guido W. Imbens
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:22-442020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:22-44
article
Parametric frailty and shared frailty survival models
Frailty models are the survival-data analog of regression models with random effects, which account for unobserved heterogeneity. A frailty is a latent multiplicative effect on the hazard function and is assumed to have unit mean and variance theta, which is estimated along with the other model parameters. A frailty model is a heterogeneity model where the frailties are assumed to be individual- or spell-specific. A shared frailty model is a random-effects model where the frailties are common (or shared) among groups of individuals or spells and are randomly distributed across groups. Parametric frailty models were made available in Stata with the release of Stata 7, while parametric shared frailty models were made available in a recent series of updates. This article serves as a primer to those fitting parametric frailty models in Stata via the streg command. Frailty models are compared to shared frailty models, and both are shown to be equivalent in certain situations. The user-specified form of the distribution of the frailties (whether gamma or inverse Gaussian) is shown to subtly affect the interpretation of the results. Methods for obtaining predictions that are either conditional or unconditional on the frailty are discussed. An example that analyzes the time to recurrence of infection after catheter insertion in kidney patients is studied. Copyright 2002 by Stata Corporation.
parametric survival analysis, frailty, random effects, overdispersion, heterogeneity
http://www.stata-journal.com/sjpdf.html?articlenum=st0006
Roberto G. Gutierrez
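The abstract above describes the frailty as a latent multiplicative effect on the hazard with unit mean and variance theta. In the usual notation (symbols mine, not from the article), this is:

```latex
% Individual frailty alpha_i acts multiplicatively on the hazard:
h_i(t \mid \alpha_i) = \alpha_i \, h(t), \qquad
\mathbb{E}[\alpha_i] = 1, \quad \operatorname{Var}(\alpha_i) = \theta
% In a shared frailty model, the same alpha_j is common to all
% subjects i within group j:
h_{ij}(t \mid \alpha_j) = \alpha_j \, h(t)
```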
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:105-1082020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:105-108
article
The Stata Journal so far: Editors' report
The editor and executive editor give some thoughts on the recent past, the present, and the immediate future of the Stata Journal. Copyright 2003 by Stata Corporation.
http://www.stata-journal.com/sjpdf.html?articlenum=gn0006
H. Joseph Newton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:58-822020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:58-82
article
Generalized ordered logit/partial proportional odds models for ordinal dependent variables
This article describes the gologit2 program for generalized ordered logit models. gologit2 is inspired by Vincent Fu’s gologit routine (Stata Technical Bulletin Reprints 8: 160–164) and is backward compatible with it but offers several additional powerful options. A major strength of gologit2 is that it can fit three special cases of the generalized model: the proportional odds/parallel-lines model, the partial proportional odds model, and the logistic regression model. Hence, gologit2 can fit models that are less restrictive than the parallel-lines models fitted by ologit (whose assumptions are often violated) but more parsimonious and interpretable than those fitted by a nonordinal method, such as multinomial logistic regression (i.e., mlogit). Other key advantages of gologit2 include support for linear constraints, survey data estimation, and the computation of estimated probabilities via the predict command.
gologit2, gologit, logistic regression, ordinal regression, proportional odds, partial proportional odds, generalized ordered logit model, parallel-lines model
http://www.stata-journal.com/article.html?article=st0097
http://www.stata-journal.com/software/sj6-1/st0097/
Richard Williams
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:329-3412020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:329-341
article
Measurement error, GLMs, and notational conventions
This paper introduces additive measurement error in a generalized linear-model context. We discuss the types of measurement error along with their effects on fitted models. In addition, we present the notational conventions to be used in this and the accompanying papers. Copyright 2003 by StataCorp LP.
generalized linear models, transportability, measurement error
http://www.stata-journal.com/sjpdf.html?articlenum=st0047
James W. Hardin
Raymond J. Carroll
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:39-402020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:39-40
article
The birth of the bulletin
http://www.stata-journal.com/sjpdf.html?articlenum=gn0022
Joseph M. Hilbe
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:76-852020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:76-85
article
From the help desk
Welcome to From the help desk. From the help desk is written by the people in Technical Services at StataCorp and deals with issues that they have found to be of concern to a large fraction of Stata users. It is the rare column in this series that deals with sophisticated programming issues because such issues, by definition, are not of concern to a large fraction of Stata users. Instead, From the help desk discusses the use of sophisticated programs and sophisticated statistics. Copyright 2001 by Stata Corporation.
internet, web, ado-files, Stata executable installation, updates, downloading, user-written additions, packages, search, find
http://www.stata-journal.com/sjpdf.html?articlenum=pr0002
Allen McDowell
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:312-3282020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:312-328
article
From the help desk: Some bootstrapping techniques
Bootstrapping techniques have become increasingly popular in applied econometrics and other areas. This article presents several methods and shows how to implement them using Stata's bootstrap command. Copyright 2004 by StataCorp LP.
bssize initial, bssize refine, bssize analyze, bssize cleanup, bootstrap, confidence intervals, percentile-t, dependent processes
http://www.stata-journal.com/software/sj4-3/st0073/
http://www.stata-journal.com/sjpdf.html?articlenum=st0073
Brian P. Poi
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:354-3552020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:354-355
article
Stata tip 10: Fine control of axis title positions
http://www.stata-journal.com/sjpdf.html?articlenum=gr0006
Philip Ryan
Nicholas Winter
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:601-6022020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:601-602
article
Stata tip 25: Sequence index plots
http://www.stata-journal.com/article.html?article=gr0022
Ulrich Kohler
Christian Brzinsky-Fay
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:314-3292020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:314-329
article
Speaking Stata: On numbers and strings
The great divide among data types in Stata is between numeric and string variables. Most of the time, which kind you want to use for particular variables is clear and unproblematic, but surprisingly often, users face difficulties in making the right decision or need to convert variables from one kind to another. The main problems that may arise and their possible solutions are surveyed with reference both to official Stata and to user-written programs. Copyright 2002 by Stata Corporation.
binary variables, categorical variables, Data Editor, dates, decode, destring, encode, identifiers, missing values, numeric variables, spreadsheets, string functions, string variables, tostring, value labels
http://www.stata-journal.com/sjpdf.html?articlenum=pr0006
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:4482020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:448
article
Stata tip 3: How to be assertive
http://www.stata-journal.com/sjpdf.html?articlenum=dm0004
William Gould
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:223-2252020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:223-225
article
Review of Statistics with Stata (Updated for Version 7)
The new book by Hamilton (2002) is reviewed. Copyright 2002 by Stata Corporation.
http://www.stata-journal.com/sjpdf.html?articlenum=gn0003
Jeremy Freese
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:1-312020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:1-31
article
Instrumental variables and GMM: Estimation and testing
We discuss instrumental variables (IV) estimation in the broader context of the generalized method of moments (GMM), and describe an extended IV estimation routine that provides GMM estimates as well as additional diagnostic tests. Stand-alone test procedures for heteroskedasticity, overidentification, and endogeneity in the IV context are also described. Copyright 2003 by Stata Corporation.
instrumental variables, generalized method of moments, endogeneity, heteroskedasticity, overidentifying restrictions
http://www.stata-journal.com/software/sj3-1/st0030/
http://www.stata-journal.com/sjpdf.html?articlenum=st0030
Christopher F Baum
Mark E. Schaffer
Steven Stillman
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:83-912020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:83-91
article
Further processing of estimation results: Basic programming with matrices
Rather than process estimation results in other applications, such as spreadsheets, this article shows how easy it is to process them inside Stata by undertaking some basic programming with matrices. Spreadsheet visualization helps define the task, but the steps are all core Stata: macros, loops, and matrices. The programming challenge is only a modest one for novices, while the benefits of converting do-files into ado programs can be considerable. Copyright 2005 by StataCorp LP.
matrices, programming, estimation results
http://www.stata-journal.com/software/sj5-1/pr0015/
http://www.stata-journal.com/article.html?article=pr0015
Ian Watson
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:203-2072020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:203-207
article
Review of Statistical Modeling for Biomedical Researchers by Dupont
The new book by Dupont (2002) is reviewed. Copyright 2003 by Stata Corporation.
biostatistics
http://www.stata-journal.com/sjpdf.html?articlenum=gn0007
Joanne M. Garrett
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:3252020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:325
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2003 by StataCorp LP.
http://www.stata-journal.com/software/sj3-3/sg151_1/
http://www.stata-journal.com/software/sj3-3/snp15_4/
http://www.stata-journal.com/software/sj3-3/snp16_2/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:4492020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:449
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2003 by StataCorp LP.
http://www.stata-journal.com/software/sj3-4/gr41_2/
http://www.stata-journal.com/software/sj3-4/sbe19_5/
http://www.stata-journal.com/software/sj3-4/sg67_2/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:440-4442020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:440-444
article
Review of Maximum Likelihood Estimation with Stata by Gould, Pitblado, and Sribney
The new book by Gould, Pitblado, and Sribney (2003) is reviewed. Copyright 2003 by StataCorp LP.
maximum likelihood, Stata programming
http://www.stata-journal.com/sjpdf.html?articlenum=gn0009
Stephen P. Jenkins
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:517-5262020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:517-526
article
Buckley-James method for analyzing censored data, with an application to a cardiovascular disease and an HIV/AIDS study
The Buckley-James method and the Cox proportional hazards model were proposed in the 1970s. Both methods can be used to analyze survival-type data, although the former focuses on calculation of the expected value of the survival time and the latter on the relative risk of explanatory variables on the failure event. In cardiovascular disease epidemiological studies, it is essential to correct for the effect of taking antihypertensive medicine, which means we need to calculate the expected blood pressure for people who take the medicine. I developed a Stata program to calculate the Buckley-James estimate. I will describe how to use this program to calculate the expected value of a censored outcome and illustrate the method through an example from a cardiovascular disease and an HIV/AIDS study. Copyright 2005 by StataCorp LP.
Buckley-James method, censoring, expectation, survival
http://www.stata-journal.com/article.html?article=st0093
http://www.stata-journal.com/software/sj5-4/st0093/
James Cui
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:32-462020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:32-46
article
Intra-class correlation in random-effects models for binary data
We review the concept of intra-class correlation in random-effects models for binary outcomes as estimated by Stata's xtprobit, xtlogit, and xtclog. We consider the usual measures of correlation based on a latent variable formulation of these models and note corrections to the last two procedures. We also discuss alternative measures of association based on manifest variables or actual outcomes and introduce a new command xtrho for computing these measures for all three types of models. Copyright 2003 by Stata Corporation.
intra-class correlation, random-effects, probit, logit, complementary log-log, Pearson's r, Yule's Q
http://www.stata-journal.com/software/sj3-1/st0031/
http://www.stata-journal.com/sjpdf.html?articlenum=st0031
German Rodriguez
Irma Elo
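The latent-variable formulation mentioned in the abstract above leads to the familiar intra-class correlation for these models. A sketch of the standard result (notation mine, not from the article):

```latex
% Latent-variable formulation: y^*_{ij} = x_{ij}'\beta + u_j + e_{ij},
% with u_j the group-level random effect and e_{ij} the residual.
% Intra-class correlation of the latent responses:
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2},
\qquad \sigma_e^2 = 1 \ \text{(probit)}, \quad
\sigma_e^2 = \pi^2/3 \ \text{(logit)}
```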
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:22-392020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:22-39
article
Automatic generation of documents
This paper describes a natural interaction between Stata and markup languages. Stata’s programming and analysis features, together with the flexibility in output formatting of the markup languages, allow generation and/or update of whole documents (e.g., reports, presentations on screen or web). We give examples for both LaTeX and HTML. Copyright 2006 by StataCorp LP.
doutput, format, report, LaTeX, HTML, markup language
http://www.stata-journal.com/article.html?article=pr0020
http://www.stata-journal.com/software/sj6-1/pr0020/
Rosa Gini
Jacopo Pasquini
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:329-3492020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:329-349
article
Speaking Stata: Graphing agreement and disagreement
Many statistical problems involve comparison and, in particular, the assessment of agreement or disagreement between data measured on identical scales. Some commonly used plots are often ineffective in assessing the fine structure of such data, especially scatterplots of highly correlated variables and plots of values measured "before" and "after" using tilted line segments. Valuable alternatives are available using horizontal reference patterns, changes plotted as parallel lines, and parallel coordinates plots. The quantities of interest (usually differences on some scale) should be shown as directly as possible, and the responses of given individuals should be identified as easily as possible. Copyright 2004 by StataCorp LP.
graphics, comparison, agreement, paired data, panel data, scatterplot, difference-mean plot, Bland-Altman plot, parallel lines plot, parallel coordinates plot, pairplot, parplot, linkplot, Tukey
http://www.stata-journal.com/sjpdf.html?articlenum=gr0005
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:972020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:97
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2004 by StataCorp LP.
http://www.stata-journal.com/software/sj4-1/st0037_1/
http://www.stata-journal.com/software/sj4-1/gr0002_1/
http://www.stata-journal.com/software/sj4-1/gr42_2/
http://www.stata-journal.com/software/sj4-1/ip24_1/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:473-5002020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:473-500
article
Estimation and inference in dynamic unbalanced panel-data models with a small number of individuals
This article describes a new Stata routine, xtlsdvc, that computes bias-corrected least-squares dummy variable (LSDV) estimators and their bootstrap variance-covariance matrix for dynamic (possibly) unbalanced panel-data models with strictly exogenous regressors. A Monte Carlo analysis is carried out to evaluate the finite-sample performance of the bias-corrected LSDV estimators in comparison to the original LSDV estimator and three popular N-consistent estimators: Arellano-Bond, Anderson-Hsiao, and Blundell-Bond. Results strongly support the bias-corrected LSDV estimators according to bias and root mean squared error criteria when the number of individuals is small. Copyright 2005 by StataCorp LP.
xtlsdvc, bias approximation, unbalanced panels, dynamic panel data, LSDV estimator, Monte Carlo experiment, bootstrap variance-covariance
http://www.stata-journal.com/article.html?article=st0091
http://www.stata-journal.com/software/sj5-4/st0091/
Giovanni S. F. Bruno
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:442-4602020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:442-460
article
Speaking Stata: The protean quantile plot
Quantile plots, showing by default ordered values versus cumulative probabilities, are both well known and often neglected, considering their major advantages. Their flexibility and power are emphasized by using the qplot program to show several variants on the standard form, making full use of options for reverse, ranked, and transformed scales and for superimposing and juxtaposing quantile traces. Examples are drawn from the analysis of species abundance data in ecology. A revised version of qplot is formally released with this column. Distribution plots in which the axes are interchanged are also discussed briefly, in conjunction with a revised version of distplot, also released now.
qplot, distplot, distributions, quantile plots, statistical graphics, species abundance, ecology, Whittaker plots, broken stick, lognormal, power laws, scaling laws
http://www.stata-journal.com/article.html?article=gr0018
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:351-3572020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:351-357
article
Two-graph receiver operating characteristic
The command roctg allows visualizing sensitivity (Se) and specificity (Sp) curves according to the range of values of a new diagnostic test, given a "true" state of an event, the reference test. On request, several options for displaying Se and Sp estimates in, or enhancements for, the graphs are also available. Copyright 2002 by Stata Corporation.
concurrent validity, sensitivity, specificity, ROC analysis
http://www.stata-journal.com/software/sj2-4/st0025/
http://www.stata-journal.com/sjpdf.html?articlenum=st0025
Michael E. Reichenheim
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:95-962020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:95-96
article
Stata tip 6: Inserting awkward characters in the plot
http://www.stata-journal.com/sjpdf.html?articlenum=dm0006
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:484-4852020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:484-485
article
Stata tip 13: generate and replace use the current sort order
http://www.stata-journal.com/sjpdf.html?articlenum=dm0008
Roger Newson
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:86-972020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:86-97
article
Speaking Stata: How to repeat yourself without going mad
This column will focus on how to improve your fluency in Stata. Over the next issues, we will look at Stata problems of intermediate size that turn out to be soluble with a few command lines. As an introduction, systematic ways of repeating the same or similar operations are surveyed to give one overview of the territory to be covered. Copyright 2001 by Stata Corporation.
append, by, collapse, contract, do-files, egen, for, foreach, forvalues, log files, merge, naming conventions, programs, repetition, reshape, statsby, subset or group structure, tabulations
http://www.stata-journal.com/sjpdf.html?articlenum=pr0003
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:57-702020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:57-70
article
Implementing tests with correct size in the simultaneous equations model
In this paper, we propose a fix to the size distortions of tests for structural parameters in the simultaneous equations model by computing critical value functions based on the conditional distribution of test statistics. The conditional tests can then be used to construct informative confidence regions for the structural parameter with correct coverage probability. Commands to implement these tests in Stata are also introduced. Copyright 2003 by Stata Corporation.
instrumental variables, weak instruments, similar tests, score test, Wald test, likelihood-ratio test, confidence regions, 2SLS estimator, LIML estimator
http://www.stata-journal.com/software/sj3-1/st0033/
http://www.stata-journal.com/sjpdf.html?articlenum=st0033
Marcelo J. Moreira
Brian P. Poi
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:148-1562020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:148-156
article
Adaptive kernel density estimation
This insert describes the module akdensity. akdensity extends the official kdensity command, which estimates density functions by the kernel method. The extensions are of two types: akdensity allows the use of an "adaptive kernel" approach with varying, rather than fixed, bandwidths; and akdensity estimates pointwise variability bands around the estimated density functions. Copyright 2003 by Stata Corporation.
adaptive kernel density, local bandwidths, variability bands
http://www.stata-journal.com/software/sj3-2/st0037/
http://www.stata-journal.com/sjpdf.html?articlenum=st0037
Philippe Van Kerm
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:6072020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:607
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2005 by StataCorp LP.
http://www.stata-journal.com/software/sj5-4/dm67_3/
http://www.stata-journal.com/software/sj5-4/dm88_1/
http://www.stata-journal.com/software/sj5-4/sg134_1/
http://www.stata-journal.com/software/sj5-4/st0030_2/
http://www.stata-journal.com/software/sj5-4/st0037_2/
http://www.stata-journal.com/article.html?article=up0013
Editors
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:309-3242020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:309-324
article
Speaking Stata: Problems with tables, Part I
Tables in some form or another are part and parcel of data management and analysis. The main general-purpose tabulation commands, tabulate, table, and tabstat, are reviewed and compared. When these do not provide a tabulation solution, one key strategy is to prepare the material for tabulation as a set of variables, after which the table itself can be presented with tabdisp or list. This is the first of two papers on this topic. Copyright 2003 by StataCorp LP.
tables, tabulate, table, tabstat, tabdisp, list
http://www.stata-journal.com/sjpdf.html?articlenum=pr0010
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:107-1122020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:107-112
article
Generalized Lorenz curves and related graphs: an update for Stata 7
The glcurve command is updated to a Stata 7 version, glcurve7, which is described and illustrated. Copyright 2001 by Stata Corporation.
generalized Lorenz curves
http://www.stata-journal.com/software/sj1-1/gr0001/
http://www.stata-journal.com/sjpdf.html?articlenum=gr0001
Philippe Van Kerm
Stephen P. Jenkins
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:2852020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:285
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2005 by StataCorp LP.
http://www.stata-journal.com/software/sj5-2/sed9_2/
http://www.stata-journal.com/software/sj5-2/snp2_1/
http://www.stata-journal.com/software/sj5-2/st0033_1/
http://www.stata-journal.com/software/sj5-2/st0045_1/
http://www.stata-journal.com/software/sj5-2/st0053_2/
http://www.stata-journal.com/article.html?article=up0011
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:560-5662020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:560-566
article
Suggestions on Stata programming style
Various suggestions are made on Stata programming style, under the headings of presentation, helpful Stata features, respect for datasets, speed and efficiency, reminders, and style in the large. Copyright 2005 by StataCorp LP.
Stata language, programming style
http://www.stata-journal.com/article.html?article=pr0018
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:100-1042020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:100-104
article
Review of A Short Introduction to Stata For Biostatistics by Hills and De Stavola
The new book by Hills and De Stavola (2002) is reviewed. Copyright 2002 by Stata Corporation.
biostatistics
http://www.stata-journal.com/sjpdf.html?articlenum=gn0005
John McGready
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:66-882020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:66-88
article
Speaking Stata: Graphing distributions
Graphing univariate distributions is central to both statistical graphics, in general, and Stata's graphics, in particular. Now that Stata 8 is out, a review of official and user-written commands is timely. The emphasis here is on going beyond what is obviously and readily available, with pointers to minor and major trickery and various user-written commands. For plotting histogram-like displays, kernel-density estimates and plots based on distribution functions or quantile functions, a large variety of choices is now available to the researcher. Copyright 2004 by StataCorp LP.
graphics, histogram, spikeplot, dotplot, onewayplot, kdensity, distplot, qplot, skewplot, bin width, rug, density function, kernel estimation, transformations, logarithmic scale, root scale, intensity function, distribution function, quantile function, skewness
http://www.stata-journal.com/sjpdf.html?articlenum=gr0003
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:211-2112020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:211-211
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2003 by Stata Corporation.
http://www.stata-journal.com/software/sj3-2/gr41_1/
http://www.stata-journal.com/software/sj3-2/sbe31_1/
http://www.stata-journal.com/software/sj3-2/sg113_1/
http://www.stata-journal.com/software/sj3-2/st0004_1/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:932020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:93
article
Stata tip 4: Using display as an online calculator
http://www.stata-journal.com/sjpdf.html?articlenum=gn0011
Philip Ryan
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:467-4682020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:467-468
article
Stata tip 23: Regaining control over axis ranges
http://www.stata-journal.com/article.html?article=gr0019
Nicholas J.G. Winter
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:390-4002020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:390-400
article
kountry: A Stata utility for merging cross-country data from multiple sources
This article describes kountry, a data-management command that can be used to translate one country-coding scheme into another, to recode country names into a "standardized form", and to generate geographic-region variables. Users can build a custom dictionary through a helper command, kountryadd, that "teaches" kountry new name variations. The dictionary can be protected from an accidental overwriting through two helper commands: kountrybackup and kountryrestore. Copyright 2008 by StataCorp LP.
kountry, kountryadd, kountrybackup, kountryrestore, country names, country-coding schemes, geographical, region
http://www.stata-journal.com/article.html?article=dm0038
http://www.stata-journal.com/software/sj8-3/dm0038/
Rafal Raciborski
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:202-2222020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:202-222
article
Speaking Stata: How to face lists with fortitude
Three commands in official Stata, foreach, forvalues, and for, provide structures for cycling through lists of values (variable names, numbers, arbitrary text) and repeating commands using members of those lists in turn. All these commands may be used interactively, and none is restricted to use in Stata programs. They are explained and compared in some detail with a variety of examples. In addition, a self-contained exposition is given on local macros, understanding of which is needed for use of foreach and forvalues. Copyright 2002 by Stata Corporation.
foreach, forvalues, for, lists, local macros, substitution first
http://www.stata-journal.com/sjpdf.html?articlenum=pr0005
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:130-1332020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:130-133
article
Review of Generalized Latent Variable Modeling by Skrondal and Rabe-Hesketh
The new book by Skrondal and Rabe-Hesketh (2004) is reviewed.
GLLAMM, generalized linear latent and mixed models, latent variables
http://www.stata-journal.com/sjpdf.html?articlenum=gn0025
Roger Newson
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:412-4192020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:412-419
article
From the help desk: Local polynomial regression and Stata plugins
Local polynomial regression is a generalization of local mean smoothing as described by Nadaraya (1964) and Watson (1964). Instead of fitting a local mean, one fits a local pth-order polynomial. Calculations for local polynomial regression are naturally more complex than those for local means, but local polynomial smooths have better statistical properties. The computational complexity may, however, be alleviated by using a Stata plugin. In this article, we describe the locpoly command for performing local polynomial regression. The calculations involved are implemented in both ado-code and with a plugin, allowing the user to assess the speed improvement obtained from using the plugin. Source code for the plugin is also provided as part of the package for this program. Copyright 2003 by StataCorp LP.
local polynomial, local linear, smoothing, kernel, plugin
http://www.stata-journal.com/software/sj3-4/st0053/
http://www.stata-journal.com/sjpdf.html?articlenum=st0053
Roberto G. Gutierrez
Jean Marie Linhart
Jeffrey S. Pitblado
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:105-1062020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:105-106
article
Sort a list of items
The command listsort for sorting the contents of a set of local macros is introduced and illustrated. Copyright 2001 by Stata Corporation.
sorting, local macros
http://www.stata-journal.com/software/sj1-1/dm0001/
http://www.stata-journal.com/sjpdf.html?articlenum=dm0001
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:190-2132020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:190-213
article
Speaking Stata: Graphing categorical and compositional data
A variety of graphs have been devised for categorical and compositional data, ranging from widely familiar to more unusual displays. Both official Stata commands and user-written programs are available. After a stacking trick for binary responses is explained, bar charts and related displays for cross-tabulations are discussed in detail. Tips and tricks are introduced for plotting cumulative distributions of graded (ordinal) data. Triangular plots are explained for three-way compositions, such as three proportions or percentages. Copyright 2004 by StataCorp LP.
graphics, categorical data, binary data, nominal data, ordinal data, grades, compositional data, cross-tabulations, bar charts, cumulative distributions, logit scale, catplot, tabplot, tableplot, distplot, mylabels, triplot
http://www.stata-journal.com/sjpdf.html?articlenum=gr0004
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:19-312020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:19-31
article
A conversation with William Gould
William Gould is President of StataCorp. He was born in Burbank, California, on January 21, 1952. He received a B.A. in economics from UCLA in 1974 and a C.Phil. in economics from UCLA in 1977, after initially majoring in physics and then engineering. He studied economics in the Ph.D. program at UCLA and was simultaneously a Research Fellow at The Rand Corporation. He did not turn in his dissertation in labor economics before becoming a Senior Research Associate at the National Bureau of Economic Research in Stanford, California, in 1977. In 1979, he became a Senior Economist at Unicon Research Corporation in Los Angeles, a company that he helped to found. He cofounded and served as Vice-President of Computing Resource Center in 1982, the company that went on to develop Stata. Bill became President of CRC in 1990 and, in 1993, CRC was renamed StataCorp. Copyright 2005 by StataCorp LP.
http://www.stata-journal.com/sjpdf.html?articlenum=gn0018
H. Joseph Newton
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:361-3722020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:361-372
article
The regression-calibration method for fitting generalized linear models with additive measurement error
This paper discusses and illustrates the method of regression calibration. This is a straightforward technique for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). Discussion will include specified measurement error, measurement error estimated by replicate error-prone proxies, and measurement error estimated by instrumental variables. The discussion focuses on software developed as part of a small business innovation research (SBIR) grant from the National Institutes of Health (NIH). Copyright 2003 by StataCorp LP.
regression calibration, measurement error, instrumental variables, replicate measures, generalized linear models
http://www.stata-journal.com/software/sj3-4/st0050/
http://www.stata-journal.com/sjpdf.html?articlenum=st0050
James W. Hardin
Henrik Schmiediche
Raymond J. Carroll
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:449-4752020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:449-475
article
Speaking Stata: Graphing model diagnostics
Plotting diagnostic information calculated from residuals and fitted values is a long-standard method for assessing models and seeking ways of improving them. This column focuses on the statistical mainstream defined by regression models for continuous responses, treated in a broad sense to include (for example) generalized linear models. After some comments on the history of such ideas (and even their anthropology and psychology), the commands available in official Stata are reviewed, and a modeldiag package is introduced. A detailed example on fuelwood yield from fallow areas in Nigeria illustrates a variety of general points and specific tips. Copyright 2004 by StataCorp LP.
modeldiag, anovaplot, indexplot, ofrtplot, ovfplot, qfrplot, racplot, rdplot, regplot, rhetplot, rvfplot2, rvlrplot, rvpplot2, graphics, diagnostics, regression, generalized linear models, analysis of variance
http://www.stata-journal.com/software/sj4-4/gr0009/
http://www.stata-journal.com/sjpdf.html?articlenum=gr0009
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:2202020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:220
article
Stata tip 7: Copying and pasting under Windows
http://www.stata-journal.com/sjpdf.html?articlenum=dm0007
Shannon Driver
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:221-2222020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:221-222
article
Stata tip 8: Splitting time-span records with categorical time-varying covariates
http://www.stata-journal.com/sjpdf.html?articlenum=st0066
Ben Jann
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:142-1532020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:142-153
article
Sample size and power calculations using the noncentral t-distribution
The standard formulas for sample size and power calculation, as implemented in the command sampsi, make use of a normal approximation to the t-distribution. When the sample sizes are small, this approximation is poor, resulting in overestimating power (or underestimating sample size). One particular situation in which this is likely to be important is the field of cluster randomized trials. Although the total number of individuals in a cluster randomized trial may be large, the number of clusters will often be small. We present a simulation study from the design of a cluster randomized crossover trial that motivated this work and a command to perform more accurate sample size and power calculations based on the noncentral t-distribution. Copyright 2004 by StataCorp LP.
sampncti, sample size, power, noncentral t-distribution
http://www.stata-journal.com/software/sj4-2/st0062/
http://www.stata-journal.com/sjpdf.html?articlenum=st0062
David A. Harrison
Anthony R. Brady
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:301-3132020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:301-313
article
From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models
Occasionally, there is a need to compare the predictive accuracy of several fitted logit (logistic) or probit models by comparing the areas under the corresponding receiver operating characteristic (ROC) curves. Although Stata currently does not have a ready routine for comparing two or more ROC areas generated from these models, this article describes how these comparisons can be performed using Stata's roccomp command. Copyright 2002 by Stata Corporation.
Receiver Operating Characteristic (ROC) curve
http://www.stata-journal.com/sjpdf.html?articlenum=st0023
Mario A. Cleves
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:265-2732020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:265-273
article
Understanding the multinomial-Poisson transformation
There is a known connection between the multinomial and the Poisson likelihoods. This, in turn, means that a Poisson regression may be transformed into a logit model and vice versa. In this paper, I show the data transformations required to implement this transformation. Several examples are used as illustrations. Copyright 2004 by StataCorp LP.
Poisson regression, logit, conditional logit
http://www.stata-journal.com/software/sj4-3/st0069/
http://www.stata-journal.com/sjpdf.html?articlenum=st0069
Paulo Guimaraes
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:371-3842020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:371-384
article
Using density-distribution sunflower plots to explore bivariate relationships in dense data
Density-distribution sunflower plots are used to display high-density bivariate data. They are useful for data where a conventional scatterplot is difficult to read due to overstriking of the plot symbol. The x – y plane is subdivided into a lattice of small, regular, hexagonal bins. These bins are divided into low-, medium-, and high-density groups. In low-density bins, the individual observations are plotted as in a conventional scatterplot. Medium- and high-density bins contain light and dark sunflowers, respectively. In a light sunflower, each petal represents one observation. In a dark sunflower, each petal represents a specific number of observations. The user can control the sizes and colors of the sunflowers. By selecting appropriate colors and sizes for the light and dark sunflowers, plots can be obtained that give both the overall sense of the data-density distribution, as well as the number of data points in any given region. Sunflower plots are also contrasted with contour plots of bivariate kernel-density estimates. The appearance of these plots is markedly affected by the choice of smoothing parameters and the spacing of points at which the probability density function is evaluated. Sunflower plots can be helpful in guiding the selection of these parameters and in distinguishing between chance and systematic variation in the distribution of bivariate data. Copyright 2005 by StataCorp LP.
scatterplot, sunflower plot, bivariate data, density plot, probability density function, graphical statistics
http://www.stata-journal.com/article.html?article=gr0016
http://www.stata-journal.com/software/sj5-3/gr0016/
William D. Dupont
W. Dale Plummer, Jr.
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:274-2812020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:274-281
article
Analysis of matched cohort data
Matching is occasionally used in cohort studies; examples include studies of twins and some studies of traffic crashes. Analysis of matched cohort data is not discussed in many textbooks or articles and is not mentioned in the Stata manuals. Risk ratios can be estimated using matched-pair cohort data with Stata's mcc command. We describe a new command, csmatch, which can produce these risk ratios and is often more convenient. We briefly review flexible regression methods that can estimate risk ratios in matched cohort data: conditional Poisson regression and some versions of Cox regression. Copyright 2004 by StataCorp LP.
csmatch, cohort study, conditional Poisson regression, matching, matched-pair, matched cohort study, risk ratio, odds ratio
http://www.stata-journal.com/software/sj4-3/st0070/
http://www.stata-journal.com/sjpdf.html?articlenum=st0070
Peter Cummings
Barbara McKnight
oai:RePEc:tsj:stataj:v:6:y:2006:i:4:p:5972020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:4:p:597
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2006 by StataCorp LP.
http://www.stata-journal.com/software/sj6-4/gr42_4/
http://www.stata-journal.com/software/sj6-4/gr0001_2/
http://www.stata-journal.com/software/sj6-4/st0105_1/
http://www.stata-journal.com/software/sj6-4/st0044_1/
http://www.stata-journal.com/software/sj6-4/st0053_3/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:154-1612020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:154-161
article
Value label utilities: labeldup and labelrename
I describe two utilities dealing with value labels. labeldup reports and optionally removes duplicate value labels. labelrename renames a value label. Both utilities, of course, preserve the links between variables and value labels and support multilingual datasets. Copyright 2005 by StataCorp LP.
labeldup, labelrename, value labels, data integrity, multilingual datasets
http://www.stata-journal.com/article.html?article=dm0012
http://www.stata-journal.com/software/sj5-2/dm0012/
Jeroen Weesie
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:421-4412020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:421-441
article
Mata matters: Translating Fortran
Mata is Stata’s matrix language. In the Mata matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this column, we demonstrate how Fortran programs can be translated and incorporated into Stata.
Mata, Fortran
http://www.stata-journal.com/article.html?article=pr0017
http://www.stata-journal.com/software/sj5-3/pr0017/
William Gould
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:2-182020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:2-18
article
A brief history of Stata on its 20th anniversary
http://www.stata-journal.com/sjpdf.html?articlenum=gn0017
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:103-1122020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:103-112
article
Cumulative incidence estimation in the presence of competing risks
When competing risks are present, the appropriate estimate of the failure probabilities is the cumulative incidence. stcompet creates new variables containing the estimate of this function, its standard error, and ln(-ln) transformed confidence bounds. Two examples are presented to illustrate the use of the new command and some key features of the cumulative incidence. Copyright 2004 by StataCorp LP.
stcompet, survival analysis, competing risks, cumulative incidence
http://www.stata-journal.com/software/sj4-2/st0059/
http://www.stata-journal.com/sjpdf.html?articlenum=st0059
Vincenzo Coviello
May Boggess
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:138-1432020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:138-143
article
Review of Multilevel and Longitudinal Modeling Using Stata by Rabe-Hesketh and Skrondal
This article reviews Multilevel and Longitudinal Modeling Using Stata, by Rabe-Hesketh and Skrondal.
longitudinal, multilevel, gllamm, generalized latent variable model
http://www.stata-journal.com/article.html?article=gn0031
Rory Wolfe
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:154-1672020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:154-167
article
Computing interaction effects and standard errors in logit and probit models
This paper explains why computing the marginal effect of a change in two variables is more complicated in nonlinear models than in linear models. The command inteff computes the correct marginal effect of a change in two interacted variables for a logit or probit model, as well as the correct standard errors. The inteff command graphs the interaction effect and saves the results to allow further investigation. Copyright 2004 by StataCorp LP.
inteff, interaction terms, logit, probit, nonlinear models
http://www.stata-journal.com/software/sj4-2/st0063/
http://www.stata-journal.com/sjpdf.html?articlenum=st0063
Edward C. Norton
Hua Wang
Chunrong Ai
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:442-4482020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:442-448
article
From the help desk: Seemingly unrelated regression with unbalanced equations
This article demonstrates how to estimate the parameters of a system of seemingly unrelated regressions when the equations are unbalanced, i.e., when the equations have an unequal number of observations. With estimators that require the data to be in wide format, such as Stata's sureg, the equations must be balanced. Any additional observations that are available for some equations, but not for all, are discarded, potentially resulting in a loss of efficiency. Reshaping and scaling the data allows us to use Stata's xtgee command to fit the model and obtain estimates utilizing all the available data. The resulting estimator is potentially more efficient when the equations are unbalanced. Copyright 2004 by StataCorp LP.
SUR, seemingly unrelated regression, unbalanced equations, generalized estimating equations
http://www.stata-journal.com/sjpdf.html?articlenum=st0079
Allen McDowell
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:164-1822020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:164-182
article
G-estimation of causal effects, allowing for time-varying confounding
This article describes the stgest command, which implements G-estimation (as proposed by Robins) to estimate the effect of a time-varying exposure on survival time, allowing for time-varying confounders. Copyright 2002 by Stata Corporation.
G-estimation, time-varying confounding, survival analysis
http://www.stata-journal.com/software/sj2-2/st0014/
http://www.stata-journal.com/sjpdf.html?articlenum=st0014
Jonathan A. C. Sterne
Kate Tilling
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:594-6002020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:594-600
article
Review of Data Analysis Using Stata by Kohler and Kreuter
The new book by Kohler and Kreuter (2005) is reviewed.
data analysis, introductory, teaching, GSOEP
http://stata-press.com/books/daus-review.pdf
http://www.stata-journal.com/article.html?article=gn0030
L. Philip Schumm
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:135-1362020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:135-136
article
Stata tip 17: Filling in the gaps
http://www.stata-journal.com/article.html?article=dm0011
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:429-4352020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:429-435
article
Confidence intervals for the variance component of random-effects linear models
We present the postestimation command xtvc to provide confidence intervals for the variance components of random-effects linear regression models. This command must be used after xtreg with option mle. Confidence intervals are based on the inversion of a score-based test (Bottai 2003). Copyright 2004 by StataCorp LP.
xtvc, variance components, confidence intervals, score test, random-effects linear models
http://www.stata-journal.com/software/sj4-4/st0077/
http://www.stata-journal.com/sjpdf.html?articlenum=st0077
Matteo Bottai
Nicola Orsini
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:446-4472020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:446-447
article
Stata tip 2: Building with floors and ceilings
http://www.stata-journal.com/sjpdf.html?articlenum=dm0002
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:71-802020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:71-80
article
From the help desk: Bootstrapped standard errors
Bootstrapping is a nonparametric approach for evaluating the distribution of a statistic based on random resampling. This article illustrates the bootstrap as an alternative method for estimating the standard errors when the theoretical calculation is complicated or not available in the current software. Copyright 2003 by Stata Corporation.
st0034, bootstrap, cluster, nl, instrumental variables
http://www.stata-journal.com/sjpdf.html?articlenum=st0034
Weihua Guan
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:537-5592020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:537-559
article
Confidence intervals for predicted outcomes in regression models for categorical outcomes
We discuss methods for computing confidence intervals for predictions and discrete changes in predictions for regression models for categorical outcomes. The methods include endpoint transformation, the delta method, and bootstrapping. We also describe an update to prvalue and prgen from the SPost package, which adds the ability to compute confidence intervals. The article provides several examples that illustrate the application of these methods. Copyright 2005 by StataCorp LP.
prvalue, prgen, confidence interval, predicted probability, discrete choice models, endpoint transformation, delta method, bootstrap
http://www.stata-journal.com/article.html?article=st0094
http://www.stata-journal.com/software/sj5-4/st0094/
Jun Xu
J. Scott Long
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:224-2252020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:224-225
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2004 by StataCorp LP.
http://www.stata-journal.com/software/sj4-2/gr0002_2/
http://www.stata-journal.com/software/sj4-2/st0004_2/
http://www.stata-journal.com/software/sj4-2/st0026_1/
http://www.stata-journal.com/software/sj4-2/st0030_1/
http://www.stata-journal.com/software/sj4-2/st0058_1/
http://www.stata-journal.com/software/sj4-2/sg150_1/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:151-1632020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:151-163
article
A menu-driven facility for complex sample size calculation in randomized controlled trials with a survival or a binary outcome
We present a menu-driven Stata program for the calculation of sample size or power for complex clinical trials with a survival time or a binary outcome. The features supported include up to six treatment arms, an arbitrary time-to-event distribution, fixed or time-varying hazard ratios, unequal patient allocation, loss to follow-up, staggered patient entry, and crossover of patients from their allocated treatment to an alternative treatment. The computations of sample size and power are based on the logrank test and are done according to the asymptotic distribution of the logrank test statistic, adjusted appropriately for the design features. Copyright 2002 by Stata Corporation.
randomized controlled trials, survival analysis, logrank test, experimental design
http://www.stata-journal.com/software/sj2-2/st0013/
http://www.stata-journal.com/sjpdf.html?articlenum=st0013
Patrick Royston
Abdel Babiker
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:98-1012020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:98-101
article
Flexible parametric alternatives to the Cox model: update
Royston (2001) and Royston and Parmar (2002) introduced flexible parametric models for survival analysis, implemented in Stata through the ado-file stpm (Royston 2001). In the present article, stpm is updated to Stata 8.1 and has been shown to work correctly with Stata 8.2. To increase the reliability of the estimation procedure, the basis functions of the splines used to approximate the baseline distribution function have been orthogonalized. Copyright 2004 by StataCorp LP.
parametric survival analysis, proportional hazards, proportional odds, regression splines, orthogonal basis functions
http://www.stata-journal.com/software/sj4-1/st0001_2/
http://www.stata.com/bookstore/sjj.html
Patrick Royston
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:97-1052020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:97-105
article
Goodness-of-fit test for a logistic regression model fitted using survey sample data
After a logistic regression model has been fitted, a global test of goodness of fit of the resulting model should be performed. A test that is commonly used to assess model fit is the Hosmer-Lemeshow test, which is available in Stata and most other statistical software programs. However, it is often of interest to fit a logistic regression model to sample survey data, such as data from the National Health Interview Survey or the National Health and Nutrition Examination Survey. Unfortunately, for such situations no goodness-of-fit testing procedures have been developed or implemented in available software. To address this problem, a Stata ado-command, svylogitgof, for estimating the F-adjusted mean residual test after svy: logit or svy: logistic estimation has been developed, and this paper describes its implementation. Copyright 2006 by StataCorp LP.
svylogitgof, goodness of fit, survey design, svy, logistic regression, logit
http://www.stata-journal.com/article.html?article=st0099
http://www.stata-journal.com/software/sj6-1/st0099/
Kellie J. Archer
Stanley Lemeshow
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:361-3782020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:361-378
article
Simple thematic mapping
Thematic maps illustrate the spatial distribution of one or more variables of interest within a given geographic unit. In a sense, a thematic map is the spatial analyst's equivalent to the scatterplot in nonspatial analysis. This paper presents the tmap package, a set of Stata programs designed to draw five kinds of thematic maps: choropleth, proportional symbol, deviation, dot, and label maps. The first three kinds of maps are intended to depict area data, the fourth is suitable for representing point data, and the fifth can be used to show data of both types. Copyright 2004 by StataCorp LP.
tmap choropleth, tmap propsymbol, tmap deviation, tmap dot, tmap label, thematic map, choropleth map, proportional symbol map, deviation map, dot map, label map
http://www.stata-journal.com/software/sj4-4/gr0008/
http://www.stata-journal.com/sjpdf.html?articlenum=gr0008
Maurizio Pisati
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:488-4892020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:488-489
article
Stata tip 15: Function graphs on the fly
http://www.stata-journal.com/sjpdf.html?articlenum=gr0010
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:226-2442020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:226-244
article
Tools for analyzing multiple imputed datasets
The method of multiple imputation (MI) is used increasingly for analyzing datasets with missing observations. Two sets of tasks are required in order to implement the method: (a) generating multiple complete datasets in which missing values have been imputed by simulating from an appropriate probability distribution and (b) analyzing the multiple imputed datasets and combining complete data inferences from them to form an overall inference for parameters of interest. An increasing number of software tools are available for task (a), although this is difficult to automate, because the method of imputation should depend on the context and available covariate data. When the quantity of missing data is not great, the sensitivity of results to the imputation model may be relatively low. In this context, software tools that enable task (b) to be performed with similar ease to the analysis of a single dataset should facilitate the wider use of multiple imputation. Such tools need not only to implement techniques for inference from multiple imputed datasets but also to allow standard manipulations such as transformation and recoding of variables. In this article, we describe a set of Stata commands that we have developed for manipulating and analyzing multiple datasets. Copyright 2003 by StataCorp LP.
missing data, multiple imputation, Rubin's rule of combination, overall estimates
http://www.stata-journal.com/software/sj3-3/st0042/
http://www.stata-journal.com/sjpdf.html?articlenum=st0042
John B. Carlin
Ning Li
Philip Greenwood
Carolyn Coffey
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:492-4982020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:492-498
article
Cumulative index, volumes 1-4
http://www.stata-journal.com/abstracts/gn0016.pdf
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:35-372020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:35-37
article
A short history of Statistics with Stata
http://www.stata-journal.com/sjpdf.html?articlenum=gn0020
Lawrence Hamilton
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:527-5362020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:527-536
article
Multiple imputation of missing values: Update of ice
http://www.stata-journal.com/article.html?article=st0067_2
http://www.stata-journal.com/software/sj5-4/st0067_2/
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:1-262020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:1-26
article
Analyzing distances
Gower and Krzanowski (1999) described the analysis of a multivariate dataset that violated the assumptions of normality for multivariate analysis of variance. They developed a method comprising two aspects: a graphical representation of the points in a fewer number of dimensions, known as principal coordinate analysis; and a technique similar to MANOVA except that it was based on partitioning the distances between subjects, rather than sums of squares, and did not assume that the data followed any particular distribution. This article summarizes both aspects of their analysis and describes the Stata pco and aod commands, which perform principal coordinate analysis and analysis of distance. Copyright 2004 by StataCorp LP.
multivariate data, principal coordinate analysis, multidimensional scaling, Euclidean distance, Gower's general coefficient of similarity, eigenvectors, analysis of distance, MANOVA, partitioning of squared distance, randomization tests, contrasts, idempotent matrices
http://www.stata-journal.com/software/sj4-1/st0055/
http://www.stata-journal.com/sjpdf.html?articlenum=st0055
Justin Fenty
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:280-2812020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:280-281
article
Stata tip 20: Generating histogram bin variables
http://www.stata-journal.com/article.html?article=gr0014
David A. Harrison
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:202-2072020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:202-207
article
Estimation and testing of fixed-effect panel-data systems
This paper describes how to specify, estimate, and test multiple-equation, fixed-effect, panel-data equations in Stata. By specifying the system of equations as seemingly unrelated regressions, Stata panel-data procedures worked seamlessly for estimation and testing of individual variable coefficients, but additional routines using test were needed for testing of individual equations and differences between equations. Copyright 2005 by StataCorp LP.
panel data, fixed effect, multiple equations, seemingly unrelated regressions, heteroskedasticity, autocorrelation, contemporaneous correlation, tests of linear combinations
http://www.stata-journal.com/article.html?article=st0084
http://www.stata-journal.com/software/sj5-2/st0084/
J. Lloyd Blackwell, III
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:4452020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:445
article
Stata tip 1: The eform() option of regress
http://www.stata-journal.com/sjpdf.html?articlenum=st0054
Roger Newson
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:296-3002020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:296-300
article
Least likely observations in regression models for categorical outcomes
This article presents a method and program for identifying poorly fitting observations for maximum-likelihood regression models for categorical dependent variables. After estimating a model, the program leastlikely will list the observations that have the lowest predicted probabilities of observing the value of the outcome category that was actually observed. For example, when run after estimating a binary logistic regression model, leastlikely will list the observations with a positive outcome that had the lowest predicted probabilities of a positive outcome and the observations with a negative outcome that had the lowest predicted probabilities of a negative outcome. These can be considered the observations in which the outcome is most surprising given the values of the independent variables and the parameter estimates and, like observations with large residuals in ordinary least squares regression, may warrant individual inspection. Use of the program is illustrated with examples using binary and ordered logistic regression. Copyright 2002 by Stata Corporation.
outliers, predicted probabilities, categorical dependent variables, logistic regression
http://www.stata-journal.com/software/sj2-3/st0022/
http://www.stata-journal.com/sjpdf.html?articlenum=st0022
Jeremy Freese
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:65-702020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:65-70
article
Analysis of quantitative traits using regression and log-linear modeling when phase is unknown
This function models the relationship between a quantitative trait and the genotype of a person. It introduces a new syntax for model specification, which is necessary because when phase is unknown, the explanatory variables for the linear regression are never observed. The data must be a population-based sample because within-family effects are not modeled. Copyright 2002 by Stata Corporation.
haplotype analysis, association studies, phase-unknown, linear regression
http://www.stata-journal.com/software/sj2-1/st0008/
http://www.stata-journal.com/sjpdf.html?articlenum=st0008
A. P. Mander
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:342-3502020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:342-350
article
Variance estimation for the instrumental variables approach to measurement error in generalized linear models
This paper derives and gives explicit formulas for a derived sandwich variance estimate. This variance estimate is appropriate for generalized linear additive measurement error models fitted using instrumental variables. We also generalize the known results for linear regression. As such, this article explains the theoretical justification for the sandwich estimate of variance utilized in the software for measurement error developed under the Small Business Innovation Research Grant (SBIR) by StataCorp. The results admit estimation of variance matrices for measurement error models where there is an instrument for the unknown covariate. Copyright 2003 by StataCorp LP.
sandwich estimate of variance, measurement error, White's estimator, robust variance, generalized linear models, instrumental variables
http://www.stata-journal.com/sjpdf.html?articlenum=st0048
James W. Hardin
Raymond J. Carroll
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:109-1322020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:109-132
article
Multiple-test procedures and smile plots
multproc carries out multiple-test procedures, taking as input a list of p-values and an uncorrected critical p-value, and calculating a corrected overall critical p-value for rejection of null hypotheses. These procedures define a confidence region for a set-valued parameter, namely the set of null hypotheses that are true. They aim to control either the family-wise error rate (FWER) or the false discovery rate (FDR) at a level no greater than the uncorrected critical p-value. smileplot calls multproc and then creates a smile plot, with data points corresponding to estimated parameters, the p-values (on a reverse log scale) on the y-axis, and the parameter estimates (or another variable) on the x-axis. There are y-axis reference lines at the uncorrected and corrected overall critical p-values. The reference line for the corrected overall critical p-value, known as the parapet line, is an informal "upper confidence limit" for the set of null hypotheses that are true and defines a boundary between data mining and data dredging. A smile plot summarizes a set of multiple analyses just as a Cochrane forest plot summarizes a meta-analysis. Copyright 2003 by Stata Corporation.
smile plot, multiple-test procedure, closed testing procedure, data mining, family-wise error rate, false discovery rate, Bonferroni, Sidak, Holm, Holland, Copenhaver, Hochberg, Rom, Simes, Benjamini, Yekutieli, Krieger, Liu
http://www.stata-journal.com/software/sj3-2/st0035/
http://www.stata-journal.com/sjpdf.html?articlenum=st0035
Roger Newson
The ALSPAC Study Team
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:428-4312020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:428-431
article
Review of An Introduction to Survival Analysis Using Stata
The new book by Cleves et al. (2002) is reviewed. Copyright 2002 by Stata Corporation.
http://www.stata-journal.com/sjpdf.html?articlenum=gn0004
David W. Hosmer
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:29-502020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:29-50
article
Statistical software certification
In the first part (sections 1 and 2), this article describes the automated process that is used to certify Stata prior to shipment of new releases or updates. The second part (section 3) describes how these techniques can be adopted to your own work. Copyright 2001 by Stata Corporation.
certification scripts, ado-files, cscript
http://www.stata-journal.com/sjpdf.html?articlenum=pr0001
William Gould
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:461-4642020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:461-464
article
Review of Statistics for Epidemiology by Jewell
The new book by Jewell (2004) is reviewed.
epidemiology, biostatistics
http://www.stata-journal.com/article.html?article=gn0029
Rino Bellocco
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:942020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:94
article
Stata tip 5: Ensuring programs preserve dataset sort order
http://www.stata-journal.com/sjpdf.html?articlenum=dm0005
Roger Newson
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:245-2692020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:245-269
article
Confidence intervals and p-values for delivery to the end user
Statisticians make their living producing confidence intervals and p-values. However, those in the Stata log are not ready for delivery to the end user, who usually wants to see statistical output either as a plot or as a table. This article describes a suite of programs used to convert Stata results to one or other of these forms. The eclplot package creates plots of estimates with confidence intervals, and the listtex package outputs a Stata dataset in the form of table rows that can be inserted into a plain TEX, LATEX, HTML, or word processor table. To create a Stata dataset that can be output in these ways, we can use the parmest, dsconcat, and lincomest packages to create datasets with one observation per estimated parameter; the sencode, tostring, ingap, and reshape packages to process these datasets into a form ready to be output; and the descsave and factext packages to reconstruct, in the output dataset, categorical predictor variables represented by dummy variables in regression models. Copyright 2003 by StataCorp LP.
confidence interval, p-value, plot, table, estimation results, TEX, LATEX, HTML, word processor, presentation, eclplot, listtex, parmest, dsconcat, lincomest, sencode, tostring, ingap, reshape, descsave, factext
http://www.stata-journal.com/software/sj3-3/st0043/
http://www.stata-journal.com/sjpdf.html?articlenum=st0043
Roger Newson
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:490-4912020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:490-491
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2004 by StataCorp LP.
http://www.stata-journal.com/software/sj4-4/gr3_2/
http://www.stata-journal.com/software/sj4-4/gr22_1/
http://www.stata-journal.com/software/sj4-4/gr0001_1/
http://www.stata-journal.com/software/sj4-4/sbe36_2/
http://www.stata-journal.com/software/sj4-4/sg159_1/
http://www.stata-journal.com/software/sj4-4/sqv10_1/
http://www.stata-journal.com/software/sj4-4/st0015_1/
http://www.stata-journal.com/software/sj4-4/st0038_1/
http://www.stata-journal.com/software/sj4-4/st0058_2/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:501-5162020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:501-516
article
Extended generalized linear models: Simultaneous estimation of flexible link and variance functions
I describe a command that simultaneously solves the extended estimating equations estimator for parameters in the link and variance functions along with those of the linear predictor in a generalized linear model. The method addresses difficulties in choosing the correct link and variance functions in these models. It decouples the scale of estimation for the mean model, determined by the link function, from the scale of interest for the scientifically relevant effects. It also estimates a flexible variance structure from the data, leading to efficient estimation. Copyright 2005 by StataCorp LP.
pglm, pglmpredict, EEE, GLM, skewed, costs, estimating equations, link functions, variance functions
http://www.stata-journal.com/article.html?article=st0092
http://www.stata-journal.com/software/sj5-4/st0092/
Anirban Basu
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:123-1292020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:123-129
article
A menu-driven facility for complex sample size calculation in randomized controlled trials with a survival or a binary outcome: Update
Royston and Babiker (2002) presented a menu-driven Stata program for the calculation of sample size or power for complex clinical trial designs under a survival time or binary outcome. In the present article, the package is updated to Stata 8 under the new name ART. Furthermore, the program has been extended to incorporate noninferiority designs and provides more detailed output. This package is the only realistic sample size tool for survival studies available in Stata. Copyright 2005 by StataCorp LP.
sample size, power, randomized controlled trial, multiarm designs, survival analysis
http://www.stata-journal.com/software/sj5-1/st0013_1/
http://www.stata-journal.com/article.html?article=st0013_1
Friederike M.-S. Barthel
Patrick Royston
Abdel Babiker
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:1342020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:134
article
Stata tip 16: Using input to generate variables
http://www.stata-journal.com/article.html?article=dm0010
Ulrich Kohler
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:1512020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:151
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2006 by StataCorp LP.
http://www.stata-journal.com/software/sj6-1/sg2_1/
http://www.stata-journal.com/software/sj6-1/gr26_1/
http://www.stata-journal.com/article.html?article=up0014
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:385-3942020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:385-394
article
A simple approach to fit the beta-binomial model
In this paper, I show how to estimate the parameters of the beta-binomial distribution and its multivariate generalization, the Dirichlet-multinomial distribution. This approach involves no additional programming, as it relies on an existing Stata command used for overdispersed count panel data. Including covariates to allow for regression models based in these distributions is straightforward. Copyright 2005 by StataCorp LP.
overdispersion, beta binomial, Dirichlet multinomial, fixed-effects negative binomial
http://www.stata-journal.com/article.html?article=st0089
http://www.stata-journal.com/software/sj5-3/st0089/
Paulo Guimarães
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:436-4412020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:436-441
article
Boolean logit and probit in Stata
This paper introduces new statistical models, Boolean logit and probit, that allow researchers to model binary outcomes as the results of Boolean interactions among independent causal processes. Each process (or 'causal path') is modeled as the unobserved outcome in a standard logit or probit equation, and the dependent variable is modeled as the observed product of their Boolean interaction. Up to five causal paths can be modeled, in any combination: A and B and C produce Y, A and (B or [C and D]) produce Y, etc. Copyright 2004 by StataCorp LP.
mlboolean, dichotomous dependent variable, Boolean, logit, probit, multiple causal paths, complexity, random utility
http://www.stata-journal.com/software/sj4-4/st0078/
http://www.stata-journal.com/sjpdf.html?articlenum=st0078
Bear F. Braumoeller
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:190-2012020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:190-201
article
From the help desk: It's all about the sampling
Effective estimation and inference, when the data are collected using complex survey designs, requires estimators that fully account for the sampling design. This article explores, by means of Monte Carlo simulations of the power of simple hypothesis tests, the consequences of parameter estimation and inference when naive estimators are employed with survey data. Copyright 2002 by Stata Corporation.
cluster, design, power, strata, svy, svymean, svyset
http://www.stata-journal.com/sjpdf.html?articlenum=st0016
Allen McDowell
Jeff Pitblado
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:405-4122020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:405-412
article
A multivariable scatterplot smoother
We present an extension of Sasieni, Royston, and Cox’s bivariate smoother running to the multivariable context. The software aims to provide a picture of the relation between a response variable and each of several continuous predictors simultaneously. This may be a valuable tool in exploratory data analysis, before constructing a more formal multiple regression model. Copyright 2005 by StataCorp LP.
mrunning, running, scatterplot smoothing, multivariable regression analysis, running line
http://www.stata-journal.com/article.html?article=gr0017
http://www.stata-journal.com/software/sj5-3/gr0017/
Patrick Royston
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:373-3852020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:373-385
article
The regression-calibration method for fitting generalized linear models with additive measurement error
We discuss and illustrate the method of simulation extrapolation for fitting models with additive measurement error. We present this discussion in terms of generalized linear models (GLMs) following the notation defined in Hardin and Carroll (2003). As in Hardin, Schmiediche, and Carroll (2003), our discussion includes specified measurement error and measurement error estimated by replicate error-prone proxies. In addition, we discuss and illustrate three extrapolant functions. Copyright 2003 by StataCorp LP.
simulation extrapolation, measurement error, instrumental variables, replicate measures, generalized linear models
http://www.stata-journal.com/software/sj3-4/st0051/
http://www.stata-journal.com/sjpdf.html?articlenum=st0051
James W. Hardin
Henrik Schmeidiche
Raymond J. Carroll
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:180-1892020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:180-189
article
From the help desk: Polynomial distributed lag models
Polynomial distributed lag models (PDLs) are finite-order distributed lag models with the impulse-response function constrained to lie on a polynomial of known degree. You can estimate the parameters of a PDL directly via constrained ordinary least squares, or you can derive a reduced form of the model via a linear transformation of the structural model, estimate the reduced-form parameters, and recover estimates of the structural parameters via an inverse linear transformation of the reduced-form parameter estimates. This article demonstrates both methods using Stata. Copyright 2004 by StataCorp LP.
polynomial distributed lag, Almon, Lagrangian interpolation polynomials
http://www.stata-journal.com/sjpdf.html?articlenum=st0065
Allen McDowell
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:302-3082020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:302-308
article
From the help desk: Swamy's random-coefficients model
This article discusses the Swamy (1970) random-coefficients model and presents a command that extends Stata's xtrchh command by also providing estimates of the panel-specific coefficients. Copyright 2003 by StataCorp LP.
panel data, random-coefficients models
http://www.stata-journal.com/software/sj3-3/st0046/
http://www.stata-journal.com/sjpdf.html?articlenum=st0046
Brian P. Poi
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:168-1772020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:168-177
article
Testing for serial correlation in linear panel-data models
Because serial correlation in linear panel-data models biases the standard errors and causes the results to be less efficient, researchers need to identify serial correlation in the idiosyncratic error term in a panel-data model. A new test for serial correlation in random- or fixed-effects one-way models derived by Wooldridge (2002) is attractive because it can be applied under general conditions and is easy to implement. This paper presents simulation evidence that the new Wooldridge test has good size and power properties in reasonably sized samples. Copyright 2003 by Stata Corporation.
panel data, serial correlation, specification tests
http://www.stata-journal.com/software/sj3-2/st0039/
http://www.stata-journal.com/sjpdf.html?articlenum=st0039
David M. Drukker
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:280-2892020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:280-289
article
Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve
The area under the receiver operating characteristic (ROC) curve is often used to summarize and compare the discriminatory accuracy of a diagnostic test or modality, and to evaluate the predictive power of statistical models for binary outcomes. Parametric maximum likelihood methods for fitting of the ROC curve provide direct estimates of the area under the ROC curve and its variance. Nonparametric methods, on the other hand, provide estimates of the area under the ROC curve, but do not directly estimate its variance. Three algorithms for computing the variance for the area under the nonparametric ROC curve are commonly used, although ambiguity exists about their behavior under diverse study conditions. Using simulated data, we found similar asymptotic performance between these algorithms when the diagnostic test produces results on a continuous scale, but found notable differences in small samples, and when the diagnostic test yields results on a discrete diagnostic scale. Copyright 2002 by Stata Corporation.
receiver operating characteristic (ROC) curve, trapezoidal rule, sensitivity, specificity, discriminatory accuracy, predictive power
http://www.stata-journal.com/sjpdf.html?articlenum=st0020
Mario A. Cleves
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:103-1052020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:103-105
article
Review of Regression Models for Categorical Dependent Variables Using Stata by Long and Freese
The new book by Long and Freese (2001) is reviewed. Copyright 2002 by Stata Corporation.
categorical data, regression models
http://www.stata-journal.com/sjpdf.html?articlenum=gn0002
John Hendrickx
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:124-1262020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:124-126
article
Submenus and dialogs for meta-analysis commands
The metadialog package provides Stata dialog boxes for the publicly available meta-analysis commands. It includes the commands needed to create a Meta-Analysis submenu on the StataCorp-defined User menu.
dialog, menu, meta-analysis
http://www.stata-journal.com/software/sj4-2/pr0012/
http://www.stata-journal.com/sjpdf.html?articlenum=pr0012
Thomas J. Steichen
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:141-1532020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:141-153
article
Exploratory analysis of single nucleotide polymorphism (SNP) for quantitative traits
With the decreasing cost and the increasing ability to quickly genotype single nucleotide polymorphisms (SNPs) across the human genome, large databases containing possibly hundreds of typed SNPs are becoming common in population-based studies of quantitative traits. Testing for association between individual SNPs and the quantitative trait is an important first step in the discovery of disease susceptibility SNPs. This task, however, could be time-consuming and tedious if a large number of SNPs is involved. In this article, I introduce two new commands designed to facilitate the screening and testing of multiple SNPs for possible association with quantitative traits. Copyright 2005 by StataCorp LP.
hwsnp, qtlsnp, genetic epidemiology, genetic linkage, QTL, biallelic marker, single nucleotide polymorphisms, Hardy–Weinberg
http://www.stata-journal.com/article.html?article=st0083
http://www.stata-journal.com/software/sj5-2/st0083/
Mario A. Cleves
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:168-1792020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:168-179
article
Confidence intervals for kernel density estimation
This article describes asciker and bsciker, two programs that enrich the possibility for density analysis using Stata. asciker and bsciker compute asymptotic and bootstrap confidence intervals for kernel density estimation, respectively, based on the theory of kernel density confidence interval estimation developed in Hall (1992b) and Horowitz (2001). asciker and bsciker allow several options and are compatible with Stata 7 and Stata 8, using the appropriate graphics engine under both versions. Copyright 2004 by StataCorp LP.
asciker, bsciker, vkdensity, kernel density asymptotic confidence intervals, kernel density bootstrap confidence intervals, asymptotic bias
http://www.stata-journal.com/software/sj4-2/st0064/
http://www.stata-journal.com/sjpdf.html?articlenum=st0064
Carlo V. Fiorio
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:83-962020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:83-96
article
Explained variation for survival models
This article introduces a new measure of explained variation for use with censored survival data. It is a modified version of a measure previously described by John O'Quigley and colleagues, itself a modification of Nagelkerke’s earlier proposal for a general index of determination. I describe Stata programs str2ph, which implements the new measure, and str2d, which implements a measure proposed in 2004 by Royston and Sauerbrei. I provide examples with real data. Copyright 2006 by StataCorp LP.
censored survival data, regression models, index of determination, explained variation, explained randomness, information gain
http://www.stata-journal.com/article.html?article=st0098
http://www.stata-journal.com/software/sj6-1/st0098/
Patrick Royston
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:86-1022020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:86-102
article
Speaking Stata: How to move step by: step
The by varlist: construct is reviewed, showing how it can be used to tackle a variety of problems with group structure. These range from simply doing some calculations for each of several groups of observations to doing more advanced manipulations making use of the fact that with this construct, subscripts and the built-ins _n and _N are all interpreted within groups. A fairly complete tutorial on numerical evaluation of true and false conditions is included. Copyright 2002 by Stata Corporation.
by, sorting, subscripts, true and false
http://www.stata-journal.com/sjpdf.html?articlenum=pr0004
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:144-1462020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:144-146
article
Stata tip 28: Precise control of dataset sort order
http://www.stata-journal.com/article.html?article=dm0019
L. Philip Schumm
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:4692020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:469
article
Stata tip 24: Axis labels on two or more levels
http://www.stata-journal.com/article.html?article=gr0020
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:574-5932020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:574-593
article
Speaking Stata: Smoothing in various directions
Identifying patterns in bivariate data on a scatterplot remains a basic statistical problem, with special flavor when both variables are on the same footing. Ideas of double, diagonal, and polar smoothing inspired by Cleveland and McGill’s 1984 paper in the Journal of the American Statistical Association are revisited with various examples from environmental datasets. Double smoothing means smoothing both y given x and x given y. Diagonal smoothing means smoothing based on the sum and difference of y and x that treats the two variables symmetrically, possibly under standardization. Polar smoothing is based on the transformation from Cartesian to polar coordinates followed by smoothing and then reverse transformation; here the smoothing is implemented by regression on a series of sine and cosine terms. These methods thus offer exploratory tools for determining the broad structure of bivariate data.
exploratory data analysis, statistical graphics, bivariate data, double smoothing, doublesm, diagonal smoothing, diagsm, polar smoothing, polarsm
http://www.stata-journal.com/article.html?article=gr0021
http://www.stata-journal.com/software/sj5-4/gr0021/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:3:y:2003:i:3:p:213-2252020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:3:p:213-225
article
Odds ratios and logistic regression: further examples of their use and interpretation
Logistic regression is perhaps the most widely used method for adjustment of confounding in epidemiologic studies. Its popularity is understandable. The method can simultaneously adjust for confounders measured on different scales; it provides estimates that are clinically interpretable; and its estimates are valid in a variety of study designs with few underlying assumptions. To those of us in practice settings, several aspects of applying and interpreting the model, however, can be confusing and counterintuitive. We attempt to clarify some of these points through several examples. We apply the method to a study of risk factors associated with periventricular leucomalacia and intraventricular hemorrhage in neonates. We relate the logit model to Cornfield's 2x2 table and discuss its application to both cohort and case-control study design. Interpretations of odds ratios, relative risk, and beta_0 from the logit model are presented. Copyright 2003 by StataCorp LP.
cc, cci, cs, csi, logistic, logit, relative risk, case-control study, odds ratio, cohort study
http://www.stata-journal.com/sjpdf.html?articlenum=st0041
Susan M. Hailpern
Paul F. Visintainer
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:567-5732020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:567-573
article
Mata Matters: Using views onto the data
Mata is Stata’s matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this issue’s column, we explore view matrices, matrices that are views of the underlying Stata dataset rather than copies of it.
Mata, views, memory
http://www.stata-journal.com/article.html?article=pr0019
William Gould
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:1-282020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:1-28
article
Flexible alternatives to the Cox model, and more
Since its introduction to a wondering public in 1972, the Cox proportional hazards regression model has become an overwhelmingly popular tool in the analysis of censored survival data. However, some features of the Cox model may cause problems for the analyst or an interpreter of the data. They include the restrictive assumption of proportional hazards for covariate effects, and "loss" (non-estimation) of the baseline hazard function induced by conditioning on event times. In medicine, the hazard function is often of fundamental interest since it represents an important aspect of the time course of the disease in question. In the present article, the Stata implementation of a class of flexible parametric survival models recently proposed by Royston and Parmar (2001) will be described. The models start by assuming either proportional hazards or proportional odds (user-selected option). The baseline distribution function is modeled by a restricted cubic regression spline in log time, and parameter estimation is by maximum likelihood. Model selection and choice of knots for the spline function are discussed. Interval-censored data and models in which one or more covariates have non-proportional effects are also supported by the software. Examples based on a study of prognostic factors in breast cancer are given. Copyright 2001 by Stata Corporation.
parametric survival analysis, hazard function, proportional hazards, proportional odds
http://www.stata-journal.com/software/sj2-2/st0001_1/
http://www.stata-journal.com/sjpdf.html?articlenum=st0001
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:50-552020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:50-55
article
Standardizing anthropometric measures in children and adolescents with new functions for egen
A new function for egen has been developed to allow transformation of child anthropometric data to z-scores using the LMS method and the reference data available from the 1990 British Growth Reference and the 2000 US CDC Growth Reference. An additional function allows for children to be categorized according to body mass index (weight/height²) using international cutoff points recommended by the Childhood Obesity Working Group of the International Obesity Taskforce. Copyright 2004 by StataCorp LP.
z-scores, LMS, egen
http://www.stata-journal.com/software/sj4-1/dm0004/
http://www.stata-journal.com/sjpdf.html?articlenum=dm0004
Suzanna Vidmar
John Carlin
Kylie Hesketh
Tim Cole
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:157-1672020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:157-167
article
CDSIMEQ: A program to implement two-stage probit least squares
The cdsimeq command implements the two-stage probit least squares estimation method described in Maddala (1983) for simultaneous equations models in which one of the endogenous variables is continuous and the other endogenous variable is dichotomous. The cdsimeq command implements all the necessary procedures for obtaining consistent estimates for the coefficients, as well as their corrected standard errors. Copyright 2003 by Stata Corporation.
simultaneous equations, Amemiya, Maddala, continuous endogenous, dichotomous endogenous, 2SPLS, 2SLS, instruments, standard errors
http://www.stata-journal.com/software/sj3-2/st0038/
http://www.stata-journal.com/sjpdf.html?articlenum=st0038
Omar M. G. Keshk
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:413-4202020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:413-420
article
Depending on conditions: a tutorial on the cond() function
This is a tutorial on the cond() function, giving explanations and examples and assessing its advantages and limitations. Copyright 2005 by StataCorp LP.
cond(), functions, if command, if qualifier, generate, replace
http://www.stata-journal.com/article.html?article=pr0016
David Kantor
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:4:p:6032020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:4:p:603
article
Stata tip 26: Maximizing compatibility between Macintosh and Windows
http://www.stata-journal.com/article.html?article=dm0017
Michael S. Hanson
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:253-2662020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:253-266
article
The robust variance estimator for two-stage models
This article discusses estimates of variance for two-stage models. We present the sandwich estimate of variance as an alternative to the Murphy-Topel estimate. The sandwich estimator has a simple formula that is similar to the formula for the Murphy-Topel estimator, and the two estimators are asymptotically equal when the assumed model distributions are true. The advantages of the sandwich estimate of variance are that it may be calculated for the complete parameter vector, and that it requires estimating equations instead of fully specified log likelihoods. Copyright 2002 by Stata Corporation.
robust variance estimator, Murphy-Topel estimator, two-stage estimation, estimating equation
http://www.stata-journal.com/sjpdf.html?articlenum=st0018
James W. Hardin
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:350-3532020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:350-353
article
Review of A Handbook of Statistical Analyses Using Stata by Rabe-Hesketh and Everitt
The new edition of the book by Rabe-Hesketh and Everitt (2004) is reviewed.
introductory, gllamm, Stata texts
http://www.stata-journal.com/sjpdf.html?articlenum=gn0013
Nicholas Winter
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:51-572020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:51-57
article
Predicted probabilities for count models
The post-estimation command prcounts for generating predicted probabilities after using poisson, nbreg, zip, and zinb is introduced and illustrated. Copyright 2001 by Stata Corporation.
predicted probabilities, count models
http://www.stata-journal.com/software/sj1-1/st0002/
http://www.stata-journal.com/sjpdf.html?articlenum=st0002
J. Scott Long
Jeremy Freese
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:45-642020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:45-64
article
Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences
So-called "nonparametric" statistical methods are often in fact based on population parameters, which can be estimated (with confidence limits) using the corresponding sample statistics. This article reviews the uses of three such parameters, namely Kendall's tau, Somers' D and the Hodges-Lehmann median difference. Confidence intervals for these are demonstrated using the somersd package. It is argued that confidence limits for these parameters, and their differences, are more informative than the traditional practice of reporting only p-values. These three parameters are also important in defining other tests and parameters, such as the Wilcoxon test, the area under the receiver operating characteristic (ROC) curve, Harrell's C, and the Theil median slope. Copyright 2002 by Stata Corporation.
confidence intervals, Gehan test, Harrell's C, Hodges-Lehmann median difference, Kendall's tau, nonparametric methods, rank correlation, rank-sum test, ROC area, Somers' D, Theil median slope, Wilcoxon test
http://www.stata-journal.com/sjpdf.html?articlenum=st0007
Roger Newson
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:185-2022020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:185-202
article
Speaking Stata: Problems with lists
Various problems in working through lists are discussed in view of changes in Stata 8. for is now undocumented, which provokes a detailed examination of ways of processing lists in parallel with foreach, forvalues, and other devices, including new, concise ways of incrementing and decrementing macros and evaluating other expressions to do with macros in place. New features for manipulating lists held in macros and the new levels command are also reviewed. Copyright 2003 by Stata Corporation.
lists, for, foreach, forvalues, levels, macros, tokenize
http://www.stata-journal.com/sjpdf.html?articlenum=pr0009
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:282-2842020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:282-284
article
Stata tip 21: The arrows of outrageous fortune
http://www.stata-journal.com/article.html?article=gr0015
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:282-2892020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:282-289
article
Maximum likelihood estimation of endogenous switching regression models
This article describes the movestay Stata command, which implements the maximum likelihood method to fit the endogenous switching regression model. Copyright 2004 by StataCorp LP.
movestay, endogenous variables, maximum likelihood, limited dependent variables, switching regression
http://www.stata-journal.com/software/sj4-3/st0071/
http://www.stata-journal.com/sjpdf.html?articlenum=st0071
Michael Lokshin
Zurab Sajaia
oai:RePEc:tsj:stataj:v:4:y:2004:i:1:p:56-652020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:1:p:56-65
article
From the help desk: Kaplan-Meier plots with stsatrisk
stsatrisk is a wrapper for sts graph that adds a table to a survival plot with at-risk information, making it easy to create graphs that follow the list of recommendations given by Pocock et al. (2002) for Kaplan-Meier plots. We use stsatrisk to create plots in the desired format with the desired information. Copyright 2004 by StataCorp LP.
stsatrisk, Kaplan-Meier, survival plots
http://www.stata-journal.com/software/sj4-1/st0058/
http://www.stata-journal.com/sjpdf.html?articlenum=st0058
Jean Marie Linhart
Jeffrey S. Pitblado
James Hassell
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:330-3542020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:330-354
article
Boosted regression (boosting): An introductory tutorial and a Stata plugin
Boosting, or boosted regression, is a recent data-mining technique that has shown considerable success in predictive accuracy. This article gives an overview of boosting and introduces a new Stata command, boost, that implements the boosting algorithm described in Hastie, Tibshirani, and Friedman (2001, 322). The plugin is illustrated with a Gaussian and a logistic regression example. In the Gaussian regression example, the R2 value computed on a test dataset is R2 = 21.3% for linear regression and R2 = 93.8% for boosting. In the logistic regression example, stepwise logistic regression correctly classifies 54.1% of the observations in a test dataset versus 76.0% for boosted logistic regression. Currently, boost accommodates Gaussian (normal), logistic, and Poisson boosted regression. boost is implemented as a Windows C++ plugin. Copyright 2005 by StataCorp LP.
boost, boosted regression, boosting, data mining
http://www.stata-journal.com/article.html?article=st0087
http://www.stata-journal.com/software/sj5-3/st0087/
Matthias Schonlau
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:472-4732020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:472-473
article
Stata tip 101: Previous but different
http://www.stata-journal.com/article.html?article=dm0059
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:41-422020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:41-42
article
The history of StataQuest
http://www.stata-journal.com/sjpdf.html?articlenum=gn0023
J. Theodore Anagnoson
oai:RePEc:tsj:stataj:v:3:y:2003:i:2:p:133-1472020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:2:p:133-147
article
Exploring the use of variable bandwidth kernel density estimators
Variable bandwidth kernel density estimators increase the window width at low densities and decrease it where data concentrate. This represents an improvement over the fixed bandwidth kernel density estimators. In this article, we explore the use of one implementation of a variable kernel estimator in conjunction with several rules and procedures for bandwidth selection applied to several real datasets. The considered examples permit us to state that when working with tens or a few hundreds of data observations, least-squares cross-validation bandwidth rarely produces useful estimates; with thousands of observations, this problem can be surpassed. Optimal bandwidth and biased cross-validation (BCV), in general, oversmooth multimodal densities. The Sheather-Jones plug-in rule produced bandwidths that behave slightly better in this respect. The Silverman test is considered as a very sophisticated and safe procedure to estimate the number of modes in univariate distributions; however, similar results could be obtained with the Sheather-Jones rule, but at a much lower computational cost. As expected, the variable bandwidth kernel density estimates showed fewer modes than those chosen by the Silverman test, especially those distributions in which multimodality was caused by several noisy minor modes. More research on the subject is needed. Copyright 2003 by Stata Corporation.
kernel density estimation, bandwidth, cross validation, multimodality test
http://www.stata-journal.com/software/sj3-2/st0036/
http://www.stata-journal.com/sjpdf.html?articlenum=st0036
Isaias H. Salgado-Ugarte
Marco A. Perez-Hernandez
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:358-3772020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:358-377
article
Estimation of average treatment effects based on propensity scores
In this paper, we give a short overview of some propensity score matching estimators suggested in the evaluation literature, and we provide a set of Stata programs, which we illustrate using the National Supported Work (NSW) demonstration widely known in labor economics. Copyright 2002 by Stata Corporation.
propensity score, matching, average treatment effect, evaluation
http://www.stata-journal.com/software/sj2-4/st0026/
http://www.stata-journal.com/sjpdf.html?articlenum=st0026
Sascha O. Becker
Andrea Ichino
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:382020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:38
article
Memories of Stata
http://www.stata-journal.com/sjpdf.html?articlenum=gn0021
Tony Lachenbruch
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:290-2952020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:290-295
article
Choosing an appropriate real-life measure of effect size:the case of a continuous predictor and a binary outcome
A case study of data on age and pregnancy is used to point up some morals for practicing data analysts, including the superiority of regression over t tests, exploratory scatterplot smoothing as a key method of checking form of relationship, and the value of logistic regression followed by adjust as a way of getting at the numbers of most interest. Copyright 2002 by Stata Corporation.
adjust, correlation, logistic regression, regression, scatterplot smoothing, t tests
http://www.stata-journal.com/sjpdf.html?articlenum=st0021
Ronan M. Conroy
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:421-4282020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:421-428
article
Confidence intervals for the kappa statistic
The command kapci calculates 100(1 - alpha) percent confidence intervals for the kappa statistic using an analytical method in the case of dichotomous variables or bootstrap for more complex situations. For instance, kapci allows estimating CI for polychotomous variables using weighted kappa or for cases in which there are more than 2 raters/replications. Copyright 2004 by StataCorp LP.
kapci, reliability, kappa statistic, confidence intervals
http://www.stata-journal.com/software/sj4-4/st0076/
http://www.stata-journal.com/sjpdf.html?articlenum=st0076
Michael E. Reichenheim
oai:RePEc:tsj:stataj:v:2:y:2002:i:4:p:411-4272020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:4:p:411-427
article
Speaking Stata: On getting functions to do the work
Functions in Stata take two main forms, built-in functions that are part of the executable and egen functions written in Stata's own language. These are surveyed, giving a variety of tips and tricks, and noting the large number of user-written egen functions available for download from the Internet. Two substantial examples, the calculation of percentile ranks and plotting positions, and the calculation of measures summarizing properties of the other members of a group, provide detailed illustrations of egen in action. Copyright 2002 by Stata Corporation.
functions, egen, strings, percentile ranks, plotting positions, family data
http://www.stata-journal.com/sjpdf.html?articlenum=pr0007
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:6:y:2006:i:3:p:4332020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:3:p:433
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2006 by StataCorp LP.
http://www.stata-journal.com/software/sj6-3/sts15_2/
http://www.stata-journal.com/software/sj6-3/sts18_1/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:259-2732020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:259-273
article
Speaking Stata: Density probability plots
Density probability plots show two guesses at the density function of a continuous variable, given a data sample. The first guess is the density function of a specified distribution (e.g., normal, exponential, gamma, etc.) with appropriate parameter values plugged in. The second guess is the same density function evaluated at quantiles corresponding to plotting positions associated with the sample’s order statistics. If the specified distribution fits well, the two guesses will be close. Such plots, suggested by Jones and Daly in 1995, are explained and discussed with examples from simulated and real data. Comparisons are made with histograms, kernel density estimation, and quantile–quantile plots. Copyright 2005 by StataCorp LP.
gr0012, density probability plots, distributions, histograms, kernel density estimation, quantile–quantile plots, statistical graphics
http://www.stata-journal.com/software/sj5-2/gr0012/
http://www.stata-journal.com/article.html?article=gr0012
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:107-1242020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:107-124
article
Power by simulation
This paper describes how to write Stata programs to estimate the power of virtually any statistical test that Stata can perform. Examples given include the t test, Poisson regression, Cox regression, and the nonparametric rank-sum test. Copyright 2002 by Stata Corporation.
power, simulation, random number generation, post file, copula, sample size
http://www.stata-journal.com/sjpdf.html?articlenum=st0010
A. H. Feiveson
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:183-1892020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:183-189
article
A note on the concordance correlation coefficient
Program concord implements L. I. Lin's concordance correlation coefficient (Lin, 1989), as well as the limits-of-agreement procedure (Bland and Altman, 1986). Recently, Lin (2000) issued an erratum reporting a number of typographical errors in his seminal 1989 paper. Further, changes in Stata Version 7 required modification of concord. This note reports the effect of the errors and provides a corrected and updated program. Copyright 2002 by Stata Corporation.
concordance correlation, graphics, measurement comparison, limits-of-agreement
http://www.stata-journal.com/software/sj2-2/st0015/
http://www.stata-journal.com/sjpdf.html?articlenum=st0015
Thomas J. Steichen
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:2:y:2002:i:3:p:267-2792020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:3:p:267-279
article
Estimation of sensitivity and specificity arising from validity studies with incomplete designs
This insert introduces valides, validesi, and validesu: a set of commands that allow the calculations of point and precision estimates of sensitivity and specificity obtained in validity studies with incomplete designs. Copyright 2002 by Stata Corporation.
validity, sensitivity, specificity, incomplete design
http://www.stata-journal.com/software/sj2-3/st0019/
http://www.stata-journal.com/sjpdf.html?articlenum=st0019
Michael E. Reichenheim
Antonio Ponce de Leon
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:2232020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:223
article
Stata tip 9: Following special sequences
http://www.stata-journal.com/sjpdf.html?articlenum=pr0013
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:4:y:2004:i:4:p:379-4012020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:4:p:379-401
article
Generalized power calculations for generalized linear models and more
The powercal command can compute any one of the five quantities involved in power calculations from the other four. These quantities are power, significance level, detectable difference, sample number, and the standard deviation (SD) of the influence function, which is equal to the standard error multiplied by the square root of the sample number. powercal can take arbitrary expressions (involving constants, scalars, or variables) as input and calculate the output as a new variable. The user can therefore plot input variables against output variables, and this often communicates the tradeoffs involved better than a point calculation as output by the sampsi command. General formulas are given for calculating the SD of the influence function when the detectable difference is a linear combination of link functions of subpopulation means for an outcome variable distributed according to a generalized linear model (GLM). This general case includes a very broad range of special cases, where the parameters to be estimated are differences between subpopulation proportions, arithmetic means and algebraic means, or ratios between subpopulation proportions, arithmetic means, geometric means, and odds. However, powercal is not limited to GLMs and can even be used with rank methods. Copyright 2004 by StataCorp LP.
powercal, power, alpha, significance level, detectable difference, detectable ratio, sample number, standard deviation, influence function, sample design, generalized linear model, proportion, arithmetic mean, algebraic mean, geometric mean, odds
http://www.stata-journal.com/software/sj4-4/st0074/
http://www.stata-journal.com/sjpdf.html?articlenum=st0074
Roger Newson
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:71-852020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:71-85
article
From the help desk: Transfer functions
The question often arises as to whether one can estimate a transfer function model using Stata. While Stata does not currently have a convenience command for doing so, this article will demonstrate that estimating such a model can be accomplished quite easily using Stata's arima command. The classic text for transfer function modeling is Box, Jenkins, and Reinsel (1994); however, a more concise presentation can be found in Brockwell and Davis (1991). Copyright 2002 by Stata Corporation.
arima, xcorr, corrgram, transfer function, impulse-response function, autocorrelation function, cross-correlation function, pre-whitened, linear filter, difference equation
http://www.stata-journal.com/sjpdf.html?articlenum=st0009
Allen McDowell
oai:RePEc:tsj:stataj:v:2:y:2002:i:2:p:125-1392020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:2:p:125-139
article
Bootstrapping a conditional moments test for normality after tobit estimation
Categorical and limited dependent variable models are routinely estimated via maximum likelihood. It is well-known that the ML estimates of the parameters are inconsistent if the distribution or the skedastic component is misspecified. When conditional moment tests were first developed by Newey (1985) and Tauchen (1985), they appeared to offer a wide range of easy-to-compute specification tests for categorical and limited dependent variable models estimated by maximum likelihood. However, subsequent studies found that using the asymptotic critical values produced severe size distortions. This paper presents simulation evidence that the standard conditional moment test for normality after tobit estimation has essentially no size distortion and reasonable power when the critical values are obtained via a parametric bootstrap. Copyright 2002 by Stata Corporation.
conditional moment tests, bootstrap, tobit, normality
http://www.stata-journal.com/sjpdf.html?articlenum=st0011
David M. Drukker
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:355-3702020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:355-370
article
Introduction to game-theory calculations
Game theory can be defined as the study of mathematical models of conflict and cooperation between intelligent and rational decision makers (Myerson 1991). Game-theory concepts apply in economics, sociology, biology, and health care, and whenever the actions of several agents (individuals, groups, or any combination of these) are interdependent. We present a new command gamet to represent the extensive form (game tree) and the strategic form (payoff matrix) of a noncooperative game and to identify the solution of a nonzero and zero-sum game through dominant and dominated strategies, iterated elimination of dominated strategies, and Nash equilibrium in pure and fully mixed strategies. Further, gamet can identify the solution of a zero-sum game through the maximin criterion and the solution of an extensive form game through backward induction. Copyright 2005 by StataCorp LP.
Game theory, Nash equilibrium, payoff matrix, zero-sum game, game tree
http://www.stata-journal.com/article.html?article=st0088
http://www.stata-journal.com/software/sj5-3/st0088/
Nicola Orsini
Debora Rizzuto
Nicola Nante
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:1392020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:139
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2005 by StataCorp LP.
http://www.stata-journal.com/software/sj5-1/dm86_1/
http://www.stata-journal.com/software/sj5-1/snp12_1/
http://www.stata-journal.com/software/sj5-1/st0053_1/
http://www.stata-journal.com/software/sj5-1/st0071_1/
Editors
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:305-3332020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:305-333
article
Semiparametric analysis of case–control genetic data in the presence of environmental factors
In the past decade, many statistical methods have been proposed for the analysis of case-control genetic data with an emphasis on haplotype-based disease association studies. Most of the methodology has concentrated on the estimation of genetic (haplotype) main effects. Most methods accounted for environmental and gene-environment interaction effects by using prospective-type analyses that may lead to biased estimates when used with case-control data. Several recent publications addressed the issue of retrospective sampling in the analysis of case-control genetic data in the presence of environmental factors by developing efficient semiparametric statistical methods. This article describes the new Stata command haplologit, which implements efficient profile-likelihood semiparametric methods for fitting gene-environment models in the very important special cases of a rare disease, a single candidate gene in Hardy-Weinberg equilibrium, and independence of genetic and environmental factors. Copyright 2008 by StataCorp LP.
haplologit, haplotype-based analysis, haplotype-environment independence, case-control data, Hardy-Weinberg equilibrium, profile likelihood, retrospective study, single nucleotide polymorphisms (SNPs)
http://www.stata-journal.com/article.html?article=st0148
http://www.stata-journal.com/software/sj8-3/st0148/
Yulia V. Marchenko
Raymond K. Carroll
Danyu Y. Lin
Christopher I. Amos
Roberto G. Gutierrez
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:2792020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:279
article
Stata tip 19: A way to leaner, faster graphs
http://www.stata-journal.com/article.html?article=gr0013
Patrick Royston
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:3562020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:356
article
Stata tip 11: The nolog option with maximum-likelihood modeling commands
http://www.stata-journal.com/sjpdf.html?articlenum=pr0014
Patrick Royston
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:470-4712020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:470-471
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2005 by StataCorp LP.
http://www.stata-journal.com/software/sj5-3/gr41_3/
http://www.stata-journal.com/software/sj5-3/gr42_3/
http://www.stata-journal.com/software/sj5-3/sg139_1/
http://www.stata-journal.com/software/sj5-3/dg149_1/
http://www.stata-journal.com/software/sj5-3/snp15_5/
http://www.stata-journal.com/software/sj5-3/st0015_2/
http://www.stata-journal.com/software/sj5-3/st0026_2/
http://www.stata-journal.com/software/sj5-3/st0071_2/
http://www.stata-journal.com/article.html?article=up0012
Editors
oai:RePEc:tsj:stataj:v:9:y:2009:i:1:p:1732020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:1:p:173
article
Software updates
Updates for a number of previously published packages are provided.
http://www.stata-journal.com/software/sj9-1/sbe22_1/
http://www.stata-journal.com/software/sj9-1/st0096_1/
http://www.stata-journal.com/software/sj9-1/st0143_1/
Editors
oai:RePEc:tsj:stataj:v:4:y:2004:i:2:p:113-1232020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:2:p:113-123
article
Production function estimation in Stata using inputs to control for unobservables
A key issue in the estimation of production functions is the correlation between unobservable productivity shocks and input levels. Profit-maximizing firms respond to positive productivity shocks by expanding output, which requires additional inputs. Negative shocks lead firms to pare back output, decreasing their input usage. Olley and Pakes (1996) develop an estimator that uses investment as a proxy for these unobservable shocks. More recently, Levinsohn and Petrin (2003a) introduce an estimator that uses intermediate inputs as proxies, arguing that intermediates may respond more smoothly to productivity shocks. This paper reviews Levinsohn and Petrin's approach and introduces a Stata command that implements it. Copyright 2004 by StataCorp LP.
levpet, production functions, productivity, endogeneity, GMM estimator
http://www.stata-journal.com/software/sj4-2/st0060/
http://www.stata-journal.com/sjpdf.html?articlenum=st0060
Amil Petrin
Brian P. Poi
James Levinsohn
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:257-2642020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:257-264
article
Compliance-adjusted intervention effects in survival data
Survival data are most frequently analyzed by the intention-to-treat principle. However, presenting a compliance-adjusted analysis alongside the primary analysis can provide an insight into the effect of the treatment for those individuals actually complying with their randomized intervention. There are a number of methods for this type of analysis. Loeys and Goetghebeur (2003) use proportional hazards techniques to provide an estimate of the treatment effect for compliers when compliance is measured on an all-or-nothing scale. This methodology is here made available through a new Stata command, stcomply. Copyright 2004 by StataCorp LP.
stcomply, compliance, proportional hazards
http://www.stata-journal.com/software/sj4-3/st0068/
http://www.stata-journal.com/sjpdf.html?articlenum=st0068
Lois G. Kim
Ian R. White
oai:RePEc:tsj:stataj:v:5:y:2005:i:3:p:2872020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:3:p:287
article
Editorial Announcements
H. Joseph Newton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:1:p:137-1382020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:1:p:137-138
article
Stata tip 18: Making keys functional
http://www.stata-journal.com/article.html?article=gn0026
Shannon Driver
oai:RePEc:tsj:stataj:v:3:y:2003:i:4:p:386-4112020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:4:p:386-411
article
Maximum likelihood estimation of generalized linear models with covariate measurement error
Generalized linear models with covariate measurement error can be estimated by maximum likelihood using gllamm, a program that fits a large class of multilevel latent variable models (Rabe-Hesketh, Skrondal, and Pickles 2004). The program uses adaptive quadrature to evaluate the log likelihood, producing more reliable results than many other methods (Rabe-Hesketh, Skrondal, and Pickles 2002). For a single covariate measured with error (assuming a classical measurement model), we describe a "wrapper" command, cme, that calls gllamm to estimate the model. The wrapper makes life easy for the user by accepting a simple syntax and data structure and producing extended and easily interpretable output. The commands for preparing the data and running gllamm can also be obtained from cme. We first discuss the case where several measurements are available and subsequently consider estimation when the measurement error variance is instead assumed known. The latter approach is useful for sensitivity analysis assessing the impact of assuming perfectly measured covariates in generalized linear models. An advantage of using gllamm directly is that the classical covariate measurement error model can be extended in various ways. For instance, we can use nonparametric maximum likelihood estimation (NPMLE) to relax the normality assumption for the true covariate. We can also specify a congeneric measurement model which relaxes the assumption that the measurements for a unit are exchangeable replicates by allowing for different measurement scales and error variances. Copyright 2003 by StataCorp LP.
covariate measurement error, measurement model, congeneric measurement model, factor model, adaptive quadrature, nonparametric maximum likelihood, NPMLE, latent class model, empirical Bayes, simulation, wrapper, sensitivity analysis, gllamm, cme
http://www.stata-journal.com/software/sj3-4/st0052/
http://www.stata-journal.com/sjpdf.html?articlenum=st0052
Sophia Rabe-Hesketh
Anders Skrondal
Andrew Pickles
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:149-1502020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:149-150
article
Stata tip 30: May the source be with you
http://www.stata-journal.com/article.html?article=pr0022
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:5:y:2005:i:2:p:274-2782020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:5:y:2005:i:2:p:274-278
article
Review of Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models by Vittinghoff, Glidden, Shiboski, and McCulloch
The new book by Vittinghoff et al. (2005) is reviewed.
linear regression, logistic regression, survival analysis, repeated measures, generalized linear models, complex surveys
http://www.stata-journal.com/article.html?article=gn0028
Stanley Lemeshow
oai:RePEc:tsj:stataj:v:2:y:2002:i:1:p:1-212020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:2:y:2002:i:1:p:1-21
article
Reliable estimation of generalized linear mixed models using adaptive quadrature
Generalized linear mixed models or multilevel regression models have become increasingly popular. Several methods have been proposed for estimating such models. However, to date there is no single method that can be assumed to work well in all circumstances in terms of both parameter recovery and computational efficiency. Stata's xt commands for two-level generalized linear mixed models (e.g., xtlogit) employ Gauss-Hermite quadrature to evaluate and maximize the marginal log likelihood. The method generally works very well, and often better than common contenders such as MQL and PQL, but there are cases where quadrature performs poorly. Adaptive quadrature has been suggested to overcome these problems in the two-level case. We have recently implemented a multilevel version of this method in gllamm, a program that fits a large class of multilevel latent variable models including multilevel generalized linear mixed models. As far as we know, this is the first time that adaptive quadrature has been proposed for multilevel models. We show that adaptive quadrature works well in problems where ordinary quadrature fails. Furthermore, even when ordinary quadrature works, adaptive quadrature is often computationally more efficient since it requires fewer quadrature points to achieve the same precision. Copyright 2002 by Stata Corporation.
adaptive quadrature, gllamm, generalized linear mixed models, random-effects models, panel data, numerical integration, adaptive integration, multilevel models, clustered data
http://www.stata-journal.com/software/sj2-1/st0005/
http://www.stata-journal.com/sjpdf.html?articlenum=st0005
Sophia Rabe-Hesketh
Anders Skrondal
Andrew Pickles
oai:RePEc:tsj:stataj:v:3:y:2003:i:1:p:47-562020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:3:y:2003:i:1:p:47-56
article
Sample size calculations for main effects and interactions in case-control studies using Stata's nchi2 and npnchi2 functions
The non-central chi-squared distribution can be used to calculate power for tests detecting departure from a null hypothesis. Required sample size can also be calculated because it is proportional to the non-centrality parameter for the distribution. We demonstrate how these calculations can be carried out in Stata using the example of calculating power and sample size for case-control studies of gene-gene and gene-environment interactions. Do-files are available for these calculations. Copyright 2003 by Stata Corporation.
gene-environment interaction, gene-gene interaction, power, sample size, study design, non-central chi-squared
http://www.stata-journal.com/software/sj3-1/st0032/
http://www.stata-journal.com/sjpdf.html?articlenum=st0032
Catherine L. Saunders
D. Timothy Bishop
Jennifer H. Barrett
oai:RePEc:tsj:stataj:v:4:y:2004:i:3:p:3592020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:4:y:2004:i:3:p:359
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2004 by StataCorp LP.
http://www.stata-journal.com/software/sj4-3/st0043_1/
http://www.stata-journal.com/software/sj4-3/gr0002_3/
http://www.stata-journal.com/software/sj4-3/st0063_1/
http://www.stata-journal.com/software/sj4-3/sg144_1/
http://www.stata.com/bookstore/sjj.html
Editors
oai:RePEc:tsj:stataj:v:1:y:2001:i:1:p:98-1002020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:1:y:2001:i:1:p:98-100
article
Review of Generalized Linear Models and Extensions by Hardin and Hilbe
The new book Hardin and Hilbe (Stata Press, 2001) is reviewed. Copyright 2001 by Stata Corporation.
http://www.stata-journal.com/sjpdf.html?articlenum=gn0001
Roger Newson
oai:RePEc:tsj:stataj:v:6:y:2006:i:1:p:40-572020-08-13RePEc:tsj:stataj
RePEc:tsj:stataj:v:6:y:2006:i:1:p:40-57
article
Generalized least squares for trend estimation of summarized dose–response data
This paper presents a command, glst, for trend estimation across different exposure levels for either single or multiple summarized case-control, incidence-rate, and cumulative incidence data. This approach is based on constructing an approximate covariance estimate for the log relative risks and estimating a corrected linear trend using generalized least squares. For trend analysis of multiple studies, glst can estimate fixed- and random-effects metaregression models. Copyright 2006 by StataCorp LP.
glst, dose–response data, generalized least squares, trend, meta-analysis, metaregression
http://www.stata-journal.com/article.html?article=st0096
http://www.stata-journal.com/software/sj6-1/st0096/
Nicola Orsini
Rino Bellocco
Sander Greenland
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:45-762019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:45-76
article
More power through symbolic computation: Extending Stata by using the Maxima computer algebra system
Maxima is a free and open-source computer algebra system that can perform symbolic computations such as solving equations, determining derivatives of functions, obtaining Taylor series, and manipulating algebraic expressions. In this article, I present the Maxima Bridge System, a collection of software programs that allows Stata to interface with Maxima so that Maxima can be used for symbolic computation, data can be transferred from Stata to Maxima, and results can be retrieved from Maxima. The cooperation between Stata and Maxima provides an environment for statistical analysis in which symbolic computation can be easily used together with all the facilities supplied by Stata. In this environment, symbolic computation algorithms can be used to manage the complexity of algebra and calculus, whereas numerical computation can be used when speed matters. I also discuss the software architecture of the Maxima Bridge System, and I present several examples to illustrate how to develop new Stata commands that exploit the capabilities of Maxima. Copyright 2015 by StataCorp LP.
maxima, maximaget, maximaput, Maxima Bridge System, symbolic computation, computer algebra system
http://www.stata-journal.com/article.html?article=pr0059
Giovanni L. Lo Magno
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:226-2462019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:226-246
article
Regression models for count data from truncated distributions
We present new commands for analyzing count-data regression models for truncated distributions. The trncregress command allows specification of a regression model for the mean of the truncated distribution through options. In addition to support for truncated Poisson and negative binomial, trncregress fits models based on truncated versions of distributions including generalized Poisson, Poisson-inverse Gaussian, three-parameter negative binomial power, three-parameter Waring negative binomial, and three-parameter Famoye negative binomial. Copyright 2015 by StataCorp LP.
trncregress, truncation, generalized Poisson, negative binomial, Poisson-inverse Gaussian, Famoye, Waring, PIG, NB-P, NB-F
http://www.stata-journal.com/article.html?article=st0378
James W. Hardin
Joseph M. Hilbe
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:186-2152019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:186-215
article
Estimating and modeling relative survival
When estimating patient survival using data collected by population-based cancer registries, it is common to estimate net survival in a relative-survival framework. Net survival can be estimated using the relative-survival ratio, which is the ratio of the observed survival of the patients (where all deaths are considered events) to the expected survival of a comparable group from the general population. In this article, we describe a command, strs, for life-table estimation of relative survival. We discuss three methods for estimating expected survival, as well as the cohort, period, and hybrid approaches for estimating relative survival. We also implement a life-table version of the Pohar Perme (2012, Biometrics 68: 113–120) estimator of net survival, and we describe two methods for age standardization. We also explain how, in addition to net probabilities of death, crude probabilities of death due to cancer and due to other causes can be estimated using the method of Cronin and Feuer (2000, Statistics in Medicine 19: 1729–1740). We conclude this article with discussion and examples of modeling excess mortality using various approaches, including the full-likelihood approach (using the ml command) and Poisson regression (using the glm command with a user-specified link function). Copyright 2015 by StataCorp LP.
strs, excess mortality, relative survival, survival analysis, Poisson regression, life table, cancer survival, period analysis
http://www.stata-journal.com/article.html?article=st0376
Paul W. Dickman
Enzo Coviello
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:309-3152019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:309-315
article
Review of Alan Acock’s Discovering Structural Equation Modeling Using Stata, Revised Edition
Richard Williams
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:21-442019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:21-44
article
Implementing intersection bounds in Stata
We present the clrbound, clr2bound, clr3bound, and clrtest commands for estimation and inference on intersection bounds as developed by Chernozhukov, Lee, and Rosen (2013, Econometrica 81: 667–737). The intersection bounds framework encompasses situations where a population parameter of interest is partially identified by a collection of consistently estimable upper and lower bounds. The identified set for the parameter is the intersection of regions defined by this collection of bounds. More generally, the methodology can be applied to settings where an estimable function of a vector-valued parameter is bounded from above and below, as is the case when the identified set is characterized by conditional moment inequalities. The commands clrbound, clr2bound, and clr3bound provide bound estimates that can be used directly for estimation or to construct asymptotically valid confidence sets. clrtest performs an intersection bound test of the hypothesis that a collection of lower intersection bounds is no greater than zero. The command clrbound provides bound estimates for one-sided lower or upper intersection bounds on a parameter, while clr2bound and clr3bound provide two-sided bound estimates using both lower and upper intersection bounds. clr2bound uses Bonferroni’s inequality to construct two-sided bounds that can be used to perform asymptotically valid inference on the identified set or the parameter of interest, whereas clr3bound provides a generally tighter confidence interval for the parameter by inverting the hypothesis test performed by clrtest. More broadly, inversion of this test can also be used to construct confidence sets based on conditional moment inequalities as described in Chernozhukov, Lee, and Rosen (2013). The commands include parametric, series, and local linear estimation procedures. Copyright 2015 by StataCorp LP.
clrbound, clr2bound, clr3bound, clrtest, intersection bounds, bound analysis, conditional moments, partial identification, infinite dimensional constraints, adaptive moment selection
http://www.stata-journal.com/article.html?article=st0369
Victor Chernozhukov
Wooyoung Kim
Sokbae Lee
Adam M. Rosen
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:95-1092019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:95-109
article
Generating univariate and multivariate nonnormal data
Because the assumption of normality is common in statistics, the robustness of statistical procedures to the violation of the normality assumption is often of interest. When one examines the impact of the violation of the normality assumption, it is important to simulate data from a nonnormal distribution with varying degrees of skewness and kurtosis. Fleishman (1978, Psychometrika 43: 521–532) developed a method to simulate data from a univariate distribution with specific values for the skewness and kurtosis. Vale and Maurelli (1983, Psychometrika 48: 465–471) extended Fleishman’s method to simulate data from a multivariate nonnormal distribution. In this article, I briefly introduce these two methods and present two new commands, rnonnormal and rmvnonnormal, for simulating data from the univariate and multivariate nonnormal distributions. Copyright 2015 by StataCorp LP.
rnonnormal, rmvnonnormal, nonnormal data, skewness, kurtosis
http://www.stata-journal.com/article.html?article=st0371
Sunbok Lee
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:292-3002019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:292-300
article
Nonparametric pairwise multiple comparisons in independent groups using Dunn's test
Dunn's test is the appropriate nonparametric pairwise multiple-comparison procedure when a Kruskal–Wallis test is rejected, and it is now implemented for Stata in the dunntest command. dunntest produces multiple comparisons following a Kruskal–Wallis k-way test by using Stata’s built-in kwallis command. It includes options to control the familywise error rate by using Dunn's proposed Bonferroni adjustment, the Sidak adjustment, the Holm stepwise adjustment, or the Holm–Sidak stepwise adjustment. There is also an option to control the false discovery rate using the Benjamini–Hochberg stepwise adjustment. Copyright 2015 by StataCorp LP.
dunntest, kwallis, Dunn's test, Kruskal–Wallis test, multiple comparisons, familywise error rate, Bonferroni, Sidak, Holm, Holm–Sidak, false discovery rate
http://www.stata-journal.com/article.html?article=st0381
Alexis Dinno
oai:RePEc:tsj:stataj:v:18:y:2018:i:4:cumindex2019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:4:cumindex
article
Cumulative author index, volumes 1-18
This index does not appear in the print version of the Stata Journal.
http://repec.org/tsj/SJindexVol1-18.pdf
Christopher F Baum
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:247-2742019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:247-274
article
dynemp: A routine for distributed microdata analysis of business dynamics
In this article, we introduce a new command, dynemp, that implements a distributed microdata analysis of business and employment dynamics and firm demographics. As its data source, dynemp requires business registers or comparable firm- or establishment-level longitudinal databases that cover the (near-)universe of companies in all business sectors. Access to such confidential data is usually restricted, and the microlevel data cannot be brought together to a single platform for cross-country analysis. To solve this confidentiality problem while also maintaining a high level of harmonization of the key economic concepts, dynemp can be distributed in a network of researchers who have access to the national confidential microdata. This way, the rich firm-level employment dynamics can be analyzed from new angles (such as firm age and size), significantly expanding the scope of analyses relying only on more aggregated data. Copyright 2015 by StataCorp LP.
dynemp, employment dynamics, job flows, firm demographics, administrative data, data analysis
http://www.stata-journal.com/article.html?article=st0379
Chiara Criscuolo
Peter N. Gal
Carlo Menon
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:275-2912019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:275-291
article
Tools for checking calibration of a Cox model in external validation: Prediction of population-averaged survival curves based on risk groups
Royston (2014, Stata Journal 14: 738–755) explained how a popular application of the Cox proportional hazards model "is to develop a multivariable prediction model, often a prognostic model to predict the future clinical outcome of patients with a particular disorder from 'baseline' factors measured at some initial time point. For such a model to be useful in practice, it must be 'validated'; that is, it must perform satisfactorily in an external sample of patients independent of the one on which the model was originally developed. One key aspect of performance is calibration, which is the accuracy of prediction, particularly of survival (or equivalently, failure or event) probabilities at any time after the time origin". In this article, I suggest an approach to assess calibration by comparing observed (Kaplan–Meier) and predicted survival probabilities in several prognostic groups derived by placing cutpoints on the prognostic index. I distinguish between full validation, where all relevant quantities are estimated on the derivation dataset and predicted on the validation dataset, and partial validation, where the prognostic index and prognostic groups are derived from published information and the baseline distribution function is estimated in the validation dataset. Partial validation is more feasible in practice because it is uncommon to have access to individual patient values in both datasets. I exemplify the method by detailed analysis of two datasets in the disease primary biliary cirrhosis; the datasets comprise a derivation and a validation dataset. I describe a new ado-file, stcoxgrp, that performs the necessary calculations. Results for stcoxgrp are displayed graphically, which makes it easier for users to picture calibration (or lack thereof) according to follow-up time. Copyright 2015 by StataCorp LP.
stcoxgrp, Cox proportional hazards model, multivariable model, prognostic factors, prognostic groups, external validation, calibration, survival probabilities
http://www.stata-journal.com/article.html?article=st0380
Patrick Royston
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:77-942019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:77-94
article
Time-efficient algorithms for robust estimators of location, scale, symmetry, and tail heaviness
The analysis of the empirical distribution of univariate data often includes the computation of location, scale, skewness, and tail-heaviness measures, which are estimates of specific parameters of the underlying population distribution. Several measures are available, but they differ by Gaussian efficiency, robustness regarding outliers, and meaning in the case of asymmetric distributions. In this article, we briefly compare, for each type of parameter (location, scale, skewness, and tail heaviness), the “classical” estimator based on (centered) moments of the empirical distribution, an estimator based on specific quantiles of the distribution, and an estimator based on pairwise comparisons of the observations. This last one always performs better than the other estimators, particularly in terms of robustness, but it requires a heavy computation time of an order of n². Fortunately, as explained in Croux and Rousseeuw (1992, Computational Statistics 1: 411–428), the algorithm of Johnson and Mizoguchi (1978, SIAM Journal on Computing 7: 147–153) allows one to substantially reduce the computation time to an order of n log n and, hence, allows the use of robust estimators based on pairwise comparisons, even in very large datasets. This has motivated us to program this algorithm for Stata. In this article, we describe the algorithm and the associated commands. We also illustrate the computation of these robust estimators by involving them in a normality test of Jarque–Bera form (Jarque and Bera 1980, Economics Letters 6: 255–259; Brys, Hubert, and Struyf 2008, Computational Statistics 23: 429–442) using real data. Copyright 2015 by StataCorp LP.
mhl, sqn, medcouple, robjb, location, scale, symmetry, tail heaviness, mean, median, skewness, kurtosis, robust estimation
http://www.stata-journal.com/article.html?article=st0370
Wouter Gelade
Vincenzo Verardi
Catherine Vermandele
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:301-3082019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:301-308
article
Generating nonnegatively correlated binary random variates
In this article, I discuss the methods for generating nonnegatively correlated binary random variates. I provide a new command, rbinary, with examples showing how the command can be used. Copyright 2015 by StataCorp LP.
rbinary, correlated binary random data, Monte Carlo simulation, drawnorm, multivariate normal distribution
http://www.stata-journal.com/article.html?article=st0382
Minxing Chen
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:216-2252019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:216-225
article
A robust test for weak instruments in Stata
We introduce a routine, weakivtest, that implements the test for weak instruments by Montiel Olea and Pflueger (2013, Journal of Business and Economic Statistics 31: 358–369). weakivtest allows for errors that are not conditionally homoskedastic and serially uncorrelated. It extends the Stock and Yogo (2005, Testing for weak instruments in linear IV regression. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. D. W. K. Andrews and J. H. Stock, 80–108. [Cambridge University Press]) weak-instrument tests available in ivreg2 and in the ivregress postestimation command estat firststage. weakivtest tests the null hypothesis that instruments are weak or that the estimator’s Nagar (1959, Econometrica 27: 575–595) bias is large relative to a benchmark for both two-stage least-squares estimation and limited-information maximum likelihood with one endogenous regressor. The routine can accommodate Eicker–Huber–White heteroskedasticity robust estimates, Newey and West (1987, Econometrica 55: 703–708) heteroskedasticity and autocorrelation-consistent estimates, and clustered variance estimates. Copyright 2015 by StataCorp LP.
weakivtest, F statistic, heteroskedasticity, autocorrelation, clustered, weak instruments, testing
http://www.stata-journal.com/article.html?article=st0377
Carolin E. Pflueger
Su Wang
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:110-1202019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:110-120
article
Bayesian optimal interval design for phase I oncology clinical trials
The Bayesian optimal interval (BOIN) design is a novel phase I trial design for finding the maximum tolerated dose (MTD). With the BOIN design, phase I trials are conducted as a sequence of decision-making steps for assigning an appropriate dose for each enrolled patient. The design optimizes the assignment of doses to patients by minimizing incorrect decisions of dose escalation or de-escalation; that is, it decreases the chance of erroneously escalating or de-escalating the dose when the current dose is higher or lower than the MTD. This feature of the BOIN design strongly ensures adherence to ethical standards. The most prominent advantage of the BOIN design is that it simultaneously achieves design simplicity and superior performance in comparison with similar methods. The BOIN design can be implemented in a simple way that is similar to the 3 + 3 design, but it yields substantially better operating characteristics. Compared with the well-known continual reassessment method, the BOIN design yields average performance when selecting the MTD, but it has a substantially lower risk of assigning patients to subtherapeutic or overly toxic doses. In this article, we present a command (optinterval) for implementing the BOIN design in a phase I clinical trial setting. Copyright 2015 by StataCorp LP.
optinterval, Bayesian optimal interval, phase I clinical trial design, maximum tolerated dose, operating characteristic
http://www.stata-journal.com/article.html?article=st0372
Bryan M. Fellman
Ying Yuan
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:157-1582019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:157-158
article
Richard Sperling (1961-2011)
Christopher F Baum
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:173-1852019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:173-185
article
Estimating net survival using a life-table approach
Cancer registries are often interested in estimating net survival (NS), the probability of survival if the cancer under study is the only possible cause of death. Pohar Perme, Stare, and Esteve (2012, Biometrics 68: 113–120) proposed a new estimator of NS based on inverse probability weighting. They demonstrated that existing estimators of NS based on relative survival were biased, whereas the new estimator was unbiased. The new estimator was developed for continuous survival times, yet cancer registries often have only discrete survival times (for example, survival time in completed months or years). Therefore, we propose an approach to estimation for when survival times are discrete. In this article, we describe the stnet command for life-table estimation of NS, adapting the Pohar Perme estimation approach to life-table estimation. Estimates can be made using a period or hybrid approach in addition to the traditional cohort (or complete) approach, and age-standardized survival estimates are available. Copyright 2015 by StataCorp LP.
stnet, net survival, relative survival, competing risks, life table, cancer survival, age standardization, period analysis
http://www.stata-journal.com/article.html?article=st0375
Enzo Coviello
Paul W. Dickman
Karri Seppa
Arun Pokhrel
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:121-1342019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:121-134
article
Fixed-effect panel threshold model using Stata
Threshold models are widely used in macroeconomics and financial analysis for their simple and obvious economic implications. With these models, however, estimation and inference are complicated by the existence of nuisance parameters. To combat this issue, Hansen (1999, Journal of Econometrics 93: 345–368) proposed the fixed-effect panel threshold model. In this article, I introduce a new command (xthreg) for implementing this model. I also use Monte Carlo simulations to show that, although the size distortion of the threshold-effect test is small, the coverage rate of the confidence interval estimator is unsatisfactory. I include an example on financial constraints (originally from Hansen [1999, Journal of Econometrics 93: 345–368]) to further demonstrate the use of xthreg. Copyright 2015 by StataCorp LP.
xthreg, panel threshold, fixed effect
http://www.stata-journal.com/article.html?article=st0373
Qunyong Wang
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:3242019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:324
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:316-3182019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:316-318
article
Stata tip 122: Variable bar widths in two-way graphs
Ben Jann
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:135-1542019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:135-154
article
Frailty models and frailty-mixture models for recurrent event times
The analysis of recurrent event times faces three challenges: between-subject heterogeneity (frailty), within-subject event dependence, and the possibility of a cured fraction. Frailty can be handled by including a latent random-effects term in a Cox-type model. Event dependence may be considered as contributing to the intervention effect, or it may be considered as a source of nuisance, depending on the analysts’ specific research questions. If it is seen as a nuisance, the analysis can stratify the recurrent event times according to event order. If it is seen as contributing to the intervention effect, stratification should not be used. Models with and without stratification for event order estimate two types of treatment effects. They are analogous to per-protocol analysis and intention-to-treat analysis, respectively. In the context of chronic disease treatment, we want to estimate whether there is a cured fraction; for infectious disease prevention, this is called a nonsusceptible fraction. In infectious disease prevention, we want to understand whether an intervention protects each of its recipients to some extent ("leaky" model) or whether it totally protects some recipients but offers no protection to the rest ("all-or-none" model). The truth may be a mixture of the two modes of protection. We describe a class of regression models that can handle all three issues in the analysis of recurrent event times. The model parameters are estimated by the expectation-maximization algorithm, and their variances are estimated by Louis’s formula. We provide a new command, strmcure, for implementing these models. Copyright 2015 by StataCorp LP.
strmcure, frailty models, frailty-mixture models, recurrent event times, event dependence, cured fraction
http://www.stata-journal.com/article.html?article=st0374
Ying Xu
Yin Bun Cheung
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:3-202019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:3-20
article
twopm: Two-part models
In this article, we describe twopm, a command for fitting two-part models for mixed discrete-continuous outcomes. In the two-part model, a binary choice model is fit for the probability of observing a positive-versus-zero outcome. Then, conditional on a positive outcome, an appropriate regression model is fit for the positive outcome. The twopm command allows the user to leverage the capabilities of predict and margins to calculate predictions and marginal effects and their standard errors from the combined first- and second-part models. Copyright 2015 by StataCorp LP.
twopm, two-part models, cross-sectional data, predictions, marginal effects
http://www.stata-journal.com/article.html?article=st0368
Federico Belotti
Partha Deb
Willard G. Manning
Edward C. Norton
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:155-1722019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:155-172
article
newspell: Easy management of complex spell data
Biographical data gathered in surveys are often stored in spell format, allowing for overlaps between spell states. On the one hand, these kinds of data provide useful information for researchers. On the other hand, the data structure is often complex and not easy to handle. The newspell program offers a solution to the problem of spell-data management with three important features. First, it can rank spells and cut off overlaps according to the rank order. Second, newspell can combine overlapping parts of spells into new categories of spells, generating entirely new states. Third, it can detect gaps in the spell data that are not yet coded. It also includes subcommands for the management of complex spell data. Spell states can be merged and filled in with information from adjacent spells, and the data can be transformed to long or wide format. The command can be used to clean data, to combine two spell-data sources that have information on different kinds of states, or to deal with spell data that are complex by survey design. newspell is useful for users who are not familiar with complex spell data and have little experience in Stata programming or data management. For experienced users, newspell saves a lot of time and coding work. Copyright 2015 by StataCorp LP.
newspell, spell data, data management, complex data
http://www.stata-journal.com/article.html?article=dm0078
Hannes Kroger
oai:RePEc:tsj:stataj:v:15:y:2015:i:1:p:319-3232019-01-17RePEc:tsj:stataj
RePEc:tsj:stataj:v:15:y:2015:i:1:p:319-323
article
Stata tip 123: Spell boundaries
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:293-2942018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:293-294
article
Stata tip 61: Decimal commas in results output and data input
http://www.stata-journal.com/article.html?article=dm0036
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:255-2682018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:255-268
article
Mata Matters: Overflow, underflow and the IEEE floating-point format
Mata is Stata’s matrix language. The Mata Matters column shows how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. In this quarter’s column, we investigate underflow and overflow and then delve into the details of how floating-point numbers are stored in the IEEE 754 floating-point standard. We show how to test for overflow and underflow. We demonstrate how to use the %21x format to see underflow and the %16H, %16L, %8H, and %8L formats for displaying the byte content of doubles and floats.
Mata, underflow, overflow, denormalized number, normalized number, subnormal number, double precision, missing values, IEEE 754, format, binary, hexadecimal
http://www.stata-journal.com/article.html?article=pr0038
Jean Marie Linhart
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:269-2892018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:269-289
article
Speaking Stata: Between tables and graphs
Table-like graphs can be interesting, useful, and even mildly innovative. This column outlines some Stata techniques for producing such graphs. graph dot is likely to be the most under-appreciated command among all existing commands. Using by() with various choices is a good way to mimic a categorical axis in many graph commands. When graph bar or graph dot is not flexible enough to do what you want, moving to the more flexible twoway is usually advisable. labmask and seqvar are introduced as new commands useful for preparing axis labels and axis positions for categorical variables. Applications of these ideas to, e.g., confidence interval plots lie ahead.
http://www.stata-journal.com/article.html?article=gr0034
http://www.stata-journal.com/software/sj8-2/gr0034/
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:170-1892018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:170-189
article
The Stata command felsdvreg to fit a linear model with two high-dimensional fixed effects
This article proposes a memory-saving decomposition of the design matrix to facilitate the estimation of a linear model with two high-dimensional fixed effects. A common way to fit such a model is to take into account one of the effects by including dummy variables and to sweep out the other effect by the within transformation (fixed-effects transformation). If the number of panel units is high, creating and storing the dummy variables can involve prohibitively large computer-memory requirements. The memory-saving procedure to set up the moment matrices for estimation presented in this article can reduce the memory requirements considerably. The companion Stata ado-file felsdvreg implements the estimation method, takes care of identification issues, and provides useful summary statistics. Copyright 2008 by StataCorp LP.
felsdvreg, linked employer-employee data, fixed effects, three-way error-components model
http://www.stata-journal.com/article.html?article=st0143
http://www.stata-journal.com/software/sj8-2/st0143/
Thomas Cornelissen
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:323-3242018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:323-324
article
Stata tip 100: Mata and the case of the missing macros
William Gould
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:221-2312018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:221-231
article
Production function estimation in Stata using the Olley and Pakes method
Productivity is often computed by approximating the weighted sum of the inputs from the estimation of the Cobb-Douglas production function. Such estimates, however, may suffer from simultaneity and selection biases. Olley and Pakes (1996, Econometrica 64: 1263-1297) introduced a semiparametric method that allows us to estimate the production function parameters consistently and thus obtain reliable productivity measures by controlling for such biases. This study first reviews this method and then introduces a Stata command to implement it. We show that when simultaneity and selection biases are not controlled for, the coefficients for the variable inputs are biased upward and the coefficients for the fixed inputs are biased downward. Copyright 2008 by StataCorp LP.
opreg, levpet, production function, bias, simultaneity
http://www.stata-journal.com/article.html?article=st0145
http://www.stata-journal.com/software/sj8-2/st0145/
Mahmut Yasar
Rafal Raciborski
Brian Poi
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:321-3222018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:321-322
article
Stata tip 99: Taking extra care with encode
Clyde Schechter
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:190-2202018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:190-220
article
SNP and SML estimation of univariate and bivariate binary-choice models
We discuss the semi-nonparametric approach of Gallant and Nychka (1987, Econometrica 55: 363-390), the semiparametric maximum likelihood approach of Klein and Spady (1993, Econometrica 61: 387-421), and a set of new Stata commands for semiparametric estimation of three binary-choice models. The first is a univariate model, while the second and the third are bivariate models without and with sample selection, respectively. The proposed estimators are root-n consistent and asymptotically normal for the model parameters of interest under weak assumptions on the distribution of the underlying error terms. Our Monte Carlo simulations suggest that the efficiency losses of the semi-nonparametric and the semiparametric maximum likelihood estimators relative to a maximum likelihood correctly specified estimator of a parametric probit are rather small. On the other hand, a comparison of these estimators in non-Gaussian designs suggests that semi-nonparametric and semiparametric maximum likelihood estimators substantially dominate the parametric probit maximum likelihood estimator. Copyright 2008 by StataCorp LP.
snp, snp2, snp2s, sml, sml2s, binary-choice models, semi-nonparametric approach, SNP estimation, semiparametric maximum likelihood, SML estimation, Monte Carlo simulation
http://www.stata-journal.com/article.html?article=st0144
http://www.stata-journal.com/software/sj8-2/st0144/
Giuseppe De Luca
oai:RePEc:tsj:stataj:v:14:y:2014:i:4:p:817-8292018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:14:y:2014:i:4:p:817-829
article
txttool: Utilities for text analysis in Stata
This article describes txttool, a command that provides a set of tools for managing free-form text. The command integrates several built-in Stata functions with new text capabilities. These new capabilities include a utility to create a bag-of-words representation of text and an implementation of Porter’s (1980, Program: Electronic library and information systems 14: 130–137) word-stemming algorithm. Collectively, these utilities provide a text-processing suite for text mining and other text-based applications in Stata. Copyright 2014 by StataCorp LP.
txttool, text mining, Porter stemmer, bag of words, cleaning, stop words, subwords
http://www.stata-journal.com/article.html?article=dm0077
Unislawa Williams
Sean P. Williams
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:318-3202018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:318-320
article
Stata tip 98: Counting substrings within strings
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:232-2412018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:232-241
article
Error-correction–based cointegration tests for panel data
This article describes a new Stata command called xtwest, which implements the four error-correction-based panel cointegration tests developed by Westerlund (2007). The tests are general enough to allow for a large degree of heterogeneity, both in the long-run cointegrating relationship and in the short-run dynamics, and dependence within as well as across the cross-sectional units. Copyright 2008 by StataCorp LP.
xtwest, panel cointegration test, common-factor restriction, cross-sectional dependence, health-care expenditures
http://www.stata-journal.com/article.html?article=st0146
http://www.stata-journal.com/software/sj8-2/st0146/
Damiaan Persyn
Joakim Westerlund
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:299-3032018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:299-303
article
Stata tip 63: Modeling proportions
http://www.stata-journal.com/article.html?article=st0147
Christopher F Baum
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:242-2542018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:242-254
article
Contour-enhanced funnel plots for meta-analysis
Funnel plots are commonly used to investigate publication and related biases in meta-analysis. Although asymmetry in the appearance of a funnel plot is often interpreted as being caused by publication bias, in reality the asymmetry could be due to other factors that cause systematic differences in the results of large and small studies, for example, confounding factors such as differential study quality. Funnel plots can be enhanced by adding contours of statistical significance to aid in interpreting the funnel plot. If studies appear to be missing in areas of low statistical significance, then it is possible that the asymmetry is due to publication bias. If studies appear to be missing in areas of high statistical significance, then publication bias is a less likely cause of the funnel asymmetry. It is proposed that this enhancement to funnel plots should be used routinely for meta-analyses where it is possible that results could be suppressed on the basis of their statistical significance. Copyright 2008 by StataCorp LP.
confunnel, funnel plots, meta-analysis, publication bias, small-study effects
http://www.stata-journal.com/article.html?article=gr0033
http://www.stata-journal.com/software/sj8-2/gr0033/
Tom M. Palmer
Jaime L. Peters
Alex J. Sutton
Santiago G. Moreno
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:147-1692018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:147-169
article
Multinomial goodness-of-fit: Large-sample tests with survey design correction and exact tests for small samples
I introduce the new mgof command to compute distributional tests for discrete (categorical, multinomial) variables. The command supports large-sample tests for complex survey designs and exact tests for small samples as well as classic large-sample Chi^2-approximation tests based on Pearson’s Chi^2, the likelihood ratio, or any other statistic from the power-divergence family (Cressie and Read, 1984, Journal of the Royal Statistical Society, Series B (Methodological) 46: 440–464). The complex survey correction is based on the approach by Rao and Scott (1981, Journal of the American Statistical Association 76: 221–230) and parallels the survey design correction used for independence tests in svy: tabulate. mgof computes the exact tests by using Monte Carlo methods or exhaustive enumeration. mgof also provides an exact one-sample Kolmogorov-Smirnov test for discrete data. Copyright 2008 by StataCorp LP.
mgof, mgofi, multinomial, goodness-of-fit, chi-squared, categorical data, exact tests, Monte Carlo, exhaustive enumeration, combinatorial algorithms, complex survey correction, power-divergence statistic, Kolmogorov-Smirnov, Benford's law
http://www.stata-journal.com/article.html?article=st0142
http://www.stata-journal.com/software/sj8-2/st0142/
Ben Jann
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:290-2922018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:290-292
article
Stata tip 60: Making fast and easy changes to files with filefilter
http://www.stata-journal.com/article.html?article=pr0039
Alan R. Riley
oai:RePEc:tsj:stataj:v:8:y:2008:i:2:p:295-2982018-12-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:2:p:295-298
article
Stata tip 62: Plotting on reversed scales
http://www.stata-journal.com/article.html?article=gr0035
Nicholas J. Cox
Natasha L. M. Barlow
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:584-5862021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:584-586
article
Stata tip 53: Where did my p-values go?
http://www.stata-journal.com/article.html?article=st0137
Maarten L. Buis
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:275-2792021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:275-279
article
Stata tip 47: Quantile–quantile plots without programming
http://www.stata-journal.com/article.html?article=gr0027
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:334-3502021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:334-350
article
Simulation-based sensitivity analysis for matching estimators
This article presents a Stata program (sensatt) that implements the sensitivity analysis for matching estimators proposed by Ichino, Mealli, and Nannicini (Journal of Applied Econometrics, forthcoming). The analysis simulates a potential confounder to assess the robustness of the estimated treatment effects with respect to deviations from the conditional independence assumption. The program uses the commands for propensity-score matching (att*) developed by Becker and Ichino (Stata Journal 2: 358–377). I give an example by using the National Supported Work demonstration, widely known in the program evaluation literature. Copyright 2007 by StataCorp LP.
sensatt, sensitivity analysis, matching, propensity score, program evaluation
http://www.stata-journal.com/article.html?article=st0130
http://www.stata-journal.com/software/sj7-3/st0130/
Tommaso Nannicini
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:590-5922021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:590-592
article
Stata tip 55: Better axis labeling for time points and time intervals
http://www.stata-journal.com/article.html?article=gr0030
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:571-5812021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:571-581
article
Speaking Stata: Counting groups, especially panels
Counting panels, and more generally groups, is sometimes possible in Stata through a reduction command (e.g., collapse, contract, statsby) that produces a smaller dataset or through a tabulation command. Yet there are also many problems, especially with irregular sets of observations for varying times, that do not yield easily to this approach. This column focuses on techniques for answering such questions while maintaining the same data structure. Especially useful are the Stata commands by: and egen and indicator variables constructed for the purpose. With by: we often exploit the fact that subscripts are defined within group, not within dataset. egen functions are often used to produce group-level statistics. Tagging each group just once ensures that summaries, including counts, are of groups, not individual observations.
data management, panels
http://www.stata-journal.com/article.html?article=dm0033
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:137-1392021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:137-139
article
Stata tip 40: Taking care of business
http://www.stata-journal.com/article.html?article=dm0028
Christopher F Baum
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:5942021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:594
article
Software updates
Updates for previously published packages are provided. Copyright 2008 by StataCorp LP.
http://www.stata-journal.com/software/sj8-4/dm89_1/
http://www.stata-journal.com/software/sj8-4/gr0024_1/
http://www.stata-journal.com/software/sj8-4/st0015_5/
http://www.stata-journal.com/software/sj8-4/st0100_1/
http://www.stata-journal.com/software/sj8-4/st0150_1/
http://www.stata-journal.com/software/sj8-4/sxd1_4/
Editors
oai:RePEc:tsj:stataj:v:9:y:2009:i:4:p:643-6472021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:4:p:643-647
article
Stata tip 81: A table of graphs
http://www.stata-journal.com/article.html?article=gr0042
Maarten L. Buis
Martin Weiss
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:497-4982021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:497-498
article
Stata tip 77: (Re)using macros in multiple do-files
http://www.stata-journal.com/article.html?article=pr0047
Jeph Herrin
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:5932021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:593
article
Software updates
Updates for previously published packages are provided. Copyright 2007 by StataCorp LP.
http://www.stata-journal.com/software/sj7-4/gr0012_1/
http://www.stata-journal.com/software/sj7-4/st0133_1/
Editors
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:592-5932021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:592-593
article
Stata tip 72: Using the Graph Recorder to create a pseudograph scheme
http://www.stata-journal.com/article.html?article=gr0036
Kevin Crow
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:450-4512021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:450-451
article
Stata tip 67: J() now has greater replicating powers
http://www.stata-journal.com/article.html?article=pr0043
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:446-4472021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:446-447
article
Stata tip 65: Beware the backstabbing backslash
http://www.stata-journal.com/article.html?article=pr0042
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:586-5872021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:586-587
article
Stata tip 70: Beware the evaluating equal sign
http://www.stata-journal.com/article.html?article=pr0045
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:388-4012021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:388-401
article
Fitting mixed logit models by using maximum simulated likelihood
This article describes the mixlogit Stata command for fitting mixed logit models by using maximum simulated likelihood. Copyright 2007 by StataCorp LP.
mixlogit, mixlpred, mixlcov, mixed logit, maximum simulated likelihood
http://www.stata-journal.com/article.html?article=st0133
http://www.stata-journal.com/software/sj7-3/st0133/
Arne Risa Hole
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:313-3332021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:313-333
article
Estimating parameters of dichotomous and ordinal item response models with gllamm
Item response theory models are measurement models for categorical responses. Traditionally, the models are used in educational testing, where responses to test items can be viewed as indirect measures of latent ability. The test items are scored either dichotomously (correct–incorrect) or by using an ordinal scale (a grade from poor to excellent). Item response models also apply equally for measurement of other latent traits. Here we describe the one- and two-parameter logit models for dichotomous items, the partial-credit and rating scale models for ordinal items, and an extension of these models where the latent variable is regressed on explanatory variables. We show how these models can be expressed as generalized linear latent and mixed models and fitted by using the user-written command gllamm. Copyright 2007 by StataCorp LP.
gllamm, gllapred, latent variables, Rasch model, partial-credit model, rating scale model, latent regression, generalized linear latent and mixed model, adaptive quadrature, item response theory
http://www.stata-journal.com/article.html?article=st0129
http://www.stata-journal.com/software/sj7-3/st0129/
Xiaohui Zheng
Sophia Rabe-Hesketh
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:440-4432021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:440-443
article
Review of A Visual Guide to Stata Graphics, Second Edition by Michael N. Mitchell
This article reviews A Visual Guide to Stata Graphics, Second Edition by Michael N. Mitchell.
graphics, Stata texts
http://www.stata-journal.com/article.html?article=gn0040
Scott Merryman
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:221-2262021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:221-226
article
predict and adjust with logistic regression
Within Stata there are two ways of getting average predicted values for different groups after an estimation command: adjust and predict. After OLS regression (regress), these two ways give the same answer. However, after logistic regression, the average predicted probabilities differ. This article discusses where that difference comes from and the consequent subtle difference in interpretation. Copyright 2007 by StataCorp LP.
adjust, predict, logistic regression
http://www.stata-journal.com/article.html?article=st0127
http://www.stata-journal.com/software/sj7-2/st0127/
Maarten L. Buis
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:249-2652021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:249-265
article
Speaking Stata: Identifying spells
Spells in time series (and more generally in any kind of one-dimensional series) may be defined as sequences of observations that are homogeneous in some sense. For example, a categorical variable may remain in the same state, or values of a measured variable may satisfy the same true-false condition. Devices for working with spells in Stata include marking the start of each spell with indicator variables and tagging spells with integer codes. Panel data are easy to handle with the by: prefix. Some kinds of spell identification require two passes through the data, as when only spells of some minimum length are of interest or short gaps are tolerable within spells. Many questions concerning spells are easy to answer given careful use of by: and appropriate sort order, selection of just 1 observation from each panel or spell, and appreciation of the many functions written for egen. Gaps before, between, and after spells can also be important, and I suggest a convention for handling them.
spells, runs, time series, data management
http://www.stata-journal.com/article.html?article=dm0029
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:587-5892021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:587-589
article
Stata tip 54: Post your results
http://www.stata-journal.com/article.html?article=pr0036
Philippe Van Kerm
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:585-5852021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:585-585
article
Stata tip 69: Producing log files based on successful interactive commands
http://www.stata-journal.com/article.html?article=pr0044
Alan R. Riley
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:4442021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:444
article
Software updates
An update for a previously published package is provided. Copyright 2007 by StataCorp LP.
http://www.stata-journal.com/software/sj7-3/st0015_4/
Editors
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:402-4122021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:402-412
article
An exact and a Monte Carlo proposal to the Fisher–Pitman permutation tests for paired replicates and for independent samples
This article concerns the nonparametric Fisher-Pitman tests for paired replicates and independent samples. After outlining the theory of exact tests, I derive Monte Carlo simulations for both of them. Simulations can be useful if one deals with many observations because of the complexity of the algorithms in regard to sample sizes. The tests are designed to be a more powerful alternative to the Wilcoxon signed-rank test and the Wilcoxon-Mann-Whitney rank-sum test if the observations are given on at least an interval scale. The results gained by Monte Carlo versions of the tests are accurate enough in comparison to the exact versions. Finally, I give examples for using both supplemented tests. Copyright 2007 by StataCorp LP.
permtest1, permtest2, nonparametric tests, Monte Carlo, permutation tests
http://www.stata-journal.com/article.html?article=st0134
http://www.stata-journal.com/software/sj7-3/st0134/
Johannes Kaiser
oai:RePEc:tsj:stataj:v:9:y:2009:i:4:p:648-6512021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:4:p:648-651
article
Stata tip 82: Grounds for grids on graphs
http://www.stata-journal.com/article.html?article=gr0043
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:141-1422021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:141-142
article
Stata tip 42: The overlay problem: Offset for clarity
http://www.stata-journal.com/article.html?article=gr0026
James Cui
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:1402021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:140
article
Stata tip 41: Monitoring loop iterations
http://www.stata-journal.com/article.html?article=pr0030
David A. Harrison
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:438-4392021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:438-439
article
Stata tip 50: Efficient use of summarize
http://www.stata-journal.com/article.html?article=st0135
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:268-2712021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:268-271
article
Stata tip 45: Getting those data into shape
http://www.stata-journal.com/article.html?article=dm0031
Christopher F Baum
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:71-832021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:71-83
article
Sensitivity analysis for average treatment effects
Based on the conditional independence or unconfoundedness assumption, matching has become a popular approach to estimate average treatment effects. Checking the sensitivity of the estimated results with respect to deviations from this identifying assumption has become an increasingly important topic in the applied evaluation literature. If there are unobserved variables that affect assignment into treatment and the outcome variable simultaneously, a hidden bias might arise to which matching estimators are not robust. We address this problem with the bounding approach proposed by Rosenbaum (Observational Studies, 2nd ed., New York: Springer), where mhbounds lets the researcher determine how strongly an unmeasured variable must influence the selection process to undermine the implications of the matching analysis. Copyright 2007 by StataCorp LP.
mhbounds, matching, treatment effects, sensitivity analysis, unobserved heterogeneity, Rosenbaum bounds
http://www.stata-journal.com/article.html?article=st0121
http://www.stata-journal.com/software/sj7-1/st0121/
Sascha O. Becker
Marco Caliendo
oai:RePEc:tsj:stataj:v:9:y:2009:i:4:p:640-6422021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:4:p:640-642
article
Stata tip 80: Constructing a group variable with specified group sizes
http://www.stata-journal.com/article.html?article=st0181
Martin Weiss
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:183-1962021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:183-196
article
Two postestimation commands for assessing confounding effects in epidemiological studies
Confounding is a major issue in observational epidemiological studies. This paper describes two postestimation commands for assessing confounding effects. One command (confall) displays and plots all possible effect estimates against one of p-value, Akaike information criterion, or Bayesian information criterion. This computing-intensive procedure allows researchers to inspect the variability of the effect estimates from various possible models. Another command (chest) uses a stepwise approach to identify variables that have substantially changed the effect estimate. Both commands can be used after most common estimation commands in epidemiological studies, such as logistic regression, conditional logistic regression, Poisson regression, linear regression, and Cox proportional hazards models. Copyright 2007 by StataCorp LP.
confall, confgr, chest, epidemiological methods, confounding, all possible effects, change in estimate
http://www.stata-journal.com/article.html?article=st0124
http://www.stata-journal.com/software/sj7-2/st0124/
Zhiqiang Wang
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:266-2672021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:266-267
article
Stata tip 44: Get a handle on your sample
http://www.stata-journal.com/article.html?article=dm0030
Ben Jann
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:556-5702021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:556-570
article
Mata Matters: Structures
Mata is Stata’s matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. Structures are the subject of this column. Structures are an advanced programming technique that can greatly simplify complicated code.
Mata, structures, struct
http://www.stata-journal.com/article.html?article=pr0035
William Gould
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:22-442021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:22-44
article
Rasch analysis: Estimation and tests with raschtest
Analyzing latent variables is becoming more and more important in several fields, such as clinical research, psychology, educational sciences, ecology, and epidemiology. The item response theory allows analyzing latent variables measured by questionnaires of items with binary or ordinal responses. The Rasch model is the best known model of this theory for binary responses. Although one can estimate the parameters of the Rasch model with the clogit or xtlogit com- mand (or with the unofficial gllamm command), these commands require special data preparation. The proposed raschtest command easily allows estimating the parameters of the Rasch model and fitting the resulting model. Copyright 2007 by StataCorp LP.
raschtest, Rasch model, generalized estimating equations, conditional maximum likelihood method, marginal maximum likelihood method, Andersen Z test, van den Wollenberg Q1 test, R1c, R1m, fit tests, item response theory, U test, splitting test, item characteristic curves
http://www.stata-journal.com/article.html?article=st0119
http://www.stata-journal.com/software/sj7-1/st0119/
Jean-Benoit Hardouin
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:167-1822021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:167-182
article
Maximum likelihood and two-step estimation of an ordered-probit selection model
We discuss the estimation of a regression model with an ordered- probit selection rule. We have written a Stata command, oheckman, that computes two-step and full-information maximum-likelihood estimates of this model. Using Monte Carlo simulations, we compare the performances of these estimators under various conditions. Copyright 2007 by StataCorp LP.
oheckman, selection bias, ordered probit, maximum likelihood
http://www.stata-journal.com/article.html?article=st0123
http://www.stata-journal.com/software/sj7-2/st0123/
Richard Chiburis
Michael Lokshin
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:574-5782021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:574-578
article
Stata par la pratique : statistiques, graphiques et elements de programmation by Eric Cahuzac and Christophe Bontemps
This article reviews Stata par la pratique by Eric Cahuzac and Christophe Bontemps.
learning Stata, Cahuzac, Bontemps, splp
http://www.stata-journal.com/article.html?article=gn0042
Antoine Terracol
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:376-3872021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:376-387
article
Profile likelihood for estimation and confidence intervals
Normal-based confidence intervals for a parameter of interest are inaccurate when the sampling distribution of the estimate is nonnormal. The technique known as profile likelihood can produce confidence intervals with better coverage. It may be used whether the model includes only the variable of interest or also several other variables. Profile-likelihood confidence intervals are particularly useful in nonlinear models. The command pllf computes and plots the maximum likelihood estimate and the profile likelihood-based confidence interval for one parameter in a wide variety of regression models. Copyright 2007 by StataCorp LP.
pllf, profile likelihood, confidence interval, nonnormality, nonlinear model
http://www.stata-journal.com/article.html?article=st0132
http://www.stata-journal.com/software/sj7-3/st0132/
Patrick Royston
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:84-972021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:84-97
article
Stata and the WeeW information system
The need for timely collection and analysis of epidemiological data is becoming of primary importance, e.g., for bioterrorism detection or epidemiological surveillance. Web-based information systems (WISs) may provide the needed technological support. Thus we present the WeeW (workflow-enabled epidemiological WIS) system, i.e., a WIS that, through workflow management, helps epidemiologists effectively select remote centers and collect and process the received data to produce conclusive technical reports. In detail, we show the functionalities of the WeeW system, its architecture, and particularly Stata's role in executing statistical analyses and producing graphs. Furthermore, we discuss the performance of the WeeW–Stata interface. Finally, we outline brief conclusions regarding the advantages and drawbacks of the proposed solution. Copyright 2007 by StataCorp LP.
WeeW, web-based information systems, epidemiology, web development, benchmarks
http://www.stata-journal.com/article.html?article=pr0027
http://www.stata-journal.com/software/sj7-1/pr0027/
Pierpaolo Vittorini
Stefano Necozione
Ferdinando di Orio
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:147-1662021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:147-166
article
Improved generalized estimating equation analysis via xtqls for quasi-least squares in Stata
Quasi-least squares (QLS) is an alternative method for estimating the correlation parameters within the framework of the generalized estimating equation (GEE) approach for analyzing correlated cross-sectional and longitudinal data. This article summarizes the development of QLS that occurred in several reports and describes its use with the user-written program xtqls in Stata. It also demonstrates the following advantages of QLS: (1) QLS allows some correlation structures that have not yet been implemented in the framework of GEE, (2) QLS can be applied as an alternative to GEE when the GEE estimate is infeasible, and (3) QLS uses the same estimating equation for estimation of beta as GEE; as a result, QLS can use programs already available for GEE. In particular, xtqls calls the Stata program xtgee within an iterative approach that alternates between updating estimates of the correlation parameter alpha and then using xtgee to solve the GEE for beta at the current estimate of alpha. The benefit of this approach is that after xtqls, all the usual postregression estimation commands are readily available to the user. Copyright 2007 by StataCorp LP.
xtqls, correlated data, clustered data, longitudinal data, generalized estimating equations, quasi-least squares
http://www.stata-journal.com/article.html?article=st0122
http://www.stata-journal.com/software/sj7-2/st0122/
Justine Shults
Sarah J. Ratcliffe
Mary Leonard
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:143-1452021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:143-145
article
Stata tip 43: Remainders, selections, sequences, extractions: Uses of the modulus
http://www.stata-journal.com/article.html?article=pr0031
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:499-5032021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:499-503
article
Stata tip 78: Going gray gracefully: Highlighting subsets and downplaying substrates
http://www.stata-journal.com/article.html?article=gr0040
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:272-2742021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:272-274
article
Stata tip 46: Step we gaily, on we go
http://www.stata-journal.com/article.html?article=dm0032
Richard Williams
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:2802021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:280
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2007 by StataCorp LP.
http://www.stata-journal.com/software/sj7-2/gr0001_3/
http://www.stata-journal.com/software/sj7-2/st0097_1/
Editors
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:436-4372021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:436-437
article
Stata tip 49: Range frame plots
http://www.stata-journal.com/article.html?article=gr0029
Scott Merryman
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:98-1052021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:98-105
article
File filtering in Stata: Handling complex data formats and navigating log files efficiently
A text file filter is a program that converts one text file into another on the basis of a set of rules. For statistical applications, a text file filter can convert data embedded in a complicated text file so that Stata can read and analyze it. A text file filter can also automate the production of more user-friendly output from long Stata log files. The file command lets you use text file filters in Stata. This article reviews some key programming points for successful implementation of such filters. Copyright 2007 by StataCorp LP.
nullfilter, matchfilter, hyperlog, file, log, navigation, text, filtering, data import, data export, HTML, hyperlink index
http://www.stata-journal.com/article.html?article=dm0027
http://www.stata-journal.com/software/sj7-1/dm0027/
John Eng
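The filtering pattern described in the abstract above is language-agnostic. As a purely illustrative sketch (written in Python, not the nullfilter or matchfilter programs from the article; the data_rows_only rule below is a hypothetical example), a text file filter reduces to reading one file line by line, applying a rule, and writing the transformed result:

```python
# Generic text file filter: converts one text file into another
# on the basis of a rule applied line by line.
def filter_file(src, dst, rule):
    """Write rule(line) to dst for each line of src; skip lines where rule returns None."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            out = rule(line)
            if out is not None:
                fout.write(out)

# Hypothetical example rule: keep only lines that start with a digit
# (data rows), replacing commas with spaces so a simple reader can parse them.
def data_rows_only(line):
    return line.replace(",", " ") if line[:1].isdigit() else None
```

Any more elaborate filter, such as one that extracts data embedded in a complicated report or condenses a long log, differs only in the rule it applies.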
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:448-4492021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:448-449
article
Stata tip 66: ds - A hidden gem
http://www.stata-journal.com/article.html?article=dm0040
Martin Weiss
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:440-4432021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:440-443
article
Stata tip 51: Events in intervals
http://www.stata-journal.com/article.html?article=pr0033
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:579-5822021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:579-582
article
Review of Multilevel and Longitudinal Modeling Using Stata, Second Edition, by Sophia Rabe-Hesketh and Anders Skrondal
Review of Multilevel and Longitudinal Modeling Using Stata, Second Edition, by Sophia Rabe-Hesketh and Anders Skrondal
multilevel, longitudinal, Rabe-Hesketh, Skrondal, mlmus2
http://www.stata-journal.com/article.html?article=gn0043
Nicholas J. Horton
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:45-702021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:45-70
article
Multivariable modeling with cubic regression splines: A principled approach
Spline functions provide a useful and flexible basis for modeling relationships with continuous predictors. However, to limit instability and provide sensible regression models in the multivariable setting, a principled approach to model selection and function estimation is important. Here the multivariable fractional polynomials approach to model building is transferred to regression splines. The essential features are specifying a maximum acceptable complexity for each continuous function and applying a closed-test approach to each continuous predictor to simplify the model where possible. Important adjuncts are an initial choice of scale for continuous predictors (linear or logarithmic), which often helps one to generate realistic, parsimonious final models; a goodness-of-fit test for a parametric function of a predictor; and a preliminary predictor transformation to improve robustness. Copyright 2007 by StataCorp LP.
mvrs, uvrs, splinegen, multivariable analysis, continuous predictor, regression spline, model building, goodness of fit, choice of scale
http://www.stata-journal.com/article.html?article=st0120
http://www.stata-journal.com/software/sj7-1/st0120/
Patrick Royston
Willi Sauerbrei
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:1-212021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:1-21
article
A survey on survey statistics: What is done and can be done in Stata
This article surveys issues in analyzing complex survey data and describes some of the capabilities of Stata for such analyses. We briefly review key elements of survey design and explain the effects of different design features on bias and variance. We compare different methods of variance estimation for stratified and clustered samples and discuss the handling of survey weights. We also give examples of the practical importance of Stata's survey capabilities. Copyright 2007 by StataCorp LP.
cluster sampling, complex design, nonresponse, stratified sampling, variance estimation, weights, DEFECT, NHANES, NHIS, PISA
http://www.stata-journal.com/article.html?article=st0118
http://www.stata-journal.com/software/sj7-1/st0118/
Frauke Kreuter
Richard Valliant
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:413-4332021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:413-433
article
Speaking Stata: Turning over a new leaf
Stem-and-leaf displays have been widely taught since John W. Tukey publicized them energetically in the 1970s. They remain useful for many distributions of small or modest size, especially for showing fine structure such as digit preference. Stata’s implementation stem produces typed text displays and has some inevitable limitations, especially for comparison of two or more displays. One can re-create stem-and-leaf displays with a few basic Stata commands as scatterplots of stem variable versus position on line with leaves shown as marker labels. Comparison of displays then becomes easy and natural using scatter, by(). Back-to-back presentation of paired displays is also possible. I discuss variants on standard stem-and-leaf displays in which each distinct value is a stem, each distinct value is its own leaf, or axes are swapped. The problem shows how one can, with a few lines of Stata, often produce standard graph forms from first principles, allowing in turn new variants. I also present a new program, stemplot, as a convenience tool.
stemplot, stem-and-leaf, graphics, distributions, digit preference
http://www.stata-journal.com/article.html?article=gr0028
Nicholas J. Cox
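The abstract's point that stem-and-leaf displays can be rebuilt from first principles carries over to any language. As a rough, purely illustrative sketch (Python, unrelated to the stemplot program itself), split each value into a stem and a leaf digit and group the sorted leaves by stem:

```python
# Minimal stem-and-leaf display: each value is split into a stem
# (leading digits) and a leaf (last digit in leaf units), and leaves
# are printed grouped and sorted by stem.
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1):
    """Return the lines of a simple stem-and-leaf display."""
    groups = defaultdict(list)
    for v in sorted(values):
        q = int(v // leaf_unit)          # value expressed in leaf units
        groups[q // 10].append(q % 10)   # stem | leaf
    lines = []
    for stem in sorted(groups):
        leaves = "".join(str(leaf) for leaf in groups[stem])
        lines.append(f"{stem:>3} | {leaves}")
    return lines

if __name__ == "__main__":
    for line in stem_and_leaf([12, 15, 21, 23, 23, 34, 35, 36, 41]):
        print(line)
    # prints:
    #   1 | 25
    #   2 | 133
    #   3 | 456
    #   4 | 1
```

The leaf_unit argument plays the role of a scale choice: with leaf_unit=10, the value 230 contributes leaf 3 to stem 2.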
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:117-1302021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:117-130
article
Speaking Stata: Making it count
The count command has one simple role, to count observations in general or that satisfy some condition(s). This task can be useful when some larger problem pivots on counting, especially if count is used with a loop over observations or variables. I use various problems, mostly of data management, as examples. I also make comparisons with the use of N, summarize, and egen for the same or similar problems.
count, counting, data management, dropping variables, finding matches
http://www.stata-journal.com/article.html?article=pr0029
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:588-5912021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:588-591
article
Stata tip 71: The problem of split identity, or how to group dyads
http://www.stata-journal.com/article.html?article=dm0043
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:466-4772021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:466-477
article
Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables
Multiple imputation of missing data continues to be a topic of considerable interest and importance to applied researchers. In this article, the ice package for multiple imputation by chained equations (also known as fully conditional specification) is further updated. Special attention is paid to categorical variables. The relationship between ice and the new multiple-imputation system in Stata 11 is clarified.
multiple imputation, chained equations, categorical variables, negative binomial distribution, ice, uvis, mi
http://www.stata-journal.com/article.html?article=st0067_4
http://www.stata-journal.com/software/sj9-3/st0067_4/
Patrick Royston
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:245-2482021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:245-248
article
Review of A Handbook of Statistical Analyses Using Stata, Fourth Edition, by Rabe-Hesketh and Everitt
This article reviews A Handbook of Statistical Analyses Using Stata, Fourth Edition, by Sophia Rabe-Hesketh and Brian S. Everitt.
applied statistics, Stata texts
http://www.stata-journal.com/article.html?article=gn0037
William D. Dupont
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:351-3752021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:351-375
article
Modeling of the cure fraction in survival studies
Cure models are a special type of survival analysis model in which it is assumed that a proportion of subjects will never experience the event, so the survival curve will eventually reach a plateau. In population-based cancer studies, cure is said to occur when the mortality (hazard) rate in the diseased group of individuals returns to the same level as that expected in the general population. The cure fraction is of interest to patients and a useful measure to monitor trends and differences in survival of curable disease. I describe the strsmix and strsnmix commands, which fit the two main types of cure fraction model, namely, the mixture and nonmixture cure fraction models. These models allow incorporation of the expected background mortality rate and thus enable the modeling of relative survival when cure is a possibility. I give an example to illustrate the commands. Copyright 2007 by StataCorp LP.
strsmix, strsnmix, predict, relative survival, cure models, split population models, postestimation
http://www.stata-journal.com/article.html?article=st0131
http://www.stata-journal.com/software/sj7-3/st0131/
Paul C. Lambert
oai:RePEc:tsj:stataj:v:9:y:2009:i:4:p:6522021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:4:p:652
article
Software updates
Updates for previously published packages are provided. Copyright 2009 by StataCorp LP.
http://www.stata-journal.com/software/sj9-4/sg113_2/
http://www.stata-journal.com/software/sj9-4/st0150_2/
Editors
oai:RePEc:tsj:stataj:v:8:y:2008:i:3:p:444-4452021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:3:p:444-445
article
Stata tip 64: Cleaning up user-entered string variables
http://www.stata-journal.com/article.html?article=dm0039
Jeph Herrin
Eva Poen
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:197-2082021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:197-208
article
Estimation of nonstationary heterogeneous panels
We introduce a new Stata command, xtpmg, for estimating nonstationary heterogeneous panels in which the number of groups and number of time-series observations are both large. Based on recent advances in the nonstationary panel literature, xtpmg provides three alternative estimators: a traditional fixed-effects estimator, the mean-group estimator of Pesaran and Smith (Estimating long-run relationships from dynamic heterogeneous panels, Journal of Econometrics 68: 79-113), and the pooled mean-group estimator of Pesaran, Shin, and Smith (Estimating long-run relationships in dynamic heterogeneous panels, DAE Working Papers Amalgamated Series 9721; Pooled mean group estimation of dynamic heterogeneous panels, Journal of the American Statistical Association 94: 621-634). Copyright 2007 by StataCorp LP.
xtpmg, nonstationary panels, heterogeneous dynamic panels, pooled mean-group estimator, mean-group estimator, panel cointegration
http://www.stata-journal.com/article.html?article=st0125
http://www.stata-journal.com/software/sj7-2/st0125/
Edward F. Blackburne III
Mark W. Frank
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:209-2202021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:209-220
article
QIC program and model selection in GEE analyses
The generalized estimating equation (GEE) approach is a widely used statistical method in the analysis of longitudinal data in clinical and epidemiological studies. It is an extension of the generalized linear model (GLM) method to correlated data such that valid standard errors of the parameter estimates can be drawn. Unlike the GLM method, which is based on the maximum likelihood theory for independent observations, the GEE method is based on the quasilikelihood theory and no assumption is made about the distribution of response observations. Therefore, Akaike’s information criterion, a widely used method for model selection in GLM, is not applicable to GEE directly. However, Pan (Biometrics 2001; 57:120-125) proposed a model-selection method for GEE and termed it quasilikelihood under the independence model criterion. This criterion can also be used to select the best-working correlation structure. From Pan’s methods, I developed a general Stata program, qic, that accommodates all the distribution and link functions and correlation structures available in Stata version 9. In this paper, I introduce this program and demonstrate how to use it to select the best-working correlation structure and the best subset of covariates through two examples in longitudinal studies. Copyright 2007 by StataCorp LP.
qic, Akaike’s information criterion, GEE, likelihood, model, quasilikelihood under the independence model criterion
http://www.stata-journal.com/article.html?article=st0126
http://www.stata-journal.com/software/sj7-2/st0126/
James Cui
oai:RePEc:tsj:stataj:v:9:y:2009:i:3:p:5042021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:9:y:2009:i:3:p:504
article
Stata tip 79: Optional arguments to options
http://www.stata-journal.com/article.html?article=pr0048
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:8:y:2008:i:4:p:569-5732021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:8:y:2008:i:4:p:569-573
article
Review of Stata par la pratique : statistiques, graphiques et elements de programmation, by Eric Cahuzac and Christophe Bontemps
This article reviews Stata par la pratique : statistiques, graphiques et elements de programmation, by Eric Cahuzac and Christophe Bontemps.
French, learning Stata, Cahuzac, Bontemps, splp
http://www.stata-journal.com/article.html?article=gn0041
Antoine Terracol
oai:RePEc:tsj:stataj:v:7:y:2007:i:1:p:106-1162021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:1:p:106-116
article
Mata Matters: Subscripting
Mata is Stata's matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. Subscripting is the subject of this column. Stata has three subscripting modes, and two of them are about more than accessing an element of a vector or matrix. The advanced forms of subscripting can, by themselves, be the solution to some problems.
Mata, subscripts, list subscripts, range subscripts, sampling with replacement, permutation matrices and vectors
http://www.stata-journal.com/article.html?article=pr0028
William Gould
oai:RePEc:tsj:stataj:v:7:y:2007:i:3:p:434-4352021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:3:p:434-435
article
Stata tip 48: Discrete uses for uniform()
http://www.stata-journal.com/article.html?article=pr0032
Maarten L. Buis
oai:RePEc:tsj:stataj:v:7:y:2007:i:2:p:227-2442021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:2:p:227-244
article
Making regression tables simplified
estout, introduced by Jann (Stata Journal 5: 288–308), is a useful tool for producing regression tables from stored estimates. However, its syntax is relatively complex, and commands may turn out long even for simple tables. Furthermore, having to store the estimates beforehand can be cumbersome. To facilitate the production of regression tables, I therefore present here two new commands called eststo and esttab. eststo is a wrapper for official Stata’s estimates store and simplifies the storing of estimation results for tabulation. esttab, on the other hand, is a wrapper for estout and simplifies compiling nice-looking tables from the stored estimates without much typing. I also provide updates to estout and estadd. Copyright 2007 by StataCorp LP.
csv, estadd, estimates, estout, eststo, esttab, excel, html, latex, regression table, rtf, word
http://www.stata-journal.com/article.html?article=st0085_1
http://www.stata-journal.com/software/sj7-2/st0085_1/
Ben Jann
oai:RePEc:tsj:stataj:v:7:y:2007:i:4:p:582-5832021-06-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:7:y:2007:i:4:p:582-583
article
Stata tip 52: Generating composite categorical variables
http://www.stata-journal.com/article.html?article=dm0034
Nicholas J. Cox
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:546-5722021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:546-572
article
SADI: Sequence analysis tools for Stata
The SADI package provides tools for sequence analysis, which focuses on the similarity and dissimilarity between categorical time series such as life-course trajectories. SADI’s main components are tools to calculate intersequence distances using several different algorithms, including the optimal matching algorithm, but it also includes utilities to graph, summarize, and manage sequence data. It provides similar functionality to the R package TraMineR and the Stata package SQ but is substantially faster than the latter.
SADI, ari, sdchronogram, combinadd, combinprep, corrsqm, cumuldur, sddiscrep, dynhamming, sdentropy, sdhamming, sdhollister, maketrpr, metricp, nspells, oma, omav, permtab, sdstripe, trans2subs, trprgr, twed, sequence analysis tools
http://www.stata-journal.com/article.html?article=st0486
Brendan Halpin
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:630-6512021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:630-651
article
Randomization inference with Stata: A guide and software
Randomization inference or permutation tests are only sporadically used in economics and other social sciences—this despite a steep increase in randomization in field and laboratory experiments that provide perfect experimental setups for applying randomization inference. In the context of causal inference, such tests can handle problems often faced by applied researchers, including issues arising in the context of small samples, stratified or clustered treatment assignments, or nonstandard randomization techniques. Standard statistical software packages have either no implementation of randomization tests or very basic implementations. Whenever researchers use randomization inference, they regularly code individual program routines, risking inconsistencies and coding mistakes. In this article, I show how randomization inference can best be conducted in Stata and introduce a new command, ritest, to simplify such analyses. I illustrate this approach’s usefulness by replicating the results in Fujiwara and Wantchekon (2013, American Economic Journal: Applied Economics 5: 241–255) and running simulations. The applications cover clustered and stratified assignments, with varying cluster sizes, pairwise randomization, and the computation of nonapproximate p-values. The applications also touch upon joint hypothesis testing with randomization inference.
ritest, randomization inference, permutation tests, treatment effects, causal inference
http://www.stata-journal.com/article.html?article=st0489
Simon Heß
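For readers unfamiliar with the mechanics the abstract above takes for granted, the core of randomization inference can be sketched in a few lines. This is a generic illustration in Python, not the ritest command (which handles clustering, stratification, and much more): re-randomize treatment many times and locate the observed statistic within the resulting permutation distribution.

```python
# Sharp-null randomization inference for a difference in means:
# shuffle the treatment labels repeatedly and count how often the
# permuted statistic is at least as extreme as the observed one.
import random

def permutation_pvalue(outcomes, treated, reps=1000, seed=1):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)

    def diff_in_means(assign):
        t = [y for y, d in zip(outcomes, assign) if d]
        c = [y for y, d in zip(outcomes, assign) if not d]
        return sum(t) / len(t) - sum(c) / len(c)

    observed = diff_in_means(treated)
    extreme = 0
    for _ in range(reps):
        perm = treated[:]
        rng.shuffle(perm)  # re-randomize assignment under the sharp null
        if abs(diff_in_means(perm)) >= abs(observed):
            extreme += 1
    return extreme / reps
```

With a genuine treatment effect the returned p-value is small; under the sharp null of no effect it is roughly uniform on [0, 1]. Shuffling the labels preserves the number of treated units, mirroring the original randomization design in this simple unstratified case.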
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:652-6672021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:652-667
article
Dealing with misfits in random treatment assignment
In this article, I discuss the “misfits” problem, a practical issue that arises in random treatment assignment whenever observations cannot be neatly distributed among treatments. I also introduce the randtreat command, which performs random assignment of unequal treatment fractions and provides several methods to deal with misfits.
randtreat, random assignment, misfits, randomized control trial
http://www.stata-journal.com/article.html?article=st0490
Alvaro Carril
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:774-7782021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:774-778
article
Stata tip 128: Marginal effects in log-transformed models: A trade application
http://www.stata-journal.com/article.html?article=st0497
Luca J. Uberti
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:760-7732021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:760-773
article
Speaking Stata: Tables as lists: The groups command
Tables can often be conveniently considered or produced as lists. The list command is therefore a vehicle for obtaining such tables. The groups command for tabulation is built around a call to list. It has no particular limits on the number of identifiers (row, column, or other variables defining cells). Among other features, it offers support for various kinds of frequencies, percents, and cumulations thereof; for various subsetting and ordering by frequencies, percents, and so on; for reordering of columns from the default display; and for saving tabulated data to new datasets.
groups, tables, lists
http://www.stata-journal.com/article.html?article=st0496
Nicholas J. Cox
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:723-7352021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:723-735
article
Evaluating the maximum MSE of mean estimators with missing data
In this article, we present the wald_mse command, which computes the maximum mean squared error of a user-specified point estimator of the mean for a population of interest in the presence of missing data. As pointed out by Manski (1989, Journal of Human Resources 24: 343–360; 2007, Journal of Econometrics 139: 105–115), the presence of missing data results in the loss of point identification of the mean unless one is willing to make strong assumptions about the nature of the missing data. Despite this, decision makers may be interested in reporting a single number as their estimate of the mean as opposed to an estimate of the identified set. It is not obvious which estimator of the mean is best suited to this task, and there may not exist a universally best choice in all settings. To evaluate the performance of a given point estimator of the mean, wald_mse allows the decision maker to compute the maximum mean squared error of an arbitrary estimator under a flexible specification of the missing-data process.
wald_mse, maximum mean squared error
http://www.stata-journal.com/article.html?article=st0494
Charles F. Manski
Max Tabord-Meehan
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:668-6862021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:668-686
article
How to test for goodness of fit in ordinal logistic regression models
Ordinal regression models are used to describe the relationship between an ordered categorical response variable and one or more explanatory variables. Several ordinal logistic models are available in Stata, such as the proportional odds, adjacent-category, and constrained continuation-ratio models. In this article, we present a command (ologitgof) that calculates four goodness-of-fit tests for assessing the overall adequacy of these models. These tests include an ordinal version of the Hosmer–Lemeshow test, the Pulkstenis–Robinson chi-squared and deviance tests, and the Lipsitz likelihood-ratio test. Together, these tests can detect several different types of lack of fit, including wrongly specified continuous terms, omission of different types of interaction terms, and an unordered response variable.
ologitgof, Hosmer–Lemeshow test, Pulkstenis–Robinson chi-squared and deviance tests, Lipsitz likelihood-ratio test, ordinal models, proportional odds, adjacent category, continuation ratio
http://www.stata-journal.com/article.html?article=st0491
Morten W. Fagerland
David W. Hosmer
Hajime Uno
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:517-5452021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:517-545
article
Bias corrections for probit and logit models with two-way fixed effects
In this article, we present the user-written commands probitfe and logitfe, which fit probit and logit panel-data models with individual and time unobserved effects. Fixed-effects panel-data methods that estimate the unobserved effects can be severely biased because of the incidental parameter problem (Neyman and Scott, 1948, Econometrica 16: 1–32). We tackle this problem using the analytical and jackknife bias corrections derived in Fernández-Val and Weidner (2016, Journal of Econometrics 192: 291–312) for panels where the two dimensions (N and T) are moderately large. We illustrate the commands with an empirical application to international trade and a Monte Carlo simulation calibrated to this application.
probitfe, logitfe, probit, logit, panel, fixed effects, bias corrections, incidental parameter problem
http://www.stata-journal.com/article.html?article=st0485
Mario Cruz-Gonzalez
Iván Fernández-Val
Martin Weidner
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:619-6292021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:619-629
article
Model selection for univariable fractional polynomials
Since Royston and Altman’s 1994 publication (Journal of the Royal Statistical Society, Series C 43: 429–467), fractional polynomials have steadily gained popularity as a tool for flexible parametric modeling of regression relationships. In this article, I present fp select, a postestimation tool for fp that allows the user to select a parsimonious fractional polynomial model according to a closed test procedure called the fractional polynomial selection procedure or function selection procedure. I also give a brief introduction to fractional polynomial models and provide examples of using fp and fp select to select such models with real data.
fp select, continuous covariate, fractional polynomials, regression models, model selection
http://www.stata-journal.com/article.html?article=st0488
Patrick Royston
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:600-6182021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:600-618
article
Literate data analysis with Stata and Markdown
In this article, I introduce markstat, a command for combining Stata code and output with comments and annotations written in Markdown into a beautiful webpage or PDF file, thus encouraging literate programming and reproducible research. The command tangles the input separating Stata and Markdown code, runs the Stata code, relies on Pandoc to process the Markdown code, and then weaves the outputs into a single file. HTML documents may include inline and display math using MathJax. Generating PDF output requires access to LaTeX and a style file from Stata but works with the same input file.
markstat, Markdown, Pandoc, LaTeX, literate programming, dynamic documents, reproducible research
http://www.stata-journal.com/article.html?article=pr0067
Germán Rodríguez
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:748-7592021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:748-759
article
New graphic schemes for Stata: plotplain and plottig
While Stata’s computational capabilities have increased considerably over the last decade, the quality of its default graphic schemes is still a matter of debate among users. Some of the arguments speaking against Stata’s default graphic design are subject to individual taste but others are not, for example, horizontal labeling, unnecessary background tinting, missing gridlines, and oversized markers. In this article, I present two new graphic schemes, plotplain and plottig, that attempt to address these concerns. These schemes provide users with a set of 21 colors, of which 7 are distinguishable for people with color blindness. I also give an introduction on how users can program their own graphic schemes.
plotplain, plottig, graphic scheme, colorblind
http://www.stata-journal.com/article.html?article=gr0070
Daniel Bischof
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:736-7472021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:736-747
article
Technical financial analysis tools for Stata
In this article, we provide four financial technical analysis tools: moving averages, Bollinger bands, moving-average convergence divergence, and the relative strength index. The tftools command is used with four subcommands, each referring to a technical analysis tool: bollingerbands, macd, movingaverage, and rsi. We provide examples for each tool. tftools allows researchers to backtest their own investment strategies and will be of interest to investors, researchers, and students of finance.
tftools bollingerbands, tftools macd, tftools movingaverage, tftools rsi, finance, technical analysis, moving average, Bollinger, MACD, RSI
http://www.stata-journal.com/article.html?article=st0495
Mehmet F. Dicle
John D. Levendis
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:573-5992021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:573-599
article
Analyzing repeated measurements while accounting for derivative tracking, varying within-subject variance, and autocorrelation: The xtmixediou command
Linear mixed-effects models are commonly used to model trajectories of repeated measures of biomarkers of disease. Taylor, Cumberland, and Sy (1994, Journal of the American Statistical Association 89: 727–736) proposed a linear mixed-effects model with an added integrated Ornstein–Uhlenbeck (IOU) process (linear mixed-effects IOU model). This allows for autocorrelation, changing within-subject variance, and the incorporation of derivative tracking (that is, how much a subject tends to maintain the same trajectory for extended periods of time). They argued that the covariance structure induced by the stochastic process in this model was interpretable and more biologically plausible than the standard linear mixed-effects model. However, their model is rarely used, partly because of the lack of available software. In this article, we present the new command xtmixediou, which fits the linear mixed-effects IOU model and its special case, the linear mixed-effects Brownian motion model. The model is fit to balanced and unbalanced data using restricted maximum-likelihood estimation, where the optimization algorithm is the Newton–Raphson, Fisher scoring, or average information algorithm, or any combination of these. To aid convergence, xtmixediou allows the user to change the method for deriving the starting values for optimization, the optimization algorithm, and the parameterization of the IOU process. We also provide a predict command to generate predictions under the model. We illustrate xtmixediou and predict with a simulated example of repeated biomarker measurements from HIV-positive patients.
xtmixediou, xtmixediou postestimation, autocorrelation, derivative tracking, integrated Ornstein–Uhlenbeck process, repeated-measures data, within-subject variability
http://www.stata-journal.com/article.html?article=st0487
Rachael A. Hughes
Michael G. Kenward
Jonathan A. C. Sterne
Kate Tilling
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:7792021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:779
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:704-7222021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:704-722
article
Response surface models for OLS and GLS detrending-based unit-root tests in nonlinear ESTAR models
In this article, we calculate response surface models for a large range of quantiles of the Kapetanios, Shin, and Snell (2003, Journal of Econometrics 112: 359–379) and Kapetanios and Shin (2008, Economics Letters 100: 377–380) tests for the null hypothesis of a unit root against the alternative—that the series of interest follows a globally stationary exponential smooth transition autoregressive process. The response surface models allow estimation of finite-sample critical values and approximate p-values for different combinations of the number of observations, T, and the lag order in the test regression, p. The latter can be either specified by the user or optimally selected using a data-dependent procedure. We present the new commands kssur and ksur and illustrate their use with an empirical example.
kssur, ksur, unit-root test, nonlinear ESTAR models, Monte Carlo, response surface, critical values, lag length, p-values
http://www.stata-journal.com/article.html?article=st0493
Jesús Otero
Jeremy Smith
oai:RePEc:tsj:stataj:y:17:y:2017:i:3:p:687-7032021-07-01RePEc:tsj:stataj
RePEc:tsj:stataj:y:17:y:2017:i:3:p:687-703
article
Estimating measures of multidimensional poverty with Stata
In this article, we describe the multidimensional poverty measures developed by Alkire and Foster (2011, Journal of Public Economics 95: 476–487) and show how they can be computed with Stata by using the mpi command.
mpi, multidimensional poverty, Alkire–Foster method
http://www.stata-journal.com/article.html?article=st0492
Daniele Pacifico
Felix Poege
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:386-3942020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:386-394
article
Projection of power and events in clinical trials with a time-to-event outcome
In 2005, Barthel, Royston, and Babiker presented a menu-driven Stata program under the generic name of ART (assessment of resources for trials) to calculate sample size and power for complex clinical trial designs with a time-to-event or binary outcome. In this article, we describe a Stata tool called ARTPEP, which is intended to project the power and events of a trial with a time-to-event outcome into the future given patient accrual figures so far and assumptions about event rates and other defining parameters. ARTPEP has been designed to work closely with the ART program and has an associated dialog box. We illustrate the use of ARTPEP with data from a phase III trial in esophageal cancer.
artpep, artbin, artsurv, artmenu, randomized controlled trial, time-to-event outcome, power, number of events, projection, ARTPEP, ART
http://www.stata-journal.com/article.html?article=st0013_2
Patrick Royston
Friederike M.-S. Barthel
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:104-1242020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:104-124
article
Creating synthetic discrete-response regression models
The development and use of synthetic regression models has proven to assist statisticians in better understanding bias in data, as well as how to best interpret various statistics associated with a modeling situation. In this article, I present code that can be easily amended for the creation of synthetic binomial, count, and categorical response models. Parameters may be assigned to any number of predictors (which are shown as continuous, binary, or categorical), negative binomial heterogeneity parameters may be assigned, and the number of levels or cut points and values may be specified for ordered and unordered categorical response models. I also demonstrate how to introduce an offset into synthetic data and how to test synthetic models using Monte Carlo simulation. Finally, I introduce code for constructing a synthetic NB2-logit hurdle model. Copyright 2010 by StataCorp LP.
synthetic, pseudorandom, Monte Carlo, simulation, logistic, probit, Poisson, NB1, NB2, NB-C, hurdle, offset, ordered, multinomial
http://www.stata-journal.com/article.html?article=st0186
Joseph M. Hilbe
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:72-932020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:72-93
article
Respondent-driven sampling
Respondent-driven sampling is a network sampling technique typically employed for hard-to-reach populations (for example, drug users, men who have sex with men, people with HIV). Similarly to snowball sampling, initial seed respondents recruit additional respondents from their network of friends. The recruiting process repeats iteratively, thereby forming long referral chains. Unlike in snowball sampling, it is crucial to obtain estimates of respondents’ personal network sizes (that is, number of acquaintances in the target population) and information about who recruited whom. Markov chain theory makes it possible to derive population estimates and sampling weights. We introduce a new Stata command for respondent-driven sampling and illustrate its use. Copyright 2012 by StataCorp LP.
rds, rds_network, respondent-driven sampling
http://www.stata-journal.com/article.html?article=st0247
Matthias Schonlau
Elisabeth Liebau
oai:RePEc:tsj:stataj:v:11:y:2011:i:4:p:605-6192020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:4:p:605-619
article
Causal mediation analysis
Estimating the mechanisms that connect explanatory variables with the explained variable, also known as "mediation analysis," is central to a variety of social-science fields, especially psychology, and increasingly to fields like epidemiology. Recent work on the statistical methodology behind mediation analysis points to limitations in earlier methods. We implement in Stata computational approaches based on recent developments in the statistical methodology of mediation analysis. In particular, we provide functions for the correct calculation of causal mediation effects using several different types of parametric models, as well as the calculation of sensitivity analyses for violations to the key identifying assumption required for interpreting mediation results causally.
medeff, medsens, mediation, causal mechanism, direct effects, sensitivity analysis
http://www.stata-journal.com/article.html?article=st0243
Raymond Hicks
Dustin Tingley
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:406-4322020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:406-432
article
A review of Stata commands for fixed-effects estimation in normal linear models
Availability of large multilevel longitudinal databases in various fields of research, including labor economics (with workers and firms observed over time) and education (with students, teachers, and schools observed over time), has increased the application of models with one level or multiple levels of fixed effects (for example, teacher and student effects). There has been a corresponding rapid development of Stata commands designed for fitting these types of models. The commands parameterize the fixed-effects portions of models differently. In cases where estimates of the fixed-effects parameters are of interest, it is critical to understand precisely what parameters are being estimated by different commands. In this article, we catalog the estimates of reported fixed effects provided by different commands for several canonical cases of both one-level and two-level fixed-effects models. We also discuss issues regarding computational efficiency and standard-error estimation. Copyright 2012 by StataCorp LP.
longitudinal data, linked employer-employee data, fixed-effects estimators, regress, areg, a2reg, gpreg, reg2hdfe, xtreg, fese, felsdvregdm, software review
http://www.stata-journal.com/article.html?article=st0267
Daniel F. McCaffrey
J. R. Lockwood
Kata Mihaly
Tim R. Sass
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:359-3682020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:359-368
article
Using Stata with PHASE and Haploview: Commands for importing and exporting data
Modern genetics studies require the use of many specialty software programs for various aspects of the statistical analysis. PHASE is a program often used to reconstruct haplotypes from genotype data, and Haploview is a program often used to visualize and analyze single nucleotide polymorphism data. Three new commands are described for performing these three steps: 1) exporting genotype data stored in Stata to PHASE, 2) importing the resulting inferred haplotypes back into Stata, and 3) exporting the haplotype/single nucleotide polymorphism data from Stata to Haploview. Copyright 2010 by StataCorp LP.
phaseout, phasein, haploviewout, genetics, haplotypes, SNPs, PHASE, haploview
http://www.stata-journal.com/article.html?article=st0199
J. Charles Huber Jr.
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:30-512020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:30-51
article
Nonparametric item response theory using Stata
Item response theory is a set of models and methods allowing for the analysis of binary or ordinal variables (items) that are influenced by a latent variable or latent trait, that is, a variable that cannot be measured directly. The theory was originally developed in educational assessment but has many other applications in clinical research, ecology, psychiatry, and economics. The Mokken scales have been described by Mokken (1971, A Theory and Procedure of Scale Analysis [De Gruyter]). They are composed of items that satisfy the three fundamental assumptions of item response theory: unidimensionality, monotonicity, and local independence. They can be considered nonparametric models in item response theory. Traces of the items and Loevinger's H coefficients are particularly useful indexes for checking whether a set of items constitutes a Mokken scale. However, these indexes are not available in general statistical packages. We introduce Stata commands to compute them. We also describe the options available and provide examples of output. Copyright 2011 by StataCorp LP.
tracelines, loevh, gengroup, msp, item trace lines, Mokken scales, item response theory, Loevinger coefficients, Guttman errors
http://www.stata-journal.com/article.html?article=st0216
Jean-Benoit Hardouin
Angelique Bonnaud-Antignac
Veronique Sebille
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:182-1902020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:182-190
article
What hypotheses do "nonparametric" two-group tests actually test?
In this article, I discuss measures of effect size for two-group comparisons where data are not appropriately analyzed by least-squares methods. The Mann–Whitney test calculates a statistic that is a very useful measure of effect size, particularly suited to situations in which differences are measured on scales that either are ordinal or use arbitrary scale units. Both the difference in medians and the median difference between groups are also useful measures of effect size. Copyright 2012 by StataCorp LP.
ranksum, Wilcoxon rank-sum test, Mann–Whitney statistic, Hodges–Lehmann median shift, effect size, qreg
http://www.stata-journal.com/article.html?article=st0253
Ronan M. Conroy
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:242-2562020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:242-256
article
Faster estimation of a discrete-time proportional hazards model with gamma frailty
Fitting a complementary log-log model that accounts for gamma-distributed unobserved heterogeneity often takes a significant amount of time. This is in part because numerical derivatives are used to approximate the gradient vector and Hessian matrix. The main contribution of this article is the use of Mata and a gf2 evaluator to express the gradient vector and Hessian matrix. Gradient vector expression allows one to use a few different options and postestimation commands. Furthermore, expression of the gradient vector and Hessian matrix increases the speed at which a likelihood function is maximized. In this article, I present a complementary log-log model, show how the gamma distribution has been incorporated, and point out why the gradient vector and Hessian matrix can be expressed. I then discuss the speed at which a maximum is achieved, and I apply sampling weights that require an expression of the gradient vector. I introduce a new command for fitting this model. To demonstrate how this model can be applied, I will examine information on when young males first try marijuana. Copyright 2012 by StataCorp LP.
pgmhazgf2, hazard model, discrete duration analysis, complementary log-log model, gamma distribution, unobserved heterogeneity
http://www.stata-journal.com/article.html?article=st0256
Michael G. Farnworth
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:482-4952020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:482-495
article
Speaking Stata: The limits of sample skewness and kurtosis
Sample skewness and kurtosis are limited by functions of sample size. The limits, or approximations to them, have repeatedly been rediscovered over the last several decades, but nevertheless seem to remain only poorly known. The limits impart bias to estimation and, in extreme cases, imply that no sample could bear exact witness to its parent distribution. The main results are explained in a tutorial review, and it is shown how Stata and Mata may be used to confirm and explore their consequences.
descriptive statistics, distribution shape, moments, sample size, skewness, kurtosis, lognormal distribution
http://www.stata-journal.com/article.html?article=st0204
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:94-1292020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:94-129
article
Stata graph library for network analysis
Network analysis is a multidisciplinary research method that is quickly becoming a popular and exciting field. Though some statistical programs possess sophisticated packages for analyzing networks, similar capabilities have yet to be made available in Stata. In an effort to motivate the use of Stata for network analysis, I designed in Mata the Stata graph library (SGL), which consists of algorithms that construct matrix representations of networks, compute centrality measures, calculate clustering coefficients, and solve maximum-flow problems. The SGL is designed for both directed and undirected one-mode networks containing edges that are either unweighted or weighted with positive values. Performance tests conducted between C++ and Stata graph library implementations indicate gross inefficiencies in current SGL routines, making the SGL impractical for large networks. The obstacles are, however, welcome challenges in the effort to spread the use of Stata for analyzing networks. Future developments will focus toward addressing computational time complexities and integrating additional capabilities into the SGL. Copyright 2012 by StataCorp LP.
netsis, netsummarize, centrality, clustering, network analysis
http://www.stata-journal.com/article.html?article=st0248
Hirotaka Miura
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:433-4462020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:433-446
article
Easy demand-system estimation with quaids
Previously, to fit an almost-ideal demand system in Stata, one would have to use the nlsur command and write a function evaluator program as described in [R] nlsur and Poi (2008, Stata Journal 8: 554–556). In this article, I introduce the command quaids, which obviates the need for any programming by the user. The command fits Deaton and Muellbauer's (1980b, American Economic Review 70: 312–326) original almost-ideal demand-system model as well as Banks, Blundell, and Lewbel's (1997, Review of Economics and Statistics 79: 527–539) quadratic variant. Demographic variables can also be included in the model. Postestimation tools calculate expenditure and price elasticities.
quaids, almost-ideal demand system
http://www.stata-journal.com/article.html?article=st0268
Brian P. Poi
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:284-2982020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:284-298
article
Fitting nonparametric mixed logit models via expectation-maximization algorithm
In this article, I provide an illustrative, step-by-step implementation of the expectation-maximization algorithm for the nonparametric estimation of mixed logit models. In particular, the proposed routine allows users to fit straightforwardly latent-class logit models with an increasing number of mass points so as to approximate the unobserved structure of the mixing distribution. Copyright 2012 by StataCorp LP.
latent classes, expectation-maximization algorithm, nonparametric mixed logit
http://www.stata-journal.com/article.html?article=st0258
Daniele Pacifico
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:1552020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:155
article
Software updates
An update for a previously published package is provided.
Editors
oai:RePEc:tsj:stataj:v:11:y:2011:i:4:p:589-6042020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:4:p:589-604
article
Managing the U.S. Census 2000 and World Development Indicators databases for statistical analysis in Stata
This article introduces a new Stata command, labcenswdi, to automatically manage databases that provide variable descriptions on the second row in a dataset. While renaming all variables and converting them from string to numeric, labcenswdi automatically manages the variable descriptions including removing them from the second row to place them into Stata variable labels and saving them to a text file. The process yields a dataset ready for statistical analysis. I illustrate how this command can be used to efficiently manage datasets obtained from the U.S. Census 2000 and the World Development Indicators databases. Copyright 2011 by StataCorp LP.
labcenswdi, U.S. Census 2000, World Development Indicators, databases, data management, panel data
http://www.stata-journal.com/article.html?article=dm0060
P. Wilner Jeanty
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:215-2252020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:215-225
article
Model fit assessment via marginal model plots
We present a new Stata command, mmp, that generates marginal model plots (Cook and Weisberg, 1997, Journal of the American Statistical Association 92: 490–499) for a regression model. These plots allow for the comparison of the fitted model with a nonparametric or semiparametric model fit. The user may precisely specify how the alternative fit is computed. Demonstrations are given for logistic and linear regressions, using the lowess smoother to generate the alternate fit. Guidelines for the use of mmp under different models (through glm and other commands) and different smoothers (such as lpoly) are also presented. Copyright 2010 by StataCorp LP.
mmp, regress, glm, lpoly, logit, logistic, marginal model plots
http://www.stata-journal.com/article.html?article=st0189
Charles Lindsey
Simon Sheather
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:106-1192020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:106-119
article
Stata utilities for geocoding and generating travel time and travel distance information
This article describes geocode and traveltime, two commands that use Google Maps to provide spatial information for data. The geocode command allows users to generate latitude and longitude for various types of locations, including addresses. The traveltime command takes latitude and longitude information and finds travel distances between points, as well as the time it would take to travel that distance by either driving, walking, or using public transportation. Copyright 2011 by StataCorp LP.
geocode, traveltime, Google Maps, geocoding, ArcGIS
http://www.stata-journal.com/article.html?article=dm0053
Adam Ozimek
Daniel Miles
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:30-452020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:30-45
article
Using the world development indicators database for statistical analysis in Stata
The World Bank’s world development indicators (WDI) compilation is a rich and widely used database about the development of most economies in the world. However, after insheeting a WDI dataset, some data management is required prior to performing statistical analysis. In this article, I propose a new Stata command, wdireshape, for automating this data management. While reshaping a WDI dataset into structures amenable to panel data, seemingly unrelated regression, or cross-sectional modeling, wdireshape renames the series and places the series descriptors into variable labels. Copyright 2010 by StataCorp LP.
wdireshape, paverage, world development indicators, reshape, panel data, seemingly unrelated regression
http://www.stata-journal.com/article.html?article=dm0045
P. Wilner Jeanty
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:3132020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:313
article
Software updates
Updates for a number of previously published packages are provided. Copyright 2010 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:64-812020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:64-81
article
Pointwise confidence intervals for the covariate-adjusted survivor function in the Cox model
A graphical representation of the pointwise confidence intervals allows a researcher to easily assess the precision of estimators. In the absence of covariates, the official command sts graph can be used to plot these intervals for the survivor function or the cumulative hazard function; however, in the presence of covariates, sts graph is insufficient. The user-written command survci can be used to plot the pointwise intervals for the survivor function after the Cox model. In this article, I describe the current and new features of survci. The new features include pointwise confidence intervals for the cumulative hazard function and support for stratified Cox models as well as factor variables, available as of Stata 11. I describe the methods used in calculating pointwise confidence intervals in the Cox model for both the covariate-adjusted survivor function and the covariate-adjusted cumulative hazard function. I also demonstrate the syntax of survci using Stata’s example cancer dataset, cancer.dta. Copyright 2011 by StataCorp LP.
survci, confidence intervals, covariate-adjusted survivor function, Cox model
http://www.stata-journal.com/article.html?article=st0217
Matthew Cefalu
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:515-5422020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:515-542
article
Long-run covariance and its applications in cointegration regression
Long-run covariance plays a major role in much of time-series inference, such as heteroskedasticity- and autocorrelation-consistent standard errors, generalized method of moments estimation, and cointegration regression. We propose a Stata command, lrcov, to compute long-run covariance with a prewhitening strategy and various kernel functions. We illustrate how long-run covariance matrix estimation can be used to obtain heteroskedasticity- and autocorrelation-consistent standard errors via the new hacreg command; we also illustrate cointegration regression with the new cointreg command. hacreg has several improvements compared with the official newey command, such as more kernel functions, automatic determination of the lag order, and prewhitening of the data. cointreg enables the estimation of cointegration regression using fully modified ordinary least squares, dynamic ordinary least squares, and canonical cointegration regression methods. We use several classical examples to demonstrate the use of these commands.
lrcov, hacreg, cointreg, long-run covariance, fully modified ordinary least squares, dynamic ordinary least squares, canonical cointegration regression
http://www.stata-journal.com/article.html?article=st0272
Qunyong Wang
Na Wu
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:61-682020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:61-68
article
riskplot: A graphical aid to investigate the effect of multiple categorical risk factors
In this article, we describe a command, riskplot, aiming to provide a visual aid to assess the strength, importance, and consistency of risk factor effects. The plotted form is a dendrogram that branches out as it moves from left to right. It displays the mean of some score or the absolute risk of some outcome for a sample that is progressively disaggregated by a sequence of categorical risk factors. Examples of the application of the new command are drawn from the analysis of depression and fluid intelligence in a sample of elderly men and women. Copyright 2010 by StataCorp LP.
riskplot, pathways, risk factors, graphs
http://www.stata-journal.com/article.html?article=gr0044
Milena Falcaro
Andrew Pickles
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:52-632020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:52-63
article
Visualization of social networks in Stata using multidimensional scaling
I describe and illustrate the use of multidimensional scaling methods for visualizing social networks in Stata. The procedure is implemented in the netplot command. I discuss limitations of the approach and sketch possibilities for improvement. Copyright 2011 by StataCorp LP.
netplot, mds, social network analysis, visualization, multidimensional scaling
http://www.stata-journal.com/article.html?article=gr0048
Rense Corten
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:628-6492020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:628-649
article
A simple feasible procedure to fit models with high-dimensional fixed effects
In this article, we describe an iterative approach for the estimation of linear regression models with high-dimensional fixed effects. This approach is computationally intensive but imposes minimum memory requirements. We also show that the approach can be extended to nonlinear models and to more than two high-dimensional fixed effects. Copyright 2010 by StataCorp LP.
fixed effects, panel data
http://www.stata-journal.com/article.html?article=st0212
Paulo Guimarães
Pedro Portugal
oai:RePEc:tsj:stataj:v:11:y:2011:i:4:p:577-5882020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:4:p:577-588
article
Dynamic simulations of autoregressive relationships
This postestimation technique produces dynamic simulations of autoregressive ordinary least-squares models.
dynsim, dynamic simulations, clarify, long-term effects, time series, lagged dependent variables
http://www.stata-journal.com/article.html?article=st0242
Laron K. Williams
Guy D. Whitten
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:259-2662020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:259-266
article
Multivariate outlier detection in Stata
Before implementing any multivariate statistical analysis based on empirical covariance matrices, it is important to check whether outliers are present because their existence could induce significant biases. In this article, we present the minimum covariance determinant estimator, which is commonly used in robust statistics to estimate location parameters and multivariate scales. These estimators can be used to robustify Mahalanobis distances and to identify outliers. Verardi and Croux (2009, Stata Journal 9: 439–453; 2010, Stata Journal 10: 313) programmed this estimator in Stata and made it available with the mcd command. The implemented algorithm is relatively fast and, as we show in the simulation example section, outperforms the methods already available in Stata, such as the Hadi method. Copyright 2010 by StataCorp LP.
mcd, detection, multivariate outliers, robustness, minimum covariance determinant
http://www.stata-journal.com/article.html?article=st0192
Vincenzo Verardi
Catherine Dehon
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:1642020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:164
article
Software updates
Updates for a previously published package are provided.
Editors
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:120-1252020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:120-125
article
eq5d: A command to calculate index values for the EQ-5D quality-of-life instrument
The eq5d command computes an index value using the individual mobility, self-care, usual activities, pain or discomfort, and anxiety or depression responses from the EuroQol EQ-5D quality-of-life instrument. The command calculates index values using value sets from eight countries: the United Kingdom, the United States, Spain, Germany, the Netherlands, Denmark, Japan, and Zimbabwe. Copyright 2011 by StataCorp LP.
eq5d, EQ-5D, index value
http://www.stata-journal.com/article.html?article=st0220
Juan Manuel Ramos-Goni
Oliver Rivero-Arias
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:299-3072020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:299-307
article
The S-estimator of multivariate location and scatter in Stata
In this article, we introduce a new Stata command, smultiv, that implements the S-estimator of multivariate location and scatter. Using simulated data, we show that smultiv outperforms mcd, an alternative robust estimator. Finally, we use smultiv to perform robust principal component analysis and least squares regression on a real dataset. Copyright 2012 by StataCorp LP.
smultiv, S-estimator, robustness, outlier, robust principal component analysis, robust regression
http://www.stata-journal.com/article.html?article=st0229
Vincenzo Verardi
Alice McCathie
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:375-3922020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:375-392
article
Apportionment methods
Apportionment methods are used to translate a set of positive natural numbers into a set of smaller natural numbers while keeping the proportions between the numbers very similar. The methods are used to allocate seats in a chamber proportionally to the number of votes for a party in an election or proportionally to regional populations. In this article, we describe six apportionment methods and the user-written egen function apport(), which implements these methods. Copyright 2012 by StataCorp LP.
apport(), egen, apportionment method, elections, Hamilton's method, Jefferson's method, Webster's method, Hill's method, Dean's method, Adams's method
http://www.stata-journal.com/article.html?article=st0265
Ulrich Kohler
Janina Zeh
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:331-3382020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:331-338
article
bacon: An effective way to detect outliers in multivariate data using Stata (and Mata)
Identifying outliers in multivariate data is computationally intensive. The bacon command, presented in this article, allows one to quickly identify outliers, even on large datasets of tens of thousands of observations. bacon constitutes an attractive alternative to hadimvo, the only other command available in Stata for the detection of outliers. Copyright 2010 by StataCorp LP.
bacon, hadimvo, outlier detection, multivariate outliers
http://www.stata-journal.com/article.html?article=st0197
Sylvain Weber
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:332-3412020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:332-341
article
Speaking Stata: Transforming the time axis
In graphs showing change over time, the time variable is most commonly plotted precisely as recorded. However, if the most interesting part of the graph is very crowded, then transforming the time axis to give that part more space is worth consideration. In this column, I discuss logarithmic scales, square root scales, and scale breaks as possible solutions. Copyright 2012 by StataCorp LP.
axis scale, graphics, logarithm, scale break, square root, time, transformation
http://www.stata-journal.com/article.html?article=gr0052
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:45-602020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:45-60
article
Age–period–cohort models in Stata
In this article, I describe and illustrate Stata programs that facilitate i) the fitting of smooth age–period–cohort models to event data and ii) the plotting of observed and fitted rates. The programs include postestimation functionality and flexibility to fit models not possible using Stata's glm command. What distinguishes this article from a recent Stata Journal article on age–period–cohort models by Rutherford, Lambert, and Thompson (2010, Stata Journal 10: 606–627) is that the emphasis here is on extrapolating the model fit to make projections into the future. Copyright 2012 by StataCorp LP.
apcspline, grmean, age–period–cohort models
http://www.stata-journal.com/article.html?article=st0245
Peter D. Sasieni
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:165-1992020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:165-199
article
Resampling variance estimation for complex survey data
In this article, I discuss the main approaches to resampling variance estimation in complex survey data: balanced repeated replication, the jackknife, and the bootstrap. Balanced repeated replication and the jackknife are implemented in the Stata svy suite. The bootstrap for complex survey data is implemented by the bsweights command. I describe this command and provide working examples. Copyright 2010 by StataCorp LP.
bsweights, balanced repeated replication, balanced bootstrap, bootstrap, complex survey data, Hadamard matrix, half-samples, jackknife, resampling, weighted bootstrap, mean bootstrap
http://www.stata-journal.com/article.html?article=st0187
Stanislav Kolenikov
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:215-2252020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:215-225
article
poisson: Some convergence issues
In this article, we identify and illustrate some shortcomings of the poisson command in Stata. Specifically, we point out that the command fails to check for the existence of the estimates, and we show that it is very sensitive to numerical problems. While these are serious problems that may prevent users from obtaining estimates or may even produce spurious and misleading results, we show that the informed user often has simple workarounds available for addressing these problems. Copyright 2011 by StataCorp LP.
ppml, Poisson regression, collinearity, complete separation, numerical problems, perfect prediction
http://www.stata-journal.com/article.html?article=st0225
J. M. C. Santos Silva
Silvana Tenreyro
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:5702020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:570
article
Software updates
Updates for a previously published package are provided.
Editors
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:143-1512020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:143-151
article
Speaking Stata: The statsby strategy
The statsby command collects statistics from a command yielding r-class or e-class results across groups of observations and yields a new reduced dataset. statsby is commonly used to graph such data in comparisons of groups; the subsets and total options of statsby are particularly useful in this regard. In this article, I give examples of using this approach to produce box plots and plots of confidence intervals.
statsby, graphics, groups, comparisons, box plots, confidence intervals
http://www.stata-journal.com/article.html?article=gr0045
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:327-3442020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:327-344
article
Logistic quantile regression in Stata
We present a set of Stata commands for the estimation, prediction, and graphical representation of logistic quantile regression described by Bottai, Cai, and McKeown (2010, Statistics in Medicine 29: 309–317). Logistic quantile regression models the quantiles of outcome variables that take on values within a bounded, known interval, such as proportions (or percentages) between 0 and 1, school grades between 0 and 100 points, and visual analog scales between 0 and 10 cm. We describe the syntax of the new commands and illustrate their use with data from a large cohort of Swedish men on lower urinary tract symptoms measured on the international prostate symptom score, a widely accepted score bounded between 0 and 35. Copyright 2011 by StataCorp LP.
lqreg, lqregpred, lqregplot, logistic quantile regression, robust regression, bounded outcomes
http://www.stata-journal.com/article.html?article=st0231
Nicola Orsini
Matteo Bottai
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:213-2392020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:213-239
article
Estimation of ordered response models with sample selection
We introduce two new Stata commands for the estimation of an ordered response model with sample selection. The opsel command uses a standard maximum-likelihood approach to fit a parametric specification of the model where errors are assumed to follow a bivariate Gaussian distribution. The snpopsel command uses the semi-nonparametric approach of Gallant and Nychka (1987, Econometrica 55: 363–390) to fit a semiparametric specification of the model where the bivariate density function of the errors is approximated by a Hermite polynomial expansion. The snpopsel command extends the set of Stata routines for semi-nonparametric estimation of discrete response models. Compared to the other semi-nonparametric estimators, our routine is relatively faster because it is programmed in Mata. In addition, we provide new postestimation routines to compute linear predictions, predicted probabilities, and marginal effects. These improvements are also extended to the set of semi-nonparametric Stata commands originally written by Stewart (2004, Stata Journal 4: 27–39) and De Luca (2008, Stata Journal 8: 190–220). An illustration of the new opsel and snpopsel commands is provided through an empirical application on self-reported health with selectivity due to sample attrition. Copyright 2011 by StataCorp LP.
opsel, opsel postestimation, sneop, sneop postestimation, snp2, snp2 postestimation, snp2s, snp2s postestimation, snpopsel, snpopsel postestimation, snp, snp postestimation, ordered response models, sample selection, parametric maximum-likelihood estimation, semi-nonparametric estimation
http://www.stata-journal.com/article.html?article=st0226
Giuseppe De Luca
Valeria Perotti
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:540-5672020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:540-567
article
Fitting heterogeneous choice models with oglm
When a binary or ordinal regression model incorrectly assumes that error variances are the same for all cases, the standard errors are wrong and (unlike ordinary least squares regression) the parameter estimates are biased. Heterogeneous choice models (also known as location–scale models or heteroskedastic ordered models) explicitly specify the determinants of heteroskedasticity in an attempt to correct for it. Such models are also useful when the variance itself is of substantive interest. This article illustrates how the author's Stata program oglm (ordinal generalized linear models) can be used to fit heterogeneous choice and related models. It shows that two other models that have appeared in the literature (Allison's model for group comparisons and Hauser and Andrew's logistic response model with proportionality constraints) are special cases of a heterogeneous choice model and alternative parameterizations of it. The article further argues that heterogeneous choice models may sometimes be an attractive alternative to other ordinal regression models, such as the generalized ordered logit model fit by gologit2. Finally, the article offers guidelines on how to interpret, test, and modify heterogeneous choice models. Copyright 2010 by StataCorp LP.
oglm, heterogeneous choice model, location–scale model, gologit2, ordinal regression, heteroskedasticity, generalized ordered logit model
http://www.stata-journal.com/article.html?article=st0208
Richard Williams
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:69-812020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:69-81
article
Power transformation via multivariate Box–Cox
We present a new Stata estimation program, mboxcox, that computes the normalizing scaled power transformations for a set of variables. The multivariate Box–Cox method (defined in Velilla, 1993, Statistics and Probability Letters 17: 259–263; used in Weisberg, 2005, Applied Linear Regression [Wiley]) is used to determine the transformations. We demonstrate using a generated example and a real dataset. Copyright 2010 by StataCorp LP.
mboxcox, mbctrans, boxcox, regress
http://www.stata-journal.com/article.html?article=st0184
Charles Lindsey
Simon Sheather
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:159-2062020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:159-206
article
Fitting fully observed recursive mixed-process models with cmp
At the heart of many econometric models are a linear function and a normal error. Examples include the classical small-sample linear regression model and the probit, ordered probit, multinomial probit, tobit, interval regression, and truncated-distribution regression models. Because the normal distribution has a natural multidimensional generalization, such models can be combined into multiequation systems in which the errors share a multivariate normal distribution. The literature has historically focused on multistage procedures for fitting mixed models, which are more efficient computationally, if less so statistically, than maximum likelihood. Direct maximum likelihood estimation has been made more practical by faster computers and simulated likelihood methods for estimating higher-dimensional cumulative normal distributions. Such simulated likelihood methods include the Geweke–Hajivassiliou–Keane algorithm (Geweke, 1989, Econometrica 57: 1317–1339; Hajivassiliou and McFadden, 1998, Econometrica 66: 863–896; Keane, 1994, Econometrica 62: 95–116). Maximum likelihood also facilitates a generalization to switching, selection, and other models in which the number and types of equations vary by observation. The Stata command cmp fits seemingly unrelated regressions models of this broad family. Its estimator is also consistent for recursive systems in which all endogenous variables appear on the right-hand sides as observed. If all the equations are structural, then estimation is full-information maximum likelihood. If only the final stage or stages are structural, then estimation is limited-information maximum likelihood. cmp can mimic a score of built-in and user-written Stata commands. It is also appropriate for a panoply of models that previously were hard to estimate. Heteroskedasticity, however, can render cmp inconsistent.
This article explains the theory and implementation of cmp and of a related Mata function, ghk2(), that implements the Geweke–Hajivassiliou–Keane algorithm. Copyright 2011 by StataCorp LP.
cmp, ghk2, Geweke–Hajivassiliou–Keane algorithm, recursive mixed-process models, seemingly unrelated regression, conditional mixed-process models
http://www.stata-journal.com/article.html?article=st0224
David Roodman
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:240-2542020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:240-254
article
Simon's minimax and optimal and Jung's admissible two-stage designs with or without curtailment
This article describes a new Stata command called simontwostage, which calculates the critical values and sample sizes for two-stage designs for phase II oncology trials. Options are provided to determine the minimax and optimal designs proposed by Simon (1989, Controlled Clinical Trials 10: 1–10) and admissible designs described by Jung et al. (2004, Statistics in Medicine 23: 561–569). Furthermore, nonstochastic and stochastic curtailment rules can be implemented in both stages of the trial, and the properties of the curtailed designs can be examined. Copyright 2011 by StataCorp LP.
simontwostage, Simon's two-stage design, Jung's admissible design, phase II trials, conditional power, curtailment
http://www.stata-journal.com/article.html?article=st0227
Cornelia U. Kunz
Meinhard Kieser
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:3522020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:352
article
Software updates
Updates for a previously published package are provided. Copyright 2012 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:130-1462020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:130-146
article
tt: Treelet transform with Stata
The treelet transform is a recent data reduction technique from the field of machine learning. Sharing many similarities with principal component analysis, the treelet transform can reduce a multidimensional dataset to the projections on a small number of directions or components that account for much of the variation in the original data. However, in contrast to principal component analysis, the treelet transform produces sparse components. This can greatly simplify interpretation. I describe the tt Stata add-on for performing the treelet transform. The add-on includes a Mata implementation of the treelet transform algorithm alongside other functionality to aid in the practical application of the treelet transform. I demonstrate an example of a basic exploratory data analysis using the tt add-on. Copyright 2012 by StataCorp LP.
tt, ttcv, ttscree, ttdendro, ttloading, ttpredict, ttstab, treelet, principal component analysis, dimension reduction, factor analysis
http://www.stata-journal.com/article.html?article=st0249
Anders Gorst-Rasmussen
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:454-4602020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:454-460
article
Importing presidential approval poll results
The American Presidency Project (http://www.presidency.ucsb.edu) provides presidential job approval poll results. These data are available for each U.S. president since President Franklin D. Roosevelt and for all the job approval polls conducted since his presidency. In this article, we propose the Stata command approval, which downloads these approval poll results in their original format, an HTML table. approval then parses the HTML table and prepares the data as a usable Stata dataset.
approval, presidential job approval, presidential popularity, U.S. presidents, parse HTML
http://www.stata-journal.com/article.html?article=dm0064
Mehmet F. Dicle
Betul Dicle
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:5052020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:505
article
Software updates
Updates for a previously published package are provided. Copyright 2010 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:479-5042020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:479-504
article
Sensible parameters for univariate and multivariate splines
The package bspline, downloadable from Statistical Software Components, now has three commands. The first, bspline, generates a basis of Schoenberg B-splines. The second, frencurv, generates a basis of reference splines whose parameters in the regression model are simply values of the spline at reference points on the X axis. The recent addition, flexcurv, is an easy-to-use version of frencurv that generates reference splines with automatically generated, sensibly spaced knots. frencurv and flexcurv now have the additional option of generating an incomplete basis of reference splines, with the reference spline for a baseline reference point omitted or set to 0. This incomplete basis can be completed by adding the standard unit vector to the design matrix and can then be used to estimate differences between values of the spline at the remaining reference points and the value of the spline at the baseline reference point. Reference splines therefore model continuous factor variables in the same way that indicator variables (or "dummies") model discrete factor variables. The method can be extended in a similar way to define factor-product bases, allowing the user to estimate factor-combination means, subset-specific effects, or even factor interactions involving multiple continuous or discrete factors.
bspline, flexcurv, frencurv, polynomial, spline, B-spline, interpolation, linear, quadratic, cubic, multivariate, factor, interaction
http://www.stata-journal.com/article.html?article=sg151_2
Roger B. Newson
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:191-2132020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:191-213
article
From resultssets to resultstables in Stata
The listtab package supersedes the listtex package. It inputs a list of variables and outputs them as a table in one of several formats, including TeX, LaTeX, HTML, Microsoft Rich Text Format, or possibly future XML-based formats. It works with a team of four other packages: sdecode, chardef, xrewide, and ingap, which are downloaded from the Statistical Software Components archive. The sdecode package is an extension of decode; it allows the user to add prefixes or suffixes and to output exponents as TeX, HTML, or Rich Text Format superscripts. It is called by another package, bmjcip, to convert estimates, confidence limits, and p-values from numeric to string. The chardef package is an extension of char define; it allows the user to define a characteristic for a list of variables, adding prefixes or suffixes. The xrewide package is an extension of reshape wide; it allows the user to store additional results in variable characteristics or local macros. The ingap package inserts gap observations into a dataset to form gap rows when the dataset is converted to a table. Together, these packages form a general toolkit to convert Stata datasets (or resultssets) to tables, complete with row labels and column labels. Examples are demonstrated using LaTeX and also using the rtfutil package to create Rich Text Format documents. This article uses data from the Avon Longitudinal Study of Parents and Children, based at Bristol University, UK. Copyright 2012 by StataCorp LP.
listtab, listtex, sdecode, chardef, xrewide, ingap, bmjcip, rtfutil, keyby, addinby, table, resultsset, TeX, LaTeX, HTML, XML, RTF
http://www.stata-journal.com/article.html?article=st0254
Roger B. Newson
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:345-3672020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:345-367
article
The Chen–Shapiro test for normality
The Chen–Shapiro test for normality (Chen and Shapiro, 1995, Journal of Statistical Computation and Simulation 53: 269–288) has been shown in simulation studies to be generally slightly more powerful than the commonly used Shapiro–Wilk and Shapiro–Francia tests, implemented in the official Stata commands swilk and sfrancia. I present the chens command, which performs the Chen–Shapiro test in Stata. Copyright 2012 by StataCorp LP.
chens, normality testing, Chen–Shapiro test
http://www.stata-journal.com/article.html?article=st0264
Michal Brzezinski
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:61-712020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:61-71
article
Estimating panel time-series models with heterogeneous slopes
This article introduces a new Stata command, xtmg, that implements three panel time-series estimators, allowing for heterogeneous slope coefficients across group members: the Pesaran and Smith (1995, Journal of Econometrics 68: 79–113) mean group estimator, the Pesaran (2006, Econometrica 74: 967–1012) common correlated effects mean group estimator, and the augmented mean group estimator introduced by Eberhardt and Teal (2010, Discussion Paper 515, Department of Economics, University of Oxford). The latter two estimators further allow for unobserved correlation across panel members (cross-section dependence). Copyright 2012 by StataCorp LP.
xtmg, nonstationary panels, parameter heterogeneity, cross-sectional dependence
http://www.stata-journal.com/article.html?article=st0246
Markus Eberhardt
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:82-942020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:82-94
article
Estimation of hurdle models for overdispersed count data
Hurdle models based on the zero-truncated Poisson-lognormal distribution are rarely used in applied work, although they incorporate some advantages compared with their negative binomial alternatives. I present a command that enables Stata users to estimate Poisson-lognormal hurdle models. I use adaptive Gauss–Hermite quadrature to approximate the likelihood function, and I evaluate the performance of the estimator in Monte Carlo experiments. The model is applied to the number of doctor visits in a sample of the U.S. Medical Expenditure Panel Survey. Copyright 2011 by StataCorp LP.
ztpnm, count-data analysis, hurdle models, overdispersion, Poisson-lognormal hurdle models
http://www.stata-journal.com/article.html?article=st0218
Helmut Farbmacher
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:305-3142020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:305-314
article
Speaking Stata: Compared with...
Many problems in data management center on relating values to values in other observations, either within a dataset as a whole or within groups such as panels. This column reviews some basic Stata techniques helpful for such tasks, including the use of subscripts, summarize, by:, sum(), cond(), and egen. Several techniques exploit the fact that logical expressions yield 1 when true and 0 when false. Dividing by zero to yield missings is revealed as a surprisingly valuable device. Copyright 2011 by StataCorp LP.
data management, panels, subscripting, summarize, by, sum(), cond(), egen, logical expressions
http://www.stata-journal.com/article.html?article=dm0055
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:408-4222020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:408-422
article
Regression analysis of censored data using pseudo-observations
We draw upon a series of articles in which a method based on pseudovalues is proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. The models, once the pseudovalues have been computed, can be fit using standard generalized estimating equation software. Here we present Stata procedures for computing these pseudo-observations. An example from a bone marrow transplantation study is used to illustrate the method.
stpsurv, stpci, stpmean, pseudovalues, time-to-event, survival analysis
http://www.stata-journal.com/article.html?article=st0202
Erik T. Parner
Per K. Andersen
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:3-282020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:3-28
article
A new system for formatting estimation tables
I present an entirely rewritten version of the outreg command, which creates tables from the results of Stata estimation commands and generates formatted Microsoft Word or LaTeX files. My objective is to provide as complete control as is practical over the layout and formatting of the estimation tables in both file formats. outreg provides a wide range of estimation statistics (including confidence intervals and marginal effects), can control the number and arrangement of the statistics displayed, and can merge subsequent estimation results into the same table. Users can specify numeric formats, font sizes, and font types at the table cell level, as well as lines in the table and row spacing. Multiple tables can be written to the same document, making it possible to create a fully formatted statistical appendix from a do-file. I demonstrate in examples the numerous formatting options for the outreg command. Copyright 2012 by StataCorp LP.
outreg, tables, estimation, formatting, Microsoft Word, TeX, LaTeX
http://www.stata-journal.com/article.html?article=sg97.4
John Luke Gallup
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:386-4022020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:386-402
article
The impact of different sources of body mass index assessment on smoking onset: An application of multiple-source information models
Multiple-source data are often collected to provide better information about some underlying construct that is difficult to measure or likely to be missing. In this article, we describe regression-based methods for analyzing multiple-source data in Stata. We use data from the BROMS Cohort Study, a cohort of Swedish adolescents for whom body mass index was both self-reported and measured by nurses. We draw both source reports together into a single frame of reference and relate these to smoking onset. This unified method has two advantages over traditional approaches: 1) the relative predictiveness of each source can be assessed and 2) all subjects contribute to the analysis. The methods are applicable to other areas of epidemiology where multiple-source reports are used. Copyright 2011 by StataCorp LP.
multiple informants, multiple-source predictors, regression analysis, generalized estimating equations, missing data
http://www.stata-journal.com/article.html?article=st0234
Maria Paola Caria
Rino Bellocco
Maria Rosaria Galanti
Nicholas J. Horton
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:267-2802020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:267-280
article
Data envelopment analysis
In this article, we introduce a user-written data envelopment analysis command for Stata. Data envelopment analysis is a linear programming method for assessing the efficiency and productivity of units called decision-making units. Over the last decades, data envelopment analysis has gained considerable attention as a managerial tool for measuring performance of organizations, and it has been used widely for assessing the efficiency of public and private sectors such as banks, airlines, hospitals, universities, defense firms, and manufacturers. The dea command in Stata will allow users to conduct the standard optimization procedure and extended managerial analysis. The dea command developed in this article selects the chosen variables from a Stata data file and constructs a linear programming model based on the selected dea options. Examples are given to illustrate how one could use the code to measure the efficiency of decision-making units. Copyright 2010 by StataCorp LP.
dea, data envelopment analysis, linear programming, nonparametric, efficiency, decision-making units
http://www.stata-journal.com/article.html?article=st0193
Yong-bae Ji
Choonjoo Lee
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:255-2702020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:255-270
article
Multivariate random-effects meta-regression: Updates to mvmeta
An extension of mvmeta, my program for multivariate random-effects meta-analysis, is described. The extension handles meta-regression. Estimation methods available are restricted maximum likelihood, maximum likelihood, method of moments, and fixed effects. The program also allows a wider range of models (Riley’s overall correlation model and structured between-studies covariance); better estimation (using Mata for speed and correctly allowing for missing data); and new postestimation facilities (I-squared, standard errors and confidence intervals for between-studies standard deviations and correlations, and identification of the best intervention). The program is illustrated using a multiple-treatments meta-analysis. Copyright 2011 by StataCorp LP.
mvmeta, meta-analysis, meta-regression, I-squared
http://www.stata-journal.com/article.html?article=st0156_1
Ian R. White
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:447-4532020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:447-453
article
A generalized Hosmer–Lemeshow goodness-of-fit test for multinomial logistic regression models
Testing goodness of fit is an important step in evaluating a statistical model. For binary logistic regression models, the Hosmer–Lemeshow goodness-of-fit test is often used. For multinomial logistic regression models, however, few tests are available. We present the mlogitgof command, which implements a goodness-of-fit test for multinomial logistic regression models. This test can also be used for binary logistic regression models, where it gives results identical to the Hosmer–Lemeshow test.
mlogitgof, goodness of fit, logistic regression, multinomial logistic regression, polytomous logistic regression
http://www.stata-journal.com/article.html?article=st0269
Morten W. Fagerland
David W. Hosmer
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:271-2892020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:271-289
article
M statistic commands: Interpoint distance distribution analysis
We implement the commands mstat and mtest to perform inference based on the M statistic, a statistic that can be used to compare the interpoint distance distribution across groups of observations. The analyses are based on the study of the interpoint distances between n points in a k-dimensional setting to produce a one-dimensional real-valued test statistic. The locations are distributed in a region of the plane. When we consider all n(n-1)/2 interpoint distances, the dependencies among them are difficult to express analytically, but their distribution is informative, and the M statistic can be built to summarize one aspect of this information. The two commands can be used on a wide class of datasets to test the null hypothesis that two groups have the same (spatial) distribution. mstat and mtest return the exact M test statistic. Moreover, mtest executes a Monte Carlo–type permutation test, which returns the empirical p-value together with its confidence interval. This is the command to use in most situations, because the convergence of M to its asymptotic chi-squared distribution is slow. Both commands can be used to obtain graphical output of the empirical density function of the interpoint distance distributions in the two groups and the two-dimensional map of the n observations in the plane. The descriptions of the commands are accompanied by examples of applications with real and simulated data. We run the test on the Alt and Vach grave site dataset (Manjourides and Pagano, forthcoming, Statistics in Medicine) and reject the null hypothesis, in contradiction to other published analyses. We also show how to adapt the techniques to discrete datasets with more than one unit in each location. Finally, we report an extensive application on breast cancer data in Massachusetts; in the application, we show the compatibility of the M commands with Pisati's spmap package. Copyright 2011 by StataCorp LP.
mstat, mtest, M statistic, interpoint distance, Monte Carlo test, spmap
http://www.stata-journal.com/article.html?article=st0228
Pietro Tebaldi
Marco Bonetti
Marcello Pagano
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:126-1422020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:126-142
article
Speaking Stata: MMXI and all that: Handling Roman numerals within Stata
The problem of handling Roman numerals in Stata is used to illustrate issues arising in the handling of classification codes in character string form and their numeric equivalents. The solutions include Stata programs and Mata functions for conversion from numeric to string and from string to numeric. Defining acceptable input and trapping and flagging incorrect or unmanageable inputs are key concerns in good practice. Regular expressions are especially valuable for this problem.
fromroman, toroman, Roman numerals, strings, regular expressions, Mata, data management
http://www.stata-journal.com/article.html?article=dm0054
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:339-3582020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:339-358
article
Comparing the predictive powers of survival models using Harrell’s C or Somers’ D
Medical researchers frequently make statements that one model predicts survival better than another, and they are frequently challenged to provide rigorous statistical justification for those statements. Stata provides the estat concordance command to calculate the rank parameters Harrell’s C and Somers’ D as measures of the ordinal predictive power of a model. However, no confidence limits or p-values are provided to compare the predictive power of distinct models. The somersd package, downloadable from Statistical Software Components, can provide such confidence intervals, but they should not be taken seriously if they are calculated in the dataset in which the model was fit. Methods are demonstrated for fitting alternative models to a training set of data, and then measuring and comparing their predictive powers by using out-of-sample prediction and somersd in a test set to produce statistically sensible confidence intervals and p-values for the differences between the predictive powers of different models. Copyright 2010 by StataCorp LP.
somersd, stcox, estat concordance, streg, predict, survival, model validation, prediction, concordance, rank methods, Harrell’s C, Somers’ D
http://www.stata-journal.com/article.html?article=st0198
Roger B. Newson
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:147-1582020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:147-158
article
Speaking Stata: Output to order
Wanting to present Stata output in a different way is a very common desire that lies behind many substantial user-written programs. Here I start at the beginning with some basic tips and tricks. I discuss using display to replay the results you want; putting results into new variables so that they can be listed, tabulated, or plotted; and using existing reduction commands to produce new datasets containing results.
collapse, contract, display, egen, estimates, foreach, format, forvalues, generate, list, postfile, quietly, replace, saved results, SMCL, statsby, tabdisp
http://www.stata-journal.com/article.html?article=pr0053
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:3252020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:325
article
Software updates
Updates for a previously published package are provided. Copyright 2011 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:281-2962020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:281-296
article
Speaking Stata: Finding variables
A standard task in data management is producing a list of variable names showing which variables have specific properties, such as being of string type, or having value labels attached, or having a date format. A first-principles strategy is to loop over variables, checking one by one which have the property or properties concerned. I discuss this strategy in detail with a variety of examples. A canned alternative is offered by the official command ds or by a new command, findname, published formally with this column. Copyright 2010 by StataCorp LP.
findname, ds, variable names, variable labels, value labels, types, formats, characteristics, data management
http://www.stata-journal.com/article.html?article=dm0048
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:169-1812020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:169-181
article
A robust instrumental-variables estimator
The classical instrumental-variables estimator is extremely sensitive to the presence of outliers in the sample. This is a concern because outliers can strongly distort the estimated effect of a given regressor on the dependent variable. Although outlier diagnostics exist, they frequently fail to detect atypical observations because they are themselves based on nonrobust (to outliers) estimators. Furthermore, they do not take into account the combined influence of outliers in the first and second stages of the instrumental-variables estimator. In this article, we present a robust instrumental-variables estimator, initially proposed by Cohen Freue, Ortiz-Molina, and Zamar (2011, Working paper: http://www.stat.ubc.ca/˜ruben/website/cv/cohen-zamar.pdf ), that we have programmed in Stata and made available via the robivreg command. We have improved on their estimator in two different ways. First, we use a weighting scheme that makes our estimator more efficient and allows the computation of the usual identification and overidentifying restrictions tests. Second, we implement a generalized Hausman test for the presence of outliers. Copyright 2012 by StataCorp LP.
robivreg, multivariate outliers, robustness, S-estimator, instrumental variables
http://www.stata-journal.com/article.html?article=st0252
Rodolphe Desbordes
Vincenzo Verardi
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:650-6692020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:650-669
article
Variable selection in linear regression
We present a new Stata program, vselect, that helps users perform variable selection after performing a linear regression. Options for stepwise methods such as forward selection and backward elimination are provided. The user may specify Mallows’s Cp, Akaike’s information criterion, Akaike’s corrected information criterion, Bayesian information criterion, or R2 adjusted as the information criterion for the selection. When the user specifies the best subset option, the leaps-and-bounds algorithm (Furnival and Wilson, Technometrics 16: 499–511) is used to determine the best subsets of each predictor size. All the previously mentioned information criteria are reported for each of these subsets. We also provide options for doing variable selection only on certain predictors (as in [R] nestreg) and support for weighted linear regression. All options are demonstrated on real datasets with varying numbers of predictors.
vselect, variable selection, regress, nestreg
http://www.stata-journal.com/article.html?article=st0213
Charles Lindsey
Simon Sheather
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:11-292020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:11-29
article
Direct and indirect effects in a logit model
In this article, I discuss a method by Erikson et al. (2005, Proceedings of the National Academy of Sciences 102: 9730–9733) for decomposing a total effect in a logit model into direct and indirect effects. Moreover, I extend this method in three ways. First, in the original method the variable through which the indirect effect occurs is assumed to be normally distributed. In this article, the method is generalized by allowing this variable to have any distribution. Second, the original method did not provide standard errors for the estimates. In this article, the bootstrap is proposed as a method of providing those. Third, I show how to include control variables in this decomposition, which was not allowed in the original method. The original method and these extensions are implemented in the ldecomp command. Copyright 2010 by StataCorp LP.
ldecomp, mediation, intervening variable, logit, direct effect, indirect effect
http://www.stata-journal.com/article.html?article=st0182
Maarten L. Buis
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:1672020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:167
article
Software updates
Updates for a previously published package are provided.
Editors
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:1-292020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:1-29
article
A procedure to tabulate and plot results after flexible modeling of a quantitative covariate
The use of flexible models for the relationship between a quantitative covariate and the response variable can be limited by the difficulty in interpreting the regression coefficients. In this article, we present a new postestimation command, xblc, that facilitates tabular and graphical presentation of these relationships. Cubic splines are given special emphasis. We illustrate the command through several worked examples using data from a large study of Swedish men on the relation between physical activity and the occurrence of lower urinary tract symptoms. Copyright 2011 by StataCorp LP.
xblc, cubic spline, modeling strategies, logistic regression
http://www.stata-journal.com/article.html?article=st0215
Nicola Orsini
Sander Greenland
oai:RePEc:tsj:stataj:v:12:y:2012:i:1:p:29-442020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:1:p:29-44
article
Scrambled Halton sequences in Mata
In this article, I discuss the need for Halton sequences and describe a Mata implementation of scrambled Halton sequences, providing several examples of scrambling procedures. Copyright 2012 by StataCorp LP.
Halton sequence, quasi-Monte Carlo, scrambled Halton sequence
http://www.stata-journal.com/article.html?article=st0244
Stanislav Kolenikov
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:369-3852020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:369-385
article
simsum: Analyses of simulation studies including Monte Carlo error
A new Stata command, simsum, analyzes data from simulation studies. The data may comprise point estimates and standard errors from several analysis methods, possibly resulting from several different simulation settings. simsum can report bias, coverage, power, empirical standard error, relative precision, average model-based standard error, and the relative error of the standard error. Monte Carlo errors are available for all of these estimated quantities. Copyright 2010 by StataCorp LP.
simsum, simulation, Monte Carlo error, normal approximation, sandwich variance
http://www.stata-journal.com/article.html?article=st0200
Ian R. White
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:315-3302020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:315-330
article
An introduction to maximum entropy and minimum cross-entropy estimation using Stata
Maximum entropy and minimum cross-entropy estimation are applicable when faced with ill-posed estimation problems. I introduce a Stata command that estimates a probability distribution using a maximum entropy or minimum cross-entropy criterion. I show how this command can be used to calibrate survey data to various population totals. Copyright 2010 by StataCorp LP.
maxentropy, maximum entropy, minimum cross-entropy, survey calibration, sample weights
http://www.stata-journal.com/article.html?article=st0196
Martin Wittenberg
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:507-5392020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:507-539
article
A suite of commands for fitting the skew-normal and skew-t models
Nonnormal data arise often in practice, prompting the development of flexible distributions for modeling such situations. In this article, we describe two multivariate distributions, the skew-normal and the skew-t, which can be used to model skewed and heavy-tailed continuous data. We then discuss some inferential issues that can arise when fitting these distributions to real data. We also consider the use of these distributions in a regression setting for more flexible parametric modeling of the conditional distribution given other predictors. We present commands for fitting univariate and multivariate skew-normal and skew-t regressions in Stata (skewnreg, skewtreg, mskewnreg, and mskewtreg) as well as some postestimation features (predict and skewrplot). We also demonstrate the use of the commands for the analysis of the famous Australian Institute of Sport data and U.S. precipitation data. Copyright 2010 by StataCorp LP.
skewnreg, skewtreg, mskewnreg, mskewtreg, skewrplot, predict, distribution, heavy tails, nonnormal, precipitation, regression, skewness, skew-normal, skew-t
http://www.stata-journal.com/article.html?article=st0207
Yulia V. Marchenko
Marc G. Genton
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:345-3672020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:345-367
article
Nonparametric bounds for the causal effect in a binary instrumental-variable model
Instrumental variables can be used to make inferences about causal effects in the presence of unmeasured confounding. For a model in which the instrument, intermediate/treatment, and outcome variables are all binary, Balke and Pearl (1997, Journal of the American Statistical Association 92: 1172–1176) derived nonparametric bounds for the intervention probabilities and the average causal effect. We have implemented these bounds in two commands: bpbounds and bpboundsi. We have also implemented several extensions to these bounds. One of these extensions applies when the instrument and outcome are measured in one sample and the instrument and intermediate are measured in another sample. We have also implemented the bounds for an instrument with three categories, as is common in Mendelian randomization analyses in epidemiology and for the case where a monotonic effect of the instrument on the intermediate can be assumed. In each case, we calculate the instrumental-variable inequality constraints as a check for gross violations of the instrumental-variable conditions. The use of the commands is illustrated with a re-creation of the original Balke and Pearl analysis and with a Mendelian randomization analysis. We also give a simulated example to demonstrate that the instrumental-variable inequality constraints can both detect and fail to detect violations of the instrumental-variable conditions. Copyright 2011 by StataCorp LP.
bpbounds, bpboundsi, average causal effect, causal inference, instrumental variables, nonparametric bounds
http://www.stata-journal.com/article.html?article=st0232
Tom M. Palmer
Roland R. Ramsahai
Vanessa Didelez
Nuala A. Sheehan
oai:RePEc:tsj:stataj:v:11:y:2011:i:1:p:95-1052020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:1:p:95-105
article
Right-censored Poisson regression model
I present the rcpoisson command for right-censored count-data models with a constant (Terza 1985, Economics Letters 18: 361–365) and variable censoring threshold (Caudill and Mixon 1995, Empirical Economics 20: 183–196). I show the effects of censoring on estimation results by comparing the censored Poisson model with the uncensored one. Copyright 2011 by StataCorp LP.
rcpoisson, censoring, count data, Poisson model
http://www.stata-journal.com/article.html?article=st0219
Rafal Raciborski
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:82-1032020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:82-103
article
Centering and reference groups for estimates of fixed effects: Modifications to felsdvreg
Availability of large, multilevel longitudinal databases in various fields including labor economics (with workers and firms observed over time) and education research (with students and teachers observed over time) has increased the application of panel-data models with multiple levels of fixed effects. Existing software routines for fitting fixed-effects models were not designed for applications in which the primary interest is obtaining estimates of any of the fixed-effects parameters. Such routines typically report estimates of fixed effects relative to arbitrary holdout units. Contrasts to holdout units are not ideal in cases where the fixed-effects parameters are of interest because they can change capriciously, they do not correspond to the structural parameters that are typically of interest, and they are inappropriate for empirical Bayes (shrinkage) estimation. We develop an improved parameterization of fixed-effects models using sum-to-zero constraints that provides estimates of fixed effects relative to mean effects within well-defined reference groups (e.g., all firms of a given type or all teachers of a given grade) and provides standard errors for those estimates that are appropriate for shrinkage estimation. We implement our parameterization in a Stata routine called felsdvregdm by modifying the felsdvreg routine designed for fitting high-dimensional fixed-effects models. We demonstrate our routine with an example dataset from the Florida Education Data Warehouse. Copyright 2010 by StataCorp LP.
felsdvreg, felsdvregdm, fixed effects, linked employer–employee data, longitudinal achievement data
http://www.stata-journal.com/article.html?article=st0185
Kata Mihaly
Daniel F. McCaffrey
J. R. Lockwood
Tim R. Sass
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:125-1422020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:125-142
article
Mata Matters: Stata in Mata
Mata is Stata’s matrix language. In the Mata Matters column, we show how Mata can be used interactively to solve problems and as a programming language to add new features to Stata. The subject of this column is using Mata to solve data analysis problems with the new Stata commands putmata and getmata, which were added to official Stata 11 in the update of 11 February 2010.
Mata, getmata, putmata
http://www.stata-journal.com/article.html?article=pr0050
William Gould
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:4782020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:478
article
Software updates
Updates for a previously published package are provided. Copyright 2011 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:257-2832020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:257-283
article
Threshold regression for time-to-event analysis: The stthreg package
In this article, we introduce the stthreg package of Stata commands to fit the threshold regression model, which is based on the first hitting time of a boundary by the sample path of a Wiener diffusion process and is well suited to applications involving time-to-event and survival data. The threshold regression model serves as an important alternative to the Cox proportional hazards model. The four commands that comprise this package for the threshold regression model are the model-fitting command stthreg, the postestimation command trhr for hazard-ratio calculation, the postestimation command trpredict for prediction, and the model diagnostics command sttrkm. These commands can also be used to implement an extended threshold regression model that accommodates applications where a cure rate exists. Copyright 2012 by StataCorp LP.
stthreg, trhr, trpredict, sttrkm, bootstrap, Cox proportional hazards regression, cure rate, first hitting time, hazard ratios, model diagnostics, survival analysis, threshold regression, time-to-event data, Wiener diffusion process
http://www.stata-journal.com/article.html?article=st0257
Tao Xiao
G. A. Whitmore
Xin He
Mei-Ling Ting Lee
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:200-2142020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:200-214
article
Optimal power transformation via inverse response plots
We present a new Stata command, irp, that generates the inverse response plot (Cook and Weisberg, 1994, Biometrika 81: 731–737) of a response on its predictors. Using the inverse response plot, an appropriate scaled power transformation for the positive response variable can be found so that the transformed response mean is linear in the predictors. The optimal transformation is displayed in the plot, as are user-specified guesses. By using the graphical display, the user may determine whether an appropriate transformation exists as well as determine its value. We demonstrate the irp command using both a generated and a real example. Copyright 2010 by StataCorp LP.
irp, regress, scaled power transformation
http://www.stata-journal.com/article.html?article=st0188
Charles Lindsey
Simon Sheather
oai:RePEc:tsj:stataj:v:11:y:2011:i:2:p:290-2982020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:2:p:290-298
article
Estimating adjusted risk ratios for matched and unmatched data: An update
The Stata 11 margins command makes it easier to estimate adjusted risk ratios, and the new robust variance option for xtpoisson, fe provides correct confidence intervals for adjusted risk ratios from matched-cohort data. Copyright 2011 by StataCorp LP.
conditional Poisson regression, margins, matched-cohort design, risk ratio, standardization, xtpoisson
http://www.stata-journal.com/article.html?article=st0162_1
Peter Cummings
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:353-3672020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:353-367
article
Diagnostics for multiple imputation in Stata
Our new command midiagplots makes diagnostic plots for multiple imputations created by mi impute. The plots compare the distribution of the imputed values with that of the observed values so that problems with the imputation model can be corrected before the imputed data are analyzed. We include an example and suggest extensions to other diagnostics. Copyright 2012 by StataCorp LP.
midiagplots, multiple imputation, diagnostics, model checking, imputed values, missing data, missing at random
http://www.stata-journal.com/article.html?article=st0263
Wesley Eddings
Yulia Marchenko
oai:RePEc:tsj:stataj:v:11:y:2011:i:3:p:460-4712020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:3:p:460-471
article
Speaking Stata: Fun and fluency with functions
Functions are the unsung heroes of Stata. This column is a tour of functions that might easily be missed or underestimated, with a potpourri of tips, tricks, and examples for a wide range of basic problems.
functions, numeric, string
http://www.stata-journal.com/article.html?article=dm0058
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:543-5482020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:543-548
article
Kernel-smoothed cumulative distribution function estimation with akdensity
In this article, I describe estimation of the kernel-smoothed cumulative distribution function with the user-written package akdensity, with formulas and an example.
akdensity, smoothed cumulative distribution function, kernel functions
http://www.stata-journal.com/article.html?article=st0037_3
Philippe Van Kerm
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:393-4052020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:393-405
article
Adjusting for age effects in cross-sectional distributions
Income and wealth differ over the life cycle. In cross-sectional distributions of income or wealth, classical inequality measures such as the Gini could therefore find substantial inequality even if everyone has the same lifetime income or wealth. We describe the adjusted Gini index (Almas and Mogstad, 2012, Scandinavian Journal of Economics 114: 24–54), which is a generalization of the classical Gini index with attractive properties, and we describe the adgini command, which provides the adjusted Gini index and the classical Gini index. The adgini command also provides options to produce other well-known age-adjusted inequality measures, such as the Paglin-Gini (Paglin, 1975, American Economic Review 65: 598–609) and the Wertz-Gini (Wertz, 1979, American Economic Review 69: 670–672), and provides efficient estimation of the classical Gini coefficient. Copyright 2012 by StataCorp LP.
adgini, inequality, life cycle, age adjustments, Gini coefficient
http://www.stata-journal.com/article.html?article=st0266
Ingvild Almas
Tarjei Havnes
Magne Mogstad
oai:RePEc:tsj:stataj:v:11:y:2011:i:4:p:620-6262020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:11:y:2011:i:4:p:620-626
article
Importing financial data
In this article, we describe two commands, fetchyahooquotes and fetchyahookeystats, that import historical financial data and key current financial statistics from Yahoo! Finance.
fetchyahooquotes, fetchyahookeystats, finance, financial data, stocks, exchange-traded funds, historical data, Yahoo! Finance
http://www.stata-journal.com/article.html?article=dm0061
Mehmet F. Dicle
John Levendis
oai:RePEc:tsj:stataj:v:12:y:2012:i:3:p:562-5642020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:3:p:562-564
article
Review of Interpreting and Visualizing Regression Models Using Stata by Michael N. Mitchell
In this article, I review Interpreting and Visualizing Regression Models Using Stata, by Michael Mitchell (2012a [Stata Press]).
graphics, regression, piecewise regression, visualizing interaction, multilevel/longitudinal, marginsplot, interpreting models
http://www.stata-journal.com/article.html?article=gn0053
Alan C. Acock
oai:RePEc:tsj:stataj:v:10:y:2010:i:1:p:46-602020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:1:p:46-60
article
Tabulating SPost results using estout and esttab
The SPost user package (Long and Freese, 2006, Regression Models for Categorical Dependent Variables Using Stata [Stata Press]) is a suite of postestimation commands to compute additional tests and effects representations for a variety of regression models. To facilitate and automate the task of tabulating results from SPost commands for inclusion in reports, publications, and presentations, we introduce tools to integrate SPost with the estout user package (Jann, 2005, Stata Journal 5: 288–308; 2007, Stata Journal 7: 227–244). The estadd command can retrieve results computed by the SPost commands brant, fitstat, listcoef, mlogtest, prchange, prvalue, and asprvalue. These results can then be tabulated by esttab or estout. Copyright 2010 by StataCorp LP.
SPost, regression table, estadd, estout, esttab, brant, fitstat, listcoef, mlogtest, prchange, prvalue, asprvalue
http://www.stata-journal.com/article.html?article=st0183
Ben Jann
J. Scott Long
oai:RePEc:tsj:stataj:v:12:y:2012:i:2:p:214-2412020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:12:y:2012:i:2:p:214-241
article
Menu-driven X-12-ARIMA seasonal adjustment in Stata
The X-12-ARIMA software of the U.S. Census Bureau is one of the most popular methods for seasonal adjustment; the program x12a.exe is widely used around the world. Some software also provides X-12-ARIMA seasonal adjustments by using x12a.exe as a plug-in or externally. In this article, we illustrate a menu-driven X-12-ARIMA seasonal-adjustment method in Stata. Specifically, the main utilities include how to specify the input file and run the program, how to make a diagnostics table, how to import data, and how to make graphs. Copyright 2012 by StataCorp LP.
sax12, sax12diag, sax12im, sax12del, seasonal adjustment, X-12-ARIMA, menu-driven
http://www.stata-journal.com/article.html?article=st0255
Qunyong Wang
Na Wu
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:670-6812020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:670-681
article
Speaking Stata: Graphing subsets
Graphical comparison of results for two or more groups or subsets can be accomplished by way of subdivision, superimposition, or juxtaposition. The choice between superimposition (several groups in one panel) and juxtaposition (several groups in several panels) can require fine discrimination: while juxtaposition increases clarity, it requires mental superimposition to be most effective. Discussion of this dilemma leads to exploration of a compromise design in which each subset is plotted in a separate panel, with the rest of the data as a backdrop. Univariate and bivariate examples are given, and associated Stata coding tips and tricks are commented on in detail.
graphics, subdivision, superimposition, juxtaposition, quantile plots, Gumbel distribution, scatterplots
http://www.stata-journal.com/article.html?article=gr0046
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:10:y:2010:i:4:p:691-6922020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:4:p:691-692
article
Software updates
Updates for a previously published package are provided. Copyright 2010 by StataCorp LP.
Editors
oai:RePEc:tsj:stataj:v:10:y:2010:i:2:p:226-2512020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:2:p:226-251
article
Analyzing longitudinal data in the presence of informative drop-out: The jmre1 command
Many studies in various research areas have designs that involve repeated measurements over time of a continuous variable across a group of subjects. A frequent and serious problem in such studies is the occurrence of missing data. In many cases, missing data are caused by an event that leads to a premature termination of the series of repeated measurements on some subjects. When the probability of the occurrence of this event is related to the subject-specific underlying trend of the variable of interest, this missingness process is called informative censoring or informative drop-out. Standard likelihood-based methods (for example, linear mixed models) fail to give consistent estimates. In such cases, one needs to apply methods that simultaneously model the observed data and the missingness process. In this article, we review a method proposed by Touloumi et al. (1999, Statistics in Medicine 18: 1215–1233) to adjust for informative drop-out in longitudinal data analysis. We also present the jmre1 command, which can be used to fit the proposed model. The estimation method combines the restricted iterative generalized least-squares method with a nested expectation-maximization algorithm. The method is implemented mainly using Stata’s matrix programming language, Mata. Our example is derived from the epidemiology of the HIV infection. Copyright 2010 by StataCorp LP.
jmre1, jmre1_p, datajoint1, missing data, informative censoring, informative drop-out, longitudinal data
http://www.stata-journal.com/article.html?article=st0190
Nikos Pantazis
Giota Touloumi
oai:RePEc:tsj:stataj:v:10:y:2010:i:3:p:458-4812020-08-27RePEc:tsj:stataj
RePEc:tsj:stataj:v:10:y:2010:i:3:p:458-481
article
Translation from narrative text to standard codes variables with Stata
In this article, we describe screening, a new Stata command for data management that can be used to examine the content of complex narrative-text variables to identify one or more user-defined keywords. The command is useful when dealing with string data contaminated with abbreviations, typos, or mistakes. A rich set of options allows a direct translation from the original narrative string to a user-defined standard coding scheme. Moreover, screening is flexible enough to facilitate the merging of information from different sources and to extract or reorganize the content of string variables.
screening, keyword matching, narrative-text variables, standard coding schemes
http://www.stata-journal.com/article.html?article=dm0050
Federico Belotti
Domenico Depalo
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:30-362020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:30-36
article
Regressions are commonly misinterpreted: A rejoinder
http://www.stata-journal.com/article.html?article=st0419
David C. Hoaglin
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:96-1112020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:96-111
article
bireprob: An estimator for bivariate random-effects probit models
I present the bireprob command, which fits a bivariate random-effects probit model. bireprob enables a researcher to estimate two (seemingly unrelated) nonlinear processes and to control for interrelations between their unobservables. The estimator uses quasirandom numbers (Halton draws) and maximum simulated likelihood to estimate the correlation between the error terms of both processes. The application of bireprob is illustrated in two examples: the first one uses artificial data, and the second one uses real data. Finally, in a simulation, the performance of the estimator is tested and compared with the official Stata command xtprobit. Copyright 2016 by StataCorp LP.
bireprob, bivariate random-effects probit, maximum simulated likelihood, Halton draws
http://www.stata-journal.com/article.html?article=st0426
Alexander Plum
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:52-712020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:52-71
article
diff: Simplifying the estimation of difference-in-differences treatment effects
In this article, I present the features of the user-written command diff, which estimates difference-in-differences (DID) treatment effects. diff simplifies the DID analysis by allowing the conventional DID setting to be combined with other nonexperimental evaluation methods. The command is equipped with an attractive set of options: the single DID with covariates, the kernel propensity-score matching DID, and the quantile DID. Specific options are included to obtain DID estimation in a repeated cross-section setting and to test the general balancing properties of the model. I illustrate the features of diff using a sample of the dataset from the pioneering implementation of DID by Card and Krueger (1994, American Economic Review 84: 772–793). Copyright 2016 by StataCorp LP.
diff, difference-in-differences, causal inference, kernel propensity score, quantile treatment effects, nonexperimental methods, DID, QDID
http://www.stata-journal.com/article.html?article=st0424
Juan M. Villa
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:88-952020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:88-95
article
Quantifying the uptake of user-written commands over time
A major factor in the uptake of new statistical methods is the availability of user-friendly software implementations. One attractive feature of Stata is that users can write their own commands and release them to other users via Statistical Software Components at Boston College. Authors of statistical programs do not always get adequate credit, because programs are rarely cited properly. There is no obvious measure of a program’s impact, but researchers are under increasing pressure to demonstrate the impact of their work to funders. In addition to encouraging proper citation of software, the number of downloads of a user-written package can be regarded as a measure of impact over time. In this article, we explain how such information can be accessed for any month from July 2007 and summarized using the new ssccount command. Copyright 2016 by StataCorp LP.
ssccount, SSC, impact
http://www.stata-journal.com/article.html?article=dm0086
Babak Choodari-Oskooei
Tim P. Morris
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:237-2422020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:237-242
article
Review of Michael N. Mitchell’s Stata for the Behavioral Sciences
Philip B. Ender
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:72-872020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:72-87
article
mfpa: Extension of mfp using the ACD covariate transformation for enhanced parametric multivariable modeling
In a recent article, Royston (2015, Stata Journal 15: 275–291) introduced the approximate cumulative distribution (ACD) transformation of a continuous covariate x as a route toward modeling a sigmoid relationship between x and an outcome variable. In this article, we extend the approach to multivariable modeling by modifying the standard Stata program mfp. The result is a new program, mfpa, that has all the features of mfp plus the ability to fit a new model for user-selected covariates that we call FP1(p1, p2). The FP1(p1, p2) model comprises the best-fitting combination of a dimension-one fractional polynomial (FP1) function of x and an FP1 function of ACD(x). We describe a new model-selection algorithm called function-selection procedure with ACD transformation, which uses significance testing to attempt to simplify an FP1(p1, p2) model to a submodel, an FP1 or linear model in x or in ACD(x). The function-selection procedure with ACD transformation is related in concept to the FSP (FP function-selection procedure), which is an integral part of mfp and which is used to simplify a dimension-two (FP2) function. We describe the mfpa command and give univariable and multivariable examples with real data to demonstrate its use. Copyright 2016 by StataCorp LP.
mfpa, mfp, continuous covariates, sigmoid function, ACD transformation, multivariable fractional polynomials, regression models
http://www.stata-journal.com/article.html?article=st0425
Patrick Royston
Willi Sauerbrei
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:23-242020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:23-24
article
Regressions are commonly misinterpreted: Comments on the article
http://www.stata-journal.com/article.html?article=st0420
James W. Hardin
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:2442020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:244
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:197-2282020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:197-228
article
Implementing factor models for unobserved heterogeneity in Stata
We introduce a new command, heterofactor, for the maximum likelihood estimation of models with unobserved heterogeneity, including a Roy model. heterofactor fits models with up to four latent factors and allows the unobserved heterogeneity to follow general distributions. Our command differs from Stata’s sem command in that it does not rely on the linearity of the structural equations and distributional assumptions for identification of the unobserved heterogeneity. It uses the estimated distributions to numerically integrate over the unobserved factors in the outcome equations by using a mixture of normals in a Gauss–Hermite quadrature. heterofactor delivers consistent estimates, including the unobserved factor loadings, in a variety of model structures. Copyright 2016 by StataCorp LP.
heterofactor, unobserved heterogeneity, factor models, Roy model, maximum likelihood, numerical integration
http://www.stata-journal.com/article.html?article=st0431
Miguel Sarzosa
Sergio Urzúa
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:37-512020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:37-51
article
Estimation of multivariate probit models via bivariate probit
In this article, I suggest the utility of fitting multivariate probit models using a chain of bivariate probit estimators. This approach is based on Stata’s biprobit and suest commands and is driven by a Mata function, bvpmvp(). I discuss two potential advantages of the approach over the mvprobit command (Cappellari and Jenkins, 2003, Stata Journal 3: 278–294): significant reductions in computation time and essentially unlimited dimensionality of the outcome set. Computation time is reduced because the approach does not rely on simulation methods; unlimited dimensionality arises because only pairs of outcomes are considered at each estimation stage. This approach provides a consistent estimator of all the multivariate probit model’s parameters under the same assumptions required for consistent estimation via mvprobit, and simulation exercises I provide suggest no loss of estimator precision relative to mvprobit. Copyright 2016 by StataCorp LP.
bvpmvp(), bvopmvop(), multivariate probit models, bivariate probit
http://www.stata-journal.com/article.html?article=st0423
John Mullahy
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:229-2362020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:229-236
article
Speaking Stata: Truth, falsity, indication, and negation
Many problems in Stata call for selection of observations according to true or false conditions, indicator variables flagging the same, groupwise calculations, or a prearranged sort order. The example of finding the first (earliest) and last nonmissing value in panel or longitudinal data is used to explain and explore these devices and how they may be used together. Negating an indicator variable has the special virtue that selected observations may be sorted easily to the top of the dataset. Copyright 2016 by StataCorp LP.
true or false, logical, Boolean, indicator variable, dummy variable, sort, by, panel data, longitudinal data, programming, data management
http://www.stata-journal.com/article.html?article=dm0087
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:159-1842020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:159-184
article
bicop: A command for fitting bivariate ordinal regressions with residual dependence characterized by a copula function and normal mixture marginals
In this article, we describe a new Stata command, bicop, for fitting a model consisting of a pair of ordinal regressions with a flexible residual distribution, with each marginal distribution specified as a two-part normal mixture, and stochastic dependence governed by a choice of copula functions. The bicop command generalizes the existing biprobit and bioprobit commands, which assume a bivariate normal residual distribution. We present and explain the bicop estimation command and the available postestimation commands using data on financial well-being from the UK Understanding Society Panel Survey. Copyright 2016 by StataCorp LP.
bicop, bivariate ordinal regression, copula, mixture model
http://www.stata-journal.com/article.html?article=st0429
Mónica Hernández-Alava
Stephen Pudney
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:112-1382020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:112-138
article
conindex: Estimation of concentration indices
Concentration indices are frequently used to measure inequality in one variable over the distribution of another. Most commonly, they are applied to the measurement of socioeconomic-related inequality in health. We introduce the user-written command conindex, which provides point estimates and standard errors of a range of concentration indices. The command also graphs concentration curves (and Lorenz curves) and performs statistical inference for the comparison of inequality between groups. We offer an accessible introduction to the various concentration indices that have been proposed to suit different measurement scales and ethical responses to inequality. We also demonstrate the command’s capabilities and syntax by analyzing wealth-related inequality in health and health care in Cambodia. Copyright 2016 by StataCorp LP.
conindex, inequality, rank-dependent indices, concentration index, health, health care
http://www.stata-journal.com/article.html?article=st0427
Owen O’Donnell
Stephen O’Neill
Tom Van Ourti
Brendan Walsh
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:139-1582020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:139-158
article
Estimating polling accuracy in multiparty elections using surveybias
Any rigorous discussion of bias in opinion surveys requires a scalar measure of survey accuracy. Martin, Traugott, and Kennedy (2005, Public Opinion Quarterly 69: 342–369) propose such a measure A for the two-party case, and Arzheimer and Evans (2014, Political Analysis 22: 31–44) demonstrate how measures A′i, B, and Bw for the more common multiparty case can be derived. We describe the commands surveybias, surveybiasi, and surveybiasseries, which enable the fast computation of these binomial and multinomial measures of bias in opinion surveys. While the examples are based on pre-election surveys, the methodology applies to any multinomial variable whose true distribution in the population is known (for example, through census data). Copyright 2016 by StataCorp LP.
surveybias, surveybiasi, surveybiasseries, multinomial variables, surveys, survey bias
http://www.stata-journal.com/article.html?article=st0428
Kai Arzheimer
Jocelyn Evans
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:3-42020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:3-4
article
16 and all that
http://www.stata-journal.com/article.html?article=gn0068
Editors
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:5-222020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:5-22
article
Regressions are commonly misinterpreted
Much literature misinterprets results of fitting multivariable models for linear regression, logistic regression, and other generalized linear models, as well as for survival, longitudinal, and hierarchical regressions. For the leading case of multiple regression, regression coefficients can be accurately interpreted via the added-variable plot. However, a common interpretation does not reflect the way regression methods actually work. Additional support for the correct interpretation comes from examining regression coefficients in multivariate normal distributions and from the geometry of least squares. To properly implement multivariable models, one must be cautious when calculating predictions that average over other variables, as in the Stata command margins. Copyright 2016 by StataCorp LP.
regression models, added-variable plot, multivariate normal distribution, geometry of least squares, margins command
http://www.stata-journal.com/article.html?article=st0419
David C. Hoaglin
oai:RePEc:tsj:stataj:v:16:y:2016:i:1:p:185-1962020-01-23RePEc:tsj:stataj
RePEc:tsj:stataj:v:16:y:2016:i:1:p:185-196
article
Features of the area under the receiver operating characteristic (ROC) curve. A good practice.
The area under the receiver operating characteristic (ROC) curve is a measure of discrimination ability used in diagnostic and prognostic research. The ROC plot is usually represented without additional information about decision thresholds used to generate the graph. In our article, we show that adding one or more informative cutoff points on the ROC graph facilitates the characterization of the test and the evaluation of the discriminatory capacities, which can result in more informed medical decisions. We use the rocreg and rocregplot commands. Copyright 2016 by StataCorp LP.
receiver operating characteristic (ROC) curve, area under the ROC curve, cervix cancer, diagnostic test, discrimination, prognostic models, rocreg, rocregplot
http://www.stata-journal.com/article.html?article=st0430
David Lora
Israel Contador
José F. Pérez-Regadera
Agustín Gómez de la Cámara
oai:RePEc:tsj:stataj:y:18:y:2018:i:3:p:503-5162019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:y:18:y:2018:i:3:p:503-516
article
Graphing each individual’s data over time
Graphing each individual’s data over time (in separate graphs) can be a worthwhile approach in exploring longitudinal and panel datasets. This especially applies for datasets where several variables change over time and where there are many possible time points, for example, administrative datasets and patient safety profiles in clinical trials. Studying a few individuals’ graphs closely can provide insight into the nature and quality of the data, generate hypotheses, and inform data analysis. Selecting a few typical or unusual graphs can make for powerful presentations at meetings. I give examples of graphing a single variable and multiple variables over time for each individual, and I detail associated Stata coding tips and tricks.
longitudinal data, panel data, time series, graphics, twoway, scatter, putdocx, superimposition, xtline, patient profile, profile plot, developmental trajectory
http://www.stata-journal.com/article.html?article=gr0074
Mark D. Chatfield
oai:RePEc:tsj:stataj:y:18:y:2018:i:3:p:692-7152019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:y:18:y:2018:i:3:p:692-715
article
giniinc: A Stata package for measuring inequality from incomplete income and survival data
Often, observed income and survival data are incomplete because of left- or right-censoring or left- or right-truncation. Measuring inequality (for instance, by the Gini index of concentration) from incomplete data like these will produce biased results. We describe the package giniinc, which contains three independent commands to estimate the Gini concentration index under different conditions. First, survgini computes a test statistic for comparing two (survival) distributions based on the nonparametric estimation of the restricted Gini index for right-censored data, using both asymptotic and permutation inference. Second, survbound computes nonparametric bounds for the unrestricted Gini index from censored data. Finally, survlsl implements maximum likelihood estimation for three commonly used parametric models to estimate the unrestricted Gini index, both from censored and truncated data. We briefly discuss the methods, describe the package, and illustrate its use through simulated data and examples from an oncology and a historical income study.
survgini, survbound, survlsl, Gini index, income distribution, inequality, survival analysis, censored data, truncated data
http://www.stata-journal.com/article.html?article=st0539
Long Hong
Guido Alfani
Chiara Gigliarano
Marco Bonetti
oai:RePEc:tsj:stataj:v:18:y:2018:i:4:p:9972019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:4:p:997
article
Software updates
Updates for previously published packages are provided.
Editors
oai:RePEc:tsj:stataj:v:18:y:2018:i:4:p:761-7642019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:4:p:761-764
article
The Stata Journal Editors’ Prize 2018: Federico Belotti
http://www.stata-journal.com/article.html?article=gn0077
H. Joseph Newton
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:18:y:2018:i:2:p:293-3262019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:2:p:293-326
article
Linear dynamic panel-data estimation using maximum likelihood and structural equation modeling
Panel data make it possible both to control for unobserved confounders and to include lagged, endogenous regressors. However, trying to do both simultaneously leads to serious estimation difficulties. In the econometric literature, these problems have been addressed by using lagged instrumental variables together with the generalized method of moments, while in sociology the same problems have been dealt with using maximum likelihood estimation and structural equation modeling. While both approaches have merit, we show that the maximum likelihood–structural equation models method is substantially more efficient than the generalized method of moments method when the normality assumption is met and that the former also suffers less from finite sample biases. We introduce the command xtdpdml, which has syntax similar to other Stata commands for linear dynamic panel-data estimation. xtdpdml greatly simplifies the structural equation model specification process; makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models; allows one to include time-invariant variables in the model, unlike most related methods; and takes advantage of Stata’s ability to use full-information maximum likelihood for dealing with missing data. The strengths and advantages of xtdpdml are illustrated via examples from both economics and sociology. Copyright 2018 by StataCorp LP.
xtdpdml, linear dynamic panel-data, structural equation modeling, maximum likelihood
http://www.stata-journal.com/article.html?article=st0523
http://www.stata-journal.com/software/sj18-2/st0523/
Richard Williams
Paul D. Allison
Enrique Moral-Benito
oai:RePEc:tsj:stataj:v:18:y:2018:i:2:p:432-4462019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:2:p:432-446
article
qfactor: A command for Q-methodology analysis
In this article, I introduce qfactor, a new command for Q-methodology analysis. Q-methodology is a combination of qualitative and quantitative techniques for studying subjectivity. Its quantitative component is based on a by-person factor analysis, usually followed by a factor-rotation technique. Currently, only a handful of programs with limited capability are available for Q-methodology analysis, and none of them are in the major commercial statistical programs such as Stata, SPSS, and SAS. qfactor offers an attractive set of options, including different factor-extraction and factor-rotation techniques in Stata. The use of qfactor is illustrated using a dataset representing 40 individuals’ perceptions on marijuana legalization.
qfactor, Q-methodology, by-person factor analysis, bipolar factor extraction
http://www.stata-journal.com/article.html?article=st0530
http://www.stata-journal.com/software/sj18-2/st0530/
Noori Akhtar-Danesh
oai:RePEc:tsj:stataj:v:18:y:2018:i:4:p:951-9802019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:4:p:951-980
article
Heteroskedasticity- and autocorrelation-robust F and t tests in Stata
In this article, we consider time-series, ordinary least-squares, and instrumental-variable regressions and introduce a new pair of commands, har and hart, that implement more accurate heteroskedasticity- and autocorrelation-robust (HAR) F and t tests. These tests represent part of the recent progress on HAR inference. The F and t tests are based on the convenient F and t approximations and are more accurate than the conventional chi-squared and normal approximations. The underlying smoothing parameters are selected to target the type I and type II errors, which are the two fundamental objects in every hypothesis testing problem. The estimation command har and the postestimation test command hart allow for both kernel HAR variance estimators and orthonormal-series HAR variance estimators. In addition, we introduce another pair of new commands, gmmhar and gmmhart, that implement the recently developed F and t tests in a two-step generalized method of moments framework. For these commands, we opt for the orthonormal-series HAR variance estimator based on the Fourier bases because it allows us to develop convenient F and t approximations as in the first-step generalized method of moments framework. Finally, we present several examples to demonstrate these commands.
har, hart, gmmhar, gmmhart, heteroskedasticity- and autocorrelation-robust inference, fixed-smoothing, kernel function, orthonormal series, testing-optimal, AMSE, OLS/IV, two-step GMM, J statistic
http://www.stata-journal.com/article.html?article=st0548
Xiaoqing Ye
Yixiao Sun
oai:RePEc:tsj:stataj:v:18:y:2018:i:4:p:863-8702019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:v:18:y:2018:i:4:p:863-870
article
sdmxuse: Command to import data from statistical agencies using the SDMX standard
In this article, I present the command sdmxuse, which allows users to download and import statistical data from international organizations using the Statistical Data and Metadata eXchange standard (SDMX). The data structure is reviewed to show how users can send specific queries and import only the required time series.
sdmxuse, data import, data management, European Central Bank, Eurostat, International Monetary Fund, Organisation for Economic Co- operation and Development, United Nations, World Bank
http://www.stata-journal.com/article.html?article=dm0097
Sébastien Fontenay
oai:RePEc:tsj:stataj:y:18:y:2018:i:4:p:981-9942019-01-03RePEc:tsj:stataj
RePEc:tsj:stataj:y:18:y:2018:i:4:p:981-994
article
Speaking Stata: Seven steps for vexatious string variables
String variables that seemingly should be numeric require some care. The column provides a step-by-step guide explaining how to convert them or—as the case may merit—to leave them as they are. Dates in string form, identifiers and categorical variables, and pure numeric content trapped in string form need different actions. Practical advice on good and not so good technique is sprinkled throughout.
string variables, numeric variables, dates, encode, egen, destring
http://www.stata-journal.com/article.html?article=dm0098
Nicholas J. Cox
oai:RePEc:tsj:stataj:v:18:y:2018:i:2:p:461-4762019-01-03RePEc:tsj:stataj