barkse-package {barkse}    R Documentation

Bayesian Additive Regression Kernels, Selection with Equal weights

Description

Implementation of BARK-SE for SNP applications, supporting both categorical and continuous covariates.

Details

Package: barkse
Type: Package
Version: 0.1-0
Date: 2008-09-04
License: GPL version 2 or newer
LazyLoad: yes

Overview:
BARK-SE is a customized version of BARK, a Bayesian sum-of-kernels model.
For numeric response y, we have y = f(x) + e, where e ~ N(0,sigma^2).
For a binary response y, P(Y=1 | x) = F(f(x)), where F denotes the standard normal cdf (probit link).
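
As a minimal illustration of the two response models above (not package code), the following base-R sketch simulates a numeric and a binary response from a mean function f. The function f, the sample size, and the value of sigma are assumptions chosen only for illustration; BARK-SE itself represents f as a sum of kernels.

```r
# Illustrative sketch only; uses base R, not the barkse package.
set.seed(1)
n <- 200
x <- runif(n, -2, 2)

# Hypothetical mean function, chosen only for this illustration.
f <- function(x) sin(x) + 0.5 * x

# Numeric response: y = f(x) + e, with e ~ N(0, sigma^2)
sigma <- 0.3
y.num <- f(x) + rnorm(n, mean = 0, sd = sigma)

# Binary response: P(Y = 1 | x) = F(f(x)), with F the standard normal cdf
p <- pnorm(f(x))
y.bin <- rbinom(n, size = 1, prob = p)
```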

The difference between the BARK-SE package and the BARK package is that BARK-SE only supports the selection-with-equal-weights scenario. However, BARK-SE accepts both categorical (ordered or non-ordered) and continuous covariates, while the current implementation of BARK only works for continuous variables.

The usage is also slightly different. BARK-SE works with a particular data class called "snpdata", which can be generated with the function make.snpdata(). See sim.snps for an example. If all variables are categorical, one can use the functions summarize.odds() and plot.odds() to view the marginal odds for each level of each variable. Continuous variables are coded as "-1" in the data type vector.
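
The type coding (-1 for continuous and, as listed in the Examples section, 1 for ordered/binary and 2 for non-ordered) can be illustrated with a base-R sketch. The helper guess.datatypes() and the toy data frame are hypothetical and are not part of barkse; they only show how such a vector might be derived from a data frame. The actual arguments of make.snpdata() should be checked in its own help page.

```r
# Hypothetical helper, not part of barkse: derive a data type vector
# using the coding -1: continuous; 1: ordered/binary; 2: non-ordered.
guess.datatypes <- function(df) {
  vapply(df, function(col) {
    if (is.factor(col)) {
      if (is.ordered(col) || nlevels(col) == 2) 1 else 2
    } else if (is.numeric(col)) {
      -1
    } else {
      stop("unsupported column type")
    }
  }, numeric(1))
}

# Toy data frame (assumed for illustration only)
df <- data.frame(
  SEX  = factor(c("F", "M", "M", "F")),
  AGE  = factor(c("a", "b", "c", "d"), ordered = TRUE),
  RACE = factor(c("r1", "r2", "r3", "r1")),
  BMI  = c(21.5, 30.1, 25.0, 27.3)
)
guess.datatypes(df)
```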

Functions:
barkse()
make.snpdata()
sim.snps()
summarize.odds()
plot.odds()

Author(s)

Zhi Ouyang <zo2@stat.duke.edu>

Maintainer: Zhi Ouyang <zo2@stat.duke.edu>

References

Ouyang, Zhi (2008) Bayesian Additive Regression Kernels. Ph.D. dissertation, Duke University, Chapter 3.
Available at: http://stat.duke.edu/people/theses/OuyangZ.html

Examples

## Simple Logistic Model for 6 Categorical Variables
#  1. Simulate the data
#  1.1 Specify the number of categories for all variables
#      Integers for categorical variables, scale (sd) for continuous variables
ncats <- c(2, 4, rep(3, 4))
#  1.2 Specify the types of the variables in generating the design table
#      -1: continuous; 1: ordered/binary; 2: dominant;
#       3: recessive;  4: non-ordered
simtypes <- c(1, 1, 4, 1, 2, 3)
#  1.3 Specify the types that go with the snpdata class (blind)
#      -1: continuous; 1: ordered/binary; 2: non-ordered
datatypes <- c(1, 1, 2, 2, 2, 2)
#  1.4 Specify the prior list for all variables
#      vector of probabilities for a categorical variable
#      center (mean) for a continuous variable (assuming a normal population)
priors <- list(c(.5, .5), c(.2, .3, .4, .1),
               c(.7, .2, .1), c(.4, .5, .1),
               c(.75, .2, .05), c(.6, .35, .05))
names <- c("SEX", "AGE", "RACE", "SNP1", "SNP2", "SNP3")
#  1.5 Specify which variables come into the logistic model
#      For simplicity, the logistic regression coefficients are 0 or 0.7
#      You may play with this vector to see different behaviour
delta <- c(1, 0, 1, 0, 1, 0)
#  1.6 Simulate the data from a logistic model
#      You may need to use make.snpdata() to generate the proper data format.
snpdata <- sim.snps(500, simtypes=simtypes,
                    datatypes=datatypes, ncats=ncats,
                    delta=delta, priors=priors, names=names)
summary(snpdata)
#  2. Fit the model
# Note: this step may take a while
fit <- barkse(snpdata, keepevery=20, printevery=2e2)
#  3. View summary statistics
boxplot(as.data.frame(fit$theta.lambda), ylim=c(0, 1))
# Note: this step may take a while
odds <- summarize.odds(fit$snpdata, fit$theta.list)
plot.odds(odds, snpdata$ncats, ylim=c(0, 3))

[Package barkse version 0.1-0 Index]