# Estimation and testing of high-dimensional differential correlation matrices

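Motivated by differential co-expression analysis in genomics, we consider in this paper estimation and testing of high-dimensional differential correlation matrices. Other related problems, including estimation of a single sparse correlation matrix, estimation of the differential covariance matrices, and estimation of the differential cross-correlation matrices, are also discussed.

Suppose we observe two i.i.d. random samples, $X^{(1)}$ and $X^{(2)}$, from two $p$-dimensional populations with correlation matrices $R_1$ and $R_2$. The goal is to estimate the differential correlation matrix $D = R_1 - R_2$. A particular focus of the paper is on estimating an approximately sparse differential correlation matrix in the high-dimensional setting where the dimension is much larger than the sample sizes, i.e., $p \gg \max(n_1, n_2)$.

After suitable standardization, each entry of the sample covariance matrix is approximately normal with mean 0 and variance 1. Hence, the standardizing quantity measures the uncertainty of the sample covariance. An analogous quantity measures the uncertainty of the entries of the sample correlation matrix, and both are estimated by data-driven quantities.

We consider thresholding functions $s_\lambda(z)$ satisfying the following conditions: (C1) $|s_\lambda(z)| \le c|y|$ for all $z, y$ satisfying $|z - y| \le \lambda$, for some $c > 0$; (C2) $s_\lambda(z) = 0$ for $|z| \le \lambda$; (C3) $|s_\lambda(z) - z| \le \lambda$ for all $z$. Note that the commonly used soft thresholding function $s_\lambda(z) = \mathrm{sgn}(z)(|z| - \lambda)_+$ satisfies these three conditions. See Rothman et al. (2009) and Cai and Liu (2011). The hard thresholding function $s_\lambda(z) = z \cdot 1\{|z| \ge \lambda\}$ does not satisfy (C1), although it is the rule used in the numerical example of Section 4.

The adaptive thresholding estimator of $D$ applies $s_\lambda$ entrywise to the difference of the two sample correlation matrices. The entrywise thresholds are given by (9), and the thresholding constant can be chosen empirically through cross-validation. See Section 4.1 for more discussion on the empirical choice of the thresholding constant.

We consider the class of pairs of correlation matrices with approximately sparse difference. Here $R_1, R_2 \succeq 0$ and $\mathrm{diag}(R_1) = \mathrm{diag}(R_2) = 1$ mean that $R_1$ and $R_2$ are symmetric, positive semidefinite, and have all diagonal entries equal to 1. For $(R_1, R_2)$ in this class, each row of $R_1 - R_2$ lies in an $\ell_q$ ball of bounded radius with $0 \le q < 1$. When $q = 0$, this constraint becomes the commonly used exact sparsity condition.

We assume that each sample is sub-Gaussian distributed, i.e., there exist positive constants such that a sub-Gaussian bound holds for every coordinate and for both samples. Under conditions (12) and (13), the theoretical guarantees are stated for the estimator defined in (2) and (10) with a thresholding constant larger than 4. As an illustration, when the two coordinates have a given correlation and are built from independent standard Gaussian variables, it is easy to calculate that the data-driven quantity is a good estimate of its population counterpart.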
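To make the thresholding step concrete, here is a minimal Python sketch of a thresholded estimate of the differential correlation matrix D. It is an illustration under simplifying assumptions, not the paper's estimator: the function name `differential_correlation`, the parameter name `tau`, and its default value are hypothetical, and a single universal threshold of order sqrt(log p / n) replaces the entrywise data-driven thresholds of (9), which are not reproduced here.

```python
import numpy as np


def hard_threshold(z, lam):
    """Hard thresholding: keep entries with |z| >= lam, zero the rest."""
    return np.where(np.abs(z) >= lam, z, 0.0)


def soft_threshold(z, lam):
    """Soft thresholding: shrink entries toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def differential_correlation(x1, x2, tau=2.0, rule=hard_threshold):
    """Thresholded difference of two sample correlation matrices.

    x1, x2 : arrays of shape (n1, p) and (n2, p), one observation per row.
    tau    : thresholding constant (hypothetical default, to be tuned).
    rule   : thresholding function applied entrywise.

    Assumption: one universal threshold is used for all entries,
    which is a simplification of entrywise adaptive thresholds.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2, p2 = x2.shape
    if p != p2:
        raise ValueError("both samples must have the same number of columns")

    # Sample correlation matrices (columns are variables).
    r1 = np.corrcoef(x1, rowvar=False)
    r2 = np.corrcoef(x2, rowvar=False)

    # Universal threshold combining the noise levels of both samples.
    lam = tau * (np.sqrt(np.log(p) / n1) + np.sqrt(np.log(p) / n2))

    d_hat = rule(r1 - r2, lam)
    # Both correlation matrices have unit diagonal, so D has zero diagonal.
    np.fill_diagonal(d_hat, 0.0)
    return d_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 20
    x1 = rng.standard_normal((n, p))
    x2 = rng.standard_normal((n, p))
    # Make variables 0 and 1 strongly correlated in the second sample only.
    x2[:, 1] = 0.9 * x2[:, 0] + np.sqrt(1 - 0.9**2) * rng.standard_normal(n)
    d_hat = differential_correlation(x1, x2)
    print("nonzero entries:", np.count_nonzero(d_hat))
```

Passing `rule=soft_threshold` swaps in the soft thresholding function. In practice the constant `tau` would be tuned on the data, for instance by cross-validation, rather than fixed in advance.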
Theorems 3.1 and 3.2 together yield the minimax rate of convergence. However, the theoretical choice of the thresholding constant (larger than 4) may not be optimal in finite-sample performance, as we can see in the following example. Let $R_1$ and $R_2$ be $200 \times 200$ matrices with prescribed entries, and generate 200 independent samples from the distributions of $X^{(1)}$ and $X^{(2)}$. For thresholding constants in $[0, 5]$, we implement the proposed method with hard thresholding and repeat the experiment 100 times. The average losses in the spectral, $\ell_1$, and Frobenius norms are shown in Figure 1. In this example, a thresholding constant larger than 4 is clearly not the best choice.

Figure 1: Average (spectral, $\ell_1$, Frobenius) norm losses for thresholding constants in $[0, 5]$.

We therefore suggest selecting the thresholding constant empirically, based on cross-validation. We begin by introducing the following cross-validation method. We first divide both samples randomly into two groups, and repeat this division a fixed number of times. For each of the two samples, the first group contains approximately $(K-1)/K$ of the observations and the second group approximately the remaining $1/K$, for a pre-specified integer $K$. The sample correlation matrices are then computed for all four sub-samples. Partition the interval $[0, 5]$ into an equi-spaced grid. For each value on the grid, we obtain the thresholding estimator defined in (2) and (10) with that thresholding constant based on the first sub-samples, and evaluate its loss against the second sub-samples, averaged over the random divisions. The value minimizing the average loss is taken as the empirical choice, and the final estimator is computed with this thresholding constant based on the whole samples $X^{(1)}$ and $X^{(2)}$.

## 4.2 Estimation of Differential Correlation Matrix

The adaptive thresholding estimator is easy to implement. We consider the following two models under which the differential correlation matrix is sparse.

Model 1 (Random Sparse Difference): $R_1$ is a fixed block-diagonal matrix built from a block $B_1$ and an identity matrix, where $B_{1,ij} = 1$ if $i = j$ and $B_{1,ij} = 0.2$ if $i \ne j$. $R_2$ is randomly generated, with a constant added that ensures the positive definiteness of $R_2$.

Model 2 (Banded Difference): In this setting, the difference between $R_1$ and $R_2$ is banded.

Under each model, two i.i.d. samples are generated from the corresponding distributions. The proposed estimator is compared with several alternatives. In the first, the sample covariance matrices are thresholded and then normalized to yield estimators of $R_1$ and $R_2$. In the second, $R_1$ and $R_2$ are estimated separately using the method proposed in Section 6, and the difference is then taken.
In another alternative, $D$ is estimated directly by the difference of the sample correlation matrices.

... produced by breast cancer cells induces expression of GDF5 in endothelial cells. The findings in Shy et al. (2013) suggested the important role played by TCF7L1 in breast cancer. Although these biological studies mainly focus on the behavior of single gene expression, ...