
https://cpw.cvlcollections.org/files/original/8724aa71a5ffa9d052db079ba395c254.pdf
e5590dc2841b048067d89383b1e4ba69
PDF Text
Text
The research in this publication was partially or fully funded by Colorado Parks and Wildlife.
Heather Disney Dugan, Acting Director, Colorado Parks and Wildlife • Parks and Wildlife Commission: Carrie Besnette Hauser, Chair • Dallas May, ViceChair • Marie Haskett, Secretary • Taishya Adams • Karen Michelle Bailey • Betsy Blecha • Gabriel Otero • Duke Phillips, IV • Richard Reading • James Jay
Tutchton • Eden Vardy
�www.nature.com/scientificreports
OPEN
Community confounding in joint
species distribution models
Justin J. Van Ee 1*, Jacob S. Ivan 2 & Mevin B. Hooten 3
Joint species distribution models have become ubiquitous for studying speciesenvironment
relationships and dependence among species. Accounting for community structure often improves
predictive power, but can also affect inference on speciesenvironment relationships. Specifically,
some parameterizations of joint species distribution models allow interspecies dependence
and environmental effects to explain the same sources of variability in species distributions, a
phenomenon we call community confounding. We present a method for measuring community
confounding and show how to orthogonalize the environmental and random species effects in suite
of joint species distribution models. In a simulation study, we show that community confounding can
lead to computational difficulties and that orthogonalizing the environmental and random species
effects can alleviate these difficulties. We also discuss the inferential implications of community
confounding and orthogonalizing the environmental and random species effects in a case study of
mammalian responses to the Colorado bark beetle epidemic in the subalpine forest by comparing
the outputs from occupancy models that treat species independently or account for interspecies
dependence. We illustrate how joint species distribution models that restrict the random species
effects to be orthogonal to the fixed effects can have computational benefits and still recover the
inference provided by an unrestricted joint species distribution model.
Historically, species distributions have been modeled independently from each other due to unavailability of
multispecies datasets and computational restraints. However, ecological datasets that provide insights about
collections of organisms have become prevalent over the last decade thanks to efforts like Long Term Ecological
Research Network (LTER), National Ecological Observatory Network (NEON), and citizen science s urveys1.
In addition, technology has improved our ability to fit modern statistical models to these datasets that account
for both species environmental preferences and interspecies dependence. These advancements have allowed
for the development of joint species distribution models (JSDM)2–4 that can model dependence among species
simultaneously with environmental drivers of occurrence and/or abundance.
Species distributions are shaped by both interspecies dynamics and environmental preferences5–8. JSDMs
integrate both sources of variability and adjust uncertainty to reflect that multiple confounded factors can
contribute to similar patterns in species distributions. Some have proposed that JSDMs not only account for biotic
interactions but also correct estimates of association between species distributions and environmental drivers3,9,
while others claim JSDMs cannot disentangle the roles of interspecies dependence and environmental d
rivers5.
We address why JSDMs can provide inference distinct from their concomitant independent SDMs, how certain
parameterizations of a JSDM induce confounding between the environmental and random species effects, and
when deconfounding these effects may be appealing for computation and interpretation.
Because of the prevalence of occupancy data for biomonitoring in ecology, we focus our discussion of
community confounding in JSDMs on occupancy models, although we also consider a JSDM for species density
data in the simulation study. The individual species occupancy model was first formulated by MacKenzie et al.10
and has several joint species e xtensions4,11–16. We chose to investigate the impacts of community confounding on
the probit model since it has been widely used in the analysis of occupancy d
ata4,13,17. We also developed a joint
18
species extension to the RoyleNichols m
odel and consider community confounding in that model.
We use the probit and RoyleNichols occupancy models to improve our understanding of montaine mammal
communities in what follows. We show that including unstructured random species effects in either occupancy
model induces confounding between the fixed environmental and random species effects. We demonstrate how
to orthogonalize these effects in the model and compare the resulting inference compared to models where
species are treated independently.
1
Department of Statistics, Colorado State University, Fort Collins 80523, USA. 2Colorado Parks and Wildlife, Fort
Collins 80526, USA. 3Department of Statistics and Data Sciences, The University of Texas at Austin, Austin 78712,
USA. *email: vanee002@colostate.edu
Scientific Reports 
(2022) 12:12235
 https://doi.org/10.1038/s41598022156946
1
Vol.:(0123456789)
�www.nature.com/scientificreports/
Unlike previous approaches that have applied restricted regression techniques similar to ours, we use it in the
context of wellknown ecological models for species occupancy and intensity. While such approaches have been
discussed in spatial statistics and environmental science, they have not been adopted in settings involving the
multivariate analysis of community data. We draw parallels between restricted spatial regression and restricted
JSDMs but also highlight where the methods differ in goals and outcomes. We find that the computational
benefits conferred by performing restricted spatial regression also hold for some joint species distribution models.
Royle‑Nichols joint species distribution model. We present a JSDM extension to the RoyleNichols
odel18. The RoyleNichols model accounts for heterogeneity in detection induced by the species’ latent intensity,
m
a surrogate related to true species abundance. Abundance, density, and occupancy estimation often requires an
explicit spatial region that is closed to emmigration and immigration. In our model, the unobservable intensity
variable helps us explain heterogeneity in the frequencies we observe a species at different sites without making
assumptions about population closure. In the “Model” section, we further discuss the distinctions between
abundance and intensity in the RoyleNichols model.
The RoyleNichols model utilizes occupancy survey data but provides inference distinct from the basic
occupancy model10. In the RoyleNichols model, we estimate individual detection probability for homogeneous
members of the population, whereas in an occupancy model, we estimate probability of observing at least one
member of the population given that the site is occupied. Furthermore, the RoyleNichols model allows us to
relate environmental covariates to the latent intensity associated with a species at a site, while in an occupancy
model, environmental covariates are associated with the species latent probability of occupancy at a site. Species
intensity and occupancy may be governed by different mechanisms, and inference from an intensity model can be
distinct from that provided by an occupancy m
odel19–21. Cingolani et al.20 proposed that, in plant communities,
certain environmental filters preclude species from occupying a site and an additional set of filters may regulate
if a species can flourish. Hence, certain covariates that were unimportant in an occupancy model may improve
predictive power in an intensity model.
Community confounding. Species distributions are shaped by environment as well as competition
and mutualism within the community8,22,23. Community confounding occurs when species distributions are
explained by a convolution of environmental and interspecies effects and can lead to inferential differences
between a joint and single species distribution model as well as create difficulties for fitting JSDMs. Former
studies have incorporated interspecies dependence into an occupancy m
odel4,11–16, and others have addressed
spatial confounding1,17,24,25, but none of these explicitly addressed community confounding. However, all
Bayesian joint occupancy models naturally attenuate the effects of community confounding due to the prior on
the regression coefficients. The prior, assuming it is proper, induces regularization on the regression coefficients26
that can lessen the inferential and computational impacts of confounding27. Furthermore, latent factor models
like that described by Tobler et al.4 restrict the dimensionality of the random species effect which should also
reduce confounding with the environmental effects.
We address community confounding by formulating a version of our model that orthogonalizes the
environmental effects and random species effects. Orthogonalizing the fixed and random effects is common
practice in spatial statistics and often referred to as restricted spatial regression27–31. Restricted regression has
been applied to spatial generalized linear mixed models (SGLMM) for observations y, which can be expressed as
y ∼ [yµ, ψ],
(1)
g(µ) = Xβ + η,
(2)
η ∼ N (0, �),
(3)
where g(·) is a link function, ψ are additional parameters for the data model, and � is the covariance matrix of
the spatial random effect. In the SGLMM, prior information facilitates the estimation of η, which would not be
estimable otherwise due to its shared column space with β30. This is analogous to applying a ridge penalty to η,
which stabilizes the likelihood. Another method for fitting the confounded SGLMM is to specify a restricted
version:
y ∼ [yµ, ψ],
(4)
g(µ) = Xδ + (I − P X )η,
(5)
η ∼ N (0, �),
(6)
X. In the unrestricted SGLMM, the
where P X =
regression coefficients β and random effect η in (1) compete to explain variability in the latent mean µ in the
direction of X 27. In the restricted model, however, all variability in the direction of X is explained solely by the
regression coefficients δ in (4)31, and η explains residual variation that is orthogonal to X . We refer to β as the
conditional effects because they depend on η, and δ as the unconditional effects.
Restricted regression, as specified in (4), was proposed by Reich et al.28. Reich et al.28 described a diseasemapping example in which the inclusion of a spatial random effect rendered one covariate effect unimportant
that was important in the nonspatial model. Spatial maps indicated an association between the covariate and
X(XX)−1 X ′ is the projection matrix onto the column space of
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
2
�www.nature.com/scientificreports/
response, making inference from the spatial model appear untenable. Reich et al.28 proposed restricted spatial
regression as a method for recovering the posterior expectations of the nonspatial model and shrinking the
posterior variances which tend to be inflated for the unrestricted SGLMM.
30,32–35
Several modifications of restricted spatial regression
. All restricted
regres� have been proposed
�
�
� spatial
sion methods seek to provide posterior means E δj y and marginal posterior variances Var δj y , j = 1, ..., p
that satisfy the following two conditions36:
�
�
� �
1. E δy
�
�
� = E �β NS y �and, �
2. Var βNS,j y ≤ Var δj y ≤ Var βSpatial,j y for j = 1, ..., p,
where β NS and β Spatial are the regression coefficients corresponding to the nonspatial and unrestricted spatial
models, respectively.
The inferential impacts of spatial confounding on the regression coefficients has been debated. Hodges and
Reich29 outlined five viewpoints on spatial confounding and restricted regression in the literature and refuted
the two following views:
1. Adding the random effect η corrects for bias in β resulting from missing covariates.
2. Estimates of β in a SGLMM are shrunk by the random effect and hence conservative.
The random effect η can increase or decrease the magnitude of β , and the change may be galvanized by mechanisms not related to missing covariates. Therefore, we cannot assume the regression coefficients in the SGLMM
will exceed those of the restricted model, nor should we regard
� the estimates
�
� in either model as biased due to
�
misspecification. Confounding in the SGLMM causes Var βj y ≥ Var δj y , j = 1, ..., p, because of the shared
column space of the fixed and random effects. Thus, we refer to the conditional coefficients as conservative with
regard to their credible intervals, not their posterior expectations.
Reich et al.28 argued that restricted spatial regression should always be applied because the spatial random
effect is generally added to improve predictions and/or correct the fixed effect variance estimate. While it may
be inappropriate to orthogonalize a set of fixed effects in an ordinary linear model, orthogonalizing the fixed
and random effect in a spatial model is permissible because the random effect is generally not of inferential
interest. Paciorek37 provided the alternative perspective that, if confounding exists, it is inappropriate to attribute all contested variability in y to the fixed effects. Hanks et al.31 discussed factors for deciding between the
unrestricted and restricted SGLMM on a continuous spatial support. The restricted SGLMM leads to improved
computational stability, but the unconditional effects are less conservative under model misspecification and
more prone to typeS errors: The Bayesian analogue of Type I error. Fitting the unrestricted SGLMM when the
fixed and random effects are truly orthogonal does not introduce bias, but it will increase the fixed effect variance. Given these considerations, Hanks et al.31 suggested a hybrid approach where the conditional effects, β ,
are extracted from the restricted SGLMM. This is possible because the restricted SGLMM is a reparameterization of the unrestricted SGLMM. This hybrid approach leads to improved computational stability but yields the
more conservative parameter estimates. We describe how to implement this hybrid approach for joint species
distribution models in the “Community confounding” section.
Restricted regression has also been applied in time series applications. Dominici et al.38 debiased estimates
of fixed effects confounded by time using restricted smoothing splines. Without the temporal random effect,
Dominici et al.38 asserted all temporal variation in the response would be wrongly attributed to temporally correlated fixed effects. Houseman et al.39 used restricted regression to ensure identifiability of a nonparametric
temporal effect and highlighted certain covariate effects that were more evident in the restricted model (i.e., the
unconditional effects’ magnitude was greater). Furthermore, restricted regression is implicit in restricted maximum likelihood estimation (REML). REML is often employed for debiasing the estimate of the variance of y in
linear regression and fitting linear mixed models that are not estimable in their unrestricted f ormat40. Because
REML is generally applied in the context of variance and covariance estimation, considerations regarding the
effects of REML on inference for the fixed effects are lacking in the literature.
In ecological science, JSDMs often include an unstructured random effect like η in (1) to account for interspecies dependence, and hence can also experience community confounding between X and η analogous to spatial
confounding. Unlike a spatial or temporal random effect, we consider random species effects to be inferentially
important, rather than a tool solely for improving predictions or catchall for missing covariates. An orthogonalization approach in a JSDM attributes contested variation between the fixed effects (environmental information)
and random effect (community information) to the fixed effect.
We describe how to orthogonalize the fixed and random species effects in a suite of JSDMs and present a
method for detecting community confounding. In the simulation study, we test the efficacy of our method for
detecting confounding, show that community confounding can lead to computational difficulties similar to those
caused by spatial confounding31, and highlight that, for some models, restricted regression can improve model
fitting. We also investigate the inferential implications of community confouding and restricted regression in
JSDMs by comparing outputs from the SDM, unrestricted JSDM, and restricted JSDM of the RoyleNichols and
probit occupancy models fit to mammalian camera trap data. Lastly, we discuss other inferential and computational methods for confounded models and consider their appropriateness for joint species distribution modeling.
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
3
Vol.:(0123456789)
�www.nature.com/scientificreports/
Model
Data model. The probit and RoyleNichols occupancy models were developed for analyzing multispecies
binary detection data, yijk , arising from a zeroinflated Bernoulli process with probability of success pijk , where
i = 1, . . . , n, j = 1, . . . , Ji , and k = 1, . . . , K correspond to sites, occasions, and species, respectively. Occupancy
data of this form have traditionally been analyzed in a latent variable framework10,41,42. In what follows, we let
zik ∼ Bern(ψik ) be an indicator on whether species k occupies site i. Given a site is occupied, we detect species k
on occasion j with some probability pijk , such that (yijk zik = 1) ∼ Bern(pijk ), but if species k is absent from the
site, we have zero probability of detecting it, P(yijk = 0zik = 0) = 1.
The probit occupancy model is so named because it links ψik and pijk to occupancy and detection covariates
x ik and w ijk , respectively, with the standard normal CDF �. The probit link function can be paired with data
augmentation17,43–45 to yield efficient Gibbs samplers for the occupancy and detection regression coefficients β
and α, respectively.
Royle and N
ichols18 introduced a method for analyzing occupancy data that explicitly modeled the probability
of detecting species k at a site as a function of a surrogate related to the true species abundance. Assuming there
are Nik individuals of species k in sample region i and that all individuals in species k on the sample unit have
identical detection probabilities and are detected independently of other individuals, the probability of detecting
at least one of these individuals can be expressed as
ρijk = 1 − (1 − rjk )Nik ,
(7)
where rjk is a binomial sampling probability that a particular individual of species k is detected on occasion j.
While the RoyleNichols model facilitates inference on number of individuals of species k, Nik , at each site when
all the assumptions are met, we do not interpret them as such because sites are not necessarily closed in camera
trap studies due to mobile species with home ranges larger than the sampling radius of the camera. Note that
ρijk in (7) corresponds to the species probability of detection conditional on an intensity process. This is distinct
from pijk in the probit model that is conditional on an occupancy process.
The nonlinear function of rjk and Nik in (7) involves more parameters than would be identifiable in a typical
occupancy model, especially when the individual detection probability is heterogeneous across occasions (e.g.,
rjk are heterogeneous). In the heterogeneous case, rjk is connected to covariates with the logit link function:
logit(rjk ) = f (w jk , α k ),
(8)
where f (w ijk , α k ) is a linear function of the detection covariates w ijk and regression parameters α k.
Modeling interspecies dependence. We extend both occupancy models to account for interspecies
dependence by including random species effects in their process models. Following Royle and Nichols18, we
assume Nik ∼ Pois(�ik ), where �ik is mean intensity of species k at site i. We let � denote the vector of site specific
intensities stacked across the K species in the community. To model interspecies dependence, we specify the
conditional multivariate normal distribution:
log(�) ∼ N (Xβ + η, �� ),
(9)
η ∼ N (0, � spp ⊗ I n ),
(10)
where X is a blockdiagonal matrix of the K species design matrices, β = (β ′1 , . . . , β ′K )′ is a stacked vector of
species specific regression coefficients, η represents the random species effects, and � spp is a species covariance
matrix, and �� is a matrix that allows for additional covariance structures such as spatial dependence. For our
purposes of comparing the SDM, unrestricted JSDM, and restricted JSDM for differences galvanized by community confounding, we assumed a simple independent structure for log(�) and set �� = τ I .
In the probit model, we include a random species effect in the latent probability of occupancy:
�(ψ) = Xβ + η, where ψ is a vector of site specific occupancy probabilities stacked across the K species in the
community and X , β = (β ′1 , . . . , β ′K )′ , and η are defined as above.
In both occupancy models, η allows for dependence between all K species in the community at each site.
In the probit model, η characterizes interspecies dependence in the probability of occupancy, whereas in the
RoyleNichols model interspecies dependence is characterized in the species latent intensities. Just as certain
environmental features may not preclude species occupancy but can curb intensity, some species may coexist in
a region but not be able to jointly flourish46. Hence, interspecies dependence on latent intensity is conceptually
distinct from interspecies dependence on probability of occupancy and may lead to inferential differences in η
in the two occupancy models.
Tobler et al.4 developed a joint occupancy model that accounts for community structure using a latent variable
approach. They express the latent probability of occupancy of species k at site i as
�(ψik ) = x ′i β k + l ′i θ k ,
(11)
is a vector of length T of latent variables, and θ k are species specific regression coefficients. The latent
variable model (LVM) is a computationally efficient and implicitly accounts for community structure. Other
occupancy models have included interspecies dependence in the structure of the regression coefficients. Known
as multispecies models, these models assume the species specific regression coefficients β k stem from a common
multivariate normal distribution β k ∼ N (µ, �β ) where µ is the typical response of a species to covariates x and
�β allows for dependence in different species response to the same covariates47. In our study of mammalian
where l ′i
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
4
�www.nature.com/scientificreports/
camera trap data, each species is modeled with unique covariates, and we do not consider shared environmental
responses.
Scheffe48 stipulated that the levels of a random effect are draws from a population, and the draws are not of
interest in themselves but only as samples from the larger population, which is of interest. In more recent literature, the term random effect is used more broadly. Hodges and Clayton49 categorized modern definitions of
a random effect into three different varieties. The definition commonly used in spatial statistics is, the levels of
the effect arise from a meaningful population, but they are the whole population and these particular levels are
of interest. We adopt this definition for the random species effects in (9). In practice, some levels of the population will likely not be included in the random species effects. For example, in Ivan et al.50, cameras were baited
and arranged to capture all members of the mammalian community, but several species were excluded from the
random species effects due to a lack of detections.
Priors. We specified normal priors for the regression coefficients, β, in the intensity and occupancy processes
of the RoyleNichols and probit models, respectively to facilitate comparison with the occupancy and spatial
confounding literature. We also specified normal priors for the detection coefficients, α, in the observation model
and the conjugate InverseWishart prior for the species covariance matrix � spp. A more general alternative to
the InverseWishart prior is to apply a Cholesky decomposition, � spp = LD−1 L′ , where L is lower diagonal
with ones along the diagonal and D is diagonal with positive diagonal elements, and specify priors for the
lower diagonal elements of L and diagonal elements of D51. We found the InverseWishart prior suitable for our
inferential goals, but see Chan and Jeliazkov51 for alternative covariance matrix priors.
The joint posterior distribution associated with our model is
[α, β, �, N, � spp y] ∝
�
Ji �
K
n
�
�
�
[yijk Nik , α k ] [Nik �ik ][α k ][β k ][�β 1 , · · · , β K , � spp ][� spp ].
k=1
i=1
(12)
j=1
See Appendix A for the full statements of both the joint probit and RoyleNichols occupancy models.
Community confounding
Restricted regression approach. We fit a restricted version of the each JSDM that orthogonalizes
the fixed and random species effects. In the RoyleNichols model, we express the species latent intensity and
occupancy process conditionally as
log(�) ∼ N (Xδ + (I − P X )η, τ 2 I),
(13)
η ∼ N (0, � spp ⊗ I n ),
(14)
where P X is the projection matrix onto the column space of X . Likewise, in the probit model we specify
�(ψ) = Xδ + (I − P X )η and retain the same prior for η as in (14). This specification forces the random species
effects to explain patterns in the community that are orthogonal to the fixed effects. The latent variables and fixed
effects in the LVM can also be orthogonalized. Writing (11) in matrix form, we have
�(ψ k ) = X ′ δ k + Lθ k ,
(15)
where X and L are the matrices of covariates and latent variables vertically stacked across sites, respectively. If
we assume common covariates across all K species, we can specify a restricted LVM as follows:
�(ψ k ) = X ′ δ k + (I − PX )Lθ k .
(16)
However, if covariates differ by species, i.e., X = X k , then the posterior distribution of latent variables will differ
by species. To retain a common posterior distribution of latent variables across all species, the latent variables
need to be orthogonalized against all covariates among the k species,
LR =
K
�
(I − P X k )L.
(17)
k=1
The specification of (17) is more restrictive than the orthogalization in the RoyleNichols and probit model, and
so we omit the LVM from our case study.
Hanks et al.31 showed that the restricted (13) and unrestricted (9) generalized linear mixed models GLMM
are reparameterizations of the same model and derived the following relationship between the unconditional δ
and conditional β fixed effects:
δ ≡ β + (X ′ X)−1 X ′ η.
(18)
Using (18), one can easily sample both sets of fixed effects by fitting either the restricted or unrestricted parameterization. We can also sample the covariance structure of the restricted random species effect from either model
fit by drawing samples from the distribution
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
5
Vol.:(0123456789)
�www.nature.com/scientificreports/
Model
Data
β ESS Ratio
β Mean ESS
η ESS Ratio
η Mean ESS
�
�
E R2 (x 1 )Y
Rejection rate
Normal
Unconfounded
18.69
5143
6.20
5670
0.04
0.01
Normal
Confounded
8.67
4219
5.79
4800
0.51
0.87
Probit
Unconfounded
1.73
959
1.08
534
0.04
0.00
Probit
Confounded
1.96
444
1.23
293
0.19
0.63
RoyleNichols
Unconfounded
0.81
232
0.98
307
0.04
0.00
RoyleNichols
Confounded
0.80
186
0.98
301
0.18
0.51
Table 1. Summary of simulations results. All results are averaged across 3 magnitudes of random species
effects and 50 simulated datasets. ESS Ratio is the effective sample size
parameterizations
�
� of the restricted
over the unrestricted and the mean ESS is the average of the two. E R2 (x 1 )Y is the posterior mean R2 of
confounding for species 1 continuous habitat covariate. Rejection rate is the portion of times the the posterior
mean pvalue from overall Ftest of a linear relationship between x 1 and � was below 0.05.
′
� −1
spp,R ∼ Wishart(Sν + η (I − PX )η, ν + n).
(19)
Hence, regardless of which model is fit, we can obtain both the unconditional and conditional habitat effects as
well as the unrestricted and restricted species covariance matrices.
Measuring confounding. Hefley et al.27 showed how to assess confounding in SGLMM models by
computing the Pearson correlation coefficient between each pair of covariates and eigenvectors from the spectral
decomposition of the spatial covariance matrix. Likewise, Prates et al.35 proposed a test for spatial confounding
that can be calculated prior to model fitting. We propose another approach relevant to our method that aids
in interpretation. We compute the coefficient of determination of each covariate for species k regressed on the
estimated random species effects. Because the latent intensities are unknown, the coefficents of determination of
all covariates are derived quantities and can be computed at each iteration of the MCMC algorithm:
�
�′ �
�
(l) (l)
(l) (l)
(l) (x )
�
�
−
x̄
−
x̄
θ̂
θ̂
k
k
SSR
(l)
k
(20)
R2 (x k ) =
=
,
SST(x k )
(x k − x̄ k )′ (x k − x̄ k )
�
�
(l)
(l)
where x̄ k = (x̄k , . . . , x̄k )′ is the mean of the covariate x k for species k repeated n times, �(l) = η1 , . . . , ηK is
(l)
a matrix of the random species effects sampled for MCMC iteration l, and θ̂ are estimated
regression
coefficients
�
�
relating the estimated species intensities at iteration l to x k . The posterior mean E R2 (x k )Y provides a measure
of community confounding for the covariate x k and can help identify which fixed effects will vary between the
unrestricted and restricted models. Furthermore, we can use the global Ftest of the linear relationship between
x k and � to determine if confounding exists.
Simulation study
We performed a simulation study to investigate the effects of community confounding and orthogonalization
of the fixed and random species effects on model fitting. Specifically, we compared the effective sample sizes
of β and η for three different models for confounded and unconfounded data with unrestricted and restricted
parameterizations. The effective sample size (ESS) is the number of independent MCMC samples of a quantity
and is a metric for measuring the sampling efficiency of an MCMC algorithm. Higher ESS are preferable as
posterior distributions of quantities of interest can be obtained in fewer iterations.
We considered three models: The joint probit occupancy model, joint RoyleNichols model, and joint normal
model, which is derived from the scenario where � in the RoyleNichols is known (e.g., species density data). For
each model, 150 datasets were generated with the fixed and random species effects independent and another 150
datasets were generated with confounding between the fixed and random species effects. To induce confounding
between the fixed and random species effect, we expressed one covariate of the first species as a linear combination of the random species effects (i.e., x 1 = �θ).
Because the ratio of the random effects and random error magnitude is known to affect the severity of
confounding in the spatial context29,31,35, we varied the magnitude of the random species effect in each model
while holding the random error magnitude constant. Specifically, each dataset was subdivided into thirds with
50 datasets simulated to have small, medium, and large random species effects relative to the random error.
All 900 simulated datasets across models and confounding levels were for K = 2 species across n = 50 sites
with J = 10 occasions per site for the occupancy models. The correlation between the two species was allowed
to vary for each dataset. Each habitat design matrix included an intercept and one continuous covariate. Each
MCMC algorithm was run for a burnin period of L = 10000 to ensure convergence. The next L = 10000 iterations were used to calculate the posterior quantities in Table 1. Code for performing the simulation study in R
are available in the supplementary electronic files.
For both β and η, ESS was lower on average for the confounded data than the unconfounded data for all three
models demonstrating the negative impacts confounding can have on model fitting. For all three models, the
computational impact of fitting the restricted parameterization did not differ depending on whether confounding
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
6
�www.nature.com/scientificreports/
Figure 1. Randomly selected sampling sites (gray circles) where passive infrared game cameras were deployed
in sprucefir (green) and lodgepole pine (yellow) forests in Colorado, USA, 20132014. Brown and orange are
the approximate extents of spruce beetle and mountain pine beetle impacts in sprucefir and lodgepole pine
forests, respectively, as of 2014. Reprinted from “Mammalian responses to changed forest conditions resulting
from bark beetle outbreaks in the southern Rocky Mountains,” by J. S. Ivan, 2018, Ecosphere, 9(8), e02369.
Figure produced with ArcMap 10.7 available at: https://desktop.arcgis.com/en/arcmap/10.7/getstarted/setup/
arcgisdesktopquickstartguide.html.
exists or not. In the case of the normal and probit models, fitting the restricted parameterization improved ESS
for both β and η, although the gains were much greater for the normal model. On the other hand, the restricted
parameterization of the RoyleNichols model did not improve ESS for β or η. The success of our method for
detecting community confounding differed across models. The method was most powerful for the normal model
followed by the probit and RoyleNichols models.
Camera trap survey
Study area. We analyzed data arising from a study area comprised of subalpine forests in the state of
Colorado between 2590 and 3660 m elevation (Fig. 1). Sites were restricted to public lands managed by the
United States Forest Service, National Park Service, Bureau of Land Management, and Colorado State Forest
Service. Forests in our study area were primarily composed of Lodgepole pine (Pinus contorta), Engelmann
spruce (Picea engelmannii), and subalpine fir (Abies lasiocarpa). Lodgepole pine was dominant at lower elevations
as well as higher elevations that were drier and/or on southfacing slopes; high elevation regions that had cool
northfacing slopes were codominated by Engelmann spruce and subalpine fir. Lodgepole pine is restricted
to the northern twothirds of Colorado, so all sites in the southern region of the study area were Engelmann
spruce, subalpine fir codominated. Quaking aspen (Populus tremuloides), Douglasfir (Pseudotsuga menziesii),
bristlecone pine (Pinus aristata), limber pine (Pinus flexilis), and blue spruce (Picea pungens) were also present
at some sites. Mean July and January temperature across the study area were 14 and − 6.1 °C respectively. All
camera data were collected during summers 2013–2014.
Sampling design. The primary goal of Ivan et al.50 was to assess mammalian responses to bark beetle
outbreaks, thus sites were randomly selected to facilitate inference on the beetle outbreak covariates. Beetle
outbreak covariates included the number of years since the initial outbreak (YSO) and the severity of the
outbreak measured by mean overstory mortality (severity). The sample of n = 300, 1 km2 sites was evenly split
across the two dominant forest types, sprucefir and lodgepole pine. Additional environmental covariates were
collected at each site, and a description of these is included in Appendix B.
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
7
Vol.:(0123456789)
�www.nature.com/scientificreports/
Passive infrared camera traps (Reconyx PC800, Holmen, Wisconsin, USA) were deployed near the center of
each site. Cameras were approximately 0.5 m above the ground and pointed toward a lure tree 4–5 m away52. The
setup was designed to maximize detections of both large and smallbodied mammals in the local community
while minimizing attraction of individuals from outside the sampling region of the site. The sampling regions
were likely not closed to immigration/emigration; thus, we interpret elevated detections at a site as more
individuals using, as opposed to occupying, that s ite53. For additional details regarding the sampling design and
study area see Ivan et al.50.
Model fitting.
We fit both the RoyleNichols and probit occupancy models to the camera trap data binned
into 20 twoday occasions because simulations showed this was the number of replications needed to identify
a quadratic effect of occasion on individual detection probability. Not all cameras were operational for the
entire 40 day sampling period, and thus the number of occasions varied from 720. We discarded four sites at
which the camera was operational for less than one occasion. We also discarded another 12 sites that had been
infested by bark beetles for more than 10 years. Ivan et al.50 truncated the bark beetle infestation covariate at
10 years because estimates of response curves beyond 10 years would be unreliable with so few sites. The final
sample size was n = 284 sites. We built distribution models for the 13 species for which Ivan et al.50 performed
a single species analysis; several rare species were excluded from analysis due to insufficient detections. We
note, however, that these rare species parameters may be identifiable in the joint model as has been the case
in previous studies2,47,54–57. Our final dataset then included 3692 unique encounter histories at n = 284 sites,
stacked across K = 13 species.
Ivan et al.50 used a sequential procedure similar to that described in Lebreton et al.58 to select the covariates
in the occupancy and detection processes for each species. We adopted their detection model and used the
same covariates but a different set of basis functions for YSO. Ivan et al.50 treated YSO as a grouping variable and
considered probability of use response curves that allowed for cubic associations and delayed responses to bark
beetle infestation. Multiple response curves were model averaged to produce predictive YSO response curves for
each species. We used orthogonal polynomial basis functions for the YSO variable in the species intensity models.
The basis functions included a linear (YSO1) and quadratic (YSO2) effect. Appendix E provides a full description of the intensity and detection models. All continuous covariates were scaled to have mean 0 and variance 1.
We fit all models using Markov chain Monte Carlo (MCMC). To improve mixing and predictive ability, we
regularized the coefficients β and α with informative priors: β ∼ N (0, I) and α ∼ N (0, I)26. We specified a
−1
59
vague prior of � −1
spp ∼ Wishart(15, (15I) ) for the species variancecovariance matrix . For the RoyleNichols
model, we used Gibbs sampling based on conjugate priors for parameters � spp, η, and β and MetropolisHastings
updates for N , � , and α. Derivations of the conjugate fullconditional distributions are provided in Appendix C
with details about the MetropolisHastings updates. We tuned the MetropolisHastings updates so that acceptance rates varied between 20 and 40% for α, N , and � . Using data a ugmentation17,43–45, all the parameters of the
probit model can be sampled with Gibbs updates.
We set τ 2 = 2.25 in both (9) and (13). This choice was supported by the asymptotic equivalence between Poisson and logistic regression. In a generalized occupancy model, the latent probability of occupancy is specified as
logit(ψi ) ∼ N (x ′i β, τ 2 ). Hanson et al.60 investigated the relationship between the prior on β and induced prior
on the latent probability of success ψi in logistic regression; their work showed that specifying an uninformative
normal prior on β (i.e., setting τ 2 large) induces a Ushaped prior for ψi with most of the density concentrated
near 0 and 1. Broms et al.14 recommended setting τ 2 = 2.25 in occupancy models, which results in a relatively
flat prior for ψ . For rare species, �i in (9) and (13) is analogous to ψi , and specifying a variance of τ 2 = 2.25 is
minimally informative.
Baddeley61 motivated the asymptotic equivalence of Poisson and logistic regression in a spatial context where
counts of points from a nonhomogeneous Poisson process are recorded in a lattice; they showed that, as the
grid cells of the lattice become infinitesimally small, the inference yielded from Poisson and logistic regression
are equivalent. This result can be applied more generally to any dataset where there is a high proportion of zero
counts. We demonstrate the asymptotic equivalence between Poisson and logistic regression in the RoyleNichols
model in Appendix D.
We ran the MCMC algorithm for L = 50000 iterations, and discarded the first 12500 iterations as burnin. We
fit an SDM, unrestricted JSDM, and restricted JSDM of both the RoyleNichols and probit occupancy models.
The “Results” section presents inference for regression coefficients for all six model fits.
Results.
Ivan et al.50 fit SDMs to infer changes in mammalian use of stands impacted by the bark beetle
epidemic. The impact of bark beetle damage was measured by years since initial infestation (YSO) and severity
of outbreak quantified by mean overstory mortality (DeafConif). The posterior distributions of the regression
coefficients varied between the probit SDM and unrestricted JSDM, although the magnitude of difference
differed by species (Fig. 2). The posterior variances of the SDM regression coefficients were smaller than the
unrestricted JSDM. Posterior variances and means of the restricted probit JSDM regression coefficients quite
similar to those from unrestricted JSDM. The only noticeable difference between the unrestricted and restricted
regression coefficients was that the restricted coefficients had slightly smaller posterior variances on average.
As with the probit modeling results, posterior distributions of the regression coefficients in the RoyleNichols
SDM were more concentrated near zero than those of the JSDM (Fig. 3). Also, posterior distributions of the
restricted JSDM regression coefficients were slightly tighter and centered closer to zero.
We calculated the unrestricted and restricted posterior correlation matrices for both the probit and RoyleNichols models. Pairwise differences between each entry of the posterior mean of the four correlation matrices
were bounded between (−0.2, 0.2), so only the correlation matrix of the unrestricted RoyleNichols model
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
8
�www.nature.com/scientificreports/
American Marten
Black Bear
Chipmunk
Coyote
Elk
Goldenmantled Ground Squirrel
Moose
Mule Deer
YSO2
YSO1
DeadConif
YSO2
YSO1
Model
Variable
DeadConif
Restricted
Porcupine
Red Fox
Red Squirrel
Snowshoe Hare
SDM
Unrestricted
YSO2
YSO1
DeadConif
2
Yellowbellied Marmot
1
0
1
2
2
1
0
1
2
2
1
0
1
2
YSO2
YSO1
DeadConif
2
1
0
1
2
Figure 2. Marginal posterior distributions of infestation regression parameters. Posterior distributions shown
are from the probit SDM, unrestricted JSDM, and restricted JSDM. DeadConif is the overstory mortality
percentage, a proxy for severity of bark beetle infestation. YSO1 is the linear effect of the number of years since a
site was infested with bark beetles. YSO2 is the quadratic effect. Figure created in R 4.1.262.
American Marten
Black Bear
Chipmunk
Coyote
Elk
Goldenmantled Ground Squirrel
Moose
Mule Deer
YSO2
YSO1
DeadConif
YSO2
YSO1
Model
Variable
DeadConif
Restricted
Porcupine
Red Fox
Red Squirrel
Snowshoe Hare
SDM
Unrestricted
YSO2
YSO1
DeadConif
1
Yellowbellied Marmot
0
1
1
0
1
1
0
1
YSO2
YSO1
DeadConif
1
0
1
Figure 3. Marginal posterior distributions of infestation regression parameters. Posterior distributions
shown are from the RoyleNichols SDM, unrestricted JSDM, and restricted JSDM. DeadConif is the overstory
mortality percentage, a proxy for severity of bark beetle infestation. YSO1 is the linear effect of the number of
years since a site was infested with bark beetles. YSO2 is the quadratic effect. Figure created in R 4.1.262.
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
9
Vol.:(0123456789)
�AM
www.nature.com/scientificreports/
1.0
BB
0.0 1.0
CY
0.2 0.0 1.0
CM 0.1 0.0 0.0 1.0
EK
CM
CY
BB
AM
0.0 0.1 0.2 0.1 1.0
GM
0.1 0.1 0.6 0.0 0.2 1.0
MS
0.0 0.0 0.1 0.0 0.1 0.1 1.0
MD
0.0 0.1 0.1 0.1 0.2 0.2 0.1 1.0
PC
0.0 0.0 0.0 0.1 0.0 0.1 0.1 0.0 1.0
RF
0.0 0.0 0.1 0.0 0.0 0.1 0.1 0.0 0.0 1.0
RS
0.2 0.0 0.4 0.1 0.2 0.4 0.1 0.1 0.0 0.0 1.0
SH
0.0 0.1 0.3 0.2 0.2 0.3 0.1 0.1 0.1 0.1 0.2 1.0
YM
0.0 0.0 0.1 0.2 0.0 0.0 0.1 0.1 0.2 0.0 0.1 0.2 1.0
1
0.8
0.6
0.4
0.2
0
0.2
0.4
0.6
0.8
YM
SH
RS
RF
PC
MD
MS
GM
EK
1
Figure 4. Posterior mean of species correlation matrix. Estimates are from the RoyleNichols unrestricted joint
species distribution model. AM = American Marten, BB = Black Bear, CY = Coyote, CM = Chipmunk spp., Ek
= Elk, GM = Goldenmantled Ground Squirrel, MS = Moose, MD = Mule Deer, PC = Porcupine, RF = Red Fox,
RS = Red Squirrel, SH = Snowshoe Hare, YM = Yellowbellied Marmot. Figure created in R 4.1.262.
is shown (Fig. 4). The posterior distributions of the pairwise correlations all overlapped zero except for the
pairwise correlations between coyotes and goldenmantled ground squirrels, coyotes and red squirrels, and
goldenmantled ground squirrels and red squirrels. In the restricted probit JSDM, the correlations between
coyotes and snowshoe hares, and snowshoe hares and red squirrels also did not overlap zero.
We calculated the posterior R2 of confounding for each covariate in each species specific model as described
in (20). All posterior R2 were below 0.05 for both the RoyleNichols and probit models giving no indication of
community confounding for all covariates considered.
Discussion
We found that confounding between the fixed and random species effects can reduce sampling efficiency in
MCMC algorithms and that orthogonalizing the fixed and random species effects can alleviate this problem
when fitting some joint species distribution models. In the simulation study, we discovered that, even when the
data were not confounded, orthogonalizing the fixed and random species effects still conferred a computational
benefit for the normal and probit model. This was also true for our case study where the mean effective sample
size of the conditional habitat effects β in the probit model was 32% larger when fit with the restricted parameterization. The effective sample size of η in the probit model was 3% greater for the restricted parameterization.
The case study indicated that inference on speciesenvironment associations in occupancy models can change
based on whether the distribution model accounts for community structure. Orthogonalizing the fixed and random species effects in the probit and RoyleNichols model slightly reduced but did not nullify the differences as
in the case for normal data. The similarity between the restricted and unrestricted JSDM coupled with the lack
of evidence for community confounding suggests additional mechanisms lead inference in SDMs and JSDMs to
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
10
�www.nature.com/scientificreports/
differ, a finding consistent with Caradima et al.63. Overall, there was still large agreement in posterior inference
produced by the SDM and JSDMs for both occupancy models. In additional simulation studies on the probit and
RolyeNichols occopancy models, we found that community confounding can lead to larger differences between
the SDM and unrestricted JSDM and that the restricted JSDM again mitigates but rarely nullifies these differences.
We were also interested in whether the RoyleNichols model could identify additional associations compared
with the probit model. The RoyleNichols model measures associations conditional on an intensity process
rather than an occupancy process, and intensity is likely a function of additional factors beyond those influencing occupancy19–21. For the camera trap data, the opposite was true, in that the probit model identified more
environmentalspecies and speciesspecies associations. One possible explanation for this is that the probit model
is more parsimonious which sharpens posterior distributions.
A related method to restricted regression, which orthogonalizes the fixed and random effects, is principal
components regression, which performs an orthogonalization procedure solely among the fixed effects. To
motivate their similarities, consider a simpler case where the latent intensities, � , of the K species in our community were known. We could construct K regression models for predicting each species intensity as follows:
�k = X k β k + �−k ηk + ε,
(21)
�
where �−k = �1 , . . . , �k−1 , �k+1 , . . . , �K is a matrix of the K − 1 other species intensities. If X k and �−k were
highly collinear, principal component regression might be applied. Principal components regression is so named
because
the variation explained by X k and �−k into p = p1 + p2 principal components,
� it decomposes
�
Ŵ k = γ 1 , . . . , γ p , where p1 and p2 are the number of columns of X k and �−k respectively. The p principal
components retain all the information explained by X k and �−k but are orthogonal. The regression model
�
�k = W k θ + ε,
(22)
�
�
W k = X k , �−k Ŵ k ,
(23)
�
�
Ŵ ⋆k = γ 1 , . . . , γ r .
(25)
often improves sampling efficiency and can recover the posterior means and variances of β k and ηk in (21).
However, inference on β k and ηk is often adjusted by truncating off the last p − r , for r < p, eigenvectors of Ŵ k
and employing the new design matrix
�
�
W ⋆k = X k , �−k Ŵ ⋆k ,
(24)
By retaining only the first r principal components, the smallest sources of variation are ignored in the estimation
of β k and ηk . Jeffers64 implemented this approach truncating off the last 7 of 13 principal components to adjust
the estimates of regression coefficients relating various tree characteristics to maximum compressive strength.
Other studies have selected a subset of principal components based on their strength of association with the
response variable65–68. In some cases, the coefficient estimates from these reduced rank approaches appeared
more tenable than those from the full rank specifications based on known physical relationships between the
predictors and response. Thus, like restricted regression, principal components regression can be used for solely
computation purposes or to adjust inference.
Recently, concerns regarding the coverage properties of the fixed effects estimator under restricted regression
have been expressed36,69. For example, Zimmerman and Ver Hoef69 showed that applying any restricted regression method to a SGLMM leads to frequentest coverage of the fixed effects that is lower than the corresponding
nonspatial model. Similarly, Khan and C
alder36 found that when fitting a restricted version of the SGLMM with
an intrinsic conditional autoregressive prior, credible intervals of the fixed effects from the restricted model were
generally nested inside those yielded by the nonspatial model. Given these results, both Zimmerman and Ver
Hoef69 and Khan and Calder36 recommended reverting to inference from the nonspatial model, rather than that
of the restricted SGLMM, when inference from the unrestricted SGLMM appears untenable.
We did not observe the same pattern in our restricted JSDM but found the length of credible intervals of the
restricted regression coefficients to generally be between that of the SDM and unrestricted JSDM. Nonetheless,
if higher coverage is desired, one can always extract the conditional coefficients from the restricted JSDM while
still benefiting from the increased stability that results from orthogonalizing the fixed and random effets. When
deciding between inference from the restricted and unrestricted JSDM, one should also consider the random
species effects η. Because the random effect η is rarely of interest in spatial applications, there has been little
investigation on the inferential impacts of restricted regression on η. Such investigation, however, may be helpful
in determining the appropriateness of restricted regression for JSDMs.
There are several conceptual facets to consider regarding the application of restricted regression in joint
species distribution modeling. Frequently, JSDMs are described as accounting for residual correlations between
species that cannot be explained by the environmental c ovariates4,5,63. We have shown, however, that in some
JSDMs, the random species effect can explain variation that is collinear with environmental covariates. Only
in the restricted JSDM, does the random species effect explain variation that is residual to the environmental
covariates attributing all contested sources of variation to the fixed effect. Yet, given that species environmental
requirements can fluctuate based on their symbiotic relationships, one might argue that interplay between the
environmental effects and interspecies dependence is ecology warranted. Therefore, any method that removes
the conditional nature of these effects like restricted regression is inappropriate.
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
11
Vol.:(0123456789)
�www.nature.com/scientificreports/
JSDMs have been described as correcting our knowledge of speciesenvironment relationship by accounting
for interspecies dependence5. Poggiato et al.5 argued that JSDMs help us better quantify uncertainty regarding speciesenvironment relationships, but they cannot explain discrepancies in a species theoretical and realized niche. We agree that phenomenological JSDMs should not be used to disentangle the marginal effects of
environment and interspecies dependence on species distributions and would recommend the development of
mechanistic models to investigate interspeciesenvironment associations.
Experimental methods and modeling techniques for alleviating confounding have been proposed in ecology.
Hefley et al.70 showed that replicate populations can help disentangle confounded fixed and random effects. In
the context of joint species distribution modeling, replication involves analyzing several communities simultaneously, which is often infeasible. Hefley et al.70 also recommended explicit population models rather than
phenomenological regressionbased models for analysis of temporally confounded count data. Similarly, Fieberg
et al.71 advocated for mechanistic models guided by causal diagrams for analyzing temporally confounded animal
movement data. An avenue of future research for joint species distribution modeling is to compare inference from
phenomenological regressionbased models, such as the one proposed here, with that of models that explicitly
include ecological mechanisms such as competitive exclusion, mutualism, and predation. Because community
and temporal confounding have the same mathematical framework, mechanistic models are a promising solution for confounded multispecies data.
In summary, we specified a JSDM that accounts for interspecies dependence at the intensity level, and examined how inference from the joint model differed from the joint probit model. We performed a simulation study
on three JSDMs to examine the computational difficulties associated with community confounding and investigated whether orthogonalizing the fixed and random species effect could alleviate these difficulties. Further, we
considered how inference in both occupancy models differed depending on the assumed community structure.
Lastly, we discussed how joint species distribution modeling is distinct from spatial and time series applications
in that the random effect is almost always of inferential interest, and hence, adjustments to the regression coefficients, β , and random effects, η, should both be considered. Our main conclusion is that, even for researchers
who desire inference solely on the conditional relationship between the fixed speciesenvironment and random
species effects, fitting the JSDM with a restricted parameterization can give computational benefits.
Data availability
The data are available in the Supplementary Information files.
Code availability
All algorithms and code for fitting and analyzing results from the six model variants are available in the
Supplementary Information files. All MCMC algorithms and analyses were coded in R 4.1.262.
Received: 24 November 2021; Accepted: 28 June 2022
References
1. Altwegg, R. & Nichols, J. D. Occupancy models for citizenscience data. Methods Ecol. Evol. 10, 8–21 (2019).
2. Hui, F., Warton, D., Foster, S. & Dunstan, P. To mix or not to mix: Comparing the predictive performance of mixture models vs.
separate species distribution models. Ecology 94, 1913–1919. https://doi.org/10.1890/121322.1 (2013).
3. Warton, D. et al. So many variables: Joint modeling in community ecology. Trends Ecol. Evol. 30, 766–779 (2015).
4. Tobler, M. W. et al. Joint species distribution models with species correlations and imperfect detection. Ecology 100, e02754. https://
doi.org/10.1002/ecy.2754 (2019).
5. Poggiato, G. et al. On the interpretations of joint modeling in community ecology. Trends Ecol. Evol. 36, 391–401 (2021).
6. Estevo, C. A., NagyReis, M. B. & Nichols, J. D. When habitat matters: Habitat preferences can modulate cooccurrence patterns
of similar sympatric species. PLoS One 12, e0179489 (2017).
7. Steen, D. A. et al. Snake cooccurrence patterns are best explained by habitat and hypothesized effects of interspecific interactions.
J. Anim. Ecol. 83, 286–295 (2014).
8. Wisz, M. S. et al. The role of biotic interactions in shaping distributions and realised assemblages of species: Implications for species
distribution modelling. Biol. Rev. 88, 15–30 (2013).
9. Wilkinson, D. P., Golding, N., GuilleraArroita, G., Tingley, R. & McCarthy, M. A. A comparison of joint species distribution
models for presenceabsence data. Methods Ecol. Evol. 10, 198–211 (2019).
10. MacKenzie, D. I. et al. Estimating site occupancy rates when detection probabilities are less than one. Ecology 83, 2248–2255 (2002).
11. Dorazio, R. M. & Royle, J. A. Estimating size and composition of biological communities by modeling the occurrence of species.
J. Am. Stat. Assoc. 100, 389–398 (2005).
12. Dorazio, R. M., Royle, J. A., Söderström, B. & Glimskär, A. Estimating species richness and accumulation by modeling species
occurrence and detectability. Ecology 87, 842–854 (2006).
13. Tobler, M. W., Zúñiga Hartley, A., CarrilloPercastegui, S. E. & Powell, G. V. N. Spatiotemporal hierarchical modelling of species
richness and occupancy using camera trap data. J. Appl. Ecol. 52, 413–421. https://doi.org/10.1111/13652664.12399 (2015).
14. Broms, K. M., Hooten, M. B. & Fitzpatrick, R. M. Model selection and assessment for multispecies occupancy models. Ecology
97, 1759–1770. https://doi.org/10.1890/151471.1 (2016).
15. Rota, C. T. et al. A multispecies occupancy model for two or more interacting species. Methods Ecol. Evol. 7, 1164–1173. https://
doi.org/10.1111/2041210X.12587 (2016).
16. Maphisa, D. H., SmitRobinson, H. & Altwegg, R. Dynamic multispecies occupancy models reveal individualistic habitat
preferences in a highaltitude grassland bird community. PeerJ 7, e6276 (2019).
17. Johnson, D. S., Conn, P. B., Hooten, M. B., Ray, J. C. & Pond, B. A. Spatial occupancy models for large data sets. Ecology 94, 801–808
(2013).
18. Royle, J. & Nichols, J. Estimating abundance from repeated presenceabsence data or point counts. Ecology 84, 777–790 (2003).
19. Orrock, J. L., Pagels, J. F., McShea, W. J. & Harper, E. K. Predicting presence and abundance of a small mammal species: The effect
of scale and resolution. Ecol. Appl. 10, 1356–1366 (2000).
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
12
�www.nature.com/scientificreports/
20. Cingolani, A. M., Cabido, M., Gurvich, D. E., Renison, D. & Díaz, S. Filtering processes in the assembly of plant communities: Are
species presence and abundance driven by the same traits?. J. Veg. Sci. 18, 911–920 (2007).
21. Dibner, R. R., Doak, D. F. & Murphy, M. Discrepancies in occupancy and abundance approaches to identifying and protecting
habitat for an atrisk species. Ecol. Evol. 7, 5692–5702. https://doi.org/10.1002/ece3.3131 (2017).
22. Bascompte, J. Mutualistic networks. Front. Ecol. Environ. 7, 429–436 (2009).
23. Van Dam, N. How plants cope with biotic interactions. Plant Biol. 11, 1–5 (2009).
24. Clark, A. E. & Altwegg, R. Efficient Bayesian analysis of occupancy models with logit link functions. Ecol. Evol. 9, 756–768 (2019).
25. Broms, K. M., Johnson, D. S., Altwegg, R. & Conquest, L. L. Spatial occupancy models applied to atlas data show Southern Ground
Hornbills strongly depend on protected areas. Ecol. Appl. 24, 363–374 (2014).
26. Hooten, M. B. & Hobbs, N. T. A guide to Bayesian model selection for ecologists. Ecol. Monogr. 85, 3–28. https://doi.org/10.1890/
140661.1 (2015).
27. Hefley, T. J., Hooten, M. B., Hanks, E. M., Russell, R. E. & Walsh, D. P. The Bayesian group lasso for confounded spatial data. J.
Agric. Biol. Environ. Stat. 22, 42–59 (2017).
28. Reich, B. J., Hodges, J. S. & Zadnik, V. Effects of residual smoothing on the posterior of the fixed effects in diseasemapping models.
Biometrics 62, 1197–1206 (2006).
29. Hodges, J. S. & Reich, B. J. Adding spatiallycorrelated errors can mess up the fixed effect you love. Am. Stat. 64, 325–334. https://
doi.org/10.1198/tast.2010.10052 (2010).
30. Hughes, J. & Haran, M. Dimension reduction and alleviation of confounding for spatial generalized linear mixed models. J. R.
Stat. Soc. Ser. B (Statistical Methodology) 75, 139–159. https://doi.org/10.1111/j.14679868.2012.01041.x (2013).
31. Hanks, E. M., Schliep, E. M., Hooten, M. B. & Hoeting, J. A. Restricted spatial regression in practice: Geostatistical models,
confounding, and robustness under model misspecification. Environmetrics 26, 243–254. https://d
oi.o
rg/1 0.1 002/e nv.2 331 (2015).
32. Bradley, J. R. et al. Multivariate spatiotemporal models for highdimensional areal data with application to longitudinal employerhousehold dynamics. Ann. Appl. Stat. 9, 1761–1791 (2015).
33. Murakami, D. & Griffith, D. A. Random effects specifications in eigenvector spatial filtering: A simulation study. J. Geogr. Syst. 17,
311–331 (2015).
34. Thaden, H. & Kneib, T. Structural equation models for dealing with spatial confounding. Am. Stat. 72, 239–252 (2018).
35. Prates, M. O. et al. Alleviating spatial confounding for areal data problems by displacing the geographical centroids. Bayesian Anal.
14, 623–647 (2019).
36. Khan, K. & Calder, C. A. Restricted spatial regression methods: Implications for inference. J. Am. Stat. Assoc 1–13 (2020).
37. Paciorek, C. The importance of scale for spatialconfounding bias and precision of spatial regression estimators. Stat. Sci. A Rev.
J. Inst. Math. Stat. 25, 107–125. https://doi.org/10.1214/10STS326 (2010).
38. Dominici, F., McDermott, A. & Hastie, T. J. Improved semiparametric time series models of air pollution and mortality. J. Am.
Stat. Assoc. 99, 938–948 (2004).
39. Houseman, E. A., Coull, B. A. & Shine, J. P. A nonstationary negative binomial time series with timedependent covariates:
Enterococcus counts in Boston Harbor. J. Am. Stat. Assoc. 101, 1365–1376 (2006).
40. Corbeil, R. R. & Searle, S. R. Restricted maximum likelihood (REML) estimation of variance components in the mixed model.
Technometrics 18, 31–38 (1976).
41. Hoeting, J. A., Leecaster, M. & Bowden, D. An improved model for spatially correlated binary responses. J. Agric. Biol. Environ.
Stat. 5, 102–114 (2000).
42. Tyre, A. J. et al. Improving precision and reducing bias in biological surveys: Estimating falsenegative error rates. Ecol. Appl. 13,
1790–1801. https://doi.org/10.1890/025078 (2003).
43. Albert, J. H. & Chib, S. Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679 (1993).
44. Hooten, M. B., Larsen, D. R. & Wikle, C. K. Predicting the spatial distribution of ground flora on large domains using a hierarchical
Bayesian model. Landsc. Ecol. 18, 487–502 (2003).
45. Dorazio, R. M. & Rodriguez, D. T. A Gibbs sampler for Bayesian analysis of siteoccupancy data. Methods Ecol. Evol. 3, 1093–1098
(2012).
46. Clark, J. S. et al. Highdimensional coexistence based on individual variation: A synthesis of evidence. Ecol. Monogr. 80, 569–608
(2010).
47. Ovaskainen, O. & Soininen, J. Making more out of sparse data: Hierarchical modeling of species communities. Ecology 92, 289–295
(2011).
48. Scheffe, H. The Analysis of Variance Vol. 72 (John Wiley & Sons, New Jersey, 1959).
49. Hodges, J. S. & Clayton, M. K. Random effects old and new. Stat. Sci. (2011).
50. Ivan, J., Seglund, A., Truex, R. & Newkirk, E. Mammalian responses to changed forest conditions resulting from bark beetle
outbreaks in the southern Rocky Mountains. Ecospherehttps://doi.org/10.1002/ecs2.2369 (2018).
51. Chan, J.C.C. & Jeliazkov, I. MCMC estimation of restricted covariance matrices. J. Comput. Graph. Stat. 18, 457–480 (2009).
52. Blecha, K. A. Riskreward tradeoffs in the foraging strategy of cougar (puma concolor): Prey distribution, anthropogenic
development, and patch selection. Thesis, Colo. State Univ. Fort Collins, Color. USA (2015).
53. MacKenzie, D. I. et al. Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence (Academic
Press, USA, 2006).
54. Guisan, A., Weiss, S. & Weiss, A. GLM versus CCA spatial modeling of plant species distribution. Plant Ecol. 143, 107–122. https://
doi.org/10.1023/A:1009841519580 (1999).
55. Madon, B., Warton, D. I. & Araújo, M. B. Communitylevel vs speciesspecific approaches to model selection. Ecography 36,
1291–1298. https://doi.org/10.1111/j.16000587.2013.00127.x (2013).
56. Ovaskainen, O., Abrego, N., Halme, P. & Dunson, D. Using latent variable models to identify large networks of speciestospecies
associations at different spatial scales. Methods Ecol. Evol. 7, 549–555. https://doi.org/10.1111/2041210X.12501 (2015).
57. Tikhonov, G., Abrego, N., Dunson, D. & Ovaskainen, O. Using joint species distribution models for evaluating how speciestospecies associations depend on the environmental context. Methods Ecol. Evol. 8, 443–452. https://doi.org/10.1111/2041210X.
12723 (2017).
58. Lebreton, J.D., Burnham, K. P., Clobert, J. & Anderson, D. R. Modeling survival and testing biological hypotheses using marked
animals: A unified approach with case studies. Ecol. Monogr. 62, 67–118. https://doi.org/10.2307/2937171 (1992).
59. Chung, Y., Gelman, A., RabeHesketh, S., Liu, J. & Dorie, V. Weakly informative prior for point estimation of covariance matrices
in hierarchical models. J. Educ. Behav. Stat. 40, 136–157. https://doi.org/10.3102/1076998615570945 (2015).
60. Hanson, T. E. et al. Informative g priors for logistic regression. Bayesian Anal. 9, 597–612 (2014).
61. Baddeley, A. et al. Spatial logistic regression and changeofsupport in poisson point processes. Electron. J. Stat. 4, 1151–1201.
https://doi.org/10.1214/10EJS581 (2010).
62. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria
(2021).
63. Caradima, B., Schuwirth, N. & Reichert, P. From individual to joint species distribution models: A comparison of model complexity
and predictive performance. J. Biogeogr. 46, 2260–2274 (2019).
64. Jeffers, J. N. Two case studies in the application of principal component analysis. J. R. Stat. Soc. Ser. C (Applied Statistics) 16, 225–236
(1967).
Scientific Reports 
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
13
Vol.:(0123456789)
�www.nature.com/scientificreports/
65. Hill, C. R., Fomby, T. B. & Johnson, S. R. Component selection norms for principal components regression. Commun. Stat. Methods
6, 309–334 (1977).
66. Kung, E. C. & Sharif, T. A. Regression forecasting of the onset of the Indian summer monsoon with antecedent upper air conditions.
J. Appl. Meteorol. Climatol. 19, 370–380 (1980).
67. Smith, G. & Campbell, F. A critique of some ridge regression methods. J. Am. Stat. Assoc. 75, 74–81 (1980).
68. Jolliffe, I. T. A note on the use of principal components in regression. J. R. Stat. Soc. Ser. C (Applied Statistics) 31, 300–303 (1982).
69. Zimmerman, D. L. & Ver Hoef, J. M. On deconfounding spatial confounding in linear models. Am. Stat. 1–9 (2021).
70. Hefley, T. J., Hooten, M. B., Drake, J. M., Russell, R. E. & Walsh, D. P. When can the cause of a population decline be determined?.
Ecol. Lett. 19, 1353–1362 (2016).
71. Fieberg, J., Ditmer, M. & Freckleton, R. Understanding the causes and consequences of animal movement: A cautionary note on
fitting and interpreting regression models with timedependent covariates. Methods Ecol. Evol.https://d
oi.o
rg/1 0.1 111/j.2 0412 10X.
2012.00239.x (2012).
Acknowledgements
This research was funded by Colorado Parks and Wildlife.
Author contributions
J.V. and M.H. wrote the manuscript and designed the model. J.I. acquired the data. All authors reviewed the
manuscript.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/
10.1038/s41598022156946.
Correspondence and requests for materials should be addressed to J.J.V.E.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International
License, which permits use, sharing, adaptation, distribution and reproduction in any medium or
format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons licence, and indicate if changes were made. The images or other third party material in this
article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not
permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2022, corrected publication 2023
Scientific Reports 
Vol:.(1234567890)
(2022) 12:12235 
https://doi.org/10.1038/s41598022156946
14
�
Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.
Title
A name given to the resource
Journal Articles
Description
An account of the resource
CPW peerreviewed journal publications
Text
A resource consisting primarily of words for reading. Examples include books, letters, dissertations, poems, newspapers, articles, archives of mailing lists. Note that facsimiles or images of texts are still of the genre Text.
Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information see, http://dublincore.org/documents/dces/.
Title
A name given to the resource
Community confounding in joint species distribution models
Description
An account of the resource
Joint species distribution models have become ubiquitous for studying speciesenvironment relationships and dependence among species. Accounting for community structure often improves predictive power, but can also affect inference on speciesenvironment relationships. Specifically, some parameterizations of joint species distribution models allow interspecies dependence and environmental effects to explain the same sources of variability in species distributions, a phenomenon we call community confounding. We present a method for measuring community confounding and show how to orthogonalize the environmental and random species effects in suite of joint species distribution models. In a simulation study, we show that community confounding can lead to computational difficulties and that orthogonalizing the environmental and random species effects can alleviate these difficulties. We also discuss the inferential implications of community confounding and orthogonalizing the environmental and random species effects in a case study of mammalian responses to the Colorado bark beetle epidemic in the subalpine forest by comparing the outputs from occupancy models that treat species independently or account for interspecies dependence. We illustrate how joint species distribution models that restrict the random species effects to be orthogonal to the fixed effects can have computational benefits and still recover the inference provided by an unrestricted joint species distribution model.
Bibliographic Citation
A bibliographic reference for the resource. Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible.
Van Ee, J. J., J. S. Ivan, and M. B. Hooten. 2022. Community confounding in joint species distribution models. Scientific Reports 12:1–14. https://doi.org/10.1038/s41598022156946
Creator
An entity primarily responsible for making the resource
Van Ee, Justin J.
Ivan, Jacob S.
Hooten, Mevin B.
Subject
The topic of the resource
Colorado bark beetle
Joint species distribution model
Community confounding
Extent
The size or duration of the resource.
14 pages
Rights
Information about rights held in and over the resource
<a href="http://rightsstatements.org/vocab/InCNC/1.0/">IN COPYRIGHT  NONCOMMERCIAL USE PERMITTED</a>
Format
The file format, physical medium, or dimensions of the resource
application/pdf
Language
A language of the resource
English
Is Part Of
A related resource in which the described resource is physically or logically included.
Scientific Reports
Date Accepted
Date of acceptance of the resource. Examples of resources to which a Date Accepted may be relevant are a thesis (accepted by a university department) or an article (accepted by a journal).
06/28/2022
Date Issued
Date of formal issuance (e.g., publication) of the resource.
07/18/2022
Date Submitted
Date of submission of the resource. Examples of resources to which a Date Submitted may be relevant are a thesis (submitted to a university department) or an article (submitted to a journal).
11/24/2021
Type
The nature or genre of the resource
Article