
Journal Article

Spatial outliers as a pattern determinant for explaining heterogeneity

Authors: Kai Ren*, Yongze Song*, Xinyue Yang, Xi Wang, Min Chen, Qiang Yu

Journal: International Journal of Geographical Information Science

Keywords: Spatial heterogeneity; Spatial outlier patterns; Second-dimension outliers; Geographical detector; Stratification-based analysis
https://doi.org/10.1080/13658816.2026.2682957

Overview

This paper introduces a second-dimension outlier-driven heterogeneity (SOH) model for explaining spatial heterogeneity with local outlier configurations. The model derives multi-scale spatial outlier patterns (SOPs), embeds them in a stratification-detection workflow, and evaluates individual, interaction, and scale-dependent effects in Australian barley production.

Abstract

Explaining spatial heterogeneity typically relies on first-dimension covariates that describe spatial gradients. However, many geographic phenomena also exhibit irregular and locally extreme structures that are difficult to capture using smooth relationships. This study proposed a second-dimension outlier-driven heterogeneity (SOH) model, in which the first dimension refers to covariate variation across space, while the second dimension represents spatial pattern information captured from local outlier configurations. SOH derives multi-scale spatial outlier patterns (SOPs) using a second-dimension outlier model and embeds them in a stratification-detection workflow, where decision tree-based stratification defines strata and the geographical detector evaluates explanatory power via the power of determinant (PD). The model supports evaluation of individual effects, SOP interactions, and SOP-variable interactions, with assessment of scale dependence across neighbourhood buffers. Application of this model to spatial heterogeneity in Australian barley production showed that SOPs strengthened heterogeneity explanation relative to original variables, and that SOP interactions and SOP-variable interactions yielded synergistic gains in PD. A scale threshold around 200 km was identified, beyond which SOP-only models approached the explanatory performance of combined models, indicating that multi-scale SOPs captured broad spatial context. Overall, SOH provides a unified approach for incorporating outlier-driven spatial patterns into spatial heterogeneity analysis.

Method Implementation

SOH explains spatial heterogeneity by adding a second dimension of spatial pattern information to conventional covariates. The first dimension is the original covariate variation across space, while the second dimension is a set of multi-scale spatial outlier patterns (SOPs) derived from local outlier configurations.

For \(n\) spatial units, the explanatory and response variables are written as:

\[ X = \{x_i\}_{i=1}^{n}, \qquad Y = \{y_i\}_{i=1}^{n} \]

  1. Prepare spatial variables on a common support

    The response variable \(Y\) and explanatory variables \(X\) are first aligned on the same spatial units so that covariate values, SOPs, and heterogeneity statistics can be compared consistently. In the barley case study, the response was barley production at Statistical Area Level 2 (SA2) normalised by polygon area, and environmental predictors were harmonised to a common coordinate system and analysis grid before being aggregated to the modelling support.

  2. Generate multi-scale spatial outlier patterns

    For each explanatory variable \(X\), SOH uses the second-dimension outlier model to identify positive and negative local deviations relative to neighbourhood distributions. Across a set of buffer radii \(R=\{r_1,r_2,\ldots,r_k\}\), the spatial pattern variable \(\Psi\) is constructed as:

    \[ \Psi = \bigcup_{r \in R} \left\{ \sum_{v \in \mathcal{N}_r(u)} O^{+}(X,v)\,\mathbb{I}\!\left(O^{+}(X,v)>\tau\right), \sum_{v \in \mathcal{N}_r(u)} O^{-}(X,v)\,\mathbb{I}\!\left(O^{-}(X,v)<-\tau\right) \right\} \]

    Here, \(\mathcal{N}_r(u)\) is the neighbourhood around target spatial unit \(u\), \(O^{+}\) and \(O^{-}\) are positive and negative outlier components, and \(\tau\) controls the threshold for retaining pronounced local deviations. In the case study, positive and negative outliers were defined using the \(\bar{x}\pm2\sigma\) criterion, and SOPs were generated from 20 km to 200 km at 20-km intervals.

  3. Build SOP variables for each spatial unit

    For each spatial unit \(i\), positive and negative outlier summaries are retained at each neighbourhood scale to form the second-dimension pattern set:

    \[ \Psi_i = \left\{ O^{(r_1)}_{+,i}, O^{(r_1)}_{-,i}, O^{(r_2)}_{+,i}, O^{(r_2)}_{-,i}, \ldots, O^{(r_k)}_{+,i}, O^{(r_k)}_{-,i} \right\} \]

    These SOP variables describe where an explanatory factor is locally enriched, depleted, or spatially unusual relative to its surrounding context. They are used as explanatory pattern descriptors rather than as residuals or noise filters.

  4. Transform predictors into data-adaptive strata

    SOH uses a stratification-detection strategy. A CART regression tree converts an original variable, an SOP variable, or their combination into categorical strata:

    \[ Z^{(\Psi)} = \{z_i^{(\Psi)}\}_{i=1}^{n}, \qquad z_i^{(\Psi)} \in \{1,2,\ldots,L\} \]

    In the implementation, CART was fitted with the R package rpart using \(cp=0.01\), and the number of strata \(L\) was determined adaptively by the terminal nodes of the fitted tree. The tree is used here as a stratification tool, not as an out-of-sample prediction model.

  5. Measure explanatory power with the geographical detector

    The geographical detector evaluates whether the strata induced by \(\Psi\) explain spatial heterogeneity in \(Y\). The power of determinant (PD) is:

    \[ PD = \Omega_{\Psi} = 1 - \frac{ \sum_{h=1}^{L} N_h^{(\Psi)}\sigma_h^{2(\Psi)} }{ N^{(\Psi)}\sigma^{2(\Psi)} } \]

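    Here, \(N_h^{(\Psi)}\) and \(\sigma_h^{2(\Psi)}\) are the number of spatial units and the variance of \(Y\) within stratum \(h\), and \(N^{(\Psi)}\) and \(\sigma^{2(\Psi)}\) are the corresponding quantities over all units. PD ranges from 0 to 1; a larger PD indicates that the constructed strata produce stronger between-stratum differentiation and lower within-stratum variance, and hence stronger explanatory power for spatial heterogeneity.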

  6. Evaluate original variables, SOPs, interactions, and scale effects

    PD is calculated under several predictor configurations: original covariates \(X\), SOP variables \(\Psi\), and combined sets \(X \cup \Psi\). The same framework is then used to test pairwise SOP interactions, interactions between SOPs and original variables, category-level interactions, and the sensitivity of PD to neighbourhood buffer size. In the paper, this workflow showed that SOPs added explanatory power beyond original variables and that the SOH response stabilised around a neighbourhood threshold of approximately 200 km.
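
Step 1 can be illustrated with a minimal Python sketch. It follows the case-study pattern described above (production divided by polygon area for the response, grid-cell predictor values brought onto the same units), but the unit IDs, toy numbers, and the choice of the mean as the aggregation rule are hypothetical assumptions for illustration, not the paper's code.

```python
from collections import defaultdict

# Hypothetical toy inputs: production (tonnes) and polygon area (km^2) per unit,
# plus grid cells tagged with the unit they fall in as (unit_id, predictor value).
production = {"unit_a": 1200.0, "unit_b": 300.0}
area_km2 = {"unit_a": 400.0, "unit_b": 150.0}
grid_cells = [("unit_a", 10.0), ("unit_a", 14.0), ("unit_b", 8.0)]


def normalise_by_area(prod, area):
    """Response Y: production per unit area for units with a positive area."""
    return {u: prod[u] / area[u] for u in prod if area.get(u, 0) > 0}


def aggregate_mean(cells):
    """Covariate X (assumed rule): mean of the grid-cell values inside each unit."""
    sums, counts = defaultdict(float), defaultdict(int)
    for unit, value in cells:
        sums[unit] += value
        counts[unit] += 1
    return {u: sums[u] / counts[u] for u in sums}


y = normalise_by_area(production, area_km2)   # {"unit_a": 3.0, "unit_b": 2.0}
x = aggregate_mean(grid_cells)                # {"unit_a": 12.0, "unit_b": 8.0}
units = sorted(set(y) & set(x))               # common support shared by X and Y
```

Restricting the analysis to `units` keeps \(X\), \(Y\), and any SOPs derived later indexed by the same \(n\) spatial units.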
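
Steps 2 and 3 can be sketched as below. The exact definition of \(O^{+}\) and \(O^{-}\) in the second-dimension outlier model is not reproduced here, so this sketch assumes they are z-scores of \(x_v\) relative to the mean and standard deviation of all values inside the buffer \(\mathcal{N}_r(u)\), with \(\tau = 2\) standing in for the \(\bar{x}\pm2\sigma\) criterion. The function names and toy data are hypothetical, and coordinates are assumed to be projected and in kilometres.

```python
import math


def local_outlier_sums(coords, values, u, radius, tau=2.0):
    """Positive and negative outlier sums around unit u for one buffer radius.

    Assumption: O(X, v) is the z-score of x_v relative to the mean and population
    standard deviation of all values in N_r(u), so tau = 2 mimics mean +/- 2 sd.
    """
    ux, uy = coords[u]
    # N_r(u): every unit (u included) whose centroid lies within `radius` of u.
    neigh = [v for v, (vx, vy) in enumerate(coords)
             if math.hypot(vx - ux, vy - uy) <= radius]
    vals = [values[v] for v in neigh]
    mean = sum(vals) / len(vals)
    sd = math.sqrt(sum((x - mean) ** 2 for x in vals) / len(vals))
    if sd == 0:
        return 0.0, 0.0  # no local variation, hence no outliers
    z = [(x - mean) / sd for x in vals]
    pos = sum((s for s in z if s > tau), 0.0)    # O+ kept only above tau
    neg = sum((s for s in z if s < -tau), 0.0)   # O- kept only below -tau
    return pos, neg


def sop_vector(coords, values, u, radii, tau=2.0):
    """Psi_u: (positive, negative) outlier sums for every radius, in order."""
    psi = []
    for r in radii:
        psi.extend(local_outlier_sums(coords, values, u, r, tau))
    return psi


coords = [(float(i), 0.0) for i in range(10)]  # toy units 1 km apart
values = [0.0] * 9 + [10.0]                    # one locally extreme value
radii = range(20, 201, 20)                     # 20-200 km at 20-km intervals
psi_0 = sop_vector(coords, values, 0, radii)
print(len(psi_0), psi_0[:2])                   # 20 [3.0, 0.0]
```

With the ten radii from 20 to 200 km, each explanatory variable yields 20 SOP values per spatial unit (one positive and one negative sum per radius), which matches the length of \(\Psi_i\) in step 3 for \(k = 10\).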
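
Step 4 fits CART with rpart in R. As a language-neutral illustration, the following pure-Python sketch grows a regression tree greedily on the sum of squared errors and turns its terminal nodes into strata. It is not rpart: it only approximates rpart's defaults (cp = 0.01, minsplit = 20, minbucket = 7) by keeping a split when it reduces total SSE by at least cp times the root SSE, and it omits rpart's cross-validation, pruning, and surrogate splits. Each row may hold one original variable, one SOP variable, or several of them combined.

```python
def sse(ys):
    """Sum of squared errors around the mean (the regression-tree impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows, ys, idx, min_bucket):
    """Best (gain, feature, threshold) over all features for the units in idx."""
    parent = sse([ys[i] for i in idx])
    best = None
    for f in range(len(rows[0])):
        order = sorted(idx, key=lambda i: rows[i][f])
        # Left child takes the first k units; both children keep >= min_bucket.
        for k in range(min_bucket, len(order) - min_bucket + 1):
            lo, hi = rows[order[k - 1]][f], rows[order[k]][f]
            if lo == hi:
                continue  # no threshold separates tied values
            gain = (parent - sse([ys[i] for i in order[:k]])
                    - sse([ys[i] for i in order[k:]]))
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2)
    return best


def cart_strata(rows, ys, cp=0.01, min_split=20, min_bucket=7):
    """Stratum id (1..L) for each unit, one id per terminal node of the tree.

    A split is kept only if it lowers total SSE by at least cp * SSE(root),
    following rpart's documented meaning of cp; pruning is omitted.
    """
    min_gain = cp * sse(list(ys))
    strata = [0] * len(ys)
    next_id = 1
    stack = [list(range(len(ys)))]
    while stack:
        idx = stack.pop()
        split = best_split(rows, ys, idx, min_bucket) if len(idx) >= min_split else None
        if split is None or split[0] <= 0 or split[0] < min_gain:
            for i in idx:
                strata[i] = next_id  # this terminal node becomes one stratum
            next_id += 1
            continue
        _, f, t = split
        stack.append([i for i in idx if rows[i][f] > t])   # right child
        stack.append([i for i in idx if rows[i][f] <= t])  # left child, popped first
    return strata


rows = [[float(i)] for i in range(40)]  # one predictor (toy data)
ys = [0.0] * 20 + [10.0] * 20           # response with a single break
strata = cart_strata(rows, ys)
print(sorted(set(strata)))              # [1, 2]
```

As in the paper's workflow, the number of strata \(L\) is not fixed in advance; it is whatever number of terminal nodes the stopping rule leaves.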
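
The PD formula in step 5 reduces to a ratio of sums of squares, because \(N_h\sigma_h^2\) is the within-stratum sum of squared deviations and \(N\sigma^2\) is the total sum of squares. A minimal Python sketch, assuming population variances and stratum labels of any hashable type:

```python
from collections import defaultdict


def sum_sq(values):
    """Sum of squared deviations from the mean, i.e. N * population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def power_of_determinant(y, strata):
    """PD = 1 - sum_h N_h * var_h / (N * var) for a response and stratum labels."""
    if not y or len(y) != len(strata):
        raise ValueError("y and strata must be non-empty and of equal length")
    total = sum_sq(y)
    if total == 0:
        raise ValueError("y has zero variance, so PD is undefined")
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    within = sum(sum_sq(vals) for vals in groups.values())
    return 1 - within / total


print(power_of_determinant([1, 2, 3, 4], [1, 1, 2, 2]))  # 0.8
```

For example, \(y = (1, 2, 3, 4)\) with strata \((1, 1, 2, 2)\) gives a total sum of squares of 5 and a within-stratum sum of 1, so \(PD = 1 - 1/5 = 0.8\).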
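
For the interaction tests in step 6, one common approach in the geographical detector literature overlays two stratifications, so that each combination of categories forms a new stratum, and compares the PD of the overlay with the individual PDs. The sketch below follows that classic scheme and its five interaction types; it is an assumption about how interactions could be computed, since SOH may instead fit CART on the combined variables as described in step 4.

```python
import math
from collections import defaultdict


def power_of_determinant(y, strata):
    """PD = 1 - within-stratum sum of squares / total sum of squares."""
    def sum_sq(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    groups = defaultdict(list)
    for value, h in zip(y, strata):
        groups[h].append(value)
    return 1 - sum(sum_sq(g) for g in groups.values()) / sum_sq(y)


def interaction(y, z1, z2):
    """Compare PD of two stratifications with the PD of their overlay."""
    q1, q2 = power_of_determinant(y, z1), power_of_determinant(y, z2)
    q12 = power_of_determinant(y, list(zip(z1, z2)))  # each (z1, z2) pair is a stratum
    if math.isclose(q12, q1 + q2):
        kind = "independent"
    elif q12 > q1 + q2:
        kind = "nonlinear enhancement"
    elif q12 > max(q1, q2):
        kind = "bivariate enhancement"
    elif q12 >= min(q1, q2):
        kind = "univariate weakening"
    else:
        kind = "nonlinear weakening"
    return q1, q2, q12, kind


# XOR-like toy data: neither stratification explains y alone, but together they do.
print(interaction([0, 1, 1, 0], [1, 1, 2, 2], [1, 2, 1, 2]))
# (0.0, 0.0, 1.0, 'nonlinear enhancement')
```

In this toy case neither stratification explains any variation on its own (PD = 0 for both), while their overlay explains all of it (PD = 1), the kind of synergistic gain that the interaction tests are designed to detect.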

Data and Code Availability

The data and code supporting this study are publicly available through both Figshare and GitHub.

Funding

This research was supported by the National Natural Science Foundation of China (Outstanding Young Scholars Program, Grant No. 42325107).