
XVII CLAPEM
2026-03-06
Motivation
Mathematical Framework
Estimation Method and Theoretical Results
Applications to real data
Discussion and Future Work
\bf X^{(i)} = \big(X_1^{(i)}, X_2^{(i)}, \dots, X_d^{(i)}\big).
X_v^{(i)} \in A, A a finite alphabet, for v \in \{1,2,\dots,d\}.
Stationary and ergodic process \bf X = \{X^{(i)}: -\infty<i<\infty\} in \big((A^d)^\mathbb Z, \mathcal F, \mathbb P\big).
We assume the process \bf X has an underlying graph G^* = (V, E^*).
Classic Problem: In the IID scenario, we have the standard model selection problem for discrete graphical models or Markov random fields on graphs.
Established Literature: Extensive research exists for this setting, including works by Lauritzen (1996), Koller and Friedman (2009), and Leonardi et al. (2024).
Common Approaches:
Most traditional graphical model techniques assume that observations \mathbf{X}^{(1)}, \mathbf{X}^{(2)}, \dots are Independent and Identically Distributed (IID).
However, in real-world scenarios, the independence assumption is often too restrictive:
Our Proposal: A global model selection criterion that ensures consistency even under mixing conditions.
Proposal: Method to estimate the graph of conditional dependencies for multivariate stochastic processes with mixing conditions.
Aim: combine and generalize previous works,
Leonardi et al. (2021): method assumes decomposition into subvectors with immediate neighbor dependencies,
Leonardi et al. (2023): estimator of neighborhood of each vertex for iid data.
Proposed solution: penalized pseudo-likelihood criterion for entire graph estimation for multivariate processes with mixing conditions.
Key advantages:
The process satisfies a mixing condition with rate \psi(\ell) if the dependence between the past (X^{(1:m)}) and the future (X^{(n:n+k-1)}) decays as the distance \ell=n-m increases:
\Bigl| \mathbb P (X_{future} | X_{past}) - \mathbb P (X_{future}) \Bigr| \leq \psi(n-m) \mathbb P(X_{future}) \tag{1}
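As a sanity check on (1), the IID setting is recovered as a special case. A short worked note, using the same shorthand X_{past} = X^{(1:m)} and X_{future} = X^{(n:n+k-1)} as above:

```latex
% IID special case of the mixing condition (1):
% independence makes the left-hand side vanish, so any rate \psi \ge 0 works,
% in particular \psi \equiv 0.
\mathbb P(X_{\text{future}} \mid X_{\text{past}}) = \mathbb P(X_{\text{future}})
\;\Longrightarrow\;
\Bigl| \mathbb P(X_{\text{future}} \mid X_{\text{past}}) - \mathbb P(X_{\text{future}}) \Bigr|
= 0 \;\leq\; \psi(\ell)\, \mathbb P(X_{\text{future}}) .
```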
Since the stationary distribution \pi of the process is unknown, we must estimate it from the data.
Assume we observe a sample of size n of the process, denoted by \{x^{(i)}\colon i=1,\dots,n\}. Then, for any W\subset V and any a_W\in A^{W}, we estimate \pi(a_W) = \mathbb P (X_W = a_W) by \begin{equation*} \widehat{\pi}(a_W) = \frac{N(a_W)}{n}\,, \end{equation*}
where N(a_W) denotes the number of times the configuration a_W appears in the sample. If \widehat\pi(a_W)>0, the conditional probabilities are estimated by \begin{equation*} \widehat\pi(a_{U}|a_{W}) = \frac{\widehat\pi(a_{U\cup W})}{\widehat\pi(a_{W})}\,, \end{equation*} for two disjoint subsets W,U\subset V and configurations a_W\in A^W, a_{U}\in A^{U}.
Given a graph G=(V,E), define G(v) = \big\{u \in V: (u,v) \in E \big\}, for v \in V, the set of neighbors of v in graph G.
Then we can compute \begin{equation*} \widehat\pi(a_v|a_{G(v)}) = \frac{\widehat\pi(a_{\{v\}\cup G(v)})}{\widehat\pi(a_{G(v)})}. \end{equation*}
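The empirical estimates above reduce to counting configurations. A minimal Python sketch, assuming the sample is stored as a list of length-d tuples with vertices indexed from 0; the function names are illustrative and not taken from the source:

```python
from collections import Counter

def empirical_prob(sample, W):
    """Return {a_W: N(a_W)/n} for the vertex subset W (a tuple of indices)."""
    n = len(sample)
    counts = Counter(tuple(x[w] for w in W) for x in sample)
    return {a_W: c / n for a_W, c in counts.items()}

def empirical_cond_prob(sample, U, W):
    """Return {(a_U, a_W): pi_hat(a_U | a_W)} for disjoint subsets U and W.

    Only configurations with pi_hat(a_W) > 0 appear, since the ratio
    pi_hat(a_{U u W}) / pi_hat(a_W) = N(a_U, a_W) / N(a_W) is undefined otherwise.
    """
    joint = Counter((tuple(x[u] for u in U), tuple(x[w] for w in W)) for x in sample)
    marg = Counter(tuple(x[w] for w in W) for x in sample)
    return {(a_U, a_W): c / marg[a_W] for (a_U, a_W), c in joint.items()}

# Toy sample with d = 3 binary coordinates (made-up values for illustration).
sample = [(0, 0, 1), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
pi_hat = empirical_prob(sample, W=(0, 1))
cond = empirical_cond_prob(sample, U=(2,), W=(0,))
```

Taking U = (v,) and W = G(v) gives the neighborhood conditionals \widehat\pi(a_v|a_{G(v)}).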
Given any graph G and a sample of the process, we define the pseudo-likelihood function by \widehat L(G) \;=\; \prod_{i=1}^n \:\prod_{v \in V} \widehat\pi(x^{(i)}_v | x^{(i)}_{G(v)})\,. \tag{2}
Taking logarithms and grouping equal configurations, the expression above can be written as \log \widehat L(G) = \sum_{v \in V} \sum_{a_v\in A} \sum_{a_{G(v)}\in A^{G(v)}} N(a_v,a_{G(v)})\log \widehat\pi(a_v|a_{G(v)}) \tag{3}
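Expression (3) can be evaluated directly from configuration counts. A minimal Python sketch, assuming the graph is passed as a dictionary mapping each vertex v to the tuple G(v) (a hypothetical representation chosen for this example):

```python
import math
from collections import Counter

def log_pseudo_likelihood(sample, neighbors):
    """Evaluate (3): sum over v and observed configurations of
    N(a_v, a_{G(v)}) * log( N(a_v, a_{G(v)}) / N(a_{G(v)}) ).
    Unobserved configurations have N = 0 and contribute nothing."""
    total = 0.0
    for v, nb in neighbors.items():
        joint = Counter((x[v], tuple(x[u] for u in nb)) for x in sample)
        marg = Counter(tuple(x[u] for u in nb) for x in sample)
        for (_, a_nb), count in joint.items():
            total += count * math.log(count / marg[a_nb])
    return total

# Toy sample with d = 2 in which the two coordinates always agree.
sample = [(0, 0), (0, 0), (1, 1), (1, 1)]
ll_empty = log_pseudo_likelihood(sample, {0: (), 1: ()})
ll_edge = log_pseudo_likelihood(sample, {0: (1,), 1: (0,)})
```

In this toy sample the graph with the edge attains the larger value, because each coordinate is perfectly predicted by its neighbor.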
The regularized graph estimator is given by
\widehat G = \underset{G}{\arg\max}\Big\{\log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}\Big\} \,.
Theorem (Leonardi and Severino, 2025): Assume the process \{X^{(i)}: i \in \mathbb{Z}\} satisfies the mixing condition presented before with \psi(\ell) = O(1/\ell^{1+\epsilon}) for some \epsilon>0. Then, by taking \lambda_n = c \log n, we have that \begin{equation*}
\widehat G = \underset{G}{\arg\max}\Big\{\log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}\Big\}
\end{equation*} satisfies \widehat G=G^* eventually almost surely as n\to \infty.
The proof of this consistency is based on results about the rate of convergence of empirical probabilities.
Consider \{\widehat G\neq G^*\} = \big\{G^* \subsetneq \widehat G \big\} \cup \big\{G^* \not\subset \widehat G \big\}.


We prove that, eventually almost surely as n\to\infty, neither of the cases above can happen, which implies that \widehat G = G^*.
The objective is to find the optimal graph by maximizing the regularized log-pseudo-likelihood function:
H(G) = \log \widehat L(G) - \lambda_n \sum_{v \in V} |A|^{|G(v)|}.
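For a handful of vertices, H(G) can be maximized by enumerating every undirected graph on V. The Python sketch below does this under stated assumptions: vertices are indexed from 0, \lambda_n = c \log n with an arbitrary c = 1, and the toy data are made up. The number of graphs is 2^{d(d-1)/2}, so this is only an illustration for very small d.

```python
import math
from collections import Counter
from itertools import combinations

def log_pl(sample, nbrs):
    """Log-pseudo-likelihood (3); nbrs maps each vertex v to the tuple G(v)."""
    total = 0.0
    for v, nb in nbrs.items():
        joint = Counter((x[v], tuple(x[u] for u in nb)) for x in sample)
        marg = Counter(tuple(x[u] for u in nb) for x in sample)
        total += sum(c * math.log(c / marg[a_nb]) for (_, a_nb), c in joint.items())
    return total

def criterion(sample, d, edges, alphabet_size, lam):
    """H(G) = log L(G) - lam * sum_v |A|^{|G(v)|} for an undirected edge set."""
    nbrs = {v: tuple(sorted({u for e in edges for u in e if v in e and u != v}))
            for v in range(d)}
    penalty = sum(alphabet_size ** len(nb) for nb in nbrs.values())
    return log_pl(sample, nbrs) - lam * penalty

def exhaustive_search(sample, d, alphabet_size, c=1.0):
    """Return the edge set maximizing H(G) over all graphs on d vertices."""
    lam = c * math.log(len(sample))  # lambda_n = c log n
    pairs = list(combinations(range(d), 2))
    best_edges, best_val = (), -math.inf
    for k in range(len(pairs) + 1):
        for edges in combinations(pairs, k):
            val = criterion(sample, d, edges, alphabet_size, lam)
            if val > best_val:
                best_edges, best_val = edges, val
    return set(best_edges), best_val

# Toy data: coordinates 0 and 1 always agree, coordinate 2 is unrelated to them.
sample = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
best_edges, best_val = exhaustive_search(sample, d=3, alphabet_size=2)
```

On this toy sample the maximizer contains the single edge between vertices 0 and 1: adding any other edge raises the penalty without increasing the pseudo-likelihood.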

Comprehensive tools for estimating the graph \hat{G}, executing search algorithms, and ensuring reproducible results.


Water volume measured at d stations along the river’s course.
Stations denoted as X_{u}, where u = 1, \ldots, d=10.
Vector \mathbf{X} observed at discrete time intervals (10-day mean).
Each observation denoted as \mathbf{X}^{(i)}= (X_{1}^{(i)}, \ldots, X_{10}^{(i)}).
Process \mathbf{X}^n=\{\mathbf{X}^{(i)}: 1 \leq i \leq n\}.
Data availability: daily measurements from October 1st, 1924, to February 28th, 2019.
Data considered: from January 1977 to December 2016.
Total of 1042 observations.
Discretization of the data into five levels based on quantiles.
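The quantile discretization in the last item can be sketched as follows. This is a minimal illustration in Python with NumPy, assuming the cut points are the empirical 20%, 40%, 60% and 80% quantiles computed separately for each station; the source does not state how the quantiles were computed, and the function name is hypothetical.

```python
import numpy as np

def discretize_by_quantiles(X, levels=5):
    """Map each column of X (time x stations) to integer labels 0..levels-1.

    Assumption: cut points are per-column empirical quantiles at
    1/levels, 2/levels, ..., (levels-1)/levels.
    """
    X = np.asarray(X, dtype=float)
    labels = np.empty(X.shape, dtype=int)
    probs = np.arange(1, levels) / levels
    for j in range(X.shape[1]):
        cuts = np.quantile(X[:, j], probs)
        # side="right": a value equal to a cut point goes to the upper level
        labels[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return labels

# Made-up data: two "stations" with 100 observations each.
X = np.column_stack([np.arange(100), np.arange(100)[::-1]])
labels = discretize_by_quantiles(X)
```

With five levels the alphabet is A = \{0, 1, 2, 3, 4\}, so |A| = 5 in the penalty term.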
We applied the Forward Stepwise Algorithm with 5-fold cross-validation to select the optimal penalty c.
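The slides do not spell out the forward stepwise algorithm. The Python sketch below shows one common greedy variant, assumed here for illustration: start from the empty graph, repeatedly add the single edge that most increases the penalized criterion H(G) = \log \widehat L(G) - \lambda_n \sum_{v} |A|^{|G(v)|}, and stop when no edge improves it. The penalty constant c is fixed by hand; choosing it by cross-validation is left out of the sketch, and all names and data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def log_pl(sample, nbrs):
    """Log-pseudo-likelihood; nbrs maps each vertex v to the tuple G(v)."""
    total = 0.0
    for v, nb in nbrs.items():
        joint = Counter((x[v], tuple(x[u] for u in nb)) for x in sample)
        marg = Counter(tuple(x[u] for u in nb) for x in sample)
        total += sum(c * math.log(c / marg[a_nb]) for (_, a_nb), c in joint.items())
    return total

def criterion(sample, d, edges, alphabet_size, lam):
    """Penalized criterion H(G) for an undirected edge set."""
    nbrs = {v: tuple(sorted({u for e in edges for u in e if v in e and u != v}))
            for v in range(d)}
    penalty = sum(alphabet_size ** len(nb) for nb in nbrs.values())
    return log_pl(sample, nbrs) - lam * penalty

def forward_stepwise(sample, d, alphabet_size, c=1.0):
    """Greedy search: add the best single edge while H(G) strictly increases."""
    lam = c * math.log(len(sample))
    edges = set()
    current = criterion(sample, d, edges, alphabet_size, lam)
    candidates = set(combinations(range(d), 2))
    while candidates:
        scored = [(criterion(sample, d, edges | {e}, alphabet_size, lam), e)
                  for e in sorted(candidates)]
        best_val, best_edge = max(scored)
        if best_val <= current:
            break  # no remaining edge improves the criterion
        edges.add(best_edge)
        candidates.discard(best_edge)
        current = best_val
    return edges, current

# Toy data: coordinates 0 and 1 always agree, coordinate 2 is unrelated to them.
sample = [(i % 2, i % 2, (i // 2) % 2) for i in range(200)]
fs_edges, fs_val = forward_stepwise(sample, d=3, alphabet_size=2)
```

A greedy search evaluates at most on the order of d^4 candidate graphs instead of all 2^{d(d-1)/2}, at the price of possibly stopping at a local maximum of H.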


The relation between stock exchanges’ performances reflects global market interconnectedness and economic trends.
Impacts investment strategies, risk assessment, and portfolio diversification.
Challenge: Variations in stock exchange operating hours due to different time zones.
Complex analysis required to account for global market fluctuations.

Objective: analyze the relation between stock exchanges’ performances.
Discretization: daily indicator of an increase in the index rate.
Data from May 18th, 2010 to September 20th, 2023.
Adjustments made to address missing data and holiday schedules.
Final dataset comprises 2,654 rows for analysis.
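The discretization step listed above (a daily indicator of an increase in the index) can be sketched in Python as follows, assuming each day's value is compared with the previous trading day's value; the exact rule used for the data is not stated in the source, and the numbers below are made up.

```python
def increase_indicator(values):
    """Return 1 where the index rose relative to the previous day, else 0.

    Assumption: ties (no change) are coded as 0. The output has one entry
    fewer than the input, since the first day has no predecessor.
    """
    return [1 if today > yesterday else 0
            for yesterday, today in zip(values, values[1:])]

# Hypothetical closing values of one index over five trading days.
closes = [100.0, 101.0, 101.0, 99.0, 105.0]
indicator = increase_indicator(closes)
```

Applying this to every exchange gives a binary alphabet A = \{0, 1\} for each coordinate of the process.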
Forward stepwise algorithm and a 5-fold cross-validation approach.
Estimated graph, considering the penalizing constant value chosen by cross-validation.
Cerqueira, A., Fraiman, D., Vargas, C., and Leonardi, F. (2017). A test of hypotheses for random graph distributions built from EEG data. IEEE Transactions on Network Science and Engineering.
Galves, A., Orlandi, E., and Takahashi, D. (2015). Identifying interacting pairs of sites in Ising models on a countable set. Braz. J. Probab. Stat.
Lauritzen, S. (1996). Graphical models, volume 17. Clarendon Press.
Leonardi, F., Carvalho, R., and Frondana, I. (2023). Structure recovery for partially observed discrete Markov random fields on graphs under not necessarily positive distributions. Scandinavian Journal of Statistics.
Leonardi, F., Lopez-Rosenfeld, M., Rodriguez, D., Severino, M., and Sued, M. (2021). Independent block identification in multivariate time series. Journal of Time Series Analysis.
Leonardi, F. and Severino, M. (2025). Model selection for Markov random fields on graphs under a mixing condition. Stochastic Processes and their Applications.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist.
Oodaira, H. and Yoshihara, K. (1971). The law of the iterated logarithm for stationary processes satisfying mixing conditions. Kodai Math. Semin. Rep.
Ravikumar, P., Wainwright, M., and Lafferty, J. (2010). High-dimensional Ising model selection using \ell_1-regularized logistic regression. Ann. Statist.
magnotfs@insper.edu.br
magnotairone.github.io
This work was partially supported by FAPESP and CNPq, Brazil.
