Variational Non-Bayesian Inference of the Probability Density Function: Introduction

19 Apr 2024

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

Authors:

(1) U Jin Choi, Department of Mathematical Science, Korea Advanced Institute of Science and Technology & ujchoi@kaist.ac.kr;

(2) Kyung Soo Rim, Department of Mathematics, Sogang University & ksrim@sogang.ac.kr.


This paper presents a study on recovering a hidden population distribution from the viewpoint of a variational non-Bayesian approach. It shows that if the hidden probability density function (PDF) has continuous partial derivatives of order at least half the dimension, it can be perfectly reconstructed from a stationary ergodic process. First, we establish that if the PDF belongs to the Wiener algebra, its canonical ensemble form is uniquely determined through the Fréchet differentiation of the Kullback-Leibler divergence, that is, by minimizing the cross-entropy between the hidden PDF and the model. Second, we use the result that this differentiability of the PDF implies its membership in the Wiener algebra. Third, since the energy function of the canonical ensemble is defined as a series, the problem becomes one of solving equations of analytic series for the coefficients of the energy function. Using truncated polynomial series and proving that the partial sums of the energy function converge, we show that the approximation remains effective with a finite number of data points. Finally, in numerical experiments, we approximate the PDF from a random sample drawn from a bivariate normal distribution and use it to approximate the mean and covariance. These results support the accuracy and practical applicability of the method.
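The variational step summarized above can be illustrated with a standard exponential-family calculation. The derivation below is a sketch under our own normalization, not necessarily the paper's: we assume the window is Ω = [-π, π]^d, write q for the hidden PDF and x_1, ..., x_N for the observed sample, and expand the energy in complex exponentials e^{ik·x} with conjugate-symmetric coefficients so that it is real-valued.

```latex
% Illustrative assumptions: window \Omega = [-\pi,\pi]^d, hidden PDF q, sample x_1,\dots,x_N
p_c(x) = \frac{e^{-E_c(x)}}{Z(c)}, \qquad
E_c(x) = \sum_{k \in \mathbb{Z}^d} c_k\, e^{i k \cdot x}, \qquad
c_{-k} = \overline{c_k}, \qquad
Z(c) = \int_\Omega e^{-E_c(x)}\, dx

% KL divergence = cross-entropy H(q, p_c) minus the entropy H(q), which does not depend on c
D_{\mathrm{KL}}(q \,\|\, p_c)
  = \underbrace{\int_\Omega q(x)\, E_c(x)\, dx + \log Z(c)}_{H(q,\, p_c)} - H(q)

% Derivative along a perturbation \delta E(x) = \sum_k \delta c_k\, e^{i k \cdot x} with \delta c_{-k} = \overline{\delta c_k}
\frac{d}{dt}\Big|_{t=0} D_{\mathrm{KL}}(q \,\|\, p_{c + t\,\delta c})
  = \int_\Omega q(x)\, \delta E(x)\, dx - \int_\Omega p_c(x)\, \delta E(x)\, dx

% Vanishing for every admissible perturbation gives one equation per k; ergodicity replaces q by the sample mean
\int_\Omega p_c(x)\, e^{i k \cdot x}\, dx
  = \int_\Omega q(x)\, e^{i k \cdot x}\, dx
  \approx \frac{1}{N} \sum_{n=1}^{N} e^{i k \cdot x_n}, \qquad k \in \mathbb{Z}^d
```

Because log Z(c) is convex in the real and imaginary parts of the coefficients, any solution of these equations is a minimizer of the divergence.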

1. Introduction.

Statistical inference enables us to draw meaningful conclusions about a population's distribution by studying a representative sample. Many areas of scientific research implicitly or explicitly require estimating a population's distribution from a sample. By analyzing a subset of individuals or observations, we can gather information that applies to the larger group, even when studying the entire population is impractical or impossible.

Following the works [8, 12], machine learning methods have become popular for predicting hidden information from data. Today, most emerging artificial neural networks incorporate statistical methods. Nevertheless, their training is not guaranteed to reach a global optimum, because the result depends on the choice of initial conditions and because the derivatives involved can become very small in high dimensions.

To address this issue, many scholars have applied statistical inference. Among these methods, the variational Bayesian method is a commonly used technique for approximating posterior distributions. There are numerous excellent studies on this method; see [1, 2, 14, 22] for some of the most recent works.

Our study aims to identify the hidden probability density function (PDF) in a non-Bayesian setting, that is, without requiring a prior distribution. The PDF is recovered solely from a stationary ergodic process by entropy minimization. Additionally, we aim to estimate moments of the PDF, such as the mean and variance. We prove the norm convergence of the sequence of PDFs approximated from finite samples, which is useful in practical applications. For most probability models arising from natural phenomena, it is reasonable to assume that the random variable is bounded within a specific window, provided the window is wide enough to cover its range of variability. Moreover, as is common in many scientific fields, we assume that the stationary process is ergodic.

Our approach to finding the hidden PDF proceeds as follows. First, we express the PDF in the form of a canonical ensemble, as in the statistical mechanics of a system in thermal equilibrium. Second, by embedding the problem in an infinite-dimensional function space, we establish the existence of the Fréchet derivative of the entropy induced by the energy function of the PDF (for more information on the Fréchet derivative, refer to [18]). Third, we uniquely determine the energy function by minimizing the Kullback-Leibler divergence (KL-divergence) ([7, 15]). The KL-divergence, a central concept in information theory, is also widely used in artificial intelligence (for example, see [10], [4], [11]). Finally, we derive a system of polynomial series equations involving the sample means of complex exponentials and solve it numerically for random samples drawn from a bivariate normal distribution. From the resulting approximate PDFs, we compute approximations of the mean and variance and compare them with those of the bivariate normal distribution.
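The numerical procedure outlined above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the window [-4.5, 4.5]^2, the real cosine/sine basis (equivalent to complex exponentials with conjugate-symmetric coefficients), the truncation |k_i| <= 4, the midpoint quadrature grid, the damped Newton solver, and the helper names half_plane_frequencies, trig_features and fit_canonical_ensemble are all hypothetical choices. The code fits p(x) = exp(-E(x))/Z by minimizing the sample cross-entropy, which drives the model's expectations of the basis functions to their sample means, and then reads off the mean and covariance of the fitted PDF.

```python
# Illustrative sketch (assumption): a canonical-ensemble fit by moment matching, not the paper's solver.
import numpy as np


def half_plane_frequencies(max_freq):
    """Integer vectors k != 0 with |k_i| <= max_freq, keeping one vector from each pair {k, -k}."""
    pairs = [(k1, k2)
             for k1 in range(-max_freq, max_freq + 1)
             for k2 in range(-max_freq, max_freq + 1)
             if (k1, k2) > (0, 0)]  # lexicographic order selects a half-plane
    return np.array(pairs, dtype=float)


def trig_features(points, freqs, half_width):
    """Real form of e^{i k.x}: cos and sin of (pi / L) k.x for points in the window [-L, L]^2."""
    phase = (np.pi / half_width) * points @ freqs.T  # shape (n_points, n_freqs)
    return np.concatenate([np.cos(phase), np.sin(phase)], axis=1)


def fit_canonical_ensemble(sample, half_width=4.5, max_freq=4, grid_size=64, iters=100):
    """Fit p(x) = exp(-E(x)) / Z, with E a truncated trigonometric series, by minimizing
    mean sample energy + log Z (the cross-entropy up to a constant)."""
    freqs = half_plane_frequencies(max_freq)
    inside = np.all(np.abs(sample) < half_width, axis=1)  # bounded-window assumption
    target = trig_features(sample[inside], freqs, half_width).mean(axis=0)  # ergodic sample means

    # Midpoint quadrature grid on the window; `cell` is the area of one grid cell.
    step = 2.0 * half_width / grid_size
    axis = -half_width + (np.arange(grid_size) + 0.5) * step
    gx, gy = np.meshgrid(axis, axis, indexing="ij")
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    cell = step ** 2
    phi = trig_features(grid, freqs, half_width)

    def objective(theta):
        """Return (cross-entropy up to a constant, grid energies, log Z) via a stable log-sum-exp."""
        energy = phi @ theta
        shift = energy.min()
        log_z = -shift + np.log(np.sum(np.exp(shift - energy)) * cell)
        return target @ theta + log_z, energy, log_z

    theta = np.zeros(phi.shape[1])
    value, energy, log_z = objective(theta)
    for _ in range(iters):
        weights = np.exp(-energy - log_z) * cell  # model probabilities on the grid (sum to 1)
        model_mean = weights @ phi
        grad = target - model_mean  # gradient: sample means minus model means
        if np.max(np.abs(grad)) < 1e-10:
            break
        centered = phi - model_mean
        hess = centered.T @ (centered * weights[:, None])  # feature covariance under the model
        direction = np.linalg.solve(hess + 1e-10 * np.eye(len(theta)), grad)
        t = 1.0
        while t > 1e-8:  # backtracking so every accepted step does not increase the objective
            candidate = objective(theta - t * direction)
            if candidate[0] <= value:
                break
            t *= 0.5
        else:
            break  # no further decrease at machine precision
        theta = theta - t * direction
        value, energy, log_z = candidate

    weights = np.exp(-energy - log_z) * cell
    mean = weights @ grid
    centered = grid - mean
    cov = centered.T @ (centered * weights[:, None])
    residual = np.max(np.abs(target - weights @ phi))
    return theta, mean, cov, residual


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mean = np.array([0.3, -0.2])
    true_cov = np.array([[1.0, 0.4], [0.4, 0.8]])
    sample = rng.multivariate_normal(true_mean, true_cov, size=4000)
    _, est_mean, est_cov, residual = fit_canonical_ensemble(sample)
    print("estimated mean:", est_mean)
    print("estimated covariance:\n", est_cov)
```

The parameters in the `__main__` block (mean (0.3, -0.2), covariance [[1.0, 0.4], [0.4, 0.8]], 4,000 points) are illustrative and not taken from the paper's experiments. Because the objective is convex in the coefficients, the damped Newton iteration is a natural choice here; raising max_freq refines the energy approximation but adds coefficients that must be estimated from the same sample.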