When estimating a probability density within the empirical Bayes framework, the non-parametric maximum likelihood estimate (NPMLE) tends to overfit the data. This issue is usually addressed by regularization: a penalization term is subtracted from the marginal log-likelihood before the maximization step, so that the estimate favors smooth solutions, resulting in the so-called maximum penalized likelihood estimation (MPLE).
The majority of penalizations currently in use are rather arbitrary brute-force solutions, which lack invariance under transformation of the parameters (reparametrization) and of the measurements.
This contradicts the principle that, if the underlying model has several equivalent formulations, the methods of inductive inference should lead to consistent results. Motivated by this principle and using an information-theoretic point of view, we suggest an entropy-based penalization term that guarantees this kind of invariance. The resulting density estimate can be seen as a generalization of reference priors. Using the reference prior as a hyperprior, on the other hand, is argued to be a poor choice for regularization. We also present a connection between the NPMLE, the cross entropy, and the principle of minimum discrimination information, which suggests another method of inference that contains the doubly-smoothed maximum likelihood estimation as a special case.