Bayesian Hierarchical Probabilistic Matrix Factorization - Filling gaps in hierarchical data
Franziska Schrodt, Farideh Fazayeli
BHPMF is a machine learning gap-filling technique, based on recommender-system methods, that has been developed specifically to make use of the information contained in hierarchical data structures, that is, data in which individual observations are nested within higher-order groups. Examples include data collected on living organisms with hierarchical information from the taxonomy, soil data with hierarchical information from different soil layers, and car manufacturing data with hierarchical information from car models, manufacturers and countries of origin.
BHPMF enables imputation (gap filling) where at least one measurement is available per observation. Its advantages over other gap-filling techniques are threefold: BHPMF (1) fills gaps at the level of the individual observation (e.g. an individual plant rather than a species), (2) makes use of correlations between measured variables and within hierarchical structures, and (3) provides an uncertainty estimate for each imputed value.
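To make the idea of nested observations concrete, here is a minimal sketch of a much simpler baseline: each gap is filled with the mean of the finest hierarchy level that has data (species, then genus, then all observations). This is not the BHPMF algorithm. It uses no matrix factorization, ignores correlations between traits, gives every individual the same group-level value, and produces no uncertainty estimates. The toy data, the trait names and the `fill_gaps` function are all hypothetical and exist only for illustration.

```python
from statistics import mean

# Hypothetical toy data: one row per individual plant; None marks a gap.
# Each row is (genus, species, {trait: value or None}).
observations = [
    ("Quercus", "Quercus robur", {"leaf_area": 30.0, "height": 20.0}),
    ("Quercus", "Quercus robur", {"leaf_area": 34.0, "height": None}),
    ("Quercus", "Quercus petraea", {"leaf_area": None, "height": 24.0}),
    ("Pinus", "Pinus sylvestris", {"leaf_area": 2.0, "height": None}),
]


def fill_gaps(rows):
    """Fill each gap with the mean of the finest hierarchy level that has data."""
    filled = []
    for genus, species, traits in rows:
        new_traits = dict(traits)
        for trait, value in traits.items():
            if value is not None:
                continue  # observed values are kept unchanged
            # Try the species first, then the genus, then all observations.
            level_filters = (
                lambda r: r[1] == species,
                lambda r: r[0] == genus,
                lambda r: True,
            )
            for matches in level_filters:
                known = [
                    r[2][trait]
                    for r in rows
                    if matches(r) and r[2][trait] is not None
                ]
                if known:
                    new_traits[trait] = mean(known)
                    break
        filled.append((genus, species, new_traits))
    return filled


filled = fill_gaps(observations)
for genus, species, traits in filled:
    print(species, traits)
```

In this toy example the second oak receives the height of its species, the Quercus petraea individual falls back to the genus mean for leaf area, and the pine falls back to the overall mean for height. A baseline like this assigns group-level values rather than individual-level ones, which is the limitation that the three points above address.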
The vignette provided as part of the R package demonstrates the different steps necessary to implement BHPMF, take full advantage of all its features and appropriately analyse the quality of the gap filling.
The R code and vignette are available on my GitHub account and on CRAN.
For more information on the algorithm used and technical details, please refer to:
BHPMF for ecological applications:
F. Schrodt, J. Kattge, H. Shan, F. Fazayeli, J. Joswig, A. Banerjee, M. Reichstein, M. Boenisch, S. Diaz, J. Dickie, A. Gillison, A. Karpatne, S. Lavorel, P.W. Leadley, C. Wirth, I. Wright, S.J. Wright, P.B. Reich (2015) BHPMF – a hierarchical Bayesian approach to gap-filling and trait prediction for macroecology and functional biogeography. Global Ecology and Biogeography 24: 1510–1521
F. Fazayeli, A. Banerjee, J. Kattge, F. Schrodt, P.B. Reich (2014) Uncertainty quantified matrix completion using Bayesian Hierarchical Matrix Factorization. International Conference on Machine Learning and Applications (ICMLA)
H. Shan, J. Kattge, P.B. Reich, A. Banerjee, F. Schrodt, M. Reichstein (2012) Gap Filling in the Plant Kingdom – Trait Prediction Using Hierarchical Probabilistic Matrix Factorization. International Conference on Machine Learning (ICML)