Bishop, C. M. (2006).
Pattern recognition and machine learning. Springer New York.
Boyles, R. A. (1983).
On the convergence of the EM algorithm.
Journal of the Royal Statistical Society. Series B (Methodological),
45(1), 47–50.
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Chau, N. H., Moulines, É., Rásonyi, M., Sabanis, S., and Zhang, Y. (2021).
On stochastic gradient Langevin dynamics with dependent data streams: The fully nonconvex case.
SIAM Journal on Mathematics of Data Science,
3(3), 959–986.
Chopin, N., and Papaspiliopoulos, O. (2020).
An introduction to sequential Monte Carlo. Springer Cham.
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977).
Maximum likelihood from incomplete data via the EM algorithm.
Journal of the Royal Statistical Society. Series B (Methodological),
39(1), 1–22.
Diebolt, J., and Ip, E. (1996).
In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors, Markov chain Monte Carlo in practice, pages 259–274. Chapman & Hall.
Diebolt, J., and Robert, C. P. (1994).
Estimation of finite mixture distributions through Bayesian sampling.
Journal of the Royal Statistical Society. Series B (Methodological),
56(2), 363–375.
Doucet, A., Godsill, S. J., and Robert, C. P. (2002).
Marginal maximum a posteriori estimation using Markov chain Monte Carlo.
Statistics and Computing,
12, 77–84.
Finch, S. J., Mendell, N. R., and Thode Jr., H. C. (1989).
Probabilistic measures of adequacy of a numerical search for a global maximum.
Journal of the American Statistical Association,
84(408), 1020–1023.
Fisher, R. A. (1912).
On an absolute criterion for fitting frequency curves.
Messenger of Mathematics,
41, 155–160.
Fletcher, R. (1987).
Practical methods of optimization. John Wiley & Sons.
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical learning. Springer New York.
MacKay, D. J. C. (2003).
Information theory, inference and learning algorithms. Cambridge University Press.
Meng, X.-L., and Rubin, D. B. (1991).
Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm.
Journal of the American Statistical Association,
86(416), 899–909.
Mengersen, K. L., and Robert, C. P. (1996).
Testing for mixtures: A Bayesian entropic approach. In
Bayesian statistics 5: Proceedings of the Fifth Valencia International Meeting, pages 255–276.
Neal, R. M., and Hinton, G. E. (1998).
A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan, editor, Learning in graphical models, pages 355–368. Springer Dordrecht.
Robbins, H., and Monro, S. (1951).
A stochastic approximation method.
The Annals of Mathematical Statistics,
22(3), 400–407.
Robert, C. P. (1996). In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors, Markov chain Monte Carlo in practice, pages 441–464. Chapman & Hall, London.
Robert, C. P., and Casella, G. (2004).
Monte Carlo statistical methods. Springer New York.
Sun, Y., Babu, P., and Palomar, D. P. (2016).
Majorization-minimization algorithms in signal processing, communications, and machine learning.
IEEE Transactions on Signal Processing,
65(3), 794–816.
Tanner, M. A., and Wong, W. H. (1987).
The calculation of posterior distributions by data augmentation.
Journal of the American Statistical Association,
82(398).
Wainwright, M. J., and Jordan, M. I. (2008).
Graphical models, exponential families, and variational inference.
Foundations and Trends in Machine Learning,
1(1-2), 1–305.
Wei, G. C. G., and Tanner, M. A. (1990a).
A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithm.
Journal of the American Statistical Association,
85(411), 699–704.
Wei, G. C. G., and Tanner, M. A. (1990b).
Posterior computations for censored regression data.
Journal of the American Statistical Association,
85(411), 829–839.
Wu, C. F. J. (1983).
On the convergence properties of the EM algorithm.
The Annals of Statistics,
11(1), 95–103.
Wu, T. T., and Lange, K. (2010).
The MM alternative to EM.
Statistical Science,
25(4), 492–505.
Amari, S. (1989).
Neural network models and connectionism (in Japanese), Vol. 22. University of Tokyo Press.
Kamatani, K. (2020). Monte Carlo statistical computation (in Japanese). Kodansha.