No-U-Turn Sampler

Table of contents


Algorithm Description

The No-U-Turn Sampler (NUTS) is a Hamiltonian Monte Carlo (HMC) method that adaptively chooses the number of leapfrog steps and step size. The description below is a modified and shortened version of Algorithm 6 (‘No-U-Turn Sampler with Dual Averaging’) in Hoffman and Gelman (2014).

Let \(\theta^{(i)}\) denote a \(d\)-dimensional vector of stored values at stage \(i\) of the algorithm, and denote the Hamiltonian by

\[H(\theta, p) := \frac{1}{2} \log \left\{ (2 \pi)^d | \mathbf{M} | \right\} + \frac{1}{2} p^\top \mathbf{M}^{-1} p - \ln K(\theta | X)\]

where \(\mathbf{M}\) is a pre-conditioning matrix. The NUTS algorithm proceeds in 3 steps.

  1. (Initialization) Sample \(p^{(i)} \sim N(0,\mathbf{M})\),

\[u \sim U(0, \exp( H(\theta^{(i-1)}, p^{(i)}) )),\]

and set: \(n = 1\), \(s = 1\),

\[\theta^{(*)} = \theta^{(+)} = \theta^{(-)} = \theta^{(i-1)},\]

and

\[p^{(*)} = p^{(+)} = p^{(-)} = p^{(i)}\]
  1. (Proposal Step) while \(s = 1\) do:

  1. Sample a direction: \(v \sim R\), where \(R\) denotes the standard Rademacher distribution (i.e., \(v\) takes values in \(\{-1,1\}\) with equal probability).

  2. if \(v = -1\):

    update \(\theta^{(*)}\), \(\theta^{(-)}\), and \(p^{(-)}\) by calling the proposal tree-building function (see Hoffman and Gelman (2014) for details).

    else:

    update \(\theta^{(*)}\), \(\theta^{(+)}\), and \(p^{(+)}\) by calling the proposal tree-building function.

    In addition to the proposal and momentum values, \(n'\), \(s'\), \(\alpha\), and \(n_\alpha\) are also updated.

    (Note that, in the tree building process, each tree takes \(2^{\text{depth}}\) leapfrog steps with step size \(v \epsilon\).)

  3. if \(s' = 1\):

    \[\theta^{(i)} = \begin{cases} \theta^{(*)} & \text{ with probability } n' / n \\ \theta^{(i-1)} & \text{ else } \end{cases}\]
  4. Set: \(n = n + n'\), \(\text{depth} = \text{depth} + 1\), and

\[s = s' \times \mathbf{1}[ (\theta^{(+)} - \theta^{(-)}) \cdot p^{(-)} \geq 0 ] \times \mathbf{1}[ (\theta^{(+)} - \theta^{(-)}) \cdot p^{(+)} \geq 0 ]\]
  1. (Update Step Size)

if \(i \leq\) n_adapt_draws:

update

\[h_i = \left( 1 - \frac{1}{i + t_0} \right) \times h_{i-1} + \frac{1}{i + t_0} \left( \delta - \frac{\alpha}{n_\alpha} \right)\]

where: \(t_0\) is set via nuts_settings.t0_val; and \(\delta\), the target acceptance rate, is set via nuts_settings.target_accept_rate.

Set

\[\ln \epsilon = \mu - \frac{\sqrt{i}}{\gamma} \times h_i\]

where \(\mu = \ln(10 \times \epsilon_0)\) and \(\gamma\) is set via nuts_settings.gamma_val.

\[\ln \bar{\epsilon}_i = i^{-\kappa} \times \ln \epsilon + (1 - i^{-\kappa}) \times \ln \bar{\epsilon}_{i-1}\]

where \(\kappa\) is set via nuts_settings.kappa_val.

else:

Set \(\epsilon\) equal to \(\bar{\epsilon}\) from the last adaptation round.

The algorithm stops when the number of draws reaches n_burnin_draws + n_keep_draws, and returns the final n_keep_draws number of draws.


Function Declarations

bool nuts(const ColVec_t &initial_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *target_data)> target_log_kernel, Mat_t &draws_out, void *target_data)

The No-U-Turn Sampler (NUTS) MCMC Algorithm.

Parameters
  • initial_vals – a column vector of initial values.

  • target_log_kernel – the log posterior kernel function of the target distribution, taking three arguments:

    • vals_inp a vector of inputs; and

    • grad_out a vector to store the gradient; and

    • target_data additional data passed to the user-provided function.

  • draws_out – a matrix of posterior draws, where each row represents one draw.

  • target_data – additional data passed to the user-provided function.

Returns

a boolean value indicating successful completion of the algorithm.

bool nuts(const ColVec_t &initial_vals, std::function<fp_t(const ColVec_t &vals_inp, ColVec_t *grad_out, void *target_data)> target_log_kernel, Mat_t &draws_out, void *target_data, algo_settings_t &settings)

The No-U-Turn Sampler (NUTS) MCMC Algorithm.

Parameters
  • initial_vals – a column vector of initial values.

  • target_log_kernel – the log posterior kernel function of the target distribution, taking three arguments:

    • vals_inp a vector of inputs; and

    • grad_out a vector to store the gradient; and

    • target_data additional data passed to the user-provided function.

  • draws_out – a matrix of posterior draws, where each row represents one draw.

  • target_data – additional data passed to the user-provided function.

  • settings – parameters controlling the MCMC routine.

Returns

a boolean value indicating successful completion of the algorithm.


Control Parameters

The basic control parameters are:

  • size_t nuts_settings.n_burnin_draws: number of burn-in draws.

  • size_t nuts_settings.n_keep_draws: number of draws to keep (post sample burn-in period).

  • bool vals_bound: whether the search space of the algorithm is bounded. If true, then

    • ColVec_t lower_bounds: defines the lower bounds of the search space.

    • ColVec_t upper_bounds: defines the upper bounds of the search space.

Additional settings:

  • int nuts_settings.omp_n_threads: the number of OpenMP threads to use.

    • Default value: -1 (use all available threads divided by 2).

  • size_t nuts_settings.n_adapt_draws: the number of draws to use when adaptively setting the step size (\(\epsilon\)).

    • Default value: 1000.

  • fp_t nuts_settings.target_accept_rate: the target acceptance rate for the MCMC chain.

    • Default value: 0.55.

  • size_t nuts_settings.max_tree_depth: maximum tree depth for build tree function.

    • Default value: 10.

  • fp_t nuts_settings.gamma_val: the tuning parameter \(\gamma\), used when updating the step size (\(\epsilon\)).

    • Default value: 0.05.

  • fp_t nuts_settings.t0_val: the tuning parameter \(t_0\), used when updating the step size (\(\epsilon\)).

    • Default value: 10.

  • fp_t nuts_settings.kappa_val: the tuning parameter \(\kappa\), used when updating the step size (\(\epsilon\)).

    • Default value: 0.75.

  • Mat_t nuts_settings.precond_mat: preconditioning matrix for the leapfrog step.

    • Default value: a diagonal matrix.


Examples

Gaussian Distribution

Code to run this example is given below.

Armadillo (Click to show/hide)

#define MCMC_ENABLE_ARMA_WRAPPERS
#include "mcmc.hpp"

struct norm_data_t {
    arma::vec x;
};

double ll_dens(const arma::vec& vals_inp, arma::vec* grad_out, void* ll_data)
{
    const double pi = arma::datum::pi;

    const double mu    = vals_inp(0);
    const double sigma = vals_inp(1);

    norm_data_t* dta = reinterpret_cast<norm_data_t*>(ll_data);
    const arma::vec x = dta->x;
    const int n_vals = x.n_rows;

    //

    const double ret = - n_vals * (0.5 * std::log(2*pi) + std::log(sigma)) - arma::accu( arma::pow(x - mu,2) / (2*sigma*sigma) );

    //

    if (grad_out) {
        grad_out->set_size(2,1);

        //

        const double m_1 = arma::accu(x - mu);
        const double m_2 = arma::accu( arma::pow(x - mu,2) );

        (*grad_out)(0,0) = m_1 / (sigma*sigma);
        (*grad_out)(1,0) = (m_2 / (sigma*sigma*sigma)) - ((double) n_vals) / sigma;
    }

    //

    return ret;
}

double log_target_dens(const arma::vec& vals_inp, arma::vec* grad_out, void* ll_data)
{
    return ll_dens(vals_inp,grad_out,ll_data);
}

int main()
{
    const int n_data = 1000;

    const double mu = 2.0;
    const double sigma = 2.0;

    norm_data_t dta;

    arma::vec x_dta = mu + sigma * arma::randn(n_data,1);
    dta.x = x_dta;

    arma::vec initial_val(2);
    initial_val(0) = mu + 1; // mu
    initial_val(1) = sigma + 1; // sigma

    mcmc::algo_settings_t settings;

    settings.nuts_settings.n_burnin_draws = 2000;
    settings.nuts_settings.n_keep_draws = 2000;

    arma::mat draws_out;
    mcmc::nuts(initial_val, log_target_dens, draws_out, &dta, settings);

    //

    std::cout << "nuts mean:\n" << arma::mean(draws_out) << std::endl;
    std::cout << "acceptance rate: " << static_cast<double>(settings.nuts_settings.n_accept_draws) / settings.nuts_settings.n_keep_draws << std::endl;

    //

    return 0;
}

Eigen (Click to show/hide)

#define MCMC_ENABLE_EIGEN_WRAPPERS
#include "mcmc.hpp"

inline
Eigen::VectorXd
eigen_randn_colvec(size_t nr)
{
    static std::mt19937 gen{ std::random_device{}() };
    static std::normal_distribution<> dist;

    return Eigen::VectorXd{ nr }.unaryExpr([&](double x) { (void)(x); return dist(gen); });
}

struct norm_data_t {
    Eigen::VectorXd x;
};

double ll_dens(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* ll_data)
{
    const double pi = 3.14159265358979;

    const double mu    = vals_inp(0);
    const double sigma = vals_inp(1);

    norm_data_t* dta = reinterpret_cast<norm_data_t*>(ll_data);
    const Eigen::VectorXd x = dta->x;
    const int n_vals = x.size();

    //

    const double ret = - n_vals * (0.5 * std::log(2*pi) + std::log(sigma)) - (x.array() - mu).pow(2).sum() / (2*sigma*sigma);

    //

    if (grad_out) {
        grad_out->resize(2,1);

        //

        const double m_1 = (x.array() - mu).sum();
        const double m_2 = (x.array() - mu).pow(2).sum();

        (*grad_out)(0,0) = m_1 / (sigma*sigma);
        (*grad_out)(1,0) = (m_2 / (sigma*sigma*sigma)) - ((double) n_vals) / sigma;
    }

    //

    return ret;
}

double log_target_dens(const Eigen::VectorXd& vals_inp, Eigen::VectorXd* grad_out, void* ll_data)
{
    return ll_dens(vals_inp,grad_out,ll_data);
}

int main()
{
    const int n_data = 1000;

    const double mu = 2.0;
    const double sigma = 2.0;

    norm_data_t dta;

    Eigen::VectorXd x_dta = mu + sigma * eigen_randn_colvec(n_data).array();
    dta.x = x_dta;

    Eigen::VectorXd initial_val(2);
    initial_val(0) = mu + 1; // mu
    initial_val(1) = sigma + 1; // sigma

    mcmc::algo_settings_t settings;

    settings.nuts_settings.n_burnin_draws = 2000;
    settings.nuts_settings.n_keep_draws = 2000;

    //

    Eigen::MatrixXd draws_out;
    mcmc::nuts(initial_val, log_target_dens, draws_out, &dta, settings);

    //

    std::cout << "nuts mean:\n" << draws_out.colwise().mean() << std::endl;
    std::cout << "acceptance rate: " << static_cast<double>(settings.nuts_settings.n_accept_draws) / settings.nuts_settings.n_keep_draws << std::endl;

    //

    return 0;
}