Understanding the internal components of the sampler


It is important to understand the main principles behind the underlying algorithm, as they shed light on understanding the individual components. Once you understand the big picture it should also be easier to write your own extensions.

The structure of the algorithm can be summarized with the following graph which we will break down below:

internal structure

Inputs


The inputs to the algorithm are of three types.

MCMC

The MCMC object is simply a list of updates and a specification of a backend. At each MCMC iteration the sampler will cycle through the list of these updates and perform each of them consecutively, as depicted in the red box in The Algorithm (subject to some modifications, TODO see advanced use). This list is de facto how you point to a specific MCMC algorithm and ask the sampler to use it. The last, optional argument backend is a way to use your own Workspaces. If default backend is used, then also default Workspaces will be created at the Initialization step.

Callbacks

Callbacks are instructions that allow for certain interactions with the state of the sampler while the sampler is still working. In The Algorithm section of a graph you can see that each callback may be executed before and after any call to an MCMC update. These can be such instructions as:

  • print useful info to the console about the state of the sampler
  • update diagnostic plots
  • save the data to a csv file

Other

These are some prerequisites for running an MCMC sampler, such as:

  • the data
  • specification of the number of MCMC iterations
  • initial guess of a parameter

As well as some additional named options (kwargs...) that will be passed down to various initializers and that may be used by your own Workspaces etc.

Initialization


At the time of initialization of the sampler the global, and then, local workspaces are created. The concept of workspaces is central to ExtensibleMCMC and it is the reason why the package can be efficient even for very computationally intensive problems. The workspaces are essentially pre-allocated pieces of memory needed by the MCMC sampler. Consequently, if implemented correctly, the bottleneck of the algorithm should lie in computations and not memory-allocations.

Tip

Often times local workspaces are best conceptualized as a window view of some substate of the global workspace. The global workspace is initialized before local workspaces and is passed to the initializers of local workspaces. Consequently, it is often reasonable to define certain fields of the local workspaces as views to the relevant fields of the global workspace.

After workspaces, the MCMCSchedule is initialized. It is an iterator that iterates through the MCMC steps and at each step iterates through updates to perform. In the simplest setting it takes a form of a simple double for loop as illustrated in The Algorithm, but it can be modified.

Finally, each of the callback is given a chance to perform some initializations.

The algorithm


In principle, there is nothing explicit in The Algorithm signifying that it is an MCMC algorithm. However, it becomes an MCMC algorithm, by the virtue of updates being of the MCMC type.

The important part is marked in red. It represents a single MCMC update and illustrates that each update sees the corresponding pair of update and its local workspace as well global workspace and performs updates based on instructions in update and using pieces of memory that are in a local and the global workspace.

Additionally, notice that prior to each call to mcmc update there is an update call that allows for some exchange of information between the local and global workspaces.

See further sections of this manual to learn more about various components of the algorithm's flow above.