Multiple observations
In practice, we gather multiple observations of a single trajectory of a stochastic process. In this package, we refer to multiple discrete-time observations of a single trajectory as a single recording. We do not enforce any structure on a single recording; instead, we adopt the convention of representing it with an appropriate NamedTuple. However, when multiple recordings are combined into a single AllObservations data object, each recording is expected to follow this convention.
Below we describe how to handle observations of
- a single recording
- multiple recordings from the same law
- multiple recordings from multiple laws
Defining a single recording
To fully describe a single recording we need four elements:
- The law of the underlying stochastic process
- A starting time
- A prior over the starting point
- Discrete-time observations of the process
Consequently, this package adopts the convention of defining a single recording with a NamedTuple:
recording = (
    P = ...,
    obs = ...,
    t0 = ...,
    x0_prior = ...,
)
The law P needs to be defined by the user.
To define diffusion laws you may use DiffusionDefinition.jl. To define conditioned diffusion laws you may use GuidedProposals.jl.
A method var_parameter_names(::typeof(P)) must be implemented if one wants to use the functions and structs presented below.
obs is assumed to be a vector of observations, with each element of a type inheriting from Observation{D,T}. x0_prior is assumed to inherit from StartingPtPrior{T}. We provide helper functions that create a NamedTuple in the format above:
ObservationSchemes.build_recording — Function
build_recording(P, obs, t0, x0_prior)
A utility function that creates an appropriate NamedTuple that represents a single recording.
build_recording(
    ::Type{K}, tt, observs::Vector, P, t0, x0_prior; kwargs...
) where K
A utility function for building a recording. The times of the observations are assumed to be stored in tt and their values in observs. K is the type of observation (for instance LinearGsnObs). kwargs are keyword arguments that are passed to every initializer of K.
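As a minimal sketch of the convention above, the following checks that a NamedTuple carries the four expected fields. The helper is_recording and the constant RECORDING_FIELDS are hypothetical and not part of ObservationSchemes.jl; the field values are placeholders rather than real laws, observations, or priors.

```julia
# Hypothetical helper (not part of ObservationSchemes.jl): verify that a
# NamedTuple has the four fields expected of a single recording.
const RECORDING_FIELDS = (:P, :obs, :t0, :x0_prior)

is_recording(r::NamedTuple) = all(k -> haskey(r, k), RECORDING_FIELDS)

# Placeholder values stand in for a law, observations and a prior.
candidate = (P = "some law", obs = [1.0, 2.0], t0 = 0.0, x0_prior = "some prior")
incomplete = (P = "some law", obs = [1.0, 2.0])

is_recording(candidate)   # true: all four fields are present
is_recording(incomplete)  # false: t0 and x0_prior are missing
```

The check only looks at field names; it does not verify that obs holds Observation subtypes or that x0_prior is a StartingPtPrior.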
Defining multiple recordings
A struct AllObservations allows for a systematic definition of multiple recordings and, in addition, provides some handy functionality.
ObservationSchemes.AllObservations — Type
struct AllObservations
    recordings::Vector{Any}
    param_depend::Dict{Symbol,Vector{Pair{Int64, Symbol}}}
    obs_depend::Dict{Symbol,Vector{Tuple{Int64,Int64,Int64}}}
    param_depend_rev::Vector{Vector{Tuple{Symbol,Symbol}}}
    obs_depend_rev::Vector{Vector{Vector{Tuple{Symbol,Int64}}}}
end
A struct gathering multiple observations of diffusion processes. Additionally, it keeps the interdependence structure between the parameters shared among the various diffusion laws used to generate the recorded data.
- recordings: collects all recordings
- param_depend: a dictionary with
  - keys: parameter labels
  - values: vectors that collect the indices of all recordings whose laws depend on the corresponding parameter (in fact, they collect Pairs: idx-of-a-recording => name-of-parameter).
- obs_depend: does the same as param_depend but for observations. In this case the values are vectors that collect Tuples in the format (idx-of-a-recording, idx-of-an-observation, idx-of-parameter-in-θ-vector).
- param_depend_rev: gives, for each recording, a list of the variable parameters that its law depends on. These are given in the format global-param-name => local-param-name.
We use the term variable to refer to those parameters that are returned after a call var_parameter_names(typeof(P)).
- obs_depend_rev: does the same as param_depend_rev but for the observations. These are given in the format global-param-name => local-param-idx.
AllObservations(;P=nothing, obs=nothing, t0=nothing, x0_prior=nothing)
Default constructor creating either an empty AllObservations object or initialising it immediately with a single recording, where the target comes from the law P, the observations are stored in obs, and the observed process was started at time t0 from some position on which we put a prior x0_prior.
AllObservations(recording::NamedTuple)
Constructor creating an AllObservations object and initialising it immediately with a single recording, where the target comes from the law recording.P, the observations are stored in recording.obs, and the starting point is at time recording.t0 and has a prior recording.x0_prior.
Recordings that share a single law
We can define multiple recordings using functions
ObservationSchemes.add_recording! — Function
add_recording!(all_obs::AllObservations, recording::NamedTuple)
Add a new recording recording to an observations container all_obs.
ObservationSchemes.add_recordings! — Function
add_recordings!(
    all_obs::AllObservations,
    recordings::AbstractArray{<:NamedTuple}
)
Add multiple new recordings recordings to an observations container all_obs.
For instance:
using ObservationSchemes
const OBS = ObservationSchemes
struct LawA α; β; end
OBS.var_parameter_names(P::LawA) = (:α, :β)
recordings = [
    (
        P = LawA(10,20),
        obs = [
            LinearGsnObs(1.0, 1.0; Σ=1.0),
            LinearGsnObs(2.0, 2.0; full_obs=true),
            LinearGsnObs(3.0, 3.0; Σ=2.0),
        ],
        t0 = 0.0,
        x0_prior = KnownStartingPt(2.0),
    ), # recording n°1
    (
        P = LawA(10,20),
        obs = [
            LinearGsnObs(1.3, 1.0; full_obs=true),
            LinearGsnObs(2.3, 2.0; full_obs=true),
            LinearGsnObs(3.3, 3.0; full_obs=true),
        ],
        t0 = 0.3,
        x0_prior = KnownStartingPt(-2.0),
    ), # recording n°2
]
all_obs = AllObservations()
add_recordings!(all_obs, recordings)
Here we defined the vector recordings verbatim; however, we provide an ObsScheme struct together with the @load_data macro to do this automatically and concisely for many observations at once (see the following section to learn more about this).
The recordings can be accessed via all_obs.recordings. By default the laws of different recordings are assumed to be independent, but we can tell the AllObservations object that they are the same by indicating that the laws share a subset (possibly all) of their parameters. This can be done by passing an appropriate dictionary to the function:
ObservationSchemes.add_dependency! — Function
add_dependency!(all_obs::AllObservations, dep::Dict)
Add a dependency structure dep between parameters shared across various laws and observations used to generate various recordings stored in an observations container all_obs.
add_dependency!(
    all_obs,
    Dict(
        :α_shared => [(1, :α), (2, :α)],
        :β_shared => [(1, :β), (2, :β)],
    )
)
The first (respectively second) entry in the dictionary tells all_obs that there is a parameter, from now on labeled :α_shared (resp. :β_shared), that is present in the law of recording 1 and in the law of recording 2, and that in both cases a call to var_parameter_names(P) returns this parameter under the name :α (resp. :β).
If the parameter appears in an observation instead of the law, then the tuple of the format (rec_idx, :param-name) must be replaced with (rec_idx, obs_idx, param_idx_in_obs_vec), for instance:
add_dependency!(
    all_obs,
    Dict(
        :γ_shared => [(1, 2, 3), (40, 400, 4)],
    )
)
indicates that there is a shared parameter :γ_shared that enters:
- the second observation of the first recording, as the third parameter of that observation
- the 400th observation of the 40th recording, as the 4th parameter of that observation
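The two entry formats can be told apart by their tuple types. The sketch below is purely illustrative: the function describe is hypothetical and not part of the package, and the dictionary is only inspected here, not passed to add_dependency!.

```julia
# Illustrative only: `describe` is a hypothetical helper, not a package function.
# A law-level entry is (recording index, parameter name).
describe(e::Tuple{Int,Symbol}) =
    "law parameter $(e[2]) of recording $(e[1])"

# An observation-level entry is
# (recording index, observation index, parameter index).
describe(e::Tuple{Int,Int,Int}) =
    "parameter $(e[3]) of observation $(e[2]) in recording $(e[1])"

# Entries copied from the two dictionaries in the text.
dep = Dict(
    :α_shared => [(1, :α), (2, :α)],
    :γ_shared => [(1, 2, 3), (40, 400, 4)],
)

# Print a readable description of every entry under its shared label.
for (label, entries) in dep
    for e in entries
        println(label, ": ", describe(e))
    end
end
```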
Now, we can additionally call
ObservationSchemes.initialize — Function
initialize(all_obs::AllObservations)
Split the recordings at the times of full observations to make full use of the Markov property (and make the code readily parallelisable). Introduce all variable parameters that were not mentioned in the current dependency dictionary. Create internal dictionaries all_obs.param_depend_rev and all_obs.obs_depend_rev.
as in:
initialised_all_obs, old_to_new_idx = initialize(all_obs)
to perform three useful operations.
- First, all parameters that are not shared between various laws are marked (here there is no such parameter, so this step does nothing; see the example below, which illustrates this idea).
- Second, the recordings are split at the times at which full observations are made, as full observations allow us to employ the Markov property and treat the problem in parallel. As a result, initialised_all_obs now has 5 recordings, all coming from the same law LawA.
- Third, an additional dependence structure is introduced that allows for efficient retrieval of information about parameter dependence when iterating through laws and observations.
The old_to_new_idx might be helpful for keeping track of the original indices of recordings.
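To see where the count of 5 comes from, here is a minimal sketch of the splitting idea, assuming that a segment is closed at every full observation. The type ToyObs and the function split_at_full are hypothetical stand-ins; the package's own observation types and its initialize function are not used, and the sketch does not model the new starting time or starting-point prior of each segment.

```julia
# Stand-in observation (hypothetical): time `t` and a flag `full` marking
# a full observation.
struct ToyObs
    t::Float64
    full::Bool
end

# Split observations into segments, closing a segment at each full
# observation. A trailing run of partial observations forms the last segment.
function split_at_full(obs::Vector{ToyObs})
    segments = Vector{Vector{ToyObs}}()
    current = ToyObs[]
    for o in obs
        push!(current, o)
        if o.full
            push!(segments, current)
            current = ToyObs[]
        end
    end
    isempty(current) || push!(segments, current)
    return segments
end

# Same observation times and full-observation pattern as recordings 1 and 2.
rec1 = [ToyObs(1.0, false), ToyObs(2.0, true), ToyObs(3.0, false)]
rec2 = [ToyObs(1.3, true), ToyObs(2.3, true), ToyObs(3.3, true)]

n_segments = length(split_at_full(rec1)) + length(split_at_full(rec2))
```

Under this assumption the first recording yields two segments and the second yields three, which matches the five recordings reported for initialised_all_obs.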
Some useful information is printed when we call
ObservationSchemes.print_parameters — Function
print_parameters(all_obs::AllObservations)
Print information about the variable parameters for which the all_obs.param_depend object stores interdependency information.
julia> print_parameters(initialised_all_obs)
There are 5 independent recordings.
There are also 2 variable parameters.
* * *
You may define the var-parameters using the following template:
# start of template
using OrderedCollections
θ_init = OrderedDict(
:β_shared => ... , # param 1
:α_shared => ... , # param 2
)
# end of template
and in an MCMC setting you may let your parameter update step
refer to a subset of indices you wish to update using the order
given above.
* * *
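The remark about update steps can be sketched as follows. For a dependency-free illustration this uses a plain vector of Pairs from Base rather than an OrderedDict; the numeric values and the names update_idx and updated are assumptions made for the example.

```julia
# Parameters in the order printed by the template (values are placeholders).
θ_init = [:β_shared => 20.0, :α_shared => 10.0]

# Hypothetical update step that touches only the second parameter.
update_idx = [2]

# Labels of the parameters that this step would update.
updated = first.(θ_init[update_idx])
```

Because the order is fixed, an index such as 2 always refers to the same label, here :α_shared.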
We can also inspect the field param_depend_rev:
julia> initialised_all_obs.param_depend_rev
5-element Array{Array{Tuple{Symbol,Symbol},1},1}:
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
which allows for iterating through recordings and seeing immediately which parameters they depend on, in particular:
- how these parameters are referred to by the all_obs struct
- how these parameters are referred to by the individual laws
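One way such a list might be used is to translate a global parameter dictionary into the names that each law understands. This is a sketch under assumptions: θ holds placeholder values, local_params is a hypothetical name, and only two of the five entries are reproduced.

```julia
# Two entries copied from the output above.
param_depend_rev = [
    [(:β_shared, :β), (:α_shared, :α)],
    [(:β_shared, :β), (:α_shared, :α)],
]

# Global parameter values keyed by the labels used in all_obs (placeholders).
θ = Dict(:β_shared => 20.0, :α_shared => 10.0)

# For each recording, map the law's own parameter names to current values.
local_params = [Dict(loc => θ[glob] for (glob, loc) in deps) for deps in param_depend_rev]
```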
Recordings under multiple laws
It should be clear that the formalism above allows for the definition of recordings coming from multiple diffusion laws. For instance, we can have
struct LawB γ; β; end
OBS.var_parameter_names(P::LawB) = (:γ, :β)
extra_recording = (
    P = LawB(30,40),
    obs = [
        LinearGsnObs(1.5, 1.0; Σ=1.0),
        LinearGsnObs(2.5, 2.0; Σ=1.0),
    ],
    t0 = 0.5,
    x0_prior = KnownStartingPt(10.0),
) # recording n°3
push!(recordings, extra_recording)
all_obs = AllObservations()
add_recordings!(all_obs, recordings)
The dictionary specifying interdependence between the laws of stochastic processes can now be defined as follows:
add_dependency!(
    all_obs,
    Dict(
        :α_shared => [(1, :α), (2, :α)],
        :β_shared => [(1, :β), (2, :β), (3,:β)],
    )
)
where we note the presence of the additional entry (3,:β). This time, calling
initialised_all_obs, _ = initialize(all_obs)
not only splits the recordings at the times of full observations (resulting in 6 independent recordings), but also introduces a new named parameter REC3_γ that only the last recording depends on. This comes from the fact that in the original all_obs the third recording came from the law LawB, which depends on the parameter γ that was not shared with any other recording and hence did not appear in the interdependency dictionary. Every such "lonely" parameter is introduced by the function initialize and is named by prepending REC($i)_ to its original name, with ($i) denoting the original index of the recording that the parameter came from.
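The naming rule can be reproduced with a short sketch in plain Julia. It assumes that a parameter counts as "lonely" exactly when its (recording index, name) pair is absent from the dependency dictionary; lonely_name, var_names, mentioned and lonely are hypothetical names, and the package's own implementation may differ.

```julia
# Variable parameter names of the laws of recordings 1, 2 and 3,
# as in the example.
var_names = [(:α, :β), (:α, :β), (:γ, :β)]

# The dependency dictionary from the example.
dep = Dict(
    :α_shared => [(1, :α), (2, :α)],
    :β_shared => [(1, :β), (2, :β), (3, :β)],
)

# All (recording index, local name) pairs mentioned in the dictionary.
mentioned = Set(e for entries in values(dep) for e in entries)

# Hypothetical helper following the REC($i)_ naming convention.
lonely_name(i, name) = Symbol("REC", i, "_", name)

# Names generated for parameters that the dictionary does not mention.
lonely = [lonely_name(i, n) for (i, names) in enumerate(var_names) for n in names if (i, n) ∉ mentioned]
```

Here only γ of the third recording is unmentioned, so the sketch produces the single label :REC3_γ.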
Now:
julia> print_parameters(initialised_all_obs)
There are 6 independent recordings.
There are also 3 variable parameters.
* * *
You may define the var-parameters using the following template:
# start of template
using OrderedCollections
θ_init = OrderedDict(
:β_shared => ... , # param 1
:α_shared => ... , # param 2
:REC3_γ => ... , # param 3
)
# end of template
and in an MCMC setting you may let your parameter update step
refer to a subset of indices you wish to update using the order
given above.
* * *
and
julia> initialised_all_obs.param_depend_rev
6-element Array{Array{Tuple{Symbol,Symbol},1},1}:
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:α_shared, :α)]
[(:β_shared, :β), (:REC3_γ, :γ)]