Title: | Miscellaneous Utilities for Extreme Value Analysis |
---|---|
Description: | Provides utility functions and objects for Extreme Value Analysis. These include probability functions with their exact derivatives w.r.t. the parameters that can be used for estimation and inference, even with censored observations. The transformations exchanging the two parameterizations of Peaks Over Threshold (POT) models: Poisson-GP and Point-Process are also provided with their derivatives. |
Authors: | Yves Deville [cre, aut] |
Maintainer: | Yves Deville <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.5 |
Built: | 2025-02-13 03:57:04 UTC |
Source: | https://github.com/yvesdeville/nieve |
The DESCRIPTION file:
Package: | nieve |
Type: | Package |
Title: | Miscellaneous Utilities for Extreme Value Analysis |
Version: | 0.1.5 |
Authors@R: | c(person(given = "Yves", family = "Deville", role = c("cre", "aut"), email = "[email protected]", comment = c(ORCID = "0000-0002-1233-488X"))) |
Maintainer: | Yves Deville <[email protected]> |
Description: | Provides utility functions and objects for Extreme Value Analysis. These include probability functions with their exact derivatives w.r.t. the parameters that can be used for estimation and inference, even with censored observations. The transformations exchanging the two parameterizations of Peaks Over Threshold (POT) models: Poisson-GP and Point-Process are also provided with their derivatives. |
License: | GPL (>= 2) |
Suggests: | testthat, numDeriv, Renext, knitr, covr |
Encoding: | UTF-8 |
URL: | https://github.com/yvesdeville/nieve/ |
BugReports: | https://github.com/yvesdeville/nieve/issues/ |
RoxygenNote: | 7.2.3 |
VignetteBuilder: | knitr |
Repository: | https://yvesdeville.r-universe.dev |
RemoteUrl: | https://github.com/yvesdeville/nieve |
RemoteRef: | HEAD |
RemoteSha: | 93287a258482b1476976ce5196125a87599f8063 |
Author: | Yves Deville [cre, aut] (<https://orcid.org/0000-0002-1233-488X>) |
Index of help topics:
Exp1 | Density, Distribution Function, Quantile Function and Random Generation for the One-Parameter Exponential Distribution |
GEV | Density, Distribution Function, Quantile Function and Random Generation for the Generalized Extreme Value (GEV) Distribution |
GPD2 | Density, Distribution Function, Quantile Function and Random Generation for the Two-Parameter Generalized Pareto Distribution (GPD) |
PP2poisGP | Transform Point-Process Parameters into Poisson-GP Parameters |
nieve-package | Miscellaneous Utilities for Extreme Value Analysis |
poisGP2PP | Transform Poisson-GP Parameters into Point-Process Parameters |
The nieve package provides utility functions for Extreme Value Analysis. It includes the probability functions for the two-parameter Generalized Pareto Distribution (GPD) and for the three-parameter Generalized Extreme Value (GEV) distribution. These functions are vectorized w.r.t. the parameters and optionally provide the exact derivatives w.r.t. the parameters: the gradient and the Hessian, which can be used in optimization e.g., to maximize the log-likelihood. Since the gradient and the Hessian are available for the log-density and for the distribution function, the exact gradient and the exact Hessian of the log-likelihood function are available even when censored observations are used.
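The remark on censored observations can be sketched in base R for the one-parameter exponential case. This is only an illustration under stated assumptions: the helper name loglik_exp_cens and the toy data are hypothetical, and the derivatives are written by hand rather than obtained from nieve. An uncensored observation contributes the log-density and a right-censored one contributes the log-survival, so both derivatives are needed.

```r
## Hypothetical helper (not part of nieve): log-likelihood of an exponential
## sample with right-censoring, together with its exact gradient w.r.t. scale.
loglik_exp_cens <- function(scale, y, censored) {
    ## log-density: -log(scale) - y / scale ; log-survival: -y / scale
    ll <- sum(ifelse(censored, -y / scale, -log(scale) - y / scale))
    ## derivatives w.r.t. scale of the two kinds of contributions
    gr <- sum(ifelse(censored, y / scale^2, -1 / scale + y / scale^2))
    list(value = ll, gradient = gr)
}

y <- c(0.3, 1.2, 2.5, 0.7, 4.0)                 # toy observations
censored <- c(FALSE, FALSE, TRUE, FALSE, TRUE)  # TRUE: only known that Y > y
res <- loglik_exp_cens(scale = 1.5, y = y, censored = censored)

## central finite difference, used only as a check of the exact gradient
h <- 1e-6
num_grad <- (loglik_exp_cens(1.5 + h, y, censored)$value -
             loglik_exp_cens(1.5 - h, y, censored)$value) / (2 * h)
c(exact = res$gradient, numeric = num_grad)
```

In this toy case the gradient vanishes at scale = sum(y) / (number of uncensored observations), which is the usual closed-form estimate under right-censoring.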
These functions should behave like the probability functions of the stats package. For instance, when a probability p = 0.0 or p = 1.0 is given, the quantile functions should return the lower and the upper end-point, be they finite or not. Also, when evaluated at -Inf and Inf, the probability functions should return 0.0 and 1.0. Mind however that the gradient and the Hessian of the upper end-point are not to be trusted for now.
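The convention can be seen with the exponential functions of the stats package, used here as a stand-in. According to the text, the nieve functions are intended to follow the same convention; the lines below do not call nieve.

```r
## Base R analogue: exponential distribution of the stats package.
q_end <- qexp(c(0, 1), rate = 1)       # end-points of the support: 0 and Inf
p_end <- pexp(c(-Inf, Inf), rate = 1)  # distribution function: 0 and 1
q_end
p_end
```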
The nieve package was partly funded by the French Institut de Radioprotection et Sûreté Nucléaire (IRSN) and some of the code formerly was part of R packages owned by the IRSN Bureau d'Expertise en Hydrogéologie et sur les Risques d'Inondation, météorologiques et Géotechniques (Behrig).
Density, distribution function, quantile function and random generation for the one-parameter Exponential distribution with scale parameter scale.
    dexp1(x, scale = 1, log = FALSE, deriv = FALSE, hessian = FALSE)
    pexp1(q, scale = 1, lower.tail = TRUE, deriv = FALSE, hessian = FALSE)
    qexp1(p, scale = 1, lower.tail = TRUE, deriv = FALSE, hessian = FALSE)
    rexp1(n, scale = 1, array)
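x, q | Vector of quantiles. |
scale | Scale parameter. Numeric vector with suitable length, see Details. |
log | Logical; if TRUE, densities p are returned as log(p). |
deriv | Logical. If TRUE, the gradient of each computed value w.r.t. the parameter is computed and returned as a "gradient" attribute of the result. |
hessian | Logical. If TRUE, the Hessian of each computed value w.r.t. the parameter is computed and returned as a "hessian" attribute of the result. |
lower.tail | Logical; if TRUE, probabilities are $\Pr\{X \leq x\}$, otherwise $\Pr\{X > x\}$. |
p | Vector of probabilities. |
n | Sample size. |
array | Logical. If TRUE, the simulated values form a numeric matrix with n columns and one row per value of the parameter scale, see Examples. |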
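The survival and density functions are given by
$$S(x) = \exp\{-x / \sigma\}, \qquad f(x) = \frac{1}{\sigma} \exp\{-x / \sigma\} \qquad (x > 0)$$
where $\sigma > 0$ is the scale parameter. This distribution is the Generalized Pareto Distribution for a shape $\xi = 0$.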
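As a quick check of the exponential survival $\exp\{-x/\sigma\}$ and density $\exp\{-x/\sigma\}/\sigma$, the following base R sketch compares them with the stats functions, where rate is the inverse of the scale. The helper names surv_exp1 and dens_exp1 are hypothetical and are not nieve functions.

```r
## Hypothetical base R helpers mirroring the two formulas (not nieve code).
surv_exp1 <- function(x, scale = 1) ifelse(x > 0, exp(-x / scale), 1)
dens_exp1 <- function(x, scale = 1) ifelse(x > 0, exp(-x / scale) / scale, 0)

x <- c(0.5, 1, 2)
sigma <- 2
## Differences with the stats functions using rate = 1 / scale
d_surv <- surv_exp1(x, sigma) - pexp(x, rate = 1 / sigma, lower.tail = FALSE)
d_dens <- dens_exp1(x, sigma) - dexp(x, rate = 1 / sigma)
max(abs(c(d_surv, d_dens)))
```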
The probability functions d, p and q all allow the parameter scale to be a vector. Then the recycling rule is used to get two vectors of the same length, corresponding to the first argument and to the scale parameter. This behaviour is the standard one for the probability functions of the stats package but is unusual in R packages devoted to Extreme Value in which the parameters must generally have length one. Note that the provided functions can be used e.g. to evaluate the quantile with a given probability for a large number of values of the parameter scale. This is frequently required in the Bayesian framework with MCMC inference.
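The use case mentioned for MCMC can be sketched with the stats function qexp as a stand-in, since it follows the same recycling rule. The vector scale_draws is a hypothetical set of posterior draws, simulated here only to have something to work with; with nieve, the analogous call would presumably be qexp1(0.99, scale = scale_draws), following the usage shown above.

```r
set.seed(1)
## Hypothetical posterior draws of the scale (stand-in for MCMC output)
scale_draws <- rgamma(1000, shape = 50, rate = 25)
## One call gives the 0.99 quantile for every draw: p is recycled
q99 <- qexp(0.99, rate = 1 / scale_draws)
length(q99)
## Posterior summary of the quantile
quantile(q99, probs = c(0.05, 0.5, 0.95))
```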
A numeric vector with its length equal to the maximum of the two lengths: that of the first argument and that of the parameter scale. When deriv is TRUE, the returned value has an attribute named "gradient" which is a matrix with $n$ rows and one column containing the derivative. A row contains the partial derivative of the corresponding element w.r.t. the parameter "scale".
The attributes "gradient" and "hessian" have dimension c(n, 1) and c(n, 1, 1), even when n equals 1. Use the drop function on these objects to drop the extra dimension if wanted i.e. to get a gradient vector and a Hessian matrix.
See also the exponential distribution Exponential of the stats package, with rate being the inverse scale.
    ## Illustrate the effect of recycling rule.
    pexp1(1.0, scale = 1:4, lower.tail = FALSE) - exp(-1.0 / (1:4))
    pexp1(1:4, scale = 1:4, lower.tail = FALSE) - exp(-1.0)
    ## With gradient and Hessian.
    pexp1(c(1.1, 1.7), scale = 1, deriv = TRUE, hessian = TRUE)
    ti <- 1:60; names(ti) <- 2000 + ti
    sigma <- 1.0 + 0.7 * ti
    ## simulate 40 paths
    y <- rexp1(n = 40, scale = sigma)
    matplot(ti, y, type = "l", col = "gray", main = "varying scale")
    lines(ti, apply(y, 1, mean))
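Density, distribution function, quantile function and random generation for the Generalized Extreme Value (GEV) distribution with parameters loc, scale and shape. The distribution function $F(x)$ is given by
$$F(x) = \exp\left\{-[1 + \xi z]^{-1/\xi}\right\}$$
when $\xi \neq 0$ and $1 + \xi z > 0$, and by
$$F(x) = \exp\left\{-e^{-z}\right\}$$
for $\xi = 0$ where $z := (x - \mu) / \sigma$ in both cases.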
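A minimal base R sketch of the GEV distribution function, assuming the standard form $\exp\{-[1 + \xi z]^{-1/\xi}\}$ for a non-zero shape and $\exp\{-e^{-z}\}$ for a zero shape, with $z = (x - \mu)/\sigma$. The helper pgev_sketch is hypothetical, accepts a scalar shape only, and has none of the numerical safeguards of the package.

```r
## Hypothetical helper (not nieve code): GEV distribution function.
## For simplicity 'shape' must have length one here.
pgev_sketch <- function(q, loc = 0, scale = 1, shape = 0) {
    z <- (q - loc) / scale
    if (shape == 0) {
        exp(-exp(-z))                  # Gumbel case
    } else {
        v <- pmax(1 + shape * z, 0)    # v is zero outside the support
        exp(-v^(-1 / shape))
    }
}
pgev_sketch(c(-1, 0, 2), loc = 0, scale = 1, shape = 0)
pgev_sketch(c(-1, 0, 2), loc = 0, scale = 1, shape = 0.2)
## shape = -0.5: the upper end-point is loc - scale / shape = 2
pgev_sketch(c(1, 2, 3), loc = 0, scale = 1, shape = -0.5)
```

Taking the positive part before raising to the power makes the function return 0 below a finite lower end-point and 1 above a finite upper end-point.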
    dGEV(x, loc = 0, scale = 1, shape = 0, log = FALSE,
         deriv = FALSE, hessian = FALSE)
    pGEV(q, loc = 0, scale = 1, shape = 0, lower.tail = TRUE,
         deriv = FALSE, hessian = FALSE)
    qGEV(p, loc = 0, scale = 1, shape = 0, lower.tail = TRUE,
         deriv = FALSE, hessian = FALSE)
    rGEV(n, loc = 0, scale = 1, shape = 0, array)
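x, q | Vector of quantiles. |
loc | Location parameter. Numeric vector with suitable length, see Details. |
scale | Scale parameter. Numeric vector with suitable length, see Details. |
shape | Shape parameter. Numeric vector with suitable length, see Details. |
log | Logical; if TRUE, densities p are returned as log(p). |
deriv | Logical. If TRUE, the gradient of each computed value w.r.t. the parameter vector is computed and returned as a "gradient" attribute of the result. |
hessian | Logical. If TRUE, the Hessian of each computed value w.r.t. the parameter vector is computed and returned as a "hessian" attribute of the result. |
lower.tail | Logical; if TRUE, probabilities are $\Pr\{X \leq x\}$, otherwise $\Pr\{X > x\}$. |
p | Vector of probabilities. |
n | Sample size. |
array | Logical. If TRUE, the simulated values form a numeric matrix with n columns and one row per value of the GEV parameters, see Examples. |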
Each of the probability functions normally requires two formulas: one for the non-zero shape case $\xi \neq 0$ and one for the zero-shape case $\xi = 0$. However the non-zero shape formulas lead to numerical instabilities near $\xi = 0$, especially for the derivatives w.r.t. $\xi$. This can create problems in optimization tasks. To avoid this, a Taylor expansion w.r.t. $\xi$ is used for $|\xi| < \epsilon$ for a small positive $\epsilon$. The expansion has order $2$ for the functions (log-density, distribution and quantile), order $1$ for their first-order derivatives and order $0$ for the second-order derivatives.
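The instability near a zero shape, and the Taylor remedy, can be illustrated on the quantity $\log(1 + \xi z)/\xi$, which appears in the non-zero shape formulas and tends to $z$ when $\xi \to 0$. This is a sketch under stated assumptions: the function names, the threshold eps = 1e-4 and the choice of this particular quantity are illustrative and are not taken from the package's C code.

```r
## Naive evaluation of log(1 + xi * z) / xi, which tends to z when xi -> 0
naive <- function(xi, z) log(1 + xi * z) / xi
## Second-order Taylor expansion in xi around xi = 0
taylor2 <- function(xi, z) z - xi * z^2 / 2 + xi^2 * z^3 / 3
## Switch between the two; eps is an arbitrary illustrative value
stable <- function(xi, z, eps = 1e-4) {
    ifelse(abs(xi) < eps, taylor2(xi, z), naive(xi, z))
}
z <- 2
naive(1e-17, z)    # 1 + xi * z rounds to 1, so the result collapses to 0
stable(1e-17, z)   # close to the limit z = 2
stable(0.1, z)     # regular formula away from zero
```

At the switching point the two branches differ by a term of order $\xi^3$, which is negligible for a threshold of this size.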
For the d, p and q functions, the GEV parameter arguments loc, scale and shape are recycled in the same fashion as the classical R distribution functions in the stats package, see e.g., Normal, GammaDist, ... Let n be the maximum length of the four arguments: x, q or p and the GEV parameter arguments, then the four provided vectors are recycled in order to have length n. The returned vector has length n and the attributes "gradient" and "hessian", when computed, are arrays with dimension c(n, 3) and c(n, 3, 3).
A numeric vector with length n as described in the Details section. When deriv is TRUE, the returned value has an attribute named "gradient" which is a matrix with $n$ rows and $3$ columns containing the derivatives. A row contains the partial derivatives of the corresponding element w.r.t. the three parameters loc, scale and shape in that order.
    ti <- 1:10; names(ti) <- 2000 + ti
    mu <- 1.0 + 0.1 * ti
    ## simulate 40 paths
    y <- rGEV(n = 40, loc = mu, scale = 1, shape = 0.05)
    matplot(ti, y, type = "l", col = "gray")
    lines(ti, apply(y, 1, mean))
Density, distribution function, quantile function and random generation for the two-parameter Generalized Pareto Distribution (GPD) with parameters scale and shape.
    dGPD2(x, scale = 1, shape = 0, log = FALSE, deriv = FALSE, hessian = FALSE)
    pGPD2(q, scale = 1, shape = 0, lower.tail = TRUE,
          deriv = FALSE, hessian = FALSE)
    qGPD2(p, scale = 1, shape = 0, lower.tail = TRUE,
          deriv = FALSE, hessian = FALSE)
    rGPD2(n, scale = 1, shape = 0, array)
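x, q | Vector of quantiles. |
scale | Scale parameter. Numeric vector with suitable length, see Details. |
shape | Shape parameter. Numeric vector with suitable length, see Details. |
log | Logical; if TRUE, densities p are returned as log(p). |
deriv | Logical. If TRUE, the gradient of each computed value w.r.t. the parameter vector is computed and returned as a "gradient" attribute of the result. |
hessian | Logical. If TRUE, the Hessian of each computed value w.r.t. the parameter vector is computed and returned as a "hessian" attribute of the result. |
lower.tail | Logical; if TRUE, probabilities are $\Pr\{X \leq x\}$, otherwise $\Pr\{X > x\}$. |
p | Vector of probabilities. |
n | Sample size. |
array | Logical. If TRUE, the simulated values form a numeric matrix with n columns and one row per value of the GP parameters, see Examples. |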
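Let $\sigma > 0$ and $\xi$ denote the scale and the shape; the survival function $S(x) := \Pr\{X > x\}$ is given for $x \geq 0$ by
$$S(x) = \left[1 + \xi \, x / \sigma\right]_+^{-1/\xi}$$
for $\xi \neq 0$ where $v_+ := \max\{v, \, 0\}$ and by
$$S(x) = \exp\{-x / \sigma\}$$
for $\xi = 0$. For $\xi < 0$ we have $S(x) = 0$ for $x \geq -\sigma / \xi$: the support of the distribution is $(0, \, -\sigma / \xi)$.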
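The survival function of the two-parameter GPD can be sketched in a few lines of base R, assuming the standard form $[1 + \xi x/\sigma]_+^{-1/\xi}$ for a non-zero shape and $\exp\{-x/\sigma\}$ for a zero shape. The helper sgpd2_sketch is hypothetical, handles a scalar shape only and is meant for $x \geq 0$; it is not the package implementation.

```r
## Hypothetical helper (not nieve code): GPD survival function for x >= 0.
sgpd2_sketch <- function(x, scale = 1, shape = 0) {
    if (shape == 0) return(exp(-x / scale))   # exponential case
    v <- pmax(1 + shape * x / scale, 0)       # positive part
    v^(-1 / shape)
}
sgpd2_sketch(c(0, 1, 3), scale = 1, shape = 0)
sgpd2_sketch(c(0, 1, 3), scale = 1, shape = 0.2)
## shape < 0: the upper end-point is -scale / shape = 2, so S(3) = 0
sgpd2_sketch(c(0, 1, 3), scale = 1, shape = -0.5)
```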
The probability functions d, p and q all allow each of the two GP parameters to be a vector. Then the recycling rule is used to get three vectors of the same length, corresponding to the first argument and to the two GP parameters. This behaviour is the standard one for the probability functions of the stats package. Note that the provided functions can be used e.g. to evaluate the quantile with a given probability for a large number of values of the parameter vector c(scale, shape). This is frequently required in the Bayesian framework with MCMC inference.
A numeric vector with length equal to the maximum of the three lengths: that of the first argument and those of the two parameters scale and shape. When deriv is TRUE, the returned value has an attribute named "gradient" which is a matrix with $n$ rows and $2$ columns containing the derivatives. A row contains the partial derivatives of the corresponding element w.r.t. the two parameters "scale" and "shape" in that order.
The attributes "gradient" and "hessian" have dimension c(n, 2) and c(n, 2, 2), even when n equals 1. Use the drop function on these objects to drop the extra dimension if wanted i.e. to get a gradient vector and a Hessian matrix.
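The effect of drop can be seen on stand-in arrays having the dimensions described for n = 1. The numeric values and the dimnames below are placeholders chosen for the illustration; they are not output of the package.

```r
## Stand-in arrays with the dimensions c(1, 2) and c(1, 2, 2)
grad <- array(c(0.1, 0.2), dim = c(1, 2),
              dimnames = list(NULL, c("scale", "shape")))
hess <- array(c(1, 0.5, 0.5, 2), dim = c(1, 2, 2),
              dimnames = list(NULL, c("scale", "shape"), c("scale", "shape")))
dim(hess)        # 1 2 2
drop(grad)       # named vector of length 2
dim(drop(hess))  # 2 2
```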
    ## Illustrate the effect of recycling rule.
    pGPD2(1.0, scale = 1:4, shape = 0.0, lower.tail = FALSE) - exp(-1.0 / (1:4))
    pGPD2(1:4, scale = 1:4, shape = 0.0, lower.tail = FALSE) - exp(-1.0)
    ## With gradient and Hessian.
    pGPD2(c(1.1, 1.7), scale = 1, shape = 0, deriv = TRUE, hessian = TRUE)
    ## simulate 40 paths
    ti <- 1:20
    names(ti) <- 2000 + ti
    y <- rGPD2(n = 40, scale = ti, shape = 0.05)
    matplot(ti, y, type = "l", col = "gray", main = "varying scale")
    lines(ti, apply(y, 1, mean))
Transform Poisson-GP parameters into Point-Process (PP) parameters. In the POT Poisson-GP framework the three parameters are the rate lambda $\lambda_u$ of the Poisson process in time and the two GP parameters: scale $\sigma_u$ and shape $\xi$. The vector loc contains the fixed threshold $u$, and w the fixed block duration. These parameters are converted into the vector of three parameters of the GEV distribution for the maximum of the marks $Y_i$ on a time interval with duration w, the number $N$ of these marks being a r.v. with Poisson distribution. More precisely, the GEV distribution applies when $N > 0$.
    poisGP2PP(lambda, loc = 0.0, scale = 1.0, shape = 0.0, w = 1.0,
              deriv = FALSE)
lambda | A numeric vector containing the Poisson rate(s). |
loc | A numeric vector containing the Generalized Pareto location, i.e. the threshold in the POT framework. |
scale, shape | Numeric vectors containing the Generalized Pareto scale and shape parameters. |
w | The block duration. Its physical dimension is time and the product $\lambda_u \times w$ is dimensionless. |
deriv | Logical. If TRUE, the derivative (Jacobian) of the transformation is also computed and returned with the result. |
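The three PP parameters $\mu^\star$, $\sigma^\star$ and $\xi^\star$ relate to the Poisson-GP parameters according to
$$\begin{aligned} \mu^\star &= u + \frac{(\lambda_u w)^\xi - 1}{\xi} \, \sigma_u, \\ \sigma^\star &= (\lambda_u w)^\xi \, \sigma_u, \\ \xi^\star &= \xi, \end{aligned}$$
the fraction $[(\lambda_u w)^\xi - 1] / \xi$ of the first equation being to be replaced for $\xi = 0$ by its limit $\log(\lambda_u w)$.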
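The transformation can be sketched in base R, assuming the standard relations $\mu^\star = u + \sigma_u [(\lambda_u w)^\xi - 1]/\xi$, $\sigma^\star = (\lambda_u w)^\xi \sigma_u$ and $\xi^\star = \xi$, with the fraction replaced by $\log(\lambda_u w)$ for a zero shape. The helper poisgp2pp_sketch is hypothetical: it works on scalars only, switches at an exactly zero shape and computes no derivative, unlike the function documented here.

```r
## Hypothetical base R helper (not the nieve implementation).
poisgp2pp_sketch <- function(lambda, loc = 0, scale = 1, shape = 0, w = 1) {
    lw <- lambda * w
    ## fraction [(lambda * w)^shape - 1] / shape, with its limit at shape = 0
    frac <- if (shape == 0) log(lw) else (lw^shape - 1) / shape
    c(locStar = loc + frac * scale,
      scaleStar = lw^shape * scale,
      shapeStar = shape)
}
pp <- poisgp2pp_sketch(lambda = 2, loc = 10, scale = 3, shape = 0.1, w = 5)
pp
## Check at a level y above the threshold: both sides give Pr{max <= y}
y <- 25
z <- (y - pp[["locStar"]]) / pp[["scaleStar"]]
gev_cdf <- exp(-(1 + pp[["shapeStar"]] * z)^(-1 / pp[["shapeStar"]]))
poisgp_cdf <- exp(-2 * 5 * (1 + 0.1 * (y - 10) / 3)^(-1 / 0.1))
c(gev = gev_cdf, poisGP = poisgp_cdf)
```

The check uses the fact that, for a Poisson number of GP marks, the probability that the block maximum stays below y is exp(-lambda * w * S(y - u)), where S is the GP survival function.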
A numeric matrix with three columns representing the Point-Process parameters loc $\mu^\star$, scale $\sigma^\star$ and shape $\xi^\star$.
This function is essentially a re-implementation in C of the function Ren2gev of Renext. As a major improvement, this function is "vectorized" w.r.t. the parameters so it can efficiently transform a large number of Poisson-GP parameter vectors, as can be required e.g. in MCMC Bayesian inference. Note also that this function copes with values near zero for the shape parameter: it then suitably computes both the function value and its derivatives.
See PP2poisGP for the reciprocal transformation.
Transform Point Process (PP) parameters into Poisson-GP parameters. The provided parameters are GEV parameters: location $\mu^\star$, scale $\sigma^\star$ and shape $\xi^\star$. They are assumed to describe (the tail of) the distribution for a maximum on a time-interval with given duration $w$. For a given threshold $u$ chosen to be in the interior of the support of the GEV distribution, there exists a unique vector of three Poisson-GP parameters such that the maximum $M$ of the marks on an interval with duration w has the prescribed GEV tail. Recall that the three Poisson-GP parameters are the rate of the Poisson process in time: $\lambda_u$, and the two GP parameters: scale $\sigma_u$ and shape $\xi$. The shape parameters $\xi^\star$ and $\xi$ are identical.
    PP2poisGP(locStar = 0.0, scaleStar = 1.0, shapeStar = 0.0,
              threshold, w = 1.0, deriv = FALSE)
locStar, scaleStar, shapeStar | Numeric vectors containing the GEV location, scale and shape parameters. |
threshold | Numeric vector containing the thresholds of the Poisson-GP model, i.e. the location of the Generalised Pareto Distribution. The threshold must be an interior point of the support of the corresponding GEV distribution. |
w | The block duration. Its physical dimension is time and the product $\lambda_u \times w$ is dimensionless. |
deriv | Logical. If TRUE, the derivative (Jacobian) of the transformation is also computed and returned with the result. |
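The Poisson-GP parameters are obtained by
$$\begin{aligned} \sigma_u &= \sigma^\star + \xi^\star \, (u - \mu^\star), \\ \lambda_u &= w^{-1} \, \left[\sigma_u / \sigma^\star\right]^{-1/\xi^\star}, \\ \xi &= \xi^\star, \end{aligned}$$
the second equation becomes $\lambda_u = w^{-1} \exp\{-(u - \mu^\star) / \sigma^\star\}$ for $\xi^\star = 0$.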
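A base R sketch of this reciprocal transformation, assuming the standard relations $\sigma_u = \sigma^\star + \xi^\star (u - \mu^\star)$ and $\lambda_u = w^{-1} [\sigma_u/\sigma^\star]^{-1/\xi^\star}$, with $\lambda_u = w^{-1} \exp\{-(u - \mu^\star)/\sigma^\star\}$ for a zero shape. Both helpers are hypothetical scalar versions and are not the package functions; the forward map is repeated only so that the round trip can be checked within the block.

```r
## Hypothetical base R helpers (not the nieve implementation).
pp2poisgp_sketch <- function(locStar, scaleStar, shapeStar, threshold, w = 1) {
    scale <- scaleStar + shapeStar * (threshold - locStar)
    lambda <- if (shapeStar == 0) {
        exp(-(threshold - locStar) / scaleStar) / w
    } else {
        (scale / scaleStar)^(-1 / shapeStar) / w
    }
    c(lambda = lambda, scale = scale, shape = shapeStar)
}
## Forward map, written out again so that the block is self-contained
poisgp2pp_sketch <- function(lambda, loc, scale, shape, w = 1) {
    lw <- lambda * w
    frac <- if (shape == 0) log(lw) else (lw^shape - 1) / shape
    c(locStar = loc + frac * scale, scaleStar = lw^shape * scale,
      shapeStar = shape)
}
pp <- poisgp2pp_sketch(lambda = 2, loc = 10, scale = 3, shape = 0.1, w = 5)
back <- pp2poisgp_sketch(pp[["locStar"]], pp[["scaleStar"]], pp[["shapeStar"]],
                         threshold = 10, w = 5)
back  # expected to recover lambda = 2, scale = 3, shape = 0.1 up to rounding
```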
A matrix with three columns representing the Poisson-GP parameters lambda, scale and shape.
This function is essentially a re-implementation in C of the function gev2Ren of Renext. As a major improvement, this function is "vectorized" w.r.t. the parameters so it can efficiently transform a large number of PP parameter vectors, as can be required e.g. in MCMC Bayesian inference. Note also that this function copes with values near zero for the shape parameter: it then suitably computes both the function value and its derivatives.
See poisGP2PP for the reciprocal transformation.