Interpolation, derivatives and integrals are three interrelated topics which are part of the field of mathematics called numerical analysis. This section explores the math expressions available for numerical anlysis.
Interpolation
Interpolation is used to construct new data points between a set of known control of points. The ability to predict new data points allows for sampling along the curve defined by the control points.
The interpolation functions described below all return an interpolation model that can be passed to other functions which make use of the sampling capability.
If returned directly the interpolation model returns an array containing predictions for each of the
control points. This is useful in the case of loess
interpolation which first smooths the control points
and then interpolates the smoothed points. All other interpolation functions simply return the original
control points because interpolation predicts a curve that passes through the original control points.
There are different algorithms for interpolation that will result in different predictions along the curve. The math expressions library currently supports the following interpolation functions:
-
lerp
: Linear interpolation predicts points that pass through each control point and form straight lines between control points. -
spline
: Spline interpolation predicts points that pass through each control point and form a smooth curve between control points. -
akima
: Akima spline interpolation is similar to spline interpolation but is stable to outliers. -
loess
: Loess interpolation first performs a non-linear local regression to smooth the original control points. Then a spline is used to interpolate the smoothed control points.
Upsampling
Interpolation can be used to increase the sampling rate along a curve. One example of this would be to take a time series with samples every minute and create a data set with samples every second. In order to do this the data points between the minutes must be created.
The predict
function can be used to predict values anywhere within the bounds of the interpolation
range. The example below shows a very simple example of upsampling.
let(x=array(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20), (1)
y=array(5, 10, 60, 190, 100, 130, 100, 20, 30, 10, 5), (2)
l=lerp(x, y), (3)
u=array(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20), (4)
p=predict(l, u)) (5)
1 | In the example linear interpolation is performed on the arrays in variables x and y . The x variable,
which is the x-axis, is a sequence from 0 to 20 with a stride of 2. |
2 | The y variable defines the curve along the x-axis. |
3 | The lerp function performs the interpolation and returns the interpolation model. |
4 | The u value is an array from 0 to 20 with a stride of 1. This fills in the gaps of the original x axis.
The predict function then uses the interpolation function in variable l to predict values for
every point in the array assigned to variable u . |
5 | The variable p is the array of predictions, which is the upsampled set of y values. |
When this expression is sent to the /stream
handler it responds with:
{
"result-set": {
"docs": [
{
"g": [
5,
7.5,
10,
35,
60,
125,
190,
145,
100,
115,
130,
115,
100,
60,
20,
25,
30,
20,
10,
7.5,
5
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
Smoothing Interpolation
The loess
function is a smoothing interpolator which means it doesn’t derive
a function that passes through the original control points. Instead the loess
function
returns a function that smooths the original control points.
A technique known as local regression is used to compute the smoothed curve. The size of the neighborhood of the local regression can be adjusted to control how close the new curve conforms to the original control points.
The loess
function is passed x
- and y
-axes and fits a smooth curve to the data.
If only a single array is provided it is treated as the y
-axis and a sequence is generated
for the x
-axis.
The example below uses the loess
function to fit a curve to a set of y
values in an array.
The bandwidth
parameter defines the percent of data to use for the local
regression. The lower the percent the smaller the neighborhood used for the local
regression and the closer the curve will be to the original data.
let(echo="residuals, sumSqError",
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(y, bandwidth=.3),
residuals=ebeSubtract(y, curve),
sumSqError=sumSq(residuals))
In the example the fitted curve is subtracted from the original curve using the
ebeSubtract
function. The output shows the error between the
fitted curve and the original curve, known as the residuals. The output also includes
the sum-of-squares of the residuals which provides a measure
of how large the error is:
{
"result-set": {
"docs": [
{
"residuals": [
0,
0,
0,
-0.040524802275866634,
-0.10531988096456502,
0.5906115002526198,
0.004215074334896762,
0.4201374330912433,
0.09618315578013803,
0.012107948556718817,
-0.9892939034492398,
0.012014364143757561,
0.1093830927709325,
0.523166271893805,
0.09658362075164639,
-0.011433819306139625,
0.9899403519886416,
-0.011707983372932773,
-0.004223284004140737,
-0.00021462867928434548,
0.0018723112875456138
],
"sumSqError": 2.8016013870800616
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
In the next example the curve is fit using a bandwidth
of .25
:
let(echo="residuals, sumSqError",
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(y, .25),
residuals=ebeSubtract(y, curve),
sumSqError=sumSq(residuals))
Notice that the curve is a closer fit, shown by the smaller residuals
and lower value for the sum-of-squares of the
residuals:
{
"result-set": {
"docs": [
{
"residuals": [
0,
0,
0,
0,
-0.19117650587715396,
0.442863451538809,
-0.18553845993358564,
0.29990769020356645,
0,
0.23761890236245709,
-0.7344358765888117,
0.2376189023624491,
0,
0.30373119215254984,
-3.552713678800501e-15,
-0.23761890236245264,
0.7344358765888046,
-0.2376189023625095,
0,
2.842170943040401e-14,
-2.4868995751603507e-14
],
"sumSqError": 1.7539413576337557
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
Derivatives
The derivative of a function measures the rate of change of the y
value in respects to the
rate of change of the x
value.
The derivative
function can compute the derivative of any interpolation function.
It can also compute the derivative of a derivative.
The example below computes the derivative for a loess
interpolation function.
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
derivative=derivative(curve))
When this expression is sent to the /stream
handler it
responds with:
{
"result-set": {
"docs": [
{
"derivative": [
1.0022002675659012,
0.9955994648681976,
1.0154018729613081,
1.0643674501141696,
1.0430879694757085,
0.9698717643975381,
0.7488201070357539,
0.44627000894357516,
0.19019561285422165,
0.01703599324311178,
-0.001908408138535126,
-0.009121607450087499,
-0.2576361507216319,
-0.49378951291352746,
-0.7288073815664,
-0.9871806872210384,
-1.0025400632604322,
-1.001836567536853,
-1.0076227586138085,
-1.0021524620888589,
-1.0020541789058157
]
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
Integrals
An integral is a measure of the volume underneath a curve.
The integrate
function computes an integral for a specific
range of an interpolated curve.
In the example below the integrate
function computes an
integral for the entire range of the curve, 0 through 20.
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
integral=integrate(curve, 0, 20))
When this expression is sent to the /stream
handler it
responds with:
{
"result-set": {
"docs": [
{
"integral": 90.17446104846645
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
In the next example an integral is computed for the range of 0 through 10.
let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7,6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
curve=loess(x, y, bandwidth=.3),
integral=integrate(curve, 0, 10))
When this expression is sent to the /stream
handler it
responds with:
{
"result-set": {
"docs": [
{
"integral": 45.300912584519914
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}
Bicubic Spline
The bicubicSpline
function can be used to interpolate and predict values
anywhere within a grid of data.
A simple example will make this more clear:
let(years=array(1998, 2000, 2002, 2004, 2006),
floors=array(1, 5, 9, 13, 17, 19),
prices = matrix(array(300000, 320000, 330000, 350000, 360000, 370000),
array(320000, 330000, 340000, 350000, 365000, 380000),
array(400000, 410000, 415000, 425000, 430000, 440000),
array(410000, 420000, 425000, 435000, 445000, 450000),
array(420000, 430000, 435000, 445000, 450000, 470000)),
bspline=bicubicSpline(years, floors, prices),
prediction=predict(bspline, 2003, 8))
In this example a bicubic spline is used to interpolate a matrix of real estate data.
Each row of the matrix represent specific years
. Each column of the matrix
represents floors
of the building. The grid of numbers is the average selling price of
an apartment for each year and floor. For example in 2002 the average selling price for
the 9th floor was 415000
(row 3, column 3).
The bicubicSpline
function is then used to
interpolate the grid, and the predict
function is used to predict a value for year 2003, floor 8.
Notice that the matrix does not include a data point for year 2003, floor 8. The bicupicSpline
function creates that data point based on the surrounding data in the matrix:
{
"result-set": {
"docs": [
{
"prediction": 418279.5009328358
},
{
"EOF": true,
"RESPONSE_TIME": 0
}
]
}
}