Interpolation, Derivatives and Integrals | Apache Solr Reference Guide 8.8.2

This section explores the interrelated math expressions for interpolation and numerical calculus.

Interpolation

Interpolation is used to construct new data points between a set of known control points. The ability to predict new data points allows for sampling along the curve defined by the control points.

The interpolation functions described below all return an interpolation function that can be passed to other functions which make use of the sampling capability.

If returned directly, the interpolation function returns an array containing predictions for each of the control points. This is useful in the case of loess interpolation, which first smooths the control points and then interpolates the smoothed points. All other interpolation functions simply return the original control points, because they predict a curve that passes through the original control points.
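
As a rough sketch of this behavior, the expression below returns two interpolation functions directly. The control points are made up for illustration, and the variable names smoothed and unchanged are hypothetical. Going by the description above, smoothed should hold the loess-smoothed values, while unchanged should simply repeat the original y values.

let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    # Made-up control points, for illustration only.
    y=array(1, 3, 2, 5, 4, 6, 8, 7, 9, 10),
    # loess smooths the control points before interpolating them.
    smoothed=loess(x, y, bandwidth=.3),
    # lerp passes through the control points, so it returns them as-is.
    unchanged=lerp(x, y))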

There are different algorithms for interpolation that will result in different predictions along the curve. The math expressions library currently supports the following interpolation functions:

lerp: Linear interpolation predicts points that pass through each control point and form straight lines between control points.
spline: Spline interpolation predicts points that pass through each control point and form a smooth curve between control points.
akima: Akima spline interpolation is similar to spline interpolation but is more robust to outliers.
loess: Loess interpolation first performs a non-linear local regression to smooth the original control points. Then a spline is used to interpolate the smoothed control points.

Sampling Along the Curve

One way to better understand interpolation is to visualize what it means to sample along a curve. The example below zooms in on a specific region of a curve by sampling the curve between a specific x-axis range.

$interpolate1$

The visualization above first creates two arrays with x and y-axis points. Notice that the x-axis ranges from 0 to 9. Then the akima, spline and lerp functions are applied to the vectors to create three interpolation functions.

Then 500 random samples are drawn from a uniform distribution between 0 and 3. These are the new zoomed-in x-axis points, between 0 and 3. Notice that we are sampling a specific area of the curve.

Then the predict function is used to predict y-axis points for the sampled x-axis, for all three interpolation functions. Finally all three prediction vectors are plotted with the sampled x-axis points.

The red line is the lerp interpolation, the blue line is the akima and the purple line is the spline interpolation. You can see they each produce different curves in between the control points.
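
An expression along the following lines could produce such a plot. This is only a sketch of the steps described above: the y values are invented control points, the variable names are illustrative, and sample and uniformDistribution are assumed to be the math expression functions used to draw the random x-axis samples.

let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9),
    # Made-up control points, for illustration only.
    y=array(5, 2, 3, 6, 4, 7, 8, 5, 3, 1),
    # Three interpolation functions over the same control points.
    a=akima(x, y),
    s=spline(x, y),
    l=lerp(x, y),
    # 500 random x-axis points between 0 and 3.
    samples=sample(uniformDistribution(0, 3), 500),
    # Predict a y value for every sampled x value.
    pa=predict(a, samples),
    ps=predict(s, samples),
    pl=predict(l, samples),
    zplot(x=samples, y=pl, y1=pa, y2=ps))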

Smoothing Interpolation

The loess function is a smoothing interpolator which means it doesn’t derive a function that passes through the original control points. Instead the loess function returns a function that smooths the original control points.

A technique known as local regression is used to compute the smoothed curve. The size of the neighborhood of the local regression can be adjusted to control how closely the new curve conforms to the original control points.

The loess function is passed x- and y-axes and fits a smooth curve to the data. If only a single array is provided it is treated as the y-axis and a sequence is generated for the x-axis.

The example below shows the loess function being used to model a monthly time series. In the example the timeseries function is used to generate a monthly time series of average closing prices for the stock ticker AMZN. The date_dt and avg(close_d) fields from the time series are then vectorized and stored in variables x and y. The loess function is then applied to the y vector containing the average closing prices. The bandwidth named parameter specifies the percentage of the data set used to compute the local regression. The loess function returns the fitted model of smoothed data points.

The zplot function is then used to plot the x, y and y1 variables.

$loess$
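
The steps described above can be sketched as the expression below. The collection name stocks, the query field ticker_s, the date range and the bandwidth value are assumptions made for illustration; only the date_dt and avg(close_d) fields are taken from the description.

# Hypothetical collection, query field and date range.
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    # Vectorize the date and average closing price columns.
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    # Only y is passed, so a sequence is generated for the x-axis.
    # The bandwidth value here is an assumed setting.
    y1=loess(y, bandwidth=.3),
    zplot(x=x, y=y, y1=y1))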

Derivatives

The derivative of a function measures the rate of change of the y value with respect to the x value.

The derivative function can compute the derivative for any of the interpolation functions described above. Each interpolation function will produce different derivatives that match the characteristics of the function.

The First Derivative (Velocity)

A simple example shows how the derivative function is used to calculate the rate of change or velocity.

In the example two vectors are created, one representing hours and one representing miles traveled. The lerp function is then used to create a linear interpolation of the hours and miles vectors. The derivative function is then applied to the linear interpolation. zplot is then used to plot the hours on the x-axis and miles on the y-axis, and the derivative as mph, at each x-axis point.

$derivative$

Notice that the miles_traveled line has a slope of 10 until the 5th hour where it changes to a slope of 50. The mph line, which is the derivative, visualizes the velocity of the miles_traveled line.

Also notice that the derivative is calculated along straight lines, showing immediate change from one point to the next. This is because linear interpolation (lerp) is used as the interpolation function. If the spline or akima functions had been used, the derivative would have had rounded curves.
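
A minimal sketch of such an expression is shown below. The hours and miles values are invented to match the description (10 miles per hour up to the 5th hour, then 50 miles per hour); they are not necessarily the data behind the plot above.

    # Made-up data: slope of 10 through hour 5, slope of 50 afterwards.
let(hours=array(1, 2, 3, 4, 5, 6, 7, 8),
    miles=array(10, 20, 30, 40, 50, 100, 150, 200),
    # Linear interpolation of miles over hours.
    l=lerp(hours, miles),
    # First derivative: the velocity at each x-axis point.
    mph=derivative(l),
    zplot(x=hours, miles_traveled=miles, mph=mph))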

The Second Derivative (Acceleration)

While the first derivative represents velocity, the second derivative represents acceleration. The second derivative is the derivative of the first derivative.

The example below builds on the first example and adds the second derivative. Notice that the second derivative d2 is taken by applying the derivative function to a linear interpolation of the first derivative.

The second derivative is plotted as acceleration on the chart.

$derivatives$

Notice that the acceleration line is 0 until the mph line increases from 10 to 50. At this point the acceleration line moves to 40. As the mph line stays at 50, the acceleration line drops to 0.
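
The approach can be sketched as follows, again with invented hours and miles values. The column name acceleration is taken from the description of the chart; everything else about the data is an assumption.

    # Made-up data: slope of 10 through hour 5, slope of 50 afterwards.
let(hours=array(1, 2, 3, 4, 5, 6, 7, 8),
    miles=array(10, 20, 30, 40, 50, 100, 150, 200),
    l=lerp(hours, miles),
    # First derivative: velocity.
    mph=derivative(l),
    # Second derivative: the derivative of a linear interpolation of the first derivative.
    d2=derivative(lerp(hours, mph)),
    zplot(x=hours, miles_traveled=miles, mph=mph, acceleration=d2))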

Price Velocity

The example below shows how to plot the derivative for a time series generated by the timeseries function. In the example a monthly time series is generated for the average closing price for the stock ticker amzn. The avg(close) column is vectorized and interpolated using linear interpolation (lerp). The zplot function is then used to plot the derivative of the time series.

$derivative2$

Notice that the derivative plot clearly shows the rates of change in the stock price over time.
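
A sketch of an expression for this kind of plot is shown below. The collection name stocks, the query field ticker_s, the field names date_dt and close_d, and the date range are all assumptions made for illustration. Only the y vector is passed to lerp, on the assumption that, like loess, it generates a sequence for the x-axis when given a single array.

# Hypothetical collection, field names and date range.
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    # Linear interpolation of the average closing prices.
    l=lerp(y),
    # Rate of change of the price from month to month.
    d=derivative(l),
    zplot(x=x, y=d))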

Integrals

An integral is a measure of the area underneath a curve. The integral function computes the cumulative integrals for a curve or the integral for a specific range of an interpolated curve. Like the derivative function, the integral function operates over interpolation functions.

Single Integral

If the integral function is passed a start and end range it will compute the area under the curve within that specific range.

In the example below the integral function computes the integral for the range 0 through 10, which is the first half of a curve whose x-axis runs from 0 through 20. Notice that the integral function is passed the interpolated curve and the start and end range, and returns the integral for the range.

let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
    curve=loess(x, y, bandwidth=.3),
    integral=integral(curve, 0, 10))

When this expression is sent to the /stream handler it responds with:

{
  "result-set": {
    "docs": [
      {
        "integral": 45.300912584519914
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 0
      }
    ]
  }
}

Cumulative Integral Plot

If the integral function is passed a single interpolated curve it returns a vector of the cumulative integrals for the curve. The cumulative integrals vector contains a cumulative integral calculation for each x-axis point. The cumulative integral is calculated by taking the integral of the range between each x-axis point and the first x-axis point. In the example above this would mean calculating a vector of integrals as such:

let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
    curve=loess(x, y, bandwidth=.3),
    integrals=array(0, integral(curve, 0, 1), integral(curve, 0, 2), integral(curve, 0, 3), ...))

The plot of cumulative integrals visualizes how much cumulative area of the curve lies under each x-axis point.
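
In practice the vector does not need to be assembled by hand. Going by the description above, passing only the interpolated curve to the integral function should return the whole vector of cumulative integrals. A minimal sketch, reusing the same data:

let(x=array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
    y=array(0, 1, 2, 3, 4, 5.7, 6, 7, 7, 7, 6, 7, 7, 7, 6, 5, 5, 3, 2, 1, 0),
    curve=loess(x, y, bandwidth=.3),
    # No start and end range: one cumulative integral per x-axis point.
    integrals=integral(curve))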

The example below shows the cumulative integral plot for a time series generated by the timeseries function. In the example a monthly time series is generated for the average closing price for the stock ticker amzn. The avg(close) column is vectorized and interpolated using a spline.

The zplot function is then used to plot the cumulative integral of the time series.

$integral$

The plot above visualizes the area under the curve as the AMZN stock price changes over time. Because this plot is cumulative, a stock price that stays the same over time produces a cumulative integral with a constant positive slope. A stock with rising prices produces a plot that curves upward, with a steadily increasing slope, and a stock with falling prices produces a plot that flattens out, with a decreasing slope.

In this particular example the integral plot curves upward more steeply over time, showing accelerating increases in the stock price.
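
An expression for a cumulative integral plot of this kind might look like the sketch below. The collection name stocks, the query field ticker_s, the field names date_dt and close_d, and the date range are assumptions made for illustration. Only the y vector is passed to spline, on the assumption that it generates a sequence for the x-axis when given a single array.

# Hypothetical collection, field names and date range.
let(a=timeseries(stocks,
                 q="ticker_s:amzn",
                 field="date_dt",
                 start="2010-01-01T00:00:00Z",
                 end="2017-11-01T00:00:00Z",
                 gap="+1MONTH",
                 avg(close_d)),
    x=col(a, date_dt),
    y=col(a, avg(close_d)),
    # Spline interpolation of the average closing prices.
    s=spline(y),
    # No range is given, so the cumulative integrals are returned.
    i=integral(s),
    zplot(x=x, y=i))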

Bicubic Spline

The bicubicSpline function can be used to interpolate and predict values anywhere within a grid of data.

A simple example will make this clearer:

let(years=array(1998, 2000, 2002, 2004, 2006),
    floors=array(1, 5, 9, 13, 17, 19),
    prices = matrix(array(300000, 320000, 330000, 350000, 360000, 370000),
                    array(320000, 330000, 340000, 350000, 365000, 380000),
                    array(400000, 410000, 415000, 425000, 430000, 440000),
                    array(410000, 420000, 425000, 435000, 445000, 450000),
                    array(420000, 430000, 435000, 445000, 450000, 470000)),
    bspline=bicubicSpline(years, floors, prices),
    prediction=predict(bspline, 2003, 8))

In this example a bicubic spline is used to interpolate a matrix of real estate data. Each row of the matrix represents a specific year. Each column of the matrix represents a floor of the building. The grid of numbers is the average selling price of an apartment for each year and floor. For example, in 2002 the average selling price for the 9th floor was 415000 (row 3, column 3).

The bicubicSpline function is then used to interpolate the grid, and the predict function is used to predict a value for year 2003, floor 8. Notice that the matrix does not include a data point for year 2003, floor 8. The bicubicSpline function creates that data point based on the surrounding data in the matrix:

{
  "result-set": {
    "docs": [
      {
        "prediction": 418279.5009328358
      },
      {
        "EOF": true,
        "RESPONSE_TIME": 0
      }
    ]
  }
}