# predict

The `predict` operator uses a series of time-stamped numerical values to predict future values. The `predict` operator is useful in the following cases:

- As an early warning system, alerting you when a threshold is about to be reached.
- For resource and capacity planning, helpful for determining seasonal impacts, like a Cyber Monday rush on an ecommerce site.
- For improved risk calculation.

For example, you could use `predict` to take your current disk space capacity numbers and predict when your system might run out of disk space. In these cases, the sooner an operations manager is informed that a key threshold is about to be reached, the more effectively they can plan to avoid service degradation.

The `predict` operator supports two predictive models:

- Auto-regressive. Uses an advanced auto-regressive (AR) algorithm to learn patterns in the data. It automatically detects the cyclical patterns in the data and uses the cycles in its prediction.

- Linear regression. Uses existing data over the query time range as a training set to generate a linear model, and then extrapolates future values using this model.

### Syntax

The syntax for `predict` varies depending on whether you use the linear regression model or the auto-regressive model. In either case, the following requirements apply:

- The query must contain an aggregate operator, for example `count`, `min`, `max`, `sum`, and so on. Aggregation must be by timeslice, for example `count by _timeslice`.
- The query must contain the `timeslice` operator.
- Both the aggregate operator and the `timeslice` operator must precede the `predict` operator.
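To make these requirements concrete, the following Python sketch mimics the kind of series that `timeslice 1m | count by _timeslice` hands to `predict`: one count per fixed-width time bucket. It is an illustration only, not a description of how the service works internally. The function name `count_by_timeslice` and the use of epoch-second timestamps are assumptions made for the example.

```python
from collections import Counter


def count_by_timeslice(timestamps, slice_seconds=60):
    """Group epoch-second timestamps into fixed-width buckets and count each bucket.

    Hypothetical helper: it only imitates the shape of a
    `timeslice ... | count by _timeslice` result.
    """
    # Round every timestamp down to the start of its bucket, then count per bucket.
    counts = Counter((ts // slice_seconds) * slice_seconds for ts in timestamps)
    # Return (bucket_start, count) pairs in time order.
    return sorted(counts.items())


events = [0, 10, 59, 60, 61, 185]
print(count_by_timeslice(events))  # [(0, 3), (60, 2), (180, 1)]
```

Note that the bucket starting at 120 seconds is absent from the result because no event fell into it.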

#### Syntax for the linear regression model

For the linear regression model:

`... | timeslice 1m | count by _timeslice | predict _count by 1m`

The linear regression algorithm produces the following fields in the output:

- `_count`: The number of matches per minute for the currently selected time range.
- `_count_predicted`: Value predicted by the simple linear model.
- `_count_error` (absolute value): Value predicted by the simple linear model, minus the actual number.
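The relationship between these three fields can be sketched in Python with an ordinary least-squares fit over the timeslice index. This is a minimal sketch of textbook linear regression, assuming that `_count_error` is the absolute difference between the predicted and actual values; the function name `linear_predict` is hypothetical and the operator's actual implementation may differ.

```python
def linear_predict(counts):
    """Fit y = intercept + slope * x by least squares, where x is the timeslice index.

    Returns the slope, the intercept, and one row per timeslice with the
    three fields described above. Illustrative only.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two data points to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rows = []
    for x, y in zip(xs, counts):
        predicted = intercept + slope * x
        rows.append({
            "_count": y,
            "_count_predicted": predicted,
            # Assumption: the error is reported as an absolute value.
            "_count_error": abs(predicted - y),
        })
    return slope, intercept, rows


slope, intercept, rows = linear_predict([10, 12, 14, 16])
print(slope, intercept)       # 2.0 10.0
print(intercept + slope * 4)  # extrapolated value for the next timeslice: 18.0
```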

#### Syntax for the auto-regressive model

`… | timeslice 1m | count by _timeslice | predict _count by 1m model=ar, ar.window=n, forecast=n`

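The following parameters apply when running `predict` using the AR model:

- `model=ar`: Selects the auto-regressive model.
- `ar.window=n`: The number of consecutive data points the model uses as its window.
- `forecast=n`: The number of data points to predict into the future.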

In the following query, the first three clauses count the number of messages that contain an error term for every half minute. The last clause uses the auto-regressive model to predict 100 data points in the future, based on 50 data points.

`_sourceCategory=taskmanager jobState=InQueue error | timeslice 30s | count by _timeslice | predict _count by 30s model=ar,ar.window=50,forecast=100`

The auto-regressive algorithm produces the following fields in the output:

- `_count`: The number of errors per 30-second timeslice.
- `_count_predicted`: Value predicted by the auto-regressive algorithm.
- `_count_linear`: Value predicted by the simple linear regression.
- `_count_error` (absolute value): Value predicted by the simple linear regression minus the actual number.
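To illustrate how a forecast of several points can be built from a fixed window of earlier points, the sketch below rolls a generic auto-regressive model forward one step at a time. It is not the operator's algorithm: the coefficients are hand-picked for the example rather than learned from data, and the function name `ar_forecast` is hypothetical. It only shows the textbook idea that each new value is a weighted sum of the previous `window` values, and that forecasted values feed later forecasts.

```python
def ar_forecast(history, coefficients, forecast):
    """Produce `forecast` future points from a generic AR model.

    coefficients[0] weights the most recent point (lag 1), coefficients[1]
    the one before it (lag 2), and so on. The window size is
    len(coefficients). Coefficients are supplied by the caller here;
    fitting them is out of scope for this sketch.
    """
    window = len(coefficients)
    if len(history) < window:
        raise ValueError("history must contain at least `window` points")
    series = list(history)
    for _ in range(forecast):
        recent = series[-window:]  # ordered oldest to newest
        next_value = sum(c * v for c, v in zip(coefficients, reversed(recent)))
        series.append(next_value)  # predicted points feed later predictions
    return series[len(history):]


# A series that repeats every 3 points; a weight of 1 on lag 3 reproduces the cycle.
print(ar_forecast([1, 5, 2, 1, 5, 2], [0, 0, 1], forecast=4))  # [1, 5, 2, 1]
```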

### Limitations

These internal limitations are meant to provide "speed bumps" to ensure the best performance.

- `predict` will not use more than 10,000 input points to estimate the model.
- `predict` will not forecast more than 100 points into the future.
- `predict` will not interpolate more than 20,000 input points. (`predict` adds "phantom" input points where there should be a timeslice, but no data point is present.)
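The idea of a "phantom" input point can be illustrated by locating the timeslices that are expected in a series but have no data point. The sketch below only finds those gaps; which values the operator assigns to phantom points is not described here, so none are assigned. The function name `missing_timeslices` and the epoch-second bucket starts are assumptions for the example.

```python
def missing_timeslices(bucket_starts, slice_seconds):
    """Return the bucket start times between the first and last bucket that have no data."""
    present = set(bucket_starts)
    if not present:
        return []
    first, last = min(present), max(present)
    # Walk every expected bucket start and keep those with no data point.
    return [
        t
        for t in range(first, last + slice_seconds, slice_seconds)
        if t not in present
    ]


print(missing_timeslices([0, 60, 180, 300], 60))  # [120, 240]
```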

### Cyclical patterns and the auto-regressive model

If there are cyclical patterns that fit within the `ar.window`, the auto-regressive algorithm will learn the cyclical pattern and use that in prediction.

For example, if there is an hourly cyclical pattern, the following query will learn that cycle:

`… | timeslice 5m | <aggregate function> by _timeslice as _val | predict _val by 5m model=ar, ar.window=15`

In this query, the window size (15 consecutive data points) covers more than 1 hour (15 data points * 5m interval = 75 minutes). So if there are cyclical patterns with a period of less than 75 minutes, the model will discover them.
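The arithmetic above can be turned into a quick check when choosing `ar.window` for a known cycle length. This is a small sketch based only on the rule stated in the preceding paragraph (the period must be shorter than the time the window covers); the function names are hypothetical.

```python
def window_coverage_minutes(ar_window, timeslice_minutes):
    """Time span covered by the window: number of points times the timeslice length."""
    return ar_window * timeslice_minutes


def covers_period(ar_window, timeslice_minutes, period_minutes):
    """True if a cycle of `period_minutes` is shorter than the window's coverage."""
    return period_minutes < window_coverage_minutes(ar_window, timeslice_minutes)


print(window_coverage_minutes(15, 5))  # 75
print(covers_period(15, 5, 60))        # True: an hourly cycle fits in 75 minutes
print(covers_period(15, 5, 90))        # False: a 90-minute cycle does not
```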

### Examples

#### predict using linear regression

This query predicts the count of 404 errors per minute using linear regression.

`_sourceCategory=Labs/Apache/Access status_code=404 | timeslice 1m | count(status_code) as error_count by _timeslice | predict error_count by 1m`

The query returns an aggregation table with columns for `error_count`, `error_count_predicted`, and `error_count_error`.

From here, you can select the **Line Chart** icon to automatically create a Combo Chart that represents `error_count_error` as a column chart, with `error_count` and `error_count_predicted` mapped on top of it as separate lines. Note that the `error_count_error` series is toggled off by default. Click it in the legend to display the column chart.

#### predict using auto-regressive model

This query predicts the count of 404 errors per minute using the auto-regressive model.

`_sourceCategory=Labs/Apache/Access status_code=404 | timeslice 1m | count(status_code) as error_count by _timeslice | predict error_count by 1m model=ar`

The query returns an aggregation table with columns for `error_count`, `error_count_predicted`, `error_count_linear`, and `error_count_error`.

From here, you can select the **Line Chart** icon to automatically create a Combo Chart that represents `error_count_error` as a column chart, with `error_count` and `error_count_predicted` mapped on top of it as separate lines. Note that the `error_count_error` series is toggled off by default. Click it in the legend to display the column chart.

If desired, you can also display the `error_count_linear` series by clicking it in the legend, to see the value predicted by the simple linear regression model.