diff --git a/docs/en/stack/ml/get-started/images/ml-gs-data-ip.jpg b/docs/en/stack/ml/get-started/images/ml-gs-data-ip.jpg
deleted file mode 100644
index e3bf4109b..000000000
Binary files a/docs/en/stack/ml/get-started/images/ml-gs-data-ip.jpg and /dev/null differ
diff --git a/docs/en/stack/ml/get-started/images/ml-gs-data-keyword.jpg b/docs/en/stack/ml/get-started/images/ml-gs-data-keyword.jpg
index 41cb1bfc5..767142ee9 100644
Binary files a/docs/en/stack/ml/get-started/images/ml-gs-data-keyword.jpg and b/docs/en/stack/ml/get-started/images/ml-gs-data-keyword.jpg differ
diff --git a/docs/en/stack/ml/get-started/images/ml-gs-data-metric.jpg b/docs/en/stack/ml/get-started/images/ml-gs-data-metric.jpg
index dbb82b257..015c90220 100644
Binary files a/docs/en/stack/ml/get-started/images/ml-gs-data-metric.jpg and b/docs/en/stack/ml/get-started/images/ml-gs-data-metric.jpg differ
diff --git a/docs/en/stack/ml/get-started/images/ml-gs-data-timestamp.jpg b/docs/en/stack/ml/get-started/images/ml-gs-data-timestamp.jpg
deleted file mode 100644
index 2374021d5..000000000
Binary files a/docs/en/stack/ml/get-started/images/ml-gs-data-timestamp.jpg and /dev/null differ
diff --git a/docs/en/stack/ml/get-started/ml-gs-visualizer.asciidoc b/docs/en/stack/ml/get-started/ml-gs-visualizer.asciidoc
index 90e23bc75..0598096e5 100644
--- a/docs/en/stack/ml/get-started/ml-gs-visualizer.asciidoc
+++ b/docs/en/stack/ml/get-started/ml-gs-visualizer.asciidoc
@@ -27,55 +27,38 @@ exploring. Alternatively, click *Use full kibana_sample_data_logs data* to view
 the full time range of data.
 
 . Optional: Change the sample size, which is the number of documents per shard
-that are used in the visualizations. There is a relatively small number of
-documents in the sample data, so you can choose a value of `all`. For larger
-data sets, keep in mind that using a large sample size increases query run times
-and increases the load on the cluster.
+that are used in the {data-viz}. There is a relatively small number of
+documents in the {kib} sample data, so you can choose a value of `all`. For
+larger data sets, keep in mind that using a large sample size increases query
+run times and increases the load on the cluster.
 
-. Explore the fields and metrics in the {data-viz}.
+. Explore the fields in the {data-viz}.
 +
 --
-It lists the fields in two sections. The first section contains
-the numeric ("metric") data types. The second section contains non-numeric data
-types (such as `keyword`, `text`, `date`, `boolean`, `ip`, and `geo_point`). For
-more information, see {ref}/mapping-types.html[Field data types].
-
-For each metric, the {data-viz} indicates how many documents contain the field
-in the selected time period. It also provides information about the minimum,
-median, and maximum values, the number of distinct values, and their
-distribution. You can use the distribution chart to get a better idea of how
-the values in the data are clustered. Alternatively, you can view the top values
-for metric fields. For example:
-
-[role="screenshot"]
-image::images/ml-gs-data-metric.jpg["{data-viz} output for top values in {kib}", width="50%",role="screenshot left"]
+You can filter the list by field names or {ref}/mapping-types.html[field types].
+The {data-viz} indicates how many of the documents in the sample for the
+selected time period contain each field.
 
 In particular, look at the `clientip`, `response.keyword`, and `url.keyword`
-fields, since we'll use them in our {anomaly-jobs}. For
-{ref}/ip.html[`ip`] and {ref}/keyword.html[`keyword`] fields, the {data-viz}
-provides the number of distinct values, a list of the top values, and the number
-and percentage of documents that contain the field during the selected time
-period. For example:
+fields, since we'll use them in our {anomaly-jobs}. For these fields, the
+{data-viz} provides the number of distinct values, a list of the top values, and
+the number and percentage of documents that contain the field. For example:
 
 [role="screenshot"]
-image:images/ml-gs-data-keyword.jpg["{data-viz} output for keyword fields in {kib}", width="50%",role="screenshot left"]
+image::images/ml-gs-data-keyword.jpg["{data-viz} output for ip and keyword fields"]
 
-[role="screenshot"]
-image:images/ml-gs-data-ip.jpg["{data-viz} output for ip fields in {kib}", width="50%",role="screenshot left"]
+For numeric fields, the {data-viz} provides information about the minimum,
+median, maximum, and top values, the number of distinct values, and their
+distribution. You can use the distribution chart to get a better idea of how the
+values in the data are clustered. For example:
 
---
+[role="screenshot"]
+image::images/ml-gs-data-metric.jpg["{data-viz} for sample web logs"]
 
-. Make note of the range of dates in the `@timestamp` field. They are relative
-to when you added the sample data and you'll need that information later in the
-tutorial.
-+
---
-For {ref}/date.html[`date`] fields, the {data-viz} provides the earliest and
-latest field values and the number and percentage of documents that contain the
-field during the selected time period:
+TIP: Make note of the range of dates in the `@timestamp` field. They are
+relative to when you added the sample data and you'll need that information
+later in the tutorial.
 
-[role="screenshot"]
-image:images/ml-gs-data-timestamp.jpg["{data-viz} output for date fields in {kib}",width="50%",role="screenshot left"]
 --
 
 Now that you're familiar with the data in the `kibana_sample_data_logs` index,