[RFC] Integrate a Data Importer Plugin into Dashboards #9199
Comments
I love the idea and the high-level architecture. I do want us to think about the UX a bit more, though. Let's embed this experience correctly within the UI for OSD, e.g. the Sample data page is a good place to add an entry point to this. I also don't like it being a separate page. @kgcreative @lauralexis thoughts?
I believe this is a wanted feature, at least from my perspective :) But I'm not sure it's necessary to introduce a new plugin for it. Could data ingestion be integrated seamlessly with the existing workflow? For example, within the index management plugin, so that you can create an index with data from static data files, or select an existing index and then ingest data via files? Or, like @ashwin-pc suggested, could this ingest experience be part of the existing sample data import page?
The plugin itself was more of a way to PoC changes without changing core too much. I'm not opposed to porting this functionality to the existing sample data experience, especially the UI experience, but the benefit of a separate plugin is to easily enable/disable this feature and provide a separate platform for data import, especially if we consider adding more supported file types (structured/unstructured formats). Also, thinking out loud, if we were to integrate this into the sample data experience, we should rename "Sample Data" to "Import Data". It's mostly semantics, but users would then have the option to import our "sample data" or their "real data".
Overview
Currently in OpenSearch Dashboards (OSD), users can ingest documents into OpenSearch and visualize their data in Dashboards. However, there is no existing mechanism to easily import custom static data through the Dashboards UI: sample data can be loaded, but custom data cannot. #1791 articulates the same issue. In short, there are several use cases for enabling data import through Dashboards:
Requirements
This list is by no means exhaustive, but there are several basic capabilities that must be part of this feature.
As a user:
As a developer/admin:
Out of scope
Approach
To integrate support, we must split our approach into UI and server components.
UI
The OUI component library should contain the necessary components for the user to execute the actions specified in the requirements (a composition sketch follows the list):
- `OuiFilePicker`: For uploading data
- `OuiCodeEditor`/`monaco`: For inputting text
- `OuiSelect`: For choosing file type and import type (file/text upload)
- `OuiSelectable`: For choosing the index name (fetching can be done with the `IndexPatternsService`, or, as a last resort, by exposing an API to query for index names)
- `DataSourceSelectable`: Exposed via the `dataSourceManagement` plugin; should handle the datasource fetching for us
- `OuiButton`: For executing the import process
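To make the composition concrete, here is a rough sketch of the import form. It assumes the OUI components export EUI-style props from `@opensearch-project/oui`; the component wiring, state, and handler names are all hypothetical, not part of the RFC:

```tsx
// A rough sketch of the import form; props are assumed to mirror EUI,
// and onImport is a hypothetical callback supplied by the plugin.
import React, { useState } from 'react';
import { OuiFilePicker, OuiSelect, OuiButton } from '@opensearch-project/oui';

export const DataImportForm = ({
  onImport,
}: {
  onImport: (file: File, fileType: string) => void;
}) => {
  const [file, setFile] = useState<File | null>(null);
  const [fileType, setFileType] = useState('.ndjson');

  return (
    <>
      {/* Choose the file type to parse */}
      <OuiSelect
        options={[
          { value: '.ndjson', text: 'NDJSON' },
          { value: '.csv', text: 'CSV' },
          { value: '.json', text: 'JSON' },
        ]}
        value={fileType}
        onChange={(e) => setFileType(e.target.value)}
      />
      {/* Pick the file to upload */}
      <OuiFilePicker onChange={(files) => setFile(files?.item(0) ?? null)} />
      {/* Kick off the import */}
      <OuiButton fill isDisabled={!file} onClick={() => file && onImport(file, fileType)}>
        Import
      </OuiButton>
    </>
  );
};
```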
Server
In OSD core, we already have a similar mechanism for importing files via Saved Objects import.
For the client side:
OpenSearch-Dashboards/src/plugins/saved_objects_management/public/management_section/objects_table/components/flyout.tsx
Lines 193 to 226 in 33f7ba6
For the server side:
OpenSearch-Dashboards/src/core/server/saved_objects/routes/import.ts
Lines 48 to 81 in 33f7ba6
We can follow a similar approach and expose two routes (tentatively named):
As the names suggest, we split importing text and file inputs into two routes (a registration sketch follows). There are two reasons why they're split:
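A minimal sketch of registering two such routes, mirroring the Saved Objects import route referenced above. The paths and body shapes are hypothetical, since the RFC only names the routes tentatively:

```ts
// Sketch only: hypothetical paths and body shapes for the two routes.
import { schema } from '@osd/config-schema';
import { IRouter } from '../../../core/server'; // assumed plugin-relative path

export function registerImportRoutes(router: IRouter) {
  // Text input is small enough to validate in full before ingesting
  router.post(
    {
      path: '/api/data_importer/_import_text',
      validate: {
        body: schema.object({
          text: schema.string(),
          fileType: schema.string(),
          indexName: schema.string(),
        }),
      },
    },
    async (context, request, response) => {
      // validateText() and ingestText() would run here
      return response.ok({ body: { success: true } });
    }
  );

  // File input arrives as a stream and is validated/ingested in chunks
  router.post(
    {
      path: '/api/data_importer/_import_file',
      options: { body: { output: 'stream', accepts: 'multipart/form-data' } },
      validate: {
        query: schema.object({ fileType: schema.string(), indexName: schema.string() }),
        body: schema.object({ file: schema.stream() }),
      },
    },
    async (context, request, response) => {
      // ingestFile() would consume request.body.file here
      return response.ok({ body: { success: true } });
    }
  );
}
```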
For text input, the flow is as follows:

1. `validateText()` to ensure data is well formed
2. `ingestText()` to ingest that document into OpenSearch

For file input, the flow is as follows:

1. `ingestFile()` and use the stream to validate and ingest to OpenSearch in chunks
Because the underlying `FileStream` may be arbitrarily large (indeed, we can add a config to limit the max size in bytes, but that limit is set by the user), we cannot store the entire contents in memory. We must process this as a stream. This means there will be no pre-validation step and there's a possibility that only some documents can be ingested. The issue of how to handle these failed records is an implementation detail, but we can specify which documents succeeded/failed to ingest in the response body; a sketch of this chunked flow follows.
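To make the streaming constraint concrete, here is a minimal sketch of chunked ingestion on the server. The helper name, batch size, and import path are illustrative, not part of the RFC; it assumes NDJSON input and OSD's `OpenSearchClient`:

```ts
// Hypothetical helper: stream a file line by line, buffer a batch, and
// flush each batch to OpenSearch via the bulk API, tracking per-document
// outcomes so the route can report partial failures.
import * as readline from 'readline';
import { Readable } from 'stream';
import { OpenSearchClient } from '../../../core/server'; // assumed path

const BATCH_SIZE = 500; // flush every 500 documents (illustrative)

export async function ingestNdjsonStream(
  client: OpenSearchClient,
  fileStream: Readable,
  index: string
): Promise<{ succeeded: number; failed: number }> {
  const rl = readline.createInterface({ input: fileStream, crlfDelay: Infinity });
  let batch: object[] = [];
  let succeeded = 0;
  let failed = 0;

  const flush = async () => {
    if (batch.length === 0) return;
    // Interleave action and document lines as the bulk API expects
    const body = batch.flatMap((doc) => [{ index: { _index: index } }, doc]);
    const response = await client.bulk({ body });
    for (const item of response.body.items) {
      item.index?.error ? failed++ : succeeded++;
    }
    batch = [];
  };

  for await (const line of rl) {
    if (!line.trim()) continue;
    try {
      batch.push(JSON.parse(line)); // per-document validation happens here
    } catch {
      failed++; // malformed line: record the failure and keep streaming
      continue;
    }
    if (batch.length >= BATCH_SIZE) await flush();
  }
  await flush(); // flush any remaining partial batch
  return { succeeded, failed };
}
```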
To accommodate the many types of files, there needs to be a parser dedicated to each type of file, called an `IFileParser`. The structure of this parser is as follows:
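Below is a hedged reconstruction of what this contract could look like, inferred from the `validateText()`/`ingestText()`/`ingestFile()` flows above; every member name and signature here is an assumption:

```ts
// Hedged reconstruction of the parser contract implied by the flows above;
// member names and signatures are assumptions, not the RFC's exact code.
import { Readable } from 'stream';
import { OpenSearchClient } from '../../../core/server'; // assumed path

export interface IngestOptions {
  client: OpenSearchClient; // client used to write documents
  indexName: string; // target index chosen by the user
}

export interface IFileParser {
  // Pre-validate pasted text (text flow only; files are validated per chunk)
  validateText(text: string): Promise<boolean>;
  // Ingest a validated text payload into OpenSearch
  ingestText(text: string, options: IngestOptions): Promise<void>;
  // Stream a file, validating and ingesting chunk by chunk
  ingestFile(file: Readable, options: IngestOptions): Promise<void>;
}
```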
Registering Custom file formats
By default, the `DataImporterPlugin` will supply three parsers: `.ndjson`, `.csv`, and `.json`. Parsers have to be registered in the `FileParserService`, which will register the `IFileParsers` generated by the `DataImporterPlugin` as well as by any other plugins. For the latter, `DataImporterPluginSetup` will expose a function `registerFileParser()` for other plugins to register a custom `IFileParser` (for example `.xlsx`, `.geojson`, `.gltf`, `.biojson`, etc.):
POC
A PoC plugin, data-importer-plugin, is introduced to help capture this vision. It doesn't implement everything stated in this RFC, but it provides the core feature set outlined in the requirements.