Support custom document Id selector when using client.BulkAll(...) #353

Open
david-alpert-nl opened this issue Sep 6, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@david-alpert-nl

Is your feature request related to a problem?

I am using the client.BulkAll(...) method to index a collection of entries, because its bulk helpers are more efficient than looping through a collection of 7000 documents and creating a separate index request per document.

While doing so, I seem unable to customize the document ID.

What solution would you like?

Ideally I would be able to pass a DocumentIdSelector function like so:

            var bulkOperation = OpenSearchClient.BulkAll(entriesToIndex.Entries, req => req
                .Index(indexName)
                .DocumentId(e => e.MyCustomField)
                .BackOffRetries(2)
                .BackOffTime("30s")
                .MaxDegreeOfParallelism(4)
                .Size(100)
                .BulkResponseCallback(resultForOneChunk => { })
            );

where e => e.MyCustomField would be run on each document before it is indexed, and its result used to set the _id property for that document.

Do you have any additional context?

We use a specific internal identifier for each document when inserting them one at a time. By deriving that document id from the object in a deterministic way we can avoid storing generated ids per document and instead derive them whenever we need to use them (e.g. for delete requests).

@david-alpert-nl added the "enhancement" (New feature or request) and "untriaged" labels Sep 6, 2023
@david-alpert-nl
Author

It looks like this would require a new property on IBulkAllRequest<T> to hold the selector as a Func<T, string>, plus a new chainable method to set it. The selector would then be invoked closer to where the actual index request gets created.

If I am on the right track here, I might be able to submit a PR for consideration.

@Xtansia removed the "untriaged" label Sep 6, 2023
@Xtansia
Collaborator

Xtansia commented Sep 6, 2023

@david-alpert-nl In this case BulkAll uses automatic ID inference. This is generally unchanged since the fork from Elasticsearch, so https://www.elastic.co/guide/en/elasticsearch/client/net-api/7.16/ids-inference.html should be mostly valid.

  1. It will look for a property with the name Id on the document type:
public class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
  2. You can configure the default mapping on the ConnectionSettings if you want to use a different property:
var settings = new ConnectionSettings()
            .DefaultMappingFor<Person>(m => m.IdProperty(p => p.FirstName));
  3. You can use the [OpenSearchType] attribute to specify the property:
[OpenSearchType(IdProperty = nameof(LastName))]
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

However, if you really need to set the IDs just for the bulk request, you can override the default bulk operation like so:

var bulk = client.BulkAll(
            docs,
            d => d.BufferToBulk((b, buffer) =>
                b.IndexMany(buffer, (i, doc) => i
                    .Id(doc.FirstName)
                )
            )
        );
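
Applied to the request from the original post, that override might look roughly like the sketch below. It is untested against a live cluster. The Entry type, the BulkIndexer class, and the 15-minute wait are assumptions made for illustration; MyCustomField stands in for the field named in the original request, and the other options are copied from it.

```csharp
using System;
using System.Collections.Generic;
using OpenSearch.Client;

// Hypothetical document type; MyCustomField is the field named in the original request.
public class Entry
{
    public string MyCustomField { get; set; }
}

// Hypothetical helper class, only here to make the sketch self-contained.
public static class BulkIndexer
{
    public static void IndexEntries(IOpenSearchClient client, IEnumerable<Entry> entries, string indexName)
    {
        var bulkAll = client.BulkAll(entries, req => req
            .Index(indexName)
            .BackOffRetries(2)
            .BackOffTime("30s")
            .MaxDegreeOfParallelism(4)
            .Size(100)
            // Replace the default per-chunk bulk operation so that each document's
            // _id is taken from MyCustomField instead of the inferred Id property.
            .BufferToBulk((b, buffer) => b
                .IndexMany(buffer, (i, doc) => i.Id(doc.MyCustomField))));

        // Block until all chunks have been sent; the 15-minute cap is an arbitrary choice.
        bulkAll.Wait(TimeSpan.FromMinutes(15), response => { });
    }
}
```

Because the ID is derived from the document itself, a later delete request can recompute the same value rather than looking up a stored ID.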
