Skip indexing issue attachments #31
Comments
But attachments could be Excel or Word documents, or some other readable format. PDFs are often readable too. It is priceless to have the search return results based on the content of such files!
Have you ever looked at an Excel or Word document in ASCII form? It is a binary format, so it is useless to Elasticsearch without some sort of plain-text conversion.
While digging around to resolve the issue where I currently can't get a search result on anything other than a plain text file, it became apparent that for a file such as a .jpg file the code will place the following into the index: If you decode the pertinent content from the line: using (for example) https://www.base64decode.org/ and passing in: you get back:

So the index will contain information about the file (its name, size, author, etc.), but because it's not one of a supported list of file types, the whole file content won't actually be loaded into the index. In the plugin file redmine_elasticsearch\app\serializers\attachment_serlializer.rb, the supported list of file extensions is listed as: SUPPORTED_EXTENSIONS = %w{

So, while it will index details about other formats of content, it won't put large volumes of pointless data into the index. (I believe the 'tika' module is supposed to convert the otherwise unreadable binary content into something which can indeed be indexed, although I'm having problems with that in my environment at the moment.)

def file

The above code snippet from the same .rb file is where files which aren't in the list (or in a corresponding list of MIME types) have their content replaced with the string 'unsupported' before the encoding is run.

Feel free to correct me if I'm wrong (I'm certainly no expert in this area), but it looks to me like this is not really a relevant point: the plugin already works out what is likely to be usable and doesn't load large files of binary data into the index. Assuming I'm correct, I think this item can probably be closed. Please do let me know.

Kind Regards - Steve
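To make the behaviour described above concrete, here is a minimal Ruby sketch of the idea. It is not the plugin's code. The extension list and the method name `example_indexable_payload` are hypothetical, chosen only for illustration, and the MIME type check mentioned above is left out. It only shows how content for an unlisted extension could be swapped for the string 'unsupported' before Base64 encoding, and how decoding the indexed value reveals that.

```ruby
require "base64"

# Hypothetical list for illustration only; the plugin's real
# SUPPORTED_EXTENSIONS list is not reproduced in this thread.
EXAMPLE_SUPPORTED_EXTENSIONS = %w[txt pdf doc docx xls xlsx].freeze

# Returns the Base64 payload that would be handed to the index.
# Content of files with an unlisted extension is replaced by the
# literal string 'unsupported' before encoding, so the binary data
# itself never reaches the index.
def example_indexable_payload(filename, content)
  extension = File.extname(filename).delete_prefix(".").downcase
  body = EXAMPLE_SUPPORTED_EXTENSIONS.include?(extension) ? content : "unsupported"
  Base64.strict_encode64(body)
end

jpg_payload = example_indexable_payload("photo.jpg", "\xFF\xD8\xFF\xE0".b)
txt_payload = example_indexable_payload("notes.txt", "hello")

puts Base64.decode64(jpg_payload) # prints "unsupported"
puts Base64.decode64(txt_payload) # prints "hello"
```

Decoding the value stored for the .jpg gives back only the placeholder, which matches what the base64decode.org check above showed: metadata is indexed, the binary content is not.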
OK, so the situation is not as bad as I thought, but I'd still prefer to disable indexing of files completely, since in my case at least there will be no benefit from indexing attachments. Thanks for adding the details.
Issue attachments are potentially large and of a format that isn't trivially indexable (like JPG, PDF, etc). No reason to store these in elasticsearch.