Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IDSEQ-1178 - Stronger validation for input filenames #2538

Merged
merged 6 commits into from
Aug 21, 2019
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,12 @@ class LocalSampleFileUpload extends React.Component {
Paired files must be labeled with "_R1" or
"_R2" at the end of the basename.
</li>
<li>
File names must be no longer than 120 characters and can only
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this text a constant for reuse?

Copy link
Contributor Author

@davidrissato davidrissato Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, the entire text or just this new sentence? Because it seems there are other lines that can be reused as well.
Do you think this is a blocker for this PR or is it a sanitization task that we should consider a specific PR dedicated to remove such redundancies?

Copy link
Contributor

@tfrcarvalho tfrcarvalho Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do it since it is a new sentence. It is not the problem of redundancy but keeping it in sync.
I see your point but I prefer to start doing it and set an example than keep following a bad pattern.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering this is not only about this sentence, but the entire block of instructions (the previous 2 <li> items are exactly the same), would it make sense to create a new react component with these instructions instead of using a constant?

Copy link
Contributor Author

@davidrissato davidrissato Aug 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tfrcarvalho Do we have any pre-existing pattern that I should follow in this case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, you got me convinced. Let's just make a low priority jira task to just consolidate this

contain letters from English alphabet (A-Z, upper and lower
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: *the English alphabet ? (and below)

case), numbers (0-9), periods (.), hyphens (-) and underscores
(_). Spaces are not allowed.
</li>
</ul>
</div>
)}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,13 @@ class RemoteSampleFileUpload extends React.Component {
Paired files must be labeled with &quot;_R1&quot; or
&quot;_R2&quot; at the end of the basename.
</li>
<li>
File names must be no longer than 120 characters and can only
contain letters from English alphabet (A-Z, upper and lower
case), numbers (0-9), periods (.), hyphens (-) and underscores
(_). Spaces are not allowed.
</li>
<li>File names must be no longer than 140 characters.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not included above? Any particular reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, actually that line should have been removed, since the previous item already includes it. Thank you for pointing it out.

</ul>
</div>
)}
Expand Down
34 changes: 23 additions & 11 deletions app/helpers/samples_helper.rb
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
require 'open3'
require 'csv'
require 'aws-sdk-s3'

module SamplesHelper
include PipelineOutputsHelper
Expand Down Expand Up @@ -142,14 +143,26 @@ def parsed_samples_for_s3_path(s3_path, project_id, host_genome_id)
default_attributes = { project_id: project_id.to_i,
host_genome_id: host_genome_id,
status: 'created', }
s3_path.chomp!('/')
s3_bucket = 's3://' + s3_path.sub('s3://', '').split('/')[0]
s3_output, _stderr, status = Open3.capture3("aws", "s3", "ls", "--recursive", "#{s3_path}/")
return unless status.exitstatus.zero?
s3_output.chomp!
entries = s3_output.split("\n").reject { |line| line.include? "Undetermined" }
davidrissato marked this conversation as resolved.
Show resolved Hide resolved

parsed_uri = URI.parse(s3_path)
return unless parsed_uri.scheme == "s3"
tfrcarvalho marked this conversation as resolved.
Show resolved Hide resolved

s3_bucket_name = parsed_uri.host
return if s3_bucket_name.nil?

s3_prefix = parsed_uri.path.sub(%r{^/(.*?)/?$}, '\1/')

client = Aws::S3::Client.new
davidrissato marked this conversation as resolved.
Show resolved Hide resolved
begin
entries = client.list_objects_v2(bucket: s3_bucket_name, prefix: s3_prefix).contents.map(&:key)
rescue Aws::S3::Errors::ServiceError => e # Covers all S3 access errors (AccessDenied/NoSuchBucket/AllAccessDisabled)
Rails.logger.info("parsed_samples_for_s3_path Aws::S3::Errors::ServiceError. s3_path: #{s3_path}, error_class: #{e.class.name}, error_message: #{e.message} ")
return
end

samples = {}
entries.each do |file_name|
entries.each do |s3_entry|
file_name = File.basename(s3_entry)
matched_paired = InputFile::BULK_FILE_PAIRED_REGEX.match(file_name)
matched_single = InputFile::BULK_FILE_SINGLE_REGEX.match(file_name)
if matched_paired
Expand All @@ -161,12 +174,11 @@ def parsed_samples_for_s3_path(s3_path, project_id, host_genome_id)
else
next
end
source = matched[0]
name = matched[1].split('/')[-1]
name = matched[1]
samples[name] ||= default_attributes.clone
samples[name][:input_files_attributes] ||= []
samples[name][:input_files_attributes][read_idx] = { name: source.split('/')[-1],
source: "#{s3_bucket}/#{source}",
samples[name][:input_files_attributes][read_idx] = { name: file_name,
source: "s3://#{s3_bucket_name}/#{s3_entry}",
source_type: InputFile::SOURCE_TYPE_S3, }
end

Expand Down
6 changes: 3 additions & 3 deletions app/models/input_file.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,9 @@ class InputFile < ApplicationRecord
SOURCE_TYPE_S3 = 's3'.freeze
SOURCE_TYPE_BASESPACE = 'basespace'.freeze

FILE_REGEX = %r{\A[^\s\/]+\.(fastq|fq|fastq.gz|fq.gz|fasta|fa|fasta.gz|fa.gz)\z}
BULK_FILE_PAIRED_REGEX = /([^ ]*)_R(\d)(_001)?.(fastq.gz|fq.gz|fastq|fq|fasta.gz|fa.gz|fasta|fa)\z/
BULK_FILE_SINGLE_REGEX = /([^ ]*).(fastq.gz|fq.gz|fastq|fq|fasta.gz|fa.gz|fasta|fa)\z/
FILE_REGEX = /\A[-.A-Za-z0-9_]{1,120}\.(fastq|fq|fastq.gz|fq.gz|fasta|fa|fasta.gz|fa.gz)\z/
BULK_FILE_PAIRED_REGEX = /\A([-.A-Za-z0-9_]{1,120})_R(\d)(_001)?\.(fastq.gz|fq.gz|fastq|fq|fasta.gz|fa.gz|fasta|fa)\z/
BULK_FILE_SINGLE_REGEX = /\A([-.A-Za-z0-9_]{1,120})\.(fastq.gz|fq.gz|fastq|fq|fasta.gz|fa.gz|fasta|fa)\z/
S3_CP_PIPE_ERROR = '[Errno 32] Broken pipe'.freeze

validates :name, presence: true, format: { with: FILE_REGEX, message: "file must match format '#{FILE_REGEX}'" }
Expand Down