-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] FileSplitter with FTP upload support #17549
Conversation
lib/manageiq/util/file_splitter.rb
Outdated
def initialize(options = {}) | ||
@input_file = options[:input_file] || ARGF | ||
@input_filename = options[:input_filename] | ||
@byte_count = options[:byte_count] || 10_485_760 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This default is 10.megabytes
, but just expressed as the integer directly (avoids needing the ActiveSupport
dependency when run as a script).
I am open to a better default (specifically for pg_dump
outputs and backups), but I just went with something here that could be reasonable to deal with in the test suite, since I am testing this on the filesystem directly with 10MB temp files.
8dbd414
to
0abd860
Compare
Is there a reason this all wasn't done as an enhancement to If you need a separate script I would think that script should use an enhanced |
I guess I didn't mention this in the PR description, but the first reason I made this a script was because That is worth noting because I want to avoid having to have a tmp copy of the dump on the host machine for this script incase the DB is massive and there is limited disk space on the host running the dump. This is a big factor in every one of the other decisions made (regardless of how smart or dumb going down this path may be) This lead to a bunch of smaller decisions:
That said, I think not sharing code between this and For reference, the usage when called out to via a shell out is going to be something like: $ pg_dump -h example.com -u root ... vmdb_production | file_splitter.rb -b 512M --ftp-host foo.com ... |
e5ca2b4
to
53a1bf0
Compare
6e3b8e9
to
a6a46ec
Compare
a6a46ec
to
b0b08ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"k" => KILOBYTE, | ||
"m" => MEGABYTE, | ||
"g" => GIGABYTE | ||
}.freeze |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
KILOBYTE = 1024 | ||
MEGABYTE = KILOBYTE * 1024 | ||
GIGABYTE = MEGABYTE * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splits files based on a given size, similar to `split`. Uses number based filename increments instead of letter based (which is the default in `split`)
A library for shared Net::FTP functions that can be used throughout the ManageIQ project. Extracted from FileDepotFtp.
Allows for sending both anonymous and authorized FTP uploads directly from the script. Splits are done in memory, so the file never is created on the host machine, which is ideal for large backups and such that are being split up. Spec for testing this functionality are done via a live FTP (vsftpd) server setup with the following config: listen=NO listen_ipv6=YES local_enable=YES local_umask=022 write_enable=YES connect_from_port_20=YES anonymous_enable=YES anon_root=/var/ftp/pub anon_umask=022 anon_upload_enable=YES anon_mkdir_write_enable=YES anon_other_write_enable=YES pam_service_name=vsftpd userlist_enable=YES userlist_deny=NO tcp_wrappers=YES Running on the 192.168.50.3 IP address. Full vagrant setup for the above can be found here: https://gist.github.com/NickLaMuro/c883a997c7ae943dd684bccd469cea43 Put the `Vagrantfile` into a directory and run: $ vagrant up (with `vagrant` installed on that machine, of course) The specs will only run if the user specifically targets the tests using a tag flag: $ bundle exec rspec --tag with_real_ftp
b0b08ff
to
fabf763
Compare
Checked commits NickLaMuro/manageiq@65d9ca0~...fabf763 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 lib/manageiq/util/file_splitter.rb
|
Might change how we are handling file splitting, so setting this as See #17798 (comment) for some further details. |
Closing this for now. I think I have ripped out the parts that I do need for this, so no need to leave it open. Main rational can be found here: #17798 (comment) |
Built off of this reference code: buf_left = byte_count while buf_left.positive? cur_readsize = buf_left - FTP_CHUNKSIZE >= 0 ? FTP_CHUNKSIZE : buf_left buf = input_file.read(cur_readsize) break if buf == nil # rubocop:disable Style/NilComparison (from original) conn.write(buf) buf_left -= FTP_CHUNKSIZE end From: ManageIQ/manageiq#17549 This basically allows a shared form for uploading the source_input to a file IO (whether that IO is a socket or or some other form of IO). The method has been split up from the reference code since there are some places where the loop itself might not be needed, but the code to deliver a chunk at a time might be (see next commit for an example of this using `MiqSwiftStorage`). Also, since `read_single_chunk` is designed to be called on it's own, we want to be able to eject with an empty string from that method itself. This basically re-arranges some code (from the reference code) to allow that, but maintain the same amount of conditional logic per chunk read that was there previously.
Why?
Well, this is first off related to this BZ and eventually this BZ. Since split drastically differs on BSD/UNIX systems, trying to work with the limited set of flags they do share was a bit of a pain and not terribly flexible.
More over, the latter BZ would eventually require that we take the generated files and upload them to FTP, and ideally it would be best if we could avoid putting the split up files on the local server first before uploading. This is possible in "GNU
split
land" with the--filter
flag via a nestedcurl
command, but not so in BSD. Since Ruby has all of the facilities to handle this in it's standard lib, and it is a common denominator on all appliances, this seemed like the most straight forward and easily testable route (despite being an exercise in "re-inventing the wheel").Why here?
Well, I figured the rake tasks that would be triggered by the
appliance_console
would be the ones in charge of determining if file splitting was necessary, so instead of moving this code intomanageiq-gems-pending
, I figured it made more sense to keep it close to where it was being used. Some support code will still be necessary to allowpipes
to work withPostgresAdmin
and such (see ManageIQ/awesome_spawn#41 for a start...), but those will come later.Why FTP (right now)?
Without it, it made less sense that I spent the time re-writing
split
, and including provides a better bit of context to why this is necessary, and how the pieces will eventually work together.Links
Testing/QA
I have been testing this by exercising the code in #17652, so see that PR for details.