Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: fail to connect hdfs if set atomic_write_dir #4867

Closed
wcy-fdu opened this issue Jul 8, 2024 · 7 comments · Fixed by #5039
Closed

bug: fail to connect hdfs if set atomic_write_dir #4867

wcy-fdu opened this issue Jul 8, 2024 · 7 comments · Fixed by #5039

Comments

@wcy-fdu
Copy link
Contributor

wcy-fdu commented Jul 8, 2024

error log:

2024-06-27 19:15:26,794 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedRenameTo: failed to rename /atomic_write_dir/0.bbca8834-a39b-432c-a0b9-72743ba5f218 to /data/cluster_id/0 because destination exists

The audit log shows that the file to be written was created earlier than the first temporary file, causing the temporary file rename to fail.

@meteorgan
Copy link
Contributor

I encountered the same issue when i tried to reproduce this issue (https://github.com/apache/opendal/actions/runs/10486150908/job/29043772467) on my local machine.
The reason should be the destination file is created before rename. the code is here:

let mut f = self

According to https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_rename.28Path_src.2C_Path_d.29, There must not be an existing file at the end of the destination path
and this problem caused many HDFS test cases to fail on my machine. However when I comment out the above code, all the tests pass. Strangely, they all succeed in the Github Action environment.

@Xuanwo
Copy link
Member

Xuanwo commented Aug 22, 2024

Hi, I believe this issue has not been addressed by CI yet.

We will need to:

  • Enable atomic_write_dir: this will write data in a temporary directory first and then rename it to the destination.
  • Ensure the data exists in place.

The HDFS Access::rename checks the destination before calling rename, but the rename inside Access::write doesn't handle it.

@meteorgan
Copy link
Contributor

I'm not sure if i missed something.
Should hdfs-default-with-atomic-write-dir cover this case ? in this job, atomic_write_dir is enabled, and in Access::write the target_path is always created if not exists

@Xuanwo
Copy link
Member

Xuanwo commented Aug 23, 2024

Should hdfs-default-with-atomic-write-dir cover this case ? in this job, atomic_write_dir is enabled, and in Access::write the target_path is always created if not exists

The problematic case is as follows. Users attempted to write x.pdf, and with atomic_write_dir enabled, we first write to tmp/x.pdf and then rename it to x.pdf. However, in some instances, x.pdf already exists, causing this renaming to fail.

if let Some(tmp_path) = &self.tmp_path {
self.client
.rename_file(tmp_path, &self.target_path)
.map_err(new_std_io_error)?;
}

@meteorgan
Copy link
Contributor

yes. I've done some trials on my laptop, when I set

OPENDAL_HDFS_ATOMIC_WRITE_DIR=/tmp/atomic_write_dir/opendal/
OPENDAL_HDFS_ENABLE_APPEND=false

these setting are the same to hdfs-default-with-atomic-write-dir in https://github.com/apache/opendal/blob/0cec8ba8991edf226997df2b429f3fdb0285620d/.github/workflows/service_test_hdfs.yml

and run OPENDAL_TEST=hdfs cargo test behavior --features tests,services-hdfs. many tests all failed:

failures:
    behavior::test_list_dir
    behavior::test_list_prefix
    behavior::test_remove_one_file
    behavior::test_list_dir_with_file_path
    behavior::test_list_dir_with_metakey
    behavior::test_delete_file
    behavior::test_list_dir_with_metakey_complete
    behavior::test_delete_with_special_chars
    behavior::test_list_rich_dir
    behavior::test_delete_stream
    behavior::test_list_nested_dir
    behavior::test_list_dir_with_recursive
    behavior::test_read_full
    behavior::test_list_file_with_recursive
    behavior::test_remove_all
    behavior::test_list_dir_with_recursive_no_trailing_slash
    behavior::test_rename_file
    behavior::test_read_with_special_chars
    behavior::test_read_range
    behavior::test_reader
    behavior::test_rename_target_dir
    behavior::test_rename_self
    behavior::test_rename_nested
    behavior::test_rename_overwrite
    behavior::test_stat_not_cleaned_path
    behavior::test_stat_file
    behavior::test_stat_nested_parent_dir
    behavior::test_stat_with_special_chars
    behavior::test_write_only
    behavior::test_blocking_remove_one_file
    behavior::test_blocking_read_full
    behavior::test_blocking_delete_file
    behavior::test_write_with_special_chars
    behavior::test_blocking_stat_file
    behavior::test_blocking_read_range
    behavior::test_writer_write_with_overwrite
    behavior::test_blocking_stat_with_special_chars
    behavior::test_blocking_write_file
    behavior::test_blocking_write_with_special_chars

so I'm wondering why the hdfs-default-with-atomic-write-dir is successful. Do I misunderstand something ?
since we set atomic_write_dir and don't append, every write operation will use this feature. Additionally, Access:write will create the target file, making this issue easy to reproduce

@Xuanwo
Copy link
Member

Xuanwo commented Aug 23, 2024

Oh, I see. hdfs-default will use local fs instead the hdfs cluster. I believe this is the difference.

@meteorgan
Copy link
Contributor

Oh, Let me try to change it to hdfs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants