You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Ubuntu 20.04 on both servers
python 3.8.10 on both servers
MS modules on dev
azure-core 1.28.0
azure-storage-blob 12.17.0
azure-storage-file-datalake 12.12.0
MS modules on prod
azure-core 1.25.0
azure-storage-blob 12.14.0b2
azure-storage-file-datalake 12.9.0b1
Background
we have a linux prod server where we upload file from into Azure Datalake using python scripts.
Have the Microsoft module azure-storage-file-datalake installed, which pulled in azure-core, azure-storage-blob.
Our scripts use a adls class where we wrapped our function around MS functions. Our code for upload check
if a file already exist and if the size is different. We used MS function get_file_properties for that,
and then upload the file.
Sometime during fall 2022 our script failed during the upload and it turned out that our function using MS function get_file_properties
timed out after 3+ minutes. Could not get any info why that started happening. Ended up rewriting our code for checking
if file exists and getting file size with a function that us MS get_paths, loop through all the files until found (or not)
and return data about that file.
Problem
This year we needed to test (from the prod server) towards our Azure dev Datalake. Discovered that the function with get_paths
did not work towards adls dev, but MS function get_file_properties did.
Same code, same MS module versions.
Was able to get a dev linux server, install our software, installed the MS modules. On dev server MS function get_file_properties worked.
Noticed that we had different version of the MS modules. Wrote a handful of test scripts that check towards adls prod and dev.
Tested functions for
get meta data for a file, and prinf file size.
get meta data for a folder, print last_modified
(used MS function get_file_properties)
list folders in a folder
list files in a folder
(used MS get_paths, check for file or dir, return object)
upload a file, this includes check if parent exist, if file exist, and check size.
download a file, include checking if file exist.
These test script was run against adls prod and dev (2 different file systems)
All tests ran successfully on the linux dev server.
We assumed the difference in the MS module version was the reason for our problem on the linux prod server.
Since all tests was successful on linux dev, we upgraded our linux prod. Installed the same test script,
upgraded MS modules to the same version.
Test scripts failed on the linux prod towards all adls dev.
Test script to upload files to adls prod worked, after we changed the part of the code that use MS function get_file_properties.
Test to download from adls prod failed.
we ended up rolling back our software, rolling back the MS module versions on linux prod.
How do we troubleshoot this ?
works on one linux with same OS, same python, same tokens for adls prod and dev.
The text was updated successfully, but these errors were encountered:
went back to linux prod server, and installed the latest azure python module versions.
instead of just a generic timeout msg I get
Connection to hpimdpdatalake.blob.core.windows.net timed out. (connect timeout=20
that talks about the python module use the dfs rest api when pointing to dfs, but for get_file_properties it uses the blob rest api.
Also mentions to try function get_access_control (uses the dfs rest api), so I wrote a new test script, and confirmed that get_access_control does not timeout for dfs (on linux prod server).
I will work with our IT support to see why the prod server seem to be blocked where linux dev is not
Environment
Ubuntu 20.04 on both servers
python 3.8.10 on both servers
MS modules on dev
azure-core 1.28.0
azure-storage-blob 12.17.0
azure-storage-file-datalake 12.12.0
MS modules on prod
azure-core 1.25.0
azure-storage-blob 12.14.0b2
azure-storage-file-datalake 12.9.0b1
Background
we have a linux prod server where we upload file from into Azure Datalake using python scripts.
Have the Microsoft module azure-storage-file-datalake installed, which pulled in azure-core, azure-storage-blob.
Our scripts use a adls class where we wrapped our function around MS functions. Our code for upload check
if a file already exist and if the size is different. We used MS function get_file_properties for that,
and then upload the file.
Sometime during fall 2022 our script failed during the upload and it turned out that our function using MS function get_file_properties
timed out after 3+ minutes. Could not get any info why that started happening. Ended up rewriting our code for checking
if file exists and getting file size with a function that us MS get_paths, loop through all the files until found (or not)
and return data about that file.
Problem
This year we needed to test (from the prod server) towards our Azure dev Datalake. Discovered that the function with get_paths
did not work towards adls dev, but MS function get_file_properties did.
Same code, same MS module versions.
Was able to get a dev linux server, install our software, installed the MS modules. On dev server MS function get_file_properties worked.
Noticed that we had different version of the MS modules. Wrote a handful of test scripts that check towards adls prod and dev.
Tested functions for
get meta data for a file, and prinf file size.
get meta data for a folder, print last_modified
(used MS function get_file_properties)
list folders in a folder
list files in a folder
(used MS get_paths, check for file or dir, return object)
upload a file, this includes check if parent exist, if file exist, and check size.
download a file, include checking if file exist.
These test script was run against adls prod and dev (2 different file systems)
All tests ran successfully on the linux dev server.
We assumed the difference in the MS module version was the reason for our problem on the linux prod server.
Since all tests was successful on linux dev, we upgraded our linux prod. Installed the same test script,
upgraded MS modules to the same version.
Test scripts failed on the linux prod towards all adls dev.
Test script to upload files to adls prod worked, after we changed the part of the code that use MS function get_file_properties.
Test to download from adls prod failed.
we ended up rolling back our software, rolling back the MS module versions on linux prod.
How do we troubleshoot this ?
works on one linux with same OS, same python, same tokens for adls prod and dev.
The text was updated successfully, but these errors were encountered: