Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

558 Multiple disks issue on windows and latest rdflib release. #588

Merged
Merged
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions osp/__init__.py
Original file line number Diff line number Diff line change
@@ -1 +1,63 @@
__path__ = __import__('pkgutil').extend_path(__path__, __name__)

# Patch RDFLib <= 5.0.0. See osp-core issue https://github.com/simphony/osp-core/issues/558 (the drive letter from the
# path is stripped on Windows by the graph.Graph.serialize method of RDFLib <= 5.0.0).
import sys
if sys.platform == 'win32':
import os
import ctypes
import ctypes.wintypes as wintypes
from urllib.parse import urlparse
import rdflib


def _compare_version_leq(version, other_version):
"""Compares two software version strings.

Receives two software version strings which are just numbers separated by dots and determines whether the first one
is less or equal than the second one.

Args:
version (str): first version string (number separated by dots).
other_version (str) : second version string (number separated by dots).

Returns:
bool: whether the first version string is less or equal than the second one.
"""
version = version.split('.')
other_version = other_version.split('.')
for i in range(0, min(len(version), len(other_version))):
if version[i] < other_version[i]:
return True
elif version[i] > other_version[i]:
return False
else:
if len(other_version) > len(version) and other_version[i + 1] > str(0):
return False
else:
return True


if _compare_version_leq(rdflib.__version__, '5.0.0'):
# Then patch RDFLib with the following decorator.
def graph_serialize_fix_decorator(func):
def graph_serialize(*args, **kwargs):
if kwargs.get('destination') is not None and not hasattr(kwargs.get('destination'), 'write'):
# Bug causing case.
scheme, netloc, path, params, _query, fragment = urlparse(kwargs['destination'])
if urlparse(kwargs['destination']).path.startswith('\\'): # The destination is a windows path.
# Call the win32 API to get the volume ID.
windows_func = ctypes.windll.kernel32.GetVolumeNameForVolumeMountPointW
windows_func.argtypes = wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.DWORD
lpszVolumeMountPoint = wintypes.LPCWSTR(f'{scheme}:\\')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks a bit complicated for me. What is this? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that is the same thing that I though, and if you check the evolution of the code, it's what I intended to do but after trying several times I think that it is not really possible both to just wrap the serialize function and to do it this way.

The problem is the way that urllib is handling the windows paths inside the serialize function. The actual path needs to be on the path field of the ParseResult, and it does not happens with the windows path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the ctypes stuff, yeah it looks very complicated also for me, but it was the only way that I found of getting windows paths by volume (which work with urlparse) and also passing the CI. At first I thought of just querying windows itself via the command line, opening a subprocess that returns a the volume ID and capturing the output, but then bandit, the security vulnerability scanner complained about it being potentially insecure (which is true).

Thus, the solution was to query it in some other way. It turns out that python has no easy solution to call the win32 API to get this information. There are modules on pypy but I did not want to have an extra dependency just for this. ctypes on the other hand belongs to the standard library, and lets you call C functions from Python, which is precisely a way to call the win32 API to get the ID of the disk.

lpszVolumeName = ctypes.create_unicode_buffer(50)
cchBufferLength = wintypes.DWORD(50)
windows_func(lpszVolumeMountPoint, lpszVolumeName, cchBufferLength)
# Get a DOS_DEVICE_PATH, not affected by the drive letter stripping bug.
dos_device_path = os.path.join(lpszVolumeName.value.replace('?', '.')
or f'\\\\.\\Volume{{DRIVE_LETTER_{scheme}_NOT_ASSIGNED}}\\',
path)
kwargs['destination'] = dos_device_path
func(*args, **kwargs)
return graph_serialize
rdflib.Graph.serialize = graph_serialize_fix_decorator(rdflib.Graph.serialize)