-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use readdir
instead of deprecated readdir_r
to avoid warnings
#430
Conversation
It looks plausible, but due to past concerns about thread safety of Some extra testing may also be in order, but we can push that to a separate issue. Also found this Rust issue, still open: rust-lang/rust#34668 |
When researching this, I got to https://lwn.net/Articles/696475/ which states:
This raises the same concerns as the issue you linked to. On my linux, the man page states:
|
I found this article, which states that Going to look around a bit more. |
// errno is not changed. If an error occurs, NULL is returned and errno | ||
// is set appropriately. To distinguish end of stream and from an error, | ||
// set errno to zero before calling readdir() and then check the value of | ||
// errno if NULL is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for including this comment!
// From the man page | ||
// On success, readdir() returns a pointer to a dirent structure. | ||
// (This structure may be statically allocated; do not attempt to | ||
// free(3) it.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+2
src/unix/lwt_unix_unix.h
Outdated
@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count) | |||
long count = Long_val(val_count); | |||
DIR *dir = DIR_Val(val_dir); | |||
|
|||
LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count); | |||
LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good that this gets rid of the size estimation and whatnot. I believe this is not necessary with readdir
.
For reference on how much of a pain this size estimation was, there is #292. Among other things, it fixed a buffer overflow bug in dbc42f2, where the janky dirent
size computation recommended online could give a number smaller than what macOS could actually write, etc.
See also https://bugs.chromium.org/p/chromium/issues/detail?id=457759.
I sometimes wonder how anything in the world works, with such OS API issues :p Then again, that's why we have 0-days all over the place.
src/unix/lwt_unix_unix.h
Outdated
|
||
/* All is good */ | ||
job->entries[i] = *entry; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is incorrect as a way to copy the filename, because the d_name
field is not specified to have any particular size; on a platform where it is defined as d_name[]
, copying
the structure will not copy the "true" contents of that field. From Linux readdir(3)
:
Warning: applications should avoid any dependence on the size of the
d_name field. POSIX defines it as char d_name[], a character array
of unspecified size, with at most NAME_MAX characters preceding the
terminating null byte ('\0').
I think we need to allocate a string and do strcpy
(or just call strdup
), and replace the flat array of struct dirent
s with an array of pointers to these copied strings. Of course, they have to be freed in the _result
function, or, upon failure, in the worker, if that is more convenient.
Unfortunately, this also means two copies of the string on the Lwt side, one into this array of ours, and one as part of caml_copy_string
. I doubt this is a big deal. It's not necessary, but for extra thoroughness it's good to run the benchmarks linked here; I can also do it on macOS on an updated version of this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost want to usecaml_copy_string
there. It sort of breaks the pattern of “worker
does the C/system part and result
does the conversion.” And on top of that, I'm not sure what happens if you call caml_copy_string
and then abandon those values because of an error. Do you need to free them?
Actually… see other comment.
I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Off the top of my head, I think we can't call caml_copy_string
from a fully-preemptive worker thread that does not play nice with OCaml's global runtime lock. Should probably also clarify that in the recent getcwd
docs.
Actually… see other comment.
I hope I didn't miss another comment :)
I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?
I also see little point, but we may be forced to by the constraints of the runtime system :/ Ultimately, I think the cost of switching to the worker thread and back will tend to dominate the cost of copying bytes around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No other comment. I was worried about something else and then whilst writing the comment I realised that it wasn't a problem.
So replacing Rebasing also is a good idea, but not necessary – I think we will do a squash-merge at the end of this PR, and that should take care of that first commit being there. |
src/unix/lwt_unix_unix.h
Outdated
@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count) | |||
long count = Long_val(val_count); | |||
DIR *dir = DIR_Val(val_dir); | |||
|
|||
LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count); | |||
LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should now probably be sizeof(char*)
.
@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job) | |||
result = caml_alloc(job->count, 0); | |||
long i; | |||
for (i = 0; i < job->count; i++) { | |||
Store_field(result, i, caml_copy_string(job->entries[i].d_name)); | |||
Store_field(result, i, caml_copy_string(job->entries[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also free the entries
that actually got allocated, both if the job fails and if it succeeds. For that, we need a valid job->count
in all cases, but we don't have one in case of an error from readdir
in the worker (because of the hard return
in it, above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the thing I had doubts about. But then, the previous code did not try to deallocate any of the dirent
s. Was there a memory leak before? What's the difference in the code that makes that change necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC: the code before this PR allocated a flat array of dirent
s, i.e. if it calculated that a dirent
was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r
.
The new code inserts a level of indirection, because we don't know how long the strings returned by readdir
will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup
). We need to free these separate string blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. That makes sense.
src/unix/lwt_unix_unix.h
Outdated
|
||
/* All is good */ | ||
job->entries[i] = strdup(entry->d_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should check the result of this for NULL
, and probably set a custom error code (ENOMEM
?) and break out of the loop to ensure a valid count of entries previously allocated successfully. See other comments on the count and free
.
@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job) | |||
result = caml_alloc(job->count, 0); | |||
long i; | |||
for (i = 0; i < job->count; i++) { | |||
Store_field(result, i, caml_copy_string(job->entries[i].d_name)); | |||
Store_field(result, i, caml_copy_string(job->entries[i])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC: the code before this PR allocated a flat array of dirent
s, i.e. if it calculated that a dirent
was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r
.
The new code inserts a level of indirection, because we don't know how long the strings returned by readdir
will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup
). We need to free these separate string blocks.
src/unix/lwt_unix_unix.h
Outdated
/* readdir is good */ | ||
char *name = strdup(entry->d_name); | ||
if (name == NULL) { | ||
job->error_code = errno; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call here, I didn't even know that strdup
actually sets errno
(to ENOMEM
) :p
2. Transmit the number of actually read entries from the `worker` to | ||
the result parser. | ||
See below `worker_readdir_n` and `result_readdir_n` | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely 👍
src/unix/lwt_unix_unix.h
Outdated
job->error_code = result; | ||
if (entry == NULL && errno != 0) { | ||
job->error_code = errno; | ||
job->count = i - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe job->count
should be set to i
: if this fails on the first iteration, i
is 0; 0 entries were read. If it fails on the second iteration, i
is 1; 1 entry was read, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes. The stop condition when freeing is i < job->count
so it should definitely be i
. Well spotted.
I hate that big red X. This is my first time using reviews with Approve/Request changes, the X looks like rejected... |
If the additional data is stored inline in the job struct, it is | ||
deallocated with `lwt_unix_free_job`. If the additional data is for | ||
pointers to additional structure, you must remember to deallocate it | ||
yourself. For an example of this, see `readdir_n`.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Thanks, this is one less task in my local TODO list :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, by inspection this looks good, except for that comment about name
vs. strdup
.
src/unix/lwt_unix_unix.h
Outdated
|
||
/* All is good */ | ||
job->entries[i] = strdup(entry->d_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant to assign name
here, not call strdup
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Let me know if it's all good to go and I can squash the whole thing (or rebase it to look better). |
It should be good now, just waiting for the build. There's no need for you to squash it, unless you want to. I can use GitHub's squash-merge. Let me know if you want me to wait and let you doctor the history instead of me :p |
Thank you! |
Fixed all C warning for unix on my machine. I'll have a look at the windows ones (although I'm not sure how much help I'd be there).
It passes tests but I don't trust myself and because it's C I don't trust the compiler to shout at me when I'm wrong. A second pair of eyes is definitely welcome.
I should also probably rebase to avoid a separate commit for cosmetics.