Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use readdir instead of deprecated readdir_r to avoid warnings #430

Merged
merged 10 commits into from
Jul 4, 2017

Conversation

raphael-proust
Copy link
Collaborator

Fixed all C warning for unix on my machine. I'll have a look at the windows ones (although I'm not sure how much help I'd be there).

It passes tests but I don't trust myself and because it's C I don't trust the compiler to shout at me when I'm wrong. A second pair of eyes is definitely welcome.

I should also probably rebase to avoid a separate commit for cosmetics.

@aantron
Copy link
Collaborator

aantron commented Jul 1, 2017

It looks plausible, but due to past concerns about thread safety of readdir, we should probably read the man pages carefully and look in the source code of the standard libraries of at least Linux, macOS, FreeBSD. Links to Linux and macOS sources can be found here. I can do this later, as part of a thorough review.

Some extra testing may also be in order, but we can push that to a separate issue.

Also found this Rust issue, still open: rust-lang/rust#34668

@raphael-proust
Copy link
Collaborator Author

When researching this, I got to https://lwn.net/Articles/696475/ which states:

  • In the current POSIX.1 specification (POSIX.1-2008), readdir(3) is
    not required to be thread-safe. However, in modern
    implementations (including the glibc implementation), concurrent
    calls to readdir(3) that specify different directory streams are
    thread-safe.

This raises the same concerns as the issue you linked to.


On my linux, the man page states:

       ┌──────────┬───────────────┬──────────────────────────┐
       │Interface │ Attribute     │ Value                    │
       ├──────────┼───────────────┼──────────────────────────┤
       │readdir() │ Thread safety │ MT-Unsafe race:dirstream │
       └──────────┴───────────────┴──────────────────────────┘

       In the current POSIX.1 specification (POSIX.1-2008), readdir() is not required to be thread-safe.  However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify
       different directory streams are thread-safe.  In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated
       readdir_r(3) function.  It is expected that a future version of POSIX.1 will require that readdir() be thread-safe when concurrently employed on different directory streams.

@aantron
Copy link
Collaborator

aantron commented Jul 1, 2017

I found this article, which states that readdir is as thread-safe as we need it to be. It's from 2012, which is good – I was feeling uneasy that readdir might have only become thread-safe recently on some odd system.

Going to look around a bit more.

// errno is not changed. If an error occurs, NULL is returned and errno
// is set appropriately. To distinguish end of stream and from an error,
// set errno to zero before calling readdir() and then check the value of
// errno if NULL is returned.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for including this comment!

// From the man page
// On success, readdir() returns a pointer to a dirent structure.
// (This structure may be statically allocated; do not attempt to
// free(3) it.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+2

@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count)
long count = Long_val(val_count);
DIR *dir = DIR_Val(val_dir);

LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count);
LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count);
Copy link
Collaborator

@aantron aantron Jul 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good that this gets rid of the size estimation and whatnot. I believe this is not necessary with readdir.

For reference on how much of a pain this size estimation was, there is #292. Among other things, it fixed a buffer overflow bug in dbc42f2, where the janky dirent size computation recommended online could give a number smaller than what macOS could actually write, etc.

See also https://bugs.chromium.org/p/chromium/issues/detail?id=457759.

I sometimes wonder how anything in the world works, with such OS API issues :p Then again, that's why we have 0-days all over the place.


/* All is good */
job->entries[i] = *entry;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect as a way to copy the filename, because the d_name field is not specified to have any particular size; on a platform where it is defined as d_name[], copying
the structure will not copy the "true" contents of that field. From Linux readdir(3):

Warning: applications should avoid any dependence on the size of the
d_name field. POSIX defines it as char d_name[], a character array
of unspecified size, with at most NAME_MAX characters preceding the
terminating null byte ('\0').

I think we need to allocate a string and do strcpy (or just call strdup), and replace the flat array of struct dirents with an array of pointers to these copied strings. Of course, they have to be freed in the _result function, or, upon failure, in the worker, if that is more convenient.

Unfortunately, this also means two copies of the string on the Lwt side, one into this array of ours, and one as part of caml_copy_string. I doubt this is a big deal. It's not necessary, but for extra thoroughness it's good to run the benchmarks linked here; I can also do it on macOS on an updated version of this code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to usecaml_copy_string there. It sort of breaks the pattern of “worker does the C/system part and result does the conversion.” And on top of that, I'm not sure what happens if you call caml_copy_string and then abandon those values because of an error. Do you need to free them?

Actually… see other comment.

I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off the top of my head, I think we can't call caml_copy_string from a fully-preemptive worker thread that does not play nice with OCaml's global runtime lock. Should probably also clarify that in the recent getcwd docs.

Actually… see other comment.

I hope I didn't miss another comment :)

I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?

I also see little point, but we may be forced to by the constraints of the runtime system :/ Ultimately, I think the cost of switching to the worker thread and back will tend to dominate the cost of copying bytes around.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No other comment. I was worried about something else and then whilst writing the comment I realised that it wasn't a problem.

@aantron
Copy link
Collaborator

aantron commented Jul 2, 2017

So replacing readdir_r by readdir seems like the right way on all major Unix platforms (Linux, macOS, FreeBSD, Android, iOS). See inline comments for small remaining issues.

Rebasing also is a good idea, but not necessary – I think we will do a squash-merge at the end of this PR, and that should take care of that first commit being there.

@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count)
long count = Long_val(val_count);
DIR *dir = DIR_Val(val_dir);

LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count);
LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should now probably be sizeof(char*).

@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job)
result = caml_alloc(job->count, 0);
long i;
for (i = 0; i < job->count; i++) {
Store_field(result, i, caml_copy_string(job->entries[i].d_name));
Store_field(result, i, caml_copy_string(job->entries[i]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also free the entries that actually got allocated, both if the job fails and if it succeeds. For that, we need a valid job->count in all cases, but we don't have one in case of an error from readdir in the worker (because of the hard return in it, above).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing I had doubts about. But then, the previous code did not try to deallocate any of the dirents. Was there a memory leak before? What's the difference in the code that makes that change necessary?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC: the code before this PR allocated a flat array of dirents, i.e. if it calculated that a dirent was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r.

The new code inserts a level of indirection, because we don't know how long the strings returned by readdir will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup). We need to free these separate string blocks.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. That makes sense.


/* All is good */
job->entries[i] = strdup(entry->d_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should check the result of this for NULL, and probably set a custom error code (ENOMEM?) and break out of the loop to ensure a valid count of entries previously allocated successfully. See other comments on the count and free.

@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job)
result = caml_alloc(job->count, 0);
long i;
for (i = 0; i < job->count; i++) {
Store_field(result, i, caml_copy_string(job->entries[i].d_name));
Store_field(result, i, caml_copy_string(job->entries[i]));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC: the code before this PR allocated a flat array of dirents, i.e. if it calculated that a dirent was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r.

The new code inserts a level of indirection, because we don't know how long the strings returned by readdir will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup). We need to free these separate string blocks.

/* readdir is good */
char *name = strdup(entry->d_name);
if (name == NULL) {
job->error_code = errno;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call here, I didn't even know that strdup actually sets errno (to ENOMEM) :p

2. Transmit the number of actually read entries from the `worker` to
the result parser.
See below `worker_readdir_n` and `result_readdir_n`
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely 👍

job->error_code = result;
if (entry == NULL && errno != 0) {
job->error_code = errno;
job->count = i - 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe job->count should be set to i: if this fails on the first iteration, i is 0; 0 entries were read. If it fails on the second iteration, i is 1; 1 entry was read, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes. The stop condition when freeing is i < job->count so it should definitely be i. Well spotted.

@aantron
Copy link
Collaborator

aantron commented Jul 4, 2017

I hate that big red X. This is my first time using reviews with Approve/Request changes, the X looks like rejected...

If the additional data is stored inline in the job struct, it is
deallocated with `lwt_unix_free_job`. If the additional data is for
pointers to additional structure, you must remember to deallocate it
yourself. For an example of this, see `readdir_n`.*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thanks, this is one less task in my local TODO list :)

Copy link
Collaborator

@aantron aantron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, by inspection this looks good, except for that comment about name vs. strdup.


/* All is good */
job->entries[i] = strdup(entry->d_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to assign name here, not call strdup again.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@raphael-proust
Copy link
Collaborator Author

Let me know if it's all good to go and I can squash the whole thing (or rebase it to look better).

@aantron
Copy link
Collaborator

aantron commented Jul 4, 2017

It should be good now, just waiting for the build. There's no need for you to squash it, unless you want to. I can use GitHub's squash-merge. Let me know if you want me to wait and let you doctor the history instead of me :p

@aantron aantron merged commit 0df64ee into ocsigen:master Jul 4, 2017
@aantron
Copy link
Collaborator

aantron commented Jul 4, 2017

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants