Use `readdir` instead of deprecated `readdir_r` to avoid warnings #430

raphael-proust · 2017-07-01T18:27:53Z

Fixed all C warning for unix on my machine. I'll have a look at the windows ones (although I'm not sure how much help I'd be there).

It passes tests but I don't trust myself and because it's C I don't trust the compiler to shout at me when I'm wrong. A second pair of eyes is definitely welcome.

I should also probably rebase to avoid a separate commit for cosmetics.

Fixes ocsigen#426

aantron · 2017-07-01T18:53:24Z

It looks plausible, but due to past concerns about thread safety of readdir, we should probably read the man pages carefully and look in the source code of the standard libraries of at least Linux, macOS, FreeBSD. Links to Linux and macOS sources can be found here. I can do this later, as part of a thorough review.

Some extra testing may also be in order, but we can push that to a separate issue.

Also found this Rust issue, still open: rust-lang/rust#34668

raphael-proust · 2017-07-01T21:12:05Z

When researching this, I got to https://lwn.net/Articles/696475/ which states:

In the current POSIX.1 specification (POSIX.1-2008), readdir(3) is
not required to be thread-safe. However, in modern
implementations (including the glibc implementation), concurrent
calls to readdir(3) that specify different directory streams are
thread-safe.

This raises the same concerns as the issue you linked to.

On my linux, the man page states:

       ┌──────────┬───────────────┬──────────────────────────┐
       │Interface │ Attribute     │ Value                    │
       ├──────────┼───────────────┼──────────────────────────┤
       │readdir() │ Thread safety │ MT-Unsafe race:dirstream │
       └──────────┴───────────────┴──────────────────────────┘

       In the current POSIX.1 specification (POSIX.1-2008), readdir() is not required to be thread-safe.  However, in modern implementations (including the glibc implementation), concurrent calls to readdir() that specify
       different directory streams are thread-safe.  In cases where multiple threads must read from the same directory stream, using readdir() with external synchronization is still preferable to the use of the deprecated
       readdir_r(3) function.  It is expected that a future version of POSIX.1 will require that readdir() be thread-safe when concurrently employed on different directory streams.

aantron · 2017-07-01T22:45:07Z

I found this article, which states that readdir is as thread-safe as we need it to be. It's from 2012, which is good – I was feeling uneasy that readdir might have only become thread-safe recently on some odd system.

Going to look around a bit more.

aantron · 2017-07-02T17:10:38Z

src/unix/lwt_unix_unix.h

+    // errno is not changed. If an error occurs, NULL is returned and errno
+    // is set appropriately. To distinguish end of stream and from an error,
+    // set errno to zero before calling readdir() and then check the value of
+    // errno if NULL is returned.


+1 for including this comment!

aantron · 2017-07-02T17:12:52Z

src/unix/lwt_unix_unix.h

+        // From the man page
+        // On success, readdir() returns a pointer to a dirent structure.
+        // (This structure may be statically allocated; do not attempt to
+        // free(3) it.)


aantron · 2017-07-02T17:35:56Z

src/unix/lwt_unix_unix.h

@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count)
    long count = Long_val(val_count);
    DIR *dir = DIR_Val(val_dir);

-    LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count);
+    LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count);


It's good that this gets rid of the size estimation and whatnot. I believe this is not necessary with readdir.

For reference on how much of a pain this size estimation was, there is #292. Among other things, it fixed a buffer overflow bug in dbc42f2, where the janky dirent size computation recommended online could give a number smaller than what macOS could actually write, etc.

See also https://bugs.chromium.org/p/chromium/issues/detail?id=457759.

I sometimes wonder how anything in the world works, with such OS API issues :p Then again, that's why we have 0-days all over the place.

aantron · 2017-07-02T17:41:23Z

src/unix/lwt_unix_unix.h


+        /* All is good */
+        job->entries[i] = *entry;


I think this is incorrect as a way to copy the filename, because the d_name field is not specified to have any particular size; on a platform where it is defined as d_name[], copying
the structure will not copy the "true" contents of that field. From Linux readdir(3):

Warning: applications should avoid any dependence on the size of the
d_name field. POSIX defines it as char d_name[], a character array
of unspecified size, with at most NAME_MAX characters preceding the
terminating null byte ('\0').

I think we need to allocate a string and do strcpy (or just call strdup), and replace the flat array of struct dirents with an array of pointers to these copied strings. Of course, they have to be freed in the _result function, or, upon failure, in the worker, if that is more convenient.

Unfortunately, this also means two copies of the string on the Lwt side, one into this array of ours, and one as part of caml_copy_string. I doubt this is a big deal. It's not necessary, but for extra thoroughness it's good to run the benchmarks linked here; I can also do it on macOS on an updated version of this code.

I almost want to usecaml_copy_string there. It sort of breaks the pattern of “worker does the C/system part and result does the conversion.” And on top of that, I'm not sure what happens if you call caml_copy_string and then abandon those values because of an error. Do you need to free them?

Actually… see other comment.

I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?

Off the top of my head, I think we can't call caml_copy_string from a fully-preemptive worker thread that does not play nice with OCaml's global runtime lock. Should probably also clarify that in the recent getcwd docs.

Actually… see other comment.

I hope I didn't miss another comment :)

I see little point in copying the string twice: once in C and then once to convert to OCaml layout. Any opinions about it? Any comments on the paragraph above?

I also see little point, but we may be forced to by the constraints of the runtime system :/ Ultimately, I think the cost of switching to the worker thread and back will tend to dominate the cost of copying bytes around.

No other comment. I was worried about something else and then whilst writing the comment I realised that it wasn't a problem.

aantron · 2017-07-02T17:48:45Z

So replacing readdir_r by readdir seems like the right way on all major Unix platforms (Linux, macOS, FreeBSD, Android, iOS). See inline comments for small remaining issues.

Rebasing also is a good idea, but not necessary – I think we will do a squash-merge at the end of this PR, and that should take care of that first commit being there.

aantron · 2017-07-03T22:12:00Z

src/unix/lwt_unix_unix.h

@@ -1872,7 +1863,7 @@ CAMLprim value lwt_unix_readdir_n_job(value val_dir, value val_count)
    long count = Long_val(val_count);
    DIR *dir = DIR_Val(val_dir);

-    LWT_UNIX_INIT_JOB(job, readdir_n, dirent_size(dir) * count);
+    LWT_UNIX_INIT_JOB(job, readdir_n, sizeof(struct dirent) * count);


This should now probably be sizeof(char*).

aantron · 2017-07-03T22:13:57Z

src/unix/lwt_unix_unix.h

@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job)
        result = caml_alloc(job->count, 0);
        long i;
        for (i = 0; i < job->count; i++) {
-            Store_field(result, i, caml_copy_string(job->entries[i].d_name));
+            Store_field(result, i, caml_copy_string(job->entries[i]));


We should also free the entries that actually got allocated, both if the job fails and if it succeeds. For that, we need a valid job->count in all cases, but we don't have one in case of an error from readdir in the worker (because of the hard return in it, above).

That's the thing I had doubts about. But then, the previous code did not try to deallocate any of the dirents. Was there a memory leak before? What's the difference in the code that makes that change necessary?

IIRC: the code before this PR allocated a flat array of dirents, i.e. if it calculated that a dirent was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r.

The new code inserts a level of indirection, because we don't know how long the strings returned by readdir will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup). We need to free these separate string blocks.

Oh. That makes sense.

aantron · 2017-07-03T22:14:54Z

src/unix/lwt_unix_unix.h


+        /* All is good */
+        job->entries[i] = strdup(entry->d_name);


Should check the result of this for NULL, and probably set a custom error code (ENOMEM?) and break out of the loop to ensure a valid count of entries previously allocated successfully. See other comments on the count and free.

aantron · 2017-07-04T16:47:32Z

src/unix/lwt_unix_unix.h

@@ -1860,7 +1851,7 @@ static value result_readdir_n(struct job_readdir_n *job)
        result = caml_alloc(job->count, 0);
        long i;
        for (i = 0; i < job->count; i++) {
-            Store_field(result, i, caml_copy_string(job->entries[i].d_name));
+            Store_field(result, i, caml_copy_string(job->entries[i]));


IIRC: the code before this PR allocated a flat array of dirents, i.e. if it calculated that a dirent was 1024 bytes and needed to allocate 32 of them, it would allocate 32KB of space in one block. The strings were written directly into various parts of this single block. We controlled the maximum length of these strings when calling readdir_r.

The new code inserts a level of indirection, because we don't know how long the strings returned by readdir will be. So, for reading 32 entries, it is now allocating space for 32 8-byte pointers in one block, and each one of those pointers may then be set to the address of a separately-allocated string (by strdup). We need to free these separate string blocks.

aantron · 2017-07-04T16:52:20Z

src/unix/lwt_unix_unix.h

+        /* readdir is good */
+        char *name = strdup(entry->d_name);
+        if (name == NULL) {
+            job->error_code = errno;


Good call here, I didn't even know that strdup actually sets errno (to ENOMEM) :p

aantron · 2017-07-04T16:54:49Z

src/unix/lwt_unix_unix.h

+       2. Transmit the number of actually read entries from the `worker` to
+          the result parser.
+          See below `worker_readdir_n` and `result_readdir_n`
+    */


Definitely 👍

aantron · 2017-07-04T16:58:27Z

src/unix/lwt_unix_unix.h

-            job->error_code = result;
+        if (entry == NULL && errno != 0) {
+            job->error_code = errno;
+            job->count = i - 1;


I believe job->count should be set to i: if this fails on the first iteration, i is 0; 0 entries were read. If it fails on the second iteration, i is 1; 1 entry was read, etc.

Oh yes. The stop condition when freeing is i < job->count so it should definitely be i. Well spotted.

aantron · 2017-07-04T16:59:29Z

I hate that big red X. This is my first time using reviews with Approve/Request changes, the X looks like rejected...

aantron · 2017-07-04T19:33:13Z

src/unix/lwt_unix_unix.h

+       If the additional data is stored inline in the job struct, it is
+       deallocated with `lwt_unix_free_job`. If the additional data is for
+       pointers to additional structure, you must remember to deallocate it
+       yourself. For an example of this, see `readdir_n`.*/


👍 Thanks, this is one less task in my local TODO list :)

aantron

Ok, by inspection this looks good, except for that comment about name vs. strdup.

aantron · 2017-07-04T19:36:52Z

src/unix/lwt_unix_unix.h


+        /* All is good */
+        job->entries[i] = strdup(entry->d_name);


I think you meant to assign name here, not call strdup again.

Good catch!

raphael-proust · 2017-07-04T20:54:55Z

Let me know if it's all good to go and I can squash the whole thing (or rebase it to look better).

aantron · 2017-07-04T21:00:30Z

It should be good now, just waiting for the build. There's no need for you to squash it, unless you want to. I can use GitHub's squash-merge. Let me know if you want me to wait and let you doctor the history instead of me :p

aantron · 2017-07-04T23:14:30Z

Thank you!

raphael-proust added 5 commits July 1, 2017 17:29

Remove unneeded macros GET_INET

33acfd0

Fixes ocsigen#426

Using readdir instead of deprecated readdir_r

e521527

Replace deprecated readdir_r by readdir

12ececc

Remove support function for removed readdir_r

69cc188

clang-format recently modified C file

e61777e

aantron requested changes Jul 2, 2017

View reviewed changes

Copy directory names from dirent in readdir_n

4ba200a

aantron requested changes Jul 3, 2017

View reviewed changes

raphael-proust added 2 commits July 4, 2017 07:45

More thorough cleanup of memory in readdir

0dd40d8

Comment readdir_n's double use of count

b7214cf

aantron requested changes Jul 4, 2017

View reviewed changes

aantron reviewed Jul 4, 2017

View reviewed changes

Free all read dirnames in readdir_n

8ba3489

aantron requested changes Jul 4, 2017

View reviewed changes

Assign the result of srtdup not another call

4d19c3f

aantron approved these changes Jul 4, 2017

View reviewed changes

aantron merged commit 0df64ee into ocsigen:master Jul 4, 2017

aantron mentioned this pull request Jul 19, 2017

Add Lwt 3.1.0 – concurrency library ocaml/opam-repository#9876

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `readdir` instead of deprecated `readdir_r` to avoid warnings #430

Use `readdir` instead of deprecated `readdir_r` to avoid warnings #430

raphael-proust commented Jul 1, 2017

aantron commented Jul 1, 2017

raphael-proust commented Jul 1, 2017

aantron commented Jul 1, 2017

aantron Jul 2, 2017

aantron Jul 2, 2017

aantron Jul 2, 2017 •

edited

Loading

aantron Jul 2, 2017

raphael-proust Jul 3, 2017

aantron Jul 3, 2017

raphael-proust Jul 3, 2017

aantron commented Jul 2, 2017

aantron Jul 3, 2017

aantron Jul 3, 2017

raphael-proust Jul 4, 2017

aantron Jul 4, 2017

raphael-proust Jul 4, 2017

aantron Jul 3, 2017

aantron Jul 4, 2017

aantron Jul 4, 2017

aantron Jul 4, 2017

aantron Jul 4, 2017

raphael-proust Jul 4, 2017

aantron commented Jul 4, 2017

aantron Jul 4, 2017

aantron left a comment

aantron Jul 4, 2017

raphael-proust Jul 4, 2017

raphael-proust commented Jul 4, 2017

aantron commented Jul 4, 2017

aantron commented Jul 4, 2017

Use readdir instead of deprecated readdir_r to avoid warnings #430

Use readdir instead of deprecated readdir_r to avoid warnings #430

Conversation

raphael-proust commented Jul 1, 2017

aantron commented Jul 1, 2017

raphael-proust commented Jul 1, 2017

aantron commented Jul 1, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aantron Jul 2, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aantron commented Jul 2, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aantron commented Jul 4, 2017

Choose a reason for hiding this comment

aantron left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raphael-proust commented Jul 4, 2017

aantron commented Jul 4, 2017

aantron commented Jul 4, 2017

Use `readdir` instead of deprecated `readdir_r` to avoid warnings #430

Use `readdir` instead of deprecated `readdir_r` to avoid warnings #430

aantron Jul 2, 2017 •

edited

Loading