nextcloud-setup: free() Invalid Pointer #108262

Closed
Thunderbottom opened this issue Jan 2, 2021 · 35 comments
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@Thunderbottom
Member

Thunderbottom commented Jan 2, 2021

Description

Setting up pkgs.nextcloud20 on a NixOS server with NixOps fails during nixops deploy with the following error:

Jan 02 20:34:02 apollo nextcloud-setup-start[5540]: free(): invalid pointer
Jan 02 20:34:03 apollo nextcloud-setup-start[5464]: /nix/store/r135gyhyak3im4zfw7y7pxir4nl82rzl-unit-script-nextcloud-setup-start/bin/nextcloud-setup-start: line 29:  5540 Aborted                 (core dumped) /nix/store/fl8i6y94cr7skkr1fkg54750zn78v5pg-nextcloud-occ/bin/nextcloud-occ maintenance:install --admin-pass "$(<"/run/keys/nextcloud-admin")" --admin-user "admin" --data-dir "/var/lib/nextcloud/data" --database "pgsql" --database-host "/run/postgresql" --database-name "nextcloud" --database-pass "$(<"/run/keys/nextcloud-postgres")" --database-user "nextcloud"

To Reproduce

Steps to reproduce the behavior:

  1. Create a nextcloud.nix configuration with the following content:
{ config, pkgs, lib, ... }:

let
  nextcloud-db-keyfile = "nextcloud-postgres"; # deployment.keys for db password
in {
  services.nextcloud = {
    enable = true;
    hostName = "fqdn.name";  # replace with an actual FQDN
    https = true;
    maxUploadSize = "1G";
    package = pkgs.nextcloud20;

    autoUpdateApps = {
      enable = true;
      startAt = "05:00:00";
    };

    config = {
      adminuser = "admin";
      adminpassFile = "/run/keys/nextcloud-admin";
      dbtype = "pgsql";
      dbuser = "nextcloud";
      dbhost = "/run/postgresql";
      dbname = "nextcloud";
      dbpassFile = "/run/keys/${nextcloud-db-keyfile}";
      overwriteProtocol = "https";
    };
  };

  services.postgresql = {
    enable = true;
    ensureDatabases = [ "nextcloud" ];
    ensureUsers = [
      {
        name = "nextcloud";
        ensurePermissions."DATABASE nextcloud" = "ALL PRIVILEGES";
      }
    ];
  };

  services.nginx = {
    virtualHosts."fqdn.name" = {
      enableACME = true;
      forceSSL = true;
    };
  };

  systemd.services."nextcloud-setup" = {
    after = [  "nextcloud-admin-key.service" "${nextcloud-db-keyfile}-key.service" "postgres.service" ];
    wants = [  "nextcloud-admin-key.service" "${nextcloud-db-keyfile}-key.service" "postgres.service" ];
  };

  users.users.nextcloud.extraGroups = [ "keys" ];
}
  2. Run nixops deploy
  3. Watch it fail with the following error:
● nextcloud-setup.service
      Loaded: loaded (/nix/store/ma6q9s6bcwcygqn4w907iq4bisivb496-unit-nextcloud-setup.service/nextcloud-setup.service; enabled; vendor preset: enabled)
      Active: failed (Result: exit-code) since Sat 2021-01-02 20:34:03 UTC; 1s ago
     Process: 5464 ExecStart=/nix/store/r135gyhyak3im4zfw7y7pxir4nl82rzl-unit-script-nextcloud-setup-start/bin/nextcloud-setup-start (code=exited, status=134)
    Main PID: 5464 (code=exited, status=134)
          IP: 0B in, 0B out
         CPU: 125ms

Jan 02 20:34:02 apollo systemd[1]: Starting nextcloud-setup.service...
Jan 02 20:34:02 apollo nextcloud-setup-start[5540]: free(): invalid pointer
Jan 02 20:34:03 apollo nextcloud-setup-start[5464]: /nix/store/r135gyhyak3im4zfw7y7pxir4nl82rzl-unit-script-nextcloud-setup-start/bin/nextcloud-setup-start: line 29:  5540 Aborted                 (core dumped) /nix/store/fl8i6y94cr7skkr1fkg54750zn78v5pg-nextcloud-occ/bin/nextcloud-occ maintenance:install --admin-pass "$(<"/run/keys/nextcloud-admin")" --admin-user "admin" --data-dir "/var/lib/nextcloud/data" --database "pgsql" --database-host "/run/postgresql" --database-name "nextcloud" --database-pass "$(<"/run/keys/nextcloud-postgres")" --database-user "nextcloud"
Jan 02 20:34:03 apollo systemd[1]: nextcloud-setup.service: Main process exited, code=exited, status=134/n/a
Jan 02 20:34:03 apollo systemd[1]: nextcloud-setup.service: Failed with result 'exit-code'.
Jan 02 20:34:03 apollo systemd[1]: Failed to start nextcloud-setup.service.
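For readers wondering about status=134 in the log above: by shell convention an exit status above 128 means the child was killed by signal (status - 128), which matches the "Aborted (core dumped)" line. A minimal sketch of that arithmetic, with the value 134 copied from the journal output:

```shell
# Decode an exit status above 128 into the signal that terminated the child.
status=134                 # value taken from the journal output above
sig=$((status - 128))      # 134 - 128 = 6
name=$(kill -l "$sig")     # POSIX kill -l prints the name without the SIG prefix
echo "exit status $status -> signal $sig (SIG$name)"
```

Signal 6 is SIGABRT, which is what glibc raises after printing "free(): invalid pointer".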

Expected behavior
Nextcloud (in current context, pkgs.nextcloud20) should be installed.

Additional context
Don't know if this is relevant, but both my server and my host system are amd64 machines.

Notify maintainers
@schneefux
@bachp
@globin
@fpletz
@Ma27

Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

Host System:

 - system: `"x86_64-linux"`
 - host os: `Linux 5.9.16, NixOS, 20.09.2405.e065200fc90 (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.9`
 - channels(root): `"home-manager-20.09, nixos-20.09.2405.e065200fc90, nixos-hardware, nixos-unstable-21.03pre260232.733e537a8ad, nixpkgs-unstable-21.03pre260775.2080afd0399"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

NixOps Server:

 - system: `"x86_64-linux"`
 - host os: `Linux 5.9.16, NixOS, 20.09.2405.e065200fc90 (Nightingale)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.3.9`
 - channels(root): `"nixos-20.09.1889.58f9c4c7d3a"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@Thunderbottom Thunderbottom added the 0.kind: bug Something is broken label Jan 2, 2021
@Ma27
Member

Ma27 commented Jan 2, 2021

Hmm, can you please give us a backtrace from gdb? (This should be doable by running coredumpctl gdb and then bt.) It would also be helpful to know exactly which command failed (it should be in the coredumpctl output).

@Thunderbottom
Member Author

Sure, here is the coredumpctl output:

           PID: 7969 (.php-wrapped)
           UID: 1003 (nextcloud)
           GID: 996 (nextcloud)
        Signal: 6 (ABRT)
     Timestamp: Sat 2021-01-02 21:03:29 UTC (6min ago)
  Command Line: /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php -f /nix/store/6b27sl1g2lnm7a63njm2najvpkw8s0l3-nextcloud-20.0.4/cron.php
    Executable: /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php
 Control Group: /system.slice/nextcloud-cron.service
          Unit: nextcloud-cron.service
         Slice: system.slice
       Boot ID: 972fcb88ce264866add4c9e0436c1884
    Machine ID: 6df95ec0c8304498b2c7c05e45bb6d68
      Hostname: apollo
       Storage: /var/lib/systemd/coredump/core.\x2ephp-wrapped.1003.972fcb88ce264866add4c9e0436c1884.7969.1609621409000000.lz4
       Message: Process 7969 (.php-wrapped) of user 1003 dumped core.

and here's the gdb backtrace:

Reading symbols from /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php...
(No debugging symbols found in /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php)
[New LWP 7969]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libthread_db.so.1".
Core was generated by `/nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f25574eb08a in raise () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
warning: File "/nix/store/hxs99j1kx878pxxw5lbdarml69r5f1qb-gcc-9.3.0-lib/lib/libstdc++.so.6.0.28-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/nix/store/vran8acwir59772hj4vscr7zribvp7l5-gcc-9.3.0-lib".
To enable execution of this file add
	add-auto-load-safe-path /nix/store/hxs99j1kx878pxxw5lbdarml69r5f1qb-gcc-9.3.0-lib/lib/libstdc++.so.6.0.28-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) bt
#0  0x00007f25574eb08a in raise () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#1  0x00007f25574d5528 in abort () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#2  0x00007f255752c8a8 in __libc_message () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#3  0x00007f2557533a0a in malloc_printerr () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#4  0x00007f255753536c in _int_free () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#5  0x00007b2542c2195f in numa_init () from /nix/store/dgb0w5fsdym9k2hazvnbhsknrbmbi8a2-numactl-2.0.13/lib/libnuma.so.1
#6  0x00007f25583f28fa in call_init.part ()
   from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#7  0x00007f25583f2a36 in _dl_init () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#8  0x00007f25575e1bac in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#9  0x00007f25583f6db4 in dl_open_worker ()
   from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#10 0x00007f25575e1b65 in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#11 0x00007f25583f620a in _dl_open () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#12 0x00007f255795e246 in dlopen_doit () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#13 0x00007f25575e1b65 in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#14 0x00007f25575e1bff in _dl_catch_error () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#15 0x00007f255795e8f5 in _dlerror_run () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#16 0x00007f255795e2c6 in dlopen@@GLIBC_2.2.5 () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#17 0x000000000070b484 in php_load_shlib ()
#18 0x000000000070b554 in php_load_extension ()
#19 0x00000000007cd3de in zend_llist_apply ()
#20 0x000000000077b24a in php_ini_register_extensions ()
#21 0x0000000000773989 in php_module_startup ()
#22 0x000000000086926d in php_cli_startup ()
#23 0x000000000063635a in main ()

@Thunderbottom
Member Author

Apologies, that trace seems to be from a cron job that apparently ran right after the maintenance:install execution. Here's the trace relevant to the issue:

coredumpctl output:

           PID: 8984 (.php-wrapped)
           UID: 1003 (nextcloud)
           GID: 996 (nextcloud)
        Signal: 6 (ABRT)
     Timestamp: Sat 2021-01-02 21:14:49 UTC (14s ago)
  Command Line: /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php occ maintenance:install --admin-pass pass --admin-user admin --data-dir /data/nextcloud/data --database pgsql --database-host /run/postgresql --database-name nextcloud --database-pass pass --database-user nextcloud
    Executable: /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php
 Control Group: /user.slice/user-0.slice/session-26.scope
          Unit: session-26.scope
         Slice: user-0.slice
       Session: 26
     Owner UID: 0 (root)
       Boot ID: 972fcb88ce264866add4c9e0436c1884
    Machine ID: 6df95ec0c8304498b2c7c05e45bb6d68
      Hostname: apollo
       Storage: /var/lib/systemd/coredump/core.\x2ephp-wrapped.1003.972fcb88ce264866add4c9e0436c1884.8984.1609622089000000.lz4
       Message: Process 8984 (.php-wrapped) of user 1003 dumped core.

gdb backtrace:

Reading symbols from /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php...
(No debugging symbols found in /nix/store/i706qi3j3r0rh5dp8qhwdg5qmjvsmj0g-php-7.4.12/bin/php)
[New LWP 8984]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libthread_db.so.1".
Core was generated by `/nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fa17a34508a in raise () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
warning: File "/nix/store/hxs99j1kx878pxxw5lbdarml69r5f1qb-gcc-9.3.0-lib/lib/libstdc++.so.6.0.28-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/nix/store/vran8acwir59772hj4vscr7zribvp7l5-gcc-9.3.0-lib".
To enable execution of this file add
	add-auto-load-safe-path /nix/store/hxs99j1kx878pxxw5lbdarml69r5f1qb-gcc-9.3.0-lib/lib/libstdc++.so.6.0.28-gdb.py
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
(gdb) bt
#0  0x00007fa17a34508a in raise () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#1  0x00007fa17a32f528 in abort () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#2  0x00007fa17a3868a8 in __libc_message () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#3  0x00007fa17a38da0a in malloc_printerr () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#4  0x00007fa17a38f36c in _int_free () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#5  0x00007ba165a7a95f in numa_init () from /nix/store/dgb0w5fsdym9k2hazvnbhsknrbmbi8a2-numactl-2.0.13/lib/libnuma.so.1
#6  0x00007fa17b24c8fa in call_init.part ()
   from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#7  0x00007fa17b24ca36 in _dl_init () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#8  0x00007fa17a43bbac in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#9  0x00007fa17b250db4 in dl_open_worker ()
   from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#10 0x00007fa17a43bb65 in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#11 0x00007fa17b25020a in _dl_open () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/ld-linux-x86-64.so.2
#12 0x00007fa17a7b8246 in dlopen_doit () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#13 0x00007fa17a43bb65 in _dl_catch_exception () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#14 0x00007fa17a43bbff in _dl_catch_error () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libc.so.6
#15 0x00007fa17a7b88f5 in _dlerror_run () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#16 0x00007fa17a7b82c6 in dlopen@@GLIBC_2.2.5 () from /nix/store/33idnvrkvfgd5lsx2pwgwwi955adl6sk-glibc-2.31/lib/libdl.so.2
#17 0x000000000070b484 in php_load_shlib ()
#18 0x000000000070b554 in php_load_extension ()
#19 0x00000000007cd3de in zend_llist_apply ()
#20 0x000000000077b24a in php_ini_register_extensions ()
#21 0x0000000000773989 in php_module_startup ()
#22 0x000000000086926d in php_cli_startup ()
#23 0x000000000063635a in main ()

@Ma27
Member

Ma27 commented Jan 2, 2021

Gotta sleep now, will think about this tomorrow.
Until then, does anybody else from @NixOS/php have an idea? :)

@aanderse
Member

aanderse commented Jan 3, 2021

Maybe something was written to the nextcloud log, which I assume would be written somewhere under /var/lib/nextcloud. @Thunderbottom can you please take a look and see?

@Thunderbottom
Member Author

I cannot seem to find any such logs for nextcloud. Is there anything else I can help with?

@talyz
Contributor

talyz commented Jan 3, 2021

Looks like it fails to load an extension and at some point, in glibc or numactl, tries to free memory that isn't freeable. It never seems to reach any actual PHP code execution.

Are you able to run php at all on the machine?

@Thunderbottom
Member Author

Thunderbottom commented Jan 3, 2021

It seems like php isn't installed system-wide. I could run it in a nix-shell, though, so I am pretty sure that it is working alright.

EDIT: It doesn't work, tried both php and php73:

[nix-shell:~]# php -v
PHP 7.3.24 (cli) (built: Oct 27 2020 11:02:08) ( ZTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.24, Copyright (c) 1999-2018, by Zend Technologies
free(): invalid pointer
Aborted (core dumped)

EDIT 2: Here's the coredumpctl and the gdb backtrace for php -v:

           PID: 8888 (.php-wrapped)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 6 (ABRT)
     Timestamp: Sun 2021-01-03 10:57:45 UTC (3s ago)
  Command Line: /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24/bin/php -v
    Executable: /nix/store/ad1gcf2lf2swdhlkh21jqjsg3kdnx8v8-php-7.3.24/bin/php
 Control Group: /user.slice/user-0.slice/session-33.scope
          Unit: session-33.scope
         Slice: user-0.slice
       Session: 33
     Owner UID: 0 (root)
       Boot ID: 972fcb88ce264866add4c9e0436c1884
    Machine ID: 6df95ec0c8304498b2c7c05e45bb6d68
      Hostname: apollo
       Storage: /var/lib/systemd/coredump/core.\x2ephp-wrapped.0.972fcb88ce264866add4c9e0436c1884.8888.1609671465000000.lz4
       Message: Process 8888 (.php-wrapped) of user 0 dumped core.

gdb trace:

(gdb) bt
#0  0x00007ff55436408a in raise () from /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
#1  0x00007ff55434e528 in abort () from /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
#2  0x00007ff5543a58a8 in __libc_message () from /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
#3  0x00007ff5543aca0a in malloc_printerr () from /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
#4  0x00007ff5543ae36c in _int_free () from /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31/lib/libc.so.6
#5  0x00000000005f229d in zend_hash_destroy ()
#6  0x00007bf54378138f in zm_shutdown_mysqli ()
   from /nix/store/86zavhr7anx1711snwfgjz8sfdy6diiv-php-mysqli-7.3.24/lib/php/extensions/mysqli.so
#7  0x00000000005e7db9 in module_destructor ()
#8  0x00000000005e102c in module_destructor_zval ()
#9  0x00000000005f2d4a in zend_hash_graceful_reverse_destroy ()
#10 0x00000000005e20af in zend_shutdown ()
#11 0x0000000000578662 in php_module_shutdown ()
#12 0x0000000000456f44 in main ()

@talyz
Contributor

talyz commented Jan 3, 2021

When I try any of these versions (the same nixpkgs commits referenced in your original post) on my machine they work just fine. I don't think this is a nixpkgs issue, at least not related to php or nextcloud. Rather, it looks like something is off with either the installation or your machine, since it's crashing the same way, but with a different build and in a different place.

@Ma27
Member

Ma27 commented Jan 3, 2021

Looks like it fails to load an extension and at some point, in glibc or numactl, tries to free memory that isn't freeable. It never seems to reach any actual PHP code execution.

@talyz AFAIU php_load_shlib() does a dlopen from the active libc (glibc in our case), which seems to break. I don't understand why though.

@Thunderbottom so, this is a fairly weird problem we haven't encountered yet, so I'd ask for some patience until we've figured out the problem itself. To investigate the issue, I have some follow-up questions:

  • Is the config you've pasted above everything that's needed to reproduce the issue? In other words, you don't have any other overlays (e.g. in ~/.config/nixpkgs)? Unfortunately I cannot reproduce this.
  • I just fetched the same store-path from the rev you pasted above for php 7.3 and I cannot reproduce the issue. To make sure that we have the exact same thing running, can you please paste the hashes of the store-paths? Mine look like this:
    $ nix-store -q --hash /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24
    sha256:0kzkziw9xca7a03libnai2m766y167bh61s6dmmassbjyvx533m0
    $ nix-store -q --hash $(dirname $(dirname $(readlink -f /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24/bin/.php-wrapped)))
    sha256:04hc8xmn0b267a3h062dg4w9af3fx488644ndx2jbvr8g2zn221k
    
  • Can you please share a gist with the output of php -i (even if this breaks as well)?
  • Do you have any additional php.ini? (IIRC stuff in e.g. /etc/php.ini or /etc/php.d/* will always be loaded as well.)

@Thunderbottom
Member Author

No worries, and thank you for all the help so far! I'm willing to give as much information as required to help fix this issue.

Just so we are on the same page, I am deploying nextcloud on my server through nixops. To answer your questions:

  1. Yes, that is all that I have in my nextcloud configuration, no overlays.
  2. Both the SHA hashes match with yours:
     # nix-store -q --hash /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24
     sha256:0kzkziw9xca7a03libnai2m766y167bh61s6dmmassbjyvx533m0
    
     # nix-store -q --hash $(dirname $(dirname $(readlink -f /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24/bin/.php-wrapped)))
     sha256:04hc8xmn0b267a3h062dg4w9af3fx488644ndx2jbvr8g2zn221k
    
  3. The output is too long, and it does fail in the end: https://del.dog/urrofobuph.txt
  4. I haven't set up anything that uses php other than nextcloud, and there is no extra configuration other than the one I stated in the OP.

Let me know if there's any other information I could provide. Thanks!

@aanderse
Member

aanderse commented Jan 3, 2021

So it's confusing but we can create many php executables on NixOS. We specifically want you to run /nix/store/6g0pahbisdd1jg26y5ixm3ywc04qrx4w-php-with-extensions-7.3.24/bin/php, not php from a nix-shell. I believe if you run this php with a -i or even without arguments we'll have a better idea.

Thanks!

@talyz
Contributor

talyz commented Jan 3, 2021

@Ma27 Yeah, that's as much as I gathered as well. Looking at the glibc code, it seems to get quite far along in the loading process before crashing. It also seems to crash at a different place when php -v is run, so php might be a red herring.

@aanderse That is the version of php you get in nix-shell with the commit the remote machine is on. It's different from the one used by nextcloud and even links to a different glibc, but crashes the same way nonetheless.

@Thunderbottom Could you try running nix-store --verify-path $(nix-store -qR /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12)? It should check the integrity of the php package used by nextcloud and all its dependencies.

@Thunderbottom
Member Author

@aanderse, sorry for the confusion. I thought it would end up using the same derivation in the nix-shell as well. Here's the output without the nix-shell: https://del.dog/icrunexeri.txt

@talyz the command didn't output anything, so I suppose that passed the integrity check?

@aanderse
Member

aanderse commented Jan 3, 2021

@talyz yeah, sorry, I was confused for a moment and referenced the wrong php. I meant to ask @Thunderbottom to run /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php -m instead. Sorry.

@Ma27
Member

Ma27 commented Jan 3, 2021

the command didn't output anything, so I suppose that passed the integrity check?

I'm not sure, sometimes the nix-* CLI tools can be kinda unintuitive :D You may want to check whether you got 0 as the exit code. If that's the case, we can safely assume that the store path is fine.
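A minimal sketch of the exit-code check mentioned above. The check helper is a hypothetical name introduced here for illustration; the store path is the one quoted earlier in this thread, and the guard simply skips the Nix call on machines that don't have it:

```shell
# Run a command and report its exit status explicitly (hypothetical helper).
check() {
  if "$@"; then
    echo "ok (exit 0): $*"
  else
    echo "FAILED (exit $?): $*"
  fi
}

# Store path copied from the thread; skipped where Nix or the path is absent.
php_path=/nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12
if command -v nix-store >/dev/null 2>&1 && [ -e "$php_path" ]; then
  # Word splitting of the closure list is intentional here.
  check nix-store --verify-path $(nix-store -qR "$php_path")
fi
```

Silence plus exit status 0 would support the "store path is fine" reading; a non-zero status would not.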

A few more ideas regarding the nix-shell case:

  • Try running nix-shell with --pure and see if it's still reproducible. If not, we can assume that something on your system (and not the package itself) is the problem.
  • Maybe try running php with env USE_ZEND_ALLOC=0 php -v. Not sure if that makes any difference in your case, but it's worth a try.

Another idea: is this target server a QEMU VM (or sth. similar)? If that's the case and there isn't anything sensitive, you could export it as qcow2 image and then somebody (maybe me :)) could look at this.

@talyz
Contributor

talyz commented Jan 3, 2021

@Thunderbottom Yep, that's correct. It should output a message for every error it finds, so it should be okay.

@aanderse Ah, I see, that makes sense :)

@Thunderbottom
Member Author

@aanderse /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php -m shows:

# /nix/store/vc7rr3jz8mnvhl1ii6zj1bkwvhcs2jwm-php-with-extensions-7.4.12/bin/php -m
free(): invalid pointer
Aborted (core dumped)

@Ma27 I did run nix-shell with --pure, and it fails the same way. Using env USE_ZEND_ALLOC=0 php -v still fails with the same error.

My server is running on hetzner cloud. So I don't think it would be possible to export the VM as a qcow image :(

@Ma27
Member

Ma27 commented Jan 3, 2021

My server is running on hetzner cloud. So I don't think it would be possible to export the VM as a qcow image :(

Even though those are also virtualized, I didn't find a quick way in their UI for such an export.

For now, a few more things:

  • Please check your full store via nix-store --verify --check-contents.
  • Run strace -fT php -v & strace -fT php -m and share the results.
  • Also, check for interesting things in your dmesg and journald.
  • Probably reboot your server once. I'm beginning to think that something in your fs may be broken.

@Thunderbottom
Member Author

  1. I checked nix-store --verify --check-contents. There were no errors.
  2. Logs: strace -fT php -m, strace -fT php -v
  3. There's nothing funny that I could find in either of those.
  4. I did run a nixops reboot, it ends up the same every time. Hm, I am unsure if it is a filesystem issue...

@talyz
Contributor

talyz commented Jan 3, 2021

We also have to consider that the Hetzner Cloud VM or the NixOS install ISO may be causing the issue. Could you try deploying the same configuration on a VirtualBox VM, or with a different cloud provider, using NixOps? If it's not reproducible anywhere else, I don't think there's much we can do and Hetzner support would have to take over.

@Thunderbottom
Member Author

I could try something later this week :)

@Ma27
Member

Ma27 commented Jan 3, 2021

@talyz I may be blind, but does Hetzner Cloud even have NixOS ISOs? IIRC I had to kexec into a NixOS after booting into an Ubuntu on Hetzner Cloud.

Regarding the note of @talyz: which ISO did you use @Thunderbottom ? If I use the same one, the issue may become reproducible.

At first glance I couldn't find the culprit in the strace output :/

Tbh I don't think we'll get much further by guessing. A few more ideas on how to isolate the issue:

  • Is the issue reproducible on your local machine? AFAICS you have the same channel there, so you should get the same store-path at least.
  • Also, can you reproduce this in a fresh VM? You can build one e.g. with a config like this:
{
  vmname = { pkgs, ... }: {
    # your config here
  };
}

Then build it with nixos-build-vms vm.nix and run it with ./result/bin/nixos-run-vms -K. If it's reproducible, I'd be very happy if you could share the qcow2 image from /tmp/vm-state-vmname.

@Thunderbottom
Member Author

I set up NixOS using nixos-infect, although I don't know if that should make any difference since deploying with nixops means that my host builds everything and then copies the closures to the server. Correct me if I am wrong but I think that means that there's nothing really in terms of configuration that is stored on the server?

I could try setting up a VM locally later this week and see if that works.

@talyz
Contributor

talyz commented Jan 3, 2021

@Ma27 I haven't used Hetzner Cloud myself, so I'm not sure, but reading this it seems like they do.

@Ma27
Member

Ma27 commented Jan 3, 2021

Okay, I just fired up a Hetzner Cloud VM (CX11) and installed NixOS via nixos-infect with the following command:

curl https://raw.githubusercontent.com/elitak/nixos-infect/master/nixos-infect | NIX_CHANNEL=nixos-20.09 bash 2>&1 | tee /tmp/infect.log

The output of nix-info looks like this:

system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.3.9, channels(root): "nixos-20.09.2468.c6b23ba64ae", nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos

However, both PHP 7.4 and 7.3 work fine:

[nix-shell:~]# php -v
PHP 7.4.12 (cli) (built: Oct 27 2020 15:02:01) ( ZTS )
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
    with Zend OPcache v7.4.12, Copyright (c), by Zend Technologies

[nix-shell:~]# php -m
[PHP Modules]
bcmath
calendar
Core
ctype
curl
date
dom
exif
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
imap
intl
json
ldap
libxml
mbstring
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
PDO_ODBC
pdo_pgsql
pdo_sqlite
pgsql
Phar
posix
readline
Reflection
session
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache
[nix-shell:~]# php -v
PHP 7.3.24 (cli) (built: Oct 27 2020 11:02:08) ( ZTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.3.24, Copyright (c) 1998-2018 Zend Technologies
    with Zend OPcache v7.3.24, Copyright (c) 1999-2018, by Zend Technologies

[nix-shell:~]# php -m
[PHP Modules]
bcmath
calendar
Core
ctype
curl
date
dom
exif
fileinfo
filter
ftp
gd
gettext
gmp
hash
iconv
imap
intl
json
ldap
libxml
mbstring
mysqli
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
PDO_ODBC
pdo_pgsql
pdo_sqlite
pgsql
Phar
posix
readline
Reflection
session
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache

So I guess we can rule out that as well :/

@Thunderbottom
Member Author

Thunderbottom commented Jan 3, 2021

I could set up a new instance on hetzner and replicate my current setup there. Maybe it really is a filesystem issue?

Also, did you use the nixos-community script for the setup? I too might set it up that way, then. Nevermind, I missed the first line.

@Ma27
Member

Ma27 commented Jan 3, 2021

That would be a good idea. I'm sorry, but I don't know what else could be the issue atm.

@Thunderbottom
Member Author

I shall update here when I get around to setting it up (tomorrow, maybe). Thank you all for your time and effort! Really appreciate all the help :)

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Jan 3, 2021
@ajs124
Member

ajs124 commented Jan 3, 2021

If I had to guess, I'd blame this on 30123 openat(AT_FDCWD, "/nix/store/assb564j5d9dpamqcy33l15crs07hxka-malloc-provider-scudo/lib/libclang_rt.scudo-x86_64.so", O_RDONLY|O_CLOEXEC) = 3 <0.000073>
Can you try deleting /etc/ld-nix.so.preload and running any of this again?

The relevant configuration option is environment.memoryAllocator.provider.
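A rough sketch of how one might spot this in a saved strace log. The find_malloc_provider function is a hypothetical helper, and the sample line is the one quoted above; as far as I know the preload file is only present when a non-libc allocator is configured, so treat that part as an assumption:

```shell
# Print each distinct malloc-provider store component found in strace output on stdin.
find_malloc_provider() {
  grep -o 'malloc-provider-[A-Za-z0-9_]*' | sort -u
}

# Sample line quoted from this thread.
sample='30123 openat(AT_FDCWD, "/nix/store/assb564j5d9dpamqcy33l15crs07hxka-malloc-provider-scudo/lib/libclang_rt.scudo-x86_64.so", O_RDONLY|O_CLOEXEC) = 3 <0.000073>'
printf '%s\n' "$sample" | find_malloc_provider

# Show what the dynamic loader is told to preload, if the file exists at all.
if [ -r /etc/ld-nix.so.preload ]; then
  cat /etc/ld-nix.so.preload
fi
```

On a real log the same function can be fed with something like strace -f php -v 2>&1 | find_malloc_provider; no output would suggest the stock glibc allocator is in use.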

@Thunderbottom
Member Author

Do you mind telling me what exactly I need to do here? It seems like the default option is libc, whereas the strace shows scudo/lib/libclang_rt.scudo-x86_64.so. Does this mean that the system is using scudo as its memory allocator? If so, do I explicitly set it to libc?

@talyz
Contributor

talyz commented Jan 4, 2021

Oh, good catch, @ajs124!

@Thunderbottom Are you perhaps importing the hardened.nix profile? If so, comment it out and see if that solves the issue. Otherwise, manually set environment.memoryAllocator.provider to libc.
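A minimal sketch of the second option, assuming the profile is imported through the usual <nixpkgs/...> path and that it sets the allocator with default priority, so that a plain assignment wins. If it did not, lib.mkForce "libc" would be needed instead:

```nix
# configuration.nix fragment (sketch, not taken from the reporter's setup)
{ ... }:
{
  imports = [
    # Assumed import path for the hardened profile.
    <nixpkgs/nixos/modules/profiles/hardened.nix>
  ];

  # Use glibc's malloc instead of the allocator chosen by the profile.
  environment.memoryAllocator.provider = "libc";
}
```

This keeps the rest of the profile in place while taking the allocator out of the picture.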

@Thunderbottom
Member Author

@ajs124 amazing! Seems like that was the issue.

@talyz yeah, I did have hardened.nix in my system configuration. I just manually set environment.memoryAllocator.provider = "libc" and everything worked fine!

So does this mean the PHP package needs to check if the memoryAllocator provider is set to scudo and warn about it?

@talyz
Contributor

talyz commented Jan 4, 2021

Okay, that's good. If you run into additional problems, try disabling the whole profile - it changes lots of options that could be detrimental to stability, performance or both.

I don't think any package or service should have to check environment.memoryAllocator.provider - the option already warns in its description that

Selecting an alternative allocator (i.e., anything other than libc) may result in instability, data loss, and/or service failure.

It should therefore be considered highly experimental, as should the hardened.nix profile. It should probably be noted in the manual and source of the profile that it's risky, though.
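For anyone who still wants a reminder in their own configuration, here is a hypothetical local module that emits an evaluation warning. It is only a sketch of a user-side guard under the assumption that services.nextcloud is the affected service; it is not something nixpkgs does, and the message text is made up for illustration:

```nix
# local-allocator-warning.nix (hypothetical file name)
{ config, lib, ... }:
{
  # Warn at evaluation time when Nextcloud runs with a non-default allocator.
  warnings = lib.optional
    (config.services.nextcloud.enable
      && config.environment.memoryAllocator.provider != "libc")
    "services.nextcloud is enabled while environment.memoryAllocator.provider is \"${config.environment.memoryAllocator.provider}\"; PHP aborted with 'free(): invalid pointer' under scudo in NixOS/nixpkgs#108262.";
}
```

The warning shows up during nixos-rebuild or nixops deploy without blocking the deployment.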

@Thunderbottom
Member Author

So far so good! Everything seems to be working as intended.

It should probably be noted in the manual and source of the profile that it's risky, though.

I think this would be good to note in the manual, yeah.

Thank you everyone for your time and effort to help solve this issue. I really appreciate it!

talyz added a commit to talyz/nixpkgs that referenced this issue Jan 4, 2021
Enabling the profile can lead to hard-to-debug issues, which should be
warned about in addition to the cost in features and performance.

See NixOS#108262 for an example.
talyz added a commit to talyz/nixpkgs that referenced this issue Jan 4, 2021
Enabling the profile can lead to hard-to-debug issues, which should be
warned about in addition to the cost in features and performance.

See NixOS#108262 for an example.

(cherry picked from commit 0f0d5c0)