Added shared memory multi-server feature #917

Open
wants to merge 1 commit into base: main

Conversation

jkuro-tii
Contributor

@jkuro-tii jkuro-tii commented Nov 28, 2024

Description of changes

The memsocket application has been enhanced with the following new features and updates:

Host Support:
    The application can now run on the host in addition to virtual machines (VMs).

Multi-Server Capability:
    Multiple memsocket instances operating in server mode are now supported.
    Each server instance is configured with a list of specific client IDs it serves.
    Multiple servers can run simultaneously on a single VM or the host.

Client ID Exclusivity:
    Each client ID must be managed by only one server across the entire system.

Configuration Updates:
    Renamed some Nix configuration options for clarity.

Option Changes:
    The -c and -s options have been swapped:
        -c: Now used for running in server mode.
        -s: Now used for running in client mode.
    A new -l option has been introduced for server mode to specify the list of client IDs to be serviced.
    A new -h option has been introduced to indicate that memsocket runs on the host.

Introduced a Nix framework for the automatic generation of configurations for servers (currently supporting audio and video) and their clients.
Enabled the transfer of audio and video data via shared memory from application VMs to designated servers, including gui-vm and audio-vm.
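The client ID exclusivity rule above (each client ID handled by exactly one server) lends itself to a mechanical check before any servers are started. The following is only a sketch of that idea, not the PR's Nix framework: the server names, the mapping layout, and the function name are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def find_conflicting_client_ids(servers: dict[str, list[int]]) -> dict[int, list[str]]:
    """Return the client IDs that are claimed by more than one server.

    `servers` maps a (hypothetical) server name to the list of client IDs
    it would be asked to serve, e.g. the list given with the -l option.
    """
    owners: dict[int, list[str]] = defaultdict(list)
    for server, client_ids in servers.items():
        for client_id in set(client_ids):  # ignore repeats within one list
            owners[client_id].append(server)
    return {cid: sorted(names) for cid, names in owners.items() if len(names) > 1}


# Hypothetical layout: two servers with disjoint client ID lists.
valid = {"gui": [1, 2, 3], "audio": [4, 5]}
# Hypothetical misconfiguration: client ID 3 is assigned to both servers.
invalid = {"gui": [1, 2, 3], "audio": [3, 4]}

print(find_conflicting_client_ids(valid))    # {}
print(find_conflicting_client_ids(invalid))  # {3: ['audio', 'gui']}
```

An empty result means the lists are disjoint; any entry names the servers that would compete for the same client ID.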

Documentation: https://confluence.tii.ae/x/uEPOAg

Checklist for things done

  • [X] Summary of the proposed changes in the PR description
  • [X] More detailed description in the commit message(s)
  • [X] Commits are squashed into relevant entities - avoid a lot of minimal dev time commits in the PR
  • Contribution guidelines followed
  • Ghaf documentation updated with the commit - https://tiiuae.github.io/ghaf/
  • PR linked to architecture documentation and requirement(s) (ticket id)
  • Test procedure described (or includes tests). Select one or more:
    • Tested on Lenovo X1 x86_64
    • Tested on Jetson Orin NX or AGX aarch64
    • Tested on Polarfire riscv64
  • Author has run make-checks and it passes
  • All automatic Github Action checks pass - see actions
  • Author has added reviewers and removed PR draft status
  • Change requires full re-installation
  • Change can be updated with nixos-rebuild ... switch

Instructions for Testing

  • List all targets that this applies to:
  • Is this a new feature
    • List the test steps to verify:
  • If it is an improvement how does it impact existing functionality?

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow November 28, 2024 07:53 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii marked this pull request as ready for review November 28, 2024 09:00
@josa41
Contributor

josa41 commented Nov 28, 2024

Nothing too obvious to comment in the code.
I did test this also with pulseaudio and got everything to work just fine.
During my testing I found two issues.

  1. If the application socket (pulseaudio in my tests) is not available when memsocket starts, data does not reach the socket. Restarting memsocket, or delaying its start until after the application, works around this.
  2. Memsocket/kernel crashes when a second memsocket server is started from the command line on the same VM with another ID.

I passed more details to Jarek on Slack.
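For the first issue, one alternative to a fixed start delay is to wait until the application socket actually accepts connections before launching memsocket. The sketch below illustrates that approach under stated assumptions: the function name, timeout, and polling interval are hypothetical, and it is not part of memsocket or this PR. Note that the probe opens and immediately closes one connection to the application.

```python
import os
import socket
import time


def wait_for_unix_socket(path: str, timeout: float = 10.0, interval: float = 0.1) -> bool:
    """Poll until a Unix stream socket at `path` accepts a connection.

    Returns True once a probe connection succeeds, False if `timeout`
    seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            try:
                probe.connect(path)
                return True
            except OSError:
                pass  # the file exists but nothing is listening yet
            finally:
                probe.close()
        time.sleep(interval)
    return False
```

A wrapper could call this with the application's socket path and start memsocket only when it returns True.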

@mbssrc
Collaborator

mbssrc commented Dec 4, 2024

Any idea how this can or should be tested?

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 7, 2025 10:24 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 7, 2025 11:51 — with GitHub Actions Inactive
@josa41
Contributor

josa41 commented Jan 9, 2025

Tested briefly and audio works through shared mem.
Had some trouble with permissions but that is not related to this change.

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 9, 2025 12:02 — with GitHub Actions Inactive
@jkuro-tii
Contributor Author

Currently, audio and GUI data are sent using shared memory. This is controlled by the enabled option - e.g.: source code
Testing should include running multiple services simultaneously, such as Google Chrome and other app VMs, and then stopping them.

The testing process should focus on detecting potential file descriptor or socket leaks by stopping and restarting these services repeatedly.
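A leak check of this kind can be scripted by sampling a process's descriptor count after each stop/restart cycle. The following is a minimal sketch, assuming a Linux /proc filesystem; the function names and the growth heuristic are illustrative assumptions, not the test script used for this PR, and the sample values are made up.

```python
import os


def count_open_fds(pid: int) -> int:
    """Count the open file descriptors of a process via /proc (Linux only).

    Reading another user's process usually requires root. For the current
    process the result includes the descriptor opened to list the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def looks_like_leak(samples: list[int], tolerance: int = 0) -> bool:
    """Heuristic: counts never drop between cycles and grow by more than `tolerance`."""
    if len(samples) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_drops and samples[-1] - samples[0] > tolerance


# Made-up samples, one per stop/restart cycle (not measurements from this PR).
print(looks_like_leak([16, 18, 21, 25]))      # True: steadily growing
print(looks_like_leak([16, 21, 16, 20, 16]))  # False: returns to the baseline

if os.path.isdir("/proc/self/fd"):
    print(count_open_fds(os.getpid()))
```

Counts that rise while a service runs and fall back after it stops are expected; the heuristic only flags counts that keep climbing across cycles.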

Additional tests can be performed using any Unix sockets-enabled app, e.g., netcat.

Keep in mind that this requires allocating additional shared memory slots (the shmSlots option). Already assigned slots can be reused, but only after shutting down the server and client that use them.

VM1: server

nc -lU socket_path &
memsocket -s socket_path -l xxx # xxx - allocated slot number

VM2: client

memsocket -c socket_path xxx & # xxx - allocated slot number
nc -U socket_path

@leivos-unikie
Contributor

leivos-unikie commented Jan 13, 2025

Tested on Lenovo-X1

  • ci-test-automation results ok, performance ok

  • logging in and out
    number of total open file descriptors varies between
    audio-vm: 16
    gui-vm: 0 (logged out) / 35 (logged in)

  • launching gui-vm apps
    number of total open file descriptors varies between
    chrome-vm: 16
    audio-vm: 16
    gui-vm: 35 (increasing to 40 over time)

  • launching chrome-vm and youtube video
    number of total open file descriptors varies between
    chrome-vm: 16 - 21
    audio-vm: 16 - 20
    gui-vm: 35 - 38

  • Launching all apps
    number of total open file descriptors varies between
    business-vm: 16 - 23
    chrome-vm: 16 - 18
    audio-vm: 16 - 19
    gui-vm: 35 - 53

Detected problems:

  • VM apps (running outside gui-vm) don't launch after logging out and logging back in. While studying the issue, I noticed at some point that there were a lot of waypipe* files in gui-vm /tmp
[ghaf@gui-vm:/tmp]$ ls -1 | wc -l
3961
  • extract from repeating errors in gui-vm journalctl
[ghaf@gui-vm:/tmp]$ journalctl -n 100
Jan 13 14:07:41 gui-vm sudo[34403]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34403) opened.
Jan 13 14:07:41 gui-vm sudo[34403]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33964/fd
Jan 13 14:07:41 gui-vm sudo[34403]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:41 gui-vm sudo[34403]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:41 gui-vm sudo[34409]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34409) opened.
Jan 13 14:07:41 gui-vm sudo[34409]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33965/fd
Jan 13 14:07:41 gui-vm sudo[34409]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:41 gui-vm sudo[34409]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:41 gui-vm sudo[34415]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34415) opened.
Jan 13 14:07:41 gui-vm sudo[34415]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33969/fd
Jan 13 14:07:41 gui-vm sudo[34415]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:41 gui-vm sudo[34415]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:41 gui-vm sudo[34421]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34421) opened.
Jan 13 14:07:41 gui-vm sudo[34421]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33970/fd
Jan 13 14:07:41 gui-vm sudo[34421]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:41 gui-vm sudo[34421]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:41 gui-vm sudo[34427]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34427) opened.
Jan 13 14:07:41 gui-vm sudo[34427]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33972/fd
Jan 13 14:07:41 gui-vm sudo[34427]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:41 gui-vm sudo[34427]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:41 gui-vm systemd[33806]: waypipe-chrome.service: Scheduled restart job, restart counter is at 11.
Jan 13 14:07:41 gui-vm systemd[33806]: waypipe-zathura.service: Scheduled restart job, restart counter is at 11.
Jan 13 14:07:41 gui-vm systemd[33806]: waypipe-gala.service: Scheduled restart job, restart counter is at 11.
Jan 13 14:07:41 gui-vm systemd[33806]: waypipe-comms.service: Scheduled restart job, restart counter is at 11.
Jan 13 14:07:41 gui-vm systemd[33806]: waypipe-business.service: Scheduled restart job, restart counter is at 11.
Jan 13 14:07:41 gui-vm systemd[33806]: Started Waypipe for business.
Jan 13 14:07:41 gui-vm systemd[33806]: Started Waypipe for chrome.
Jan 13 14:07:41 gui-vm systemd[33806]: Started Waypipe for comms.
Jan 13 14:07:41 gui-vm systemd[33806]: Started Waypipe for gala.
Jan 13 14:07:41 gui-vm systemd[33806]: Started Waypipe for zathura.
Jan 13 14:07:42 gui-vm waypipe[34433]: C34433: 62.151425 [src/util.c:250] Error binding socket at gui-chrome-vm.sock: Address already in use
Jan 13 14:07:42 gui-vm waypipe[34432]: C34432: 62.151483 [src/util.c:250] Error binding socket at gui-business-vm.sock: Address already in use
Jan 13 14:07:42 gui-vm waypipe[34434]: C34434: 62.151688 [src/util.c:250] Error binding socket at gui-comms-vm.sock: Address already in use
Jan 13 14:07:42 gui-vm waypipe[34436]: C34436: 62.152567 [src/util.c:250] Error binding socket at gui-zathura-vm.sock: Address already in use
Jan 13 14:07:42 gui-vm waypipe[34435]: C34435: 62.152832 [src/util.c:250] Error binding socket at gui-gala-vm.sock: Address already in use
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-chrome.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-chrome.service: Failed with result 'exit-code'.
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-comms.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-comms.service: Failed with result 'exit-code'.
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-gala.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-gala.service: Failed with result 'exit-code'.
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-business.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-business.service: Failed with result 'exit-code'.
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-zathura.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:42 gui-vm systemd[33806]: waypipe-zathura.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-chrome.service: Scheduled restart job, restart counter is at 12.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-business.service: Scheduled restart job, restart counter is at 12.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-zathura.service: Scheduled restart job, restart counter is at 12.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-gala.service: Scheduled restart job, restart counter is at 12.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-comms.service: Scheduled restart job, restart counter is at 12.
Jan 13 14:07:43 gui-vm systemd[33806]: Started Waypipe for business.
Jan 13 14:07:43 gui-vm systemd[33806]: Started Waypipe for chrome.
Jan 13 14:07:43 gui-vm systemd[33806]: Started Waypipe for comms.
Jan 13 14:07:43 gui-vm systemd[33806]: Started Waypipe for gala.
Jan 13 14:07:43 gui-vm systemd[33806]: Started Waypipe for zathura.
Jan 13 14:07:43 gui-vm waypipe[34452]: C34452: 63.422502 [src/util.c:250] Error binding socket at gui-business-vm.sock: Address already in use
Jan 13 14:07:43 gui-vm waypipe[34456]: C34456: 63.422697 [src/util.c:250] Error binding socket at gui-zathura-vm.sock: Address already in use
Jan 13 14:07:43 gui-vm waypipe[34455]: C34455: 63.423070 [src/util.c:250] Error binding socket at gui-gala-vm.sock: Address already in use
Jan 13 14:07:43 gui-vm waypipe[34454]: C34454: 63.422976 [src/util.c:250] Error binding socket at gui-comms-vm.sock: Address already in use
Jan 13 14:07:43 gui-vm waypipe[34453]: C34453: 63.422886 [src/util.c:250] Error binding socket at gui-chrome-vm.sock: Address already in use
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-business.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-business.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-chrome.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-chrome.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-gala.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-gala.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-comms.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-comms.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-zathura.service: Main process exited, code=exited, status=1/FAILURE
Jan 13 14:07:43 gui-vm systemd[33806]: waypipe-zathura.service: Failed with result 'exit-code'.
Jan 13 14:07:43 gui-vm sudo[34466]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34466) opened.
Jan 13 14:07:43 gui-vm sudo[34466]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33964/fd
Jan 13 14:07:43 gui-vm sudo[34466]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:43 gui-vm sudo[34466]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:43 gui-vm sudo[34472]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34472) opened.
Jan 13 14:07:43 gui-vm sudo[34472]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33965/fd
Jan 13 14:07:43 gui-vm sudo[34472]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:43 gui-vm sudo[34472]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:43 gui-vm sudo[34479]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34479) opened.
Jan 13 14:07:43 gui-vm sudo[34479]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33969/fd
Jan 13 14:07:43 gui-vm sudo[34479]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:43 gui-vm sudo[34479]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:44 gui-vm sudo[34485]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34485) opened.
Jan 13 14:07:44 gui-vm sudo[34485]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33970/fd
Jan 13 14:07:44 gui-vm sudo[34485]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:44 gui-vm sudo[34485]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:44 gui-vm sudo[34492]: pam_systemd_home(sudo:account): New sd-bus connection (system-bus-pam-systemd-home-34492) opened.
Jan 13 14:07:44 gui-vm sudo[34492]:     ghaf : TTY=pts/0 ; PWD=/tmp ; USER=root ; COMMAND=/run/current-system/sw/bin/ls /proc/33972/fd
Jan 13 14:07:44 gui-vm sudo[34492]: pam_unix(sudo:session): session opened for user root(uid=0) by ghaf(uid=1001)
Jan 13 14:07:44 gui-vm sudo[34492]: pam_unix(sudo:session): session closed for user root
Jan 13 14:07:44 gui-vm systemd[33806]: waypipe-business.service: Scheduled restart job, restart counter is at 13.
Jan 13 14:07:44 gui-vm systemd[33806]: waypipe-comms.service: Scheduled restart job, restart counter is at 13.
Jan 13 14:07:44 gui-vm systemd[33806]: waypipe-zathura.service: Scheduled restart job, restart counter is at 13.
Jan 13 14:07:44 gui-vm systemd[33806]: waypipe-gala.service: Scheduled restart job, restart counter is at 13.
Jan 13 14:07:44 gui-vm systemd[33806]: waypipe-chrome.service: Scheduled restart job, restart counter is at 13.
Jan 13 14:07:44 gui-vm systemd[33806]: Started Waypipe for business.
Jan 13 14:07:44 gui-vm systemd[33806]: Started Waypipe for chrome.
Jan 13 14:07:44 gui-vm systemd[33806]: Started Waypipe for comms.
Jan 13 14:07:44 gui-vm systemd[33806]: Started Waypipe for gala.
Jan 13 14:07:44 gui-vm systemd[33806]: Started Waypipe for zathura.

@leivos-unikie leivos-unikie added the "bug on Lenovo X1 Carbon" label (Issues found on Lenovo X1 Carbon while checking this PR) Jan 13, 2025
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 14, 2025 09:11 — with GitHub Actions Inactive
@jkuro-tii
Contributor Author

Updated the systemd kill signal for Waypipe: it now uses SIGINT. Waypipe properly removes its sockets after receiving the SIGINT signal, whereas it previously received SIGTERM at logout.
The issue with the waypipe* files in /tmp persisting after exiting Waypipe on the gui-vm is unrelated to this PR. This issue also occurs on the main branch.
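The "Address already in use" errors in the journal follow from general Unix socket behavior: closing a bound socket does not remove its file, and a later bind to the same path fails until the file is unlinked. The sketch below demonstrates only that behavior; it is not Waypipe's code, and the socket path and helper name are hypothetical.

```python
import errno
import os
import socket
import tempfile


def bind_unix_socket(path: str) -> socket.socket:
    """Bind a Unix stream socket at `path`, closing it again if bind fails."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.sock")  # hypothetical socket path

    first = bind_unix_socket(path)
    first.close()  # closing does NOT remove the socket file
    stale_file_left = os.path.exists(path)

    try:
        bind_unix_socket(path)
        second_bind_error = None
    except OSError as exc:
        second_bind_error = exc.errno  # bind refuses the existing path

    os.unlink(path)  # the cleanup a graceful shutdown would perform
    third = bind_unix_socket(path)  # succeeds once the stale file is gone
    third.close()
    os.unlink(path)

print(stale_file_left, second_bind_error == errno.EADDRINUSE)
```

This is why a shutdown path that lets the process remove its sockets avoids the restart loop seen in the log.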

@leivos-unikie
Contributor

Tested again on Lenovo-X1

The bug is fixed.

bat tests pass

Monitored all VMs with the test script while starting and closing all apps. The number of file descriptors changes as observed before. Nothing suspicious there.

@leivos-unikie leivos-unikie added the "Tested on Lenovo X1 Carbon" label (This PR has been tested on Lenovo X1 Carbon) and removed the "bug on Lenovo X1 Carbon" label (Issues found on Lenovo X1 Carbon while checking this PR) Jan 14, 2025
@brianmcgillion brianmcgillion added the "Needs Testing" (CI Team to pre-verify), "bug on Orin AGX" (Issues found on NVIDIA Jetson AGX Orin while checking this PR), and "bug on Orin NX" (Issues found on NVIDIA Jetson NX Orin while checking this PR) labels Jan 14, 2025
@jkuro-tii jkuro-tii requested a review from josa41 January 15, 2025 05:24
@johannarautanen

Checked with native Orin AGX and NX

  • build can be done
  • Ghaf booted up
  • applications launched
  • automation test cases were ok

Notes:

  • so far the build size flashed to the memory stick has been about 7-8 GB; with this PR the size is 11-12 GB

@johannarautanen

Checked with cross-compiled Orin AGX and NX

Issues:

  • cannot create the build for AGX or NX

@leivos-unikie
Contributor

But after another reboot the audio devices are back.

@leivos-unikie leivos-unikie added the "Tested on Lenovo X1 Carbon" label (This PR has been tested on Lenovo X1 Carbon) and removed the "bug on Lenovo X1 Carbon" label (Issues found on Lenovo X1 Carbon while checking this PR) Jan 23, 2025
@jkuro-tii
Contributor Author

In the last push I turned off sending audio through shared memory, so the audio issues are unlikely to be related to this PR.

@leivos-unikie
Contributor

In the last push I turned off sending audio through shared memory, so the audio issues are unlikely to be related to this PR.

Yes, I agree.

Thinking back on what happened, I believe that connecting to a home Wi-Fi network that uses the same IP address space as the Ghaf virtual network might have caused the crash and also the temporary loss of audio.

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 24, 2025 11:43 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii requested review from vadika and josa41 January 24, 2025 11:47
@jkuro-tii
Contributor Author

Created an issue regarding the problem with cross-building for the ARM64 platform.

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 27, 2025 12:16 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 27, 2025 12:27 — with GitHub Actions Inactive
@jkuro-tii
Contributor Author

The issue with cross-compilation has been solved: disko-ab-partitions.nix was re-synced to main.

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 28, 2025 05:43 — with GitHub Actions Inactive
@jkuro-tii jkuro-tii requested review from vadika and josa41 January 28, 2025 06:45
@avnik
Contributor

avnik commented Jan 28, 2025

@jkuro-tii feel free to cherry-pick the fix from 89955dc

@jkuro-tii jkuro-tii temporarily deployed to internal-build-workflow January 30, 2025 04:59 — with GitHub Actions Inactive
Labels
  • bug on Orin AGX Cross: Issues found on NVIDIA Jetson AGX Orin cross-compiled while checking this PR
  • bug on Orin AGX: Issues found on NVIDIA Jetson AGX Orin while checking this PR
  • bug on Orin NX Cross: Issues found on NVIDIA Jetson NX Orin cross-compiled while checking this PR
  • bug on Orin NX: Issues found on NVIDIA Jetson NX Orin while checking this PR
  • Tested on Lenovo X1 Carbon: This PR has been tested on Lenovo X1 Carbon