Crash with mdadm raid present in system #70

Closed
doctor64 opened this issue Oct 1, 2022 · 2 comments · Fixed by #252

Comments

@doctor64

doctor64 commented Oct 1, 2022

Describe the bug
Starting Timeshift (GUI or command line) with a RAID device md0 present in the system results in an application crash.
Stopping the RAID device with sudo mdadm --stop /dev/md0 allows the program to run successfully.

To Reproduce
Steps to reproduce the behavior:

  1. Have an md0 device (RAID5 in my case) in the system
  2. Start Timeshift: sudo timeshift --list
  3. The program crashes

Expected behavior
The program starts and works as expected.

Screenshots
error message
sudo timeshift --list
[Warning] Deleted invalid lock
**
ERROR:arraylist.c:910:gee_array_list_real_remove_at: assertion failed: (index >= 0)
Bail out! ERROR:arraylist.c:910:gee_array_list_real_remove_at: assertion failed: (index >= 0)
Aborted

System:

  • Linux Distribution Name and Version: Manjaro 22.0.0
  • Desktop: XFCE
  • Application Version: Timeshift v22.06.5

@KeithB
Contributor

KeithB commented Feb 15, 2023

The issue is specific to RAID5 setups rather than RAID in general, though the code makes no attempt to treat other RAID levels the same way, so they probably don't work properly either.

The code that looks for devices tries to be clever with RAID5 (not sure why RAID5 specifically, but it uses the original behaviour for RAID1, RAID6 etc.) and fails horribly. It's far too dependent on the sequence of the lsblk output and not particularly clever in how it uses that output to identify block devices. Basically, once you have RAID5 installed it's a throw of the dice whether this works with your system.

Here are two comparisons of how the system setup influences (i.e. breaks or doesn't break) timeshift, both using the CLI with --list-devices as the test command:
System 1

  • LM21 install with Timeshift 22.11.2 and lsblk 2.37.2
  • Disk setup is RAID5 arrays overlaid with LVM offering a single VG.
  • Test timeshift command crashes unless the VGs are disabled (RAID arrays still running).
  • With the VGs disabled the output is incorrect, with arrays missing. I certainly wouldn't trust it.

System 2

  • LM20 install with Timeshift 22.06.05 and lsblk 2.34
  • Disk setup is RAID6 arrays, again overlaid with LVM (this time offering 3 VGs).
  • Test timeshift command works, with no VGs or RAID arrays disabled, and the output appears sane (at first glance - this is a more complex setup).

Two things picked up so far:

  1. The lsblk output is ordered differently on the two systems when running the same lsblk command that timeshift uses.
  2. The timeshift code (specifically the RAID5 checks in device.vala) is entirely dependent on the ordering of the lsblk output - once it finds a RAID device it assumes additional (duplicate) entries will be in a certain proximity and just removes them.

Given that it's looking for parent/child relationships (and goes to some effort to build that knowledge from a flat list), and given the dependency on ordering, it may make more sense to revise the code in device.vala to use the structured JSON output from lsblk.
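
As a rough illustration of the JSON idea (in Python rather than Vala, and not taken from Timeshift), the sketch below walks the nested "children" arrays that lsblk --json emits and stores each device once by name, together with the set of its parents, so nothing depends on the order of the entries. The sample data and the collect_devices helper are hypothetical; real output contains whatever columns were requested with -o.

```python
import json

# Hypothetical sample shaped like `lsblk --json -o NAME,TYPE` output:
# the RAID array is listed once under every member disk.
SAMPLE = """
{
  "blockdevices": [
    {"name": "sda", "type": "disk", "children": [
      {"name": "md0", "type": "raid5"}
    ]},
    {"name": "sdb", "type": "disk", "children": [
      {"name": "md0", "type": "raid5"}
    ]},
    {"name": "sdc", "type": "disk", "children": [
      {"name": "md0", "type": "raid5"}
    ]}
  ]
}
"""


def collect_devices(tree):
    """Return {name: {"type": ..., "parents": set_of_parent_names}}.

    Each device is stored once by name, however many times it is listed.
    """
    devices = {}

    def walk(node, parent):
        entry = devices.setdefault(
            node["name"], {"type": node.get("type"), "parents": set()}
        )
        if parent is not None:
            entry["parents"].add(parent)
        for child in node.get("children", []):
            walk(child, node["name"])

    for top in tree.get("blockdevices", []):
        walk(top, None)
    return devices


devices = collect_devices(json.loads(SAMPLE))
for name in sorted(devices):
    info = devices[name]
    print(name, info["type"], sorted(info["parents"]))
```

With this shape the duplicate md0 rows collapse into a single entry whose parents are the three member disks, without any assumption about where the duplicates sit in the listing.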

@KeithB
Contributor

KeithB commented Feb 17, 2023

I've written a fix for this (didn't go for the JSON re-write in the end as I'm not familiar with the full scope of timeshift or Vala) against the fork at https://github.com/KeithB/timeshift.

Changes:

  • Changed the logic for more controlled removal of the parents of RAID devices (rather than the blunt removal of whatever the 2 adjacent entries are).
  • Logic now applies to all RAID configurations rather than just RAID5.
  • The dmraid tidy-up now uses the same code for removing parents.
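
To show the difference between positional and targeted removal, here is a hypothetical Python sketch (not the code in the fork, and only one reading of the first bullet). It takes a flat list of entries and drops the parents of any RAID device by looking them up by name, and it keeps a repeated array only once. The field names name, type and pkname mirror lsblk column names, but the list itself, the remove_raid_parents helper and the "raid" prefix test are assumptions made for the example.

```python
def remove_raid_parents(entries):
    """Drop entries that are parents of a RAID device, and duplicate rows.

    Lookup is by name, so which devices survive does not depend on order.
    """
    # Names of every device that some RAID entry reports as its parent.
    raid_parents = {
        e["pkname"]
        for e in entries
        if e["type"].startswith("raid") and e.get("pkname")
    }
    seen = set()
    result = []
    for e in entries:
        if e["name"] in raid_parents:
            continue  # a member of an array
        if e["name"] in seen:
            continue  # the same device listed again under another parent
        seen.add(e["name"])
        result.append(e)
    return result


# Hypothetical flat listing; the array is repeated once per member disk,
# and an unrelated disk (sdd) sits between two of the members.
entries = [
    {"name": "sda", "type": "disk", "pkname": None},
    {"name": "md0", "type": "raid5", "pkname": "sda"},
    {"name": "sdb", "type": "disk", "pkname": None},
    {"name": "md0", "type": "raid5", "pkname": "sdb"},
    {"name": "sdd", "type": "disk", "pkname": None},
    {"name": "sdc", "type": "disk", "pkname": None},
    {"name": "md0", "type": "raid5", "pkname": "sdc"},
]

kept = [e["name"] for e in remove_raid_parents(entries)]
print(kept)
```

Reordering entries changes only the order of the result, not which devices are kept, which is the property the adjacent-entry approach lacks.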

It seems to work as far as I can tell, but given that the behaviours are caused by combinations of setup and lsblk version, I'm not comfortable going near a PR yet.

All test feedback welcomed!
