zfs fails with trace #379

Closed
aarcane opened this issue Aug 30, 2011 · 3 comments
Labels
Type: Architecture Indicates an issue is specific to a single processor architecture

@aarcane
aarcane commented Aug 30, 2011

[ 2806.472323] BUG: unable to handle kernel paging request at ffdc4000
[ 2806.472413] IP: [] memcpy+0x1f/0x40
[ 2806.472469] *pdpt = 0000000000913001 *pde = 0000000001451067 *pte =
0000000000000000
[ 2806.472547] Oops: 0000 [#1] SMP
[ 2806.472592] last sysfs file:
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
[ 2806.472645] Modules linked in: usblp iscsi_trgt crc32c parport_pc
ppdev ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi nfsd exportfs nfs lockd
fscache nfs_acl auth_rpcgss sunrpc zfs(P) zcommon(P) znvpair(P) zavl(P)
zunicode(P) spl zlib_deflate joydev radeon asus_atk0110 ttm psmouse
drm_kms_helper k10temp serio_raw drm agpgart i2c_algo_bit i2c_piix4
shpchp lp parport usbhid hid sata_sil r8169 ahci mii libahci xhci_hcd
[ 2806.473392]
[ 2806.473415] Pid: 22, comm: bdi-default Tainted: P
2.6.35-30-generic-pae #56-Ubuntu M4A88TD-V EVO/USB3/System Product Name
[ 2806.473494] EIP: 0060:[] EFLAGS: 00010216 CPU: 0
[ 2806.473537] EIP is at memcpy+0x1f/0x40
[ 2806.473570] EAX: d4578000 EBX: 00006000 ECX: 00001400 EDX: ffdc3000
[ 2806.473615] ESI: ffdc4000 EDI: d4579000 EBP: ced3bccc ESP: ced3bcc0
[ 2806.473661] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2806.473702] Process bdi-default (pid: 22, ti=ced3a000 task=ced50000
task.ti=ced3a000)
[ 2806.473754] Stack:
[ 2806.473774] c9105988 00006000 00000000 ced3bd20 d0f09fa0 00000000
00000000 00006000
[ 2806.473888] <0> 00000000 00000000 d0f9ec68 ced3bd0c ced3bd10 00006000
00000000 00000000
[ 2806.474019] <0> 00000000 00006000 00000000 00000001 cb52e098 c911ab1c
c10f64e0 c911aa20
[ 2806.478318] Call Trace:
[ 2806.478318] [] ? dmu_write+0xa0/0x1b0 [zfs]
[ 2806.478318] [] ? zfs_putpage+0x203/0x300 [zfs]
[ 2806.478318] [] ? zpl_putpage+0x2f/0x50 [zfs]
[ 2806.478318] [] ? write_cache_pages+0x146/0x320
[ 2806.478318] [] ? zpl_putpage+0x0/0x50 [zfs]
[ 2806.478318] [] ? dequeue_entity+0x1c8/0x210
[ 2806.478318] [] ? zpl_writepages+0x18/0x20 [zfs]
[ 2806.478318] [] ? do_writepages+0x1c/0x40
[ 2806.478318] [] ? writeback_single_inode+0xb8/0x340
[ 2806.478318] [] ? writeback_sb_inodes+0x149/0x230
[ 2806.478318] [] ? writeback_inodes_wb+0x8a/0x160
[ 2806.478318] [] ? wb_writeback+0x1eb/0x230
[ 2806.478318] [] ? wb_do_writeback+0x129/0x130
[ 2806.478318] [] ? bdi_forker_task+0x62/0x2f0
[ 2806.478318] [] ? bdi_start_fn+0x0/0xc0
[ 2806.478318] [] ? bdi_forker_task+0x0/0x2f0
[ 2806.478318] [] ? kthread+0x74/0x80
[ 2806.478318] [] ? kthread+0x0/0x80
[ 2806.478318] [] ? kernel_thread_helper+0x6/0x10
[ 2806.478318] Code: 90 90 90 90 90 90 90 90 90 90 90 90 55 89 e5 83 ec
0c 89 1c 24 89 74 24 04 89 7c 24 08 0f 1f 44 00 00 89 cb 89 c7 c1 e9 02
89 d6 a5 89 d9 83 e1 03 74 02 f3 a4 8b 1c 24 8b 74 24 04 8b 7c 24
[ 2806.478318] EIP: [] memcpy+0x1f/0x40 SS:ESP 0068:ced3bcc0
[ 2806.478318] CR2: 00000000ffdc4000
[ 2806.478318] ---[ end trace 5186376129afa87a ]---

The system doesn't crash or kernel panic, but ZFS stops working and silently fails to continue.

linux-image-2.6.32-26-generic-pae
Ubuntu 10.10

lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] RS880 Host Bridge
00:01.0 PCI bridge: ASUSTeK Computer Inc. RS880 PCI to PCI bridge (int gfx)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] RS780/RS880 PCI to PCI bridge (PCIE port 5)
00:11.0 SATA controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode](rev 40)
00:12.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:12.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:13.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
00:14.3 ISA bridge: ATI Technologies Inc SB7x0/SB8x0/SB9x0 LPC host controller (rev 40)
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev 40)
00:14.5 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
00:15.0 PCI bridge: ATI Technologies Inc Device 43a0
00:15.1 PCI bridge: ATI Technologies Inc Device 43a1
00:16.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
00:16.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB EHCI Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 10h Processor Link Control
01:05.0 VGA compatible controller: ATI Technologies Inc RS880 [Radeon HD 4250]
02:00.0 USB Controller: NEC Corporation uPD720200 USB 3.0 Host Controller (rev 03)
03:06.0 SATA controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:07.0 SATA controller: Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)

I think the SiI chips are partly to blame in some way, but I can't be sure.

@behlendorf
Contributor

This looks like an issue in the writepage code. It appears we're trying to copy a page into an ARC buffer so it can be flushed to disk, but the page has already been paged out. That probably means we somehow have the wrong address. It certainly looks like a ZFS bug, and we'll look into it. Are you able to reproduce it consistently? If so, how?
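The register dump in the oops can be cross-checked against this reading. The following is a minimal sketch of that arithmetic, not a diagnosis. It assumes the usual 32-bit kernel `memcpy` register layout (EAX = destination, EDX = source, ECX = byte count on entry, later reused as the remaining 4-byte word count), which the `Code:` bytes in the trace appear to match. The variable names are illustrative only.

```python
# Register values copied from the oops in the report (memcpy+0x1f/0x40).
EAX = 0xD4578000  # destination start (assumed: first memcpy argument)
EDX = 0xFFDC3000  # source start (assumed: second memcpy argument)
EBX = 0x00006000  # saved copy of the requested length in bytes
ECX = 0x00001400  # 4-byte words still to copy when the fault hit
ESI = 0xFFDC4000  # current source pointer, equal to CR2 (the faulting address)
EDI = 0xD4579000  # current destination pointer

PAGE_SIZE = 0x1000  # 4K pages on this 32-bit PAE kernel

copied_from_src = ESI - EDX   # bytes read before the fault
copied_to_dst = EDI - EAX     # bytes written before the fault
remaining_bytes = ECX * 4     # words left, converted to bytes

print(f"requested: {EBX:#x} bytes ({EBX // PAGE_SIZE} pages)")
print(f"copied:    {copied_from_src:#x} bytes")
print(f"remaining: {remaining_bytes:#x} bytes")
print(f"fault on a page boundary: {ESI % PAGE_SIZE == 0}")
```

Under those assumptions the numbers are self-consistent: the copy was asked to move 0x6000 bytes, completed exactly one 4K page, and faulted on the first byte of the next source page. That would fit a source mapping that covers only a single page while the write length spans several, though the trace alone does not prove it.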

@behlendorf
Contributor

This could also be caused by something in the PAE kernel we're not taking into account. 32-bit kernels are not supported; I doubt you'll see this if you move to a 64-bit system.

@behlendorf
Contributor

I'm going to close this issue as stale. There have been a lot of fixes since this was originally reported, and no other reports of this issue that I'm aware of. Please open a new issue if you're able to recreate this failure with 0.6.0-rc7 or later.

kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Mar 1, 2015
Reinstate the correct default behavior of returning the number of objects
in the cache for reclaim.  This behavior was disabled in recent releases
due to occasional reports of spinning in shrink_slabs().  Those issues have
been resolved and can no longer be reproduced.  See commit 376dc35.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Issue openzfs#358
Closes openzfs#379
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Mar 1, 2015
For small objects the Linux slab allocator should be used to make the most
efficient use of the memory.  However, large objects are not supported by
the Linux slab and therefore the SPL implementation is preferred.  A cutoff
of 16K was determined to be optimal for architectures using 4K pages.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: DHE <[email protected]>
Issue openzfs#356
Closes openzfs#379
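
The size-based split described in this commit message can be sketched as follows. This is only an illustration of the stated policy, not the SPL code: the function name, the backend labels, and the choice to treat an object of exactly 16K as "small" are all assumptions made for the example.

```python
PAGE_SIZE = 4 * 1024      # 4K pages, as stated in the commit message
SLAB_CUTOFF = 16 * 1024   # 16K cutoff, as stated in the commit message


def choose_cache_backend(object_size: int, cutoff: int = SLAB_CUTOFF) -> str:
    """Pick a cache backend for an object of the given size (hypothetical helper).

    Objects up to the cutoff go to the Linux slab; larger objects go to the
    SPL implementation. Whether the boundary is inclusive is an assumption.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    return "linux_slab" if object_size <= cutoff else "spl_slab"


for size in (512, PAGE_SIZE, SLAB_CUTOFF, 128 * 1024):
    print(f"{size:>7} bytes -> {choose_cache_backend(size)}")
```

With 4K pages, the 16K cutoff corresponds to objects spanning at most four pages.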
sdimitro pushed a commit to sdimitro/zfs that referenced this issue May 23, 2022
…mary (openzfs#379)

We keep the summary of the index in memory, so that ZettaCache::lookup()
can find which chunk contains a given key.  We do this in a
`Vec<BlockBasedLogChunkSummaryEntry<IndexEntry>>`.  The IndexEntry is
composed of an IndexKey (9 bytes) and IndexValue (15 bytes).  However,
only the IndexKey is actually needed.

This commit changes the in-memory storage of the chunk summary to omit
the IndexValue, storing only the IndexKey.  This reduces the memory
usage from 32 to 17 bytes per chunk (53%).  For a 100TB cache, this
corresponds to a reduction of ~3GB RAM.

The interface to `SummarizedBlockBasedLog::lookup_by_key()` is also
simplified.

Note that the on-disk representation of the Summary is not changed; it
still (unnecessarily) stores the IndexValue.
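
The figures in this commit message can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; the remaining 8 bytes per entry are not explained in the message, so what they hold is left open. The chunk count and chunk size at the end are back-of-the-envelope inferences from the "~3GB for 100TB" statement (using decimal units), not values taken from the code.

```python
INDEX_KEY_BYTES = 9     # from the commit message
INDEX_VALUE_BYTES = 15  # from the commit message
OLD_ENTRY_BYTES = 32    # per-chunk summary entry before the change
NEW_ENTRY_BYTES = 17    # per-chunk summary entry after the change

saved_per_chunk = OLD_ENTRY_BYTES - NEW_ENTRY_BYTES
other_bytes_old = OLD_ENTRY_BYTES - (INDEX_KEY_BYTES + INDEX_VALUE_BYTES)
other_bytes_new = NEW_ENTRY_BYTES - INDEX_KEY_BYTES
new_over_old_pct = 100 * NEW_ENTRY_BYTES / OLD_ENTRY_BYTES

# Inference only: how many chunks would make 15 bytes/chunk add up to ~3 GB,
# and what chunk size that would imply for a 100 TB cache.
implied_chunks = 3 * 10**9 // saved_per_chunk
implied_chunk_bytes = 100 * 10**12 // implied_chunks

print(f"saved per chunk: {saved_per_chunk} bytes")
print(f"non-key, non-value bytes per entry: {other_bytes_old} before, {other_bytes_new} after")
print(f"new size as a share of old: {new_over_old_pct:.0f}%")
print(f"implied chunks: {implied_chunks}, implied chunk size: {implied_chunk_bytes} bytes")
```

Read this way, the quoted 53% is the new entry size relative to the old one, so the saving is about 47% per entry, which equals the 15-byte IndexValue that was dropped.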

2 participants