I regularly get the following error on my Ubuntu Server 18.10:

00:00:30 systemd[1]: Starting Discard unused blocks...
00:00:30 systemd[1]: Starting Rotate log files...
00:00:30 systemd[1]: Started Rotate log files.
00:01:01 kernel: ata7.00: exception Emask 0x0 SAct 0x10000 SErr 0x0 action 0x6 frozen
00:01:01 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:01 kernel: ata7.00: cmd 64/01:80:00:00:00/00:00:00:00:00/a0 tag 16 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:01 kernel: ata7.00: status: { DRDY }
00:01:01 kernel: ata7: hard resetting link
00:01:01 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:01 kernel: ata7.00: configured for UDMA/133
00:01:01 kernel: ata7.00: device reported invalid CHS sector 0
00:01:01 kernel: ata7: EH complete
00:01:32 kernel: ata7.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x6 frozen
00:01:32 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:01:32 kernel: ata7.00: cmd 64/01:90:00:00:00/00:00:00:00:00/a0 tag 18 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:01:32 kernel: ata7.00: status: { DRDY }
00:01:32 kernel: ata7: hard resetting link
00:01:32 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:01:32 kernel: ata7.00: configured for UDMA/133
00:01:32 kernel: ata7.00: device reported invalid CHS sector 0
00:01:32 kernel: ata7: EH complete
00:02:04 kernel: ata7.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x6 frozen
00:02:04 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:04 kernel: ata7.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:04 kernel: ata7.00: status: { DRDY }
00:02:04 kernel: ata7: hard resetting link
00:02:05 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:05 kernel: ata7.00: configured for UDMA/133
00:02:05 kernel: ata7.00: device reported invalid CHS sector 0
00:02:05 kernel: ata7: EH complete
00:02:37 kernel: INFO: task fstrim:29514 blocked for more than 120 seconds.
00:02:37 kernel:       Tainted: P           O      4.18.0-17-generic #18-Ubuntu
00:02:37 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
00:02:37 kernel: fstrim          D    0 29514      1 0x00000000
00:02:37 kernel: Call Trace:
00:02:37 kernel:  __schedule+0x29e/0x840
00:02:37 kernel:  schedule+0x2c/0x80
00:02:37 kernel:  schedule_timeout+0x258/0x360
00:02:37 kernel:  io_schedule_timeout+0x1e/0x50
00:02:37 kernel:  wait_for_completion_io+0xb7/0x140
00:02:37 kernel:  ? wake_up_q+0x80/0x80
00:02:37 kernel:  submit_bio_wait+0x61/0x90
00:02:37 kernel:  blkdev_issue_discard+0x7a/0xd0
00:02:37 kernel:  ext4_trim_fs+0x5a9/0x8b0
00:02:37 kernel:  ? security_file_open+0x86/0x90
00:02:37 kernel:  ext4_ioctl+0xd81/0x14a0
00:02:37 kernel:  ? _copy_to_user+0x2b/0x40
00:02:37 kernel:  ? cp_new_stat+0x152/0x180
00:02:37 kernel:  do_vfs_ioctl+0xa8/0x620
00:02:37 kernel:  ? __do_sys_newfstat+0x5f/0x70
00:02:37 kernel:  ksys_ioctl+0x67/0x90
00:02:37 kernel:  __x64_sys_ioctl+0x1a/0x20
00:02:37 kernel:  do_syscall_64+0x5a/0x110
00:02:37 kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
00:02:37 kernel: RIP: 0033:0x7faba5a9e3c7
00:02:37 kernel: Code: Bad RIP value.
00:02:37 kernel: RSP: 002b:00007ffec09ede88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
00:02:37 kernel: RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007faba5a9e3c7
00:02:37 kernel: RDX: 00007ffec09ede90 RSI: 00000000c0185879 RDI: 0000000000000004
00:02:37 kernel: RBP: 0000000000000004 R08: 0000000000000001 R09: 0000000000000000
00:02:37 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000561d21106dd0
00:02:37 kernel: R13: 00007faba5663ff8 R14: 00007ffec09edfc8 R15: 0000561d21106dd0
00:02:37 kernel: ata7.00: NCQ disabled due to excessive errors
00:02:37 kernel: ata7.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
00:02:37 kernel: ata7.00: failed command: SEND FPDMA QUEUED
00:02:37 kernel: ata7.00: cmd 64/01:c0:00:00:00/00:00:00:00:00/a0 tag 24 ncq dma 512 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
00:02:37 kernel: ata7.00: status: { DRDY }
00:02:37 kernel: ata7: hard resetting link
00:02:38 kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
00:02:38 kernel: ata7.00: configured for UDMA/133
00:02:38 kernel: ata7.00: device reported invalid CHS sector 0
00:02:38 kernel: ata7: EH complete
00:03:02 fstrim[29514]: /home/caillou/downloads: 891.5 GiB (957190782976 bytes) trimmed
00:03:02 fstrim[29514]: /: 212.4 GiB (228063428608 bytes) trimmed
00:03:02 systemd[1]: Started Discard unused blocks.

Unfortunately, I don't understand what it is trying to tell me.

  1. What is ata7.00?
  2. What does failed command: SEND FPDMA QUEUED mean? What is this FPDMA?
  3. What does device reported invalid CHS sector 0 mean?

I suspect it has something to do with a drive, but I have no idea how to debug or fix this issue.

Here is the output of lsblk:

sda           8:0    0   7.3T  0 disk
|-sda1        8:1    0     2G  0 part
`-sda2        8:2    0   7.3T  0 part
sdb           8:16   0   7.3T  0 disk
|-sdb1        8:17   0     2G  0 part
`-sdb2        8:18   0   7.3T  0 part
sdc           8:32   0   7.3T  0 disk
|-sdc1        8:33   0     2G  0 part
`-sdc2        8:34   0   7.3T  0 part
sdd           8:48   0   7.3T  0 disk
|-sdd1        8:49   0     2G  0 part
`-sdd2        8:50   0   7.3T  0 part
sde           8:64   0   3.7T  0 disk
|-sde1        8:65   0     2G  0 part
`-sde2        8:66   0   3.7T  0 part
sdf           8:80   0   7.3T  0 disk
|-sdf1        8:81   0     2G  0 part
`-sdf2        8:82   0   7.3T  0 part
sdg           8:96   0 931.5G  0 disk
`-sdg1        8:97   0 931.5G  0 part /home/caillou/downloads
nvme0n1     259:0    0 232.9G  0 disk
nvme1n1     259:1    0 232.9G  0 disk
|-nvme1n1p1 259:2    0   512M  0 part /boot/efi
`-nvme1n1p2 259:3    0 232.4G  0 part /
  • sdg is an mSATA SSD connected through a PCIe card.
  • sda - sdf are SATA HDDs with ZFS.

Detail of the drives:

  • sda WD Red 8T, mainboard SATA connector.
  • sdb WD Red 8T, mainboard SATA connector.
  • sdc WD Red 8T, mainboard SATA connector.
  • sdd WD Red 8T, mainboard SATA connector.
  • sde WD Red 4T, mainboard SATA connector.
  • sdf WD Red 8T, mainboard SATA connector.
  • sdg Samsung mSATA 1T, shucked from a Samsung Portable SSD T5, connected through a PCIe card.
  • nvme0n1 and nvme1n1 Samsung 970 EVO, connected to the mainboard m.2 connector.

The system does not show other signs of errors. Also, everything seems to function as intended, with the exception of these errors in the logs.

1 Answer

Note: It's a good idea to have good backups first.

You need to check for, and install, firmware updates for the Samsung SSDs (sdg and nvme*).
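Before updating anything, it may help to confirm which disk the kernel's ata7 messages refer to (ata7.00 is device 0 on libata port 7). The sketch below is one way to do that, assuming the usual sysfs layout in which each /sys/block/sdX symlink resolves to a path containing the port name. The function name `ata_port_to_disk` and its optional second argument are hypothetical, introduced only for illustration and testing.

```shell
#!/bin/sh
# Print the sdX disks whose sysfs device path passes through a given ATA port.
# $1 = port name as it appears in the kernel log (e.g. ata7)
# $2 = sysfs root (assumption: defaults to /sys; overridable only for testing)
ata_port_to_disk() {
    port=$1
    sysroot=${2:-/sys}
    for link in "$sysroot"/block/sd*; do
        # Skip the unexpanded glob when no sd* entries exist.
        [ -e "$link" ] || continue
        # Resolve the symlink to the full device path and look for /ataN/ in it.
        case "$(readlink -f "$link")" in
            */"$port"/*) basename "$link" ;;
        esac
    done
}

# Which disk sits behind the port named in the log?
ata_port_to_disk ata7
```

If this prints sdg, the errors come from the mSATA SSD on the PCIe card; `sudo smartctl -i /dev/sdg` (from the smartmontools package) should then show its current firmware version.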

Go to Samsung's download page and download their Samsung Magician software tool to help with the firmware upgrade. Other software updates are also available there.

Also check for a firmware upgrade for the PCIe card that sdg is connected to.

Check your motherboard BIOS version with sudo dmidecode -s bios-version. Then go to the manufacturer's web site and check for a newer BIOS. If there is one, download and install it.
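As a complement to dmidecode, the same identification is usually readable without root from the DMI attributes in sysfs. This is a minimal sketch, assuming /sys/class/dmi/id is populated on this board; the function name `show_bios_info` and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# Print board and BIOS identification from the sysfs DMI attributes.
# $1 = DMI directory (assumption: defaults to /sys/class/dmi/id; overridable for testing)
show_bios_info() {
    dmi=${1:-/sys/class/dmi/id}
    for field in board_vendor board_name bios_version bios_date; do
        # Only print attributes that exist and are readable by the current user.
        if [ -r "$dmi/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi/$field")"
        fi
    done
}

show_bios_info
```

The board vendor and name tell you which manufacturer page to look at, and the BIOS version and date are what you compare against the newest release listed there.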

Note: later, if there are still problems, we'll discuss an NCQ patch.
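The note above does not say what that patch would be. Independently of it, the failing command in the log, SEND FPDMA QUEUED, is an NCQ (queued) command issued while fstrim was running, and the kernel eventually logged "NCQ disabled due to excessive errors". The sketch below only inspects the current state, assuming the disk exposes a queue_depth attribute in sysfs, where a depth of 1 means commands are not being queued. The function name `ncq_status` and its second argument are hypothetical.

```shell
#!/bin/sh
# Report whether NCQ appears active for a disk, based on its queue depth.
# $1 = disk name (e.g. sdg)
# $2 = sysfs root (assumption: defaults to /sys; overridable only for testing)
ncq_status() {
    disk=$1
    sysroot=${2:-/sys}
    file="$sysroot/block/$disk/device/queue_depth"
    if [ ! -r "$file" ]; then
        echo "$disk: queue_depth not available"
        return 0
    fi
    depth=$(cat "$file")
    # A depth greater than 1 means the drive is accepting queued commands.
    if [ "$depth" -gt 1 ]; then
        echo "$disk: NCQ active (queue depth $depth)"
    else
        echo "$disk: NCQ off (queue depth $depth)"
    fi
}

ncq_status sdg
```

As a possible workaround to test, and not necessarily what the note has in mind, the libata.force kernel parameter accepts noncqtrim (and noncq), for example libata.force=7.00:noncqtrim, to stop the kernel from sending queued TRIM to that device.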

heynnema