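NIC Hangs During scp Transfers Through a Clash Transparent-Proxy Host
This transparent-proxy host was built by following the tutorial 基于Linux+Clash实现透明代理(TProxy)详细教程 ("A Detailed Guide to Transparent Proxying (TProxy) with Linux + Clash"). When the host is used as a gateway, scp transfers through it lose their network connection and cannot complete.
Symptoms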
$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file 0% 0 0.0KB/s --:-- ETAConnection to 192.168.10.10 closed by remote host.
lost connection
$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file 1% 14MB 13.3MB/s 01:12 ETA
Timeout, server 192.168.10.10 not responding.
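lost connection
Ordinary ssh sessions through the transparent proxy work without any problem. The connection drops only during scp transfers.
1. Troubleshooting
Log in to the Clash transparent-proxy host.
The clash service is running normally.
The iptables rules look correct: traffic to hosts in 192.168.10.0/24 hits a RETURN rule, so it bypasses Clash and is forwarded to the target machine through the FORWARD chain.
The system log shows the NIC hanging and being reset, which is most likely the cause of the failure: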
[Wed Sep 10 17:03:54 2025] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
[Wed Sep 10 17:03:59 2025] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[Wed Sep 10 17:23:07 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <d9>
TDT <6c>
next_to_use <6c>
next_to_clean <d5>
buffer_info[next_to_clean]:
time_stamp <1092b5706>
next_to_watch <d9>
jiffies <1092b5998>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Wed Sep 10 17:23:09 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <d9>
TDT <6c>
next_to_use <6c>
next_to_clean <d5>
buffer_info[next_to_clean]:
time_stamp <1092b5706>
next_to_watch <d9>
jiffies <1092b5b88>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Wed Sep 10 17:23:11 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <d9>
TDT <6c>
next_to_use <6c>
next_to_clean <d5>
buffer_info[next_to_clean]:
time_stamp <1092b5706>
next_to_watch <d9>
jiffies <1092b5d80>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Wed Sep 10 17:23:13 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <d9>
TDT <6c>
next_to_use <6c>
next_to_clean <d5>
buffer_info[next_to_clean]:
time_stamp <1092b5706>
next_to_watch <d9>
jiffies <1092b5f70>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Wed Sep 10 17:23:15 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH <d9>
TDT <6c>
next_to_use <6c>
next_to_clean <d5>
buffer_info[next_to_clean]:
time_stamp <1092b5706>
next_to_watch <d9>
jiffies <1092b6168>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Wed Sep 10 17:23:16 2025] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
[Wed Sep 10 17:23:21 2025] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
What the log means
Detected Hardware Unit Hang
TDH <d9>, TDT <6c>, next_to_use <6c>, next_to_clean <d5>
buffer_info[next_to_clean]: ...
MAC Status <40080083>
PHY Status <796d>
...
Reset adapter unexpectedly
NIC Link is Up 1000 Mbps Full Duplex
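Detected Hardware Unit Hang
- The NIC's internal DMA engine or its TX/RX queue has stalled.
- TDH and TDT are the head and tail pointers of the transmit descriptor ring. In the dumps above they stay at the same values from one report to the next, so the hardware has stopped working through the transmit queue.
Reset adapter unexpectedly
- The kernel resets the NIC automatically to try to recover.
NIC Link is Up 1000 Mbps Full Duplex
- The link is back up, but the network was down in the meantime.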
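To see how often this happens on a given host, the hang and reset messages can be counted from the kernel log. The following is a minimal sketch, assuming the messages match the strings in the excerpt above; the function name `summarize_e1000e_events` is made up for illustration.

```shell
#!/usr/bin/env bash
# Count e1000e hang and reset events in kernel log text read from stdin.
# Assumption: the driver messages contain exactly the phrases shown in the
# log excerpt above.
summarize_e1000e_events() {
    awk '
        /Detected Hardware Unit Hang/ { hang++ }
        /Reset adapter unexpectedly/  { reset++ }
        END { printf "hangs=%d resets=%d\n", hang, reset }
    '
}

# Typical use on the gateway (reading the kernel log may require root):
#   sudo dmesg -T | grep e1000e | summarize_e1000e_events
```

Running it before and after a test transfer gives a quick indication of whether that transfer triggered a new hang.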
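2. Cause
An scp file transfer is a large, continuous, dense TCP stream, and that load causes the NIC's DMA queue to stall.
Why does ssh work while scp fails?
Network characteristics of SCP vs. SSH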
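| Characteristic | SSH (interactive login) | SCP (file transfer) |
|---|---|---|
| Traffic pattern | Small, sporadic packets | Large, continuous, dense TCP stream |
| TCP load | Low | High (TCP window kept full, DMA TX queue busy for long periods) |
| Packet rate | Low | High → heavy pressure on the NIC DMA queue |
| Tolerance for a stall | High | Low → a stall of several seconds while the NIC hangs and resets times out the transfer and drops the connection |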
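Conclusion: SCP produces a large volume of back-to-back TCP packets, which puts far more pressure on the NIC transmit queue (TX DMA) than interactive SSH does.
Long-lived TCP connection + heavy traffic
- SCP sends a large file → the TX queue fills quickly → the e1000e DMA engine cannot keep up
- The driver detects the stalled transmit queue →
Detected Hardware Unit Hang
SSH does not trigger it
- An SSH login sends only small packets → the DMA queue drains quickly → the NIC does not hang
- So SSH logins work, while SCP transfers are interrupted
Summary
SCP generates a heavy, continuous TCP stream → iptables FORWARD → too much pressure on the e1000e DMA queue → hardware hang → the NIC is reset automatically
Why everything else works:
- Low-volume traffic (interactive SSH, web browsing, DNS requests) never fills the DMA queue
- So the NIC does not hang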
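3. Solution
Disable the offload features used for bulk TCP/UDP traffic
ethtool -K enp0s31f6 gro off gso off tso off
Disabling TSO/GRO/GSO reduces the pressure on the DMA engine
and noticeably improves the stability of large TCP file transfers.
With these features disabled, testing shows that scp transfers complete normally and the NIC no longer hangs.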
Steps
Check the current NIC configuration
admin@001:~$ sudo ethtool -k enp0s31f6
Features for enp0s31f6:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
The TSO/GSO/GRO settings in this output:
tcp-segmentation-offload → TSO
generic-segmentation-offload → GSO
generic-receive-offload → GRO
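The three settings can also be pulled out of the `ethtool -k` output in one pass. This is a minimal sketch, assuming the output has the `name: value` layout shown above; the helper name `offload_state` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the TSO/GSO/GRO state from `ethtool -k` output read from stdin.
# Assumption: each feature appears as "<feature-name>: <state>", as in the
# listing above. Leading whitespace is stripped before matching.
offload_state() {
    awk -F': ' '
        { sub(/^[ \t]+/, "", $1) }
        $1 == "tcp-segmentation-offload"     { print "TSO=" $2 }
        $1 == "generic-segmentation-offload" { print "GSO=" $2 }
        $1 == "generic-receive-offload"      { print "GRO=" $2 }
    '
}

# Typical use:
#   sudo ethtool -k enp0s31f6 | offload_state
```

Matching on the full feature name avoids picking up sub-features such as tx-tcp-segmentation.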
admin@001:~$ sudo ethtool -k enp0s31f6 | grep tcp-segmentation-offload
tcp-segmentation-offload: on
admin@001:~$ sudo ethtool -k enp0s31f6 | grep generic-segmentation-offload
generic-segmentation-offload: on
admin@001:~$ sudo ethtool -k enp0s31f6 | grep generic-receive-offload
generic-receive-offload: on
Disable the offload features
ethtool -K enp0s31f6 gro off gso off tso off
Test again: transfers are back to normal
$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file 100% 977MB 21.3MB/s 00:45
$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file 100% 977MB 19.9MB/s 00:49
4. How it works
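(1) NIC offload features
| Feature | What it does | Related problem |
|---|---|---|
| TSO (TCP Segmentation Offload) | The kernel hands the NIC a large block of TCP data and the NIC splits it into MSS-sized packets | Under sustained heavy traffic the DMA queue backs up easily, and some e1000e driver/hardware combinations hang |
| GSO (Generic Segmentation Offload) | The kernel keeps an oversized packet intact through the network stack and segments it in software just before handing it to the driver; it is not limited to TCP | Same as above |
| GRO (Generic Receive Offload) | The kernel merges several small received TCP packets of one flow into a large packet before passing it up the stack | On a forwarding host the merged packets have to be segmented again on transmit, so under heavy traffic the merging can contribute to the DMA queue hang |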
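The underlying problem:
- SCP produces a heavy, continuous TCP data stream
- Combined with TPROXY + fwmark, the kernel path is complex → each packet is processed several times
- The NIC DMA queue is under heavy pressure and the hardware cannot complete TX in time → e1000e reports
Hardware Unit Hang
With offload disabled:
- The kernel segments and processes the packets itself
- The DMA queue no longer backs up
- Heavy TCP traffic still gets through, at the cost of slightly higher CPU load
- The NIC stays stable and SCP no longer drops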
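Settings changed with `ethtool -K` apply only to the running system and are lost on reboot or when the driver is reloaded, so the command has to be reapplied at boot. The following is a minimal sketch of a reusable wrapper. The function name `disable_offloads` and the `ETHTOOL` override variable are assumptions made for illustration, and how the script is hooked into boot depends on the distribution.

```shell
#!/usr/bin/env bash
# Reapply the offload settings from this article to one interface.
# ETHTOOL is a hypothetical override for dry runs, for example
# ETHTOOL="echo ethtool" prints the command instead of running it.
ETHTOOL="${ETHTOOL:-ethtool}"

disable_offloads() {
    # $1: interface name, e.g. enp0s31f6
    # $ETHTOOL is left unquoted on purpose so a dry-run value such as
    # "echo ethtool" is split into command and argument.
    $ETHTOOL -K "$1" gro off gso off tso off
}

# Typical use (needs root):
#   disable_offloads enp0s31f6
#   ethtool -k enp0s31f6 | grep -E '^(tcp-segmentation|generic-(segmentation|receive))-offload'
```

The second command in the usage comment only reads the settings back to confirm that all three features now report off.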
References: