NIC Hangs During scp Transfers on a Clash Transparent Proxy Host

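The transparent proxy host was built by following the tutorial "Detailed Tutorial: Transparent Proxy (TProxy) with Linux + Clash" (基于Linux+Clash实现透明代理(TProxy)详细教程). When this host is used as a gateway, scp transfers routed through it lose the network connection and cannot complete.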

Symptoms

$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file                                                    0%    0     0.0KB/s   --:-- ETA
Connection to 192.168.10.10 closed by remote host.
lost connection



$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file                                                    1%   14MB  13.3MB/s   01:12 ETA


Timeout, server 192.168.10.10 not responding.
lost connection

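Interactive ssh sessions through the gateway work, and the host behaves normally as a transparent proxy. The connection only drops during scp transfers.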

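1. Troubleshooting

Log in to the Clash transparent proxy host.

The clash service is running normally.

The iptables rules look correct: traffic destined for hosts in 192.168.10.0/24 hits a RETURN rule, so it is not handled by Clash and is forwarded to the target machine through the FORWARD chain.

The system log, however, shows the NIC hanging and being reset, which is the most likely cause of the failure: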

[Wed Sep 10 17:03:54 2025] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
[Wed Sep 10 17:03:59 2025] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[Wed Sep 10 17:23:07 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                             TDH                  <d9>
                             TDT                  <6c>
                             next_to_use          <6c>
                             next_to_clean        <d5>
                           buffer_info[next_to_clean]:
                             time_stamp           <1092b5706>
                             next_to_watch        <d9>
                             jiffies              <1092b5998>
                             next_to_watch.status <0>
                           MAC Status             <40080083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
[Wed Sep 10 17:23:09 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                             TDH                  <d9>
                             TDT                  <6c>
                             next_to_use          <6c>
                             next_to_clean        <d5>
                           buffer_info[next_to_clean]:
                             time_stamp           <1092b5706>
                             next_to_watch        <d9>
                             jiffies              <1092b5b88>
                             next_to_watch.status <0>
                           MAC Status             <40080083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
[Wed Sep 10 17:23:11 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                             TDH                  <d9>
                             TDT                  <6c>
                             next_to_use          <6c>
                             next_to_clean        <d5>
                           buffer_info[next_to_clean]:
                             time_stamp           <1092b5706>
                             next_to_watch        <d9>
                             jiffies              <1092b5d80>
                             next_to_watch.status <0>
                           MAC Status             <40080083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
[Wed Sep 10 17:23:13 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                             TDH                  <d9>
                             TDT                  <6c>
                             next_to_use          <6c>
                             next_to_clean        <d5>
                           buffer_info[next_to_clean]:
                             time_stamp           <1092b5706>
                             next_to_watch        <d9>
                             jiffies              <1092b5f70>
                             next_to_watch.status <0>
                           MAC Status             <40080083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
[Wed Sep 10 17:23:15 2025] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                             TDH                  <d9>
                             TDT                  <6c>
                             next_to_use          <6c>
                             next_to_clean        <d5>
                           buffer_info[next_to_clean]:
                             time_stamp           <1092b5706>
                             next_to_watch        <d9>
                             jiffies              <1092b6168>
                             next_to_watch.status <0>
                           MAC Status             <40080083>
                           PHY Status             <796d>
                           PHY 1000BASE-T Status  <3800>
                           PHY Extended Status    <3000>
                           PCI Status             <10>
[Wed Sep 10 17:23:16 2025] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
[Wed Sep 10 17:23:21 2025] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
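
To spot this pattern quickly on another host, the kernel log can be filtered for the two e1000e messages. The following is a minimal sketch, not part of the original procedure: the function name summarize_nic_events is made up for illustration, and it assumes the log is read with dmesg -T, which matches the timestamp format shown above.

```shell
# Count e1000e hang and reset messages in kernel log text read from stdin.
# summarize_nic_events is a hypothetical helper name, not an existing tool.
summarize_nic_events() {
    awk '
        /Detected Hardware Unit Hang/ { hangs++ }
        /Reset adapter unexpectedly/  { resets++ }
        END { printf "hangs=%d resets=%d\n", hangs, resets }
    '
}

# Typical use on the proxy host (reading the kernel log may require root):
#   sudo dmesg -T | summarize_nic_events
```

Applied to the excerpt above, this reports hangs=5 resets=2.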

What the log means

Detected Hardware Unit Hang
TDH <d9>, TDT <6c>, next_to_use <6c>, next_to_clean <d5>
buffer_info[next_to_clean]: ...
MAC Status <40080083>
PHY Status <796d>
...
Reset adapter unexpectedly
NIC Link is Up 1000 Mbps Full Duplex

Detected Hardware Unit Hang

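The driver found a transmit descriptor that the NIC did not complete in time, meaning the transmit (TX) DMA queue is stuck.
TDH and TDT are the head and tail pointers of the TX descriptor ring. They have the same values in every dump, so the ring made no progress and the NIC's transmit buffer is blocked.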

Reset adapter unexpectedly

The driver resets the adapter automatically to try to recover.

NIC Link is Up 1000 Mbps Full Duplex

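The link comes back up, but the network was down in the meantime (about 14 seconds between the first hang report and link-up in the log above).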
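
The hexadecimal fields in the dump can be turned into more readable numbers with shell arithmetic. This is only a sketch: the helper names are made up, and the ring size of 256 is an assumption (ethtool -g enp0s31f6 shows the configured value on the actual host).

```shell
# Descriptors queued between head (TDH) and tail (TDT), allowing for ring wrap-around.
# ring_size defaults to 256 here, which is an assumption about this host.
pending_descriptors() {
    tdh=$1
    tdt=$2
    ring_size=${3:-256}
    echo $(( ($tdt - $tdh + $ring_size) % $ring_size ))
}

# Age of the stuck buffer in jiffies: jiffies minus buffer time_stamp.
stuck_jiffies() {
    echo $(( $2 - $1 ))
}

pending_descriptors 0xd9 0x6c            # TDH and TDT from the dump, prints 147
stuck_jiffies 0x1092b5706 0x1092b5998    # time_stamp and jiffies from the first dump, prints 658
```

Converting jiffies to seconds requires the kernel's HZ value, which the log does not state. Consecutive dumps are 2 seconds and 496 jiffies apart, which would be consistent with HZ=250, so the buffer had probably been stuck for about 2.6 seconds when the first hang was reported.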

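2. Cause

An scp file transfer produces a large, continuous, dense TCP stream, which causes the NIC's TX DMA queue to hang.

Why does ssh work while scp fails?

Network characteristics of SCP vs SSH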

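Characteristic	SSH (interactive login)	SCP (file transfer)
Traffic pattern	Small, sporadic packets	Large, continuous, dense TCP stream
TCP load	Low	High (TCP window kept full, TX DMA queue busy for long periods)
Packet rate	Low	High, so the NIC's DMA queue is under heavy pressure
Tolerance of interruptions	High	Low: ordinary loss or reordering is recovered by TCP retransmission, but a NIC outage lasting several seconds makes the transfer time out or disconnect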

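Conclusion: SCP generates a large volume of back-to-back TCP packets, which puts far more pressure on the NIC's transmit queue (TX DMA) than an interactive SSH session.

Long-lived TCP connection + heavy traffic

SCP sends a large file → the TX queue fills up quickly → the e1000e DMA engine cannot keep up
The driver detects the stuck transmit queue → Detected Hardware Unit Hang

SSH does not trigger it

An SSH login only sends small packets → the DMA queue drains quickly → the NIC does not hang
So SSH logins work, but SCP transfers are interrupted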

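Summary

SCP generates a heavy, continuous TCP stream → iptables FORWARD → excessive pressure on the e1000e DMA queue → hardware hang → the adapter is reset automatically

Why everything else works:

Low-volume TCP traffic (interactive SSH, web browsing, DNS queries) never fills the DMA queue,
so the NIC does not hang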

3. Solution

Disable the NIC's segmentation and receive offload features (TSO, GSO, GRO):

ethtool -K enp0s31f6 gro off gso off tso off

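Disabling TSO/GRO/GSO reduces the pressure on the DMA queue.

It noticeably improves the stability of large TCP file transfers.

With these features disabled, testing showed that scp transfers complete normally and the NIC no longer hangs.

Procedure

Check the current NIC settings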

admin@001:~$ sudo ethtool -k enp0s31f6
Features for enp0s31f6:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

The TSO/GSO/GRO settings in that output:

tcp-segmentation-offload → TSO
generic-segmentation-offload → GSO
generic-receive-offload → GRO

admin@001:~$ sudo ethtool -k enp0s31f6|grep tcp-segmentation-offloa
tcp-segmentation-offload: on
admin@001:~$ sudo ethtool -k enp0s31f6|grep eneric-segmentation-offload
generic-segmentation-offload: on
admin@001:~$ sudo ethtool -k enp0s31f6|grep neric-receive-offload
generic-receive-offload: on
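
The three separate grep calls can be combined into one filter, which is also handy for confirming the result after the change. A minimal sketch follows; the function names offload_state and offloads_disabled are made up for illustration, and both read the text printed by ethtool -k from stdin.

```shell
# Print only the TSO/GSO/GRO summary lines from `ethtool -k` output on stdin.
# The indented sub-feature lines are skipped because the pattern is anchored at column 0.
offload_state() {
    grep -E '^(tcp-segmentation|generic-segmentation|generic-receive)-offload:'
}

# Succeed only if all three features are reported as off.
offloads_disabled() {
    count=$(offload_state | grep -c ': off' || true)
    [ "$count" -eq 3 ]
}

# Typical use on the proxy host:
#   sudo ethtool -k enp0s31f6 | offload_state
#   sudo ethtool -k enp0s31f6 | offloads_disabled && echo "TSO/GSO/GRO are off"
```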

Disable the offload features (run as root; the setting is lost on reboot unless it is reapplied at boot)

ethtool -K enp0s31f6 gro off gso off tso off

Test again: transfers are back to normal

$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file                                                  100%  977MB  21.3MB/s   00:45
$ scp 1Gb.file admin@192.168.10.10:/tmp/
admin@192.168.10.10's password:
1Gb.file                                                  100%  977MB  19.9MB/s   00:49

4. How it works

(1) NIC offload features

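Feature	What it does	Related problem
TSO (TCP Segmentation Offload)	The kernel hands the NIC a large block of TCP data, and the NIC splits it into MSS-sized packets	Under heavy, sustained traffic the DMA queue can back up, and some e1000e driver/hardware combinations hang
GSO (Generic Segmentation Offload)	The kernel keeps large packets intact through the stack and segments them in software just before handing them to the driver (for TCP and other protocols)	Same as above
GRO (Generic Receive Offload)	The kernel merges consecutive received TCP segments of one flow into a large packet before passing it up the stack; on a forwarding host that packet must be segmented again on transmit	Packet merging under heavy traffic may trigger the DMA queue hang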

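The underlying problem:

SCP generates a heavy, continuous TCP stream
On a host set up for TPROXY + fwmark the kernel path is more complex, so each packet is processed several times, even though this traffic bypasses Clash itself
The NIC's DMA queue is under heavy pressure and the hardware cannot complete TX in time → e1000e reports Hardware Unit Hang

With offload disabled:

The kernel segments and processes the packets itself
The DMA queue no longer backs up
Bulk TCP traffic still gets through, at the cost of slightly higher CPU load
The NIC stays stable and SCP no longer drops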
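
To get a feel for what segmentation offload changes, consider how many wire packets one large send turns into. The sketch below is purely illustrative: the helper name segments_for is made up, and the 64 KiB buffer and the MSS of 1448 bytes (a common value on a 1500-byte MTU link with TCP timestamps) are assumptions, not values measured on this host.

```shell
# Number of MSS-sized segments needed to carry one buffer (ceiling division).
# segments_for is a hypothetical helper; both arguments are byte counts.
segments_for() {
    bytes=$1
    mss=$2
    echo $(( (bytes + mss - 1) / mss ))
}

segments_for 65536 1448    # prints 46
```

With TSO on, the kernel can queue such a buffer as one large transmit request and the NIC produces the 46 frames. With TSO and GSO off, the kernel builds the 46 packets itself and hands them to the NIC individually, which is where the extra CPU load comes from.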

Reference:

Solution: e1000e eno1 Detected Hardware Unit Hang
