Troubleshooting Slow TCP Transfers: A Stack-Level Approach
- robertbmacdonald
- Nov 19, 2025
- 7 min read
"Something is wrong with the network. I used to get 4Gbps transfers but now I'm only getting 120Mbps. Did you change something recently?"
Sound familiar? If you've spent any time supporting production systems, you've probably heard some variation of this complaint. Before jumping to conclusions about where the problem lies, we need to understand what's actually happening at the TCP layer on both endpoints.
What This Article Is NOT
Before we dive in, let me be crystal clear: this is not another TCP 101 explainer. I'm not going to walk through three-way handshakes, explain what SYN/ACK means, or draw diagrams of the TCP state machine. There are hundreds of articles and RFCs that cover this well.
We're going to focus on the tools and metrics that actually help you diagnose real-world issues.
The Troubleshooting Mindset
When someone reports slow transfers, the key is understanding what TCP is experiencing on both endpoints. Modern TCP implementations are remarkably resilient: they will keep a transfer alive through packet loss, though even modest sustained loss (1-2%) can sharply limit throughput on high-bandwidth, high-latency paths. By examining TCP's internal state, we can answer critical questions:
What transfer rate is TCP actually achieving, and why?
Is one endpoint limiting throughput? Which one?
Is TCP adapting correctly to network conditions?
Are packets being lost, and how is TCP recovering?
Your job is to gather this insight using the tools Linux provides. That means working with two primary tools: ss (socket statistics) and nstat (network statistics).
Tool #1: ss - Per-Socket TCP State
The ss command (which replaced the older netstat) gives you detailed information about individual TCP connections. The magic happens when you use the -i (information) flag:
ss -tin dst 3.130.241.210:443

This shows you TCP internal state for a specific connection. Here's what a healthy transfer looks like:
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 0 172.17.1.187:47128 3.130.241.210:443
cubic wscale:12,7 rto:236 rtt:35.047/19.403 ato:40 mss:1288 pmtu:1500 rcvmss:1288 advmss:1448 cwnd:10 bytes_sent:102830 bytes_acked:102831 bytes_received:169536 segs_out:2785 segs_in:2247 data_segs_out:1571 data_segs_in:1214 send 2940052bps lastsnd:7176 lastrcv:7154 lastack:7154 pacing_rate 5879976bps delivery_rate 1401584bps delivered:1572 app_limited busy:54422ms reord_seen:6 rcv_rtt:27 rcv_space:14480 rcv_ssthresh:70646 minrtt:16.109 rcv_ooopack:2 snd_wnd:77824

Are your eyes glazing over yet? Let's break down the key fields you need to understand.
Understanding Window Scaling
The wscale:12,7 field (wscale: send scale factor, receive scale factor) shows the window scaling factors negotiated during connection establishment. Without scaling, TCP is limited to 64KB windows, which significantly caps throughput!
Max Throughput = Window Size / RTT

If you see wscale:0,0, window scaling wasn't enabled on one or both sides, or was stripped during the handshake. Check with:
cat /proc/sys/net/ipv4/tcp_window_scaling # Should be 1

With wscale:7, multiply the advertised window by 2^7 (128). For 10Gbps at 10ms RTT, you need ~12.5MB windows. Do the math: Required Window = Bandwidth * RTT to understand if your window size can support the expected throughput.
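The bandwidth-delay math is worth doing explicitly; a quick awk one-liner using the 10Gbps/10ms figures above confirms the required window size:

```shell
# Required window (bytes) = bandwidth (bits/s) * RTT (s) / 8 bits per byte
# Using the 10 Gbit/s at 10 ms RTT example:
awk 'BEGIN { bw = 10e9; rtt = 0.010; printf "%.1f MB\n", bw * rtt / 8 / 1e6 }'
# -> 12.5 MB
```

Plug in your own link speed and the rtt value ss reports to see whether the advertised window (times 2^wscale) can actually sustain the rate you expect.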
RTT: TCP's View of Network Latency
The rtt:35.047/19.403 field shows TCP's smoothed round-trip-time measurement (35.047ms) and mean deviation (19.403ms). This is NOT ICMP ping—it's measured from actual data ACKs and used to calculate RTO, congestion control decisions, and pacing rates.
Each socket maintains its own RTT value!
What this tells us: High RTT (>100ms on local networks) or high deviation indicates latency or jitter. If you see rtt:150.5/75.2 on what should be a 1ms network, check if Recv-Q is full—this suggests the receiver isn't reading data fast enough, causing TCP to measure higher RTT as it waits for window space to open.
RTO: The Retransmission Timeout
The rto:236 field shows the current retransmission timeout (236ms), dynamically calculated from RTT: RTO = SRTT + max(G, 4 * RTTVAR). Understanding RTO helps explain retransmission behavior: an RTO that is too low for the actual path RTT fires spurious retransmits, while an inflated RTO (much higher than the current RTT) indicates TCP is still backing off from previous timeout events.
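The RFC 6298 update rules can be sketched in a few lines of awk. The RTT samples here are made up, and the clock granularity G is assumed to be 4ms; the point is to see how a single latency spike inflates RTO well above the smoothed RTT:

```shell
# Sketch of RFC 6298: SRTT and RTTVAR are exponentially weighted moving
# averages (alpha = 1/8, beta = 1/4). The last sample (300) simulates a spike.
awk 'BEGIN {
  n = split("100 110 90 300", s, " ")   # hypothetical RTT samples in ms
  srtt = s[1]; rttvar = s[1] / 2        # initialization per RFC 6298
  for (i = 2; i <= n; i++) {
    d = srtt - s[i]; if (d < 0) d = -d  # |SRTT - sample|
    rttvar = 0.75 * rttvar + 0.25 * d
    srtt   = 0.875 * srtt + 0.125 * s[i]
  }
  g = 4                                 # assumed clock granularity (ms)
  k = 4 * rttvar; if (k < g) k = g
  rto = srtt + k
  printf "SRTT=%.1f RTTVAR=%.1f RTO=%.1f ms\n", srtt, rttvar, rto
}'
```

One spike pushes RTO to several times the smoothed RTT, which is exactly the "inflated RTO" signature described above: TCP stays conservative until the variance estimate decays.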
MSS and Path MTU Discovery
mss:1288 pmtu:1500 rcvmss:1288 advmss:1448

The MSS (1288 bytes) and PMTU (1500 bytes) tell you about packet sizing along the path.
What these values indicate:
pmtu:1500 but connection stalls after initial data: Real path MTU is smaller (tunnels/VPNs creating a PMTU black hole).
Unusually small MSS on standard Ethernet: A middlebox is clamping MSS values.
Configured 9000 MTU but seeing small MSS: Path doesn't support jumbo frames, or PMTU discovery hasn't completed.
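When you suspect a PMTU black hole, you can probe the path directly. Linux iputils ping's -M do flag sets the DF bit so oversized probes fail instead of fragmenting; the 1472-byte payload below is 1500 minus the 20-byte IP header and 8-byte ICMP header:

```shell
# A 1500-byte MTU leaves 1500 - 20 (IP header) - 8 (ICMP header) of payload:
awk 'BEGIN { mtu = 1500; print mtu - 20 - 8 }'   # prints 1472

# Probe with DF set; succeeds only if every hop supports a 1500-byte MTU:
#   ping -M do -s 1472 <remote_ip>
# Or let tracepath walk the route and report the discovered PMTU:
#   tracepath <remote_ip>
```

Binary-search the -s value downward until probes succeed and you have the real path MTU; compare it against the pmtu value ss is reporting.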
Tool #2: nstat - System-Wide TCP Counters
While ss gives you per-socket information, nstat shows you aggregate TCP counters across all connections. This is invaluable for spotting patterns:
nstat -az | grep -i tcp

The -a flag shows absolute counter values (instead of increments since previous use) and -z shows counters with a value of zero. Run it twice with a few seconds in between to see rates:
nstat -az | grep -i tcp
sleep 5
nstat -az | grep -i tcp

Retransmissions: SACK vs Duplicate ACKs
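Turning two counter snapshots into a rate is a matter of diffing the counter column. The snippet below uses canned sample lines (hypothetical values) so it runs anywhere; in practice, substitute the saved output of two real nstat -az runs:

```shell
# Diff TcpRetransSegs across two snapshots (sample data stands in for real runs):
printf 'TcpRetransSegs 245 0.0\n' > /tmp/nstat.t0   # hypothetical first snapshot
printf 'TcpRetransSegs 290 0.0\n' > /tmp/nstat.t1   # hypothetical second snapshot
awk 'NR == FNR { base[$1] = $2; next }              # first file: record baselines
     $1 in base { print $1, $2 - base[$1], "in interval" }' \
    /tmp/nstat.t0 /tmp/nstat.t1
```

Drop the grep filter and the same awk diffs every counter at once, which is handy for spotting which recovery mechanism is actually firing.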
Look for key counters in the nstat output to determine whether packets are being retransmitted, a “smoking gun” indicator of packet loss.
Key counters:
TcpRetransSegs - Total segments retransmitted
TcpExtTCPSACKRecovery - Fast retransmit via SACK (modern, efficient)
TcpExtTCPSACKReorder - Packet reordering detected via SACK (segments arrived out of order, not lost)
TcpExtTCPRenoRecovery - Fast retransmit via duplicate ACKs (legacy, less efficient)
TcpExtTCPTimeouts - Full RTO expirations (forces slow start, severely impacts throughput)
Understanding the difference: SACK tells the sender exactly which packets are missing, enabling selective retransmission. Without SACK, TCP relies on three duplicate ACKs and may retransmit unnecessarily. Check SACK status: cat /proc/sys/net/ipv4/tcp_sack (should be 1).
Practical example:
TcpRetransSegs 245
TcpExtTCPSACKRecovery 12
TcpExtTCPSACKReorder 243
TcpExtTCPTimeouts 2

This shows 245 retransmitted segments with 12 SACK recovery events (indicating TCP is handling packet loss effectively) and 2 full timeouts. In addition, there were 243 packet reordering events.
While SACK generally does a good job at quickly recovering from data lost in transit, any SACK activity indicates a suboptimal flow. Large numbers of SACK recovery events and retransmitted segments are a key indicator of performance problems and help inform the next troubleshooting steps.
Reordering is another condition that TCP is designed to handle, but can lead to decreased performance. High levels of reorder events indicate significant out-of-order packet delivery, requiring the receiver to buffer and reorder packets before making the data available to the application. This means higher latency and lower throughput.
Putting It All Together: A Diagnostic Workflow
When someone reports slow TCP performance, here's my systematic approach to understanding what's happening:
# While transfer is running
ss -tin dst <remote_ip>

Observe:
Is window scaling enabled and appropriate? (wscale should not be 0,0)
What RTT is TCP measuring?
Are MSS/PMTU values appropriate for the path?
Are there retransmits? (retrans field)
Is Send-Q or Recv-Q consistently full? (indicates which side is limiting throughput)
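Scanning the full ss -ti line by eye gets old fast. A small filter pulls out just the checklist fields; the sample line below is hypothetical and stands in for live output:

```shell
# Extract the checklist fields from an ss -ti info line. For live data, pipe
# the real command through the same filter:
#   ss -tin dst <remote_ip> | tr ' ' '\n' | grep -E '^(wscale|rtt|cwnd|retrans):'
line='cubic wscale:12,7 rto:236 rtt:35.047/19.403 cwnd:10 retrans:0/3 snd_wnd:77824'
echo "$line" | tr ' ' '\n' | grep -E '^(wscale|rtt|cwnd|retrans):'
```

Wrap it in a while sleep 1 loop to watch cwnd and retrans trend over time; the trend during a transfer is far more informative than a single snapshot.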
nstat -az | grep -i tcp; sleep 10; nstat -az | grep -i tcp

Observe:
TcpRetransSegs rate (shows retransmission frequency)
TcpExtTCPTimeouts (indicates severe packet loss or delays)
TcpExtTCPSACKRecovery vs TcpExtTCPRenoRecovery (which recovery mechanism is active)
TcpExtTCPSACKReorder (shows packet reordering detected via SACK)
TcpExtTCPSACKReneging (if non-zero, receiver behavior is inconsistent)
Only after understanding TCP's view of the connection should you move to packet captures. The TCP stack statistics usually tell you what you need to know.
Common Patterns and What They Tell Us
Pattern: High throughput briefly, then drops to almost nothing
Observe: Recv-Q consistently full
Interpretation: Receiver application can't read data fast enough; TCP flow control is limiting the sender
Pattern: Throughput limited regardless of bandwidth
Observe: wscale:0,0 in ss output
Interpretation: Window scaling disabled; TCP window limited to 64KB, capping throughput based on RTT
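The cap is easy to quantify: without scaling, the window tops out at 65535 bytes, so at the 35ms RTT from the earlier sample socket the throughput ceiling is roughly 15 Mbit/s regardless of link speed:

```shell
# Max throughput with an unscaled 64 KB (65535-byte) window at 35 ms RTT:
awk 'BEGIN { printf "%.1f Mbit/s\n", 65535 * 8 / 0.035 / 1e6 }'
# -> 15.0 Mbit/s
```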
Pattern: Many retransmits but ping shows minimal loss
Observe: High TcpExtTCPTimeouts, RTO much higher than current RTT
Interpretation: Previous timeout events have inflated RTO; TCP is being conservative while recovering
Pattern: Transfer stalls after sending some data
Observe: MSS/PMTU values, connection becomes unresponsive
Interpretation: PMTU black hole—packets larger than actual path MTU are being dropped silently
Pattern: Retransmits primarily TcpExtTCPRenoRecovery
Observe: SACK disabled (tcp_sack=0)
Interpretation: TCP using legacy duplicate ACK recovery, less efficient than SACK for multiple losses
A Note on Packet Captures
I've deliberately kept packet captures out of this article because they deserve their own deep dive. Tools like tcpdump and Wireshark are incredibly powerful, but they're also time-consuming and generate massive amounts of data. In my experience, most TCP performance problems can be diagnosed using ss and nstat alone.
That said, there are cases where you absolutely need captures—particularly when you suspect middlebox interference, need to verify congestion control behavior, or want to see the exact timing of events during connection establishment. I'll cover practical packet capture analysis for TCP troubleshooting in a future article.
Conclusion
The next time someone reports slow TCP performance, start by understanding what TCP is experiencing on both endpoints. Use ss to examine per-socket state and nstat to observe system-wide patterns. Look at window scaling, RTT measurements, RTO values, MSS/PMTU settings, and retransmission behavior.
These tools give you direct visibility into TCP's decision-making process and help you answer the critical questions:
What rate is TCP achieving?
Which endpoint is limiting throughput?
How is TCP adapting to the network conditions?
Is packet loss being handled effectively?
Understanding TCP's internal state helps you diagnose issues systematically and explain what's happening under the hood. Sometimes the explanation points to configuration that needs adjustment, sometimes it reveals application behavior that needs attention, and sometimes it shows that TCP is doing exactly what it should given the network conditions it's experiencing.
The goal isn't to find something to blame—it's to understand what's happening so you can have an informed conversation about next steps.
References
Man Pages
ss(8) - Linux manual page https://man7.org/linux/man-pages/man8/ss.8.html Socket statistics utility - part of the iproute2 package
nstat(8) - Linux manual page https://man7.org/linux/man-pages/man8/nstat.8.html Network statistics tool for monitoring kernel SNMP counters
IETF RFCs
RFC 6298 - Computing TCP's Retransmission Timer https://www.rfc-editor.org/rfc/rfc6298.html V. Paxson, M. Allman, J. Chu, M. Sargent (June 2011) Defines the standard algorithm for TCP RTO calculation
RFC 7323 - TCP Extensions for High Performance https://www.rfc-editor.org/rfc/rfc7323.html D. Borman, B. Braden, V. Jacobson, R. Scheffenegger (September 2014) Specifies TCP Window Scale and Timestamps options
RFC 2018 - TCP Selective Acknowledgment Options https://www.rfc-editor.org/rfc/rfc2018.html M. Mathis, J. Mahdavi, S. Floyd, A. Romanow (October 1996) Defines SACK mechanism for TCP
RFC 1191 - Path MTU Discovery https://www.rfc-editor.org/rfc/rfc1191.html J. Mogul, S. Deering (November 1990) Describes technique for dynamically discovering path MTU