All Posts
The Hidden Problem Breaking Your Packet Analysis
The Troubleshooting Scenario
You're deep into troubleshooting a performance issue. The application team reports intermittent slowdowns on a critical database connection. Response times spike from 2ms to 200ms randomly, then return to normal. Users are complaining, and you need answers. You start with TCP analysis (referencing the techniques from my previous articles): SACK blocks are present, indicating selective packet loss; retransmission rate sits at 0.8% (well above the he
robertbmacdonald
Dec 11, 2025 · 7 min read
TCP Troubleshooting with Packet Captures: When Wireshark is Your Only Tool
You're staring at the same problem: a server that should be sending 4Gbps is only pushing 120Mbps. But this time, you don't have server access. Maybe it's a remote site. Maybe it's vendor equipment. Maybe it's a colleague handing you a capture and asking for help. All you have are packet captures from both endpoints. The challenge: extract the same diagnostic insights from PCAPs that you'd get from ss and nstat on Linux. This is practical troubleshooting with Wireshark. No
robertbmacdonald
Dec 3, 2025 · 6 min read
Troubleshooting Slow TCP Transfers on Windows
After publishing my article on Linux TCP troubleshooting using ss and nstat, I got several requests for the Windows equivalent. The short answer: there isn't one. Windows exposes per-socket TCP statistics through the SIO_TCP_INFO API in the form of TCP_INFO_v[01] data structures. These data structures include (some of) the data that we need: per-socket RTT and minimum RTT, congestion window (cwnd), send/receive window sizes, bytes in flight, retransmit counters (fast retransmits
robertbmacdonald
Dec 3, 2025 · 1 min read
Troubleshooting Slow TCP Transfers: A Stack-Level Approach
"Something is wrong with the network. I used to get 4Gbps transfers but now I'm only getting 120Mbps. Did you change something recently?" Sound familiar? If you've spent any time supporting production systems, you've probably heard some variation of this complaint. Before jumping to conclusions about where the problem lies, we need to understand what's actually happening at the TCP layer on both endpoints.
What This Article Is NOT
Before we dive in, let me be crystal clear:
robertbmacdonald
Nov 19, 2025 · 7 min read
Stop Collecting Everything: A Better Approach to Network Telemetry
After 15+ years in network engineering, I've learned that effective telemetry starts with one question: what matters to the business? The telemetry paradox is real. You think collecting everything gives you visibility, and that you might need it "someday", but it creates noise. And noise is expensive: in storage costs, in analyst time, and in your ability to spot real problems.
Start With Business Outcomes, Not Technical Metrics
Before you enable a single collector, ask what the bus
robertbmacdonald
Nov 13, 2025 · 3 min read
When the Network Is Clean: Troubleshooting the Host Network Stack
"...but the network is still slow." You've dropped everything to jump on an urgent troubleshooting conference call. A vital application has started running slowly and the business is hurting. Everyone looks at you and says "the network is slow!" I recently shared how I troubleshoot "the network is slow" complaints in under 30 minutes. Using this data-driven method, the routers and switches in the network are systematically checked and verified. But what happens when the route
robertbmacdonald
Nov 5, 2025 · 9 min read
It's Always The Network (Until You Prove It Isn't)
"The network is slow." I've heard this at 3am, in conference rooms, and in Slack channels more times than I can count. And about 80% of the time, after I dig into it, the network is fine. The problem is somewhere else entirely. But here's the thing: you can't just say "it's not the network" and walk away. You need data to either find the problem or clear the network so troubleshooting can move forward. Over the years, I've developed a systematic approach based on Brendan G
robertbmacdonald
Oct 28, 2025 · 5 min read