Learn Zig Series (#87) - WebSocket Protocol

What will I learn?
- Why WebSocket exists, and what it does that plain HTTP simply cannot;
- How the opening handshake upgrades an ordinary HTTP request into a persistent, two-way connection;
- How to compute the Sec-WebSocket-Accept value with SHA-1 and base64;
- How a WebSocket frame is laid out on the wire, bit by bit;
- How to parse incoming frames, including the 16- and 64-bit extended length forms;
- Why client-to-server frames are masked, and how the XOR masking actually works;
- How to encode text, binary, and control frames (ping, pong, close);
- How to unit-test a protocol implementation without a live browser anywhere in sight.
Requirements
- A working modern computer running macOS, Windows or Ubuntu;
- An installed Zig 0.14+ distribution (download from ziglang.org);
- The ambition to learn Zig programming.
Difficulty
- Intermediate
Curriculum (of the Learn Zig Series):
- Zig Programming Tutorial - ep001 - Intro
- Learn Zig Series (#2) - Hello Zig, Variables and Types
- Learn Zig Series (#3) - Functions and Control Flow
- Learn Zig Series (#4) - Error Handling (Zig's Best Feature)
- Learn Zig Series (#5) - Arrays, Slices, and Strings
- Learn Zig Series (#6) - Structs, Enums, and Tagged Unions
- Learn Zig Series (#7) - Memory Management and Allocators
- Learn Zig Series (#8) - Pointers and Memory Layout
- Learn Zig Series (#9) - Comptime (Zig's Superpower)
- Learn Zig Series (#10) - Project Structure, Modules, and File I/O
- Learn Zig Series (#11) - Mini Project: Building a Step Sequencer
- Learn Zig Series (#12) - Testing and Test-Driven Development
- Learn Zig Series (#13) - Interfaces via Type Erasure
- Learn Zig Series (#14) - Generics with Comptime Parameters
- Learn Zig Series (#15) - The Build System (build.zig)
- Learn Zig Series (#16) - Sentinel-Terminated Types and C Strings
- Learn Zig Series (#17) - Packed Structs and Bit Manipulation
- Learn Zig Series (#18b) - Addendum: Async Returns in Zig 0.16
- Learn Zig Series (#19) - SIMD with @Vector
- Learn Zig Series (#20) - Working with JSON
- Learn Zig Series (#21) - Networking and TCP Sockets
- Learn Zig Series (#22) - Hash Maps and Data Structures
- Learn Zig Series (#23) - Iterators and Lazy Evaluation
- Learn Zig Series (#24) - Logging, Formatting, and Debug Output
- Learn Zig Series (#25) - Mini Project: HTTP Status Checker
- Learn Zig Series (#26) - Writing a Custom Allocator
- Learn Zig Series (#27) - C Interop: Calling C from Zig
- Learn Zig Series (#28) - C Interop: Exposing Zig to C
- Learn Zig Series (#29) - Inline Assembly and Low-Level Control
- Learn Zig Series (#30) - Thread Safety and Atomics
- Learn Zig Series (#31) - Memory-Mapped I/O and Files
- Learn Zig Series (#32) - Compile-Time Reflection with @typeInfo
- Learn Zig Series (#33) - Building a State Machine with Tagged Unions
- Learn Zig Series (#34) - Performance Profiling and Optimization
- Learn Zig Series (#35) - Cross-Compilation and Target Triples
- Learn Zig Series (#36) - Mini Project: CLI Task Runner
- Learn Zig Series (#37) - Markdown to HTML: Tokenizer and Lexer
- Learn Zig Series (#38) - Markdown to HTML: Parser and AST
- Learn Zig Series (#39) - Markdown to HTML: Renderer and CLI
- Learn Zig Series (#40) - Key-Value Store: In-Memory Store
- Learn Zig Series (#41) - Key-Value Store: Write-Ahead Log
- Learn Zig Series (#42) - Key-Value Store: TCP Server
- Learn Zig Series (#43) - Key-Value Store: Client Library and Benchmarks
- Learn Zig Series (#44) - Image Tool: Reading and Writing PPM/BMP
- Learn Zig Series (#45) - Image Tool: Pixel Operations
- Learn Zig Series (#46) - Image Tool: CLI Pipeline
- Learn Zig Series (#47) - Build a Shell: Parsing Commands
- Learn Zig Series (#48) - Build a Shell: Process Spawning
- Learn Zig Series (#49) - Build a Shell: Built-in Commands
- Learn Zig Series (#50) - Build a Shell: Job Control and Signals
- Learn Zig Series (#51) - HTTP Server: Accept Loop and Parsing
- Learn Zig Series (#52) - HTTP Server: Router and Responses
- Learn Zig Series (#53) - HTTP Server: Static Files and MIME
- Learn Zig Series (#54) - HTTP Server: Middleware and Logging
- Learn Zig Series (#55) - ECS Game Engine: Architecture
- Learn Zig Series (#56) - ECS Game Engine: Component Storage
- Learn Zig Series (#57) - ECS Game Engine: Systems and Queries
- Learn Zig Series (#58) - ECS Game Engine: Terminal Rendering
- Learn Zig Series (#59) - Assembler: Instruction Encoding
- Learn Zig Series (#60) - Assembler: Two-Pass Assembly
- Learn Zig Series (#61) - Assembler: Disassembler and Binary Inspector
- Learn Zig Series (#62) - File Systems: Reading Directories and Metadata
- Learn Zig Series (#63) - File Watching: Detecting Changes
- Learn Zig Series (#64) - Process Management: Fork, Exec, Wait
- Learn Zig Series (#65) - Pipes and Inter-Process Communication
- Learn Zig Series (#66) - Shared Memory and Semaphores
- Learn Zig Series (#67) - Signal Handling Deep Dive
- Learn Zig Series (#68) - Unix Domain Sockets
- Learn Zig Series (#69) - Daemonization: Background Services
- Learn Zig Series (#70) - Timers and Scheduling
- Learn Zig Series (#71) - Resource Limits and Capabilities
- Learn Zig Series (#72) - System Call Wrappers
- Learn Zig Series (#73) - seccomp and Sandboxing
- Learn Zig Series (#74) - ptrace: Process Tracing
- Learn Zig Series (#75) - Reading Kernel State from /proc and /sys
- Learn Zig Series (#76) - Mini Project: Process Monitor
- Learn Zig Series (#77) - Mini Project: File Sync Tool - Part 1
- Learn Zig Series (#78) - Mini Project: File Sync Tool - Part 2: Delta Transfer
- Learn Zig Series (#79) - Mini Project: File Sync Tool - Part 3: Network Protocol
- Learn Zig Series (#80) - Mini Project: File Sync Tool - Part 4: Polish
- Learn Zig Series (#81) - UDP Sockets and Datagrams
- Learn Zig Series (#82) - DNS Resolver from Scratch
- Learn Zig Series (#83) - DNS Server Implementation
- Learn Zig Series (#84) - HTTP/1.1 Deep Dive
- Learn Zig Series (#85) - HTTP/2 Frames and Streams
- Learn Zig Series (#86) - TLS via C Interop
- Learn Zig Series (#87) - WebSocket Protocol (this post)
Solutions to Episode 86 Exercises
Last episode I left you three exercises on top of the TlsClient we wrapped around OpenSSL -- ALPN negotiation, a non-blocking client, and a tiny HTTPS server. They all build on that same struct, so keep the episode 86 file open beside this one.
Exercise 1: Add ALPN negotiation
// Reuses TlsClient and the `c` namespace from episode 86.
/// Advertise the protocols we speak, most-preferred first. The wire format is
/// NOT a comma-separated string: it's a sequence of length-prefixed entries,
/// each one byte of length followed by that many ASCII bytes.
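pub fn setAlpn(self: *TlsClient) !void {
const protos = "\x02h2\x08http/1.1"; // "h2", then "http/1.1"
// Unlike most OpenSSL calls, SSL_set_alpn_protos returns 0 on SUCCESS.
if (c.SSL_set_alpn_protos(self.ssl, protos, protos.len) != 0) return error.AlpnSetupFailed;
}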
/// After a successful handshake, ask which protocol the server agreed to.
pub fn selectedAlpn(self: *TlsClient) []const u8 {
var data: [*c]const u8 = null;
var len: c_uint = 0;
c.SSL_get0_alpn_selected(self.ssl, &data, &len);
if (len == 0) return ""; // server ignored ALPN -> fall back to http/1.1
return data[0..len];
}
The trap people fall into is the wire format. ALPN is not "h2,http/1.1" -- it's length-prefixed, so h2 becomes the three bytes 0x02 'h' '2'. Get that wrong and OpenSSL silently advertises garbage and the server picks nothing. After the handshake, SSL_get0_alpn_selected hands back a pointer-plus-length (no NUL terminator, hence the data[0..len] slice), and that single string is how a real client decides between speaking HTTP/2 or HTTP/1.1 over the same port 443.
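If you'd rather not hand-count length bytes, you can build the wire format at runtime. Here's a minimal sketch. encodeAlpn is a hypothetical helper of my own, not an OpenSSL function, and it only produces the byte string you would then pass to SSL_set_alpn_protos:

```zig
const std = @import("std");

/// Hypothetical helper: encode protocol names into ALPN wire format.
/// Each entry becomes one length byte followed by that many ASCII bytes.
fn encodeAlpn(protos: []const []const u8, out: []u8) ![]u8 {
    var i: usize = 0;
    for (protos) |p| {
        // A single length byte means every name must be 1..255 bytes long.
        if (p.len == 0 or p.len > 255) return error.InvalidProtocolName;
        if (out.len < i + 1 + p.len) return error.BufferTooSmall;
        out[i] = @intCast(p.len);
        @memcpy(out[i + 1 ..][0..p.len], p);
        i += 1 + p.len;
    }
    return out[0..i];
}

test "encodeAlpn reproduces the hand-written list" {
    var buf: [32]u8 = undefined;
    const wire = try encodeAlpn(&.{ "h2", "http/1.1" }, &buf);
    try std.testing.expectEqualSlices(u8, "\x02h2\x08http/1.1", wire);
}
```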
Exercise 2: Make the client non-blocking
const std = @import("std");
const posix = std.posix;
/// Flip the socket into non-blocking mode (recall O_NONBLOCK from the I/O episodes).
pub fn setNonBlocking(fd: posix.fd_t) !void {
var flags = try posix.fcntl(fd, posix.F.GETFL, 0);
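// O_NONBLOCK sits at a different bit per OS (0o4000 on Linux, 0x4 on macOS), so ask
// the packed posix.O struct for its position instead of hard-coding 1 << 11.
flags |= 1 << @bitOffsetOf(posix.O, "NONBLOCK");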
_ = try posix.fcntl(fd, posix.F.SETFL, flags);
}
/// Park on poll() until the fd is ready in the direction OpenSSL asked for.
fn waitReady(fd: posix.fd_t, want_write: bool) !void {
var pfd = [_]posix.pollfd{.{
.fd = fd,
.events = if (want_write) posix.POLL.OUT else posix.POLL.IN,
.revents = 0,
}};
_ = try posix.poll(&pfd, -1);
}
/// Retry the handshake, suspending on the fd between WantRead/WantWrite, so a
/// single thread can drive many connections at once.
pub fn handshakeNonBlocking(self: *TlsClient) !void {
while (true) {
self.handshake() catch |err| switch (err) {
error.WantRead => { try waitReady(self.socket, false); continue; },
error.WantWrite => { try waitReady(self.socket, true); continue; },
else => return err,
};
return;
}
}
This is where last episode's decision to surface WantRead and WantWrite as distinct Zig errors pays off. A non-blocking socket never sleeps inside OpenSSL; instead SSL_connect returns "I need to read more" or "I need to write", and we translate that into a poll on the right event. The whole point is that between two such waits the thread is free to service other connections -- which is exactly the muscle we'll need once we have long-lived sockets that stay open for minutes.
Exercise 3: Build a tiny HTTPS server
/// The server side mirrors the client: a different method, a loaded cert+key,
/// and SSL_accept instead of SSL_connect.
pub fn initServerCtx(cert_path: [:0]const u8, key_path: [:0]const u8) !*c.SSL_CTX {
const ctx = c.SSL_CTX_new(c.TLS_server_method()) orelse return error.ContextInit;
errdefer c.SSL_CTX_free(ctx);
if (c.SSL_CTX_use_certificate_file(ctx, cert_path, c.SSL_FILETYPE_PEM) != 1)
return error.ContextInit;
if (c.SSL_CTX_use_PrivateKey_file(ctx, key_path, c.SSL_FILETYPE_PEM) != 1)
return error.ContextInit;
return ctx;
}
pub fn serveOne(ctx: *c.SSL_CTX, client_fd: std.posix.socket_t) !void {
const ssl = c.SSL_new(ctx) orelse return error.ContextInit;
defer c.SSL_free(ssl);
_ = c.SSL_set_fd(ssl, @intCast(client_fd));
if (c.SSL_accept(ssl) != 1) return error.HandshakeFailed; // server-side handshake
const resp = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
_ = c.SSL_write(ssl, resp.ptr, @intCast(resp.len));
}
The whole asymmetry of a TLS server versus a client is two function names: TLS_server_method instead of TLS_client_method, and SSL_accept instead of SSL_connect. Everything else -- the SSL_CTX, the per-connection SSL, the defer/errdefer cleanup -- is identical. Generate the throwaway cert with openssl req -x509 -newkey rsa:2048 -nodes -keyout key.pem -out cert.pem -days 1 -subj "/CN=localhost" (the -subj flag skips the interactive certificate prompts) and point a browser at https://localhost:port (it'll warn about the self-signed cert, which is expected).
At the very end of episode 86 I wrote that the next step was "the upgrade dance that turns an ordinary HTTPS request into a persistent, bidirectional channel... the same wss:// way your browser does it." Well -- here we are ;-) Today we build exactly that channel: WebSocket. And the lovely thing is that every layer underneath it is already in our hands. TCP sockets came in episode 21, the HTTP request parsing in episode 84, binary framing in episode 85, and TLS in episode 86. WebSocket is the protocol that stitches them into something a chat app or a live dashboard can actually use.
Why WebSocket exists at all
Plain HTTP has one structural limitation that no amount of cleverness fully removes: the client asks, the server answers, and then the exchange is over. The server cannot speak first. If you want the server to push you a new chat message the instant it arrives, classic HTTP forces ugly workarounds -- polling every second (wasteful and laggy), or long-polling (a request that the server holds open until it has something to say, then you immediately reopen another). Both fight the protocol instead of working with it.
WebSocket solves this honestly. You start with a normal HTTP request, ask the server to upgrade the connection, and if it agrees, that same TCP socket stops speaking HTTP and starts speaking a tiny, symmetric, message-oriented protocol where either side can send a message at any time. No new connection, no polling, no headers repeated on every message. One socket, kept open, bytes flowing both directions. That's it. Bam, jonguh!
The key mental shift: after the handshake, WebSocket is not request/response anymore. It's a bidirectional stream of discrete messages, each one wrapped in a small binary frame. So this episode has two halves -- the one-time HTTP handshake that opens the door, and the framing protocol that carries everything afterward.
The opening handshake
A WebSocket connection starts life as an ordinary HTTP/1.1 GET, the kind we parsed back in episode 84, but with a few special headers:
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The Upgrade: websocket and Connection: Upgrade headers signal intent. Sec-WebSocket-Version: 13 pins the protocol version (13 is the version RFC 6455 standardises). The interesting one is Sec-WebSocket-Key: 16 random bytes, base64-encoded by the client into 24 characters. It is not a security token (it's sent in the clear, so it secures nothing); its only job is to prove that the server actually understood the WebSocket handshake and didn't just blindly echo a cached HTTP response.
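Before computing anything, a server should check that these headers are actually present and sane. Below is a minimal sketch. It assumes the raw request head (everything up to the blank line) is already in a buffer, as in episode 84. findHeader, upgradeKey and the error names are hypothetical. Header names are matched case-insensitively, as HTTP requires, and the Connection value is treated as a comma-separated token list.

```zig
const std = @import("std");

/// Hypothetical helper: trimmed value of header `name` in a raw HTTP/1.1 request head.
fn findHeader(head: []const u8, name: []const u8) ?[]const u8 {
    var lines = std.mem.splitSequence(u8, head, "\r\n");
    _ = lines.next(); // skip the request line, e.g. "GET /chat HTTP/1.1"
    while (lines.next()) |line| {
        if (line.len == 0) break; // the blank line ends the header block
        const colon = std.mem.indexOfScalar(u8, line, ':') orelse continue;
        if (std.ascii.eqlIgnoreCase(line[0..colon], name))
            return std.mem.trim(u8, line[colon + 1 ..], " \t");
    }
    return null;
}

/// Hypothetical server-side check of the upgrade request. Returns the
/// Sec-WebSocket-Key, or an error saying why the handshake must be refused.
fn upgradeKey(head: []const u8) ![]const u8 {
    const upgrade = findHeader(head, "Upgrade") orelse return error.NotAnUpgrade;
    if (!std.ascii.eqlIgnoreCase(upgrade, "websocket")) return error.NotAnUpgrade;

    // Connection is a token list, e.g. "keep-alive, Upgrade".
    const connection = findHeader(head, "Connection") orelse return error.NotAnUpgrade;
    var tokens = std.mem.tokenizeAny(u8, connection, ", ");
    const has_upgrade = while (tokens.next()) |t| {
        if (std.ascii.eqlIgnoreCase(t, "upgrade")) break true;
    } else false;
    if (!has_upgrade) return error.NotAnUpgrade;

    const version = findHeader(head, "Sec-WebSocket-Version") orelse return error.BadVersion;
    if (!std.mem.eql(u8, version, "13")) return error.BadVersion;

    // The key must be valid base64 that decodes to exactly 16 bytes.
    const key = findHeader(head, "Sec-WebSocket-Key") orelse return error.BadKey;
    const decoder = std.base64.standard.Decoder;
    const raw_len = decoder.calcSizeForSlice(key) catch return error.BadKey;
    if (raw_len != 16) return error.BadKey;
    var raw: [16]u8 = undefined;
    decoder.decode(&raw, key) catch return error.BadKey;
    return key;
}

test "sample request yields its key" {
    // The RFC 6455 example, with the Connection value varied to show the token list.
    const req = "GET /chat HTTP/1.1\r\n" ++
        "Host: example.com\r\n" ++
        "Upgrade: websocket\r\n" ++
        "Connection: keep-alive, Upgrade\r\n" ++
        "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n" ++
        "Sec-WebSocket-Version: 13\r\n\r\n";
    try std.testing.expectEqualStrings("dGhlIHNhbXBsZSBub25jZQ==", try upgradeKey(req));
    try std.testing.expectError(error.NotAnUpgrade, upgradeKey("GET / HTTP/1.1\r\nHost: x\r\n\r\n"));
}
```

On success, the returned key can go straight into the accept computation. On any error, RFC 6455 tells the server to stop processing the handshake and answer with an ordinary HTTP error such as 400 Bad Request.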
The server proves comprehension by a fixed ritual. Take the client's key string, concatenate a magic GUID defined in the RFC, SHA-1 the result, base64-encode the 20-byte digest, and send it back in Sec-WebSocket-Accept. The magic string is a constant -- 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 -- chosen precisely because no naive HTTP cache would ever append it on its own. Here's the computation in pure Zig, using the standard library's SHA-1 and base64 (no C interop needed this time):
const std = @import("std");
const ws_magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";
/// Compute Sec-WebSocket-Accept from the client's Sec-WebSocket-Key.
/// SHA-1 of (key ++ magic) is 20 bytes; base64 of 20 bytes is exactly 28 chars.
pub fn computeAccept(key: []const u8, out: *[28]u8) void {
var sha1 = std.crypto.hash.Sha1.init(.{});
sha1.update(key);
sha1.update(ws_magic);
var digest: [20]u8 = undefined;
sha1.final(&digest);
_ = std.base64.standard.Encoder.encode(out, &digest);
}
Note how the update calls let us hash the key and the magic constant without first allocating a joined buffer -- a streaming hash is the natural fit (we met the same pattern with file hashing back in the sync-tool project). The server's reply is then a bog-standard 101 status with the upgrade headers:
pub fn writeHandshakeResponse(key: []const u8, out: []u8) ![]u8 {
var accept: [28]u8 = undefined;
computeAccept(key, &accept);
return std.fmt.bufPrint(out,
"HTTP/1.1 101 Switching Protocols\r\n" ++
"Upgrade: websocket\r\n" ++
"Connection: Upgrade\r\n" ++
"Sec-WebSocket-Accept: {s}\r\n\r\n",
.{accept},
);
}
Once the client receives that 101 Switching Protocols, both sides forget HTTP entirely. The socket is now a WebSocket. From here on, every byte is part of the framing protocol.
The frame format, bit by bit
This is where episode 17 (packed structs and bit manipulation) and episode 85 (binary framing) come roaring back. A WebSocket frame is compact -- the header is as small as 2 bytes -- and it packs several fields into individual bits of the first two octets:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len | Extended payload length |
|I|S|S|S| (4) |A| (7) | (16/64) |
|N|V|V|V| |S| | (if payload len==126/127) |
| |1|2|3| |K| | |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
: Masking-key (4 bytes, if MASK set) :
+---------------------------------------------------------------+
: Payload Data continued ... :
+---------------------------------------------------------------+
Let me walk the fields. The first byte holds the FIN bit (1 means "this is the final fragment of a message" -- WebSocket can split one big message across frames), three reserved bits (RSV1-3, zero unless an extension negotiated otherwise), and a 4-bit opcode. The opcode is the whole vocabulary of the protocol, so it's a perfect non-exhaustive enum (episode 6):
pub const Opcode = enum(u4) {
continuation = 0x0, // a continuation of a fragmented message
text = 0x1, // UTF-8 text payload
binary = 0x2, // raw binary payload
close = 0x8, // closing handshake
ping = 0x9, // heartbeat request
pong = 0xA, // heartbeat reply
_, // forward-compatible: unknown opcodes don't crash us
};
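Since episode 17 gave us packed structs, it's worth noting that the two fixed header bytes map onto them directly. This is an alternative view, not what the parser later in this episode uses (it sticks with explicit masks). Byte0 and Byte1 are hypothetical names. The one gotcha is ordering: Zig packs fields starting at the least significant bit, so they appear in the reverse of the diagram's left-to-right order.

```zig
const std = @import("std");

/// Hypothetical packed view of the first header byte (LSB-first field order).
const Byte0 = packed struct(u8) {
    opcode: u4, // bits 0-3
    rsv3: bool, // bit 4 (0x10)
    rsv2: bool, // bit 5 (0x20)
    rsv1: bool, // bit 6 (0x40)
    fin: bool, // bit 7 (0x80)
};

/// Hypothetical packed view of the second header byte.
const Byte1 = packed struct(u8) {
    len7: u7, // 0-125, or the 126/127 extended-length markers
    mask: bool, // bit 7 (0x80)
};

test "decode the two fixed header bytes of a masked text frame" {
    const b0: Byte0 = @bitCast(@as(u8, 0x81));
    try std.testing.expect(b0.fin and !b0.rsv1 and !b0.rsv2 and !b0.rsv3);
    try std.testing.expectEqual(@as(u4, 0x1), b0.opcode);
    const b1: Byte1 = @bitCast(@as(u8, 0x82));
    try std.testing.expect(b1.mask);
    try std.testing.expectEqual(@as(u7, 2), b1.len7);
}
```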
The second byte starts with the MASK bit, then a 7-bit payload length. That length has three forms, which is the one genuinely fiddly part of parsing: if it's 0-125, that's the actual length. If it's exactly 126, the real length is the next 2 bytes as a big-endian u16. If it's 127, the real length is the next 8 bytes as a big-endian u64 (whose most significant bit must be 0). This variable-length trick keeps small frames tiny while still allowing gigabyte payloads -- the same philosophy as varint encodings, just with three fixed buckets instead of a continuation bit.
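Combining the three buckets with the optional 4-byte masking key, the header size is a pure function of the payload length. Here's a small sketch that doubles as a cheat sheet. headerLen is a hypothetical helper, not used by the parser below:

```zig
const std = @import("std");

/// Hypothetical helper: header bytes in front of a payload of `payload_len`
/// bytes, including the 4-byte masking key when `masked` is set.
fn headerLen(payload_len: u64, masked: bool) usize {
    const base: usize = if (payload_len <= 125)
        2 // length fits in the 7-bit field
    else if (payload_len <= 0xFFFF)
        4 // 126 marker + big-endian u16
    else
        10; // 127 marker + big-endian u64
    const mask_key: usize = if (masked) 4 else 0;
    return base + mask_key;
}

test "header sizes across the three buckets" {
    try std.testing.expectEqual(@as(usize, 2), headerLen(125, false));
    try std.testing.expectEqual(@as(usize, 4), headerLen(126, false));
    try std.testing.expectEqual(@as(usize, 8), headerLen(0xFFFF, true));
    try std.testing.expectEqual(@as(usize, 10), headerLen(0x10000, false));
    try std.testing.expectEqual(@as(usize, 14), headerLen(1 << 20, true));
}
```

For example, a client's masked 300-byte message costs 2 + 2 + 4 = 8 header bytes, and the length 300 travels as the two big-endian bytes 0x01 0x2C.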
Masking: why client frames are scrambled
Here's a rule that surprises everyone the first time: every frame a client sends to a server MUST be masked, and every frame a server sends back MUST NOT be. Masking means XOR-ing each payload byte with one of four rotating key bytes that the client picks at random and includes in the frame.
Why on earth? It's not encryption -- the mask key is right there in the frame, so anyone reading the bytes can trivially unmask. The real reason is a defence against a specific attack on intermediaries. Before WebSocket was hardened, a malicious page could craft payloads that, passing through an old caching proxy that half-understood HTTP, looked enough like a fake HTTP request to poison the proxy's cache for other users. Forcing the client to XOR its payload with a fresh random key makes the bytes-on-the-wire unpredictable, so an attacker can't reliably smuggle a chosen plaintext past a confused proxy. The server, sitting at the trusted end, has no such worry and never masks.
The masking itself is delightfully simple -- byte i is XOR-ed with mask[i % 4]:
fn applyMask(payload: []u8, mask: [4]u8) void {
for (payload, 0..) |*byte, i| {
byte.* ^= mask[i % 4];
}
}
Because XOR is its own inverse, the same function both masks and unmasks. The server calls it to recover the plaintext a client sent; a client would call it (with a random key) before sending. Nota bene: that i % 4 is on the hot path for large payloads -- we'll come back to it in the performance section.
Parsing an incoming frame
Now we assemble the pieces. Our parser takes a byte buffer (whatever we've read off the socket so far) and either returns a fully-decoded frame plus how many bytes it consumed, or null to mean "not enough bytes yet, come back when you've read more" -- the same incremental-reader contract the HTTP/2 FrameReader used in episode 85. Honest error handling (episode 4) covers the malformed cases:
pub const Frame = struct {
fin: bool,
opcode: Opcode,
payload: []u8, // already unmasked, points into the input buffer
};
pub const Parsed = struct { frame: Frame, consumed: usize };
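/// Upper bound on one frame's payload: an example cap, tune it per application.
const max_payload_len: u64 = 16 * 1024 * 1024;
pub fn parseFrame(buf: []u8) !?Parsed {
if (buf.len < 2) return null; // need at least the 2-byte header
const fin = (buf[0] & 0x80) != 0;
if ((buf[0] & 0x70) != 0) return error.ReservedBitsSet; // RSV1-3 need a negotiated extension
const opcode: Opcode = @enumFromInt(@as(u4, @truncate(buf[0] & 0x0F)));
const masked = (buf[1] & 0x80) != 0;
var len: u64 = buf[1] & 0x7F;
// Control frames (opcode 0x8 and up) must be unfragmented and carry at most 125 bytes.
if ((buf[0] & 0x08) != 0 and (!fin or len > 125)) return error.InvalidControlFrame;
var off: usize = 2;
if (len == 126) {
if (buf.len < off + 2) return null;
len = std.mem.readInt(u16, buf[off..][0..2], .big);
off += 2;
} else if (len == 127) {
if (buf.len < off + 8) return null;
len = std.mem.readInt(u64, buf[off..][0..8], .big);
off += 8;
}
// Reject oversized claims before doing arithmetic with them; this also makes
// the @intCast below safe on 32-bit targets.
if (len > max_payload_len) return error.FrameTooLarge;
var mask: [4]u8 = .{ 0, 0, 0, 0 };
if (masked) {
if (buf.len < off + 4) return null;
@memcpy(&mask, buf[off..][0..4]);
off += 4;
}
const total = off + @as(usize, @intCast(len));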
if (buf.len < total) return null; // payload not fully arrived yet
const payload = buf[off..total];
if (masked) applyMask(payload, mask);
return Parsed{
.frame = .{ .fin = fin, .opcode = opcode, .payload = payload },
.consumed = total,
};
}
Every return null is a "would block" -- the frame straddles more data than we've buffered, so the caller reads more and tries again. That pattern is what lets one parser sit on top of a stream socket where reads arrive in arbitrary chunks. The @truncate(buf[0] & 0x0F) pulls the low four bits into the u4 the opcode enum expects, and @enumFromInt lands on _ for any opcode we don't recognise rather than panicking.
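You can see the same contract in isolation with a header-only peek. It reports how many bytes the whole frame occupies, or null while even the header is incomplete. frameSize is a hypothetical helper, not part of the parser above. A caller could use it to decide how much to read before calling parseFrame:

```zig
const std = @import("std");

/// Hypothetical helper: total bytes (header + mask key + payload) of the frame
/// at the start of `buf`, or null if the header itself hasn't fully arrived.
/// It mirrors parseFrame's length logic but never touches the payload.
fn frameSize(buf: []const u8) ?u64 {
    if (buf.len < 2) return null;
    var len: u64 = buf[1] & 0x7F;
    var off: u64 = 2;
    if (len == 126) {
        if (buf.len < 4) return null;
        len = std.mem.readInt(u16, buf[2..4], .big);
        off = 4;
    } else if (len == 127) {
        if (buf.len < 10) return null;
        len = std.mem.readInt(u64, buf[2..10], .big);
        off = 10;
    }
    if ((buf[1] & 0x80) != 0) off += 4; // the 4-byte masking key
    return off +| len; // saturate rather than overflow on a hostile u64 length
}

test "frame size is known as soon as the header is" {
    // The masked "Hi" frame used in this episode's tests: 2 + 4 + 2 bytes.
    const wire = [_]u8{ 0x81, 0x82, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x93 };
    try std.testing.expectEqual(@as(?u64, null), frameSize(wire[0..1]));
    try std.testing.expectEqual(@as(?u64, 8), frameSize(wire[0..2]));
    // A 126-form header needs all 4 bytes before the length is known.
    const medium = [_]u8{ 0x82, 126, 0x01, 0x00 };
    try std.testing.expectEqual(@as(?u64, null), frameSize(medium[0..3]));
    try std.testing.expectEqual(@as(?u64, 4 + 256), frameSize(&medium));
}
```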
Encoding frames to send
The server side (unmasked) is the mirror image, and shorter because we don't mask. We set FIN, write the opcode, choose the right length encoding, and copy the payload:
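pub fn encodeFrame(opcode: Opcode, payload: []const u8, out: []u8) !usize {
// Header size depends on the length encoding: 2 bytes for 0-125,
// 4 bytes up to 0xFFFF, 10 bytes beyond that.
const header_len: usize = if (payload.len <= 125) 2 else if (payload.len <= 0xFFFF) 4 else 10;
// Check capacity before writing anything, so a tiny `out` is never indexed out of bounds.
if (out.len < header_len + payload.len) return error.BufferTooSmall;
out[0] = 0x80 | @as(u8, @intFromEnum(opcode)); // FIN=1, single unfragmented frame
if (payload.len <= 125) {
out[1] = @intCast(payload.len);
} else if (payload.len <= 0xFFFF) {
out[1] = 126;
std.mem.writeInt(u16, out[2..4], @intCast(payload.len), .big);
} else {
out[1] = 127;
std.mem.writeInt(u64, out[2..10], payload.len, .big);
}
@memcpy(out[header_len..][0..payload.len], payload);
return header_len + payload.len;
}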
Because we're the server, the MASK bit in out[1] stays 0 -- we never set it, so we never write a masking key. The 0x80 on the first byte is the FIN flag, meaning "complete message in one frame", which is what you want 99% of the time (fragmentation is for streaming a message whose length you don't know up front).
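For completeness, here is what FIN=0 looks like on the wire. The sketch below sends the text message Hello as two unmasked server frames. The first carries the text opcode with FIN clear; the second carries the continuation opcode with FIN set. encodeSmallFrame is a hypothetical helper, deliberately limited to payloads of 125 bytes or fewer:

```zig
const std = @import("std");

/// Hypothetical helper: one small (<= 125 byte) unmasked frame with an explicit
/// FIN flag, so a message can be spread across several frames.
fn encodeSmallFrame(fin: bool, opcode: u4, payload: []const u8, out: []u8) !usize {
    if (payload.len > 125) return error.PayloadTooLarge; // 7-bit length form only
    if (out.len < 2 + payload.len) return error.BufferTooSmall;
    out[0] = (if (fin) @as(u8, 0x80) else 0) | opcode;
    out[1] = @intCast(payload.len); // MASK bit stays 0: server-to-client
    @memcpy(out[2..][0..payload.len], payload);
    return 2 + payload.len;
}

test "a text message split into two fragments" {
    var buf: [16]u8 = undefined;
    // First fragment: FIN=0, opcode=text (0x1).
    var n = try encodeSmallFrame(false, 0x1, "Hel", &buf);
    try std.testing.expectEqualSlices(u8, &.{ 0x01, 0x03, 'H', 'e', 'l' }, buf[0..n]);
    // Final fragment: FIN=1, opcode=continuation (0x0).
    n = try encodeSmallFrame(true, 0x0, "lo", &buf);
    try std.testing.expectEqualSlices(u8, &.{ 0x80, 0x02, 'l', 'o' }, buf[0..n]);
}
```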
Control frames: close, ping, pong
Three opcodes are control frames, and they have two extra rules: their payload must be 125 bytes or fewer, and they must never be fragmented. Ping and pong are the heartbeat -- either side sends a ping, the other must reply with a pong echoing the same payload, which lets you detect a half-dead connection that TCP hasn't noticed yet. Close is the polite shutdown: an optional 2-byte big-endian status code followed by a UTF-8 reason.
/// Build a close frame: a 2-byte big-endian status code plus an optional reason.
/// 1000 = normal, 1001 = going away, 1002 = protocol error (see RFC 6455 section 7.4).
pub fn encodeClose(code: u16, reason: []const u8, out: []u8) !usize {
var payload: [125]u8 = undefined;
if (reason.len > 123) return error.ReasonTooLong; // 2 bytes go to the code
std.mem.writeInt(u16, payload[0..2], code, .big);
@memcpy(payload[2..][0..reason.len], reason);
return encodeFrame(.close, payload[0 .. 2 + reason.len], out);
}
/// A pong MUST echo the ping's payload verbatim.
pub fn encodePong(ping_payload: []const u8, out: []u8) !usize {
return encodeFrame(.pong, ping_payload, out);
}
A correct WebSocket endpoint answers a close with its own close and then stops sending, and answers a ping with a pong promptly. Those little courtesies are what keep a long-lived connection healthy instead of silently rotting behind a NAT timeout.
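Those courtesy rules fit in a tiny decision function. Here's a sketch, assuming a hypothetical Reply descriptor that a connection loop would later turn into bytes with the encoders above. It answers a ping with a pong carrying the same payload. It answers a close with a close that echoes the peer's 2-byte status code, which RFC 6455 describes as the typical response. Everything else gets no reply:

```zig
const std = @import("std");

/// Hypothetical reply descriptor: the control opcode to send back and its payload.
const Reply = struct { opcode: u4, payload: []const u8 };

/// Sketch of the replies owed for incoming frames; null means nothing is due.
fn controlReply(opcode: u4, payload: []const u8) ?Reply {
    return switch (opcode) {
        // ping -> pong carrying the same payload, verbatim
        0x9 => Reply{ .opcode = 0xA, .payload = payload },
        // close -> close echoing the 2-byte status code; a 1-byte close payload
        // is malformed, so in that case echo nothing
        0x8 => blk: {
            const n: usize = if (payload.len >= 2) 2 else 0;
            break :blk Reply{ .opcode = 0x8, .payload = payload[0..n] };
        },
        // pongs need no answer, and data frames aren't handled here
        else => null,
    };
}

test "ping gets a pong, close gets a close" {
    const pong = controlReply(0x9, "beat").?;
    try std.testing.expectEqual(@as(u4, 0xA), pong.opcode);
    try std.testing.expectEqualStrings("beat", pong.payload);
    // 0x03E8 = 1000 (normal closure), followed by a reason we don't echo.
    const close = controlReply(0x8, &.{ 0x03, 0xE8, 'b', 'y', 'e' }).?;
    try std.testing.expectEqualSlices(u8, &.{ 0x03, 0xE8 }, close.payload);
    try std.testing.expect(controlReply(0xA, "") == null);
}
```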
Testing without a browser
The beauty of pushing all of this into pure functions is that the entire protocol is testable with byte arrays -- no socket, no browser, no live peer. First, the handshake against the canonical example straight out of RFC 6455 (this exact key/accept pair is in the spec, so it's a perfect regression anchor):
test "accept value matches the RFC 6455 example" {
var out: [28]u8 = undefined;
computeAccept("dGhlIHNhbXBsZSBub25jZQ==", &out);
try std.testing.expectEqualStrings("s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", &out);
}
Then a masked client frame carrying the text Hi. The mask is 37 fa 21 3d; H (0x48) XOR 0x37 is 0x7f, i (0x69) XOR 0xfa is 0x93, so the masked bytes on the wire are 7f 93. Parsing must unmask them back to Hi:
test "parse a masked client text frame" {
var buf = [_]u8{ 0x81, 0x82, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x93 };
// 0x81 = FIN+text, 0x82 = MASK bit + length 2
const p = (try parseFrame(&buf)).?;
try std.testing.expect(p.frame.fin);
try std.testing.expectEqual(Opcode.text, p.frame.opcode);
try std.testing.expectEqualStrings("Hi", p.frame.payload);
try std.testing.expectEqual(@as(usize, 8), p.consumed);
}
test "encode then parse round-trips a server frame" {
var out: [64]u8 = undefined;
const n = try encodeFrame(.binary, "zig!", &out);
const p = (try parseFrame(out[0..n])).?;
try std.testing.expectEqual(Opcode.binary, p.frame.opcode);
try std.testing.expectEqualStrings("zig!", p.frame.payload);
}
Notice that parseFrame handles both masked (client) and unmasked (server) frames, so the same parser tests both directions. The aforementioned "return null when short" contract is worth a test too -- feed it a single byte and assert you get null, proving the incremental reader won't read past the buffer.
Performance considerations
Two things matter once you're moving real traffic. The first is masking throughput. That tidy byte.* ^= mask[i % 4] loop handles one byte per iteration. The % 4 itself is cheap (the compiler turns it into & 3 for an unsigned index), but a megabyte payload still costs a million single-byte XORs. The fix is to recognise that the mask repeats every 4 bytes, so you can load the 4-byte key into a u32 and XOR a word at a time, or -- even better on modern hardware -- lean on the @Vector SIMD we covered in episode 19 to mask 16 or 32 bytes per instruction. The naive version is correct and fine for chat messages; the vectorised version is what you reach for when you're proxying video.
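Here's a minimal sketch of the @Vector approach, assuming a fixed vector width of 16 bytes. That width is an arbitrary choice; the compiler maps it onto whatever the target offers. Because 16 is a multiple of 4, the key repeated four times lines up with every 16-byte chunk. The leftover tail also starts on a multiple of 4, so the byte-wise loop picks up the key in the right phase:

```zig
const std = @import("std");

/// Reference version: byte i is XOR-ed with mask[i % 4].
fn applyMaskNaive(payload: []u8, mask: [4]u8) void {
    for (payload, 0..) |*byte, i| byte.* ^= mask[i % 4];
}

/// Sketch: XOR 16 bytes per step using @Vector.
fn applyMaskSimd(payload: []u8, mask: [4]u8) void {
    const V = @Vector(16, u8);
    var key_bytes: [16]u8 = undefined;
    for (&key_bytes, 0..) |*k, j| k.* = mask[j % 4]; // key repeated 4 times
    const key: V = key_bytes;

    var i: usize = 0;
    while (i + 16 <= payload.len) : (i += 16) {
        const chunk: V = payload[i..][0..16].*;
        payload[i..][0..16].* = chunk ^ key;
    }
    // Fewer than 16 bytes remain; i is a multiple of 16, so j % 4 stays in phase.
    for (payload[i..], i..) |*byte, j| byte.* ^= mask[j % 4];
}

test "SIMD masking matches the naive loop" {
    var a: [37]u8 = undefined; // 37 = two full vectors plus a 5-byte tail
    for (&a, 0..) |*byte, i| byte.* = @intCast(i * 7 % 256);
    var b = a;
    const mask = [4]u8{ 0x37, 0xfa, 0x21, 0x3d };
    applyMaskNaive(&a, mask);
    applyMaskSimd(&b, mask);
    try std.testing.expectEqualSlices(u8, &a, &b);
}
```

Keeping the naive loop around as a reference is the real design choice here: the test pins the fast path to the obviously-correct one.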
The second is buffering and partial frames. A frame's payload can be larger than one read() returns, so your reader must accumulate bytes until a whole frame has arrived -- exactly the return null contract above. The mistake to avoid is reallocating that accumulation buffer on every read; size it once (16 KB is a sane default, matching the maximum TLS record size from episode 86) and grow only when a genuinely large frame demands it. Nota bene: also enforce a maximum frame size, or a hostile peer sending a 10-byte header that claims a multi-exabyte u64 payload will happily make you try to allocate the universe.
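Here's a minimal sketch of the "size it once" advice. It assumes a fixed 16 KB capacity that doubles as the maximum frame size; growth is left out to keep it short. ReadBuffer and its methods are hypothetical. The idea is to slide leftover bytes back to the front instead of reallocating:

```zig
const std = @import("std");

/// Hypothetical fixed-capacity read buffer: read() appends at `end`, the parser
/// consumes whole frames from `start`, and leftovers slide back to offset 0.
const ReadBuffer = struct {
    data: [16 * 1024]u8 = undefined, // also acts as the maximum frame size
    start: usize = 0, // first unconsumed byte
    end: usize = 0, // one past the last buffered byte

    /// Buffered, not-yet-consumed bytes: hand these to the frame parser.
    fn pending(self: *ReadBuffer) []u8 {
        return self.data[self.start..self.end];
    }

    /// Drop `n` bytes after the parser reported a complete frame.
    fn consume(self: *ReadBuffer, n: usize) void {
        std.debug.assert(n <= self.end - self.start);
        self.start += n;
        if (self.start == self.end) {
            self.start = 0; // fully drained: reuse from the front
            self.end = 0;
        }
    }

    /// Space for the next read(). Compacts when the tail is full; an empty
    /// result means one incomplete frame already fills the whole buffer.
    fn writable(self: *ReadBuffer) []u8 {
        if (self.end == self.data.len and self.start > 0) {
            const n = self.end - self.start;
            std.mem.copyForwards(u8, self.data[0..n], self.data[self.start..self.end]);
            self.start = 0;
            self.end = n;
        }
        return self.data[self.end..];
    }

    /// Record that read() placed `n` bytes into the slice from writable().
    fn commit(self: *ReadBuffer, n: usize) void {
        self.end += n;
    }
};

test "leftover bytes are compacted, not reallocated" {
    var rb = ReadBuffer{};
    const first = rb.writable();
    @memset(first, 'x');
    first[first.len - 2] = 'y';
    first[first.len - 1] = 'z';
    rb.commit(first.len); // pretend one read() filled the buffer
    rb.consume(first.len - 2); // ...and all but the last 2 bytes were whole frames
    const again = rb.writable(); // tail is full, so "yz" slides to the front
    try std.testing.expectEqualStrings("yz", rb.pending());
    try std.testing.expectEqual(@as(usize, 16 * 1024 - 2), again.len);
}
```

If writable() comes back empty while pending() still doesn't hold a whole frame, that frame is bigger than the buffer. That is the natural point to send a close frame with status 1009 (message too big).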
How this compares to C, Rust, and Go
In C, you'd hand-roll exactly this bit-twiddling -- and libraries like libwebsockets do, with a great deal of careful pointer arithmetic and manual length checking. The framing logic is identical; what C lacks is Zig's @enumFromInt landing safely on a non-exhaustive _, and slices that carry their length so an over-long payload claim can't walk off the end of your buffer unnoticed.
In Go, gorilla/websocket (and now nhooyr.io/websocket) gives you conn.ReadMessage() / conn.WriteMessage() and hides every byte we just decoded. It's productive and the goroutine-per-connection model makes the concurrency trivial. The cost is the usual one -- you're inside Go's runtime and its allocation patterns, with less control over exactly when and where buffers are reused.
In Rust, tungstenite (sync) and tokio-tungstenite (async) are the standard answer, memory-safe by construction and rigorous about the masking and UTF-8-validation rules the RFC demands. It's excellent, and arguably the most correct-by-default of the lot, at the price of Rust's steeper learning curve around async lifetimes.
Where does Zig sit? Right where it likes to: you write the protocol yourself, in maybe 150 lines, you see every bit, you control every allocation, and the result cross-compiles to a tiny static binary with no runtime. For a learning exercise it's unbeatable, because nothing is hidden. For production you might still reach for a hardened library -- but now you'll actually understand what it's doing under the hood, which is the entire point of building it from scratch ;-)
Where this is heading
We now have every piece of the WebSocket protocol as a set of pure, tested functions: the handshake, the frame parser, the encoder, masking, and the control frames. What we don't have yet is the thing that holds it all together over a live connection -- the loop that accepts a socket, performs the upgrade, then sits there reading frames and reacting to them, answering pings, honouring closes, and tracking per-connection state across many clients at once. All the non-blocking and state-machine groundwork from the last several episodes points straight at that. We've built the protocol; next we put it to work.
The handshake, the framing, the masking -- they aren't separate party tricks, they're the layers of one real-time channel you can now build with your eyes open.
Exercises
- Detect fragmentation. Extend the parser's caller to handle a message split across frames: a first frame with opcode = text and fin = false, followed by one or more opcode = continuation frames, ending with fin = true. Concatenate the payloads into a single message and assert the opcode of the assembled message is taken from the first frame, not the continuations.
- Write a client-side encoder. Add an encodeMaskedFrame that sets the MASK bit, generates 4 random key bytes (use std.crypto.random.bytes), writes the key into the frame, and masks the payload. Round-trip it through parseFrame and assert the recovered payload matches the original.
- Validate close codes. Write a function that takes a close frame's payload, extracts the 2-byte status code, and rejects the reserved/invalid codes (anything below 1000, plus 1004, 1005, 1006, and 1015, which the RFC says must never appear on the wire). Return a Zig error for an invalid code and the u16 for a valid one.