In practice, there is no hard limit on the size of payloads.
All modern web servers and browsers support compressed content using the HTTP Accept-Encoding and Content-Encoding headers. Specific API requests and responses can utilize these headers by specifying the compression algorithm that is used for the content—for example, gzip. Compression can reduce network bandwidth and latencies by 50% or more.
The trade-off cost is the compute cycles to compress and decompress the content. This is typically small compared to the savings in network transit times.
- Ian Gorton
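To make the negotiation concrete, here is a minimal client-side sketch using the reqwest crate (the endpoint is hypothetical; note that reqwest disables its automatic decompression when you set Accept-Encoding manually, so the compressed bytes arrive as-is):

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let resp = client
        .get("https://example.com/api/items") // hypothetical endpoint
        .header("Accept-Encoding", "gzip, br") // advertise supported codecs
        .send()
        .await?;

    // The server announces its choice in the Content-Encoding response header.
    if let Some(encoding) = resp.headers().get("content-encoding") {
        println!("server compressed the body with: {:?}", encoding);
    }
    Ok(())
}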
This exchange generally follows the diagram below.

But the real-life network environment is a lot more complicated than this. As of writing (10 January 2025), the simplest network topology we have is as follows.

So, given this topology, we can apply compression at any layer of the network. If we apply end-to-end compression, for example, it looks like this:

Bear in mind, however, that when you use a network service like Cloudflare and enable settings like the following example, it will decompress and recompress the content in transit:
However, the actual cost you pay for compression might be negligible. I'll touch on this in the next section.
So, what does all of this mean?
This is exactly why compression logic can be a good option.
There are two options you can go for:
Before we go further, we need to understand why smaller payloads perform better, even though that may seem self-explanatory. The key point is this: although the size of your data is variable, the size of the packets and segments that carry it over the network is not variable; it's fixed!
So, when data grows beyond that fixed limit, it is transferred by either fragmentation (at the IP layer) or segmentation (at the TCP layer). Knowing when data gets fragmented or segmented is crucial when a service is expected to deliver a significant amount of data, and it directly affects user experience.
For example, a typical maximum segment size (MSS) is 1460 bytes (roughly 1.5 KB), so your data gets segmented as soon as it exceeds that size. This is where the performance gain comes from: by sending less data you send fewer segments, achieving network efficiency, though it may come at the cost of additional processing on both the server and client sides.
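As a back-of-the-envelope illustration, here is a small Rust sketch that counts how many TCP segments a payload needs at an assumed MSS of 1460 bytes (the actual MSS depends on the path MTU):

// Count the TCP segments needed for a payload at a given MSS.
fn segments_needed(payload_bytes: usize, mss: usize) -> usize {
    payload_bytes.div_ceil(mss)
}

fn main() {
    let mss = 1460;
    // 100 KB uncompressed vs. ~30 KB compressed (illustrative numbers):
    println!("100 KB -> {} segments", segments_needed(100_000, mss)); // 69
    println!(" 30 KB -> {} segments", segments_needed(30_000, mss)); // 21
}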
As you can imagine, processing time depends on the specific compression implementation. For example, brotli at level 1 (benchmarks commonly test levels 1, 9, and 11) compresses at about 98.3 MB/s and decompresses at about 334 MB/s, which means an 80 KB payload takes roughly 0.24 ms to decompress and under 1 ms to compress. For your reference, here is a table showing some of the algorithms with their compression ratios and compression/decompression speeds:

Plus, these algorithms run single-threaded by default. Bear in mind, though, that if the content you deliver is already compressed natively, such as PNG files, compressing it again gives you little additional benefit. Thankfully, most of the data we render, such as HTML, JSON, XML, JavaScript, and CSS, compresses well.
This is the least of your worries because:
- Say we want to send 100 KB of data.
- Over 4G, with an effective throughput of roughly 5 Mbps, that takes about 160 ms of transfer time.
- Using brotli level 1, the payload shrinks to around 30 KB, which translates to about 48 ms of transfer time.
- Decompressing it takes 1-2 ms.
- Net gain: roughly 110 ms, even including the decompression overhead (the sketch after this list reproduces the arithmetic).
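Here is the same arithmetic as a small Rust sketch (the ~5 Mbps effective-throughput figure is an assumption chosen to match the numbers above):

// Transfer time in milliseconds for a payload at a given throughput.
fn transfer_ms(bytes: f64, mbps: f64) -> f64 {
    bytes * 8.0 / (mbps * 1_000_000.0) * 1000.0
}

fn main() {
    let uncompressed = transfer_ms(100_000.0, 5.0); // ~160 ms
    let compressed = transfer_ms(30_000.0, 5.0); // ~48 ms
    let decompression_overhead = 2.0; // ~1-2 ms, per the table above
    println!(
        "net gain: ~{:.0} ms",
        uncompressed - compressed - decompression_overhead
    );
}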
A bit of CS101: during the three-way handshake, the client and server each advertise their receive window (RWND), while each sender tracks its own congestion window (CWND) locally. On top of that, TCP by default holds off on sending small amounts of data until either a full MSS worth of data has accumulated or all previously sent data has been acknowledged.
This is what's called Nagle's algorithm, and it is the default behaviour of TCP today. The initial CWND is 10 MSS by default on Linux and macOS, for example. Taken together, this means that:
Plus, considering that networks worldwide are built on asynchronous assumptions, this translates into: "any payload equal to or smaller than the window size may take roughly the same time to deliver."
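To make that concrete, here is a tiny sketch using assumed defaults (MSS = 1460 bytes, initial CWND = 10 segments, per RFC 6928): any payload that fits inside the initial window of 14,600 bytes goes out in a single flight, so 6 KB and 14 KB responses cost roughly the same time, while 100 KB does not fit and needs further round trips.

// Assumed defaults: MSS = 1460 bytes, initial CWND = 10 segments (RFC 6928).
const MSS: usize = 1460;
const INIT_CWND_SEGMENTS: usize = 10;

fn fits_in_initial_window(payload_bytes: usize) -> bool {
    payload_bytes <= MSS * INIT_CWND_SEGMENTS // 14,600 bytes
}

fn main() {
    println!("6 KB fits: {}", fits_in_initial_window(6_000)); // true
    println!("14 KB fits: {}", fits_in_initial_window(14_000)); // true
    println!("100 KB fits: {}", fits_in_initial_window(100_000)); // false
}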
Given what we have learned, we should be able to verify the following:
Now, I enabled the encoding header for the pro QA environment and turned on brotli and gzip (which merely takes adding a few lines of code).
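As a rough idea of what those few lines can look like on the server side (the post does not say which stack is used; this is a minimal sketch assuming an axum service with tower-http's CompressionLayer):

use axum::{routing::get, Router};
use tower_http::compression::CompressionLayer;

#[tokio::main]
async fn main() {
    let app = Router::new()
        .route("/items", get(|| async { "payload" }))
        // Negotiates gzip/brotli based on the request's Accept-Encoding header.
        .layer(CompressionLayer::new());

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}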
To simulate a client, I used a single-threaded program in debug mode, as follows:
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Read the compression algorithm (e.g. "gzip" or "br") and whether the
    // client should decompress the response body, from environment variables.
    let compression_applied: Option<String> = std::env::var("COMPRESS").ok();
    let decomp = std::env::var("DECOMP")
        .map(|v| v == "true")
        .unwrap_or(false);

    if let Some(compression) = &compression_applied {
        println!(
            "RUN WITH COMPRESSION: {:?}\n decompression applied: {}",
            compression, decomp
        );
    } else {
        println!("RUN WITHOUT COMPRESSION");
    }

    let num_calls = 1000;
    println!("RUNNING {num_calls} calls");

    // `caller` issues the requests and returns the average latency.
    let handler = tokio::spawn(caller(num_calls, compression_applied, decomp));
    let with_compression = handler.await.unwrap();
    println!("Takes average {:?}", with_compression);
}
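The caller function itself isn't shown here. Purely as an illustration, this is a hypothetical sketch of what it might look like with reqwest: fire num_calls sequential requests, print each status, latency, and body size, and return the average latency. The URL is a placeholder, and the client-side decompression step is elided since it depends on the chosen algorithm:

use std::time::{Duration, Instant};

async fn caller(num_calls: usize, compression: Option<String>, decomp: bool) -> Duration {
    let client = reqwest::Client::new();
    let mut total = Duration::ZERO;
    for _ in 0..num_calls {
        let mut req = client.get("https://qa.example.com/endpoint"); // placeholder URL
        if let Some(algo) = &compression {
            // Setting Accept-Encoding manually also disables reqwest's
            // automatic decompression, so the body arrives still compressed.
            req = req.header("Accept-Encoding", algo.as_str());
        }
        let start = Instant::now();
        let resp = req.send().await.unwrap();
        let status = resp.status();
        let body = resp.bytes().await.unwrap();
        let took = start.elapsed();
        total += took;
        // If `decomp` were set, the client would inflate `body` here and
        // report the decompressed size instead (elided in this sketch).
        let _ = decomp;
        println!(
            "Status: {} Took {} ms, size: {}",
            status,
            took.as_millis(),
            body.len()
        );
    }
    total / num_calls as u32
}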
And run the program with different environment variables:
# run the program with each combination of environment variables
program > non-compress.txt &
COMPRESS=gzip program > gzip-no-decode.txt &
COMPRESS=gzip DECOMP=true program > gzip-decode.txt &
COMPRESS=br program > brotli-no-decode.txt &
COMPRESS=br DECOMP=true program > brotli-decode.txt
wait
The runs produce output like the following:

RUN WITH COMPRESSION: "gzip"
decompression applied: false
RUNNING 1000 calls
Status: 200 OK Took 1067 ms, size: 6852
Status: 200 OK Took 694 ms, size: 6852
...
...
Status: 200 OK Took 676 ms, size: 6852
Status: 200 OK Took 726 ms, size: 6852
Status: 200 OK Took 620 ms, size: 6852
Takes average 719

RUN WITH COMPRESSION: "gzip"
decompression applied: true
RUNNING 1000 calls
Status: 200 OK Took 1067 ms, size: 42523
Status: 200 OK Took 710 ms, size: 42523
...
...
Status: 200 OK Took 688 ms, size: 42523
Status: 200 OK Took 739 ms, size: 42523
Status: 200 OK Took 746 ms, size: 42523
Takes average 717

RUN WITH COMPRESSION: "br"
decompression applied: false
RUNNING 1000 calls
Status: 200 OK Took 926 ms, size: 5516
Status: 200 OK Took 904 ms, size: 5516
...
...
Status: 200 OK Took 661 ms, size: 5516
Status: 200 OK Took 675 ms, size: 5516
Status: 200 OK Took 761 ms, size: 5514
Takes average 716

RUN WITH COMPRESSION: "br"
decompression applied: true
RUNNING 1000 calls
Status: 200 OK Took 832 ms, size: 42523
Status: 200 OK Took 998 ms, size: 42523
...
...
Status: 200 OK Took 668 ms, size: 42523
Status: 200 OK Took 684 ms, size: 42523
Takes average 719
To revisit the hypotheses:
- Compression and decompression are fast (confirmed).
- If the data is small enough that it doesn't exceed the window buffer, TTI is about the same whether or not you apply compression (confirmed).
- If you do apply compression, the size of the data you receive shrinks (confirmed).
For large data that exceeds the default window size, I could not confirm a significant improvement in either size or response time, because I couldn't find suitable endpoints. This means we may not have a strong use case for compression, although we can still gain some wins in network bandwidth.