<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Hacking Scale by Better Stack]]></title><description><![CDATA[A weekly newsletter about building and scaling software from engineers at Better Stack]]></description><link>https://newsletter.betterstack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!_FGI!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F87b62237-9eb2-480e-abda-ac1ffc34a163_256x256.png</url><title>Hacking Scale by Better Stack</title><link>https://newsletter.betterstack.com</link></image><generator>Substack</generator><lastBuildDate>Mon, 06 Apr 2026 00:20:47 GMT</lastBuildDate><atom:link href="https://newsletter.betterstack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Better Stack]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[betterstack@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[betterstack@substack.com]]></itunes:email><itunes:name><![CDATA[Better Stack]]></itunes:name></itunes:owner><itunes:author><![CDATA[Better Stack]]></itunes:author><googleplay:owner><![CDATA[betterstack@substack.com]]></googleplay:owner><googleplay:email><![CDATA[betterstack@substack.com]]></googleplay:email><googleplay:author><![CDATA[Better Stack]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How LinkedIn Reduced GPU Memory Usage by 60% for LLM Training]]></title><description><![CDATA[LinkedIn hand-picked the best GPU performance techniques and put them in a library]]></description><link>https://newsletter.betterstack.com/p/how-linkedin-reduced-gpu-memory-usage</link><guid 
isPermaLink="false">https://newsletter.betterstack.com/p/how-linkedin-reduced-gpu-memory-usage</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Thu, 16 Jan 2025 15:19:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>LinkedIn uses large language models</strong> (LLMs) for many features, just like other tech companies, including job matching and showing users relevant content.</p><p>But LinkedIn has so much data that <strong>training these models takes a huge amount of resources</strong>.</p><p>So the team found a way to make that process more efficient.</p><p>Here's how they did it.</p><p><em>Estimated reading time: 5 minutes and 15 seconds.</em></p><div><hr></div><h2>Why Is Training So Resource Intensive?</h2><p>Training a model involves giving it a large amount of data to learn from.</p><p>There are <strong>three stages of training</strong>:</p><ul><li><p><strong>Pre-training</strong>: Teaching the model general language</p></li><li><p><strong>Fine-tuning</strong>: Specializing the model for specific tasks</p></li><li><p><strong>Alignment</strong>: Making sure the model behaves as intended (this stage is optional)</p></li></ul><p><strong>Pre-training is the most resource-intensive stage</strong>, so we'll focus on that.</p><p>It <strong>involves giving the model large amounts of unstructured data</strong>, such as books, articles, and websites, stored as text, JSON, or binary files.</p><p>Then, <strong>the text is tokenized</strong>: it is split into tokens (words or pieces of words), and each token is given a numerical ID. So a sentence like "I love programming" could be represented as <code>[1, 23, 456]</code>.</p><p>These <strong>tokens are then converted to embeddings</strong>, lists of numbers (vectors) that capture words or phrases in a dense numerical format. 
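</p><p>The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not how LinkedIn or any production tokenizer works: the vocabulary, token IDs, and embedding size below are made up (chosen to match the <code>[1, 23, 456]</code> example), real tokenizers split text into subword pieces, and real embedding tables have far more rows and columns.</p><pre><code>import random

# Hypothetical toy vocabulary: maps each known word to an integer token ID.
# Real tokenizers split text into subword pieces instead of whole words.
VOCAB = {"[UNK]": 0, "i": 1, "love": 23, "programming": 456}
EMBEDDING_DIM = 4  # real models use hundreds or thousands of dimensions


def tokenize(text):
    """Lowercase, split on whitespace, and map each word to its token ID."""
    return [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()]


# Embedding table: one row of EMBEDDING_DIM numbers per possible token ID.
# The values start out random; training gradually adjusts them.
random.seed(0)
VOCAB_SIZE = max(VOCAB.values()) + 1
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    """Look up the embedding vector (one row of the table) for each token ID."""
    return [embedding_table[token_id] for token_id in token_ids]


token_ids = tokenize("I love programming")
embeddings = embed(token_ids)
print(token_ids)                            # [1, 23, 456]
print(len(embeddings), len(embeddings[0]))  # 3 4
</code></pre><p>In a real model, the embedding values are learned during training, so words used in similar ways end up with similar vectors. 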
This helps the model understand their meaning and relationship to other words.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2c_C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2c_C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 424w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 848w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 1272w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2c_C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png" width="523" height="300.01220504475185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21f3198d-668c-40e7-a920-d879551482a1_1229x705.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:705,&quot;width&quot;:1229,&quot;resizeWidth&quot;:523,&quot;bytes&quot;:251876,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2c_C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 424w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 848w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 1272w, https://substackcdn.com/image/fetch/$s_!2c_C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21f3198d-668c-40e7-a920-d879551482a1_1229x705.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>After that, the <strong>training phase starts</strong>.</p><p>Here, the <strong>model learns patterns by predicting the next word</strong> in a sentence or filling in missing words. No human labeling is needed, because the answers come from the text itself (this is called self-supervised learning).</p><p>Since the training data already contains the answers, the model makes a prediction, checks it against the actual text, and measures how far off it was. <strong>It then adjusts its weights</strong> to make a wrong prediction less likely next time. The method used to work out how much each weight should change is called backpropagation.</p><p>A <strong>complete pass through all the data is called an epoch</strong>. Training runs for at least one epoch and can run for many more, up to around 300 in some cases.</p><p>To speed things up, <strong>predictions are made in parallel across many GPUs</strong>. 
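</p><p>The predict, check, adjust cycle can be sketched as a toy Python training loop. This is a simplified illustration under heavy assumptions: a made-up four-token corpus and a single table of weights (a bigram model) instead of a deep network. Because there is only one layer, the gradient of the cross-entropy loss can be written out directly; in a multi-layer LLM, backpropagation computes the same kind of gradient for every weight by applying the chain rule layer by layer.</p><pre><code>import math

# Made-up training text, already tokenized into IDs 0-3. The "answer" for
# each position is simply the next token, so no labels are needed
# (self-supervised).
corpus = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
VOCAB_SIZE = 4
LEARNING_RATE = 0.5

# The model's weights: for each current token, one score (logit) per possible
# next token. Zero here for reproducibility; real models start from random values.
weights = [[0.0] * VOCAB_SIZE for _ in range(VOCAB_SIZE)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_epoch():
    """One epoch: a single pass over every (current token, next token) pair."""
    pairs = list(zip(corpus[:-1], corpus[1:]))
    total_loss = 0.0
    for current, target in pairs:
        # 1. Predict: probability of each possible next token.
        probs = softmax(weights[current])
        # 2. Check: cross-entropy loss is low when the correct token was likely.
        total_loss += -math.log(probs[target])
        # 3. Adjust: the loss gradient for each score is (prob - 1) for the
        #    correct token and prob for the others; step against it.
        for j in range(VOCAB_SIZE):
            grad = probs[j] - (1.0 if j == target else 0.0)
            weights[current][j] -= LEARNING_RATE * grad
    return total_loss / len(pairs)


losses = [train_epoch() for _ in range(20)]
print(f"epoch 1 loss: {losses[0]:.3f}, epoch 20 loss: {losses[-1]:.3f}")
</code></pre><p>Each call to <code>train_epoch</code> is one epoch, and the average loss shrinks as the weights improve. A real LLM runs this same loop over billions of tokens and billions of weights, which is why the work is spread across many GPUs. 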
But even so, training a model on gigabytes of data through many epochs could take days or weeks.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IdEL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IdEL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IdEL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp" width="1456" height="30" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4162,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IdEL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!IdEL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0168d7e3-ebfe-40e0-8089-85e2d31291a7_1456x30.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><em>Sidenote: Model Weights</em></h4><p><em>An <strong>LLM is made up of many nodes</strong>. Each node takes in numbers as input, processes them, and passes the result on.</em></p><p><em><strong>Nodes are organized into layers</strong> to process data in stages. 
A <strong>weight is the strength of the connection between nodes</strong> on different layers.</em></p><p><em>Weights start out as <strong>randomly assigned</strong> numbers at the beginning of training, but <strong>they change during training</strong> so the model produces better outputs.</em></p><p><em>For example, imagine training a model to judge how positive the sentence &#8220;Get lost&#8221; is. The inputs would be the word embeddings, and each weight controls how strongly its input pushes the output towards positive. The higher the output value, the more positive the model judges the sentence to be.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j3_n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j3_n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 424w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 848w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 1272w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!j3_n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png" width="1425" height="770" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:1425,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:154617,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j3_n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 424w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 848w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 1272w, https://substackcdn.com/image/fetch/$s_!j3_n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa82123fa-0a2f-4b78-973c-821c69b2268c_1425x770.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>A node typically has more than one weight, one for each incoming connection. Each input is multiplied by its weight, <strong>the results are summed</strong> (usually with an extra bias value added), and <strong>the total then goes through an activation function</strong>. 
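</em></p><p><em>Here is a minimal Python sketch of a single node. The input, weight, and bias values are made up, and ReLU is used as the activation function only because it is one of the simplest common choices.</em></p><pre><code>def relu(x):
    """ReLU, a common activation function: keeps positive values, turns negatives into 0."""
    return max(0.0, x)


def node_output(inputs, weights, bias=0.0):
    """One node: multiply each input by its weight, add them up plus a bias, then activate."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)


# Made-up values: three inputs (for example, numbers from an embedding)
# and the weights on the connections carrying them into this node.
inputs = [0.5, -1.0, 2.0]
print(node_output(inputs, [0.8, 0.2, 0.1], bias=0.1))  # 0.4 - 0.2 + 0.2 + 0.1 = 0.5
print(node_output(inputs, [-0.8, 0.2, -0.1]))           # sum is about -0.8, so ReLU gives 0.0
</code></pre><p><em>The final step, <code>relu</code>, is the activation function: it passes positive sums through unchanged and turns negative sums into zero. 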
This adds 'non-linearity' to the values, which <strong>helps the model learn complex patterns</strong>.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kXu4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kXu4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kXu4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp" width="1456" height="30" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4162,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kXu4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!kXu4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80cffe64-b19f-452c-aa09-2f8a13070fa3_1456x30.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In <strong>LinkedIn</strong>&#8217;s case, the team <strong>was running into performance bottlenecks during training</strong>, such as:</p><ol><li><p>Heavy <strong>GPU memory access</strong></p></li><li><p>Extra time and <strong>resources spent on every individual operation</strong> (per-operation overhead)</p></li></ol><p>To address this, <strong>they built a library called 
Liger-Kernel</strong>.</p><h2>Liger-Kernel to the Rescue</h2><p>Let's jump into how Liger-Kernel addressed these bottlenecks, starting with GPU memory access.</p><p>A <strong>GPU has different types of memory</strong> that serve different purposes during training.</p><p><strong>High Bandwidth Memory (HBM)</strong> is large but <strong>comparatively slow</strong>. It <strong>stores model weights</strong>, the training data being processed, and other large data structures.</p><p><strong>Shared memory (SRAM)</strong> sits on the GPU chip itself. It is <strong>much faster but much smaller</strong>, so it <strong>holds frequently accessed data</strong>, such as intermediate calculations like attention scores.</p><p>Because SRAM is so small, <strong>data is constantly moved back and forth between HBM and SRAM</strong>. These transfers add latency, and for many operations they <strong>take longer than the calculations themselves</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U2ea!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U2ea!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!U2ea!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!U2ea!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!U2ea!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U2ea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp" width="1456" height="30" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4162,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U2ea!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!U2ea!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!U2ea!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!U2ea!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94b4c0ab-8c88-4bac-8f1d-9c40eaf7159a_1456x30.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><em>Sidenote: Attention Scores</em></h4><p><em>Attention scores are a way for the <strong>model to "pay attention" to the most relevant parts</strong> of the input data.</em></p><p><em>Let's take a look at these two sentences:</em></p><p><em>"Jason is much faster than Toby; he trains a lot."</em></p><p><em>"Jason is much faster than Toby, who trains a lot."</em></p><p><em>The sentences differ by only one word and one punctuation mark, yet that completely changes their meaning. In the first sentence, "trains a lot" describes Jason; in the second, it describes Toby. Humans figure this out instantly, but a model has to learn to do it.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UVHn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UVHn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 424w, https://substackcdn.com/image/fetch/$s_!UVHn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 848w, https://substackcdn.com/image/fetch/$s_!UVHn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 
1272w, https://substackcdn.com/image/fetch/$s_!UVHn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UVHn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png" width="719" height="279.52134831460677" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:890,&quot;resizeWidth&quot;:719,&quot;bytes&quot;:38772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UVHn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 424w, https://substackcdn.com/image/fetch/$s_!UVHn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 848w, https://substackcdn.com/image/fetch/$s_!UVHn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UVHn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe661264a-b75e-4a65-b0bf-0c57dc4829b7_890x346.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Where we see words, a model sees lots of numbers, so attention scores exist to help it focus on the important parts.</em></p><p><em>In the first sentence, "Jason", "he", "trains a lot", and the semicolon could have higher attention scores, because the semicolon starts a new clause whose subject, "he", refers back to Jason. 
This means Jason is faster because he trains a lot.</em></p><p><em>In the second sentence, "Toby" and "who trains a lot" could have higher attention scores because "who trains a lot" refers to Toby. This suggests Toby is the one who trains, but Jason is still faster.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Yey3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yey3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yey3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp" width="1456" height="30" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4162,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yey3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 424w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 848w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 1272w, https://substackcdn.com/image/fetch/$s_!Yey3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe95b27e0-bee2-4db1-bbc3-c422fdb10e78_1456x30.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Liger-Kernel is built upon a technique called FlashAttention</strong>. FlashAttention <strong>improves GPU performance</strong> by computing values like attention scores and partial sums in the GPU's small but fast on-chip SRAM, instead of repeatedly reading and writing them to the larger but slower high-bandwidth memory (HBM).</p><p>This was a good first step. 
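</p><p>To make "partial sums" more concrete, here is a minimal pure-Python sketch. It is our own illustration, not LinkedIn's or FlashAttention's code: the function names are hypothetical and plain lists stand in for GPU tensors. It computes attention for one query twice: once the standard way, with every score held at once, and once block by block, keeping only a running maximum, a running softmax denominator, and a running weighted sum. That second "online softmax" style is the idea that lets FlashAttention work on one small block at a time in SRAM.</p>

```python
import math


def attention_two_pass(query, keys, values):
    """Attention for one query, the standard way: all scores kept in memory."""
    scale = 1.0 / math.sqrt(len(query))
    # Attention score: scaled dot product between the query and each key.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax over all scores (subtracting the max keeps exp() from overflowing).
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # Output: softmax-weighted average of the value vectors.
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def attention_online(query, keys, values, block_size=2):
    """Same result, computed block by block with running 'partial sums'.

    Only a running max, a running softmax denominator and a running weighted
    sum are kept between blocks, so the full list of scores never exists at once.
    """
    scale = 1.0 / math.sqrt(len(query))
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0       # softmax denominator accumulated so far
    acc = [0.0] * dim       # un-normalized weighted sum of values so far
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial sums so they are relative to the new max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * v[d] for d, a in enumerate(acc)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(attention_two_pass(q, keys, values))
    print(attention_online(q, keys, values, block_size=2))  # same numbers, up to rounding
```

<p>Both functions return the same vector; only the order and amount of intermediate data differ. In FlashAttention, this block-by-block pattern is what avoids writing the full score matrix out to HBM.</p><p>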
But <strong>they went further</strong> <strong>by</strong> taking operations that would normally run as separate GPU kernels and <strong>merging them so they run as a single kernel</strong>.</p><p>This is one of the ways they dealt with the per-operation time bottleneck.</p><p><strong>Let's explain</strong> it with an example.</p><p><strong>Two operations are used to control the range of values</strong> during training:</p><ol><li><p><strong>RMSNorm</strong>: Root Mean Square Normalization. A technique used to normalize the size (magnitude) of the activation outputs from a layer of nodes.</p></li><li><p><strong>Scaling</strong>: Adjusting the range or magnitude of values to a common scale. This helps the learning process because it keeps data within a range the model can learn from.</p></li></ol><p>Typically, <strong>these operations run as separate GPU kernels</strong>, each reading its input from GPU memory and writing its output back. But they can be merged into a single kernel (operator fusion). How?</p><p><strong>Training can be done with a framework like PyTorch</strong> or TensorFlow. Out of the box, PyTorch <strong>uses something called eager execution</strong>. This means <strong>operations are executed immediately</strong>, one by one, without a compilation step.</p><p>While this sounds good in theory, it <strong>has some performance costs</strong>. Since <strong>operations are dispatched one at a time</strong>, PyTorch never sees the whole sequence ahead of time, so it <strong>can't optimize across operations</strong>, for example by fusing them.</p><p>To address this, <strong>PyTorch introduced a feature called torch.compile</strong>, which <strong>enables Just-In-Time (JIT) compilation.</strong> JIT compiles a model's computation graph into optimized code for faster execution.</p><p>A computation graph represents the sequence of operations a model performs on its inputs to produce an output.</p><p>Part of <strong>JIT involves operator fusion</strong> for many operations, not just the two mentioned earlier. 
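</p><p>Here is a minimal pure-Python sketch of what fusing RMSNorm and scaling means. The function names are hypothetical, plain lists stand in for GPU tensors, and "scaling" is assumed here to mean multiplying by a learned per-feature weight, as RMSNorm layers commonly do. The unfused version mirrors eager execution: each operation writes a full intermediate result before the next one reads it. The fused version computes the same output without ever storing the normalized intermediate.</p>

```python
import math


def rms(x, eps=1e-6):
    """Root mean square of a vector, with a small eps for numerical safety."""
    return math.sqrt(sum(v * v for v in x) / len(x) + eps)


def rmsnorm_then_scale_unfused(x, weight, eps=1e-6):
    """Eager style: two separate operations, each producing a full output."""
    # Operation 1 (RMSNorm): writes a complete intermediate list.
    r = rms(x, eps)
    normalized = [v / r for v in x]
    # Operation 2 (scaling): reads the intermediate back, writes the final output.
    return [n * w for n, w in zip(normalized, weight)]


def rmsnorm_scale_fused(x, weight, eps=1e-6):
    """Fused style: one operation that writes only the final output.

    The RMS still has to be computed first (it is a reduction over x), but
    normalization and scaling happen in the same loop, so no intermediate
    'normalized' list is ever stored.
    """
    inv_r = 1.0 / rms(x, eps)
    return [v * inv_r * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    weight = [1.0, 0.5, 2.0, 1.0]
    unfused = rmsnorm_then_scale_unfused(x, weight)
    fused = rmsnorm_scale_fused(x, weight)
    print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(unfused, fused)))  # True
```

<p>On a GPU, skipping that intermediate saves a full write to memory and a full read back, which is where fusion saves time and memory. In PyTorch, wrapping a function with <code>torch.compile</code> can apply this kind of fusion automatically, while Liger-Kernel writes its fused kernels by hand.</p><p>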
In some cases, compiled execution is much faster: one <a href="https://stackoverflow.com/questions/53005487/eager-mode-very-slow-22x-slower-than-graph-mode">reported case</a> found eager mode to be 22 times slower than graph mode.</p><p>Liger-Kernel <strong>uses both FlashAttention and operator fusion</strong>, and together these addressed the bottlenecks LinkedIn had.</p><p>But <strong>Liger-Kernel is a Python library</strong>, and <strong>raw Python code cannot run on a GPU</strong>.</p><p>So they <strong>wrote its GPU code in a language called Triton</strong>.</p><h2>Python on the GPU</h2><p><strong>Triton</strong> is an open-source<strong> domain-specific language and compiler</strong> created by an OpenAI employee. It's <strong>designed to help write custom GPU kernels</strong> using a Python-based syntax.</p><p>Triton code compiles down to low-level GPU code, producing a new, optimized GPU kernel.</p><p><strong>LinkedIn wrote </strong>their own <strong>operator fusion in Triton</strong> for Liger-Kernel. They also wrote <strong>RMSNorm and other operations</strong> in Triton to take full advantage of the GPU.</p><p>So, <strong>how is Liger-Kernel deployed</strong>?</p><p>Usually <strong>training is done through a distributed setup</strong>. 
This involves many GPUs training different parts of the model.</p><p>There are <strong>many distributed setups</strong>, but we'll focus on <strong>Torch Distributed Elastic in AWS</strong> purely because <a href="https://aws.amazon.com/blogs/machine-learning/distributed-training-with-amazon-eks-and-torch-distributed-elastic/">AWS diagrams</a> make sense to me.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jKep!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jKep!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 424w, https://substackcdn.com/image/fetch/$s_!jKep!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 848w, https://substackcdn.com/image/fetch/$s_!jKep!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 1272w, https://substackcdn.com/image/fetch/$s_!jKep!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jKep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png" width="1456" height="843" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:843,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:278755,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jKep!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 424w, https://substackcdn.com/image/fetch/$s_!jKep!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 848w, https://substackcdn.com/image/fetch/$s_!jKep!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 1272w, https://substackcdn.com/image/fetch/$s_!jKep!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F085a58ce-e517-4393-8eec-4fab60f21666_2399x1389.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This setup has a <strong>parameter server</strong> used to <strong>store and update parameters</strong> like weights, and a <strong>controller</strong> to <strong>manage training jobs</strong> based on the available resources.</p><p><strong>Liger-Kernel</strong> is <strong>added to the container image</strong>, which runs on a pod. Once the pod starts, <strong>Liger-Kernel's Triton code is compiled into optimized GPU kernels</strong>, which then run during training.</p><p><strong>With all these features</strong>, plus <a href="https://x.com/hsu_byron/status/1866577403918917655">optimized post-training losses</a> (which I won't go into in this article), <strong>LinkedIn's</strong> <a href="https://github.com/linkedin/Liger-Kernel">open-source</a> <strong>library has achieved impressive performance</strong>.</p><p>It improved multi-GPU training throughput by 20%, delivered a 3x reduction in end-to-end training time, and reduced memory usage by 60%, among other gains.</p><p>If you enjoyed this article, check out the <a href="https://www.linkedin.com/blog/engineering/open-source/liger-kernel-open-source-ecosystem-for-efficient-llm-training">original article</a> from LinkedIn.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Tinder Secures Its 500+ Microservices]]></title><description><![CDATA[Tinder's highly customised solution that fixed their microservice security chaos]]></description><link>https://newsletter.betterstack.com/p/how-tinder-secures-its-500-microservices</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-tinder-secures-its-500-microservices</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 08 Jan 2025 13:40:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Tinder</strong> is an <strong>online dating platform</strong> famous for its swiping mechanism. Swipe right to like and swipe left to dislike.</p><p>It launched in 2012 and has grown to be one of the most popular dating platforms with around <strong>75 million active users</strong>.</p><p>This growth has led to the <strong>development of many different tech-related services</strong> over the years. Some of these are external, so they&#8217;re <strong>open to the public</strong> and attackers.</p><p>These external services <strong>used different third-party solutions</strong> for security and routing requests. This, of course, made them very difficult to maintain.</p><p>So, the team at Tinder decided to <strong>build their own solution</strong>. 
It had to fit their custom infrastructure.</p><p>Here's how they did it.</p><p><em>Estimated reading time: 4 minutes 48 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why Tinder Has External Services</h2><p>Tinder has many external services. Some of <strong>these include</strong>:</p><ul><li><p>The <strong>Authentication</strong> service for login and session management</p></li><li><p><strong>Recommendations</strong> to suggest matches</p></li><li><p><strong>Messaging</strong> for communication between matched users</p></li><li><p><strong>Geolocation</strong> for location-based matching</p></li><li><p><strong>Media</strong> to handle all image and video uploads</p></li></ul><p>And many more.</p><p>These are external because <strong>people can access them directly</strong>. Anyone can view the API requests and see the data being returned.</p><p>This means <strong>bad actors can see this information too</strong>. This isn't great for any app, but it's a massive problem for a dating app like Tinder.</p><p>Someone could track a user's exact location via the geolocation service. 
They could also access phone numbers, images, or other personal information, hijack accounts, and so on.</p><p>Not to mention, if a hacker got access to an external service, <strong>they</strong> <strong>could also get access to internal services</strong>.</p><p>So it's important for the team at Tinder to <strong>keep these external services secure</strong>.</p><p>A common way to do this is to put an <a href="https://newsletter.betterstack.com/i/146792793/sidenote-api-gateway">API gateway</a> in front of them to handle authorization and security.</p><p>The problem was that Tinder used many different third-party API gateways. These were built on different tech stacks and were difficult to manage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IbQX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IbQX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 424w, https://substackcdn.com/image/fetch/$s_!IbQX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 848w, https://substackcdn.com/image/fetch/$s_!IbQX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IbQX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IbQX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png" width="1456" height="747" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:747,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:437991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IbQX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 424w, https://substackcdn.com/image/fetch/$s_!IbQX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 848w, https://substackcdn.com/image/fetch/$s_!IbQX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IbQX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5db37c50-53b8-42ac-996a-3e86b14f11fd_3662x1879.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>They needed a single API gateway solution that:</p><ol><li><p>Would <strong>handle session management consistently</strong> across different services</p></li><li><p>Could be a framework that teams could adopt and <strong>use to scale their applications independently</strong></p></li><li><p>Could be <strong>customized using a configuration file</strong> instead 
of writing code</p></li><li><p>Would <strong>integrate with their Envoy service mesh</strong></p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6a8d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6a8d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6a8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png" width="1456" height="30" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6a8d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!6a8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d92a580-7b04-46e4-ac57-cd148b036476_2254x46.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><em>Sidenote: Service Mesh</em></h4><p><em>A service mesh is a piece of software that <strong>manages communication between services</strong>.</em></p><p><em>Service meshes are usually added to a Kubernetes cluster and <strong>provide features like</strong>:</em></p><ul><li><p><em><strong>mTLS</strong>: Mutual TLS, which makes sure that services verify each other's 
identities before they can talk</em></p></li><li><p><em><strong>Retries</strong>: Automatically retry a failed request, for example when a service is temporarily down</em></p></li><li><p><em><strong>Observability</strong>: Collect metrics and traces from a service</em></p></li><li><p><em><strong>Traffic splitting</strong>: Control what percentage of traffic goes to each version of a service when several versions are running</em></p></li></ul><p><em>These features and more work without making any changes to the application code.</em></p><p><em>A service mesh works by adding two things: a <a href="https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/">sidecar</a> proxy to each pod, which <strong>intercepts all of the application's network calls</strong>, and a control plane that adds and manages the proxies.</em></p><p><em>So, <strong>configuration is done via the control plane</strong>, which pushes updates to the proxies. But the traffic itself flows only between the proxies; it doesn't pass through the control plane.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rl6v!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rl6v!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 424w, https://substackcdn.com/image/fetch/$s_!Rl6v!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 848w, 
https://substackcdn.com/image/fetch/$s_!Rl6v!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!Rl6v!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rl6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png" width="1456" height="906" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:195985,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rl6v!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 424w, https://substackcdn.com/image/fetch/$s_!Rl6v!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 848w, 
https://substackcdn.com/image/fetch/$s_!Rl6v!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 1272w, https://substackcdn.com/image/fetch/$s_!Rl6v!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F333347e0-2643-49ba-8bcc-9c2761979d56_2092x1302.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Popular service mesh proxies include Envoy, Linkerd Proxy, and Consul Connect Proxy. 
Popular control planes include <a href="https://istio.io/">Istio</a>, <a href="https://linkerd.io/">Linkerd</a>, and <a href="https://www.consul.io/">Consul</a>.</em></p><p><em>Envoy in particular is one of the most widely used service mesh proxies, and it has first-class support for <a href="https://newsletter.betterstack.com/i/152558295/sidenote-grpc">gRPC</a>.</em></p><p><em>It's important to note the <strong>difference between a service and a pod in Kubernetes</strong>.</em></p><p><em>A pod runs the application code. A <strong>service is a virtual layer</strong> that sits in front of one or more pods.</em></p><p><em>It provides a stable DNS name and IP address, so that <strong>internal or external apps can always reach a pod</strong> or set of pods, even if those pods get replaced or restarted.</em></p><p><em>You don't need services to use a service mesh, but the stable names they give pods complement a service mesh well.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4X80!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4X80!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!4X80!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 848w, 
https://substackcdn.com/image/fetch/$s_!4X80!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!4X80!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4X80!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png" width="1456" height="30" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1056e703-4994-43d8-b231-7dadee987116_2254x46.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4X80!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!4X80!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 848w, 
https://substackcdn.com/image/fetch/$s_!4X80!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!4X80!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1056e703-4994-43d8-b231-7dadee987116_2254x46.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The team at <strong>Tinder looked into existing solutions</strong> like Amazon API Gateway, Kong, Apigee, and <a href="http://Tyk.io">Tyk.io</a>, but <strong>none met all their requirements</strong>.</p><p>So they <strong>built their own</strong> <strong>solution called TAG</strong> (Tinder API Gateway).</p><h2>How TAG Works</h2><p>Before TAG can receive traffic, it <strong>needs to be configured with a list of routes</strong>.</p><p>A developer first defines a route in a configuration file. 
This is a YAML file that can include details such as the API endpoint (the route), the service that handles it, and the filters to apply.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FFkv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FFkv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 424w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 848w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 1272w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FFkv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png" width="381" height="340.5856754306437" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:986,&quot;width&quot;:1103,&quot;resizeWidth&quot;:381,&quot;bytes&quot;:123988,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FFkv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 424w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 848w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 1272w, https://substackcdn.com/image/fetch/$s_!FFkv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31ddd428-3782-45d8-9a6a-d5f7a4293dec_1103x986.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Filters are bits of logic applied to requests as they come in and to responses as they go out. There are <strong>three types of filters</strong>.</p><ul><li><p><strong>Pre-filters</strong>: applied to requests before they reach the service (e.g., converting HTTP to gRPC).</p></li><li><p><strong>Post-filters</strong>: applied to responses after they leave the service (e.g., adding headers such as location).</p></li><li><p><strong>Global filters</strong>: applied to all requests and responses 
(e.g., request and response scanning).</p></li></ul><p>TAG comes with predefined logic for all three filter types, and teams can also add their own custom filters.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!K74C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!K74C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!K74C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!K74C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!K74C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!K74C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png" width="1456" height="30" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!K74C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 424w, https://substackcdn.com/image/fetch/$s_!K74C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!K74C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!K74C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F41d6c15a-0810-4df0-94fe-0432eb34cb48_2254x46.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h4><em>Sidenote: Request and Response Scanning</em></h4><p><em>This is one of the <strong>ways TAG can prevent attacks</strong> on the system, and it does this very cleverly.</em></p><p><em>When a request or response is sent to TAG, an <strong>async event is sent to an event streaming platform</strong>. 
This event contains details such as the type of request and the endpoint being accessed.</em></p><p><em>Because the event is sent asynchronously, it doesn't block the request while it's being handled.</em></p><p><em>The data is securely <strong>streamed to other applications</strong> using Amazon MSK (Managed Streaming for Apache Kafka).</em></p><p><em>These applications can <strong>check for unusual patterns</strong>, bots, or other attacks.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!w9jH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!w9jH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 424w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 848w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 1272w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!w9jH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png" 
width="436" height="505.1730769230769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1687,&quot;width&quot;:1456,&quot;resizeWidth&quot;:436,&quot;bytes&quot;:330565,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!w9jH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 424w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 848w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 1272w, https://substackcdn.com/image/fetch/$s_!w9jH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb0792608-3cd3-4ab3-acd0-e5d2cf7cda2b_1967x2279.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>If TAG detects any issues, it can apply global filters such as rate limiting, or block the request outright.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!U4Ic!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U4Ic!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 424w, 
https://substackcdn.com/image/fetch/$s_!U4Ic!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!U4Ic!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!U4Ic!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U4Ic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png" width="1456" height="30" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:30,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4226,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U4Ic!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 424w, 
https://substackcdn.com/image/fetch/$s_!U4Ic!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 848w, https://substackcdn.com/image/fetch/$s_!U4Ic!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 1272w, https://substackcdn.com/image/fetch/$s_!U4Ic!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F47e7af0f-b9c5-4dd0-a806-0e1ed5130549_2254x46.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>When TAG starts up or receives <strong>an update to the route configuration</strong>, it does a few things.</p><p>It <strong>creates an internal object to represent each route</strong>, associates the correct filters with each route object, and then sets up rules for matching each incoming request to a service.</p><p>After that, TAG is <strong>ready to receive traffic</strong>. 
Here's a step-by-step, high-level overview of how it does this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IOXB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IOXB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 424w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 848w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 1272w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IOXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png" width="1456" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:679826,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IOXB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 424w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 848w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 1272w, https://substackcdn.com/image/fetch/$s_!IOXB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9efadfb-133d-46e2-982c-e228581ccd70_4035x1746.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p>A <strong>client sends an HTTP request</strong> to the backend</p></li><li><p>The request hits TAG, which applies global filters</p></li><li><p>A <strong>route in TAG is matched</strong> to the request</p></li><li><p>TAG uses Envoy to <strong>discover which service should handle the request</strong></p></li><li><p>TAG <strong>applies pre-filters</strong>, then forwards the request to the service</p></li><li><p>Once the service has responded, TAG <strong>applies post-filters</strong></p></li><li><p>Then the response is <strong>sent back to the client</strong></p></li></ol><p>Teams at Tinder use TAG as a framework for building their own API gateways just by writing configuration files.</p><p>TAG is also used by other Match Group apps such as Hinge, OkCupid, and PlentyOfFish.</p><h2>Wrapping Things Up</h2><p>If I'm being honest, I'm amazed to see how much effort went into building TAG for an app like Tinder. It's easy to underestimate the engineering behind something like a dating platform. 
How complicated can it be?</p><p>But researching this article gave me great insight into how problems at scale, no matter the app, can be really complicated to solve.</p><p>Check out the <a href="https://medium.com/tinder/how-we-built-the-tinder-api-gateway-831c6ca5ceca">original article</a> if you want more details about how Tinder's API gateway works.</p><p>And as usual, if you want the next article sent to your inbox as soon as it's released, go ahead and <a href="https://newsletter.betterstack.com/">subscribe</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 20 hours.</em></p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[How DoorDash Improved Redis to Handle 10+ Million Reads per Second]]></title><description><![CDATA[A handful of insanely clever things DoorDash did to make Redis blazingly fast]]></description><link>https://newsletter.betterstack.com/p/how-doordash-improved-redis-to-handle</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-doordash-improved-redis-to-handle</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 18 Dec 2024 14:03:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>DoorDash</strong> is an online food delivery service. It <strong>lets users order food from local restaurants</strong> and delivers it to their doorstep.</p><p>Founded in 2013 by four US students, it has grown to over <strong>19 thousand employees worldwide</strong> and 550,000 restaurants on its platform in 2023, and it made over <strong>8 billion dollars in revenue</strong> that same year.</p><p>That&#8217;s a lot of growth in 10 years. 
But there&#8217;s more.</p><p>In 2022, DoorDash acquired <a href="https://wolt.com/">Wolt Enterprises</a>, taking the total number of countries it operates in from 6 to more than 30.</p><p>With so many restaurants, giving users an <strong>excellent search and recommendation experience is important</strong>.</p><p>To do this, the team at DoorDash <strong>built a machine learning model</strong> that used <a href="https://redis.io/">Redis</a> to store data.</p><p>But Redis wasn't coping well with the volume of reads.</p><p>Here's how they improved it.</p><p><em>Estimated reading time: 4 minutes 51 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why Does DoorDash Use ML?</h2><p>Not all online services use machine learning for their search and recommendations. So why does DoorDash?</p><p>In the past, the team used <strong>traditional methods to suggest restaurants</strong> based on a user's location and preferences, most likely a <a href="https://newsletter.betterstack.com/i/149093328/how-search-actually-worked">search pipeline with Elasticsearch</a>.</p><p>But <strong>this didn't have the level of personalization</strong> users have come to expect. 
The search and recommendations didn't update dynamically based on user behavior.</p><p>So, the team at DoorDash <strong>built a machine learning model to learn from its users</strong> and make better predictions.</p><p>But to do that, they would need to <strong>store a lot of data somewhere</strong> for fast and easy access. And that somewhere for DoorDash was <strong>Redis</strong>.</p><div><hr></div><h4><em>Sidenote: Redis</em></h4><p><em>Redis (Remote Dictionary Server) is <strong>an in-memory data store</strong>. In-memory means data is read and modified in the computer's memory (RAM), not on disk. This makes it incredibly fast. </em></p><p><em>In benchmarks published by Redis, reads were 12x faster than MongoDB and <a href="https://redis.io/blog/redisjson-public-preview-performance-benchmarking/">500x faster than Elasticsearch.</a></em></p><p><em>It <strong>stores data as key-value pairs</strong>, where keys are strings and values can be one of several data types, such as strings, lists, sets, and hashes.</em></p><p><em>But because Redis keeps all its data in RAM, storing a lot of data can get expensive. It also means that if the server crashes, any data not yet written to disk is lost.</em></p><p><em>Because of that, Redis is <strong>commonly used as a cache for data that needs to be retrieved quickly</strong>. 
It's often paired with other databases for long-term storage.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9nyk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9nyk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 424w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 848w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 1272w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9nyk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png" width="625" height="454.1864139020537" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:920,&quot;width&quot;:1266,&quot;resizeWidth&quot;:625,&quot;bytes&quot;:124051,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9nyk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 424w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 848w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 1272w, https://substackcdn.com/image/fetch/$s_!9nyk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5539e49-abf8-4a29-8589-f7152db20e07_1266x920.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p>The team tried other databases, including Cassandra, CockroachDB, and <a href="https://newsletter.betterstack.com/i/147920452/sidenote-scylladb">Scylla</a>, but settled on Redis for its performance and cost.</p><p>An ML model capable of the predictions DoorDash wanted would need to make tens of millions of reads per second.</p><p>As performant as Redis is, it couldn't handle that many reads out of the box.</p><p>So they needed to massively improve it.</p><div><hr></div><h4><em>Sidenote: ML Predictions</em></h4><p><em>Why does a machine learning model need to make tens of millions of reads per second?</em></p><p><em>A machine learning model is essentially a <strong>program that finds patterns in data</strong> and uses them to make predictions.</em></p><p><em>So if someone types 'best running shoes' into a recommendation model, the <strong>model would search for data</strong> such as shoe ratings, the user's purchase history, and shoe specifications.</em></p><p><em>These pieces of data are <strong>called features</strong>: the <strong>input data the model needs to analyze</strong>. Features start out as raw data, like shoe data from an application database.</em></p><p><em>This raw data is then cleaned up and transformed into a format the model can be trained on and use to make predictions.</em></p><p><em>This includes grouping data into categories or buckets, combining buckets to create new data, and removing redundant data, all of which can help the model find patterns.</em></p><p><em>All this data is <strong>stored in a feature store</strong>.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hYG5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hYG5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 424w, https://substackcdn.com/image/fetch/$s_!hYG5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 848w, https://substackcdn.com/image/fetch/$s_!hYG5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hYG5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hYG5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png" width="1456" height="594" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:594,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:175468,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hYG5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 424w, https://substackcdn.com/image/fetch/$s_!hYG5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 848w, https://substackcdn.com/image/fetch/$s_!hYG5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hYG5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5ff9bf4-0893-4608-8f6a-bbc3a6ce737b_2173x886.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>A feature store itself <strong>contains two main components</strong>: offline and online stores.</em></p><p><em><strong>Offline stores</strong> contain <strong>historical data</strong> used to train the model. This data is usually kept in disk-based databases.</em></p><p><em><strong>Online stores</strong> contain the most <strong>current data</strong> from real-time events, which is used for real-time predictions. This data is often streamed in via <a href="https://newsletter.betterstack.com/i/151865001/sidenote-cdc">CDC</a> and kept in memory for quick access.</em></p><p><em>New data from the online store is often also transferred to the offline store so the model can be trained on it. This is called feature ingestion.</em></p><p><em>So, when a prediction needs to be made, the model reads the data it needs from the online feature store.</em></p><p><em>If many predictions need to be made at once for different users, each requiring lots of feature data, thousands or even tens of thousands of reads could happen simultaneously.</em></p><div><hr></div><h2>How DoorDash Improved Redis</h2><p>Without modifications, <strong>Redis can handle a few hundred thousand reads per second</strong>, which is more than enough for the average company.</p><p>But for DoorDash to use it as its feature store, <strong>it needed to handle tens of millions of reads per second</strong>, which it struggled with.</p><p>So to improve Redis, the <strong>team needed to make it use less memory</strong> and use the CPU more efficiently, since these were among the bottlenecks they encountered.</p><p>Let's go through how they did that.</p><p>The first thing they did was switch to Redis Hashes.</p><div><hr></div><h4><em>Sidenote: Redis Hashes</em></h4><p><em>Redis Hashes are a data structure that <strong>allows you to store many field-value pairs under a single key</strong>.</em></p><p><em>The basic way to store data in Redis is as a plain string under its own key, which isn't designed for grouping many related values.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!W9RK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!W9RK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 424w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 848w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 1272w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!W9RK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png" width="275" height="154.58579881656806" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:285,&quot;width&quot;:507,&quot;resizeWidth&quot;:275,&quot;bytes&quot;:21610,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!W9RK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 424w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 848w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 1272w, https://substackcdn.com/image/fetch/$s_!W9RK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f8c1649-df10-4b3f-a907-6a7aad5c7253_507x285.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>But hashes are designed to do that. 
They are <strong>more memory efficient for storing many values</strong> because Redis stores small hashes in a compact encoding (a listpack in Redis 7 and later, a ziplist in older versions).</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FUzr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FUzr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 424w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 848w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 1272w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FUzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png" width="404" height="213.55019059720456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:787,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:42463,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FUzr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 424w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 848w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 1272w, https://substackcdn.com/image/fetch/$s_!FUzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0a6b39-a83d-430e-99ca-cf4cbff1edbe_787x416.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>You could also use the HGET command to get a single value and HMGET to get multiple values.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Jn3G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jn3G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 424w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 848w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 1272w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Jn3G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png" width="284" height="133.37857142857143" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:263,&quot;width&quot;:560,&quot;resizeWidth&quot;:284,&quot;bytes&quot;:22261,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jn3G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 424w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 848w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 1272w, https://substackcdn.com/image/fetch/$s_!Jn3G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F226b1a17-2771-4bd7-9fcf-ab6955fbe728_560x263.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div><hr></div><p><strong>Hashes alone reduced CPU usage by 82%</strong>. 
But there were more optimizations the team could make.</p><p>Next, they compressed feature names and values.</p><p>They <strong>compressed feature names with a fast hashing algorithm</strong> called <a href="https://xxhash.com/">xxHash</a>.</p><p>Feature names were typically very long so that humans could read them.</p><p>For example, one feature name took up 27 bytes of memory. Putting that exact text through the 32-bit version of xxHash reduced it to just 32 bits (4 bytes).</p><p>Since 27 bytes (B) is 216 bits (b), going down to 32 bits is an <strong>85% reduction in size</strong>. Doing this at scale saved a lot of memory.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PZWK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PZWK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 424w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 848w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 1272w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!PZWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png" width="524" height="260.1823587710605" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:501,&quot;width&quot;:1009,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:72447,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PZWK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 424w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 848w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 1272w, https://substackcdn.com/image/fetch/$s_!PZWK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F79324cf4-22e7-495b-b0ba-6a0a619c1fac_1009x501.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The team likely had a separate mapping or table that linked each feature name to its hashed version.</p><p>When it came to <strong>compressing feature values</strong>, they used a more complicated approach.</p><p>They first <strong>converted values to Protocol Buffers</strong> (protobufs), a data format developed by Google for storing and transmitting structured data in a compact binary form. Protobufs are <a href="https://newsletter.betterstack.com/i/152558295/sidenote-grpc">heavily used in gRPC</a>.</p><p>Then, they <strong>compressed the protobufs using Snappy</strong>, another Google-developed library, which prioritizes speed over compression ratio.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Vg5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Vg5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 424w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 848w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 1272w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Vg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png" width="648" height="329.34065934065933" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:740,&quot;width&quot;:1456,&quot;resizeWidth&quot;:648,&quot;bytes&quot;:544991,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Vg5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 424w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 848w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 1272w, https://substackcdn.com/image/fetch/$s_!0Vg5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f5cd31b-1f9c-4c18-b110-bbf17a2f8aa8_1790x910.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Snappy doesn't have the highest compression ratio and doesn't have the lowest CPU usage. 
But it was chosen over other options because it could compress Redis hashes and decompress feature values well.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yGJW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yGJW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 424w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 848w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 1272w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yGJW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png" width="698" height="238.73901098901098" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/74371120-c877-4499-881e-42b195774360_1918x656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:498,&quot;width&quot;:1456,&quot;resizeWidth&quot;:698,&quot;bytes&quot;:109072,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yGJW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 424w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 848w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 1272w, https://substackcdn.com/image/fetch/$s_!yGJW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F74371120-c877-4499-881e-42b195774360_1918x656.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>With all these changes, DoorDash saw a <strong>62% reduction in overall memory usage</strong>, from 298 GB of RAM to 112 GB.</p><p>And a <strong>65% reduction in CPU usage</strong>, from 208 CPUs to 72 CPUs per 10 million reads per second.</p><p>That&#8217;s incredible.</p><h2>Wrapping things up</h2><p>If you thought the efforts of the DoorDash team weren't
impressive enough, check this out.</p><p>They added CockroachDB to their feature store because Redis' memory costs were too high.</p><p>They used CockroachDB as an offline feature store and kept Redis as their online feature store. But that's a topic for another article.</p><p>As usual, if you liked this post and want more details, check out the <a href="https://careersatdoordash.com/blog/building-a-gigascale-ml-feature-store-with-redis/">original article</a>.</p><p>And if you want the next article sent straight to your inbox, be sure to <a href="https://newsletter.betterstack.com/">subscribe</a>.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds.
Making this one took 22 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Dropbox Saved Millions of Dollars by Building a Load Balancer]]></title><description><![CDATA[Dropbox saved resources by creating a superior version of a tool everyone uses]]></description><link>https://newsletter.betterstack.com/p/how-dropbox-saved-millions-of-dollars</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-dropbox-saved-millions-of-dollars</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 11 Dec 2024 14:01:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Dropbox</strong> is a <strong>cloud-based storage service</strong> that is ridiculously easy to use.</p><p>Download the app and drag your files into the newly created folder. That's it; your files are in the cloud and can be accessed from anywhere.</p><p>It sounds like a simple idea, but back in 2007, when it was released, there wasn't anything like it.</p><p>Today, Dropbox has around <strong>700 million users</strong> and stores over <strong>550 billion files</strong>.</p><p>All these files need to be <strong>organized, backed up, and accessible from anywhere</strong>. Dropbox uses <strong>virtual servers</strong> for this. 
But they often got <strong>overloaded</strong> and sometimes crashed.</p><p>So, the team at Dropbox built a solution to <strong>manage server loads</strong>.</p><p>Here's how they did it.</p><p><em>Estimated reading time: 5 minutes 15 seconds</em></p><div><hr></div><h2>Why Dropbox Servers Were Overloaded</h2><p>Before Dropbox grew in scale, they <strong>used a traditional system</strong> to balance load.</p><p>This likely used a <strong>round-robin algorithm with fixed weights</strong>.</p><p>So, a user or client would upload a file. The load balancer would forward the upload request to a server, which would then receive the file and store it correctly.</p><div><hr></div><h4><em>Sidenote: Weighted Round Robin</em></h4><p><em>Round robin is a <strong>simple load-balancing algorithm</strong>. It cycles through the servers in order, sending each new request to the next one so they all get an equal share of the load.</em></p><p><em>If there are three servers, A, B, and C, and three requests come in, A gets the first, B gets the second, and C gets the third.</em></p><p><em><strong>Weighted round robin</strong> is a level up from round robin.
Each server is given a <strong>weight based on its processing power and capacity</strong>.</em></p><p><em><strong>Static weights</strong> are assigned manually by a network admin. <strong>Dynamic weights</strong> are adjusted in <strong>real time</strong> by a load balancer.</em></p><p><em>The higher the weight, the more load the server gets.</em></p><p><em>So if A has a weight of 3, B has 2, C has 1, and 12 requests come in, A gets 6, B gets 4, and C gets 2. The weights add up to 6, so each unit of weight is worth 12 / 6 = 2 requests.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eN06!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eN06!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 424w, https://substackcdn.com/image/fetch/$s_!eN06!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 848w, https://substackcdn.com/image/fetch/$s_!eN06!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 1272w, https://substackcdn.com/image/fetch/$s_!eN06!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 1456w" sizes="100vw"><img
src="https://substackcdn.com/image/fetch/$s_!eN06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png" width="639" height="470.8997429305913" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1167,&quot;resizeWidth&quot;:639,&quot;bytes&quot;:113057,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eN06!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 424w, https://substackcdn.com/image/fetch/$s_!eN06!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 848w, https://substackcdn.com/image/fetch/$s_!eN06!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 1272w, https://substackcdn.com/image/fetch/$s_!eN06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2cdf30e7-17a2-437d-842d-9c75b052fff6_1167x860.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p>But there was an issue with their traditional load-balancing approach.</p><p>Dropbox had <strong>many virtual servers</strong> with vastly <strong>different hardware</strong>. This made it <strong>difficult to distribute the load evenly</strong> between them with static weights.</p><p>This difference in hardware may have come from Dropbox adding more powerful servers as it grew.</p><p>They may have started with average servers, added more powerful ones as the company grew, and then added even more powerful ones after that.</p><p>At the time, there was <strong>no off-the-shelf load-balancing solution</strong> that could help.
Especially one that used dynamic weighted round robin <strong>with gRPC support</strong>.</p><p>So, they <strong>built their own</strong>, which they called <strong>Robinhood</strong>.</p><div><hr></div><h4><em>Sidenote: gRPC</em></h4><p><em><strong>gRPC</strong> (officially "gRPC Remote Procedure Calls", originally developed at Google) is a way for different programs to talk to each other. It's based on RPC (remote procedure call), which allows a client to <strong>run a function on the server simply by calling it</strong>.</em></p><p><em>This is <strong>different from REST</strong>, where the client sends HTTP requests to URLs that identify resources. REST focuses on the resource being accessed instead of the action being performed.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!A6Ug!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!A6Ug!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 424w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 848w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 1272w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 1456w"
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!A6Ug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png" width="536" height="158.6122448979592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:232,&quot;width&quot;:784,&quot;resizeWidth&quot;:536,&quot;bytes&quot;:42039,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!A6Ug!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 424w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 848w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 1272w, https://substackcdn.com/image/fetch/$s_!A6Ug!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff8c9a35b-00f2-4f88-bfe4-48557886807c_784x232.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>But gRPC differs from both REST and plain RPC in several other ways.</em></p><div
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VnYj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VnYj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 424w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 848w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 1272w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VnYj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png" width="617" height="284.0733137829912" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:471,&quot;width&quot;:1023,&quot;resizeWidth&quot;:617,&quot;bytes&quot;:88880,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VnYj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 424w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 848w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 1272w, https://substackcdn.com/image/fetch/$s_!VnYj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9853703d-f470-4c45-8c43-7db3b5a86b5f_1023x471.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>The biggest one is the <strong>use of protobufs</strong> (Protocol Buffers). This data serialization format, <a href="https://protobuf.dev/">developed by Google</a>, is used <strong>to store and send data</strong>.</em></p><p><em>It works by encoding structured data into a compact binary format for fast transmission, which the recipient then decodes back into structured data. The encoded messages are also typically much smaller than the equivalent JSON.</em></p><p><em>Protobufs are a big part of what makes <strong>gRPC fast</strong>, but they also make it harder to set up, since the client and server both need to work from the same message definitions.</em></p><p><em>Browsers don't support gRPC natively, so it's <strong>most commonly used for communication between internal services</strong>.</em></p><div><hr></div><h2>The Custom Load Balancer</h2><p>The <strong>main component of Robinhood</strong> is the load balancing service, or <strong>LBS</strong>.
This <strong>manages how requests are distributed</strong> to different servers.</p><p>It does this by <strong>continuously collecting load data</strong> from all the servers and using it to calculate the average resource usage across them. It then tries to keep every server close to that average.</p><p>Each server is given a <strong>PID controller</strong> (proportional-integral-derivative controller), a feedback loop that regulates the server's resource usage. This sets an <strong>upper and lower server resource limit</strong> close to the average.</p><p>Say the average CPU usage is 70%. The upper limit could be 75%, and the lower limit could be 65%. If a server hits 75%, it is given fewer requests to deal with, and if it drops below 65%, it is given more.</p><p>This is <strong>how the LBS assigns weights</strong> to each server. Because the weights are dynamic, a server that previously had a weight of 5 could drop to 1 if its resource usage rises above the average.</p><p>In addition to the LBS, <strong>Robinhood had two other components</strong>: the <strong>proxy</strong> and the <strong>routing database</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GAAQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GAAQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 424w, https://substackcdn.com/image/fetch/$s_!GAAQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 848w,
https://substackcdn.com/image/fetch/$s_!GAAQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!GAAQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GAAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png" width="539" height="654.034103410341" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1103,&quot;width&quot;:909,&quot;resizeWidth&quot;:539,&quot;bytes&quot;:123449,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GAAQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 424w, https://substackcdn.com/image/fetch/$s_!GAAQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 848w, 
https://substackcdn.com/image/fetch/$s_!GAAQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 1272w, https://substackcdn.com/image/fetch/$s_!GAAQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F83f9f178-43b3-4026-b6db-1f20a1e7b0e2_909x1103.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <strong>proxy sends server load data to the LBS</strong> via gRPC.</p><p>Why doesn't the LBS collect this itself?
Well, the LBS is already doing a lot.</p><p>There could be thousands of servers, and the LBS would need to scale up just to collect metrics from all of them.</p><p>So, the proxy has the sole responsibility of collecting server data to reduce the load on the LBS.</p><p>The <strong>routing database stores server information</strong>, such as the weights generated by the LBS, IP addresses, and hostnames.</p><p>Although the LBS keeps some data in memory for quick access, <strong>LBS instances can come and go</strong>; sometimes one crashes and needs to restart.</p><p>The <strong>routing database persists this data</strong>, so new or restarted LBS instances can read it.</p><p>The routing database can be based on either ZooKeeper or etcd. The choice between them may come down to supporting legacy systems.</p><div><hr></div><h4><em>Sidenote: ZooKeeper vs etcd</em></h4><p><em>Both ZooKeeper and etcd are <strong>distributed coordination services</strong>.</em></p><p><em>They are designed to be the <strong>central place where config and state data is stored</strong> in a distributed system.</em></p><p><em>They also make sure that <strong>each node</strong> in the system sees the most <strong>up-to-date version of this data</strong>.</em></p><p><em>Each service runs as a cluster of servers that elects a single server, called the leader, to handle all writes.</em></p><p><em>The leader replicates the data to the other servers, which then serve it to the relevant clients.
In this case, a client could be an LBS instance.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bXRy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bXRy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 424w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 848w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 1272w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bXRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png" width="666" height="320.64972527472526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1456,&quot;resizeWidth&quot;:666,&quot;bytes&quot;:94238,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bXRy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 424w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 848w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 1272w, https://substackcdn.com/image/fetch/$s_!bXRy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf11fe7b-1dc9-49c7-96ec-c5b6ea0adb38_1536x739.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>So, if a new LBS instance joins the cluster, it can read the exact state of all the servers and the average load they are being balanced toward.</em></p><p><em>There are a few differences between Zookeeper and etcd.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T1n9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T1n9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 424w,
https://substackcdn.com/image/fetch/$s_!T1n9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 848w, https://substackcdn.com/image/fetch/$s_!T1n9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 1272w, https://substackcdn.com/image/fetch/$s_!T1n9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T1n9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png" width="627" height="289.33756097560973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:473,&quot;width&quot;:1025,&quot;resizeWidth&quot;:627,&quot;bytes&quot;:82294,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T1n9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 424w, 
https://substackcdn.com/image/fetch/$s_!T1n9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 848w, https://substackcdn.com/image/fetch/$s_!T1n9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 1272w, https://substackcdn.com/image/fetch/$s_!T1n9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4e63a16-6002-4a30-946f-93ee5bd1bb91_1025x473.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p>After <strong>Dropbox deployed RobinHood to all their data centers</strong>, here is the difference it made.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wzfl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wzfl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 424w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 848w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 1272w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wzfl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp" width="691" height="436.6736111111111"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1440,&quot;resizeWidth&quot;:691,&quot;bytes&quot;:21304,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wzfl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 424w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 848w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 1272w, https://substackcdn.com/image/fetch/$s_!wzfl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F395f629e-dadf-4560-97cd-f018529203b7_1440x910.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The X axis shows the date (MM/DD), and the Y axis shows CPU usage as a ratio of the average. So, a value of 1.5 means CPU usage was 1.5 times the average.</p><p>You can see that at the start, the 95th-percentile CPU usage was around 1.17 times the average.</p><p>It takes a <strong>few days for RobinHood to regulate everything</strong>, but after 11/01, usage stabilizes, and most CPUs operate close to the average. </p><p>This shows a <strong>large reduction in the gap between the busiest CPUs and the average</strong>, which indicates a <strong>better-balanced load</strong>.</p><p>In fact, after running RobinHood in production for a few years, the team at Dropbox has been able to <strong>reduce their fleet size by 25%</strong>, which substantially reduced their costs.</p><p>Dropbox doesn't say exactly how much money this saved. But given the cost and resource savings they mention, and the size of their infrastructure, it's reasonable to infer that the savings were substantial, most likely in the millions.</p><h2>Wrapping Things Up</h2><p>It's amazing how much goes on behind the scenes when someone uploads a file to Dropbox. I will never look at the app in the same way again.</p><p>I hope you enjoyed reading this as much as I enjoyed writing it. If you want more details, you can check out the <a href="https://dropbox.tech/infrastructure/robinhood-in-house-load-balancing-service#footnote-two">original article</a>.</p><p>And as usual, be sure to subscribe to get the next article sent straight to your inbox.</p><div><hr></div><p><strong>Newsletter Spotlight</strong></p><p>If you want to become a more productive tech professional, check out the <a href="https://techproductivity.co/">Tech Productivity</a> newsletter. </p><p>It&#8217;s full of links and tools you wish you had known about earlier.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds.
Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Stripe Processed $1 Trillion in Payments with Zero Downtime]]></title><description><![CDATA[Stripe's hand-crafted system that guarantees their data never gets lost]]></description><link>https://newsletter.betterstack.com/p/how-stripe-processed-1-trillion-in</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-stripe-processed-1-trillion-in</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 04 Dec 2024 14:02:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Stripe</strong> is a platform that <strong>allows businesses to accept payments online</strong> and in person.</p><p>Yes, there are lots of other payment platforms, like PayPal and Square. But what makes Stripe so popular is its developer-friendly approach.</p><p>It can be set up with just a few lines of code, has excellent documentation, and supports lots of programming languages.</p><p>Stripe is now <strong>used on 2.84 million sites</strong> and processed over <strong>$1 trillion in total payments in 2023</strong>.
Wow.</p><p>But what makes this more impressive is that they were <strong>able to process all these payments with virtually no downtime</strong>.</p><p>Here's how they did it.</p><p><em>Estimated reading time: 4 minutes 58 seconds.</em></p><div><hr></div><h2>The Resilient Database</h2><p>When Stripe was starting out, they chose <strong>MongoDB</strong> because they found it <strong>easier to use than a relational database</strong>.</p><p>But as Stripe began to process large volumes of payments, they <strong>needed a solution that could scale with zero downtime during migrations</strong>.</p><p>MongoDB already offers a solution for data at scale: sharding. But this wasn't enough for Stripe's needs.</p><div><hr></div><h4>Sidenote: MongoDB Sharding</h4><p><em><a href="https://newsletter.betterstack.com/i/147520006/sidenote-sharding">Sharding</a> is the process of <strong>splitting a large database into smaller ones</strong>. This means the demand is spread across several smaller databases instead of one large one.</em></p><p><em>Let's explain how MongoDB does sharding.
Imagine we have a collection of users.</em></p><p><em>Each document has fields like userID, name, email, and transactions.</em></p><p><em>Before sharding takes place, a <strong>developer must choose a shard key</strong>. This is the field MongoDB uses to decide how the data is split up. In this case, <strong>userID is a good shard key</strong>.</em></p><p><em>If userID is sequential, <strong>users 1-100 could go into one chunk</strong>, users 101-200 into another, and so on. By default, the maximum chunk size is 128MB.</em></p><p><em>From there, <strong>chunks are distributed across shards</strong>, where each shard holds a subset of the larger collection.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2EzT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2EzT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 424w, https://substackcdn.com/image/fetch/$s_!2EzT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 848w, https://substackcdn.com/image/fetch/$s_!2EzT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 1272w,
https://substackcdn.com/image/fetch/$s_!2EzT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2EzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png" width="1322" height="760" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1322,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:91740,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2EzT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 424w, https://substackcdn.com/image/fetch/$s_!2EzT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 848w, https://substackcdn.com/image/fetch/$s_!2EzT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 1272w, 
https://substackcdn.com/image/fetch/$s_!2EzT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb0ffc49-1c6f-4773-bf98-81b0832b3edf_1322x760.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>In MongoDB, each shard is deployed as a <strong>replica set</strong>. This means each shard's data is copied to at least one other server in case one fails. So, every shard has a primary node and at least one secondary node.</em></p><p><em>A sharded cluster also runs one or more <strong>mongos instances</strong>, which act as <strong>query routers</strong>. So, if an application wants to read or write data, a mongos instance routes the query to the correct shard.</em></p><p><em>A mongos instance works with a <strong>config server</strong>, which <strong>keeps all the metadata about the shards</strong>. This metadata includes how many shards there are, which chunks live on which shard, and more.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SxxY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SxxY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 424w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 848w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SxxY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png" width="566" height="540.5666412795126"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1254,&quot;width&quot;:1313,&quot;resizeWidth&quot;:566,&quot;bytes&quot;:129505,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SxxY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 424w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 848w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 1272w, https://substackcdn.com/image/fetch/$s_!SxxY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2a894ca4-19cb-43a4-863c-f7981cdad4b2_1313x1254.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>Stripe wanted more control over all this data movement, including migrations. They also wanted to focus on the reliability of their APIs.</em></p><div><hr></div><p>So, the team <strong>built their own database infrastructure called DocDB</strong> on top of MongoDB.</p><p>MongoDB managed how data was stored, retrieved, and organized, while DocDB handled sharding, data distribution, and data migrations.</p><p>Here is a high-level overview of how it works.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bYFS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bYFS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 424w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 848w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 1272w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bYFS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png" width="488" height="598.3043095866315"
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1394,&quot;width&quot;:1137,&quot;resizeWidth&quot;:488,&quot;bytes&quot;:113292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bYFS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 424w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 848w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 1272w, https://substackcdn.com/image/fetch/$s_!bYFS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6825a7c4-3a1d-4740-82cc-e1ad9c56dc50_1137x1394.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Aside from a few differences, the process is similar to MongoDB's. One difference is that all the services are <strong>written in Go to help with reliability and scalability</strong>.</p><p>Another difference is the addition of Change Data Capture (CDC). We'll talk about that in the next section.</p><h2>The Data Movement Platform</h2><p>The Data Movement Platform is what Stripe calls the '<em>heart</em>' of DocDB. It's <strong>the system that enables zero downtime</strong> when chunks are moved between shards.</p><p>But why is Stripe moving so much data around?</p><p>DocDB tries to keep a defined range of data, like user IDs 1 to 100, together in a single chunk that lives on one shard. Each chunk has a maximum size. Stripe hasn't published the limit, but it is likely 128MB, which matches MongoDB's default chunk size.</p><p>So <strong>if data grows in size, new chunks need to be created</strong>, and the extra data needs to be moved into them.</p><p>Data also has to move if someone changes the shard key to distribute data more evenly. In that case, a lot of data needs to be moved at once.</p>
<p>This <strong>gets really complex</strong> when you consider that <strong>data in a specific shard might depend on data from other shards</strong>.</p><p>For example, user data might contain transaction IDs that link to data in another collection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WDjo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WDjo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 424w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 848w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 1272w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WDjo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png" width="640" height="274.794711203897" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:617,&quot;width&quot;:1437,&quot;resizeWidth&quot;:640,&quot;bytes&quot;:156612,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WDjo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 424w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 848w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 1272w, https://substackcdn.com/image/fetch/$s_!WDjo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F676e2643-bc4b-4a8f-93f2-10d39be8c001_1437x617.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If a transaction gets deleted or moved, then chunks in different shards need to change.</p><p>These are the kinds of things the Data Movement Platform was created for.</p><p>Here is <strong>how a chunk would be moved</strong> from Shard A to Shard B.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZpOr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZpOr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZpOr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 848w, https://substackcdn.com/image/fetch/$s_!ZpOr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 1272w, https://substackcdn.com/image/fetch/$s_!ZpOr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZpOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png" width="1391" height="822" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1391,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZpOr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 424w, 
https://substackcdn.com/image/fetch/$s_!ZpOr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 848w, https://substackcdn.com/image/fetch/$s_!ZpOr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 1272w, https://substackcdn.com/image/fetch/$s_!ZpOr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F46d487ff-5bd2-459e-a174-80e7218ac6ff_1391x822.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>1. <strong>Register the intent.</strong> Tell Shard B that it's about to receive a chunk of data from Shard A.</p><p>2. <strong>Build indexes on Shard B</strong> based on the data that will be imported. An index is a small amount of data that acts as a reference, like the contents page in a book, so records can be found quickly without scanning everything.</p><p>3. <strong>Take a snapshot.</strong> A copy, or snapshot, of the data is taken <strong>at a specific time</strong>, which we'll call T.</p><p>4. <strong>Import snapshot data</strong>. The data is transferred from the snapshot to Shard B. During the transfer, the chunk on Shard A can still accept new writes. Remember, this is a zero-downtime migration.</p><p>5. <strong>Async replication</strong>. Once the snapshot data has been imported, every change made to the chunk on Shard A after T is also written to Shard B.</p><p>But how does the system know which changes have taken place? This is where CDC comes in.</p><div><hr></div><h4><em>Sidenote: CDC</em></h4><p><em><strong>Change Data Capture</strong>, or CDC, is a technique used to <strong>capture changes made to data</strong>. It's especially useful for keeping different systems up to date in real time.</em></p><p><em>So when data changes, a <strong>message</strong> containing the data before and after the change is <strong>sent to an event streaming platform</strong>, like <a href="https://kafka.apache.org/">Apache Kafka</a>. 
Anything subscribed to that message will be updated.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uDry!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uDry!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 424w, https://substackcdn.com/image/fetch/$s_!uDry!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 848w, https://substackcdn.com/image/fetch/$s_!uDry!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 1272w, https://substackcdn.com/image/fetch/$s_!uDry!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uDry!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png" width="682" height="521.8082001389854" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1101,&quot;width&quot;:1439,&quot;resizeWidth&quot;:682,&quot;bytes&quot;:138132,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uDry!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 424w, https://substackcdn.com/image/fetch/$s_!uDry!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 848w, https://substackcdn.com/image/fetch/$s_!uDry!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 1272w, https://substackcdn.com/image/fetch/$s_!uDry!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5df4f87d-f0c9-415c-b2e0-c2b7afee43cd_1439x1101.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>In the case of MongoDB, changes made to a shard are <strong>stored in a special collection called the Operation Log</strong>, or Oplog. So when something changes, the <strong>CDC system reads that record from the Oplog</strong>.</em></p><p><em>Different <strong>shards can subscribe to a piece of data</strong> and get notified when it's updated. This means they can <strong>update their data accordingly</strong>.</em></p><p><em>Stripe went the extra mile and stored all CDC messages in Amazon S3 for long-term storage.</em></p><div><hr></div><p>6. <strong>Point-in-time snapshots.</strong> These are taken throughout the async replication step. They compare the data on Shard A with the data on Shard B to check that the replicated data is correct.</p><p>Yes, writes are still being made to Shard A, so Shard B will always be slightly behind.</p><p>7. <strong>The traffic switch</strong>. Shard A stops accepting writes for the chunk while the final changes are transferred. 
Then, traffic is switched, so new reads and writes go to Shard B.</p><p>This <strong>process takes less than two seconds</strong>. Writes sent to Shard A during that window fail at first, but succeed when retried.</p><p>8. <strong>Delete the moved chunk.</strong> Once the migration is complete, the chunk is deleted from Shard A and the metadata is updated.</p><h2>Wrapping Things Up</h2><p>This has to be the most complicated database system I have ever seen.</p><p>It took a lot of research to fully understand it myself, and I'm sure I've missed some juicy details.</p><p>If you're interested in what I missed, feel free to read through the <a href="https://stripe.com/blog/how-stripes-document-databases-supported-99.999-uptime-with-zero-downtime-data-migrations">original article</a>.</p><p>And as usual, if you enjoy reading about how big tech companies solve big issues, go ahead and <a href="https://newsletter.betterstack.com/">subscribe</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? 
Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Shopify Reduced Metrics Resources by 75%]]></title><description><![CDATA[Shopify saved resources by breaking big tools into tiny reusable services]]></description><link>https://newsletter.betterstack.com/p/how-shopify-reduced-metrics-resources</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-shopify-reduced-metrics-resources</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 27 Nov 2024 13:02:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Shopify</strong> launched in 2006, and in 2023 it made over $7 billion in revenue, with 5.6 million active stores.</p><p>That's almost as many stores as there are people in Singapore.</p><p>But with so many stores, it's essential that they stay fast to navigate and don't go down.</p><p>So, the team at Shopify <strong>created a system from scratch to monitor their infrastructure</strong>.</p><p>Here's exactly how they did it.</p><p><em>Estimated reading time: 4 minutes 53 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p 
class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p></p><h2>Shopify's Bespoke System</h2><p>Shopify didn't always have its own system. Before 2021, it <strong>used different third-party services </strong>for logs, metrics, and traces.</p><p>But as it scaled, things started to get <strong>very expensive</strong>. The team also struggled to collect and share data across the different tools.</p><p>So they decided to <strong>build their own observability tool</strong>, which they called <strong>Observe</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!lplS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!lplS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 424w, https://substackcdn.com/image/fetch/$s_!lplS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 848w, 
https://substackcdn.com/image/fetch/$s_!lplS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 1272w, https://substackcdn.com/image/fetch/$s_!lplS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!lplS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png" width="1456" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:647954,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!lplS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 424w, https://substackcdn.com/image/fetch/$s_!lplS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 848w, 
https://substackcdn.com/image/fetch/$s_!lplS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 1272w, https://substackcdn.com/image/fetch/$s_!lplS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5193b1bb-8a60-40aa-b7a2-642b64aac0b9_1627x880.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Screenshot of Observe from <a href="https://www.youtube.com/watch?v=_tRR9JY7oXY&amp;t=2105s">this 
video</a></figcaption></figure></div><p>As you can imagine, a lot of work from many different teams went into building the backend of Observe. But the UI was actually <strong>built on top of Grafana</strong>.</p><div><hr></div><h4><em>Sidenote: Grafana</em></h4><p><em>Grafana is an <strong>open-source observability tool</strong>. It focuses on visualizing data from different sources using interactive dashboards.</em></p><p><em>Say you have a web application that stores its log data in a database. You connect Grafana to that database and create a dashboard <strong>to visually understand the data</strong>.</em></p><p><em>If you use the open-source version, you have to host Grafana yourself to share the dashboard. That's the advantage, or disadvantage, of open-source software.</em></p><p><em>Grafana also <strong>allows users to extend its functionality with plugins</strong>, without needing to change the core Grafana code.</em></p><p><em>This is <strong>how Shopify was able to build Observe</strong> on top of it and use its visualization features to display their graphs.</em></p><div><hr></div><p>Observe is a <strong>tool for monitoring and observability</strong>. This article will focus on the metrics part.</p><p>Although Shopify has 5.6 million active stores, it <strong>collects metrics from at most 1 million endpoints</strong>. An endpoint is <strong>a component that can be monitored</strong>, like a server or container. Let me explain.</p><p>Like many large-scale applications, Shopify runs on a <strong>distributed cloud infrastructure</strong>. This means it <strong>uses servers in many locations</strong> around the world, which keeps the service fast and reliable for users everywhere.</p><p>The infrastructure also <strong>scales based on traffic</strong>. 
So if there are many visits to Shopify, more servers get added automatically.</p><p>All 5.6 million stores share this same infrastructure.</p><p>Shopify usually has around <strong>a hundred thousand monitored endpoints</strong>, but this can grow to <strong>one million at peak times</strong>. A typical company might monitor around 100 endpoints, so 1 million is incredibly high.</p><p>Even after building Observe, the team <strong>struggled to handle this many endpoints</strong>.</p><h2>More Metrics, More Problems</h2><p>The Shopify team used a pretty standard <strong>architecture for collecting metrics</strong>: <strong>Kubernetes</strong> to manage their applications and <strong>Prometheus</strong> to collect metrics.</p><p>In the world of Prometheus, a <strong>monitored endpoint is called a target</strong>. In the world of Kubernetes, a server runs in a container that runs within a pod.</p><div><hr></div><h4><em>Sidenote: Prometheus</em></h4><p><em>Prometheus is an open-source, <strong>metrics-based monitoring system</strong>.</em></p><p><em>It <strong>works by scraping, or pulling, metrics data</strong> from an application, instead of the application pushing data to Prometheus.</em></p><p><em>To use Prometheus on a server, you'll <strong>need a metrics client library or exporter</strong>, like <a href="https://github.com/siimon/prom-client">prom-client</a> for Node.js.</em></p><p><em>This will <strong>collect metrics</strong> like memory and CPU usage and <strong>store them in memory</strong> on the application server, ready to be scraped.</em></p><p><em>The <strong>Prometheus server pulls the in-memory metrics</strong> data at a regular, configurable scrape interval (for example, every 30 seconds) and stores it in a <a href="https://hazelcast.com/glossary/time-series-database/">time series database</a> (TSDB).</em></p><p><em>From there, you can view the metrics data using the Prometheus web UI or a third-party visualization tool like Grafana.</em></p><div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OJAG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OJAG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 424w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 848w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 1272w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OJAG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png" width="1456" height="622" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/778369da-0472-4eac-8da2-0ca96340411b_1689x721.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:622,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:110703,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OJAG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 424w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 848w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 1272w, https://substackcdn.com/image/fetch/$s_!OJAG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F778369da-0472-4eac-8da2-0ca96340411b_1689x721.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>There are <strong>two ways to run Prometheus</strong>: server mode and agent mode.</em></p><p><em><strong>Server mode</strong> is the mode explained above, which includes the Prometheus server, database, and web UI.</em></p><p><em><strong>Agent mode</strong> is designed to <strong>collect and forward the metrics</strong> rather than store and query them itself. It forwards them using Prometheus remote write, so a developer can choose any storage solution that accepts remote-write data.</em></p><div><hr></div><p>The team ran many <strong>Prometheus agent pods in a ReplicaSet</strong>. A Kubernetes ReplicaSet makes sure a specific number of pods are running at any given time.</p><p>Each Prometheus agent was <strong>assigned a percentage of the total targets</strong>. 
Each agent used the Kubernetes API to discover targets, then searched through all of them to find the ones assigned to it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vyc3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vyc3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 424w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 848w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 1272w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vyc3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png" width="1456" height="737" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:737,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:133393,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vyc3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 424w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 848w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 1272w, https://substackcdn.com/image/fetch/$s_!Vyc3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9895d0c8-f0e7-43a5-be82-24f90e385bcf_1684x852.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You can probably already see the <strong>problems this approach would cause</strong> at scale.</p><p>1. A surge of new targets could cause an <strong>agent to run out of memory</strong> and crash.</p><p>2. Distributing targets by <strong>percentage leads to uneven load</strong>, because targets vary in size. One target could be a huge application with 100 metrics to track, while another could be small and have just 4.</p><p>But these were nothing compared to the big issue the team discovered.</p><p>Around <strong>50% of an agent's resources</strong> were being used just to <strong>discover targets</strong>.</p><p>Each agent had to go through up to 1 million targets to find the ones assigned to it. 
So every pod was repeating the exact same work, <strong>which is wasteful</strong>.</p><p>To fix this, the team had to <strong>take apart and rebuild</strong> <strong>Prometheus</strong>.</p><h2>Breaking Things Down</h2><p>Since <strong>discovery was taking up such a large share of the resources</strong>, they removed it from the agents. How?</p><p>They went through all the code for a Prometheus agent, <strong>took out the code related to discovery</strong>, and put it in its <strong>own service</strong>.</p><p>But they didn't stop there.</p><p>They gave these discovery services the ability to <strong>scrape all targets every two minutes</strong>.</p><p>This let them <strong>check exactly how many metrics each target had</strong>, so targets could be shared evenly by load rather than by count.</p><p>They also <strong>built an operator service</strong>. It managed the Prometheus agents and <strong>received the scraped data from the discovery pods</strong>.</p><p>The operator checks whether an agent has the capacity to handle the targets. If it does, it assigns the targets to that agent. 
If not, it will create a new agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mubA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mubA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 424w, https://substackcdn.com/image/fetch/$s_!mubA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 848w, https://substackcdn.com/image/fetch/$s_!mubA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 1272w, https://substackcdn.com/image/fetch/$s_!mubA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mubA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png" width="1456" height="737" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:737,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mubA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 424w, https://substackcdn.com/image/fetch/$s_!mubA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 848w, https://substackcdn.com/image/fetch/$s_!mubA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 1272w, https://substackcdn.com/image/fetch/$s_!mubA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa7eb02ac-d18b-4dd6-a50c-668311ece700_1525x772.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These changes alone <strong>reduced resource usage by</strong> <strong>33%</strong>. That's a good improvement, but they did even better.</p><p>The team ran <strong>many discovery pods</strong> to spread the load and to keep the process running if one pod crashed. But they realized <strong>each pod was still going through</strong> <strong>all the targets</strong>.</p><p>So they <strong>reduced it to just</strong> <strong>one discovery pod</strong> and added what they called discovery workers, which were responsible for scraping targets.</p><p>The discovery pod discovers targets and <strong>puts each one in a queue to be scraped</strong>. The <strong>workers pick a target</strong> from the queue and <strong>scrape its metrics</strong>.</p><p>The worker then <strong>sends the data to the discovery pod</strong>, which <strong>sends it to the operator</strong>. 
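</p><p>The queue-and-workers setup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Shopify's code: the targets are simulated with a hypothetical TARGETS dictionary, the hypothetical scrape() function stands in for an HTTP request to each target's metrics endpoint, and Python threads stand in for separate worker pods. A single discovery step enqueues every target exactly once, workers pull targets off the queue, drop unhealthy ones, and report each target's metric count so it can be forwarded to the operator.</p><pre><code>
```python
import queue
import threading

# Hypothetical targets: name -> number of metrics exposed (None = unreachable).
TARGETS = {"checkout-1": 120, "storefront-7": 40, "search-3": None, "cart-2": 4}


def scrape(target):
    """Stand-in for an HTTP scrape of the target's metrics endpoint."""
    metric_count = TARGETS[target]
    if metric_count is None:
        raise ConnectionError(f"{target} is unreachable")
    return metric_count


def worker(work_q, results_q):
    """Pull targets off the queue until a None stop signal arrives."""
    while True:
        target = work_q.get()
        if target is None:
            work_q.task_done()
            return
        try:
            results_q.put((target, scrape(target)))
        except ConnectionError:
            pass  # Unhealthy target: filter it out instead of forwarding it.
        finally:
            work_q.task_done()


def discover_and_scrape(targets, num_workers=2):
    """Single discovery step that fans scraping out to a pool of workers."""
    work_q = queue.Queue()
    results_q = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(work_q, results_q))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    # Discovery happens once: each target is enqueued exactly one time.
    for target in targets:
        work_q.put(target)
    for _ in workers:
        work_q.put(None)  # One stop signal per worker.
    for w in workers:
        w.join()

    # Gather metric counts per target, ready to hand to the operator.
    results = {}
    while not results_q.empty():
        target, metric_count = results_q.get()
        results[target] = metric_count
    return results


print(sorted(discover_and_scrape(TARGETS).items()))
# [('cart-2', 4), ('checkout-1', 120), ('storefront-7', 40)]
```
</code></pre><p>Because only one process walks the target list in this sketch, adding workers adds scraping capacity without repeating discovery, which is the duplicated work the team set out to remove.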
</p><p>Of course, the number of workers could be scaled up or down as needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Zxa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Zxa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 424w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 848w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 1272w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Zxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png" width="1456" height="681" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114328,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Zxa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 424w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 848w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 1272w, https://substackcdn.com/image/fetch/$s_!0Zxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F621636e5-ded8-41f6-a761-f9ac9368b602_1610x753.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The workers could also filter out unhealthy targets: targets that are unreachable or do not respond to scrape requests.</p><p>This further change <strong>reduced resource use by a</strong> <strong>whopping 75%</strong>.</p><h2>Wrapping Things Up</h2><p>This is a common pattern I see when it comes to solving issues at scale: break things down into their basic pieces, then build them back up.</p><p>All the information in this post comes from a series of internal <a href="https://www.youtube.com/playlist?list=PLvQF73bM4-5X9mt0lweCXL_v8xdvrLEvB">YouTube videos</a> about Observe that Shopify made public. 
I'm glad Shopify did this so others can learn from it.</p><p>Of course, there is more information in <a href="https://www.youtube.com/watch?v=ZXOxx6Cjt5g&amp;t=822s">this video</a> than this article provides, so please check it out.</p><p>And if you want the next Hacking Scale article sent straight to your inbox, go ahead and <a href="https://newsletter.betterstack.com/">subscribe</a>. You won't be disappointed.</p><p>One final thing. If you&#8217;re thinking of self-hosting Prometheus, it&#8217;s actually cheaper and easier to use Better Stack. Don&#8217;t believe me? <a href="https://betterstack.com/infrastructure-monitoring">Check this out</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 20 hours.</em></p><div><hr></div><p><em>Reading this email in Gmail? 
Drag it to the <strong>Primary tab</strong> and never miss the next one.</em></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jiE-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jiE-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 424w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 848w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 1272w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jiE-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png" width="777" height="159" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:159,&quot;width&quot;:777,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:37420,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jiE-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 424w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 848w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 1272w, https://substackcdn.com/image/fetch/$s_!jiE-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdfcfbbfc-985f-404a-8570-d0b7bfa01b9d_777x159.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[How GitHub Reduced Repo Storage Size by Over 90%]]></title><description><![CDATA[GitHub tackled its data overload by introducing a completely new type of file]]></description><link>https://newsletter.betterstack.com/p/how-github-reduced-repo-storage-size</link><guid 
isPermaLink="false">https://newsletter.betterstack.com/p/how-github-reduced-repo-storage-size</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 20 Nov 2024 14:03:31 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa73a85a0-3321-4f06-85ac-f12b7d503186_2548x1431.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>GitHub</strong> supports over 200 programming languages and has over 330 million repositories. But it has a pretty big problem.</p><p>It <strong>stores</strong> <strong>almost 19 petabytes of data</strong>.</p><p>One petabyte can hold hundreds of millions of songs, so we're talking about <strong>a lot of data</strong>.</p><p>And <strong>much of that data is unreachable</strong>; it's just taking up space unnecessarily.</p><p>But with some clever engineering, GitHub was able to fix that and reduce the size of specific repositories by more than 90%.</p><p>Here's how they did it.</p><p><em>Estimated reading time: 4 minutes 25 seconds.</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. 
Get the next one sent to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why GitHub has Unreachable Data</h2><p>The <strong>Git</strong> in GitHub comes from the name of a version control system called Git, which was created by Linus Torvalds, the <a href="https://en.wikipedia.org/wiki/Linus_Torvalds">creator of Linux</a>.</p><p>It <strong>works by tracking changes to files</strong> in a project over time.</p><p>A developer typically installs Git on their local machine. Then, they push their code to GitHub, which has a custom implementation of Git on its servers.</p><p>Although <strong>Git and GitHub are different products</strong>, the GitHub team contributes features to Git from time to time.</p><p>So, how does Git track changes? Well, <strong>every piece of data Git tracks is stored as an object</strong>.</p><div><hr></div><h4><em><strong>Sidenote: Git Objects and Branches</strong></em></h4><p><em>A <strong>Git object</strong> is something Git uses to <strong>keep track of a repository's content</strong> over time.</em></p><p><em>There are <strong>three main types</strong> of objects in Git.</em></p><p><em>1. <strong>Blob</strong> - Binary large object. This is what <strong>stores the contents of a file</strong>, not the filename, location, or any other metadata.</em></p><p><em>2. <strong>Tree</strong> - How Git represents directories. A tree <strong>lists the blobs and other trees</strong> in a directory, along with their names.</em></p><p><em>3. <strong>Commit</strong> - A <strong>snapshot of the files</strong> (blobs) and directories (trees) at a point in time. 
It also records its parent commit, stored as the <a href="https://newsletter.betterstack.com/i/147520006/sidenote-hashing">hash</a> of the previous commit.</em></p><p><em>A developer creates a commit manually. Git only stores new blobs and trees for the files and directories that have changed; everything unchanged is reused by pointing to its existing hash.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L6wM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L6wM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 424w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 848w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 1272w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L6wM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png" width="1456" height="789" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:407561,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L6wM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 424w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 848w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 1272w, https://substackcdn.com/image/fetch/$s_!L6wM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a6a86eb-3bc3-4b23-9511-f74e42f5ff30_2642x1431.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p><em>Commit hashes are difficult for humans to remember, so this is where <strong>branches</strong> come in.</em></p><p><em>A branch is just a <strong>named reference to a commit</strong>, like a label. The default branch is usually called main or master, and it <strong>points to the most recent commit</strong> on that branch.</em></p><p><em>If a new branch is created from main, it initially points to the same commit. 
But if a new commit is made on the new branch, that commit will not exist on main.</em></p><p><em>This is <strong>useful for working on a feature without affecting the main branch</strong>.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yDjZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yDjZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 424w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 848w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yDjZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png" width="418" height="471.41733690795354" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1262,&quot;width&quot;:1119,&quot;resizeWidth&quot;:418,&quot;bytes&quot;:129983,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yDjZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 424w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 848w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 1272w, https://substackcdn.com/image/fetch/$s_!yDjZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ec72e20-705b-4448-83b5-49815d5232b7_1119x1262.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><div><hr></div><p>Because of how Git keeps track of a project, certain actions can <strong>make objects unreachable</strong>, meaning no branch or other reference leads to them anymore.</p><p>Here are <strong>three different ways</strong> this could happen:</p><p>1. <strong>Deleting a branch</strong>: This doesn't immediately delete the branch's commits. It only <strong>removes the reference</strong> to them.</p><p>A reference is like a signpost pointing to a commit. Without it, the commits that only existed on the deleted branch, and the objects they point to, are still stored but can no longer be reached.</p><p>2. <strong>Force pushing</strong>: This replaces a remote branch's commit history with a local branch's history.</p><p>A remote branch could be a branch on GitHub, for example. Any <strong>old commits</strong> that aren't part of the new history <strong>lose their reference</strong>.</p><p>3. <strong>Removing sensitive data</strong>: Sensitive data, like a leaked password, usually exists in many commits. Removing it means rewriting all of those commits, which gives them new hashes. 
This makes those original commits unreachable.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RwzW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RwzW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 424w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 848w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RwzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png" width="1456" height="695" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:695,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:187522,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RwzW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 424w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 848w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!RwzW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F529aebe5-7d06-4b0d-a8dc-1445a6053a91_2162x1032.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>There are many other ways to create <strong>unreachable objects</strong>, but these are the most common.</p><p>Usually, unreachable objects aren't a big deal. They typically <strong>get removed by Git's garbage collection</strong>.</p><div><hr></div><h4><em><strong>Sidenote: Git's Garbage Collection</strong></em></h4><p><em>Garbage collection exists to <strong>remove unreachable objects</strong>.</em></p><p><em>It can be triggered manually with the <code>git gc</code> command, but it also <strong>happens automatically</strong> <strong>during operations</strong> like <code>git commit</code>, <code>git rebase</code>, and <code>git merge</code>. When triggered automatically, it only does real work once enough loose objects or packfiles have built up.</em></p><p><em>Git <strong>only removes an unreachable object once it's old enough</strong> to be considered safe to delete. This is <strong>typically 2 weeks</strong>, which gives a developer time to recover objects they deleted by accident.</em></p><p><em>Unreachable objects that are too recent to be removed are <strong>kept in Git's objects folder</strong> (<code>.git/objects</code>) as individual files. Objects stored one per file like this are known as <strong>loose objects</strong>.</em></p><p><em>Garbage collection also <strong>compresses loose, reachable objects into packfiles</strong>. These have a <code>.pack</code> extension.</em></p><p><em>Like most files, a packfile has a <strong>single modification time</strong> (mtime). This means Git can't tell when each individual object inside a packfile was last modified, because they all share the pack's mtime.</em></p><p><em><strong>Unreachable loose objects are not added to packfiles</strong>. They are left loose so that each one keeps its own modification time.</em></p><div><hr></div><p>But garbage collection doesn't scale well to large projects. <strong>Large projects can build up a huge number of loose, unreachable objects</strong>, which take up a lot of storage space.</p><p>To solve this, the team at GitHub introduced something called cruft packs.</p><h2>Cruft Packs to the Rescue</h2><p><strong>Cruft packs</strong>, as you might have guessed, are a way to <strong>compress loose, unreachable objects</strong> into a single pack.</p><p>In software development, "<em>cruft</em>" refers to outdated and unnecessary data that accumulates over time.</p><p>What makes cruft packs different from regular packfiles is how they handle modification times.</p><p>Instead of relying on a single modification time for the whole pack, a cruft pack <strong>has a separate <code>.mtimes</code> file</strong>.</p><p>This file <strong>contains the last modification time of each object</strong> in the pack. 
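</p><p>The toy Python model below shows how the pieces covered so far fit together. Branches are refs, objects are reachable or unreachable, and an ID-sorted index is paired with a parallel list of per-object mtimes, roughly the way a cruft pack pairs its <code>.idx</code> and <code>.mtimes</code> files. It is a simplified sketch, not Git's implementation. <code>ToyRepo</code> and its methods are hypothetical names, everything lives in memory, and commits point straight at blobs with made-up payloads. Only the blob IDs match what real Git computes. The sketch also keeps any old object that a recent unreachable object still points to. That is this sketch's own safety choice, so nothing that survives is left with a dangling link.</p>

```python
import hashlib

DAY = 24 * 60 * 60
TWO_WEEKS = 14 * DAY  # Git's default grace period for unreachable objects


def object_id(kind: str, data: bytes) -> str:
    """Name an object like Git does: SHA-1 of 'kind size', a NUL byte, then data."""
    header = f"{kind} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


class ToyRepo:
    """Hypothetical in-memory object store, not Git's on-disk format."""

    def __init__(self):
        self.links = {}   # object ID -> IDs it points to
        self.mtimes = {}  # object ID -> last-modified time, in seconds
        self.refs = {}    # branch name -> commit ID

    def add(self, kind, data, links=(), mtime=0):
        oid = object_id(kind, data)
        self.links[oid] = list(links)
        self.mtimes[oid] = mtime
        return oid

    def walk(self, starts):
        """Return every object reachable by following links from `starts`."""
        seen, stack = set(), list(starts)
        while stack:
            oid = stack.pop()
            if oid not in seen:
                seen.add(oid)
                stack.extend(self.links[oid])
        return seen

    def cruft_pack(self):
        """Unreachable IDs sorted like a .idx file, plus a parallel mtimes list."""
        idx = sorted(set(self.links) - self.walk(self.refs.values()))
        return idx, [self.mtimes[oid] for oid in idx]

    def prune(self, now, expire=TWO_WEEKS):
        """Delete unreachable objects older than `expire`, judged per object."""
        idx, mtimes = self.cruft_pack()
        recent = [oid for oid, mt in zip(idx, mtimes) if mt >= now - expire]
        # Also keep old objects that a recent unreachable object points to,
        # so nothing that survives is left with a dangling link.
        keep = self.walk(self.refs.values()) | self.walk(recent)
        removed = [oid for oid in idx if oid not in keep]
        for oid in removed:
            del self.links[oid]
            del self.mtimes[oid]
        return removed


# Usage: the commit payloads below are simplified stand-ins, not real commits.
repo = ToyRepo()
readme = repo.add("blob", b"hello\n")
main = repo.add("commit", b"main: add readme", [readme])
old_blob = repo.add("blob", b"old feature\n")
old = repo.add("commit", b"feature: old work", [old_blob])
new_blob = repo.add("blob", b"draft\n", mtime=19 * DAY)
draft = repo.add("commit", b"draft: reset away", [new_blob], mtime=19 * DAY)
repo.refs = {"main": main, "feature": old}

del repo.refs["feature"]  # deleting a branch only removes the reference
print(len(repo.cruft_pack()[0]))  # 4 unreachable objects
removed = repo.prune(now=20 * DAY)
print(sorted(removed) == sorted([old, old_blob]))  # True
```

<p>Deleting the <code>feature</code> ref doesn't remove anything by itself. The old commit and its blob only go away once <code>prune</code> finds them unreachable and older than two weeks. The recently orphaned <code>draft</code> commit is kept because its own timestamp is still inside the grace period.</p><p>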
Because each object keeps its own timestamp, Git can remove just the objects older than 2 weeks and keep the rest.</p><p>As well as the <code>.pack</code> file and the <code>.mtimes</code> file, a cruft pack also <strong>has an index file</strong> with an <code>.idx</code> extension, just like a regular packfile.</p><p>For each object, the index stores the <strong>object's ID</strong> as well as its <strong>exact location in the packfile</strong>, known as the offset.</p><p>The index lists objects sorted by their ID, and the <code>.mtimes</code> file stores its entries in that same order.</p><p>So the third entry in the <code>.idx</code> file and the third entry in the <code>.mtimes</code> file describe the same object, and the offset in that index entry points to where the object sits in the <code>.pack</code> file.</p><p>The offset lets Git jump straight to an object instead of reading through every object before it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!S89Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!S89Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 424w, https://substackcdn.com/image/fetch/$s_!S89Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 848w, https://substackcdn.com/image/fetch/$s_!S89Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S89Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!S89Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png" width="678" height="403.260989010989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:866,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:366088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!S89Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 424w, https://substackcdn.com/image/fetch/$s_!S89Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 848w, https://substackcdn.com/image/fetch/$s_!S89Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 1272w, 
https://substackcdn.com/image/fetch/$s_!S89Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe189c438-20c7-403c-8cec-f7ea99422ea4_1769x1052.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Cruft packs were <strong>introduced in Git version 2.37.0</strong> and can be generated by adding the <code>--cruft</code> flag to <code>git gc</code>, as in <code>git gc --cruft</code>.</p><p>Once this feature was part of Git, GitHub <strong>enabled it for all repositories</strong>.</p><p>By using cruft packs on GitHub's main repository, they were able to 
reduce its size from 57GB to 27GB, a <strong>reduction of 52%</strong>.</p><p>And in an extreme example, they were able to reduce a 186GB repo to 2GB. That's a <strong>reduction of about 99%</strong>!</p><h2>Wrapping things up</h2><p>As someone who uses GitHub regularly, I'm super impressed by this.</p><p>I often hear about their <a href="https://github.com/features/copilot">AI developments</a> and UI improvements. But things like this tend to go under the radar, so it's nice to be able to give it some exposure.</p><p>Check out the <a href="https://github.blog/engineering/architecture-optimization/scaling-gits-garbage-collection/">original article</a> if you want a more detailed explanation of how cruft packs work.</p><p>Otherwise, be sure to <a href="https://newsletter.betterstack.com/">subscribe</a> so you can get the next Hacking Scale article as soon as it's published.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? 
Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How SQLite made Notion 30% Faster]]></title><description><![CDATA[Notion made a tough bet on emerging tech, and it paid off in a big way]]></description><link>https://newsletter.betterstack.com/p/how-sqlite-made-notion-30-faster</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-sqlite-made-notion-30-faster</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 06 Nov 2024 14:00:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It's difficult to explain exactly what <strong>Notion</strong> is. </p><p>A tool for note-taking, documentation, project management, and more. All wrapped up with great <strong>collaboration features</strong> and a <strong>beautiful UI</strong>.</p><p>It has over <strong>30 million users</strong>, 4 million of whom are paying for it. Not bad. </p><p>But if there was one<strong> common criticism </strong>of Notion, it was that it <strong>felt slow</strong>. 
Specifically when <strong>navigating between pages</strong>.</p><p>The team managed to make it faster with a few <strong>interesting techniques</strong>.</p><p>Let's go through it.</p><p><em>Estimated reading time: 4 minutes 35 seconds</em></p><div><hr></div><h2>What Made Notion Slow?</h2><p>If you created a few blank pages in Notion, navigating between them would feel <strong>lightning fast.</strong></p><p>But if you added some images, tables, charts, and other complex widgets, navigation would feel <strong>very slow</strong>.</p><p>Notion hasn't published exactly what caused this slowness, but we can make some educated guesses from technical posts they've written:</p><ol><li><p>Notion depended on many <strong>third-party scripts</strong>, possibly for collaboration features, analytics, and third-party assets like images.</p></li><li><p><strong>Frequent API calls</strong> to Notion's servers, because there was little or no caching in the browser.</p></li><li><p><strong>CPU cores going unused</strong>. Most devices have between <strong>4 and 8 cores</strong>, which means 4 to 8 tasks could be processed at the same time. 
Notion wasn&#8217;t taking advantage of this.</p></li></ol><p>Some earlier attempts had been made to fix these issues. The team used <strong>LocalStorage</strong> to cache data in the browser, but its storage is limited, usually to a few megabytes per site, which wasn't great for users with lots of pages.</p><p>They then tried using <strong>IndexedDB</strong> for caching. This was never shipped because it didn't improve performance. In fact, on certain devices it was even slower than <strong>LocalStorage</strong>.</p><div><hr></div><h4><em><strong>Sidenote: LocalStorage vs IndexedDB</strong></em></h4><p><em>Both LocalStorage and IndexedDB are ways to <strong>store data</strong> in the browser.</em></p><p><em>The data is <strong>saved on the user's device</strong>, so it persists after <strong>closing a tab</strong> or <strong>restarting</strong> the browser.</em></p><p><em>But there are a few differences between them.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xXJ5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xXJ5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 424w, https://substackcdn.com/image/fetch/$s_!xXJ5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 848w, 
https://substackcdn.com/image/fetch/$s_!xXJ5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 1272w, https://substackcdn.com/image/fetch/$s_!xXJ5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xXJ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png" width="562" height="431.3490701001431" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1073,&quot;width&quot;:1398,&quot;resizeWidth&quot;:562,&quot;bytes&quot;:468701,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xXJ5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 424w, https://substackcdn.com/image/fetch/$s_!xXJ5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 848w, 
https://substackcdn.com/image/fetch/$s_!xXJ5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 1272w, https://substackcdn.com/image/fetch/$s_!xXJ5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faeb6cf22-22f4-4eca-80f0-fc5012beceb9_1398x1073.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>IndexedDB behaves differently across browsers, which means dealing with browser-specific bugs.</em></p><p><em>Also, some users would have 
Notion open in several tabs of the same browser. And Notion's data is <strong>fine-grained</strong>: if a page has a wall of text, each paragraph gets its own database row.</em></p><p><em>So syncing lots of small changes between tabs through IndexedDB caused major <strong>performance issues</strong>.</em></p><div><hr></div><p>The team had already improved performance in the <strong>desktop and mobile</strong> apps by caching data in an <strong>SQLite</strong> database, so it made sense to try the same approach in the browser.</p><p>To their surprise, it worked really well.</p><p></p><h2>Why SQLite Worked</h2><p>SQLite is a database like <strong>MySQL</strong> and <strong>Postgres</strong>, and it uses <a href="https://www.w3schools.com/sql/">SQL</a> as its query language.</p><p>But unlike them, it stores all its data in a <strong>single file</strong> and <strong>doesn't have a server</strong>.</p><p>Most databases use a server to manage data access, prevent conflicts, and control user permissions.</p><p>Not having a server makes SQLite more limited than other databases. But it was ideal for Notion's caching needs.</p><p>SQLite isn't natively supported in browsers, but it does have a <strong>WebAssembly</strong> version.</p><div><hr></div><h4><em><strong>Sidenote: WebAssembly (WASM)</strong></em></h4><p><em>WebAssembly lets you run code written in languages other than JavaScript <strong>in the browser</strong>.</em></p><p><em>Say I wrote a really fast, complex calculation in <strong>C++</strong> and wanted to run it in the browser.</em></p><p><em>Instead of <strong>rewriting it in JavaScript</strong>, I could keep the C++ code, compile it to <strong>WebAssembly</strong>, and run it in the browser.</em></p><p><em>Because SQLite is written in <strong>C</strong>, it can be <strong>compiled</strong> into WebAssembly. 
</em></p><p><em>But a user has to download <strong>all of SQLite</strong> before it can be used.</em></p><div><hr></div><p>Unfortunately, Notion couldn't just drop SQLite into their project and <strong>call it a day</strong>. They had to make a bunch of changes first.</p><p></p><h2>Problems with SQLite</h2><p>Besides WebAssembly, Notion's SQLite setup relied on a few <strong>other web technologies</strong>.</p><p>It used a <strong>Web Worker</strong> to handle reading from and writing to the database.</p><p>Web Workers let code <strong>run in the background</strong>, so database work doesn't block what's happening on the main page.</p><p>The SQLite file was stored in the <strong>Origin Private File System</strong> (OPFS). Browsers can't access a user's file system without their permission.</p><p>So OPFS gives each website its own isolated file system, <strong>separate</strong> from the user's main file system.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pKXF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pKXF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 424w, https://substackcdn.com/image/fetch/$s_!pKXF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 848w, https://substackcdn.com/image/fetch/$s_!pKXF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 
1272w, https://substackcdn.com/image/fetch/$s_!pKXF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pKXF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png" width="1234" height="274" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/deb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:274,&quot;width&quot;:1234,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pKXF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 424w, https://substackcdn.com/image/fetch/$s_!pKXF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 848w, https://substackcdn.com/image/fetch/$s_!pKXF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 1272w, 
https://substackcdn.com/image/fetch/$s_!pKXF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdeb30d4d-9885-4656-9d6d-a24d41347b30_1234x274.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>But OPFS has a <strong>crucial limitation</strong>. When one tab opens a file for reading or writing, the file is locked to that tab, so other tabs can't make changes to it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dPeC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dPeC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 424w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 848w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 1272w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!dPeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png" width="1349" height="405" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:405,&quot;width&quot;:1349,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:76852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dPeC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 424w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 848w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 1272w, https://substackcdn.com/image/fetch/$s_!dPeC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad32fe0-eb5e-45cf-af56-3b6082ac7c61_1349x405.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To fix this, Notion routed database requests from all tabs through a single worker that had access to the database file. 
This was the <strong>Active Worker</strong>.</p><p>A <a href="https://developer.mozilla.org/en-US/docs/Web/API/SharedWorker">SharedWorker</a> was created to figure out which tab would have the active worker.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!HM8d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!HM8d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 424w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 848w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 1272w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!HM8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png" width="1456" height="673" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:210166,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!HM8d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 424w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 848w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 1272w, https://substackcdn.com/image/fetch/$s_!HM8d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe65a98d4-c4c3-47ff-8eb1-f1c7c1573465_1951x902.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>So if the active worker's tab was <strong>closed</strong>, the SharedWorker would make another tab's worker active.</p><div><hr></div><h4><em><strong>Sidenote: The two types of WASM SQLite</strong></em></h4><p><em>The WebAssembly build of SQLite can store its database in OPFS in two different ways.</em></p><p><em>1. <strong>OPFS sqlite3_vfs</strong></em></p><p><em>2. <strong>OPFS SyncAccessHandle Pool VFS</strong></em></p><p><em>Note: VFS stands for <strong>virtual file system</strong>. In SQLite, a VFS is the layer that reads and writes the database to the underlying storage, so each option here is a different VFS built on top of OPFS.</em></p><p><em>The first one, <strong>sqlite3_vfs</strong>, does support running in multiple tabs. But it only works with <strong>cross-origin isolation</strong>. 
Cross-origin isolation puts the page in a 'protective bubble' that gives it extra security.</em></p><p><em>But it also blocks the page from loading resources from <strong>other websites</strong> unless those resources explicitly opt in.</em></p><p><em>This didn't work for Notion because it depended on <strong>third-party scripts</strong>.</em></p><p><em>So they chose the <strong>OPFS SyncAccessHandle Pool VFS</strong>.</em></p><p><em>It can only run in one tab at a time, but it's supported in all major browsers and performs slightly better than <strong>sqlite3_vfs</strong>.</em></p><div><hr></div><p>Another issue with this approach was that <strong>pages loaded more slowly at first</strong>.</p><p>This was because users had to <strong>download SQLite</strong> if they didn't already have it. The download wasn't huge, under 1 MB, but on slower connections it was noticeable.</p><p>To fix this, the team changed how SQLite was loaded. Instead of loading it together with the rest of the site, they waited for the page to <strong>finish loading</strong> before <strong>downloading SQLite</strong>.</p><p>This meant the initial page data didn't come from the SQLite cache. But the <strong>slight speed increase</strong> from loading that first batch of data through SQLite wasn't worth the extra complexity.</p><p>In general, the move to SQLite in the browser was a <strong>success</strong>. 
</p><p>The Notion site in certain parts of the world benefited from a <strong>33% speed increase</strong> when navigating between pages.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7l_q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7l_q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 424w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 848w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 1272w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7l_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png" width="1170" height="800" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8edc697b-333e-415b-982d-25f0873f7272_1170x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:1170,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:64315,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7l_q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 424w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 848w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 1272w, https://substackcdn.com/image/fetch/$s_!7l_q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8edc697b-333e-415b-982d-25f0873f7272_1170x800.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from the original article. <strong>Control</strong> is the old site's performance; <strong>Test</strong> is the new one.</figcaption></figure></div><p></p><h2>Wrapping Things Up</h2><p>I would have loved to see whether this <strong>improved signups</strong> or kept users on the <strong>site for longer</strong>. Maybe Notion is saving those metrics for <strong>another article</strong>.</p><p>Anyway, I hope you enjoyed this and learned something new. 
I certainly did.</p><p>If you want more details on this topic, check out the <a href="https://www.notion.so/blog/how-we-sped-up-notion-in-the-browser-with-wasm-sqlite">original article</a>.</p><p>And be sure to <a href="https://newsletter.betterstack.com/">subscribe</a> to get the next article <strong>as soon as it&#8217;s released</strong>.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 22 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Canva Scaled Their Search to Handle 1M+ Searches Per Minute]]></title><description><![CDATA[Canva massively leveled-up their search by stripping away duplicated processes]]></description><link>https://newsletter.betterstack.com/p/how-canva-scaled-their-search-to</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-canva-scaled-their-search-to</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 23 Oct 2024 13:01:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Canva is a <strong>web-based</strong> design tool. It can be used to create graphics, presentations, videos, and more.</p><p>It's also insanely popular, with over <strong>170 million users</strong> worldwide creating over 180 designs <strong>every second</strong>.</p><p>Canva's search is '<em>foundational</em>' to its success. It allows users to search for templates, design assets, and other media.</p><p>It handles over <strong>20,000 requests</strong> every second, which is more than <strong>1 million every minute</strong>. But it had significant architectural issues that caused downtime and performance problems.</p><p>Here's how they addressed it.</p><p><em>Estimated reading time: 4 minutes 35 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. 
Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>Why Does Canva Have Search?</h2><p>If you have never used Canva, you may be wondering why search is <strong>so pivotal</strong> to it.</p><p>Well, it has a huge content library: over <strong>100 million stock images</strong>, videos, and graphic elements, as well as <strong>600,000 templates</strong>.</p><p>Its users commonly search for media assets to help with their designs. For example, <strong>balloon</strong> or <strong> </strong> assets would help with a birthday card design.</p><p>Search was split across <strong>4 different servers, each with its own search index</strong>, one for each category: media (images, videos, graphics), templates, fonts, and audio.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WkfX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WkfX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 424w, https://substackcdn.com/image/fetch/$s_!WkfX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 848w, 
https://substackcdn.com/image/fetch/$s_!WkfX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!WkfX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WkfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png" width="1456" height="934" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:934,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:197852,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WkfX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 424w, https://substackcdn.com/image/fetch/$s_!WkfX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 848w, 
https://substackcdn.com/image/fetch/$s_!WkfX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!WkfX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7c45845-7811-4a83-93ce-104c0a57496c_2032x1304.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The servers had some <strong>shared code</strong>, but for the most part they were <strong>completely 
separate</strong>.</p><p>This meant every update to search had to be made <strong>4 times</strong>, once per server. It was also difficult to run things like A/B tests without lots of duplication.</p><p>The team needed a way to <strong>reduce repeated code</strong>.</p><div><hr></div><h4><em><strong>Sidenote: Search Index</strong></em></h4><p><em>Imagine you have a <strong>database of sentences</strong>, and you want to search for the text "brown dog."</em></p><p><em>By default, a <strong>traditional database</strong> search looks for an <strong>exact match</strong>. So the sentence "The <strong>brown dog</strong> jumped over the fence" would be returned, but "The <strong>dog brown</strong> jumped over the fence" would not.</em></p><p><em>A search index is a data structure designed to solve this problem.</em></p><p><em>It works by breaking the text into <strong>individual words</strong>, or <strong>tokens</strong>, and recording which sentences contain each one. A search then returns all the sentences that contain the query's words.</em></p><p><em>It also stores things like <strong>word frequency</strong> and the <strong>position of each word</strong> in the sentence, and it <strong>caches results</strong> for faster searches.</em></p><p><em>This, of course, takes up <strong>more storage space</strong> and processing power, but it makes searching much faster.</em></p><p><em>A popular piece of software for creating search indexes is <a href="https://en.wikipedia.org/wiki/Apache_Lucene">Apache Lucene</a>. 
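</em></p><p><em>To make the idea concrete, here is a minimal inverted-index sketch in Python. It is not how Lucene is implemented: it assumes a simple lowercase tokenizer, keeps everything in memory, and the names <code>tokenize</code>, <code>build_index</code>, and <code>search</code> are hypothetical. Each token maps to the sentences that contain it and the positions where it appears, so a search for "brown dog" also finds the "dog brown" sentence, and the number of stored positions doubles as the word frequency.</em></p><pre><code>import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sentences):
    # Map each token to {sentence_id: [positions of the token in that sentence]}.
    # len(positions) is the token's frequency within that sentence.
    index = defaultdict(dict)
    for doc_id, sentence in enumerate(sentences):
        for position, token in enumerate(tokenize(sentence)):
            index[token].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    # Return the ids of sentences containing every query token, in any order.
    postings = [set(index.get(token, {})) for token in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


sentences = [
    "The brown dog jumped over the fence",
    "The dog brown jumped over the fence",
    "A black cat sat on the mat",
]
index = build_index(sentences)
print(search(index, "brown dog"))  # [0, 1]
print(index["dog"])  # {0: [2], 1: [1]}
</code></pre><p><em>Production engines like Lucene add features such as stemming, relevance scoring, and index compression on top of this basic structure. 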
We will talk more about Lucene later.</em></p><div><hr></div><h2>How Search Actually Worked</h2><p>Even though Canva's search wasn't as complex as Google's, it still had <strong>many steps</strong> to go through.</p><p>Here are the steps that would run if a user searched for "<em>brown dog</em>."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dfD2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dfD2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 424w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 848w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 1272w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dfD2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png" width="556" height="908.4642857142857" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2379,&quot;width&quot;:1456,&quot;resizeWidth&quot;:556,&quot;bytes&quot;:335996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dfD2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 424w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 848w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 1272w, https://substackcdn.com/image/fetch/$s_!dfD2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18e32cae-c33b-4bb9-8e9f-c25784c62bc8_1468x2399.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<ol><li><p><strong>Rewriting</strong>: Transform the query into <strong>standardized text</strong>. For "<em>brown dog</em>," little would change. But a query with spelling mistakes, uppercase letters, or words in another language would be rewritten.</p></li><li><p><strong>Tokenization</strong>: Split the text into <strong>individual words</strong>, or tokens.</p></li><li><p><strong>Annotation</strong>: Add extra information (metadata) to expand the tokens and the search query. For example, this step works out that "dog" is an animal and "brown" is a color.</p><p>It can also be configured to find <strong>synonyms</strong>, like "chocolate" or "auburn" for "brown," and "canine" or "hound" for "dog."</p></li><li><p><strong>Candidate generation</strong>: This step <strong>narrows down a large amount</strong> of data using various techniques. 
The goal is to produce a much smaller set of results that match the annotated query.</p></li><li><p><strong>Re-ranking</strong>: Reorder the narrowed results by <strong>relevance</strong>. For example, if a user had previously searched for a vet, brown dogs with vets might be shown first.</p></li></ol><p>After these five steps, the results are returned to the user. Together, these steps are known as a <strong>search pipeline</strong>, and they were the core steps repeated across the four different search servers.</p><p>The team planned to separate these steps into their <strong>own components</strong>. That way, engineers could contribute to one component without having to understand the entire system.</p><p>They also wanted to switch the <strong>candidate generation</strong> step from <a href="https://solr.apache.org/">Solr</a> to <a href="https://www.elastic.co/elasticsearch">Elasticsearch</a>.</p><div><hr></div><h4><em><strong>Sidenote: Elasticsearch vs. Solr</strong></em></h4><p><em>Elasticsearch and Solr are both platforms designed to <strong>efficiently search through search indexes</strong>.</em></p><p><em>Both are built on top of <a href="https://en.wikipedia.org/wiki/Apache_Lucene">Apache Lucene</a>, the open-source Java library used to create search indexes.</em></p><p><em>But they do have some differences, which mostly favor Elasticsearch.</em></p><ul><li><p><em><strong>Architecture</strong>: Elasticsearch was designed as a distributed system from the start, which makes it easier to scale. Solr began with a more traditional client/server design, and scaling it out typically requires more manual configuration.</em></p></li><li><p><em><strong>Querying</strong>: Solr is great for text search, but Elasticsearch is better at filtering, grouping data, and real-time indexing.</em></p></li><li><p><em><strong>Ease of Use</strong>: Solr can be complex for beginners to set up, while Elasticsearch is much easier. It has better documentation and a more user-friendly interface. 
It also has a larger, more active community with many plugins and extensions.</em></p></li></ul><p><em>It's easy to see why Canva switched from <strong>Solr</strong> to <strong>Elasticsearch</strong>. For more details, the team has put together a <a href="https://www.canva.dev/blog/engineering/migrating-from-solr-to-elasticsearch-and-their-differences/">detailed article on their reasons</a>.</em></p><div><hr></div><h2>The Migration</h2><p>The decision to create components wasn't <strong>made overnight</strong>. It was the result of many prototypes and experiments.</p><p>The team also paid special attention to creating a stable, clean interface and to following <strong>good software design practices</strong>.</p><p>They came up with three criteria for each component:</p><ol><li><p><strong>Transient</strong>: Each component can be removed and reintroduced without any issues.</p></li><li><p><strong>Stateless</strong>: Components should <strong>manage their own state</strong> and not share it with others. For example, the candidate generator doesn't need to know how many annotations have been cached.</p></li><li><p><strong>Ordered</strong>: Each component processes data in search pipeline order, apart from the annotation and candidate generation steps, which can run simultaneously. Why?</p><p>As soon as the first set of annotations is ready, such as '<em>brown dog</em>,' '<em>chocolate</em>,' and '<em>canine</em>,' it can be sent to the candidate generator to fetch some <strong>basic results</strong>.</p><p>Meanwhile, the annotation step can keep producing <strong>more synonyms</strong>, which are then sent to the candidate generator to <strong>refine</strong> the initial set of results.</p></li></ol><p>After a lot of work, the team created a <strong>component library</strong> that every search server could use.</p><p>The team wrote very detailed articles on the problems. 
These include details about their <strong>process</strong> for <strong>creating the component library</strong> from the search pipeline. But they didn't include much in the way of an <strong>updated architecture diagram</strong>.</p><p>So I've taken some creative freedom and drawn this new diagram based on <strong>my understanding</strong> of their solution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pLqj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pLqj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 424w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 848w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pLqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png" width="1456" height="554" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:391283,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pLqj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 424w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 848w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 1272w, https://substackcdn.com/image/fetch/$s_!pLqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7eba23c7-0fba-4573-8285-b33ce44d9407_3427x1304.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The new architecture still has many servers, but each one now uses <strong>identical search pipeline components</strong>.</p><p>This change not only helped Canva release search <strong>updates quicker</strong>, but also let individual components <strong>scale horizontally</strong> based on load, and made it possible to add <strong>observability</strong> and <strong>monitoring</strong> to each component of their search.</p><h2><strong>Wrapping Things Up</strong></h2><p>When I first looked at this article on the <strong>Canva blog</strong>, which has a <a href="https://www.canva.dev/blog/engineering/search-pipeline-part-i/">part 1</a> and a <a href="https://www.canva.dev/blog/engineering/search-pipeline-part-ii/">part 2</a>, I wasn't sure it would be interesting enough to write about.</p><p>But I'm <strong>surprised</strong> at how much I got out of it. 
Who knew search engines for design tools could be so complicated?</p><p>As usual, if you enjoyed this article, go ahead and <a href="https://newsletter.betterstack.com/">subscribe</a> to get the next one as soon as it's written.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 25 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Uber Reduced Their Log Size By 99%]]></title><description><![CDATA[Uber broke apart an open source tool to massively compress their logs]]></description><link>https://newsletter.betterstack.com/p/how-uber-reduced-their-log-size-by</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-uber-reduced-their-log-size-by</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 09 Oct 2024 13:02:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8PPD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Despite all the competition, <strong>Uber</strong> is still the most popular ride-hailing service in the world.</p><p>With over <strong>150 million monthly</strong> active users and <strong>28 million trips</strong> per day, Uber isn't going anywhere anytime soon.</p><p>The company has had its fair share of challenges, and a surprising one has been log messages.</p><p>Uber generates around <strong>5 PB</strong> of INFO-level logs alone every month, and it keeps them for only <strong>3 days</strong> before deleting them.</p><p>But somehow they managed to reduce storage size by <strong>99%</strong>.</p><p>Here is how they did it.</p><p><em>Estimated reading time: 4 minutes 56 seconds</em></p><div><hr></div><h2>Why does Uber generate so many logs?</h2><p>Uber collects <strong>a lot of data</strong>: trip data, location data, user data, driver data, even weather data.</p><p>With all this data <strong>moving between systems</strong>, it is important to check, fix, and improve how these systems work.</p><p>One way Uber does this is by <strong>logging events</strong> such as user actions, system processes, and errors.</p><p>These events generate a lot of logs: approximately <strong>200 TB per day</strong>.</p><p>Instead of storing all this log data in one place, Uber stores it in the <strong>Hadoop Distributed File System</strong> (HDFS for short), a file system built for <strong>big data</strong>.</p><div><hr></div><p><em><strong>Sidenote: HDFS</strong></em></p><p><em>HDFS works by splitting <strong>large files</strong> into smaller <strong>blocks</strong> (<strong>128 MB</strong> by default) and storing these blocks on different machines (nodes).</em></p><p><em>Each block is replicated <strong>three times</strong> by default across different nodes. 
This means that if one node fails, the data is still available.</em></p><p><em>The trade-off is storage: replication <strong>triples the space</strong> needed for each file.</em></p><p><em>Each node runs a background process called a <strong>DataNode</strong> that stores blocks and talks to the <strong>NameNode</strong>, the main node that tracks where every block lives.</em></p><p><em>When a block is added, the NameNode decides which DataNodes should hold its copies, and each DataNode reports the blocks it stores back to the NameNode.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8PPD!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8PPD!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 424w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 848w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 1272w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8PPD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png" width="1456" height="827" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:827,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:438011,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8PPD!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 424w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 848w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 1272w, https://substackcdn.com/image/fetch/$s_!8PPD!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F29ba534b-f746-4243-bcd0-e9d07b605f13_2921x1659.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><em>When a client wants to <strong>read a file</strong>, it asks the NameNode which DataNodes hold the file's blocks, then reads those blocks directly from the DataNodes.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1LRd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!1LRd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 424w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 848w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 1272w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1LRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png" width="1456" height="812" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:812,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:442121,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!1LRd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 424w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 848w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 1272w, https://substackcdn.com/image/fetch/$s_!1LRd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8e07783-ec89-4f18-a63d-4d3241cdc559_2921x1629.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><em>An <strong>HDFS client</strong> is any program that reads from or writes to the HDFS cluster. Uber used <strong>Apache Spark</strong>, but there are others, like the <strong>Hadoop CLI</strong> and <strong>Apache Hive</strong>.</em></p><p><em>HDFS is <strong>easy to scale</strong>, <strong>durable</strong>, and <strong>handles large data well</strong>.</em></p><div><hr></div><p>To analyze logs well, you need to <strong>collect lots of them over time</strong>. Uber&#8217;s data science team wanted to keep <strong>one month's</strong> worth of logs.</p><p>But they could only store them for <strong>three days</strong>. Keeping them longer would have pushed the cost of their HDFS storage to <strong>millions of dollars per year</strong>.</p><p>There also wasn't a tool that could <strong>manage all these logs</strong> without costing the earth.</p><p>You might wonder why Uber doesn't use <a href="https://clickhouse.com/">ClickHouse</a> or <strong>Google BigQuery</strong> to <strong>compress</strong> and <strong>search</strong> the logs.</p><p>Well, Uber does use ClickHouse for <strong>structured logs</strong>, but many of its logs were <strong>unstructured</strong>, which ClickHouse wasn't designed for.</p><div><hr></div><p><em><strong>Sidenote: Structured vs. Unstructured Logs</strong></em></p><p><em>Structured logs are typically <strong>easier to read</strong> and <strong>analyze</strong> than unstructured logs.</em></p><p><em>Here's an example of a <strong>structured</strong> log.</em></p><pre><code><em>{
  "<strong>timestamp</strong>": "2021-07-29 14:52:55.1623",
  "<strong>level</strong>": "Info",
  "<strong>message</strong>": "New report created",
  "<strong>userId</strong>": "4253",
  "<strong>reportId</strong>": "4567",
  "<strong>action</strong>": "Report_Creation"
}</em></code></pre><p><em>And here's an example of an <strong>unstructured</strong> log.</em></p><pre><code><em>2021-07-29 14:52:55.1623 <strong>INFO</strong> New report <strong>4567</strong> created by user <strong>4253</strong></em></code></pre><p><em>The structured log, typically written in JSON, is <strong>easy for humans</strong> and <strong>machines</strong> to read.</em></p><p><em>Unstructured logs need more <strong>complex parsing</strong> before a computer can understand them, which makes them harder to analyze.</em></p><p><em>Uber's large volume of unstructured logs could be down to <strong>legacy systems</strong> that were <strong>not configured</strong> to output structured logs.</em></p><div><hr></div><p>Uber needed a way to <strong>reduce the size</strong> of the logs, and this is where <strong>CLP</strong> came in.</p><h2>What is CLP?</h2><p><strong>Compressed Log Processing</strong> (CLP) is a tool designed to compress unstructured logs. It can also <strong>search</strong> the compressed logs without decompressing them.</p><p>It was created by researchers from the University of Toronto, who later founded a company around it called <a href="https://yscope.com/">YScope</a>.</p><p><a href="https://github.com/y-scope/clp">CLP</a> compresses logs by at least <strong>40x</strong>. In an example from YScope, they compressed <strong>14TB</strong> of logs to <strong>328GB</strong>, which is about <strong>2.3%</strong> of the original size. 
That's incredible.</p><p>Let's go through how it does this.</p><p>Take our previous unstructured log example and add an <strong>operation time</strong>:</p><pre><code>2021-07-29 14:52:55.1623 INFO New report 4567 created by user 4253, <strong>operation took 1.23 seconds</strong></code></pre><p>CLP compresses this in the following steps.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wb1J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wb1J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 424w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 848w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wb1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png" width="1456" height="1033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1033,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:307681,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wb1J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 424w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 848w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 1272w, https://substackcdn.com/image/fetch/$s_!wb1J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4695ae23-0d0f-4214-a39e-ac5d1abb33df_1855x1316.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><ol><li><p><strong>Parses the message</strong> into a timestamp, variable values, and log type.</p></li><li><p><strong>Separates repetitive variables</strong> (dictionary variables) from non-repetitive ones (non-dictionary variables).</p></li><li><p><strong>Encodes</strong> timestamps and non-dictionary variables into a <strong>binary format</strong>.</p></li><li><p>Places the log type and dictionary variables into dictionaries to deduplicate values.</p></li><li><p>Stores the message in a <strong>three-column table</strong> of encoded messages.</p></li></ol><p>The final table is then compressed again using <a href="https://facebook.github.io/zstd/">Zstandard</a>, a <strong>lossless</strong> compression algorithm developed by <strong>Facebook</strong>.</p><div><hr></div><p><em><strong>Sidenote: Lossless vs. 
Lossy Compression</strong></em></p><p><em>Imagine you have a <strong>detailed image</strong> that you want to send to a friend who has <strong>slow internet</strong>.</em></p><p><em>You could compress the image using either <strong>lossy</strong> or <strong>lossless</strong> compression. Here are the differences:</em></p><p><em><strong>Lossy compression</strong> removes some image data while keeping enough detail for the picture to stay recognizable. This is how <strong>.jpg images</strong> and <strong>.mp3 audio</strong> work.</em></p><p><em><strong>Lossless compression</strong> keeps all the image data. It compresses by storing that data more efficiently.</em></p><p><em>For example, if the same pixel color is <strong>repeated</strong> many times in a row, then instead of storing the color of every pixel, it can store the color of the <strong>first pixel</strong> and the number of <strong>times it's repeated</strong>. This technique is called run-length encoding.</em></p><p><em>Lossless formats such as <strong>.png</strong> images and <strong>.flac</strong> audio work on this principle, although they use more sophisticated algorithms than this simple example.</em></p><div><hr></div><p>Unfortunately, Uber were not able to use CLP directly on their logs; they had to use it in <strong>stages</strong>.</p><h2>How Uber Used CLP</h2><p>Uber initially wanted to run the <strong>entire</strong> CLP pipeline on their logs, but they realized this approach wouldn't work.</p><p>Logs are streamed from the application to a solid-state drive (SSD) before being uploaded to HDFS. 
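</p><p>A minimal Python sketch of this write-locally, upload-in-batches flow is below. It is an illustration under stated assumptions rather than Uber's implementation: <code>BatchingLogShipper</code> and its <code>upload</code> callback are hypothetical names, a temporary local directory stands in for the SSD, and the callback stands in for whatever actually copies a finished file to HDFS.</p><pre><code>import os
import tempfile


class BatchingLogShipper:
    """Hypothetical sketch: spool log lines to local disk, upload them in batches.

    `upload` is a stand-in for whatever copies a finished file to HDFS
    (an assumption for illustration, not Uber's actual API).
    """

    def __init__(self, upload, batch_size=1000, spool_dir=None):
        self.upload = upload
        self.batch_size = batch_size
        self.spool_dir = spool_dir or tempfile.mkdtemp(prefix="log-spool-")
        self.batch_index = 0
        self.pending = 0
        self._open_spool_file()

    def _open_spool_file(self):
        # Each batch gets its own file on local disk (standing in for the SSD).
        name = f"batch-{self.batch_index:06d}.log"
        self.spool_path = os.path.join(self.spool_dir, name)
        self.spool = open(self.spool_path, "w", encoding="utf-8")

    def write(self, line):
        # Fast path: a local append, with no network round trip per log line.
        self.spool.write(line + "\n")
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        # Seal the current file and hand the whole batch to the uploader.
        if self.pending == 0:
            return
        self.spool.close()
        self.upload(self.spool_path)
        self.batch_index += 1
        self.pending = 0
        self._open_spool_file()

    def close(self):
        # Ship any partial batch, then release the (now empty) spool file.
        self.flush()
        self.spool.close()


if __name__ == "__main__":
    uploaded = []  # collects file paths instead of really uploading them
    shipper = BatchingLogShipper(upload=uploaded.append, batch_size=2)
    for i in range(5):
        shipper.write(f"2021-07-29 14:52:55.1623 INFO event {i}")
    shipper.close()
    print(len(uploaded), "batches uploaded")  # prints: 3 batches uploaded
</code></pre><p>Flushing on a fixed line count is the simplest possible policy; a production shipper would likely also flush on a timer, so quiet services don't hold logs locally for long, and it would need retries and crash recovery, which this sketch leaves out.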
</p><p>This was so they could be <strong>stored quickly</strong> and transferred to HDFS in batches.</p><p>CLP works best when compressing <strong>large batches of logs</strong>, which doesn't suit streaming.</p><p>CLP's compression also uses a lot of memory, and the machines writing logs to those SSDs were already under high memory pressure just keeping up with the log volume.</p><p>To fix this, they decided to split CLP's compression steps into <strong>2 phases</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3iie!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3iie!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 424w, https://substackcdn.com/image/fetch/$s_!3iie!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 848w, https://substackcdn.com/image/fetch/$s_!3iie!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 1272w, https://substackcdn.com/image/fetch/$s_!3iie!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!3iie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png" width="658" height="488.9807692307692" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1082,&quot;width&quot;:1456,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:200387,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3iie!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 424w, https://substackcdn.com/image/fetch/$s_!3iie!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 848w, https://substackcdn.com/image/fetch/$s_!3iie!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 1272w, https://substackcdn.com/image/fetch/$s_!3iie!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c7c1a4c-83e1-4e42-b6a8-248573ad09a5_1891x1405.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Phase 1:</strong> Only <strong>parse</strong> and <strong>encode</strong> the logs, then compress them with <strong>Zstandard</strong> before sending them to the <strong>HDFS</strong>.</p><p><strong>Phase 2:</strong> Do the <strong>dictionary</strong> and <strong>deduplication</strong> step on batches of logs. 
Then create compressed columns for each log.</p><p>After <strong>Phase 1</strong>, this is what the logs looked like.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rCOa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rCOa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 424w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 848w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 1272w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rCOa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png" width="1456" height="491" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:491,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109800,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rCOa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 424w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 848w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 1272w, https://substackcdn.com/image/fetch/$s_!rCOa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86ae3a8a-f4aa-44ff-b056-55756668a0df_1471x496.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The &lt;H&gt; tags are <strong>used to mark different sections</strong>, which makes the logs easier to parse.</p><p>This split meant the memory-intensive steps could run later on the data stored in HDFS, rather than on the machines writing logs to the SSDs.</p><p>With just <strong>Phase 1</strong> complete (using only CLP's parsing and encoding steps, plus Zstandard), Uber was able to compress <strong>5.38PB</strong> of logs to <strong>31.4TB</strong>, which is about <strong>0.6%</strong> of the original size, a <strong>99.4% reduction</strong>.</p><p>They were also able to increase log retention from <strong>three days</strong> to <strong>one month</strong>.</p><h2>And that's a wrap</h2><p>You may have noticed <strong>Phase 2</strong> isn&#8217;t in this article. That&#8217;s because the article was already getting long, and we want to keep these short and sweet for you.<br><br>Give this article a <strong>like</strong> if you&#8217;re interested in seeing <strong>part 2</strong>! 
Promise it&#8217;s worth it.</p><p>If you really can&#8217;t wait, here&#8217;s the <a href="https://www.uber.com/en-GB/blog/reducing-logging-cost-by-two-orders-of-magnitude-using-clp/?uclick_id=bb351000-4e6a-47c8-ba88-2484480e32c4">original article</a>, which, funnily enough, is also written in <a href="https://www.uber.com/en-GB/blog/modernizing-logging-with-clp-ii/">two parts</a>.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Discord Processes 30+ Petabytes of Data]]></title><description><![CDATA[Discord's genius approach to automating insights from billions of messages]]></description><link>https://newsletter.betterstack.com/p/how-discord-processes-30-petabytes</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-discord-processes-30-petabytes</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 25 Sep 2024 13:02:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Discord is a well-known chat app like Slack, but it was originally <strong>designed for gamers</strong>.</p><p>Today it has a much broader audience and is used by millions of people every day (<strong>29 million</strong>, to be exact).</p><p>Like many other chat apps, Discord stores and analyzes every single one of its <strong>4 billion daily</strong> messages.</p><p>Let's go through how and why they do that.</p><p><em>Estimated reading time: 4 minutes 56 seconds</em></p><div><hr></div><h2>Why Does Discord Analyze Your Messages?</h2><p>After reading the opening paragraphs, you might be shocked to learn that Discord <strong>stores</strong> <strong>every message</strong>, no matter when or where it was sent.</p><p>Even after a <strong>message is deleted</strong>, they still have access to it.</p><p>Here are a few reasons for that:</p><ol><li><p>Identifying <strong>bad communities or members</strong>: scammers, trolls, or those who violate their <a href="https://discord.com/terms">Terms of Service</a>.</p></li><li><p>Figuring out what <strong>new features</strong> to add or how to <strong>improve existing ones</strong>.</p></li><li><p>Training their <strong>machine learning models</strong>, which they use to moderate content, analyze behavior, and rank issues.</p></li><li><p><strong>Understanding their users</strong> by analyzing engagement, retention, and demographics.</p></li></ol><p>There are a few more reasons beyond those mentioned above. If you're interested, check out their <a href="https://discord.com/privacy#3">Privacy Policy</a>.</p><p>But don't worry. Discord employees aren't reading your <strong>private messages</strong>. The data gets anonymized before it is stored, so they shouldn't know anything about you.</p><p>And data used <strong>for analysis</strong>, which is the focus of this article, goes through even more processing.</p><p>When a user sends a message, it is saved in the application-specific database, which uses <a href="https://www.scylladb.com/">ScyllaDB</a>.</p><p>This <strong>data is cleaned</strong> before being used. 
We&#8217;ll talk more about cleaning later.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZNmO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZNmO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 424w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 848w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZNmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png" width="692" height="362.15934065934067" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:692,&quot;bytes&quot;:216276,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZNmO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 424w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 848w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 1272w, https://substackcdn.com/image/fetch/$s_!ZNmO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5a79b64d-595e-4de6-86a9-2a832b2a8e0d_2567x1344.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But as Discord began to produce <strong>petabytes of data daily</strong> (yes, petabytes: 1 petabyte is 1,000 terabytes), the business needed a <strong>more automated process</strong>.</p><p>They needed a process that would automatically take raw data from the app database, clean it, and transform it so it could be used for analysis.</p><p>Until then, this was being done <strong>manually, on request</strong>.</p><p>They also needed a solution that was easy to use for people <strong>outside</strong> the data platform team.</p><p>This is why they developed <strong>Derived</strong>.</p><div><hr></div><h4><em><strong>Sidenote: ScyllaDB</strong></em></h4><p><em>Scylla is a NoSQL database <strong>written in C++</strong> and designed for <strong>high performance</strong>.</em></p><p><em>NoSQL databases generally don't use SQL to query data, and they don't follow the relational model used by databases like MySQL or PostgreSQL.</em></p><p><em>Instead, they use a different query language. 
Scylla uses CQL, the <strong>Cassandra Query Language</strong>, which comes from another NoSQL database called <a href="https://cassandra.apache.org/_/index.html">Apache Cassandra</a>.</em></p><p><em>Scylla also <a href="https://newsletter.betterstack.com/i/147520006/problems-with-livegraph">shards data</a> by default based on the number of <strong>CPU cores available</strong>, giving each core its own shard.</em></p><p><em>For example, a MacBook Pro with a 10-core M1 Pro chip would get 10 shards, so a 1,000-row table would be split across them at roughly 100 rows per shard (the exact split depends on how each row's key hashes). This helps with speed and scalability.</em></p><p><em>Scylla is a <strong>wide-column store</strong> (like Cassandra). It stores data in tables with columns and rows, but each row has a unique key and can have a different set of columns.</em></p><p><em>This makes it more flexible than a traditional relational table, where every row must have the same fixed set of columns.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ykvH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ykvH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 424w, https://substackcdn.com/image/fetch/$s_!ykvH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 848w, https://substackcdn.com/image/fetch/$s_!ykvH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ykvH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ykvH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png" width="404" height="441.037394451146" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:905,&quot;width&quot;:829,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:75559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ykvH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 424w, https://substackcdn.com/image/fetch/$s_!ykvH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 848w, https://substackcdn.com/image/fetch/$s_!ykvH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ykvH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a13d191-fda8-45f3-9125-7b7fe83a2fd9_829x905.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>What is Derived?</h2><p>You may be wondering, what's wrong with the <strong>app data</strong> in the first place? 
Why can't it be used <strong>directly for analysis</strong>?</p><p>Aside from <strong>privacy concerns</strong>, raw application data is structured to serve the application, not analysis.</p><p>It also contains information that isn't useful to the business, so a cleaning step typically <strong>removes unnecessary data</strong> first. This is part of a process called <strong>ETL</strong>: Extract, Transform, Load.</p><p>Discord used <a href="https://airflow.apache.org/">Airflow</a> for this, an <strong>open-source</strong> tool for creating and scheduling data pipelines. Airflow pipelines are defined in <strong>Python</strong>.</p><p>The cleaned data is stored in a separate database called the <strong>Data Warehouse</strong>.</p><p>Tables created by transforming data from the Data Warehouse are called <strong>Derived Tables</strong>.</p><p>This is where the name "<em>Derived</em>" came from.</p><div><hr></div><h4><em><strong>Sidenote: Data Warehouse</strong></em></h4><p><em>As you may have guessed, a data warehouse is where the <strong>highest-quality data is stored</strong>.</em></p><p><em>This means the data has been <strong>cleaned</strong> and <strong>transformed</strong> for analysis.</em></p><p><em>Cleaning data usually includes <strong>anonymizing</strong> it: removing personal info and replacing sensitive values with <strong>random text</strong>. It also means removing duplicates and making sure things like <strong>dates</strong> use a consistent format.</em></p><p><em>A data warehouse is the <strong>single source of truth</strong> for all the company's data, meaning data inside it should not be changed or deleted. 
But it is possible to create new tables from transformations of the data in the warehouse.</em></p><p><em>Discord used Google's <a href="https://support.google.com/cloud/answer/6255052?hl=en">BigQuery</a> as their data warehouse. BigQuery is a <strong>fully managed</strong> service for storing and processing data.</em></p><p><em>It is part of <strong>Google Cloud Platform</strong>, Google's counterpart to AWS.</em></p><p><em>Data from the warehouse can be used in business intelligence tools like <a href="https://cloud.google.com/looker/">Looker</a> or <a href="https://www.microsoft.com/en-us/power-platform/products/power-bi">Power BI</a>. It can also be used to train machine learning models.</em></p><div><hr></div><p>Before Derived, if someone needed specific data, like the <strong>number of daily sign-ups</strong>, they would ask the data platform team, who would manually write the code to create that derived table.</p><p>With Derived, the requester would instead create a <a href="https://gist.github.com/DiscordBlog/18d1b39cd1b13c5dbb61caee3eda7726#file-version-two-user-ergonomics-code-block-yml">config file</a> describing the data they needed, plus some optional extras. </p><p>This file would be submitted as a <strong>pull request</strong> to the repository containing the code for the <strong>data transformations</strong>, essentially a repo containing all the Airflow files. </p><p>Then a <strong>continuous integration</strong> process, something like a <a href="https://docs.github.com/en/actions">GitHub Action</a>, would create the derived table based on the file. 
</p><p>One config file per table.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PlGe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PlGe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 424w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 848w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 1272w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PlGe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png" width="502" height="520.6646010844307" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1339,&quot;width&quot;:1291,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:145528,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PlGe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 424w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 848w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 1272w, https://substackcdn.com/image/fetch/$s_!PlGe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c95ce0c-fb15-4517-9a62-4fb0e1e129c4_1291x1339.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This approach solved the problem of the previous system being hard for other teams to change.</p><p>To address the issue of data not being <strong>updated frequently enough</strong>, they came up with a different solution.</p><p>The team used Google's <strong>Cloud Pub/Sub</strong> service to update the data warehouse whenever application data changed.</p><div><hr></div><h4><em><strong>Sidenote: Pub/Sub</strong></em></h4><p><em>Pub/Sub is a way to send messages from one application to another.</em></p><p><em>"Pub" stands for <strong>Publish</strong>, and "Sub" stands for <strong>Subscribe</strong>.</em></p><p><em>To send a message (which could be any data) from app A to app B, app A acts as the publisher and publishes the message to a <strong>topic</strong>.</em></p><p><em>A topic is a named <strong>distribution channel</strong> for messages. 
App B would subscribe to that topic and receive the message.</em></p><p><em>Pub/Sub differs from <a href="https://en.wikipedia.org/wiki/Request%E2%80%93response">request/response</a> and other <strong>messaging patterns</strong> because publishers don&#8217;t wait for a response before sending another message. </em></p><p><em>And with Cloud Pub/Sub, if app B is down when app A sends a message, the subscription keeps the message until app B is <strong>back online</strong> (up to a configurable retention period).</em></p><p><em>This means messages <strong>aren't lost</strong> just because a subscriber is temporarily unavailable.</em></p><div><hr></div><p>This method was used for important tables that needed frequent updates. Less critical tables were <strong>batch-updated</strong> every hour or day.</p><p>The final focus was <strong>speed</strong>. The team copied <strong>frequently used tables</strong> from the data warehouse to a Scylla database and ran queries against that instead, since BigQuery is built for large analytical queries rather than fast, low-latency lookups.</p><p>With all that in place, this is what the final process for analyzing data looked like:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JZvv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JZvv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 424w, https://substackcdn.com/image/fetch/$s_!JZvv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 848w, 
https://substackcdn.com/image/fetch/$s_!JZvv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 1272w, https://substackcdn.com/image/fetch/$s_!JZvv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JZvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png" width="728" height="377.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:755,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:509255,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JZvv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 424w, https://substackcdn.com/image/fetch/$s_!JZvv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 848w, 
https://substackcdn.com/image/fetch/$s_!JZvv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 1272w, https://substackcdn.com/image/fetch/$s_!JZvv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e2d6c4e-5348-4e6d-baaf-6a28d8f4043a_3475x1802.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>Wrapping Things Up</h2><p>This topic is a bit different from the usual posts here. 
It's more <strong>data-focused</strong> and less <strong>engineering-focused</strong>. But scale is scale, no matter the discipline.</p><p>I hope this gives some insight into the issues a data platform team can face when dealing with <strong>lots of data</strong>.</p><p>As usual, if you want a much more detailed account, check out the <a href="https://discord.com/blog/how-discord-creates-insights-from-trillions-of-data-points">original article</a>.</p><p>If you would like more technical summaries from companies like <strong>Uber and Canva</strong>, go ahead and <a href="https://newsletter.betterstack.com/subscribe">subscribe</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 25 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[Figma's 100x Approach to Scaling Its Collaborative Experience]]></title><description><![CDATA[How Figma destroyed their old setup and used the pieces to build a better one]]></description><link>https://newsletter.betterstack.com/p/figmas-100x-approach-to-scaling-its</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/figmas-100x-approach-to-scaling-its</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 11 Sep 2024 13:03:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Figma is a <strong>web-based</strong> design tool. It&#8217;s great for creating user interfaces for websites, mobile apps, and similar projects.</p><p>What makes it stand out is its <strong>collaboration features</strong>.</p><p>Users can work on the same file <strong>simultaneously</strong>, seeing exactly what everyone else is doing as they edit and comment.</p><p>This is one reason Figma has <strong>4 million active users</strong>, and why the team focuses on keeping collaboration fast as the user base grows.</p><p>They do this in many ways, but one important way is by developing a system called <strong>LiveGraph</strong>.</p><p><em>Estimated reading time: 4 minutes 40 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. 
Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>LiveGraph?</h2><p>When Figma first launched its <a href="https://www.figma.com/blog/multiplayer-editing-in-figma/">collaboration feature back in 2016</a>, it ran on a simple tech stack: <strong>React</strong> on the frontend, <strong>Ruby</strong> on the backend, <strong>Redux</strong> for state management, and <strong>WebSockets</strong> for real-time communication.</p><p>Collaboration worked like this: when a client changed something, like renaming a file, the <strong>database</strong> was updated, and then that event was <strong>broadcast</strong> to other clients.</p><p>Each event was <strong>hand-coded</strong>. 
The process was <strong>time-consuming</strong> and became complicated as Figma added more features.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ze71!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ze71!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 424w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 848w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 1272w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ze71!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png" width="530" height="355.27472527472526" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:976,&quot;width&quot;:1456,&quot;resizeWidth&quot;:530,&quot;bytes&quot;:227299,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ze71!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 424w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 848w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 1272w, https://substackcdn.com/image/fetch/$s_!Ze71!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6fe52ede-2f39-40af-a9d3-2623ecca0fab_2273x1523.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is why <strong>LiveGraph</strong> was created.</p><p>LiveGraph was built <strong>in-house</strong>, inspired by <a href="https://graphql.org/">GraphQL</a>, and serves as a <strong>data store or cache</strong> for all clients. It also manages database queries and updates.</p><p>This means that instead of engineers <strong>manually</strong> writing an event to broadcast, clients <strong>subscribe</strong> to LiveGraph.</p><p>That way, clients know whenever something changes. 
Like subscribing to a newsletter or your favorite YouTube channel.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zpRo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zpRo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 424w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 848w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 1272w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zpRo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png" width="534" height="357.95604395604397" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:976,&quot;width&quot;:1456,&quot;resizeWidth&quot;:534,&quot;bytes&quot;:218392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zpRo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 424w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 848w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 1272w, https://substackcdn.com/image/fetch/$s_!zpRo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F884b8c35-6b52-4c13-b763-d18c81d84427_2273x1523.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This approach was <strong>easier to manage</strong>.</p><div><hr></div><h4><em><strong>Sidenote: GraphQL</strong></em></h4><p><em>GraphQL is a query language that makes it easy to <strong>request data from a server</strong>.</em></p><p><em>The reason for using it over REST is simple. With REST, you often need to call several endpoints to get different pieces of data. 
With GraphQL, a <strong>single endpoint</strong> can return all the data you need.</em></p><p><em>Queries are written in a <strong>JSON-like syntax</strong> that mirrors the shape of the response, and are usually sent to the server as HTTP POST requests.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x0ya!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x0ya!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 424w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 848w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 1272w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x0ya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png" width="678" height="273.8076923076923" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:588,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:265917,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x0ya!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 424w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 848w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 1272w, https://substackcdn.com/image/fetch/$s_!x0ya!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F057dcf0f-b926-4573-b88d-86ffc0f88c80_3237x1307.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><em>With GraphQL queries alone, however, it's difficult to get real-time updates, especially from <a href="https://www.ibm.com/topics/event-streaming">event streams</a>.</em></p><p><em>This is why Figma built <strong>LiveGraph</strong>.</em></p><div><hr></div><h2>Problems with LiveGraph</h2><p>LiveGraph worked well for the user base Figma had at the time. But since its release, Figma's user base had <strong>tripled</strong> and page views had <strong>increased 5x</strong>.</p><p>All of this ran on a <strong>single Postgres database</strong> with tables <strong>several terabytes</strong> in size.</p><p>A technique called <strong>sharding</strong> solved this, giving the database near-infinite scalability.</p><p>But LiveGraph wasn't designed for a database of this size, so it also <strong>needed to be scaled</strong>. 
</p><p>This started a project called <a href="https://www.figma.com/blog/livegraph-real-time-data-at-scale/#livegraph-100x-a-new-architecture">LiveGraph 100x</a>.</p><p>This article focuses on two techniques Figma used to scale LiveGraph:</p><ol><li><p>Improving <strong>caching</strong></p></li><li><p>Creating an <strong>invalidator</strong></p></li></ol><div><hr></div><h4><em><strong>Sidenote: Sharding</strong></em></h4><p><em>For most projects, it's perfectly fine to have a <strong>single database</strong>.</em></p><p><em>But with <strong>millions or even billions of rows</strong>, a single database has to store and serve all of that data from one machine, which eventually <strong>exhausts its resources</strong>.</em></p><p><em>Sharding is a way to split a large database into <strong>smaller databases</strong>, each containing a subset of the data.</em></p><p><em><strong>Horizontal sharding</strong> splits the database by rows, while <strong>vertical sharding</strong> splits it by columns (or by whole tables).</em></p><p><em>Figma implemented a <a href="https://www.figma.com/blog/how-figmas-databases-team-lived-to-tell-the-scale/">horizontal sharding solution</a>.</em></p><p><em>As a simple example, a database with <strong>1,000 rows</strong> could be sharded into 10 databases, each containing <strong>100 rows</strong>.</em></p><div><hr></div><h2>Improving the Caching</h2><p>The obvious first step to scaling LiveGraph was to <strong>run more instances</strong> of it, so the <strong>workload could be shared</strong>.</p><p>The problem is that this would create many caches, one per instance. 
Since the database wouldn't know which cache needed which data, it would have to send every update to <strong>all of them</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0BkY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0BkY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 424w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 848w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 1272w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0BkY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png" width="502" height="336.5054945054945" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:976,&quot;width&quot;:1456,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:139472,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0BkY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 424w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 848w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 1272w, https://substackcdn.com/image/fetch/$s_!0BkY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3153896c-9d91-4e8c-92fd-53fc1378e795_1473x987.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To solve this, the cache was moved out of the individual instances and into a single <strong>centralized cache</strong>.</p><p>Now all LiveGraph instances would share one cache. 
And data would go to <strong>the same place</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BHLK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BHLK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 424w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 848w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 1272w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BHLK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png" width="548" height="293.19505494505495" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:779,&quot;width&quot;:1456,&quot;resizeWidth&quot;:548,&quot;bytes&quot;:164015,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BHLK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 424w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 848w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 1272w, https://substackcdn.com/image/fetch/$s_!BHLK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8269bf90-bfd8-4fbd-94bf-c9125721ae52_1844x987.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But this raised another issue.</p><p>With so many users and so much activity in Figma, this single cache could grow <strong>very large</strong>.</p><p>So they decided to <strong>shard the cache</strong>, just as they did with the database.</p><p>The cache was a key-value store containing the <strong>query as the key</strong> and the <strong>results as the value</strong>. 
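</p><p>To make the idea concrete, here is a minimal Python sketch of a query cache split into shards. It is an illustration under stated assumptions, not Figma's implementation: the cache key is assumed to be the query text, MD5 is used to pick a shard, the shard count of 4 is hypothetical, and each shard is an in-process dictionary standing in for a separate cache server.</p><pre><code>import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration

# Each dict stands in for one cache server (one shard).
shards = [{} for _ in range(NUM_SHARDS)]


def shard_for(query):
    # Hash the cache key (the query text) and map the digest onto a shard index.
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def cache_set(query, result):
    # Store the key and its result only on the shard the key hashes to.
    shards[shard_for(query)][query] = result


def cache_get(query):
    # Look up a single shard; returns None on a cache miss.
    return shards[shard_for(query)].get(query)


# Hypothetical query string and result, for illustration only.
cache_set("files in folder 42", ["a.fig", "b.fig"])
print(cache_get("files in folder 42"))  # prints ['a.fig', 'b.fig']</code></pre><p>Because the same key always hashes to the same shard, a lookup only touches one shard. One design caveat: with plain modulo placement, changing the number of shards moves most keys to different shards, which is why many systems use consistent hashing instead. This article doesn't cover which scheme Figma uses.</p><p>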
Sharding worked by <strong>hashing</strong> the key to decide which shard it belongs to, then storing the key and value on that shard.</p><p>This meant a LiveGraph instance only had to use <strong>the cache shards it needed</strong>, not the entire cache.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JHsX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JHsX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 424w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 848w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 1272w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JHsX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png" width="538" height="338.0975274725275" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:915,&quot;width&quot;:1456,&quot;resizeWidth&quot;:538,&quot;bytes&quot;:218410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JHsX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 424w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 848w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 1272w, https://substackcdn.com/image/fetch/$s_!JHsX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d00cd08-a715-4351-bd16-88b6d7727bef_1800x1131.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>But this solution introduced <strong>two problems</strong>.</p><ol><li><p>The database wouldn't know which cache shard to <strong>send the data to</strong>.</p></li><li><p>Changing one shard might require <strong>changes to another shard</strong>.</p></li></ol><p>For example, if a file is <strong>moved to a different folder</strong>, two shards would need to be updated: the shard that stores files from the <strong>old folder</strong> and the shard that stores files from the <strong>new folder</strong>.</p><p>To solve these problems, the team created an <strong>invalidator</strong>.</p><div><hr></div><h4><em><strong>Sidenote: Hashing</strong></em></h4><p><em>Hashing is the process of converting data into a <strong>fixed-length</strong> value, often written as a string of letters and numbers.</em></p><p><em>As a simple example, imagine we want to hash the word "hello".</em></p><p><em>We could take the <strong>position of each letter in the alphabet</strong>, then add them all together. 
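</em></p><p><em>The letter-sum scheme is small enough to write out in Python. It is a toy for illustration only, not a real hash function: it assumes the input contains only the letters a to z, and any anagram of a word (such as "olleh") produces the same number.</em></p><pre><code>import string


def toy_hash(word):
    # Add up each letter's 1-based position in the alphabet (a=1, b=2, ..., z=26).
    # Only letters are supported; any other character raises ValueError.
    return sum(string.ascii_lowercase.index(ch) + 1 for ch in word.lower())


print(toy_hash("hello"))  # prints 52
print(toy_hash("olleh"))  # also prints 52, because anagrams collide</code></pre><p><em>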
So "hello" would become <strong>52</strong>.</em></p><p><em>Of course, modern hashing algorithms are much more complex.</em></p><p><em>Using <a href="https://www.techtarget.com/searchsecurity/definition/MD5">the MD5 algorithm</a>, "hello" would be:</em></p><pre><code>5d41402abc4b2a76b9719d911017c592</code></pre><p><em>General-purpose hash functions are <strong>very fast</strong>. They give data a <strong>practically unique ID</strong> (different inputs can collide, but a good algorithm makes this rare), and the same input with the same algorithm will always produce the <strong>same output</strong>.</em></p><p><em>Hashes are commonly used to <strong>store passwords</strong> (with deliberately slow algorithms such as bcrypt, not MD5) and to <strong>create indexes</strong> for things like databases.</em></p><div><hr></div><h2>Creating an Invalidator</h2><p>The system now has <strong>many cache shards</strong>. A Figma user typically only uses a <strong>specific part</strong> of the app at a time, so a LiveGraph instance doesn't need to subscribe to all the shards.</p><p>But sometimes an update in one shard would trigger <strong>an update in a different shard</strong>. 
And if LiveGraph isn't subscribed to a shard, it wouldn't display the update.</p><p>Think back to the <strong>file moving example</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WGBV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WGBV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 424w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 848w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 1272w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WGBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png" width="1456" height="602" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:602,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:191557,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WGBV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 424w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 848w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 1272w, https://substackcdn.com/image/fetch/$s_!WGBV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e73eac6-1d59-4711-af84-c508d44e1f5d_2169x897.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>To save resources, affected shards weren't updated automatically. Instead, they were <strong>marked as invalid</strong>.</p><p>That way, if a LiveGraph instance later subscribes to an invalid shard, it knows it needs to fetch <strong>updated data</strong> from the database.</p><p>The job of the <strong>invalidator</strong> was to mark the correct shards as invalid.</p><p>It was a server that was sharded like the database. 
Each invalidator worked by <strong>reading the logs</strong> of its own database shard.</p><p>Based on the <strong>mutations</strong> (changes to the data) it saw in those logs, it would then work out which cache shards to mark as invalid.</p><p>With both cache sharding and invalidators in place, here is the new LiveGraph architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fJov!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fJov!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 424w, https://substackcdn.com/image/fetch/$s_!fJov!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 848w, https://substackcdn.com/image/fetch/$s_!fJov!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 1272w, https://substackcdn.com/image/fetch/$s_!fJov!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fJov!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png" width="1456" height="538" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:538,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289177,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fJov!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 424w, https://substackcdn.com/image/fetch/$s_!fJov!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 848w, https://substackcdn.com/image/fetch/$s_!fJov!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 1272w, https://substackcdn.com/image/fetch/$s_!fJov!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfbe17e2-f4c5-4350-82c2-c905c4f66769_2594x959.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><strong>It is truly impressive</strong> how the engineering team at Figma came up with these solutions.</p><h2>Wrapping Things Up</h2><p>It's important to remember that Figma built all these technologies <strong>in-house</strong>. This means there are a lot of details <strong>not covered in this article</strong>, for example:</p><ul><li><p>Specifics of <strong>subscribing to cache shards</strong>.</p></li><li><p>How the invalidator <strong>creates queries</strong> by comparing data before and after a change.</p></li><li><p>How <strong>queries are routed</strong> to the correct database or cache shard.</p></li></ul><p>All of this is covered in the <a href="https://www.figma.com/blog/livegraph-real-time-data-at-scale/">original article</a>, as well as in a <a href="https://www.figma.com/blog/how-figmas-databases-team-lived-to-tell-the-scale/">database scaling article</a> and <a href="https://www.youtube.com/watch?v=bnvF-IsQaUE">a talk</a> given by one of Figma's engineers.</p><p>But if you're satisfied with this information and can't wait for the next one, go ahead and <a href="https://newsletter.betterstack.com/">subscribe</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? 
Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 25 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[Here's What Really Caused 8.5 Million Computers to Crash]]></title><description><![CDATA[How one security product crippled the world because of bad programming]]></description><link>https://newsletter.betterstack.com/p/heres-what-really-caused-85-million</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/heres-what-really-caused-85-million</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 28 Aug 2024 13:00:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Fn9t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On <strong>July 19th, 2024</strong>, the largest cyber event in history hit the world. It affected <strong>8.5 million Windows machines</strong> used for finance, healthcare, and other sectors.</p><p>The event was caused by the cybersecurity software <strong>CrowdStrike Falcon</strong>. I mean, you couldn't make this up.</p><p>The software designed to <strong>protect against attacks</strong> was what caused the problem.</p><p>Although 8.5 million computers is less than <strong>1% of Windows machines worldwide</strong>, the last major <a href="https://en.wikipedia.org/wiki/WannaCry_ransomware_attack">cyber event</a> in 2017 affected only <strong>300,000 computers</strong>. 
So this is a massive step up.</p><p>But what exactly happened?</p><p><em>Estimated reading time: 4 minutes 55 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What is CrowdStrike Falcon?</h2><p>You've most likely heard of <strong>antivirus software</strong> like Norton or McAfee.</p><p>Falcon is like that, but it focuses on protecting a <strong>large network</strong> instead of a single computer. This is known as <strong>endpoint security</strong>.</p><p>It's cloud-based, but each machine in the network (endpoint) installs a small piece of software called a <strong>Falcon Sensor</strong>.</p><p>Once installed, the sensor constantly monitors and sends information to <strong>Falcon's servers</strong>.</p><p>These servers analyze all data from sensors using <strong>machine learning</strong> and <strong>threat intelligence</strong>. Basically, CrowdStrike uses its vast knowledge of cyber threats and hackers to check if a machine is infected. 
</p><p>To add to that, CrowdStrike has a <strong>team of researchers</strong> checking for <strong>new threats</strong> and adding their findings back to <strong>Falcon's database</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fn9t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fn9t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 424w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 848w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 1272w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fn9t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png" width="1456" height="1331" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1331,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:316727,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fn9t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 424w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 848w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 1272w, https://substackcdn.com/image/fetch/$s_!Fn9t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e4f6371-9931-4cdb-aa06-8feb86e395f8_2222x2031.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>The sensors themselves have a '<em>detection engine</em>' that also uses machine learning to <strong>analyze files</strong> and <strong>system processes</strong>.</p><p>Security engineers using Falcon have access to a web interface that shows all monitored endpoints and alerts them if any threats are detected.</p><p>Sounds impressive.</p><p>If I owned a large organization with hundreds or <strong>thousands of machines</strong> to protect, I'd definitely buy Falcon.</p><p>But what's interesting about the sensors is that they operate at the <strong>kernel level</strong>, something few other programs do.</p><p>This gives them <strong>complete access</strong> to a machine, meaning they can check <strong>all system activities</strong>, including <strong>hardware info</strong> such as memory and disk usage.</p><div><hr></div><p><em><strong>Sidenote: The Kernel</strong></em></p><p><em>The kernel is the <strong>core program of an operating system</strong>. 
It sits between hardware and software, managing their communication.</em></p><p><em>If your browser needs more memory, it doesn't need to know the <strong>type or amount of memory </strong>available; it just asks the kernel.</em></p><p><em>The kernel also stops software from accessing <strong>core system functions</strong>. These include CPU control, system configuration, and power management.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!j07H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!j07H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 424w, https://substackcdn.com/image/fetch/$s_!j07H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 848w, https://substackcdn.com/image/fetch/$s_!j07H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 1272w, https://substackcdn.com/image/fetch/$s_!j07H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!j07H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png" 
width="416" height="403.73378264532437" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1152,&quot;width&quot;:1187,&quot;resizeWidth&quot;:416,&quot;bytes&quot;:89364,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!j07H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 424w, https://substackcdn.com/image/fetch/$s_!j07H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 848w, https://substackcdn.com/image/fetch/$s_!j07H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 1272w, https://substackcdn.com/image/fetch/$s_!j07H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43d49861-6a70-4226-9e4b-4d10eefd142c_1187x1152.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p><em>Third-party software installed at the kernel level sits <strong>between the kernel and the software</strong>, so it can access core system functions.</em></p><p><em>Such software can also be installed and updated from the internet <strong>in the background</strong>, without the user knowing.</em></p><p><em>If the kernel encounters a critical error that it doesn't know how to handle, known as a <strong>kernel panic</strong>, it halts the whole system. On Windows, this typically shows up as a blue screen.</em></p><div><hr></div><p>As well as <strong>receiving data</strong> from sensors, Falcon servers can also <strong>send data</strong> to sensors (such as updates).</p><p>In this case, a malformed <strong>configuration update</strong> was sent to <strong>all Windows sensors</strong>, which caused those machines to stop working.</p><p>Because the issue was at the kernel level, it <strong>prevented Windows from starting correctly</strong>.</p><p>Who would have thought that such a small change would cause such a <strong>big problem</strong>?</p><h2>The Configuration File</h2><p>As mentioned before, the cause of the <strong>mass outage</strong> was a configuration update sent to all Falcon sensors for Windows.</p><p>Configuration updates are placed inside &#8216;channel files.&#8217; In this case, the affected file was <strong>channel file 291</strong>, or C-00000291.sys. Channel files are stored in a Windows system folder (<code>C:\Windows\System32\drivers\CrowdStrike</code>).</p><p>This specific update changed how Falcon analyzes 'named pipes' in Windows. Named pipes are a way for different processes, or programs, to <strong>communicate with each other</strong>, even if they are on completely <strong>different machines</strong>.</p><div><hr></div><p><em><strong>Sidenote: Named Pipes</strong></em></p><p><em>Imagine you have two small programs, one for <strong>adding two numbers</strong> (add) and one for <strong>doubling a number</strong> (double).</em></p><p><em>If you wanted to add two numbers and then double the result, you could do it like this:</em></p><pre><code>add 1 2 | double  # result 6 (3 x 2)</code></pre><p><em>The </em><code>|</code><em> character is called a pipe, and it's used to connect <strong>the output of one program to the input of another</strong>.</em></p><p><em>This <strong>connection is temporary</strong>, so it closes when the programs finish. But if you want a more permanent connection, you could use a <strong>named pipe</strong>.</em></p><p><em>This is a special file that acts as a <strong>communication channel between two programs</strong>. Using a named pipe for the previous example would look like this:</em></p><pre><code>mkfifo myNamedPipe           # create the named pipe (a special FIFO file)
add 1 2 &gt; myNamedPipe &amp;  # the writer runs in the background and waits for a reader
double &lt; myNamedPipe        # the reader gets 3 from the pipe; result 6</code></pre><p><em>This is a very simple example, but the same concept applies to <strong>many of the programs</strong> on your computer.</em></p><p><em>For example, a <strong>browser extension</strong> wanting to communicate with a <strong>local password manager</strong> could use a named pipe.</em></p><div><hr></div><p>The team discovered that hackers could use named pipes as an <strong>attack technique</strong>, so the configuration update was sent to detect this.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YKB4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YKB4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 424w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 848w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 1272w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!YKB4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png" width="678" height="257.97527472527474" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:99384,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YKB4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 424w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 848w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 1272w, https://substackcdn.com/image/fetch/$s_!YKB4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F56644d52-324f-4afa-80ac-cad78aad4b5c_1715x653.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>What's interesting is that this update was first released at the end of <strong>February 2024</strong> and was stress-tested at the <strong>beginning of March</strong>.</p><p>In other words, it was available in customers&#8217; sensors but was not used <strong>until after it had been stress-tested</strong>.</p><p>It's a confusing way to release something, but it <strong>didn&#8217;t break Windows</strong> when it was first released.</p><p>The July 19th update was a <em>bug fix</em> to what was <strong>tested in March</strong>. The main feature had <strong>already been stress-tested</strong>, so the team saw no need to heavily test this specific fix.</p><p>Although <strong>light testing</strong> was conducted, <a href="https://www.siliconrepublic.com/enterprise/crowdstrike-it-outage-bug-update-microsoft">there was a bug in their testing system</a> that prevented an issue in the bug fix from being caught.</p><p>It was only after the release that they noticed channel file 291 was causing the sensor to <strong>try to access memory that didn't exist</strong> (an out-of-bounds memory read).</p><p>The kernel didn't know how to handle this error, so Windows crashed.</p><p>Some speculated that the crash was caused by a file <a href="https://x.com/jeremyphoward/status/1814364640127922499">full of null bytes or characters</a>, meaning the update file was <strong>full of zeros</strong>.</p><p>But CrowdStrike <a href="https://www.crowdstrike.com/blog/tech-analysis-channel-file-may-contain-null-bytes/">responded</a> that this is just <strong>how Windows works</strong>.</p><p>When a program is creating a new file, Windows first fills the new file with null bytes (a bunch of zeros) before <strong>adding data to that file</strong>.</p><p>So a file full of zeros could simply mean the machine crashed while a <strong>new file was being created</strong>, rather than being the cause of the crash.</p><h2>What Happens Now?</h2><p>Thankfully, CrowdStrike released a fix within <strong>78 minutes</strong>.</p><p>This means affected machines should <strong>automatically download and install the update</strong> after a good old restart.</p><p>But if the machine <strong>crashes again</strong> after being restarted, the fix has to be applied <a href="https://www.youtube.com/watch?v=TZlUrXXVxc8">the hard way</a>.</p><p>Someone would need to <strong>download the fix</strong> from a working computer <strong>to a USB drive</strong>.</p><p>Then put that USB <strong>into the broken machine</strong> before turning it on. 
When it does turn on, a menu should appear, allowing the fix to be selected.</p><p>CrowdStrike has outlined steps it will take to prevent this from happening again, including:</p><ul><li><p>improved <strong>developer testing</strong></p></li><li><p>improved <strong>stress testing</strong></p></li><li><p>an improved <strong>light testing stack</strong></p></li><li><p>improved <strong>error handling</strong>, so that future errors do not cause crashes</p></li></ul><h2>Wrapping Things Up</h2><p>I <strong>truly sympathize</strong> with those affected by this outage. It caused missed or canceled flights, delayed mail, and failed 911 calls.</p><p>It goes to show just how much <strong>damage the code we write</strong> can cause when it's not <strong>properly reviewed and tested</strong>.</p><p>I wish I could say this will never happen again, but we're human and we make mistakes, so it&#8217;s only a matter of time.</p><p>But hopefully not on <strong>this scale</strong>.</p><p>Anyway, I hope you enjoyed this article.</p><p>If you would like more of these, then be sure to <a href="https://newsletter.betterstack.com/">subscribe</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. 
Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 20 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Netflix Uses Throttling to Prevent 4 Big Streaming Problems]]></title><description><![CDATA[Netflix reveals their unconventional trick to keep viewers happy]]></description><link>https://newsletter.betterstack.com/p/how-netflix-uses-throttling-to-prevent</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-netflix-uses-throttling-to-prevent</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Thu, 15 Aug 2024 13:02:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It would be <strong>really difficult</strong> to find someone who has never heard of Netflix before.</p><p>With around <strong>240 million paid subscribers</strong>, Netflix is arguably the world's most <strong>popular streaming service</strong>. 
And it&#8217;s well deserved.</p><p>Wherever you are in the world, no matter the time or device, you can press play on any piece of Netflix content and <strong>it will work</strong>.</p><p>Does that mean that <strong>Netflix</strong> never has issues? Nope, things go wrong <strong>quite often</strong>. But they guarantee you'll always be able to watch your favorite show.</p><p>Here's how they can do that.</p><p><em>Estimated reading time: 4 minutes 13 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>What Goes Wrong?</h2><p>As with many other services, there are lots of things that could affect a <strong>Netflix user's streaming experience</strong>.</p><ol><li><p><strong>Network Blip:</strong> A user's network connection temporarily goes down or has another issue.</p></li><li><p><strong>Under-Scaled Services:</strong> Cloud servers have not scaled up or do not have enough resources (CPU, RAM, disk) to handle the traffic.</p></li><li><p><strong>Retry Storms:</strong> A backend service goes down, so client requests fail. The clients retry again and again, and the retried requests pile up on top of new traffic.</p></li><li><p><strong>Bad Deployments:</strong> Features or updates that introduce 
bugs.</p></li></ol><p>This is not an exhaustive list, but remember that the main purpose of <strong>Netflix</strong> is to <strong>provide great content to its users</strong>. If any of these issues stop a user from watching that content, then Netflix is not <strong>fulfilling its purpose</strong>.</p><p>Since most of these issues affect Netflix's <strong>backend services</strong>, the solution must '<em>shield</em>' content playback from them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cwi7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cwi7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 424w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 848w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 1272w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!cwi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png" width="1456" height="808" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:808,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:483931,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cwi7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 424w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 848w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 1272w, https://substackcdn.com/image/fetch/$s_!cwi7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc622d5dc-ecce-4a9f-9d72-240908118589_3665x2034.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><h4><em><strong>Sidenote: API Gateway</strong></em></h4><p><em>Netflix has <strong>many backend services,</strong> as well as many clients that all communicate with them.</em></p><p><em>Imagine all the connection lines between them; it would look a lot like spaghetti.</em></p><p><em>An <strong>API Gateway</strong> is a server that sits between all those clients and the backend services. It acts like a traffic controller, routing each request to the right service, which keeps the connections clean and easy to follow.</em></p><p><em>It can also check that a client is <strong>authorized</strong> to call certain services, and it can <strong>monitor requests</strong> (more on that later). 
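</em></p><p>To make the gateway's three jobs (routing, checking authorization, and monitoring) more concrete, here is a minimal Python sketch. It is not Netflix's actual gateway; the <code>ApiGateway</code> class, the prefix-based routing rule, and all service, path, and client names are hypothetical, chosen only for illustration.</p><pre><code>
```python
# Hypothetical toy API gateway: routes by path prefix, checks that the client
# is authorized for the target service, and counts requests and errors.
# All class, service, path, and client names are invented for illustration.
from collections import defaultdict


class ApiGateway:
    def __init__(self):
        self.routes = {}                   # path prefix -> (service, handler)
        self.permissions = {}              # client id -> set of services
        self.requests = defaultdict(int)   # service -> requests forwarded
        self.errors = defaultdict(int)     # service -> failed requests

    def register(self, prefix, service, handler):
        self.routes[prefix] = (service, handler)

    def grant(self, client, service):
        self.permissions.setdefault(client, set()).add(service)

    def handle(self, client, path):
        # Routing: use the first registered prefix that matches the path.
        for prefix, (service, handler) in self.routes.items():
            if path.startswith(prefix):
                break
        else:
            return 404, "no route"
        # Authorization: only forward if this client may call the service.
        if service not in self.permissions.get(client, set()):
            return 403, "forbidden"
        # Monitoring: count every forwarded request and every failure.
        self.requests[service] += 1
        try:
            return 200, handler(path)
        except Exception:
            self.errors[service] += 1
            return 503, "service error"


gateway = ApiGateway()
gateway.register("/playback", "playback-service", lambda path: "manifest")
gateway.grant("tv-app", "playback-service")
print(gateway.handle("tv-app", "/playback/123"))   # (200, 'manifest')
print(gateway.handle("web-app", "/playback/123"))  # (403, 'forbidden')
```
</code></pre><p>Each request is matched to a service by path prefix, refused if the client isn't allowed to call that service, and counted per service, with failures counted separately. Per-service request and error counts like these are the kind of signal a gateway can use to judge how healthy each backend is.</p><p><em>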
</em></p><div><hr></div><h2>The Shield</h2><p>If Netflix had a problem and <strong>no users were online</strong>, it could be resolved quickly without anyone noticing.</p><p>But say there's a problem, like users not being able to favorite a show. If people keep trying to use that feature, they <strong>make the problem worse</strong>: every attempt sends more requests to the backend, putting <strong>more strain</strong> on its resources.</p><p>Blocking the feature outright <strong>wouldn't make sense</strong>, because Netflix doesn&#8217;t want to alarm its users.</p><p>What they could do instead is &#8216;<em>throttle</em>&#8217; those requests using the <strong>API Gateway</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!btPC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!btPC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 424w, https://substackcdn.com/image/fetch/$s_!btPC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 848w, https://substackcdn.com/image/fetch/$s_!btPC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 1272w, 
https://substackcdn.com/image/fetch/$s_!btPC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!btPC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png" width="1456" height="770" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:488441,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!btPC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 424w, https://substackcdn.com/image/fetch/$s_!btPC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 848w, https://substackcdn.com/image/fetch/$s_!btPC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 1272w, 
https://substackcdn.com/image/fetch/$s_!btPC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F31101ae0-be27-453c-b6fc-4119e304eabf_3591x1900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h4><em><strong>Sidenote: Throttling</strong></em></h4><p><em>If you show up at <strong>a popular restaurant</strong> without booking ahead, you may be asked to <strong>come back later</strong> when a table is available.</em></p><p><em>Restaurants can only provide a <strong>certain number of seats at a time</strong>, or they would get overcrowded. 
This is how throttling works.</em></p><p><em>A service can usually handle only a <strong>certain number of requests at a time</strong>, so a request threshold can be set, say <strong>5 requests per minute</strong>.</em></p><p><em>If 6 requests are made within a minute, the 6th request is either <strong>held for a specified amount of time</strong> before being processed, or <strong>rejected</strong>. Both are forms of rate limiting.</em></p><div><hr></div><h2>How It Worked</h2><p>Netflix's API Gateway was <strong>configured to track</strong> CPU load, error rates, and a bunch of other metrics for all the backend services.</p><p>So it knew <strong>how many errors</strong> each service had and <strong>how many requests</strong> were being sent to it.</p><p>If a service was getting a <strong>lot of requests</strong> and had <strong>lots of errors</strong>, that was a good indicator that further requests to it would need to be throttled.</p><div><hr></div><h4><em><strong>Sidenote: Collecting Request Metrics</strong></em></h4><p><em>Whenever a client sends a request to the API Gateway, the gateway <strong>starts collecting metrics</strong> like response time, status code, request size, and response size.</em></p><p><em>This starts <strong>before the request</strong> is routed to the appropriate service.</em></p><p><em>When the service sends back a response, it passes through the gateway, which <strong>finishes collecting metrics</strong> before sending the response on to the client.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ar08!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!ar08!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 424w, https://substackcdn.com/image/fetch/$s_!ar08!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 848w, https://substackcdn.com/image/fetch/$s_!ar08!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 1272w, https://substackcdn.com/image/fetch/$s_!ar08!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ar08!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png" width="1456" height="770" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:770,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:456178,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!ar08!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 424w, https://substackcdn.com/image/fetch/$s_!ar08!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 848w, https://substackcdn.com/image/fetch/$s_!ar08!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 1272w, https://substackcdn.com/image/fetch/$s_!ar08!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd85fe701-3af1-4bfa-807f-22e6d0b1761d_3591x1900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div><hr></div><p>Of course, throttling some services would have more of an <strong>impact on the ability to watch</strong> content than throttling others. So the team <strong>prioritized</strong> requests based on:</p><ol><li><p><strong>Functionality</strong>: What will be affected if this request is throttled? If it's important to the user, then it's <strong>less likely</strong> to be throttled.</p></li><li><p><strong>Point of origin</strong>: Is this request from a user interaction or something else, like a cron job? User interactions are <strong>less likely</strong> to be throttled.</p></li><li><p><strong>Fallback available</strong>: If a request gets throttled, does it have a reasonable fallback? For example, if a trailer doesn&#8217;t play on hover, will the user see an image? If there's a good fallback, then it's <strong>more likely</strong> to be throttled.</p></li><li><p><strong>Throughput:</strong> If the backend service tends to receive a lot of requests, like logging, then these requests are <strong>more likely</strong> to be throttled.</p></li></ol><p>Based on these criteria, each request was given a score between <strong>0 and 100</strong> before being routed, with <strong>0 being high priority</strong> (less likely to be throttled) and <strong>100 being low priority</strong> (more likely to be throttled).</p><p>The team set a <strong>threshold</strong>, for example 40, and any request scoring above it would be throttled.</p><p>This threshold was determined by the <strong>health of all the backend services</strong>, which, again, was monitored by the API Gateway. 
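</p><p>Here is a minimal Python sketch of this kind of prioritized throttling, assuming a simple additive score. The four boolean fields mirror the criteria above, but the weights (40, 20, 20, 20), the <code>health</code> value between 0 and 1, and the linear mapping from health to threshold are illustrative assumptions, not Netflix's real numbers.</p><pre><code>
```python
# Hypothetical sketch of prioritized throttling at a gateway.
# Scores run from 0 (highest priority) to 100 (lowest priority). The weights
# and the health-to-threshold mapping are invented for illustration.
from dataclasses import dataclass


@dataclass
class Request:
    critical_to_playback: bool  # functionality: does playback depend on it?
    user_initiated: bool        # point of origin: user action vs. cron job
    has_fallback: bool          # e.g. show an image if the trailer fails
    high_throughput: bool       # e.g. logging traffic


def priority_score(req: Request) -> int:
    score = 0
    if not req.critical_to_playback:
        score += 40
    if not req.user_initiated:
        score += 20
    if req.has_fallback:
        score += 20
    if req.high_throughput:
        score += 20
    return score  # always between 0 and 100


def threshold_for(health: float) -> float:
    # health: 1.0 = fully healthy, 0.0 = failing. Worse health means a
    # lower threshold, so more requests get throttled.
    return 100 * health


def should_throttle(req: Request, health: float) -> bool:
    return priority_score(req) > threshold_for(health)


playback = Request(critical_to_playback=True, user_initiated=True,
                   has_fallback=False, high_throughput=False)
log_upload = Request(critical_to_playback=False, user_initiated=False,
                     has_fallback=True, high_throughput=True)
print(priority_score(playback), priority_score(log_upload))  # 0 100
print(should_throttle(log_upload, health=0.4))  # True
print(should_throttle(playback, health=0.4))    # False
```
</code></pre><p>At a health of 0.4 the threshold works out to 40, the example value above: the low-priority log upload is throttled while the playback request goes through. Because the check is "score above threshold", a score-0 request is never throttled, even at a health of 0.</p><p>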
The worse the health, <strong>the lower the threshold</strong> (so more requests get throttled), and vice versa.</p><p>There are no hard numbers in the <a href="https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94">original article</a> on how many resources or how much time this technique saved the company (which is a shame).</p><p>But the <strong>gif below</strong> shows what a user might experience while the backend system is recovering from an issue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Jvei!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Jvei!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 424w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 848w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 1272w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Jvei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif" width="720" height="320" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:320,&quot;width&quot;:720,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4235551,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Jvei!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 424w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 848w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 1272w, https://substackcdn.com/image/fetch/$s_!Jvei!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ad0a827-1aa9-49a9-939d-036652b11786_720x320.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from the <a href="https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94">original article</a></figcaption></figure></div>
<p>As you can see, the user is able to <strong>keep watching their favorite show without interruption,</strong> oblivious to what is going on <strong>in the background</strong>.</p><h2>Let's Call It</h2><p>I could go on, but I think this is a good <strong>place to stop</strong>.</p><p>The team must have put a huge amount of effort into getting this across the line. 
I mean, the API gateway is written in Java, so bravo to them.</p><p>If you want <strong>more information</strong> about this, there's plenty of it out there.</p><p>I recommend reading the <a href="https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94">original article</a>, watching <a href="https://www.youtube.com/watch?v=TmNiHbh-6Wg">this video</a>, and reading <a href="https://blog.quastor.org/p/netflix-implements-load-shedding-1">this article</a> as well.</p><p>But if you don't have time to do all that and are enjoying these <strong>simplified summaries</strong>, you <a href="https://newsletter.betterstack.com/">know what to do</a>.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. Get the next one sent straight to your inbox.</p></div></div></div><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 19 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How Instagram Saved 90% of Computing Power & Improved Video Quality]]></title><description><![CDATA[Ditching this key process gave Instagram users a better video experience]]></description><link>https://newsletter.betterstack.com/p/how-instagram-saved-90-of-computing</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-instagram-saved-90-of-computing</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Thu, 08 Aug 2024 13:02:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>With <strong>2.5 billion active users</strong>, <strong>Instagram</strong> is one of the most popular social media platforms in the world.</p><p>And <strong>video</strong> accounts for <strong>over 80%</strong> of its total traffic.</p><p>With those numbers, it's difficult to imagine how much <strong>computation time and resources</strong> it takes to upload, encode, and publish videos from all those users.</p><p>But Instagram managed to reduce that time by 94% and also <strong>improve its video quality</strong>.</p><p>Here's how.</p><p><em>Estimated reading time: 4 minutes 52 seconds</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.betterstack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">We do tech deep dives like this <strong>every 2 weeks</strong>. 
Get the next one sent straight to your inbox.</p></div></div></div><div><hr></div><h2>The Process from Upload to Publish</h2><p>Here are the typical steps that take place whenever a user uploads a video to Instagram:</p><ol><li><p><strong>Pre-processing:</strong> Enhance the video&#8217;s quality, such as color, sharpness, and frame rate.</p></li><li><p><strong>Compression/Encoding:</strong> Reduce the file size.</p></li><li><p><strong>Packaging:</strong> Split the video into smaller chunks for streaming.</p></li></ol><p>For this article, we will focus on the <strong>encoding</strong> and <strong>packaging</strong> steps.</p><div><hr></div><p><em><strong>Sidenote: Video Encoding</strong></em></p><p><em>If you were to record a 10-second 1080p video on your phone without any compression, it would be around <strong>1.7 GB</strong>.</em></p><p><em>That&#8217;s a lot! 
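</em></p><p>Where does a number like 1.7 GB come from? Here is a back-of-the-envelope calculation in Python. The frame rate (30 fps) and color depth (3 bytes per pixel, i.e. 8-bit RGB) are assumptions, since the article doesn't state them, so treat the result as a rough estimate.</p><pre><code>
```python
# Back-of-the-envelope size of 10 seconds of uncompressed 1080p video.
# Assumptions (not stated in the article): 30 frames per second and
# 24-bit RGB color, i.e. 3 bytes per pixel.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 3
FPS = 30
SECONDS = 10

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL  # 6,220,800 bytes per frame
raw_bytes = frame_bytes * FPS * SECONDS         # 1,866,240,000 bytes in total
raw_gib = raw_bytes / 2**30                     # about 1.74 GiB

compressed_bytes = 35 * 10**6                   # the article's ~35 MB figure
ratio = raw_bytes / compressed_bytes            # roughly 53x smaller

print(f"raw: {raw_gib:.2f} GiB, compression ratio: {ratio:.0f}x")
```
</code></pre><p>This prints <code>raw: 1.74 GiB, compression ratio: 53x</code>. Read as binary gibibytes, that matches the article's figure of around 1.7 GB, and shrinking it to about 35 MB means the encoded file is roughly 50 times smaller.</p><p><em>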
</em></p><p><em>To make it smaller, your phone uses a <strong>codec</strong>, which compresses the video for storage using <strong>efficient algorithms</strong>.</em></p><p><em>They are efficient enough to get the file size down to around <strong>35 MB</strong>, but the result is no longer raw pixels that a screen can display directly.</em></p><p><em>To watch the encoded video, a <strong>codec</strong> needs to decompress the file back into pixels that can be displayed on your screen.</em></p><p><em>The compression process is called <strong>encoding</strong>, and the decompression process is called <strong>decoding</strong>.</em></p><p><em><strong>Codecs</strong> have improved over time, so there are <a href="https://developer.mozilla.org/en-US/docs/Web/Media/Formats/Video_codecs#common_codecs">many of them out there</a>, and they&#8217;re built into most devices: cameras, phones, computers, etc.</em></p><div><hr></div><p>Instagram generated <strong>two types</strong> of encodings on upload: <strong>Advanced Encoding</strong> (AV1) and <strong>Simple Encoding</strong> (H.264).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VkaQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VkaQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 424w, https://substackcdn.com/image/fetch/$s_!VkaQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 848w, 
https://substackcdn.com/image/fetch/$s_!VkaQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 1272w, https://substackcdn.com/image/fetch/$s_!VkaQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VkaQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png" width="772" height="763" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:763,&quot;width&quot;:772,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:479556,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VkaQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 424w, https://substackcdn.com/image/fetch/$s_!VkaQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 848w, 
https://substackcdn.com/image/fetch/$s_!VkaQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 1272w, https://substackcdn.com/image/fetch/$s_!VkaQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ebfc72a-897e-43e5-a20e-4f8a31e51297_772x763.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Screenshot of video from the <a 
href="https://engineering.fb.com/2022/11/04/video-engineering/instagram-video-processing-encoding-reduction/">original article</a></figcaption></figure></div><p><strong>Advanced encoding</strong> produces videos that are small in size with <strong>great quality.</strong> But these videos only made up <strong>15% of Instagram&#8217;s total watch time</strong>.</p><p><strong>Simple encoding</strong> produces videos that work on <strong>older devices,</strong> but it uses a <strong>less efficient method of compression</strong>, so videos of a similar size come out at lower quality.</p><p>To make matters worse, simple encoding alone took up more than <strong>80% of Instagram's computing resources</strong>.</p><p></p><h2>Why Simple Encoding Is Such a Resource Hog</h2><p>For <strong>simple encoding</strong>, a video is actually encoded in <strong>two formats</strong>:</p><ul><li><p><strong>Adaptive bit rate (ABR)</strong>: video quality changes based on the user's <strong>connection speed.</strong></p></li><li><p><strong>Progressive</strong>: video quality <strong>stays the same</strong> no matter the connection. This format was for older versions of Instagram that <strong>don't support ABR.</strong></p></li></ul><p>Both ABR and progressive created multiple encodings of the same video at different <strong>resolutions and bit rates</strong>.</p><p>But with <strong>progressive</strong>, the video player picks just one of these encoded videos and plays it from start to finish. 
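</p>
<p>To make the two playback models concrete, here&#8217;s a minimal Python sketch of how a player might choose between the encoded videos. Everything in it is an assumption for illustration. The rendition ladder of resolutions and bit rates is made up, because the original article doesn&#8217;t list Instagram&#8217;s. The selection rule (pick the highest bit rate the measured bandwidth can sustain) is a simplified stand-in, not Instagram&#8217;s actual player logic. The progressive plan picks one file up front, while the ABR plan re-picks for every chunk.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One encoded copy of the same video (hypothetical ladder entry)."""
    height: int        # vertical resolution, e.g. 720 for 720p
    bitrate_kbps: int  # average bit rate of this encoding in kilobits per second


# Hypothetical H.264 ladder: Instagram's real values are not given in the original article.
LADDER = [
    Rendition(360, 600),
    Rendition(480, 1_200),
    Rendition(720, 2_500),
    Rendition(1080, 5_000),
]


def pick(ladder, bandwidth_kbps):
    """Return the highest-bit-rate rendition the bandwidth can sustain, else the lowest one."""
    fitting = [r for r in ladder if bandwidth_kbps >= r.bitrate_kbps]
    if fitting:
        return max(fitting, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)


def progressive_plan(ladder, bandwidth_per_chunk):
    """Progressive: choose one file from the first measurement and play it to the end."""
    choice = pick(ladder, bandwidth_per_chunk[0])
    return [choice.height for _ in bandwidth_per_chunk]


def abr_plan(ladder, bandwidth_per_chunk):
    """ABR: re-pick a rendition for every chunk as the measured bandwidth changes."""
    return [pick(ladder, bw).height for bw in bandwidth_per_chunk]


if __name__ == "__main__":
    samples = [3_000, 3_000, 900, 900, 6_000]  # made-up bandwidth per chunk (kbps)
    print("progressive:", progressive_plan(LADDER, samples))
    print("abr:", abr_plan(LADDER, samples))
```

<p>With those made-up bandwidth samples, the progressive plan stays at 720p for every chunk even when the connection drops. The ABR plan steps down to 360p and later back up to 1080p.</p><p>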
</p><p>With <strong>ABR</strong>, those videos are split into <strong>small 2-10 second chunks</strong>, and the video player decides which version of each chunk to play based on the user&#8217;s internet speed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!scne!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!scne!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 424w, https://substackcdn.com/image/fetch/$s_!scne!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 848w, https://substackcdn.com/image/fetch/$s_!scne!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 1272w, https://substackcdn.com/image/fetch/$s_!scne!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!scne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png" width="1456" height="910" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:910,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1278243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!scne!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 424w, https://substackcdn.com/image/fetch/$s_!scne!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 848w, https://substackcdn.com/image/fetch/$s_!scne!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 1272w, https://substackcdn.com/image/fetch/$s_!scne!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00f936a0-a932-4662-b637-e95c395e3951_5940x3713.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">It&#8217;s unknown exactly how many videos were produced, so 8 is a rough guess</figcaption></figure></div><p></p><div><hr></div><p><em><strong>Sidenote: Bit rate</strong></em></p><p><em>When a video is encoded, each frame is stored as binary data (1s and 0s). The more data the frames need, the higher the video&#8217;s <strong>bit rate</strong>.</em></p><p><em>If I recorded a video of a still pond, the <strong>compression algorithm</strong> would notice that most pixels stay blue from frame to frame and store them using <strong>less data</strong>, since those pixels don&#8217;t change.</em></p><p><em>If I recorded a <strong>fast-flowing waterfall</strong> and the compression algorithm kept the pixels the same, the video would look odd.</em></p><p><em>Since the pixels change a lot between frames, the algorithm needs to <strong>store more information</strong> in each frame.</em></p><p><em><strong>Bit rate</strong> is measured in <strong>megabits per second (Mbps)</strong>, because it describes how much data is sent to the video player every second.</em></p><p><em>On YouTube, the recommended bit rate for a 1080p upload is around <strong>8Mbps</strong>, which works out to <strong>1MB</strong> (megabyte) of transmitted data every second.</em></p><div><hr></div><p>If you guessed that <strong>adaptive bit rate</strong> was the process taking up the most resources, you&#8217;d be right.</p><p>This isn&#8217;t only because it creates multiple video files, but also because its additional <strong>packaging step</strong> uses <strong>complex algorithms</strong> to figure out how to <strong>seamlessly switch between different video qualities</strong>.</p><p></p><h2>The Clever Fix</h2><p>Usually, progressive encoding creates just one video file. But Instagram was <strong>creating multiple progressive files</strong>, using the same codec as ABR (H.264).</p><p>So the team realized they could use the <strong>same files for both progressive and ABR</strong>, eliminating the need to create two sets of the same videos.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SQ1D!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SQ1D!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 424w, https://substackcdn.com/image/fetch/$s_!SQ1D!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 848w, 
https://substackcdn.com/image/fetch/$s_!SQ1D!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 1272w, https://substackcdn.com/image/fetch/$s_!SQ1D!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SQ1D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png" width="1456" height="1093" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1093,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:899627,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SQ1D!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 424w, https://substackcdn.com/image/fetch/$s_!SQ1D!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 848w, 
https://substackcdn.com/image/fetch/$s_!SQ1D!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 1272w, https://substackcdn.com/image/fetch/$s_!SQ1D!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F066333d2-b111-41c8-a9d2-4140a729047e_4911x3687.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>If you compare the image above to the previous one, you&#8217;ll see that <strong>4 videos</strong> are now created during the encoding stage <strong>instead of 8</strong>.</p><p>The team reused <strong>the same progressive files</strong> in the packaging stage of ABR. This was less efficient than before and resulted in <strong>poorer compression</strong>.</p><p>But they did save <strong>a lot of resources</strong>.</p><p>Instagram reports that the old ABR process took <strong>86 seconds</strong> for a <strong>23-second video</strong>.</p><p>But the <strong>new</strong> ABR process, which is just packaging, took <strong>0.36 seconds</strong>: a whopping <strong>99% reduction</strong> in processing time.</p><p>With this reduction, Instagram could dedicate more resources to the <strong>advanced encoding process</strong>, which meant more users could see <strong>higher-quality videos</strong>. How?</p><p>Because simple encoding <strong>took longer</strong> and used <strong>more resources</strong> in the old process, there wasn&#8217;t always enough capacity left to create advanced encodings.</p><p>With the new process, there was enough capacity to run <strong>both types of encoding</strong>, so both could be published and more users would see <strong>higher-quality videos</strong>.</p><p>As a result, the share of watch time on advanced-encoded videos rose <strong>from 15% to 48%</strong>.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EbsP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EbsP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 424w, 
https://substackcdn.com/image/fetch/$s_!EbsP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 848w, https://substackcdn.com/image/fetch/$s_!EbsP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 1272w, https://substackcdn.com/image/fetch/$s_!EbsP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EbsP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp" width="1157" height="540" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:540,&quot;width&quot;:1157,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26394,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EbsP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 424w, 
https://substackcdn.com/image/fetch/$s_!EbsP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 848w, https://substackcdn.com/image/fetch/$s_!EbsP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 1272w, https://substackcdn.com/image/fetch/$s_!EbsP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F169e8881-8c6c-4914-8333-79d2350a8717_1157x540.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Image from <a href="https://engineering.fb.com/2022/11/04/video-engineering/instagram-video-processing-encoding-reduction/">original article</a></figcaption></figure></div><div><hr></div><p><em><strong>Sidenote: Encoding vs Transcoding</strong></em></p><p><em>This is an optional side note for the video experts among you.</em></p><p><em>The word <strong>transcoding</strong> isn't used in this article, but technically it should have been.</em></p><p><em><strong>Encoding</strong> is the process of compressing an uncompressed video into a smaller format.</em></p><p><em><strong>Transcoding</strong> is the process of converting an already-encoded video into a new encoding, in either the same format or a different one.</em></p><p><em>Because recording devices (phones, cameras) have a <strong>codec</strong>, a video is automatically encoded as it&#8217;s recorded.</em></p><p><em>So even before you upload a video to <strong>Instagram</strong>, it&#8217;s already encoded, and any further encoding is technically <strong>transcoding</strong>.</em></p><p><em>But because the <a href="https://engineering.fb.com/2022/11/04/video-engineering/instagram-video-processing-encoding-reduction/">original article</a> mostly uses the term <strong>encoding</strong>, and it&#8217;s such a catch-all term in the industry, I decided to stick with it.</em></p><div><hr></div><h2>Wrapping Things Up</h2><p>After reading this, you may be thinking: <strong>how did the team not spot this obvious improvement?</strong></p><p>Well, small issues on a small scale are often <strong>overlooked</strong>. 
Small issues on a large scale <strong>no longer remain small issues</strong>, and I guess that's what happened here.</p><p>Besides, Instagram started out as a <strong>photo app</strong> and is now focusing more on video, so I assume it's a learning process for them too.</p><p>If you want to read <strong>more about their learnings,</strong> check out the <a href="https://engineering.fb.com/tag/instagram/">Meta Engineering Blog</a>.</p><p>But if you enjoyed this <strong>simplified version</strong>, be sure to <strong>subscribe</strong>.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. 
Making this one took 16 hours.</em></p>]]></content:encoded></item><item><title><![CDATA[How OpenAI Scaled Kubernetes to 7,500 Nodes by Removing One Plugin]]></title><description><![CDATA[The one change that improved OpenAI's network.]]></description><link>https://newsletter.betterstack.com/p/how-openai-scaled-kubernetes-to-7500</link><guid isPermaLink="false">https://newsletter.betterstack.com/p/how-openai-scaled-kubernetes-to-7500</guid><dc:creator><![CDATA[Richard Oliver Bray]]></dc:creator><pubDate>Wed, 31 Jul 2024 13:54:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tBw6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>One of the biggest names in AI tripled the size of its infrastructure, going from <strong>2,500 nodes to 7,500</strong> in just a few years.</p><p>To put that into perspective, an average enterprise company will do fine with around <strong>50-100</strong> nodes.</p><p>This <strong>enormous increase</strong> in computing power was made possible by <a href="https://openai.com/index/scaling-kubernetes-to-7500-nodes/">lots of changes</a>.</p><p>But the most important one was removing a plugin called <a href="https://github.com/flannel-io/flannel">Flannel</a>.</p><p>Why? Let&#8217;s dive into it.</p><p><em>Estimated reading time: 3 minutes 59 seconds</em></p><div><hr></div>
<h2>What is Flannel?</h2><p>Imagine we have a Kubernetes cluster that contains two <strong>nodes</strong> (virtual computers): one for a <strong>Node.js web server</strong> and another for a <strong>Postgres database</strong>.</p><p>Inside each of these nodes, we&#8217;ll have two pods (containers): one for the web server or database, and one for a log collector like <a href="https://vector.dev/">Vector</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tBw6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tBw6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 424w, https://substackcdn.com/image/fetch/$s_!tBw6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 848w, https://substackcdn.com/image/fetch/$s_!tBw6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tBw6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tBw6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png" width="650" height="454.01785714285717" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1017,&quot;width&quot;:1456,&quot;resizeWidth&quot;:650,&quot;bytes&quot;:223569,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tBw6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 424w, https://substackcdn.com/image/fetch/$s_!tBw6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 848w, https://substackcdn.com/image/fetch/$s_!tBw6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 1272w, 
https://substackcdn.com/image/fetch/$s_!tBw6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd0e7206-f377-41f9-84f1-8ca86197c981_1708x1193.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><p><em><strong>Sidenote: Node vs Pod</strong></em></p><p><em>Pods usually contain a <strong>single application</strong> (or multiple if needed), and a node provides the resources the pod needs to run: CPU, memory, etc.</em></p><p><em>Nodes get assigned an IP address by the cloud provider. Pods also get IPs by default through a basic network plugin called <strong>kubenet</strong>, but these are used to communicate <strong>only within the same node</strong>.</em></p><div><hr></div><p>If you want pods on different nodes to communicate, HTTP is a good approach.</p><p>But before you can even do that, you&#8217;ll need to give each pod a <strong>unique IP address</strong> so other pods know where to send data.</p><p>It&#8217;s a bit like giving someone a <strong>unique postal address</strong> so you can send letters to them.</p><p>Out of the box, pod-to-pod communication between nodes in Kubernetes is <strong>not straightforward</strong>, and this is where Flannel comes in.</p><p>Flannel exists to make pod-to-pod communication really easy. It <strong>configures the network</strong> for each pod and does things like <strong>assigning each pod an IP address</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Idh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Idh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 424w, https://substackcdn.com/image/fetch/$s_!1Idh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 848w, https://substackcdn.com/image/fetch/$s_!1Idh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1Idh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Idh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png" width="1456" height="1224" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1224,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:302845,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Idh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 424w, https://substackcdn.com/image/fetch/$s_!1Idh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 848w, https://substackcdn.com/image/fetch/$s_!1Idh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 1272w, 
https://substackcdn.com/image/fetch/$s_!1Idh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfca89a6-9e49-4b3a-8560-2c6ee511bb76_1736x1459.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>In the world of Kubernetes, Flannel is known as a <strong>CNI plugin</strong> (Container Network Interface plugin). It's not the only one; others include <a href="https://docs.tigera.io/">Calico</a> and <a href="https://cilium.io/">Cilium</a>.</p><p>But Flannel is the <a href="https://www.devopsschool.com/blog/list-of-cni-plugins-used-in-kubernetes/">easiest to set up</a>.</p><h2>Why Was It Removed?</h2><p>Flannel is a great solution, but it wasn't designed for <strong>thousands of nodes</strong>.</p><p>OpenAI was already <a href="https://openai.com/index/scaling-kubernetes-to-2500-nodes/">seeing slow network speeds with Flannel at 2,500 nodes</a>, so 7,500 nodes would definitely have been a struggle.</p><p>To understand what made it slow, let's go back to our previous example and <strong>double the number of nodes</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AoQT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AoQT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 424w, https://substackcdn.com/image/fetch/$s_!AoQT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 848w, https://substackcdn.com/image/fetch/$s_!AoQT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AoQT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AoQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png" width="576" height="537.3" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1194,&quot;width&quot;:1280,&quot;resizeWidth&quot;:576,&quot;bytes&quot;:135535,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AoQT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 424w, https://substackcdn.com/image/fetch/$s_!AoQT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 848w, https://substackcdn.com/image/fetch/$s_!AoQT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AoQT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe37240f6-6394-47e0-bb83-944c51ebe60e_1280x1194.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>For communication between nodes, Flannel performs several tasks:</p><ul><li><p>Gives <strong>each node its own slice of the cluster's IP range</strong>, which the node then uses to hand out an IP address to each of its pods (subnet allocation).</p></li><li><p>Creates <strong>route tables</strong> to keep track of which node owns which range of pod IP addresses.</p></li><li><p>With this information, if data needs to be sent to a specific pod, Flannel knows the <strong>exact IP address of the node the pod runs on</strong> (traffic routing).</p></li><li><p>It also <strong>wraps traveling data</strong> (packets) in an extra set of headers addressed to the destination node, so packets meant for a pod can cross a network that only knows about node addresses (packet encapsulation).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oisb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oisb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 424w, https://substackcdn.com/image/fetch/$s_!oisb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 848w, https://substackcdn.com/image/fetch/$s_!oisb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!oisb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oisb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png" width="1456" height="1137" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1137,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:165393,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oisb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 424w, https://substackcdn.com/image/fetch/$s_!oisb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 848w, https://substackcdn.com/image/fetch/$s_!oisb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!oisb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb28eab21-74f7-4457-9102-b20846942ee0_1529x1194.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Flannel's aim is to make communication between pods on different nodes possible, but with all these processes running across thousands of nodes, it actually <strong>slows down the network</strong>. 
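</p><p>To make the list of Flannel's tasks more concrete, here is a toy Python model of that bookkeeping. It is a sketch under stated assumptions, not Flannel's actual code: the <code>10.0.0.0/8</code> cluster range, the one-<code>/24</code>-per-node split and the function names are all hypothetical. In this model every node keeps a route to every other node's pod subnet, so the total number of entries grows roughly with the square of the node count, and every encapsulated packet carries 50 extra bytes of VXLAN headers.</p>

```python
import ipaddress

# Hypothetical values for illustration only: one cluster-wide pod range,
# carved into a /24 per node (loosely mirroring how Flannel leases each
# node its own subnet). Not OpenAI's or Flannel's real configuration.
CLUSTER_CIDR = ipaddress.ip_network("10.0.0.0/8")
SUBNET_PREFIX = 24

# Bytes VXLAN adds around each IPv4 packet:
# outer Ethernet (14) + outer IPv4 (20) + UDP (8) + VXLAN header (8).
VXLAN_OVERHEAD_BYTES = 14 + 20 + 8 + 8


def allocate_subnets(node_count):
    """Subnet allocation: give each node its own slice of the cluster range."""
    slices = CLUSTER_CIDR.subnets(new_prefix=SUBNET_PREFIX)
    return {f"node-{i}": next(slices) for i in range(node_count)}


def routes_for(local_node, subnets):
    """Route table kept by one node: each *other* node's pod subnet -> node."""
    return {net: node for node, net in subnets.items() if node != local_node}


def next_hop(routes, pod_ip):
    """Traffic routing: return the node whose subnet contains pod_ip."""
    ip = ipaddress.ip_address(pod_ip)
    for net, node in routes.items():
        if ip in net:
            return node
    return None  # local pod (or unknown address): no tunnel needed


def total_route_entries(node_count):
    """Every node tracks every other node, so the total grows as N * (N - 1)."""
    return node_count * (node_count - 1)


if __name__ == "__main__":
    subnets = allocate_subnets(4)
    routes = routes_for("node-0", subnets)
    print(subnets["node-2"])              # 10.0.2.0/24
    print(next_hop(routes, "10.0.2.17"))  # node-2
    for n in (2_500, 7_500):
        print(n, total_route_entries(n))  # 2500 6247500, then 7500 56242500
    print(1500 - VXLAN_OVERHEAD_BYTES)    # 1450
```

<p>Run as a script, it finds that pod <code>10.0.2.17</code> lives on <code>node-2</code>, counts about 6.2 million route entries for 2,500 nodes versus about 56.2 million for 7,500, and shows that only 1,450 bytes are left for the pod's own packet on a network with a standard 1,500-byte MTU. Real clusters differ in the details, but in this model tripling the node count multiplies the routing state by roughly nine.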
</p><p>As shown in the image below, <strong>with 150,000</strong> requests per second (RPS), <strong>Flannel&#8217;s VXLAN</strong> tends to have the <a href="https://machinezone.github.io/research/networking-solutions-for-kubernetes/#id3">worst performance</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UX_y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UX_y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 424w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 848w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 1272w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UX_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png" width="1456" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:405128,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UX_y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 424w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 848w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 1272w, https://substackcdn.com/image/fetch/$s_!UX_y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c0c0548-700b-41ed-b523-412da6e50c6e_2880x960.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Of course, the team at OpenAI needed to do something about this.</p><div><hr></div><p><em><strong>Sidenote: VXLAN</strong></em></p><p><em>VXLAN (Virtual eXtensible Local Area Network) is used to create a <strong>virtual network</strong> on top of an existing one, so machines can communicate as if they were plugged into the same local network.</em></p><p><em>Long before the internet, if you had two physical computers, you would have to <strong>connect a cable between</strong> them so they could communicate.</em></p><p><em>A VXLAN simulates that cable over the <strong>internet</strong> or another larger network.</em></p><p><em>If two computers share a network and both have <strong>VXLAN software installed</strong>, they each get an address on the virtual network, and their packets are encapsulated (wrapped in an extra set of headers tagged with a network ID) so they are only delivered to machines on the same virtual network.</em></p><div><hr></div><h2>What Was Used Instead?</h2><p>If you keep up with <strong>OpenAI news</strong>, or just AI news in general, you'll know that Microsoft has put a <a 
href="https://www.cnbc.com/2023/04/08/microsofts-complex-bet-on-openai-brings-potential-and-uncertainty.html">lot of money behind them</a>.</p><p>You may also know that Microsoft owns Azure, which is a <strong>huge cloud provider</strong>.</p><p>So it makes sense that OpenAI hosts <strong>its infrastructure on Azure</strong>. I mean, it would be weird if they hosted it on Google Cloud or AWS, right?</p><p>Since OpenAI was using the <strong>Azure Kubernetes Service (AKS)</strong> for their infrastructure, <a href="https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni?tabs=configure-networking-portal">Azure's CNI</a> was the obvious choice, as the two are built to work together.</p><p>Azure's CNI isn't just <strong>a fork of Flannel with Microsoft branding</strong>; it's different in many ways:</p><ul><li><p>Nodes and pods are given <strong>IP addresses from the same range</strong> (subnet allocation). For example, if a node has IP 10.0.0.1, a pod on it might get 10.0.0.2, so pods can talk to each other directly without an extra overlay layer between nodes.</p></li><li><p>It doesn't use an overlay network (VXLAN); it uses direct routing. This means <strong>no virtual routes</strong> need to be created, and there's <strong>no encapsulation</strong> of packets.</p></li><li><p>It is not a general-purpose CNI; it is built specifically for Azure Kubernetes Service, so it has <strong>native integration and better performance</strong> with an AKS cluster.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_ryb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_ryb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 424w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 848w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_ryb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png" width="1456" height="1115" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1115,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:164689,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_ryb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 424w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 848w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 1272w, https://substackcdn.com/image/fetch/$s_!_ryb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F78180de7-68ee-4aec-8629-4f1f38f936f6_1559x1194.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The red pods here belong to the Azure CNI service, not Flannel</figcaption></figure></div><p>All of these changes made Azure CNI <strong>much faster</strong> than Flannel, but it isn't a drop-in replacement for it.</p><p>The team had to use a different way to keep track of packets and rely on a different <strong>routing engine</strong>.</p><p>However, the benefits <strong>outweighed the drawbacks</strong>.</p><p>OpenAI even claimed that pod-to-pod communication between nodes was as fast as direct communication between <strong>pods on the same node</strong>.</p><h2>And That&#8217;s a Wrap</h2><p>It's amazing to think that this whole article was just <strong>one paragraph</strong> of the <a href="https://openai.com/index/scaling-kubernetes-to-7500-nodes/">original post</a>.</p><p>There's a lot of good information in there, and I recommend you check it out.</p><p>But if you enjoyed this <strong>simplified format</strong>, go ahead and <strong>subscribe</strong> 
to get more like this.</p><div><hr></div><p><em>PS: Enjoyed this newsletter? Please forward it to a pal or follow us on socials (<a href="https://www.linkedin.com/company/betterstack/">LinkedIn</a>, <a href="https://x.com/BetterStackHQ">Twitter</a>, <a href="https://www.youtube.com/@betterstack">YouTube</a>, <a href="https://www.instagram.com/betterstackhq/">Instagram</a>). It only takes 10 seconds. Making this one took 20 hours.</em></p>]]></content:encoded></item></channel></rss>