Common Numbers You Should Know
29 Jan 2022
This post is inspired by Jeff Dean’s list of “Numbers Everyone Should Know”. I wanted to 1) update Jeff’s numbers to reflect the latest generation of technologies, and 2) add relevant numbers that are not covered in the original list.
Op | Number | Notes |
---|---|---|
L1 cache | 0.5-1ns, 1TB/s, 80KB (Silver 4314) | L1 access takes ~4 CPU cycles. For a 4GHz CPU, this is ~1ns [1]. The capacity includes both instruction and data; the i-cache and d-cache sizes seem to be similar. |
L2 | 5ns, 1TB/s, 1.3MB per physical processor (Silver 4314) | ~10 cycles. Note that L2 may be shared by two cores [1]. |
L3 | 50ns, 500GB/s, 24MB (Silver 4314), up to 48MB (Gold 6338N) | Silver 4314 seems to have one L3 per socket (source: output of `lscpu --caches`). |
Main memory (DDR3) | 100ns, 10GB/s, ~1GHz | [2] Not sure how up to date the 100ns latency is. |
Main memory (DDR4) | ?ns, 30GB/s, ~3GHz | [3] Latency not yet verified. |
Main memory (HBM) | ?ns, 200GB/s | Need to verify. |
SSD | 20us, 10GB/s | [4], though sources such as [6] suggest SSD bandwidth can be lower than 1GB/s. |
NVM | 300ns, 5GB/s | [5] |
Hard disk drive | 10ms, 100MB/s | [6] |
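To get a feel for these numbers, here is a back-of-envelope sketch that models a read as latency + size / bandwidth (ignoring queuing, prefetching, and caching effects). The figures are the ballpark values from the table above, not measurements:

```python
# Estimate the time to read a 1 MiB buffer from each storage tier,
# modeled as latency + size / bandwidth. Ballpark values, not measurements.

TIERS = {
    # name: (latency in seconds, bandwidth in bytes/second)
    "L1":   (1e-9,   1e12),
    "L2":   (5e-9,   1e12),
    "L3":   (50e-9,  500e9),
    "DDR4": (100e-9, 30e9),   # assuming ~100ns latency, as with DDR3
    "SSD":  (20e-6,  10e9),
    "HDD":  (10e-3,  100e6),
}

SIZE = 1 << 20  # 1 MiB

for name, (latency, bandwidth) in TIERS.items():
    total = latency + SIZE / bandwidth
    print(f"{name:>4}: {total * 1e6:10.1f} us")
```

For a 1 MiB read the transfer time dominates everywhere except on HDD, where the ~10ms seek dwarfs everything else; for small reads, latency dominates on every tier.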
Network
Op | Number | Notes |
---|---|---|
2kB over 1Gbps network | 16us | Theoretically this should take 16us: 1Gbps means 1 bit per ns, and 2kB is 16,000 bits. I am seeing ~400us between my 1Gbps workstation and server (~20 meters apart). Why? Likely because per-packet latency (the software stack and switching) dominates the transmission time. |
Round trip time within same data center | 500us | What does this mean? Time between two servers in the same datacenter? Why is it 500us? It cannot be distance alone: light travels approx. 300m/us, so propagation within a building contributes almost nothing; most of the time is presumably spent in switches and the software stack. |
Round trip time from California to Europe | 150ms | The distance between California and Europe is about 8000km [9]. If packets traveled at the speed of light, a round trip would take ~50ms. The observed 150ms is plausible because light in fiber travels at roughly 2/3 of c, and routing hops add further delay (see the sketch below). |
- Some of the numbers in [8] seem to be based on prediction models from past papers. We should try benchmarking some of these numbers ourselves.
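A quick sanity check of the two calculations above, assuming the standard approximation that light in fiber travels at roughly 2/3 of c (~200 km/ms):

```python
# Transmission time vs propagation delay for the network numbers above.

def transmission_us(size_bytes: int, link_bps: float) -> float:
    """Time to push the bits onto the wire, in microseconds."""
    return size_bytes * 8 / link_bps * 1e6

# 2 kB over 1 Gbps: 16,000 bits at 1 bit/ns = 16 us.
print(transmission_us(2_000, 1e9))  # -> 16.0

# California <-> Europe, ~8,000 km each way.
# Light in fiber covers roughly 200 km per ms (about 2/3 of c).
one_way_ms = 8_000 / 200
print(2 * one_way_ms)  # -> 80.0 ms round trip, before routing overhead
```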
Availability Levels [7]
Number of seconds in a day = 86400; Number of hours in a year = 8760
Nines | Downtime per year | per day |
---|---|---|
90% | 37 days | 2.4hr |
95% | 18 days | 1.2hr |
99% | 3.65 days | 14min |
99.9% | 9 hr | 1.4min |
99.99% | 53 min | 8.6s |
99.999% | 5 min | 0.9s |
- The difference between 4 nines and 5 nines is 10x, since allowed downtime goes from 0.01% to 0.001%.
- There are 24*60*60 = 86400 seconds in a day; 0.001% of that is 0.86s, or approx. 0.9s (reproduced in the sketch below).
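The whole table can be reproduced from the definition: allowed downtime = (1 - availability) x interval. A minimal sketch:

```python
# Reproduce the availability table: downtime allowed per year and per day.

SECONDS_PER_DAY = 24 * 60 * 60          # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

for availability in (0.90, 0.95, 0.99, 0.999, 0.9999, 0.99999):
    down = 1 - availability
    per_year = down * SECONDS_PER_YEAR  # seconds of downtime per year
    per_day = down * SECONDS_PER_DAY    # seconds of downtime per day
    print(f"{availability * 100:g}%: "
          f"{per_year / 3600:10.2f} hr/year, {per_day:10.2f} s/day")
```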
What is required in practice to get 5 nines? If I needed to design such a system, what optimizations would I need to make?
Binary Numbers
Power of 2 | Approx. power of 10 | Prefix |
---|---|---|
2^10 | 10^3 | 1K (kilo) |
2^20 | 10^6 | 1M (mega) |
2^30 | 10^9 | 1G (giga) |
2^40 | 10^12 | 1T (tera) |
2^50 | 10^15 | 1P (peta) |
2^60 | 10^18 | 1E (exa) |
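Note that these are approximations: 2^10 is 1024, not 1000, and the gap compounds by ~2.4% per row. A quick check:

```python
# How far does the 2^(10k) ~= 10^(3k) approximation drift?

for k, prefix in enumerate("kMGTPE", start=1):
    exact = 2 ** (10 * k)
    approx = 10 ** (3 * k)
    error_pct = (exact - approx) / approx * 100
    print(f"2^{10 * k} = {exact}  ~ 1{prefix}  ({error_pct:.1f}% larger)")
```

By 2^60 the binary value is already ~15% larger than 10^18.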
Misc
How can Raft afford to use a single leader to serve all client requests? Does this not create a performance bottleneck?
Sources
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/memory-performance-in-a-nutshell.html
[2] https://www.kingston.com/en/memory/ddr3-1600
[3] https://www.crucial.com/support/memory-speeds-compatability
[4] https://en.wikipedia.org/wiki/Solid-state_drive
[5] https://arxiv.org/pdf/1903.05714.pdf
[6] https://tekie.com/blog/hardware/ssd-vs-hdd-speed-lifespan-and-reliability/
[7] https://sre.google/sre-book/availability-table/
[8] https://colin-scott.github.io/personal_website/research/interactive_latency.html
[9] http://distancebetween2.com/california/europe