Managing distributed data is like managing a team

Is there any difference?
  1. There's a central coordinator (the manager) who needs a lot of memory and whose only job is coordinating.
  2. Adding more nodes (employees) doesn't automatically make things faster, especially when work is poorly distributed.
  3. It doesn't make sense for everyone to know everything (data partitioning).
  4. Transferring context is inefficient, so it helps to colocate relevant knowledge (domain expertise).
  5. When a single node (employee) goes MIA without telling anyone, it’s worse than lost work because it can block others.
  6. When it becomes prohibitively expensive to acquire larger nodes (10x employees), you must expand horizontally.
  7. If one person gets all the work (skew) they will be overworked and sad. Sometimes you won’t realize a task or domain is disproportionately hard until it’s too late.
  8. You need some knowledge overlap (replication) to avoid single points of failure when people leave or go on PTO.
  9. Rebalancing is expensive and creates downtime (meetings, meetings, more meetings).
  10. When the network (Slack) goes down, everyone is fucked.
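Items 3, 7 and 9 are the most concrete. Here's a minimal Python sketch, assuming simple hash partitioning (`crc32(key) % num_nodes`) and a made-up workload with one hot key. It shows two problems: the hot key piles onto a single node, and adding a fifth node reshuffles most keys. The function names and the workload are hypothetical, for illustration only.

```python
import zlib
from collections import Counter


def partition(key: str, num_nodes: int) -> int:
    # Hypothetical partitioner: deterministic crc32 hash mod node count
    # (Python's built-in hash() is salted per process for strings).
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def load_per_node(keys, num_nodes):
    # Count how many requests land on each node.
    counts = Counter(partition(k, num_nodes) for k in keys)
    return [counts.get(n, 0) for n in range(num_nodes)]


def moved_fraction(keys, old_n, new_n):
    # Fraction of distinct keys whose owner changes when the cluster resizes.
    distinct = set(keys)
    moved = sum(1 for k in distinct if partition(k, old_n) != partition(k, new_n))
    return moved / len(distinct)


# Hypothetical workload: 1,000 keys requested once each,
# plus one "hot" key requested 5,000 times.
keys = [f"user-{i}" for i in range(1000)] + ["user-hot"] * 5000

loads = load_per_node(keys, 4)
print(loads)  # one node absorbs all 5,000 hot-key requests (skew)
print(max(loads) / (sum(loads) / len(loads)))  # busiest node vs. the average

# Rebalancing from 4 to 5 nodes with mod-N hashing moves most keys.
print(moved_fraction(keys, 4, 5))
```

Consistent hashing is the usual way to shrink that reshuffle when nodes join or leave. It doesn't fix a single hot key, though; that one has to be split or replicated.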
