[Article] On Demand Hadoop Processing by Google (a.k.a. “one way of dealing with lots of people writing shitty queries”)

Blog Post: https://cloud.google.com/blog/big-data/2017/06/fastest-track-to-apache-hadoop-and-spark-success-using-job-scoped-clusters-on-cloud-native-architecture

HN: https://news.ycombinator.com/item?id=14499170

This is a pretty interesting trend: as more and more processing gets compute-bottlenecked by shitty queries, an on-demand service like Dataproc becomes really appealing. The trends outlined in the post involve:

  • reducing complexity
  • resource isolation (so one query can’t kill everyone)
  • better auditing & monitoring (so you know who to yell at)
  • and more flexibility (so a select few can play around with few consequences)

TLDR: Let's trade off some performance across the board to better handle “lots of people writing lots of shitty queries”.

Also, a Ryan Noon sighting in the HN thread. Always asking the tough questions.
