Databricks

Databricks is a managed platform for deploying and running Apache Spark. Most important for our purposes is its ability to autoscale the computing cluster that Spark runs on: it adds workers as demand increases and removes them when demand falls. Compared with big-data deployments that run on fixed-size clusters, which must be sized for peak load at all times, autoscaling uses resources far more efficiently and results in budget-friendly data pipelines.
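
To make the autoscaling behavior concrete, here is a minimal sketch of a cluster specification that gives Databricks a range of workers to scale between. The field names (cluster_name, spark_version, node_type_id, autoscale, min_workers, max_workers) follow the Databricks Clusters API, but the helper function, the cluster name, the worker counts, and the placeholder runtime and node-type values are assumptions made for illustration, not settings taken from this project.

```python
import json


def autoscaling_cluster_spec(name, min_workers, max_workers,
                             spark_version, node_type_id):
    """Build a cluster-creation payload with an autoscale range.

    Hypothetical helper: it only assembles the dictionary and does not
    call Databricks.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Databricks adds or removes workers within this range as load changes.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
    }


# All values below are illustrative placeholders.
spec = autoscaling_cluster_spec(
    name="pipeline-cluster",
    min_workers=2,
    max_workers=8,
    spark_version="<runtime-version>",  # fill in a runtime available in your workspace
    node_type_id="<node-type>",         # fill in a node type offered by your cloud
)
print(json.dumps(spec, indent=2))
```

The lower bound is what the cluster costs while idle and the upper bound caps spending under peak load, so the two numbers are the main levers for trading cost against throughput.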

The Jupyter-style notebooks offered by Databricks make collaboration easy among programmers, data analysts, and data scientists, each of whom can work with the collected data in their preferred language: Python, R, Scala, or SQL. Java is not available as a notebook language, but Java code can run on the same clusters when packaged as a JAR.