Databricks Looks To Tame Data Lakes With Acquisition, New Product Launch

Red-hot big data company Databricks debuts Delta Engine for deriving insights from “lakehouse” data stores, buys data visualization tech developer Redash.

Big data and AI tech company Databricks made two significant moves Wednesday to expand the capabilities of its Unified Data Analytics Platform, acquiring dashboarding and data visualization service Redash and unveiling a new query engine for cloud-based data lakes.

Databricks executives said the Redash acquisition and the availability of Delta Engine were both in response to the emerging “lakehouse” trend where businesses and organizations try to bring structured transactions and data quality management – traditionally attributes of databases and data warehouse systems – to data lakes.

The moves are intended to help tap into the potential of data lakes, traditionally huge stores of unorganized, often unstructured data that can be a challenge to manage and derive insights from.

“AI, machine learning, data analytics, cloud – Databricks is in the middle of all these,” said CEO Ali Ghodsi in an interview with CRN. “Most organizations that are trying to do data science and data warehousing are using multiple architectures with data stuck in organizational silos.”

Databricks, founded in 2013 by the original developers of the popular Spark big data processing engine, has been one of the hottest IT startups in recent years. Last October the San Francisco-based company raised a stunning $400 million in Series F funding, putting the company’s valuation at $6.2 billion. Last week CNBC reported that Databricks reached $200 million in annualized recurring revenue last year and may be set to go public in 2021.

The company’s new Delta Engine is expected to help customers with the challenge of getting value out of data lakes – Ghodsi jokingly called them “data swamps” – in contrast to data warehouse systems where data is organized and formatted for specific data science and analytical tasks.

“Lakehouse” systems, somewhere between data warehouses and data lakes, are stores of unstructured data that have been organized and curated for data analytics and data science tasks.

A key component of lakehouse systems is Delta Lake, developed by Databricks in 2017 and donated to the Linux Foundation as an open-source project. Delta Lake is a data storage technology layer that runs on top of data lake systems and brings a measure of reliability to the data, making it possible to run queries and transactions on the data.

Databricks’ new Delta Engine, a high-performance query engine, works with Delta Lake to enable fast query execution for data analytics and data science without having to move data out of a data lake. Databricks said it developed Delta Engine from the ground up to take advantage of cloud hardware systems for accelerated query performance, making it easier for customers to move to a unified data analytics platform.

Databricks said Delta Engine builds on the success of the Delta Lake project by expanding control beyond storing and managing data to how data is used and consumed.

An ecosystem of companies, including developers of data ingestion technologies, has grown up around Delta Lake, according to Pankaj Dugar, Databricks vice president of business development, ISVs and tech partners. Dugar, in an interview with CRN, said the new Delta Engine product will expand Databricks’ work with those companies.

Databricks also announced that it had acquired Redash, developer of the popular Redash open-source project, for an undisclosed sum. Redash is used by analysts and data scientists to gather data from a wide variety of sources – including operational databases and data lakes – and use it to develop charts, dashboards and other visual representations of the data.

Ghodsi said adding Redash to the company’s Unified Data Analytics Platform will make it easier for users to access and analyze curated data within data lakes – including in conjunction with Delta Engine. The open-source Redash can be used with Databricks today via a free connector, and Databricks plans to fully integrate Redash with the Databricks Unified Data Analytics Platform and Databricks workspace in the coming months.

Dugar said both Delta Engine and Redash create opportunities for solution providers to develop data lake practices and services. “These are net-additive to our overall partner strategy and expand the ways we can go to market with all types of partners. They can become the trusted advisors with their services and help architect the data lakehouses of the future.”