Databricks Offers Data Unity Option With New Delta Lake Release
Generative AI is playing an outsized role in a slew of new technology unveilings, including LakehouseIQ and Lakehouse AI, at the Databricks Data + AI Summit this week.
Databricks co-founder and CEO Ali Ghodsi
Databricks unveiled a new edition of its Delta Lake data storage format Wednesday that the data lakehouse platform developer says eliminates data silos through its new Universal Format capability.
The Delta Lake 3.0 launch is the latest development among competing open-source standards – including the Apache Software Foundation’s Iceberg and Hudi – for the analytical data tables in data lake systems. Delta Lake was originally developed by Databricks, open-sourced in 2019 and is now a Linux Foundation project.
The new Delta Lake release was one of a series of data lakehouse-related announcements Wednesday at the Databricks Data + AI Summit in San Francisco, which also included the new LakehouseIQ generative AI knowledge engine and Lakehouse AI toolset for building large language models on a data lakehouse.
[Related: Databricks Steps Up Data Governance With Okera Acquisition]
AI, specifically generative AI, is a major theme at the Databricks event this week. The summit also follows Monday’s news that the company had struck a deal to acquire generative AI startup MosaicML for $1.3 billion.
“At Databricks, we have been saying that we want to democratize data and AI for a very long time – actually for almost a decade,” said Databricks co-founder and CEO Ali Ghodsi in his Data + AI Summit keynote Wednesday.
“We’ve been saying this technology needs to be democratized. And the problem in the industry has been that these two worlds of data and AI have actually been separated in the past. … And these worlds were incompatible.”
Ghodsi called Delta Lake “the foundation” of the Databricks Data Lakehouse Platform, Databricks’ flagship offering for data lakes, data warehousing and data analytics, and artificial intelligence and machine learning workloads.
The key innovation in Delta Lake 3.0, which is now in public preview, is its new Universal Format or “UniForm” capability that allows data stored in Delta Lake data tables to be read as if it were in Apache Iceberg or Apache Hudi format.
By automatically supporting Iceberg and Hudi within Delta Lake, UniForm averts the need to choose a data format and eliminates the compatibility issues – and the complex integration work – that differing formats create, according to Databricks.
A UniForm Approach
UniForm works by automatically generating the metadata needed for Iceberg or Hudi alongside Delta Lake’s own, unifying the table formats so users don’t have to choose one or manually convert between them.
“What we’ve done is we actually unified the format of the metadata of all three projects inside Delta Lake. We call this UniForm, which is short for universal format,” Ghodsi said. “That way you can actually eliminate this friction, these format wars that some would like us to have, we can just eliminate that. And we can just really democratize access to data in these lakehouses. We think this is a really big step forward.”
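For illustration, UniForm is enabled per table. The PySpark sketch below assumes the ‘delta.universalFormat.enabledFormats’ table property described in the Delta Lake 3.0 preview materials; the session setup, table name and columns are placeholders, not anything Databricks showed at the summit:

```python
from pyspark.sql import SparkSession

# Assumes a Spark session configured with the Delta Lake 3.0 preview package.
spark = SparkSession.builder.appName("uniform-demo").getOrCreate()

# Create a Delta table with UniForm generating Iceberg metadata.
# The property name follows the Delta Lake 3.0 preview docs; check the
# release notes for the exact spelling in your version.
spark.sql("""
    CREATE TABLE sales_events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        amount   DOUBLE
    )
    USING DELTA
    TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg')
""")

# Data is written once as Parquet; each Delta commit also emits Iceberg
# metadata, so Iceberg-compatible engines can read the same files in place.
spark.sql("INSERT INTO sales_events VALUES (1, current_timestamp(), 9.99)")
```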
In a pre-conference interview, Joel Minnick, Databricks marketing vice president, told CRN that the Delta Lake format has gained “really strong adoption” across the industry. But he said that “people like choice” and it’s unlikely the industry will ever standardize on a single data format. With Delta Lake and the Delta Lake community, Databricks is trying to achieve data unification and simplification, he said.
From a partner perspective, Minnick said, Delta Lake 3.0 and UniForm will simplify the process of developing and maintaining connectors to data systems with incompatible data formats, freeing solution providers and systems integrators to focus on higher-value tasks.
Another new component of Delta Lake 3.0 is Delta Kernel, which addresses connector fragmentation by ensuring that connectors are built against a core Delta library that implements the Delta specifications. That provides one stable API for developers to code against, alleviating the need to update Delta connectors with each new version or protocol change.
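Delta Kernel’s actual libraries are Java APIs; the sketch below is purely conceptual – every identifier in it is hypothetical, not a real Kernel name – and only illustrates the design idea: connector authors code against one narrow, stable interface while protocol details stay inside the core library.

```python
from typing import Iterator, Protocol

# Hypothetical sketch of the "stable core API" idea behind Delta Kernel.
# None of these names are real Delta Kernel identifiers.

class TableReader(Protocol):
    """The one narrow surface a connector depends on."""
    def scan_rows(self, path: str) -> Iterator[dict]: ...

class CoreDeltaReader:
    """Owned by the core library; absorbs protocol changes internally."""
    def scan_rows(self, path: str) -> Iterator[dict]:
        # Protocol details (log replay, checkpoints, deletion vectors)
        # live here, not in each connector.
        yield from ()  # placeholder for actual transaction-log replay

def export_csv_lines(reader: TableReader, path: str) -> None:
    """A 'connector' written only against the stable interface."""
    for row in reader.scan_rows(path):
        print(",".join(str(v) for v in row.values()))

# When the Delta protocol evolves, only the core reader changes;
# the connector keeps working unmodified.
export_csv_lines(CoreDeltaReader(), "/data/events")
```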
Also new in Delta Lake 3.0 is Delta Liquid Clustering, a flexible data layout technique that Databricks says provides cost-efficient data clustering as data volumes grow and improves data read and write performance.
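As a hedged sketch of what that looks like in practice – reusing the Spark session from the UniForm example, and assuming the CLUSTER BY clause shown in the Delta Lake 3.0 preview materials – a clustering key takes the place of static partition columns:

```python
# Declare a clustering key instead of static Hive-style partitions.
# Table and column names are illustrative.
spark.sql("""
    CREATE TABLE web_clicks (
        user_id BIGINT,
        url     STRING,
        ts      TIMESTAMP
    )
    USING DELTA
    CLUSTER BY (user_id)
""")

# Unlike PARTITIONED BY, the engine can reorganize the layout incrementally
# as data volumes and access patterns change, without a full table rewrite.
```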
Lakehouse AI And LakehouseIQ Debuts
At the Data + AI Summit, Databricks also unveiled LakehouseIQ, a new generative AI knowledge engine that makes it possible for anyone in an organization to use natural language to search, query and understand data. The Databricks Assistant, powered by LakehouseIQ, is now in preview.
During his keynote Ghodsi touted the LakehouseIQ technology as a way to “democratize” access to data and analytics. “I actually think this will be the future of Databricks,” the CEO said, adding that LakehouseIQ will be a core component of the company’s strategy “for many years to come.”
LakehouseIQ uses generative AI to learn what makes an organization’s data, data usage patterns, operations, culture and jargon unique, allowing it to answer questions within the context of the business, according to Databricks. LakehouseIQ is integrated with the Databricks Unity Catalog for unified search and data governance.
Databricks also debuted Lakehouse AI, a new suite of generative AI tools that customers can use to build and govern their own generative AI applications, including large language models (LLMs), that run within the Databricks Lakehouse Platform.
Databricks said Lakehouse AI provides a data-centric approach to AI with built-in capabilities for the entire AI lifecycle and underlying data monitoring and governance.
Tools in Lakehouse AI include Vector Search, Lakehouse Monitoring, LLM-optimized Model Serving, MLflow 2.5 with LLM capabilities such as AI Gateway and Prompt Tools, and a curated collection of open-source models.
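For a flavor of how one of those pieces is used, here is a minimal sketch of querying a model route through the AI Gateway that ships with MLflow 2.5; the gateway URI and the route name “completions” are assumptions that must match a separately configured gateway server:

```python
from mlflow.gateway import set_gateway_uri, query

# Point the fluent client at a running AI Gateway server.
# The URI is a placeholder; the server is started separately from a
# route-configuration file naming providers and models.
set_gateway_uri("http://localhost:5000")

# Query a configured completions route. "completions" is hypothetical
# and must match a route defined in the gateway's config.
response = query(
    route="completions",
    data={"prompt": "Summarize last quarter's sales trends."},
)
print(response)
```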
And Databricks announced that the company’s Unity Catalog system now includes Lakehouse Federation capabilities that enable organizations to create highly scalable and performant data mesh architectures with unified governance. Databricks said the new federation capabilities, going into public preview shortly, help unify previously siloed data systems under the Databricks Lakehouse Platform.
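In practice, federation is declared in SQL. A hedged sketch, reusing the Spark session from the earlier examples and assuming the CREATE CONNECTION and CREATE FOREIGN CATALOG statements described for the preview – hostnames, secret scopes and object names are placeholders:

```python
# Register an external database as a Unity Catalog connection.
spark.sql("""
    CREATE CONNECTION postgres_conn TYPE postgresql
    OPTIONS (
        host 'pg.example.com',
        port '5432',
        user secret('demo_scope', 'pg_user'),
        password secret('demo_scope', 'pg_password')
    )
""")

# Expose the external database as a catalog governed by Unity Catalog.
spark.sql("""
    CREATE FOREIGN CATALOG sales_pg
    USING CONNECTION postgres_conn
    OPTIONS (database 'sales')
""")

# Queries can now reach the federated data without copying it in first.
spark.sql("SELECT count(*) FROM sales_pg.public.orders").show()
```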
Databricks also said that the Databricks Marketplace for sharing data products, data models and notebooks is now live.