Dremio Offers New Edition Of Its Data Lake Engine Optimized For AWS
The Dremio system, available through the AWS Marketplace, includes new elastic engines and parallel projects capabilities that boost data lake query performance and lower cloud infrastructure costs.
Big data software developer Dremio is providing a new edition of its data lake query engine for the Amazon Web Services cloud platform and is making the software available through the AWS Marketplace.
The AWS edition also offers a pair of new technologies, elastic engines and parallel projects, that the company said support on-demand data lake insights and reduce cloud infrastructure costs for data lake initiatives.
While the free Community Edition and paid subscription Enterprise Edition of Dremio’s data lake engine previously supported the AWS cloud platform, the new Dremio AWS Edition is purpose-built and optimized for AWS, said Jason Nadeau, Dremio marketing vice president, in an interview. (The Community and Enterprise editions also support Microsoft Azure and Google Cloud Platform.)
Data lakes are huge stores of data, often including both structured and unstructured data and relational and non-relational data. But unlike data warehouses, where data is organized and formatted for specific analytical purposes, data lakes hold unorganized data that is stored for potentially multiple uses. That can make finding, accessing and extracting data for specific tasks a challenge.
Dremio’s data lake engine is designed to help data scientists and business analysts explore, curate and query huge volumes of data within data lake systems. The technology, for example, can be used in conjunction with a business analytics tool like Tableau to analyze data “in place” within an AWS S3-based data lake, instead of moving or copying data to a data warehouse or other system.
Dremio is built on Apache Arrow, an open-source framework for developing data analytics applications that can process in-memory columnar data.
Business analysts “are doing a lot of data exploration,” Nadeau said of the demand for Dremio’s technology. “And we find that organizations, more and more, want an open architecture.”
The AWS Edition includes two significant technology additions to the Dremio engine: elastic engines and parallel projects.
Elastic engines, currently available only in the AWS Edition, are designed to help businesses and organizations reduce infrastructure costs by better managing IT resource utilization. Traditional query engines are often built around a single execution cluster architecture that supports multiple, dynamic query workloads. That means organizations often purchase IT infrastructure and cloud computing capacity to handle peak query workloads rather than taking advantage of the inherent elasticity of AWS, according to Dremio.
“Generally, customers over-provision to handle peaks,” Nadeau said.
Elastic engines make it easier to scale up or down to meet dynamic query workload requirements. Data managers can configure and size compute engines for specific workloads running within their AWS accounts. That isolates workloads from one another, eliminates under- and over-provisioning of compute resources, and removes the costs associated with idle infrastructure.
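To make the over-provisioning argument concrete, here is a minimal back-of-the-envelope sketch in Python. It compares the node-hours consumed by a single cluster sized for the busiest hour with the node-hours consumed when capacity follows demand hour by hour. The function names and the hourly demand figures are hypothetical and invented for illustration only. They do not come from Dremio, do not model Dremio's actual scaling behavior, and are not meant to reproduce any savings figure the company cites.

```python
from typing import Sequence


def peak_provisioned_node_hours(demand: Sequence[int]) -> int:
    """Node-hours used when one cluster is sized for the busiest hour
    and left running for the whole period."""
    if not demand:
        return 0
    return max(demand) * len(demand)


def elastic_node_hours(demand: Sequence[int]) -> int:
    """Node-hours used when capacity tracks demand hour by hour
    (an idealized assumption: no startup delay, no minimum size)."""
    return sum(demand)


def savings_fraction(demand: Sequence[int]) -> float:
    """Fraction of node-hours avoided by elastic sizing versus peak sizing."""
    peak = peak_provisioned_node_hours(demand)
    if peak == 0:
        return 0.0
    return 1 - elastic_node_hours(demand) / peak


# Hypothetical day: idle overnight, a short midday spike, light use otherwise.
hourly_demand = [0] * 8 + [2, 4, 10, 4, 2, 2, 4, 2] + [0] * 8

if __name__ == "__main__":
    print("peak-sized node-hours:", peak_provisioned_node_hours(hourly_demand))
    print("elastic node-hours:   ", elastic_node_hours(hourly_demand))
    print("fraction avoided:     ", savings_fraction(hourly_demand))
```

The gap between the two totals widens as the workload gets spikier and shrinks toward zero for a steady, always-busy workload, which is why the benefit depends heavily on each organization's query patterns.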
“With Dremio AWS Edition, data teams can query the data in place in S3 with lightning-fast interactive performance while reducing their cloud infrastructure costs by over 90 percent compared to traditional SQL engines,” said Tomer Shiran, Dremio chief product officer, in a statement.
The new parallel projects functionality in Dremio AWS Edition makes it possible to deploy multi-tenant instances of the software with each instance containing all associated configuration, metadata and data reflection information for instance isolation.
That provides improved lifecycle automation and best practices for system deployment, configuration and optimization in multi-tenant Dremio environments, Nadeau said, such as when multiple business units or departments within an organization are running separate query projects utilizing a shared data lake.
The company is offering a free version of Dremio AWS Edition and a paid subscription that includes additional security and governance capabilities and Dremio support.
Dremio works with a number of systems integrators, service providers and consultants with big data analytics practices, Nadeau said. The company is currently working on a more formal partner ecosystem program for debut later this year, he said.
Dremio, based in Santa Clara, Calif., recently raised $70 million in Series C funding.