Trifacta Adds Data Quality Capabilities To Its Data Wrangling Toolset
Big data software developer Trifacta is shipping a new release of its data preparation software that expands the product's data quality management capabilities.
Those capabilities are increasingly in demand by businesses and organizations for business analytics, data management, artificial intelligence and machine learning initiatives in which data quality is a must, according to Trifacta executives.
"If you try to introduce poor-quality data into these processes, the models themselves won't be very good," said Will Davis, Trifacta senior director of product marketing, in an interview with CRN.
Organizations working with big data today often wrestle with data that is scattered across multiple systems and data stores and is inconsistent in its quality. Gartner has estimated that 40 percent of all failed business initiatives are the result of poor-quality data.
The new capabilities also will benefit Trifacta's channel partners who provide big data system implementation and consulting services around Trifacta's software.
The company, which made the 2018 CRN Big Data 100 list, also has an OEM deal with Google, which includes Trifacta's software within its Google Cloud Dataprep service on the Google Cloud Platform.
Trifacta's product expansion comes as other companies, including developers of business analytics software, expand into Trifacta's core data preparation turf. Last month, for example, Tableau announced the general availability of Tableau Prep Conductor for self-service data preparation tasks.
Davis said many of those competitors' capabilities are geared toward narrow use cases involving their own products, unlike Trifacta, which targets enterprise-scale "DataOps" applications with data lake environments and data governance projects.
The new Active Profiling and Smart Cleaning functionality in Trifacta's software is designed to make data quality assessment, remediation and monitoring "more intelligent and efficient," according to the company.
Active Profiling includes a data selection model capability that identifies data quality problems and provides guidance on how to resolve them. A column section uses visual histograms, data quality bars and pattern information for addressing column distributions and data quality issues. And the ability to interact with profiling information drives suggestions and methods for data cleaning, according to the company.
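For readers unfamiliar with column profiling, the idea behind a data quality bar and pattern summary can be sketched in a few lines of Python. This is an illustrative sketch only, not Trifacta's implementation: the function names `value_pattern` and `profile_column`, the three quality buckets and the ZIP code validator are all assumptions made for the example.

```python
import re
from collections import Counter


def value_pattern(value):
    """Collapse a value into a rough shape: letter runs become 'A', digit runs '9'."""
    shape = re.sub(r"[A-Za-z]+", "A", value)
    return re.sub(r"\d+", "9", shape)


def profile_column(values, is_valid):
    """Count valid, mismatched and missing values, plus the most common shapes.

    `is_valid` is a caller-supplied check for the column's expected type
    (hypothetical; a real tool would infer the type itself).
    """
    quality = Counter()
    patterns = Counter()
    for v in values:
        if v is None or v.strip() == "":
            quality["missing"] += 1
            continue
        quality["valid" if is_valid(v) else "mismatched"] += 1
        patterns[value_pattern(v)] += 1
    return {"quality": dict(quality), "patterns": patterns.most_common()}


def is_us_zip(v):
    # Assumed validator for the example: 5-digit ZIP with optional +4 suffix.
    return re.fullmatch(r"\d{5}(-\d{4})?", v) is not None


report = profile_column(["94107", "94107-1234", "N/A", "", None, "10001"], is_us_zip)
print(report["quality"])
print(report["patterns"])
```

The quality counts correspond to the segments of a data quality bar, and the pattern counts hint at which cleanup steps a tool might suggest for the mismatched values.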
The Smart Cleaning functionality includes Cluster Clean, which uses clustering algorithms to group similar data values and resolve them to a single standard; Pattern Clean, which handles composite data types such as dates and phone numbers that can be represented multiple ways and reformats them in a uniform way; and Reference Clean, which allows users to specify a reference dataset or dictionary for matching and standardizing data values.
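The three cleaning approaches described above can be illustrated with a short Python sketch that uses only the standard library. It does not reflect Trifacta's actual algorithms: the fingerprint-style clustering, the list of accepted date formats, the 10-digit phone layout and the similarity cutoff are all assumptions chosen for the example, and every function name here is hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime
from difflib import get_close_matches


def fingerprint(value):
    """Clustering key that ignores case, punctuation and word order."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_clean(values):
    """Replace each value with the most frequent spelling in its cluster."""
    clusters = defaultdict(Counter)
    for v in values:
        clusters[fingerprint(v)][v] += 1
    canonical = {k: c.most_common(1)[0][0] for k, c in clusters.items()}
    return [canonical[fingerprint(v)] for v in values]


DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")  # assumed input formats


def pattern_clean_date(text):
    """Rewrite a date in any accepted format as ISO 8601, or None if unparsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def pattern_clean_phone(text):
    """Rewrite a 10-digit phone number (optional leading 1) in one layout."""
    digits = re.sub(r"\D", "", text)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"


def reference_clean(value, reference, cutoff=0.8):
    """Snap a value to its closest reference entry, or leave it unchanged."""
    lookup = {r.lower(): r for r in reference}
    match = get_close_matches(value.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[match[0]] if match else value


print(cluster_clean(["Acme Inc.", "acme inc", "Acme Inc.", "Inc. Acme"]))
print(pattern_clean_date("03/07/2019"), pattern_clean_phone("+1 415-555-0123"))
print(reference_clean("Californa", ["California", "Colorado"]))
```

In this sketch, values that cannot be parsed or matched are returned as `None` or left untouched rather than guessed at, so a reviewer can decide how to handle them.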
Trifacta plans to introduce data quality capabilities for automation processes, including data flow orchestration, monitoring and alerting, later this year.