10 Big Data Trends You Should Know About For 2022
From predictive analytics and data fabric architecture to data observability and data governance software, here’s a look at 10 big data trends and technologies that solution and service providers need to be aware of in the new year.
Big Developments In Big Data
Businesses and organizations have long used business reporting and data analytics on a tactical basis, answering such questions as “just what were sales in Wisconsin in 2021?” But in recent years big data management and analytics have become more strategic, spurred by digital transformation initiatives, efforts to leverage data for competitive advantage and even moves to monetize data assets.
More immediately, with the COVID-19 pandemic and its economic disruptions, businesses now realize the need to better utilize data for such tasks as managing supply chains and retaining employees. And the wave of cybersecurity incidents making headlines has brought home the importance of stepping up data governance operations.
All this is changing how businesses collect, manage, utilize and analyze their growing volumes of data. Here’s a look at 10 big data trends that the channel should keep an eye on in 2022.
Analyzing Data Across Multiple Clouds
Businesses and organizations are increasingly storing data in cloud platforms like Amazon Web Services, Snowflake and Microsoft Azure, even creating networks that distribute data storage across multiple clouds. But business analytics initiatives can be a challenge when data is scattered across on-premises and multi-cloud platforms.
In 2022, we’ll see increased use of new software tools such as the Alluxio Data Orchestration Platform, Qlik Forts and Starburst Galaxy that provide a unified view of data scattered across multiple on-premises and cloud systems and give access to that data – wherever it resides – for data analytics tasks.
Gaining a virtual view of dispersed data and being able to access it with everyday business intelligence tools is increasingly seen as a viable alternative to the traditional data warehouse where data is collected from multiple sources and managed in a central location.
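To make the idea of a virtual view concrete, here is a minimal sketch in Python. It is not how Alluxio, Qlik or Starburst implement their products; it only illustrates the general pattern of querying each system where the data lives and merging the partial results, rather than copying everything into a central warehouse first. The two in-memory SQLite databases, the `sales` table and the function names are all hypothetical stand-ins.

```python
import sqlite3

def make_source(rows):
    """Create an in-memory SQLite database standing in for one data platform."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn

def federated_sales_by_region(sources):
    """Query each source in place and merge the partial totals."""
    totals = {}
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for conn in sources:
        for region, amount in conn.execute(query):
            totals[region] = totals.get(region, 0.0) + amount
    return totals

# Hypothetical sources: one on-premises system and one cloud platform.
on_prem = make_source([("Wisconsin", 120.0), ("Ohio", 80.0)])
cloud = make_source([("Wisconsin", 30.0), ("Texas", 200.0)])

unified = federated_sales_by_region([on_prem, cloud])
print(unified)
```

Each source computes its own aggregate, so only small summaries move between systems. A real product would also have to handle differing schemas, security and query pushdown, which this sketch ignores.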
The Shift To Predictive And Prescriptive Analytics Accelerates
This idea comes from a recent report from data cloud platform company Snowflake on data science and analytics trends. Data analytics has traditionally been used to understand what happened. But there is growing use of data analytics and machine learning technology to predict what will happen and offer suggested prescriptive responses – even automated actions – to the analytical results, turning data into “an actionable asset.”
This shift is accelerating thanks to the growing availability and adoption of easy-to-use machine learning tools by analysts and data scientists, the ability to manage and deploy machine learning features at scale with next-generation feature stores, and a new generation of distributed frameworks for training and deploying machine learning models.
More frequent releases of machine learning tools and libraries are a factor here, as are consolidated platforms (like Snowflake) that are closing the gap between analytics and machine learning.
Data Fabric Vs. Data Mesh
“Data fabric” and “data mesh” are emerging architectures for integrating, accessing and managing data across multiple heterogeneous platforms and technologies. But there are differences, so expect to hear more about both in 2022 along with some debate – and possibly some confusion.
The data fabric concept has been around for a few years, but it’s become more prevalent as data is increasingly scattered across hybrid-cloud/multi-cloud networks. Data fabrics weave together data from internal silos and external data sources to create data networks to power business applications, AI and analytics, according to a definition from Stardog, which develops an Enterprise Knowledge Graph platform.
Major big data players such as Tibco, Talend and Informatica, along with newer companies like K2, develop software used in data fabric implementations. Stardog founder and CEO Kendall Clark believes data fabric will become more mainstream in 2022, noting in an email comment that “the maturity of enterprise data fabric as the key to data integration in the hybrid multi-cloud world will become more commercially evident.”
This year “will see significant growth and interest in data fabric solutions as companies seek to leverage a common management layer to accelerate analytics migration to the cloud, ensure security and governance, and quickly deliver business value by supporting real-time, trusted data across hybrid-multi-cloud – all in driving digital transformation,” said Buno Pati, CEO at big data software vendor Infoworks, in a statement. “We believe this technology will be broadly adopted over the next five years.”
The “data mesh” concept, developed by Zhamak Dehghani, a director at IT consultancy Thoughtworks, is focused on the logical and physical interconnectedness of data from producers through to consumers, according to Starburst, which targets its data analytics engine for use within data mesh systems. Data observability software vendor Monte Carlo says data mesh is an alternative to a monolithic data lake and “embraces the ubiquity of data in the enterprise by leveraging a domain-driven, self-service design.”
Data Observability Goes Mainstream
“Data observability” became a hot buzz term in 2021. But 2022 will be the year the implementation and application of data observability technology really catch up to the hype.
Data analytics and data-intensive applications are key components of many digital transformation and machine learning initiatives – making it critical that the quality, reliability and completeness of the data used for those projects meet high standards.
Just as there are service level agreements for applications and IT infrastructure, data observability provides a way to monitor data for its quality, behavior, privacy and ROI, says Sanjeev Mohan, a consultant and advisor at data and analytics firm Eckerson Group, in a recent blog post.
Data engineers, data scientists, chief data officers, data privacy officers, chief information security officers, CIOs and CTOs “can all benefit from a richer view of their data assets,” Mohan says.
The data observability space is crowded with established companies like Cisco Systems, Splunk, Sumo Logic, New Relic and Datadog, as well as startups including Cribl, Monte Carlo and Bigeye. Some, like Splunk and New Relic, offer data observability tools for systems management and DevOps purposes while others, like Monte Carlo and Bigeye, are more focused on the quality of the data itself – what some call “data health.”
In an Observability Trends Report released in December, based on a survey of nearly 1,300 IT leaders, software engineers and developers, New Relic says data observability will become mission critical in 2022 for managing the entire software lifecycle, not just troubleshooting problems.
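As a rough illustration of what monitoring “data health” can mean in practice, the Python sketch below computes two simple metrics for a batch of records: completeness (the share of rows with all required fields present) and freshness (whether the newest record arrived within an allowed window). The function name, field names and thresholds are assumptions made for this example and do not reflect any vendor’s product.

```python
from datetime import datetime, timedelta

def check_data_health(rows, required_fields, max_age, now):
    """Return simple completeness and freshness metrics for a batch of records."""
    total = len(rows)
    complete = sum(
        1 for row in rows
        if all(row.get(field) is not None for field in required_fields)
    )
    newest = max((row["loaded_at"] for row in rows), default=None)
    return {
        "row_count": total,
        "completeness": complete / total if total else 0.0,
        "is_fresh": newest is not None and (now - newest) <= max_age,
    }

# Hypothetical batch of order records; one row is missing its amount.
now = datetime(2022, 1, 10, 12, 0)
rows = [
    {"order_id": 1, "amount": 25.0, "loaded_at": now - timedelta(hours=2)},
    {"order_id": 2, "amount": None, "loaded_at": now - timedelta(hours=1)},
    {"order_id": 3, "amount": 40.0, "loaded_at": now - timedelta(hours=3)},
    {"order_id": 4, "amount": 15.0, "loaded_at": now - timedelta(hours=5)},
]

report = check_data_health(rows, ["order_id", "amount"], timedelta(hours=24), now)
print(report)
```

In the spirit of the service level agreement comparison above, a team could alert whenever completeness drops below an agreed threshold or the data stops being fresh.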
Increased Deployment And Large-Scale Use Of Machine Learning
Machine learning has been a hot area in the last few years with both established IT vendors and – especially – startups offering software for developing, training, deploying and managing machine learning models and the data they use. This year will see the use of ML tools become more widespread due to several developments, according to a report from Snowflake on data science and analytics trends.
Easy-to-use machine learning tools, including “AutoML” or automated machine learning software, will automate the technical aspects of data science tasks. That will allow data scientists to do their jobs more quickly and even make data science capabilities available to a wider audience of data analysts.
Managing and deploying machine learning features at scale will become easier with increased use of feature store technology that has become available in the last year, according to the report. And continuous releases of machine learning tools, libraries and frameworks offer more options for data scientists.
Machine learning projects will increasingly be housed in cloud data platforms such as Snowflake and Databricks, predicts Eckerson Group.
And some see machine learning and data analytics essentially merging into one operation. “Automation, business intelligence and AI will converge into one practice, fueling the proliferation of citizen data scientists across the enterprise,” said Florian Douetteau, co-founder and CEO of Dataiku, an AI and machine learning platform developer, in an email.
The Rise Of Comprehensive Data Governance Platforms
Until recently data managers looking to automate data governance processes have had to rely on point products and tools with specific functionality including data catalogs, data lineage, data quality, data access control, data security, master data management and more. That made it difficult to protect data against unauthorized access and misuse and ensure that data met corporate policy and government regulatory requirements, notes industry analyst Wayne Eckerson of the Eckerson Group in a recent report on big data trends for 2022.
Over the last year a number of big data companies have been combining these tools and capabilities into increasingly comprehensive data governance platforms, Eckerson notes, including Ataccama, Alation, Collibra, Hitachi Vantara, Informatica, Precisely, Talend and Zaloni. As we enter 2022 look for data managers to adopt and implement these platforms to improve their data governance efforts.
Increased Use Of DataSecOps Technologies
As a corollary to the need for better data governance, businesses and organizations will also increasingly turn to DataSecOps software from vendors such as Immuta and Satori to ensure data protection and data privacy policies are being followed.
The need for DataSecOps tools is being driven by the increasing distribution of data across data fabric and data mesh architectures, noted Satori founder and CEO Eldad Chai, in an email. More sensitive personal data is migrating to the cloud for analytical and machine learning tasks, he notes, and “over-privileged” employees at startups in financial and healthcare technologies have broad access to customer data. And attackers are increasingly targeting data held by analytics and AI service providers, he said.
“As data volumes grow and usage expands, it has become impossible to control who has access to what data, ensure proper compliance, and enable safe data sharing,” said Matt Carroll, CEO of Immuta, a Boston-based developer of universal cloud data access control software, in an email. “Without automatic data access control, organizations have no way of monitoring who is accessing what data, when, and for what purpose, jeopardizing the data’s security and privacy.”
Supply Chain Analytics Becomes A Strategic Imperative
For many businesses, managing their way through supply chain disruptions has been the biggest challenge associated with the COVID-19 pandemic. While the disruptions were triggered by shuttered manufacturing plants around the world, those problems were compounded by the lack of visibility many businesses have into their supplier networks, making it difficult to shift plans, find alternative suppliers, and adjust distribution to match supply with demand.
In an Eckerson Group report analyst Rich Fox notes that supply chain management has evolved from a tactical function to a strategic imperative. Look for many businesses in 2022 to step up their digital tracking of data from manufacturers and transportation companies using sensors and other technologies, analyzing supply chain interdependencies, and developing contingency plans using machine learning and AI capabilities.
Expanded Use Of Predictive Analytics To Overcome “The Great Resignation”
Just as the COVID-19 pandemic has created supply chain problems for businesses, it also triggered what has become known as “The Great Resignation,” in which millions of people have quit their jobs to retire, to change their life direction – or because they were dissatisfied with where they were working and felt they could do better elsewhere.
Data analytics has traditionally been applied to human resource management for basic reporting tasks such as compiling employment data for tax purposes. But some forward-thinking businesses and organizations have begun applying the same kind of predictive technologies used to monitor customer churn to identifying key employees who may be on the verge of quitting by analyzing data around compensation, job satisfaction, productivity and other metrics.
Look for that trend to accelerate in 2022. “As organizations grapple with employee turnover amidst the ‘Great Resignation,’ they will increasingly look to predictive analytics to help save the day,” says Nick Curcuru, vice president of advisory services at Privitar, a developer of data privacy management software.
“By embracing predictive analytics, organizations can identify key trends in employee engagement and key moments in time to intervene, enabling them to take action and possibly save relationships that might be on the brink. It can cost as much as four times an employee salary to replace someone as it does to keep them, so being able to prevent churn can provide a huge value to an organization,” Curcuru said in an email.
But Curcuru cautions that HR analytics efforts must include safeguards for both employees and their personal data and management must be transparent about policies to protect data and its ethical use.
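For readers curious what a churn-style model applied to employees might look like, here is a deliberately small Python sketch. It fits a logistic regression by plain gradient descent on a handful of invented records and then scores two employees. The features (pay relative to market and a 0-to-1 satisfaction score), the data and the function names are all assumptions for illustration; a real project would use far more data, a tested library and the privacy safeguards Curcuru describes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=2000, lr=0.1):
    """Fit logistic regression weights with plain batch gradient descent."""
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(features)
    return weights, bias

def quit_risk(weights, bias, x):
    """Estimated probability (0 to 1) that an employee with features x leaves."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Invented history: [pay relative to market, satisfaction score], 1 = employee left.
history = [[0.80, 0.2], [0.85, 0.3], [0.90, 0.4], [1.00, 0.8], [1.10, 0.9], [1.05, 0.7]]
left = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(history, left)
underpaid_unhappy = quit_risk(weights, bias, [0.80, 0.2])
well_paid_happy = quit_risk(weights, bias, [1.10, 0.9])
print(underpaid_unhappy, well_paid_happy)
```

The scores are only useful for ranking who might warrant a conversation; with six made-up rows they say nothing about real attrition.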
Data Marketplace Use Will Explode
Business analytics initiatives have traditionally focused on analyzing internally generated data such as sales, market surveys and business performance. But increasingly businesses are obtaining data from external sources and using it to supplement and enrich their own data: IDC says that 75 percent of enterprises in 2021 used external data sources to strengthen cross-functional and decision-making capabilities.
This year will see an explosion in the use of data marketplaces to provide businesses with public, third-party data for analytical and machine learning tasks, according to analyst Joe Hilleary in the Eckerson Group report.
Data marketplaces provide a platform where companies can supply and consume data for a fee. Hilleary says demand for third-party data is growing while acquiring and using such data “now has a much lower opportunity cost and a clearer value proposition.” That’s because the number of data providers is increasing and the supply of data for sale is growing as more organizations look to monetize their own data assets. And the expanding capabilities of cloud data marketplaces such as the Snowflake Data Marketplace simplify the process of buying and selling data.