More turbulence is coming to the big-data analytics market …
Big-data analytics has been one of the dominant tech trends of this decade and one of the most dynamic and innovative segments of the information technology market. But today’s big data analytics market is quite different from the industry of even a few years ago, and it’s almost sure to be substantially different in a few years’ time.
In 2018, we saw many clear signs that the big data market that rose rapidly at the start of this decade is beginning to recrystallize in a very different form. In the coming years, this market probably won’t even be referred to as “big data.” That’s because much of its evolution is toward artificial intelligence, which, though data-driven at its core, doesn’t necessarily rely on massive amounts of data in order to be effective in many applications.
As Wikibon looks ahead to 2019, we foresee the following dominant trends in big-data analytics:
Public cloud providers are absorbing most new big-data analytics growth opportunities
Enterprises are moving more of their big data analytics workloads to public clouds and developing more greenfield applications for these environments.
In 2019, the three principal public cloud providers — Amazon Web Services Inc., Microsoft Azure and Google Cloud Platform — will step up their efforts to help enterprise accounts migrate their data lakes away from on-premises platforms.
Other public cloud providers will struggle to hold onto their big-data analytics market shares. In 2018, the challenges from the public cloud leaders compelled IBM Corp. to agree to buy Red Hat Inc. Going forward, IBM, Oracle Corp. and other public cloud providers will emphasize hybrid cloud solutions that help customers centralize management of big-data assets distributed between private and public clouds.
In addition, more big-data public cloud providers, ceding the infrastructure-as-a-service and platform-as-a-service segments to AWS, Microsoft Corp. and Google LLC, will shift to offering premium software-as-a-service analytics applications for line-of-business and industry-specific opportunities. For example, Snowflake Computing Inc. has been a notable success in the cloud data warehousing market, landing $450 million in funding in 2018 to sustain its growth trajectory.
The big-data analytics ecosystem is going deeply cloud-native
Kubernetes, the open-source software popular for managing software containers for applications that need to move easily among clouds and on-premises data centers, is the foundation for the new generation of cloud-native big data. The most noteworthy trend over the past year in this market has been the recrystallization of the data ecosystem around Kubernetes.
The advance of cloud-native big-data architectures drove a lot of funding and mergers-and-acquisitions activity in 2018. That explains why Pivotal, now focusing on distributed in-memory data in multiple clouds, fetched $555 million in its initial public offering of stock. The need for simpler tools for loading data into cloud data warehouses was behind Talend’s acquisition of Stitch. And growing enterprise requirements for scalable cloud-based file and object storage are a big part of the reason why Cloudian bought Infinity Storage.
In 2019, we predict that the Open Hybrid Architecture Initiative will deliver on its plan to modularize and containerize HDFS, MapReduce, HBase, Hive, Pig, YARN and other principal Hadoop components. We also predict that the prime sponsors — Hortonworks Inc., soon to become part of Cloudera Inc., and IBM/Red Hat — will deliver next-generation commercial Hadoop platforms that incorporate this architecture into their respective hybrid-cloud solution portfolios in early 2019, and that other cloud solution providers will follow their lead throughout the year.
Similar containerization initiatives in the Spark, TensorFlow, streaming, distributed object store and block storage segments will take hold in 2019 as the entire big-data stack decouples for more agile deployment and management in Kubernetes-based DevOps environments.
Every big data analytics platform provider is investing heavily in data science toolchains
Big data analytics solution providers are in a war to win the hearts and minds of the new generation of developers working on AI projects. A new generation of data science workbenches has come to market in the past few years, including Anaconda, Dataiku, DataKitchen, DataRobot, Dimensional Mechanics, Domino Data Lab, H2O.ai, Hydrosphere.io, Kogentix, Pipeline.ai and Seldon. In addition, established big data analytics vendors such as IBM, Oracle, Cloudera, Alteryx and others have jumped into this niche, as have all three principal public cloud vendors.
The DataRobot, Tamr and Immuta venture capital funding rounds in 2018 only hint at the dozens of startups that have taken root in this data science workbench segment over the past few years. In 2018, Wikibon came across a growing number of them based in China and elsewhere in the Far East.
In 2019, more companies will emphasize their offerings’ ability to automate such traditionally manual tasks as feature engineering, hyperparameter optimization and data labeling. Big-data analytics solution providers will invest heavily in tools to accelerate deployment of trained AI models into production applications. As the big-data analytics ecosystem shifts toward cloud-native architectures, more data science workbenches will incorporate the ability to automate tasks over Kubernetes orchestration fabrics and also to containerize models for deployment into public and private clouds. This trend will bring emerging standards, such as Kubeflow, into the burgeoning data science DevOps toolchain ecosystem.
Hadoop and Spark are becoming legacy technologies
Hadoop has seen its role in the big-data analytics arena steadily diminish in the past several years. Flattening growth prospects in the Hadoop market were a principal factor behind Cloudera’s and Hortonworks’ agreement to merge in 2018.
Increasingly, Hadoop has seen its core use cases narrow to a distributed file system for unstructured data, a platform for batch data transformation, a big-data governance repository and a queryable big data archive.
In 2019, Hadoop will struggle to expand its application reach to online analytical processing, business intelligence, data warehousing and other niches that are addressed by other open-source projects. By the end of the year, Hadoop will start to be phased out in many enterprise big-data environments even in its core data-lake role in favor of distributed object stores, stream-computing platforms and massively scalable in-memory clusters.
Even Apache Spark, which developed in the middle of this decade as a Hadoop alternative, is feeling increasingly like a legacy technology in many TensorFlow-centric AI shops. This trend can be seen in the data extract/transform/load niche into which Spark is increasingly being deployed, a niche that may decline in importance as schema-on-read architectures come to the forefront.
Big data catalogs are becoming central to data management DevOps
Users’ ability to rapidly search, discover, curate and govern data assets is now the foundation of digital business success. In this regard, Looker Data Sciences Inc. raised $100 million in Series E funding to address the market for big data cataloguing, governance, preparation and visualization solutions.
In 2019, we expect to see more enterprises repurpose their data lakes into big-data catalogs within application infrastructures that drive the productivity of knowledge workers, support the new generation of developers who are building and training production AI applications, and facilitate algorithmic transparency and e-discovery.
We also expect vendors such as IBM, Cloudera/Hortonworks, Informatica, Collibra and others to deepen their existing big-data catalog platforms’ ability to manage more of the metadata, models, images, containers and other artifacts that are the lifeblood of the AI DevOps workflow. More big data catalogs will be deployed across multiclouds, leveraging the new generation of virtualization tools that present a single control pane for managing disparate data assets across public and private clouds. And we predict that the principal public cloud providers (AWS, Microsoft and Google) will roll out their own big-data catalogs for customers who choose to deploy those services in hybrid public/private clouds.
Data lakes are evolving toward cloud object storage and stream computing
In 2018, cloud object storage platforms such as AWS S3 and Microsoft Azure Data Lake Storage continued to supplant Hadoop in enterprise data lakes. We also saw venture capitalists prioritize funding for established providers of multicloud data access, query and visualization solutions (e.g. Dremio, $25 million Series B), software-defined multicloud storage (e.g. Scality, $60 million Series E) and cloud object storage (e.g. Cloudian, $94 million Series E).
Going forward, that trend will continue, but over the next three to five years it will be eclipsed by stream-computing backbones. Low-latency streaming platforms such as Kafka, Flink and Spark Structured Streaming are becoming as fundamental to enterprise data infrastructures as relational data architectures have been since the 1970s.
Business intelligence goes all AI and all in memory
AI is remaking the business intelligence market inside and out. Over the past few years, one of the core BI trends has been the convergence of the technology’s traditional focus on historical analytics with a new generation of AI-infused predictive analytics, search and forecasting tools that allow any business user to do many things that used to require a trained data scientist.
In 2019, more BI vendors will integrate a deep dose of AI to automate the distillation of predictive insights from complex data, while offering these sophisticated features in solutions that provide self-service simplicity and guided next-best-action prescriptions. We saw an indicator of this trend in 2018 when startup ThoughtSpot Inc. scored $145 million in Series D funding for its innovative AI-augmented business analytics solution portfolio.
With regard to AI’s growing role in practically every segment of the big data analytics market, consider these funding stories we saw in 2018:
Many of the largest funding rounds this past year went to established AI solution providers, including AI automation workbench startup DataRobot Inc. with a $100 million Series D round, automated data preparation firm Tamr Inc. with an $18 million Series D and AI DevOps data privacy control company Immuta Inc. with a $20 million Series B.
Enterprises’ insatiable appetite for powerful AI-driven search technologies for sifting through growing piles of log data explains why Elasticsearch B.V. was able to raise $252 million in its IPO.
Another dominant trend in big data analytics has been in-memory architectures. This explains why, in 2018, MemSQL Inc. scored $30 million in a Series D round for in-memory transactional analytics, InfluxData Inc. took in $35 million in Series C funding for real-time performance monitoring using a time-series database, and Actian Corp. was acquired by HCL Technologies and Sumeru Equity Partners on the strength of its established in-memory hybrid transaction/analytic platform.
Edge computing is radically remaking database architectures
Databases as we’ve known them are in the process of being deconstructed and reassembled for edge-facing deployments.
Much of the evolution in the big data analytics market is toward edge-facing, streaming, data-in-motion architectures, which don’t necessarily depend on huge storage architectures. That explains why we saw notable funding rounds in 2018 for scalable machine-data storage, processing, and analytics (CrateDB, $11 million Series A) and streaming data pipeline integration, monitoring, and management (StreamSets, $24 million Series C).
In 2019, enterprises will deploy streaming platforms to drive low-latency DevOps pipelines that continuously infuse mobile, IoT, robotics, and other edge applications with trained, best-fit machine-learning models. Online transactional analytic processing, data transformation, and data governance workloads are also increasingly moving toward low-latency, stateful streaming backbones.
Over the coming years, disruptive new data platforms will come to market that combine mesh, streaming, in-memory, and blockchain capabilities. Many of these new distributed data platforms will be optimized for continuous AI DevOps pipelines that require low-latency, scalable and automated data ingest, modeling, training and serving to edge devices. Serverless interfaces to these analytic-pipeline capabilities will be standard, supplemented by stateful streaming fabrics that support in-line recommendation engines, next best action and other transactional workloads in edge devices on the emerging 5G broadband wireless networks.
Those are Wikibon’s year-end retrospectives and look-aheads for big data analytics. I would like to hear what my readers think will be the dominant trends in this market going forward.