Developers have been kids in a candy store for the past decade with exciting options for databases (such as MongoDB, Elasticsearch, and Cassandra), cloud storage services like Amazon S3, and new paradigms like microservices and serverless computing architectures. With these new approaches, applications are being developed more quickly and efficiently. That’s the good news.
The bad news is that the data generated by these applications creates new challenges when it comes to analytics. Those copious quantities of data largely reside in JSON and other nonrelational formats, inaccessible to traditional tools and methodologies for analytics. For any enterprise looking to embrace the future of data, its biggest enemy is the crusty data infrastructure upon which its past has been built.
Fortunately, there is hope: a new breed of open source projects, such as Dremio and Presto, has arisen to bridge the gap between traditional business intelligence (BI) tools and new-fangled data sources. While still early, these tools show promise as a way to let developers use their preferred tools while someone else stitches together the silos.
Old analytics dogs meet new data tricks

While application development practices have evolved dramatically in recent years, the way companies manage data for analytics hasn’t changed nearly as much. This would not matter so much if there were not so darn many BI users in any given company. We rightly tout the importance of software developers, but there are probably 10 times as many BI users as software developers.
Leaving them behind really is not an option, yet that is exactly what we are doing.
By and large, most analysis is performed using BI tools such as Tableau, Looker, Power BI, Qlik, and Cognos. These tools all assume the data is in one place, in a relational model. Unfortunately for the utility of such tools, the truth is that no company of any size keeps all its data in a single data warehouse, or even in one of those much-hyped data lakes. There are and always will be silos.
A number of open source projects have emerged to bridge this gap between traditional BI tools and modern data sources. These include Presto and Dremio, as well as Amazon Athena (which is based on Presto) and Google BigQuery. These projects aim to run between data sources (relational, file systems, NoSQL sources) and SQL-based consumers, whether BI tools or data science platforms based on Python and R.
Dremio is different from the other new data kids

While each of these projects purports to fulfil that goal, Dremio is different. It is much more than the query execution engine provided by something like Presto. Dremio integrates other key functional areas — query acceleration, data curation, data lineage, and a data catalogue — and delivers the whole as a self-service model that is similar to Google Docs, but for data sets.
That is cool, and it just got a bit cooler.
Now, Dremio has announced support for Looker, a popular BI platform. This lets users reach a broader range of sources than they could before (MongoDB, Elasticsearch, S3, HDFS, Azure ADLS, etc.), perform joins across sources, and accelerate queries. It expands the reach of data consumers using Looker, and it helps make them more independent and self-directed — no more waiting in the data bread lines for IT to move data into one silo for analysis.
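To make the cross-source join idea concrete, here is a minimal sketch of what a federation layer does conceptually: it presents one SQL endpoint over silos that speak different formats. The table names, fields, and data below are invented for illustration, and an in-memory SQLite database stands in for the federation engine — a real system like Dremio or Presto does this at scale against the live sources.

```python
import json
import sqlite3

# Silo 1: JSON documents describing users (stand-in for a document
# store like MongoDB, or JSON files sitting in S3).
user_docs = json.loads("""
[{"user_id": 1, "name": "Ada",   "region": "EU"},
 {"user_id": 2, "name": "Grace", "region": "US"}]
""")

# Silo 2: a relational table of orders (stand-in for a warehouse table).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(100, 1, 25.0), (101, 1, 40.0), (102, 2, 10.0)])

# Surface the JSON silo inside the same SQL engine so a single query
# can span both sources -- conceptually what the federation layer does
# behind one SQL endpoint, without copying data into a new silo first.
db.execute("CREATE TABLE users (user_id INTEGER, name TEXT, region TEXT)")
db.executemany("INSERT INTO users VALUES (?, ?, ?)",
               [(d["user_id"], d["name"], d["region"]) for d in user_docs])

# One SQL join across what started as two incompatible sources.
rows = db.execute("""
    SELECT u.name, SUM(o.total) AS spend
    FROM users u JOIN orders o ON o.user_id = u.user_id
    GROUP BY u.name ORDER BY u.name
""").fetchall()
# rows == [('Ada', 65.0), ('Grace', 10.0)]
```

In practice a BI tool like Looker never sees this plumbing; it simply issues SQL to the federation layer, which handles the translation to each underlying source.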
This is all part of a bigger trend to let data users use their favourite tools, let developers build apps on their favourite databases and filesystems, and solve the technology mismatch problem with a new layer that sits between the tools and the data.
In short, the choice moves from “either/or” to “and,” which is a great selling point to IT pros and others who need to extend the value of existing BI investments while embracing a more modern, open source-driven data future. It lets developers be developers, without having to slow down to bother with the data silos they leave in their wake.