Building a Semantic Lakehouse With AtScale and Databricks



This is a collaborative post between AtScale and Databricks. We thank Kieran O'Driscoll, Technology Alliances Manager at AtScale, for his contributions.


Kyle Hale, Solution Architect at Databricks, coined the term "Semantic Lakehouse" in his blog a few months back. It is a good overview of the potential to simplify the BI stack and leverage the power of the lakehouse. As AtScale and Databricks collaborate more and more on supporting our joint customers, the potential for using AtScale's semantic layer platform with Databricks to rapidly create a Semantic Lakehouse has taken shape. A semantic lakehouse provides an abstraction layer on top of the physical tables and presents a business-friendly view for data consumption: it organizes the data by subject area and defines the entities, attributes, and joins. All of this simplifies data consumption for business analysts and end users.

Most enterprises still struggle with data democratization

Making data available to decision-makers is a challenge that most organizations face today. The larger the organization, the harder it becomes to impose a single standard for consuming and preparing analytics. Over half of enterprises report using three or more BI tools, and over a third use four or more. Beyond BI users, data scientists have their own range of preferences, as do application developers.

These tools work in different ways and speak different query languages. Conflicting analytics outputs are virtually guaranteed when multiple business units make decisions from different siloed data copies or conventional OLAP cubing solutions such as Tableau Hyper Extracts, Power BI Premium Imports, or Microsoft SQL Server Analysis Services (SSAS) for Excel users.

Keeping data in different data marts and data warehouses, extracts in various databases, and externally cached data in reporting tools does not provide a single version of truth for the enterprise, and it increases data movement, ETL, security overhead, and complexity. It becomes a data governance nightmare, and it also means that organizations are running their businesses on potentially stale data from different data silos in the BI layer, rather than leveraging the full power of the Databricks Lakehouse.

The need for a universal semantic layer

The AtScale semantic layer sits between all of your analytics consumption tools and your Databricks Lakehouse. By abstracting the physical form and location of data, the semantic layer makes data stored in the Delta Lake analysis-ready and easily consumable by business users' tool of choice. Consumption tools can connect to AtScale via one of the following protocols:

  • For SQL, the AtScale engine appears as a Hive SQL warehouse.
  • For MDX or DAX, AtScale appears as a SQL Server Analysis Services (SSAS) cube.
  • For REST or Python applications, AtScale appears as a web service.
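The three protocols above all address the same semantic model; only the client-side wiring differs. As a minimal sketch, the helper below maps each protocol to the kind of settings a tool would use. The host name, port, and path values are hypothetical placeholders, not real AtScale endpoints; consult the AtScale documentation for your deployment's actual connection details.

```python
# Illustrative sketch: one semantic model, three client protocols.
# All host names, ports, and paths below are hypothetical examples.

def connection_info(protocol: str, host: str = "atscale.example.com") -> dict:
    """Return example client-side settings for each AtScale protocol."""
    if protocol == "sql":
        # SQL tools see a Hive-compatible warehouse (JDBC/ODBC Hive driver).
        return {"driver": "Hive", "url": f"jdbc:hive2://{host}:11111/model"}
    if protocol in ("mdx", "dax"):
        # Excel and Power BI see what looks like an SSAS cube over XMLA.
        return {"provider": "MSOLAP", "server": f"https://{host}/xmla"}
    if protocol == "rest":
        # Notebooks and applications call a web-service query endpoint.
        return {"endpoint": f"https://{host}/query", "format": "json"}
    raise ValueError(f"unsupported protocol: {protocol}")

print(connection_info("sql")["url"])
```

Whichever entry point a tool uses, the business definitions (entities, attributes, joins) it sees are the same.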

Rather than processing data locally, AtScale pushes inbound queries down to Databricks as optimized SQL. This means that users' queries run directly against Delta Lake, using Databricks SQL for compute, scale, and performance.
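To make the pushdown idea concrete, here is a toy rewriter that turns a business-level request (a metric grouped by a dimension) into the SQL that would run on Databricks. The function, table name, and column names are invented for illustration; this is not AtScale's actual query-rewriting engine, which handles joins, calculated measures, and dialect-specific optimization.

```python
# Toy semantic-to-SQL pushdown. Names (fact_sales, amount, region) are
# invented for the example; real rewriters handle far more than this.

def push_down(metric: str, dimension: str, table: str) -> str:
    """Rewrite a 'metric by dimension' request as SQL for Databricks SQL."""
    return (
        f"SELECT {dimension}, SUM({metric}) AS total_{metric} "
        f"FROM {table} GROUP BY {dimension}"
    )

# A BI tool asks for "amount by region"; Databricks executes the result.
print(push_down("amount", "region", "sales.fact_sales"))
```

The key point is that no data is copied into the semantic layer: the generated statement executes against Delta Lake itself.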

Semantic Lakehouse with Databricks and AtScale

An added benefit of using a universal semantic layer is that AtScale's autonomous performance optimization technology identifies user query patterns and automatically orchestrates the creation and maintenance of aggregates, just as a data engineering team would. No one has to spend development time and effort creating and maintaining these aggregates, as they are auto-created and managed by AtScale for optimal performance. The aggregates are created in the Delta Lake as physical Delta tables and can be thought of as a "Diamond Layer." They are fully managed by AtScale and improve the scale and performance of your BI reports on the Databricks Lakehouse, while radically simplifying analytics data pipelines and the associated data engineering.
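The aggregate idea can be illustrated with a small, self-contained demo. The sketch below uses an in-memory SQLite database to stand in for Delta tables, and invented table and column names; in the real architecture, AtScale detects the query pattern and materializes the aggregate as a physical Delta table without any manual step.

```python
# Toy "Diamond Layer" demo: sqlite3 stands in for Delta tables, and the
# table/column names are invented. AtScale performs the equivalent
# materialization automatically based on observed query patterns.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [("EMEA", 100.0), ("EMEA", 50.0), ("AMER", 75.0), ("AMER", 25.0)],
)

# A recurring "total amount by region" pattern prompts creation of a
# pre-summarized aggregate table, so repeat queries scan far fewer rows.
con.execute(
    "CREATE TABLE agg_sales_by_region AS "
    "SELECT region, SUM(amount) AS total_amount "
    "FROM fact_sales GROUP BY region"
)

rows = con.execute(
    "SELECT region, total_amount FROM agg_sales_by_region ORDER BY region"
).fetchall()
print(rows)  # [('AMER', 100.0), ('EMEA', 150.0)]
```

Subsequent dashboard queries that match the pattern are answered from the small aggregate table instead of rescanning the full fact table.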

Creating a tool-agnostic semantic lakehouse

The vision of the Databricks Lakehouse Platform is a single unified platform to support all of your data, analytics, and AI workloads. Kyle's description of the "Semantic Lakehouse" is a nice model for a simplified BI stack.

AtScale extends this idea of a Semantic Lakehouse by supporting both BI workloads and AI/ML use cases through our tool-agnostic semantic layer. The combination of AtScale and Databricks means the Semantic Lakehouse architecture extends to any presentation layer, whether that is Tableau, Power BI, Excel, or Looker. All of them can use the same semantic layer in AtScale.

Semantic Lakehouse – all of your analytics directly on the Lakehouse

With the arrival of the lakehouse, organizations no longer have their BI and AI/ML teams working in isolation. AtScale's universal semantic layer gives organizations consistent access to all of their enterprise data, whether the consumer is a business user in Excel or a data scientist in a notebook, while leveraging the full power of their Databricks Lakehouse Platform.

Additional resources

Watch our panel discussion with Franco Patano, Lead Product Specialist at Databricks, for more information and to find out how these tools can help you create an agile, scalable analytics platform.

If you have any questions about AtScale, or about how to modernize and migrate your legacy EDW, BI, and reporting stack to Databricks and AtScale, feel free to reach out to [email protected] or contact Databricks.