Featuring 4 new emerging trends and 6 big trends from last year
As we close out 2022, it’s amazing to see how much the data world has changed.
It was less than a year ago in March that Data Council happened. Yes, it was just an event. But it was the event, the first in-person conference since COVID. It was the data world coming alive again and meeting face to face for the first time in two long years.
Since then, we’ve been busy stirring up controversy with our hot takes, debating our tech and community, raising important conversations, and duking it out on Twitter with Friday fights. We were in growth mode, always looking for the next new thing and vying for a piece of the seemingly infinite data pie.
Now we’re entering a different world, one of recession and layoffs and budget cuts that 98% of CEOs expect will last 12–18 months. Companies are preparing for battle, amping up the pressure and shifting from growth mode to efficiency mode.
In 2023, we’ll face a new set of challenges: improving efficiency, refocusing on immediate impact, and making data teams the most valuable resource in every organization.
So what does this mean for the data world? This article breaks down the ten big trends you should know about the modern data stack this year: four emerging trends that will be a big deal in the coming year, and six existing trends that are poised to grow even further.

With the recent economic downswing, the tech world is looking into 2023 with a new focus on efficiency and cost-cutting. This will lead to four new trends related to how modern data stack companies and data teams operate.

Storage has always been one of the biggest costs for data teams. For example, Netflix spent $9.6 million per month on AWS data storage. As companies tighten their budgets, they’ll need to take a hard look at these bills.
Snowflake and Databricks have already been investing in product optimization. We’ll likely see more improvements to help customers cut costs this year.
For example, at its June conference, Snowflake highlighted product improvements to speed up queries, reduce compute time, and cut costs. It announced 10% faster compute on average on AWS, 10–40% faster performance for write-heavy DML workloads, and 7–10% lower storage costs from better compression.
At its June conference, Databricks also devoted part of its keynote to cost-saving product improvements, such as the launches of Enzyme (an automatic optimizer for ETL pipelines) and Photon (a query engine with up to 12x better price to performance).
Later in the year, both Snowflake and Databricks doubled down by investing further in cost optimization features, and more are sure to come next year. Snowflake even highlighted cost-cutting as one of its top data trends for 2023 and affirmed its commitment to minimizing cost while increasing performance.
In 2023, we’ll also see the growth of tooling from independent companies and storage partners to further reduce data costs.
Dark data, or data that never actually gets used, is a serious problem for data teams. Up to 68% of data goes unused, even though companies are still paying to store it.
This year, we’ll see the growth of cost-management tools like Bluesky, CloudZero, and Slingshot designed to work with specific data storage systems like Snowflake and Databricks.
We’ll also see modern data stack partners introduce compatible optimization features, like dbt’s incremental models and packages. dbt Labs and Snowflake even wrote an entire white paper together on optimizing your data with dbt and Snowflake.
Metadata also has a big role to play here. With a modern metadata platform, data teams can use popularity metrics to find unused data assets, column-level lineage to see when assets aren’t connected to pipelines, redundancy features to delete duplicate data, and more.
Much of this can even be automated with active metadata, like automatically optimizing data processing or purging stale data assets.
For example, a data team we work with reduced their monthly storage costs by $50,000 just by finding and removing an unused BigQuery table. Another team deprecated 30,000 unused assets (or two-thirds of their data estate) by finding tables, views, and schemas that weren’t used upstream.

[Data Domain and ServiceNow] were built and run for performance, full stop… Our companies ran at a higher velocity, with higher standards and a narrower focus than most. Going faster, maintaining higher standards, and with a narrower aperture. Sounds simple? The question is how you go about amping up your organization. How much faster do you run? How much higher are your standards? How hard do you focus?
Frank Slootman has IPOed three successful tech companies, no small feat in the startup world. He said that his success came down to optimizing team velocity and performance.
In the past few years, data teams have been able to run free with less regulation and oversight.
We have so much belief in the power and value of data that data teams haven’t always been required to prove that value. Instead, they’ve chugged along, balancing daily data work with forward-looking tech, process, and culture experiments. Optimizing how we work has always been part of the data discussion, but it often takes a back seat to more pressing concerns like building a super cool tech stack.
Next year, this will no longer cut it. As budgets tighten, data teams and their stacks will get more attention and scrutiny. How much do they cost, and how much value are they providing? Data teams will need to become more like Frank Slootman, focusing on performance and efficiency.
In 2023, companies will get more serious about measuring data ROI, and data team metrics will start becoming mainstream.
It’s not easy to measure ROI for a function as fundamental as data, but it’s more important than ever that we figure it out.
This year, we’ll see data teams start developing proxy metrics to measure their value. This may include usage metrics like data usage (e.g. DAU, WAU, MAU, and QAU), page views or time spent on data assets, and data product adoption; satisfaction metrics like a d-NPS score for data consumers; and trust metrics like data downtime and data quality scores.
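As a rough illustration of the usage metrics above, the following Python sketch counts daily, weekly, and monthly active users of a data asset from an access log. The event format, the user names, and the 1/7/30-day windows are assumptions made for the example, not a standard definition.

```python
from datetime import date, timedelta


def active_users(events, as_of, window_days):
    """Count distinct users with at least one event in the trailing window.

    events is an iterable of (user_id, event_date) pairs; the window covers
    window_days days ending on as_of, inclusive.
    """
    start = as_of - timedelta(days=window_days - 1)
    return len({user for user, day in events if start <= day <= as_of})


# Hypothetical access log for one dashboard or table.
events = [
    ("ana", date(2023, 1, 31)),
    ("ben", date(2023, 1, 31)),
    ("ana", date(2023, 1, 27)),
    ("cy", date(2023, 1, 26)),
    ("dee", date(2023, 1, 5)),
]
today = date(2023, 1, 31)
dau = active_users(events, today, 1)
wau = active_users(events, today, 7)
mau = active_users(events, today, 30)
print(dau, wau, mau)
```

The same function covers quarterly active users by passing a 90-day window.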

For years, the modern data stack has been growing. And growing. And growing some more.
As VCs pumped in millions of dollars in funding, new tools and categories popped up every day. But now, with the economic downturn, this growth phase is over. VC money has already been drying up: just look at the decrease in funding announcements over the last six months.
We’ll see fewer data companies and tools launching next year and slower expansion for existing companies. Ultimately, this is probably good for buyers and the modern data stack as a whole.
Yes, hypergrowth mode is fun and exciting, but it’s also chaotic. We used to joke that it would suck to be a data buyer right now, with everyone claiming to do everything. The result is some truly wild stack diagrams.
This lack of capital will force today’s data companies to focus on what matters and ignore the rest. That means fewer “nice to have” features. Fewer splashy pivots. Fewer acquisitions that make us wonder “Why did they do that?”
With limited funds, companies will have to focus on what they do best and partner with other companies for everything else, rather than trying to tackle every data problem in one platform. This will lead to the creation of the “best-in-class modern data stack” in 2023.
As the chaos calms down and data companies focus on their core USP, the winners of each category will start to become clear.
These tools will also focus on working even better with each other. They’ll act as launch partners, aligning behind common standards and pushing the modern data stack forward. (A couple of examples from last year are Fivetran’s Metadata API and dbt’s Semantic Layer, where close partners like us built integrations in advance and celebrated the launch as much as Fivetran and dbt Labs.)
These partnerships and consolidation will make it easier for buyers to choose tools and get started quickly, a welcome change from how things have been.

Tech companies are facing new pressure to cut costs and increase revenue in 2023. One way to do this is by focusing on their core functions, as mentioned above. Another way is seeking out new customers.
Guess what the largest untapped source of data customers is today? Enterprise companies with legacy, on-premise data systems. To serve these new customers, modern data stack companies will have to start supporting legacy tools.
In 2023, the modern data stack will start to integrate with Oracle and SAP, the two enterprise data behemoths.
This may sound controversial, but it’s already begun. The modern data stack started reaching into the on-prem, enterprise data world over a year ago.
In October 2021, Fivetran acquired HVR, an enterprise data replication tool. Fivetran said that this would allow it to “address the massive market for modernizing analytics for operational data associated with ERP systems, Oracle databases, and more”. This was the first major move from a modern data stack company into the enterprise market.

These are six of the big ideas that blew up in the data world last year and only promise to get bigger in 2023.

Active metadata was one of the big trends from last year’s article, so it’s not surprising that it’s still a hot topic in the data world. What was surprising, though, was how fast the ideas of active metadata and third-generation data catalogs continued to grow.
In a major shift from 2021, when these ideas were new and few people were talking about them, many companies are now competing to claim the category.
Take, for example, Hevo Data and Castor’s adoption of the “Data Catalog 3.0” language. A few companies have the tech to back up their talk. But like the early days of the data mesh, when experts and newbies alike appeared knowledgeable in a space that was still being defined, others don’t.
Last year, analysts latched onto and amplified the idea of active metadata and modern data catalogs.
After its new Market Guide for Active Metadata in 2021, Gartner went all in on active metadata last year. At its August conference, active metadata starred as a key theme in Gartner’s keynotes, as well as in what seemed like half of the conference’s talks.
G2 launched a new “Active Metadata Management” category in the middle of the year, marking a “new generation of metadata”. They even called this the “third phase of…data catalogs”, in line with this new “third-generation” or “3.0” language.
Similarly, Forrester scrapped its Wave report on “Machine Learning Data Catalogs” to make way for “Enterprise Data Catalogs for DataOps”, marking a major shift in their idea of what a successful data catalog should look like.
Meanwhile, VCs continued to pump money into metadata and cataloging, e.g. Alation’s $123M Series E, data.world’s $50M Series C, our $50M Series B, and Castor’s $23.5M Series A.


One of the biggest signals from this year was in the new Forrester Wave report.
From 2021 to 2022, Forrester upended its Wave rankings. It moved the 2021 Leaders (Alation, IBM, and Collibra) to the bottom and middle tiers of its 2022 Wave report, and raised previously low or even unranked companies (us, data.world, and Informatica) to become the new Leaders.
This is a major sign that the market is starting to separate modern catalogs (e.g. active metadata platforms, data catalogs for DataOps, etc.) from traditional data catalogs.
Our prediction is that active metadata platforms will replace the “data catalog” category in 2023.
The “data catalog” is just a single use case of metadata: helping users understand their data assets. But that barely scratches the surface of what metadata can do.
Activating metadata holds the key to dozens of use cases like observability, cost management, remediation, quality, security, programmatic governance, optimized pipelines, and more, all of which are already being actively debated in the data world. Here are a few real examples:
- Eventbridge event-based actions: Allows data teams to create production-grade, event-driven metadata automations, like alerts when ownership changes or auto-tagging classifications.
- Trident AI: Uses the power of GPT-3 to automatically create descriptions and READMEs for new data assets, based on metadata from earlier assets.
- GitHub integration: Automatically creates a list of affected data assets during each GitHub pull request.
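The pull-request use case comes down to walking a lineage graph downstream from whatever changed. The Python sketch below shows that general technique with a breadth-first traversal. It is not a description of how the integration above is implemented, and the lineage dictionary and asset names are invented for illustration.

```python
from collections import deque


def downstream_assets(lineage, changed):
    """Breadth-first walk of the lineage graph from the changed assets.

    lineage maps an asset name to the assets that read from it directly.
    Returns every asset reachable downstream, sorted for stable output.
    """
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen - set(changed))


# Hypothetical lineage: raw tables -> staging -> marts -> dashboards.
lineage = {
    "raw.orders": ["stg.orders"],
    "stg.orders": ["mart.revenue", "mart.customer_ltv"],
    "mart.revenue": ["dashboard.weekly_revenue"],
    "raw.users": ["stg.users"],
}
affected = downstream_assets(lineage, ["stg.orders"])
print(affected)
```

Tracking visited nodes in `seen` keeps the walk from looping if the lineage graph ever contains a cycle.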
As the data world aligns on the importance of modernizing our metadata, we’ll see the rise of a distinct active metadata category, likely with a dominant active metadata platform.

The data contracts conversation started in August with Chad Sanderson’s newsletter on “The Rise of Data Contracts”. He later followed this up with a two-part technical guide to data contracts with Adrian Kreuziger. He then spoke about data contracts on the Analytics Engineering Podcast, with us! (Shoutout to Chad, Tristan Handy, and Julia Schottenstein for a great chat.)
The core driver of data contracts is that engineers have no incentive to create high-quality data.
Because of the modern data stack, the people who create data have been separated from the people who consume it. As a result, we end up with GIGO data systems: garbage in, garbage out.
The data contract aims to solve this by creating an agreement between data producers and consumers. Data producers commit to producing data that adheres to certain rules, e.g. a set data schema, SLAs around accuracy or completeness, and policies on how the data can be used and changed.
After agreeing on the contract, data consumers can create downstream applications with this data, assured that engineers won’t unexpectedly change the data and break live data assets.
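A contract like this can be checked mechanically before a batch of data is published. The Python sketch below is one minimal way to do that, assuming a contract made of a fixed schema plus a completeness threshold. The `Contract` class, its fields, and the sample rows are hypothetical and do not follow any specific data contract tool or specification.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    schema: dict              # column name -> expected Python type
    max_null_fraction: float  # completeness SLA, applied per column


def check_contract(rows, contract):
    """Return a list of human-readable violations (empty if the batch passes)."""
    violations = []
    for column, expected_type in contract.schema.items():
        values = [row.get(column) for row in rows]
        nulls = sum(v is None for v in values)
        if rows and nulls / len(rows) > contract.max_null_fraction:
            violations.append(f"{column}: too many nulls ({nulls}/{len(rows)})")
        if any(v is not None and not isinstance(v, expected_type) for v in values):
            violations.append(f"{column}: expected {expected_type.__name__}")
    return violations


contract = Contract(schema={"order_id": int, "amount": float}, max_null_fraction=0.1)
good_batch = [{"order_id": 1, "amount": 9.5}, {"order_id": 2, "amount": 20.0}]
bad_batch = [{"order_id": "3", "amount": None}, {"order_id": 4, "amount": 5.0}]

print(check_contract(good_batch, contract))
print(check_contract(bad_batch, contract))
```

A producer could run a check like this in CI or at the start of a pipeline run and refuse to publish when the list of violations is not empty.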
After Chad Sanderson’s newsletter went live, this conversation blew up. It spread across Twitter and Substack, where the data community argued whether data contracts were an important conversation, frustratingly vague or self-evident, not actually a tech problem, doomed to fail, or obviously a good idea. We hosted Twitter fights, created epic threads, and watched battle royales from a safe distance, popcorn in hand.

While data contracts are an important issue in their own right, they’re part of a larger conversation about how to ensure data quality.
It’s no secret that data is often outdated or incomplete or incorrect; the data community has been talking about how to fix it for years. First we said that metadata documentation was the solution, then it was data product shipping standards. Now the buzzword is data contracts.
This isn’t to dismiss data contracts, which may be the solution we’ve been waiting for. But it seems more likely that data contracts will be subsumed in a larger trend around data governance.
In 2023, data governance will start shifting “left”, and data standards will become a first-class citizen in orchestration tools.
For decades, data governance has been an afterthought. It’s often handled not by data producers but by data stewards, who create documentation long after data is created.
However, we’ve recently seen a shift to move data governance “left”, or closer to data producers. This means that whoever creates the data (usually a developer or engineer) must create documentation and check the data against pre-defined standards before it can go live.
Major tools have recently made changes that support this idea, and we expect to see even more in the coming year:
- dbt’s yaml files and Semantic Layer, where analytics engineers can create READMEs and define metrics while creating a dbt model
- Airflow’s Open Lineage, which tracks metadata about jobs and datasets as DAGs execute
- Fivetran’s Metadata API, which provides metadata for data synced by Fivetran connectors
- Atlan’s GitHub extension, which creates a list of downstream assets that will be affected by a pull request

Also called a “metrics layer” or “business layer”, the semantic layer is an idea that’s been floating around the data world for decades.
The semantic layer is a literal term: it’s the “layer” in a data architecture that uses “semantics” (words) that the business user will understand. Instead of raw tables with column names like “A000_CUST_ID_PROD”, data teams build a semantic layer and rename that column “Customer”. Semantic layers hide complex code from business users while keeping it well-documented and accessible for data teams.
In our previous report, we talked about how companies were struggling to maintain consistent metrics across complex data ecosystems. Last year, we took a big leap forward.
In October 2022, dbt Labs made a big splash at their annual conference by announcing their new Semantic Layer.
This was a big deal, spawning excited tweets, in-depth think pieces, and celebrations from partners like us.
The core concept behind dbt’s Semantic Layer: define things once, use them anywhere. Data producers can now define metrics in dbt, then data consumers can query those consistent metrics in downstream tools. Regardless of which BI tool they use, analysts and business users can look up a stat in the middle of a meeting, confident that their answer will be correct.
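The “define once, use anywhere” idea can be sketched in a few lines of plain Python. This is a toy illustration of the concept only and has nothing to do with dbt’s actual syntax or API. The column `A017_ORD_AMT`, the metric names, and the sample rows are made up for the example.

```python
# Raw column -> business-friendly name. "A000_CUST_ID_PROD" comes from the
# text above; "A017_ORD_AMT" is a made-up second column.
SEMANTIC_NAMES = {
    "A000_CUST_ID_PROD": "Customer",
    "A017_ORD_AMT": "Order Amount",
}

# Each metric is defined exactly once, in terms of semantic names.
METRICS = {
    "Total Revenue": lambda rows: sum(r["Order Amount"] for r in rows),
    "Customers": lambda rows: len({r["Customer"] for r in rows}),
}


def to_semantic(raw_rows):
    """Rename raw columns to their semantic names, leaving unknown columns as-is."""
    return [
        {SEMANTIC_NAMES.get(col, col): val for col, val in row.items()}
        for row in raw_rows
    ]


def query_metric(name, raw_rows):
    """Single entry point that every downstream tool would call."""
    return METRICS[name](to_semantic(raw_rows))


raw_rows = [
    {"A000_CUST_ID_PROD": "c1", "A017_ORD_AMT": 30.0},
    {"A000_CUST_ID_PROD": "c2", "A017_ORD_AMT": 12.5},
    {"A000_CUST_ID_PROD": "c1", "A017_ORD_AMT": 7.5},
]
print(query_metric("Total Revenue", raw_rows), query_metric("Customers", raw_rows))
```

Because every consumer goes through `query_metric`, a change to how “Total Revenue” is calculated happens in one place rather than in each BI tool.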
The Semantic Layer was a huge step forward for the modern data stack since it paves the way for metrics to become a first-class citizen.
Making metrics part of data transformation intuitively makes sense. Making them part of dbt (the dominant transformation tool, which is already well-integrated with the modern data stack) is exactly what the semantic layer needed to go from idea to reality.

Since dbt’s Semantic Layer launched, progress has been fairly measured, in part because this happened less than three months ago.
It’s also because changing the way that people write metrics is hard. Companies can’t just flip a switch and move to a semantic layer overnight. The change will take time, likely years rather than months.
In 2023, the first set of Semantic Layer implementations will go live.
Many data teams have spent the last couple of months exploring the impact of this new technology: experimenting with the Semantic Layer and thinking through how to change their metrics frameworks.
This process will get easier as more tools in the modern data stack integrate with the Semantic Layer. Seven tools were Semantic Layer–ready at its launch (including us, Hex, Mode, and Thoughtspot). Eight more tools were Metrics Layer–ready, an intermediate step to integrating with the Semantic Layer.

Data activation is related to reverse ETL, one of the big trends in last year’s report.
In 2022, some of the main players in reverse ETL worked to redefine and expand their category. Their latest buzzword is “data activation”, a new take on the “customer data platform” (CDP).
A CDP combines data from all customer touchpoints (e.g. website, email, social media, help center, etc). A company can then segment or analyze that data, build customer profiles, and power personalized marketing. For example, they can create an automated email with a discount code if someone abandons their cart, or advertise to people who have visited a specific page on the website and used the company’s live chat.
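The abandoned-cart example is essentially a segmentation query over event data. The Python sketch below shows one way such a segment could be computed. The event names, the 24-hour waiting period, and the sample users are assumptions made for illustration, and the email step itself is left out.

```python
from datetime import datetime, timedelta


def abandoned_cart_users(events, now, wait=timedelta(hours=24)):
    """Users whose latest cart add is at least `wait` old with no later purchase.

    events is an iterable of (user_id, event_type, timestamp) tuples, where
    event_type is "add_to_cart" or "purchase" (hypothetical event names).
    """
    last_cart, last_purchase = {}, {}
    for user, kind, ts in events:
        if kind == "add_to_cart":
            last_cart[user] = max(ts, last_cart.get(user, ts))
        elif kind == "purchase":
            last_purchase[user] = max(ts, last_purchase.get(user, ts))
    return sorted(
        user
        for user, cart_ts in last_cart.items()
        if now - cart_ts >= wait
        and (user not in last_purchase or last_purchase[user] < cart_ts)
    )


events = [
    ("ana", "add_to_cart", datetime(2023, 1, 1, 10, 0)),
    ("ana", "purchase", datetime(2023, 1, 1, 11, 0)),
    ("ben", "add_to_cart", datetime(2023, 1, 1, 9, 0)),
    ("cy", "add_to_cart", datetime(2023, 1, 2, 20, 0)),
]
segment = abandoned_cart_users(events, now=datetime(2023, 1, 3, 9, 0))
print(segment)
```

Here only `ben` qualifies: `ana` completed a purchase after adding to the cart, and `cy` added to the cart less than 24 hours before `now`.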
The key idea here is that CDPs are designed around using data, rather than simply aggregating and storing it, and this is where data activation comes in. As the argument goes, in a world where data is stored in a central data platform, why do we need standalone CDPs? Instead, we could just “activate” data from the warehouse to handle traditional CDP functions and diverse use cases across the company.
At its core, data activation is similar to reverse ETL, but instead of just sending data back to source systems, you’re actively driving use cases with that data.
We’ve been talking about data activation in various forms for the last couple of years. However, this idea of data activation as the new CDP took off in 2022.
For example, Arpit Choudhury analyzed the space in April, Sarah Krasnik broke down the debate in July, Priyanka Somrah included it as a data category in August, and Luke Lin called out data activation in his 2023 data predictions last month.
In part, this trend was caused by marketing from former reverse ETL companies, who now brand themselves as data activation products. (These companies still talk about reverse ETL, but it’s now a feature within their data activation platform. Notably, Census has resisted this trend, keeping “reverse ETL” across its site.)
For example, Hightouch rebranded itself with a big splash in April, dropping three blogs on data activation in five days.
In part, this can also be traced to the larger debate around driving data use cases and value, rather than focusing on data infrastructure or stacks. As Benn Stancil put it, “Why has data technology advanced so much further than value a data team provides?”
In part, this was also an inevitable result of the modern data stack. Stacks like Snowflake + Hightouch have the same data and functionality as a CDP, but they can be used across a company rather than for only one function.

CDPs made sense in the past. When it was difficult to stand up a data platform, having an out-of-the-box, perfectly customized customer data platform for business users was a big win.
Now, though, the world has changed, and companies can set up a data platform in under 30 minutes, one that not only has customer data, but also all other important company data (e.g. finance, product/users, partners, etc).
At the same time, data work has been consolidating around the modern data stack. Salesforce once tried to handle its own analytics (called Einstein Analytics). Now it has partnered with Snowflake, and Salesforce data can be piped into Snowflake just like any other data source.
The same thing has happened for most SaaS products. While internal analytics was once their upsell, they are now realizing that it makes more sense to move their data into the existing modern data ecosystem. Instead, their upsell is now syncing data to warehouses via APIs.
In this new world, data activation becomes very powerful. The modern data warehouse plus data activation will replace not only the CDP, but also all pre-built, specialized SaaS data platforms.
With the modern data stack, data is now created in specialized SaaS products and piped into storage systems like Snowflake, where it is combined with other data and transformed in the API layer. Data activation is then crucial for piping insights back into the source SaaS systems where business users do their daily work.
For example, Snowflake acquired Streamlit, which allows people to create pre-built apps and templates on top of Snowflake. Rather than developing their own analytics or relying on CDPs, tools like Salesforce can now let their customers sync data to Snowflake and use a pre-built Salesforce app to analyze the data or do custom actions (like cleaning a lead list with Clearbit) with one click. The result is the customization and user-friendliness of a CDP, combined with the power of modern cloud compute.

The data mesh came from Zhamak Dehghani: first with two blogs in 2019, and then with her O’Reilly book in 2022.
The shortest summary: treat data as a product, not a by-product. By driving data product thinking and applying domain-driven design to data, you can unlock significant value from your data. Data should be owned by those who know it best.
There are four pillars to the data mesh:
- Domain-oriented data decentralization: Rather than letting data live in a central data warehouse or lake, companies should move data closer to the people who know it best. The marketing team should own website data, RevOps should own finance data, and so on. Each domain would be responsible for its data pipelines, documentation, quality, and so on, with support from a centralized data team.
- Data as a product: Data teams should focus on building reusable, reproducible assets (with fundamental product elements like SLAs) rather than getting stuck in the “service trap” of ad-hoc work.
- Self-service data infrastructure: Rather than one central data platform, companies should have a flexible data infrastructure platform where each data team can create and consume its own data products.
- Federated computational governance: Data assets need to work together even when data is distributed. While domain owners should have autonomy over their data and its localized standards, there should also be a central “federation” of data leaders to create global rules and ensure the company’s data is healthy.
The data mesh was everywhere in 2021. In 2022, it started to move from abstract idea to reality.
The data mesh conversation has shifted from “What is it?” to “How can we implement it?” As real user stories grew in places like the Data Mesh Learning Community, the implementation debate split into two theories:
- Via team structures: Distributed, domain-based data teams are responsible for publishing data products, with support and infrastructure from a central data platform team.
- Via “data as a product”: Data teams are responsible for creating data products, i.e. pushing data governance to the “left”, closer to data producers rather than consumers.
Meanwhile, companies have started branding themselves around the data mesh. So far, we’ve seen this with Starburst, Databricks, Oracle, Google Cloud, Dremio, Confluent, Denodo, Soda, lakeFS, and K2 View, among others.

Four years after it was created, we’re still in the early stages of the data mesh.
Though more people now believe in the concept, there’s a lack of real operational guidance about how to achieve a data mesh. Data teams are still figuring out what it means to implement the data mesh, and the mesh tooling stack is still immature. While there’s been a lot of rebranding, we still don’t have a best-in-class reference architecture of how a data mesh can be achieved.
In 2023, we think that the first wave of data mesh “implementations” will go live, with “data as a product” front and center.
This year, we’ll start seeing more and more real data mesh architectures: not the aspirational diagrams that have been floating around data blogs for years, but real architectures from real companies.
We also expect that the data world will start to converge on a best-in-class reference architecture and implementation strategy for the data mesh. This will include the following core components:
- Metadata platform that can integrate into developer workflows (e.g. Atlan’s APIs and GitHub integration)
- Data quality and testing (e.g. Great Expectations, Monte Carlo)
- Git-like process for data producers to incorporate testing, metadata management, documentation, etc. (e.g. dbt)
- All built around the same central data warehouse/lakehouse layer (e.g. Snowflake, Databricks)

One of our big trends from last year, data observability has held its own and continued to grow alongside adjacent ideas like data quality and reliability.
All of these categories have grown significantly over the last year with existing companies getting bigger, new companies going mainstream, and new tools launching every month.
For example, in company news, Databand was acquired by IBM in July 2022. There were also some major Series Ds (Cribl with $150M, Monte Carlo with $135M, Unravel with $50M) and Series Bs (Edge Delta with $63M, Manta with $35M) in this space.
In tooling news, Kensu launched a data observability solution, Anomalo launched the Pulse dashboard for data quality, Monte Carlo created a data reliability dashboard, Bigeye launched Metadata Metrics, AWS introduced observability features into AWS Glue 4.0, and Entanglement spun out another company focused on data observability.
In the thought leadership space, Monte Carlo and Kensu published major books with O’Reilly about data quality and observability.
In a notable change, this space also saw significant open-source growth in 2022.
Datafold launched an open-source diff tool, Acceldata open-sourced its data platform and data observability libraries, and Soda launched both its open-source Soda Core and enterprise Soda Cloud platforms.

One of our open questions in last year’s report was where data observability was heading: towards its own category, or merging with another category like data reliability or active metadata.
We think that data observability and quality will converge in a larger “data reliability” category centered around ensuring high-quality data.
This may seem like a big change, but it wouldn’t be the first time this category has changed. It’s been trying to settle on a name for several years.
Acceldata started with logs observability but now brands itself as a data observability tool. After starting in the data quality space, Soda is now a major player in data observability. Datafold started with data diffs, but now calls itself a data reliability platform. The list goes on and on.
As these companies compete to define and own the category, we’ll continue to see more confusion in the short term. However, we’re seeing early signs that this will start to settle down into one category in the near future.

It feels interesting to welcome 2023 as data practitioners. While there’s a lot of uncertainty looming in the air (uncertainty is the new certainty!), we’re also a bit relieved.
2021 and 2022 were absurd years in the history of the data stack.
The hype was crazy, new tools were launching every day, data people were constantly being poached by data startups, and VCs were throwing money at every data practitioner who even hinted at building something. The “modern data stack” was finally cool, and the data world had all the money and support and acknowledgment it needed.
At Atlan, we started as a data team ourselves. As people who have been in data for over a decade, this was a wild time. Progress is generally made in decades, not years. But in the last three years, the modern data stack has grown and matured as much as in the decade before.
It was exciting… yet we ended up asking ourselves existential questions more than once. Is this modern data stack thing real, or is it just hype fueled by VC money? Are we living in an echo chamber? Where are the data practitioners in this whole thing?
While this hype and frenzy led to great tooling, it was ultimately bad for the data world.
Faced with a sea of buzzwords and products, data buyers often ended up confused and could spend more time trying to get the right stack than actually using it.
Let’s be clear: the goal of the data space is ultimately to help companies leverage data. Tools are important for this. But they’re ultimately an enabler, not the goal.
As this hype starts to die down and the modern data stack starts to stabilize, we have the chance to take the tooling progress we’ve made and translate it into real business value.
We’re at a point where data teams aren’t fighting to set up the right infrastructure. With the modern data stack, setting up a data ecosystem is quicker and easier than ever. Instead, data teams are fighting to prove their worth and get more results with less time and fewer resources.
Now that companies can’t just throw money around, their decisions need to be targeted and data-driven. This means that data is more important than ever, and data teams are in a unique position to provide real business value.
But to make this happen, data teams need to finally figure out this “value” question.
Now that we’ve got the modern data stack down, it’s time to figure out the modern data culture stack. What does a great data team look like? How should it work with the business? How can it drive the most impact in the least time?
These are tough questions, and there won’t be any quick fixes. But if we can crack the secrets to a better data culture, we can finally create dream data teams: ones that will not just help their companies survive during the next 12–18 months, but propel them to new heights in the coming decades.
Ready for spicy takes on these trends? We’re hosting a panel of data superstars (Bob Muglia, Barr Moses, Benn Stancil, Douglas Laney, and Tristan Handy) to discuss the future of data in 2023. Save your spot for the next Great Data Debate.
This content was co-written with Christine Garcia (Director of Content).
Header picture: Nicholas Cappello on Unsplash