2022, Data Warehouse and Metadata

Biggest Trends in Data Visualization Taking Shape in 2022

Smart Data Collective

OCTOBER 13, 2021

Some solutions provide read and write access to any type of source and information, advanced integration, security capabilities and metadata management that help achieve virtual and high-performance Data Services in real-time, cache or batch mode. How does Data Virtualization complement Data Warehousing and SOA Architectures?

Visualization

Visualization Cost-Benefit Big Data Prescriptive Analytics

Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs

Cloudera

APRIL 3, 2023

In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow up blogs for other data services. Iceberg basics Iceberg is an open table format designed for large analytic workloads.

Data Warehouse

Data Warehouse Snapshot Metadata Cost-Benefit

AWS Lake Formation 2022 year in review

AWS Big Data

JANUARY 31, 2023

In this post, we are excited to summarize the features that the AWS Glue Data Catalog, AWS Glue crawler, and Lake Formation teams delivered in 2022. Whether you are a data platform builder, data engineer, data scientist, or any technology leader interested in data lake solutions, this post is for you.

Data Lake

Data Lake Data Governance Data Architecture Machine Learning

Cloudera Named a Leader in the 2022 Gartner® Magic Quadrant™ for Cloud Database Management Systems (DBMS)

Cloudera

DECEMBER 16, 2022

We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Cloudera has long had the capabilities of a data lakehouse, if not the label. 4-Ready for modern data fabric architectures.

Management

Management Metadata Machine Learning Data Lake

What is a data architect? Skills, salaries, and how to become a data framework master

CIO Business Intelligence

OCTOBER 13, 2023

Data architect Armando Vázquez identifies eight common types of data architects: Enterprise data architect: These data architects oversee an organization’s overall data architecture, defining data architecture strategy and designing and implementing architectures. Are data architects in demand?

Data Architecture

Data Architecture Data Warehouse Statistics Visualization

Near-real-time analytics using Amazon Redshift streaming ingestion with Amazon Kinesis Data Streams and Amazon DynamoDB

AWS Big Data

JULY 27, 2023

Amazon Redshift is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, easy, and secure analytics at scale. Tens of thousands of customers rely on Amazon Redshift to analyze exabytes of data and run complex analytical queries, making it a widely used cloud data warehouse.

Data Warehouse

Data Warehouse Analytics Metadata Dashboards

Accelerate HiveQL with Oozie to Spark SQL migration on Amazon EMR

AWS Big Data

APRIL 19, 2023

Many customers run big data workloads such as extract, transform, and load (ETL) on Apache Hive to create a data warehouse on Hadoop. We split the solution into two primary components: generating Spark job metadata and running the SQL on Amazon EMR. The script generates a metadata JSON file for each step.

Metadata

Metadata Data Lake Testing Consulting

Use Apache Iceberg in your data lake with Amazon S3, AWS Glue, and Snowflake

AWS Big Data

APRIL 3, 2024

Data engineers use Apache Iceberg because it’s fast, efficient, and reliable at any scale and keeps records of how datasets change over time. Apache Iceberg offers integrations with popular data processing frameworks such as Apache Spark, Apache Flink, Apache Hive, Presto, and more.

Data Lake

Data Lake Snapshot Metadata Data Architecture

The Future of the Data Lakehouse – Open

CIO Business Intelligence

JUNE 23, 2022

These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

7 enterprise data strategy trends

CIO Business Intelligence

NOVEMBER 22, 2022

External data sharing gets strategic Data sharing between business partners is becoming far easier and much more cooperative, observes Mike Bechtel, chief futurist at business advisory firm Deloitte Consulting. The fabric, especially at the active metadata level, is important, Saibene notes.

Data Strategy

Data Strategy Strategy Enterprise Consulting

Use Apache Iceberg in a data lake to support incremental data processing

AWS Big Data

MARCH 2, 2023

Apache Iceberg is an open table format for very large analytic datasets, which captures metadata information on the state of datasets as they evolve and change over time. Iceberg has become very popular for its support for ACID transactions in data lakes and features like schema and partition evolution, time travel, and rollback.

Data Lake

Data Lake Data Processing Metadata Snapshot

Achieve your AI goals with an open data lakehouse approach

IBM Big Data Hub

OCTOBER 4, 2023

Why does AI need an open data lakehouse architecture? from 2022 to 2026. Another IDC study showed that while 2/3 of respondents reported using AI-driven data analytics, most reported that less than half of the data under management is available for this type of analytics. All of this supports the use of AI.

Data Lake

Data Lake Metadata Data Warehouse Cost-Benefit

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Data Lake

Data Lake Data Warehouse Machine Learning Data-driven

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up self-service analytics and foster innovation based on data, a solution was needed that allows any team to create data products on its own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.

Data Lake

Data Lake Data Warehouse Data-driven B2B

Please vote before May 11! 2022 DBTA Reader’s Choice Awards

erwin

APRIL 27, 2022

Please help us keep our #1 position in 2022. In data warehousing, the data is extracted and transported from production database(s) into a data warehouse for reporting and analysis. Best Data Modeling Solution (erwin Data Modeler). Read more about erwin® Data Modeler by Quest.

Data Governance

Data Governance Data Warehouse Metadata Digital Transformation

BMW Cloud Efficiency Analytics powered by Amazon QuickSight and Amazon Athena

AWS Big Data

NOVEMBER 15, 2023

Bayerische Motoren Werke AG (BMW) is a motor vehicle manufacturer headquartered in Germany with 149,475 employees worldwide and the profit before tax in the financial year 2022 was € 23.5 Data providers and consumers are the two fundamental users of a CDH dataset. The difference lies in when and where data transformation takes place.

Dashboards

Dashboards Analytics Metadata Data Warehouse

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Data Warehouse Data Architecture Metadata

Augmented data management: Data fabric versus data mesh

IBM Big Data Hub

APRIL 27, 2022

Gartner defines a data fabric as “a design concept that serves as an integrated layer of data and connecting processes. The data fabric architectural approach can simplify data access in an organization and facilitate self-service data consumption at scale. 2 “Exposing The Data Mesh Blind Side ” Forrester.

Management

Management Metadata Data Architecture Data Lake

The Modern Data Lakehouse: An Architectural Innovation

Cloudera

SEPTEMBER 9, 2022

This is the promise of the modern data lakehouse architecture. analyst Sumit Pal, in “Exploring Lakehouse Architecture and Use Cases,” published January 11, 2022: “Data lakehouses integrate and unify the capabilities of data warehouses and data lakes, aiming to support AI, BI, ML, and data engineering on a single platform.”

Metadata

Metadata Machine Learning Unstructured Data Data Lake

Regeneron turns to IT to accelerate drug discovery

CIO Business Intelligence

NOVEMBER 4, 2022

MetaBio, which received a 2022 CIO 100 Award, provides a single source for datasets in a unified format, enabling researchers to quickly extract information about various therapeutic functions without having to worry about how to prepare or find the data. At the data pipeline level, scientists use Apigee, Airflow, NiFi, and Kafka.

Data Lake

Data Lake IT Experimentation Data-driven

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

This view is used to identify patterns and trends in customer behavior, which can inform data-driven decisions to improve business outcomes. In 2022, AWS commissioned a study conducted by the American Productivity and Quality Center (APQC) to quantify the Business Value of Customer 360.

Data Strategy

Data Strategy Strategy Data Warehouse Prescriptive Analytics

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

NOVEMBER 29, 2023

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse (such as Amazon Redshift) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake

Data Lake Management Metrics Data Warehouse

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

A recent VentureBeat article, “4 AI trends: It’s all about scale in 2022 (so far),” highlighted the importance of scalability. But it isn’t just aggregating data for models. Data needs to be prepared and analyzed. Different data types need different types of analytics – real-time, streaming, operational, data warehouses.

Data Science

Data Science Snapshot Data Warehouse Metadata

Implement tag-based access control for your data lake and Amazon Redshift data sharing with AWS Lake Formation

AWS Big Data

JULY 21, 2023

This leads to having data across many instances of data warehouses and data lakes using a modern data architecture in separate AWS accounts. We recently announced the integration of Amazon Redshift data sharing with AWS Lake Formation. See Managing LF-Tags for metadata access control for more details.

Data Lake

Data Lake Data Warehouse Marketing Management

10 Years Later: Who’s the GOAT of Data Catalogs?

Alation

DECEMBER 15, 2022

June 2017: Dresner Advisory Services names Alation the #1 data catalog in its inaugural Data Catalog End-User Market Study. August 2017: Alation debuts as a leader in the Gartner MQ for Metadata Management Solutions. August 2018: Gartner names Alation a 2X Leader in the MQ for Metadata Management Solutions.

Metadata

Metadata Data Governance Data Quality Marketing

Choosing an open table format for your transactional data lake on AWS

AWS Big Data

JUNE 9, 2023

A modern data architecture enables companies to ingest virtually any type of data through automated pipelines into a data lake, which provides highly durable and cost-effective object storage at petabyte or exabyte scale. Metadata tables eliminate slow S3 file listing operations.

Data Lake

Data Lake Metadata Statistics Optimization

Build a real-time GDPR-aligned Apache Iceberg data lake

AWS Big Data

FEBRUARY 24, 2023

AWS contributed the Apache Iceberg integration with the AWS Glue Data Catalog, which enables you to use open-source data computation engines like Apache Spark with Iceberg on AWS Glue. In 2022, Amazon Athena announced support of Iceberg, enabling transaction queries on S3 objects. Choose Add database.

Data Lake

Data Lake Metadata Testing Data Warehouse

The Very Group adopts a data catalog to better organize and leverage its online retail capabilities

CIO Business Intelligence

SEPTEMBER 6, 2022

As a result, Pimblett now runs the organization’s data warehouse, analytics, and business intelligence. Establishing a clear and unified approach to data. In a first test of the technology, he used Alation to catalog a subset of Very’s data held in an old Teradata database. We’re a Power BI shop,” he says. “I

IT

IT Forecasting Data Lake Data Warehouse

Fabrics, Meshes & Stacks, oh my! Q&A with Sanjeev Mohan

Alation

AUGUST 11, 2022

The data warehouse and analytical data stores moved to the cloud and disaggregated into the data mesh. Today, the brightest minds in our industry are targeting the massive proliferation of data volumes and the accompanying but hard-to-find value locked within all that data. Architectures became fabrics.

Metadata

Metadata Data Warehouse Data Quality Data Lake

How data stores and governance impact your AI initiatives

IBM Big Data Hub

OCTOBER 12, 2023

Among the tasks necessary for internal and external compliance is the ability to report on the metadata of an AI model. Metadata includes details specific to an AI model such as: The AI model’s creation (when it was created, who created it, etc.) Learn more about IBM watsonx 1.

Cost-Benefit

Cost-Benefit Metadata Data Governance Modeling

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

July brings summer vacations, holiday gatherings, and for the first time in two years, the return of the Massachusetts Institute of Technology (MIT) Chief Data Officer symposium as an in-person event. A key area of focus for the symposium this year was the design and deployment of modern data platforms. What is a data fabric?

Data Lake

Data Lake Data Architecture Data-driven Data Warehouse

Data Mesh vs. Data Fabric: A Love Story

Alation

JANUARY 13, 2022

Spoiler alert: data fabric and data mesh are independent design concepts that are, in fact, quite complementary. Data fabric has captured most of the limelight; it focuses on the technologies required to support metadata-driven use cases across hybrid and multi-cloud environments. Gartner on Data Fabric.

Data Lake

Data Lake Metadata Data-driven Data Governance

Simplify data loading into Type 2 slowly changing dimensions in Amazon Redshift

AWS Big Data

MARCH 9, 2023

Thousands of customers rely on Amazon Redshift to build data warehouses to accelerate time to insights with fast, simple, and secure analytics at scale and analyze data from terabytes to petabytes by running complex analytical queries. Data loading is one of the key aspects of maintaining a data warehouse.

Slice and Dice

Slice and Dice Data Warehouse Metrics Metadata

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

CSP was recently recognized as a leader in the 2022 GigaOm Radar for Streaming Data Platforms report. The DevOps/app dev team wants to know how data flows between such entities and understand the key performance metrics (KPMs) of these entities. Meet Laila, a very opinionated practitioner of Cloudera Stream Processing.

Data Lake

Data Lake Manufacturing Metadata Dashboards

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

Cloudera

JANUARY 19, 2024

Log in to the Cloudera Data Warehouse service as DWAdmin. Go to the virtual warehouse tab, locate the Virtual Warehouse on which you want to enable this feature, and click “edit.” Log in to the data warehouse service as DWAdmin.

Data Warehouse

Data Warehouse Data Processing Optimization Modeling

How To Liberalize Data Access To Empower Data Users

BI-Survey

FEBRUARY 23, 2023

This confirms that the opening statement has reached the top of organizations and that the consideration and development of a data culture should be anchored in the data strategy. Global survey This study was based on the findings of a worldwide online survey conducted in July and August 2022.

Metadata

Metadata Data-driven Data Strategy Strategy

How Fifth Third Bank Democratizes Data Access via a Data Mesh with Alation and Snowflake

Alation

JUNE 7, 2022

They took their centralized architecture and are creating a decentralized, cloud-native and domain-centric data environment. The Snowflake Data Cloud serves as their central repository for data and analytics, and their Alation data catalog now provides the metadata management capabilities to all data citizens.

Metadata

Metadata Data Strategy Strategy Data Governance

Build incremental data pipelines to load transactional data changes using AWS DMS, Delta 2.0, and Amazon EMR Serverless

AWS Big Data

MARCH 3, 2023

Athena supports reading native Delta tables and therefore we can read the data successfully even though the Data Catalog shows only a single array column. If you need the individual column-level metadata to be available in the Data Catalog, run an AWS Glue crawler periodically to keep the AWS Glue metadata updated.

Data Lake

Data Lake Dashboards Metrics Metadata

Top 10 Reasons for Alation with Snowflake: Reduce Risk with Active Data Governance

Alation

SEPTEMBER 7, 2021

According to Entrepreneur, Gartner predicts, “through 2022, only 20% of organizations investing in information governance will succeed in scaling governance for digital business.” This survey result shows that organizations need a method to help them implement Data Governance at scale. Two problems arise.

Data Governance

Data Governance Risk Data Quality Dashboards

5 Data Governance Mistakes to Avoid

Alation

APRIL 25, 2023

Using bad or incorrect data can generate devastating results. between 2022 and 2029. And the rise in data valuation has been compared to that of oil during the 19th century. The comparison makes sense because, like petroleum, data has enormous potential. This is where a reverse ETL process is needed.

Data Governance

Data Governance Marketing Machine Learning Sales

Top Takeaways from the Gartner® Innovation Insight: Data Security Posture Management

Laminar Security

MAY 3, 2023

According to our recent State of Cloud Data Security Report 2023 , 77% of organizations experienced a cloud data breach in 2022. That’s particularly concerning considering that 60% of worldwide corporate data was stored in the cloud during that same period.

Management

Management Risk Risk Management Data Processing

Replacing Oracle Discoverer: The Smart Way

Jet Global

MAY 27, 2021

Internet Explorer 11 on Windows 10 support will end June 2022. While it has many advantages, it’s not built to be a transactional reporting tool for day-to-day ad hoc analysis or easy drilling into data details. Java Applets support has ended on all modern browsers. Chrome: September 2015. Firefox: September 2018. Hubble Equivalent.

Reporting

Reporting Cost-Benefit Dashboards Finance

5 Reasons to Upgrade to Latest Version of Angles for Oracle

Jet Global

JUNE 9, 2022

This integrated solution helps you unlock your enterprise data and gain actionable insights so you can act decisively in an uncertain and quickly changing world. was released in the first quarter of 2022. Seamless Integration with Cloud Data Warehouse Targets. Cloud data replication. Extend or Create New View.

Operational Reporting

Operational Reporting Reporting Data Warehouse Cost-Benefit
