This is both frustrating for companies that would prefer to make ML an ordinary, fuss-free, value-generating function like software engineering, and exciting for vendors who see an opportunity to create buzz around a new category of enterprise software. All ML projects are software projects.
A high hurdle many enterprises have yet to overcome is accessing mainframe data via the cloud. Connecting mainframe data to the cloud also has financial benefits, as it leads to lower mainframe CPU costs by leveraging cloud computing for data transformations.
With data becoming the driving force behind many industries today, having a modern data architecture is pivotal for organizations to be successful. In this post, we describe Orca’s journey building a transactional data lake using Amazon Simple Storage Service (Amazon S3), Apache Iceberg, and AWS Analytics.
The DataFrame code generation now extends beyond AWS Glue DynamicFrame to support a broader range of data processing scenarios. Your generated jobs can use a variety of data transformations, including filters, projections, unions, joins, and aggregations, giving you the flexibility to handle complex data processing requirements.
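To make this concrete, here is a minimal PySpark sketch of the kind of DataFrame job such generation can produce, combining a filter, a join, a projection, and an aggregation; the bucket, table, and column names are hypothetical.

```python
# Minimal sketch of a generated-style PySpark job: filter, join, project, aggregate.
# Bucket, table, and column names (orders, customers, region, amount) are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("generated-transform").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

result = (
    orders
    .filter(F.col("status") == "COMPLETED")              # filter
    .join(customers, on="customer_id", how="inner")      # join
    .groupBy("region")                                   # aggregation
    .agg(F.sum("amount").alias("total_amount"))
    .select("region", "total_amount")                    # projection
)

result.write.mode("overwrite").parquet("s3://example-bucket/sales-by-region/")
```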
Amazon Q Developer can also help you connect to third-party, software as a service (SaaS), and custom sources. Amazon Q Developer can now generate complex data integration jobs with multiple sources, destinations, and data transformations.
With the ability to browse metadata, you can understand the structure and schema of the data source, identify relevant tables and fields, and discover useful data assets you may not be aware of.
To enable this use case, we used the BMW Group’s cloud-native data platform called the Cloud Data Hub. In 2019, the BMW Group decided to re-architect and move its on-premises data lake to the AWS Cloud to enable data-driven innovation while scaling with the dynamic needs of the organization.
With Amazon EMR 6.15, we launched AWS Lake Formation based fine-grained access controls (FGAC) on Open Table Formats (OTFs), including Apache Hudi, Apache Iceberg, and Delta Lake. Many large enterprise companies seek to use their transactional data lake to gain insights and improve decision-making.
These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large nodes.
With exponentially growing data sources and data lakes, customers want to run more data integration workloads, including their most demanding transforms, aggregations, joins, and queries. For workloads such as data transforms, joins, and queries, you can use G.1X (1 DPU) and G.2X (2 DPU) workers.
In this post, we show you how to establish the data ingestion pipeline between Google Analytics 4, Google Sheets, and an Amazon Redshift Serverless workgroup. With Amazon AppFlow, you can run data flows at nearly any scale and at the frequency you choose: on a schedule, in response to a business event, or on demand.
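As an illustration of the on-demand trigger, here is a hedged boto3 sketch that starts an existing AppFlow flow; the flow name is hypothetical, and the flow itself (for example, Google Analytics 4 to Redshift Serverless) is assumed to have been configured beforehand.

```python
# Hedged sketch: triggering an existing Amazon AppFlow flow on demand with boto3.
# The flow name is hypothetical; the flow is assumed to already exist in AppFlow.
import boto3

appflow = boto3.client("appflow", region_name="us-east-1")

response = appflow.start_flow(flowName="ga4-to-redshift-daily")  # hypothetical name
print(response["flowStatus"])  # e.g., the run status reported by the service
```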
We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format developed through the Apache Software Foundation, and it helps users avoid vendor lock-in. We selected change data capture as our first use case on Iceberg.
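As a sketch of what change data capture on Iceberg can look like, the following PySpark snippet applies a batch of changes with MERGE INTO; it assumes the Iceberg Spark runtime and a catalog named demo are configured, and the table, column, and operation-flag names are hypothetical.

```python
# Illustrative sketch: applying CDC changes to an Iceberg table with MERGE INTO.
# Assumes the Iceberg Spark runtime and a catalog named "demo" are configured;
# table, column, and operation-flag names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-cdc").getOrCreate()

spark.sql("""
    MERGE INTO demo.db.customers t
    USING demo.db.customers_changes s
    ON t.customer_id = s.customer_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```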
dbt allows data teams to produce trusted data sets for reporting, ML modeling, and operational workflows using SQL, with a simple workflow that follows software engineering best practices like modularity, portability, and continuous integration/continuous delivery (CI/CD).
Since the 2019 State of DevOps report published the DORA metrics, it has been well documented that with DevOps, companies can deploy software 208 times more often and 106 times faster, recover from incidents 2,604 times faster, and release 7 times fewer defects. For users who require a unified view of software quality, this is unacceptable.
This allows data analysts and data scientists to rapidly construct the necessary data preparation steps to meet their business needs. We use the new data preparation authoring capabilities to create recipes that meet our specific business needs for data transformations.
In today’s data-driven world, the ability to seamlessly integrate and utilize diverse data sources is critical for gaining actionable insights and driving innovation. Refer to Editing AWS Glue managed data transform nodes for more information.
Observability in DataOps refers to the ability to monitor and understand the performance and behavior of data-related systems and processes, and to use that information to improve the quality and speed of data-driven decision making. Query> Is DataOps something that can be solved with software or is it more of a people process?
Efficiency: Data transformation tasks that previously took weeks or months can now be accomplished within minutes. Cost efficiency: Building and maintaining custom connectors can be expensive.
In the era of data, organizations are increasingly using data lakes to store and analyze vast amounts of structured and unstructured data. Data lakes provide a centralized repository for data from various sources, enabling organizations to unlock valuable insights and drive data-driven decision-making.
How to scale AI and ML with built-in governance: A fit-for-purpose data store built on an open lakehouse architecture allows you to scale AI and ML while providing built-in governance tools. A data store lets a business connect existing data with new data and discover new insights with real-time analytics and business intelligence.
It seamlessly integrates with Amazon RDS for Db2, watsonx.data SaaS, and other IBM and AWS services like IBM data fabric, Amazon S3, Amazon EMR, and AWS Glue. This allows you to scale all analytics and AI workloads across the enterprise with trusted data.
IBM software products are embedding watsonx capabilities across digital labor, IT automation, security, sustainability, and application modernization to help unlock new levels of business value for clients. Our approach to an open data lakehouse architecture combines the best of IBM with the best of open source.
The reasons for this are simple: before you can start analyzing data, huge datasets like data lakes must be modeled or transformed to be usable. According to a recent survey conducted by IDC, 43% of respondents were drawing intelligence from 10 to 30 data sources in 2020, with a jump to 64% in 2021!
Amazon DataZone now supports authentication through the Amazon Athena JDBC driver, allowing data users to seamlessly query their subscribed data lake assets via popular business intelligence (BI) and analytics tools like Tableau, Power BI, Excel, SQL Workbench, DBeaver, and more.
In the world of software engineering and development, organizations use project management tools like Atlassian Jira Cloud. Companies often take a data lake approach to their analytics, bringing data from many different systems into one place to simplify how the analytics are done.
The Amazon EMR Flink CDC connector reads the binlog data and processes the data, and transformed data can be stored in Amazon S3. We use the AWS Glue Data Catalog to store metadata such as table schema and table location.
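The post uses the Amazon EMR Flink CDC connector; as a rough, self-contained sketch of the same pattern, the PyFlink SQL below reads a MySQL binlog through the open source mysql-cdc connector and writes to a print sink (a production job, as described above, would sink to an upsert-capable table on Amazon S3). Hostnames, credentials, and table names are hypothetical, and the connector jar is assumed to be on the classpath.

```python
# Hedged sketch: consuming a MySQL binlog via the Flink CDC connector in PyFlink SQL.
# Assumes the flink-sql-connector-mysql-cdc jar is on the classpath; hostnames,
# credentials, and table names are hypothetical.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        order_id BIGINT,
        amount DECIMAL(10, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'mysql-cdc',
        'hostname' = 'mysql.example.internal',
        'port' = '3306',
        'username' = 'flink',
        'password' = '********',
        'database-name' = 'shop',
        'table-name' = 'orders'
    )
""")

# The print sink accepts the full changelog (inserts, updates, deletes), making it
# a safe stand-in here for an upsert-capable lake sink such as Iceberg or Hudi on S3.
t_env.execute_sql("""
    CREATE TABLE orders_sink (
        order_id BIGINT,
        amount DECIMAL(10, 2)
    ) WITH ('connector' = 'print')
""")

t_env.execute_sql("INSERT INTO orders_sink SELECT order_id, amount FROM orders_cdc").wait()
```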
dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouse customers (such as Amazon Redshift users) who are looking to keep their data transform logic separate from storage and engine.
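For flavor, here is a hedged sketch of a dbt Python model (plain SQL models are the more common case); it assumes an upstream model named raw_orders and an adapter, such as dbt-spark, whose DataFrames support bracket-style column access.

```python
# Hedged sketch of a dbt Python model. Assumes an upstream model named
# "raw_orders" (hypothetical) and an adapter whose DataFrames (e.g., PySpark
# via dbt-spark) support filter() with a column expression.
def model(dbt, session):
    # dbt.ref resolves the dependency graph and returns a DataFrame
    orders = dbt.ref("raw_orders")

    # The transform logic lives in the dbt project, separate from the
    # warehouse that stores the data and executes the query.
    return orders.filter(orders["status"] == "COMPLETED")
```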
Data lakes have been around for well over a decade now, supporting the analytic operations of some of the largest world corporations. Such data volumes are not easy to move, migrate, or modernize. The challenges of a monolithic data lake architecture: Data lakes are, at a high level, single repositories of data at scale.
In response, Lenovo launched a new line of entry-level gaming laptops and desktops it now brands as Lenovo LOQ that caters to a new gamer’s first foray into gaming, says Girish Hoogar, global head of engineering for Lenovo’s cloud and software business in its Intelligent Devices Group.
To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.
The company started its New Analytics Era initiative by migrating its data from outdated SQL servers to a modern AWS data lake. It then built a cutting-edge cloud-based analytics platform, designed with an innovative data architecture. There is no more waiting around for quality data.
It comprises commodity cloud object storage, open data and open table formats, and high-performance open-source query engines. To help organizations scale AI workloads, we recently announced IBM watsonx.data, a data store built on an open data lakehouse architecture and part of the watsonx AI and data platform.
For many organizations, a centralized data platform will fall short as it gives data teams much less autonomy over managing increasingly diverse and voluminous datasets. A centralized data engineering team focuses on building a governed self-service infrastructure, while domain teams use the services to build full-stack data products.
Second, organizations still need transformations like cleansing, deduplication, and combining datasets for analysis and machine learning (ML). For these, AWS Glue provides fast, scalable data transformation.
Automatically tracking data lineage across queries executed in any language. To ensure you can deliver on this world-changing vision of data, Alation helps you maximize the value of your data lake with integrations to the Unity Catalog. The Power of Partnership to Accelerate Data Transformation.
Showpad also struggled with data quality issues around consistency and ownership, and with insufficient data access across its targeted user base due to a complex BI access process, licensing challenges, and insufficient education. The company also used the opportunity to reimagine its data pipeline and architecture.
Most companies have adopted a diverse set of software as a service (SaaS) platforms to support various applications. The rapid adoption has enabled them to quickly streamline operations, enhance collaboration, and gain more accessible, scalable solutions for managing their critical data and workflows.
We use the built-in features of Data Firehose, including AWS Lambda for necessary data transformation and Amazon Simple Notification Service (Amazon SNS) for near real-time alerts. APIs act as the entry point for applications to access data, business logic, or functionality from your backend services.
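A transformation Lambda for Data Firehose follows a documented record contract: each input record carries a recordId and base64-encoded data, and each output record must return the same recordId with a result of Ok, Dropped, or ProcessingFailed. Here is a minimal Python sketch; the payload fields are hypothetical.

```python
# Minimal sketch of a Data Firehose transformation Lambda. The record envelope
# (recordId, base64 data, result) follows the Firehose-Lambda contract; the
# payload fields (status, debug_info) are hypothetical.
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Hypothetical transformation: normalize a field and drop internal keys.
        payload["status"] = payload.get("status", "").upper()
        payload.pop("debug_info", None)

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```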
To optimize their security operations, organizations are adopting modern approaches that combine real-time monitoring with scalable data analytics. They are using data lake architectures and Apache Iceberg to efficiently process large volumes of security data while minimizing operational overhead.
The key components of a data pipeline are typically: Data Sources: The origin of the data, such as a relational database, data warehouse, data lake, file, API, or other data store. This can include tasks such as data ingestion, cleansing, filtering, aggregation, or standardization.
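As a toy end-to-end illustration of these stages, the pandas sketch below ingests from a file source, then cleanses, filters, standardizes, and aggregates; the file and column names are hypothetical.

```python
# Illustrative pipeline-stage sketch: ingest, cleanse, filter, standardize,
# aggregate, and load. File and column names are hypothetical.
import pandas as pd

df = pd.read_csv("events.csv")                       # ingestion from a file source

df = df.drop_duplicates()                            # cleansing: remove duplicates
df = df.dropna(subset=["user_id"])                   # cleansing: drop incomplete rows
df = df[df["event_type"] == "purchase"]              # filtering
df["country"] = df["country"].str.upper()            # standardization

daily = df.groupby("event_date")["amount"].sum()     # aggregation
daily.to_csv("daily_purchases.csv")                  # load into a downstream store
```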
This field guide to data mapping will explore how data mapping connects volumes of data for enhanced decision-making. Why Data Mapping is Important: Data mapping is a critical element of any data management initiative, such as data integration, data migration, data transformation, data warehousing, or automation.
Trino allows users to run ad hoc queries across massive datasets, making real-time decision-making a reality without needing extensive data transformations. This is particularly valuable for teams that require instant answers from their data. Data lake analytics: Trino doesn’t just stop at databases.
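For example, an ad hoc query can be issued from Python with the trino client package; the host, catalog, schema, and table names below are hypothetical.

```python
# Hedged sketch: running an ad hoc Trino query from Python with the "trino"
# client package. Host, catalog, schema, and table names are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="sales",
)

cur = conn.cursor()
cur.execute("""
    SELECT region, sum(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
    LIMIT 10
""")
for region, total in cur.fetchall():
    print(region, total)
```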