Remove 2022 Remove Data Lake Remove Data Transformation
article thumbnail

Perform upserts in a data lake using Amazon Athena and Apache Iceberg

AWS Big Data

Amazon Athena supports the MERGE command on Apache Iceberg tables, which allows you to perform inserts, updates, and deletes in your data lake at scale using familiar SQL statements that are compliant with ACID (Atomic, Consistent, Isolated, Durable). Create a table to point to the CDC data. Upload 20220922-184314489.csv

article thumbnail

Introducing simplified interaction with the Airflow REST API in Amazon MWAA

AWS Big Data

result = client.trigger_run( dag_id="any_bash_command", conf={ "command": "echo 'Hello from external trigger!'" Let’s expand the use case to run your data pipeline and perform extract, transform, and load (ETL) jobs when a new file lands in an Amazon Simple Storage Service (Amazon S3) bucket in your data lake.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Build and manage your modern data stack using dbt and AWS Glue through dbt-glue, the new “trusted” dbt adapter

AWS Big Data

dbt is an open source, SQL-first templating engine that allows you to write repeatable and extensible data transforms in Python and SQL. dbt is predominantly used by data warehouses (such as Amazon Redshift ) customers who are looking to keep their data transform logic separate from storage and engine.

Data Lake 122
article thumbnail

Migrate Amazon Redshift from DC2 to RA3 to accommodate increasing data volumes and analytics demands

AWS Big Data

These processes retrieve data from around 90 different data sources, resulting in updating roughly 2,000 tables in the data warehouse and 3,000 external tables in Parquet format, accessed through Amazon Redshift Spectrum and a data lake on Amazon Simple Storage Service (Amazon S3). We started with 115 dc2.large

Data Lake 111
article thumbnail

Databricks’ Data+AI Summit 2022: A Show of Partner “Unity”

Alation

Are your data users overwhelmed by silos and frustrated by untrusted data? Tell them to grab a catalog … and go jump in a lake. That was the message — delivered a little more elegantly than that — at Databricks’ Data+AI Summit 2022. Automatically tracking data lineage across queries executed in any language.

ROI 52
article thumbnail

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

To bring their customers the best deals and user experience, smava follows the modern data architecture principles with a data lake as a scalable, durable data store and purpose-built data stores for analytical processing and data consumption.

Data Lake 115
article thumbnail

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines. No more lock-in, unnecessary data transformations, or data movement across tools and clouds just to extract insights out of the data.