Data Analytics, Data Processing and Data Warehouse

Unlocking near real-time analytics with petabytes of transaction data using Amazon Aurora Zero-ETL integration with Amazon Redshift and dbt Cloud

AWS Big Data

NOVEMBER 27, 2024

While customers can perform some basic analysis within their operational or transactional databases, many still need to build custom data pipelines that use batch or streaming jobs to extract, transform, and load (ETL) data into their data warehouse for more comprehensive analysis.

Data Warehouse, Analytics, Testing, Sales

Migrate a petabyte-scale data warehouse from Actian Vectorwise to Amazon Redshift

AWS Big Data

MAY 30, 2024

Amazon Redshift is a fast, scalable, and fully managed cloud data warehouse that allows you to process and run your complex SQL analytics workloads on structured and semi-structured data. The system had an integration with legacy backend services that were all hosted on premises.

Data Warehouse, Data Lake, Cost-Benefit, Structured Data

The future of data: A 5-pillar approach to modern data management

CIO Business Intelligence

DECEMBER 11, 2024

Manish Limaye. Pillar #1: Data platform. The data platform pillar comprises tools, frameworks, and processing and hosting technologies that enable an organization to process large volumes of data in both batch and streaming modes. The choice of vendors should align with the broader cloud or on-premises strategy.

Management, Data Governance, Data Science, Reporting

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

DataOps needs a directed graph-based workflow that contains all the data access, integration, model and visualization steps in the data analytic production process. It orchestrates complex pipelines, toolchains, and tests across teams, locations, and data centers. Amaterasu is a deployment tool for data pipelines.

Testing, Machine Learning, Consulting, Data Science

Amazon Q data integration adds DataFrame support and in-prompt context-aware job creation

AWS Big Data

DECEMBER 20, 2024

You can now generate data integration jobs for various data sources and destinations, including Amazon Simple Storage Service (Amazon S3) data lakes with popular file formats like CSV, JSON, and Parquet, as well as modern table formats such as Apache Hudi, Delta, and Apache Iceberg.

Data Integration, Visualization, Data Processing, Data Lake

Take Your SQL Skills To The Next Level With These Popular SQL Books

datapine

SEPTEMBER 27, 2022

Business leaders, developers, data heads, and tech enthusiasts – it’s time to make some room on your business intelligence bookshelf because once again, datapine has new books for you to add. We have already given you our top data visualization books, top business intelligence books, and best data analytics books.

Business Intelligence, Data Warehouse, Data Processing, Data mining

Scaling RISE with SAP data and AWS Glue

AWS Big Data

NOVEMBER 29, 2024

Customers often want to augment and enrich SAP source data with other non-SAP source data. Such analytic use cases can be enabled by building a data warehouse or data lake. Customers can now use the AWS Glue SAP OData connector to extract data from SAP.

Visualization, Data Processing, Data-driven, Cost-Benefit

How Will The Cloud Impact Data Warehousing Technologies?

Smart Data Collective

APRIL 8, 2020

The data warehousing market dates back to the 1970s, when computer scientist Bill Inmon first coined the term ‘data warehouse’. Built as on-premises servers, the early data warehouses were designed to operate at only gigabyte scale.

Technology, Data Warehouse, Big Data, Machine Learning

How EUROGATE established a data mesh architecture using Amazon DataZone

AWS Big Data

JANUARY 15, 2025

In addition to real-time analytics and visualization, the data needs to be shared for long-term data analytics and machine learning applications. The applications are hosted in dedicated AWS accounts and require a BI dashboard and reporting services based on Tableau.

IoT, Machine Learning, Metadata, Data-driven

Power analytics as a service capabilities using Amazon Redshift

AWS Big Data

APRIL 17, 2024

The AaaS model accelerates data-driven decision-making through advanced analytics, enabling organizations to swiftly adapt to changing market trends and make informed strategic choices. times better price-performance than other cloud data warehouses. Data processing jobs enrich the data in Amazon Redshift.

Data Warehouse, Analytics, Cost-Benefit, Data Processing

Build a secure data visualization application using the Amazon Redshift Data API with AWS IAM Identity Center

AWS Big Data

MARCH 6, 2025

Tens of thousands of customers use Amazon Redshift for modern data analytics at scale, delivering up to three times better price-performance and seven times better throughput than other cloud data warehouses. She has experience in product vision and strategy in industry-leading data products and platforms.

Visualization, Sales, Data Warehouse, Management

Common Business Intelligence Challenges Facing Entrepreneurs

datapine

MAY 21, 2019

“BI is about providing the right data at the right time to the right people so that they can take the right decisions” – Nic Smith. Data analytics isn’t just for the Big Guys anymore; it’s accessible to ventures, organizations, and businesses of all shapes, sizes, and sectors.

Business Intelligence, Cost-Benefit, Dashboards, ROI

Implement data warehousing solution using dbt on Amazon Redshift

AWS Big Data

NOVEMBER 17, 2023

Seeds – These are CSV files in your dbt project (typically in your seeds directory), which dbt can load into your data warehouse using the dbt seed command. This includes the host, port, database name, user name, and password. An Amazon Simple Storage Service (Amazon S3) bucket to host documentation files. project-dir.

Snapshot, Data Processing, Testing, Data Warehouse

Enable data analytics with Talend and Amazon Redshift Serverless

AWS Big Data

JULY 25, 2023

Today, to accelerate and scale data analytics, companies are looking for an approach to minimize infrastructure management and predict computing needs for different types of workloads, including spikes and ad hoc analytics. Prerequisites: To complete the integration, you need a Redshift Serverless data warehouse.

Data Analytics, Analytics, Data Warehouse, Data Processing

Design a data mesh pattern for Amazon EMR-based data lakes using AWS Lake Formation with Hive metastore federation

AWS Big Data

JUNE 10, 2024

One of the key challenges in modern big data management is facilitating efficient data sharing and access control across multiple EMR clusters. Organizations have multiple Hive data warehouses across EMR clusters, where the metadata gets generated. The producer account will host the EMR cluster and S3 buckets.

Data Lake, Metadata, Data Warehouse, Data Processing

Migrate Microsoft Azure Synapse Analytics to Amazon Redshift using AWS SCT

AWS Big Data

OCTOBER 18, 2023

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that provides the flexibility to use provisioned or serverless compute for your analytical workloads. Modern analytics is much wider than SQL-based data warehousing. Fault tolerance is built in. Any failed hardware is automatically replaced.

Analytics, Data Warehouse, Dashboards, Testing

CIOs are (still) closer than ever to their dream data lakehouse

CIO Business Intelligence

OCTOBER 15, 2024

The formats are basically abstraction layers that give business analysts and data scientists the ability to mix and match whatever data stores they need, wherever they may lie, with whatever processing engine they choose. The data itself remains intact, uncopied and unaltered. And the table formats will keep track of all of it.

Metadata, Data Processing, Uncertainty, Data Warehouse

Unlock scalability, cost-efficiency, and faster insights with large-scale data migration to Amazon Redshift

AWS Big Data

AUGUST 1, 2024

Large-scale data warehouse migration to the cloud is a complex and challenging endeavor that many organizations undertake to modernize their data infrastructure, enhance data management capabilities, and unlock new business opportunities. This makes sure the new data platform can meet current and future business goals.

Data Warehouse, KPI, Optimization, Cost-Benefit

How smava makes loans transparent and affordable using Amazon Redshift Serverless

AWS Big Data

DECEMBER 21, 2023

To speed up self-service analytics and foster data-driven innovation, smava needed a solution that lets any team create data products on its own in a decentralized manner. To create and manage the data products, smava uses Amazon Redshift, a cloud data warehouse.

Data Lake, Data Warehouse, Data-driven, B2B

Deciphering The Seldom Discussed Differences Between Data Mining and Data Science

Smart Data Collective

NOVEMBER 18, 2020

Data Science is used in different areas of our life and can help companies deal with the following situations: using predictive analytics to prevent fraud; using machine learning to streamline marketing practices; and using data analytics to create more effective actuarial processes. Where to Use Data Mining?

Data mining, Data Science, Informatics, Statistics

Use AWS Glue to streamline SFTP data processing

AWS Big Data

AUGUST 13, 2024

In this blog post, we explore how to use the SFTP Connector for AWS Glue from the AWS Marketplace to efficiently process data from Secure File Transfer Protocol (SFTP) servers into Amazon Simple Storage Service (Amazon S3), further empowering your data analytics and insights. Choose Store a new secret.

Data Processing, Visualization, Data Lake

Break data silos and stream your CDC data with Amazon Redshift streaming and Amazon MSK

AWS Big Data

DECEMBER 13, 2023

A CDC-based approach captures the data changes and makes them available in data warehouses for further analytics in real-time. usually a data warehouse) needs to reflect those changes in near real-time. This post showcases how to use streaming ingestion to bring data to Amazon Redshift.

Data Warehouse, Snapshot, Data Processing, Internet of Things

How Gilead used Amazon Redshift to quickly and cost-effectively load third-party medical claims data

AWS Big Data

NOVEMBER 8, 2023

Because Gilead is expanding into biologics and large molecule therapies, and has an ambitious goal of launching 10 innovative therapies by 2030, there is heavy emphasis on using data with AI and machine learning (ML) to accelerate the drug discovery pipeline. This data volume is expected to increase monthly and is fully refreshed each month.

Data Lake, Data Warehouse, Cost-Benefit, Optimization

Join a streaming data source with CDC data for real-time serverless data analytics using AWS Glue, AWS DMS, and Amazon DynamoDB

AWS Big Data

MAY 30, 2023

A host with the MySQL utility installed, such as an Amazon Elastic Compute Cloud (Amazon EC2) instance, AWS Cloud9, your laptop, and so on. The host is used to access an Amazon Aurora MySQL-Compatible Edition cluster that you create and to run a Python script that sends sample records to the Kinesis data stream.

Data Lake, Data Analytics, Analytics, Data Processing

The disruptive potential of open data lakehouse architectures and IBM watsonx.data

IBM Big Data Hub

JUNE 15, 2023

The data lakehouse architecture combines the flexibility, scalability and cost advantages of data lakes with the performance, functionality and usability of data warehouses to deliver optimal price-performance for a variety of data, analytics and AI workloads.

Data Warehouse, Data Lake, Optimization, Data-driven

What is business intelligence? Transforming data into business insights

CIO Business Intelligence

JANUARY 20, 2023

Improved employee satisfaction: Providing business users access to data without having to contact analysts or IT can reduce friction, increase productivity, and facilitate faster results. BI analysts use data analytics, data visualization, and data modeling techniques and technologies to identify trends.

Business Intelligence, Dashboards, Data mining, OLAP

Resolve private DNS hostnames for Amazon MSK Connect

AWS Big Data

OCTOBER 20, 2023

The connectors were only able to reference hostnames in the connector configuration or plugin that are publicly resolvable and couldn’t resolve private hostnames defined in either a private hosted zone or use DNS servers in another customer network. Many customers ensure that their internal DNS applications are not publicly resolvable.

Data Processing, Snapshot, Data Warehouse, Management

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

AWS Big Data

DECEMBER 4, 2024

These nodes can implement analytical platforms like data lake houses, data warehouses, or data marts, all united by producing data products. By treating the data as a product, the outcome is a reusable asset that outlives a project and meets the needs of the enterprise consumer.

Metadata, Data Governance, Data Quality, Data-driven

Your Effective Roadmap To Implement A Successful Business Intelligence Strategy

datapine

FEBRUARY 22, 2022

Without real-time insight into their data, businesses remain reactive, miss strategic growth opportunities, lose their competitive edge, fail to take advantage of cost savings options, don’t ensure customer satisfaction… the list goes on. This should also include creating a plan for data storage services. Ensure data literacy.

Business Intelligence, Strategy, Cost-Benefit, Dashboards

Important Considerations When Migrating to a Data Lake

Smart Data Collective

MARCH 30, 2022

Azure Data Lake Storage Gen2 is based on Azure Blob storage and offers a suite of big data analytics features. If you don’t understand the concept, you might want to check out our previous article on the difference between data lakes and data warehouses. Conclusion.

Data Lake, Cost-Benefit, Data Warehouse, Big Data

Empowering data-driven excellence: How the Bluestone Data Platform embraced data mesh for success

AWS Big Data

FEBRUARY 27, 2024

Each data producer within the organization has its own data lake in Apache Hudi format, ensuring data sovereignty and autonomy. These datasets are pivotal for reporting and analytics use cases, powered by services like Amazon Redshift and tools like Power BI.

Data-driven, Data Lake, Data Quality, Data Governance

Integrate Tableau and Okta with Amazon Redshift using AWS IAM Identity Center

AWS Big Data

JUNE 3, 2024

Amazon Redshift is a fast, scalable cloud data warehouse built to serve workloads at any scale. This integration positions Amazon Redshift as an IAM Identity Center-managed application, enabling you to use database role-based access control on your data warehouse for enhanced security. Open Tableau Desktop.

Data Warehouse, Reporting, Testing, Publishing

A Guide To Starting A Career In Business Intelligence & The BI Skills You Need

datapine

MARCH 31, 2022

On the flip side, if you enjoy diving deep into the technical side of things, with the right mix of skills for business intelligence you can work on a host of incredibly interesting problems that will keep you in flow for hours on end. This could involve anything from learning SQL to buying some textbooks on data warehouses.

Business Intelligence, Statistics, Visualization, Data-driven

Real-time streaming data top picks you cannot miss at AWS re:Invent 2023

AWS Big Data

NOVEMBER 8, 2023

Fast-track streaming ETL with AWS streaming data services: Learn how to build streaming data pipelines across data lakes and data warehouses. Learn best practices for performance, scale, and cost control in Amazon Kinesis Data Streams, Amazon MSK, Amazon Redshift streaming ingestion, and AWS Glue streaming.

Data-driven, Machine Learning, Data Lake, Cost-Benefit

Create an end-to-end data strategy for Customer 360 on AWS

AWS Big Data

MARCH 26, 2024

The AWS modern data architecture shows a way to build a purpose-built, secure, and scalable data platform in the cloud. Learn from this to build querying capabilities across your data lake and the data warehouse. About the Authors: Ismail Makhlouf is a Senior Specialist Solutions Architect for Data Analytics at AWS.

Data Strategy, Strategy, Data Warehouse, Prescriptive Analytics

Setting up and Getting Started with Cloudera’s New SQL AI Assistant

Cloudera

JANUARY 19, 2024

Supported AI models and services: The SQL AI Assistant is not bundled with a specific LLM; instead, it supports various LLMs and hosting services. The model can run locally, be hosted on CML infra or in the infrastructure of a trusted service provider. Log in to the Cloudera Data Warehouse service as DWAdmin.

Data Warehouse, Data Processing, Optimization, Modeling

Architectural Patterns for real-time analytics using Amazon Kinesis Data Streams, Part 2: AI Applications

AWS Big Data

MAY 28, 2024

The ingested data is transformed and analyzed in near real time using Amazon Managed Service for Apache Flink. Stream data can further be enriched using lookup data hosted in a data warehouse such as Amazon Redshift. Shwetha Radhakrishnan is a Solutions Architect for AWS with a focus on Data Analytics.

IoT, Analytics, Dashboards, Data-driven

Run Apache Hive workloads using Spark SQL with Amazon EMR on EKS

AWS Big Data

OCTOBER 18, 2023

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale. Spark SQL is an Apache Spark module for structured data processing. host') export PASSWORD=$(aws secretsmanager get-secret-value --secret-id $secret_name --query SecretString --output text | jq -r '.password')

Big Data, Data Processing, Interactive, Testing

Orchestrate an end-to-end ETL pipeline using Amazon S3, AWS Glue, and Amazon Redshift Serverless with Amazon MWAA

AWS Big Data

APRIL 25, 2024

As the queries finish running, an UNLOAD operation is invoked from the Redshift data warehouse to the S3 bucket in Account A. Cross-account access has been set up between S3 buckets in Account A with resources in Account B to be able to load and unload data. role_arn={5}&database={6}&region={7}'.format(conn_type,

Metadata, Data Processing, Management, Testing

Attribute Amazon EMR on EC2 costs to your end-users

AWS Big Data

AUGUST 27, 2024

About the Authors: Raj Patel is an AWS Lead Consultant for Data Analytics solutions based in India. He specializes in building and modernising analytical solutions. His background is in data warehouse and data lake architecture, development, and administration.

Metrics, Dashboards, Data Lake, Optimization

Integrate Tableau and Microsoft Entra ID with Amazon Redshift using AWS IAM Identity Center

AWS Big Data

SEPTEMBER 3, 2024

Amazon Redshift and Tableau empower data analysis. Amazon Redshift is a cloud data warehouse that processes complex queries at scale and with speed. Tableau’s extensive capabilities and enterprise connectivity help analysts efficiently prepare, explore, and share data insights company-wide. Open Tableau Desktop.

Reporting, Publishing, Data Warehouse, Management

Announcing the 2021 Data Impact Awards

Cloudera

MAY 12, 2021

2020 saw us hosting our first ever fully digital Data Impact Awards ceremony, and it certainly was one of the highlights of our year. We saw a record number of entries and incredible examples of how customers were using Cloudera’s platform and services to unlock the power of data. SECURITY AND GOVERNANCE LEADERSHIP.

Digital Transformation, Machine Learning, Optimization, Data Lake

Simplify data loading into Type 2 slowly changing dimensions in Amazon Redshift

AWS Big Data

MARCH 9, 2023

Thousands of customers rely on Amazon Redshift to build data warehouses to accelerate time to insights with fast, simple, and secure analytics at scale and analyze data from terabytes to petabytes by running complex analytical queries. Data loading is one of the key aspects of maintaining a data warehouse.

Slice and Dice, Data Warehouse, Metrics, Metadata

How Macmillan Publishers authored success using IBM Cognos Analytics

IBM Big Data Hub

AUGUST 28, 2023

It’s no wonder then that Macmillan needs sophisticated business intelligence (BI) and data analytics. This data is leveraged by departments throughout the organization and is essential to their business operations. As business processes grew more complex, the data transparency and visibility suffered.

Publishing, Analytics, Business Intelligence, Operational Reporting
