Today, we are excited to announce an enhancement to the Amazon MWAA integration with the Airflow REST API. This improvement streamlines the ability to access and manage your Airflow environments and their integration with external systems, and allows you to interact with your workflows programmatically.
Amazon AppFlow is a fully managed integration service that you can use to securely transfer data from software as a service (SaaS) applications, such as Google BigQuery, Salesforce, SAP, HubSpot, and ServiceNow, to Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3) and Amazon Redshift, in just a few clicks.
Data architecture definition: Data architecture describes the structure of an organization's logical and physical data assets, and data management resources, according to The Open Group Architecture Framework (TOGAF). Modern data architectures use APIs to make it easy to expose and share data. Data integrity. Flexibility.
Iceberg offers distinct advantages through its metadata layer over Parquet, such as improved data management, performance optimization, and integration with various query engines. This approach also reduces expensive ListObjects API calls typically needed when directly accessing Parquet files in Amazon S3.
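As a minimal sketch of that pattern (the catalog name, warehouse path, and table below are assumptions, not from the post), a Spark session configured with an Iceberg catalog resolves data files from table metadata rather than listing objects in S3:

```python
# Requires the Iceberg Spark runtime and AWS bundle jars on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-read")
    # Hypothetical catalog configuration; adjust names and paths to your environment.
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.demo.warehouse", "s3://example-bucket/warehouse/")
    .getOrCreate()
)

# The engine finds the data files to scan from Iceberg manifests and manifest
# lists, so it does not need ListObjects calls against the S3 prefix the way a
# plain Parquet reader would.
df = spark.sql("SELECT * FROM demo.sales.orders WHERE order_date >= '2024-01-01'")
df.show()
```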
From generating test cases and Cypress code to AI-powered code reviews and detailed defect reports, our platform streamlines QA processes, saving time and resources. Accelerate API testing with Pytest-based cases and boost accuracy while reducing human error.
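To illustrate the kind of Pytest-based API case described, here is a small, self-contained sketch; the endpoint, payload, and expected status codes are hypothetical:

```python
# test_users_api.py -- hypothetical service; requires the `requests` package.
import requests

BASE_URL = "https://api.example.com"  # assumption: the service under test


def test_create_user_returns_created():
    payload = {"name": "Ada Lovelace", "email": "ada@example.com"}
    response = requests.post(f"{BASE_URL}/users", json=payload, timeout=10)

    # Assert on the status code and the shape of the response body.
    assert response.status_code == 201
    body = response.json()
    assert body["email"] == payload["email"]


def test_get_missing_user_returns_not_found():
    response = requests.get(f"{BASE_URL}/users/does-not-exist", timeout=10)
    assert response.status_code == 404
```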
In the first part of this series, we demonstrated how to implement an engine that uses the capabilities of AWS Lake Formation to integrate third-party applications. With its libraries, CLI, and services, you can connect your frontend to the cloud for authentication, storage, APIs, and more.
The Amazon Redshift Data API simplifies access to your Amazon Redshift data warehouse by removing the need to manage database drivers, connections, network configurations, data buffering, and more. The Redshift Data API is the recommended method to connect with Amazon Redshift for web applications.
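A minimal sketch of that driverless pattern with boto3, where you submit a statement, poll its status, and fetch the result set over the API with no JDBC/ODBC connection to manage (the cluster, database, user, and SQL below are assumptions):

```python
import time
import boto3

client = boto3.client("redshift-data")

# Submit a SQL statement asynchronously; no database driver or persistent
# connection is required.
statement = client.execute_statement(
    ClusterIdentifier="example-cluster",  # assumption
    Database="dev",                       # assumption
    DbUser="awsuser",                     # assumption
    Sql="SELECT event_name, COUNT(*) FROM events GROUP BY event_name LIMIT 10",
)

# Poll for completion, then fetch the result set.
while True:
    desc = client.describe_statement(Id=statement["Id"])
    if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(1)

if desc["Status"] == "FINISHED":
    result = client.get_statement_result(Id=statement["Id"])
    for row in result["Records"]:
        print(row)
```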
Flexibility in payment models, where they only pay for the resource usage they need, for instance, is attractive for many organizations in today’s competitive world. 3) The Growing Need For API Connections. At first, SaaS providers didn’t come with a complete integration solution.
By integrating and refining data through these modern solutions, insurers can enhance the accuracy of risk assessments, reduce claims payout time by over 50%, and boost operational efficiency by more than 30%. Integrating advanced technologies like genAI often requires extensively reengineering existing systems.
Snapshots play a critical role in providing the availability, integrity and ability to recover data in OpenSearch Service domains. This guide is designed to help you maintain data integrity and continuity while navigating complex multi-Region and multi-account environments in OpenSearch Service. Add a bucket policy.
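For context, registering a manual snapshot repository against a domain is done with a signed request to the _snapshot API; the sketch below follows the common boto3 + requests-aws4auth pattern, and the domain endpoint, bucket, region, and role ARN are assumptions:

```python
# Requires the `requests` and `requests-aws4auth` packages.
import boto3
import requests
from requests_aws4auth import AWS4Auth

region = "us-east-1"                                                 # assumption
host = "https://search-example-domain.us-east-1.es.amazonaws.com"    # assumption

credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(
    credentials.access_key,
    credentials.secret_key,
    region,
    "es",
    session_token=credentials.token,
)

# Register an S3 bucket as a manual snapshot repository. The role below must
# allow OpenSearch Service to access the bucket, which is where the bucket
# policy step mentioned above comes in.
payload = {
    "type": "s3",
    "settings": {
        "bucket": "example-snapshot-bucket",                         # assumption
        "region": region,
        "role_arn": "arn:aws:iam::123456789012:role/SnapshotRole",   # assumption
    },
}

response = requests.put(f"{host}/_snapshot/manual-snapshots", auth=awsauth, json=payload)
print(response.status_code, response.text)
```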
The product manager for the research phase understands that AI Research products are first and foremost products, and therefore develops all of the necessary tools, structure, relationships, and resources needed to be successful. Finally, integrating AI products into business tech stacks (especially in enterprises) is nontrivial.
Although the integration with AWS IAM Identity Center is the recommended approach, this post focuses on setups where IAM Identity Center might not be applicable due to compliance constraints, such as organizations requiring FedRAMP Moderate compliance, which IAM Identity Center doesn’t yet meet. Choose Create a resource.
What began with chatbots and simple automation tools is developing into something far more powerful: AI systems that are deeply integrated into software architectures and influence everything from backend processes to user interfaces. An important aspect of this democratization is the availability of LLMs via easy-to-use APIs.
Many of the new open source models are much smaller and not as resource intensive but still deliver good results (especially when trained for a specific application). We suspect that many API services are being offered as loss leaders—that the major providers have intentionally set prices low to buy market share.
For each user query, an API is invoked on Amazon API Gateway to process the request. The API is integrated with AWS Lambda , which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG).
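A heavily simplified sketch of what such a Lambda handler might look like (the retrieval helper, model ID, and request shape are assumptions, not the post's actual code):

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")


def retrieve_documents(query: str, user_id: str) -> list:
    """Placeholder for the retrieval step: look up document chunks the calling
    user is allowed to see, e.g. from a vector store keyed by access rights."""
    return ["...document excerpt 1...", "...document excerpt 2..."]


def lambda_handler(event, context):
    body = json.loads(event.get("body", "{}"))
    query = body.get("query", "")
    user_id = body.get("user_id", "anonymous")  # simplification of the access check

    context_chunks = retrieve_documents(query, user_id)
    prompt = (
        "Answer using only the context below.\n\n"
        + "\n".join(context_chunks)
        + f"\n\nQuestion: {query}"
    )

    # Generate the answer with a foundation model (model ID is illustrative).
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    answer = response["output"]["message"]["content"][0]["text"]

    return {"statusCode": 200, "body": json.dumps({"answer": answer})}
```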
SAP Datasphere has arrived to address those pain points, by enabling discovery, access, and integration of the heterogeneous data distributed across the enterprise. Datasphere manages and integrates structured, semi-structured, and unstructured data types. Datasphere is not just for data managers.
The CDH is used to create, discover, and consume data products through a central metadata catalog, while enforcing permission policies and tightly integrating data engineering, analytics, and machine learning services to streamline the user journey from data to insight. It comprises distinct AWS account types, each serving a specific purpose.
This is where the Cloudera AI Inference service comes in. It is a powerful deployment environment that enables you to integrate and deploy generative AI (GenAI) and predictive models into your production environments, incorporating Cloudera’s enterprise-grade security, privacy, and data governance. We also outlined many of its capabilities.
The need to integrate diverse data sources has grown exponentially, but there are several common challenges when integrating and analyzing data from multiple sources, services, and applications. First, you need to create and maintain independent connections to the same data source for different services.
Zero-ETL integration with Amazon Redshift reduces the need for custom pipelines, preserves resources for your transactional systems, and gives you access to powerful analytics. In this post, we explore how to use Aurora MySQL-Compatible Edition Zero-ETL integration with Amazon Redshift and dbt Cloud to enable near real-time analytics.
Spark Upgrades addresses four key areas of change: Spark SQL API methods and functions; Spark DataFrame API methods and operations; Python language updates (including module deprecations and syntax changes); and Spark SQL and Core configuration settings. The complexity of these upgrades becomes evident when you consider migrating from Spark 2.4.3
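As one small illustration of the DataFrame API changes such an upgrade touches (chosen here for illustration, not an exhaustive migration), registerTempTable has been deprecated since Spark 2.0 in favor of createOrReplaceTempView:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upgrade-example").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Older style, deprecated since Spark 2.0:
# df.registerTempTable("events")

# Current style:
df.createOrReplaceTempView("events")
spark.sql("SELECT COUNT(*) AS n FROM events").show()
```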
In the rapidly evolving landscape of AI-powered search, organizations are looking to integrate large language models (LLMs) and embedding models with Amazon OpenSearch Service. Now through a single Rerank API call in Amazon Bedrock, you can integrate Rerank into existing systems at scale. How to integrate Cohere Rerank 3.5
This allows your teams to flexibly scale write workloads such as extract, transform, and load (ETL) and data processing by adding compute resources of different types and sizes based on individual workloads’ price-performance requirements, as well as securely collaborate with other teams on live data for use cases such as customer 360.
It provides a data catalog, automated crawlers, and visual job creation to streamline data integration across various data sources and targets. It is a data marketplace featuring over 300 providers offering thousands of datasets accessible through files, Amazon Redshift tables, and APIs. AWS Glue is used for this integration.
Automating routine office tasks is an important and worthwhile project–and redesigning routine tasks so that they can be integrated into a larger workflow that can be automated more effectively is even more important. So from the start, we have a data integration problem compounded with a compliance problem.
Second, doing something new (especially something “big” and disruptive) must align with your business objectives – otherwise, you may be steering your business into deep uncharted waters that you don’t have the resources and talent to navigate. Remember to Keep it Simple and Smart (the “KISS” principle).
With the growing emphasis on data, organizations are constantly seeking more efficient and agile ways to integrate their data, especially from a wide variety of applications. However, this task is complicated by the unique characteristics of modern systems, such as differing API protocols, implementations, and rate limits.
The following diagram illustrates the solution architecture, which manages stored objects using a continuous integration and delivery (CI/CD) pipeline. A new commit invokes a build job in Jenkins. Jenkins retrieves JSON files from the GitHub repository and performs validation. Jenkins then calls the OpenSearch Service API to deploy the changes.
Serverless can also bring cost reductions, as users only pay for the resources used. Each worker node has an agent called the kubelet that connects it to the Kubernetes API. The kubelet uses PodSpecs to manage the underlying pods whenever it is running on a server and connected to the K8s API. Kubernetes without Nodes?
That’s great to have because you can use that storage platform to build a data fabric that extends from your on-premises systems into multiple cloud systems to get access to data at a performance level and with an API that you want. Most of the work so far in Kubernetes involves the use of file system APIs.
Several LLMs are publicly available through APIs from OpenAI , Anthropic , AWS , and others, which give developers instant access to industry-leading models that are capable of performing most generalized tasks. The training jobs use Cloudera’s Workbench compute resources, and users can track the performance of a training job within the UI.
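For illustration, a minimal call to one such publicly available LLM API using the OpenAI Python client; the model name is an assumption and an API key must be available in the environment:

```python
# Requires the `openai` package and the OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

# Model name is illustrative; substitute whichever hosted model you use.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what an API gateway does in one sentence."},
    ],
)
print(response.choices[0].message.content)
```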
Native Apache Airflow and robust APIs for orchestrating and automating job scheduling and delivering complex data pipelines anywhere. Resource isolation and centralized GUI-based job management. CDP data lifecycle integration and SDX security and governance. Automation APIs. Flexible orchestration with Apache Airflow.
You can use this approach for a variety of use cases, from real-time log analytics to integrating application messaging data for real-time search. OpenSearch Ingestion integrates with many AWS services, and provides ready-made blueprints to accelerate ingesting data for a variety of analytics use cases into OpenSearch Service domains.
This encompasses tasks such as integrating diverse data from various sources with distinct formats and structures, optimizing the user experience for performance and security, providing multilingual support, and optimizing for cost, operations, and reliability. API Gateway forwards all requests to the Lambda function, which serves them.
You can run analytics workloads at any scale with automatic scaling that resizes resources in seconds to meet changing data volumes and processing requirements. EMR Serverless automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use.
Kinesis Data Streams not only offers the flexibility to use many out-of-box integrations to process the data published to the streams, but also provides the capability to build custom stream processing applications that can be deployed on your compute fleet. This allows you to process the same data with fewer compute resources.
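A bare-bones sketch of such a custom consumer with boto3 (the stream name is an assumption and shard handling is simplified; production applications typically use the Kinesis Client Library or enhanced fan-out):

```python
import time
import boto3

kinesis = boto3.client("kinesis")
stream_name = "example-stream"  # assumption

# Read from the first shard only, for illustration; real consumers iterate
# over all shards (or let the Kinesis Client Library do it for them).
shard_id = kinesis.describe_stream(StreamName=stream_name)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=stream_name,
    ShardId=shard_id,
    ShardIteratorType="LATEST",
)["ShardIterator"]

while True:
    out = kinesis.get_records(ShardIterator=iterator, Limit=100)
    for record in out["Records"]:
        print(record["Data"])      # raw bytes published to the stream
    iterator = out["NextShardIterator"]
    time.sleep(1)                  # stay under per-shard read limits
```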
However, digital infrastructures are highly dependent on application programming interfaces — or APIs — to facilitate data transfers between software applications and between applications and end users. As the backend framework for most web and mobile apps, APIs are internet-facing and therefore vulnerable to attacks.
First, the Airflow REST API support enables programmatic interaction with Airflow resources like connections, Directed Acyclic Graphs (DAGs), DAGRuns, and Task instances. The introduction of REST API support in Amazon MWAA addresses this need, providing a standardized way to access and manage your Airflow environment.
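A minimal sketch of that programmatic access, assuming a recent boto3 release that includes the MWAA InvokeRestApi operation introduced with this enhancement (environment and DAG names are placeholders, and response field names should be checked against the MWAA documentation):

```python
import boto3

mwaa = boto3.client("mwaa")

# List DAGs in the environment through the Airflow REST API.
response = mwaa.invoke_rest_api(
    Name="my-mwaa-environment",  # assumption: your environment name
    Method="GET",
    Path="/dags",
)
print(response.get("RestApiStatusCode"), response.get("RestApiResponse"))

# The same operation covers writes, e.g. triggering a DAG run:
# mwaa.invoke_rest_api(Name="my-mwaa-environment", Method="POST",
#                      Path="/dags/example_dag/dagRuns", Body={"conf": {}})
```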
Invoke a Lambda function as the target for the EventBridge rule and pass the event payload to it. The Lambda function does two things: it fetches the asset details, including the Amazon Resource Name (ARN) of the S3 published asset and the IAM role ARN from the subscription target.
Salesforce’s reported bid to acquire enterprise data management vendor Informatica could mean consolidation for the integration platform-as-a-service (iPaaS) market and a new revenue stream for Salesforce, according to analysts. MuleSoft, acquired by Salesforce in 2018 for $5.7
The new AI capabilities include the Prompt Management API , the CPQ AI Assistant, the NetSuite Expert for SuiteAnswers, and a text enhancer for custom fields, and come with no additional cost, the company said.
If the use case is well defined and directly maps to one event bus, such as event streaming and analytics with streaming events (Kafka) or application integration with simplified and consistent event filtering, transformation, and routing on discrete events (EventBridge), the decision for a particular broker technology is straightforward.
Data volume can increase significantly over time, and it often requires concurrent consumption of large compute resources. Data integration workloads can become increasingly concurrent as more and more applications demand access to data at the same time. It simplifies your daily operations and reduces latency for retries.
Most companies have transitioned to become more software-centric, and with this transformation, application programming interfaces (APIs) have proliferated. If companies want to input, leverage, and embed these digital brains into their business, they’ll need an API to connect the LLM to various business applications,” he says.