The state of data quality in 2020

O'Reilly on Data

We suspected that data quality was a topic brimming with interest. The responses show a surfeit of concerns around data quality and some uncertainty about how best to address those concerns. Key survey results: The C-suite is engaged with data quality. Data quality might get worse before it gets better.

Lessons learned building natural language processing systems in health care

O'Reilly on Data

Language understanding benefits from every part of the fast-improving ABC of software: AI (freely available deep learning libraries like PyText and language models like BERT), big data (Hadoop, Spark, and Spark NLP), and cloud (GPUs on demand and NLP-as-a-service from all the major cloud providers). IBM Watson NLU. Azure Text Analytics.

Build efficient, cross-Regional, I/O-intensive workloads with Dask on AWS

AWS Big Data

Welcome to the era of data. The sheer volume of data captured daily continues to grow, calling for platforms and solutions to evolve. The Amazon Sustainability Data Initiative (ASDI) uses the capabilities of Amazon S3 to provide a no-cost solution for you to store and share climate science workloads across the globe.

AWS Professional Services scales by improving performance and democratizing data with Amazon QuickSight

AWS Big Data

The AWS Professional Services (ProServe) Insights team builds global operational data products that serve over 8,000 users within Amazon. In this post, we discuss how QuickSight has helped us improve our performance, democratize our data, and provide insights to our internal customers at scale.

Open Data Science and Machine Learning for Business with Cloudera Data Science Workbench on HDP

Cloudera

It’s official: Cloudera and Hortonworks have merged, and today I’m excited to announce the availability of Cloudera Data Science Workbench (CDSW) for Hortonworks Data Platform (HDP). Trusted by large data science teams across hundreds of enterprises —. Sound familiar? What is CDSW?

Manual Feature Engineering

Domino Data Lab

Many thanks to AWP Pearson for the permission to excerpt “Manual Feature Engineering: Manipulating Data for Fun and Profit” from the book, Machine Learning with Python for Everyone by Mark E. Feature engineering is useful for data scientists when assessing tradeoff decisions regarding the impact of their ML models.

Invoking IT to help revitalize Indigenous languages at risk of extinction

CIO Business Intelligence

Data collection on tribal languages has been undertaken for decades, but in 2012, those working at the Myaamia Center and the National Breath of Life Archival Institute for Indigenous Languages realized that technology had advanced in a way that could better move the process along.
