Databricks Ups AI Ante With New AutoML Engine And Feature Store

Republished by Plato

As Databricks' annual North American conference, Data + AI Summit, continues, so do the company's announcements of new capabilities on its platform. Yesterday was focused on conventional analytics. Today's all about AI, and for multiple audiences. For the developer crowd as well as sophisticated business users, Databricks is introducing an AutoML (automated machine learning) engine; for data scientists, the company is adding a feature store.

Also read: Databricks rolls out data sharing, automated pipelines and data catalog

The premise

In general, AutoML platforms allow users to bring their own data set and build a model from it by indicating which column contains the target variable and what broad problem to solve (for example, classification or regression). From there, the AutoML platform can sweep through a range of algorithms, and hyperparameter values for each, looking for the best model based on selected metrics of accuracy and efficiency.
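To make that sweep concrete, here is a minimal, standard-library-only sketch. It is not Databricks' engine: the toy data set, the two candidate "algorithms" (a majority-class baseline and k-nearest neighbours) and every function name are assumptions made up for illustration. The user supplies rows, a target column and a problem type; the sketch tries each candidate and keeps the one with the best holdout accuracy.

```python
import random
from collections import Counter

def make_rows(n, seed):
    """Hypothetical toy data: label is 1 when x + y > 1, else 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        rows.append({"x": x, "y": y, "label": int(x + y > 1.0)})
    return rows

def fit_majority(train, target):
    """Baseline candidate: always predict the most common target value."""
    most_common = Counter(r[target] for r in train).most_common(1)[0][0]
    return lambda row: most_common

def fit_knn(train, target, k):
    """k-nearest-neighbours classifier over all non-target columns."""
    features = [c for c in train[0] if c != target]

    def predict(row):
        nearest = sorted(
            train,
            key=lambda t: sum((t[f] - row[f]) ** 2 for f in features),
        )[:k]
        return Counter(t[target] for t in nearest).most_common(1)[0][0]

    return predict

def accuracy(model, rows, target):
    """Fraction of rows whose prediction matches the target column."""
    return sum(model(r) == r[target] for r in rows) / len(rows)

def auto_ml(rows, target, problem="classification"):
    """Try every (algorithm, hyperparameters) candidate; return all trials and the best."""
    if problem != "classification":
        raise ValueError("this sketch only covers classification")
    split = int(len(rows) * 0.75)  # simple holdout split
    train, valid = rows[:split], rows[split:]
    candidates = [("majority", {})] + [("knn", {"k": k}) for k in (1, 3, 5, 9)]
    trials = []
    for name, params in candidates:
        if name == "majority":
            model = fit_majority(train, target)
        else:
            model = fit_knn(train, target, **params)
        trials.append({
            "algorithm": name,
            "params": params,
            "accuracy": accuracy(model, valid, target),
        })
    best = max(trials, key=lambda t: t["accuracy"])
    return trials, best

rows = make_rows(200, seed=0)
trials, best = auto_ml(rows, target="label")
for trial in trials:
    print(trial)
print("best:", best)
```

A production engine would search far more algorithms and hyperparameters, use cross-validation rather than a single holdout split, and weigh efficiency alongside accuracy, but the outer loop has the same shape.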

Unlike AutoML, which is often used by non-specialists, a feature store is designed for data scientists. The premise of a feature store rests on two important facts: (1) a single ML model may derive its training data from multiple sources, each of which may be updated on a different cadence, and (2) some of that source data may be used by more than one model. Because this relationship is many-to-many (though it is often thought to be one-to-one), treating the model as the smallest unit of granularity in ML operations is often incorrect. Instead, it's the source data and the group of ML model features (input variables) that the data feeds into that should be managed together, in terms of ingest, feature engineering and then, perhaps, propagated retraining of the impacted models.
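The many-to-many relationship is easier to see with a small lineage map. The sketch below is purely illustrative: the source, feature and model names are hypothetical, and it does not use any Databricks API. Given a source that has just been refreshed, it works out which feature groups need recomputing and which models are candidates for retraining.

```python
from collections import defaultdict

# Hypothetical lineage: which raw sources feed each feature group...
FEATURE_SOURCES = {
    "customer_profile": ["crm_export"],
    "purchase_history": ["orders_db", "crm_export"],
    "web_activity": ["clickstream"],
}

# ...and which feature groups each model trains on.
MODEL_FEATURES = {
    "churn_model": ["customer_profile", "purchase_history", "web_activity"],
    "upsell_model": ["purchase_history"],
    "fraud_model": ["purchase_history", "web_activity"],
}

def invert(mapping):
    """Turn {key: [values]} into {value: {keys}}."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return inverted

def impacted_by(source):
    """Return (feature groups to recompute, models to consider retraining)."""
    features_by_source = invert(FEATURE_SOURCES)
    models_by_feature = invert(MODEL_FEATURES)
    features = features_by_source.get(source, set())
    models = set()
    for feature in features:
        models |= models_by_feature.get(feature, set())
    return sorted(features), sorted(models)

print(impacted_by("clickstream"))
print(impacted_by("crm_export"))
```

In this made-up lineage, a refresh of `crm_export` touches two feature groups and, through them, all three models, which is why managing each model's data in isolation tends to break down.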

The implementation

Databricks’ AutoML platform, which is both UI- and API-driven, goes a step further than many on the market, in that it avoids the “black box” scenario of simply taking data in, and pushing a model out. While you can use it that way, Databricks employs what it calls a “glass box” approach, where you can see the actual code used to produce the various models, and decide on the “winning” output model, just as if the work were hand-coded by a data scientist.

Databricks AutoML will put that code in a standard, editable notebook and the code will leverage the ML experimentation capabilities of MLflow, already part of the Databricks platform. This is an excellent approach that supports regulatory compliance and transparency. It also provides a good “grow up” story, where data scientists can take the AutoML code, use it as a baseline, and then develop it further. Essentially then, Databricks AutoML isn’t just a tool for non-specialists, but also a utility that can support data scientists by eliminating a lot of their time-consuming grunt work.

Databricks’ feature store is materialized in Delta Lake files and accessible via Delta Lake APIs. And, like the AutoML engine, the feature store is integrated with MLflow. It also integrates Shapley values for model and inference (prediction) explainability.
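For readers unfamiliar with Shapley values, the sketch below computes them exactly for a tiny, hypothetical pricing model. The article does not say how Databricks calculates them, so this is the textbook definition rather than the platform's method: each feature's contribution is its average marginal effect on the prediction across every order in which features can be switched from a baseline value to the actual value. The model, feature names and baseline are all assumptions.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start with every feature at its baseline value
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orderings) for name in names}

def price_model(row):
    """Hypothetical model: linear terms plus an area-by-rooms interaction."""
    return 50.0 + 2.0 * row["area"] + 10.0 * row["rooms"] + 0.5 * row["area"] * row["rooms"]

instance = {"area": 10, "rooms": 2}
baseline = {"area": 0, "rooms": 0}
contributions = shapley_values(price_model, instance, baseline)
print(contributions)

# The contributions add up to the gap between this prediction and the baseline prediction.
print(sum(contributions.values()), price_model(instance) - price_model(baseline))
```

Enumerating every ordering grows factorially with the number of features, so practical tools rely on approximations; the exact version is shown only because it makes the definition visible.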

Both Databricks AutoML and Databricks Feature Store are part of Databricks’ strategy to build out a completely self-contained data platform with a full range of lake/lakehouse, data prep, data management, data governance, BI and AI capabilities. As many in the industry presume the company is headed for an initial public offering, it certainly makes sense that it would be looking to get all its ML and data ducks in a row.

Source: https://www.zdnet.com/article/databricks-ups-ai-ante-with-new-automl-engine-and-feature-store/#ftag=RSSbaffb68

Timestamp: May 27, 2021
