Comet partners with Snowflake to improve the reproducibility of machine learning datasets

MLOps platform Comet today announced a strategic partnership with Snowflake aimed at helping data scientists build better machine learning (ML) models faster and at strengthening data-driven decision-making.

The company said the collaboration will integrate Comet’s solutions into Snowflake’s unified platform, allowing developers to track and version their Snowflake queries and datasets without leaving their Snowflake environment.

Comet expects the integration to make model lineage and performance easier to track, giving teams better visibility into the development process and into how data changes affect model performance. Customers who train on data from Snowflake should get a more streamlined and transparent model development process as a result.

Faster model training, deployment and monitoring

Snowflake’s Data Cloud and Comet’s ML platform combined will enable customers around the world to build, train, deploy and monitor models much faster, according to the companies.


“Furthermore, this partnership fosters a feedback loop between model development at Comet and data management at Snowflake,” Comet CEO Gideon Mendels told VentureBeat.

This cycle can continually improve models and bridge the gap between experimentation and implementation while delivering the key promise of machine learning: the ability to learn and adapt over time. Clear versioning between datasets and models can enable organizations to define actionable steps to address data changes and their impact on models in production.

Comet’s new offering follows the recent release of a suite of tools and integrations designed to accelerate workflows for data scientists working with large language models (LLMs).

Improve ML models through constant feedback

When data scientists or developers run queries to pull datasets from Snowflake for their ML models, Comet can record, version, and directly link these queries to the resulting models.

Mendels said this approach offers several benefits, including increased reproducibility, collaboration, auditability, and iterative improvement.

“The integration between Comet and Snowflake aims to provide a more robust, transparent and efficient framework for ML development by enabling tracking and versioning of Snowflake queries and datasets within Snowflake itself,” he explained. “By versioning SQL queries and datasets, data scientists can always figure out the exact version of data used to train a specific version of the model. This is crucial for the reproducibility of the model.”
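To make the idea of versioning queries and datasets concrete, here is a minimal sketch that uses only the Python standard library. It is not Comet’s or Snowflake’s API: the names `fingerprint_query`, `fingerprint_rows`, `LineageRecord` and `record_lineage` are hypothetical, and the sketch assumes the query results have already been fetched as a list of rows.

```python
import hashlib
import json
from dataclasses import dataclass


def fingerprint_query(sql: str) -> str:
    """Hash a SQL query after collapsing whitespace.

    Assumption: formatting-only edits should map to the same version.
    This naive normalization would also collapse whitespace inside
    string literals, which a real tool would need to handle.
    """
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def fingerprint_rows(rows) -> str:
    """Hash the fetched rows in order; any changed value yields a new version."""
    digest = hashlib.sha256()
    for row in rows:
        # default=str keeps dates and decimals serializable in this sketch.
        digest.update(json.dumps(row, sort_keys=True, default=str).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()[:12]


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record tying one model version to its query and data."""
    model_version: str
    query_version: str
    dataset_version: str


def record_lineage(model_version: str, sql: str, rows) -> LineageRecord:
    """Build the link between a trained model and the exact data behind it."""
    return LineageRecord(
        model_version=model_version,
        query_version=fingerprint_query(sql),
        dataset_version=fingerprint_rows(rows),
    )
```

Calling `record_lineage("v1", sql, rows)` after each training run would give a record that can be stored next to the model. Two runs with the same `query_version` but different `dataset_version` values would indicate that the underlying table changed between runs, even though the query did not.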

Linking changes in model performance to data alterations

In ML, training data is just as important as the model itself. Data alterations, such as introducing new features, fixing missing values, or changes in data distributions, can profoundly affect the performance of a model.

The company says that tracing a model’s lineage makes it possible to connect changes in model performance to specific alterations in the data. This not only helps with debugging and understanding performance, but also drives improvements in data quality and feature engineering.
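As an illustration of linking a performance change to a data change, the sketch below compares two training runs and reports which inputs differed. It is an assumption-laden example rather than Comet’s implementation: `TrainingRun` and `explain_change` are hypothetical names, and the version strings stand in for whatever identifiers a lineage tool assigns to queries and datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Hypothetical lineage entry for one trained model."""
    model_version: str
    query_version: str
    dataset_version: str
    accuracy: float


def explain_change(previous: TrainingRun, current: TrainingRun) -> dict:
    """Report the metric delta between two runs and which data inputs changed."""
    changed = []
    if previous.query_version != current.query_version:
        changed.append("query")
    if previous.dataset_version != current.dataset_version:
        changed.append("dataset")
    return {
        "from": previous.model_version,
        "to": current.model_version,
        "accuracy_delta": round(current.accuracy - previous.accuracy, 4),
        "data_changes": changed,
    }
```

If accuracy drops between two runs and `data_changes` contains only `"dataset"`, the query was identical and the regression points to a change in the underlying table rather than in how the data was selected. An empty list would suggest looking at code or hyperparameters instead.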

Mendels said tracking queries and data over time can create a feedback loop that drives continuous improvements in both the data management and model development stages.

“Model lineage can facilitate collaboration among a team of data scientists, as it allows anyone to understand the history of a model and how it was developed without the need for extensive documentation,” Mendels said. “This is particularly useful when team members leave or when new members join the team, allowing for seamless knowledge transfer.”

What is the future of Comet?

The company says customers who use Comet, such as Uber, Etsy and Shopify, typically report a 70% to 80% improvement in the speed of their machine learning development.

“This is due to faster research cycles, the ability to understand model performance and detect problems faster, better collaboration, and more,” Mendels said. “With the joint solution, this should increase even more as there are still challenges today in connecting the two systems. Customers save on input and consumption costs by keeping data within Snowflake instead of transferring it over the wire and saving it to other locations.”

Mendels said Comet aims to establish itself as the de facto AI development platform.

“Our view is that companies will see real value from AI only after they implement these models based on their own data,” he said. “Whether they’re training from scratch, fine-tuning an OSS model, or using context input in ChatGPT, Comet’s mandate is to make this process seamless and bridge the gap between research and production.”
