WhyLabs launches LangKit to make large language models safe and accountable


WhyLabs, a Seattle-based startup providing monitoring tools for data and artificial intelligence applications, today announced the release of LangKit, an open-source technology that helps enterprises monitor and safeguard their large language models (LLMs). LangKit enables users to detect and prevent risks and problems in LLMs, such as toxic language, data leakage, hallucinations and jailbreaks.

WhyLabs cofounder and CEO Alessya Visnjic told VentureBeat in an exclusive interview ahead of today’s launch that the product is designed to help companies monitor how their AI systems are working and detect problems before they affect customers or users.

“LangKit is the culmination of the types of metrics that are essential to track for LLMs,” she said. “Essentially what we’ve done is we’ve taken this huge range of popular metrics that our clients have used to track LLMs, and we’ve packaged them into LangKit.”

Meet rapidly evolving LLM standards

LangKit is based on two fundamental principles: open source and extensibility. Visnjic believes that by leveraging the open-source community and building a highly extensible platform, WhyLabs can keep pace with the evolving AI landscape and meet diverse customer needs, particularly in industries such as healthcare and fintech, which have higher safety standards.



Some of the metrics provided by LangKit include sentiment analysis, toxicity detection, topic extraction, text quality scoring, personally identifiable information (PII) detection, and jailbreak detection. These metrics can help users validate and safeguard individual prompts and responses, assess the compliance of LLM behavior with policies, monitor user interactions within an LLM-based application, and run A/B tests on different LLM versions and prompts.

Visnjic says LangKit is relatively easy to use and integrates with several popular platforms and frameworks, such as OpenAI GPT-4, Hugging Face Transformers, AWS Boto3, and others. Users can get started with a few lines of Python code and leverage the platform to track metrics over time and set up alerts and guardrails. Users can also customize and extend LangKit with their own models and metrics to fit their specific use cases.
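To make the metrics-based idea concrete, here is a toy sketch of how per-prompt/response signals like the ones listed above could be extracted. Everything below — the regex patterns, the word list, and the `text_metrics` function — is a hypothetical stand-in for illustration, not LangKit’s actual API or models:

```python
import re

# Hypothetical stand-ins for the kinds of signals LangKit extracts;
# real PII detection and toxicity scoring use far more robust models.
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),      # email addresses
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),  # US-style phone numbers
]
TOXIC_TERMS = {"idiot", "stupid", "hate"}  # toy word list, not a real model

def text_metrics(prompt: str, response: str) -> dict:
    """Extract a few scalar metrics from a single prompt/response pair."""
    words = response.lower().split()
    return {
        "prompt.length": len(prompt),
        "response.length": len(response),
        "response.pii_hits": sum(len(p.findall(response)) for p in PII_PATTERNS),
        "response.toxic_ratio": (
            sum(w.strip(".,!?") in TOXIC_TERMS for w in words) / len(words)
            if words else 0.0
        ),
    }

m = text_metrics("What is my email?", "Your email is jane@example.com.")
```

Scalar metrics like these are what get logged over time and compared against baselines, rather than storing the raw text itself.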

Early adopters praised the solution’s out-of-the-box metrics, ease of use, and plug-and-play features, according to Visnjic. These capabilities have proven especially valuable to stakeholders in regulated industries, as LangKit provides understandable insights into language patterns, enabling more accessible conversations about technology.

An emerging market for AI monitoring

Visnjic said LangKit is based on feedback and collaboration from WhyLabs customers, ranging from Fortune 100 companies to early-stage AI startups across various industries. She said LangKit helps them gain visibility and control over their LLMs in production.

“With LangKit, what they’re able to do is run a kind of very specialized LLM integration test, where they specify a set of prompts — a golden set of prompts — that their model should be good at answering. And then they run this golden set of prompts whenever they make small changes to the model itself or to some of the prompt engineering,” Visnjic explained.
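The golden-set workflow Visnjic describes amounts to a regression test over a fixed prompt set. A minimal sketch, assuming you already have some model callable, a scalar metric, and a recorded baseline (all names below are hypothetical, not LangKit’s API):

```python
# Toy sketch of a "golden set" LLM regression check.
# `llm` is any callable prompt -> response; `score` is any response -> float.

def run_golden_set(llm, score, golden_prompts, baseline, tolerance=0.05):
    """Re-run a fixed prompt set and flag metrics drifting past the baseline."""
    failures = []
    for prompt in golden_prompts:
        value = score(llm(prompt))
        expected = baseline[prompt]
        if abs(value - expected) > tolerance:
            failures.append((prompt, expected, value))
    return failures

# Example with stub components standing in for a real model and metric:
fake_llm = {"2+2?": "4", "Capital of France?": "Paris"}.get
length_score = len  # trivially, score a response by its length
baseline = {"2+2?": 1, "Capital of France?": 5}
fails = run_golden_set(fake_llm, length_score, list(baseline), baseline, tolerance=0)
```

Running the same golden set after every model or prompt change gives a quick signal of whether behavior has regressed before the change reaches users.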

LangKit’s early adopters include Symbl.ai and Tryolabs, both of which provided valuable feedback to help refine the product. Tryolabs, a company focused on helping enterprises adopt large language models, contributed insights from a variety of use cases. Symbl.ai, on the other hand, is an early customer using LangKit to monitor its LLM-based application in production.

“In their case (Symbl.ai), they have an LLM-based application, it’s running in production, and they have customers interacting with it. They would like to have transparency into how it’s doing and how it’s performing over time, and they would like the ability to put guardrails in place,” Visnjic said.

Model tracking built for business

LangKit is specifically designed to handle high-throughput, real-time, and automated systems that require a variety of metrics and alerts to track LLM behavior and performance. Unlike the embedding-based approach commonly used for LLM monitoring and evaluation, LangKit uses a metrics-based approach that is better suited to scalable and operational use cases.

“When you’re dealing with high-throughput systems in production, you have to look at the metrics,” Visnjic said. “You have to figure out what types of signals you want to track, or potentially have a really broad range of signals. So you want these metrics pulled, you want some sort of baseline, and you want it to be tracked over time with as much automation as possible.”
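The pattern Visnjic describes — a baseline plus automated tracking — can be sketched as a rolling-window monitor that alerts when a metric drifts from its recent history. This is a generic illustration of the idea, not LangKit’s implementation:

```python
from collections import deque
from statistics import mean, stdev

class MetricMonitor:
    """Toy rolling-baseline monitor (hypothetical; not LangKit's API).

    Keeps a window of recent metric values and flags a new value that
    deviates from the window mean by more than `k` standard deviations.
    """

    def __init__(self, window=100, k=3.0):
        self.values = deque(maxlen=window)  # old values fall off automatically
        self.k = k

    def observe(self, value: float) -> bool:
        """Record `value`; return True if it should trigger an alert."""
        alert = False
        if len(self.values) >= 2:  # stdev needs at least two data points
            mu, sigma = mean(self.values), stdev(self.values)
            alert = sigma > 0 and abs(value - mu) > self.k * sigma
        self.values.append(value)
        return alert

mon = MetricMonitor(window=10, k=3.0)
# A stable toxicity ratio followed by a sudden spike:
alerts = [mon.observe(v) for v in [0.1, 0.11, 0.09, 0.1, 0.12, 0.9]]
```

In practice, one such baseline would be tracked per metric (toxicity, PII hits, response length, and so on), with alerts routed to whatever paging or dashboard system the team already uses.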

LangKit will be integrated into WhyLabs’ AI observability platform, which also offers solutions for monitoring other types of AI applications, such as embeddings, model performance, and unstructured data drift.

WhyLabs was founded in 2020 by former Amazon Machine Learning engineers and is backed by Andrew Ng’s AI Fund, Madrona Venture Group, Defy Partners and Bezos Expeditions. The company was also incubated at the Allen Institute for Artificial Intelligence (AI2).

LangKit is available today as an open-source library on GitHub and as a SaaS solution on the WhyLabs website. Users can also check out a demo notebook and overview video to learn more about LangKit’s features and capabilities.

VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.