Shreya Rajpal on Guardrails for Large Language Models
07 Feb 2024

Guardrails AI
- Guardrails AI is an open-source framework that enhances the reliability and safety of large language model (LLM) applications.
- It acts as a "firewall" around LLMs, checking inputs and outputs for correctness and filtering out unreliable or harmful content.
- Guardrails AI employs various techniques, including few-shot prompting, validators, and machine learning models, to ensure correctness.
- It operates as a "sidecar" that runs alongside the LLM, checking prompts before they are sent and outputs before they are returned to the application.
- Correctness criteria can be customized to meet specific user needs.
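The "sidecar" pattern above can be sketched in plain Python. This is an illustrative sketch only, not the actual Guardrails AI API; all function names (`llm_call`, `guarded_call`, the individual checks) are hypothetical.

```python
# Sketch of the "sidecar" validation pattern: check the prompt before it
# reaches the model, and the output before it reaches the application.
# All names here are hypothetical, not the Guardrails AI library API.

def llm_call(prompt: str) -> str:
    """Stand-in for a real LLM call."""
    return f"RESPONSE to: {prompt}"

def no_banned_words(text: str) -> bool:
    banned = {"darn"}  # toy example of a content filter
    return not any(word in text.lower() for word in banned)

def max_length(text: str, limit: int = 200) -> bool:
    return len(text) <= limit

def guarded_call(prompt: str) -> str:
    # Input guard: validate the prompt before it is sent to the model.
    if not no_banned_words(prompt):
        raise ValueError("prompt failed input guard")
    output = llm_call(prompt)
    # Output guard: validate the response before returning it.
    if not (no_banned_words(output) and max_length(output)):
        raise ValueError("output failed output guard")
    return output
```

The correctness criteria (here, a word filter and a length cap) are just ordinary predicates, which is what makes them customizable per application.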
Applications of Guardrails AI
- Common applications of Guardrails AI include chatbots, structured data extraction, resume screening, healthcare support chatbots, and contract analysis.
Implementation and Customization
- Implementing Guardrails AI involves additional compute and latency, but it can be configured to balance risk mitigation with performance.
- Guardrails AI supports customizability, allowing users to create their own guardrails and integrate with different language models.
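One way to picture user-defined guardrails is a simple registry of named checks that any model backend's output can be run through. This is a hypothetical sketch of the idea, not how the Guardrails AI library registers validators.

```python
# Hypothetical registry of user-defined guardrails; not the Guardrails AI API.
import json

GUARDS = {}

def register_guard(name):
    """Decorator that registers a validation function under a name."""
    def wrap(fn):
        GUARDS[name] = fn
        return fn
    return wrap

@register_guard("valid_json")
def valid_json(text: str) -> bool:
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def run_guards(text: str, names):
    """Run the named guards against a model output, from any LLM backend."""
    return {n: GUARDS[n](text) for n in names}
```

Because the guards only see text in and a verdict out, the same checks can sit in front of different language models.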
Recent Updates and Techniques
- Recent updates to Guardrails AI include improved logging, visibility, and stability.
- Guardrails AI employs a "re-asking" paradigm, where incorrect outputs are sent back to the language model for self-correction.
- Re-asking optimizes the prompt to focus on correcting only the incorrect parts of the output, reducing computational costs.
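The re-asking loop can be sketched as follows. This is a minimal illustration of the paradigm, assuming a hypothetical `llm` callable and a `validate` function that returns a list of field-level failures; it is not the Guardrails AI implementation.

```python
# Sketch of the re-asking loop: when validation fails, re-prompt the model
# with only the failing parts rather than repeating the whole task.
# `llm` and `validate` are hypothetical stand-ins.

def reask(llm, prompt, validate, max_retries=2):
    output = llm(prompt)
    for _ in range(max_retries):
        errors = validate(output)  # e.g. list of failed correctness checks
        if not errors:
            return output
        # The correction prompt names only the errors, which keeps the
        # re-ask short and reduces token usage and cost.
        correction = f"Fix only these errors: {errors}\nPrevious output: {output}"
        output = llm(correction)
    return output
```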
Prompts and Verification
- The speaker emphasizes the importance of understanding the specific correctness criteria for the application and configuring Guardrails AI accordingly, rather than relying solely on structured prompts.
- Prompts do not guarantee correct outputs, due to the non-deterministic nature of LLMs and the possibility of incorrect responses despite explicit instructions.
- Safer prompts can prime the LLM for more accurate responses, but verification is still necessary to ensure that the desired conditions are respected.
- Verification systems are crucial to ensure that the conditions the user cares about are adhered to.
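The gap between priming and verifying can be shown with a toy example: the prompt asks for a constrained format, but only a programmatic check enforces it. The prompt text and `verify_phone` helper are hypothetical.

```python
# A prompt can request a format, but only the verifier guarantees it.
import re

PROMPT = "Return ONLY the customer's phone number, formatted as ###-###-####."

def verify_phone(answer: str) -> bool:
    """Check that the model's answer actually matches the requested format."""
    return re.fullmatch(r"\d{3}-\d{3}-\d{4}", answer.strip()) is not None
```

A compliant answer like `"555-123-4567"` passes, while a chatty one like `"Sure! It's 555-123-4567."` fails, even though both came from the same well-specified prompt.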