Introduction

Choosing a foundation model in Amazon Bedrock can be complex, but the right tools make the process far more manageable. Whether you’re a developer deploying generative AI applications or an enterprise adopting AI more broadly, you need a reliable way to evaluate and compare models. This guide walks through Amazon Bedrock’s Model Evaluation features, both automatic and human, to help you choose the best foundation model for your needs.

Automatic Model Evaluation in Amazon Bedrock

Getting Started with Automatic Evaluation

Automatic evaluation simplifies the process of assessing model performance by using predefined metrics and datasets. To begin, access the Amazon Bedrock console and navigate to ‘Model evaluation’ under the ‘Assessment & deployment’ section. Here, you can create a new model evaluation project by selecting the ‘Automatic’ option.

Configuring Your Evaluation

Once you start a new evaluation, a setup dialog guides you through selecting the foundation model (FM) and the task you want to evaluate, such as text summarization. You then choose your evaluation metrics and specify a dataset. Amazon Bedrock lets you use built-in datasets, which are tailored to assess different model dimensions, or your own data in JSON Lines format, with one JSON object per line:

JSON:

{"referenceResponse":"Cantal","category":"Capitals","prompt":"Aurillac is the capital of"}
{"referenceResponse":"Bamiyan Province","category":"Capitals","prompt":"Bamiyan city is the capital of"}
{"referenceResponse":"Abkhazia","category":"Capitals","prompt":"Sokhumi is the capital of"}
...
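
As a minimal sketch of how such a file might be produced, the Python below serializes the three records from the excerpt above into JSON Lines text, one object per line. The helper names `to_jsonl` and `write_dataset` are hypothetical and not part of any Amazon Bedrock tooling; only the field names come from the excerpt.

```python
import json

# Sample records copied from the dataset excerpt above.
records = [
    {"prompt": "Aurillac is the capital of", "referenceResponse": "Cantal", "category": "Capitals"},
    {"prompt": "Bamiyan city is the capital of", "referenceResponse": "Bamiyan Province", "category": "Capitals"},
    {"prompt": "Sokhumi is the capital of", "referenceResponse": "Abkhazia", "category": "Capitals"},
]


def to_jsonl(rows):
    """Serialize each record as one JSON object per line (hypothetical helper)."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in rows) + "\n"


def write_dataset(rows, path):
    """Write the records to a JSON Lines file at the given path (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(rows))
```

Calling `write_dataset(records, "capitals.jsonl")`, where the file name is only a placeholder, would create a file you could then supply as your own dataset.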

Reviewing Evaluation Results

After configuring and running the evaluation job, you can review the model’s performance in a detailed report provided by Amazon Bedrock. This report helps you understand the strengths and weaknesses of the model across different tasks and metrics.

Human Model Evaluation in Amazon Bedrock

Setting Up Human Evaluations

For evaluations that require a subjective assessment or custom metrics like relevance to brand voice, Amazon Bedrock facilitates human evaluation workflows. You can initiate this by creating a new model evaluation and choosing either ‘Human: Bring your own team’ or ‘Human: AWS managed team’.

Customizing Human Evaluation Workflows

If you opt for an AWS managed team, you will need to provide details about the task type, required expertise, and contact information. AWS experts will then collaborate with you to tailor a project that meets your specific needs. Conversely, if you use your own team, you’ll follow similar steps for setup but manage the evaluators directly.

Data Format and Reporting

For human evaluations, the dataset again uses JSON Lines format, but here the ‘category’ and ‘referenceResponse’ fields are optional. Here’s how you might set it up:

JSON:

{"prompt":"Aurillac is the capital of","referenceResponse":"Cantal","category":"Capitals"}
{"prompt":"Bamiyan city is the capital of","referenceResponse":"Bamiyan Province","category":"Capitals"}
{"prompt":"Senftenberg is the capital of","referenceResponse":"Oberspreewald-Lausitz","category":"Capitals"}

Upon completion, Amazon Bedrock generates a comprehensive evaluation report detailing the performance of the model against your selected metrics.

Important Considerations

Model Support and Pricing

During the preview phase, you can evaluate text-based large language models (LLMs): one model per automatic evaluation job, or up to two models per human evaluation job. In this phase you are charged only for model inference, with detailed rates available on the Amazon Bedrock Pricing page.
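
The model limits above can be expressed as a simple pre-flight check in your own tooling. This is only a sketch: the `MAX_MODELS` table and the `check_model_count` function are hypothetical, the model identifiers in the usage note are placeholders, and the limits are the preview values stated above, which may change.

```python
# Preview limits as described above; these may differ in later releases.
MAX_MODELS = {"automatic": 1, "human": 2}


def check_model_count(evaluation_type, model_ids):
    """Raise ValueError if the number of models exceeds the preview limit."""
    limit = MAX_MODELS.get(evaluation_type)
    if limit is None:
        raise ValueError(f"unknown evaluation type: {evaluation_type!r}")
    if not 1 <= len(model_ids) <= limit:
        raise ValueError(
            f"{evaluation_type} evaluation accepts 1 to {limit} model(s), "
            f"got {len(model_ids)}"
        )
    return True
```

For example, `check_model_count("human", ["model-a", "model-b"])` passes, while the same two placeholder models with `"automatic"` raise a `ValueError`.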

Regional Availability

Automatic and human evaluations are available in public preview in the US East (N. Virginia) and US West (Oregon) AWS Regions.

Partnering with Origo for Foundation Model Evaluation

Origo stands out as the ideal AWS partner to assist customers in evaluating various foundation models within AWS. With a team of experts who possess the right skills and deep knowledge of AWS services, Origo ensures that not only large enterprises but also professional and small offices seeking back-office support can benefit from tailored AI solutions. No matter the size of your business, Origo is equipped to guide you through the complex landscape of foundation models, helping you optimize your AI strategies effectively.

Conclusion

With the tools and features offered by Amazon Bedrock, businesses can refine their use of AI, backed by both automatic and human evaluation methods. Origo enhances this process by providing expert guidance tailored to each customer’s unique needs. Whether you’re just beginning to explore AI capabilities or looking to deepen existing applications, Amazon Bedrock, together with Origo’s expertise, provides a robust foundation for improving model performance and aligning it with business goals.

For more information, contact us at info@origo.ec