Teaching Large Language Models (LLMs) to do Math Correctly

Published at
1/15/2025
Categories
llm
ai
node
typescript
Author
shayy

When we use large language models, we often find they struggle with math. But there's a way to help them get better at it. We can create an AI agent and give it access to code that does mathematical evaluations. This way, we can make sure the math answers are correct. Let's look at a piece of code that does this and explain each part.

Setting Up the Code

First, we need to bring in some tools that will help our AI agent do math. We use these lines to get the tools ready:

import { generateText, tool } from "ai";
import { createOpenAI } from "@ai-sdk/openai";
import { z } from "zod";
import { evaluate } from "mathjs";

These lines bring in the functions and libraries we need. The generateText and tool functions come from the "ai" package. We use createOpenAI from "@ai-sdk/openai" to connect to the OpenAI service. The z object comes from "zod", a schema library that lets us describe and validate the data the tool expects. And evaluate from "mathjs" is what we use to do the actual math.

To use these, you need to install them. You can do this by running this command in your terminal:

npm install ai @ai-sdk/openai zod mathjs

Connecting to OpenAI

Next, we set up our connection to OpenAI. We do this with this piece of code:

const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

This code creates an OpenAI provider using an API key read from an environment variable called OPENAI_API_KEY. Reading the key from the environment keeps it out of the source code, so it does not end up in version control.
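If the environment variable is not set, the problem may only show up later as a failed API request. A minimal sketch of failing early instead is shown below. The helper name requireEnv is hypothetical and not part of any of the packages above; it takes the environment as a plain object so it can be tried without a real key.

```typescript
// Hypothetical helper: read a required variable from an environment-like object.
// Throws immediately if the variable is missing or blank.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

In the setup above, this could be used as apiKey: requireEnv(process.env, "OPENAI_API_KEY"), assuming the script runs in Node.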

Creating the Math Tool

Now, we create a tool that will do the math for us. We define it like this:

const math = tool({
  description: "A tool for evaluating mathematical expressions",
  parameters: z.object({
    expression: z
      .string()
      .describe(
        "The mathematical expression to evaluate in format supported by mathjs"
      ),
  }),
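  execute: async (params) => {
    // evaluate(expr, scope): the second argument is a variable scope, not a
    // config object, so BigNumber mode cannot be requested here.
    const result = evaluate(params.expression);
    return String(result);
  },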
});

This tool is called math. It has a description that says it's for evaluating math expressions. The parameters part tells us what kind of data the tool needs. In this case, it needs a string called expression, which is the math problem we want to solve.

The execute part is where the tool does its work. It uses the evaluate function from mathjs to solve the math problem. The result is then turned into a string and returned.
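One case the execute function does not cover is an expression that cannot be parsed, in which case the evaluator throws. A hedged sketch of one way to handle this is to catch the error and return its message as the tool result, on the assumption that the model can then retry with a corrected expression. The names Evaluator and safeEvaluate are hypothetical, and the evaluator is passed in as a parameter so the sketch runs without any dependencies.

```typescript
// Hypothetical type: any function that turns an expression string into a value.
type Evaluator = (expression: string) => unknown;

// Returns the result as a string, or a readable error message if evaluation throws.
function safeEvaluate(expression: string, evaluator: Evaluator): string {
  try {
    return String(evaluator(expression));
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return `Error evaluating "${expression}": ${message}`;
  }
}
```

With mathjs, the evaluator argument could be (e) => evaluate(e), and the tool's execute function would return safeEvaluate(params.expression, (e) => evaluate(e)).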

Using the Math Tool

Finally, we use the math tool in a function called main. Here's how it works:

async function main() {
  const result = await generateText({
    model: openai("gpt-4o"),
    system:
      "You are a math expert. When you are asked to evaluate a mathematical expression, use the math tool to evaluate it. Finally, once you have the result, return it as a string.",
    tools: { math },
    maxSteps: 10,
    prompt:
      "Calculate the volume of a cylinder with a radius of 5 meters and a height of 10 meters.",
  });

  console.log(`Output: ${result.text}`);
}

main();

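In this function, we tell the AI to use the gpt-4o model and act as a math expert. We give it a system message that tells it to use the math tool when it needs to solve a math problem. We also give it a prompt, which is a math problem to solve. The maxSteps: 10 setting allows up to ten model calls in sequence, so after the tool returns its result the model can use that result to write the final answer.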

The AI will use the math tool to solve the problem and return the result as a string. We then print this result to the console.

Output

When we run this code, we get the following output:

Output: The volume of the cylinder is approximately 785.4 cubic meters.
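As a quick sanity check on that figure, the volume of a cylinder is V = pi * r^2 * h. For this prompt the model would be expected to send the tool an expression along the lines of pi * 5^2 * 10, although the exact string it chooses is an assumption. The same calculation in plain TypeScript, with a hypothetical helper name, looks like this:

```typescript
// Volume of a cylinder: V = pi * r^2 * h
function cylinderVolume(radius: number, height: number): number {
  return Math.PI * radius * radius * height;
}

const volume = cylinderVolume(5, 10);
console.log(volume.toFixed(1)); // prints "785.4"
```

This matches the rounded value in the model's answer.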

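By using this code, we can make our AI agent much more reliable at math. The arithmetic is done by mathjs rather than by the model, so the calculation itself is exact. The model still has to turn the problem into the right expression, but that is usually the easier part for an LLM.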
