Building a (somewhat) intelligent agent
This started on a Saturday night. If you are very social like me, you would know that there is no better time to do some coding than a peaceful Saturday night. So I opened up a pet project I've been working on and realized that it hadn't been pushed to GitHub yet. I didn't remember the commands to set a remote repo and could have easily Googled or "ChatGPTed" them. But wouldn't it be cooler to add another layer of abstraction and just tell the computer to "set this project's remote as such and such", especially in this era of intelligent agents? And wouldn't it be even cooler to build that agent? And that's exactly what I did, instead of spending a few seconds on finding the commands to set the remote repo.
I started solving the problem backwards. I would need a way to run shell commands from a program. That's easy, the subprocess module in Python.
import subprocess

def run_shell_command(command):
    try:
        # Run the shell command
        result = subprocess.run(command, shell=True, check=True, text=True, capture_output=True)
        # Return the command output
        return result.stdout
    except subprocess.CalledProcessError as e:
        # Return the error output if the command fails
        return e.stderr

print(run_shell_command('pwd'))
Now I need a way to decide what commands to run. That's where the intelligence comes in. It needs to take the natural language input and convert it to shell commands. Large Language Models (LLMs) are good at this sort of thing. So I tried the following prompt on ChatGPT.
I'm computer program created to assist a human. Currently the human is working on a DevOps task. He asked me to "ssh to the development server and install python". Please tell me what shell commands to run to do the activity. If you need more information please let me know what information you need. Please respond in the following manner and don't include anything else in the response.
Type: [Can be either "Commands" or "More information"]
Response: [List of shell commands or list of more information needed, separated by commas]
Example response 1:
Type: More information
Response: user id, key path
Example response 2:
Type: Commands
Response: ssh -i 'keyfile.pem' user1@server
It worked surprisingly well most of the time.
This was the response.
Type: More information
Response: user id, server IP or hostname, key path or password, operating system type (e.g., Ubuntu, CentOS)
And after passing the inputs, it returned the list of commands as,
Type: Commands
Response: ssh -i 'key.pem' [email protected], sudo yum install python3 -y
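To act on a reply like this from a program, the two fields have to be pulled apart. Here is a minimal sketch of one way to do that; the function name `parse_reply` is made up for illustration, and it assumes the model sticks to the "Type:" / "Response:" layout requested in the prompt.

```python
def parse_reply(reply: str) -> tuple[str, list[str]]:
    """Split a 'Type: ... / Response: ...' reply into (type, items)."""
    fields = {}
    for line in reply.strip().splitlines():
        # Split on the first colon only, so colons inside commands survive
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    kind = fields.get("type", "")
    if kind not in ("Commands", "More information"):
        raise ValueError(f"unexpected reply type: {kind!r}")
    # Items are comma separated, as the prompt asks
    items = [item.strip() for item in fields.get("response", "").split(",") if item.strip()]
    return kind, items


print(parse_reply("Type: More information\nResponse: user id, key path"))
```

Note that the naive comma split breaks on items that contain commas themselves, such as "operating system type (e.g., Ubuntu, CentOS)" in the reply above.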
Not exactly production ready, but this is a promising start. On this high note I had to stop for day one, or rather hour one, since I'm no longer the young man I once was and it was already 10 PM.
A week later...
Zooming out a little: how would I use this? I would open a project in the terminal and type "set the remote repo for this project as ". The agent then asks the LLM for the commands to run. If the LLM needs more information, the agent prompts me for it and sends my answers back, and the LLM either returns commands or asks for more. This repeats until a command runs. If the command succeeds, the agent stops; if it returns errors, the agent asks the LLM for commands to resolve them. With each request, the agent also sends a suitably sized window of the conversation history to give the LLM context.
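The flow described above can be sketched as a small loop. This is only an illustration under stated assumptions, not Shelly's actual code: `ask_llm`, `run_command` and `ask_user` are hypothetical callables passed in by the caller, the reply is assumed to be a dict with `type` and `items` keys, and the window size and turn limit are arbitrary.

```python
from collections import deque


def run_agent(task, ask_llm, run_command, ask_user, window=6, max_turns=10):
    """Loop until a batch of commands succeeds or max_turns is reached.

    Assumed (hypothetical) interfaces:
      ask_llm(task, history) -> {"type": "commands" | "more_information", "items": [...]}
      run_command(cmd)       -> (ok: bool, output: str)
      ask_user(question)     -> str
    """
    # Sliding window: only the most recent `window` entries are kept
    history = deque(maxlen=window)
    for _ in range(max_turns):
        reply = ask_llm(task, list(history))
        if reply["type"] == "more_information":
            # Ask the human for each missing detail and record the answers
            for question in reply["items"]:
                history.append(f"{question}: {ask_user(question)}")
            continue
        failed = False
        for cmd in reply["items"]:
            ok, output = run_command(cmd)
            history.append(f"Ran: {cmd}\nOutput: {output}")
            if not ok:
                failed = True  # go back to the LLM with the error in history
                break
        if not failed:
            return True
    return False
```

The task is passed separately from the history so that it never slides out of the window, and `max_turns` keeps a confused model from looping forever.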
We would need to make the queries to the LLM a little abstract so that the agent can handle a wider range of tasks. After all, it wouldn't be very useful if it's only capable of setting remote repo URLs. At the same time, we need to clearly define its scope. In this case it would be an agent for running shell commands. To help handle a range of commands, we can parameterize the prompt. Those parameters would be,
- The natural language input from the human.
- Context: This is a little tricky; I will use the conversation history for now.
- Any errors returned by running a command.
In addition to that, we will have to maintain the agent's state, such as whether it is executing a command or getting more info.
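As a rough sketch, the three parameters and the state could be combined into a prompt like this. The template wording, the `AgentState` enum and the `build_prompt` function are all assumptions made for illustration; the real template may look quite different.

```python
from enum import Enum


class AgentState(Enum):
    # The two states mentioned above; the names are illustrative
    GETTING_INFO = "getting more info"
    EXECUTING = "executing a command"


PROMPT_TEMPLATE = (
    "You are a program that helps a human by running shell commands.\n"
    "Current state: {state}\n"
    "Request: {user_input}\n"
    "Conversation so far:\n{context}\n"
    "Last error: {error}\n"
    "Reply with the commands to run, or list the information you still need."
)


def build_prompt(user_input, history, error=None,
                 state=AgentState.GETTING_INFO, window=6):
    """Fill the template with the request, recent history and last error."""
    recent = history[-window:] if window > 0 else []
    context = "\n".join(recent) or "(none)"
    return PROMPT_TEMPLATE.format(
        state=state.value,
        user_input=user_input,
        context=context,
        error=error or "(none)",
    )


print(build_prompt("set the remote repo", ["remote url: (answer from user)"]))
```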
Let's code it. I've changed the LLM's output to a JSON-formatted string since it's easier to write the processing part that way.
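To show why JSON makes the processing easier, here is a minimal sketch of parsing such a reply. The schema (a `type` field and a `response` list) and the type labels are assumptions for illustration, since the exact format isn't shown here.

```python
import json

VALID_TYPES = {"commands", "more_information"}  # assumed labels


def parse_llm_json(raw: str) -> tuple[str, list[str]]:
    """Extract (type, items) from a JSON reply, tolerating extra text around it."""
    # Models sometimes wrap the JSON in chatter, so cut out the outermost braces
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(raw[start:end + 1])
    kind = data.get("type")
    items = data.get("response", [])
    if kind not in VALID_TYPES or not isinstance(items, list):
        raise ValueError(f"unexpected reply shape: {data!r}")
    return kind, [str(item) for item in items]


print(parse_llm_json('{"type": "commands", "response": ["git status"]}'))
```

Because the items arrive as a proper list, commands or questions that contain commas no longer need special handling.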
I tested it with a few simple commands and they worked as expected.
Seems alright. Let's try another one.
That's not what I asked for. Maybe we need to be more specific.
That's more like it. Although I should definitely add a mechanism to verify the commands before running them. That should prevent the agent from doing something crazy. Also, explaining a command before it runs would be a good feature - but not for now.
answer = input(f"Shall I run '{command}'? (Yes/No) ")
if answer.lower() == 'yes':
    # Execute the command
    print(run_shell_command(command))
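A slightly fuller, self-contained sketch of the same confirmation step might look like the following. The function name `confirm_and_run` and the injectable `ask` parameter are assumptions added here to make the check easy to test; only standard `subprocess.run` arguments are used.

```python
import subprocess


def confirm_and_run(command, ask=input):
    """Run `command` only if the user explicitly agrees.

    Returns None when the user declines, otherwise a
    (returncode, stdout, stderr) tuple.
    """
    answer = ask(f"Shall I run '{command}'? (Yes/No) ")
    if answer.strip().lower() not in ("yes", "y"):
        return None  # user declined; nothing was executed
    result = subprocess.run(command, shell=True, text=True, capture_output=True)
    return result.returncode, result.stdout, result.stderr


# Example with a canned answer instead of real keyboard input
print(confirm_and_run("echo hello", ask=lambda prompt: "yes"))
```

Anything other than an explicit yes is treated as a refusal, which is the safer default when the command comes from an LLM.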
So, it kind of works, but we need to make it easily accessible. Creating an alias did the trick. I added the following to ~/.bashrc.
alias shelly='/home/akalanka/projects/shelly/venv/bin/python3 /home/akalanka/projects/shelly/main.py'
Let's see how well "Shelly" fulfills her purpose. First I told Shelly to create the remote repo, but it didn't work because she was trying to set up authentication for the gh CLI tools, which was too complex for a simple tool like this. So I created the remote repo myself and then asked her to set it as the origin of the local repo, which also failed the first time. But after improving the prompt template, I asked her to correct the mistake, and that actually worked.
Then I went ahead and asked her to commit and push her own code, which also was done nicely enough (ignoring the fact that she ignored the instruction about the commit message).
It's not very useful for commands I use frequently and remember, because it's quicker and more reliable to run the shell command directly. But for other cases this actually seems to help.
So about a week later, I was finally able to set the remote repo for the project. Great success! What a way to spend weekend evenings!
Obviously, a lot can be done to improve this. To start, some way of persisting the user inputs between invocations could smooth things out. Using LangChain could be a good idea. Let me know what you think. Also feel free to check out the source code and open a PR to make it more intelligent. It could use some help. Hey, you can use Shelly to push your feature, hopefully.
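For the persistence idea, one simple option is a small JSON file of remembered answers. This is just a sketch of that idea, not something Shelly does today; the function names and the file location are hypothetical.

```python
import json
from pathlib import Path


def load_answers(path):
    """Return previously saved answers, or an empty dict on first run."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text())


def remember_answer(path, key, value):
    """Store one answer (e.g. 'user id') so it can be reused next time."""
    answers = load_answers(path)
    answers[key] = value
    Path(path).write_text(json.dumps(answers, indent=2))
    return answers
```

A file like this is plain text, so it would be wise to keep passwords and other secrets out of it.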
P.S. This was entirely written by a human. Absolutely no intelligence - artificial or otherwise - was involved in the writing.