With the rise of LLMs and other AI tools, it’s given that many people would use them for automating tasks. One major interest for this is code or command generation with LLMs and in this article, we’ll look at how effective LLMs are at completing a given task when given full access to the user’s terminal.
⚠️ Warning⚠️: Giving an LLM full access to your terminal may result in data loss or system corruption. It is up to you to judge if the risk is worth it.
TLDR: Here’s a sort of proof of concept
Model
Now that the legal stuff is over let’s start with what we’ll need. An ideal model will be one with a large context window, tuned for command generation, relatively fast, and possibly having an internet search ability in case it runs into something unknown.
Unfortunately, a perfect model does not exist (yet) so I’ll be going with the second best option. Cohere provides their Command-R-Plus model for free and it fits most of the requirements. It’s got a massive 128k token context limit, internet abilities, and is quite fast.
Structure
The goal is to have the LLM interact with the terminal like you would. This means it runs commands one by one and tries to fix errors when they occur. It should also fill out any required inputs and know when it’s completed the task.
Testing
To get all this working I created a Python program to handle sending the LLM the necessary information and running the received commands. I then devised a few unscientific test tasks to see how well the AI performs.
Task 1: Look for the directory test-dir. If it exists, create a file inside called test.txt with the contents “test”. If it doesn’t exist, create test.txt in the current folder.
This task gave mixed results. The AI started by looking for test-dir. When it existed it did the right thing and created test.txt with “test” inside. However, when test-dir didn’t exist, the AI decided to create test-dir and place test.txt inside instead of just placing test.txt in the current folder.
Task 2: Install Bettercap.
I tried this on 2 distros, the first one had Bettercap available from the package manager, and the AI successfully installed it. The second distro did not have Bettercap available, so the AI tried to clone the Bettercap repo and build it. This is where it failed as instead of using the built-in build script it kept trying to use make which failed. It was also unable to input data when the program asked (like when apt asks for a y/n) but that was due to a limitation of the code.
Conclusion
LLMs are not quite ready to take full control of a terminal. They can perform simple tasks but more complicated, multistep ones are too difficult. I may have gotten better results with paid models like GPT-4 but I wanted this to be available for everyone. Feel free to post your suggestions or improvements to the code on the GitHub repo.
Leave a Reply
You must be logged in to post a comment.