Technical
Testing Devin AI: Our First Month's Insights
We had the chance to test Devin AI as part of the first 250 testers. Check out our first-month insights with Yavuz Alp and Eray Yapağcı.
We carried out several projects to test Devin. One of them, Devin's introduction task, involved querying three different APIs for basic QA and assembling the results into a benchmark. Devin successfully learned each API's documentation through the browser.
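The benchmark task above can be sketched roughly as follows. This is a minimal illustration of the pattern, not Devin's actual code: the API names and response handling are stand-ins, with the three QA services stubbed out as plain callables.

```python
import json

def build_benchmark(apis, questions):
    """Ask every QA API each question and collect the answers
    into a simple benchmark structure for later comparison."""
    entries = []
    for q in questions:
        answers = {name: ask(q) for name, ask in apis.items()}
        entries.append({"question": q, "answers": answers})
    return entries

# Stubbed "APIs" standing in for the three real QA services we asked for.
apis = {
    "api_a": lambda q: f"A:{q}",
    "api_b": lambda q: f"B:{q}",
    "api_c": lambda q: f"C:{q}",
}

benchmark = build_benchmark(apis, ["What is an online dictionary?"])
print(json.dumps(benchmark, indent=2, ensure_ascii=False))
```

In the real task, each lambda would be an HTTP call to the corresponding API, with its parameters taken from the documentation Devin read in the browser.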
We tested Devin's ability to develop a simple project by giving it the kind of input a non-engineer would provide. We asked Devin to create an online dictionary from a Turkish PDF, similar to the TDK website, and observed its planning and execution.
Devin created a plan, updated it based on our inputs, and successfully developed the project step-by-step. It worked like a software engineer: writing and organizing code, deploying the project on Netlify, and even troubleshooting issues along the way.
We noticed Devin had some trouble reading from PDFs, so we gave it a specific example ("Elmas"); Devin then wrote a test to verify that the word was retrieved correctly, demonstrating its ability to plan, code, and test based on user feedback.
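The kind of test Devin wrote can be sketched as below. The parsing function and the `word: definition` line format are our assumptions for illustration, not Devin's actual implementation; the sample string stands in for text extracted from the Turkish PDF.

```python
def parse_entries(raw_text):
    """Turn raw text extracted from the PDF into a headword -> definition map.
    Assumes (hypothetically) that each entry is on one 'word: definition' line."""
    entries = {}
    for line in raw_text.splitlines():
        if ":" in line:
            word, definition = line.split(":", 1)
            entries[word.strip()] = definition.strip()
    return entries

# Stand-in for a fragment of text pulled out of the dictionary PDF.
sample = "Elmas: değerli bir taş\nKitap: basılı yapıt"

def test_elmas_is_retrieved():
    entries = parse_entries(sample)
    assert "Elmas" in entries          # the example word must be present
    assert entries["Elmas"] != ""      # and it must carry a definition

test_elmas_is_retrieved()
```

A targeted regression test like this is exactly the feedback loop we observed: report one failing example, and the agent encodes it as a test before fixing the extraction.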
We also tested Devin's general capabilities by asking it to order a pizza using a single prompt that included login and credit card information. Devin planned, navigated the web, and successfully placed the order, showing advanced tool usage.
We also gave Devin more complex problems, such as implementing a thread library and a FAT32 file system image modifier. Unfortunately, Devin struggled with these tasks, indicating room for improvement on more advanced projects.
Our observations over the month suggest that, despite some limitations (the need for human guidance, slowness), Devin stands out as a software engineering agent. We also noted open-source projects like Princeton's SWE-agent, which serve as strong academic and community-driven alternatives.