
A Human Touch to Test Automation with LLMs?
Introduction
As the landscape of software testing evolves, leveraging AI-based solutions has become an increasingly attractive prospect. So far, though, AI in test automation has mostly been limited to partial optimizations such as self-healing and code or test data generation. Our R&D team embarked on a project to explore how Large Language Models (LLMs) can be used to automate web application testing. This blog post outlines our findings, challenges, and potential implications for the future of test automation with LLMs.
Why Use LLMs for Testing?
Traditional test automation tools excel at executing predefined scripts, but they lack the ability to adapt dynamically, make suggestions, and work with unstructured input. LLMs, on the other hand, bring human-like reasoning capabilities and can:
- Propose a plan for executing a test scenario based on context and requirements.
- Make dynamic decisions as a test case progresses.
- Interpret unstructured formats such as images and human-readable text.
- Adapt to interfaces dynamically and simulate human-like interactions with applications.
Simply put, LLMs possess skills closely aligned with those of human testers, particularly in exploratory and high-level test case generation.
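To make this concrete, here is a minimal sketch (in Python, using the official openai package) of asking GPT-4o to propose a test plan from a plain-language scenario. The prompt wording and the scenario text are illustrative, not our production prompts:

```python
# Minimal sketch: ask GPT-4o to draft a test plan for a plain-language scenario.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

scenario = "Verify that a registered user can log in and see their dashboard."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system",
         "content": "You are a web application tester. Break the given "
                    "scenario into an ordered list of concrete UI test steps."},
        {"role": "user", "content": scenario},
    ],
)
print(response.choices[0].message.content)
```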
Development & Architecture
Our journey began with simple experiments: feeding webpage source code into an LLM and analyzing its responses. However, this approach had limitations: the model struggled to comprehend the structure of complex webpages, and hallucinations often diverted test execution far from the original task specification.
To enhance its capabilities, we turned to research on multimodal LLMs for web browsing and adopted a Planner-Observer Pattern. This allowed us to:
- Divide testing tasks into smaller steps.
- Introduce a planning module that decides what actions to take.
- Utilize an observer module to interpret feedback from the web application.
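In stripped-down form, the loop looks roughly like the skeleton below. All names are illustrative stand-ins for our modules, and the helper bodies are elided:

```python
# Skeleton of the Planner-Observer loop (illustrative names, bodies elided).
def observe(driver) -> str:
    """Observer module: summarize what the page currently shows,
    e.g. a pruned DOM excerpt and/or a screenshot description."""
    ...

def plan_next_action(task: str, page_state: str, history: list) -> dict:
    """Planner module: ask the LLM for the next step, e.g.
    {"type": "click", "selector": "#login"} or {"type": "done", "verdict": "pass"}."""
    ...

def execute_action(driver, action: dict) -> None:
    """Web interaction layer: replay the chosen action in the browser."""
    ...

def run_test(task: str, driver, max_steps: int = 20) -> bool:
    """Drive the loop until the planner reports a verdict or the step budget runs out."""
    history: list = []
    for _ in range(max_steps):
        state = observe(driver)                          # what does the page show now?
        action = plan_next_action(task, state, history)  # what should happen next?
        if action["type"] == "done":
            return action["verdict"] == "pass"
        execute_action(driver, action)
        history.append(action)
    return False  # ran out of steps without a verdict
```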
Key Components:
- Web Interaction Layer: Utilizes tools such as Selenium and ChromeDriver to interact with web applications.
- LLM Processing: Uses OpenAI’s GPT-4o model to generate test cases and interpret responses.
- Business logic: Orchestrates the different LLM actors and prompts them in the right way to produce useful results.
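As a small example of how these pieces meet, the sketch below fleshes out the execute_action stub from the skeleton above using Selenium. The action schema and the placeholder URL and selectors are our illustrative conventions, not a fixed API:

```python
# Sketch of the web interaction layer: turn a planner-chosen action
# into a real browser event via Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By

def execute_action(driver, action: dict) -> None:
    element = driver.find_element(By.CSS_SELECTOR, action["selector"])
    if action["type"] == "click":
        element.click()
    elif action["type"] == "type":
        element.clear()
        element.send_keys(action["text"])

driver = webdriver.Chrome()  # recent Selenium versions fetch a matching ChromeDriver automatically
driver.get("https://example.com/login")  # placeholder URL
execute_action(driver, {"type": "type", "selector": "#username", "text": "alice"})
```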
Challenges & Learnings
- LLM hallucinations:
  - LLMs sometimes generate incorrect but confident responses, which is problematic for deterministic test automation.
  - In real-life use, this could be mitigated by incorporating human validation and possibly running multiple attempts before finalizing test cases (see the sketch below).
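A minimal sketch of the multiple-attempts idea: run the same generation several times and accept a step only when the model agrees with itself, deferring to a human otherwise. Here, generate is a hypothetical wrapper around the LLM call:

```python
# Self-consistency check: accept the majority answer or defer to a human.
from collections import Counter

def consistent_answer(generate, attempts: int = 3, quorum: int = 2):
    answers = [generate() for _ in range(attempts)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count >= quorum else None  # None -> route to human validation
```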
- Performance limitations:
  - Real-time web browsing using LLMs is currently slow, requiring optimizations for practical usability.
  - We introduced preprocessing steps and speed enhancements in our demo; one example follows below.
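One example of the kind of preprocessing we mean: pruning scripts, styles, and locator-irrelevant attributes from the page source before it reaches the LLM cuts token counts, and therefore latency, considerably. The sketch below uses BeautifulSoup; the attribute whitelist and size cap are illustrative:

```python
# Shrink raw page source down to the markup the model actually needs.
from bs4 import BeautifulSoup

def compact_dom(html: str, max_chars: int = 20_000) -> str:
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style", "svg", "noscript"]):
        tag.decompose()  # content the model never needs to see
    for tag in soup.find_all(True):
        # keep only attributes useful for locating elements
        tag.attrs = {k: v for k, v in tag.attrs.items()
                     if k in ("id", "name", "class", "href", "type", "aria-label")}
    return str(soup)[:max_chars]  # hard cap as a last resort
```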
- Captchas & restrictions:
  - Some web applications restrict automated interactions (e.g., CAPTCHA challenges prevent bots from selecting certain UI elements).
  - Modern apps also rely on mouse-hover, drag, and touch interactions; the AI needs suitable integrations and observability features to reach them (sketched below).
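Hover and drag, at least, are reachable once the interaction layer wraps Selenium's ActionChains, as in this sketch (selectors are placeholders):

```python
# Exposing hover and drag to the planner via Selenium's ActionChains.
from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

def hover(driver, selector: str) -> None:
    element = driver.find_element(By.CSS_SELECTOR, selector)
    ActionChains(driver).move_to_element(element).perform()

def drag(driver, source_selector: str, target_selector: str) -> None:
    source = driver.find_element(By.CSS_SELECTOR, source_selector)
    target = driver.find_element(By.CSS_SELECTOR, target_selector)
    ActionChains(driver).drag_and_drop(source, target).perform()
```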
Practical Applications
One promising approach involves using LLMs to generate traditional test automation cases for later reuse, rather than always executing tests in “smart mode”. For instance:
- Test case generation:
  - The LLM records test steps (e.g., clicked elements, inputs) and converts them into reusable test scripts (a toy converter is sketched after this list).
  - These scripts can then be executed independently, reducing reliance on the LLM during routine test runs.
- Regression testing support:
  - When test scripts fail due to UI changes, the LLM can assist in regenerating or fixing the test automation code.
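To illustrate the recording idea, here is a toy converter that turns recorded steps into a standalone Selenium script. The recorded-step format is our illustrative one, and a simple template stands in for the LLM's code generation:

```python
# Toy example: convert recorded steps into a reusable Selenium script.
RECORDED_STEPS = [
    {"type": "type", "selector": "#username", "text": "alice"},
    {"type": "type", "selector": "#password", "text": "secret"},
    {"type": "click", "selector": "#login-button"},
]

TEMPLATES = {
    "click": 'driver.find_element(By.CSS_SELECTOR, "{selector}").click()',
    "type": 'driver.find_element(By.CSS_SELECTOR, "{selector}").send_keys("{text}")',
}

def to_script(steps: list[dict], url: str) -> str:
    lines = [
        "from selenium import webdriver",
        "from selenium.webdriver.common.by import By",
        "",
        "driver = webdriver.Chrome()",
        f'driver.get("{url}")',
    ]
    lines += [TEMPLATES[step["type"]].format(**step) for step in steps]
    return "\n".join(lines)

print(to_script(RECORDED_STEPS, "https://example.com/login"))  # placeholder URL
```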
Future Directions
Although this approach is still experimental, it holds huge potential. Future improvements could include:
- Exploring local LLM models: Reducing dependency on API-based models to minimize costs and enhance data privacy (a sketch follows this list).
- Enhancing speed & determinism: Improving LLM response consistency and reducing latency.
- Expanding beyond web applications: Initial experiments suggest potential use cases in Windows applications as well.
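On the first point, many local runtimes (e.g., Ollama or vLLM) expose an OpenAI-compatible endpoint, so moving off the hosted API can be close to a one-line change. The URL and model name below assume a local Ollama instance:

```python
# Pointing the same client at a local, OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1",  # local Ollama default
                api_key="unused")                      # required by the client, ignored locally
response = client.chat.completions.create(
    model="llama3",  # whichever model is pulled locally
    messages=[{"role": "user", "content": "Propose test steps for a login form."}],
)
print(response.choices[0].message.content)
```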
Conclusion
LLMs introduce new possibilities in the realm of software testing by offering dynamic test generation, exploratory testing assistance, and automated script maintenance. While challenges remain, hybrid approaches that combine LLM-powered insights with traditional automation tools could redefine how we approach software testing.
We look forward to refining these ideas further and welcome collaboration from the testing and AI communities. Stay tuned for more updates as we continue our journey in AI-assisted software testing!