Chat AI
Learnings from deploying AI too early…
When OpenAI announced they were opening up public access to their GPT models, we on the Innovation Team at DataSnipper saw a real opportunity.
DataSnipper’s core function is simple: it extracts data from structured sources - invoices, bank statements, tax documents - and seamlessly links it into Excel, making life easier for financial professionals. But we wanted to go further. We saw how AI could help us move beyond structured data and dive into the messier, unstructured world.
My Product Manager and I adopted a new continuous discovery framework to develop our understanding of our users’ perspectives on AI implementation.
From September to December, we held interviews with clients from around the world, covering all verticals. Each round of interviews left us with a variety of suggestions, requests and general ideas for how DataSnipper users wanted to utilise AI.
The issue we quickly discovered was that AI was the new craze, promising the world.
Requests ranged from ‘I want it to do my entire audit’ to ‘It’d just be nice if it could summarise things for me’. We had to filter these down to understand where clients were spending the most time, and where natural language AI could actually be applied appropriately.
The Problem
Among the major documents supplied to auditors are various forms of agreements: lengthy contracts written in complex legalese and containing multiple hidden clauses.
Part of the auditor’s job is understanding these documents in order to extract specific information.
The Search & Snip feature is already a hugely powerful tool for any modern DataSnipper user. Through keyword search, users can locate the relevant areas of their documents and pinpoint information. However, while this can get someone in the ballpark, it can still take a lot of reading to find explicit information. This route also requires being adept with keywords and knowing the surrounding content.
Overall, this approach, though fast for the power user, can be slow and manual for the uninitiated.
So how might we bring explicit, niche information directly to our users?
Chat AI
Natural Language Queries, backed by Snips.
Starting with text…
When we began this project and its implementation into DataSnipper, we determined that the best way forward would be to build a simple POC.
Allow users to ‘chat’ with their documents to find answers. We’d then pull the more specific details out of the answer and present them to the user as a Snippable value.
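To make the pattern concrete, here is a minimal sketch of the idea (the actual POC was built in WPF/C#, so this Python is purely illustrative; the Azure OpenAI endpoint, key and deployment name are placeholders): a question goes in alongside the document text, and the model returns an answer plus a verbatim quote we can locate in the document and turn into a Snip.

```python
# Illustrative sketch only - the real POC lived inside WPF/C#.
# Endpoint, key and deployment name are placeholders, not production values.
import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://example.openai.azure.com",  # placeholder
    api_key="…",
    api_version="2024-02-01",
)

def ask_document(document_text: str, question: str) -> dict:
    """Ask one question of one document and return the answer plus a
    verbatim quote that can be located in the source (the 'snippable' value)."""
    system = (
        "Answer the user's question using only the document provided. "
        "Reply as JSON: {\"answer\": ..., \"quote\": ...} where 'quote' is a "
        "short verbatim excerpt from the document that supports the answer."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder deployment name
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": f"Document:\n{document_text}\n\nQuestion: {question}"},
        ],
    )
    result = json.loads(response.choices[0].message.content)
    # Locating the quote in the original text gives the position to snip.
    result["offset"] = document_text.find(result["quote"])
    return result
```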
With little needed in terms of design, we cracked on with the POC. As soon as we had something tangible, we showcased a demo of what we had achieved in less than a month.
This demo led management to ask us to bring it into production before year’s end…
Road to production…
The original POC was built in WPF and was fraught with bugs and issues. Some technical constraints would have to be fixed on the engineering side, whilst others would have to be solved by product.
Rate limiting
Fast becoming a well-known (and ever-expanding) issue, rate limits on questions strongly inhibited users from carrying on a meaningful interaction with the AI.
Repeated poorly framed questions against long, complex documents would slow the Azure environment before eventually crashing it.
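Limits like these typically push the client towards back-off and hard caps on the conversation. A minimal sketch of that kind of guardrail, assuming the hypothetical `ask_document` helper above (illustrative; the retry counts here are not what we shipped):

```python
# Illustrative sketch: retry with exponential backoff when the Azure OpenAI
# deployment returns a 429, rather than hammering it with repeated prompts.
import time
from openai import RateLimitError

def ask_with_backoff(ask_fn, question: str, retries: int = 3):
    """Call `ask_fn` with the question, backing off when rate limited."""
    delay = 2.0
    for attempt in range(retries):
        try:
            return ask_fn(question)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # give up and surface the error to the UI
            time.sleep(delay)
            delay *= 2  # back off: 2s, 4s, 8s…
```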
Limited to one doc
By far the biggest constraint was that users were limited to asking questions of the document they were currently viewing.
They could not compare evidence across docs and would have to manually switch docs to ask further questions.
Blank box syndrome
When we first handed our POC to a few select users, we watched from behind the scenes as they sat at their keyboard.
“What should I type?” “What do I need from this document?” “How do I phrase this question?”
These were very common responses. Without any prompting, and with too much room to think, users sat staring blankly at the text input and opted to ask the AI random questions rather than targeted ones.
Introducing the Usage Ring
To avoid rate limiting issues, we needed a way to present users with a visual representation of their remaining questions/prompts.
As a user continues their conversation with the AI, the ring depletes. At certain stages, a warning will show to prompt them to refresh the chat.
This kept usage within the limits and encouraged users to keep their prompts precise and direct.
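Under the hood, a ring like this only needs a small question budget per chat session. A minimal sketch of that logic (illustrative Python; the limit and warning threshold here are hypothetical, not the shipped values):

```python
# Illustrative sketch of the budget logic behind a usage ring.
from dataclasses import dataclass

@dataclass
class UsageRing:
    limit: int = 10   # hypothetical: questions allowed per chat session
    used: int = 0

    def record_prompt(self) -> None:
        self.used += 1

    @property
    def remaining_fraction(self) -> float:
        """Drives how much of the ring is still filled in the UI."""
        return max(0.0, (self.limit - self.used) / self.limit)

    @property
    def warning(self) -> str | None:
        """Warn the user to refresh the chat as the budget runs out."""
        if self.used >= self.limit:
            return "Limit reached - refresh the chat to continue."
        if self.remaining_fraction <= 0.2:
            return "Almost out of questions - consider refreshing the chat."
        return None
```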
❌ In the end, most users ignored the ring, opting to send a single prompt at a time before switching docs entirely (doing this would cause the chat to refresh anyway). This in turn caused confusion and frustration at the lack of context and history.
Reducing conversation
In another attempt to reduce rate-limit hits and long context, we worked on reducing chat capabilities and treated the input more as an extended search.
1 query, 1 answer at a time.
By not showing past prompts, we aimed to get users to be more precise and targeted with their questioning, mimicking the behaviour of a search engine.
❌ While this worked visually, the lack of context once again tripped up users, who disliked not having a readily available history to reference when expanding prompts and searching for answers.
Introducing the Prompts
To help mitigate the issues surrounding Blank Box Syndrome, we wanted to introduce a series of simple prompts to the AI, to get a user started.
These simple chips are now a well-recognised staple of AI UX: a menu of predetermined questions that a user could select based on their document.
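A common way to generate such chips is to ask the model itself for a handful of candidate questions about the open document. A rough sketch of that approach (illustrative; `ask_llm` is a hypothetical helper standing in for the chat call above):

```python
# Illustrative sketch of generating starter-prompt chips for a document.
def suggest_prompts(document_text: str, ask_llm, n: int = 3) -> list[str]:
    """Ask the model for a handful of starter questions to show as chips."""
    instruction = (
        f"Read the document below and propose {n} short, specific questions "
        "an auditor might ask about it. Return one question per line.\n\n"
        f"{document_text[:4000]}"  # truncate long documents for the prompt
    )
    reply = ask_llm(instruction)
    lines = [line.strip("- ").strip() for line in reply.splitlines()]
    return [line for line in lines if line][:n]
```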
❌ These didn’t work due to the variety of documents being questioned. Though AI generated, the prompts were generic enough that a user would still have to sift through them to find one appropriate for their document.
This ultimately defeated the time savings that providing prompts was meant to deliver.
Our testing plan was to get the POC, and eventually the MVP, out to select clients as soon as possible.
We succeeded in achieving a stable enough version for testers to install and provided them with materials. As part of the testing agreement, their prompts would be monitored and they would fill out a diary study during a one-month period with Chat AI.
We provided a simple Word doc for them to relay their feedback in, as well as direct access to my PM and myself.
Each week we would prompt them with a challenge to complete and see the results. We’d then receive the data on the backend and the feedback in quick summary calls that users booked with us.
What we couldn’t fix…
Ultimately it all came down to technical constraints and cost. As soon as we released for testing, the immediate feedback centred on the one-document limitation, which severely slowed users down when context switching. The engineers worked in the background to mitigate these issues, but to no avail. It would be too expensive to send a single prompt across multiple documents, and difficult to send multiple prompts against a single document. This was the barrier we could not overcome.
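To illustrate the cost problem: the naive way to answer one prompt across several open documents is a separate full-document call per document, so token cost and latency grow linearly with the number of documents (sketch below; `ask_document` is the hypothetical helper from earlier, not our actual implementation).

```python
# Illustrative only: fanning one question out across N documents means N
# full-context calls, so cost and latency scale with the document count.
def ask_across_documents(documents: dict[str, str], question: str, ask_document) -> dict:
    return {name: ask_document(text, question) for name, text in documents.items()}
```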
It was decided that Chat AI would not go any further after testing was complete. We stopped production and wrapped up testing before announcing to the company that we’d be focusing on other work.
It’s not all bad news. It’s never fun watching a project get iceboxed, but there are always learnings to take away.
- AI is not a second thought - We built Chat AI out of a desire to explore the technology; we were working backwards from a technological solution in hand to find a problem. Whilst AI still plays a major role in our solutions, we now consult our AI Principles before choosing the path to take.
- Don’t promise the world - This goes for both users and stakeholders. AI was at peak craze and everyone wanted it to do everything, which gave our users biases and expectations going into testing. As for stakeholders, we wanted to push forward and drive innovation at DataSnipper, which meant committing to unrealistic deadlines and unachievable goals.
What happened next…
We returned to our discovery methodology and began to define our AI Product Principles. Alongside my PM and EM, we determined we needed guiding principles to ensure that, when building AI products, we would not stray from what our users really wanted and what we as a company stand for.
- Traceable - It always leads back to the source.
- Actionable - Allow users to work with the output.
- Supportive - It’s not doing everything for you, but it’s helping you get there.
- Specific - Deployment of AI is not a blanket solution, apply it to targeted problem spaces.
DocuMine is the first product built from the ground up with these principles in mind. Along the way we have continually checked in with them, asking “is this aligning?” before making decisions. It is scheduled for release in November, and a larger case study on some of the challenges we faced with this product is incoming!