Quality, Feedback, and Consistency - Path to 95%+ Accuracy

Understanding the impact of Quality, Feedback, and Consistency in AI-driven customer support

Deepak Singla

In August 2023, one of the world's largest tech company approached us to integrate GenAI and automate their most frequent support queries. Their existing automation system was struggling, with the match between their Ideal Answers and AI answers (developed internally) below 50%, and support costs soaring to an all-time high. Together, we began to address the problem, diving deep into the root causes.

As our collaboration progressed, we automated over 60% of originating support queries within 4 weeks. The company was able to reimagine its customer support ecosystem, ultimately reducing overall support costs by a staggering 40% and scaling our solutions to 10 countries.

The Magic Formula: Quality, Feedback, and Consistency

Quality

One of the most relevant sayings I've heard from my managers during analytics projects is ‘Garbage In, Garbage Out’. AI is no magic either; we had to ensure the models were being trained on high-quality and comprehensive data.

We worked hand-in-hand with the client to comprehensively identify the different types of problems customers could be facing and the relevant artifacts containing this information. Our proprietary models captured that information from the client's databases (multimodal) across various platforms - web links, APIs, files and G-drive, making the data AI training-ready.

This foundation helped us rapidly jump from a sub-par 50% accuracy rate to an 85% accuracy rate. But our journey had just begun!

Feedback

As LLMs absorb information to form their responses, they struggle with three key issues: Bad Prompt, Bad RAG, and Hallucination. A strong quality assurance loop helps keep these issues in check, and we pride ourselves on our in-house QA framework. Our commitment to continuous improvement and reinforcement learning drives us to push these boundaries further every day.

Many engineers resort to using Automatic Evaluations as a stop-gap solution, but we believe that at the end of the day, an LLM can never replace the expertise of a trained human. However, with the right human-feedback loop, we can empower the LLM to provide accurate guidance to customers.

In our journey with the client, we initially used spreadsheets to track versions of our internal AI bot, prompts, and accuracy measurements across the thorough Question & Answer datasets we created. However, this quickly became unwieldy as we started to iterate. To solve this issue across the industry, we released a fully Open Source package (Paramount) to help clients' expert agents measure and pinpoint accuracy issues with AI chat. This tool speeds up the discovery of these issues by 10x.

‍

For the Techies ⚙️: Our Open Source package, Paramount (stars and PRs welcome!), works as a decorator and records your AI functions in Python into either CSV or a database. After this, you can use the CLI/Docker to launch a UI where your expert agents can easily determine if chat recordings were accurate and provide corrections.

‍

Consistency

Now that the LLMs are learning from well-organized data sources and formalized channels for QA, it was time to roll up our sleeves—iterate consistently and improve! We followed a high-touch model where we had dedicated account managers and AI engineers ensure the model meets the client’s unique escalations and needs.

We kept a strong analytical hold over the model’s accuracy on various components such as:

Customer Satisfaction: Resolution rate of the model.
Resolution Time: Active TAT for driving the customer’s query to resolution.
User and Response Sentiment: Ensuring the AI's communication style matches user preferences, making interactions more engaging and comfortable, whether professional, casual, or playful.

We also leveraged the AI model to segment issues into business-relevant clusters and tracked the parameters across them. This helped us drive targeted action, from improving data coverage to rebalancing model parameters. Iterating this process consistently helped us crack the ceiling needed to deliver an exceptional resolution rate.

‍The Result

Starting with an 85% accuracy rate in the first month, we fine-tuned our AI models and implemented a rigorous QA process to exceed 95% accuracy. Our efforts also led to the client doubling the scope of the work contract value within twelve months of establishing relations, demonstrating the impact of quality, empathy, and consistency in customer support.

Learn More

If you’re interested in enhancing your user support with high-quality, seamless, and consistent solutions, visit www.usefini.com to learn more about how we can help.

Case Studies

View all →

Case Studies

One Size Doesn’t Fit All: How We Scaled Support Excellence for DistroKid

Feb 17, 2025

Case Studies

Competing Priorities of Cost and Quality - When Column Tax Required Both, Fini was the Solution

Feb 6, 2025

Case Studies

How Qogita Scaled E-commerce Support with 88% Ticket Resolution Using Fini

Feb 6, 2025

Deepak Singla

Co-founder

Deepak is the co-founder of Fini. Deepak leads Fini’s product strategy, and the mission to maximize engagement and retention of customers for tech companies around the world. Originally from India, Deepak graduated from IIT Delhi where he received a Bachelor degree in Mechanical Engineering, and a minor degree in Business Management