Coris AI is building the modern risk infrastructure for payment processors. We’re excited to announce Merchant Real Industry, a classification model that uses GPT-4 to automatically determine a merchant’s MCC and NAICS codes with >90% accuracy.
The problem: Manual merchant industry classification
Payment processors play a pivotal role in powering purchases from online and brick-and-mortar merchants, but many of these organizations still rely on manual processes for merchant underwriting.
A key part of the merchant underwriting process is determining a merchant’s Merchant Category Code (MCC). For certain purposes, determining the merchant’s North American Industry Classification System (NAICS) code also becomes important.
Accurate industry classification of merchants is important for payment processors to manage risk, comply with regulations, and provide efficient services. It helps payment processors pre-qualify potential merchants, identify prohibited or high-risk businesses, and monitor for fraudulent activities.
However, determining a merchant’s MCC and NAICS code classification can be challenging. There are various factors that go into determining these codes: the product/service sold, mode of sale (in-person vs. online), method of sale (one-time or recurring), whether there is a free trial, etc.
“Historically, proper at-scale industry classification of businesses has been a very difficult problem to solve for data scientists working in payments. Prevailing systems have not worked well for a variety of reasons including the diverse nature of businesses, hard-to-get sources of sufficiently accurate business information, and extremely imbalanced training sets,” said Michael Maze, who most recently headed the Seller Risk Data Science team at PayPal.
With our new model, Coris AI is taking the guesswork out of the industry classification process and empowering risk practitioners to work more efficiently.
Coris AI’s model
We built our own MCC and NAICS code classification models using available small-business data, such as content from merchants’ websites, online reviews, and other third-party online sources. Based on our testing, our model can predict a merchant’s MCC and NAICS codes in a few seconds.
How we built it using GPT-4
Once OpenAI’s GPT-4 API became available last month, we started building our model on top of GPT-4. With our decades of experience building merchant risk tools, we were well aware that this would take more than using the latest Large Language Model (LLM) out there. At a high level, our automated industry classification process works as follows:
- Collect basic merchant information from customers, and determine the merchant’s online presence across various platforms and governmental data sources.
- Apply proprietary entity resolution techniques to validate and verify the accuracy of the information.
- Identify the merchant’s website (if available) and scrape relevant content for industry classification.
- Leverage additional information from online platforms to gain insights into the merchant’s industry and business activities.
- Run the collected data through our classification model to determine the MCC and NAICS codes.
- Apply predefined thresholds to ensure reliable and accurate classification results.
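The final thresholding step above can be illustrated with a minimal Python sketch. It is not our production code: the `Candidate` type, the `pick_code` function, the confidence scores, and the 0.8 cutoff are all hypothetical and chosen only for illustration. The idea is that a predicted code is accepted only when its score clears a predefined threshold, and is otherwise left unassigned so it can be handled another way, such as manual review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A candidate industry code with a model confidence score (hypothetical shape)."""
    code: str          # e.g. an MCC such as "5812"
    confidence: float  # assumed to lie in [0, 1]


def pick_code(candidates: List[Candidate], threshold: float = 0.8) -> Optional[str]:
    """Return the highest-confidence code if it meets the threshold, else None.

    The 0.8 default is an illustrative assumption, not a recommended value.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.confidence)
    return best.code if best.confidence >= threshold else None


if __name__ == "__main__":
    confident = [Candidate("5812", 0.93), Candidate("5814", 0.05)]
    uncertain = [Candidate("5812", 0.55), Candidate("5814", 0.40)]
    print(pick_code(confident))  # accepted automatically
    print(pick_code(uncertain))  # None: below threshold, needs another path
```

Returning `None` rather than a low-confidence guess keeps uncertain cases visible to whoever consumes the result.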
Based on our testing, the model classifies a business into the MCC and NAICS codes that a trained human reviewer would approve of more than 90% of the time. While we are early, we recently demoed our model at an industry conference and received extremely positive responses from customers and industry experts alike.
While there is a lot of hype surrounding AI and LLMs such as GPT-4, we believe there is true potential in deploying this technology in specific areas of the merchant risk management process. In the case of industry classification, there is a host of contextual information about a merchant from various sources.
LLMs such as GPT and LLaMA have been shown to excel at capturing semantic meaning through embeddings, which are numerical representations that convey the essence of a sentence’s meaning. These embeddings, along with the capabilities of LLMs, offer a natural solution to the problem at hand, as they have proven their ability to grasp concepts and meaning even when expressed in diverse ways using natural language. We don’t believe LLMs will be the right approach for every problem we are looking to solve for our customers, but they certainly proved the best for this problem.
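To make the idea of embeddings concrete, here is a small Python sketch of comparing vectors by cosine similarity. It is a generic illustration and not a description of how our model works: the three-dimensional vectors and category names are made up, and real embeddings would come from an embedding model and have far more dimensions. The point is only that texts with similar meaning map to vectors that point in similar directions.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_category(merchant_vec: List[float], category_vecs: Dict[str, List[float]]) -> str:
    """Return the category whose vector is most similar to the merchant's vector."""
    return max(category_vecs, key=lambda name: cosine_similarity(merchant_vec, category_vecs[name]))


if __name__ == "__main__":
    # Toy, hand-written vectors standing in for real embeddings (assumption).
    categories = {
        "restaurants": [0.9, 0.1, 0.0],
        "software": [0.0, 0.2, 0.9],
    }
    merchant = [0.8, 0.2, 0.1]  # pretend embedding of a merchant description
    print(nearest_category(merchant, categories))
```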
By accurately and automatically classifying merchants, payment processors and anyone who manages SMB risk can profoundly improve their risk management practices. Customers can access these attributes through our developer-first APIs or with a file upload in our portal. In the coming months, our team at Coris is excited to roll out several new products and features that will further improve our customers’ risk management processes.
If this is of interest to you, or if you have any feedback, please contact us through our website.