How does GPT-4-powered merchant industry classification compare to a risk analyst?

April 18, 2024

It’s been exactly one year since we launched Merchant Real Industry. It’s the only tool using LLMs to automatically determine an SMB’s industry classification (MCC & six-digit NAICS codes) based on website analysis, online reputation, and other proprietary data.

Many customers have used Merchant Real Industry to auto-classify their SMB customers with high accuracy and speed. Customers have specifically told us that Merchant Real Industry delivers a more accurate reflection of a business’s true industry than existing approaches.

We wanted to verify these results, so we ran an experiment comparing the results from Merchant Real Industry against the current gold standard for industry classification: manual risk analysis.

TL;DR: Merchant Real Industry’s results matched 90% of the results from manual analysis, in a fraction of the time (minutes vs. hours). This is consistent with what our customers have found.

Read on to learn more about our experiment, and reach out to try Merchant Real Industry today. 

Our approach

We obtained NAICS code data from the Small Business Administration’s Paycheck Protection Program (PPP). PPP data is one of the few public data sources containing details on SMBs, including their NAICS codes, so it was a practical baseline for our experiment.

To assess the performance of Merchant Real Industry, we structured our evaluation around two methods: a comparative analysis against PPP data, and an independent QA test by three risk analysts:

  1. Comparative analysis: We first compared the NAICS codes assigned by Merchant Real Industry against the NAICS codes in the PPP database.
  2. Independent QA test: Following the comparative analysis, we invited three independent risk analysts to perform a blind test to identify the correct NAICS code.

Our findings

Comparative analysis

Our first step was to compare the NAICS codes generated by Merchant Real Industry with those listed in the PPP data. A high match rate with PPP data does not necessarily indicate accuracy, because the PPP codes may themselves be inaccurate or inconsistent.

We found the following:

  • 70% match rate at the 2-digit level, capturing the broad sector of the economy
  • 50% match rate at the 4-digit level, capturing the more specific industry group

The gap between the two match rates underlines how difficult it is to precisely classify a business’s industry. In our experience, businesses tend to be bucketed more accurately in broad NAICS categories (at the 2-digit level) than at more granular levels. This makes intuitive sense: there are 1,012 six-digit NAICS codes, so there is a lot of room for individual discretion about the exact code a business falls into.
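A match rate at a given digit level can be computed by truncating both codes to the same number of leading digits and counting how often they agree. The sketch below is illustrative only: the function name and the sample code pairs are hypothetical and are not taken from our experiment data.

```python
def prefix_match_rate(predicted, reference, digits):
    """Share of code pairs whose first `digits` characters agree.

    Codes are kept as strings so that prefixes can be sliced directly.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be the same length")
    if not predicted:
        return 0.0
    matches = sum(p[:digits] == r[:digits] for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Made-up example pairs (not from the experiment).
predicted = ["722511", "541511", "238220", "811111"]
reference = ["722513", "541611", "236118", "448140"]

rate_2 = prefix_match_rate(predicted, reference, 2)  # sector level
rate_4 = prefix_match_rate(predicted, reference, 4)  # industry-group level
print(f"2-digit: {rate_2:.0%}, 4-digit: {rate_4:.0%}")
```

In this toy sample, three of four pairs share a 2-digit prefix but only one shares a 4-digit prefix, which mirrors the pattern of match rates falling as the level gets more granular.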

Independent QA Test

As previously mentioned, match rates alone don’t tell you enough about industry classification. There could be inaccuracies or inconsistencies within the control data set (PPP data).

We enlisted three independent analysts to classify businesses manually in a blind test. The analysts agreed with Coris’s NAICS codes in 90% of cases, consistent with our ongoing findings with customers.
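One way to score a blind test like this, shown purely as an illustration, is to take the majority code among the analysts for each business and compare it with the automated code. The majority-vote rule, the function names, and the sample data below are assumptions made for the sketch, not a description of the exact procedure used in the experiment.

```python
from collections import Counter


def majority_code(analyst_codes):
    """Return the code chosen by more than half of the analysts, else None."""
    code, count = Counter(analyst_codes).most_common(1)[0]
    return code if count > len(analyst_codes) / 2 else None


def agreement_rate(model_codes, analyst_votes):
    """Fraction of businesses where the model code equals the analyst majority."""
    if not model_codes:
        return 0.0
    agreed = sum(
        majority_code(votes) == model
        for model, votes in zip(model_codes, analyst_votes)
    )
    return agreed / len(model_codes)


# Made-up example: one automated code and three analyst codes per business.
model_codes = ["722511", "541511", "238220"]
analyst_votes = [
    ["722511", "722511", "722513"],  # majority agrees with the model
    ["541511", "541512", "541511"],  # majority agrees with the model
    ["236118", "238220", "238210"],  # no majority, counted as disagreement
]

rate = agreement_rate(model_codes, analyst_votes)
print(f"Agreement: {rate:.0%}")
```

Counting a three-way split as a disagreement is a conservative choice; a different rule, such as counting agreement with any single analyst, would give a higher number on the same data.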

When we compared classification speed, Merchant Real Industry took a few seconds to auto-generate the NAICS code for a business, while a risk analyst took a few minutes. These time savings are significant when analyzing a large SMB data set.

We’re encouraged by the results of this experiment, and see them as a vote of confidence for our vision to build the future of SMB risk intelligence.

Recent improvements

We’ve recently improved Merchant Real Industry to provide even greater value to customers. 

Now, Merchant Real Industry shares the top three most likely NAICS and MCC codes for each business, accompanied by confidence scores for each. This recognizes the complex nature of industry classification and provides more nuanced insights.
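Ranked candidates with confidence scores lend themselves to simple routing rules, for example accepting the top code automatically when its score is high enough and sending everything else to manual review. The sketch below is a hypothetical consumer of such output: the `Candidate` fields, the `route` function, the 0.8 threshold, and the sample values are all assumptions for illustration and do not reflect the actual API response schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    code: str          # NAICS or MCC code as a string
    confidence: float  # assumed to lie in [0, 1]


def route(candidates, threshold=0.8):
    """Return ("accept", code) if the best candidate clears the threshold,
    otherwise ("review", code) so an analyst can confirm it."""
    if not candidates:
        return ("review", None)
    top = max(candidates, key=lambda c: c.confidence)
    decision = "accept" if top.confidence >= threshold else "review"
    return (decision, top.code)


# Made-up top-three results for two businesses.
clear_case = [
    Candidate("722511", 0.91),
    Candidate("722513", 0.06),
    Candidate("722515", 0.03),
]
ambiguous_case = [
    Candidate("541511", 0.48),
    Candidate("541512", 0.40),
    Candidate("541519", 0.12),
]

print(route(clear_case))
print(route(ambiguous_case))
```

The threshold is a policy decision rather than a property of the classifier, so it would need to be tuned against each team's own tolerance for misclassification.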

With automatic, accurate business classification, anyone who manages SMB risk – from fintechs to software platforms and more – can improve their risk management practices. Customers can access these attributes through our developer-first APIs or our no-code portal.

Learn more

In addition to NAICS code classification, Coris already supports industry classification for MCC and is continuing to invest in this area with new enhancements coming soon.

Get in touch if you’d like to use this feature or learn more.
