With each release of O*NET-SOC AutoCoder we refine our methodologies and expand the supporting dictionaries to increase overall accuracy.
The law of diminishing returns applies, of course: as we approach the theoretical maximum accuracy rate, our gains become smaller for the same investment.
Yet with each major release we also expand flexibility – v4 added State-specific tuning, v5 added simultaneous support for multiple variants of O*NET-SOC, and v6 adds support for mixed English and Spanish search text.
This flexibility always comes at a cost to accuracy – standardizing on inputs, processes, and outputs would allow tuning that just isn’t possible when inputs (search text in mixed languages), processes (State tuning factors) and outputs (variants of O*NET-SOC) are all variables.
In the end, the accuracy gains from our investment in v6 were offset by allowing mixed-language search text, because the same word can mean one thing in English and another in Spanish.
Still, the 87.3% accuracy rate produced by v6 on a test sample far exceeds the 40.2% produced by the ITSC version 2 (their most current demo) on the same sample.
Looking beyond overall accuracy rates is important too – ideally an automated coding solution will be equally accurate for all occupational types.
In reality, when looking at O*NET-SOC, it is easier to classify some occupations than others – so accuracy scores will vary.
Healthcare (major group 29) occupations are easier to code accurately than most others because the titles tend to reflect professional licenses and/or specific training (RN, LPN, CNA, DO, OT, etc.).
In contrast, Sales occupations (major group 41) tend to be much harder – a ‘sales executive’ may refer to an upper-level manager or an entry-level retail position.
Looking at accuracy rates by two-digit occupational families, you’ll notice that v6 delivered 80% or higher for all groups. In contrast, ITSC v2 did a passable job for group 29 (68% accuracy) but a dismal one for group 41 (21% accuracy). The ‘bubble size’ represents the number of sample records in each two-digit occupational group: the bigger the bubble, the more common the occupation in the sample.
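A breakdown like this can be produced from any hand-coded test sample. The following Python sketch is illustrative only: the function name, the exact-match definition of accuracy, and the five sample records are assumptions, not the scoring method or data used for the figures above. It groups records by the first two digits of the expected code and reports both the accuracy and the record count (the ‘bubble size’).

```python
from collections import defaultdict


def accuracy_by_major_group(records):
    """Return {major_group: (accuracy, record_count)}.

    `records` is an iterable of (expected_code, assigned_code) pairs.
    Assumption: a record counts as correct only when the full codes match.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for expected, assigned in records:
        group = expected[:2]  # two-digit major group, e.g. "29" or "41"
        totals[group] += 1
        if assigned == expected:
            correct[group] += 1
    return {g: (correct[g] / totals[g], totals[g]) for g in sorted(totals)}


# Made-up sample for illustration; not the test sample discussed in the text.
sample = [
    ("29-1141.00", "29-1141.00"),
    ("29-2061.00", "29-2061.00"),
    ("29-1141.00", "29-2061.00"),
    ("41-2031.00", "41-2031.00"),
    ("41-2031.00", "11-2022.00"),
]

result = accuracy_by_major_group(sample)
for group, (accuracy, count) in result.items():
    print(f"group {group}: {accuracy:.0%} accurate over {count} records")
```

Grouping by the expected code rather than the assigned code keeps each group’s denominator tied to what the records actually are, so a coder that over-assigns a popular group does not inflate that group’s score.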
Occupation Group 99 is a special category referring to ads that are too vague to allow accurate coding – ads that should be assigned a ‘not classified’ code. Knowing when to ‘punt’ is one of the hardest tasks for an automated coder – set the threshold too low and far too many records are assigned a code of 99; but set it too high and the coder will jam records into codes when it shouldn’t. RMWC AutoCoder v6 achieves a reasonable balance, delivering 54% accuracy for non-classifiable searches. ITSC v2 on the other hand achieved the worst possible score of 0%.
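The threshold trade-off can be sketched in a few lines of Python. This is a simplified model, not a description of how AutoCoder works internally: it assumes each search text comes with a hypothetical `vagueness` score between 0 and 1, and that a record is punted to code 99 when that score reaches a `punt_threshold`. Under that convention, a low threshold sends too many records to 99 and a high one forces codes onto vague records.

```python
NOT_CLASSIFIED = "99"  # placeholder label for the 'not classified' code


def assign_code(best_code, vagueness, punt_threshold):
    """Keep the best candidate code unless the record is too vague.

    `vagueness` and `punt_threshold` are hypothetical names for this sketch.
    """
    return NOT_CLASSIFIED if vagueness >= punt_threshold else best_code


def punt_count(scored_records, punt_threshold):
    """Count how many records end up as 'not classified' at a given threshold."""
    return sum(
        1
        for best_code, vagueness in scored_records
        if assign_code(best_code, vagueness, punt_threshold) == NOT_CLASSIFIED
    )


# Made-up (best_code, vagueness) pairs for illustration only.
scored = [
    ("29-1141.00", 0.05),
    ("41-2031.00", 0.45),
    ("11-2022.00", 0.60),
    ("41-2031.00", 0.95),
]

for threshold in (0.2, 0.7, 1.01):
    print(f"threshold {threshold}: {punt_count(scored, threshold)} of {len(scored)} punted")
```

Sweeping the threshold over a hand-coded sample and scoring group 99 alongside the other groups is one way to find a balance between the two failure modes.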