rockdiesel | 12 days ago
Any thoughts? Should I default to what's in the product title instead of the unit count? Not sure the best way to combat this.
Propelloni|12 days ago
rockdiesel|12 days ago
hluska|12 days ago
Consider the top four most expensive golf balls on your current list:
TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White: the title says 4DZ; the unit count in the product specs is 48.0.
Bridgestone Golf Tour B RXS Quadfecta: nothing in the title; the unit count in the product specs is 4.0. This one shows 4 dozen in a different spot than the other balls.
TaylorMade Golf 2024 TP5 Golf Balls 3+1 Box Four Dozen: the title says Four Dozen; the unit count in the product specs is 1.0, but it has 4.0 dozen in the same div as the Bridgestone balls.
Srixon Z Star Yellow Golf Balls - Buy 2 DZ Get 1 DZ Free: the title says buy 2 DZ, get 1 free. That’s represented as 2+1 or 3+1 in other data. The unit count in the product specs is 1.0.
In that extremely limited sample, product weight is a pretty good signal that the unit count is flawed, though it only works in comparison to other listings. I wonder if you could do a multi-pass approach: sort the data first, then do a unit count versus weight check to find outliers, and then start working through the titles. You’ll still end up digging through a lot of edge cases, and that won’t be much fun, but a multi-pass approach would at least give you some insight into the weird ones.
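A rough sketch of the unit count versus weight pass, in case it helps. Everything here is an assumption for illustration: the field names (`title`, `unit_count`, `weight_g`), the `flag_unit_count_outliers` helper, the 2x tolerance, and the sample weights, which are made up and not taken from any real listing. It divides the listed weight by the claimed unit count, compares that to the median across listings, and returns the ones that are far off, worst first.

```python
from statistics import median


def flag_unit_count_outliers(listings, tolerance=2.0):
    """Return (title, grams_per_claimed_unit) pairs, worst offenders first.

    A listing is flagged when its weight per claimed unit differs from the
    median by more than `tolerance` times in either direction.
    """
    # Skip rows that cannot be divided or compared meaningfully.
    usable = [l for l in listings if l["unit_count"] > 0 and l["weight_g"] > 0]
    if not usable:
        return []

    per_unit = [(l["title"], l["weight_g"] / l["unit_count"]) for l in usable]
    typical = median(w for _, w in per_unit)

    def deviation(weight):
        # Symmetric ratio: 1.0 means "same as the median".
        return max(weight / typical, typical / weight)

    flagged = [(t, w) for t, w in per_unit if deviation(w) > tolerance]
    flagged.sort(key=lambda pair: deviation(pair[1]), reverse=True)
    return flagged


# Made-up sample: B and C weigh as much as four dozen balls but claim 4 and 1 units.
sample = [
    {"title": "listing A", "unit_count": 48, "weight_g": 2300},
    {"title": "listing B", "unit_count": 4, "weight_g": 2300},
    {"title": "listing C", "unit_count": 1, "weight_g": 2300},
    {"title": "listing D", "unit_count": 12, "weight_g": 580},
    {"title": "listing E", "unit_count": 12, "weight_g": 575},
]

for title, grams in flag_unit_count_outliers(sample):
    print(f"{title}: {grams:.0f} g per claimed unit")
```

The median only works as a baseline if most listings have a sane unit count. If bad counts are the majority in a category, a fixed per-ball weight would probably be a safer reference.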
rockdiesel|12 days ago
I'm thinking I could just start with any listing where unit count = 1 and take a pass at those first. I haven't looked yet, but I'm guessing single unit counts are almost always inconsistent with the actual number of golf balls.
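Something like the following could be a starting point for that first pass. It is only a sketch: the `dozens_from_title` and `suggest_counts` helpers and the dict field names are hypothetical, and summing every "N DZ" match in a title is an assumption that happens to fit the "Buy 2 DZ Get 1 DZ Free" style but would double count a title that states the same quantity twice. The titles and unit counts are the four quoted earlier in the thread.

```python
import re

WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

# Matches "4DZ", "2 DZ", "Four Dozen", "3 doz" and similar, case-insensitively.
DOZEN_RE = re.compile(
    r"\b(\d+|one|two|three|four|five|six)\s*(?:dozen|doz|dz)\b",
    re.IGNORECASE,
)


def dozens_from_title(title):
    """Return the total dozens mentioned in a title, or None if none are found."""
    total = 0
    found = False
    for token in DOZEN_RE.findall(title):
        token = token.lower()
        total += int(token) if token.isdigit() else WORD_NUMBERS[token]
        found = True
    return total if found else None


def suggest_counts(listings):
    """For listings with unit_count == 1, suggest a ball count from the title."""
    suggestions = {}
    for listing in listings:
        if listing["unit_count"] != 1:
            continue
        dozens = dozens_from_title(listing["title"])
        # None means the title gave no hint and needs a manual look.
        suggestions[listing["title"]] = None if dozens is None else dozens * 12
    return suggestions


listings = [
    {"title": "TaylorMade 2021 TP5x (3+1 Box) 4DZ Golf Ball Pack, White", "unit_count": 48},
    {"title": "Bridgestone Golf Tour B RXS Quadfecta", "unit_count": 4},
    {"title": "TaylorMade Golf 2024 TP5 Golf Balls 3+1 Box Four Dozen", "unit_count": 1},
    {"title": "Srixon Z Star Yellow Golf Balls - Buy 2 DZ Get 1 DZ Free", "unit_count": 1},
]

for title, count in suggest_counts(listings).items():
    print(count, title)
```

Note that the Bridgestone listing would slip through this pass entirely, since its unit count is 4 and its title has no dozen marker, so a second pass on something other than the title would still be needed.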
datsci_est_2015|12 days ago
See also: toilet paper sheet count comparisons.
fultonn|12 days ago
tonygrue|12 days ago