AI-Assisted Data Acquisition - C-Systems & Obvious Leap

Phase 1 - January 2026

Results

all models json csv
whole goods json csv
parts json csv

C# Source

source code ZIP

Presentation on LLM Approximation & Data Staleness

presentation PDF

Final Step

C-Systems selects its preferred model (Gemini 2.5 Flash?).

Contract

Deliverables

  1. Presentation explaining the real cost per item for different models, with and without search.
  2. CSV of results from the specified models, plus manually scraped benchmark data
  3. Evaluation of the estimated cost for each model
  4. Code interacting with the models' APIs
  5. Proposed prompts for each model, which you can tune

Prepayment Bill

Bill PDF - Paid

Delivery Bill

PDF, online


Phase 2 - March 2026

Optimized Prompt

I want information about the {part} in JSON format with separate sections for:
- product name
- brand name
- product code
- msrp in USD (lowest price, only pick one, number only)
- product category
- short description
- long description
- shipping weight (one property only as number, convert to pounds)
- packaged dimensions (round to the nearest integer, numbers only, convert to inches, separate into individual properties: length, width, height)
- compatible accessories (with product code)
- technical specifications (list of strings, in point form) (all of them)
- power source
- voltage (if power source is electric, as integer)
- fuel tank capacity (if power source isn't electric, with units as string)
- product images

Convert every unit of measure to imperial units, only provide the JSON, nothing else. If something isn't found, omit it entirely. All keys in the JSON are in underscore format. Use web search for precise information. No hallucinations.
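Several rules in this prompt can be checked mechanically on each reply. The sketch below is one possible check, assuming the reply has already been parsed into a Python dict. The key names `msrp`, `shipping_weight` and `packaged_dimensions` are guesses at what a model would emit under the "underscore format" rule; they are not confirmed by the delivered results, and `check_reply` is a hypothetical helper.

```python
import re

# Lowercase words joined by underscores, e.g. "product_name".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_number(value) -> bool:
    """True for int/float, but not for bool (which is an int subclass)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_reply(reply: dict) -> list:
    """Return a list of rule violations found in a parsed reply."""
    problems = []

    def walk(node, path):
        # Recursively verify that every key is in underscore format.
        if isinstance(node, dict):
            for key, value in node.items():
                if not SNAKE_CASE.match(key):
                    problems.append(f"{path}{key}: key is not in underscore format")
                walk(value, f"{path}{key}.")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}{index}.")

    walk(reply, "")

    # Assumed key names: the prompt asks for "number only" values here.
    for key in ("msrp", "shipping_weight"):
        if key in reply and not is_number(reply[key]):
            problems.append(f"{key}: expected a number")

    # Assumed key name: dimensions should be rounded integers.
    dims = reply.get("packaged_dimensions")
    if isinstance(dims, dict):
        for side in ("length", "width", "height"):
            value = dims.get(side)
            if side in dims and (not isinstance(value, int) or isinstance(value, bool)):
                problems.append(f"packaged_dimensions.{side}: expected an integer")

    return problems
```

Missing fields are not reported here, since the prompt tells the model to omit anything it cannot find.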

SERPER

Pricing comes to $0.001 per request. However, the results need some filtering.
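To make the per-request price concrete, here is a rough sketch of the cost arithmetic together with one possible filter. The filtering criteria have not been decided, so the minimum-size rule, the thresholds, and the `url`, `width` and `height` keys are all assumptions for illustration; the real search response fields may be named differently.

```python
from decimal import Decimal

COST_PER_REQUEST = Decimal("0.001")  # USD per request, as noted above


def estimate_cost(items: int, requests_per_item: int = 1) -> Decimal:
    """Total search cost in USD for a catalog of `items` products."""
    return COST_PER_REQUEST * items * requests_per_item


def filter_results(results, min_width=500, min_height=500):
    """Drop duplicate URLs and images below an assumed minimum size.

    `results` is a list of dicts with hypothetical keys
    'url', 'width' and 'height'.
    """
    seen = set()
    kept = []
    for result in results:
        url = result.get("url")
        if not url or url in seen:
            continue  # skip records without a URL and exact duplicates
        if result.get("width", 0) < min_width or result.get("height", 0) < min_height:
            continue  # too small to be useful in the catalog
        seen.add(url)
        kept.append(result)
    return kept
```

At this rate, 10,000 items at one request each would cost $10.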

Unfiltered Results

Whole goods images are OK. One thing I noticed is that I need to find similar images and keep only the highest-resolution ones.
Something to try: sample OpenCV matching code
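The idea of grouping similar images and keeping the largest one can be sketched without any dependencies. This is a stand-in technique (an average hash compared by Hamming distance), not the OpenCV matching code mentioned above. It assumes each image has already been reduced to a small grayscale grid (for example 8x8) by whatever image library is used; the `thumb`, `width`, `height` and `url` keys and the `max_distance` threshold are hypothetical.

```python
def average_hash(gray):
    """Hash a small grayscale grid (2D list of 0-255 values).

    Each bit records whether a pixel is brighter than the grid's mean.
    """
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def hamming(a, b):
    """Number of positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def keep_highest_resolution(images, max_distance=5):
    """Group near-duplicate images and keep the largest of each group.

    `images` is a list of dicts with hypothetical keys:
    'thumb' (grayscale grid), 'width', 'height', 'url'.
    """
    groups = []  # one entry per group: {"hash": ..., "best": ...}
    for image in images:
        image_hash = average_hash(image["thumb"])
        for group in groups:
            if hamming(image_hash, group["hash"]) <= max_distance:
                best = group["best"]
                if image["width"] * image["height"] > best["width"] * best["height"]:
                    group["best"] = image
                break
        else:
            # No similar group found, so this image starts a new one.
            groups.append({"hash": image_hash, "best": image})
    return [group["best"] for group in groups]
```

An average hash tolerates rescaling and mild recompression, but not crops or rotations, so feature matching may still be worth testing on the unfiltered results.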

Steps

Once the preferred model has been selected, the prompt will be optimized for parts and whole goods, namely to remove 'commentary' text generated by the LLM and to make sure all expected data is present.
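Until the prompt is tuned, commentary can also be stripped on the receiving side. The sketch below shows one way to pull the first JSON object out of a reply and report which expected fields are absent. The `EXPECTED_KEYS` names are assumptions derived from the prompt's sections, and both helper functions are hypothetical.

```python
import json

# Assumed key names, based on the sections requested in the prompt.
EXPECTED_KEYS = ("product_name", "brand_name", "product_code", "msrp")


def extract_json(reply: str) -> dict:
    """Parse the first JSON object in the reply, ignoring text around it."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one value at `start` and ignores what follows.
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    raise ValueError("no JSON object found in reply")


def missing_keys(data: dict, expected=EXPECTED_KEYS) -> list:
    """List the expected top-level keys that the reply did not include."""
    return [key for key in expected if key not in data]
```

Because the prompt tells the model to omit anything it cannot find, a missing key is a signal to review the item rather than proof of a bad reply.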

For product images we need to test different alternatives to keep costs low. I've identified a few.

Once that is done, the data will be properly integrated into the data catalog.