Appraising the value of an item based on photos

I’m working on a project to appraise the dollar value of antiques. I have a FAISS database with prices and text descriptions, and just need to find the most accurate description of an image so I can run a search in the database.

Simple use case: generic brass candle holder → take image → recognizes it’s a brass candle holder → $10

For the simple use case, I just used Gemini API, and it correctly recognized what the image was.

Advanced use case: branded brass candle holder

In this case, Gemini API failed to recognize the brand, and classified it as a generic brass candle holder.

I’ve added a tool which calls Google Lens. It was able to recognize the brand, and from the database I got its real price → $100

Pro use case: Japanese vase

Google Lens (incorrectly) recognized the vase as being from the Meiji period, potentially worth up to $500. I then uploaded 5 photos to Gemini Chat (Pro), showing all sides including the bottom. Gemini discovered that the creators used “tube-lined technique” to mimic the look of traditional Meiji-period vases, and it’s a fake, probably worth $50.


So this is where I am right now. I tried both Gemini API and Google Lens, but I’m still not able to properly distinguish a Japanese antique vase from a fake. I also wonder how come Gemini Chat managed to recognize it, but Gemini API didn’t, despite using the same model.

I’m thinking that maybe I should create a loop:

  1. Gemini API appraises the images
  2. Google Lens appraises the “best” image (since it can only do one)
  3. The two are combined as input to a 3rd agent that does quality control

The problem is, the QC agent cannot just blindly pick the “cheaper” evaluation, because then I’ll miss out on actual rare finds.

Does anyone have an idea of how best to create a system that gives expert advice on the value of antiques? Is there maybe a dataset that explains exactly what to look for - e.g. “if it’s a Japanese vase, check the image to see if it’s clay or metal, etc.” - which I could feed to the LLM?

> I also wonder how come Gemini Chat managed to recognize it, but Gemini API didn’t, despite using the same model.

While Gemini x.x and GPT x.x are the models themselves (LLMs, VLMs, or multimodal LLMs), Gemini Chat and ChatGPT are agentic RAG services, or something even more elaborate, built around those models, so their behavior differs significantly.

In some cases, Gemini Chat and ChatGPT even automatically create and execute Python scripts in the cloud runtime, process the results, and, if necessary, call other Gemini or GPT models. They essentially build and use a fairly complex internal pipeline on their own and base the final answer on its results. Seriously.

It’s quite difficult to replicate all of that as-is (though it is possible, since there are OSS frameworks for it), and because it’s fully automated, the results are prone to variation, to the point where even slight differences in usage can lead to performance differences. So it’s better to leave that for later.

It’s probably more realistic to start by building a pipeline that mimics only the parts that matter for your own project. That approach is also likely to be lighter:


Evidence-based photo appraisal for antiques

Gemini API + Lens + FAISS/RAG + OCR + expert-style QC

The key shift:

Generic resale items: photo appraisal is mostly object recognition.

Antiques: photo appraisal is hypothesis testing.

A simple item can work like this:

photo → “generic brass candle holder” → similar records → ~$10

A difficult antique should not work like this:

photo → “Meiji vase” → similar records → ~$500

For antiques, the valuable label is usually what must be proven. A vase can look like a Meiji-period Japanese vase while actually being a later decorative reproduction, Satsuma-style imitation, tube-lined revival piece, tourist/export ware, or seller-mislabeled object.

A better target pipeline:

photos
→ visible evidence
→ competing hypotheses
→ positive + negative retrieval
→ expert rubric checks
→ comp filtering
→ QC / skeptic review
→ supported value + upside-if-authenticated

Not:

photo
→ one best caption
→ FAISS text search
→ average price
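To make the contrast concrete, here is a minimal Python sketch of the two flows. All data and function names are hypothetical stubs standing in for the real model calls and database:

```python
# Sketch only: hard-coded values stand in for Gemini calls and FAISS lookups.

def naive_appraise(photo: str) -> float:
    """One caption -> text search -> average price. Fragile for antiques."""
    caption = "Meiji vase"                 # single best caption
    comps = [450.0, 500.0, 550.0]          # everything visually similar
    return sum(comps) / len(comps)         # averages fakes and originals together

def evidence_appraise(photos: list[str]) -> dict:
    """Evidence -> competing hypotheses -> tested claims -> banded value."""
    evidence = {"raised_outlines": True, "base_photo": "bottom" in photos}
    hypotheses = ["Meiji-period Satsuma vase", "tube-lined imitation"]
    # The expensive claim survives only if its required evidence is present.
    supported = hypotheses[0] if evidence["base_photo"] else hypotheses[1]
    return {
        "supported_identity": supported,
        "supported_value": (40, 80) if supported == hypotheses[1] else (400, 500),
        "upside_if_authenticated": (400, 500),
    }

print(naive_appraise("front.jpg"))                                  # 500.0
print(evidence_appraise(["front", "side"])["supported_identity"])   # tube-lined imitation
```

The naive flow cannot distinguish the $50 and $500 answers; the evidence flow degrades to the cheaper hypothesis only when the decisive views are missing.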

1. Why Gemini Chat likely beat the Gemini API call

The likely reason is not simply “Gemini Chat used a better model.”

More likely:

Gemini model = the engine.
Gemini Chat / Gemini Apps = finished assistant product around the engine.
Gemini API = programmable access to the engine/tools, but you build the assistant behavior.

Gemini Apps support uploaded files/photos/videos in a chat workflow: Gemini Apps file upload docs. The Gemini API also supports image understanding, but you must explicitly send the right images, label them, prompt correctly, and attach tools/retrieval: Gemini image understanding.

In the vase case, Gemini Chat saw multiple views including the bottom. For ceramics, the base, foot rim, underside wear, mark, glaze/body transition, decoration detail, and restoration clues can matter more than the front view.

So the API probably failed because it behaved like a one-shot captioner. Gemini Chat succeeded because it behaved more like a multi-photo inspection assistant.

To match or beat Chat, replicate:

  • multi-photo context
  • labeled views
  • detail crops
  • OCR on marks/labels
  • visual search as candidate discovery
  • RAG over rubrics and prior cases
  • structured JSON outputs
  • tool calling
  • comp filtering
  • QC gates


2. Three use cases, three routes

A. Generic brass candle holder

Mostly object recognition plus broad comp retrieval.

photo → “brass candle holder” → FAISS/text/image comps → ~$10

Route:

cheap Gemini call or OSS VLM
→ object type
→ visible material
→ broad database search
→ simple value range

Example:

Likely identity:
Generic brass candle holder.

Supported resale estimate:
~$8–15, depending on size, condition, and local marketplace.

Confidence:
Medium-high if no maker, designer, age, or unusual quality indicators are visible.

B. Branded brass candle holder

This is exact-identity / maker recognition.

Gemini may see:

brass candle holder

Google Lens may find:

specific branded brass candle holder

That is where visual search helps. Google Lens discovers visually similar images and related content from an image: Google Lens: how it works.

Route:

full photo + mark/logo crop
→ OCR if needed
→ Google Lens / visual search
→ exact or near-exact comp retrieval
→ database price lookup
→ Gemini reconciles evidence

For branded objects, the best Lens image is often:

logo
maker mark
label
pattern number
base stamp
distinctive design detail

Example:

Likely identity:
[Brand/model] brass candle holder.

Evidence:
Visible maker/brand clue + visual-search match + matching database records.

Supported estimate:
~$100, assuming same model, material, size, and condition.

C. Japanese vase

This is authentication / attribution / comp-validity.

Dangerous near-neighbors:

Meiji-period Satsuma vase
Meiji-style vase
Satsuma-style decorative vase
moriage tourist ware
tube-lined imitation
modern decorative reproduction
Chinese/Japanese-style decorative ceramic

Google Lens may surface the expensive visual hypothesis:

Meiji-period Japanese vase → maybe ~$500

But multi-photo evidence may support:

tube-lined or revival technique used to mimic older Meiji-period appearance
→ likely later/revival/reproduction
→ maybe ~$50

The QC question is not:

Which answer is cheaper?

It is:

Which hypothesis is best supported by visible evidence?
What contradicts the expensive hypothesis?
What evidence is missing?
Which sold comps actually match the supported hypothesis?
What upside remains if the item is later authenticated?

3. Recommended architecture

1. Photo intake / sufficiency gate
2. Risk routing
3. Crop and detail extraction
4. OCR and mark interpretation
5. Visual evidence extraction
6. Lens / visual-search candidate discovery
7. Multimodal retrieval
8. Negative-example retrieval
9. Category-specific rubric checks
10. Comparable-sales filtering
11. QC / skeptic review
12. Final appraisal report

Role separation:

| Component | Correct role | Incorrect role |
| --- | --- | --- |
| Gemini API | Evidence extraction, hypotheses, rubric reasoning, comp filtering, QC, report | One-shot appraiser |
| Google Lens | Candidate labels, visually similar web/listing discovery | Authenticator or price authority |
| OCR | Read marks, labels, stamps, signatures | Maker/authenticity proof by itself |
| FAISS/Qdrant/vector DB | Retrieve positive comps, negatives, marks, details | Final price calculator |
| Rubric/RAG | Tell model what to check per category | Generic background only |
| QC agent | Block unsupported claims, preserve upside, decide escalation | Pick cheaper answer |

4. Photo sufficiency gate

For Japanese ceramics, require:

front view
back view
left and right side views
top / mouth / interior
bottom / base
foot rim close-up
mark / backstamp close-up
decoration macro
damage / restoration close-ups
scale photo

Internal schema:

{
  "object_category_guess": "Japanese ceramic vase",
  "sufficient_for_generic_identification": true,
  "sufficient_for_authentication": false,
  "missing_required_views": [
    "clear bottom/base photo",
    "foot rim close-up",
    "legible mark/backstamp close-up",
    "decoration macro"
  ],
  "valuation_allowed": "low_confidence_only",
  "blocked_claims": [
    "Meiji-period attribution",
    "verified maker",
    "high-confidence high-value appraisal"
  ]
}

Hard rule:

If category = Japanese ceramic vase
and no clear base/foot/mark views are present,
then block high-confidence period and maker claims.
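The hard rule can be enforced deterministically before any model is asked for a valuation. A minimal sketch, where the view names and the required-view table are illustrative, not a fixed taxonomy:

```python
# Photo sufficiency gate (sketch). Categories/views are illustrative.

REQUIRED_FOR_AUTH = {
    "Japanese ceramic vase": {"bottom/base", "foot rim close-up", "mark close-up"},
}

def sufficiency_gate(category: str, views: set[str]) -> dict:
    required = REQUIRED_FOR_AUTH.get(category, set())
    missing = sorted(required - views)
    ok = not missing
    return {
        "sufficient_for_authentication": ok,
        "missing_required_views": missing,
        "valuation_allowed": "full" if ok else "low_confidence_only",
        "blocked_claims": [] if ok else [
            "Meiji-period attribution",
            "verified maker",
            "high-confidence high-value appraisal",
        ],
    }

gate = sufficiency_gate("Japanese ceramic vase", {"front", "back", "bottom/base"})
print(gate["valuation_allowed"])   # low_confidence_only
```

Because this is plain code, not a prompt, the block cannot be talked out of the rule by a persuasive image.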

5. Route by risk

Route A: generic low-risk item
Route B: branded / marked / exact-product item
Route C: antique / authenticity-sensitive item

Route A

Gemini object ID
→ broad text/image database search
→ simple comp estimate

Route B

crop mark/logo
→ OCR
→ Lens / eBay / visual search
→ exact or near-exact comps
→ value estimate

eBay’s Browse API includes search by image for product-like discovery: eBay Browse API: search by image.

Route C

multi-photo intake
→ detail crops
→ OCR
→ positive retrieval
→ negative retrieval
→ category rubric
→ comp filtering
→ QC
→ supported value + upside scenario
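The routing decision itself can be a few lines of deterministic code fed by a cheap first-pass classification. A sketch with hypothetical signal names:

```python
# Risk router (sketch). Signal names are illustrative; a cheap Gemini call
# or OSS VLM would populate them from the first photo.

def route(signals: dict) -> str:
    """A: generic, B: branded/marked, C: antique / authenticity-sensitive."""
    if signals.get("category_is_antique_sensitive"):   # e.g. ceramics, silver, art
        return "C"
    if signals.get("has_visible_mark_or_logo"):
        return "B"
    return "A"

print(route({"has_visible_mark_or_logo": False}))   # A
print(route({"has_visible_mark_or_logo": True}))    # B
print(route({"category_is_antique_sensitive": True,
             "has_visible_mark_or_logo": True}))    # C
```

Note that Route C wins even when a mark is visible: a marked antique still needs the full authentication path, not just exact-product lookup.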

6. Crop decisive details

Full-object views identify the broad object. Detail crops contain appraisal evidence.

For ceramics, crop:

base
foot rim
mark / backstamp
decoration detail
glaze/body transition
top/interior
damage/restoration
scale reference

Useful tools:

  • Florence-2 for captioning/detection/segmentation-style tasks.
  • Grounding DINO for text-prompted region detection.
  • SAM / SAM 2 for segmentation after detection.
  • YOLO variants if you train fixed appraisal-detail categories.

The cropper is not the appraiser. It ensures the model sees the same details a human specialist would inspect.


7. OCR and mark interpretation

A mark can be:

maker mark
workshop mark
retailer mark
import mark
country-of-origin mark
pattern number
decorative mark
apocryphal mark
fake mark
later-added label

Split mark handling:

1. detect mark region
2. transcribe mark
3. classify mark type
4. interpret appraisal implication

Example:

{
  "ocr_text": "MADE IN JAPAN",
  "mark_type": "country_of_origin_mark",
  "appraisal_implication": "If original to the object, this conflicts with a 19th-century Meiji-period attribution.",
  "confidence": "medium",
  "needs_human_check": false
}

Never jump from:

mark visible

to:

maker verified

without corroboration.
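Step 4 (interpret the appraisal implication) can start as a small rule table. A sketch; the date ranges follow commonly cited collector guidance and should be verified per category before you rely on them:

```python
# Mark-interpretation rules (sketch). Dates are commonly cited collector
# guidance, not authoritative; verify per category.

MARK_RULES = [
    ("MADE IN JAPAN", "country_of_origin_mark",
     "Commonly dated 1921 onward, conflicting with a Meiji-period (pre-1912) attribution."),
    ("NIPPON", "country_of_origin_mark",
     "Commonly dated ~1891-1921; late Meiji possible but not proven."),
]

def interpret_mark(ocr_text: str) -> dict:
    text = ocr_text.upper().strip()
    for needle, mark_type, implication in MARK_RULES:
        if needle in text:
            return {"ocr_text": ocr_text, "mark_type": mark_type,
                    "appraisal_implication": implication,
                    "maker_verified": False}   # a mark alone never verifies a maker
    return {"ocr_text": ocr_text, "mark_type": "unclassified",
            "appraisal_implication": "Needs human check.",
            "maker_verified": False}

print(interpret_mark("Made in Japan")["mark_type"])   # country_of_origin_mark
```

The point of the hard-coded `maker_verified: False` is exactly the rule above: transcription output can never flip the verification flag on its own.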


8. Extract visible evidence, not final value

Bad prompt:

What is this item and what is it worth?

Better prompt:

You are not appraising yet.

Extract visible evidence only.
Separate visible facts, uncertain observations, interpretations, missing views, and risk flags.
Do not state period, maker, authenticity, or value as fact unless directly supported.
Do not generate a final price.

Example structured output:

{
  "visible_facts": [
    {
      "fact": "The object is a ceramic vase form.",
      "source_image": "front_view",
      "confidence": 0.92
    },
    {
      "fact": "Raised decorative outlines are visible around motifs.",
      "source_image": "decoration_macro",
      "confidence": 0.86
    }
  ],
  "uncertain_observations": [
    {
      "observation": "Raised decoration may be tube-lined, molded, or applied.",
      "needed_evidence": "macro photo under angled light"
    }
  ],
  "missing_evidence": [
    "clear base photo",
    "foot rim macro",
    "legible mark close-up",
    "measurements",
    "condition close-ups",
    "provenance"
  ],
  "risk_flags": [
    "period_style_mismatch_possible",
    "reproduction_possible",
    "visual_similarity_not_authentication"
  ]
}

Use structured outputs for schema validation: Gemini structured outputs.


9. Lens as candidate discovery

Good representation:

{
  "lens_candidates": [
    "Meiji Satsuma vase",
    "Japanese moriage vase",
    "Satsuma-style decorative vase",
    "Japanese export pottery"
  ],
  "status": "candidate_discovery_only",
  "warning": "Visual similarity does not establish period, maker, authenticity, condition, or value."
}

Rule:

Lens creates hypotheses.
Rubrics and comps test hypotheses.
QC decides whether claims are allowed, blocked, upside-only, or need expert review.

10. Add multimodal retrieval

Avoid the bottleneck:

image → Gemini description → FAISS text search

If the description misses “tube-lined imitation,” “moriage tourist ware,” “Satsuma-style reproduction,” or a mark clue, the right records may never be searched.

Add:

full image → similar object images
base crop → similar bases / foot rims
mark crop → similar marks
decoration crop → similar techniques
text query → similar descriptions
negative query → similar reproductions

Gemini Embedding 2 supports cross-modal retrieval: Gemini embeddings.

Use multiple indexes:

full_object_image_index
text_description_index
mark_crop_index
base_crop_index
foot_rim_index
decoration_detail_index
damage_detail_index
negative_example_index
auction_catalogue_page_index

If FAISS is enough, keep it. If metadata filtering becomes painful, consider Qdrant.

Example metadata filters:

{
  "object_type": "vase",
  "material": "ceramic",
  "sale_status": "sold",
  "source_type": "auction_result",
  "has_base_photo": true,
  "condition_known": true,
  "period_claim": "Meiji-style",
  "attribution_strength": "seller_claim | auction_house | specialist | authenticated"
}
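The multi-index idea can be prototyped before committing to FAISS or Qdrant. In the sketch below, plain dot products over toy vectors stand in for the vector indexes; in production each dictionary entry would be a separate FAISS index (or Qdrant collection) with its own metadata:

```python
# Toy multi-index retrieval (sketch). Dot products stand in for FAISS;
# record IDs and 2-d vectors are illustrative.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

INDEXES = {
    "mark_crop_index": [
        ("satsuma_workshop_mark", [1.0, 0.0]),
        ("modern_import_stamp",   [0.0, 1.0]),
    ],
    "negative_example_index": [
        ("tube_lined_imitation",  [0.9, 0.1]),
    ],
}

def search(index_name: str, query_vec: list[float], k: int = 1) -> list[str]:
    scored = [(dot(query_vec, v), rec_id) for rec_id, v in INDEXES[index_name]]
    return [rec_id for _, rec_id in sorted(scored, reverse=True)[:k]]

# Query each crop against its own index, and always query negatives too.
print(search("mark_crop_index", [0.2, 0.8]))          # ['modern_import_stamp']
print(search("negative_example_index", [1.0, 0.0]))   # ['tube_lined_imitation']
```

The key design point is that the mark crop never searches the full-object index: each kind of evidence gets its own neighborhood.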

11. Add negative retrieval

Most systems retrieve only positives:

Meiji Satsuma vase
Japanese vase
antique ceramic vase

Also retrieve lower-value confusables:

modern Satsuma-style reproduction
tube-lined imitation vase
moriage tourist ware
Meiji-style decorative vase
fake/apocryphal mark
Japanese-style ceramic reproduction
Chinese/Japanese-style decorative ceramic

Ask:

What lower-value confusable class could explain the same visual evidence?

Then compare:

Does the base match period examples or reproduction examples?
Does the decoration look hand-applied or mechanically uniform?
Does the mark support maker/period or merely style/import?
Do sold comps match the same material, technique, size, condition, and attribution strength?

QC should choose the best-supported hypothesis, not the cheaper one.
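One way to encode “best-supported, not cheapest” is to score hypotheses purely on evidence and keep price out of the scoring function entirely. A sketch with an illustrative weighting (contradictions count double):

```python
# Hypothesis scoring (sketch). The 2x penalty on contradictions is an
# illustrative policy choice, not a calibrated weight.

def best_supported(hypotheses: list[dict]) -> dict:
    """Each hypothesis: claim, est_value, evidence_for, evidence_against."""
    def score(h):
        return len(h["evidence_for"]) - 2 * len(h["evidence_against"])
    return max(hypotheses, key=score)   # est_value never enters the score

h = best_supported([
    {"claim": "Meiji-period Satsuma vase", "est_value": 500,
     "evidence_for": ["satsuma-style decoration"],
     "evidence_against": ["tube-lined outlines", "no base evidence"]},
    {"claim": "tube-lined imitation", "est_value": 50,
     "evidence_for": ["tube-lined outlines", "uniform decoration"],
     "evidence_against": []},
])
print(h["claim"])   # tube-lined imitation
```

Because `est_value` is carried along but never scored, the same function would happily pick the expensive hypothesis when the evidence actually supports it.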


12. Use category-specific rubrics

There is no single perfect “all antiques” dataset. But cultural-heritage work shows the right pattern: expert-defined visual questions.

Start rubrics for:

Japanese / Chinese ceramics
silver vs silverplate
brass / bronze / resin
paintings vs prints
signed glass
furniture / joinery
designer decor
jewelry

Example rubric:

{
  "category": "Japanese ceramic vase",
  "required_views": [
    "front",
    "back",
    "sides",
    "top/interior",
    "bottom/base",
    "foot rim close-up",
    "mark close-up",
    "decoration macro",
    "damage close-ups",
    "scale photo"
  ],
  "attributes_to_check": [
    "object type",
    "material/body",
    "glaze",
    "decoration technique",
    "raised decoration method",
    "mark type",
    "foot rim",
    "wear pattern",
    "condition",
    "restoration",
    "period vs style",
    "maker attribution strength"
  ],
  "high_value_claims": [
    {
      "claim": "Meiji-period Satsuma vase",
      "required_evidence": [
        "period-consistent base and foot rim",
        "period-consistent decoration technique",
        "credible mark or provenance",
        "no modern country-of-origin/import mark",
        "matching sold comps from reliable sources",
        "condition sufficiently documented"
      ]
    }
  ],
  "common_false_positives": [
    "modern Satsuma-style decorative ware",
    "tube-lined imitation",
    "moriage tourist ware",
    "Chinese/Japanese style confusion",
    "seller-labeled Meiji without evidence",
    "apocryphal or decorative marks"
  ],
  "blocked_without_evidence": [
    "authentic Meiji-period",
    "verified maker",
    "museum-quality",
    "rare signed workshop piece"
  ],
  "safe_language": [
    "Satsuma-style",
    "Japanese-style",
    "unverified age",
    "decorative ceramic vase",
    "possibly later/revival"
  ]
}

Store rubrics, known reproductions, expert notes, and prior corrections in RAG/File Search: Gemini File Search.


13. Filter comps aggressively

A visually similar listing is not necessarily a valid comp.

A valid comp should match:

object type
material
technique
size
period/style
maker/attribution strength
condition
sale status
sale venue
source quality
photo completeness
sale date

Comp decision schema:

{
  "comp_id": "lot_123",
  "include_in_valuation": false,
  "reason": "Rejected: visually similar but active listing only; no sold price; no base photo; period attribution unsupported.",
  "matched_fields": [
    "object_type",
    "broad_style"
  ],
  "missing_or_mismatched_fields": [
    "sale_status",
    "period",
    "technique",
    "condition",
    "attribution_strength",
    "base_photo"
  ]
}

Accepted comp schema:

{
  "comp_id": "lot_456",
  "include_in_valuation": true,
  "reason": "Accepted: sold result, similar object type, similar size, similar later Satsuma-style decorative category, unsigned, comparable condition.",
  "adjustments": [
    "condition report incomplete",
    "size within acceptable range",
    "no verified maker, matching current item"
  ]
}

Do not average all retrieved prices.

Good report line:

Retrieved 12 visually similar records.
Rejected 8 as invalid comps.
Used 4 closer comps for supported valuation.

For reranking after retrieval, consider BGE-reranker-v2-m3.
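A comp filter of this shape can be a plain predicate pass followed by a range over the survivors only, so rejected comps can never leak into the number. A sketch with illustrative fields:

```python
# Comp filtering (sketch). Field names mirror the metadata filters above;
# acceptance rules are illustrative.

def filter_comps(comps: list[dict]) -> dict:
    accepted = [c for c in comps
                if c["sale_status"] == "sold"
                and c["has_base_photo"]
                and c["attribution_strength"] != "seller_claim"]
    prices = sorted(c["price"] for c in accepted)
    return {
        "accepted": [c["comp_id"] for c in accepted],
        "rejected": [c["comp_id"] for c in comps if c not in accepted],
        # Range over accepted comps only; rejects cannot move the number.
        "range": (prices[0], prices[-1]) if prices else None,
    }

out = filter_comps([
    {"comp_id": "lot_123", "price": 500, "sale_status": "active",
     "has_base_photo": False, "attribution_strength": "seller_claim"},
    {"comp_id": "lot_456", "price": 60, "sale_status": "sold",
     "has_base_photo": True, "attribution_strength": "auction_house"},
    {"comp_id": "lot_789", "price": 45, "sale_status": "sold",
     "has_base_photo": True, "attribution_strength": "auction_house"},
])
print(out["range"])   # (45, 60)
```

The $500 active listing is retrieved, logged, and rejected; it influences the report (“rejected 1 of 3”) but not the valuation.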


14. QC with claim permissions

Classify claims:

allowed
blocked
upside-only
needs-human-review

Examples:

{
  "claim": "Meiji-period Satsuma vase",
  "status": "upside_only",
  "reason": "Visual style is suggestive, but base/foot/mark/technique evidence is insufficient.",
  "required_next_evidence": [
    "clear base photo",
    "foot rim macro",
    "legible mark close-up",
    "decoration macro under angled light",
    "matching specialist sold comps"
  ]
}
{
  "claim": "Japanese Satsuma-style decorative vase",
  "status": "allowed",
  "reason": "Supported by broad visual vocabulary and decoration style, while avoiding unsupported period authentication."
}
{
  "claim": "authentic Meiji-period vase worth $500",
  "status": "blocked",
  "reason": "Current evidence does not establish period, maker, or technique strongly enough."
}
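The permission logic can be a small pure function that the pipeline calls for every claim. A sketch; the “partial evidence keeps the upside alive” rule is one possible policy, not the only one:

```python
# Claim permission gate (sketch). The partial-evidence -> upside_only
# policy is an illustrative choice.

def classify_claim(required: set[str], present: set[str],
                   drives_high_value: bool) -> str:
    missing = required - present
    if not missing:
        return "allowed"
    if drives_high_value:
        # Some but not all evidence: keep the expensive hypothesis alive
        # without asserting it. No evidence at all: block.
        return "upside_only" if len(missing) < len(required) else "blocked"
    return "blocked"

print(classify_claim({"base", "foot", "mark"}, {"base"}, True))               # upside_only
print(classify_claim({"satsuma-style look"}, {"satsuma-style look"}, False))  # allowed
print(classify_claim({"base", "foot", "mark"}, set(), True))                  # blocked
```

A `needs_human_review` branch would slot in naturally wherever the value impact crosses your escalation threshold.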

15. Supported value vs upside scenario

Do not output one number. Output:

supported value
conservative resale value
upside if authenticated

Example:

Supported value:
$40–80 as a later Satsuma-style decorative ceramic vase.

Conservative resale value:
$40–60 if listed honestly as unverified age/style only.

Upside scenario:
Potentially much higher if authenticated as Meiji-period or a signed workshop piece, but that requires stronger evidence: base, foot rim, mark, technique, condition, and specialist comps.

Current claim permission:
Do not list as authentic Meiji-period.

This avoids:

always choose cheap → miss rare finds
accept expensive visual match → overvalue fakes
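A minimal sketch of the three-band output, assuming comp prices have already been split into supported and upside pools by the comp filter; the conservative-band formula is illustrative:

```python
# Three-band valuation (sketch). The conservative band (low to midpoint)
# is an illustrative honest-listing floor, not a market rule.

def value_bands(supported_comp_prices: list[float],
                upside_comp_prices: list[float]) -> dict:
    lo, hi = min(supported_comp_prices), max(supported_comp_prices)
    return {
        "supported_value": (lo, hi),
        "conservative_resale": (lo, (lo + hi) / 2),
        "upside_if_authenticated": (min(upside_comp_prices),
                                    max(upside_comp_prices)),
    }

print(value_bands([40, 55, 80], [400, 500]))
```

The upside band is always reported but never merged into the supported number, which is exactly what preserves rare finds without overvaluing fakes.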

Recommended final report format

Likely supported identity:
Japanese Satsuma-style decorative ceramic vase, likely later/revival rather than verified Meiji-period.

Supported value:
$40–80, assuming no major hidden damage.

Conservative resale value:
$40–60 if listed honestly as unverified age/style only.

Upside scenario:
Could be materially higher if authenticated as Meiji-period or a signed workshop piece, but current evidence does not support that claim.

Evidence supporting the supported identity:
- Japanese/Satsuma-style decorative vocabulary.
- Raised decorative outlines visible.
- Current photos do not verify period or maker.
- Technique may indicate later/revival/tube-lined imitation.

Evidence against the high-value Meiji claim:
- Visual similarity alone is not authentication.
- Mark/base/foot evidence is insufficient.
- Decoration technique needs closer verification.
- Valid comps must match size, material, technique, condition, and attribution.

Missing evidence:
- Sharp base photo.
- Foot rim macro.
- Legible mark close-up.
- Decoration macro under angled light.
- Measurements.
- Provenance or prior auction record.

Accepted comps:
- [comp IDs + reason]

Rejected comps:
- [comp IDs + reason]

Safe listing title:
Japanese Satsuma-style decorative ceramic vase, raised decoration, unverified age.

Do not list as:
Authentic Meiji-period Satsuma vase.

Confidence:
Medium-low from photos only.

Escalation:
Human expert review recommended if the user wants to sell, insure, purchase, donate, or consign based on the high-value scenario.

Human expert fallback

AI can triage and produce evidence reports, but human review is needed for high-stakes decisions. Human appraisal services use photo/info intake plus expert review, not one-shot photo guessing: ValueMyStuff: how it works. Appraisal guidance also increasingly treats generative AI as a tool that still requires professional judgment: Appraisal Foundation / USPAP / AO-41.

Use expert review when:

price spread across hypotheses is large
upside exceeds threshold
mark is visible but unclear
period/maker claim drives value
condition/restoration is uncertain
provenance is claimed
user wants insurance/tax/estate/sale support

Store expert corrections as future training/evaluation data.
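The escalation triggers above reduce to a single boolean check. A sketch with illustrative thresholds (the spread ratio and upside dollar amount would need tuning against real cases):

```python
# Escalation check (sketch). Thresholds are illustrative placeholders.

def needs_expert(case: dict, spread_ratio: float = 3.0,
                 upside_threshold: float = 200.0) -> bool:
    values = case["hypothesis_values"]
    big_spread = max(values) / min(values) >= spread_ratio
    big_upside = (case["upside_value"] - max(case["supported_range"])
                  >= upside_threshold)
    return (big_spread or big_upside
            or case.get("mark_unclear", False)
            or case.get("high_stakes_intent", False))   # sell/insure/consign

print(needs_expert({"hypothesis_values": [50, 500], "upside_value": 500,
                    "supported_range": (40, 80)}))   # True
print(needs_expert({"hypothesis_values": [50, 60], "upside_value": 100,
                    "supported_range": (40, 80)}))   # False
```

The vase case trips both the spread and upside triggers, which is the behavior you want: a $50-vs-$500 disagreement should never resolve silently.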


Suggested stack

Gemini-centered core

Gemini image understanding
Gemini structured outputs
Gemini function calling
Gemini File Search / RAG
Gemini Embedding 2
Google Search grounding
URL Context

OSS / preprocessing

PaddleOCR / PaddleOCR-VL for marks and labels
Florence-2 / Grounding DINO / SAM / YOLO for crops
FAISS or Qdrant for vector retrieval
BM25 + dense embeddings for hybrid text search
BGE rerankers for comp filtering
Qwen3-VL / InternVL as optional OSS VLM baselines
Qwen3-VL-Embedding / Jina CLIP / CLIP / SigLIP as retrieval alternatives

External candidate discovery

Google Lens wrapper
eBay image search
Google Vision Web Detection
auction/sold-price databases

Workflow frameworks

Google ADK if you want Gemini-native agent orchestration
LangGraph if you want deterministic gates and human review
LlamaIndex if your RAG layer grows large

See also: Google ADK.


Build order

Phase 1 — Make Gemini API match Gemini Chat

5–10 labeled photos
base/foot/mark/detail views
appraisal-specific prompt
structured output
hypotheses + evidence + missing evidence

Goal:

Gemini API should reproduce the “tube-lined imitation / likely later decorative” insight when given the same evidence.

Phase 2 — Add crop + OCR

base crop
foot rim crop
mark crop
decoration crop
damage crop
OCR result
mark interpretation

Phase 3 — Add multimodal retrieval

full object
text description
mark crop
base crop
foot rim
decoration detail
negative examples
auction catalogue pages

Phase 4 — Add category rubrics

Japanese ceramics
silver vs silverplate
brass/bronze/resin
paintings vs prints
branded/decorator objects

Phase 5 — Add comp filtering

Reject:

active listings as primary comps
wrong material
wrong size
wrong period
wrong technique
wrong condition
signed vs unsigned mismatch
seller-label-only attribution
no base photo
no condition report

Phase 6 — Add QC and claim permissions

allowed
blocked
upside-only
needs-human-review

Phase 7 — Build a private benchmark

Create at least 100 labeled cases:

generic low-value items
branded/maker-mark items
Japanese/Asian ceramic confusables
OCR/mark-heavy items
damaged/restored items

Track:

unsupported high-value claim rate
reproduction detection
mark-reading accuracy
missing-photo detection
negative retrieval success
comp-filter precision
value-range overlap
confidence calibration
escalation accuracy
safe-title accuracy

Main metric:

Does the system avoid unsupported expensive claims while preserving rare-find upside?
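The main metric pairs naturally with its mirror image, the missed-rare-find rate, so neither failure mode can be gamed by always answering conservatively. A sketch over hypothetical labeled cases:

```python
# Benchmark scoring (sketch). Case labels are hypothetical; a real
# benchmark would come from the 100+ labeled cases above.

def score_benchmark(cases: list[dict]) -> dict:
    unsupported = sum(1 for c in cases
                      if c["claimed_high_value"] and not c["evidence_sufficient"])
    missed_upside = sum(1 for c in cases
                        if c["truly_rare"] and not c["upside_reported"])
    n = len(cases)
    return {"unsupported_high_value_claim_rate": unsupported / n,
            "missed_rare_find_rate": missed_upside / n}

print(score_benchmark([
    {"claimed_high_value": True,  "evidence_sufficient": False,
     "truly_rare": False, "upside_reported": True},
    {"claimed_high_value": False, "evidence_sufficient": False,
     "truly_rare": True,  "upside_reported": True},
]))
```

A system that always picks the cheap answer scores perfectly on the first rate and terribly on the second; tracking both keeps the QC policy honest.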


Final answer

Your simple case is object recognition. Your branded case is exact-identity search. Your Japanese vase case is authenticity reasoning.

So the system should be three-tiered:

Generic item:
Gemini API + simple FAISS retrieval.

Branded item:
Gemini + OCR + Lens / image search + exact comps.

Antique item:
Gemini + multi-photo evidence + crops/OCR + Lens candidates + multimodal retrieval + negative examples + expert rubrics + comp filtering + QC gates.

The winning architecture is not:

Gemini vs Google Lens

It is:

Gemini for reasoning
+ Lens for candidate discovery
+ OCR/crops for evidence
+ multimodal retrieval for comps
+ negative examples for fake/reproduction detection
+ expert rubrics for domain checks
+ sold-comp filtering for valuation
+ QC gates for safe claims
+ human escalation for high-risk cases

That architecture can match or surpass Gemini Chat because it replicates what made the chat product useful—multi-photo context, conversational inspection, and tool-like behavior—while adding what Gemini Chat does not have by default: your private price database, negative-example corpus, category rubrics, comp filters, structured outputs, and deterministic appraisal safeguards.

That’s really helpful, thank you.

Using photos to estimate an item’s value definitely has potential, especially with how far computer vision has come. That said, accuracy can vary a lot depending on image quality, item condition, and how well the model can match the item with real market data. Tools in this space usually combine visual recognition with comparable-sales data to give a rough estimate rather than an exact price, so I think combining this with additional metadata (like brand, age, or condition details) could really improve reliability. Overall, it’s a solid idea; it just needs careful handling of edge cases.

I agree. It’s best not to rely on it for final decisions. Fakers and scammers are aware that such systems exist, so they will always try to outsmart them.

It’s important to use it solely as an aid in filtering information.

Well, this applies to generative AI outputs in general…

With current technology, humans are still better at making final judgments—provided they have the expertise to do so.