
GEO Field Notes 01: From Black Box to White Box—Quantifying AI's Brand Sentiment

Answerank Team
10 min read

Today marks the official launch of a new series: GEO (Generative Engine Optimization) Field Notes. As an AI marketing agency, we help organizations optimize their digital footprint in the AI world every day. The biggest pain point we face: large language models are a Schrödinger's cat. Want to know whether ChatGPT likes your brand? Until now, we could only guess and run manual spot checks. Monitoring effectiveness meant my team doing massive amounts of inefficient, repetitive work. To solve this, I built an internal tool: input a URL, and it generates a comprehensive GEO health report in one click. But this isn't just a tool showcase. Today I want to take you into the underlying logic, sharing how we record, deconstruct, and quantify LLM feedback in practice.

Key Takeaways

  • Traditional AI brand monitoring is blind testing: random questions, eyeball observation, subjective judgment with no data foundation
  • Three-step quantification framework: Prompt Matrix (20+ scenarios), NLP sentiment analysis (adjective clouds, hallucination detection), citation source analysis
  • Standardized measurement transforms vague impressions into actionable data: appearance rate, sentiment score, citation sources
  • Real case: a SaaS brand improved its ChatGPT mention rate from 0% to 40% in 2 weeks by optimizing its G2 Crowd presence based on citation analysis

The Pain Point: How We Used to 'Feel Our Way in the Dark'

Before developing the tool, reaching a conclusion like 'GPT-4 has a positive view of our brand' was an extremely painful and unrigorous process:

1. Random Questioning: The ops team thinks up random questions like 'recommend some good coffee machines.'

2. Eyeball Observation: See whether the answer mentions us.

3. Subjective Judgment: It 'feels like' the AI said something decent.

The problem with this approach: the sample size is too small, and there's no documented evidence.

Large language models give completely different answers under different contexts and personas. A single manual test tells you nothing about your actual AI positioning.

Example of the Chaos:

  • Team Member A asks 'best project management tools' → Brand mentioned
  • Team Member B asks 'project management software for startups' → Brand not mentioned
  • Conclusion? We had no idea. Was it phrasing? Timing? Model version? Pure randomness?

This 'blind testing' approach consumed 15-20 hours per client per month, with results that were neither reproducible nor actionable.

Practical Breakdown: The Calculation Logic Behind a GEO Report

To make GEO measurable, I designed a standardized 'evidence collection process' into the tool. When we say 'your brand's GEO score is 80,' that number is the product of a rigorous calculation across three steps:

Step 1: Build a 'Prompt Matrix' for Stress Testing

We can't ask just one question. To reach objective conclusions, the tool automatically generates 20+ prompts across 3 dimensions for a systematic stress test:

Scenario A (Direct Inquiry): 'How is [Brand X]?' 'Is [Product X] worth buying?' — Tests AI's direct brand awareness.

Scenario B (Category Recommendation): 'Recommend the best SaaS tools for startups in 2024' 'Most cost-effective Bluetooth headphones?' — Tests AI's natural recommendation ranking.

Scenario C (Competitive Comparison): 'Brand A vs Brand B, which is better?' — Tests AI's comparative preference.

Detail Recording: The tool captures the full raw text of ChatGPT's, Claude's, Perplexity's, and Gemini's answers to these questions. Even subtle wording differences get logged.

Why 20+ Prompts? Because LLMs are non-deterministic. One answer proves nothing. Twenty answers reveal patterns.
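To make this concrete, here's a minimal sketch of how such a prompt matrix can be generated. The template wordings, scenario keys, and the build_prompt_matrix helper are illustrative assumptions, not the tool's actual internals:

```python
# Illustrative prompt templates per scenario; the real tool's wording differs.
TEMPLATES = {
    "direct_inquiry": [
        "How is {brand}?",
        "Is {brand} worth buying?",
        "What do people say about {brand}?",
    ],
    "category_recommendation": [
        "Recommend the best {category} for startups in 2024",
        "What are the most cost-effective {category}?",
        "Top 5 {category} for small teams?",
    ],
    "competitive_comparison": [
        "{brand} vs {competitor}, which is better?",
        "Why would someone choose {competitor} over {brand}?",
    ],
}

def build_prompt_matrix(brand: str, category: str, competitors: list[str]) -> list[dict]:
    """Expand the templates into concrete prompts, each tagged with its scenario."""
    prompts = []
    for scenario, templates in TEMPLATES.items():
        for template in templates:
            if "{competitor}" in template:
                # Comparison prompts fan out across every tracked competitor.
                for competitor in competitors:
                    prompts.append({"scenario": scenario,
                                    "text": template.format(brand=brand, competitor=competitor)})
            else:
                prompts.append({"scenario": scenario,
                                "text": template.format(brand=brand, category=category)})
    return prompts

matrix = build_prompt_matrix("Acme PM", "project management tools", ["Asana", "Trello"])
print(len(matrix), "prompts")  # every prompt is then sent to each LLM under test
```

Fanning each prompt out across four LLMs is what turns a handful of templates into the 20+ data points per engine that make patterns visible.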

Step 2: NLP Semantic Analysis and 'Sentiment Scoring'

After capturing raw text, we don't need humans to read it. The tool backend uses NLP (Natural Language Processing) to 'dissect' dozens of responses, extracting key data:

Mention Position: Is your brand recommended in the first sentence, or buried as the fifth 'other option'? (The weighting differs dramatically.)

Adjective Cloud: When AI mentions you, are high-frequency words 'expensive,' 'complex,' or 'innovative,' 'efficient'?

Hallucination Detection: Does AI claim you have features you don't? This isn't just an error—it's a risk point GEO needs to fix.

Conclusion Derivation: Based on the above data, we calculate a Sentiment Score. If the AI hedges with transitional words like 'but' or 'although,' points are deducted accordingly.

Sentiment Scoring Formula:

  • Strong Positive: 'Industry-leading,' 'highly recommended' → +10 points
  • Neutral Positive: 'Good option,' 'worth considering' → +5 points
  • Neutral: Listed without commentary → 0 points
  • Neutral Negative: 'Limited features,' 'lacking support' → -5 points
  • Strong Negative: 'Not recommended,' 'reported issues' → -10 points
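Applied in code, that rubric might look like the following minimal sketch. The phrase lists, hedge-word list, and the flat 2-point deduction per hedge are simplified assumptions for illustration; the production pipeline uses a full NLP model rather than keyword matching:

```python
import re

# Simplified phrase rubric mirroring the scoring table above.
RUBRIC = [
    (+10, ["industry-leading", "highly recommended"]),
    (+5,  ["good option", "worth considering"]),
    (-5,  ["limited features", "lacking support"]),
    (-10, ["not recommended", "reported issues"]),
]
HEDGE_WORDS = ["but", "although", "however"]  # transitional words that soften praise

def sentiment_score(response: str, brand: str) -> int:
    """Score one LLM response for one brand; 0 means listed without commentary."""
    text = response.lower()
    if brand.lower() not in text:
        return 0  # brand absent: counts against mention rate, not sentiment
    score = 0
    for points, phrases in RUBRIC:
        score += sum(points for phrase in phrases if phrase in text)
    # Deduct for hedging language around the mention (weight is an assumption).
    hedges = sum(len(re.findall(rf"\b{word}\b", text)) for word in HEDGE_WORDS)
    return score - 2 * hedges

print(sentiment_score("Acme PM is highly recommended, although support is limited.", "Acme PM"))
# -> 8  (+10 for 'highly recommended', -2 for 'although')
```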

Step 3: Source Attribution Analysis (This is the Critical Detail)

GEO's core lies in 'citations.' Why does Perplexity or SearchGPT recommend you? Because they cited specific web pages.

Our tool reverse-engineers every citation (reference link) in LLM responses:

Record Sources: Is it a Reddit post? TechCrunch coverage? Or a PDF from your official site?

Analyze Weights: We discovered that content cited by high-authority media (like Forbes) is more easily adopted by LLMs as 'fact.'

Citation Source Hierarchy:

1. Tier 1 (Highest Trust): Forbes, TechCrunch, WSJ, academic papers

2. Tier 2 (High Trust): G2 Crowd, Capterra, industry blogs

3. Tier 3 (Medium Trust): Company blogs, Medium articles

4. Tier 4 (Low Trust): Social media posts, forums

This explains why some brands with objectively inferior products still get recommended—they've captured Tier 1/2 citation sources.
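Mapping citation URLs onto these tiers is mechanically simple. A minimal sketch, assuming a hand-maintained domain list drawn from the hierarchy above (the citation_tier helper and the domain sets are illustrative, not the tool's full taxonomy):

```python
from urllib.parse import urlparse

# Example domains per tier, taken from the hierarchy above; the real list is far longer.
TIER_DOMAINS = {
    1: {"forbes.com", "techcrunch.com", "wsj.com"},
    2: {"g2.com", "capterra.com"},
    3: {"medium.com"},
}

def citation_tier(url: str) -> int:
    """Classify a cited URL into a trust tier; unknown domains default to Tier 4."""
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    for tier, domains in TIER_DOMAINS.items():
        if domain in domains:
            return tier
    return 4  # social posts, forums, and anything unrecognized

for url in [
    "https://www.g2.com/categories/project-management",
    "https://medium.com/@author/best-pm-tools",
    "https://www.reddit.com/r/projectmanagement/comments/example",
]:
    print(citation_tier(url), url)  # -> 2, 3, 4
```

Aggregated across every captured response, this tells you whether your visibility rests on durable Tier 1/2 sources or on easily displaced Tier 3/4 mentions.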

Real Case: What the Data Told Us

Through this automated recording and calculation system, we recently made a precise diagnosis for a SaaS client.

Manual Era: Client felt 'AI doesn't seem to mention us much.'

Tool-Generated Detailed Conclusion:

Primary Issue: In 'category recommendation' prompts, ChatGPT ignored the brand 100% of the time.

Root Cause via Source Analysis: By analyzing competitors' citations, we found they appeared extensively in G2 Crowd and Capterra comparison articles, data sources ChatGPT heavily trusts.

Actionable Guidance: We didn't need to blindly write advertorials. We needed to concentrate firepower on optimizing the G2 Crowd review page.

Results: After just two weeks of adjustment, the brand's appearance rate in identical prompts increased from 0% to 40%.

Specific Actions Taken:

1. Encouraged 20+ satisfied customers to leave detailed G2 reviews

2. Responded to all existing reviews (positive and negative)

3. Updated G2 profile with comprehensive feature descriptions

4. Added comparison charts vs. competitors

Why This Worked:

ChatGPT draws heavily on G2 Crowd content, both in its training data and through its web retrieval. When users ask for the 'best [category] tools,' GPT naturally references the G2 comparison articles it has indexed. By improving G2 presence, we directly influenced GPT's source material, changing AI behavior at the data layer, not the prompt layer.

From Black Box to White Box: What Changed

Before Tool (Black Box Approach):

  • Measurement: 'Feels like AI doesn't mention us much'
  • Sample Size: 3-5 manual tests per month
  • Evidence: Screenshots in scattered Google Docs
  • Actionability: Zero. 'Try writing more content?'
  • Time Cost: 15-20 hours/month of manual testing

After Tool (White Box Approach):

  • Measurement: 'Mention rate: 23%, Sentiment Score: +6.2, Primary citation source: Medium (Tier 3)'
  • Sample Size: 60+ automated tests across 4 LLMs
  • Evidence: Structured database with timestamped records
  • Actionability: 'Focus on securing Tier 1/2 citations; current Medium articles insufficient'
  • Time Cost: 2 minutes per report (over 99% reduction)

This shift from qualitative guessing to quantitative measurement is the essence of GEO as a discipline.
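A report line like 'Mention rate: 23%, Sentiment Score: +6.2, Primary citation source: Medium' is just an aggregation over those logged runs. A minimal sketch, assuming each test run is stored with fields like those produced in Steps 1-3 (the Run record and summarize helper are hypothetical):

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Run:
    """One logged prompt/response pair; field names are illustrative."""
    mentioned: bool
    sentiment: int                      # rubric score; meaningful only if mentioned
    citation_tiers: list[int] = field(default_factory=list)

def summarize(runs: list[Run]) -> dict:
    """Roll per-run observations up into the headline report metrics."""
    mentions = [r for r in runs if r.mentioned]
    mention_rate = len(mentions) / len(runs)
    avg_sentiment = sum(r.sentiment for r in mentions) / len(mentions) if mentions else 0.0
    tier_counts = Counter(t for r in mentions for t in r.citation_tiers)
    primary_tier = tier_counts.most_common(1)[0][0] if tier_counts else None
    return {
        "mention_rate": round(mention_rate, 2),
        "avg_sentiment": round(avg_sentiment, 1),
        "primary_citation_tier": primary_tier,
    }

runs = [Run(True, 8, [3]), Run(False, 0), Run(True, 5, [3, 2]), Run(False, 0)]
print(summarize(runs))
# -> {'mention_rate': 0.5, 'avg_sentiment': 6.5, 'primary_citation_tier': 3}
```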

Key Insight:

You can't optimize what you can't measure. Traditional brand monitoring tells you 'people are talking about you.' GEO monitoring tells you 'AI models rank you 5th in recommendation lists because you lack high-authority citations in G2 Crowd—fix this specifically.'

Conclusion: AI Marketing is Data Science, Not Fortune Telling

AI Marketing isn't mysticism—it's data science.

I developed this tool to transform 'how LLMs perceive you' from a vague feeling into a visible, countable, improvable report.

Upcoming entries in the GEO Field Notes series will use this tool to bring you more real-world industry data analysis.

If you want to know how your brand truly appears in AI's eyes, stay tuned.

Next in the Series:

GEO Field Notes 02: The Citation War—How to Win the Battle for High-Authority References

GEO Field Notes 03: Prompt Engineering for Brand Positioning—Making AI Remember You Correctly

GEO Field Notes 04: The Competitive Intelligence Layer—Reverse Engineering Competitor GEO Strategies

Frequently Asked Questions

How is GEO measurement different from traditional SEO analytics?

Traditional SEO measures rankings, clicks, and conversions on search pages where users see 10 blue links. GEO measures AI mention rate, sentiment, and citation sources in contexts where users see one synthesized answer. The fundamental difference: SEO optimizes for 'being seen,' GEO optimizes for 'being recommended.' You can rank #1 on Google but never get mentioned by ChatGPT—they're separate battlefields requiring different strategies.

Why do you test across 20+ prompts instead of a few key questions?

Large language models are non-deterministic and context-sensitive. A single prompt tells you almost nothing. Twenty prompts across different scenarios (direct inquiry, category recommendation, competitive comparison) reveal patterns. We've seen brands mentioned in direct questions but completely ignored in category recommendations—which is where 80% of users actually ask. Small sample sizes create false confidence. Systematic testing reveals reality.

Can I improve my GEO score without changing my actual product?

Yes and no. You can't fake quality indefinitely, but you can optimize how AI perceives your existing quality. Most brands aren't invisible because they're bad—they're invisible because AI can't find authoritative sources confirming their value. Optimizing G2 reviews, securing press coverage, and structuring website content for AI parsing are all valid GEO tactics that don't require product changes. However, if your product genuinely has issues, AI will eventually reflect that as more data accumulates.

How often should I run GEO monitoring for my brand?

Frequency depends on your competitive landscape and content velocity. Minimum: monthly for stable brands in slow-moving industries. Recommended: weekly for competitive categories or during active campaigns. Ideal: daily for brands in rapidly evolving spaces or those actively implementing GEO strategies. AI answer surfaces change quickly: Perplexity indexes the web in near real time, and ChatGPT's browsing-backed answers can pick up new content within weeks. Missing a negative sentiment shift for even 2 weeks can mean thousands of users seeing outdated or critical information.

What's the biggest mistake brands make in GEO?

Treating AI like a search engine. The biggest mistake is assuming 'good SEO = good GEO.' They overlap but aren't identical. SEO optimizes for crawlers finding keywords on pages. GEO optimizes for LLMs synthesizing answers from authoritative sources and presenting coherent narratives. We've seen brands with perfect technical SEO but zero AI visibility because they lack the citation layer (reviews, press, comparisons) that LLMs actually reference when making recommendations. Fix the data layer, not just the keyword layer.

Closing Thoughts

The transition from 'feeling' to 'knowing' how AI perceives your brand isn't just a technical upgrade—it's a strategic necessity. As AI becomes the primary information gatekeeper for billions of users, brands that can systematically measure and optimize their AI presence will dominate their categories. The black box era of AI marketing is over. The white box era—where every mention, every sentiment shift, every citation source is tracked and optimized—has begun. The question isn't whether to adopt GEO measurement, but whether you can afford to operate blind while competitors build data-driven AI strategies. Start measuring today. Optimize tomorrow. Dominate the day after.

Want to See Your Brand's GEO Score?

Get a comprehensive AI brand perception report across ChatGPT, Claude, Perplexity, and Gemini—with actionable insights.

Get Your Free GEO Health Report