Citation Tracking
Understanding how AEO/GEO Analytics extracts, tracks, and analyzes citations from LLM responses.
What is a Citation?
Citation: A URL or domain reference that appears in an LLM's response to a query.
Citation Examples
Direct URL Citation:
Response: "According to this guide
(https://familyhandyman.com/plumbing/faucet-repair/), you should first turn off the water supply..."
Citation Extracted: familyhandyman.com
Inline Reference:
Response: "Family Handyman recommends turning off the water supply
before starting any faucet repair..."
Citation Extracted: Only when the provider attaches a URL to the mention; a source name without a URL is not matched by URL detection
Multiple Citations:
Response: "Several sources including This Old House, Home Depot,
and YouTube tutorials suggest the following steps..."
Citations Extracted: thisoldhouse.com, homedepot.com, youtube.com (when the provider supplies URLs for the named sources)
How Citation Extraction Works
The Extraction Process
Step 1: LLM Response
- Query sent to provider (Claude, GPT-4, etc.)
- Provider generates response with citations
- Full text returned to AEO/GEO Analytics
Step 2: URL Detection
- Regular expression scans response text
- Identifies patterns matching URLs
- Extracts complete URL strings
Step 3: Domain Parsing
- Full URL broken down
- Root domain extracted
- Subdomains removed (www. stripped)
Step 4: Storage
- Citation saved to database
- Linked to original query and response
- Domain tracked separately for aggregation
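Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration, not the product's actual implementation: the pattern `URL_PATTERN` and the helpers `extract_urls` and `parse_domain` are hypothetical names, and the sketch only strips a leading `www.`. Reducing arbitrary subdomains to a root domain would need a public-suffix lookup, which is left out here.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: an http(s) URL runs until whitespace, a quote, or a closing bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(response_text: str) -> list[str]:
    """Step 2: find URL strings and trim trailing sentence punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(response_text)]


def parse_domain(url: str) -> str:
    """Step 3: take the lowercased host and strip a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


response = (
    "To fix a leaking faucet, follow these steps from\n"
    "https://familyhandyman.com/plumbing/faucet-repair/:\n"
    "1. Turn off water supply"
)
domains = [parse_domain(url) for url in extract_urls(response)]
```

Trimming trailing punctuation matters because URLs in prose are often followed by a colon, comma, or period that is not part of the address.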
Example Flow
Query: "How to fix a leaking faucet?"
Provider Response:
"To fix a leaking faucet, follow these steps from
https://familyhandyman.com/plumbing/faucet-repair/:
1. Turn off water supply
2. Remove faucet handle
..."
Extraction:
URL Found: https://familyhandyman.com/plumbing/faucet-repair/
Domain Parsed: familyhandyman.com
Stored: Citation record created
Domain counter incremented
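The storage step in this flow might look like the following sketch. It uses in-memory structures in place of a database, and `CitationRecord`, `store_citation`, and the `"claude"` provider label are illustrative assumptions rather than the real schema.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CitationRecord:
    """One citation, linked back to the query and provider that produced it."""
    query: str
    provider: str
    url: str
    domain: str


citations = []             # stand-in for the citations table
domain_counts = Counter()  # per-domain counter used for aggregation


def store_citation(query: str, provider: str, url: str) -> CitationRecord:
    host = (urlparse(url).hostname or "").lower()
    domain = host[4:] if host.startswith("www.") else host
    record = CitationRecord(query, provider, url, domain)
    citations.append(record)    # citation record created
    domain_counts[domain] += 1  # domain counter incremented
    return record


record = store_citation(
    "How to fix a leaking faucet?",
    "claude",
    "https://familyhandyman.com/plumbing/faucet-repair/",
)
```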
Citation Patterns
Provider Differences
Claude (Anthropic):
- Highest citation rate (6-10 per response)
- Explicit URL format
- Clear source attribution
- Educational and how-to sites favored
GPT-4 (OpenAI):
- Medium-high citation rate (4-8 per response)
- Mix of URLs and source names
- Balanced source types
- Review and comparison sites favored
Gemini (Google AI):
- Lower citation rate (3-6 per response)
- Search-oriented citations
- Google properties sometimes favored
- Educational sites common
Perplexity:
- Medium citation rate (4-7 per response)
- Real-time web search based
- Fresh content cited
- News and current sources
Query Type Impact
How-To Queries:
- Citation Rate: High (7-12 citations)
- Source Types: Tutorial sites, YouTube, how-to guides
- Why: Multiple approaches exist, so the LLM cites several
Comparison Queries:
- Citation Rate: Medium-High (5-9 citations)
- Source Types: Review sites, comparison tools
- Why: Need multiple perspectives
Problem-Solving:
- Citation Rate: High (6-10 citations)
- Source Types: Troubleshooting guides, expert sites
- Why: Multiple solutions cited
Educational:
- Citation Rate: Medium (4-6 citations)
- Source Types: Wikipedia, .edu sites, encyclopedias
- Why: Fewer authoritative sources
Decision-Making:
- Citation Rate: Medium (4-6 citations)
- Source Types: Expert opinions, buying guides
- Why: Specific recommendations
Citation Quality Indicators
Position in Response
Early Citation (First paragraph):
- Highest visibility
- User sees immediately
- Primary source signal
Mid-Response Citation:
- Supporting evidence
- Additional context
- Medium visibility
Late Citation (Last paragraph, footnotes):
- Secondary sources
- Further reading
- Lower visibility
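One way to label a citation's position is by the paragraph it first appears in. The sketch below assumes paragraphs are separated by blank lines and treats a single-paragraph response as "early". The function name `citation_position` and the `"absent"` label are hypothetical.

```python
import re


def citation_position(response_text: str, url: str) -> str:
    """Return 'early', 'mid', 'late', or 'absent' for the first occurrence of url."""
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = [p for p in re.split(r"\n\s*\n", response_text.strip()) if p.strip()]
    for index, paragraph in enumerate(paragraphs):
        if url in paragraph:
            if index == 0:
                return "early"  # first paragraph
            if index == len(paragraphs) - 1:
                return "late"   # last paragraph or footnotes
            return "mid"
    return "absent"


response = (
    "Per https://a.example/x, shut off the water first.\n\n"
    "Then follow https://b.example/y for the handle.\n\n"
    "Further reading: https://c.example/z"
)
positions = {
    url: citation_position(response, url)
    for url in ("https://a.example/x", "https://b.example/y", "https://c.example/z")
}
```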
Attribution Style
Explicit Attribution:
"According to Family Handyman (familyhandyman.com)..."
- High visibility
- Clear source
- Trust building
Implicit Attribution:
"Most plumbers recommend... (sources: familyhandyman.com, thisoldhouse.com)"
- Medium visibility
- Aggregated sources
- Less direct
URL Only:
"...(https://familyhandyman.com/guide)"
- Lower visibility
- Technical reference
- Click-through possible
Citation Context
Recommendation Citation:
- "Expert source X recommends..."
- High trust signal
- Authority building
Data Citation:
- "According to research..."
- Credibility signal
- Statistical authority
Example Citation:
- "As shown in this guide..."
- Practical application
- How-to authority
Domain Tracking
What We Track
Per Domain:
- Total citation count
- First seen date
- Last seen date
- Queries triggering citations
- Providers citing it
- Specific URLs cited
Aggregate:
- Citation share percentage
- Rank among all domains
- Trending up or down
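The per-domain fields listed above map naturally onto a small record type. The following is a sketch under assumed names (`DomainStats`, `record`), not the actual storage model. The aggregate values (share, rank, trend) are derived across all domains and are not stored here.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DomainStats:
    domain: str
    total_citations: int = 0
    first_seen: Optional[date] = None
    last_seen: Optional[date] = None
    queries: set = field(default_factory=set)    # queries triggering citations
    providers: set = field(default_factory=set)  # providers citing the domain
    urls: set = field(default_factory=set)       # specific URLs cited

    def record(self, seen_on: date, query: str, provider: str, url: str) -> None:
        """Fold one citation into the running per-domain totals."""
        self.total_citations += 1
        if self.first_seen is None or seen_on < self.first_seen:
            self.first_seen = seen_on
        if self.last_seen is None or seen_on > self.last_seen:
            self.last_seen = seen_on
        self.queries.add(query)
        self.providers.add(provider)
        self.urls.add(url)


stats = DomainStats("familyhandyman.com")
stats.record(date(2024, 3, 2), "How to fix a leaking faucet?", "claude",
             "https://familyhandyman.com/plumbing/faucet-repair/")
stats.record(date(2024, 1, 15), "How to fix a leaking faucet?", "gpt-4",
             "https://familyhandyman.com/plumbing/faucet-repair/")
```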
Domain Authority Signals
High Citation Count:
- Frequent citations = trusted source
- Consistent across queries = breadth
- Multiple providers = universal trust
Citation Diversity:
- Many different URLs = comprehensive
- Various query types = versatile
- All providers = robust authority
Citation Consistency:
- Always cited for topic = go-to source
- First position often = primary authority
- Explicit attribution = recognized expert
Citation Metrics
Primary Metrics
Total Citations:
- Absolute count across all responses
- Growth metric over time
- Volume indicator
Citation Rate:
- Citations per response
- Quality indicator
- Provider comparison metric
Citation Share:
- Your citations ÷ Total citations
- Competitive metric
- Market position indicator
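The three primary metrics reduce to simple counting. The sketch below assumes each response has already been reduced to a list of cited domains. `primary_metrics` and the sample domains are illustrative.

```python
from collections import Counter


def primary_metrics(responses: list[list[str]], your_domain: str) -> dict[str, float]:
    """responses holds one list of cited domains per LLM response."""
    counts = Counter(domain for cited in responses for domain in cited)
    total = sum(counts.values())
    return {
        "total_citations": total,
        # Citation rate: citations per response.
        "citation_rate": total / len(responses) if responses else 0.0,
        # Citation share: your citations divided by total citations.
        "citation_share": counts[your_domain] / total if total else 0.0,
    }


responses = [
    ["a.example", "b.example"],
    ["a.example"],
    ["c.example", "a.example", "b.example"],
]
metrics = primary_metrics(responses, "a.example")
```

Here `a.example` receives 3 of 6 citations across 3 responses, so the citation rate is 2.0 per response and the citation share is 0.5.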
Secondary Metrics
Unique Domains:
- Number of different domains cited
- Diversity indicator
- Competition level
Average Citations per Query:
- Total citations ÷ Total queries
- Query quality indicator
- Expected citation baseline
Citation Distribution:
- How evenly citations are spread across domains
- Monopoly vs competitive landscape
- Opportunity identification
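The secondary metrics can be computed from the same counts. The text does not say how citation distribution is measured, so this sketch assumes a Herfindahl-Hirschman style concentration index (the sum of squared shares). That choice, and the name `secondary_metrics`, are illustrative only.

```python
from collections import Counter


def secondary_metrics(citations_by_query: dict[str, list[str]]) -> dict[str, float]:
    """citations_by_query maps each query to the domains cited in its responses."""
    counts = Counter(d for cited in citations_by_query.values() for d in cited)
    total = sum(counts.values())
    shares = [n / total for n in counts.values()] if total else []
    return {
        "unique_domains": len(counts),
        "avg_citations_per_query": (
            total / len(citations_by_query) if citations_by_query else 0.0
        ),
        # Assumed measure: 1.0 means one domain holds every citation,
        # 1/N means N domains split citations evenly.
        "concentration": sum(s * s for s in shares),
    }


sample = {
    "q1": ["a.example", "b.example"],
    "q2": ["a.example", "b.example"],
}
result = secondary_metrics(sample)
```

A concentration near 1.0 suggests a monopoly landscape, while a value near 1/N for N domains suggests a competitive one.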
Citation Analysis Techniques
Competitive Benchmarking
Compare Your Domain:
- Execute 50 queries in your industry
- Track citation counts by domain
- Rank domains by citation share
- Identify your position
Example Results:
1. competitor-a.com - 15% share (150 citations)
2. competitor-b.com - 12% share (120 citations)
3. industry-leader.com - 10% share (100 citations)
...
10. your-domain.com - 3% share (30 citations)
Insights:
- Gap from leader: 12 percentage points (15% vs. 3%)
- Opportunity: Matching the leader means a 5x share (3% to 15%)
- Focus: Match competitor content coverage
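The ranking and gap calculation can be sketched as follows. The total of 1,000 citations is inferred from the example figures (150 citations at 15% share), the domains elided by "..." are left out, and `citation_shares` is a hypothetical helper.

```python
def citation_shares(counts: dict[str, int], total_citations: int) -> list[tuple[str, float]]:
    """Return (domain, share in percent) pairs, highest share first."""
    shares = [(domain, 100 * n / total_citations) for domain, n in counts.items()]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


# Only the domains listed in the example; the elided ranks are not reconstructed.
counts = {
    "your-domain.com": 30,
    "competitor-a.com": 150,
    "competitor-b.com": 120,
    "industry-leader.com": 100,
}
ranked = citation_shares(counts, total_citations=1000)

leader_domain, leader_share = ranked[0]
your_share = dict(ranked)["your-domain.com"]
gap_points = leader_share - your_share  # gap in percentage points
```

Because the total is passed in separately, shares stay correct even when only a subset of domains is listed.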
Content Gap Analysis
Find Uncited Queries:
- Execute query set
- Identify queries with 0-2 citations
- Low competition = opportunity
Example:
Query: "How to replace toilet flapper valve?"
Citations: 2 (very low)
Opportunity: Create definitive guide
Competition: Minimal
Potential: High citation share capture
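Finding low-citation queries is a filter over per-query counts. In this sketch the threshold of 2 follows the "0-2 citations" rule above. `low_citation_queries`, the second sample query, and the placeholder domains are assumptions.

```python
def low_citation_queries(
    citations_by_query: dict[str, list[str]], max_citations: int = 2
) -> list[tuple[str, int]]:
    """Return (query, citation count) pairs at or below the threshold, fewest first."""
    gaps = [
        (query, len(cited))
        for query, cited in citations_by_query.items()
        if len(cited) <= max_citations
    ]
    return sorted(gaps, key=lambda pair: pair[1])


sample = {
    "How to replace toilet flapper valve?": ["a.example", "b.example"],
    "How to fix a leaking faucet?": ["a.example"] * 8,
}
opportunities = low_citation_queries(sample)
```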
Trending Analysis
Track Over Time:
- Weekly: Execute same query set
- Monthly: Compare citation shares
- Quarterly: Trend analysis
Example Trend:
Jan: your-domain.com - 3% share
Feb: your-domain.com - 4% share (+33%)
Mar: your-domain.com - 5.5% share (+38%)
Trend: Growing share, positive trajectory
Action: Continue current strategy
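The percentages in the example trend are relative changes against the previous month, not differences in percentage points. A minimal sketch, with `share_trend` as a hypothetical helper:

```python
from typing import Optional


def share_trend(series: list[tuple[str, float]]) -> list[tuple[str, float, Optional[float]]]:
    """Attach the relative change versus the previous period, in percent."""
    trend = []
    previous = None
    for period, share in series:
        # No change is reported for the first period or after a zero share.
        change = None if not previous else 100 * (share - previous) / previous
        trend.append((period, share, change))
        previous = share
    return trend


trend = share_trend([("Jan", 3.0), ("Feb", 4.0), ("Mar", 5.5)])
```

For these figures the February change is (4 - 3) / 3, about 33%, and the March change is (5.5 - 4) / 4, or 37.5%, which the example rounds to 38%.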
Why Citations Matter
Direct Benefits
Brand Awareness:
- Domain appears in AI responses
- Users see your brand name
- Builds familiarity over time
Traffic Potential:
- Users may click citations
- 5-15% CTR typical
- Qualified traffic
Trust Building:
- AI trusts your content
- Users trust AI recommendations
- Transitive trust effect
Indirect Benefits
Search Rankings:
- AI citations may signal quality to Google
- Authority building across platforms
- Backlink-like effect
Brand Recall:
- Even without clicks, visibility matters
- "I've heard of that before"
- Future search behavior influenced
Content Validation:
- Citations confirm content quality
- Topic coverage verification
- Strategic direction validation
Citation Limitations
What Citations Don't Tell You
User Behavior:
- Can't track if user clicked
- Can't measure time on site
- Can't attribute conversions
Citation Intent:
- Positive or negative mention?
- Agreement or counterexample?
- Primary or supplementary?
Full Context:
- How citation was framed
- What else was cited
- User's ultimate choice
Measurement Challenges
Provider Variability:
- Different citation rates
- Different source preferences
- Inconsistent formatting
Query Dependence:
- Some queries cite more than others
- Can't compare across industries easily
- Context matters
Temporal Effects:
- Provider updates change behavior
- Content freshness varies
- Seasonal patterns exist
Best Practices
Testing Methodology
Consistent Query Sets:
- Use same queries over time
- Compare apples to apples
- Track trends accurately
Multiple Providers:
- Don't rely on a single LLM
- Provider diversity important
- Comprehensive view
Regular Cadence:
- Weekly for active monitoring
- Monthly for trend analysis
- Quarterly for strategy
Data Interpretation
Statistical Significance:
- Minimum 20-50 queries for patterns
- Don't over-interpret single citations
- Look for consistent trends
Context Awareness:
- Consider query type
- Account for provider differences
- Note market changes
Competitive Context:
- Always compare to benchmarks
- Track relative position, not just absolute counts
- Citation rates vary widely by industry
Action-Oriented Analysis
Identify Opportunities:
- Low-citation queries = create content
- Competitor gaps = differentiation
- Trending topics = prioritize
Measure Impact:
- Before/after content updates
- A/B test content approaches
- Validate hypotheses
Iterate Quickly:
- Test → Measure → Refine
- Don't wait for perfect
- Continuous improvement
Next Steps
- LLM Providers: Provider-specific citation behaviors
- Analytics Guide: Analyze your citation data
- AEO/GEO Overview: Broader context