---
project: Finance Skills Showcase
purpose: Demonstrate Anthropic financial-services Claude Code plugins on a varied value-tilted basket; preserve outputs for public review and critique on personalwebapp
target_destination: personalwebapp `/research` section (long-form pieces)
critique_posture: honest assessment — to be written by Andy in his own voice
basket: TEF.MC (anchor), 2603.TW, BRK.B, PAH3.XETRA, MSFT
created: 2026-05-05 → 2026-05-06
---

# Finance Skills Showcase — README

This directory contains the raw outputs of Anthropic's `claude-for-financial-services` plugin skills run on a value-tilted basket of real companies. The intent is to preserve what each skill actually produces so visitors to personalwebapp can read the outputs and form their own view about Claude's capability in equity research and financial analysis.

The critique posture is **honest assessment** — Andy intends to write his own commentary alongside each output, including where the skills shine, where they fall short, and where they hallucinate. This README documents what's been produced, what's outstanding, and what was learned along the way.

---

## Project arc

1. **Install** (2026-05-05) — `claude-for-financial-services` marketplace registered globally; `financial-analysis`, `equity-research`, `earnings-reviewer`, `market-researcher` installed at user scope. Recurring `hooks/hooks.json` schema bug patched; documented in `~/.claude/projects/.../memory/finance-plugins.md`. Reference doc: `C:\Users\andy_\finance-plugins-rundown.md`.

2. **Skills tour** (2026-05-05) — full inventory of 22 skills across the two vertical plugins. Curated list of 9 most relevant for value-investor / portfolio-manager workflows: `thesis-tracker`, `idea-generation`, `catalyst-calendar`, `earnings-analysis`, `model-update`, `dcf-model`, `comps-analysis`, `audit-xls`, `sector-overview`. Skip list documented in the rundown.

3. **First integration test** (2026-05-05) — `/screen` (idea-generation skill) anchored on Vodafone's deep-value profile against the 84-name EODHD telecom universe. Result: 18 hits, 3 top picks (Telefónica, Orange, Proximus). Output written into the webapp-portfolio research store as an `agent_output` artifact (id=2) on Telefónica (research_companies id=3). End-to-end integration validated.

4. **Showcase outputs** (2026-05-06) — one substantive output per skill, on the most natural test name per skill. Outputs 01–10 were produced before the pivot to the full initiating-coverage pipeline.

5. **Full 5-task initiating-coverage pipeline** (2026-05-06) — Tasks 1–5 produced as separate deliverables: research doc (.md), financial model (.xlsx with formulas), valuation analysis (.md + 4 valuation tabs in xlsx), 32 charts at 300 DPI (.zip), final 30-page DOCX with all charts embedded. The DOCX meets the structural minimums but undershoots the skill's own 10K-word target (~3K actual).

6. **Production validation** (2026-05-06) — fact-checked the entire initiating-coverage output against live data via web search and primary filings. Surfaced multiple thesis-breaking errors: the dividend had already been cut, the Hispam exits were more advanced than stated, the revenue base was overstated (€41.7bn vs €35.1bn verified), and the Murtra appointment date was hallucinated. Documented in `13_initiating-coverage_TEF_validation_vs_reality.md`.

---

## File index

### Showcase outputs (one per skill) — v1 = original, v2 = post-WebSearch refresh

| # | Skill | v1 file | v2 file (refreshed against verified facts) | Test name / v2 changes |
|---|---|---|---|---|
| 01 | comps-analysis | `01_comps-analysis_TEF.md` | `01_comps-analysis_TEF_v2.md` | TEF + 6 European peers (TEF row corrected to company-def FCF) |
| 02 | idea-generation | `02_idea-generation_TEF-screen.md` | (snapshot — not refreshed) | VOD anchor → 18 hits |
| 03 | sector-overview | `03_sector-overview_European-Telecoms.md` | `03_sector-overview_European-Telecoms_v2.md` | Adds Transform & Grow + dividend reset wave context |
| 04 | competitive-analysis | `04_competitive-analysis_TEF.md` | `04_competitive-analysis_TEF_v2.md` | Updated Spain shares, Hispam status, Murtra date |
| 05 | thesis-tracker | `05_thesis-tracker_TEF.md` | `05_thesis-tracker_TEF_v2.md` | **Pillars rewritten — capital allocation reset thesis, not yield carry** |
| 06 | catalyst-calendar | `06_catalyst-calendar_basket.md` | `06_catalyst-calendar_basket_v2.md` | TEF Q1 14 May verified, VMO2 lock-up June |
| 07 | morning-note | `07_morning-note_basket.md` | (illustrative — not refreshed) | Coverage basket digest |
| 08 | earnings-analysis | `08_earnings-analysis_MSFT.md` | (illustrative — not refreshed) | MSFT Q3 FY26 illustrative |
| 09 | earnings-preview | `09_earnings-preview_TEF-Q1.md` | `09_earnings-preview_TEF-Q1_v2.md` | TEF Q1 2026, 14 May verified |
| 10 | model-update | `10_model-update_TEF.md` | `10_model-update_TEF_v2.md` | Hypothetical post-Q1 update aligned to v2 thesis |

### Initiating-coverage 5-task pipeline — v1 (showcase, no upfront WebSearch)

| Task | File | Format | Notes |
|---|---|---|---|
| 1 — Research | `13_initiating-coverage-task1_TEF.md` | Markdown ~6,800 words | 8 sections incl 4 management bios + 14 risks |
| 2 — Model | `13_initiating-coverage_TEF_task2_model.xlsx` | Excel 10 tabs | Formulas not hardcodes; yellow cells = inputs |
| 3 — Valuation | `13_initiating-coverage_TEF_task3_valuation.md` + 4 xlsx tabs | Markdown + xlsx | DCF/Comps/SOTP, blended PT €5.05 (BUY) |
| 4 — Charts | `13_initiating-coverage_TEF_task4_charts.zip` | 32 PNGs at 300 DPI | 4 mandatory charts ⭐ included |
| 5 — DOCX | `13_initiating-coverage_TEF_task5_report.docx` | Word doc, 30+ pages | Embeds all 32 charts; ~3K words vs 10K skill target |
| Validation | `13_initiating-coverage_TEF_validation_vs_reality.md` | Markdown | Live data fact-check; 17 cited sources |

### Initiating-coverage v2 — production run with WebSearch forced upfront

The v2 was built on a 12-search WebSearch foundation completed before any drafting. Every claim traces to a primary source via the facts pack. The exercise shows how much of v1's failure was a discipline failure rather than a model-capability failure.

| File | Format | Notes |
|---|---|---|
| `14_initiating-coverage_TEF_v2_facts-pack.md` | Markdown | Verified data sheet — 25 cited sources, every number used in v2 |
| `15_initiating-coverage_TEF_v2_research.md` | Markdown | Production research doc; HOLD, PT €4.10 (+8% upside) |
| `16_initiating-coverage_TEF_v1_vs_v2.md` | Markdown | Side-by-side variance summary |
| `17_initiating-coverage_TEF_v2_model.xlsx` | Excel | 10 tabs anchored to FY25A actuals + Transform & Grow guidance; FCF baseline €3bn FY26E (vs v1 €5.9bn) |
| `17_initiating-coverage_TEF_v2_charts.zip` | 16 PNGs at 300 DPI | Includes new charts on FCF definition gap (EODHD vs company) and v1-vs-v2 PT comparison |
| `17_initiating-coverage_TEF_v2_report.docx` | Word doc | DOCX with v2 narrative + 16 charts embedded |

**Key narrative shift v1 → v2:** dividend cut already happened (Nov 2025); Hispam mostly exited; revenue base €35.1bn not €41.7bn; FCF €2.07bn (company def.) not €5.2bn (EODHD def.); recommendation BUY/+44% → HOLD/+8%; aligns with sell-side consensus rather than top-of-range bullish.

**Build scripts (kept for reproducibility):**
- `_build_v2_model.py` — openpyxl construction of the v2 financial model
- `_build_v2_charts_and_docx.py` — combined chart generation + DOCX assembly

### Additional skill demonstrations (post-v2)

| File | Skill | What it tests |
|---|---|---|
| `18_audit-xls_v2-model.md` | `audit-xls` (financial-analysis plugin) | Programmatic audit of the v2 xlsx — surfaced 3 critical / 32 warnings / 19 info findings. Genuinely catches BS-doesn't-balance, cash-doesn't-tie, and unlevered-FCF-mismatch — three real model-integrity gaps that would block client distribution |
| `19_earnings-reviewer_VOD_Q3-FY26.md` | `earnings-reviewer` (named agent that wraps earnings-analysis + model-update + audit-xls + morning-note) | Run on Vodafone's most recent print (Q3 FY26, reported 4 Feb 2026) plus the post-print CK Hutchison VodafoneThree buyout (announced 5 May 2026, the day before this run). Tests the agent-orchestration pattern vs running individual skills. Three deliverables produced: variance table, model update summary, note draft |
| `20_pptx-author_TEF-summary-deck.pptx` | `pptx-author` (financial-analysis plugin) | TEF investment summary deck built from v2 narrative + v2 chart set. 6 slides (cover, summary, thesis, plan, valuation, risks/catalysts) with 4 embedded charts. Tests whether the skills generalise from text/DOCX to slide format |
| `_audit_xls.py` | (build script) | Programmatic execution of the audit-xls workflow against the v2 xlsx |
| `_build_v2_pptx.py` | (build script) | python-pptx construction of the deck |
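
Two of the three integrity gaps named in the `audit-xls` row reduce to simple identities: assets must equal liabilities plus equity, and balance-sheet cash must match the ending cash on the cash flow statement. The sketch below illustrates those two checks only. It assumes each model period has already been exported to a plain dict; the field names, the tolerance, and the function name `check_model_integrity` are hypothetical and are not taken from `_audit_xls.py` or from the skill itself.

```python
def check_model_integrity(periods, tol=0.5):
    """Return (period, finding, gap) tuples for periods that fail either identity.

    `periods` is a list of dicts with hypothetical keys; `tol` is an assumed
    rounding tolerance in the model's own units.
    """
    findings = []
    for p in periods:
        # Identity 1: total assets = total liabilities + total equity
        bs_gap = p["total_assets"] - (p["total_liabilities"] + p["total_equity"])
        if abs(bs_gap) > tol:
            findings.append((p["period"], "balance_sheet_does_not_balance", round(bs_gap, 2)))
        # Identity 2: balance-sheet cash = ending cash on the cash flow statement
        cash_gap = p["bs_cash"] - p["cf_ending_cash"]
        if abs(cash_gap) > tol:
            findings.append((p["period"], "cash_does_not_tie", round(cash_gap, 2)))
    return findings


# Illustrative numbers only, not taken from the v2 model.
sample = [
    {"period": "FY26E", "total_assets": 100.0, "total_liabilities": 60.0,
     "total_equity": 40.0, "bs_cash": 5.0, "cf_ending_cash": 5.0},
    {"period": "FY27E", "total_assets": 110.0, "total_liabilities": 60.0,
     "total_equity": 45.0, "bs_cash": 6.0, "cf_ending_cash": 4.0},
]
for finding in check_model_integrity(sample):
    print(finding)
```

A clean period produces no findings, so an empty list is the pass condition.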

### Build scripts (kept for reproducibility)

- `_build_task2_model.py` — openpyxl construction of the financial model
- `_build_task3_valuation_tabs.py` — appends valuation tabs to model
- `_build_task4_charts.py` — matplotlib chart generation
- `_build_task5_report.py` — python-docx assembly

### Outstanding (skills still to demo if continuing the showcase)

- `dcf-model` — file 11 has a markdown sample with a key teaching moment (caught its own bad WACC mid-process)
- `3-statement-model` — file 12 has a markdown sample
- Utility skills: `clean-data-xls`, `xlsx-author`, `ppt-template-creator`, `deck-refresh`, `ib-check-deck`, `skill-creator`, `lbo-model`. Most don't produce standalone analytical outputs and would be better demonstrated as utilities run on the existing initiating-coverage deliverables. `audit-xls` and `pptx-author`, originally in this group, have since been demonstrated in files 18 and 20.

---

## Key lessons documented

The validation exercise on the Telefónica initiation report (file `13_..._validation_vs_reality.md`) is the centrepiece teaching artifact of this project. Key generalisable lessons:

### Lesson 1 — Hallucination of specific facts is the dominant failure mode

Where the showcase used a specific date or precise number, it was wrong roughly half the time, even for facts within Claude's training window. Where it described directional dynamics, it was usually right. **Every specific number, date, and quantitative claim must be web-validated before publishing.**

The most striking example: the showcase identified "dividend cut" as Risk 8 in its own risk table with a -€1.00 PT impact estimate — but the dividend had already been cut in November 2025, six months before the showcase was generated. The skill *knew* this was the asymmetric vulnerability; it just didn't know which risks had already materialised.
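
One way to act on this lesson is to extract every specific figure from a draft into a checklist before publishing. The sketch below is a minimal illustration using regular expressions. The patterns and the function name `flag_specific_claims` are assumptions made for this example: they cover only simple currency amounts, percentages, and "day month year" dates, and would need extending for real drafts.

```python
import re

# Hypothetical patterns for "specific" claims; deliberately narrow.
CLAIM_PATTERNS = {
    "currency": re.compile(r"[€$£]\s?\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:bn|m|k|B|M)\b)?"),
    "percent": re.compile(r"[+-]?\d+(?:\.\d+)?%"),
    "date": re.compile(
        r"\b\d{1,2}\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s\d{4}\b"
    ),
}


def flag_specific_claims(text):
    """Return (kind, matched_text) pairs in order of appearance in `text`."""
    hits = []
    for kind, pattern in CLAIM_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group()))
    hits.sort()  # order by position so the checklist reads like the draft
    return [(kind, matched) for _, kind, matched in hits]


draft = "Reported 4 Feb 2026; FCF €5.2bn; upside +44%."
for kind, matched in flag_specific_claims(draft):
    print(f"[ ] verify {kind}: {matched}")
```

Directional statements with no figures produce an empty list, which matches the observation above that those claims were usually right.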

### Lesson 2 — Database definitions need explicit reconciliation

Telefónica's company-reported FCF (€2.6bn) and EODHD's calculated FCF (€5.2bn) differ by ~2x because they use different definitions: EODHD applies the textbook OCF minus capex, while the company's stricter measure also nets out spectrum, hybrid coupons and lease principal. Any analysis built on a third-party database must reconcile to the company's own definitions before drawing conclusions.

This applies broadly: EBITDAaL vs EBITDA, lease-adjusted leverage, FCF before/after spectrum. A production analysis must do this reconciliation explicitly. The skill does not enforce this; the analyst must.
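
The reconciliation can be made explicit as a bridge from the textbook figure to the company-style figure. The sketch below assumes the bridge consists of exactly the three items named above (spectrum, hybrid coupons, lease principal); the company's actual definition may include other adjustments. The class `FcfBridge`, the function `reconcile`, the tolerance, and all numbers are hypothetical placeholders, not Telefónica data.

```python
from dataclasses import dataclass


@dataclass
class FcfBridge:
    """All figures in EUR bn; each field is a placeholder to fill from filings."""
    operating_cash_flow: float
    capex: float
    spectrum_payments: float
    hybrid_coupons: float
    lease_principal: float

    def textbook_fcf(self) -> float:
        # Database-style definition: OCF minus capex
        return self.operating_cash_flow - self.capex

    def company_style_fcf(self) -> float:
        # Assumed stricter definition: also net the three items named in the text
        return (self.textbook_fcf() - self.spectrum_payments
                - self.hybrid_coupons - self.lease_principal)


def reconcile(bridge, database_fcf, reported_fcf, tol=0.05):
    """Check that the bridge ties to both the database and the reported figure."""
    return {
        "ties_to_database": abs(bridge.textbook_fcf() - database_fcf) <= tol,
        "ties_to_company": abs(bridge.company_style_fcf() - reported_fcf) <= tol,
        "unexplained_gap": round(reported_fcf - bridge.company_style_fcf(), 2),
    }


# Illustrative numbers only.
bridge = FcfBridge(operating_cash_flow=10.0, capex=5.0, spectrum_payments=0.5,
                   hybrid_coupons=0.25, lease_principal=2.0)
print(reconcile(bridge, database_fcf=5.0, reported_fcf=2.25))
```

A non-zero `unexplained_gap` signals that the bridge is missing an adjustment and the analysis should not proceed on either figure yet.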

### Lesson 3 — Skills with explicit cutoff guards perform better

The `earnings-analysis` skill opens with `🚨🚨🚨 CRITICAL: TRAINING DATA IS OUTDATED 🚨🚨🚨` and forces 4 web searches before drafting. The `initiating-coverage` skill does not, and produced exactly the failure mode that warning is meant to prevent.

A robust production version of `initiating-coverage` should force:
- Web search for "[ticker] news last 6 months" before any management/strategic commentary
- Web search for "[ticker] dividend status" before any income-thesis pillar
- Current price pull as the first action in valuation
- Current consensus pull before any "vs consensus" claim
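
The four guards above can be sketched as a gate that refuses to start drafting until each check has evidence attached. This is a minimal illustration, not how any Anthropic skill is implemented. The first two query strings follow the wording in the list; the price and consensus query strings, the check names, and every identifier in the code are assumptions made for this example.

```python
REQUIRED_CHECKS = (
    "news_last_6_months",
    "dividend_status",
    "current_price",
    "current_consensus",
)


class CutoffGuardError(RuntimeError):
    """Raised when drafting is attempted before all checks have evidence."""


def build_search_queries(ticker):
    """Map each required check to a search query for the given ticker."""
    return {
        "news_last_6_months": f"{ticker} news last 6 months",
        "dividend_status": f"{ticker} dividend status",
        # The two query strings below are assumed wording.
        "current_price": f"{ticker} share price",
        "current_consensus": f"{ticker} analyst consensus price target",
    }


def assert_ready_to_draft(completed):
    """`completed` maps check name -> citation or URL for the evidence gathered."""
    missing = [check for check in REQUIRED_CHECKS if not completed.get(check)]
    if missing:
        raise CutoffGuardError("Run these checks before drafting: " + ", ".join(missing))
    return True


queries = build_search_queries("TEF.MC")
try:
    assert_ready_to_draft({"news_last_6_months": "source recorded"})
except CutoffGuardError as err:
    print(err)
```

Requiring a citation per check, rather than a boolean, means the evidence is available later for a facts pack.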

### Lesson 4 — Structural framework is genuinely transferable

Stripped of factual errors, the analytical reasoning shape is strong: peer set selection, valuation method blend, risk identification, recognising when a WACC override is needed, identifying the operational variable (Spain ARPU) that materially moves the bull case. A senior analyst reviewing the showcase would find the *structure* defensible while flagging specific facts as needing rebasing — a better starting point than most junior analyst drafts.

### Lesson 5 — Why I didn't run production validation upfront (self-critique)

Three reasons: (1) I conflated "synthetic showcase" with "no fact-checking required", when that cover applies to forward estimates, not backward-looking specifics; (2) the initiating-coverage skill itself doesn't force the cutoff guard, so I followed the skill's discipline and inherited its blind spot; (3) I treated 30–60 minutes of upfront search work as overhead, when it would have saved more time than it cost. A senior analyst would never draft an initiation without Bloomberg/filings open. I behaved like a junior pulling from a textbook.

For the article: this is actually the cleanest framing — "the AI knew the right risks; it just didn't know which had already happened." More useful for readers than either "AI is great" or "AI is bad."

---

## Article structure suggestions (for Andy to write)

The validation report effectively gives the bones of an honest assessment piece. Possible structures:

**Option A: "What ten of Anthropic's new finance skills actually produce"**
- Brief on each output, side-by-side critique
- Reader can click through to read the raw outputs
- Closes with the validation finding as the punchline ("but here's what happens when you fact-check it")

**Option B: "I asked Claude to write an institutional initiation report"**
- Focus on the 5-task pipeline only
- Walk through what got produced at each task
- Validation report is the final section — "and then I checked it against reality"
- Strongest narrative arc; weakest breadth

**Option C: "Where AI can and can't write equity research" (essay)**
- Use the showcase outputs as evidence
- Structural framework transferable, content discipline must be operator-imposed
- Lessons 1–4 above as the substantive backbone
- Most useful to industry readers

The raw outputs in this folder support any of those structures.

---

## Future work

- Re-run the initiating-coverage pipeline with web search forced upfront ("production v2") to see how much improves automatically with WebSearch in the loop (since completed: files 14–17 in the file index)
- Run the same validation exercise on a US name (MSFT or BRK.B) where data is denser and primary sources easier to access
- Eventually wire this pattern back into webapp-portfolio's transparent agent layer (Phase 4 / 5) — every plugin skill output becomes a typed/versioned artifact with a separate validation pass before being marked "production"
- Consider authoring a custom plugin (`thesis-tracker-pro` or similar) that wraps Anthropic's skills with the cutoff-guard discipline that initiating-coverage lacks
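
The "typed/versioned artifact with a separate validation pass" idea can be sketched as a small state machine in which an output cannot reach production without passing through validation. The class names, fields, and status values below are hypothetical and do not reflect webapp-portfolio's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"
    PRODUCTION = "production"


@dataclass
class SkillArtifact:
    """Hypothetical record for one plugin skill output."""
    skill: str
    ticker: str
    version: int
    body: str
    status: Status = Status.DRAFT
    sources: list = field(default_factory=list)

    def validate(self, sources):
        # The validation pass must attach at least one cited source.
        if not sources:
            raise ValueError("validation needs at least one cited source")
        self.sources = list(sources)
        self.status = Status.VALIDATED

    def promote(self):
        # Only a validated artifact may be marked production.
        if self.status is not Status.VALIDATED:
            raise ValueError("only validated artifacts can be marked production")
        self.status = Status.PRODUCTION


artifact = SkillArtifact("initiating-coverage", "TEF.MC", version=2, body="...")
artifact.validate(["facts pack"])
artifact.promote()
print(artifact.status.value)
```

Keeping validation as a separate step from promotion mirrors the v1 versus v2 split in this showcase: the draft can exist, but it is not labelled production until sources are attached.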

---

## Background context (for future Claude sessions reading this)

- Skills inventory and architecture rundown: `C:\Users\andy_\finance-plugins-rundown.md`
- Plugin install state and recurring hooks.json bug: `~/.claude/projects/C--Users-andy-/memory/finance-plugins.md`
- Webapp-portfolio integration target (transparent agent layer): `~/.claude/projects/C--Users-andy-/memory/webapp-portfolio.md`
- Personalwebapp project (target site for the showcase article): `~/.claude/projects/C--Users-andy-/memory/personalwebapp.md`
- Company-analyst (existing decision journal — eventual merge target): `~/.claude/projects/C--Users-andy-/memory/company-analyst.md`
