How I use AI to summarize official bulletins
On a normal day, the BOE (Spain's national gazette) publishes between 50 and 100 entries. The BDNS (the national grants database) adds between 100 and 300 grant announcements. Regional bulletins contribute several dozen more. In total, between 200 and 500 daily entries of legal-administrative text that no human has the time or desire to read.
In Boletin Claro, I use LLMs to turn that volume into concise summaries that a freelancer or small business owner can read in 3 minutes with their morning coffee. The challenge isn't "summarize text with AI" (that's trivial). The challenge is doing it at scale, with controlled costs, and with enough quality that a user trusts the summary without needing to read the original.
The scale problem
Summarizing 300 entries with an LLM is expensive if you do it naively. Assume each entry averages 2,000 tokens. That's 600,000 input tokens per day. With GPT-4o at $2.50 per million input tokens, that's $1.50/day just on input. Sounds small, but add output tokens, the fact that some texts are much longer, and multiply this by every alert for every user.
If I have 100 users with 3 alerts each and process all 300 entries for each alert, we're talking about 90,000 LLM calls per day. Obviously, that doesn't scale.
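The arithmetic above is easy to reproduce with a tiny helper (the token counts and the $2.50/M price are the assumptions from this post, not live API rates):

```python
def daily_input_cost(entries: int, tokens_per_entry: int,
                     price_per_million: float) -> float:
    """Estimate the daily input-token cost in dollars."""
    return entries * tokens_per_entry / 1_000_000 * price_per_million

# 300 entries x 2,000 tokens at $2.50 per million input tokens
print(f"${daily_input_cost(300, 2_000, 2.50):.2f}/day")  # → $1.50/day

# Naive per-alert processing: 100 users x 3 alerts x 300 entries
print(100 * 3 * 300, "LLM calls/day")  # → 90000 LLM calls/day
```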
Filter before summarizing
The solution is a two-phase pipeline: filtering and summarization. Filtering discards 85-95% of entries before they ever touch the LLM.
Phase 1: Relevance filtering
Each user has alerts configured with a natural language description, like "grants for tech companies in Madrid" or "cleaning service tenders for public buildings." Filtering works in three layers:
- Source filter: if the alert only monitors BOE and BDNS, we immediately discard regional bulletins. This is trivial but eliminates a lot of volume.
- Section filter: BOE entries are classified into sections (I. General provisions, II. Authorities and personnel, III. Other provisions, etc.). If an alert looks for grants, we only care about sections III and V.
- Semantic filter: we use embeddings to calculate similarity between the alert description and each entry. Only entries above a threshold pass to the LLM. I calibrated this threshold empirically: 0.35 keeps recall high (few missed entries) while still discarding the clearly irrelevant entries.
The result: from 300 daily entries, a typical alert passes 10-30 to the LLM. Cost drops by an order of magnitude.
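The three layers can be sketched as a single pass over the day's entries. The field names (`source`, `section`, `embedding`) and the dict-based schema are illustrative, not the production data model:

```python
import math

GRANT_SECTIONS = {"III", "V"}  # BOE sections relevant to grants
THRESHOLD = 0.35               # empirically calibrated for high recall

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_entries(entries, alert):
    """Three-layer filter: source, section, then embedding similarity."""
    survivors = []
    for entry in entries:
        if entry["source"] not in alert["sources"]:        # layer 1: source
            continue
        if entry["section"] not in GRANT_SECTIONS:         # layer 2: section
            continue
        if cosine(entry["embedding"], alert["embedding"]) < THRESHOLD:
            continue                                       # layer 3: semantic
        survivors.append(entry)
    return survivors
```

Only the survivors of layer 3 ever cost LLM tokens, which is where the order-of-magnitude saving comes from.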
Phase 2: LLM summarization
Entries that pass the filter are sent to the LLM to generate a summary. This is where prompt engineering makes the difference.
Prompt engineering for legal text
Spanish administrative text has specific characteristics that make generic prompts work poorly:
- 200-word sentences: Spanish legal language chains subordinate clauses to infinity. The LLM needs explicit instructions to simplify the syntactic structure.
- Key information buried deep: the budget, application deadline, and eligible beneficiaries are usually in the third paragraph, not the first. The prompt must specify what to extract.
- Absurdly long agency names: "Dirección General de Industria, Energía y Minas de la Consejería de Economía, Hacienda y Empleo" is a real agency name. The summary needs to abbreviate it without losing the reference.
- Dates and deadlines: they appear as "twenty business days from the day following publication of the extract in the BOE." The summary should convert that to an actual date.
The prompt I use follows this structure:
```
You are an analyst of Spanish official bulletins.
Generate a concise summary of the following entry.
RULES:
- Maximum 3 sentences
- ALWAYS include: who issues the call, what is offered, and who it is aimed at
- If there is a budget, include it with figures
- If there is a deadline, compute the deadline date from the publication date
- Simplify agency names (e.g., "Consejería de Economía" instead of the full name)
- Do NOT use legal language: write for a business owner, not a lawyer
- Output language: Spanish
PUBLICATION DATE: {date}
SOURCE: {source}
ENTRY:
{content}
```
I've iterated on this prompt dozens of times. The most impactful tweaks were: explicitly asking it to calculate the deadline date (before it gave the deadline in business days, which is useless to the user) and asking it to abbreviate agency names.
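As a safety net, the deadline calculation can also be done outside the LLM. This is a minimal sketch that only skips weekends; a production version would also need a calendar of Spanish public holidays:

```python
from datetime import date, timedelta

def add_business_days(start: date, days: int) -> date:
    """Add `days` business days to `start`, skipping weekends.

    Sketch only: ignores public holidays, which also pause
    the clock under Spanish administrative law.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday-Friday
            remaining -= 1
    return current

# "Twenty business days from the day following publication in the BOE":
# counting starts the day after publication, so we add from that day on.
published = date(2024, 4, 1)  # a Monday
deadline = add_business_days(published, 20)  # → 2024-04-29
```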
Real costs
With the filtering system, actual costs are much more reasonable than they seem:
| Item | Daily volume | Estimated cost |
|---|---|---|
| Embeddings (filtering) | ~300 entries | $0.01 |
| LLM summaries (per alert) | ~15 entries | $0.02 |
| Total per alert/day | - | ~$0.03 |
With 100 active alerts, the AI cost is about $3/day, or $90/month: a perfectly viable cost for a subscription SaaS.
Email delivery
Once summaries are generated, they're grouped by alert and sent via email. The email is HTML with a minimalist design: each summary is a block with a title, a 2-3 line summary, and a link to the original source.
The decision to deliver by email (and not push notifications or an in-app feed) was deliberate. The target audience is freelancers and small business owners who already use email as their primary work tool. They don't need to install another app or remember to check another dashboard. The email arrives, gets read, gets archived or acted on. Zero friction.
Delivery is handled with Amazon SES. Each email is personalized by alert: if a user has a "digitalization grants" alert and a "consulting tenders" alert, they receive two separate emails. I tried grouping everything into one and users preferred the per-topic separation.
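A per-alert send looks roughly like this. The HTML builder is pure and testable; the SES call uses boto3's classic `send_email` API. Field names, sender address, and region are illustrative:

```python
def build_alert_email(alert_name, summaries):
    """Render one per-alert digest as minimal HTML.

    `summaries` are dicts with 'title', 'summary', and 'url'
    (illustrative field names, not the production schema).
    """
    blocks = "".join(
        f'<div><h3>{s["title"]}</h3><p>{s["summary"]}</p>'
        f'<a href="{s["url"]}">Ver original</a></div>'
        for s in summaries
    )
    return f"<html><body><h2>{alert_name}</h2>{blocks}</body></html>"

def send_alert_email(recipient, alert_name, summaries,
                     sender="alerts@example.com"):
    """Send one email per alert: users preferred per-topic separation."""
    import boto3  # imported lazily so the builder stays dependency-free
    ses = boto3.client("ses", region_name="eu-west-1")
    ses.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": f"Boletín Claro: {alert_name}"},
            "Body": {"Html": {"Data": build_alert_email(alert_name, summaries)}},
        },
    )
```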
Quality and hallucinations
The big question with any generative AI system: does it make things up? In the context of official bulletins, a hallucination isn't just annoying, it's potentially harmful. If the summary says the deadline is April 15 but it's actually April 5, the user could miss a grant.
My mitigation strategy:
- Always cite the source: every summary links to the original text in the bulletin. The summary doesn't try to replace the original, just flag what's relevant.
- Extractive over generative: the prompt asks to extract existing information, not generate interpretations. "500,000-euro grant for SME digitalization in Andalusia, deadline April 30" is safer than "This exciting opportunity will let Andalusian SMEs take a digital leap."
- Low temperature: I use temperature=0.1. For factual summaries, I want determinism, not creativity.
- Date validation: a post-processor checks that dates mentioned in the summary are consistent with the publication date. If the LLM says the deadline is before the publication date, the summary is flagged for review.
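The date check is a few lines of post-processing. This sketch assumes summaries render dates as dd/mm/yyyy; the production extractor would need to handle more formats:

```python
import re
from datetime import date, datetime

DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")  # dd/mm/yyyy

def needs_review(summary: str, published: date) -> bool:
    """Flag a summary whose deadline precedes the publication date."""
    for d, m, y in DATE_RE.findall(summary):
        try:
            found = datetime(int(y), int(m), int(d)).date()
        except ValueError:
            return True  # impossible date like 31/02 → review
        if found < published:
            return True  # deadline before publication → review
    return False
```

Flagged summaries go to manual review instead of reaching the user's inbox.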
What I've learned
After months of processing bulletins with AI, my main takeaways are:
First, the LLM is the easy part. 80% of the work is in the upstream pipeline: extracting clean text, filtering for relevance, structuring context. If you feed garbage to the LLM, you get summarized garbage back.
Second, AI costs are manageable if you filter well. Most articles about "prohibitive AI costs" assume you're processing all data with the most expensive model. In practice, with a good filtering pipeline, costs are a fraction of hosting.
Third, users value coverage over perfection. They'd rather receive a "good enough" summary of everything relevant than a perfect summary of only 50% of publications.
If you want to see how the product works, check the AI alerts page for real examples of generated summaries. And if you're building something similar with LLMs over public data, I'd love to compare notes -- you can find me on LinkedIn.