Architecture

How I built Boletin Claro: architecture of a SaaS on top of government bulletins

Enrique Lopez · March 24, 2026

Every day, hundreds of entries are published across Spain's official bulletins: the BOE (the national gazette), the BDNS (the grants database), and the regional bulletins from each autonomous community. Grants, public tenders, regulations. Most small businesses find out too late, or never at all. I built Boletin Claro to fix exactly that: a system that reads the bulletins for you, extracts what's relevant, and sends you a summary every morning.

In this post I'll break down the technical architecture of the project, the stack choices, and the most interesting problems I've had to solve.

The problem

Official bulletins are the primary source of information about public funding, government contracts, and regulation in Spain. But they're designed to meet legal requirements, not to be useful. BOE PDFs have no semantic structure. The BDNS exposes a REST API but with erratic pagination. Regional bulletins range from reasonably clean XML to early-2000s HTML.
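Erratic pagination is survivable if the client assumes the worst. Here's a sketch of the defensive loop I have in mind (the page callback and the `id` field are illustrative, not the real BDNS client): it tolerates empty pages in the middle of a result set and deduplicates entries that repeat across pages.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int], list[dict]], max_empty: int = 2) -> Iterator[dict]:
    """Iterate over an erratically paginated API.

    Tolerates up to max_empty consecutive empty pages before assuming
    the result set is exhausted, and skips entries already seen on
    earlier pages.
    """
    seen: set[str] = set()
    page, empty_streak = 0, 0
    while empty_streak < max_empty:
        items = fetch_page(page)
        if not items:
            empty_streak += 1
        else:
            empty_streak = 0
            for item in items:
                key = str(item.get("id"))
                if key not in seen:
                    seen.add(key)
                    yield item
        page += 1
```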

The target user is a small business owner or freelancer who needs to know if there's a relevant grant for their business, but can't afford to spend an hour a day scanning 20 different sources. The product does three things: collect, interpret, and deliver.

Overall architecture

The system is composed of five independent services, each deployed on Cloud Run.

The frontend is React 19 with TypeScript, Vite, and TailwindCSS v4. All infrastructure is defined with Terraform.

Why this stack

Go for the API and auxiliary services

Go is a natural choice for services that need fast startup and a low memory footprint. On Cloud Run you pay for execution time, so a 200ms cold start versus 2 seconds makes a real difference. The backend handles auth, CRUD, and business logic: exactly the kind of code where Go's simplicity shines.

Python for data processing

The reader and interpreter need to parse HTML, XML, PDFs, call AI APIs, and manipulate text. Python is unbeatable for that. beautifulsoup4 for HTML, lxml for XML, pdfplumber for PDF text extraction. FastAPI as the HTTP framework because Pydantic typing reduces errors in service-to-service contracts.

Firebase and Firestore

I don't need complex joins or distributed transactions. What I need is authentication solved out of the box (magic links + Google OAuth), a database that scales without management, and a generous free tier to get started. Firestore checks all those boxes. The data model is hierarchical: workspaces/{id}/alerts/{id}, which maps directly onto Firestore subcollections.
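As a sketch, the path convention and an illustrative alert document (field names here are assumptions for the example, not the production schema):

```python
def alert_doc_path(workspace_id: str, alert_id: str) -> str:
    """Firestore subcollection path for an alert inside a workspace."""
    return f"workspaces/{workspace_id}/alerts/{alert_id}"


# Illustrative alert document shape:
example_alert = {
    "query": "grants for SME digitalization in Madrid",
    "keywords": ["subvención", "digitalización", "pyme"],
    "regions": ["Madrid"],
    "channel": "email",
}
```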

The daily pipeline

Every morning, a Cloud Scheduler triggers the reader. The flow is:

  1. The reader iterates over configured sources and downloads the day's bulletins.
  2. Each bulletin is parsed into a uniform structure: title, text, metadata, source.
  3. Entries are stored in Firestore and embeddings are generated for semantic search.
  4. The interpreter receives the new entries, filters them against each user's alerts, and generates summaries with the LLM.
  5. Summaries are sent via email to subscribed users.
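The steps above can be sketched as a single driver function. The callables stand in for the real services, and the names are illustrative; summarization with the LLM is folded into the matching step for brevity:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Entry:
    source: str
    title: str
    text: str


def run_daily_pipeline(
    sources: Iterable[str],
    download: Callable[[str], list[Entry]],                      # steps 1-2: fetch + parse
    store: Callable[[list[Entry]], list[Entry]],                 # step 3: persist + embed
    match: Callable[[list[Entry]], dict[str, list[Entry]]],      # step 4: user -> relevant entries
    send: Callable[[str, list[Entry]], None],                    # step 5: email delivery
) -> int:
    entries: list[Entry] = []
    for source in sources:
        entries.extend(download(source))
    stored = store(entries)
    for user, relevant in match(stored).items():
        if relevant:  # users with nothing relevant today get no email
            send(user, relevant)
    return len(stored)
```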

Everything is idempotent. If the reader runs twice for the same date, it doesn't duplicate entries. If the interpreter processes an alert it already handled, it detects the duplicate and skips the email.
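The simplest way to get that property is deterministic document IDs: write each entry under an ID derived from its identity, so a re-run overwrites the same documents instead of creating new ones. A sketch (the exact ID scheme shown is illustrative):

```python
import hashlib


def entry_doc_id(source: str, date: str, entry_ref: str) -> str:
    """Deterministic document ID for a bulletin entry.

    Re-running the reader for the same date produces the same IDs,
    so writes overwrite rather than duplicate.
    """
    raw = f"{source}|{date}|{entry_ref}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:24]
```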

Technical challenges

Parsing government PDFs

BOE PDFs are the most interesting challenge. They're not consistently text-selectable. Some have text layers, others are scanned images. Tables break across pages. I've tried pdfplumber, pymupdf, and pdfminer in various combinations. The final solution uses pdfplumber for base extraction and custom heuristics to reconstruct section structure.
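A minimal version of the heading heuristic, assuming BOE-style section titles are short all-caps lines (a deliberate simplification of the real rules, which also look at numbering and position):

```python
def looks_like_heading(line: str) -> bool:
    """Heuristic: short, fully uppercase lines are section headings."""
    line = line.strip()
    return 3 < len(line) < 80 and line == line.upper() and any(c.isalpha() for c in line)


def split_sections(lines: list[str]) -> list[tuple[str, list[str]]]:
    """Group extracted text lines under the most recent heading."""
    sections: list[tuple[str, list[str]]] = []
    title, body = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if looks_like_heading(line):
            if title is not None or body:
                sections.append((title or "", body))
            title, body = line, []
        else:
            body.append(line)
    if title is not None or body:
        sections.append((title or "", body))
    return sections
```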

AI costs

Sending 300 bulletin entries through an LLM every day isn't cheap. The key is to filter before summarizing. The interpreter first applies relevance filters using keywords and embeddings, and only sends entries to the LLM that pass a threshold. This reduces token volume by 85-95% compared to summarizing everything.
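A sketch of that pre-filter, with a pure-Python cosine similarity and an illustrative 0.75 threshold (the real threshold and embedding dimensions are tuned per alert type):

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def prefilter(entries: list[dict], alert_embedding: list[float],
              keywords: list[str], threshold: float = 0.75) -> list[dict]:
    """Cheap relevance gate: keyword hit OR embedding similarity above
    threshold. Only survivors are sent to the LLM for summarization."""
    relevant = []
    for entry in entries:
        text = entry["text"].lower()
        if any(kw in text for kw in keywords) \
                or cosine(entry["embedding"], alert_embedding) >= threshold:
            relevant.append(entry)
    return relevant
```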

Cross-service consistency

With five independent services, communication is key. I use synchronous HTTP between services (no events), which simplifies debugging. Each service has its own health check and Cloud Run handles scaling. There's no central orchestrator: Cloud Scheduler triggers the reader, the reader calls the interpreter when it's done, and the interpreter handles email delivery autonomously.
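The trade-off with synchronous calls and no event bus is that a failed downstream call has to be retried inline. A minimal sketch of the retry wrapper (names and defaults are illustrative):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], retries: int = 3, backoff: float = 0.5) -> T:
    """Synchronous service-to-service call with exponential backoff.

    Re-raises the last exception once the retry budget is spent.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```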

Current state and what's next

Boletin Claro currently processes the BOE, BDNS, and several regional bulletins. The system is stable: it runs unattended every day with a failure rate below 1%. Alerts can be configured in natural language ("grants for SME digitalization in Madrid") and the system translates that into technical filters.

Next steps are improving coverage of regional bulletins and adding delivery channels (Telegram, WhatsApp). I'm also building free public search tools on top of the collected data, usable by anyone without an account.

If you're building something similar or have questions about the architecture, you can find me on LinkedIn.