How I built Boletin Claro: architecture of a SaaS on top of government bulletins
Every day, hundreds of entries are published across Spain's official bulletins: the BOE (the national gazette), the BDNS (the grants database), and the regional bulletins from each autonomous community. Grants, public tenders, regulations. Most small businesses find out too late, or never at all. I built Boletin Claro to fix exactly that: a system that reads the bulletins for you, extracts what's relevant, and sends you a summary every morning.
In this post I'll break down the technical architecture of the project, the stack choices, and the most interesting problems I've had to solve.
The problem
Official bulletins are the primary source of information about public funding, government contracts, and regulation in Spain. But they're designed to meet legal requirements, not to be useful. BOE PDFs have no semantic structure. The BDNS exposes a REST API but with erratic pagination. Regional bulletins range from reasonably clean XML to early-2000s HTML.
The target user is a small business owner or freelancer who needs to know if there's a relevant grant for their business, but can't afford to spend an hour a day scanning 20 different sources. The product does three things: collect, interpret, and deliver.
Overall architecture
The system is composed of five independent services, each deployed on Cloud Run:
- Reader (Python + FastAPI): connects to official sources, downloads the day's bulletins, and converts them to structured markdown. One service per source would be overkill, so internally there's an HTTP client per bulletin with a common interface.
- Interpreter (Python + FastAPI): receives entries from the reader, applies relevance filters, and generates summaries with LLMs. Also handles email delivery.
- Backend (Go + Gin): REST API for the frontend. Manages users, workspaces, alerts, and plans. Firebase Auth for authentication, Firestore as the database.
- Location (Go + Gin): geography sidecar. Resolves municipalities, provinces, and autonomous communities from INE codes (Spain's national statistics institute identifiers). Needed for filtering bulletins by territory.
- Tools (Go + Gin): public semantic search tools over bulletins. This is the service powering the bulletin search engines.
The frontend is React 19 with TypeScript, Vite, and TailwindCSS v4. All infrastructure is defined with Terraform.
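The "HTTP client per bulletin with a common interface" mentioned above could be sketched like this (names and fields are illustrative, not the actual code):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import date


@dataclass
class BulletinEntry:
    """Uniform structure every parsed entry is normalized into."""
    title: str
    text: str
    source: str
    metadata: dict = field(default_factory=dict)


class BulletinClient(ABC):
    """Common interface each per-bulletin HTTP client implements."""
    source_id: str

    @abstractmethod
    def fetch(self, day: date) -> list[BulletinEntry]:
        """Download and parse the bulletin for a given day."""


class BOEClient(BulletinClient):
    source_id = "boe"

    def fetch(self, day: date) -> list[BulletinEntry]:
        # A real client would hit the BOE endpoint and parse the response;
        # this stub only illustrates the contract.
        return [BulletinEntry(title="...", text="...", source=self.source_id)]
```

With this shape, the reader can iterate over a list of clients without caring which bulletin each one talks to.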
Why this stack
Go for the API and auxiliary services
Go is a natural choice for services that need fast startup and a low memory footprint. On Cloud Run you pay for execution time, so a 200ms cold start versus 2 seconds makes a real difference. The backend handles auth, CRUD, and business logic: exactly the kind of code where Go's simplicity shines.
Python for data processing
The reader and interpreter need to parse HTML, XML, PDFs, call AI APIs, and manipulate text. Python is unbeatable for that. beautifulsoup4 for HTML, lxml for XML, pdfplumber for PDF text extraction. FastAPI as the HTTP framework because Pydantic typing reduces errors in service-to-service contracts.
Firebase and Firestore
I don't need complex joins or distributed transactions. What I need is authentication solved out of the box (magic links + Google OAuth), a database that scales without management, and a generous free tier to get started. Firestore checks all those boxes. The data model is hierarchical: workspaces/{id}/alerts/{id}, which is exactly how Firestore works with subcollections.
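The hierarchy maps directly onto Firestore document paths. A minimal sketch, with illustrative document shapes that are assumptions rather than the real schema:

```python
# Illustrative document shapes (not the actual schema):
workspace = {"name": "Acme SL", "plan": "pro", "owner_uid": "firebase-uid"}
alert = {
    "query": "grants for SME digitalization in Madrid",
    "channels": ["email"],
}


def alert_path(workspace_id: str, alert_id: str) -> str:
    """Build the document path for an alert nested under its workspace,
    mirroring the workspaces/{id}/alerts/{id} hierarchy."""
    return f"workspaces/{workspace_id}/alerts/{alert_id}"
```

Because alerts live in a subcollection, listing a workspace's alerts is a single scoped query, and deleting a workspace naturally bounds the cleanup work.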
The daily pipeline
Every morning, a Cloud Scheduler job triggers the reader. The flow is:
- The reader iterates over configured sources and downloads the day's bulletins.
- Each bulletin is parsed into a uniform structure: title, text, metadata, source.
- Entries are stored in Firestore and embeddings are generated for semantic search.
- The interpreter receives the new entries, filters them against each user's alerts, and generates summaries with the LLM.
- Summaries are sent via email to subscribed users.
Everything is idempotent. If the reader runs twice for the same date, it doesn't duplicate entries. If the interpreter processes an alert it already handled, it detects the duplicate and skips the email.
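One common way to get that idempotency, sketched here as an assumption about the mechanism rather than the exact code: derive each Firestore document ID deterministically from the entry itself, so a second run for the same date overwrites the same documents instead of creating duplicates.

```python
import hashlib


def entry_doc_id(source: str, day: str, entry_key: str) -> str:
    """Deterministic document ID for a bulletin entry.

    Re-running the reader for the same (source, day, entry) triple yields
    the same ID, so a repeated write is an overwrite, not a duplicate.
    """
    raw = f"{source}|{day}|{entry_key}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:24]
```

The same trick works for the interpreter's "already sent" check: hash the (alert, entry, date) triple and record it before emailing.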
Technical challenges
Parsing government PDFs
BOE PDFs are the most interesting challenge. They're not consistently text-selectable. Some have text layers, others are scanned images. Tables break across pages. I've tried pdfplumber, pymupdf, and pdfminer in various combinations. The final solution uses pdfplumber for base extraction and custom heuristics to reconstruct section structure.
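To give a flavor of those heuristics, here is a simplified sketch that groups already-extracted text lines into sections by detecting header-like lines (the detection rules are illustrative assumptions, not the production heuristics):

```python
import re

# Assumed heuristic: BOE section headers tend to be short all-uppercase
# lines, or lines like "Artículo 1."
HEADER_RE = re.compile(
    r"^(?:[A-ZÁÉÍÓÚÑ][A-ZÁÉÍÓÚÑ\s\.\-]{3,60}|Artículo\s+\d+\.)$"
)


def split_sections(lines: list[str]) -> list[dict]:
    """Group extracted text lines into sections keyed by detected headers."""
    sections, current = [], {"header": None, "body": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if HEADER_RE.match(line):
            # Close the previous section before starting a new one.
            if current["body"] or current["header"]:
                sections.append(current)
            current = {"header": line, "body": []}
        else:
            current["body"].append(line)
    sections.append(current)
    return sections
```

In practice the real heuristics also look at font size and position from pdfplumber's character metadata, which is what makes cross-page tables tractable.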
AI costs
Sending 300 bulletin entries through an LLM every day isn't cheap. The key is to filter before summarizing. The interpreter first applies relevance filters using keywords and embeddings, and only sends entries to the LLM that pass a threshold. This reduces token volume by 85-95% compared to summarizing everything.
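The pre-filter can be sketched as follows. The 0.75 threshold and the combination rule (keyword hit OR embedding similarity) are placeholder assumptions; the real service tunes these per alert:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def should_summarize(
    entry_text: str,
    entry_vec: list[float],
    alert_keywords: set[str],
    alert_vec: list[float],
    threshold: float = 0.75,  # assumed placeholder value
) -> bool:
    """Cheap filters first: only entries that pass go to the LLM."""
    words = set(entry_text.lower().split())
    if words & {k.lower() for k in alert_keywords}:
        return True
    return cosine(entry_vec, alert_vec) >= threshold
```

Running this gate before any LLM call is what produces the 85-95% token reduction: most entries never reach the expensive step at all.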
Cross-service consistency
With five independent services, communication is key. I use synchronous HTTP between services (no events), which simplifies debugging. Each service has its own health check and Cloud Run handles scaling. There's no central orchestrator: Cloud Scheduler triggers the reader, the reader calls the interpreter when it's done, and the interpreter handles email delivery autonomously.
Current state and what's next
Boletin Claro currently processes the BOE, BDNS, and several regional bulletins. The system is stable: it runs unattended every day with a failure rate below 1%. Alerts can be configured in natural language ("grants for SME digitalization in Madrid") and the system translates that into technical filters.
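The target of that translation might look something like the structure below. This is purely illustrative: the field names, the trivial keyword split, and the "28" Madrid mapping are hypothetical stand-ins for what an LLM-backed compiler would produce.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AlertFilter:
    """Illustrative shape of the technical filter an alert compiles into."""
    keywords: list[str]
    territory: Optional[str] = None  # INE code, e.g. a province
    sources: list[str] = field(default_factory=list)


def compile_alert(text: str) -> AlertFilter:
    # A real system might call an LLM here; this sketch only shows the
    # target structure, with a naive keyword split.
    tokens = [t.strip(",.").lower() for t in text.split()]
    territory = "28" if "madrid" in tokens else None  # hypothetical mapping
    return AlertFilter(keywords=tokens, territory=territory)
```

The useful property is that once an alert is compiled into a structured filter, matching it against incoming entries is cheap and deterministic; the LLM only runs once, at alert creation time.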
Next steps are improving coverage of regional bulletins and adding delivery channels (Telegram, WhatsApp). I'm also building free public search tools on top of the collected data, usable by anyone without an account.
If you're building something similar or have questions about the architecture, you can find me on LinkedIn.