How the Encyclopedia Works: 1% AI, 99% Context

Why we spent months building a curated knowledge base before writing a single AI prompt — and why that's the right order.

Shahzad Muhammad
Tags: architecture, ai, rag

Most AI career tools work like this: user submits resume → AI reads it → AI makes up career advice based on its training data.

The advice sounds plausible. Sometimes it's even correct. But "plausible" and "correct" are different things when someone's career depends on the answer.

Alfalah works differently.

The problem with raw AI

Imagine asking a freshly graduated generalist consultant: "What salary should a staff nurse in the UAE expect in 2026, and which visa pathway applies to a Pakistani nurse moving there for the first time?"

The consultant can answer. They'll draw on everything they read during university — general knowledge, some economics, maybe a few news articles. The answer will sound confident. It might be close to right.

But it won't be sourced. It won't be current. And if it's wrong, you won't know until you're already in Dubai.

This is the problem with raw AI for career guidance. Large language models know a lot, but they don't know what they don't know, and they don't know when their knowledge is stale.

What we built instead

Before writing a single AI prompt that delivers advice to users, we built an encyclopedia.

The encyclopedia is a collection of curated Markdown files — one per country, one per occupation, one per visa class, one per industry — that contain:

  • Official sources: directly quoted or summarized from government immigration portals, labor statistics bureaus, and national job boards
  • Source citations: every factual claim links to a source document with a "last verified" date
  • Structured data: salary ranges by experience level, applicant tracking system (ATS) norms by country, document checklists by visa class, hiring culture notes by industry
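To make the "last verified" idea concrete, here is a minimal sketch of how a single sourced claim could be represented and checked for staleness. The names `SourcedClaim` and `is_stale`, the 180-day threshold, and the example URL are all assumptions for illustration, not the actual Alfalah schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SourcedClaim:
    """One factual claim plus where it came from (hypothetical schema)."""
    text: str
    source_url: str
    last_verified: date


def is_stale(claim: SourcedClaim, today: date, max_age_days: int = 180) -> bool:
    """Return True if the claim was last verified more than max_age_days ago.

    The 180-day default is an assumed threshold, not a documented policy.
    """
    return today - claim.last_verified > timedelta(days=max_age_days)


# Placeholder claim: the text and URL are illustrative, not real data.
example = SourcedClaim(
    text="Placeholder claim about a visa document checklist.",
    source_url="https://example.gov/visa-checklist",
    last_verified=date(2025, 1, 1),
)
```

A check like this would let a maintenance job flag claims for re-verification before they reach a user.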

These files are then broken into ~500-token chunks, embedded into a vector database, and indexed for retrieval.
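The chunking step could look roughly like the sketch below. It assumes whitespace-separated words as a stand-in for tokens; a real pipeline would count tokens with the embedding model's own tokenizer, so actual chunk boundaries would differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, max_tokens: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens words.

    Words approximate tokens here (an assumption); there is no overlap
    between chunks and no attempt to respect Markdown headings.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]
```

Splitting on headings or paragraphs first, then capping at the token limit, would likely keep related facts together better than this fixed-size split.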

When a user runs an Assessment, here's what actually happens:

  1. We extract the semantic intent from the resume and job description (country, industry, role, occupation code)
  2. We build a retrieval query from that intent
  3. We fetch the top 5 most relevant chunks from the encyclopedia — chunks that are about this specific country, this specific occupation, this specific visa class
  4. Those chunks become <context> in the AI prompt
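The four steps above can be sketched as follows. This is an illustration under stated assumptions, not our production code: `embed` uses simple word counts in place of a real embedding model, the in-memory list stands in for the vector database, and every function name and the intent dictionary keys are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (not a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_query(intent: dict) -> str:
    """Step 2: join the extracted intent fields into one retrieval query."""
    keys = ("country", "industry", "role", "occupation_code")
    return " ".join(intent[k] for k in keys if intent.get(k))


def top_k(query: str, chunks: list[str], k: int = 5) -> list[str]:
    """Step 3: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Step 4: wrap the retrieved chunks in <context> ahead of the question."""
    context = "\n\n".join(context_chunks)
    return f"<context>\n{context}\n</context>\n\n{question}"
```

The point of the structure is that the model only sees the question after the retrieved chunks, so its answer can be grounded in sourced text rather than in whatever it remembers from training.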

The AI then operates like a consultant who was just handed a briefing document. Not guessing. Reading.

Why this order matters

We could have launched months ago with a raw AI approach. Many of our competitors do exactly that. You can build a resume scorer in a weekend with the right API calls.

But we've seen the outputs those systems produce. They're often US-centric (because that's where the training data is densest). They misidentify visa categories. They quote salary figures from two years ago. They give advice about "ATS optimization" that doesn't apply in countries where ATS software is rare.

Our encyclopedia-first approach means:

  • A nurse in Pakistan gets accurate UAE nursing visa requirements, not a US-market guess
  • A software engineer in Nigeria gets Nigeria-specific salary benchmarks, not Silicon Valley numbers
  • A teacher in the UK gets advice about the Qualified Teacher Status pathway, not a generic "get certified" answer

The tradeoff is time. Building and curating an encyclopedia is slow. As of today, we have deep coverage for 8 countries and 425 occupations. We need 187 more countries and full visa data.

But every weekend, the corpus grows.

"1% AI, 99% context"

That's how we think about what we're building.

The AI model is 1% of the value. Any modern LLM can reason about a resume, spot keywords, suggest cover letter improvements. That part is commoditized.

The 99% is the context: knowing that a Canadian visa application for a healthcare worker requires a specific credential evaluation body, that Saudi ATS platforms in 2026 use specific keyword conventions, that a Nigerian software engineer applying for a German Blue Card needs a degree from an institution that the anabin database lists as recognized.

That knowledge doesn't exist in any model's training data at the precision and recency we need. We have to source it, verify it, structure it, and maintain it.

That's what we're building. That's why we're not launched yet.

That's why it'll be worth waiting for.


Shahzad Muhammad is the founder of Alfalah. He also built SkilledScore and VisaBridge before realizing they needed to be one product. He builds on weekends.