Privacy-First AI for Minors: How PDP Shikshya Handles Student Data
Edge anonymization that strips student PII before it reaches the model, a curriculum-walled retrieval layer grounded in Nepal's grade 9-10 textbooks, and handshake photos that are never stored — PDP Shikshya's privacy and grounding model, in depth.
“Data privacy for minors” sits near the top of every school’s list of AI fears, and for good reason: a child who types their name and a question into a chatbot is sending identifiable data about a minor to a stranger’s server, where it may be kept indefinitely. Most EdTech answers this with a paragraph in a privacy policy. PDP Shikshya, built by pdpspectra, answers it with architecture: privacy that happens before the model, in code, on the school’s own box. This post is the deep dive on how. (For the full product, see the overview; for the teaching mechanics, see the pedagogy post.)
The problem with “we take privacy seriously”
The standard chatbot data flow for a student looks like this: the student writes a message that may contain their name, roll number, or other identifying detail; the whole message goes to a third-party model API; and what happens to it next — retention, training, logging — is governed by terms the school never reads and cannot enforce. For adults that is a known trade-off. For thirteen-year-olds it is the exact thing a school is right to refuse.
PDP Shikshya’s position is that the only privacy guarantee worth making is one you can point at in the system, not in the legal copy. So it intervenes at three places: before the model, around the model, and at the edges where images and logs would normally leak.
Local edge anonymization: strip PII before the model
The first and most important control is Local Edge Anonymization. Before any student text reaches the language model, it passes through an anonymization layer that runs locally — a regex-and-roster pass that detects and strips personally identifiable information such as names and roll numbers.
The effect is concrete and demonstrable. A student can type “I’m Aarya, roll 14, explain photosynthesis,” and the model on the other side only ever sees the anonymized version — the identity is removed on the school’s own box first. The tutor interface even surfaces that the PII was stripped locally, so the protection is visible rather than promised.
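To make the regex-and-roster idea concrete, here is a minimal Python sketch. It is not PDP Shikshya’s actual code. It assumes a per-class roster of student names held on the local box and a single roll-number pattern. `anonymize`, `ROSTER`, `ROLL_PATTERN`, `AnonymizedText`, and the `[STUDENT]`/`[ROLL]` placeholders are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical class roster. In a deployment this would be loaded from the
# school's own records on the local machine (assumption).
ROSTER = ["Aarya", "Bikash Thapa", "Sita"]

# Assumed roll-number formats: "roll 14", "roll no. 14", "roll number 14", "roll #14".
ROLL_PATTERN = re.compile(r"\broll\s*(?:no\.?|number|#)?\s*\d+\b", re.IGNORECASE)


@dataclass
class AnonymizedText:
    text: str
    stripped: bool  # True if anything was removed; a UI can surface this


def anonymize(message: str, roster: list[str]) -> AnonymizedText:
    """Replace roster names and roll numbers with placeholders, locally."""
    result = message
    # Longest names first, so "Bikash Thapa" is replaced as a whole.
    for name in sorted(roster, key=len, reverse=True):
        name_pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        result = name_pattern.sub("[STUDENT]", result)
    result = ROLL_PATTERN.sub("[ROLL]", result)
    return AnonymizedText(text=result, stripped=result != message)
```

Running `anonymize("I’m Aarya, roll 14, explain photosynthesis", ROSTER)` returns the text `I’m [STUDENT], [ROLL], explain photosynthesis` with `stripped=True`. That flag is one way an interface could show that PII was removed locally. The trade-off is worth stating plainly: a pass like this removes only the identifiers it has a pattern or a roster entry for.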
This is a meaningful architectural choice. Anonymizing before the API call means the identifying data never leaves the school’s deployment at all. There is no “the provider promises not to look” — there is nothing identifying to look at. For a school responsible for minors, “the name never left the building” is a far stronger statement than “the request was encrypted in transit.”
The curriculum wall: grounding that also contains
The second control solves a different fear — hallucination and misinformation — and it does so with a retrieval design that doubles as a containment boundary.
Instead of letting the model answer from its open-ended training, PDP Shikshya grounds the tutor in a curriculum-walled knowledge base: a local TF-IDF retrieval layer built over the school’s own corpus, including the real Nepal grade 9–10 textbooks (Compulsory Science, Mathematics, English, Nepali, Social Studies, Computer Science, and more). When a student asks a question:
- inside the syllabus, the tutor retrieves the relevant passage and answers with a 📖 citation back to the source material;
- outside the syllabus in exam mode, it deflects rather than improvising an answer.
This “curriculum wall” does two jobs at once. It keeps answers grounded in vetted, age-appropriate, syllabus-accurate material, which is the remedy for hallucination. And because the retrieval index is local TF-IDF rather than a hosted vector service, no third-party retrieval service ever sees the corpus or the lookups made against it. The same mechanism that makes answers trustworthy also makes them private.
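The retrieve, cite, or deflect behaviour can be sketched with a small, dependency-free TF-IDF index in Python. This is an illustrative sketch under stated assumptions, not the platform’s implementation. `CurriculumIndex`, `exam_mode_answer`, `DEFLECTION`, the stop-word list, and the `min_score` threshold of 0.1 are all hypothetical. The tokenizer handles only Latin-script text, so Nepali-language books would need a different one.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Illustrative stop-word list; a real deployment would need a fuller one.
STOPWORDS = {"a", "an", "and", "by", "how", "in", "is", "its", "of",
             "the", "to", "using", "what", "which", "who"}

DEFLECTION = "That is outside your syllabus, so I can't answer it in exam mode."


@dataclass(frozen=True)
class Passage:
    source: str  # citation label, e.g. book title and page
    text: str


def tokenize(text: str) -> list[str]:
    # Latin-script only: lowercase words and digits, minus stop words.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


class CurriculumIndex:
    """TF-IDF index held in local memory; it makes no network calls."""

    def __init__(self, passages: list[Passage]):
        if not passages:
            raise ValueError("corpus is empty")
        self.passages = passages
        counts = [Counter(tokenize(p.text)) for p in passages]
        n = len(counts)
        # Iterating a Counter yields each term once, so this is document frequency.
        doc_freq = Counter(term for c in counts for term in c)
        # Smoothed IDF (the same formula as scikit-learn's default).
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.vectors = [self._vectorize(c) for c in counts]

    def _vectorize(self, counts: Counter) -> dict[str, float]:
        # Words that never appear in the textbooks get no weight at all.
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def best_match(self, query: str) -> tuple[Passage, float]:
        q = self._vectorize(Counter(tokenize(query)))
        # Cosine similarity between unit-length vectors is their dot product.
        scores = [sum(w * doc.get(t, 0.0) for t, w in q.items()) for doc in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.passages[best], scores[best]


def exam_mode_answer(index: CurriculumIndex, question: str, min_score: float = 0.1) -> str:
    """Answer from the syllabus with a citation, or deflect."""
    passage, score = index.best_match(question)
    if score < min_score:
        return DEFLECTION
    # A real tutor would have the model phrase an answer from this passage;
    # the sketch just returns the grounded passage and its source.
    return f"{passage.text} 📖 {passage.source}"
```

With a sample corpus of one science passage about photosynthesis and one maths passage about the area of a triangle, “explain photosynthesis” returns the science passage with its 📖 source. An off-syllabus question such as “who won the football world cup” matches no textbook term, scores zero, and gets the deflection. In a design like this, the threshold is the knob that decides where the wall sits.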
There is an operational angle here too: a school can drop its actual grade 9 and 10 textbook PDFs into the platform, and the system loads them into both the browsable Library and the retrieval corpus, so exam-mode answers ground on the books the students actually use. Grounding is not a generic dataset; it is this school’s curriculum.
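The ingestion step can be pictured as splitting each textbook into page-tagged passages, so that every retrieved chunk carries its own citation. The sketch below assumes text extraction from the PDFs has already happened, with one string per page. `chunk_book`, `ingest`, the 120-word chunk size, and the `library`/`corpus` containers are hypothetical stand-ins, not PDP Shikshya’s actual pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source: str  # citation label, e.g. "Compulsory Science 10, p. 3"
    text: str


def chunk_book(title: str, pages: list[str], max_words: int = 120) -> list[Passage]:
    """Split one book's extracted page texts into page-tagged passages.

    Assumes PDF text extraction already happened: each string in `pages`
    is the plain text of one page, in reading order.
    """
    passages = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        # Long pages become several passages that share the same page citation.
        for start in range(0, len(words), max_words):
            chunk = " ".join(words[start:start + max_words])
            passages.append(Passage(source=f"{title}, p. {page_no}", text=chunk))
    return passages


def ingest(title: str, pages: list[str],
           library: dict[str, list[str]], corpus: list[Passage]) -> int:
    """Add one textbook to both the browsable library and the retrieval corpus."""
    library[title] = pages  # full pages, for students to read
    new_passages = chunk_book(title, pages)
    corpus.extend(new_passages)  # the retrieval index would be rebuilt from this
    return len(new_passages)
```

Keeping the page number on every passage is one simple way to let an exam-mode citation point back to a page a student can open in the Library.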
Zero-retention AI behind a swappable client
The third control governs the model itself. The reasoning layer is a large language model accessed through a single, swappable client module, and the selection criterion is explicitly privacy: enterprise zero-retention and no-training terms that match a school’s privacy ask. Because the vendor sits behind one module, the platform is not locked to a single provider — it can move to whichever offers the terms a school requires, without rewriting the product.
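One way to express a “single, swappable client module” in Python is a small interface plus a registry of vetted adapters. In this sketch `ModelClient`, `ProviderA`, `ProviderB`, `APPROVED_CLIENTS`, and `make_client` are all hypothetical. The adapters return canned strings where a real one would call its vendor’s SDK, and the registry stands in for whatever contract review establishes zero-retention and no-training terms.

```python
from typing import Callable, Protocol


class ModelClient(Protocol):
    """The only model interface the rest of the product is allowed to use."""

    def complete(self, prompt: str) -> str: ...


class ProviderA:
    """Hypothetical adapter. A real one would call its vendor's SDK here."""

    def complete(self, prompt: str) -> str:
        return f"[provider-a] {prompt}"


class ProviderB:
    """A second hypothetical adapter with the same interface."""

    def complete(self, prompt: str) -> str:
        return f"[provider-b] {prompt}"


# Only providers whose contracts have been checked for zero-retention and
# no-training terms are registered; anything else cannot be selected.
APPROVED_CLIENTS: dict[str, Callable[[], ModelClient]] = {
    "provider-a": ProviderA,
    "provider-b": ProviderB,
}


def make_client(provider_name: str) -> ModelClient:
    """Build the client named in configuration, e.g. a `model_provider` setting."""
    try:
        factory = APPROVED_CLIENTS[provider_name]
    except KeyError:
        raise ValueError(f"{provider_name!r} is not an approved provider") from None
    return factory()
```

Because the rest of the product depends only on `complete()`, switching vendors in this design means registering a new adapter and changing the configured name, for example `make_client("provider-b")`, rather than touching tutor or digest code.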
Combined with edge anonymization, this is defense in depth: the model runs on zero-retention terms, and it never receives identifying data in the first place. Either control alone would help; together, a student’s identity stays protected even if one of them falls short.
No stored photos, monitored chats, and audited access
Privacy is also about the quiet edges where data accumulates without anyone deciding it should. PDP Shikshya is deliberate at those edges:
- The Analog Handshake vision check, which photographs a student’s handwritten working, never stores the photos — the check runs and the image is gone.
- AI parent digests are generated without sending any child PII to the model, so the convenience of a weekly summary does not reopen the privacy hole that anonymization closes.
- Student-to-student chats are monitored by a department teacher in a read-only capacity — a safeguarding control for minors that is scoped and explicit rather than blanket surveillance.
- The platform keeps login-audit events and a 30-day-style operational discipline around access, so who-saw-what is recorded rather than assumed.
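The parent-digest requirement can be illustrated with a placeholder pattern: the model writes about `{STUDENT}`, and the real name is substituted locally after the response comes back. This is a hedged sketch of one way to meet that requirement, not a description of PDP Shikshya’s code. `WeeklyStats` and its fields, `build_digest_prompt`, `make_digest`, `PLACEHOLDER`, and the `complete` callable are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

PLACEHOLDER = "{STUDENT}"


@dataclass
class WeeklyStats:
    # Hypothetical, non-identifying learning signals only:
    # no name, roll number, class section, or school.
    subjects_practised: list[str]
    questions_answered: int


def build_digest_prompt(stats: WeeklyStats) -> str:
    """Build the model prompt; the child is referred to only by the placeholder."""
    return (
        f"Write a short, encouraging weekly note to a parent about {PLACEHOLDER}. "
        f"Subjects practised: {', '.join(stats.subjects_practised)}. "
        f"Questions answered: {stats.questions_answered}."
    )


def make_digest(stats: WeeklyStats, student_name: str,
                complete: Callable[[str], str]) -> str:
    prompt = build_digest_prompt(stats)
    # Guard: refuse to send if the name somehow ended up in the prompt.
    if student_name.lower() in prompt.lower():
        raise ValueError("student name found in model prompt")
    draft = complete(prompt)
    # The real name is only ever inserted here, on the local side.
    return draft.replace(PLACEHOLDER, student_name)
```

Any function that takes a prompt string and returns text can be passed as `complete`, which keeps this step independent of the model vendor.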
Each of these is a place where a less careful product would silently retain images, leak PII into a “helpful” summary, or leave student messaging unsupervised. Treating them as first-class requirements is what “privacy by design” actually means in practice.
Why this architecture matters beyond Nepal
The specific market here is Nepali schools, but the pattern generalizes to any setting where AI touches data about vulnerable people. The lesson is that the strongest privacy controls are the ones positioned before the model, not after it. Anonymize at the edge and the provider’s retention policy stops mattering for identity. Retrieve locally and the corpus never becomes someone else’s training data. Keep the vendor swappable and a change in one provider’s terms is a configuration change, not a crisis.
That is a transferable blueprint — for healthcare, for any minors’ service, for any regulated workload — and PDP Shikshya is a working instance of it built for the hardest audience to get wrong: children.
The takeaway
PDP Shikshya protects students’ data by doing the privacy work where it counts: before anything reaches the AI. Local edge anonymization strips a minor’s identity on the school’s own box; a curriculum-walled local retrieval layer grounds answers in the school’s real textbooks while keeping every retrieval lookup in-house; the model runs on zero-retention terms behind a swappable client; and the quiet edges (handshake photos, parent digests, student chats, access logs) are each handled deliberately. The result is a privacy promise a school can inspect rather than take on faith. See it live at pdpshikshya.com, and complete the picture with the product overview, the pedagogy deep-dive, and the platform and architecture post.