A voice-driven AI that mines the entire web in real time to answer any question — built as a Master's thesis at the University of Essex.
The year was 2018. ChatGPT didn't exist. Neither did GPT-3. The goal was audacious: build an AI that could answer any question — not from a pre-built database, but by mining the entire internet in real time.
Voice in, knowledge out. A Seq2Seq LSTM chatbot handled general conversation while a parallel information retrieval engine scraped and queried Wikipedia, Wolfram Alpha, Quora, Google, Bing, Yahoo, and WikiHow simultaneously — extracting, summarizing, and speaking the answers back.
"The pre-eminent objective is to fabricate a framework which can mimic artificial intelligence and give results to any kind of question asked by the user through voice." — from the thesis abstract
Every question flowed through a purpose-built pipeline — from voice input to spoken answer.
Browser-based voice interface. Speech recognition converts spoken questions to text. Speech synthesis reads answers aloud. Full hands-free loop.
Encoder-decoder architecture trained on the Cornell Movie Dialogues dataset. 10,000 conversational samples. Handles general conversation and small talk.
When the chatbot can't answer, the retrieval engine fires in parallel — scraping and querying multiple knowledge sources simultaneously.
Raw HTML scraped from the web is parsed, core information extracted, and summarized into concise, meaningful answers for the user.
The brain that routes queries, manages the chatbot, dispatches retrieval tasks, and serves responses back to the frontend.
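The routing logic described above can be sketched as a small dispatcher: try the conversational model first, and on a miss fan out to every knowledge source in parallel. The function names, stub sources, and small-talk table below are illustrative, not from the thesis:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub retrievers standing in for the real scrapers/APIs; each returns
# (source_name, answer_or_None). The real system queried Wikipedia,
# Wolfram Alpha, Quora, Google, Bing, Yahoo, and WikiHow.
def ask_wikipedia(q):      return ("wikipedia", f"Summary for '{q}'")
def ask_wolfram(q):        return ("wolfram", None)  # no computational answer
def ask_search_engines(q): return ("search", f"Top snippet for '{q}'")

SOURCES = [ask_wikipedia, ask_wolfram, ask_search_engines]

def chatbot_answer(q):
    """Stand-in for the Seq2Seq model: handles small talk only."""
    small_talk = {"hello": "Hi there!", "how are you": "Doing great!"}
    return small_talk.get(q.lower().rstrip("?"))

def route(question):
    # 1. Try the conversational model first.
    reply = chatbot_answer(question)
    if reply is not None:
        return {"from": "chatbot", "answer": reply}
    # 2. Otherwise fan out to every knowledge source in parallel
    #    and keep only the sources that produced an answer.
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        results = list(pool.map(lambda fetch: fetch(question), SOURCES))
    answers = {name: ans for name, ans in results if ans is not None}
    return {"from": "retrieval", "answer": answers}
```

With these stubs, `route("hello")` returns a chatbot reply, while `route("Who is Alan Turing?")` falls through to the parallel retrieval path and returns one answer per responsive source.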
The conversational engine was built on an encoder-decoder LSTM architecture. The encoder reads the input question word by word, compressing it into a fixed-dimensional vector. The decoder then generates a response, one token at a time, from that compressed representation.
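To make the "compressing it into a fixed-dimensional vector" step concrete, here is a one-unit LSTM forward pass in pure Python — a toy with scalar weights, where the thesis model used full weight matrices and many hidden units. The gating equations are the standard LSTM update; the weight values are arbitrary:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One LSTM time step with scalar states and toy scalar weights
    for the input (i), forget (f), candidate (g), and output (o) gates."""
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])
    c = f * c + i * g          # new cell state
    h = o * math.tanh(c)       # new hidden state
    return h, c

def encode(token_values, w):
    """Read the input one token at a time; return the final
    fixed-size (here: one-number) hidden state."""
    h, c = 0.0, 0.0
    for x in token_values:
        h, c = lstm_step(x, h, c, w)
    return h

# Toy weights; a trained model learns these from data.
w = {k: 0.5 for k in ("wi", "ui", "bi", "wf", "uf", "bf",
                      "wg", "ug", "bg", "wo", "uo", "bo")}
vector = encode([0.1, 0.9, 0.4], w)  # e.g. embedded tokens of a question
```

The decoder runs the same cell in reverse roles: seeded with the encoder's final state, it emits one token per step until an end-of-sequence marker.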
Training began with a 100-sample experiment to validate the approach — the loss fell from 2.30 to 0.037 (reported in the thesis as 230% and 3.7%) over 100 epochs. The model was then scaled to 10,000 samples from the Cornell Movie Dialogues dataset and trained on a Paperspace cloud GPU.
"The number of samples is directly proportional to the number of epochs the model needs to train. The model can clearly answer any question from its training bucket — the goal is a proper dataset with good samples and a high-end GPU to train on."
When the chatbot couldn't answer a factual question, the system didn't give up. It fired off requests to multiple knowledge sources in parallel, extracted the relevant information, summarized it, and spoke the answer back — all in real time.
Encyclopedic knowledge via the Wikipedia API. Full article summaries, key facts, and structured data extracted on demand.
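The thesis doesn't publish its retrieval code; one way to fetch an article summary on demand, using only the standard library and Wikipedia's public REST page-summary endpoint, looks like this:

```python
import json
import urllib.parse
import urllib.request

def summary_request(title):
    """Build a GET request for Wikipedia's REST page-summary endpoint."""
    slug = urllib.parse.quote(title.replace(" ", "_"))
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{slug}"
    # Wikipedia asks API clients to identify themselves via User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": "pas-demo/0.1"})

def fetch_summary(title):
    """Fetch the plain-text extract for a title (performs a network call)."""
    with urllib.request.urlopen(summary_request(title), timeout=10) as resp:
        return json.load(resp).get("extract", "")
```

For example, `fetch_summary("Alan Turing")` returns the lead-section extract of that article as plain text.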
Computational and mathematical queries. Unit conversions, scientific calculations, and data-driven answers from the computational knowledge engine.
Opinion-based and subjective answers. Scraped using Beautiful Soup to extract top-voted community responses to questions.
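The thesis used Beautiful Soup for this; the same extraction idea with the standard library's `html.parser` is sketched below. The `class="answer"` markup is illustrative — real pages need the actual selectors observed in their HTML:

```python
from html.parser import HTMLParser

class AnswerExtractor(HTMLParser):
    """Collect the text content of every element carrying class="answer"."""
    def __init__(self):
        super().__init__()
        self.depth = 0          # >0 while inside an answer element
        self.answers = []
    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1     # nested tag inside an answer
        elif ("class", "answer") in attrs:
            self.depth = 1
            self.answers.append("")
    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth:
            self.answers[-1] += data

# Miniature stand-in for a scraped Q&A page.
page = """
<div class="question">What is an LSTM?</div>
<div class="answer">A recurrent network with gated memory.</div>
<div class="answer">A fix for vanishing gradients in RNNs.</div>
"""
parser = AnswerExtractor()
parser.feed(page)
top_answers = [a.strip() for a in parser.answers]
```

Beautiful Soup condenses this to a one-liner (`soup.find_all(class_="answer")`), which is why the thesis reached for it.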
General web search across three major engines. SERP scraping extracts snippets, URLs, and page content for any open-domain query.
Step-by-step how-to guides. HTML structure parsed to extract procedural knowledge for "how do I..." style questions.
All raw scraped content is filtered, cleaned, and condensed into concise answers — only meaningful insight reaches the user.
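The summarization step isn't spelled out in the sources above; a common baseline for this kind of condensing is frequency-based extractive scoring — rank sentences by how often their content words appear in the whole text and keep the top few. A standard-library sketch (stopword list abbreviated):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "in",
             "and", "it", "that", "for", "on", "with", "as", "by"}

def summarize(text, max_sentences=2):
    """Score sentences by the corpus frequency of their non-stopword
    terms and return the top scorers in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if w not in STOPWORDS)

    def score(s):
        terms = re.findall(r"[a-z']+", s.lower())
        return sum(freq[t] for t in terms if t not in STOPWORDS)

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return " ".join(s for s in sentences if s in top)
```

Sentences packed with the text's most repeated terms survive; filler sentences are dropped, so only the densest material reaches the speech synthesizer.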
In 2018, Siri, Google Assistant, and Alexa relied on pre-built knowledge bases and predefined workflows. When asked an unusual or complex question, they'd either redirect to a web search or fail silently. This system went to the source in real time — scraping, extracting, summarizing, and actually answering the question.
| Capability | Apple Siri | Google Assistant | Amazon Alexa | This System |
|---|---|---|---|---|
| General Conversation | Limited | Limited | Limited | Seq2Seq LSTM |
| Open-Domain Q&A | Web redirect | Partial | Web redirect | Real-time retrieval |
| Computational Queries | Basic | Yes | Basic | Wolfram Alpha |
| Multi-Source Answers | No | No | No | 5+ sources in parallel |
| Answer Summarization | No | No | No | NLP summarization |
| No Pre-Built Knowledge | Relies on it | Relies on it | Relies on it | 100% real-time |
| Voice-First Interface | Yes | Yes | Yes | Yes |
The thesis didn't just build a web app. It envisioned the endgame: a fully portable Personal Assistant System worn on the body. Cloud computing handles the AI. The user carries only a lightweight wearable. This was written years before Meta's smart glasses, Humane's AI Pin, or Apple Vision Pro existed.
A smartband with a pico projector that beams a display onto your forearm. Visual output without a screen. Predicted wearable AR before it went mainstream.
Voice input and audio output through a lightweight earpiece. Hands-free, eyes-free interaction. The only hardware the user actually carries.
All computation happens in the cloud. The chatbot, retrieval engine, summarization — everything runs on remote servers. The wearable is just an interface.
Low-energy Bluetooth + Wi-Fi keeps the system linked to the cloud at all times. Endless knowledge, zero local storage required.
"The only portable system carried with the user will be the small lightweight earpiece providing complete hands-free communication to the system — connecting the user to endless knowledge and information." — from the thesis conclusion
38 pages. Encoder-decoder models, web scraping pipelines, training logs, system comparisons, and the wearable PAS vision. The complete Master's thesis, exactly as submitted to the University of Essex in August 2018.