We're told all the time that Gen AI will speed up the work educators do, help to create great learning materials, and give time back to spend with our students. But that narrative can be misleading.
Using Gen AI to do this kind of work well takes three things at once: pedagogical, technical, and domain expertise. You need to be confident using the technology, have a deep understanding of best teaching and learning practices, and know your topic well enough to prompt the AI for something of quality and then refine its output.
Each month, the Generative AI Lab for Education (GAILE) shares what we've learned in an online session, and we ask educators to trial the strategies and tell us what worked, what didn't, and where they found value. This month's question is one we hear a lot: can we offload the task of generating quizzes, assessment questions and discussion prompts to AI?
Four recent studies tell a nuanced story about the role Gen AI can play in writing questions.
Lam et al. (2024) found that multiple-choice questions generated by Gen AI were comparable in quality to human-generated questions. The catch: this was only when the AI was given explicit prompt instructions, including a discipline-relevant framework and examples of questions. Generic prompting didn't get there.
Elkins et al. (2024) found that teachers preferred AI-generated questions built around Bloom's taxonomy over questions produced from simple, generic prompts. The pedagogy in the prompt mattered more than the model itself.
This one surprised us. Across teachers in the Elkins et al. (2024) study, the average time to write quizzes was roughly the same whether teachers used controlled AI generation, simple AI generation, or wrote the questions by hand.
If your reason for reaching for AI is purely speed, you may be disappointed. But if your reason is quality, consistency across Bloom's levels, or breaking through blank-page syndrome — the picture looks much better.
One exception: differentiation.
Differentiation is the practice of adjusting learning material so it suits learners working at different levels in the same classroom — extension for some, scaffolding for others, grade-level material for the middle.
Done well, it's powerful.
Done at all, it's time-consuming, which is why it often gets dropped from the planning week.
Akdeniz, Clark and Roberts (2025) found that AI-generated questions aligned well with lower-order cognitive tasks and remained consistent across subjects, with promise at higher-order levels too. The authors describe AI as a tool to support teachers in differentiating instruction, not replace their expertise.
In a higher education context, differentiation might mean contextualising the same underlying concept for students from different disciplines in a shared course: framing a statistics question through a public health lens for one cohort and an economics lens for another. Or it might mean producing two versions of a tutorial question set: one pitched at second-year students still consolidating fundamentals, and one stretched for third-years ready to apply the concept in less familiar territory.
Producing one version with AI takes roughly the same time as producing one by hand. Producing several tailored versions does not. This is the use case where the time argument finally lands in AI's favour.
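
For readers who script their prompts, here is a minimal Python sketch of the "several tailored versions" idea: one concept, several cohorts, one prompt per cohort. The `Cohort` class, the function name, the prompt wording and the "confidence intervals" topic are all illustrative assumptions, not something taken from the studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical description of one group of learners."""
    name: str
    lens: str   # disciplinary framing, e.g. "public health"
    level: str  # where the group is at, e.g. "still consolidating fundamentals"


def differentiated_prompts(concept: str, cohorts: list[Cohort]) -> dict[str, str]:
    """Return one tailored prompt per cohort for the same underlying concept."""
    prompts = {}
    for cohort in cohorts:
        prompts[cohort.name] = (
            f"Write a tutorial question on {concept}. "
            f"Frame it through a {cohort.lens} lens. "
            f"Pitch it at students who are {cohort.level}."
        )
    return prompts


# Assumed example cohorts, mirroring the scenarios described in the text.
cohorts = [
    Cohort("second-year", "public health", "still consolidating fundamentals"),
    Cohort("third-year", "economics", "ready to apply the concept in less familiar territory"),
]

for name, prompt in differentiated_prompts("confidence intervals", cohorts).items():
    print(f"{name}: {prompt}")
```

Each prompt would still be pasted into your AI tool of choice and the output reviewed by you; the sketch only removes the retyping.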
Kusam et al. (2025) compared student perceptions of AI-generated and manually-written quizzes. Their findings:
In other words: AI is good at clean, fair, well-formed questions. Humans are still better at the kind of cognitively demanding, context-aware questions that stretch a learner. That's a useful split to keep in mind when you decide what to delegate.
Across the studies, three problems with AI-generated questions kept coming up:
Elkins et al. also noted a useful detail: "Empirical results from preliminary experimentation showed that generating all of the questions together produced more diverse outputs, whereas generating them separately produced duplicate questions." Batch your generations.
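
If you do end up generating questions separately, a quick duplicate check can help. The sketch below is one possible approach using Python's standard `difflib` module; it is not the method used by Elkins et al., and the 0.85 threshold and sample questions are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalise(question: str) -> str:
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(question.lower().split())


def find_near_duplicates(questions: list[str], threshold: float = 0.85) -> list[tuple[int, int]]:
    """Return index pairs of questions whose similarity ratio meets the threshold."""
    pairs = []
    for i in range(len(questions)):
        for j in range(i + 1, len(questions)):
            ratio = SequenceMatcher(
                None, normalise(questions[i]), normalise(questions[j])
            ).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs


# Assumed sample output from separate generations.
questions = [
    "What is the capital of France?",
    "what is the  capital of France?",
    "Explain why Paris became the capital of France.",
]
print(find_near_duplicates(questions))  # → [(0, 1)]
```

Character-level similarity only catches near-identical wording. Two questions that test the same idea in different words still need a human eye.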
Pulling the threads together, here's what the studies converge on:
A quick note on few-shot prompting
Few-shot prompting is a strategy where you give the model a small number (typically one to five) of example input → output pairs inside the prompt, so the model can follow that pattern when generating new outputs. The examples teach the model the desired format, style or level without fine-tuning the model weights.
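
To make the pattern concrete, here is a small Python sketch that assembles a few-shot prompt from example pairs. The function name, the "Input:" / "Output:" labels and the sample content are assumptions for illustration; any consistent labelling would serve the same purpose.

```python
def build_few_shot_prompt(
    instruction: str,
    examples: list[tuple[str, str]],
    new_input: str,
) -> str:
    """Assemble an instruction, example input/output pairs and a new input."""
    if not examples:
        raise ValueError("a few-shot prompt needs at least one example pair")
    parts = [instruction, ""]
    for source, question in examples:
        parts += [f"Input: {source}", f"Output: {question}", ""]
    # Finish on an open "Output:" so the model continues the pattern.
    parts += [f"Input: {new_input}", "Output:"]
    return "\n".join(parts)


# Assumed example pairs: a content statement and a question written from it.
examples = [
    ("Photosynthesis converts light energy into chemical energy.",
     "Which form of energy do plants store as a result of photosynthesis?"),
    ("Mitochondria are the site of cellular respiration.",
     "In which organelle does cellular respiration take place?"),
]
prompt = build_few_shot_prompt(
    "Write one quiz question for each input.",
    examples,
    "Ribosomes assemble proteins from amino acids.",
)
print(prompt)
```

The resulting text is what you would paste into the chat window: the two worked pairs set the format, and the final open "Output:" invites the model to write the third question in the same style.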
A quick checklist for a good few-shot prompt:
Reading the research is one thing. Building the quiz is another. Based on what the literature recommends and what we've seen work in practice, here are three approaches you can try — ordered from lowest to highest effort, and from least to most customisable.
If you want to get a feel for AI-generated questions without writing a single prompt, start here. Tools like Mentimeter, NotebookLM (free to use with a Google account) and Thea have quiz generation built directly into the product. You can upload your source material, and the tool produces questions grounded in that content.
Best for: educators new to Gen AI, quick formative checks, and getting unstuck when you're staring at a blank page.
Watch for: limited control over Bloom's level and question style. You'll likely get clear, fair questions, but they may sit at lower cognitive levels by default. Also think twice before uploading information or sources you don't own.
This is where the research findings really pay off. Using a general-purpose AI tool like Claude, Copilot or RMIT’s Val, you write a structured prompt that locks in the pedagogy: a clear role, the Bloom's level you want, two or three exemplar questions, and the source material. You save the prompt, and reuse it every time you need a new quiz.
A starting prompt looks something like this:
You are an experienced [subject] educator writing quiz questions for [year level / cohort]. Generate six multiple-choice questions based on the passage below: one at each level of Bloom's taxonomy (Remember, Understand, Apply, Analyse, Evaluate, Create). Each question should have four options and a clearly marked correct answer. Match the tone and style of these examples: [paste 2–3 of your own questions]. Passage: [paste source material].
Best for: educators who want consistency across quizzes, control over cognitive level, and the ability to refine a prompt over time so it gets better at producing your style of question.
Watch for: the first version of your prompt won't be the best version. Plan to iterate. Run the AI's questions past your learning outcomes, and ask the AI to answer its own questions — if it stumbles, your question probably needs more context.
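
If you prefer to keep your saved prompt in a script rather than a document, the starting prompt above could be wrapped in a small Python function like the sketch below. The function and parameter names are hypothetical, and the sample subject, cohort and exemplar questions are placeholders to swap for your own.

```python
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyse", "Evaluate", "Create"]


def build_quiz_prompt(subject: str, cohort: str, examples: list[str], passage: str) -> str:
    """Fill the starting prompt with a subject, cohort, exemplars and passage."""
    if not examples:
        raise ValueError("include at least one exemplar question")
    example_block = "\n".join(f"- {question}" for question in examples)
    levels = ", ".join(BLOOM_LEVELS)
    return (
        f"You are an experienced {subject} educator writing quiz questions for {cohort}. "
        f"Generate {len(BLOOM_LEVELS)} multiple-choice questions based on the passage below: "
        f"one at each level of Bloom's taxonomy ({levels}). "
        "Each question should have four options and a clearly marked correct answer. "
        f"Match the tone and style of these examples:\n{example_block}\n"
        f"Passage: {passage}"
    )


# Placeholder values; replace them with your own course details.
quiz_prompt = build_quiz_prompt(
    subject="statistics",
    cohort="second-year undergraduates",
    examples=[
        "Which measure of centre is least affected by outliers?",
        "What does a p-value of 0.03 tell you about the null hypothesis?",
    ],
    passage="[paste source material]",
)
print(quiz_prompt)
```

Keeping the template in one place means each round of iteration changes a single function, and every later quiz inherits the improvement.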
If you're comfortable letting AI generate code as well as content, you can go further: a custom interactive quiz, a scenario-based escape room, a self-marking practice tool, a discussion-prompt generator your students use directly. You describe what you want in plain English, and the AI builds it. This is what people mean by "vibe coding" — you don't need to write the code yourself, but you do need to be able to test it, give feedback, and know what good looks like.
Best for: educators wanting to experiment with format, build assets they can reuse for years, or design learning experiences that simply don't exist as off-the-shelf products.
Watch for: the highest effort of the three. Worth it for an asset you'll use repeatedly; probably not worth it for next Tuesday's lesson check-in.
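
To give a sense of scale, the core of a self-marking practice tool can be quite small. The Python sketch below is a hypothetical illustration of the kind of code an AI might produce for you, not output from any particular tool; the class, function and sample questions are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One multiple-choice question with the position of its correct option."""
    text: str
    options: list[str]
    answer_index: int


def mark(questions: list[Question], responses: list[int]) -> tuple[int, list[bool]]:
    """Return the total score and a per-question correct/incorrect list."""
    if len(questions) != len(responses):
        raise ValueError("one response is needed per question")
    results = [q.answer_index == r for q, r in zip(questions, responses)]
    return sum(results), results


# Assumed sample quiz for illustration.
quiz = [
    Question(
        "Which of these is the lowest level of Bloom's taxonomy?",
        ["Remember", "Apply", "Evaluate", "Create"],
        0,
    ),
    Question(
        "What does a few-shot prompt supply to the model?",
        ["Fine-tuned weights", "Example input and output pairs", "A larger context window", "A new model"],
        1,
    ),
]

score, results = mark(quiz, [0, 2])
print(score, results)  # → 1 [True, False]
```

Being able to read a result like this and confirm it matches what you expected is the "know what good looks like" part of the job.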
Whichever approach you choose, these small habits will sharpen your output:
Yes. And it's an excellent tool for tackling blank-page syndrome and getting started. But it still requires your expertise, careful prompting tied to pedagogy, and willingness to iterate and refine.
It might not speed up the process, and most definitely not the first time you do it. But as you feed your chosen AI model or application examples, constraints and feedback, it gets better at creating the kinds of questions you want. The investment compounds.
The educators who get the most out of this aren't the ones with the best prompts. They're the ones who treat the AI as a junior colleague: give it good instructions, check its work, and push back when it gets things wrong.
Come to our online May session to try one of these strategies in a community environment. We'll walk through all three approaches above with live demos, and leave time for participants to try one for themselves on a topic they're actually teaching. Bring a unit outline, a passage of source material, and an open mind.
Date: Wednesday 27th May @ 12:00 PM – 1:00 PM
Register: Gen AI for Educators: Emerging Practices and Possibilities (Monthly Session)
Akdeniz, H., Clark, T., & Roberts, J. L. (2025). Can AI Generate Questions Aligned with Bloom's Taxonomy? A Framework for Gifted Education to Support Teachers. Journal of Advanced Academics, 36(4), 671–694. https://doi.org/10.1177/1932202X251349917
Elkins, S., Kochmar, E., Cheung, J. C. K., & Serban, I. (2024). How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24).
Kusam, V. A., Song, Z., Maxim, B. R., & Kattan, K. (2025). Evaluating the Effectiveness of Generative AI for Automated Quiz Creation: A Case Study. Paper presented at the American Society for Engineering Education Annual Conference, 2025.
Lam, Y. Y., Chu, S. K. W., Ong, E. L. C., Suen, W. W. L., Xu, L., Lam, L. C. L., & Wong, S. M. Y. (2024). Comparative Study of GenAI (ChatGPT) vs. Human in Generating Multiple Choice Questions Based on the PIRLS Reading Assessment Framework. In Proceedings of the 87th Annual Meeting of the Association for Information Science & Technology, Calgary, AB, Canada.
