Can Gen AI create learning questions better than a human?

What the research says, what we've found, and three ways you can try it for yourself.

The wondering

We're told all the time that Gen AI will speed up the work educators do, help to create great learning materials, and give time back to spend with our students. But that narrative can be misleading.

Using Gen AI to do this kind of work well takes three things at once: pedagogical, technical, and domain expertise. You need to be confident using the technology, have a deep understanding of best teaching and learning practices, and know your topic content to prompt the AI to create something of quality and then refine its output.

Each month, the Generative AI Lab for Education (GAILE) shares what we've learned in an online session, and we ask educators to trial the strategies and tell us what worked, what didn't, and where they found value. This month's question is one we hear a lot: can we offload the task of generating quizzes, assessment questions and discussion prompts to AI?

The discovery: what the research says

Four recent studies tell a nuanced story about the role Gen AI can play in writing learning questions.

AI questions can match human-written ones — under the right conditions

Lam et al. (2024) found that multiple-choice questions generated by Gen AI were comparable in quality to human-generated questions. The catch: this was only when the AI was given explicit prompt instructions, including a discipline-relevant framework and examples of questions. Generic prompting didn't get there.

Elkins et al. (2024) found that teachers preferred AI-generated questions built around Bloom's taxonomy over questions produced from simple, generic prompts. The pedagogy in the prompt mattered more than the model itself.

It probably won't save you time — at least not the first time

This one surprised us. Across teachers in the Elkins et al. (2024) study, the average time to write quizzes was roughly the same whether teachers used controlled AI generation, simple AI generation, or wrote the questions by hand.

If your reason for reaching for AI is purely speed, you may be disappointed. But if your reason is quality, consistency across Bloom's levels, or breaking through blank-page syndrome — the picture looks much better.

One exception: differentiation.

Differentiation is the practice of adjusting learning material so it suits learners working at different levels in the same classroom — extension for some, scaffolding for others, grade-level material for the middle.

Done well, it's powerful.

Done at all, it's time-consuming, which is why it often gets dropped from the planning week.

Akdeniz, Clark and Roberts (2025) found that AI-generated questions aligned well with lower-order cognitive tasks and remained consistent across subjects, with promise at higher-order levels too. The authors describe AI as a tool to support teachers in differentiating instruction, not to replace their expertise.

In a higher education context, differentiation might mean contextualising the same underlying concept for students from different disciplines in a shared course: framing a statistics question through a public health lens for one cohort and an economics lens for another. Or it might mean producing two versions of a tutorial question set: one pitched at second-year students still consolidating fundamentals, and one stretched for third-years ready to apply the concept in less familiar territory.

Producing one version with AI takes roughly the same time as producing one by hand. Producing several tailored versions does not. This is the use case where the time argument finally lands in AI's favour.
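To make the differentiation idea concrete, here is a minimal Python sketch of one way to turn a single source question into several cohort-specific prompts. It is only an illustration: the prompt wording, the cohort list and the function name `differentiated_prompts` are our own assumptions, not something taken from the studies, and the resulting prompts would still need to be pasted into (or sent to) whichever AI tool you use.

```python
# Hypothetical prompt wording; adjust it to suit your own course.
BASE_PROMPT = (
    "Rewrite the tutorial question below for {cohort}. "
    "Keep the underlying concept the same, but frame it through {lens}. "
    "{adjustment}\n\n"
    "Question: {question}"
)

# Hypothetical cohorts, loosely based on the examples in the text above.
COHORTS = [
    {"cohort": "public health students",
     "lens": "a public health scenario",
     "adjustment": "Keep the difficulty unchanged."},
    {"cohort": "economics students",
     "lens": "an economics scenario",
     "adjustment": "Keep the difficulty unchanged."},
    {"cohort": "second-year students",
     "lens": "the same context as the original",
     "adjustment": "Add scaffolding for students still consolidating fundamentals."},
    {"cohort": "third-year students",
     "lens": "a less familiar context",
     "adjustment": "Stretch the question so students must apply the concept."},
]


def differentiated_prompts(question, cohorts=COHORTS):
    """Return one prompt per cohort, all built from the same source question."""
    return {
        c["cohort"]: BASE_PROMPT.format(question=question, **c)
        for c in cohorts
    }
```

Writing the base question once and varying only the cohort details is what makes the extra versions cheap. Each output still needs your review before it reaches students.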

Students notice the difference between AI and human questions

Kusam et al. (2025) compared student perceptions of AI-generated and manually-written quizzes. Their findings:

  • Students rated AI quizzes as clearer and fairer (higher clarity and accuracy ratings).
  • Students found manual quizzes more challenging and engaging, and often better aligned to lecture emphasis.
  • In open comments, students described AI quizzes as straightforward review, while manual quizzes pushed deeper reasoning but were occasionally ambiguous.

In other words: AI is good at clean, fair, well-formed questions. Humans are still better at the kind of cognitively demanding, context-aware questions that stretch a learner. That's a useful split to keep in mind when you decide what to delegate.

The failure modes are predictable — which means you can plan for them

Across the studies, three problems with AI-generated questions kept coming up:

  • Scope drift — some AI items covered out-of-scope topics.
  • Uneven topic coverage — over-representation of certain topics; gaps in others.
  • Ambiguity and missing context — some AI questions lacked the detail students needed to answer them, which hurt student performance.
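Scope drift and uneven coverage are the two failure modes you can check mechanically. The sketch below is a minimal illustration, assuming each generated question has been tagged with a topic (by you, or by the AI at generation time). The function name `coverage_report`, the dictionary keys and the 40% threshold are all hypothetical choices, not values from the studies. Ambiguity still needs a human read.

```python
from collections import Counter


def coverage_report(questions, in_scope_topics, max_share=0.4):
    """Summarise topic coverage for a batch of generated questions.

    questions: list of dicts, each with a 'topic' key (an assumed format).
    in_scope_topics: set of topics the quiz is meant to cover.
    max_share: hypothetical cut-off above which a topic counts as
               over-represented.
    """
    counts = Counter(q["topic"] for q in questions)
    total = len(questions)
    return {
        # Topics the AI wrote about that were never in scope.
        "out_of_scope": sorted(t for t in counts if t not in in_scope_topics),
        # In-scope topics with no questions at all.
        "missing": sorted(t for t in in_scope_topics if t not in counts),
        # Topics taking up more than max_share of the batch.
        "over_represented": sorted(
            t for t, n in counts.items() if total and n / total > max_share
        ),
    }
```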

Elkins et al. also noted a useful detail: "Empirical results from preliminary experimentation showed that generating all of the questions together produced more diverse outputs, whereas generating them separately produced duplicate questions." Batch your generations.

What the literature recommends

Pulling the threads together, here's what the studies converge on:

  • Use prompt engineering. Give the model a clear role, goal and constraints. Include examples (few-shot) to control style and taxonomic level.
  • Prefer controlled prompts tied to pedagogy. Reference frameworks like Bloom's taxonomy rather than asking generically. Generate one question per Bloom level to reduce duplication.
  • Generate in batches. Ask for all questions at once to get more diverse outputs.
  • Keep context short and focused. Use targeted passages or knowledge points so generations stay on-topic and answerable.
  • Provide domain examples. Three to five human-crafted exemplars worked well in experiments.

A quick note on few-shot prompting

Few-shot prompting is a strategy where you give the model a small number (typically one to five) of example input → output pairs inside the prompt, so the model can follow that pattern when generating new outputs. The examples teach the model the desired format, style or level without fine-tuning the model weights.

A quick checklist for a good few-shot prompt:

  • Provide a role and a goal (e.g. "You are a quiz writer. Generate a Bloom's-level question.").
  • Include three to five clean examples covering the variation you want.
  • Supply the target passage or knowledge points.
  • Specify output format (MCQ, open-ended, length, whether the answer is included).
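The checklist above can be sketched as a small Python helper that assembles the prompt text for you. This is a minimal sketch under our own assumptions: the function name `build_few_shot_prompt`, the "Example / Passage / Question" layout and the one-to-five limit are illustrative choices, and the function only builds a string. It does not call any AI service.

```python
def build_few_shot_prompt(role, goal, examples, passage, output_format):
    """Assemble a few-shot prompt from the four checklist ingredients.

    examples: list of (source_text, question) pairs written by a human.
    """
    if not 1 <= len(examples) <= 5:
        raise ValueError("few-shot prompts typically use one to five examples")

    # Role and goal come first.
    lines = [f"{role} {goal}", ""]

    # Each example shows the model an input -> output pair to imitate.
    for i, (source_text, question) in enumerate(examples, start=1):
        lines += [
            f"Example {i}",
            f"Passage: {source_text}",
            f"Question: {question}",
            "",
        ]

    # The output format and target passage come last, ending on an open
    # "Question:" so the model continues the pattern.
    lines += [
        f"Output format: {output_format}",
        "",
        f"Passage: {passage}",
        "Question:",
    ]
    return "\n".join(lines)
```

Ending the prompt on an open "Question:" line is one common way to nudge the model into continuing the pattern. Other layouts can work just as well.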

The action: three ways to actually do this

Reading the research is one thing. Building the quiz is another. Based on what the literature recommends and what we've seen work in practice, here are three approaches you can try — ordered from lowest to highest effort, and from least to most customisable.

Approach 1: Use a tool with a built-in quiz generator

If you want to get a feel for AI-generated questions without writing a single prompt, start here. Tools like Mentimeter, NotebookLM (free to use with a Google account) and Thea have quiz generation built directly into the product. You can upload your source material, and the tool produces questions grounded in that content.

Best for: educators new to Gen AI, quick formative checks, and getting unstuck when you're staring at a blank page.

Watch for: limited control over Bloom's level and question style. You'll likely get clear, fair questions, but they may sit at lower cognitive levels by default. Also be careful about uploading information or sources you don't own.

Approach 2: Build your own question generator with a custom prompt

This is where the research findings really pay off. Using a general-purpose AI tool like Claude, Copilot or RMIT’s Val, you write a structured prompt that locks in the pedagogy: a clear role, the Bloom's level you want, two or three exemplar questions, and the source material. You save the prompt, and reuse it every time you need a new quiz.

A starting prompt looks something like this:

You are an experienced [subject] educator writing quiz questions for [year level / cohort]. Generate six multiple-choice questions based on the passage below: one at each level of Bloom's taxonomy (Remember, Understand, Apply, Analyse, Evaluate, Create). Each question should have four options and a clearly marked correct answer. Match the tone and style of these examples: [paste 2–3 of your own questions]. Passage: [paste source material].
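If you keep your prompt in a file or script, a few lines of Python can fill in the bracketed placeholders and warn you when one has been forgotten. This is a minimal sketch, not a required workflow: the shortened placeholder names (`[cohort]`, `[examples]`, `[passage]`) and the function `fill_prompt` are our own assumptions for illustration.

```python
import re

# The starting prompt from above, with placeholder names shortened to
# single words so they are easy to match (an assumption for this sketch).
QUIZ_PROMPT = (
    "You are an experienced [subject] educator writing quiz questions for "
    "[cohort]. Generate six multiple-choice questions based on the passage "
    "below: one at each level of Bloom's taxonomy (Remember, Understand, "
    "Apply, Analyse, Evaluate, Create). Each question should have four "
    "options and a clearly marked correct answer. Match the tone and style "
    "of these examples: [examples]. Passage: [passage]."
)


def fill_prompt(template, **values):
    """Replace each [placeholder] with its value.

    Raises KeyError if the template contains a placeholder that was not
    supplied, so a half-filled prompt is never sent by accident.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for [{key}]")
        return values[key]

    return re.sub(r"\[(\w+)\]", substitute, template)
```

Because the template asks for all six questions in one request, it also follows the batching advice from the research summary.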

Best for: educators who want consistency across quizzes, control over cognitive level, and the ability to refine a prompt over time so it gets better at producing your style of question.

Watch for: the first version of your prompt won't be the best version. Plan to iterate. Run the AI's questions past your learning outcomes, and ask the AI to answer its own questions — if it stumbles, your question probably needs more context.

Approach 3: Vibe-code your own interactive question experience

If you're comfortable letting AI generate code as well as content, you can go further: a custom interactive quiz, a scenario-based escape room, a self-marking practice tool, a discussion-prompt generator your students use directly. You describe what you want in plain English, and the AI builds it. This is what people mean by "vibe coding" — you don't need to write the code yourself, but you do need to be able to test it, give feedback, and know what good looks like.

Best for: educators wanting to experiment with format, build assets they can reuse for years, or design learning experiences that simply don't exist as off-the-shelf products.

Watch for: the highest effort of the three. Worth it for an asset you'll use repeatedly; probably not worth it for next Tuesday's lesson check-in.

A few other things worth knowing

Whichever approach you choose, these small habits will sharpen your output:

  • Have the AI answer the questions you've created. Did it answer the way you expected? If not, the issue is often clarity, not the AI.
  • Ask the AI to critique your questions. It's surprisingly good at flagging ambiguity, double-barrelled questions, and answer options that don't match the stem.
  • Check alignment with learning outcomes. Especially in assessment. AI doesn't know your unit outline — that's still your job.
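The first habit, having the AI answer its own questions, is easy to automate once you have more than a handful of items. The sketch below is a hedged illustration only: `ask_model` is a hypothetical function you would supply yourself (it takes a prompt string and returns the model's reply as a string), and the question format is an assumption. No particular AI service or library is implied.

```python
def flag_unclear_questions(questions, ask_model):
    """Return the stems of questions the model answers 'wrongly'.

    questions: list of dicts with 'stem', 'options' (letter -> text)
               and 'answer' (the correct letter). Assumed format.
    ask_model: hypothetical callable, prompt string in, reply string out.
    """
    flagged = []
    for q in questions:
        options = "\n".join(
            f"{letter}. {text}" for letter, text in q["options"].items()
        )
        prompt = (
            f"{q['stem']}\n{options}\n"
            "Reply with the letter of the best answer only."
        )
        reply = ask_model(prompt).strip().upper()
        # A mismatch with the answer key often signals an ambiguous stem
        # or missing context rather than a 'wrong' model.
        if reply[:1] != q["answer"].upper():
            flagged.append(q["stem"])
    return flagged
```

Treat anything flagged as a prompt to reread the question yourself. It is not an automatic verdict.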

So — can Gen AI create learning questions?

Yes. And it's an excellent tool for tackling blank-page syndrome and getting started. But it still requires your expertise, careful prompting tied to pedagogy, and willingness to iterate and refine.

It might not speed up the process, and most definitely not the first time you do it. But as you refine the examples, constraints and feedback you give your chosen AI model or application, it gets better at creating the kinds of questions you want. The investment compounds.

The educators who get the most out of this aren't the ones with the best prompts. They're the ones who treat the AI as a junior colleague: give it good instructions, check its work, and push back when it gets things wrong.

Join us at our May session

Come to our online May session to try one of these strategies in a community environment. We'll walk through all three approaches above with live demos, and leave time for participants to try one for themselves on a topic they're actually teaching. Bring a unit outline, a passage of source material, and an open mind.

Date: Wednesday 27th May, 12:00 PM – 1:00 PM

Register: Gen AI for Educators: Emerging Practices and Possibilities (Monthly Session)

References

Akdeniz, H., Clark, T., & Roberts, J. L. (2025). Can AI Generate Questions Aligned with Bloom's Taxonomy? A Framework for Gifted Education to Support Teachers. Journal of Advanced Academics, 36(4), 671–694. https://doi.org/10.1177/1932202X251349917

Elkins, S., Kochmar, E., Cheung, J. C. K., & Serban, I. (2024). How Teachers Can Use Large Language Models and Bloom's Taxonomy to Create Educational Quizzes. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24).

Kusam, V. A., Song, Z., Maxim, B. R., & Kattan, K. (2025). Evaluating the Effectiveness of Generative AI for Automated Quiz Creation: A Case Study. Paper presented at the American Society for Engineering Education Annual Conference, 2025.

Lam, Y. Y., Chu, S. K. W., Ong, E. L. C., Suen, W. W. L., Xu, L., Lam, L. C. L., & Wong, S. M. Y. (2024). Comparative Study of GenAI (ChatGPT) vs. Human in Generating Multiple Choice Questions Based on the PIRLS Reading Assessment Framework. In Proceedings of the 87th Annual Meeting of the Association for Information Science & Technology, Calgary, AB, Canada.

22 May 2026
