Free GARP RAI 2026 Practice Questions: Test Your Knowledge
- Mar 5
- 8 min read

Artificial intelligence is now embedded in financial risk management, from credit scoring and trading to surveillance, stress testing, and operational risk. GARP's Risk and AI (RAI) Certificate is designed to ensure that risk professionals can understand, challenge, and responsibly deploy these models across their organizations.
Free GARP RAI 2026 Practice Questions
This set of 10 free practice questions for the GARP RAI 2026 exam focuses on core topics you are likely to encounter in the syllabus: reinforcement learning and the multi-armed bandit problem, supervised learning and model estimation, NLP and text analytics, explainability and model risk, as well as data governance, privacy, and cybersecurity.
Use these questions as a quick readiness check: identify where you’re strong, where you need to revisit the official materials, and how exam concepts translate into real-world AI risk decisions.
Question 1 – Choosing the Right Learning Paradigm
A risk team wants to design an AI agent that dynamically adjusts intraday trading limits for a desk. The agent will try a limit, observe the resulting P&L and risk breaches over the next hour, receive a numeric reward, and then adjust its behaviour in the next hour. There are no fixed “labels” in the historical data; instead, the system must learn by interacting with the live environment.
Which learning approach best fits this setup?
A. Supervised learning with labeled examples of “good” and “bad” limits
B. Unsupervised learning to cluster trading days
C. Reinforcement learning with a policy optimized for long-term reward
D. Naive Bayes classification using historical trade features
Question 2 – Exploration vs Exploitation (Multi-Armed Bandit)
A robo-advisor is A/B testing three versions of a portfolio rebalancing notification sent to clients. Each version (A, B, C) generates different click-through and conversion rates, but the true probabilities are unknown and may drift slowly over time. The team wants a strategy that:
- Mostly sends the best-performing notification, but
- Still occasionally tests the others to avoid missing a better option.
Which strategy is most appropriate?
A. Purely greedy: always send the notification with the highest average past conversion
B. Purely random: choose notifications uniformly at random
C. ε-greedy with a small, possibly decaying ε
D. Deterministic rotation: cycle A → B → C in a fixed order
Question 3 – Markov Decision Process and Discounting
A collections department models a customer’s delinquency status as:
- State 1: Current
- State 2: 30 days past due
- State 3: 60+ days past due
At each month-end, depending on the current state and the action (e.g., reminder SMS vs. call vs. no action), the customer may transition to another state with certain probabilities and generate a monetary reward or loss. The team wants to maximize the discounted sum of future rewards over an indefinite horizon.
Which combination best characterizes this setup?
A. Multi-armed bandit with one state and no transitions
B. Markov Decision Process with a discount factor γ between 0 and 1
C. Simple regression with no state transitions and no discount factor
D. Monte Carlo simulation with a finite horizon and no discounting
Question 4 – Monte Carlo vs Temporal Difference
A bank is training an AI to optimize overdraft fee waivers. Each customer relationship is viewed as a very long (theoretically infinite) sequence of months, and the bank wants the model to update its value estimates after each month without waiting for the “end” of the relationship.
Which learning approach is more suitable for updating value estimates in this context?
A. Pure Monte Carlo methods using full, finite episodes
B. Temporal Difference (TD) methods such as Q-learning
C. Deterministic dynamic programming with a known transition model
D. Simple averaging of historical overdraft revenue, without a value function
Question 5 – Curse of Dimensionality and Deep RL
A firm is building a hedging agent for a large derivatives portfolio. The state includes hundreds of risk factors (rates, FX, volatility surfaces, credit spreads), and the action is “how much to hedge” along many instruments. A tabular Q-function is infeasible because the number of possible state–action combinations is enormous.
Which approach is most aligned with recommended practice in such high-dimensional reinforcement-learning problems?
A. Use a lookup table for Q(s, a) and populate it over time
B. Approximate Q(s, a) with a neural network (deep reinforcement learning)
C. Replace RL with a single linear regression on historical P&L
D. Ignore state information and choose hedges randomly to ensure exploration
Question 6 – NLP Pre-Processing Choices
An AI risk team applies sentiment analysis to annual reports and central bank speeches. They:
- Convert all text to lowercase
- Remove stop words like “the”, “and”, “also”
- Stem/lemmatize words (e.g., “disappointing”, “disappointed” → “disappoint”)
- Remove punctuation
Which potential downside of this pipeline is most accurate?
A. Removing stop words usually eliminates all information about sentiment
B. Lowercasing always breaks sentiment analysis because models require capitals
C. Stemming/lemmatization and punctuation removal can lose nuance (e.g., sarcasm, degree of positivity)
D. Pre-processing is unnecessary because modern models only use raw text
Question 7 – TF–IDF and Rare but Informative Terms
A compliance team uses TF–IDF features on internal incident reports. Across thousands of reports, words such as “system”, “issue”, and “process” appear very frequently, while words like “fraud”, “bribery”, and “sanctions” are relatively rare but critical when they occur.
Under the TF–IDF weighting scheme, which statement is most correct?
A. Common words like “system” will have higher TF–IDF than “fraud”
B. Rare but important words like “fraud” or “bribery” can receive higher TF–IDF weights
C. TF–IDF always assigns equal weight to all words in a document
D. TF–IDF only measures document length, not word frequency
Question 8 – Naive Bayes and Smoothing
A bank uses a Naive Bayes classifier to categorize customer complaints into “service good”, “service bad”, or “indifferent” based on key words such as “slow”, “helpful”, “great”, and “inefficient”. During training, the word “outstanding” appears only in “service good” examples and never in “service bad” ones.
Why might Laplace smoothing (adding 1 to word counts) be important?
A. It guarantees 100% accuracy on the training set
B. It prevents any class from having zero probability just because a word was unseen in that class
C. It forces the model to ignore rare words like “outstanding”
D. It eliminates the need to estimate prior class probabilities
Question 9 – Overfitting, Underfitting and Regularization
A credit risk team builds three probability-of-default (PD) models using the same dataset:
- Model A: Simple logistic regression with 3 features
- Model B: Logistic regression with 80 features including many interactions and non-linear terms
- Model C: Same as Model B, but with L1/L2 regularization applied and hyperparameters tuned via cross-validation
Back-testing shows that:
- Model A has high bias and poor fit on training and test data
- Model B has extremely low error on the training set but much higher error on the test set
- Model C has slightly higher training error than B, but significantly lower test error than B
Which statement best describes what is happening?
A. Model A is underfitting; Model B is overfitting; Model C improves the bias–variance trade-off via regularization
B. Model A is overfitting; Model B is underfitting; Model C is identical to B
C. All three models are underfitting due to too few features
D. Regularization in Model C should always be removed because it increases training error
Question 10 – Governance, Privacy and Explainability
A bank deploys an AI-based credit underwriting model using alternative data (e.g., transaction histories, device metadata, scraped web data). The model is highly accurate but opaque, and uses a third-party vendor’s proprietary code. The board asks the CRO how the bank will manage model risk, privacy risk, and regulatory scrutiny.
Which combination of actions is most aligned with good AI/data governance and explainability practice?
A. Rely on vendor marketing material and ignore data provenance because accuracy is high
B. Maintain a centralized model inventory; verify data provenance and legal rights to use the data; apply data minimization and retention limits; require explainability tools (e.g., feature importance, SHAP, surrogate models) to support monitoring and challenge
C. Store all raw personal data indefinitely to facilitate future model development
D. Decommission the model immediately because all opaque models are prohibited in finance
Answer Key and Explanations
1. C – Reinforcement learning with a policy optimized for long-term reward
The system learns via trial and error, receiving feedback (rewards) from the environment and updating a policy to maximize long-term cumulative reward, which is exactly the reinforcement learning setup described in the material.
Supervised learning (A) would require labeled “optimal” limits; unsupervised learning (B) only finds structure; Naive Bayes (D) is a classifier, not a sequential decision process.
2. C – ε-greedy with a small, possibly decaying ε
Multi-armed bandit problems balance exploration (trying different actions) and exploitation (choosing the best-known arm). A small ε-greedy strategy mostly exploits but occasionally explores; a decaying ε allows early exploration and later exploitation, as described in the chapter on MAB and ε-greedy with decay.
Purely greedy (A) risks getting stuck on a suboptimal notification; purely random (B) wastes performance; fixed rotation (D) ignores observed rewards.
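As an illustration only (not from the GARP materials), here is a minimal ε-greedy loop in Python; the conversion rates for notifications A, B, and C, the ε schedule, and the 5,000-round horizon are all made-up numbers:

```python
import random

def eps_greedy_choice(avg_reward, eps):
    """With probability eps explore a random arm; otherwise exploit the best-known arm."""
    if random.random() < eps:
        return random.randrange(len(avg_reward))
    return max(range(len(avg_reward)), key=lambda a: avg_reward[a])

random.seed(0)
true_p = [0.05, 0.12, 0.08]   # hypothetical conversion rates for A, B, C
counts = [0, 0, 0]
avg = [0.0, 0.0, 0.0]
for t in range(1, 5001):
    eps = max(0.1, 1.0 / t)                  # decaying epsilon with a floor
    arm = eps_greedy_choice(avg, eps)
    reward = 1.0 if random.random() < true_p[arm] else 0.0
    counts[arm] += 1
    avg[arm] += (reward - avg[arm]) / counts[arm]   # incremental mean update
```

With a decaying ε the agent explores heavily at first and then concentrates on the empirically best notification, while the ε floor keeps it responsive to slow drift in the true rates.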
3. B – Markov Decision Process with a discount factor γ between 0 and 1
There are multiple states, probabilistic transitions, actions that influence transitions and rewards, and an infinite-horizon discounted sum of rewards—this is the classic Markov Decision Process (MDP) setup with discount factor γ.
MAB (A) has a single state and no state evolution; (C) and (D) do not capture the Markov transition structure plus discounting.
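To make the discounted objective concrete, a tiny sketch; γ = 0.5 and the unit reward stream are arbitrary numbers chosen for easy arithmetic:

```python
def discounted_return(rewards, gamma=0.9):
    """G = r0 + gamma*r1 + gamma^2*r2 + ..., computed back-to-front."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# three months of unit rewards with gamma = 0.5:
# 1 + 0.5*1 + 0.25*1 = 1.75
assert abs(discounted_return([1, 1, 1], gamma=0.5) - 1.75) < 1e-9
```

Because γ < 1, the infinite-horizon sum converges (to 1/(1−γ) for a constant unit reward), which is what makes the indefinite-horizon objective well defined.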
4. B – Temporal Difference (TD) methods such as Q-learning
TD methods update value estimates using bootstrapping (current reward + discounted estimate of next state) and are well suited to continuing tasks with no natural episode end.
Pure Monte Carlo (A) requires complete episodes with finite horizons; (C) assumes you know transition probabilities exactly; (D) ignores the notion of a value function entirely.
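A single Q-learning step can be sketched in a few lines; the states and actions below ("waive" vs "charge") are hypothetical, not taken from the exam materials:

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One-step TD update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values())
    td_target = r + gamma * best_next
    Q[s][a] += alpha * (td_target - Q[s][a])

Q = {"current":   {"waive": 0.0, "charge": 0.0},
     "overdrawn": {"waive": 0.0, "charge": 0.0}}
# observe one month: charged a fee (reward 5), customer became overdrawn
q_update(Q, "current", "charge", r=5.0, s_next="overdrawn")
assert abs(Q["current"]["charge"] - 0.5) < 1e-9   # 0 + 0.1 * (5 + 0.9*0 - 0)
```

The update happens after every single transition, so no "end of relationship" is ever needed, which is the key contrast with pure Monte Carlo.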
5. B – Approximate Q(s, a) with a neural network (deep reinforcement learning)
The chapter discusses the curse of dimensionality and notes that when the state–action space is huge, tabular Q-functions are impractical and neural networks are used to approximate Q(s, a), i.e., deep reinforcement learning.
A lookup table (A) explodes in size; single linear regression (C) throws away the sequential decision structure; random hedging (D) is obviously suboptimal.
6. C – Stemming/lemmatization and punctuation removal can lose nuance
The NLP section explicitly notes that:
- Lowercasing and stop-word removal can be useful
- Stemming and lemmatization may remove nuance (e.g., “good” vs “outstanding”)
- Removing punctuation can lose information such as question or exclamation marks, which can signal tone or sarcasm
So C captures the main trade-off. A, B, and D are incorrect or exaggerated.
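A toy version of such a pipeline, using a crude suffix-stripper rather than a real stemmer such as Porter, makes the nuance loss visible:

```python
import re

STOP_WORDS = {"the", "and", "also", "a", "of"}

def crude_stem(word):
    """Toy suffix stripping for illustration only; not a real stemmer."""
    for suffix in ("ing", "ed", "ly", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    text = text.lower()
    tokens = re.findall(r"[a-z']+", text)   # keeps letters only, drops punctuation
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

tokens = preprocess("The results were Disappointing and also disappointed us!")
assert tokens == ["result", "were", "disappoint", "disappoint", "us"]
```

Both “Disappointing” and “disappointed” collapse to the same stem, and the exclamation mark vanishes, exactly the kind of degree-and-tone information option C warns about.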
7. B – Rare but important words like “fraud” or “bribery” can receive higher TF–IDF weights
TF–IDF combines term frequency in a document with inverse document frequency across the corpus, so terms that are relatively rare overall but present in a given document tend to get higher weights than very common words that appear everywhere.
Therefore, rare risk words (“fraud”, “bribery”) often get higher TF–IDF than generic words like “system”. A, C, and D misrepresent TF–IDF.
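The effect is easy to verify on a made-up mini-corpus (four token lists standing in for incident reports), using raw term frequency and idf = log(N / df):

```python
import math

def tf_idf(term, doc, corpus):
    """Raw term frequency times inverse document frequency, idf = log(N / df)."""
    tf = doc.count(term)
    df = sum(1 for d in corpus if term in d)
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

corpus = [
    ["system", "issue", "fraud"],
    ["system", "process", "issue"],
    ["system", "process", "delay"],
    ["system", "issue", "delay"],
]
doc = corpus[0]
# "system" appears in every report, so its idf (and hence tf-idf) is zero;
# "fraud" is rare across the corpus and gets a positive weight
assert tf_idf("system", doc, corpus) == 0.0
assert tf_idf("fraud", doc, corpus) > tf_idf("system", doc, corpus)
```

Real implementations often add smoothing to the idf term, but the ranking intuition is the same: corpus-wide rarity boosts the weight of terms like “fraud”.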
8. B – It prevents any class from having zero probability just because a word was unseen in that class
The Naive Bayes section explains that unseen words in a class yield zero likelihood and thus zero posterior probability for that class, even if other evidence suggests it. Laplace smoothing adds 1 (or λ) to all counts to avoid zero probabilities.
Smoothing does not guarantee perfect accuracy (A), does not force the model to ignore rare words (C), and does not eliminate the need for priors (D).
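A minimal numeric sketch, with made-up word counts and vocabulary size:

```python
def word_likelihood(word, class_counts, vocab_size, alpha=1):
    """P(word | class) with Laplace (add-alpha) smoothing."""
    total = sum(class_counts.values())
    return (class_counts.get(word, 0) + alpha) / (total + alpha * vocab_size)

neg_counts = {"slow": 4, "inefficient": 3}   # "outstanding" never seen in this class
V = 5                                        # hypothetical vocabulary size

unsmoothed = neg_counts.get("outstanding", 0) / sum(neg_counts.values())
smoothed = word_likelihood("outstanding", neg_counts, V)

assert unsmoothed == 0.0   # a zero count would wipe out the whole class posterior
assert smoothed > 0.0      # smoothing keeps every likelihood strictly positive
```

With smoothing, one unseen word can lower a class's posterior but can no longer force it to exactly zero, so the remaining evidence still counts.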
9. A – Model A is underfitting; Model B is overfitting; Model C improves the bias–variance trade-off via regularization
The supervised learning chapter highlights:
- Underfitting: too simple, high bias, poor fit on both training and test data.
- Overfitting: very low training error but poor generalization (high test error).
- Regularization (ridge/LASSO/elastic net) plus cross-validation helps reduce overfitting and improve the bias–variance trade-off, often slightly increasing training error but improving test performance.
That is exactly what you see for Models A, B, and C.
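The shrinkage effect can be demonstrated with a toy pure-Python fit; the data, the λ value, the learning rate, and the epoch count are all illustrative:

```python
import math

def train_logistic(X, y, lam=0.0, lr=0.1, epochs=500):
    """Gradient descent on logistic loss + (lam/2) * ||w||^2 (L2 penalty)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [lam * wj for wj in w]            # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            z = sum(wj * xij for wj, xij in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(len(w)):
                grad[j] += (p - yi) * xi[j] / n  # gradient of the logistic loss
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

# toy separable data: one informative feature, one noise feature
X = [[1.0, 0.3], [2.0, -0.2], [-1.0, 0.1], [-2.0, -0.4]]
y = [1, 1, 0, 0]

w_plain = train_logistic(X, y, lam=0.0)
w_reg   = train_logistic(X, y, lam=1.0)
norm = lambda w: math.sqrt(sum(wj * wj for wj in w))
assert norm(w_reg) < norm(w_plain)   # the penalty shrinks the coefficient vector
```

On separable data the unregularized weights keep growing, while the L2 penalty pulls them toward zero, trading a little training fit for better generalization, which is the Model C pattern.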
10. B – Central inventory, data provenance, minimization, explainability tools
Good AI and data governance for financial models includes:
- Model governance: centralized model inventory, documented lifecycle, validation and monitoring.
- Data governance and privacy: verifying data provenance and the legal right to use data, complying with privacy laws, data minimization and retention policies, and strong security controls.
- Explainability (XAI): feature importance, Shapley values, and surrogate models (e.g., LIME) to make complex models more understandable and auditable.
Option B is the only answer that ties these together. A ignores key risks; C increases risk by hoarding data; D is too extreme—opaque models may be used if governed and explained appropriately.