
Financial Goal Planning – Dynamic Programming approach

by Shubham Satyarth May 27, 2022

In the previous part of this Financial Goal Planning series, we looked at how the traditional approach to portfolio construction based on "expected" returns is not optimal – it does not select the portfolio that has the highest probability of achieving the goal.


We also explored the static approach to goal planning where we pick an optimal portfolio today and then simply rebalance the portfolio till maturity (goal horizon).


Let’s revisit our toy example so that we can use it in this blog as well.


The goal was to reach a corpus of Rs 1.79 cr in 10 years' time. For this, we agreed on a starting SIP of Rs 76,000 per month, growing it by 5% every year. The implied required rate of return (annualized) was 9%. Our portfolio choice was restricted to 2 assets – a risky asset represented by exposure to the Nifty 50 Index through an ETF, and a risk-free asset represented by exposure to a liquid fund.


We saw that the 40:60 portfolio was suboptimal since the chance of achieving the goal was barely 53%. A 70:30 portfolio, instead, had roughly 70% odds.


Suppose we do start with a 70:30 portfolio. We then enter a bear market, and after 2 years the required return for achieving the goal becomes 12% (up from 9%).


Do we stick with the 70:30 portfolio, or do we change it?


A simple and intuitive approach


Suppose we can select a portfolio only once a year – at the start of the year. A very simple approach could be: at the start of every year,


  1. Evaluate your current wealth and calculate the required return to achieve the goal based on years left.
  2. Select a portfolio that maximizes the probability of achieving the goal, given new inputs.


In our example, we start with a required rate of 9% and 10 years to goal with a 70:30 portfolio. After 1 year, suppose the market tanks and your required return becomes 12% with 9 years to goal. This combination of horizon and required return can fully determine your choice of portfolio. We call this combination the "state" and the choice of portfolio the "action". We will denote a state by <required return, horizon>.


Note that we have transitioned from state <9%, 10 years> to state <12%, 9 years>. Then, we pick a portfolio that is optimal for the <12%, 9 years> combination.


This is a very simple and intuitive approach. Every year, evaluate the state you are in and pick an action (portfolio) that maximizes the probability of achieving the goal.
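
As a minimal sketch (ours, not from the original post), the rule can be written like this in Python. `goal_probability` is a hypothetical helper that estimates the chance of beating the required return over the remaining horizon for a given equity weight, here via simulation with illustrative Student-t parameters for Nifty 50 returns, a fixed 4% liquid return, and SIP contributions ignored:

```python
import numpy as np

def goal_probability(req_return, years_left, equity_weight, n_sims=20_000, seed=0):
    """Estimate P(portfolio beats the required return over the remaining horizon).
    Nifty returns are drawn from a Student-t with illustrative parameters;
    the liquid fund earns a fixed 4%; SIP contributions are ignored for simplicity."""
    rng = np.random.default_rng(seed)
    nifty = 0.12 + 0.18 * rng.standard_t(4, size=(n_sims, years_left))
    port = equity_weight * nifty + (1 - equity_weight) * 0.04
    growth = np.prod(1 + np.maximum(port, -0.95), axis=1)  # guard against < -100% draws
    return np.mean(growth >= (1 + req_return) ** years_left)

def myopic_action(req_return, years_left):
    """Pick the equity weight (0%, 5%, ..., 100%) that maximizes the probability of
    achieving the goal from the current state, ignoring future chances to re-decide."""
    weights = np.linspace(0.0, 1.0, 21)
    return weights[np.argmax([goal_probability(req_return, years_left, w) for w in weights])]

# Example: after a bad year we find ourselves in state <12%, 9 years>
print(myopic_action(0.12, 9))
```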


This is not a bad approach. But can we do better?


In this approach, at every step, we pick a portfolio that is optimal for the current combination of required return and horizon. This carries an implicit assumption that once we pick a portfolio, we are not going to change it.


In other words, this approach does not use the knowledge that we will get to pick a portfolio again every year.


To incorporate this knowledge, we need to solve for optimal policy by going backwards in time.


Dynamic programming – an informal introduction


So, what are we trying to solve?


Given a combination of required return and horizon, we need to pick the best portfolio – which need not be the portfolio that is "optimal" for this combination in isolation.


We also know that we will get to course correct (pick a portfolio) every year and our action today should account for that fact.


To solve this, we need to move backwards in time.


We start with 1 year to goal and pick the best portfolios for different levels of required return. Note that with only 1 year to goal, the best action is the optimal portfolio – one that maximizes the chances of achieving a return that is greater than or equal to the required return.


Now we move back one step and look at 2 years to goal. Suppose we are solving for the best action for a 10% required return. Let's assume that next year's return can have only 3 outcomes – 5%, 10% and 15%. Then the possible outcomes for the required return in the next step (1 year to horizon) are 15.2%, 10% and 5.2%. We have already solved for the best actions for these outcomes.
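
Where do 15.2%, 10% and 5.2% come from? If the required return is R with H years to go and the realized 1-year return is r1, the new required return R' must satisfy (1 + r1)(1 + R')^(H - 1) = (1 + R)^H (ignoring ongoing SIP contributions). A quick check in Python:

```python
def next_required_return(R, H, r1):
    """New annualized required return after one year of realized return r1,
    chosen so that (1 + r1) * (1 + R')**(H - 1) == (1 + R)**H (SIPs ignored)."""
    return ((1 + R) ** H / (1 + r1)) ** (1 / (H - 1)) - 1

for r1 in (0.05, 0.10, 0.15):
    print(f"{next_required_return(0.10, 2, r1):.1%}")   # 15.2%, 10.0%, 5.2%
```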


Our action today (2 years to goal) will be such that we maximize the overall probability, accounting for (a) the probability of landing in different states in 1 year's time and (b) the probability of achieving the goal once we have landed in a particular state.


We then keep moving backwards in time till we reach the present day. This gives us our policy – for each state (return, horizon pair) we have an action (portfolio).


Dynamic programming – a formal setup


This is a slightly technical section where we discuss the details of our model and setup. Readers can skip to the next section without breaking the flow of discussion.


To solve this dynamic programming problem, we need to formulate our goal planning exercise as a Markov Decision Process (MDP). An MDP has a state space, an action space and a set of transition probabilities that control the transition from one state to the next.


We have already introduced the state as a combination of required return and horizon – <required return, horizon>. Given this definition, we see that there can be an infinite number of states. Therefore, we need to discretize both dimensions.


We discretize the required return into integer returns from 1% to 20%. Any level lower than 1% is absorbed into 1% and any level higher than 20% is absorbed into 20%.


We discretize horizon by taking annual steps. Thus, for a 10-year horizon (in our toy example), we have 10 steps along the time dimension.


This discretization converts the state space into a 20 x 10 grid, with rows representing the levels of required return (1% to 20%) and columns representing time (annual steps).


There are methods for solving continuous state dynamic programming problems. Ideally, we would like to work with continuous state along the return dimension but that is beyond the scope of this blog.


We have also introduced the notion of an action, which is simply picking a portfolio from our restricted set of assets. Even with this restricted set, the action space is infinite. Hence, we need to discretize the action space as well. In our example, we limit the action space to 21 portfolios, starting with 100% liquid and going all the way up to 100% Nifty 50 Index in increments of 5%.
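
A minimal sketch (ours, not from the post) of the discretized state and action spaces just described:

```python
import numpy as np

REQ_RETURNS = np.linspace(0.01, 0.20, 20)   # 20 required-return levels: 1%, 2%, ..., 20%
HORIZONS = np.arange(1, 11)                 # 10 annual steps for the 10-year toy example
ACTIONS = np.linspace(0.0, 1.0, 21)         # 21 equity weights: 0%, 5%, ..., 100% in Nifty 50

def to_return_state(r):
    """Map a continuous required return to the nearest grid level;
    anything below 1% or above 20% is absorbed at the boundary."""
    return float(REQ_RETURNS[np.argmin(np.abs(REQ_RETURNS - np.clip(r, 0.01, 0.20)))])
```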


We have our state space and our action space. Now we need a model for transition probabilities: given a state (required return R and H years to horizon) and an action (choice of portfolio), what are the probabilities of moving to each of the next 20 states – 1% to 20% required return with H-1 years to horizon?


First, we model the annual returns of Nifty 50 as a univariate Student's t distribution with 4 degrees of freedom. We use maximum likelihood to estimate the parameters of this distribution.


Liquid returns are assumed to be deterministic – a 4% return with probability 1. Here are the steps involved:


  1. Sample a large number of scenarios of Nifty 50 returns from the specified distribution. Combined with the deterministic liquid return, this gives return scenarios for each of our 21 portfolios.
  2. Each 1-year return scenario yields a scenario of the required return in the next step. Thus, we have a scenario-based probability distribution of required returns.
  3. We then discretize these scenarios to yield a probability mass function over our discrete levels of required return – 1% to 20%. This is our transition probability model (see the sketch below).
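
Below is a hedged sketch of these three steps. The Student-t location and scale used here are illustrative placeholders, not the maximum-likelihood estimates from the post, and the required-return update ignores ongoing SIP contributions:

```python
import numpy as np

REQ_RETURNS = np.linspace(0.01, 0.20, 20)   # discretized required-return levels
LIQUID_RETURN = 0.04                        # deterministic liquid-fund return

def transition_pmf(R, H, equity_weight, n_sims=100_000, seed=0,
                   t_df=4, t_loc=0.12, t_scale=0.18):
    """P(next required-return state | current state <R, H>, chosen equity weight), H >= 2.
    t_loc and t_scale are illustrative, not the fitted parameters from the post."""
    rng = np.random.default_rng(seed)
    # Step 1: sample Nifty scenarios and build portfolio return scenarios
    nifty = t_loc + t_scale * rng.standard_t(t_df, size=n_sims)
    port = np.maximum(equity_weight * nifty + (1 - equity_weight) * LIQUID_RETURN, -0.95)
    # Step 2: each scenario implies a required return for the remaining H-1 years
    next_R = ((1 + R) ** H / (1 + port)) ** (1 / (H - 1)) - 1
    # Step 3: absorb extremes at 1% / 20% and bin to the nearest 1% level
    idx = np.clip(np.rint(next_R * 100), 1, 20).astype(int) - 1
    return np.bincount(idx, minlength=len(REQ_RETURNS)) / n_sims

# Example: transition probabilities from state <10%, 5 years> with a 70:30 portfolio
print(transition_pmf(0.10, 5, 0.70).round(3))
```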


We can now solve our MDP using standard dynamic programming (backward induction, as described informally above) to get our optimal policy.


For 1 year to horizon, the problem reduces to picking the action that maximizes the probability of the 1-year return being greater than or equal to the required return, and we can directly use the CDF of our Student's t distribution.
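
Putting the pieces together, here is a compact sketch of the backward induction under the same illustrative assumptions (placeholder Student-t parameters, SIPs ignored): at 1 year to go, the value of a state is the best single-year probability obtained directly from the t CDF; at H > 1, the value of an action is its transition-probability-weighted average of next year's state values, and the policy records the best action for every state.

```python
import numpy as np
from scipy import stats

REQ_RETURNS = np.linspace(0.01, 0.20, 20)      # required-return levels (grid rows)
ACTIONS = np.linspace(0.0, 1.0, 21)            # equity weights 0%..100% in 5% steps
N_YEARS, LIQUID = 10, 0.04
T_DF, T_LOC, T_SCALE = 4, 0.12, 0.18           # illustrative, not the fitted parameters

rng = np.random.default_rng(0)
nifty_draws = T_LOC + T_SCALE * rng.standard_t(T_DF, size=50_000)

def terminal_prob(R, w):
    """P(1-year portfolio return >= R). With a deterministic liquid leg the portfolio
    return is itself Student-t, so we can use the CDF directly."""
    if w == 0.0:
        return float(LIQUID >= R)
    return 1 - stats.t.cdf(R, df=T_DF, loc=w * T_LOC + (1 - w) * LIQUID, scale=w * T_SCALE)

def transition_pmf(R, H, w):
    """P(next required-return state | <R, H>, equity weight w); SIPs ignored."""
    port = np.maximum(w * nifty_draws + (1 - w) * LIQUID, -0.95)
    next_R = ((1 + R) ** H / (1 + port)) ** (1 / (H - 1)) - 1
    idx = np.clip(np.rint(next_R * 100), 1, 20).astype(int) - 1
    return np.bincount(idx, minlength=len(REQ_RETURNS)) / len(port)

# 1 year to go: value of a state is the best single-year probability
value, policy = np.empty(len(REQ_RETURNS)), {}
for i, R in enumerate(REQ_RETURNS):
    probs = [terminal_prob(R, w) for w in ACTIONS]
    best = int(np.argmax(probs))
    policy[(round(R, 2), 1)], value[i] = ACTIONS[best], probs[best]

# Move backwards: at H years to go, weight next year's values by transition probabilities
for H in range(2, N_YEARS + 1):
    new_value = np.empty_like(value)
    for i, R in enumerate(REQ_RETURNS):
        q = [transition_pmf(R, H, w) @ value for w in ACTIONS]
        best = int(np.argmax(q))
        policy[(round(R, 2), H)], new_value[i] = ACTIONS[best], q[best]
    value = new_value

print(policy[(0.09, 10)])   # equity weight prescribed for the toy state <9%, 10 years>
```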


Is dynamic programming even worth the effort?


Let's first start with our toy example. The dynamic approach increases the probability of achieving the goal to 85%, from 74% under the static approach.


Now let’s look at more general results.


In the previous part, we presented a heat map that showed the probability of achieving the goal for different combinations of required return and horizon. This assumed that we pick an optimal portfolio at the start and then take no further action (apart from periodic rebalancing).


It would be interesting to compare that heatmap (static approach) with a similar heat map for a dynamic approach.


To calculate the probabilities, we used a technique called historical bootstrapping to simulate 1,000 possible 10-year paths of the Nifty 50 and ran the portfolio along each path. We used a deterministic 4% annual return from the liquid fund and did not assume any taxes or transaction costs. The probability is calculated as the number of paths in which the portfolio's final value is greater than the target value, divided by 1,000.
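
For illustration, here is a hedged sketch of that calculation for the static approach. `historical_returns` is a placeholder for an array of historical annual Nifty 50 returns, monthly SIPs are approximated as a lump sum invested at the start of each year, and the dynamic approach would instead look up the policy for the current state each year rather than holding a fixed equity weight:

```python
import numpy as np

def bootstrap_goal_probability(historical_returns, equity_weight, target=1.79e7,
                               first_year_sip=76_000 * 12, sip_growth=0.05,
                               years=10, n_paths=1000, seed=0):
    """Historical bootstrap for the static approach: resample annual Nifty returns
    with replacement, run the annually rebalanced SIP portfolio along each path,
    and count the paths whose final value reaches the target.
    Liquid leg earns a deterministic 4%; taxes and transaction costs are ignored."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_paths):
        nifty_path = rng.choice(historical_returns, size=years, replace=True)
        wealth, contribution = 0.0, first_year_sip
        for r in nifty_path:
            port_return = equity_weight * r + (1 - equity_weight) * 0.04
            wealth = (wealth + contribution) * (1 + port_return)   # SIP invested at start of year
            contribution *= 1 + sip_growth                         # 5% annual step-up
        hits += wealth >= target
    return hits / n_paths

# e.g. bootstrap_goal_probability(nifty_annual_returns, equity_weight=0.70)
```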


First, we present the standalone results for the dynamic approach in the chart below:



A better chart would be a heat map of the difference in probabilities between the dynamic and static approaches. Below, we present a heat map showing the increase in probability achieved by using the dynamic programming approach over the static approach.



A few interesting observations:


  • The difference in column 1 is 0 by construction – the dynamic and static approaches are identical for a 1-year horizon.
  • The dynamic approach adds value, increasing the odds of achieving the goal by 6% to 10%.
  • As we move down the grid (increasing required return), the benefit of the dynamic approach decreases. This is expected: the odds of achieving the goal are already so low that a dynamic approach can barely add any value. The dynamic approach works best in the 8% to 14% range.
  • The benefit of the dynamic approach appears to decrease as the horizon increases. This is slightly counterintuitive and is likely due to lower variation in portfolio outcomes over longer horizons. Nonetheless, the benefit is still significant.


The use of dynamic programming for financial goal planning seems to add significant value over a one-step static approach.


Note that we have used a distributional assumption for Nifty 50 returns to solve our dynamic programming problem. Such an approach is called a model-based approach, because the agent has full knowledge of (or can infer) the state transition probabilities. The agent uses this knowledge to arrive at the optimal policy.


What if our distributional assumption (or parameter estimation) is incorrect? A more robust approach would be a model-free one, where we let the agent learn the best policy from experience. That takes us into the realm of Reinforcement Learning, which we will explore in our next blog.
