May 15, 2023

ChatGPT: A Deep Dive

Author: Marco Santanche

Can we use ChatGPT as a quant strategist?

For this month’s article, I will focus on asking ChatGPT to be a quant for us.

Many readers have probably seen similar experiments elsewhere. However, the process followed here is new, as it is based on my own backtesting framework in Python. ChatGPT will first have to come up with an ETF strategy, and then work on its optimization by interacting with the backtesting framework. Moreover, we will treat the final result of this exercise as one of the portfolios to monitor in our monthly update.

The setup for any strategy we want to develop must be structured in modules. Modules can include:

● Trading signal: when we should buy or sell, and with what intensity (e.g. for value, our current estimate of the future value of the stock or ETF).
● Asset allocation: given the signal, how to allocate our securities by following the direction and, potentially, the intensity of the signal.
● Risk management: how to control for risk, including specific targets (e.g. reducing volatility or drawdowns, stop losses).
● Execution management: how we should reach our target allocation (many orders or a few big ones, order types, etc.).
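To make the modular setup concrete, here is a minimal sketch of how the first three modules could be chained. It is not my backtesting framework: `run_pipeline`, the toy modules and the tickers `AAA` and `BBB` are hypothetical names used only for illustration, and execution is left out because it acts on the target weights afterwards.

```python
from typing import Callable, Dict, List

Prices = Dict[str, List[float]]   # ticker -> price history
Scores = Dict[str, float]         # ticker -> signal intensity
Weights = Dict[str, float]        # ticker -> portfolio weight


def run_pipeline(
    prices: Prices,
    signal: Callable[[Prices], Scores],
    allocate: Callable[[Scores], Weights],
    manage_risk: Callable[[Weights], Weights],
) -> Weights:
    """Chain signal -> allocation -> risk management into target weights."""
    scores = signal(prices)
    weights = allocate(scores)
    return manage_risk(weights)


# Toy modules, purely illustrative (not the strategy developed in this article).
def total_return_signal(prices: Prices) -> Scores:
    """Signal intensity = return from the first to the last price."""
    return {t: p[-1] / p[0] - 1.0 for t, p in prices.items()}


def equal_weight_winners(scores: Scores) -> Weights:
    """Equal weight across tickers with a positive score, zero elsewhere."""
    winners = [t for t, s in scores.items() if s > 0]
    return {t: (1.0 / len(winners) if t in winners else 0.0) for t in scores}


def cap_weights(weights: Weights, cap: float = 0.6) -> Weights:
    """Cap each weight; whatever is cut stays in cash."""
    return {t: min(w, cap) for t, w in weights.items()}


target = run_pipeline(
    {"AAA": [100.0, 110.0], "BBB": [100.0, 90.0]},
    total_return_signal,
    equal_weight_winners,
    cap_weights,
)
print(target)
```

Keeping each module behind a simple function signature means one of them can be swapped (for example, a different risk rule) while the rest of the backtest stays unchanged.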

We will implement and backtest every suggestion the tool makes, and whenever the results leave something to improve or review, we will ask it what to do next.

Initial test: a portfolio to hold

Before proceeding with a full-fledged systematic strategy, I asked ChatGPT to produce a portfolio that would have performed well in 2021. These are the weights and securities it suggested:

The backtested performance was just great in 2021, as the following equity line plot shows:

The return was 13.9% over the year, with a standard deviation of just 0.6%. The total Sharpe was around 21!

But this is what we call overfitting in quant finance. Of course, if I ask a model (or ChatGPT) to select the best fund holdings in 2021, this is expected. But what would have been the performance in 2022?

Return was -15.33%, volatility around 1% and Sharpe -15.24. Not that attractive anymore…

At this point, we need another solution, so I asked ChatGPT to develop an end-to-end strategy from scratch, as described above.

Trading signal and asset allocation

On this topic, after some prompting, ChatGPT seems to have a clear preference: we should try to use momentum. Nothing special, although it is a solid idea (we verified in our previous article how well it works in stocks).

Nevertheless, we also know that momentum can suffer in market downturns, so we will definitely need to add a risk management layer to it.

Additionally, we need to know what securities we would be trading. Here is its reply:

I coded the systematic strategy suggested by the AI. The allocation module is basically a ranked-momentum asset allocation, rebalanced quarterly. Using data from Yahoo, I could easily run all this in my backtester. We will compare the performance with SPY as a benchmark, although only for reference: we should not expect to outperform it if we are not investing 100% in equities.
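A ranked-momentum allocation can be sketched in a few lines. This is a simplified illustration rather than my backtester's code: the function names, the placeholder tickers and the choice of making weights linear in the rank are assumptions, and it uses plain Python lists where a real implementation would probably use pandas.

```python
from typing import Dict, List


def momentum_scores(prices: Dict[str, List[float]], lookback: int) -> Dict[str, float]:
    """Trailing return over `lookback` observations for each ticker."""
    return {t: p[-1] / p[-1 - lookback] - 1.0 for t, p in prices.items()}


def rank_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Weights proportional to momentum rank (worst = 1, best = N)."""
    ordered = sorted(scores, key=scores.get)  # ascending momentum
    total = len(ordered) * (len(ordered) + 1) / 2  # 1 + 2 + ... + N
    return {t: (i + 1) / total for i, t in enumerate(ordered)}


# Placeholder tickers and prices. With daily data, a quarterly lookback would be
# roughly 63 observations; 2 is used here only to keep the example short.
prices = {
    "AAA": [100.0, 102.0, 105.0],
    "BBB": [100.0, 99.0, 98.0],
    "CCC": [100.0, 101.0, 101.0],
}
weights = rank_weights(momentum_scores(prices, lookback=2))
print(weights)
```

In a backtest, these weights would be recomputed at each quarterly rebalancing date using only the prices available up to that date.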

The results did not disappoint. Here is the performance of the mentioned strategy compared against SPY:

The return of the portfolio was around 5.5% annualized, with risk of 17.1% and a Sharpe of 0.32. This looks less like overfitting than the previous portfolio, as it was not the result of a specific performance target. However, we see a long-lasting drawdown in 2022, which leads to a maximum drawdown of 28.83%.
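For readers who want to reproduce this kind of figure, the sketch below shows one common way to compute the metrics from a series of periodic returns. The conventions are assumptions (arithmetic annualization, 252 trading days, zero risk-free rate) and may differ from the ones used in my backtester.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple

TRADING_DAYS = 252  # assumption: the input is a series of daily returns


def annualized_stats(returns: List[float], periods: int = TRADING_DAYS) -> Tuple[float, float, float]:
    """Return (annualized return, annualized volatility, Sharpe ratio).

    Assumes arithmetic annualization and a zero risk-free rate.
    """
    ann_ret = mean(returns) * periods
    ann_vol = stdev(returns) * math.sqrt(periods)
    return ann_ret, ann_vol, ann_ret / ann_vol


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity line, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Made-up returns: up 10%, down 20%, up 5%.
sample = [0.10, -0.20, 0.05]
print(max_drawdown(sample))
print(annualized_stats(sample, periods=3))
```

As a sanity check on the numbers above, 5.5% divided by 17.1% gives roughly 0.32, which matches the reported Sharpe under a zero risk-free rate.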

Although we need to keep in mind that the strategy is long-only, it seems like we can do better on the risk profile. So I asked ChatGPT how to improve the strategy with regard to drawdown, for example.

After some back and forth, the tool suggested using a rolling drawdown logic:

I asked what the ideal threshold would be, and it proposed testing with 20%. It was simply a disaster. Here is the chart:

Stopping an ETF completely due to its drawdown is an own goal, as the concentration of our portfolio increases. At some point, due to the bad performance every ETF had in 2022, we had 100% of our investment in a single ETF, the one that reached -20% (on a yearly basis) later than the others (LQD, the investment-grade bond ETF).
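The concentration effect is easy to reproduce. The sketch below assumes that a stopped ETF gets a zero weight and that the surviving weights are renormalized to 100%; the function names, the tickers and the two-observation window are illustrative, not the exact logic ChatGPT proposed.

```python
from typing import Dict, List


def rolling_drawdown(prices: List[float], window: int) -> float:
    """Current drawdown from the highest price in the last `window` observations."""
    recent = prices[-window:]
    return recent[-1] / max(recent) - 1.0


def apply_drawdown_stop(
    weights: Dict[str, float],
    prices: Dict[str, List[float]],
    window: int,
    threshold: float = -0.20,
) -> Dict[str, float]:
    """Zero out ETFs whose rolling drawdown breaches the threshold.

    Assumption: the surviving weights are renormalized to sum to 1. If every
    ETF is stopped, all weights are 0 (the portfolio sits in cash).
    """
    alive = {
        t: w for t, w in weights.items()
        if rolling_drawdown(prices[t], window) > threshold
    }
    total = sum(alive.values())
    return {t: (alive.get(t, 0.0) / total if total else 0.0) for t in weights}


# Placeholder data: two ETFs are down more than 20%, one is down 15%.
weights = {"AAA": 1 / 3, "BBB": 1 / 3, "CCC": 1 / 3}
prices = {"AAA": [100.0, 75.0], "BBB": [100.0, 78.0], "CCC": [100.0, 85.0]}
stopped = apply_drawdown_stop(weights, prices, window=2)
print(stopped)
```

With two of the three ETFs stopped, the whole portfolio ends up in the one that is merely down 15%, which is the same mechanism that left the backtest fully invested in LQD.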

Two lessons follow. First, never trust ChatGPT completely (if this was not clear already). AI can suggest ideas taken from its knowledge base, but that is not enough to make investment decisions: we should always consider, test and adapt them to our use case. Second, the maximum drawdown should not be used this way. It is heavily path-dependent: any backtest that includes 2020 has most of its drawdowns around the Covid-19 pandemic, and we cannot avoid such black or even gray swans before they happen. Long-term stop losses of this kind can therefore be detrimental to performance rather than a useful addition. It might make sense to avoid deeper dips, but drawdowns are often impossible to forecast (you only know about one once it has happened), and stopping out does not help in preventing losses, as it is much more likely that prices recover after some months of underperformance.

I explained these points to ChatGPT and asked if there is any alternative way to avoid large drawdowns. It suggested extending our universe by including the following ETFs:

● Vanguard Real Estate (VNQ)
● SPDR Gold Trust (GLD)
● Vanguard Emerging Markets Stock Index (VWO)
● Vanguard Tax Managed Fund FTSE Developed Markets (VEA)
● iShares Core US Agg bond (AGG)
● iShares 7-10 Year Treasury Bond (IEF)

The performance slightly improved. Equity line:

With this version, we get a 6.76% annualized return, 14.80% annualized volatility and a Sharpe ratio of 0.46. Even the maximum drawdown was reduced, to 24.85%.

Although diversification clearly helped, the strategy is still not able to avoid large drawdowns. I asked ChatGPT to do something about it by analyzing the returns and weights of our progress so far.

Risk management: macro data

At this point, ChatGPT suggested many things, one of them being to incorporate macro data in our strategy. The problem with this is not only data management, cleaning and setup, but also history length, as we need a long history to understand whether we can properly leverage the data and whether our methodology makes sense. For example, inflation stayed low for a long period in recent years and only rose lately, so 10 years of history might be insufficient. For this reason, I extended the in-sample period to start in 2007, just as a quick test.

Keeping the rest of the strategy constant, I asked ChatGPT how to use macro data, and it suggested calculating a score and summing it up with the momentum score.

The methodology was not clear, and I also do not think that exiting and entering a position in full is a good idea (as the maximum drawdown example showed). Thus, I decided to rebalance the portfolio not only according to our momentum methodology, but also with penalties in case of an unfavorable macro environment, in order to avoid another unreasonable rebalancing logic.

To do so, I set up a strategy component that uses macro data to forecast the next monthly return of an ETF. This is just a toy example: we usually need to be more sophisticated, given that forecasting is always difficult in financial time series, and we also need to adjust the data by, for example, standardizing or lagging it. This is usually necessary to make the model perform better and, even more importantly, to keep realistic assumptions on the availability of the data.

As described, I used the forecasts as an input to control risk, not as a signal: my target was to penalize the ETFs with a negative prediction, given macro conditions. We use this module only to reduce exposure to ETFs with a negative macro forecast; we ignore the positive side of the forecasts and do not increase exposure in a favorable environment, since that would make the forecast an integral part of the signal. That is definitely another way of using the data, but in this case we are satisfied with the initial result, and our aim is just to reduce risk.
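A stripped-down version of this idea is sketched below. It assumes a single macro series, a one-variable least-squares fit and a fixed 50% penalty; all of these, along with the function names and tickers, are illustrative choices rather than the exact component used in the backtest.

```python
from statistics import mean
from typing import Dict, List, Tuple


def fit_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Slope and intercept of a one-variable least-squares fit."""
    mx, my = mean(x), mean(y)
    var_x = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / var_x
    return slope, my - slope * mx


def forecast_next_return(macro: List[float], returns: List[float]) -> float:
    """Forecast next month's return from the latest macro reading.

    The macro value of month t-1 is paired with the return of month t, so the
    fit only uses information that was available before each return.
    """
    slope, intercept = fit_ols(macro[:-1], returns[1:])
    return intercept + slope * macro[-1]


def penalize(
    weights: Dict[str, float],
    forecasts: Dict[str, float],
    penalty: float = 0.5,  # hypothetical scaling factor
) -> Dict[str, float]:
    """Scale down weights with a negative forecast; never scale any weight up."""
    return {t: w * (penalty if forecasts[t] < 0 else 1.0) for t, w in weights.items()}


# Made-up monthly data: returns fall as the macro indicator rises.
macro = [1.0, 2.0, 3.0, 4.0]
etf_returns = [0.0, -0.01, -0.02, -0.03]
forecast = forecast_next_return(macro, etf_returns)

adjusted = penalize({"AAA": 0.6, "BBB": 0.4}, {"AAA": forecast, "BBB": 0.01})
print(forecast, adjusted)
```

Note that the penalized weights are not renormalized here: the part that is cut stays in cash, which is what lowers overall exposure. A positive forecast leaves the momentum weight untouched, so the module acts on risk only.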

These are the results:

Returns decreased to 4%, volatility was 11.12%, Sharpe was also lower (0.35) and the drawdown (our main target) was reduced to 15.9%. This is exactly what we asked of the AI: at the cost of a lower return and Sharpe, we managed to keep risk under control.

Increasing returns with a long-only, diversified strategy while also reducing drawdown risk would be challenging for any portfolio manager, and ChatGPT is clearly not able to do much better by itself; at the very least, it would need much more time spent on prompting and on optimizing the outcome. In addition, my implementation is very basic, and one could argue that performance only improves if we model time series, interactions, regimes, etc. more carefully. However, ChatGPT’s suggestions proved useful, as we moved in the direction we asked for: reducing risk often comes with lower returns in financial markets.

Execution exercise

So far so good: our strategy was relatively good from the beginning, and ChatGPT helped us improve the risk profile. From my perspective, lowering the maximum drawdown by roughly 10 percentage points is a considerable result, given our simplistic way of implementing components and the full trust we are giving to ChatGPT. But it all depends on your targets and preferences: in this case, we had to sacrifice returns to achieve our objective.

The next question is: can ChatGPT help improve the execution of our strategy? Although it trades on a daily basis, and as a retail trader we probably do not have to worry about liquidity, one might still want to enter or exit trades with some more sophisticated model, for example to reduce fees. I asked ChatGPT what would be the optimal execution strategy given our setup.

As a result, the AI suggested using VWAP in order to identify liquid or illiquid ETFs, and execute with limit orders or market orders depending on the current spread. It might be a simple idea, but for a complete stranger to execution strategies, it might be worth exploring.
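To give an idea of what such logic could look like, here is a minimal sketch. It is not the code ChatGPT produced: the function names and the 5 basis point threshold are hypothetical, the bid and ask are assumed to come from a broker feed, and since the rule linking VWAP to liquidity was not spelled out, only the VWAP calculation itself is shown.

```python
from typing import List


def vwap(prices: List[float], volumes: List[float]) -> float:
    """Volume-weighted average price over the given bars."""
    return sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)


def choose_order_type(bid: float, ask: float, max_spread_bps: float = 5.0) -> str:
    """Market order when the quoted spread is tight, limit order otherwise.

    `max_spread_bps` is a hypothetical threshold in basis points of the mid price.
    """
    mid = (bid + ask) / 2.0
    spread_bps = (ask - bid) / mid * 1e4
    return "market" if spread_bps <= max_spread_bps else "limit"


print(vwap([10.0, 20.0], [1.0, 3.0]))
print(choose_order_type(bid=99.99, ask=100.01))  # about 2 bps wide
print(choose_order_type(bid=99.50, ask=100.50))  # about 100 bps wide
```

The reasoning behind the rule is that crossing a tight spread costs little, while with a wide spread a limit order avoids paying it at the risk of not being filled.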

In this case we cannot truly run a backtest, as I am testing on candle data from Yahoo, and no spread or tick-level price can be sourced. However, this is the sample code that ChatGPT proposed:

Generally speaking, some brokers and platforms (including, for example, MetaTrader) expose the tick size, minimum and maximum quantity, and other validation parameters through their APIs, so they can be sourced directly in Python (using libraries or basic JSON functions). I normally develop production modules by sourcing the relevant information for live trading. ChatGPT can help you while coding, if you provide it with specific requirements, like the input and output shape.

Strategy modules: overview of the final picture

The general outline of our strategy is as follows:

● Signal: a momentum signal, calculated quarterly, expressed as the return percentage over the period.
● Asset allocation: a ranking system which assigns weights proportional to the momentum rank.
● Risk management: we tested a drawdown stop loss, but it was not effective. When we instead tried a macro forecasting component, we managed to reduce risk and drawdown significantly, at the cost of a reduced return.
● Execution strategy: ChatGPT suggested using VWAP to identify the optimal execution system, and also monitoring the spread to use limit or market orders depending on current conditions.

Final considerations

ChatGPT is definitely a useful tool and, with some additional context and maybe more specialized training, it might help even more in creating and testing strategies. For example, imagine feeding it the results of the backtests automatically and asking it about the performance. We should be careful about overfitting, that is true, but it is also true that we can test multiple ideas and limit, for example, the similarity of those ideas. In addition, the intended use is not to optimize parameters, which easily leads to false discoveries, but rather to see how our assumptions and our combination of data and tools behave in an example.

Language is not everything in finance, and LLMs are the tip of the iceberg of a quant’s needs. But they are a great starting point, and using the tools we have at our disposal is the only way to innovate and improve what we know.

Update on strategies

To conclude, let us monitor what happened to our existing strategies. I am also going to add ChatGPT’s strategy, and next time I will ask how to improve our portfolio (if needed) by providing the latest performance metrics. Needless to say, our AI strategy is a low-risk one.

Updates as of 11/05/23:

Trading strategy is based on the author's views and analysis as of the date of first publication. From time to time the author's views may change due to new information or evolving market conditions. Any major updates to the author's views will be published separately in the author's weekly commentary or a new deep dive.

This content is for educational purposes only and is NOT financial advice. Before acting on any information you must consult with your financial advisor.