DeepCox - Stock Prediction

A Study on Stock Trading Strategies Based on Multi-Agent Reinforcement Learning

Canhui Ruan, Jianfei Zhang^ℂ,♇

^ℂCoding Contributor ^♇Project Leader

Abstract

Using the 30 constituent stocks of the Dow Jones Industrial Average as empirical targets, a multidimensional state space was designed incorporating technical indicators such as MACD, RSI, CCI, and ADX. Five mainstream algorithms (PPO, A2C, DDPG, SAC, and TD3), along with an ensemble strategy, were fully implemented and compared. An action threshold conversion mechanism was developed to map continuous action spaces to practical discrete trading signals. The system covers the entire process: data acquisition, feature engineering, policy training, and real-time recommendation.

The figure illustrates the data partitioning of AAPL stock opening prices, as well as the rate of return curves of various reinforcement learning algorithms on this dataset.

Top-left plot: Shows the opening prices of AAPL stock from 2020 to 2025, divided into Training, Validation, and Test phases, which are used to train, validate, and evaluate the performance of the models. The Training phase covers 2020–2023, the Validation phase covers 2024, and the Test phase covers 2025.
Top-right plot: Shows the rate of return curves of various reinforcement learning algorithms during the Training phase, evaluating their learning capabilities on the training data.
Bottom-left plot: Shows the rate of return during the Validation phase, measuring the generalization ability of the algorithms on unseen data.
Bottom-right plot: Shows the rate of return during the Test phase, used to simulate the feasibility and stability of these strategies in real-world deployment.
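
The date-based partitioning described for the top-left plot can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `PHASE_BY_YEAR` and `phase_for` are hypothetical, and the sketch assumes the split falls exactly on calendar-year boundaries, as the phase description suggests.

```python
from datetime import date

# Assumed split, taken from the phase description:
# 2020-2023 for Training, 2024 for Validation, 2025 for Test.
PHASE_BY_YEAR = {
    2020: "Training",
    2021: "Training",
    2022: "Training",
    2023: "Training",
    2024: "Validation",
    2025: "Test",
}


def phase_for(day: date) -> str:
    """Return the phase a trading day belongs to (hypothetical helper)."""
    try:
        return PHASE_BY_YEAR[day.year]
    except KeyError:
        raise ValueError(f"{day} is outside the 2020-2025 study window")


# Example: group a few illustrative dates by phase.
days = [date(2021, 3, 15), date(2024, 7, 1), date(2025, 2, 3)]
phases = [phase_for(d) for d in days]
```

Splitting strictly by time, rather than shuffling rows, keeps every validation and test observation later than every training observation.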

The Training and Test phases were run with 10,000 training steps, while the Validation phase was run with 1,000 steps. By comparing the performance of PPO, A2C, DDPG, SAC, TD3, and an ensemble strategy (Ens.), their applicability and robustness in different phases can be systematically evaluated.

The figure shows the next-day trading recommendations for the 30 constituent stocks of the Dow Jones Industrial Average, as generated by the Ensemble agent under the current market conditions. The vertical axis represents the action values output by the model, and the horizontal axis lists the corresponding stock tickers. For interpretability, the action values are divided into three intervals: values above 0.1 indicate a Buy signal, values below −0.1 indicate a Sell signal, and values in between (from −0.1 to 0.1) indicate a Hold recommendation.
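
The three-interval rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's implementation: the function name `action_to_signal`, the `threshold` parameter, and the sample action values are all hypothetical, and boundary values of exactly ±0.1 are treated as Hold because the text says "above" and "below".

```python
def action_to_signal(action: float, threshold: float = 0.1) -> str:
    """Map a continuous action value to a discrete trading signal.

    Values strictly above +threshold are Buy, values strictly below
    -threshold are Sell, and everything else is Hold.
    """
    if action > threshold:
        return "Buy"
    if action < -threshold:
        return "Sell"
    return "Hold"


# Made-up action values for three tickers, for illustration only.
actions = {"AAPL": 0.35, "MSFT": -0.22, "KO": 0.04}
signals = {ticker: action_to_signal(a) for ticker, a in actions.items()}
```

A symmetric dead zone around zero filters out small action values, so only outputs of larger magnitude produce a Buy or Sell recommendation.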
