Coding Contributor, Project Leader
Using the 30 constituent stocks of the Dow Jones Industrial Average as empirical targets, a multidimensional state space was designed that incorporates technical indicators such as MACD, RSI, CCI, and ADX. Five mainstream algorithms (PPO, A2C, DDPG, SAC, and TD3), along with an ensemble strategy, were fully implemented and compared. An action threshold conversion mechanism was developed to map continuous action spaces to practical discrete trading signals. The system covers the entire process: data acquisition, feature engineering, policy training, and real-time recommendation.
The figure illustrates the data partitioning of AAPL stock opening prices, as well as the rate of return curves of various reinforcement learning algorithms on this dataset.
The Training and Test phases were run with 10,000 training steps, while the Validation phase was run with 1,000 steps. By comparing the performance of PPO, A2C, DDPG, SAC, TD3, and an ensemble strategy (Ens.), their applicability and robustness in different phases can be systematically evaluated.
The figure shows the next-day trading recommendations for the 30 constituent stocks of the Dow Jones Industrial Average, as generated by the Ensemble agent under current market conditions. The vertical axis represents the action values output by the model, and the horizontal axis lists the corresponding stock tickers. For interpretability, the action values are divided into three intervals: values above 0.1 indicate a Buy signal, values below −0.1 indicate a Sell signal, and values in between indicate a Hold recommendation.
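The threshold rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names `action_to_signal` and `recommend` are hypothetical, the example action values are made up, and treating values exactly equal to ±0.1 as Hold is an assumption based on the wording "above 0.1" and "below −0.1".

```python
from typing import Dict

# Thresholds taken from the text: above 0.1 is Buy, below -0.1 is Sell.
BUY_THRESHOLD = 0.1
SELL_THRESHOLD = -0.1


def action_to_signal(action: float,
                     buy: float = BUY_THRESHOLD,
                     sell: float = SELL_THRESHOLD) -> str:
    """Map one continuous action value to a discrete trading signal.

    Assumption: values exactly on a threshold count as Hold.
    """
    if action > buy:
        return "Buy"
    if action < sell:
        return "Sell"
    return "Hold"


def recommend(actions: Dict[str, float]) -> Dict[str, str]:
    """Apply the threshold rule to a ticker -> action value mapping."""
    return {ticker: action_to_signal(value) for ticker, value in actions.items()}


# Hypothetical action values, for illustration only (not model output).
example_actions = {"AAPL": 0.42, "MSFT": -0.03, "IBM": -0.57}
recommendations = recommend(example_actions)
```

Keeping the thresholds as parameters makes it straightforward to widen or narrow the Hold band without touching the mapping logic.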