The Influence of miniGPT Parameters on Training Results – Stray Birds

In this model training experiment, we ran a systematic sweep over different parameter combinations: hidden layer size (H), dropout ratio (dp), and learning rate (lr). This time the dataset is replaced with Stray Birds. Based on the experimental results, we analyze how each parameter affects training and offer recommended configurations for reference. One caveat: because the timeout was set too short, most runs exited after only a few epochs rather than training for many rounds as in the earlier single-sentence text experiment. Between hardware limitations (a 4070 should not be the bottleneck, though the small dataset may leave the GPU largely idle) and time constraints (the sweep ran for about a week before ending), I did not cover every planned parameter combination and did not obtain fully satisfactory results. If you are interested, I recommend optimizing the training script for further study.

1. Experimental Design

  • Dataset: Stray Birds (about 800 lines)
  • Model Architecture: Simplified Transformer/RNN, with optional hidden layer sizes H of 64, 128, 256, 512
  • Hardware: Single RTX 4070, CPU i5-13600K, 32GB memory
  • Training parameter sweep:
    • Hidden layer H: 64/128/256/512
    • Dropout dp: 0.0 / 0.1 / 0.2 / 0.3
    • Learning rate lr: 0.0001 / 0.0005 / 0.001
  • Training limitations: some experiments did not fully converge because of the per-run timeout.
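The grid above can be sketched as a simple Python generator. This is only an illustration of the sweep design; the config key names are mine, not the repository's actual schema:

```python
from itertools import product

# The swept values from the experimental design above.
HIDDEN_SIZES = [64, 128, 256, 512]
DROPOUTS = [0.0, 0.1, 0.2, 0.3]
LEARNING_RATES = [0.0001, 0.0005, 0.001]

def sweep_configs():
    """Yield every (H, dp, lr) combination in the grid."""
    for h, dp, lr in product(HIDDEN_SIZES, DROPOUTS, LEARNING_RATES):
        yield {"hidden_size": h, "dropout": dp, "lr": lr}

configs = list(sweep_configs())
print(len(configs))  # 4 * 4 * 3 = 48 combinations
```

With a per-run timeout, the full grid of 48 runs is exactly why a week was not enough to finish everything.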

2. Comparison of Experimental Results

| Hidden Layer H | Dropout dp | Learning Rate lr | Final Loss | Result | Remarks |
|---|---|---|---|---|---|
| 64 | 0.0 | 0.001 | 1.207 | Moderate convergence | Learning rate relatively high; loss stable at 1.x |
| 64 | 0.1 | 0.0005 | 0.981 | Stable convergence | |
| 64 | 0.2 | 0.0001 | 1.310 | Underfitting | Learning rate too low; not enough learned |
| 128 | 0.0 | 0.001 | 1.692 | Underfitting/oscillation | Learning rate too high; training unstable |
| 128 | 0.2 | 0.001 | 0.075 | Overfitting | Very low loss; almost memorized the training set |
| 128 | 0.3 | 0.0005 | 0.990 | Stable convergence | Good generalization; recommended combination |
| 256 | 0.0 | 0.001 | 0.192 | Severe overfitting | Loss close to 0; rote memorization |
| 256 | 0.1 | 0.0005 | 1.105 | | |
| 256 | 0.2 | 0.0001 | 1.432 | Underfitting | Learning rate too low; stops at high loss |
| 512 | 0.2 | 0.0001 | 0.573 | Good convergence | Large model capacity; long training time |
| 512 | 0.3 | 0.0005 | | Moderate convergence | Stable, but time-consuming |


3. Parameter Analysis

3.1 Impact of Learning Rate

  • lr=0.001: Training is prone to oscillation or overfitting, especially when the hidden layer is large.
  • lr=0.0005: Training is the most stable; the loss usually lands in the 0.9-1.1 range, and the generated text generalizes well.
  • lr=0.0001: Convergence is slow and tends to stall at a high loss, i.e. underfitting.
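The three behaviors above (oscillation, stable convergence, stalling) can be reproduced in miniature with plain gradient descent on a badly-scaled quadratic. This is only an analogy for the sweep, not the actual loss curves:

```python
# Toy illustration (not miniGPT itself): gradient descent on the badly-scaled
# quadratic f(x) = 1000 * x**2, whose gradient is 2000 * x. The update
# x -= lr * grad multiplies x by (1 - 2000 * lr), so the three sweep
# learning rates behave very differently.

def descend(lr, steps=50, x=1.0):
    """Run `steps` gradient-descent updates and return the final |x|."""
    for _ in range(steps):
        x -= lr * 2000 * x  # gradient of 1000 * x**2 is 2000 * x
    return abs(x)

print(descend(0.001))   # factor -1: oscillates forever, |x| stays at 1.0
print(descend(0.0005))  # factor 0: jumps straight to the minimum
print(descend(0.0001))  # factor 0.8: slow but steady decay toward 0
```

The constants here were chosen to make each regime obvious; real training losses oscillate or stall for the same reason, just less cleanly.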

3.2 Dropout Function

  • dp=0.0: The loss often drops to 0.1 or below; the model essentially memorizes the training set, and generalization suffers.
  • dp=0.2-0.3: The loss is stable at 0.9-1.1, resulting in more natural and generalizable results.
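For reference, a minimal inverted-dropout sketch in plain Python, assuming miniGPT uses standard inverted dropout (the scheme behind `torch.nn.Dropout`):

```python
import random

def dropout(values, p, training=True):
    """Zero each value with probability p; rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # identity at inference time or with dp=0.0
    keep = 1.0 - p
    return [v / keep if random.random() < keep else 0.0 for v in values]
```

At dp=0.0 this is a no-op, which matches the memorization-prone runs above; at dp=0.2-0.3 roughly a fifth to a third of activations are zeroed on each training pass.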

3.3 Model Size H

  • H=64: The model is too small, with limited expressive ability, and the training effect is average.
  • H=128-256: Moderate capacity, able to converge without overfitting.
  • H=512: Strong expressive ability but long training time; some runs completed only one round before the timeout, so the loss had not fully converged.
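One reason H=512 trains so slowly is that per-block parameter count grows quadratically with H. A rough estimate, assuming a standard transformer block (four attention projections plus a 4x feed-forward, ignoring biases, layer norms, and embeddings):

```python
def block_params(h, mlp_ratio=4):
    """Rough parameter count of one transformer block of width h:
    4*h*h for the Q, K, V, and output projections, plus
    2 * mlp_ratio * h * h for the two feed-forward matrices."""
    return 4 * h * h + 2 * mlp_ratio * h * h

for h in (64, 128, 256, 512):
    print(h, block_params(h))
```

By this estimate the H=512 block carries 64x the parameters of the H=64 block, so the "long training time" above is expected.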

3.4 Timeout Issue

  • The timeout of a single round of training resulted in some experiments not converging, especially for the combination of large models and small learning rates.
  • Solution strategy:
    • Shorten sequence length and reduce computational complexity
    • Increase learning rate and accelerate convergence
    • Limit the sweep parameter space and avoid combining large models with small LR
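Beyond the strategies above, the timeout itself can be made non-destructive by capping wall-clock time per run and checkpointing before exiting. A sketch, where `train_step` and `save_checkpoint` are placeholders rather than real miniGPT functions:

```python
import time

def train_with_budget(train_step, save_checkpoint, budget_s=60.0, max_steps=10_000):
    """Run training steps until max_steps or a wall-clock budget expires."""
    deadline = time.monotonic() + budget_s
    for step in range(max_steps):
        if time.monotonic() >= deadline:
            save_checkpoint(step)  # persist progress instead of losing it
            return step            # steps completed before the budget ran out
        train_step(step)
    save_checkpoint(max_steps)
    return max_steps
```

With this pattern, a large-model/small-lr run that hits the budget can resume from its checkpoint in the next run instead of restarting from scratch.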

4. Recommended Configuration

  • Best configuration combination:
    • Hidden layer H: 128 or 256
    • Dropout dp:0.2–0.3
    • Learning rate lr: 0.0005
  • Training effect: the loss stabilizes at 0.9-1.1, the model does not memorize by rote, and the generated text is more natural and generalizes well.
  • Precautions:
    • An extremely low loss is not an ideal result; it usually indicates overfitting
    • Too large a model or insufficient training time can lead to insufficient convergence of the experiment
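The recommendations above, written out as a config plus a sanity check derived from the sweep. The key names are illustrative, not the repository's actual config schema:

```python
# Recommended configuration from this post's sweep (key names are illustrative).
RECOMMENDED = {
    "hidden_size": 128,  # 128 or 256 both worked; 128 trains faster
    "dropout": 0.25,     # anywhere in the 0.2-0.3 band
    "lr": 0.0005,        # the most stable learning rate in the sweep
}

def sanity_check(cfg):
    """Flag the combinations this post found problematic."""
    warnings = []
    if cfg["dropout"] < 0.1:
        warnings.append("dropout < 0.1 risks memorizing the training set")
    if cfg["lr"] >= 0.001 and cfg["hidden_size"] >= 128:
        warnings.append("lr=0.001 with H>=128 oscillated or overfit")
    return warnings

print(sanity_check(RECOMMENDED))  # []
```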

[Figure: partial configuration training results]

5. Conclusion

  1. Moderate capacity + moderate dropout is the best choice with a limited corpus, balancing style preservation and novelty.
  2. Too large a capacity or too small a dropout easily leads to overfitting and mechanical, memorized text.
  3. Insufficient training rounds limit the potential of large models; even with large capacity, they may occasionally generate disjointed sentences.
  4. Future optimization direction:
    • Increase the number of training rounds to fully converge the large model
    • Introduce data augmentation (such as poetry restructuring or synonym rewriting)
    • Try a hybrid capacity model to achieve a balance between speed and generation quality
https://github.com/spikec137/miniGPT

The corresponding repository versions for this post are 1.9-2.3.

Published on September 14, 2025.
