Chapter 48: One God Leads to Four Pitfalls!
The air conditioner hissed cold air from the vents, but it couldn't dispel the heat that had built up in the room.
Three professors and five senior students, eight people in total, had all sixteen of their eyes fixed on Lu Feng.
Everyone seemed to be waiting for Lu Feng to give the results, while Lu Feng felt goosebumps all over from being stared at.
Ye Guodong was the first to break the silence, adjusting his glasses.
"It's theoretically feasible, but... can you express it using a concrete model?"
"For example, formulas or simple models."
"Of course," Lu Feng replied readily.
Are you kidding? If I can say it, of course I can back it up.
Professor Xu Yun subconsciously pulled a brand new whiteboard marker from the pen holder and handed it over.
Lu Feng took the pen, removed the cap, and turned to walk towards the huge white writing board.
The five third-year students unconsciously straightened their backs and leaned slightly forward.
Lu Feng did not write immediately; instead, he first drew a simple two-dimensional grid in the upper left corner of the whiteboard.
"First, let's define the state space S." His voice carried through the quiet computer lab, clear and steady, holding the attention of everyone in the room.
"We divide the entire city into N regions, and each region represents a state."
"The taxi's state 's' is its current grid number 'i'."
The pen tip swept across the whiteboard, leaving behind lines of simple mathematical symbols.
S = {s₁, s₂, ..., sN}
"Then, there is the action space A. For any state s, the taxi's action a is to choose to drive to an adjacent area j."
A(s) = {a₁, a₂, ..., aₖ}
"The most crucial part is the reward function R." Lu Feng's pen speed increased, and lines of formulas began to spread on the whiteboard.
"The reward R(s, a) consists of two parts. The first part is the probability P of picking up a passenger in the new area j after performing action a, multiplied by the expected revenue E of this order."
"The second part is the cost of driving without a load, C, which includes time costs and fuel costs."
R(s, a) = P(s, a) × E(s, a) − C(s, a)
Professor Sun Yi from the School of Computer Science nodded repeatedly, completely captivated by Lu Feng's ideas.
This modeling approach is simple, elegant, and highly computable.
"Next, we need to find the optimal strategy π*." Lu Feng turned around and looked at the five senior students who were already somewhat stunned.
"Traditional dynamic programming requires knowing the complete state transition probabilities, but we don't have that, so we use Q-learning."
He didn't give anyone a chance to ask questions, and directly wrote down the most core equation in the field of reinforcement learning in the center of the whiteboard.
Q(s, a) ← Q(s, a) + α[R + γ maxₐ′ Q(s′, a′) − Q(s, a)]
"We don't need to know the global map, nor do we need to predict the future. Each car is independent; it only needs to choose the action that brings the maximum Q value based on its current Q table."
"Every successful passenger transport is a positive reward."
"Each long period of idle time is a negative penalty. Through iterative training with massive amounts of real operational data, this Q-table will continuously converge and eventually approach the optimal solution."
As Lu Feng spoke, he added the physical meaning of each parameter next to the equation.
Learning rate α, discount factor γ, state s, action a.
Li Hao and Wang Zhe from the School of Computer Science exchanged a glance, each seeing undisguised shock in the other's eyes.
They hadn't expected Lu Feng to be so accomplished in mathematics.
In no time, the entire huge whiteboard was filled with writing.
From the discretization of the state space to the design of the reward function, and then to the iterative update rule for the Q-table, it formed a complete and self-consistent logical loop.
Ye Guodong could no longer suppress his smile. He turned to look at Xu Yun and Sun Yi beside him, his expression as if to say: Look, look what treasure I've found!
Professor Xu Yun nodded repeatedly, while Sun Yi took out his phone and took several photos of the whiteboard covered with formulas from different angles.
The five students had completely given up trying to keep up; this wasn't something they could digest in a short time.
After writing the last character, Lu Feng capped his pen and casually placed it on the lectern.
He turned around and saw the three professors winking and grinning like they'd won the lottery.
Hey, could you at least let me go back and sit down first?
Three big shots are standing here, five senior students are sitting there, and I'm the only one standing in the middle. What's going on here? Is this a public execution?
Ye Guodong seemed to notice Lu Feng's gaze, and he immediately suppressed his smile, cleared his throat, and tried to regain his dignity as a professor.
"Cough cough...very good." His voice still carried a hint of excitement that he hadn't completely suppressed.
"This idea is very novel, logically sound, and highly feasible."
He turned to the five students who were still in a petrified state.
"To be honest, when this problem was in the national competition, no team thought of using reinforcement learning to solve it. The best result was just a genetic algorithm with a time window, which was complex and the solution accuracy was very average."
These words undoubtedly gave Lu Feng's plan an official endorsement.
Lu Feng nodded to the teachers and silently walked back to his seat.
As soon as he sat down, Chen Jing leaned over, her voice carrying a hint of uncertainty and a desire for advice.
"Lu Feng... junior, what data structure did you use to store that Q table? If the state space is too large, wouldn't a two-dimensional array cause a memory overflow?"
"Use a hash table," Lu Feng replied without looking up. "The key is a tuple of state and action, and the value is the Q-value. This way, only the accessed states are stored, which can save a lot of space."
It suddenly clicked for Chen Jing, and she nodded repeatedly.
So that's how it works.
Ye Guodong clapped his hands in satisfaction as he watched his team members spontaneously begin exchanging technical information.
"Alright, the plan is clear. Next is a month-long intensive training." He looked around at everyone and raised his voice a few decibels.
"If you have any questions during this time, you can come to the three of us anytime."
"Whatever equipment or materials you need, the school will provide them regardless of cost. I have only one request this time."
He paused for a moment, his gaze finally settling on Lu Feng.
"Bring the national award back to me!"
Just as he finished speaking, the bell signaling the end of class rang. The crisp sound echoed through the quiet computer lab.
Ye Guodong waved his hand.
"Alright, I bet everyone's hungry, let's go eat."
"Of all things, eating is the most important."