SARSA algorithm

I am having trouble understanding the SARSA algorithm:

In particular, when updating the Q value, what is gamma? And what values are used for s(t+1) and a(t+1)?

Can someone explain this algorithm to me?



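Gamma is the discount factor: it determines how much future rewards count relative to immediate ones. With gamma = 0.0 the algorithm only learns from the immediate reward; with gamma = 1.0 future rewards count as much as immediate ones. How much memory your algorithm has is set by a separate parameter, the learning rate alpha. If you set alpha to 0.0, then your algorithm will not update the value function Q at all. If you set it to 1.0, then each new experience completely replaces the previous estimate. The best values lie in between and have to be determined experimentally.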
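To make the roles of the two parameters concrete, here is a minimal sketch of a single update, assuming the standard SARSA rule Q(s,a) <- Q(s,a) + alpha*(r + gamma*Q(s',a') - Q(s,a)), where alpha is the learning rate and gamma the discount factor. The function name `sarsa_update` and all the numbers are made up for illustration.

```python
def sarsa_update(q_sa, reward, q_next, alpha, gamma):
    """Return the new estimate of Q(s, a) after one SARSA update.

    q_sa   -- current estimate Q(s, a)
    q_next -- current estimate Q(s', a') for the next state and action
    """
    td_error = reward + gamma * q_next - q_sa
    return q_sa + alpha * td_error

# Hypothetical numbers: Q(s,a) = 2.0, reward = 1.0, Q(s',a') = 4.0
no_learning = sarsa_update(2.0, 1.0, 4.0, alpha=0.0, gamma=0.5)  # 2.0: alpha = 0 never changes Q
myopic = sarsa_update(2.0, 1.0, 4.0, alpha=0.5, gamma=0.0)       # 1.5: gamma = 0 ignores Q(s',a')
typical = sarsa_update(2.0, 1.0, 4.0, alpha=0.5, gamma=0.5)      # 2.5: both terms contribute
```

With alpha = 0.0 the estimate stays put no matter what gamma is, while gamma only changes how much of Q(s',a') enters the target.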

Here is how it works:

  • In your first step, you just get a state. Simply store it as s(t). Then use your policy (for example, epsilon-greedy on the current Q values) to choose an action for this state and store it as a(t).
  • In each subsequent step, you get r(t+1) and s(t+1). Again, use your policy to choose an action, a(t+1). The error for the transition from your previous action to the new one is r(t+1) + gamma*Q(s(t+1),a(t+1)) - Q(s(t),a(t)). Multiply this error by the learning rate alpha and add it to Q(s(t),a(t)), your long-term estimate of the previous action's value. Finally, store s(t+1) and a(t+1) as s(t) and a(t) for the next step.
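
The two steps above can be sketched as a complete loop. This is only an illustration under stated assumptions: the five-state corridor environment, the epsilon-greedy policy, the parameter values, and every name here (`step`, `choose_action`, `run_sarsa`) are hypothetical and not part of the question. The value of the terminal state is taken to be 0.

```python
import random
from collections import defaultdict

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical toy task)
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment: reward 1.0 on reaching the last state, else 0.0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def choose_action(Q, state, epsilon, rng):
    """Epsilon-greedy: usually the best-known action, sometimes a random one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])

def run_sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)], starts at 0.0
    for _ in range(episodes):
        s = 0                                   # first step: observe s(t) ...
        a = choose_action(Q, s, epsilon, rng)   # ... and choose a(t)
        done = False
        while not done:
            s_next, r, done = step(s, a)                     # r(t+1), s(t+1)
            a_next = choose_action(Q, s_next, epsilon, rng)  # a(t+1)
            q_next = 0.0 if done else Q[(s_next, a_next)]    # terminal value is 0
            Q[(s, a)] += alpha * (r + gamma * q_next - Q[(s, a)])
            s, a = s_next, a_next               # shift t+1 -> t for the next step
    return Q

Q = run_sarsa()
```

Note that a(t+1) is the action the agent will actually take next, which is what makes SARSA on-policy; after training, Q[(3, +1)] should be larger than Q[(3, -1)] because moving right from state 3 ends the episode with reward 1.0.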

In effect, the value function is just a running average, kept for each action in every state, of the targets r(t+1) + gamma*Q(s(t+1),a(t+1)) seen so far.
