I’m currently a postdoctoral researcher at Leiden University working on knowledge representation and action control models for the RoboHow project, a large EU-funded project to make robots do everyday human tasks such as cooking breakfast. That is, I’m teaching robots how to make pancakes. While the majority of the researchers on RoboHow are roboticists using structured knowledge representations, I am one of the few “psychologists” on the team, charged with making the system more cognitively plausible. As a modeler coming from the dynamical systems stronghold of Indiana University, I see it as my job to propose control systems that are more flexible, adaptive, and generalizable than hand-built representations. One prong of our approach is a model for learning to perform sequential actions (e.g., piano playing, recipe following) that uses real-time spiking neural networks, in contrast to discrete-time recurrent networks (e.g., Botvinick and Plaut, 2004) or hand-built representations organized in interactive-activation networks (e.g., Cooper and Shallice, 2000). Below is a short description of the type of network I will use; the full model specification is in progress. I may write a post on implementation at some point (currently in Brian, but I may switch to a blazingly fast simulator written by fellow conspirator Richard Veale).
Reservoirs, e.g. liquid state machines, nonlinearly transform input so that stimuli become linearly separable (i.e., easily classified). Stimuli (e.g., an agent’s own recent action, a particular perceptual input, or a task context) can then be recognized invariantly across time and other dimensions.
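To make the separability idea concrete, here is a minimal toy demo (my own illustration, not RoboHow code): XOR is not linearly separable in its raw 2-D form, but a fixed random nonlinear projection into a higher-dimensional space makes it trivially separable by a linear readout. All sizes and weights here are arbitrary.

```python
import numpy as np

# Toy demo: a fixed random nonlinear projection (the essence of a
# reservoir) makes XOR linearly separable for a simple linear readout.
rng = np.random.default_rng(0)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
y = np.array([0., 1., 1., 0.])                          # XOR labels

# Fixed random weights: 2 inputs -> 50 nonlinear features (never trained)
W_in = rng.normal(size=(2, 50))
b = rng.normal(size=50)
H = np.tanh(X @ W_in + b)      # nonlinear high-dimensional expansion

# Linear readout trained in one shot by least squares
w, *_ = np.linalg.lstsq(H, y, rcond=None)
pred = (H @ w > 0.5).astype(float)
print(pred)  # matches y: the expanded features are linearly separable
```

Only the readout weights `w` are learned; the random projection stays fixed, which is exactly the division of labor between liquid and readout described above.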
In traditional neural networks, learning (i.e., adjusting weights) takes many iterations over training examples, the training algorithms are prone to overfitting (additional examples often cause forgetting of earlier ones), and they are not biologically plausible. Moreover, the output has a static, discrete interpretation: perceptual input causes activity in the network, which eventually settles on a particular output. Thus, traditional NNs do not give continuous, dynamic output based on changing perceptual input.
In contrast, the more realistic spiking neuron models used in liquid state machines (LSMs) accept real-time input and give continuous output, which can be interpreted in many ways (e.g., probabilities of different stimuli, estimates of perceptual parameters, or actions to be taken). LSMs are organized more like the human cortex (a 3D grid with small-world connectivity), and the only learning that takes place is biologically inspired synaptic plasticity to regularize inputs. Because the network is not significantly modified for specific tasks, the same network—essentially a high-dimensional kernel that naturally integrates temporal and spatial information—can be used for many different tasks at the same time.
Basically, an LSM is a pool of randomly interconnected neurons: a decaying memory whose present pattern of activation is a function of its past inputs (and their time/sequence of arrival). This liquid can be read by simple classifiers (e.g., perceptrons) that are trained for specific tasks, such as recognizing when to take an action (or whether an action has been taken), or whether a given stimulus (or conjunction of stimuli) is present in the environment. Like other kernel-based classifiers, the LSM need only have more dimensions than the input, and be able to nonlinearly transform the input to make it linearly classifiable. Unlike other kernels, a liquid carries out transformations along both temporal and spatial dimensions in continuous time, allowing it to recognize, say, sloooooowly spoken words at some point in time (or ambiguous words: BLack vs. BLood). Thus, LSMs will show recognition and confusion unfolding over time much like humans do.
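The “decaying memory” property can be sketched with a simplified rate-based reservoir (an echo-state-style stand-in for the spiking liquid; all sizes and constants below are illustrative, not taken from the model described here). The point is that the pool’s current state still carries a trace of an input pulse several steps after it has ended.

```python
import numpy as np

# Sketch of a liquid's fading memory, using a rate-based reservoir as a
# simplified stand-in for a spiking pool. Sizes/constants are arbitrary.
rng = np.random.default_rng(1)

N = 100                                   # neurons in the pool
W = rng.normal(size=(N, N)) / np.sqrt(N)  # random recurrent weights
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale for fading memory
W_in = rng.normal(size=N)                 # random input weights
leak = 0.3                                # leak rate: how fast memory decays

def run(inputs):
    """Drive the pool with a 1-D input sequence; return the final state."""
    x = np.zeros(N)
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W @ x + W_in * u)
    return x

# Identical recent input (two silent steps), different histories:
a = run([1.0, 0.0, 0.0])   # an early pulse, then silence
b = run([0.0, 0.0, 0.0])   # silence throughout
print(np.linalg.norm(a - b))  # nonzero: the pool still "remembers" the pulse
```

A readout looking at the pool’s state can therefore distinguish histories, not just the current input, which is what makes sequence recognition possible.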
Training readout classifiers is fast (linear regression) and allows researchers to ask what computations are performed by the liquid at what point(s) in time. Readouts can also be fed back into the liquid, letting us see more transparently which inputs trigger activation of the next action. Liquids have been used to control robot arms, to classify spoken digits, and to predict changing visual input (moving dots and lines).
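As a hedged illustration of how cheap that training step is, the sketch below fits a linear readout by ordinary least squares to tell two temporal patterns apart from a frozen random reservoir’s states. Again this uses a rate-based simplification of a spiking liquid; none of the names, stimuli, or sizes come from the actual model.

```python
import numpy as np

# One-shot readout training: plain least squares on frozen-reservoir
# states, classifying a rising vs. a falling input ramp. Illustrative only.
rng = np.random.default_rng(2)

N, leak = 100, 0.3
W = rng.normal(size=(N, N)) / np.sqrt(N)
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # fading-memory scaling
W_in = rng.normal(size=N)

def states(inputs):
    """Collect the reservoir state after each input step."""
    x, out = np.zeros(N), []
    for u in inputs:
        x = (1 - leak) * x + leak * np.tanh(W @ x + W_in * u)
        out.append(x.copy())
    return np.array(out)

# Two noisy "stimuli": a rising ramp vs. a falling ramp, 20 trials each
ramps = [np.linspace(0, 1, 20), np.linspace(1, 0, 20)]
X, y = [], []
for label, ramp in enumerate(ramps):
    for _ in range(20):
        s = states(ramp + 0.05 * rng.normal(size=20))
        X.append(s[-1])          # read out the final liquid state
        y.append(label)
X, y = np.array(X), np.array(y)

# Training the readout is a single linear regression: no backprop,
# no iteration, and the reservoir itself is never modified.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
acc = np.mean((X @ w > 0.5) == y)
print(acc)  # training accuracy of the linear readout
```

Because only `w` is task-specific, many such readouts (one per action, stimulus, or conjunction) can share the same liquid, as described above.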