Archives


Many machine learning algorithms are trained on a large dataset, and if you later wish to add more data, the algorithm must be rerun on the entire set from scratch. Even so-called biologically-inspired ones such as neural networks (which are incrementally update-able, though sometimes new data will cause you to unlearn old data…) generally require hundreds of iterations over the same set of data to gradually converge. Other models which can be incrementally updated are order-invariant: i.e., introducing data to the model in a different order will result in the same output. Often (e.g., in Bayesian/’rational’ models) this is seen as desirable (e.g., theoretically we don’t want the order of evidence presented in a court case to influence the jury’s decision), but it departs widely from human behavior. These deviations from order-invariance are often decried as fallacies (though see the heuristics view), but order effects in associative learning studies (e.g., blocking) can be understood to stem from shifting attention driven by familiarity, surprisal, and other factors–all of which make sense, given the constraints of in situ cognition. I have investigated and modeled order effects in associative learning (e.g., highlighting) as well as in word-learning. The advantage of a fast learning rate is that it allows us to quickly test and exploit correlations that we notice in our environment. I am currently working on modeling the starting small effect in grammar learning, and I’m generally interested in asking whether and when it can be advantageous (optimally–over time, or computationally) to learn via incremental updating instead of batch updating.
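To make the order-invariance distinction concrete, here is a toy sketch (my own illustration, not from any of the models above): an incremental mean with a 1/n step size exactly reproduces the batch answer regardless of order, while a constant learning rate overweights recent data, so order matters–the simplest possible "order effect."

```python
def batch_mean(data):
    """Batch computation: needs the whole dataset at once."""
    return sum(data) / len(data)

def incremental_mean(data):
    """Order-invariant online update: m += (x - m) / n reproduces the batch mean."""
    m, n = 0.0, 0
    for x in data:
        n += 1
        m += (x - m) / n
    return m

def constant_lr_mean(data, lr=0.3):
    """Constant learning rate: recent items count more, so order matters."""
    m = data[0]
    for x in data[1:]:
        m += lr * (x - m)
    return m

data = [1.0, 2.0, 3.0, 10.0]
assert incremental_mean(data) == batch_mean(data) == 4.0
assert incremental_mean(list(reversed(data))) == 4.0          # order-invariant
assert constant_lr_mean(data) != constant_lr_mean(data[::-1])  # recency effect
```

The constant-rate learner is the one that looks more human: it tracks a changing environment quickly, at the cost of letting presentation order shape its estimate.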

From my study of cognitive psychology, I find the following learning principles to be not only evident in humans, but desirable in any mind that must act now, even as it learns:

  • New stimuli are compared to existing representations, then either integrated directly into existing structures (if unsurprising, or a good fit) or used to generate a new cluster (if dissimilar to all existing things)
  • Repeated encounters with similar enough information should strengthen the representation, whereas…
  • One-off occurrences should be stored in episodic memory, but only make the transfer to semantic memory if additional similar stimuli come along to justify a new entry/’cluster’
  • There must be a way to periodically (developmentally and perhaps during sleep) reorganize mental representations: splitting/joining concepts; adding/removing/combining feature dimensions
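The first two principles can be sketched in a few lines. This is a deliberately crude toy of my own–the distance threshold and the running-average prototype update are arbitrary choices for illustration, not a claim about the right model:

```python
import math

def assimilate(clusters, stimulus, threshold=1.0):
    """Compare a stimulus to cluster prototypes; integrate it into the best
    match if close enough, otherwise start a new cluster.
    clusters: list of dicts with a 'prototype' vector and a 'count'."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best = min(clusters, key=lambda c: dist(c["prototype"], stimulus), default=None)
    if best is not None and dist(best["prototype"], stimulus) <= threshold:
        # Good fit: strengthen the existing representation (incremental mean).
        best["count"] += 1
        best["prototype"] = [p + (s - p) / best["count"]
                             for p, s in zip(best["prototype"], stimulus)]
    else:
        # Too dissimilar to everything known: create a new cluster.
        clusters.append({"prototype": list(stimulus), "count": 1})
    return clusters

clusters = []
for point in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (0.1, 0.0), (5.2, 4.9)]:
    assimilate(clusters, point)
# Two clusters emerge: one near the origin, one near (5, 5).
```

The third and fourth principles (episodic-to-semantic transfer, periodic reorganization) are exactly the parts this sketch leaves out–one-off outliers here immediately get their own cluster, with no episodic buffer and no later splitting or merging.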


For several years I’ve dreamed of building an AI that will continually update its knowledge (and underlying representations) by seeking out new information online. Toward this end, I have done research on BEAGLE (see papers here) and toyed around with other semantic space models trained on text corpora (e.g., LSA, LDA). I also believe that my research in word-learning–especially how infants grow their vocabulary (e.g., award-winning dissertation)–will inform these efforts, since building a semantic memory likely involves periodic changes and reorganization of the space/representations. However, there are also many practical concerns that make it difficult to build an online learner, including crawling and parsing webpages for meaningful text, having a flexible database for storing the distributed representations along with other features, and making the online system amenable to updating. Fortunately, there are open source packages that handle many of these tasks. I’ll keep an up-to-date list of my preferred (Python) pipeline:

  1. Web crawling and lemmatization: pattern (note on spelling: fancy)
  2. Text extraction: jusText
  3. Distributed representations: AL-BEAGLE 😉 or word2vec in gensim
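For the impatient, here is a standard-library stand-in for steps 2–3: strip tags from a page and build bag-of-context co-occurrence vectors–a crude cousin of what jusText plus gensim’s word2vec would produce for real (and with none of their boilerplate removal or dimensionality reduction):

```python
from html.parser import HTMLParser
from collections import defaultdict, Counter

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML page (no boilerplate filtering)."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)

def cooccurrence_vectors(text, window=2):
    """For each word, count the words appearing within `window` positions."""
    words = text.lower().split()
    vectors = defaultdict(Counter)
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                vectors[w][words[j]] += 1
    return vectors

html = "<html><body><p>robots make pancakes</p><p>robots learn tasks</p></body></html>"
vecs = cooccurrence_vectors(extract_text(html))
# 'robots' now has a context vector spanning both sentences.
```

Each word’s Counter is a sparse context vector; cosine similarity between two such vectors already gives a serviceable first pass at semantic similarity, which is the intuition the fancier models build on.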

For a while in grad school I used the list of joint winners of the Hugo and Nebula awards to guide my book choices, but I’ve finally read everything on it. Aside from recommendations from a few friends (esp. on GoodReads), I get very little input, since I rarely find time to read reviews or browse in bookstores. But I still have enough of a backlog to merit making a list (inspired in large part by this one, since it featured a few personal favorites such as The Stars My Destination), so here we go:

  • Dealing with Dragons  – Patricia C. Wrede
  • Little, Big – John Crowley
  • Never Let Me Go – Kazuo Ishiguro
  • The Drowned World – J.G. Ballard
  • Among Others – Jo Walton
  • Brown Girl in the Ring – Nalo Hopkinson
  • The Female Man – Joanna Russ
  • Zone One – Colson Whitehead
  • The City & The City – China Miéville
  • Gormenghast – Mervyn Peake
  • Dhalgren – Samuel R. Delany
  • Ubik – Philip K. Dick


I’m currently a postdoctoral researcher at Leiden University working on knowledge representation and action control models for the RoboHow project, a large EU-funded project to make robots do everyday human tasks such as cooking breakfast. That is, I’m teaching robots how to make pancakes. While the majority of the researchers on RoboHow are roboticists using structured knowledge representations, I am one of the few “psychologists” on the team, charged with making sure the robot is happy (read: making the system more cognitively plausible). As a modeler coming from the dynamical systems stronghold of Indiana University, I see it as my job to propose control systems that are more flexible, adaptive, and generalizable than hand-built representations. One prong of our approach is a model for learning to perform sequential actions (e.g., piano playing, recipe following…) that will use real-time spiking neural networks, in contrast to discrete-time recurrent networks (e.g., Botvinick and Plaut, 2004) or hand-built representations organized in interactive-activation networks (e.g., Cooper and Shallice, 2000). Below is a short description of the type of network I will use; the full model specification is in progress. I may write a post on implementation at some point (currently in Brian, but I may switch to a blazingly-fast simulator written by fellow conspirator Richard Veale).

Reservoirs, e.g. liquid state machines, nonlinearly transform input so that stimuli become linearly separable (i.e., easily classified). Stimuli (e.g., an agent’s own recent action, a particular perceptual input, or a task context) can be recognized with invariance to time and other dimensions.

In traditional neural networks, learning (i.e., adjusting weights) takes many iterations over training examples, the training algorithms are prone to overfitting and to catastrophic forgetting (training on additional examples often erases earlier learning), and they are not biologically plausible. Moreover, there is a static, discrete interpretation of output: perceptual input causes activity in the network, which eventually settles on a particular output. Thus, traditional NNs do not give continuous, dynamic output based on changing perceptual input.

In contrast, the more realistic spiking neuron models used in liquid state machines (LSMs) accept real-time input and give continuous output, which can be interpreted in many ways (e.g., probabilities of different stimuli, estimates of perceptual parameters, or actions to be taken). LSMs are organized more like the human cortex (3D grid, small-world connectivity), and the only learning that takes place is biologically-inspired synaptic plasticity to regularize inputs. Because the network is not significantly modified for specific tasks, the same network—essentially a high-dimensional kernel that naturally integrates temporal and spatial information—can be used for many different tasks at the same time.

Basically, an LSM is a pool of randomly-interconnected neurons, a decaying memory whose present pattern of activation is a function of its past inputs (and their time/sequence of arrival). This liquid can be read by simple classifiers (e.g., perceptrons) that are trained for specific tasks, such as recognizing when to take an action (or whether an action has been taken), or whether a given stimulus (or conjunction of stimuli) is present in the environment. Like other kernel-based classifiers, the LSM need only have more dimensions than the input, and be able to nonlinearly transform the input to make it linearly classifiable. Unlike other kernels, a liquid carries out transformations along both temporal and spatial dimensions in continuous time, allowing it to recognize, say, sloooooowly spoken words at some point in time (or ambiguous words: BLack vs. BLood). Thus, LSMs will show recognition and confusion unfolding over time much like humans do.

Training readout classifiers is fast (linear regression), and allows researchers to ask what computations are performed by the liquid at what point(s) in time. Readouts can also be fed back into the liquid, letting us see more transparently which inputs trigger activation of the next action. Liquids have been used to control robot arms, to classify spoken digits, and to predict changing visual input (moving dots and lines).
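The reservoir-plus-readout recipe fits in a page of plain Python. This is a rate-based echo-state sketch of my own, standing in for a true spiking LSM: a fixed random reservoir nonlinearly mixes an input sequence over time, and only a simple delta-rule readout is trained on the resulting liquid state. Network size, weight scalings, and the two toy temporal patterns are all arbitrary choices for illustration.

```python
import math
import random

random.seed(0)
N = 30  # reservoir size
# Fixed random input and recurrent weights; never trained (only the readout is).
W_in = [random.uniform(-1, 1) for _ in range(N)]
W_res = [[random.uniform(-0.2, 0.2) for _ in range(N)] for _ in range(N)]

def run_reservoir(sequence):
    """Drive the reservoir with an input sequence; return its final state,
    a fading-memory summary of what arrived and when."""
    state = [0.0] * N
    for u in sequence:
        state = [math.tanh(W_in[i] * u +
                           sum(W_res[i][j] * state[j] for j in range(N)))
                 for i in range(N)]
    return state

# Two temporal patterns with identical totals, distinguishable only by order.
pattern_a = [1, 0, 1, 0, 1, 0]
pattern_b = [0, 1, 0, 1, 0, 1]

# Perceptron-style (logistic delta-rule) readout trained on noisy variants.
w, b = [0.0] * N, 0.0
for _ in range(500):
    target = random.choice([0, 1])  # 1 -> pattern_a, 0 -> pattern_b
    seq = [u + random.gauss(0, 0.05) for u in (pattern_a if target else pattern_b)]
    x = run_reservoir(seq)
    y = 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
    err = target - y
    w = [wi + 0.5 * err * xi for wi, xi in zip(w, x)]
    b += 0.5 * err

def classify(seq):
    """True if the readout calls the liquid state 'pattern_a'."""
    x = run_reservoir(seq)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

Note that the reservoir itself never changes: to add another task (say, detecting a third pattern) you would train another readout on the same liquid, which is exactly the multiple-tasks-from-one-kernel property described above.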

I am a person with too many ideas for his own good, a poor memory, and not enough time or organizational skills to properly file ideas. My hope is that this site will serve as a haystack of thoughts which I can periodically sift through, searching for an old needle rather than spending time creating a redundant one. If I spin a sufficiently intricate web of tags and cross-references, maybe the search won’t even be so difficult. With your help, Dear Reader, perhaps occasionally we can even make hay*.

Topics will include many of my Top Secret research/entrepreneurial projects, such as:

  • Language learning apps inspired by (and for) my dissertation work on cross-situational word learning
  • My attempt to create a reputation-based economy for a better world (egoBoo)
  • Similar, but more specific rating networks I’m creating for political and science engagement
  • Language acquisition modeling from a memory and learning perspective
  • Spiking neural network models for sequential action learning, especially for robots performing everyday human tasks (my current work on RoboHow)
  • Prescriptive Predictions: my attempt to synthesize the large amounts of news I read, and put an optimistic spin on them
  • What I’m reading: for fun and pleasure

I’m not sure how my style will evolve: my current motivation is mainly to collect my thoughts, but if you find the ideas interesting but hard to understand, please drop me a note and I’ll make some posts more expository. Or better yet, we can meet up for a drink and scribble on napkins. I would of course appreciate hearing from you if you use any of my ideas, or want to contribute to their evolution (fighting the good fight is always more fun with collaborators!). Oh, and Derek Sivers explains why I think it’s fine to share my (brilliant :P) ideas: they’re nothing but a multiplier on how well they’re executed, and the sad reality is that I have too little time to execute many of them on my own.

*Please do punish me for bad and mixed metaphors.