[ This blog post is collaboratively written by Evan and Memming ]
The Scalable Models workshop was a remarkable success! It attracted a huge crowd from the wee morning hours until the 7:30 pm close of the day. We attracted so much attention that we had to relocate from our originally allotted (tiny) room (Superior A) to a (huge) lobby area (Golden Cliff). The talks offered both philosophical and methodological perspectives, reflecting diverse viewpoints on and approaches to high-dimensional neural data. Many of the discussions continued the next day in our sister workshop. Here we summarize each talk:
Konrad Körding – Big datasets of spike data: why it is coming and why it is useful
Konrad started off the workshop by posing some philosophical questions about how big data might change the way we do science. He argued that neuroscience is rife with theories (for instance, how uncertainty is represented in the brain) that, while supported by some data, are plausible only for particular areas or computations. Konrad argued that this may simply be the state of affairs for the brain, where complexity and heterogeneity are the rule. Rather than reject such theories as incomplete, perhaps we should be humble about our theories. Broad, overarching principles may only ever explain a small percentage of the overall variance in the brain.
In the second part of his talk, Konrad discussed his recent work on using Fisher information to build a unifying theory of large-scale neural recording techniques [Cybulski et al. 2014]. For example, the theory can be used to find the best distribution of electrodes in neural tissue.

Cybulski, T. R., Glaser, J. I., Marblestone, A. H., Zamft, B. M., Boyden, E. S., Church, G. M., and Kording, K. P. (2014). Spatial information in Large-Scale neural recordings.
Jeremy Freeman – Distributed computing for large-scale neuroscience
Neuroscience will need new software tools to handle the very large datasets produced by large-scale experiments (large in both duration and number of neurons). Luckily, as Jeremy Freeman observed, neuroscience can lean on many tools already developed by the tech industry to deal with “big data”. One such tool is MapReduce: a framework for massive parallelization often used in web search and elsewhere. Jeremy introduced a software framework called Spark, which provides a more abstract interface that is better suited to the iterative calculations often required in neuroscience. Jeremy advocates using Spark for large-scale data analysis, and is actively developing a software package called Thunder that implements many common data analysis tools, such as PCA, ICA, and regression, on top of Spark. For a 0.1 TB dataset from whole-brain larval zebrafish recordings [Ahrens 2013], his algorithms run on the order of minutes using 30 machines. He proposed using legacy software systems such as MATLAB to convert any legacy data to a simple common data format (rows of time series) before feeding it into Thunder.
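To make the map-and-reduce pattern concrete, here is a minimal sketch of how the second-moment matrix that PCA needs can be accumulated from data stored as rows of time series. Plain Python's `map` and `functools.reduce` stand in for the cluster; nothing here uses the actual Spark or Thunder API, and the toy data and helper names are hypothetical.

```python
from functools import reduce

def outer(row):
    """Map step: one time series (length T) -> its T x T outer product."""
    return [[a * b for b in row] for a in row]

def add(m1, m2):
    """Reduce step: element-wise sum of two T x T matrices."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

# Toy data in the "rows of time series" layout: one row per neuron (or pixel).
rows = [
    [1.0, 2.0, 3.0],
    [0.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
]

# Each map call touches a single row, so rows can live on different machines.
# The reduce is associative, so partial sums can be combined in any order.
gram = reduce(add, map(outer, rows))
print(gram)
```

The resulting T-by-T matrix is small no matter how many rows there are, so after centering, its eigendecomposition (which yields the temporal principal components) can be done on a single machine.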

J. Freeman, N. Vladimirov, T. Kawashima, Y. Mu, D. V. Bennett, J. Rosen, C. Yang, L. Looger, M. Ahrens. Mapping the brain at scale. COSYNE 2014

Ahrens, M. B., Orger, M. B., Robson, D. N., Li, J. M., and Keller, P. J. (2013). Whole-brain functional imaging at cellular resolution using light-sheet microscopy. Nature Methods, 10(5):413–420.
Mark Churchland – Tensor-based dimensionality reduction for data exploration
What drives the temporal structure of a neural population response: stimulus or (linear) dynamics? Mark proposed a method to answer this question by decomposing a time-by-neuron-by-condition data array into a linear combination of a small number of either R-sheets (condition-by-time) or D-sheets (neuron-by-time). If the neural response is linearly driven by the stimulus through tuning curves, a fixed number of R-sheets will approximate the neural response well, no matter how long the recordings are. On the other hand, if (for example, linear) dynamics drive the neural activity, then a fixed number of D-sheets will continue to explain the data well. Hence, he argued, the decay in low-rank approximation performance as more time is included gives insight into the fundamental driver of the neural response: dynamics or stimulus. He showed evidence that in the motor cortex, dynamics alone explain the data better.
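The argument above can be illustrated with a small simulation. This is only a sketch under assumed toy models, not the analysis from the talk: the array sizes, the two generative models, and the use of exact matrix rank (instead of approximation quality on noisy data) are all assumptions. The number of condition-by-time sheets needed equals the rank of the neuron-mode unfolding, and the number of neuron-by-time sheets needed equals the rank of the condition-mode unfolding.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, K = 12, 10, 3  # neurons, conditions, latent dimension (hypothetical sizes)

def stimulus_driven(T):
    """x[n, c, t] = sum_k B[n, k] * u[k, c, t]: fixed tuning, arbitrary inputs."""
    B = rng.standard_normal((N, K))
    u = rng.standard_normal((K, C, T))
    return np.einsum("nk,kct->nct", B, u)

def dynamics_driven(T):
    """x(t + 1) = A x(t): each condition only sets a K-dimensional initial state."""
    A, _ = np.linalg.qr(rng.standard_normal((N, N)))  # orthogonal, so activity stays bounded
    x = rng.standard_normal((N, K)) @ rng.standard_normal((K, C))  # N x C initial states
    frames = []
    for _ in range(T):
        frames.append(x)
        x = A @ x
    return np.stack(frames, axis=2)  # N x C x T

def unfolding_ranks(X):
    """Ranks of the neuron-mode (N x CT) and condition-mode (C x NT) unfoldings."""
    n, c, t = X.shape
    neuron_mode = X.reshape(n, c * t)
    condition_mode = X.transpose(1, 0, 2).reshape(c, n * t)
    return (int(np.linalg.matrix_rank(neuron_mode)),
            int(np.linalg.matrix_rank(condition_mode)))

for T in (1, 2, 3):
    print(T, unfolding_ranks(stimulus_driven(T)), unfolding_ranks(dynamics_driven(T)))
```

In this toy setting the neuron-mode rank should stay at K for the stimulus-driven model while its condition-mode rank grows with T, and the reverse should hold for the dynamics-driven model. With real, noisy data one would compare reconstruction error at a fixed number of sheets rather than exact rank.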

Jeffrey Seely, Matthew T Kaufman, John Cunningham, Stephen Ryu, Krishna Shenoy, Mark Churchland. Dimensionality in motor cortex: differences between models and experiment. COSYNE 2012

Jeffrey Seely, Matthew Kaufman, Adam Kohn, Matthew Smith, Anthony Movshon, Nicholas Priebe, Stephen Lisberger, Stephen Ryu, Krishna Shenoy, Larry Abbott. Quantifying representational and dynamical structure in large neural datasets. COSYNE 2013
John Cunningham – Generic linear dimensionality reduction for high-dimensional neural data
John began by listing three advantages of high-dimensional neural data: (1) single-trial statistical power (average over the population instead of over trials), (2) population response structure (population code and dynamics), and (3) exploratory data analysis (generate systems-level hypotheses). He then turned to a particular technique: linear dimensionality reduction.
Linear dimensionality reduction is a common tool in statistics, data analysis, and many other fields. The term includes tools such as PCA, LDA, and CCA. Standard practice with these techniques is to perform an eigendecomposition of some matrix and then keep only the leading part of its spectrum. However, John pointed out that this common practice often does not optimize the objective function associated with a given technique. Only in the case of PCA is it provably optimal, for minimizing the squared reconstruction error. John proposed using manifold optimization techniques instead. He demonstrated several cases where direct optimization outperforms truncating an eigendecomposition for linear dimensionality reduction.
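One way to write down the generic problem is sketched here, in notation that is assumed for illustration rather than taken from the talk: X is the centered n-by-T data matrix, M is an n-by-r matrix with orthonormal columns, and f_X is whatever objective a given method defines.

```latex
\min_{M \in \mathbb{R}^{n \times r}} f_X(M)
\quad \text{subject to} \quad M^\top M = I_r ,
\qquad \text{with, for PCA,} \qquad
f_X(M) = \lVert X - M M^\top X \rVert_F^2 .
```

The set of matrices satisfying the constraint is a manifold (the Stiefel manifold), which is why manifold optimization applies. For the PCA objective the minimizer is given by the top r eigenvectors of X Xᵀ; for other choices of f_X there is generally no such guarantee.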

Absil, Pierre-Antoine, Robert Mahony, and Rodolphe Sepulchre. “Optimization on manifolds.” (2007).
Rob Kass – Statistical considerations in making inferences about neural networks: The case of synchrony detection
Rob Kass dealt with the problem of identifying excess synchrony, or discovering connectivity structure, from data [Kelly & Kass 2012]. He took advantage of the fact that the number of pairwise interactions grows quadratically with the number of neurons: with so many pairs, he could reliably estimate the distribution of connection strength (or synchrony) under the alternative hypothesis and use it for Bayesian control of false discovery. Basically, the estimated distribution of connectivity resulting from point process regression was modeled as a mixture of the theoretical null distribution and samples from the alternative hypothesis. He also included covariates such as population-level oscillation (via population rate) in his model.
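As a rough illustration of Bayesian false discovery control with a null/alternative mixture, the sketch below computes the posterior probability that each pair is null and keeps the largest set of pairs whose average posterior null probability stays below a target level. It is not the procedure from the talk: the Gaussian alternative, the fixed mixture parameters (`pi0`, `alt_mean`, `alt_sd`), and the toy z-scores are all assumptions, whereas the talk estimated the alternative from the data.

```python
import math

def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def posterior_null(z, pi0, alt_mean, alt_sd):
    """P(null | z) under the mixture pi0 * N(0, 1) + (1 - pi0) * N(alt_mean, alt_sd^2)."""
    null = pi0 * normal_pdf(z, 0.0, 1.0)
    alt = (1.0 - pi0) * normal_pdf(z, alt_mean, alt_sd)
    return null / (null + alt)

def bayesian_fdr_select(z_scores, alpha, pi0=0.9, alt_mean=3.0, alt_sd=1.0):
    """Indices of the largest set whose average posterior null probability is <= alpha."""
    probs = [posterior_null(z, pi0, alt_mean, alt_sd) for z in z_scores]
    order = sorted(range(len(z_scores)), key=lambda i: probs[i])
    selected, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += probs[i]
        if running / rank > alpha:  # running mean only grows, so we can stop here
            break
        selected.append(i)
    return sorted(selected)

# Hypothetical standardized synchrony estimates, one per neuron pair.
z = [0.1, -0.4, 4.2, 0.8, 3.6, -1.1, 5.0, 0.3]
selected = bayesian_fdr_select(z, alpha=0.05)
print(selected)
```

The quadratic growth in the number of pairs is what makes the mixture estimable in practice: with thousands of pairs, the shape of the alternative component can be learned from the data rather than fixed by hand as it is here.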

Kelly, R. C. and Kass, R. E. (2012). A framework for evaluating pairwise and multiway synchrony among Stimulus-Driven neurons. Neural Computation, 24(8):2007–2032.

Emery Brown, Uri Eden, Rob Kass. (2014) Analysis of Neural Data.
Lars Büsing – Dynamical component analysis of neural data
Latent-variable dynamical models are an increasingly popular tool for large-scale neural population analysis. A popular technique under this framework is to assume that spikes are Poisson-distributed with rates given by some function of a linearly evolving latent state. While this technique is closely related to traditional Kalman filtering algorithms, the Poisson likelihood makes exact inference difficult to compute. Lars presented a new technique for performing approximate inference under the Poisson observation model, as well as a new technique for clustering neural populations with the same dynamics.
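For reference, a common way to write this model class is sketched below. The symbols are assumptions chosen for illustration (they are not from the talk): x_t is the latent state, y_{t,i} is the spike count of neuron i in bin t, and f is a nonnegative link function, often taken to be the exponential.

```latex
\begin{aligned}
x_{t+1} &= A x_t + \epsilon_t, & \epsilon_t &\sim \mathcal{N}(0, Q), \\
y_{t,i} \mid x_t &\sim \mathrm{Poisson}\!\left( f(c_i^\top x_t + d_i) \right), & i &= 1, \dots, N .
\end{aligned}
```

With Gaussian observations the posterior over the latent states is Gaussian and the Kalman recursions compute it exactly. With Poisson observations the posterior is no longer Gaussian, which is why approximate inference is needed.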

Lars Buesing, Maneesh Sahani, Jakob H. Macke. Spectral learning of linear dynamics from generalised-linear observations with application to neural population data. NIPS 2012

Khan, Aravkin, Friedlander, Seeger, “Fast Dual Variational Inference for Non-Conjugate Latent Gaussian Models.” Proceedings of the 30th International Conference on Machine Learning (ICML). No. EPFL-CONF-186800. Omni Press, 2013.

Lars Buesing. Unsupervised identification of excitatory and inhibitory populations from multi-cell recordings. COSYNE 2014
Matteo Carandini – Soloists and choristers in a cortical population
Matteo started by emphasizing that many widely used models of neural data do not scale well with the number of parameters, and often do not have a clear biological interpretation. He then considered the raster marginal model of [Okun et al. 2012], a class of models designed to be consistent with two measurements: (1) the marginal firing rate for each neuron, and (2) the population rate distribution (the number of neurons firing in each time bin). He showed that while many neurons in the population are choristers (highly correlated with the population rate), a few are soloists. Soloists and choristers are otherwise indistinguishable, except that choristers are more “enthusiastic” (they respond to stimuli with an elevated firing rate). The raster marginal model cannot distinguish between soloists and choristers, and so Matteo proposed an extension that includes a new statistic: (3) the correlation between each neuron and the population rate. His final model, including statistics (1), (2), and (3), has at most 3N parameters (where N is the number of neurons). This model captures the distinction between soloists and choristers, as well as correlation coefficients and connection probability. Matteo suggested using this model as a null hypothesis: if the model already explains a potentially interesting phenomenon, perhaps it is not an interesting phenomenon after all.
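The three statistics are simple to compute from a binary raster. The sketch below is one possible reading, not the implementation from the talk: the function names and toy raster are hypothetical, and for statistic (3) each neuron is correlated with the summed activity of the other neurons (an assumption made here to avoid trivially correlating a neuron with itself).

```python
import math

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def raster_statistics(raster):
    """raster[i][t] is 1 if neuron i spiked in time bin t, else 0."""
    n_neurons, n_bins = len(raster), len(raster[0])
    # (1) marginal firing rate of each neuron (spikes per bin)
    rates = [sum(row) / n_bins for row in raster]
    # (2) distribution of the population rate (number of active neurons per bin)
    pop = [sum(row[t] for row in raster) for t in range(n_bins)]
    pop_hist = [pop.count(k) / n_bins for k in range(n_neurons + 1)]
    # (3) correlation of each neuron with the summed activity of the others
    coupling = [pearson(row, [p - s for p, s in zip(pop, row)]) for row in raster]
    return rates, pop_hist, coupling

raster = [
    [1, 1, 0, 1, 0, 0],  # chorister-like: fires together with the others
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],  # soloist-like: fires when the others are silent
]
rates, pop_hist, coupling = raster_statistics(raster)
print(rates, pop_hist, coupling)
```

In this toy raster the first neuron gets a positive coupling and the last a negative one, which is the kind of distinction that statistics (1) and (2) alone cannot express.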

Okun, M., Yger, P., Marguet, S. L., Gerard-Mercier, F., Benucci, A., Katzner, S., Busse, L., Carandini, M., and Harris, K. D. (2012). Population rate dynamics and multineuron firing patterns in sensory cortex. The Journal of Neuroscience, 32(48):17108–17119.

Thomas Mrsic-Flogel. The functional organisation of synaptic connectivity in visual cortical microcircuits. COSYNE 2014
I. Memming Park – Scaling up with Bayes
(Unfortunately, Matthias Bethge couldn’t make it, so Memming gave a talk instead.)
To make inferences with high-dimensional data, we need strong assumptions. In Bayesian statistics, we encode our assumptions and prior knowledge in the prior distribution. In this talk, Memming showed how to take popular parametric models of spike word distributions and use Bayesian nonparametric techniques to make them flexible.
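One standard way to make a parametric model flexible is to use it as the base measure of a Dirichlet process, whose posterior mean blends the parametric prediction with the observed word counts. The sketch below illustrates that general idea only. It is a simplification with assumed names and toy data: the base model is an independent-neuron model fit by empirical rates, and the concentration `alpha` is fixed by hand rather than inferred.

```python
from collections import Counter
from itertools import product

def independent_model(words, n_neurons):
    """Parametric base model: each neuron fires independently at its empirical rate."""
    n = len(words)
    p = [sum(w[i] for w in words) / n for i in range(n_neurons)]
    def prob(word):
        out = 1.0
        for bit, p_i in zip(word, p):
            out *= p_i if bit else 1.0 - p_i
        return out
    return prob

def dp_posterior_mean(words, base_prob, alpha):
    """Posterior mean of a Dirichlet process with base measure base_prob."""
    counts, n = Counter(words), len(words)
    return lambda word: (alpha * base_prob(word) + counts[word]) / (alpha + n)

# Hypothetical spike words from 3 neurons (one binary tuple per time bin).
words = [(0, 0, 0)] * 5 + [(1, 1, 0)] * 3 + [(0, 0, 1)] * 2
base = independent_model(words, n_neurons=3)
estimate = dp_posterior_mean(words, base, alpha=2.0)

for word in product((0, 1), repeat=3):
    print(word, round(estimate(word), 4))
```

Words that were never observed still receive probability from the base model, while frequently observed words are pulled toward their empirical frequency. As the number of samples grows relative to `alpha`, the estimate approaches the empirical histogram.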

Il Memming Park, Evan Archer, Kenneth Latimer, Jonathan Pillow. Universal models for binary spike patterns using centered Dirichlet processes. NIPS 2013

Ganmor, E., Segev, R., and Schneidman, E. (2011). Sparse low-order interaction network underlies a highly correlated and learnable neural population code. Proceedings of the National Academy of Sciences, 108(23):9679–9684.
Surya Ganguli – What are the right questions, and how many neurons do we need to answer them?
Surya’s talk introduced a new quantity, called “task complexity”, which bounds the dimensionality of neural population dynamics. This talk built on the talk given by Peiran Gao in the main meeting. Task complexity is an upper bound on the dimensionality of the trial-averaged neural response, characterized by the degrees of freedom in the task conditions and the correlation time scale of neural responses (there were heated discussions about whether “task complexity” is a misleading name, since it includes the smoothness of the neural trajectory for the task). Surya claimed that the task complexities of many experiments are such that it is not surprising to recover low-dimensional dynamics from observations of 100 neurons in motor cortex. He emphasized that we need to design experiments with higher task complexity to take advantage of high-dimensional neural data. Surya also suggested that, under certain conditions provided by the theory of random projections, we may not need to observe many neurons to recover the dynamics.
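The random projection intuition can be checked numerically in a toy setting. The sketch below is not the theory from the talk: it assumes a low-dimensional trajectory embedded in the population through random, unstructured Gaussian loadings (the favorable case), and every size and name is hypothetical. It compares pairwise distances between time points computed from all neurons with those computed from a random subset.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K, T = 1000, 200, 3, 50  # all neurons, recorded neurons, latent dim, time points

latent = rng.standard_normal((T, K))        # low-dimensional trajectory
loading = rng.standard_normal((K, N))       # random embedding into N neurons
full = latent @ loading                     # T x N population activity
recorded = rng.choice(N, size=M, replace=False)
sub = full[:, recorded]                     # activity of the M recorded neurons only

def pairwise_distances(X):
    """Euclidean distance between every pair of rows (time points) of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_full = pairwise_distances(full) / np.sqrt(N)  # normalize by the number of neurons
d_sub = pairwise_distances(sub) / np.sqrt(M)
off_diag = ~np.eye(T, dtype=bool)
distortion = np.abs(d_sub[off_diag] / d_full[off_diag] - 1.0).max()
print(f"worst-case relative distortion with {M} of {N} neurons: {distortion:.3f}")
```

When the loadings are structured rather than random (for example, if the recorded neurons all belong to one functional cluster), this kind of guarantee need not hold, which is one reason the conditions of the theory matter.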

Peiran Gao, E. Trautmann, B. Yu, G. Santhanam, S. Ryu, K. Shenoy, S. Ganguli. A theory of neural dimensionality and measurement. COSYNE 2014

Surya Ganguli. A theory of neural dimensionality, dynamics and measurement: the neuroscientist and the single neuron. NIPS workshop 2013