MINDSEYE: A VISION MODELING ENVIRONMENT

1. GENERAL AIMS OF THE PROJECT

1.1 Overview

Our goal in this project is to develop a research and educational tool for the scientific community engaged in understanding human sensory systems, the visual system in particular. The breadth of basic research, physiological and psychophysical, on the visual system has reached the point that many investigators are now designing quantitatively explicit models of visual function in areas of pattern discrimination, motion detection, optical flow, color discrimination, adaptation and stereopsis. We have created a prototype general-purpose vision modeling environment that embodies the general structure of the visual system and provides an extensive set of modular tools within a flexible platform tailored to the needs of vision scientists. The environment is easily extensible, so researchers can add features specific to their needs that are not already available. This new modeling environment has a user-friendly graphical interface with on-line documentation to minimize the time spent learning to use it effectively.

1.2 Introduction and Background

To a first approximation the early visual system is organized into a series of retinotopic processing stages. Each stage involves a family of processes or filters that commonly operate on local information. One stage could represent the receptors; a later stage might have multiple representations, such as cortical areas V1 and V2, or might not have a definite physiological basis. This type of organization is common to models of visual function, including spatial vision (Watson, 1983; Watt & Morgan, 1983; Wilson & Gelb, 1984) and motion perception (Watson & Ahumada, 1985; Adelson & Bergen, 1985; Marr & Ullman, 1981; van Santen & Sperling, 1985). The MINDSEYE modeling environment is organized along the same lines. The input to each stage is a 2D grid (square, hexagonal or random sampling) representing the retinotopic output of a previous stage of processors. The initial input is the 2D sampled image itself. For temporal processes the input can be a sequence of such samples at any desired temporal sampling rate. The user defines the number of stages, the convergence and divergence of each stage and the characteristics of the processors at a stage. The characterization of each stage may involve three parts: 1) a definition of the processor array, whose processors might be difference-of-Gaussians (DOG), Gabor or Gaussian spatial filters of different orientation, size, bandwidth, sampling density and so on; 2) a specification of how output from the previous stage is combined by the current processor array; and 3) various spatially pointwise operations that might be applied to the output of the previous stage of processors, which do not change the spatial sampling and treat each point equally. Squaring the value at every point on the grid and adding random noise to each point are examples of pointwise operations.
Finally, after some user-defined number of such stages, a linking hypothesis combines the final processor outputs and relates them to performance on a psychophysical task. How well the model predicts performance depends on its details.
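
The two pointwise operations mentioned above can be sketched in a few lines. This is only an illustration in Python, not MINDSEYE code: the function names, the Gaussian noise model and the list-of-rows grid representation are assumptions made for the example.

```python
import random

def square_pointwise(grid):
    """Square every sample; the sampling grid itself is unchanged."""
    return [[v * v for v in row] for row in grid]

def add_noise_pointwise(grid, sigma, rng):
    """Add independent Gaussian noise (std. dev. sigma) to every sample.

    Gaussian noise is an assumption of this sketch; any noise type
    applied independently at each point would also be pointwise.
    """
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in grid]

# A tiny 2 x 3 "stage output" standing in for a retinotopic grid.
grid = [[0.0, 1.0, -2.0],
        [3.0, -1.0, 0.5]]
rng = random.Random(0)  # seeded so the example is repeatable

squared = square_pointwise(grid)
noisy = add_noise_pointwise(squared, sigma=0.1, rng=rng)
```

Both functions return a grid with the same number of rows and columns as their input, which is the defining property of a pointwise operation in the sense used here.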

2. RESEARCH RESULTS - MEETING THE SPECIFIC AIMS OF THE PROJECT

In 1989, Michael Landy and collaborators (Landy et al., 1989) described a modeling environment known as EVE, a dramatic step toward a general-purpose vision modeling environment. Unfortunately, EVE assumes that the user is familiar with UNIX tools, is proficient at writing shell scripts and, in many cases, can program in C. Our solution to these problems is a prototype modeling environment called MINDSEYE, which builds on the design of EVE so that programming experience is not required to generate and test a model. MINDSEYE has the functionality of EVE plus the added benefits of a user-friendly visual programming and modeling environment. In the following sections the environment is described in some detail using figures that are snapshots of the actual MINDSEYE interface.

2.1 The MINDSEYE startup window

The window in Figure 1 labeled MINDSEYE contains several pull-down lists on the menu bar: File, Edit, Comm., Utilities and Help. The Utilities pull-down list provides tools such as a scientific calculator for quick calculations and a paint program in which the user can freehand "paint" a visual stimulus to be input to a vision model. The only other pull-down list described here is Edit, which brings up the list:

Edit
Create Square/Hex Sequence
Create Table Sequence
Edit Sequence
Delete Sequence
Create Temporal Filter
Edit Temporal Filter
Delete Temporal Filter

These Edit selections offer methods of generating input to the user's vision model, as discussed in the following sections.

2.2 Model input sequences - Images, sampling and filters

A sequence is the basic input "image" to the model. It is the informational unit that is passed through the network defined in the Model window, which is described later.

2.2.1 Square/Hex sequence types

In Figure 1, Create Square/Hex Sequence was selected from the Edit pull-down list, which brought up the window labeled Square/Hex Image Sequence. In this window the sequence is given the filename final4 and a square sampling format is selected (filled button). In the Image Descriptors area, the user has indicated that the sequence will consist of 4 frames with 120 rows and 120 columns of values in every frame. For sequences used as input to a model, the frames usually are some type of visual image to be presented to the model network. Once the sequence has passed through the model network, the frames reflect the output of previous stages of processing rather than an actual visual image. Since 4 frames have been indicated, this sequence will define an image in time as well as in space. Also in the Image Descriptors area, the user indicates that each frame will be 3.0 degrees of visual angle wide with the center of the frame at 0,0 degrees, thereby setting the spatial coordinate reference point. Throughout MINDSEYE, spatial units are given in degrees of visual angle. The user has now identified the sampling density and spatial scale of the frame, the spatial coordinate system and the number of temporal samples. Below the Image Descriptors panel in the Square/Hex Image Sequence window (Fig 1) are the areas called Image Format and Processor Format.

The user first selects the numeric format for the image: double, long, integer, byte, or complex. The actual image can come from a file by selecting the Image File check box and the desired file listed below the check box. A large number of standard image file formats are supported. This provides a means of testing a vision model on real-world scenes or on types of images not easily generated using an analytic expression.

Figure 1

In most cases vision models will be tested using stimuli commonly used in visual psychophysics experiments. These stimuli can usually be described with a simple analytic expression. For such test stimuli the user selects the Define Image button, which brings up the window in the upper right corner of MINDSEYE (windowTitle:1). The upper left panel lists the built-in variables available for use in an analytic expression that defines an image. For example, the built-in variable xdva indicates the current position in the x dimension in degrees of visual angle. This window contains seven buttons, four of which provide various types of on-line help; the rest are self-explanatory. The large panel in the bottom half is where the user enters the analytic expression that defines the desired image. In this example the first line declares two variables, phase and frequency, which will contain the spatial phase and spatial frequency of the image's sinewave grating. The next line is a comment (set off in double quotes) indicating that the pattern will be a Gaussian-windowed sinewave grating. The subsequent lines show that the phase of the sinewave changes on each frame, producing a drifting grating. Line three sets the spatial frequency to 4 cycles/degree and line four sets the phase, in radians, to the current frame number over the total number of frames. Line five sets Value to the amplitude of the sinewave at Cartesian location xdva, ydva. Line six multiplies Value at the same location by a Gaussian envelope. When the actual image is generated, Value is calculated for each point in each frame of the image sequence using the complete analytic expression provided by the user. All the built-in variables in the list are updated before Value is calculated, so the expression is evaluated with the correct position and frame. The actual image generated for frame one using this expression is shown in the left panel of Fig. 2.
This simple method provides a very flexible means of defining any visual stimulus that can be expressed analytically.
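
For readers without access to the figure, the drifting Gaussian-windowed grating described above can be approximated by the following Python sketch. It is not the MINDSEYE expression language. The names xdva, ydva, phase, frequency and Value come from the description above; the mapping from sample index to degrees, the factor of 2*pi in the phase (one full cycle of drift over the sequence) and the envelope width sigma_deg are assumptions of this example.

```python
import math

def grating_frame(frame, nframes, nrows=120, ncols=120, width_deg=3.0,
                  frequency=4.0, sigma_deg=0.5):
    """Return one frame (list of rows) of a Gaussian-windowed sinewave grating."""
    deg_per_sample = width_deg / ncols
    # Assumed phase rule: one full cycle of drift across the whole sequence.
    phase = 2.0 * math.pi * frame / nframes
    image = []
    for r in range(nrows):
        # Frame center is at 0,0 degrees; row 0 is taken to be the top.
        ydva = ((nrows - 1) / 2.0 - r) * deg_per_sample
        row = []
        for c in range(ncols):
            xdva = (c - (ncols - 1) / 2.0) * deg_per_sample
            # Sinewave amplitude at (xdva, ydva) ...
            value = math.sin(2.0 * math.pi * frequency * xdva + phase)
            # ... multiplied by a static Gaussian envelope.
            value *= math.exp(-(xdva ** 2 + ydva ** 2) / (2.0 * sigma_deg ** 2))
            row.append(value)
        image.append(row)
    return image

# Four frames of 120 x 120 samples spanning 3.0 degrees, as in the example.
sequence = [grating_frame(f, 4) for f in range(4)]
```

Because only the phase depends on the frame number, the grating drifts while the envelope stays fixed.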

Figure 2a

Figure 2b

The last piece of data often found in a sequence is a set of processor or filter structures. Going back to the Square/Hex Image Sequence window, in the Processor Format section the user selects the type of spatial filter to be associated with each point in the image frame, such as DOG or Gabor filters. A General type is also available, with which the user is free to define a new type. In this case the user selected Gabor, so pushing the Define Structures button brings up the second window on the right side of the MINDSEYE panel (windowTitle:2). Here again the same types of buttons provide the user with help. In the lower panel of this window the user defines each of the necessary parameters for the filter. Gabor has seven required parameters. As was done when creating the image, each parameter is calculated according to the user's definition for each processor location in the frame. In the example, the Gabor has a center Frequency of 4 cycles/degree with Phase equal to 0. The half width at half height of the envelope is 0.2 degrees in the x and y directions. The Orientation of the filter depends on the row location of the filter in the frame: in the first row it is vertical, it slowly becomes horizontal by row 60, and it becomes vertical again by the last row. A sequence may contain an image with or without associated filter definitions for each point in the image, depending on the type of model being tested. When the user selects OK in each of the three windows (windowTitle:1, windowTitle:2 and Square/Hex Image Sequence), the windows close and the user-defined sequence file is generated.
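
As a rough illustration of such a filter definition, the Python sketch below evaluates a Gabor function using only the five parameters named above (Frequency, Phase, the two half widths and Orientation); the remaining required parameters are not described here and are not guessed at. The orientation convention (0 degrees = vertical, 90 degrees = horizontal), the linear rule relating orientation to row, and the function names are assumptions of the example rather than the actual MINDSEYE definition.

```python
import math

def orientation_for_row(row, nrows=120):
    """Assumed rule: rotate steadily from vertical (0) through horizontal (90)."""
    return 180.0 * row / nrows

def gabor(x, y, frequency=4.0, phase=0.0, hwhh_x=0.2, hwhh_y=0.2,
          orientation=0.0):
    """Gabor value at (x, y) in degrees of visual angle, centered at 0,0."""
    # A Gaussian falls to half height at sigma * sqrt(2 ln 2), so convert
    # half width at half height to standard deviation.
    k = math.sqrt(2.0 * math.log(2.0))
    sx, sy = hwhh_x / k, hwhh_y / k
    theta = math.radians(orientation)
    # Rotate coordinates so the sinusoidal carrier varies along xr.
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr ** 2 / (2.0 * sx ** 2) + yr ** 2 / (2.0 * sy ** 2)))
    return envelope * math.cos(2.0 * math.pi * frequency * xr + phase)

# Orientation assigned to the filters in row 60 of a 120-row frame.
mid_row_orientation = orientation_for_row(60)
```

With these conventions the filters in row 60 are horizontal, which is consistent with the description above.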

While the sequence is being created, the user can design the vision model in another window. A simple example is shown in the window titled Model in Fig 1. The Model window contains four network nodes with connecting lines. The far left node reads in the sequence just defined in the Square/Hex Image Sequence window. Two other nodes, labeled View:, provide a graphic presentation of the sequence. The node labeled Gabor convolves the Gabor filters with the input image. This node has two inputs, one for a sequence containing the image and one for a sequence that defines the locations and parameters of the Gabor filters. In this case both inputs are connected to the same Read node, since our sequence included both an image definition and Gabor filter definitions. Often this information will come from different sequences with different types of sampling, such as a hexagonal image and a Table sequence defining the filter locations (described in the next section). The user has already executed this simple model network; the green circles at each node indicate that no problems were encountered during execution. The circles would have turned yellow or red depending on the problems encountered. The user has double clicked the two View nodes, bringing up the View Sequence windows 1 and 2. As indicated, the images were scaled by 200%. When the two view buttons were pressed, the two images in Figure 2 appeared. The left image is the original, since this View node was connected to the Read node before the Gabor convolution node. The right image is the image after convolution with a Gabor filter at each point in the image. Notice that the grating disappears in the middle of the image; this is where the orientation of the Gabor was defined to be horizontal in the input sequence.
The View Sequence windows contain several other buttons: some for saving the images, one for single stepping through each frame of the sequence and another to animate the frame presentation. In this case, if the Animate button is selected, the grating drifts rapidly under the static Gaussian envelope, as was defined earlier when the sequence was generated.
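
The four-node network described above can be mimicked with a very small dataflow sketch. This Python code is purely illustrative and says nothing about how MINDSEYE is implemented: the Node class, the execute function and the stand-in node operations are all hypothetical, and only the green and red states are modeled (the conditions that produce yellow are not described here).

```python
class Node:
    """One network node: a function plus the nodes that feed it."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = list(inputs)
        self.status = None   # "green" or "red" after execution
        self.output = None

def execute(nodes):
    """Run nodes in the given order, assumed to list every node after its inputs."""
    for node in nodes:
        if any(src.status != "green" for src in node.inputs):
            node.status = "red"          # an upstream node failed
            continue
        try:
            node.output = node.func(*[src.output for src in node.inputs])
            node.status = "green"
        except Exception:
            node.status = "red"
    return {node.name: node.status for node in nodes}

# Stand-in operations: the real Gabor node performs a convolution, whereas
# this placeholder just doubles each value so the data flow can be followed.
read = Node("Read", lambda: [[1.0, 2.0], [3.0, 4.0]])
view1 = Node("View:1", lambda seq: seq, [read])
gabor = Node("Gabor",
             lambda image, filters: [[2.0 * v for v in row] for row in image],
             [read, read])   # both inputs come from the same Read node
view2 = Node("View:2", lambda seq: seq, [gabor])

statuses = execute([read, view1, gabor, view2])
```

In this sketch View:1 sees the sequence before the Gabor node and View:2 sees it afterwards, matching the left and right images described above.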

2.2.2 Table sequence type

The previous section covered square and hexagonal sampled sequence file definitions. The other type of sequence is called Table, which is similar except that the location of each image sample is explicitly defined by the user.

2.2.3 Temporal filter files

The Edit pull-down list in the MINDSEYE window also has options for defining the temporal impulse response of a filter and saving the response to a file for later use by a node in the Model window.

2.3 Creating a Vision Model

Having created image sequences for input to the model, it is time to describe the process of actually creating a model of visual function and testing it.

2.3.1 Introduction to components and network nodes

Sequences are the basic input and output units of any model created in MINDSEYE. Models are created by dropping nodes into the Model window and making functional connections between nodes (drawing lines). Nodes of many types are provided for the user to create a vision model. To select a node the user first decides which category in the Components window (see figure 4) is likely to contain a node with the desired functionality. In figure 4, the user wanted to add noise to a sequence, so the Miscellaneous category button was selected, which brought up the Misc. collection of nodes. While most of the list is obscured in the figure, the fifth button is called Noise. The user selected the Noise button, moved the cursor to the Model:2 window and dropped a Noise node into it. Other nodes were similarly placed in the Model:2 window. The mouse is also used to draw the connecting lines between nodes. Double clicking on any node brings up a window for entering parameters pertaining to that node. In figure 4, the Noise: node has been double clicked, hence the window labeled Noise. Various types of noise are available, each with its own set of parameters. Every node window in the system has a Comment parameter. The entered comment is appended to the node name in the Model window. For example, if Binomial were entered in the comment space, the Noise node would be labeled Noise::Binomial in the Model window. This enables each node to carry a tag that indicates its particular function within the current model. Also shown in figure 4 is the on-line Help text window associated with the Noise node. The appropriate parameters must be entered in every node before the network is activated.

The Model window itself has several options, starting with the ever-present Help button. The New button removes all nodes from the window and Save allows the user to save the model on disk for future use. A Zoom slider scales all the nodes and connections in the Model window as needed. More than one Model window, each with a different model, may be open at one time in the MINDSEYE system. The other options are beyond the scope of this brief introduction to the structure of MINDSEYE.

Figure 4

While MINDSEYE includes dozens of node types, we cannot anticipate all of the users' needs, so a special node called Matrix is available. This node, along with its parameter window, is also shown in figure 4. The node provides flexibility by making a complete programming language available to the user. The Matrix window provides two variables, ImageIn and ImageOut, which are arrays of matrices. Each matrix is one frame of the image sequence connected to the Matrix node. This gives the user completely random access to the image sequence data. Within the main Matrix window panel the user can enter a program in a language patterned along the lines of MATLAB, in that it has an extensive list of matrix operators. All the normal language constructs, such as looping and logical operations, are also provided. Ideally the user will rarely have to resort to this level of programming.
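
To give a feel for what a Matrix node program might do, here is a Python analog (the actual Matrix language is not shown in this document, so its syntax is not reproduced). The example treats the input as a list of frames, mirroring ImageIn, and builds an output list, mirroring ImageOut, holding the difference between each frame and the one before it. The choice of frame differencing and the function name are assumptions made for illustration.

```python
def matrix_node(image_in):
    """Return frame-to-frame differences of a sequence.

    image_in is a list of frames; each frame is a matrix (list of rows).
    The result has one frame fewer than the input.
    """
    image_out = []
    for t in range(1, len(image_in)):
        prev, curr = image_in[t - 1], image_in[t]
        # Subtract the previous frame from the current one, point by point.
        diff = [[c - p for p, c in zip(prev_row, curr_row)]
                for prev_row, curr_row in zip(prev, curr)]
        image_out.append(diff)
    return image_out

# Three 2 x 2 frames of a toy sequence.
frames = [
    [[0, 0], [0, 0]],
    [[1, 2], [3, 4]],
    [[1, 3], [5, 4]],
]
differences = matrix_node(frames)
```

Random access to any frame, as described above, is what makes operations across time such as this one straightforward to express.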

2.3.2 Object Oriented Visual Programming

As mentioned earlier, MINDSEYE contains many types of nodes. Figure 5 shows a few more of the Component groups; the most interesting is the unusual-looking one in the lower right corner of the figure. It contains a list of user-created nodes that we refer to as group boxes. We have implemented a visual programming system in which the user can combine nodes in various ways to achieve new functional units and collapse all the contained nodes into a single node with all the functionality of the original set. For example, creating an early vision direction-selective motion unit such as that proposed by Adelson and Bergen (1985) requires about 12 nodes. Converting them into a group box makes it easy to populate a vision model with many of these motion units, each with its own parameters. Group box nodes can themselves become parts of other group boxes; that is, the language supports nesting of group boxes. The Model window in Fig 5 contains 4 nodes: two Read nodes, one output node that calculates a Minkowski metric (Mink) and a group box node labeled gabener:2-16. Double clicking on that group box brings up the window GroupWindow:1, which contains several types of nodes including more group boxes. Clicking on group box gab2ener:: brings up GroupWindow:2, which also contains more group boxes and other assorted nodes. Clicking on a group box in this window brings up GroupWindow:3 with its two unique-looking group boxes. Clicking again brings up GroupWindow:4, containing a large assortment of Gabor convolution nodes. This demonstrates how the user can create new nodes with many levels of nesting to hide the details of a large, complicated model. At the same time the model's details are always only a few clicks away.

The mechanics of creating a group box are straightforward. The user links together the nodes of interest and then uses the mouse to draw a box around them. As soon as the box is drawn the nodes collapse into a single node and the user is asked to enter a group box name. In short, the addition of object-oriented (node-oriented) visual programming adds tremendous flexibility and extensibility for the end user.
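
The nesting behavior of group boxes resembles the familiar composite pattern, which can be sketched as follows. This Python code is an analogy only: the class names are hypothetical, and a group is simplified to a linear chain of members, whereas a real group box can contain an arbitrary network of connected nodes.

```python
class SimpleNode:
    """A leaf node that applies one function to a list of values."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def __call__(self, values):
        return self.func(values)

class GroupBox:
    """A node built from other nodes, which may themselves be group boxes."""

    def __init__(self, name, members):
        self.name = name
        self.members = list(members)

    def __call__(self, values):
        # Simplification: members are applied one after another.
        for member in self.members:
            values = member(values)
        return values

    def depth(self):
        """Number of nesting levels, counting this group box as one."""
        inner = [m.depth() for m in self.members if isinstance(m, GroupBox)]
        return 1 + max(inner, default=0)

square = SimpleNode("Square", lambda vs: [v * v for v in vs])
halve = SimpleNode("Halve", lambda vs: [v / 2.0 for v in vs])

inner = GroupBox("inner", [square, halve])   # a group box of two nodes
outer = GroupBox("outer", [inner, square])   # a group box containing a group box

result = outer([2.0, 4.0])
```

The caller uses outer exactly as it would use a single node, which is the point of collapsing a set of nodes into a group box.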

Figure 5

2.4 Testing the Vision Model - Output formats

Thus far we have described how easy it is to create input sequences that define images and/or filters based on either an analytic expression or a pre-existing natural image in any of a variety of graphics file formats. Creating a vision model requires selecting and connecting network nodes in the Model window and setting the nodes' intrinsic parameters. When the network is activated the sequences are read from disk and, as the processing progresses, a red, yellow or green circle appears over each node. When the model has completed, the final output can take many forms. Besides the graphic viewer node, there are nodes for statistics, histograms, combination rules, output to spreadsheets and other output file formats. The Save Sequence node saves the sequence to disk in a form that can in turn be used as an input sequence for other models. Testing a vision model will usually involve inputting two image sequences whose difference has been shown by human psychophysical methods to be at threshold, d' = 1. The model would be designed to accept the two image sequences and provide various outputs, with the critical test being a prediction of human discrimination threshold in d' units. The model should match the psychophysical data. Better models will have greater generality; that is, they will make accurate predictions for a larger body of experimental data.
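
One common way to turn two sets of model outputs into a single discriminability estimate is Minkowski pooling of their differences, the kind of metric computed by the Mink node mentioned earlier. The Python sketch below shows the arithmetic only. Treating the pooled value directly as a d' prediction, the default exponent of 4 and the function name are assumptions of the example, not a description of how MINDSEYE scales its output.

```python
def minkowski_distance(a, b, beta=4.0):
    """Pool pointwise differences between two response grids.

    Computes (sum |a_i - b_i| ** beta) ** (1 / beta) over all samples.
    beta = 2 gives the ordinary Euclidean distance.
    """
    total = 0.0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb) ** beta
    return total ** (1.0 / beta)

# Toy final-stage outputs for the two image sequences being compared.
response_1 = [[0.0, 0.0]]
response_2 = [[3.0, 4.0]]

pooled = minkowski_distance(response_1, response_2, beta=2.0)
```

Under this assumed linking hypothesis, a model is doing well when the pooled value is close to 1 for image pairs that human observers discriminate at threshold.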

The need for a vision modeling environment with the features of the MINDSEYE prototype grows with every passing day. With additional funding, MINDSEYE could ultimately have a significant impact on modeling visual function.

2.6 References

Adelson, E. H. & Bergen, J. R. (1985) Spatiotemporal energy models for the perception of motion. J. Opt. Soc. Am. A2, 284-299.

Landy, M. S., Manovich, L. Z. & Stetten, G. D. (1989) All about EVE: The early vision emulation software. Beh. Res. Methods, Inst. & Comp. 21, 491-501.

Marr, D. & Ullman, S. (1981) Directional selectivity and its use in early visual processing. Proc. Royal Soc. Lon. B 211, 151-180.

van Santen, J. P. H. & Sperling, G. (1985) Elaborated Reichardt detectors. J. Opt. Soc. Am. A2, 300-321.

Watson, A. B. (1987) Efficiency of a model human image code. J. Opt. Soc. Am. A4, 2401-2417.

Watson, A. B. & Ahumada, A. J. (1985) Model of human visual motion sensing. J. Opt. Soc. Am. A2, 322-342.

Watt, R. J. & Morgan, M. J. (1983) The recognition and representation of edge blur: evidence for spatial primitives in human vision. Vision Res. 23, 1465-1477.

Wilson, H. R. & Gelb, D. J. (1984) Modified line-element theory for spatial-frequency and width discrimination. J. Opt. Soc. Am. A1, 124-131.