IQR Demo Application

Interactive Query Refinement, or “IQR”, is a process whereby a user provides one or more exemplar images and the system attempts to locate additional images within an archive that are similar to the exemplar(s). The user then adjudicates the results by identifying those results that match their search and those that do not. The system then uses those adjudications to provide better, more closely matching results refined by the user’s input.
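The adjudication loop above can be sketched in a few lines of Python. This is an illustrative toy only: it uses a simple Rocchio-style query update over cosine similarity, whereas SMQTK's actual RelevancyIndex implementations use trained rankers rather than this update rule.

```python
# Toy sketch of the IQR feedback loop (illustrative only; not SMQTK's
# actual relevancy ranking, which uses a trained model).
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return num / den if den else 0.0


def rank(archive, query_vec):
    """Order (name, descriptor) pairs by similarity to the query vector."""
    return sorted(archive, key=lambda item: cosine(item[1], query_vec),
                  reverse=True)


def refine(query_vec, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward positive adjudications
    and away from negative ones."""
    dim = len(query_vec)

    def centroid(vecs):
        return [sum(v[i] for v in vecs) / len(vecs) if vecs else 0.0
                for i in range(dim)]

    pos_c, neg_c = centroid(positives), centroid(negatives)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query_vec, pos_c, neg_c)]


# Tiny fake archive of (name, descriptor) pairs.
archive = [("monarch1", [0.9, 0.1]),
           ("monarch2", [0.8, 0.2]),
           ("moth", [0.1, 0.9])]
query = [0.7, 0.3]
results = rank(archive, query)                      # initial results
query = refine(query, positives=[[0.9, 0.1]],       # user marks monarch1 good,
               negatives=[[0.1, 0.9]])              # moth bad
results = rank(archive, query)                      # refined results
```

Each refine step moves the query descriptor toward the positively adjudicated descriptors and away from the negative ones, so subsequent rankings favor results resembling the positives.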

../_images/IQRWithSMQTK.png

SMQTK IQR Workflow

Overall workflow of an SMQTK based Interactive Query Refinement application.

The IQR application is an excellent example application for SMQTK as it makes use of a broad spectrum of SMQTK’s capabilities. In order to characterize each image in the archive so that it can be indexed, the DescriptorGenerator algorithm is used. A NearestNeighborsIndex algorithm is used to understand the relationship between the images in the archive and a RelevancyIndex algorithm is used to rank results based on the user’s positive and negative adjudications.

SMQTK comes with a pair of web-based applications that together implement an IQR system using SMQTK’s services, as shown in the SMQTK IQR Workflow figure.

Running the IQR Application

The SMQTK IQR demonstration application consists of two web services: one for hosting the models and processing for an archive, and a second for providing a user-interface to one or more archives.

In order to run the IQR demonstration application, we will need an archive of imagery. SMQTK has facilities for creating indexes that support tens of thousands, or even hundreds of thousands, of images. For demonstration purposes, we’ll use a modest archive of images: the Leeds Butterfly Dataset will serve quite nicely. Download and unzip the archive (which contains over 800 images of different species of butterflies).

SMQTK comes with a script, iqr_app_model_generation, that computes the descriptors on all of the images in your archive and builds up the models needed by the NearestNeighborsIndex and RelevancyIndex algorithms.

usage: iqr_app_model_generation [-h] [-v] -c PATH PATH -t TAB GLOB [GLOB ...]

Positional Arguments

GLOB Shell glob to files to add to the configured data set.

Named Arguments

-v, --verbose

Output additional debug logging.

Default: False

-c, --config Path to the JSON configuration files. The first file provided should be the configuration file for the IqrSearchDispatcher web-application and the second should be the configuration file for the IqrService web-application.
-t, --tab The configuration “tab” of the IqrSearchDispatcher configuration to use. This informs what dataset to add the input data files to.

The -c/--config option should be given the two paths to the configuration files for the IqrSearchDispatcher and IqrService web services, respectively. These provide the configuration blocks for each of the SMQTK algorithms (DescriptorGenerator, NearestNeighborsIndex, etc.) required to generate the models and indices that the application will need. For convenience, the same configuration files will be provided to the web applications when they are run later.

The SMQTK source repository contains sample configuration files for both the IqrSearchDispatcher and IqrService services. They can be found at source/python/smqtk/web/search_app/sample_configs/config.IqrSearchApp.json and source/python/smqtk/web/search_app/sample_configs/config.IqrRestService.json respectively. The iqr_app_model_generation script is designed to run from an empty directory and will, when run, create the sub-directories specified in the above configurations.
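The staging steps might look like the following. The checkout location ~/smqtk/source is an assumption; adjust SMQTK_SOURCE to wherever you cloned the repository:

```shell
# Assumed checkout location -- adjust to where you cloned SMQTK.
SMQTK_SOURCE="${SMQTK_SOURCE:-$HOME/smqtk/source}"
SAMPLE_CONFIGS="$SMQTK_SOURCE/python/smqtk/web/search_app/sample_configs"

# Work from an empty directory; the model generation script creates its
# sub-directories relative to here.
mkdir -p iqr_demo
cd iqr_demo

# Copy the sample configurations if the checkout is at the assumed path;
# otherwise place the two JSON files here yourself.
if [ -d "$SAMPLE_CONFIGS" ]; then
    cp "$SAMPLE_CONFIGS/config.IqrSearchApp.json" \
       "$SAMPLE_CONFIGS/config.IqrRestService.json" .
fi
```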

Since these configuration files drive both the generation of the models and the web applications themselves, a closer examination is in order.

Present in both configuration files are the flask_app and server sections which control Flask web server application parameters. The config.IqrSearchApp.json contains the additional section mongo that configures the MongoDB server the UI service uses for storing user session information.

{
    "flask_app": {
        "BASIC_AUTH_PASSWORD": "demo",
        "BASIC_AUTH_USERNAME": "demo",
        "SECRET_KEY": "MySuperUltraSecret"
    },
    "server": {
        "host": "127.0.0.1",
        "port": 5000
    },
    "mongo": {
        "database": "smqtk",
        "server": "127.0.0.1:27017"
    },
    "iqr_tabs": {
        "LEEDS Butterflies": {
            "working_directory": "workdir",
            "data_set": {
                "DataMemorySet": {
                    "cache_element": {
                        "DataFileElement": {
                            "explicit_mimetype": null,
                            "filepath": "workdir/butterflies_alexnet_fc7/data.memorySet.cache",
                            "readonly": false
                        },
                        "type": "DataFileElement"
                    },
                    "pickle_protocol": -1
                },
                "type": "DataMemorySet"
            },
            "iqr_service_url": "http://localhost:5001"
        }
    }
}

The config.IqrSearchApp.json configuration has an additional iqr_tabs block. This defines the archives, each paired with the IQR REST service describing that archive, for which the UI is to provide an interface. In our case there will be only one entry, “LEEDS Butterflies”, representing the archive that we are currently building. This entry describes the data-set container that holds the archive imagery to show in the UI (the data_set block) as well as the URL of the RESTful service providing the IQR functions for the archive (the iqr_service_url element).

In the config.IqrRestService.json configuration file (shown below) we see the specification of the algorithm and representation plugins the RESTful IQR service app will use under iqr_service -> plugins. Each of these blocks is passed to the SMQTK plugin system to create the appropriate instance of the algorithm or data representation in question. The descriptor_generator, neighbor_index, and relevancy_index_config blocks configure the three main algorithms used by the application: the descriptor generator, the nearest-neighbors index, and the relevancy index. For example, the neighbor_index block specifies two different implementations: FlannNearestNeighborsIndex, which uses the FLANN library, and LSHNearestNeighborIndex, configured to use the Iterative Quantization (ITQ) hash function. The type element within that block selects LSHNearestNeighborIndex for this configuration.
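Only the type element needs to change to switch implementations. For example, to use the FLANN implementation already configured in the same block, the neighbor_index section would end as follows (the "..." entries stand in for the unchanged implementation sub-blocks shown in the full configuration below):

```json
"neighbor_index": {
    "FlannNearestNeighborsIndex": { "...": "..." },
    "LSHNearestNeighborIndex": { "...": "..." },
    "type": "FlannNearestNeighborsIndex"
}
```

Because both implementations remain configured, switching back and forth is a one-line change.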


{
    "flask_app": {
        "BASIC_AUTH_PASSWORD": "demo",
        "BASIC_AUTH_USERNAME": "demo",
        "SECRET_KEY": "MySuperUltraSecret"
    },
    "server": {
        "host": "127.0.0.1",
        "port": 5001
    },
    "iqr_service": {
        "plugins": {
            "classification_factory": {
                "MemoryClassificationElement": {},
                "type": "MemoryClassificationElement"
            },
            "classifier_config": {
                "LibSvmClassifier": {
                    "normalize": 2,
                    "svm_label_map_uri": null,
                    "svm_model_uri": null,
                    "train_params": {
                        "-b": 1,
                        "-c": 2,
                        "-s": 0,
                        "-t": 0
                    }
                },
                "type": "LibSvmClassifier"
            },
            "descriptor_factory": {
                "DescriptorMemoryElement": {},
                "type": "DescriptorMemoryElement"
            },
            "descriptor_generator": {
                "CaffeDescriptorGenerator": {
                    "batch_size": 128,
                    "data_layer": "data",
                    "gpu_device_id": 0,
                    "image_mean_uri": "~/dev/caffe/source/data/ilsvrc12/imagenet_mean.binaryproto",
                    "input_scale": null,
                    "load_truncated_images": false,
                    "network_is_bgr": true,
                    "network_model_uri": "~/dev/caffe/source/models/bvlc_alexnet/bvlc_alexnet.caffemodel",
                    "network_prototxt_uri": "~/dev/caffe/source/models/bvlc_alexnet/deploy.prototxt",
                    "pixel_rescale": null,
                    "return_layer": "fc7",
                    "use_gpu": true
                },
                "type": "CaffeDescriptorGenerator"
            },
            "descriptor_set": {
                "MemoryDescriptorSet": {
                    "cache_element": {
                        "DataFileElement": {
                            "explicit_mimetype": null,
                            "filepath": "workdir/butterflies_alexnet_fc7/descriptor_set.pickle",
                            "readonly": false
                        },
                        "type": "DataFileElement"
                    },
                    "pickle_protocol": -1
                },
                "type": "MemoryDescriptorSet"
            },
            "neighbor_index": {
                "FlannNearestNeighborsIndex": {
                    "autotune": false,
                    "descriptor_cache_uri": "workdir/butterflies_alexnet_fc7/flann/index.cache",
                    "distance_method": "hik",
                    "index_uri": "workdir/butterflies_alexnet_fc7/flann/index.flann",
                    "parameters_uri": "workdir/butterflies_alexnet_fc7/flann/index.parameters",
                    "random_seed": 42
                },
                "LSHNearestNeighborIndex": {
                    "descriptor_set": {
                        "MemoryDescriptorSet": {
                            "cache_element": {
                                "DataFileElement": {
                                    "explicit_mimetype": null,
                                    "filepath": "workdir/butterflies_alexnet_fc7/descriptor_set.pickle",
                                    "readonly": false
                                },
                                "type": "DataFileElement"
                            },
                            "pickle_protocol": -1
                        },
                        "type": "MemoryDescriptorSet"
                    },
                    "distance_method": "cosine",
                    "hash2uuids_kvstore": {
                        "MemoryKeyValueStore": {
                            "cache_element": {
                                "DataFileElement": {
                                    "explicit_mimetype": null,
                                    "filepath": "workdir/butterflies_alexnet_fc7/hash2uuids.mem_kvstore.pickle",
                                    "readonly": false
                                },
                                "type": "DataFileElement"
                            }
                        },
                        "type": "MemoryKeyValueStore"
                    },
                    "hash_index": {
                        "type": null
                    },
                    "hash_index_comment": "'hash_index' may also be null to default to a linear index built at query time.",
                    "lsh_functor": {
                        "ItqFunctor": {
                            "bit_length": 64,
                            "itq_iterations": 50,
                            "mean_vec_cache": {
                                "DataFileElement": {
                                    "explicit_mimetype": null,
                                    "filepath": "workdir/butterflies_alexnet_fc7/itqnn/mean_vec.npy",
                                    "readonly": false
                                },
                                "type": "DataFileElement"
                            },
                            "normalize": null,
                            "random_seed": 42,
                            "rotation_cache": {
                                "DataFileElement": {
                                    "explicit_mimetype": null,
                                    "filepath": "workdir/butterflies_alexnet_fc7/itqnn/rotation.npy",
                                    "readonly": false
                                },
                                "type": "DataFileElement"
                            }
                        },
                        "type": "ItqFunctor"
                    },
                    "read_only": false
                },
                "type": "LSHNearestNeighborIndex"
            },
            "relevancy_index_config": {
                "LibSvmHikRelevancyIndex": {
                    "autoneg_select_ratio": 1,
                    "cores": null,
                    "descr_cache_filepath": null,
                    "multiprocess_fetch": false
                },
                "type": "LibSvmHikRelevancyIndex"
            }
        },
        "session_control": {
            "positive_seed_neighbors": 500,
            "session_expiration": {
                "check_interval_seconds": 30,
                "enabled": false,
                "session_timeout": 3600
            }
        }
    }
}

Once you have the configuration file set up the way that you like it, you can generate all of the models and indexes required by the application by running the following command:

iqr_app_model_generation \
    -c config.IqrSearchApp.json config.IqrRestService.json \
    -t "LEEDS Butterflies" /path/to/butterfly/images/*.jpg

This will generate descriptors for all of the images in the data set and use them to compute the models and indices we configured, outputting to the files under the workdir directory in your current directory.

Once it completes, you can run the IqrSearchApp and IqrService web-apps. You’ll need an instance of MongoDB running on the host and port specified by the mongo element in your config.IqrSearchApp.json configuration file. You can start a Mongo instance (presuming you have it installed) with:

mongod --dbpath /path/to/mongo/data/dir

Once Mongo has been started you can start the IqrSearchApp and IqrService services with the following commands in separate terminals:

# Terminal 1
runApplication -a IqrService -c config.IqrRestService.json

# Terminal 2
runApplication -a IqrSearchDispatcher -c config.IqrSearchApp.json

After the services have been started, open a web browser and navigate to http://localhost:5000. Click on the login button in the upper-right and then enter the credentials specified in the default login settings file source/python/smqtk/web/search_app/modules/login/users.json.

../_images/iqrlogin.png

Click on the login element to enter your credentials

../_images/iqrlogin-entry.png

Enter demo credentials

Once you’ve logged in you will be able to select the LEEDS Butterflies link. This link is named by the corresponding iqr_tabs entry in the config.IqrSearchApp.json configuration file. The iqr_tabs mapping allows you to configure interfaces to different IQR REST services providing different combinations of the required algorithms. This is useful, for example, if you want to compare the performance of different descriptors or nearest-neighbor index algorithms.
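For instance, a hypothetical second tab backed by another IQR service instance (the tab name and port 5002 here are made up for illustration; the "..." entries stand in for the unchanged blocks shown earlier) would sit alongside the first entry:

```json
"iqr_tabs": {
    "LEEDS Butterflies": { "...": "..." },
    "LEEDS Butterflies (FLANN)": {
        "working_directory": "workdir",
        "data_set": { "...": "..." },
        "iqr_service_url": "http://localhost:5002"
    }
}
```

Each tab then appears as its own link on the landing page after login.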

../_images/iqr-butterflies-link.png

Select the “LEEDS Butterflies” link to begin working with the application

To begin the IQR process, drag an exemplar image to the grey load area (marked 1 in the next figure). In this case we’ve uploaded a picture of a Monarch butterfly (2). Once it is uploaded, click the Initialize Index button (3) and the system will return a set of images that it believes are similar to the exemplar image based on the computed descriptors.

../_images/iqrinitialize.png

IQR Initialization

The next figure shows the set of images returned by the system (on the left) and a random selection of images from the archive (shown by clicking the Toggle Random Results element). As you can see, even with just one exemplar the system is already beginning to return Monarch butterflies (or butterflies that look like Monarchs).

../_images/iqrinitialresults.png

Initial Query Results and Random Results

At this point you can begin to refine the query. You do this by marking correct results with their checkbox and incorrect results with the “X”. Once you’ve marked a number of results, select the “Refine” element; the system will use your adjudications to retrain and re-rank the results, with the goal that correct results increasingly dominate your result set.

../_images/iqrrefine.png

Query Refinement

You can continue this process for as long as you like, until you are satisfied with the results that the query is returning. Once you are happy with the results, you can select the Save IQR State button. This will save a file that contains all of the information required to use the results of the IQR query as an image classifier. The process for doing this is described in the next section.

Using an IQR Trained Classifier

Before you can use your IQR session as a classifier, you must first train the classifier model from the IQR session state. You can do this with the iqrTrainClassifier tool:

usage: iqrTrainClassifier [-h] [-v] [-c PATH] [-g PATH] [-i IQR_STATE]

Named Arguments

-v, --verbose

Output additional debug logging.

Default: False

-i, --iqr-state Path to the ZIP file saved from an IQR session.

Configuration

-c, --config Path to the JSON configuration file.
-g, --generate-config Optionally generate a default configuration file at the specified path. If a configuration file was provided, we update the default configuration with the contents of the given configuration.

As with other tools from SMQTK, the configuration file is a JSON file. A default configuration file may be generated by calling iqrTrainClassifier -g example.json, but a pre-configured example file from the repository is shown below:

{
    "classifier": {
        "LibSvmClassifier": {
            "normalize": 2,
            "svm_label_map_uri": "workdir/iqr_classifier/label_map",
            "svm_model_uri": "workdir/iqr_classifier/model",
            "train_params": {
                "-b": 1, 
                "-c": 2, 
                "-s": 0, 
                "-t": 0
            }
        }, 
        "type": "LibSvmClassifier"
    }
}

The above configuration specifies the classifier that will be used, in this case the LibSvmClassifier. Let us assume the IQR session state was downloaded as monarch.IqrState. The following command will train a classifier leveraging the descriptors labeled by the IQR session that was saved:

iqrTrainClassifier -c config.iqrTrainClassifier.json -i monarch.IqrState

Once you have trained the classifier, you can use the smqtk-classify-files command to actually classify a set of files.

usage: smqtk-classify-files [-h] [-v] [-c PATH] [-g PATH] [--overwrite]
                            [-l LABEL]
                            [GLOB [GLOB ...]]

Positional Arguments

GLOB Series of shell globs specifying the files to classify.

Named Arguments

-v, --verbose

Output additional debug logging.

Default: False

Configuration

-c, --config Path to the JSON configuration file.
-g, --generate-config Optionally generate a default configuration file at the specified path. If a configuration file was provided, we update the default configuration with the contents of the given configuration.

Classification

--overwrite

When generating a configuration file, overwrite an existing file.

Default: False

-l, --label The class to filter by. This is based on the classifier configuration/model used. If this is not provided, we will list the available labels in the provided classifier configuration.

Again, we need to provide a JSON configuration file for the command. As with iqrTrainClassifier, there is a sample configuration file in the repository:

{
    "classification_factory": {
        "MemoryClassificationElement": {},
        "type": "MemoryClassificationElement"
    },
    "classifier": {
        "LibSvmClassifier": {
            "normalize": 2,
            "svm_label_map_uri": "workdir/iqr_classifier/label_map",
            "svm_model_uri": "workdir/iqr_classifier/model",
            "train_params": {
                "-b": 1,
                "-c": 2,
                "-s": 0,
                "-t": 0
            }
        },
        "type": "LibSvmClassifier"
    },
    "descriptor_factory": {
        "DescriptorMemoryElement": {},
        "type": "DescriptorMemoryElement"
    },
    "descriptor_generator": {
        "CaffeDescriptorGenerator": {
            "batch_size": 128,
            "data_layer": "data",
            "gpu_device_id": 0,
            "image_mean_uri": "~/dev/caffe/source/data/ilsvrc12/imagenet_mean.binaryproto",
            "input_scale": null,
            "load_truncated_images": false,
            "network_is_bgr": true,
            "network_model_uri": "~/dev/caffe/source/models/bvlc_alexnet/bvlc_alexnet.caffemodel",
            "network_prototxt_uri": "~/dev/caffe/source/models/bvlc_alexnet/deploy.prototxt",
            "pixel_rescale": null,
            "return_layer": "fc7",
            "use_gpu": true
        },
        "type": "CaffeDescriptorGenerator"
    }
}

Note that the classifier block is the same as the classifier block in the iqrTrainClassifier configuration file. Further, the descriptor_generator block matches the descriptor generator used for the IQR application itself (thus matching the type of descriptor used to train the classifier).
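This consistency requirement is easy to check mechanically. The snippet below is a hypothetical helper, not part of SMQTK; the file names used are the sample configuration names from this walkthrough, so adjust them to your own:

```python
# Sanity check that two SMQTK JSON configs share the same "classifier" block.
import json
import os


def same_classifier(train_cfg_path, classify_cfg_path):
    """Return True if the 'classifier' blocks of the two configs match."""
    with open(train_cfg_path) as f:
        train_cfg = json.load(f)
    with open(classify_cfg_path) as f:
        classify_cfg = json.load(f)
    return train_cfg["classifier"] == classify_cfg["classifier"]


# Example file names from this walkthrough; only run if they are present.
if os.path.exists("config.iqrTrainClassifier.json") \
        and os.path.exists("config.classifyFiles.json"):
    print(same_classifier("config.iqrTrainClassifier.json",
                          "config.classifyFiles.json"))
```

If the blocks differ, the trained SVM model will not match the classifier configuration the classification tool tries to load it with.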

Once you’ve set up the configuration file to your liking, you can classify a set of files with the following command:

smqtk-classify-files -c config.classifyFiles.json -l positive /path/to/butterfly/images/*.jpg

If you leave off the -l argument, the command will list the labels available with the configured classifier (in this case positive and negative).
