Data formats
============

This document describes the various data formats that are used in the system.

Raw annotation data
-------------------

The client sends raw image annotation data as JSON: a list of shapes. Each
shape is a dictionary with the following keys:

* `type`: The type of the shape, which can be:
  * `bbox` (bounding box, rectangle)
  * `polygon`
  * `polyline`
  * `point`
* `shape`: The shape data. Its format depends on the shape `type`:
  * For `bbox` it is a dictionary with keys `x`, `y`, `w`, `h`:
    ~~~json
    {"x": x, "y": y, "w": w, "h": h}
    ~~~
  * For `polygon` and `polyline` it is a list of points; each point is a
    dictionary with keys `x` and `y`:
    ~~~json
    [{"x": x1, "y": y1}, {"x": x2, "y": y2}, ...]
    ~~~
    The only difference between `polygon` and `polyline` is that a polygon is
    closed: its last point is connected back to the first one.
  * For `point` it is a dictionary with keys `x` and `y`:
    ~~~json
    {"x": x, "y": y}
    ~~~
  * All coordinates are floating-point numbers in the range [0, 1], relative
    to the image size, with the origin in the top-left corner.
* `object`: The ID of the type of object (label) depicted in the shape. This ID
  is a human-readable string that must be registered in the system before it
  can be used on shapes.
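
Because coordinates are relative, a client that draws the shapes has to scale them by the image's size in pixels. The Python sketch below shows one way to do that. The helper names are hypothetical, and `bbox_to_pixels` assumes that `x`/`y` are the top-left corner of the box and `w`/`h` its width and height, which the format implies but does not state. `segments` makes the polygon/polyline difference explicit by adding the closing edge only for polygons.

```python
from __future__ import annotations

from typing import Any, Tuple

# Hypothetical helpers for illustration; nothing here is part of the system's API.
Pixel = Tuple[float, float]


def to_pixels(point: dict[str, float], width: int, height: int) -> Pixel:
    """Scale one relative point to pixels (origin in the top-left corner)."""
    return point["x"] * width, point["y"] * height


def segments(shape: dict[str, Any], width: int, height: int) -> list[tuple[Pixel, Pixel]]:
    """Return the line segments of a polygon or polyline, in pixels.

    A polygon is closed, so it gets one extra segment from its last point
    back to its first point; a polyline does not.
    """
    points = [to_pixels(p, width, height) for p in shape["shape"]]
    segs = list(zip(points, points[1:]))
    if shape["type"] == "polygon" and len(points) > 2:
        segs.append((points[-1], points[0]))
    return segs


def bbox_to_pixels(shape: dict[str, Any], width: int, height: int) -> tuple[float, float, float, float]:
    """Return (left, top, right, bottom) of a bbox, in pixels.

    ASSUMPTION: x/y is the top-left corner and w/h are the width and height.
    """
    box = shape["shape"]
    left, top = box["x"] * width, box["y"] * height
    return left, top, left + box["w"] * width, top + box["h"] * height


triangle = {"type": "polygon",
            "shape": [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}]}
print(segments(triangle, 200, 100))
# [((0, 0), (200, 0)), ((200, 0), (0, 100)), ((0, 100), (0, 0))]
```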

The server sends data in the same format back to the client, which uses it to
display the existing annotations for an image.

### Example

~~~json
[
  {
    "type": "bbox",
    "shape": {"x": 0.1, "y": 0.1, "w": 0.5, "h": 0.5},
    "object": "Cat (Felis catus)"
  },
  {
    "type": "polygon",
    "shape": [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}],
    "object": "Slice of pizza margherita"
  },
  {
    "type": "point",
    "shape": {"x": 0.5, "y": 0.5},
    "object": "Cat (Felis catus) - left eye"
  }
]
~~~
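
A server receiving such a payload (or a client before sending it) may want to reject malformed shapes early. The following is a minimal Python sketch, not the system's actual validation: `validate_annotations` is a hypothetical name, the set of registered object IDs is passed in because how the system looks up registrations is not described here, and extra keys are tolerated since the format does not say whether they are allowed.

```python
from __future__ import annotations

import json
from typing import Any


def _check_coords(value: Any, keys: tuple[str, ...], where: str) -> None:
    """Check that value is a dictionary whose given keys hold numbers in [0, 1]."""
    if not isinstance(value, dict) or not set(keys).issubset(value):
        raise ValueError(f"{where}: expected a dictionary with keys {', '.join(keys)}")
    for key in keys:
        v = value[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(v, bool) or not isinstance(v, (int, float)) or not 0 <= v <= 1:
            raise ValueError(f"{where}: {key} must be a number in [0, 1]")


def validate_annotations(payload: str, registered_objects: set[str]) -> list[dict[str, Any]]:
    """Parse a raw annotation payload, raising ValueError if it is malformed."""
    shapes = json.loads(payload)
    if not isinstance(shapes, list):
        raise ValueError("the payload must be a list of shapes")
    for i, s in enumerate(shapes):
        where = f"shape {i}"
        if not isinstance(s, dict) or not {"type", "shape", "object"}.issubset(s):
            raise ValueError(f"{where}: expected keys type, shape and object")
        kind, data = s["type"], s["shape"]
        if kind == "bbox":
            _check_coords(data, ("x", "y", "w", "h"), where)
        elif kind == "point":
            _check_coords(data, ("x", "y"), where)
        elif kind in ("polygon", "polyline"):
            if not isinstance(data, list):
                raise ValueError(f"{where}: {kind} needs a list of points")
            for j, p in enumerate(data):
                _check_coords(p, ("x", "y"), f"{where}, point {j}")
        else:
            raise ValueError(f"{where}: unknown type {kind!r}")
        # Object IDs must be registered before they can be used on shapes.
        if s["object"] not in registered_objects:
            raise ValueError(f"{where}: object {s['object']!r} is not registered")
    return shapes
```

Called on the example above with a set containing its three object IDs, it returns the parsed list; if any ID is missing from the set, it raises `ValueError` naming the offending shape.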

Query format
------------

The query format is based on YAML and is used to query for pictures in the
system.

### Structure

The root can have 3 keys:

* `want`: A list of rules that the images must satisfy. If not provided, no
  filtering is done.
* `exclude`: A list of rules that the images must not satisfy. If not provided,
  no filtering is done.
* `include_obsolete`: If true, the query may also return images that have a
  designated replacement. If false (the default), such images are left out.

`want` and `exclude` are lists of rules. Each rule is a dictionary with a single
key (this allows multiple rules of the same kind). An image must satisfy every
rule in `want` and none of the rules in `exclude`. Accepted rules are:

* `has_object: [object1, object2, ...]`: The image must contain at least one of
  the objects in the list.
* `has: [object1, object2, ...]`: The image must contain at least one of the
  objects in the list, or a descendant of any of them.
* `nature: [nature1, nature2, ...]`: The image must have one of the natures in
  the list. Natures are strings like "photo" that indicate the source of the
  image.
* `licence: [licence1, licence2, ...]`: The image must have one of the licences
  in the list. Where possible, licence IDs are SPDX identifiers; non-standard
  ones are prefixed with `X-`.
* `author: [author1, author2, ...]`: The username of the image's author must be
  in the list.
* `title: query`: Search for titles (case-insensitive `ilike` match).
* `description: query`: Search for descriptions (case-insensitive `ilike`
  match).
* `origin_url: query`: Search for origin URLs. The query matches the beginning
  of the URL, excluding the protocol (for example `commons.wikimedia.org`).
* `above_width: width`: The image must have a width greater than or equal to
  the given value, in pixels.
* `above_height: height`: The image must have a height greater than or equal to
  the given value, in pixels.
* `below_width: width`: The image must have a width less than or equal to the
  given value, in pixels.
* `below_height: height`: The image must have a height less than or equal to the
  given value, in pixels.
* `before_date: timestamp`: The image must have been uploaded before the given
  Unix timestamp.
* `after_date: timestamp`: The image must have been uploaded after the given
  Unix timestamp.
* `in_gallery: [gallery1, gallery2, ...]`: The image must be in at least one of
  the galleries (by ID) in the list.
* `above_rating: rating`: The image must have a rating greater than or equal to
  the given value (1-5 stars). Images with no rating are included; use
  `above_rating_count: 1` to exclude them.
* `below_rating: rating`: The image must have a rating less than or equal to the
  given value (1-5 stars).
* `above_rating_count: count`: The image must have at least the given number of
  ratings.
* `below_rating_count: count`: The image must have at most the given number of
  ratings.
* `above_region_count: count`: The image must have at least the given number of
  regions.
* `below_region_count: count`: The image must have at most the given number of
  regions.
* `copied_from: [image1, image2, ...]`: The image must be a copy of one of the
  images in the list (by ID).
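
Each rule can be read as a predicate over an image. To illustrate the `want`/`exclude` semantics, the Python sketch below evaluates a query (already loaded from YAML into lists of one-key dictionaries) against an in-memory image record. It is not the server's implementation: the `Image` fields, the hypothetical `PARENTS` map used to model descendants for `has`, and the choice to cover only a subset of the rules are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Image:
    # Hypothetical record; the field names are illustrative only.
    objects: set[str]
    nature: str
    licence: str
    width: int
    height: int
    uploaded: int  # Unix timestamp
    origin_url: str = ""
    rating: Optional[float] = None
    rating_count: int = 0


# Hypothetical label hierarchy: child object ID -> parent object ID.
PARENTS: dict[str, str] = {}


def ancestors(obj: str) -> set[str]:
    """Return obj together with all of its ancestors in PARENTS."""
    seen = {obj}
    while obj in PARENTS and PARENTS[obj] not in seen:
        obj = PARENTS[obj]
        seen.add(obj)
    return seen


def strip_protocol(url: str) -> str:
    # "https://commons.wikimedia.org/..." -> "commons.wikimedia.org/..."
    return url.split("://", 1)[-1]


RULES: dict[str, Callable[[Image, Any], bool]] = {
    "has_object": lambda im, arg: bool(im.objects & set(arg)),
    # X is matched if the image contains X itself or any descendant of X.
    "has": lambda im, arg: any(ancestors(o) & set(arg) for o in im.objects),
    "nature": lambda im, arg: im.nature in arg,
    "licence": lambda im, arg: im.licence in arg,
    "origin_url": lambda im, arg: strip_protocol(im.origin_url).startswith(arg),
    "above_width": lambda im, arg: im.width >= arg,
    "above_height": lambda im, arg: im.height >= arg,
    "below_width": lambda im, arg: im.width <= arg,
    "below_height": lambda im, arg: im.height <= arg,
    "before_date": lambda im, arg: im.uploaded < arg,
    "after_date": lambda im, arg: im.uploaded > arg,
    # Unrated images pass, as documented for above_rating.
    "above_rating": lambda im, arg: im.rating is None or im.rating >= arg,
    "above_rating_count": lambda im, arg: im.rating_count >= arg,
}


def rule_matches(image: Image, rule: dict[str, Any]) -> bool:
    (name, arg), = rule.items()  # each rule has exactly one key
    return RULES[name](image, arg)


def matches(image: Image, query: dict[str, Any]) -> bool:
    """True if the image satisfies every want rule and no exclude rule."""
    want = query.get("want") or []
    exclude = query.get("exclude") or []
    return all(rule_matches(image, r) for r in want) and not any(
        rule_matches(image, r) for r in exclude
    )
```

Keeping one predicate per rule name mirrors the one-key-per-rule structure: repeating a rule in `want` simply adds another predicate that must also hold.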

`ordering`, `offset` and `limit` can be specified as query parameters in the
URL. `ordering` can be one of `date-desc`, `date-asc`, `title-asc`,
`title-desc`, `number-regions-desc`, `number-regions-asc` or `random`. `offset`
and `limit` are integers that specify the number of images to skip and the
maximum number of images to return, respectively.
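
As an illustration, the Python sketch below builds the query-string part of such a URL with the standard library. Only the parameter names and the `ordering` values come from this document; the helper name and its validation are assumptions, and how the YAML query itself is sent to the server is not covered.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode

ORDERINGS = {
    "date-desc", "date-asc", "title-asc", "title-desc",
    "number-regions-desc", "number-regions-asc", "random",
}


def paging_params(ordering: Optional[str] = None,
                  offset: Optional[int] = None,
                  limit: Optional[int] = None) -> str:
    """Return an encoded query string with the given ordering/offset/limit.

    Parameters left as None are omitted, so the server's defaults apply.
    """
    params: dict[str, str | int] = {}
    if ordering is not None:
        if ordering not in ORDERINGS:
            raise ValueError(f"unknown ordering {ordering!r}")
        params["ordering"] = ordering
    for name, value in (("offset", offset), ("limit", limit)):
        if value is not None:
            # Both are counts of images, so negative values make no sense.
            if value < 0:
                raise ValueError(f"{name} must be non-negative")
            params[name] = value
    return urlencode(params)


print(paging_params("date-desc", offset=40, limit=20))
# ordering=date-desc&offset=40&limit=20
```

The result would be appended after `?` to whatever endpoint serves queries; with `offset=40` and `limit=20`, that requests the third page of 20 images.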

### Example

~~~yaml
# Restrictions for queried images
want:
  # Two separate rules must both be satisfied, so the image must contain both
  # a cat and a dog
  - has_object: ["Cat (Felis catus)"]
  - has_object: ["Dog (Canis lupus familiaris)"]
  # Putting several objects in a single rule means that the image can contain
  # any of them
  - has_object: ["Grass", "Flower"]
  # So the image must contain a cat and a dog, as well as either grass or
  # a flower
  # The following rule restricts the images to those with a certain source,
  # like a camera or a drawing; omitting this rule means that the images can
  # be of any source
  - nature: ["photo", "computer-3d-art"]
  # The following rule restricts the images to those with a certain licence
  - licence: ["CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
              "Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
              "Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
              "X-public-domain-old", "X-public-domain"]
# Prohibitions for queried images
exclude:
  # This means that the image must not contain any of the objects in the list
  - has_object: ["Human"]
  # This excludes images uploaded before 2019-01-01 00:00:00 UTC
  - before_date: 1546300800
  # These exclude images that are at most 800 pixels wide or at most 600
  # pixels tall
  - below_width: 800
  - below_height: 600
# In summary, we want images that contain both a cat and a dog, either grass or
# a flower, but not a human, uploaded on or after 2019-01-01 (UTC), that are a
# photo or a 3D render, carry one of certain permissive licences, and are more
# than 800 pixels wide and more than 600 pixels tall.
~~~
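
`before_date` and `after_date` take Unix timestamps rather than calendar dates. The small Python sketch below converts between the two and shows where the `1546300800` in the example comes from. It assumes dates are meant in UTC, which the document does not state, and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def utc_timestamp(year: int, month: int, day: int) -> int:
    """Unix timestamp for midnight UTC at the start of the given day."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def utc_date(timestamp: int) -> str:
    """Inverse direction: render a timestamp as an ISO date in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


# The cut-off used by the example's exclude rule.
print(utc_timestamp(2019, 1, 1))  # 1546300800
print(utc_date(1546300800))       # 2019-01-01
```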