roundabout,
created on Wednesday, 25 September 2024, 15:20:28 (1727277628),
received on Thursday, 26 September 2024, 08:09:39 (1727338179)
Author identity: vlad <vlad.muntoiu@gmail.com>
55f6e9198d58c67ca2c537405cb57bca933f8b7f
app.py
@@ -344,10 +344,16 @@ with app.app_context():
@app.route("/")
def index():
return flask.render_template("home.html", resources=PictureResource.query.order_by(
return flask.render_template("home.html", resources=PictureResource.query.filter_by(replaced_by=None).order_by(
db.func.random()).limit(10).all())
@app.route("/info/")
def usage_guide():
with open("help/usage.md") as f:
return flask.render_template("help.html", content=markdown.markdown2html(f.read()))
@app.route("/accounts/")
def accounts():
return flask.render_template("login.html")
@@ -1601,7 +1607,7 @@ def api_rate_picture(id):
if current_user is None:
flask.abort(401)
rating = int(flask.request.json.get("rating"))
rating = int(flask.request.json.get("rating", 0))
if not rating:
# Delete the existing rating
formats.md
@@ -1,166 +0,0 @@
Data formats
============
This document describes the various data formats that are used in the system.
Raw annotation data
-------------------
The client sends raw data for image annotations in a JSON format which is a list
of shapes. Each shape is a dictionary with the following keys:
* `type`: The type of the shape which can be:
* `bbox` (bounding box, rectangle)
* `polygon`
* `polyline`
* `point`
* `shape`: The shape data. Its format depends on the shape `type`:
* For `bbox` it is a dictionary with keys x, y, w, h:
~~~json
{"x": x, "y": y, "w": w, "h": h}
~~~
* For `polygon` and `polyline` it is a list of points; each point is a
dictionary with keys x and y:
~~~json
[{"x": x1, "y": y1}, {"x": x2, "y": y2}, ...]
~~~
The only difference between `polygon` and `polyline` is that the former is
supposed to be closed so the last point is connected to the first one.
* For `point` it is a dictionary with keys x and y:
~~~json
{"x": x, "y": y}
~~~
* All coordinates are floating-point numbers in the range [0, 1] and relative
to the image size, with the origin in the top-left corner.
* `object`: The ID of the type of object (label) depicted in the shape. This ID
is a human-readable string that must be registered in the system before
being used on shapes.
The server sends the same data back to the client, to use to show the existing
annotations for an image.
### Example
~~~json
[
{
"type": "bbox",
"shape": {"x": 0.1, "y": 0.1, "w": 0.5, "h": 0.5},
"object": "Cat (Felis catus)"
},
{
"type": "polygon",
"shape": [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}],
"object": "Slice of pizza margherita"
},
{
"type": "point",
"shape": {"x": 0.5, "y": 0.5},
"object": "Cat (Felis catus) - left eye"
}
]
~~~
Query format
------------
The query format is based on YAML and used to query for pictures in the system.
### Structure
The root can have 3 keys:
* `want`: A list of rules that the images must satisfy. If not provided, no
filtering is done.
* `exclude`: A list of rules that the images must not satisfy. If not provided,
no filtering is done.
* `include_obsolete`: If true the query may return images with a designated
replacement. If false (default) this won't be possible.
`want` and `exclude` are lists of rules. Each rule is a dictionary with a single
key (this is to allow multiple rules of the same kind). Accepted rules are:
* `has_object: [object1, object2, ...]`: The image must contain any of the
objects in the list.
* `has: [object1, object2, ...]`: The image must contain any of the objects in
the list, or a descendant of any of them.
* `nature: [nature1, nature2, ...]`: The image must have one of the natures in
the list. Natures are strings like "photo" that indicate the source of the
image.
* `licence: [licence1, licence2, ...]`: The image must have one of the licences
in the list. If possible, licence IDs are SPDX identifiers, non-standard ones
are prefixed with `X-`.
* `author: [author1, author2, ...]`: The image's author's username must be in
the list.
* `title: query`: Search for titles (`ilike`).
* `description: query`: Search for descriptions (`ilike`).
* `origin_url: query`: Search for origin URLs. The query matches the beginning
of the URL excluding the protocol. (like `commons.wikimedia.org`)
* `above_width: width`: The image must have a width greater than or equal to
the given value, in pixels.
* `above_height: height`: The image must have a height greater than or equal to
the given value, in pixels.
* `below_width: width`: The image must have a width less than or equal to the
given value, in pixels.
* `below_height: height`: The image must have a height less than or equal to the
given value, in pixels.
* `before_date: timestamp`: The image must have been uploaded before the given
Unix timestamp.
* `after_date: timestamp`: The image must have been uploaded after the given
Unix timestamp.
* `in_gallery: [gallery1, gallery2, ...]`: The image must be in any of the
galleries (by ID) in the list.
* `above_rating: rating`: The image must have a rating greater than or equal to
the given value (1-5 stars). Images with no rating are included; use
`above_rating_count: 1` to exclude them.
* `below_rating: rating`: The image must have a rating less than or equal to the
given value (1-5 stars).
* `above_rating_count: count`: The image must have at least the given rating
count.
* `below_rating_count: count`: The image must have at most the given rating
count.
* `above_region_count: count`: The image must have at least the given number of
regions.
* `below_region_count: count`: The image must have at most the given number of
regions.
* `copied_from: [image1, image2, ...]`: The image must be a copy of one of the
images in the list (by ID).
`ordering`, `offset` and `limit` can be specified as query parameters in the
URL. `ordering` can be one of `date-desc`, `date-asc`, `title-asc`, `title-desc`,
`number-regions-desc`, `number-regions-asc`, `random`. `offset` and `limit` are
integers that specify the number of images to skip and the maximum number of
images to return, respectively.
### Example
~~~yaml
# Restrictions for queried images
want:
# This means that the image must satisfy both rules, so both a cat and a dog
- has_object: ["Cat (Felis catus)"]
- has_object: ["Dog (Canis lupus familiaris)"]
# Or we can put them in a list to mean that the image can contain any of the
# objects in the list
- has_object: ["Grass", "Flower"]
# So the image must contain a cat and a dog, as well as either grass or
# a flower
# The following rule restricts the images to those with a certain source,
# like a camera or a drawing; omitting this rule means that the images can
# be of any source
- nature: ["photo", "computer-3d-art"]
# The following rule restricts the images to those with a certain licence
- licence: ["CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
"Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
"Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
"X-public-domain-old", "X-public-domain"]
# Prohibitions for queried images
exclude:
# This means that the image must not contain any of the objects in the list
- has_object: ["Human"]
# This excludes images uploaded before the given date
- before_date: 1546300800
# This requires images to have a minimum resolution
- below_width: 800
- below_height: 600
# In summary, we want images that contain both a cat and a dog, plus either
# grass or a flower, but no human; that were uploaded on or after 2019-01-01;
# that are photos or 3D renders; that carry one of the listed permissive
# licences; and that are larger than 800x600 pixels.
~~~
help/usage.md
@@ -0,0 +1,265 @@
What's this?
------------
This is a platform for managing an image dataset. It's designed to make it easy
for anyone to upload images, annotate them, and for people who need images to
search for them. There's also a JSON API for automating tasks. It's also free
software under the GNU AGPLv3.
Querying for pictures
---------------------
The query format is based on YAML and used to query for pictures in the system.
### Structure
The root can have 3 keys:
* `want`: A list of rules that the images must satisfy. If not provided, no
filtering is done.
* `exclude`: A list of rules that the images must not satisfy. If not provided,
no filtering is done.
* `include_obsolete`: If true the query may return images with a designated
replacement. If false (default) this won't be possible.
`want` and `exclude` are lists of rules. Each rule is a dictionary with a single
key (this is to allow multiple rules of the same kind). Accepted rules are:
* `has: [object1, object2, ...]`: The image must contain any of the objects in
the list, or a descendant of any of them.
* `has_object: [object1, object2, ...]`: The image must contain any of the
objects in the list.
* `nature: [nature1, nature2, ...]`: The image must have one of the natures in
the list. Natures are strings like "photo" that indicate the source of the
image.
* `licence: [licence1, licence2, ...]`: The image must have one of the licences
in the list. Where possible, licence IDs are SPDX identifiers; non-standard
ones are prefixed with `X-`.
* `author: [author1, author2, ...]`: The image's author's username must be in
the list.
* `title: query`: Search for titles (`ilike`).
* `description: query`: Search for descriptions (`ilike`).
* `origin_url: query`: Search for origin URLs. The query matches the beginning
of the URL excluding the protocol. (like `commons.wikimedia.org`)
* `above_width: width`: The image must have a width greater than or equal to
the given value, in pixels.
* `above_height: height`: The image must have a height greater than or equal to
the given value, in pixels.
* `below_width: width`: The image must have a width less than or equal to the
given value, in pixels.
* `below_height: height`: The image must have a height less than or equal to the
given value, in pixels.
* `before_date: timestamp`: The image must have been uploaded before the given
Unix timestamp.
* `after_date: timestamp`: The image must have been uploaded after the given
Unix timestamp.
* `in_gallery: [gallery1, gallery2, ...]`: The image must be in any of the
galleries (by ID) in the list.
* `above_rating: rating`: The image must have a rating greater than or equal to
the given value (1-5 stars). Images with no rating are included; use
`above_rating_count: 1` to exclude them.
* `below_rating: rating`: The image must have a rating less than or equal to the
given value (1-5 stars).
* `above_rating_count: count`: The image must have at least the given rating
count.
* `below_rating_count: count`: The image must have at most the given rating
count.
* `above_region_count: count`: The image must have at least the given number of
regions.
* `below_region_count: count`: The image must have at most the given number of
regions.
* `copied_from: [image1, image2, ...]`: The image must be a copy of one of the
images in the list (by ID).
`ordering`, `offset` and `limit` can be specified as query parameters in the
URL. `ordering` can be one of `date-desc`, `date-asc`, `title-asc`, `title-desc`,
`number-regions-desc`, `number-regions-asc`, `random`. `offset` and `limit` are
integers that specify the number of images to skip and the maximum number of
images to return, respectively.
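
One way to read the rules above is sketched below in Python. This is only an illustration of the matching logic, not the server's implementation: the `Picture` record and its field names (`objects`, `nature`, `width`, `timestamp`) are hypothetical, only a handful of rules are covered, and it assumes that a picture is rejected as soon as any single `exclude` rule holds. `include_obsolete` is ignored here.

```python
from typing import Any, Dict

# Hypothetical in-memory picture record; the field names are illustrative only.
Picture = Dict[str, Any]


def rule_matches(picture: Picture, rule: Dict[str, Any]) -> bool:
    """Evaluate one single-key rule against a picture (subset of rules only)."""
    (name, value), = rule.items()  # each rule is a dictionary with exactly one key
    if name == "has_object":
        # The list means "any of": one matching object is enough.
        return any(obj in picture["objects"] for obj in value)
    if name == "nature":
        return picture["nature"] in value
    if name == "above_width":
        return picture["width"] >= value
    if name == "below_width":
        return picture["width"] <= value
    if name == "before_date":
        return picture["timestamp"] < value
    if name == "after_date":
        return picture["timestamp"] > value
    raise ValueError(f"rule not covered by this sketch: {name}")


def picture_matches(picture: Picture, query: Dict[str, Any]) -> bool:
    """True if every `want` rule holds and no `exclude` rule holds."""
    wanted = all(rule_matches(picture, r) for r in query.get("want", []))
    excluded = any(rule_matches(picture, r) for r in query.get("exclude", []))
    return wanted and not excluded
```

Repeating a rule kind (two `has_object` entries under `want`) therefore means "and", while several values inside one rule mean "or".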
### Example
~~~yaml
# Restrictions for queried images
want:
# This means that the image must satisfy both rules, so both a cat and a dog
- has_object: ["Cat (Felis catus)"]
- has_object: ["Dog (Canis lupus familiaris)"]
# Or we can put them in a list to mean that the image can contain any of the
# objects in the list
- has_object: ["Grass", "Flower"]
# So the image must contain a cat and a dog, as well as either grass or
# a flower
# The following rule restricts the images to those with a certain source,
# like a camera or a drawing; omitting this rule means that the images can
# be of any source
- nature: ["photo", "computer-3d-art"]
# The following rule restricts the images to those with a certain licence
- licence: ["CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
"Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
"Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
"X-public-domain-old", "X-public-domain"]
# Prohibitions for queried images
exclude:
# This means that the image must not contain any of the objects in the list
- has_object: ["Human"]
# This excludes images uploaded before the given date
- before_date: 1546300800
# This requires images to have a minimum resolution
- below_width: 800
- below_height: 600
# In summary, we want images that contain both a cat and a dog, plus either
# grass or a flower, but no human; that were uploaded on or after 2019-01-01;
# that are photos or 3D renders; that carry one of the listed permissive
# licences; and that are larger than 800x600 pixels.
~~~
You will get a list of JSON dictionaries with the following keys:
* `id` which is the picture's numeric ID.
* `title` which is the picture's title.
* `description` which is the picture's description.
* `author` which is the picture's author's username.
* `origin_url` which is the picture's origin URL.
* `timestamp` which is the Unix time at which the picture was posted.
* `width` and `height` which are the picture's dimensions, in pixels.
* `file_format` which is the picture's MIME type.
* `nature` which is the picture's nature.
* `licences` which is a list of licence identifiers.
* `replaces` which is the ID of the picture replaced by this one, if any.
* `replaced_by` which is the ID of the picture's replacement, if any.
* `regions` which represents the picture's annotations in the format described
below.
* `download` which is the URL of the image file.
If you query the endpoint `/api/picture/<id>/` you will also get:
* `rating_average` which is the average rating of the picture from one to five
stars.
* `rating_count` which is the number of ratings of each grade.
These two keys are left out of query results to avoid processing ratings when
they aren't needed.
Raw annotation data format
--------------------------
The client sends raw data for image annotations in a JSON format which is a list
of shapes. Each shape is a dictionary with the following keys:
* `type`: The type of the shape which can be:
* `bbox` (bounding box, rectangle)
* `polygon`
* `polyline`
* `point`
* `shape`: The shape data. Its format depends on the shape `type`:
* For `bbox` it is a dictionary with keys x, y, w, h:
~~~json
{"x": x, "y": y, "w": w, "h": h}
~~~
* For `polygon` and `polyline` it is a list of points; each point is a
dictionary with keys x and y:
~~~json
[{"x": x1, "y": y1}, {"x": x2, "y": y2}, ...]
~~~
The only difference between `polygon` and `polyline` is that the former is
supposed to be closed so the last point is connected to the first one.
* For `point` it is a dictionary with keys x and y:
~~~json
{"x": x, "y": y}
~~~
* All coordinates are floating-point numbers in the range [0, 1] and relative
to the image size, with the origin in the top-left corner.
* `object`: The ID of the type of object (label) depicted in the shape. This ID
is a human-readable string that must be registered in the system before
being used on shapes.
The server sends the same data back to the client so that it can display the
existing annotations for an image.
### Example
~~~json
[
{
"type": "bbox",
"shape": {"x": 0.1, "y": 0.1, "w": 0.5, "h": 0.5},
"object": "Cat (Felis catus)"
},
{
"type": "polygon",
"shape": [{"x": 0, "y": 0}, {"x": 1, "y": 0}, {"x": 0, "y": 1}],
"object": "Slice of pizza margherita"
},
{
"type": "point",
"shape": {"x": 0.5, "y": 0.5},
"object": "Cat (Felis catus) - left eye"
}
]
~~~
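
Before sending annotations, a client may want to check them against the format described above. The following Python function is a minimal sketch of such a check. The function name and error messages are made up for illustration, it assumes that `w` and `h` are also limited to [0, 1], and it cannot tell whether an `object` ID is registered on the server.

```python
from typing import Any, List

SHAPE_TYPES = {"bbox", "polygon", "polyline", "point"}


def _is_unit(value: Any) -> bool:
    """True for a real number in [0, 1]; booleans are rejected."""
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0 <= value <= 1)


def _has_unit_keys(value: Any, keys: set) -> bool:
    """True if value is a dict with exactly these keys, all in [0, 1]."""
    return (isinstance(value, dict)
            and set(value) == keys
            and all(_is_unit(v) for v in value.values()))


def validate_annotations(shapes: Any) -> List[str]:
    """Return a list of problems; an empty list means the data looks well-formed."""
    if not isinstance(shapes, list):
        return ["top level must be a list of shapes"]
    problems = []
    for i, item in enumerate(shapes):
        if not isinstance(item, dict):
            problems.append(f"shape {i}: must be a dictionary")
            continue
        kind, shape = item.get("type"), item.get("shape")
        if kind not in SHAPE_TYPES:
            problems.append(f"shape {i}: unknown type {kind!r}")
        elif kind == "bbox":
            if not _has_unit_keys(shape, {"x", "y", "w", "h"}):
                problems.append(f"shape {i}: bbox needs x, y, w, h in [0, 1]")
        elif kind == "point":
            if not _has_unit_keys(shape, {"x", "y"}):
                problems.append(f"shape {i}: point needs x, y in [0, 1]")
        else:  # polygon or polyline: a list of points
            if not (isinstance(shape, list)
                    and all(_has_unit_keys(p, {"x", "y"}) for p in shape)):
                problems.append(f"shape {i}: {kind} needs a list of points")
        if not isinstance(item.get("object"), str):
            problems.append(f"shape {i}: object must be a string ID")
    return problems
```

Calling `validate_annotations` on the example list above should return an empty list.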
Authenticating with the API
---------------------------
Your HTTP client must support cookies.
`POST` to `/api/login` with a JSON body containing keys `username` and
`password`.
`POST` to `/api/logout` to clear the session; you can also just delete the
cookie on your own.
User registration isn't supported by the API to prevent spam; use the HTML form.
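
As an illustration, a cookie-aware client can be put together with the Python standard library alone. This is a sketch under assumptions: `BASE_URL` is a placeholder for wherever the server runs, the helper names are invented here, and nothing is assumed about the body of the login response.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

BASE_URL = "https://example.org"  # placeholder; replace with the real server


def make_opener():
    """Build an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(username: str, password: str, base_url: str = BASE_URL):
    """Prepare (but do not send) the POST request for /api/login."""
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/api/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request with `opener.open(login_request("alice", "secret"))` keeps the session cookie in the jar, so later requests made through the same opener are authenticated.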
Uploading from the API
----------------------
You can upload pictures from the API if you're logged in. `POST` to
`/api/upload` with form data containing a key `json` whose value is a JSON
dictionary with `title`, `description`, `origin_url`, `licence` (list of IDs)
and `nature` (ID), plus a `file` field holding your image file. The JSON can
also have a key `annotations` if you want to prefill them; see above for the
format.
`/api/picture/<id>/update` allows you to pass `title`, `description`,
`origin_url`, `licence_ids` and `nature_id` in a JSON body to change an existing
picture.
Other endpoints
---------------
`/api/licence/` provides a dictionary of the licences with the keys being the
ID and the value being another dictionary with keys `title`, `free` (whether
it respects the Four Freedoms) and `pinned` which indicates if it is at the top
of the list.
`/api/licence/<id>/` provides more details for the particular licence, including
`id`, `title`, `description`, `info_url` which links to a page with information
about the licence, `legalese_url` which links to the actual legal text, `free`,
`pinned` and `logo_url`.
`/api/nature/` provides a dictionary of natures of the form `id: description`.
`/api/user/` provides information about users with the key being the name and
the value being a dictionary with the `admin` boolean. It uses `offset` and
`limit`.
`/api/user/<username>/` provides a dictionary with the user's `id`, `admin`
status and `joined` date.
`POST` to `/api/picture/<id>/rate` with a JSON body with a key `rating` will
put a star rating on the picture. To delete the rating call it with a false
value.
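
The `api_rate_picture` handler converts the value with `int(flask.request.json.get("rating", 0))` and treats a zero result as a deletion. The Python sketch below mirrors just that conversion, so you can see which JSON values count as "false". The function name is made up, and any range check the server may apply (such as 1 to 5 stars) is not modelled.

```python
from typing import Any, Dict, Optional


def interpret_rating(body: Dict[str, Any]) -> Optional[int]:
    """Return the star rating to store, or None when the rating is deleted."""
    # Same conversion as the handler: a missing key defaults to 0.
    rating = int(body.get("rating", 0))
    # Zero (from 0, false, or a missing key) means "delete the rating".
    return rating or None
```

With this conversion, `{"rating": 0}`, `{"rating": false}` and an empty body all delete the rating, whereas `{"rating": null}` makes `int()` raise a `TypeError`, so prefer `0` or `false`.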
`/api/gallery/<id>` will give metadata about the gallery in JSON: `id`, `title`,
`description`, `owner` and `users`. To get the pictures you can use a query.
`POST` to `/api/gallery/<id>/edit` with keys `title` and `description` in JSON
to update the gallery metadata.
`POST` to `/api/new-gallery` creates a new gallery with keys `title` and
`description` in the JSON.
`POST` to `/api/gallery/<id>/add-picture` with a JSON dictionary containing a
key `picture_id` will add the picture. `/api/gallery/<id>/add-picture` will do
the reverse.
`POST` to `/api/gallery/<id>/users/add` with a JSON dictionary containing a key
`username` will trust a user; `/api/gallery/<id>/users/remove` will do the
reverse.
templates/default.html
@@ -14,6 +14,7 @@
<li><a href="/">Home</a></li>
<li><a href="/object/">Object list</a></li>
<li><a href="/query-pictures">Query</a></li>
<li><a href="/info/">Help</a></li>
{% if session.username %}
<li><a href="/upload">Upload</a></li>
{% endif %}
@@ -43,6 +44,7 @@
<li><a href="/">Home</a></li>
<li><a href="/object/">Object list</a></li>
<li><a href="/query-pictures">Query</a></li>
<li><a href="/info/">Help</a></li>
{% if session.username %}
<li><a href="/upload">Upload</a></li>
{% endif %}
templates/help.html
@@ -0,0 +1,6 @@
{% extends "default.html" %}
{% block content %}
<x-frame style="--width: 768px">
{{ content | safe }}
</x-frame>
{% endblock %}
templates/home.html
@@ -2,10 +2,16 @@
{% block title %}Home | gigadata{% endblock %}
{% block content %}
<x-frame style="--width: 768px">
<h1>Gigadata</h1>
<h1>Gigadata (placeholder name)</h1>
<p>
Free image data for machine learning, computer vision, data science, research, and more.
Free/libre/open image data for machine learning, computer vision, data science, research, and more. <b>This is a test server.</b>
</p>
<p>Some actions you can do:</p>
<ul>
<li><a href="/upload">Upload a picture (but you must log in first)</a></li>
<li><a href="/object/">List of objects</a></li>
<li><a href="/query-pictures">Query the dataset</a></li>
</ul>
<h2>Random pictures</h2>
<ul class="thumbnail-list">
{% for resource in resources %}