Web platform for sharing free image data for ML and research

Homepage: https://datasets.roundabout-host.com

Better docs

by roundabout, Sunday, 8 September 2024, 10:58:30 (1725793110), pushed by roundabout, Sunday, 8 September 2024, 10:58:33 (1725793113)

Author identity: vlad <vlad.muntoiu@gmail.com>

138551222fdcdd2a3e86ca8278273ff04d4ab306

formats.md

@@ -66,7 +66,49 @@ Query format

The query format is based on YAML and used to query for pictures in the system.
                                        
### Structure

The root can have three keys:
                                        
* `want`: A list of rules that the images must satisfy. If not provided, no
  filtering is done.
* `exclude`: A list of rules that the images must not satisfy. If not provided,
  no filtering is done.
* `include_obsolete`: If true, the query may return images that have a
  designated replacement. If false (the default), such images are not returned.
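
As a minimal sketch of the three root keys used together, a query might look
like the following. The object name `tree` and the username `example-user` are
placeholders chosen for illustration, not values known to exist in the system.

~~~yaml
want:
    # Keep only images that contain this (placeholder) object
    - has_object: ["tree"]
exclude:
    # Drop images uploaded by this (hypothetical) user
    - author: ["example-user"]
# Also allow images that have a designated replacement
include_obsolete: true
~~~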
                                        
`want` and `exclude` are lists of rules. Each rule is a dictionary with a single
key, so that several rules of the same kind can appear in one list. Accepted
rules are:
                                        
* `has_object: [object1, object2, ...]`: The image must contain at least one of
  the objects in the list.
* `nature: [nature1, nature2, ...]`: The image must have one of the natures in
  the list. Natures are strings like "photo" that indicate the source of the
  image.
* `licence: [licence1, licence2, ...]`: The image must have one of the licences
  in the list. Where possible, licence IDs are SPDX identifiers; non-standard
  ones are prefixed with `X-`.
* `author: [author1, author2, ...]`: The image's author's username must be in
  the list.
* `title: query`: Search in titles (case-insensitive, `ilike`).
* `description: query`: Search in descriptions (case-insensitive, `ilike`).
* `origin_url: query`: Search for origin URLs. The query matches the beginning
  of the URL, excluding the protocol (for example, `commons.wikimedia.org`).
* `above_width: width`: The image must have a width greater than or equal to
  the given value, in pixels.
* `above_height: height`: The image must have a height greater than or equal to
  the given value, in pixels.
* `below_width: width`: The image must have a width less than or equal to the
  given value, in pixels.
* `below_height: height`: The image must have a height less than or equal to the
  given value, in pixels.
* `before_date: timestamp`: The image must have been uploaded before the given
  Unix timestamp.
* `after_date: timestamp`: The image must have been uploaded after the given
  Unix timestamp.
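
The rules above can be combined as in the sketch below. It assumes, as the
description of `want` suggests, that every listed rule has to hold, which is
why `has_object` can usefully appear twice. The object names `bee` and
`butterfly` and the numeric thresholds are illustrative placeholders.

~~~yaml
want:
    # Two rules of the same kind, each a single-key dictionary
    - has_object: ["flower"]
    - has_object: ["bee", "butterfly"]
    # Licence IDs are SPDX identifiers where possible
    - licence: ["CC0-1.0", "CC-BY-4.0"]
    # Matches the start of the origin URL, without the protocol
    - origin_url: "commons.wikimedia.org"
    # At least 1024 x 768 pixels
    - above_width: 1024
    - above_height: 768
    # Uploaded after 2024-01-01 00:00:00 UTC (Unix timestamp)
    - after_date: 1704067200
exclude:
    - nature: ["computer-3d-art"]
~~~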
                                        
`ordering`, `offset` and `limit` can be specified as query parameters in the
URL. `ordering` can be one of `date-desc`, `date-asc`, `title-asc`, `title-desc`,
`number-regions-desc`, `number-regions-asc`, `random`. `offset` and `limit` are
integers that specify the number of images to skip and the maximum number of
images to return, respectively.
                                            
### Example

~~~yaml

@@ -82,13 +124,13 @@ want:

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # a flower
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # The following rule restricts the images to those with a certain source,
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # like a camera or a drawing; omitting this rule means that the images can
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # be of any type
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            - nature: ["photo", "drawing"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # be of any source
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            - nature: ["photo", "computer-3d-art"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # The following rule restricts the images to those with a certain licence
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                - licence: ["CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            "Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        -                "Apache-2.0", "Informal-attribution", "Informal-do-anything",
                                        -                "Public-domain-old", "Public-domain"]
                                        +                "Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
                                        +                "X-public-domain-old", "X-public-domain"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # Prohibitions for queried images
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            exclude:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # This means that the image must not contain any of the objects in the list
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -100,6 +142,6 @@ exclude:

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                - below_height: 600
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # In summary, we want images that contain both a cat and a dog, either grass
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # or a flower, but not a human, taken after 2019-01-01, must be a photo or a
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        -# drawing, must carry one of certain permissive licences and have a resolution
                                        +# 3D render, must carry one of certain permissive licences and have a resolution
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # of at least 800x600 pixels.
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            ~~~
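
The rules visible in this part of the example can be read as a predicate over one image's metadata. The following is only a minimal Python sketch of that reading, not the platform's implementation: the record keys (`nature`, `licence`, `height`) and the `matches` function are hypothetical names chosen for illustration. It covers just the three rules shown here: the `nature` list, the `licence` list, and `below_height: 600` under `exclude`.

```python
# Hypothetical sketch: key names and matching semantics are assumptions
# inferred from the example query, not the platform's actual code.

ALLOWED_NATURES = {"photo", "computer-3d-art"}

ALLOWED_LICENCES = {
    "CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
    "Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
    "X-public-domain-old", "X-public-domain",
}

# From `exclude: - below_height: 600`: images shorter than this are dropped.
MIN_HEIGHT = 600


def matches(image: dict) -> bool:
    """Return True if the image record passes the three rules sketched here."""
    if image["nature"] not in ALLOWED_NATURES:
        return False
    if image["licence"] not in ALLOWED_LICENCES:
        return False
    # An image exactly 600 px tall is not "below" 600, so it is kept.
    if image["height"] < MIN_HEIGHT:
        return False
    return True


if __name__ == "__main__":
    sample = {"nature": "photo", "licence": "CC0-1.0", "height": 1080}
    print(matches(sample))
```

Sets are used for the two lists because each rule is a simple membership check; the object and date rules from the rest of the query are left out because their exact semantics are not shown in this hunk.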