Web platform for sharing free data for ML and research

Better docs

roundabout,
created on Sunday, 8 September 2024, 10:58:30 (1725793110), received on Sunday, 8 September 2024, 10:58:33 (1725793113)
Author identity: vlad <vlad.muntoiu@gmail.com>

138551222fdcdd2a3e86ca8278273ff04d4ab306

formats.md

@@ -66,7 +66,49 @@ Query format

The query format is based on YAML and is used to query for pictures in the system.

### Structure
                                        
The root can have three keys:

* `want`: A list of rules that the images must satisfy. If not provided, no
  filtering is done.
* `exclude`: A list of rules that the images must not satisfy. If not provided,
  no filtering is done.
* `include_obsolete`: If true, the query may return images with a designated
  replacement. If false (default), such images are not returned.
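
As a minimal sketch of how the three root keys fit together (the object names are illustrative assumptions, not values taken from the actual data; the individual rule kinds are described below):

~~~yaml
# Illustrative query; "flower" and "person" are assumed object names.
want:
  - has_object: ["flower"]
exclude:
  - has_object: ["person"]
include_obsolete: false  # the default, written out for clarity
~~~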
                                        
`want` and `exclude` are lists of rules. Each rule is a dictionary with a single
key (this is to allow multiple rules of the same kind). Accepted rules are:

* `has_object: [object1, object2, ...]`: The image must contain any of the
  objects in the list.
* `nature: [nature1, nature2, ...]`: The image must have one of the natures in
  the list. Natures are strings like "photo" that indicate the source of the
  image.
* `licence: [licence1, licence2, ...]`: The image must have one of the licences
  in the list. Where possible, licence IDs are SPDX identifiers; non-standard
  ones are prefixed with `X-`.
* `author: [author1, author2, ...]`: The image's author's username must be in
  the list.
* `title: query`: Search for titles (case-insensitive, `ilike`).
* `description: query`: Search for descriptions (case-insensitive, `ilike`).
* `origin_url: query`: Search for origin URLs. The query matches the beginning
  of the URL, excluding the protocol (for example, `commons.wikimedia.org`).
* `above_width: width`: The image must have a width greater than or equal to
  the given value, in pixels.
* `above_height: height`: The image must have a height greater than or equal to
  the given value, in pixels.
* `below_width: width`: The image must have a width less than or equal to the
  given value, in pixels.
* `below_height: height`: The image must have a height less than or equal to the
  given value, in pixels.
* `before_date: timestamp`: The image must have been uploaded before the given
  Unix timestamp.
* `after_date: timestamp`: The image must have been uploaded after the given
  Unix timestamp.
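
The rule kinds above can be combined in one query. The following sketch is illustrative only: the object names, the title text, the username and the dates are assumptions chosen for the example. It repeats `has_object` to show why each rule is its own single-key dictionary. Since every rule in `want` must be satisfied, this presumably selects images that contain a flower or a tree and also a bee.

~~~yaml
want:
  # Two rules of the same kind, each a separate single-key dictionary
  - has_object: ["flower", "tree"]
  - has_object: ["bee"]
  # Title search (ilike)
  - title: "garden"
  # Origin URL prefix, written without the protocol
  - origin_url: "commons.wikimedia.org"
  # At least 640 pixels wide and 480 pixels high
  - above_width: 640
  - above_height: 480
  # Uploaded during 2024 (Unix timestamps)
  - after_date: 1704067200   # 2024-01-01 00:00:00 UTC
  - before_date: 1735689600  # 2025-01-01 00:00:00 UTC
exclude:
  # "example_user" is a hypothetical username
  - author: ["example_user"]
~~~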
                                        
`ordering`, `offset` and `limit` can be specified as query parameters in the
URL. `ordering` can be one of `date-desc`, `date-asc`, `title-asc`, `title-desc`,
`number-regions-desc`, `number-regions-asc`, `random`. `offset` and `limit` are
integers that specify the number of images to skip and the maximum number of
images to return, respectively.
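
For instance, to fetch the third page of ten images, newest first, the query string might look as follows. The endpoint path is not specified here, so only the parameters are shown.

~~~
?ordering=date-desc&offset=20&limit=10
~~~

With `offset=20` and `limit=10`, the first 20 matching images are skipped and at most the next 10 are returned.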
                                        
### Example

~~~yaml

@@ -82,13 +124,13 @@ want:

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # a flower
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # The following rule restricts the images to those with a certain source,
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # like a camera or a drawing; omitting this rule means that the images can
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # be of any type
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            - nature: ["photo", "drawing"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            # be of any source
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            - nature: ["photo", "computer-3d-art"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # The following rule restricts the images to those with a certain licence
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                - licence: ["CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                            "Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        -                "Apache-2.0", "Informal-attribution", "Informal-do-anything",
                                        -                "Public-domain-old", "Public-domain"]
                                        +                "Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
                                        +                "X-public-domain-old", "X-public-domain"]
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # Prohibitions for queried images
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            exclude:
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                # This means that the image must not contain any of the objects in the list
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        

@@ -100,6 +142,6 @@ exclude:

                                
                                
                                
                            
                                
                                    
                                        
                                            
                                                - below_height: 600
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # In summary, we want images that contain both a cat and a dog, either grass
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # or a flower, but not a human, taken after 2019-01-01, must be a photo or a
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                        -# drawing, must carry one of certain permissive licences and have a resolution
                                        +# 3D render, must carry one of certain permissive licences and have a resolution
                                        
                                        
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            # of at least 800x600 pixels.
                                        
                                        
                                            
                                            
                                            
                                            
                                        
                                    
                                
                                
                                
                            
                                
                                    
                                        
                                            
                                            ~~~
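
The summary comment above can be read as a predicate over an image's metadata. The following is a minimal sketch in Python of the nature, licence and resolution rules only, using the values from the new side of this diff. The metadata keys (`nature`, `licence`, `width`, `height`) and the function name `matches` are hypothetical and chosen for illustration; they are not the platform's API. The 800 pixel minimum width is taken from the summary comment, since only `below_height: 600` is visible in this hunk.

```python
# Hypothetical sketch: the metadata keys and the function name are assumptions
# made for illustration, not the platform's actual API.

ALLOWED_NATURES = {"photo", "computer-3d-art"}

ALLOWED_LICENCES = {
    "CC-BY-1.0", "CC-BY-2.0", "CC-BY-3.0", "CC-BY-4.0", "CC0-1.0",
    "Unlicense", "WTFPL", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "Apache-2.0", "X-informal-attribution", "X-informal-do-anything",
    "X-public-domain-old", "X-public-domain",
}

MIN_WIDTH = 800   # from the summary comment ("at least 800x600 pixels")
MIN_HEIGHT = 600  # from the `below_height: 600` exclusion rule


def matches(image: dict) -> bool:
    """Return True if the image passes the nature, licence and size rules."""
    if image["nature"] not in ALLOWED_NATURES:
        return False
    if image["licence"] not in ALLOWED_LICENCES:
        return False
    # The exclude rules reject images below the minimum dimensions.
    if image["width"] < MIN_WIDTH or image["height"] < MIN_HEIGHT:
        return False
    return True


example = {"nature": "photo", "licence": "CC0-1.0", "width": 1024, "height": 768}
print(matches(example))  # True
```

Each rule is a plain membership or threshold check, so an image is accepted only when every rule passes, which mirrors the way the summary joins its conditions with "and".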