How Mascots Work

Understanding the data model behind Masko's API.

The Hierarchy

Every mascot in Masko follows a four-level hierarchy: projects group your work, collections define characters, items represent poses or actions, and assets are the generated files.

Project
  └── Collection (= one mascot character)
        ├── Item: "Wave"
        │     ├── Asset: image (pose.png)
        │     ├── Asset: transparent_image (pose_nobg.png)
        │     ├── Asset: video (wave.mp4)
        │     ├── Asset: webm (wave.webm)
        │     └── Asset: hevc (wave.mov)
        ├── Item: "Idle"
        │     └── ...
        └── Item: "Thumbs Up"
              └── ...
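
The tree above can be mirrored in a few nested records. This is only a minimal sketch of the shape of the data, not Masko's actual schema: the class names, field names, and the asset_paths helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Asset:
    type: str      # e.g. "image", "transparent_image", "video", "webm", "hevc"
    filename: str  # hypothetical field; the real API may expose a URL instead


@dataclass
class Item:
    name: str
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    type: str = "mascot"  # mascots are collections with type "mascot"
    items: List[Item] = field(default_factory=list)


@dataclass
class Project:
    name: str
    collections: List[Collection] = field(default_factory=list)


def asset_paths(project: Project) -> Iterator[str]:
    """Walk all four levels and yield one path-like string per asset."""
    for collection in project.collections:
        for item in collection.items:
            for asset in item.assets:
                yield f"{project.name}/{collection.name}/{item.name}/{asset.filename}"


wave = Item("Wave", [Asset("image", "pose.png"), Asset("webm", "wave.webm")])
project = Project("Demo", [Collection("Fox", items=[wave])])
paths = list(asset_paths(project))
```

Here paths holds one entry per asset, each prefixed by its project, collection, and item.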

Collections (= Mascots)

A collection represents a single mascot character. In the API, mascots are stored as collections with type: "mascot". The CLI and MCP server use the friendlier term "mascot" (the CLI command masko mascots create, the MCP tool create_mascot), but both call the same API endpoints (/v1/collections).

Each collection holds the character's prompt (the text description used for generation), reference images (up to 6 examples of what the character looks like), and a style card (an auto-extracted summary of the character's visual traits).

Collections also store settings like animation sizes, CDN configuration, and the caution list used to maintain consistency across generations.

Items

An item is a single pose or action for the mascot - like "Wave", "Idle", or "Thumbs Up". Each item has a name, a prompt describing the action, and a type (image, animation, or logo).

When you generate an image for an item, the API combines the collection's character prompt with the item's action prompt to produce a consistent result.

Assets

An asset is a single generated file. Each item can have multiple assets of different types:

Type	Format	Description
`image`	.png	Original generated image with background
`transparent_image`	.png	Background removed, transparent PNG
`video`	.mp4	Animated version (H.264)
`webm`	.webm	Web-optimized format with alpha channel
`hevc`	.mov	Apple-compatible format with alpha channel
`stacked_video`	.mp4	Stacked layout for custom alpha compositing
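
As an illustration of how a client might choose among these types, the sketch below picks a transparent animation by platform and falls back to the static transparent PNG. The platform labels, the pick_alpha_asset helper, and the fallback policy are assumptions for this example, not part of Masko's API.

```python
from typing import Dict, Optional

# Per the table: hevc is the Apple-compatible alpha format, webm the web one.
ALPHA_VIDEO_BY_PLATFORM = {"apple": "hevc", "web": "webm"}


def pick_alpha_asset(assets: Dict[str, str], platform: str) -> Optional[str]:
    """Return the file to show for a platform.

    `assets` maps an asset type to a filename (hypothetical shape).
    Falls back to the static transparent PNG when no matching video exists.
    """
    preferred = ALPHA_VIDEO_BY_PLATFORM.get(platform, "webm")
    if preferred in assets:
        return assets[preferred]
    return assets.get("transparent_image")


wave_assets = {
    "transparent_image": "pose_nobg.png",
    "webm": "wave.webm",
    "hevc": "wave.mov",
}
apple_file = pick_alpha_asset(wave_assets, "apple")
web_file = pick_alpha_asset(wave_assets, "web")
static_file = pick_alpha_asset({"transparent_image": "pose_nobg.png"}, "web")
```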

Generation Graph

Assets are connected through a generation graph. When you generate an animation, the API runs four steps in order: it creates an image, removes the background, animates the result, and converts the video to web formats. Each step's output links back to its source via the generation_links table.

image (.png)
  └── transparent_image (.png)    [role: source]
        └── video (.mp4)          [role: source, end_frame]
              ├── webm (.webm)    [role: source]
              └── hevc (.mov)     [role: source]

The role field on each link indicates the relationship. A source link means "this asset was derived from that asset". An end_frame link is used for animations where a final pose image guides the motion.
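
To make the graph concrete, here is a small sketch that follows source links from a derived asset back to the original image. The columns of generation_links are not documented here, so the (derived, source, role) tuple shape and all asset IDs below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical link rows: (derived_asset_id, source_asset_id, role)
LINKS: List[Tuple[str, str, str]] = [
    ("timg_1", "img_1", "source"),
    ("vid_1", "timg_1", "source"),
    ("vid_1", "timg_2", "end_frame"),  # hypothetical final-pose image
    ("webm_1", "vid_1", "source"),
    ("hevc_1", "vid_1", "source"),
]


def lineage(asset_id: str, links: List[Tuple[str, str, str]]) -> List[str]:
    """Follow role == "source" links from an asset back to its root."""
    source_of = {derived: src for derived, src, role in links if role == "source"}
    chain = [asset_id]
    while chain[-1] in source_of:
        parent = source_of[chain[-1]]
        if parent in chain:  # guard against accidental cycles
            break
        chain.append(parent)
    return chain


webm_chain = lineage("webm_1", LINKS)
```

Only source links are followed. The end_frame link is ignored because it marks a guiding pose rather than the asset's origin.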

Reference Images & Style Cards

Each collection can have up to 6 reference images. These are examples of what the mascot looks like - they guide every generation to maintain visual consistency.

When you first generate an image, Masko automatically extracts a style card from the references. The style card is a structured summary of the character's visual traits (colors, proportions, line style, shading) that gets injected into every prompt. If you change the references, the style card is cleared and re-extracted on the next generation.

The caution list accumulates notes from post-generation validation. If a generated image drifts from the style (wrong color, missing detail), the issue is logged and injected into future prompts to prevent recurrence.
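
The interplay of references, style card, and caution list can be sketched as a small state object. This is an assumption-heavy illustration: the class, the prompt layout ("Style:" and "Avoid:" lines), and the extract_style_card callback are invented for the example and say nothing about how Masko actually formats prompts.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_REFERENCES = 6  # each collection can have up to 6 reference images


@dataclass
class MascotState:
    character_prompt: str
    references: List[str] = field(default_factory=list)
    style_card: Optional[str] = None
    caution_list: List[str] = field(default_factory=list)

    def set_references(self, refs: List[str]) -> None:
        if len(refs) > MAX_REFERENCES:
            raise ValueError(f"at most {MAX_REFERENCES} reference images")
        if list(refs) != self.references:
            self.references = list(refs)
            self.style_card = None  # cleared; re-extracted on next generation

    def build_prompt(
        self, action_prompt: str, extract_style_card: Callable[[List[str]], str]
    ) -> str:
        # Lazily (re-)extract the style card the first time it is needed.
        if self.style_card is None and self.references:
            self.style_card = extract_style_card(self.references)
        parts = [self.character_prompt, action_prompt]
        if self.style_card:
            parts.append("Style: " + self.style_card)
        if self.caution_list:
            parts.append("Avoid: " + "; ".join(self.caution_list))
        return "\n".join(parts)


def fake_extractor(refs: List[str]) -> str:
    # Stand-in for the real extraction step, which is not specified here.
    return f"traits from {len(refs)} reference(s)"


state = MascotState("A friendly orange fox")
state.set_references(["fox_front.png"])
first_prompt = state.build_prompt("waving hello", fake_extractor)

state.caution_list.append("tail tip must be white")
state.set_references(["fox_front.png", "fox_side.png"])  # clears the style card
card_after_change = state.style_card
second_prompt = state.build_prompt("waving hello", fake_extractor)
```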

Size Variants

Animations can be generated at multiple sizes simultaneously. Set settings.animation_sizes when creating a collection, or update publish_params.animation_sizes later, to define numeric pixel sizes such as [720, 480, 360, 240]. Resizing is free - you only pay credits for the base animation generation.
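
A request body for creating a collection with size variants might be assembled as below. Only type: "mascot" and settings.animation_sizes come from this page. The name and prompt field names, the helper functions, and the choice to deduplicate and sort sizes are assumptions for illustration.

```python
from typing import Any, Dict, Iterable, List


def normalize_sizes(sizes: Iterable[int]) -> List[int]:
    """Validate pixel sizes, drop duplicates, and sort largest first."""
    cleaned = set()
    for size in sizes:
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            raise ValueError(f"animation size must be a positive integer, got {size!r}")
        cleaned.add(size)
    return sorted(cleaned, reverse=True)


def build_collection_payload(
    name: str, prompt: str, animation_sizes: Iterable[int]
) -> Dict[str, Any]:
    """Build a body intended for the collections endpoint (/v1/collections)."""
    return {
        "type": "mascot",
        "name": name,      # assumed field name
        "prompt": prompt,  # assumed field name
        "settings": {"animation_sizes": normalize_sizes(animation_sizes)},
    }


payload = build_collection_payload(
    "Fox", "A friendly orange fox", [480, 720, 240, 360, 480]
)
```

Validating sizes on the client keeps malformed values such as strings or negative numbers out of the request. Whether the API itself accepts duplicates or requires a particular order is not stated here.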