Mcdoc is a schema format for describing data structures used by Minecraft, including its CODECs, JSONs, and NBTs.
This document defines the syntax and semantics of the mcdoc format.
Project root
Normally, the workspace is considered the root of a mcdoc project. The workspace is the directory where the mcdoc interpreter operates: for command line tools, this could be the working directory; for code editors like VS Code, this could be the root directory shown in the sidebar explorer.
If, however, there exists a folder named mcdoc directly under the workspace and all mcdoc files inside the workspace are stored under that directory, it will be considered as the root instead.
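The rule above can be sketched in Python as follows. This is a minimal illustration, not the behavior of any particular interpreter: the function name `find_project_root` is hypothetical, and the sketch assumes mcdoc files are recognized by the `.mcdoc` file extension.

```python
from pathlib import Path


def find_project_root(workspace: Path) -> Path:
    """Return the mcdoc project root for a workspace directory (sketch)."""
    candidate = workspace / "mcdoc"
    if not candidate.is_dir():
        return workspace
    # Every .mcdoc file in the workspace must live under <workspace>/mcdoc.
    for file in workspace.rglob("*.mcdoc"):
        if file.is_file() and candidate not in file.parents:
            return workspace
    return candidate
```

With this sketch, a single stray `.mcdoc` file outside the `mcdoc` folder is enough to make the workspace itself the root.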
Syntax syntax
Here is the syntax used by this document to describe the syntax of mcdoc — syntax syntax, if you wish.
Symbol | Meaning |
---|---|
| Literal |
U+ | Unicode character with the code point |
A* | A repeated zero or more times |
A+ | A repeated one or more times |
A? | A repeated zero or one times |
A \| B | Either A or B |
[ | One of the literals |
[ | Any literal from |
(A) | General grouping |
not A | Anything not A |
A except: B | A except B |
A lookahead: B | A followed by B, but only consumes A |
A nochild: B | A but B should not be a child of it |
| A referenced token rule |
| A referenced parser rule |
A token rule syntax cannot have any whitespace (spaces, tabs, CRs, or LFs) or tokens in between the individual parts.
A parser rule syntax can have whitespace and COMMENTS in between.
All syntax rules should be greedy (i.e. consume as many characters as possible).
Comments
Comments can be used in mcdoc to write information that can be seen only by other users viewing/editing your mcdoc files. They are ignored by a mcdoc interpreter.
To write a comment, simply put down two forward slashes (//): everything following them, until the end of the line, is treated as part of the comment. Comments can be put anywhere whitespace is allowed. Comments, however, cannot start with triple slashes (///), as that’s reserved for Doc comments.
// This is a comment.
struct Foo {
    Bar: boolean, // This is another one.
}
Doc comments
Doc comments are similar to comments syntax-wise, except that they start with triple slashes (///) instead. A block of doc comments can provide human-readable documentation for the component right after it to users of your mcdoc files. Unlike regular comments, doc comments can only be put in front of enum definitions, enum fields, struct definitions, struct fields, and type aliases, as part of [prelim]s.
The text content of a doc comment block should be treated as Markdown content, with the leading triple slashes (and up to one leading space after the slashes if all lines within the block share that one leading space) stripped.
/// This doc comment describes the struct Foo.
/// External tools, like VS Code, may show this block of text when the user hovers over the name "Foo".
struct Foo {
    /// This is another doc comment describing the field "Bar".
    Bar: boolean, // This is just a regular comment because it only starts with two slashes.
}
As the content of a doc comment block is treated as Markdown, certain characters might have special meaning. For example, if you write <foo> inside the doc comment, it might disappear when being shown to a user, as it may get interpreted as an XML tag by a Markdown parser. Escaping those special characters with a backslash (\) (e.g. \<foo>) will fix this.
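The stripping rule for doc comment blocks can be illustrated with a short Python sketch. The function name `doc_comment_text` is hypothetical, and the sketch assumes each input line is one doc comment line without its trailing line break.

```python
from typing import List


def doc_comment_text(lines: List[str]) -> str:
    """Join a block of `///` lines into the Markdown text it carries."""
    bodies = []
    for line in lines:
        stripped = line.lstrip()  # ignore indentation before the slashes
        if not stripped.startswith("///"):
            raise ValueError(f"not a doc comment line: {line!r}")
        bodies.append(stripped[3:])
    # Strip one leading space only if every line in the block has it.
    if all(body.startswith(" ") for body in bodies):
        bodies = [body[1:] for body in bodies]
    return "\n".join(bodies)
```

Note that in this sketch a bare `///` line has no leading space, so its presence prevents the shared space from being stripped for the whole block.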
Float
A float represents a decimal number. Scientific notation may be used with the letter e (case-insensitive).
1
+1.2
-1.2e3 // -1.2×10^3
Typed Number
A typed number is similar to a number used in SNBTs syntax-wise. It’s a normal number followed by a suffix indicating its type:
Suffix (case-insensitive) | Type |
---|---|
b | Byte |
s | Short |
l | Long |
f | Float |
d | Double |
(No suffix, integer) | Integer |
(No suffix, decimal) | Double |
1b // Byte 1
1 // Integer 1
1.2 // Double 1.2
1.2d // Double 1.2
1.2e1f // Float 12
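As a rough illustration of how the suffix decides the type, here is a small Python sketch. The name `parse_typed_number` and the regular expression are assumptions made for this example (the formal number grammar is not reproduced here). In particular, the sketch treats an unsuffixed number with an exponent, such as 1e3, as a decimal and therefore a Double.

```python
import re
from typing import Tuple, Union

# Assumed number shape: optional sign, digits, optional fraction and exponent.
_TYPED_NUMBER = re.compile(
    r"(?P<number>[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?)"
    r"(?P<suffix>[bslfdBSLFD]?)"
)
_SUFFIX_TYPES = {"b": "Byte", "s": "Short", "l": "Long", "f": "Float", "d": "Double"}


def parse_typed_number(text: str) -> Tuple[str, Union[int, float]]:
    """Return (type name, value) for a typed number such as `1b` or `1.2e1f`."""
    match = _TYPED_NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a typed number: {text!r}")
    number, suffix = match["number"], match["suffix"].lower()
    is_integer_literal = number.lstrip("+-").isdigit()
    if suffix:
        kind = _SUFFIX_TYPES[suffix]
    else:
        kind = "Integer" if is_integer_literal else "Double"
    value = int(number) if is_integer_literal else float(number)
    return kind, value
```

The sketch does not reject combinations such as a fractional value with a Byte suffix; a real interpreter would likely need to.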
Number range
A number range represents a range of numbers. Its syntax derives from number ranges used in Minecraft commands, with additional support for signaling an exclusive end using the strictly less than symbol (<). There are two types of ranges in mcdoc: float ranges, which consist of Floats, and integer ranges, which consist of Integers.
1 // Exactly 1
1..1 // Exactly 1
1..2 // Between 1 and 2 (inclusive on ends)
1<..<2 // Between 1 and 2 (exclusive on ends)
4.2.. // Greater than or equal to 4.2
4.2<.. // Greater than 4.2
..9.1 // Smaller than or equal to 9.1
..<9.1 // Smaller than 9.1
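The examples above can be read as a pair of optional bounds, each with an inclusive or exclusive flag. The following Python sketch makes that reading concrete. `NumberRange` and `parse_range` are hypothetical names, the number pattern is an assumption, and the sketch does not distinguish float ranges from integer ranges.

```python
import re
from dataclasses import dataclass
from typing import Optional

_NUM = r"[+-]?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
_RANGE = re.compile(
    rf"(?P<lo>{_NUM})?(?P<lo_ex><)?\.\.(?P<hi_ex><)?(?P<hi>{_NUM})?"
)


@dataclass(frozen=True)
class NumberRange:
    lo: Optional[float]
    hi: Optional[float]
    lo_exclusive: bool = False
    hi_exclusive: bool = False

    def contains(self, value: float) -> bool:
        if self.lo is not None:
            if value < self.lo or (self.lo_exclusive and value == self.lo):
                return False
        if self.hi is not None:
            if value > self.hi or (self.hi_exclusive and value == self.hi):
                return False
        return True


def parse_range(text: str) -> NumberRange:
    """Parse forms such as `1`, `1..2`, `1<..<2`, `4.2<..`, and `..<9.1`."""
    if re.fullmatch(_NUM, text):
        value = float(text)  # a single number means "exactly this value"
        return NumberRange(value, value)
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number range: {text!r}")
    lo = float(match["lo"]) if match["lo"] is not None else None
    hi = float(match["hi"]) if match["hi"] is not None else None
    return NumberRange(lo, hi, match["lo_ex"] is not None, match["hi_ex"] is not None)
```

For instance, `parse_range("1<..<2").contains(1.5)` is true while `parse_range("1<..<2").contains(1)` is false.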
String
A string represents a sequence of characters. It must be surrounded by double quotation marks ("). Certain characters need to be escaped by a backslash (\).
Escape sequence | Meaning |
---|---|
\" | A double quotation mark (") |
\\ | A backslash (\) |
\b | A backspace (Unicode U+0008) |
\f | A form feed (Unicode U+000C) |
\n | A newline (Unicode U+000A) |
\r | A carriage return (Unicode U+000D) |
\t | A tab (Unicode U+0009) |
"foo" // A string representing foo
"bar\"qux\\baz" // A string representing bar"qux\baz
Resource location
A resource location is similar to the resource location from Minecraft syntax-wise, except that a colon (:) must exist to disambiguate this from an Identifier.
minecraft:foo
:foo // This also means minecraft:foo, and is legal in Minecraft itself.
spyglassmc:bar
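A small Python sketch of the colon rule and the default namespace follows. The function name `parse_resource_location` is hypothetical, and the sketch deliberately skips validating which characters are allowed in the namespace and path.

```python
from typing import Tuple


def parse_resource_location(text: str) -> Tuple[str, str]:
    """Split `namespace:path`, defaulting an empty namespace to `minecraft`."""
    namespace, colon, path = text.partition(":")
    if not colon:
        # Without a colon the token would be read as an identifier instead.
        raise ValueError("a resource location in mcdoc must contain a colon")
    return (namespace or "minecraft", path)
```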
Identifier
An identifier is a case-sensitive name given to a type definition in mcdoc. It can contain any Unicode letters, numbers, and the underscore (_), but must not start with a digit.
It also must not be one of the reserved words.
struct Foo { // Foo is an identifier.
    B_1: boolean, // B_1 is an identifier.
}
Path
A path is used to locate a type definition across the mcdoc project. A sequence of two colons (::) is used as the path separator.
If a path starts with the path separator, it is an absolute path and will be resolved from the project root. Otherwise it is a relative path and will be resolved from the absolute path of the current file.
The absolute path of a file is determined by connecting the names of all its parent folders up until the root and the file’s own name (excluding the .mcdoc file extension) with the path separator, prepended by the path separator. Files named mod.mcdoc are a special case: their own name will not be part of their paths.
The absolute path of a type definition is the absolute path of the file where it resides joined with the identifier of the type definition by the path separator.
If multiple files/type definitions end up having the same path, only the earliest loaded one will take effect; the mcdoc interpreter should warn about and ignore all subsequent ones.
For relative paths, the keyword super may be used to move up one level from the current absolute path.
/
    foo.mcdoc (1)
    foo/
        bar.mcdoc (2)
        mod.mcdoc (3)
    qux.mcdoc (4)
1 | The absolute path of this file is ::foo . |
2 | The absolute path of this file is ::foo::bar . |
3 | The absolute path of this file is ::foo instead of ::foo::mod , as files named mod.mcdoc are special. This has the same path as <1>, and as <1> is shallower in the file structure, it is loaded first, meaning <3> is ignored in favor of <1> and a warning should be given. |
4 | The absolute path of this file is ::qux . |
If the content of /foo/bar.mcdoc is
struct Foo {} (1)
type Bar = super::super::qux::Something (2)
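1 | The absolute path for struct Foo is ::foo::bar::Foo . |
2 | The absolute path for type alias Bar is ::foo::bar::Bar . The relative path is interpreted as follows: start at ::foo::bar (the absolute path of the current file), the first super moves up to ::foo , the second super moves up to :: (the project root), qux then gives ::qux , and Something gives ::qux::Something . |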
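The path rules in this section can be sketched in Python as below. Both function names are hypothetical. Raising an error when super would move above the project root is an assumption of this sketch, as the behavior in that case is not stated here.

```python
from pathlib import PurePosixPath

SEP = "::"


def file_absolute_path(relative_file: str) -> str:
    """Map a file path relative to the project root to its mcdoc path."""
    path = PurePosixPath(relative_file.lstrip("/")).with_suffix("")
    parts = list(path.parts)
    if parts and parts[-1] == "mod":
        parts.pop()  # files named mod.mcdoc do not contribute a segment
    return SEP + SEP.join(parts)


def resolve(path: str, current_file_path: str) -> str:
    """Resolve an absolute or relative mcdoc path to an absolute path."""
    if path.startswith(SEP):
        return path  # already absolute
    segments = [s for s in current_file_path.split(SEP) if s]
    for segment in path.split(SEP):
        if segment == "super":
            if not segments:
                raise ValueError("cannot move above the project root")
            segments.pop()
        else:
            segments.append(segment)
    return SEP + SEP.join(segments)
```

Applied to the example above, `resolve("super::super::qux::Something", file_absolute_path("foo/bar.mcdoc"))` yields `::qux::Something`.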
Type
A type is an essential component of the mcdoc format. It defines a schema that actual data values must fit in to be valid.
Mcdoc may be used to describe the format of a wide range of data. This section will only provide some JSON data as examples for each type.
any type
The any type serves as the top type of mcdoc’s type system. Any other type, including any itself, is assignable to any. any cannot be assigned to any type other than any.
Example data for the any type:
null
true
[0, 1, 2, 3]
{ "foo": "bar" }
boolean type
The boolean type indicates a boolean value (false or true) is expected.
Example data for the boolean type:
false
true
string type
The string type indicates a string value is expected. The optional range defines the range of the length of the string.
Example data for the string type:
"foo"
"bar"
Literal boolean type
A literal boolean type is one of the two boolean values (false and true) that the data must match to be valid.
false
true
Literal string type
A literal string type is a string value the data must match literally to be valid.
""
"foo"
Literal number type
A literal number type includes a numeric value and a type the data must match literally to be valid.
-1
1.2f
42L
Numeric type
A numeric type indicates the data must be of that type to be valid. If the optional range is provided, then the data must also fit into that range.
byte
short@1..
float @ 4.2..9.1
Primitive array type
A primitive array type indicates the data must be a collection of certain numeric values. The first optional range defines the range the value must be in, while the second optional range defines the range of the size of the collection.
byte[] // A collection of bytes.
byte@0..1[] // A collection of bytes 0 or 1.
int[] @ 4 // A collection of 4 integers.
long@0..[] @ 3.. // A collection of 3 or more non-negative longs.
List type
A list type indicates the data must be a collection of a certain other type. The optional range defines the range of the size of the collection.
[byte] // A collection of bytes.
[[string]] // A collection of collections of strings.
[struct Foo {}] // A collection of structs.
Unlike NBT, JSON doesn’t distinguish between primitive arrays and lists; it only has an array type. Therefore, byte[] and [byte] mean essentially the same thing for JSON validation.
Tuple type
A tuple type indicates the data must be a collection of certain other types arranged in a specified order.
To distinguish a tuple type containing only one element from a list type, a trailing comma (,) needs to be added after the type. Alternatively, you can also use a list type with size 1 to represent a tuple with one element (e.g. [byte] @ 1).
[byte,] // A tuple of a byte.
[string, boolean] // A tuple of a string followed by a boolean.
Tuple types are generally not useful for NBT structures, as NBT doesn’t have collections of mixed types.
Struct
A struct defines the schema of a dictionary-like structure consisting of key-value pairs, like a JSON object or an NBT compound tag. If a key is duplicated, the type of the later one will override that of the former one. A question mark (?) can be added between the key and the colon (:) to indicate an optional field.
struct Tag {
    replace?: boolean,
    values: [string],
}
The spread operator (three dots, ...) followed by a struct type can be used to reuse fields from another struct.
struct Player {
    ...Mob, // Reuse fields from the Mob struct here.
    abilities: Abilities,
    CustomName: (), // Overrides CustomName from the Mob struct to an empty union.
}
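One way to picture how spreading and overriding interact is to treat a struct as an ordered mapping that is filled from top to bottom. The Python sketch below does this. `build_struct` is a hypothetical helper, type names are kept as plain strings, and the Health field of Mob is invented for the illustration. Applying the "later overrides earlier" rule uniformly to spread fields is an assumption of this sketch.

```python
from typing import Dict, Tuple


def build_struct(*entries: Tuple) -> Dict[str, str]:
    """Collect struct fields in order; later keys override earlier ones.

    Each entry is either ("spread", fields_dict) or ("field", key, type_text).
    """
    fields: Dict[str, str] = {}
    for entry in entries:
        if entry[0] == "spread":
            fields.update(entry[1])  # copy every field of the spread struct
        else:
            _, key, type_text = entry
            fields[key] = type_text
    return fields


# Hypothetical Mob fields, only for this example.
mob = {"Health": "float", "CustomName": "string"}
player = build_struct(
    ("spread", mob),
    ("field", "abilities", "Abilities"),
    ("field", "CustomName", "()"),
)
```

Here `player["CustomName"]` ends up as the empty union, while `Health` is carried over from `mob` unchanged.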
Although type parameters are not directly allowed in struct definitions, you can inline a struct on the right hand side of a type alias definition.
type Tag<V> = struct {
    replace?: boolean,
    values: [V],
}
type BlockTag = Tag<#[id=block] string>
type EntityTypeTag = Tag<#[id=entity_type] string>
type FunctionTag = Tag<#[id=function] string>
type ItemTag = Tag<#[id=item] string>
Indexing on a type
Indices can access a type from a dispatcher or get a field type from an existing struct, both statically (i.e. the user provides the key literally in the mcdoc file) and dynamically (i.e. the user specifies a way to get the key from the given data structure at runtime).
struct Foo {
    id: string,
    cow_data: minecraft:entity[cow], (1)
    dynamic_entity_data: minecraft:entity[[id]], (2)
    command: minecraft:block[command_block][Command], (3)
    dynamic_memories: minecraft:entity[[id]][Brain][memories], (4)
}
1 | Static index on a dispatcher. |
2 | Dynamic index on a dispatcher. |
3 | Static index on a dispatcher, followed by a static index on a struct. |
4 | Dynamic index on a dispatcher, followed by two static indices on two structs. |
The default value used for all cases (including the two mutable special keys, %none and %unknown) is the fallback case.
%fallback
The %fallback key can be used to access the fallback case of a dispatcher. It cannot be used on the left hand side of dispatch statements, as the fallback case is generated automatically and cannot be manually declared.
type AnyEntity = minecraft:entity[%fallback]
%none
The case corresponding to %none is used when the accessor of a dynamic index gets no value at runtime.
struct RandomIntGenerator {
    type?: ("uniform" | "binomial" | "constant"), (1)
    ...minecraft:random_int_generator[[type]], (2)
}
dispatch minecraft:random_int_generator[uniform, %none] to struct { min?: int, max?: int } (3)
1 | Note that type is defined as optional here. |
2 | The value of type at runtime is used as a dynamic index here. |
3 | The case corresponding to %none is dispatched to the struct here, so the random int generator can still get validated as a uniform generator properly when no value for type is provided at runtime. |
%unknown
The case corresponding to %unknown is used when an unknown key is used to access the dispatcher.
dispatch minecraft:block[%unknown] to ()
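The interplay of %fallback, %none, and %unknown can be summarized with a toy dispatcher in Python. The `Dispatcher` class is a hypothetical sketch and type names are plain strings. Building the fallback union only from the types registered under ordinary keys is an assumption, as the text does not say whether the special cases take part in it.

```python
from typing import Dict, List, Optional, Tuple, Union

FALLBACK, NONE, UNKNOWN = "%fallback", "%none", "%unknown"


class Dispatcher:
    """Toy dispatcher: maps keys to type names (plain strings here)."""

    def __init__(self) -> None:
        self.cases: Dict[str, str] = {}

    def dispatch(self, keys: List[str], target: str) -> None:
        """Mimic `dispatch <dispatcher>[<keys>] to <target>`."""
        for key in keys:
            if key == FALLBACK:
                raise ValueError("%fallback cannot be declared manually")
            self.cases[key] = target

    def fallback(self) -> Tuple[str, ...]:
        """Union of the types registered under ordinary keys (an assumption)."""
        members: List[str] = []
        for key, target in self.cases.items():
            if key not in (NONE, UNKNOWN) and target not in members:
                members.append(target)
        return tuple(members)

    def lookup(self, key: Optional[str]) -> Union[str, Tuple[str, ...]]:
        """Resolve a key; None models a dynamic index that found no value."""
        if key == FALLBACK:
            return self.fallback()
        if key is None:
            special = NONE
        elif key in self.cases:
            return self.cases[key]
        else:
            special = UNKNOWN
        if special in self.cases:
            return self.cases[special]
        return self.fallback()  # default for undeclared %none / %unknown
```

In this model, declaring a %unknown case replaces the generated union for unknown keys, while %fallback keeps returning the union.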
%key
The %key accessor key can be used to access the key where the current runtime value is.
struct DebugStick {
    DebugProperty: struct {
        [#[id=block] string]: mcdoc:block_state_name[[%key]], // Get the type of the block state names of the block stored in the key.
    },
}
This struct can be used to validate the following data:
{
"DebugProperty": {
"minecraft:anvil": "facing",
"minecraft:oak_fence": "east"
}
}
%parent
The %parent accessor key can be used to access the parent value of the current runtime value.
struct Item {
    id: #[id=item] string,
    tag: struct ItemTag {
        BlockStateTag: mcdoc:block_item_states[[%parent.id]]
    },
}
TODO
File Structure
An mcdoc file is made of structs, enums, type alias statements, use statements, injections, and dispatch statements.
Type alias statement
A type alias can be created to refer to another complicated type for better code readability and reusability.
type Integer = (byte | short | int | long)
type Float = (float | double)
type Number = (Integer | Float)
Sometimes we may want to create different type definitions that have roughly the same structure and only differ in some small aspects. Instead of duplicating code, we can create a "template" type alias with type parameters. The right-hand side of the type alias statement can then reference those type parameters, which will get replaced by actual types when the type alias is instantiated elsewhere.
type NumericRange<T> = ( (1)
    T | (2)
    [T, T] | (2)
    struct { min: T, max: T } (2)
)
type FloatRange = NumericRange<float> (3)
type IntegerRange = NumericRange<int> (3)
type NaturalRange = NumericRange<int @ 0..> (3)
1 | The type parameter T is declared in the angle brackets. |
2 | The type parameter T can now be referenced on the right-hand side. |
3 | When the NumericRange type alias is referenced elsewhere, an actual type must be supplied for the type parameter. |
Binding type parameters
All path references are resolved by the rules described in Path, and type parameter references are no exception. When a type parameter is declared in a type alias statement, it is temporarily bound to the current module until the end of the statement. Therefore, just like other type definitions, type parameters should be unique at the module scope.
// File '/example.mcdoc'
struct T {}
type List<T> = [T] (1)
//        ^
// WARNING: Duplicated declaration for "::example::T"
1 | The declaration for T is warned and ignored, and the reference of T on the right-hand side actually refers to the struct T defined above. |
type List<T> = [T]
type Struct<T> = struct { value: T } (1)
1 | This is fine, as although T is also declared in the List type alias statement, the effect of that declaration only lives until the end of that statement. |
Dispatch statement
A dispatcher can be used to dispatch to a specific type from a given index. Each case of a dispatcher can be declared by a DispatchStatement and accessed by a DispatcherType.
Dispatchers are named after Resource locations, so unlike other values in mcdoc that are named after Identifiers which require being imported before they can be used in an external file, dispatchers are inherently global and can be accessed anywhere inside an mcdoc project.
When an unknown index is used to access a dispatcher, a union consisting of all types registered under the dispatcher is generated as a fallback case at runtime. The union is marked with the "nonexhaustive" metadata.
TODO
Attribute
All following examples are syntactically legal under the current attribute proposal. Which ones should be semantically legal, however, is still under debate.
struct Foo {
    #[id=item]
    id1: string,
    id2: #[id=item] string,
    // id1 and id2 will likely both be supported and have equivalent effects.
    blockStateValue1: (
        #[serializable] string |
        byte | short | int | long | float | double
    ),
    #[serialize_to=string]
    blockStateValue2: (string | byte | short | int | long | float | double),
    evilUUID1: (
        #[until("1.16", uuid_string_to_compound)] #[parser=uuid] string |
        #[until("1.17", uuid_compound_to_array)] MostLeastCompound |
        int[] @ 4
    ),
    #[history{
        (#[parser=uuid] string, until="1.16", updater=uuid_string_to_compound),
        (MostLeastCompound, until="1.17", updater=uuid_compound_to_array),
    }]
    evilUUID2: int[] @ 4
}
Type instantiation
Type instantiation is the process of converting a user-defined type into a type that is easy for data validators to consume. A user-defined type can be categorized as follows for instantiation purposes:
- Indexed type: An indexed type.
- Self-contained type: A type where all information needed for data validators to function is contained inside the type itself. Includes any type, boolean type, string type, Literal boolean type, Literal string type, Literal number type, Numeric type, Primitive array type, and Enum.
- Container type: A type that provides some information on its own, but needs information from its children for the validation to be complete. Includes List type, Tuple type, and Struct.
- Reference type
- Dispatcher type
- Union type: A Union type.
Different procedures are used to instantiate each category of user-defined types.
Instantiate indexed type
First instantiate the part without the indices, then resolve the index on the instantiated type. Repeat until all indices are resolved.
Instantiate container type
Container types do not need to be instantiated. Their children are instantiated lazily when needed.
Instantiate reference type
Dereference the path.
If there are type parameters, replace all occurrences of them in the template type with the provided actual types. The resulting type is then instantiated again following the instantiation rules.
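The replacement step can be illustrated with a small Python sketch. The nested-tuple representation of types and the function name `substitute` are assumptions made for this example only. A plain string stands for a reference (or a type parameter), and compound types are `(kind, children)` pairs.

```python
from typing import Dict


def substitute(template, bindings: Dict[str, object]):
    """Replace type parameter references in a template type (sketch)."""
    if isinstance(template, str):
        # A reference: swap it if it names a bound type parameter.
        return bindings.get(template, template)
    kind, body = template
    if isinstance(body, dict):  # e.g. struct fields
        return (kind, {key: substitute(value, bindings) for key, value in body.items()})
    return (kind, [substitute(member, bindings) for member in body])


# NumericRange<T> from the type alias example, in the toy representation.
numeric_range = (
    "union",
    ["T", ("tuple", ["T", "T"]), ("struct", {"min": "T", "max": "T"})],
)
float_range = substitute(numeric_range, {"T": "float"})
```

After substitution, `float_range` contains no remaining reference to `T`, and the result would then go through the instantiation rules again.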
Instantiate dispatcher type
Dispatch the type. The resulting type is then instantiated again following the instantiation rules.
Aftermath of instantiation
After a type is instantiated following the above rules, it should be simplified before being returned.
Type simplification
TODO
To simplify a union type, any members that can be assigned to another member will be removed from the union.
Shadowed types
TODO
Although simplifying (string | "foo" | "bar") into string is sound, we lose some more specific information about the original type that could be used by processors like auto completers. Therefore, for certain special cases, types that are trimmed during simplification may be accessible under the shadowedTypes property of the simplified type.
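A minimal Python sketch of union simplification with shadowed types follows. It uses a deliberately tiny type model: types are strings, and a string literal type is written with its quotes. The names `is_assignable` and `simplify_union` are hypothetical. The sketch records every removed member as shadowed, whereas the text above only promises this for certain special cases.

```python
from typing import List, Tuple


def is_assignable(source: str, target: str) -> bool:
    """Toy assignability: identity, any as top type, string literals to string."""
    if source == target or target == "any":
        return True
    is_string_literal = len(source) >= 2 and source[0] == source[-1] == '"'
    return is_string_literal and target == "string"


def simplify_union(members: List[str]) -> Tuple[List[str], List[str]]:
    """Return (kept members, shadowed members) of a union."""
    unique = list(dict.fromkeys(members))  # drop exact duplicates, keep order
    kept: List[str] = []
    shadowed: List[str] = []
    for member in unique:
        # In this toy model no two distinct types are mutually assignable,
        # so a member is never removed together with the one covering it.
        covered = any(
            other != member and is_assignable(member, other) for other in unique
        )
        (shadowed if covered else kept).append(member)
    return kept, shadowed
```

For the example above, `simplify_union(["string", '"foo"', '"bar"'])` keeps only `string` and reports the two literals as shadowed.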
Type assignability
Types in mcdoc can be thought of as sets. Type A is assignable to type B if and only if A is a subset of B. any is the universal set that contains all other types, and an empty union (()) is the empty set. unsafe is a monster that’s both any and ().
(Well, any is TypeScript’s unknown and unsafe is TypeScript’s any. A config rule, enabled by default, will also be added to make any equivalent to unsafe, so most users don’t have to deal with a tediously sound validation mechanism: vanilla-mcdoc will probably use any instead of unsafe for marker’s data, which would make it illegal to assign it anywhere else that’s not an any or unsafe under a sound type system. I will update the docs and code later to add the unsafe type.)
TODO
TODO: Data validator hooks can contribute additional type assignability rules, e.g.
- For JSON: byte = short = int = long = float = double
- For NBT: boolean = (byte @ 0..1) ⊂ byte
Branding
"Mcdoc" is a common noun and should only have its first letter capitalized when it’s grammatically required to (e.g. at the beginning of the sentence).
Credits
The mcdoc format takes heavy inspiration from the nbtdoc format created by Yurihaia, licensed under the MIT License. Misode, MulverineX, NeunEinser, and vdvman1 also have provided valuable feedback for the mcdoc format.
This documentation is written with AsciiDoc.