The C-UP Programming Language


Version 0.3, June 2013





Introduction
Getting Started
Compiling
Debugging
Extracting source
Modules
Imports
Alias Imports
Static Initialisation
Static Data
Static Code
Unit Tests
Types
Value Types
Properties
Vectors
Matrices
Quaternions
Classes
Structs
Local-only Structs
Pointers
Const Pointers
Local Pointers
Arrays
Accessing elements
Bounds checking
Slicing
2D Arrays
Local Arrays
Value Arrays
Params arrays
Aliasing
Strings
Short strings
Escape sequences
Formatting
Enumerations
Unions
Bit Structs
Construction
Aliases
Type inference
Functions
Parameters
Aliasing
Local variables
Function Overloading
Overload Resolution
Member Functions
Virtual Member Functions
Abstract Member Functions
Constructors
Destructors
Multiple Virtual Dispatch
Nested Functions
Delegates
Anonymous Delegates
Operators
Assignment
Properties
Indexers
Intrinsic
Mathematical
Vector
Bits
Endian-ness
Processor
Memory
Entry Point
Attributes
Linking
Generics
Classes
Functions
Delegates
Mixins
Mixins as generic arguments
Expressions as generic arguments
Data
Variable modifiers
Static
Const
Readonly
Memory Heaps
Class Heaps
Heap Constraints
Heap management
Alignment
Class Alignment
Member Alignment
Reference Alignment
Weak References
Garbage Collection
Statements
For
Switch
Foreach / foreachr
Using
Parallel
Expressions
Arithmetic
Multiplication
Saturating Arithmetic
Operator Precedence
Increment / decrement
New / delete / local
Initialisation
Cast
As
Sizeof
Alignof
Symbolof
Typeof
Base
Protection levels
Parallelism
Parallel functions
Parallel execution
Nested Parallel Execution
Constant data
Typedef
Strings
Statics
Aliasing
Const constrained functions
Parallel variables
Shared memory
Heap
List
Streams
Debugging / Profiling
Exceptions
Handling Exceptions
User Exceptions
Language Exceptions
Processor Exceptions
Exceptions in Parallel Code
Exceptions in Static Initialisers
Call Stack
Attributes
Special attributes
Reflection
Finding a Symbol by Name
Creating an Instance
Invoking a function
Getting or Setting a Variable
Enumerations
Delegates
Attributes
Resources
Paths
Dynamic Loading
Asynchrony
Tasks
Streams
Copying Streams
Loading resources
Copying memory
Dependencies
Errors
Threads
Fibers
Co-routines
Managing fibers
Construction
Members
Fibers and Parallel Code
Root Fibers
Programs
Program Pipes
Pre-processor
Libraries
System.Collections
List<TYPE>
SortedList<TYPE>
Queue<ELEMENT_TYPE>
Stack<ELEMENT_TYPE>
System.IO.Stream
Stream.IO.TextStream
System.IO.Console
System.IO.Input
System.Array
System.Geom
OpenGL / OpenCL / OpenAL
System.Graphics
Integration with C
Calling conventions
Thoughts
Threads
Streaming data from devices
Running simulation tasks in parallel
GPU rendering while CPU does other things
Garbage collection
Interfaces
Components
Static virtual methods
Automatic Variables
To Do
Gather/scatter
Power of 2 arrays
Reference counting
Generic parameters
Bags
Fiber scheduling
Exceptions
This delegates
Local string concatenation
States
Mixin identifiers
Vectors
Short references
Tuples
Resources
Intrinsics
Broken features
Libraries
Runtime
Optimisation





C-UP is a statically typed procedural programming language in the C style with these major features:


- parallel execution

- SIMD vector types up to 256 bits

- multi-methods, nested functions, return type overloading

- UTF-8 and UTF-16 strings

- powerful arrays with slicing

- object orientation

- generics: classes, functions, delegates, mixins

- type inference

- optional garbage collection

- efficient use of stack to avoid GC overhead

- multiple data heaps

- data alignment

- reflection

- exceptions

- attributes (program metadata)

- fibers

- embedded resources

- 64 bit


The language has the following design goals:


- High performance. Encourage the use of SIMD, multi-core, stack memory and compact data representations with correct alignment.

- Safety. Do everything possible to protect users from memory errors, concurrency errors, aliasing errors, etc. but without excessively impacting performance.

- Platform independence. As far as possible aim for no undefined or platform specific behaviour or data sizes.

- Pragmatism. If placing restrictions on a feature makes the implementation much more efficient then do so. If a feature cannot be implemented efficiently or incurs hidden costs then don’t incorporate it.

- The basics. Get them right so they aren’t endlessly re-implemented: strings, arrays, vectors, memory management, reflection, rtti, attributes (meta-data), collections, geometry, matrices, etc.



C-UP is intended to be used as an application programming language, meaning that the bulk of your application should be written using it. It specifically isn’t a systems programming language as it shuns things like threads, inline assembly, direct access to specific areas of memory and processor intrinsics, etc. In fact, you are actively encouraged to keep writing such code in C because C-UP makes it almost trivial to interface with C. Conversely, it also isn’t really designed as a scripting language as it’s big and complicated and presumes its runtime has control over threads, memory, file handles, etc.

Getting Started


In order to compile and run c-up programs you only need these 3 files: cuprt.exe, system.cuo and compile.cue.

Cuprt.exe contains the compiler internals and the runtime systems (garbage collector, parallel job runtime, debugger hooks, etc.) In the future a separate version without the compiler will be provided so you don’t have to pay the memory cost for it when it’s not required. Currently only a 32-bit implementation is available but a 64-bit version will be coming.


System.cuo is the c-up system libraries encapsulated into a single c-up object file (.cuo). In the future this will be split into separate libraries so you don’t have to pay the memory cost for parts you’re not using. The .cuo format is a proprietary binary file format.


When you launch cuprt.exe you must pass the name of a c-up executable (.cue) file as the first argument. The .cue extension can optionally be omitted. Any subsequent arguments are passed as arguments to the main entry point of the c-up executable. Arguments are strings separated by whitespace. If you need an argument to contain whitespace characters wrap the entire argument in quotes.


The cue file is a proprietary binary format. The code is stored in an intermediate language which is converted to native executable code at load time. JIT compilation isn’t a binding philosophical decision in c-up, it’s just how things are currently implemented because it’s easiest.




Although the main bulk of the compiler is built into cuprt.exe, the command line interface to it is implemented as a c-up program called “compile.cue”. Therefore to launch the c-up compiler from the command line, type “cuprt compile” followed by any number of compiler arguments.


Arguments to the compiler take three forms:


- Options all start with either the - or / character. Details of all available options can be found below.

- Response files start with the @ character, followed by a file name which must be relative to the root directory. A response file is just a text file that contains any number of other compiler arguments separated by whitespace.

- Any argument that does not start with one of the above characters is assumed to be the name of a source file to be passed to the compiler. If that file has the .cup extension then it’s treated as a c-up source file and is parsed and compiled as such. If it has any other extension then it’s a binary resource which is embedded in the output file without being changed in any way.


All source file names passed to the compiler are specified relative to the root directory. This root directory is the current working directory unless you override it using the –root option (see below).


Here’s a list of compiler options:



-root
Specifies the root folder for source code and resource files. If not specified the current working directory is used. The folder given can be either absolute or relative to the current working directory.



Specifies that the output is a library as opposed to an executable. When a library is saved only the modules corresponding to input source files are written - imported libraries are not written. It’s an error for a main entry point to be found in the source code when this option is specified.



Specifies that the output is an executable. When an executable is saved the entire symbol table is written including all libraries that have been imported. This means that c-up executables are entirely self-contained, requiring no additional libraries to be dynamically loaded. A single entry point must be found in the source code when this option is specified.



-out
Specifies the output file name (a .cuo or .cue file.) If this is not specified then all compilation is performed but no output is written. The file name is relative to the current working directory or absolute – it is not relative to the root folder.



Import a library (.cuo file) into the program being compiled. The file name is relative to the current working directory or absolute – it is not relative to the root folder. You can import the required system.cuo library from any location, but if you don’t specify it then it’s automatically imported from the same location as the compile command.




Define a program wide pre-processor symbol and give it an optional value. Pre-processor symbols are always signed long integers – if you don’t specify a value then it’s given the value 1.





Disable the specified runtime checks, all of which are enabled by default. Can be used to improve performance of final shipping builds.



Enable reference aliasing runtime checks. Unlike the tests above this is not enabled by default as it can be quite slow. Should be enabled for fully checked builds or automated tests.



Enable function inlining. You have no other control over inlining, it’s entirely at the compiler’s discretion.




Enable parallel execution of the resulting code.



Enable optimisations. Currently has no effect as optimisation is not implemented.



-debug
Specifies whether debugging information is embedded in the resulting file. Currently this just controls whether c-up source code is embedded in the resulting .cuo or .cue file.



Passed in when recompiling the system library.



Allow the use of the shared memory feature. See the section on parallelism for more details.



Run the resulting executable immediately if compilation succeeds. This allows you to compile and run in one command. It’s not necessary to write the executable to disk (although you can) as it’s executed directly from memory.




To compile the c-up file c:\projects\test\main.cup as an executable, writing out c:\projects\test\test.cue:


cuprt compile main.cup -out:test.cue


To execute the resulting exe:


cuprt test.cue




Debugging


C-UP comes with a debugger, written in C-UP. It’s called debug.cue and it takes 2 command line arguments: the first is the name of the program to debug and the second is an optional root directory for the file system with the current directory being used if it’s omitted.


You invoke the debugger just like the compiler:


cuprt debug programToDebug.cue


The debugger is only simple at the moment but it’s intended that it will become a full IDE. Here’s a picture. For now though, text editing must be done in your favourite editor.



As you can see there’s no menu bar so you can’t open and close projects. This means you’re restricted to debugging the same exe (.cue) for an entire session but that’s typically how it’s done anyway.


All the files in the project are shown in their folder structure on the right, which includes all the system libraries (under ‘Runtime’). Double click a .cup file to open it in the main area. All source code is embedded in the .cue file so there’s no risk of seeing out of date source code or any awkward path mappings to set up. The system libraries have source code embedded too so you can browse them freely. If you don’t want source code embedded in your project then don’t pass the –debug option to the compiler.


Many of the buttons on the toolbar should be self-explanatory if you’ve ever used a computer before, but a couple are a bit mysterious.


This is the re-load button, which you need to click after you recompile your project in order to re-load it into the debugger. All open source windows will refresh automatically. You can only reload when the program is stopped.


This button toggles the interleaved disassembly view for the current file, so you can see what code the compiler has generated (and where it’s got it wrong.) You don’t have to have the program running to toggle this view.


This group of buttons controls debugging. The buttons are:


1. Start debugging (F5)
2. Pause debugging
3. Stop debugging (Shift+F5)
4. Step in (F11)
5. Step out (Shift+F11)
6. Step over (F10)


Breakpoints can be toggled using F9, by clicking in the margin at the left of the source code, or by using the Breakpoints window. You can navigate to breakpoints by double clicking them in the Breakpoints window.


Ctrl+G lets you goto a line number in the current document.


Ctrl+F lets you find a string in the current document. F3 finds the next occurrence, Shift+F3 the previous.



Extracting source


Source code is embedded in c-up libraries and executables (if they were compiled with the –debug option.) This is advantageous because there’s no need to distribute multiple files and no possibility of source files being out of sync with the executable code – when you see the source you know for certain it’s exactly what was used to generate that exe.


But it is convenient to be able to extract the source code to separate files for browsing and that’s what the extract command does. To invoke extract use:


cuprt extract library-or-exe-name root-destination-folder



Modules


A C-UP program is arranged in modules, where each module is represented by a single source file. Source files must be UTF-8 or ASCII.


Modules exist inside packages, which are used to arrange related modules in a hierarchy and typically correspond to directories in the host file system.


Each source file should begin with a module declaration.


module Physics.Collision.Box;


The above declares a module called Box in the Collision package, which is itself in the Physics package.


If the module declaration is omitted then it is inferred from the path of the module source file back up to the root folder passed to the compiler, where the file name minus extension is the module name and directory names are package names. See the section on the compiler for details about the root folder setting. Because identifiers in C-UP are case sensitive and some file systems are not, it’s recommended to use explicit module naming rather than relying on this mechanism.
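For example (a hypothetical project layout; the root is whatever was given to the compiler’s root folder setting):

```
// the file <root>\Physics\Collision\Box.cup, containing no module
// declaration, is compiled as though it began with:
module Physics.Collision.Box;
```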


The use of modules means that there are no separate header and implementation files to maintain. Each file is compiled exactly once, and the declaration and use of program components is order independent.


Modules and packages create a hierarchy of namespaces. To access symbols in a namespace from outside of that namespace you must use the fully qualified name of the symbol. Let’s say you have a class BoxShape defined in the above module. To access this class from another module you’d have to type something like:


module Physics.Collision.Sphere;


Physics.Collision.Box.BoxShape boxShape; // fully qualified BoxShape name




Typing in fully qualified names is time consuming and verbose and is usually best avoided by using imports.

Imports

The namespace of a particular module can be imported into another module using an import declaration. Any number of modules can be imported but all import statements must directly follow the module declaration at the top of a source file.


Symbols in imported modules can be used without qualification.


module Physics.Collision.Sphere;


import Physics.Collision.Box;


BoxShape boxShape;         // no need to qualify the BoxShape name


If two symbols called BoxShape are found, then the compiler reports that there is an ambiguity. You can resolve the ambiguity by resorting to a fully qualified name. Note that in this regard the order of imports is irrelevant; all imports in the same module are of equal importance so changing their order does not affect the ambiguity. However, order of imports does affect the order of static initialisation (see Static Initialisation section.)


Imports are private by default meaning that a module importing another module will not have unqualified access to symbols imported by the imported module. However, this can be allowed by prefixing the import declaration with the public access modifier.


module Physics.Collision.Sphere;


public import Physics.Collision.Box;


The above makes all symbols defined in Physics.Collision.Box visible to any module importing Physics.Collision.Sphere. In general this isn’t a recommended practice as it increases the likelihood of name clashes (and slows down compilation), but it can be useful when a module depends on another module so fundamentally that they cannot really be used without each other.


Alias Imports


In the above scenario where you want to import 2 modules but that causes name clashes, there is an alternative to just falling back to using fully qualified names. Alias imports allow you to assign a unique identifier to a particular import, and you can then use this identifier to qualify access to members of this module.


import cdBox = Physics.Collision.Box;

import grBox = Renderer.Bounds.Box;


cdBox.BoxShape boxShape;

grBox.BoxShape boxShape2;



Static Initialisation


When a module is loaded by the runtime system, it is possible to initialise module scope/static data and to run module scope/static code once only.


The user is afforded some control over the order of module initialisation as follows.


1. Processing starts with the module containing the program entry point.
2. When a module starts initialising, all modules it imports are initialised first, in the order of the import declarations.
3. Once imported modules are initialised, static initialisation for the importing module is performed in lexical order (see below).
4. While there are any remaining modules that require initialisation, one is selected arbitrarily and initialised (step 2).
5. Each module is initialised exactly once, so circular dependencies are not an issue.


Static Data


Static data comes in the form of initialisers for static or module scope variables. Such initialisers can rely upon other static or module scope data. Initialisers are run in lexical order and no checking is performed on that order so if you reference data defined later in a module (or in another module that you haven’t explicitly imported) it might have the initial value of zero.


static int MyStaticInt = 10;

static float MyStaticFloat = cast(float)MyStaticInt * 0.5f;


The above declarations could appear at any scope in the program (including local to a function) and will still be executed once at module load time. In particular, local statics are not handled the same way as C (where initialisation happens the first time execution reaches that line) although that behaviour can be manually implemented.
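If C-style first-use initialisation is wanted it can be written by hand. A minimal sketch (ComputeValue is a hypothetical expensive function, not part of the library):

```
static bool g_initialised = false;   // both initialisers run at module load time
static int g_value;

int GetValue()
{
    // the guard gives the same effect as a C local static: the expensive
    // work happens on the first call rather than at module load
    if (!g_initialised)
    {
        g_value = ComputeValue();    // hypothetical expensive initialiser
        g_initialised = true;
    }
    return g_value;
}
```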


Static Code


Static code is any code appearing in a block ({ } pair) preceded by the static keyword. Like module data it is always called once at module initialisation time in lexical order regardless of its scope. It can appear at module, class or local scopes.


module StaticInitTest;

static int Number = 100;

static
{
    Console.WriteLine("Initialising module StaticInitTest");
    Console.WriteLine(Number);    // Writes 100
    Console.WriteLine(Number2);   // Writes 0, Number2 initialiser is not reached yet
    Number++;                     // Number is 101 from here on
}

static int Number2 = 999;

class TestClass
{
    int Number3 = 10;             // instance member

    static
    {
        // non-static local variables can be used but only if declared in this static scope
        for (int i = 0; i < 1; i++)
        {
            Console.WriteLine(Number);   // Writes 101
            Console.WriteLine(Number2);  // Writes 999

            Number3++;            // error, can’t access instance value in static code
        }
        Number++;                 // Number is 102 from here on
    }

    void InstanceMethod()
    {
        static
        {
            Console.WriteLine(Number);   // Writes 102
        }

        // do some non-static work
    }
}





Unit Tests


Static code blocks in concert with the built in pre-processor can be used to implement unit tests. You need to decide on a pre-processor symbol used to enable unit tests – I suggest UNIT_TEST - and then do this.





#if UNIT_TEST
static
{
    // Assert is just a function in the system library that throws an AssertFailedException
    // if the condition is not met
    Assert(0 == 0);
}
#endif



Types





C-UP is a strongly typed language. All data is naturally aligned at a minimum, as this is what most modern CPU architectures require for optimal performance.


Value Types


Built in value types all have a designated bit size. There are no machine word, 32/64-bit or implementation dependent type sizes as this makes it impossible to write properly portable code without the use of aliases.


void          - special type meaning no type (or any type when pointed to)

sbyte, byte   - 8 bit signed and unsigned integers
short, ushort - 16 bit signed and unsigned integers
int, uint     - 32 bit signed and unsigned integers
long, ulong   - 64 bit signed and unsigned integers

char          - utf-8 character (8 bit)
wchar         - utf-16 character (wide character) (16 bit)
dchar         - utf-32 character (32 bit)

bool          - 8 bit integer with value 0 or 1 (false or true)

float         - 32 bit floating point type
double        - 64 bit floating point type
half          - 16 bit floating point type (only a storage format, operations are on 32 bits)


Integer types are always represented using two’s complement form.




The built in number types all have implicitly defined MinValue and MaxValue constant properties, which give the minimum and maximum value representable by this type.


short x = short.MaxValue;  // 32767


Additionally, the floating point types have the following constant properties:


Epsilon – the smallest representable positive number

Pi – a famous mathematical constant

E – another famous mathematical constant


There are no degrees to radians conversion constants, to convert degrees to radians and vice-versa use the intrinsic functions Degrees(x) and Radians(x).
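For example (a sketch using those intrinsics and the Pi constant described above):

```
float r = Radians(180.0f);    // r == float.Pi
float d = Degrees(float.Pi);  // d == 180.0f
```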




Vectors



SIMD processors are nearly ubiquitous at the time of writing (late 2012). C-UP has comprehensive support for SIMD types in the style of modern GPU shader languages, including full support for masked writes and swizzling.


Vectors of 1 to 4 components are fully supported. Additionally, there is limited support for vectors of up to a maximum of either 16 components or 32 bytes in size. Here is the list of fully supported vector types:


byte1, byte2, byte3, byte4

ubyte1, ubyte2, ubyte3, ubyte4

short1, short2, short3, short4

ushort1, ushort2, ushort3, ushort4

int1, int2, int3, int4

uint1, uint2, uint3, uint4

half1, half2, half3, half4

float1, float2, float3, float4

double1, double2, double3, double4


The ‘half’ floating point type is a storage format only. Conversion to and from full float precision happens upon load and store. This conversion is quite slow on processors that don’t have hardware support for it so these types should be used with caution. In particular, it’s not recommended to use the half type for function parameters or local variables. Rather, use the full float precision in these cases and only use the half type for persistent storage where memory is at a premium.


The alignment in memory of vector types is the size rounded up to the next power of 2. As is standard in C-UP this alignment only applies to the start address, not the size. This means that a float3 takes 12 bytes starting at a 16 byte aligned address, and the remaining 4 bytes of that address line are available to store other data.


There is also partial support for the following types, meaning that loading, storing and arithmetic are supported, but most intrinsic functions are not.


byte8, byte16

ubyte8, ubyte16

short8, short16

ushort8, ushort16






Swizzling or masking vectors with more than 4 components repeats the mask and/or swizzle for each group of 4 components.
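A sketch of that repetition, assuming the partially supported short8 type can be constructed from all 8 components:

```
short8 v = short8(1, 2, 3, 4, 5, 6, 7, 8);
short8 r = v.wzyx;     // swizzle applied per group of 4: 4, 3, 2, 1, 8, 7, 6, 5
```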


Vectors work the same as they do in GPU shader languages. The 4 components of a vector are named x, y, z and w (or alternatively r, g, b and a) - you cannot mix the xyzw and rgba name sets. All arithmetic is component-wise. Various other operations are supported through the use of intrinsic functions (see Functions/Intrinsics section of this document).


// vector literals

float4 vec = float4(1.0f, 2.0f, 3.0f, 4.0f);

float3 vec3 = float3(0, 0, 1);


// for a literal, the final component is repeated

byte8 vec8 = byte8(0);           

byte8 vec8 = byte8();             // alternative way of making vector of all zeros


// construct vector from components

float x = 1.0f, y = 2.0f, z = 3.0f;

float4 vec2 = float4(x, y, z, -1.0f);

float4 vec4 = float4(vec3, 1.0f);

float4 vec4b = float4(x, y);             // gives x, y, y, y


// swizzle

float4 vec2 = vec.wzyx;

float4 vec3 = vec.z;              // vec.z is a float1, can’t implicitly widen so this is an error

float4 vec4 = vec.zzzz;    // this is the correct way to splat z


// mask writes

vec2.xy = vec2.zw;               // gives x=z, y=w, z=z, w=w


// alternate rgba component set

float4 c = vec2.argb;


// arithmetic operations are component wise.

vec2 = vec2 + float4(0, -1, 2, 1);       // x+0, y-1, z+2, w+1

vec2 = vec3 / vec4;                      // x/x, y/y, z/z, w/w


// scalar types implicitly convert to vector types and vice versa

vec2 *= 2.0f;                            // x*=2, y*=2, z*=2, w*=2


Matrices


Matrix types are not built into the language but are entirely based on vectors and are implemented in the System.Maths package. The supported matrix types are:


float2x2 – 16 bytes, 8 byte align

float2x3 – 24 bytes, 8 byte align

float4x3 – 48 bytes, 16 byte align

float4x4 – 64 bytes, 16 byte align


double2x2 – 32 bytes, 16 byte align

double2x3 – 48 bytes, 16 byte align

double4x3 – 96 bytes, 16 byte align

double4x4 – 128 bytes, 16 byte align


Matrices support the multiply operator for performing vector * matrix, matrix * vector and matrix * matrix.


See the System/Maths/Matrix.cup file for a full list of capabilities. Here are some examples:

// get identity matrix using Identity property

float4x4 matrix = float4x4.Identity;


// matrix * vector

float4 f4a = float4(1.0f, 2.0f, 3.0f, 4.0f);

f4a = matrix * f4a;

// vector * matrix

f4a = f4a * matrix;


// matrix * matrix

matrix = matrix * matrix;


// transpose

matrix = Transpose(matrix);


// create scale, rotation, translation matrices using static functions    

float4x4 m44 = float4x4.Scale(float3(1.5f, 0.3f, 2.2f)) *

              float4x4.RotateX(Radians(50.0f)) *

              float4x4.Translate(float3(-100.0f, 12.0f, 50.6f));


// Inverse is fully general (slow) inversion

// InverseOrthogonal is much faster if you know the matrix is orthogonal

float4x4 m44inv = Inverse(m44);


// matrix * its inverse = identity

m44 *= m44inv;            



Quaternions


Quaternions are entirely based on vectors and are implemented in the System.Maths.Quat module for float quaternions and QuatD for doubles.


Classes


Classes allow the user to create their own types. A class can optionally inherit from one other class – i.e. the class hierarchy in C-UP is strictly single inheritance. The ability to derive classes in conjunction with virtual functions is how the language supports polymorphism.


Constructors, destructors, operator overloads, indexers and properties are supported in classes. See the relevant sections in the Functions chapter for more information.


class Value
{
    // private member variable
    private int m_value;

    // constructors
    public this() {m_value = 0;}
    public this(int value) {m_value = value;}

    // virtual function
    public virtual int Value() const { return m_value; }
}

class ScaledValue : Value
{
    private int m_scale = 1;      // note. Initialiser allowed on member variable

    public this() : base() { }
    public this(int value, int scale) : base(value) { m_scale = scale; }

    public override int Value() const { return m_value * m_scale; }
}


Classes do not have a hidden virtual dispatch table pointer at the start – this is a guarantee of the language. The size of a class is exactly what you would expect it to be from the member variables you have explicitly declared. In the above example Value is 4 bytes and ScaledValue is 8. Virtual dispatch is achieved by embedding runtime type information in pointers (see Pointers section below.)


Note also that there is no un-necessary implicit rounding up of the size of classes for alignment, except when they are stored in an array. For example, if you make a class containing a double and a float this class will consume 12 bytes – it will not be rounded up to the 8 byte alignment required by double. If you then derive another class from this one and add a byte member variable, the total size of this class will be 13 bytes and so on and so forth. (Contrast this with C++ where the derived class would consume 32 bytes as the base class would be padded to 16 bytes, and adding one byte would make it 17 bytes which would round up to 32.)


When storing these classes in an array however, they would both consume 16 bytes per array element to ensure alignment of the double. The sizeof operator returns the true size of the class (so 12 or 13 in this case). In this case it would be up to the user to align this when performing pointer arithmetic on arrays, but as pointer arithmetic is not supported this is a non-issue.
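The layout rules above can be restated in code (sizes as given in the text; the class names are illustrative):

```
class Payload                  // double (8 bytes) + float (4 bytes)
{
    double d;
    float f;
}                              // sizeof(Payload) == 12, no tail padding

class PayloadEx : Payload      // one byte appended to the base layout
{
    byte b;
}                              // sizeof(PayloadEx) == 13

// in an array, each PayloadEx element occupies 16 bytes so that the
// embedded double of every element stays 8 byte aligned
```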


When size alignment is necessary for some other reason, it can be controlled using the align keyword which is explained later.


A class can be declared abstract using the ‘abstract’ keyword before it. An abstract class cannot be instantiated, but must be derived from to make concrete classes that can be instantiated. An abstract class can optionally declare abstract functions which derived classes must implement unless they are also abstract.


Using the ‘sealed’ keyword before a class prevents it being derived from.


Use the ‘static’ keyword before a class to declare that it will only contain static members. This makes it act more like a namespace than a class as it also prevents it being instantiated.




Structs


Using the struct keyword instead of class is shorthand for making all member functions local. The later sections on local pointers and functions explain this fully – for now, suffice it to say that this lets you fully utilise a type when it’s stored on the stack. Any class member function can be made local by putting the ‘local’ keyword after the function declaration. If a class instance is on the stack, then only local member functions can be called on it.


Local-only Structs


In the following sections on pointers and arrays you will discover that C-UP has the concept of a local reference. This is a special kind of reference that is allowed to refer to local (stack) data. Local references can usually only be used for function parameters, return values and local variables in order to ensure that you don’t keep references to expired stack data.


For added flexibility you can also declare that a particular struct type can only be allocated locally (i.e. on the stack), which then allows that structure to contain member variables that are local references. To declare such a struct put the local keyword after the struct name:


struct Message local
{
    public int2& Location;        // & is a local reference
}


This is useful for 2 reasons:


1)      It allows you to create temporary ‘message’ type structures that just pass data to lower level functions without performing heap allocations, which reduces load on the garbage collector. An example of this paradigm is windows message structures which are allocated this way avoiding many thousands of small allocations in the process.

2)      It allows you to create ‘task’ type objects for performing parallel execution. See the Parallelism section for more information on this.


However, this power comes with certain restrictions to ensure that you don’t end up holding onto references to destroyed stack data:


1)      All member variables of a local struct that are of a local reference type are implicitly read only. This means that once a local struct is constructed you can’t re-assign those variables (although you can change the data they reference.)

2)      You can’t return a locally created local-only struct from a function as this struct could encapsulate data created in the scope of that function. This is analogous to the rules for returning local references, described in the next section.
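
As a sketch of the message paradigm described above (the types, constructor call and functions are hypothetical), a local-only struct can be built on the stack and passed down by local reference without touching the heap:

struct MouseMessage local
{
    public int2& Location;    // local reference - implicitly read only (rule 1)
    public int Button;
}

void DispatchMouse(MouseMessage& msg)
{
    // handle the message; msg must not be stored anywhere that outlives this call
}

void OnInput()
{
    int2 pos = int2(10, 20);
    MouseMessage msg = MouseMessage(&pos, 0);   // constructed on the stack
    DispatchMouse(&msg);                        // no heap allocation, no GC load
    // return msg;   // error - rule 2: msg may encapsulate local data
}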






Pointers


A pointer to a type is declared with the familiar C asterisk syntax:


int anInt;

int* ptrToAnInt = &anInt;

int valueOfAnInt = *ptrToAnInt;


In C-UP any type can be passed by value or by reference using a pointer syntax (*) similar to C.


void* is a pointer to any type.


C-UP does not have the -> operator for dereferencing pointers; you always use the dot operator. This is because there’s no syntactic or semantic need to differentiate the two, and doing so just makes refactoring code more difficult.


Pointer arithmetic is not supported in C-UP – you cannot modify the value of a pointer except by assigning another pointer to it. To read or write streams of binary data in memory use the System.IO.MemoryStream class, which is efficiently implemented using unaligned load and store intrinsic functions and can have all bounds checking disabled in release builds.


Pointers are 8 bytes in size regardless of whether you are compiling for a 32 or 64 bit architecture and are always 8 byte aligned. Only the bottom 47 bits are pointer data so the maximum memory range addressable by a C-UP program is 128 terabytes. The top 16 bits hold the runtime type of the data pointed to. This allows for virtual dispatch on a pointer to any type and for virtual dispatch on more than one parameter which is described in detail later. However, it also means that any single C-UP program is limited to having 65536 types in it. The other remaining bit is used by the garbage collector to avoid following the same pointer twice when it’s working out what data is live.
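
The layout just described could be pictured as a bit struct (bit structs are covered later in this section; this is purely illustrative – the bit positions are an assumption and the real representation is managed entirely by the compiler and runtime):

bitstruct PointerRep : ulong
{
    ulong Address : 47;    // addressable range of 128 terabytes
    bool GcMark : 1;       // stops the garbage collector following a pointer twice
    uint TypeIndex : 16;   // runtime type of the referenced data (65536 type limit)
}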


A further advantage of having the type embedded in the pointer is that virtual dispatch and runtime type queries don’t need to dereference the pointer in order to do any work. The type index will probably already be in the cache as it is inherently in the same cache line as the pointer itself. The base address of the virtual dispatch table is statically linked at the call site, which means that the same runtime type can be used across different heterogeneous cores because the calling code can reference the base address of the appropriate dispatch table for that core.


Pointers cannot be assumed to load or store atomically.


Const Pointers


A const pointer is a pointer to data which cannot be changed via that pointer.


Const-ness in C-UP is transitive, meaning that it’s not possible for a const pointer to point to a non-const pointer. Transitive const is crucial for the parallel implementation to be reliable.


const int* pInt;     // constant pointer to constant integer

const(int)* pInt;    // mutable pointer to constant integer

const(int*)* ppInt;  // mutable pointer to constant pointer to constant integer


Reading these types from right to left (which is how types in C languages are read), notice that this syntax makes it impossible to define a constant pointer to mutable data.


In C-UP it is impossible to cast away const-ness, because that would mean you could break the parallel execution dependency mechanism which could introduce extremely subtle and hard to find bugs.


Note that arrays can also be const (arrays are covered in detail later).


const(int)[] arr;          // array of constant integers

const int[] arr;           // constant array of constant integers


Local Pointers


In C there’s no way to differentiate data that is local (i.e. on the stack) from data which is not. This means that in C it’s possible to hold a live reference to data that was on the stack but has since been overwritten. This can cause hard to find bugs, and for this reason among others C-UP makes a clear distinction between data that might be on the stack (so-called local data) and data that definitely isn’t.


In C-UP when you take the address of a local variable you get back a local pointer. Local pointers are represented with the & symbol instead of the *. Don’t confuse them with C++ references though as they still need to be explicitly dereferenced and they can be assigned to after initialisation.


Local pointers can only be used for local variables, function parameters, function return values (with restrictions) and as members of local-only structs (see Classes section).


When returning a local reference from a function you can only return values that were passed into the function as a parameter (or values derived from a local parameter – e.g. an element of a local array). You cannot return a local variable as this would mean returning data that is out of scope.


When using local pointers as parameters, you cannot pass a local pointer to a local pointer without making the outer pointer const. Otherwise you’d be able to overwrite the inner pointer with a reference to data that has gone out of scope.


A local pointer does not state that the referenced data is definitely local, only that it might be. All this really does is prevent you from storing that pointer in a location where it might outlive the data it references.


A non-local pointer will implicitly convert to a local pointer but a local pointer can never be converted to a non-local pointer. It’s recommended that you use local pointers for all parameters when possible as this will allow your code to be used more freely.


To make the ‘this’ pointer of a class member function local, put the ‘local’ keyword after the function header (the same place you put const to make it constant.) Note that all member functions of a struct are implicitly local and that this is the only difference between classes and structs.


class MyTestClass
{
    void MyMemberFunction() local const { }
}



It’s possible to allocate a new instance of a polymorphic type on the stack, which will return a local pointer. This instance is freed when the containing function returns. See the Expressions section on new/delete for more information on this.


Alien& anAlien = new local SubclassOfAlien();


Note that as local pointers and non-local pointers are different types, it’s possible to create an overloaded function which behaves differently if the source object might be on the stack or is definitely not.
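
A sketch of such an overload pair (the Store functions and their behaviour are invented for illustration):

void Store(Alien* a)   // matches objects that are definitely not on the stack
{
    // safe to keep 'a' in a long-lived container
}

void Store(Alien& a)   // matches objects that might be on the stack
{
    // must copy the data rather than keep the reference
}

Because a non-local pointer implicitly converts to a local one, both overloads are viable for a non-local argument; overload resolution is assumed to prefer the exact (non-local) match.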


void& is a local pointer to any type.





Arrays


There are many different flavours of array in C-UP, but the simplest version is a 1-dimensional array:


float[] arrayOfFloats;


This standard dynamic array type uses 12 bytes of storage and can hold up to about 2 billion (0x7fffffff) elements. The bottom 47 bits of storage are the pointer data and the top 32 bits are the element count. The middle 16 bits are unused in this case.


Arrays with a maximum of 32767 elements (referred to as short arrays from here on) can be stored more compactly using 8 bytes, where the top 16 bits give the element count. They have this syntax:


float[short] shortArrayOfFloats;


Arrays with a long number of elements (a long array) require 16 bytes of storage but can hold up to 0x7fffffffffffffff elements (although as pointers are limited to 47 bits there’s a practical limit of 2^47 elements).


float[long] longArrayOfFloats;


An array type can implicitly convert to a wider or narrower array type with the same element type. Conversion to a narrower type inserts a runtime check to confirm that the array is not too long.
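
A sketch of these conversions:

float[short] small;
float[] normal;
float[long] big;

big = normal;      // widening - always safe, no runtime check
normal = big;      // narrowing - runtime check that Length <= 0x7fffffff
small = normal;    // narrowing - runtime check that Length <= 32767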


The length of an array can be retrieved (but not changed) using the Length property, which is an int, short or long depending on the type of array.


int len = arrayOfFloats.Length;


Note that for consistency (and to make implementing generic types simpler), you can optionally explicitly specify int as the array type, so the 2 declarations below are equivalent.


float[] arrayOfFloats;

float[int] arrayOfFloats;


Signed values are used for all array lengths because doing so avoids issues with reverse iteration and with array index deltas, whereas the benefits of using unsigned values are marginal.


Array types cannot be assumed to be atomic on 32 or 64 bit architectures.


Accessing elements


The elements of an array are accessed using the array[n] syntax, where ‘array’ is the name of the array and ‘n’ is the index of the element to access. Indexes are always 0 based.


float x = arrayOfFloats[10];


The type of the index must implicitly convert to the index type for the array, i.e. short, int or long.


Bounds checking


All array accesses are bounds checked by default. Because of the possible performance implications of these checks they can be disabled on the compiler command line, although this should only be done in final release builds as bounds checking catches all kinds of very hard to find bugs. These checks can also largely be avoided by using the foreach statement rather than accessing elements independently. Note that because these checks are expected to be disabled in release builds, it’s imperative that your code doesn’t rely on bounds check exceptions for its normal operation.




Slicing


Arrays can be sliced using the .. operator. This creates a sub-array of the source array aliased over the same memory, which is possible because the length is stored as part of the reference rather than as part of the array itself. The first value is the first index and the second value is the last index plus one (the plus one makes slicing work much more naturally in practice).


char[] subArray = anArray[5..10]; // Creates a 5 element array

char[] subArray2 = anArray[anArray.Length - 4 .. anArray.Length];  // Last 4 elements of anArray


Array slices are bounds checked by default and it is also checked that the start index is below or equal to the end index. 0 length slices are allowed.


2D Arrays


2 dimensional arrays are also supported in the language, and are primarily intended for working with 2D image data. They have the following syntax:


byte4[,] screenRect;


2D arrays are short, meaning that they have a maximum width and height of 32767. There are currently no int or long equivalents for 2D arrays - if you need that you can implement it manually using 1D arrays or indexers. Higher dimensions are also not implemented in the language but can also be simulated using 1D arrays or indexers.


A 2D array requires 12 bytes of storage. The bottom 47 bits are the pointer. The top 48 bits are 3 short values: the stride (Stride property), width (Width property) and height (Height property). You can also get the width and height of the array together as a short2 using the Size property.


The stride is the number of elements in a row of the array and is required to allow slicing of 2D arrays. E.g.


screenRect = new byte4[640, 480];

byte4[,] viewportRect = screenRect[0 .. 320, 0 .. 240];


Note that the stride has not changed as the array is a sub-rectangle of the larger 2D array.


Part of a single row of a 2D array can be sliced as a 1D array by supplying a range in x and a single row index in y. The result is a short 1d array:


byte4[short] viewportRow = viewportRect[0 .. viewportRect.Width, 10];


A 2D array can be converted to a 1D array in its entirety using an explicit cast, but if the array is strided (i.e. stride doesn’t equal width) then all of the skipped pixels are included in the 1d array:


byte4[] pixels = cast(byte4[])viewportRect;


A 1D array can be converted to a 2D array using an explicit cast, but the width and height must be explicitly passed in as extra info after the cast(). At runtime it is checked that width * height equals the length of the source array:


byte4[,] pixels2d = cast(byte4[,])[320, 240] viewportRect;


Indexing by vector


A special case of array indexing and slicing exists for 2D arrays, whereby you can access an element of a 2D array, or slice a 2D array, using the short2 vector type.


byte4[,] screenRect;


short2 pos;

byte4 pixel = screenRect[pos];


short2 topLeft, bottomRight;

byte4[,] viewportRect = screenRect[topLeft .. bottomRight];



Local Arrays


Arrays can be declared as local in the same way that pointers can. As with local pointers it’s important to note that when you declare an array as local you’re not saying that it definitely references local data, just that it might.


The syntax for local arrays is a little long winded as we’ve run out of bracket types; you must put the ‘local’ keyword after the [], as in:


int[] local myLocalArray;


Also note that it’s possible to allocate a local array on the stack using the “new local” syntax. This memory is freed when the function it appears in returns.


myLocalArray = new local int[100];




Value Arrays


There is support for both 1D and 2D value arrays, which are arrays where the dimensions are compile time constants.


float[6, 6] spatialMatrix;

float[6] spatialVector;


As the name suggests, value arrays are value types and as such assigning them copies all elements of the array. The length is not stored anywhere as it’s a constant – you can still access it using the Length property which causes the compiler to substitute in the appropriate constant value.
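
For example:

float[6] a;
float[6] b;

b = a;                 // copies all 6 elements
int n = a.Length;      // the constant 6 is substituted by the compiler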


[It’s not recommended that you implement vectors and matrices as per the above example because vector and matrix types are built into the language.]


The address of a value array can be implicitly converted to an equivalent dynamic array type. If the value array is local then taking the address of it returns a local pointer, which implicitly converts to a local array reference (see previous section.) This makes it easy to pass a value array by reference to a function expecting a dynamic array.


float[4,4] matrix;    // assume declared at module scope (non-local)
float[4] point;       // assume declared as a local variable

float[4,4]* matrixPointer = &matrix;

float[]& pointLocalPointer = &point;

float[,] matrixArray = &matrix;

float[] local pointArray = &point;



Params arrays


Params arrays are used to pass a variable number of arguments to a function. Therefore a params array can only appear as the last parameter of a function. They use the […] array syntax.


void Printf(string format, const void&[…] args);

void PrintStrings(string[…] strings);

void PrintInts(int[…] ints);


The elements of the array can be any type you require. Arguments given at the call site must implicitly convert to the given type.


The only time an argument isn’t implicitly converted to being a member of the params array type is if that argument is itself of exactly the same params array type, in which case it’s passed through in its entirety.
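
A sketch of the pass-through rule, reusing the Printf declaration above (the Warn function is invented for illustration):

void Warn(string format, const void&[…] args)
{
    // 'args' is exactly Printf's params array type, so it is forwarded
    // in its entirety rather than being wrapped in a new one-element array
    Printf(format, args);
}

Warn("%d of %d", 3, 10);   // one value array is built and passed through both calls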


Under the hood this mechanism is implemented by all of the arguments being placed into an implicitly created local value array, a reference to which is passed as the params array. For this reason a params array is always implicitly local – it’s an error to explicitly specify local after the […].




Aliasing


Aliasing in the context of arrays (and pointers) refers to the possibility that two references point at overlapping memory, which matters because it prevents the compiler from applying certain very important optimisations. Consider this example:


void ApplyScale(float[] local values, float& scale)
{
    foreach (float& v; values)
        *v *= *scale;
}
In this loop the compiler is forced to load the scale value from memory repeatedly because it has no way of knowing that the scale pointer doesn’t refer to one of the elements of the values array and could therefore be affected by an earlier iteration. However, one simple change would cause this function to optimise much better, and that’s declaring the scale reference as being const.


void ApplyScale(float[] local values, const float& scale)


The reason this works is because it’s an error in C-UP for a const local reference to alias with a mutable local reference. The runtime checks for this aren’t enabled by default as they might be a bit slow but you should enable them during debug builds or automated tests with the appropriate compiler option (-aliascheck).


In general, any use of a non-local reference in such a loop will mean the compiler has to assume the worst and full optimisation will be inhibited. This is also true of using local references that aren’t parameters to the function because the compiler can’t (currently) work out that you didn’t just cast a non-local reference to a local one, except in parallel code where such casts are forbidden.


So the simple rule for functions that need to be very fast is to only access data through local reference parameters and to be const correct.





Strings


C-UP has built in support for UTF-8 and UTF-16 strings. Strings are essentially a special case of a const character array. The built in string type can represent both UTF-8 and UTF-16 strings.


[Note: the above is not really true. It’s more accurate to say that ASCII and UCS-2 strings are supported and there is the potential to support UTF-8/16, because all the built in string functions treat individual bytes/words as complete characters rather than code points. For example, when comparing an 8-bit string to a 16-bit string with ==, the lengths are first compared for equality and then each byte of the 8-bit string is compared to the equivalent word in the wide string. Proper Unicode support will require a library to be written at some point.]


The string type is 12 bytes in size and 8 byte aligned with a similar layout to a standard array: 47 bits pointer data, 16 bits unused, 31 bit length, 1 bit wide.


The ‘wide’ bit in a string defines whether the string contains 8 or 16 bit characters. 8 bit characters use the ‘char’ type; 16 bit characters use the ‘wchar’ type. Because strings have this capability it’s possible to cast either a char or wchar array to a string, and the string will be constructed in place over the array data. However, because this causes aliasing, strings cannot be considered immutable as they are in some languages: a string can be aliased over a mutable array and then changed via the mutable array reference.


The default string literal type is UTF-8 if the string doesn’t contain any characters with a value > 127, or UTF-16 if it does. If for some reason you specifically need a string literal to be wide, you can prefix it with a lower case ‘w’.


string s1 = "Hello";       // UTF-8 literal

string s2 = "ВАЖНО";       // UTF-16 literal

string s3 = w"Hello";      // UTF-16 literal


When you perform string operations the same type of string is returned as was passed in. In the case of string concatenation, an 8-bit string is returned if all input strings were 8-bit, otherwise a 16-bit string is returned.
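
For example:

string a = "Hello";     // 8-bit string
string b = w"World";    // 16-bit string

string c = a + a;       // 8-bit - all inputs were 8-bit
string d = a + b;       // 16-bit - at least one input was wide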


Because of the ‘virtual’ nature of string characters, indexing individual characters with the array access operator is somewhat slower than accessing a standard array (although not by much.) Therefore, if you want to perform a large number of operations on a string you may prefer to first cast it to a const char/wchar array (an in-place conversion) and perform the operations on that array. Attempting to convert to the wrong char type throws an exception.


Length is a read only int property of string.


IsWide is a read only bool property of strings, and is true if this string contains 16 bit characters.


Array element access and array slicing can be used on strings. Element access always returns the wchar type even if the string is not wide.


Strings can be compared for case sensitive equality using the == and != operators, and for case sensitive relation using <, >, <=, >=. More complex comparisons can be performed with the Compare function. If both strings are ‘null’ then they are considered equal. If one is null but not the other then they are not equal.


Strings can be concatenated using the + operator. This allocates a new string in the default string heap. To force the new string to be allocated in a different heap use the static Concat function.


String values can be used in the switch statement. Comparison is case sensitive.


String types cannot be assumed to be atomic on any architecture.


Strings have the following member functions.


public bool EndsWith(string strB);

public int IndexOf(char ch);

public int IndexOf(string strB);

public int IndexOfAny(char[] local chars);

public string Insert(int startIndex, string insert);

public int LastIndexOf(char ch);

public int LastIndexOf(string strB);

public int LastIndexOfAny(char[] local chars);

public string PadLeft(int totalChars, char ch);

public string PadRight(int totalChars, char ch);

public string Remove(int startIndex, int count);

public string Replace(char oldValue, char newValue);

public string Replace(string oldValue, string newValue);

public bool StartsWith(string strB);

public string ToUpper();

public string ToLower();

public string TrimStart(char ch);

public string TrimEnd(char ch);

public int Split(string[] local results, char[] local separators, bool omitEmptyStrings);

public int Format(string format, const void&[…] args);


Because strings are constant all of the functions that return a modified string will create a copy of the input string unless no changes were necessary in which case the original string is returned. All such functions allow you to optionally specify the heap in which to allocate the new string.


Functions that return a sub-string (e.g. TrimStart, TrimEnd, Split) use the array slicing mechanism to avoid allocating memory.


All built in value types (including vector types) implicitly convert to strings. Strings can be explicitly converted to the built in value types (not vectors) using the standard cast operator.


Short strings


Many of the strings in a typical program are used to represent file paths, names, addresses, etc. There is little chance that any of these things will exceed 32767 characters in length and requiring 12 bytes of storage for such a string is wasteful. For this reason C-UP also supports a short string type, which is named ‘sstring’.


A short string has all the capabilities of a regular string, except it is limited to 32767 characters in length. The upside is that it only requires 8 bytes of storage (regular strings are 12 bytes) offering better alignment characteristics and a considerable memory saving when many thousands of strings are being stored.


Short strings and regular strings can implicitly convert in both directions, but a conversion from a regular string to a short string performs a runtime length check.
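
For example:

sstring name = "player.sav";   // short string - 8 bytes of storage
string s = name;               // sstring to string: zero cost, no check
sstring t = s;                 // string to sstring: runtime length check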


It’s recommended that you use the normal string type for all interfaces and only use sstring internally as a storage format. This just keeps interfaces cleaner and avoids the need to have two versions of functions. If the performance implications of this concern you please note that the conversion from a short string to a regular string requires zero processor cycles.


Escape sequences


Like other C languages, you can include special characters in a string or character literal using escape sequences: the escape character (backslash) followed by a character representing the special character you require:


\a            - alarm sound

\b           - backspace

\f            - form feed

\n           - line feed (newline)

\r            - carriage return

\t            - horizontal tab

\v            - vertical tab

\'            - apostrophe

\"            - quote

\\            - backslash

\?            - question mark

\xNN     - insert an 8 bit hex value into the character stream (NN being 2 hex digits)




Formatting


String formatting functions (e.g. string.Format, Console.WriteLine) accept a string which gives the format, followed by a params array containing the values to be formatted in.


The format syntax is similar to the C language: %d for an integer, %x for a hex integer, %f for a float, %s for a string, %b for a boolean.


The difference compared with C is that the program can actually validate that the passed in values are of the expected types and throw an exception if they aren’t, rather than just crashing.
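
For example (a sketch – the exact exception type thrown is not specified here):

Console.WriteLine("%s scored %d points", "Alice", 42);   // OK
Console.WriteLine("%s scored %d points", 42, "Alice");   // throws - arguments don't match the format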





Enumerations


The enum keyword allows you to define a new named integral type (based on one of the built in integral types), and create a set of constant values of that type. If the underlying base type for an enum is omitted, then it’s based on int.


You can assign a specific value to each constant, but if you don’t it will be given the value of the previous constant plus one (unless it’s the first item in the list in which case it gets the value zero).



enum Shapes : ubyte        // underlying type is ubyte
{
    Circle,                // value 0
    Triangle,              // value 1

    Diamond = 7,
    Heart                  // will have the value 8
}



Enumerated types can be explicitly converted to and from their underlying type. They support comparison and bitwise operators. The usual C practice of using enums for flags is not recommended however, as bit structs are more flexible and less error prone.


For the purposes of dynamic typing, enumerations behave as if they are derived from their underlying type. That is, a function taking a virtual pointer to an integer type will be matched by a pointer to an enumeration type based on that integer type. Also, dynamically down casting a pointer to an enumeration type to a pointer to its underlying type will succeed. These rules exist to allow the creation of functions that deal with enumeration types en-masse rather than having to deal with every individual enumeration type declared in a program, which would be incredibly cumbersome and probably impossible as you might not know all of those types.


The identifier can optionally be omitted which is a shortcut for declaring constant values in the containing scope of the enum.
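
For example, an anonymous enum simply introduces its constants into the containing scope (a sketch):

enum : ubyte
{
    MaxPlayers = 4,    // usable directly as MaxPlayers, with no type name prefix
    MaxLives = 3
}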





Unions


A union is a structure where every member starts at the same address, meaning the members overlap each other in memory.


Un-tagged unions aren’t compatible with precise garbage collection because it’s impossible to determine at runtime whether a reference aliased with a non-reference currently contains pointer data or not. For this reason, C-UP only supports non-reference types inside unions.


Although this is the Types section of the document a union in C-UP is not actually a type; it’s really just a way of controlling the offset of variables in a scope. For this reason unions do not have a name.


int GetFloatBitsAsInt(float value)
{
    union {int i; float f;}

    f = value;
    return i;
}



[Note that this example is not good practice in C-UP because you can use the ‘as’ operator to achieve exactly the same effect much more concisely and probably without accessing memory.]
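
The ‘as’ version of the example might look like this (a sketch – it assumes, as the note above implies, that ‘as’ reinterprets the bits of a value as another type of the same size):

int GetFloatBitsAsInt(float value)
{
    return value as int;   // reinterpret the 32 bits directly, no memory round trip
}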


The variables i and f will be inserted at the function scope as if the union declaration wasn’t even there, except they will both be given the same address on the stack.


The same concept can be applied at any scope: module, class or local.


You can also embed one union inside another and use anonymous structs inside unions for even more flexibility in controlling memory layout. Here’s an example, with the resulting memory offset of each member given as a code comment:


class MyClass
{
    int x;                     // offset 0

    union
    {
        short y;               // offset 4 (i.e. sizeof(x))

        struct
        {
            float z;           // 4
            float w;           // 8 (comes after z because they’re contained in a struct)

            union
            {
                char a;        // 12 (comes after w because of struct)
                wchar b;       // 12
            }
        }
    }

    byte finalByte;            // 14 (after “wchar b” which is the highest preceding member)
}



In spite of all the apparent scoping above, it’s not possible to have 2 members with the same name because they are all collapsed back down to the same scope – the nesting only controls their relative addresses.


The alignment of all union members is the maximum alignment of any member of that union.



Bit Structs


A bit struct is a structure that only contains a combination of boolean, integral and enum member variables, each of which is explicitly given a number of bits to consume. If you don’t give a particular member a bit count then the natural count for the type of that member is assumed. All members of a bit struct are public so it is an error to specify a protection level. Member variables of a bit struct cannot have initialisers and bit structs cannot contain functions.


An entire bit struct is based on one of the built in integral types and if the bits required overflow the declared type an error results. If an underlying type is not specified for a bit struct then int is used.


bitstruct MyBits : ushort         // underlying type is ushort
{
    bool HasScore : 1;
    uint Score : 6;
    bool HasChar : 1;
    char TheChar;                 // natural bit count of char (8) is assumed
}


It is not possible to take the address of a bit struct member.




Construction


Bit structs can be conveniently constructed in a single expression using named arguments.


MyBits b = MyBits(HasScore:true, HasChar:false, Score:100);


Any bits you don’t explicitly declare a value for are set to zero.


Note that when a bit struct is constructed any constant values passed in are combined into a single value at compile time, and then any variable parts are masked and shifted into place, so using bit structs is just as efficient as shifting and or-ing enum values together.


The use of bit structs is preferred to the use of enums for storing flags because:

1.       They’re less error prone as the compiler performs all appropriate masking and shifting, and constant values are range checked.

2.       They’re strongly typed. Each field has a type - a bool is a bool, not a uint shifted and then converted.

3.       You can easily combine single bit values and multiple bit values in a single struct, without getting into some very confusing territory as you would with enums.






Aliases


The alias keyword allows you to create an identifier that acts as an alias for any symbol or type name. This is useful because type names can get quite complex, which makes them hard to read and time consuming and error prone to type.


The basic form of an alias declaration is:


alias TYPE-NAME identifier;


The identifier can then be used as a synonym for TYPE-NAME anywhere in the code. The aliased name is just replaced with the expanded type name very early during semantic checking so most of the compiler doesn’t even know about the aliased name and many errors will report the expanded name instead.


Aliases may be introduced at any scope and are only visible in that scope. Aliases at module scope are imported when the module is imported.


alias const(byte4)[int] heap(TextureHeap) Texture;


Texture aTexture;

aTexture[10, 10] = byte4(255, 0, 0, 255);




Type inference


The compiler can infer the type of any variable that has an initialiser. To use this mechanism simply declare the variable with the ‘var’ type.


var myValue = 10.0f;


The compiler knows that the variable must be a float because the assignment tells it so. Obviously this isn’t very useful in a simple example like above but is useful when writing generic code where types aren’t necessarily known in advance. It can also save a lot of typing and prevent errors when instantiating complex types:


// saves typing in that long winded type name twice

var myVar = new System.Collections.List<MyClass.SomeStructure*>();


// Generic swap function - T must be a pointer type

void Swap<T>(T a, T b)


    var tmp = *a;                 // without ‘var’ this wouldn’t be possible

    *a = *b;

    *b = tmp;




Ordinarily the var type resolves to exactly the type on the right of the assignment, but when used inside a foreach statement you can modify var to get a pointer to the given type.


int someInts[10];

foreach (var i; someInts) { }                   // get values from the array

foreach (var& i; someInts) { *i = 0; }   // get pointers to elements of array






Functions

Functions look much the same as in any other C style language. They can be declared at module or class scope. Class member functions can be non-static, in which case they have an implicit ‘this’ parameter and require an instance to be called, or static, in which case they don’t.


void MyFunction(float arg1, int** arg2, string arg3 = "Default", int arg4 = 123);




Parameters

Function parameters are declared the same way as in C++. Optional parameters are also supported as in C++, by giving a default value after an =.


All parameters in C-UP are const. That is, the parameter itself is constant – if it’s a reference parameter then the data referred to is mutable by default, unless the const keyword is explicitly used. The reason parameters are const is that it allows the compiler to optimise argument passing more aggressively (also see Aliasing below), which is particularly important as C-UP does not allow passing parameters by invisible reference as C++ (&) and C# (ref, out) do. Things like matrices should be passed by value, letting the compiler decide whether passing by value or by reference is more optimal on a particular architecture. Indeed, passing by value is required in order to use operator overloads.


Values returned from a function are also always const. This matters because it prevents you calling a mutating function on a returned value, which would pointlessly modify a temporary. Making return values const forces the compiler to choose a const overload of the called function, or to issue an error if no such version exists.
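As a hedged sketch of this rule (Counter, MakeCounter, Bump and Get are invented names):

Counter MakeCounter();           // the returned value is const

MakeCounter().Bump();            // error: Bump mutates ‘this’, but the return value is const
int n = MakeCounter().Get();     // fine: Get is declared const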


Arguments to functions are normally passed by position (i.e. the first argument given corresponds to the first parameter, the second argument to the second parameter, and so on) but can also be passed by name. Arguments that are passed by name can follow arguments passed by position, but positional arguments cannot follow named ones. Named arguments can appear in any order.


All arguments, named or not, are evaluated in the order they appear; the compiler is not allowed to re-order them. This significant departure from C was taken because the C behaviour results in programs that don’t always behave how they’re written, but can sometimes appear to work and then fail when optimisations are enabled.


Here’s an example of calling the above function with a mixture of positional and named arguments:


MyFunction(1.0f, null, arg4:999);


Note that in this example no value is supplied for arg3. If arg3 did not have a default value then this would be an error, but it does so it isn’t.




Aliasing

Aliasing in the context of function parameters refers to the possibility that multiple reference parameters could refer to overlapping areas of memory, in which case they are said to alias. If the language allows such a thing then the optimiser cannot be as aggressive as we would like in some cases. Therefore C-UP defines rules on aliasing that allow the optimiser to produce the best code possible if those rules are followed:


-          Const local reference parameters must not alias non-const local reference parameters. This is optionally checked for at runtime at function entry points.


-          Any reference that is not a parameter or is not local is assumed to alias.
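To illustrate the first rule, here is a hypothetical sketch (Copy is an invented function; the slicing syntax is described in the Arrays section):

void Copy(const(int)[] local src, int[] local dst);

int[] data = new int[100];
Copy(data[0 .. 50], data[50 .. 100]);    // fine: the slices don’t overlap
Copy(data[0 .. 60], data[40 .. 100]);    // breaks the rule: const src aliases non-const dst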





Local variables


By default all local variables are initialised to zero at function entry.


If a variable is in a nested scope inside the function it is still only zeroed once, on function entry, and can therefore be relied upon to retain its value on repeated entry into that scope. It is correct to infer from this that sibling scopes at the same nesting level in a function do not alias over the same memory. Again, this behaviour is required by the language – it’s not an implementation quirk.


Essentially all local variables exist at the function scope for the purposes of storage allocation. Nesting them only hides them from outer scopes.



int Sum(int[,] local values)


    int total;


    for (int y = 0; y < values.Height; y++)


        for (int x = 0; x < values.Width; x++)


            int prevValue;

            total += values[x, y];

            prevValue = values[x, y];




    return total;



In the above example, total isn’t initialised and that’s fine as it will already be zero (in fact, if you put an = 0 after it, that assignment will be optimised out as the value is known to already be zero). Furthermore, prevValue will contain zero on the first entry into the loop and will contain the previous array element value on subsequent iterations, even though its scope has been left and re-entered. Note though that if you give prevValue an initialiser it will be executed on every iteration of the loop.


This behaviour serves several purposes:


1.       Zeroing locals gives them the same behaviour as heap variables, which are also always zeroed. Uninitialised variables are a huge source of frustration in C++. The C# solution of checking for definite assignment before use is not possible in the presence of value arrays.


2.       Because it’s not possible to statically know if a particular local reference is initialised or not, a precise garbage collector could not function correctly without this rule. Zero-initialising all references guarantees that they contain either null or a valid reference.


3.       Because scopes are not aliased in memory there is no need to repeatedly zero them on scope entry, and the one and only zeroing that occurs can use the fastest possible instructions to zero a larger amount of memory all at once.


Due to the potential performance implications of zeroing all stack frames this behaviour can be disabled on a per-variable basis (with some limitations), but doing so is strongly discouraged unless careful profiling shows that this is in fact an issue. To disable zeroing for a local variable you initialise it with void; i.e. use “= void” as the initialiser. Trying to use “= void” on a reference type or on a type that contains reference types gives a compile error, as the garbage collector relies on references being zeroed.


Any use of “= void” causes local variables to be re-ordered in memory so that all initialised variables are grouped together, and all un-initialised ones are grouped together. This allows the stack frame zeroing to still be performed using a fast en-masse clear (see 3 above).
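A minimal sketch of “= void” (Process and scratch are invented names):

void Process()

    int count;                   // zeroed once at function entry
    float scratch[256] = void;   // not zeroed, so must be written before being read
    int* p = void;               // error: references must always be zeroed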





Function Overloading


Function names can be overloaded, meaning that you can have any number of functions with the same name in the same scope as long as those functions have different parameter types (or a different number of generic parameters).


C-UP also supports overloading on the return type of a function, which makes lots of things rather convenient. For example the Stream class has many Read functions taking no parameters but returning different types, which is much cleaner than having function names like ReadInt, ReadBool, ReadString, etc.


Return type overloading works by taking the type on the left of an assignment, or the required return value in a return statement, into account when performing overload resolution. In cases where this is not sufficient to correctly resolve the overload the programmer can tell the compiler what they want by using an explicit cast to the required return type directly before the function name in the function call expression.


int Read() { }

float Read() { }

void Func (int i) { }

void Func (float f) { }


int x = Read();      // knows to call first version

Read();                     // error: which Read should it invoke?

cast(int)Read();     // call the first version


Overloading the same name with a mixture of static and non-static functions in the same scope is not allowed.


Overload Resolution


Hopefully the overload resolution rules work in such a natural way that you never really have to think about them. However, in case you do, here’s how they work.


When a function call is encountered, the name of the function is looked for using the normal scoping rules. The first scope that contains a function with the given name is selected and all functions with that name in that scope are eligible for calling as long as the given arguments and the return type can convert to/from the required types. If during this process a perfect match is found then that function is immediately selected and no more work is required.


If more than one function is found, then all functions are compared and only functions that are a better match than any other function are kept.


For a function to be considered better than another it must have at least one parameter that is better matched by the given argument than the equivalent parameter of the other function and no parameters that are a worse match than the other function. A better match is achieved if one of the following conditions is met for any argument:


-          An exact type match is better than an inexact one

-          Conversions within register files (int->int, float->float, vector->vector) are preferred to conversions between register files

-          A value type better converts to another value than to a reference (so that chars prefer converting to wider chars than to strings.)

-          A reference type better converts to another reference type than to a value (so an array/pointer conversion is preferred to a conversion to bool.)

-          A non-params array match is better than a params array match

-          A higher alignment match is better than a lower one (higher alignment implies faster execution)

-          When dynamic arrays are involved, a shorter array version is better than a longer array version (i.e. [short] is preferred to [int] is preferred to [long])

-          For reference types, non-const is preferred to const (because if both are applicable you may be planning to mutate data)


If after all this there are still multiple functions, then return types come into play (but only if the call is on the right of an assignment, in a return statement, or in an explicit cast).


First the number of functions is reduced by checking if any of them return exactly the correct return type and if so keeping only the ones that do so.


If there are still multiple functions, then ones returning types with a lower level of indirection than others are preferred. This means that a function that returns a value will be selected over one returning a pointer to a value. This rule exists so that both return by value and return by reference can be supported side by side in collection classes, with the user naturally able to use foreach in either way.
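A sketch of how this last rule might look in practice (the collection and its indexers are invented; the var& form is described under Type inference):

int operator[int i]();                       // return by value
int* operator[int i]();                      // return by reference

foreach (var v; collection) { }              // value version preferred
foreach (var& p; collection) { *p = 0; }     // reference version selected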



Member Functions


Class member functions can be static or instance (non-static).


Class instance member functions are passed an implicit ‘this’ pointer as their first argument, which points to the instance object they’ve been invoked on. Using the const keyword after the function declaration makes the ‘this’ pointer const. Putting the local keyword after the function declaration makes the ‘this’ pointer local, which means the member function can be called on instances of the class that are on the stack.


Virtual Member Functions


A virtual member function can be overridden by a more specific implementation in a derived type. To make a member function of a class virtual, simply precede it with the virtual keyword.


class ClassA


    public virtual void SomeFunction(int a, int b)




    public virtual ClassA* DuplicateThis() const


        ClassA* copy = new ClassA();

        *copy = *this;

        return copy;




To override a virtual function in a derived class use the override keyword. The function signatures must match exactly, except that if the return type is a pointer then a pointer to a derived type can be returned (known as covariant return types).


class ClassB : ClassA


    public override void SomeFunction(int a, int b)




    public override ClassB* DuplicateThis() const


        ClassB* copy = new ClassB();

        *copy = *this;

        return copy;




The protection level of an override function must match that of the virtual it overrides.


Abstract Member Functions


An abstract member function is a virtual member function which provides no implementation but requires that an implementation be provided in any derived class.


Prefixing a function declaration with the keyword ‘abstract’ declares it as both abstract and virtual.


Such a function can only be declared inside a class that is also marked as abstract. As an abstract class cannot be instantiated it’s impossible to call a function that has no implementation.
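A small sketch of an abstract function and its override (Resource and FileResource are invented names):

abstract class Resource

    public abstract void Release();

class FileResource : Resource

    public override void Release() { /* close the file */ }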




Constructors

Constructors are class instance member functions with the name ‘this’. They are invoked in order to perform initialisation on a new instance of a class.


A class can have any number of constructors so long as they have different parameters. It’s possible to have a class with no constructors, in which case it won’t be constructed (unless a base class has a constructor) but will be initialised (see Initialisation section in Expressions > New/Delete/Local).


class Value


    // private member variable

    private int m_value;


    // constructors

    public this() {m_value = 0;}

    public this(int value) {m_value = value;}


    // virtual function

    public virtual int Value() const { return m_value; }

class ScaledValue : Value


    private int m_scale = 1;      // Initialiser allowed on member variable


    public this() : base() { }

    public this(int value, int scale) : base(value) { m_scale = scale; }

    public this(int value) : this(value, 1) { }


    public override int Value() const { return m_value * m_scale; }



A derived class constructor can call another constructor of this class using the ‘this’ keyword, or a constructor of the base type using the ‘base’ keyword (see above example.)


Constructor overload resolution happens in a similar way to function overload resolution. The type being created is searched for constructors; if any are found, then one that matches the given arguments must exist. If none are found, then the base class is checked. This has the effect of allowing you to create derived classes that inherit all of their base class constructors. Constructing an object using a base class constructor means member variables declared in the derived type will be initialised but not constructed.


If a derived class implements a constructor and its base type implements any constructors, then the derived class must call a ‘this’ or a ‘base’ constructor.


At no point are constructors implicitly created or implicitly invoked. This is a huge departure from C++:

1)      If you place a value of a class type on the stack it is not automatically constructed – it is only constructed when you assign to it, as is the case with references.

2)      If you embed a value of a class type inside another class type, then the embedded class is not automatically constructed unless a constructor is explicitly invoked for it in the outer class constructor or initialiser.


These rules mean that you the programmer are in full control of initialisation/construction of new objects. There is no need to implement a default constructor if you don’t want one and no need to pay the cost of this default constructor being executed for every element of an array when you create one.




Destructors

Destructors are class instance member functions with the name ‘~this’. They are invoked when delete is called on a class instance.


A struct can only have a single destructor with the ‘this’ pointer being implicitly local. If you delete a non-local struct instance, the local destructor is still called.


A class can have separate local and non-local destructors. Which one is called depends on whether a class instance being deleted is local or not. It’s recommended that you provide both destructors for classes unless it’s a class where you disallow copying and there are no local constructors, in which case you can safely omit the local version.


Base class destructors are implicitly called after derived class destructors, chaining all the way back up the hierarchy. A local destructor will only call a base local destructor and a non-local destructor will only call a base non-local destructor.

Any struct that doesn’t implement a destructor gets one implicitly. This is necessary so that if you then implement a destructor in a derived class the virtual dispatch will work. A class always needs both destructor variants so if either one is missing an implicit one is made.


There is no concept of special virtual destructors. All destructors are implicitly virtual, so it’s an error to explicitly declare them virtual. Making them optionally virtual is not necessary when you don’t have a virtual table entry being implicitly included into your types.
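As a hedged sketch of a class providing both destructor variants (File is an invented name):

class File

    public ~this() { /* invoked when a heap instance is deleted */ }
    public ~this() local { /* invoked when a local (stack) instance is deleted */ }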



Multiple Virtual Dispatch


The virtual and override keywords can be used on the individual parameters of functions at any scope allowing you to perform virtual dispatch on:

1)      More than one parameter (so called multiple dispatch)

2)      Types unrelated by inheritance (by using void* as the virtual parameter type).

3)      Non-class types (in conjunction with void*).


Virtual and override cannot be mixed in a single function.


module Geometry.Shapes;


class Shape { }

class Box : Shape { }

class Sphere : Shape { }


bool TestIntersection(virtual Shape& a, virtual Shape& b);

bool TestIntersection(override Box& a, override Box& b);

bool TestIntersection(override Sphere& a, override Sphere& b);

bool TestIntersection(override Box& a, override Sphere& b);

bool TestIntersection(override Sphere& a, override Box& b);


Shape* a = new Sphere();

Shape* b = new Box();


if (TestIntersection(a, b))       // this will call the final implementation above

    Console.WriteLine("They intersect");



Static virtual functions can be overridden by other static virtual functions in any scope, as long as they are visible to that scope. To override a virtual static function declared in another scope you must explicitly qualify the method name with the scope the original virtual version was declared in. Extending the above example:


module Physics.Shapes;


class ConvexHull : Shape { }


bool Geometry.Shapes.TestIntersection(override Box& a, override ConvexHull& b);



Note that you can even have a virtual member function of a class and add extra virtual parameters to it:


class MyClass


    virtual void DrawMe2D(int x, int y, virtual Renderer* render);



The only real difference between virtual instance functions and virtual static/module functions is that in the first case you can only override inside derived classes, whereas in the latter you can override with a static/module function in any scope as long as the original virtual function is visible.



Nested Functions


A nested function is one that is defined inside the scope of another function. A nested function may be declared anywhere that a statement is allowed.


Nested functions may not declare protection levels or static-ness. A nested function inside a member function does have the implicit this parameter added.


A nested function can only be invoked from scopes inside the containing function. This includes from inside itself or from functions nested inside it.


Nested functions can implicitly access local variables and parameters from the enclosing function to any level of nesting. Because of this ability it’s not possible to make a delegate that references a nested function.



void A(int x)


    B(x, x + 100);


    int B(int y, int z)


        int prod = y * z;

        return C();


        int C()


            return x + y + z / prod;








Delegates

Delegates are function pointers. They can be static in which case they refer to either a static class member function or a module scope function, or non-static in which case they also contain an instance object and are used to call a member function on that instance (or a static/module scope method with a pointer as its first parameter).


You must declare a delegate type separately from using it. The declaration looks exactly like a function declaration with no implementation, preceded by the keyword ‘delegate’. Parameter names are still required as they are used when anonymous delegates are created.


public delegate int BinaryIntDelegate(int x, int y);

public static delegate float UnaryFloatDelegate(float a);


To use the delegate type, declare a variable of that type and use construction syntax to make the delegate.



class Test


int Min(int v0, int v1) const;

static float Rand(float max);


Test* test = new Test();


// instance object required for instance method

BinaryIntDelegate minFunc = BinaryIntDelegate(test.Min);


// no instance object for static method

UnaryFloatDelegate random = UnaryFloatDelegate(Test.Rand);   


// nullify a delegate one of 2 ways

random = null;

random = UnaryFloatDelegate();


Delegates are invoked like any other function and they support default parameter values and named arguments in the same way as functions.


int minimum = minFunc(10, 20);

float randFlt = random(1000.0f);


Static delegates use 8 bytes of memory and are 8 byte aligned. Instance delegates use 16 bytes of memory and are 8 byte aligned.


When constructing a delegate, the appropriate overloaded function is selected that matches the given delegate type parameters. Virtual function lookup also occurs at delegate construction time which can be a useful way of avoiding this lookup cost every time you call a frequently used virtual function (although of course you use more memory storing the delegate which could make performance worse.)


Instance delegates can also invoke static or module scope functions as long as that function takes a pointer type as its first parameter. Special construction syntax is used in this case as it’s necessary to pass in the instance object as a second argument at the time of construction:


static int MyWeirdMaxFunc(void* whoKnows, int v0, int v1);

void* pSomething = new Whatever();

BinaryIntDelegate del = BinaryIntDelegate(MyWeirdMaxFunc, pSomething);


This feature allows you to use instance delegates as a general mechanism for calling almost any function, making them the ideal default choice for delegation.


The two parts of a delegate can be accessed. The instance pointer of a non-static delegate can be retrieved using the ‘Instance’ property, which returns a void*. The function part of a delegate can be retrieved using the ‘Function’ property, which returns a System.Reflection.Function symbol type – invoking this on a null delegate will return a null Function symbol. You can get the raw address of the referenced function using as(ulong).
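For example, continuing the earlier Test example (a hedged sketch):

BinaryIntDelegate minFunc = BinaryIntDelegate(test.Min);

void* instance = minFunc.Instance;    // the ‘test’ object
var fn = minFunc.Function;            // System.Reflection.Function symbol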



Anonymous Delegates


Anonymous delegates are a convenience that saves you declaring a function (and coming up with a name for it) when it’s only used in one place.


Inside a delegate constructor where you would normally put the name of a function you can instead directly embed the body of a function by placing a code block enclosed in braces inside the parentheses of the constructor:


BinaryIntDelegate product = BinaryIntDelegate( { return x * y; } );


Note that the argument names are given by the delegate type itself so there’s no need to declare them at the creation site.


All this mechanism actually does is to create a compiler generated function in the containing scope of the function where this code appears. That is, if the above appears inside a class member function then the implicitly generated function will also be a member function of the same class. Whether the member function is static or not depends on whether the delegate type is static. If the created method is not static then it can of course access member variables from the containing class.


Anonymous delegates specifically do not implement closures as local variables outside the code block of the delegate are not visible inside the delegate.


When an instance delegate is constructed using an anonymous code block it is optional whether an instance object pointer is provided. If an instance object pointer is provided then the generated method is static and the instance object can be accessed inside the anonymous code block using an implicitly defined parameter called “context”.
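A hedged sketch of the context parameter (Test and Min are from the earlier delegate example; casting the void* context back to Test* is an assumption):

Test* test = new Test();
BinaryIntDelegate d = BinaryIntDelegate( { return (cast(Test*)context).Min(x, y); }, test );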





Operators

Classes can implement operator overloads for all arithmetic, comparison, shift and logical operators, and also for cast (implicit and explicit) and scast (saturating cast).


Operator overloads must always be public, static class member functions and must accept at least one parameter or return a value of the enclosing type.


// Arithmetic

public static MyType operator+(MyType a, MyType b);

public static MyType operator-(MyType a);


// Comparison

public static bool operator==(MyType a, MyType b);


// Implicit cast is operator with no name

public static bool operator(MyType a);


// Explicit cast is called ‘cast’

public static MyOtherType operator cast(MyType a);



Cast operators only allow casting to the exact type that is returned by the operator. That’s to say, they don’t allow you to convert to base types of the returned type.




Assignment

You can also implement the assignment operator in a class. This operator doesn’t actually perform any assignment as assignment by bitwise copy has already occurred once the assignment operator is invoked. This means it’s really a post assignment operator, which allows you to perform any required fix up or deep copy behaviour.


Another important use of the assignment operator is simply to prevent copies being made of a particular class. This is achieved by making the assignment operator private and can be required if the class holds a reference to a resource that can’t be shared. Note that it’s enough to just implement one of the below operators privately to achieve this.


The assignment operator is an instance function (unlike all other operators which are static) and does not accept any explicit parameters, so the only data it can depend upon is that accessible through the implicit this pointer. It always returns void. Local and non-local versions can be defined separately.


void operator=() local;

void operator=();


The assignment operator is only invoked when a value copy of a class is made. This happens when you assign one instance of a class over another or when a class is passed by value as an argument to a function. It specifically does not happen when you create a new class value because even though doing so uses assignment syntax, it is actually guaranteed to happen in place on the object on the left hand side of the assignment.
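For instance, a sketch of a non-copyable class (UniqueHandle is an invented name):

class UniqueHandle

    private void operator=();    // private: value copies of this class are disallowed

UniqueHandle a, b;
a = b;                           // error: the assignment operator is not accessible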




Properties

There is no specific syntax for properties. The name property merely refers to the fact that C-UP allows you to invoke functions taking zero or one parameters as if they were member variables. A function taking zero parameters is a ‘getter’ and a function taking one parameter is a ‘setter’.



class HoldsValue


       private float TheValue;


       public float Value() const {return TheValue;}

       public int Value() const {return cast(int)TheValue;}

       public void Value(float value) {TheValue = value;}

       public void Value(string value) {TheValue = cast(float)value;}



HoldsValue v;

v.Value = 10.0f;

v.Value = "123.45";

float f = v.Value;

int i = v.Value;


Note that function overloading is still performed on properties so you can set or get the same property using differing source or destination types. However, you should be careful with this as it can prevent you being able to use the ++ and -- operators with properties. For example, in the above case the expression v.Value++ wouldn’t know if you wanted to get the value as a float or an int.


It’s possible to implement read or write only properties by providing only a getter or a setter.





Indexers

Indexers allow a class to act like an array. An indexer has 2 sets of parameters, the indexing parameters in [], and zero or one value parameters in (). If there are zero parameters the indexer is a ‘getter’:


class MyArrayType


    public static int operator[int index]();

    public static MyArrayType operator[int start .. int end]();   // slicing operator

    public static int operator[string index]();                   // index can be of any type

    public static int operator[int x, int y, int z]();            // any number of indices allowed



If there is one parameter the indexer is a ‘setter’:


public static void operator[int index](int value);


Indexers allow you to implement arrays with more than 2 dimensions, associative arrays, etc.
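Usage of the indexers declared above might look like this (a hedged sketch):

MyArrayType arr;

int v = arr[10];                   // getter
arr[10] = 42;                      // setter
MyArrayType slice = arr[2 .. 8];   // slicing operator
int z = arr[1, 2, 3];              // three-dimensional getter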





Intrinsic

There is a rich set of intrinsic functions in the language. The Intrinsic module is implicitly imported into every module, but after the explicit imports, which makes it possible to override the names of these functions. Be careful doing that though, as it means the intrinsic function will only be visible by qualifying with the ‘Intrinsic’ module name.


float c = Intrinsic.Cos(1.0f);


Also note that all intrinsic functions are parallel constrained (see Parallelism chapter) and so can be freely called from parallel code.




Mathematical

For floats and doubles (and vectors thereof):


Cos, Sin, Tan, Acos, Asin, Atan, Atan2

Cosh, Sinh, Tanh, Acosh, Asinh, Atanh

Exp, Exp2, Log, Log2, Log10, Pow


Note that the float implementations of the above functions are all fast approximations. If you find the approximations too coarse for your requirements, the double functions are all full precision and can be used instead. Some of the current approximations in the 3rd row above are quite poor and will be changed. Also note that all of these functions only work inside their natural range of operation (e.g. +/-PI for Cos), as it’s much more efficient for the calling software to keep its angles inside a natural range than for these functions to have to assume the worst.


float RcpApprox(float x);              - returns 1/x approx (usually a single processor instruction)

float Rcp(float x);                             - returns 1/x approx plus one Newton-Raphson iteration

float RsqrtApprox(float x);           - reciprocal square root approx (single processor instruction)

float Rsqrt(float x);                          - reciprocal square root approx plus one Newton-Raphson iteration


The above 4 functions are not supported for double types because hardware doesn’t support them, presumably because the only reason to use doubles is increased accuracy, which makes using approximations somewhat self-defeating.
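
As an illustrative sketch (the FastNormalize function is hypothetical), a fast approximate vector normalize might trade accuracy for speed using the refined reciprocal square root:


float3 FastNormalize(float3 v)
{
    // one Newton-Raphson iteration (Rsqrt) is usually accurate enough for graphics work
    float lenSqr = v.x * v.x + v.y * v.y + v.z * v.z;
    return v * Rsqrt(lenSqr);
}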


float Sqrt(float x);                            - accurate square root (single processor instruction but quite slow)


float Degrees(float x);                   - convert radians to degrees

float Radians(float x);                    - convert degrees to radians


float Floor(float x);                          - returns the largest whole number <= x

float Ceil(float x);                             - returns the smallest whole number >= x

float Round(float x);                       - returns the nearest whole number to x

float Trunc(float x);                          - returns x rounded towards zero (the fractional part is discarded)


Note. On x86 the non-SSE4.1 implementations of the above rounding functions just convert to integers and back again. This means they don’t handle the full range of floating point values correctly, but it does mean that they’re way faster than a full implementation. In SSE4.1 Intel introduced proper implementations of these functions, so if SSE4.1 support is enabled those instructions are used.


float Min(float x, float y);             - returns minimum of 2 values

float Max(float x, float y);            - returns maximum of 2 values

float Clamp(float x, float min, float max);              - clamps value to given range

float Abs(float x);                             - returns absolute value of a signed number

float Sign(float x);                            - returns sign of a signed number (-1 for < 0, 0 for 0, 1 for > 0)


float Sel(float a, float b, float c);                    - returns a if c == 0, or b otherwise, without branching if possible


bool IsFinite(float x);      - returns true if floating point value is finite

bool IsInf(float x);            - returns true if floating point value is infinite

bool IsNan(float x);         - returns true if floating point value is not a number


bool Any(byte4 x);          - returns true if any component of given vector is non-zero

bool All(byte4 x);             - returns true if all components of given vector are non-zero





Vector


These functions are supported for float and double vectors.


float Dot(float3 x, float3 y);                         – dot product of 2 vectors (1d to 4d)

float3 Cross(float3 x, float3 y);                   – cross product of 2 3d vectors

float Distance(float3 x, float3 y);               – distance between 2 points (1d to 4d)

float DistanceSqr(float3 x, float3 y);         – squared distance between 2 points (1d to 4d)

float Length(float3 x);                                    – length of a vector (1d to 4d)

float LengthSqr(float3 x);                             – squared length of a vector (1d to 4d)

float3 Normalize(float3 x);                           – returns normalized vector (1d to 4d)

float Sum(float3 x);                                         – sum the components of a vector (1d to 4d)
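
For instance, a point-in-sphere test can be written with these functions; using DistanceSqr avoids the square root entirely (the IsInSphere function is illustrative):


bool IsInSphere(float3 point, float3 centre, float radius)
{
    // comparing squared distances avoids a Sqrt
    return DistanceSqr(point, centre) <= radius * radius;
}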




Bits


These operations are for manipulating bits in ways commonly implemented by processors and are supported for all integer types.


CountBits – returns number of set bits

FirstBitLow – returns index of first set bit, starting at lowest bit

FirstBitHigh – returns index of first set bit, starting at highest bit

ReverseBits – reverse the order of bits in integer

ReverseBytes – reverse the order of bytes in integer (i.e. swap endian-ness)
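A common use is managing a small pool of slots with a bitmask, where a free slot can be found without a loop. This sketch assumes FirstBitLow returns the bit index as an int (the freeSlots variable and AllocateSlot function are hypothetical):


uint freeSlots = 0xFFFFFFFF;            // one bit per free slot

int AllocateSlot()
{
    if (freeSlots == 0)
        return -1;                      // pool exhausted

    int slot = FirstBitLow(freeSlots);  // index of the lowest free slot
    freeSlots &= ~(1 << slot);          // mark it as used
    return slot;
}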





Endian-ness


Functions are provided to convert values from little to big endian and vice-versa. The examples given use the int type, but all floating point and vector types are also supported.


To reverse the endian-ness use:


int ReverseEndian(int value);


To convert to or from a specific endian-ness use the following functions. They generate no code at JIT time if no conversion is necessary on the current architecture:


int ToLittleEndian(int value);

int FromLittleEndian(int value);

int ToBigEndian(int value);

int FromBigEndian(int value);
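
For example, when loading a file whose format is defined as big-endian, the conversions become no-ops at JIT time on big-endian hardware (the ‘header’ variable and its ‘magic’ field here are hypothetical):


// header.magic was read raw from a file defined to be big-endian
int magic = FromBigEndian(header.magic);

// when writing it back out, convert from native order to big-endian
header.magic = ToBigEndian(magic);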





Processor


These intrinsics give low-level access to miscellaneous processor features.


void DebugBreak() – inserts a breakpoint at this location

ulong GetProcessorCycleCount() – for high resolution timing (*)

ulong GetProcessorCyclesPerSecond() – number of cycles per second, for use in conjunction with (*)


uint GetNumberOfPhysicalProcessors() – number of physical processors in the system

uint GetNumberOfLogicalProcessors() – number of logical processors in the system


void YieldProcessor() – should be called in spin lock loops
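
The two cycle-count functions can be combined for simple high-resolution timing, along these lines (DoWork stands in for whatever code is being timed):


ulong start = GetProcessorCycleCount();

DoWork();                                   // the code being timed (hypothetical)

ulong cycles = GetProcessorCycleCount() - start;
double seconds = cast(double)cycles / cast(double)GetProcessorCyclesPerSecond();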





Memory


Access to memory is a complex issue on modern processors especially when writing multi-threaded code. These intrinsic functions expose common processor facilities for dealing with this complexity.


C-UP does not require or define a particular memory model, so the programmer should assume that the weakest real-world memory model is being used. This means you should assume that reads and writes to different locations can be freely reordered by the hardware with respect to each other. It should be safe to assume that reads and writes to the same location cannot be reordered and that dependent reads and writes cannot be reordered, as hardware that doesn’t obey these rules would probably be completely unusable.


Atomic load/store


Loads and stores of memory that is being accessed by multiple threads (shared memory) should use the atomic load and store intrinsics. These intrinsics inform the optimiser that a particular load or store cannot be optimised away because the data might have been changed by another thread. Furthermore the load or store is guaranteed to be atomic. Because of this guarantee AtomicLoad and AtomicStore are only supported for integer types <= 32 bits in size, bool, all char types, half, float and double.


int AtomicLoad(const int& mem);

void AtomicStore(int& mem, int value);


Atomic loads and stores also enforce certain ordering constraints. Specifically, the order of atomic loads and stores will never be changed relative to other atomic operations by the compiler. Their order with respect to ordinary loads and stores can change. There are also special ordering constraints placed on atomic loads and stores with respect to various memory barriers – see the Barriers section below.


Atomic operators


Atomic operations are supported only for int and uint types and are typically used to implement thread-safe operations without the use of OS synchronisation primitives.


int AtomicAdd(int& mem, int value);

int AtomicSub(int& mem, int value);

int AtomicAnd(int& mem, int value);

int AtomicOr(int& mem, int value);

int AtomicXor(int& mem, int value);

int AtomicSwap(int& mem, int value);


The above functions return the old value that was in memory.


bool AtomicCompareAndSwap(int& address, int value, int comparand);


Compares the value stored at address with the comparand and if they’re the same sets the memory location to value and returns true. Unlike the other atomic functions, AtomicCompareAndSwap is also supported for long and ulong types.
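
As a sketch of how AtomicCompareAndSwap is typically used, here is a hypothetical lock-free ‘atomic maximum’ (note that, as described below, no memory barriers are implied – insert them as required):


void AtomicMax(int& mem, int value)
{
    int old = AtomicLoad(mem);

    // keep retrying while our value is larger than the stored one
    while (value > old)
    {
        if (AtomicCompareAndSwap(mem, value, old))
            break;                      // we won the race

        old = AtomicLoad(mem);          // someone else changed it – re-read and retry
    }
}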


It is important to note that none of these instructions imply a memory barrier (see Barriers intrinsics section.)


Warning - in practice these instructions do create a full memory barrier on x86 but this is not true of other architectures so explicit memory barriers should be inserted as appropriate. If you don’t insert the appropriate barriers and only test on x86 your code is likely to fail on other architectures.


Spin lock and unlock intrinsics are also provided, which are composed from various atomic and barrier intrinsics.


void Lock(int& lockVar);

void Unlock(int& lockVar);




Barriers


Barriers are used in lock-free multi-threaded programming to ensure that different cores have a consistent view of memory. They are necessary for 2 reasons:


1)      The compiler’s optimiser can re-order instructions in ways that are undetectable in single threaded code, but are disastrous when multiple threads are executing concurrently.

2)      Many modern CPUs reorder reads and writes to memory to improve performance.


The barrier intrinsics all address both of these issues to a greater or lesser extent. A full explanation of why these barriers are necessary and how to use them is far beyond the scope of this document, but that information is widely available on the internet.


void MemoryBarrier();


This is a full CPU memory barrier meaning that no loads or stores are allowed to cross this barrier in either direction. It is also a full compiler barrier meaning that the compiler won’t move loads or stores either direction across it.


void LoadBarrier();


This ensures that all loads before the barrier are completed before any loads that start after it. It is also a full compiler barrier meaning that the compiler won’t move loads or stores either direction across it.


void StoreBarrier();


This ensures that all stores before the barrier are completed before any stores after the barrier. It is also a full compiler barrier meaning that the compiler won’t move loads or stores either direction across it.


void AcquireBarrier();


This barrier prevents loads and stores after the barrier being moved before the barrier by either the CPU or the compiler. It also doesn’t allow atomic loads and stores before the barrier to be moved after the barrier by the compiler.


void ReleaseBarrier();


This barrier prevents loads and stores before the barrier being moved after the barrier by either the CPU or the compiler. It also doesn’t allow atomic loads and stores after the barrier to be moved before the barrier by the compiler.
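A typical use of this pair is publishing data from one thread to another: the release barrier ensures the payload is written before the flag, and the acquire barrier ensures the flag is read before the payload. A sketch (the payload and ready variables and both functions are hypothetical):


static int payload;
static int ready;

// producer thread
void Publish(int value)
{
    payload = value;              // write the data first
    ReleaseBarrier();             // payload must be visible before the flag
    AtomicStore(ready, 1);
}

// consumer thread
int Consume()
{
    while (AtomicLoad(ready) == 0)
        YieldProcessor();

    AcquireBarrier();             // flag must be read before the payload
    return payload;
}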


void CompilerBarrier();


Doesn’t generate any instructions but prevents the compiler from moving loads or stores across it in either direction.


void CompilerLoadBarrier();


Doesn’t generate any instructions but prevents the compiler from moving loads across it in either direction.


void CompilerStoreBarrier();


Doesn’t generate any instructions but prevents the compiler from moving stores across it in either direction.




It’s sometimes necessary to load from and store to unaligned memory addresses. This is very cumbersome to implement manually and processors often have instructions for just this purpose. C-UP exposes these processor instructions as two intrinsic functions which are overloaded for all built in value types (including vectors.)


int LoadUnaligned(const int& mem);

void StoreUnaligned(int& mem, int value);


These functions are used in the System.IO.MemoryStream class to perform efficient streaming of data from memory.


Unaligned loads and stores are not atomic.




There is a generic (non-atomic) swap function in the Intrinsic library. It can be used to perform in-place swaps of any type that doesn’t prohibit assignment.




There are several functions which allow you to manually fetch data into the processor cache.


void PrefetchRead(byte& mem);

void PrefetchReadStream(byte& mem);

void PrefetchWrite(byte& mem);

void PrefetchWriteStream(byte& mem);


The stream variants are intended for data you only intend to touch once. The non-stream variants are for data you intend to touch multiple times.


On x86 the above functions generate the following instructions:


PrefetchRead = prefetcht0

PrefetchReadStream = prefetchnta

PrefetchWrite = prefetcht0

PrefetchWriteStream = no code generated (use uncached writes, below)


Use of these instructions is generally best avoided as modern processors have advanced automatic prefetchers and you’re more likely to hurt performance than help it using these intrinsics. I might remove them.




If the processor supports it you can perform aligned stores of certain data types (int, uint, long, ulong, float, double, 16 or 32 byte vectors) bypassing the cache. This is useful for avoiding cache pollution when writing out streams of data that you don’t intend to use again in the near future.


void StoreStream(float4& mem, float4 value);


If the processor you’re using doesn’t support this then a normal store is performed.


After performing a sequence of streamed stores in parallel code you should insert a MemoryBarrier() to ensure that other processors get a consistent view of memory.
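
As an illustrative sketch (the FillBuffer function is hypothetical), filling a large output buffer you don’t intend to read again soon might look like this:


void FillBuffer(float4[] dest, int count, float4 value)
{
    for (int i = 0; i < count; i++)
        StoreStream(dest[i], value);    // write bypassing the cache

    MemoryBarrier();                    // other processors now see a consistent view
}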






Entry Point


Each executable program must have a single entry point, meaning a single function that is called when execution of the program begins.


This entry point must be a public function, called Main, which returns an int and takes an array of strings as its only parameter.


This function can be a static class member or a module scope function.


public int Main(string[] args)
{
    Console.WriteLine("Hello world");
    return 0;
}


When compiling an executable, it’s an error if zero functions or more than one function matching these criteria is found in the entire program.


When compiling a library, it’s an error if any function matching these criteria is found in the entire program.




Attributes


There are a couple of special attributes you can use on the main entry point function of your program.


The first one allows you to control the size of the stack allocated for your main fiber, in kilobytes. If you don’t supply one, 64KB is used. Use it like this:



public int Main(string[] args)
{
    return 0;
}



The second is called [verbose] and putting it on the main entry point causes it to report the stack size, order of static initialisation and command line arguments to the main function.



Linking


If you call a function that isn’t currently linked (i.e. the appropriate dll isn’t loaded) a System.NotLinkedException is thrown. You can use this to check for the presence of OpenGL functions, for example.






Generics


Generics allow you to write code once and use it many times for any number of different types. They are extremely useful for creating commonplace data structures, functions and boilerplate code.


Generic parameters can be one of the following (referred to as the generic parameter mode):

-          A type

-          A literal (const)

-          A mixin or an anonymous mixin (mixin)

-          A heap id (heap)

-          A parallel set (parallel)

-          An expression – only allowed for mixins (expr)


The different generic parameter modes require different prefixes (except for a type, which is indicated by the absence of a prefix) and generics can be overloaded on the number of parameters and the modes of those parameters. The prefixes are specified above in brackets.


So the declaration of a mixin accepting one each of the above might look like this:


mixin ExampleMixin< T, const A, mixin B, heap C, parallel D, expr E >



At present you cannot constrain type parameters or constants to a particular type.



Generic parameter substitution is quite flexible – the only real restriction being that you can’t use a generic argument on the right hand side of the dot operator (i.e. on the right of a member access/dereference). E.g.


mixin Example<TYPE, expr EXPR>
{
    TYPE a = new TYPE();               // ok
    const(TYPE) b = new TYPE();        // ok – can const-ify a type
    a = TYPE.MaxValue;                 // ok – can access member of a type
    TYPE[] d = new TYPE[10];           // ok – can make an array of a type
    int[TYPE] e;                       // ok – can use type to define array size
    typeof(SomeType.TYPE);             // not ok – can’t use generic on rhs of . operator
    var f = new TYPE();                // ok – type inference works

    if (EXPR) DoSomething(EXPR);       // ok – can use expr anywhere it makes sense
    TYPE g = anArray[EXPR];            // ok – as above
    TYPE* next = EXPR.GetNext();       // ok – can invoke member of EXPR
    TYPE* item = this.EXPR;            // not ok – can’t use generic on rhs of . operator
    TYPE* item = EXPR;                 // ok – the correct way to do the above is with implicit ‘this’
}





Classes


Generic classes use the familiar syntax of C++ templates and C# generics.


class List<TYPE, const SIZE, parallel PARALLEL_SET>
{
    TYPE[] data;

    public this() { data = new TYPE[SIZE]; }

    TYPE GetData(int index) : PARALLEL_SET { return data[index]; }
}


All uses of a generic class with the same generic argument values equate to the same instantiation, which occurs in the scope of the original generic class declaration.





Functions


Generic functions use the familiar syntax of C++ templates and C# generics.




RET_TYPE MyGenericFunc<RET_TYPE, ARG_TYPE, const SCALE>(ARG_TYPE arg)
{
    return cast(RET_TYPE)(arg * SCALE);
}



All uses of a generic function with the same parameters equate to the same instantiation, which occurs in the scope of the original template function declaration.


Parameter type inference is performed on leading type parameters and return type when invoking a generic function so you don’t need to explicitly specify the generic types involved. If any type cannot be inferred or a single type resolves to multiple incompatible types then an error is reported.


Leading type parameters means that types can only be inferred for parameters that are not preceded by a non-type generic parameter. So the above MyGenericFunc function will work when invoked like this:


int x = MyGenericFunc<int, float, 2.0f>(123.45f);      // ok to explicitly state everything


int x = MyGenericFunc<2.0f>(123.45f);          // leading types are inferred, SCALE is provided explicitly


But if we reorder the parameters then inference of ARG_TYPE cannot occur and you’d have to specify types explicitly.




RET_TYPE MyGenericFunc<RET_TYPE, const SCALE, ARG_TYPE>(ARG_TYPE arg)
{
    return cast(RET_TYPE)(arg * SCALE);
}



int x = MyGenericFunc<2.0f>(123.45f);          // error – ARG_TYPE after const SCALE, can’t infer


int x = MyGenericFunc<int, 2.0f, float>(123.45f);      // ok, but messy





Delegates


Generic delegates are also supported. You declare one exactly like a generic function but prefixed with the delegate keyword. This corresponds exactly to the way non-generic delegates are declared.





Mixins


Unlike generic classes and functions, mixins are instantiated at the scope where they are used rather than at the scope where they were declared. They are intended to allow the injection of common code at any point.


Here’s a mixin that’s a code block which prints out the name and size of a generic type, followed by a function that uses it to print a few types out.


mixin TestSizeAlign<TYPE>
{
    {
        Type sym = typeof(TYPE);
        string name = sym.Name;
        Console.WriteLine("sizeof(" + name + ") = " + sizeof(TYPE));
    }
}




public void Test()
{
    TestSizeAlign<byte>;
    TestSizeAlign<short>;
    TestSizeAlign<int>;
    TestSizeAlign<float>;
    TestSizeAlign<double>;
}




Mixin code is injected directly at the scope it’s instantiated at (i.e. the entire structure of the mixin is copied creating a new, unique copy of the code). In some respects they work like C macros, except that the compiler knows about mixins and they must internally obey the syntax rules of the language so you can’t use them to bastardise the language in any way.


Because code is directly injected it was necessary to introduce the extra scope inside the TestSizeAlign mixin above, otherwise trying to use 5 of them in one scope would cause the symbol ‘sym’ to be multiply defined. Currently mixins can only be anonymous but plans are afoot to allow identifiers to be used so the same code can be mixed in multiple times in the same scope.


Note that it is perfectly valid to declare a mixin with no generic parameters, in which case the < > can be omitted.



Mixins as generic arguments


As noted above you can use a mixin as a generic argument when instantiating another generic. This allows the injection of user code into some generic code.


An example from the system libraries is in the System.Array module, wherein there are two generic Sort functions for arrays.


The first generic sort function takes a single generic parameter which is the type of the array being sorted. As the first formal parameter is an array this type can be inferred. This is the simplest and cleanest way of sorting an array, but it requires that the elements being sorted support the less-than operator.


void Sort<ARRAY_TYPE>(ARRAY_TYPE array);


int[] arr = new int[10];

System.Array.Sort(arr);           // ARRAY_TYPE inferred as int[]



If you want to sort a type that doesn’t have the < operator (or you want to sort into an order that this operator doesn’t encapsulate) then there’s another Array.Sort function.


void Sort<ELEMENT_TYPE, mixin LESS_THAN>(ELEMENT_TYPE[] array);


With this function you must supply the element type of the array and a mixin that returns whether element ‘a’ is ‘less than’ element ‘b’. ‘a’ and ‘b’ are just the parameter names expected by the implementation of the Sort function.


To call this method, you can either define a named mixin or use an anonymous mixin directly in place of the second parameter. Anonymous mixin arguments are the body of a parameter-less mixin (enclosed in { } ) directly embedded as a generic parameter.



struct MyStruct
{
    public int x, y;
}

MyStruct[] arr = new MyStruct[10];

mixin MyStructLessThan            // note. no <> required on parameter-less mixin
{
    return a.x < b.x || (a.x == b.x && a.y < b.y);
}



void TestSortArray()
{
    System.Array.Sort<MyStructLessThan>(arr);

    // this is equivalent to the above line
    System.Array.Sort<{return a.x < b.x || (a.x == b.x && a.y < b.y);}>(arr);
}



This sort function works by defining a nested function inside itself, into which the body of the supplied mixin is inserted. The sort calls this nested function (which will usually be inlined away) to compare elements of the array being sorted. Open up System/Array.cup to see the full implementation.


It’s important to note that an anonymous mixin will not currently match any other anonymous mixin, so every place you call a generic function using an anonymous mixin you create an entirely new instance of that function.



Expressions as generic arguments


Mixins also allow you to specify a generic argument that is an arbitrary expression. This makes the use of boilerplate code much more flexible and useful.


Let’s say you make a mixin that implements the behaviour of an intrusive linked list, that is a linked list where the next element pointer is embedded inside the items of the list.


mixin IntrusiveLinkedList< ITEM_TYPE, expr GET_HEAD, expr GET_NEXT >
{
    ITEM_TYPE* GetNext()
    {
        return GET_NEXT;
    }

    static ITEM_TYPE* GetHead()
    {
        return GET_HEAD;
    }
}




You could then embed this logic into any class you wish and the way the head and next pointers are organised is up to you (rather than it being built into the linked list logic.)


class MyThing
{
    static MyThing* firstThing;
    MyThing* nextThing;

    IntrusiveLinkedList<MyThing, (firstThing), (nextThing)>;
}





MyThing[] things = new MyThing[100];

short firstThingIndex = -1;


MyThing* GetThing(short index)
{
    if (index < 0)
        return null;

    return &things[index];
}



class MyThing
{
    short nextThingIndex = -1;

    IntrusiveLinkedList<MyThing, (GetThing(firstThingIndex)), (GetThing(nextThingIndex))>;
}



Note that the expressions passed in as generic arguments must always be enclosed in parentheses. This is necessary to remove ambiguity about whether something is a type or an expression.






Data


Variable modifiers




Static


Static variables are only instantiated once in the entire program, regardless of the scope they appear in. All variables declared at the module scope are implicitly static.




Const


Const variables must have an initialiser and cannot be reassigned once that initialiser has been executed.




Readonly


Readonly variables can only be initialised by their own initialiser or in the constructor of the class in which they appear.


Static readonly variables can only be initialised by a static initialiser.
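
The modifiers in combination can be sketched like this (the Widget class and its members are illustrative):


class Widget
{
    static int count;                  // one instance shared by the entire program
    const int MaxWidgets = 256;        // must have an initialiser; can never be reassigned
    readonly int id;                   // settable only by its initialiser or a constructor
    static readonly int version = 3;   // settable only by static initialisation

    public this() { id = count++; }    // ok – readonly assigned in a constructor
}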




Memory Heaps


The C-UP language and runtime include built-in support for multiple memory heaps. It’s common in game development to split memory into different heaps because people like to allocate independent budgets for textures, audio, animations, etc., and having things allocate from separate heaps makes enforcing budgets and tracking memory use simpler. Furthermore, very different memory allocation schemes may be required for different resource types. In a garbage collected environment, separate heaps also let you control the performance impact of garbage collection by keeping your actors in a separate heap from large objects like textures, which are very expensive to move around in memory at inopportune moments. Finally, if you’re streaming data into one heap and another one needs to perform garbage collection, the runtime knows that the async file operation and the GC won’t interfere with one another, so both proceed at full speed.


In C-UP there’s another reason multiple heaps are important and that’s parallel execution because heaps can be declared as constant temporarily which allows parallel code to read from them freely. More details can be found in the Parallelism chapter.


C-UP supports up to 248 user memory heaps, with another 8 being reserved by the system.


To declare a memory heap you use a heapdef declaration:


heapdef MyHeapName;


A heapdef just declares a tag for a heap, which has the type ‘heapid’. The heapid type is a special program wide enumeration type which can have values added to it by heapdef declarations.


To allocate memory from a particular heap you pass a heapid value in a heap() directive to the new operator.


int[] someInts = new heap(MyHeapName) int[256];


In the above case the heap can be a heapdef value or a variable of type heapid. This allows you to allocate from different heaps dynamically.


heapid ActiveHeap = MyHeapName;

int* myInt = new heap(ActiveHeap) int;


If no heap index is specified then the default heap is used. The default heap is always the main garbage collected heap.


Class Heaps


You can declare that a particular class must always be allocated from a particular heap by placing the heap(N) directive after the name of the class where it is declared, where N must be the identifier of a heapdef. This is only possible for a class that does not have a base class.


heapdef TheTextureHeap;

heapid ActiveTextureHeap = TheTextureHeap;


class MyTexture heap(TheTextureHeap) { }

class MyTexture heap(ActiveTextureHeap) { }     // error – requires a heapdef, not a heapid


When a class is bound to a particular heap, you cannot create an instance of that class in any other heap or on the stack. Also, any pointer to that class will be known to point to that heap which can be used to enable parallel execution. It’s not necessary to re-specify the heap index when you instantiate such a class using new.


Heap Constraints


Pointers and dynamic array types can specify a heap directive, which states that the data they refer to will be in a particular memory heap. When they are assigned to, the heap of the given address is dynamically checked, unless it can be statically guaranteed to obey the constraint (i.e. because the type pointed to is always in that heap or it’s being assigned a value that has already been checked). The heap directive follows the * or [] and because the heap is part of the type you can overload functions based on the heap.


If no heap directive is given then the data can be in any heap. It follows that a pointer with a heap constraint will implicitly convert to a pointer with no heap constraint (meaning any heap). Converting a pointer with no heap constraint to one with a heap constraint requires an explicit cast and is subject to a runtime check. A pointer with a heap constraint can never be cast to one with a different heap constraint.
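
These conversion rules can be sketched as follows (MiscHeap and SomeObject are assumed to be a user heapdef and class, respectively):


heapdef MiscHeap;

class SomeObject { }

SomeObject* heap(MiscHeap) constrained = new heap(MiscHeap) SomeObject();

SomeObject* any = constrained;                          // ok – constrained converts to unconstrained implicitly

constrained = cast(SomeObject* heap(MiscHeap))any;      // ok – explicit cast, checked at runtime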


char[] heap(NamesHeap) GetName(SomeObject* heap(MiscHeap) somethingInMiscHeap);



Heap management


Default heap


The only heap that is initialised when a program starts is the default heap. This heap is managed by the system and can grow and shrink at runtime. It is always garbage collected and is the only heap which can dynamically change capacity at runtime by allocating more address space from the system.


All other heaps are actually allocated from within the default memory heap and are given a fixed amount of contiguous address space at creation time. This is necessary because the parallel job manager needs to be able to lock entire heaps efficiently and it can only do so if they’re contiguous. The default heap cannot be locked for parallel processing and so is allowed to be non-contiguous. Although the entire address space for non-default heaps is reserved up front, the physical memory committed can grow and shrink as necessary.


It is not usually necessary to explicitly reference the default heap because the absence of an explicit heap directive implies use of the default heap. However, if you wish to explicitly reference the default heap you can do so using heapid.Default. E.g. to force a garbage collection on the default heap do this:






User heaps


Heapdef declarations merely allocate a unique heapid. It’s up to the user to initialise each heap before it is used and the System.Memory module contains the required functionality.


System.Memory contains a static array called Heaps, which is indexed by heapid values (which are just uints under the hood). It’s up to you to assign a memory heap object to the entry in ‘Heaps’ for every heapdef value you plan to use in your program.


Memory heap objects are all sub-classes of the abstract class System.Memory.MemoryHeap. Here’s the hierarchy of system supplied heap types:



MemoryHeap
    ManualMemoryHeap       – abstract base class for heaps where manual memory deletion is needed

        CMemoryHeap              – access to the underlying C malloc/free functions

        StackMemoryHeap                     – allocate with a linearly increasing address, free in opposite order

        StackFrameMemoryHeap         – as above but you can free entire blocks of memory in one go

    AutomaticMemoryHeap                – abstract base class for heaps where memory is deleted automatically

        GcMemoryHeap            – a garbage collecting memory heap


To create a new heap and assign it to the user heaps master array you do the following:


import System.Memory;


Heaps[TheTextureHeap] = new GcMemoryHeap(1024 * 1024 * 256, 1024 * 1024);

Heaps[EntitiesHeap] = new PoolMemoryHeap<Entity>(1024 * 512, 0);


Note that you cannot set or get the default (0th) heap in this way.


The arguments passed to the constructor are the amount of address space to reserve in bytes and the amount initially committed in bytes. It’s up to the particular heap implementation how physical memory is managed. The base MemoryHeap class has properties to get and set the Capacity (reserved) and Size (committed) values for a heap. However it’s recommended that you don’t change the reserved size after you initially create your heap as virtual address space can easily become fragmented causing allocation failures.


It is possible to re-assign Heaps entries at runtime but only if the heap is currently empty. Also note that as address space can become fragmented, allocating and deleting memory heaps repeatedly can cause the system to fail to be able to allocate a linear address range for a new heap.


Manual memory management


It’s recommended that you use garbage collected memory management unless you have evidence that it’s causing performance issues.


However, if you do want or need to use manual memory management, C-UP also makes doing so completely safe for you. Safe in the sense that you won’t crash because you can’t free memory twice and you can’t access memory that has been freed – but not safe from memory leaks. This is achieved by the combination of 2 features:


1)      Delete functions which are used to free memory (see Expressions > New/Delete/Local > Allocation Functions) can and do use the capabilities of the garbage collector to check that memory being freed is not referenced. While doing so ensures safety, it is also rather slow (depending on your de-allocation strategy) and so can be disabled for your release builds by setting the ValidateFree member of ManualMemoryHeap to false.


2)      The delete expression nullifies the pointer that is passed to it if that pointer is an l-value. This is crucial to allow (1) to work because the l-value is nulled between taking its value to pass to the delete function and actually calling that function. Without this you’d be in a catch-22 situation where you can’t null the pointer before calling delete but calling delete while the pointer is live will throw an exception that there are still live references to that memory.
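Sketched together, a typical manual-management pattern looks like this. It assumes PoolMemoryHeap derives from ManualMemoryHeap and so exposes the ValidateFree member described in (1); Entity and EntitiesHeap are the names from the earlier heap example:

var entityHeap = new PoolMemoryHeap<Entity>(1024 * 512, 0);

entityHeap.ValidateFree = false;    // release builds: skip the live-reference check

Heaps[EntitiesHeap] = entityHeap;

Entity* e = new heap(EntitiesHeap) Entity();

delete e;    // ‘e’ is an l-value, so it is nulled before the delete function runs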


Strings heap


String memory management in the presence of multiple heaps raises a couple of questions.


1)      Should all strings be allocated from the default heap, a special string heap or should they be in a heap with the objects that reference them (e.g. entity names in the entities heap)?


2)      When implicitly allocating a string (e.g. during concatenation) which heap should the implicit allocation occur in?


The C-UP answer is that unless otherwise specified all string allocations come from Heaps[heapid.String], where heapid.String is a system defined global heapdef.


If you don’t do anything then string allocations will be from the default heap. This happens because the default implementation of new (in the Default module) uses the default heap for allocations from heapid.String if Heaps[heapid.String] hasn’t been set.


If you want a custom heap that contains all of your strings, you should assign an automatic heap to Heaps[heapid.String]. The runtime checks that the heap supplied is automatic because otherwise strings will never be freed.


Heaps[heapid.String] = new GcMemoryHeap(capacity);


All implicit string allocations also come from Heaps[heapid.String]. If you want to concatenate strings and have the resulting string allocated from any other heap then you cannot use the concatenation operator. Instead you must use String.Concat and pass in the heapid you want used. All other string functions that allocate (e.g. Format, ToLower) allow you to optionally specify a heapid.
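As an illustrative sketch (MyStringHeap is a hypothetical heapdef, and the exact parameter order of String.Concat is an assumption):

string a = "Hello, ";

string b = "world";

string c = a + b;                              // allocated from Heaps[heapid.String]

string d = String.Concat(MyStringHeap, a, b);  // allocated from Heaps[MyStringHeap]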



Alignment

Data alignment is very important on modern architectures especially when your code needs to run as fast as possible, so C-UP allows you to control the alignment of data in memory using the align(N) directive, where N is a power of 2 integer in the range 1 to 4096, giving a byte alignment.


Alignment of built in types is discussed in the types section.


Class Alignment


The default alignment for a class is the maximum alignment required by any of its members (including inherited members). It’s important to note that this alignment only applies to the start address of the class, not the total size.


The default alignment for a class can be increased by putting the align directive after the name of the class. This is only possible for a class that does not inherit from another class. If you try to give a class a lower alignment than is required for the member variables of that class then the align directive is ignored.


Using the align directive in this way aligns both the size and the start address of the class. The reasoning is that the only reason I know of to use higher alignments than is natural is that hardware direct memory access (DMA) might require it, and if that’s the reason then the size will also need to be aligned.


class MyClass align(128)



Member Alignment


Individual members of a class or structure can be aligned by placing the align directive before them. Any subsequent members will be naturally aligned but you can still be certain of their alignment due to the knock-on effect of the align directive. A member align directive will cause the entire class it’s contained in to have at least that alignment.


class MyClass
{
    int x;                  // default alignment for this type

    align(32) int y;        // 32 byte aligned

    int z;                  // will always be at a 32 byte aligned address + sizeof(int)
}



Note that because the size of a class isn’t aligned unless the class align directive is used (see previous section), sizeof(MyClass) will give 40 and alignof(MyClass) will give 32. Given the following:


class MyDerivedClass : MyClass
{
    double MyDouble;
}



The MyDouble variable will be at offset 40 and the sizeof(MyDerivedClass) will be 48. Incidentally you can retrieve the offset of a variable using offsetof(MyDouble).


However, if you make an array of MyClass or MyDerivedClass, each array element will require 64 bytes, in order to maintain the correct alignment of start addresses.


Reference Alignment


Pointers and dynamic arrays can specify an alignment directive, which declares that the data they refer to will be aligned in memory accordingly. When they are assigned to, the given address is dynamically checked for correct alignment, unless it can be statically guaranteed to obey the alignment constraint. The align directive follows the * or [], and because the alignment is part of the type you can overload functions based on alignment (overload resolution prefers higher alignments because they imply faster execution). Higher alignments implicitly cast to lower alignments. Lower alignments can be explicitly cast to higher alignments, subject to a runtime alignment check.


Valid alignment values are powers of 2 from 1 to 4096.


void ZeroMemory(byte[] align(16) dest);  // zero memory fast when possible

void ZeroMemory(byte[] dest);                   // otherwise use slow path
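A caller might then use these overloads as below. GetBuffer is hypothetical, and the exact spelling of the cast to an aligned reference is an assumption based on the rules above; the cast incurs a runtime alignment check:

byte[] buffer = GetBuffer();

ZeroMemory(buffer);                                  // alignment unknown: slow path

byte[] align(16) aligned = cast(byte[] align(16))buffer;   // runtime alignment check

ZeroMemory(aligned);                                 // resolves to the align(16) overload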


Weak References


Following the * or [] of a reference type with the ‘weak’ keyword makes that reference weak. A weak reference is one that is nulled when the garbage collector executes so the memory it references can be reclaimed. They are typically used to implement caching behaviour.
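A minimal caching sketch (Texture and LoadTexture are hypothetical):

Texture* weak cachedTexture;

Texture* GetTexture()
{
    if (cachedTexture == null)
        cachedTexture = LoadTexture("background.png");   // re-load after a collection

    return cachedTexture;
}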


TODO: is there any value in allowing different levels of weakness, possibly with some level of user control over the meaning?


Garbage Collection


The default memory manager is a precise garbage collecting memory manager. This can be overridden at type, module or global scope (see section on new/delete in Expressions).


The fact that runtime types and array lengths are stored in references means that the memory manager has no requirement for a block header in front of each allocation. If you allocate a single byte then only a single byte of memory is consumed unless higher alignment is required in which case some padding might be needed. If you allocate a zero byte structure then no memory is consumed, but you will have a virtual pointer to a structure of that type. The address of this pointer will be a single fixed address the system gives all such allocations. Zero byte structures therefore allow you to implement special marker values, or store interfaces without requiring any heap storage.


Although there are no block headers the garbage collector does require a fairly large fixed size buffer (the size being user configurable) to perform collections. Empirical evidence indicates that it needs about 2% of the available memory to perform well, although the amount of memory needed is actually a function of the number of individual live allocations it finds as opposed to the total amount of memory in the heap. If it runs out of workspace during a collection all that happens is some of the garbage lower down in memory isn’t collected during this run but of course it remains eligible for collection when the collector next executes.


[Note that the same memory is used as temporary workspace by the parallel job runtime as well, because it’s inherently impossible for a GC and parallel execution to happen simultaneously.]


Allocating new memory in the garbage collector is almost as fast as allocating from the stack as it just moves the free memory pointer contiguously through memory (committing new pages as necessary). This also means that objects allocated later sit at a higher effective memory address than objects allocated earlier. It’s a guarantee of the language that although the garbage collector can move allocations closer together because garbage between them is collected they will never change order or get further apart.


The lack of block headers has one other important benefit which is that allocations can be partially collected. Let’s say you have an array; you slice off a section of it (or multiple sections of it) and discard the original array reference. All of the parts of the array before and after the slice are eligible for garbage collection, only the part you still hold a reference to remains in memory. This equally applies when you store a pointer to a member variable of an object, where the referenced data will remain in memory, but the rest of the object can be freed.





Statements

All of the usual statements found in C type languages are supported, with a few extras.









For


The for statement works just like the one in the C language, but with a couple of extra capabilities.


Firstly, you can optionally include a pre-body iteration expression by having a 3rd semicolon inside the for statement. If this 3rd semicolon is present then the 3rd expression is inserted at the top of the loop and the 4th expression becomes the standard post loop body iterator. If the 3rd semicolon is omitted then for works exactly as it does in C, meaning the 3rd expression is the post loop body iterator. E.g.


for (int i = 0; i < 10; PreDoStuff(), i++) DoStuff();


Expands to:



{
    int i = 0;

    while (i < 10)
    {
        PreDoStuff();

        DoStuff();

continue_label:                   // implicit label that ‘continue’ statements jump to
        i++;
    }
}

This feature is useful for writing generic code, enabling you to insert generic iteration code at the top of the loop body as well as the bottom.


Switch

Switch is used to select which code to execute for a number of possible values of a given variable. It’s akin to performing a series of if-else statements in succession where each one checks the value of the same variable. Use of switch is recommended if many comparisons are to be performed as it’s more concise (less typing), offers better opportunities for optimisation and is arguably easier to read.


Switch can operate on variables of the following types: integer, boolean, character, string, enum.


Each value you wish to compare to requires a ‘case’ block which is executed when the variable has that value. For integer and enum types each case can also compare a range of values, by using the .. operator.


Each case block should end with a ‘break’ statement to transfer control to the end of the switch statement. If the ‘break’ statement is omitted, then the case handler falls through to the next one lexically, which can be useful if you want multiple values to be handled the same way (but a case range won’t suffice.)


Finally, a single ‘default’ block can optionally be used to handle any values not caught by one of the cases.



int anInt = GetRandom();

switch (anInt)
{
    // handle a single value
    case 0: Console.WriteLine("zero"); break;

    // handle a range of values (1 to 9, inclusive)
    case 1..9: Console.WriteLine("one to nine"); break;

    // omitted 'break' to handle several values the same way
    case 10:
    case 20:
    case 30:
        Console.WriteLine("multiple of ten");
        break;

    // default catches any value not already handled
    default:
        throw new Exception("Unexpected value");
}



Foreach / foreachr


The foreach and foreachr statements iterate all elements of an array forwards or backwards, returning each element in turn. They can also iterate elements of any class implementing an array indexer and a Length property. You can use array slicing to iterate a sub-section of the array.


foreach (int element, elemIndex; anArrayOfInts) { }

foreachr (int& addressOfElement; anArrayOfInts[0..10]) { }      // element index is optional


The first parameter to foreach is the iterator variable, which can be a pointer to the array element type or the value of the element type. Whether the element is returned by pointer or by value depends on the type of this variable. The iterator variable can be created as part of the foreach or an already declared variable can be passed in, which is more cumbersome but has the advantage that you can access the resulting value after the foreach exits.


After the iterator variable you can optionally specify an element index identifier, which must be separated by a comma from the iterator. You cannot specify the type of this identifier or use an existing variable – the type is inferred from the Length property of the array passed in.


The second parameter (after the semicolon) is the array itself, or an instance of a type implementing array-like behaviour (i.e. an indexer). It’s important to note that this expression is only evaluated once at the start of the foreach. This means that changing the variable (e.g. ‘arr’ in the below example) doesn’t affect the iteration. Of course changing the elements of ‘arr’ during iteration would have an effect.


The array accesses in a foreach do not perform bounds checking as the bounds of the array are inherently adhered to.


int[] arr = new int[10];


// get a local pointer to each element

foreach (int& element, i; arr)

    *element = i;


// get the value of each element, in reverse order

int value;

foreachr (value; arr)

    if (value == 100)

        break;


// can access ‘value’ here, which wouldn’t be possible if it were declared in the foreach



Foreach can also be used on 2D arrays, in which case it returns each row of the 2D array (as a 1D array) in turn. This means that using 2 nested foreach statements provides an efficient (no bounds checking, only multiply by stride once at start of each row) and concise way to iterate over an entire 2D array:


void ClearImage(byte4[,] image, byte4 clearColour)
{
    foreach (byte4[] row; image)

        foreach (byte4& pixel; row)

            *pixel = clearColour;
}



The reverse iterator foreachr can be used in the above example to iterate the image from bottom to top and/or right to left. It also provides a simple way to iterate an array list while removing items from it.
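For example, removing dead entities from a list while iterating it; because foreachr walks backwards, a removal doesn’t disturb the indices of the elements still to be visited. EntityList, entities and RemoveAt are hypothetical, assuming a type with an indexer and Length property as described above:

foreachr (Entity* e, i; entities)

    if (e.Dead)

        entities.RemoveAt(i);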



Using
The using statement is just convenient shorthand for guaranteeing correct freeing of allocated resources. It’s recommended that you use ‘using’ wherever possible as it’s much more concise and readable than the code it expands to.


using (ResourceType* resource = new ResourceType())
{
    // user code
}


This implicitly expands to the below code:



{
    ResourceType* resource;

    try
    {
        resource = new ResourceType();

        // user code
    }
    finally
    {
        delete resource;
    }
}




Note that the expanded code functions correctly for value types as well as reference types, with the delete just invoking the destructor for values.
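For instance, wrapping a value whose destructor releases a resource (FileHandle is a hypothetical struct whose destructor closes the underlying handle):

using (FileHandle f = FileHandle("data.bin"))
{
    // read from f
}   // delete f runs here, invoking the destructor; no memory is freed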



Parallel

All parallel execution in C-UP occurs inside the parallel statement. A parallel function invoked inside the block of a parallel statement is just queued up and executed when the parallel statement exits. A simple example:


parallel void ProcessSomething() { }     // note. parallel function


void Test()
{
    parallel         // parallel statement
    {
        for (int i = 0; i < 1000; i++)

            ProcessSomething();   // parallel function in parallel statement is queued

    }  // all ProcessSomething are run in parallel here (with auto dependency checking)
}



The parallel statement can optionally accept a single integer argument which gives the number of iterations. Using this causes all the queued parallel functions to be called multiple times but avoids much of the work involved in initially queueing those functions making it more efficient than just putting a loop inside the parallel statement.


parallel void ProcessSomethingA() { }

parallel void ProcessSomethingB() { }


void Test2()
{
    parallel (10)
    {
        ProcessSomethingA();

        ProcessSomethingB();
    }
}

The above will call A, B, A, B, A, B, … with each being called 10 times in total.


For more information on parallel functions see the Parallelism chapter of this document.


Expressions


Arithmetic

Arithmetic operations all happen at a minimum width of 32 bits.


When an integer value less than 32 bits wide is loaded from memory it is sign or zero extended to 32 or 64 bits wide, whichever is natural for the architecture. Sign extension is used for signed values and zero extension for unsigned. When a floating point value less than 32 bits wide is loaded, it is converted to a 32 bit wide float.


When an integer value less than 32 bits wide is stored, it is just truncated with only the low bits being stored. When a float is stored as a half it is converted to the half format.


Components of integer vector types are not expanded to 32 bits on load, they retain their natural width. However, the components of a half float vector are expanded to full floats on load, and shrunk again on store.


Shifting right always brings in zeros at the high bit for unsigned values and duplicates the high bit for signed values. Because of the requirement that integers are represented in two’s complement form, this means it is always safe to use a right shift to perform integer division by powers of 2.


Floating point remainder is defined as: a = x - Floor(x / y) * y.


[Implementation note. On x86 all floating point math is implemented using SSE3 instructions. Processors not supporting this instruction set are themselves not supported by C-UP on x86.

There is also built in support for SSSE3 and SSE4.1, which is more efficient in certain cases and is used automatically by the runtime if detected.]


Vector arithmetic and comparison is always component-wise. Operations across the vector other than swizzling (e.g. dot product) are implemented as intrinsic functions. Vector types have some limitations and quirks though:


-          Vector comparisons result in a vector of the same type as the source operands, where each component is set to all 1 bits if the comparison was true, or all 0 bits if not. These result vectors are then typically used as a mask for a subsequent operation (e.g. using the Sel intrinsic). However, they can also be converted to a single result using the Any and All intrinsic functions.


-          Division is not supported on integer vectors. Hardware support for such an operation is not common and simulating it would be very slow. As the whole point of vector processing is speed, it’s best to avoid dividing integer vectors.


-          You can only shift by a single value. I.e. you can’t shift the individual components of a vector by different amounts, so there is no shift vector by vector operator.
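The comparison-mask style described in the first point looks like the following sketch of a component-wise maximum. The operand order of the Sel intrinsic (mask, value-if-set, value-if-clear) is an assumption:

int4 a, b;

int4 mask = a > b;            // all 1 bits in components where a > b, else all 0 bits

int4 max = Sel(mask, a, b);   // per component: a where the mask is set, otherwise b

if (Any(mask)) { }            // collapse the mask: true if any component compared true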



Multiplication

Multiplication inherently produces a result twice as wide as the source operands. This means there are 3 possible ways of interpreting the result of a multiplication all of which are useful in different situations:


1)      The low half of the result (this is the normal mode of operation for multiplication.)

2)      The high half of the result.

3)      The full result, which is double the width of the source operands.


Instead of adding new operators for cases 2 and 3, which would be messy, you tell the compiler which result you want by using existing operators in conjunction with the standard multiply operator.


High result


To get the high word of a result, the multiply must be enclosed in parentheses and immediately followed by a right shift by a constant greater than or equal to the number of bits in the source operands. E.g. to multiply 32 bit integers and get the high result:


int a, b, c;

a = (b * c) >> 32;   // get high half as an int


a = (b * c) >> 33;   // this will result in the high half of the result shifted right 1 bit


a = (b * c) >> 31;   // this will result in the low half of the result shifted right 31 bits


Because all integers are promoted to at least 32 bits wide when loading, the equivalent of the above code using shorts or bytes will also produce the expected results.
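For instance, with 16-bit operands the same pattern extracts the high half of the 32-bit product (a shift of at least 16, the width of the source operands):

short s1 = 30000, s2 = 20000;

int hi = (s1 * s2) >> 16;    // high 16 bits of the 32 bit product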


Full result


To get the full width of a result, the multiply must be enclosed in parentheses and immediately preceded by a cast to a type twice the width of the source operands. The sign of the cast-to type does not affect the result:


long d = cast(long)(b * c); // this is a 32 bit multiply, but keeping the full result

ulong e = cast(ulong)(b * c);     // this does a signed multiply, then treats as unsigned


d = cast(long)b * c;       // this is a full 64 bit multiply


Note that the third example gives the same result as the first two, but is probably much less efficient. This is because that example promotes one of the source operands to full 64-bit width, which causes the other operand to also be promoted and a full 64-bit wide multiply to occur. In the first two cases even though a 64-bit result was produced, the processor only had to do a 32-bit multiply and even on 64-bit architectures this can still be significantly quicker.




All of the above multiplication behaviour also extends to integer vectors but as there is no support for vectors of long/ulong, the full width variant isn’t available for vectors of int/uint.



Saturating Arithmetic


Normal integer addition and subtraction just wrap around if the result is too large – this is known as overflow. When performing image or audio data processing, this gives grotesque results and it’s much better if an overflow is clamped to the minimum or maximum value supported by the type in question. This is what saturating arithmetic does, and in C-UP it’s accessed through 3 new operators:


|+| is a saturating addition (|+|= can also be used).

|-| is a saturating subtraction (|-|= can also be used).

scast(type-name) is a saturating narrowing conversion to the given type.


Saturating addition and subtraction are supported for all integer types, and for vectors of 8 and 16 bit integers. 32 bit integer vectors are not currently supported.


Saturating casts are supported for all integer and integer vector types when converting to a narrower type.
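A small sketch mixing the operators, assuming saturation occurs at the width of the operand or destination type:

byte a = 200, b = 100;

byte sum = a |+| b;              // 300 clamps to 255 rather than wrapping to 44

int wide = 70000;

short narrow = scast(short)wide; // 70000 clamps to 32767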


Operator Precedence


Operator precedence differs somewhat from C. It has been flattened - specifically the bitwise operators don’t have a level each and all relational operators have equal precedence. Shifts have moved to an equal footing with multiply and divide as I can never remember their precedence in C and multiplication or division are frequently what they’re used to achieve. && and || have retained a level each as a survey of peers indicated that several people do rely upon this behaviour.


1.       Primary

. (module scope prefix), new, delete

2.       Postfix

[] array access, () function call, . member access, ++ (post), -- (post)

3.       Unary

cast(), scast(), as(), +, -, !, ~, ++ (pre), -- (pre), &, *

4.       Multiplicative + shifts:

*, /, %, <<, >>

5.       Additive:

+, -, |+|, |-|

6.       Bitwise:

&, |, ^

7.       Relational

==, !=, <, <=, >, >=

8.       &&

9.       ||

10.   ? :

11.   Assignment

=, +=, -=, |+|=, |-|=, *=, /=, %=, <<=, >>=, &=, |=, ^=
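One consequence of the flattened table is that shifts bind as tightly as multiplication, unlike in C:

int a = 1 << 2 + 3;     // parses as (1 << 2) + 3 = 7; in C it would be 1 << (2 + 3) = 32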





Increment / decrement


The behaviour of these operators in a complex expression is a source of confusion in C. What is the value of y at the end of the following code?


int x = 2;

int y = x++ + x++ + x++;


The answer is that it’s undefined. The postfix ++ operator need only happen somewhere after the value of x is taken but before the end of the expression. It’s undefined where it happens with respect to other ++ operators inside the same expression. In Microsoft’s Visual C++ compiler, 3 is added to x at the very end of the expression, giving a value for y of 6:


                y = 2 + 2 + 2; x += 3;


Even worse, enabling or disabling compiler optimisations can lead you to get different results for the same program!


In C-UP this behaviour is properly defined: the operations occur in the order they are written. In the above expression this gives y the value (2) + (2 + 1) + (2 + 1 + 1) = 9:


                y = 2; x += 1; y += 3; x += 1; y += 4; x += 1;


It’s still recommended that you avoid writing expressions like the above though, unless you’re entering a competition or something.



New / delete / local


New is used to allocate memory and invoke constructors.


Delete is used to free memory and invoke destructors.


Local in this context is used to perform local allocations (i.e. memory allocations on the stack) which are freed when the allocating function returns.


New and delete look similar to C++:


int* p = new int;

delete p;


MyClass* p2 = new MyClass(1, 2, 3);

delete p2;


byte[] arr = new byte[100];

delete arr;                              // no need for special array delete[]


MyClass[] arr2 = new MyClass[10];        // no automatic construction on arrays

foreach (MyClass& c; arr2) c = MyClass();       // you have to do it manually if you want it

delete arr2;


Delete has a couple of significant differences from C++:

-          You don’t use delete[] when deleting an array

-          If the pointer you delete is an l-value it is automatically nulled by calling delete


To allocate memory from a specific heap, pass the heap identifier or a variable of type heapid after new:


int* p = new heap(MyHeapIndex) int;


To allocate memory with a specific alignment, pass an align directive. Alignments up to 4096 are supported. If the alignment passed is lower than the default alignment for this type then the default is used.


int* align(256) palign = new align(256) int;

int* p = new align(256) int;             // pointless alignment, will be lost


Note that the alignment of allocations is only stored in the pointers the allocation is assigned to, so although you can assign to a pointer with no alignment constraint (as in the second line above) the fact that this allocation is aligned is immediately lost. Although you can use an explicit cast to retrieve it, the garbage collector won’t know about it, so when a collection occurs it’s extremely likely that the data will end up misaligned afterwards.


The local keyword is used after new to dynamically allocate memory on the stack.


int& localInt = new local int;

int[] local localArr = new local int[100];


For types that don’t contain references the zeroing of stack memory on a local new can be disabled by placing = void after the new expression.


int& localInt = new local int = void;

int[] local localArr = new local int[100] = void;


To construct a value on the stack just use constructor syntax but without the ‘new’ keyword. Use of this syntax is guaranteed not to perform copying – the construction will occur directly on the object on the left of the assignment. If you use this syntax anywhere other than the right hand side of an assignment then a temporary variable is created and copying might occur.


MyClass inst = MyClass(x, y, z);

MyGenClass<int> genInst = MyGenClass<int>(1, 2, 3);



Initialisation
Initialisation of newly allocated memory occurs in three phases.


First, all memory returned by the memory manager is zeroed. In the supplied memory managers it isn’t zeroed when you ask for it but rather when memory is freed. This is more efficient because memory that is de-committed back to the OS can be zeroed by the OS on a separate thread before it is returned to you again. When you perform a stack allocation using “new local” the stack memory is zeroed before it is returned (unless = void is used after the new expression.)


Second, if you are allocating a new class then all member variable initialisers of the allocated type are executed. Initialisers allow simple initialisation that cannot be bypassed even by constructors. They can contain arbitrarily complex expressions, but not statements. If you allocate an array of objects, initialisers are not run on the elements of that array – this happens when each individual element is constructed in place.


class MyClass
{
    float Var1 = 10.0f;

    float Var2 = Var1 * 2.5f;

    int Var3 = 10;

    float[] Var4 = new float[Var3];
}



Third, the appropriate overloaded constructor is executed. As with initialisers, if you’re allocating an array of objects then no constructor is executed on the elements of the array. To initialise and construct elements of an array you can iterate the elements after allocation performing in place construction (or this can be done as and when you need to start using a particular element). Alternatively, you can use an array initialiser (see next section).


Using foreach allows you to initialise the elements simply:


class Blah
{
    this(int x, int y, int z)
    {
    }
}


Blah[] someBlahs = new Blah[10];

foreach (Blah* blah; someBlahs)

    *blah = Blah(1, 2, 3);        // guaranteed NOT to do a copy


Note that assigning a newly constructed value (as in the above example) is guaranteed to execute in place on the location being assigned to, so there is no performance penalty incurred by copying.


Array Initialisation


Dynamic arrays can be given initial values for all elements at the time they are created. This can only be done if the number of elements being allocated is a constant value, and the number of initialisers must exactly match the number of elements allocated. When you initialise a locally allocated array this way the memory isn’t zeroed first as it’s known that all elements of the array will be initialised.


uint[] arr = new uint[5] {11, 22, 33, 44, 55};

uint[,] local arr2d = new local uint[3, 2] {{10, 9}, {8, 7}, {6, 5}};


Fixed size arrays can also have their elements initialised when they are declared:


uint[5] arr = {11, 22, 33, 44, 55};

uint[3,2] arr2d = {{10, 9}, {8, 7}, {6, 5}};





There are no automatic variables in C-UP. This means that no destructors are ever executed unless you specifically invoke them using the delete keyword. If you want to simulate automatic local variables, you should do so with the ‘using’ statement.


If you call delete on a pointer to a type for which no ‘delete’ function can be found (see Allocation Functions) then no memory is freed, but the destructor is still executed. For this reason it is legitimate to see delete being used even with garbage collected data - it’s equivalent to calling Dispose in C#.


Of course it’s also possible to delete a value in order to invoke its destructor. E.g.


class TestClass
{
    public this()
    {
        Console.WriteLine("TestClass ctor");
    }

    public ~this()
    {
        Console.WriteLine("TestClass dtor");
    }
}


TestClass aTestClass;

delete aTestClass;


To invoke destructors on array elements, you must iterate the elements of the array yourself calling delete on each one in turn.


TestClass[10] testClasses;


// construct all array elements

foreach (TestClass& c; testClasses)

    *c = TestClass();


// destruct all array elements (note. destroying in reverse order using foreachr)

foreachr (TestClass& c; testClasses)

    delete c;



There are no finalisers in the C-UP garbage collector for several reasons:

1)      They’re a big burden on the garbage collector, making the system significantly less efficient

2)      They’re confusing to many people

3)      They’re of almost no value when you consider that they are only guaranteed to be run before program termination and at termination the OS will release all resources back to the system anyway. If you are using a limited resource (e.g. file handle) you should really be doing so with the ‘using’ statement whenever possible.


Allocation Functions


The default memory manager in C-UP is a garbage collecting memory manager, but the use of this memory manager isn’t actually built into the language at all.


Whenever a new or delete expression is seen a search for a special function called ‘new’ or ‘delete’ proceeds from the scope of the type being created or deleted. If an array or pointer is being created, the search is from the scope of the element type or reference type being created.


New functions have the following limitations:

-          They must return a new 1d long dynamic array of bytes: “byte[long] new”. This is implicit so it is not permitted to explicitly specify it.

-          The first parameter must be a long specifying the size of the allocation in bytes.

-          The second parameter must be an int specifying the alignment of the allocation in bytes.

-          An optional third parameter gives the id of the heap to allocate from. This is passed if the user specifies a heap directly, as in “var x = new heap(Foo) TypeX;”, or if the type being created is constrained to only exist in a particular heap.

-          They are always implicitly static.


new(long size, int align);

new(long size, int align, heapid heapId);


In addition, any number of additional user defined parameters can then be accepted. These are passed by the user like so:


class Banana
{
    new(long size, int align, string allocTag);
}


var fruit = new("A tag") Banana();





Delete functions have two valid signatures. The version with an explicit heap id is only called when deleting a value known to be in a particular heap through heap constraints on the reference or type being deleted. Both versions take two long parameters giving the base address and size of the memory being freed. They implicitly return void, and this must not be explicitly specified.


delete(long address, long size) const

delete(long address, long size, heapid heapId) const


There are two reasons that delete uses an integer for the address, rather than a pointer or array reference:


1)      A pointer or array reference is no use to you in these circumstances in C-UP. All memory allocation is based on arrays, and an integer is what you need for indexing arrays. Using the fact that you can convert a dynamic array to a long integer, you can work out the offset into the storage for a heap using: address - cast(long)Memory.


2)      C-UP actually makes manual memory management as safe as garbage collection. It does this by using the GC facilities to check if there are any live references to the memory being freed (in debug builds). If the address being freed was passed to delete as a reference, then it would always find itself as a live reference. See the sections on the delete expression and memory heaps for more information on how C-UP makes manual memory management safe.


The default GC new and delete functions are found in the Default module, which is implicitly imported into every module but only after all user imports have occurred. They pretty much just call the Allocate and Free functions of the appropriate heap. See the section Managing Heaps to find out about creating and managing multiple memory heaps.
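
As a rough sketch of what this looks like (the heap access functions named here are illustrative assumptions, not the actual Default module source):

```
// hypothetical sketch of the Default module's allocation functions
new(long size, int align)
{
    // assumed accessor for the default GC heap
    return Memory.DefaultHeap.Allocate(size, align);
}

new(long size, int align, heapid heapId)
{
    // assumed lookup of a specific heap by id
    return Memory.GetHeap(heapId).Allocate(size, align);
}

delete(long address, long size) const
{
    Memory.DefaultHeap.Free(address, size);
}
```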


New memory references


The description of the ‘new’ function specifies this return type.


byte[long] new


New in this case is a special qualifier that can be applied to pointers or dynamic reference types, both local and non-local. E.g.


float[] local new someNewFloats;

void& new aNewLocalVoidPtr;

int* new aNewIntPtr;


New memory can only originate from a ‘new’ function. This qualifier can only be used on local variables, function parameters and return types. It cannot be used on the parameters of a parallel function, and therein lies a clue to the reason for its existence, which is to allow memory allocations in parallel code.


Within parallel code you can allocate memory via a parallel new function (see Parallelism > Shared memory > Heaps), however this isn’t much use if the returned memory is a byte[long], because, as you’ll discover, non-local references can’t be dereferenced inside parallel code.

Therefore the idea of new memory was introduced which is memory that is known to be newly allocated and so is safe for use inside the parallel code that performed the allocation, even though it’s not local. New non-local memory references can be implicitly converted to local memory references (either new or not.)
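
A minimal sketch of the idea, assuming a parallel ‘new’ function is available for the allocation (see Parallelism > Shared memory > Heaps); the function and variable names are illustrative only:

```
parallel void BuildResults(int[] local output)
{
    // memory allocated inside parallel code comes back as 'new' memory
    byte[long] new raw = new byte[1024];

    // a new non-local reference implicitly converts to a local one,
    // so it can safely be dereferenced inside this parallel function
    byte[] local buffer = raw;
    buffer[0] = 1;
}
```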





Cast


The cast operator performs type conversion. Simple conversions between built-in value types are not checked in any way, although only casts that make sense are allowed – the compiler will tell you if it thinks you’re not making sense. Integer truncation is not checked for.


int y = 100000;

short x = cast(short)y;                  // truncates, no overflow exception


When widening an integer type and converting between signed and unsigned at the same time, the widening operation always happens first and the signed/unsigned conversion is a no-op. This means that returning -1 from a generic function will return -1 if the generic type in question is a signed integer, or the maximum value for the type if the generic type is an unsigned integer, which makes writing generic code for different array types possible.
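
For example, widening a negative short and reinterpreting it as unsigned:

```
short s = -1;

// widen first (sign extends to -1 as a 64-bit value), then the
// signed/unsigned reinterpretation is a no-op on the bits
ulong u = cast(ulong)s;    // maximum ulong value (all bits set)
long  l = cast(long)s;     // -1
```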




Implicit conversion from a scalar type to a vector type is possible if the scalar type converts to the component type of the vector. The scalar value is replicated to all components of the vector.


Implicit conversion between vector types is possible if the component types implicitly convert.


However no conversion is possible to a vector type of higher dimension than the source. In this case swizzle syntax must be used to explicitly state what components to use.
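
A short illustration of these rules (this assumes int converts implicitly to float, and uses the vector construction and swizzle syntax described in the Vectors section):

```
float  s  = 2.0f;
float4 v  = s;          // scalar replicated to all four components
int2   i2 = int2(1, 2);
float2 f2 = i2;         // component types convert, so int2 converts to float2
float4 f4 = f2.xyxy;    // widening requires an explicit swizzle
```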




Cast is also used to perform casts between reference types. When casting down a class hierarchy (dynamic down cast) the result is null if the conversion fails. Dynamic down casts are extremely fast due to the single inheritance graph and the fact that the runtime type is embedded in the pointer itself. They just have to test if the runtime type index falls into some statically known range which can typically be done without even branching.


class BaseClass { }

class DerivedClass : BaseClass { }

BaseClass* ptr = new BaseClass();

BaseClass* ptr2 = new DerivedClass();

cast(DerivedClass*)ptr;           // gives null

cast(DerivedClass*)ptr2;          // gives ptr2 as a DerivedClass


You can cast a pointer to a long integer to extract the pointer part in order to compare pointers or perform arithmetic. This differs from using the as operator because it specifically removes the runtime type part from the pointer for you, whereas as gives you the combined pointer and runtime type.




An array can be explicitly converted as long as the element type converts. See the sections on 1d and 2d array types for ways to convert between those.


Byte arrays


Byte arrays have especially liberal casting rules as they are the enabling mechanism for serialising data in and out of storage devices, and for performing memory copy and fill operations:


-          Any pointer to a value type (including classes that only contain value types) can be implicitly cast to a byte array. The resulting byte array will have the highest alignment possible for the type in question. E.g. casting a pointer to a float4 to a byte array will give you the type: byte[] align(16).


-          Any byte array can be explicitly cast to a pointer to a value type (including classes that only contain value types). This conversion is subject to a runtime size and alignment check.


-          Any array of value types can be implicitly converted to a byte array. The resulting byte array has the highest alignment possible for the element type.


-          Any byte array can be explicitly converted to an array of any value type. This conversion is subject to a runtime size and alignment check.
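
Putting those four rules together, a sketch:

```
float4* pf = new float4;
byte[] align(16) bytes = pf;        // implicit: pointer -> byte array, highest alignment
float4* back = cast(float4*)bytes;  // explicit: runtime size and alignment check

int[] ints = new int[4];
byte[] raw = ints;                  // implicit: value array -> byte array
int[] ints2 = cast(int[])raw;       // explicit: runtime size and alignment check
```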



Overload resolution


The cast operator can also be used to disambiguate function call overloading on return type. When cast is used immediately before a function call it influences overloaded function selection by preferring a function returning the given type. This mechanism is provided as a fallback for when automatic selection isn’t giving the results you want – using it is usually unnecessary as overloading works on the parameter types, the type of a variable being assigned to, or the type being returned.


int TestFunc();

float TestFunc();


float MyFunc()
{
    TestFunc();            // error, ambiguous
    int x = TestFunc();    // ok, inferred from assignment
    cast(float)TestFunc(); // ok, explicitly told which one to use
    return TestFunc();     // ok, inferred because we must be returning a float
}






The as operator is a blunt instrument for treating data as another type without actually performing any bit conversions on it. Its primary purpose is to avoid having to go via memory using a union to move data between different register files. For example, to get the value of positive infinity into a floating point variable:


int PosInfHex = 0x7f800000;

float PosInf = as(float)PosInfHex;


On x86 the above example just moves the value from the integer to SSE register file without conversion.


You can also use it to get the value of a pointer as an integer, with the runtime type intact in the top 16 bits:


void* myPointer = new float4;

ulong ptrValue = as(ulong)myPointer;     // top 16 bits are runtime type


There are a couple of things as won’t let you do:

-          Convert to a reference type

-          Convert to a wider type as this would mean having undefined bits




The sizeof operator returns the size in bytes of a given type or variable.


uint sz = sizeof(int);            // = 4

uint sz2 = sizeof(sz);            // = 4




The alignof operator returns the minimum alignment in bytes of a given type or variable


uint a = alignof(double);         // = 8

uint aa = alignof(a);                    // = 4




The symbolof operator returns the System.Reflection.Symbol for any symbol passed to it. System.Reflection.Symbol is an entry in the program symbol table, which is the primary mechanism by which reflection is achieved (see the Reflection section).


System.Reflection.Symbol sym = symbolof(SomeSymbol);




The typeof operator returns the System.Reflection.Type of the symbol passed to it.


The type of a type is itself.


The type of a variable is its type. If the variable is a pointer type then typeof returns the actual runtime-type currently referenced by the variable.


System.Reflection.Type type = typeof(SomeType);




To call a function in the direct base class of that containing the current function use the ‘base’ keyword:
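
For example (a sketch with illustrative class names, assuming the virtual member function syntax described in the Functions section):

```
class Animal
{
    public virtual void Speak() { Console.WriteLine("generic noise"); }
}

class Dog : Animal
{
    public virtual void Speak()
    {
        base.Speak();                // calls Animal.Speak
        Console.WriteLine("Woof");
    }
}
```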




To bypass the direct base class you can use static function call syntax and pass the ‘this’ parameter explicitly:
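
For example (a sketch; the static call spelling shown is an assumption based on the description above):

```
class A { public virtual void F() { } }

class B : A { public virtual void F() { } }

class C : B
{
    public virtual void F()
    {
        A.F(this);    // bypasses B.F and calls A.F directly
    }
}
```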





Protection levels


The supported protection levels are public, protected, private and internal. They can be applied to types, functions and variables.


Internal means public to everything in the same package, and can be combined with private and protected.


The default protection level for module scope functions and variables is public.

The default protection level for class scope functions and variables is private.

The default protection level for a type declared at any scope is public.



TODO: describe what the levels mean




In order to achieve maximum performance on modern hardware you need to utilise as many CPU cores in parallel as possible, and C-UP is the language that finally makes doing so a practical reality.


There are three primary mechanisms involved in achieving parallel execution in C-UP:


1)      Automatic dependency handling – when the program is asked to execute multiple functions in parallel, the runtime automatically checks that they don’t modify the same areas of memory, and only allows them to run in parallel if they don’t.


2)      Free access to constant data – the programmer can declare that certain types or memory heaps are to be treated as constant for the duration of a particular parallel task, and such memory can be freely accessed by the parallel code.


3)      Streams – streaming is a well-known paradigm for parallel execution, but using streaming to the exclusion of all else is extremely restrictive, so C-UP allows you to use streams only where they are appropriate.


Parallel functions


Parallel execution in C-UP is enabled through the use of parallel functions. A parallel function is declared in the same way as any other function but prefixed with the parallel qualifier.


parallel void Factorial(int n);


Parallel functions can only return void because by definition they don’t compute a value immediately when called but produce results at some later time. Results can only be written to memory locations explicitly declared by parameters. Any memory location accessible solely via local references is accessible in a parallel function.


parallel void Factorial(int& result, int n);


Local references can be followed to any depth in parallel code, but the practical depth is limited by the inherent restrictions on local references. Because local-only structs are allowed to contain local references as member variables they are ideal for implementing task objects.


struct IntegrateEuler local       // local-only struct can only be created on the stack
{
    float3[] local position;
    float3[] local velocity;
    const float3[] local acceleration;

    public parallel void IntegrateVelocity(float deltaTime)
    {
        foreach (float3& vel, i; velocity)
            *vel += acceleration[i] * deltaTime;
    }

    public parallel void IntegratePosition(float deltaTime)
    {
        foreach (float3& pos, i; position)
            *pos += velocity[i] * deltaTime;
    }

    public this(float3[] local p, float3[] local v, const float3[] local a)
    {
        position = p;
        velocity = v;
        acceleration = a;
    }
}

In the above example the position, velocity and acceleration data is accessible in the two parallel functions via the ‘this’ pointer (which is local for all member functions of a struct), and then via the local array references.


If a class declaration had been used instead of struct the data would not have been accessible through the non-local ‘this’ pointer, resulting in an error during compilation.


Furthermore, the use of a local-only struct was necessary because only then could the member array references be local. If the array references were not local the data in the arrays would not be accessible inside a parallel function.


Parallel execution

Parallel functions only execute in parallel when invoked from within a parallel statement. Outside of a parallel statement they execute sequentially like any other function. A call to a parallel function inside a parallel statement is not executed immediately – the parameters and address of the function are stored in a queue and execution returns to the caller immediately. When the parallel statement block exits, all queued functions are executed and execution only occurs in parallel where the system can prove that they don’t interfere with each other’s data. In other words, they are automatically dependency checked.


This is possible because of the restriction (explained above) that you can only mutate data via local references, so the system can easily determine the full scope of the data changeable by any function and consequently whether two functions require access to the same data. It’s important to note that two references to const data do not prevent parallel execution even if that data overlaps, but overlaps between const/mutable or mutable/mutable data do cause a dependency. In the above example, if two functions referenced the same acceleration data they could still potentially execute in parallel.


Functions are given the chance to execute in the order they were queued, but if one is not able to execute due to having a dependency on an already executing function, then that function is parked and subsequent functions are checked for possible immediate execution. A parked function remembers which function blocked it and from that point on need only wait for the other function to complete in order to proceed. That is, no further checking for memory interference with that other function is required although of course it still needs to check against other executing functions in full, with the possibility that it will have to be parked again.


In the following example we have a million velocities and positions to integrate and we would like to integrate them in parallel in blocks of 10,000.



float3[] Positions = new float3[1000000];

float3[] Velocities = new float3[1000000];

float3[] Accelerations = new float3[1000000];


void IntegrateAll(float deltaTime)
{
    parallel
    {
        for (int i = 0; i < 1000000; i += 10000)
        {
            IntegrateEuler& job = new local IntegrateEuler(
                Positions[i..i+10000], Velocities[i..i+10000], Accelerations[i..i+10000]);

            job.IntegrateVelocity(deltaTime);    // queued, not executed immediately
            job.IntegratePosition(deltaTime);
        }
    }  // no integration occurs until here
}



Of course, all this dependency checking consumes processor cycles, so it’s only worth doing if the number of cycles we save by executing in parallel exceeds the number we spend on dependency checking. There’s no formula for working out whether you will gain or lose in any given situation, but you should be looking to split the work up into fairly large chunks per parallel call. In the above example, starting 1,000,000 parallel functions (one per element) would certainly execute significantly more slowly than executing them serially. Furthermore, the queue that stores parallel function invocations would almost certainly overflow if you made that many calls at once.


Nested Parallel Execution


You can call a parallel function from inside another parallel function but the inner parallel function can only be passed data that is accessible to the outer parallel function.


In spite of this limitation, calling nested parallel functions can still be a useful way to split up work in cases where the actual process of dividing the work is not straightforward and can benefit from parallelisation itself.


Due to the way memory access is restricted, nested parallel functions only need to perform dependency checking against their siblings, meaning other functions queued for parallel execution inside the same parallel block. A parent parallel function cannot be considered to have finished executing until all nested parallel functions have also completed.


Constant data


Explicitly declaring all data that a parallel function accesses is a powerful technique - it not only allows functions to safely execute in parallel but by declaring the data we need in advance, we open up the possibility that the data can be pre-fetched into memory close to the processor core that will operate on it. In a conventional architecture this might mean fetching it into the cache, but it could also mean moving it to the local memory of a core in a NUMA machine.


Having said that, explicit declaration can also be limiting and tiresome in practice, so extra flexibility is allowed for data you only need read access to: you can declare that any number of types and/or memory heaps are to be considered constant in a particular parallel function.


To declare a set of constant types and heaps you use a parallel set declaration.


parallel PConstantSet : const heap(MyMemoryHeap, AnotherHeap), const type(Texture, Mesh);


You can also inherit one or more base parallel sets and/or add additional constant heaps and types to those of an existing set.


parallel P2ndSet : PConstantSet, PAnotherImaginarySet, const heap(YetAnotherHeap);


A function which is constrained to a particular parallel set can freely read any data that can be statically proved to be in a const constrained heap, or is of a const constrained type.
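
For example, given the PConstantSet declared above, a function constrained to it can read Texture and Mesh data freely (the VertexCount member used here is a hypothetical example):

```
parallel void CountVertices(int& total, Mesh*[] local meshes) : PConstantSet
{
    foreach (Mesh* m; meshes)
        *total += m.VertexCount;   // ok: Mesh is a const constrained type
}
```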


An override of a virtual parallel function must use exactly the same parallel set as the original virtual function.


The use of a name starting with P for a parallel set is just a convention that has been adopted by the standard libraries.


Where a const type constraint conflicts with an explicit writable data declaration (via a local reference parameter) an error is given. So if you say that type ‘int’ is constant and then try to pass a mutable reference to a structure containing an ‘int’ you will get an error. This issue is mitigated by the use of typedefs.


Note that const type constraints do not apply to ‘new’ memory, so any memory you allocate locally on the stack within your parallel code can be accessed using the normal const rules.




Typedef allows you to declare an alias for any simple value type (built-in type, enum, bit struct) and allows that alias to be declared constant for the purposes of parallel execution. This is currently the only intended use of typedefs, and to make using them as painless as possible they are weakly typed, so they implicitly convert to and from their underlying type. Furthermore, references to typedefs also implicitly convert to/from references to the underlying type, except in parallel code where no such conversion is possible.


To declare a typedef type, use the typedef keyword followed by a type then an identifier:


typedef int MyParallelInt;

typedef byte4 Colour;


For your convenience generically named parallel built-in types are defined for all built-in types and have the same name as the underlying type but start with a capital letter:


typedef byte Byte;

typedef float Float;

typedef double1 Double1;



If you need finer grained separation or you just prefer more meaningful names then of course you can declare your own types.


For the purposes of dynamic typing, typedefs behave as if they are derived from their underlying type. That is, a function taking a virtual pointer to an integer type will be matched by a pointer to a typedef based on that integer type. Also, dynamically down casting a pointer to a typedef type to a pointer to its underlying type will succeed. These rules exist to allow the creation of functions that deal with typedef types en masse rather than having to deal with every individual typedef type declared in a program, which would be at best incredibly cumbersome and at worst impossible.




Because strings are constant, any string (local or not) that is reachable via local references can be freely accessed in parallel code. Furthermore, local to non-local conversions on strings can happen in parallel code.


If memory that is aliased to the string data is accessible in another parallel function, a dependency is correctly generated between the parallel functions.


You cannot cast a mutable character array to a string inside parallel code.




All static variables are accessible from parallel functions, but they are treated as const.




Type aliasing means it’s not possible to guarantee perfection when using const type constraints.


C-UP allows you to convert between references to different value types so you can, for example, cast an array of ints to an array of bytes. If you then declare that the int type is constant the compiler will still allow you to modify the array of bytes as it has no way of knowing that the byte array aliases an array of a constant type.


Here’s what that might look like:


parallel PSet : const type(int);


void Func(byte[] local b) : PSet
{
    b[0] = 0; // this just wrote to array arri (below), which is supposedly constant
}


void Test()
{
    int[] arri = new int[10];
    byte[] arrb = cast(byte[])arri;
    Func(arrb);
}

Without compromising the flexibility of the language to an unacceptable degree, it’s difficult to see how this problem can be fixed.



Const constrained functions


Any function can have a parallel set applied to it (not only parallel functions) by following the function declaration with the name of a single parallel set after a colon.


int MyFunction(int a, int b) : MyConstantSet
{
    return a + b;
}


Notice that the above function is not parallel and so is allowed to return a value, but will still only be allowed to access data through the given constant constraints, or local data passed to it. This is important because a parallel function can only call another function if that other function has a parallel constant set and that set is a sub-set of the calling function’s set.


An empty parallel set can be assigned to a function which makes it callable from parallel code but with the limitation that it can only access data through local references passed in as parameters. This is most useful for simple helper functions, accessors and operators. It’s also what allows you to call the intrinsic functions built in to the language from parallel code.


int SumInts(int[] local someInts) : parallel
{
    int total = 0;
    foreach (int i; someInts) total += i;
    return total;
}



Parallel variables


A variable with the parallel keyword before it is implicitly static. Outside of parallel code it can be manipulated like any other static variable, except its address may not be taken. Inside parallel code it can be read freely, but can only be updated using the intrinsic atomic update functions, which are inherently thread-safe.


Parallel variables can only be of int or uint type as atomic update is a requirement. They are useful for keeping track of the amount of work done in total so far by a group of parallel functions.
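
A sketch of the idea; the atomic intrinsic name used here is a stand-in for whichever of the intrinsic atomic update functions applies (see the Intrinsic section):

```
parallel uint WorkDone;    // implicitly static; int/uint only

parallel void ProcessChunk(int[] local items)
{
    // ... do the work on items ...

    // inside parallel code the variable may only be updated atomically;
    // 'AtomicAdd' is an assumed name, not necessarily the real intrinsic
    AtomicAdd(WorkDone, 1);
}
```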



Shared memory


The shared keyword allows you to access the same area of mutable memory from multiple jobs simultaneously. This means it’s inherently not thread-safe so it’s recommended that you avoid using it directly and instead rely upon the library support in the System.SharedMemory module.


Because of the inherent danger any use of the shared keyword is prohibited by the compiler unless a specific ‘allow shared’ option is passed in (see compiler section). Requiring this option allows programming managers to prohibit the use of this unsafe feature in their projects, except for in specific thoroughly tested libraries.


The shared keyword can be placed before any variable or parameter that is a pointer or dynamic array type. It can also be placed directly after the header of a function to make the ‘this’ pointer shared. Lastly, it can be used directly after the identifier in a class or struct declaration to make all references to that type shared, without having to place the shared keyword before each reference, because having to do that would make the ‘allow shared’ compiler option essentially useless.


Shared prevents the runtime job system from making a dependency between shared references to the same area of memory. The collections in System.SharedMemory use this feature to allow multiple jobs to read or write to the same memory simultaneously. A couple of important points:


-          Shared also allows you to dereference a non-local pointer or dynamic array inside parallel code. Normally this is only possible for local references or const constrained types.


-          Shared/shared aliasing does not create a dependency but shared/const and shared/mutable both do.


-          When accessing shared memory you must be sure to use the atomic and memory barrier intrinsic functions appropriately.


-          Because C-UP doesn’t expose any kind of thread functionality you can only really make lock-free data structures (or spin locks). This is in keeping with the C-UP philosophy of using all machine resources for as short a time as possible to finish a task. The last thing you want in such a system is to be yielding threads back to the OS.




A simple linear parallel heap is provided in the System.SharedMemory module. There are two variants of the ParallelLinearHeap struct. The first uses a constant alignment for the allocations supplied as a generic argument and the second allows for variable alignments. The reason two separate implementations are provided is that an allocation from a constant alignment heap is slightly more efficient.


In order to use these heaps you usually need to provide a custom ‘new’ function for the types you plan to allocate in these heaps, which accepts a local reference to the heap to allocate from:


It’s your responsibility to ensure that the memory passed to these heaps is zeroed. Failure to do so will probably cause memory corruption crashes that are very hard to find.



class MyClass
{
    // note – must be parallel
    new(long size, int align, ParallelLinearHeap& heap) : parallel
    {
        return heap.Allocate(size, align);
    }
}


var heap = ParallelLinearHeap(1024);


parallel void Test()
{
    MyClass* inst = new(&heap) MyClass();       // pass reference to heap to new
}





The ParallelList type has all the same functionality as a standard list in non-parallel code. Inside parallel code it also allows you to add items to the end of the list or read items in order from a read cursor that is maintained inside the list.




Parallel streams are implemented in the System.SharedMemory module. There are currently two stream types: one can be read in parallel by multiple jobs, and the other can be written in parallel.


Both types are implemented as local-only wrapper structs around an array. When you construct one of the stream types you must pass in the underlying storage array to be used. For some stream types this array might be primed with data, or it could just be an uninitialized buffer. Either way, the stream buffer cannot be resized during stream processing.


All stream reads and writes must be contained within an OpenRead/CloseRead or OpenWrite/CloseWrite pair as appropriate. All of these functions should be called from inside the parallel function that is performing stream processing.
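
As a sketch of the shape this takes (the stream type and its Read signature are assumptions for illustration; see System.SharedMemory for the real API):

```
parallel void Consume(ReadStream& input)
{
    input.OpenRead();

    int item;
    while (input.Read(&item))    // each item can be read exactly once
    {
        // ... process item ...
    }

    input.CloseRead();
}
```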




This is used to pass an array of data into a parallel task. Each item on the stream can be read once and when all data is exhausted further attempts at reading will return failure.



This is used to pass a stream of data out of a parallel task. Once parallel processing is complete the underlying array can be accessed by non-parallel code to interpret the results.


Debugging / Profiling


The C-UP runtime ensures that your code executes correctly by automatically avoiding memory aliasing. It’s still important to check that you’re actually getting a worthwhile speed up from executing in parallel and a couple of tools are provided to help with this.


Firstly, there are functions in the System.Profile.JobQueue class that allow you to analyse the last executed parallel block. The simplest way is to call WriteProfileData passing in a text stream to write the profiling data to. This can be to the console output stream or to a file by using a System.TextStream.StreamTextStream wrapping a System.Stream.FileStream. Here’s how to do that:


using (FileStream* fs = new FileStream(@"C:\Profile.txt", FileMode.Create, FileAccess.Write))

    using (StreamTextStream ts = System.TextStream.StreamTextStream(fs))

        JobQueue.WriteProfileData(&ts);    // write profiling data for the last parallel block




Textual results can be hard to interpret so the CupProfileViewer.exe application is provided that shows the results visually.



Each large green strip represents a thread. Time in milliseconds is marked at the top and bottom.


Green rectangles are user code executing. The lighter the shade of green, the deeper the nesting level of the job (i.e. parallel blocks within parallel blocks.)


White gaps are where no code was executing on a thread.


Blue rectangles are job dependency info being added to the system.


Magenta rectangles are job dependencies being checked.


Left mouse click selects a job. Info about duration and dependencies is shown at bottom. The selected job has a thick red border. Jobs it was dependent on are bordered in orange. Note that only jobs that actually blocked execution of this job at some point are shown as dependencies, as opposed to all jobs that would have blocked had they been executing at the same time.


Left mouse drag left/right = Scroll left/right.


Right mouse drag up/down = Zoom in and out.



Exceptions are used to handle exceptional situations that occur during execution of a program.


These situations are often, but not always, unexpected. For example, accessing an invalid area of memory should be unexpected as it usually indicates a bug (unless you’re writing an OS), but running out of a finite resource (e.g. disk space, file handles, memory) should be expected and handled, even if only by exiting gracefully with a meaningful error message.


Exceptions can originate from 3 sources:

1.       User exceptions, where the programmer explicitly throws an exception to indicate an error.


2.       Language exceptions, where the compiler has inserted a check which throws an exception if that check fails (e.g. array bounds checking).


3.       CPU exceptions, where the processor executing the program detects a problem which causes the runtime system to throw an exception (e.g. divide by zero, dereferencing a null pointer.)


Exception objects can be of any type you wish, but there is a System.Exception.Exception class in the runtime library which it’s recommended you use as a base class for your exceptions.


Handling Exceptions


Exceptions are handled (or caught) using the familiar try-catch-finally mechanism.




try
{
    // user code which might throw an exception
}
catch (SomeType* pType)
{
    // catch exceptions of SomeType
}
catch (SomeOtherType* ex)
{
    // catch exceptions of SomeOtherType
}
catch
{
    // catch anything else
}
finally
{
    // user code always executed whether an exception is thrown or not
}

User Exceptions


To throw an exception when you detect an error in your code, you must throw an exception object.


if (AnErrorOccurred())
    throw new Exception("An error occurred");


When you throw an exception the stack is unwound (invoking all finally blocks as it goes) until the nearest catch block is found which handles this type of exception.


You don’t have to create the exception object there and then; it can be a pre-made object that you throw. You cannot throw an exception object that is on the stack, as by definition throwing an exception starts unwinding that very stack.


You can re-throw an exception from inside a catch block (just use the throw keyword with no argument), which effectively means you’ve decided not to handle it after all (or that you’ve partially handled it) and want to pass control further back up the stack.
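
As a sketch (LoadConfig is a hypothetical user function), partially handling an error and then passing it on looks like this:

```cup
try
{
    LoadConfig();    // hypothetical user function that may throw
}
catch (Exception* ex)
{
    // partially handle the error by reporting it...
    Console.WriteLine("LoadConfig failed");

    // ...then re-throw the same exception to the enclosing scope
    throw;
}
```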


Language Exceptions


There are numerous exceptions which can be thrown by checks the compiler automatically inserts.

As these checks take time they can be disabled at the user's discretion using a compiler option (e.g. in a final release build). For this reason it is considered extremely bad form to rely upon them for the correct functioning of the program. For example, relying on an array bounds check exception to stop iterating the elements of an array is very dangerous.


See the section on the compiler to find out how to disable language exceptions.


Here is the hierarchy of language exceptions and the conditions under which they are thrown:



Exception
    ArgumentException – used by library classes to indicate an invalid argument
        ArrayBoundsException – when an element index outside the bounds of the array is accessed
        ArraySliceException – when an array slice is outside the bounds of the array or end < start
        ArrayLengthException – when casting to a narrower array type
    OutOfMemoryException – when a memory heap is out of space
    HeapConstraintException – heap constrained reference assigned a value outside that heap
    AlignmentException – alignment constrained reference assigned a misaligned value


Processor Exceptions


Here is the hierarchy of processor exceptions and the conditions under which they occur:



Exception
    ArithmeticException – generic arithmetic problem
        DivideByZeroException – attempt to divide by zero
        NonFiniteNumberException – a floating point result is not finite (i.e. is infinite or not a number)
        OverflowException – numerical overflow
        UnderflowException – floating point numerical underflow
    StackOverflowException – ran out of space on the stack
    AccessViolationException – attempt to access an invalid area of memory (usually an attempt to dereference a null pointer)


Floating point exceptions are enabled by default in C-UP as tracking NaNs down manually is beyond tedious. It’s important to note that by default C-UP is not IEEE floating point compliant as it enables flush-to-zero mode and disables underflow exceptions. The reason as always is that C-UP is designed for writing very high performance software and correct handling of underflow invariably slows processor floating point units to a crawl – indeed many processors in game consoles only support flush-to-zero so requiring correct underflow handling would prohibit the use of these units (e.g. the SPEs in the PS3 Cell processor.)


Exceptions in Parallel Code


If an exception thrown by code executing in parallel is caught inside the parallel function or below, then the exception is handled in parallel and execution of the parallel code continues.


If any exception is not caught inside the scope of the executing parallel function, then the first such exception is remembered. All other currently executing parallel code is allowed to finish, and all queued parallel functions that have not yet started are cancelled. If in this time another parallel function throws an exception that isn’t handled, then that exception is lost. Once all jobs in this parallel block are finished or cancelled, the first unhandled exception is automatically re-thrown to the scope enclosing the parallel block for handling in serial.
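
This means an enclosing try-catch can deal with failures from any job in serial. A sketch (the parallel block form and the job functions here are illustrative only):

```cup
try
{
    parallel
    {
        UpdatePhysics();      // hypothetical jobs queued for parallel execution
        UpdateParticles();
    }
}
catch (Exception* ex)
{
    // the first exception not handled inside a job is re-thrown here,
    // in serial, once every other job has finished or been cancelled
}
```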


Exceptions in Static Initialisers


Exceptions in static initialisers are handled in exactly the same way as any other exception. However, it’s unlikely that your code will catch such exceptions so they tend to fall through to the standard exception handler which prints a message and terminates the program.


Call Stack


The System.Exception.Exception base class has the following static property:


    public static const(StackFrame)[int] CallStack();


This function can be called at any time and will return the call stack at the time the last exception was thrown. The 0th element in the array is the deepest function on the call stack (i.e. the one where the exception occurred.) The StackFrame structure is defined in Symbol.cup as:


public struct CodeLocation
{
    public readonly Symbol Scope;
    public readonly uint Line;
}

public struct StackFrame
{
    private readonly ulong BasePtr;
    public readonly CodeLocation Location;
}



It’s possible to retrieve the name of the function from the CodeLocation.Scope symbol, but note that null entries are possible if the function is not found in the C-UP symbol table (meaning that it was an external C function). If you want to print the call stack to the console, the System.Console.Console class has a helper function that can do it for you:


public static void WriteCallStack(const(StackFrame)[int] callStack);
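
For instance, a catch-all handler might dump the captured stack (a sketch; RunGame is a hypothetical user function and CallStack is read as a property):

```cup
try
{
    RunGame();    // hypothetical user function
}
catch
{
    // print the call stack captured when the exception was thrown
    Console.WriteCallStack(Exception.CallStack);
}
```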





Attributes


Attributes are meta-data in the program source code which is attached to symbols and can be retrieved programmatically in order to affect runtime behaviour. This section just explains the syntax for attributes and how some special attributes control behaviour of the compiler. How to retrieve attributes at runtime is described in the Reflection section.


Any symbol in the program (Function, Variable, Type, Module) can have any number of attributes attached to it. Each attribute is a name/value pair, where the values are literals. Therefore any built in type with a literal form can be used as an attribute. Attributes are enclosed in brackets and come directly before the declaration of the symbol they are to be attached to. Any number of attribute blocks can be specified, and multiple attributes can appear in each block comma separated.


[AnAttr="String data", AnotherAttr=1.0f]

public int MyVariable;


Is equivalent to:


[AnAttr="String data"]

[AnotherAttr=1.0f]

public int MyVariable;


Again, please refer to the reflection chapter for details on how to retrieve these values at runtime.


Special attributes


Certain special attributes are understood by the compiler and runtime.




The [invoke] attribute must be added to any function you plan to invoke dynamically at runtime using the reflection system. This is required because there is a memory overhead for each signature that is allowed to be invoked this way, so allowing it by default for every function would add considerable overhead.


The reason there’s an overhead is because the compiler generates a helper function which extracts the arguments from the passed in array of void pointers and passes them as values to the function itself. Although these functions are small, they can really add up for a large project.
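
For example, tagging a single function so the reflection system is allowed to call it might look like this (a sketch):

```cup
// without this attribute, Invoke on this function throws at runtime
[invoke]
public int Add(int a, int b)
{
    return a + b;
}
```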




The [dll=name.dll] attribute is used when interfacing with external libraries implemented as dynamic link libraries. Whenever the runtime performs final linking of native executable code and finds an undefined function, it searches all of the scopes from the function declaration outwards looking for the first dll attribute it can find. It then loads the given dll and attempts to resolve the address of the function again. The search for this attribute name is case insensitive. Note that this search does not happen and the dll is not loaded unless an external function is called by some other code.


Usually you would attach the dll attribute to a module declaration as keeping a 1:1 correspondence between modules and dlls is clean, but you can attach it to a function or class.



[dll=d3d10.dll]
module Windows.Direct3d;


HRESULT D3D10CreateDevice(void* adapter, DriverType drvrType, void* module, uint flags, uint version, Direct3d10Device*& ppDevice);




Many library functions you’ll be calling in Windows (e.g. Win32, OpenGL) use the stdcall calling convention. C-UP natively uses cdecl so you must tag external stdcall functions appropriately to get correct stack clean up.




The stack in C-UP is 16 byte aligned which allows vector types to be passed by value efficiently. However, the C stack is 4 or 8 byte aligned so at points where execution transitions from C into C-UP it’s necessary to use the [alignstack] attribute to get the C-UP compiler to insert extra instructions that ensure stack alignment. You don’t need to worry about this at the main program entry point as the main entry point is called by a small function which itself is stack aligned.




This attribute is only recognised on the main program entry point and allows you to specify the stack size of the main fiber in kilobytes. If this attribute isn’t found the default stack size of 64kb is used.




This attribute is only recognised on the main program entry point and specifying it causes the system to output various data to the console: static initialisation order, stack size and main entry point arguments.




This can be used at module, class or function scope and prevents any code inside that scope being optimised by the compiler even when optimisations are enabled. This is useful when you want full debugging capabilities in a part of the code you’re working on but without slowing the entire program to a crawl by disabling optimisations globally. After all, there’s nothing worse than repeatedly playing a game at 5fps to try and find a bug.



Reflection


Reflection refers to the ability of a program to inspect its own structure dynamically at runtime (to reflect upon its own state, in other words).


The reflection capabilities of C-UP allow you to do a number of useful things:


1. Given the textual name of any symbol in the program, locate the symbol table entry for that symbol.

2. Once you have a symbol, what you can do with it depends upon its type:

    o For a class or other type, you can create an instance of that type
    o For a function, you can invoke that function
    o For a variable, you can get or set the value of that variable

3. Common to every symbol is the ability to traverse the hierarchy of symbols to find out things about itself. E.g. a function symbol could iterate its parameters to find out their names and types.


All of the reflection structures and methods are implemented in the System.Reflection module. The primary structure you will use is the Symbol type, which internally is a 32-bit integer representing an index into the symbol table.


Note that there are 2 operators which return Symbol structures, whereas the rest of the reflection mechanism is implemented as library functions:


Symbol sym = symbolof(My.Qualified.Name);

Type type = typeof(My.Qualified.Type);


Everything is a Symbol in the C-UP reflection system. The Type structure is derived from Symbol. See the descriptions of these operators in the Expressions section of this document for more information.


The Symbol hierarchy is very small; the symbol kinds you will meet below (Type, Function, Variable, Enum, EnumItem and Attribute) all derive from Symbol.
The reflection system does not check the protection level of types, functions or variables, meaning you can create instances of private types, invoke private functions, or get and set the values of private variables. This is by design: reflection is often used to initialise objects when serialising them in, and requiring all serialised members to be public would be less convenient than allowing access to private data by this circuitous route.



Finding a Symbol by Name


The symbolof and typeof operators are only useful if you know the name of a symbol at compile time.


However, let’s say you’re parsing a text file and need to create instances of the objects declared in that file. Let’s also say the names are all fully qualified so we need to start searching for them at the global scope. You can do something like this:


import System.Reflection;


Symbol GetSymbolForQualifiedIdentifier(string qualifiedIdent) const
{
    // split the identifier up
    string[] local idents = new string[128];
    int numIdents = qualifiedIdent.Split(idents, "./\\", false);

    // Global is a static readonly property of the Symbol type
    Symbol sym = Symbol.Global;

    foreach (string ident; idents)
    {
        sym = sym.FindChild(ident);
        if (!sym)
            throw new Exception("Symbol not found: " + qualifiedIdent);
    }

    return sym;
}



Creating an Instance


Given a Symbol that you want to create an instance of, you must first ensure that the symbol is indeed a type (as opposed to representing a variable or whatever.) You do this by casting it to a Type. If the cast fails a null Type object is returned. Of course the typeof operator always returns a Type so no cast is needed when using it.


Once you have a Type you can use the CreateInstance method to create an instance of that class. CreateInstance optionally takes a heap index as its only argument. CreateInstance returns a void* so a dynamic cast will be required to ensure it created the kind of class you’re allowing in the current situation.


import System.Reflection;


// this cast returns a null symbol if it fails

Type typeToCreate = cast(Type)GetSymbolForQualifiedIdentifier("Some.Type.Name");


void* obj = null;

if (typeToCreate)

    obj = typeToCreate.CreateInstance();


// in this example we’re expecting things derived from MyObject to be created here

MyObject* myObj = cast(MyObject*)obj;


This will run any initialiser the type has but if you want to run a constructor it’s up to you to manually invoke one – see the next section on invoking functions.


You can also create an array using the CreateArray method, but you can only create an array of value types this way. CreateArray returns a byte[long], which can be cast to an array of any value type subject to runtime heap and alignment checking.
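
A sketch of creating a value-type array this way (the count argument and the length type of the final cast are assumptions, as the full CreateArray signature isn't shown here):

```cup
Type floatType = typeof(float);

// create raw storage via reflection; the count argument is an assumption
byte[long] raw = floatType.CreateArray(256);

// cast to the concrete value type, subject to heap and alignment checks
float[long] values = cast(float[long])raw;
```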


Invoking a function


Given a Symbol that you want to invoke as a function, you must first ensure that the symbol does represent a function. To do this you cast it to a Function. If this cast fails a null Function is returned.


Once you have a Function symbol, you call the Invoke method to call that function. There are two versions of Invoke depending on whether the function is an instance member function or a static member / module function. You pass any arguments as an array of local pointers, and a pointer to the return value (if any) is returned.


Note that you can only invoke a function marked with the “invoke” attribute using this mechanism. Attempting to invoke any other function will cause a runtime exception. Also note that the invoke attribute can appear at any scope, so if for example you want all the functions in a module to be invoke-able, you can just put this attribute above the module declaration.



public void* Invoke(void&[] local args);

public void* Invoke(void* instanceObj, void&[] local args);


import System.Reflection;



int TestFunc(int a, int b)
{
    return a + b;
}


Function func = cast(Function)symbolof(TestFunc);

int result = -1;
if (func)
{
    int arg0 = 10, arg1 = 20;
    int& args[2] = {&arg0, &arg1};

    // exception will be thrown if return value doesn’t match as you’ll dereference null
    result = *(int*)func.Invoke(args);
}



No conversion is currently applied to the argument values passed into these functions, so the exact types required by the function must be passed in or an exception will occur.


Getting or Setting a Variable


Given a Symbol that you want to treat as a variable, you must first ensure that the symbol does represent a variable. To do this you cast it to a Variable. If this cast fails a null Variable is returned.


Once you have a Variable symbol, you can call the GetValue methods to retrieve a pointer to the value of that variable. Note that there are 4 overloads of the GetValue function for different combinations of local and const variables. This prevents you breaking the local and const rules using reflection.


void* GetValue(void* instanceObj);

const(void)* GetValue(const(void)* instanceObj);

void& GetValue(void& instanceObj);

const(void)& GetValue(const(void)& instanceObj);


All of these functions take an instanceObj reference, which is the instance object for getting and setting instance member variables. If you’re getting or setting a static or module scope variable then just pass null for this parameter. If you’re getting or setting a local variable then the reference passed in must be to a StackFrame structure.
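
As a sketch (the Player class and its health member are hypothetical, and recall that protection levels are not checked by reflection):

```cup
class Player
{
    private int health;
}

// locate the member variable symbol and an instance to read it from
Variable healthVar = cast(Variable)symbolof(Player.health);
Player* p = new Player();

if (healthVar)
{
    // returns a pointer to the value inside the given instance
    int* pHealth = cast(int*)healthVar.GetValue(p);
    *pHealth = 100;    // setting works through the same pointer
}
```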


Bit fields


If the variable in question is a member of a bit struct (i.e. a bit field), then it’s not possible to take the address of it and for this reason separate get and set methods are provided for bit fields. In this case the instanceObj is expected to be a reference to the containing bit struct.


bool IsBitField() const;


ulong GetBitFieldValue(const(void)& instanceObj);

void SetBitFieldValue(void& instanceObj, ulong value);


1d Arrays


If the variable in question is a 1d array, then the GetElement and SetElement functions can be used to get and set elements of that array. The GetArrayLength method is used to get the number of elements in an array variable.


ulong GetArrayLength(void& instanceObj);


void* GetElement(void* instanceObj, ulong index);

const(void)* GetElement(const(void)* instanceObj, ulong index);

void& GetElement(void& instanceObj, ulong index);

const(void)& GetElement(const(void)& instanceObj, ulong index);
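
Putting these together, iterating an array member via reflection might look like this (a sketch; arrVar and obj are assumed to have been obtained as in the previous sections, and the int element type is an assumption):

```cup
// 'arrVar' is the Variable symbol of an int array member, 'obj' its instance
ulong len = arrVar.GetArrayLength(obj);

for (ulong i = 0; i < len; i++)
{
    // each element is returned by pointer
    int* elem = cast(int*)arrVar.GetElement(obj, i);
    Console.WriteLine("element " + i + " = " + *elem);
}
```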


2d Arrays


If the variable is a 2d array:


uint GetArrayWidth(const(void)& instanceObj);

uint GetArrayHeight(const(void)& instanceObj);

uint GetArrayStride(const(void)& instanceObj);


void* GetElement(void* instanceObj, uint x, uint y);

const(void)* GetElement(const(void)* instanceObj, uint x, uint y);

void& GetElement(void& instanceObj, uint x, uint y);

const(void)& GetElement(const(void)& instanceObj, uint x, uint y);




Vectors


If the variable is a vector then you can get the entire vector as with any other variable. Alternatively you can access the individual components.


uint GetVectorDimension(const(void)& instanceObj);


void& GetVectorComponent(void& instanceObj, uint index);




Enumerations


Enumeration reflection is supported through the Enum and EnumItem symbol types. You cast to these like any other symbol type, and they have special functionality related to enums.


However, the most common thing people want to do with enums is to get a human readable string given an enumeration value, or vice-versa. Helper functions in the Reflection module allow you to achieve both of these simply, without needing to use reflection directly.


To get the name of an enum item given a value:


MyEnumType e;

string name = System.Symbol.GetEnumItemName(e);


This is a generic function, but the generic type is inferred from the argument type. If the value is invalid (e.g. in the above example it was not initialised and zero may or may not be a valid value) then an exception is thrown.


To get the value from a name:


e = System.Symbol.GetEnumItemValue<MyEnumType>(name);


This is also a generic function but in this case you must explicitly pass the enum type name as it can’t be inferred. If the name string given is invalid, then an exception is thrown.
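
A round trip between values and names might look like this (the Colour enum is illustrative):

```cup
enum Colour
{
    Red,
    Green,
    Blue,
}

Colour c = Colour.Green;

// value -> name; the generic type is inferred from the argument
string name = System.Symbol.GetEnumItemName(c);

// name -> value; here the generic type must be given explicitly
Colour c2 = System.Symbol.GetEnumItemValue<Colour>(name);
```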




Delegates


The function that a delegate currently references can be read using this function:


public Function GetDelegateFunction(const(void)& instanceObj) const;


The instance object pointer (for a non-static delegate) can be read using:


public void* GetDelegateInstance(const(void)& instanceObj) const;


It’s not currently possible to set the value of a delegate using reflection.




Attributes


Any symbol can have any number of attributes associated with it. Attributes are represented using the Attribute class.


To get the first attribute of a symbol use the FirstAttribute property. To get the next attribute use the standard Next property and cast to Attribute – this cast will never fail as the only siblings of attributes are other attributes.


To find an attribute attached to a symbol call FindAttribute(string name). The name is case insensitive.


To find an attribute attached to this symbol or any parent symbol call FindAttributeHrc(string name). The nearest attribute with that name will be returned.


Once you have an Attribute only 2 operations are possible. To get the type of the data represented by the attribute use the DataType property. This returns a Type symbol. To get a const void pointer to the data use the Data property.
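
A sketch of walking every attribute on a symbol using these members (assuming a Symbol tests as null/non-null as elsewhere in this chapter; the symbol itself is hypothetical):

```cup
Symbol sym = symbolof(MyVariable);    // some symbol with attributes attached

// visit each attribute in turn; the Attribute cast of Next never fails
for (Attribute attr = sym.FirstAttribute; attr; attr = cast(Attribute)attr.Next)
{
    Type dataType = attr.DataType;     // type of the literal value
    const(void)* data = attr.Data;     // pointer to the value itself
}
```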




Resources


C-UP allows you to create bindings to data files in the source code of the language. Furthermore, the C-UP compiler allows you to pass data files other than source code to it and will package that data into the same object file as the compiled source code. It is then guaranteed that your program will be able to access this data at runtime.


Data files are just embedded as simple binary data; it’s still up to the program to interpret them correctly at runtime. In this example we create a handle to some embedded font data, print out the size of that data and load a font from memory. In C-UP, embedded data is referred to as a resource.


       // loading an embedded resource

       Resource fontRes = r"GameRuntimeTest\Data\Arial32aa.fnt";

       Console.WriteLine("font data size = " + fontRes.Data.Length);

       Font* font = System.Graphics.Font.Load(&MemoryStream(fontRes.Data));


To make a reference to an embedded resource simply use a string literal prefixed with an ‘r’ (for resource.) The path given in that string must be relative to the root location of the project being compiled (see Compiler section.) As the string is inherently a path name it does not perform escape character processing so you can safely use single back-slashes as path separators.


The data contained in this Resource can be accessed using the .Data member, which returns it as a const(byte)[long].


This array can then be passed to the constructor of a System.IO.MemoryStream object and serialised in. An important point to note is that MemoryStream allows you to perform an in-place read of const data, meaning it returns a reference to the original embedded data rather than making a copy. If you read it as non-const then a copy will be made.


If you want to make life easier for users of your code you can allow loading directly from a resource. The Font class does this so loading a font from an embedded resource reduces to a single line.


       Font* font = System.Graphics.Font.Load(r"GameRuntimeTest\Data\Arial32aa.fnt");


Placing an r"name" in the source code does not automatically pull that data into your object file, it just makes a reference to it. That reference can still return null when you access the .Data member. For the data to actually be embedded in the object file it must also be passed to the compiler command line (see Compiler section).


This gives you the flexibility to put data into shared packages. When a package containing a data file is loaded, all references to that file become live. If two packages load the same data, only one copy of that data is loaded. If the two versions of the data don’t match then a runtime error occurs.

All of this is handled by the same mechanism that performs dynamic loading and linking of code in modules.




As mentioned above all resources are specified relative to the root folder passed to the compiler. The entire path from this location is stored in the resource and the resource must be accessed by passing this entire relative path. You cannot use absolute paths or use .. to access resources outside of the root location.


When you link to a library containing resources, its resources are available to you and are still accessed using the relative path they had when compiled, even though you might be using a completely different root folder for compiling your exe than was used to compile the lib. For this reason it’s important to give your resources a meaningful folder structure if you’re planning to release a library into the wild to avoid clashes with other libraries. If you just put your resources in the root folder or in something generic like “data”, then it’s quite likely they’ll clash with other libraries.


In other words, resource locations should be given the same amount of consideration as package and module names.


Dynamic Loading


If you don’t embed a resource in the executable of your program, then you’ll need to load it at runtime. You can do this synchronously or asynchronously using these two member functions of Resource:


public static Resource Load(sstring fileName, byte heapIndex);


public static AsyncTask* LoadAsync(sstring fileName, heapid heapId, FileAsyncCallback callback, Resource& resource);


The asynchronous version immediately creates a resource symbol and returns it to you in ‘resource’. This symbol persists even if the load fails or if you unload the resource later. The data it references will only be valid once loading is complete. You can use these members of AsyncTask to monitor/control progress of the load, and of course you’ll get the given callback as well.


For more information on async tasks see the Asynchrony section of this document.
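
A sketch of kicking off and then waiting for an asynchronous resource load (the file name, heap id and callback are illustrative, and passing the out-parameter by local pointer is an assumption):

```cup
Resource levelData;

// queue the load; 'levelData' is a valid Resource symbol immediately,
// but the data it references only becomes valid once loading completes
AsyncTask* task = Resource.LoadAsync("Data\\Level1.bin", 0, OnLevelLoaded, &levelData);

// do other work here, then block (yielding to other fibers) until done
task.Await();
```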


You can dynamically unload a resource using this static member of Resource:


       public static void Unload(Resource file);


As mentioned above, the Resource symbol remains valid; it’s just the data it refers to that is unloaded. Strictly speaking, even the data isn’t unloaded: the reference to it is nullified, which makes it eligible for garbage collection.



Asynchrony


Most IO devices in a modern computer can operate independently of the CPU and are therefore perfect candidates for asynchronous operation. The obvious examples are hard drives, optical drives, network adapters, sound cards and graphics cards.


Other asynchronous hardware devices like video encode/decode units and DMA memory movement units are also increasingly common and are often directly accessible on consoles.


Even in the absence of specific hardware multi-core processors can use idle cores to perform memory operations asynchronously to the main flow of the program.


C-UP comes with classes for handling all kinds of asynchronous operations and because most of it is IO related this functionality can mostly be found in the System.IO package.


Async.cup          -- contains async task and callback definitions

AsyncStream.cup    -- base class for asynchronous streams, plus an async copying helper

Stream.cup         -- base class for blocking stream functions and a byte counting stream

MemoryStream.cup   -- memory stream implementation and async memory copy and fill functions

FileStream.cup     -- asynchronous and blocking file streams, plus an async file copier

Network.cup        -- all networking functionality including sockets and net streams





There is a single class which represents any asynchronous IO task and it’s called AsyncTask (or more specifically System.IO.Async.AsyncTask.)


Any function that queues a task for asynchronous execution returns a pointer to one of these giving the user a base set of capabilities.


public class AsyncTask
{
    // cancel this task - this is not guaranteed to be possible, returns true if it was
    public bool Cancel();

    // wait for completion of this task
    // yields execution to other fibers while waiting
    public void Await();
    public void Await(float timeOutSec);

    // is this async task ended (i.e. complete, cancelled or error)
    public bool IsEnded() const;

    // is this async task complete (returns false if cancelled)
    public bool IsComplete() const;

    // is this task cancelled
    public bool IsCancelled() const;

    // did an error occur for the task
    // (this includes failure to allocate a task handle and cancellation)
    public bool IsError() const;

    // get error code (0 means none)
    public sbyte ErrorCode() const;
}



You can wait for a task to end by calling the Await method on an AsyncTask. Essentially this just co-operatively yields the processor to the next fiber until the task is ended. In this context ended means successfully completed, cancelled or failed with an error.


You can wait for multiple tasks by calling the static Await method in the System.IO.Async module, passing either any number of individual task pointers or an array of task pointers.


// individual tasks
AsyncTask* a, b, c;

Await(a, b, c);


// dynamic array of tasks
AsyncTask*[] tasks;

Await(tasks);


// value array of tasks
AsyncTask*[10] tasks;

Await(tasks);



There are also variants of all await functions that accept a timeout value in seconds as the first parameter.


You can cancel a queued task that hasn’t completed yet by calling the Cancel method. Deleting a task also cancels if not complete. Note that cancellation doesn’t necessarily mean the task will be immediately interrupted – if it’s already started then it might very well run to completion and then be flagged as cancelled.


You can manually poll for completion by using the IsEnded, IsComplete, IsCancelled and IsError methods of AsyncTask. IsEnded just checks for any one of the other 3 having occurred. To retrieve the error code use the ErrorCode property. Error codes are not very well fleshed out yet so you’re realistically just confined to knowing if there was an error or not.


More useful than polling is the ability to receive a callback when any async task ends. This callback must conform to this delegate:


delegate void AsyncCallback<T>(AsyncTask* task, const T& args);


There is a structure (used for T) defined for each async task type, which returns information appropriate to that task to you upon task completion. See the example functions in the streams section below.
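
For example, a read-completion callback would be declared to match AsyncCallback&lt;AsyncReadCallbackArgs&gt; (the body is a sketch; the members of the args structure are not listed here):

```cup
void OnReadDone(AsyncTask* task, const AsyncReadCallbackArgs& args)
{
    if (task.IsError())
        Console.WriteLine("async read failed, code " + task.ErrorCode());
    else
        Console.WriteLine("async read complete");
}
```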





The primary mechanism for creating asynchronous tasks involves using any stream class derived from AsyncStream. The currently available stream types are FileStream, NetStream and MemoryStream, the use cases of which should be pretty self-explanatory.


There are only three asynchronous methods supported by async streams: read a byte array, write a byte array and close the stream. Here’s what they look like:


class AsyncStream
{
    AsyncTask* ReadAsync(
        byte[long] dest,
        AsyncCallback<AsyncReadCallbackArgs> callback,
        AsyncTask*[short] local dependencies = null);

    AsyncTask* WriteAsync(
        const(byte)[long] src,
        AsyncCallback<AsyncWriteCallbackArgs> callback,
        AsyncTask*[short] local dependencies = null);

    AsyncTask* CloseAsync(
        AsyncCallback<AsyncCloseCallbackArgs> callback,
        AsyncTask*[short] local dependencies = null);
}



When writing to a file or memory stream it is your responsibility to keep the source array intact until the async task is complete. Asynchronous operations happen on a different thread to that on which your main program is running so modifying the source array while an async task is pending on it will probably result in garbage being written out. The same is true of accessing the destination of an async read.
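
A minimal sketch of respecting that rule ('stream' is assumed to be an open AsyncStream, and passing null for the completion callback is an assumption):

```cup
byte[long] buffer = new byte[1024];

// queue the write; the buffer must not be touched until the task ends
AsyncTask* task = stream.WriteAsync(buffer, null);

// block (yielding to other fibers) before reusing the buffer
task.Await();
buffer[0] = 0;    // safe again now
```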


Network streams are different and do in fact buffer sent data for you at the point where you call write. If you want unbuffered writes to a network stream, use the Send method on the Socket class directly.


Note that the MemoryStream and FileStream classes are actually derived from Stream, which in turn derives from AsyncStream. Stream adds many blocking read and write functions for various data types, plus the ability to seek. Any attempt to do a blocking read, write, seek or close while asynchronous operations are outstanding on the stream will throw an exception.


NetStream does not support any blocking behaviour (neither do the socket types that network streams operate on) and as such NetStream is derived directly from AsyncStream.


Copying Streams


Helper classes are provided to perform asynchronous copies of streams and files.


For general stream copying use class CopyStreamAsync defined in AsyncStream.cup.


AsyncStream* fromStream, toStream;

var copier = new CopyStreamAsync();

copier.Start(fromStream, toStream);


Obviously you need to construct and open the streams manually beforehand. Once it has finished copying you can reuse the same streams by calling Start again.


To check for completion use the IsBusy property. To abort copying call Abort. To get current progress as a percentage use the Progress property.
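Putting that together (a sketch, assuming the streams have already been constructed and opened):

var copier = new CopyStreamAsync();
copier.Start(fromStream, toStream);

while (copier.IsBusy)
{
    // copier.Progress reports the percentage copied so far
}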


To asynchronously copy one file to another you can use the above or you can use the CopyFileAsync class defined in FileStream.cup, which wraps the above class and accepts two file names to copy from and to.


Loading resources


An entire resource file can be loaded asynchronously. Please refer to the reflection section of this document for a discussion of resources.


The asynchronous loading mechanism consists of a single static member function of the System.Reflection.Resource class.


AsyncTask* LoadAsync(
    sstring filename,
    heapid heapId,
    AsyncCallback<AsyncCopyCallbackArgs> callback,
    Resource& resource,
    AsyncTask*[short] local dependencies = null);


The file name is the name of the file to load and must be relative to the current resource root folder.

The heapId is the memory heap to allocate memory for this resource from.

The resource parameter immediately gives you a reference to the new resource, even before the data is loaded. Until loading completes, resource.Data returns null.


The rest is standard async task stuff.
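Usage might look roughly like this (a sketch: the file name and the heapid variable mainHeap are hypothetical, and a null callback is assumed to be allowed):

Resource resource;
var task = Resource.LoadAsync("data/level1.res", mainHeap, null, resource);

// resource is usable immediately, but resource.Data is null until loading ends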


Copying memory


Arrays of bytes can be asynchronously copied or filled with a value using functions found in the System.IO.MemoryStream module.


AsyncTask* CopyAsync(
    byte[long] dest,
    const(byte)[long] src,
    AsyncCallback<AsyncCopyCallbackArgs> callback,
    AsyncTask*[short] local dependencies = null);

AsyncTask* CopyAsync(
    byte[long] dest,
    byte[long] src,
    AsyncCallback<AsyncCopyCallbackArgs> callback,
    AsyncTask*[short] local dependencies = null);

AsyncTask* CopyReverseAsync(
    byte[long] dest,
    byte[long] src,
    AsyncCallback<AsyncCopyCallbackArgs> callback,
    AsyncTask*[short] local dependencies = null);



Copying memory this way is more efficient than using two memory streams, because the data is transferred directly from source to destination; with two memory streams it is copied to an intermediate buffer and then back out again.


The reverse copy function is used when the two buffers overlap and you are moving data to a higher address. There are two variants of the forward copy: one taking a const source array, for when the arrays don't overlap, and one taking a mutable source array, for when they do.
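For example, moving a region upwards within a single buffer (a sketch; the slice syntax follows the Arrays section, and a null callback is assumed to be allowed):

byte[long] buffer = new byte[256];

// the destination overlaps the source at a higher address,
// so the reverse copy must be used
var task = CopyReverseAsync(buffer[64 .. 256], buffer[0 .. 192], null);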



The async fill function fills memory with the given byte value.


AsyncTask* FillAsync(
    byte[long] dest,
    byte value,
    AsyncCallback<AsyncCopyCallbackArgs> callback,
    AsyncTask*[short] local dependencies = null);





Any asynchronous task can depend on a number of other asynchronous tasks. A dependent task will not commence until all tasks it depends on have ended.
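For example, a fill can be made to wait for a read (a sketch: the local array allocation syntax shown here is an assumption, and null callbacks are assumed to be allowed):

var read = stream.ReadAsync(buffer, null);

var deps = new local AsyncTask*[1];
deps[0] = read;

// the fill will not commence until the read has ended
var fill = FillAsync(buffer, 0, null, deps);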


Any task which operates on a file or socket handle is implicitly depen