.. _sec-ir-csubset: The C Subset ============ Here is a descripton of the **C** subset that **Yace** supports along with conventions around the subset, such as when and how to use ``typedef``. **Yace** supports **C99** data types (``bool``, ``signed`` / ``unsigned char``, ``intX_t``, ``uintX_t``, etc.), derived types (``enum``, ``struct``, ``union``), declarations of functions and function points, pointers, arrays, and ``#define NAME ``. As you might have noticed, then this means that **Yace** intends to only parse headers or in other words the entities that usually go into headers. Yace won't be able to parse e.g. a **header-only-library** since these tend to embed logic into functions declared inline. For sufficiently simple function-bodies, these could be translated into equivalent First, header-libraries are unsupported, that since, they are basically C code inlined in the header-file which requires compilation. Since, code that needs compilation is rarely usable on a FFI boundary, unless using something like Zig. Thus, logic should be implemented in a library, and in the header / FFI / IR, one defines the functional declaration to call to execute that logic. However, using these have contraints when considering portable code and code that you want to work predictably accross a FFI. Most of the scalar / pod data types of C are either platform, system dependent or implementation specific in terms of singedness and width. For example, the data type ``int`` has platform dependent width, one might be accustomed to ``int`` being 32 bits wide, when working on popular 32bit and 64bit x86 and ARM systems. However, the C standard does not guarantee this size. It can be as little as 16bit and even more than 64 bits on some systems. To reduce this then **Yace** currently experiments with: * When parsing a C header, then UnsupportedType errors are emitted to let you know that these types are being coerced into fixed-width / portable types. * A callable helper-function (TODO) is emitted that checks the width and signage - In case of type-coercsion, then this function can inform at runtime what is affected The reasoning behind this is that FFIs will do this on your behalf anyway. Making an active decision ahead-of-time, instead of being subjected to multiple different ways of handling variable width and signage, then write your C APIs with portability in mind, both for running your C library on multiple platforms, but also for FFI interoperability. Function Pointers ----------------- The **C** language construct ``typedef`` is severely restricted in **Yace** as it only supports typedefs of function-pointers. That is, typedefs on the form:: typedef int (*binop_func)(int x, int y); For such a ``typedef``, then ``binop_func`` can then be utilized as a type-specifier instead of writing the entire function-signature. This makes C code a lot more readable in function declarations/protoypes such as the following:: int apply_func(binop_func op, int a, int b); Which is significantly nicer to read in **APIs** using callback-functions. This is an excellent use of the ``typedef`` language construct, it is also the only one supported by **yace**. **Yace** conventions: naming convention In **C** there is nothing telling the user of an API with function-pointers that a typedef function-pointer declaration is a function-pointer. Thus, **Yace** introduce the convention than the function name must have of the **follwing** suffixes: ``_cb``, ``_callback``, ``_fn``, ``_fun``, ``_func``, ``_function``. mandatory argument names In **C**, then argument names in function-pointer-declarations are optional and have no effect on the declararation of the function pointer. Yet, in **yace** they are mandatory. They are mandatory because providing them, and documentation of them, significantly improves readability of the code and understanding of usage. typedef ------- In **C**, many types are named or ``typedef``'ed with the ``_t`` suffix, which might lead developers to think they should alias all types this way. However, this is incorrect. The ``_t`` suffix is reserved for POSIX-defined types. - **Do not** use ``_t`` in your types. - **Do not** use ``typedef`` except for function pointers, following the conventions mentioned. Why avoid ``typedef``? Using ``typedef`` doesn't create a new type but simply provides an alias. This can be misleading, as one might expect type checks to ensure that only the typedef'ed name is used. However, this is not the case, and as a result, typedefs can create a false sense of type safety. While typedefs can improve readability by providing aliases, they often decrease clarity when introducing new conventions specific to a library. Users of your API will have to understand your conventions instead of reading the actual type. Thus, ``typedef`` should only be used by POSIX or compilers, not in user libraries. The only valid use is for function pointers—everything else makes the code **less** readable. ``char`` -------- The ``char`` is by many compilers treated as signed, however, it is a choice left to the compiler. This can be troublesome in FFI / portability scenarious. Thus, to reduce this, then whenever ``char`` is encountered, without explicit sign qualification, then **yace** will treat is as ``signed char``. Functions --------- Are not allowed to return or pass: * struct or union; by value * enum - Having a function return an enum gives a false sense of type-safety since enums in C are not strongly typed. One might think that when a function returns an enum, then it can only ever return any of the values defined in the given enum. However, this is a fallacy. The function can return any value, also one which is not a member of the enum. This is a common source of errors, and thus not allowed in **Yace**. - Similarly, functions should not take enum as input for the same reason. Datastructures ============== ... Structs ------- Structs are the essential type for user-defined data structures. Most data exchange is somehow encapsulated by a struct. Now, this is excellent, however, there are a lot of ways of defining structures in C, which do not work well on a FFI boundary. One example of this is anonymous nested records, such as:: struct delivery { int carrier; struct { int x; int y; int z; } location; }; This is valid C, and often seen in systems with large structures where the anonymous structs provide organization. However, such nested anonymous structures are not supported in languages such as Rust. Thus, what binding generators often do, is they **hoist** the nested definitions out and re-write on the form:: struct delivery_anon_x { int x; int y; int z; }; struct delivery { int carrier; struct delivery_anon_x location; }; Other binding generators choose different strategies. The point here is that there are many ways structs can be defined, however, only a subset of these translate into nice bindings. Also, to avoid non-nice names such as injected "anon" etc. then **yace** will simply not allow these and will error out. It is then the responsibility of the user to re-write / manually hoist this, in the C API, into something useful like:: /** * Describe this... */ struct delivery_location { int x; ///< And the members... int y; int z; }; /** * Describe this */ struct delivery { int carrier; struct delivery_location location; }; The reasoning here is that, intead of individual binding-generators applying different "hoisting" techniques, then rewrite the representation at the "source". When doing so, it might be that an even better representation could be achieved. Union ----- Enum ---- Enums are great for grouping collections of values and enables a way to refer to these symbolically and thereby avoid "hardcoding" magic values. Also, unlike macros such as:: #define FOO_THRESHOLD_UPPER 200 #define F00_THRESHOLD_LOWER 100 Then this can be done using an enum like:: /** * Upper and lower threshold */ enum foo_threshold { FOO_THRESHOLD_UPPER = 200, ///< Upper limit FOO_THRESHOLD_LOWER = 100, ///< Lower limit }; By doing so, then the "magic" values can be referred to symbolically, just like the define, however, the values can be documented, and grouped. However, refrain from using the enum as function return or parameter types. Since C is not strongly typed, then there is no enforcement that a function returning an enum, actually returns either 100 or 200 as the enum values above. The funtion could perfectly well return any other value such as 2. Thus, only use enums as a way to document and symbolically refer to values. Having a library / FFI that documents the "magic-values" and provides symbolic references to them is really useful.