# Overloadable opcodes. The overloadable opcode (or xext) proposal allows a non standard extension to use a documented 20 + 3 bit (or 52 + 3 bit on RV64) UUID identifier for an instruction for _software_ to use. At runtime, a cpu translates the UUID to a small implementation defined 12 + 3 bit bit identifier for _hardware_ to use. It also defines a fallback mechanism for the UUID's of instructions the cpu does not recognise. Tl;DR see below for a C description of how this is supposed to work. It defines a small number N standardised R-type instructions xcmd0, xcmd1, ...xcmd[N-1], preferably in the brownfield opcode space. We usually assume N = 8 (aka log2(8) = 3 in the + 3 above). Each xcmd takes (in rs1) a 12 bit "logical unit" (lun) identifying a (sub)device on the cpu that implements some "extension interface" (xintf) together with some additional data. Extension devices may be implemented in any convenient form, e.g. non standard extensions of the CPU iteself, IP tiles, or closely coupled external devices. An xintf is a set of up to N commands with 2 input and 1 output port (i.e. like an R-type instruction), together with a description of the semantics of the commands. Calling e.g. xcmd3 routes its two inputs and one output ports to command 3 on the device determined by the lun bits in rs1. Thus, the N standard xcmd instructions are standard-designated overloadable opcodes, with the non standard semantics of the opcode determined by the lun. Portable software, does not use luns directly. Instead, it goes through a level of indirection using a further instruction xext. The xext instruction translates a 20 bit globally unique identifier UUID of an xintf, to the lun of a device on the cpu that implements that xintf. The cpu can do this, because it knows (at manufacturing or boot time) which devices it has, and which xintfs they provide. This includes devices that would be described as non standard extension of the cpu if the designers had used custom opcodes instead of xintf as an interface. If the UUID of the xintf is not recognised at the current privilege level, the xext instruction returns the special lun = 0, causing any xcmd to trap. Minor variations of this scheme (requiring two more instructions xext0 and xextm1) cause xcmd instructions to fallback to always return 0 or -1 instead of trapping. Remark1: the main difference with a previous "ioctl like proposal" is that UUID translation is stateless and does not use resources. The xext instruction _neither_ initialises a device _nor_ builds global state identified by a cookie. If a device needs initialisation it can do this using xcmds as init and deinit instructions. Likewise, it can hand out cookies (which can include the lun) as a return value . Remark2: Implementing devices can respond to an (essentially) arbitrary number of xintfs. Hence, while an xintf is restricted to N commands, an implementing device can have an arbitrary number of commands. Organising related commands in xintfs, helps avoid UUID space pollution, and allows to amortise the (small) cost of UUID to lun translation if related commands are used in combination. == Description of the instructions == xcmd0 rd, rs1, rs2 xcmd1 rd, rs1, rs2 .... xcmdN rd, rs1, rs2 * rs1 contains a 12 bit "logical unit" (lun) together with xlen - 12 bits of additional data. * rs2 is arbitrary For e.g xmd3, route the inputs rs1, rs2 and output port rd to command 3 of the (sub)device on the cpu identified by the lun bits of rs1. after execution: * rd contains the value that of the output port of the implementing device -------- xext rd, rs1, rs2 xext0 rd, rs1, rs2 xextm1 rd, rs1, rs2 * rs1 contains --a UUID of at least 20 bit in bit 12 .. XLEN of rs1 identifying an xintf. --the sequence number of a device at the current privilege level on the cpu implementing the xintf in bit 0..11 . In particular, if bit 0..11 is zero, the default implemententation is requested. * rs2 is arbitrary (but bit XLEN-12 to XLEN -1 is discarded) after execution, if the cpu recognises the UUID and device at the current privilege level, rd contains the lun of a device implementing the xintf in bit 0..11, followed by bit 0.. XLEN - 13 of rs2. if the cpu does not recognise the UUID and device it returns the numbers 0 (for xext), 1 (for xext0) or 2 (for xextm1), in particular bit 12.. XLEN are 0. --- The net effect is that, when the CPU implements an xintf with UUID 0xABCDE a sequence like //fake UUID of an xintf lui rd 0xABCDE xext rd rd rs1 xcmd0 rd rd rs2 acts like a single namespaced instruction cmd0_ABCDE rd rs1 rs2 (with the annoying caveat that the last 12 of rs1 are discarded) The sequence not indivisible but the crucial semantics that you might want to be indivisible is in xcmd0. Delegation and UUID is expected to come at a small performance price compared to a "native" instruction. This should, however, be an acceptable tradeoff in many cases. Moreover implementations may opcode-fuse the whole instruction sequence or the first or last two instructions. If several instructions of the same interface are used, one can also use instruction sequences like lui t1 0xABCDE //org_tinker_tinker__RocknRoll_uuid xext t1 t1 zero xcmd0 a5, t1, a0 // org_tinker_tinker__RocknRoll__rock(a5, t1, a0) xcmd1 t2, t1, a1 // org_tinker_tinker__RocknRoll__roll(t2, t1, a5) xcmd0 a0, t1, t2 // org_tinker_tinker__RocknRoll__rock(a0, t1, t2) If 0xABCDE is an unknown UUID at the current privilege level, the sequence results in a trap just like cmd0_ABCDE rd rs1 rs2 would. The sequence //fake UUID of an xintf lui rd 0xABCDE xext0 rd rd rs1 xcmd0 rd rd rs2 acts exactly like the sequence with xext, except that 0 is returned by xcmd0 if the UUID is unknown at the current privilege level. Likewise usage of xextm1 results in -1 being returned. This requires lun = 0 , 1 and 2 to be routed to three mandatory fallback interfaces defined below. On the software level, the xintf is just a set of glorified assembler macros org.tinker.tinker:RocknRoll{ uuid : 0xABCDE rock rd rs1 rs2 : xcmd0 rd rs1 rs2 roll rd rs1 rs2 : xcmd1 rd rs1 rs2 } so that the above sequence can be more clearly written as import(org.tinker.tinker:RocknRoll) lui rd org.tinker.tinker:RocknRoll:uuid xext rd rd rs1 org.tinker.tinker:RocknRoll:rock rd rd rs2 ------ The following standard xintfs shall be implemented by the CPU. For lun == 0: At privilege level user mode, supervisor mode and hypervisor mode org.RiscV:Fallback:Trap{ uuid: 0 trap0 rd rs1 rs2: xcmd0 rd rs1 rs2 ... trap[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2 } each of the xcmd instructions shall trap to one level higher. At privilege level machine mode each trap command has unspecified behaviour, but in debug mode should cause an exception to a debug environment. For lun == 1, at all privilege levels org.RiscV:Fallback:ReturnZero{ uuid: 1 return_zero0 rd rs1 rs2: xcmd0 rd rs1 rs2 ... return_zero[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2 } each return_zero command shall return 0 in rd. For lun == 2, at all privilege levels org.RiscV:Fallback:ReturnMinusOne{ uuid: 2 return_minusone0 rd rs1 rs2: xcmd0 rd rs1 rs2 ... return_minusone[N-1] rd rs1 rs2: xcmd[N-1] rd rs1 rs2 } each return_minusone shall return -1. --- Remark: Quite possibly even glorified standard assembler macros are overkill and it is easier to just use defines or ordinary macro's with long names. E.g. writing #define org_tinker_tinker__RocknRoll__uuid 0xABCDE #define org_tinker_tinker__RocknRoll__rock(rd, rs1, rs2) xcmd0 rd, rs1, rs2 #define org_tinker_tinker__RocknRoll__roll(rd, rs1, rs2) xcmd1 rd, rs1, rs2 allows the same sequence to be written as lui rd org_tinker_tinker__RocknRoll__uuid xext rd rs1 org_tinker_tinker__RocknRoll__rock(rd, rd, rs2) Readability of assembler is no big deal for a compiler, but people are supposed to _document_ the semantics of the interface. In particular specifying the semantics of the xintf in same way as the semantics of the cpu should allow formal verification. ==Implications for the RiscV ecosystem == The proposal allows independent groups to define one or more extension interfaces of (slightly crippled) R-type instructions implemented by an extension device. Such an extension device would be an native but non standard extension of the CPU, an IP tile or a closely coupled external chip and would be configured at manufacturing time or bootup of the CPU. The 20 bit provided by the UUID of an xintf is much more room than provided by the 2 custom 32 bit, or even 4 custom 64/48 bit opcode spaces. Thus the overloadable opcodes proposal avoids most of the need to put a claim on opcode space, and the associated collisions when combining independent extensions. In this respect it is similar to POSIX ioctls, which (almost) obviate the need for defining new syscalls to control new or nonstandard hardware. The expanded flexibility comes at the cost: the standard can specify the semantics of the delegation mechanism and the interfacing with the rest of the cpu, but the actual semantics of the overloaded instructions can only be defined by the designer of the interface. Likewise, a device can be conforming as far as delegation and interaction with the CPU is concerned, but whether the hardware is conforming to the semantics of the interface is outside the scope of spec. Being able to specify that semantics using the methods used for RV itself is clearly very valuable. One impetus for doing that is using it for purposes of its own, effectively freeing opcode space for other purposes. Also, some interfaces may become de facto or de jure standards themselves, necessitating hardware to implement competing interfaces. I.e., facilitating a free for all, may lead to standards proliferation. C'est la vie. The only "ISA-collisions" that can still occur are in the 20 bit (~10^6) interface identifier space, with 12 more bits to identify a device on a hart that implements the interface. One suggestion is setting aside 2^19 id's that are handed out for a small fee by a central (automated) registration (making sure the space is not just claimed), while the remaining 2^19 are used as a good hash on a long, plausibly globally unique human readable interface name. This gives implementors the choice between a guaranteed private identifier paying a fee, or relying on low probabilities. On RV64 the UUID can also be extended to 52 bits (> 10^15). ==== Description of the extension as C functions.== /* register format of rs1 for xext instructions */ typedef struct uuid_device{ long dev:12; long uuid: 8*sizeof(long) - 12; } uuid_device_t /* register format for rd of xext and rs1 of xcmd instructions, packs lun and data */ typedef struct lun_data{ long lun:12; long data: 8*sizeof(long) - 12; } lun_data_t /* proposed R-type instructions xext rd rs1 rs2 xcmd0 rd rs1 rs2 xcmd1 rd rs1 rs2 ... xcmd7 rd rs1 rs2 */ lun_data_t xext(uuid_dev_t rs1, long rs2); long xcmd0(lun_data_t rs1, long rs2); long xcmd1(lun_data_t rs1, long rs2); ... long xcmd(lun_data_t rs1, long rs2); /* hardware interface presented by an implementing device. */ typedef long device_fn(unsigned short subdevice_xcmd, lun_data_t rs1, long rs2); /* cpu internal datatypes */ enum privilege = {user = 0b0001, super = 0b0010, hyper = 0b0100, mach = 0b1000}; /* cpu internal, does what is on the label */ static enum privilege cpu__current_privilege_level() typedef struct lun{ unsigned short id:12 } lun_t; struct uuid_device_priv2lun{ struct{ uuid_dev_t uuid_dev; enum privilege reqpriv; }; lun_t lun; }; struct device_subdevice{ device_fn* device_addr; unsigned short subdeviceId:12; }; struct lun_priv2device_subdevice{ struct{ lun_t lun; enum privilege reqpriv } struct device_subdevice devAddr_subdevId; } static struct uuid_device_priv2lun cpu__lun_map[]; /* map (UUID, device, privilege) to a 12 bit lun, return -1 on unknown (at acces level) does associative memory lookup and tests privilege. */ static short cpu__lookup_lun(const struct uuid_device_priv2lun* lun_map, uuid_dev_t uuid_dev, enum privilege priv, lun_t on_notfound); #define org_RiscV__Trap__lun ((lun_t)0) #define org_RiscV__Fallback__ReturnZero__lun ((lun_t)1) #define org_RiscV__Fallback__ReturnMinusOne__lun ((lun_t)2) lun_data_t xext(uuid_dev_t rs1, long rs2) { short lun = cpu__lookup_lun(lun_map, rs1, current_privilege_level(), org_RiscV__Fallback__Trap__lun); if(lun < 0) return (lun_data_t){.lun = org_RiscV__Fallback__Trap__lun, .data = 0}; return (lun_data_t){.lun = lun, .data = rs2 % (1<< (8*sizeof(long) - 12))} } lun_data_t xext0(uuid_dev_t rs1, long rs2) { short lun = cpu__lookup_lun(lun_map, rs1, current_privilege_level(), org_RiscV__Fallback__Trap__lun); if(lun < 0) return (lun_data_t){.lun = org_RiscV__Fallback__ReturnZero__lun, .data = 0}; return (lun_data_t){.lun = lun, .data = rs2 % (1<< (8*sizeof(long) - 12))} } lun_data_t xextm1(uuid_dev_t rs1, long rs2) { short lun = cpu__lookup_lun(lun_map, rs1, current_privilege_level(), org_RiscV__Fallback__Trap__lun); if(lun < 0) return (lun_data_t){.lun = org_RiscV__Fallback__ReturnMinusOne__lun, .data = 0}; return (lun_data_t){.lun = lun, .data = rs2 % (1<< (8*sizeof(long) - 12))} } struct lun_priv2device_subdevice cpu__device_subdevice_map[]; /* map (lun, priv) to struct device_subdevice pair. For lun = 0, or unknown (lun, priv) pair, returns (struct device_subdevice){NULL,0} */ static device_subdevice_t cpu__lookup_device_subdevice(const struct lun_priv2device_subdevice_map* dev_subdev_map, lun_t lun, enum privileges priv); /* functional description of the delegating xcmd0 .. xcmd7 instructions */ template //pretend this is C long xcmd(lun_data_t rs1, long rs2) { struct device_subdevice dev_subdev = cpu__lookup_device_subdevice(device_subdevice_map, rs1.lun, current_privilege()); if(dev_subdev.devAddr == NULL) cpu__trap_to(next_privilege); return dev_subdev.devAddr(dev_subdev.subdevId | k >> 12 , rs1, rs2); } /*Fallback interfaces*/ #define org_RiscV__Fallback__ReturnZero__uuid 1 #define org_RiscV__Fallback__ReturnMinusOne__uuid 2 /* fallback device */ static long cpu__falback(short subdevice_xcmd, lun_data_t rs1, long rs2) { switch(subdevice_xcmd % (1 << 12) ){ case 0 /* org.RiscV:ReturnZero */: return 0; case 1 /* org.RiscV:ReturnMinus1 */: return -1 default: trap("hardware configuration error"); } Example: // Fake UUID's #define com_bigbucks__Frobate__uuid 0xABCDE #define org_tinker_tinker__RocknRoll__uuid 0x12345 #define org_tinker_tinker__Jazz__uuid 0xBEB0B /* com.bigbucks:Frobate{ uuid: com_bigbucks__Frobate__uuid frobate rd rs1 rs2 : cmd0 rd rs1 rs2 foo rd rs1 rs2 : cmd1 rd rs1 rs2 bar rd rs1 rs2 : cmd1 rd rs1 rs2 } */ org.tinker.tinker:RocknRoll{ uuid: org_tinker_tinker__RocknRoll__uuid rock rd rs1 rs2: cmd0 rd rs1 rs2 roll rd rs1 rs2: cmd1 rd rs1 rs2 } /* Device 1 implements com.bigbucks::Frobate and org.tinker.tinker interfaces, uses a special command for the machine level implementation. */ long com_bigbucks__device1(short subdevice_xcmd, lun_data_t rs1, long rs2) { switch(subdevice_xcmd) { case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */ : return device1_frobate(rs1, rs2); case 0 | 7 << 12 /* com.bigbucks:Frobate:frobate */ : return device1_frobate_machine_level(rs1, rs2); case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device1_foo(rs1, rs2); case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device1_bar(rs1, rs2); case 1 | 0 << 12 /* org.tinker.tinker:RocknRoll:rock */ : return device1_rock(rs1, rs2); case 1 | 1 << 12 /* org.tinker.tinker:RocknRoll:roll */ : return device1_roll(rs1, rs2); default: trap(“hardware configuration error”); } } /* org.tinker.tinker:Jazz{ uuid: org_tinker_tinker__Jazz__uuid boogy rd rs1 rs2: cmd0 rd rs1 rs2 } */ /* Device 2 implements Frobate and Jazz interfaces */ long org_tinker_tinker__device2(short subdevice_xcmd, lun_data_t rs1, long rs2) { switch(dev_cmd.interfId){ case 0 | 0 << 12 /* com.bigbucks:Frobate:frobate */: return device2_frobate(rs1, rs2); case 0 | 1 << 12 /* com.bigbucks:Frobate:foo */ : return device2_foo(rs1, rs2); case 0 | 2 << 12 /* com.bigbucks:Frobate:bar */ : return device2_foo(rs1, rs2); case 1 | 0 << 12 /* org_tinker_tinker:Jazz:boogy */: return device2_boogy(rs1, rs2); default: trap(“hardware configuration error”); } } /* cpu assigns luns to the interfaces at different privilege levels on device1 and 2 to luns at manufacturing or boot up time */ #define cpu__Device1__Frobate__lun ((lun_t)32) #define cpu__Device1__RocknRoll__lun ((lun_t)33) #define cpu__Device2__Frobate__lun ((lun_t)34) #define cpu__Device2__Jazz__lun ((lun_t)35) /* struct uuid_dev2lun_map[] */ lun_map = { {{.uuid_devId = {org_RiscV__Fallback__ReturnZero__uuid , 0}, .priv = user}, .lun = org_RiscV__Fallback__ReturnZero__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnZero__uuid , 0}, .priv = super}, .lun = org_RiscV__Fallback__ReturnZero__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnZero__uuid , 0}, .priv = hyper}, .lun = org_RiscV__Fallback__ReturnZero__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnZero__uuid , 0}, .priv = mach} .lun = org_RiscV__Fallback__ReturnZero__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnMinusOne__uuid, 0}, .priv = user}, .lun = org_RiscV__Fallback__ReturnMinusOne__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnMinusOne__uuid, 0}, .priv = super}, .lun = org_RiscV__Fallback__ReturnMinusOne__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnMinusOne__uuid, 0}, .priv = hyper}, .lun = org_RiscV__Fallback__ReturnMinusOne__lun}, {{.uuid_devId = {org_RiscV__Fallback__ReturnMinusOne__uuid, 0}, .priv = mach}, .lun = org_RiscV__Fallback__ReturnMinusOne__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 0}, .priv = user} .lun = cpu__Device1__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 1}, .priv = super} .lun = cpu__Device1__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 1}, .priv = hyper} .lun = cpu__Device1__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 1}, .priv = mach} .lun = cpu__Device1__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 0}, .priv = super} .lun = cpu__Device2__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 0}, .priv = hyper} .lun = cpu__Device2__Frobate__lun}, {{.uuid_devId = {com_bigbucks__Frobate__uuid, 0}, .priv = mach} .lun = cpu__Device2__Frobate__lun}, {{.uuid_devId = {org_tinker_tinker__RocknRoll__uuid, 0}, .priv = user} .lun = cpu__Device1__RocknRoll__lun}, {{.uuid_devId = {org_tinker_tinker__RocknRoll__uuid, 0}, .priv = super} .lun = cpu__Device1__RocknRoll__lun}, {{.uuid_devId = {org_tinker_tinker__RocknRoll__uuid, 0}, .priv = hyper} .lun = cpu__Device1__RocknRoll__lun}, {{.uuid_devId = {org_tinker_tinker__RocknRoll__uuid, 0}, .priv = super}, .lun = cpu__Device2__Jazz__lun}, {{.uuid_devId = {org_tinker_tinker__RocknRoll__uuid, 0}, .priv = hyper}, .lun = cpu__Device2__Jazz__lun}, } /* cpu maps luns + privilege level to busaddress of device and particular subdevice according to spec of the device.*/ /* struct lun2dev_subdevice_map[] */ dev_subdevice_map = { // .lun = 0, will trap {{.lun = org_RiscV__Fallback__ReturnZero__lun, .priv = user}, .devAddr_interfId = {fallback, 1 /* ReturnZero */}}, {{.lun = org_RiscV__Fallback__ReturnZero__lun, .priv = super}, .devAddr_interfId = {fallback, 1 /* ReturnZero */}}, {{.lun = org_RiscV__Fallback__ReturnZero__lun, .priv = hyper}, .devAddr_interfId = {fallback, 1 /* ReturnZero */}}, {{.lun = org_RiscV__Fallback__ReturnZero__lun, .priv = mach}, .devAddr_interfId = {fallback, 1 /* ReturnZero */}}, {{.lun = org_RiscV__Fallback__ReturnMinusOne__lun, .priv = user}, .devAddr_interfId = {fallback, 2 /* ReturnMinusOne*/}}, {{.lun = org_RiscV__Fallback__ReturnMinusOne__lun, .priv = super}, .devAddr_interfId = {fallback, 2 /* ReturnMinusOne*/}}, {{.lun = org_RiscV__Fallback__ReturnMinusOne__lun, .priv = hyper}, .devAddr_interfId = {fallback, 2 /* ReturnMinusOne*/}}, {{.lun = org_RiscV__Fallback__ReturnMinusOne__lun, .priv = mach}, .devAddr_interfId = {fallback, 2 /* ReturnMinusOne*/}}, // .lun = 3 .. 7 reserved for other fallback RV interfaces // .lun = 8 .. 30 reserved as error numbers, c.li t1 31; bltu rd t1 L_fail tests errors // .lun = 31 reserved out of caution {{.lun = cpu__Device1__Frobate__lun, .priv = user}, .devAddr_interfId = {device1, 0 /* Frobate interface */}}, {{.lun = cpu__Device1__Frobate__lun, .priv = super}, .devAddr_interfId = {device1, 0 /* Frobate interface */}}, {{.lun = cpu__Device1__Frobate__lun, .priv = hyper}, .devAddr_interfId = {device1, 0 /* Frobate interface */}}, {{.lun = cpu__Device1__Frobate__lun, .priv = mach}, .devAddr_interfId = {device1,64 /* Frobate machine level */}}, {{.lun = cpu__Device1__RocknRoll__lun, .priv = user}, .devAddr_InterfId = {device1, 1 /* RocknRoll interface */}}, {{.lun = cpu__Device1__RocknRoll__lun, .priv = super}, .devAddr_InterfId = {device1, 1 /* RocknRoll interface */}}, {{.lun = cpu__Device1__RocknRoll__lun, .priv = hyper}, .devAddr_InterfId = {device1, 1 /* RocknRoll interface */}}, {{.lun = cpu__Device1__RocknRoll__lun, .priv = super}, .devAddr_interfId = {device2, 1 /* Frobate interface */}}, {{.lun = cpu__Device2__Frobate__lun, .priv = super}, .devAddr_interfId = {device2, 0 /* Frobate interface */}}, {{.lun = cpu__Device2__Frobate__lun, .priv = hyper}, .devAddr_interfId = {device2, 0 /* Frobate interface */}}, {{.lun = cpu__Device2__Frobate__lun, .priv = mach}, .devAddr_interfId = {device2, 0 /* Frobate interface */}}, {{.lun = cpu__Device2__Jazz__lun, .priv = super}, .devAddr_interfId = {device2, 1 /* Jazz interface */}}, {{.lun = cpu__Device2__Jazz__lun, .priv = hyper}, .devAddr_interfId = {device2, 1 /* Jazz interface */}}, }