Phaedrus Systems Blog:Rust Pt1 by the Wicked Witch

Chipping away the Rust – Part1: Avoiding memory errors

The Wicked Witch

TWW signed out last year with a heads-up piece on Mozilla's new language Rust. This month she gets into some details.

Rust fans speak of the trifecta of Rust, by which they mean its safety, speed and concurrency. Let's deal with speed first. The Rust compiler emits LLVM intermediate representation (bitcode), from which translation to native code is straightforward. Native code generation is currently supported for ARM cores. Rust programs require either no runtime system or a very small one. Thus, as more microcontroller targets get supported, Rust could become a serious option for embedded systems developers. To do that, however, it needs to solve some of C's inherent problems, one of which is the ease with which memory errors can occur as a result of out-of-bounds / out-of-scope, uninitialised or freed memory access and uncontrolled aliasing.

Rust does not address all such memory errors, and developers of high-integrity code will still have to avoid algorithms for which either maximum stack or maximum heap use cannot be determined at compile time. Though some might think this a cop-out, it is actually a pragmatic tactic helping to avoid the sources of many common programming errors. Rust's features for memory error avoidance lie in its type system and the ideas it has adopted from functional programming, key aspects of which are:

* though assignment is permitted, it is built on the deeper concept of binding,

* variables are immutable (effectively const) by default,

* the use of references (pointers) is strictly controlled,

* a neat combination of name- and lifetime-based strong typing together with type-inference based on functional programming's pattern concept.

A recently published and very accessible Rust text is Rust Essentials by Ivo Balbaert, Packt Publishing, May 2015, ISBN-13: 978-1-78528-576-9, from which TWW has adapted the following examples. They don't tell the whole technical story but are meant to give a flavour of Rust's approach to avoiding memory errors.

Example 1: Strings

Misuse of strings is a very common source of memory errors in C. Unlike C strings, Rust strings are not null-terminated (and can contain null characters). The str type is immutable and has a fixed size; a string literal, the commonest case, is a reference to a str with a static lifetime (storage duration). The String type can change in size and therefore must be allocated on the heap. Most uses of strings can be accomplished using only the str type, in which case the prescribed static lifetime, fixed length and immutability force the programmer to observe coding discipline. True, the same effect can be had in C by restricting language usage, but with Rust the compiler complains, declares an error and produces no executable. You don't need a separate static checker. And if the great unwashed use String instead of str, you can detect this effectively with easily scripted regular-expression searching.
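To give a flavour of the difference between the two string types, here is a minimal sketch. The function names greeting and shout are TWW's illustrative inventions, not anything from Balbaert's book or the standard library.

```rust
// Illustrative names only: `greeting` and `shout` are hypothetical helpers.

/// A string literal lives in the program image, so a reference to it
/// can carry the 'static lifetime.
fn greeting() -> &'static str {
    "hello"
}

/// A String owns a growable buffer on the heap.
fn shout(s: &str) -> String {
    let mut owned = String::from(s); // heap allocation happens here
    owned.push('!');                 // allowed because `owned` is declared `mut`
    owned
}

fn main() {
    // No terminator is needed: the length travels with the reference,
    // so an embedded null character is just another character.
    let with_nul = "a\0b";
    println!("{} has {} bytes", greeting(), with_nul.len());
    println!("{}", shout(greeting()));
}
```

Searching a code base for String::from, to_string and similar calls is one rough way of finding where heap-allocated strings creep in.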

Example 2: Binding, owning and borrowing

Rust variables are declared by giving them a binding. In Rust the binding let n:i32; declares n to be a variable of type i32 – a 32-bit integer. Similarly one could write let n:i32 = 47; which effectively declares and initialises a 32-bit integer n. Another option is let n = 47; where the i32 type is inferred by the compiler. Rust's binding is, however, a broader concept than C's declaration. First, unless specified mut (for mutable), the variable cannot be altered. This reverses the C convention of variables being alterable unless specified const. That in itself is not much but it enables Rust to support some simple and effective anti-aliasing mechanisms based on its concepts of ownership and borrowing.
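The three forms of binding, and the effect of mut, can be sketched as follows. The function name bindings and the variables m and total are merely illustrative.

```rust
// Hypothetical example function; the names are illustrative only.
fn bindings() -> i32 {
    let n: i32;        // declared with an explicit type, not yet initialised
    n = 47;            // the one permitted initialising assignment
    let m = 47;        // type inferred by the compiler (i32 here)
    let mut total = n; // `mut` makes this binding alterable
    total += m;
    // n += 1;         // would not compile: `n` was not declared `mut`
    total
}

fn main() {
    println!("{}", bindings());
}
```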

The binding let mut myvar1: i32 = 47; is also understood as creating a variable called myvar1 that owns a resource that is a chunk of memory containing a 32-bit integer. Only the owner of that chunk of memory can change it and there can be only one owner at a time. Also the owner is responsible for freeing the resource when the owner goes out of scope. (Not the same as a reference to the resource going out of scope.) Ownership of a resource can be changed by a new binding such as let myvar2 = myvar1; which makes myvar2 the owner of the resource and leaves myvar1 no longer usable. (Strictly, simple types such as i32 are copied rather than moved, so for them both names remain usable; the transfer of ownership applies to types that manage a resource, such as String.) Thus the underlying memory is accessible only via the current owner. (Formal methods fans may recognise this as a neat development of the single-assignment paradigm.)
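A minimal sketch of the transfer of ownership follows. It uses a String rather than an i32, because integers are copied on binding and so would not show the effect. The function names transfer and copy_semantics are illustrative assumptions.

```rust
// Hypothetical example functions; the names are illustrative only.

/// Ownership of a heap-backed String moves from myvar1 to myvar2.
fn transfer() -> usize {
    let myvar1 = String::from("forty-seven");
    let myvar2 = myvar1;       // myvar2 is now the owner
    // println!("{}", myvar1); // would not compile: value was moved
    myvar2.len()
} // myvar2 goes out of scope here and the buffer is freed

/// i32 implements Copy, so binding duplicates the value instead.
fn copy_semantics() -> i32 {
    let a: i32 = 47;
    let b = a; // a copy is made; `a` remains usable
    a + b
}

fn main() {
    println!("{} {}", transfer(), copy_semantics());
}
```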

On the other hand the binding let myvar2 = &mut myvar1; allows myvar2 to borrow temporarily the right to access the memory resource belonging to myvar1. While the borrowing is in effect, myvar1 itself cannot be used to access that memory and the Rust compiler will flag any attempt to do so as an error. Full access returns to myvar1, however, once the borrow ends, for instance when myvar2 goes out of scope while myvar1 remains in scope. Conversely the compiler will not let myvar2 outlive myvar1. This is an elegant way to prevent dangling pointer errors. It rests on Rust's notion of binding, which separates the notion of addressing memory from the right to access that memory and works with right-of-access in place of value assignment.
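Borrowing can be sketched like this. The function name borrow_and_return is an illustrative assumption, and the inner braces are there only to make the extent of the borrow visible.

```rust
// Hypothetical example function; the name is illustrative only.
fn borrow_and_return() -> i32 {
    let mut myvar1: i32 = 47;
    {
        let myvar2 = &mut myvar1; // mutable borrow begins
        // myvar1 += 1;           // would not compile: myvar1 is borrowed
        //                        // and myvar2 is still used below
        *myvar2 += 1;             // access goes through the borrower
    }                             // myvar2 goes out of scope, borrow ends
    myvar1 += 1;                  // the owner has full access again
    myvar1
}

fn main() {
    println!("{}", borrow_and_return());
}
```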

Example 3: Lifetime constraints

We've already mentioned in passing Rust's notion of lifetime as being similar to C's storage duration. After seeing Rust's features of ownership and borrowing, it should come as no surprise that lifetime too is part of the memory safety system. Suppose we declare a struct type as:

struct Position {
    latitude: &'static f32,
    longitude: f32,
}

which we then try to return from a function declared as:

fn return_position<'a>() -> &'a Position {
    let pos = Position { latitude: &0.0, longitude: 0.0 };
    &pos // rejected by the compiler: pos is freed when the function returns
}

Here the return type of the function is a reference to a Position with lifetime 'a, while one of the fields of Position has lifetime 'static. The local variable pos, however, lives only until the function returns, so it cannot satisfy whatever lifetime 'a the caller has in mind. Rust will not allow this conflict of lifetimes, which is a further way of avoiding potentially dangling references. Moreover, while the 'static lifetime is conventionally the same as static in C, the 'a lifetime is a name designated by the programmer. Thus Rust allows lifetime designations other than C's static, auto or dynamically allocated, enabling the programmer to define his own lifetime integrity constraints that may be more restrictive than the conventional ones and hence permit stronger error checking.
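For contrast, here is a sketch of two functions that the compiler does accept. It uses a simplified Position with plain f32 fields, and the names ORIGIN, origin and northernmost are TWW's illustrative assumptions rather than anything from the book.

```rust
// Simplified, hypothetical variant of the Position example.
struct Position {
    latitude: f32,
    longitude: f32,
}

// A static item exists for the whole run of the program.
static ORIGIN: Position = Position { latitude: 0.0, longitude: 0.0 };

/// Accepted: the referent really does have the 'static lifetime.
fn origin() -> &'static Position {
    &ORIGIN
}

/// Accepted: 'a ties the result to the arguments, so the returned
/// reference can never outlive either of the positions passed in.
fn northernmost<'a>(p: &'a Position, q: &'a Position) -> &'a Position {
    if p.latitude >= q.latitude { p } else { q }
}

fn main() {
    let london = Position { latitude: 51.5, longitude: -0.1 };
    let north = northernmost(&london, origin());
    println!("{} {}", north.latitude, north.longitude);
}
```

In both cases the programmer states the lifetime relationship in the signature and the compiler checks the body against it.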

Summing up ...

Readers should appreciate that in the space available in TWW's column, only a few selected aspects of Rust's memory error prevention can be outlined. The aim is to stimulate interest. Readers wanting more can refer to Balbaert's book or to the online documentation at: https://www.rust-lang.org

Right now, at least according to Amazon, Balbaert's book is the only one in print on Rust, though O'Reilly Associates have a title coming out later this year. For the moment almost all Rust's documentation is on the website and much of it is not the easiest of reading. Nevertheless TWW's recommendation is to keep a watchful eye on Rust. Though by no means perfect it is by far the best attempt to improve on C that the degenerate spawn of mutinous colonists have yet contrived :-O.

The Wicked Witch

January 2016