12

BCPL, a major influence on C, features an operator ! that serves the dual role of array subscripting arr!i and pointer indirection !ptr. The underlying idea, as I understand it, is that a pointer can be viewed as an index into main memory, treating RAM conceptually as an ambient (and hence tacit) array of words.

In B and later C, BCPL's ! was bifurcated into the prefix * for indirection and the postfix [ ] for subscripting. What motivated this? Why not [ptr] == 0[ptr] == ptr[0] == *ptr?

  • Unless someone can dig up sources from interviews etc., the answer will probably remain opinion-based and speculation, but data structures were a hot topic in other languages at that time (Algol), and those languages did distinguish between pointers and arrays. IIRC it also makes a difference for lvalues vs. rvalues (but I would have to look this up). Commented 23 hours ago
  • "0[ptr] == ptr[0] == *ptr" -- this part actually holds in C. Because a[b] is defined as *(a+b), there is no difference between a[b] and b[a]. Commented 18 hours ago
  • It depends on the types of a and b, but as long as one is an integer and the other a pointer, you're fine. a[b] is equivalent to the expression *((a) + (b)). Adding an integer to a pointer gives an address depending on the pointer type, regardless of whether the integer or the pointer comes first. Adding two pointers isn't allowed. Commented 15 hours ago
  • Before the addition, the integer is multiplied by the size of the object type the pointer points to. Therefore, *(ptr+index) is always fine, and is the same as ptr[index] or index[ptr]. Commented 14 hours ago
  • Note that BCPL had neither data structures nor pointers as in C; instead it had word pointers (to the pain of Amiga users), so with 32-bit words and 8-bit bytes all pointers were byte addresses divided by 4. Since C improved on that part, could that be linked with the syntax change (either because some version had both and needed different syntax, or just to avoid confusion)? Commented 14 hours ago

3 Answers

10

dirkt’s answer explains the difference between a pointer and an array, and I think that is the reason for the split. Dennis M. Ritchie explains it thus in The Development of the C Language:

These semantics represented an easy transition from B, and I experimented with them for some months. Problems became evident when I tried to extend the type notation, especially to add structured (record) types. Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as

struct {
  int  inumber;
  char name[14];
};

I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?

The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

In a struct, an array declaration and a pointer declaration aren’t the same:

struct {
    char buffer[14];
    char *bufptr;
};

The array declaration reserves space for the array, the pointer declaration only reserves space for the pointer. buffer can be used as a pointer but the pointer itself doesn’t appear in the data structure.

  • This is all true, but it does not mean a notational difference is necessary. a*i could continue to mean *(a+i) with the restriction that a must be a declared array name; it can coexist with *p. Rather, I'd guess that once you've made the step that arrays and pointers are semantically distinct, it frees you to use different operator symbols. Or maybe Ritchie just liked a little sugar in his syntax. Commented 10 hours ago
8

Partial answer:

There is one important difference between a pointer and an array when each is considered as a declared name. From K&R 2nd edition, section 5.3:

There is one difference between an array name and a pointer that must be kept in mind. A pointer is a variable, so pa=a and pa++ are legal. But an array name is not a variable; constructions like a=pa and a++ are illegal.

So an array declaration reserves space for the elements, while a pointer declaration reserves space only for the pointer itself. Without the distinction between pointer and array, one couldn't reserve space for an array by declaring it (though one could still use malloc etc.).

Note that this doesn't extend to formal parameters (same chapter):

Within the called function, this argument is a local variable, and so an array name parameter is a pointer, that is, a variable containing an address.

[...]

As formal parameters in a function definition, char s[]; and char *s; are equivalent; we prefer the latter because it says more explicitly that the variable is a pointer.

I don't know whether that was the reason for the split between arrays and pointers (and I actually do not know how B handles array allocation). As I wrote in my comment, I am not sure the reason can be found out at all, but the above is at least one important difference.

Also note that, as already pointed out in the comments, array indexing and pointer arithmetic are otherwise largely interchangeable:

a reference to a[i] can also be written as *(a+i). In evaluating a[i], C converts it to *(a+i) immediately; the two forms are equivalent. Applying the operator & to both parts of this equivalence, it follows that &a[i] and a+i are also identical: a+i is the address of the i-th element beyond a. As the other side of this coin, if pa is a pointer, expressions might use it with a subscript; pa[i] is identical to *(pa+i).

4

Why choose brackets for array indexing over an infix operator?

Surely the answer is that brackets were in common use for array indexing, and binary operators (whether represented by exclamation mark or anything else) were not.

Ritchie et al. were familiar with Algol-class languages that used brackets. This may be bias on my part, but brackets seem simpler, especially when the index is an expression rather than a simple variable: compare a[b+c] with (presumably) a!(b+c).

Why not choose brackets for the indirection operator?

Why not [p] for indirection? I have no solid answer, just the observation that, judging by most other languages (possibly excepting MACRO-10), it seems the less obvious choice.

Why choose asterisk for indirection? Because that was the indirection operator they used in the Unix PDP-11 assembler. (And that choice appears to have been made because @, as used in the DEC PAL-11 assembler, was the line-kill character in the Unix shell.)

To reinforce the choice of brackets for array indexing: as a practical matter, once *p was chosen for indirection through a pointer, a*i as an array reference (by analogy with a!i) is problematic, to programmers if not to the compiler, because the symbol * already has several other uses.

This all looks like it boils down to a matter of taste.
