Revision of Mon, 09/13/2010 - 21:32 by Colin LeMahieu

Output and by-reference parameters

I'm trying to lay out a case for adding parameter passing by reference and/or output parameters. I think this functionality would fill a gap in the language, and I hope to show that this is more than just a request for syntactic sugar.

The three main driving factors for this request are:

  • Maintaining Command Query Separation
  • Simplifying void-safety when communicating data between procedures
  • Allowing breaking down of large procedures without performance penalty

Eiffel has difficulty communicating information between procedures. The two main mechanisms are either to break CQS by setting `Result' in a function that modifies state, or to pass the information through object state. The IO_MEDIUM.read_xxx variants are an example of this:

  1) read_xxx is not a pure function; returning what was read would break CQS because the input cursor is advanced.
  2) read_xxx is not a pure procedure; it needs to communicate information about what happened in the procedure, namely what was read.
  3) read_xxx needs to communicate information about the procedure, but this information is not relevant to the state of the IO_MEDIUM across all threads (or processors, in SCOOP terminology). Typically the processor that invoked read_xxx is the only processor interested in what was read.

The usual way of dealing with this today is to ignore the drawback in point 3 and write the information that needs to be communicated to object state. The two big issues with this are void-safety and performance.
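
As a rough illustration of the current pattern, the sketch below reads a line through the command `read_line' and then fetches it through the query `last_string'. It assumes EiffelBase's standard input object (`io.input', a descendant of IO_MEDIUM); the class name READ_DEMO is hypothetical.

```eiffel
class
	READ_DEMO
		-- Hypothetical class name, for illustration only.

create
	make

feature {NONE} -- Initialization

	make
			-- Read one line from standard input and echo it.
		local
			l_input: IO_MEDIUM
		do
			l_input := io.input
				-- Command: advances the input cursor and records what was read.
			l_input.read_line
				-- Query: returns the value recorded in object state.
			print (l_input.last_string)
			print ("%N")
		end

end
```

The value travels from `read_line' to its caller only through the state of the medium object, which is the situation described in point 3.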

The performance penalty is big, and the only way to avoid it is to change the structure of the program. When information from a procedure is written to object state, it has to be written out to main memory and cannot be kept in a CPU register or on the stack, which is typically cached. The Eiffel compiler code shows how this issue was worked around: very large procedures were created so that locals could hold procedure information. This results in two bad things: large procedures and duplicated code.

The void-safety issue can be solved in only two ways: by using dummy values when procedure information is not set (i.e. dummy values for IO_MEDIUM.last_xxx), or by making all information variables detachable. The dummy value strategy has the drawback that dummy values are hard to create for some objects. A dummy value for STRING_8 may be obvious, but a dummy value for a complex class may not be. The detachable variable strategy has the drawback that every access to IO_MEDIUM.last_xxx needs an object test, even though the attachment could be proved statically if both `read_xxx' and `last_xxx' were inlined.
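
A minimal sketch of the detachable strategy follows, using a hypothetical class ITEM_READER (its name and features are assumptions for illustration, not part of EiffelBase). A fixed string stands in for real input.

```eiffel
class
	ITEM_READER
		-- Hypothetical class, for illustration only.

feature -- Access

	last_item: detachable STRING
			-- Item recorded by the most recent `read_item'; Void until then.

feature -- Basic operations

	read_item
			-- Record a value in `last_item'.
			-- (A fixed string stands in for real input here.)
		do
			last_item := "hello"
		ensure
			item_set: attached last_item
		end

	print_item
			-- Read an item, then use it.
		do
			read_item
				-- `last_item' is declared detachable, so an object test
				-- is still required here even though `read_item' just set it.
			if attached last_item as l_item then
				print (l_item)
			end
		end

end
```

Even with the postcondition on `read_item', the client has to repeat the object test at every use of `last_item'.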

What makes output parameters particularly difficult is that a call needs to change the attachment status of the argument from detachable to attached. Essentially:

a_feat
 local
  a: STRING -- `a' is attached
 do
  -- `a' is not yet set
  b_feat (a) -- This would fail the current CAPs (certified attachment patterns)
  -- `a' is now assigned
 end
 
b_feat (input: STRING)
 do
  -- `input' is not assigned
  input := "hello" -- Changes from detachable to attached.  Assignment to parameters is not allowed
  -- `input' is now assigned
 end

By-ref parameters would not change the attachment status of a parameter.

The other issue is that parameters are not assignable in Eiffel, for good reason; creating `output' or `byref' parameters would introduce an exception to that rule.

One syntax option is to separate the parameter block into by-value, by-reference, and output sections. Parameters in the first block behave like existing parameters: they can only be used, not assigned. `passref' parameters can be both used and assigned; however, they are not scratch space, because the parameter is passed by reference. Parameters in the last block, `passout', can only be assigned to, and if they are attached, they must be assigned to.

a_feat
 local
  a: STRING
  b: detachable STRING
  c: STRING
 do
  a := "hello"
  b_feat (a, passref b, passout c)
  -- `a' = "hello"
  -- `b' = "hello" aliased with `a'
  -- `c' = "hello" aliased with `a'
  b_feat (a, passref b, passout c)
  -- `a' = "hello"
  -- `b' = "hello" aliased with `a'
  -- `c' = "hellohello"
 end
 
b_feat (one: STRING passref two: detachable STRING passout three: STRING)
 do
  if attached two as two_l then
   three := one + two_l
  else
   two := one
   three := one
  end
 end

This allows us to avoid global state for procedure information, as in our `read_xxx' procedures:

read_string (passout target: STRING_8)
 do
  <read from IO>
  target := <data_read_from_IO>
 end

It also allows procedures to be broken up without breaking CQS and without a performance penalty:

a_feat
 local
  i: INTEGER
 do
  from
  until
   i > 100_000_000
  loop
   very_big_feat (passref i)
  end
 end
 
very_big_feat (passref i: INTEGER)
 local
  j: INTEGER
 do
  -- Lots of operations
  medium_procedure (i, passref j) 
  -- Other long operations
  if j > 50 then
   i := i + 1
  else
   i := i + 2
  end
 end

I'm not tied to any particular syntax for solving the problem. I'm interested in whether anyone thinks this issue is worth addressing and, if so, in any critiques of the rough syntax above.