Revision of Output and by-reference parameters from Tue, 10/19/2010 - 17:51

This request is retracted; see http://www.eiffelroom.org/node/467 for a demonstration of an object-oriented way of doing output and by-ref parameters.

I'm trying to lay out a case for adding parameter passing by reference and/or output parameters. I think this would fill a functionality hole in the language, and I hope to show that it is more than just a request for syntactic sugar.

The three main driving factors for this request are:

  • Maintaining Command-Query Separation (CQS)
  • Simplifying void-safety when communicating data between procedures
  • Allowing large procedures to be broken down without a performance penalty

Eiffel makes it awkward to communicate information between procedures. The main mechanisms are either to break CQS by setting `Result' in a function that modifies state, or to store the information in object state. The IO_MEDIUM.read_xxx variants are an example of this.

1) read_xxx is not a pure function: returning what was read breaks CQS because the input cursor is advanced.
2) read_xxx is not a pure procedure: it needs to communicate information about what happened in the procedure, namely what was read.
3) read_xxx needs to communicate that information, but the information is not relevant to the state of the IO_MEDIUM across all threads (or processors, in SCOOP terminology). Typically the processor that invoked read_xxx is the only processor interested in what was read.

The usual way of dealing with this today is to ignore the drawback in item 3 and write the information that needs to be communicated into object state. The two big issues with this are void-safety and performance.

The performance penalty is significant, and the only way to remove it is to change the structure of the program. When information from a procedure is written to object state, it has to be written out to main memory; it cannot be kept in a CPU register or on the stack, which is typically cached. The Eiffel compiler's own code shows how this issue was worked around: features were manually inlined into very large procedures so that locals could hold the procedure information. This has two bad results: large procedures and duplicated code.

The void-safety issue can be solved in only two ways: by using dummy values while the procedure information is not yet set (i.e. dummy values for IO_MEDIUM.last_xxx), or by making all such information variables detachable. The dummy-value strategy has the drawback that dummy values are hard to construct for some types: a dummy value for STRING_8 may be obvious, but one for a complex class may not be. The detachable strategy has the drawback that every access to IO_MEDIUM.last_xxx needs an object test, even though the attachment could be proven statically if both `read_xxx' and `last_xxx' were inlined.
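
As a minimal sketch of the detachable strategy, consider the two classes below. The names LINE_SOURCE, LINE_CLIENT and `echo' are hypothetical and only stand in for an IO_MEDIUM-like class and its client; the manifest string is a placeholder for real input. The point is that the client still has to write an object test, even though the postcondition of `read_string' already promises attachment.

```eiffel
class LINE_SOURCE
  -- Hypothetical stand-in for an IO_MEDIUM-like class.
feature
 last_string: detachable STRING
   -- Result of the last `read_string'; Void before the first read.

 read_string
   -- Read a string and store it in `last_string'.
  do
   last_string := "hello" -- Placeholder for real input.
  ensure
   last_string_set: last_string /= Void
  end
end

class LINE_CLIENT
  -- Hypothetical client of LINE_SOURCE.
feature
 echo (src: LINE_SOURCE)
   -- Read one string from `src' and print it.
  do
   src.read_string
   -- The postcondition holds at run time, but the compiler cannot
   -- rely on it, so the object test is still required here.
   if attached src.last_string as s then
    print (s)
   end
  end
end
```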

What makes output parameters particularly difficult is that the call needs to change the attachment status of the argument from detachable to attached. Essentially:

a_feat
 local
  a: STRING -- `a' is attached
 do
  -- `a' is not yet set
  b_feat (a) -- This would fail current CAPs
  -- `a' is now assigned
 end

b_feat (input: STRING)
 do
  -- `input' is not assigned
  input := "hello" -- Changes from detachable to attached.  Assignment to parameters is not allowed
  -- `input' is now assigned
 end

By-ref parameters, by contrast, would not change the attachment status of a parameter.

The other issue is that parameters are, for good reason, not assignable in Eiffel; introducing `output' or `byref' parameters would change that.

One syntax option is to separate the parameter list into by-value, by-reference (`passref') and output (`passout') blocks. Parameters in the first block behave like existing parameters: they can only be used, not assigned. `passref' parameters can be both used and assigned; they are not scratch space, however, because the parameter is passed by reference and assignments are visible to the caller. Parameters in the last block can only be assigned to, and if they are attached, they must be assigned to.

a_feat
 local
  a: STRING
  b: detachable STRING
  c: STRING
 do
  a := "hello"
  b_feat (a, passref b, passout c)
  -- `a' = "hello"
  -- `b' = "hello" aliased with `a'
  -- `c' = "hello" aliased with `a'
  b_feat (a, passref b, passout c)
  -- `a' = "hello"
  -- `b' = "hello" aliased with `a'
  -- `c' = "hellohello"
 end

b_feat (one: STRING passref two: detachable STRING passout three: STRING)
 do
  if attached two as two_l then
   three := one + two_l
  else
   two := one
   three := one
  end
 end

This allows us to not use global state for procedure information as in our `read_xxx' procedures:

read_string (passout target: STRING_8)
 do
  <read from IO>
  target := <data_read_from_IO>
 end

It also allows procedures to be broken up without breaking CQS and without a performance penalty:

a_feat
 local
  i: INTEGER
 do
  from
  until
   i > 100_000_000
  loop
   very_big_feat (passref i)
  end
 end

very_big_feat (passref i: INTEGER)
 local
  j: INTEGER
 do
  -- Lots of operations
  medium_procedure (i, passref j)
  -- Other long operations
  if j > 50 then
   i := i + 1
  else
   i := i + 2
  end
 end

I'm not tied to any particular syntax for solving the problem. I'm interested in whether anyone thinks this issue is worth addressing and, if so, in any critiques of the rough syntax above.


Another option, which would address the performance issue but not the void-safety kluge, would be to decorate parameters as "not assigned". If a parameter is marked "not assigned", it can only be passed on as an argument to routines whose corresponding parameter is also "not assigned", and it can never be assigned to an attribute. If a local reference object is created, never assigned to object state, and only passed to routines whose parameter is marked "not assigned", then the object can be allocated on the stack instead of on the heap. This would also allow a CAP in creation procedures that permits mutually recursive references between void-safe objects without the need for `stable' on attributes.

Comments

I tackle this in the

colin-adams

I tackle this in the following way:

  • I use a DS_CELL [information-type] as an output argument.
  • Preconditions say this cell must be non-void, but its contents must be void.
  • Postcondition says the cell's contents must be non-void (in most use cases).
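
A minimal sketch of this pattern is shown below. It assumes EiffelBase's CELL (with its `put' creation procedure) rather than Gobo's DS_CELL, and the class name CELL_OUTPUT_DEMO and the features `read_word' and `demo' are hypothetical. In void-safe code the attached argument type already guarantees that the cell itself is non-void, so only the contents are checked in the contracts.

```eiffel
class CELL_OUTPUT_DEMO
  -- Hypothetical example of a cell used as an output argument.
feature
 read_word (target: CELL [detachable STRING])
   -- Put a freshly read word into `target'.
  require
   target_empty: target.item = Void
  do
   target.put ("hello") -- Placeholder for real input.
  ensure
   target_filled: target.item /= Void
  end

 demo
   -- Call `read_word' and print what it produced.
  local
   c: CELL [detachable STRING]
  do
   create c.put (Void)
   read_word (c)
   -- The caller still needs an object test to use the contents.
   if attached c.item as s then
    print (s)
   end
  end
end
```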

This is what I use right now

This is what I use right now, and I agree it works well. The drawbacks I see are that the DS_CELL has to be allocated through the heap memory allocator, which is a performance hit compared to manual inlining and large procedures, and that it's still a little klugy with void-safety, in that one needs to test whether the item is attached. It satisfies the requirements for by-ref parameters except for the memory allocation.

Berend and I were

Berend and I were hypothesizing on how to do output parameters without language changes. What do you think of this:

Let's take the IO_MEDIUM.read_string example.

If IO_MEDIUM.read_string were actually:

class IO_MEDIUM
feature
 read_string (target: CELL [STRING])
  do
   gather_from_io_medium
   target.put (string_from_io_medium)
  end
end

change it to

class IO_MEDIUM
end

expanded class STRING_FROM_IO_MEDIUM
feature
 make (source: IO_MEDIUM)
  do
   gather_from_io_medium
   item := string_from_io_medium
  end
 item: STRING
end

This way `item' is attached via the creation procedure CAP and since {STRING_FROM_IO_MEDIUM} is expanded, it gives the performance gain from stack allocation.

You need to complete the sketch

colin-adams

You need to show the new implementation for read_line (and the old one, for that matter). I think I get the idea, but I suspect it won't be thread-safe re-entrant (not that that matters for read_line, but for general applicability it does).
