References and Values

Learning Goals

At the end of this lesson, students should be able to...

  • Discuss how objects are stored in Ruby
  • Differentiate between references and values
  • Compare modifying an object with reassigning a variable

Motivation

We're going to start today with some Ruby code. It's a method that takes an array of strings as an argument and returns truncated versions of those strings: any string with more than three characters has its end chopped off. But it also does something a little unexpected.

def short_strings(input)
  input.each_with_index do |word, i|
    # Slice characters 0 to 2
    input[i] = word[0..2]
  end
  return input
end

pets = ['dog', 'parrot', 'cat', 'llama']
shortened_pets = short_strings(pets)
puts "pets: #{pets}"
puts "shortened_pets: #{shortened_pets}" 

Running this code results in

pets: ["dog", "par", "cat", "lla"]
shortened_pets: ["dog", "par", "cat", "lla"]

The pets array is changed! Why? I thought the input was copied over when given to a method!

References and Values

When we create an array in Ruby (or a string or a hash or any other complex data type), we're actually creating two things.

The first is the value of the array, which involves asking the operating system for a bit of memory and then putting our data in it. You can think of this as the actual object. Each piece of memory we get from the OS has an address representing its physical location in hardware, which is how we get back to it later.

The second is a reference to the array, which ties together the address of that memory with a name for our program to use. The address part of a reference is sometimes called a pointer (especially in low-level languages like C and C++), and we say that a variable points to or references an object.

This split between references and values comes up often, both in computing and in the wider world. Here are some examples:

| Reference | Value |
|---|---|
| Street address | Your house |
| URL | Web page or file |
| File path on hard drive | The contents of that file |
| A named variable | The contents of that variable |

Recalling our example above, saying

pets = ['dog', 'parrot', 'cat', 'llama']

and imagining that the pets array has been stored at address 1234, we would get the following memory layout:

[Figure: references and variables]

Every variable in Ruby consists of these two parts, a reference and a value. Normally when you type the variable's name, Ruby automatically goes and gets the object. You'll almost never need to use the address yourself. If you do want to find out what object your variable references, you can use the object_id method:

pets = ["dog", "parrot", "cat", "llama"]
puts "pets.object_id: #{pets.object_id}"

# Different objects have different IDs
veggies = ["turnip", "beet"]
puts "veggies.object_id: #{veggies.object_id}"

The = Operator

The = operator changes what a variable points at.

If we assign one variable to another variable, they will both reference the same underlying object.

# Two variables can point to the same object
repeat = veggies
puts "repeat.object_id: #{repeat.object_id}" # same as veggies.object_id

[Figure: referencing a variable twice]

If we make changes to the object through one variable, we can see the changes via the other. The variables have different names, but the underlying object is the same.

puts "#{veggies}"     # ["turnip", "beet"]
puts "#{repeat}"      # ["turnip", "beet"]

veggies[1] = "onion"
puts "#{veggies}"     # ["turnip", "onion"]
puts "#{repeat}"      # ["turnip", "onion"]

repeat.push("potato")
puts "#{veggies}"     # ["turnip", "onion", "potato"]
puts "#{repeat}"      # ["turnip", "onion", "potato"]

[Figure: modifying the underlying object]

When we use the = operator, we are not changing the underlying object but instead changing what our variable points to. This does not affect any other variables.

repeat = ["new", "array"]
puts "repeat.object_id: #{repeat.object_id}"
puts "value of repeat:"
puts "#{repeat}"    # ["new", "array"]
puts "value of veggies:"
puts "#{veggies}"   # ["turnip", "onion", "potato"]

[Figure: creating a new array]

So to summarize, if two variables point to the same underlying object:

  • Modifications to the object (the value) will be visible from both variables
  • Reassigning one variable (the reference) with = does not affect the other variable

Identifying Reassignment

One subtle point is that += and the other shorthand operators all involve reassignment. If we say veggies += ['rutabaga'], Ruby creates a new array, copies all the values from veggies, adds in rutabaga, and reassigns veggies to point to this new array. This is true of strings and numbers as well.

On the other hand, << does not involve reassignment. << is shorthand for the .push() method, which changes the underlying object, not the variable itself (most methods work this way). Saying veggies << 'rutabaga' will modify the original array referenced by veggies, and other variables referencing that array will be able to see the changes as well.

So how do you tell whether an operation involves reassignment? A good rule of thumb is that anything involving the = sign will reassign the variable, and any other operation (like <<, .push() or .concat()) will not.
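One way to check this rule of thumb is to compare object_ids before and after each operation. The following is a minimal sketch using only core Ruby; the variable names same_array and original_id are made up for illustration.

```ruby
veggies = ["turnip", "beet"]
same_array = veggies              # second variable referencing the same array
original_id = veggies.object_id

# << modifies the underlying array; no reassignment happens
veggies << "rutabaga"
puts veggies.object_id == original_id   # true
puts "#{same_array}"                    # ["turnip", "beet", "rutabaga"]

# += builds a new array and reassigns veggies to point at it
veggies += ["carrot"]
puts veggies.object_id == original_id   # false
puts "#{veggies}"                       # ["turnip", "beet", "rutabaga", "carrot"]
puts "#{same_array}"                    # ["turnip", "beet", "rutabaga"]
```

After the +=, same_array still references the original array, so it never sees "carrot".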

Passing Parameters

Question: When we pass a parameter to a method, what does the method get?

  • Is it the same underlying object?
  • Is it the same variable?
  • How can we find out?

Let's write some code that will help us investigate this.

def reassign_parameter(param)
  puts "  Inside reassign_parameter"
  puts "  at start, param.object_id is #{param.object_id}"

  # .push modifies the underlying object
  param.push('gecko')
  puts "  after modification, param.object_id is #{param.object_id}"

  # = changes the reference
  param = ["new", "array"]
  puts "  after reassignment, param.object_id is #{param.object_id}"
  puts "  with value #{param}"
  puts "  Finish reassign_parameter"
end

pets = ["dog", "parrot", "cat", "llama"]
puts "Before reassign_parameter"
puts "pets.object_id is #{pets.object_id}"
puts "with value #{pets}"
puts

reassign_parameter(pets)

puts
puts "After reassign_parameter"
puts "pets.object_id is #{pets.object_id}"
puts "with value #{pets}"

Before running this code, take a couple minutes to read through it. What is it doing? What do you expect the output to be?

If we run the code, we see something like this (your object_ids may be different):

Before reassign_parameter
pets.object_id is 70144030241620
with value ["dog", "parrot", "cat", "llama"]

  Inside reassign_parameter
  at start, param.object_id is 70144030241620
  after modification, param.object_id is 70144030241620
  after reassignment, param.object_id is 70144030228060
  with value ["new", "array"]
  Finish reassign_parameter

After reassign_parameter
pets.object_id is 70144030241620
with value ["dog", "parrot", "cat", "llama", "gecko"]

Uhhhhhh, what? Let's break it down visually.

First, we create the array with its real values, and assign pets to hold its address.

[Figure: pets gets assigned the address 1389, which contains the values dog, parrot, cat, llama]

When we pass pets into reassign_parameter, param gets assigned the same address as pets.

[Figure: param is assigned the address 1389 when we call reassign_parameter(pets)]

We push "gecko" onto the end of the array. This changes the values that pets sees too, because param looks up the same array that pets is looking at.

[Figure: pets and param both point to address 1389, which now also contains gecko]

Finally, we create a new array, and hand its address over to param to keep an eye on. This doesn't affect pets, because pets holds the original array's address independently of param.

[Figure: we create a new array, and then assign param to track it]

This is exactly the same behavior we saw before, when we had two variables referencing the same object. From this we can conclude: when you pass a variable as a parameter, Ruby creates a new variable that references the same object.

Fixing the short_strings Method

Question: Given what we've learned, how can we modify our short_strings method to do what we want?

The answer is to build a new array for the return value rather than updating the input. Here's what the resulting code might look like:

def short_strings(input)
  result = []
  input.each do |word|
    # Slice characters 0 to 2
    result << word[0..2]
  end
  return result
end

pets = ['dog', 'parrot', 'cat', 'llama']
shortened_pets = short_strings(pets)
puts "pets: #{pets}"
puts "shortened_pets: #{shortened_pets}"

Alternatively, there is a very handy method called .clone which will make a copy of an object at a new address. If we wanted to use .clone to fix our bug, it might look like this:

def short_strings(input)
  output = input.clone
  output.each_with_index do |word, i|
    # Slice characters 0 to 2
    output[i] = word[0..2]
  end
  return output
end

pets = ['dog', 'parrot', 'cat', 'llama']
shortened_pets = short_strings(pets)
puts "pets: #{pets}"
puts "shortened_pets: #{shortened_pets}" 

This WILL preserve pets as it was while giving us the shortened_pets value we want.
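One caveat worth knowing: .clone copies the array itself, not the strings inside it (this is often called a shallow copy). That is fine for the method above, because output[i] = ... swaps in new strings rather than modifying the old ones. The sketch below, with illustrative variable names, shows what would happen with a method that modifies a string in place, such as upcase!. It assumes string literals are not frozen (no frozen_string_literal magic comment).

```ruby
pets = ['dog', 'parrot']
copy = pets.clone

# The two arrays are different objects...
puts pets.object_id == copy.object_id         # false
# ...but their elements are still the same string objects
puts pets[0].object_id == copy[0].object_id   # true

# Replacing an element of the copy leaves the original alone
copy[1] = 'par'
puts "#{pets}"   # ["dog", "parrot"]

# Modifying a shared string in place is visible through both arrays
copy[0].upcase!
puts "#{pets}"   # ["DOG", "parrot"]
puts "#{copy}"   # ["DOG", "par"]
```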

Now that we've solved it, let's check just how well we understand references, values, and reassignment. What output do we expect if we for some reason reassign input at the end?

def short_strings(input)
  result = []
  input.each do |word|
    # Slice characters 0 to 2
    result << word[0..2]
  end
  input = result
  return input
end

pets = ['dog', 'parrot', 'cat', 'llama']
shortened_pets = short_strings(pets)
puts "pets: #{pets}"
puts "shortened_pets: #{shortened_pets}"

What output do we expect here?

pets: ["dog", "parrot", "cat", "llama"]
shortened_pets: ["dog", "par", "cat", "lla"]

Why? The reassignment only changed what the input reference points to inside the method. Unlike our original code, it never modified the underlying object.

It's also worth noting that there are methods like map! that intentionally change the underlying object, so now we know how to confidently write methods like that if we ever need to. (In built-in Ruby methods, these usually have a ! at the end of the name.)
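To make the difference concrete, here is a small sketch comparing map with map! on the pets array from earlier. The variable original_id is only there for illustration.

```ruby
pets = ['dog', 'parrot', 'cat', 'llama']
original_id = pets.object_id

# map returns a new array and leaves pets untouched
shortened_pets = pets.map { |word| word[0..2] }
puts "#{pets}"             # ["dog", "parrot", "cat", "llama"]
puts "#{shortened_pets}"   # ["dog", "par", "cat", "lla"]

# map! replaces the elements of the existing array
pets.map! { |word| word[0..2] }
puts "#{pets}"                       # ["dog", "par", "cat", "lla"]
puts pets.object_id == original_id   # true
```

Note that the non-mutating map is also a compact way to write the fixed short_strings method, since it builds and returns a new array.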

Other Objects

We've talked a lot about arrays today, but this pattern holds true for all complex objects in Ruby: strings, hashes, instances of classes, etc. For example, consider the following code:

# Reassign a string using +=
def reassign_string(str)
  str += ' reassigned'
  puts "inside reassign_string, str is '#{str}'"
end

text = 'original'
reassign_string(text)
puts "outside reassign_string, text is '#{text}'"


# Modify a string using the .concat() method
def modify_string(str)
  # str << ' modified' would do the same thing
  str.concat(' modified')
  puts "inside modify_string, str is '#{str}'"
end

text = 'original'
modify_string(text)
puts "outside modify_string, text is '#{text}'"

Small, built-in types like numbers, booleans and nil follow basically the same rules. The catch is there's no way to change the underlying value of these types without reassignment. In programming lingo, we say that these types are immutable. This means that an operation that looks like it changes the value (such as x += 1) actually produces a different object and reassigns the variable to it.
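We can see this with the same tools we used for arrays. The sketch below is illustrative only; add_one is a hypothetical method written for this example.

```ruby
count = 5
id_before = count.object_id

# += produces a different object and reassigns count to it
count += 1
puts count.object_id == id_before   # false

# These objects are frozen, so there is nothing to modify in place
puts 5.frozen?      # true
puts nil.frozen?    # true
puts true.frozen?   # true

# Hypothetical method: reassigning the parameter does not affect the caller
def add_one(number)
  number += 1       # reassigns the local variable only
  return number
end

total = 10
add_one(total)
puts total          # 10
```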

Takeaway

  • A variable in Ruby consists of two things:
    • The variable itself, tying a name to an address in memory
    • The object at that memory address
    • We say a variable references or points to an object
  • Multiple variables can reference the same object
    • Changes to the underlying object will be reflected through both variables
      • Methods like .push() or .concat()
    • Changing what one variable points to does not affect any other variables
      • =, +=, etc.
  • Passing an argument to a method creates a new variable referencing the same object
  • Small built-in types like numbers, booleans and nil are immutable, meaning the underlying object can't be modified

Additional Resources