I added analytics to Otto The Robot Butler (ORB). I implemented the analytics hooks, implemented the changes between Builds A and B, and distributed Build A. ORB is a 10-to-15-minute 3D puzzle game in which the player controls a small Alexa-type robot that navigates an apartment complex to find and bring back objects for the tenants. Each puzzle consists of a tenant asking for an object that has two qualities. The player must find one object in the world matching the first quality and another matching the second, combine them, and return the result to the requestor. There are six of these puzzles in the game, each with one solution.
We wanted to measure how long players took to solve each puzzle and how many incorrect combination attempts they made per puzzle. Solve time covered finding and combining the correct items; it excluded the time taken to deliver the result to the requestor, which could have skewed the data. Since we were in the fine-tuning stages of level design, we specifically wanted to test how the placement of objects in the level affected playtime. We thought this could correspond to the difficulty of finding hidden objects in the space. Measuring incorrect combination attempts was meant to help us understand player strategy and interpretation. The game reveals an object’s ‘correct’ adjective to the player when they combine the object incorrectly with a non-requested quality, so we wanted to see whether players generally used this to learn about the world quickly (and if so, at what point in the game). We also thought wrong attempts might indicate that players did not understand the quality of an object they were supposed to find. For example, do they interpret a broom as “clean” or “dirty”?
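As a rough illustration of the two measurements described above, the sketch below shows one way such hooks could be structured. It is not ORB's actual code: the class and method names (`PuzzleRecord`, `PuzzleTracker`, `on_request_started`, `on_combination`) are hypothetical, and timestamps are passed in explicitly rather than read from a game engine clock.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class PuzzleRecord:
    """Metrics for one puzzle. All names here are hypothetical."""
    started_at: float
    solved_at: Optional[float] = None  # set when the correct combination is made
    wrong_attempts: int = 0

    @property
    def solve_seconds(self) -> Optional[float]:
        # Time from the request to the correct combination.
        # Delivery to the requestor is deliberately not part of this value.
        if self.solved_at is None:
            return None
        return self.solved_at - self.started_at


@dataclass
class PuzzleTracker:
    build: str  # "A" or "B"
    records: Dict[int, PuzzleRecord] = field(default_factory=dict)

    def on_request_started(self, puzzle: int, now: float) -> None:
        # Called when a tenant asks for an object.
        self.records[puzzle] = PuzzleRecord(started_at=now)

    def on_combination(self, puzzle: int, correct: bool, now: float) -> None:
        # Called each time the player combines two objects.
        record = self.records[puzzle]
        if record.solved_at is not None:
            return  # already solved; later combinations are ignored
        if correct:
            record.solved_at = now  # stop the clock before delivery begins
        else:
            record.wrong_attempts += 1


# Example session with made-up timestamps (in seconds).
tracker = PuzzleTracker(build="A")
tracker.on_request_started(1, now=10.0)
tracker.on_combination(1, correct=False, now=25.0)
tracker.on_combination(1, correct=True, now=42.5)
print(tracker.records[1].solve_seconds, tracker.records[1].wrong_attempts)
```

Stopping the clock at the correct combination, rather than at delivery, is what keeps the walk back to the requestor out of the solve time.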
Our A/B test measured time to solve each puzzle and wrong combination attempts per puzzle against differences in item placement. Build A was our ‘control’, using the item placements the team had designed as part of our gatekeeping strategy to get the player to explore the entire apartment complex. Build B had more carefully considered item placement, in which sightlines, color, framing, and the composition of other objects in the room drew players toward items. Build B changed the positions of all objects except the scythe and lime. Specifically, we moved the hot sauce and TNT closer to the people who would ask for those objects’ qualities; moved the beige-colored cheese and plug items onto a stack of black tires, away from the beige floor; moved the soiled bag item from behind a door into the frame of the door; placed the noodle item in a box to frame it; placed the candle item on a chair facing the door through which the player enters the room; and placed the corn item at the end of a hallway so it was framed by a door.
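To make the comparison concrete, here is a minimal sketch of how per-puzzle averages could be computed for each build. The row layout and the `summarize` function are assumptions for illustration, and the sample rows are placeholders, not our playtest results.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# One row per (player session, puzzle), in this hypothetical layout:
# (build, puzzle number, solve time in seconds, wrong combination attempts)
Row = Tuple[str, int, float, int]


def summarize(rows: Iterable[Row]) -> Dict[Tuple[str, int], Tuple[float, float]]:
    """Return {(build, puzzle): (mean solve seconds, mean wrong attempts)}."""
    grouped: Dict[Tuple[str, int], List[Tuple[float, int]]] = defaultdict(list)
    for build, puzzle, seconds, wrong in rows:
        grouped[(build, puzzle)].append((seconds, wrong))
    return {
        key: (mean(s for s, _ in vals), mean(w for _, w in vals))
        for key, vals in grouped.items()
    }


# Placeholder rows for illustration only; these are NOT the playtest results.
rows = [
    ("A", 3, 40.0, 2),
    ("A", 3, 60.0, 4),
    ("B", 3, 90.0, 1),
    ("B", 3, 70.0, 1),
]
for (build, puzzle), (avg_seconds, avg_wrong) in sorted(summarize(rows).items()):
    print(f"Build {build}, puzzle {puzzle}: {avg_seconds:.1f}s, {avg_wrong:.1f} wrong attempts")
```

Grouping by both build and puzzle keeps the comparison at the level where the item placement changes were made, so a difference on one puzzle is not averaged away across the whole game.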

Our major finding about time to solve was that puzzles 3 through 6, and particularly 3 and 4, took longer to solve in Build B. This surprised us, since Build B was changed to help the player find the objects faster. In puzzle 3, the hot sauce item was moved closer to its requestor in Build B. However, this placed the object in a closet, out of a direct sightline from the door. In Build A, the hot sauce was close to the door, if not directly visible. We think the Build B placement may have been a time sink for players who explored the space quickly and might not have noticed the closet. To address this in the next iteration of builds, we chose to keep the hot sauce item in its Build A location in order to shorten the average solve time for puzzle 3.

Our major finding from the number of wrong combination attempts was that Build A had significantly more wrong combination attempts than Build B. We think this is linked to the short puzzle solve times in Build A and points to a particular strategy. Build A players may have often used the adjective reveal feature to learn an object’s adjective when they first came across it. They would then already know which objects corresponded to which qualities when a new puzzle request started. This allowed them to remember where the correct object was rather than having to find and interpret it each time. High wrong combination attempts together with short solve times point to this strategy. We cannot directly measure whether this strategy results from the item layout, but this efficiency-seeking behavior might be linked to the long traversal and search times needed to find objects that the level architecture does not intentionally highlight. When applying this data, we chose to allow this strategy to continue, so we kept many of the item layouts from Build A.
In conclusion, we measured the time to solve each puzzle and the number of wrong combination attempts per puzzle. Our data showed that item placement affects how long it takes players to decipher and remember the game space, as well as how they form traversal and puzzle-solving strategies.