Phy-Q: A Benchmark for Physical Reasoning

Overview

Cheng Xue*, Vimukthini Pinto*, Chathura Gamage*
Ekaterina Nikonova, Peng Zhang, Jochen Renz
School of Computing
The Australian National University
Canberra, Australia
{cheng.xue, vimukthini.inguruwattage, chathura.gamage}@anu.edu.au
{ekaterina.nikonova, p.zhang, jochen.renz}@anu.edu.au

Humans are well-versed in reasoning about the behaviors of physical objects when choosing actions to accomplish tasks, while it remains a major challenge for AI. To facilitate research addressing this problem, we propose a new benchmark that requires an agent to reason about physical scenarios and take an action accordingly. Inspired by the physical knowledge acquired in infancy and the capabilities required for robots to operate in real-world environments, we identify 15 essential physical scenarios. For each scenario, we create a wide variety of distinct task templates, and we ensure all the task templates within the same scenario can be solved by using one specific physical rule. By having such a design, we evaluate two distinct levels of generalization, namely the local generalization and the broad generalization. We conduct an extensive evaluation with human players, learning agents with varying input types and architectures, and heuristic agents with different strategies. The benchmark gives a Phy-Q (physical reasoning quotient) score that reflects the physical reasoning ability of the agents. Our evaluation shows that 1) all agents fail to reach human performance, and 2) learning agents, even with good local generalization ability, struggle to learn the underlying physical reasoning rules and fail to generalize broadly. We encourage the development of intelligent agents with broad generalization abilities in physical domains.

* equal contribution

The research paper can be found here: https://arxiv.org/abs/2108.13696


Table of contents

  1. Physical Scenarios in Phy-Q
  2. Phy-Q in Angry Birds
  3. Task Generator
  4. Tasks Generated for the Baseline Analysis
  5. Baseline Agents
    1. How to Run Heuristic Agents
    2. How to Run Learning Agents
      1. How to Run DQN Baseline
      2. How to Run Stable Baselines
    3. How to Develop Your Own Agent
    4. Outline of the Agent Code
  6. Framework
    1. The Game Environment
    2. Symbolic Representation Data Structure
    3. Communication Protocols
  7. Human Player Data


1. Physical Scenarios in Phy-Q

We consider 15 physical scenarios in the Phy-Q benchmark. First, we consider the basic physical scenarios associated with applying forces directly to the target objects, i.e., the effect of a single force and the effect of multiple forces. On top of simple force application, we also include scenarios associated with more complex motion, including rolling, falling, sliding, and bouncing, which are inspired by the physical reasoning capabilities developed in human infancy. Furthermore, we define the objects' relative weight, relative height, relative width, shape differences, and stability scenarios, which require physical reasoning abilities that infants typically acquire at a later stage. In addition, we incorporate clearing paths, adequate timing, manoeuvring, and taking non-greedy actions, which are capabilities robots need in order to work safely and efficiently in physical environments. To sum up, the physical scenarios we consider, and the corresponding physical rules that can be used to achieve the goal of the associated tasks, are:

  1. Single force: Some target objects can be destroyed with a single force.
  2. Multiple forces: Some target objects need multiple forces to destroy.
  3. Rolling: Circular objects can be rolled along a surface to a target.
  4. Falling: Objects can be made to fall onto a target.
  5. Sliding: Non-circular objects can be slid along a surface to a target.
  6. Bouncing: Objects can be bounced off a surface to reach a target.
  7. Relative weight: Objects with correct weight need to be moved to reach a target.
  8. Relative height: Objects with correct height need to be moved to reach a target.
  9. Relative width: Objects with correct width or the opening with correct width should be selected to reach a target.
  10. Shape difference: Objects with correct shape need to be moved/destroyed to reach a target.
  11. Non-greedy actions: Actions need to be selected in the correct order based on their physical consequences. The chosen action may be less effective in the short term but advantageous in the long term, e.g., reaching fewer targets in the short term in order to reach more targets later.
  12. Structural analysis: The correct target needs to be chosen to break the stability of a structure.
  13. Clearing paths: A path needs to be created before the target can be reached.
  14. Adequate timing: Correct actions need to be performed within time constraints.
  15. Manoeuvring: Powers of objects need to be activated correctly to reach a target.

2. Phy-Q in Angry Birds

Based on the above physical scenarios, we develop the Phy-Q benchmark in Angry Birds. Phy-Q contains tasks from 75 task templates belonging to the fifteen scenarios. The goal of an agent is to destroy all the pigs (green-coloured objects) in the tasks by shooting a given number of birds from the slingshot. Shown below are fifteen example tasks in Phy-Q representing the fifteen scenarios and the solutions for those tasks.

Task Description
1. Single force: A single force needs to be applied to the pig by a direct bird shot to destroy it.
2. Multiple forces: Multiple forces need to be applied to the pig by multiple bird shots to destroy it.
3. Rolling: The circular object needs to be rolled onto the pig, which the bird cannot reach from the slingshot, to destroy the pig.
4. Falling: The circular object needs to be made to fall onto the pig to destroy it.
5. Sliding: The square object needs to be slid into the pig, which the bird cannot reach from the slingshot, to destroy the pig.
6. Bouncing: The bird needs to be bounced off the platform (dark-brown object) to hit and destroy the pig.
7. Relative weight: The small circular block is lighter than the big circular block. Of the two blocks, only the small circular block can be rolled to reach and destroy the pig.
8. Relative height: The square block on top of the taller rectangular block will not fall through the gap because of the height of the rectangular block. Hence, the square block on top of the shorter rectangular block needs to be toppled so that it falls through the gap and destroys the pig.
9. Relative width: The bird cannot go through the lower entrance, which has a narrow opening. Hence, the bird needs to be shot at the upper entrance to reach and destroy the pig.
10. Shape difference: The circular block on two triangle blocks can be rolled down by breaking one of the triangle blocks, whereas the circular block on two square blocks cannot be rolled down by breaking one of the square blocks. Hence, the triangle block needs to be destroyed to make the circular block roll and fall onto the pig, destroying it.
11. Non-greedy actions: A greedy action tries to destroy the highest number of pigs with a single bird shot. If the two pigs resting on the circular block are destroyed first, the circular block will roll down and block the entrance to the pig below. Hence, the lower pig needs to be destroyed first, and then the upper two pigs.
12. Structural analysis: The bird needs to be shot at the weak point of the structure to break its stability and destroy the pigs. Shooting elsewhere does not destroy the two pigs with a single bird shot.
13. Clearing paths: First, the rectangular block needs to be positioned correctly to open the path for the circular block to reach the pig. Then the circular block needs to be rolled to destroy the pig.
14. Adequate timing: First, the two circular objects need to be rolled onto the ramp. Then, after the first circle passes the prop and before the second circle reaches the prop, the prop needs to be destroyed to make the second circle fall onto the lower pig.
15. Manoeuvring: The blue bird splits into three birds when it is tapped in flight. The blue bird needs to be tapped at the correct position to manoeuvre the birds to reach the two separated pigs.

Screenshots of the 75 task templates are shown below. x.y represents the yth template of the xth scenario. The indexes of the scenarios are: 1. single force, 2. multiple forces, 3. rolling, 4. falling, 5. sliding, 6. bouncing, 7. relative weight, 8. relative height, 9. relative width, 10. shape difference, 11. non-greedy actions, 12. structural analysis, 13. clearing paths, 14. adequate timing, and 15. manoeuvring:

[Screenshot grid. Template indexes per scenario: 1.1-1.5, 2.1-2.5, 3.1-3.6, 4.1-4.5, 5.1-5.5, 6.1-6.6, 7.1-7.5, 8.1-8.4, 9.1-9.4, 10.1-10.4, 11.1-11.5, 12.1-12.6, 13.1-13.5, 14.1-14.2, 15.1-15.8]

3. Task Generator

We develop a task generator that can generate tasks for the task templates we designed for each scenario.

  1. To run the task generator:
    1. Go to tasks/task_generator
    2. Copy the task templates for which you want to generate tasks into the input folder (level templates can be found in tasks/task_templates)
    3. Run the task generator, providing the number of tasks as an argument
       python generate_tasks.py <number of tasks>
    4. Generated tasks will be available in the output folder

4. Tasks Generated for Baseline Analysis

We generated 100 tasks from each of the 75 task templates for the baseline analysis. We have grouped the 15 scenarios into 3 categories for convenience. The scenarios belonging to each category are: category 1 (1.1 single force and 1.2 multiple forces), category 2 (2.1 rolling, 2.2 falling, 2.3 sliding, and 2.4 bouncing), and category 3 (3.1 relative weight, 3.2 relative height, 3.3 relative width, 3.4 shape difference, 3.5 non-greedy actions, 3.6 structural analysis, 3.7 clearing paths, 3.8 adequate timing, and 3.9 manoeuvring). Here x.y represents the yth scenario of the xth category. The generated tasks can be found in tasks/generated_tasks.zip. After extracting this file, the generated tasks can be found in the following folder structure:
    generated_tasks/
        -- index of the category/
            -- index of the scenario/
                -- index of the template/
                    -- task files named as categoryIndex_scenarioIndex_templateIndex_taskIndex.xml
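
The mapping from the scenario numbering used in Sections 1 and 2 (1 to 15) to the category numbering used here can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the per-scenario template counts are read off the template index grid in Section 2, and the sketch assumes that each folder is named by its bare index and that the index order in the file name matches the folder nesting.

```python
from pathlib import Path

# Number of scenarios in categories 1, 2 and 3, as listed in the text.
SCENARIOS_PER_CATEGORY = [2, 4, 9]

# Templates per scenario (scenarios 1..15), read off the template index grid.
TEMPLATES_PER_SCENARIO = [5, 5, 6, 5, 5, 6, 5, 4, 4, 4, 5, 6, 5, 2, 8]

TASKS_PER_TEMPLATE = 100  # tasks generated per template for the baseline analysis


def to_category_index(scenario: int) -> tuple[int, int]:
    """Map a global scenario index (1..15) to (category, scenario within category)."""
    if not 1 <= scenario <= sum(SCENARIOS_PER_CATEGORY):
        raise ValueError(f"scenario index out of range: {scenario}")
    remaining = scenario
    for category, size in enumerate(SCENARIOS_PER_CATEGORY, start=1):
        if remaining <= size:
            return category, remaining
        remaining -= size
    raise AssertionError("unreachable")


def task_path(root: str, scenario: int, template: int, task: int) -> Path:
    """Build the path of one task file (hypothetical helper).

    Assumption: folders are named by their bare index, e.g. generated_tasks/3/1/2/.
    """
    category, local = to_category_index(scenario)
    name = f"{category}_{local}_{template}_{task}.xml"
    return Path(root) / str(category) / str(local) / str(template) / name


total_tasks = sum(TEMPLATES_PER_SCENARIO) * TASKS_PER_TEMPLATE
print(total_tasks)                                        # 7500
print(to_category_index(7))                               # (3, 1)
print(task_path("generated_tasks", 7, 2, 5).as_posix())   # generated_tasks/3/1/2/3_1_2_5.xml
```

Whether task indexes start at 0 or 1 is not stated here, so check the extracted archive before relying on a particular file name.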

5. Baseline Agents and the Framework

Tested environments:

  • Ubuntu: 18.04/20.04
  • Python: 3.9
  • Numpy: 1.20
  • torch: 1.8.1
  • torchvision: 0.9.1
  • lxml: 4.6.3
  • tensorboard: 2.5.0
  • Java: 13.0.2/13.0.7
  • stable-baselines3: 1.1.0

Before running agents, please:

  1. Go to sciencebirdsgames and unzip Linux.zip
  2. Go to sciencebirdslevels/generated_tasks and unzip fifth_generation.zip

5.1 How to Run Heuristic Agents

  1. Run Java heuristic agents: Datalab and Eagle Wings:

    1. Go to Utils and in terminal run
      python PrepareTestConfig.py
      
    2. Go to sciencebirdsgames/Linux, in terminal run
      java -jar game_playing_interface.jar
    3. Go to sciencebirdsagents/HeuristicAgents/ and in terminal run Datalab
      java -jar datalab_037_v4_java12.jar 1
      or Eagle Wings
      java -jar eaglewings_037_v3_java12.jar 1
  2. Run Random Agent and Pig Shooter:

    1. Go to sciencebirdsagents/
    2. In terminal, after granting execution permission, run Random Agent
      ./TestPythonHeuristicAgent.sh RandomAgent
      or Pig Shooter
      ./TestPythonHeuristicAgent.sh PigShooter

5.2 How to Run Learning Agents

5.2.1 How to Run DQN Baselines

For Symbolic Agent

  1. Go to sciencebirdsagents/Utils
  2. Open Parameters.py and set agent to be DQNDiscreteAgent and network to be DQNSymbolicDuelingFC_v2 and state_repr_type to be "symbolic"

For Image Agent

  1. Go to sciencebirdsagents/Utils

  2. Open Parameters.py and set agent to be DQNDiscreteAgent and network to be DQNImageResNet and state_repr_type to be "image"

  3. Go to sciencebirdsagents/

  4. In terminal, after granting execution permission, train the agent for within-template training

    ./TrainLearningAgent.sh within_template

    and for within-scenario training

    ./TrainLearningAgent.sh benchmark
  5. Models will be saved to sciencebirdsagents/LearningAgents/saved_model

  6. To test learning agents, go to the folder sciencebirdsagents:

    1. To test within-template performance, run
    python TestAgentOfflineWithinTemplate.py
    
    2. To test within-capability performance, run
    python TestAgentOfflineWithinCapability.py
    

5.2.2 How to Run Stable Baselines 3 Agents

For Symbolic Agent

  1. Go to sciencebirdsagents/Utils
  2. Open Parameters.py and set agent to be "ppo" or "a2c" and state_repr_type to be "symbolic"

For Image Agent

  1. Go to sciencebirdsagents/Utils

  2. Open Parameters.py and set agent to be "ppo" or "a2c" and state_repr_type to be "image"

  3. Go to sciencebirdsagents/

  4. In terminal, after granting execution permission, train the agent for within-template training

    ./TrainAndTestOpenAIStableBaselines.sh within_template

    and for within-scenario training

    ./TrainAndTestOpenAIStableBaselines.sh benchmark
  5. Models will be saved to sciencebirdsagents/OpenAIModelCheckpoints and tensorboard log will be saved to OpenAIStableBaseline

5.3 How to Develop Your Own Agent

We provide a gym-like environment. A simple demo can be found in demo.py:

from SBAgent import SBAgent
from SBEnvironment.SBEnvironmentWrapper import SBEnvironmentWrapper

# use the game score as the reward and run the game 50 times faster
env = SBEnvironmentWrapper(reward_type="score", speed=50)
level_list = [1, 2, 3]  # level list for the agent to play
dummy_agent = SBAgent(env=env, level_list=level_list)  # initialise agent
dummy_agent.state_representation_type = 'image'  # state representation passed to env.make below ('image' here; Section 5.2 also uses "symbolic")
env.make(agent=dummy_agent, start_level=dummy_agent.level_list[0],
         state_representation_type=dummy_agent.state_representation_type)  # initialise the environment

s, r, is_done, info = env.reset()  # get ready for running
for level_idx in level_list:
    is_done = False
    while not is_done:
        s, r, is_done, info = env.step([-100, -100])  # agent always shoots at (-100, -100) relative to the slingshot

    env.current_level = level_idx + 1  # move on to the next level once the current one is finished
    if env.current_level > level_list[-1]:  # end the game when all levels in the level list have been played
        break
    s, r, is_done, info = env.reload_current_level()  # go to the next level

5.4 Outline of the Agent Code

The ./sciencebirdsagents folder contains all the relevant source code of our agents. Below is the outline of the code (this is a simple description. Detailed documentation in progress):

  1. Client:
    1. agent_client.py: Includes all communication protocols.
  2. final_run: Place to store TensorBoard results.
  3. HeuristicAgents
    1. datalab_037_v4_java12.jar: State-of-the-art java agent for Angry Birds.
    2. eaglewings_037_v3_java12.jar: State-of-the-art java agent for Angry Birds.
    3. PigShooter.py: Python agent that shoots at the pigs only.
    4. RandomAgent.py: Random agent that chooses a shot with $x \in (-100,-10)$ and $y \in (-100,100)$.
    5. HeuristicAgentThread.py: A thread wrapper to run multi-instances of heuristic agents.
  4. LearningAgents
    1. RLNetwork: Folder includes all DQN structures that can be used as an input to DQNDiscreteAgent.py.
    2. saved_model: Place to save trained models.
    3. LearningAgent.py: Inherited from SBAgent class, a base class to implement learning agents.
    4. DQNDiscreteAgent.py: Inherited from LearningAgent, a DQN agent that has discrete action space.
    5. LearningAgentThread.py: A thread wrapper to run multi-instances of learning agents.
    6. Memory.py: A script that includes different types of memories. Currently, we have normal memory, PrioritizedReplayMemory and PrioritizedReplayMemory with balanced samples.
  5. SBEnvironment
    1. SBEnvironmentWrapper.py: A wrapper class to provide gym-like environment.
    2. SBEnvironmentWrapperOpenAI.py: A wrapper class to provide gym-like environment for OpenAI Stable Baseline 3 agents.
    3. Server.py: A wrapper class for the game server for the OpenAI Stable Baseline 3 agents.
  6. StateReader: Folder that contains files to convert symbolic state representation to inputs to the agents.
  7. Utils:
    1. Config.py: Config class used to pass parameters to agents.
    2. GenerateCapabilityName.py: Generate a list of names of capability for agents to train.
    3. GenerateTemplateName.py: Generate a list of names of templates for agents to train.
    4. LevelSelection.py: Class that includes different strategies to select levels. For example, an agent may choose to go to the next level if it passes the current one, or only when it has played the current level for a predefined number of times.
    5. NDSparseMatrix.py: Class to store converted symbolic representation in a sparse matrix to save memory usage.
    6. Parameters.py: Training/testing parameters used to pass to the agent.
    7. PrepareTestConfig.py: Script to generate config file for the game console to use for testing agents only.
    8. trajectory_planner.py: Calculates two possible trajectories given a directly reachable target point. Returns None if the target is not reachable by the bird.
  8. demo.py: A demo to showcase how to use the framework.
  9. SBAgent.py: Base class for all agents.
  10. MultiAgentTestOnly.py: Tests Python heuristic agents by running multiple instances on one particular template.
  11. TestAgentOfflineWithinCapability.py: Uses the saved models in LearningAgents/saved_model to test the agent's within-capability performance on the test set.
  12. TestAgentOfflineWithinTemplate.py: Uses the saved models in LearningAgents/saved_model to test the agent's within-template performance on the test set.
  13. TrainLearningAgent.py: Script to train DQN baseline agents on a particular template with a defined mode.
  14. TestPythonHeuristicAgent.sh: Bash script to test a heuristic agent's performance on all templates.
  15. TrainLearningAgent.sh: Bash script to train DQN baseline agents to test both local and broad generalization.
  16. OpenAI_StableBaseline_Train.py: Python script to run OpenAI Stable Baseline 3 agents on a particular template with a defined mode.
  17. TrainAndTestOpenAIStableBaselines.sh: Bash script to run OpenAI Stable Baseline 3 agents to test both local and broad generalization.
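
As an illustration of the RandomAgent.py entry in the outline above, the following sketch samples one shot from the stated ranges. It is not the code in RandomAgent.py: the function name is hypothetical, and sampling uniformly over floats is an assumption (the outline only gives the ranges).

```python
import random


def sample_random_shot(rng: random.Random) -> list[float]:
    """Sample one release point [x, y] relative to the slingshot.

    The ranges follow the RandomAgent.py description; the uniform
    float distribution is an assumption of this sketch.
    """
    x = rng.uniform(-100, -10)
    y = rng.uniform(-100, 100)
    return [x, y]


rng = random.Random(0)  # fixed seed so repeated runs give the same shots
shot = sample_random_shot(rng)
print(shot)
```

The returned list has the same [x, y] shape as the action passed to env.step in demo.py.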

6. Framework

6.1 The Game Environment

  1. The coordinate system
    • In the Science Birds game, the origin point (0,0) is the bottom-left corner, and the Y coordinate increases in the upward direction.
    • Coordinates range from (0,0) to (640,480).
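
Image libraries usually place the origin at the top-left corner with Y increasing downwards. If a screenshot follows that convention and has the same 480-unit height as the game coordinate range (both are assumptions, not statements from this document), converting between the two systems is a single flip of the Y axis. The helper names below are hypothetical.

```python
GAME_HEIGHT = 480  # upper bound of the Y range stated above


def game_to_image_y(y_game: float, height: float = GAME_HEIGHT) -> float:
    """Convert a bottom-left-origin Y value to a top-left-origin Y value."""
    return height - y_game


def flip_points(points, height: float = GAME_HEIGHT):
    """Apply the same flip to a list of (x, y) points; X is unchanged."""
    return [(x, height - y) for x, y in points]


print(game_to_image_y(0))                # 480
print(flip_points([(10, 0), (20, 480)]))  # [(10, 480), (20, 0)]
```

Because the flip is its own inverse, the same functions convert in both directions.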

6.2 Symbolic Representation Data Structure

  1. Symbolic representation data of game objects is stored in a JSON object. The JSON object describes an array in which each element describes a game object. Game object categories and their properties are described below:

    • Ground: the lowest unbreakable flat support surface

      • property: id = 'object [i]'
      • property: type = 'Ground'
      • property: yindex = [the y coordinate of the ground line]
    • Platform: Unbreakable obstacles

      • property: id = 'object [i]'
      • property: type = 'Object'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • Trajectory: the dots that represent the trajectories of the birds

      • property: id = 'object [i]'
      • property: type = 'Trajectory'
      • property: location = [a list of 2d points that represents the trajectory dots]
    • Slingshot: Unbreakable slingshot for shooting the bird

      • property: id = 'object [i]'
      • property: type = 'Slingshot'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • Red Bird:

      • property: id = 'object [i]'
      • property: type = 'Object'
      • property: vertices = [a list of ordered 2d points that represents the polygon shape of the object]
      • property: colormap = [a list of compressed 8-bit (RRRGGGBB) colour and their percentage in the object]
    • all objects below have the same representation as red bird

    • Blue Bird:

    • Yellow Bird:

    • White Bird:

    • Black Bird:

    • Small Pig:

    • Medium Pig:

    • Big Pig:

    • TNT: an explosive block

    • Wood Block: Breakable wooden blocks

    • Ice Block: Breakable ice blocks

    • Stone Block: Breakable stone blocks

  2. Round objects are also represented as polygons with a list of vertices

  3. Symbolic Representation with noise

    • If noisy Symbolic Representation is requested, the noise will be applied to each point in vertices of the game objects except the ground, all birds and the slingshot
    • The noise for 'vertices' is applied to all vertices with the same amount within 5 pixels
    • The colour map has a noise of +/- 2%.
    • The colour map compresses 24-bit RGB colour into 8 bits
      • 3 bits for Red, 3 bits for Green and 2 bits for Blue
      • each colour is followed by the fraction of the object's pixels that have that colour
      • example: (127, 0.5) means 50% of the pixels in the object have colour 127
    • The noise is uniformly distributed
    • We will later offer more sophisticated and adjustable noise.
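
The structure described in this section can be explored with a short Python sketch. The sample below is hand-written, not real server output: the flat key layout, the [x, y] encoding of vertices, and the [colour, fraction] encoding of colormap entries are assumptions made for illustration, and the helper names are hypothetical. The bit layout of the colour code follows the RRRGGGBB description above.

```python
import json

# Hand-written sample in the shape described in this section (assumed layout).
SAMPLE = """
[
  {"id": "object 0", "type": "Ground", "yindex": 100},
  {"id": "object 1", "type": "Object",
   "vertices": [[300, 100], [320, 100], [320, 140], [300, 140]],
   "colormap": [[127, 0.5], [36, 0.5]]}
]
"""


def decode_colour(code: int) -> tuple[int, int, int]:
    """Split an 8-bit RRRGGGBB code into its (red, green, blue) bit fields."""
    red = (code >> 5) & 0b111    # top 3 bits
    green = (code >> 2) & 0b111  # middle 3 bits
    blue = code & 0b11           # bottom 2 bits
    return red, green, blue


def bounding_box(vertices):
    """Return (min_x, min_y, max_x, max_y) of a polygon given as [x, y] pairs."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return min(xs), min(ys), max(xs), max(ys)


objects = json.loads(SAMPLE)
# The ground has a yindex instead of vertices, so only keep polygon objects.
polygons = [obj for obj in objects if "vertices" in obj]
for obj in polygons:
    print(obj["id"], bounding_box(obj["vertices"]))
    for code, fraction in obj["colormap"]:
        print(code, decode_colour(code), fraction)
```

For the example colour 127 (binary 01111111), the decoded fields are red 3, green 7, and blue 3, each on its own reduced scale (0 to 7 for red and green, 0 to 3 for blue).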

6.3 Communication Protocols


Each entry below gives the message ID, the request, the request format (byte[ ]), the return value, and the return format (byte[ ]).

1-10 Configuration Messages

  • 1: Configure team ID and running mode
    • Request format: [1][ID][Mode]
      • ID: 4 bytes
      • Mode: 1 byte (COMPETITION = 0, TRAINING = 1)
    • Return: Four bytes array. The first byte indicates the round; the second specifies the time limit in minutes; the third specifies the number of available levels.
    • Return format: [round info][time limit][available levels]
    • Note: In training mode, the return will be [0][0][0]. As the round info is not used in training, the time limit will be 600 hours, and the number of levels needs to be requested via message ID 15.
  • 2: Set simulation speed, speed $\in$ [0.0, 50.0]
    • Request format: [2][speed]
      • speed: 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
    • Note: This command can be sent at any time during play to change the simulation speed.

11-30 Query Messages

  • 11: Do Screenshot
    • Request format: [11]
    • Return: Width, height, image bytes
    • Return format: [width][height][image bytes]
      • width, height: 4 bytes
    • Note: This command only returns screenshots without symbolic representation.
  • 12: Get game state
    • Request format: [12]
    • Return: One byte indicating the ordinal of the state
    • Return format:
      • [0]: UNKNOWN
      • [1]: MAIN_MENU
      • [2]: EPISODE_MENU
      • [3]: LEVEL_SELECTION
      • [4]: LOADING
      • [5]: PLAYING
      • [6]: WON
      • [7]: LOST
  • 14: Get the current level
    • Request format: [14]
    • Return: Four bytes array indicating the index of the current level
    • Return format: [level index]
  • 15: Get the number of levels
    • Request format: [15]
    • Return: Four bytes array indicating the number of available levels
    • Return format: [number of level]
  • 23: Get my score
    • Request format: [23]
    • Return: A 4 bytes array indicating the number of levels, followed by a ([number_of_levels] * 4) bytes array in which every four slots indicate the best score for the corresponding level
    • Return format: [number_of_levels][score_level_1]....[score_level_n]
    • Note: This should be used carefully in training mode, because a large number of levels may be used in training. Instead, when the agent is in the winning state, use message ID 65 to get the score of a single level.

31-50 In-Game Action Messages

  • 31: Shoot using the Cartesian coordinates [Safe mode*]
    • Request format: [31][fx][fy][dx][dy][t1][t2]
      • focus_x: the x coordinate of the focus point
      • focus_y: the y coordinate of the focus point
      • dx: the x coordinate of the release point minus focus_x
      • dy: the y coordinate of the release point minus focus_y
      • t1: the release time
      • t2: the gap between the release time and the tap time
      • If t1 is set to 0, the server will execute the shot immediately.
      • The length of each parameter is 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 32: Shoot using Polar coordinates [Safe mode*]
    • Request format: [32][fx][fy][theta][r][t1][t2]
      • theta: release angle
      • r: the radial coordinate
      • The length of each parameter is 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 33: Sequence of shots [Safe mode*]
    • Request format: [33][shots length][shot message ID][Params]...[shot message ID][Params]
      • Maximum sequence length: 16 shots
    • Return: An array in which each slot indicates a good/bad shot. The bad shots are those shots that are rejected by the server.
    • Return format: For example, if the server received 5 shots and the third one was not executed for some reason, the server will return [1][1][0][1][1]
  • 41: Shoot using the Cartesian coordinates [Fast mode**]
    • Request format: [41][fx][fy][dx][dy][t1][t2]
      • The length of each parameter is 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 42: Shoot using Polar coordinates [Fast mode**]
    • Request format: [42][fx][fy][theta][r][t1][t2]
      • The length of each parameter is 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 43: Sequence of shots [Fast mode**]
    • Request format: [43][shots length][shot message ID][Params]...[shot message ID][Params]
      • Maximum sequence length: 16 shots
    • Return: An array in which each slot indicates a good/bad shot. The bad shots are those shots that are rejected by the server.
    • Return format: For example, if the server received 5 shots and the third one was not executed for some reason, the server will return [1][1][0][1][1]
  • 34: Fully Zoom Out
    • Request format: [34]
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 35: Fully Zoom In
    • Request format: [35]
    • Return: OK/ERR
    • Return format: [1]/[0]

51-60 Level Selection Messages

  • 51: Load a level
    • Request format: [51][Level]
      • Level: 4 bytes
    • Return: OK/ERR
    • Return format: [1]/[0]
  • 52: Restart a level
    • Request format: [52]
    • Return: OK/ERR
    • Return format: [1]/[0]

61-70 Science Birds Specific Messages

  • 61: Get Symbolic Representation With Screenshot
    • Request format: [61]
    • Return: Symbolic Representation and corresponding screenshot
    • Return format: [symbolic representation byte array length][Symbolic Representation bytes][image width][image height][image bytes]
      • symbolic representation byte array length: 4 bytes
      • image width: 4 bytes
      • image height: 4 bytes
  • 62: Get Symbolic Representation Without Screenshot
    • Request format: [62]
    • Return: Symbolic Representation
    • Return format: [symbolic representation byte array length][Symbolic Representation bytes]
  • 63: Get Noisy Symbolic Representation With Screenshot
    • Request format: [63]
    • Return: Noisy Symbolic Representation and corresponding screenshot
    • Return format: [symbolic representation byte array length][Symbolic Representation bytes][image width][image height][image bytes]
  • 64: Get Noisy Symbolic Representation Without Screenshot
    • Request format: [64]
    • Return: Noisy Symbolic Representation
    • Return format: [symbolic representation byte array length][Symbolic Representation bytes]
  • 65: Get Current Level Score
    • Request format: [65]
    • Return: Current score
    • Return format: [score]
      • score: 4 bytes
    • Note: This score can be requested at any time in the Playing/Won/Lost state. This is used for agents that take the intermediate score seriously during training/reasoning. To get the winning score, please make sure to execute this command when the game state is "WON".

* Safe mode: The server will wait until the state is static after making a shot.
** Fast mode: The server will send back a confirmation once a shot is made. The server will not do any check for the appearance of the won page.
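
To make the byte layouts above concrete, here is a minimal Python sketch that packs two requests and decodes one reply with the standard struct module. It is not taken from agent_client.py. The field widths for the ID, mode, and shot parameters come from the protocol description, while the 1-byte message ID, big-endian byte order, and signed-integer parameters are assumptions of this sketch; the function names are hypothetical.

```python
import struct
from enum import IntEnum


class GameState(IntEnum):
    """Ordinals returned by message 12 (Get game state)."""
    UNKNOWN = 0
    MAIN_MENU = 1
    EPISODE_MENU = 2
    LEVEL_SELECTION = 3
    LOADING = 4
    PLAYING = 5
    WON = 6
    LOST = 7


def encode_configure(team_id: int, mode: int) -> bytes:
    """Message 1: [1][ID][Mode] with a 4-byte ID and a 1-byte mode.

    Assumed: 1-byte message ID and big-endian ("!") byte order.
    """
    return struct.pack("!BiB", 1, team_id, mode)


def encode_cartesian_shot(fx, fy, dx, dy, t1, t2, fast=False) -> bytes:
    """Message 31 (safe mode) or 41 (fast mode): six 4-byte parameters.

    Assumed: parameters are signed 32-bit integers.
    """
    message_id = 41 if fast else 31
    return struct.pack("!B6i", message_id, fx, fy, dx, dy, t1, t2)


def decode_game_state(reply: bytes) -> GameState:
    """Message 12 reply: one byte holding the ordinal of the state."""
    return GameState(reply[0])


TRAINING = 1
print(len(encode_configure(team_id=2021, mode=TRAINING)))      # 6
print(len(encode_cartesian_shot(100, 200, -50, -30, 0, 0)))    # 25
print(decode_game_state(bytes([5])).name)                      # PLAYING
```

The printed lengths follow directly from the stated field sizes: 1 + 4 + 1 bytes for the configure message and 1 + 6 × 4 bytes for a shot.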

7. Human Player Data

The human player data on Phy-Q is given in human_player_data.zip. This includes summarized data for 20 players. Each .csv file corresponds to one player and has the following columns:

  1. levelIndex: The index assigned to the task
  2. attempts: Number of attempts taken to solve the task (The value is given as 100 if the task is not solved)
  3. time_breakdown: Thinking time taken for each attempt (e.g. {1: 27, 2: 14}: Player has taken two attempts to solve the task. Time taken in the first attempt is 27 seconds and time taken for the second attempt is 14 seconds)
  4. total_time: Total thinking time taken for all attempts (calculated only for 5 attempts)
  5. average_rate: The calculated pass rate (e.g. if the player solved the task in the first attempt, the value is given as 1.0 i.e., (6-1)/5. If the player has taken more than 5 attempts, the value is 0)
  6. scenario: The index of the physical scenario of the task
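
The average_rate and time_breakdown columns can be reproduced with a short Python sketch. The formula (6 - attempts)/5 is generalized from the single worked example above, (6-1)/5 for a first-attempt solve, so treat it as an assumption for 2 to 5 attempts. Reading time_breakdown as a Python-style dict literal is also an assumption based on the example {1: 27, 2: 14}; the function names are hypothetical.

```python
import ast

MAX_ATTEMPTS = 5   # attempts that count towards the pass rate
UNSOLVED = 100     # value stored in the attempts column when the task is not solved


def pass_rate(attempts: int) -> float:
    """Pass rate for one task: (6 - attempts) / 5, or 0 beyond 5 attempts."""
    if attempts > MAX_ATTEMPTS:
        return 0.0
    return (MAX_ATTEMPTS + 1 - attempts) / MAX_ATTEMPTS


def parse_time_breakdown(cell: str) -> dict[int, int]:
    """Parse a cell such as '{1: 27, 2: 14}' into {attempt: seconds}."""
    return ast.literal_eval(cell)


times = parse_time_breakdown("{1: 27, 2: 14}")
print(sum(times.values()))   # 41
print(pass_rate(1))          # 1.0
print(pass_rate(len(times))) # 0.8
print(pass_rate(UNSOLVED))   # 0.0
```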
Comments
  • Support other platform of ScienceBirds

    Support other platform of ScienceBirds

    Hi,

    I have a question for your test platform, the ScienceBirds game. As I know, ScienceBirds supports other platforms like Windows and macOS, so I wonder is there any plan to distribute binaries for them.

    Additionally, Is the game file in your repository a different version from sciencebirdsframework or ScienceBird github? I want to know how your version of game loads levels from my computer.

    opened by bic4907 6
  • ModuleNotFoundError: No module named 'SBEnvironment'

    ModuleNotFoundError: No module named 'SBEnvironment'

    I was trying to run ./TestPythonHeuristicAgent.sh RandomAgent and got this error ModuleNotFoundError: No module named SBEnvironment.

    I've tested on both Ubuntu and MacOS and it seems the error is cross-platform.

    opened by RodgerLuo 4
  • DataLab and EagleWings agent are missing.

    I tried to run the sample AIBirds agents with ScienceBirds, but some files are missing from your repository.

    I followed this section to run the sample agents; however, there are no datalab_037_v4_java12.jar and eaglewings_037_v3_java12.jar files in your sciencebirdsagents/HeuristicAgents/ directory.

    Going on intuition, I ran the data_lab_agent_v5.jar and eagle_wings_agent_v6.jar files with the command below instead, and an error appeared in my console.

    java -jar data_lab_agent_v5.jar
    
    (base) [email protected]:~/Desktop/benchmark/sciencebirdsagents/HeuristicAgents$ java -jar data_lab_agent_v5.jar 
    Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
            at ab.demo.MainEntry.main(MainEntry.java:27)
    

    Could you guys check that you uploaded the proper version of bot files?

    Note that I ran this command on my computer with Ubuntu 18.04 and Java 13.0.7.

    opened by bic4907 2
  • ConnectionResetError: [Errno 104] Connection reset by peer

    Hello,

    I'm trying to train a PPO agent with Stable Baselines, following the instructions in Sec. 5.2.2. After running ./TrainAndTestOpenAIStableBaselines.sh within_template, I got the following error:

    Traceback (most recent call last):
      File "OpenAI_StableBaseline_Train.py", line 231, in <module>
        range(c.num_worker)])
      File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 111, in __init__
        observation_space, action_space = self.remotes[0].recv()
      File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/home/ubuntu/anaconda3/envs/pytorch_p37/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    ConnectionResetError: [Errno 104] Connection reset by peer
    

    I wonder if I missed a step to activate the ScienceBirds application. Please let me know.

    Thank you!

    opened by RodgerLuo 8
Releases(v1.0)