Reaction Bench Lesson 2

Here is a link to the Jupyter notebook; feel free to work through it alongside this lesson.

Getting a high reward in a reaction of the form:

A-X + A-X -> A-A + X-X and A-X + B-X -> A-B + X-X

In this lesson we will try to earn a high reward in a reaction of the form above. Rewards come from producing either A-A or B-B. It's important to note that no reward comes from A-B, since it is not the desired product. The reaction we will examine in depth in this lesson is:

2 3-chlorohexane + 2 Na -> 4,5-diethyloctane + 2 NaCl

We will try to get the desired material: 4,5-diethyloctane
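As a quick sanity check on the stoichiometry above, one mole of 4,5-diethyloctane consumes two moles of 3-chlorohexane and two moles of sodium, so whichever reactant is scarcer limits the yield. A minimal sketch of that bookkeeping (the helper `max_product_moles` is purely illustrative, not part of ChemGymRL):

```python
def max_product_moles(n_chlorohexane, n_na):
    """Upper bound on moles of 4,5-diethyloctane from the Wurtz coupling
    2 3-chlorohexane + 2 Na -> 4,5-diethyloctane + 2 NaCl."""
    # two moles of each reactant are consumed per mole of product,
    # so the limiting reactant's amount is divided by two
    return min(n_chlorohexane, n_na) / 2.0

print(max_product_moles(2.0, 2.0))  # 1.0: exactly stoichiometric amounts
print(max_product_moles(4.0, 2.0))  # 1.0: Na is limiting, excess RCl is wasted
```

This is why flooding the vessel with the two needed reactants, in balanced amounts, is the sensible strategy later in the lesson.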

In similar fashion to lesson 1, the reactions used in this lesson are found in the available reactions directory. This particular lesson uses the reaction file registered under the id WurtzReact-v2. The main difference between this reaction file and the one used in lesson 1 is that its target material is 4,5-diethyloctane instead of dodecane.

From lesson 1 we know that our action space is a 6-element vector:

| Index | Label           | Value range |
|-------|-----------------|-------------|
| 0     | Temperature     | 0-1         |
| 1     | Volume          | 0-1         |
| 2     | 1-chlorohexane  | 0-1         |
| 3     | 2-chlorohexane  | 0-1         |
| 4     | 3-chlorohexane  | 0-1         |
| 5     | Na              | 0-1         |

Each index corresponds to a label and controls how we change that quantity. For example, if action[0] = 0 the temperature decreases by dT; if it is 0.5 the temperature stays the same; and if it is 1 the temperature increases by dT.
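The exact update rule lives inside the environment, but conceptually the temperature and volume components map linearly from [0, 1] onto a signed change. A minimal sketch of that mapping (the function name `scaled_delta` and the constant `d_max` are illustrative, not part of the ChemGymRL API):

```python
def scaled_delta(a, d_max):
    """Map an action component a in [0, 1] to a signed change in [-d_max, +d_max]."""
    return (2.0 * a - 1.0) * d_max

# a = 0.0 lowers the quantity by d_max, a = 0.5 holds it steady,
# and a = 1.0 raises it by d_max
print(scaled_delta(0.0, 5.0))  # -5.0
print(scaled_delta(0.5, 5.0))  # 0.0
print(scaled_delta(1.0, 5.0))  # 5.0
```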

First let's start by importing all the modules we need.

# import all the required external modules
import gym
import numpy as np
import os
import pickle
import sys
from time import sleep
from gym import envs
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

# ensure all necessary modules can be found
sys.path.append("../") # to access chemistrylab
sys.path.append("../chemistrylab/reactions") # to access all reactions

# import all local modules
import chemistrylab

The following will show all the environments we can currently run. Eventually you can create your own environments with different reactions and target materials using the reaction template.

# show all environments for reaction bench
all_envs = envs.registry.all()
env_ids = [env_spec.id for env_spec in all_envs if 'React' in env_spec.id]
print(env_ids)

The comments below recap the reaction we are trying to simulate; the code then initializes the reaction environment.

# trying to get high reward for wurtz reaction of form:
# A-X + A-X --> A-A + X-X and A-X + B-X --> A-B + X-X
# Rewards comes from producing A-A or B-B
# Cannot come from A-B as this doesn't make the desired property
# Desired material in this case is initialized to be 4,5-diethyloctane
# initializes environment
env = gym.make("WurtzReact-v2")
render_mode = "human"
done = False
__ = env.reset()

# initialize the step and reward counters used in the experiment loop below
total_steps = 0
total_reward = 0.0

# shows # of actions available
print('# of actions available: ',env.action_space.shape[0])
num_actions_available = env.action_space.shape[0]


We will store certain values in these arrays so that we can plot them later on, visually showing how each variable changes over time.

reactant_1 = []
reactant_2 = []
steps_over_time = []
reward_over_time = []
total_reward_over_time = []

action = np.ones(env.action_space.shape)

The key to achieving a high reward in this simulation is to add only the reactants that are needed for the reaction to proceed. This means our actions will add only 3-chlorohexane and Na, which maximizes our reward: a large quantity of these reactants means the reaction that produces our target material occurs more often. We do this by running the following commands:

if total_steps  < 20:
    action[0] = 1
    action[1] = 1
    action[2] = 0    # 1-chlorohexane
    action[3] = 0    # 2-chlorohexane
    action[4] = 1    # 3-chlorohexane
    action[5] = 1    # Na

Notice that we're only adding the reactants we need for the reaction to continue: 3-chlorohexane and Na.
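If you find the raw index assignments hard to read, you could wrap the bookkeeping in a small helper. Note that `make_action` and `REACTANT_INDEX` below are hypothetical names invented for this sketch, not part of the environment's API; the index layout comes from the action-space table above:

```python
import numpy as np

# index of each reactant in the 6-element action vector (see the table above)
REACTANT_INDEX = {
    "1-chlorohexane": 2,
    "2-chlorohexane": 3,
    "3-chlorohexane": 4,
    "Na": 5,
}

def make_action(add=(), temp=1.0, volume=1.0):
    """Build an action vector that adds only the named reactants."""
    action = np.zeros(6)
    action[0] = temp      # temperature control
    action[1] = volume    # volume control
    for name in add:
        action[REACTANT_INDEX[name]] = 1.0
    return action

print(make_action(add=("3-chlorohexane", "Na")))  # [1. 1. 0. 0. 1. 1.]
```

This produces the same vector as the explicit assignments above, but makes the "only the reactants we need" intent explicit at the call site.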



Let's run our program and see what happens!

while not done:
    # Actions:
    #   a[0] changes the temperature between -dT (a[0] = 0.0) and +dT (a[0] = 1.0)
    #   a[1] changes the Volume between -dV (a[1] = 0.0) and +dV (a[1] = 1.0)
    #   a[2:] adds between none (a[2:] = 0.0) and all (a[2:] = 1.0) of each reactant
    if total_steps < 20:
        action[0] = 1
        action[1] = 1
        action[2] = 0    # 1-chlorohexane
        action[3] = 0    # 2-chlorohexane
        action[4] = 1    # 3-chlorohexane
        action[5] = 1    # Na

        # Adding Reactants not needed (uncomment for the second experiment):
        # action[0] = 1
        # action[1] = 1
        # action[5] = 1
        # action[4] = 1
        # action[2] = 1
        # action[3] = 1

    # perform the action and update the reward
    state, reward, done, __ = env.step(action)
    print('total_steps: ', total_steps)
    print('reward: %.2f ' % reward)
    total_reward += reward
    print('total reward: %.2f' % total_reward)
    # print(state)

    # render the plot
    env.render(mode=render_mode)
    # sleep(2)

    # increment one step
    total_steps += 1

    # append arrays for states over time; the reactant entries depend on
    # the environment's state layout, so replace i and j below with the
    # indices of 3-chlorohexane and Na in the state vector
    steps_over_time.append(total_steps)
    reward_over_time.append(reward)
    total_reward_over_time.append(total_reward)
    # reactant_1.append(state[i])
    # reactant_2.append(state[j])

Notice that we get a high total reward. A visual representation of the reactants being consumed and the total reward increasing can be seen in the subplot we produce!


This simply shows us the stats of the reaction vessel: everything from the thermodynamic variables to the amount of each material.

# ask user if they want to see stats of reaction vessel
show_stats = input("Show Reaction Vessel Stats ('Y'/'N') >>> ")

if show_stats.lower() in ["y", "yes"]:
    # open and check the material dict
    vessel_path = os.path.join(os.getcwd(), "vessel_experiment_0.pickle")
    with open(vessel_path, 'rb') as open_file:
        v = pickle.load(open_file)

    print("---------- VESSEL ----------")
    print("Label: {}".format(v.label))

    print("---------- THERMODYNAMIC VARIABLES ----------")
    print("Temperature (in K): {:e}".format(v.temperature))
    print("Volume (in L): {:e}".format(v.volume))
    print("Pressure (in kPa): {:e}".format(v.pressure))

    print("---------- MATERIAL_DICT ----------")
    for material, value_list in v._material_dict.items():
        print("{} : {}".format(material, value_list))

    print("---------- SOLUTE_DICT ----------")
    for solute, value_list in v._solute_dict.items():
        print("{} : {}".format(solute, value_list))

This part of the code plots certain states over time.

# graph states over time
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4)
ax1.plot(steps_over_time, reactant_1)
ax1.set_title('Steps vs. Reactant 1 (3-chlorohexane)')

ax2.plot(steps_over_time, reactant_2, 'tab:orange')
ax2.set_title('Steps vs. Reactant 2 (Na)')

ax3.plot(steps_over_time, reward_over_time, 'tab:green')
ax3.set_title('Steps vs. Reward')

ax4.plot(steps_over_time, total_reward_over_time, 'tab:red')
ax4.set_title('Steps vs. Total Reward')
ax4.set_ylabel('Total Reward')

fig.tight_layout()
plt.savefig('Final Subplots Demo Lesson 2.png')

For the second part of the experiment let's uncomment the code that adds the reactants not needed and run our code again.

# Adding Reactants not needed:
action[0] = 1    # temperature
action[1] = 1    # volume
action[5] = 1    # Na
action[4] = 1    # 3-chlorohexane
action[2] = 1    # 1-chlorohexane (not needed)
action[3] = 1    # 2-chlorohexane (not needed)

If we run this code we'll notice that our total reward is significantly lower than the one we got from our previous set of actions. Once again, this happens because other reactions take place instead of the reaction that produces our desired material, since we are now also adding 1-chlorohexane and 2-chlorohexane.
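To build intuition for why the reward drops, consider an idealized picture in which each coupling draws its two partners uniformly from whichever chlorohexanes are in the vessel. With all three present in equal amounts, only pairings where both partners are 3-chlorohexane can yield 4,5-diethyloctane: 1 of the 9 ordered pairs. This is only a back-of-the-envelope sketch, not the environment's actual kinetics:

```python
from itertools import product

species = ["1-chlorohexane", "2-chlorohexane", "3-chlorohexane"]

# all ordered pairs of coupling partners, assumed equally likely
pairs = list(product(species, repeat=2))
hits = sum(1 for a, b in pairs if a == b == "3-chlorohexane")

print(hits, len(pairs))  # 1 9
print(f"fraction yielding the target: {hits / len(pairs):.3f}")  # 0.111
```

The other eight pairings consume our reactants while producing either the wrong symmetric product or the unrewarded cross-product A-B, which is exactly the behaviour we observe in the simulation.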

The next step for this reaction environment is to write an RL implementation that allows an agent to solve this problem for you, essentially maximizing the output of the desired material!
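As a taste of what such an implementation might look like, here is a minimal random-search baseline over constant action vectors. The `ToyReactionEnv` below is a hypothetical stand-in with the same 6-element action interface, not ChemGymRL itself; a real implementation would call `gym.make("WurtzReact-v2")` instead and would likely use a proper RL algorithm rather than random search:

```python
import random

class ToyReactionEnv:
    """Stand-in environment: reward is highest when only the two
    needed reactants (indices 4 and 5) are added at full rate."""
    IDEAL = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]

    def reset(self):
        self.steps = 0
        return [0.0] * 6

    def step(self, action):
        self.steps += 1
        # reward falls off with squared distance from the ideal action
        error = sum((a - b) ** 2 for a, b in zip(action, self.IDEAL))
        reward = max(0.0, 1.0 - error / 6.0)
        done = self.steps >= 20
        return [0.0] * 6, reward, done, {}

def episode_return(env, action):
    """Run one 20-step episode with a constant action and sum the rewards."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(action)
        total += reward
    return total

random.seed(0)
env = ToyReactionEnv()
best_action, best_return = None, float("-inf")
for _ in range(200):
    candidate = [random.random() for _ in range(6)]
    ret = episode_return(env, candidate)
    if ret > best_return:
        best_action, best_return = candidate, ret

print("best return:", round(best_return, 2))
print("best action:", [round(a, 2) for a in best_action])
```

Even this crude search recovers an action vector close to the "only add 3-chlorohexane and Na" strategy we hand-coded above; a learning agent would do the same without being told the answer.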