Don't know how to do user input in this ML program

Thread Starter

Tom gayle

Joined Sep 20, 2021
83
Find the solubility from the dataset:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Load the dataset
dp = pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')

# Separate features (x) and target variable (y)
y = dp['logS']
x = dp.drop('logS', axis=1)

# Split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Linear Regression Model
lr = LinearRegression()
lr.fit(x_train, y_train)
y_train_pred_lr = lr.predict(x_train)
y_test_pred_lr = lr.predict(x_test)

# Random Forest Regressor Model
k1 = RandomForestRegressor(max_depth=2, random_state=100)
k1.fit(x_train, y_train)
y_train_pred_rf = k1.predict(x_train)
y_test_pred_rf = k1.predict(x_test)

# Evaluate Linear Regression Model
y_train_mse_lr = mean_squared_error(y_train, y_train_pred_lr)
y_train_r2_lr = r2_score(y_train, y_train_pred_lr)
y_test_mse_lr = mean_squared_error(y_test, y_test_pred_lr)
y_test_r2_lr = r2_score(y_test, y_test_pred_lr)

# Evaluate Random Forest Regressor Model
y_train_mse_rf = mean_squared_error(y_train, y_train_pred_rf)
y_train_r2_rf = r2_score(y_train, y_train_pred_rf)
y_test_mse_rf = mean_squared_error(y_test, y_test_pred_rf)
y_test_r2_rf = r2_score(y_test, y_test_pred_rf)

# Create DataFrames
rs_lr = pd.DataFrame({"Method": ["Linear Regression"],
                      "Training MSE": [y_train_mse_lr],
                      "Training R2": [y_train_r2_lr],
                      "Testing MSE": [y_test_mse_lr],
                      "Testing R2": [y_test_r2_lr]})

rs_rf = pd.DataFrame({"Method": ["Random Forest Regressor"],
                      "Training MSE": [y_train_mse_rf],
                      "Training R2": [y_train_r2_rf],
                      "Testing MSE": [y_test_mse_rf],
                      "Testing R2": [y_test_r2_rf]})

# Concatenate DataFrames
finale = pd.concat([rs_lr, rs_rf], ignore_index=True)
print(finale)
I'm a beginner. The program above trains the two models and reports how well they fit, but I want the user to enter the data.
The user would supply MolLogP, MolWt, NumRotatableBonds, and AromaticProportion, and the program should then predict the value of logS. I don't know how to modify the program to take user input.
 

ApacheKid

Joined Jan 12, 2015
1,605
I've never used ML, and it's not clear what the function is doing. Also, the data file contains over 1,400 lines of data, so I have no idea what you want a user to be able to enter. Explain exactly what a "user" would actually do.
 

Thread Starter

Tom gayle

Joined Sep 20, 2021
83
Actually, this is what the program does. It loads a dataset from a URL using pandas. The dataset is about molecular solubility and includes several molecular descriptors. It separates the features (x) from the target variable (y), which in this case is the logarithm of the solubility (logS). It splits the dataset into training and testing sets using the train_test_split function from scikit-learn. The training set is used to train the models, and the testing set is used to evaluate their performance.
It trains two regression models on the training data: a Linear Regression model (lr) and a Random Forest Regressor model (k1). It then uses the trained models to make predictions on both the training and testing sets.
It evaluates both models using mean squared error (MSE) and R-squared (R2) scores, which show how well each model fits the data. It creates two separate DataFrames (rs_lr and rs_rf) to store the evaluation metrics for each model, then concatenates them into a final DataFrame (finale) that summarizes the training and testing performance of both models. Finally, it prints finale, which includes the method name, training MSE, training R2, testing MSE, and testing R2 for both the Linear Regression and Random Forest Regressor models.
The goal of this program is to compare how well the Linear Regression and Random Forest Regressor models predict the solubility of molecules from their descriptors.
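For anyone unsure what the MSE and R2 numbers in that table mean, here is a rough sketch that computes both by hand from their standard definitions. The function names and the toy logS values are made up for illustration and are not part of the program or the dataset.

```python
# Hand-written versions of the two metrics, using their standard definitions.
# The helper names and the toy numbers below are hypothetical.

def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy "actual" and "predicted" logS values (invented for this example)
y_true = [-2.0, -3.0, -4.0, -5.0]
y_pred = [-2.5, -3.0, -3.5, -5.0]

mse = mean_squared_error_manual(y_true, y_pred)
r2 = r2_score_manual(y_true, y_pred)
print("MSE:", mse)
print("R2:", r2)
```

A lower MSE means the predictions are closer to the actual values, and an R2 closer to 1 means the model explains more of the variation in logS.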

Now I want the user to enter those values (MolLogP, MolWt, NumRotatableBonds, and AromaticProportion). Those values are x, and from them I want the value of y (the target variable).
 

Thread Starter

Tom gayle

Joined Sep 20, 2021
83
Python:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Load the dataset
dp = pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')

# Separate features (x) and target variable (y)
y = dp['logS']
x = dp.drop('logS', axis=1)

# Split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Linear Regression Model
lr = LinearRegression()
lr.fit(x_train, y_train)
y_train_pred_lr = lr.predict(x_train)
y_test_pred_lr = lr.predict(x_test)

# Random Forest Regressor Model
k1 = RandomForestRegressor(max_depth=2, random_state=100)
k1.fit(x_train, y_train)
y_train_pred_rf = k1.predict(x_train)
y_test_pred_rf = k1.predict(x_test)

# Evaluate Linear Regression Model
y_train_mse_lr = mean_squared_error(y_train, y_train_pred_lr)
y_train_r2_lr = r2_score(y_train, y_train_pred_lr)
y_test_mse_lr = mean_squared_error(y_test, y_test_pred_lr)
y_test_r2_lr = r2_score(y_test, y_test_pred_lr)

# Evaluate Random Forest Regressor Model
y_train_mse_rf = mean_squared_error(y_train, y_train_pred_rf)
y_train_r2_rf = r2_score(y_train, y_train_pred_rf)
y_test_mse_rf = mean_squared_error(y_test, y_test_pred_rf)
y_test_r2_rf = r2_score(y_test, y_test_pred_rf)

# Create DataFrames
rs_lr = pd.DataFrame({"Method": ["Linear Regression"],
                      "Training MSE": [y_train_mse_lr],
                      "Training R2": [y_train_r2_lr],
                      "Testing MSE": [y_test_mse_lr],
                      "Testing R2": [y_test_r2_lr]})

rs_rf = pd.DataFrame({"Method": ["Random Forest Regressor"],
                      "Training MSE": [y_train_mse_rf],
                      "Training R2": [y_train_r2_rf],
                      "Testing MSE": [y_test_mse_rf],
                      "Testing R2": [y_test_r2_rf]})

# Concatenate DataFrames
finale = pd.concat([rs_lr, rs_rf], ignore_index=True)




# Values entered by the user (assumed to be in the same order as the columns of x)
x1 = float(input("Enter MolLogP: "))
x2 = float(input("Enter MolWt: "))
x3 = float(input("Enter NumRotatableBonds: "))
x4 = float(input("Enter AromaticProportion: "))

# Feature names, taken from the training features so the columns match
feature_names = x.columns.tolist()

# Build a one-row DataFrame with the same column names used for training
xn = pd.DataFrame(data=[[x1, x2, x3, x4]], columns=feature_names)

# Make predictions with both trained models
yn = k1.predict(xn)   # Random Forest Regressor
yn1 = lr.predict(xn)  # Linear Regression
print("Predicted logS (Random Forest):", yn[0])
print("Predicted logS (Linear Regression):", yn1[0])
I modified the code. Is it OK? Is it correct? Can someone help me, please?
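One thing the float(input(...)) lines above do not handle is a reply that is not a number: float() raises ValueError and the program stops. Below is a rough sketch of one way to keep asking until the reply parses. The helper names read_float and read_features and the reader parameter are hypothetical, and the feature list is assumed to match the column order of x. The demo feeds scripted replies instead of calling input() so that it runs without a keyboard.

```python
# Sketch: re-prompt on bad input. read_float / read_features are made-up helpers.

# Assumed to match x.columns in the program above
FEATURES = ["MolLogP", "MolWt", "NumRotatableBonds", "AromaticProportion"]


def read_float(prompt, reader=input):
    """Ask with `reader` until the reply can be converted to a float."""
    while True:
        raw = reader(prompt)
        try:
            return float(raw)
        except ValueError:
            print(f"'{raw}' is not a number, please try again.")


def read_features(feature_names, reader=input):
    """Collect one value per feature, keyed by feature name."""
    return {name: read_float(f"Enter {name}: ", reader) for name in feature_names}


# Demo with scripted replies (the second one is deliberately invalid)
scripted = iter(["2.6", "oops", "167.85", "0", "0.0"])
values = read_features(FEATURES, reader=lambda prompt: next(scripted))

# One row in training-column order, ready for pd.DataFrame([row], columns=FEATURES)
row = [values[name] for name in FEATURES]
print(row)
```

In the real program you would leave out the reader argument so that input() is used, then pass the one-row DataFrame to lr.predict and k1.predict as before.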
 