Don't know how to do user input in this ML program

Tom gayle · Feb 4, 2024

Find the solubility from the dataset:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Load the dataset
dp = pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')

# Separate features (x) and target variable (y)
y = dp['logS']
x = dp.drop('logS', axis=1)

# Split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Linear Regression Model
lr = LinearRegression()
lr.fit(x_train, y_train)
y_train_pred_lr = lr.predict(x_train)
y_test_pred_lr = lr.predict(x_test)

# Random Forest Regressor Model
k1 = RandomForestRegressor(max_depth=2, random_state=100)
k1.fit(x_train, y_train)
y_train_pred_rf = k1.predict(x_train)
y_test_pred_rf = k1.predict(x_test)

# Evaluate Linear Regression Model
y_train_mse_lr = mean_squared_error(y_train, y_train_pred_lr)
y_train_r2_lr = r2_score(y_train, y_train_pred_lr)
y_test_mse_lr = mean_squared_error(y_test, y_test_pred_lr)
y_test_r2_lr = r2_score(y_test, y_test_pred_lr)

# Evaluate Random Forest Regressor Model
y_train_mse_rf = mean_squared_error(y_train, y_train_pred_rf)
y_train_r2_rf = r2_score(y_train, y_train_pred_rf)
y_test_mse_rf = mean_squared_error(y_test, y_test_pred_rf)
y_test_r2_rf = r2_score(y_test, y_test_pred_rf)

# Create DataFrames
rs_lr = pd.DataFrame({"Method": ["Linear Regression"],
                      "Training MSE": [y_train_mse_lr],
                      "Training R2": [y_train_r2_lr],
                      "Testing MSE": [y_test_mse_lr],
                      "Testing R2": [y_test_r2_lr]})

rs_rf = pd.DataFrame({"Method": ["Random Forest Regressor"],
                      "Training MSE": [y_train_mse_rf],
                      "Training R2": [y_train_r2_rf],
                      "Testing MSE": [y_test_mse_rf],
                      "Testing R2": [y_test_r2_rf]})

# Concatenate DataFrames
finale = pd.concat([rs_lr, rs_rf], ignore_index=True)
print(finale)

I'm a beginner. The program above trains two models and reports how well they fit, but I want the user to enter the data instead.
The user would supply MolLogP, MolWt, NumRotatableBonds, and AromaticProportion, and the program should then predict the value of logS. I don't know how to modify the program to take user input.

ApacheKid · Feb 4, 2024

I've never used ML, and it's not clear what the function is doing. Also, the data file contains over 1,400 lines of data, so I have no idea what you want a user to be able to enter. Can you explain exactly what a "user" would actually do?

Tom gayle · Feb 4, 2024

This is what the program does. It loads a dataset from a URL using pandas. The dataset is about molecular solubility and includes several molecular descriptors. The program separates the features (x) from the target variable (y), which here is the logarithm of the solubility (logS). It splits the dataset into training and testing sets with the train_test_split function from scikit-learn. The training set is used to train the models, and the testing set is used to evaluate their performance.
It trains two regression models, a Linear Regression model (lr) and a Random Forest Regressor model (k1), on the training data. It then uses the trained models to make predictions on both the training and testing sets.
It evaluates both models using mean squared error (MSE) and R-squared (R2) scores, which show how well each model fits the data. It creates two separate DataFrames (rs_lr and rs_rf) to store the evaluation metrics for each model and concatenates them into a final DataFrame (finale) that summarizes the training and testing performance of both models. Finally, it prints finale, which lists the method name, training MSE, training R2, testing MSE, and testing R2 for both the Linear Regression and Random Forest Regressor models.
The goal of this program is to compare the performance of the Linear Regression and Random Forest Regressor models in predicting the solubility of molecules based on their descriptors.
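
The two metrics named above can be worked through by hand on a few numbers. This is a minimal sketch in plain Python with made-up values, not data from the solubility dataset. The helper names `mse` and `r2` are illustrative; they follow the standard definitions (mean of the squared residuals, and 1 minus the residual sum of squares divided by the total sum of squares), which are what scikit-learn's `mean_squared_error` and `r2_score` compute by default.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: 1 - (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up "true" and "predicted" logS values, for illustration only.
y_true = [-1.0, -2.0, -3.0, -4.0]
y_pred = [-1.5, -2.0, -2.5, -4.0]

print("MSE:", mse(y_true, y_pred))  # squared residuals sum to 0.5, so MSE is 0.125
print("R2:", r2(y_true, y_pred))    # 1 - 0.5 / 5.0, about 0.9
```

A lower MSE and an R2 closer to 1 both indicate a closer fit, which is how the rows of finale can be compared.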

Now I want the user to enter those values (MolLogP, MolWt, NumRotatableBonds, and AromaticProportion), which are the x, and I want the program to return the value of y (the target variable).

Tom gayle · Feb 4, 2024

Python:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error

# Load the dataset
dp = pd.read_csv('https://raw.githubusercontent.com/dataprofessor/data/master/delaney_solubility_with_descriptors.csv')

# Separate features (x) and target variable (y)
y = dp['logS']
x = dp.drop('logS', axis=1)

# Split the data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=100)

# Linear Regression Model
lr = LinearRegression()
lr.fit(x_train, y_train)
y_train_pred_lr = lr.predict(x_train)
y_test_pred_lr = lr.predict(x_test)

# Random Forest Regressor Model
k1 = RandomForestRegressor(max_depth=2, random_state=100)
k1.fit(x_train, y_train)
y_train_pred_rf = k1.predict(x_train)
y_test_pred_rf = k1.predict(x_test)

# Evaluate Linear Regression Model
y_train_mse_lr = mean_squared_error(y_train, y_train_pred_lr)
y_train_r2_lr = r2_score(y_train, y_train_pred_lr)
y_test_mse_lr = mean_squared_error(y_test, y_test_pred_lr)
y_test_r2_lr = r2_score(y_test, y_test_pred_lr)

# Evaluate Random Forest Regressor Model
y_train_mse_rf = mean_squared_error(y_train, y_train_pred_rf)
y_train_r2_rf = r2_score(y_train, y_train_pred_rf)
y_test_mse_rf = mean_squared_error(y_test, y_test_pred_rf)
y_test_r2_rf = r2_score(y_test, y_test_pred_rf)

# Create DataFrames
rs_lr = pd.DataFrame({"Method": ["Linear Regression"],
                      "Training MSE": [y_train_mse_lr],
                      "Training R2": [y_train_r2_lr],
                      "Testing MSE": [y_test_mse_lr],
                      "Testing R2": [y_test_r2_lr]})

rs_rf = pd.DataFrame({"Method": ["Random Forest Regressor"],
                      "Training MSE": [y_train_mse_rf],
                      "Training R2": [y_train_r2_rf],
                      "Testing MSE": [y_test_mse_rf],
                      "Testing R2": [y_test_r2_rf]})

# Concatenate DataFrames
finale = pd.concat([rs_lr, rs_rf], ignore_index=True)




# Feature names, in the same order as the training columns
feature_names = x.columns.tolist()

# Ask the user for each descriptor by name
values = [float(input(f"Enter {name}: ")) for name in feature_names]

# Build a one-row DataFrame with the same column names as the training data
xn = pd.DataFrame(data=[values], columns=feature_names)

# Make predictions with both trained models
yn = k1.predict(xn)   # Random Forest Regressor
yn1 = lr.predict(xn)  # Linear Regression
print("Predicted logS (Random Forest):", yn[0])
print("Predicted logS (Linear Regression):", yn1[0])

I modified the code as shown above. Is it correct? Can someone help me, please?
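
One way to sanity-check this approach without downloading the dataset is to run the same steps on a few made-up rows. The sketch below is an illustration under stated assumptions, not the program from this thread: the training numbers and the target formula are invented, `read_float` and `predict_logs` are hypothetical helper names, and only the four column names come from the posts above. It shows two things the modified code depends on: the single input row should be a DataFrame with the same column names, in the same order, as the training data, and `float(...)` raises `ValueError` on non-numeric text unless that is caught.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["MolLogP", "MolWt", "NumRotatableBonds", "AromaticProportion"]

# Invented rows for illustration only; NOT taken from the Delaney dataset.
train = pd.DataFrame(
    [
        [2.0, 150.0, 1.0, 0.0],
        [1.0, 100.0, 0.0, 0.5],
        [3.5, 250.0, 4.0, 0.3],
        [0.5, 80.0, 2.0, 0.0],
        [4.0, 300.0, 3.0, 0.6],
        [2.5, 180.0, 5.0, 0.2],
    ],
    columns=FEATURES,
)
# Invented target: an exact linear function of two features, so the fit is easy to check.
train["logS"] = -1.0 * train["MolLogP"] - 0.01 * train["MolWt"]

model = LinearRegression().fit(train[FEATURES], train["logS"])


def read_float(prompt, read=input):
    """Keep asking until the text parses as a float (hypothetical helper)."""
    while True:
        text = read(prompt)
        try:
            return float(text)
        except ValueError:
            print(f"{text!r} is not a number, try again.")


def predict_logs(fitted_model, values):
    """Predict logS for one molecule; values must be in FEATURES order."""
    row = pd.DataFrame([values], columns=FEATURES)
    return float(fitted_model.predict(row)[0])


# Scripted answers stand in for keyboard input so the sketch runs unattended.
# The first answer is deliberately invalid to show the retry.
answers = iter(["abc", "2.0", "150", "1", "0"])
values = [read_float(f"Enter {name}: ", read=lambda _prompt: next(answers))
          for name in FEATURES]

# This row matches the first training row, so the result should be close to -3.5.
print("Predicted logS:", predict_logs(model, values))
```

To take real keyboard input, drop the `read=` argument so `read_float` falls back to the built-in `input`. The same `predict_logs` call would work with either fitted model from the thread (lr or k1), as long as the values are given in the column order the model was trained on.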
