Data Science - Individual Assignment 1 Week 2

11 months ago by Dikhi Martin Binus

Individual Assignment Data Science Python Language

Problem Statement

Topic	Introduction to Python Language
Sub Topic	Control Flow and Looping
Reference	LN

Below is the data of employees of a private company who commute to the office using either private or public transportation.

No.	Gender	Employee Age	Salary	Transportation
1	Male	20	8,000,000	Private Vehicle
2	Male	35	14,000,000	Public Transport
3	Female	26	10,000,000	Public Transport
4	Female	27	12,000,000	Private Vehicle
5	Male	21	9,000,000	Private Vehicle
6	Male	22	11,000,000	Private Vehicle
7	Female	32	15,000,000	Public Transport
8	Female	26	8,000,000	Public Transport
9	Male	25	9,000,000	Public Transport
10	Female	20	10,000,000	Private Vehicle

Based on the data in the table, create a Python program and provide a screenshot of the output.

Use control flow to determine job positions based on salary:
- Salary = 8 million – 9 million, position = Officer
- Salary = 10 million – 11 million, position = Supervisor
- Salary = 12 million – 14 million, position = Assistant Manager
- Salary ≥ 15 million, position = Manager
Use looping to display the highest and lowest salaries.

Solution

Control Flow to Determine Positions

# Creating a function to determine positions based on salary
def assign_position(salary):
   if 8000000 <= salary <= 9000000:
       return 'Officer'
   elif 10000000 <= salary <= 11000000:
       return 'Supervisor'
   elif 12000000 <= salary <= 14000000:
       return 'Assistant Manager'
   elif salary >= 15000000:
       return 'Manager'
   else:
       return 'Unknown'

# Adding the Position column to each data row
for row in data:
   row.append(assign_position(row[3]))

Explanation:

The assign_position function takes a salary as its parameter and returns the matching position according to the rules above. A salary that fits none of the ranges returns 'Unknown'.
In the for loop, the function is called on the Salary value of each row (row[3]), and the result is appended to that row as a new "Position" column.
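
The salary ranges in the problem statement leave gaps: a salary of 9,500,000, for example, matches no range and falls through to 'Unknown'. One possible way to close the gaps is to treat the lower bound of each range as a threshold. The sketch below goes beyond the assignment's rules. The names BANDS and assign_position_by_threshold are hypothetical, and counting an in-between salary toward the lower position is an assumption.

```python
# Hypothetical gap-free variant: each band starts at its lower bound and
# runs up to the next band's lower bound (an assumption, not an assignment rule).
BANDS = [
    (15_000_000, 'Manager'),
    (12_000_000, 'Assistant Manager'),
    (10_000_000, 'Supervisor'),
    (8_000_000, 'Officer'),
]

def assign_position_by_threshold(salary):
    # Check the highest threshold first so the first match wins.
    for lower_bound, position in BANDS:
        if salary >= lower_bound:
            return position
    return 'Unknown'  # below the lowest band

for salary in (8_000_000, 9_500_000, 14_000_000, 15_000_000, 5_000_000):
    print(salary, assign_position_by_threshold(salary))
```

For every salary in the table, this variant returns the same position as assign_position. It differs only for values that fall between the original ranges.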

Looping to Display Highest and Lowest Salaries:

# Determining column widths based on the longest value in each column
# (the rows already contain the Position value added in the previous step)
column_widths = [len(str(header[i])) for i in range(len(header))]
for row in data:
   for i in range(len(row)):
       column_widths[i] = max(column_widths[i], len(str(row[i])))

# Displaying the table header
print("\t|\t".join(f"{header[i]:<{column_widths[i]}}" for i in range(len(header))))
print("--" * (sum(column_widths) + len(column_widths) - 1))

# Displaying employee data with positions
for row in data:
   print("\t|\t".join(f"{str(row[i]):<{column_widths[i]}}" for i in range(len(header))))

Explanation:

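The first loop finds the longest value in each column, starting from the header text, so that every cell can be padded to the same width.
The header row and a separator line are printed next, with each column name left-aligned to its calculated width.
The last loop prints each employee row, including the newly determined position, using the same column widths.
The highest and lowest salaries are then read from the data after sorting it by salary, as shown at the end of the full source code.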
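
The padding step relies on a nested f-string format spec: in {value:<{width}}, the "<" left-aligns the value and the inner {width} supplies the column width at run time. The sketch below isolates that step. The two-column sample, the " | " separator, and the format_row helper are hypothetical and chosen only for illustration.

```python
# Hypothetical two-column sample, used only to illustrate the padding step.
header = ["Gender", "Position"]
rows = [["Male", "Officer"], ["Female", "Assistant Manager"]]

# Start from the header widths, then widen each column to its longest cell.
widths = [len(h) for h in header]
for row in rows:
    for i, cell in enumerate(row):
        widths[i] = max(widths[i], len(str(cell)))

def format_row(cells):
    # "<" left-aligns; the nested {w} supplies the width at run time.
    return " | ".join(f"{str(c):<{w}}" for c, w in zip(cells, widths))

print(format_row(header))
for row in rows:
    print(format_row(row))
```

Because every cell in a column is padded to the same width, the separators line up from row to row regardless of how long each individual value is.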

Full Source Code

# Employee data
data = [
   [1, 'Male', 20, 8000000, 'Private Vehicle'],
   [2, 'Male', 35, 14000000, 'Public Transport'],
   [3, 'Female', 26, 10000000, 'Public Transport'],
   [4, 'Female', 27, 12000000, 'Private Vehicle'],
   [5, 'Male', 21, 9000000, 'Private Vehicle'],
   [6, 'Male', 22, 11000000, 'Private Vehicle'],
   [7, 'Female', 32, 15000000, 'Public Transport'],
   [8, 'Female', 26, 8000000, 'Public Transport'],
   [9, 'Male', 25, 9000000, 'Public Transport'],
   [10, 'Female', 20, 10000000, 'Private Vehicle']
]

# Function to determine positions based on salary
def assign_position(salary):
   if 8000000 <= salary <= 9000000:
       return 'Officer'
   elif 10000000 <= salary <= 11000000:
       return 'Supervisor'
   elif 12000000 <= salary <= 14000000:
       return 'Assistant Manager'
   elif salary >= 15000000:
       return 'Manager'
   else:
       return 'Unknown'

# Creating a header for the table
header = ["No.", "Gender", "Employee Age", "Salary", "Transportation", "Position"]

# Adding the Position column to each data row
for row in data:
   row.append(assign_position(row[3]))

# Calculating column widths based on the longest value in each column,
# including the Position values added above
column_widths = [len(str(header[i])) for i in range(len(header))]
for row in data:
   for i in range(len(row)):
       column_widths[i] = max(column_widths[i], len(str(row[i])))

# Displaying the table header
print("\t|\t".join(f"{header[i]:<{column_widths[i]}}" for i in range(len(header))))
print("--" * (sum(column_widths) + len(column_widths) - 1))

# Displaying employee data with positions
for row in data:
   print("\t|\t".join(f"{str(row[i]):<{column_widths[i]}}" for i in range(len(header))))

# Sorting data based on Salary
data.sort(key=lambda x: x[3])

# Displaying the lowest and highest salaries after sorting
print("\nLowest Salary:", data[0][3])
print("Highest Salary:", data[-1][3])
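
The problem statement asks for looping to find the highest and lowest salaries. The code above sorts instead, which gives the same values but also reorders data. A single pass over the salaries is one alternative that leaves the row order untouched. This is a minimal sketch: the salaries list is copied from the table, and the names lowest and highest are chosen for illustration.

```python
# Salaries copied from the employee table (column index 3 in `data`).
salaries = [8000000, 14000000, 10000000, 12000000, 9000000,
            11000000, 15000000, 8000000, 9000000, 10000000]

# Start from the first value, then compare every remaining value against
# the current lowest and highest.
lowest = highest = salaries[0]
for salary in salaries[1:]:
    if salary < lowest:
        lowest = salary
    if salary > highest:
        highest = salary

print("Lowest Salary:", lowest)    # prints: Lowest Salary: 8000000
print("Highest Salary:", highest)  # prints: Highest Salary: 15000000
```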

Live code: https://replit.com/@DikhiMartin1/Individual-Assignment-1-Data-Science-Control-Flow-and-Looping

Simulation

Here is a simulation of the output result:

Live simulation: https://asciinema.org/a/620815

Correlation with Data Science:

Control Flow (Decision Making): The first step uses control flow to assign a position from each salary value. Rule-based decisions like this are common in data processing and analysis.
Looping (Iterative Process): The second step uses loops to calculate column widths and print every row in a neat table. Data science work relies on the same kind of iteration whenever a calculation or transformation must be applied to each record.
Data Manipulation: Adding the "Position" column to each row is an example of data manipulation, which makes the data more informative and ready for further analysis.
Data Presentation: The neat table output helps present the data, which matters when communicating analysis results to stakeholders in the organization.

Conclusion

This article demonstrates how Python can be used to analyze employee data and determine positions based on salary, as well as display information on the smallest and largest salaries. Through the use of control flow and looping, the process is illustrated clearly.

Author

Dikhi Martin, Software Engineer

Script Savvy