Hints-Pandas – Hints that Help

Fixing Numpy install errors and Panda install

There are sometimes issues reported on Raspberry Pi setups when installing using pip3 install (or pip install). These will typically mention:

libf77blas.so.3: cannot open shared object file: No such file or directory, The solution will be to either:
sudo apt-get install libatlas-base-dev to install the missing libraries expected by the self-compiled NumPy
Alternatively use the NumPy provided by Raspbian. In which case run:
- pip3 uninstall numpy # remove previously installed version
- apt install python3-numpy

Pandas Module for Merging Files

Import the Pandas module.
Read both the files using the read_excel() function.
Combine them using the merge() function.
Use the to_excel() function, to create the resultant file

# importing the module
import pandas
# reading the files
f1 = pandas.read_excel("registration details.xlsx")f2 = pandas.read_excel("exam results.xlsx")
# merging the files
f3 = f1[["REGISTRATION NO", "STUDENT EMAIL ID "]].merge(f2[["REGISTRATION NO", "Name", "Marks Obtained", "Percentage"]], on = "REGISTRATION NO",
how = "left")
# creating a new file
f3.to_excel("Results.xlsx", index = False)

Merge Example

Pandas Joining Columns to Create a New Column

df[‘new column’] = df[‘column1’] + df[‘column2’]
df[‘new column’] = df[‘column1’] + ‘ ‘ + df[‘column2’] #puts a space between the values

Regular Expressions (regex) with Pandas

df.replace(‘(expressiontoreplace)’, ‘(replacewiththis)’, regex=True, inplace = True)
df.replace(to_replace = ‘(expressiontoreplace)’, value = ‘(replacewiththis)’, inplace = True)
regex testing site ==> pythex.org

Pandas copy a DataFrame to Clipboard

DataFrame.to_clipboard(excel=True, sep=none)
df.to_clipboard(sep=’,’) or (sep=’,’, index = False)
Linux needs modules: xclip or xsel
- sudo apt install xclip
- sudo apt install xsex
may not work with Python 3???

Conditional Search Using Pandas

From video at: https://www.youtube.com/watch?v=EWIUWVsejWY

import numpy as np
import pandas as pd

excel_file = ‘Pandas_Workbook.xlsx’
df = pd.read_excel(excel_file)
print(df)

print(df[‘Name’].where(df[‘Occupation’] == ‘Programmer’))
programmers = df[‘Name’].where(df[‘Occupation’] == ‘Programmer’)
print(programmers.dropna())

excel_files = [‘Pandas_Workbook.xlsx’,’Pandas_Workbook_copy.xlsx’,’Pandas_Workbook_copy_2.xlsx’]

for individual_excel_file in excel_files:
df = pd.read_excel(individual_excel_file)
programmers = df[‘Name’].where(df[‘Occupation’] == ‘Programmer’).dropna()
print(“File Name” + individual_excel_file)
print(programmers)

Interesting Python/Pandas information with examples: Sparks by (Examples)

Automate Excel

https://www.freecodecamp.org/news/automate-excel-tasks-with-python/

Getting the Year from a datetime column

This will add a new column called year to the current dataframe
- df[‘year’] = df[‘datecolumn’].dt.year