I downloaded the Omar Series from www.archive.org and it downloaded very well. But! the filenames were all messed up and I was not going to change them because, lets face it, why do the donkey work when you have Python!
So, At start, the filenames looked like this:
At a glance, you can see, every file has an episode name and its part number. They appear in 2-3 different formats. Our target is to make a convention so that they appear in same format i.e.
OmarSeries-Episode-02-Part-01.mp4 Here the Episode is followed by a hyphen and wherever there are more than one parts, we add a Part-01 or Part-02 after Episode number.
So lets get started. You’ll need:
- Microsoft Excel (Any format after 2007 is good)
- Python 3 installed with following libraries:
- Jupyter Notebook makes it easy to work with spreadsheets (Optional)
So go to the folder where you have put all your files. Press Shift + Right Click and then click on “Copy as Path”
It will copy all files as paths. Now open a spreadsheet and paste it all there.
Now separate file name keeping other path intact. Notice that before filename starts, there is a \ in the beginning. This slash can help us find the index and then using the RIGHT() command, we can separate the filename.
Let me explain this formula a bit. Starting with FIND(). The FIND function finds the position of slash, given as first argument, in A1 cell, given as second argument, and starts finding it from the cursor value given as third argument. In short the function finds 15th character (since all previous slashes are before 15th character and it remains same throughout) of the complete string and then starts finding the Slash and returns the actual position. LEN(A1) function helps find the length of complete string and subtracting it from the position of slash gives us the length of the filename and that is returned.
Once you have successfully implement formula in one cell, simply select the column keeping cell with formula as first row of column and press Ctrl+D to apply formula on complete column.
Before going any further, lets paste the names as values so that we may perform other string tasks. Again select the B column and press Ctrl + C. then Press Ctrl+Alt+V to open Paste Special dialog box.
Press OK once you have selected the settings as per the image. It will “Replace” the formulas with text. Remember to keep the formula saved somewhere as this step will remove all formulas.
Notice that we have some spaces in some filenames. For our next action, we need to remove the spaces from filenames to make them uniform. So first we select the filename by pressing on the “B” written on top of column and then pressing ctrl+H. Note that it is important that you select the column first before replacing any text. Otherwise it will impact your complete sheet and if your path have any spaces, it will also change.
Now Click on the field next to “Find what” and press Space bar once to give one space. Keep Replace With field empty.
It will remove all spaces and the names are returned.
Now, in my case, I was lucky to have a pattern in filenames. We have word Episode written and after than Episode number appears. Similarly, after Part, appears part number. So using same formula we used to find filenames, we can now populate episode numbers and part numbers in next columns. If your data doesn’t support these shortcuts, it will take just 3-4 minutes to write down the episode numbers and part numbers in separate columns. I used following formulas:
B2 has the filename we previously fixed. Find gives us the starting position of the word Episode. By adding 7, we jump to the end of the word and then pick the next 2 characters using MID function.
Notice that there are cases where Episode number has one digit. In those cases we can quickly replace the other character with an empty string and make it work. Then we can quickly convert these numbers into string by concatenating with a “0”. e.g. if we have a 8, we can convert into 08 by using formula like
Why convert? Because we want to use all numbers as strings so that they are sorted properly. Otherwise we will see episode 10 after episode 1 and so on. Same step followed for the Part numbers as well
Similarly, for Part number we can just replace the text in Find by “Part” and replace 7 by 4 to get desired results. Since part numbers go as high as 3 or 4, the number of characters are also lowered from 2 to 1.
Formula will be:
Now the final sheet looks like this before converting to numbers.
Once Converted, they look like this:
Also, the places where we have no “Part”, will raise error. Lets keep the error as-is for our next step
Now we have all bits to construct a new filename. So lets write a filename. As we mentioned earlier, we wanted to construct the filename with following naming convention:
OmarSeries-Episode-02-Part-01.mp4 where part number exists and OmarSeries-Episode-01.mp4 where we dont have any part number.
So in this string, first part can be “OmarSeries-Episode-” after than we concatenate the episode number then check if we have a part number. In that case we concatenate “-Part-” with part number and at the end concatenate “.mp4”. If we dont have a part number, then we concatenate “.mp4” immediately after Episode number. How to get if we have a partnumber or not, we check for errors. If there’s an error, we skip, else we concatenate part number.
Here’s how my formula looks like.
Here & denotes concatenation of two strings. C2 has the Episode number and D2 has the part number. IF takes three parameters, first is the condition to check, then what to do if condition is True and at the end, what to do if condition is False. ISERR function returns a true if the value D2 has error. Else it gives False. If the error exists, I have passed a null string (notice two commas?) and if it is false, the part number is concatenated. In any case the mp4 is concatenated at the end. Save everything and paste it as values and the sheet looks like this:
Notice that wherever we got an error, the part name was omitted.
Now, bring in the path where you want to save the new files. it is recommended to save everything in a separate folder and once it is successfully executed, delete the previous files and replace them. This is just to be on safe side.
To bring in the filepath, simply create the folder or go into the folder and press on the filename on top. Then copy the filepath.
Now in the next column, simply concatenate this path with filename previously constructed.
I added a slash at the end and concatenated with E2.
This will generate all paths.
Now, copy the two paths, i.e. the path of original file and path of new file in a new excel sheet
Save this sheet as filepathchange.xlsx and lets prepare for Python.
In python, create a .py file or open a jupyter notebook and import the filepathchange.xlsx using Pandas.
import pandas as pd
import numpy as np
data = pd.read_excel("filepathchange.xlsx")
Output of above will be:
Now, simply using shutil’s function copypath function, we can create or replace new files for all length of data:
for i in range(len(data)):
Here, it simply uses a for loop to go through the dataframe and executes the command for each row.
Go and checkout the files at new path and enjoy!