dev-resources.site
for different kinds of informations.
Hacking Excel Files in Power Automate
Ever since 2003 Microsoft had to open up their Office files for other software, and that meant moving to a universal code based on XML. After .xls we have .xlsx, and the x was for XML.
At its heart a .xlsx file is a zip file full of XML files. All you need to do is change the name to .zip and then you can see what's in the file.
And that made me think, all the Excel API's are doing is edit the XML in the files, something that in theory I could do with Power Automate.
So I went on a crazy journey to see if its possible, originally I wanted to edit a few cells, but then I had a possible use case come with value, cracking a password protected sheet/workbook. I originally built this functionality in VBA, so it gave me a head start.
Call out, there are probably better ways, like a Office Script, and this doesn't work on password protected files (you have to be able to open), but its more the journey then the goal, right?
There are a few things that we will cover
- XML, Excel and Passwords
- Updating the XML
- The Flow
1. XML, Excel and Passwords
As I said above, Excel files are actually a collection of XML files, so all we need to do is nosey around opening the files to figure out what they do. I found the interesting bits are in the xl folder:
Here we have the workbook.xml file, and the folder worksheets, which you guessed it contains all of the worksheets as .xml files. If you open the workbook.xml file you see information like the type, default view, calc status, and most interesting for us the workbookProtection.
As you can see Microsoft are well behaved and don't store the password in plane text, its SHA-512 encrypted. But the this is where Microsoft being well behave ends, as there is no default protection. What does this mean, well if you look at a unprotected workbook, there isn't a default workbookProtection, all they do is remove it. So you guessed it, for us to unlock the workbook we don't need to crack the password, we just need to delete the workbookProtection tag.
Worksheets are exactly the same, just a sheetProtection instead of workbookProtection tag.
So now you now how to crack a Excel file, you can do all of the above manually, save back to zip, rename back to .xlsx and all protection has gone.
2. Editing Excel XML
In this case we are going to edit the XML by removing a tag, but its relatively easy to swap that out for a edit or add. There are 3 different ways to edit which I went into more detail here, but the best way is to convert to a JSON.
Because Power Automate uses JSON as its API language, we need convert the XML to JSON, remove the tag with the built in functions, and then convert back to XML.
The first expression converts the XML to a JSON, but as the content is a string, not XML, we convert it to XML first.
json(xml(outputs('Get_file_content')?['body']))
The second expression uses the removeProperty() function to remove the workbookProtection tag. Though as its a nested tag (inside the workbook tag) we have to update the workbook property with the new workbook tag with the removed workbookProtection
xml(
setProperty(
outputs('Convert_to_JSON')
,
'workbook'
,
removeProperty(
outputs('Convert_to_JSON')?['workbook']
,
'workbookProtection'
)
)
)
Finally we convert it back to XML using the xml() function.
3. The Flow
The complex part here is zipping the file back up again, as there is no action for that. Luckily I already did a blog all about it here https://dev.to/wyattdave/how-to-update-contents-of-a-zip-file-with-power-automate-3b1b, where we leverage an undocumented SharePoint API (the one that creates a zip file when you download multiple files).
So the steps are:
- Create a new file with the Excel files content but name it a zip
- Use the Extract action
- Update the Workbook file
- Loop over the worksheets and update them
- Zip the folder
The flow I'm going to create will have 2 child flows, the update file (aka remove protection) and the zip folder.
Remove Protection
The remove protection flow is relatively simple, we pass in the file path, the tag to remove (workbookProtection for workbook, and sheetProtection for worksheet), and the parent node it is in (workbook for you guessed it, and worksheet for....).
We grab the file content, use our expression to convert to JSON, remove/modify, convert to XML.
xml(
setProperty(
json(xml(outputs('Get_file_content')?['body']))
,
triggerBody()?['text_1']
,
removeProperty(
json(xml(outputs('Get_file_content')?['body']))?[triggerBody()?['text_1']]
,
triggerBody()?['text_2']
)
)
)
The last thing we do is a try/catch condition, using runAfter failed we can figure out if there was any protection set (as it will fail if it wasn't there), in that case we can pass back the unedited XML.
Zip Folder
Im not going into much detail here, as its a little complex and I have done a full blog on it. But in a nutshell we pass the paths of the folder to zip, file name, and a unique GUID (we create that in the parent flow and use it to make every unzip unique). We send a request to run the zip process on the server, then build the request to download the zip file through another request. Finally we create the file in SharePoint (with the .xlsx file type). The childflow then sends back to link to the file.
Full Flow
And that's it, we can now unzip the Excel, edit the XML files inside it, and then re-zip it.
I know the use cases for this are probably limited, but I think its a cool skill to have, and it was fun to figure out π
As always full solution available to download and have a nosey here.
Featured ones: