Maximize Your Client Upload Efficiency with Bulk Upload
Backstory
In one of my previous organizations, there was a requirement to upload N files. Initially, the total count was hardcoded so that only 10, 100, or 200 files could be uploaded at once. Later, however, the requirement changed to allow any number of files or folders (as a flattened BFS array). This was a turning point: firing 10,000+ requests at once will eventually crash the browser, so I came up with a queue-based solution for uploading N files (or, more generally, N requests).
Introduction
Have you ever had to upload a large number of files or folders, only to have your browser crash under too many simultaneous requests? If so, you’ll be happy to know there’s a solution. Introducing Bulk Upload, a library for uploading large numbers of files or folders (as a flattened tree array, similar to the Google Drive folder-upload experience) with bounded concurrency, so performance doesn't suffer. You set the concurrency level that best suits your needs, and the library takes care of the rest. For example, with a concurrency level of 10 and 1,000 files to upload, 10 files sit in the concurrent request pool while the other 990 wait in the queue. Because no more than 10 requests are ever in flight, the browser is not overwhelmed and the uploads complete efficiently.
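The core idea, independent of this library, is a bounded worker pool: at most `limit` tasks run at once and the rest wait their turn. A minimal sketch, assuming each upload can be wrapped in a function that returns a Promise (`runWithConcurrency` and `Settled` are illustrative names, not part of Bulk Upload's API):

```typescript
// Outcome of one task: a failure is recorded instead of stopping the others.
type Settled<T> = { ok: true; value: T } | { ok: false; error: unknown };

// Run async tasks with at most `limit` of them in flight at any moment.
// Results are returned in the same order as `tasks`.
async function runWithConcurrency<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<Settled<T>[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results: Settled<T>[] = new Array(tasks.length);
  let next = 0; // index of the next task still waiting in the "queue"

  // Each worker keeps taking the next waiting task until none are left.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await tasks[index]!() };
      } catch (error) {
        results[index] = { ok: false, error };
      }
    }
  }

  // Start `limit` workers (or fewer if there are fewer tasks) and wait for all.
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Bulk Upload enforces the same bound with explicit Maps instead of worker loops, which keeps the queue and the pool inspectable from the UI.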
Overview
Let’s go step by step.
The UI Wrapper is a simple front-end wrapper around this library. It starts the process by creating a BulkUpload instance and passing it files or folders in a flattened breadth-first search (BFS) format. During the initial phase, the files are split between a queue and an in-progress pool based on the concurrency limit and the total number of files provided.
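The initial split can be sketched as follows. `FileLike`, `Entry` and `segregate` are hypothetical names used only for illustration; the library's own `start()` method performs the equivalent step with its `FileObj` entries.

```typescript
// Hypothetical minimal shapes; the real library stores File or FileHierarchy objects.
type Status = "IN_PROGRESS" | "IN_QUEUE";
interface FileLike { name: string }
interface Entry { id: string; status: Status; file: FileLike }

// The first `concurrency` files go to the in-progress pool; everything else
// waits in the queue. Both are Maps keyed by id, iterated in insertion order.
function segregate(files: FileLike[], concurrency: number) {
  const inProgress = new Map<string, Entry>();
  const inQueue = new Map<string, Entry>();
  files.forEach((file, i) => {
    const id = `${file.name}-${i}`; // index suffix keeps duplicate names distinct
    if (i < concurrency) {
      inProgress.set(id, { id, status: "IN_PROGRESS", file });
    } else {
      inQueue.set(id, { id, status: "IN_QUEUE", file });
    }
  });
  return { inProgress, inQueue };
}
```

The index suffix is a choice of this sketch: keying by name alone, as `start()` does through `getTargetValue()`, would make two files with the same name overwrite each other in the Map.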
The queue is a straightforward Map that keeps track of files that are not yet in progress, and it flushes them into the in-progress pool whenever the pool's size is less than the concurrency limit.
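The flush rule can be sketched as a function that moves the head of the queue into the pool only when a slot is free. Map iteration follows insertion order, so the first entry is the oldest queued item. `flushOne` and `Item` are illustrative names, not the library's API:

```typescript
interface Item { id: string; status: "IN_QUEUE" | "IN_PROGRESS" }

// Move the oldest queued item into the in-progress pool if a slot is free.
// Returns the moved item (so the caller can start its upload) or undefined.
function flushOne(
  inQueue: Map<string, Item>,
  inProgress: Map<string, Item>,
  concurrency: number
): Item | undefined {
  if (inProgress.size >= concurrency) return undefined; // pool is full
  const head = inQueue.values().next();
  if (head.done) return undefined; // nothing is waiting
  const item = head.value;
  inQueue.delete(item.id);
  item.status = "IN_PROGRESS";
  inProgress.set(item.id, item);
  return item;
}
```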
The in-progress pool is where uploads actually start. It uses Axios for request handling and invokes a callback supplied by the UI Wrapper (requestArguments), which returns the payload for the Axios request; this gives the UI Wrapper control over the whole request flow. The in-progress pool is also responsible for sending events, such as failed, canceled, completed, and upload/download progress, to the status manager.
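The inversion of control here is that the pool never builds requests itself: it hands each file entry to a caller-supplied callback and sends whatever configuration comes back. A transport-agnostic sketch in which `RequestConfig`, `Transport`, `PoolEvent` and `uploadEntry` are hypothetical stand-ins (in the library, the transport is Axios and the callback is `requestArguments`):

```typescript
interface Entry { id: string; fileName: string }
// Hypothetical subset of a request config; Axios accepts a much richer object.
interface RequestConfig { url: string; method: "POST" | "PUT"; data: unknown }
type Transport = (config: RequestConfig) => Promise<unknown>;
type PoolEvent =
  | { type: "COMPLETED"; id: string }
  | { type: "FAILED"; id: string; error: unknown };

// Ask the caller for the request, send it, and report the outcome as an event.
async function uploadEntry(
  entry: Entry,
  requestArguments: (entry: Entry) => RequestConfig,
  send: Transport,
  emit: (event: PoolEvent) => void
): Promise<void> {
  try {
    const config = requestArguments(entry); // caller controls URL, method, payload
    await send(config);
    emit({ type: "COMPLETED", id: entry.id });
  } catch (error) {
    // A throwing callback is treated like a failed request, as in uploadFile().
    emit({ type: "FAILED", id: entry.id, error });
  }
}
```

Injecting the transport also makes the pool easy to test with a fake `send`, as the usage below does.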
The status manager plays a crucial role by determining when the queue needs to be flushed into the in-progress pool: whenever the pool's size drops below the concurrency limit, the next item leaves the queue and the cycle repeats. The status manager also receives events from the in-progress pool, informs the queue, and sends updates to the UI Wrapper so it can view real-time snapshots of the in-progress, in-queue, completed, and failed requests.
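A sketch of the status manager's role, assuming the pool reports outcomes as small event objects: it updates the bookkeeping, pushes a snapshot to the UI, and asks for the queue to be flushed because a slot has just freed up. `StatusManager`, `Snapshot` and the callback names are hypothetical; in the library this logic is spread across `uploadFile`, `uploadFailed`, `freeQueue` and `sendUpdateEvent`.

```typescript
interface Entry { id: string; status: "IN_PROGRESS" | "SUCCESS" | "FAILED" }
interface Snapshot { inProgress: number; completed: number; failed: string[] }
interface PoolEvent { type: "COMPLETED" | "FAILED"; id: string }

class StatusManager {
  private completed = 0; // finished uploads only need a count
  private failed = new Map<string, Entry>(); // kept so they can be retried later
  private inProgress: Map<string, Entry>;
  private flushQueue: () => void;
  private onUpdate: (snapshot: Snapshot) => void;

  constructor(
    inProgress: Map<string, Entry>,
    flushQueue: () => void, // moves the next queued file into the pool
    onUpdate: (snapshot: Snapshot) => void // UI wrapper callback
  ) {
    this.inProgress = inProgress;
    this.flushQueue = flushQueue;
    this.onUpdate = onUpdate;
  }

  // Called by the in-progress pool whenever a request settles.
  handle(event: PoolEvent): void {
    const entry = this.inProgress.get(event.id);
    if (!entry) return; // unknown or already-settled id: ignore
    this.inProgress.delete(event.id);
    if (event.type === "COMPLETED") {
      entry.status = "SUCCESS";
      this.completed += 1;
    } else {
      entry.status = "FAILED";
      this.failed.set(entry.id, entry);
    }
    this.onUpdate(this.snapshot()); // tell the UI what changed
    this.flushQueue(); // a slot just freed up, so let the queue fill it
  }

  snapshot(): Snapshot {
    return {
      inProgress: this.inProgress.size,
      completed: this.completed,
      failed: Array.from(this.failed.keys()),
    };
  }
}
```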
The UI Wrapper also has control over canceling, destroying, and retrying requests. Because JavaScript objects are passed by reference, the same file object, including its status (IN_QUEUE, IN_PROGRESS, SUCCESS, FAILED, etc.), is shared as it transitions between pools.
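Because the same object is moved between Maps rather than copied, any reference the UI holds sees status changes immediately. A small illustration with plain objects and Maps (the names are hypothetical):

```typescript
type Status = "IN_QUEUE" | "IN_PROGRESS" | "FAILED";
interface FileEntry { id: string; status: Status }

const inQueue = new Map<string, FileEntry>();
const inProgress = new Map<string, FileEntry>();
const failed = new Map<string, FileEntry>();

inQueue.set("report.pdf", { id: "report.pdf", status: "IN_QUEUE" });

// The UI keeps the reference it received in an earlier update event.
const heldByUi = inQueue.get("report.pdf")!;

// Move the same object from one pool to another, mutating its status.
function move(from: Map<string, FileEntry>, to: Map<string, FileEntry>, id: string, status: Status): void {
  const entry = from.get(id);
  if (!entry) return;
  from.delete(id);
  entry.status = status;
  to.set(id, entry);
}

move(inQueue, inProgress, "report.pdf", "IN_PROGRESS");
move(inProgress, failed, "report.pdf", "FAILED");

console.log(heldByUi.status); // prints FAILED: same object, no copy was made
```

This is what lets the UI pass an object straight back into `retry([FileObj])`: the library checks the live `status` on the very object it stored.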
In summary, files are organized into a queue and an in-progress pool based on the concurrency limit. Uploads are sent with Axios, and the status manager updates the queue and informs the UI Wrapper of each request’s state in real time. The UI Wrapper can also cancel, destroy, and retry requests. Comparing the size of the in-progress pool with the concurrency limit determines when to flush the queue.
import axios, { AxiosProgressEvent, AxiosRequestConfig } from "axios";
export default class BulkUpload {
/**
* @param {number} concurrency - The number of concurrent file uploads allowed.
* @param {function} onUpdate - A callback function that is called whenever there is an update in the upload status.
* @param {Object} requestOptions - Progress-reporting options.
* @param {boolean} [requestOptions.downloadProgress=false] - Whether to report download progress
* @param {boolean} [requestOptions.uploadProgress=false] - Whether to report upload progress
* @param {function} requestArguments - Callback that receives the fileObj and returns the Axios request config (payload) for it
* @param {function} onUploadComplete - Callback invoked once both the in-progress pool and the queue are finished
* @param {number} lastProgressUpload - Minimum interval (ms) between onUpdate calls triggered by upload/download progress
* @param {boolean} isFileHierarchy - Set to true when uploading FileHierarchy objects; to fetch the folder hierarchy, use this package: https://www.npmjs.com/package/files-hierarchy
*/
constructor({
concurrency,
// files,
onUpdate,
requestOptions,
requestArguments,
onUploadComplete,
lastProgressUpload,
isFileHierarchy,
}: Constructor) {
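// INITIAL SETUP: store the options on private fields (this._concurrency, this._onUpdate, ...) and initialize the inQueue, inProgress and failedUploads Maps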
}
/**
* getControls to override upload flow
* @returns {Object} {cancel, retry, destroy, updateQueue}
*/
public getControls() {
return {
cancel: this.cancelOperation,
retry: this.retryFailedOperation,
updateQueue: this.updateQueue,
destroy: this.destroy,
};
}
/**
* Starts processing: splits the files between the queue and the in-progress pool based on the concurrency limit.
* @param {File[] | FileHierarchy[]} files - File or FileHierarchy objects to upload
*/
public start(files: File[] | FileHierarchy[]) {
if (this.initiated) {
//if an upload is already running and more files arrive,
//the extra files are pushed into the queue
return this.updateQueue(files);
}
this.initiated = true;
for (let i = 0; i < files.length; i++) {
const file = files[i]!;
const value = {
status: FileStatus.IN_PROGRESS,
id: this.getTargetValue(file),
...this.getFileTargetVal(file),
};
if (i < this._concurrency) {
value.status = FileStatus.IN_PROGRESS;
this.inProgress.set(value.id, value);
} else {
value.status = FileStatus.IN_QUEUE;
this.inQueue.set(value.id, value);
}
}
this.sendUpdateEvent();
this.startInitialProgress();
}
private startInitialProgress() {
for (const [_, fileObj] of this.inProgress) {
this.uploadFile(fileObj as FileObj);
}
}
/***
updateProgressEvent attaches a progress callback to the Axios request config;
Axios invokes it with the loaded & total byte counts of the XHR request
*/
private updateProgressEvent({
fileObj,
axiosRequestArgs,
type,
}: {
type: "DOWNLOAD" | "UPLOAD";
fileObj: FileObj;
axiosRequestArgs: any;
}) {
try {
const isDownload = type === "DOWNLOAD";
const progressType = isDownload
? "onDownloadProgress"
: "onUploadProgress";
axiosRequestArgs[progressType] = ({
loaded,
total,
}: AxiosProgressEvent) => {
loaded = isNaN(Number(loaded)) ? 0 : Number(loaded);
total = isNaN(Number(total)) ? 0 : Number(total);
//guard against total === 0 (e.g. unknown content length), which would give NaN/Infinity
fileObj[isDownload ? "downloadCount" : "uploadCount"] =
total > 0 ? Math.floor((loaded / total) * 100) : 0;
if (typeof fileObj.lastProgressUpdated !== "number") {
fileObj.lastProgressUpdated = Date.now();
}
//throttle: only send the update event if at least
//_lastProgressUpload ms have passed since the last one
if (
typeof this._lastProgressUpload === "number" &&
Date.now() - fileObj.lastProgressUpdated >= this._lastProgressUpload
) {
this.sendUpdateEvent();
fileObj.lastProgressUpdated = Date.now();
}
};
} catch (e) {
console.error(e);
}
}
//axios upload method: the request arguments are obtained from a callback
//supplied by the caller, which receives the fileObj with all the request info.
private uploadFile(fileObj: FileObj) {
try {
const axiosRequestArgs: AxiosRequestConfig =
this._requestArguments(fileObj);
if (this._downloadProgress) {
this.updateProgressEvent({
fileObj,
type: "DOWNLOAD",
axiosRequestArgs,
});
}
if (this._uploadProgress) {
this.updateProgressEvent({ fileObj, type: "UPLOAD", axiosRequestArgs });
}
//preserve the cancel function within the fileObj stored in the map
//(CancelToken is deprecated in newer Axios versions in favor of AbortController)
axiosRequestArgs.cancelToken = new axios.CancelToken((cancel) => {
fileObj.cancel = cancel;
});
axios(axiosRequestArgs)
.then(() => {
//here progress is deleted from map and status is updated
//later queue is informed about the status change
if (this.destroyed) return;
this.inProgress.delete(fileObj.id);
fileObj.status = FileStatus.SUCCESS;
this.completedUploads += 1;
this.sendUpdateEvent();
this.freeQueue();
})
.catch((requestError) => {
if (this.destroyed) return;
fileObj.isCancelled = !!axios.isCancel(requestError);
this.uploadFailed(fileObj);
});
} catch (e) {
if (this.destroyed) return;
this.uploadFailed(fileObj);
}
}
/**
* Heart of the concurrency mechanism: the in-progress pool size is checked
* before flushing the queue into the pool.
* Informs the queue to remove its head item and push it into the in-progress pool.
*/
private freeQueue(): void {
if (this.inQueue.size === 0 || this.destroyed) {
this.sendUpdateEvent();
//report completion only once nothing is queued AND nothing is still uploading
if (this.inProgress.size === 0 && !this.uploadCompleted) {
this._onUploadComplete?.();
this.uploadCompleted = true;
}
return;
}
if (this.inProgress.size >= this._concurrency) {
return this.sendUpdateEvent();
}
for (let [_, file] of this.inQueue) {
file.status = FileStatus.IN_PROGRESS;
this.inQueue.delete(file.id!);
this.inProgress.set(file.id!, file);
this.sendUpdateEvent();
this.uploadFile(file as FileObj);
//we only want the head of the queue, so break after the first
//iteration
break;
}
}
//same idea here: update the status to FAILED,
//remove it from the progress pool and add it to failedUploads,
//then send the event and inform the queue
private uploadFailed(fileObj: FileObj): void {
fileObj.status = FileStatus.FAILED;
this.inProgress.delete(fileObj.id);
this.failedUploads.set(fileObj.id, fileObj);
this.sendUpdateEvent();
this.freeQueue();
}
/**
* The onUpdate callback is handled by the caller (UI wrapper).
* It is invoked whenever any map data structure (progress, queue,
* failed, completed, etc.) changes, e.g. the queue is flushed or a request fails or completes.
*/
private sendUpdateEvent(): void {
this._onUpdate?.({
IN_PROGRESS: this.inProgress,
IN_QUEUE: this.inQueue,
COMPLETED_UPLOADS: this.completedUploads,
FAILED_UPLOADS: this.failedUploads,
});
}
private cancelOperation = (file: FileObj) => {
if (file.status === FileStatus.IN_PROGRESS) {
file.cancel?.();
}
};
//flip the destroyed flag: this stops all queue-to-progress flushing
//and halts the process; then cancel all on-going requests
//and remove the queued ones
private destroy = () => {
this.destroyed = true;
const now = Date.now();
for (const [id, file] of this.inProgress as Map<string, FileObj>) {
if (file.status === FileStatus.IN_PROGRESS) {
this.cancelOperation(file);
//store a fresh FAILED entry under a new timestamped id so it can be retried
const failed: FileObj = {
status: FileStatus.FAILED,
id: `${this.getTargetValue(
file.fileHierarchy || (file.file as File)
)}-${now}`,
...this.getFileTargetVal(file.fileHierarchy || (file.file as File)),
};
//delete by the ORIGINAL id; the new entry's id carries a timestamp
this.inProgress.delete(id);
this.failedUploads.set(failed.id, failed);
}
}
this.inQueue.clear();
this.sendUpdateEvent();
};
private retryFailedOperation = (fileObjs: FileObj[]) => {
if (!Array.isArray(fileObjs))
throw new Error("Retry Argument must be an array");
const retries: (File | FileHierarchy)[] = [];
const isFile = this.isFileType();
for (let file of fileObjs) {
if (file.status === FileStatus.FAILED) {
this.failedUploads.delete(file.id);
retries.push(isFile ? file.file! : file.fileHierarchy!);
}
}
this.updateQueue(retries);
};
//this method takes care of the new incoming files during an ongoing request
//take those new files and push them to queue pool
//later inform queue about the new files which need processing
private updateQueue = (files: (File | FileHierarchy)[]) => {
this.uploadCompleted = false;
this.destroyed = false;
const now = Date.now();
for (let i = 0; i < files.length; i++) {
const file = files[i]!;
const value = {
status: FileStatus.IN_QUEUE,
id: `${this.getTargetValue(file)}-${now}`,
...this.getFileTargetVal(file),
};
value.status = FileStatus.IN_QUEUE;
this.inQueue.set(value.id, value);
this.freeQueue();
}
this.sendUpdateEvent();
};
private getTargetValue(fileObj: File | FileHierarchy) {
if (fileObj instanceof File) {
return fileObj.name;
}
return fileObj.path;
}
private getFileTargetVal(file: File | FileHierarchy): {
file: File | null;
fileHierarchy: FileHierarchy | null;
isCancelled: boolean;
} {
const isFile = this.isFileType();
return {
file: isFile ? (file as File) : null,
fileHierarchy: !isFile ? (file as FileHierarchy) : null,
isCancelled: false,
};
}
private isFileType(): boolean {
return !this._isFileHierarchy;
}
}
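The progress handling in `updateProgressEvent` combines two small pieces: converting `loaded`/`total` into a percentage and throttling how often `onUpdate` fires. Pulled out as pure functions, they can be tested in isolation. This is a sketch; `toPercent` and `createThrottle` are hypothetical helpers, and the explicit `now` parameter exists only to make the timing deterministic:

```typescript
// Convert byte counts into a 0-100 integer percentage.
// `total` can be missing or 0 (e.g. no Content-Length), so guard against NaN/Infinity.
function toPercent(loaded: number | undefined, total: number | undefined): number {
  const l = Number(loaded);
  const t = Number(total);
  if (!Number.isFinite(l) || !Number.isFinite(t) || t <= 0) return 0;
  return Math.floor((l / t) * 100);
}

// Returns a predicate that is true at most once per `intervalMs`, mirroring the
// lastProgressUpload check: the first call only starts the clock.
function createThrottle(intervalMs: number) {
  let last: number | undefined;
  return (now: number = Date.now()): boolean => {
    if (last === undefined) last = now;
    if (now - last >= intervalMs) {
      last = now;
      return true;
    }
    return false;
  };
}
```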
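Why use a Map data structure? Deleting or looking up an entry by id is an average constant-time operation, whereas removing an element from an array first requires finding its index. A Map is also easier to maintain while looping, and it iterates in insertion order, so the first entry in the queue is always the oldest one.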
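A small comparison of the two removal styles, plus the insertion-order property that makes a Map usable as a FIFO queue (illustrative only; the data is made up):

```typescript
interface QueuedFile { id: string; fileName: string }

const queued: QueuedFile[] = [
  { id: "a", fileName: "a.png" },
  { id: "b", fileName: "b.png" },
  { id: "c", fileName: "c.png" },
];

// Array: removing by id needs a linear scan plus splice (O(n)).
const asArray = [...queued];
const index = asArray.findIndex((f) => f.id === "b");
if (index !== -1) asArray.splice(index, 1);

// Map: removing by id is a direct keyed delete (O(1) on average).
const asMap = new Map(queued.map((f) => [f.id, f] as const));
asMap.delete("b");

// Map iteration follows insertion order, so the head is the oldest entry.
const head = asMap.values().next().value;

console.log(asArray.map((f) => f.id).join(","), Array.from(asMap.keys()).join(","), head?.fileName);
// prints a,c a,c a.png
```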
Why is completedUploads a counter rather than a Map like the others? Once an upload has completed, there is nothing left to cancel or retry, so it doesn't make sense to keep it in a Map or array; a count is all the UI needs.
What is the File Hierarchy? At the beginning, I mentioned a requirement to upload folders as a flattened breadth-first search (BFS) structure. File Hierarchy is an instance produced by another library utility I wrote recently, which converts the webkitdirectory format into a flattened tree. For example, if the webkitdirectory path is “Documents/folder1/home/some-pic.png”, the BFS structure will be “Documents/” -> { “folder1/” -> { “home/” -> { “some-pic.png” }}}. File Hierarchy converts webkitdirectory paths into a flattened BFS array, similar to uploading a folder to Google Drive. If you need webkitdirectory support, you can use File Hierarchy.
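The idea of turning `webkitdirectory` paths into a breadth-first list can be sketched as: collect every folder prefix and file path, then sort by depth so that each folder appears before anything inside it. This is a hypothetical illustration of the concept, not the API or output format of the files-hierarchy package; `flattenBfs` and `HierarchyNode` are names invented for this example.

```typescript
interface HierarchyNode { path: string; isFolder: boolean; depth: number }

// Build a breadth-first (level-by-level) list of folders and files from
// slash-separated relative paths such as "Documents/folder1/home/some-pic.png".
function flattenBfs(relativePaths: string[]): HierarchyNode[] {
  const nodes = new Map<string, HierarchyNode>();
  for (const rel of relativePaths) {
    const parts = rel.split("/").filter(Boolean);
    // Every proper prefix is a folder: "Documents/", "Documents/folder1/", ...
    for (let i = 1; i < parts.length; i++) {
      const folder = parts.slice(0, i).join("/") + "/";
      if (!nodes.has(folder)) nodes.set(folder, { path: folder, isFolder: true, depth: i });
    }
    const filePath = parts.join("/");
    nodes.set(filePath, { path: filePath, isFolder: false, depth: parts.length });
  }
  // Stable sort by depth: parents come before children, and entries at the
  // same depth keep the order in which they were discovered.
  return Array.from(nodes.values()).sort((a, b) => a.depth - b.depth);
}
```

In a browser, the input could come from an `<input type="file" webkitdirectory>` via `Array.from(input.files, (f) => f.webkitRelativePath)`.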
Usage
npm i browser-bulk-upload
import BulkUpload from "browser-bulk-upload";
const bulkUpload = new BulkUpload({
concurrency: 2,
//synchronous function for returning axios request args
requestArguments: ({ file, fileHierarchy }: any) => {
//fileHierarchy -> please refer isFileHierarchy flag comment below
const formData = new FormData();
formData.append("file", file);
return {
url: "http://localhost:3000/upload",
method: "POST",
headers: {
"Content-Type": "multipart/form-data",
},
data: formData,
};
},
lastProgressUpload: 100, //at most every 100ms, download/upload progress is updated and the onUpdate callback is invoked
onUpdate: ({
COMPLETED_UPLOADS /**Number */,
FAILED_UPLOADS /**MAP -> [(name) => FileObj]**/,
IN_QUEUE /**MAP -> [(name) => FileObj]**/,
IN_PROGRESS /**MAP -> [(name) => FileObj]**/,
}) => {
//on complete, failed, inQueue & inProgress structure update callback is invoked
onUploadUpdate({
COMPLETED_UPLOADS,
FAILED_UPLOADS,
IN_QUEUE,
IN_PROGRESS,
});
},
onUploadComplete: () => {
console.log("request completed");
},
requestOptions: {
uploadProgress: true, //send request upload percentage
// downloadProgress: true, send request download percentage
},
isFileHierarchy: false /**enable this flag if you need to upload folders as a BFS structure (like Google Drive folder upload); to fetch all folder path(s),
please use this library: https://www.npmjs.com/package/files-hierarchy
**/,
});
const { cancel, destroy, retry, updateQueue } = bulkUpload.getControls();
/**
* cancel -> cancel an in-progress request -> cancel(FileObj)
* destroy -> cancel all in-progress and remove all in-queue request(s) -> destroy()
* retry -> retry only failed request(s) -> retry([FileObj])
* updateQueue -> add files to the existing upload queue -> updateQueue([File | FileHierarchy]). Note: calling start() again while an upload is running calls updateQueue internally
*/
function onUploadUpdate({
COMPLETED_UPLOADS, //number
FAILED_UPLOADS,
IN_QUEUE,
IN_PROGRESS,
}: EventType) {
/**FAILED|IN_QUEUE, IN_PROGRESS ->
* MAP{ FILE_NAME_ID ->
* FileObj = {
file: File | null;
fileHierarchy: FileHierarchy | null;
status: FileStatus;
uploadCount?: number;
downloadCount?: number;
isCancelled?: boolean; //if cancelled by user else request failed
id: string;
lastProgressUpdated?: number;
};
* }**/
//cancel(FileObj)
//retry([FileObj, FileObj])
//updateQueue([FileObj.file || FileObj.fileHierarchy])
}
//start the upload
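document.querySelector("input")?.addEventListener("change", (e) => {
const input = e.target as HTMLInputElement;
//FileList is array-like; convert it to the File[] that start() expects
if (input.files) bulkUpload.start(Array.from(input.files));
});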
Additionally, I created a React wrapper for this library as it would be less practical to use in its vanilla form for most use cases.
To wrap up, here’s a small demo.
I hope you found my bulk upload implementation informative and helpful! There is always room for improvement, such as incorporating web workers, and I would love to hear your thoughts and feedback.
Thank you.