CocoCaptions in PyTorch (1)
Published at 1/8/2025
Categories: python, pytorch, cococaptions, dataset
Author: hyperkai
*Memos:
- My post explains CocoCaptions() using train2017 with captions_train2017.json, instances_train2017.json and person_keypoints_train2017.json, val2017 with captions_val2017.json, instances_val2017.json and person_keypoints_val2017.json, and test2017 with image_info_test2017.json and image_info_test-dev2017.json.
- My post explains CocoCaptions() using train2017 with stuff_train2017.json, val2017 with stuff_val2017.json, stuff_train2017_pixelmaps with stuff_train2017.json, stuff_val2017_pixelmaps with stuff_val2017.json, panoptic_train2017 with panoptic_train2017.json, panoptic_val2017 with panoptic_val2017.json, and unlabeled2017 with image_info_unlabeled2017.json.
- My post explains CocoDetection() using train2014 with captions_train2014.json, instances_train2014.json and person_keypoints_train2014.json, val2014 with captions_val2014.json, instances_val2014.json and person_keypoints_val2014.json, test2014 with image_info_test2014.json, and test2015 with image_info_test2015.json and image_info_test-dev2015.json.
- My post explains CocoDetection() using train2017 with captions_train2017.json, instances_train2017.json and person_keypoints_train2017.json, val2017 with captions_val2017.json, instances_val2017.json and person_keypoints_val2017.json, and test2017 with image_info_test2017.json and image_info_test-dev2017.json.
- My post explains CocoDetection() using train2017 with stuff_train2017.json, val2017 with stuff_val2017.json, stuff_train2017_pixelmaps with stuff_train2017.json, stuff_val2017_pixelmaps with stuff_val2017.json, panoptic_train2017 with panoptic_train2017.json, panoptic_val2017 with panoptic_val2017.json, and unlabeled2017 with image_info_unlabeled2017.json.
- My post explains MS COCO.
CocoCaptions() can use the MS COCO dataset as shown below. *This is for train2014 with captions_train2014.json, instances_train2014.json and person_keypoints_train2014.json, val2014 with captions_val2014.json, instances_val2014.json and person_keypoints_val2014.json, test2014 with image_info_test2014.json, and test2015 with image_info_test2015.json and image_info_test-dev2015.json:
*Memos:
- The 1st argument is root (Required-Type: str or pathlib.Path): *Memos:
  - It's the path to the images.
  - An absolute or relative path is possible.
- The 2nd argument is annFile (Required-Type: str or pathlib.Path): *Memos:
  - It's the path to the annotations.
  - An absolute or relative path is possible.
- The 3rd argument is transform (Optional-Default: None-Type: callable).
- The 4th argument is target_transform (Optional-Default: None-Type: callable).
- The 5th argument is transforms (Optional-Default: None-Type: callable).
- pycocotools is required on Windows, Linux and macOS: *Memos:
  - e.g. pip install pycocotools.
  - e.g. conda install conda-forge::pycocotools.
  - Don't install pycocotools from cocodataset/cocoapi or philferriere/cocoapi: those builds don't work, and even when they do, installing pycocotools from them takes a long time.
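The annFile passed as the 2nd argument follows the COCO annotation format: a JSON object whose `annotations` entries each carry an `image_id` and a `caption`. A minimal sketch of that layout with toy data (not the real captions_train2014.json), grouping captions per image id roughly the way CocoCaptions does via pycocotools under the hood:

```python
# A toy annotation dict mimicking the COCO captions JSON layout
# (the real captions_*.json files have the same top-level keys).
ann = {
    "images": [{"id": 9, "file_name": "COCO_train2014_000000000009.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 9, "caption": "A zebra in a field."},
        {"id": 2, "image_id": 9, "caption": "Three zebras grazing."},
    ],
}

# Group captions per image id; indexing the dataset returns
# (image, list_of_captions) built from exactly this grouping.
captions = {}
for a in ann["annotations"]:
    captions.setdefault(a["image_id"], []).append(a["caption"])

print(captions[9])  # ['A zebra in a field.', 'Three zebras grazing.']
```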
- You need to manually download the datasets (images and annotations) you want and extract them to coco/ from here, as shown below. *You can use another folder structure:
data
└-coco
  |-imgs
  | |-train2014
  | | |-COCO_train2014_000000000009.jpg
  | | |-COCO_train2014_000000000025.jpg
  | | |-COCO_train2014_000000000030.jpg
  | | ...
  | |-val2014/
  | |-test2014/
  | |-test2015/
  | |-train2017/
  | |-val2017/
  | |-test2017/
  | └-unlabeled2017/
  └-anns
    |-trainval2014
    | |-captions_train2014.json
    | |-instances_train2014.json
    | |-person_keypoints_train2014.json
    | |-captions_val2014.json
    | |-instances_val2014.json
    | └-person_keypoints_val2014.json
    |-test2014
    | └-image_info_test2014.json
    |-test2015
    | |-image_info_test2015.json
    | └-image_info_test-dev2015.json
    |-trainval2017
    | |-captions_train2017.json
    | |-instances_train2017.json
    | |-person_keypoints_train2017.json
    | |-captions_val2017.json
    | |-instances_val2017.json
    | └-person_keypoints_val2017.json
    |-test2017
    | |-image_info_test2017.json
    | └-image_info_test-dev2017.json
    |-stuff_trainval2017
    | |-stuff_train2017.json
    | |-stuff_val2017.json
    | |-stuff_train2017_pixelmaps/
    | | |-000000000009.png
    | | |-000000000025.png
    | | |-000000000030.png
    | | ...
    | |-stuff_val2017_pixelmaps/
    | └-deprecated-challenge2017
    |   |-train-ids.txt
    |   └-val-ids.txt
    |-panoptic_trainval2017
    | |-panoptic_train2017.json
    | |-panoptic_val2017.json
    | |-panoptic_train2017/
    | | |-000000000389.png
    | | |-000000000404.png
    | | |-000000000438.png
    | | ...
    | └-panoptic_val2017/
    └-unlabeled2017
      └-image_info_unlabeled2017.json
from torchvision.datasets import CocoCaptions
cap_train2014_data = CocoCaptions(
root="data/coco/imgs/train2014",
annFile="data/coco/anns/trainval2014/captions_train2014.json"
)
cap_train2014_data = CocoCaptions(
root="data/coco/imgs/train2014",
annFile="data/coco/anns/trainval2014/captions_train2014.json",
transform=None,
target_transform=None,
transforms=None
)
ins_train2014_data = CocoCaptions(
root="data/coco/imgs/train2014",
annFile="data/coco/anns/trainval2014/instances_train2014.json"
)
pk_train2014_data = CocoCaptions(
root="data/coco/imgs/train2014",
annFile="data/coco/anns/trainval2014/person_keypoints_train2014.json"
)
len(cap_train2014_data), len(ins_train2014_data), len(pk_train2014_data)
# (82783, 82783, 82783)
cap_val2014_data = CocoCaptions(
root="data/coco/imgs/val2014",
annFile="data/coco/anns/trainval2014/captions_val2014.json"
)
ins_val2014_data = CocoCaptions(
root="data/coco/imgs/val2014",
annFile="data/coco/anns/trainval2014/instances_val2014.json"
)
pk_val2014_data = CocoCaptions(
root="data/coco/imgs/val2014",
annFile="data/coco/anns/trainval2014/person_keypoints_val2014.json"
)
len(cap_val2014_data), len(ins_val2014_data), len(pk_val2014_data)
# (40504, 40504, 40504)
test2014_data = CocoCaptions(
root="data/coco/imgs/test2014",
annFile="data/coco/anns/test2014/image_info_test2014.json"
)
test2015_data = CocoCaptions(
root="data/coco/imgs/test2015",
annFile="data/coco/anns/test2015/image_info_test2015.json"
)
testdev2015_data = CocoCaptions(
root="data/coco/imgs/test2015",
annFile="data/coco/anns/test2015/image_info_test-dev2015.json"
)
len(test2014_data), len(test2015_data), len(testdev2015_data)
# (40775, 81434, 20288)
cap_train2014_data
# Dataset CocoCaptions
# Number of datapoints: 82783
# Root location: data/coco/imgs/train2014
cap_train2014_data.root
# 'data/coco/imgs/train2014'
print(cap_train2014_data.transform)
# None
print(cap_train2014_data.target_transform)
# None
print(cap_train2014_data.transforms)
# None
cap_train2014_data.coco
# <pycocotools.coco.COCO at 0x759028ee1d00>
cap_train2014_data[26]
# (<PIL.Image.Image image mode=RGB size=427x640>,
# ['three zeebras standing in a grassy field walking',
# 'Three zebras are standing in an open field.',
# 'Three zebra are walking through the grass of a field.',
# 'Three zebras standing on a grassy dirt field.',
# 'Three zebras grazing in green grass field area.'])
cap_train2014_data[179]
# (<PIL.Image.Image image mode=RGB size=480x640>,
# ['a young guy walking in a forrest holding an object in his hand',
# 'A partially black and white photo of a man throwing ... the woods.',
# 'A disc golfer releases a throw from a dirt tee ... wooded course.',
# 'The person is in the clearing of a wooded area. ',
# 'a person throwing a frisbee at many trees '])
cap_train2014_data[194]
# (<PIL.Image.Image image mode=RGB size=428x640>,
# ['A person on a court with a tennis racket.',
# 'A man that is holding a racquet standing in the grass.',
# 'A tennis player hits the ball during a match.',
# 'The tennis player is poised to serve a ball.',
# 'Man in white playing tennis on a court.'])
ins_train2014_data[26] # Error: instances annotations have no 'caption' key
ins_train2014_data[179] # Error
ins_train2014_data[194] # Error
pk_train2014_data[26]
# (<PIL.Image.Image image mode=RGB size=427x640>, [])
pk_train2014_data[179] # Error
pk_train2014_data[194] # Error
cap_val2014_data[26]
# (<PIL.Image.Image image mode=RGB size=640x360>,
# ['a close up of a child next to a cake with balloons',
# 'A baby sitting in front of a cake wearing a tie.',
# 'The young boy is dressed in a tie that matches his cake. ',
# 'A child eating a birthday cake near some balloons.',
# 'A baby eating a cake with a tie around ... the background.'])
cap_val2014_data[179]
# (<PIL.Image.Image image mode=RGB size=500x302>,
# ['Many small children are posing together in the ... white photo. ',
# 'A vintage school picture of grade school aged children.',
# 'A black and white photo of a group of kids.',
# 'A group of children standing next to each other.',
# 'A group of children standing and sitting beside each other. '])
cap_val2014_data[194]
# (<PIL.Image.Image image mode=RGB size=640x427>,
# ['A man hitting a tennis ball with a racquet.',
# 'champion tennis player swats at the ball hoping to win',
# 'A man is hitting his tennis ball with a recket on the court.',
# 'a tennis player on a court with a racket',
# 'A professional tennis player hits a ball as fans watch.'])
ins_val2014_data[26] # Error
ins_val2014_data[179] # Error
ins_val2014_data[194] # Error
pk_val2014_data[26] # Error
pk_val2014_data[179] # Error
pk_val2014_data[194] # Error
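The errors above occur because CocoCaptions builds its target by extracting the caption field from each annotation; instances and person-keypoints annotations have no such field, so indexing raises a KeyError unless the image has no annotations at all, in which case you just get an empty list (as with pk_train2014_data[26]). A minimal sketch of that behavior with toy annotation dicts (the helper name is my own):

```python
def load_captions(anns):
    # Mirrors CocoCaptions' target loading: keep only the 'caption' field.
    return [ann["caption"] for ann in anns]

caption_anns = [{"image_id": 9, "caption": "Three zebras grazing."}]
instance_anns = [{"image_id": 9, "bbox": [1.0, 2.0, 3.0, 4.0], "category_id": 24}]

print(load_captions(caption_anns))  # ['Three zebras grazing.']
print(load_captions([]))            # [] <- no annotations: empty list, no error

try:
    load_captions(instance_anns)
except KeyError as e:
    print(f"KeyError: {e}")  # instances annotations lack the 'caption' key
```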
test2014_data[26]
# (<PIL.Image.Image image mode=RGB size=640x640>, [])
test2014_data[179]
# (<PIL.Image.Image image mode=RGB size=640x480>, [])
test2014_data[194]
# (<PIL.Image.Image image mode=RGB size=640x360>, [])
test2015_data[26]
# (<PIL.Image.Image image mode=RGB size=640x480>, [])
test2015_data[179]
# (<PIL.Image.Image image mode=RGB size=640x426>, [])
test2015_data[194]
# (<PIL.Image.Image image mode=RGB size=640x480>, [])
testdev2015_data[26]
# (<PIL.Image.Image image mode=RGB size=640x360>, [])
testdev2015_data[179]
# (<PIL.Image.Image image mode=RGB size=640x480>, [])
testdev2015_data[194]
# (<PIL.Image.Image image mode=RGB size=640x480>, [])
import matplotlib.pyplot as plt
def show_images(data, ims, main_title=None):
file = data.root.split('/')[-1]
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(14, 8))
fig.suptitle(t=main_title, y=0.9, fontsize=14)
x_crd = 0.02
for i, axis in zip(ims, axes.ravel()):
if data[i][1]:
im, anns = data[i]
axis.imshow(X=im)
y_crd = 0.0
for j, ann in enumerate(iterable=anns):
text_list = ann.split()
if len(text_list) > 9:
text = " ".join(text_list[0:10]) + " ..."
else:
text = " ".join(text_list)
plt.figtext(x=x_crd, y=y_crd, fontsize=10,
s=f'{j}:\n{text}')
y_crd -= 0.06
x_crd += 0.325
if i == 2 and file == "val2017":
x_crd += 0.06
elif not data[i][1]:
im, _ = data[i]
axis.imshow(X=im)
fig.tight_layout()
plt.show()
ims = (26, 179, 194)
show_images(data=cap_train2014_data, ims=ims,
main_title="cap_train2014_data")
show_images(data=cap_val2014_data, ims=ims,
main_title="cap_val2014_data")
show_images(data=test2014_data, ims=ims,
main_title="test2014_data")
show_images(data=test2015_data, ims=ims,
main_title="test2015_data")
show_images(data=testdev2015_data, ims=ims,
main_title="testdev2015_data")
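If you wrap cap_train2014_data in a DataLoader, the default collation will fail: samples hold PIL images and variable-length caption lists, which can't be stacked into batches. A common workaround is a custom collate_fn that simply keeps images and captions in parallel Python lists; here is a hypothetical sketch (the function name and the toy batch are my own, with strings standing in for PIL images):

```python
def coco_collate(batch):
    # batch: list of (image, captions) pairs as returned by CocoCaptions.
    # Keep images and caption lists in parallel lists instead of stacking,
    # since caption counts vary and images may still be PIL objects.
    images = [img for img, _ in batch]
    captions = [caps for _, caps in batch]
    return images, captions

# Toy batch standing in for two CocoCaptions samples.
batch = [
    ("img_a", ["cap 1", "cap 2"]),
    ("img_b", ["cap 3", "cap 4", "cap 5"]),
]
images, captions = coco_collate(batch)
print(images)    # ['img_a', 'img_b']
print(captions)  # [['cap 1', 'cap 2'], ['cap 3', 'cap 4', 'cap 5']]
```

You would pass it as DataLoader(cap_train2014_data, batch_size=4, collate_fn=coco_collate), then transform the images inside the training loop.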