diff --git a/README-Benchmarks.md b/README-Benchmarks.md
new file mode 100644
index 0000000000000000000000000000000000000000..55f889fc5044cb17f753c3d7a5c120d50676ce74
--- /dev/null
+++ b/README-Benchmarks.md
@@ -0,0 +1,42 @@
+
+CIFAR-10: Computer-vision image dataset used for object recognition
+- Description: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
+- Size: 163 MB (python version)
+- File type: compressed archive
+- License: If you're going to use this dataset, please cite the tech report: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
+- Tag:
+- Prace dataset location:
+- url: https://www.cs.toronto.edu/~kriz/cifar.html
+- Download url: https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz, md5sum: c58f30108f718f92721af3b95e74349a
+- Prace Download script: ./cifar10-download.sh
+
+===================================================================
+===================================================================
+
+ImageNet: Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)
+- Description: Training data 1,281,167 224x224 colour images in 1000 synsets. Validation data 50 images/synset. Test data 100 images/synset.
+- Size:
+
+ Training images (Task 1 & 2). 138GB. MD5: 1d675b47d978889d74fa0da5fadfb00e
+ Training images (Task 3). 728MB. MD5: ccaf1013018ac1037801578038d370da
+ Validation images (all tasks). 6.3GB. MD5: 29b22e2961454d5413ddabcf34fc5622
+ Test images (all tasks). 13GB. MD5: fe64ceb247e473635708aed23ab6d839
+
+- File type: archive file
+- License:
+
+Terms of use: by downloading the image data from the above URLs, you agree to the following terms:
+
+ You will use the data only for non-commercial research and educational purposes.
+ You will NOT distribute the above URL(s).
+ Stanford University and Princeton University make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
+ You accept full responsibility for your use of the data and shall defend and indemnify Stanford University and Princeton University, including their employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.
+
+- Tag:
+- Prace dataset location:
+- url: http://www.image-net.org/challenges/LSVRC/2012
+- Download url: http://www.image-net.org/challenges/LSVRC/2012
+- Prace Download script: no
+
+===================================================================
+===================================================================
diff --git a/README-UseCases.md b/README-UseCases.md
new file mode 100644
index 0000000000000000000000000000000000000000..f54d6b1b8e39aae2dc9d263b96c73362d6906794
--- /dev/null
+++ b/README-UseCases.md
@@ -0,0 +1,17 @@
+
+RAMP Astro benchmark: Galaxy deblending
+- Authors:
+ Bertrand Rigaud -
+ Alexandre Boucaud -
+- Description: The challenge uses 20 000 images (128x128 pixels, 32 bits)
+- Size:
+ The dataset (2 GB) is divided into 12 000 images for training, 4 000 for validation (during training) and 4 000 images for the final evaluation of the model.
+- File type: 4 npy files: data_train.npy, data_test.npy, labels_test.npy, labels_train.npy
+- License: This work is released under a BSD-3 license
+- Tag:
+- Prace dataset location:
+- url:
+- Download url:
+- Download script: ./download_data.py
+
+===================================================================
diff --git a/README.md b/README.md
index da167df06e0f5647dd5c1439737733f3c0b91c1b..84e709c42a8ae7eaa711eaf1b8be8da73f0b1116 100644
--- a/README.md
+++ b/README.md
@@ -1,56 +1,5 @@
-This project gathers information and scripts that references the location of datasets that can be used for Data Analytics. These standard datasets, which are often used by researchers
-or may come from use cases. All these datasets can be used to gain expertise.
-
-
-CIFAR-10: Computer-vision images dataset used for object recognition
-- Description: The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
-- Size: 163 MB (python version)
-- File type: compressed archive
-- License: If you're going to use this dataset, please cite the tech report: Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.
-- Tag:
-- Prace dataset location:
-- url: https://www.cs.toronto.edu/~kriz/cifar.html
-- Download url:https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz, md5sum: c58f30108f718f92721af3b95e74349a
-- Prace Download script: ./cifar10-download.sh
-
-===================================================================
-
-ImageNet: Large Scale Visual Recognition Challenge 2012 (ILSVRC2012)
-- Description: Training data 1,281,167 224x224 colour images in 1000 synsets. Validation data 50 images/synset. Test data 100 images/synset.
-- Size:
-
- Training images (Task 1 & 2). 138GB. MD5: 1d675b47d978889d74fa0da5fadfb00e
- Training images (Task 3). 728MB. MD5: ccaf1013018ac1037801578038d370da
- Validation images (all tasks). 6.3GB. MD5: 29b22e2961454d5413ddabcf34fc5622
- Test images (all tasks). 13GB. MD5: fe64ceb247e473635708aed23ab6d839
-
-- File type: archive file
-- License:
-
-Terms of use: by downloading the image data from the above URLs, you agree to the following terms:
-
- You will use the data only for non-commercial research and educational purposes.
- You will NOT distribute the above URL(s).p
- Stanford University and Princeton University make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
- You accept full responsibility for your use of the data and shall defend and indemnify Stanford University and Princeton University, including their employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.
-
-- Tag:
-- Prace dataset location:
-- url: http://www.image-net.org/challenges/LSVRC/2012
-- Download url: http://www.image-net.org/challenges/LSVRC/2012
-- Prace Download script: no
-
-===================================================================
-
-Astro bench:
-
-
-
-
-
-===================================================================
-
-
-
+This project gathers information and scripts that reference the location of datasets that can be used for Data Analytics. These datasets are either standard ones often used by researchers or come from real use cases. All of them can be used to gain expertise.
+- Refer to README-Benchmarks.md to get information on Benchmarks.
+- Refer to README-UseCases.md to get information on real use cases.
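
Review note: README-Benchmarks.md lists an md5sum next to the CIFAR-10 download URL, so a downloaded archive can be checked before use. The following is a minimal sketch of such a check, not the contents of ./cifar10-download.sh. The helper names `md5_of_file` and `verify_md5` and the local file name are assumptions made for illustration; only the checksum value comes from the README.

```python
import hashlib
from pathlib import Path

# MD5 listed for cifar-10-python.tar.gz in README-Benchmarks.md
CIFAR10_MD5 = "c58f30108f718f92721af3b95e74349a"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected_md5):
    """Return True when the file's MD5 matches the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("cifar-10-python.tar.gz")
    if archive.exists():
        status = "OK" if verify_md5(archive, CIFAR10_MD5) else "MISMATCH"
        print(f"{archive}: {status}")
    else:
        print(f"{archive} not found; download it first")
```

The same helper could be pointed at the ImageNet archives by passing the MD5 values listed in the Size section.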