Getting Started¶
To clone the latest commit of astroNN from github
$ git clone --depth=1 git://github.com/henrysky/astroNN
Recommended method of installation as astroNN is still in active development and will update daily:
$ python setup.py develop
Or run the following command to install after you open a command line window in the package folder:
$ python setup.py install
Or install via pip
(Not recommended so far):
astroNN on Python PyPI
$ pip install astroNN
Prerequisites¶
Latest version of Anaconda is recommended, but generally the use of Anaconda is still highly recommended
Python 3.6 or above
Tensorflow OR Tensorflow-gpu (latest version is recommended)
Keras (Optional but a latest Keras version is recommended, Must be configured Tensorflow as backends)
CUDA and CuDNN (only neccessary for Tensorflow-gpu)
graphviz and pydot_ng are required to plot the model architecture
scikit-learn, tqdm and astroquery required for some basic astroNN function
Since both Tensorflow and Keras are rapidly developing packages and astroNN heavily depends on Tensorflow. The support policy of astroNN to these packages is only the last 2 official versions are supported (i.e. the latest and the previous version are included in test suite). Generally the latest version of Tensorflow and optional Keras is recommended. The current supporting status (i.e. included in test cases) are
Tensorflow OR Tensorflow-gpu 1.9.0 without Keras
Tensorflow OR Tensorflow-gpu 1.9.0 with Keras 2.2.2
Tensorflow OR Tensorflow-gpu 1.8.0 with Keras 2.2.0
For instruction on how to install Tensorflow, please refers to their official website Installing TensorFlow
Although Keras is optional, but its highly recommended. For instruction on how to install Keras, please refers to their official website Installing Keras
If you install tensorflow instead of tensorflow-gpu, Tensorflow will run on CPU. Currently official Tensorflow python wheels do not compiled with AVX2 - sets of CPU instruction extensions that can speed up calculation on CPU. If you are using tensorflow which runs on CPU only , you should consider to download High Performance Tensorflow MacOS build for MacOS, Or High Performance Tensorflow Windows build for Windows.
Recommended system requirement:
64-bits operating system
CPU which supports AVX2 (Intel CPU 2014 or later, AMD CPU 2015 or later)
8GB RAM or above
NVIDIA Graphics card (Optional, GTX900 series or above)
(If using NVIDIA GPU): At least 2GB VRAM on GPU
Note
Multi-GPU, Intel/AMD graphics is not supported. Only Windows and Linux is officially supported by Tensorflow-GPU with compatible NVIDIA graphics
Basic FAQ¶
My hardware or software cannot meet the prerequisites, what should I do?¶
The hardware and software requirement is just an estimation. It is entirely possible to run astroNN without those requirement. But generally, python 3.6 or above (as Tensorflow only supports py36 or above) and mid-to-high end hardware.
Can I contribute to astroNN?¶
Yes, you can contact me (Henry: henrysky.leung [at] mail.utoronto.ca) and tell me your idea
I have found a bug in astorNN¶
Please try to use the latest commit of astroNN. If the issue persists, please report to https://github.com/henrysky/astroNN/issues
I keep receiving warnings on APOGEE and Gaia environment variables¶
If you are not dealing with APOGEE or Gaia data, please ignore those warnings. If error raised to prevent you to use some of astroNN functionality, please report it as a bug to https://github.com/henrysky/astroNN/issues
If you don’t want those warnings to be shown again, go to astroNN’s configuration file and set environmentvariablewarning
to False
I have installed pydot_ng and graphviz but still fail to plot the model¶
if you are encountering this issue, please uninstall both pydot
and graphviz
and run the following command
$ pip install pydot
$ conda install graphviz
Then if you are using Mac, run the following command
$ brew install graphviz
If you are using Windows, go to https://graphviz.gitlab.io/_pages/Download/Download_windows.html to download the Windows package and add the package to the PATH environment variable.
Configuration File¶
astroNN configuration file is located at ~/.astroNN/config.ini
which contains a few astroNN settings.
Currently, the default configuration file should look like this
[Basics]
magicnumber = -9999.0
multiprocessing_generator = False
environmentvariablewarning = True
tensorflow_keras = auto
[NeuralNet]
custommodelpath = None
cpufallback = False
gpu_mem_ratio = True
magicnumber
refers to the Magic Number which representing missing labels/data, default is -9999. Please do not change
this value if you rely on APOGEE data.
multiprocessing_generator
refers to whether enable multiprocessing in astroNN data generator. Default is False
except on Linux and MacOS.
environmentvariablewarning
refers to whether you will be warned about not setting APOGEE and Gaia environment variable.
tensorflow_keras
refers to whether use keras or tensorflow.keras. Default option is auto
to let astroNN
to decide (keras always be considered first), tensorflow
to force it to use tensorflow.keras or keras
to
force it to use keras
custommodelpath
refers to a list of custom models, path to the folder containing custom model (.py files),
multiple paths can be separated by ;
.
Default value is None means no path. Or for example: /users/astroNN/custom_models/;/local/some_other_custom_models/
cpufallback
refers to whether force to use CPU. No effect if you are using tensorflow instead of tensorflow-gpu
gpu_mem_ratio
refers to GPU management. Set True
to dynamically allocate memory which is astroNN default or enter a float between 0 and 1
to set the maximum ratio of GPU memory to use or set None
to let Tensorflow pre-occupy all of available GPU memory
which is a designed default behavior from Tensorflow.
For whatever reason if you want to reset the configure file:
from astroNN.config import config_path
# astroNN will reset the config file if the flag = 2
config_path(flag=2)
Folder Structure for astroNN, APOGEE, Gaia and LAMOST data¶
This code depends on environment variables and folders for APOGEE, Gaia and LAMOST data. The environment variables are
SDSS_LOCAL_SAS_MIRROR
: top-level directory that will be used to (selectively) mirror the SDSS Science Archive Server (SAS)GAIA_TOOLS_DATA
: top-level directory under which the Gaia data will be stored.LASMOT_DR5_DATA
: top-level directory under which the LASMOST DR5 data will be stored.
How to set environment variable on different operating system: Guide here
$SDSS_LOCAL_SAS_MIRROR/
├── dr14/
│ ├── apogee/spectro/redux/r8/stars/
│ │ ├── apo25m/
│ │ │ ├── 4102/
│ │ │ │ ├── apStar-r8-2M21353892+4229507.fits
│ │ │ │ ├── apStar-r8-**********+*******.fits
│ │ │ │ └── ****/
│ │ ├── apo1m/
│ │ │ ├── hip/
│ │ │ │ ├── apStar-r8-2M00003088+5933348.fits
│ │ │ │ ├── apStar-r8-**********+*******.fits
│ │ │ │ └── ***/
│ │ ├── l31c/l31c.2/
│ │ │ ├── allStar-l30e.2.fits
│ │ │ ├── allVisit-l30e.2.fits
│ │ │ ├── 4102/
│ │ │ │ ├── aspcapStar-r8-l30e.2-2M21353892+4229507.fits
│ │ │ │ ├── aspcapStar-r8-l30e.2-**********+*******.fits
│ │ │ │ └── ****/
│ │ │ └── Cannon/
│ │ │ └── allStarCannon-l31c.2.fits
└── dr13/
└── *similar to dr14 above/*
$GAIA_TOOLS_DATA/
└── Gaia/
├── gdr1/tgas_source/fits/
│ ├── TgasSource_000-000-000.fits
│ ├── TgasSource_000-000-001.fits
│ └── ***.fits
└── gdr2/gaia_source_with_rv/fits/
├── GaiaSource_2851858288640_1584379458008952960.fits
├── GaiaSource_1584380076484244352_2200921635402776448.fits
└── ***.fits
$LASMOT_DR5_DATA/
└── DR5/
├── LAMO5_2MS_AP9_SD14_UC4_PS1_AW_Carlin_M.fits
├── 20111024
│ ├── F5902
│ │ ├──spec-55859-F5902_sp01-001.fits.gz
│ │ └── ****.fits.gz
│ └── ***/
├── 20111025
│ ├── B6001
│ │ ├──spec-55860-B6001_sp01-001.fits.gz
│ │ └── ****.fits.gz
│ └── ***/
└── ***/
Note
The APOGEE and Gaia folder structure should be consistent with APOGEE and gaia_tools python package by Jo Bovy, tools for dealing with APOGEE and Gaia data
A dedicated project folder is recommended to run astroNN, always run astroNN under the root of project folder. So that astroNN will always create folder for every neural network you run under the same place. Just as below