This page is the entry point to DAMEWARE (Web Application REsource of DAME), currently in beta release and specialized for data mining on massive data sets. It is a toolset of machine learning models for managing and exploring data in various formats. On this page users can find news, documentation, dataset samples and technical support for the web application.
The official release 1.0 is available here
Practical user guide videos are also available on the DAMEmedia channel.
Current release 1.0:
- The official 1.0 release is available here. This is the first official version deployed at the end of September 2013. The current release offers the following new features:
- first parallel implementation of a machine learning model (FMLPGA), based on the GPGPU+CUDA environment, giving a speedup of about 8x;
- pluggable user data mining models via the dmplugin client application (downloadable here);
- new machine learning models available (for their description see manuals):
- Statistical Calculations tool
- Random Forest, for multivariate classification and regression
- SOM for feature extraction
- SOM + automatic post-processing phase for clustering
- SOM + K-means for clustering
- SOM + Two Winners Linkage (TWL) for clustering
- SOM + U-matrix with Connected Components (UmatCC) for clustering
- Evolving SOM (ESOM) for clustering
- MLP-LEMON (Levenberg-Marquardt Optimization Network) for classification/regression
- updating of main menu options
- several bugs fixed.
PLUGIN CREATOR - How to include your own code
Users can extend the data mining model library integrated into the web app by simply downloading and running our Java applications, which, through a guided procedure, generate source code to be integrated into the web app software infrastructure.
Please consult the manuals below for detailed information.
The following are some data files you can use during experiments for Training, Test or Run cases. These examples are also useful for learning the data formats that the application can handle and how to set up experiments correctly with the available models.
- xor.csv (CSV format, 2 input columns + target column), usable as Training/Test input file;
- xor_run.csv (CSV format, 2 input columns), usable as Run input file;
- seyfert.csv (CSV format, 15 input columns + 1 target column), usable as Training/Test input file;
- test.dat (ASCII format, 4 input columns), usable as Run input file;
- train.dat (ASCII format, 5 columns, 4 input + 1 target), usable as Training/Test input file;
- train.fits (FITS format, 5 columns, same as train.dat), usable as Training/Test input file;
- test.fits (FITS format, 4 columns, same as test.dat), usable as Run input file;
- train.csv (CSV format, 5 columns, same as train.dat), usable as Training/Test input file;
- train.votable (VOTable format, 5 columns, same as train.dat), usable as Training/Test input file;
- dataset_training.dat (ASCII format, 5 columns, 4 input + 1 target), usable as Training/Test input;
- dataset_train_80.ascii (ASCII format, 5 columns, 4 input + 1 target), containing the shuffled 80% of rows of dataset_training.dat file, usable as training input file;
- dataset_test_20.ascii (ASCII format, 5 columns, 4 input + 1 target), containing the shuffled 20% of rows of dataset_training.dat file, usable as test input file;
- dataset_run.dat (ASCII format, 4 columns), usable as Run input;
- dataset_training.fits (FITS format, same as dataset_training.dat), usable as Training/Test input;
- dataset_run.fits (FITS format, same as dataset_run.dat), usable as Run input;
- M_101.fits (FITS image format), usable as CSOM input;
- sky.fits (FITS image format), usable as CSOM input;
- wine_cl_train.csv (CSV format, 13 inputs, 1 output), usable as classification training input;
- wine_cl_run.csv (CSV format, 13 inputs), usable as classification Run input;
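The dataset_train_80.ascii and dataset_test_20.ascii files listed above are described as a shuffled 80%/20% split of dataset_training.dat. A minimal sketch of how a similar split could be produced locally follows. It is not necessarily the procedure used to build the sample files: the function name `split_rows`, the `train_fraction` and `seed` parameters, and the made-up space-separated rows are all assumptions for illustration.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    `split_rows`, `train_fraction` and `seed` are illustrative names,
    not part of the web application.
    """
    shuffled = list(rows)                  # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Ten made-up rows: 4 input columns + 1 target column, space separated.
rows = [f"{i} {i + 1} {i + 2} {i + 3} {i % 2}" for i in range(10)]
train, test = split_rows(rows)
```

Each row ends up in exactly one of the two lists, so the training and test files never share objects.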
Moreover, the following links can be useful for collecting and extracting generic datasets for both classification and regression experiments:
- How can I get access to the web application?
Downloading data files from a Workspace to the local machine.
Sometimes, after operations in new tabs, on coming back to the Resource Manager the previously selected workspace is no longer highlighted. Is this normal?
What is the maximum length of a workspace name?
Is it possible to create two workspaces with the same name?
When editing a dataset, is it possible to apply multiple edit options before saving the final dataset file?
The application provides a registration procedure, which consists of a form to be filled in and sent to the administrator. Immediately afterwards you will receive a welcome message at the e-mail address you specified, followed (within 36 hours at most) by another message confirming your registration and giving the private information needed to access the application.
When running the webapp in different browsers, many features behave incorrectly.
How can I cancel a previously created workspace?
Is it possible to move data files from one Workspace to another?
Is it possible to use and handle data files in ARFF format (.ARFF)?
If I have already enqueued an experiment and it is running, does it keep running on the DAME servers even if I log out of my system? In other words, can I enqueue an experiment, shut down my computer and turn it back on much later to find the experiment complete?
When trying to access the application, the following failure message appears: GENERAL COMMUNICATION FAILURE! INTERNAL CODE:500 The call failed on the server; see server log for details MainApp. What has happened? How can I solve this?
No. Dataset modifications are executed step by step. A sequence of editing options is allowed, but they are performed one at a time. Each time you apply an option to modify the dataset, a new file is created and stored in the Workspace file list, with a suffix recalling the selected option. If you want to make another modification, you have to load the newly stored dataset file for the next edit.
It is not possible to cancel failed experiments from the workspace. Why?
May I perform clustering experiments with multi-image FITS files?
May I visualize multi-image FITS files through the Image Viewer menu option?
With plot tabs already open, why can't I create plots of newly uploaded data files?
After a series of experiments, does your browser show errors while interacting with the web application?
This is a false failure message! Sometimes, if the webapp has been updated for maintenance reasons, the user must flush the browser cache before accessing the application.
What are the basic rules to follow to properly organize data and set up any neural network model?
Depending on your local machine and web browser configuration, a long interaction with the web app could cause bad browser behavior. It is strongly suggested to close and restart the browser. If that is insufficient to solve the problem, a complete reset of the browser cache or even a restart of your machine is required.
Input features to any machine learning model must be scalars, not arrays of values or characters. If necessary, try to find a numerical representation for any non-scalar or alphanumeric quantities.
The input layer of a generic hierarchical neural network must be populated according to the number of physical input features of your table entries. There must be a perfect correspondence between number of input nodes and input features (columns of your table).
All objects (rows) of an input table must have exactly the same number of columns. No rows with variable number of columns are allowed.
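The two data requirements above (scalar numeric features and the same number of columns in every row) can be checked locally before uploading a file. The following is only a sketch under stated assumptions: `check_table` is a hypothetical helper, not part of the web application, and it assumes a comma-separated file with no header row.

```python
import csv
import io


def check_table(text, delimiter=","):
    """Return a list of problems found in a delimited text table.

    Hypothetical helper: checks that every row has the same number of
    columns and that every value parses as a number.
    """
    problems = []
    expected = None
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if expected is None:
            expected = len(row)  # the first row fixes the column count
        elif len(row) != expected:
            problems.append(
                f"line {line_no}: {len(row)} columns, expected {expected}"
            )
        for value in row:
            try:
                float(value)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {value!r}")
    return problems


good = "0,0,0\n0,1,1\n1,0,1\n1,1,0\n"   # XOR-like table: 2 inputs + target
bad = "0,0,0\n0,1\n1,abc,1\n"           # short row and a non-numeric value
```

An empty list from `check_table` means neither problem was found; it says nothing about whether the file is otherwise accepted by the application.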
Hidden layers of any multi-layer feed-forward model (i.e. the layers between the input and output ones) must contain a decreasing number of nodes, usually following an empirical law: given N input nodes, the first hidden layer should have at least 2N+1 nodes, the optional second hidden layer N-1 nodes, and so on.
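As a worked illustration of the empirical law above, the sketch below computes the suggested minimum layer sizes from the number of input columns. The function name `suggested_hidden_layers` and its `two_layers` flag are assumptions made for this example; the rule is only a starting point, not a guarantee of good results.

```python
def suggested_hidden_layers(n_inputs, two_layers=False):
    """Minimum hidden-layer sizes from the empirical rule quoted above.

    First hidden layer: at least 2N+1 nodes; optional second: N-1 nodes.
    (Illustrative helper, not part of the web application.)
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be a positive integer")
    layers = [2 * n_inputs + 1]
    if two_layers:
        if n_inputs < 2:
            raise ValueError("a second hidden layer of N-1 nodes needs N >= 2")
        layers.append(n_inputs - 1)
    return layers


# A table with 4 input columns needs 4 input nodes.
one_hidden = suggested_hidden_layers(4)
two_hidden = suggested_hidden_layers(4, two_layers=True)
```

For a table with 4 input columns this gives a first hidden layer of at least 9 nodes and an optional second layer of 3 nodes.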
For most of the available neural network models (with the exceptions of Random Forest and SVM), the class target column should be encoded with a binary representation of the class label. For example, if you have 3 different classes, you must create three different target columns, encoding the 3 classes as, respectively: 001, 010, 100.
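The encoding described above can be sketched as follows. This is an illustration only: `encode_targets` is a hypothetical helper, and the choice of assigning codes to the labels in sorted order (first label gets 0 0 1) is an assumption made to reproduce the 001, 010, 100 example.

```python
def encode_targets(labels):
    """Map each class label to a list of 0/1 target columns (one per class).

    Illustrative helper: classes are taken in sorted order, and the first
    class gets the rightmost 1, matching the 001, 010, 100 example.
    """
    classes = sorted(set(labels))
    n = len(classes)
    codes = {}
    for index, label in enumerate(classes):
        bits = [0] * n
        bits[n - 1 - index] = 1
        codes[label] = bits
    return [codes[label] for label in labels], codes


# Made-up labels for three classes.
encoded_rows, codes = encode_targets(["a", "b", "c", "a"])
```

Each element of `encoded_rows` supplies the values for the three target columns of one table row, replacing the original single class column.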