Comparison with Other AI Technologies

Comparison with Deep Learning

Even Small Amounts of Data Can Be Used to Make Highly Accurate Judgements

Wide Learning™ can work with the small amount of data at hand; it does not need a large amount of data for learning.

This is an advantage over deep learning.

Naturally, the more data Wide Learning™ has, the more accurate it becomes, and this leads to new discoveries.


Experiment 1: Comparison Using Small Amounts of Data

Deep learning needs at least 10,000 to tens of thousands of data samples to achieve high accuracy. In contrast, other popular AI technologies can often achieve practical accuracy with as few as 100 to 1,000 samples. In fact, experiments comparing Wide Learning™ with deep learning on small amounts of data have shown that Wide Learning™ is more accurate (figure below).


Experiment 2: Comparison Using Large Amounts of Data

Comparative experiments with large amounts of data have also shown that Wide Learning™ achieves higher accuracy than deep learning (figure below).


Experimental Details

Each of the following data sets was evaluated using the average of the F-values across the trials of a 5-fold cross-validation.
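
Here, the F-value (F1 score) is the harmonic mean of precision and recall, i.e., F = 2 × Precision × Recall / (Precision + Recall).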

For each label in each data set, a binary classification data set was created and tested.

We used a deep learning model with 5 fully connected layers, with every parameter set to its default value.

We set the number of epochs to 1,000, which is enough for the value of the loss function to converge.
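
As a concrete illustration, the following is a minimal sketch of this evaluation protocol in Python, assuming scikit-learn as a stand-in framework (the original implementation is not specified); MLPClassifier approximates the 5-layer fully connected model, and the hidden layer sizes are illustrative assumptions.

    # A sketch of the protocol above, assuming scikit-learn as a stand-in.
    # The hidden layer sizes are illustrative assumptions.
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.neural_network import MLPClassifier

    def average_f_value(X, y):
        """Average F-value over the trials of a 5-fold cross-validation."""
        model = MLPClassifier(
            hidden_layer_sizes=(64, 64, 64, 64),  # 4 hidden layers + output = 5 fully connected layers
            max_iter=1000,  # 1,000 epochs, enough for the loss to converge
        )
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        return cross_val_score(model, X, y, cv=cv, scoring="f1").mean()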

Data sets used in experiment 1 (*1)

Identifying breast tissue: Breast Tissue Data Set
Identifying glass types: Glass Identification Data Set

Data set used in experiment 2 (*1)

Determining space shuttle flight status: Statlog (Shuttle) Data Set (Use shuttle.trn.Z)

*1 : Dua, D. and Graff, C. (2019).
UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer Science.

Comparison with Other Major AI Technologies

Comparing Wide Learning™ with other major AI technologies (machine learning algorithms) such as decision trees, logistic regression, and random forest, we find the following:

  • Wide Learning™ is strong across a wide range of problems.
  • For categorical data (*2), Wide Learning™ achieves the same or better accuracy than other AI technologies when the data size is small.
  • Its accuracy improves relative to other AI algorithms as the amount of data increases.

*2 : Categorical data refers to data with a finite (listable) range of values, for example, gender (male or female) or company names. Data that is not categorical is numerical data, which has infinitely many possible values.
Even numerical data can be treated as categorical data by splitting its values through a process called discretization, for example, using thresholds such as "greater than or equal to A" or "less than or equal to B" as delimiters. However, since classification accuracy is affected by the discretization method, data sets limited to categorical data were selected from the UCI Repository in order to eliminate this factor.
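
As an illustration, here is a minimal sketch of discretization in Python, assuming pandas; the thresholds A and B are hypothetical delimiters, as in the footnote above.

    # Illustrative discretization: numerical values become categorical bins.
    import pandas as pd

    values = pd.Series([3.2, 7.8, 5.1, 9.4, 1.0])
    A, B = 4.0, 8.0  # hypothetical delimiters
    categories = pd.cut(
        values,
        bins=[float("-inf"), A, B, float("inf")],
        labels=["below A", "A to B", "above B"],
    )
    print(categories.tolist())  # ['below A', 'A to B', 'A to B', 'above B', 'below A']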


Verification Using Multiple Data Sets

These features were examined using multiple data sets (the experimental details are described later).

Classification problems can be grouped as follows, depending on the content of the data used:

  1. Problems on which most methods show high accuracy.
  2. Problems on which linear classifiers tend to show high accuracy. (*3)
  3. Problems on which linear classifiers tend not to show high accuracy.

We performed three experiments, each using a data set with one of these characteristics.

*3 : A typical linear classifier is logistic regression.


Experiment Result 1: Problems on Which Most Methods Show High Accuracy

This data set was used as a binary classification problem in which the positive class is a minority.

The accuracies of the algorithms other than decision trees are approximately the same; Wide Learning™ achieves high accuracy.


Experiment Result 2: Problems on Which Linear Classifiers Tend to Show High Accuracy

Wide Learning™ also achieves high accuracy on linear classification problems for which logistic regression achieves high accuracy.

Wide Learning™ and logistic regression achieve very high accuracy with fewer than 1,000 samples.

Random forest requires 5,000 samples to achieve equivalent accuracy.

The difference in their accuracy is especially large when the number of samples is small, e.g., 100 samples.

This data set was used as a binary classification problem in which the positive class is a minority.


Experiment Result 3: Problems on Which Linear Classifiers Tend Not to Show High Accuracy

This experiment is an example in which logistic regression does not improve in accuracy as the data increases.

The improvement in accuracy for logistic regression levels off once the number of samples reaches about 1,000.

In contrast, Wide Learning™ exceeds that limit: the more training samples it has, the more accurate it becomes.

Even when compared to random forest, which is said to have low interpretability but high accuracy, Wide Learning™ shows a large gain in accuracy. With small amounts of data (for example, 100 samples), it is also more accurate than random forest.


Experimental Details

We used the following data sets (*4), applying one-hot encoding.

The HIV-1 protease cleavage and Mushroom data sets were used as binary classification problems with the minority class as the positive class.

Nursery was used as a binary classification problem with the "priority" class as the positive class.
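
As an illustration, here is a minimal sketch of one-hot encoding in Python, assuming pandas; the columns shown (cap-shape, odor) are attributes of the Mushroom data set, with its single-letter categorical codes.

    # Illustrative one-hot encoding: each (column, value) pair becomes
    # a binary column.
    import pandas as pd

    df = pd.DataFrame({
        "cap-shape": ["x", "b", "x"],
        "odor": ["n", "a", "n"],
    })
    encoded = pd.get_dummies(df)
    print(encoded.columns.tolist())
    # ['cap-shape_b', 'cap-shape_x', 'odor_a', 'odor_n']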

For each training data size (100, 250, 500, 750, 1,000, 2,000, 3,000, 4,000, and 5,000 samples), accuracy was computed using the following procedure (a code sketch follows the list):

  1. A data set is divided into training data and test data. The division is performed at random so that the ratio of positive examples and negative examples does not change.
  2. Hyperparameters are tuned using Optuna based on the average F-value obtained by 5-fold cross-validation on the training data. To enable sufficient convergence, the number of Optuna trials is 1,000.
  3. A model is trained on the training data with the tuned hyperparameters.
  4. The test data is classified by the created model, and the accuracy (F-value) is measured.
  5. The median of the results obtained by performing steps 1 to 4 a total of 100 times is plotted.
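
The sketch below illustrates a single run of steps 1 to 4 in Python, assuming scikit-learn and Optuna; random forest and its search space stand in for the classifier under test, since Wide Learning™ itself is not publicly available. Repeating run_trial 100 times with different seeds and taking the median of the returned F-values corresponds to step 5.

    # A sketch of steps 1-4 for a single trial, assuming scikit-learn
    # and Optuna. The model and its search space are illustrative.
    import optuna
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score
    from sklearn.model_selection import cross_val_score, train_test_split

    def run_trial(X, y, train_size=1000, seed=0):
        # Step 1: a stratified split keeps the positive/negative ratio unchanged.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed)

        # Step 2: tune hyperparameters on the average F-value of a
        # 5-fold cross-validation over the training data.
        def objective(trial):
            model = RandomForestClassifier(
                n_estimators=trial.suggest_int("n_estimators", 10, 500),
                max_depth=trial.suggest_int("max_depth", 2, 32),
                random_state=seed)
            return cross_val_score(model, X_tr, y_tr, cv=5, scoring="f1").mean()

        study = optuna.create_study(direction="maximize")
        study.optimize(objective, n_trials=1000)  # 1,000 trials, as above

        # Steps 3-4: retrain with the tuned hyperparameters and measure
        # the F-value on the held-out test data.
        model = RandomForestClassifier(**study.best_params, random_state=seed)
        model.fit(X_tr, y_tr)
        return f1_score(y_te, model.predict(X_te))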

*4 : Dua, D. and Graff, C. (2019).
UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer Science.