Ideas-Laboratory
diff --git a/README.md b/README.md
Lines changed: 113 additions & 0 deletions
diff --git a/codes/graphBuild/__pycache__/dg_utils.cpython-38.pyc b/codes/graphBuild/__pycache__/dg_utils.cpython-38.pyc
-210 Bytes
diff --git a/codes/graphBuild/dg_utils.py b/codes/graphBuild/dg_utils.py
Lines changed: 0 additions & 8 deletions
diff --git a/codes/graphSim/build-graphSim-Desktop_Qt_5_13_1_clang_64bit-Debug/.qmake.stash b/codes/graphSim/build-graphSim-Desktop_Qt_5_13_1_clang_64bit-Debug/.qmake.stash
Lines changed: 0 additions & 44 deletions
diff --git a/codes/graphSim/graph.cpp b/codes/graphSim/graph.cpp
Lines changed: 4 additions & 23 deletions
diff --git a/codes/graphSim/graph.h b/codes/graphSim/graph.h
Lines changed: 0 additions & 3 deletions
diff --git a/codes/graphSim/graph_sim.cpp b/codes/graphSim/graph_sim.cpp
Lines changed: 2 additions & 0 deletions
diff --git a/codes/graphSim/graph_sim.h b/codes/graphSim/graph_sim.h
Lines changed: 3 additions & 4 deletions
diff --git a/codes/graphSim/main.cpp b/codes/graphSim/main.cpp
Lines changed: 4 additions & 3 deletions
@@ -0,0 +1,113 @@
+# Joint t-SNE
+This is the implementation of the paper [Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets](http://www.yunhaiwang.net/Vis2021/joint-tsne).
+
+## Abstract
+We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.
+
+
+### Environment
++ The project combines C++ and Python code, tied together by shell scripts.
++ It requires [Qt](https://www.qt.io/), [Python 3.6](https://www.python.org/), [numpy](https://numpy.org/) and [scikit-learn](https://scikit-learn.org/).
+
+## How to use
+1. Put the directory of your data sequence, e.g. "YOUR_DATA", in **Joint_tsne/data**. The data must be formatted and organized as follows:
+   + Each data frame is a file named *f_i.txt*, where *i* is the time step/index of this data frame in the sequence.
+   + The *j*-th row of a data frame contains the feature vector and the label of the *j*-th item, separated by a tab (`\t`). The label is in the last position.
+   + All data frames must have the same number of rows, and the same item must be in the same row in every data frame, so that node similarities can be computed item by item.
+
+
+2. Create a configuration file, e.g. "YOUR_DATA.json", in **Joint_tsne/config**, written in JSON.
+
+<code>
+
+```json
+{
+  "algo": {
+    "k_closest_count": 3,
+    "perplexity": 70,
+    "bfs_level": 1,
+    "gamma": 0.1
+  },
+  "thesne": {
+    "data_name": "YOUR_DATA",
+    "pts_size": 2000,
+    "norm": false,
+    "data_ids": [1, 3, 6, 9],
+    "data_dims": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
+    "data_titles": [
+      "t=0",
+      "t=1",
+      "t=2",
+      "t=3",
+      "t=4",
+      "t=5",
+      "t=6",
+      "t=7",
+      "t=8",
+      "t=9"
+    ]
+  }
+}
+```
+</code>
+
+In this file, *algo* holds the hyperparameters of our algorithm; *bfs_level* is the exception and should always be 1. *thesne* describes the input data. Note that *data_name* must match the directory name from step 1.
+
+3. Create a shell script, e.g. "YOUR_DATA.sh", in **Joint_tsne/scripts** as below:
+
+<code>
+    
+```shell
+#!/bin/bash
+# 1. specify the configuration file with absolute file path
+config_path="xxx/Joint_tsne/config/YOUR_DATA.json"
+
+workdir=$(cd $(dirname $0); pwd)
+
+# 2. build knn graph for each data frame
+python3 ../codes/graphBuild/run.py $config_path
+
+# 3. compute edge similarities between each two adjacent data frames
+buildDir="../codes/graphSim/build"
+if [ ! -d $buildDir ]; then
+    mkdir $buildDir
+    echo "create directory ${buildDir}"
+else
+    echo "directory ${buildDir} already exists."
+fi
+cd $buildDir
+qmake ../
+make
+
+# bin is dependent on your operating system
+bin=./graphSim.app/Contents/MacOS/graphSim
+$bin $config_path
+
+cd $workdir
+
+# 4. run t-sne optimization
+python3 ../codes/thesne/run.py $config_path
+```
+</code>
+
+Pay attention to the following points:
++ *config_path* must point to the configuration file created in the previous step.
++ *bin* depends on your operating system. On Linux, change it to
+
+        bin=./graphSim
+
+4. Change your directory to **Joint_tsne/scripts** and run
+
+<code>
+
+    sh YOUR_DATA.sh
+
+</code>
+
+The final embeddings will be generated in **Joint_tsne/results/YOUR_DATA**.
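
The data layout from step 1 can be sketched in Python as follows. This is a minimal illustration and not part of the repository: the helper names `write_frame`, `read_frame` and `check_sequence` are hypothetical, and the sketch assumes that every field in a row (each feature value and the trailing label) is separated by a single tab. Check it against the loader in **codes/graphBuild** before relying on it.

```python
import os


def write_frame(path, features, labels):
    """Write one data frame: one item per row, label in the last field."""
    with open(path, "w") as f:
        for row, label in zip(features, labels):
            # Assumption: all fields, including the label, are tab-separated.
            fields = [repr(float(v)) for v in row] + [str(label)]
            f.write("\t".join(fields) + "\n")


def read_frame(path):
    """Read one data frame back as (features, labels)."""
    features, labels = [], []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            fields = line.rstrip("\n").split("\t")
            features.append([float(v) for v in fields[:-1]])
            labels.append(fields[-1])
    return features, labels


def check_sequence(data_dir, frame_ids):
    """Return the shared row count, or raise if the frames disagree."""
    counts = set()
    for i in frame_ids:
        features, _ = read_frame(os.path.join(data_dir, f"f_{i}.txt"))
        counts.add(len(features))
    if len(counts) != 1:
        raise ValueError(f"frames have different row counts: {sorted(counts)}")
    return counts.pop()
```

Running `check_sequence` on a data directory before the pipeline catches mismatched row counts early. It cannot verify that the same item sits in the same row of every frame; that remains the responsibility of whoever exports the data.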
+
+
+
+
+### Example
+You can find an example in **Joint_tsne/scripts/10_cluster_contract.sh**.
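
Before running a script, the configuration file can be sanity-checked with a short Python sketch like the one below. It is an illustration under stated assumptions, not code from this repository: `check_config` and `REQUIRED_KEYS` are hypothetical names, the key list is copied from the example configuration in step 2, and the only rule enforced beyond key presence is the one stated above, namely that *data_name* matches a directory in **Joint_tsne/data**.

```python
import json
import os

# Keys taken from the example configuration in step 2 (assumed to be required).
REQUIRED_KEYS = {
    "algo": ["k_closest_count", "perplexity", "bfs_level", "gamma"],
    "thesne": ["data_name", "pts_size", "norm",
               "data_ids", "data_dims", "data_titles"],
}


def check_config(config_path, root_dir):
    """Return a list of human-readable problems found in the config file."""
    with open(config_path) as f:
        config = json.load(f)

    problems = []
    for section, keys in REQUIRED_KEYS.items():
        for key in keys:
            if key not in config.get(section, {}):
                problems.append(f"missing key: {section}.{key}")

    # data_name must match a directory under <root_dir>/data.
    data_name = config.get("thesne", {}).get("data_name")
    if data_name is not None:
        data_dir = os.path.join(root_dir, "data", data_name)
        if not os.path.isdir(data_dir):
            problems.append(f"data directory not found: {data_dir}")
    return problems
```

An empty list means the file has the expected shape. Here `root_dir` stands for the path to the **Joint_tsne** checkout, e.g. `check_config("config/YOUR_DATA.json", ".")` when called from the repository root.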
@@ -12,14 +12,6 @@ def ClearDir(dirpath):
     os.makedirs(dirpath)
 
 
-def GetFilesIn(dir):
-    res = []
-    for dir, _, files in os.walk(dir):
-        for file in files:
-            res.append(dir + file)
-    return res
-
-
 def GetGraphIDFromPath(path):
     print(path)
     ID = path.split(".")[-2].split("_")[-1]
 
@@ -1,6 +1,5 @@
 #include "graph.h"
 #include <QDebug>
-
 GuiseGraphlet Graph::GuiseInit(int gletSize, int sid)
 {
     QVector<bool> visited(nodeNum(), false);
@@ -182,7 +181,7 @@ void Graph::GUISE(int sCount, int sid)
 
         QVector<GuiseGraphlet> dgy = popNeighborGuise(gy);
 
-        float acProb = std::min((float)dgx.size()/dgy.size(), 1.f);
+        float acProb = std::min(static_cast<float>(dgx.size())/dgy.size(), 1.f);
         float r = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
         if (r < acProb)     // accept it
         {
@@ -496,7 +495,7 @@ QVector<float> Graph::GetfeatureVectorRW(int sid, int neighborSize)
     for (int i = 0; i < layers.size(); i++)
     {
         QVector<float> tmp = layers[i];
-        math_utils::normalize(tmp);
+        normalize(tmp);
         featureVector.append(tmp);
     }
 
@@ -516,7 +515,7 @@ int Graph::graphType(Graph glet)
     }
 
     // sort the degrees and return sorted index
-    QVector<int> indices = math_utils::sortIdx(signature);
+    QVector<int> indices = sortIdx(signature);
 
 
     // 3 nodes
@@ -666,24 +665,6 @@ int Graph::graphType(Graph glet)
     }
 }
 
-
-int Graph::gletTypeNum(int k)
-{
-    if (k == 3)
-    {
-        return 2;
-    }
-    else if (k == 4)
-    {
-        return 6;
-    }
-    else if (k == 5)
-    {
-        return 21;
-    }
-}
-
-
 void Graph::printGraphlet(const GraphLet &graphlet)
 {
     // print adjacency list
@@ -710,7 +691,7 @@ int Graph::gletType(GraphLet glet)
     }
 
     // sort the degrees and return sorted index
-    QVector<int> indices = math_utils::sortIdx(signature);
+    QVector<int> indices = sortIdx(signature);
 
 
     // 3 nodes
 
@@ -1,7 +1,6 @@
 #ifndef GRAPH_H
 #define GRAPH_H
 #define ALL_GRAPHLET    29
-#include <qdebug.h>
 #include <QVector>
 #include <cassert>
 #include <QQueue>
@@ -90,10 +89,8 @@ class Graph
     QVector<int> BfsLayersID(int sid, int neighborSize);
 
     // utility functions
-    QVector<int> sortIdx(QVector<int>& vec);
     int graphType(Graph glet);
     int gletType(GraphLet glet);
-    int gletTypeNum(int k);
     int findGletNode(const GraphLet& glet, int nodeId);
     void printGraphlet(const GraphLet& graphlet);
 
 
@@ -1,4 +1,6 @@
 #include "graph_sim.h"
+#include <QFile>
+#include <QDebug>
 
 Graph GraphSimilarity::readGraph(const QString &fileName)
 {
 
@@ -1,15 +1,14 @@
 // This class is responsible for computing the topology feature of nodes and the similarity scores
 #ifndef GRAPHSIMILARITY_H
 #define GRAPHSIMILARITY_H
-#include "math_utils.h"
 #include "graph.h"
-#include <QFile>
+#include "math_utils.h"
 
 class GraphSimilarity
 {
 public:
     GraphSimilarity(int bfs_level = 1,
-                    math_utils::KernelFunc kernel_func = math_utils::KernelFunc::COS)
+                    KernelFunc kernel_func = KernelFunc::COS)
         :m_bfs_level(bfs_level), m_kernel(kernel_func)
     {
 
@@ -26,7 +25,7 @@ class GraphSimilarity
 
 private:
     int m_bfs_level;
-    math_utils::KernelFunc m_kernel;
+    KernelFunc m_kernel;
 };
 
 #endif // GRAPHSIMILARITY_H
@@ -1,17 +1,18 @@
+#include "graph_sim.h"
 #include <QJsonValue>
 #include <QJsonArray>
 #include <QJsonObject>
 #include <QJsonDocument>
 #include <QJsonParseError>
 #include <QIODevice>
 #include <QDir>
-#include "graph_sim.h"
+#include <QDebug>
 
 
 int main(int argc, char *argv[])
 {
-//    QString config_path = argv[1];
-    QString config_path = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/config/5_cluster_trans_split_overlap_contract.json";
+   QString config_path = argv[1];
+    // QString config_path = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/config/5_cluster_trans_split_overlap_contract.json";
 
     QString rootDir = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/";