
Commit ecadccb

finish readme
1 parent 8d39d5c commit ecadccb

23 files changed

Lines changed: 172 additions & 6275 deletions


README.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
# Joint t-SNE

This is the implementation of the paper [Joint t-SNE for Comparable Projections of Multiple High-Dimensional Datasets](http://www.yunhaiwang.net/Vis2021/joint-tsne).


## Abstract

We present Joint t-Stochastic Neighbor Embedding (Joint t-SNE), a technique to generate comparable projections of multiple high-dimensional datasets. Although t-SNE has been widely employed to visualize high-dimensional datasets from various domains, it is limited to projecting a single dataset. When a series of high-dimensional datasets, such as datasets changing over time, is projected independently using t-SNE, misaligned layouts are obtained. Even items with identical features across datasets are projected to different locations, making the technique unsuitable for comparison tasks. To tackle this problem, we introduce edge similarity, which captures the similarities between two adjacent time frames based on the Graphlet Frequency Distribution (GFD). We then integrate a novel loss term into the t-SNE loss function, which we call vector constraints, to preserve the vectors between projected points across the projections, allowing these points to serve as visual landmarks for direct comparisons between projections. Using synthetic datasets whose ground-truth structures are known, we show that Joint t-SNE outperforms existing techniques, including Dynamic t-SNE, in terms of local coherence error, Kullback-Leibler divergence, and neighborhood preservation. We also showcase a real-world use case to visualize and compare the activation of different layers of a neural network.

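Neighborhood preservation, one of the metrics named above, is commonly measured as the average fraction of each item's k nearest neighbors in the input space that are also among its k nearest neighbors in the projection. The sketch below computes that standard definition with numpy. The function names and the default `k` are illustrative assumptions, and the paper's exact evaluation protocol may differ.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest (Euclidean) neighbors of every row, excluding the row itself."""
    points = np.asarray(points, dtype=float)
    sq_norms = (points ** 2).sum(axis=1)
    # squared distances via |a|^2 + |b|^2 - 2 a.b, avoiding an n x n x d temporary array
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(sq_dists, np.inf)  # an item is not its own neighbor
    return np.argsort(sq_dists, axis=1)[:, :k]


def neighborhood_preservation(high_dim, low_dim, k=10):
    """Mean fraction of each item's k-NN in high_dim that are also its k-NN in low_dim."""
    if len(high_dim) != len(low_dim):
        raise ValueError("both arrays must describe the same items, row by row")
    high_nn = knn_indices(high_dim, k)
    low_nn = knn_indices(low_dim, k)
    overlaps = [len(set(h) & set(l)) / k for h, l in zip(high_nn, low_nn)]
    return float(np.mean(overlaps))
```

Applied to one time step, `high_dim` would hold the feature columns of one `f_i.txt` frame and `low_dim` the matching 2-D embedding. A score of 1.0 means every neighborhood is kept.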

## Environment

+ The code is a hybrid of C++ and Python, driven by shell scripts.
+ It requires [Qt](https://www.qt.io/), [Python 3.6](https://www.python.org/), [numpy](https://numpy.org/) and [scikit-learn](https://scikit-learn.org/).

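You can check these requirements before running anything. The sketch below makes a few assumptions:

+ "Python 3.6" is read as 3.6 or newer.
+ Qt counts as usable when `qmake` is on your `PATH`, because step 3 calls `qmake` and `make`.

`missing_dependencies` is a hypothetical helper, not part of the repository.

```python
import importlib.util
import shutil
import sys


def missing_dependencies():
    """Return the requirements listed in this README that cannot be found."""
    missing = []
    if sys.version_info < (3, 6):  # assumption: 3.6 or newer is acceptable
        missing.append("Python >= 3.6")
    # scikit-learn is imported under the module name "sklearn"
    for module in ("numpy", "sklearn"):
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    # qmake (shipped with Qt) and make are used to build codes/graphSim
    for tool in ("qmake", "make"):
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


if __name__ == "__main__":
    problems = missing_dependencies()
    print("missing: " + ", ".join(problems) if problems else "all requirements found")
```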

## How to use

1. Put the directory of your data sequence (e.g. `YOUR_DATA`) in **Joint_tsne/data**. The data must meet the following format and organization requirements:
   + Each data frame is named *f_i.txt*, where *i* is the time step (index) of that frame in the sequence.
   + The *j*-th row of a data frame contains the feature vector and the label of the *j*-th item, separated by tabs (`\t`). The label is in the last column.
   + All data frames must have the same number of rows, and each item must occupy the same row in every frame, so that node similarities can be computed item by item.

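The format rules above can be checked before running the pipeline. The sketch below makes two assumptions:

+ Every frame sits directly in the data directory as `f_<i>.txt`.
+ All columns except the last are numeric features.

`load_frame` and `check_sequence` are hypothetical helpers, not part of the repository.

```python
import os
import re

# frame files are named f_<i>.txt, where <i> is the time step
FRAME_NAME = re.compile(r"^f_(\d+)\.txt$")


def load_frame(path):
    """Parse one frame: tab-separated feature values with the label in the last column."""
    features, labels = [], []
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip empty lines, e.g. a trailing newline at the end of the file
            fields = line.split("\t")
            if len(fields) < 2:
                raise ValueError(f"{path}:{line_no}: expected features and a label")
            features.append([float(x) for x in fields[:-1]])
            labels.append(fields[-1])
    return features, labels


def check_sequence(data_dir):
    """Load every f_<i>.txt in data_dir, ordered by i, and check that row counts agree."""
    frames = {}
    for name in os.listdir(data_dir):
        match = FRAME_NAME.match(name)
        if match:
            frames[int(match.group(1))] = load_frame(os.path.join(data_dir, name))
    if not frames:
        raise ValueError(f"no f_<i>.txt files found in {data_dir}")
    row_counts = {i: len(labels) for i, (_, labels) in frames.items()}
    if len(set(row_counts.values())) != 1:
        raise ValueError(f"frames have different numbers of rows: {row_counts}")
    return dict(sorted(frames.items()))
```

For example, `check_sequence("Joint_tsne/data/YOUR_DATA")` returns `{i: (features, labels)}` for each frame, or raises `ValueError` on the first rule that is broken.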

2. Create a configuration file (e.g. `YOUR_DATA.json`) in **Joint_tsne/config**, structured as JSON:

```json
{
    "algo": {
        "k_closest_count": 3,
        "perplexity": 70,
        "bfs_level": 1,
        "gamma": 0.1
    },
    "thesne": {
        "data_name": "YOUR_DATA",
        "pts_size": 2000,
        "norm": false,
        "data_ids": [1, 3, 6, 9],
        "data_dims": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
        "data_titles": [
            "t=0",
            "t=1",
            "t=2",
            "t=3",
            "t=4",
            "t=5",
            "t=6",
            "t=7",
            "t=8",
            "t=9"
        ]
    }
}
```

In this file, *algo* contains the hyperparameters of our algorithm. The one exception is *bfs_level*, which should always be set to 1. *thesne* describes the input data. Note that *data_name* must match the name of the data directory from step 1.
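The two rules above (*bfs_level* fixed to 1, and *data_name* matching the data directory) are cheap to check before a long run. The sketch below assumes the config and data folders sit side by side under `Joint_tsne`, as in steps 1 and 2. `check_config` is a hypothetical helper, not part of the repository.

```python
import json
import os


def check_config(config_path, data_root):
    """Load a Joint t-SNE config and verify the rules stated in this README."""
    with open(config_path) as fh:
        config = json.load(fh)
    algo, thesne = config["algo"], config["thesne"]
    if algo.get("bfs_level") != 1:
        raise ValueError("algo.bfs_level should always be 1")
    # data_name must match a directory created in step 1
    data_dir = os.path.join(data_root, thesne["data_name"])
    if not os.path.isdir(data_dir):
        raise ValueError(f"data_name {thesne['data_name']!r} has no directory {data_dir}")
    return config
```

Usage: `check_config("Joint_tsne/config/YOUR_DATA.json", "Joint_tsne/data")`.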

3. Create a shell script (e.g. `YOUR_DATA.sh`) in **Joint_tsne/scripts** as below:

```shell
#!/bin/bash
# 1. specify the configuration file with an absolute file path
config_path="xxx/Joint_tsne/config/YOUR_DATA.json"

workdir=$(cd "$(dirname "$0")"; pwd)

# 2. build the kNN graph for each data frame
python3 ../codes/graphBuild/run.py "$config_path"

# 3. compute edge similarities between each pair of adjacent data frames
buildDir="../codes/graphSim/build"
if [ ! -d "$buildDir" ]; then
    mkdir "$buildDir"
    echo "create directory ${buildDir}"
else
    echo "directory ${buildDir} already exists."
fi
cd "$buildDir" || exit 1
qmake ../
make

# bin depends on your operating system (the path below is for macOS)
bin=./graphSim.app/Contents/MacOS/graphSim
"$bin" "$config_path"

cd "$workdir" || exit 1

# 4. run the t-SNE optimization
python3 ../codes/thesne/run.py "$config_path"
```

A few things need attention:

+ As in step 2, *config_path* must point to the configuration file you created there.
+ *bin* depends on your operating system. The default path is the macOS app bundle; on Linux, change it to `bin=./graphSim`.

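If you drive the pipeline from Python instead of the shell script, the OS-dependent *bin* path can be chosen automatically. The sketch below assumes exactly the two build outputs named above: the macOS app bundle and a plain `graphSim` executable on Linux. It covers no other platform. `graphsim_binary` is a hypothetical helper.

```python
import os
import sys


def graphsim_binary(build_dir, platform=sys.platform):
    """Return the graphSim executable produced by `qmake ../` and `make` in build_dir."""
    if platform == "darwin":
        # on macOS the executable sits inside the generated app bundle
        return os.path.join(build_dir, "graphSim.app", "Contents", "MacOS", "graphSim")
    if platform.startswith("linux"):
        return os.path.join(build_dir, "graphSim")
    raise NotImplementedError(f"no known graphSim path for platform {platform!r}")
```

It could then be called as `subprocess.run([graphsim_binary(build_dir), config_path], check=True)`.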

4. Change your directory to **Joint_tsne/scripts** and run

```shell
sh YOUR_DATA.sh
```

The final embeddings will be generated in **Joint_tsne/results/YOUR_DATA**.


## Example

You can find an example in **Joint_tsne/scripts/10_cluster_contract.sh**.
-210 Bytes
Binary file not shown.

codes/graphBuild/dg_utils.py

Lines changed: 0 additions & 8 deletions
```diff
@@ -12,14 +12,6 @@ def ClearDir(dirpath):
     os.makedirs(dirpath)
 
 
-def GetFilesIn(dir):
-    res = []
-    for dir, _, files in os.walk(dir):
-        for file in files:
-            res.append(dir + file)
-    return res
-
-
 def GetGraphIDFromPath(path):
     print(path)
     ID = path.split(".")[-2].split("_")[-1]
```
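`GetGraphIDFromPath`, which this commit keeps, takes the text after the last `_` and before the extension, so a path ending in `f_3.txt` yields `"3"`. The sketch below does the same parsing with `os.path`, looking only at the file name. It raises an error when there is no `_<id>` suffix instead of returning the whole stem. The name `graph_id_from_path` is hypothetical, and the sketch omits the debug `print`.

```python
import os


def graph_id_from_path(path):
    """Return the <id> part of a file name shaped like <prefix>_<id>.<ext>, as a string."""
    stem, _ext = os.path.splitext(os.path.basename(path))
    _prefix, sep, graph_id = stem.rpartition("_")
    if not sep or not graph_id:
        raise ValueError(f"no _<id> suffix in {path!r}")
    return graph_id
```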

codes/graphSim/build-graphSim-Desktop_Qt_5_13_1_clang_64bit-Debug/.qmake.stash

Lines changed: 0 additions & 44 deletions
This file was deleted.

codes/graphSim/graph.cpp

Lines changed: 4 additions & 23 deletions
```diff
@@ -1,6 +1,5 @@
 #include "graph.h"
 #include <QDebug>
-
 GuiseGraphlet Graph::GuiseInit(int gletSize, int sid)
 {
     QVector<bool> visited(nodeNum(), false);
@@ -182,7 +181,7 @@ void Graph::GUISE(int sCount, int sid)
 
         QVector<GuiseGraphlet> dgy = popNeighborGuise(gy);
 
-        float acProb = std::min((float)dgx.size()/dgy.size(), 1.f);
+        float acProb = std::min(static_cast<float>(dgx.size())/dgy.size(), 1.f);
         float r = static_cast <float> (rand()) / static_cast <float> (RAND_MAX);
         if (r < acProb) // accept it
         {
@@ -496,7 +495,7 @@ QVector<float> Graph::GetfeatureVectorRW(int sid, int neighborSize)
     for (int i = 0; i < layers.size(); i++)
     {
         QVector<float> tmp = layers[i];
-        math_utils::normalize(tmp);
+        normalize(tmp);
         featureVector.append(tmp);
     }
 
@@ -516,7 +515,7 @@ int Graph::graphType(Graph glet)
     }
 
     // sort the degrees and return sorted index
-    QVector<int> indices = math_utils::sortIdx(signature);
+    QVector<int> indices = sortIdx(signature);
 
 
     // 3 nodes
@@ -666,24 +665,6 @@ int Graph::graphType(Graph glet)
     }
 }
 
-
-int Graph::gletTypeNum(int k)
-{
-    if (k == 3)
-    {
-        return 2;
-    }
-    else if (k == 4)
-    {
-        return 6;
-    }
-    else if (k == 5)
-    {
-        return 21;
-    }
-}
-
-
 void Graph::printGraphlet(const GraphLet &graphlet)
 {
     // print adjacency list
@@ -710,7 +691,7 @@ int Graph::gletType(GraphLet glet)
     }
 
     // sort the degrees and return sorted index
-    QVector<int> indices = math_utils::sortIdx(signature);
+    QVector<int> indices = sortIdx(signature);
 
 
     // 3 nodes
```
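The `acProb` line touched in the GUISE hunk reads like a Metropolis-Hastings acceptance step, as used for GUISE-style uniform graphlet sampling. A move from state x to a proposed neighbor y is accepted with probability min(|N(x)|/|N(y)|, 1). This cancels the degree bias of a plain random walk. The cast change in this commit does not alter that behavior.

The Python sketch below applies the same rule to the nodes of an ordinary graph, which stand in for graphlets. The function names are hypothetical and not part of this repository.

```python
import random


def acceptance_probability(num_neighbors_x, num_neighbors_y):
    """MH acceptance for a move x -> y proposed uniformly among x's neighbors."""
    # same formula as the C++ line: min(|N(x)| / |N(y)|, 1)
    return min(num_neighbors_x / num_neighbors_y, 1.0)


def mh_random_walk(neighbors, start, steps, seed=0):
    """Random walk with MH correction; returns how often each state was visited."""
    rng = random.Random(seed)
    visits = {state: 0 for state in neighbors}
    x = start
    for _ in range(steps):
        y = rng.choice(neighbors[x])
        if rng.random() < acceptance_probability(len(neighbors[x]), len(neighbors[y])):
            x = y  # accept the proposal; otherwise stay at x
        visits[x] += 1
    return visits
```

Consider a star graph with `neighbors = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}`. There, the corrected walk visits each of the five nodes about 20% of the time, whereas an uncorrected walk spends about half its steps at the center.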

codes/graphSim/graph.h

Lines changed: 0 additions & 3 deletions
```diff
@@ -1,7 +1,6 @@
 #ifndef GRAPH_H
 #define GRAPH_H
 #define ALL_GRAPHLET 29
-#include <qdebug.h>
 #include <QVector>
 #include <cassert>
 #include <QQueue>
@@ -90,10 +89,8 @@ class Graph
     QVector<int> BfsLayersID(int sid, int neighborSize);
 
     // utility functions
-    QVector<int> sortIdx(QVector<int>& vec);
     int graphType(Graph glet);
     int gletType(GraphLet glet);
-    int gletTypeNum(int k);
     int findGletNode(const GraphLet& glet, int nodeId);
     void printGraphlet(const GraphLet& graphlet);
 
```
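The removed `gletTypeNum(k)` returned 2, 6 and 21 for graphlets of 3, 4 and 5 nodes. These values add up to `ALL_GRAPHLET` (29), which stays defined in this header, and they are the numbers of connected, pairwise non-isomorphic graphs on 3, 4 and 5 nodes. The brute-force Python sketch below recomputes them. The helper names are hypothetical and not part of the repository, and the approach only scales to very small k.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def canonical_form(n, edges):
    """Smallest sorted edge list over all relabelings, identical for isomorphic graphs."""
    best = None
    for perm in permutations(range(n)):
        relabeled = tuple(sorted(tuple(sorted((perm[a], perm[b]))) for a, b in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_graphlet_types(k):
    """Number of connected graphs on k nodes, up to isomorphism."""
    pairs = list(combinations(range(k), 2))
    forms = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if (mask >> i) & 1]
        if is_connected(k, edges):
            forms.add(canonical_form(k, edges))
    return len(forms)
```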

codes/graphSim/graph_sim.cpp

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,4 +1,6 @@
 #include "graph_sim.h"
+#include <QFile>
+#include <QDebug>
 
 Graph GraphSimilarity::readGraph(const QString &fileName)
 {
```

codes/graphSim/graph_sim.h

Lines changed: 3 additions & 4 deletions
```diff
@@ -1,15 +1,14 @@
 // This class is responsible for computing the topology feature of nodes and the similarity scores
 #ifndef GRAPHSIMILARITY_H
 #define GRAPHSIMILARITY_H
-#include "math_utils.h"
 #include "graph.h"
-#include <QFile>
+#include "math_utils.h"
 
 class GraphSimilarity
 {
 public:
     GraphSimilarity(int bfs_level = 1,
-                    math_utils::KernelFunc kernel_func = math_utils::KernelFunc::COS)
+                    KernelFunc kernel_func = KernelFunc::COS)
         :m_bfs_level(bfs_level), m_kernel(kernel_func)
     {
 
@@ -26,7 +25,7 @@ class GraphSimilarity
 
 private:
     int m_bfs_level;
-    math_utils::KernelFunc m_kernel;
+    KernelFunc m_kernel;
 };
 
 #endif // GRAPHSIMILARITY_H
```
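`GraphSimilarity` defaults to the cosine kernel (`KernelFunc::COS`) when comparing node feature vectors. The numpy sketch below shows a cosine similarity between two such vectors, for example the features of the same item in two adjacent frames. How `math_utils` actually implements the kernel is not shown in this diff, so returning 0.0 for an all-zero vector is an assumption, and `cosine_kernel` is a hypothetical name.

```python
import numpy as np


def cosine_kernel(u, v):
    """Cosine similarity of two feature vectors; 0.0 if either is all zeros (assumption)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    return float(np.dot(u, v) / norm)
```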

codes/graphSim/main.cpp

Lines changed: 4 additions & 3 deletions
```diff
@@ -1,17 +1,18 @@
+#include "graph_sim.h"
 #include <QJsonValue>
 #include <QJsonArray>
 #include <QJsonObject>
 #include <QJsonDocument>
 #include <QJsonParseError>
 #include <QIODevice>
 #include <QDir>
-#include "graph_sim.h"
+#include <QDebug>
 
 
 int main(int argc, char *argv[])
 {
-    // QString config_path = argv[1];
-    QString config_path = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/config/5_cluster_trans_split_overlap_contract.json";
+    QString config_path = argv[1];
+    // QString config_path = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/config/5_cluster_trans_split_overlap_contract.json";
 
     QString rootDir = "/Users/joe/Codes/PythonProjects/joint_tsne_experiments/";
 
```