Tags
This file is auto-generated. Do not edit by hand.
activations
- ELU (Exponential Linear Unit)
- GELU (Gaussian Error Linear Unit)
- Leaky ReLU
- ReLU (Rectified Linear Unit)
- SELU (Scaled ELU)
- Sigmoid (Logistic)
- Softmax
- Softplus
- Swish / SiLU
- Tanh (Hyperbolic Tangent)
algorithms
- Augmenting Path
- Bellman-Ford Algorithm
- Breadth-First Search (BFS)
- Connected Components
- Depth-First Search (DFS)
- Dijkstra's Algorithm
- Dinic's Algorithm
- Edmonds-Karp Algorithm
- Fast Fourier Transform
- Ford-Fulkerson Method
- Global Minimum Cut
- Graph Algorithms (Overview)
- Karger's Algorithm (Min Cut)
- Kruskal's Algorithm
- Max-Flow Min-Cut Theorem
- Maximum Flow
- Minimum Cut
- Minimum Spanning Tree (MST)
- Prim's Algorithm
- Residual Graph
- Shortest Path (Overview)
- Topological Sort
approximation
asymptotics
bayesian
bounds
calculus
- Backpropagation
- Chain Rule
- Derivative
- Gradient
- Hessian
- Integral
- Jacobian
- Multivariable Chain Rule
- Partial Derivative
- Taylor Expansion
calibration
causal-inference
central-tendency
classification
- Average Precision
- Class Imbalance
- Cohen's Kappa
- Confusion Matrix
- F1 Score
- K-Nearest Neighbors (k-NN)
- Log Loss (Binary Cross-Entropy)
- Logistic Regression
- Matthews Correlation Coefficient
- Precision
- Precision-Recall Curve
- Probability Calibration
- Recall
- ROC Curve
- Sensitivity (True Positive Rate)
- Specificity (True Negative Rate)
- Threshold Selection
clustering
- Adjusted Rand Index (ARI)
- Calinski-Harabasz Score
- Davies-Bouldin Index
- DBSCAN
- Density-Based Clustering Validation (DBCV)
- Fowlkes-Mallows Index
- Hierarchical Clustering
- K-Means Clustering
- Normalized Mutual Information (NMI)
- Silhouette Score
- Spectral Clustering
community-detection
conditional
connectivity
constraints
continuous
- Beta Distribution
- Exponential Distribution
- Gamma Distribution
- Lognormal Distribution
- Normal Distribution (Gaussian)
- PDF (Probability Density Function)
convex-analysis
convexity
dag
data-quality
decision-making
decoding
deep-learning
- Activation Functions
- Adam Optimizer
- AdamW Optimizer
- Attention
- Backpropagation
- Batch Normalization
- Cosine Embedding Loss (Metric Learning)
- Cross-Attention
- Dropout
- ELU (Exponential Linear Unit)
- Embedding
- Feedforward Network (Transformer FFN)
- GELU (Gaussian Error Linear Unit)
- Gradient Clipping
- Layer Normalization
- Leaky ReLU
- Learning Rate Schedule
- Learning Rate Warmup
- MLP (Multi-Layer Perceptron)
- Momentum (SGD with Momentum)
- Multi-Head Attention
- Positional Encoding
- Regularization (L1/L2)
- ReLU (Rectified Linear Unit)
- Residual Connection (Skip Connection)
- RMSProp
- Self-Attention
- SELU (Scaled ELU)
- Sigmoid (Logistic)
- Softmax
- Softplus
- Swish / SiLU
- Tanh (Hyperbolic Tangent)
- Transformer
- Weight Decay
density-based
dependence
dimensionality-reduction
discrete
- Bernoulli Distribution
- Binomial Distribution
- Geometric Distribution
- Multinomial Distribution
- PMF (Probability Mass Function)
- Poisson Distribution
dispersion
distribution
- Bernoulli Distribution
- Beta Distribution
- Binomial Distribution
- CDF (Cumulative Distribution Function)
- Exponential Distribution
- Gamma Distribution
- Geometric Distribution
- Lognormal Distribution
- Multinomial Distribution
- Normal Distribution (Gaussian)
- PDF (Probability Density Function)
- PMF (Probability Mass Function)
- Poisson Distribution
- Probability Distribution
- Uniform Distribution
distributions
divergence
effect-size
embeddings
estimation
evaluation
- Adjusted Rand Index (ARI)
- Akaike Information Criterion (AIC)
- Average Precision
- Bayesian Information Criterion (BIC)
- Brier Score
- Calibration Error (ECE)
- Calinski-Harabasz Score
- Class Imbalance
- Cohen's Kappa
- Confusion Matrix
- Cross-Validation
- Data Leakage
- Davies-Bouldin Index
- Density-Based Clustering Validation (DBCV)
- F1 Score
- Fowlkes-Mallows Index
- Matthews Correlation Coefficient
- Normalized Mutual Information (NMI)
- Precision
- Precision-Recall Curve
- Recall
- ROC Curve
- Sensitivity (True Positive Rate)
- Silhouette Score
- Specificity (True Negative Rate)
- Train/Validation/Test Split
experimentation
external-validation
fdr
features
first-order
frequency
generalization
gnn
graph-ml
graph-theory
- Adjacency Matrix
- Clustering Coefficient
- Degree Matrix
- Edge Cut vs Vertex Cut
- Flow Network
- Global Minimum Cut
- Graph Fourier Transform
- Graph Laplacian
- Graph Motifs
- PageRank
- Spectral Clustering
graphs
- Augmenting Path
- Bellman-Ford Algorithm
- Breadth-First Search (BFS)
- Connected Components
- Depth-First Search (DFS)
- Dijkstra's Algorithm
- Dinic's Algorithm
- Edge Cut vs Vertex Cut
- Edmonds-Karp Algorithm
- Flow Network
- Ford-Fulkerson Method
- Global Minimum Cut
- Graph Algorithms (Overview)
- Karger's Algorithm (Min Cut)
- Kruskal's Algorithm
- Label Propagation (LPA)
- Max-Flow Min-Cut Theorem
- Maximum Flow
- Message Passing (on Graphs)
- Minimum Cut
- Minimum Spanning Tree (MST)
- Modularity (Community Quality)
- Network Significance Profile
- Prim's Algorithm
- Residual Graph
- Shortest Path (Overview)
- Topological Sort
hypothesis-testing
- Benjamini-Hochberg Procedure
- Chi-Square Test
- False Discovery Rate (FDR)
- Multiple Hypothesis Testing
- Null Hypothesis
- One-Sample t-Test
- P-Value
- Paired t-Test
- Permutation Test
- Statistical Power
- Two-Sample t-Test (Welch)
- Type I and Type II Errors
inference
- ANOVA (Analysis of Variance)
- Bayes' Rule
- Bootstrap
- Cohen's d (Effect Size)
- Confidence Interval
- False Discovery Rate (FDR)
- Maximum A Posteriori (MAP)
- Maximum Likelihood Estimation (MLE)
- Multiple Hypothesis Testing
information-retrieval
information-theory
- Channel Capacity
- Conditional Entropy
- Cross Entropy
- Data Processing Inequality
- Entropy Rate
- Fisher Information
- Jensen-Shannon Divergence
- Joint Entropy
- KL Divergence (Relative Entropy)
- Mutual Information
- Normalized Mutual Information (NMI)
- Perplexity
- Shannon Entropy
interpretability
joint
likelihood
linear-algebra
- Adjacency Matrix
- Cholesky Decomposition
- Degree Matrix
- Determinant
- Dot Product
- Eigendecomposition
- Eigenvalues and Eigenvectors
- Graph Laplacian
- Jacobian
- Least Squares
- Linear Independence
- Matrix Rank
- Multivariable Chain Rule
- Nuclear Norm (Trace Norm)
- Orthogonal Projection
- Orthonormal Basis
- PCA Explained Variance Ratio
- Positive Definite Matrix
- Principal Component Analysis (PCA)
- Pseudo-inverse (Moore-Penrose)
- QR Decomposition
- Scree Plot
- Singular Value Decomposition (SVD)
- Span
- Trace
- Vector Norms
linear-systems
llms
- Causal Language Modeling
- Language Model
- Next-Token Prediction
- Subword Tokenization (BPE / WordPiece)
- Temperature (Sampling)
- Tokenization
- Top-k Sampling
- Top-p Sampling (Nucleus)
machine-learning
- Bias-Variance Tradeoff
- Categorical Encoding
- Class Imbalance
- Concept Drift
- Cross-Validation
- Data Drift
- Data Leakage
- DBSCAN
- Decision Tree
- Elastic Net
- Feature Scaling (Standardization vs Normalization)
- Feature Store
- Gradient Boosting
- Hierarchical Clustering
- Imputation
- K-Means Clustering
- K-Nearest Neighbors (k-NN)
- Lasso Regression (L1)
- Linear Regression
- Logistic Regression
- Machine Learning Pipeline
- Missing Data
- Model Interpretability
- Partial Dependence Plot (PDP)
- Permutation Feature Importance
- Probability Calibration
- Random Forest
- Ridge Regression (L2)
- SHAP Values
- Threshold Selection
- Train/Validation/Test Split
matrix-decomposition
metric-learning
metrics
- Confusion Matrix
- Dice Coefficient (Sorensen-Dice)
- Jaccard Similarity / IoU
- Log Loss (Binary Cross-Entropy)
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R^2 (Coefficient of Determination)
- ROC AUC
- Root Mean Squared Error (RMSE)
- Sensitivity (True Positive Rate)
- Specificity (True Negative Rate)
ml
- Cross Entropy
- Dice Coefficient (Sorensen-Dice)
- Jaccard Similarity / IoU
- Log Loss (Binary Cross-Entropy)
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Negative Log-Likelihood (NLL)
- Perplexity
- R^2 (Coefficient of Determination)
- ROC AUC
- Root Mean Squared Error (RMSE)
mlp
model-selection
- Akaike Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
- Bias-Variance Tradeoff
- Cross-Validation
- Early Stopping
models
motifs
mst
multivariable
network-flow
- Augmenting Path
- Dinic's Algorithm
- Edmonds-Karp Algorithm
- Flow Network
- Ford-Fulkerson Method
- Max-Flow Min-Cut Theorem
- Maximum Flow
- Minimum Cut
- Residual Graph
networks
neural-networks
- Activation Functions
- Batch Normalization
- ELU (Exponential Linear Unit)
- GELU (Gaussian Error Linear Unit)
- Layer Normalization
- Leaky ReLU
- MLP (Multi-Layer Perceptron)
- ReLU (Rectified Linear Unit)
- Residual Connection (Skip Connection)
- SELU (Scaled ELU)
- Sigmoid (Logistic)
- Softmax
- Softplus
- Swish / SiLU
- Tanh (Hyperbolic Tangent)
nlp
- Attention
- Causal Language Modeling
- Cross-Attention
- Embedding
- Inverse Document Frequency (IDF)
- Language Model
- Masked Language Modeling (MLM)
- Next-Token Prediction
- Self-Attention
- Subword Tokenization (BPE / WordPiece)
- Temperature (Sampling)
- Term Frequency (TF)
- TF-IDF
- Tokenization
- Top-k Sampling
- Top-p Sampling (Nucleus)
- Transformer
normalization
numerical
optimization
- Adam Optimizer
- AdamW Optimizer
- Backpropagation
- Early Stopping
- Gradient
- Gradient Clipping
- Gradient Descent
- Hessian
- KKT Conditions
- KL Divergence (Relative Entropy)
- Lagrangian
- Lagrangian Duality
- Learning Rate Schedule
- Learning Rate Warmup
- Least Squares
- Line Search
- Max-Flow Min-Cut Theorem
- Maximum Flow
- Minimum Cut
- Minimum Spanning Tree (MST)
- Momentum (SGD with Momentum)
- Negative Log-Likelihood (NLL)
- Newton's Method
- Nuclear Norm (Trace Norm)
- Positive Definite Matrix
- Proximal Operator
- Regularization (L1/L2)
- RMSProp
- Stochastic Gradient Descent
- Weight Decay
pca
preprocessing
pretraining
probability
- Bayes' Rule
- Bernoulli Distribution
- Bernoulli Trial
- Beta Distribution
- Binomial Distribution
- CDF (Cumulative Distribution Function)
- Central Limit Theorem
- Characteristic Function
- Chebyshev's Inequality
- Chi-Square Test
- Conditional Entropy
- Conditional Expectation
- Conditional Probability
- Cross Entropy
- Expectation
- Exponential Distribution
- Gamma Distribution
- Geometric Distribution
- Independence
- Jensen's Inequality
- Joint Entropy
- Joint Probability
- KL Divergence (Relative Entropy)
- Language Model
- Law of Large Numbers
- Law of Total Probability
- Lognormal Distribution
- Marginal Probability
- Markov's Inequality
- Mean (Expected Value)
- Median
- Moment Generating Function
- Multinomial Distribution
- Mutual Information
- Normal Distribution (Gaussian)
- One-Sample t-Test
- Paired t-Test
- PDF (Probability Density Function)
- PMF (Probability Mass Function)
- Poisson Distribution
- Probability Distribution
- Random Variable
- Shannon Entropy
- Sigmoid (Logistic)
- Softmax
- Standard Deviation
- Two-Sample t-Test (Welch)
- Uniform Distribution
- Variance
- Z-Score
production
random-variable
randomized
ranking
regression
- Elastic Net
- Lasso Regression (L1)
- Linear Regression
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- R^2 (Coefficient of Determination)
- Ridge Regression (L2)
- Root Mean Squared Error (RMSE)
regularization
representation-learning
resampling
sampling
second-order
sequence-modeling
shortest-path
signal-processing
- Aliasing
- Convolution
- Discrete Fourier Transform
- Fast Fourier Transform
- Fourier Transform
- Graph Fourier Transform
- Nyquist-Shannon Sampling Theorem
similarity
spectral
statistics
- A/B Testing
- Akaike Information Criterion (AIC)
- ANOVA (Analysis of Variance)
- Bayesian Information Criterion (BIC)
- Benjamini-Hochberg Procedure
- Bernoulli Distribution
- Bernoulli Trial
- Beta Distribution
- Binomial Distribution
- Bootstrap
- CDF (Cumulative Distribution Function)
- Chi-Square Test
- Cohen's d (Effect Size)
- Conditional Expectation
- Conditional Probability
- Confidence Interval
- Correlation
- Covariance
- Expectation
- Exponential Distribution
- False Discovery Rate (FDR)
- Fisher Information
- Gamma Distribution
- Geometric Distribution
- Graph Motifs
- Independence
- Joint Probability
- Lognormal Distribution
- Marginal Probability
- Maximum A Posteriori (MAP)
- Maximum Likelihood Estimation (MLE)
- Mean (Expected Value)
- Median
- Multinomial Distribution
- Multiple Hypothesis Testing
- Network Significance Profile
- Normal Distribution (Gaussian)
- Null Hypothesis
- One-Sample t-Test
- P-Value
- Paired t-Test
- PDF (Probability Density Function)
- Permutation Test
- PMF (Probability Mass Function)
- Poisson Distribution
- Power Analysis (Sample Size Planning)
- Probability Distribution
- Random Variable
- Regularization (L1/L2)
- Standard Deviation
- Statistical Power
- Two-Sample t-Test (Welch)
- Type I and Type II Errors
- Uniform Distribution
- Variance
- Z-Score
stochastic
stochastic-processes
study-design
supervised-learning
text
text-processing
tokenization
transformers
- Attention
- Causal Language Modeling
- Cross-Attention
- Feedforward Network (Transformer FFN)
- GELU (Gaussian Error Linear Unit)
- Masked Language Modeling (MLM)
- Multi-Head Attention
- Next-Token Prediction
- Positional Encoding
- Self-Attention
- Transformer