Google releases machine learning glossary (full version)

The Google Engineering Education team has released a multilingual Google Machine Learning Glossary covering general machine learning terminology and TensorFlow-specific definitions. Language versions include Spanish, French, Korean, and Simplified Chinese. The glossary's entries are summarized below.

A/B testing: a statistical method for comparing two or more techniques, typically an incumbent technique against a new one. A/B testing aims to determine not only which technique performs better but also whether the difference is statistically significant. Although usually applied to two techniques, it extends to any finite number of techniques and measurements.

Accuracy: the proportion of predictions a classification model got right. In multi-class classification: Accuracy = Number of Correct Predictions / Total Number of Samples. In binary classification: Accuracy = (True Positives + True Negatives) / Total Number of Samples. See the entries for true positives and true negatives.

Activation function: a function, such as ReLU or sigmoid, that takes the weighted sum of all inputs from the previous layer, generates an output value (typically nonlinear), and passes it to the next layer.

AdaGrad: an advanced gradient descent method that adjusts the learning rate of each parameter individually. For details, see the AdaGrad paper.

Area Under the ROC Curve (AUC): an evaluation metric that considers all possible classification thresholds; it equals the probability that a randomly chosen positive example ranks higher than a randomly chosen negative example.

Backpropagation: the main algorithm for performing gradient descent on neural networks. It first computes each node's output in a forward pass, then computes the partial derivative of the loss with respect to each parameter in a backward pass.

Baseline: a simple model or heuristic used as a reference point when comparing model performance. Baselines help developers quantify the minimum performance expected on a particular problem.

Batch: the set of samples used in one iteration of model training (one gradient update).

Batch size: the number of samples in a batch. For example, SGD uses a batch size of 1, while mini-batches typically range from 10 to 1000. Batch size is usually fixed during training and inference, though TensorFlow allows dynamic batch sizes.

Bias: the intercept or offset from the origin, often denoted b or w0. In the formula y = wx + b, the bias is b. Not to be confused with prediction bias.

Binary classification: predicting one of two mutually exclusive classes. For example, an email classifier that labels messages as spam or non-spam is a binary classifier.

Binning (bucketing): converting a single continuous feature into multiple binary features based on value intervals. For instance, temperature can be split into discrete bins rather than represented as one continuous value.

Calibration layer: a post-prediction adjustment that reduces prediction bias, aligning predicted probabilities with the observed distribution of labels.

Candidate sampling: a training-time optimization that computes probabilities for all positive labels but only for a random sample of negative labels, reducing computational cost while remaining effective.

Categorical data: features with a discrete set of possible values, such as a house-style feature with a fixed set of options. The values may be mutually exclusive, or a sample may take several of them. Categorical features are sometimes called discrete features, in contrast to numerical data.
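As a concrete illustration of the accuracy formulas in the entry above, here is a minimal Python sketch; the counts are made-up values for a hypothetical spam classifier:

```python
def binary_accuracy(tp, tn, fp, fn):
    """Binary accuracy: (TP + TN) / total number of samples."""
    total = tp + tn + fp + fn
    return (tp + tn) / total

# Hypothetical prediction counts for a spam classifier.
print(binary_accuracy(tp=40, tn=50, fp=5, fn=5))  # 0.9
```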
Checkpoint: data capturing the state of a model's variables at a particular point in time. Checkpoints make it possible to export model weights, train across multiple sessions, and recover after errors.

Class: one of the enumerated target values for a label. In a binary spam-detection model, the two classes are "spam" and "non-spam."

Class-imbalanced dataset: a binary classification dataset in which the two classes differ greatly in frequency. For example, a disease dataset in which 0.0001 of samples are positive and 0.9999 are negative is class-imbalanced.

Classification model: a model that distinguishes between two or more discrete classes, in contrast to a regression model, which predicts continuous values.

Classification threshold: a scalar criterion used to separate the positive class from the negative class, often applied to logistic regression output to map a probability to a binary classification.

Collaborative filtering: predicting one user's interests from the preferences of many other users; commonly used in recommendation systems.

Confusion matrix: an NxN table summarizing how well a classification model's predictions match the actual labels. It is useful for computing metrics such as accuracy and recall.

Continuous feature: a feature with an unbounded range of possible values, in contrast to a discrete feature.

Convergence: a state reached during training in which loss changes very little from iteration to iteration. Deep learning models can appear converged temporarily before loss declines further.

Convex function: a function whose graph is roughly U-shaped, with the region above the curve forming a convex set. Many common loss functions, including L2 loss and log loss, are convex.

Convex optimization: finding the minimum of a convex function using techniques such as gradient descent. Much research focuses on reformulating problems as convex optimization problems.

Convex set: a subset of Euclidean space in which the line segment connecting any two points lies entirely within the set.

Cost: a synonym for loss.

Cross-entropy: a generalization of log loss to multi-class classification, quantifying the difference between two probability distributions.

Custom Estimator: an Estimator that the user defines, in contrast to a pre-made Estimator.

Dataset: a collection of samples. The Dataset API (tf.data) provides tools for reading and transforming data for machine learning algorithms.

Decision boundary: the separator between classes that a model learns in a binary or multi-class classification problem.

Dense layer: a synonym for fully connected layer.

Deep model: a model with multiple hidden layers that relies on trainable nonlinear relationships, in contrast to a wide model.

Dense feature: a feature whose values are mostly non-zero, typically a tensor of floating-point values; contrasts with a sparse feature.

Derived feature: a synonym for synthetic feature.

Discrete feature: a feature with a finite number of possible values, such as "animal," "vegetable," or "mineral"; contrasts with a continuous feature.

Dropout regularization: a regularization technique that randomly removes neurons during training, which prevents overfitting by effectively training an ensemble of smaller networks.

Dynamic model: a model trained online, continuously updated with incoming data.

Early stopping: a regularization method that halts training when validation loss begins to increase, preventing overfitting.
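To make the confusion matrix entry concrete, here is a minimal Python sketch that tallies a 2x2 matrix from actual and predicted binary labels; the label lists are made up for illustration:

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs; for binary labels this yields a 2x2 table."""
    return Counter(zip(actual, predicted))

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
for (a, p), n in sorted(confusion_matrix(actual, predicted).items()):
    print(f"actual={a} predicted={p}: {n}")
```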
Embedding: a categorical feature represented as a continuous-valued feature, typically by mapping a high-dimensional vector into a lower-dimensional space. For example, a word can be represented as a sparse vector or as a dense embedding vector.

Empirical risk minimization (ERM): choosing the model function that minimizes loss on the training set; contrasts with structural risk minimization.

Ensemble: a merger of the predictions of multiple models, created by varying initializations, hyperparameters, or overall structure.

Epoch: one complete pass over the entire dataset during training, corresponding to N / batch size iterations, where N is the total number of samples.

Estimator: an instance of the tf.estimator.Estimator class, which encapsulates the logic for building a TensorFlow graph and running a session.

Example: a row of the dataset, containing one or more features and possibly a label.

False negative (FN): a sample the model incorrectly predicts as negative; for instance, an email predicted as non-spam that is actually spam.

False positive (FP): a sample the model incorrectly predicts as positive; for example, an email predicted as spam that is actually not.

False positive rate (FPR): the x-axis of the ROC curve, defined as FPR = False Positives / (False Positives + True Negatives).

Feature: an input variable used to make predictions.

Feature column: a set of related features, such as the set of all countries a user might reside in, along with metadata such as the feature's data type.

Feature cross: a synthetic feature formed by crossing individual features, useful for representing nonlinear relationships.

Feature engineering: determining which features are likely to be useful and converting raw data into those features. In TensorFlow, it often refers to converting raw log entries into tf.Example protocol buffers.

Feature set: the group of features used to train a model, such as postal code, floor area, and condition for a model that predicts house prices.

Feature spec: a description of how to extract feature data from tf.Example protocol buffers, specifying keys, data types, and lengths.

Full softmax: a synonym for softmax; contrasts with candidate sampling.

Fully connected layer: a hidden layer in which each node is connected to every node in the subsequent layer.

Generalization: a model's ability to make correct predictions on new data it did not see during training.

Generalized linear model: a generalization of least squares regression to other families of models, including logistic regression and multi-class regression.

Gradient: the vector of partial derivatives of a function with respect to all of its independent variables; it points in the direction of steepest ascent.

Gradient clipping: capping gradient values before applying them, helping ensure numerical stability and preventing exploding gradients.

Gradient descent: a technique for minimizing loss by computing gradients on the training data and iteratively adjusting the parameters, gradually converging on the best combination of weights and biases.

Graph: in TensorFlow, a computation specification in which nodes represent operations and directed edges represent the flow of data between them.

Heuristic: a practical solution that is not necessarily optimal but is sufficient to make progress or to learn from.

Hidden layer: a layer in a neural network between the input layer and the output layer; a network may contain one or more hidden layers.

Hinge loss: a family of classification loss functions that push the decision boundary as far as possible from each training sample, maximizing the margin; used, for example, in SVMs.
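To illustrate the gradient descent entry, here is a minimal Python sketch of a single parameter update for least squares regression (the model y = w*x + b trained with L2 loss); the data and learning rate are made-up values:

```python
# One gradient descent step for least squares regression (L2 loss).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated by the true relationship y = 2x
w, b, learning_rate = 0.0, 0.0, 0.1

# Partial derivatives of the mean squared loss with respect to w and b.
n = len(xs)
dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n

# Step against the gradient, scaled by the learning rate.
w -= learning_rate * dw
b -= learning_rate * db
print(w, b)  # after one step: w ~= 1.87, b = 0.8
```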
Holdout data: samples deliberately withheld from training, such as the validation and test sets, used to assess a model's ability to generalize.

Hyperparameter: a parameter tuned across training runs rather than learned, such as the learning rate.

Hyperplane: a boundary dividing a space into two subspaces; in high-dimensional spaces, hyperplanes serve as classification boundaries.

IID (independently and identically distributed): data drawn from a distribution that does not change, where each value does not depend on previously drawn values.

Inference: in machine learning, applying a trained model to unlabeled samples to make predictions; this differs from the statistical meaning of inference.

Input function: in TensorFlow, a function that returns input data to an Estimator's training, evaluation, or prediction method.

Input layer: the first layer of a neural network, which receives the input data.

Instance: a synonym for sample.

Interpretability: how easily a model's predictions can be explained; deep models are often less interpretable than linear models.

Inter-rater agreement: a measure of how often human raters agree on a task; low agreement indicates the task description needs improvement.

Iteration: one update of the model's weights during training, based on the gradient computed from a single batch of data.

Keras: a popular Python machine learning API that can run on top of several deep learning frameworks, including TensorFlow.

Kernel support vector machine (KSVM): a classification algorithm that maximizes the margin between classes by mapping input data into a higher-dimensional space; it uses hinge loss.

L1 loss: a loss function based on the absolute difference between predicted and actual values; it is less sensitive to outliers than L2 loss.

L1 regularization: a penalty proportional to the sum of the absolute values of the weights, which promotes sparsity in models.

L2 loss (squared loss): the loss function used in linear regression, minimizing the square of the difference between predicted and actual values.

L2 regularization: a penalty proportional to the sum of the squares of the weights, which shrinks large weights toward zero.

Label: the "answer" or "result" portion of a sample in supervised learning.

Labeled example: a sample containing both features and a label; used in supervised training.

Lambda: a synonym for regularization rate, the strength of the regularization penalty.

Layer: a set of neurons in a neural network that process a set of input features or the outputs of other neurons.

Layers API (tf.layers): a TensorFlow API for constructing deep neural networks layer by layer, supporting different layer types.

Learning rate: the scalar in gradient descent that multiplies the gradient to determine the step size.

Least squares regression: a linear regression model trained by minimizing L2 loss.

Linear regression: a regression model that outputs a continuous value computed as a linear combination of the input features.

Logistic regression: a model that produces probabilities for discrete label values in classification problems by applying a sigmoid function to a linear prediction.

Log loss: the loss function used in binary logistic regression, measuring the discrepancy between predicted probabilities and actual labels.

Loss: a measure of how far a model's prediction is from its label; loosely, a measure of how bad the model is.

Machine learning: a program or system that builds a predictive model from input data and uses the learned model to make predictions on new data.

Mean squared error (MSE): the average squared loss per sample, widely used to evaluate model performance.

Metric: a number you care about, which may or may not be directly optimized by the machine learning system.
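The loss entries above translate directly into code; here is a minimal Python sketch of L1 loss, L2 loss, and MSE over made-up predictions and labels:

```python
def l1_loss(preds, labels):
    """Sum of absolute differences."""
    return sum(abs(p - y) for p, y in zip(preds, labels))

def l2_loss(preds, labels):
    """Sum of squared differences (squared loss)."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels))

def mse(preds, labels):
    """Average squared loss per sample."""
    return l2_loss(preds, labels) / len(labels)

preds, labels = [2.5, 0.0, 2.0], [3.0, -0.5, 2.0]
print(l1_loss(preds, labels), l2_loss(preds, labels), mse(preds, labels))
# 1.0 0.5 0.1666...
```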
Metrics API (tf.metrics): a TensorFlow API for evaluating models, for example determining how often a model's predictions match its labels.

Mini-batch: a small subset of samples used in one training iteration; computing loss on a mini-batch is far more efficient than on the full dataset.

Mini-batch stochastic gradient descent (SGD): a gradient descent variant that estimates gradients from mini-batches; vanilla SGD uses a mini-batch of size 1.

ML: an abbreviation for machine learning.

Model: the representation of what a machine learning system has learned, encompassing both the graph and its parameters.

Model training: the process of determining the best model.

Momentum: an advanced gradient descent method in which each step also depends on the derivatives of the preceding steps, helping avoid local minima.

Multi-class classification: classification that distinguishes among more than two classes, such as identifying species of maple trees.

Multinomial classification: a synonym for multi-class classification.

NaN trap: a failure mode in which one number in the model becomes NaN during training, causing many or all other numbers to eventually become NaN.

Negative class: one of the two classes in binary classification; typically the class you are not looking for.

Neural network: a model, loosely inspired by the brain, composed of multiple layers of connected units with nonlinear relationships.

Neuron: a node in a neural network that takes multiple input values and produces one output value via an activation function.

Node: an overloaded term referring either to a neuron in a hidden layer or to an operation in a TensorFlow graph.

Normalization: converting a feature's actual range of values into a standard range, such as -1 to 1 or 0 to 1.

Numerical data: features represented as integers or real numbers, where the values have a meaningful mathematical relationship to one another.

NumPy: an open-source Python math library providing efficient array operations; pandas is built on top of it.

Objective: the metric an algorithm attempts to optimize.

Offline inference: generating predictions in advance, storing them, and retrieving them on demand; contrasts with online inference.

One-hot encoding: a sparse vector in which one element is 1 and all other elements are 0; commonly used to represent categorical values.

One-vs.-all: a strategy for multi-class classification that trains a separate binary classifier for each class.

Online inference: generating predictions on demand; contrasts with offline inference.

Operation (op): a node in a TensorFlow graph; any process that creates, manipulates, or destroys a tensor.

Optimizer: a specific implementation of the gradient descent algorithm; TensorFlow's optimizer base class is tf.train.Optimizer.

Outlier: a value far from most other values; outliers can cause problems during model training.

Output layer: the final layer of a neural network, which contains the answer.

Overfitting: a model fitted so closely to the training data that it fails to generalize to new data.

pandas: a column-oriented data analysis API supported by many machine learning frameworks.

Parameter: a variable of the machine learning system that training adjusts, such as the weights.

Parameter server (PS): a job that tracks a model's parameters in a distributed training setting.

Parameter update: the adjustment of a model's parameters during training.

Partial derivative: a derivative taken with respect to one variable while all other variables are held constant.

Partitioning strategy: the algorithm by which parameter servers split variables among themselves.

Performance: an overloaded term meaning either software speed or model accuracy.

Perplexity: a measure of how well a model accomplishes its task; roughly, the number of guesses the model needs for its candidate set to include the correct result.
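To make the one-hot encoding entry concrete, here is a minimal Python sketch; the vocabulary reuses the house-style example from the categorical data entry:

```python
def one_hot(value, vocabulary):
    """Return a vector with a 1 at the value's index and 0 elsewhere."""
    return [1 if v == value else 0 for v in vocabulary]

house_styles = ["Tudor", "ranch", "colonial", "Cape Cod"]
print(one_hot("colonial", house_styles))  # [0, 0, 1, 0]
```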
Pipeline: the infrastructure surrounding a machine learning algorithm, including data collection, training, and deployment.

Positive class: one of the two classes in binary classification; typically the outcome you are testing for.

Precision: a classification metric measuring true positives divided by the sum of true positives and false positives.

Prediction: a model's output for a given input sample.

Prediction bias: the difference between the average of a model's predictions and the average of the labels.

Pre-made Estimator: an Estimator that has already been built, such as TensorFlow's DNNClassifier or LinearClassifier.

Pre-trained model: a model or model component (such as embeddings) that has already been trained, sometimes used as input to a neural network.

Prior belief: what you believe about the data before training begins, for example that weights should be small and normally distributed.

Queue: a TensorFlow operation implementing a queue data structure, typically used for I/O.

Rank: an overloaded term referring either to the number of dimensions of a tensor or to a position in a ranking.

Rater: a person who provides labels for samples; sometimes called an annotator.

Recall: a classification metric measuring true positives divided by the sum of true positives and false negatives.

Rectified Linear Unit (ReLU): an activation function that outputs 0 for negative or zero inputs and outputs the input value itself for positive inputs.

Regression model: a model that outputs continuous values, in contrast to a classification model, which outputs discrete values.

Regularization: a penalty on model complexity that helps prevent overfitting; methods include L1 and L2 regularization, dropout, and early stopping.

Regularization rate: a scalar (lambda) specifying the relative importance of the regularization function.

Representation: the process of mapping data to useful features.

Receiver operating characteristic (ROC) curve: a plot of true positive rate versus false positive rate at different classification thresholds; the area under the curve (AUC) serves as a performance metric.

Root directory: the directory specified for storing model checkpoints and event files.

Root mean squared error (RMSE): the square root of the mean squared error.

SavedModel: the recommended format for saving and restoring TensorFlow models; it is language-independent and recoverable.

Saver: a TensorFlow object responsible for saving model checkpoints.

Scaling: a common feature engineering practice that adjusts a feature's range of values to match the ranges of other features in the dataset.

scikit-learn: a popular open-source machine learning platform.

Semi-supervised learning: training a model on data in which some samples are labeled and others are not, typically by inferring labels for the unlabeled samples.

Sequence model: a model whose inputs have sequential dependencies, for example predicting the next video watched from a sequence of previously watched videos.

Session: the object that maintains state, such as variables, in a TensorFlow program.

Sigmoid function: a function that maps logistic regression output to a probability between 0 and 1.

Softmax: a function that provides a probability for each possible class in a multi-class classification model, with the probabilities summing to exactly 1.0.

Sparse feature: a feature vector whose values are mostly zero or missing; contrasts with a dense feature.

Squared hinge loss: the square of the hinge loss, which penalizes outliers more severely.

Squared loss (L2 loss): the loss function used in linear regression, computing the square of the difference between predicted and actual values.

Static model: a model trained offline.
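The ReLU, sigmoid, and softmax entries above are simple enough to write out directly; here is a minimal Python sketch of all three (the sample inputs are arbitrary):

```python
import math

def relu(x):
    """0 for x <= 0, x itself otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Map any real value into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    """Probabilities over classes, summing to 1.0 (shifted by max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))     # 0.0 3.0
print(round(sigmoid(0.0), 2))    # 0.5
print(softmax([2.0, 1.0, 0.1]))  # three probabilities summing to 1.0
```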
Stationarity: the property of data whose distribution stays constant over time, which is important for reliable model performance.

All In One Computers

All-in-one computers, as the name suggests, integrate the computer's main unit and monitor into a single device. This design brings several notable advantages.
First, simplicity and aesthetics are among its major highlights. Compared with a traditional desktop, an all-in-one computer has no bulky tower and far fewer cables, so its overall appearance is sleeker and more stylish. It blends easily into a modern home or office, adding a touch of elegance and sophistication to the space. Whether placed in a living room, a study, or on an office desk, it makes an attractive fixture.
Second, the space-saving advantage is especially pronounced. In modern life, space is often a scarce resource; for a small office or home, using it efficiently is crucial. By combining the main unit and monitor into a single device, an all-in-one computer greatly reduces the desktop footprint. Compared with the bulky tower of a traditional desktop, it lets users make better use of limited space, leaving the desk cleaner and more organized.
In addition, all-in-one computers are convenient to set up. Users simply connect power and a network, avoiding the complicated cabling and assembly of a traditional desktop. For less tech-savvy users this is a significant advantage: it makes getting started easier and faster and reduces problems caused by cable connections.
All in all, with its distinctive design and many advantages, the all-in-one computer has gradually become a sought-after product in the modern computer market. It meets users' needs for aesthetics and efficient use of space while providing a convenient user experience.
