default location for models has changed #636

Open · wants to merge 1 commit into base: main

67 changes: 53 additions & 14 deletions chapters/en/chapter2/3.mdx
@@ -4,25 +4,46 @@

{#if fw === 'pt'}

- <CourseFloatingBanner chapter={2}
+ <CourseFloatingBanner
+ chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
- {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb"},
- {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb"},
- ]} />
+ {
+ label: "Google Colab",
+ value:
+ "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb",
+ },
+ {
+ label: "Aws Studio",
+ value:
+ "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_pt.ipynb",
+ },
+ ]}
+ />

{:else}

- <CourseFloatingBanner chapter={2}
+ <CourseFloatingBanner
+ chapter={2}
classNames="absolute z-10 right-0 top-0"
notebooks={[
- {label: "Google Colab", value: "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_tf.ipynb"},
- {label: "Aws Studio", value: "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_tf.ipynb"},
- ]} />
+ {
+ label: "Google Colab",
+ value:
+ "https://colab.research.google.com/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_tf.ipynb",
+ },
+ {
+ label: "Aws Studio",
+ value:
+ "https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/master/course/en/chapter2/section3_tf.ipynb",
+ },
+ ]}
+ />

{/if}

{#if fw === 'pt'}
+
<Youtube id="AhChOFRegn4"/>
{:else}
<Youtube id="d3JVgghSOew"/>
@@ -47,6 +68,7 @@ However, if you know the type of model you want to use, you can use the class th
The first thing we'll need to do to initialize a BERT model is load a configuration object:

{#if fw === 'pt'}
+
```py
from transformers import BertConfig, BertModel

@@ -56,7 +78,9 @@ config = BertConfig()
# Building the model from the config
model = BertModel(config)
```
+
{:else}
+
```py
from transformers import BertConfig, TFBertModel

@@ -66,6 +90,7 @@ config = BertConfig()
# Building the model from the config
model = TFBertModel(config)
```
+
{/if}

The configuration contains many attributes that are used to build the model:
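
A minimal sketch of inspecting a few of them; the attribute names are standard `BertConfig` fields, and the values shown are the library defaults:

```py
from transformers import BertConfig

config = BertConfig()

# A few of the architecture-defining attributes (default values shown)
print(config.hidden_size)          # 768: dimension of the hidden states
print(config.num_hidden_layers)    # 12: number of Transformer layers
print(config.num_attention_heads)  # 12: attention heads per layer
print(config.vocab_size)           # 30522: size of the vocabulary
```
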
@@ -93,6 +118,7 @@ While you haven't seen what all of these attributes do yet, you should recognize
Creating a model from the default configuration initializes it with random values:

{#if fw === 'pt'}
+
```py
from transformers import BertConfig, BertModel

@@ -101,7 +127,9 @@ model = BertModel(config)

# Model is randomly initialized!
```
+
{:else}
+
```py
from transformers import BertConfig, TFBertModel

@@ -110,13 +138,15 @@ model = TFBertModel(config)

# Model is randomly initialized!
```
+
{/if}

The model can be used in this state, but it will output gibberish; it needs to be trained first. We could train the model from scratch on the task at hand, but as you saw in [Chapter 1](/course/chapter1), this would require a long time and a lot of data, and it would have a non-negligible environmental impact. To avoid unnecessary and duplicated effort, it's imperative to be able to share and reuse models that have already been trained.

Loading a Transformer model that is already trained is simple — we can do this using the `from_pretrained()` method:

{#if fw === 'pt'}
+
```py
from transformers import BertModel

@@ -126,6 +156,7 @@ model = BertModel.from_pretrained("bert-base-cased")
As you saw earlier, we could replace `BertModel` with the equivalent `AutoModel` class. We'll do this from now on as this produces checkpoint-agnostic code; if your code works for one checkpoint, it should work seamlessly with another. This applies even if the architecture is different, as long as the checkpoint was trained for a similar task (for example, a sentiment analysis task).

{:else}
+
```py
from transformers import TFBertModel

@@ -140,7 +171,7 @@ In the code sample above we didn't use `BertConfig`, and instead loaded a pretra

This model is now initialized with all the weights of the checkpoint. It can be used directly for inference on the tasks it was trained on, and it can also be fine-tuned on a new task. By training with pretrained weights rather than from scratch, we can quickly achieve good results.

- The weights have been downloaded and cached (so future calls to the `from_pretrained()` method won't re-download them) in the cache folder, which defaults to *~/.cache/huggingface/transformers*. You can customize your cache folder by setting the `HF_HOME` environment variable.
+ The weights have been downloaded and cached (so future calls to the `from_pretrained()` method won't re-download them) in the cache folder, which defaults to _~/.cache/huggingface/hub_. You can customize your cache folder by setting the `HF_HOME` environment variable.

The identifier used to load the model can be the identifier of any model on the Model Hub, as long as it is compatible with the BERT architecture. The entire list of available BERT checkpoints can be found [here](https://huggingface.co/models?filter=bert).
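
A minimal sketch of the cache override described above, assuming a placeholder path; `HF_HOME` has to be set before 🤗 Transformers is imported, since the cache location is resolved at import time:

```py
import os

# Must be set before the import below; the path is just a placeholder
os.environ["HF_HOME"] = "/path/to/custom/cache"

from transformers import AutoModel

# Downloads on the first call, then reuses the cached weights
model = AutoModel.from_pretrained("bert-base-cased")
```

The same call works with any other checkpoint identifier from the Model Hub.
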

@@ -155,26 +186,30 @@ model.save_pretrained("directory_on_my_computer")
This saves two files to your disk:

{#if fw === 'pt'}
+
```
ls directory_on_my_computer

config.json pytorch_model.bin
```
+
{:else}
+
```
ls directory_on_my_computer

config.json tf_model.h5
```
+
{/if}

- If you take a look at the *config.json* file, you'll recognize the attributes necessary to build the model architecture. This file also contains some metadata, such as where the checkpoint originated and what 🤗 Transformers version you were using when you last saved the checkpoint.
+ If you take a look at the _config.json_ file, you'll recognize the attributes necessary to build the model architecture. This file also contains some metadata, such as where the checkpoint originated and what 🤗 Transformers version you were using when you last saved the checkpoint.

{#if fw === 'pt'}
- The *pytorch_model.bin* file is known as the *state dictionary*; it contains all your model's weights. The two files go hand in hand; the configuration is necessary to know your model's architecture, while the model weights are your model's parameters.
+ The _pytorch_model.bin_ file is known as the _state dictionary_; it contains all your model's weights. The two files go hand in hand; the configuration is necessary to know your model's architecture, while the model weights are your model's parameters.

{:else}
- The *tf_model.h5* file is known as the *state dictionary*; it contains all your model's weights. The two files go hand in hand; the configuration is necessary to know your model's architecture, while the model weights are your model's parameters.
+ The _tf_model.h5_ file is known as the _state dictionary_; it contains all your model's weights. The two files go hand in hand; the configuration is necessary to know your model's architecture, while the model weights are your model's parameters.

{/if}
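
To see how the two files work together, here is a minimal save-and-reload sketch (PyTorch shown; `from_pretrained()` also accepts a local directory, so no Hub access is needed for the reload):

```py
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")
model.save_pretrained("directory_on_my_computer")

# Reload from disk: the config file describes the architecture and
# the weights file fills in the parameters
reloaded = AutoModel.from_pretrained("directory_on_my_computer")
```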

@@ -190,7 +225,7 @@ Let's say we have a couple of sequences:
sequences = ["Hello!", "Cool.", "Nice!"]
```

- The tokenizer converts these to vocabulary indices which are typically called *input IDs*. Each sequence is now a list of numbers! The resulting output is:
+ The tokenizer converts these to vocabulary indices which are typically called _input IDs_. Each sequence is now a list of numbers! The resulting output is:

```py no-format
encoded_sequences = [
@@ -203,17 +238,21 @@ encoded_sequences = [
This is a list of encoded sequences: a list of lists. Tensors only accept rectangular shapes (think matrices). This "array" is already of rectangular shape, so converting it to a tensor is easy:

{#if fw === 'pt'}
+
```py
import torch

model_inputs = torch.tensor(encoded_sequences)
```
+
{:else}
+
```py
import tensorflow as tf

model_inputs = tf.constant(encoded_sequences)
```
+
{/if}

### Using the tensors as inputs to the model[[using-the-tensors-as-inputs-to-the-model]]
@@ -224,5 +263,5 @@ Making use of the tensors with the model is extremely simple — we just call th
output = model(model_inputs)
```

- While the model accepts a lot of different arguments, only the input IDs are necessary. We'll explain what the other arguments do and when they are required later,
+ While the model accepts a lot of different arguments, only the input IDs are necessary. We'll explain what the other arguments do and when they are required later,
but first we need to take a closer look at the tokenizers that build the inputs that a Transformer model can understand.
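
As a minimal end-to-end sketch of that call (PyTorch shown; the input IDs below are placeholders rather than real tokenizer output):

```py
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("bert-base-cased")

# Placeholder IDs: three sequences of length four; 101 and 102 are
# BERT's [CLS] and [SEP] special tokens
model_inputs = torch.tensor([[101, 1000, 1001, 102]] * 3)

output = model(model_inputs)

# (batch_size, sequence_length, hidden_size) -> torch.Size([3, 4, 768])
print(output.last_hidden_state.shape)
```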