Categories: Networks

Introduction to Networks

I’m taking an online introductory course on networks. I have been surprised by how much ground this course is covering. I didn’t expect to cover wireless (mobile) networks, for example. I looked for videos on some of the topics to learn more, e.g. 4g network architecture – YouTube. Networking is turning out to be much cooler and more interesting than I thought possible. This post is a compilation of all the key topics introduced in the course (in the general order they were introduced, but not particularly organized into a coherent story).

My main takeaway from this first video is that 4G networks are entirely packet switched (basic, but new to me).

4G LTE Network Architecture Simplified

The next video on how messages are transmitted to the cell phone tower is insightful as well. I appreciated the high-level discussion of antennas.

How WiFi and Cell Phones Work | Wireless Communication Explained

The concepts of the control plane and data plane came up as well. One advantage of this separation, as per the overview below, is the independent evolution and development of each (e.g. control software can be upgraded without changing the hardware).

M2.1: Overview of Control and Data Plane Separation

There are so many concepts in this space, most of them new to me, e.g. OAM, NMS, and EMS. Some places they are discussed include LTE Architecture Concepts, Differences Between an NMS and an EMS, and this video on Management Plane vs. Control Plane vs. Data Plane. We briefly got into the differences between 4G and 5G, one being the service-based architecture. Here’s a video I found introducing it:

5G Service Based Architecture | Telecoms Bytes – Mpirical

Then of course there are the fundamental concepts of throughput, delay, and packet loss. Jim Kurose’s book (and video below) covers these topics, but it’s been a while since I read that book.

The professor also clarified the difference between bandwidth and throughput. The next video briefly touches on this distinction:

The course has also introduced me to the concept of spectral efficiency as part of understanding the difference between bandwidth and throughput. There is no shortage of concepts to learn about, from the different types of lines like T1 and T3 to bit robbing to the existence of network interface devices. The video below is a good intro to T1.

DS1 (T1) Fundamentals
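
As a worked example of the arithmetic behind the T1 line rate (my own illustration in Python, not from the video): a T1 interleaves 24 voice channels, each sampled 8,000 times per second at 8 bits per sample, with one framing bit added per frame.

channels = 24              # DS0 voice channels multiplexed onto one T1
bits_per_sample = 8        # 8-bit PCM samples
frames_per_second = 8000   # one sample per channel per frame

bits_per_frame = channels * bits_per_sample + 1   # +1 framing bit per frame
line_rate = bits_per_frame * frames_per_second
print(line_rate)   # 1544000 bits/s, i.e. the familiar 1.544 Mbps T1 rate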

There was also a discussion about cable networks, with an onslaught of concepts like hybrid fiber-coaxial (HFC). This Cable 101 video is a helpful resource.

The HFC Cable Systems Introduction video below starts out with a comparison of coax and fiber then explains the flow of signals from the core network to the home.

HFC Cable Systems Introduction

I still need to learn more about the cable modem termination system (CMTS), and the next resource is perfect. It mentions CMTS vendors like Arris, Cisco, and Motorola, which inspires me to look up the Cisco CMTS.

Cable Modem Termination System Tutorial (CMTS)

I have never researched how most of these systems work so I am greatly appreciating this introduction to networks course! Here’s a video on how cable modems work, including their interactions with the CMTS.

How Cable Modems Work

Communication between the CMTS and the cable modems (CMs) is done via DOCSIS. Here is the reference I found with insight into DOCSIS.

DOCSIS® 3.1 – An Overview

Something I picked up is that CableLabs does a lot of the research for these systems. Other concepts to know include wavelength-division multiplexing (WDM), the optical counterpart of the frequency-division multiplexing used in traditional coax networks. The following explanation is an example of WDM in fiber.

What is WDM (Wavelength Division Multiplexer)?

The next technology described is DSL (digital subscriber line). With DSL, the last mile is not shared (unlike cable networks). It evolved into ADSL and VDSL to support higher throughput. It’s interesting that it uses Asynchronous Transfer Mode (ATM) from back in the day. The course also briefly introduces passive optical networks.

PON, What is a PON? All you need to know!

Next, we get into the 7-layer OSI model. The example given for the physical layer is SONET technology. Another foray into T1 technology reveals that bipolar transmission is used for T1 since it is more power efficient.

Multiplexing is the next interesting topic introduced. I have included some videos below on the different types of multiplexing employed in communications.

  1. FDM involves modulating message signals onto carrier frequencies and then using bandpass filters to extract the individual signals.
  2. Time-division multiplexing: one variant is statistical TDM, which was a first step toward IP (see the sketch after the videos below).
  3. Wavelength-division multiplexing (WDM)
Frequency Division Multiplexing (FDM) Explained
Time Division Multiplexing (TDM) | Synchronous, Asynchronous, Statistical TDM | Computer Networks
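
To make the synchronous vs. statistical TDM distinction concrete, here is a toy Python sketch of my own (not from the videos): with bursty sources, synchronous TDM wastes its fixed slots, while statistical TDM only carries slots for sources that actually have data, tagging each slot with its source (conceptually a step toward packets carrying their own headers).

# Toy traffic: each inner list is one frame time; entries are the data each of
# the three sources (A, B, C) has ready, or None if that source is idle.
traffic = [
    ["a1", None, "c1"],
    [None, None, "c2"],
    ["a2", "b1", None],
]

for frame in traffic:
    sync_slots = [slot if slot else "(idle)" for slot in frame]                      # fixed slots, idle ones wasted
    stat_slots = [f"{chr(65 + i)}:{slot}" for i, slot in enumerate(frame) if slot]   # only active sources, each tagged
    print(sync_slots, "->", stat_slots)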

The course also addresses transmission fundamentals like the difference between bit rate and baud rate, the Shannon–Hartley theorem, the Nyquist–Shannon sampling theorem, modulation, modems, and codecs. I have compiled a few videos covering these topics below.

Here is an explanation of the Shannon–Hartley theorem:

Channel Capacity by Shannon-Hartley | Basics, Proof & Maximum Bandwidth Condition
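
As a quick sanity check of the theorem, here is a small Python sketch (my own, not from the video) that computes the Shannon capacity C = B log2(1 + S/N) for a 3 kHz voice-grade channel with a 30 dB signal-to-noise ratio:

import math

def shannon_capacity(bandwidth_hz, snr_linear):
    """Maximum error-free bit rate of a noisy channel (Shannon-Hartley)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

snr_db = 30
snr_linear = 10 ** (snr_db / 10)            # 30 dB -> a linear SNR of 1000
print(shannon_capacity(3000, snr_linear))   # ~29,900 bits per second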

The intuition behind the Nyquist–Shannon sampling theorem is explained in the next video:

The intuition behind the Nyquist-Shannon Sampling Theorem
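
A quick numeric illustration of the theorem (my own, assuming its standard statement): to reconstruct a signal whose highest frequency component is f_max, you must sample at more than twice f_max.

def nyquist_rate(f_max_hz):
    """Minimum sampling rate needed to capture content up to f_max_hz."""
    return 2 * f_max_hz

print(nyquist_rate(4000))    # 8000 samples/s -- the classic telephone voice sampling rate
print(nyquist_rate(20000))   # 40000 samples/s; CD audio's 44.1 kHz leaves some margin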

The concept of modulation comes next:

What is Modulation ? Why Modulation is Required ? Types of Modulation Explained.

Other concepts introduced include the constellation diagram and Quadrature amplitude modulation (QAM). The following videos introduce these 2 concepts:

What is a Constellation Diagram?
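
To tie this back to the bit rate vs. baud rate distinction, here is a small Python sketch of my own (the symbol rate is hypothetical) that builds a square 16-QAM constellation and shows that each symbol carries log2(M) bits, so the bit rate is the baud rate times log2(M):

import itertools
import math

def square_qam_constellation(m):
    """Points of a square M-QAM constellation (m must be a perfect square, e.g. 16 or 64)."""
    side = int(math.sqrt(m))
    levels = [2 * i - (side - 1) for i in range(side)]   # e.g. [-3, -1, 1, 3] for 16-QAM
    return [complex(i, q) for i, q in itertools.product(levels, levels)]

points = square_qam_constellation(16)
bits_per_symbol = math.log2(len(points))   # 4 bits per symbol for 16-QAM
symbol_rate = 2_000_000                    # hypothetical 2 Mbaud link
print(len(points), bits_per_symbol)        # 16 points, 4.0
print(symbol_rate * bits_per_symbol)       # bit rate = baud rate * log2(M) = 8,000,000 bits/s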

We then start getting into network addressing. One of the important concepts here is how the exhaustion of IPv4 addresses is handled: private IP addresses, DHCP, subnetting, and IPv6. One particularly interesting point was the difference between IPv4 and IPv6 headers:

IPv4 Header vs IPv6 Header Explained

The history of telecom is also worth knowing. More recent key events are the 1984 Modified Final Judgment and the Telecommunications Act of 1996. (I still need to verify that this playlist covers the 1984 Modified Final Judgment.)

In a discussion of the impact of TCP on throughput, the professor called out TCP global synchronization as an issue that networks need to avoid. Here’s one video about it.

Avoiding packet reordering is another important aspect of TCP. The contrast with UDP is especially interesting when other protocols like Google’s QUIC are designed. The RTP protocol (which typically runs over UDP) is used for VoIP. This is a good description of RTP:

Real-Time Transport Protocol (RTP) in VoIP

The Session Initiation Protocol (SIP) may be used to set up the RTP bearer streams. Here is a high-level overview of SIP.

What is SIP?

The RTP Control Protocol (RTCP) is a related protocol used to provide feedback on the quality of service (QoS).


Fixing my VS Code Python Environment

I mentioned in my last post (Learning about Large Language Models) that I recently started going through Sebastian Raschka’s Build a Large Language Model from Scratch book. It had been a while since I ran python code on my laptop so I needed to do some cleanup to restore my environment. I cloned the repo and started executing the first cell in Chapter 2:

from importlib.metadata import version

print("torch version:", version("torch"))
print("tiktoken version:", version("tiktoken"))

I got this error: PackageNotFoundError: No package metadata was found for torch.

{
	"name": "PackageNotFoundError",
	"message": "No package metadata was found for torch",
	"stack": "---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
File /opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/metadata/__init__.py:563, in Distribution.from_name(cls, name)
    562 try:
--> 563     return next(cls.discover(name=name))
    564 except StopIteration:

StopIteration: 

During handling of the above exception, another exception occurred:

PackageNotFoundError                      Traceback (most recent call last)
Cell In[2], line 7
      3 print(sys.version)
      5 from importlib.metadata import version
----> 7 print(\"torch version:\", version(\"torch\"))
      8 print(\"tiktoken version:\", version(\"tiktoken\"))

File /opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/metadata/__init__.py:1008, in version(distribution_name)
   1001 def version(distribution_name):
   1002     \"\"\"Get the version string for the named package.
   1003 
   1004     :param distribution_name: The name of the distribution package to query.
   1005     :return: The version string for the package as defined in the package's
   1006         \"Version\" metadata key.
   1007     \"\"\"
-> 1008     return distribution(distribution_name).version

File /opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/metadata/__init__.py:981, in distribution(distribution_name)
    975 def distribution(distribution_name):
    976     \"\"\"Get the ``Distribution`` instance for the named package.
    977 
    978     :param distribution_name: The name of the distribution package as a string.
    979     :return: A ``Distribution`` instance (or subclass thereof).
    980     \"\"\"
--> 981     return Distribution.from_name(distribution_name)

File /opt/homebrew/Cellar/python@3.11/3.11.1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/metadata/__init__.py:565, in Distribution.from_name(cls, name)
    563     return next(cls.discover(name=name))
    564 except StopIteration:
--> 565     raise PackageNotFoundError(name)

PackageNotFoundError: No package metadata was found for torch"
}

Here is the Python setup I had:

saint@MacBookPro LLMs-from-scratch % which python3
/opt/homebrew/bin/python3
saint@MacBookPro LLMs-from-scratch % python3 --version
Python 3.10.9

How is this showing version 3.10.9 when VS Code is using 3.11 from brew (per the file paths in the error messages)? This is how to print the version from a Python script, as per Printing Python version in output – Stack Overflow:

import sys
print("Python version")
print(sys.version)
print(sys.path)

Since I have brew installed, this comment at python – How do I install pip on macOS or OS X? – Stack Overflow might be what I need: “If you brew-install Python, but pip is still not in your path, you might need to re-link, like this brew unlink python && brew link python”.

saint@MacBookPro LLMs-from-scratch % brew unlink python
Unlinking /opt/homebrew/Cellar/python@3.11/3.11.1... 11 symlinks removed.
saint@MacBookPro LLMs-from-scratch % brew link python
Error: unknown or unsupported macOS version: :dunno

python3 --version still shows 3.10.9.

saint@MacBookPro LLMs-from-scratch % ls -l `which python3`
lrwxr-xr-x  1 saint  admin  40 Jan 27  2023 /opt/homebrew/bin/python3 -> ../Cellar/python@3.10/3.10.9/bin/python3

Is my brew too old?

saint@MacBookPro LLMs-from-scratch % brew --version
Homebrew 4.0.26
Homebrew/homebrew-core (git revision 5b93511fd6b; last commit 2023-03-19)
Homebrew/homebrew-cask (git revision dccc1df2ac; last commit 2023-03-19)

git – How do I update Homebrew? – Stack Overflow says to just run brew update. Look at this list! Scary how much of this outdated stuff is likely to be insecure.

saint@MacBookPro LLMs-from-scratch % brew update
==> Downloading https://ghcr.io/v2/homebrew/portable-ruby/portable-ruby/blobs/sha256:d9faa506c014dedc0b034a68103ba75c9a58242f4d6c67b6ca0f649c39602bcf
######################################################################################################################################################################################################################################### 100.0%
==> Pouring portable-ruby-3.3.7.arm64_big_sur.bottle.tar.gz
==> Homebrew collects anonymous analytics.
Read the analytics documentation (and how to opt-out) here:
  https://docs.brew.sh/Analytics
No analytics have been recorded yet (nor will be during this `brew` run).

==> homebrew/core is old and unneeded, untapping to save space...
Untapping homebrew/core...
Untapped 3 commands and 7398 formulae (7,130 files, 1GB).
==> homebrew/cask is old and unneeded, untapping to save space...
Untapping homebrew/cask...
Untapped 7333 casks (4,415 files, 487.2MB).
==> Downloading https://formulae.brew.sh/api/formula_tap_migrations.jws.json
Updated 4 taps (microsoft/git, homebrew/cask-versions, homebrew/core and homebrew/cask).
==> New Formulae

...<hundreds of lines omitted but included python entries below>

python-argcomplete
python-freethreading
python-gdbm@3.12
python-gdbm@3.13
python-matplotlib
python-packaging
python-setuptools
python-tk@3.12
python-tk@3.13
python@3.12
python@3.13
pyupgrade
...

==> Deleted Installed Formulae
icu4c ✘
==> Deleted Installed Casks
git-credential-manager-core ✘                                                                                            microsoft-openjdk11 ✘
Error: Unexpected method 'appcast' called on Cask adoptopenjdk16.
Follow the instructions here:
  https://github.com/Homebrew/homebrew-cask#reporting-bugs
==> Downloading https://formulae.brew.sh/api/cask_tap_migrations.jws.json
==> Outdated Formulae
aom                 fb303               fribidi             gnuplot             jasper              libheif             libtool             lua                 openblas            pstoedit            sqlite              xorgproto
arpack              fbthrift            gcc                 graphicsmagick      jbig2dec            libidn              libunistring        lz4                 openexr             pyqt@5              suite-sparse        xz
autoconf            fig2dev             gd                  harfbuzz            jpeg-turbo          libidn2             libvmaf             maven               openjdk             python@3.10         sundials            zstd
boost               fizz                gdbm                hdf5                jpeg-xl             liblqr              libx11              mpdecimal           openjpeg            python@3.11         tcl-tk
brotli              flac                gettext             highway             libaec              libomp              libxau              mpfr                openssl@1.1         python@3.9          texinfo
ca-certificates     fltk                ghostscript         hwloc               libavif             libpng              libxcb              mpg123              openssl@3           qscintilla2         wangle
cairo               fmt                 giflib              icu4c@76            libcerf             libraw              libxdmcp            netpbm              opus                qt@5                watchman
cmake               folly               git-gui             imagemagick         libde265            libsndfile          libxext             ninja               pango               readline            webp
double-conversion   fontconfig          glib                imath               libevent            libsodium           libxrender          octave              pcre2               shared-mime-info    wget
edencommon          freetype            gmp                 isl                 libffi              libtiff             little-cms2         open-mpi            pixman              snappy              x265
==> Outdated Casks
git-credential-manager                                                          microsoft-openjdk                                                               microsoft-openjdk@11

You have 113 outdated formulae and 3 outdated casks installed.
You can upgrade them with brew upgrade
or list them with brew outdated.
Error: Unexpected method 'appcast' called on Cask adoptopenjdk16.
Follow the instructions here:
  https://github.com/Homebrew/homebrew-cask#reporting-bugs
==> Migrating cask git-credential-manager-core to git-credential-manager
Error: inreplace failed
/opt/homebrew/Caskroom/git-credential-manager/.metadata/2.1.2/20230703191748.675/Casks/git-credential-manager.rb:
  expected replacement of /\A\s*cask\s+"git\-credential\-manager\-core"/ with "cask \"git-credential-manager\""

python3 --version is still 3.10.9 after this. I tried running pip but zsh said command not found. Unfortunately, linking or unlinking either python or python3 fails with the errors below (despite ls -l `which python3` showing the same path as before).

saint@MacBookPro LLMs-from-scratch % brew unlink python3
Error: Unexpected method 'appcast' called on Cask adoptopenjdk16.
Follow the instructions here:
  https://github.com/Homebrew/homebrew-cask#reporting-bugs
Error: No such keg: /opt/homebrew/Cellar/python3

I decided to install python3 again.

saint@MacBookPro LLMs-from-scratch % brew install python3
==> Downloading https://formulae.brew.sh/api/formula.jws.json
==> Downloading https://formulae.brew.sh/api/cask.jws.json
==> Downloading https://ghcr.io/v2/homebrew/core/python/3.13/manifests/3.13.1
######################################################################################################################################################################################################################################### 100.0%
==> Fetching dependencies for python@3.13: mpdecimal, ca-certificates, openssl@3, readline, sqlite and xz
==> Downloading https://ghcr.io/v2/homebrew/core/mpdecimal/manifests/4.0.0-1
######################################################################################################################################################################################################################################### 100.0%
==> Fetching mpdecimal
==> Downloading https://ghcr.io/v2/homebrew/core/mpdecimal/blobs/sha256:0f5f269bed0e6be2de3edfc4b52867e656f993e5bcff40717f26ee94dd0d2211
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/ca-certificates/manifests/2024-12-31
######################################################################################################################################################################################################################################### 100.0%
==> Fetching ca-certificates
...
<lots of omitted lines>
...
==> Fetching harfbuzz
==> Downloading https://ghcr.io/v2/homebrew/core/harfbuzz/blobs/sha256:2f892566c02b3c8c61aed6f7867b4405e5c814df8500ef4bc4ca91a9e40205a9
######################################################################################################################################################################################################################################### 100.0%
==> Fetching openjdk
==> Downloading https://ghcr.io/v2/homebrew/core/openjdk/blobs/sha256:1285eadf2b5998cda49e4470ee3875e855b0be199765401ad77dc38aea573f49
######################################################################################################################################################################################################################################### 100.0%
Error: can't modify frozen String: "The bottle needs the Xcode Command Line Tools to be installed at /Library/Developer/CommandLineTools.\nDevelopment tools provided by Xcode.app are not sufficient.\n\nYou can install the Xcode Command Line Tools, if desired, with:\n    xcode-select --install\n"

This was the new state of affairs after that failed command:

saint@MacBookPro LLMs-from-scratch % python3 --version
Python 3.13.1
saint@MacBookPro LLMs-from-scratch % which python3
/opt/homebrew/bin/python3
saint@MacBookPro LLMs-from-scratch % ls -l `which python3`
lrwxr-xr-x  1 saint  admin  40 Feb  4 17:02 /opt/homebrew/bin/python3 -> ../Cellar/python@3.13/3.13.1/bin/python3
saint@MacBookPro LLMs-from-scratch % which pip
pip not found

Ah, all that agonizing and look at this – did I need to be using pip3 all this time?

saint@MacBookPro LLMs-from-scratch % which pip3
/opt/homebrew/bin/pip3
saint@MacBookPro LLMs-from-scratch % ls -l `which pip3`
lrwxr-xr-x  1 saint  admin  37 Feb  4 17:02 /opt/homebrew/bin/pip3 -> ../Cellar/python@3.13/3.13.1/bin/pip3
saint@MacBookPro LLMs-from-scratch % 

Interestingly, I still can’t install pytorch using pip3:

saint@MacBookPro LLMs-from-scratch % pip3 install pytorch

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz
    
    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with
    
    brew install pipx
    
    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.
    
    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.
    
    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

I then tried installing pytorch via brew as suggested above:

saint@MacBookPro LLMs-from-scratch % brew install pytorch
==> Downloading https://formulae.brew.sh/api/formula.jws.json
==> Downloading https://formulae.brew.sh/api/cask.jws.json
==> Downloading https://ghcr.io/v2/homebrew/core/pytorch/manifests/2.5.1_4
######################################################################################################################################################################################################################################### 100.0%
==> Fetching dependencies for pytorch: abseil, libuv, libyaml, gmp, isl, mpfr, lz4, zstd, make, gcc, openblas, numpy, protobuf, pybind11, sleef and libomp
==> Downloading https://ghcr.io/v2/homebrew/core/abseil/manifests/20240722.1
######################################################################################################################################################################################################################################### 100.0%
==> Fetching abseil
==> Downloading https://ghcr.io/v2/homebrew/core/abseil/blobs/sha256:be7b3373c56a0e1ee2c0c2e85ee4d17e2105ac1d9d6d63011da28d636fec7424
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/libuv/manifests/1.50.0
######################################################################################################################################################################################################################################### 100.0%
==> Fetching libuv
==> Downloading https://ghcr.io/v2/homebrew/core/libuv/blobs/sha256:9a70ed97116c4960f0484159c07145df8e768b1a62be68c071070869ba4c3644
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/libyaml/manifests/0.2.5
######################################################################################################################################################################################################################################### 100.0%
==> Fetching libyaml
==> Downloading https://ghcr.io/v2/homebrew/core/libyaml/blobs/sha256:0ec9bf8082245c008803b42dcae3e6a0c8cd7a67aed589d9b6482b115c0a543b
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/gmp/manifests/6.3.0
######################################################################################################################################################################################################################################### 100.0%
==> Fetching gmp
==> Downloading https://ghcr.io/v2/homebrew/core/gmp/blobs/sha256:6683d73d6677d28e1e8d1b92d6ebfbc068c1d33e19b79114a22a648a99ba5991
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/isl/manifests/0.27
######################################################################################################################################################################################################################################### 100.0%
==> Fetching isl
==> Downloading https://ghcr.io/v2/homebrew/core/isl/blobs/sha256:de143fddb0e20b6b73016ead1e625ebd429db53918200d093e4da98f1e758889
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/mpfr/manifests/4.2.1
######################################################################################################################################################################################################################################### 100.0%
==> Fetching mpfr
==> Downloading https://ghcr.io/v2/homebrew/core/mpfr/blobs/sha256:51f0ca19e897731b928742401c9c8d1d7d93c3c275aa8a66a77b9ac01d0c223c
######################################################################################################################################################################################################################################### 100.0%
==> Downloading https://ghcr.io/v2/homebrew/core/lz4/manifests/1.10.0-1
Already downloaded: /Users/saint/Library/Caches/Homebrew/downloads/8e11e90eb21a06e0f199af9d80e011e3693c77dd353b2477579d95c8471a5802--lz4-1.10.0-1.bottle_manifest.json
==> Fetching lz4
==> Downloading https://ghcr.io/v2/homebrew/core/lz4/blobs/sha256:5bd143b7b784989e549637ea4e484af85ba481e640dde69bc35f3843ae25abc6
Already downloaded: /Users/saint/Library/Caches/Homebrew/downloads/8af6cbc3dc870dba18d9a1b3f311f2d43b56a020a1e565289122cfda703ab791--lz4--1.10.0.arm64_sequoia.bottle.1.tar.gz
==> Downloading https://ghcr.io/v2/homebrew/core/zstd/manifests/1.5.6
Already downloaded: /Users/saint/Library/Caches/Homebrew/downloads/29403e0df5404d8aeca0e750ac135ec9ef44fc5eeb6df69170ed602acabf0ffb--zstd-1.5.6.bottle_manifest.json
==> Fetching zstd
==> Downloading https://ghcr.io/v2/homebrew/core/zstd/blobs/sha256:487f35700f563b07036cfd429e4e7a4e37f13e22578e688cbfee2fa9484aaf9d
Already downloaded: /Users/saint/Library/Caches/Homebrew/downloads/bcdf6b56ea7b8b23105f1518ddd0830ac5a56c333b0274959c10084ea3a31346--zstd--1.5.6.arm64_sequoia.bottle.tar.gz
==> Downloading https://ghcr.io/v2/homebrew/core/make/manifests/4.4.1-1
######################################################################################################################################################################################################################################### 100.0%
==> Fetching make
==> Downloading https://ghcr.io/v2/homebrew/core/make/blobs/sha256:f361639a5ec1a9355e12f985c511dd6631b6790452a52057032a3a07a690ca4e
######################################################################################################################################################################################################################################### 100.0%
Error: can't modify frozen String: "The bottle needs the Xcode Command Line Tools to be installed at /Library/Developer/CommandLineTools.\nDevelopment tools provided by Xcode.app are not sufficient.\n\nYou can install the Xcode Command Line Tools, if desired, with:\n    xcode-select --install\n"
saint@MacBookPro LLMs-from-scratch % xcode-select --install
xcode-select: note: install requested for command line developer tools

I installed the command line developer tools when prompted.

Trying to run the first cell in VS Code with the updated setup now gave this error:

{
	"name": "",
	"message": "",
	"stack": "Running cells with 'Python 3.13.1' requires the ipykernel package.
Run the following command to install 'ipykernel' into the Python environment. 
Command: '/opt/homebrew/bin/python3 -m pip install ipykernel -U --user --force-reinstall'"
}
saint@MacBookPro LLMs-from-scratch % /opt/homebrew/bin/python3 -m pip install ipykernel -U --user --force-reinstall

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz
    
    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with
    
    brew install pipx
    
    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.
    
    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.
    
    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
saint@MacBookPro LLMs-from-scratch % brew install ipykernel
==> Downloading https://formulae.brew.sh/api/formula.jws.json
==> Downloading https://formulae.brew.sh/api/cask.jws.json
Warning: No available formula with the name "ipykernel".
==> Searching for similarly named formulae and casks...
Error: No formulae or casks found for ipykernel.

I overrode the error using the --break-system-packages flag and the VS Code notebook now runs.

saint@MacBookPro LLMs-from-scratch % python3 -m pip install ipykernel -U --user --force-reinstall --break-system-package
Collecting ipykernel
  Downloading ipykernel-6.29.5-py3-none-any.whl.metadata (6.3 kB)
Collecting appnope (from ipykernel)
  Downloading appnope-0.1.4-py2.py3-none-any.whl.metadata (908 bytes)
Collecting comm>=0.1.1 (from ipykernel)
  Downloading comm-0.2.2-py3-none-any.whl.metadata (3.7 kB)
Collecting debugpy>=1.6.5 (from ipykernel)
  Downloading debugpy-1.8.12-cp313-cp313-macosx_14_0_universal2.whl.metadata (1.3 kB)
Collecting ipython>=7.23.1 (from ipykernel)
  Downloading ipython-8.32.0-py3-none-any.whl.metadata (5.0 kB)
Collecting jupyter-client>=6.1.12 (from ipykernel)
  Downloading jupyter_client-8.6.3-py3-none-any.whl.metadata (8.3 kB)
Collecting jupyter-core!=5.0.*,>=4.12 (from ipykernel)
  Downloading jupyter_core-5.7.2-py3-none-any.whl.metadata (3.4 kB)
Collecting matplotlib-inline>=0.1 (from ipykernel)
  Downloading matplotlib_inline-0.1.7-py3-none-any.whl.metadata (3.9 kB)
Collecting nest-asyncio (from ipykernel)
  Downloading nest_asyncio-1.6.0-py3-none-any.whl.metadata (2.8 kB)
Collecting packaging (from ipykernel)
  Downloading packaging-24.2-py3-none-any.whl.metadata (3.2 kB)
Collecting psutil (from ipykernel)
  Downloading psutil-6.1.1-cp36-abi3-macosx_11_0_arm64.whl.metadata (22 kB)
Collecting pyzmq>=24 (from ipykernel)
  Downloading pyzmq-26.2.1-cp313-cp313-macosx_10_15_universal2.whl.metadata (6.2 kB)
Collecting tornado>=6.1 (from ipykernel)
  Downloading tornado-6.4.2-cp38-abi3-macosx_10_9_universal2.whl.metadata (2.5 kB)
Collecting traitlets>=5.4.0 (from ipykernel)
  Downloading traitlets-5.14.3-py3-none-any.whl.metadata (10 kB)
Collecting decorator (from ipython>=7.23.1->ipykernel)
  Downloading decorator-5.1.1-py3-none-any.whl.metadata (4.0 kB)
Collecting jedi>=0.16 (from ipython>=7.23.1->ipykernel)
  Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Collecting pexpect>4.3 (from ipython>=7.23.1->ipykernel)
  Downloading pexpect-4.9.0-py2.py3-none-any.whl.metadata (2.5 kB)
Collecting prompt_toolkit<3.1.0,>=3.0.41 (from ipython>=7.23.1->ipykernel)
  Downloading prompt_toolkit-3.0.50-py3-none-any.whl.metadata (6.6 kB)
Collecting pygments>=2.4.0 (from ipython>=7.23.1->ipykernel)
  Downloading pygments-2.19.1-py3-none-any.whl.metadata (2.5 kB)
Collecting stack_data (from ipython>=7.23.1->ipykernel)
  Downloading stack_data-0.6.3-py3-none-any.whl.metadata (18 kB)
Collecting python-dateutil>=2.8.2 (from jupyter-client>=6.1.12->ipykernel)
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl.metadata (8.4 kB)
Collecting platformdirs>=2.5 (from jupyter-core!=5.0.*,>=4.12->ipykernel)
  Downloading platformdirs-4.3.6-py3-none-any.whl.metadata (11 kB)
Collecting parso<0.9.0,>=0.8.4 (from jedi>=0.16->ipython>=7.23.1->ipykernel)
  Downloading parso-0.8.4-py2.py3-none-any.whl.metadata (7.7 kB)
Collecting ptyprocess>=0.5 (from pexpect>4.3->ipython>=7.23.1->ipykernel)
  Downloading ptyprocess-0.7.0-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting wcwidth (from prompt_toolkit<3.1.0,>=3.0.41->ipython>=7.23.1->ipykernel)
  Downloading wcwidth-0.2.13-py2.py3-none-any.whl.metadata (14 kB)
Collecting six>=1.5 (from python-dateutil>=2.8.2->jupyter-client>=6.1.12->ipykernel)
  Downloading six-1.17.0-py2.py3-none-any.whl.metadata (1.7 kB)
Collecting executing>=1.2.0 (from stack_data->ipython>=7.23.1->ipykernel)
  Downloading executing-2.2.0-py2.py3-none-any.whl.metadata (8.9 kB)
Collecting asttokens>=2.1.0 (from stack_data->ipython>=7.23.1->ipykernel)
  Downloading asttokens-3.0.0-py3-none-any.whl.metadata (4.7 kB)
Collecting pure-eval (from stack_data->ipython>=7.23.1->ipykernel)
  Downloading pure_eval-0.2.3-py3-none-any.whl.metadata (6.3 kB)
Downloading ipykernel-6.29.5-py3-none-any.whl (117 kB)
Downloading comm-0.2.2-py3-none-any.whl (7.2 kB)
Downloading debugpy-1.8.12-cp313-cp313-macosx_14_0_universal2.whl (2.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 36.8 MB/s eta 0:00:00
Downloading ipython-8.32.0-py3-none-any.whl (825 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 825.5/825.5 kB 54.9 MB/s eta 0:00:00
Downloading jupyter_client-8.6.3-py3-none-any.whl (106 kB)
Downloading jupyter_core-5.7.2-py3-none-any.whl (28 kB)
Downloading matplotlib_inline-0.1.7-py3-none-any.whl (9.9 kB)
Downloading pyzmq-26.2.1-cp313-cp313-macosx_10_15_universal2.whl (1.3 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 62.7 MB/s eta 0:00:00
Downloading tornado-6.4.2-cp38-abi3-macosx_10_9_universal2.whl (436 kB)
Downloading traitlets-5.14.3-py3-none-any.whl (85 kB)
Downloading appnope-0.1.4-py2.py3-none-any.whl (4.3 kB)
Downloading nest_asyncio-1.6.0-py3-none-any.whl (5.2 kB)
Downloading packaging-24.2-py3-none-any.whl (65 kB)
Downloading psutil-6.1.1-cp36-abi3-macosx_11_0_arm64.whl (248 kB)
Downloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 79.1 MB/s eta 0:00:00
Downloading pexpect-4.9.0-py2.py3-none-any.whl (63 kB)
Downloading platformdirs-4.3.6-py3-none-any.whl (18 kB)
Downloading prompt_toolkit-3.0.50-py3-none-any.whl (387 kB)
Downloading pygments-2.19.1-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 58.1 MB/s eta 0:00:00
Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
Downloading decorator-5.1.1-py3-none-any.whl (9.1 kB)
Downloading stack_data-0.6.3-py3-none-any.whl (24 kB)
Downloading asttokens-3.0.0-py3-none-any.whl (26 kB)
Downloading executing-2.2.0-py2.py3-none-any.whl (26 kB)
Downloading parso-0.8.4-py2.py3-none-any.whl (103 kB)
Downloading ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Downloading six-1.17.0-py2.py3-none-any.whl (11 kB)
Downloading pure_eval-0.2.3-py3-none-any.whl (11 kB)
Downloading wcwidth-0.2.13-py2.py3-none-any.whl (34 kB)
Installing collected packages: wcwidth, pure-eval, ptyprocess, traitlets, tornado, six, pyzmq, pygments, psutil, prompt_toolkit, platformdirs, pexpect, parso, packaging, nest-asyncio, executing, decorator, debugpy, asttokens, appnope, stack_data, python-dateutil, matplotlib-inline, jupyter-core, jedi, comm, jupyter-client, ipython, ipykernel
  WARNING: The script pygmentize is installed in '/Users/saint/Library/Python/3.13/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The script debugpy is installed in '/Users/saint/Library/Python/3.13/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter, jupyter-migrate and jupyter-troubleshoot are installed in '/Users/saint/Library/Python/3.13/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts jupyter-kernel, jupyter-kernelspec and jupyter-run are installed in '/Users/saint/Library/Python/3.13/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
  WARNING: The scripts ipython and ipython3 are installed in '/Users/saint/Library/Python/3.13/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed appnope-0.1.4 asttokens-3.0.0 comm-0.2.2 debugpy-1.8.12 decorator-5.1.1 executing-2.2.0 ipykernel-6.29.5 ipython-8.32.0 jedi-0.19.2 jupyter-client-8.6.3 jupyter-core-5.7.2 matplotlib-inline-0.1.7 nest-asyncio-1.6.0 packaging-24.2 parso-0.8.4 pexpect-4.9.0 platformdirs-4.3.6 prompt_toolkit-3.0.50 psutil-6.1.1 ptyprocess-0.7.0 pure-eval-0.2.3 pygments-2.19.1 python-dateutil-2.9.0.post0 pyzmq-26.2.1 six-1.17.0 stack_data-0.6.3 tornado-6.4.2 traitlets-5.14.3 wcwidth-0.2.13

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
saint@MacBookPro LLMs-from-scratch % 

The first cell now fails because the torch package cannot be found:

{
	"name": "PackageNotFoundError",
	"message": "No package metadata was found for torch",
	"stack": "---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
File /opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/metadata/__init__.py:407, in Distribution.from_name(cls, name)
    406 try:
--> 407     return next(iter(cls.discover(name=name)))
    408 except StopIteration:

StopIteration: 

During handling of the above exception, another exception occurred:

PackageNotFoundError                      Traceback (most recent call last)
Cell In[1], line 3
      1 from importlib.metadata import version
----> 3 print(\"torch version:\", version(\"torch\"))
      4 print(\"tiktoken version:\", version(\"tiktoken\"))

File /opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/metadata/__init__.py:987, in version(distribution_name)
    980 def version(distribution_name: str) -> str:
    981     \"\"\"Get the version string for the named package.
    982 
    983     :param distribution_name: The name of the distribution package to query.
    984     :return: The version string for the package as defined in the package's
    985         \"Version\" metadata key.
    986     \"\"\"
--> 987     return distribution(distribution_name).version

File /opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/metadata/__init__.py:960, in distribution(distribution_name)
    954 def distribution(distribution_name: str) -> Distribution:
    955     \"\"\"Get the ``Distribution`` instance for the named package.
    956 
    957     :param distribution_name: The name of the distribution package as a string.
    958     :return: A ``Distribution`` instance (or subclass thereof).
    959     \"\"\"
--> 960     return Distribution.from_name(distribution_name)

File /opt/homebrew/Cellar/python@3.13/3.13.1/Frameworks/Python.framework/Versions/3.13/lib/python3.13/importlib/metadata/__init__.py:409, in Distribution.from_name(cls, name)
    407     return next(iter(cls.discover(name=name)))
    408 except StopIteration:
--> 409     raise PackageNotFoundError(name)

PackageNotFoundError: No package metadata was found for torch"
}

These are the commands I tried while installing pytorch before finding the one that worked: pip3 install torch --break-system-packages.

saint@MacBookPro LLMs-from-scratch % brew install torch
==> Downloading https://formulae.brew.sh/api/formula.jws.json
==> Downloading https://formulae.brew.sh/api/cask.jws.json
Warning: No available formula with the name "torch". Did you mean tor, pytorch or orc?
==> Searching for similarly named formulae and casks...
==> Formulae
pytorch ✔                                                    torchvision                                                  tor                                                          orc

To install pytorch ✔, run:
  brew install pytorch ✔
saint@MacBookPro LLMs-from-scratch % brew install pytorch
Warning: pytorch 2.5.1_4 is already installed and up-to-date.
To reinstall 2.5.1_4, run:
  brew reinstall pytorch
saint@MacBookPro LLMs-from-scratch % pip3 install torch

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz
    
    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with
    
    brew install pipx
    
    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.
    
    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.
    
    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
saint@MacBookPro LLMs-from-scratch % 
saint@MacBookPro LLMs-from-scratch % pip3 install torch --break-system-packages
Collecting torch
  Downloading torch-2.6.0-cp313-none-macosx_11_0_arm64.whl.metadata (28 kB)
Collecting filelock (from torch)
  Downloading filelock-3.17.0-py3-none-any.whl.metadata (2.9 kB)
Collecting typing-extensions>=4.10.0 (from torch)
  Downloading typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting networkx (from torch)
  Downloading networkx-3.4.2-py3-none-any.whl.metadata (6.3 kB)
Collecting jinja2 (from torch)
  Downloading jinja2-3.1.5-py3-none-any.whl.metadata (2.6 kB)
Collecting fsspec (from torch)
  Downloading fsspec-2025.2.0-py3-none-any.whl.metadata (11 kB)
Collecting setuptools (from torch)
  Downloading setuptools-75.8.0-py3-none-any.whl.metadata (6.7 kB)
Collecting sympy==1.13.1 (from torch)
  Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Collecting mpmath<1.4,>=1.1.0 (from sympy==1.13.1->torch)
  Downloading mpmath-1.3.0-py3-none-any.whl.metadata (8.6 kB)
Collecting MarkupSafe>=2.0 (from jinja2->torch)
  Downloading MarkupSafe-3.0.2-cp313-cp313-macosx_11_0_arm64.whl.metadata (4.0 kB)
Downloading torch-2.6.0-cp313-none-macosx_11_0_arm64.whl (66.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.5/66.5 MB 74.3 MB/s eta 0:00:00
Downloading sympy-1.13.1-py3-none-any.whl (6.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.2/6.2 MB 75.0 MB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading filelock-3.17.0-py3-none-any.whl (16 kB)
Downloading fsspec-2025.2.0-py3-none-any.whl (184 kB)
Downloading jinja2-3.1.5-py3-none-any.whl (134 kB)
Downloading networkx-3.4.2-py3-none-any.whl (1.7 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.7/1.7 MB 63.7 MB/s eta 0:00:00
Downloading setuptools-75.8.0-py3-none-any.whl (1.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 37.0 MB/s eta 0:00:00
Downloading MarkupSafe-3.0.2-cp313-cp313-macosx_11_0_arm64.whl (12 kB)
Downloading mpmath-1.3.0-py3-none-any.whl (536 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 27.9 MB/s eta 0:00:00
Installing collected packages: mpmath, typing-extensions, sympy, setuptools, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Successfully installed MarkupSafe-3.0.2 filelock-3.17.0 fsspec-2025.2.0 jinja2-3.1.5 mpmath-1.3.0 networkx-3.4.2 setuptools-75.8.0 sympy-1.13.1 torch-2.6.0 typing-extensions-4.12.2

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
saint@MacBookPro LLMs-from-scratch % 

The pytorch import finally works! The next error is also a PackageNotFoundError: "No package metadata was found for tiktoken" which I addressed with the same installation steps:

saint@MacBookPro LLMs-from-scratch % pip3 install tiktoken                     

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
    xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a Python library that isn't in Homebrew,
    use a virtual environment:
    
    python3 -m venv path/to/venv
    source path/to/venv/bin/activate
    python3 -m pip install xyz
    
    If you wish to install a Python application that isn't in Homebrew,
    it may be easiest to use 'pipx install xyz', which will manage a
    virtual environment for you. You can install pipx with
    
    brew install pipx
    
    You may restore the old behavior of pip by passing
    the '--break-system-packages' flag to pip, or by adding
    'break-system-packages = true' to your pip.conf file. The latter
    will permanently disable this error.
    
    If you disable this error, we STRONGLY recommend that you additionally
    pass the '--user' flag to pip, or set 'user = true' in your pip.conf
    file. Failure to do this can result in a broken Homebrew installation.
    
    Read more about this behavior here: <https://peps.python.org/pep-0668/>

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
saint@MacBookPro LLMs-from-scratch % 
saint@MacBookPro LLMs-from-scratch % pip3 install tiktoken --break-system-packages
Collecting tiktoken
  Downloading tiktoken-0.8.0-cp313-cp313-macosx_11_0_arm64.whl.metadata (6.6 kB)
Collecting regex>=2022.1.18 (from tiktoken)
  Downloading regex-2024.11.6-cp313-cp313-macosx_11_0_arm64.whl.metadata (40 kB)
Collecting requests>=2.26.0 (from tiktoken)
  Downloading requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting charset-normalizer<4,>=2 (from requests>=2.26.0->tiktoken)
  Downloading charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl.metadata (35 kB)
Collecting idna<4,>=2.5 (from requests>=2.26.0->tiktoken)
  Downloading idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.26.0->tiktoken)
  Downloading urllib3-2.3.0-py3-none-any.whl.metadata (6.5 kB)
Collecting certifi>=2017.4.17 (from requests>=2.26.0->tiktoken)
  Downloading certifi-2025.1.31-py3-none-any.whl.metadata (2.5 kB)
Downloading tiktoken-0.8.0-cp313-cp313-macosx_11_0_arm64.whl (982 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 982.8/982.8 kB 24.2 MB/s eta 0:00:00
Downloading regex-2024.11.6-cp313-cp313-macosx_11_0_arm64.whl (284 kB)
Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Downloading certifi-2025.1.31-py3-none-any.whl (166 kB)
Downloading charset_normalizer-3.4.1-cp313-cp313-macosx_10_13_universal2.whl (195 kB)
Downloading idna-3.10-py3-none-any.whl (70 kB)
Downloading urllib3-2.3.0-py3-none-any.whl (128 kB)
Installing collected packages: urllib3, regex, idna, charset-normalizer, certifi, requests, tiktoken
Successfully installed certifi-2025.1.31 charset-normalizer-3.4.1 idna-3.10 regex-2024.11.6 requests-2.32.3 tiktoken-0.8.0 urllib3-2.3.0

[notice] A new release of pip is available: 24.3.1 -> 25.0
[notice] To update, run: python3.13 -m pip install --upgrade pip
saint@MacBookPro LLMs-from-scratch % 

Finally, my machine is in a state that can run the code in the Jupyter notebook! This is such a brittle environment. I need to switch to a managed environment to avoid this type of mess.
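
For next time, here is a minimal sketch of the per-project setup suggested by the pip error message above (the package list is just what this chapter’s notebook needed, and the interpreter/kernel step is from memory, so treat it as a starting point rather than a definitive recipe):

cd LLMs-from-scratch
python3 -m venv .venv                               # project-local environment, isolated from brew's Python
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install torch tiktoken ipykernel     # or pip install -r requirements.txt if the repo provides one
# Then point VS Code at .venv via "Python: Select Interpreter" and pick it as the notebook kernel.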


Learning about Large Language Models

I am reading Sebastian Raschka’s Build a Large Language Model from Scratch book with a few colleagues at work. This video is mentioned in chapter 1 of the book.

Developing an LLM: Building, Training, Finetuning

We have quizzes to see how well we understand the material. These are the questions & notes I have jotted down from my reading so far.

Chapter 1

  1. What is an LLM? p2
  2. What are 2 dimensions that “large” refers to?
  3. Which architecture do LLMs utilize? p3
  4. Why are LLMs often referred to as generative AI/genAI?
  5. What is the relationship between AI, ML, deep learning, LLMs, and genAI?
  6. Give a difference between traditional ML and deep learning.
  7. What are other approaches to AI apart from ML and deep learning? p4
  8. List 5 applications of LLMs.
  9. What are 3 advantages of custom built LLMs? p5
  10. What are the 2 general steps in creating an LLM? p6
  11. What is a base/foundation model? Give an example. p7
  12. What are the few-shot capabilities of a base model?
  13. What are 2 categories of fine-tuning LLMs?
  14. Which architecture did Attention Is All You Need introduce?
  15. Describe the transformer architecture.

Part II

  1. What are the 2 submodules of a transformer? p7
  2. What is the purpose of the self-attention mechanism?
  3. What is BERT? What do the initials stand for? p8
  4. What does GPT stand for?
  5. What is the difference between BERT and GPT? Which submodule of the original transformer does each focus on?
  6. List a real-world application of BERT. p9
  7. What is the difference between zero-shot and few-shot capabilities?
  8. What are applications of transformers (other than LLMs)? p10
  9. Give 2 examples of architectures (other than transformers) that LLMs can be based on.
  10. See Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research for a publicly available training dataset (may contain copyrighted works) p11
  11. Why are models like GPT3 called base or foundation models?
  12. What is an estimate of the cost of training GPT3? See https://www.reddit.com/r/MachineLearning/comments/h0jwoz/d_gpt3_the_4600000_language_model/
  13. What type of learning is next-word prediction? p12
  14. What is an autoregressive model? Why is GPT one?
  15. How many transformer layers and parameters does GPT3 have? p13
  16. When was GPT-3 introduced?
  17. Which task was the original transformer model explicitly designed for? p14
  18. What is emergent behavior?
  19. What are the 3 main stages of coding an LLM in this book?
  20. What is the key idea of the transformer architecture? p15

GPT References

  1. Improving Language Understanding by Generative Pre-Training p12
  2. Training language models to follow instructions with human feedback

Chapter 2

  1. What is embedding? p18 (see the sketch after this list)
  2. What is retrieval-augmented generation? p19
  3. Which embeddings are popular for RAG?
  4. What is Word2Vec? What is the main idea behind it?
  5. What is an advantage of high dimensionality in word embeddings? A disadvantage?
  6. What is an advantage of optimizing embeddings as part of LLM training instead of using Word2Vec?
  7. What is the embedding size of the smaller GPT-2 models? The largest GPT-3 models?
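
As a tiny illustration of what an embedding layer is (my own PyTorch sketch with toy sizes, not the book’s code): an embedding is just a trainable lookup table that maps each token ID to a dense vector.

import torch

torch.manual_seed(123)
vocab_size, embed_dim = 6, 3                  # toy sizes; the smallest GPT-2 model uses 768 dimensions
embedding = torch.nn.Embedding(vocab_size, embed_dim)
token_ids = torch.tensor([2, 3, 5, 1])
print(embedding(token_ids).shape)             # torch.Size([4, 3]) -- one trainable vector per token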

Chapter 3

  1. Why can’t we simply translate a text from one language to another word by word? p52
  2. How can this challenge be addressed using a deep neural network?
  3. What is a recurrent neural network?
  4. What was the most popular encoder-decoder architecture before the advent of transformers?
  5. Explain how an encoder-decoder RNN works. p53
  6. What is the big limitation of encoder-decoder RNNs?
  7. What is the Bahdanau attention mechanism? p54
  8. What is self-attention? p55
  9. What serves as the cornerstone of every LLM based on the transformer architecture?
  10. What does the self in self-attention refer to? p56
  11. What is a context vector? p57
  12. Why are context vectors essential in LLMs?
  13. Why is the dot product a measure of similarity? p59
  14. Give 2 reasons why the attention scores are normalized.
  15. Why is it advisable to use the softmax function for normalization in practice? p60
  16. Why is it advisable to use the PyTorch implementation of softmax in particular (instead of your own)?
  17. What is the difference between attention scores and attention weights? p62
  18. How are context vectors computed from attention weights? p63
  19. Which are the 3 weight matrices in self-attention with trainable weights? p65
  20. How are these matrices initialized? How are they used?
  21. What is the difference between weight parameters (matrices) and attention weights?
  22. How are the attention scores computed in the self-attention with trainable weights technique?
  23. What about the attention weights? p68
  24. What is scaled dot-product attention? p69 (see the sketch after this list)
  25. Why do we scale by the square root of the embedding dimension?
  26. How does the softmax function behave as the dot products increase?
  27. How is the context vector computed?
  28. What is nn.Module? p71
  29. What is a significant advantage of using nn.Linear instead of nn.Parameter(torch.rand(…))?
  30. What is causal attention?
  31. How can the tril function be used to create a mask where the values above the diagonal are 0?
  32. Explain a more efficient masking trick for computing the masked attention weights.
  33. What is dropout in deep learning?
  34. Which are the two specific times when dropout is typically applied in the transformer architecture?
  35. Why does nn.Dropout scale the remaining values? p79-80
  36. What are some advantages of using register_buffer? p81
  37. What is multi-head attention? p82
  38. How can multiple heads be processed in parallel? p85


Categories: DevOps

Backing up my 1TB OneDrive Account

My OneDrive storage reached 99% recently. I wanted to delete some stuff from OneDrive but was uncomfortable doing so without a local backup. I had bought a Seagate BarraCuda 8TB 5400 RPM 3.5″ Hard Drive a few months ago (and appreciated that the Seagate BarraCuda SATA (2-4TB, 6TB, 8TB) Product Manual is available online). My desktop is the Skytech Blaze High-Performance PC Desktop INTEL Core i7 11700F 2.5 GHz, RTX 3060 Ti, 1TB NVME SSD, 16G DDR4 3200, 600W GOLD PSU, 240mm AIO, AC Wi-Fi, Windows 11 Home 64-bit – Newegg.com. I was disappointed to discover that its case didn’t have any connectors to power a hard disk. There was actually nowhere to mount my BarraCuda hard drive in that case either! It hadn’t crossed my mind that a case could be built without support for such a rudimentary feature – a hard disk! Lesson learned: any desktop computer I order must have room for at least 1 hard disk in addition to the SSD.

These shortcomings aside, the ASRock B550M-C motherboard in that desktop had more than enough SATA ports. I mounted the hard disk in my HP Z4 desktop (now there’s a well-designed case) and plugged one of the HP case power connectors into it. I then connected the SATA cable from the hard disk to my Skytech motherboard. Talk about a cowboy setup. Here’s the Disk Management view after starting Windows.

I used the “New Simple Volume” command to do an NTFS quick format on the 7630868 MB volume.

With my 8TB hard drive set up, I searched for how to move onedrive folder to different drive. This Microsoft Support result had directions on how to Change the location of your OneDrive folder. I set up OneDrive to download everything and to always keep the files locally. I disabled sleep mode on my desktop to let OneDrive download continuously. After downloading 896.5GB, I got this error.

Syncing did not resume until later the next day (it had been over 24 hours by the time I tried again). It had never crossed my mind that there are such limits for these services – I just expect them to be available whenever I need them, which is another reason to have a local copy of everything I have in the cloud. In the process of setting all this up, I realized that 200GB of the videos I had on OneDrive didn’t really need to be there, so I was able to free up enough space to meet my needs for the foreseeable future.


Categories: Web Development

Synchronizing 2 HTML5 videos

I have seen many platforms playing multiple videos where the presenter and their screen are separate streams but they are kept in sync. Native HTML5 video support was relatively new when I last worked on web development so I decided to experiment with multiple videos in an HTML5 document. Here is the basic HTML page:

<!DOCTYPE html>
<html lang="en">
<title>Two Video Synchronization Demo</title>
<body>
    <!-- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video -->
    <video controls id="video1" src="flower.mp4"></video>
    <video controls id="video2" src="nature.mp4"></video>
</body>
</html>

I ran this through the W3C Markup Validation Service and this snippet passed the check. My initial attempt closed the video tag using the <video ... /> style. The validator complained that “Self-closing syntax (/>) used on a non-void HTML element.” A search engine led me to Mozilla’s Void element page.

A void element is an element in HTML that cannot have any child nodes (i.e., nested elements or text nodes). Void elements only have a start tag; end tags must not be specified for void elements.

This means that non-void elements should have a closing tag. It’s also strange to me that the controls attribute is required to show controls but I guess it makes sense to let that be, well, controllable.

To synchronize the videos, we need to ensure that playing one video results in the other playing as well. Likewise for pausing, seeking, and changing the playback speed. My first attempt was to add the JavaScript to the head of the HTML document. Since I’m not using jQuery or any other libraries, there was no $.ready() to defer my code, and I ran into exceptions because the DOM wasn’t ready when my script ran. Vanilla JavaScript equivalent of jQuery’s $.ready() – how to call a function when the page/DOM is ready for it [duplicate] suggested putting the script after all the HTML elements. All this was once second nature to me back in the IE6 days, but that suggestion is good enough for my experiment.
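As an aside, here is a minimal sketch of the other common fix: leaving the script in the head and waiting for the DOMContentLoaded event (the vanilla equivalent of $.ready()) before touching the DOM. Only the play handlers are shown to keep it short; this is an illustration rather than what my final page uses.

<script>
    // Wait until the document has been parsed so that getElementById
    // can find the video elements even though this script is in the head.
    document.addEventListener("DOMContentLoaded", () => {
        const video1 = document.getElementById("video1");
        const video2 = document.getElementById("video2");

        // Same wiring as in the full page below: playing one video plays the other.
        video1.addEventListener("play", () => video2.play());
        video2.addEventListener("play", () => video1.play());
    });
</script>

I stuck with the script-at-the-end approach. The final page now looks like this (with links to the relevant events, properties, and methods):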

<!DOCTYPE html>
<html lang="en">
<title>Two Video Synchronization Demo</title>
<body>
    <!-- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video -->
    <video controls id="video1" src="flower.mp4"></video>
    <video controls id="video2" src="nature.mp4"></video>

    <script>
        const video1 = document.getElementById("video1");
        const video2 = document.getElementById("video2");

        // Handle the play event on each video to ensure that
        // when one video is played the other plays as well.
        video1.addEventListener("play", (event) => {
            video2.play();
        });
        video2.addEventListener("play", (event) => {
            video1.play();
        });

        // Handle the pause event on each video to ensure that
        // when one video is paused the other is paused as well.
        video1.addEventListener("pause", (event) => {
            video2.pause();
        });
        video2.addEventListener("pause", (event) => {
            video1.pause();
        });

        // Handle the ratechange event on each video to ensure that
        // when the playback rate of one video is changed,
        // the other is set to use the same rate.
        video1.addEventListener("ratechange", (event) => {
            video2.playbackRate = video1.playbackRate;
        });
        video2.addEventListener("ratechange", (event) => {
            video1.playbackRate = video2.playbackRate;
        });

        // Handle the seek event on each video to ensure that
        // when one video is seeked the other seeks to the same location.
        video1.addEventListener("seeked", (event) => {
            // Do not use fastSeek since we need precision.
            if (video1.paused && video2.paused) {
                video2.currentTime = video1.currentTime;
            }
        });
        video2.addEventListener("seeked", (event) => {
            if (video1.paused && video2.paused) {
                video1.currentTime = video2.currentTime;
            }
        });
    </script>
</body>
</html>

Notice that the last requirement on seeking is implemented slightly differently: I synchronized the current time in the videos only if they were both paused. This prevents weird behavior where the videos keep syncing to each other, interrupting playback.
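Another way to guard against that feedback loop is to only copy the time across when the two videos have actually drifted apart. Here is a sketch of that alternative (the DRIFT_THRESHOLD constant and the syncTime helper are names I made up for this illustration; the final page does not use this approach):

// Only sync when the videos have drifted apart by more than a small
// threshold so that assigning currentTime does not retrigger a sync loop.
const DRIFT_THRESHOLD = 0.25; // seconds; an arbitrary value

function syncTime(source, target) {
    if (Math.abs(target.currentTime - source.currentTime) > DRIFT_THRESHOLD) {
        target.currentTime = source.currentTime;
    }
}

video1.addEventListener("seeked", () => syncTime(video1, video2));
video2.addEventListener("seeked", () => syncTime(video2, video1));

Since seeking while both videos are paused was the only scenario I cared about, the simpler paused check was good enough for this demo.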

The last thing I wanted to do was lay out the videos so that one overlaps the other (in the top left or bottom right). I needed to add a style tag to the head of the document. I searched for how to put a div in the bottom right and the StackOverflow question How can I position my div at the bottom of its container? suggests absolute positioning in a container div. See the CSS in the final page below.

<!DOCTYPE html>
<html lang="en">
<title>Two Video Synchronization Demo</title>
<style>
    #video-container {
        position: relative;
    }
    #video1 {
        width: 20%;
        position:absolute;
        top: 0px;
        left: 0px;
    }
    #video2 {
        width: 100%;
    }
</style>
<body>
    <div id="video-container">
        <video controls id="video1" src="flower.mp4"></video>
        <video controls id="video2" src="nature.mp4"></video>
    </div>

    <script>
        const video1 = document.getElementById("video1");
        const video2 = document.getElementById("video2");

        // Handle the play event on each video to ensure that
        // when one video is played the other plays as well.
        video1.addEventListener("play", (event) => {
            video2.play();
        });
        video2.addEventListener("play", (event) => {
            video1.play();
        });

        // Handle the pause event on each video to ensure that
        // when one video is paused the other is paused as well.
        video1.addEventListener("pause", (event) => {
            video2.pause();
        });
        video2.addEventListener("pause", (event) => {
            video1.pause();
        });

        // Handle the ratechange event on each video to ensure that
        // when the playback rate of one video is changed,
        // the other is set to use the same rate.
        video1.addEventListener("ratechange", (event) => {
            video2.playbackRate = video1.playbackRate;
        });
        video2.addEventListener("ratechange", (event) => {
            video1.playbackRate = video2.playbackRate;
        });

        // Handle the seek event on each video to ensure that
        // when one video is seeked the other seeks to the same location.
        video1.addEventListener("seeked", (event) => {
            // Do not use fastSeek since we need precision.
            if (video1.paused && video2.paused) {
                video2.currentTime = video1.currentTime;
            }
        });
        video2.addEventListener("seeked", (event) => {
            if (video1.paused && video2.paused) {
                video1.currentTime = video2.currentTime;
            }
        });
    </script>
</body>
</html>

This was a useful HTML, JavaScript, and CSS refresher!


Categories: Containers, Java

Running my Factorization Java App in Docker

I want to evaluate the OpenJDK serial collector using a Java program I wrote to factorize natural numbers by trial division. This post is about how to set up the app to run in a Docker container on a Linux host. Since the host is a shared machine, I put all my work under ~/swesonga (my own custom home directory). The directory structure for the container will be under ~/swesonga/container/.

Set up the Factorization App

First, log into the Linux machine and download the Java binaries to test:

ssh user@IPaddress
mkdir -p ~/swesonga/container/java/binaries/jdk/x64/
cd ~/swesonga/container/java/binaries/jdk/x64/

curl -Lo microsoft-jdk-21.0.5-linux-x64.tar.gz https://aka.ms/download-jdk/microsoft-jdk-21.0.5-linux-x64.tar.gz

tar xzf microsoft-jdk-21.0.5-linux-x64.tar.gz

Clone the factorize repo and set up its dependencies:

cd ~/swesonga/container/
git clone https://github.com/swesonga/factorize

cd ~/swesonga/container/java
curl -Lo commons-cli-1.9.0-bin.tar.gz https://dlcdn.apache.org//commons/cli/binaries/commons-cli-1.9.0-bin.tar.gz
tar xzf commons-cli-1.9.0-bin.tar.gz

Compile the factorization app:

export CLASSPATH=~/swesonga/container/java/commons-cli-1.9.0/commons-cli-1.9.0.jar:.
export JAVA21_HOME=~/swesonga/container/java/binaries/jdk/x64/jdk-21.0.5+11

cd ~/swesonga/container/factorize/java/project/src/main/java/org/swesonga/math

$JAVA21_HOME/bin/javac -d . PrimalityTest.java FactorizationUtils.java Factorize.java ExecutionMode.java

Set up the Docker Environment

Create the Dockerfile

Create a Dockerfile. See Dockerizing a Java Application | Baeldung and How To Dockerize Java Application (Step-by-Step Tutorial) for examples of how to do this. There are some OpenJDK images at Microsoft Artifact Registry (did I need to download my own JDK? Maybe not, but for now I know exactly where the JDK is and what is happening).

docker pull mcr.microsoft.com/openjdk/jdk:21-ubuntu

Create the Dockerfile below, substituting the appropriate value for <user>.

FROM mcr.microsoft.com/openjdk/jdk:21-ubuntu
COPY . /swesonga/
WORKDIR /swesonga/factorize/java/project/src/main/java/org/swesonga/math

ENTRYPOINT ["/swesonga/java/binaries/jdk/x64/jdk-21.0.5+11/bin/java", "-cp", "/swesonga/java/commons-cli-1.9.0/commons-cli-1.9.0.jar:.", "-Xint", "-XX:+UseSerialGC", "-XX:+UseCompressedOops", "-XX:HeapBaseMinAddress=0x120000000000", "-Xlog:gc*=debug,safepoint:file=serialgc-jdk21.log::filecount=0", "-Xlog:pagesize=trace:file=pagesize-jdk21.log::filecount=0", "-Xlog:os=trace:file=os-jdk21.log::filecount=0", "org.swesonga.math.Factorize", "-threads", "4", "-number", "7438880205542315091371423981777", "-systemGCFrequency", "1048576"]

If you create the Dockerfile on another machine, you can copy the Dockerfile to the Linux host using scp(1) – Linux manual page:

scp Dockerfile user@IPaddress:/home/<user>/swesonga/container/

Start Docker

Verify that docker is up by running docker version. I got this output:

user@machine:~/swesonga$ docker version
Client: Docker Engine - Community
 Version:           23.0.1
 API version:       1.42
 Go version:        go1.19.5
 Git commit:        a5ee5b1
 Built:             Thu Feb  9 19:46:56 2023
 OS/Arch:           linux/amd64
 Context:           default
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Run sudo systemctl start docker as described at Start the daemon | Docker Docs.

Build the Docker Image

See docker buildx build | Docker Docs for details on how to build. I only need the -t option to tag the image as swesonga-jdk21-testapp.

cd ~/swesonga/container/
docker build -t swesonga-jdk21-testapp .

Here is some sample output:

user@machine:~/swesonga/container$ docker build -t swesonga-jdk21-testapp .
[+] Building 7.9s (6/6) FINISHED
 => [internal] load build definition from Dockerfile                                                                            0.0s
 => => transferring dockerfile: 669B                                                                                            0.0s
 => [internal] load .dockerignore                                                                                               0.0s
 => => transferring context: 2B                                                                                                 0.0s
 => [internal] load metadata for mcr.microsoft.com/openjdk/jdk:21-ubuntu                                                        0.4s
 => [1/2] FROM mcr.microsoft.com/openjdk/jdk:21-ubuntu@sha256:98b6af6a403a01d476ee579340d624dfaac70409f50080e36eb6d86603f0ed8c  7.2s
 => => resolve mcr.microsoft.com/openjdk/jdk:21-ubuntu@sha256:98b6af6a403a01d476ee579340d624dfaac70409f50080e36eb6d86603f0ed8c  0.0s
 => => sha256:6414378b647780fee8fd903ddb9541d134a1947ce092d08bdeb23a54cb3684ac 29.54MB / 29.54MB                                0.5s
 => => sha256:ac27f5b44782db802c5876054378d16318ba6ab095203e15acc7527778c85370 178.15MB / 178.15MB                              2.3s
 => => sha256:d42e3adbad90b3214756070b3e98acd724228f7e8d08344d7044c0788a185b66 1.38kB / 1.38kB                                  0.2s
 => => sha256:98b6af6a403a01d476ee579340d624dfaac70409f50080e36eb6d86603f0ed8c 683B / 683B                                      0.0s
 => => sha256:ab80e68248a29dec58a531a5ff5a5bb873bb96fe829ac6b17f46c6f2a05cef63 899B / 899B                                      0.0s
 => => sha256:6d6be45eade816f5be7fc8935372e429b035dfee5f6e386dd7f87e6430228554 3.90kB / 3.90kB                                  0.0s
 => => extracting sha256:6414378b647780fee8fd903ddb9541d134a1947ce092d08bdeb23a54cb3684ac                                       1.3s
 => => extracting sha256:ac27f5b44782db802c5876054378d16318ba6ab095203e15acc7527778c85370                                       4.6s
 => => extracting sha256:d42e3adbad90b3214756070b3e98acd724228f7e8d08344d7044c0788a185b66                                       0.0s
 => [2/2] WORKDIR /home/<user>/swesonga/factorize/java/project/src/main/java/org/swesonga/math                                     0.2s
 => exporting to image                                                                                                          0.0s
 => => exporting layers                                                                                                         0.0s
 => => writing image sha256:4c068055cad5759153f3d3677404b9729b8ddee1c7026aa30e88b8c58c228037                                    0.0s
 => => naming to docker.io/library/swesonga-jdk21-testapp

Start the Docker Container

Before starting the container: docker ps | Docker Docs. Then docker run | Docker Docs.

docker ps
docker run -i -t swesonga-jdk21-testapp

Troubleshoot any Docker Errors

One error I ran into initially was that docker was unable to start the container process. I had missed the COPY command in the Dockerfile, so the java binary couldn’t be found:

user@machine:~/swesonga$ docker run -i -t swesonga-jdk21-testapp
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/home/<user>/swesonga/java/binaries/jdk/x64/jdk-21.0.5+11/bin/java": stat /home/<user>/swesonga/java/binaries/jdk/x64/jdk-21.0.5+11/bin/java: no such file or directory: unknown.
ERRO[0000] error waiting for container:

I fixed the Dockerfile and ran docker build again. My ENTRYPOINT path still wasn’t correct, so the same error appeared when re-running the application. I didn’t realize this though and thought that since I had changed the entrypoint, a cached build must still be in use. A delete cached docker build – Search pointed to macos – Is there a way to clean docker build cache? – Stack Overflow. This is what I tried in my ignorance:

user@machine:~/swesonga$ docker system df
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          2         2         1.22GB    443.1MB (36%)
Containers      3         0         0B        0B
Local Volumes   0         0         0B        0B
Build Cache     9         0         776.9MB   776.9MB
user@machine:~/swesonga$ docker system df -v
Images space usage:

REPOSITORY               TAG       IMAGE ID       CREATED          SIZE      SHARED SIZE   UNIQUE SIZE   CONTAINERS
swesonga-jdk21-testapp   latest    682fedf54071   11 minutes ago   1.22GB    443.1MB       776.9MB       2
<none>                   <none>    4c068055cad5   26 minutes ago   443.1MB   443.1MB       0B            1

Containers space usage:

CONTAINER ID   IMAGE                    COMMAND                  LOCAL VOLUMES   SIZE      CREATED          STATUS    NAMES
c77b69082a8a   swesonga-jdk21-testapp   "/home/<user>/swesonga/…"   0               0B        6 minutes ago    Created   awesome_chatelet
58c723638dd2   swesonga-jdk21-testapp   "/home/<user>/swesonga/…"   0               0B        8 minutes ago    Created   sharp_lamport
4a3be55725b5   4c068055cad5             "/home/<user>/swesonga/…"   0               0B        17 minutes ago   Created   lucid_tharp

Local Volumes space usage:

VOLUME NAME   LINKS     SIZE

Build cache usage: 776.9MB
...

I tried pruning the build cache as suggested in that post.

user@machine:~/swesonga$ docker builder prune --all
WARNING! This will remove all build cache. Are you sure you want to continue? [y/N] y
ID                                              RECLAIMABLE     SIZE            LAST ACCESSED
te4o8rbj7s6nh6pluquzummzz                       true            0B              8 minutes ago
n2wbz4gf448fluw4uuogiqxdo*                      true    776.9MB         8 minutes ago
...
Total:  1.554GB

I realized that pruning wasn’t what I needed because now the cache was empty but the containers were still there based on the next output:

user@machine:~/swesonga$ docker system df -v
Images space usage:

REPOSITORY               TAG       IMAGE ID       CREATED          SIZE      SHARED SIZE   UNIQUE SIZE   CONTAINERS
swesonga-jdk21-testapp   latest    682fedf54071   18 minutes ago   1.22GB    443.1MB       776.9MB       2
<none>                   <none>    4c068055cad5   33 minutes ago   443.1MB   443.1MB       0B            1

Containers space usage:

CONTAINER ID   IMAGE                    COMMAND                  LOCAL VOLUMES   SIZE      CREATED          STATUS    NAMES
c77b69082a8a   swesonga-jdk21-testapp   "/home/<user>/swesonga/…"   0               0B        13 minutes ago   Created   awesome_chatelet
58c723638dd2   swesonga-jdk21-testapp   "/home/<user>/swesonga/…"   0               0B        15 minutes ago   Created   sharp_lamport
4a3be55725b5   4c068055cad5             "/home/<user>/swesonga/…"   0               0B        24 minutes ago   Created   lucid_tharp

Local Volumes space usage:

VOLUME NAME   LINKS     SIZE

Build cache usage: 0B

CACHE ID   CACHE TYPE   SIZE      CREATED   LAST USED   USAGE     SHARED
user@machine:~/swesonga$

I should have been using docker ps -a instead! The -a option shows all containers (regardless of whether they are running).

user@machine:~/swesonga$ docker ps -a
CONTAINER ID   IMAGE          COMMAND                  CREATED          STATUS    PORTS     NAMES
c77b69082a8a   682fedf54071   "/home/<user>/swesonga/…"   16 minutes ago   Created             awesome_chatelet
58c723638dd2   682fedf54071   "/home/<user>/swesonga/…"   18 minutes ago   Created             sharp_lamport
4a3be55725b5   4c068055cad5   "/home/<user>/swesonga/…"   27 minutes ago   Created             lucid_tharp
user@machine:~/swesonga$

One of the later errors I ran into was because I hadn’t saved the Dockerfile, so I started displaying the Dockerfile before each build. Sheesh.

cat Dockerfile
docker build -t swesonga-jdk21-testapp .
docker ps -a
docker run -i -t swesonga-jdk21-testapp

After all this experimentation, we can remove all the broken containers using docker container rm | Docker Docs to remove individual ones or docker container prune | Docker Docs to remove all stopped containers (use with caution)!

docker container prune

Collect Logs from the Container

I used CTRL+C to stop the container after some time. The next question was how to examine the files in the container. linux – Exploring Docker container’s file system – Stack Overflow suggests using docker container cp | Docker Docs. I opened another SSH session to the host running Docker then copied the logs from the container using these commands:

mkdir -p ~/swesonga/logs/tip/
cd ~/swesonga/logs/

docker ps -a
export CONTAINERID=XXXXXXXXXXX

docker cp $CONTAINERID:/swesonga/factorize/java/project/src/main/java/org/swesonga/math/serialgc-jdk21.log ~/swesonga/logs/

docker cp $CONTAINERID:/swesonga/factorize/java/project/src/main/java/org/swesonga/math/pagesize-jdk21.log ~/swesonga/logs/

docker cp $CONTAINERID:/swesonga/factorize/java/project/src/main/java/org/swesonga/math/os-jdk21.log ~/swesonga/logs/

I switched back to the development box and used scp to get the log files back to it.

scp user@IPaddress:/home/<user>/swesonga/logs/*.log .

When gathering multiple rounds of logs, it is convenient to group them into separate folders. I used this command:

export CURRDATE=`date +%Y-%m-%d_%H%M%S`; scp user@IPaddress:/home/<user>/swesonga/logs/*.log ./logs-$CURRDATE/

Setting Container Limits

To see the amount of RAM on the Docker host, run free -h since it’s an Ubuntu host. How do we set a RAM limit for a docker container? Docker – Setting Memory And CPU Limits pointed me to the --memory parameter of docker run | Docker Docs.

docker ps -a
docker run -i --memory 2GB -t swesonga-jdk21-testapp

Observe the head of the jdk21 GC log below. The total memory is now reported as 2048M. The maximum heap size is 25% of this total (512M), as expected. The initial heap is 32MB and the minimum heap is 8MB.

[0.006s][info][gc,init] CardTable entry size: 512
[0.006s][debug][gc,heap] Minimum heap 8388608  Initial heap 33554432  Maximum heap 536870912
[0.006s][info ][gc     ] Using Serial
[0.006s][debug][gc,heap,coops] Protected page at the reserved heap base: 0x0000120000000000 / 2097152 bytes
[0.006s][debug][gc,heap,coops] Heap address: 0x0000120000200000, size: 512 MB, Compressed Oops mode: Non-zero disjoint base: 0x0000120000000000, Oop shift amount: 3
[0.007s][info ][gc,init      ] Version: 21.0.5+11-LTS (release)
[0.007s][info ][gc,init      ] CPUs: 20 total, 20 available
[0.007s][info ][gc,init      ] Memory: 2048M
[0.007s][info ][gc,init      ] Large Page Support: Disabled
[0.007s][info ][gc,init      ] NUMA Support: Disabled
[0.007s][info ][gc,init      ] Compressed Oops: Enabled (Non-zero disjoint base)
[0.007s][info ][gc,init      ] Heap Min Capacity: 8M
[0.007s][info ][gc,init      ] Heap Initial Capacity: 32M
[0.007s][info ][gc,init      ] Heap Max Capacity: 512M
[0.007s][info ][gc,init      ] Pre-touch: Disabled

The number of CPUs can be set using the --cpus option.

cat Dockerfile
docker build -t swesonga-jdktip-testapp-2core .
docker ps -a
docker run -i --cpus 2 --memory 2GB -t swesonga-jdktip-testapp-2core

Viewing Container Stats

One of the questions I need to answer is how much memory is in use and how much is available in the container. A view available memory in a docker container – Google Search pointed me to How to Use the Resource Usage Docker Extension. Turns out docker stats is good enough for my needs right now.

user@machine:~/swesonga$ docker stats

CONTAINER ID   NAME                      CPU %     MEM USAGE / LIMIT   MEM %     NET I/O     BLOCK I/O   PIDS
7f096bd5163d   condescending_goldstine   196.82%   26.03MiB / 2GiB     1.27%     806B / 0B   0B / 0B     13

Categories: Java

Investigating a Windows ZGC Test Failure

[JDK-8334475] UnsafeIntrinsicsTest.java#ZGenerationalDebug assert(!assert_on_failure) failed: Has low-order bits set – Java Bug System (openjdk.org) was brought to my attention a few months ago. I ran the test on commit b3bf31a0a08d on my x64 machine to see what happens:

cd /c/repos/scratchpad/scripts/java/tests

./run-jtreg-tests.sh /d/java/forks/openjdk/jdk /d/java/forks/openjdk/jdk/build/windows-x86_64-server-slowdebug/jdk  /c/java/binaries/jtreg-7.4+1/lib/jtreg.jar

That script shows the exact command line used to invoke the test:

/d/java/forks/openjdk/jdk/build/windows-x86_64-server-slowdebug/jdk/bin/java -Xmx512m -jar /c/java/binaries/jtreg-7.4+1/lib/jtreg.jar -agentvm -ignore:quiet -automatic -xml -vmoption:-Xmx512m -timeoutFactor:4 -concurrency:1 -testjdk:/d/java/forks/openjdk/jdk/build/windows-x86_64-server-slowdebug/jdk -verbose:fail,error,summary  test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java

The test passed on x64, but I wanted to see how to step into the test code itself, so I tried this setup in Visual Studio 2022. Unfortunately, it was not straightforward to break into the correct process when launching the test this way.

Command: D:\java\forks\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\java.exe

Arguments: -Xmx512m -jar C:/java/binaries/jtreg-7.4+1/lib/jtreg.jar -agentvm -ignore:quiet -automatic -xml -vmoption:-Xmx512m -timeoutFactor:4 -concurrency:1 -testjdk:D:/java/forks/openjdk/jdk/build/windows-x86_64-server-slowdebug/jdk -verbose:fail,error,summary  test/hotspot/jtreg/compiler/gcbarriers/UnsafeIntrinsicsTest.java

Working Dir: D:/java/forks/openjdk/jdk

Test Execution Sequence

I decided to try to understand the test execution sequence and started by looking at the output generated when running the jtreg test. Notice that the CreateCoredumpOnCrash product flag is disabled. The first thing I did was to enable it so that there would be a core dump to examine.

D:\java\forks\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\java 
-Dtest.vm.opts=-Xmx512m 
-Dtest.tool.vm.opts=-J-Xmx512m 
-Dtest.compiler.opts= 
-Dtest.java.opts= 
-Dtest.jdk=D:\java\forks\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk 
-Dcompile.jdk=D:\java\forks\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk 
-Dtest.timeout.factor=4.0 
-Dtest.root=D:\java\forks\openjdk\jdk\test\hotspot\jtreg 
-Dtest.name=compiler/gcbarriers/UnsafeIntrinsicsTest.java#ZGenerationalDebug 
-Dtest.file=D:\java\forks\openjdk\jdk\test\hotspot\jtreg\compiler\gcbarriers\UnsafeIntrinsicsTest.java 
-Dtest.src=D:\java\forks\openjdk\jdk\test\hotspot\jtreg\compiler\gcbarriers 
-Dtest.src.path=D:\java\forks\openjdk\jdk\test\hotspot\jtreg\compiler\gcbarriers;D:\java\forks\openjdk\jdk\test\lib 
-Dtest.classes=D:\java\forks\openjdk\jdk\JTwork\classes\compiler\gcbarriers\UnsafeIntrinsicsTest_ZGenerationalDebug.d 
-Dtest.class.path=D:\java\forks\openjdk\jdk\JTwork\classes\compiler\gcbarriers\UnsafeIntrinsicsTest_ZGenerationalDebug.d;D:\java\forks\openjdk\jdk\JTwork\classes\test\lib 
-Dtest.class.path.prefix=D:\java\forks\openjdk\jdk\JTwork\classes\compiler\gcbarriers\UnsafeIntrinsicsTest_ZGenerationalDebug.d;D:\java\forks\openjdk\jdk\test\hotspot\jtreg\compiler\gcbarriers;D:\java\forks\openjdk\jdk\JTwork\classes\test\lib 
-Dtest.modules=java.base/jdk.internal.misc:+open --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-opens java.base/jdk.internal.misc=ALL-UNNAMED
-Xmx512m
-XX:+UseZGC
-XX:+ZGenerational
-XX:+UnlockDiagnosticVMOptions
-XX:+ZVerifyOops
-XX:ZCollectionInterval=1
-XX:-CreateCoredumpOnCrash
-XX:CompileCommand=dontinline,*::mergeImpl* com.sun.javatest.regtest.agent.MainWrapper D:\java\forks\openjdk\jdk\JTwork\compiler\gcbarriers\UnsafeIntrinsicsTest_ZGenerationalDebug.d\main.0.jta

The core dump wasn’t particularly helpful. I worked on simplifying the test so that only the failing scenarios remained. The log in the JBS issue was interpreted code only so I added the -Xint argument and could still reproduce the failure on Windows AArch64.

I came back to the idea of capturing the full command line for the final java.exe process that runs the test. Is there a way to log process start on Windows to capture all command lines? Copilot cited PowerShell and Command Line Logging | LogRhythm, which suggested enabling the use of Event ID 4688: a new process has been created.

However, the events in Event Viewer did not include the command line arguments when I did this, which is what I needed. I just looked up Event ID 4688 and found 4688(S) A new process has been created. – Windows 10 | Microsoft Learn, which explains that you must enable the “Administrative Templates\System\Audit Process Creation\Include command line in process creation events” group policy for the command line to be included in process creation events. Hmm, I definitely didn’t do that when I was investigating this issue. I just tried this and it does the trick!

I went back to deciphering the jtreg test execution output: which class was this com.sun.javatest.regtest.agent.MainWrapper? The only openjdk/jdk match for MainWrapper.java appeared to be jdk/test/hotspot/jtreg/vmTestbase/nsk/share/MainWrapper.java at b3bf31a0a08da679ec2fd21613243fb17b1135a9 · openjdk/jdk (github.com). However, it’s from a different package! Copilot informed me that the MainWrapper class was actually jtreg/src/share/classes/com/sun/javatest/regtest/agent/MainWrapper.java at master · openjdk/jtreg (github.com).

Callstack Investigation

I stepped back to the callstack in the JBS issue and decided to trace it. Here are the links to the sources:

---------------  T H R E A D  ---------------

Current thread (0x0000020efe6a7fb0):  JavaThread "main"             [_thread_in_vm, id=19132, stack(0x0000003c68f00000,0x0000003c69000000) (1024K)]

Stack: [0x0000003c68f00000,0x0000003c69000000]
Native frames: <unavailable>
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  java.lang.Object.clone()Ljava/lang/Object;+0 java.base
j  java.util.Arrays.copyOfRange([BII)[B+11 java.base
j  java.lang.String.<init>(Ljava/lang/AbstractStringBuilder;Ljava/lang/Void;)V+32 java.base
j  java.lang.StringBuilder.toString()Ljava/lang/String;+16 java.base
j  sun.nio.cs.StandardCharsets.toLower(Ljava/lang/String;)Ljava/lang/String;+121 java.base
j  sun.nio.cs.StandardCharsets.lookup(Ljava/lang/String;)Ljava/nio/charset/Charset;+44 java.base
j  sun.nio.cs.StandardCharsets.charsetForName(Ljava/lang/String;)Ljava/nio/charset/Charset;+6 java.base
j  java.nio.charset.Charset.lookup2(Ljava/lang/String;)Ljava/nio/charset/Charset;+39 java.base
j  java.nio.charset.Charset.lookup(Ljava/lang/String;)Ljava/nio/charset/Charset;+40 java.base
j  java.nio.charset.Charset.isSupported(Ljava/lang/String;)Z+1 java.base
j  java.lang.System.initPhase1()V+37 java.base
v  ~StubRoutines::call_stub 0x0000020e81410180
Lock stack of current Java thread (top to bottom):
LockStack[0]: sun.nio.cs.StandardCharsets 
{0x0000040000018d28} - klass: 'sun/nio/cs/StandardCharsets'
 - ---- fields (total size 5 words):
 - private 'classMap' 'Ljava/util/Map;' @16  null (0x0000000000000000)
 - private 'aliasMap' 'Ljava/util/Map;' @24  null (0x0000000000000000)
 - private 'cache' 'Ljava/util/Map;' @32  null (0x0000000000000000)

The stack above includes sun.nio.cs.StandardCharsets.charsetForName, a method of a generated class! I was puzzled about why jdk/src/java.base/share/classes/java/nio/charset/StandardCharsets.java at b3bf31a0a08da679ec2fd21613243fb17b1135a9 · openjdk/jdk (github.com) did not contain that method. This failure reminds me of another failure I was investigating a while back:

$JAVA_HOME/bin/java -jar /c/java/binaries/jtreg-7.4+1/lib/jtreg.jar -agentvm -timeoutFactor:4 -concurrency:4 -verbose:fail,error,summary -testjdk:/d/java/forks/openjdk/jdk/build/windows-x86_64-server-release/jdk -nativepath:/d/java/forks/openjdk/jdk/build/windows-x86_64-server-release/support/test/jdk/jtreg/native/lib test/jdk/java/foreign/TestLinker.java

In the midst of all this wrangling, I decided to write a simple test to spit out the value of the sun.jnu.encoding property. That test ran fine, so the next step was to use the same flags as the failing jtreg test. However, as I added the ZGC flags to the test command line, I realized that I hadn’t even tried those flags with java -version. What an oversight! The bug reproduces without running any specific program!

$JAVA_HOME/bin/java -XX:+UseZGC -XX:+ZGenerational -XX:+ZVerifyOops -version

Debugging Minimal Reproducible Failure

Since the assert happens when cloning an array, I decided to find the native implementation of the clone method. I searched for “clone_” and the only relevant hit (from a quick glance) appeared to be in the LinkResolver::check_method_accessability method.

I had been working on the latest commits and decided to try this test on jdk21u. It failed there as well! I tried it on jdk17 but it terminated with “Unrecognized VM option ‘ZGenerational'”. Blaming jdk/src/hotspot/share/gc/shared/gc_globals.hpp at b3bf31a0a08da679ec2fd21613243fb17b1135a9 · openjdk/jdk (github.com) pointed to 8307058: Implementation of Generational ZGC · openjdk/jdk@d20034b (github.com). Looks like jdk21 was the first release with this flag as per [JDK-8307058] Implementation of Generational ZGC – Java Bug System (openjdk.org).

Back to the LinkResolver: it looked like cloning is being dispatched through the Java object but what I had been looking for is a native call. Nothing about cloning at Arrays (Java SE 11 & JDK 11 ) (oracle.com) other than the fact that the method is inherited from the Object class. Nothing helpful in jdk/src/java.base/share/classes/java/util/Arrays.java or jdk/src/java.base/share/classes/java/lang/reflect/Array.java either. jdk/src/java.base/share/native/libjava/Array.c looked promising though. Methods like Java_java_lang_reflect_Array_getByte use the java.lang.reflect.Array JNIEXPORTs declared in jdk/src/hotspot/share/include/jvm.h. Browsing through the corresponding implementations in jdk/src/hotspot/share/prims/jvm.cpp led me to the clone implementation: JVM_Clone! This is the setup I used in the Visual Studio 2022 debugger:

Command: C:/java/forks/openjdk/jdk/build/windows-aarch64-server-slowdebug/jdk/bin/java.exe

Arguments: -XX:+UseZGC -XX:+ZGenerational -XX:+ZVerifyOops -version

Working Dir: C:/java/forks/openjdk/jdk

The failure appeared to be happening in this statement: HeapAccess<>::clone(obj(), new_obj_oop, size). I got this callstack by stepping into the calls in the slowdebug build because the is_valid function is inlined in the fastdebug build, preventing me from setting a breakpoint in it.

jvm.dll!is_valid(zaddress addr, bool assert_on_failure) Line 298	C++
jvm.dll!assert_is_valid(zaddress addr) Line 321	C++
jvm.dll!to_zaddress(unsigned __int64 value) Line 339	C++
jvm.dll!to_zaddress(oopDesc * o) Line 345	C++
jvm.dll!ZBarrierSet::AccessBarrier<270432,ZBarrierSet>::clone_in_heap(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 433	C++
jvm.dll!AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<270400,ZBarrierSet>,9,270400>::access_barrier(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 200	C++
jvm.dll!AccessInternal::RuntimeDispatch<270400,oopDesc *,9>::clone_init(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 349	C++
jvm.dll!AccessInternal::RuntimeDispatch<270400,oopDesc *,9>::clone(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 533	C++
jvm.dll!AccessInternal::PreRuntimeDispatch::clone<270400>(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 890	C++
jvm.dll!AccessInternal::clone<262144>(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 1181	C++
jvm.dll!Access<262144>::clone(oopDesc * src, oopDesc * dst, unsigned __int64 size) Line 212	C++
jvm.dll!JVM_Clone(JNIEnv_ * env, _jobject * handle) Line 698	C++
00000298b2bbf056()	Unknown
00000298a2ee5590()	Unknown
000000c053dfe908()	Unknown

Here’s the stack from stepping into the fastdebug assembly (note that line numbers might be off since this is not slowdebug):

jvm.dll!report_vm_error(const char * file, int line, const char * error_msg, const char * detail_fmt, ...) Line 181	C++
[Inline Frame] jvm.dll!oop::check_oop() Line 89	C++
[Inline Frame] jvm.dll!oop::on_usage() Line 91	C++
[Inline Frame] jvm.dll!oop::obj() Line 103	C++
[Inline Frame] jvm.dll!oop::operator=(const oop &) Line 114	C++
jvm.dll!Copy::pd_conjoint_oops_atomic(const oop * from, oop * to, unsigned __int64 count) Line 120	C++
[Inline Frame] jvm.dll!Copy::pd_conjoint_jlongs_atomic(const __int64 *) Line 106	C++
[Inline Frame] jvm.dll!Copy::conjoint_jlongs_atomic(const __int64 *) Line 142	C++
jvm.dll!AccessInternal::arraycopy_conjoint_atomic<__int64>(__int64 * src, __int64 * dst, unsigned __int64 length) Line 104	C++
jvm.dll!RawAccessBarrier<270432>::clone(oop src, oop dst, unsigned __int64 size) Line 331	C++
[Inline Frame] jvm.dll!BarrierSet::AccessBarrier<270432,ZBarrierSet>::clone_in_heap(oop) Line 313	C++
jvm.dll!ZBarrierSet::AccessBarrier<270432,ZBarrierSet>::clone_in_heap(oop src, oop dst, unsigned __int64 size) Line 451	C++
jvm.dll!AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<270400,ZBarrierSet>,9,270400>::access_barrier(oop src, oop dst, unsigned __int64 size) Line 200	C++
jvm.dll!AccessInternal::RuntimeDispatch<270400,oop,9>::clone_init(oop src, oop dst, unsigned __int64 size) Line 349	C++
[Inline Frame] jvm.dll!AccessInternal::RuntimeDispatch<270400,oop,9>::clone(oop) Line 532	C++
[Inline Frame] jvm.dll!AccessInternal::PreRuntimeDispatch::clone(oop) Line 889	C++
[Inline Frame] jvm.dll!AccessInternal::clone(oop) Line 1180	C++
jvm.dll!Access<262144>::clone(oop src, oop dst, unsigned __int64 size) Line 211	C++
jvm.dll!JVM_Clone(JNIEnv_ * env, _jobject * handle) Line 698	C++
000001a156d3a45c()	Unknown

ZBarrierSet::AccessBarrier<270432,ZBarrierSet>::clone_in_heap is interesting because it verifies that the source is a valid z-address! Let’s take a look at the memory region and see what is there…

I looked at this and searched for UTF-8 byte 0xBD meaning – Search (bing.com) but that didn’t turn up anything meaningful. I noticed that count is 3 on my Aarch64 device so we are copying 3 oops. Was this expected given that the count is 4 on Windows x64? More importantly though, the big question I have now is why doesn’t slowdebug step into the oop checking code when pressing F11 on line 120 in the screenshot? I expected the behavior of oop::on_usage to be different, not that it wouldn’t be called at all! When browsing the sources in VSCode on my x64 desktop, I clicked on the oop type (of the src argument) and it took me to a typedef of class oopDesc*. That’s when I spotted the CHECK_UNHANDLED_OOPS ifndef. The fastdebug build must have this defined! The only non-cpp/hpp file that contains CHECK_UNHANDLED_OOPS is jdk/make/hotspot/lib/JvmFlags.gmk. Sure enough, it is only defined for fastdebug. This means that I should be able to enable it for slowdebug and release and verify whether the behavior is present there. We can therefore configure a slowdebug build with the --with-extra-cflags=-DCHECK_UNHANDLED_OOPS option.

date; time bash configure --with-jtreg=/cygdrive/c/java/binaries/jtreg-7.4+1 --with-gtest=/cygdrive/c/repos/googletest --with-boot-jdk=/cygdrive/c/java/binaries/jdk/x64/jdk-22.0.1+8 --openjdk-target=aarch64-unknown-cygwin --with-debug-level=slowdebug --with-extra-cflags=-DCHECK_UNHANDLED_OOPS

time /cygdrive/c/repos/scratchpad/scripts/java/cygwin/build-jdk.sh windows aarch64 0 slowdebug

Looking at the actual pointer, I noticed that its value is the data “cp1252..”. After further investigation, I concluded that the bug is that we’re calling oop::operator= instead of just copying the values! I tested a fix that simply copies the values directly and it works! The test passes even with the -DCHECK_UNHANDLED_OOPS option!

On to the next question: is there anything else using the pd_conjoint_oops_atomic function (and will it be negatively affected by my change)? While searching for “pd_conjoint_oops_atomic”, I noticed that some platforms have an assert that the size of an oop matches the size of a long, or something like that. There are 2 users of pd_conjoint_oops_atomic:

  1. Copy::pd_conjoint_oops_atomic > AccessInternal::arraycopy_conjoint_oops > RawAccessBarrierArrayCopy::arraycopy. The last method’s definition is based on templates so I continue with a regex search for “Raw::.*arraycopy”. Many of the results are related to arraycopy_in_heap. Looking at all the barrierSet results makes me realize I could just search JBS for “arraycopy_in_heap”.
  2. pd_arrayof_conjoint_oops in jdk/src/hotspot/os_cpu/windows_aarch64/copy_windows_aarch64.hpp.

One idea is to run Java programs on a build linked with /PROFILE (Performance Tools Profiler), i.e. configured via --with-extra-ldflags=-profile, to collect code coverage data and confirm that those functions are executed. That seems cumbersome though, so I try to create some array copying code to see if I can get to those functions (using breakpoints, but none are hit). After taking a break, I wonder if I can just search from the bottom instead. Looking in jvm.cpp for “copy” reveals the JVM_ArrayCopy function. Here is my Java program:

public class CopyArray {
    public static void main(String[] args) {
        int length = 0xdeadc0d; // magic value for a conditional breakpoint (explained below)
        int srcPos = 0;

        if (args.length > 0) {
            try {
                int userLength = Integer.parseInt(args[0]);
                length = userLength;
            }
            catch (Throwable e) {
                System.err.println("Ignoring invalid user arguments.");
            }
        }

        byte[] src = new byte[length];

        for (int i = 0; i < src.length; i++) {
            src[i] = (byte)(i % 256);
        }
        byte[] dest = new byte[length];
        System.arraycopy(src, srcPos, dest, 0, length);
    }
}

I debugged the JVM running this program on my x64 machine. This hits the target function, JVM_ArrayCopy, but there are so many callers that I had to set a condition on the breakpoint (hence the magic value of the length above) before I could step in to see where my call goes. Here are the source paths (note the different commit):

jvm.dll!Copy::pd_conjoint_bytes_atomic(const void * from, void * to, unsigned __int64 count) Line 119	C++
jvm.dll!Copy::conjoint_memory_atomic(const void * from, void * to, unsigned __int64 size) Line 53	C++
jvm.dll!AccessInternal::arraycopy_conjoint_atomic<void>(void * src, void * dst, unsigned __int64 length) Line 164	C++
jvm.dll!RawAccessBarrierArrayCopy::arraycopy<136585312,void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 298	C++
jvm.dll!RawAccessBarrier<136585312>::arraycopy<void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 308	C++
jvm.dll!AccessInternal::PreRuntimeDispatch::arraycopy<136587328,void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 834	C++
jvm.dll!AccessInternal::PreRuntimeDispatch::arraycopy<136585280,void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 867	C++
jvm.dll!AccessInternal::arraycopy_reduce_types<136585280,void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 1008	C++
jvm.dll!AccessInternal::arraycopy<136577024,void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, const void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 1172	C++
jvm.dll!Access<136577024>::arraycopy<void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, const void * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, void * dst_raw, unsigned __int64 length) Line 147	C++
jvm.dll!ArrayAccess<134217728>::arraycopy<void>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, unsigned __int64 length) Line 301	C++
jvm.dll!TypeArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, JavaThread * __the_thread__) Line 170	C++
jvm.dll!JVM_ArrayCopy(JNIEnv_ * env, _jclass * ignored, _jobject * src, int src_pos, _jobject * dst, int dst_pos, int length) Line 307	C++
00000244ea690702()	Unknown

Copy::conjoint_memory_atomic is interesting because it has a comment indicating that byte copies are not aligned and so there is no need for them to be atomic. The if statements in that method indicate that I can change the size of the elements in the array to exercise different paths. Looks like I need to create an array of objects.

/**
export JAVA_HOME=~/java/binaries/jdk/x64/jdk-21.0.2+13
$JAVA_HOME/bin/javac CopyArray.java
$JAVA_HOME/bin/java CopyArray
 */

public class CopyArray {
    public static void main(String[] args) {
        int length = 0xdead;
        int srcPos = 0;

        if (args.length > 0) {
            try {
                int userLength = Integer.parseInt(args[0]);
                length = userLength;
            }
            catch (Throwable e) {
                System.err.println("Ignoring invalid user arguments.");
            }
        }

        Object[] src = new Object[length];

        for (int i = 0; i < src.length; i++) {
            src[i] = new Object();
        }
        Object[] dest = new Object[length];
        System.arraycopy(src, srcPos, dest, 0, length);
    }
}

Now we are closer to the array_oops code I was trying to hit:

jvm.dll!Copy::pd_conjoint_jints_atomic(const int * from, int * to, unsigned __int64 count) Line 52	C++
jvm.dll!Copy::conjoint_oops_atomic(const narrowOop * from, narrowOop * to, unsigned __int64 count) Line 155	C++
jvm.dll!AccessInternal::arraycopy_conjoint_oops(narrowOop * src, narrowOop * dst, unsigned __int64 length) Line 54	C++
jvm.dll!RawAccessBarrierArrayCopy::arraycopy<50331750,HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 234	C++
jvm.dll!RawAccessBarrier<52715622>::arraycopy<enum narrowOop>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, narrowOop * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, narrowOop * dst_raw, unsigned __int64 length) Line 308	C++
jvm.dll!RawAccessBarrier<52715622>::oop_arraycopy<enum narrowOop>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, narrowOop * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, narrowOop * dst_raw, unsigned __int64 length) Line 130	C++
jvm.dll!ModRefBarrierSet::AccessBarrier<35938406,CardTableBarrierSet>::oop_arraycopy_in_heap<enum narrowOop>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, narrowOop * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, narrowOop * dst_raw, unsigned __int64 length) Line 109	C++
jvm.dll!AccessInternal::PostRuntimeDispatch<CardTableBarrierSet::AccessBarrier<35938406,CardTableBarrierSet>,8,35938406>::oop_access_barrier<HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 142	C++
jvm.dll!AccessInternal::RuntimeDispatch<35938374,HeapWordImpl *,8>::arraycopy(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 517	C++
jvm.dll!AccessInternal::PreRuntimeDispatch::arraycopy<35938374,HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 871	C++
jvm.dll!AccessInternal::arraycopy_reduce_types<35938372>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 1018	C++
jvm.dll!AccessInternal::arraycopy<35913732,HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * const * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 1172	C++
jvm.dll!Access<35913728>::oop_arraycopy<HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * const * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 136	C++
jvm.dll!ArrayAccess<33554432>::oop_arraycopy(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, unsigned __int64 length) Line 327	C++
jvm.dll!ObjArrayKlass::do_copy(arrayOop s, unsigned __int64 src_offset, arrayOop d, unsigned __int64 dst_offset, int length, JavaThread * __the_thread__) Line 197	C++
jvm.dll!ObjArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, JavaThread * __the_thread__) Line 282	C++
jvm.dll!JVM_ArrayCopy(JNIEnv_ * env, _jclass * ignored, _jobject * src, int src_pos, _jobject * dst, int dst_pos, int length) Line 307	C++
00000202d3f00502()	Unknown
00000202cc269950()	Unknown

In this call stack, the arraycopy_conjoint_oops(narrowOop* src, narrowOop* dst, size_t length) implementation is the one called because compressed oops take that branch in ObjArrayKlass::copy_array. Launch the application using these arguments instead to hit the uncompressed oop path:

C:\java\forks\openjdk\jdk\build\windows-x86_64-server-slowdebug\jdk\bin\java.exe -XX:-UseCompressedOops CopyArray

Now the code block of interest is hit! Hmm, I’m realizing that I should have been explicit about the collector to use. This was debugging the G1 collector (chosen ergonomically).

Notice that ZGC has different code for copying the array on Aarch64 (my fix-jlongs branch based on openjdk/jdk at ee839b7f0ebe471d3877cddd2c87019ccb8ee5ae).

jvm.dll!ZBarrierSet::AccessBarrier<52715590,ZBarrierSet>::oop_arraycopy_in_heap_no_check_cast(zpointer * dst, zpointer * src, unsigned __int64 length) Line 371	C++
jvm.dll!ZBarrierSet::AccessBarrier<35938374,ZBarrierSet>::oop_arraycopy_in_heap(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, zpointer * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, zpointer * dst_raw, unsigned __int64 length) Line 403	C++
jvm.dll!ZBarrierSet::AccessBarrier<35938374,ZBarrierSet>::oop_arraycopy_in_heap(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, oop * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, oop * dst_raw, unsigned __int64 length) Line 128	C++
jvm.dll!AccessInternal::PostRuntimeDispatch<ZBarrierSet::AccessBarrier<35938374,ZBarrierSet>,8,35938374>::oop_access_barrier<HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 142	C++
jvm.dll!AccessInternal::RuntimeDispatch<35938374,HeapWordImpl *,8>::arraycopy(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 517	C++
jvm.dll!AccessInternal::PreRuntimeDispatch::arraycopy<35938374,HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 871	C++
jvm.dll!AccessInternal::arraycopy_reduce_types<35938372>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 1018	C++
jvm.dll!AccessInternal::arraycopy<35913732,HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * const * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 1172	C++
jvm.dll!Access<35913728>::oop_arraycopy<HeapWordImpl *>(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, HeapWordImpl * const * src_raw, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, HeapWordImpl * * dst_raw, unsigned __int64 length) Line 136	C++
jvm.dll!ArrayAccess<33554432>::oop_arraycopy(arrayOop src_obj, unsigned __int64 src_offset_in_bytes, arrayOop dst_obj, unsigned __int64 dst_offset_in_bytes, unsigned __int64 length) Line 327	C++
jvm.dll!ObjArrayKlass::do_copy(arrayOop s, unsigned __int64 src_offset, arrayOop d, unsigned __int64 dst_offset, int length, JavaThread * __the_thread__) Line 198	C++
jvm.dll!ObjArrayKlass::copy_array(arrayOop s, int src_pos, arrayOop d, int dst_pos, int length, JavaThread * __the_thread__) Line 290	C++
jvm.dll!JVM_ArrayCopy(JNIEnv_ * env, _jclass * ignored, _jobject * src, int src_pos, _jobject * dst, int dst_pos, int length) Line 308	C++
000002602e8c06a8()	Unknown

I need to turn off compressed oops with the serial collector as well but it looks like there is no check_oop_function for the serial collector. That said, this exploration of array copying code was insightful, showing how the data type sizes determine which path is taken for copying primitives and objects. There didn’t appear to be any red flags about removing the oop::operator= usage so I opened 8334475: UnsafeIntrinsicsTest.java#ZGenerationalDebug assert(!assert_on_failure) failed: Has low-order bits set by swesonga · Pull Request #20390 · openjdk/jdk (github.com) to fix the assertion failure. The most interesting part of this investigation was that the bad address was a data value (cp1252) staring right at me and I missed it. This was quite educational for me though.


Categories: Equine

Used Truck and Trailer Purchase Notes

One of the downsides of horse ownership is the cost. The cost of the horse is just the starting point. Transporting the horse is a non-trivial cost. We needed to buy a trailer and many of the trailers we looked at are heavy enough that we needed to buy a truck as well. This raised the question of what the minimum required towing capacity would be. It’s strange that these trucks are classified using tons. What Does Half-Ton, Three-Quarter-Ton, One-Ton Mean When Talking About Pickup Trucks? | Cars.com gives me the impression that tonnage is a historical artifact of payload measurement. A ton seems to be a reference to a Short ton – Wikipedia. It is also interesting that Toyota and Nissan don’t really have offerings above the half-ton classification. The approximate weight of the trailer (and the horse) that I want to transport (or alternatively, our camper) requires at least a 3/4-ton truck.

I can only afford used trucks though, so which models should I avoid? bad years for cars – Google Search did not have as many helpful results as I had hoped for. Used Cars to Avoid in 2024 | U.S. News (usnews.com) only covered 2019 and newer models. At least Worst Vehicles | CarComplaints.com had a list of vehicles to avoid. Used cars to avoid, ranked by Consumer Reports – Autoblog: Car News, Reviews and Buying Guides also had such a list and recommended a pre-purchase inspection. I read the list of tips they linked to: How to buy a used car — 9 tips for the best deal – Autoblog: Car News, Reviews and Buying Guides. I was left wondering how you would know whether there is a lien on the car. The title should show this (as per Guide to Car Liens: Lien Titles, Lienholders & More – CARFAX). It also suggests a Carfax Vehicle History Report – that sounds like a good idea, though I’m not sure how costly it is. Based on these resources, I concluded that we need to have the title and a bill of sale. Some due diligence related to the title: all persons on the title must sign/release the interest on the back of the title. The seller needs to print their name and sign beneath their printed name. At the end of the day, it turned out that we don’t really need a bill of sale.

One of the trucks we considered was a 2012 F-150 (see 2012 F 150 Towing Capacity Full Guide (with Charts) (truckauxiliary.com)). The concern here was that even though it had a towing package, it was still a half-ton truck. It was also likely to be outside our budget, so we didn’t really wait for that seller to give us a price. We also looked at a 2006 RAM 2500. Our mechanic took a look at it and exclaimed that everything that could possibly leak on that vehicle was leaking (transmission, power steering, etc.). That was an easy pass given that it was already at the top of our price range.

Fortunately, the next truck we looked at worked out. Cylinder 6 was misfiring on this truck (we could hear the tick) and this was confirmed by the diagnostic codes from the vehicle. We decided to buy a new spark plug and coil for that cylinder to see if we could fix it before driving off with the truck, but neither AutoZone nor O’Reilly Auto Parts had the right coil in stock. We walked out with just the spark plug and our mechanic replaced it. In the process of pulling out the old spark plug, the spark plug wire came apart, and we had to buy an entire Duralast Silicone Spark Plug Wire Set. Thankfully, that was all that was needed to address the cylinder misfiring. We were glad to have our mechanic available to fix that problem before we drove off with our “new” 2002 truck (through a private sale). We had also confirmed with our insurance company that the new vehicle was covered as we drove it away and that we had up to 5 days to add it to our insurance plan.

Ironically, the towing hitch (one of our most important requirements) was significantly damaged on this truck. However, the mechanic pointed out that it could be readily replaced (just don’t do any welding on the existing setup since its integrity cannot be guaranteed). The only question I had remaining was how to compute the tongue weight (this came up when we were looking up new hitches online). What is Tongue Weight and What Does it Mean for Safe Towing? explains various ways to determine the tongue weight. They also recommend their Weigh Safe hitch, which has a built-in scale (I like how convenient that makes things). This is the video they linked discussing this option.

Behind the scenes of the Ike Gauntlet: How to Measure Tongue Weight for Safe Towing
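
For reference, a common rule of thumb (general towing guidance, not something specific to the article above) is that tongue weight should be roughly 10–15% of the loaded trailer weight. A quick sketch of that arithmetic, using a made-up trailer weight:

#include <cstdio>

int main() {
  // Hypothetical loaded trailer weight (trailer + horse + gear), in pounds.
  const double gross_trailer_weight = 7000.0;

  // Rule of thumb: tongue weight should be about 10-15% of the loaded weight.
  const double tongue_min = 0.10 * gross_trailer_weight;  // 700 lb
  const double tongue_max = 0.15 * gross_trailer_weight;  // 1050 lb

  std::printf("Target tongue weight: %.0f to %.0f lb\n", tongue_min, tongue_max);
  return 0;
}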

I decided to buy the Adjustable Trailer Hitch, Adjustable Ball Mount, Truck Ball Hitch, Aluminum Trailer Tow Hitch w/Built in Scale for Anti Sway, 12,500 lbs GTW – Weigh Safe (what a mouthful). I wasn’t sure whether to get the aluminum or steel hitch even after reading Aluminum Vs. Steel Hitches, so I settled on this one because I couldn’t even find the steel equivalent. We’ll see how it holds up.


Categories: Linux, Programming

SFrame Resources

I have been learning about the SFrame tracing effort and figured I should document the resources I have reviewed. Indu Bhagat has been actively involved in the development of SFrame. This is one of her talks giving an overview of the objectives of SFrame. The overall idea is that profiling tools (e.g. perf) usually need to generate stack traces. She lists some methods used to generate stack traces, e.g. using frame pointers, EH frame, last branch records (LBR), and other heuristics. Each of these has its own advantages and pitfalls. SFrame encodes the minimal info required for stack tracing.

SFrame: The Simple Frame Stack Trace Format – Indu Bhagat, Oracle – YouTube
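
As a concrete picture of the frame-pointer approach she contrasts with SFrame, here is a rough sketch (my own illustration, assuming an x86-64-style convention where each frame saves the caller’s frame pointer next to the return address) of how an unwinder can walk that chain to collect a stack trace:

#include <cstdint>

// With frame pointers enabled, the saved frame pointers form a linked list on
// the stack: each frame records the caller's frame pointer and the address
// execution will return to in the caller.
struct Frame {
  Frame*   caller_fp;       // saved frame pointer of the calling frame
  uint64_t return_address;  // where execution resumes in the caller
};

// Walk the chain starting from the current frame pointer. A real profiler
// would also validate that each pointer stays within the thread's stack.
int capture_stack(const Frame* fp, uint64_t* pcs, int max_depth) {
  int depth = 0;
  while (fp != nullptr && depth < max_depth) {
    pcs[depth++] = fp->return_address;
    fp = fp->caller_fp;
  }
  return depth;
}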

I found additional videos by searching for sframe indu (there are lots of unrelated sframe results out there). This one by Steven and Indu covers potential issues that need to be addressed for JITted code.

Implementing sframes – Steven Rostedt, Indu Bhagat (youtube.com)

There are various informative discussions about SFrame out there, e.g.

  1. SFrame based stack tracer for user space in the kernel [LWN.net]
  2. Fedora’s tempest in a stack frame [LWN.net]

Indu and Steven also talk about SFrame at the Linux Storage, Filesystem, MM & BPF Summit | LF Events (linuxfoundation.org). This video feels a bit more detailed than the previous one. In this video, it is explicitly pointed out that frame pointers require setup in every function and increase register pressure. The discussion of this cost cites the Fedora’s tempest in a stack frame [LWN.net] article. Steven also discusses how the ORC unwinder was created to address the need for accurate stack unwinding, which is required, for example, for live kernel patching (see What is Linux kernel live patching? (redhat.com) for more info). SFrame is based on the ORC unwinder but targets user space.

Sframe – Steven Rostedt, Indu Bhagat – YouTube
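
To see why a side table can replace frame pointers, here is a toy illustration (not the actual ORC or SFrame on-disk format) of the table-driven idea: the unwinder looks up the current program counter in a sorted table that records, for each code range, where the return address and the caller’s stack pointer can be recovered from, so ordinary functions pay no per-call setup cost and the work happens only when a trace is actually taken.

#include <algorithm>
#include <cstdint>
#include <vector>

// Toy unwind row: for code in [start_pc, end_pc), the return address and the
// caller's stack pointer can be recomputed from fixed offsets. Real ORC and
// SFrame entries are more compact and handle more cases than this.
struct UnwindRow {
  uint64_t start_pc;
  uint64_t end_pc;
  int32_t  ra_offset_from_sp;   // where the return address was saved
  int32_t  cfa_offset_from_sp;  // how to recover the caller's stack pointer
};

// Binary search for the row covering pc (table is sorted by start_pc).
const UnwindRow* find_row(const std::vector<UnwindRow>& table, uint64_t pc) {
  auto it = std::upper_bound(
      table.begin(), table.end(), pc,
      [](uint64_t value, const UnwindRow& row) { return value < row.start_pc; });
  if (it == table.begin()) return nullptr;
  --it;
  return (pc < it->end_pc) ? &*it : nullptr;
}

The unwinder would then repeat the lookup with each recovered return address until it reaches the top of the stack.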

These resources provide a high-level view of stack tracing/unwinding concerns. Tools like async-profiler/async-profiler: Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events (github.com) will probably look very different if they switch to using SFrame.


Categories: Equine

Horse Purchase Contracts

We decided to sell our horse a few months ago and buy another horse better suited to drill riding. The topic of which contract to use when buying a horse came up naturally. More specifically, the seller of the horse we were interested in wanted a right of first refusal (which I didn’t understand). My first go-to was the Horse purchase agreement – YouTube search. The video on Sales Fraud in the Horse Industry (youtube.com) was exactly what I needed. I’m summarizing the key points in this post so that I don’t have to watch the whole video again.

Sales Fraud in the Horse Industry

Some issues that horse buyers run into:

  1. Training and disposition are misrepresented: the buyer didn’t seek enough info, or the seller wasn’t clear/transparent (e.g. about horse vices).
  2. The horse is smaller/larger than represented, e.g. with minis where the seller doesn’t make a representation about the horse’s size (or misrepresents it).
  3. The horse being drugged during the buyer’s evaluation.

She gives advice on reducing disputes, e.g. sellers should ensure advertisements are true and accurate. One of the behaviors mentioned (that buyers complain about) is cribbing, which I don’t think I had heard of before.

What is cribbing, and how to stop your horse from cribbing

Some recommendations from the video include:

  1. put things in writing
  2. avoid one-size-fits-all forms (e.g. are they valid in your state?)
  3. don’t leave major terms to guesswork
  4. specify the buyer, seller, price, terms, and the horse (registered date, foaling date)
  5. avoid underage contract signers (such contracts will not be enforceable in most places).

She also reviews the buyer perspective, the top tip being “if it’s important to you, get it in writing.” Any seller not willing to put things in writing (e.g. the horse is bomb proof) is a red flag. I had to research the meaning of this term but got the gist of it from Bombproof – Western Horseman and The Myth of the “Bomb-Proof” Horse – Horse Trail Chicks. Other things to consider putting in writing are medical conditions:

  1. whether the horse has had a medical colic.
  2. whether the horse needs to receive joint injections (that’s a thing?).

Other recommendations include:

  1. having the seller’s name and signature on the contract (thus ensuring promises are not just from a seller’s agent, who might not have really known the horse) and specifying who pays the seller’s agent’s commission.
  2. hiring an independent vet to examine the horse before buying (e.g. to avoid paying a lot for a horse that has been denerved).
  3. getting a drug screen.

She delves into the topic of releases, giving an example of a closed head injury that resulted in institutionalization of the rider, but the release didn’t use the language required to make it enforceable! That definitely caught my attention.

Other Recommendations

This was a great talk overall, so I looked up the advertised references: My Horse University Online Horse Management and My Horse University | Facebook appear to have tons of resources on all matters equine. I’m also inspired to learn about the types of horse insurance.