# Dive Kit Open Datasets Documentation

Welcome! This guide will help you understand and contribute to the Dive Kit Open datasets, which cover certification agencies, certifications, cylinders, and dive signals. No technical expertise required!

## Table of Contents

- [What are these datasets?](#what-are-these-datasets)
- [Understanding the Data](#understanding-the-data)
  - [Agencies Dataset](#agencies-dataset)
  - [Certifications Dataset](#certifications-dataset)
  - [Cylinders Dataset](#cylinders-dataset)
  - [Dive Signals Dataset](#dive-signals-dataset)
- [How to Read the Data](#how-to-read-the-data)
- [How to Contribute](#how-to-contribute)
- [Examples](#examples)
- [Common Questions](#common-questions)
- [Need More Help?](#need-more-help)

## What are these datasets?

The Dive Kit Open project maintains four main datasets:

1. **Agencies** - A list of scuba diving certification agencies (like PADI, SSI, NAUI)
2. **Certifications** - A comprehensive list of diving certifications offered by these agencies
3. **Cylinders** - Specifications for common scuba cylinders (volume, pressure, buoyancy)
4. **Dive Signals** - Diver communication signals (hand, light, and buddy-contact signals) with openly licensed illustrations

These datasets help developers, dive shops, and diving platforms standardize diving information across the industry.

## Understanding the Data

### Agencies Dataset

The agencies dataset (`datasets/agencies.json`) contains information about diving certification agencies.

Each agency entry includes:

- **id**: A unique identifier (e.g., "agency-padi")
- **name**: The official agency name (e.g., "Professional Association of Diving Instructors")
- **abbr**: The common abbreviation (e.g., "PADI")
- **website**: The agency's official website
- **status**: Whether the agency is "active", "merged", or "defunct"
- **logo**: Each agency has a logo file in `assets/agency-logos/`, named after the agency `id` (e.g., `agency-example.png`)
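
As a quick illustration of how these fields can be used, here is a minimal Python sketch that picks out the active agencies. It works on a small in-memory list rather than the real file, because the exact top-level layout of `agencies.json` is not described here; the helper name `active_abbreviations` and the second sample entry are made up for the example.

```python
# Sample entries shaped like the fields listed above (illustrative values only).
agencies = [
    {"id": "agency-padi", "name": "Professional Association of Diving Instructors",
     "abbr": "PADI", "website": "https://www.padi.com", "status": "active"},
    {"id": "agency-example", "name": "Example Diving Agency",
     "abbr": "EDA", "website": "https://example-diving.org", "status": "merged"},
]

def active_abbreviations(entries):
    """Return the abbreviations of agencies whose status is "active"."""
    return [a["abbr"] for a in entries if a.get("status") == "active"]

print(active_abbreviations(agencies))  # ['PADI']
```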

### Certifications Dataset

The certifications dataset (`datasets/certifications.json`) contains all diving certifications.

Each certification entry includes:

- **id**: A unique identifier (e.g., "cert-padi-ow")
- **agency**: Which agency issues this cert (e.g., "agency-padi" or "PADI")
- **name**: The official certification name (e.g., "Open Water Diver")
- **abbr**: Common abbreviation (e.g., "OW")
- **category**: Type of certification:
  - `recreational` - Basic diving certifications
  - `technical` - Advanced/deep diving
  - `cave` - Cave diving specialties
  - `rescue` - Emergency/rescue training
  - `professional` - Instructor/divemaster levels
  - `freediving` - Breath-hold diving
  - `specialty` - Specific skills (photography, wreck, etc.)
- **prerequisites**: What you need before taking this course
- **limits**: What you can do with this certification
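- **equivalent_to**: Similar certifications from other agencies
- **status**: The certification's current state (e.g., "active"; see [Common Questions](#common-questions) for "deprecated" and "renamed")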
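
If you are comfortable with a little Python, the category list above can be turned into a simple check. This is only a sketch: the function name `invalid_categories` and the second sample entry (with a deliberate typo) are hypothetical, and the project's own validation may work differently.

```python
# The seven category values listed above.
CATEGORIES = {
    "recreational", "technical", "cave", "rescue",
    "professional", "freediving", "specialty",
}

def invalid_categories(certs):
    """Return the IDs of certifications whose category is not in the list."""
    return [c["id"] for c in certs if c.get("category") not in CATEGORIES]

sample = [
    {"id": "cert-padi-ow", "category": "recreational"},
    {"id": "cert-example-typo", "category": "recreation"},  # typo on purpose
]
print(invalid_categories(sample))  # ['cert-example-typo']
```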

### Cylinders Dataset

The cylinders dataset (`datasets/cylinders.json`) contains specifications for common scuba cylinders.

Each cylinder entry includes:

- **id**: A unique identifier (e.g., "al80")
- **commonName**: The name divers use (e.g., "AL80")
- **waterVolumeL**: The cylinder's **true internal water volume** in liters, the physical space inside. This is the real figure, not a number tuned so that the ideal-gas math reproduces the cylinder's marketed name.
- **workingPressureBar**: Rated working pressure in bar
- **ratedCapacityCuft** (optional): The **marketed cubic-foot name** carried by imperial cylinders (an "AL80" carries `80`). It comes from the ideal gas law and is a label, not a measured deliverable.
- **material**: "steel", "aluminum", "composite", or "carbon_fiber"
- **emptyWeightKg** and **buoyancyKg**: Weight and buoyancy characteristics (full, at 50 bar, empty)

**On "free gas" capacity.** This dataset stores the physical facts (water volume and working pressure) plus the marketed name, and leaves the deliverable free gas for the consumer to derive. There is no single capacity number, because the answer depends on two choices:

1. **Gas law.** The ideal gas law (`free gas = waterVolumeL × workingPressureBar`) overstates a high-pressure cylinder. A real-gas model divides by a compressibility factor Z (about 1.03 for air at 207 bar), which is closer to the truth. An AL80 (11.1 L, 207 bar) is about 80 cu ft by the ideal law and about 77 to 79 cu ft real.
2. **Surface reference.** Free gas is measured at the surface, but is that 1 bar or 1 atmosphere (1.01325 bar)? The 1-atm convention behind US manufacturer charts (Luxfer lists ~77.4 cu ft for an AL80) reads about 1.3% lower than a 1-bar convention.

So one AL80 can legitimately read 80 (marketed, ideal), ~77.4 (Luxfer, real gas at 1 atm), or ~79 (real gas at 1 bar), all from the same `waterVolumeL` and `workingPressureBar`. Store the physics; pick your convention when you display it.
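
The arithmetic behind those numbers can be sketched in a few lines of Python. The function name `free_gas_cuft` and its parameters are made up for this example. Z = 1.03 is the approximate value quoted above, so the result lands near a manufacturer's figure without matching it exactly.

```python
LITERS_PER_CUFT = 28.316846592  # one cubic foot in liters
ATM_IN_BAR = 1.01325            # one standard atmosphere in bar

def free_gas_cuft(water_volume_l, working_pressure_bar, z=1.0, surface_bar=1.0):
    """Free gas in cubic feet for a given gas law and surface reference.

    z=1.0 is the ideal gas law; z > 1 applies a real-gas correction.
    surface_bar is the surface reference pressure (1.0 bar or 1.01325 bar).
    """
    liters = water_volume_l * working_pressure_bar / (z * surface_bar)
    return liters / LITERS_PER_CUFT

# AL80 figures from the text above.
al80 = {"waterVolumeL": 11.1, "workingPressureBar": 207}
v, p = al80["waterVolumeL"], al80["workingPressureBar"]

ideal_atm = free_gas_cuft(v, p, z=1.0, surface_bar=ATM_IN_BAR)
real_bar = free_gas_cuft(v, p, z=1.03, surface_bar=1.0)
real_atm = free_gas_cuft(v, p, z=1.03, surface_bar=ATM_IN_BAR)

print(f"ideal gas, 1 atm: {ideal_atm:.1f} cu ft")  # about 80.1
print(f"real gas, 1 bar:  {real_bar:.1f} cu ft")   # about 78.8
print(f"real gas, 1 atm:  {real_atm:.1f} cu ft")   # about 77.7
```

The same two stored values produce all three readings; only the `z` and `surface_bar` arguments change.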

### Dive Signals Dataset

The dive signals dataset (`datasets/dive-signals.json`) is a machine-readable index of diver
communication signals, each paired with a vector illustration in `assets/dive-signals/`.

Each signal entry includes:

- **id**: A unique identifier (e.g., "hand-core-ok")
- **category**: One of:
  - `hand-core` - Essential hand signals every diver learns (OK, problem, up, share air)
  - `hand-technical` - Technical-diving hand signals (deco, gas switch, numbers 0-9)
  - `hand-fish-id` - Fun hand signals for pointing out marine life (turtle, shark, octopus)
  - `light` - Torch signals for night diving (OK circle, attention, emergency)
  - `touch-contact` - Buddy-contact (touch) signals for low or no visibility
- **name**: The signal's English name
- **description**: How the signal is performed and what it means
- **image**: Path to the signal's SVG illustration

The illustrations are licensed **CC BY 4.0**: you may use them anywhere, including commercially,
with the credit "Dive signals by Project Dive Kit — https://divekit.app". See
`assets/dive-signals/LICENSE.md`.

> **Safety note:** signal meanings vary slightly between training agencies and regions. Always
> agree on signals with your buddy or team before the dive. This dataset documents common usage;
> it is not a substitute for training.

## How to Read the Data

The data is stored in JSON format, which looks like this:

```json
{
  "id": "cert-padi-ow",
  "agency": "agency-padi",
  "name": "Open Water Diver",
  "abbr": "OW",
  "category": "recreational",
  "status": "active"
}
```

Think of it like a form where:

- Each line is a field name followed by its value
- Text values are in quotes
- The whole entry is wrapped in curly braces `{}`
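
Most programming languages can read this format directly. As a small example, Python's built-in `json` module turns the entry above into a dictionary whose keys are the field names:

```python
import json

# The same entry as above, as a JSON string.
raw = """
{
  "id": "cert-padi-ow",
  "agency": "agency-padi",
  "name": "Open Water Diver",
  "abbr": "OW",
  "category": "recreational",
  "status": "active"
}
"""

entry = json.loads(raw)   # parse the text into a Python dict
print(entry["name"])      # Open Water Diver
print(sorted(entry))      # the field names, alphabetically
```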

### Understanding References

When you see a value starting with:

- `agency-` → This refers to an agency in the agencies dataset
- `cert-` → This refers to another certification

If a value doesn't start with these prefixes, it's just plain text.

For example:

- `"agency": "agency-padi"` → Links to PADI in the agencies dataset
- `"agency": "PADI"` → Just the text "PADI"
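
In code, the prefix rule can be applied like this. This is a sketch only: the helper `resolve` and the two lookup dictionaries are invented for the example and are not part of the project.

```python
def resolve(value, agencies_by_id, certs_by_id):
    """Classify a field value as an agency reference, a cert reference, or text.

    Returns a (kind, target) pair. For references, target is the matching
    entry, or None if the ID is not in the lookup.
    """
    if value.startswith("agency-"):
        return ("agency", agencies_by_id.get(value))
    if value.startswith("cert-"):
        return ("cert", certs_by_id.get(value))
    return ("text", value)

# Tiny lookups keyed by ID (illustrative data).
agencies_by_id = {"agency-padi": {"id": "agency-padi", "abbr": "PADI"}}
certs_by_id = {"cert-padi-ow": {"id": "cert-padi-ow", "name": "Open Water Diver"}}

print(resolve("agency-padi", agencies_by_id, certs_by_id))
print(resolve("PADI", agencies_by_id, certs_by_id))  # ('text', 'PADI')
```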

## How to Contribute

We welcome contributions! Here's how to help:

### 1. Reporting Issues

If you find incorrect information:

1. Go to the [GitHub repository](https://github.com/lazuli-global/divekit-open-data)
2. Click "Issues" → "New Issue"
3. Describe what's wrong (e.g., "PADI Advanced Open Water max depth is 30m, not 40m")

### 2. Adding New Agencies

To add a new agency:

1. **Check if it already exists** - Search the `agencies.json` file
2. **Prepare the information**:

   - Official agency name
   - Common abbreviation
   - Official website
   - Logo image (PNG or SVG preferred)

3. **Add the agency entry** to `datasets/agencies.json`:

```json
{
  "id": "agency-example",
  "name": "Example Diving Agency",
  "abbr": "EDA",
  "website": "https://example-diving.org",
  "status": "active"
}
```

4. **Add the logo** to `assets/agency-logos/` named `agency-example.png`
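
If you want to double-check the logo step yourself, the idea can be sketched in Python as follows. The function name `missing_logos` is hypothetical, and the check assumes the logo file is named after the agency ID with a `.png` or `.svg` extension. The demo uses a temporary folder so it does not touch the real `assets/agency-logos/` directory.

```python
import tempfile
from pathlib import Path

def missing_logos(agencies, logo_dir):
    """Return IDs of agencies with no <id>.png or <id>.svg file in logo_dir."""
    logo_dir = Path(logo_dir)
    return [
        a["id"] for a in agencies
        if not any((logo_dir / f"{a['id']}{ext}").is_file() for ext in (".png", ".svg"))
    ]

# Demo: one agency has a logo file, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "agency-example.png").touch()
    demo = missing_logos([{"id": "agency-example"}, {"id": "agency-nologo"}], tmp)

print(demo)  # ['agency-nologo']
```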

### 3. Adding New Certifications

To add a new certification:

1. **Check if it already exists** - Search the `certifications.json` file
2. **Gather information**:

   - Official certification name
   - Which agency issues it
   - Prerequisites (if any)
   - Maximum depth/limits
   - Equivalent certifications from other agencies

3. **Add the certification** to `datasets/certifications.json`:

```json
{
  "id": "cert-example-advanced",
  "agency": "agency-example",
  "name": "Advanced Diver",
  "abbr": "AD",
  "category": "recreational",
  "status": "active",
  "prerequisites": {
    "certifications": ["cert-example-open-water"],
    "general": ["Minimum age 15 years", "10 logged dives"]
  },
  "limits": ["Maximum depth 30m", "No decompression diving"]
}
```
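
Before submitting, it is worth checking that every certification listed under `prerequisites` actually exists. Here is a minimal Python sketch of that check; the function name `missing_prerequisites` is made up, and it assumes the `prerequisites.certifications` layout shown in the example above.

```python
def missing_prerequisites(certs):
    """Return (cert_id, missing_id) pairs for prerequisites not in the dataset."""
    known_ids = {c["id"] for c in certs}
    problems = []
    for cert in certs:
        prereqs = cert.get("prerequisites") or {}  # tolerate null or absent
        for required in prereqs.get("certifications", []):
            # Only values with the cert- prefix are references; others are text.
            if required.startswith("cert-") and required not in known_ids:
                problems.append((cert["id"], required))
    return problems

# The example entry above, on its own: its prerequisite is not in the list.
certs = [
    {
        "id": "cert-example-advanced",
        "agency": "agency-example",
        "prerequisites": {"certifications": ["cert-example-open-water"]},
    }
]
print(missing_prerequisites(certs))
# [('cert-example-advanced', 'cert-example-open-water')]
```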

### 4. Updating Existing Data

To fix or update existing information:

1. Find the entry in the appropriate file
2. Make your changes
3. Ensure the format stays the same (quotes, commas, etc.)
4. Submit your changes

### 5. Validation

After making changes, validate your data:

1. Make sure all quotes and commas are in the right places
2. Check that agency and certification IDs follow the format: `agency-` or `cert-` plus lowercase letters, numbers, and hyphens
3. Verify that referenced agencies/certifications exist
4. Ensure unique agency names and abbreviations (no two agencies can have the same name or abbreviation)
5. Ensure unique certification names per agency (each agency can only have one certification with a given name)
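
The ID format in step 2 can be expressed as a regular expression. The pattern below is one reasonable reading of the rule, not necessarily the one the project's validator uses. It allows numbers after the prefix because the Common Questions section lists them as valid.

```python
import re

# Prefix "agency-" or "cert-", then lowercase letters, digits, and hyphens.
ID_PATTERN = re.compile(r"(agency|cert)-[a-z0-9-]+")

def is_valid_id(value):
    """Return True if the whole string matches the ID pattern."""
    return ID_PATTERN.fullmatch(value) is not None

for candidate in ["agency-padi", "cert-padi-ow", "Agency-PADI", "cert-open_water"]:
    print(candidate, is_valid_id(candidate))
```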

#### Uniqueness Constraints

**For Agencies:**

- Each agency must have a unique `name` - no two agencies can share the same official name
- Each agency must have a unique `abbr` (abbreviation) - no two agencies can use the same abbreviation

**For Certifications:**

- Within each agency, certification names must be unique: the combination of agency + certification name cannot repeat
- Within each agency, certification abbreviations must be unique: the combination of agency + certification abbreviation cannot repeat
- Different agencies CAN have certifications with the same name (e.g., both PADI and SSI have "Open Water Diver")
- Different agencies CAN have certifications with the same abbreviation (e.g., both PADI and SSI can have "AOW")
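
These rules can be sketched in Python by counting how often each value appears. This is an illustration of the idea, not the project's validation script. The helper names are made up, and the sample data assumes every entry has `name`, `abbr`, and (for certifications) `agency` fields.

```python
from collections import Counter

def duplicates(values):
    """Return the values that appear more than once, sorted."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)

def agency_duplicates(agencies):
    """Agency names and abbreviations must be unique across the dataset."""
    return {
        "name": duplicates(a["name"] for a in agencies),
        "abbr": duplicates(a["abbr"] for a in agencies),
    }

def cert_duplicates(certs):
    """Cert names and abbreviations must be unique within each agency."""
    return {
        "name": duplicates((c["agency"], c["name"]) for c in certs),
        "abbr": duplicates((c["agency"], c["abbr"]) for c in certs),
    }

certs = [
    {"agency": "agency-padi", "name": "Open Water Diver", "abbr": "OW"},
    {"agency": "agency-ssi", "name": "Open Water Diver", "abbr": "OW"},  # allowed
    {"agency": "agency-example", "name": "Advanced Diver", "abbr": "AD"},
    {"agency": "agency-example", "name": "Adventure Diver", "abbr": "AD"},  # clash
]
print(cert_duplicates(certs))
# {'name': [], 'abbr': [('agency-example', 'AD')]}
```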

**Running Validation:**

The project includes a validation script that checks all these constraints:

```bash
./scripts/validate.sh
```

This script will:

- Validate JSON syntax and schema compliance
- Check for duplicate IDs
- Verify agency names and abbreviations are unique
- Ensure certification names and abbreviations are unique within each agency
- Confirm all agency logos exist

## Examples

### Example: Finding Equivalent Certifications

Let's say you have a PADI Open Water certification and want to know the SSI equivalent:

1. Find the PADI Open Water entry in `certifications.json`
2. Look at the `equivalent_to` field
3. You'll see it lists "SSI Open Water Diver"

### Example: Understanding Prerequisites

To see what you need for PADI Advanced Open Water:

```json
{
  "id": "cert-padi-aow",
  "prerequisites": {
    "certifications": ["cert-padi-ow"],
    "general": ["Minimum age 12 years"]
  }
}
```

This means you need:

- PADI Open Water certification (or equivalent)
- To be at least 12 years old
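
Prerequisites can stack up: a course may require a certification that has prerequisites of its own. The Python sketch below follows the `certifications` list step by step to collect the whole chain. The function name `prerequisite_chain` and the `cert-example-deep` entry are hypothetical and exist only to show a two-step chain.

```python
def prerequisite_chain(cert_id, certs_by_id, seen=None):
    """Return every certification ID required before cert_id, nearest first."""
    seen = set() if seen is None else seen
    chain = []
    entry = certs_by_id.get(cert_id, {})
    prereqs = entry.get("prerequisites") or {}  # tolerate null or absent
    for required in prereqs.get("certifications", []):
        if required in seen:  # skip repeats and guard against loops
            continue
        seen.add(required)
        chain.append(required)
        chain.extend(prerequisite_chain(required, certs_by_id, seen))
    return chain

certs_by_id = {
    "cert-padi-ow": {"id": "cert-padi-ow"},
    "cert-padi-aow": {
        "id": "cert-padi-aow",
        "prerequisites": {"certifications": ["cert-padi-ow"]},
    },
    # Hypothetical entry, only here to make the chain two steps long.
    "cert-example-deep": {
        "id": "cert-example-deep",
        "prerequisites": {"certifications": ["cert-padi-aow"]},
    },
}
print(prerequisite_chain("cert-example-deep", certs_by_id))
# ['cert-padi-aow', 'cert-padi-ow']
```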

### Example: Agency Status

Some agencies have merged or closed:

```json
{
  "id": "agency-example",
  "status": "merged",
  "replaced_by": "agency-other"
}
```

This tells you that the agency has merged into another one, and `replaced_by` holds the ID of the agency that took its place.
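
If an application needs the agency that is current today, it can follow `replaced_by` until it reaches an entry without one. Here is a minimal Python sketch, assuming `replaced_by` always holds an agency ID as in the example above; the function name `current_agency` is made up.

```python
def current_agency(agency_id, agencies_by_id):
    """Follow replaced_by links and return the ID of the final agency."""
    seen = set()
    while agency_id in agencies_by_id and agency_id not in seen:
        seen.add(agency_id)  # remember visited IDs so a loop cannot run forever
        successor = agencies_by_id[agency_id].get("replaced_by")
        if not successor:
            return agency_id
        agency_id = successor
    return agency_id

agencies_by_id = {
    "agency-example": {"id": "agency-example", "status": "merged",
                       "replaced_by": "agency-other"},
    "agency-other": {"id": "agency-other", "status": "active"},
}
print(current_agency("agency-example", agencies_by_id))  # agency-other
```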

## Common Questions

### Q: Why do some fields have null or empty values?

A: Not all information is available for every certification or agency. We add data as we verify it.

### Q: Can I add certifications from my local dive shop?

A: No, we only include certifications from recognized training agencies that issue official certification cards.

### Q: What's the difference between "deprecated" and "renamed" status?

A:

- **Deprecated**: The certification is no longer offered but existing certs are still valid
- **Renamed**: The certification still exists but under a new name

### Q: How often is the data updated?

A: We review and update the data regularly. Check the `meta.version` field in each file to see when it was last updated.

### Q: Can I use this data in my own project?

A: Yes! Check the LICENSE.md file for details. The data is open source.

### Q: What if I find conflicting information?

A: Please report it as an issue. Include links to official sources so we can verify the correct information.

### Q: What are common validation errors?

A: Here are the most common validation errors and how to fix them:

1. **Duplicate agency name/abbreviation**: Two agencies have the same name or abbreviation
   - Fix: Check if one is a typo or if they're actually the same agency
2. **Duplicate certification name within agency**: An agency has two certifications with the same name
   - Fix: Often these are different levels (e.g., "Rescue Diver" vs "Master Rescue Diver") - make the names distinct
3. **Duplicate certification abbreviation within agency**: An agency has two certifications with the same abbreviation
   - Fix: Each certification needs a unique abbreviation within its agency (e.g., "RD" vs "MRD")
4. **Missing agency logo**: An agency exists in the dataset but has no logo file
   - Fix: Add a PNG or SVG logo to `assets/agency-logos/` with the agency ID as filename
5. **Invalid ID format**: IDs must follow the pattern `agency-xxx` or `cert-xxx`
   - Fix: Use only lowercase letters, numbers, and hyphens after the prefix

## Need More Help?

- **Questions?** Open a discussion on [GitHub](https://github.com/lazuli-global/divekit-open-data/discussions)
- **Found a bug?** Report it in [Issues](https://github.com/lazuli-global/divekit-open-data/issues)
- **Want to contribute code?** See [CONTRIBUTING.md](../CONTRIBUTING.md)

Remember: Every contribution helps make diving certification information more accessible to everyone! 🤿
