2 changes: 1 addition & 1 deletion docs/StardustDocs/topics/Compiler-Plugin.md
@@ -135,7 +135,7 @@ is displayed when you hover on an expression or variable:

### @DataSchema declarations

Untyped DataFrame can be assigned a data schema - top-level interface or class that describes names and types of columns in the dataframe.
Untyped DataFrame can be assigned a data schema - top-level interface or data class that describes names and types of columns in the dataframe.

```kotlin
@DataSchema
8 changes: 4 additions & 4 deletions docs/StardustDocs/topics/Home.topic
@@ -19,16 +19,16 @@

<primary>
<title>First steps</title>
<a href="SetupKotlinNotebook.md"/>
<a href="SetupGradle.md"/>
Collaborator

Hmm, maybe we should keep the Kotlin Notebook setup page for a while, but just move it further down.

Collaborator Author

We already have a reference to the Quickstart guide in the first position, so I believe it's OK.

<a href="concepts.md"/>
<a href="operations.md"/>
<a href="read.md">Reading from files: CSV, JSON, ApacheArrow</a>
<a href="read.md">Reading from files: CSV, JSON, Excel and Apache Arrow</a>
</primary>

<secondary>
<title>Featured topics</title>
<a href="Kotlin-DataFrame-Features-in-Kotlin-Notebook.md"/>
<a href="Compiler-Plugin.md"/>
<a href="schemas.md"/>
<a href="Compiler-Plugin.md">Kotlin Compiler Plugin</a>
<a href="Data-Sources.md"/>
<a href="readSqlDatabases.md"/>
</secondary>
85 changes: 73 additions & 12 deletions docs/StardustDocs/topics/extensionPropertiesApi.md
@@ -2,7 +2,7 @@

When working with a [`DataFrame`](DataFrame.md), the most convenient and reliable way
to access its columns — including for operations and retrieving column values
in row expressions — is through *auto-generated extension properties*.
in [row expressions](DataRow.md#row-expressions) — is through *auto-generated extension properties*.
They are generated based on a [dataframe schema](schemas.md),
with the name and type of properties inferred from the name and type of the corresponding columns.
It also works for all types of hierarchical dataframes.
@@ -24,10 +24,17 @@ It also works for all types of hierarchical dataframes.
Consider a simple hierarchical dataframe from
<resource src="example.csv"></resource>.

This table consists of two columns: `name`, which is a `String` column, and `info`,
which is a [**column group**](DataColumn.md#columngroup) containing two nested
[value columns](DataColumn.md#valuecolumn) —
`age` of type `Int`, and `height` of type `Double`.
> Note that this is not a regular CSV file — it contains a column with embedded JSON values.
>
> To read such files correctly, both the [`dataframe-csv`](Modules.md#dataframe-csv)
> and [`dataframe-json`](Modules.md#dataframe-json) modules must be included.
> {style="note"}

This dataframe consists of two columns:
- `name`, which is a `String` column
- `info`, which is a [column group](DataColumn.md#columngroup) containing two nested [value columns](DataColumn.md#valuecolumn):
- `age` of type `Int`
- `height` of type `Double`

<table width="705">
<thead>
@@ -119,24 +126,27 @@ You can do it quickly with [`generate..()` methods](DataSchemaGenerationMethods.
Define schemas:

```kotlin
// Data schema of the "info" column group
@DataSchema
data class PersonInfo(
    val age: Int,
interface Info {
    val age: Int
    val height: Float
)
}

// Data schema of the entire DataFrame
@DataSchema
data class Person(
    val info: PersonInfo,
interface Person {
    val info: Info
    val name: String
)
}
```
```

Read the [`DataFrame`](DataFrame.md) from the CSV file and specify the schema with
[`.convertTo()`](convertTo.md) or [`cast()`](cast.md):

```kotlin
val df = DataFrame.readCsv("example.csv").convertTo<Person>()
val df = DataFrame.readCsv("example.csv").cast<Person>()
Collaborator

This is an interesting example... CSV is inherently flat, yet we have a nested type here XD. This can only occur if there's JSON inside the CSV, which is not that common.

Collaborator

Maybe specify that it's not "just a CSV file", but that it contains a JSON column

```

Extensions for this `DataFrame` will be generated automatically by the plugin,
@@ -229,3 +239,54 @@ interface Info {
val df = dataFrameOf("size\nin:inches" to listOf(..)).cast<Info>()
df.sizeInInches
```

## Custom extension properties

Sometimes it is useful to define your own extension properties
based on a [data schema](schema.md).

For example, consider a simple dataframe with two columns and the following `BranchData` schema:

```kotlin
@DataSchema
interface BranchData {
    val expenses: Long
    val revenue: Long
}
```

```kotlin
val df = DataFrame.readCsv("branchData.csv").cast<BranchData>()
```

You can define an extension property for `DataRow<BranchData>`
to create a convenient shortcut:

```kotlin
val DataRow<BranchData>.profit get() = revenue - expenses
```

You can then use it, for example, in [row expressions](DataRow.md#row-expressions):

```kotlin
val dfProfitable = df.filter { it.profit > 0 }
```

Note that if you change the actual schema of a dataframe
(by performing operations that modify its structure),
this extension property can no longer be used,
because it is tied to the specific schema.

```kotlin
df.add("name") { "branchName" }
    // unresolved because of `add`
    .filter { it.profit > 0 }
```

However, you can work around this by casting back to the original schema:

```kotlin
df.add("name") { "branchName" }
    // unresolved because of `add`
Collaborator

comment no longer applicable here

    .filter { it.cast<BranchData>().profit > 0 }
```
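
For reference, here is a minimal self-contained sketch of the custom extension property pattern from this section, using in-memory data instead of `branchData.csv`. It assumes the Kotlin DataFrame library is on the classpath. The sample values and the `profitableBranches` helper are hypothetical, and `profit` reads the columns by name (`this["revenue"]`) so that the sketch does not depend on generated extension properties being available.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.add
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter

@DataSchema
interface BranchData {
    val expenses: Long
    val revenue: Long
}

// Hand-written shortcut tied to the BranchData schema.
// Assumption: columns are read by name rather than via generated properties.
val DataRow<BranchData>.profit: Long
    get() = (this["revenue"] as Long) - (this["expenses"] as Long)

// Hypothetical helper: builds sample data, changes the structure with `add`,
// then casts each row back to BranchData so `profit` can be used again.
fun profitableBranches(): DataFrame<*> {
    val df = dataFrameOf("expenses", "revenue")(
        100L, 150L,
        200L, 120L
    ).cast<BranchData>()

    return df.add("name") { "branchName" }
        .filter { it.cast<BranchData>().profit > 0 }
}

fun main() {
    println(profitableBranches())
}
```

Only the first sample row has revenue greater than expenses, so the filter is expected to keep that row only.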