interactive pipeline building #76

Open
dezren39 opened this issue Dec 17, 2023 · 6 comments


@dezren39

i believe what i want is variables, but this is a special environment.

something akin to

$filteredData := grep /dev/kafka/local/topics/books [ .word_count == 300 || .title ~= 'the' ]  # <-- this doesn't execute, it stores the execution
$filteredData # <-- executes, user eventually ctrl+c
$filteredData | each { book -> http post https://example.com/new_books "{\"book_id\": #{$book.id}}" } # <-- executes
cat /dev/kafka/local/topics/page_views | enrich { data -> $filteredData "https://api.country.is/#{$view.ip_address}" } # <-- ? executes a new stream A*B and outputs all results ???

and then a way to flatten/export a single command i can put somewhere else / export history of repl?

@dezren39
Author

ah i found this!

https://docs.typestream.io/reference/language/experiments#variables

which is close to what i want too! is there a language construct for a stream yet?

@dezren39
Author

maybe it is equivalent to
$ typestream run 'cat /dev/kafka/local/topics/books | cut .title > /dev/kafka/local/topics/book_titles'
and maybe something like

  • a way to do typestream run from within the repl
  • a way to do typestream run 'anonymously' (with variables from the repl)

@lucapette
Contributor

yes the language supports variables via the let keyword (interested in why you couldn't find it at first since I'm sure there are a number of ways to improve docs discoverability. It's probably a little hard to say?)

What you're suggesting is in line with my thinking. I have some notes about it where what you're proposing would look like this:

let long_books = $(cat /dev/kafka/local/topics/books | grep [ .word_count > 100_000])
cat $long_books | grep "the" > /dev/kafka/local/topics/long_books_containing_the

the idea being that $() would store into long_books an anonymous function that resolves into a DataStream (the technical term TypeStream uses for the "streaming data" variable type)
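To make the deferred-execution idea concrete, here is a minimal Python sketch of a variable that stores a pipeline instead of running it. This is not TypeStream code and says nothing about how TypeStream would implement `$()`; the names `defer`, `books`, and `reads` are hypothetical, and a plain generator stands in for a DataStream.

```python
from typing import Callable, Iterable, Iterator

Record = dict
DataStream = Iterator[Record]  # assumption: a plain iterator stands in for a DataStream


def defer(source: Callable[[], Iterable[Record]],
          predicate: Callable[[Record], bool]) -> Callable[[], DataStream]:
    """Store a filtered pipeline without running it (hypothetical analogue of $(...))."""
    def resolve() -> DataStream:
        # Nothing is read from the source until the result is iterated.
        return (record for record in source() if predicate(record))
    return resolve


reads = []  # tracks which records were actually read, to demonstrate laziness


def books() -> Iterator[Record]:
    # Hypothetical stand-in for: cat /dev/kafka/local/topics/books
    for book in [
        {"title": "the long one", "word_count": 250_000},
        {"title": "short", "word_count": 40_000},
        {"title": "a big epic", "word_count": 180_000},
    ]:
        reads.append(book["title"])
        yield book


# Roughly: let long_books = $(cat books | grep [ .word_count > 100_000 ])
long_books = defer(books, lambda b: b["word_count"] > 100_000)
assert reads == []  # defining the variable executed nothing

# Roughly: cat $long_books | grep "the"
containing_the = [b["title"] for b in long_books() if "the" in b["title"]]
print(containing_the)
```

In this sketch every call to `long_books()` re-runs the stored pipeline from the source, which is one possible reading of "an anonymous function that resolves into a DataStream".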

Is this what you have in mind? It's quite some work to get this done but it's high on my list. Maybe only one other feature ("relational databases filesystem mounting") may be coming sooner than this.

This conversation makes me think I should probably make my "private roadmap" public (the docs contain a roadmap page, but it's pretty meagre compared to my own notes).

One more question 🙏

and then a way to flatten/export a single command i can put somewhere else / export history of repl?

I'm unsure I understand this. I have a todo for adding a --print-session flag to the history command to make it easier for people to copy their session out of a TypeStream session but I'm not entirely sure that's what you're asking. I'd love to hear more about it

@dezren39
Author

dezren39 commented Dec 18, 2023

yes the language supports variables via the let keyword (interested in why you couldn't find it at first since I'm sure there are a number of ways to improve docs discoverability. It's probably a little hard to say?)

I think I looked at specifications, data operators, the how to guides, kind of skimmed most of it, but had skipped over experiments altogether somehow. 😅

the idea being that $() would store into long_books an anonymous function that resolves into a DataStream (the technical term TypeStream uses for the "streaming data" variable type)
Is this what you have in mind? [..]

Yeah basically!!

This conversation makes me think I should probably make my "private roadmap" public (the docs contain a roadmap page, but it's pretty meagre compared to my own notes).
relational databases filesystem mounting

👀awesome

I'm unsure I understand this. I have a todo for adding a --print-session flag to the history command to make it easier for people to copy their session out of a TypeStream session but I'm not entirely sure that's what you're asking. I'd love to hear more about it

Awesome! Yeah, basically a way to print the session would be all I would really need. I can imagine that there might be a lot of varied things going on in one session though, so something that gave 'here's the program that produces the provided variable' could be useful. I can imagine that might be more complicated than it sounds though! Maybe a built-in function that outputs to a file: you provide a variable, it ?walks backward?, finds any other variables called, unwraps them, then produces a string of itself??? Or just like, you provide a variable, it finds all other variables mentioned, and makes a file with the 5 lines covering the full expression of just this one thing.

a = 1
b = 2
d = 3 <-- not used, pretend there were like, many more 'experiments'
c = a + b <-- final output thing, i want this to be my program
compile(c, out.txt)

out.txt:
a = 1
b = 2
c = a + b

or maybe:
c = 1 + 2

or:
1 + 2
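The "walk backward" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the session is a list of simple Python-style assignments (the pseudo-code above, not TypeStream syntax), and `slice_session` is a hypothetical helper, not an existing TypeStream command.

```python
import ast


def slice_session(lines: list[str], target: str) -> list[str]:
    """Keep only the assignments that `target` transitively depends on, in session order."""
    needed = {target}
    kept = []
    # Walk backward so each kept definition pulls in the names it uses.
    for line in reversed(lines):
        stmt = ast.parse(line).body[0]
        if not isinstance(stmt, ast.Assign) or not isinstance(stmt.targets[0], ast.Name):
            continue  # this sketch only understands `name = expression`
        name = stmt.targets[0].id
        if name in needed:
            kept.append(line)
            needed.discard(name)
            # Every variable read on the right-hand side is now needed too.
            needed |= {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
    return list(reversed(kept))


session = ["a = 1", "b = 2", "d = 3", "c = a + b"]
print("\n".join(slice_session(session, "c")))
```

For the session above this keeps the definitions of a, b, and c and drops d, which matches the first out.txt variant. The inlined variants (c = 1 + 2) would need an extra substitution pass that this sketch does not attempt.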

@lucapette
Contributor

all right, there are two maybe three issues here:

  • $() anonymous function syntax (and var type) support
  • --print-session option for the history command (whose docs are missing, I think)
  • a "clever" print out of the current session

Those are all valid ideas imo, the first two I had already planned myself so I'll be adding issues about those as soon as I have a minute for that.

The last one is a bit more difficult to figure out so I'm thinking I'll leave this issue open while I get ready to share a public roadmap (and start opening issues to signal which features are coming). That way I have some time to think about the "clever" session printout (I like the idea very much! My focus is dev experience with TypeStream and this feels great if we get it right)

@lucapette
Contributor

I made the roadmap public (even though it's quite bare, still better than private I think). I also just shipped --print-session (#88) to main, so it'll be out with the next release; since it's already Wednesday, that will be early next week.

I have private notes for the $() syntax and will be opening an issue soon
