240: Old Man Coder

Links from the show:

This episode of PHPUgly was sponsored by:

Cloudways, a managed cloud hosting platform built for your PHP projects.
If you simply wish to focus on your business, Cloudways is the way to go. They take over server management and security and free up time that you can dedicate to growing your business and acquiring new clients.
The Platforms offers a choice of IaaS partners (AWS, Google Cloud, Digitalocean, Linode, and Vultr). In addition, you get a performance-optimized stack, managed backups, and staging environment where you can test your code before pushing it to live servers.
Best of all, Composer and Git come pre-installed so you can get your projects up and running quickly.
All this power, simplicity, and peace of mind falls right with their brand slogan - Moving Dreams Forward
Be sure to visit https://www.cloudways.com/en/php-hosting.php?id=833038 today. Sign up using the Promo code PHPUgly and get a $25 credit.

PHPUgly streams the recording of this podcast live. Typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Periscope. Also, be sure to check out our Patreon Page.

Twitter Account https://twitter.com/phpugly

Host:

Streams:

Powered by Restream

Patreon Page

PHPUgly Anthem by Harry Mack / Harry Mack Youtube Channel


PHP Internals News: Episode 88: Pure Intersection Types

PHP Internals News: Episode 88: Pure Intersection Types

In this episode of "PHP Internals News" I talk with George Peter Banyard (Website, Twitter, GitHub, GitLab) about the "Pure Intersection Types" RFC that he has proposed.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is Episode 88. Today I'm talking with George Peter Banyard about pure intersection types. George, could you please introduce yourself?

George Peter Banyard 0:30

Hello, my name is George Peter Banyard. I work on PHP code development in my free time. And on the PHP Docs.

Derick Rethans 0:36

This RFC is about intersection types. What are intersection types?

George Peter Banyard 0:40

I think the easiest way to explain intersection types is to use something which we already have, which are union types. So union types tells you I want X or Y, whereas intersection types tell you that I want X and Y to be true at the same time. The easiest example I can come up with is a traversable that you want to be countable as well. So traversable and countable. Currently, you can do intersection types in very hacky ways. So you can either create a new interface which extends both traversable and countable, but then all the classes that you want to be using this fashion, you need to make them implement the interface, which might not be possible if you using a library or other things like that. The other very hacky way of doing it is using reference and typed properties. You assign two typed properties by reference, one being traversable, one being countable, and then your actual property, you type alias reference it, with both of these properties. And then my PHP will check: does the property respect type A those reference? If yes, move to the next one. It doesn't respect type B, which basically gives you intersection types.

Derick Rethans 1:44

Yeah, I saw that in the RFC. And I was wondering like, well, people actually do that?

George Peter Banyard 1:49

The only reason I know that is because of Nikita's slide.

Derick Rethans 1:51

The thing is, if it is possible, people will do it, right. And that's how that works.

George Peter Banyard 1:56

Yeah, most of the times.

Derick Rethans 1:57

The RFC isn't actually called intersection types. It's called pure intersection types. What does the word pure do here?

George Peter Banyard 2:05

So the word pure here is not very semantic. But it's more that you cannot mix union types and intersection types together. The reasons for it are mostly technical. One reason is how do you mix and match intersection types and union types? One way is to have like union types take precedence over intersection types, but some people don't like that and want to explicit it grouping all the time. So you need to do parentheses, A intersection B, close parentheses, pipe for the union, and then the other type. But I think the main reason is mostly the variance, like the variance checks for inheritance are already kind of complicated and kind of mind boggling.

Derick Rethans 2:44

I'm sure we'll get into the variance rules in a moment. What is it actually what you're proposing to add here. What is the syntax, for example?

George Peter Banyard 2:52

So the syntax is any class type with an ampersand, and any other class type gives you an intersection type, which is the usual way of doing and.

Derick Rethans 3:01

When you say class types, do you also mean interfaces?

George Peter Banyard 3:04

Yes, PHP has a concept of class types, which are mostly any class in any interface. There's also a weird exception where parent and self are considered class types, but those are not allowed.

Derick Rethans 3:20

Okay, so it's just the classes that you've defined and the class that are part of the language but not a special keywords, self and parent and static, I suppose?

George Peter Banyard 3:28

Yes, the reason for that is standard types are not allowed to be part of an intersection, because nothing can be an integer and a string at the same time. Now, there are some of the built in types, which can be kind of true. You could have a callable, which is a string, because callables can be arrays, or can be a closure. But that's like very weird and not very great. The other one is iterable. If when you expand that out, you get redundant types, which we can talk about later. And the final thing is parent, self, and static, just makes for some very weird design questions, in my opinion, like, if you ask for something to be an intersection with itself, you basically can only enforce conditions on subclasses. You have a class and you say: Oh, I want it to return self, but also be countable for some reason, but I'm not countable. So if you extend me, then you need to be countable, but I'm not. So it's very weird. parent has kind of the very same weird semantics where you can ask a parent, but it's like, if the base class doesn't support it, and you ask for a parent to be an intersection, then you basically need the child to implement the interface and then a child to return the first child. If you do that main question. Why? Because I don't see any good reasons to do it. And it just makes everything harder.

Derick Rethans 4:40

You've only added for the sake of completeness instead of it being useful. Let's move on birds. You've mentioned which types are supported, which is class names and interface names. You already hinted a little bit at redundant types. What are redundant types?

George Peter Banyard 4:56

Currently, PHP already does that with union types. If you repeat the type twice in a union, you'll get a compile error. This only affects compiled time known aliases. If you use a use statement, then PHP knows that you basically using the same type. However you use a runtime alias, then it can't detect that.

Derick Rethans 5:13

A runtime alias, what's that?

George Peter Banyard 5:15

So if you use the function class_alias.

Derick Rethans 5:16

It's new to me!

George Peter Banyard 5:18

it technically exists. It also doesn't guarantee basically that the type is minimal, because it can only see those was in its own file. For example, if you say I want A and B, but B is a child class of A, then the intersection basically resolves to only B. But you can only know that at runtime if classes are defined in different files. So the type isn't minimal. But if you do redundant types, basically, it's a easy way to check if you might be typing a bug.

Derick Rethans 5:46

You try to do your best to warn people about that. But you never know for certain.

George Peter Banyard 5:51

You never know for certain because PHP doesn't compile everything into like one big program like in check. Static analyser can help for that.

Derick Rethans 5:59

Let's talk a little bit about technical aspects, because I recommend that implementing intersection types are quite different from implementing union types. What kind of hacks that you have to make in a parser and compiler for this?

George Peter Banyard 6:11

Our parser has being very weird. The parsing syntax should be the same as union types. So I just copy pasted what Nikita did. I tried it. It worked for return types without an issue. It didn't work with argument types, because bison, which is the tool which generates our parser, was giving a shift reduce conflict, which basically tells: Oh, I got two possible states I can go in, and I don't know which branch I need to go, because the PHP parser only does one look ahead. Because it was conflicting, the ampersand, either for the intersection type or for to mark a reference. Normally, if the paster is more developed, or does more look ahead, it is not a conflict. And it shouldn't be. Ilia managed to came up with this ingenious idea, which is just redefine the ampersand token twice and have very complicated names, and just use them in different contexts. And bison just: now I have no issue. It is the same token, it is the same character. Now that you have two different tokens it manages to disambiguate, like it's shift produce. So that's a very weird.

Derick Rethans 7:17

I'll have a look at what that actually does, because I'm curious now myself. Beyond the parser, I think the biggest and most complicated part of this is implementing the variance rules for these intersection types. Can you give a short summary of what a variance rules are, and potentially how you've actually implemented them?

George Peter Banyard 7:38

Since PHP seven point four, return types and up covariant, and parameter types are contravariant. Covariant means you can like restrict, we can be more specific. And contravariance means you can be broader or like more generic. Union types already gives some interesting covariance implications. Usually, you would think, well, a union is always broader than a single type, you say: Oh, I want either a traversable or accountable, it seems that you're expanding the type sphere. However, a single type can have as a subtype, a union type. For example, you say,:Oh, my base type is a Class A, and I have two child classes, which are B and C. I can type covariantly that I want either B or C, because B or C is more specific than just A. That's what union types over there allows you to do. And the way how it's implemented. And how to check for that is you traverse the list of child types, and check that the child type is an instance of at least one of the parents types. An intersection by virtue of you adding constraints on the type itself will always be more specific than just a single type. If you say: Oh, I want a class A, then more specifically, so I want something of class A and I want it to be countable. So you're already restrict this, which gives some very interesting implications, meaning that a child type can have more types attached to itself than a parent type. That's mostly due how PHP implements its type system, to make the distinctions, basically, I've added the flag, which is either this is a union, meaning that you need to check it is part of one, or it's an intersection. The thing with intersection types is that you need to reverse the order in how you check the types. So you basically need to check that the parent is at least an instance of one of the child types, but not that none of the child types is a super type of the parent type. Let's say you have class C, which extends Class B and Class B extends Class A. If I say let's say my base type is B to any function, and I give something which is a intersection T, any interface, this would not be a valid subtyping relation to underneath B. Because if you looked it was a Venn diagram in some sense, you've got A which is this massive sphere, you've got B which is inside it, and C which is inside it. A intersection something intersects the whole of A with something else, which might also intersect with B in a subset, but it is wider than just B, which means like the whole variance is very complicated in how you check it because you can't really reuse the same loop.

Derick Rethans 10:13

I can't imagine how much more complicated this gets when you have both intersection and union types in the same return type or parameter argument type.

George Peter Banyard 10:22

One of the primary reasons why it's currently not in the RFC, because it is already mind boggling. And although I think it shouldn't be that hard to like, add support for it down the line, because I've already split it mostly up so it should be easy to check: Oh, is this an intersection? Is this a union? And then you need to branch.

Derick Rethans 10:42

Luckily because standard types aren't included here, you also don't really have to think about coercive mode and strict mode for these types. Because that's simply not a thing.

George Peter Banyard 10:50

That's very convenient.

Derick Rethans 10:52

Is the future scope to this RFC?

George Peter Banyard 10:54

The obvious future scope is what I call composite types, is you have unions and intersections available in the same type. The main issue is mostly variance, because it's already complicated, adding more scope to it, it's going to make the variance go even harder. I think with most programming languages, the variance code is always complicated to read. While I was researching some of it, I managed to hit a couple of failures, which where with I think was Julia and the research paper I was it was just like focusing on a specific subset. And like, basically proving that it is correct. It's not a very big field. Professors at Imperial, which I've talked to, have been kind of helpful with giving some pointers. They mostly work with basically proper languages or compiled languages, which have this whole other set of implications. Apparently, they have like a bunch of issues about how you normalize the types like in an economical form, to make it easier to check. Which is probably one of the problems that will need to be addressed, when you get like such a intersection and union type. First, you normalize it to some canonical form, and then you work with it. But then the second issue is like how do you want the composite types to actually be? Is it oh, you have got parentheses when you want to mix and match? Or can you use like union precedence? I've heard both opinions. Basically, some people are very dead against using Union as a precedent.

Derick Rethans 12:14

My question is going to be, is this actually something people would use a lot?

George Peter Banyard 12:21

I don't think it would be used a ton. The moment you want to use it, it is very useful. One example is with the PSRs, the HTTP interfaces. Or if you want the link interface. Combining these multiple things gets it convenient. One of the reasons why I personally wanted as well, it's for streams. So currently, streams don't have any interface, don't have any classes. PHP basically internally checks when you call like certain string methods. For example, if you try to seek and you provide a user stream, it basically checks if you implement a seek method, which should be an interface. But you can't currently do that. Ideally, you would want to stream maybe like a base class, instead of having like a seekable stream, and rewindabe stream, or things like that. You basically just have interfaces. And then like if somebody wants a specific type of stream, just like a stream, which is seekable, which is rewindable. And other things. We already have that in SPL because there's an iterator. And we have a seekable iterator interface, which basically just ask: Oh, this is there's a seek method. I think it depends how you program. So if you separate the many things into interfaces, then you'll probably use intersections types a lot. If you use a maybe a more traditional PHP code base, which uses union types a lot. Union types are like going to be easier. And you want to reduce that.

Derick Rethans 13:32

Would you think that lots of people already use union types because it's pretty new as well. Isn't it?

George Peter Banyard 13:38

Union types are being implemented in various different libraries. PSRs are updating the interfaces to use union types. One use case, I also have a special method, which was taken the date, it takes a union of like a DateTime interface, a string or an integer. Although intersections types are really new, you hear people when union types were being introduced, you heard people saying, I would promote bad cleaning habits, you shouldn't have one specific type. And if you're using a union, you have a design issue. And I had many people complaining to me why and intersection types of see? Why they haven't intersection types being introduced first, because intersection types are more useful. But then you see other people telling us like, I don't see the point in intersection types. Why would you use an intersection type, just use your concrete class, because that's what you're going to type anyway.

Derick Rethans 14:21

I can give you a reason why union types have implemented first, over intersection types, I think, which is that it's easier to implement.

George Peter Banyard 14:28

It's easier to implement. And it's more useful for PHP as a whole, because PHP functions accepts a union or return a union. Functions return false for error states instead of null. It makes sense why union types were introduced first, because they are mostly more useful within the scope of what PHP does.

Derick Rethans 14:46

Do you think you have anything else to add about intersection types? At the moment, it's already up for voting, when is that supposed to end?

George Peter Banyard 14:54

So the vote is meant to end on the 17th of June.

Derick Rethans 14:57

At the moment I see there's 15 votes for and two against so it's looking good. What's been your most pushback on this? If there was any at all?

George Peter Banyard 15:05

Mostly: I don't see the point in it. However, I do think proper reasons why you don't want it, compared to like some other features where it's more like have thoughts on what you think design wise. But it is undeniable that you you add complexity to the variance. And to the variance check. It is already kind of complicated. I have like a hard time reading it initially. There's the whole parser hackery thing, which is kind of not great. It's probably just because we use like a restricted parser because it's faster and more efficient.

Derick Rethans 15:36

I think I spoke with Nikita about parsers some time ago and what the difference between them were. If I remember which episode it was all the to the show notes.

George Peter Banyard 15:44

And I think the last reason against it is that it only accepts pure intersections. You could argue that, well, if you're adding intersections, you should add the whole feature set. It might impact the implementation of type aliases, because if you type alias T to be a union of A and B, and then you use type T in an intersection, you basically get a mixture of unions and intersections, that you need to be able to work with. The crux of this whole feature is the variance implementation. And being able to rationalize the variance implementation and been to extend it, I think it's the hardest bit.

Derick Rethans 16:18

I guess the next thing still missing would be type aliases, right? Like names for types, which you can't define just yet, which I think you also mentioned in the RFC is future scope.

George Peter Banyard 16:29

Yeah.

Derick Rethans 16:30

Thank you, George, for taking the time today to talk to me about pure intersection types.

George Peter Banyard 16:36

Thanks for having me on the show.

Derick Rethans 16:41

Thank you for listening to this installment of PHP internals news, the podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.


239: I’m a PHP Uggo

This week on the podcast, Eric, John, and Thomas talk about Eric's journey to learn C, pull request workflows, new PHP RFCs, and more...

Links from the show:

This episode of PHPUgly was sponsored by:

Cloudways, a managed cloud hosting platform built for your PHP projects.
If you simply wish to focus on your business, Cloudways is the way to go. They take over server management and security and free up time that you can dedicate to growing your business and acquiring new clients.
The Platforms offers a choice of IaaS partners (AWS, Google Cloud, Digitalocean, Linode, and Vultr). In addition, you get a performance-optimized stack, managed backups, and staging environment where you can test your code before pushing it to live servers.
Best of all, Composer and Git come pre-installed so you can get your projects up and running quickly.
All this power, simplicity, and peace of mind falls right with their brand slogan - Moving Dreams Forward
Be sure to visit https://www.cloudways.com/en/php-hosting.php?id=833038 today. Sign up using the Promo code PHPUgly and get a $25 credit.

PHPUgly streams the recording of this podcast live. Typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Periscope. Also, be sure to check out our Patreon Page.

Twitter Account https://twitter.com/phpugly

Host:

Eric Van Johnson https://twitter.com/shocm
John Congdon https://twitter.com/johncongdon
Tom Rideout https://twitter.com/realrideout
Youtube Channel - https://www.youtube.com/phpugly

Twitch - https://www.twitch.tv/phpugly

Periscope - https://www.periscope.tv/phpugly/

Powered by Restream https://restream.io/

Also, be sure to check out our Patreon Page - https://www.patreon.com/phpugly

PHPUgly Anthem by Harry Mack

Youtube: https://www.youtube.com/channel/UC59ZRYCHev_IqjUhremZ8Tg
Twitter: https://twitter.com/harrymack


PHP Internals News: Episode 87: Deprecating Ticks

PHP Internals News: Episode 87: Deprecating Ticks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Deprecating Ticks" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick, welcome to PHP internals news, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 87. Today I'm talking with Nikita Popov about a much smaller RFC this time: Deprecating Ticks. Nikita, would you please introduce yourself.

Nikita Popov 0:34

Hi Derick, I'm Nikita, and I'm working on PHP core development on behalf of JetBrains.

Derick Rethans 0:40

Let's jump straight into what this RFC is about, and that's the word ticks. What are ticks?

Nikita Popov 0:46

Ticks are a declare directive,. You write declare ticks equals one at the top of your file, and then PHP we'll call a tick function after every statement execution. Or if you write ticks equals two, then as we'll call it the function after every two statement executions.

Derick Rethans 1:05

Do you have to specify which function that calls?

Nikita Popov 1:08

Of course, so there is also a register tick function and unregister tick function and that's how you specify the function that should be called rather the functions.

Derick Rethans 1:17

How does this work, historically, because the RFC talks about the change being made in PHP seven?

Nikita Popov 1:22

Technically ticks work by introducing an opcode after every statement that calls the tick function depending on current count. The difference that was introduced in PHP seven is to what the tick declaration applies. The way PHP language semantics are supposed to work, is that declare directives are always local. The same way that strict types, only applies to a single file, ticks should also only apply to a single file. Prior to PHP seven, it didn't work out way. So if you had declare ticks, somewhere in your file, it would just enable ticks from that point forward. If you included the different file or even if the autoloader was triggered and included a different file that one would also make use of ticks. That was fixed in PHP seven, so now it is actually file local, but that also means that the ticks functionality at that point behaviour became, like, not very useful. Because usually if you want to use tics you actually want them to apply it to your whole codebase. There are ways around that. I'm afraid to say that people have approached me after this RFC and told me that they actually do that. The way around that is to register a stream wrapper. It's possible in PHP to unregister the file stream wrapper and register your own one, and then it's possible to intercept all the file includes and rewrite the file contents to include the declare ticks at the top of the file. I do use that general mechanism for real things in other places, but apparently people actually use that to like instrument, a whole application with ticks, and essentially restore the behaviour we had in PHP 5.

Derick Rethans 3:03

What was the intended use case for ticks to begin with?

Nikita Popov 3:07

Well I'm not sure what was the intended use case, but at least it was the main use case, and that's signal handling. In the PCNTL extension allows you to register a signal handler, and when the signal arrives, we can't just directly call that signal handler, because signals are only allowed to call functions without that our async signal safe. Which excludes things like memory allocation, and a lot of other things that PHP uses. What we do instead is we only set the flag that okay signal has arrived and then we have to actually run the signal handler at some later point in time. In PHP five, that worked using ticks. You declare ticks, and the PCNTL extension registered the tick handler, and then after this flag was set, it would execute your callback on the next tick. In PHP seven, an attentive mechanism was introduced, that is based on virtual machine interrupts. Those were originally introduced for time-out handling, because there we have a similar problem, that when timeout arrives, we might be in some kind of inconsistent state, like the middle of the allocator right now, and if we just bail out at that point, we are likely to see crashes down the road. So that was a significant problem in PHP five. PHP seven changed that. We now set an interrupt flag on timeout, and then the virtual machine checks this flag at certain points. The interrupt flag is not checked after every instruction, but only, like, just often enough to make sure that it's checked, at some point. So that you can't like go in an infinite loop, that ends up never checking. These points are basically function calls, and jumps that go higher up in the function, PCNTL signals can now use the same mechanism. If you call PCNTL async signals true, then those will also set the interrupt flag, and execute the signal handler on the next opportunity. The next time the interrupt flag is checked. The nice thing about that is that it's essentially free. I mean we already, we already have to do these checks for the interrupt like anyway, adding the handling for PCNTL signals doesn't add any cost on top. Unlike ticks, which have to be like executed on every instruction or at least regularly, and that does add significant cost.

Derick Rethans 5:28

Execution time itself because it's an opcode that needs to be executed.

Nikita Popov 5:32

Exactly.

Derick Rethans 5:33

So what are you proposing to do but the ticks in PHP eight one then?

Nikita Popov 5:36

I want to deprecate that. So both the declared directive itself, and the register tick function, unregister tick function.

Derick Rethans 5:44

How could users emulate the same behaviour as ticks allows them to do so now?

Nikita Popov 5:49

That's a good question. As I mentioned, if the use case is, use case of ticks was signal handling, then by using async symbols. If it was something else, then you have a problem. My assumption when writing this RFC was basically that signal handling was really the main remaining use case of ticks, because other use cases require this kind of you know stream wrapper instrumentation, and I didn't expect that people will be crazy enough to use something like that in production.

Derick Rethans 6:21

Hopefully they catch these rewritten files?

Nikita Popov 6:23

Probably yeah. I think it's possible to make this integrate with opcache. If you use it for other purposes, then, I don't think there is a really good replacement. So I think what they use it for is some kind of well instrumentation, so profiling, memory profiling, for example, and the alternative there of course is to use a tool that is appropriate for that job, for example, Xdebug contains a profiler, but of course it is not a production profiler, but I think there are also production profilers.

Derick Rethans 6:54

As far as I know all the production or APM solutions. They do this on their own without having to use sticks. They don't need any user land modifications.

Nikita Popov 7:03

Yeah, definitely. All the APM solutions support this, they use internal handlers.

Derick Rethans 7:08

Because it's actually removing functionalities that some people use, what's the reaction been to removing this functionality?

Nikita Popov 7:14

Well on the mailing list at least positive, but as I mentioned at least some people have like pointed out on the pull request that they are using the functionality.

Derick Rethans 7:23

Enough in such a way to sway for not deprecating them? What is the benefits of getting rid of ticks, if you don't use them?

Nikita Popov 7:31

That's, I think the thing, that there is not really a big benefit to getting rid of them. Like they don't add a lot of technical complexity to the engine. They're pretty simple in that sense. I haven't seen those responses. I'm kind of rolling a bit unsure if we should really remove them, because you could argue that well they don't really hurt anyone. I do have to say that I think all the things that people use sticks for, all the cases I have heard about, and all of those cases ticks are not the right way to solve the problem. They are not the right way to solve the signal handler problem, they are not the right way to solve the profiling problem. And the other one I heard is also they're not the right way to solve the heartbeat problem, to make sure a service stays connected. While people do use them I think they use them for questionable purposes.

Derick Rethans 8:24

Developers, if they're using something to rewrite the PHP file to introduce ticks, they can also technically rewrite a file to introduce calls to their own functions, after every statement.

Nikita Popov 8:34

Yes, I actually have a very nice PHP fuzzing project that rewrites PHP files to introduce instrumentation functions at certain points. That needs a lot more control than ticks, because it's interested in branching statements in particular. That is definitely also possible, but it's kind of even more crazy than just adding ticks. If you're doing it like this, I think, if we want to keep ticks, then we should change ticks from a declare directive to a ini_set, because this kind of rewriting of files to introduce takes that's like not a great solution. On the other hand, that does mean that if you are, I don't know a library, implementing some code and expecting that, you know, it just runs normally, then someone can with by enabling an ini setting will suddenly run code in the middle of your library file that's like essentially any point. So enabling ticks us a major behaviour change, that's something we really don't like to have in ini settings which is I guess also, why does it declare in the first place, because that limits the scope. And you have to go out of your way if you want to not limit it using this rewriting hack. So I'm not really sure ultimately what to do here.

Derick Rethans 9:44

Are you thinking of bringing this up for vote before PHP eight dot one's feature freeze?

Nikita Popov 9:49

If I decide to go for it, then definitely before. I'm just not completely sure on this topic yet.

Derick Rethans 9:55

it'd be interesting to, to hear what other people think about removing this. I have no opinion about this. Other features I do but in this case, I'm happy with them being there, I'm happy with them not being there, because it's something I'm using myself. In any case, thank you for going through this RFC with me today, and we'll see what happens.

Nikita Popov 10:14

Thanks for having me, Derick.

Derick Rethans 10:18

Thank you for listening to this installment of PHP internals news, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug and debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.



Conquering completion, Vim, and Intelephense

In this episode, Jake and Michael dive into Michael's Vim and his attempts to #DispelTheMyth around how much work is required to make it a solid option for working with PHP.

Show links

238: No Trello Cards, No Problem. WE DO IT LIVE!

Links from the show:

This episode of PHPUgly was sponsored by:
* Cloudways, managed cloud hosting platform built for your PHP projects. Use promo code PHPUGLY and get a $25 credit. https://www.cloudways.com/en/php-hosting.php?id=833038
* Honeybadger.io - https://www.honeybadger.io/

Cloudways, a managed cloud hosting platform built for your PHP projects.
If you simply wish to focus on your business, Cloudways is the way to go. They take over server management and security and free up time that you can dedicate to growing your business and acquiring new clients.
The Platforms offers a choice of IaaS partners (AWS, Google Cloud, Digitalocean, Linode, and Vultr). In addition, you get a performance-optimized stack, managed backups, and staging environment where you can test your code before pushing it to live servers.
Best of all, Composer and Git come pre-installed so you can get your projects up and running quickly.
All this power, simplicity, and peace of mind falls right with their brand slogan - Moving Dreams Forward
Be sure to visit https://www.cloudways.com/en/php-hosting.php?id=833038 today. Sign up using the Promo code PHPUgly and get a $25 credit.

PHPUgly streams the recording of this podcast live. Typically every Thursday night around 9 PM PT. Come and join us, and subscribe to our Youtube Channel, Twitch, or Periscope. Also, be sure to check out our Patreon Page.

Twitter Account https://twitter.com/phpugly

Host:

Streams:

Powered by Restream

Patreon Page

PHPUgly Anthem by Harry Mack / Harry Mack Youtube Channel


RoadRunner, Atoum, IDEs, Feature Tests, DIY API, Wizard Thinking

Eric, John, and Oscar try to review the May 2021 issue, Testing Assumptions.

Topics Covered

  • Fall Conferences, like Longhorn PHP (CfP is currently open).
  • Debugging long-running applications using RoadRunner, ReactPHP, or Swoole.
  • Does using an IDE make you a bad developer?
  • Feature tests w/Behat
  • PHP Internals interview w/Sara Golemon
  • Building an API with off-the-shelf components
  • Thinking like a wizard and learning to spot patterns in the code.
  • Sustainability of open-source projects like PHP core and documentation.

The post RoadRunner, Atoum, IDEs, Feature Tests, DIY API, Wizard Thinking appeared first on php[architect].


PHP Internals News: Episode 86: Property Accessors

PHP Internals News: Episode 86: Property Accessors

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about the "Property Accessors" RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:14

Hi I'm Derick. Welcome to PHP internals news, a podcast dedicated to explain the latest developments in the PHP language. This is episode 86. Today I'm talking with Nikita Popov about his massive property excesses RFC. Nikita, would you please introduce yourself?

Nikita Popov 0:32

Hi Derick, I'm Nikita, and I do work on PHP core development, on behalf of JetBrains.

Derick Rethans 0:39

This is probably the largest RFC I've seen in a while. What in one sentence, are you proposing to add to PHP here?

Nikita Popov 0:46

I would say it's an alternative to magic get and set, just for one specific property instead of all of them. That's the technical side. Maybe I should say something about the like motivation behind it, which is that since PHP seven four, we have type properties, that at least for me personally with that feature, the need to have this typical pattern of private property for storage, plus a public getter and setter methods, the main motivation for that has kind of gone away, because we can now use types to enforce any contracts on value. And now these getter and setter methods most if you like boilerplate. So the idea with accessors, at least my idea with accessors is that you really shouldn't use them. You should just have them as a backup option. You declare a public property in your class, and then maybe later, years later, it turns out that okay, that property actually requires additional validation. And right now if you have a public property, then you don't really have a good way of introducing that. Only way is to either break the API contract by converting the property into getter/setter methods where you can introduce arbitrary code, or by using magic get/set, which is definitely possible and persist the API contract, it's just fairly ugly.

Derick Rethans 2:09

You changes the public property that people could read into a private one. And because it's private, the set and get metric methods are being called.

Nikita Popov 2:18

Exactly.

Derick Rethans 2:19

This RFC is titled Property accesses, how do these improve on the situation?

Nikita Popov 2:24

So I think there are really two fairly orthogonal parts to this RFC. The first part is implicit accesses that don't have any custom behaviour, and just allow controlling the behaviour of properties a bit more precisely. In particular, the most important part is probably the asymmetric visibility, where you have a property that's publicly readable, but can only be set from within the class. So public read/ private write. I think that's a, maybe the most common requirement. The second part is where you can actually introduce some custom behaviour. So where you can say that okay, the get behaviour for this property looks like this, and the set behaviour, it looks like this. Which is essentially exactly the same as what magic get/set does, just for a single property.

Derick Rethans 3:10

For example, when you then do set, or you can add additional validation to it.

Nikita Popov 3:14

Exactly. Originally, you had a simple public property, then you can add a setter that checks okay this string cannot be empty.

Derick Rethans 3:23

Okay, what it's the syntax that you're proposing?

Nikita Popov 3:26

I went with these essentially the same syntax that's being used in C#. Looks like you write public foobar, and then you have this sort of semi colon you have a code block. And this code block contains two accessors, so then you have something like get, and another code block that specifies the get behaviour, and set, and the code block that specifies the set behaviour and so on.

Derick Rethans 3:52

The RFC talks about implicit and explicit implementations of these getter and setter accessors. What is the difference between them and how does it look different in syntax?

Nikita Popov 4:03

Yeah so the difference is, either you can write just get semi colon, set semicolon, that's an implicit implementation, or you actually specify a code block with real custom behaviour. To do the implicit implementation, you're saying that this is really a normal property, and PHP automatically manages the storage for you, is that you have this more fine grained control over how it works. Namely what you can do is you can say that you have get and private set. But that's a property that's publicly read only and internally writeable. You can write just get without set, in which case it's a real read only property both publicly and privately, or to be more precise, it's an init once property so you can assign to it once.

Derick Rethans 4:52

How do you keep track of the init once?

Nikita Popov 4:53

It's same mechanism as for Type Properties, where we distinguish between an initialized and an uninitialized property. You can assign to an uninitialized property, but you can't assign to an initialized one, if it's read only. The only maybe problem there is that this mechanism, requires that the property actually is uninitialized to start with, which means that for accessors you don't have any default values. To say there is no implicit default value, no implicit null value. If you want to have a default value the same as with type properties you have to specify it explicitly. Specifying a default value really only makes sense if the property is both readable and writable. For Read Only properties, if you specify the default then you will you can change that.

Derick Rethans 5:37

You have basically have created a constant.

Nikita Popov 5:39

Yes, it is essentially a constant.

Derick Rethans 5:41

You mentioned already, PHP seven four introduced type properties. How do these types interact with the setter and getter accessors?

Nikita Popov 5:50

I would say in the obvious way. The getter is required to return type of property, modulo the usual weak typing conversions, and the setter also checks before it's called whether the passed value matches the type or not. But enforces that matches the type.

Derick Rethans 6:08

This does mean that if you provide an explicit implementation for the set accessor, you also need to specify the parameter name?

Nikita Popov 6:15

No, or you can specify the parameter name, and if you don't then that's just passed in as the value variable. It's also inspired by how C# and Swift do it. I mean there are some possible variations here we could always require an explicit name, some people for that, or I also heard that some people would like to have the name of this implicit variable match the name of the property, instead of always being just value.

Derick Rethans 6:41

Would you have to specify the type though?

Nikita Popov 6:43

You wouldn't have to and you're actually not allowed to. So the accessor implementation is somewhat strict about not allowing you to do anything that would be redundant because otherwise, you know, there are quite a lot of extra things you could be adding everywhere.

Derick Rethans 6:56

That's the same way as marking a property as private. And then the accessors as private as well. Right?

Nikita Popov 7:03

Yeah exactly. So, then that will also say: if the property is already private you can't, again say that the accessors also private.

Derick Rethans 7:11

I think that's the wise thing, otherwise people go overboard with adding private and final and whatever everywhere anyway right.

Nikita Popov 7:18

One could argue that it's really not our business and this is a coding style question, but you know it's better to not leave people, with the option of doing stupid things.

Derick Rethans 7:28

I saw in the RFC that it is also possible to use references with the get accessor. Does this complicated implementation and the idea of this RFC, a lot, or just a little?

Nikita Popov 7:39

I think the important context to keep in mind here is that we already have magic get set, and the accessors are, like, largely based on their semantics. Magic getters already have this distinction between returning by value and returning by reference. The by reference return value is primarily useful for two cases. One and this is really the important one, is if you're working with arrays, any write operation on an array like setting an element or appending an element, those require that the getter returns by reference, because PHP will actually do the modification on the reference. Because some people asked about that. Why can't we just like get the array using the getter, then make the change and then assign back using the setter. That would theoretically work, but it would be extremely inefficient, and the reason is that this breaks PHP's copy on write mechanism. If the error is returned from the getter, then we have one array inside the property. And we have one copy of the array inside the property, and as the return value. Then we change the return value and the resource is now shared, we actually have to copy the whole array, and then we assign it back. So effectively what we do is we copy the array, we do single element change, and then we copy the array, we do a single element change and then we destroy the old array. That works in theory, but it's so inefficient that we would not want to promote this kind of usage.

Derick Rethans 8:42

The way around is of course, is having an implicit methods on the class to make this change to the array itself right?

Nikita Popov 9:10

That would be another option. Problem is that you will need a lot of methods, I mean it's not just a matter of setting a single element or unsetting an element, but you can also set like a deep element where you're not modifying the outermost array but, like, a multi dimensional array. You would actually have to pass through that information somehow as well. I don't think there is a simple solution to that problem beyond the reference based solution that we currently use.

Derick Rethans 9:34

I saw people arguing about not bothering with references in this new implementation at all, but I think you've now made a good case for keeping them.

Nikita Popov 9:42

Effectively not bothering with references just means not supporting that array use case. Which might be, maybe a reasonable limitation, especially if we like make a distinction and supported for the implicit accessor case where we can, you know, do internal magic to support that and not support it in the explicit accessors case. I mean, people were arguing that this reduces the complexity of the proposal, but it kind of also increases the complexity because now we are doing something else for the accessors and we're doing for the magic get/set, where we already have this established mechanism. I'm not really convinced by that.

Derick Rethans 10:20

And I also think it creates inconsistencies in the language itself because it does something different with an implicit or explicit accessor, as well as it being different between the original underscore underscore get magic method as well.

Nikita Popov 10:34

It's not a secret that I'm not a big fan of references, and I would certainly love to get rid of them, but it's a hard problem, and this array modification behaviour for magic get or for get accessors is certainly a large part of that problem, and I just don't have a good solution for it.

Derick Rethans 10:52

I don't either. The RFC also goes into great detail about inheritance and variance. Would you have a few words on that?

Nikita Popov 11:00

I think mostly inheritance works like inheritance does for methods, at least that's how it's supposed to work. Of course there are some interactions, because you can for example mix real properties and accessor properties. In which case, if you have parent accessor property, you can always replace it with a normal simple property, because normal properties they support all operations that accessor properties do. What you can't do is the other way around. If you have a parent normal property, then you can't replace that with an accessor property. And reason is that it does have some limitations. Not a lot, but there are some limitations. One of them is related to references, I mean, we're already talking about this topic. What the by reference get allows is taking a reference to the property, so you can do something like a reference equals the property. What you can't do is the other way around the property reference equals something else. So you can't assign a new reference into the property, that just doesn't work on a pretty fundamental level, because it would require an additional set handler for set by reference. As we don't particularly love references, adding a new mechanism to support that is not a very popular choice.

Derick Rethans 12:20

Variance wise, I guess, the same rules apply as for normal properties and property types?

Nikita Popov 12:27

Approximately. Properties are apparently invariant, so you can't change the type or I mean you can change it but it has to be an equivalent type. If you have a read only property, with only a getter, then the implementation makes the type covariant, which means you can use a smaller type in the child class. This is similar to how if you have a getter method, you could also give it a smaller type in the child class. The converse case, if you have a property that can only be set, then the type is contravariant, you can have larger type in the child class, though I should say that properties that can only be set are somewhat odd and really only supported for the sake of completeness, so maybe it might be worthwhile to drop the type specific behaviour there, because a set only property should already be really rare, and then set property with a contravariant inheritance that's like a edge case of an edge case.

Derick Rethans 13:24

Would it even make sense to support set only properties?

Nikita Popov 13:27

Not sure. So for the C#, implementation, I think they don't support this and there is a StackOverflow question about that, and people try to convince their, that they should support this, that the are really use cases. Currently the imagined use cases are along the lines of injecting values into a class, so using setter injection, just that now it's property based setter injection. Okay, I'll be honest I think it doesn't make sense.

Derick Rethans 13:55

To be fair, I don't think either. It would reduce the length of the RFC a little bit.

Nikita Popov 14:00

A little bit, yes.

Derick Rethans 14:01

Can you say a few words about abstracts, traits, private accessors shadowing and things like that. So a lot of complicated words, maybe you, you can distil that into something slightly simpler.

Nikita Popov 14:12

Well I think actually abstract properties are worth mentioning. In particular, the fact that you can now specify properties inside interfaces. If you have public properties, then it makes sense to have them really on the same level as public methods, so they are part of the API contract, and as such should also be supported in interfaces. Typically what the RFC allows is, you can't specify a simple property in the interface, but you can specify an accessor property, which tells you which operations have to be supported. So you can't have a property declaration that says, it just has a get accessor, or it has get and set. The implementation of course can always implement more, so if the interface requires get, then you can implement both get and set, but it has to implement at least get, either through an accessor offer another property. I think in most cases implementation will just be a normal property.

Derick Rethans 15:03

Because a normal property would implement an implicit get already anyway?

Nikita Popov 15:07

Yeah.

Derick Rethans 15:08

How do property accessors tie in, or integrate with constructor property promotion?

Nikita Popov 15:13

They are supported and promotion with the limitation that it's only implicit accessors. If you use constructor promotion, then you can specify your read only property in there, or property that is Public Read/ private write. You cannot specify a property with complex behaviour in there. This is mainly because it would mean that you embed large code blocks into the constructor signature, which is I think, pushing the limits of shorthand syntax, a bit. Like there is nothing fundamental that will prevent it, it's more a question of style.

Derick Rethans 15:50

The RFC talks a little bit about how, or rather what happens if you use foreach, var_dump, or an array cast on properties with explicit accessor. What are the restrictions here? Is something chasing from normal standard properties like we currently have.

Nikita Popov 16:03

I don't think so. So here is once again the case where we have this distinction between the implicit accessors, which are really just normal properties with limitations. So those show up in var_dump and array cast, foreach, as usual. And we have explicit accessors, which are really virtual properties, so they don't have any storage themselves. Any storage to use, you have to manage separately somehow. So, these don't show up in var_dump, foreach, and so on. Both these actual computed properties, they don't show up because that would require us to actually call all the accessors if you do foreach and that seems rather dubious to me.

Derick Rethans 16:44

How this will work for internal API's that some extensions use to access, like a list of all the properties, for say, for a debugger.

Nikita Popov 16:51

It'll work the same way as var_dump. I mean, in the end it's all, well it's not quite based on the same API's, but still, the answer is the same. You only get those properties that have some kind of backing storage, and those are only the ones that are either normal properties, or the ones with implicit accessors.

Derick Rethans 17:09

That means I need to go find out a way how to be able to read the ones with explicit accessors.

Nikita Popov 17:14

Yeah, if you want to. I don't think that the debugger should read those by default, because that means that doing a dump, will have side effects, which is not ideal, but maybe you want to have an option to show them.

Derick Rethans 17:26

That's something for me to think about, because I'm pretty sure people are going to want to see the contents of these properties, even in a debugger, even though that could mean that are side effects, which I'm not keen on.

Nikita Popov 17:36

I guess that's one of the, I would say advantages of using this over just magic get/set, because actually know which properties you're supposed to look at, with for magic get/set you just don't know at all.

Derick Rethans 17:51

The RFC talks a little bit about the performance impacts and although I saw the numbers I didn't actually read them, when preparing for this recording. What are the performance impacts for implicit accessors as well as explicit ones?

Nikita Popov 18:02

Impact is basically if you use implicit accessors that has similar performance to plain properties, performance is a bit worse. The reason is essentially that we have some limitations on caching. So we can't just cache it as if it were a normal property, because it could have asymmetric visibility. And we reuse the same cache slots for reads and writes. I've been thinking about maybe splitting that up but at least for now there is a small additional performance impact of using implicit accessors, but it's not really significant. On the other side if you use explicit accessors. Those are expensive, they are not quite as expensive as using magic get/set, but they are more expensive than using normal method calls. Reason is basically their normal method calls, they are very optimized, and they do not have to re enter the virtual machine, so we just stay in the same virtual machine loop, and we just switch to different stack frames. For magic get/set we actually have to like recursively call the virtual machine, because we don't have a good point to re enter it, at least based on our current API's. And we also have to deal with some additional stuff, particularly the fact that magic get/set and property accessor as both, they have recursion guards. Normally if you recurse methods in PHP, we don't do any checks about that. Xdebug does, but PHP itself doesn't, so you can infinitely recurse and PHP is fine. The only thing that happens is that at some point you'll run out of memory.

Derick Rethans 19:37

Or when extensions are loaded such as Xdebug, you'll actually still get a stack overflow.

Nikita Popov 19:41

So that's something we should still be addressing, at least the baseline behaviour that you can get to that memory limit error. For properties will set have recursion guards, which say that if you recursively access a property in magic get/set, that it will not call magic get/set again and instead, access the property as if they didn't exist.

Derick Rethans 20:01

Instead of throwing in an error?

Nikita Popov 20:03

Yeah. For property accessors I'm actually throwing an error on recursion, and the reason for that is if we didn't throw an error, then this would end up accessing dynamic property of the same name as the accessor, which would technically work, but it's very likely not what the programmer actually intended. So it's going to be really inefficient because you actually have to allocate space for the dynamic properties and access for those. So if you wanted to have some kind of backing storage for the property, then you should just explicitly declare it and access that, rather than accessing something with the same name and implicitly creating a dynamic property.

Derick Rethans 20:41

Yeah, that sounds all very complicated.

Nikita Popov 20:44

It's cleaner to just make it an error and let the programmer fix it, instead of PHP try to fix it for you.

Derick Rethans 20:51

Are there any BC considerations about the introduction of property accessors?

Nikita Popov 20:55

Not strictly, but I'm sure that it's going to break, various assumptions for people, or at least in the sense that, right now, most assumptions should already be broken through magic get/set. I mean you can always have this kind of magic behaviour. If we have accessors this is probably going to be a lot more common, and people will have to deal with things like properties being publicly readable, but privately writable much as because someone very rarely manually implements that behaviour, but because the language though has native support for it and it's going to be common.

Derick Rethans 21:28

We spoke a little bit about all the different sticking points in his RFC, for example with references, but there's one other thing and I think it's an argument you make somewhere on the bottom of the RFC, that there is a separation between implicit and explicit property accessors. I'm wondering whether it would make sense to consider adding whether the implicit part of this RFC first and then maybe later look at adding explicit property accessors.

Nikita Popov 21:54

That's really the main sticking point, and also my own problem with the RFC. I mean, you mentioned at the very start, that this is a very long RFC and still a little bit incomplete so it's going to be longer. It's a fairly complex feature that has complex interactions with other features in PHP. The implementation is actually, maybe less complex, then you think, given the RFC length. The main concern I have is that, at least for me personally the most useful part of the RFC, are the read only properties. The read only properties and the like Public Read, Private Write properties. I think these two cover like 90% of the use cases, especially because if you have a property that is only publicly readable, then you don't really have to be concerned about this case where you have to, later on, add additional validation. I mean after all the property is read only, or you control all the sets because they're private. There is no danger of introducing an API break, because you have to add additional validation. I think like the largest part of the use case of the whole accessors proposal will be covered by these two things, Or maybe even just one of these two things, that's a bit of a philosophical question. There are some people who think we should have just public read / private write and no proper read only properties, because that like looks the same from the user perspective, but still gives you more flexibility. I think that's like the most important use case, and we could implement that part with a lot less language complexity. So the question is really does it make sense to have this full accessor proposal, if we could get the most useful part as a separate simpler feature, and, well, I heard differing opinions on that one. I was actually pretty surprised that their reception of the on like a full accessors proposal was fairly positive. I kind of expected more pushback, especially as, this is the second proposal on the topic, we had earlier one with, like, similar syntax even though different details, and that one did fail.

Derick Rethans 24:02

How long ago was that?

Nikita Popov 24:03

Oh that was quite a while actually, at least more than five years.

Derick Rethans 24:06

I think that the mindset of developers has changed in the last five to 10 years, like introducing this 10 years ago would never happened, or even typed properties, right. It would never have happened.

Nikita Popov 24:17

That's true.

Derick Rethans 24:19

Do you have any idea when you're going to put us up for a vote? Because, of course, PHP 8.1 feature freezes coming up in not too far away from now.

Nikita Popov 24:28

Yeah, I'm not sure about that. I'm still considering if I want to explore the simpler alternatives, first. There was already a proposal. Another rejected proposal for Read Only properties, probably was called write once properties at the time. But yeah, I kind of do think that it might make sense to try something like that, again, before going to the full accessors proposal or instead.

Derick Rethans 24:54

Do you have anything else to add?

Nikita Popov 24:56

What are your thoughts on this proposal, and the question at the end?

Derick Rethans 24:59

I quite like it, but I also think that it might make sense to split it up. I'm always quite a fan of splitting things up in smaller bits, if that's possible too, and still provide quite a lot of use out of it. And that's why I was asking whether it makes sense to split it up into the implicit part and the explicit part of it, especially because it makes the implementation and the logic around it quite a bit easier for people to understand as well.

Nikita Popov 25:24

It's maybe worth mentioning that Swift also has a similar accessor model but it is more like a composition of various different features like read only properties, properties with asymmetric visibility, and then finally properties with like fully controlled, get and set behaviour rather than this C# model where everything is modelled using accessors with appropriate modifiers. So there is certainly precedent in other languages of separating these things.

Derick Rethans 25:55

Something to ponder about, and I'm sure we'll get to a conclusion at some point. Hopefully some of it before PHP eight one goes and feature freeze, of course. We've been chatting for quite a while now, I think we should call it the end for this RFC. Thank you for taking the time today to talk about property accessors.

Nikita Popov 26:11

Thanks for having me, Derick.

Derick Rethans 26:12

Thank you for listening to this installment of PHP internals news, a podcast, dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next time.