Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Another problem with weak references and reactive-banana (different behavior in GHC vs GHCJS) #463

Closed
ocharles opened this issue Feb 11, 2016 · 22 comments · Fixed by ghcjs/shims#45

Comments

@ocharles
Copy link

cc @HeinrichApfelmus

The following program exhibits different behavior under GHC and GHCJS:

module Main (main) where

import Control.Concurrent
import Control.Monad
import Reactive.Banana
import Reactive.Banana.Frameworks

text_ :: Behavior String -> MomentIO (Behavior [()])
text_ contents =
  do contentsChanged <- changes contents
     reactimate' (fmap (fmap (const (print "X"))) contentsChanged)
     return (pure [()])

joinB :: Behavior (MomentIO (Behavior [()]))
      -> MomentIO (Behavior [()])
joinB b =
  do changed <- changes b
     initialB <- join (valueB b)
     laterB <- execute (b <@ changed)
     switchB initialB laterB

main :: IO ()
main =
  do n <-
       compile (do (tick,fireClick) <- newEvent
                   liftIOLater $
                     void $
                     forkIO $ forever $ threadDelay 1000000 >>= fireClick
                   reactimate (fmap print tick)
                   list <- accumB [] (fmap (:) tick)
                   nodes <-
                     joinB (fmap (fmap (fmap concat . sequenceA) .
                                  mapM (text_ . pure . show))
                                 list)
                   nodesChanged <- changes nodes
                   reactimate' (fmap (fmap (const (putStrLn "Changed"))) nodesChanged))
     actuate n
     forever (threadDelay maxBound)

In ghc:

()
Changed
()
Changed
()
Changed
()
Changed
()
Changed
()
Changed
()
Changed

In ghcjs:

()
Changed
()
Changed
()
Changed
<no more output>

In GHCJS it runs printing for 3 events and then stopping - suggesting a garbage collection has occured. In GHCI (and GHC compiled binaries) it counts without interruption (I aborted at 100, which takes 100 seconds).

I am running:

  • ghcjs-boot 97dea5c4145bf80a1e7cffeb1ecd4d0ecacd5a2f
  • ghcjs-shims 45f44f5f027ec03264b61b8049951e765cc0b23a
  • ghcjs 561365ba1667053b5dc5846e2a8edb33eaa3f6dd

@HeinrichApfelmus, I have also tried with the ghcjs branch of reactive-banana and the same problem occurs with that. I can also note that if I change the definition of text_ to just

text_ :: Behavior String -> MomentIO (Behavior [()])
text_ contents = return (pure [()])

the two programs behave identically. So it seems that by adding that extra reactimate', the dependency graph is different enough to exhibit different GC patterns. It is very strange to me that () doesn't even get printed, so we seem to be losing almost the entire graph. The click thread is still alive, if we add a print statement for every iteration of the forever loop, you'll see that, but nothing will actually happen otherwise.

@luite
Copy link
Member

luite commented Feb 11, 2016

hmm, may be related to issue #454.

@HeinrichApfelmus
Copy link

I didn't have time to look at this in detail yet, so just a quick remark: Did you check that the compiled GHC/GHCi version performs at least one garbage collection, so that the versions are doing something comparable? You can use System.Mem.performGC to trigger a GC.

@ocharles
Copy link
Author

I added an explicit call to performGC and the results are very strange.

GHC:

Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting
()
Changed
Performing a GC
Waiting

GHCJS:

Performing a GC
lib.js:1710 Waiting
lib.js:1710 ()
lib.js:1710 Changed
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 ()
lib.js:1710 Changed
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 ()
lib.js:1710 Changed
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 ()
lib.js:1710 Changed
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 ()
lib.js:1710 Changed
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 Performing a GC
lib.js:1710 Waiting
lib.js:1710 Performing a GC
lib.js:1710 Waiting

Note that it stops working, but seems independent of the amount of times I invoke performGC. Is performGC thread specific?

@achirkin
Copy link

achirkin commented Jul 21, 2016

I think I have the same problem.
I tried another simple code snippet:

module Main where

import Control.Monad (forM, forM_, forever)
import Data.IORef (newIORef, modifyIORef, readIORef)
import Reactive.Banana.Frameworks

hNUM :: Int
hNUM = 5

main :: IO ()
main = do
    (reg, run) <- genEvents
    handlers <- makeHandlers hNUM reg
    network <- compile . forM_ handlers $ \ah -> do
          ev <- fromAddHandler ah
          reactimate $ print <$> ev
    actuate network
    forever run

genEvents :: IO (Handler Int -> IO (), IO ())
genEvents = do
  handlers <- newIORef []
  return ( \h -> modifyIORef handlers (h:)
         , length <$> getLine >>= \x -> readIORef handlers >>= mapM_ ($ x)
         )

makeHandlers :: Int -> (Handler Int -> IO ()) -> IO [AddHandler Int]
makeHandlers n reg = forM [1..n] $ \i -> do
    (ah, fire) <- newAddHandler
    reg $ fire . (+i)
    return ah

To run this I used cabal exec -- ghcjs --make test.hs && nodejs test.jsexe/all.js
and cabal exec -- ghc --make test.hs && ./test within a sandbox.
I have ghc-7.10.3, and have installed ghcjs from github today, and have booted with ghcjs-boot --dev.

So this code registers a number of events to readLine action, then prints length of a line read plus id of a handler. With number of handlers hNUM from 0 to 4 both ghc and ghcjs behave identically;
with hNUM >= 5 events fire only for the first fractions of a second after the first readLine (i.e. if you type fast enough, you may be able to trigger expected events 2-3 times ).

Originally, I wanted to post this comment into ghcjs/pull/25 discussion.
Also I think that following bug reports may be related:
reactive-banana/issues/109, ghcjs/issues/450, or maybe even ghcjs-shims/issues/3?

@luite
Copy link
Member

luite commented Jul 21, 2016

Ok, I'll look into this again, last time I had some trouble reproducing the problem with the latest commits. I hope it's better now.

@achirkin
Copy link

achirkin commented Aug 6, 2016

After some number of experiments I managed to drastically simplify the example. No dependencies at all! To be honest, I am not sure if this is exactly the same problem as in the previous example, but behaviour of ghc and ghcjs differs clearly in this example (tested with ghc-7.10.3).

module Main where

import Control.Concurrent (threadDelay)
import System.Mem (performGC)
import System.Mem.Weak (deRefWeak, mkWeak)


data Key = Key (IO Key) | RealKey
instance Show Key where
  show (Key _) = "Key ref"
  show (RealKey) = "RealKey"

main :: IO ()
main = do
    let key = RealKey
        key' = Key $ return key
    handler <- mkWeak key (print key) Nothing
    let loop = do
          threadDelay 100000
          mcall <- deRefWeak handler
          performGC
          case mcall of
            Nothing -> putStrLn "Dead."
            Just c  -> putStr "Got: " >> c >> loop
    loop
    print key'

Being compiled with ghcjs this program finishes with "Dead" after the first garbage collection ("Got: RealKey" printed once).

Being compiled with ghc this program loops indefinitely with "Got: RealKey".

Seems like ghcjs sometimes does not count references within IO actions. Another explanation could be that key is optimized out in ghcjs case into a constant. Anyway, behaviours of two compilers differ, and that may be a reason for the strange results of reactive-banana.

@achirkin
Copy link

achirkin commented Aug 9, 2016

I am sorry for flooding, but this time I think I have found the problem. Though my previous comment shows code with really different behaviour in ghc and ghcjs, I believe it is not related to the original problem.

This time I think the problem is in unordered-containers package; particularly, Data.HashMap.Strict behaves buggy. This causes elements stored in Reactive.Banana.Prim.OrderedBag be not tracked by GC. The only glue I found when I was looking through javascript output objects of haskell program is that one of the produced cryptic objects is an array :). This array does not have __ghcjsArray property when hNUM = 5 and does have it __ghcjsArray = true when hNUM = 4.
I did a workaround by changing Data.HashMap.Strict with Data.Map.Strict and this seem to solve the problem! The change is available at my fork of reactive-banana.
Applying this change makes the code in my first comment behave in ghcjs exactly the same as in ghc.

Now I am really running out of time and continue with what I have at the moment. If anybody have time for this, I think it makes sense to test unordered-containers implementation together with weak pointers.

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

I'm witnessing similar buggy behavior with Data.HashMap.Strict and weak references and I've reached the same conclusion as @achirkin . Did anybody end up doing some more research on this?

@achirkin
Copy link

Unfortunately, I was not able to replicate the bug using Data.HashMap.Strict alone at that time (experienced it via reactive-banana only). @bitonic , did you manage to replicate it with unordered-containers alone?

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

@achirkin i did not, it's in a relatively large application. however, given this clue:

This array does not have __ghcjsArray property when hNUM = 5 and does have it __ghcjsArray = true when hNUM = 4.

i went looking and these primops are definitely fishy:

genPrim _ _ CloneSmallArrayOp [r] [a,o,n] = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); |]
. the .slice() will not preserve the __ghcjsArray property:

genPrim _ _ CloneSmallArrayOp          [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); |]
genPrim _ _ CloneSmallMutableArrayOp   [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); |]
genPrim _ _ FreezeSmallArrayOp         [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); |]
genPrim _ _ ThawSmallArrayOp           [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); |]

unless i'm mislead, they should be something like

genPrim _ _ CloneSmallArrayOp          [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); `r`.__ghcjsArray = true; |]
genPrim _ _ CloneSmallMutableArrayOp   [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); `r`.__ghcjsArray = true; |]
genPrim _ _ FreezeSmallArrayOp         [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); `r`.__ghcjsArray = true; |]
genPrim _ _ ThawSmallArrayOp           [r]   [a,o,n]       = PrimInline [j| `r` = `a`.slice(`o`,`o`+`n`); `r`.__ghcjsArray = true; |]

i'll see if that fixes it.

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

here is the change i'm testing: bitonic@7805639 CC @luite

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

similar fix for the normal arrays: bitonic/shims@6d55e33

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

@achirkin actually, i've checked and in my application it's never the case that the GC stumbles on an array without __ghcjsArray. i think the two changes above are necessary anyway, but i don't think they're related to my problem regarding HashMap + weak references.

@bitonic
Copy link
Contributor

bitonic commented Dec 20, 2017

as predicted the two fixes above do not fix my problem :(.

nonetheless, i've opened two prs: #619 and ghcjs/shims#45 .

@achirkin
Copy link

Hmm, besides __ghcjsArray there is another additional field m that is initialized with zero.
As I understand from garbage collector shims, it is used to mark objects by GC to decide if an object has to be deleted.
Maybe we also need to re-initialize or copy this field on slice?

@bitonic
Copy link
Contributor

bitonic commented Dec 21, 2017

@achirkin yes the m field is for the garbage collector (it starts at 0 and then alternates between 3 and 2). you're right, we should probably set that too. again, i doubt this is relevant to at least my problem though since the slicing functions are never used and the bug still manifests itself.

i also think that not setting the m field would actually "accidentally" work anyway, since most of the tests on that field are like c.m !== 3 or c.m !== 2, which would work fine if the field is not set.

@bitonic
Copy link
Contributor

bitonic commented Dec 21, 2017

@achirkin actually it turns out that that m property did matter! adding that fix too fixes this problem. sadly i do not have a self contained tests but i have thoroughly tested it in my application.

@bitonic
Copy link
Contributor

bitonic commented Dec 21, 2017

@luite once you merge the two prs above this can be closed, however it would be good to have a chat to find out exactly why that was causing trouble.

@bitonic
Copy link
Contributor

bitonic commented Dec 21, 2017

@ocharles btw, do you recall which "other" issues there were with weak references? i rely on them quite critically (as do other users, e.g. reflex), and it'd be good to make sure that at least all the known issues are fixed.

@ocharles
Copy link
Author

Sorry @bitonic, it's been a loooong time since I looked. But it might be worth re-running my above tests.

@jaspervdj
Copy link

I also ran into (a variation of) this problem. @bitonic 's PRs fix the issue for me. Would be good to get those merged.

@luite
Copy link
Member

luite commented Feb 4, 2018

ah thanks for bringing it to my attention again. I'll apply the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants